
llms.txt vs sitemap.xml: What Each File Does for AI Discovery

sitemap.xml tells crawlers what exists. llms.txt tells AI agents what matters. If you run docs in 2026, you probably want both.

Faizan Khan
2026-06-01 • 8 min read

If you're asking whether llms.txt replaces sitemap.xml, you're asking the wrong question. They solve different problems.

sitemap.xml is about completeness. It tells crawlers which URLs exist.

llms.txt is about judgment. It tells AI agents which pages are worth reading first.

That distinction matters because documentation discovery is no longer just "can Google find this page?" It is also "if an agent needs to solve a task, where should it start?"

For most docs teams, the right answer is simple: keep your sitemap, add llms.txt, and do not try to make one impersonate the other.

If you need the basics first, start with What Is llms.txt? A Practical Guide for SaaS Docs Teams. If you want examples of good llms.txt structure, read llms.txt Examples: Real Patterns for API Docs, Help Centers, and Developer Docs.


What sitemap.xml Does

sitemap.xml is a crawler inventory.

Its job is straightforward:

  • list URLs
  • optionally include lastmod
  • help search engines discover pages
  • help search engines prioritize crawling

A typical sitemap entry looks like this:

XML
<url>
  <loc>https://docs.example.com/authentication</loc>
  <lastmod>2026-05-30</lastmod>
</url>

There is no editorial intent here. The sitemap is not trying to say "read this page first" or "these are the three pages that matter most for API onboarding." It is just telling a crawler what exists.
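To make the "inventory" point concrete, here is a minimal sketch of how a consumer might read a sitemap using Python's standard library. The sample document and the `read_sitemap` helper are illustrative assumptions, not part of any particular crawler; the namespace is the one defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical two-page sitemap used only for this example.
SAMPLE_SITEMAP = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/authentication</loc>
    <lastmod>2026-05-30</lastmod>
  </url>
  <url>
    <loc>https://docs.example.com/quickstart</loc>
  </url>
</urlset>
"""


def read_sitemap(xml_text):
    """Return (loc, lastmod) pairs; lastmod is None when the tag is absent."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


for loc, lastmod in read_sitemap(SAMPLE_SITEMAP):
    print(loc, lastmod)
```

Everything the parser gets back is a URL and, at most, a date. Nothing in the output says which page to read first.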

That is exactly what it should do.

For Google and traditional search engines, this is useful. For AI agents trying to answer a question or complete a task, it is often not enough.


What llms.txt Does

llms.txt is a curated docs guide for agents.

Its job is different:

  • explain what the docs set covers
  • point to the most important pages
  • group links by real tasks or concepts
  • reduce ambiguity about where an agent should start

A typical llms.txt section looks like this:

Markdown
# Acme API Docs

Developer documentation for Acme's REST API. Covers authentication,
webhooks, rate limits, and SDK setup.

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
- [Errors](https://docs.acme.com/errors): Error codes and retry guidance

This is not a full site inventory. It is a small, opinionated map.

That makes it useful in the exact places where a sitemap is weak:

  • "what page should I read first?"
  • "which auth path is canonical?"
  • "where are the webhook docs?"
  • "what should I look at before writing code?"
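As a rough sketch of how an agent or tool could consume the llms.txt example above, the snippet below groups links by their section heading and keeps the author's ordering. The `parse_llms_txt` function and its regular expression are assumptions made for illustration; real tools may parse the file differently.

```python
import re

# The same sample file shown earlier in this section.
SAMPLE_LLMS_TXT = """\
# Acme API Docs

Developer documentation for Acme's REST API. Covers authentication,
webhooks, rate limits, and SDK setup.

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
- [Errors](https://docs.acme.com/errors): Error codes and retry guidance
"""

# Matches "- [Title](url)" with an optional ": description" suffix.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text):
    """Group links under their '## ' section, preserving the author's order."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    (match["title"], match["url"], match["note"] or "")
                )
    return sections


sections = parse_llms_txt(SAMPLE_LLMS_TXT)
first_title, first_url, _ = sections["Start Here"][0]
print(first_title, first_url)
```

Unlike the sitemap, the parsed result carries a section name, an order, and a one-line description per link, which is the editorial signal an agent can act on.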

The Real Difference: Completeness vs Curation

The difference is not XML versus Markdown.

The real difference is this:

  • sitemap.xml optimizes for completeness
  • llms.txt optimizes for usefulness

That one distinction explains most of the confusion.

If you generate llms.txt straight from your sitemap, you usually lose the thing that makes llms.txt valuable. You get a second, worse sitemap.

If you try to use a sitemap as a task guide, you get a giant URL dump with no editorial signal.

They overlap in the broad sense that both help discovery. But they help different kinds of discovery.


Side-by-Side

Here is the simplest way to think about them:

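|                   | sitemap.xml                    | llms.txt                                |
| ----------------- | ------------------------------ | --------------------------------------- |
| Primary audience  | search engine crawlers         | AI agents and tools                     |
| Format            | XML                            | plain text / Markdown                   |
| Main purpose      | list what exists               | point to what matters                   |
| Coverage          | exhaustive                     | selective                               |
| Maintenance style | generated                      | curated                                 |
| Best for          | crawl discovery                | task-oriented docs guidance             |
| Bad at            | telling agents where to start  | representing every page on the site     |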

That table is more useful than arguing about whether one is "better."

They are not substitutes. They sit at different layers.


When a Sitemap Is Enough

For some jobs, sitemap.xml is enough.

If your goal is:

  • making sure Google can discover your pages
  • exposing a large docs surface to traditional crawling
  • tracking freshness through lastmod
  • helping search engines notice newly published docs

then the sitemap is doing exactly what you need.

A sitemap is also better whenever completeness matters more than editorial guidance.

For example:

  • versioned docs with lots of pages
  • large reference surfaces
  • generated API docs

In those cases, you still want the sitemap even if you also publish llms.txt.


When llms.txt Changes the Outcome

llms.txt matters when an agent needs to do more than just discover pages.

It matters when the agent needs help choosing.

Examples:

  • your docs have both OAuth and API key auth, but one is the recommended default
  • your product has three SDKs, but most users should start with one
  • your help center has 200 pages, but only 8 solve most support tasks
  • your developer docs have architecture pages that matter before implementation

These are not crawl problems. They are prioritization problems.

That is what llms.txt helps with.

The file is valuable because it captures editorial judgment in a machine-readable way.


The Common Mistake: Turning llms.txt into a Sitemap Clone

This is probably the most common failure mode.

Teams publish llms.txt, but instead of curating it, they dump every docs URL into the file.

At that point, it stops being helpful.

You end up with:

  • a longer file
  • more ambiguity
  • weaker prioritization
  • no clear starting path

If the file is just a second inventory, agents still have to guess where to begin.

A good llms.txt should feel like a senior engineer narrowing the search space, not a crawler export.


The Other Common Mistake: Expecting llms.txt to Replace Search Infrastructure

The opposite mistake is also common.

People hear "AI discovery" and assume llms.txt should replace traditional discovery files.

It should not.

You still want:

  • sitemap.xml
  • canonical URLs
  • clean internal linking
  • crawlable pages
  • good metadata

llms.txt is additive. It does not make your search infrastructure irrelevant.

If your sitemap is broken, llms.txt does not save you. If your docs structure is weak, llms.txt only papers over part of the problem.


A Practical Default for Docs Teams

If you run a docs site in 2026, the default setup is pretty simple:

Keep sitemap.xml exhaustive

Let it do the boring job:

  • include every public, canonical docs page
  • generate it automatically
  • update lastmod
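A minimal sketch of the "generate it automatically" step, assuming your build already knows each page's URL and last-modified date. The `build_sitemap` function and the sample pages are hypothetical; most docs frameworks ship their own sitemap generator, and that is usually the better choice.

```python
import xml.etree.ElementTree as ET
from datetime import date

# Namespace defined by the sitemaps.org protocol.
SITEMAP_XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Render a sitemap from an iterable of (url, last_modified_date) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_XMLNS)
    for url, modified in sorted(pages):
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod uses an ISO 8601 date such as 2026-05-30.
        ET.SubElement(entry, "lastmod").text = modified.isoformat()
    ET.indent(urlset)  # pretty-printing; requires Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


# Hypothetical pages; in practice these come from your docs build.
pages = [
    ("https://docs.example.com/quickstart", date(2026, 5, 28)),
    ("https://docs.example.com/authentication", date(2026, 5, 30)),
]
print(build_sitemap(pages))
```

The important property is that no human decides what goes in: every public page the build knows about ends up in the file.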

Keep llms.txt short and curated

Let it do the editorial job:

  • quickstart
  • auth
  • errors
  • webhooks
  • SDKs
  • a few core concept pages if needed

Do not make them mirror each other

If both files contain the same long list of URLs, you probably are not getting much value from llms.txt.
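One way to catch mirroring early is a small check in CI. The sketch below measures how much of the sitemap also appears in llms.txt; the function names and the 0.8 threshold are arbitrary assumptions for illustration, not an established rule, and a very small docs site may legitimately score high.

```python
def sitemap_coverage(llms_urls, sitemap_urls):
    """Fraction of sitemap URLs that also appear in llms.txt (0.0 to 1.0)."""
    # Ignore trailing slashes so /auth and /auth/ count as the same page.
    sitemap_set = {url.rstrip("/") for url in sitemap_urls}
    if not sitemap_set:
        return 0.0
    llms_set = {url.rstrip("/") for url in llms_urls}
    return len(llms_set & sitemap_set) / len(sitemap_set)


def looks_like_clone(llms_urls, sitemap_urls, threshold=0.8):
    """Flag an llms.txt that covers most of the sitemap.

    The threshold is an illustrative cutoff chosen for this sketch.
    """
    return sitemap_coverage(llms_urls, sitemap_urls) >= threshold


# Hypothetical data: a 10-page sitemap and a 2-link curated llms.txt.
sitemap_urls = [f"https://docs.example.com/p{i}" for i in range(10)]
curated_urls = sitemap_urls[:2]

print(sitemap_coverage(curated_urls, sitemap_urls))
print(looks_like_clone(sitemap_urls, sitemap_urls))
```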


What to Ship First

If you only have time for one hour of work, do this:

  1. make sure your sitemap is present and healthy
  2. create a small llms.txt with 10 to 20 important links
  3. group those links by real tasks, not by internal nav labels

That gets you most of the benefit.

You can refine from there.
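The steps above can be sketched as a small generator that renders llms.txt from task-based groups and refuses to build when the link count drifts outside the 10 to 20 range. The `render_llms_txt` function and the sample groups are hypothetical and exist only to show the shape of the file.

```python
def render_llms_txt(title, summary, groups, min_links=10, max_links=20):
    """Render llms.txt from a dict of task heading -> [(label, url, note)]."""
    total = sum(len(links) for links in groups.values())
    if not min_links <= total <= max_links:
        raise ValueError(f"expected {min_links}-{max_links} links, got {total}")
    lines = [f"# {title}", "", summary, ""]
    for heading, links in groups.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{label}]({url}): {note}" for label, url, note in links]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


# Hypothetical groups, named after tasks rather than nav labels.
base = "https://docs.acme.com"
groups = {
    "Make your first request": [
        ("Quickstart", f"{base}/quickstart", "Make your first request"),
        ("Authentication", f"{base}/authentication", "API keys and OAuth"),
    ],
    "Handle failures": [
        ("Errors", f"{base}/errors", "Error codes and retry guidance"),
    ],
}

# min_links is lowered here only because the sample has three links.
print(render_llms_txt("Acme API Docs", "REST API docs for Acme.", groups, min_links=1))
```

Keeping the limit in code means the file cannot quietly grow back into a full inventory.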


What Neither File Solves

It is worth being explicit about the limit.

Neither sitemap.xml nor llms.txt solves:

  • bloated HTML payloads
  • weak docs IA
  • broken examples
  • unclear product boundaries
  • missing implementation guidance

And neither one tells the agent how to use your product well in the deeper sense.

That is why we keep treating llms.txt as a discovery layer, not the whole AI-readable docs story. If you want the systems argument for that, read llms.txt Isn't Enough.


The Short Version

If you want one sentence:

sitemap.xml tells crawlers what exists. llms.txt tells agents what matters.

Most teams should ship both.

One is infrastructure.

The other is judgment.
