
llms.txt vs sitemap.xml: What Each File Does for AI Discovery

sitemap.xml tells crawlers what exists. llms.txt tells AI agents what matters. If you run docs in 2026, you probably want both.

Faizan Khan

2026-06-01 • 8 min read

If you're asking whether llms.txt replaces sitemap.xml, you're asking the wrong question. They solve different problems.

sitemap.xml is about completeness. It tells crawlers which URLs exist.

llms.txt is about judgment. It tells AI agents which pages are worth reading first.

That distinction matters because documentation discovery is no longer just "can Google find this page?" It is also "if an agent needs to solve a task, where should it start?"

For most docs teams, the right answer is simple: keep your sitemap, add llms.txt, and do not try to make one impersonate the other.

If you need the basics first, start with "What Is llms.txt? A Practical Guide for SaaS Docs Teams." If you want examples of good llms.txt structure, read "llms.txt Examples: Real Patterns for API Docs, Help Centers, and Developer Docs."

What `sitemap.xml` Does

sitemap.xml is a crawler inventory.

Its job is straightforward:

list URLs
optionally include lastmod
help search engines discover pages
help search engines prioritize crawling

A typical sitemap entry looks like this:

```xml
<url>
  <loc>https://docs.example.com/authentication</loc>
  <lastmod>2026-05-30</lastmod>
</url>
```

There is no editorial intent here. The sitemap is not trying to say "read this page first" or "these are the three pages that matter most for API onboarding." It is just telling a crawler what exists.

That is exactly what it should do.

For Google and traditional search engines, this is useful. For AI agents trying to answer a question or complete a task, it is often not enough.

What `llms.txt` Does

llms.txt is a curated docs guide for agents.

Its job is different:

explain what the docs set covers
point to the most important pages
group links by real tasks or concepts
reduce ambiguity about where an agent should start

A typical llms.txt section looks like this:

```markdown
# Acme API Docs

Developer documentation for Acme's REST API. Covers authentication,
webhooks, rate limits, and SDK setup.

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
- [Errors](https://docs.acme.com/errors): Error codes and retry guidance
```

This is not a full site inventory. It is a small, opinionated map.

That makes it useful in the exact places where a sitemap is weak:

"what page should I read first?"
"which auth path is canonical?"
"where are the webhook docs?"
"what should I look at before writing code?"

The Real Difference: Completeness vs Curation

The difference is not XML versus Markdown.

The real difference is this:

sitemap.xml optimizes for completeness
llms.txt optimizes for usefulness

That one distinction explains most of the confusion.

If you generate llms.txt straight from your sitemap, you usually lose the thing that makes llms.txt valuable. You get a second, worse sitemap.

If you try to use a sitemap as a task guide, you get a giant URL dump with no editorial signal.

They overlap in the broad sense that both help discovery. But they help different kinds of discovery.

Side-by-Side

Here is the simplest way to think about them:

| | `sitemap.xml` | `llms.txt` |
| --- | --- | --- |
| Primary audience | search engine crawlers | AI agents and tools |
| Format | XML | plain text / Markdown |
| Main purpose | list what exists | point to what matters |
| Coverage | exhaustive | selective |
| Maintenance style | generated | curated |
| Best for | crawl discovery | task-oriented docs guidance |
| Bad at | telling agents where to start | representing every page on the site |

That table is more useful than arguing about whether one is "better."

They are not substitutes. They sit at different layers.

When a Sitemap Is Enough

For some jobs, sitemap.xml is enough.

If your goal is:

making sure Google can discover your pages
exposing a large docs surface to traditional crawling
tracking freshness through lastmod
helping search engines notice newly published docs

then the sitemap is doing exactly what you need.

A sitemap is also better whenever completeness matters more than editorial guidance.

For example:

versioned docs with lots of pages
large reference surfaces
generated API docs

In those cases, you still want the sitemap even if you also publish llms.txt.
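The freshness signal is also easy to consume programmatically. As a rough sketch, the Python below reads a sitemap and lists the URLs whose lastmod falls on or after a cutoff date. The function name `urls_modified_since`, the webhooks URL, and the cutoff are illustrative assumptions, and the code assumes every lastmod value starts with a full YYYY-MM-DD date.

```python
from datetime import date
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_modified_since(sitemap_xml: str, cutoff: date) -> list[str]:
    """Return <loc> values whose <lastmod> is on or after the cutoff."""
    root = ET.fromstring(sitemap_xml)
    fresh = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS).strip()
        # Entries without lastmod are skipped: there is nothing to compare.
        # Assumption: lastmod begins with YYYY-MM-DD (a time part is ignored).
        if loc and lastmod and date.fromisoformat(lastmod[:10]) >= cutoff:
            fresh.append(loc)
    return fresh


# Hypothetical two-page sitemap; only the first entry comes from the article.
SAMPLE_SITEMAP = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://docs.example.com/authentication</loc>
    <lastmod>2026-05-30</lastmod>
  </url>
  <url>
    <loc>https://docs.example.com/webhooks</loc>
    <lastmod>2026-01-12</lastmod>
  </url>
</urlset>
"""

print(urls_modified_since(SAMPLE_SITEMAP, date(2026, 5, 1)))
```

Note what the result is and is not: a list of pages that changed recently, with no opinion about which of them an agent should read first.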

When `llms.txt` Changes the Outcome

llms.txt matters when an agent needs to do more than just discover pages.

It matters when the agent needs help choosing.

Examples:

your docs have both OAuth and API key auth, but one is the recommended default
your product has three SDKs, but most users should start with one
your help center has 200 pages, but only 8 solve most support tasks
your developer docs have architecture pages that matter before implementation

These are not crawl problems. They are prioritization problems.

That is what llms.txt helps with.

The file is valuable because it captures editorial judgment in a machine-readable way.
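"Machine-readable" is meant literally: the format is regular enough that a tool can pull out the sections and links in a few lines. Here is a minimal Python sketch, assuming the file follows the shape of the Acme sample earlier (H2 headings followed by `- [title](url): note` links). The names `parse_llms_txt` and `LINK_RE` are illustrative, not part of any official parser.

```python
import re

# Matches "- [Title](https://example.com/page): optional note".
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<note>.*))?$"
)


def parse_llms_txt(text: str) -> dict[str, list[dict[str, str]]]:
    """Group link entries under the H2 heading they appear beneath."""
    sections: dict[str, list[dict[str, str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                sections[current].append(
                    {
                        "title": match["title"],
                        "url": match["url"],
                        "note": match["note"] or "",
                    }
                )
    return sections


SAMPLE = """\
# Acme API Docs

Developer documentation for Acme's REST API. Covers authentication,
webhooks, rate limits, and SDK setup.

## Start Here

- [Quickstart](https://docs.acme.com/quickstart): Make your first request
- [Authentication](https://docs.acme.com/authentication): API keys and OAuth
- [Errors](https://docs.acme.com/errors): Error codes and retry guidance
"""

sections = parse_llms_txt(SAMPLE)
for link in sections["Start Here"]:
    print(link["title"], link["url"])
```

The parser is trivial on purpose. The value is in the section names and the ordering, which someone on the docs team had to decide.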

The Common Mistake: Turning `llms.txt` into a Sitemap Clone

This is probably the most common failure mode.

Teams publish llms.txt, but instead of curating it, they dump every docs URL into the file.

At that point, it stops being helpful.

You end up with:

a longer file
more ambiguity
weaker prioritization
no clear starting path

If the file is just a second inventory, agents still have to guess where to begin.

A good llms.txt should feel like a senior engineer narrowing the search space, not a crawler export.

The Other Common Mistake: Expecting `llms.txt` to Replace Search Infrastructure

The opposite mistake is also common.

People hear "AI discovery" and assume llms.txt should replace traditional discovery files.

It should not.

You still want:

sitemap.xml
canonical URLs
clean internal linking
crawlable pages
good metadata

llms.txt is additive. It does not make your search infrastructure irrelevant.

If your sitemap is broken, llms.txt does not save you. If your docs structure is weak, llms.txt only papers over part of the problem.

A Practical Default for Docs Teams

If you run a docs site in 2026, the default setup is pretty simple:

Keep `sitemap.xml` exhaustive

Let it do the boring job:

include every public docs page you want crawled
generate it automatically
update lastmod
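Most docs frameworks can generate the sitemap for you. If yours does not, the job is small. A sketch, assuming your build step can hand over a list of (URL, last-modified date) pairs; `build_sitemap` is a hypothetical helper, not an API from any particular static site generator.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, str]]) -> str:
    """Render (url, lastmod) pairs as sitemap XML; lastmod is YYYY-MM-DD."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in URLs on output.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    ET.indent(urlset)  # pretty-printing; requires Python 3.9+
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body + "\n"


# In a real build, this list would come from your page metadata.
sitemap = build_sitemap([("https://docs.example.com/authentication", "2026-05-30")])
print(sitemap)
```

Because the list comes straight from the build, nobody has to remember to add a page. That is the property to protect.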

Keep `llms.txt` short and curated

Let it do the editorial job:

quickstart
auth
errors
webhooks
SDKs
a few core concept pages if needed

Do not make them mirror each other

If both files contain the same long list of URLs, you probably are not getting much value from llms.txt.
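You can check for this in CI. The sketch below assumes you have already extracted the URL sets from both files; `compare_files` and the sample URLs other than the authentication page are hypothetical. What counts as "too much overlap" is a judgment call for your team, so the code reports a ratio rather than enforcing a threshold.

```python
def compare_files(llms_urls: set[str], sitemap_urls: set[str]) -> dict[str, object]:
    """Report how much of the sitemap llms.txt repeats, and which links it lacks."""
    shared = llms_urls & sitemap_urls
    ratio = len(shared) / len(sitemap_urls) if sitemap_urls else 0.0
    return {
        # Near 1.0 means llms.txt is close to a copy of the sitemap.
        "mirror_ratio": ratio,
        # llms.txt links the sitemap does not know about (typos, dead pages).
        "missing_from_sitemap": sorted(llms_urls - sitemap_urls),
    }


# Hypothetical URL sets for illustration.
sitemap_urls = {
    "https://docs.example.com/quickstart",
    "https://docs.example.com/authentication",
    "https://docs.example.com/reference/users",
    "https://docs.example.com/reference/orders",
}
llms_urls = {
    "https://docs.example.com/quickstart",
    "https://docs.example.com/authentication",
}

print(compare_files(llms_urls, sitemap_urls))
```

A low ratio with nothing missing from the sitemap is the healthy shape: llms.txt points at a small subset of pages that the sitemap already covers.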

What to Ship First

If you only have an hour, do this:

make sure your sitemap is present and healthy
create a small llms.txt with 10 to 20 important links
group those links by real tasks, not by internal nav labels

That gets you most of the benefit.

You can refine from there.
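If you would rather keep the curated list in code or config than hand-edit the file, a small renderer is enough. This is a sketch under assumptions: `render_llms_txt` is a made-up helper, the "Handle Failures" heading is an invented example of a task-based group, and the 20-link cap simply encodes the upper end of the range suggested above.

```python
def render_llms_txt(
    title: str,
    summary: str,
    sections: dict[str, list[tuple[str, str, str]]],
    max_links: int = 20,
) -> str:
    """Render task-grouped (name, url, note) links as llms.txt Markdown."""
    total = sum(len(links) for links in sections.values())
    if total > max_links:
        # Fail loudly instead of letting the file drift into a sitemap clone.
        raise ValueError(f"{total} links exceeds the curated limit of {max_links}")
    lines = [f"# {title}", "", summary, ""]
    for heading, links in sections.items():
        lines += [f"## {heading}", ""]
        lines += [f"- [{name}]({url}): {note}" for name, url, note in links]
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"


doc = render_llms_txt(
    "Acme API Docs",
    "Developer documentation for Acme's REST API.",
    {
        "Start Here": [
            ("Quickstart", "https://docs.acme.com/quickstart", "Make your first request"),
            ("Authentication", "https://docs.acme.com/authentication", "API keys and OAuth"),
        ],
        # Hypothetical task-based heading, not taken from a real docs site.
        "Handle Failures": [
            ("Errors", "https://docs.acme.com/errors", "Error codes and retry guidance"),
        ],
    },
)
print(doc)
```

The section names are still chosen by a person. The code only keeps the formatting consistent and the list short.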

What Neither File Solves

It is worth being explicit about the limit.

Neither sitemap.xml nor llms.txt solves:

bloated HTML payloads
weak docs IA
broken examples
unclear product boundaries
missing implementation guidance

And neither one tells the agent how to use your product well in the deeper sense.

That is why we keep treating llms.txt as a discovery layer, not the whole AI-readable docs story. If you want the systems argument for that, read "llms.txt Isn't Enough."

The Short Version

If you want one sentence:

sitemap.xml tells crawlers what exists. llms.txt tells agents what matters.

Most teams should ship both.

One is infrastructure.

The other is judgment.
