---
title: The complete guide to agent readability
description: How AI agents discover, parse, and cite your site — and how to make sure they can. Covers llms.txt, AGENTS.md, MCP cards, JSON-LD, and the Vercel Agent Readability Spec.
last_updated: 2026-05-12
canonical_url: https://agent-ready.dev/complete-guide-to-agent-readability
---

# The complete guide to agent readability

> How AI agents discover, parse, and cite your site — and how to make sure they can.

## What is agent readability?

Agent readability is how easily an autonomous software client — an AI agent, an LLM-backed search engine, a coding assistant — can fetch your site, understand its structure, and extract the specific passage it needs. It is the machine-facing twin of accessibility: the same content delivered through a channel the consumer can actually parse.

In practice it spans three layers. **Discovery** — can the agent even find the right page (llms.txt, sitemap.xml, robots.txt, AGENTS.md). **Extraction** — once on the page, can it parse the content cleanly (HTTP 200, no aggressive client-side rendering, semantic headings, JSON-LD, a markdown mirror). **Programmatic interaction** — if the agent wants to call your tools or pay for access, are the right manifests at the right well-known paths (MCP, A2A, agents.json, x402). The [Vercel Agent Readability Spec](https://vercel.com/kb/guide/agent-readability-spec) is the most complete public specification across all three layers; Agent Ready is an independent validator that scores sites against it.

## Why does agent readability matter?

Two things changed in 2025. First, AI Overviews became the default experience on Google and Bing — a synthesised answer renders above the link list, and [Pew Research found](https://www.pewresearch.org/short-reads/2025/07/22/google-users-rarely-click-on-links-when-they-see-an-ai-generated-summary/) that users click through to a source page about half as often when an AI summary is present. Second, AI assistants (ChatGPT, Claude, Perplexity, Gemini) became the place where many people start a research session — the search box isn't a search box anymore; it's a prompt.

Both shifts move the prize away from *click share* and toward *citation share*. The page that gets paraphrased in the generated answer is the page that builds brand recognition, even when nobody clicks through. The page that gets cited as a source link gets the high-intent click that does happen.

Citation share is heavily structural. [Princeton's GEO study](https://arxiv.org/abs/2311.09735) (KDD 2024) showed citation visibility lifting by up to 40% when pages were rewritten with cleaner structure, inline citations of their own, and quotable statistics — the same affordances that make a page agent-ready.

## What does an AI agent actually do when it visits your site?

A typical AI-agent fetch is not one HTTP request — it is a fan-out across several well-known paths. A modern crawler or answer engine, on first contact with a domain, will probe roughly the following in parallel:

- `/robots.txt` — am I allowed to crawl, and which user-agent strings does this site call out?
- `/sitemap.xml` — what does the site say its canonical URL list is?
- `/llms.txt` — is there a curated, token-efficient summary of the site for AI consumption?
- `/AGENTS.md` — is this a codebase or docs site with a skill file briefing coding agents?
- `/.well-known/mcp.json`, `/.well-known/agent-card.json`, `/agents.json` — agent-protocol manifests advertising tools, APIs, and capabilities.
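The fan-out above can be sketched as a small helper. This is an illustration of the probe set, not any crawler's real API — the exact path list varies by client, and the function name is hypothetical:

```python
from urllib.parse import urljoin

# The well-known paths a first-contact crawl typically probes in parallel.
WELL_KNOWN_PATHS = [
    "/robots.txt",
    "/sitemap.xml",
    "/llms.txt",
    "/AGENTS.md",
    "/.well-known/mcp.json",
    "/.well-known/agent-card.json",
    "/agents.json",
]

def discovery_probe_urls(origin: str) -> list[str]:
    """Build the URL list an agent would fan out across on first contact."""
    return [urljoin(origin, path) for path in WELL_KNOWN_PATHS]
```

A site scores well on discovery when every one of these probes returns either a useful file or a clean 404 — never a soft redirect to a rendered error page.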

Once a target page is fetched, the agent reads the HTML head (canonical, og:title, JSON-LD), evaluates whether the body has enough static text to extract, and may re-fetch the same URL with `Accept: text/markdown` to get a cleaner variant.

## How do AI agents discover your content?

Four files dominate discovery. None ship by default in any framework, and the absence of any one of them is the most common reason a site scores below 'good'.

**llms.txt** — a markdown file at `/llms.txt` that gives AI agents a curated directory of the site. Defined by [llmstxt.org](https://llmstxt.org). Validate with the [llms.txt checker](https://agent-ready.dev/llms-txt-checker).
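A minimal llms.txt following the llmstxt.org shape — H1, blockquote summary, H2 link sections — might look like this for a hypothetical example.com (all URLs and descriptions are placeholders):

```markdown
# Example Site

> One-sentence summary of what the site covers, written for AI consumption.

## Docs

- [Getting started](https://example.com/docs/getting-started.md): install and first run
- [API reference](https://example.com/docs/api.md): endpoints and authentication

## Optional

- [Changelog](https://example.com/changelog.md): release history
```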

**robots.txt** — same file, now read by AI bots too. The risk is the inverse of the usual one: many sites' robots.txt files accidentally block the AI bots (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) they would benefit from admitting. The file must also not block `/llms.txt` or `/AGENTS.md`.
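One way the allow-list might look (example.com is a placeholder; add or remove user-agents to match your policy):

```txt
# Allow the major AI crawlers explicitly
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

Sitemap: https://example.com/sitemap.xml
```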

**sitemap.xml** — still the source of truth for canonical URL enumeration. Two checks matter: valid XML against the sitemaps.org schema, and a meaningful `<lastmod>` on every entry.
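A single entry passing both checks, per the sitemaps.org schema (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/complete-guide</loc>
    <lastmod>2026-05-12</lastmod>
  </url>
</urlset>
```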

**AGENTS.md** — the skill-file convention for coding agents (Codex, Claude Code, Cursor, Aider). A short markdown brief at the repo root. Validate with the [AGENTS.md validator](https://agent-ready.dev/agents-md-validator).
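The convention is free-form markdown, so the sections below are illustrative rather than mandated — commands and conventions are placeholders for whatever your repo actually uses:

```markdown
# AGENTS.md

## Setup
- `pnpm install` to install dependencies

## Build and test
- `pnpm build` compiles the site
- `pnpm test` runs the unit suite; run it before every commit

## Conventions
- TypeScript strict mode; no default exports
- Content lives in `content/`, one markdown file per page
```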

## How do AI agents extract content from a single page?

Once the agent has chosen a URL, the page itself has to be parseable. The Vercel spec breaks this into three concerns:

**Transport** — HTTP 200, correct `Content-Type`, no more than one redirect hop, no `x-robots-tag` with `noai`/`noimageai`/`noindex`. Redirect chains kill agent crawl budgets fast.

**Metadata** — canonical link, html lang, og:title, og:description, meta description > 50 chars, and at least one JSON-LD block. `Article`, `FAQPage`, `HowTo`, and `BreadcrumbList` are the four schema.org types that earn the most citation lift.
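A minimal `Article` block for a page like this one might look as follows (the author value is a placeholder — use your real publisher entity):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "The complete guide to agent readability",
  "datePublished": "2026-05-12",
  "author": { "@type": "Organization", "name": "Agent Ready" }
}
</script>
```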

**Signal-to-noise** — meaningful text-to-HTML ratio, ≥ 3 semantic section headings, and ideally a markdown mirror at `/<slug>.md` with content-negotiation routing for clients sending `Accept: text/markdown`. Heavy SPA shells whose initial HTML is empty until hydration score poorly across the board.
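The content-negotiation routing reduces to one decision: does the `Accept` header ask for `text/markdown`, and if so, where is the mirror? A framework-agnostic sketch — in a real deployment this would be a middleware rewrite rule, and the `/<slug>.md` naming follows the convention described above:

```python
def markdown_variant(path: str, accept: str) -> str:
    """Return the markdown-mirror path when the client asks for text/markdown.

    Sketch only: strips quality parameters (";q=...") from each Accept
    entry before matching, and leaves paths already ending in .md alone.
    """
    wants_markdown = "text/markdown" in [
        part.split(";")[0].strip() for part in accept.split(",")
    ]
    if wants_markdown and not path.endswith(".md"):
        return path.rstrip("/") + ".md"
    return path
```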

## What about MCP, A2A, agents.json, and the other protocol manifests?

The protocol layer is conditional. None of these files is required for a content site — a marketing page is scored without the protocol checks and isn't penalised for lacking them. They become required only when your site actively participates in the relevant protocol.

- **[MCP](https://modelcontextprotocol.io)** (Model Context Protocol) — discovery for AI clients calling remote tools and resources. Card at `/.well-known/mcp.json` per SEP-1649. Validate with [MCP Server Card validator](https://agent-ready.dev/mcp-card-validator).
- **[A2A](https://a2a-protocol.org)** (Agent-to-Agent) — manifest at `/.well-known/agent-card.json`. Validate with [A2A Agent Card validator](https://agent-ready.dev/agent-card-validator).
- **[agents.json](https://github.com/wild-card-ai/agents-json)** — OpenAPI-anchored manifest at `/agents.json` or `/.well-known/agents.json`. Validate with [agents.json validator](https://agent-ready.dev/agents-json-validator).
- **agent-permissions, UCP, x402** — newer manifests for declared permission scopes, agent-driven commerce, and HTTP 402 micropayments. Validators at [/agent-permissions-validator](https://agent-ready.dev/agent-permissions-validator), [/ucp-validator](https://agent-ready.dev/ucp-validator), and [/x402-checker](https://agent-ready.dev/x402-checker).

## How does agent readability differ from traditional SEO?

The fundamentals overlap. HTTPS, canonical URLs, valid sitemaps, JSON-LD, fast TTFB, clean semantic markup — every page that scores well on a Lighthouse SEO audit picks up most of what an agent needs. The divergence starts at the next layer.

| Concern | Traditional SEO | Agent readability |
|---|---|---|
| Consumer | Human reader after a click | Autonomous client extracting a passage |
| Goal metric | Click-through rate, rank position | Citation share, paraphrase fidelity |
| Discovery files | robots.txt, sitemap.xml | + llms.txt, AGENTS.md, /.well-known/* |
| Page checks | Title, meta, headings, schema | + Markdown mirror, content negotiation, JS-render guard |
| Robots policy | Block scrapers, allow Googlebot | Allow GPTBot, ClaudeBot, Google-Extended, Perplexity |
| Programmatic surface | N/A | MCP, A2A, agents.json, x402 |

An agent-ready site is almost always SEO-good as a byproduct. The reverse isn't true.

## Which validator should you run first?

For a fast triage, run the full [agent-readability score](https://agent-ready.dev/agent-readability-score) — all 59 checks with fix guidance. For targeted re-checks, the per-spec validators are faster:

| Validator | What it checks |
|---|---|
| [llms.txt validator](https://agent-ready.dev/llms-txt-checker) | 10 checks against the llmstxt.org spec |
| [AGENTS.md validator](https://agent-ready.dev/agents-md-validator) | Required sections, structure, and skill-file conventions |
| [MCP Server Card validator](https://agent-ready.dev/mcp-card-validator) | /.well-known/mcp.json against SEP-1649 required fields |
| [A2A Agent Card validator](https://agent-ready.dev/agent-card-validator) | /.well-known/agent-card.json against the A2A spec |
| [agents.json validator](https://agent-ready.dev/agents-json-validator) | Wildcard agents.json manifest at /agents.json |
| [agent-permissions.json validator](https://agent-ready.dev/agent-permissions-validator) | Permission-scope declarations at /.well-known/agent-permissions.json |
| [UCP validator](https://agent-ready.dev/ucp-validator) | Universal Commerce Protocol profile at /.well-known/ucp |
| [x402 checker](https://agent-ready.dev/x402-checker) | HTTP 402 Payment Required handshake for agent payments |
| [Full agent-readability score](https://agent-ready.dev/agent-readability-score) | All 59 checks combined into a 0–100 score with a fix list |

## What should you fix first?

Sequence the work by leverage. Each step assumes the previous one is done.

1. **Audit robots.txt.** Confirm GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot are allowed and that `/llms.txt` is not blocked.
2. **Ship /llms.txt.** Hand-author a 30-line file with H1, blockquote summary, and 2–3 H2 link sections. Reference: [how to add llms.txt to a Next.js site](https://agent-ready.dev/how-to-add-llms-txt-to-nextjs).
3. **Page metadata sweep.** Every page needs canonical, og:title, og:description, html lang, and one JSON-LD block.
4. **AGENTS.md for codebases.** Reference: [how to write an effective AGENTS.md](https://agent-ready.dev/how-to-write-an-effective-agents-md).
5. **Markdown mirrors + content negotiation.** For high-value pages, ship a `/<slug>.md` variant and rewrite requests carrying `Accept: text/markdown`. Largest single lift for AI-extraction fidelity.
6. **Protocol manifests, if applicable.** If you host an MCP server, ship `/.well-known/mcp.json`. If your APIs are agent-callable, ship `/agents.json`. Otherwise skip — protocol checks drop rather than fail when the relevant endpoint is absent.

## How is agent-readability scored?

Agent Ready maps each check to `pass`, `warn`, or `fail`, then produces a 0–100 score and one of four rating bands: excellent (90–100), good (70–89), fair (50–69), needs improvement (0–49). Full scoring logic, llms.txt sub-score weighting, and JS-render handling are documented in the [methodology](https://agent-ready.dev/methodology) doc.
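The band mapping itself is simple enough to state as code — a sketch of the thresholds given above, not Agent Ready's actual implementation, whose per-check weighting lives in the methodology doc:

```python
def rating_band(score: int) -> str:
    """Map a 0-100 agent-readability score to its rating band."""
    if score >= 90:
        return "excellent"
    if score >= 70:
        return "good"
    if score >= 50:
        return "fair"
    return "needs improvement"
```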

## Frequently asked questions

### Is agent readability the same as SEO?

No. Traditional SEO optimises for human readers reaching your page via a search-results click. Agent readability optimises for autonomous software that fetches your page, extracts a passage, and either acts on it or cites it in a generated answer. They overlap on the basics (HTTPS, canonical URLs, structured data) but diverge sharply on the next layer: SEO doesn't care about /llms.txt, /.well-known/mcp.json, or content-negotiated markdown mirrors. An agent-ready site is usually SEO-good as a byproduct, but the reverse isn't true.

### Do I need every one of these files — llms.txt, AGENTS.md, MCP cards, agents.json?

No. Ship the ones that match what your site actually does. Every site benefits from llms.txt and a sitemap. Codebases and documentation sites should add AGENTS.md so coding agents have a brief. Any site that exposes an MCP server publishes /.well-known/mcp.json. Sites that ship A2A-compatible agents publish /.well-known/agent-card.json. The protocol manifests are conditional — Agent Ready drops the check rather than failing it when the relevant endpoint is absent, so a marketing site doesn't score itself against agent protocols it has no reason to ship.

### How long does it take to make a site agent-ready?

An hour for the foundations on most sites: add /llms.txt, verify robots.txt isn't blocking AI bots, confirm every page has a canonical URL, og:title, og:description, and JSON-LD. A second pass — markdown mirrors, AGENTS.md, content negotiation — typically takes a day. Protocol manifests (MCP, A2A, agents.json) only make sense if you actually run that protocol, and the work is measured in hours of config rather than days of engineering.

### Will agent readability lift my traffic in 2026?

Probably not directly — AI Overviews reduce click-through, they don't increase it. The play is citation share, not click share. Princeton's GEO study (KDD 2024) shows that structurally clean, citation-rich pages get cited disproportionately in generated answers, and citations are the surface where users decide which brand to trust. The traffic that does click through is higher-intent because the user has already read the synthesised answer.

### How do I check what bots are crawling my site?

Look at your access logs for the AI user-agents: GPTBot (OpenAI), ClaudeBot and Claude-User (Anthropic), CCBot (Common Crawl), Google-Extended (Google AI training), PerplexityBot, Amazonbot, and Bytespider. If you see frequent 404s on /llms.txt, /AGENTS.md, /sitemap.xml, or /.well-known/* paths, agents are probing for files you haven't shipped — that's the prioritised fix list.
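Filtering the logs can be as simple as matching the user-agent substrings above. A sketch assuming a combined-log-style format where the user-agent string appears verbatim in each line:

```python
AI_USER_AGENTS = (
    "GPTBot", "ClaudeBot", "Claude-User", "CCBot",
    "Google-Extended", "PerplexityBot", "Amazonbot", "Bytespider",
)

def ai_bot_hits(log_lines: list[str]) -> list[str]:
    """Keep only the access-log lines produced by known AI crawlers."""
    return [
        line for line in log_lines
        if any(bot in line for bot in AI_USER_AGENTS)
    ]
```

Tallying these lines by requested path gives you the 404 hot spots — the probes for files you haven't shipped yet.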

---

Read the full guide on the web: <https://agent-ready.dev/complete-guide-to-agent-readability>

Scan your site: <https://agent-ready.dev>

## Sitemap

See the full [sitemap](https://agent-ready.dev/sitemap.md) for all pages on agent-ready.dev.
