What is agent readability?
Agent readability is how easily an autonomous software client — an AI agent, an LLM-backed search engine, a coding assistant — can fetch your site, understand its structure, and extract the specific passage it needs. It is the machine-facing twin of accessibility: the same content delivered through a channel the consumer can actually parse.
In practice it spans three layers. Discovery — can the agent even find the right page (llms.txt, sitemap.xml, robots.txt, AGENTS.md)? Extraction — once on the page, can it parse the content cleanly (HTTP 200, no aggressive client-side rendering, semantic headings, JSON-LD, a markdown mirror)? Programmatic interaction — if the agent wants to call your tools or pay for access, are the right manifests at the right well-known paths (MCP, A2A, agents.json, x402)? The Vercel Agent Readability Spec is the most complete public specification across all three layers; Agent Ready is an independent validator that scores sites against it.
Why does agent readability matter?
Two things changed in 2025. First, AI Overviews became the default experience on Google and Bing — a synthesised answer renders above the link list, and Pew Research found that users click through to a source page about half as often when an AI summary is present. Second, AI assistants (ChatGPT, Claude, Perplexity, Gemini) became the place where many people now start a research session — the search box isn't a search box anymore, it's a prompt.
Both shifts move the prize away from click share and toward citation share. The page that gets paraphrased in the generated answer is the page that builds brand recognition, even when nobody clicks through. The page that gets cited as a source link gets the high-intent click that does happen.
Citation share is heavily structural. Princeton’s GEO study (KDD 2024) showed citation visibility lifting by up to 40% when pages were rewritten with cleaner structure, inline citations of their own, and quotable statistics — the same affordances that make a page agent-ready. Sites that ship the basics get cited more; sites that skip them get paraphrased without attribution.
What does an AI agent actually do when it visits your site?
A typical AI-agent fetch is not one HTTP request — it is a fan-out across several well-known paths. A modern crawler or answer engine, on first contact with a domain, will probe roughly the following in parallel:
- /robots.txt — am I allowed to crawl, and which user-agent strings does this site call out?
- /sitemap.xml — what does the site say its canonical URL list is?
- /llms.txt — is there a curated, token-efficient summary of the site for AI consumption?
- /AGENTS.md — is this a codebase or docs site with a skill file briefing coding agents?
- /.well-known/mcp.json, /.well-known/agent-card.json, /agents.json — agent-protocol manifests advertising tools, APIs, and capabilities.
Once a target page is fetched, the agent reads the HTML head (canonical, og:title, JSON-LD), evaluates whether the body has enough static text to extract, and may re-fetch the same URL with Accept: text/markdown to get a cleaner variant. If a passage is extracted, the agent records the source URL and either acts on it or attributes it in the generated answer.
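For concreteness, here is a minimal sketch of that first-contact fan-out, written in TypeScript for a runtime with the global fetch API (Node 18+ or a browser). The paths are the real conventions listed above; the timeout, the example.com origin, and the logging are illustrative assumptions, not part of any spec.

```ts
// Sketch: probe a domain's agent-discovery surface in parallel.
// The paths are real conventions; everything else here is illustrative.
const DISCOVERY_PATHS = [
  "/robots.txt",
  "/sitemap.xml",
  "/llms.txt",
  "/AGENTS.md",
  "/.well-known/mcp.json",
  "/.well-known/agent-card.json",
  "/agents.json",
];

async function probeDiscoverySurface(origin: string) {
  const results = await Promise.allSettled(
    DISCOVERY_PATHS.map(async (path) => {
      const res = await fetch(new URL(path, origin), {
        signal: AbortSignal.timeout(5_000), // arbitrary timeout for the sketch
      });
      return { path, status: res.status };
    }),
  );
  for (const r of results) {
    if (r.status === "fulfilled") console.log(r.value);
  }
}

await probeDiscoverySurface("https://example.com");

// After discovery, an agent may re-fetch a chosen page asking for the
// markdown variant, falling back to HTML if the site doesn't negotiate.
const page = await fetch("https://example.com/docs/setup", {
  headers: { accept: "text/markdown, text/html;q=0.8" },
});
console.log(page.headers.get("content-type"));
```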
How do AI agents discover your content?
Discovery is dominated by four files. None of them ship by default in any framework, and the absence of any one of them is the most common reason a site scores below ‘good’.
llms.txt
A markdown file at /llms.txt that gives AI agents a curated directory of the site. Defined by the llmstxt.org standard, it must start with an H1 title, a blockquote summary, and one or more H2-bounded link sections. The point is token efficiency: an agent fetches one ~50 KB file instead of crawling the full sitemap. Validate with the llms.txt checker.
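For reference, a minimal, hypothetical /llms.txt that satisfies those structural rules; every URL and description below is a placeholder:

```markdown
# Example Docs

> Documentation for Example, a hypothetical service. Start with the
> quickstart; the API reference covers every endpoint.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install, configure, first request
- [API reference](https://example.com/docs/api.md): endpoints, auth, rate limits

## Optional

- [Changelog](https://example.com/changelog.md): release history
```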
robots.txt
The same robots.txt that has served traditional crawlers for 25 years, now read by AI bots too. The risk is the inverse of the usual one — most sites' robots.txt accidentally blocks AI bots (GPTBot, ClaudeBot, Google-Extended, PerplexityBot, CCBot) because the defaults bundled with a CMS predate the AI user-agents. Audit your robots.txt against the current list. The file must also not block /llms.txt or /AGENTS.md.
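A hedged starting point for those allow rules follows; adapt it to your own crawl policy, since permitting training-data bots like GPTBot or CCBot is a policy decision, not a technical requirement:

```
# Explicitly allow the AI user-agents you want reading the site.
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: CCBot
Allow: /

# Keep the discovery files reachable under the default rules too.
User-agent: *
Allow: /llms.txt
Allow: /AGENTS.md

Sitemap: https://example.com/sitemap.xml
```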
sitemap.xml
Still the source of truth for canonical URL enumeration. AI crawlers use it to plan their fetches and to detect new pages between visits. The two checks that matter: valid XML against the sitemaps.org schema, and a meaningful <lastmod> on every entry. A sitemap whose every entry has today’s date because the build process auto-stamps it is treated as noise.
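A single well-formed entry looks like this; the point is that <lastmod> records the last content change, not the last deploy (URL and date are placeholders):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/docs/quickstart</loc>
    <!-- date of the last content edit, not the build timestamp -->
    <lastmod>2025-06-12</lastmod>
  </url>
</urlset>
```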
AGENTS.md
The skill-file convention for coding agents (Codex, Claude Code, Cursor, Aider). A short markdown brief at the repo root that tells the agent what the project does, what conventions to respect, and what not to touch. Codebases score zero on this check until they ship one. Validate with the AGENTS.md validator.
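There is no rigid schema; a short, hypothetical brief like the following covers what the convention asks for (the project, paths, and commands below are all placeholders):

```markdown
# AGENTS.md

Example is a TypeScript library for parsing widget manifests.

## Conventions
- Source lives in `src/`, tests in `tests/`; run them with `npm test`.
- Reuse the error types in `src/errors.ts` rather than adding new ones.

## Do not touch
- `src/generated/` is machine-generated; edit `schema/widget.json` instead.
```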
How do AI agents extract content from a single page?
Once the agent has chosen a URL, the page itself has to be parseable. The Vercel spec breaks this into three concerns: transport, metadata, and signal-to-noise.
Transport. The page must return HTTP 200 with a correct Content-Type (text/html; charset=utf-8 for HTML), no more than one redirect hop, and no x-robots-tag directive containing noai, noimageai, or noindex. Redirect chains kill agent crawl budgets fast; a 2-hop chain is treated as a failure.
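Those transport rules are easy to exercise yourself. A sketch, assuming a runtime with the WHATWG fetch API; the header names and thresholds come from the paragraph above, the rest is illustrative:

```ts
// Check status, content-type, redirect depth, and x-robots-tag for one URL.
async function checkTransport(url: string) {
  let hops = 0;
  let current = url;
  let res = await fetch(current, { redirect: "manual" });
  while (res.status >= 300 && res.status < 400 && hops < 5) {
    hops += 1;
    current = new URL(res.headers.get("location") ?? "", current).toString();
    res = await fetch(current, { redirect: "manual" });
  }
  const xRobots = res.headers.get("x-robots-tag") ?? "";
  return {
    status: res.status,                            // want 200
    contentType: res.headers.get("content-type"),  // want text/html; charset=utf-8
    redirectHops: hops,                            // want <= 1
    blockedByDirective: /noai|noimageai|noindex/i.test(xRobots),
  };
}

console.log(await checkTransport("https://example.com/"));
```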
Metadata. Every page needs a canonical link, an html lang attribute, an og:title, an og:description, a meta description longer than 50 characters, and at least one JSON-LD block with a recognised schema.org type. The JSON-LD is where AI engines lift the structured facts of a page — Article, FAQPage, HowTo, BreadcrumbList are the four types that earn the most citation lift.
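A minimal Article block of the kind described, with every value a placeholder:

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Article",
  "headline": "What is agent readability?",
  "description": "How AI agents fetch, parse, and cite a site.",
  "datePublished": "2025-06-12",
  "author": { "@type": "Organization", "name": "Example" }
}
</script>
```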
Signal-to-noise. The body must have a meaningful text-to-HTML ratio, at least three semantic section headings (h1–h3), and ideally a markdown mirror at /<slug>.md with content-negotiation routing so an agent sending Accept: text/markdown gets the clean variant. Heavy SPA shells whose initial HTML is empty until hydration are scored as JS-rendering-dependent; several dependent checks become unreliable and the score suffers across the board.
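One way to wire that content negotiation, sketched under the assumption of a Next.js deployment; other stacks would do the same rewrite at their server or edge layer, and the matcher and path convention here are assumptions to adapt:

```ts
// middleware.ts: rewrite markdown-preferring requests to the /<slug>.md mirror.
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  const accept = request.headers.get("accept") ?? "";
  const { pathname } = request.nextUrl;

  // Only rewrite when the client explicitly asks for markdown
  // and the path isn't already a markdown file.
  if (accept.includes("text/markdown") && !pathname.endsWith(".md")) {
    const url = request.nextUrl.clone();
    url.pathname = `${pathname === "/" ? "/index" : pathname}.md`;
    return NextResponse.rewrite(url);
  }
  return NextResponse.next();
}

export const config = {
  matcher: ["/((?!_next|api|.*\\..*).*)"], // skip internals, APIs, static files
};
```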
What about MCP, A2A, agents.json, and the other protocol manifests?
The protocol layer is conditional. None of these files are required for a content site — a marketing page scored without protocol checks won’t be penalised. They become required only when your site actively participates in the relevant protocol.
MCP (Model Context Protocol). Defines how AI clients (Claude Desktop, Cursor, Cline) discover and call remote tools and resources. A site that hosts an MCP server publishes a card at /.well-known/mcp.json advertising its transport, capabilities, and OAuth metadata. SEP-1649 standardises the required fields. Validate with the MCP Server Card validator.
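Purely as an illustration of where the file sits and roughly what it carries; the authoritative required-field list is SEP-1649's, so validate against that rather than copying this hypothetical shape:

```jsonc
// /.well-known/mcp.json: hypothetical sketch, not the SEP-1649 field list
{
  "name": "example-mcp-server",           // placeholder server name
  "version": "1.0.0",
  "endpoint": "https://example.com/mcp",  // where the MCP transport listens
  "capabilities": { "tools": true, "resources": true }
}
```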
A2A (Agent-to-Agent). Discovery for agents that talk to other agents. The manifest lives at /.well-known/agent-card.json and declares capabilities, skills, and authentication. Validate with the A2A Agent Card validator.
agents.json. An OpenAPI-anchored extension that lets API providers expose their endpoints to agent runtimes with workflows, sequencing, and auth pre-described. Sits at /agents.json or /.well-known/agents.json. Validate with the agents.json validator.
agent-permissions, UCP, x402. Three newer manifests covering, respectively: declared agent permission scopes, Universal Commerce Protocol for agent-driven shopping, and the HTTP 402 Payment Required handshake for agent micropayments. Each has a dedicated validator under /agent-permissions-validator, /ucp-validator, and /x402-checker.
How does agent readability differ from traditional SEO?
The fundamentals overlap. HTTPS, canonical URLs, valid sitemaps, JSON-LD, fast TTFB, clean semantic markup — every page that scores well on a Lighthouse SEO audit picks up most of what an agent needs. The divergence starts at the next layer.
| Concern | Traditional SEO | Agent readability |
|---|---|---|
| Consumer | Human reader after a click | Autonomous client extracting a passage |
| Goal metric | Click-through rate, rank position | Citation share, paraphrase fidelity |
| Discovery files | robots.txt, sitemap.xml | + llms.txt, AGENTS.md, /.well-known/* |
| Page checks | Title, meta, headings, schema | + Markdown mirror, content negotiation, JS-render guard |
| Robots policy | Block scrapers, allow Googlebot | Allow GPTBot, ClaudeBot, Google-Extended, PerplexityBot |
| Programmatic surface | N/A | MCP, A2A, agents.json, x402 |
An agent-ready site is almost always SEO-good as a byproduct. The reverse isn’t true: plenty of SEO-tuned sites score poorly on agent readability because nothing in the traditional checklist mentioned llms.txt or content negotiation.
Which validator should you run first?
For a fast triage, run the full agent-readability score — it executes all 59 checks against your URL and surfaces the top failures with fix guidance. Then, for targeted re-checks after a fix, the per-spec validators are faster:
| Validator | What it checks |
|---|---|
| llms.txt validator | 10 checks against the llmstxt.org spec |
| AGENTS.md validator | Required sections, structure, and skill-file conventions |
| MCP Server Card validator | /.well-known/mcp.json against SEP-1649 required fields |
| A2A Agent Card validator | /.well-known/agent-card.json against the A2A spec |
| agents.json validator | agents.json manifest at /agents.json or /.well-known/agents.json |
| agent-permissions.json validator | Permission-scope declarations at /.well-known/agent-permissions.json |
| UCP validator | Universal Commerce Protocol profile at /.well-known/ucp |
| x402 checker | HTTP 402 Payment Required handshake for agent payments |
| Full agent-readability score | All 59 checks combined into a 0–100 score with a fix list |
What should you fix first?
Sequence the work by leverage. Each step assumes the previous one is done.
- Audit robots.txt. Confirm GPTBot, ClaudeBot, Google-Extended, CCBot, PerplexityBot are allowed and that /llms.txt is not blocked. Five-minute job; biggest blocker if it's wrong.
- Ship /llms.txt. Hand-author a 30-line file with H1, blockquote summary, and 2–3 H2 link sections. Validate with the llms.txt checker. Reference: how to add llms.txt to a Next.js site.
- Page metadata sweep. Every page needs canonical, og:title, og:description, html lang, and one JSON-LD block. Fixable in a single PR for most sites.
- AGENTS.md for codebases. If the site is a docs hub for a library, framework, or product, ship an AGENTS.md at the repo root. Reference: how to write an effective AGENTS.md.
- Markdown mirrors + content negotiation. For high-value pages, ship a /<slug>.md variant and rewrite requests carrying Accept: text/markdown. This is the largest single lift you can ship for AI-extraction fidelity.
- Protocol manifests, if applicable. If you host an MCP server, ship /.well-known/mcp.json. If your APIs are agent-callable, ship /agents.json. Otherwise skip — protocol checks drop rather than fail when the relevant endpoint is absent.
How is agent-readability scored?
Agent Ready maps each check to pass, warn, or fail, then produces a 0–100 score and one of four rating bands: excellent (90–100), good (70–89), fair (50–69), needs improvement (0–49). The full scoring logic, including how llms.txt sub-scores are weighted and how JS-render dependencies are handled, lives in the methodology doc.
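For quick reference, the band boundaries in code form; the per-check weighting is deliberately not reproduced here, since the methodology doc owns it:

```ts
type Band = "excellent" | "good" | "fair" | "needs improvement";

// Map a 0-100 score to its rating band.
function band(score: number): Band {
  if (score >= 90) return "excellent";
  if (score >= 70) return "good";
  if (score >= 50) return "fair";
  return "needs improvement";
}
```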
Frequently asked questions
- Is agent readability the same as SEO?
- No. Traditional SEO optimises for human readers reaching your page via a search-results click. Agent readability optimises for autonomous software that fetches your page, extracts a passage, and either acts on it or cites it in a generated answer. They overlap on the basics (HTTPS, canonical URLs, structured data) but diverge sharply on the next layer: SEO doesn't care about /llms.txt, /.well-known/mcp.json, or content-negotiated markdown mirrors. An agent-ready site is usually SEO-good as a byproduct, but the reverse isn't true.
- Do I need every one of these files — llms.txt, AGENTS.md, MCP cards, agents.json?
- No. Ship the ones that match what your site actually does. Every site benefits from llms.txt and a sitemap. Codebases and documentation sites should add AGENTS.md so coding agents have a brief. Any site that exposes an MCP server publishes /.well-known/mcp.json. Sites that ship A2A-compatible agents publish /.well-known/agent-card.json. The protocol manifests are conditional — Agent Ready drops the check rather than failing it when the relevant endpoint is absent, so a marketing site doesn't score itself against agent protocols it has no reason to ship.
- How long does it take to make a site agent-ready?
- An hour for the foundations on most sites: add /llms.txt, verify robots.txt isn't blocking AI bots, confirm every page has a canonical URL, og:title, og:description, and JSON-LD. A second pass — markdown mirrors, AGENTS.md, content negotiation — typically takes a day. Protocol manifests (MCP, A2A, agents.json) only make sense if you actually run that protocol, and the work is measured in hours of config rather than days of engineering.
- Will agent readability lift my traffic in 2026?
- Probably not directly — AI Overviews reduce click-through, they don't increase it. The play is citation share, not click share. Princeton's GEO study (KDD 2024) shows that structurally clean, citation-rich pages get cited disproportionately in generated answers, and citations are the surface where users decide which brand to trust. The traffic that does click through is higher-intent because the user has already read the synthesised answer.
- How do I check what bots are crawling my site?
- Look at your access logs for the AI user-agents: GPTBot (OpenAI), ClaudeBot and Claude-User (Anthropic), CCBot (Common Crawl), Google-Extended (Google AI training), PerplexityBot, Amazonbot, and Bytespider. If you see frequent 404s on /llms.txt, /AGENTS.md, /sitemap.xml, or /.well-known/* paths, agents are probing for files you haven't shipped — that's the prioritised fix list.