Where does this data come from?

Every public scan run on agent-ready.dev is stored, and this page aggregates them. To avoid letting a single heavily-scanned site skew the numbers, we take one scan per domain - the most recent completed scan - and compute statistics across those distinct domains. No individual domain is named anywhere on this page; the report is deliberately aggregate and anonymous.

Is this a representative sample of the whole web?

No. It's a sample of sites that someone chose to scan with Agent Ready, which skews toward sites whose owners already care about AI agents - so adoption rates here are almost certainly higher than the web at large. Read the numbers as 'among sites being actively checked for agent readiness', not as a census. The sample size and date range are shown at the top of the page so you can judge the weight of each figure.

How is the agent-readability score calculated?

Each scan runs dozens of checks across four layers - site-level discovery files, per-page extraction signals, the llms.txt standard, and agent-protocol manifests - and the Vercel score is the share of applicable checks that pass, expressed 0–100. A separate accessibility sub-score (WCAG 2.2 and layout stability) is also reported per scan but is graded on its own and isn't folded into the Vercel score. The full scoring formula, rating bands, and per-check weighting are documented on the methodology page.
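To make "share of applicable checks that pass" concrete, here is a minimal sketch. It assumes an unweighted share and a hypothetical result format (True for pass, False for fail, None for not applicable); the methodology page documents per-check weighting, so the real formula may well differ.

```python
from typing import Optional


def agent_readability_score(results: dict[str, Optional[bool]]) -> Optional[int]:
    """Return the share of applicable checks that pass, on a 0-100 scale.

    `results` maps a check id to True (pass), False (fail), or None
    (not applicable). This format is an assumption for illustration.
    """
    applicable = [passed for passed in results.values() if passed is not None]
    if not applicable:
        return None  # nothing applicable, so no score can be computed
    return round(100 * sum(applicable) / len(applicable))


# Hypothetical scan: four applicable checks, one conditional check skipped.
example = {"S10": False, "S12": True, "L4": True, "L9": False, "MCP": None}
print(agent_readability_score(example))  # 2 of 4 applicable checks pass -> 50
```

Leaving non-applicable checks out of the denominator is what keeps a site from being penalised for a protocol it never advertised.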

How often do these numbers change?

The page recomputes from the live corpus on a daily cycle, so the figures drift as more sites are scanned and as scanned sites improve. The 'as of' date at the top reflects the latest scan included. For a fixed point-in-time reference, cite the date alongside the statistic.

Which checks count as the 'most common failures'?

We rank the site-level and llms.txt checks that run on essentially every scan by how often they fail, and exclude any check that ran on too small a slice of the corpus to be meaningful. Conditional agent-protocol checks (MCP, A2A, x402, and the rest) aren't in the failure ranking - a site isn't 'failing' a protocol it never claimed to support - so those appear in the adoption section instead.

State of Agent Readability

How agent-ready is the average site?

Across 786 distinct sites, the mean agent-readability score is 49 out of 100 and the median is 50. The score is the share of applicable checks a site passes; see the methodology for the formula and rating bands.

Rating band	Sites	Share
Excellent (90–100)	12	2%
Good (70–89)	45	6%
Fair (50–69)	360	46%
Needs improvement (0–49)	369	47%
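The band boundaries in the table translate directly into a lookup. The sketch below uses those boundaries as given; the function name and the sample scores are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from statistics import mean, median

# Lower bound of each band, highest first, as listed in the table above.
BANDS = [(90, "Excellent"), (70, "Good"), (50, "Fair"), (0, "Needs improvement")]


def rating_band(score: int) -> str:
    """Map a 0-100 score to its rating band."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name in BANDS:
        if score >= floor:
            return name
    raise AssertionError("unreachable: the last band starts at 0")


# Hypothetical scores, one per distinct site.
scores = [95, 72, 55, 50, 49, 12]
counts = Counter(rating_band(s) for s in scores)
print(counts)
print(mean(scores), median(scores))
```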

How many sites publish an llms.txt or AGENTS.md?

The discovery files are the cheapest agent-readiness win and the most-skipped. In this corpus, 46% of sites publish a valid llms.txt, 29% ship the llms-full.txt companion, and 28% publish an AGENTS.md. The mean llms.txt sub-score is 37/100.

Discovery file	Sites	Adoption
Publish a valid llms.txt	363	46%
Publish llms-full.txt	229	29%
Publish AGENTS.md	217	28%
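As a rough illustration of what a "valid" llms.txt involves, the sketch below tests four structural features from the llms.txt proposal: an H1 title, a blockquote summary, H2 sections, and list items written as markdown links. The function name, the regular expression, and the result keys are assumptions for illustration; the scanner's actual rules are on the methodology and specs pages.

```python
import re

# A file-list item: "- [title](url)" optionally followed by ": notes".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> dict[str, bool]:
    """Run simplified structural checks on the text of an llms.txt file."""
    nonblank = [ln.rstrip() for ln in text.splitlines() if ln.strip()]
    list_items = [ln for ln in nonblank if ln.startswith("- ")]
    return {
        "h1_title": bool(nonblank) and nonblank[0].startswith("# "),
        "blockquote_summary": any(ln.startswith("> ") for ln in nonblank),
        "h2_sections": any(ln.startswith("## ") for ln in nonblank),
        "link_format": bool(list_items)
        and all(LINK_ITEM.match(ln) is not None for ln in list_items),
    }


sample = """\
# Example Site

> One-sentence summary of what the site offers.

## Docs

- [Quickstart](https://example.com/quickstart.md): Install and first run
- [API reference](https://example.com/api.md)
"""
print(check_llms_txt(sample))
```

A file that passes all four would still need to be served with a text content type, which this sketch does not test because it involves a network request.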

What are the most common agent-readability failures?

These are the site-level and llms.txt checks that fail most often, ranked by fail rate among the sites the check actually ran on. They’re the highest-leverage fixes for a typical site.

ID	Check	Fail rate	Sites checked
S13	AGENTS.md has required sections	95%	786
S11	sitemap.md has headings + links	94%	786
S10	sitemap.md exists	77%	786
L4	Blockquote summary	74%	786
S12	AGENTS.md exists	72%	786
L5	H2 file-list sections	71%	786
L6	Link format correct	71%	786
L10	llms-full.txt available	71%	786
S2	llms.txt Content-Type	69%	786
L9	Content-Type: text/plain	69%	786
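The ranking rule (fail rate among the sites a check actually ran on, with thinly covered checks dropped) can be sketched as follows. The `CheckTally` type, the `min_coverage` threshold of 0.5, and the sample numbers are all hypothetical; the page does not state the cutoff it actually uses.

```python
from dataclasses import dataclass


@dataclass
class CheckTally:
    check_id: str
    ran: int     # sites on which the check actually ran
    failed: int  # of those sites, how many failed it


def rank_failures(
    tallies: list[CheckTally], corpus_size: int, min_coverage: float = 0.5
) -> list[tuple[str, float]]:
    """Rank checks by fail rate, skipping checks that ran on too few sites."""
    eligible = [
        t for t in tallies if t.ran > 0 and t.ran / corpus_size >= min_coverage
    ]
    ranked = [(t.check_id, t.failed / t.ran) for t in eligible]
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)


# Hypothetical tallies for a corpus of 100 sites.
tallies = [
    CheckTally("B", ran=100, failed=40),
    CheckTally("A", ran=100, failed=90),
    CheckTally("C", ran=5, failed=5),  # ran on only 5% of sites: excluded
]
print(rank_failures(tallies, corpus_size=100))  # [('A', 0.9), ('B', 0.4)]
```

Dividing by `ran` rather than by the corpus size is what makes the rate "among the sites the check actually ran on".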

Which agent protocols are sites actually adopting?

Agent-protocol manifests are conditional - a site only “has” one if it advertises the matching well-known endpoint. Adoption is still early, and the counts are small, so these figures lead with the raw number of sites.

Protocol surface	Sites	Share
A2A agent card	33	4%
API catalog	32	4%
MCP server card	31	4%
Agent Skills discovery	18	2%
Web Bot Auth	11	1%
agent-permissions.json	10	1%
UCP profile	10	1%
agents.json	8	1%
NLWeb endpoint	5	1%
x402 payments †	2	<1%

† Fewer than 5 sites — indicative only, not a robust rate.
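Because the counts are small, each row is best read as a raw count first and a share second. Below is a small sketch of how such a row might be derived, assuming a hypothetical `adoption_row` helper. The threshold of 5 sites comes from the footnote; showing "<1%" instead of a rounded 0% for non-zero counts is a presentation choice of this sketch.

```python
def adoption_row(
    name: str, sites: int, corpus_size: int, min_sites: int = 5
) -> tuple[str, int, str, bool]:
    """Return (name, sites, share label, indicative_only) for one protocol."""
    label = f"{sites / corpus_size:.0%}"
    if sites > 0 and label == "0%":
        label = "<1%"  # avoid implying that nobody has adopted the protocol
    return name, sites, label, sites < min_sites


# Counts taken from the table above, with a corpus of 786 sites.
for name, sites in [("A2A agent card", 33), ("NLWeb endpoint", 5), ("x402 payments", 2)]:
    print(adoption_row(name, sites, corpus_size=786))
```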

How many sites block AI crawlers?

A robots.txt that disallows an AI user-agent asks that crawler to stay away, which can keep the site out of the AI search and assistant products that crawler feeds. 10% of sites in this corpus block at least one major AI crawler. The most-blocked crawlers:

AI crawler	Sites blocking it	Share
PerplexityBot	51	6%
YouBot	47	6%
GPTBot	38	5%
Google-Extended	36	5%
ClaudeBot	34	4%
Applebot-Extended	32	4%
Meta-WebIndexer	26	3%
DuckAssistBot	26	3%
OAI-SearchBot	25	3%
Amzn-SearchBot	25	3%

How was this measured?

Every figure on this page is computed from 786 distinct sites that were scanned on agent-ready.dev between 2026-04-18 and 2026-07-27. To stop a single heavily-scanned site from skewing the corpus, we keep only the most recent completed scan per domain, and we exclude sites operated by Agent Ready so our own pages don’t flatter the numbers. No individual site is named - the report is aggregate and anonymous by design.
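The "one scan per domain" rule amounts to a group-by that keeps the newest completed scan. A minimal sketch follows; the `Scan` record, its field names, the "completed" status value, and the exclusion set are hypothetical stand-ins for whatever the real pipeline stores.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Scan:
    domain: str
    status: str  # hypothetical values: "completed", "failed", ...
    finished_at: datetime
    score: int


def latest_completed_per_domain(
    scans: list[Scan], excluded_domains: frozenset[str] = frozenset()
) -> dict[str, Scan]:
    """Keep only the most recent completed scan for each non-excluded domain."""
    latest: dict[str, Scan] = {}
    for scan in scans:
        if scan.status != "completed" or scan.domain in excluded_domains:
            continue
        current = latest.get(scan.domain)
        if current is None or scan.finished_at > current.finished_at:
            latest[scan.domain] = scan
    return latest


scans = [
    Scan("a.example", "completed", datetime(2026, 5, 1), 40),
    Scan("a.example", "completed", datetime(2026, 6, 1), 60),
    Scan("a.example", "failed", datetime(2026, 7, 1), 0),
    Scan("b.example", "completed", datetime(2026, 5, 2), 50),
    Scan("agent-ready.dev", "completed", datetime(2026, 5, 3), 99),
]
corpus = latest_completed_per_domain(scans, frozenset({"agent-ready.dev"}))
print({domain: scan.score for domain, scan in corpus.items()})
```

However many times `a.example` is scanned, it contributes a single row to the statistics, and the failed scan on 2026-07-01 is ignored even though it is the newest.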

These figures recompute daily, so the exact number you read today will have moved by next week. For a stable reference, cite the frozen 2026 edition — a point-in-time snapshot whose numbers never change.

This is a sample of sites whose owners chose to check their agent readiness, so adoption rates here run ahead of the web at large. Read it as a trend signal, not a census. The full check registry and scoring rules live on the methodology and specs pages.
