How agent-ready is the average site?
Across 105 distinct sites, the mean agent-readability score is 52 out of 100 and the median is 55. The score is the share of applicable checks a site passes; see the methodology for the formula and rating bands.
| Rating band | Sites | Share |
|---|---|---|
| Excellent (90–100) | 0 | 0% |
| Good (70–89) | 13 | 12% |
| Fair (50–69) | 53 | 50% |
| Needs improvement (0–49) | 39 | 37% |
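As a rough illustration of the definition above, the sketch below computes an unweighted pass share and maps it to the bands in the table. The function names, the use of `None` for a check that does not apply, and the equal weighting of checks are assumptions made here; the methodology page documents the actual formula.

```python
from typing import Optional

# Rating bands from the table above, as (lower bound, label), highest first.
BANDS = [(90, "Excellent"), (70, "Good"), (50, "Fair"), (0, "Needs improvement")]


def readability_score(results: list[Optional[bool]]) -> int:
    """Share of applicable checks passed, scaled to 0-100.

    Each entry is True (pass), False (fail), or None (check not applicable).
    Assumption: every applicable check carries equal weight.
    """
    applicable = [r for r in results if r is not None]
    if not applicable:
        return 0  # assumption: a scan with no applicable checks scores 0
    return round(100 * sum(applicable) / len(applicable))


def rating_band(score: int) -> str:
    """Map a 0-100 score to its rating band."""
    for lower, label in BANDS:
        if score >= lower:
            return label
    raise ValueError("score must be between 0 and 100")


# Hypothetical scan: five applicable checks, three passed, one not applicable.
example = [True, True, False, None, True, False]
print(readability_score(example), rating_band(readability_score(example)))
```

Non-applicable checks are dropped from both the numerator and the denominator, so a site is not penalised for a check that never ran on it.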
How many sites publish an llms.txt or AGENTS.md?
The discovery files are the cheapest agent-readiness win and the most-skipped. In this corpus, 56% of sites publish a valid llms.txt, 31% ship the llms-full.txt companion, and 41% publish an AGENTS.md. The mean llms.txt sub-score is 49/100.
| Discovery file | Sites | Adoption |
|---|---|---|
| Publish a valid llms.txt | 59 | 56% |
| Publish llms-full.txt | 33 | 31% |
| Publish AGENTS.md | 43 | 41% |
What are the most common agent-readability failures?
These are the site-level and llms.txt checks that fail most often, ranked by fail rate among the sites the check actually ran on. They’re the highest-leverage fixes for a typical site.
| Check | Fail rate | Sites checked |
|---|---|---|
| S13: AGENTS.md has required sections | 89% | 105 |
| S11: sitemap.md has headings + links | 88% | 105 |
| S10: sitemap.md exists | 69% | 105 |
| L10: llms-full.txt available | 69% | 105 |
| L5: H2 file-list sections | 61% | 105 |
| L6: Link format correct | 61% | 105 |
| L4: Blockquote summary | 60% | 105 |
| S12: AGENTS.md exists | 59% | 105 |
| S9: sitemap.xml has lastmod | 58% | 105 |
| L7: Links are accessible | 52% | 105 |
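To make three of the llms.txt checks in the table more concrete (blockquote summary, H2 file-list sections, link format), here is a minimal sketch of what such checks might look for. The function name, the result keys, and the exact link pattern are assumptions for illustration, loosely following the public llms.txt proposal; the scanner's real rules live in the check registry and may differ.

```python
import re

# Assumed list-item shape: "- [title](https://url)" with optional ": notes".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\(https?://[^)\s]+\)(: .+)?$")


def check_llms_txt(text: str) -> dict[str, bool]:
    """Run three illustrative structure checks on the text of an llms.txt file."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    items = [line for line in lines if line.startswith("- ")]
    return {
        # A "> ..." line summarising the site.
        "blockquote_summary": any(line.startswith("> ") for line in lines),
        # At least one "## ..." heading that groups a list of files.
        "h2_sections": any(line.startswith("## ") for line in lines),
        # Every list item is a markdown link in the assumed shape.
        "link_format": bool(items) and all(LINK_ITEM.match(line) for line in items),
    }


# Hypothetical file contents used only to exercise the checks.
GOOD = """# Example Docs

> Short summary of what the site offers.

## Docs

- [Quick start](https://example.com/quickstart.md): Install and first run
- [API reference](https://example.com/api.md)
"""

BAD = """# Example Docs

## Docs

- Quick start: https://example.com/quickstart.md
"""

print(check_llms_txt(GOOD))
print(check_llms_txt(BAD))
```

The second sample has a section heading but no summary and a bare URL instead of a markdown link, so it would fail two of the three checks in this sketch.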
Which agent protocols are sites actually adopting?
Agent-protocol manifests are conditional - a site only “has” one if it advertises the matching well-known endpoint. Adoption is still early, and the counts are small, so these figures lead with the raw number of sites.
| Protocol surface | Sites | Share |
|---|---|---|
| A2A agent card | 9 | 9% |
| MCP server card | 5 | 5% |
| agents.json † | 4 | 4% |
| agent-permissions.json † | 4 | 4% |
| NLWeb endpoint † | 2 | 2% |
| API catalog † | 2 | 2% |
| Web Bot Auth † | 1 | 1% |
| Agent Skills discovery † | 1 | 1% |
† Fewer than 5 sites — indicative only, not a robust rate.
How many sites block AI crawlers?
A robots.txt that disallows an AI user-agent keeps the site out of the AI search and assistant products that crawler feeds. In this corpus, 14% of sites block at least one major AI crawler. The most-blocked crawlers:
| AI crawler | Sites blocking it | Share |
|---|---|---|
| GPTBot | 13 | 12% |
| Google-Extended | 12 | 11% |
| ClaudeBot | 10 | 10% |
| CCBot | 9 | 9% |
| Applebot-Extended | 8 | 8% |
| Bytespider | 6 | 6% |
| PerplexityBot | 5 | 5% |
| cohere-ai † | 4 | 4% |
| YouBot † | 3 | 3% |
† Fewer than 5 sites — indicative only, not a robust rate.
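A site owner can run a similar check locally with Python's standard-library `urllib.robotparser`. The sketch below assumes that "blocking" means the crawler is not allowed to fetch the site root; that definition, the function name, and the sample robots.txt are assumptions for illustration, and the scanner's own rule may be stricter or looser.

```python
from urllib.robotparser import RobotFileParser

# The crawler user-agents listed in the table above.
AI_CRAWLERS = [
    "GPTBot", "Google-Extended", "ClaudeBot", "CCBot", "Applebot-Extended",
    "Bytespider", "PerplexityBot", "cohere-ai", "YouBot",
]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that may not fetch `url` under this robots.txt.

    Assumption: a crawler counts as blocked if it cannot fetch the given URL
    (the site root by default).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_CRAWLERS if not parser.can_fetch(agent, url)]


# Hypothetical robots.txt that blocks two crawlers and allows everyone else.
SAMPLE = """User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(SAMPLE))
```

Parsing the text directly, rather than fetching it over the network, keeps the example self-contained; in practice the same parser can load a live file through its `set_url` and `read` methods.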
How was this measured?
Every figure on this page is computed from 105 distinct sites that were scanned on agent-ready.dev between 2026-04-18 and 2026-06-12. To stop a single heavily-scanned site from skewing the corpus, we keep only the most recent completed scan per domain, and we exclude sites operated by Agent Ready so our own pages don’t flatter the numbers. No individual site is named - the report is aggregate and anonymous by design.
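The corpus rule described above (one scan per domain, the most recent completed one, with the operator's own sites excluded) can be sketched as follows. The `Scan` record, its field names, and the example domains are hypothetical; the real storage schema is not described on this page.

```python
from collections.abc import Collection, Iterable
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Scan:
    # Hypothetical record shape, chosen for illustration only.
    domain: str
    scanned_on: date
    completed: bool
    score: int


def latest_scan_per_domain(
    scans: Iterable[Scan], excluded_domains: Collection[str] = ()
) -> dict[str, Scan]:
    """Keep only the most recent completed scan for each non-excluded domain."""
    latest: dict[str, Scan] = {}
    for scan in scans:
        if not scan.completed or scan.domain in excluded_domains:
            continue
        current = latest.get(scan.domain)
        if current is None or scan.scanned_on > current.scanned_on:
            latest[scan.domain] = scan
    return latest


# Made-up scans: a repeat-scanned site, an incomplete scan, and an excluded domain.
scans = [
    Scan("a.example", date(2026, 4, 20), True, 40),
    Scan("a.example", date(2026, 6, 1), True, 65),
    Scan("b.example", date(2026, 5, 2), True, 55),
    Scan("b.example", date(2026, 5, 30), False, 0),
    Scan("agent-ready.dev", date(2026, 5, 5), True, 90),
]
corpus = latest_scan_per_domain(scans, excluded_domains={"agent-ready.dev"})
print({domain: scan.score for domain, scan in corpus.items()})
```

Filtering out incomplete scans before comparing dates matters: otherwise a newer, unfinished scan would displace an older completed one for the same domain.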
This is a sample of sites whose owners chose to check their agent readiness, so adoption rates here run ahead of the web at large. Read it as a trend signal, not a census. The full check registry and scoring rules live on the methodology and specs pages.
Frequently asked questions
- Where does this data come from?
- Every public scan run on agent-ready.dev is stored, and this page aggregates them. To avoid letting a single heavily-scanned site skew the numbers, we take one scan per domain - the most recent completed scan - and compute statistics across those distinct domains. No individual domain is named anywhere on this page; the report is deliberately aggregate and anonymous.
- Is this a representative sample of the whole web?
- No. It's a sample of sites that someone chose to scan with Agent Ready, which skews toward sites whose owners already care about AI agents - so adoption rates here are almost certainly higher than the web at large. Read the numbers as 'among sites being actively checked for agent readiness', not as a census. The sample size and date range are shown at the top of the page so you can judge the weight of each figure.
- How is the agent-readability score calculated?
- Each scan runs dozens of checks across four layers - site-level discovery files, per-page extraction signals, the llms.txt standard, and agent-protocol manifests - and the agent-readability score is the share of applicable checks that pass, expressed 0–100. The full scoring formula, rating bands, and per-check weighting are documented on the methodology page.
- How often do these numbers change?
- The page recomputes from the live corpus on a daily cycle, so the figures drift as more sites are scanned and as scanned sites improve. The 'as of' date at the top reflects the latest scan included. For a fixed point-in-time reference, cite the date alongside the statistic.
- Which checks count as the 'most common failures'?
- We rank the site-level and llms.txt checks that run on essentially every scan by how often they fail, and exclude any check that ran on too small a slice of the corpus to be meaningful. Conditional agent-protocol checks (MCP, A2A, x402, and the rest) aren't in the failure ranking - a site isn't 'failing' a protocol it never claimed to support - so those appear in the adoption section instead.