---
title: State of Agent Readability
description: "Original research from the Agent Ready scan corpus: llms.txt and AGENTS.md adoption, the most common readability failures, agent-protocol uptake, and how often sites block AI crawlers."
last_updated: 2026-06-12
canonical_url: https://agent-ready.dev/state-of-agent-readability
---

# State of Agent Readability

> Original research from the Agent Ready scan corpus — how much of the web is actually ready for AI agents. As of 2026-06-12, based on 107 distinct sites.

## How agent-ready is the average site?

Across 107 distinct sites, the mean agent-readability score is **52/100** and the median is **56/100**.

| Rating band | Sites | Share |
| --- | --- | --- |
| Excellent (90–100) | 0 | 0% |
| Good (70–89) | 13 | 12% |
| Fair (50–69) | 55 | 51% |
| Needs improvement (0–49) | 39 | 36% |
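
A minimal Python sketch of how one score per site could be reduced to the mean, the median, and the band counts above. It assumes integer scores from 0 to 100 and takes the band boundaries from the table. `band_for` and `summarise` are hypothetical helpers, not Agent Ready's implementation.

```python
from collections import Counter
from statistics import mean, median

# Band boundaries from the rating-band table (lower bound inclusive),
# checked from the highest band down. Assumes integer scores in 0..100.
BANDS = [
    (90, "Excellent"),
    (70, "Good"),
    (50, "Fair"),
    (0, "Needs improvement"),
]


def band_for(score: int) -> str:
    """Return the rating band for one 0-100 score."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    return next(name for lower, name in BANDS if score >= lower)


def summarise(scores: list[int]) -> dict:
    """Mean, median, and (count, rounded share %) per band, one score per site."""
    counts = Counter(band_for(s) for s in scores)
    n = len(scores)
    return {
        "mean": round(mean(scores)),
        "median": median(scores),
        "bands": {
            name: (counts[name], round(100 * counts[name] / n))
            for _, name in BANDS
        },
    }
```

Each share is rounded independently, so the shares need not add up to exactly 100%: the table above sums to 99%.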

## How many sites publish an llms.txt or AGENTS.md?

57% of sites publish a valid llms.txt, 32% ship the llms-full.txt companion, and 41% publish an AGENTS.md. The mean llms.txt sub-score is 49/100.

| Discovery file | Sites | Adoption |
| --- | --- | --- |
| Publish a valid llms.txt | 61 | 57% |
| Publish llms-full.txt | 34 | 32% |
| Publish AGENTS.md | 44 | 41% |

## What are the most common agent-readability failures?

The ten site-level and llms.txt checks that fail most often, ranked by fail rate among the sites each check ran on.

| Check | Fail rate | Sites checked |
| --- | --- | --- |
| S13 — AGENTS.md has required sections | 89% | 107 |
| S11 — sitemap.md has headings + links | 88% | 107 |
| S10 — sitemap.md exists | 68% | 107 |
| L10 — llms-full.txt available | 68% | 107 |
| L5 — H2 file-list sections | 60% | 107 |
| L6 — Link format correct | 60% | 107 |
| S12 — AGENTS.md exists | 59% | 107 |
| L4 — Blockquote summary | 59% | 107 |
| S9 — sitemap.xml has lastmod | 58% | 107 |
| L7 — Links are accessible | 52% | 107 |
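
Three of the top ten failures (L4, L5, L6) concern the structure of llms.txt itself. Under the llms.txt proposal, the file is Markdown with an H1 title, a blockquote summary, and H2 sections whose list items are links of the form `- [name](url): optional notes`. The Python sketch below approximates those checks with regular expressions. The patterns and the `check_llms_txt` name are assumptions; the authoritative rules are on the specs page.

```python
import re

# Approximate patterns based on the llms.txt proposal; assumptions, not the
# scanner's actual rules.
H1 = re.compile(r"^# \S")
BLOCKQUOTE = re.compile(r"^> \S")
H2 = re.compile(r"^## \S")
# A list item that is a Markdown link, optionally followed by ": notes".
LINK_ITEM = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .*)?$")


def check_llms_txt(text: str) -> dict[str, bool]:
    """Rough structural checks on an llms.txt document."""
    lines = [line.rstrip() for line in text.splitlines()]
    content = [line for line in lines if line]  # non-blank lines only

    # The file should open with an H1 title, followed by a blockquote summary.
    h1_title = bool(content) and bool(H1.match(content[0]))
    blockquote_summary = h1_title and len(content) > 1 and bool(BLOCKQUOTE.match(content[1]))

    # Collect list items that appear after the first H2 heading.
    items: list[str] = []
    seen_h2 = False
    for line in lines:
        if H2.match(line):
            seen_h2 = True
        elif seen_h2 and line.startswith("- "):
            items.append(line)

    return {
        "h1_title": h1_title,
        "blockquote_summary": blockquote_summary,
        # At least one H2 section that contains a file list.
        "h2_file_lists": seen_h2 and bool(items),
        # Every file-list item uses the [name](url) link format.
        "link_format": bool(items) and all(LINK_ITEM.match(i) for i in items),
    }
```

Under this sketch, a file that lists bare URLs instead of `[name](url)` links passes the H2 check but fails the link-format check.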

## Which agent protocols are sites actually adopting?

Agent-protocol manifests are conditional — a site only "has" one if it advertises the matching well-known endpoint. Adoption is early and counts are small, so figures lead with the raw number of sites.

| Protocol surface | Sites | Share |
| --- | --- | --- |
| A2A agent card | 9 | 8% |
| MCP server card | 5 | 5% |
| agents.json † | 4 | 4% |
| agent-permissions.json † | 4 | 4% |
| NLWeb endpoint † | 2 | 2% |
| API catalog † | 2 | 2% |
| Web Bot Auth † | 1 | 1% |
| Agent Skills discovery † | 1 | 1% |

_† Fewer than 5 sites — indicative only, not a robust rate._
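
The dagger rule can be stated precisely: a row is flagged when it rests on fewer than 5 sites, and the share is the site count over the corpus size, rounded to a whole percent. A Python sketch of one row follows. `share_row` is a hypothetical helper, and Python's built-in `round` (which rounds exact halves to even) stands in for whatever rounding the report actually uses.

```python
# From the footnote: rows backed by fewer than 5 sites are indicative only.
MIN_ROBUST_SITES = 5


def share_row(label: str, sites: int, total: int) -> str:
    """Format one Markdown table row with a rounded share and an optional dagger."""
    share = round(100 * sites / total)  # built-in round: exact halves go to even
    dagger = " †" if sites < MIN_ROBUST_SITES else ""
    return f"| {label}{dagger} | {sites} | {share}% |"
```

For example, `share_row("MCP server card", 5, 107)` yields `| MCP server card | 5 | 5% |` with no dagger, matching the table: exactly 5 sites is not "fewer than 5".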


## How many sites block AI crawlers?

14% of sites in this corpus block at least one major AI crawler in robots.txt.

| AI crawler | Sites blocking it | Share |
| --- | --- | --- |
| GPTBot | 13 | 12% |
| Google-Extended | 12 | 11% |
| ClaudeBot | 10 | 9% |
| CCBot | 9 | 8% |
| Applebot-Extended | 8 | 7% |
| Bytespider | 6 | 6% |
| PerplexityBot | 5 | 5% |
| cohere-ai † | 4 | 4% |
| YouBot † | 3 | 3% |

_† Fewer than 5 sites — indicative only, not a robust rate._
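
Whether a site blocks a given crawler can be read from its robots.txt. The sketch below uses Python's standard-library `urllib.robotparser` with the user-agent tokens from the table. Treating "blocked" as "the site root is disallowed for that user agent" is an assumption, not necessarily the scanner's exact rule, and `blocked_crawlers` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the AI-crawler table.
AI_CRAWLERS = [
    "GPTBot", "Google-Extended", "ClaudeBot", "CCBot", "Applebot-Extended",
    "Bytespider", "PerplexityBot", "cohere-ai", "YouBot",
]


def blocked_crawlers(robots_txt: str, site_root: str = "https://example.com/") -> list[str]:
    """Return the AI crawlers that robots.txt disallows from fetching the site root.

    Assumption: "blocking" means the root URL is disallowed for that user agent.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string, no network fetch
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, site_root)]
```

Two caveats apply to this sketch. `RobotFileParser` matches a group when its `User-agent` value appears, case-insensitively, within the crawler name. A blanket `User-agent: *` group with `Disallow: /` also counts as blocking every crawler, which may differ from how the report tallies explicit per-bot rules.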

## How was this measured?

Every figure is computed from 107 distinct sites scanned on agent-ready.dev between 2026-04-18 and 2026-06-12. To stop a single heavily-scanned site from skewing the corpus, only the most recent completed scan per domain is kept, and sites operated by Agent Ready are excluded so our own pages don't flatter the numbers. No individual site is named — the report is aggregate and anonymous by design. This is a sample of sites whose owners chose to check their agent readiness, so adoption rates run ahead of the web at large; read it as a trend signal, not a census. See the [methodology](https://agent-ready.dev/methodology) and [specs](https://agent-ready.dev/specs) for the full check registry and scoring rules.
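
The one-scan-per-domain rule is a small reduction over the scan log. Below is a Python sketch. It assumes a hypothetical `Scan` record with a domain, a completion time (`None` for unfinished scans) and a score. The real corpus schema isn't published here, and `latest_per_domain` is illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Scan:
    """Hypothetical scan record; the real corpus schema is not published here."""
    domain: str
    completed_at: Optional[datetime]  # None if the scan never completed
    score: int


def latest_per_domain(scans: list[Scan], excluded: set[str]) -> dict[str, Scan]:
    """Keep the most recent completed scan per domain, skipping excluded domains.

    `excluded` should hold lower-cased domains (e.g. sites the scanner's operator runs).
    """
    latest: dict[str, Scan] = {}
    for scan in scans:
        domain = scan.domain.lower()
        if scan.completed_at is None or domain in excluded:
            continue  # incomplete scans and operator-owned sites never count
        kept = latest.get(domain)
        if kept is None or scan.completed_at > kept.completed_at:
            latest[domain] = scan
    return latest
```

Keying on the lower-cased domain collapses repeat scans of the same site to a single entry. This stops one heavily-scanned site from dominating the averages.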

## Frequently asked questions

### Where does this data come from?

Every public scan run on agent-ready.dev is stored, and this page aggregates them. To stop a single heavily-scanned site from skewing the numbers, we keep one scan per domain (the most recent completed one), exclude sites operated by Agent Ready, and compute statistics across the remaining distinct domains. No individual domain is named anywhere on this page; the report is deliberately aggregate and anonymous.

### Is this a representative sample of the whole web?

No. It's a sample of sites that someone chose to scan with Agent Ready, which skews toward sites whose owners already care about AI agents, so adoption rates here are almost certainly higher than across the web at large. Read the numbers as 'among sites being actively checked for agent readiness', not as a census. The sample size and as-of date are shown at the top of the page, and the full scan date range is given under "How was this measured?", so you can judge the weight of each figure.

### How is the agent-readability score calculated?

Each scan runs dozens of checks across four layers (site-level discovery files, per-page extraction signals, the llms.txt standard, and agent-protocol manifests). The agent-readability score is the share of applicable checks that pass, expressed on a 0–100 scale. The full scoring formula, rating bands, and per-check weighting are documented on the [methodology page](https://agent-ready.dev/methodology).
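
As a rough illustration of "share of applicable checks that pass", the Python sketch below drops not-applicable checks from both the numerator and the denominator, and applies an optional per-check weight. The `Outcome` and `CheckResult` types, the default weight of 1.0, and returning 0 when nothing applies are all assumptions. The actual weights and formula are on the methodology page.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    NOT_APPLICABLE = "n/a"  # e.g. a protocol check on a site that never advertised it


@dataclass
class CheckResult:
    check_id: str
    outcome: Outcome
    weight: float = 1.0  # hypothetical; real per-check weights are documented elsewhere


def readability_score(results: list[CheckResult]) -> int:
    """Weighted share of applicable checks that pass, scaled to 0-100."""
    applicable = [r for r in results if r.outcome is not Outcome.NOT_APPLICABLE]
    total = sum(r.weight for r in applicable)
    if total == 0:
        return 0  # assumption: no applicable checks means a score of 0
    passed = sum(r.weight for r in applicable if r.outcome is Outcome.PASS)
    return round(100 * passed / total)
```

With every weight at 1.0 this reduces to a plain pass rate: two passes out of three applicable checks score 67.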

### How often do these numbers change?

The page recomputes from the live corpus on a daily cycle, so the figures drift as more sites are scanned and as scanned sites improve. The 'as of' date at the top reflects the latest scan included. For a fixed point-in-time reference, cite the date alongside the statistic.

### Which checks count as the 'most common failures'?

We rank the site-level and llms.txt checks that run on essentially every scan by how often they fail, and exclude any check that ran on too small a slice of the corpus to be meaningful. Conditional agent-protocol checks (MCP, A2A, x402, and the rest) aren't in the failure ranking, because a site isn't 'failing' a protocol it never claimed to support; those appear in the adoption section instead.
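
The ranking rule can be sketched in Python in three steps. First, drop conditional protocol checks. Next, drop checks that ran on too small a share of the corpus. Then sort by fail rate. The source doesn't state the coverage cut-off, so `min_coverage=0.5` is a placeholder. `CheckStats` and `top_failures` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class CheckStats:
    check_id: str
    failed: int
    checked: int        # sites the check actually ran on
    conditional: bool   # True for agent-protocol checks (MCP, A2A, x402, ...)


def top_failures(stats: list[CheckStats], corpus_size: int,
                 min_coverage: float = 0.5, limit: int = 10) -> list[tuple[str, int, int]]:
    """Rank non-conditional checks by fail rate; min_coverage is a placeholder cut-off."""
    eligible = [
        s for s in stats
        if not s.conditional and s.checked > 0 and s.checked / corpus_size >= min_coverage
    ]
    # Highest fail rate first; ties keep their input order (sort is stable).
    eligible.sort(key=lambda s: s.failed / s.checked, reverse=True)
    return [(s.check_id, round(100 * s.failed / s.checked), s.checked) for s in eligible[:limit]]
```

Each returned tuple maps onto one row of the failure table: check, fail rate in percent, and the number of sites checked.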

---

Read the full report on the web: <https://agent-ready.dev/state-of-agent-readability>

Scan your site: <https://agent-ready.dev>

## Sitemap

See the full [sitemap](https://agent-ready.dev/sitemap.md) for all pages on agent-ready.dev.