Agent Ready

State of MCP Servers

Original research from the Agent Ready MCP scan corpus - how connectable, and how well-built, the most popular MCP servers really are.

As of 2026-06-18 · based on 264 servers scanned, 231 introspected

Can an agent actually connect?

The first question isn’t quality - it’s reach. Of the 264 popular servers we tried, 88% accepted an unauthenticated handshake and let us read their tools. Another 7% refused without a per-server credential, and the remaining 5% failed to connect. Only the 231 servers we could introspect feed the quality figures below - servers we couldn’t see into are never scored as if they were bad.

Outcome | Servers | Share
Connect without a per-server credential | 231 | 88%
Refused without a per-server credential | 19 | 7%
Failed to connect (transport / non-MCP) | 14 | 5%
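
The shares follow directly from the three counts. A minimal sketch of that arithmetic, assuming shares are rounded to the nearest whole percent (the rounding rule and the dictionary keys are assumptions; the counts are the ones reported above):

```python
# Counts as reported in the connectability table.
outcomes = {
    "connected": 231,  # accepted an unauthenticated handshake
    "refused": 19,     # required a per-server credential
    "failed": 14,      # transport error or not an MCP endpoint
}

total = sum(outcomes.values())  # 264 servers tried

# Assumption: whole-percent rounding. 231/264 is exactly 87.5%, and
# Python's round() uses round-half-to-even, which gives 88 here.
shares = {name: round(100 * count / total) for name, count in outcomes.items()}

for name, pct in shares.items():
    print(f"{name}: {outcomes[name]} of {total} servers ({pct}%)")
```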

How well-built is the average server?

Across the 231 servers we could introspect, the mean and the median MCP score are both 81 out of 100. The typical server advertises 16 tools, 2 resources, and 1 prompt. The score is weighted toward tool quality; see the methodology for the formula and rating bands.

Rating band | Servers | Share
Excellent (90–100) | 61 | 26%
Good (70–89) | 120 | 52%
Fair (50–69) | 49 | 21%
Needs improvement (0–49) | 1 | 0%
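
The band boundaries above can be read as a simple lookup. This is a sketch only: the function name `rating_band` is hypothetical, and it assumes scores are whole numbers from 0 to 100.

```python
def rating_band(score: int) -> str:
    """Map a 0-100 MCP score to the rating band listed above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Needs improvement"


# The reported mean and median (81) both land in the "Good" band.
print(rating_band(81))
```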

Where do servers do well - and fall short?

Each pass rate is the share of introspected servers that pass a given quality check, among the servers it applies to. Tool and parameter descriptions are what an agent reasons over; output schemas and annotations are newer best-practice signals that adoption is still catching up to. Error-output modeling (M13) is informational - it doesn’t affect the score - and surfaces a deeper gap: of the tools that do declare a schema, how many describe a failure path at all, so an agent can tell a failed call from an empty-but-successful one.

Quality check | Servers | Pass rate
M2 Complete server metadata | 231 | 94%
M3 Every tool described | 231 | 99%
M4 Every parameter described | 231 | 65%
M5 Tools declare a specific output schema | 231 | 22%
M6 Tools carry annotations | 231 | 39%
M8 Resources well-formed | 61 | 84%
M9 Prompts described | 60 | 100%
M12 MCP Apps (ui://) served as HTML | 9 | 78%
M13 Schema-bearing tools that model an error path | 63 | 5%
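
The "Servers" column differs by row because each check is measured only over the servers it applies to. A rough sketch of that denominator rule, using made-up results; the `pass_rate` helper and the True/False/None encoding are assumptions for illustration, not the scanner's actual data model:

```python
from typing import Optional


def pass_rate(results: list[Optional[bool]]) -> Optional[tuple[int, int]]:
    """Return (applicable_servers, percent_passing), or None if no server applies.

    Each entry is True (pass), False (did not pass), or None (check not applicable).
    """
    applicable = [r for r in results if r is not None]
    if not applicable:
        return None
    return len(applicable), round(100 * sum(applicable) / len(applicable))


# Hypothetical data: five servers, two of which expose no resources at all,
# so a resources check applies to only three of them.
m8_results = [True, None, True, False, None]
print(pass_rate(m8_results))  # (3, 67)
```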

How was this measured?

Every figure is computed from 264 of the most-used MCP servers on the Smithery registry, all scanned on 2026-06-18. We keep the most recent scan per endpoint, exclude servers operated by Agent Ready, and name no individual server - the report is aggregate and anonymous by design. Score, tool-count, and quality stats are computed only over the 231 servers we could introspect; auth-gated and unreachable servers count only toward connectability.

This is the most popular slice of servers a registry hosts with a remote endpoint, so it runs ahead of the long tail of local-only servers. Read it as a trend signal, not a census. The M1-M13 checks and scoring rules live on the methodology page.

Frequently asked questions

Where does this data come from?
We took the most-used MCP servers listed on the Smithery registry (ranked by Smithery's useCount), connected to each through Smithery's hosted gateway with our own MCP scanner, and aggregated the results. We keep one scan per endpoint - the most recent - and compute statistics across those distinct servers. No individual server is named anywhere on this page; the report is deliberately aggregate and anonymous.
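
Keeping one scan per endpoint amounts to a "latest record wins" deduplication. A minimal sketch, assuming each scan record carries an endpoint URL, an ISO 8601 timestamp, and a score; the record shape, the example URLs, and the `latest_per_endpoint` name are all hypothetical:

```python
from datetime import datetime

# Hypothetical scan records: (endpoint, ISO timestamp, score).
scans = [
    ("https://a.example/mcp", "2026-06-18T08:00:00", 74),
    ("https://b.example/mcp", "2026-06-18T09:30:00", 91),
    ("https://a.example/mcp", "2026-06-18T12:15:00", 80),
]


def latest_per_endpoint(records):
    """Keep only the most recent scan's score for each endpoint."""
    latest = {}
    for endpoint, timestamp, score in records:
        when = datetime.fromisoformat(timestamp)
        if endpoint not in latest or when > latest[endpoint][0]:
            latest[endpoint] = (when, score)
    return {endpoint: score for endpoint, (_, score) in latest.items()}


print(latest_per_endpoint(scans))
```

Statistics are then computed over the values of the returned mapping, so a server that was scanned twice is counted once.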
Why can't every server be scored - what does 'couldn't connect' mean?
Our scanner connects over remote Streamable HTTP and reads what a server advertises. A server that requires its own credential (an API key for the underlying service) refuses the unauthenticated handshake, so we can't see its tools. Those servers count toward the connectability finding - 'can an agent use this without a per-server credential?' - but are excluded from the score, tool-count, and quality figures entirely. Folding their unscored 0s into the averages would be misleading, so we don't.
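
To see why folding unscored servers in as zeros would mislead, here is a small illustration with invented numbers; the record layout and the scores are hypothetical and chosen only to make the effect visible:

```python
from statistics import mean, median

# Hypothetical results; score is None when the server could not be introspected.
results = [
    {"outcome": "connected", "score": 90},
    {"outcome": "connected", "score": 80},
    {"outcome": "connected", "score": 70},
    {"outcome": "refused", "score": None},
    {"outcome": "failed", "score": None},
]

# What the report does: score only the servers it could see into.
scored = [r["score"] for r in results if r["score"] is not None]

# What it avoids: treating "couldn't see" as a score of 0.
as_zeros = [r["score"] or 0 for r in results]

print(mean(scored), median(scored))  # 80 80
print(mean(as_zeros))                # 48
```

All five servers still count toward connectability; only the three introspected ones contribute to the score statistics.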
Is this representative of all MCP servers?
No. It's the most popular servers that Smithery hosts with a remote endpoint - which skews toward actively-maintained, deployment-ready servers. Servers distributed only as a local stdio/npm package (the bulk of every registry) have no remote endpoint to scan and aren't included, nor are servers from other registries. Read the numbers as 'among the most popular remotely-reachable MCP servers', not as a census of the ecosystem.
How is the MCP score calculated?
Each scan runs the M1-M13 checks - handshake, server metadata, tool/parameter descriptions, output schemas, annotations, naming, resources, prompts, capability honesty, MCP Apps, and (informational) error-output modeling - weighted toward tool quality, with checks that don't apply to a given server excluded. The full weighting and the pass/warn/fail rules are on the methodology page. Output schemas and annotations are graded as best-practice adoption, not protocol compliance, and an output schema only counts if it declares specific fields - a bare {"type":"object"} doesn't.
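
The "specific fields" rule can be illustrated with a small predicate. This is a simplified sketch, not the scanner's actual check: it assumes an output schema counts as specific when it declares at least one named property, and the function name is hypothetical.

```python
def declares_specific_fields(output_schema) -> bool:
    """True if a JSON Schema object declares at least one named property."""
    if not isinstance(output_schema, dict):
        return False
    properties = output_schema.get("properties")
    return isinstance(properties, dict) and len(properties) > 0


bare = {"type": "object"}
specific = {
    "type": "object",
    "properties": {"temperature": {"type": "number"}},
}

print(declares_specific_fields(bare))      # False
print(declares_specific_fields(specific))  # True
```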
Do MCP tools describe what happens when they fail?
Mostly no - and it's the gap hiding under the output-schema number. Of the tools that do declare an output schema, almost all model only the success case; the error path comes back as unstructured text. That means an agent can't cleanly tell 'the tool failed, retry or back off' from 'the tool succeeded with an empty result' - the other place tool chains silently break. We surface this as an informational signal (M13): the share of schema-bearing tools whose schema admits a failure path (an error/status field or a union variant). It doesn't affect the score. We can't observe the runtime isError flag because the scan is read-only and never calls a tool, so this measures what the schema itself declares.
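
One way to picture the M13 signal is a static check over the declared schema. The sketch below is an interpretation, not the scanner's rule: the set of field names and the treatment of `oneOf`/`anyOf` as union variants are assumptions, as is the function name.

```python
# Assumption: property names that suggest a failure path.
ERROR_FIELD_NAMES = {"error", "errors", "status"}


def models_error_path(schema) -> bool:
    """True if the schema declares an error/status field, directly or in a union variant."""
    if not isinstance(schema, dict):
        return False
    properties = schema.get("properties", {})
    if isinstance(properties, dict) and any(
        name.lower() in ERROR_FIELD_NAMES for name in properties
    ):
        return True
    # Assumption: union variants are expressed with oneOf / anyOf.
    for keyword in ("oneOf", "anyOf"):
        variants = schema.get(keyword, [])
        if isinstance(variants, list) and any(models_error_path(v) for v in variants):
            return True
    return False


success_only = {"type": "object", "properties": {"items": {"type": "array"}}}
with_union = {
    "oneOf": [
        {"type": "object", "properties": {"items": {"type": "array"}}},
        {"type": "object", "properties": {"error": {"type": "string"}}},
    ]
}

print(models_error_path(success_only))  # False
print(models_error_path(with_union))    # True
```

Like the scan itself, this reads only what the schema declares and never calls a tool.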
How often do these numbers change?
The page recomputes from the corpus on a daily cycle. The 'as of' date at the top reflects the latest scan included; cite the date alongside any statistic for a fixed reference. Where the introspected sample is small, figures lead with raw counts and carry a caveat.