Can an agent actually connect?
The first question isn’t quality - it’s reach. Of the 264 popular servers we tried, 88% accepted an unauthenticated handshake and let us read their tools, 7% refused without a per-server credential, and the remaining 5% failed to connect. Only the 231 we could introspect feed the quality figures below - servers we couldn’t see into are never scored as if they were bad.
| Outcome | Servers | Share |
|---|---|---|
| Connect without a per-server credential | 231 | 88% |
| Refused without a per-server credential | 19 | 7% |
| Failed to connect (transport / non-MCP) | 14 | 5% |
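
The shares in the table follow directly from the raw counts. A minimal sketch of that arithmetic, assuming percentages are rounded half-up to whole numbers (the rounding rule and all names here are assumptions, not the report's actual code):

```python
import math

# Counts copied from the table above; the dictionary keys are illustrative.
outcomes = {
    "connected": 231,  # connect without a per-server credential
    "refused": 19,     # refused without a per-server credential
    "failed": 14,      # failed to connect (transport / non-MCP)
}

total = sum(outcomes.values())  # every server we tried


def share(count: int, total: int) -> int:
    """Percentage share of `total`, rounded half-up (assumed rounding rule)."""
    return math.floor(100 * count / total + 0.5)


shares = {name: share(count, total) for name, count in outcomes.items()}
print(total, shares)
```

Under these assumptions the three shares come out to 88, 7, and 5, matching the table.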
How well-built is the average server?
Across the 231 servers we could introspect, the mean MCP score is 81 out of 100 and the median is 81. The typical server advertises 16 tools, 2 resources, and 1 prompt. The score is weighted toward tool quality; see the methodology for the formula and rating bands.
| Rating band | Servers | Share |
|---|---|---|
| Excellent (90–100) | 61 | 26% |
| Good (70–89) | 120 | 52% |
| Fair (50–69) | 49 | 21% |
| Needs improvement (0–49) | 1 | <1% |
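
The band boundaries in the table can be expressed as a small lookup. This is a sketch only: the function name is hypothetical, and the treatment of fractional scores (for example 89.5) is an assumption, since the table lists whole-number ranges.

```python
def rating_band(score: float) -> str:
    """Map a 0-100 MCP score to the rating band listed in the table above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "Excellent"
    if score >= 70:
        return "Good"
    if score >= 50:
        return "Fair"
    return "Needs improvement"


# The reported mean and median of 81 both land in the "Good" band.
print(rating_band(81))
```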
Where do servers do well - and fall short?
Each pass rate is the share of introspected servers that pass a given quality check, among the servers it applies to. Tool and parameter descriptions are what an agent reasons over; output schemas and annotations are newer best-practice signals that adoption is still catching up to. Error-output modeling (M13) is informational - it doesn’t affect the score - and surfaces a deeper gap: of the tools that do declare a schema, how many describe a failure path at all, so an agent can tell a failed call from an empty-but-successful one.
| Quality check | Servers | Pass rate |
|---|---|---|
| M2 Complete server metadata | 231 | 94% |
| M3 Every tool described | 231 | 99% |
| M4 Every parameter described | 231 | 65% |
| M5 Tools declare a specific output schema | 231 | 22% |
| M6 Tools carry annotations | 231 | 39% |
| M8 Resources well-formed | 61 | 84% |
| M9 Prompts described | 60 | 100% |
| M12 MCP Apps (ui://) served as HTML | 9 | 78% |
| M13 Schema-bearing tools that model an error path | 63 | 5% |
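
Because each pass rate is taken only over the servers a check applies to, the denominator changes from row to row. One way to sketch that, assuming each server's result for a check is recorded as `True`, `False`, or `None` for "not applicable" (this encoding, the function name, and the sample data are all hypothetical):

```python
from typing import Optional


def pass_rate(results: list[Optional[bool]]) -> tuple[int, Optional[float]]:
    """Return (applicable_count, pass_rate) for one check.

    `None` entries mean the check does not apply to that server, so they
    are dropped from both the numerator and the denominator.
    """
    applicable = [r for r in results if r is not None]
    if not applicable:
        return 0, None  # no applicable servers: a rate would be meaningless
    return len(applicable), sum(applicable) / len(applicable)


# Made-up results for five servers, two of which expose no resources at all.
resources_check = [True, None, False, True, None]
count, rate = pass_rate(resources_check)
print(count, rate)
```

Here the rate is computed over three servers rather than five, which is why checks such as M8 or M12 report a smaller server count than M2 through M6.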
How was this measured?
Every figure is computed from 264 of the most-used MCP servers on the Smithery registry, all scanned on 2026-06-18. We keep the most recent scan per endpoint, exclude servers operated by Agent Ready, and name no individual server - the report is aggregate and anonymous by design. Score, tool-count, and quality stats are computed only over the 231 servers we could introspect; auth-gated and unreachable servers count only toward connectability.
The sample is the most popular slice of registry-hosted servers that expose a remote endpoint, so it runs ahead of the long tail of local-only servers. Read it as a trend signal, not a census. The M1-M13 checks and scoring rules live on the methodology page.
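
The two aggregation rules described above (keep the most recent scan per endpoint, and compute score statistics only over introspected servers) might look roughly like this. The `Scan` record, its field names, and the sample data are hypothetical; the sketch also assumes timestamps are ISO 8601 strings in one consistent format, so that string comparison matches time order.

```python
from dataclasses import dataclass
from statistics import mean, median
from typing import Optional


@dataclass
class Scan:
    endpoint: str
    scanned_at: str       # ISO 8601, same format throughout (assumption)
    outcome: str          # "connected", "refused", or "failed"
    score: Optional[int]  # None when the server could not be introspected


def latest_per_endpoint(scans: list[Scan]) -> list[Scan]:
    """Keep only the most recent scan for each endpoint."""
    latest: dict[str, Scan] = {}
    for scan in scans:
        current = latest.get(scan.endpoint)
        if current is None or scan.scanned_at > current.scanned_at:
            latest[scan.endpoint] = scan
    return list(latest.values())


def score_stats(scans: list[Scan]) -> Optional[tuple[float, float]]:
    """Mean and median score over introspected servers only.

    Refused and failed servers are left out entirely rather than being
    counted as a score of 0.
    """
    scores = [
        s.score
        for s in latest_per_endpoint(scans)
        if s.outcome == "connected" and s.score is not None
    ]
    if not scores:
        return None
    return mean(scores), median(scores)


# Made-up data: endpoint "a" was scanned twice, "b" refused the handshake.
scans = [
    Scan("a", "2026-06-18T01:00:00Z", "connected", 70),
    Scan("a", "2026-06-18T09:00:00Z", "connected", 90),
    Scan("b", "2026-06-18T02:00:00Z", "refused", None),
    Scan("c", "2026-06-18T03:00:00Z", "connected", 80),
]
stats = score_stats(scans)
print(stats)
```

In this sample the older scan of "a" is discarded and "b" contributes nothing to the average, so the statistics are taken over the scores 90 and 80 only.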
Frequently asked questions
- Where does this data come from?
- We took the most-used MCP servers listed on the Smithery registry (ranked by Smithery's useCount), connected to each through Smithery's hosted gateway with our own MCP scanner, and aggregated the results. We keep one scan per endpoint - the most recent - and compute statistics across those distinct servers. No individual server is named anywhere on this page; the report is deliberately aggregate and anonymous.
- Why can't every server be scored - what does 'couldn't connect' mean?
- Our scanner connects over remote Streamable HTTP and reads what a server advertises. A server that requires its own credential (an API key for the underlying service) refuses the unauthenticated handshake, so we can't see its tools. Those servers count toward the connectability finding - 'can an agent use this without a per-server credential?' - but are excluded from the score, tool-count, and quality figures entirely. Folding their unscored 0s into the averages would be misleading, so we don't.
- Is this representative of all MCP servers?
- No. The sample is the most popular servers that Smithery hosts with a remote endpoint, which skews toward actively maintained, deployment-ready servers. Servers distributed only as a local stdio/npm package (the bulk of every registry) have no remote endpoint to scan and aren't included, nor are servers from other registries. Read the numbers as 'among the most popular remotely-reachable MCP servers', not as a census of the ecosystem.
- How is the MCP score calculated?
- Each scan runs the M1-M13 checks - handshake, server metadata, tool/parameter descriptions, output schemas, annotations, naming, resources, prompts, capability honesty, MCP Apps, and (informational) error-output modeling - weighted toward tool quality, with checks that don't apply to a given server excluded. The full weighting and the pass/warn/fail rules are on the methodology page. Output schemas and annotations are graded as best-practice adoption, not protocol compliance, and an output schema only counts if it declares specific fields - a bare {"type":"object"} doesn't.
- Do MCP tools describe what happens when they fail?
- Mostly no - and it's the gap hiding under the output-schema number. Of the tools that do declare an output schema, almost all model only the success case; the error path comes back as unstructured text. That means an agent can't cleanly tell 'the tool failed, retry or back off' from 'the tool succeeded with an empty result' - the other place tool chains silently break. We surface this as an informational signal (M13): the share of schema-bearing tools whose schema admits a failure path (an error/status field or a union variant). It doesn't affect the score. We can't observe the runtime isError flag because the scan is read-only and never calls a tool, so this measures what the schema itself declares.
- How often do these numbers change?
- The page recomputes from the corpus on a daily cycle. The 'as of' date at the top reflects the latest scan included; cite the date alongside any statistic for a fixed reference. Where the introspected sample is small, figures lead with raw counts and carry a caveat.
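
Two of the answers above describe schema-level rules: an output schema only counts if it declares specific fields, and M13 asks whether a schema admits a failure path. The following is a rough sketch of how such checks could be written against a JSON Schema held as a Python dictionary. It is not the scanner's actual logic: the function names, the list of error-like field names, and the decision to count a union variant only when it carries such a field are all assumptions.

```python
from typing import Any

# Hypothetical list of property names treated as signalling a failure path.
ERROR_FIELD_NAMES = {"error", "errors", "status"}


def has_specific_output_schema(schema: Any) -> bool:
    """True if the schema declares at least one named field.

    A bare {"type": "object"} has no "properties" and therefore fails.
    """
    if not isinstance(schema, dict):
        return False
    properties = schema.get("properties")
    return isinstance(properties, dict) and len(properties) > 0


def models_error_path(schema: Any) -> bool:
    """True if the schema has an error/status field, directly or in a union variant."""
    if not isinstance(schema, dict):
        return False
    properties = schema.get("properties")
    if isinstance(properties, dict):
        names = {str(name).lower() for name in properties}
        if names & ERROR_FIELD_NAMES:
            return True
    # Look inside union variants (JSON Schema "oneOf" / "anyOf").
    for keyword in ("oneOf", "anyOf"):
        variants = schema.get(keyword)
        if isinstance(variants, list) and any(models_error_path(v) for v in variants):
            return True
    return False


bare = {"type": "object"}
success_only = {"type": "object", "properties": {"items": {"type": "array"}}}
with_error = {
    "oneOf": [
        success_only,
        {"type": "object", "properties": {"error": {"type": "string"}}},
    ]
}
print(has_specific_output_schema(bare), models_error_path(success_only), models_error_path(with_error))
```

Like the scan itself, this inspects only what the schema declares; it never calls a tool, so it says nothing about the runtime isError flag.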