Validate Mcp-Param-* headers server-side on the 2026-07-28 HTTP path (SEP-2243) #3033
…(SEP-2243)

Servers that process the request body MUST validate that Mcp-Param-* headers match the corresponding x-mcp-header-annotated body arguments, rejecting mismatches with HTTP 400 and -32020 (HeaderMismatch). The client half shipped earlier; this adds the server half:

- mcp.shared.inbound grows a pure, exported validate_mcp_param_headers built on the same walker and scalar rendering the client emit side uses, so the two halves of the mirror contract cannot drift. Presence rules follow the spec's scenario table; recognized headers supplied more than once are rejected (first-wins consumers vs last-wins validation would otherwise disagree); integer values compare numerically behind a canonical-decimal gate, exactly and in both directions; unrecognized headers stay ignored.
- decode_header_value now requires canonical base64 in the sentinel (bad padding, stray characters, non-zero trailing bits, or invalid UTF-8 are malformed), which the conformance suite mandates for Mcp-Param-* and which now applies to Mcp-Name symmetrically.
- The modern HTTP entry validates tools/call pre-dispatch, resolving the called tool's inputSchema through the server's own registered tools/list handler via the normal serve_one path with the caller's envelope - so a visibility-scoped catalog validates exactly what this caller was advertised, with nothing to configure on MCPServer or lowlevel servers. The listing is skipped (never failing the call) when no tools/list handler is registered, the tool is not advertised, the handler raises (logged), pagination exceeds a page cap or cycles, or the call has no arguments and no Mcp-Param-* headers.
- Remove http-custom-header-server-validation from both conformance expected-failures baselines; the scenario passes 9/9 against the everything-server, with http-header-validation and server-stateless unchanged.
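The canonical-base64 requirement can be illustrated with a minimal sketch. The function name `strict_sentinel_decode` and its None-on-malformed convention are hypothetical and are not the signature of decode_header_value; only the `=?base64?...?=` sentinel shape and the four rejection cases (bad padding, stray characters, non-zero trailing bits, invalid UTF-8) come from this PR.

```python
import base64
import binascii
from typing import Optional

_PREFIX = "=?base64?"
_SUFFIX = "?="


def strict_sentinel_decode(value: str) -> Optional[str]:
    """Decode a sentinel-wrapped value; return None when it is malformed.

    Hypothetical helper: values that do not carry the sentinel are
    returned unchanged.
    """
    wrapped = (
        len(value) >= len(_PREFIX) + len(_SUFFIX)
        and value.startswith(_PREFIX)
        and value.endswith(_SUFFIX)
    )
    if not wrapped:
        return value
    payload = value[len(_PREFIX):-len(_SUFFIX)]
    try:
        # validate=True rejects characters outside the base64 alphabet;
        # wrong padding also raises binascii.Error.
        raw = base64.b64decode(payload, validate=True)
    except (binascii.Error, ValueError):
        return None
    # Re-encoding must reproduce the payload exactly. This catches
    # non-zero trailing bits, which the lenient decoder would accept.
    if base64.b64encode(raw).decode("ascii") != payload:
        return None
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None
```

The re-encode comparison is what makes the check canonical: any two payloads that decode to the same bytes but differ textually cannot both pass.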
…ate routing headers, parse-limit 500s

- A value's header rendering (or its absence) is now a single shared fact (_render_header_scalar returns None for non-primitives and integers beyond the int-to-str digit limit): the client omits such headers, the validator treats a header claiming such a value as a mismatch, and a huge-integer body can no longer raise out of the public validator.
- A non-canonical-decimal header for an integer declaration falls back to the rendered comparison instead of rejecting outright, so the client's own scientific-notation rendering of large integral floats (str(1e16) == '1e+16') round-trips against this server.
- Duplicated MCP-Protocol-Version / Mcp-Method / Mcp-Name raw header lines are rejected before classification (find_duplicated_routing_header) - the same first-wins/last-wins divergence the Mcp-Param duplicate rejection closes, where it matters most.
- The synthetic listing's fail-open boundary now covers the result scan (a middleware short-circuiting tools/list with a mis-shaped result is a logged skip, not a 500), and a mis-shaped envelope failing the listing's surface validation logs at debug as client fault instead of an error-level traceback blaming the tools/list handler.
- json.loads failures are caught as ValueError (an integer literal beyond the digit limit raises the bare parent, not JSONDecodeError), keeping unparseable bodies at 400 + PARSE_ERROR.
- migration.md: handler-raise skip is logged as an error, not a warning; document the omitted-unrenderable-value and duplicate-line rules.
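The integer comparison with its canonical-decimal gate and rendered fallback might look like the sketch below. `integer_header_matches` is a hypothetical name, the rendering is assumed to be plain `str()` (which is consistent with the `str(1e16) == '1e+16'` example), and treating booleans as non-integers is an assumption of this sketch rather than a documented rule.

```python
import re
from typing import Union

# Canonical decimal: no leading zeros, no exponent, no "+" sign, no "-0".
_CANONICAL_INT = re.compile(r"0|-?[1-9][0-9]*")


def integer_header_matches(header: str, body: Union[int, float]) -> bool:
    """Compare a header against a body value declared as an integer."""
    if isinstance(body, bool):
        return False  # assumption: bool is not accepted as an integer here
    if _CANONICAL_INT.fullmatch(header):
        try:
            claimed = int(header)
        except ValueError:
            # Newer Python versions refuse to parse very long digit strings.
            return False
        if isinstance(body, float):
            # Exact comparison: convert the integral float to int rather
            # than round-tripping the header through float.
            return body.is_integer() and int(body) == claimed
        return body == claimed
    # Non-canonical header: fall back to comparing against the rendering.
    try:
        rendered = str(body)
    except ValueError:
        return False  # integer too large to render
    return header == rendered
```

With this shape, `"42"` matches both `42` and `42.0`, `"1e2"` does not match `100`, and `"1e+16"` still matches the float `1e16` through the fallback.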
…dler faults in the schema listing

json.loads raises RecursionError, not ValueError, for deeply nested bodies - widen the parse guard so they stay 400 + PARSE_ERROR. The synthetic tools/list now surface-validates its params before dispatch: a mis-shaped envelope is caught up front (debug, client fault), so a ValidationError escaping serve_one can only be handler-origin and gets the error-level log the fail-open skip promises.
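A widened parse guard of the kind described here could be sketched as follows. `parse_request_body` and its tuple return shape are hypothetical; `-32700` is the standard JSON-RPC 2.0 parse-error code, and the HTTP 400 status would be set by the caller.

```python
import json
from typing import Any, Optional, Tuple

PARSE_ERROR = -32700  # JSON-RPC 2.0 "Parse error"


def parse_request_body(raw: bytes) -> Tuple[Optional[Any], Optional[dict]]:
    """Return (parsed, None) on success or (None, error_payload) on failure."""
    try:
        return json.loads(raw), None
    except (ValueError, RecursionError):
        # ValueError covers JSONDecodeError, invalid UTF-8, and over-long
        # integer literals; RecursionError covers deeply nested bodies.
        error = {
            "jsonrpc": "2.0",
            "id": None,
            "error": {"code": PARSE_ERROR, "message": "Parse error"},
        }
        return None, error
```

Callers should test the second element, since a body of `null` legitimately parses to `None`.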
    mcp_param_rejection = await _mcp_param_rejection(app, request, req, verdict, lifespan_state)
    if mcp_param_rejection is not None:
        await _write_rejection(mcp_param_rejection, req.id, scope, receive, send)
        return

🟡 In SSE mode (json_response=False), the new pre-dispatch Mcp-Param validation phase runs before the SSE deferral/keepalive machinery, so a 2026-07-28 tools/call writes no bytes to the wire while the internal tools/list schema walk runs (up to 100 paginated serve_one round trips). A deployment whose tools/list handler is slower than the upstream proxy's idle-read timeout previously worked (the keepalive committed within 15s of dispatch) but would now have every validated tools/call reset before dispatch — consider bounding the schema-resolving walk with a timeout that degrades to the existing logged fail-open skip.
Extended reasoning...
What happens. _mcp_param_rejection is awaited in handle_modern_request (src/mcp/server/_streamable_http_modern.py:395-399) before the SSE branch is reached — before send_ch/run_handler are constructed and before the anyio.move_on_after(_SSE_PING_INTERVAL) deferral windows exist. For a 2026-07-28 tools/call with non-empty arguments or any Mcp-Param-* header (the gate only skips when both are absent — it does not depend on the tool actually carrying x-mcp-header annotations), the entry awaits _tool_input_schema, which dispatches an internal tools/list through serve_one for up to _MCP_PARAM_LIST_PAGE_CAP = 100 pages, each through middleware and the user handler with a fresh Connection. Nothing is written to the wire while that walk runs.
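The shape of such a bounded listing walk, with a page cap and cursor-cycle detection, can be sketched roughly as below. This is an illustration, not the PR's _tool_input_schema: `find_input_schema` and the `list_page` callable are hypothetical, and only the `tools`, `inputSchema`, and `nextCursor` result fields and the cap of 100 are taken from the protocol and the text above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Set

PAGE_CAP = 100  # same value as the cap quoted above

# Hypothetical pager: takes a cursor (None for the first page) and
# returns one tools/list result as a dict.
ListPage = Callable[[Optional[str]], Awaitable[Dict[str, Any]]]


async def find_input_schema(
    list_page: ListPage, tool_name: str, page_cap: int = PAGE_CAP
) -> Optional[Dict[str, Any]]:
    """Return the tool's inputSchema, or None when validation should be skipped."""
    seen: Set[str] = set()
    cursor: Optional[str] = None
    for _ in range(page_cap):
        page = await list_page(cursor)
        for tool in page.get("tools", []):
            if tool.get("name") == tool_name:
                return tool.get("inputSchema")  # stop as soon as it is found
        cursor = page.get("nextCursor")
        if cursor is None:
            return None  # listing exhausted: tool not advertised
        if cursor in seen:
            return None  # cursor repeated: cycle
        seen.add(cursor)
    return None  # page cap exceeded


if __name__ == "__main__":
    async def one_page(cursor: Optional[str]) -> Dict[str, Any]:
        return {"tools": [{"name": "echo", "inputSchema": {"type": "object"}}]}

    print(asyncio.run(find_input_schema(one_page, "echo")))
```

Every `await list_page(...)` in this loop is time during which nothing is written to the wire, which is the silent window the comment is concerned with.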
Why this is a coverage regression rather than just extra cost. The module's own docstring states the SSE deferral exists so that "a handler that runs silent past the window commits SSE so the keepalive ping can keep the connection open behind a proxy idle-read timeout" — i.e. the design explicitly bounds the silent window at _SSE_PING_INTERVAL (15s) precisely because slow work behind proxy idle timeouts is an acknowledged deployment reality. The new validation phase sits entirely outside that guarantee: the maximum silent time before the first byte grows from ~15s to (full listing duration + 15s). docs/migration.md documents that "middleware and an expensive or paginated tools/list handler see extra invocations" — the cost — but not the loss of keepalive coverage.
Concrete walkthrough. 1) An SSE-mode (default json_response=False) deployment sits behind a proxy with a 60s idle-read timeout, and its tools/list handler walks a slow paginated catalog backend taking ~90s. 2) Pre-PR: a tools/call is dispatched immediately; within 15s the entry either replies or commits text/event-stream and starts : ping keepalives, so the proxy never sees 60s of silence — the call succeeds. 3) Post-PR: the same tools/call (it has arguments, so the gate fires) first runs the internal tools/list walk; the connection is byte-silent for ~90s; the proxy resets it at 60s; the request never reaches dispatch. Every validated tools/call to that deployment now fails the same way, and the failure is a connection reset rather than a diagnosable JSON-RPC error.
Why it is narrow. The trigger requires SSE mode plus a tools/list handler (or paginated catalog walk) slower than the proxy idle-read timeout — typically 30-60s, which is unusual; most catalogs list from memory in milliseconds, and the walk stops as soon as the called tool is found. JSON-mode deployments were never protected by a keepalive, so they are unchanged. The placement before SSE is also partly forced by the spec: a HEADER_MISMATCH rejection MUST be a plain application/json 400, which cannot be honored after SSE has committed, so simply moving the validation under the keepalive machinery is not a drop-in fix.
Suggested fix. Bound the schema-resolving walk with a wall-clock timeout (e.g. wrap the _tool_input_schema call in anyio.move_on_after(...)) that degrades to the existing logged fail-open skip — the same availability-over-strictness posture already taken for a raising handler, cursor cycle, or page cap. Alternatively, document the limitation in docs/migration.md alongside the extra-invocations note, or use it as motivation for the registry fast-path the PR description already anticipates.
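The suggested bound could look roughly like the following. This sketch uses the standard library's `asyncio.wait_for` in place of `anyio.move_on_after` so that it runs without third-party dependencies; `resolve_schema_bounded`, its parameters, and the log message are hypothetical, and the timeout value is left to the caller.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Optional

logger = logging.getLogger(__name__)

# Hypothetical resolver: performs the schema-resolving walk and returns
# the tool's inputSchema, or None when validation should be skipped.
Resolver = Callable[[], Awaitable[Optional[Dict[str, Any]]]]


async def resolve_schema_bounded(
    resolve: Resolver, timeout_s: float
) -> Optional[Dict[str, Any]]:
    """Run the schema walk under a wall-clock bound, failing open on timeout."""
    try:
        return await asyncio.wait_for(resolve(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # Same posture as a raising handler or a cursor cycle: log and
        # skip validation rather than fail the call.
        logger.warning("schema listing exceeded %.1fs; skipping validation", timeout_s)
        return None
```

A timeout comfortably below common proxy idle-read limits would keep the pre-dispatch phase from outlasting the connection, at the cost of skipping validation for slow catalogs.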
Implements the server half of SEP-2243 custom headers for the 2026-07-28 Streamable HTTP path: tools/call requests are validated Mcp-Param-* header ↔ body before dispatch, and mismatches are rejected with HTTP 400 + JSON-RPC -32020 (HeaderMismatch). The client half (mirroring, sentinel codec, annotation validation) already shipped; this closes the remaining server-side gap and removes the http-custom-header-server-validation entries from both conformance expected-failures baselines.

Refs #2900, #2715.
Motivation and Context
The draft transport spec requires it (Server Validation): "Any server that processes the message body MUST validate that encoded header values, after decoding if Base64-encoded, match the corresponding values in the request body. Servers MUST reject requests with a 400 Bad Request HTTP status and JSON-RPC error code -32020 (HeaderMismatch) if any validation fails." Until now the modern entry validated MCP-Protocol-Version / Mcp-Method / Mcp-Name only; the http-custom-header-server-validation conformance scenario ran 3 accept-pass / 6 reject-fail and was baselined as a known failure.

Design
Schema source = the server's own registered tools/list handler. The validator needs the called tool's inputSchema to know which Mcp-Param-* headers are recognized (the header name comes from the x-mcp-header annotation value, not the property name, so without the schema there is nothing to compare). Rather than adding a registry hook or configuration, the entry resolves the schema by dispatching an internal tools/list through the normal serve_one path with the caller's own envelope. Properties of this approach:

- Nothing to configure on MCPServer or on low-level servers: anyone serving tools/list gets validation automatically.
- The listing is skipped (never failing the call) when no tools/list handler is registered (an undiscoverable catalog has no recognized headers), the tool isn't in the listing (dispatch owns unknown-tool), the handler raises (logged at exception level; a deliberate availability-over-strictness call: a server broken here is equally broken for real discovery, and the skip is just the pre-feature status quo for that request), the pagination walk exceeds a 100-page cap or repeats a cursor (logged), or the call has no arguments and no Mcp-Param-* headers (no declaration can be violated in either direction).
- … tools/call: middleware, lifespans, and expensive/paginated tools/list handlers see extra invocations. Documented in docs/migration.md; optimizable later behind the same surface (e.g. a registry fast-path for the built-in handler) without API change.

Validation semantics (pure, exported mcp.shared.inbound.validate_mcp_param_headers, sharing the schema walker and scalar rendering with the client emit side so the two halves cannot drift):

- null/absent → header must not be expected. A header present for an absent/null argument is rejected: the spec's table doesn't pin this cell, but its purpose clause ("a load balancer routing on the header value while the MCP server executes based on the body value") is exactly this case; the Go SDK rejects too, the TypeScript SDK skips.
- Canonical base64 is required in the =?base64?...?= sentinel: wrong padding, non-alphabet characters, non-zero trailing bits, or invalid UTF-8 are malformed (the conformance suite mandates strict rejection; Python's default lenient b64decode would wrongly accept two of its reject cases). This now applies to Mcp-Name decoding symmetrically.
- Integer values compare numerically (42 ≡ 42.0, the spec's SHOULD) behind a canonical-decimal gate (no 1e2), exactly (no float round-trip), and in both directions (an integral-float body value like 42.0 matches).
- Unrecognized Mcp-Param-* headers are forwarded-and-ignored per the spec; a header claiming a non-primitive body value is a mismatch, while a non-primitive value without a header is left to params validation at dispatch.

How Has This Been Tested?
- … (conformance@4944b268, locally against the everything-server): http-custom-header-server-validation 9/9 (was 3/9); http-header-validation 13/13 (unchanged; guards the shared decoder hardening); server-stateless 25/28 with exactly the 3 pre-existing known failures.
- … initialize on the same endpoint is untouched.
- ./scripts/test: 4750 passed, 100% branch coverage, strict-no-cover clean. New unit matrix in tests/shared/test_inbound.py mirrors every conformance leg plus the edge semantics (duplicates, huge-digit headers, integral floats, nested paths, invalid-annotation schemas); entry-level wire tests in tests/server/test_streamable_http_modern.py cover skip paths (no list handler, unlisted tool, raising handler, cursor cycle, page cap), pagination, SSE-mode 400-as-JSON, and envelope isolation of the internal listing.

Breaking Changes
Spec-mandated behavior change on the 2026-07-28 path only (documented in docs/migration.md): a client that sends an annotated argument without its mirroring header (e.g. one that calls a tool it never listed) is now rejected with 400/-32020 instead of silently served. The spec's recovery is to re-list and retry (a client-side SHOULD this SDK does not implement yet; the TypeScript SDK does). Base64-sentinel decoding, including for Mcp-Name, is now strictly canonical. Legacy (≤2025-11-25) traffic is unaffected.

Types of changes
Checklist
Additional context
The orphan-header cell (header present, argument absent) is genuinely unpinned by the spec's table and the SDKs split on it (Go rejects, TypeScript skips); this PR takes the reject posture with the purpose-clause rationale above. May be worth a small spec clarification upstream.
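The reject posture for the orphan-header cell, together with the other presence rules, can be sketched as follows. This is a simplified illustration and not validate_mcp_param_headers itself: `check_param_headers` is a hypothetical name, only top-level properties are handled, values are compared as plain strings, and the mapping from an x-mcp-header annotation value to an `Mcp-Param-<value>` header name is assumed from the description above.

```python
from typing import Any, Dict, List, Optional, Tuple

_PREFIX = "mcp-param-"


def check_param_headers(
    schema: Dict[str, Any],
    arguments: Dict[str, Any],
    raw_headers: List[Tuple[str, str]],
) -> Optional[str]:
    """Return a reason for the first mismatch, or None when consistent."""
    # Group Mcp-Param-* values by lower-cased name so duplicates stay visible.
    seen: Dict[str, List[str]] = {}
    for name, value in raw_headers:
        key = name.lower()
        if key.startswith(_PREFIX):
            seen.setdefault(key, []).append(value)

    for prop, decl in schema.get("properties", {}).items():
        annotation = decl.get("x-mcp-header")
        if annotation is None:
            continue  # not annotated: nothing to mirror
        values = seen.get(_PREFIX + annotation.lower(), [])
        if len(values) > 1:
            return f"duplicate header for {prop}"
        arg = arguments.get(prop)
        if arg is None:
            if values:
                return f"orphan header for {prop}"  # the reject posture
            continue
        if not values:
            return f"missing header for {prop}"
        if values[0] != str(arg):  # simplification: string comparison only
            return f"value mismatch for {prop}"
    # Headers that match no annotation are ignored.
    return None
```

A non-None result would map to the HTTP 400 + -32020 rejection; switching the orphan branch to `continue` would give the skip posture instead.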
AI Disclaimer