5 changes: 5 additions & 0 deletions .changeset/fix-transport-restart-after-close.md
@@ -0,0 +1,5 @@
---
'@modelcontextprotocol/client': patch
---

Allow `StreamableHTTPClientTransport` and `SSEClientTransport` to restart after `close()`. `close()` now clears `_abortController` (previously aborted but not unset, blocking the start guard) and `_sessionId` (previously leaked into post-restart requests, causing 404s).
3 changes: 3 additions & 0 deletions packages/client/src/client/sse.ts
@@ -241,7 +241,10 @@ export class SSEClientTransport implements Transport {

async close(): Promise<void> {
this._abortController?.abort();
this._abortController = undefined;
this._eventSource?.close();
this._eventSource = undefined;
this._endpoint = undefined;
this.onclose?.();
}

20 changes: 16 additions & 4 deletions packages/client/src/client/streamableHttp.ts
@@ -341,10 +341,16 @@ export class StreamableHTTPClientTransport implements Transport {
// Calculate next delay based on current attempt count
const delay = this._getNextReconnectionDelay(attemptCount);

// Capture the signal active when this reconnection was scheduled. close() + start()
// replaces this._abortController, so re-reading it later would see the new session's
// controller and allow a stale reconnect to fire into the restarted transport.
const signal = this._abortController?.signal;

const reconnect = (): void => {
this._cancelReconnection = undefined;
if (this._abortController?.signal.aborted) return;
if (signal?.aborted) return;
this._startOrAuthSse(options).catch(error => {
if (signal?.aborted) return;
this.onerror?.(new Error(`Failed to reconnect SSE stream: ${error instanceof Error ? error.message : String(error)}`));
try {
this._scheduleReconnection(options, attemptCount + 1);
Comment on lines 341 to 356
🔴 The PR closes the ghost-reconnect race in _scheduleReconnection and _handleSseStream by capturing the abort signal at schedule time, but leaves two gaps: (1) _startOrAuthSse reads this._abortController?.signal live after await this._commonHeaders(), so a close()+start() that runs while the function is suspended during an OAuth token refresh goes undetected, and the ghost fetch opens on the new lifecycle; (2) the .catch handler has no second signal?.aborted guard after this.onerror?.(), so a synchronous close()+start() inside that callback causes the recursive _scheduleReconnection call to capture the new lifecycle's signal (S2), scheduling a ghost reconnect under the new lifecycle. Fix: capture const signal = this._abortController?.signal at the top of _startOrAuthSse and use it for the fetch; add if (signal?.aborted) return; after this.onerror?.() and before the recursive _scheduleReconnection call.

Extended reasoning...

Bug 1: _startOrAuthSse live-reads this._abortController?.signal after async suspension

The PR explicitly explains its signal-capture design in the comment added to _scheduleReconnection (lines 344-346): close() + start() replaces this._abortController, so re-reading it later would see the new session's controller. The same capture is done in _handleSseStream (line 379). However, _startOrAuthSse does neither - it captures nothing before await this._commonHeaders() (line 239 of the modified file), then reads signal: this._abortController?.signal live at line 253.

The dangerous code path: _commonHeaders() calls await this._authProvider?.token(). For OAuth providers, token() may trigger a network token-refresh request, suspending for hundreds of milliseconds. During that window the JS event loop is free to run other tasks, including close() (which aborts the old controller) and start() (which creates a fresh AbortController - enabled by this very PR). After _commonHeaders() resolves, the live read at line 253 now sees the NEW lifecycle's non-aborted signal. The fetch proceeds, and _handleSseStream at line 379 captures that same new signal, fully binding the ghost SSE stream to the new lifecycle.

Step-by-step proof: (1) Old-lifecycle reconnect() fires; signal?.aborted guard passes (S1 not yet aborted). (2) _startOrAuthSse(options) is called; execution reaches await this._commonHeaders() and suspends during OAuth token refresh. (3) While suspended, user code calls close() - S1 is aborted - then start() - this._abortController = new AbortController() (S2, not aborted). (4) _commonHeaders() resolves. Line 253 reads this._abortController?.signal = S2. (5) The GET fetch is issued with S2 as its abort signal and the stale Last-Event-ID from the old session. (6) _handleSseStream captures S2, binding the ghost stream to the new lifecycle. The .catch guard at line 353 only prevents further reconnects if the attempt fails; it does not prevent the ghost stream from opening successfully.

The fix: add const signal = this._abortController?.signal; at the very top of _startOrAuthSse and use signal in the fetch call. Add if (signal?.aborted) return; after the await to exit early if the lifecycle was replaced during suspension.
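The capture-at-entry fix can be sketched in isolation. Everything below is illustrative, not the real transport: `Lifecycle` stands in for the transport class, `openStream` for `_startOrAuthSse`, and the awaited `Promise.resolve()` for the suspension inside `_commonHeaders()`:

```typescript
// Minimal sketch of the capture-at-entry fix. `Lifecycle`, `openStream`,
// and `doFetch` are hypothetical names, not the real transport API.
class Lifecycle {
    private controller?: AbortController;

    start(): void {
        this.controller = new AbortController();
    }

    close(): void {
        this.controller?.abort();
        this.controller = undefined;
    }

    // Capture the signal ONCE at entry. A close()+start() during the await
    // replaces this.controller, so a live re-read after the suspension would
    // see the new lifecycle's non-aborted signal and let the ghost fetch run.
    async openStream(doFetch: (signal: AbortSignal) => Promise<void>): Promise<void> {
        const signal = this.controller?.signal;
        await Promise.resolve(); // stands in for `await this._commonHeaders()`
        if (!signal || signal.aborted) return; // lifecycle closed or replaced
        await doFetch(signal);
    }
}
```

With a live re-read in place of the captured `signal`, the same interleaving passes the guard and binds the fetch to the new lifecycle, which is exactly the race described above.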

Bug 2: TOCTOU between this.onerror?.() and recursive _scheduleReconnection

In the .catch handler of the reconnect() closure, the PR adds a single signal?.aborted guard before this.onerror?.() but adds no second guard between the onerror call and the recursive this._scheduleReconnection(options, attemptCount + 1). The onerror callback is typed (error: Error) => void - it is a synchronous callback. This PR specifically enables the pattern of calling close() + start() from onerror to implement OAuth restart flows.

Step-by-step proof: (1) _startOrAuthSse fails with a network error. (2) .catch fires; signal?.aborted check passes (S1 not yet aborted). (3) this.onerror?.(error) is called synchronously. (4) Inside onerror, user calls transport.close() - S1 is aborted synchronously (no await before _abortController?.abort()). Then transport.start() - this._abortController = new AbortController() (S2) runs synchronously. (5) onerror returns; execution falls to the try { this._scheduleReconnection(options, attemptCount + 1) } block. (6) Inside _scheduleReconnection, const signal = this._abortController?.signal captures S2 (not aborted). A new timer is set. (7) When the timer fires, if (signal?.aborted) return checks S2 - not aborted - and _startOrAuthSse(stale_options) runs under the new lifecycle with old-session options.

Before this PR, start() after close() threw "already started", so step 4 would have thrown and the TOCTOU was unreachable. This PR is the enabling change.

The fix: add if (signal?.aborted) return; immediately after this.onerror?.(error) and before the try { this._scheduleReconnection(...) } block. This mirrors the existing pre-onerror guard and closes the window.

No refutations to address - all verifiers confirmed both races are real.

@@ -369,6 +375,9 @@ export class StreamableHTTPClientTransport implements Transport {
}
const { onresumptiontoken, replayMessageId } = options;

// Capture the signal this stream is bound to so we don't reconnect into a restarted transport.
const signal = this._abortController?.signal;

let lastEventId: string | undefined;
// Track whether we've received a priming event (event with ID)
// Per spec, server SHOULD send a priming event with ID before closing
@@ -436,7 +445,7 @@
// BUT don't reconnect if we already received a response - the request is complete
const canResume = isReconnectable || hasPrimingEvent;
const needsReconnect = canResume && !receivedResponse;
if (needsReconnect && this._abortController && !this._abortController.signal.aborted) {
if (needsReconnect && signal && !signal.aborted) {
this._scheduleReconnection(
{
resumptionToken: lastEventId,
@@ -455,7 +464,7 @@
// BUT don't reconnect if we already received a response - the request is complete
const canResume = isReconnectable || hasPrimingEvent;
const needsReconnect = canResume && !receivedResponse;
if (needsReconnect && this._abortController && !this._abortController.signal.aborted) {
if (needsReconnect && signal && !signal.aborted) {
// Use the exponential backoff reconnection strategy
try {
this._scheduleReconnection(
@@ -476,7 +485,7 @@
}

async start() {
if (this._abortController) {
if (this._abortController && !this._abortController.signal.aborted) {
throw new Error(
'StreamableHTTPClientTransport already started! If using Client class, note that connect() calls start() automatically.'
);
@@ -511,6 +520,9 @@
} finally {
this._cancelReconnection = undefined;
this._abortController?.abort();
this._sessionId = undefined;
this._lastUpscopingHeader = undefined;
this._serverRetryMs = undefined;
this.onclose?.();
}
}
Comment on lines 520 to 528
🔴 close() clears _sessionId, _lastUpscopingHeader, and _serverRetryMs but does NOT clear _resourceMetadataUrl or _scope. After a restart, if the new lifecycle receives a 401 without a WWW-Authenticate header (valid per spec), stale values flow into finishAuth() or the 403 upscoping auth() call, targeting the wrong OAuth server. Fix: add this._resourceMetadataUrl = undefined and this._scope = undefined to the finally block in close().

Extended reasoning...

The PR establishes a clear pattern: all per-lifecycle state must be reset in close() so the transport can be safely reused after restart. It correctly adds this._sessionId = undefined, this._lastUpscopingHeader = undefined, and this._serverRetryMs = undefined to the finally block. However, _resourceMetadataUrl (initialized to undefined in the constructor at line ~194) and _scope (line ~195) are omitted from this cleanup, breaking the same invariant.

Where these fields are set: _resourceMetadataUrl and _scope are set during 401 handling in _startOrAuthSse() (lines 260-261) and _send() (lines 580-581). _scope is also updated in the 403/insufficient_scope path (lines 620, 624). The 403 path is especially subtle: _resourceMetadataUrl is only updated conditionally (if (resourceMetadataUrl) { this._resourceMetadataUrl = resourceMetadataUrl; }), so if a new lifecycle 403 response omits the resource_metadata parameter, the stale lifecycle-1 URL persists.

Where these fields are consumed: Both are passed directly to auth() in finishAuth() (lines 508-509: resourceMetadataUrl: this._resourceMetadataUrl, scope: this._scope) and in the 403 upscoping path (lines 631-632). These values drive token endpoint discovery; using the wrong URL means the client contacts the wrong OAuth authorization server.

Step-by-step failure scenario: (1) Lifecycle 1: server returns 401 with WWW-Authenticate header containing resource_metadata=https://old-server/resource, setting _resourceMetadataUrl. (2) close() is called; _resourceMetadataUrl is NOT cleared. (3) OAuth server config changes to a new resource metadata URL. (4) start() is called (newly enabled by this PR). (5) New lifecycle triggers a 401 WITHOUT a www-authenticate header, which is valid per spec when the server assumes the client already has the metadata. The guard at line 258 (if response.headers.has('www-authenticate')) is false, so _resourceMetadataUrl is NOT refreshed. (6) finishAuth() or onUnauthorized calls auth() with the stale _resourceMetadataUrl, performing token endpoint discovery against the wrong server.

Why existing code does not prevent this: The 401 path only updates _resourceMetadataUrl/_scope when the www-authenticate header is present. A spec-compliant server may omit this header on subsequent 401s after the client has already received the metadata. Before this PR, start() after close() threw "already started", so users always created a new transport instance that re-initialized both fields to undefined in the constructor. This PR enables same-instance restart without the corresponding field resets, the same class of omission as _sessionId, _lastUpscopingHeader, and _serverRetryMs, which this PR already fixes.

Fix: Add this._resourceMetadataUrl = undefined and this._scope = undefined to the finally block in close(), alongside the existing this._sessionId = undefined, following the exact same pattern the PR already uses for the other per-lifecycle fields.
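Applied to the finally block, the suggested reset looks like this. The class below is a stripped-down stand-in, not the real transport; the field names mirror the PR, everything else is illustrative:

```typescript
// Sketch of a close() that resets ALL per-lifecycle state, including the
// two OAuth discovery fields (_resourceMetadataUrl, _scope) the comment
// flags as missing. The class is a hypothetical stand-in.
class RestartableTransport {
    private _abortController?: AbortController;
    private _sessionId?: string;
    private _lastUpscopingHeader?: string;
    private _serverRetryMs?: number;
    private _resourceMetadataUrl?: URL;
    private _scope?: string;
    onclose?: () => void;

    async start(): Promise<void> {
        if (this._abortController && !this._abortController.signal.aborted) {
            throw new Error('already started');
        }
        this._abortController = new AbortController();
    }

    async close(): Promise<void> {
        this._abortController?.abort();
        this._sessionId = undefined;
        this._lastUpscopingHeader = undefined;
        this._serverRetryMs = undefined;
        // The two resets the review says the PR omits:
        this._resourceMetadataUrl = undefined;
        this._scope = undefined;
        this.onclose?.();
    }
}
```

Because every per-lifecycle field returns to its constructor-time value, a restarted instance behaves like a freshly constructed one, which is the invariant the PR's other resets already establish.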

82 changes: 82 additions & 0 deletions packages/client/test/client/streamableHttp.test.ts
@@ -1998,6 +1998,31 @@ describe('StreamableHTTPClientTransport', () => {
expect(onerror).not.toHaveBeenCalled();
});

it('ignores a late-firing reconnect after close() + start()', async () => {
let capturedReconnect: (() => void) | undefined;
transport = new StreamableHTTPClientTransport(new URL('http://localhost:1234/mcp'), {
reconnectionOptions,
reconnectionScheduler: reconnect => {
capturedReconnect = reconnect;
}
});
const onerror = vi.fn();
transport.onerror = onerror;
const fetchMock = globalThis.fetch as Mock;

await transport.start();
triggerReconnection(transport);
await transport.close();
await transport.start();

fetchMock.mockClear();
capturedReconnect?.();
await vi.runAllTimersAsync();

expect(fetchMock).not.toHaveBeenCalled();
expect(onerror).not.toHaveBeenCalled();
});

it('still aborts and fires onclose if the cancel function throws', async () => {
transport = new StreamableHTTPClientTransport(new URL('http://localhost:1234/mcp'), {
reconnectionOptions,
@@ -2017,4 +2042,61 @@
expect(onclose).toHaveBeenCalledTimes(1);
});
});

describe('Transport restart after close()', () => {
it('should allow start() after close() and not send stale session ID', async () => {
transport = new StreamableHTTPClientTransport(new URL('http://localhost:1234/mcp'));

const fetchMock = globalThis.fetch as Mock;

// First lifecycle: start, receive a session ID, close
await transport.start();

fetchMock.mockResolvedValueOnce({
ok: true,
status: 200,
headers: new Headers({
'content-type': 'application/json',
'mcp-session-id': 'stale-session-abc'
}),
json: async () => ({ jsonrpc: '2.0', result: {}, id: 'init-1' })
});

await transport.send({ jsonrpc: '2.0', method: 'initialize', params: {}, id: 'init-1' });
expect(transport.sessionId).toBe('stale-session-abc');

await transport.close();
expect(transport.sessionId).toBeUndefined();

// Second lifecycle: start() should not throw
await transport.start();

fetchMock.mockResolvedValueOnce({
ok: true,
status: 202,
headers: new Headers(),
text: async () => ''
});

await transport.send({ jsonrpc: '2.0', method: 'notifications/initialized' });

// The post-restart request must NOT include the stale session ID
const postRestartHeaders = fetchMock.mock.calls[1]![1]?.headers as Headers;
expect(postRestartHeaders.get('mcp-session-id')).toBeNull();
});

it('should reset server-provided retry delay and upscoping header on close()', async () => {
transport = new StreamableHTTPClientTransport(new URL('http://localhost:1234/mcp'));
await transport.start();

const internal = transport as unknown as { _serverRetryMs?: number; _lastUpscopingHeader?: string };
internal._serverRetryMs = 3000;
internal._lastUpscopingHeader = 'Bearer realm="x"';

await transport.close();

expect(internal._serverRetryMs).toBeUndefined();
expect(internal._lastUpscopingHeader).toBeUndefined();
});
});
});