
fix: retry on Anthropic overloaded_error instead of halting session #20373

Open
OliPetry wants to merge 2 commits into anomalyco:dev from OliPetry:fix/retry-anthropic-overloaded-error
@OliPetry OliPetry commented Apr 1, 2026

Issue for this PR

Closes #20384

Type of change

  • Bug fix
  • New feature
  • Refactor / code improvement
  • Documentation

What does this PR do?

Anthropic returns {"type":"overloaded_error","message":"Overloaded"} as a stream error when their API is at capacity. This was not recognized by the error handling pipeline, causing sessions to halt with a terminal error instead of retrying.

The root cause: parseStreamError() in error.ts only checked for body.type === "error", so "overloaded_error" fell through. The error became a NamedError.Unknown, and retryable() in retry.ts didn't match it either since its JSON fallback path only checked for type === "error" variants.

The fix adds overloaded_error handling in two places:

  1. parseStreamError() — recognizes type: "overloaded_error" and returns it as a retryable api_error. Also widens ParsedStreamError.isRetryable from literal false to boolean so stream errors can be retryable.
  2. retryable() fallback — adds json.type === "overloaded_error" check as a safety net for the JSON parsing path.
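
The two changes above can be sketched roughly as follows. This is a simplified illustration, not the code from the diff: the function names `parseStreamErrorSketch` and `isOverloadedFallback` are hypothetical stand-ins, and the real `ParsedStreamError` type in error.ts may carry more fields than shown here.

```typescript
// Hypothetical, reduced shape of ParsedStreamError (assumption: the real type has more fields).
type ParsedStreamError = {
  type: "api_error"
  message: string
  isRetryable: boolean // widened from the literal `false`
}

// Stand-in for the new branch in parseStreamError(); other branches are omitted.
function parseStreamErrorSketch(raw: string): ParsedStreamError | undefined {
  let body: unknown
  try {
    body = JSON.parse(raw)
  } catch {
    return undefined // not JSON, so this branch does not apply
  }
  if (typeof body !== "object" || body === null) return undefined

  const { type, message } = body as { type?: unknown; message?: unknown }
  if (type === "overloaded_error") {
    return {
      type: "api_error",
      message: typeof message === "string" ? message : "Overloaded",
      isRetryable: true,
    }
  }
  return undefined
}

// Stand-in for the safety-net check in the retryable() JSON fallback path.
function isOverloadedFallback(raw: string): boolean {
  try {
    const json = JSON.parse(raw)
    return json?.type === "overloaded_error"
  } catch {
    return false
  }
}
```

With the payload quoted above, `parseStreamErrorSketch('{"type":"overloaded_error","message":"Overloaded"}')` yields an object with `isRetryable: true`, while a plain `{"type":"error"}` body is left to the existing branches.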

With this fix, overloaded errors trigger the existing exponential backoff retry logic (2s → 4s → 8s → 16s → 30s cap) instead of killing the session.
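
For reference, the schedule quoted above is consistent with a doubling delay capped at 30 seconds. A minimal sketch, assuming a 2 s initial delay and a factor of 2 (both constants are inferred from the schedule, not taken from retry.ts, and the real logic may also honor server-provided retry headers):

```typescript
// Assumed constants, inferred from the 2s → 4s → 8s → 16s → 30s schedule.
const INITIAL_DELAY_MS = 2000
const BACKOFF_FACTOR = 2
const MAX_DELAY_MS = 30000

// Delay before retry number `attempt` (1-based), capped at MAX_DELAY_MS.
function backoffDelayMs(attempt: number): number {
  const uncapped = INITIAL_DELAY_MS * Math.pow(BACKOFF_FACTOR, attempt - 1)
  return Math.min(uncapped, MAX_DELAY_MS)
}

// Delays for the first six attempts, in milliseconds.
const schedule = [1, 2, 3, 4, 5, 6].map(backoffDelayMs)
```

The fifth attempt would be 32 s uncapped, which is where the 30 s cap takes over.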

How did you verify your code works?

  1. Traced the error flow manually from the raw Anthropic error through fromError() → parseStreamError() → retryable(), confirming the classification bug
  2. Verified the fix path: parseStreamError() now returns {type: "api_error", isRetryable: true}, which fromError() wraps as MessageV2.APIError, which retryable() matches at lines 54-58 (existing isRetryable + "Overloaded" check)
  3. Unit tests and typecheck pass

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

kitlangton and others added 2 commits March 31, 2026 20:07
Anthropic returns {"type":"overloaded_error","message":"Overloaded"}
as a stream error when at capacity. This was not recognized by
parseStreamError() (which only handled type=="error") or the fallback
JSON check in retryable(), causing the session to halt with a terminal
error instead of retrying with exponential backoff.

- Add overloaded_error handling to parseStreamError() as a retryable
  api_error
- Widen ParsedStreamError.isRetryable from literal false to boolean
- Add overloaded_error check to retryable() fallback JSON path as
  a safety net
github-actions bot added the needs:compliance and needs:issue labels Apr 1, 2026
github-actions bot commented Apr 1, 2026

Thanks for your contribution!

This PR doesn't have a linked issue. All PRs must reference an existing issue.

Please:

  1. Open an issue describing the bug/feature (if one doesn't exist)
  2. Add Fixes #<number> or Closes #<number> to this PR description

See CONTRIBUTING.md for details.

github-actions bot removed the needs:compliance and needs:issue labels Apr 1, 2026
github-actions bot commented Apr 1, 2026

Thanks for updating your PR! It now meets our contributing guidelines. 👍

Development

Successfully merging this pull request may close these issues.

Anthropic overloaded_error is not retried, halts session

2 participants