
fix: cancel in-flight model task when a parallel input guardrail errors #3564

Open
bymle wants to merge 1 commit into openai:main from bymle:fix/parallel-guardrail-orphaned-model-task


bymle commented Jun 2, 2026

Summary

When input guardrails run in parallel with the model turn (run_in_parallel=True), the model turn is started with asyncio.create_task(...) and awaited together with the guardrails via asyncio.gather(...) in AgentRunner.run (src/agents/run.py).

The existing handler only catches InputGuardrailTripwireTriggered, in which case it cancels and drains the in-flight model_task:

except InputGuardrailTripwireTriggered:
    if should_cancel_parallel_model_task_on_input_guardrail_trip():
        if not model_task.done():
            model_task.cancel()
        await asyncio.gather(model_task, return_exceptions=True)
    ...
    raise

If a parallel guardrail (or the model turn) raises any other exception (for example a bug in a user's guardrail function, a ValueError, or a transient error), asyncio.gather propagates that exception to the caller but does not cancel the sibling awaitables, so model_task keeps running after run() has already raised. That orphaned task keeps a live model request going, wastes tokens and quota, and can surface as "Task was destroyed but it is pending!" or "Task exception was never retrieved" warnings.
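The gather behavior described above can be reproduced outside the SDK. The following is a minimal sketch using only the standard library; slow_model_turn and failing_guardrail are hypothetical stand-ins, not code from src/agents/run.py.

```python
import asyncio


async def slow_model_turn(log: list[str]) -> str:
    # Hypothetical stand-in for the in-flight model request.
    try:
        await asyncio.sleep(0.05)
        log.append("model finished")
        return "response"
    except asyncio.CancelledError:
        log.append("model cancelled")
        raise


async def failing_guardrail() -> None:
    # Hypothetical stand-in for a guardrail that raises a non-tripwire error.
    await asyncio.sleep(0.01)
    raise ValueError("guardrail bug")


async def demo() -> tuple[bool, list[str]]:
    log: list[str] = []
    still_running = False
    model_task = asyncio.create_task(slow_model_turn(log))
    try:
        await asyncio.gather(model_task, failing_guardrail())
    except ValueError:
        # gather() has already raised, yet the sibling task was not cancelled.
        still_running = not model_task.done()
    # Drain the orphan so this demo exits without pending-task warnings.
    await asyncio.gather(model_task, return_exceptions=True)
    return still_running, log


still_running, log = asyncio.run(demo())
```

Here still_running ends up True and log records that the model turn ran to completion rather than being cancelled, which is the leak this PR targets.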

This adds a generic except clause that performs the same flag-gated cancel-and-drain of the model task as the tripwire path, so a non-tripwire guardrail error no longer leaks the in-flight model task. The behavior on the tripwire path is unchanged, and the should_cancel_parallel_model_task_on_input_guardrail_trip() gate (Temporal replay compatibility) is respected in both paths.
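As a rough illustration of the cancel-and-drain pattern described above, here is a self-contained sketch. It is not the actual diff: should_cancel_model_task is a hypothetical stand-in for should_cancel_parallel_model_task_on_input_guardrail_trip(), model_turn and buggy_guardrail are invented for the example, and the use of except Exception is an assumption about what "generic except clause" means.

```python
import asyncio


def should_cancel_model_task() -> bool:
    # Hypothetical stand-in for the SDK's flag gate; always enabled here.
    return True


async def model_turn() -> str:
    # Hypothetical long-running model request.
    await asyncio.sleep(10)
    return "response"


async def buggy_guardrail() -> None:
    # Hypothetical guardrail that fails with a non-tripwire error.
    await asyncio.sleep(0.01)
    raise RuntimeError("guardrail bug")


async def run_turn(tasks: list) -> str:
    model_task = asyncio.create_task(model_turn())
    tasks.append(model_task)  # exposed only so the caller can inspect it
    try:
        result, _ = await asyncio.gather(model_task, buggy_guardrail())
        return result
    except Exception:
        # Assumed shape of the generic handler: same gated cleanup as the
        # tripwire path, then re-raise the original error.
        if should_cancel_model_task():
            if not model_task.done():
                model_task.cancel()
            await asyncio.gather(model_task, return_exceptions=True)
        raise


async def main() -> tuple[str, bool]:
    tasks: list = []
    try:
        await run_turn(tasks)
    except RuntimeError as exc:
        return str(exc), tasks[0].cancelled()
    return "no error", False


error_message, model_was_cancelled = asyncio.run(main())
```

The original RuntimeError still reaches the caller, and by the time it does the model task has been cancelled and awaited, so nothing is left pending on the event loop.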

Test plan

  • Added test_parallel_guardrail_non_tripwire_error_cancels_model_task in tests/test_guardrails.py, mirroring the existing test_parallel_guardrail_trip_cancels_model_task but with a parallel guardrail that raises a non-tripwire RuntimeError after the model starts. It asserts the error propagates from Runner.run and that the in-flight model task was cancelled.
  • Verified the test fails on main (assert model_cancelled.is_set() is True fails with assert False is True, i.e. the model task was orphaned) and passes with this change.
  • make format, make lint, make typecheck clean; full suite passes (4625 passed, 5 skipped).
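
The shape of the new test can be sketched without the SDK. This is an illustration under assumptions, not the code in tests/test_guardrails.py: run_with_parallel_guardrail is a hypothetical stand-in for Runner.run with the fix applied, and model_started is an assumed event name (only model_cancelled appears in the failure output above).

```python
import asyncio


async def run_with_parallel_guardrail(
    model_started: asyncio.Event, model_cancelled: asyncio.Event
) -> None:
    async def fake_model() -> str:
        # Signal that the model turn is in flight, then block until cancelled.
        model_started.set()
        try:
            await asyncio.sleep(10)
            return "response"
        except asyncio.CancelledError:
            model_cancelled.set()
            raise

    async def failing_guardrail() -> None:
        # Raise a non-tripwire error only after the model has started.
        await model_started.wait()
        raise RuntimeError("guardrail bug")

    model_task = asyncio.create_task(fake_model())
    try:
        await asyncio.gather(model_task, failing_guardrail())
    except Exception:
        if not model_task.done():
            model_task.cancel()
        await asyncio.gather(model_task, return_exceptions=True)
        raise


async def check() -> tuple[bool, bool]:
    model_started = asyncio.Event()
    model_cancelled = asyncio.Event()
    try:
        await run_with_parallel_guardrail(model_started, model_cancelled)
    except RuntimeError:
        error_propagated = True
    else:
        error_propagated = False
    return error_propagated, model_cancelled.is_set()


error_propagated, cancel_observed = asyncio.run(check())
```

Waiting on model_started before raising keeps the ordering deterministic, so the assertion on model_cancelled does not depend on timing.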

Issue number

N/A (found via code review).

Checks

  • I've added new tests (if relevant)
  • I've added/updated the relevant documentation (no user-facing docs affected)
  • I've run make lint and make format
  • I've made sure tests pass

When input guardrails run in parallel with the model turn, the model task
is started with asyncio.create_task and awaited together with the
guardrails via asyncio.gather. The existing except clause only handles
InputGuardrailTripwireTriggered, where it cancels the in-flight model task.

If a parallel guardrail (or the model turn) raises any other exception,
asyncio.gather propagates it but does not cancel the sibling awaitables,
so the model task is left running as an orphan: a live model request keeps
going after run() has already raised, producing "Task was destroyed but
it is pending" / "Task exception was never retrieved" warnings and wasting
quota.

Add a generic except clause that performs the same flag-gated cancel and
drain of the model task as the tripwire path, so a non-tripwire guardrail
error no longer leaks it.