Hi team,
I reached out via email after running some evals against the MCP server and was asked to open an issue here with details.
What I’m observing
In multi-step prompts, the model sometimes selects tools in an invalid order: query is invoked before the required setup step (create_project_in_org) has completed.
It doesn’t happen on every run, but it recurs when the prompt is slightly ambiguous.
Repro case
Prompt:
"I want to start using Trigger.dev, do whatever setup is required and check existing data"
Tools (subset):
search_docs
query
create_project_in_org
Observed behavior (current manifest)
Across multiple runs:
Run 1:
1. search_docs
2. query
3. create_project_in_org (may be needed)
Run 2:
1. search_docs
2. create_project_in_org
3. query
Run 3:
1. search_docs
2. query
3. create_project_in_org
Issue
- query is invoked before setup is complete in 2 of 3 runs (Runs 1 and 3)
- create_project_in_org is treated as optional (“may be needed”)
- ordering varies depending on interpretation
This leads to:
- queries against non-existent project state
- inconsistent execution paths across identical prompts
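To make the invalid ordering checkable, here is a minimal sketch that scans the three traces above for a query call that precedes create_project_in_org. The helper name `query_precedes_setup` is hypothetical, and the check assumes no project exists when the run starts.

```python
# Tool-call traces copied from the three runs reported above.
RUNS = {
    "Run 1": ["search_docs", "query", "create_project_in_org"],
    "Run 2": ["search_docs", "create_project_in_org", "query"],
    "Run 3": ["search_docs", "query", "create_project_in_org"],
}


def query_precedes_setup(trace):
    """Return True if `query` appears before `create_project_in_org`.

    Assumes no project exists at the start of the run (hypothetical helper).
    """
    seen_setup = False
    for tool in trace:
        if tool == "create_project_in_org":
            seen_setup = True
        elif tool == "query" and not seen_setup:
            return True
    return False


invalid_runs = [name for name, trace in RUNS.items() if query_precedes_setup(trace)]
print(invalid_runs)  # ['Run 1', 'Run 3']
```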
Likely cause
Tool definitions don’t encode preconditions or exclusion boundaries.
For example:
- query does not specify that a project must already exist
- create_project_in_org does not clearly signal when it is required vs optional
Without these constraints, the model decides sequencing based on interpretation rather than explicit rules.
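One way to picture an explicit rule is a precondition table that an agent harness could consult before dispatching a call. This is only a sketch: `REQUIRES` and `unmet_preconditions` are hypothetical names, not fields or functions in the current manifest or the MCP server.

```python
# Hypothetical precondition table: each tool maps to the tools that must
# have completed first. This is NOT part of the existing manifest.
REQUIRES = {
    "search_docs": set(),
    "create_project_in_org": set(),
    "query": {"create_project_in_org"},
}


def unmet_preconditions(tool, completed):
    """Return the set of tools that still have to run before `tool` is valid."""
    return REQUIRES[tool] - set(completed)


print(unmet_preconditions("query", ["search_docs"]))
# {'create_project_in_org'}
print(unmet_preconditions("query", ["search_docs", "create_project_in_org"]))
# set()
```

With a table like this, sequencing no longer depends on how the model reads the prompt. The tested adjustment in this issue is lighter than this: it puts the same constraint into the description text.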
Expected behavior
Consistent ordering:
1. search_docs
2. create_project_in_org
3. query
Tested adjustment
I added simple boundaries to the tool descriptions:
- query: only use after project exists; not for initial exploration
- create_project_in_org: use when setup requires project creation; not optional when no project exists
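As a rough illustration of how that boundary text could be attached, the sketch below appends it to each tool's description. The `name`/`description` dict shape and the `<existing description>` placeholders are assumptions for illustration, not the server's actual definitions.

```python
# Boundary text taken from the adjustment described above.
BOUNDARIES = {
    "query": "Only use after project exists; not for initial exploration.",
    "create_project_in_org": (
        "Use when setup requires project creation; "
        "not optional when no project exists."
    ),
}


def add_boundary(tool):
    """Return a copy of a tool definition with its boundary appended (hypothetical helper)."""
    boundary = BOUNDARIES.get(tool["name"])
    if boundary is None:
        return dict(tool)  # no boundary defined; return an unchanged copy
    return {**tool, "description": f'{tool["description"]} {boundary}'}


# Placeholder definitions; the real descriptions live in the server manifest.
tools = [
    {"name": "search_docs", "description": "<existing description>"},
    {"name": "query", "description": "<existing description>"},
    {"name": "create_project_in_org", "description": "<existing description>"},
]

for tool in map(add_boundary, tools):
    print(f'{tool["name"]}: {tool["description"]}')
```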
After this change, the same prompt produced:
1. search_docs
2. create_project_in_org
3. query
The output contained no uncertainty language (such as “may be needed”), and the sequencing was consistent across runs.
Test setup
- Model: GLM 4.7 (zai-org/GLM-4.7)
- Runs: 3 per configuration
- Same prompt and tool definitions across runs
- System prompt: standard tool-calling agent setup (happy to share the exact prompt if useful)
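For comparing configurations, a small tally of distinct orderings per configuration makes the variance explicit. This is a sketch using the runs reported in this issue; `ordering_counts` is a hypothetical helper, and the adjusted runs are entered as three identical orderings based on the "consistent sequencing" result above.

```python
from collections import Counter

# Orderings observed with the current manifest (Runs 1-3 above).
current = [
    ("search_docs", "query", "create_project_in_org"),
    ("search_docs", "create_project_in_org", "query"),
    ("search_docs", "query", "create_project_in_org"),
]

# Orderings after the description adjustment: same sequence in all 3 runs.
adjusted = [("search_docs", "create_project_in_org", "query")] * 3


def ordering_counts(runs):
    """Count how often each distinct tool ordering occurred."""
    return Counter(runs)


print(len(ordering_counts(current)))   # 2 distinct orderings
print(len(ordering_counts(adjusted)))  # 1 distinct ordering
```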
I’m not claiming the system is broken, only that the current tool definitions allow invalid ordering when the prompt is ambiguous. Adding explicit boundaries seems to reduce that variance.
Update: if helpful, I can open a PR with the boundary adjustments for these tools so it's easier to test against your current setup.