MCP: Premature tool usage under multi-step prompts (repro case) #3482

@Afterknight

Hi team,

I reached out via email after running some evals against the MCP server and was asked to open an issue here with details.

What I’m observing

In multi-step prompts, the model can select tools in an invalid order: query is sometimes invoked before the required setup step (create_project_in_org) has completed.

This doesn’t happen on every run, but it recurs under slightly ambiguous prompts.

Repro case

Prompt:

"I want to start using Trigger.dev, do whatever setup is required and check existing data"

Tools (subset):

search_docs
query
create_project_in_org

Observed behavior (current manifest)

Across multiple runs:

Run 1:

1. search_docs  
2. query  
3. create_project_in_org (may be needed)

Run 2:

1. search_docs  
2. create_project_in_org  
3. query

Run 3:

1. search_docs  
2. query  
3. create_project_in_org

Issue

  • query is invoked before setup is complete in some runs
  • create_project_in_org is treated as optional (“may be needed”)
  • ordering varies depending on interpretation

This leads to:

  • queries against non-existent project state
  • inconsistent execution paths across identical prompts

Likely cause

Tool definitions don’t encode preconditions or exclusion boundaries.

For example:

  • query does not specify that a project must already exist
  • create_project_in_org does not clearly signal when it is required vs optional

Without these constraints, the model decides sequencing based on interpretation rather than explicit rules.
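
To make "preconditions" concrete, here is a minimal sketch of how a recorded tool-call sequence could be checked against an explicit prerequisite map. The `PRECONDITIONS` map and `find_violations` helper are hypothetical names for illustration, not part of the MCP server, and the check assumes no project exists at the start of the run.

```python
# Hypothetical precondition map: tool name -> tools that must have run first.
# Illustration only; the current tool definitions contain nothing like this.
PRECONDITIONS = {"query": {"create_project_in_org"}}


def find_violations(calls):
    """Return (index, tool, missing) for each call made before its prerequisites."""
    seen = set()
    violations = []
    for index, tool in enumerate(calls):
        missing = PRECONDITIONS.get(tool, set()) - seen
        if missing:
            violations.append((index, tool, sorted(missing)))
        seen.add(tool)
    return violations


# Sequences copied from Run 1 and Run 2 above.
run_1 = ["search_docs", "query", "create_project_in_org"]
run_2 = ["search_docs", "create_project_in_org", "query"]

print(find_violations(run_1))  # [(1, 'query', ['create_project_in_org'])]
print(find_violations(run_2))  # []
```

A check like this only detects bad ordering after the fact; it does not change how the model picks tools.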

Expected behavior

Consistent ordering:

1. search_docs  
2. create_project_in_org  
3. query

Tested adjustment

I added simple boundaries to the tool descriptions:

query: only use after project exists; not for initial exploration
create_project_in_org: use when setup requires project creation; not optional when no project exists

After this change, the same prompt produced:

1. search_docs  
2. create_project_in_org  
3. query

The output contained no uncertainty language, and sequencing was consistent across runs.
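
As a rough sketch of the adjustment, the boundary text could be appended to each tool's description before the tool list is handed to the model. The `name` and `description` fields follow the general shape of MCP tool definitions; the `with_boundaries` helper and the placeholder descriptions are assumptions for illustration and are not the server's real text.

```python
# Boundary text taken from the tested adjustment above.
BOUNDARIES = {
    "query": "only use after project exists; not for initial exploration",
    "create_project_in_org": (
        "use when setup requires project creation; "
        "not optional when no project exists"
    ),
}


def with_boundaries(tools):
    """Return copies of the tool definitions with boundary text appended."""
    updated = []
    for tool in tools:
        boundary = BOUNDARIES.get(tool["name"])
        description = tool["description"]
        if boundary:
            description = f"{description} Boundary: {boundary}."
        updated.append({**tool, "description": description})
    return updated


# Placeholder descriptions, NOT the server's real ones.
tools = [
    {"name": "search_docs", "description": "<existing description>"},
    {"name": "query", "description": "<existing description>"},
    {"name": "create_project_in_org", "description": "<existing description>"},
]

for tool in with_boundaries(tools):
    print(f'{tool["name"]}: {tool["description"]}')
```

The original definitions are left unmodified, so both configurations can be run side by side.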

Test setup

Model: GLM 4.7 (zai-org/GLM-4.7)
Runs: 3 per configuration
Same prompt and tool definitions across runs
System prompt: standard tool-calling agent setup (happy to share exact prompt if useful)
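
For anyone reproducing this, a minimal sketch of how the per-configuration runs could be summarized. The `summarize` helper is hypothetical and is not the harness used for the numbers above; the input sequences are copied from the runs reported in this issue.

```python
from collections import Counter


def summarize(runs):
    """Count distinct orderings and runs where query precedes project creation."""
    orderings = Counter(tuple(run) for run in runs)
    premature = 0
    for run in runs:
        if "query" not in run:
            continue
        # A query with no project creation at all also counts as premature.
        if ("create_project_in_org" not in run
                or run.index("query") < run.index("create_project_in_org")):
            premature += 1
    return {
        "runs": len(runs),
        "distinct_orderings": len(orderings),
        "premature_query": premature,
    }


current = [
    ["search_docs", "query", "create_project_in_org"],
    ["search_docs", "create_project_in_org", "query"],
    ["search_docs", "query", "create_project_in_org"],
]
adjusted = [["search_docs", "create_project_in_org", "query"]] * 3

print(summarize(current))   # {'runs': 3, 'distinct_orderings': 2, 'premature_query': 2}
print(summarize(adjusted))  # {'runs': 3, 'distinct_orderings': 1, 'premature_query': 0}
```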

This isn’t a claim that the system is broken, only that the current tool definitions allow invalid ordering under ambiguity. Adding explicit boundaries seems to reduce that variance.

Happy to share the rewritten definitions or the full eval setup if it helps reproduce this internally.
