agents/graphs/auto_debug: Add test for hardcoded default thread_id causing state corruption in concurrent invocations #10342

Open
opened 2026-04-18 08:55:50 +00:00 by HAL9000 · 0 comments

Metadata

  • Commit: test(agents/graphs/auto_debug): add failing test for hardcoded thread_id state corruption
  • Branch: test/auto-debug-concurrent-thread-id

Background and Context

AutoDebugAgent.invoke() and AutoDebugAgent.ainvoke() use a hardcoded default thread_id of "auto-debug" when no config is provided. This means multiple concurrent or sequential invocations without explicit config share the same LangGraph checkpoint state via MemorySaver, causing state corruption. A TDD test must be written first to capture the failing behavior before the fix is applied.

Expected Behavior

When two invocations of AutoDebugAgent are made without providing an explicit thread_id in config, each invocation should operate on an independent checkpoint state. State from invocation 1 must not bleed into invocation 2.

Acceptance Criteria

  • A test decorated with @tdd_issue, @tdd_issue_1, and @tdd_expected_fail exists in the test suite for agents/graphs/auto_debug
  • The test verifies that two sequential default-config invocations produce independent results
  • The test fails (as expected) against the current hardcoded implementation
  • The test passes once the fix (unique uuid4-based thread ID) is applied

Subtasks

  • Create test file or add to existing test module for auto_debug
  • Implement test_auto_debug_agent_concurrent_invocations_use_unique_thread_ids with appropriate decorators
  • Verify test fails with current implementation (expected)
  • Confirm test passes after fix is applied (see linked bug issue)

Definition of Done

This issue is closed when:

  1. The failing test is merged to the relevant branch
  2. The test is confirmed to fail against the unfixed code
  3. The linked bug fix issue resolves the test to passing

Test Description

Add a test that verifies AutoDebugAgent.invoke() and AutoDebugAgent.ainvoke() generate a unique thread ID per invocation when no config is provided, so that repeated calls, whether sequential or concurrent, never share checkpoint state.

Failing Scenario

```python
# tdd_issue, tdd_issue_1 and tdd_expected_fail are the project's TDD markers;
# their import path is not given in this issue.
@tdd_issue
@tdd_issue_1
@tdd_expected_fail
def test_auto_debug_agent_concurrent_invocations_use_unique_thread_ids():
    """Two default-config invocations must not share checkpoint state.

    The calls run sequentially; a shared thread_id corrupts state either way.
    """
    from langchain_community.llms import FakeListLLM
    from cleveragents.agents.graphs.auto_debug import AutoDebugAgent, AutoDebugState

    mock_llm = FakeListLLM(responses=["analysis", "fix", "valid", "analysis2", "fix2", "valid2"])
    agent = AutoDebugAgent(llm=mock_llm)

    def make_state(error_message: str, code_context: str) -> AutoDebugState:
        # Fresh initial state with every AutoDebugState key set explicitly.
        return {
            "error_message": error_message,
            "code_context": code_context,
            "messages": [],
            "context": {},
            "result": None,
            "error": None,
            "metadata": {},
            "attempted_fixes": [],
            "current_fix": {},
            "fix_validated": False,
        }

    # Both invocations use the default config (no thread_id provided)
    result1 = agent.invoke(make_state("Error 1", "code1"))
    result2 = agent.invoke(make_state("Error 2", "code2"))

    # Sanity checks: each new input overwrites these keys, so they are likely
    # to pass even when the checkpoint is shared.
    assert result1["error_message"] == "Error 1"
    assert result2["error_message"] == "Error 2"

    # The real check (assumes "messages" uses an appending reducer): with a
    # shared thread, invocation 2's messages would start with invocation 1's.
    messages1 = list(result1.get("messages", []))
    messages2 = list(result2.get("messages", []))
    assert not messages1 or messages2[: len(messages1)] != messages1, (
        "messages from invocation 1 leaked into invocation 2"
    )
```
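The Failing Scenario runs its two invoke() calls one after the other, while the test name and the Test Description also cover ainvoke() under concurrency. A concurrent variant would start two default-config ainvoke() calls with asyncio.gather and apply a cross-run leak check. The self-contained sketch below shows that pattern against ToyAsyncAgent, a hypothetical stand-in whose in-memory dict plays the role of a per-thread checkpoint; it is not AutoDebugAgent and does not use LangGraph. Swapping in the real agent is left as an assumption (a single shared FakeListLLM would likely hand out its responses to the two runs in interleaved order).

```python
import asyncio
import uuid
from typing import Any, Optional


class ToyAsyncAgent:
    """Hypothetical stand-in for AutoDebugAgent.ainvoke(); not the real class."""

    def __init__(self, hardcoded: bool) -> None:
        self.hardcoded = hardcoded
        self.saved: dict[str, list[str]] = {}  # thread_id -> checkpointed messages

    async def ainvoke(
        self, input_state: dict[str, Any], config: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        if config is None:
            # hardcoded=True mimics the current "auto-debug" default.
            suffix = "" if self.hardcoded else f"-{uuid.uuid4()}"
            config = {"configurable": {"thread_id": f"auto-debug{suffix}"}}
        thread_id = config["configurable"]["thread_id"]
        for step in ("analysis", "fix"):
            # Each "node" appends to the thread's checkpoint, then yields to the
            # event loop, so concurrent runs on one thread interleave their writes.
            entry = f"{step}: {input_state['error_message']}"
            self.saved[thread_id] = self.saved.get(thread_id, []) + [entry]
            await asyncio.sleep(0)
        return {**input_state, "messages": list(self.saved[thread_id])}


async def run_pair(agent: ToyAsyncAgent) -> list[dict[str, Any]]:
    # Two default-config calls in flight at the same time.
    return await asyncio.gather(
        agent.ainvoke({"error_message": "Error 1"}),
        agent.ainvoke({"error_message": "Error 2"}),
    )


def leaked(results: list[dict[str, Any]]) -> bool:
    # True if either run's messages mention the other run's error.
    r1, r2 = results
    return any("Error 2" in m for m in r1["messages"]) or any(
        "Error 1" in m for m in r2["messages"]
    )


print(leaked(asyncio.run(run_pair(ToyAsyncAgent(hardcoded=True)))))   # True
print(leaked(asyncio.run(run_pair(ToyAsyncAgent(hardcoded=False)))))  # False
```

Checking both directions in leaked() keeps the assertion valid whichever task the event loop happens to finish first.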

Root Cause

In src/cleveragents/agents/graphs/auto_debug.py, the invoke() and ainvoke() methods use a hardcoded default thread_id:

```python
def invoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # HARDCODED!
    ...

async def ainvoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # HARDCODED!
    ...
```

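When multiple callers invoke the agent without providing a config, they all share the same "auto-debug" thread ID in the MemorySaver checkpointer. LangGraph then loads that thread's latest checkpoint and applies the new input on top of it instead of starting from an empty state. Keys supplied in the new input are overwritten, but keys the input omits keep their old values, channels with a reducer (for example an appended messages list) keep the previous run's entries, and concurrent calls interleave their checkpoint writes on the same thread. Either way, state from one invocation leaks into the next.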
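To make the mechanism concrete, here is a self-contained toy model; it is not LangGraph code. ToyCheckpointer and ToyAgent are hypothetical names, and the assumption that "messages" is appended to (like a reducer-backed channel) while other keys are overwritten by new input is ours. With the hardcoded thread ID, the second call's error_message is correct but its messages still contain the first call's entry, which is why an error_message assertion alone cannot catch the bug.

```python
import uuid
from typing import Any, Optional


class ToyCheckpointer:
    """Toy MemorySaver stand-in: keeps the latest state per thread_id."""

    def __init__(self) -> None:
        self.store: dict[str, dict[str, Any]] = {}

    def load(self, thread_id: str) -> dict[str, Any]:
        # Unknown threads start empty; known threads resume from their last save.
        return dict(self.store.get(thread_id, {"messages": []}))

    def save(self, thread_id: str, state: dict[str, Any]) -> None:
        self.store[thread_id] = state


class ToyAgent:
    """Hypothetical agent; hardcoded=True mimics the current default config."""

    def __init__(self, hardcoded: bool) -> None:
        self.hardcoded = hardcoded
        self.checkpointer = ToyCheckpointer()

    def invoke(
        self, input_state: dict[str, Any], config: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        if config is None:
            suffix = "" if self.hardcoded else f"-{uuid.uuid4()}"
            config = {"configurable": {"thread_id": f"auto-debug{suffix}"}}
        thread_id = config["configurable"]["thread_id"]
        state = self.checkpointer.load(thread_id)
        # Plain keys from the new input overwrite the checkpointed values...
        state.update({k: v for k, v in input_state.items() if k != "messages"})
        # ...but "messages" behaves like a reducer-backed channel and is appended to.
        new_messages = list(input_state.get("messages", []))
        new_messages.append(f"analysis of {input_state['error_message']}")
        state["messages"] = state["messages"] + new_messages
        self.checkpointer.save(thread_id, state)
        return state


buggy = ToyAgent(hardcoded=True)
buggy.invoke({"error_message": "Error 1", "messages": []})
result2 = buggy.invoke({"error_message": "Error 2", "messages": []})
print(result2["error_message"])  # Error 2
print(result2["messages"])  # ['analysis of Error 1', 'analysis of Error 2']
```

With hardcoded=False the second call returns only ['analysis of Error 2'], because each call gets its own thread in the checkpointer.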

Expected Fix

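Generate a unique thread ID per invocation when no config is provided, in both invoke() and ainvoke():

```python
import uuid

def invoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
    ...

async def ainvoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
    ...
```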
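The config-or-default form only applies the default when config is falsy. A caller that passes a config without a thread_id (for example only callbacks or tags) still gets none, and, as far as we understand, a checkpointer-backed graph will then reject the call. A slightly more defensive variant is sketched below as a hypothetical helper, with_default_thread_id (not part of the codebase): it fills in a uuid4-based thread ID in that case too, keeps a caller-supplied thread_id, and leaves the caller's dicts unmodified.

```python
import uuid
from typing import Any, Optional


def with_default_thread_id(
    config: Optional[dict[str, Any]], prefix: str = "auto-debug"
) -> dict[str, Any]:
    """Return a copy of config that is guaranteed to carry a thread_id.

    Hypothetical helper: a fresh uuid4-based thread_id is added when config is
    None/empty or when its "configurable" section has no thread_id. Only the
    top level and "configurable" are copied, so callback objects are shared.
    """
    resolved: dict[str, Any] = dict(config or {})
    configurable: dict[str, Any] = dict(resolved.get("configurable") or {})
    if not configurable.get("thread_id"):
        configurable["thread_id"] = f"{prefix}-{uuid.uuid4()}"
    resolved["configurable"] = configurable
    return resolved


# Two default calls get different threads; an explicit thread_id is kept.
a = with_default_thread_id(None)
b = with_default_thread_id(None)
print(a["configurable"]["thread_id"] != b["configurable"]["thread_id"])  # True
print(with_default_thread_id({"configurable": {"thread_id": "mine"}}))  # {'configurable': {'thread_id': 'mine'}}
```

Under this sketch, both invoke() and ainvoke() would start with config = with_default_thread_id(config).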

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor


Automated by CleverAgents Bot
Agent: new-issue-creator

Reference
cleveragents/cleveragents-core#10342