`agents/graphs/auto_debug`: Add test for hardcoded default thread_id causing state corruption in concurrent invocations #10342

Open

opened 2026-04-18 08:55:50 +00:00 by HAL9000 · 0 comments

Owner

Metadata

Commit: test(agents/graphs/auto_debug): add failing test for hardcoded thread_id state corruption
Branch: test/auto-debug-concurrent-thread-id

Background and Context

AutoDebugAgent.invoke() and AutoDebugAgent.ainvoke() use a hardcoded default thread_id of "auto-debug" when no config is provided. This means multiple concurrent or sequential invocations without explicit config share the same LangGraph checkpoint state via MemorySaver, causing state corruption. A TDD test must be written first to capture the failing behavior before the fix is applied.

Expected Behavior

When two invocations of AutoDebugAgent are made without providing an explicit thread_id in config, each invocation should operate on an independent checkpoint state. State from invocation 1 must not bleed into invocation 2.

Acceptance Criteria

A test decorated with @tdd_issue, @tdd_issue_1, and @tdd_expected_fail exists in the test suite for agents/graphs/auto_debug
The test verifies that two sequential default-config invocations produce independent results
The test fails (as expected) against the current hardcoded implementation
The test passes once the fix (unique uuid4-based thread ID) is applied

Subtasks

Create test file or add to existing test module for auto_debug
Implement test_auto_debug_agent_concurrent_invocations_use_unique_thread_ids with appropriate decorators
Verify test fails with current implementation (expected)
Confirm test passes after fix is applied (see linked bug issue)

Definition of Done

This issue is closed when:

The failing test is merged to the relevant branch
The test is confirmed to fail against the unfixed code
The fix tracked in the linked bug issue makes the test pass

Test Description

Add a test that verifies AutoDebugAgent.invoke() and AutoDebugAgent.ainvoke() generate unique thread IDs per invocation when no config is provided, preventing state corruption when called concurrently.

Failing Scenario

@tdd_issue
@tdd_issue_1
@tdd_expected_fail
def test_auto_debug_agent_concurrent_invocations_use_unique_thread_ids():
    """Two default-config invocations (run back to back here) must not share checkpoint state."""
    from langchain_community.llms import FakeListLLM
    from cleveragents.agents.graphs.auto_debug import AutoDebugAgent, AutoDebugState

    mock_llm = FakeListLLM(responses=["analysis", "fix", "valid", "analysis2", "fix2", "valid2"])
    agent = AutoDebugAgent(llm=mock_llm)

    state1: AutoDebugState = {
        "error_message": "Error 1",
        "code_context": "code1",
        "messages": [],
        "context": {},
        "result": None,
        "error": None,
        "metadata": {},
        "attempted_fixes": [],
        "current_fix": {},
        "fix_validated": False,
    }
    state2: AutoDebugState = {
        "error_message": "Error 2",
        "code_context": "code2",
        "messages": [],
        "context": {},
        "result": None,
        "error": None,
        "metadata": {},
        "attempted_fixes": [],
        "current_fix": {},
        "fix_validated": False,
    }

    # Both invocations use default config (no thread_id provided)
    result1 = agent.invoke(state1)
    result2 = agent.invoke(state2)

    # Results must be independent - state from invocation 1 must not bleed into invocation 2
    assert result1["error_message"] == "Error 1"
    assert result2["error_message"] == "Error 2"
    # Messages returned for invocation 2 must not reference invocation 1's error.
    # str(msg) is used so the check works for both dict and object messages.
    assert all(
        "Error 1" not in str(msg)
        for msg in result2.get("messages", [])
    )
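
The scenario above calls only invoke(), and calls it sequentially, while the test description also mentions ainvoke() and concurrent callers. The following is a rough sketch of how that case might be checked. RecordingGraph and StubAgent are hypothetical stand-ins invented for illustration; they are not the real AutoDebugAgent or a LangGraph class, and the stub simply applies the uuid4-based default proposed in this issue.

```python
import asyncio
import uuid
from typing import Any, Optional


class RecordingGraph:
    """Hypothetical stand-in for the compiled graph; records each thread_id it sees."""

    def __init__(self) -> None:
        self.thread_ids: list[str] = []

    async def ainvoke(self, state: dict[str, Any], config: dict[str, Any]) -> dict[str, Any]:
        self.thread_ids.append(config["configurable"]["thread_id"])
        await asyncio.sleep(0)  # yield control so the calls actually interleave
        return dict(state)


class StubAgent:
    """Hypothetical agent that applies the uuid4-based default from this issue."""

    def __init__(self, graph: RecordingGraph) -> None:
        self.graph = graph

    async def ainvoke(
        self, input_state: dict[str, Any], config: Optional[dict[str, Any]] = None
    ) -> dict[str, Any]:
        config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
        return await self.graph.ainvoke(input_state, config)


async def run_concurrently(agent: StubAgent, count: int) -> list[dict[str, Any]]:
    """Launch `count` default-config invocations at once and collect the results."""
    states = [{"error_message": f"Error {i}"} for i in range(1, count + 1)]
    return await asyncio.gather(*(agent.ainvoke(s) for s in states))


graph = RecordingGraph()
results = asyncio.run(run_concurrently(StubAgent(graph), 3))

# Every concurrent call should have received its own thread ID.
assert len(set(graph.thread_ids)) == len(graph.thread_ids)
```

Asserting on the recorded thread IDs checks the fix directly instead of inferring it from message contents, which may make the test less sensitive to how the graph merges state.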

Root Cause

In src/cleveragents/agents/graphs/auto_debug.py, the invoke() and ainvoke() methods use a hardcoded default thread_id:

def invoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # HARDCODED!
    ...

async def ainvoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # HARDCODED!
    ...

When multiple callers invoke the agent without providing a config, they all share the same "auto-debug" thread ID in the MemorySaver checkpointer. This causes LangGraph to resume from the previous invocation's checkpoint state rather than starting fresh, leading to state corruption.
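
The mechanism can be illustrated with a toy model. This is only a sketch: ToyCheckpointer and run are hypothetical names, not LangGraph's MemorySaver or its API, and the sketch assumes that the messages field accumulates across runs on the same thread, which this issue does not confirm for AutoDebugState.

```python
from typing import Any


class ToyCheckpointer:
    """Toy stand-in for an in-memory checkpointer: one saved state per thread_id."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}

    def load(self, thread_id: str) -> dict[str, Any]:
        saved = self._store.get(thread_id, {})
        # Copy the list so earlier results are not mutated by later runs.
        return {"messages": list(saved.get("messages", []))}

    def save(self, thread_id: str, state: dict[str, Any]) -> None:
        self._store[thread_id] = state


def run(checkpointer: ToyCheckpointer, thread_id: str, error_message: str) -> dict[str, Any]:
    """Resume from the thread's checkpoint, append one message, and save."""
    state = checkpointer.load(thread_id)
    state["error_message"] = error_message  # plain field: overwritten by new input
    state["messages"].append(f"analysis of {error_message}")  # accumulating field
    checkpointer.save(thread_id, state)
    return state


saver = ToyCheckpointer()
first = run(saver, "auto-debug", "Error 1")
second = run(saver, "auto-debug", "Error 2")  # same thread: resumes first's state
isolated = run(saver, "auto-debug-other", "Error 2")  # distinct thread: fresh state

print(second["messages"])    # carries over the message from the first run
print(isolated["messages"])  # contains only its own message
```

In this model the overwritten field (error_message) looks correct in both runs, so the corruption shows up only in accumulated fields, which is why a test should inspect more than error_message.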

Expected Fix

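Generate a unique thread ID per invocation when no config is provided, in both invoke() and ainvoke():

import uuid

def invoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
    ...

async def ainvoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
    ...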
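
Since both methods need the same default, one option is to factor it into a shared helper. The sketch below is an assumption, not part of the proposed fix: with_default_thread_id is a hypothetical name, and it goes one step further than the snippet above by also filling in a thread_id when a caller passes a config that lacks one.

```python
import uuid
from typing import Any, Optional


def with_default_thread_id(config: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a config that always carries a thread_id (hypothetical helper)."""
    # Copy so the caller's dict is never mutated.
    resolved = dict(config or {})
    configurable = dict(resolved.get("configurable") or {})
    # Keep a caller-supplied thread_id; otherwise generate a fresh one per call.
    configurable.setdefault("thread_id", f"auto-debug-{uuid.uuid4()}")
    resolved["configurable"] = configurable
    return resolved


default_a = with_default_thread_id()
default_b = with_default_thread_id()
explicit = with_default_thread_id({"configurable": {"thread_id": "my-thread"}})

# Two default configs must never share a thread ID.
assert default_a["configurable"]["thread_id"] != default_b["configurable"]["thread_id"]
# An explicit thread ID is passed through unchanged.
assert explicit["configurable"]["thread_id"] == "my-thread"
```

Both invoke() and ainvoke() could then call the helper on their config argument, so the default cannot drift between the two methods.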

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

Automated by CleverAgents Bot
Agent: new-issue-creator


HAL9000 added the

labels

2026-04-18 08:56:57 +00:00

HAL9000 referenced this issue

2026-04-18 08:57:48 +00:00

agents/graphs/auto_debug: Hardcoded default thread_id "auto-debug" causes checkpoint state corruption in concurrent invocations #10349