cleveragents/cleveragents-core

Fork 3

`agents/graphs/auto_debug`: Hardcoded default thread_id `"auto-debug"` causes checkpoint state corruption in concurrent invocations #10349

New issue

Open

opened 2026-04-18 08:57:48 +00:00 by HAL9000 · 1 comment

HAL9000 commented

2026-04-18 08:57:48 +00:00

Owner

Metadata

Commit: fix(agents/graphs/auto_debug): replace hardcoded thread_id with uuid4 per invocation
Branch: fix/auto-debug-unique-thread-id

Background and Context

AutoDebugAgent.invoke(), AutoDebugAgent.ainvoke(), and AutoDebugAgent.stream() all use a hardcoded default thread_id of "auto-debug" when no config is provided. The MemorySaver checkpointer stores state keyed by thread_id, so all callers using the default config share the same checkpoint. This causes sequential invocations to resume from the previous invocation's state, and concurrent invocations to race on the same checkpoint — both leading to state corruption.

The TDD test for this bug is tracked in #10342.

Expected Behavior

Each invocation of AutoDebugAgent without an explicit thread_id in config should operate on a fresh, independent checkpoint state. No state from a previous invocation should bleed into a subsequent one.

Acceptance Criteria

invoke() generates a unique thread_id per call when no config is provided (e.g., f"auto-debug-{uuid.uuid4()}")
ainvoke() generates a unique thread_id per call when no config is provided
stream() generates a unique thread_id per call when no config is provided
Two sequential default-config invocations produce fully independent results
The TDD test in #10342 passes after this fix is applied
No regression in existing tests

Subtasks

Import uuid in src/cleveragents/agents/graphs/auto_debug.py
Replace hardcoded "auto-debug" thread_id in invoke() with f"auto-debug-{uuid.uuid4()}"
Replace hardcoded "auto-debug" thread_id in ainvoke() with f"auto-debug-{uuid.uuid4()}"
Replace hardcoded "auto-debug" thread_id in stream() with f"auto-debug-{uuid.uuid4()}"
Run the TDD test from #10342 and confirm it passes
Run full test suite and confirm no regressions

Definition of Done

This issue is closed when:

All three methods (invoke, ainvoke, stream) use unique per-invocation thread IDs
The TDD test in #10342 passes
The full test suite passes with no regressions
The fix is merged to the main branch

Bug Report

Summary

AutoDebugAgent.invoke() and AutoDebugAgent.ainvoke() use a hardcoded default thread_id of "auto-debug" when no config is provided. Multiple concurrent or sequential invocations without explicit config share the same LangGraph checkpoint state, causing state corruption.

Affected File

src/cleveragents/agents/graphs/auto_debug.py

Code Evidence

def invoke(
    self, input_state: AutoDebugState, config: dict[str, Any] | None = None
) -> AutoDebugState:
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # BUG: hardcoded
    result = self.app.invoke(cast(dict[str, Any], input_state), config)
    return cast(AutoDebugState, result)

async def ainvoke(
    self, input_state: AutoDebugState, config: dict[str, Any] | None = None
) -> AutoDebugState:
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # BUG: hardcoded
    result = await self.app.ainvoke(cast(dict[str, Any], input_state), config)
    return cast(AutoDebugState, result)

The stream() method has the same issue:

def stream(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": "auto-debug"}}  # BUG: hardcoded

Impact

The MemorySaver checkpointer stores state keyed by thread_id. When all invocations use "auto-debug" as the thread ID:

Sequential invocations: The second invocation resumes from the first invocation's checkpoint, inheriting stale messages, attempted_fixes, and current_fix state.
Concurrent invocations: Multiple concurrent calls race to read/write the same checkpoint, causing non-deterministic state corruption.
Production impact: Any caller using the default config (e.g., agent.invoke(state) without explicit thread_id) is affected.

Reproduction

from langchain_community.llms import FakeListLLM
from cleveragents.agents.graphs.auto_debug import AutoDebugAgent, AutoDebugState

agent = AutoDebugAgent(llm=FakeListLLM(responses=["a"] * 20))

state: AutoDebugState = {
    "error_message": "test error",
    "code_context": "test code",
    "messages": [],
    "context": {},
    "result": None,
    "error": None,
    "metadata": {},
    "attempted_fixes": [],
    "current_fix": {},
    "fix_validated": False,
}

result1 = agent.invoke(state)
# Second invocation resumes from first invocation's checkpoint
result2 = agent.invoke(state)
# result2["messages"] will contain messages from result1 - state corruption!

Fix

Generate a unique thread ID per invocation:

import uuid

def invoke(self, input_state, config=None):
    config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}}
    ...

Apply the same fix to ainvoke() and stream().

Validation Gate

Code evidence: Lines in invoke(), ainvoke(), stream() in auto_debug.py
Environment verification: Reproducible with any two sequential default-config invocations
Actionability: Replace hardcoded string with f"auto-debug-{uuid.uuid4()}"
Codebase freshness: Verified in current HEAD
Severity match: Critical - state corruption in production usage

Blocked By

Depends on TDD issue #10342 (test must be written and confirmed failing before fix is applied).

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

Automated by CleverAgents Bot
Agent: new-issue-creator

## Metadata - **Commit:** `fix(agents/graphs/auto_debug): replace hardcoded thread_id with uuid4 per invocation` - **Branch:** `fix/auto-debug-unique-thread-id` ## Background and Context `AutoDebugAgent.invoke()`, `AutoDebugAgent.ainvoke()`, and `AutoDebugAgent.stream()` all use a hardcoded default `thread_id` of `"auto-debug"` when no config is provided. The `MemorySaver` checkpointer stores state keyed by `thread_id`, so all callers using the default config share the same checkpoint. This causes sequential invocations to resume from the previous invocation's state, and concurrent invocations to race on the same checkpoint — both leading to state corruption. The TDD test for this bug is tracked in #10342. ## Expected Behavior Each invocation of `AutoDebugAgent` without an explicit `thread_id` in config should operate on a fresh, independent checkpoint state. No state from a previous invocation should bleed into a subsequent one. ## Acceptance Criteria - [ ] `invoke()` generates a unique `thread_id` per call when no config is provided (e.g., `f"auto-debug-{uuid.uuid4()}"`) - [ ] `ainvoke()` generates a unique `thread_id` per call when no config is provided - [ ] `stream()` generates a unique `thread_id` per call when no config is provided - [ ] Two sequential default-config invocations produce fully independent results - [ ] The TDD test in #10342 passes after this fix is applied - [ ] No regression in existing tests ## Subtasks - [ ] Import `uuid` in `src/cleveragents/agents/graphs/auto_debug.py` - [ ] Replace hardcoded `"auto-debug"` thread_id in `invoke()` with `f"auto-debug-{uuid.uuid4()}"` - [ ] Replace hardcoded `"auto-debug"` thread_id in `ainvoke()` with `f"auto-debug-{uuid.uuid4()}"` - [ ] Replace hardcoded `"auto-debug"` thread_id in `stream()` with `f"auto-debug-{uuid.uuid4()}"` - [ ] Run the TDD test from #10342 and confirm it passes - [ ] Run full test suite and confirm no regressions ## Definition of Done This issue is closed when: 1. All three methods (`invoke`, `ainvoke`, `stream`) use unique per-invocation thread IDs 2. The TDD test in #10342 passes 3. The full test suite passes with no regressions 4. The fix is merged to the main branch --- ## Bug Report ### Summary `AutoDebugAgent.invoke()` and `AutoDebugAgent.ainvoke()` use a hardcoded default `thread_id` of `"auto-debug"` when no config is provided. Multiple concurrent or sequential invocations without explicit config share the same LangGraph checkpoint state, causing state corruption. ### Affected File `src/cleveragents/agents/graphs/auto_debug.py` ### Code Evidence ```python def invoke( self, input_state: AutoDebugState, config: dict[str, Any] | None = None ) -> AutoDebugState: config = config or {"configurable": {"thread_id": "auto-debug"}} # BUG: hardcoded result = self.app.invoke(cast(dict[str, Any], input_state), config) return cast(AutoDebugState, result) async def ainvoke( self, input_state: AutoDebugState, config: dict[str, Any] | None = None ) -> AutoDebugState: config = config or {"configurable": {"thread_id": "auto-debug"}} # BUG: hardcoded result = await self.app.ainvoke(cast(dict[str, Any], input_state), config) return cast(AutoDebugState, result) ``` The `stream()` method has the same issue: ```python def stream(self, input_state, config=None): config = config or {"configurable": {"thread_id": "auto-debug"}} # BUG: hardcoded ``` ### Impact The `MemorySaver` checkpointer stores state keyed by `thread_id`. When all invocations use `"auto-debug"` as the thread ID: 1. **Sequential invocations**: The second invocation resumes from the first invocation's checkpoint, inheriting stale `messages`, `attempted_fixes`, and `current_fix` state. 2. **Concurrent invocations**: Multiple concurrent calls race to read/write the same checkpoint, causing non-deterministic state corruption. 3. **Production impact**: Any caller using the default config (e.g., `agent.invoke(state)` without explicit `thread_id`) is affected. ### Reproduction ```python from langchain_community.llms import FakeListLLM from cleveragents.agents.graphs.auto_debug import AutoDebugAgent, AutoDebugState agent = AutoDebugAgent(llm=FakeListLLM(responses=["a"] * 20)) state: AutoDebugState = { "error_message": "test error", "code_context": "test code", "messages": [], "context": {}, "result": None, "error": None, "metadata": {}, "attempted_fixes": [], "current_fix": {}, "fix_validated": False, } result1 = agent.invoke(state) # Second invocation resumes from first invocation's checkpoint result2 = agent.invoke(state) # result2["messages"] will contain messages from result1 - state corruption! ``` ### Fix Generate a unique thread ID per invocation: ```python import uuid def invoke(self, input_state, config=None): config = config or {"configurable": {"thread_id": f"auto-debug-{uuid.uuid4()}"}} ... ``` Apply the same fix to `ainvoke()` and `stream()`. ### Validation Gate - [x] Code evidence: Lines in `invoke()`, `ainvoke()`, `stream()` in `auto_debug.py` - [x] Environment verification: Reproducible with any two sequential default-config invocations - [x] Actionability: Replace hardcoded string with `f"auto-debug-{uuid.uuid4()}"` - [x] Codebase freshness: Verified in current HEAD - [x] Severity match: Critical - state corruption in production usage ### Blocked By Depends on TDD issue #10342 (test must be written and confirmed failing before fix is applied). --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor --- **Automated by CleverAgents Bot** Agent: new-issue-creator

HAL9000 added the

labels

2026-04-18 08:58:59 +00:00

HAL9000 referenced this issue

2026-04-18 09:10:52 +00:00

[AUTO-GROOMER] Status: Active (Cycle 6) #10360

HAL9000 added this to the v3.5.0 milestone

2026-04-18 09:11:46 +00:00

HAL9000 commented

2026-04-18 09:13:07 +00:00

Author

Owner

[GROOMED] Quality Analysis & Triage

✅ Issue Validity Assessment

This is a valid, actionable Priority/Critical bug with:

Clear problem statement: Hardcoded thread_id="auto-debug" in AutoDebugAgent.invoke(), ainvoke(), and stream() causes state corruption
Code evidence: Specific lines in src/cleveragents/agents/graphs/auto_debug.py provided
Reproduction steps: Clear example showing state contamination between sequential invocations
Proposed fix: Replace hardcoded string with f"auto-debug-{uuid.uuid4()}"
Severity match: Critical - affects production usage, causes non-deterministic state corruption

✅ Label Verification

Current labels:

✅ State/Unverified - Present (needs transition to State/Verified)
✅ Type/Bug - Correct classification
✅ Priority/Critical - Appropriate severity level

All required labels present.

✅ Milestone Assignment

Assigned to: v3.5.0 (M6: Autonomy Hardening)
Rationale: Priority/Critical bug in core agent component

✅ Acceptance Criteria Review

Issue includes well-defined acceptance criteria and subtasks.

🚨 CRITICAL PRIORITY FLAG

This is a Priority/Critical issue with no State/In Progress assignment. Immediate human attention required.

Grooming Actions Completed

✅ Validity assessment: PASSED
✅ Label verification: PASSED
✅ Milestone assignment: COMPLETED (v3.5.0)
✅ Acceptance criteria review: PASSED
⚠️ State transition: PENDING - Requires manual label change from State/Unverified → State/Verified

Recommendation

READY FOR VERIFICATION - Well-documented and actionable. Recommend:

Manual label transition to State/Verified
Assignment to an implementor
Coordination with #10342 for TDD test dependency
Prioritize for immediate implementation (Priority/Critical)

Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

## [GROOMED] Quality Analysis & Triage ### ✅ Issue Validity Assessment This is a **valid, actionable Priority/Critical bug** with: - **Clear problem statement**: Hardcoded `thread_id="auto-debug"` in `AutoDebugAgent.invoke()`, `ainvoke()`, and `stream()` causes state corruption - **Code evidence**: Specific lines in `src/cleveragents/agents/graphs/auto_debug.py` provided - **Reproduction steps**: Clear example showing state contamination between sequential invocations - **Proposed fix**: Replace hardcoded string with `f"auto-debug-{uuid.uuid4()}"` - **Severity match**: Critical - affects production usage, causes non-deterministic state corruption ### ✅ Label Verification Current labels: - ✅ **State/Unverified** - Present (needs transition to State/Verified) - ✅ **Type/Bug** - Correct classification - ✅ **Priority/Critical** - Appropriate severity level All required labels present. ### ✅ Milestone Assignment - **Assigned to**: v3.5.0 (M6: Autonomy Hardening) - **Rationale**: Priority/Critical bug in core agent component ### ✅ Acceptance Criteria Review Issue includes well-defined acceptance criteria and subtasks. ### 🚨 CRITICAL PRIORITY FLAG **This is a Priority/Critical issue with no State/In Progress assignment.** Immediate human attention required. ### Grooming Actions Completed - ✅ Validity assessment: PASSED - ✅ Label verification: PASSED - ✅ Milestone assignment: COMPLETED (v3.5.0) - ✅ Acceptance criteria review: PASSED - ⚠️ State transition: PENDING - Requires manual label change from State/Unverified → State/Verified ### Recommendation **READY FOR VERIFICATION** - Well-documented and actionable. Recommend: 1. Manual label transition to State/Verified 2. Assignment to an implementor 3. Coordination with #10342 for TDD test dependency 4. Prioritize for immediate implementation (Priority/Critical) --- **Automated by CleverAgents Bot** Supervisor: Grooming | Agent: grooming-pool-supervisor

HAL9000 added

and removed

labels

2026-04-18 09:16:41 +00:00

HAL9000 referenced this issue

2026-04-18 09:41:10 +00:00