BUG-HUNT: [concurrency] `PlanGenerationGraph.invoke/ainvoke/stream` default `thread_id="default"` causes state contamination across independent plan generation sessions #6650

Open

opened 2026-04-09 22:45:55 +00:00 by HAL9000 · 0 comments

Bug Report: [concurrency] — Shared Default `thread_id` Contaminates Independent Plan Generation Sessions

Severity Assessment

Impact: Two concurrent or sequential calls to PlanGenerationGraph.invoke(), ainvoke(), or stream() without explicit thread_id arguments share the same LangGraph checkpoint namespace ("default"). Prior run state bleeds into subsequent runs: leftover retry_count, partial generated_changes, stale validation_result, and corrupted analyzed_requirements from a previous session. Plans generated in a shared-thread context may incorporate wrong project data, wrong prompts, or incorrectly recycled context analysis from an unrelated prior request.
Likelihood: High. The default value is "default" in all three public entry-points. Any integration test, CLI batch invocation, or multi-tenant server that uses a long-lived PlanGenerationGraph instance without passing a unique thread_id is affected.
Priority: High (same class as #6535, which was filed as Critical for AutoDebugAgent)

Location

File: src/cleveragents/agents/graphs/plan_generation.py
Class: PlanGenerationGraph
Methods: invoke, ainvoke, stream
Lines: 649, 698, 747

Description

All three public entry-points on PlanGenerationGraph have thread_id: str = "default" as their default parameter:

# line 649
def invoke(
    self,
    project: Project,
    plan: Plan,
    contexts: list[Context],
    thread_id: str = "default",       # ← shared namespace
    ...
) -> PlanGenerationState:
    ...
    config = {"configurable": {"thread_id": thread_id}}
    result = self.app.invoke(initial_state, config)

# line 698
async def ainvoke(
    self,
    project: Project,
    plan: Plan,
    contexts: list[Context],
    thread_id: str = "default",       # ← shared namespace
    ...
) -> PlanGenerationState:
    ...
    config = {"configurable": {"thread_id": thread_id}}
    result = await self.app.ainvoke(initial_state, config)

# line 747
def stream(
    self,
    project: Project,
    plan: Plan,
    contexts: list[Context],
    thread_id: str = "default",       # ← shared namespace
    ...
) -> Iterator[dict[str, Any]]:
    ...
    config = {"configurable": {"thread_id": thread_id}}
    yield from self.app.stream(initial_state, config)

PlanGenerationGraph.__init__ creates a single BoundedMemorySaver instance (self.checkpointer = BoundedMemorySaver(max_checkpoints=checkpoint_limit)) shared across all invocations. LangGraph uses thread_id to namespace checkpoints in the saver. When all callers share "default":

1. Checkpoint state from plan A is stored under key "default".
2. Plan B's invoke() starts, re-uses the same key, and LangGraph resumes from A's checkpoint instead of starting fresh.
3. retry_count, generated_changes, analyzed_requirements, and validation_result from run A are visible in run B's initial state.
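
The sequence above can be illustrated with a toy model. This is a minimal sketch that uses only the standard library: `ToySaver` and `run_plan` are hypothetical stand-ins invented for illustration, not the real `BoundedMemorySaver` or LangGraph API, and the merge logic is an assumption about how leftover keys survive when a run starts from previously saved state.

```python
from copy import deepcopy
from typing import Any


class ToySaver:
    """Hypothetical stand-in for a checkpointer: one saved state per thread_id."""

    def __init__(self) -> None:
        self._store: dict[str, dict[str, Any]] = {}

    def load(self, thread_id: str) -> dict[str, Any]:
        # Return a copy so callers cannot mutate the stored checkpoint directly.
        return deepcopy(self._store.get(thread_id, {}))

    def save(self, thread_id: str, state: dict[str, Any]) -> None:
        self._store[thread_id] = deepcopy(state)


def run_plan(saver: ToySaver, plan_name: str, thread_id: str = "default") -> dict[str, Any]:
    # Start from whatever is already saved under this thread_id,
    # then overlay only the keys this run supplies.
    state = saver.load(thread_id)
    state["plan"] = plan_name
    # Simulate one retry during the run; nothing resets the counter between runs.
    state["retry_count"] = state.get("retry_count", 0) + 1
    saver.save(thread_id, state)
    return state


saver = ToySaver()
run_a = run_plan(saver, "plan-a")
run_b = run_plan(saver, "plan-b")            # shares "default" with run A
run_c = run_plan(saver, "plan-c", "unique")  # isolated namespace

print(run_a["retry_count"], run_b["retry_count"], run_c["retry_count"])
```

In this toy model, run B inherits run A's `retry_count` because both use the `"default"` key, while run C, which uses its own key, starts from a clean state.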

This is the same root cause as #6535 (AutoDebugAgent default thread_id="auto-debug"), confirmed by the comment in that issue: "any caller that does not supply a unique thread_id inherits the previous run's persisted state."

For comparison, the _load_context node already generates unique IDs for its sub-workflow (config = {"configurable": {"thread_id": f"context-analysis-{uuid4()}"}}, line 623), so the pattern is already in use elsewhere in the file, but the top-level entry-points were not updated.

Expected Behavior

Each call to invoke(), ainvoke(), or stream() that does not explicitly pass a thread_id should receive a unique, isolated checkpoint namespace so that no state from one plan generation run bleeds into another.

Actual Behavior

All default invocations share the "default" checkpoint namespace, causing cross-contamination of plan state between runs on the same PlanGenerationGraph instance.

Suggested Fix

Generate a unique thread_id when none is supplied, matching the pattern already used in _load_context:

from uuid import uuid4

def invoke(
    self,
    project: Project,
    plan: Plan,
    contexts: list[Context],
    thread_id: str | None = None,
    ...
) -> PlanGenerationState:
    if thread_id is None:
        thread_id = f"plan-gen-{uuid4()}"
    config = {"configurable": {"thread_id": thread_id}}
    ...

Apply the same change to ainvoke and stream. Callers that deliberately want continuity across calls can still pass an explicit stable thread_id.
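
Since the same logic is needed in all three entry-points, one option is to factor it into a small helper. The sketch below is only a suggestion: `resolve_thread_id` and its `prefix` parameter are hypothetical names that do not exist in the codebase, and the `plan-gen` prefix simply mirrors the snippet above.

```python
from typing import Optional
from uuid import uuid4


def resolve_thread_id(thread_id: Optional[str] = None, prefix: str = "plan-gen") -> str:
    """Return the caller's thread_id, or a fresh unique one when none is given.

    Hypothetical helper: invoke, ainvoke, and stream could each call this
    before building their config dict.
    """
    if thread_id is None:
        # uuid4() is random, so every default call gets its own namespace.
        return f"{prefix}-{uuid4()}"
    # An explicit id is passed through unchanged so callers can opt in to continuity.
    return thread_id


first = resolve_thread_id()
second = resolve_thread_id()
explicit = resolve_thread_id("session-42")

config = {"configurable": {"thread_id": first}}
```

Two default calls produce different IDs, while an explicit value such as `"session-42"` is returned as-is, which preserves the opt-in continuity described above.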

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter


HAL9000 added this to the v3.2.0 milestone

2026-04-09 23:04:01 +00:00
