Race condition in LangGraph.execution_history when using parallel execution #8301

Open
opened 2026-04-13 08:17:04 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit Message: fix(concurrency): protect LangGraph.execution_history append with asyncio.Lock
  • Branch Name: bugfix/langgraph-execution-history-race-condition

Background and Context

The LangGraph.execution_history list in src/cleveragents/langgraph/graph.py is appended to inside the async_executor inner function (line 179). When a LangGraph is configured with parallel_execution=True, multiple nodes can execute concurrently via asyncio, and each appends to execution_history as soon as its own await node.execute(state) completes.

Under asyncio, coroutines share a single event-loop thread and switch only at await points, so an individual list.append() call cannot itself be interrupted (and CPython's GIL additionally makes it atomic across threads). However, because every node awaits node.execute(state) before appending, concurrently running nodes interleave and append in completion order, which makes the ordering of the history non-deterministic. More critically, asyncio.gather on its own still runs everything on one thread, but if the implementation is ever extended to use thread-pool executors, or if the history update grows into a multi-step sequence that spans an await, the unprotected mutation becomes a data-corruption risk.

The offending code path:

async def async_executor(msg: StreamMessage) -> StreamMessage:
    state = self.state_manager.get_state()
    updates = await node.execute(state)
    self.state_manager.update_state(updates, node_id=node_name)
    self.execution_history.append(node_name)  # ← not protected by a lock
    return StreamMessage(
        content=updates,
        metadata={
            "node": node_name,
            "execution_count": getattr(node, "execution_count", 0),
            "graph": self.name,
        },
    )
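
The ordering effect described above can be reproduced in isolation. The following is a minimal sketch, not the project's code: run_node and its delay parameter are hypothetical stand-ins for async_executor and node.execute(state), and a plain list stands in for execution_history.

```python
import asyncio


async def run_node(name: str, delay: float, history: list[str]) -> None:
    # Stand-in for `await node.execute(state)`: the coroutine yields here,
    # so other nodes can run while this one is "working".
    await asyncio.sleep(delay)
    # Runs only after the await completes, so entries land in completion order.
    history.append(name)


async def main() -> list[str]:
    history: list[str] = []
    # Nodes are submitted in the order a, b, c but finish in the order c, b, a.
    await asyncio.gather(
        run_node("a", 0.06, history),
        run_node("b", 0.04, history),
        run_node("c", 0.02, history),
    )
    return history


history = asyncio.run(main())
print(history)
```

No entry is lost here, since each append is a single synchronous call; what changes is the order, which follows how long each node takes rather than the order in which the nodes were scheduled.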

Expected Behavior

The execution_history list should be a reliable, consistent log of executed node names. When parallel_execution=True, concurrent node executions must each append their node name atomically, without risk of interleaving, corruption, or lost entries. The history should accurately reflect every node that ran.

Acceptance Criteria

  • An asyncio.Lock (e.g., self._execution_history_lock) is added to the LangGraph.__init__ method.
  • All appends to self.execution_history inside async_executor are wrapped with async with self._execution_history_lock:.
  • The fix is applied consistently — any other locations in graph.py that mutate execution_history are also protected.
  • Unit tests are added that run a LangGraph with parallel_execution=True and at least two concurrent nodes, asserting that all node names appear in execution_history with no duplicates or missing entries.
  • No regressions are introduced in existing LangGraph tests.
  • nox -e lint and nox -e typecheck pass with no new violations.
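
A rough sketch of what the first two criteria could look like, assuming a simplified class: MiniGraph and its run_node method are hypothetical stand-ins for LangGraph and async_executor, and only the lock attribute name is taken from the criteria above.

```python
import asyncio


class MiniGraph:
    """Hypothetical stand-in for LangGraph; only the history logic is modeled."""

    def __init__(self) -> None:
        self.execution_history: list[str] = []
        # Attribute name taken from the acceptance criteria.
        self._execution_history_lock = asyncio.Lock()

    async def run_node(self, node_name: str) -> None:
        # Placeholder for `await node.execute(state)` and the state update.
        await asyncio.sleep(0)
        # Only one coroutine at a time may mutate the history.
        async with self._execution_history_lock:
            self.execution_history.append(node_name)


async def main() -> list[str]:
    # The graph (and its lock) is created inside the running event loop.
    graph = MiniGraph()
    await asyncio.gather(*(graph.run_node(f"node_{i}") for i in range(5)))
    return graph.execution_history


result = asyncio.run(main())
print(result)
```

One design point worth checking during review: asyncio.Lock serializes coroutines on a single event loop and is not thread-safe, so it would not by itself cover a future thread-pool executor path; a threading.Lock would be the usual tool there.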

Subtasks

  • Audit src/cleveragents/langgraph/graph.py for all mutation sites of execution_history
  • Add self._execution_history_lock = asyncio.Lock() to LangGraph.__init__
  • Wrap the self.execution_history.append(node_name) call in async_executor with async with self._execution_history_lock:
  • Check for any other coroutines or methods in graph.py that mutate execution_history and apply the same lock
  • Write a BDD or unit test scenario: parallel graph with 2+ nodes, assert all entries present in history
  • Run full test suite (nox) and confirm no regressions
  • Update CHANGELOG.md with a fix entry under [Unreleased] > Fixed
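
The test subtask could be shaped roughly as follows. This is a sketch under stated assumptions: fake_parallel_run is a hypothetical stand-in for building and running a real LangGraph with parallel_execution=True (whose constructor is not shown in this issue), and the node names are illustrative only.

```python
import asyncio
from collections import Counter


async def fake_parallel_run(node_names: list[str]) -> list[str]:
    """Hypothetical stand-in for running a graph with parallel_execution=True."""
    history: list[str] = []
    lock = asyncio.Lock()

    async def execute(name: str, delay: float) -> None:
        await asyncio.sleep(delay)  # simulated node work
        async with lock:
            history.append(name)

    # Reverse the delays so nodes finish out of submission order.
    count = len(node_names)
    delays = [0.001 * (count - i) for i in range(count)]
    await asyncio.gather(*(execute(n, d) for n, d in zip(node_names, delays)))
    return history


def test_parallel_history_has_every_node_exactly_once() -> None:
    expected = ["fetch", "parse", "summarize"]
    observed = asyncio.run(fake_parallel_run(expected))
    # Compare as multisets: order may vary, but nothing is lost or duplicated.
    assert Counter(observed) == Counter(expected)


test_parallel_history_has_every_node_exactly_once()
```

Comparing Counter objects rather than lists keeps the assertion independent of completion order while still failing on a missing or duplicated entry.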

Definition of Done

This issue should be closed when:

  1. All acceptance criteria above are met and verified.
  2. The asyncio.Lock protection is merged to the appropriate branch.
  3. All existing LangGraph tests pass and new concurrency-focused tests are added.
  4. A peer review has confirmed the fix correctly addresses the race condition without introducing deadlocks or performance regressions.

Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-13 08:23:41 +00:00
Author
Owner

Verified: a race condition in LangGraph.execution_history during parallel execution could corrupt the execution history. Priority: Should Have for v3.5.0, since the fix is important for parallel execution reliability.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

[AUTO-EPIC] Epic Linkage

This issue is a child of Epic #8083 — Hierarchical Plan Decomposition & Parallel Scaling (M6) (v3.5.0).

The LangGraph execution_history race condition in parallel execution is a concurrency safety issue that must be resolved for parallel subplan execution to work correctly.

Dependency direction: This issue (#8301) BLOCKS Epic #8083.


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

Reference
cleveragents/cleveragents-core#8301