BUG: [concurrency] Potential for Race Conditions in _tell_streaming #3725

Open
opened 2026-04-05 22:19:29 +00:00 by freemo · 0 comments
Owner

Background and Context

The _tell_streaming function in src/cleveragents/cli/commands/plan.py (lines 736–842) uses asyncio to handle streaming plan generation with real-time progress display via rich.Live. However, there are no locks or other synchronization mechanisms protecting shared mutable state (e.g., the status Text object). If multiple async events are dispatched concurrently — which is architecturally possible given the A2A streaming protocol and the parallel sub-plan spawning described in the specification — concurrent mutations to status could produce corrupted CLI output or silent data races.

Current Behavior

The _tell_streaming coroutine iterates over plan_service.generate_plan_streaming(...) and mutates a shared rich.Text status object on each event without any locking:

async def _tell_streaming(
    project: Project,
    description: str,
    name: str | None,
    plan_service: Any,
    actor: str | None = None,
) -> None:
    from rich.live import Live
    from rich.text import Text

    status = Text()
    status.append("Starting plan generation...\n\n", style="bold cyan")

    with Live(status, console=console, refresh_per_second=4) as live:
        try:
            async for event in plan_service.generate_plan_streaming(
                project, description, name, actor=actor,
            ):
                # ... mutates `status` without a lock ...
                live.update(status)

If the event source yields events that trigger concurrent callbacks or if the Live refresh timer fires while an event handler is mid-mutation, the status object can be read in a partially-updated state, resulting in corrupted output.

Expected Behavior

The _tell_streaming function should protect all mutations to the shared status object with an asyncio.Lock, ensuring that only one coroutine path modifies the display state at a time, and that live.update(status) is always called on a fully-consistent object.

Acceptance Criteria

  • An asyncio.Lock (or equivalent synchronization primitive) is introduced to guard all read-modify-write operations on the status object inside _tell_streaming.
  • The lock is acquired before any mutation of status and released after live.update(status) is called.
  • No deadlocks are introduced; the lock scope is kept minimal.
  • Existing unit and integration tests for plan tell / _tell_streaming continue to pass.
  • New tests are added to exercise concurrent event delivery and assert output consistency.
  • All nox default sessions pass; coverage ≥ 97%.

Supporting Information

  • File: src/cleveragents/cli/commands/plan.py
  • Function: _tell_streaming
  • Lines: 736–842
  • Severity: Medium — corrupted CLI output; low likelihood (timing-dependent)
  • Category: concurrency
  • Architectural context: The A2A Protocol (message/stream) and parallel sub-plan spawning (subplan_parallel_spawn decision nodes) make concurrent event delivery a realistic scenario as the system matures.

Metadata

  • Branch: fix/concurrency-tell-streaming-race-condition
  • Commit Message: fix(cli): add asyncio lock to _tell_streaming to prevent race conditions
  • Milestone: (backlog — see note below)
  • Parent Epic: #368

Subtasks

  • Audit _tell_streaming for all shared mutable state accessed across async boundaries
  • Introduce asyncio.Lock to guard status mutations and live.update() calls
  • Verify no deadlock scenarios exist under the new locking scheme
  • Tests (pytest/Behave): Add scenario for concurrent event delivery asserting output consistency
  • Verify coverage ≥ 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.

Backlog note: This issue was discovered during autonomous operation
on milestone v3.4.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Background and Context The `_tell_streaming` function in `src/cleveragents/cli/commands/plan.py` (lines 736–842) uses `asyncio` to handle streaming plan generation with real-time progress display via `rich.Live`. However, there are no locks or other synchronization mechanisms protecting shared mutable state (e.g., the `status` `Text` object). If multiple async events are dispatched concurrently — which is architecturally possible given the A2A streaming protocol and the parallel sub-plan spawning described in the specification — concurrent mutations to `status` could produce corrupted CLI output or silent data races. ## Current Behavior The `_tell_streaming` coroutine iterates over `plan_service.generate_plan_streaming(...)` and mutates a shared `rich.Text` `status` object on each event without any locking: ```python async def _tell_streaming( project: Project, description: str, name: str | None, plan_service: Any, actor: str | None = None, ) -> None: from rich.live import Live from rich.text import Text status = Text() status.append("Starting plan generation...\n\n", style="bold cyan") with Live(status, console=console, refresh_per_second=4) as live: try: async for event in plan_service.generate_plan_streaming( project, description, name, actor=actor, ): # ... mutates `status` without a lock ... live.update(status) ``` If the event source yields events that trigger concurrent callbacks or if the `Live` refresh timer fires while an event handler is mid-mutation, the `status` object can be read in a partially-updated state, resulting in corrupted output. ## Expected Behavior The `_tell_streaming` function should protect all mutations to the shared `status` object with an `asyncio.Lock`, ensuring that only one coroutine path modifies the display state at a time, and that `live.update(status)` is always called on a fully-consistent object. ## Acceptance Criteria - [ ] An `asyncio.Lock` (or equivalent synchronization primitive) is introduced to guard all read-modify-write operations on the `status` object inside `_tell_streaming`. - [ ] The lock is acquired before any mutation of `status` and released after `live.update(status)` is called. - [ ] No deadlocks are introduced; the lock scope is kept minimal. - [ ] Existing unit and integration tests for `plan tell` / `_tell_streaming` continue to pass. - [ ] New tests are added to exercise concurrent event delivery and assert output consistency. - [ ] All `nox` default sessions pass; coverage ≥ 97%. ## Supporting Information - **File**: `src/cleveragents/cli/commands/plan.py` - **Function**: `_tell_streaming` - **Lines**: 736–842 - **Severity**: Medium — corrupted CLI output; low likelihood (timing-dependent) - **Category**: concurrency - Architectural context: The A2A Protocol (`message/stream`) and parallel sub-plan spawning (`subplan_parallel_spawn` decision nodes) make concurrent event delivery a realistic scenario as the system matures. --- ## Metadata - **Branch**: `fix/concurrency-tell-streaming-race-condition` - **Commit Message**: `fix(cli): add asyncio lock to _tell_streaming to prevent race conditions` - **Milestone**: *(backlog — see note below)* - **Parent Epic**: #368 ## Subtasks - [ ] Audit `_tell_streaming` for all shared mutable state accessed across async boundaries - [ ] Introduce `asyncio.Lock` to guard `status` mutations and `live.update()` calls - [ ] Verify no deadlock scenarios exist under the new locking scheme - [ ] Tests (pytest/Behave): Add scenario for concurrent event delivery asserting output consistency - [ ] Verify coverage ≥ 97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage ≥ 97%. --- > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.4.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3725
No description provided.