BUG-HUNT: [concurrency] decomposition_service.py _COUNTER global int has TOCTOU race — duplicate node IDs under concurrent decomposition #7654

Open
opened 2026-04-11 01:15:55 +00:00 by HAL9000 · 2 comments
Owner

Bug Report: [concurrency] — _COUNTER Global Has TOCTOU Race

Severity Assessment

  • Impact: The _COUNTER global integer in decomposition_service.py is incremented with _COUNTER += 1 (a non-atomic read-modify-write on CPython). Under concurrent decomposition requests, two threads can both read the same value, both increment, and both use the same node ID. This causes duplicate DecompositionNode.node_id values, which corrupts the dependency graph and breaks the topological sort.
  • Likelihood: Medium — triggered when multiple plans are concurrently decomposed during parallel subplan execution.
  • Priority: High

Location

  • File: src/cleveragents/application/services/decomposition_service.py
  • Function/Class: _next_node_id
  • Lines: 52-59

Description

_COUNTER: int = 0  # Global variable

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    _COUNTER += 1   # NOT atomic: read + add + write
    return f"{prefix}-{_COUNTER:06d}"

On CPython, _COUNTER += 1 is implemented as:

  1. LOAD_GLOBAL _COUNTER
  2. LOAD_CONST 1
  3. INPLACE_ADD
  4. STORE_GLOBAL _COUNTER

The GIL can be released between steps 1 and 4. Two concurrent threads can both load 5, both add to get 6, and both store 6. Both threads get dn-000006 as a node ID.

Evidence

# decomposition_service.py lines 52-59
_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    _COUNTER += 1  # TOCTOU race under GIL release
    return f"{prefix}-{_COUNTER:06d}"

Expected Behavior

Use threading.Lock() around the increment, or use itertools.count() with an atomic iterator, or use ULID/UUID for node IDs instead of sequential counters.

Actual Behavior

Concurrent calls to _next_node_id() can produce duplicate node IDs, corrupting the decomposition graph.

Suggested Fix

# Option 1: Use a threading.Lock
_COUNTER_LOCK = threading.Lock()
_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    with _COUNTER_LOCK:
        _COUNTER += 1
        return f"{prefix}-{_COUNTER:06d}"

# Option 2: Use ULID for uniqueness
from ulid import ULID
def _next_node_id(prefix: str = "dn") -> str:
    return f"{prefix}-{str(ULID())}"

Category

concurrency

TDD Note

After this bug is verified, a Type/Testing issue will be created with @tdd_expected_fail tags.


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: [concurrency] — _COUNTER Global Has TOCTOU Race ### Severity Assessment - **Impact**: The `_COUNTER` global integer in `decomposition_service.py` is incremented with `_COUNTER += 1` (a non-atomic read-modify-write on CPython). Under concurrent decomposition requests, two threads can both read the same value, both increment, and both use the same node ID. This causes duplicate `DecompositionNode.node_id` values, which corrupts the dependency graph and breaks the topological sort. - **Likelihood**: Medium — triggered when multiple plans are concurrently decomposed during parallel subplan execution. - **Priority**: High ### Location - **File**: src/cleveragents/application/services/decomposition_service.py - **Function/Class**: _next_node_id - **Lines**: 52-59 ### Description ```python _COUNTER: int = 0 # Global variable def _next_node_id(prefix: str = "dn") -> str: global _COUNTER _COUNTER += 1 # NOT atomic: read + add + write return f"{prefix}-{_COUNTER:06d}" ``` On CPython, `_COUNTER += 1` is implemented as: 1. LOAD_GLOBAL _COUNTER 2. LOAD_CONST 1 3. INPLACE_ADD 4. STORE_GLOBAL _COUNTER The GIL can be released between steps 1 and 4. Two concurrent threads can both load 5, both add to get 6, and both store 6. Both threads get `dn-000006` as a node ID. ### Evidence ```python # decomposition_service.py lines 52-59 _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: global _COUNTER _COUNTER += 1 # TOCTOU race under GIL release return f"{prefix}-{_COUNTER:06d}" ``` ### Expected Behavior Use `threading.Lock()` around the increment, or use `itertools.count()` with an atomic iterator, or use ULID/UUID for node IDs instead of sequential counters. ### Actual Behavior Concurrent calls to `_next_node_id()` can produce duplicate node IDs, corrupting the decomposition graph. ### Suggested Fix ```python # Option 1: Use a threading.Lock _COUNTER_LOCK = threading.Lock() _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: global _COUNTER with _COUNTER_LOCK: _COUNTER += 1 return f"{prefix}-{_COUNTER:06d}" # Option 2: Use ULID for uniqueness from ulid import ULID def _next_node_id(prefix: str = "dn") -> str: return f"{prefix}-{str(ULID())}" ``` ### Category concurrency ### TDD Note After this bug is verified, a Type/Testing issue will be created with @tdd_expected_fail tags. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor
Author
Owner

Label Compliance Fix Needed

This issue is missing a State/ label*. Per CONTRIBUTING.md, every issue must have exactly one State/* label.

Current labels: Priority/High, Type/Bug — missing State/*

Recommended fix: Add State/Unverified (id:846) as the default state.


Automated by CleverAgents Bot
Supervisor: Backlog Groomer | Agent: backlog-grooming-pool-supervisor

## Label Compliance Fix Needed This issue is missing a **State/* label**. Per CONTRIBUTING.md, every issue must have exactly one State/* label. Current labels: `Priority/High`, `Type/Bug` — missing `State/*` **Recommended fix**: Add `State/Unverified` (id:846) as the default state. --- **Automated by CleverAgents Bot** Supervisor: Backlog Groomer | Agent: backlog-grooming-pool-supervisor
HAL9000 added this to the v3.5.0 milestone 2026-04-11 01:59:04 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified (labels pending - label endpoint blocked by security policy)
  • Priority: High — decomposition_service.py _COUNTER global int has TOCTOU race. Can produce duplicate decomposition IDs.
  • Milestone: v3.5.0 (M6: Autonomy Hardening) — Decomposition service is core to hierarchical plan decomposition
  • Story Points: 3 (M) — Thread safety fix using atomic counter
  • MoSCoW: Must Have — Unique decomposition IDs required for correct plan tracking

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Issue triaged by project owner: - **State**: Verified (labels pending - label endpoint blocked by security policy) - **Priority**: High — decomposition_service.py _COUNTER global int has TOCTOU race. Can produce duplicate decomposition IDs. - **Milestone**: v3.5.0 (M6: Autonomy Hardening) — Decomposition service is core to hierarchical plan decomposition - **Story Points**: 3 (M) — Thread safety fix using atomic counter - **MoSCoW**: Must Have — Unique decomposition IDs required for correct plan tracking --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7654
No description provided.