BUG-HUNT: concurrency — DecompositionService module-level _COUNTER is not thread-safe #7727

Open
opened 2026-04-12 03:21:18 +00:00 by HAL9000 · 3 comments
Owner

Bug Report: concurrency — DecompositionService module-level _COUNTER is not thread-safe

Severity Assessment

  • Impact: Concurrent plan decompositions can produce duplicate or out-of-order node IDs, causing two decomposition results to share the same node_id. Downstream services that key on node IDs (child-plan spawning, dependency ordering, audit records) would silently corrupt each other's state.
  • Likelihood: Any production deployment running concurrent plan decompositions (which is the normal case for a multi-user system) will trigger this race. The window is narrow per call but guaranteed to exist under load.
  • Priority: High

Location

  • File: src/cleveragents/application/services/decomposition_service.py
  • Function/Class: _next_node_id, _reset_counter
  • Lines: 52–65

Description

_next_node_id() and _reset_counter() operate on a module-level global integer _COUNTER with no synchronisation. In a multi-threaded environment:

  1. Thread A reads _COUNTER = 5, increments to 6.
  2. Thread B reads _COUNTER = 5 (before A writes), increments to 6.
  3. Both A and B return dn-000006 — a duplicate node ID.

Furthermore, decompose() calls _reset_counter() at the start of every invocation, so a concurrent call can reset the counter while another invocation is mid-decomposition, producing duplicate IDs within a single result.

Evidence

_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    """Generate a deterministic, monotonically increasing node ID."""
    global _COUNTER
    _COUNTER += 1          # NOT atomic — race between threads
    return f"{prefix}-{_COUNTER:06d}"

def _reset_counter() -> None:
    """Reset the node ID counter (for testing)."""
    global _COUNTER
    _COUNTER = 0            # Resets across ALL concurrent decompositions

# In DecompositionService.decompose():
def decompose(self, files, config=None):
    _reset_counter()        # Dangerous: resets shared global mid-flight
    ...
    self._build_hierarchy(...)

_build_hierarchy then calls _next_node_id() repeatedly. Any concurrent invocation hitting _reset_counter() will restart the counter for all in-flight decompositions.

Expected Behavior

Each decomposition invocation should produce globally unique, collision-free node IDs regardless of concurrency.

Actual Behavior

Concurrent decompositions share a module-level counter without locking, producing duplicate node IDs and non-monotonic sequences.

Suggested Fix

Replace the module-level counter with a threading.Lock-protected counter or use itertools.count() with a threading-safe wrapper. Better: use ULID or UUID for node IDs, which are inherently unique without shared state:

import threading
_COUNTER_LOCK = threading.Lock()
_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    with _COUNTER_LOCK:
        _COUNTER += 1
        return f"{prefix}-{_COUNTER:06d}"

Or per-decomposition counter passed as an argument to avoid global state entirely.

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: concurrency — `DecompositionService` module-level `_COUNTER` is not thread-safe ### Severity Assessment - **Impact**: Concurrent plan decompositions can produce duplicate or out-of-order node IDs, causing two decomposition results to share the same `node_id`. Downstream services that key on node IDs (child-plan spawning, dependency ordering, audit records) would silently corrupt each other's state. - **Likelihood**: Any production deployment running concurrent plan decompositions (which is the normal case for a multi-user system) will trigger this race. The window is narrow per call but guaranteed to exist under load. - **Priority**: High ### Location - **File**: `src/cleveragents/application/services/decomposition_service.py` - **Function/Class**: `_next_node_id`, `_reset_counter` - **Lines**: 52–65 ### Description `_next_node_id()` and `_reset_counter()` operate on a module-level global integer `_COUNTER` with no synchronisation. In a multi-threaded environment: 1. Thread A reads `_COUNTER = 5`, increments to 6. 2. Thread B reads `_COUNTER = 5` (before A writes), increments to 6. 3. Both A and B return `dn-000006` — a duplicate node ID. Furthermore, `decompose()` calls `_reset_counter()` at the start of every invocation, so a concurrent call can reset the counter while another invocation is mid-decomposition, producing duplicate IDs within a single result. ### Evidence ```python _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: """Generate a deterministic, monotonically increasing node ID.""" global _COUNTER _COUNTER += 1 # NOT atomic — race between threads return f"{prefix}-{_COUNTER:06d}" def _reset_counter() -> None: """Reset the node ID counter (for testing).""" global _COUNTER _COUNTER = 0 # Resets across ALL concurrent decompositions # In DecompositionService.decompose(): def decompose(self, files, config=None): _reset_counter() # Dangerous: resets shared global mid-flight ... self._build_hierarchy(...) ``` `_build_hierarchy` then calls `_next_node_id()` repeatedly. Any concurrent invocation hitting `_reset_counter()` will restart the counter for all in-flight decompositions. ### Expected Behavior Each decomposition invocation should produce globally unique, collision-free node IDs regardless of concurrency. ### Actual Behavior Concurrent decompositions share a module-level counter without locking, producing duplicate node IDs and non-monotonic sequences. ### Suggested Fix Replace the module-level counter with a threading.Lock-protected counter or use `itertools.count()` with a threading-safe wrapper. Better: use ULID or UUID for node IDs, which are inherently unique without shared state: ```python import threading _COUNTER_LOCK = threading.Lock() _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: global _COUNTER with _COUNTER_LOCK: _COUNTER += 1 return f"{prefix}-{_COUNTER:06d}" ``` Or per-decomposition counter passed as an argument to avoid global state entirely. ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
HAL9000 added this to the v3.2.0 milestone 2026-04-12 03:42:25 +00:00
Author
Owner

Verified — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: DecompositionService module-level _COUNTER not thread-safe. MoSCoW: Must-have. Priority: High — race condition in parallel execution. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7727
No description provided.