BUG-HUNT: [concurrency] decomposition_service.py uses an unsynchronised module-level _COUNTER — concurrent decompose() calls produce duplicate node IDs and corrupt node trees #6762

Open
opened 2026-04-10 02:03:05 +00:00 by HAL9000 · 2 comments
Owner

Bug Report: [concurrency] — Module-level _COUNTER in DecompositionService is not thread-safe

Severity Assessment

  • Impact: Concurrent decompose() calls produce non-unique node_id values (e.g., both callers get dn-000001), corrupt the DAG structure, and one call's _reset_counter() silently truncates node IDs being generated by the other call mid-traversal
  • Likelihood: Medium — DecompositionService is a Factory in the DI container (new instance per resolution), but the _COUNTER and _reset_counter() are module-level globals shared across all instances and all threads
  • Priority: High

Location

  • File: src/cleveragents/application/services/decomposition_service.py
  • Function/Class: _next_node_id(), _reset_counter(), DecompositionService.decompose()
  • Lines: ~50–60 (globals), ~123–125 (decompose calls _reset_counter)

Description

decomposition_service.py uses a module-level _COUNTER integer to generate unique node IDs:

# module level — shared across all instances and threads
_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    _COUNTER += 1               # not atomic!
    return f"{prefix}-{_COUNTER:06d}"

def _reset_counter() -> None:
    global _COUNTER
    _COUNTER = 0                # not atomic!

DecompositionService.decompose() always calls _reset_counter() at the start:

def decompose(self, files: list[str], config: ...) -> DecompositionResult:
    _reset_counter()             # resets the SHARED global
    ...

This creates multiple concurrency hazards:

  1. Counter reset race: Thread A starts decompose() and calls _reset_counter(). Thread B is in the middle of _build_hierarchy() generating node IDs via _next_node_id(). Thread A's reset sets _COUNTER = 0, causing Thread B to generate duplicate node IDs starting from 1 again.

  2. Non-atomic increment: _COUNTER += 1 is NOT atomic in CPython under the GIL for mutable references (though it often is for small integers). Under high concurrency, two threads could read the same value of _COUNTER, both increment to the same value, and both return the same node_id.

  3. Instance independence illusion: DecompositionService is a Factory provider — callers expect each instance to be independent. But because _COUNTER is module-global, all instances share state. One call can invalidate another's in-progress decomposition.

Evidence

# src/cleveragents/application/services/decomposition_service.py

# Module-level global — shared across ALL instances and threads
_COUNTER: int = 0

def _next_node_id(prefix: str = "dn") -> str:
    global _COUNTER
    _COUNTER += 1        # ← not thread-safe
    return f"{prefix}-{_COUNTER:06d}"

def _reset_counter() -> None:
    global _COUNTER
    _COUNTER = 0         # ← resets for all threads

class DecompositionService:
    def decompose(self, files: list[str], ...) -> DecompositionResult:
        _reset_counter()  # ← resets shared global at decompose start
        ...
        nodes: list[DecompositionNode] = []
        self._build_hierarchy(files=unique_files, ..., nodes=nodes)
        ...

Expected Behavior

Each decompose() call should produce monotonically increasing, unique node_id values within its own call, completely independent of other concurrent decompose() calls.

Actual Behavior

Concurrent decompose() calls share and reset the same module-level counter, leading to:

  • Duplicate node_id values across concurrent decomposition results
  • Truncated node_id sequences when one call resets the counter mid-traversal
  • Non-deterministic decomposition results under concurrent load

Suggested Fix

Move the counter out of module scope and into the decompose() call context (closure or argument), or make _next_node_id a closure factory per-call:

def decompose(self, files: list[str], config: ...) -> DecompositionResult:
    counter = itertools.count(1)  # local per-call counter, thread-safe
    
    def next_node_id(prefix: str = "dn") -> str:
        return f"{prefix}-{next(counter):06d}"
    ...

Or use itertools.count() at the class instance level, or a threading.local() counter.

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: [concurrency] — Module-level `_COUNTER` in DecompositionService is not thread-safe ### Severity Assessment - **Impact**: Concurrent `decompose()` calls produce non-unique `node_id` values (e.g., both callers get `dn-000001`), corrupt the DAG structure, and one call's `_reset_counter()` silently truncates node IDs being generated by the other call mid-traversal - **Likelihood**: Medium — `DecompositionService` is a `Factory` in the DI container (new instance per resolution), but the `_COUNTER` and `_reset_counter()` are module-level globals shared across all instances and all threads - **Priority**: High ### Location - **File**: `src/cleveragents/application/services/decomposition_service.py` - **Function/Class**: `_next_node_id()`, `_reset_counter()`, `DecompositionService.decompose()` - **Lines**: ~50–60 (globals), ~123–125 (decompose calls _reset_counter) ### Description `decomposition_service.py` uses a module-level `_COUNTER` integer to generate unique node IDs: ```python # module level — shared across all instances and threads _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: global _COUNTER _COUNTER += 1 # not atomic! return f"{prefix}-{_COUNTER:06d}" def _reset_counter() -> None: global _COUNTER _COUNTER = 0 # not atomic! ``` `DecompositionService.decompose()` always calls `_reset_counter()` at the start: ```python def decompose(self, files: list[str], config: ...) -> DecompositionResult: _reset_counter() # resets the SHARED global ... ``` This creates multiple concurrency hazards: 1. **Counter reset race**: Thread A starts `decompose()` and calls `_reset_counter()`. Thread B is in the middle of `_build_hierarchy()` generating node IDs via `_next_node_id()`. Thread A's reset sets `_COUNTER = 0`, causing Thread B to generate duplicate node IDs starting from 1 again. 2. **Non-atomic increment**: `_COUNTER += 1` is NOT atomic in CPython under the GIL for mutable references (though it often is for small integers). Under high concurrency, two threads could read the same value of `_COUNTER`, both increment to the same value, and both return the same `node_id`. 3. **Instance independence illusion**: `DecompositionService` is a `Factory` provider — callers expect each instance to be independent. But because `_COUNTER` is module-global, all instances share state. One call can invalidate another's in-progress decomposition. ### Evidence ```python # src/cleveragents/application/services/decomposition_service.py # Module-level global — shared across ALL instances and threads _COUNTER: int = 0 def _next_node_id(prefix: str = "dn") -> str: global _COUNTER _COUNTER += 1 # ← not thread-safe return f"{prefix}-{_COUNTER:06d}" def _reset_counter() -> None: global _COUNTER _COUNTER = 0 # ← resets for all threads class DecompositionService: def decompose(self, files: list[str], ...) -> DecompositionResult: _reset_counter() # ← resets shared global at decompose start ... nodes: list[DecompositionNode] = [] self._build_hierarchy(files=unique_files, ..., nodes=nodes) ... ``` ### Expected Behavior Each `decompose()` call should produce monotonically increasing, unique `node_id` values within its own call, completely independent of other concurrent `decompose()` calls. ### Actual Behavior Concurrent `decompose()` calls share and reset the same module-level counter, leading to: - Duplicate `node_id` values across concurrent decomposition results - Truncated `node_id` sequences when one call resets the counter mid-traversal - Non-deterministic decomposition results under concurrent load ### Suggested Fix Move the counter out of module scope and into the `decompose()` call context (closure or argument), or make `_next_node_id` a closure factory per-call: ```python def decompose(self, files: list[str], config: ...) -> DecompositionResult: counter = itertools.count(1) # local per-call counter, thread-safe def next_node_id(prefix: str = "dn") -> str: return f"{prefix}-{next(counter):06d}" ... ``` Or use `itertools.count()` at the class instance level, or a `threading.local()` counter. ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
Author
Owner

Label compliance fix applied: Added missing State/Unverified label per CONTRIBUTING.md requirements.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Label compliance fix applied: Added missing `State/Unverified` label per CONTRIBUTING.md requirements. --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
Author
Owner

Verified — Concurrency bug: unsynchronized module-level _COUNTER in decomposition_service.py produces duplicate node IDs. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: unsynchronized module-level _COUNTER in decomposition_service.py produces duplicate node IDs. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#6762
No description provided.