BUG-HUNT: [concurrency] Unsafe global state in CLI modules causes race conditions in multi-threaded usage #7099

Open
opened 2026-04-10 07:41:37 +00:00 by HAL9000 · 2 comments
Owner

Background and Context

Multiple CLI modules use unsynchronized module-level global state variables (lazy-initialized singletons) without any thread-safety mechanisms. While the CLI is primarily invoked as a single-process tool, the pattern is unsafe in any multi-threaded or async context — including test harnesses, embedded usage, concurrent CLI invocations sharing a process, or future async CLI evolution.

The affected modules are:

  • src/cleveragents/cli/main.py_console, _err_console, _subcommands_registered
  • src/cleveragents/cli/commands/invariant.py_service
  • src/cleveragents/cli/commands/lsp.py — global service state
  • src/cleveragents/cli/commands/session.py — global service state
  • src/cleveragents/cli/output/_ids.py — global ID state

Current Behavior (Bug)

All affected modules use the classic unsynchronized lazy-init pattern:

# main.py — global console state (lines 19–21, 36, 46, 76):
_console: "Console | None" = None
_err_console: "Console | None" = None
_subcommands_registered = False

def get_console() -> "Console":
    global _console          # Not thread-safe
    if _console is None:
        from rich.console import Console
        _console = Console()  # TOCTOU: two threads can both see None and both construct

# invariant.py — global service state (line 64):
_service: InvariantService | None = None

def _get_service() -> InvariantService:
    global _service          # Not thread-safe
    if _service is None:
        _service = InvariantService()  # Two instances can be created concurrently

# Similar patterns in lsp.py (lines 82, 90), session.py (lines 63, 77),
# and output/_ids.py (lines 24, 32, 40)

In a concurrent scenario (e.g., two threads calling get_console() simultaneously), both threads can observe _console is None, both construct a Console instance, and one silently overwrites the other — leaving the first thread holding a reference to a now-abandoned object. For stateful objects like service singletons, this can cause data loss, duplicate initialization side-effects, or inconsistent state.

Expected Behavior

Global state that must be shared across call sites should be properly synchronized. Acceptable approaches include:

  1. threading.Lock guards around the check-and-set pattern (double-checked locking with a lock)
  2. threading.local() for per-thread instances (appropriate for Console objects)
  3. Dependency injection — pass service instances explicitly rather than relying on module-level globals (preferred per SOLID/DIP principles in CONTRIBUTING.md)
  4. Module-level eager initialization where the object is safe to construct at import time (eliminates the TOCTOU window entirely)

Affected Files and Lines

File Lines Global Variable(s)
src/cleveragents/cli/main.py 19–21, 36, 46, 76 _console, _err_console, _subcommands_registered
src/cleveragents/cli/commands/invariant.py 64 _service
src/cleveragents/cli/commands/lsp.py 82, 90 service singleton(s)
src/cleveragents/cli/commands/session.py 63, 77 service singleton(s)
src/cleveragents/cli/output/_ids.py 24, 32, 40 ID state globals

Suggested Fix

For Console objects (I/O-bound, not shared state): use threading.local() so each thread gets its own instance.

For service singletons: prefer dependency injection via the DI container already present in the codebase. If a module-level singleton is truly required, guard with a threading.Lock:

import threading

_console_lock = threading.Lock()
_console: "Console | None" = None

def get_console() -> "Console":
    global _console
    if _console is None:
        with _console_lock:
            if _console is None:  # double-checked locking
                from rich.console import Console
                _console = Console()
    return _console

The preferred long-term fix is to eliminate module-level globals entirely and pass dependencies explicitly, consistent with the DI patterns already used elsewhere in the codebase.

Acceptance Criteria

  • All identified global state variables in the five affected files are either removed, made thread-safe, or replaced with DI-injected dependencies
  • No new unsynchronized module-level mutable globals are introduced
  • Existing BDD unit tests continue to pass
  • New BDD scenarios cover concurrent access to the affected accessors (verifying no double-initialization occurs)
  • All nox stages pass; coverage ≥ 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Metadata

  • Branch: bugfix/cli-global-state-thread-safety
  • Commit Message: fix(cli): add thread-safety guards to global state accessors in CLI modules
  • Milestone: (backlog — no milestone assigned)
  • Parent Epic: (to be linked — see orphan note below)

Subtasks

  • Audit all five affected files and catalogue every unsynchronized global variable
  • Add threading.Lock double-checked locking to get_console() and get_err_console() in main.py, or switch to threading.local()
  • Fix _subcommands_registered flag in main.py with a lock guard
  • Replace _service global in invariant.py with DI-injected dependency or add lock guard
  • Fix global service accessors in lsp.py (lines 82, 90)
  • Fix global service accessors in session.py (lines 63, 77)
  • Fix global ID state in output/_ids.py (lines 24, 32, 40)
  • Write BDD scenarios (@tdd_issue @tdd_issue_<N>) covering concurrent accessor calls
  • Run nox full suite and confirm all stages pass
  • Update documentation if any public API changes

Definition of Done

  • All five affected files have no unsynchronized module-level mutable globals
  • Thread-safety mechanism chosen is consistent with the existing DI/architecture patterns
  • BDD regression scenarios added and tagged @tdd_issue @tdd_issue_<N>
  • nox -s typecheck passes (Pyright strict — no # type: ignore)
  • nox -s lint and nox -s format pass
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: concurrency | Agent: new-issue-creator

## Background and Context Multiple CLI modules use unsynchronized module-level global state variables (lazy-initialized singletons) without any thread-safety mechanisms. While the CLI is primarily invoked as a single-process tool, the pattern is unsafe in any multi-threaded or async context — including test harnesses, embedded usage, concurrent CLI invocations sharing a process, or future async CLI evolution. The affected modules are: - `src/cleveragents/cli/main.py` — `_console`, `_err_console`, `_subcommands_registered` - `src/cleveragents/cli/commands/invariant.py` — `_service` - `src/cleveragents/cli/commands/lsp.py` — global service state - `src/cleveragents/cli/commands/session.py` — global service state - `src/cleveragents/cli/output/_ids.py` — global ID state ## Current Behavior (Bug) All affected modules use the classic unsynchronized lazy-init pattern: ```python # main.py — global console state (lines 19–21, 36, 46, 76): _console: "Console | None" = None _err_console: "Console | None" = None _subcommands_registered = False def get_console() -> "Console": global _console # Not thread-safe if _console is None: from rich.console import Console _console = Console() # TOCTOU: two threads can both see None and both construct # invariant.py — global service state (line 64): _service: InvariantService | None = None def _get_service() -> InvariantService: global _service # Not thread-safe if _service is None: _service = InvariantService() # Two instances can be created concurrently # Similar patterns in lsp.py (lines 82, 90), session.py (lines 63, 77), # and output/_ids.py (lines 24, 32, 40) ``` In a concurrent scenario (e.g., two threads calling `get_console()` simultaneously), both threads can observe `_console is None`, both construct a `Console` instance, and one silently overwrites the other — leaving the first thread holding a reference to a now-abandoned object. For stateful objects like service singletons, this can cause data loss, duplicate initialization side-effects, or inconsistent state. ## Expected Behavior Global state that must be shared across call sites should be properly synchronized. Acceptable approaches include: 1. **`threading.Lock` guards** around the check-and-set pattern (double-checked locking with a lock) 2. **`threading.local()`** for per-thread instances (appropriate for `Console` objects) 3. **Dependency injection** — pass service instances explicitly rather than relying on module-level globals (preferred per SOLID/DIP principles in CONTRIBUTING.md) 4. **Module-level eager initialization** where the object is safe to construct at import time (eliminates the TOCTOU window entirely) ## Affected Files and Lines | File | Lines | Global Variable(s) | |------|-------|--------------------| | `src/cleveragents/cli/main.py` | 19–21, 36, 46, 76 | `_console`, `_err_console`, `_subcommands_registered` | | `src/cleveragents/cli/commands/invariant.py` | 64 | `_service` | | `src/cleveragents/cli/commands/lsp.py` | 82, 90 | service singleton(s) | | `src/cleveragents/cli/commands/session.py` | 63, 77 | service singleton(s) | | `src/cleveragents/cli/output/_ids.py` | 24, 32, 40 | ID state globals | ## Suggested Fix For `Console` objects (I/O-bound, not shared state): use `threading.local()` so each thread gets its own instance. For service singletons: prefer dependency injection via the DI container already present in the codebase. If a module-level singleton is truly required, guard with a `threading.Lock`: ```python import threading _console_lock = threading.Lock() _console: "Console | None" = None def get_console() -> "Console": global _console if _console is None: with _console_lock: if _console is None: # double-checked locking from rich.console import Console _console = Console() return _console ``` The preferred long-term fix is to eliminate module-level globals entirely and pass dependencies explicitly, consistent with the DI patterns already used elsewhere in the codebase. ## Acceptance Criteria - All identified global state variables in the five affected files are either removed, made thread-safe, or replaced with DI-injected dependencies - No new unsynchronized module-level mutable globals are introduced - Existing BDD unit tests continue to pass - New BDD scenarios cover concurrent access to the affected accessors (verifying no double-initialization occurs) - All nox stages pass; coverage ≥ 97% --- > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.2.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. ## Metadata - **Branch**: `bugfix/cli-global-state-thread-safety` - **Commit Message**: `fix(cli): add thread-safety guards to global state accessors in CLI modules` - **Milestone**: *(backlog — no milestone assigned)* - **Parent Epic**: *(to be linked — see orphan note below)* ## Subtasks - [ ] Audit all five affected files and catalogue every unsynchronized global variable - [ ] Add `threading.Lock` double-checked locking to `get_console()` and `get_err_console()` in `main.py`, or switch to `threading.local()` - [ ] Fix `_subcommands_registered` flag in `main.py` with a lock guard - [ ] Replace `_service` global in `invariant.py` with DI-injected dependency or add lock guard - [ ] Fix global service accessors in `lsp.py` (lines 82, 90) - [ ] Fix global service accessors in `session.py` (lines 63, 77) - [ ] Fix global ID state in `output/_ids.py` (lines 24, 32, 40) - [ ] Write BDD scenarios (`@tdd_issue @tdd_issue_<N>`) covering concurrent accessor calls - [ ] Run `nox` full suite and confirm all stages pass - [ ] Update documentation if any public API changes ## Definition of Done - [ ] All five affected files have no unsynchronized module-level mutable globals - [ ] Thread-safety mechanism chosen is consistent with the existing DI/architecture patterns - [ ] BDD regression scenarios added and tagged `@tdd_issue @tdd_issue_<N>` - [ ] `nox -s typecheck` passes (Pyright strict — no `# type: ignore`) - [ ] `nox -s lint` and `nox -s format` pass - [ ] All nox stages pass - [ ] Coverage ≥ 97% --- **Automated by CleverAgents Bot** Supervisor: concurrency | Agent: new-issue-creator
Author
Owner

⚠️ Orphan Issue — Manual Parent Epic Linking Required

This issue was created by an automated agent and could not be automatically linked to a parent Epic. Per CONTRIBUTING.md, all atomic issues (leaf nodes) must belong to at least one parent Epic.

Action required for a maintainer:

  1. Identify or create an appropriate parent Epic covering CLI concurrency/thread-safety or CLI code quality
  2. Link this issue as a child using Forgejo's dependency system: this issue blocks the parent Epic (the Epic cannot be marked complete until this issue is resolved)
  3. The dependency direction is: child blocks parent (not the other way around)

Suggested parent Epics to consider:

  • An existing CLI architecture/quality Epic if one exists
  • A new Epic grouping all CLI global-state thread-safety fixes (there are related issues: #7069, #7039, #6771, #6762, #6732, #6566 cover similar concurrency patterns in other modules)

Also required per CONTRIBUTING.md Bug Fix Workflow:

  • A companion TDD: issue (Type/Testing) must be created that blocks this bug issue
  • The TDD issue should contain a @tdd_issue @tdd_issue_7099 @tdd_expected_fail scenario that fails until the fix is implemented

Automated by CleverAgents Bot
Supervisor: concurrency | Agent: new-issue-creator

⚠️ **Orphan Issue — Manual Parent Epic Linking Required** This issue was created by an automated agent and could not be automatically linked to a parent Epic. Per CONTRIBUTING.md, all atomic issues (leaf nodes) **must** belong to at least one parent Epic. **Action required for a maintainer:** 1. Identify or create an appropriate parent Epic covering CLI concurrency/thread-safety or CLI code quality 2. Link this issue as a child using Forgejo's dependency system: this issue **blocks** the parent Epic (the Epic cannot be marked complete until this issue is resolved) 3. The dependency direction is: **child blocks parent** (not the other way around) **Suggested parent Epics to consider:** - An existing CLI architecture/quality Epic if one exists - A new Epic grouping all CLI global-state thread-safety fixes (there are related issues: #7069, #7039, #6771, #6762, #6732, #6566 cover similar concurrency patterns in other modules) **Also required per CONTRIBUTING.md Bug Fix Workflow:** - A companion `TDD:` issue (`Type/Testing`) must be created that **blocks** this bug issue - The TDD issue should contain a `@tdd_issue @tdd_issue_7099 @tdd_expected_fail` scenario that fails until the fix is implemented --- **Automated by CleverAgents Bot** Supervisor: concurrency | Agent: new-issue-creator
Author
Owner

Verified — Critical concurrency bug: unsafe global state in CLI modules. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical concurrency bug: unsafe global state in CLI modules. MoSCoW: Must-have. Priority: Critical. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7099
No description provided.