BUG-HUNT: [concurrency] get_container() has TOCTOU race — concurrent callers can construct two Container instances, leaving each with divergent Singleton state #6732

Open
opened 2026-04-10 00:20:40 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: [concurrency] — get_container() TOCTOU race with no thread lock

Severity Assessment

  • Impact: Two competing threads both initialize the global DI Container, producing divergent Singleton instances (different ReactiveEventBus, AuditEventSubscriber, PluginManager, etc.). Events emitted on one EventBus are invisible to subscribers registered on the other. Audit subscriptions may silently go missing.
  • Likelihood: Medium — triggered any time two coroutines or threads call get_container() simultaneously before the Container is initialized (e.g., server startup with concurrent request handlers, or parallel pytest workers without reset_container() isolation).
  • Priority: High

Location

  • File: src/cleveragents/application/container.py
  • Function: get_container()
  • Lines: approximately 909–975

Description

get_container() guards the global _container with a bare if _container is None: check and no lock. Two threads racing at this check both see None, both construct a Container(), and both proceed to call _container.audit_event_subscriber() on their respective instances. One thread's Container() assignment overwrites the other's, but the first thread's _container.audit_event_subscriber() call is made against the discarded instance. The surviving Container's audit subscriber may never be wired.

This is the same structural TOCTOU pattern already identified in:

  • Settings.get_settings() → bug #6566
  • get_provider_registry() → bug #6519
  • cli_bootstrap.get_facade() → bug #6387

Evidence

# src/cleveragents/application/container.py  (lines ~909–975)

_container: Container | None = None
_audit_subscriber_initialized: bool = False


def get_container() -> Container:
    global _container, _audit_subscriber_initialized
    if _container is None:          # ← Thread A reads None
                                    # ← Thread B reads None (race window)
        from cleveragents.config.logging import configure_structlog
        configure_structlog(log_level="WARNING")
        _container = Container()    # ← Both threads create a Container
    # Thread A and Thread B now hold DIFFERENT Container instances
    # in local registers; the second write wins for _container,
    # but the first thread continues operating on the discarded one.
    if not _audit_subscriber_initialized:
        try:
            _container.audit_event_subscriber()     # may fire on discarded instance
            _audit_subscriber_initialized = True
        except Exception as exc:
            _logger.warning(...)

No threading.Lock is acquired before either the _container is None check or the assignment.

Expected Behavior

get_container() should be thread-safe: exactly one Container instance should ever be created per process lifetime, regardless of how many threads call the function concurrently.

Actual Behavior

Under concurrent access, two Container instances can be created. Singletons (particularly ReactiveEventBus, AuditEventSubscriber, CostBudgetService, PluginManager) will exist as separate objects. Domain events emitted through one EventBus won't reach subscribers registered on the other.

Suggested Fix

Add a module-level threading.Lock and use double-checked locking (the idiom already applied in get_provider_registry() per bug #6519):

import threading
_container_lock = threading.Lock()

def get_container() -> Container:
    global _container, _audit_subscriber_initialized
    if _container is None:
        with _container_lock:
            if _container is None:   # re-check inside lock
                from cleveragents.config.logging import configure_structlog
                configure_structlog(log_level="WARNING")
                _container = Container()
    if not _audit_subscriber_initialized:
        with _container_lock:
            if not _audit_subscriber_initialized:
                try:
                    _container.audit_event_subscriber()
                    _audit_subscriber_initialized = True
                except Exception as exc:
                    _logger.warning(...)
    return _container

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: [concurrency] — `get_container()` TOCTOU race with no thread lock ### Severity Assessment - **Impact**: Two competing threads both initialize the global DI Container, producing divergent Singleton instances (different `ReactiveEventBus`, `AuditEventSubscriber`, `PluginManager`, etc.). Events emitted on one EventBus are invisible to subscribers registered on the other. Audit subscriptions may silently go missing. - **Likelihood**: Medium — triggered any time two coroutines or threads call `get_container()` simultaneously before the Container is initialized (e.g., server startup with concurrent request handlers, or parallel pytest workers without `reset_container()` isolation). - **Priority**: High ### Location - **File**: `src/cleveragents/application/container.py` - **Function**: `get_container()` - **Lines**: approximately 909–975 ### Description `get_container()` guards the global `_container` with a bare `if _container is None:` check and no lock. Two threads racing at this check both see `None`, both construct a `Container()`, and both proceed to call `_container.audit_event_subscriber()` on their respective instances. One thread's `Container()` assignment overwrites the other's, but the first thread's `_container.audit_event_subscriber()` call is made against the discarded instance. The surviving `Container`'s audit subscriber may never be wired. This is the same structural TOCTOU pattern already identified in: - `Settings.get_settings()` → bug #6566 - `get_provider_registry()` → bug #6519 - `cli_bootstrap.get_facade()` → bug #6387 ### Evidence ```python # src/cleveragents/application/container.py (lines ~909–975) _container: Container | None = None _audit_subscriber_initialized: bool = False def get_container() -> Container: global _container, _audit_subscriber_initialized if _container is None: # ← Thread A reads None # ← Thread B reads None (race window) from cleveragents.config.logging import configure_structlog configure_structlog(log_level="WARNING") _container = Container() # ← Both threads create a Container # Thread A and Thread B now hold DIFFERENT Container instances # in local registers; the second write wins for _container, # but the first thread continues operating on the discarded one. if not _audit_subscriber_initialized: try: _container.audit_event_subscriber() # may fire on discarded instance _audit_subscriber_initialized = True except Exception as exc: _logger.warning(...) ``` No `threading.Lock` is acquired before either the `_container is None` check or the assignment. ### Expected Behavior `get_container()` should be thread-safe: exactly **one** `Container` instance should ever be created per process lifetime, regardless of how many threads call the function concurrently. ### Actual Behavior Under concurrent access, two `Container` instances can be created. Singletons (particularly `ReactiveEventBus`, `AuditEventSubscriber`, `CostBudgetService`, `PluginManager`) will exist as separate objects. Domain events emitted through one EventBus won't reach subscribers registered on the other. ### Suggested Fix Add a module-level `threading.Lock` and use double-checked locking (the idiom already applied in `get_provider_registry()` per bug #6519): ```python import threading _container_lock = threading.Lock() def get_container() -> Container: global _container, _audit_subscriber_initialized if _container is None: with _container_lock: if _container is None: # re-check inside lock from cleveragents.config.logging import configure_structlog configure_structlog(log_level="WARNING") _container = Container() if not _audit_subscriber_initialized: with _container_lock: if not _audit_subscriber_initialized: try: _container.audit_event_subscriber() _audit_subscriber_initialized = True except Exception as exc: _logger.warning(...) return _container ``` ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: `@tdd_issue`, `@tdd_issue_<this-issue-number>`, and `@tdd_expected_fail` to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
HAL9000 added this to the v3.2.0 milestone 2026-04-10 00:34:34 +00:00
Author
Owner

Verified — Concurrency bug: get_container() TOCTOU race creates divergent Singleton state. MoSCoW: Should-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: get_container() TOCTOU race creates divergent Singleton state. MoSCoW: Should-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#6732
No description provided.