BUG-HUNT: [concurrency] get_provider_registry() global singleton has TOCTOU race condition — concurrent threads can create multiple registries or clobber in-use state #6519

Open
opened 2026-04-09 21:14:31 +00:00 by HAL9000 · 0 comments
Owner

Bug Report: [concurrency] — get_provider_registry() global singleton is not thread-safe

Severity Assessment

  • Impact: In a multi-threaded server process (e.g., ASGI/WSGI web server with concurrent requests), two threads can simultaneously pass the _registry is None check and both create a ProviderRegistry instance. The second write clobbers the first. Additionally, any call with settings is not None replaces the global unconditionally, potentially swapping the registry out from under concurrent users.
  • Likelihood: Medium — only manifests under concurrent load; single-threaded CLI use is safe.
  • Priority: Medium

Location

  • File: src/cleveragents/providers/registry.py
  • Function/Class: get_provider_registry
  • Lines: ~388–403 (the get_provider_registry function)

Description

The module-level _registry singleton is created with a simple check-then-act pattern and no synchronization:

_registry: ProviderRegistry | None = None

def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry:
    global _registry
    if _registry is None or settings is not None:   # ← check (not atomic)
        _registry = ProviderRegistry(settings)       # ← act (not atomic)
    return _registry

Two race conditions exist:

Race 1 — Double initialisation: Two threads call get_provider_registry() concurrently when _registry is None. Both pass the None check before either completes the assignment. Both create a new ProviderRegistry, and whichever writes last wins. The other registry is discarded, but any in-flight work on it (if ProviderRegistry.__init__ had side effects) is wasted.

Race 2 — Unexpected global replacement: The condition settings is not None means any call that passes custom settings silently replaces the global singleton. If Thread A is using the current _registry while Thread B calls get_provider_registry(new_settings), Thread B replaces _registry. All subsequent calls from Thread A's code path will get the new registry, even if Thread A was mid-operation with the original one. This creates non-deterministic provider configuration.

Evidence

# registry.py

_registry: ProviderRegistry | None = None   # module-level, no lock


def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry:
    """Get the global provider registry instance."""
    global _registry
    if _registry is None or settings is not None:
        _registry = ProviderRegistry(settings)   # ← no lock, TOCTOU
    return _registry


def reset_provider_registry() -> None:
    """Reset the global provider registry."""
    global _registry
    _registry = None   # ← also unprotected

Expected Behavior

The singleton should be initialised exactly once under lock. Custom-settings override should be handled explicitly (e.g., always return a new instance without replacing the global, or require explicit opt-in).

Actual Behavior

No synchronization exists. Under concurrent load, multiple ProviderRegistry objects can be created and the global can be replaced mid-use.

Suggested Fix

Protect the singleton with a module-level threading.Lock:

import threading

_registry: ProviderRegistry | None = None
_registry_lock: threading.Lock = threading.Lock()


def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry:
    global _registry
    if settings is not None:
        # Custom settings → always return a fresh instance, don't replace global
        return ProviderRegistry(settings)
    with _registry_lock:
        if _registry is None:
            _registry = ProviderRegistry()
    return _registry


def reset_provider_registry() -> None:
    global _registry
    with _registry_lock:
        _registry = None

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: [concurrency] — `get_provider_registry()` global singleton is not thread-safe ### Severity Assessment - **Impact**: In a multi-threaded server process (e.g., ASGI/WSGI web server with concurrent requests), two threads can simultaneously pass the `_registry is None` check and both create a `ProviderRegistry` instance. The second write clobbers the first. Additionally, any call with `settings is not None` replaces the global unconditionally, potentially swapping the registry out from under concurrent users. - **Likelihood**: Medium — only manifests under concurrent load; single-threaded CLI use is safe. - **Priority**: Medium ### Location - **File**: `src/cleveragents/providers/registry.py` - **Function/Class**: `get_provider_registry` - **Lines**: ~388–403 (the `get_provider_registry` function) ### Description The module-level `_registry` singleton is created with a simple check-then-act pattern and no synchronization: ```python _registry: ProviderRegistry | None = None def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry: global _registry if _registry is None or settings is not None: # ← check (not atomic) _registry = ProviderRegistry(settings) # ← act (not atomic) return _registry ``` Two race conditions exist: **Race 1 — Double initialisation:** Two threads call `get_provider_registry()` concurrently when `_registry is None`. Both pass the `None` check before either completes the assignment. Both create a new `ProviderRegistry`, and whichever writes last wins. The other registry is discarded, but any in-flight work on it (if `ProviderRegistry.__init__` had side effects) is wasted. **Race 2 — Unexpected global replacement:** The condition `settings is not None` means *any* call that passes custom settings silently replaces the global singleton. If Thread A is using the current `_registry` while Thread B calls `get_provider_registry(new_settings)`, Thread B replaces `_registry`. All subsequent calls from Thread A's code path will get the new registry, even if Thread A was mid-operation with the original one. This creates non-deterministic provider configuration. ### Evidence ```python # registry.py _registry: ProviderRegistry | None = None # module-level, no lock def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry: """Get the global provider registry instance.""" global _registry if _registry is None or settings is not None: _registry = ProviderRegistry(settings) # ← no lock, TOCTOU return _registry def reset_provider_registry() -> None: """Reset the global provider registry.""" global _registry _registry = None # ← also unprotected ``` ### Expected Behavior The singleton should be initialised exactly once under lock. Custom-settings override should be handled explicitly (e.g., always return a new instance without replacing the global, or require explicit opt-in). ### Actual Behavior No synchronization exists. Under concurrent load, multiple `ProviderRegistry` objects can be created and the global can be replaced mid-use. ### Suggested Fix Protect the singleton with a module-level `threading.Lock`: ```python import threading _registry: ProviderRegistry | None = None _registry_lock: threading.Lock = threading.Lock() def get_provider_registry(settings: Settings | None = None) -> ProviderRegistry: global _registry if settings is not None: # Custom settings → always return a fresh instance, don't replace global return ProviderRegistry(settings) with _registry_lock: if _registry is None: _registry = ProviderRegistry() return _registry def reset_provider_registry() -> None: global _registry with _registry_lock: _registry = None ``` ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
HAL9000 added this to the v3.2.0 milestone 2026-04-09 21:27:54 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#6519
No description provided.