BUG-HUNT: [concurrency] MEMORY_ENGINES global dict in engine_cache.py has TOCTOU race — two threads can create duplicate engines for same URL #7566

Open
opened 2026-04-10 22:06:22 +00:00 by HAL9000 · 2 comments
Owner

Bug Report: [concurrency] — MEMORY_ENGINES Global Dict Has TOCTOU Race

Severity Assessment

  • Impact: Two threads concurrently creating UnitOfWork instances with the same in-memory SQLite URL can both evaluate if self.database_url not in MEMORY_ENGINES as True, then both call create_engine(), with the second write overwriting the first. The first UnitOfWork instance then holds a reference to a discarded engine, causing all subsequent DB operations on that instance to use a stale/disconnected engine.
  • Likelihood: Low — only affects concurrent UnitOfWork creation for the same in-memory SQLite URL, most common in tests.
  • Priority: Medium

Location

  • File: src/cleveragents/infrastructure/database/unit_of_work.py
  • Also: src/cleveragents/infrastructure/database/engine_cache.py
  • Lines: unit_of_work.py:76-84

Description

The MEMORY_ENGINES global dict is a plain dict[str, Engine] with no locking:

# engine_cache.py
MEMORY_ENGINES: dict[str, Engine] = {}  # NO LOCK

In unit_of_work.py, the pattern is:

if self.database_url not in MEMORY_ENGINES:   # Thread A: checks, sees False
    # Thread B also checks here, also sees False
    MEMORY_ENGINES[self.database_url] = create_engine(...)  # Thread A writes
    # Thread B also writes, overwriting Thread A entry
self._engine = MEMORY_ENGINES[self.database_url]  # Thread A gets Thread B engine!

Thread A then has a reference to the engine it created, but MEMORY_ENGINES contains Thread B's engine. Thread A's UnitOfWork will use a different engine than what is cached.

Evidence

# unit_of_work.py lines 76-84
if self.database_url not in MEMORY_ENGINES:  # TOCTOU check
    MEMORY_ENGINES[self.database_url] = create_engine(  # TOCTOU write
        self.database_url, ...
    )
self._engine = MEMORY_ENGINES[self.database_url]

Expected Behavior

The check-and-set should be atomic, using a threading.Lock around the entire read-modify-write:

_MEMORY_ENGINES_LOCK = threading.Lock()

with _MEMORY_ENGINES_LOCK:
    if self.database_url not in MEMORY_ENGINES:
        MEMORY_ENGINES[self.database_url] = create_engine(...)
    self._engine = MEMORY_ENGINES[self.database_url]

Actual Behavior

TOCTOU race allows duplicate engine creation, causing the first thread to hold a stale engine reference.

Suggested Fix

Add a module-level _MEMORY_ENGINES_LOCK = threading.Lock() in engine_cache.py and use it when accessing MEMORY_ENGINES in unit_of_work.py.

Category

concurrency

TDD Note

After this bug is verified, a Type/Testing issue will be created with @tdd_expected_fail tags.


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: [concurrency] — MEMORY_ENGINES Global Dict Has TOCTOU Race ### Severity Assessment - **Impact**: Two threads concurrently creating UnitOfWork instances with the same in-memory SQLite URL can both evaluate `if self.database_url not in MEMORY_ENGINES` as True, then both call `create_engine()`, with the second write overwriting the first. The first UnitOfWork instance then holds a reference to a discarded engine, causing all subsequent DB operations on that instance to use a stale/disconnected engine. - **Likelihood**: Low — only affects concurrent UnitOfWork creation for the same in-memory SQLite URL, most common in tests. - **Priority**: Medium ### Location - **File**: src/cleveragents/infrastructure/database/unit_of_work.py - **Also**: src/cleveragents/infrastructure/database/engine_cache.py - **Lines**: unit_of_work.py:76-84 ### Description The MEMORY_ENGINES global dict is a plain `dict[str, Engine]` with no locking: ```python # engine_cache.py MEMORY_ENGINES: dict[str, Engine] = {} # NO LOCK ``` In unit_of_work.py, the pattern is: ```python if self.database_url not in MEMORY_ENGINES: # Thread A: checks, sees False # Thread B also checks here, also sees False MEMORY_ENGINES[self.database_url] = create_engine(...) # Thread A writes # Thread B also writes, overwriting Thread A entry self._engine = MEMORY_ENGINES[self.database_url] # Thread A gets Thread B engine! ``` Thread A then has a reference to the engine it created, but MEMORY_ENGINES contains Thread B's engine. Thread A's UnitOfWork will use a different engine than what is cached. ### Evidence ```python # unit_of_work.py lines 76-84 if self.database_url not in MEMORY_ENGINES: # TOCTOU check MEMORY_ENGINES[self.database_url] = create_engine( # TOCTOU write self.database_url, ... ) self._engine = MEMORY_ENGINES[self.database_url] ``` ### Expected Behavior The check-and-set should be atomic, using a threading.Lock around the entire read-modify-write: ```python _MEMORY_ENGINES_LOCK = threading.Lock() with _MEMORY_ENGINES_LOCK: if self.database_url not in MEMORY_ENGINES: MEMORY_ENGINES[self.database_url] = create_engine(...) self._engine = MEMORY_ENGINES[self.database_url] ``` ### Actual Behavior TOCTOU race allows duplicate engine creation, causing the first thread to hold a stale engine reference. ### Suggested Fix Add a module-level `_MEMORY_ENGINES_LOCK = threading.Lock()` in engine_cache.py and use it when accessing MEMORY_ENGINES in unit_of_work.py. ### Category concurrency ### TDD Note After this bug is verified, a Type/Testing issue will be created with @tdd_expected_fail tags. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor
HAL9000 added this to the v3.5.0 milestone 2026-04-10 23:05:49 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Backlog — Minor bug or optimization that does not block milestone delivery
  • Milestone: Assigned to appropriate milestone for future work
  • Story Points: 2 (S) — Small fix
  • MoSCoW: Could Have — Nice to fix but not blocking

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Issue triaged by project owner: - **State**: Verified - **Priority**: Backlog — Minor bug or optimization that does not block milestone delivery - **Milestone**: Assigned to appropriate milestone for future work - **Story Points**: 2 (S) — Small fix - **MoSCoW**: Could Have — Nice to fix but not blocking --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Implementation Attempt — Tier 1: haiku — Success

[AUTO-IMP-ISSUE-7566]

What Was Done

Fixed the TOCTOU race condition in MEMORY_ENGINES by adding a threading.Lock to engine_cache.py and wrapping the check-and-set operation in unit_of_work.py.

Changes Made

  1. src/cleveragents/infrastructure/database/engine_cache.py

    • Added MEMORY_ENGINES_LOCK = threading.Lock() — a module-level lock protecting all read-modify-write operations on MEMORY_ENGINES
  2. src/cleveragents/infrastructure/database/unit_of_work.py

    • Imported MEMORY_ENGINES_LOCK from engine_cache
    • Wrapped the in-memory SQLite engine check-and-set with with MEMORY_ENGINES_LOCK: to make it atomic
  3. features/tdd_engine_cache_toctou.feature — New BDD feature with 4 scenarios:

    • Lock is exported from engine_cache
    • Concurrent UnitOfWork instances share a single cached engine
    • Lock is acquired during cache population
    • Cache-hit path regression test
  4. features/steps/tdd_engine_cache_toctou_steps.py — Step definitions

  5. CHANGELOG.md — Added fix entry under ### Fixed

Quality Gates

  • nox -e lint — All checks passed
  • nox -e typecheck — 0 errors (3 pre-existing warnings)
  • nox -e unit_tests -- features/tdd_engine_cache_toctou.feature — 4/4 scenarios pass
  • nox -e unit_tests -- features/uow_coverage_boost.feature features/uow_lifecycle.feature — 14/14 scenarios pass (no regressions)

PR

PR #8265 created: #8265

Branch: fix/7566-engine-cache-toctou-race


Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-pool-supervisor

**Implementation Attempt** — Tier 1: haiku — **Success** ✅ [AUTO-IMP-ISSUE-7566] ## What Was Done Fixed the TOCTOU race condition in `MEMORY_ENGINES` by adding a `threading.Lock` to `engine_cache.py` and wrapping the check-and-set operation in `unit_of_work.py`. ### Changes Made 1. **`src/cleveragents/infrastructure/database/engine_cache.py`** - Added `MEMORY_ENGINES_LOCK = threading.Lock()` — a module-level lock protecting all read-modify-write operations on `MEMORY_ENGINES` 2. **`src/cleveragents/infrastructure/database/unit_of_work.py`** - Imported `MEMORY_ENGINES_LOCK` from `engine_cache` - Wrapped the in-memory SQLite engine check-and-set with `with MEMORY_ENGINES_LOCK:` to make it atomic 3. **`features/tdd_engine_cache_toctou.feature`** — New BDD feature with 4 scenarios: - Lock is exported from engine_cache - Concurrent UnitOfWork instances share a single cached engine - Lock is acquired during cache population - Cache-hit path regression test 4. **`features/steps/tdd_engine_cache_toctou_steps.py`** — Step definitions 5. **`CHANGELOG.md`** — Added fix entry under `### Fixed` ### Quality Gates - ✅ `nox -e lint` — All checks passed - ✅ `nox -e typecheck` — 0 errors (3 pre-existing warnings) - ✅ `nox -e unit_tests -- features/tdd_engine_cache_toctou.feature` — 4/4 scenarios pass - ✅ `nox -e unit_tests -- features/uow_coverage_boost.feature features/uow_lifecycle.feature` — 14/14 scenarios pass (no regressions) ### PR PR #8265 created: https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/8265 Branch: `fix/7566-engine-cache-toctou-race` --- **Automated by CleverAgents Bot** Supervisor: Implementation Pool | Agent: implementation-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7566
No description provided.