fix(database): add threading lock to MEMORY_ENGINES cache in UnitOfWork #4156

Open
opened 2026-04-06 11:57:28 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/concurrency-memory-engine-lock
  • Commit Message: fix(database): add threading lock to MEMORY_ENGINES cache in UnitOfWork
  • Milestone: (none — see backlog note below)
  • Parent Epic: #1020

Backlog note: This issue was discovered during autonomous operation
on milestone v3.6.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Background and Context

The UnitOfWork.engine property in src/cleveragents/infrastructure/database/unit_of_work.py uses a module-level dictionary MEMORY_ENGINES to cache in-memory SQLite database engines. This caching ensures that all UnitOfWork instances share the same in-memory database (important for test isolation and consistency). However, the read-check-write sequence on MEMORY_ENGINES is not protected by a lock.

In a multi-threaded environment — such as parallel test execution or concurrent agent operations — two threads could simultaneously evaluate self.database_url not in MEMORY_ENGINES as True, both proceed to call create_engine(...), and one thread's engine would silently overwrite the other's. This leaves the dictionary in an inconsistent state and could cause test failures or data corruption in the in-memory database.

Current Behavior

# src/cleveragents/infrastructure/database/unit_of_work.py:77-87
if self.database_url == "sqlite:///:memory:":
    # Check if we have a cached engine for this in-memory database
    if self.database_url not in MEMORY_ENGINES:
        MEMORY_ENGINES[self.database_url] = create_engine(
            self.database_url,
            echo=False,
            future=True,
            isolation_level="SERIALIZABLE",
            connect_args={"check_same_thread": False},
        )
    self._engine = MEMORY_ENGINES[self.database_url]

The MEMORY_ENGINES dictionary is accessed without any locking, creating a classic check-then-act race condition.

Expected Behavior

Access to MEMORY_ENGINES is protected by a threading.Lock, ensuring that only one thread can create and cache the engine at a time. All subsequent threads reuse the already-cached engine.

import threading

MEMORY_ENGINES: dict = {}
MEMORY_ENGINES_LOCK = threading.Lock()

# ...

if self.database_url == "sqlite:///:memory:":
    with MEMORY_ENGINES_LOCK:
        if self.database_url not in MEMORY_ENGINES:
            MEMORY_ENGINES[self.database_url] = create_engine(
                self.database_url,
                echo=False,
                future=True,
                isolation_level="SERIALIZABLE",
                connect_args={"check_same_thread": False},
            )
        self._engine = MEMORY_ENGINES[self.database_url]

Acceptance Criteria

  • A threading.Lock (MEMORY_ENGINES_LOCK) is declared at module level alongside MEMORY_ENGINES
  • All access to MEMORY_ENGINES in UnitOfWork.engine is wrapped in with MEMORY_ENGINES_LOCK:
  • Existing unit tests for UnitOfWork continue to pass
  • A regression test is added that exercises concurrent UnitOfWork instantiation with sqlite:///:memory: to confirm no race condition

Supporting Information

  • File: src/cleveragents/infrastructure/database/unit_of_work.py
  • Lines: 77–87
  • Category: concurrency
  • Discovered by: ca-bug-hunter (Bug Hunting supervisor)
  • Related Epic: #1020 — [Epic] Database resource hierarchy restructuring

Subtasks

  • Add import threading to unit_of_work.py
  • Declare MEMORY_ENGINES_LOCK = threading.Lock() at module level
  • Wrap the MEMORY_ENGINES check-and-set block with with MEMORY_ENGINES_LOCK:
  • Write a concurrent regression test for UnitOfWork with sqlite:///:memory:
  • Run nox and confirm all stages pass with coverage ≥ 97%

Definition of Done

  • MEMORY_ENGINES_LOCK declared and used correctly in unit_of_work.py
  • Regression test added and passing
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/concurrency-memory-engine-lock` - **Commit Message**: `fix(database): add threading lock to MEMORY_ENGINES cache in UnitOfWork` - **Milestone**: *(none — see backlog note below)* - **Parent Epic**: #1020 > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.6.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. ## Background and Context The `UnitOfWork.engine` property in `src/cleveragents/infrastructure/database/unit_of_work.py` uses a module-level dictionary `MEMORY_ENGINES` to cache in-memory SQLite database engines. This caching ensures that all `UnitOfWork` instances share the same in-memory database (important for test isolation and consistency). However, the read-check-write sequence on `MEMORY_ENGINES` is not protected by a lock. In a multi-threaded environment — such as parallel test execution or concurrent agent operations — two threads could simultaneously evaluate `self.database_url not in MEMORY_ENGINES` as `True`, both proceed to call `create_engine(...)`, and one thread's engine would silently overwrite the other's. This leaves the dictionary in an inconsistent state and could cause test failures or data corruption in the in-memory database. ## Current Behavior ```python # src/cleveragents/infrastructure/database/unit_of_work.py:77-87 if self.database_url == "sqlite:///:memory:": # Check if we have a cached engine for this in-memory database if self.database_url not in MEMORY_ENGINES: MEMORY_ENGINES[self.database_url] = create_engine( self.database_url, echo=False, future=True, isolation_level="SERIALIZABLE", connect_args={"check_same_thread": False}, ) self._engine = MEMORY_ENGINES[self.database_url] ``` The `MEMORY_ENGINES` dictionary is accessed without any locking, creating a classic check-then-act race condition. ## Expected Behavior Access to `MEMORY_ENGINES` is protected by a `threading.Lock`, ensuring that only one thread can create and cache the engine at a time. All subsequent threads reuse the already-cached engine. ```python import threading MEMORY_ENGINES: dict = {} MEMORY_ENGINES_LOCK = threading.Lock() # ... if self.database_url == "sqlite:///:memory:": with MEMORY_ENGINES_LOCK: if self.database_url not in MEMORY_ENGINES: MEMORY_ENGINES[self.database_url] = create_engine( self.database_url, echo=False, future=True, isolation_level="SERIALIZABLE", connect_args={"check_same_thread": False}, ) self._engine = MEMORY_ENGINES[self.database_url] ``` ## Acceptance Criteria - [ ] A `threading.Lock` (`MEMORY_ENGINES_LOCK`) is declared at module level alongside `MEMORY_ENGINES` - [ ] All access to `MEMORY_ENGINES` in `UnitOfWork.engine` is wrapped in `with MEMORY_ENGINES_LOCK:` - [ ] Existing unit tests for `UnitOfWork` continue to pass - [ ] A regression test is added that exercises concurrent `UnitOfWork` instantiation with `sqlite:///:memory:` to confirm no race condition ## Supporting Information - **File**: `src/cleveragents/infrastructure/database/unit_of_work.py` - **Lines**: 77–87 - **Category**: concurrency - **Discovered by**: ca-bug-hunter (Bug Hunting supervisor) - **Related Epic**: #1020 — [Epic] Database resource hierarchy restructuring ## Subtasks - [ ] Add `import threading` to `unit_of_work.py` - [ ] Declare `MEMORY_ENGINES_LOCK = threading.Lock()` at module level - [ ] Wrap the `MEMORY_ENGINES` check-and-set block with `with MEMORY_ENGINES_LOCK:` - [ ] Write a concurrent regression test for `UnitOfWork` with `sqlite:///:memory:` - [ ] Run `nox` and confirm all stages pass with coverage ≥ 97% ## Definition of Done - [ ] `MEMORY_ENGINES_LOCK` declared and used correctly in `unit_of_work.py` - [ ] Regression test added and passing - [ ] All nox stages pass - [ ] Coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
freemo added this to the v3.6.0 milestone 2026-04-06 17:49:19 +00:00
Author
Owner

Milestone Triage Decision: Moved to Backlog

This infrastructure issue has been moved out of v3.6.0 during aggressive milestone triage. Database threading concerns are infrastructure optimizations, not advanced concepts.

Reasoning:

  • v3.6.0 focus: Advanced concepts that extend beyond core MVP
  • This issue: Database threading lock - infrastructure optimization
  • Impact: Performance/concurrency improvement, not advanced conceptual capability

Should be addressed when database performance becomes a bottleneck or in a dedicated infrastructure optimization milestone.

**Milestone Triage Decision: Moved to Backlog** This infrastructure issue has been moved out of v3.6.0 during aggressive milestone triage. Database threading concerns are infrastructure optimizations, not advanced concepts. **Reasoning:** - v3.6.0 focus: Advanced concepts that extend beyond core MVP - This issue: Database threading lock - infrastructure optimization - Impact: Performance/concurrency improvement, not advanced conceptual capability Should be addressed when database performance becomes a bottleneck or in a dedicated infrastructure optimization milestone.
freemo removed this from the v3.6.0 milestone 2026-04-06 20:40:04 +00:00
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:12:33 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
Reference
cleveragents/cleveragents-core#4156
No description provided.