BUG-HUNT: [concurrency] AuditService._ensure_session() shares SQLAlchemy session across background writer thread and calling thread #7325

Open
opened 2026-04-10 16:54:59 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: Concurrency — AuditService Shares Session Across Threads

Severity Assessment

  • Impact: SQLAlchemy Session objects are not thread-safe. When async mode is active, the background writer thread calls _write_payload()_ensure_session() to get the session, while the main thread calls list_entries() / count() / prune()_ensure_session() on the same _session object. This can corrupt the session state, cause DetachedInstanceError, or produce silent data loss.
  • Likelihood: Medium-High — any production use of async audit with concurrent reads and writes
  • Priority: High

Location

  • File: src/cleveragents/application/services/audit_service.py
  • Function/Class: AuditService._ensure_session(), _write_payload(), list_entries()
  • Lines: 209–234, 268–281, 434–451

Description

AuditService in async mode (_async_mode=True) launches a background audit-writer daemon thread that drains the write queue. The background thread calls _write_payload() which calls _ensure_session() to get the SQLAlchemy session, then performs session.add(row) and session.commit().

Meanwhile, the calling thread (the main thread or any service thread) can simultaneously call list_entries(), count(), prune(), get_entry(), all of which also call _ensure_session() and operate on the same _session object.

SQLAlchemy's documentation explicitly warns that Session objects are not thread-safe and should not be shared between threads. Concurrent access can corrupt the unit-of-work state, cause sqlalchemy.exc.InvalidRequestError, or produce data inconsistencies.

The _ensure_session() method itself has no lock guarding the lazy init, so two threads can also observe self._session is None simultaneously and both try to create a new session.

Evidence

# audit_service.py lines 268-281: WRITER THREAD uses the session
def _write_payload(self, payload: _AuditPayload) -> None:
    row = AuditLogModel(...)
    session = self._ensure_session()    # ← called from background thread
    session.add(row)
    session.commit()

# audit_service.py lines 434-451: CALLING THREAD uses the same session
def list_entries(self, ...) -> list[AuditLogEntry]:
    session = self._ensure_session()   # ← called from calling thread
    query = session.query(AuditLogModel)
    ...

Both share self._session with no lock, no thread isolation.

Expected Behavior

The writer thread and the calling thread should each have their own SQLAlchemy session, or the service should serialize all session access through a lock. The service's own comment at line 192 acknowledges this: "SQLite connections are thread-local and cannot be shared across the background writer thread and the calling thread" — but then it creates and shares exactly one session.

Actual Behavior

Both the writer thread and the calling thread share the same _session object, violating SQLAlchemy's thread-safety requirements.

Suggested Fix

The writer thread should have its own session factory that creates a dedicated session for the writer loop:

def _writer_loop(self) -> None:
    # Create a dedicated session for this thread
    url = self._database_url or self._settings.database_url
    engine = create_engine(url, echo=False)
    session_factory = sessionmaker(bind=engine)
    writer_session = session_factory()
    try:
        while True:
            item = self._queue.get()
            if item is _STOP_SENTINEL:
                break
            self._write_payload_with_session(item, writer_session)
    finally:
        writer_session.close()

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: Concurrency — AuditService Shares Session Across Threads ### Severity Assessment - **Impact**: SQLAlchemy `Session` objects are not thread-safe. When async mode is active, the background writer thread calls `_write_payload()` → `_ensure_session()` to get the session, while the main thread calls `list_entries()` / `count()` / `prune()` → `_ensure_session()` on the **same** `_session` object. This can corrupt the session state, cause `DetachedInstanceError`, or produce silent data loss. - **Likelihood**: Medium-High — any production use of async audit with concurrent reads and writes - **Priority**: High ### Location - **File**: `src/cleveragents/application/services/audit_service.py` - **Function/Class**: `AuditService._ensure_session()`, `_write_payload()`, `list_entries()` - **Lines**: 209–234, 268–281, 434–451 ### Description `AuditService` in async mode (`_async_mode=True`) launches a background `audit-writer` daemon thread that drains the write queue. The background thread calls `_write_payload()` which calls `_ensure_session()` to get the SQLAlchemy session, then performs `session.add(row)` and `session.commit()`. Meanwhile, the **calling thread** (the main thread or any service thread) can simultaneously call `list_entries()`, `count()`, `prune()`, `get_entry()`, all of which also call `_ensure_session()` and operate on the **same** `_session` object. SQLAlchemy's documentation explicitly warns that `Session` objects are **not thread-safe** and should not be shared between threads. Concurrent access can corrupt the unit-of-work state, cause `sqlalchemy.exc.InvalidRequestError`, or produce data inconsistencies. The `_ensure_session()` method itself has no lock guarding the lazy init, so two threads can also observe `self._session is None` simultaneously and both try to create a new session. ### Evidence ```python # audit_service.py lines 268-281: WRITER THREAD uses the session def _write_payload(self, payload: _AuditPayload) -> None: row = AuditLogModel(...) session = self._ensure_session() # ← called from background thread session.add(row) session.commit() # audit_service.py lines 434-451: CALLING THREAD uses the same session def list_entries(self, ...) -> list[AuditLogEntry]: session = self._ensure_session() # ← called from calling thread query = session.query(AuditLogModel) ... ``` Both share `self._session` with no lock, no thread isolation. ### Expected Behavior The writer thread and the calling thread should each have their own SQLAlchemy session, or the service should serialize all session access through a lock. The service's own comment at line 192 acknowledges this: "SQLite connections are thread-local and cannot be shared across the background writer thread and the calling thread" — but then it creates and shares exactly one session. ### Actual Behavior Both the writer thread and the calling thread share the same `_session` object, violating SQLAlchemy's thread-safety requirements. ### Suggested Fix The writer thread should have its own session factory that creates a dedicated session for the writer loop: ```python def _writer_loop(self) -> None: # Create a dedicated session for this thread url = self._database_url or self._settings.database_url engine = create_engine(url, echo=False) session_factory = sessionmaker(bind=engine) writer_session = session_factory() try: while True: item = self._queue.get() if item is _STOP_SENTINEL: break self._write_payload_with_session(item, writer_session) finally: writer_session.close() ``` ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: `@tdd_issue`, `@tdd_issue_<this-issue-number>`, and `@tdd_expected_fail` to prove the bug exists before fixing it. --- **Automated by CleverAgents Bot** Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor
HAL9000 added this to the v3.5.0 milestone 2026-04-10 19:05:19 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified — SQLAlchemy session sharing across threads is a real concurrency bug
  • Priority: Priority/High (upgrading from Backlog) — can cause data corruption in production async audit mode
  • Milestone: v3.5.0 — AuditService is core to the audit trail required for Autonomy Hardening
  • Type: Type/Bug
  • MoSCoW: Must Have — audit integrity is required for production deployment

The fix: give the writer thread its own dedicated SQLAlchemy session.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Issue triaged by project owner: - **State**: Verified — SQLAlchemy session sharing across threads is a real concurrency bug - **Priority**: Priority/High (upgrading from Backlog) — can cause data corruption in production async audit mode - **Milestone**: v3.5.0 — AuditService is core to the audit trail required for Autonomy Hardening - **Type**: Type/Bug - **MoSCoW**: Must Have — audit integrity is required for production deployment The fix: give the writer thread its own dedicated SQLAlchemy session. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7325
No description provided.