TOCTOU race condition in ActorService.ensure_default_mock_actor() — check and create in separate transactions #8448

Closed
opened 2026-04-13 19:08:40 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit message: Build: Reinforced label enforcement, and ensure implementation workers dont continue work on a mergable PR.
  • Branch: master

Background and Context

In src/cleveragents/application/services/actor_service.py, the ensure_default_mock_actor() method performs a check-then-act pattern across two separate database transactions, creating a Time-of-Check-Time-of-Use (TOCTOU) race condition:

def ensure_default_mock_actor(self, *, force: bool = False) -> Actor | None:
    ...
    # Transaction 1: Check for existing default
    with self.unit_of_work.transaction() as ctx:
        existing_default = ctx.actors.get_default()
        if existing_default:
            return existing_default
    # ← GAP: another thread/process could create a default actor here

    # Transaction 2: Create mock actor
    now = datetime.now()
    mock_name = "local/mock-default"
    mock_actor = Actor(...)

    with self.unit_of_work.transaction() as ctx:
        existing = ctx.actors.get_by_name(mock_name)
        if existing:
            if not existing.is_default:
                ctx.actors.set_default(existing.name)
            return ctx.actors.get_default()

        saved = ctx.actors.upsert(mock_actor)
        ctx.actors.set_default(saved.name)
        return ctx.actors.get_default()

Between Transaction 1 (checking for a default) and Transaction 2 (creating the mock actor), another concurrent call to ensure_default_mock_actor() could also find no default and proceed to create one. This can result in:

  1. Two concurrent calls both finding no default actor.
  2. Both proceeding to Transaction 2.
  3. Both attempting to upsert and set the mock actor as default.
  4. Depending on the database's upsert semantics, this may result in duplicate operations, constraint violations, or inconsistent state.

This is particularly relevant in test environments where CLEVERAGENTS_TESTING_USE_MOCK_AI=true and multiple test workers may call this method concurrently.

Current Behavior

Two concurrent calls to ensure_default_mock_actor() can both pass the Transaction 1 check (no default found) and both proceed to Transaction 2, potentially causing race conditions, duplicate upserts, or database constraint violations.

Expected Behavior

The check and creation must be atomic. Options include:

  1. Combine both operations into a single transaction with a database-level upsert/insert-or-ignore.
  2. Use a database advisory lock around the entire operation.
  3. Use INSERT OR IGNORE / ON CONFLICT DO NOTHING semantics at the database level.

The operation must be idempotent and safe for concurrent callers.

Acceptance Criteria

  • The check for an existing default actor and the creation of the mock actor occur within a single atomic transaction.
  • Concurrent calls to ensure_default_mock_actor() do not cause database constraint violations or inconsistent state.
  • The method remains idempotent (calling it multiple times produces the same result).
  • Behave BDD scenarios cover concurrent invocation (or at minimum, the idempotency guarantee).
  • nox passes (all sessions).
  • Coverage remains ≥ 97%.

Subtasks

  • Merge the two separate transactions in ensure_default_mock_actor() into a single atomic transaction
  • Use upsert/insert-or-ignore semantics to handle concurrent creation safely
  • Add Behave BDD scenarios for idempotency and concurrent safety
  • Run nox and confirm all sessions pass

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message is fix(concurrency): make ensure_default_mock_actor atomic to prevent TOCTOU race, followed by a blank line, then additional details, and a footer ISSUES CLOSED: #<this issue number>.
  • The commit is pushed to a branch and submitted as a pull request to master, reviewed, and merged.
  • All nox stages pass and coverage ≥ 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

## Metadata - **Commit message**: `Build: Reinforced label enforcement, and ensure implementation workers dont continue work on a mergable PR.` - **Branch**: `master` ## Background and Context In `src/cleveragents/application/services/actor_service.py`, the `ensure_default_mock_actor()` method performs a check-then-act pattern across two separate database transactions, creating a Time-of-Check-Time-of-Use (TOCTOU) race condition: ```python def ensure_default_mock_actor(self, *, force: bool = False) -> Actor | None: ... # Transaction 1: Check for existing default with self.unit_of_work.transaction() as ctx: existing_default = ctx.actors.get_default() if existing_default: return existing_default # ← GAP: another thread/process could create a default actor here # Transaction 2: Create mock actor now = datetime.now() mock_name = "local/mock-default" mock_actor = Actor(...) with self.unit_of_work.transaction() as ctx: existing = ctx.actors.get_by_name(mock_name) if existing: if not existing.is_default: ctx.actors.set_default(existing.name) return ctx.actors.get_default() saved = ctx.actors.upsert(mock_actor) ctx.actors.set_default(saved.name) return ctx.actors.get_default() ``` Between Transaction 1 (checking for a default) and Transaction 2 (creating the mock actor), another concurrent call to `ensure_default_mock_actor()` could also find no default and proceed to create one. This can result in: 1. Two concurrent calls both finding no default actor. 2. Both proceeding to Transaction 2. 3. Both attempting to upsert and set the mock actor as default. 4. Depending on the database's upsert semantics, this may result in duplicate operations, constraint violations, or inconsistent state. This is particularly relevant in test environments where `CLEVERAGENTS_TESTING_USE_MOCK_AI=true` and multiple test workers may call this method concurrently. ## Current Behavior Two concurrent calls to `ensure_default_mock_actor()` can both pass the Transaction 1 check (no default found) and both proceed to Transaction 2, potentially causing race conditions, duplicate upserts, or database constraint violations. ## Expected Behavior The check and creation must be atomic. Options include: 1. Combine both operations into a single transaction with a database-level upsert/insert-or-ignore. 2. Use a database advisory lock around the entire operation. 3. Use `INSERT OR IGNORE` / `ON CONFLICT DO NOTHING` semantics at the database level. The operation must be idempotent and safe for concurrent callers. ## Acceptance Criteria - [ ] The check for an existing default actor and the creation of the mock actor occur within a single atomic transaction. - [ ] Concurrent calls to `ensure_default_mock_actor()` do not cause database constraint violations or inconsistent state. - [ ] The method remains idempotent (calling it multiple times produces the same result). - [ ] Behave BDD scenarios cover concurrent invocation (or at minimum, the idempotency guarantee). - [ ] `nox` passes (all sessions). - [ ] Coverage remains ≥ 97%. ## Subtasks - [ ] Merge the two separate transactions in `ensure_default_mock_actor()` into a single atomic transaction - [ ] Use upsert/insert-or-ignore semantics to handle concurrent creation safely - [ ] Add Behave BDD scenarios for idempotency and concurrent safety - [ ] Run `nox` and confirm all sessions pass ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message is `fix(concurrency): make ensure_default_mock_actor atomic to prevent TOCTOU race`, followed by a blank line, then additional details, and a footer `ISSUES CLOSED: #<this issue number>`. - The commit is pushed to a branch and submitted as a pull request to `master`, reviewed, and merged. - All nox stages pass and coverage ≥ 97%. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor
HAL9000 added this to the v3.6.0 milestone 2026-04-13 19:14:33 +00:00
Author
Owner

[AUTO-OWNR-5] Triage Decision

Status: Verified

MoSCoW: Should Have
Priority: Medium

Rationale: This is a genuine TOCTOU (Time-of-Check-Time-of-Use) race condition in ActorService.ensure_default_mock_actor(). The check for an existing default actor and the creation of the mock actor occur in two separate transactions, creating a window where concurrent callers can both observe no default and both attempt to create one. This is particularly relevant in test environments with CLEVERAGENTS_TESTING_USE_MOCK_AI=true and parallel test workers. While the upsert semantics may mitigate the worst outcomes, the non-atomic pattern is a correctness risk. Classified as Should Have because concurrency correctness matters for reliability, but the bug is most likely to manifest in test environments rather than production single-user scenarios.

Next Steps: Merge the two separate transactions in ensure_default_mock_actor() into a single atomic transaction using upsert/insert-or-ignore semantics. Add Behave BDD scenarios covering idempotency and concurrent safety. Run nox and confirm all sessions pass with coverage ≥ 97%.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

## [AUTO-OWNR-5] Triage Decision **Status**: ✅ Verified **MoSCoW**: Should Have **Priority**: Medium **Rationale**: This is a genuine TOCTOU (Time-of-Check-Time-of-Use) race condition in `ActorService.ensure_default_mock_actor()`. The check for an existing default actor and the creation of the mock actor occur in two separate transactions, creating a window where concurrent callers can both observe no default and both attempt to create one. This is particularly relevant in test environments with `CLEVERAGENTS_TESTING_USE_MOCK_AI=true` and parallel test workers. While the upsert semantics may mitigate the worst outcomes, the non-atomic pattern is a correctness risk. Classified as **Should Have** because concurrency correctness matters for reliability, but the bug is most likely to manifest in test environments rather than production single-user scenarios. **Next Steps**: Merge the two separate transactions in `ensure_default_mock_actor()` into a single atomic transaction using upsert/insert-or-ignore semantics. Add Behave BDD scenarios covering idempotency and concurrent safety. Run `nox` and confirm all sessions pass with coverage ≥ 97%. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Implementation Attempt — Tier 3: sonnet — Success

PR #10627 already existed from Tier 1 (Haiku) implementation but was missing the milestone assignment. Updated PR #10627 to assign milestone v3.6.0 (ID 109) via PATCH API call.

PR #10627 contains:

  • Fix for TOCTOU race condition in ensure_default_mock_actor()
  • Atomic transaction implementation combining check and create into a single transaction
  • Proper error handling and idempotency guarantees
  • Unit and integration tests

All quality gates were passing per the Tier 1 implementation. Milestone v3.6.0 has now been assigned to the PR.


Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-worker

**Implementation Attempt** — Tier 3: sonnet — Success PR #10627 already existed from Tier 1 (Haiku) implementation but was missing the milestone assignment. Updated PR #10627 to assign milestone v3.6.0 (ID 109) via PATCH API call. PR #10627 contains: - Fix for TOCTOU race condition in `ensure_default_mock_actor()` - Atomic transaction implementation combining check and create into a single transaction - Proper error handling and idempotency guarantees - Unit and integration tests All quality gates were passing per the Tier 1 implementation. Milestone v3.6.0 has now been assigned to the PR. --- **Automated by CleverAgents Bot** Supervisor: Implementation Pool | Agent: implementation-worker
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#8448
No description provided.