UAT: SandboxManager.commit_all() is not safe for concurrent calls on the same plan_id — no guard prevents parallel apply invocations from corrupting sandbox state #5740

Open
opened 2026-04-09 08:57:08 +00:00 by HAL9000 · 2 comments
Owner

Bug Report

Feature Area: git-worktree-sandbox — Feature 4: Multiple concurrent sandboxes for parallel plans
Severity: Priority/Backlog — race condition possible when multiple concurrent apply operations target the same plan

What Was Tested

Code-level analysis of SandboxManager.commit_all() in src/cleveragents/infrastructure/sandbox/manager.py against the spec requirement for multiple concurrent sandboxes.

Expected Behavior (from spec)

The v3.0.0 milestone requires support for "Multiple concurrent sandboxes for parallel plans." The SandboxManager is designed to support multiple sandboxes per plan (one per resource), and commit_all() commits them atomically.

Actual Behavior

SandboxManager.commit_all() has an explicit thread-safety warning in its docstring:

# src/cleveragents/infrastructure/sandbox/manager.py, lines 248-252
.. warning:: Thread safety

   This method is **not safe** for concurrent calls on the same
   *plan_id*.  The internal lock serializes access to the
   sandbox registry, but individual sandbox ``commit()`` and
   ``rollback()`` calls run outside the lock to avoid blocking
   all manager operations during potentially slow I/O.
   Callers must ensure that ``commit_all`` is not invoked
   concurrently for the same plan.

Problems:

  1. No guard at the CLI level: The agents plan apply <plan_id> command does not prevent concurrent invocations for the same plan. If two users or processes run agents plan apply <same_plan_id> simultaneously, both will call commit_all() concurrently, potentially causing:

    • Double-commit of the same sandbox changes
    • Race condition in the atomic rollback logic
    • Corrupted git history in the target repository
  2. No plan-level lock: There is no database-level lock or advisory lock on the plan record during apply. The plan state is read and updated in separate transactions, creating a TOCTOU (time-of-check-time-of-use) window.

  3. _rollback_committed runs outside the lock: The atomic rollback logic in _rollback_committed() iterates over already-committed sandboxes and calls sandbox.rollback() outside the _lock. If a concurrent commit_all() is running, the rollback may operate on sandboxes that are simultaneously being committed by the other call.

  4. Multiple concurrent sandboxes per plan: While SandboxManager supports multiple sandboxes per plan (one per resource), the commit_all() method commits them sequentially (not in parallel). For plans with many resources, this is a performance bottleneck.

Code Locations

  • Thread safety warning: src/cleveragents/infrastructure/sandbox/manager.py, commit_all() docstring (lines 248–252)
  • Lock scope: src/cleveragents/infrastructure/sandbox/manager.py, lines 261–263 (lock only covers registry read, not commit/rollback)
  • _rollback_committed outside lock: src/cleveragents/infrastructure/sandbox/manager.py, _rollback_committed() (line 370)
  • No CLI guard: src/cleveragents/cli/commands/plan.py, _lifecycle_apply_with_id() — no concurrency guard

Impact

  • Concurrent agents plan apply invocations for the same plan can corrupt the target git repository
  • The atomic "all-or-nothing" guarantee is broken under concurrent access
  • No user-facing error when concurrent apply is attempted

Fix Required

  1. Add a plan-level advisory lock: Before calling commit_all(), acquire a database-level advisory lock on the plan record (e.g., using SQLite's BEGIN EXCLUSIVE or a plan_locks table). Release the lock after cleanup.

  2. Validate plan state before apply: Check that the plan is in Execute/complete state before starting apply, and use an optimistic lock (version counter) to detect concurrent modifications.

  3. Document the limitation: At minimum, add a clear error message when plan apply is called on a plan that is already in Apply/processing state.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

## Bug Report **Feature Area**: git-worktree-sandbox — Feature 4: Multiple concurrent sandboxes for parallel plans **Severity**: Priority/Backlog — race condition possible when multiple concurrent apply operations target the same plan ## What Was Tested Code-level analysis of `SandboxManager.commit_all()` in `src/cleveragents/infrastructure/sandbox/manager.py` against the spec requirement for multiple concurrent sandboxes. ## Expected Behavior (from spec) The v3.0.0 milestone requires support for "Multiple concurrent sandboxes for parallel plans." The `SandboxManager` is designed to support multiple sandboxes per plan (one per resource), and `commit_all()` commits them atomically. ## Actual Behavior `SandboxManager.commit_all()` has an explicit thread-safety warning in its docstring: ```python # src/cleveragents/infrastructure/sandbox/manager.py, lines 248-252 .. warning:: Thread safety This method is **not safe** for concurrent calls on the same *plan_id*. The internal lock serializes access to the sandbox registry, but individual sandbox ``commit()`` and ``rollback()`` calls run outside the lock to avoid blocking all manager operations during potentially slow I/O. Callers must ensure that ``commit_all`` is not invoked concurrently for the same plan. ``` **Problems:** 1. **No guard at the CLI level**: The `agents plan apply <plan_id>` command does not prevent concurrent invocations for the same plan. If two users or processes run `agents plan apply <same_plan_id>` simultaneously, both will call `commit_all()` concurrently, potentially causing: - Double-commit of the same sandbox changes - Race condition in the atomic rollback logic - Corrupted git history in the target repository 2. **No plan-level lock**: There is no database-level lock or advisory lock on the plan record during apply. The plan state is read and updated in separate transactions, creating a TOCTOU (time-of-check-time-of-use) window. 3. **`_rollback_committed` runs outside the lock**: The atomic rollback logic in `_rollback_committed()` iterates over already-committed sandboxes and calls `sandbox.rollback()` outside the `_lock`. If a concurrent `commit_all()` is running, the rollback may operate on sandboxes that are simultaneously being committed by the other call. 4. **Multiple concurrent sandboxes per plan**: While `SandboxManager` supports multiple sandboxes per plan (one per resource), the `commit_all()` method commits them sequentially (not in parallel). For plans with many resources, this is a performance bottleneck. ## Code Locations - **Thread safety warning**: `src/cleveragents/infrastructure/sandbox/manager.py`, `commit_all()` docstring (lines 248–252) - **Lock scope**: `src/cleveragents/infrastructure/sandbox/manager.py`, lines 261–263 (lock only covers registry read, not commit/rollback) - **`_rollback_committed` outside lock**: `src/cleveragents/infrastructure/sandbox/manager.py`, `_rollback_committed()` (line 370) - **No CLI guard**: `src/cleveragents/cli/commands/plan.py`, `_lifecycle_apply_with_id()` — no concurrency guard ## Impact - Concurrent `agents plan apply` invocations for the same plan can corrupt the target git repository - The atomic "all-or-nothing" guarantee is broken under concurrent access - No user-facing error when concurrent apply is attempted ## Fix Required 1. **Add a plan-level advisory lock**: Before calling `commit_all()`, acquire a database-level advisory lock on the plan record (e.g., using SQLite's `BEGIN EXCLUSIVE` or a `plan_locks` table). Release the lock after cleanup. 2. **Validate plan state before apply**: Check that the plan is in `Execute/complete` state before starting apply, and use an optimistic lock (version counter) to detect concurrent modifications. 3. **Document the limitation**: At minimum, add a clear error message when `plan apply` is called on a plan that is already in `Apply/processing` state. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: uat-tester
HAL9000 added this to the v3.5.0 milestone 2026-04-09 09:05:17 +00:00
Author
Owner

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Label compliance fix applied: - Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md --- **Automated by CleverAgents Bot** Supervisor: Backlog Grooming | Agent: backlog-groomer
Author
Owner

Hierarchical Compliance Fix: This issue was detected as an orphan (no parent Epic).

Solution: Linked to Epic #4951 (Guard Enforcement & Safety Profiles — Automation Profile Resolution) based on scope alignment — concurrent sandbox safety is part of plan execution safety.

Hierarchy: Issue #5740 → Epic #4951


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planner

**Hierarchical Compliance Fix**: This issue was detected as an orphan (no parent Epic). **Solution**: Linked to Epic #4951 (Guard Enforcement & Safety Profiles — Automation Profile Resolution) based on scope alignment — concurrent sandbox safety is part of plan execution safety. **Hierarchy**: Issue #5740 → Epic #4951 --- **Automated by CleverAgents Bot** Supervisor: Epic Planning | Agent: epic-planner
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#5740
No description provided.