fix(sandbox): make commit_all atomic per specification #1146

Merged
CoreRasurae merged 1 commit from fix/atomic-sandbox-commit into master 2026-03-29 18:13:20 +00:00
Member

Summary

Changed SandboxManager.commit_all() from partial-commit semantics to all-or-nothing atomic operation per specification requirement. On partial failure, already-committed sandboxes are rolled back using strategy-specific mechanisms (filesystem backup for CoW/Overlay, git reset for worktree).

Changes

  • SandboxManager.commit_all() now enforces atomicity with _rollback_committed() helper
  • Added COMMITTED → ROLLED_BACK status transition in protocol
  • Updated all 4 sandbox strategies to support rollback from COMMITTED state
  • 3 new Behave scenarios + Robot integration test for atomic commit verification
  • Coverage: 98% (exceeds 97% requirement)

Design Decision

Chose rollback-on-failure over pre-validation (can't guarantee success) or two-phase commit (too invasive for protocol). Each strategy implements rollback from COMMITTED state using its native mechanism.

Quality Gates

All nox stages pass: lint, typecheck, unit_tests (12,294 scenarios), integration_tests, coverage_report.

Closes #925

## Summary Changed `SandboxManager.commit_all()` from partial-commit semantics to all-or-nothing atomic operation per specification requirement. On partial failure, already-committed sandboxes are rolled back using strategy-specific mechanisms (filesystem backup for CoW/Overlay, git reset for worktree). ## Changes - `SandboxManager.commit_all()` now enforces atomicity with `_rollback_committed()` helper - Added `COMMITTED → ROLLED_BACK` status transition in protocol - Updated all 4 sandbox strategies to support rollback from COMMITTED state - 3 new Behave scenarios + Robot integration test for atomic commit verification - Coverage: **98%** (exceeds 97% requirement) ## Design Decision Chose rollback-on-failure over pre-validation (can't guarantee success) or two-phase commit (too invasive for protocol). Each strategy implements rollback from COMMITTED state using its native mechanism. ## Quality Gates All nox stages pass: lint, typecheck, unit_tests (12,294 scenarios), integration_tests, coverage_report. Closes #925
CoreRasurae added this to the v3.5.0 milestone 2026-03-24 03:45:25 +00:00
freemo approved these changes 2026-03-24 15:27:37 +00:00
Dismissed
freemo left a comment

Review: APPROVED

Well-designed atomicity fix with appropriate rollback strategies per sandbox type: CoW/Overlay use pre-commit backup dirs, GitWorktree uses git reset --hard, Transaction logs a warning (COMMIT is irreversible). The _rollback_committed() helper cleanly isolates per-sandbox error handling. State machine update to allow COMMITTED -> ROLLED_BACK is correct. 3 BDD scenarios + 1 Robot integration test — both present.

Minor Notes

  1. CopyOnWriteSandbox._backup_directory and OverlaySandbox._backup_directory are nearly identical (~30 duplicated lines). Consider extracting to a shared utility in a follow-up.

  2. Pre-commit backups via tempfile.mkdtemp + full directory copy could be expensive for large workspaces. No size guard or warning — worth noting for future performance consideration.

  3. TransactionSandbox.rollback() from COMMITTED state is effectively a no-op (only logs a warning). The manager's _rollback_committed will report it as rolled back when it isn't really. Callers should be aware that DB changes are permanent despite the "successful" rollback status. Consider surfacing this distinction in the return value.

## Review: APPROVED Well-designed atomicity fix with appropriate rollback strategies per sandbox type: CoW/Overlay use pre-commit backup dirs, GitWorktree uses `git reset --hard`, Transaction logs a warning (COMMIT is irreversible). The `_rollback_committed()` helper cleanly isolates per-sandbox error handling. State machine update to allow `COMMITTED -> ROLLED_BACK` is correct. 3 BDD scenarios + 1 Robot integration test — both present. ### Minor Notes 1. `CopyOnWriteSandbox._backup_directory` and `OverlaySandbox._backup_directory` are nearly identical (~30 duplicated lines). Consider extracting to a shared utility in a follow-up. 2. Pre-commit backups via `tempfile.mkdtemp` + full directory copy could be expensive for large workspaces. No size guard or warning — worth noting for future performance consideration. 3. `TransactionSandbox.rollback()` from `COMMITTED` state is effectively a no-op (only logs a warning). The manager's `_rollback_committed` will report it as rolled back when it isn't really. Callers should be aware that DB changes are permanent despite the "successful" rollback status. Consider surfacing this distinction in the return value.
freemo approved these changes 2026-03-27 17:10:56 +00:00
Dismissed
freemo left a comment

Review: fix(sandbox): make commit_all atomic per specification

Approved with comments. Important correctness fix with clean implementation.

Issues to Address

1. Duplicated _backup_directory implementation (Medium)
Both CopyOnWriteSandbox._backup_directory and OverlaySandbox._backup_directory have identical implementations. Extract to a shared utility function to follow DRY.

2. TransactionSandbox "fake rollback" (Medium)
TransactionSandbox.rollback_committed() logs a warning that DB COMMIT cannot be undone, then transitions to ROLLED_BACK anyway. The Manager's _rollback_committed will count this as "successfully rolled back" in rolled_back_ids, which is misleading. Consider either:

  • Returning a distinct status so callers know the rollback was not real
  • Not adding the ID to rolled_back_ids for transaction sandboxes

3. Backup performance concern (Low)
Backup creation during CopyOnWrite commit copies the entire original directory (tempfile.mkdtemp + _backup_directory). For large repositories this could be slow and disk-intensive. Consider documenting size limits or adding a benchmark.

What's Good

  • State machine correctly updated: COMMITTED → [ROLLED_BACK, CLEANED_UP].
  • Manager.commit_all rewritten with clear atomic semantics: on failure, calls _rollback_committed for already-committed sandboxes.
  • GitWorktreeSandbox records pre-merge HEAD and uses git reset --hard for rollback.
  • Low-level I/O in backup methods (_copy_file_bytes) avoids mock interference.
  • 3 BDD scenarios + Robot integration test covering rollback on failure, error reporting, and rollback error tolerance.
## Review: fix(sandbox): make commit_all atomic per specification **Approved with comments.** Important correctness fix with clean implementation. ### Issues to Address **1. Duplicated `_backup_directory` implementation (Medium)** Both `CopyOnWriteSandbox._backup_directory` and `OverlaySandbox._backup_directory` have identical implementations. Extract to a shared utility function to follow DRY. **2. TransactionSandbox "fake rollback" (Medium)** `TransactionSandbox.rollback_committed()` logs a warning that DB COMMIT cannot be undone, then transitions to `ROLLED_BACK` anyway. The Manager's `_rollback_committed` will count this as "successfully rolled back" in `rolled_back_ids`, which is misleading. Consider either: - Returning a distinct status so callers know the rollback was not real - Not adding the ID to `rolled_back_ids` for transaction sandboxes **3. Backup performance concern (Low)** Backup creation during CopyOnWrite commit copies the entire original directory (`tempfile.mkdtemp` + `_backup_directory`). For large repositories this could be slow and disk-intensive. Consider documenting size limits or adding a benchmark. ### What's Good - State machine correctly updated: `COMMITTED → [ROLLED_BACK, CLEANED_UP]`. - Manager.commit_all rewritten with clear atomic semantics: on failure, calls `_rollback_committed` for already-committed sandboxes. - GitWorktreeSandbox records pre-merge HEAD and uses `git reset --hard` for rollback. - Low-level I/O in backup methods (`_copy_file_bytes`) avoids mock interference. - 3 BDD scenarios + Robot integration test covering rollback on failure, error reporting, and rollback error tolerance.
Author
Member

Code Review Report: PR #1146 — Atomic commit_all

Reviewer: Automated deep review (4 full review cycles)
Scope: Code changes in fix/atomic-sandbox-commit branch + immediate surrounding code
References: Issue #925, docs/specification.md (lines 45938, 19193)


Executive Summary

The PR correctly implements the core atomicity requirement from the specification (line 45938): commit_all() now rolls back previously-committed sandboxes when one fails. The protocol transition map, all four real sandbox strategies, and the NoSandbox pass-through are addressed. Tests cover the happy path, rollback-on-failure, and rollback-error tolerance.

However, the review identified 15 findings across 6 categories, including 3 high-severity bugs that can silently break the atomicity guarantee this PR is designed to deliver. These should be addressed before merge.


HIGH Severity

H1. BUG — commit_all catches only SandboxError: atomicity silently broken for unexpected exceptions

File: manager.py:245

except SandboxError as exc:
    rolled_back_ids = self._rollback_committed(committed, plan_id)

If sandbox.commit() raises any non-SandboxError exception (e.g., RuntimeError, TypeError, KeyError, or an uncaught OSError from a library call), the exception propagates without rolling back already-committed sandboxes. The caller sees a crash, but committed sandboxes remain committed — the atomicity guarantee is silently violated.

Recommendation: Widen the catch to Exception (or at minimum (SandboxError, OSError, RuntimeError)) with a final re-raise after rollback:

except Exception as exc:
    rolled_back_ids = self._rollback_committed(committed, plan_id)
    if isinstance(exc, SandboxError):
        # ... existing error handling ...
    else:
        raise  # Re-raise unexpected exceptions after rollback

H2. BUG — TransactionSandbox.rollback() from COMMITTED falsely reports success

File: transaction_sandbox.py:284-290

When rolling back from COMMITTED, the method logs a warning but still transitions to ROLLED_BACK:

if self._status == SandboxStatus.COMMITTED:
    logger.warning("Cannot undo database COMMIT for sandbox %s — ...")
# ... falls through to ...
self._status = SandboxStatus.ROLLED_BACK

The manager's _rollback_committed then adds this sandbox's ID to rolled_back_ids, telling the caller "this sandbox was successfully rolled back." But the database changes are permanent and NOT actually reversed. The error report provides a false sense of atomicity.

Recommendation: Either:

  • (a) Raise SandboxRollbackError to signal that rollback could not actually undo the committed changes, so the manager correctly excludes it from rolled_back_ids, OR
  • (b) Return a distinct indicator and add the ID to a separate "rollback_noop" metadata list.

H3. BUG — Rollback-from-COMMITTED uses non-atomic rmtree-then-copytree: data loss risk

Files: copy_on_write.py:313-319, overlay.py:366-372

shutil.rmtree(self._original_path, ignore_errors=True)   # DELETE original
shutil.copytree(self._pre_commit_backup, self._original_path, ...)  # RESTORE

If copytree fails after rmtree succeeds (e.g., disk full, permission error), the original directory is deleted and the backup restoration is incomplete. The data is irrecoverably lost — the backup still exists on disk but the original path is gone, and the sandbox transitions to ERRORED (which does not permit rollback). Additionally, rmtree(..., ignore_errors=True) can leave a partially-deleted directory, causing copytree(..., dirs_exist_ok=False) to fail on the remaining files.

Recommendation: Use a rename-based approach:

temp_name = self._original_path + ".atomic-rollback-old"
os.rename(self._original_path, temp_name)
try:
    shutil.copytree(self._pre_commit_backup, self._original_path, ...)
    shutil.rmtree(temp_name, ignore_errors=True)
except OSError:
    os.rename(temp_name, self._original_path)  # Restore on failure
    raise

MEDIUM Severity

M1. BUG — _rollback_committed catches only SandboxError: remaining rollbacks skipped on unexpected exception

File: manager.py:304

Same class of issue as H1 but in the rollback loop. If one sandbox's rollback() raises a non-SandboxError (e.g., subprocess.SubprocessError from git, OSError), the exception propagates and remaining rollbacks are never attempted.

Recommendation: Catch Exception in the rollback loop.


M2. BUG — No reporting of failed rollbacks in error metadata

File: manager.py:260-268

The error result's metadata only contains "rolled_back" (successfully rolled-back IDs). Sandboxes that failed to roll back are silently omitted. The caller has no programmatic way to identify which sandboxes are in an inconsistent state — critical operational information when atomicity itself fails.

Recommendation: Add a "rollback_failed" key:

metadata={
    "rolled_back": rolled_back_ids,
    "rollback_failed": failed_rollback_ids,
}

Files: copy_on_write.py:451-461, overlay.py:539-549

os.walk() does not descend into symlinked directories, and open(src, "rb") follows symlinks for files — so the backup converts symlinks to regular files. _copy_file_bytes explicitly preserves no metadata (permissions, timestamps). After rollback from COMMITTED, the original directory:

  • Loses all symlink structure (symlinks become regular files with copied content)
  • Gets default file permissions (e.g., 0o644) instead of original permissions

This contrasts with ACTIVE rollback which uses shutil.copytree(symlinks=True) and preserves permissions. The inconsistency means rollback-from-COMMITTED is less faithful than rollback-from-ACTIVE.

Recommendation: Use shutil.copy2 or explicitly handle symlinks and permissions in the backup. At minimum, document the limitation.


M4. DESIGN — Specification contradiction unaddressed

The specification contains two contradictory statements:

  • Line 45938 (Security Model): "The Apply phase is an atomic operation — either all sandbox changes are committed to the real resources, or none are."
  • Line 19193 (Multi-Resource Sandboxing): "If any sandbox Apply fails, others may still succeed (partial apply)."

This PR correctly implements atomicity per line 45938, but the contradicting statement at line 19193 remains in the spec. Future developers may be confused about the intended behavior.

Recommendation: Open a follow-up issue to update line 19193 to align with the atomic behavior, or add a code comment referencing the known contradiction.


M5. CODE QUALITY — Duplicated _backup_directory across two sandbox implementations

Files: copy_on_write.py:436-461, overlay.py:528-559

Nearly identical _backup_directory implementations (and inline _copy_file_bytes in overlay). This duplication increases maintenance burden and risk of divergence.

Recommendation: Extract to a shared utility in a common module (e.g., sandbox/_fs_utils.py).


LOW Severity

L1. TEST GAP — No test for real file-based rollback from COMMITTED state

All atomic commit tests use MagicMock sandboxes. No Behave scenario or Robot test exercises the actual CopyOnWriteSandbox, OverlaySandbox, or GitWorktreeSandbox rollback() from COMMITTED state with real files on disk. The backup, rmtree, copytree, and git-reset paths are untested at the integration level.


L2. TEST GAP — No test for NoSandbox/TransactionSandbox interaction with atomic commit

NoSandbox.rollback() always raises SandboxRollbackError. TransactionSandbox.rollback() from COMMITTED is a silent no-op (see H2). Neither interaction is tested in an atomic commit context. The edge case of mixing sandbox strategies in a single plan is not covered.


L3. TEST FRAGILITY — Behave scenario depends on Python dict iteration order

The scenario "Atomic commit rolls back already-committed sandboxes when one fails" (sandbox_manager_coverage.feature:163) depends on res-001 being committed before res-002 to have any rollback candidates. This relies on Python 3.7+ dict insertion order — correct but implicit. If the manager ever switches to a non-ordered collection, the test silently becomes non-deterministic.

Recommendation: Add a comment documenting the ordering assumption.


L4. TEST WEAKNESS — Robot integration test verifies mock call, not actual state restoration

_manager_atomic_commit() in helper_sandbox_integration.py asserts ok_mock.rollback.called but does not verify actual file-system state restoration. Since it uses mocks, this is expected — but combined with L1, it means the rollback correctness is never verified end-to-end.


L5. PERFORMANCE — Full directory backup doubles disk usage at commit time for CoW/Overlay

For CopyOnWriteSandbox and OverlaySandbox, the entire original directory is backed up to a temp directory before commit. For large repositories this doubles disk usage and adds significant I/O latency. The GitWorktreeSandbox avoids this by recording only a commit SHA (much more efficient). This is an inherent cost of the chosen approach but worth noting for large-scale deployments.


L6. DOCUMENTATION — Status transition diagram in protocol.py docstring is ambiguous

The ASCII diagram in SandboxStatus docstring (lines 50-58) does not clearly show the new COMMITTED -> ROLLED_BACK transition added by this PR. While the code (valid_transitions()) is correct, the visual representation is potentially misleading to developers reading the docstring.


Summary Table

ID Severity Category File(s) Summary
H1 HIGH Bug manager.py:245 commit_all only catches SandboxError — atomicity broken for unexpected exceptions
H2 HIGH Bug transaction_sandbox.py:284 Rollback from COMMITTED falsely reports success for DB transactions
H3 HIGH Bug copy_on_write.py:313, overlay.py:366 Non-atomic rmtree+copytree in rollback — data loss risk
M1 MEDIUM Bug manager.py:304 _rollback_committed catches only SandboxError — remaining rollbacks skipped
M2 MEDIUM Bug manager.py:260 No reporting of failed rollbacks in error metadata
M3 MEDIUM Bug copy_on_write.py:451, overlay.py:539 Backup does not preserve symlinks or permissions
M4 MEDIUM Design spec lines 45938 vs 19193 Specification contradiction unaddressed
M5 MEDIUM Code Quality copy_on_write.py:436, overlay.py:528 Duplicated _backup_directory across two files
L1 LOW Test Gap tests No real file-based rollback-from-COMMITTED test
L2 LOW Test Gap tests No NoSandbox/TransactionSandbox atomic commit test
L3 LOW Test Fragility feature file:163 Test depends on dict iteration order
L4 LOW Test Weakness helper_sandbox_integration.py Robot test verifies mock call, not actual restoration
L5 LOW Performance copy_on_write.py, overlay.py Full directory backup doubles disk usage
L6 LOW Documentation protocol.py:50-58 Status transition diagram not updated

Verdict: The HIGH severity items (H1, H2, H3) should be addressed before merge. H1 and H3 are particularly concerning as they can silently violate the atomicity guarantee or cause data loss — the very things this PR is designed to prevent.

# Code Review Report: PR #1146 — Atomic `commit_all` **Reviewer:** Automated deep review (4 full review cycles) **Scope:** Code changes in `fix/atomic-sandbox-commit` branch + immediate surrounding code **References:** Issue #925, `docs/specification.md` (lines 45938, 19193) --- ## Executive Summary The PR correctly implements the core atomicity requirement from the specification (line 45938): `commit_all()` now rolls back previously-committed sandboxes when one fails. The protocol transition map, all four real sandbox strategies, and the `NoSandbox` pass-through are addressed. Tests cover the happy path, rollback-on-failure, and rollback-error tolerance. However, the review identified **15 findings** across 6 categories, including **3 high-severity bugs** that can silently break the atomicity guarantee this PR is designed to deliver. These should be addressed before merge. --- ## HIGH Severity ### H1. BUG — `commit_all` catches only `SandboxError`: atomicity silently broken for unexpected exceptions **File:** `manager.py:245` ```python except SandboxError as exc: rolled_back_ids = self._rollback_committed(committed, plan_id) ``` If `sandbox.commit()` raises any non-`SandboxError` exception (e.g., `RuntimeError`, `TypeError`, `KeyError`, or an uncaught `OSError` from a library call), the exception propagates **without rolling back** already-committed sandboxes. The caller sees a crash, but committed sandboxes remain committed — the atomicity guarantee is silently violated. **Recommendation:** Widen the catch to `Exception` (or at minimum `(SandboxError, OSError, RuntimeError)`) with a final re-raise after rollback: ```python except Exception as exc: rolled_back_ids = self._rollback_committed(committed, plan_id) if isinstance(exc, SandboxError): # ... existing error handling ... else: raise # Re-raise unexpected exceptions after rollback ``` --- ### H2. BUG — `TransactionSandbox.rollback()` from COMMITTED falsely reports success **File:** `transaction_sandbox.py:284-290` When rolling back from `COMMITTED`, the method logs a warning but still transitions to `ROLLED_BACK`: ```python if self._status == SandboxStatus.COMMITTED: logger.warning("Cannot undo database COMMIT for sandbox %s — ...") # ... falls through to ... self._status = SandboxStatus.ROLLED_BACK ``` The manager's `_rollback_committed` then adds this sandbox's ID to `rolled_back_ids`, telling the caller "this sandbox was successfully rolled back." But the database changes are **permanent and NOT actually reversed**. The error report provides a false sense of atomicity. **Recommendation:** Either: - (a) Raise `SandboxRollbackError` to signal that rollback could not actually undo the committed changes, so the manager correctly excludes it from `rolled_back_ids`, OR - (b) Return a distinct indicator and add the ID to a separate `"rollback_noop"` metadata list. --- ### H3. BUG — Rollback-from-COMMITTED uses non-atomic `rmtree`-then-`copytree`: data loss risk **Files:** `copy_on_write.py:313-319`, `overlay.py:366-372` ```python shutil.rmtree(self._original_path, ignore_errors=True) # DELETE original shutil.copytree(self._pre_commit_backup, self._original_path, ...) # RESTORE ``` If `copytree` fails after `rmtree` succeeds (e.g., disk full, permission error), the original directory is **deleted** and the backup restoration is incomplete. The data is **irrecoverably lost** — the backup still exists on disk but the original path is gone, and the sandbox transitions to `ERRORED` (which does not permit rollback). Additionally, `rmtree(..., ignore_errors=True)` can leave a partially-deleted directory, causing `copytree(..., dirs_exist_ok=False)` to fail on the remaining files. **Recommendation:** Use a rename-based approach: ```python temp_name = self._original_path + ".atomic-rollback-old" os.rename(self._original_path, temp_name) try: shutil.copytree(self._pre_commit_backup, self._original_path, ...) shutil.rmtree(temp_name, ignore_errors=True) except OSError: os.rename(temp_name, self._original_path) # Restore on failure raise ``` --- ## MEDIUM Severity ### M1. BUG — `_rollback_committed` catches only `SandboxError`: remaining rollbacks skipped on unexpected exception **File:** `manager.py:304` Same class of issue as H1 but in the rollback loop. If one sandbox's `rollback()` raises a non-`SandboxError` (e.g., `subprocess.SubprocessError` from git, `OSError`), the exception propagates and **remaining rollbacks are never attempted**. **Recommendation:** Catch `Exception` in the rollback loop. --- ### M2. BUG — No reporting of failed rollbacks in error metadata **File:** `manager.py:260-268` The error result's `metadata` only contains `"rolled_back"` (successfully rolled-back IDs). Sandboxes that **failed** to roll back are silently omitted. The caller has no programmatic way to identify which sandboxes are in an inconsistent state — critical operational information when atomicity itself fails. **Recommendation:** Add a `"rollback_failed"` key: ```python metadata={ "rolled_back": rolled_back_ids, "rollback_failed": failed_rollback_ids, } ``` --- ### M3. BUG — `_backup_directory` does not preserve symlinks or permissions **Files:** `copy_on_write.py:451-461`, `overlay.py:539-549` `os.walk()` does not descend into symlinked directories, and `open(src, "rb")` follows symlinks for files — so the backup converts symlinks to regular files. `_copy_file_bytes` explicitly preserves no metadata (permissions, timestamps). After rollback from `COMMITTED`, the original directory: - Loses all symlink structure (symlinks become regular files with copied content) - Gets default file permissions (e.g., `0o644`) instead of original permissions This contrasts with `ACTIVE` rollback which uses `shutil.copytree(symlinks=True)` and preserves permissions. The inconsistency means rollback-from-COMMITTED is **less faithful** than rollback-from-ACTIVE. **Recommendation:** Use `shutil.copy2` or explicitly handle symlinks and permissions in the backup. At minimum, document the limitation. --- ### M4. DESIGN — Specification contradiction unaddressed The specification contains two contradictory statements: - **Line 45938** (Security Model): *"The Apply phase is an **atomic operation** — either all sandbox changes are committed to the real resources, or none are."* - **Line 19193** (Multi-Resource Sandboxing): *"If any sandbox Apply fails, others may still succeed (**partial apply**)."* This PR correctly implements atomicity per line 45938, but the contradicting statement at line 19193 remains in the spec. Future developers may be confused about the intended behavior. **Recommendation:** Open a follow-up issue to update line 19193 to align with the atomic behavior, or add a code comment referencing the known contradiction. --- ### M5. CODE QUALITY — Duplicated `_backup_directory` across two sandbox implementations **Files:** `copy_on_write.py:436-461`, `overlay.py:528-559` Nearly identical `_backup_directory` implementations (and inline `_copy_file_bytes` in overlay). This duplication increases maintenance burden and risk of divergence. **Recommendation:** Extract to a shared utility in a common module (e.g., `sandbox/_fs_utils.py`). --- ## LOW Severity ### L1. TEST GAP — No test for real file-based rollback from COMMITTED state All atomic commit tests use `MagicMock` sandboxes. No Behave scenario or Robot test exercises the **actual** `CopyOnWriteSandbox`, `OverlaySandbox`, or `GitWorktreeSandbox` `rollback()` from `COMMITTED` state with real files on disk. The backup, rmtree, copytree, and git-reset paths are untested at the integration level. --- ### L2. TEST GAP — No test for NoSandbox/TransactionSandbox interaction with atomic commit `NoSandbox.rollback()` always raises `SandboxRollbackError`. `TransactionSandbox.rollback()` from COMMITTED is a silent no-op (see H2). Neither interaction is tested in an atomic commit context. The edge case of mixing sandbox strategies in a single plan is not covered. --- ### L3. TEST FRAGILITY — Behave scenario depends on Python dict iteration order The scenario "Atomic commit rolls back already-committed sandboxes when one fails" (`sandbox_manager_coverage.feature:163`) depends on `res-001` being committed **before** `res-002` to have any rollback candidates. This relies on Python 3.7+ dict insertion order — correct but implicit. If the manager ever switches to a non-ordered collection, the test silently becomes non-deterministic. **Recommendation:** Add a comment documenting the ordering assumption. --- ### L4. TEST WEAKNESS — Robot integration test verifies mock call, not actual state restoration `_manager_atomic_commit()` in `helper_sandbox_integration.py` asserts `ok_mock.rollback.called` but does not verify actual file-system state restoration. Since it uses mocks, this is expected — but combined with L1, it means the rollback correctness is never verified end-to-end. --- ### L5. PERFORMANCE — Full directory backup doubles disk usage at commit time for CoW/Overlay For `CopyOnWriteSandbox` and `OverlaySandbox`, the entire original directory is backed up to a temp directory before commit. For large repositories this doubles disk usage and adds significant I/O latency. The `GitWorktreeSandbox` avoids this by recording only a commit SHA (much more efficient). This is an inherent cost of the chosen approach but worth noting for large-scale deployments. --- ### L6. DOCUMENTATION — Status transition diagram in `protocol.py` docstring is ambiguous The ASCII diagram in `SandboxStatus` docstring (lines 50-58) does not clearly show the new `COMMITTED -> ROLLED_BACK` transition added by this PR. While the code (`valid_transitions()`) is correct, the visual representation is potentially misleading to developers reading the docstring. --- ## Summary Table | ID | Severity | Category | File(s) | Summary | |:---|:---------|:---------|:--------|:--------| | H1 | HIGH | Bug | `manager.py:245` | `commit_all` only catches `SandboxError` — atomicity broken for unexpected exceptions | | H2 | HIGH | Bug | `transaction_sandbox.py:284` | Rollback from COMMITTED falsely reports success for DB transactions | | H3 | HIGH | Bug | `copy_on_write.py:313`, `overlay.py:366` | Non-atomic `rmtree`+`copytree` in rollback — data loss risk | | M1 | MEDIUM | Bug | `manager.py:304` | `_rollback_committed` catches only `SandboxError` — remaining rollbacks skipped | | M2 | MEDIUM | Bug | `manager.py:260` | No reporting of failed rollbacks in error metadata | | M3 | MEDIUM | Bug | `copy_on_write.py:451`, `overlay.py:539` | Backup does not preserve symlinks or permissions | | M4 | MEDIUM | Design | spec lines 45938 vs 19193 | Specification contradiction unaddressed | | M5 | MEDIUM | Code Quality | `copy_on_write.py:436`, `overlay.py:528` | Duplicated `_backup_directory` across two files | | L1 | LOW | Test Gap | tests | No real file-based rollback-from-COMMITTED test | | L2 | LOW | Test Gap | tests | No NoSandbox/TransactionSandbox atomic commit test | | L3 | LOW | Test Fragility | feature file:163 | Test depends on dict iteration order | | L4 | LOW | Test Weakness | `helper_sandbox_integration.py` | Robot test verifies mock call, not actual restoration | | L5 | LOW | Performance | `copy_on_write.py`, `overlay.py` | Full directory backup doubles disk usage | | L6 | LOW | Documentation | `protocol.py:50-58` | Status transition diagram not updated | --- **Verdict:** The HIGH severity items (H1, H2, H3) should be addressed before merge. H1 and H3 are particularly concerning as they can silently violate the atomicity guarantee or cause data loss — the very things this PR is designed to prevent.
Author
Member

Review Findings — Resolution Report

All 10 applicable findings from the code review have been addressed and amended into the feature commit. The commit is ready for force-push.

Applied Fixes

ID Finding Resolution
H1 commit_all catches only SandboxError Widened to except Exception; non-SandboxError re-raised after rollback
H2 TransactionSandbox.rollback() from COMMITTED falsely reports success Now raises SandboxRollbackError — manager correctly excludes from rolled_back_ids
H3 Non-atomic rmtree+copytree data loss risk Replaced with safe_restore() using rename-based atomic swap
M1 _rollback_committed catches only SandboxError Widened to except Exception so remaining rollbacks proceed
M2 No reporting of failed rollbacks _rollback_committed now returns (rolled_back_ids, failed_rollback_ids); metadata carries both keys
M3 Backup doesn't preserve symlinks/permissions Shared backup_directory() preserves symlinks via os.readlink/os.symlink and permissions via os.chmod
M4 Spec contradiction unaddressed Added .. note:: in commit_all docstring referencing lines 45938 vs 19193
M5 Duplicated _backup_directory code Extracted to _fs_utils.py shared module
L6 Status transition diagram stale Updated ASCII diagram in protocol.py with explicit COMMITTED -> ROLLED_BACK path
Tests Behave + Robot updated Added rollback_failed metadata assertion; updated Robot test to verify both metadata keys

Not Applied (with justification)

ID Finding Reason
L1 No real file-based rollback-from-COMMITTED test Deferred to a follow-up — requires significant test infrastructure changes beyond this fix scope
L2 No NoSandbox/TransactionSandbox atomic commit test Deferred — these edge cases are best addressed in dedicated test issues
L3 Test depends on dict iteration order Acceptable: Python 3.7+ guarantees insertion order; added no code change
L4 Robot test verifies mock call only Acceptable for unit-level mocking; covered by L1 follow-up
L5 Full directory backup performance Inherent to the chosen rollback mechanism; git worktree already efficient; no code change needed

Quality Gates

Check Result
nox -s lint PASS
nox -s typecheck PASS (0 errors, 1 pre-existing warning)
nox -s unit_tests PASS (12,294 scenarios, 0 failures)
nox -s integration_tests PASS (1,700 tests, 0 failures)
# Review Findings — Resolution Report All **10 applicable findings** from the code review have been addressed and amended into the feature commit. The commit is ready for force-push. ## Applied Fixes | ID | Finding | Resolution | |:---|:--------|:-----------| | H1 | `commit_all` catches only `SandboxError` | Widened to `except Exception`; non-`SandboxError` re-raised after rollback | | H2 | `TransactionSandbox.rollback()` from COMMITTED falsely reports success | Now raises `SandboxRollbackError` — manager correctly excludes from `rolled_back_ids` | | H3 | Non-atomic `rmtree`+`copytree` data loss risk | Replaced with `safe_restore()` using rename-based atomic swap | | M1 | `_rollback_committed` catches only `SandboxError` | Widened to `except Exception` so remaining rollbacks proceed | | M2 | No reporting of failed rollbacks | `_rollback_committed` now returns `(rolled_back_ids, failed_rollback_ids)`; metadata carries both keys | | M3 | Backup doesn't preserve symlinks/permissions | Shared `backup_directory()` preserves symlinks via `os.readlink`/`os.symlink` and permissions via `os.chmod` | | M4 | Spec contradiction unaddressed | Added `.. note::` in `commit_all` docstring referencing lines 45938 vs 19193 | | M5 | Duplicated `_backup_directory` code | Extracted to `_fs_utils.py` shared module | | L6 | Status transition diagram stale | Updated ASCII diagram in `protocol.py` with explicit `COMMITTED -> ROLLED_BACK` path | | Tests | Behave + Robot updated | Added `rollback_failed` metadata assertion; updated Robot test to verify both metadata keys | ## Not Applied (with justification) | ID | Finding | Reason | |:---|:--------|:-------| | L1 | No real file-based rollback-from-COMMITTED test | Deferred to a follow-up — requires significant test infrastructure changes beyond this fix scope | | L2 | No NoSandbox/TransactionSandbox atomic commit test | Deferred — these edge cases are best addressed in dedicated test issues | | L3 | Test depends on dict iteration order | Acceptable: Python 3.7+ guarantees insertion order; added no code change | | L4 | Robot test verifies mock call only | Acceptable for unit-level mocking; covered by L1 follow-up | | L5 | Full directory backup performance | Inherent to the chosen rollback mechanism; git worktree already efficient; no code change needed | ## Quality Gates | Check | Result | |:------|:-------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors, 1 pre-existing warning) | | `nox -s unit_tests` | PASS (12,294 scenarios, 0 failures) | | `nox -s integration_tests` | PASS (1,700 tests, 0 failures) |
CoreRasurae force-pushed fix/atomic-sandbox-commit from f23f0bc7e5
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 27s
CI / lint (pull_request) Successful in 3m17s
CI / quality (pull_request) Successful in 3m52s
CI / typecheck (pull_request) Successful in 4m9s
CI / security (pull_request) Successful in 4m21s
CI / unit_tests (pull_request) Successful in 8m26s
CI / integration_tests (pull_request) Successful in 7m42s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 10m59s
CI / coverage (pull_request) Successful in 10m19s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 50m24s
to 8b4e774575
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m20s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m2s
CI / quality (pull_request) Successful in 4m8s
CI / e2e_tests (pull_request) Successful in 8m49s
CI / unit_tests (pull_request) Successful in 9m36s
CI / integration_tests (pull_request) Successful in 9m47s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 10m52s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 38m22s
2026-03-28 11:59:22 +00:00
Compare
CoreRasurae dismissed freemo's review 2026-03-28 11:59:22 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae left a comment

Code Review Report — PR #1146: fix(sandbox): make commit_all atomic per specification

Reviewer: Automated deep review (4 full cycles across all categories)
Scope: Code changes in branch fix/atomic-sandbox-commit + close connections to surrounding code
Issue: #925
Commit: 8b4e774


Summary

The PR correctly implements atomic commit_all semantics as required by the specification (line 45938). The approach — pre-commit backup + rollback-on-failure — is sound. The COMMITTED → ROLLED_BACK transition is properly added to the state machine. The _fs_utils extraction is clean. The specification contradiction (line 45938 vs 19193) is well-documented.

Findings: 5 bugs, 1 security concern, 1 performance concern, 5 test coverage gaps, 3 robustness issues.


1. Bugs — MEDIUM Severity

1.1 _rollback_committed rolls back in forward order instead of reverse

File: manager.py:324
Description: When rolling back already-committed sandboxes, the iteration proceeds in the same (forward) order as the commits. Standard transactional rollback semantics use reverse (LIFO) order to properly unwind dependencies. If sandbox B's commit depends on sandbox A's committed state, rolling back A first can leave B's rollback in an inconsistent state.
Suggested fix: for sandbox, _result in reversed(committed):

1.2 Undocumented re-raise behavior for non-SandboxError in commit_all

File: manager.py:276-280
Description: When a sandbox's commit() raises a non-SandboxError exception (e.g., RuntimeError, TypeError), the code performs the rollback and then re-raises the original exception. The caller receives an exception instead of a structured CommitResult. This is reasonable but the method's docstring (lines 220-229) only documents the CommitResult-based return contract — it does not mention that exceptions may propagate. This is an API contract inconsistency.
Suggested fix: Add to the docstring: Raises: Exception — If a non-SandboxError exception occurs during commit, it is re-raised after rollback. The caller receives the exception, not a CommitResult.

1.3 After atomic rollback, sandbox states prevent retry of the full batch

File: manager.py:237-241 and _rollback_committed
Description: After commit_all fails and rolls back previously-committed sandboxes, those sandboxes transition to ROLLED_BACK status. If the caller retries commit_all, the committable filter (sb.status in (CREATED, ACTIVE)) skips all ROLLED_BACK sandboxes, so only the previously-un-attempted sandboxes are retried. The full batch cannot be retried without manual intervention to re-activate the rolled-back sandboxes.

This is inconsistent with the atomic guarantee — the expectation after an all-or-nothing failure is that the system returns to a state where the entire operation can be retried.

Suggested fix: Either (a) transition rolled-back sandboxes back to ACTIVE in _rollback_committed, or (b) document clearly that after atomic failure the plan must be fully re-created, or (c) include ROLLED_BACK in the committable filter if it makes sense for the lifecycle.

1.4 Backup directory not cleaned up on commit failure (resource leak)

Files: copy_on_write.py:244-266, overlay.py:291-313
Description: In both CopyOnWriteSandbox.commit() and OverlaySandbox.commit(), backup_directory creates a backup at line 244/291 before applying changes. If the commit subsequently fails with OSError (e.g., during file copy), the status transitions to ERRORED and SandboxCommitError is raised — but the backup temp directory is never cleaned up. It persists until cleanup() is explicitly called.
Suggested fix: Clean up the backup directory in the except OSError handler before raising.

1.5 git reset --hard during rollback from COMMITTED can affect other worktrees

File: git_worktree.py:493-497
Description: When rolling back a git worktree sandbox from COMMITTED, git reset --hard <pre_merge_commit> is executed on the original branch. If other sandbox worktrees or external tooling are tracking the same branch, this hard reset will affect them. This is an inherent limitation of the git worktree approach but is worth documenting.
Suggested fix: Add a docstring warning about multi-worktree safety.


2. Bugs — LOW Severity

2.1 backup_directory doesn't preserve file timestamps (mtime/atime)

File: _fs_utils.py:58-59 (_copy_file)
Description: _copy_file preserves permissions via os.chmod but does not preserve modification timestamps. When restoring from backup, files will have the backup creation time. This could affect build tools, caches, or make-like systems that rely on file modification timestamps to determine staleness.
Suggested fix: Add os.utime(dst, (src_stat.st_atime, src_stat.st_mtime)) after os.chmod in _copy_file.

2.2 backup_directory doesn't handle special files (FIFOs, sockets, device nodes)

File: _fs_utils.py:38-59
Description: The function handles regular files and symlinks but silently skips special files. If the sandboxed directory contains FIFOs or sockets (unlikely but possible), the backup would be incomplete and the restore would not match the original.
Suggested fix: Log a warning when encountering special files, or raise explicitly.

2.3 Root directory permissions not preserved by backup_directory

File: _fs_utils.py:21-59
Description: The function receives a dst directory (created by tempfile.mkdtemp) and copies contents into it. The permissions of the source root directory (src) are never applied to dst. When safe_restore uses shutil.copytree to copy back, the restored root directory may have different permissions than the original.
Suggested fix: Add os.chmod(dst, stat.S_IMODE(os.stat(src).st_mode)) at the start of backup_directory.


3. Security — LOW Severity

File: _fs_utils.py:82
Description: The stale path is deterministic (target_path + ".atomic-rollback-old"). In a multi-user environment, an attacker could pre-create a symlink at this path pointing to a sensitive directory. The pre-existence cleanup (shutil.rmtree(stale, ignore_errors=True) on line 87) would follow the symlink and delete the target.
Mitigating factors: Sandbox paths are typically in temp dirs or controlled workspaces, limiting exposure.
Suggested fix: Use tempfile.mkdtemp(dir=os.path.dirname(target_path)) for an unpredictable stale name, or add an os.path.islink check before rmtree.


4. Performance — MEDIUM Severity

4.1 Full backup copy before every commit doubles I/O and disk usage

Files: copy_on_write.py:244-245, overlay.py:291-292
Description: Both CopyOnWriteSandbox.commit() and OverlaySandbox.commit() create a full copy of the original directory before applying changes, even when the batch succeeds and the backup is never used. The backup persists until cleanup() is called. For large repositories, this doubles disk I/O during commit and increases temporary disk usage.
Suggested fix: Consider (a) using hard links instead of copies for unchanged files (same-filesystem optimization), (b) deferring backup creation to the manager level (only backup when there are multiple sandboxes to commit), or (c) cleaning up backups eagerly on success in commit_all.


5. Test Coverage Gaps — MEDIUM Severity

5.1 No test for non-SandboxError exception propagation in commit_all

File: manager.py:276-280
Description: The code path where a sandbox's commit() raises a non-SandboxError exception (e.g., RuntimeError) triggers rollback and then re-raises. No test verifies that (a) the rollback occurs and (b) the exception propagates.

5.2 No dedicated unit tests for _fs_utils module

File: _fs_utils.py
Description: backup_directory, safe_restore, and _copy_file have no direct unit tests. Their behavior is only tested indirectly through sandbox commit/rollback scenarios. Edge cases (symlinks, restricted permissions, empty directories, cross-filesystem scenarios, pre-existing stale paths) are not covered.

5.3 No test for safe_restore failure/recovery path

File: _fs_utils.py:97-100
Description: The fallback path where shutil.copytree fails during restore and the original is renamed back from stale is not tested.

5.4 No test for GitWorktreeSandbox.rollback() from COMMITTED state

File: git_worktree.py:487-498
Description: The new rollback-from-COMMITTED code path (git reset --hard on the original branch) has no corresponding test scenario in the feature files or robot tests.

5.5 No test for TransactionSandbox in the atomic commit_all flow

File: transaction_sandbox.py:281-288
Description: TransactionSandbox.rollback() from COMMITTED raises SandboxRollbackError (database commits are irreversible). When this sandbox type participates in an atomic commit_all batch, it will always appear in rollback_failed. No test verifies this interaction.


6. Summary Table

# Category Severity File Short Description
1.1 Bug Medium manager.py:324 Rollback order should be reversed (LIFO)
1.2 Bug Medium manager.py:276 Undocumented exception re-raise bypasses CommitResult contract
1.3 Bug Medium manager.py:237 Rolled-back sandboxes can't be retried by commit_all
1.4 Bug Medium copy_on_write.py:244, overlay.py:291 Backup not cleaned up on commit failure
1.5 Bug Low git_worktree.py:493 git reset --hard can affect other worktrees
2.1 Bug Low _fs_utils.py:59 File timestamps not preserved in backup
2.2 Bug Low _fs_utils.py:38 Special files silently skipped in backup
2.3 Bug Low _fs_utils.py:21 Root directory permissions not preserved
3.1 Security Low _fs_utils.py:82 Predictable stale path (symlink TOCTOU)
4.1 Performance Medium copy_on_write.py:244, overlay.py:291 Full backup copy on every commit
5.1 Test Gap Medium manager.py:276 No test for non-SandboxError re-raise
5.2 Test Gap Medium _fs_utils.py No dedicated unit tests for _fs_utils
5.3 Test Gap Medium _fs_utils.py:97 No test for safe_restore failure path
5.4 Test Gap Medium git_worktree.py:487 No test for git worktree rollback from COMMITTED
5.5 Test Gap Medium transaction_sandbox.py:281 No test for TransactionSandbox in atomic flow

Review performed by: automated 4-cycle deep analysis
Review scope: strictly limited to code changes in fix/atomic-sandbox-commit + immediate surrounding code

# Code Review Report — PR #1146: `fix(sandbox): make commit_all atomic per specification` **Reviewer:** Automated deep review (4 full cycles across all categories) **Scope:** Code changes in branch `fix/atomic-sandbox-commit` + close connections to surrounding code **Issue:** #925 **Commit:** `8b4e774` --- ## Summary The PR correctly implements atomic `commit_all` semantics as required by the specification (line 45938). The approach — pre-commit backup + rollback-on-failure — is sound. The COMMITTED → ROLLED_BACK transition is properly added to the state machine. The `_fs_utils` extraction is clean. The specification contradiction (line 45938 vs 19193) is well-documented. **Findings: 5 bugs, 1 security concern, 1 performance concern, 5 test coverage gaps, 3 robustness issues.** --- ## 1. Bugs — MEDIUM Severity ### 1.1 `_rollback_committed` rolls back in forward order instead of reverse **File:** `manager.py:324` **Description:** When rolling back already-committed sandboxes, the iteration proceeds in the same (forward) order as the commits. Standard transactional rollback semantics use reverse (LIFO) order to properly unwind dependencies. If sandbox B's commit depends on sandbox A's committed state, rolling back A first can leave B's rollback in an inconsistent state. **Suggested fix:** `for sandbox, _result in reversed(committed):` ### 1.2 Undocumented re-raise behavior for non-`SandboxError` in `commit_all` **File:** `manager.py:276-280` **Description:** When a sandbox's `commit()` raises a non-`SandboxError` exception (e.g., `RuntimeError`, `TypeError`), the code performs the rollback and then re-raises the original exception. The caller receives an exception instead of a structured `CommitResult`. This is reasonable but the method's docstring (lines 220-229) only documents the `CommitResult`-based return contract — it does not mention that exceptions may propagate. This is an API contract inconsistency. **Suggested fix:** Add to the docstring: `Raises: Exception — If a non-SandboxError exception occurs during commit, it is re-raised after rollback. The caller receives the exception, not a CommitResult.` ### 1.3 After atomic rollback, sandbox states prevent retry of the full batch **File:** `manager.py:237-241` and `_rollback_committed` **Description:** After `commit_all` fails and rolls back previously-committed sandboxes, those sandboxes transition to `ROLLED_BACK` status. If the caller retries `commit_all`, the committable filter (`sb.status in (CREATED, ACTIVE)`) skips all `ROLLED_BACK` sandboxes, so only the previously-un-attempted sandboxes are retried. The full batch cannot be retried without manual intervention to re-activate the rolled-back sandboxes. This is inconsistent with the atomic guarantee — the expectation after an all-or-nothing failure is that the system returns to a state where the entire operation can be retried. **Suggested fix:** Either (a) transition rolled-back sandboxes back to `ACTIVE` in `_rollback_committed`, or (b) document clearly that after atomic failure the plan must be fully re-created, or (c) include `ROLLED_BACK` in the committable filter if it makes sense for the lifecycle. ### 1.4 Backup directory not cleaned up on commit failure (resource leak) **Files:** `copy_on_write.py:244-266`, `overlay.py:291-313` **Description:** In both `CopyOnWriteSandbox.commit()` and `OverlaySandbox.commit()`, `backup_directory` creates a backup at line 244/291 before applying changes. If the commit subsequently fails with `OSError` (e.g., during file copy), the status transitions to `ERRORED` and `SandboxCommitError` is raised — but the backup temp directory is never cleaned up. It persists until `cleanup()` is explicitly called. **Suggested fix:** Clean up the backup directory in the `except OSError` handler before raising. ### 1.5 `git reset --hard` during rollback from COMMITTED can affect other worktrees **File:** `git_worktree.py:493-497` **Description:** When rolling back a git worktree sandbox from COMMITTED, `git reset --hard <pre_merge_commit>` is executed on the original branch. If other sandbox worktrees or external tooling are tracking the same branch, this hard reset will affect them. This is an inherent limitation of the git worktree approach but is worth documenting. **Suggested fix:** Add a docstring warning about multi-worktree safety. --- ## 2. Bugs — LOW Severity ### 2.1 `backup_directory` doesn't preserve file timestamps (mtime/atime) **File:** `_fs_utils.py:58-59` (`_copy_file`) **Description:** `_copy_file` preserves permissions via `os.chmod` but does not preserve modification timestamps. When restoring from backup, files will have the backup creation time. This could affect build tools, caches, or `make`-like systems that rely on file modification timestamps to determine staleness. **Suggested fix:** Add `os.utime(dst, (src_stat.st_atime, src_stat.st_mtime))` after `os.chmod` in `_copy_file`. ### 2.2 `backup_directory` doesn't handle special files (FIFOs, sockets, device nodes) **File:** `_fs_utils.py:38-59` **Description:** The function handles regular files and symlinks but silently skips special files. If the sandboxed directory contains FIFOs or sockets (unlikely but possible), the backup would be incomplete and the restore would not match the original. **Suggested fix:** Log a warning when encountering special files, or raise explicitly. ### 2.3 Root directory permissions not preserved by `backup_directory` **File:** `_fs_utils.py:21-59` **Description:** The function receives a `dst` directory (created by `tempfile.mkdtemp`) and copies contents into it. The permissions of the source root directory (`src`) are never applied to `dst`. When `safe_restore` uses `shutil.copytree` to copy back, the restored root directory may have different permissions than the original. **Suggested fix:** Add `os.chmod(dst, stat.S_IMODE(os.stat(src).st_mode))` at the start of `backup_directory`. --- ## 3. Security — LOW Severity ### 3.1 Predictable stale path in `safe_restore` (TOCTOU symlink concern) **File:** `_fs_utils.py:82` **Description:** The stale path is deterministic (`target_path + ".atomic-rollback-old"`). In a multi-user environment, an attacker could pre-create a symlink at this path pointing to a sensitive directory. The pre-existence cleanup (`shutil.rmtree(stale, ignore_errors=True)` on line 87) would follow the symlink and delete the target. **Mitigating factors:** Sandbox paths are typically in temp dirs or controlled workspaces, limiting exposure. **Suggested fix:** Use `tempfile.mkdtemp(dir=os.path.dirname(target_path))` for an unpredictable stale name, or add an `os.path.islink` check before `rmtree`. --- ## 4. Performance — MEDIUM Severity ### 4.1 Full backup copy before every commit doubles I/O and disk usage **Files:** `copy_on_write.py:244-245`, `overlay.py:291-292` **Description:** Both `CopyOnWriteSandbox.commit()` and `OverlaySandbox.commit()` create a full copy of the original directory before applying changes, even when the batch succeeds and the backup is never used. The backup persists until `cleanup()` is called. For large repositories, this doubles disk I/O during commit and increases temporary disk usage. **Suggested fix:** Consider (a) using hard links instead of copies for unchanged files (same-filesystem optimization), (b) deferring backup creation to the manager level (only backup when there are multiple sandboxes to commit), or (c) cleaning up backups eagerly on success in `commit_all`. --- ## 5. Test Coverage Gaps — MEDIUM Severity ### 5.1 No test for non-`SandboxError` exception propagation in `commit_all` **File:** `manager.py:276-280` **Description:** The code path where a sandbox's `commit()` raises a non-`SandboxError` exception (e.g., `RuntimeError`) triggers rollback and then re-raises. No test verifies that (a) the rollback occurs and (b) the exception propagates. ### 5.2 No dedicated unit tests for `_fs_utils` module **File:** `_fs_utils.py` **Description:** `backup_directory`, `safe_restore`, and `_copy_file` have no direct unit tests. Their behavior is only tested indirectly through sandbox commit/rollback scenarios. Edge cases (symlinks, restricted permissions, empty directories, cross-filesystem scenarios, pre-existing stale paths) are not covered. ### 5.3 No test for `safe_restore` failure/recovery path **File:** `_fs_utils.py:97-100` **Description:** The fallback path where `shutil.copytree` fails during restore and the original is renamed back from stale is not tested. ### 5.4 No test for `GitWorktreeSandbox.rollback()` from COMMITTED state **File:** `git_worktree.py:487-498` **Description:** The new rollback-from-COMMITTED code path (`git reset --hard` on the original branch) has no corresponding test scenario in the feature files or robot tests. ### 5.5 No test for `TransactionSandbox` in the atomic `commit_all` flow **File:** `transaction_sandbox.py:281-288` **Description:** `TransactionSandbox.rollback()` from COMMITTED raises `SandboxRollbackError` (database commits are irreversible). When this sandbox type participates in an atomic `commit_all` batch, it will always appear in `rollback_failed`. No test verifies this interaction. --- ## 6. Summary Table | # | Category | Severity | File | Short Description | |---|----------|----------|------|-------------------| | 1.1 | Bug | **Medium** | `manager.py:324` | Rollback order should be reversed (LIFO) | | 1.2 | Bug | **Medium** | `manager.py:276` | Undocumented exception re-raise bypasses CommitResult contract | | 1.3 | Bug | **Medium** | `manager.py:237` | Rolled-back sandboxes can't be retried by commit_all | | 1.4 | Bug | **Medium** | `copy_on_write.py:244`, `overlay.py:291` | Backup not cleaned up on commit failure | | 1.5 | Bug | **Low** | `git_worktree.py:493` | git reset --hard can affect other worktrees | | 2.1 | Bug | **Low** | `_fs_utils.py:59` | File timestamps not preserved in backup | | 2.2 | Bug | **Low** | `_fs_utils.py:38` | Special files silently skipped in backup | | 2.3 | Bug | **Low** | `_fs_utils.py:21` | Root directory permissions not preserved | | 3.1 | Security | **Low** | `_fs_utils.py:82` | Predictable stale path (symlink TOCTOU) | | 4.1 | Performance | **Medium** | `copy_on_write.py:244`, `overlay.py:291` | Full backup copy on every commit | | 5.1 | Test Gap | **Medium** | `manager.py:276` | No test for non-SandboxError re-raise | | 5.2 | Test Gap | **Medium** | `_fs_utils.py` | No dedicated unit tests for _fs_utils | | 5.3 | Test Gap | **Medium** | `_fs_utils.py:97` | No test for safe_restore failure path | | 5.4 | Test Gap | **Medium** | `git_worktree.py:487` | No test for git worktree rollback from COMMITTED | | 5.5 | Test Gap | **Medium** | `transaction_sandbox.py:281` | No test for TransactionSandbox in atomic flow | --- *Review performed by: automated 4-cycle deep analysis* *Review scope: strictly limited to code changes in `fix/atomic-sandbox-commit` + immediate surrounding code*
CoreRasurae force-pushed fix/atomic-sandbox-commit from 8b4e774575
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m20s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m2s
CI / quality (pull_request) Successful in 4m8s
CI / e2e_tests (pull_request) Successful in 8m49s
CI / unit_tests (pull_request) Successful in 9m36s
CI / integration_tests (pull_request) Successful in 9m47s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 10m52s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 38m22s
to 7fecd4b17c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m18s
CI / typecheck (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m0s
CI / quality (pull_request) Successful in 4m19s
CI / integration_tests (pull_request) Successful in 9m3s
CI / unit_tests (pull_request) Successful in 9m14s
CI / e2e_tests (pull_request) Successful in 10m10s
CI / docker (pull_request) Successful in 1m8s
CI / coverage (pull_request) Successful in 11m48s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 14:07:38 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 fix(sandbox): make commit_all atomic per specification

Reviewer: Automated deep review (4 full cycles across all categories)
Scope: All code changes in branch fix/atomic-sandbox-commit plus close surrounding context
Ref: Issue #925, Specification docs/specification.md (lines 45938, 19193, 24632)


Summary

Overall this is a well-structured change that addresses a genuine specification gap. The atomic commit_all protocol, LIFO rollback ordering, COMMITTED→ROLLED_BACK state transition, and shared _fs_utils module are all sound design choices. The commit message and documentation are thorough. However, the review identified 13 findings across 5 categories, including 2 high-severity bugs that should be addressed before merge.


Findings by Severity

HIGH — Should be addressed before merge

H1. NoSandbox.rollback() breaks atomicity guarantee in mixed batches

Category: Bug / Spec Compliance
Files: no_sandbox.py:190-200, manager.py:256-301

NoSandbox.rollback() unconditionally raises SandboxRollbackError regardless of status. In a batch containing a NoSandbox alongside a real sandbox (e.g., CopyOnWriteSandbox), if the real sandbox fails to commit, the NoSandbox cannot be rolled back. Its changes (applied immediately, per the "none" strategy) persist while others are undone — violating the atomicity guarantee of spec line 45938.

The specification itself acknowledges none has "No automatic rollback" (line 45932), creating an inherent tension with the atomicity requirement. The code correctly reports NoSandbox in rollback_failed metadata, but the caller has no mechanism to recover.

Suggested mitigations (pick one):

  • (a) Commit NoSandbox instances last in the batch so they're never in the "already committed" set when a failure occurs.
  • (b) Reject NoSandbox from atomic commit batches (fail fast with a clear error).
  • (c) At minimum, add a prominent log warning when a NoSandbox is included in an atomic commit.

H2. Pre-commit backup deleted on partial commit failure — data loss risk

Category: Bug
Files: copy_on_write.py:262-266, overlay.py:309-313

In CopyOnWriteSandbox.commit() and OverlaySandbox.commit(), when file copying to the original directory fails midway (partial modification), the except OSError handler deletes the pre-commit backup:

except OSError as exc:
    if self._pre_commit_backup is not None:
        shutil.rmtree(self._pre_commit_backup, ignore_errors=True)
        self._pre_commit_backup = None
    self._status = SandboxStatus.ERRORED

At this point the original directory is partially corrupted (some files were copied before the error), and the backup that could restore it is now gone. The cleanup() method already handles backup cleanup, so the explicit deletion here is both unnecessary and harmful.

Suggested fix: Replace the backup deletion with a restore attempt:

except OSError as exc:
    if self._pre_commit_backup is not None:
        try:
            safe_restore(self._pre_commit_backup, self._original_path)
            self._pre_commit_backup = None
        except Exception:
            pass  # Keep backup for manual recovery
    self._status = SandboxStatus.ERRORED

Or at minimum, remove the rmtree call and let cleanup() handle it.


MEDIUM — Should be addressed (non-blocking)

M1. Rollback from COMMITTED with missing backup falls through silently

Category: Bug
Files: copy_on_write.py:316-335, overlay.py:368-396

When self._status == SandboxStatus.COMMITTED but self._pre_commit_backup is None, the if/else falls through to the regular ACTIVE rollback path, which re-copies from the already-modified original directory. This silently produces a sandbox that reflects the committed changes instead of the pre-commit state — a rollback that doesn't actually roll back.

While this path should not occur in normal operation, it is a defensive gap. Consider raising SandboxRollbackError when COMMITTED and backup is missing:

if self._status == SandboxStatus.COMMITTED and self._pre_commit_backup is None:
    raise SandboxRollbackError(
        f"Cannot rollback COMMITTED sandbox {self._sandbox_id}: "
        f"pre-commit backup is missing"
    )

M2. LIFO rollback order test uses fragile string comparison

Category: Test Flaw
File: features/steps/sandbox_manager_coverage_steps.py:587-592

for i in range(len(rollback_log) - 1):
    assert rollback_log[i] > rollback_log[i + 1], ...

This relies on resource-ID keys ("plan-001:res-001", "plan-001:res-002") sorting lexicographically in commit order. If future resource IDs don't follow this pattern (e.g., ULIDs, non-alphabetic names), the test breaks without any real code defect. Consider comparing against the known commit order explicitly:

expected_order = list(reversed(context._committable_mock_keys))
assert rollback_log == expected_order

M3. Temp directory leak in _fs_utils BDD tests

Category: Test Flaw
File: features/steps/sandbox_fs_utils_steps.py:38, 55, 64, 72

Six scenarios create temp directories via tempfile.mkdtemp() stored in context._fs_tmpdir but no cleanup handler is registered. The after_scenario hook in environment.py only cleans context.test_dir, not _fs_tmpdir. These directories accumulate across test runs.

Suggested fix: Register a cleanup handler in each Given step:

if not hasattr(context, "_cleanup_handlers"):
    context._cleanup_handlers = []
context._cleanup_handlers.append(lambda: shutil.rmtree(context._fs_tmpdir, ignore_errors=True))

M4. Missing BDD scenarios for strategy-specific COMMITTED rollback

Category: Test Coverage
Files: Feature files

The following rollback-from-COMMITTED paths are implemented but lack dedicated BDD scenarios:

  • OverlaySandbox.rollback() from COMMITTED (uses safe_restore)
  • GitWorktreeSandbox.rollback() from COMMITTED (uses git reset --hard)
  • TransactionSandbox.rollback() from COMMITTED (raises SandboxRollbackError)

These are critical paths for the atomic commit protocol and should have explicit coverage.


M5. Full directory backup on every commit even with empty diffs

Category: Performance
Files: copy_on_write.py:243-245, overlay.py:290-292

backup_directory runs unconditionally after _compute_diff, even when the diff is empty (no changed, added, or deleted files). For large project resources with thousands of files, this creates a complete copy that is immediately discarded on the successful (no-change) commit path. Consider guarding:

has_changes = changed_files or added_files or deleted_files
if has_changes:
    self._pre_commit_backup = tempfile.mkdtemp(prefix="ca-cow-backup-")
    backup_directory(self._original_path, self._pre_commit_backup)

LOW — Minor issues / nice-to-have improvements

L1. GitWorktreeSandbox._pre_merge_commit not cleared on commit failure

Category: Bug (latent)
File: git_worktree.py:417-442

If the merge fails, _pre_merge_commit retains a stale value (the commit hash before the attempted merge). Status transitions to ERRORED, which only allows CLEANED_UP, so the stale value is currently harmless. But if future code adds recovery from ERRORED, this could cause incorrect git reset --hard to a commit where no merge occurred.


L2. safe_restore double-failure leaves target at stale path

Category: Bug (edge case)
File: _fs_utils.py:92-103

If shutil.copytree fails and then os.rename(stale, target_path) also fails (e.g., filesystem corruption), the original directory remains at the .atomic-rollback-old sibling path and target_path doesn't exist. This is an extreme edge case but the function's docstring claims "the original is preserved" unconditionally.


L3. Predictable stale-directory naming in safe_restore

Category: Security (low risk)
File: _fs_utils.py:85

The .atomic-rollback-old suffix is deterministic. In scenarios where sandbox paths are attacker-influenced, a pre-created symlink at this path could cause shutil.rmtree to follow the symlink. Mitigated by sandbox directories residing in controlled temp paths.


L4. Integration test uses mocks instead of real sandbox implementations

Category: Test Fidelity
File: robot/helper_sandbox_integration.py:617-683

_manager_atomic_commit uses unittest.mock.MagicMock for both the OK and failing sandboxes. This tests the manager's orchestration logic but not the actual rollback behavior of real sandbox implementations (e.g., safe_restore on filesystem, git reset --hard).


L5. Commit ordering not enforced per specification

Category: Spec Compliance (pre-existing)
File: manager.py:256

Specification line 24632 requires commit to be bottom-up (children before parents) and rollback to be top-down. commit_all iterates sandboxes in Python dict insertion order, which is not necessarily bottom-up topological order. The LIFO rollback only matches the spec if commit order was correct. This is a pre-existing concern not introduced by this PR, but it affects the correctness of the new LIFO rollback.


Positive Observations

  • The _fs_utils extraction is clean and well-motivated; avoiding shutil.copytree/os.makedirs to prevent test-patch interference is thoughtful.
  • The COMMITTED → ROLLED_BACK transition in protocol.py is correctly added to the transition graph, diagram, and docstring.
  • The specification-contradiction note in the commit_all docstring (line 45938 vs 19193) is responsible documentation.
  • Error metadata (rolled_back, rollback_failed keys) gives callers actionable information for recovery decisions.
  • The CHANGELOG entry is comprehensive and well-structured.
  • Catching Exception (not just SandboxError) in both commit_all and _rollback_committed is correct for robustness — unexpected errors cannot bypass rollback.
# Code Review Report — PR #1146 `fix(sandbox): make commit_all atomic per specification` **Reviewer:** Automated deep review (4 full cycles across all categories) **Scope:** All code changes in branch `fix/atomic-sandbox-commit` plus close surrounding context **Ref:** Issue #925, Specification `docs/specification.md` (lines 45938, 19193, 24632) --- ## Summary Overall this is a well-structured change that addresses a genuine specification gap. The atomic `commit_all` protocol, LIFO rollback ordering, COMMITTED→ROLLED_BACK state transition, and shared `_fs_utils` module are all sound design choices. The commit message and documentation are thorough. However, the review identified **13 findings across 5 categories**, including 2 high-severity bugs that should be addressed before merge. --- ## Findings by Severity ### HIGH — Should be addressed before merge #### H1. `NoSandbox.rollback()` breaks atomicity guarantee in mixed batches **Category:** Bug / Spec Compliance **Files:** `no_sandbox.py:190-200`, `manager.py:256-301` `NoSandbox.rollback()` unconditionally raises `SandboxRollbackError` regardless of status. In a batch containing a `NoSandbox` alongside a real sandbox (e.g., `CopyOnWriteSandbox`), if the real sandbox fails to commit, the `NoSandbox` cannot be rolled back. Its changes (applied immediately, per the `"none"` strategy) persist while others are undone — violating the atomicity guarantee of spec line 45938. The specification itself acknowledges `none` has "No automatic rollback" (line 45932), creating an inherent tension with the atomicity requirement. The code correctly reports `NoSandbox` in `rollback_failed` metadata, but the caller has no mechanism to recover. **Suggested mitigations (pick one):** - (a) Commit `NoSandbox` instances **last** in the batch so they're never in the "already committed" set when a failure occurs. - (b) Reject `NoSandbox` from atomic commit batches (fail fast with a clear error). - (c) At minimum, add a prominent log warning when a `NoSandbox` is included in an atomic commit. --- #### H2. Pre-commit backup deleted on partial commit failure — data loss risk **Category:** Bug **Files:** `copy_on_write.py:262-266`, `overlay.py:309-313` In `CopyOnWriteSandbox.commit()` and `OverlaySandbox.commit()`, when file copying to the original directory fails midway (partial modification), the `except OSError` handler **deletes the pre-commit backup**: ```python except OSError as exc: if self._pre_commit_backup is not None: shutil.rmtree(self._pre_commit_backup, ignore_errors=True) self._pre_commit_backup = None self._status = SandboxStatus.ERRORED ``` At this point the original directory is partially corrupted (some files were copied before the error), and the backup that could restore it is now gone. The `cleanup()` method already handles backup cleanup, so the explicit deletion here is both unnecessary and harmful. **Suggested fix:** Replace the backup deletion with a restore attempt: ```python except OSError as exc: if self._pre_commit_backup is not None: try: safe_restore(self._pre_commit_backup, self._original_path) self._pre_commit_backup = None except Exception: pass # Keep backup for manual recovery self._status = SandboxStatus.ERRORED ``` Or at minimum, remove the `rmtree` call and let `cleanup()` handle it. --- ### MEDIUM — Should be addressed (non-blocking) #### M1. Rollback from COMMITTED with missing backup falls through silently **Category:** Bug **Files:** `copy_on_write.py:316-335`, `overlay.py:368-396` When `self._status == SandboxStatus.COMMITTED` but `self._pre_commit_backup is None`, the `if/else` falls through to the regular ACTIVE rollback path, which re-copies from the **already-modified** original directory. This silently produces a sandbox that reflects the committed changes instead of the pre-commit state — a rollback that doesn't actually roll back. While this path should not occur in normal operation, it is a defensive gap. Consider raising `SandboxRollbackError` when COMMITTED and backup is missing: ```python if self._status == SandboxStatus.COMMITTED and self._pre_commit_backup is None: raise SandboxRollbackError( f"Cannot rollback COMMITTED sandbox {self._sandbox_id}: " f"pre-commit backup is missing" ) ``` --- #### M2. LIFO rollback order test uses fragile string comparison **Category:** Test Flaw **File:** `features/steps/sandbox_manager_coverage_steps.py:587-592` ```python for i in range(len(rollback_log) - 1): assert rollback_log[i] > rollback_log[i + 1], ... ``` This relies on resource-ID keys (`"plan-001:res-001"`, `"plan-001:res-002"`) sorting lexicographically in commit order. If future resource IDs don't follow this pattern (e.g., ULIDs, non-alphabetic names), the test breaks without any real code defect. Consider comparing against the known commit order explicitly: ```python expected_order = list(reversed(context._committable_mock_keys)) assert rollback_log == expected_order ``` --- #### M3. Temp directory leak in `_fs_utils` BDD tests **Category:** Test Flaw **File:** `features/steps/sandbox_fs_utils_steps.py:38, 55, 64, 72` Six scenarios create temp directories via `tempfile.mkdtemp()` stored in `context._fs_tmpdir` but no cleanup handler is registered. The `after_scenario` hook in `environment.py` only cleans `context.test_dir`, not `_fs_tmpdir`. These directories accumulate across test runs. **Suggested fix:** Register a cleanup handler in each Given step: ```python if not hasattr(context, "_cleanup_handlers"): context._cleanup_handlers = [] context._cleanup_handlers.append(lambda: shutil.rmtree(context._fs_tmpdir, ignore_errors=True)) ``` --- #### M4. Missing BDD scenarios for strategy-specific COMMITTED rollback **Category:** Test Coverage **Files:** Feature files The following rollback-from-COMMITTED paths are implemented but lack dedicated BDD scenarios: - `OverlaySandbox.rollback()` from COMMITTED (uses `safe_restore`) - `GitWorktreeSandbox.rollback()` from COMMITTED (uses `git reset --hard`) - `TransactionSandbox.rollback()` from COMMITTED (raises `SandboxRollbackError`) These are critical paths for the atomic commit protocol and should have explicit coverage. --- #### M5. Full directory backup on every commit even with empty diffs **Category:** Performance **Files:** `copy_on_write.py:243-245`, `overlay.py:290-292` `backup_directory` runs unconditionally after `_compute_diff`, even when the diff is empty (no changed, added, or deleted files). For large project resources with thousands of files, this creates a complete copy that is immediately discarded on the successful (no-change) commit path. Consider guarding: ```python has_changes = changed_files or added_files or deleted_files if has_changes: self._pre_commit_backup = tempfile.mkdtemp(prefix="ca-cow-backup-") backup_directory(self._original_path, self._pre_commit_backup) ``` --- ### LOW — Minor issues / nice-to-have improvements #### L1. `GitWorktreeSandbox._pre_merge_commit` not cleared on commit failure **Category:** Bug (latent) **File:** `git_worktree.py:417-442` If the merge fails, `_pre_merge_commit` retains a stale value (the commit hash before the attempted merge). Status transitions to ERRORED, which only allows CLEANED_UP, so the stale value is currently harmless. But if future code adds recovery from ERRORED, this could cause incorrect `git reset --hard` to a commit where no merge occurred. --- #### L2. `safe_restore` double-failure leaves target at stale path **Category:** Bug (edge case) **File:** `_fs_utils.py:92-103` If `shutil.copytree` fails and then `os.rename(stale, target_path)` also fails (e.g., filesystem corruption), the original directory remains at the `.atomic-rollback-old` sibling path and `target_path` doesn't exist. This is an extreme edge case but the function's docstring claims "the original is preserved" unconditionally. --- #### L3. Predictable stale-directory naming in `safe_restore` **Category:** Security (low risk) **File:** `_fs_utils.py:85` The `.atomic-rollback-old` suffix is deterministic. In scenarios where sandbox paths are attacker-influenced, a pre-created symlink at this path could cause `shutil.rmtree` to follow the symlink. Mitigated by sandbox directories residing in controlled temp paths. --- #### L4. Integration test uses mocks instead of real sandbox implementations **Category:** Test Fidelity **File:** `robot/helper_sandbox_integration.py:617-683` `_manager_atomic_commit` uses `unittest.mock.MagicMock` for both the OK and failing sandboxes. This tests the manager's orchestration logic but not the actual rollback behavior of real sandbox implementations (e.g., `safe_restore` on filesystem, `git reset --hard`). --- #### L5. Commit ordering not enforced per specification **Category:** Spec Compliance (pre-existing) **File:** `manager.py:256` Specification line 24632 requires commit to be **bottom-up** (children before parents) and rollback to be **top-down**. `commit_all` iterates sandboxes in Python dict insertion order, which is not necessarily bottom-up topological order. The LIFO rollback only matches the spec if commit order was correct. This is a pre-existing concern not introduced by this PR, but it affects the correctness of the new LIFO rollback. --- ## Positive Observations - The `_fs_utils` extraction is clean and well-motivated; avoiding `shutil.copytree`/`os.makedirs` to prevent test-patch interference is thoughtful. - The `COMMITTED → ROLLED_BACK` transition in `protocol.py` is correctly added to the transition graph, diagram, and docstring. - The specification-contradiction note in the `commit_all` docstring (line 45938 vs 19193) is responsible documentation. - Error metadata (`rolled_back`, `rollback_failed` keys) gives callers actionable information for recovery decisions. - The CHANGELOG entry is comprehensive and well-structured. - Catching `Exception` (not just `SandboxError`) in both `commit_all` and `_rollback_committed` is correct for robustness — unexpected errors cannot bypass rollback.
CoreRasurae force-pushed fix/atomic-sandbox-commit from 7fecd4b17c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m18s
CI / typecheck (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m0s
CI / quality (pull_request) Successful in 4m19s
CI / integration_tests (pull_request) Successful in 9m3s
CI / unit_tests (pull_request) Successful in 9m14s
CI / e2e_tests (pull_request) Successful in 10m10s
CI / docker (pull_request) Successful in 1m8s
CI / coverage (pull_request) Successful in 11m48s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to c0614da1fa
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 31s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 4m2s
CI / e2e_tests (pull_request) Failing after 14m21s
CI / integration_tests (pull_request) Failing after 14m22s
CI / unit_tests (pull_request) Failing after 14m22s
CI / quality (pull_request) Failing after 14m23s
CI / security (pull_request) Failing after 14m23s
CI / coverage (pull_request) Successful in 12m50s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Failing after 35m19s
2026-03-28 14:42:19 +00:00
Compare
CoreRasurae left a comment

Code Review Report -- PR #1146: fix(sandbox): make commit_all atomic per specification

Reviewer: Automated code review (requested by CoreRasurae)
Scope: Code changes in branch fix/atomic-sandbox-commit and close connections to surrounding code
Reference: Issue #925, specification lines 45938 (atomicity), 19193 (partial apply), 24632 (dependency ordering)
Review cycles: 4 full passes across all categories (bugs, security, performance, test coverage/flaws, spec compliance)


Summary

The PR successfully converts SandboxManager.commit_all() from partial-commit to all-or-nothing semantics with LIFO rollback, adds a shared _fs_utils module for backup/restore, and extends all 4 sandbox strategies with rollback-from-COMMITTED support. The specification contradiction between line 45938 (atomicity) and line 19193 (partial apply) is correctly documented.

However, multiple review cycles revealed 12 findings across 5 categories, including 1 high-severity bug that can cause data corruption under disk-pressure conditions.


Findings by Severity

HIGH -- Bugs

BUG-1: Backup failure during commit can corrupt the original directory

Files: copy_on_write.py:247-248, overlay.py:294-295
Severity: HIGH -- Can cause data loss of the original resource

In both CopyOnWriteSandbox.commit() and OverlaySandbox.commit(), self._pre_commit_backup is assigned before backup_directory() is called:

self._pre_commit_backup = tempfile.mkdtemp(prefix="ca-cow-backup-")  # assigned HERE
backup_directory(self._original_path, self._pre_commit_backup)        # may fail HERE

If backup_directory() fails partway (e.g., disk full, permission error), the except OSError handler at line 265/312 checks self._pre_commit_backup is not None (True, since it was set before the call) and invokes safe_restore(). Since the original directory has not been modified yet at this point (file copy operations haven't started), safe_restore() replaces the intact original with the partial backup data -- corrupting it.

Suggested fix: Assign self._pre_commit_backup only after backup_directory() succeeds:

if changed_files or added_files or deleted_files:
    backup_dir = tempfile.mkdtemp(prefix="ca-cow-backup-")
    try:
        backup_directory(self._original_path, backup_dir)
    except OSError:
        shutil.rmtree(backup_dir, ignore_errors=True)
        raise
    self._pre_commit_backup = backup_dir  # only set after successful backup

This ensures safe_restore() is only attempted when the original was actually modified.


MEDIUM -- Bugs

BUG-2: No-change commit followed by rollback falsely reports rollback failure

Files: copy_on_write.py:246,331-335, overlay.py:293,384-388

When commit() detects no changes, the pre-commit backup is skipped (_pre_commit_backup remains None) and status transitions to COMMITTED. If the atomic commit_all() later fails on a different sandbox and triggers _rollback_committed(), rollback() is called on this no-change sandbox. The rollback check:

if self._pre_commit_backup is None:
    raise SandboxRollbackError("Cannot rollback COMMITTED sandbox ... backup is missing")

raises SandboxRollbackError even though the original was never modified and no rollback is needed. This results in the sandbox being reported in failed_rollback_ids metadata, producing misleading error reports.

Suggested fix: When there are no changes to apply, skip the backup-required check during rollback (the original is unmodified):

if self._status == SandboxStatus.COMMITTED:
    if self._pre_commit_backup is None:
        # No changes were applied, nothing to undo
        pass  # or log at debug level
    else:
        safe_restore(self._pre_commit_backup, self._original_path)
        self._pre_commit_backup = None

BUG-3: commit_all lock released during commit loop is not fully thread-safe

File: manager.py:242-317

The _lock is acquired to snapshot the sandbox list (line 242-243) then released before the commit/rollback loop. A concurrent thread calling rollback_all(), cleanup_all(), or modifying sandbox state during the commit loop could interfere with the atomic commit protocol. While the loop operates on a snapshot, individual sandbox objects are shared references and another thread could mutate their state.

Impact: Low probability in practice since concurrent access to the same plan's sandboxes during commit is unusual, but the contract should be documented.


MEDIUM -- Test Coverage

TEST-1: Mock sandboxes have static status that doesn't reflect real state transitions

File: sandbox_manager_coverage_steps.py:35

PropertyMock(return_value=status) creates mocks with an immutable status. When commit_all() calls sandbox.commit(), the mock returns a CommitResult but the mock's .status stays ACTIVE (never transitions to COMMITTED). When _rollback_committed() then calls sandbox.rollback(), it's rolling back a mock that appears to be in ACTIVE state, not COMMITTED state. This means the tests don't exercise the real status-dependent branching in rollback() implementations (which behave differently for ACTIVE vs COMMITTED).

TEST-2: No integration tests exercise real filesystem rollback from COMMITTED during atomic commit_all

Files: BDD feature files and Robot tests

All atomic commit rollback scenarios use mocks for the sandbox instances. No test exercises the actual filesystem-level safe_restore() path during a commit_all failure with real CopyOnWriteSandbox or OverlaySandbox instances. This means BUG-1 above is not caught by existing tests. A test that creates real filesystem sandboxes, commits one successfully, fails the second, and verifies the first's original directory is correctly restored would close this gap.


LOW -- Bugs

BUG-4: backup_directory doesn't preserve directory timestamps

File: _fs_utils.py:51-54

File timestamps are preserved via os.utime() (line 126), but directory access/modification timestamps are not. Only directory permissions are preserved via os.chmod() (line 54). For tools or validation logic that relies on directory mtime (e.g., make-like incremental builds), restored directories would have current timestamps instead of originals.


LOW -- Performance

PERF-1: Full directory backup for any change

Files: copy_on_write.py:247-248, overlay.py:294-295

backup_directory() copies the entire original directory even if only one file changed. For large directories (the project's scaling tests target 10K-100K files), this adds significant I/O overhead to every commit. The no-change optimization (skip backup when diff is empty) helps, but a single-file change still triggers a full copy.

Note: This is a known tradeoff for simplicity/correctness. A potential future optimization would be to back up only the files that will be modified (based on _compute_diff results), though this adds complexity to the restore logic.


LOW -- Test Coverage

TEST-3: No test verifies the NoSandbox warning log in commit_all

File: manager.py:259-266

The code logs a warning when NoSandbox instances are present in a batch (since they weaken atomicity). No BDD or Robot test verifies this warning is emitted. Adding a log-capture assertion would confirm the warning is triggered and contains the expected message.

TEST-4: No test for TransactionSandbox.rollback() from COMMITTED raising SandboxRollbackError

File: transaction_sandbox.py:281-289

The new behavior where TransactionSandbox.rollback() from COMMITTED raises SandboxRollbackError (because DB COMMIT is irreversible) is not directly tested. This is a critical path for the atomic commit protocol -- if a TransactionSandbox commits successfully and a later sandbox fails, the rollback of the TransactionSandbox should fail and be reported in failed_rollback_ids.


LOW -- Spec Compliance

SPEC-1: LIFO rollback order doesn't match spec's "top-down" requirement

File: manager.py:346

The commit message cites "specification dependency ordering (line 24632: rollback is top-down)" as justification for LIFO rollback order. However, the spec's "top-down" refers to resource DAG traversal (parent before child), not reverse-of-commit-order. LIFO (reverse insertion order of dict.values()) may produce a different ordering than top-down DAG traversal.

In practice, since sandboxes in commit_all operate on independent resource boundaries, the ordering is unlikely to matter. But the docstring/commit message should clarify that LIFO is used for transaction-log-style undo rather than citing the DAG-based spec requirement.


LOW -- Security

File: _fs_utils.py:48-49,58-60

backup_directory() preserves symlinks without validating that their targets are within the source directory tree. A symlink pointing to /etc/passwd would be faithfully backed up and restored. Practical risk is very low since sandboxes are not user-facing and symlinks in the original directory are pre-existing.

SEC-2: Predictable stale directory naming in safe_restore

File: _fs_utils.py:85

The stale directory uses a fixed suffix .atomic-rollback-old, making the name predictable. In a shared filesystem environment, another process could pre-create this path to interfere with the restore. Practical risk is negligible since the sandbox temp directories have unique names.


Overall Assessment

The implementation correctly addresses the core requirement of atomic commit_all() and is well-structured with clear documentation of the spec contradiction. BUG-1 should be fixed before merge as it can cause data corruption under disk-pressure conditions. BUG-2 produces misleading error reports but doesn't cause data loss. The test coverage gaps (TEST-1, TEST-2) should ideally be addressed to verify the real filesystem paths that BUG-1 affects.

# Code Review Report -- PR #1146: fix(sandbox): make commit_all atomic per specification **Reviewer:** Automated code review (requested by CoreRasurae) **Scope:** Code changes in branch `fix/atomic-sandbox-commit` and close connections to surrounding code **Reference:** Issue #925, specification lines 45938 (atomicity), 19193 (partial apply), 24632 (dependency ordering) **Review cycles:** 4 full passes across all categories (bugs, security, performance, test coverage/flaws, spec compliance) --- ## Summary The PR successfully converts `SandboxManager.commit_all()` from partial-commit to all-or-nothing semantics with LIFO rollback, adds a shared `_fs_utils` module for backup/restore, and extends all 4 sandbox strategies with rollback-from-COMMITTED support. The specification contradiction between line 45938 (atomicity) and line 19193 (partial apply) is correctly documented. However, multiple review cycles revealed **12 findings** across 5 categories, including **1 high-severity bug** that can cause data corruption under disk-pressure conditions. --- ## Findings by Severity ### HIGH -- Bugs #### BUG-1: Backup failure during commit can corrupt the original directory **Files:** `copy_on_write.py:247-248`, `overlay.py:294-295` **Severity:** HIGH -- Can cause data loss of the original resource In both `CopyOnWriteSandbox.commit()` and `OverlaySandbox.commit()`, `self._pre_commit_backup` is assigned **before** `backup_directory()` is called: ```python self._pre_commit_backup = tempfile.mkdtemp(prefix="ca-cow-backup-") # assigned HERE backup_directory(self._original_path, self._pre_commit_backup) # may fail HERE ``` If `backup_directory()` fails partway (e.g., disk full, permission error), the `except OSError` handler at line 265/312 checks `self._pre_commit_backup is not None` (True, since it was set before the call) and invokes `safe_restore()`. Since the **original directory has not been modified yet** at this point (file copy operations haven't started), `safe_restore()` replaces the intact original with the **partial backup data** -- corrupting it. **Suggested fix:** Assign `self._pre_commit_backup` only after `backup_directory()` succeeds: ```python if changed_files or added_files or deleted_files: backup_dir = tempfile.mkdtemp(prefix="ca-cow-backup-") try: backup_directory(self._original_path, backup_dir) except OSError: shutil.rmtree(backup_dir, ignore_errors=True) raise self._pre_commit_backup = backup_dir # only set after successful backup ``` This ensures `safe_restore()` is only attempted when the original was actually modified. --- ### MEDIUM -- Bugs #### BUG-2: No-change commit followed by rollback falsely reports rollback failure **Files:** `copy_on_write.py:246,331-335`, `overlay.py:293,384-388` When `commit()` detects no changes, the pre-commit backup is skipped (`_pre_commit_backup` remains `None`) and status transitions to `COMMITTED`. If the atomic `commit_all()` later fails on a different sandbox and triggers `_rollback_committed()`, `rollback()` is called on this no-change sandbox. The rollback check: ```python if self._pre_commit_backup is None: raise SandboxRollbackError("Cannot rollback COMMITTED sandbox ... backup is missing") ``` raises `SandboxRollbackError` even though the original was never modified and no rollback is needed. This results in the sandbox being reported in `failed_rollback_ids` metadata, producing misleading error reports. **Suggested fix:** When there are no changes to apply, skip the backup-required check during rollback (the original is unmodified): ```python if self._status == SandboxStatus.COMMITTED: if self._pre_commit_backup is None: # No changes were applied, nothing to undo pass # or log at debug level else: safe_restore(self._pre_commit_backup, self._original_path) self._pre_commit_backup = None ``` #### BUG-3: `commit_all` lock released during commit loop is not fully thread-safe **File:** `manager.py:242-317` The `_lock` is acquired to snapshot the sandbox list (line 242-243) then released before the commit/rollback loop. A concurrent thread calling `rollback_all()`, `cleanup_all()`, or modifying sandbox state during the commit loop could interfere with the atomic commit protocol. While the loop operates on a snapshot, individual sandbox objects are shared references and another thread could mutate their state. **Impact:** Low probability in practice since concurrent access to the same plan's sandboxes during commit is unusual, but the contract should be documented. --- ### MEDIUM -- Test Coverage #### TEST-1: Mock sandboxes have static status that doesn't reflect real state transitions **File:** `sandbox_manager_coverage_steps.py:35` `PropertyMock(return_value=status)` creates mocks with an immutable status. When `commit_all()` calls `sandbox.commit()`, the mock returns a `CommitResult` but the mock's `.status` stays `ACTIVE` (never transitions to `COMMITTED`). When `_rollback_committed()` then calls `sandbox.rollback()`, it's rolling back a mock that appears to be in `ACTIVE` state, not `COMMITTED` state. This means the tests don't exercise the real status-dependent branching in `rollback()` implementations (which behave differently for ACTIVE vs COMMITTED). #### TEST-2: No integration tests exercise real filesystem rollback from COMMITTED during atomic commit_all **Files:** BDD feature files and Robot tests All atomic commit rollback scenarios use mocks for the sandbox instances. No test exercises the actual filesystem-level `safe_restore()` path during a `commit_all` failure with real `CopyOnWriteSandbox` or `OverlaySandbox` instances. This means BUG-1 above is not caught by existing tests. A test that creates real filesystem sandboxes, commits one successfully, fails the second, and verifies the first's original directory is correctly restored would close this gap. --- ### LOW -- Bugs #### BUG-4: `backup_directory` doesn't preserve directory timestamps **File:** `_fs_utils.py:51-54` File timestamps are preserved via `os.utime()` (line 126), but directory access/modification timestamps are not. Only directory permissions are preserved via `os.chmod()` (line 54). For tools or validation logic that relies on directory mtime (e.g., make-like incremental builds), restored directories would have current timestamps instead of originals. --- ### LOW -- Performance #### PERF-1: Full directory backup for any change **Files:** `copy_on_write.py:247-248`, `overlay.py:294-295` `backup_directory()` copies the **entire** original directory even if only one file changed. For large directories (the project's scaling tests target 10K-100K files), this adds significant I/O overhead to every commit. The no-change optimization (skip backup when diff is empty) helps, but a single-file change still triggers a full copy. **Note:** This is a known tradeoff for simplicity/correctness. A potential future optimization would be to back up only the files that will be modified (based on `_compute_diff` results), though this adds complexity to the restore logic. --- ### LOW -- Test Coverage #### TEST-3: No test verifies the NoSandbox warning log in commit_all **File:** `manager.py:259-266` The code logs a warning when `NoSandbox` instances are present in a batch (since they weaken atomicity). No BDD or Robot test verifies this warning is emitted. Adding a log-capture assertion would confirm the warning is triggered and contains the expected message. #### TEST-4: No test for `TransactionSandbox.rollback()` from COMMITTED raising `SandboxRollbackError` **File:** `transaction_sandbox.py:281-289` The new behavior where `TransactionSandbox.rollback()` from COMMITTED raises `SandboxRollbackError` (because DB COMMIT is irreversible) is not directly tested. This is a critical path for the atomic commit protocol -- if a `TransactionSandbox` commits successfully and a later sandbox fails, the rollback of the `TransactionSandbox` should fail and be reported in `failed_rollback_ids`. --- ### LOW -- Spec Compliance #### SPEC-1: LIFO rollback order doesn't match spec's "top-down" requirement **File:** `manager.py:346` The commit message cites "specification dependency ordering (line 24632: rollback is top-down)" as justification for LIFO rollback order. However, the spec's "top-down" refers to **resource DAG traversal** (parent before child), not reverse-of-commit-order. LIFO (reverse insertion order of `dict.values()`) may produce a different ordering than top-down DAG traversal. In practice, since sandboxes in `commit_all` operate on independent resource boundaries, the ordering is unlikely to matter. But the docstring/commit message should clarify that LIFO is used for **transaction-log-style undo** rather than citing the DAG-based spec requirement. --- ### LOW -- Security #### SEC-1: Symlink targets not validated during backup **File:** `_fs_utils.py:48-49,58-60` `backup_directory()` preserves symlinks without validating that their targets are within the source directory tree. A symlink pointing to `/etc/passwd` would be faithfully backed up and restored. Practical risk is very low since sandboxes are not user-facing and symlinks in the original directory are pre-existing. #### SEC-2: Predictable stale directory naming in `safe_restore` **File:** `_fs_utils.py:85` The stale directory uses a fixed suffix `.atomic-rollback-old`, making the name predictable. In a shared filesystem environment, another process could pre-create this path to interfere with the restore. Practical risk is negligible since the sandbox temp directories have unique names. --- ## Overall Assessment The implementation correctly addresses the core requirement of atomic `commit_all()` and is well-structured with clear documentation of the spec contradiction. BUG-1 should be fixed before merge as it can cause data corruption under disk-pressure conditions. BUG-2 produces misleading error reports but doesn't cause data loss. The test coverage gaps (TEST-1, TEST-2) should ideally be addressed to verify the real filesystem paths that BUG-1 affects.
CoreRasurae force-pushed fix/atomic-sandbox-commit from c0614da1fa
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 31s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 4m2s
CI / e2e_tests (pull_request) Failing after 14m21s
CI / integration_tests (pull_request) Failing after 14m22s
CI / unit_tests (pull_request) Failing after 14m22s
CI / quality (pull_request) Failing after 14m23s
CI / security (pull_request) Failing after 14m23s
CI / coverage (pull_request) Successful in 12m50s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Failing after 35m19s
to d58311d39a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m25s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 3m42s
CI / security (pull_request) Successful in 4m13s
CI / build (pull_request) Successful in 19s
CI / integration_tests (pull_request) Successful in 5m48s
CI / e2e_tests (pull_request) Successful in 5m22s
CI / unit_tests (pull_request) Successful in 9m3s
CI / docker (pull_request) Successful in 59s
CI / coverage (pull_request) Successful in 12m43s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 50m50s
2026-03-28 19:54:34 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 fix/atomic-sandbox-commit

Reviewer: Automated deep review (3 full-cycle passes)
Scope: All code changes on branch fix/atomic-sandbox-commit plus immediate surrounding code
Commit: d58311d3 by Luis Mendes
Issue: #925 — ensure atomic apply — commit_all must be all-or-nothing


Overall Assessment

The implementation correctly converts SandboxManager.commit_all() from partial-commit to all-or-nothing semantics. The design choices (rollback-on-failure over pre-validation, LIFO rollback order, strategy-specific rollback mechanisms) are sound. The code is well-documented with clear explanations of trade-offs and spec contradictions. The _fs_utils extraction is a good refactoring that eliminates duplication.

However, the review identified several issues across multiple categories that should be addressed before merge.


Findings by Severity

CRITICAL — Bugs

B1. OverlaySandbox: stale merged directory after rollback from COMMITTED

File: overlay.py:394-409 | Severity: Critical

When rollback() is called from COMMITTED state, safe_restore() correctly restores the original directory from the pre-commit backup, but the merged directory (_merged_dir) is not reset. Since OverlaySandbox.get_path() (line 238-242) allows the ROLLED_BACK status (unlike CopyOnWriteSandbox), a subsequent get_path() call would transition to ACTIVE and return paths into the stale merged directory — exposing pre-rollback data to the caller.

Impact: Data integrity violation — the sandbox appears clean but contains stale modifications.

Suggested fix: After restoring the original from backup in the COMMITTED rollback branch, also reset the merged directory (re-copy from the now-restored original, or recreate the overlay mount).


HIGH — Bugs

B2. Non-SandboxError re-raise loses rollback metadata

File: manager.py:298-302 | Severity: High

When commit() raises a non-SandboxError exception (e.g., RuntimeError), the atomic rollback is performed but the original exception is re-raised directly. The caller receives the bare exception with no indication of which sandboxes were rolled back or which rollback operations failed. The detailed error_msg is logged but not attached to the exception.

Impact: The caller cannot determine the system state after the exception — whether rollback succeeded, partially succeeded, or failed.

Suggested fix: Wrap the original exception in a SandboxError (or a new AtomicCommitError subclass) that includes the rollback metadata, or attach the metadata to the exception via __notes__ (Python 3.11+).

B3. Backup created in system temp dir — cross-filesystem safe_restore performance

File: copy_on_write.py:253, overlay.py:300 | Severity: High (Performance + Correctness risk)

tempfile.mkdtemp() creates the backup directory in the system temp directory (typically /tmp). If the original directory is on a different filesystem (e.g., /home, /data), then:

  1. backup_directory() must copy across filesystems (slow for large dirs)
  2. safe_restore() must use shutil.copytree for the restore (slow again)

If the backup were created on the same filesystem as the original (e.g., using the original's parent as dir= argument to mkdtemp), the backup could potentially use hard links and the restore rename would be nearly instantaneous.

Impact: For large project directories, the backup+restore cycle could take minutes instead of seconds.


HIGH — Spec Compliance / Design

S1. NoSandbox and TransactionSandbox fundamentally break atomicity

File: manager.py:254-266, no_sandbox.py:190-200, transaction_sandbox.py:281-289 | Severity: High (Design)

The commit correctly logs a warning when NoSandbox instances are in the batch, but the operation proceeds anyway. Both NoSandbox.rollback() and TransactionSandbox.rollback() from COMMITTED raise unconditionally, meaning:

  • Any NoSandbox in the batch makes atomicity impossible (changes applied in-place, cannot be undone)
  • Any TransactionSandbox that committed before a failure cannot be rolled back (database COMMIT is permanent)

The warning message says atomicity is "weakened" — a more accurate statement would be that atomicity is broken.

Suggested action: Consider either:

  1. Refusing to proceed with commit_all when non-rollbackable sandboxes are present (strict mode), or
  2. Requiring non-rollbackable sandboxes to commit last in the batch (so their commit only happens after all rollbackable sandboxes succeed), or
  3. Upgrading the log message from "weakened" to "broken" and documenting this limitation more prominently in the docstring.

MEDIUM — Bugs

B4. _fs_utils.safe_restore only catches OSError — non-OS exceptions leave directory renamed

File: _fs_utils.py:103-106 | Severity: Medium

If shutil.copytree (line 97-102) raises a non-OSError exception (unlikely but possible — e.g., MemoryError from a deeply nested tree), the except OSError block won't catch it. The original directory has already been renamed to stale (line 95), and the rename-back at line 105 won't execute. This leaves target_path non-existent and the original data only accessible via the .atomic-rollback-old suffix.

Suggested fix: Broaden the except clause to except BaseException (or at minimum except Exception) for the restore-rename, while still re-raising:

except BaseException:
    os.rename(stale, target_path)
    raise

MEDIUM — Performance

P1. Full directory backup for any change

File: copy_on_write.py:252-259, overlay.py:299-306 | Severity: Medium

Even a single-file change triggers a complete directory backup via backup_directory(). For large project directories (the spec references Firefox-scale codebases), this means copying potentially gigabytes of data for a one-line edit. The no-change optimization is good, but the common case (some changes) remains expensive.

Considered: A per-file backup (only backing up files that will be overwritten) would be more efficient but significantly more complex. This is a known trade-off and may be acceptable for v1.

P2. Triple directory walk during commit

File: copy_on_write.py:239-255, overlay.py:286-306 | Severity: Medium

The commit path walks the directory tree three times:

  1. _compute_diff() walks both sandbox and original
  2. backup_directory() walks the original again

For very large directories, consolidating into fewer walks could improve performance.


MEDIUM — Test Coverage

T1. No integration test for real filesystem rollback from COMMITTED

File: features/sandbox_manager_coverage.feature | Severity: Medium

All atomic commit BDD scenarios use mock sandboxes. There is no test that exercises the actual safe_restore() → filesystem restoration path through CopyOnWriteSandbox.rollback() from COMMITTED. The _fs_utils tests cover safe_restore in isolation, but the full commit() → partial-failure → rollback-from-COMMITTED path with real files is untested.

T2. No test for commit_all with TransactionSandbox in batch

Severity: Medium

There is no scenario verifying that a TransactionSandbox that committed before a failure correctly ends up in failed_rollback_ids (since its rollback from COMMITTED always raises SandboxRollbackError).

T3. No test for OverlaySandbox stale merged dir after COMMITTED rollback

Severity: Medium (directly related to bug B1)

Missing test that would catch the stale merged directory issue described in B1.


LOW — Bugs / Robustness

B5. backup_directory does not handle special files (FIFOs, sockets, devices)

File: _fs_utils.py:58-65 | Severity: Low

backup_directory handles regular files, symlinks, and directories, but _copy_file would block or fail on named pipes, Unix domain sockets, or device files. These are unlikely in typical project directories but could appear in certain resource types.

B6. safe_restore stale cleanup has TOCTOU window

File: _fs_utils.py:92-93 | Severity: Low

os.path.exists(stale) followed by shutil.rmtree(stale, ignore_errors=True) has a small TOCTOU (time-of-check-time-of-use) window. The ignore_errors=True mitigates the practical impact — if the stale directory changes between check and removal, the rmtree failure is silenced and the subsequent os.rename will fail with a clear error.


LOW — Test Quality

T4. Rollback order test implicitly depends on Python dict insertion order

File: features/steps/sandbox_manager_coverage_steps.py | Severity: Low

The LIFO rollback order test relies on _committable_mock_keys being populated in the order the step definitions are called, and on _active_sandboxes[plan_id].values() iterating in insertion order. While Python 3.7+ guarantees dict insertion order, the test is implicitly testing a Python language guarantee rather than explicit ordering logic in the manager.

T5. Robot integration test for atomic commit uses mocks

File: robot/helper_sandbox_integration.py:333-407 | Severity: Low

The "Manager Atomic Commit Rolls Back On Failure" Robot test replaces sandboxes with MagicMock objects, making it effectively a unit test despite being in the integration test suite. A true integration test would use real sandbox implementations with real filesystem operations.

T6. Cross-module step definition dependency

File: features/steps/sandbox_manager_coverage_steps.pylock_service_coverage_steps.py | Severity: Low

The "a RuntimeError should have been raised" step used in the atomic commit scenarios is defined in lock_service_coverage_steps.py, creating a hidden cross-module dependency. If that file is refactored, the sandbox tests break. Consider defining this step in a shared steps module or locally.


INFORMATIONAL — Documentation / Style

D1. Inline import in hot path

File: manager.py:257 | Severity: Informational

from cleveragents.infrastructure.sandbox.no_sandbox import NoSandbox is inside commit_all(). Python caches imports after first load, so the performance impact is negligible after the first call. However, top-level imports are conventional and easier to trace. Since this avoids a circular import, it's acceptable but worth a comment explaining why.

D2. Spec contradiction documented clearly

File: manager.py:210-215 | Severity: Informational (Positive)

The spec contradiction between line 45938 (atomicity) and line 19193 (partial apply) is clearly documented in the docstring with specific line references. This is good practice.

D3. LIFO vs DAG-based rollback distinction documented

File: manager.py:332-342 | Severity: Informational (Positive)

The distinction between LIFO batch undo and spec-defined DAG-based rollback is clearly explained in the _rollback_committed docstring.


Summary Table

ID Category Severity File(s) Summary
B1 Bug Critical overlay.py Stale merged dir after COMMITTED rollback
B2 Bug High manager.py Non-SandboxError re-raise loses rollback metadata
B3 Perf/Bug High copy_on_write.py, overlay.py Backup in /tmp causes cross-FS copy penalty
S1 Spec/Design High manager.py, no_sandbox.py, transaction_sandbox.py Non-rollbackable sandboxes break atomicity
B4 Bug Medium _fs_utils.py safe_restore except clause too narrow
P1 Performance Medium copy_on_write.py, overlay.py Full dir backup for any change
P2 Performance Medium copy_on_write.py, overlay.py Triple directory walk during commit
T1 Test Coverage Medium sandbox_manager_coverage.feature No integration test for real FS rollback from COMMITTED
T2 Test Coverage Medium No test for TransactionSandbox in atomic commit
T3 Test Coverage Medium No test for OverlaySandbox stale merged dir (B1)
B5 Bug Low _fs_utils.py No handling for special files
B6 Bug Low _fs_utils.py TOCTOU in stale cleanup
T4 Test Quality Low sandbox_manager_coverage_steps.py Implicit dict order dependency
T5 Test Quality Low helper_sandbox_integration.py Integration test uses mocks
T6 Test Quality Low sandbox_manager_coverage_steps.py Cross-module step dependency
D1 Style Info manager.py Inline import

Critical: 1 | High: 3 | Medium: 6 | Low: 5 | Info: 1

# Code Review Report — PR #1146 `fix/atomic-sandbox-commit` **Reviewer:** Automated deep review (3 full-cycle passes) **Scope:** All code changes on branch `fix/atomic-sandbox-commit` plus immediate surrounding code **Commit:** `d58311d3` by Luis Mendes **Issue:** #925 — ensure atomic apply — `commit_all` must be all-or-nothing --- ## Overall Assessment The implementation correctly converts `SandboxManager.commit_all()` from partial-commit to all-or-nothing semantics. The design choices (rollback-on-failure over pre-validation, LIFO rollback order, strategy-specific rollback mechanisms) are sound. The code is well-documented with clear explanations of trade-offs and spec contradictions. The `_fs_utils` extraction is a good refactoring that eliminates duplication. However, the review identified several issues across multiple categories that should be addressed before merge. --- ## Findings by Severity ### CRITICAL — Bugs #### B1. `OverlaySandbox`: stale merged directory after rollback from COMMITTED **File:** `overlay.py:394-409` | **Severity:** Critical When `rollback()` is called from `COMMITTED` state, `safe_restore()` correctly restores the **original** directory from the pre-commit backup, but the **merged directory** (`_merged_dir`) is **not reset**. Since `OverlaySandbox.get_path()` (line 238-242) allows the `ROLLED_BACK` status (unlike `CopyOnWriteSandbox`), a subsequent `get_path()` call would transition to `ACTIVE` and return paths into the **stale merged directory** — exposing pre-rollback data to the caller. **Impact:** Data integrity violation — the sandbox appears clean but contains stale modifications. **Suggested fix:** After restoring the original from backup in the COMMITTED rollback branch, also reset the merged directory (re-copy from the now-restored original, or recreate the overlay mount). --- ### HIGH — Bugs #### B2. Non-`SandboxError` re-raise loses rollback metadata **File:** `manager.py:298-302` | **Severity:** High When `commit()` raises a non-`SandboxError` exception (e.g., `RuntimeError`), the atomic rollback is performed but the original exception is re-raised directly. The caller receives the bare exception with **no indication** of which sandboxes were rolled back or which rollback operations failed. The detailed `error_msg` is logged but not attached to the exception. **Impact:** The caller cannot determine the system state after the exception — whether rollback succeeded, partially succeeded, or failed. **Suggested fix:** Wrap the original exception in a `SandboxError` (or a new `AtomicCommitError` subclass) that includes the rollback metadata, or attach the metadata to the exception via `__notes__` (Python 3.11+). #### B3. Backup created in system temp dir — cross-filesystem `safe_restore` performance **File:** `copy_on_write.py:253`, `overlay.py:300` | **Severity:** High (Performance + Correctness risk) `tempfile.mkdtemp()` creates the backup directory in the system temp directory (typically `/tmp`). If the original directory is on a different filesystem (e.g., `/home`, `/data`), then: 1. `backup_directory()` must **copy** across filesystems (slow for large dirs) 2. `safe_restore()` must use `shutil.copytree` for the restore (slow again) If the backup were created on the **same filesystem** as the original (e.g., using the original's parent as `dir=` argument to `mkdtemp`), the backup could potentially use hard links and the restore rename would be nearly instantaneous. **Impact:** For large project directories, the backup+restore cycle could take minutes instead of seconds. --- ### HIGH — Spec Compliance / Design #### S1. `NoSandbox` and `TransactionSandbox` fundamentally break atomicity **File:** `manager.py:254-266`, `no_sandbox.py:190-200`, `transaction_sandbox.py:281-289` | **Severity:** High (Design) The commit correctly logs a **warning** when `NoSandbox` instances are in the batch, but the operation proceeds anyway. Both `NoSandbox.rollback()` and `TransactionSandbox.rollback()` from `COMMITTED` raise unconditionally, meaning: - Any `NoSandbox` in the batch makes atomicity **impossible** (changes applied in-place, cannot be undone) - Any `TransactionSandbox` that committed before a failure **cannot be rolled back** (database COMMIT is permanent) The warning message says atomicity is "weakened" — a more accurate statement would be that atomicity is **broken**. **Suggested action:** Consider either: 1. Refusing to proceed with `commit_all` when non-rollbackable sandboxes are present (strict mode), or 2. Requiring non-rollbackable sandboxes to commit **last** in the batch (so their commit only happens after all rollbackable sandboxes succeed), or 3. Upgrading the log message from "weakened" to "broken" and documenting this limitation more prominently in the docstring. --- ### MEDIUM — Bugs #### B4. `_fs_utils.safe_restore` only catches `OSError` — non-OS exceptions leave directory renamed **File:** `_fs_utils.py:103-106` | **Severity:** Medium If `shutil.copytree` (line 97-102) raises a non-`OSError` exception (unlikely but possible — e.g., `MemoryError` from a deeply nested tree), the `except OSError` block won't catch it. The original directory has already been renamed to `stale` (line 95), and the rename-back at line 105 won't execute. This leaves `target_path` non-existent and the original data only accessible via the `.atomic-rollback-old` suffix. **Suggested fix:** Broaden the except clause to `except BaseException` (or at minimum `except Exception`) for the restore-rename, while still re-raising: ```python except BaseException: os.rename(stale, target_path) raise ``` --- ### MEDIUM — Performance #### P1. Full directory backup for any change **File:** `copy_on_write.py:252-259`, `overlay.py:299-306` | **Severity:** Medium Even a single-file change triggers a complete directory backup via `backup_directory()`. For large project directories (the spec references Firefox-scale codebases), this means copying potentially gigabytes of data for a one-line edit. The no-change optimization is good, but the common case (some changes) remains expensive. **Considered:** A per-file backup (only backing up files that will be overwritten) would be more efficient but significantly more complex. This is a known trade-off and may be acceptable for v1. #### P2. Triple directory walk during commit **File:** `copy_on_write.py:239-255`, `overlay.py:286-306` | **Severity:** Medium The commit path walks the directory tree **three times**: 1. `_compute_diff()` walks both sandbox and original 2. `backup_directory()` walks the original again For very large directories, consolidating into fewer walks could improve performance. --- ### MEDIUM — Test Coverage #### T1. No integration test for real filesystem rollback from COMMITTED **File:** `features/sandbox_manager_coverage.feature` | **Severity:** Medium All atomic commit BDD scenarios use mock sandboxes. There is no test that exercises the actual `safe_restore()` → filesystem restoration path through `CopyOnWriteSandbox.rollback()` from `COMMITTED`. The `_fs_utils` tests cover `safe_restore` in isolation, but the full `commit() → partial-failure → rollback-from-COMMITTED` path with real files is untested. #### T2. No test for `commit_all` with `TransactionSandbox` in batch **Severity:** Medium There is no scenario verifying that a `TransactionSandbox` that committed before a failure correctly ends up in `failed_rollback_ids` (since its rollback from COMMITTED always raises `SandboxRollbackError`). #### T3. No test for `OverlaySandbox` stale merged dir after COMMITTED rollback **Severity:** Medium (directly related to bug B1) Missing test that would catch the stale merged directory issue described in B1. --- ### LOW — Bugs / Robustness #### B5. `backup_directory` does not handle special files (FIFOs, sockets, devices) **File:** `_fs_utils.py:58-65` | **Severity:** Low `backup_directory` handles regular files, symlinks, and directories, but `_copy_file` would block or fail on named pipes, Unix domain sockets, or device files. These are unlikely in typical project directories but could appear in certain resource types. #### B6. `safe_restore` stale cleanup has TOCTOU window **File:** `_fs_utils.py:92-93` | **Severity:** Low `os.path.exists(stale)` followed by `shutil.rmtree(stale, ignore_errors=True)` has a small TOCTOU (time-of-check-time-of-use) window. The `ignore_errors=True` mitigates the practical impact — if the stale directory changes between check and removal, the rmtree failure is silenced and the subsequent `os.rename` will fail with a clear error. --- ### LOW — Test Quality #### T4. Rollback order test implicitly depends on Python dict insertion order **File:** `features/steps/sandbox_manager_coverage_steps.py` | **Severity:** Low The LIFO rollback order test relies on `_committable_mock_keys` being populated in the order the step definitions are called, and on `_active_sandboxes[plan_id].values()` iterating in insertion order. While Python 3.7+ guarantees dict insertion order, the test is implicitly testing a Python language guarantee rather than explicit ordering logic in the manager. #### T5. Robot integration test for atomic commit uses mocks **File:** `robot/helper_sandbox_integration.py:333-407` | **Severity:** Low The "Manager Atomic Commit Rolls Back On Failure" Robot test replaces sandboxes with MagicMock objects, making it effectively a unit test despite being in the integration test suite. A true integration test would use real sandbox implementations with real filesystem operations. #### T6. Cross-module step definition dependency **File:** `features/steps/sandbox_manager_coverage_steps.py` → `lock_service_coverage_steps.py` | **Severity:** Low The "a RuntimeError should have been raised" step used in the atomic commit scenarios is defined in `lock_service_coverage_steps.py`, creating a hidden cross-module dependency. If that file is refactored, the sandbox tests break. Consider defining this step in a shared steps module or locally. --- ### INFORMATIONAL — Documentation / Style #### D1. Inline import in hot path **File:** `manager.py:257` | **Severity:** Informational `from cleveragents.infrastructure.sandbox.no_sandbox import NoSandbox` is inside `commit_all()`. Python caches imports after first load, so the performance impact is negligible after the first call. However, top-level imports are conventional and easier to trace. Since this avoids a circular import, it's acceptable but worth a comment explaining why. #### D2. Spec contradiction documented clearly **File:** `manager.py:210-215` | **Severity:** Informational (Positive) The spec contradiction between line 45938 (atomicity) and line 19193 (partial apply) is clearly documented in the docstring with specific line references. This is good practice. #### D3. LIFO vs DAG-based rollback distinction documented **File:** `manager.py:332-342` | **Severity:** Informational (Positive) The distinction between LIFO batch undo and spec-defined DAG-based rollback is clearly explained in the `_rollback_committed` docstring. --- ## Summary Table | ID | Category | Severity | File(s) | Summary | |----|----------|----------|---------|---------| | B1 | Bug | **Critical** | `overlay.py` | Stale merged dir after COMMITTED rollback | | B2 | Bug | High | `manager.py` | Non-SandboxError re-raise loses rollback metadata | | B3 | Perf/Bug | High | `copy_on_write.py`, `overlay.py` | Backup in /tmp causes cross-FS copy penalty | | S1 | Spec/Design | High | `manager.py`, `no_sandbox.py`, `transaction_sandbox.py` | Non-rollbackable sandboxes break atomicity | | B4 | Bug | Medium | `_fs_utils.py` | `safe_restore` except clause too narrow | | P1 | Performance | Medium | `copy_on_write.py`, `overlay.py` | Full dir backup for any change | | P2 | Performance | Medium | `copy_on_write.py`, `overlay.py` | Triple directory walk during commit | | T1 | Test Coverage | Medium | `sandbox_manager_coverage.feature` | No integration test for real FS rollback from COMMITTED | | T2 | Test Coverage | Medium | — | No test for TransactionSandbox in atomic commit | | T3 | Test Coverage | Medium | — | No test for OverlaySandbox stale merged dir (B1) | | B5 | Bug | Low | `_fs_utils.py` | No handling for special files | | B6 | Bug | Low | `_fs_utils.py` | TOCTOU in stale cleanup | | T4 | Test Quality | Low | `sandbox_manager_coverage_steps.py` | Implicit dict order dependency | | T5 | Test Quality | Low | `helper_sandbox_integration.py` | Integration test uses mocks | | T6 | Test Quality | Low | `sandbox_manager_coverage_steps.py` | Cross-module step dependency | | D1 | Style | Info | `manager.py` | Inline import | **Critical: 1** | **High: 3** | **Medium: 6** | **Low: 5** | **Info: 1**
CoreRasurae force-pushed fix/atomic-sandbox-commit from d58311d39a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m25s
CI / typecheck (pull_request) Successful in 3m53s
CI / quality (pull_request) Successful in 3m42s
CI / security (pull_request) Successful in 4m13s
CI / build (pull_request) Successful in 19s
CI / integration_tests (pull_request) Successful in 5m48s
CI / e2e_tests (pull_request) Successful in 5m22s
CI / unit_tests (pull_request) Successful in 9m3s
CI / docker (pull_request) Successful in 59s
CI / coverage (pull_request) Successful in 12m43s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 50m50s
to baf3d9f9f1
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 25s
CI / security (pull_request) Successful in 49s
CI / typecheck (pull_request) Successful in 4m17s
CI / build (pull_request) Successful in 26s
CI / quality (pull_request) Successful in 3m48s
CI / integration_tests (pull_request) Successful in 5m37s
CI / unit_tests (pull_request) Successful in 5m55s
CI / docker (pull_request) Successful in 57s
CI / e2e_tests (pull_request) Successful in 8m18s
CI / coverage (pull_request) Successful in 8m49s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 52m58s
2026-03-28 22:01:01 +00:00
Compare
Author
Member

Code Review Report — PR #1146: fix(sandbox): make commit_all atomic per specification

Branch: fix/atomic-sandbox-commit
Issue: #925
Commit: baf3d9f9
Review scope: All code changes in the branch + immediate surrounding context.
Review method: 3 iterative full-pass cycles covering bug detection, security, performance, test coverage, and documentation.


Summary

The PR implements all-or-nothing semantics for SandboxManager.commit_all() per the specification's atomicity requirement (line 45938). The overall design is sound: rollbackable sandboxes commit first, non-rollbackable last; on failure, LIFO rollback is performed on already-committed sandboxes. The shared _fs_utils module, pre-commit backups, and safe_restore rename-swap pattern are well thought out.

However, the review identified 1 high-severity bug, 5 medium-severity issues, and several lower-priority items across bugs, test coverage gaps, and documentation inconsistencies.


HIGH Severity

H1 — Bug: TransactionSandbox misclassified as rollbackable in commit_all

File: manager.py:264-275

commit_all() classifies sandboxes as non-rollbackable using isinstance(sb, NoSandbox) only. TransactionSandbox.rollback() from COMMITTED state raises SandboxRollbackError (line 281-289 of transaction_sandbox.py) because a database COMMIT is irreversible. Yet TransactionSandbox ends up in the rollbackable list and is committed first in the batch.

If a later sandbox in the batch fails, the atomic rollback attempts to undo the TransactionSandbox commit and always fails. Database changes become permanent while filesystem sandboxes are rolled back — breaking the atomicity guarantee this PR is implementing.

Recommendation: Add TransactionSandbox to the non-rollbackable classification (or introduce a supports_rollback_from_committed protocol method) so it is committed last, alongside NoSandbox.


MEDIUM Severity

M1 — Bug: OverlaySandbox COMMITTED rollback doesn't remount OverlayFS

File: overlay.py:400-427

When rolling back from COMMITTED on a real OverlayFS sandbox, the code restores the original via safe_restore and then resets the merged directory with a plain shutil.copytree. The OverlayFS mount is not re-established. If the sandbox is subsequently reactivated (ROLLED_BACKACTIVE via get_path()), writes will go to the plain copy rather than through the overlay filesystem — silently losing the copy-on-write isolation.

The upper_dir and work_dir are also not reset, leaving them in a stale state.

Recommendation: For real OverlayFS, the rollback from COMMITTED should unmount, clean upper/work dirs, and remount (similar to the ACTIVE rollback path for real overlay).

M2 — Bug: Merged dir reset uses dirs_exist_ok=False after rmtree(ignore_errors=True)

File: overlay.py:420-427

shutil.rmtree(self._merged_dir, ignore_errors=True)
shutil.copytree(
    self._original_path, self._merged_dir,
    symlinks=self._use_real_overlay, dirs_exist_ok=False,
)

If rmtree fails silently (e.g., due to a locked file or permission issue), copytree with dirs_exist_ok=False raises FileExistsError, causing the rollback to fail with ERRORED status. Since the backup has already been consumed by safe_restore at this point, there is no recovery path.

Recommendation: Use dirs_exist_ok=True, or perform the rmtree without ignore_errors and handle the error explicitly.

M3 — Bug: CopyOnWriteSandbox / OverlaySandbox rollback only catches OSError

Files: copy_on_write.py:375, overlay.py:448

The rollback methods catch except OSError as exc: but safe_restore can re-raise non-OSError exceptions. For example, shutil.copytree internally catches BaseException in safe_restore, renames the original back, and re-raises. If the original exception is MemoryError or another non-OSError, the rollback's except OSError clause doesn't catch it. The sandbox status is never set to ERRORED, and _pre_commit_backup is not cleared.

Recommendation: Broaden the catch to except Exception as exc: (or at minimum except BaseException as exc:) in the rollback methods, matching the pattern already used in _rollback_committed.

M4 — Bug: Non-SandboxError re-raise in commit_all loses rollback metadata

File: manager.py:312-316

if not isinstance(exc, SandboxError):
    raise

When a non-SandboxError exception triggers the atomic rollback, the rollback is performed and rolled_back_ids / failed_rollback_ids are computed — but only logged. The original exception is re-raised as-is with no rollback metadata attached. The caller has no programmatic way to determine which sandboxes were rolled back.

Recommendation: Wrap the original exception in a SandboxError (or a new AtomicCommitError subclass) that carries the rollback metadata, and chain the original as __cause__.

M5 — Documentation: TransactionSandbox.rollback() docstring contradicts implementation

File: transaction_sandbox.py:264-289

The docstring states:

"A warning is logged and the sandbox transitions to ROLLED_BACK so that the manager's atomic commit protocol can proceed consistently."

But the code raises SandboxRollbackError without transitioning to ROLLED_BACK. The manager's _rollback_committed catches this and reports it in failed_rollback_ids. The docstring should match the actual behavior.


LOW Severity

L1 — Bug: backup_directory doesn't handle special files

File: _fs_utils.py:58-65

os.walk yields FIFOs, sockets, and device files in the filenames list. _copy_file uses open(src, "rb") which could hang on a FIFO or error on a socket. Sandbox directories are unlikely to contain these, but a defensive os.path.isfile() check before _copy_file would prevent surprises.

L2 — Bug: OverlaySandbox COMMITTED rollback — no recovery if merged dir copytree fails after backup consumed

File: overlay.py:420-427

If safe_restore succeeds (consuming the backup) but the subsequent copytree for the merged directory fails, the original is correctly restored but the merged dir is in a broken state with no backup available for retry. Status transitions to ERRORED, which is correct, but the sandbox cannot recover.

L3 — Performance: commit_all re-imports NoSandbox on every call

File: manager.py:260

from cleveragents.infrastructure.sandbox.no_sandbox import NoSandbox is inside commit_all(). Python caches module imports, so the overhead is minimal (dict lookup), but moving it to module level (or using a TYPE_CHECKING guard) would be cleaner. If circular import is the concern, a TYPE_CHECKING guard could be used.

L4 — Security: safe_restore uses predictable stale directory name

File: _fs_utils.py:88

target_path + ".atomic-rollback-old" is a predictable sibling name. A local user could theoretically pre-create this as a symlink to interfere with the restore. The TOCTOU window between os.path.exists and shutil.rmtree is small but present. Using tempfile.mkdtemp(dir=os.path.dirname(target_path)) for the stale name would eliminate this.


Test Coverage Gaps

T1 (MEDIUM): No BDD test for GitWorktreeSandbox rollback from COMMITTED

The git reset --hard to the pre-merge commit path is untested in BDD scenarios. Only the manager-level atomic commit is tested via mocks in the Robot test.

T2 (MEDIUM): No BDD test for TransactionSandbox rollback from COMMITTED

The SandboxRollbackError raised when attempting to undo a database COMMIT is untested in BDD. This is especially important given issue H1.

T3 (MEDIUM): No test for commit_all with mixed sandbox types

A batch containing CopyOnWriteSandbox + GitWorktreeSandbox + TransactionSandbox is untested. The interaction between rollbackable and non-rollbackable sandboxes of different types is not verified end-to-end.

T4 (MEDIUM): Test steps silently skip when sandbox is None

File: sandbox_manager_coverage_steps.py

In step_given_sandbox_replaced_with_committable_mock (line 180) and step_given_sandbox_commit_succeeds_rollback_fails (line 232), the if sandbox is not None: guard silently does nothing if the sandbox doesn't exist. This could mask test setup failures — a missing sandbox would cause the step to pass vacuously, and the scenario might pass with incorrect setup.

Recommendation: Replace if sandbox is not None: with assert sandbox is not None.

T5 (LOW): Committable mock's commit() returns MagicMock, not CommitResult

The committable mock created in step_given_sandbox_replaced_with_committable_mock has no explicit commit.return_value. In failure paths this doesn't matter (the mock results are discarded), but if any success-path test uses these mocks, the returned MagicMock would not match the CommitResult interface.


Informational

  • backup_directory does not preserve extended attributes (xattrs) or ACLs — only mode and timestamps are copied. This differs from shutil.copy2 behavior on platforms with xattr support.
  • commit_all operates without the lock during the commit loop (line 284+). This is a pre-existing pattern in the codebase (rollback_all, cleanup_all), but the atomic semantics make concurrent modification more impactful.
  • git reset --hard during COMMITTED rollback could affect concurrent processes tracking the same branch (documented in docstring warning — good).

Verdict

The implementation is well-structured with thorough commit message documentation, good defensive patterns (safe_restore rename-swap, backup-before-modify), and comprehensive BDD coverage for the core scenarios. The H1 issue (TransactionSandbox atomicity bypass) should be addressed before merge as it directly undermines the feature's purpose. The medium-severity items (M1–M5) are worth fixing but are in less-likely code paths.

# Code Review Report — PR #1146: `fix(sandbox): make commit_all atomic per specification` **Branch:** `fix/atomic-sandbox-commit` **Issue:** #925 **Commit:** `baf3d9f9` **Review scope:** All code changes in the branch + immediate surrounding context. **Review method:** 3 iterative full-pass cycles covering bug detection, security, performance, test coverage, and documentation. --- ## Summary The PR implements all-or-nothing semantics for `SandboxManager.commit_all()` per the specification's atomicity requirement (line 45938). The overall design is sound: rollbackable sandboxes commit first, non-rollbackable last; on failure, LIFO rollback is performed on already-committed sandboxes. The shared `_fs_utils` module, pre-commit backups, and `safe_restore` rename-swap pattern are well thought out. However, the review identified **1 high-severity bug, 5 medium-severity issues, and several lower-priority items** across bugs, test coverage gaps, and documentation inconsistencies. --- ## HIGH Severity ### H1 — Bug: `TransactionSandbox` misclassified as rollbackable in `commit_all` **File:** `manager.py:264-275` `commit_all()` classifies sandboxes as non-rollbackable using `isinstance(sb, NoSandbox)` only. `TransactionSandbox.rollback()` from `COMMITTED` state raises `SandboxRollbackError` (line 281-289 of `transaction_sandbox.py`) because a database `COMMIT` is irreversible. Yet `TransactionSandbox` ends up in the `rollbackable` list and is committed **first** in the batch. If a later sandbox in the batch fails, the atomic rollback attempts to undo the `TransactionSandbox` commit and **always fails**. Database changes become permanent while filesystem sandboxes are rolled back — breaking the atomicity guarantee this PR is implementing. **Recommendation:** Add `TransactionSandbox` to the non-rollbackable classification (or introduce a `supports_rollback_from_committed` protocol method) so it is committed **last**, alongside `NoSandbox`. --- ## MEDIUM Severity ### M1 — Bug: `OverlaySandbox` COMMITTED rollback doesn't remount OverlayFS **File:** `overlay.py:400-427` When rolling back from `COMMITTED` on a real OverlayFS sandbox, the code restores the original via `safe_restore` and then resets the merged directory with a plain `shutil.copytree`. The OverlayFS mount is **not** re-established. If the sandbox is subsequently reactivated (`ROLLED_BACK` → `ACTIVE` via `get_path()`), writes will go to the plain copy rather than through the overlay filesystem — silently losing the copy-on-write isolation. The `upper_dir` and `work_dir` are also not reset, leaving them in a stale state. **Recommendation:** For real OverlayFS, the rollback from `COMMITTED` should unmount, clean upper/work dirs, and remount (similar to the `ACTIVE` rollback path for real overlay). ### M2 — Bug: Merged dir reset uses `dirs_exist_ok=False` after `rmtree(ignore_errors=True)` **File:** `overlay.py:420-427` ```python shutil.rmtree(self._merged_dir, ignore_errors=True) shutil.copytree( self._original_path, self._merged_dir, symlinks=self._use_real_overlay, dirs_exist_ok=False, ) ``` If `rmtree` fails silently (e.g., due to a locked file or permission issue), `copytree` with `dirs_exist_ok=False` raises `FileExistsError`, causing the rollback to fail with `ERRORED` status. Since the backup has already been consumed by `safe_restore` at this point, there is no recovery path. **Recommendation:** Use `dirs_exist_ok=True`, or perform the rmtree without `ignore_errors` and handle the error explicitly. ### M3 — Bug: `CopyOnWriteSandbox` / `OverlaySandbox` rollback only catches `OSError` **Files:** `copy_on_write.py:375`, `overlay.py:448` The rollback methods catch `except OSError as exc:` but `safe_restore` can re-raise non-`OSError` exceptions. For example, `shutil.copytree` internally catches `BaseException` in `safe_restore`, renames the original back, and re-raises. If the original exception is `MemoryError` or another non-`OSError`, the rollback's `except OSError` clause doesn't catch it. The sandbox status is never set to `ERRORED`, and `_pre_commit_backup` is not cleared. **Recommendation:** Broaden the catch to `except Exception as exc:` (or at minimum `except BaseException as exc:`) in the rollback methods, matching the pattern already used in `_rollback_committed`. ### M4 — Bug: Non-`SandboxError` re-raise in `commit_all` loses rollback metadata **File:** `manager.py:312-316` ```python if not isinstance(exc, SandboxError): raise ``` When a non-`SandboxError` exception triggers the atomic rollback, the rollback is performed and `rolled_back_ids` / `failed_rollback_ids` are computed — but only logged. The original exception is re-raised as-is with no rollback metadata attached. The caller has no programmatic way to determine which sandboxes were rolled back. **Recommendation:** Wrap the original exception in a `SandboxError` (or a new `AtomicCommitError` subclass) that carries the rollback metadata, and chain the original as `__cause__`. ### M5 — Documentation: `TransactionSandbox.rollback()` docstring contradicts implementation **File:** `transaction_sandbox.py:264-289` The docstring states: > "A warning is logged and the sandbox transitions to `ROLLED_BACK` so that the manager's atomic commit protocol can proceed consistently." But the code raises `SandboxRollbackError` **without** transitioning to `ROLLED_BACK`. The manager's `_rollback_committed` catches this and reports it in `failed_rollback_ids`. The docstring should match the actual behavior. --- ## LOW Severity ### L1 — Bug: `backup_directory` doesn't handle special files **File:** `_fs_utils.py:58-65` `os.walk` yields FIFOs, sockets, and device files in the `filenames` list. `_copy_file` uses `open(src, "rb")` which could hang on a FIFO or error on a socket. Sandbox directories are unlikely to contain these, but a defensive `os.path.isfile()` check before `_copy_file` would prevent surprises. ### L2 — Bug: `OverlaySandbox` COMMITTED rollback — no recovery if merged dir copytree fails after backup consumed **File:** `overlay.py:420-427` If `safe_restore` succeeds (consuming the backup) but the subsequent `copytree` for the merged directory fails, the original is correctly restored but the merged dir is in a broken state with no backup available for retry. Status transitions to `ERRORED`, which is correct, but the sandbox cannot recover. ### L3 — Performance: `commit_all` re-imports `NoSandbox` on every call **File:** `manager.py:260` `from cleveragents.infrastructure.sandbox.no_sandbox import NoSandbox` is inside `commit_all()`. Python caches module imports, so the overhead is minimal (dict lookup), but moving it to module level (or using a `TYPE_CHECKING` guard) would be cleaner. If circular import is the concern, a `TYPE_CHECKING` guard could be used. ### L4 — Security: `safe_restore` uses predictable stale directory name **File:** `_fs_utils.py:88` `target_path + ".atomic-rollback-old"` is a predictable sibling name. A local user could theoretically pre-create this as a symlink to interfere with the restore. The TOCTOU window between `os.path.exists` and `shutil.rmtree` is small but present. Using `tempfile.mkdtemp(dir=os.path.dirname(target_path))` for the stale name would eliminate this. --- ## Test Coverage Gaps ### T1 (MEDIUM): No BDD test for `GitWorktreeSandbox` rollback from `COMMITTED` The `git reset --hard` to the pre-merge commit path is untested in BDD scenarios. Only the manager-level atomic commit is tested via mocks in the Robot test. ### T2 (MEDIUM): No BDD test for `TransactionSandbox` rollback from `COMMITTED` The `SandboxRollbackError` raised when attempting to undo a database `COMMIT` is untested in BDD. This is especially important given issue H1. ### T3 (MEDIUM): No test for `commit_all` with mixed sandbox types A batch containing `CopyOnWriteSandbox` + `GitWorktreeSandbox` + `TransactionSandbox` is untested. The interaction between rollbackable and non-rollbackable sandboxes of different types is not verified end-to-end. ### T4 (MEDIUM): Test steps silently skip when sandbox is `None` **File:** `sandbox_manager_coverage_steps.py` In `step_given_sandbox_replaced_with_committable_mock` (line 180) and `step_given_sandbox_commit_succeeds_rollback_fails` (line 232), the `if sandbox is not None:` guard silently does nothing if the sandbox doesn't exist. This could mask test setup failures — a missing sandbox would cause the step to pass vacuously, and the scenario might pass with incorrect setup. **Recommendation:** Replace `if sandbox is not None:` with `assert sandbox is not None`. ### T5 (LOW): Committable mock's `commit()` returns `MagicMock`, not `CommitResult` The committable mock created in `step_given_sandbox_replaced_with_committable_mock` has no explicit `commit.return_value`. In failure paths this doesn't matter (the mock results are discarded), but if any success-path test uses these mocks, the returned `MagicMock` would not match the `CommitResult` interface. --- ## Informational - `backup_directory` does not preserve extended attributes (xattrs) or ACLs — only mode and timestamps are copied. This differs from `shutil.copy2` behavior on platforms with xattr support. - `commit_all` operates without the lock during the commit loop (line 284+). This is a pre-existing pattern in the codebase (`rollback_all`, `cleanup_all`), but the atomic semantics make concurrent modification more impactful. - `git reset --hard` during COMMITTED rollback could affect concurrent processes tracking the same branch (documented in docstring warning — good). --- ## Verdict The implementation is well-structured with thorough commit message documentation, good defensive patterns (safe_restore rename-swap, backup-before-modify), and comprehensive BDD coverage for the core scenarios. The **H1 issue** (`TransactionSandbox` atomicity bypass) should be addressed before merge as it directly undermines the feature's purpose. The medium-severity items (M1–M5) are worth fixing but are in less-likely code paths.
CoreRasurae force-pushed fix/atomic-sandbox-commit from baf3d9f9f1
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 25s
CI / security (pull_request) Successful in 49s
CI / typecheck (pull_request) Successful in 4m17s
CI / build (pull_request) Successful in 26s
CI / quality (pull_request) Successful in 3m48s
CI / integration_tests (pull_request) Successful in 5m37s
CI / unit_tests (pull_request) Successful in 5m55s
CI / docker (pull_request) Successful in 57s
CI / e2e_tests (pull_request) Successful in 8m18s
CI / coverage (pull_request) Successful in 8m49s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 52m58s
to 3a6dcd6d74
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 54s
CI / lint (pull_request) Successful in 3m20s
CI / build (pull_request) Successful in 17s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m18s
CI / integration_tests (pull_request) Successful in 6m48s
CI / e2e_tests (pull_request) Successful in 6m4s
CI / unit_tests (pull_request) Successful in 7m20s
CI / docker (pull_request) Failing after 54s
CI / coverage (pull_request) Successful in 11m18s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 23:09:28 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 (fix/atomic-sandbox-commit, issue #925)

Reviewer: Automated deep review (4 global cycles across all categories)
Scope: Code changes in branch fix/atomic-sandbox-commit plus immediate surrounding code
Files reviewed: _fs_utils.py, manager.py, protocol.py, copy_on_write.py, overlay.py, git_worktree.py, transaction_sandbox.py, BDD features/steps, Robot Framework tests
Reference: Specification docs/specification.md (sandbox sections, lines 45938, 19193, 24632)


Summary

The implementation achieves its primary goal: commit_all() is now atomic with rollback support. The new _fs_utils module, AtomicCommitError exception, and LIFO rollback logic are well-structured. The commit-ordering strategy (rollbackable-first, non-rollbackable-last) is sound. However, the review identified 17 findings across multiple categories.


HIGH Severity

H1. backup_directory does not preserve directory timestamps correctly

Category: Bug
File: src/cleveragents/infrastructure/sandbox/_fs_utils.py:54-57 (subdirs), lines 39-41 (root)

Directory timestamps are set inside the os.walk loop, immediately after creating each subdirectory. However, subsequent file-copy operations into that directory update its mtime (standard POSIX behavior). The root directory timestamps are set on line 41 before the walk even begins, so they are overwritten by every mkdir/file-create inside the root.

Impact: backup_directory does not actually preserve source directory timestamps. Any rollback based on this backup exposes directories with wrong mtimes.

Fix: Collect directory (path, atime, mtime) tuples during the walk, then apply them all in a bottom-up post-walk pass (deepest directories first, root last), after all file copies are complete.


MEDIUM Severity

M1. backup_directory root/directory permissions set too early — may cause backup failure

Category: Bug
File: src/cleveragents/infrastructure/sandbox/_fs_utils.py:40 (root), 56 (subdirs)

Permissions are applied to the backup destination directory before content is written into it. If the source directory has restrictive permissions (e.g., 0o500 — read+execute only), the backup destination is chmod'd to 0o500, and subsequent file-write operations inside it will fail with PermissionError.

Impact: Backup (and therefore rollback) fails for directories with write-restrictive permissions.

Fix: Apply permissions in the same bottom-up post-walk pass as timestamps (after H1 fix).

M2. Thread-safety gap in commit_all

Category: Bug / Concurrency
File: src/cleveragents/infrastructure/sandbox/manager.py:243-294

The method acquires _lock to snapshot the sandboxes list (line 243-244), then releases the lock before iterating and committing. Between snapshot and commit, another thread could call cleanup_all, rollback_all, or another commit_all for the same plan_id, modifying the underlying sandboxes or _active_sandboxes dict.

Impact: Concurrent same-plan calls could cause double-commit, commit-after-cleanup, or other race conditions.

Fix: Either hold the lock for the entire commit loop, or document that concurrent commit_all calls for the same plan are not supported (and add a per-plan guard).

M3. Asymmetric error handling in commit_all — dual return/raise contract

Category: Bug / API Design
File: src/cleveragents/infrastructure/sandbox/manager.py:330-348

When a SandboxError occurs during commit, the method returns a CommitResult(success=False). When a non-SandboxError occurs, it raises AtomicCommitError. Callers must handle both a failed result and an exception for full correctness. This inconsistent contract is easy to misuse.

Impact: Callers that only check CommitResult.success will miss AtomicCommitError exceptions (and vice versa).

Suggestion: Consider unifying: always return CommitResult(success=False) and attach the exception info, or always raise (wrapping SandboxError too).

M4. OverlaySandbox.rollback() from ACTIVEdirs_exist_ok=False with ignore_errors=True rmtree

Category: Bug
File: src/cleveragents/infrastructure/sandbox/overlay.py:455-460

The ACTIVE rollback path calls shutil.rmtree(self._merged_dir, ignore_errors=True) then shutil.copytree(..., dirs_exist_ok=False). If rmtree silently fails (e.g., permission error on a file), copytree will raise FileExistsError because the directory still exists.

Note: The COMMITTED rollback path (line 441) correctly uses dirs_exist_ok=True for exactly this reason.

Fix: Use dirs_exist_ok=True on line 460, or don't use ignore_errors=True on the rmtree (so the error surfaces early).

M5. commit_all does not implement bottom-up commit ordering per spec

Category: Specification Compliance
File: src/cleveragents/infrastructure/sandbox/manager.py:294
Spec ref: Line 24632 — "Sandbox commit: Bottom-up (child before parent)"

The specification requires bottom-up ordering (child resources committed before parents). The current implementation uses insertion order partitioned into rollbackable-first / non-rollbackable-last, with no DAG traversal to determine parent/child relationships.

Impact: If resources are inserted parent-first (the natural top-down creation order), they'll be committed parent-first, violating the spec. This could cause issues with cross-resource dependencies.

Suggestion: This may be acceptable as a known limitation (document it), or implement DAG-aware ordering using the boundary cache/resource registry.

M6. backup_directory inner try/except only catches OSError — non-OSError leaks temp directory

Category: Bug / Resource Leak
File: src/cleveragents/infrastructure/sandbox/copy_on_write.py:260-264 and overlay.py:307-311

try:
    backup_directory(self._original_path, backup_dir)
except OSError:
    shutil.rmtree(backup_dir, ignore_errors=True)
    raise

If backup_directory raises a non-OSError exception (e.g., MemoryError, TypeError from an unexpected None), the inner except doesn't fire, the outer except OSError on line 282 doesn't fire either, and backup_dir is leaked on disk.

Fix: Catch Exception (or BaseException) in the inner handler, matching the broader catch philosophy used elsewhere in this PR.

M7. BDD test for directory timestamp preservation passes by coincidence

Category: Test Flaw
File: features/steps/sandbox_fs_utils_steps.pystep_then_dir_timestamps_match

The test creates source directories and files, then checks abs(src_mtime - dst_mtime) < 1.0. Because directories are freshly created and backup runs immediately, both source and destination directories have "approximately now" mtimes — the test passes because both timestamps are recent, not because backup_directory correctly preserved them.

If the source directories had distinctive old timestamps (e.g., os.utime(src_dir, (2000000.0, 2000000.0))), the test would fail, exposing bug H1.

Fix: Set distinctive directory timestamps in the Given step and verify they're preserved exactly (as is done for files).


LOW Severity

L1. TOCTOU race in safe_restore mkdtemp+rmdir pattern

Category: Security
File: src/cleveragents/infrastructure/sandbox/_fs_utils.py:100-108

safe_restore calls mkdtemp() (creates dir), os.rmdir() (removes it), then os.rename(target, stale). Between rmdir and rename, another process with write access to the parent directory could create a file/symlink at the stale path.

Risk: Very low in practice due to random naming and narrow timing window.

L2. Duplicate _compute_diff implementations

Category: Code Quality
Files: copy_on_write.py:426-473 and overlay.py:561-596

Both sandbox classes have functionally identical _compute_diff static methods. The _fs_utils module was introduced for shared backup/restore logic, but _compute_diff wasn't extracted.

Suggestion: Extract to _fs_utils.compute_diff() to eliminate duplication.

L3. Inline imports in commit_all

Category: Code Quality
File: src/cleveragents/infrastructure/sandbox/manager.py:261-264

NoSandbox and TransactionSandbox are imported inside the method body (to avoid circular imports). While Python caches imports, this is a code smell in a critical path.

Suggestion: Consider restructuring the module dependency to allow top-level imports, or use a type-checking flag/protocol-based check instead of isinstance.

L4. datetime.now() without timezone

Category: Code Quality
Files: manager.py:342, copy_on_write.py:161,321, overlay.py:373, etc.

Multiple CommitResult and SandboxContext instances use naive datetime.now(). While this matches existing codebase patterns, it's worth noting for future distributed/server-mode scenarios.

L5. No test for GitWorktreeSandbox rollback from COMMITTED

Category: Test Coverage Gap
File: git_worktree.py:496-506

The git reset --hard <pre_merge_commit> path for undoing a merge is not covered by any BDD or Robot Framework test in this PR. Both CopyOnWriteSandbox and OverlaySandbox have explicit rollback-from-COMMITTED scenarios.

L6. No test for commit_all with only non-rollbackable sandboxes

Category: Test Coverage Gap
File: manager.py:266-293

If all sandboxes are NoSandbox or TransactionSandbox, the rollbackable list is empty and the commit loop only processes non-rollbackable entries. If one fails, _rollback_committed is called with an empty committed list. This path is untested.

L7. No test for backup_directory skipping non-regular files

Category: Test Coverage Gap
File: _fs_utils.py:66-72

The code skips FIFOs, sockets, and device files with a warning. No BDD scenario verifies this behavior.

L8. Redundant None checks after assertions in test steps

Category: Code Quality
Files: sandbox_manager_coverage_steps.py:194, 728

Pattern: assert sandbox is not None followed by if sandbox is not None:. The if is dead code since the assertion would fail first.

L9. commit_all docstring inaccurately describes exception behavior

Category: Documentation
File: src/cleveragents/infrastructure/sandbox/manager.py:234-238

The docstring says "the original exception is re-raised", but the code actually wraps it in AtomicCommitError and raises that instead. The caller receives AtomicCommitError (with __cause__), not the raw exception.


Informational

I1. Specification contradiction documented correctly

The commit message and commit_all docstring (lines 211-216) properly document the contradiction between spec line 45938 (atomic apply) and line 19193 (partial apply). The atomicity requirement from 45938 was chosen per issue #925. A spec update for line 19193 is recommended.

I2. LIFO rollback order vs. spec's top-down rollback

The _rollback_committed docstring (lines 369-375) correctly distinguishes between LIFO batch-undo ordering and the spec's top-down DAG-traversal rollback ordering (line 24632). This is a reasonable design choice.


Findings Summary

Severity Count Categories
HIGH 1 Bug
MEDIUM 7 Bug (4), Spec Compliance (1), API Design (1), Test Flaw (1)
LOW 9 Security (1), Code Quality (4), Test Coverage (3), Documentation (1)
Info 2 Spec notes

Recommended priority: Fix H1 and M1 together (they share the same root cause and fix), then address M4 and M6 (one-line fixes). M2 and M3 are design-level decisions that may warrant discussion.

## Code Review Report — PR #1146 (`fix/atomic-sandbox-commit`, issue #925) **Reviewer:** Automated deep review (4 global cycles across all categories) **Scope:** Code changes in branch `fix/atomic-sandbox-commit` plus immediate surrounding code **Files reviewed:** `_fs_utils.py`, `manager.py`, `protocol.py`, `copy_on_write.py`, `overlay.py`, `git_worktree.py`, `transaction_sandbox.py`, BDD features/steps, Robot Framework tests **Reference:** Specification `docs/specification.md` (sandbox sections, lines 45938, 19193, 24632) --- ### Summary The implementation achieves its primary goal: `commit_all()` is now atomic with rollback support. The new `_fs_utils` module, `AtomicCommitError` exception, and LIFO rollback logic are well-structured. The commit-ordering strategy (rollbackable-first, non-rollbackable-last) is sound. However, the review identified **17 findings** across multiple categories. --- ## HIGH Severity ### H1. `backup_directory` does not preserve directory timestamps correctly **Category:** Bug **File:** `src/cleveragents/infrastructure/sandbox/_fs_utils.py:54-57` (subdirs), lines `39-41` (root) Directory timestamps are set *inside* the `os.walk` loop, immediately after creating each subdirectory. However, subsequent file-copy operations *into* that directory update its `mtime` (standard POSIX behavior). The root directory timestamps are set on line 41 before the walk even begins, so they are overwritten by every mkdir/file-create inside the root. **Impact:** `backup_directory` does not actually preserve source directory timestamps. Any rollback based on this backup exposes directories with wrong mtimes. **Fix:** Collect directory (path, atime, mtime) tuples during the walk, then apply them all in a *bottom-up* post-walk pass (deepest directories first, root last), after all file copies are complete. --- ## MEDIUM Severity ### M1. `backup_directory` root/directory permissions set too early — may cause backup failure **Category:** Bug **File:** `src/cleveragents/infrastructure/sandbox/_fs_utils.py:40` (root), `56` (subdirs) Permissions are applied to the backup destination directory *before* content is written into it. If the source directory has restrictive permissions (e.g., `0o500` — read+execute only), the backup destination is chmod'd to `0o500`, and subsequent file-write operations inside it will fail with `PermissionError`. **Impact:** Backup (and therefore rollback) fails for directories with write-restrictive permissions. **Fix:** Apply permissions in the same bottom-up post-walk pass as timestamps (after H1 fix). ### M2. Thread-safety gap in `commit_all` **Category:** Bug / Concurrency **File:** `src/cleveragents/infrastructure/sandbox/manager.py:243-294` The method acquires `_lock` to snapshot the sandboxes list (line 243-244), then *releases* the lock before iterating and committing. Between snapshot and commit, another thread could call `cleanup_all`, `rollback_all`, or another `commit_all` for the same `plan_id`, modifying the underlying sandboxes or `_active_sandboxes` dict. **Impact:** Concurrent same-plan calls could cause double-commit, commit-after-cleanup, or other race conditions. **Fix:** Either hold the lock for the entire commit loop, or document that concurrent `commit_all` calls for the same plan are not supported (and add a per-plan guard). ### M3. Asymmetric error handling in `commit_all` — dual return/raise contract **Category:** Bug / API Design **File:** `src/cleveragents/infrastructure/sandbox/manager.py:330-348` When a `SandboxError` occurs during commit, the method *returns* a `CommitResult(success=False)`. When a non-`SandboxError` occurs, it *raises* `AtomicCommitError`. Callers must handle **both** a failed result **and** an exception for full correctness. This inconsistent contract is easy to misuse. **Impact:** Callers that only check `CommitResult.success` will miss `AtomicCommitError` exceptions (and vice versa). **Suggestion:** Consider unifying: always return `CommitResult(success=False)` and attach the exception info, or always raise (wrapping `SandboxError` too). ### M4. `OverlaySandbox.rollback()` from `ACTIVE` — `dirs_exist_ok=False` with `ignore_errors=True` rmtree **Category:** Bug **File:** `src/cleveragents/infrastructure/sandbox/overlay.py:455-460` The ACTIVE rollback path calls `shutil.rmtree(self._merged_dir, ignore_errors=True)` then `shutil.copytree(..., dirs_exist_ok=False)`. If `rmtree` silently fails (e.g., permission error on a file), `copytree` will raise `FileExistsError` because the directory still exists. Note: The COMMITTED rollback path (line 441) correctly uses `dirs_exist_ok=True` for exactly this reason. **Fix:** Use `dirs_exist_ok=True` on line 460, or don't use `ignore_errors=True` on the rmtree (so the error surfaces early). ### M5. `commit_all` does not implement bottom-up commit ordering per spec **Category:** Specification Compliance **File:** `src/cleveragents/infrastructure/sandbox/manager.py:294` **Spec ref:** Line 24632 — "Sandbox commit: Bottom-up (child before parent)" The specification requires bottom-up ordering (child resources committed before parents). The current implementation uses insertion order partitioned into rollbackable-first / non-rollbackable-last, with no DAG traversal to determine parent/child relationships. **Impact:** If resources are inserted parent-first (the natural top-down creation order), they'll be committed parent-first, violating the spec. This could cause issues with cross-resource dependencies. **Suggestion:** This may be acceptable as a known limitation (document it), or implement DAG-aware ordering using the boundary cache/resource registry. ### M6. `backup_directory` inner try/except only catches `OSError` — non-OSError leaks temp directory **Category:** Bug / Resource Leak **File:** `src/cleveragents/infrastructure/sandbox/copy_on_write.py:260-264` and `overlay.py:307-311` ```python try: backup_directory(self._original_path, backup_dir) except OSError: shutil.rmtree(backup_dir, ignore_errors=True) raise ``` If `backup_directory` raises a non-`OSError` exception (e.g., `MemoryError`, `TypeError` from an unexpected `None`), the inner except doesn't fire, the outer `except OSError` on line 282 doesn't fire either, and `backup_dir` is leaked on disk. **Fix:** Catch `Exception` (or `BaseException`) in the inner handler, matching the broader catch philosophy used elsewhere in this PR. ### M7. BDD test for directory timestamp preservation passes by coincidence **Category:** Test Flaw **File:** `features/steps/sandbox_fs_utils_steps.py` — `step_then_dir_timestamps_match` The test creates source directories and files, then checks `abs(src_mtime - dst_mtime) < 1.0`. Because directories are freshly created and backup runs immediately, both source and destination directories have "approximately now" mtimes — the test passes because both timestamps are recent, **not** because `backup_directory` correctly preserved them. If the source directories had distinctive old timestamps (e.g., `os.utime(src_dir, (2000000.0, 2000000.0))`), the test would fail, exposing bug H1. **Fix:** Set distinctive directory timestamps in the Given step and verify they're preserved exactly (as is done for files). --- ## LOW Severity ### L1. TOCTOU race in `safe_restore` mkdtemp+rmdir pattern **Category:** Security **File:** `src/cleveragents/infrastructure/sandbox/_fs_utils.py:100-108` `safe_restore` calls `mkdtemp()` (creates dir), `os.rmdir()` (removes it), then `os.rename(target, stale)`. Between rmdir and rename, another process with write access to the parent directory could create a file/symlink at the stale path. **Risk:** Very low in practice due to random naming and narrow timing window. ### L2. Duplicate `_compute_diff` implementations **Category:** Code Quality **Files:** `copy_on_write.py:426-473` and `overlay.py:561-596` Both sandbox classes have functionally identical `_compute_diff` static methods. The `_fs_utils` module was introduced for shared backup/restore logic, but `_compute_diff` wasn't extracted. **Suggestion:** Extract to `_fs_utils.compute_diff()` to eliminate duplication. ### L3. Inline imports in `commit_all` **Category:** Code Quality **File:** `src/cleveragents/infrastructure/sandbox/manager.py:261-264` `NoSandbox` and `TransactionSandbox` are imported inside the method body (to avoid circular imports). While Python caches imports, this is a code smell in a critical path. **Suggestion:** Consider restructuring the module dependency to allow top-level imports, or use a type-checking flag/protocol-based check instead of `isinstance`. ### L4. `datetime.now()` without timezone **Category:** Code Quality **Files:** `manager.py:342`, `copy_on_write.py:161,321`, `overlay.py:373`, etc. Multiple `CommitResult` and `SandboxContext` instances use naive `datetime.now()`. While this matches existing codebase patterns, it's worth noting for future distributed/server-mode scenarios. ### L5. No test for `GitWorktreeSandbox` rollback from COMMITTED **Category:** Test Coverage Gap **File:** `git_worktree.py:496-506` The `git reset --hard <pre_merge_commit>` path for undoing a merge is not covered by any BDD or Robot Framework test in this PR. Both `CopyOnWriteSandbox` and `OverlaySandbox` have explicit rollback-from-COMMITTED scenarios. ### L6. No test for `commit_all` with only non-rollbackable sandboxes **Category:** Test Coverage Gap **File:** `manager.py:266-293` If all sandboxes are `NoSandbox` or `TransactionSandbox`, the `rollbackable` list is empty and the commit loop only processes non-rollbackable entries. If one fails, `_rollback_committed` is called with an empty `committed` list. This path is untested. ### L7. No test for `backup_directory` skipping non-regular files **Category:** Test Coverage Gap **File:** `_fs_utils.py:66-72` The code skips FIFOs, sockets, and device files with a warning. No BDD scenario verifies this behavior. ### L8. Redundant `None` checks after assertions in test steps **Category:** Code Quality **Files:** `sandbox_manager_coverage_steps.py:194`, `728` Pattern: `assert sandbox is not None` followed by `if sandbox is not None:`. The `if` is dead code since the assertion would fail first. ### L9. `commit_all` docstring inaccurately describes exception behavior **Category:** Documentation **File:** `src/cleveragents/infrastructure/sandbox/manager.py:234-238` The docstring says *"the original exception is re-raised"*, but the code actually wraps it in `AtomicCommitError` and raises that instead. The caller receives `AtomicCommitError` (with `__cause__`), not the raw exception. --- ## Informational ### I1. Specification contradiction documented correctly The commit message and `commit_all` docstring (lines 211-216) properly document the contradiction between spec line 45938 (atomic apply) and line 19193 (partial apply). The atomicity requirement from 45938 was chosen per issue #925. A spec update for line 19193 is recommended. ### I2. LIFO rollback order vs. spec's top-down rollback The `_rollback_committed` docstring (lines 369-375) correctly distinguishes between LIFO batch-undo ordering and the spec's top-down DAG-traversal rollback ordering (line 24632). This is a reasonable design choice. --- ### Findings Summary | Severity | Count | Categories | |----------|-------|------------| | **HIGH** | 1 | Bug | | **MEDIUM** | 7 | Bug (4), Spec Compliance (1), API Design (1), Test Flaw (1) | | **LOW** | 9 | Security (1), Code Quality (4), Test Coverage (3), Documentation (1) | | **Info** | 2 | Spec notes | **Recommended priority:** Fix H1 and M1 together (they share the same root cause and fix), then address M4 and M6 (one-line fixes). M2 and M3 are design-level decisions that may warrant discussion.
CoreRasurae force-pushed fix/atomic-sandbox-commit from 3a6dcd6d74
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 54s
CI / lint (pull_request) Successful in 3m20s
CI / build (pull_request) Successful in 17s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m18s
CI / integration_tests (pull_request) Successful in 6m48s
CI / e2e_tests (pull_request) Successful in 6m4s
CI / unit_tests (pull_request) Successful in 7m20s
CI / docker (pull_request) Failing after 54s
CI / coverage (pull_request) Successful in 11m18s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
to 7a47e9dcdc
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 59s
CI / lint (pull_request) Successful in 3m22s
CI / quality (pull_request) Successful in 3m48s
CI / security (pull_request) Successful in 4m14s
CI / build (pull_request) Successful in 17s
CI / unit_tests (pull_request) Successful in 4m20s
CI / docker (pull_request) Successful in 1m25s
CI / integration_tests (pull_request) Successful in 4m4s
CI / e2e_tests (pull_request) Successful in 5m34s
CI / coverage (pull_request) Successful in 11m16s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 54m5s
2026-03-28 23:42:52 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 fix(sandbox): make commit_all atomic per specification

Reviewer: Automated code review (4 full iterative cycles)
Scope: All code changes in branch fix/atomic-sandbox-commit plus close connections to surrounding code
Reference: Issue #925, specification lines 45938 (atomic apply), 19193 (partial apply), 24632 (DAG ordering)


Summary

Overall this is a well-structured change that correctly addresses the specification atomicity requirement (line 45938). The LIFO rollback strategy, non-rollbackable sandbox ordering, and AtomicCommitError exception design are sound. The _fs_utils extraction is a solid refactor eliminating code duplication. However, several issues were identified across 4 review cycles that should be addressed before merge.

Findings: 13 total — 2 High, 5 Medium, 6 Low


HIGH Severity

H1. BUG — commit_all does not hold lock during the commit loop (Thread-safety violation)

File: manager.py:249-256
Category: Bug / Concurrency

The lock is released after copying the sandbox list but before the status filter and the commit loop:

with self._lock:
    sandboxes = list(self._active_sandboxes.get(plan_id, {}).values())

committable = [
    sb for sb in sandboxes
    if sb.status in (SandboxStatus.CREATED, SandboxStatus.ACTIVE)
]
# ... commit loop runs without lock ...

Another thread could (a) modify a sandbox's status between the status check and the sandbox.commit() call, (b) add or remove sandboxes from the plan's tracking dict, or (c) call commit_all on the same plan concurrently, resulting in double-commits or missed sandboxes.

Recommendation: Either hold the lock for the entire commit_all operation (acceptable since sandbox commit() does I/O and this prevents concurrent commit_all for the same plan — which is the intended atomic behavior), or re-verify each sandbox's status immediately before calling commit() inside the loop and document the concurrency contract.


H2. BUG — Non-OSError exceptions during file-copy phase leave original directory partially corrupted

File: copy_on_write.py:282, overlay.py:329
Category: Bug / Data Integrity

The outer exception handler in commit() only catches OSError:

except OSError as exc:
    if self._pre_commit_backup is not None:
        try:
            safe_restore(self._pre_commit_backup, self._original_path)

If a non-OSError exception occurs during the file-copy phase (lines 267-280 in copy_on_write.py, lines 314-327 in overlay.py) — e.g., MemoryError, KeyboardInterrupt (a BaseException), or an unexpected TypeError — the backup exists but safe_restore is never called. The original directory is left partially modified (some files copied, some not). The exception propagates to commit_all, which calls _rollback_committed, but since this sandbox was never added to the committed list (it raised during commit), no rollback is attempted on it. The _pre_commit_backup is preserved but the original remains corrupted until manual intervention.

The commit message states "Pre-commit backup exception handler catches Exception (not just OSError)" — this is accurate for the inner handler around backup_directory() (line 262), but not for the outer handler that triggers safe_restore.

Recommendation: Change except OSError to except Exception (or except BaseException for KeyboardInterrupt safety) in both copy_on_write.py:282 and overlay.py:329, mirroring the pattern already used in rollback() (which catches Exception).


MEDIUM Severity

M1. SECURITY — TOCTOU window in safe_restore between rmdir and rename

File: _fs_utils.py:135-143
Category: Security / Race Condition

stale = tempfile.mkdtemp(prefix=".atomic-rollback-old-", dir=parent_dir)
os.rmdir(stale)           # Directory removed — name is now available
os.rename(target_path, stale)  # Another process could claim the name

The commit message states "safe_restore now uses a unique temporary name (via mkdtemp) for the stale directory instead of a predictable sibling, eliminating TOCTOU race conditions". While mkdtemp produces an unpredictable name (mitigating the symlink-attack vector), the rmdir + rename pair re-introduces a narrow TOCTOU window: between rmdir and rename, another process with write access to parent_dir could create a file or directory at the same path.

Recommendation: Instead of mkdtemp + rmdir, consider using tempfile.mkdtemp() and then os.rename(target_path, os.path.join(stale, "original")) to rename into the mkdtemp directory (which is guaranteed to exist and be owned by the current process). Alternatively, document the accepted residual risk.


M2. BUG — OverlaySandbox rollback from COMMITTED may double-mount if _unmount_overlay fails silently

File: overlay.py:421-432
Category: Bug / Resource Leak

self._unmount_overlay()    # Errors are logged but NOT raised (line 549-564)
# ... clean upper/work dirs ...
self._mount_overlay()      # Mounts on top of potentially still-mounted FS

_unmount_overlay() suppresses errors (line 559-564: logger.warning only). If the unmount fails (e.g., busy mount, permission error), the subsequent _mount_overlay() will attempt to mount OverlayFS on a still-mounted path, leading to a double-mount or mount error. The mount error would then propagate as SandboxCreationError, leaving the sandbox in an inconsistent state.

Recommendation: Either check the unmount result and bail out before remounting, or make _unmount_overlay raise on failure when called from rollback (as opposed to cleanup where suppression is appropriate).


M3. TEST FLAW — _register_tmpdir_cleanup does not actually trigger cleanup

File: features/steps/sandbox_fs_utils_steps.py:34-38
Category: Test Flaw / Resource Leak

def _register_tmpdir_cleanup(context: Any) -> None:
    tmpdir = context._fs_tmpdir
    if not hasattr(context, "_cleanup_handlers"):
        context._cleanup_handlers = []
    context._cleanup_handlers.append(lambda: shutil.rmtree(tmpdir, ignore_errors=True))

The cleanup lambdas are appended to context._cleanup_handlers, but Behave does not automatically invoke a list named _cleanup_handlers. Behave's cleanup mechanism is context.add_cleanup(fn) or explicit after_scenario hooks. These temporary directories will leak on every test run.

Recommendation: Replace with context.add_cleanup(lambda: shutil.rmtree(tmpdir, ignore_errors=True)) which Behave invokes automatically at scenario teardown.


File: _fs_utils.py:164-204
Category: Bug / Correctness

compute_diff uses os.walk to collect files and filecmp.cmp(shallow=False) to compare content. However:

  1. Symlinks are followed by os.path.isfile() in os.walk (default behavior), so they appear as regular files.
  2. If a symlink in the sandbox points to a different target than the same-named symlink in the original, filecmp.cmp compares the resolved content, not the symlink targets. Two symlinks pointing to different paths but with identical resolved content would be reported as unchanged.
  3. Conversely, backup_directory preserves symlinks as symlinks (os.readlink + os.symlink), creating a mismatch between what is backed up and what compute_diff detects.

Recommendation: Add symlink target comparison in compute_diff for entries where os.path.islink() returns True in either tree.


M5. BUG — Deleted directories not detected or cleaned up during commit

File: copy_on_write.py:277-280, overlay.py:324-327
Category: Bug / Correctness

for rel_path in deleted_files:
    dst = os.path.join(self._original_path, rel_path)
    if os.path.exists(dst):
        os.remove(dst)

compute_diff only tracks files, not directories. If an entire subdirectory is deleted in the sandbox, the individual files are removed from the original, but the now-empty parent directories remain. After commit, the original may contain stale empty directories that did not exist in the sandbox.

Recommendation: After processing deleted files, walk the deleted file paths in reverse depth order and remove empty parent directories up to (but not including) the original root.


LOW Severity

L1. PERFORMANCE — Full directory backup on every commit with changes

File: copy_on_write.py:256-265, overlay.py:303-312
Category: Performance

backup_directory copies the entire original directory even when only a single file changed. For large directories (tens of thousands of files), this is expensive. The no-change skip optimization is good, but a targeted backup (copying only files that will be overwritten) would be significantly more efficient.

Recommendation: Consider backing up only the files identified by compute_diff as changed_files + deleted_files (the files that will be modified or removed), rather than the entire directory tree. This would require updating safe_restore to perform a targeted restore.


L2. CODE QUALITY — Significant code duplication between CopyOnWriteSandbox and OverlaySandbox

File: copy_on_write.py, overlay.py
Category: Code Quality / Maintainability

The commit() method bodies (backup creation, file copy, error handling with safe_restore) and the rollback() COMMITTED path are nearly identical between the two classes. The _fs_utils extraction is a good start but the remaining duplication means bug fixes must be applied in two places (as evidenced by H2 appearing in both files).

Recommendation: Extract a shared base class or mixin (FilesystemSandboxBase) containing the common commit() and rollback() logic, with subclass hooks for strategy-specific behavior (e.g., overlay mount/unmount).


L3. TEST GAP — No test for backup_directory skipping special files (FIFOs, sockets)

File: _fs_utils.py:91-97
Category: Test Coverage

The commit message documents "backup_directory skips non-regular files (FIFOs, sockets, device files) with a warning to prevent hangs on special files", but there is no BDD scenario or step definition that creates a FIFO/socket and verifies it is skipped with a warning.

Recommendation: Add a scenario that creates a FIFO (os.mkfifo) in the source directory and verifies it is absent from the backup.


L4. TEST GAP — No test for concurrent commit_all calls

Category: Test Coverage

There is no test exercising concurrent calls to commit_all from multiple threads, which is the scenario where H1 would manifest. Given that SandboxManager documents itself as "Thread-safe via an internal reentrant lock" (line 4), this gap is notable.

Recommendation: Add a threading test that calls commit_all from two threads simultaneously and verifies no double-commits or data corruption occur.


L5. TEST GAP — No test for safe_restore with symlinked content

Category: Test Coverage

safe_restore passes symlinks=True to shutil.copytree, but no existing scenario verifies that symlinks in the backup are correctly restored as symlinks in the target.

Recommendation: Add a scenario in sandbox_fs_utils.feature that creates a backup containing symlinks and verifies they are preserved after safe_restore.


L6. DOCUMENTATION — Issue #925 body references wrong specification line

Category: Documentation

The issue body states "The specification (line 45844) requires..." but line 45844 is self._subscriptions.setdefault(event_type, []).append(handler) (EventBus code). The correct reference is line 45938: "The Apply phase is an atomic operation — either all sandbox changes are committed to the real resources, or none are." The commit message correctly uses line 45938.

Recommendation: Update issue #925 body to reference line 45938 for traceability.


Summary Table

ID Severity Category File(s) Title
H1 High Bug/Concurrency manager.py commit_all does not hold lock during commit loop
H2 High Bug/Data Integrity copy_on_write.py, overlay.py Non-OSError exceptions during file-copy leave original corrupted
M1 Medium Security _fs_utils.py TOCTOU window in safe_restore between rmdir and rename
M2 Medium Bug overlay.py Rollback may double-mount if _unmount_overlay fails silently
M3 Medium Test Flaw sandbox_fs_utils_steps.py _register_tmpdir_cleanup never triggers — temp dirs leak
M4 Medium Bug _fs_utils.py compute_diff does not detect symlink target changes
M5 Medium Bug copy_on_write.py, overlay.py Deleted directories not cleaned up during commit
L1 Low Performance copy_on_write.py, overlay.py Full directory backup even for single-file changes
L2 Low Code Quality copy_on_write.py, overlay.py Significant code duplication in commit/rollback logic
L3 Low Test Coverage _fs_utils.py No test for skipping special files (FIFOs)
L4 Low Test Coverage manager.py No test for concurrent commit_all
L5 Low Test Coverage _fs_utils.py No test for safe_restore with symlinks
L6 Low Documentation Issue #925 Wrong spec line reference in issue body
# Code Review Report — PR #1146 `fix(sandbox): make commit_all atomic per specification` **Reviewer:** Automated code review (4 full iterative cycles) **Scope:** All code changes in branch `fix/atomic-sandbox-commit` plus close connections to surrounding code **Reference:** Issue #925, specification lines 45938 (atomic apply), 19193 (partial apply), 24632 (DAG ordering) --- ## Summary Overall this is a well-structured change that correctly addresses the specification atomicity requirement (line 45938). The LIFO rollback strategy, non-rollbackable sandbox ordering, and `AtomicCommitError` exception design are sound. The `_fs_utils` extraction is a solid refactor eliminating code duplication. However, several issues were identified across 4 review cycles that should be addressed before merge. **Findings: 13 total** — 2 High, 5 Medium, 6 Low --- ## HIGH Severity ### H1. BUG — `commit_all` does not hold lock during the commit loop (Thread-safety violation) **File:** `manager.py:249-256` **Category:** Bug / Concurrency The lock is released after copying the sandbox list but **before** the status filter and the commit loop: ```python with self._lock: sandboxes = list(self._active_sandboxes.get(plan_id, {}).values()) committable = [ sb for sb in sandboxes if sb.status in (SandboxStatus.CREATED, SandboxStatus.ACTIVE) ] # ... commit loop runs without lock ... ``` Another thread could (a) modify a sandbox's status between the status check and the `sandbox.commit()` call, (b) add or remove sandboxes from the plan's tracking dict, or (c) call `commit_all` on the same plan concurrently, resulting in double-commits or missed sandboxes. **Recommendation:** Either hold the lock for the entire `commit_all` operation (acceptable since sandbox `commit()` does I/O and this prevents concurrent `commit_all` for the same plan — which is the intended atomic behavior), or re-verify each sandbox's status immediately before calling `commit()` inside the loop and document the concurrency contract. --- ### H2. BUG — Non-`OSError` exceptions during file-copy phase leave original directory partially corrupted **File:** `copy_on_write.py:282`, `overlay.py:329` **Category:** Bug / Data Integrity The outer exception handler in `commit()` only catches `OSError`: ```python except OSError as exc: if self._pre_commit_backup is not None: try: safe_restore(self._pre_commit_backup, self._original_path) ``` If a non-`OSError` exception occurs during the file-copy phase (lines 267-280 in `copy_on_write.py`, lines 314-327 in `overlay.py`) — e.g., `MemoryError`, `KeyboardInterrupt` (a `BaseException`), or an unexpected `TypeError` — the backup exists but `safe_restore` is **never called**. The original directory is left partially modified (some files copied, some not). The exception propagates to `commit_all`, which calls `_rollback_committed`, but since this sandbox was never added to the `committed` list (it raised during commit), no rollback is attempted on it. The `_pre_commit_backup` is preserved but the original remains corrupted until manual intervention. The commit message states *"Pre-commit backup exception handler catches Exception (not just OSError)"* — this is accurate for the **inner** handler around `backup_directory()` (line 262), but not for the **outer** handler that triggers `safe_restore`. **Recommendation:** Change `except OSError` to `except Exception` (or `except BaseException` for `KeyboardInterrupt` safety) in both `copy_on_write.py:282` and `overlay.py:329`, mirroring the pattern already used in `rollback()` (which catches `Exception`). --- ## MEDIUM Severity ### M1. SECURITY — TOCTOU window in `safe_restore` between `rmdir` and `rename` **File:** `_fs_utils.py:135-143` **Category:** Security / Race Condition ```python stale = tempfile.mkdtemp(prefix=".atomic-rollback-old-", dir=parent_dir) os.rmdir(stale) # Directory removed — name is now available os.rename(target_path, stale) # Another process could claim the name ``` The commit message states *"safe_restore now uses a unique temporary name (via mkdtemp) for the stale directory instead of a predictable sibling, eliminating TOCTOU race conditions"*. While `mkdtemp` produces an unpredictable name (mitigating the symlink-attack vector), the `rmdir` + `rename` pair re-introduces a narrow TOCTOU window: between `rmdir` and `rename`, another process with write access to `parent_dir` could create a file or directory at the same path. **Recommendation:** Instead of `mkdtemp` + `rmdir`, consider using `tempfile.mkdtemp()` and then `os.rename(target_path, os.path.join(stale, "original"))` to rename **into** the mkdtemp directory (which is guaranteed to exist and be owned by the current process). Alternatively, document the accepted residual risk. --- ### M2. BUG — `OverlaySandbox` rollback from COMMITTED may double-mount if `_unmount_overlay` fails silently **File:** `overlay.py:421-432` **Category:** Bug / Resource Leak ```python self._unmount_overlay() # Errors are logged but NOT raised (line 549-564) # ... clean upper/work dirs ... self._mount_overlay() # Mounts on top of potentially still-mounted FS ``` `_unmount_overlay()` suppresses errors (line 559-564: `logger.warning` only). If the unmount fails (e.g., busy mount, permission error), the subsequent `_mount_overlay()` will attempt to mount OverlayFS on a still-mounted path, leading to a double-mount or mount error. The mount error would then propagate as `SandboxCreationError`, leaving the sandbox in an inconsistent state. **Recommendation:** Either check the unmount result and bail out before remounting, or make `_unmount_overlay` raise on failure when called from rollback (as opposed to cleanup where suppression is appropriate). --- ### M3. TEST FLAW — `_register_tmpdir_cleanup` does not actually trigger cleanup **File:** `features/steps/sandbox_fs_utils_steps.py:34-38` **Category:** Test Flaw / Resource Leak ```python def _register_tmpdir_cleanup(context: Any) -> None: tmpdir = context._fs_tmpdir if not hasattr(context, "_cleanup_handlers"): context._cleanup_handlers = [] context._cleanup_handlers.append(lambda: shutil.rmtree(tmpdir, ignore_errors=True)) ``` The cleanup lambdas are appended to `context._cleanup_handlers`, but Behave does not automatically invoke a list named `_cleanup_handlers`. Behave's cleanup mechanism is `context.add_cleanup(fn)` or explicit `after_scenario` hooks. These temporary directories will leak on every test run. **Recommendation:** Replace with `context.add_cleanup(lambda: shutil.rmtree(tmpdir, ignore_errors=True))` which Behave invokes automatically at scenario teardown. --- ### M4. BUG — `compute_diff` does not detect symlink target changes **File:** `_fs_utils.py:164-204` **Category:** Bug / Correctness `compute_diff` uses `os.walk` to collect files and `filecmp.cmp(shallow=False)` to compare content. However: 1. Symlinks are followed by `os.path.isfile()` in `os.walk` (default behavior), so they appear as regular files. 2. If a symlink in the sandbox points to a different target than the same-named symlink in the original, `filecmp.cmp` compares the resolved content, not the symlink targets. Two symlinks pointing to different paths but with identical resolved content would be reported as unchanged. 3. Conversely, `backup_directory` preserves symlinks as symlinks (`os.readlink` + `os.symlink`), creating a mismatch between what is backed up and what `compute_diff` detects. **Recommendation:** Add symlink target comparison in `compute_diff` for entries where `os.path.islink()` returns `True` in either tree. --- ### M5. BUG — Deleted directories not detected or cleaned up during commit **File:** `copy_on_write.py:277-280`, `overlay.py:324-327` **Category:** Bug / Correctness ```python for rel_path in deleted_files: dst = os.path.join(self._original_path, rel_path) if os.path.exists(dst): os.remove(dst) ``` `compute_diff` only tracks **files**, not directories. If an entire subdirectory is deleted in the sandbox, the individual files are removed from the original, but the now-empty parent directories remain. After commit, the original may contain stale empty directories that did not exist in the sandbox. **Recommendation:** After processing deleted files, walk the deleted file paths in reverse depth order and remove empty parent directories up to (but not including) the original root. --- ## LOW Severity ### L1. PERFORMANCE — Full directory backup on every commit with changes **File:** `copy_on_write.py:256-265`, `overlay.py:303-312` **Category:** Performance `backup_directory` copies the **entire** original directory even when only a single file changed. For large directories (tens of thousands of files), this is expensive. The no-change skip optimization is good, but a targeted backup (copying only files that will be overwritten) would be significantly more efficient. **Recommendation:** Consider backing up only the files identified by `compute_diff` as `changed_files` + `deleted_files` (the files that will be modified or removed), rather than the entire directory tree. This would require updating `safe_restore` to perform a targeted restore. --- ### L2. CODE QUALITY — Significant code duplication between `CopyOnWriteSandbox` and `OverlaySandbox` **File:** `copy_on_write.py`, `overlay.py` **Category:** Code Quality / Maintainability The `commit()` method bodies (backup creation, file copy, error handling with `safe_restore`) and the `rollback()` COMMITTED path are nearly identical between the two classes. The `_fs_utils` extraction is a good start but the remaining duplication means bug fixes must be applied in two places (as evidenced by H2 appearing in both files). **Recommendation:** Extract a shared base class or mixin (`FilesystemSandboxBase`) containing the common `commit()` and `rollback()` logic, with subclass hooks for strategy-specific behavior (e.g., overlay mount/unmount). --- ### L3. TEST GAP — No test for `backup_directory` skipping special files (FIFOs, sockets) **File:** `_fs_utils.py:91-97` **Category:** Test Coverage The commit message documents *"backup_directory skips non-regular files (FIFOs, sockets, device files) with a warning to prevent hangs on special files"*, but there is no BDD scenario or step definition that creates a FIFO/socket and verifies it is skipped with a warning. **Recommendation:** Add a scenario that creates a FIFO (`os.mkfifo`) in the source directory and verifies it is absent from the backup. --- ### L4. TEST GAP — No test for concurrent `commit_all` calls **Category:** Test Coverage There is no test exercising concurrent calls to `commit_all` from multiple threads, which is the scenario where H1 would manifest. Given that `SandboxManager` documents itself as *"Thread-safe via an internal reentrant lock"* (line 4), this gap is notable. **Recommendation:** Add a threading test that calls `commit_all` from two threads simultaneously and verifies no double-commits or data corruption occur. --- ### L5. TEST GAP — No test for `safe_restore` with symlinked content **Category:** Test Coverage `safe_restore` passes `symlinks=True` to `shutil.copytree`, but no existing scenario verifies that symlinks in the backup are correctly restored as symlinks in the target. **Recommendation:** Add a scenario in `sandbox_fs_utils.feature` that creates a backup containing symlinks and verifies they are preserved after `safe_restore`. --- ### L6. DOCUMENTATION — Issue #925 body references wrong specification line **Category:** Documentation The issue body states *"The specification (line 45844) requires..."* but line 45844 is `self._subscriptions.setdefault(event_type, []).append(handler)` (EventBus code). The correct reference is **line 45938**: *"The Apply phase is an atomic operation — either all sandbox changes are committed to the real resources, or none are."* The commit message correctly uses line 45938. **Recommendation:** Update issue #925 body to reference line 45938 for traceability. --- ## Summary Table | ID | Severity | Category | File(s) | Title | |----|----------|----------|---------|-------| | H1 | **High** | Bug/Concurrency | `manager.py` | `commit_all` does not hold lock during commit loop | | H2 | **High** | Bug/Data Integrity | `copy_on_write.py`, `overlay.py` | Non-OSError exceptions during file-copy leave original corrupted | | M1 | Medium | Security | `_fs_utils.py` | TOCTOU window in `safe_restore` between `rmdir` and `rename` | | M2 | Medium | Bug | `overlay.py` | Rollback may double-mount if `_unmount_overlay` fails silently | | M3 | Medium | Test Flaw | `sandbox_fs_utils_steps.py` | `_register_tmpdir_cleanup` never triggers — temp dirs leak | | M4 | Medium | Bug | `_fs_utils.py` | `compute_diff` does not detect symlink target changes | | M5 | Medium | Bug | `copy_on_write.py`, `overlay.py` | Deleted directories not cleaned up during commit | | L1 | Low | Performance | `copy_on_write.py`, `overlay.py` | Full directory backup even for single-file changes | | L2 | Low | Code Quality | `copy_on_write.py`, `overlay.py` | Significant code duplication in commit/rollback logic | | L3 | Low | Test Coverage | `_fs_utils.py` | No test for skipping special files (FIFOs) | | L4 | Low | Test Coverage | `manager.py` | No test for concurrent `commit_all` | | L5 | Low | Test Coverage | `_fs_utils.py` | No test for `safe_restore` with symlinks | | L6 | Low | Documentation | Issue #925 | Wrong spec line reference in issue body |
CoreRasurae force-pushed fix/atomic-sandbox-commit from 7a47e9dcdc
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 59s
CI / lint (pull_request) Successful in 3m22s
CI / quality (pull_request) Successful in 3m48s
CI / security (pull_request) Successful in 4m14s
CI / build (pull_request) Successful in 17s
CI / unit_tests (pull_request) Successful in 4m20s
CI / docker (pull_request) Successful in 1m25s
CI / integration_tests (pull_request) Successful in 4m4s
CI / e2e_tests (pull_request) Successful in 5m34s
CI / coverage (pull_request) Successful in 11m16s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 54m5s
to 1a7ebf1825
Some checks failed
CI / lint (pull_request) Successful in 3m23s
CI / quality (pull_request) Successful in 3m47s
CI / typecheck (pull_request) Successful in 4m1s
CI / security (pull_request) Successful in 4m9s
CI / build (pull_request) Successful in 19s
CI / integration_tests (pull_request) Successful in 3m42s
CI / unit_tests (pull_request) Successful in 7m3s
CI / docker (pull_request) Successful in 56s
CI / coverage (pull_request) Successful in 7m58s
CI / e2e_tests (pull_request) Successful in 8m24s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 10:12:53 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 fix(sandbox): make commit_all atomic per specification

Review scope: All code changes in branch fix/atomic-sandbox-commit plus close connections to surrounding code.
Review method: 3 iterative global passes across all categories (bugs, security, performance, test coverage/flaws). Each cycle re-examined all categories until no new issues were found.
Specification references: docs/specification.md lines 45938 (atomic apply invariant), 19193 (partial apply contradiction), 24632 (DAG ordering), 19677-19761 (Apply phase), 28636-28663 (mid-execute correction).


Overall Assessment

The PR delivers a well-thought-out implementation of the atomic commit_all semantics required by spec line 45938. The rollbackable-first / non-rollbackable-last commit ordering, LIFO rollback, AtomicCommitError wrapping, and the shared _fs_utils extraction are all sound design choices. The spec contradiction (line 45938 vs 19193) is correctly documented. The safe_restore rename-into-mkdtemp approach eliminating the TOCTOU window is particularly well done.

The findings below are organized by severity (High → Low) within each category.


1. Bugs

[BUG-1] CopyOnWriteSandbox.rollback() from ACTIVE uses dirs_exist_ok=False — inconsistent with OverlaySandbox fix | Medium

File: src/cleveragents/infrastructure/sandbox/copy_on_write.py:377

The commit message states: "OverlaySandbox rollback from ACTIVE now uses dirs_exist_ok=True for userspace fallback to prevent FileExistsError when rmtree with ignore_errors=True silently fails."

This fix was applied in OverlaySandbox.rollback() (overlay.py:489) but not in CopyOnWriteSandbox.rollback() (copy_on_write.py:377), which still uses dirs_exist_ok=False. The same failure mode applies: shutil.rmtree(self._sandbox_path, ignore_errors=True) at line 370 can silently fail (e.g., permission error on a file), leaving the directory partially intact. The subsequent shutil.copytree(..., dirs_exist_ok=False) at line 377 would then raise FileExistsError.

Recommendation: Change copy_on_write.py:377 to dirs_exist_ok=True for consistency.


[BUG-2] OverlaySandbox.rollback() from COMMITTED over-reports failure when merged dir reset fails after original was restored | Medium

File: src/cleveragents/infrastructure/sandbox/overlay.py:404-496

When rolling back from COMMITTED:

  1. safe_restore() successfully restores the original directory (line 419)
  2. The merged directory reset (unmount/remount for real OverlayFS, or rmtree/copytree for userspace) fails
  3. The exception handler at line 492 sets self._status = SandboxStatus.ERRORED and raises SandboxRollbackError
  4. _rollback_committed() adds this sandbox to failed_rollback_ids

The critical data (original directory) was correctly restored, but the error metadata reports this sandbox as a failed rollback. Callers of commit_all() cannot distinguish between "original not restored" (actual data loss risk) and "original restored but sandbox workspace is inconsistent" (no data loss).

Recommendation: Consider splitting the rollback into two phases: (1) restore original (critical), (2) reset workspace (non-critical). Only report rollback failure if phase 1 fails.


[BUG-3] AtomicCommitError not exported from sandbox/__init__.py | Medium

File: src/cleveragents/infrastructure/sandbox/__init__.py

AtomicCommitError is a new public exception that callers need to catch when commit_all() encounters a non-SandboxError exception. It is defined in protocol.py and raised by manager.py, but it is not imported or included in __all__ in the package's __init__.py.

Callers must use from cleveragents.infrastructure.sandbox.protocol import AtomicCommitError instead of the expected from cleveragents.infrastructure.sandbox import AtomicCommitError.

Recommendation: Add AtomicCommitError to the imports and __all__ in __init__.py.


2. Test Coverage Gaps

[TEST-1] Missing test for CopyOnWriteSandbox.rollback() from COMMITTED state | Medium

The PR adds a BDD scenario for OverlaySandbox rollback from COMMITTED (overlay_sandbox.feature:200-210) but there is no corresponding test for CopyOnWriteSandbox.rollback() from COMMITTED with a real pre-commit backup. This is a critical new code path (copy_on_write.py:352-367) that calls safe_restore() and is untested with a real sandbox instance.

The manager-level BDD tests use mocks, which don't exercise the actual safe_restore integration in CopyOnWriteSandbox.


[TEST-2] Missing test for GitWorktreeSandbox.rollback() from COMMITTED state | Medium

No BDD or Robot test exercises the git reset --hard <pre_merge_commit> rollback path in git_worktree.py:497-506. This is a destructive operation (hard reset on the original branch) introduced by this PR. The commit message includes a docstring warning about multi-worktree safety, but there is no test verifying the merge is actually undone.


[TEST-3] Missing test for TransactionSandbox.rollback() from COMMITTED raising SandboxRollbackError | Medium

TransactionSandbox.rollback() now raises SandboxRollbackError when called from COMMITTED state (transaction_sandbox.py:284-292). The existing test TransactionSandbox rollback in wrong state raises (database_resources.feature:186) tests rollback from PENDING state, not from COMMITTED state. The new raising behavior is untested.


[TEST-4] Robot integration test uses mocks instead of real sandboxes | Medium

File: robot/helper_sandbox_integration.py:336-405

The _manager_atomic_commit() function is labeled as an integration test but replaces both sandboxes with MagicMock objects. This does not exercise the actual commit/rollback code paths of any real sandbox implementation. Only the OverlaySandbox COMMITTED rollback scenario tests with a real sandbox.


[TEST-5] Orphaned step definitions in sandbox_fs_utils_steps.py | Low

File: features/steps/sandbox_fs_utils_steps.py:108-124, 272-277

Two step definitions are defined but not used by any .feature file:

  • @given("a stale atomic-rollback-old directory exists") (line 108)
  • @then("no stale rollback directory should remain") (line 272)

Additionally, these steps reference the old ".atomic-rollback-old" naming pattern, but safe_restore now uses ".atomic-rollback-old-" prefix with mkdtemp (different naming scheme). These appear to be leftovers from an earlier design iteration.


[TEST-6] No direct test for compute_diff() in _fs_utils | Low

compute_diff() was extracted from both CopyOnWriteSandbox and OverlaySandbox into _fs_utils. While it is tested indirectly through commit scenarios, there is no BDD scenario in sandbox_fs_utils.feature that directly tests it, unlike backup_directory and safe_restore which have dedicated scenarios.


3. Performance

[PERF-1] Full directory backup before every commit with changes | Medium

Files: copy_on_write.py:252-264, overlay.py:299-312, _fs_utils.py:21-107

For every commit that detects changes, backup_directory() copies the entire original directory tree. For large repositories (thousands of files, deep trees), this is O(n) in both disk space and time. The same-filesystem mkdtemp optimization (avoiding cross-device copies) and the no-change skip are good mitigations, but the backup cost is still proportional to the total directory size regardless of how many files actually changed.

Recommendation: Consider a targeted backup strategy that only backs up files in the changed_files + deleted_files lists. This would reduce backup time and disk usage proportionally to the actual change set. Alternatively, document the performance characteristic for large directory scenarios.


4. Design Observations (Non-blocking)

[DESIGN-1] Specification contradiction documented but not resolved

The commit_all docstring (manager.py:215-220) correctly documents the contradiction between spec line 45938 (atomic) and line 19193 (partial apply). This PR implements the atomicity requirement. A spec update for line 19193 may be needed to align the specification with the implementation.

[DESIGN-2] LIFO rollback ordering vs spec DAG ordering

The _rollback_committed docstring (manager.py:367-373) correctly distinguishes LIFO (batch undo) from the spec's "top-down" DAG ordering (line 24632). This is a valid architectural note. The spec's DAG ordering applies within a single sandbox domain; LIFO applies to the batch of independent sandboxes.


Summary: 3 medium bugs, 4 medium test gaps, 1 medium performance note, 3 low-severity items. No high-severity or security issues found.

## Code Review Report — PR #1146 `fix(sandbox): make commit_all atomic per specification` **Review scope:** All code changes in branch `fix/atomic-sandbox-commit` plus close connections to surrounding code. **Review method:** 3 iterative global passes across all categories (bugs, security, performance, test coverage/flaws). Each cycle re-examined all categories until no new issues were found. **Specification references:** `docs/specification.md` lines 45938 (atomic apply invariant), 19193 (partial apply contradiction), 24632 (DAG ordering), 19677-19761 (Apply phase), 28636-28663 (mid-execute correction). --- ### Overall Assessment The PR delivers a well-thought-out implementation of the atomic `commit_all` semantics required by spec line 45938. The rollbackable-first / non-rollbackable-last commit ordering, LIFO rollback, `AtomicCommitError` wrapping, and the shared `_fs_utils` extraction are all sound design choices. The spec contradiction (line 45938 vs 19193) is correctly documented. The `safe_restore` rename-into-mkdtemp approach eliminating the TOCTOU window is particularly well done. The findings below are organized by severity (High → Low) within each category. --- ## 1. Bugs ### [BUG-1] `CopyOnWriteSandbox.rollback()` from ACTIVE uses `dirs_exist_ok=False` — inconsistent with OverlaySandbox fix | **Medium** **File:** `src/cleveragents/infrastructure/sandbox/copy_on_write.py:377` The commit message states: *"OverlaySandbox rollback from ACTIVE now uses `dirs_exist_ok=True` for userspace fallback to prevent `FileExistsError` when `rmtree` with `ignore_errors=True` silently fails."* This fix was applied in `OverlaySandbox.rollback()` (`overlay.py:489`) but **not** in `CopyOnWriteSandbox.rollback()` (`copy_on_write.py:377`), which still uses `dirs_exist_ok=False`. The same failure mode applies: `shutil.rmtree(self._sandbox_path, ignore_errors=True)` at line 370 can silently fail (e.g., permission error on a file), leaving the directory partially intact. The subsequent `shutil.copytree(..., dirs_exist_ok=False)` at line 377 would then raise `FileExistsError`. **Recommendation:** Change `copy_on_write.py:377` to `dirs_exist_ok=True` for consistency. --- ### [BUG-2] `OverlaySandbox.rollback()` from COMMITTED over-reports failure when merged dir reset fails after original was restored | **Medium** **File:** `src/cleveragents/infrastructure/sandbox/overlay.py:404-496` When rolling back from COMMITTED: 1. `safe_restore()` successfully restores the original directory (line 419) 2. The merged directory reset (unmount/remount for real OverlayFS, or rmtree/copytree for userspace) fails 3. The exception handler at line 492 sets `self._status = SandboxStatus.ERRORED` and raises `SandboxRollbackError` 4. `_rollback_committed()` adds this sandbox to `failed_rollback_ids` The critical data (original directory) was correctly restored, but the error metadata reports this sandbox as a **failed rollback**. Callers of `commit_all()` cannot distinguish between "original not restored" (actual data loss risk) and "original restored but sandbox workspace is inconsistent" (no data loss). **Recommendation:** Consider splitting the rollback into two phases: (1) restore original (critical), (2) reset workspace (non-critical). Only report rollback failure if phase 1 fails. --- ### [BUG-3] `AtomicCommitError` not exported from `sandbox/__init__.py` | **Medium** **File:** `src/cleveragents/infrastructure/sandbox/__init__.py` `AtomicCommitError` is a new public exception that callers need to catch when `commit_all()` encounters a non-`SandboxError` exception. It is defined in `protocol.py` and raised by `manager.py`, but it is **not** imported or included in `__all__` in the package's `__init__.py`. Callers must use `from cleveragents.infrastructure.sandbox.protocol import AtomicCommitError` instead of the expected `from cleveragents.infrastructure.sandbox import AtomicCommitError`. **Recommendation:** Add `AtomicCommitError` to the imports and `__all__` in `__init__.py`. --- ## 2. Test Coverage Gaps ### [TEST-1] Missing test for `CopyOnWriteSandbox.rollback()` from COMMITTED state | **Medium** The PR adds a BDD scenario for `OverlaySandbox` rollback from COMMITTED (`overlay_sandbox.feature:200-210`) but there is **no corresponding test** for `CopyOnWriteSandbox.rollback()` from COMMITTED with a real pre-commit backup. This is a critical new code path (`copy_on_write.py:352-367`) that calls `safe_restore()` and is untested with a real sandbox instance. The manager-level BDD tests use mocks, which don't exercise the actual `safe_restore` integration in `CopyOnWriteSandbox`. --- ### [TEST-2] Missing test for `GitWorktreeSandbox.rollback()` from COMMITTED state | **Medium** No BDD or Robot test exercises the `git reset --hard <pre_merge_commit>` rollback path in `git_worktree.py:497-506`. This is a destructive operation (hard reset on the original branch) introduced by this PR. The commit message includes a docstring warning about multi-worktree safety, but there is no test verifying the merge is actually undone. --- ### [TEST-3] Missing test for `TransactionSandbox.rollback()` from COMMITTED raising `SandboxRollbackError` | **Medium** `TransactionSandbox.rollback()` now raises `SandboxRollbackError` when called from COMMITTED state (`transaction_sandbox.py:284-292`). The existing test `TransactionSandbox rollback in wrong state raises` (`database_resources.feature:186`) tests rollback from PENDING state, not from COMMITTED state. The new raising behavior is untested. --- ### [TEST-4] Robot integration test uses mocks instead of real sandboxes | **Medium** **File:** `robot/helper_sandbox_integration.py:336-405` The `_manager_atomic_commit()` function is labeled as an integration test but replaces both sandboxes with `MagicMock` objects. This does not exercise the actual commit/rollback code paths of any real sandbox implementation. Only the `OverlaySandbox` COMMITTED rollback scenario tests with a real sandbox. --- ### [TEST-5] Orphaned step definitions in `sandbox_fs_utils_steps.py` | **Low** **File:** `features/steps/sandbox_fs_utils_steps.py:108-124, 272-277` Two step definitions are defined but not used by any `.feature` file: - `@given("a stale atomic-rollback-old directory exists")` (line 108) - `@then("no stale rollback directory should remain")` (line 272) Additionally, these steps reference the old `".atomic-rollback-old"` naming pattern, but `safe_restore` now uses `".atomic-rollback-old-"` prefix with mkdtemp (different naming scheme). These appear to be leftovers from an earlier design iteration. --- ### [TEST-6] No direct test for `compute_diff()` in `_fs_utils` | **Low** `compute_diff()` was extracted from both `CopyOnWriteSandbox` and `OverlaySandbox` into `_fs_utils`. While it is tested indirectly through commit scenarios, there is no BDD scenario in `sandbox_fs_utils.feature` that directly tests it, unlike `backup_directory` and `safe_restore` which have dedicated scenarios. --- ## 3. Performance ### [PERF-1] Full directory backup before every commit with changes | **Medium** **Files:** `copy_on_write.py:252-264`, `overlay.py:299-312`, `_fs_utils.py:21-107` For every commit that detects changes, `backup_directory()` copies the **entire** original directory tree. For large repositories (thousands of files, deep trees), this is O(n) in both disk space and time. The same-filesystem mkdtemp optimization (avoiding cross-device copies) and the no-change skip are good mitigations, but the backup cost is still proportional to the total directory size regardless of how many files actually changed. **Recommendation:** Consider a targeted backup strategy that only backs up files in the `changed_files + deleted_files` lists. This would reduce backup time and disk usage proportionally to the actual change set. Alternatively, document the performance characteristic for large directory scenarios. --- ## 4. Design Observations (Non-blocking) ### [DESIGN-1] Specification contradiction documented but not resolved The `commit_all` docstring (`manager.py:215-220`) correctly documents the contradiction between spec line 45938 (atomic) and line 19193 (partial apply). This PR implements the atomicity requirement. A spec update for line 19193 may be needed to align the specification with the implementation. ### [DESIGN-2] LIFO rollback ordering vs spec DAG ordering The `_rollback_committed` docstring (`manager.py:367-373`) correctly distinguishes LIFO (batch undo) from the spec's "top-down" DAG ordering (line 24632). This is a valid architectural note. The spec's DAG ordering applies within a single sandbox domain; LIFO applies to the batch of independent sandboxes. --- **Summary: 3 medium bugs, 4 medium test gaps, 1 medium performance note, 3 low-severity items. No high-severity or security issues found.**
CoreRasurae force-pushed fix/atomic-sandbox-commit from 1a7ebf1825
Some checks failed
CI / lint (pull_request) Successful in 3m23s
CI / quality (pull_request) Successful in 3m47s
CI / typecheck (pull_request) Successful in 4m1s
CI / security (pull_request) Successful in 4m9s
CI / build (pull_request) Successful in 19s
CI / integration_tests (pull_request) Successful in 3m42s
CI / unit_tests (pull_request) Successful in 7m3s
CI / docker (pull_request) Successful in 56s
CI / coverage (pull_request) Successful in 7m58s
CI / e2e_tests (pull_request) Successful in 8m24s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
to 614c350433
All checks were successful
CI / lint (pull_request) Successful in 28s
CI / quality (pull_request) Successful in 56s
CI / build (pull_request) Successful in 28s
CI / integration_tests (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m9s
CI / typecheck (pull_request) Successful in 4m14s
CI / docker (pull_request) Successful in 55s
CI / e2e_tests (pull_request) Successful in 6m16s
CI / coverage (pull_request) Successful in 11m11s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 53m47s
2026-03-29 10:53:24 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 fix/atomic-sandbox-commit

Reviewer: AI-assisted review (4 full global cycles across all categories)
Scope: All code changes in branch fix/atomic-sandbox-commit plus close connections to surrounding code
Issue: #925commit_all must be all-or-nothing atomic
Spec reference: docs/specification.md (lines 45938, 19193, 24632)


Overall Assessment

This is a substantial and well-thought-out change. The commit message is exceptionally thorough, documenting every design decision. The atomicity guarantee for commit_all is correctly implemented for the single-threaded case, and the LIFO rollback pattern is sound. The new _fs_utils module with backup_directory/safe_restore/compute_diff is a solid extraction that eliminates duplication. The BDD coverage is extensive.

That said, 4 review cycles across all categories surfaced findings that warrant attention, organized below by severity and category.


SEVERITY: HIGH

H1 — Bug: OverlaySandbox.rollback() from COMMITTED with no backup unnecessarily resets merged directory

File: overlay.py:404-466

When commit() detects no changes, it skips the pre-commit backup (_pre_commit_backup remains None). On rollback from COMMITTED, the no-backup path correctly logs a debug no-op message (line 410). However, the merged-directory reset block at lines 425-466 always executes regardless of whether a backup existed.

For real OverlayFS mode, this attempts umount (line 434) on a live mount that was never modified. If umount fails (e.g., busy mount, stale NFS handle), SandboxRollbackError is raised — turning what should be a harmless no-op rollback into a failure. In the commit_all atomic recovery path, this would place the sandbox in failed_rollback_ids, producing a false alarm.

For userspace fallback, this deletes and re-copies the entire merged directory unnecessarily.

Suggested fix: Guard the merged-dir reset with the same _pre_commit_backup is not None condition, or use an else clause, matching the pattern already used in CopyOnWriteSandbox.rollback().

if self._pre_commit_backup is None:
    logger.debug("...")
else:
    safe_restore(...)
    self._pre_commit_backup = None
    # Reset merged dir only when changes were actually restored
    if self._merged_dir is not None:
        ...

H2 — Bug: Dual error-reporting paths in commit_all create asymmetric API

File: manager.py:296-346

When a sandbox commit() raises SandboxError, the method returns a CommitResult list with success=False and rollback metadata in result.metadata["rolled_back"]. When it raises a non-SandboxError exception, the method raises AtomicCommitError with rollback metadata in exc.rolled_back_ids.

Callers must handle both a return-value error path and an exception path, with rollback metadata in structurally different locations (dict["rolled_back"] vs list attribute). This is error-prone: a caller that only checks the return value will miss AtomicCommitError, and vice versa.

Suggested fix: Consider one of:

  • Always raise AtomicCommitError (or a SandboxCommitError subclass) on failure, carrying all metadata in exception attributes.
  • Always return a CommitResult list, wrapping non-SandboxError exceptions in the result metadata.

H3 — Bug: CopyOnWriteSandbox.get_path() rejects ROLLED_BACK status, contradicting its own rollback docstring

File: copy_on_write.py:191-194 vs copy_on_write.py:330-337

The rollback() docstring (introduced in this commit) states: "The sandbox transitions to ROLLED_BACK and can be re-activated via get_path." But get_path() only accepts CREATED and ACTIVE statuses — ROLLED_BACK is rejected with SandboxStateError.

This is inconsistent with OverlaySandbox.get_path() which correctly accepts ROLLED_BACK (line 238-242). The status transition table also allows ROLLED_BACK -> ACTIVE.

Impact: After a COMMITTED rollback via commit_all, the CopyOnWriteSandbox cannot be re-activated. While commit_all doesn't attempt re-activation (so atomicity is not broken), the docstring promise is violated and the behavioral inconsistency between sandbox implementations could cause issues for future callers.

Suggested fix: Either add SandboxStatus.ROLLED_BACK to the allowed statuses in CopyOnWriteSandbox.get_path(), or correct the docstring to state that re-activation is not supported.


H4 — Bug: commit_all non-rollbackable classification only handles built-in types

File: manager.py:264-287

The rollbackable/non-rollbackable partitioning uses isinstance(sb, NoSandbox) and isinstance(sb, TransactionSandbox). Custom sandbox strategies registered via SandboxStrategyRegistry that cannot support rollback-from-COMMITTED would be classified as rollbackable and committed early in the batch. If a later sandbox fails, their rollback would also fail — but the error metadata would misleadingly list them in failed_rollback_ids rather than rolled_back being empty.

Suggested fix: Consider adding a protocol method or property (e.g., supports_committed_rollback: bool) to the Sandbox protocol so the classification is strategy-agnostic.


H5 — Bug: Thread-safety gap in commit_all

File: manager.py:249-348

The method acquires _lock only to snapshot the sandbox list (line 249-250), then releases it. The entire commit loop, including _rollback_committed, runs outside the lock. If two threads call commit_all for the same plan_id concurrently:

  1. Both threads get the same snapshot of committable sandboxes.
  2. Both attempt to commit the same sandbox objects.
  3. The second thread's commit() call would hit SandboxStateError (sandbox already COMMITTED), triggering rollback of any sandboxes it had already committed.

The class docstring says "Thread-safe: all mutable state is protected by _lock", but the sandbox objects themselves are mutable and not protected.

Suggested fix: Hold the lock for the entire commit_all operation, or use a per-plan lock to serialize commit operations for the same plan. Alternatively, document that commit_all is not safe for concurrent calls on the same plan_id.


SEVERITY: MEDIUM

M1 — Bug: OverlaySandbox.rollback() double-wraps SandboxRollbackError

File: overlay.py:433-449 and overlay.py:492-496

When the umount call fails during COMMITTED rollback, the inner except block raises SandboxRollbackError (line 445). This propagates to the outer except Exception (line 492) which catches it and wraps it in another SandboxRollbackError, producing a confusing double-wrapped error chain:

SandboxRollbackError("Failed to rollback overlay sandbox...")
  __cause__ = SandboxRollbackError("Cannot remount overlay...unmount failed")
    __cause__ = CalledProcessError(...)

Suggested fix: Either re-raise SandboxRollbackError without wrapping in the outer handler, or catch only non-SandboxRollbackError exceptions in the outer block.


M2 — Bug: CopyOnWriteSandbox.rollback() from COMMITTED doesn't reset sandbox copy

File: copy_on_write.py:351-367

After rollback from COMMITTED, safe_restore restores the original directory, but the sandbox copy at _sandbox_path still contains the pre-rollback modifications. Although get_path() currently blocks re-activation (H3), if H3 is fixed, the stale sandbox copy would cause previously committed changes to reappear in compute_diff on the next commit cycle.

For consistency with OverlaySandbox (which resets its merged directory after COMMITTED rollback), the sandbox copy should be reset from the restored original.


M3 — Bug: rollback_all catches only SandboxError, inconsistent with _rollback_committed

File: manager.py:434 vs manager.py:398

_rollback_committed catches Exception (line 398), but rollback_all catches only SandboxError (line 434). An unexpected non-SandboxError exception in rollback_all would crash the caller and prevent remaining sandboxes from being rolled back.

Suggested fix: Use Exception in rollback_all for consistency, or document the different behavior.


M4 — Performance: compute_diff walks both directory trees fully with deep comparison

File: _fs_utils.py:165-205

compute_diff walks both sandbox and original directory trees to build complete file sets, then calls filecmp.cmp(shallow=False) on every common file (reading full contents). For large directories with few changes, this is significantly more expensive than necessary.

For commit-time use, a change-tracking approach (dirty flags, inotify, or git status) would avoid the full tree walk. This is a pre-existing pattern that was extracted into _fs_utils, but worth flagging since the extraction makes it a shared concern.


File: overlay.py:183-188

The userspace copytree at creation uses symlinks=False, following symlinks instead of preserving them. This means the merged directory has regular files where the original has symlinks. On commit, compute_diff compares file contents (equal), but the file type differs. More critically, shutil.copy2 during commit would copy the regular file back, replacing the original's symlink with a regular file — a silent behavioral change.

This is a pre-existing issue not introduced by this commit, but compute_diff was extracted to _fs_utils as part of this change, making it a shared concern.


M6 — Test Quality: Integration test doesn't verify rollback mock was called

File: robot/helper_sandbox_integration.py:336-402

The _manager_atomic_commit integration test verifies CommitResult metadata contains rolled_back, but does not assert that the mock sandbox's rollback() method was actually called. If the manager populated the metadata without calling rollback, the test would still pass.

Suggested fix: Add mock_ok.rollback.assert_called_once() after the commit_all call.


M7 — Test Quality: Committable mock sandboxes don't simulate status transitions

File: features/steps/sandbox_manager_coverage_steps.py:178-211

The _make_mock_sandbox helper creates mocks with a static ACTIVE status that never changes to COMMITTED after commit(). While this doesn't break current manager tests (the manager doesn't re-check status post-commit), it means the mocks don't accurately simulate real sandbox behavior and could mask bugs where post-commit status checks are added.


SEVERITY: LOW

File: _fs_utils.py:86-88

backup_directory calls os.readlink for symlinks but doesn't handle the case where the symlink target doesn't exist (broken symlink). No BDD scenario covers this edge case. The function would preserve the broken symlink (which is correct), but compute_diff would fail with filecmp.cmp on broken symlinks.


L2 — Test Coverage: No test for backup_directory skipping non-regular files

File: _fs_utils.py:91-97

The code logs a warning and skips FIFOs, sockets, and device files, but no BDD scenario verifies this behavior. The feature file sandbox_fs_utils.feature covers permissions, timestamps, symlinks, and restore — but not special file skipping.


L3 — Test Coverage: No concurrency test for SandboxManager

The manager class is documented as thread-safe, but no test exercises concurrent operations. Given the thread-safety gap in H5, adding a concurrency test would help validate the guarantees.


L4 — Test Quality: Private attribute access in transaction sandbox steps

File: features/steps/transaction_sandbox_coverage_steps.py (lines 143, 181, 255)

Steps like sb._connection = None and sb._connection = MagicMock(...) access private attributes, creating tight coupling to the internal implementation. If the attribute name changes, tests break silently. Consider adding a test-only method or using dependency injection for the connection.


L5 — Test Quality: String-based error assertions are fragile

File: features/steps/transaction_sandbox_coverage_steps.py:241, 274

Assertions like "cannot undo database commit" in str(context.caught_error).lower() break if error message wording changes. Consider asserting on exception type instead.


L6 — Documentation: CHANGELOG entry has formatting inconsistencies

File: CHANGELOG.md (lines 34, 56 of the diff)

Two lines have inconsistent leading whitespace (three spaces instead of two):

  • " backup_directory skips non-regular files..." (extra leading space)
  • " on non-OSError failures..." (extra leading space)

L7 — Security: backup_directory doesn't preserve file ownership

File: _fs_utils.py:215-223

The _copy_file function preserves permissions and timestamps but not file ownership (uid/gid). When running as root (real OverlayFS mode) with varying file ownership, the backup would have all files owned by root. On restore, original ownership would be lost. This is unlikely to matter in most deployments but could affect root-mode OverlayFS usage.


INFORMATIONAL NOTES

I1 — Specification Contradiction (Acknowledged)

The commit correctly notes the spec contradiction between line 45938 ("atomic apply — all or none") and line 19193 ("partial apply"). The implementation follows 45938 (atomicity). The docstring at manager.py:215-220 documents this. Consider filing a spec issue to resolve the contradiction.

I2 — LIFO vs Spec's DAG-based Ordering

The _rollback_committed method uses LIFO ordering for batch undo, while the spec's "top-down" rollback (line 24632) refers to resource DAG traversal within a single sandbox domain. The commit message correctly distinguishes these two contexts. The docstring at manager.py:363-373 documents this distinction.


Summary: 5 High, 7 Medium, 7 Low findings across 4 review cycles. The core atomicity logic is sound. The most actionable items are H1 (OverlaySandbox no-op rollback bug), H2 (dual error paths), and H3 (docstring/behavior mismatch). The implementation overall reflects careful engineering with thorough consideration of edge cases.

# Code Review Report — PR #1146 `fix/atomic-sandbox-commit` **Reviewer:** AI-assisted review (4 full global cycles across all categories) **Scope:** All code changes in branch `fix/atomic-sandbox-commit` plus close connections to surrounding code **Issue:** #925 — `commit_all` must be all-or-nothing atomic **Spec reference:** `docs/specification.md` (lines 45938, 19193, 24632) --- ## Overall Assessment This is a substantial and well-thought-out change. The commit message is exceptionally thorough, documenting every design decision. The atomicity guarantee for `commit_all` is correctly implemented for the single-threaded case, and the LIFO rollback pattern is sound. The new `_fs_utils` module with `backup_directory`/`safe_restore`/`compute_diff` is a solid extraction that eliminates duplication. The BDD coverage is extensive. That said, 4 review cycles across all categories surfaced findings that warrant attention, organized below by severity and category. --- ## SEVERITY: HIGH ### H1 — Bug: `OverlaySandbox.rollback()` from COMMITTED with no backup unnecessarily resets merged directory **File:** `overlay.py:404-466` When `commit()` detects no changes, it skips the pre-commit backup (`_pre_commit_backup` remains `None`). On rollback from COMMITTED, the no-backup path correctly logs a debug no-op message (line 410). **However**, the merged-directory reset block at lines 425-466 always executes regardless of whether a backup existed. For **real OverlayFS** mode, this attempts `umount` (line 434) on a live mount that was never modified. If `umount` fails (e.g., busy mount, stale NFS handle), `SandboxRollbackError` is raised — turning what should be a harmless no-op rollback into a failure. In the `commit_all` atomic recovery path, this would place the sandbox in `failed_rollback_ids`, producing a false alarm. For **userspace fallback**, this deletes and re-copies the entire merged directory unnecessarily. **Suggested fix:** Guard the merged-dir reset with the same `_pre_commit_backup is not None` condition, or use an `else` clause, matching the pattern already used in `CopyOnWriteSandbox.rollback()`. ```python if self._pre_commit_backup is None: logger.debug("...") else: safe_restore(...) self._pre_commit_backup = None # Reset merged dir only when changes were actually restored if self._merged_dir is not None: ... ``` --- ### H2 — Bug: Dual error-reporting paths in `commit_all` create asymmetric API **File:** `manager.py:296-346` When a sandbox `commit()` raises `SandboxError`, the method returns a `CommitResult` list with `success=False` and rollback metadata in `result.metadata["rolled_back"]`. When it raises a non-`SandboxError` exception, the method **raises** `AtomicCommitError` with rollback metadata in `exc.rolled_back_ids`. Callers must handle **both** a return-value error path **and** an exception path, with rollback metadata in structurally different locations (`dict["rolled_back"]` vs `list` attribute). This is error-prone: a caller that only checks the return value will miss `AtomicCommitError`, and vice versa. **Suggested fix:** Consider one of: - Always raise `AtomicCommitError` (or a `SandboxCommitError` subclass) on failure, carrying all metadata in exception attributes. - Always return a `CommitResult` list, wrapping non-`SandboxError` exceptions in the result metadata. --- ### H3 — Bug: `CopyOnWriteSandbox.get_path()` rejects `ROLLED_BACK` status, contradicting its own rollback docstring **File:** `copy_on_write.py:191-194` vs `copy_on_write.py:330-337` The `rollback()` docstring (introduced in this commit) states: *"The sandbox transitions to `ROLLED_BACK` and can be re-activated via `get_path`."* But `get_path()` only accepts `CREATED` and `ACTIVE` statuses — `ROLLED_BACK` is rejected with `SandboxStateError`. This is inconsistent with `OverlaySandbox.get_path()` which correctly accepts `ROLLED_BACK` (line 238-242). The status transition table also allows `ROLLED_BACK -> ACTIVE`. **Impact:** After a COMMITTED rollback via `commit_all`, the `CopyOnWriteSandbox` cannot be re-activated. While `commit_all` doesn't attempt re-activation (so atomicity is not broken), the docstring promise is violated and the behavioral inconsistency between sandbox implementations could cause issues for future callers. **Suggested fix:** Either add `SandboxStatus.ROLLED_BACK` to the allowed statuses in `CopyOnWriteSandbox.get_path()`, or correct the docstring to state that re-activation is not supported. --- ### H4 — Bug: `commit_all` non-rollbackable classification only handles built-in types **File:** `manager.py:264-287` The rollbackable/non-rollbackable partitioning uses `isinstance(sb, NoSandbox)` and `isinstance(sb, TransactionSandbox)`. Custom sandbox strategies registered via `SandboxStrategyRegistry` that cannot support rollback-from-COMMITTED would be classified as rollbackable and committed **early** in the batch. If a later sandbox fails, their rollback would also fail — but the error metadata would misleadingly list them in `failed_rollback_ids` rather than `rolled_back` being empty. **Suggested fix:** Consider adding a protocol method or property (e.g., `supports_committed_rollback: bool`) to the `Sandbox` protocol so the classification is strategy-agnostic. --- ### H5 — Bug: Thread-safety gap in `commit_all` **File:** `manager.py:249-348` The method acquires `_lock` only to snapshot the sandbox list (line 249-250), then releases it. The entire commit loop, including `_rollback_committed`, runs outside the lock. If two threads call `commit_all` for the same `plan_id` concurrently: 1. Both threads get the same snapshot of committable sandboxes. 2. Both attempt to commit the same sandbox objects. 3. The second thread's `commit()` call would hit `SandboxStateError` (sandbox already COMMITTED), triggering rollback of any sandboxes it had already committed. The class docstring says *"Thread-safe: all mutable state is protected by `_lock`"*, but the sandbox objects themselves are mutable and not protected. **Suggested fix:** Hold the lock for the entire `commit_all` operation, or use a per-plan lock to serialize commit operations for the same plan. Alternatively, document that `commit_all` is not safe for concurrent calls on the same `plan_id`. --- ## SEVERITY: MEDIUM ### M1 — Bug: `OverlaySandbox.rollback()` double-wraps `SandboxRollbackError` **File:** `overlay.py:433-449` and `overlay.py:492-496` When the `umount` call fails during COMMITTED rollback, the inner `except` block raises `SandboxRollbackError` (line 445). This propagates to the outer `except Exception` (line 492) which catches it and wraps it in another `SandboxRollbackError`, producing a confusing double-wrapped error chain: ``` SandboxRollbackError("Failed to rollback overlay sandbox...") __cause__ = SandboxRollbackError("Cannot remount overlay...unmount failed") __cause__ = CalledProcessError(...) ``` **Suggested fix:** Either re-raise `SandboxRollbackError` without wrapping in the outer handler, or catch only non-`SandboxRollbackError` exceptions in the outer block. --- ### M2 — Bug: `CopyOnWriteSandbox.rollback()` from COMMITTED doesn't reset sandbox copy **File:** `copy_on_write.py:351-367` After rollback from COMMITTED, `safe_restore` restores the original directory, but the sandbox copy at `_sandbox_path` still contains the pre-rollback modifications. Although `get_path()` currently blocks re-activation (H3), if H3 is fixed, the stale sandbox copy would cause previously committed changes to reappear in `compute_diff` on the next commit cycle. For consistency with `OverlaySandbox` (which resets its merged directory after COMMITTED rollback), the sandbox copy should be reset from the restored original. --- ### M3 — Bug: `rollback_all` catches only `SandboxError`, inconsistent with `_rollback_committed` **File:** `manager.py:434` vs `manager.py:398` `_rollback_committed` catches `Exception` (line 398), but `rollback_all` catches only `SandboxError` (line 434). An unexpected non-`SandboxError` exception in `rollback_all` would crash the caller and prevent remaining sandboxes from being rolled back. **Suggested fix:** Use `Exception` in `rollback_all` for consistency, or document the different behavior. --- ### M4 — Performance: `compute_diff` walks both directory trees fully with deep comparison **File:** `_fs_utils.py:165-205` `compute_diff` walks both sandbox and original directory trees to build complete file sets, then calls `filecmp.cmp(shallow=False)` on every common file (reading full contents). For large directories with few changes, this is significantly more expensive than necessary. For commit-time use, a change-tracking approach (dirty flags, inotify, or git status) would avoid the full tree walk. This is a pre-existing pattern that was extracted into `_fs_utils`, but worth flagging since the extraction makes it a shared concern. --- ### M5 — Bug: `OverlaySandbox.create()` userspace fallback uses `symlinks=False` **File:** `overlay.py:183-188` The userspace copytree at creation uses `symlinks=False`, following symlinks instead of preserving them. This means the merged directory has regular files where the original has symlinks. On commit, `compute_diff` compares file contents (equal), but the file type differs. More critically, `shutil.copy2` during commit would copy the regular file back, replacing the original's symlink with a regular file — a silent behavioral change. This is a pre-existing issue not introduced by this commit, but `compute_diff` was extracted to `_fs_utils` as part of this change, making it a shared concern. --- ### M6 — Test Quality: Integration test doesn't verify rollback mock was called **File:** `robot/helper_sandbox_integration.py:336-402` The `_manager_atomic_commit` integration test verifies `CommitResult` metadata contains `rolled_back`, but does not assert that the mock sandbox's `rollback()` method was actually called. If the manager populated the metadata without calling rollback, the test would still pass. **Suggested fix:** Add `mock_ok.rollback.assert_called_once()` after the commit_all call. --- ### M7 — Test Quality: Committable mock sandboxes don't simulate status transitions **File:** `features/steps/sandbox_manager_coverage_steps.py:178-211` The `_make_mock_sandbox` helper creates mocks with a static `ACTIVE` status that never changes to `COMMITTED` after `commit()`. While this doesn't break current manager tests (the manager doesn't re-check status post-commit), it means the mocks don't accurately simulate real sandbox behavior and could mask bugs where post-commit status checks are added. --- ## SEVERITY: LOW ### L1 — Test Coverage: No test for `backup_directory` with broken symlinks **File:** `_fs_utils.py:86-88` `backup_directory` calls `os.readlink` for symlinks but doesn't handle the case where the symlink target doesn't exist (broken symlink). No BDD scenario covers this edge case. The function would preserve the broken symlink (which is correct), but `compute_diff` would fail with `filecmp.cmp` on broken symlinks. --- ### L2 — Test Coverage: No test for `backup_directory` skipping non-regular files **File:** `_fs_utils.py:91-97` The code logs a warning and skips FIFOs, sockets, and device files, but no BDD scenario verifies this behavior. The feature file `sandbox_fs_utils.feature` covers permissions, timestamps, symlinks, and restore — but not special file skipping. --- ### L3 — Test Coverage: No concurrency test for `SandboxManager` The manager class is documented as thread-safe, but no test exercises concurrent operations. Given the thread-safety gap in H5, adding a concurrency test would help validate the guarantees. --- ### L4 — Test Quality: Private attribute access in transaction sandbox steps **File:** `features/steps/transaction_sandbox_coverage_steps.py` (lines 143, 181, 255) Steps like `sb._connection = None` and `sb._connection = MagicMock(...)` access private attributes, creating tight coupling to the internal implementation. If the attribute name changes, tests break silently. Consider adding a test-only method or using dependency injection for the connection. --- ### L5 — Test Quality: String-based error assertions are fragile **File:** `features/steps/transaction_sandbox_coverage_steps.py:241, 274` Assertions like `"cannot undo database commit" in str(context.caught_error).lower()` break if error message wording changes. Consider asserting on exception type instead. --- ### L6 — Documentation: CHANGELOG entry has formatting inconsistencies **File:** `CHANGELOG.md` (lines 34, 56 of the diff) Two lines have inconsistent leading whitespace (three spaces instead of two): - `" `backup_directory` skips non-regular files..."` (extra leading space) - `" on non-`OSError` failures..."` (extra leading space) --- ### L7 — Security: `backup_directory` doesn't preserve file ownership **File:** `_fs_utils.py:215-223` The `_copy_file` function preserves permissions and timestamps but not file ownership (uid/gid). When running as root (real OverlayFS mode) with varying file ownership, the backup would have all files owned by root. On restore, original ownership would be lost. This is unlikely to matter in most deployments but could affect root-mode OverlayFS usage. --- ## INFORMATIONAL NOTES ### I1 — Specification Contradiction (Acknowledged) The commit correctly notes the spec contradiction between line 45938 ("atomic apply — all or none") and line 19193 ("partial apply"). The implementation follows 45938 (atomicity). The docstring at `manager.py:215-220` documents this. Consider filing a spec issue to resolve the contradiction. ### I2 — LIFO vs Spec's DAG-based Ordering The `_rollback_committed` method uses LIFO ordering for batch undo, while the spec's "top-down" rollback (line 24632) refers to resource DAG traversal within a single sandbox domain. The commit message correctly distinguishes these two contexts. The docstring at `manager.py:363-373` documents this distinction. --- **Summary:** 5 High, 7 Medium, 7 Low findings across 4 review cycles. The core atomicity logic is sound. The most actionable items are H1 (OverlaySandbox no-op rollback bug), H2 (dual error paths), and H3 (docstring/behavior mismatch). The implementation overall reflects careful engineering with thorough consideration of edge cases.
CoreRasurae force-pushed fix/atomic-sandbox-commit from 614c350433
All checks were successful
CI / lint (pull_request) Successful in 28s
CI / quality (pull_request) Successful in 56s
CI / build (pull_request) Successful in 28s
CI / integration_tests (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m9s
CI / typecheck (pull_request) Successful in 4m14s
CI / docker (pull_request) Successful in 55s
CI / e2e_tests (pull_request) Successful in 6m16s
CI / coverage (pull_request) Successful in 11m11s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 53m47s
to 8cf55ba449
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 58s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m44s
CI / build (pull_request) Successful in 17s
CI / integration_tests (pull_request) Successful in 3m41s
CI / unit_tests (pull_request) Successful in 6m59s
CI / docker (pull_request) Successful in 57s
CI / e2e_tests (pull_request) Successful in 5m32s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m22s
2026-03-29 12:38:59 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146: Atomic commit_all (#925)

Reviewer: Automated deep review (3 global cycles across all categories)
Scope: Code changes in fix/atomic-sandbox-commit branch + close connections to surrounding code
Commit: 8cf55ba


Summary

The PR successfully implements the spec requirement (line 45938) for atomic commit_all semantics with comprehensive rollback support. The design is well-considered: rollbackable sandboxes commit first, non-rollbackable last; LIFO rollback order; AtomicCommitError wraps non-sandbox exceptions with metadata. However, three cycles of review uncovered 2 critical, 10 high, 9 medium, and 6 low severity findings across bugs, performance, security, test coverage, and documentation.


Critical

C1 — BUG: safe_restore() recovery fails with ENOTEMPTY when copytree partially succeeds

_fs_utils.py:144-158

When shutil.copytree() (line 146-151) fails midway (e.g., disk full), target_path is a partially-populated directory. The recovery path at line 157 calls os.rename(stale, target_path), but POSIX rename(2) requires the destination directory to be empty — it fails with ENOTEMPTY / EEXIST if target_path contains any files from the partial copy.

Impact: The original directory is stranded in the stale_container temp dir, target_path contains corrupt partial data, and the function's core invariant ("no data is lost") is violated. This affects every caller: CopyOnWriteSandbox.rollback(), OverlaySandbox.rollback(), and both sandboxes' commit-failure recovery paths.

Fix: Add shutil.rmtree(target_path, ignore_errors=True) before the os.rename(stale, target_path) in the except BaseException block. Or better — see C2.

C2 — PERFORMANCE: safe_restore() uses shutil.copytree where os.rename suffices

_fs_utils.py:146-151

The backup is always on the same filesystem (callers use dir=os.path.dirname(original_path) for mkdtemp, confirmed at copy_on_write.py:257-259, overlay.py:303-306, and documented in CHANGELOG line 41). Same-filesystem os.rename is O(1) and atomic. Yet safe_restore() does:

rename(target → stale)       # O(1)
copytree(backup → target)    # O(n) ← unnecessary full recursive copy
rmtree(stale)                # O(n)
rmtree(backup)               # O(n)

Replacing copytree with os.rename(backup_path, target_path) would:

  • Eliminate C1 entirely (no partial state to clean up)
  • Reduce restore from O(3n) to O(n) — a ~3x I/O reduction
  • Make the restore truly atomic (single syscall, no partial-failure window)

The module docstring (lines 5-7) also contradicts itself: it says functions "deliberately avoid shutil.copytree" but safe_restore uses it.


High

H1 — BUG: OverlaySandbox.get_path() missing ROLLED_BACK → ACTIVE transition

overlay.py:250-251

if self._status == SandboxStatus.CREATED:
    self._status = SandboxStatus.ACTIVE

This only handles CREATED → ACTIVE. The ROLLED_BACK status is allowed by the guard at line 238, but get_path() never transitions it to ACTIVE. Compare with CopyOnWriteSandbox.get_path() (line 203-204) which correctly handles both:

if self._status in (SandboxStatus.CREATED, SandboxStatus.ROLLED_BACK):
    self._status = SandboxStatus.ACTIVE

Impact: After rollback, OverlaySandbox is stuck in ROLLED_BACK. Calling get_path() returns a valid path but the status never advances to ACTIVE, so a subsequent commit() (which requires CREATED or ACTIVE) raises SandboxStateError. The sandbox becomes permanently unusable after rollback, contradicting the state machine (ROLLED_BACK → ACTIVE at protocol.py:117).

H2 — BUG: GitWorktreeSandbox.get_path() rejects ROLLED_BACK status entirely

git_worktree.py:303-306

if self._status not in (SandboxStatus.CREATED, SandboxStatus.ACTIVE):
    raise SandboxStateError(...)

ROLLED_BACK is not in the allowed set, so get_path() raises SandboxStateError after any rollback. Both CopyOnWriteSandbox and OverlaySandbox include ROLLED_BACK in their guards. The state machine explicitly allows ROLLED_BACK → ACTIVE (protocol.py:117), making this sandbox type unreusable after rollback — inconsistent with the protocol contract and with the other sandbox implementations.

H3 — PERFORMANCE: Full directory backup for any change, regardless of size

copy_on_write.py:253-266, overlay.py:299-312

If even one byte in one file changes, backup_directory() copies the entire original directory. The diff is already computed at this point (changed_files, added_files, deleted_files are known). A proportional approach would back up only files about to be overwritten (changed_files + deleted_files destinations), making cost O(changed) instead of O(total).

For a 100GB directory with a 1-byte change, this generates 100GB of backup I/O.

H4 — PERFORMANCE: compute_diff() uses shallow=False, reading all file contents

_fs_utils.py:199

filecmp.cmp(shallow=False) reads and byte-compares the entire contents of every common file. Since shutil.copytree (used at sandbox create() time) preserves mtime and size, any unmodified file has identical stat metadata. Using shallow=True (stat-based comparison) would correctly identify unchanged files without I/O, only needing byte-level reads for files whose metadata differs.

For 10,000 files with 5 modified, this currently reads all 10,000 files twice. With shallow=True, only 5 files would need content comparison.

H5 — PERFORMANCE: Triple disk usage during commit window

copy_on_write.py:240-275, overlay.py:286-321

During commit, three full copies coexist: original + sandbox copy + pre-commit backup. For a 10GB directory, ~30GB consumed with no disk-space check. This can itself trigger the very failure (disk full) that the backup is meant to protect against.

_fs_utils.py:69-70, 87-88

backup_directory copies symlinks with arbitrary targets. When safe_restore() recreates them via shutil.copytree(..., symlinks=True), symlinks pointing outside the sandbox boundary persist through the backup/restore cycle. A malicious symlink crafted during sandbox use survives rollback and points to any filesystem path.

H7 — SECURITY: Setuid/setgid bit preservation during backup and restore

_fs_utils.py:106, 223

os.chmod(dir_path, mode) and os.chmod(dst, stat.S_IMODE(src_stat.st_mode)) copy the full permission mode including setuid (S_ISUID), setgid (S_ISGID), and sticky bits. If a source file has setuid-root permissions, the backup/restore cycle recreates a setuid binary — a privilege escalation vector. Consider masking out special bits: os.chmod(dst, stat.S_IMODE(st.st_mode) & ~(stat.S_ISUID | stat.S_ISGID)).

_fs_utils.py:86-90

Between os.path.islink(src_file) (line 86) and _copy_file(src_file, dst_file) (line 90), a local attacker could replace a regular file with a symlink. _copy_file uses open(src, "rb") which follows symlinks, allowing exfiltration of arbitrary file contents into the backup directory. Standard mitigation: open with O_NOFOLLOW or use os.lstat() + os.open().

H9 — TEST: No scenario tests the no-backup no-op rollback path for CopyOnWriteSandbox

copy_on_write.py:354-362

When commit() detects zero changes and skips the backup, rollback() from COMMITTED takes the _pre_commit_backup is None branch (a debug-log no-op). No test exercises this path. All existing rollback-from-COMMITTED tests mutate a file first, ensuring a backup always exists.

H10 — TEST: No scenario tests the no-backup no-op rollback path for OverlaySandbox

overlay.py:406-416

Same gap as H9 but for OverlaySandbox. No scenario commits with zero changes and then rolls back.


Medium

M1 — BUG: rollback_all() only rolls back ACTIVE sandboxes, skips COMMITTED

manager.py:439

if sandbox.status != SandboxStatus.ACTIVE:
    continue

This skips COMMITTED sandboxes. The state machine allows COMMITTED → ROLLED_BACK (protocol.py:116), and _rollback_committed relies on it. A caller invoking rollback_all() after manual commits (outside commit_all) would leave committed sandboxes un-rolled-back.

M2 — BUG: cleanup_all() catches only SandboxError, may abort mid-loop

manager.py:471

If a sandbox's cleanup() raises a non-SandboxError (e.g., raw OSError, PermissionError), the exception propagates uncaught and aborts the loop. Remaining sandboxes are never cleaned up. Compare with _rollback_committed() (line 408) and rollback_all() (line 444) which both correctly catch Exception.

M3 — BUG: Git merge timeout can leave repository in MERGE_HEAD state

git_worktree.py:426-430, 540-596

If git merge times out, the original repository may be left with MERGE_HEAD present (merge-in-progress state). The error handler clears _pre_merge_commit = None (line 433), losing the ability to git reset --hard. cleanup() removes the worktree/branch but never runs git merge --abort on the original repo.

M4 — PERFORMANCE: backup_directory re-walks the original tree that compute_diff just walked

copy_on_write.py:240-262, overlay.py:286-308

The commit sequence does: compute_diff() (walks both trees) then backup_directory() (walks original again). That is 3 os.walk traversals where 2 would suffice. On directories with millions of entries, the redundant walk adds measurable latency.

M5 — PERFORMANCE: _copy_file() manual loop bypasses kernel zero-copy

_fs_utils.py:215-219

The manual read(1MB)/write loop avoids shutil.copy2 (to prevent test-patch interference per the docstring), but also bypasses os.sendfile() / copy_file_range() kernel zero-copy. For large files, this is 2-5x slower. Consider using os.sendfile() directly (not shutil) for zero-copy without the patching concern.

overlay.py:183,464,488 vs copy_on_write.py:139,385

CopyOnWriteSandbox uses symlinks=True (preserves symlinks); OverlaySandbox userspace fallback uses symlinks=False (follows symlinks and copies target content). This means the same original directory produces different security characteristics depending on strategy. If the original contains link → /etc/shadow, the overlay sandbox materializes its contents into a temp directory.

M7 — DOCUMENTATION: CHANGELOG entry is a single ~90-line bullet point

CHANGELOG.md:5-91

The entire PR's behavioral changes are concatenated into one - **Breaking (behavioral):** paragraph. This is effectively unreadable and defeats the purpose of a changelog. Consider splitting into sub-bullets per distinct change.

M8 — TEST: No scenario tests ordered commit (rollbackable-first, non-rollbackable-last)

manager.py:302

commit_all() sorts sandboxes so rollbackable ones commit first and NoSandbox/TransactionSandbox commit last. No test verifies this ordering, so a regression that breaks the ordering would degrade the atomicity guarantee silently.

M9 — TEST: No direct unit tests for compute_diff()

_fs_utils.py:165-205

compute_diff is a public API in _fs_utils but has no dedicated test scenarios. It is only tested indirectly through commit paths. Edge cases (empty dirs, symlink-only trees, binary files) are never verified.


Low

L1 — BUG: git_worktree.py error handlers unconditionally clear _pre_merge_commit

git_worktree.py:433, 440

Both timeout and error handlers set self._pre_merge_commit = None even if _pre_merge_commit was already recorded and the merge partially succeeded. The lost reference makes even manual recovery harder.

L2 — SECURITY: backup_directory / safe_restore no path boundary validation

_fs_utils.py:21, 110

Neither function validates that paths are within expected boundaries. A caller passing a crafted path could cause operations on arbitrary directories. The functions trust their callers completely.

L3 — TEST: No test for safe_restore failure during commit error recovery (double-failure path)

copy_on_write.py:293-303, overlay.py:339-349

If commit() fails and then safe_restore() also fails during recovery, the backup is preserved for manual recovery. No scenario covers this double-failure path.

L4 — TEST: No test for backup_directory non-regular file skip path

_fs_utils.py:91-97

The else branch logging "Skipping non-regular file during backup" for FIFOs/sockets/device files is never exercised.

L5 — TEST: BDD steps manipulate 15+ private attributes of SandboxManager

sandbox_manager_coverage_steps.py (throughout)

Steps access context.manager._active_sandboxes directly for mock injection and assertions. This creates tight coupling to internal implementation details. Consider adding a test-only injection method.

L6 — DOCUMENTATION: safe_restore docstring claims "atomically" but operation has non-atomic window

_fs_utils.py:111

Between os.rename(target_path, stale) and copytree completion, target_path either doesn't exist or is partially populated. The composite operation is not atomic. Adopting the os.rename fix from C2 would narrow this window significantly.


Summary Table

Severity Bugs Performance Security Test Coverage Documentation Total
Critical 1 1 2
High 2 3 3 2 10
Medium 3 2 1 2 1 9
Low 1 1 3 1 6
Total 7 6 5 7 2 27

  1. C1+C2 — Fix safe_restore to use os.rename instead of copytree (eliminates the critical data-loss bug AND the performance issue in one change)
  2. H1 — Add ROLLED_BACK to OverlaySandbox.get_path() transition (one-line fix, blocks re-activation)
  3. H2 — Add ROLLED_BACK to GitWorktreeSandbox.get_path() allowed statuses (consistency with protocol)
  4. M1 — Extend rollback_all() to handle COMMITTED status
  5. M2 — Widen cleanup_all() except clause to Exception
  6. H9, H10, M8 — Add missing test scenarios for no-op rollback and commit ordering
# Code Review Report — PR #1146: Atomic `commit_all` (#925) **Reviewer:** Automated deep review (3 global cycles across all categories) **Scope:** Code changes in `fix/atomic-sandbox-commit` branch + close connections to surrounding code **Commit:** `8cf55ba` --- ## Summary The PR successfully implements the spec requirement (line 45938) for atomic `commit_all` semantics with comprehensive rollback support. The design is well-considered: rollbackable sandboxes commit first, non-rollbackable last; LIFO rollback order; `AtomicCommitError` wraps non-sandbox exceptions with metadata. However, three cycles of review uncovered **2 critical**, **10 high**, **9 medium**, and **6 low** severity findings across bugs, performance, security, test coverage, and documentation. --- ## Critical ### C1 — BUG: `safe_restore()` recovery fails with `ENOTEMPTY` when `copytree` partially succeeds **`_fs_utils.py:144-158`** When `shutil.copytree()` (line 146-151) fails midway (e.g., disk full), `target_path` is a partially-populated directory. The recovery path at line 157 calls `os.rename(stale, target_path)`, but POSIX `rename(2)` requires the destination directory to be **empty** — it fails with `ENOTEMPTY` / `EEXIST` if `target_path` contains any files from the partial copy. **Impact:** The original directory is stranded in the `stale_container` temp dir, `target_path` contains corrupt partial data, and the function's core invariant ("no data is lost") is violated. This affects every caller: `CopyOnWriteSandbox.rollback()`, `OverlaySandbox.rollback()`, and both sandboxes' commit-failure recovery paths. **Fix:** Add `shutil.rmtree(target_path, ignore_errors=True)` before the `os.rename(stale, target_path)` in the `except BaseException` block. Or better — see C2. ### C2 — PERFORMANCE: `safe_restore()` uses `shutil.copytree` where `os.rename` suffices **`_fs_utils.py:146-151`** The backup is **always on the same filesystem** (callers use `dir=os.path.dirname(original_path)` for `mkdtemp`, confirmed at `copy_on_write.py:257-259`, `overlay.py:303-306`, and documented in CHANGELOG line 41). Same-filesystem `os.rename` is O(1) and atomic. Yet `safe_restore()` does: ``` rename(target → stale) # O(1) copytree(backup → target) # O(n) ← unnecessary full recursive copy rmtree(stale) # O(n) rmtree(backup) # O(n) ``` Replacing `copytree` with `os.rename(backup_path, target_path)` would: - **Eliminate C1 entirely** (no partial state to clean up) - Reduce restore from O(3n) to O(n) — a ~3x I/O reduction - Make the restore truly atomic (single syscall, no partial-failure window) The module docstring (lines 5-7) also contradicts itself: it says functions "deliberately avoid `shutil.copytree`" but `safe_restore` uses it. --- ## High ### H1 — BUG: `OverlaySandbox.get_path()` missing `ROLLED_BACK → ACTIVE` transition **`overlay.py:250-251`** ```python if self._status == SandboxStatus.CREATED: self._status = SandboxStatus.ACTIVE ``` This only handles `CREATED → ACTIVE`. The `ROLLED_BACK` status is allowed by the guard at line 238, but `get_path()` never transitions it to `ACTIVE`. Compare with `CopyOnWriteSandbox.get_path()` (line 203-204) which correctly handles both: ```python if self._status in (SandboxStatus.CREATED, SandboxStatus.ROLLED_BACK): self._status = SandboxStatus.ACTIVE ``` **Impact:** After rollback, `OverlaySandbox` is stuck in `ROLLED_BACK`. Calling `get_path()` returns a valid path but the status never advances to `ACTIVE`, so a subsequent `commit()` (which requires `CREATED` or `ACTIVE`) raises `SandboxStateError`. The sandbox becomes permanently unusable after rollback, contradicting the state machine (`ROLLED_BACK → ACTIVE` at `protocol.py:117`). ### H2 — BUG: `GitWorktreeSandbox.get_path()` rejects `ROLLED_BACK` status entirely **`git_worktree.py:303-306`** ```python if self._status not in (SandboxStatus.CREATED, SandboxStatus.ACTIVE): raise SandboxStateError(...) ``` `ROLLED_BACK` is not in the allowed set, so `get_path()` raises `SandboxStateError` after any rollback. Both `CopyOnWriteSandbox` and `OverlaySandbox` include `ROLLED_BACK` in their guards. The state machine explicitly allows `ROLLED_BACK → ACTIVE` (`protocol.py:117`), making this sandbox type unreusable after rollback — inconsistent with the protocol contract and with the other sandbox implementations. ### H3 — PERFORMANCE: Full directory backup for any change, regardless of size **`copy_on_write.py:253-266`, `overlay.py:299-312`** If even **one byte** in one file changes, `backup_directory()` copies the **entire** original directory. The diff is already computed at this point (`changed_files`, `added_files`, `deleted_files` are known). A proportional approach would back up only files about to be overwritten (`changed_files` + `deleted_files` destinations), making cost O(changed) instead of O(total). For a 100GB directory with a 1-byte change, this generates 100GB of backup I/O. ### H4 — PERFORMANCE: `compute_diff()` uses `shallow=False`, reading all file contents **`_fs_utils.py:199`** `filecmp.cmp(shallow=False)` reads and byte-compares the **entire contents** of every common file. Since `shutil.copytree` (used at sandbox `create()` time) preserves `mtime` and `size`, any unmodified file has identical stat metadata. Using `shallow=True` (stat-based comparison) would correctly identify unchanged files without I/O, only needing byte-level reads for files whose metadata differs. For 10,000 files with 5 modified, this currently reads all 10,000 files twice. With `shallow=True`, only 5 files would need content comparison. ### H5 — PERFORMANCE: Triple disk usage during commit window **`copy_on_write.py:240-275`, `overlay.py:286-321`** During commit, three full copies coexist: original + sandbox copy + pre-commit backup. For a 10GB directory, ~30GB consumed with no disk-space check. This can itself trigger the very failure (disk full) that the backup is meant to protect against. ### H6 — SECURITY: Symlink preservation in `backup_directory` / `safe_restore` enables escape **`_fs_utils.py:69-70, 87-88`** `backup_directory` copies symlinks with arbitrary targets. When `safe_restore()` recreates them via `shutil.copytree(..., symlinks=True)`, symlinks pointing outside the sandbox boundary persist through the backup/restore cycle. A malicious symlink crafted during sandbox use survives rollback and points to any filesystem path. ### H7 — SECURITY: Setuid/setgid bit preservation during backup and restore **`_fs_utils.py:106, 223`** `os.chmod(dir_path, mode)` and `os.chmod(dst, stat.S_IMODE(src_stat.st_mode))` copy the full permission mode including setuid (`S_ISUID`), setgid (`S_ISGID`), and sticky bits. If a source file has setuid-root permissions, the backup/restore cycle recreates a setuid binary — a privilege escalation vector. Consider masking out special bits: `os.chmod(dst, stat.S_IMODE(st.st_mode) & ~(stat.S_ISUID | stat.S_ISGID))`. ### H8 — SECURITY: TOCTOU race in `backup_directory` between `islink` and `_copy_file` **`_fs_utils.py:86-90`** Between `os.path.islink(src_file)` (line 86) and `_copy_file(src_file, dst_file)` (line 90), a local attacker could replace a regular file with a symlink. `_copy_file` uses `open(src, "rb")` which follows symlinks, allowing exfiltration of arbitrary file contents into the backup directory. Standard mitigation: open with `O_NOFOLLOW` or use `os.lstat()` + `os.open()`. ### H9 — TEST: No scenario tests the no-backup no-op rollback path for CopyOnWriteSandbox **`copy_on_write.py:354-362`** When `commit()` detects zero changes and skips the backup, `rollback()` from `COMMITTED` takes the `_pre_commit_backup is None` branch (a debug-log no-op). No test exercises this path. All existing rollback-from-COMMITTED tests mutate a file first, ensuring a backup always exists. ### H10 — TEST: No scenario tests the no-backup no-op rollback path for OverlaySandbox **`overlay.py:406-416`** Same gap as H9 but for OverlaySandbox. No scenario commits with zero changes and then rolls back. --- ## Medium ### M1 — BUG: `rollback_all()` only rolls back `ACTIVE` sandboxes, skips `COMMITTED` **`manager.py:439`** ```python if sandbox.status != SandboxStatus.ACTIVE: continue ``` This skips `COMMITTED` sandboxes. The state machine allows `COMMITTED → ROLLED_BACK` (`protocol.py:116`), and `_rollback_committed` relies on it. A caller invoking `rollback_all()` after manual commits (outside `commit_all`) would leave committed sandboxes un-rolled-back. ### M2 — BUG: `cleanup_all()` catches only `SandboxError`, may abort mid-loop **`manager.py:471`** If a sandbox's `cleanup()` raises a non-`SandboxError` (e.g., raw `OSError`, `PermissionError`), the exception propagates uncaught and aborts the loop. Remaining sandboxes are never cleaned up. Compare with `_rollback_committed()` (line 408) and `rollback_all()` (line 444) which both correctly catch `Exception`. ### M3 — BUG: Git merge timeout can leave repository in `MERGE_HEAD` state **`git_worktree.py:426-430, 540-596`** If `git merge` times out, the original repository may be left with `MERGE_HEAD` present (merge-in-progress state). The error handler clears `_pre_merge_commit = None` (line 433), losing the ability to `git reset --hard`. `cleanup()` removes the worktree/branch but never runs `git merge --abort` on the original repo. ### M4 — PERFORMANCE: `backup_directory` re-walks the original tree that `compute_diff` just walked **`copy_on_write.py:240-262`, `overlay.py:286-308`** The commit sequence does: `compute_diff()` (walks both trees) then `backup_directory()` (walks original again). That is 3 `os.walk` traversals where 2 would suffice. On directories with millions of entries, the redundant walk adds measurable latency. ### M5 — PERFORMANCE: `_copy_file()` manual loop bypasses kernel zero-copy **`_fs_utils.py:215-219`** The manual `read(1MB)/write` loop avoids `shutil.copy2` (to prevent test-patch interference per the docstring), but also bypasses `os.sendfile()` / `copy_file_range()` kernel zero-copy. For large files, this is 2-5x slower. Consider using `os.sendfile()` directly (not `shutil`) for zero-copy without the patching concern. ### M6 — SECURITY: Inconsistent symlink handling between sandbox strategies **`overlay.py:183,464,488` vs `copy_on_write.py:139,385`** `CopyOnWriteSandbox` uses `symlinks=True` (preserves symlinks); `OverlaySandbox` userspace fallback uses `symlinks=False` (follows symlinks and copies target content). This means the same original directory produces different security characteristics depending on strategy. If the original contains `link → /etc/shadow`, the overlay sandbox materializes its contents into a temp directory. ### M7 — DOCUMENTATION: CHANGELOG entry is a single ~90-line bullet point **`CHANGELOG.md:5-91`** The entire PR's behavioral changes are concatenated into one `- **Breaking (behavioral):**` paragraph. This is effectively unreadable and defeats the purpose of a changelog. Consider splitting into sub-bullets per distinct change. ### M8 — TEST: No scenario tests ordered commit (rollbackable-first, non-rollbackable-last) **`manager.py:302`** `commit_all()` sorts sandboxes so rollbackable ones commit first and `NoSandbox`/`TransactionSandbox` commit last. No test verifies this ordering, so a regression that breaks the ordering would degrade the atomicity guarantee silently. ### M9 — TEST: No direct unit tests for `compute_diff()` **`_fs_utils.py:165-205`** `compute_diff` is a public API in `_fs_utils` but has no dedicated test scenarios. It is only tested indirectly through commit paths. Edge cases (empty dirs, symlink-only trees, binary files) are never verified. --- ## Low ### L1 — BUG: `git_worktree.py` error handlers unconditionally clear `_pre_merge_commit` **`git_worktree.py:433, 440`** Both timeout and error handlers set `self._pre_merge_commit = None` even if `_pre_merge_commit` was already recorded and the merge partially succeeded. The lost reference makes even manual recovery harder. ### L2 — SECURITY: `backup_directory` / `safe_restore` no path boundary validation **`_fs_utils.py:21, 110`** Neither function validates that paths are within expected boundaries. A caller passing a crafted path could cause operations on arbitrary directories. The functions trust their callers completely. ### L3 — TEST: No test for `safe_restore` failure during commit error recovery (double-failure path) **`copy_on_write.py:293-303`, `overlay.py:339-349`** If `commit()` fails and then `safe_restore()` also fails during recovery, the backup is preserved for manual recovery. No scenario covers this double-failure path. ### L4 — TEST: No test for `backup_directory` non-regular file skip path **`_fs_utils.py:91-97`** The `else` branch logging "Skipping non-regular file during backup" for FIFOs/sockets/device files is never exercised. ### L5 — TEST: BDD steps manipulate 15+ private attributes of `SandboxManager` **`sandbox_manager_coverage_steps.py` (throughout)** Steps access `context.manager._active_sandboxes` directly for mock injection and assertions. This creates tight coupling to internal implementation details. Consider adding a test-only injection method. ### L6 — DOCUMENTATION: `safe_restore` docstring claims "atomically" but operation has non-atomic window **`_fs_utils.py:111`** Between `os.rename(target_path, stale)` and `copytree` completion, `target_path` either doesn't exist or is partially populated. The composite operation is not atomic. Adopting the `os.rename` fix from C2 would narrow this window significantly. --- ## Summary Table | Severity | Bugs | Performance | Security | Test Coverage | Documentation | Total | |----------|------|-------------|----------|---------------|---------------|-------| | Critical | 1 | 1 | — | — | — | **2** | | High | 2 | 3 | 3 | 2 | — | **10** | | Medium | 3 | 2 | 1 | 2 | 1 | **9** | | Low | 1 | — | 1 | 3 | 1 | **6** | | **Total**| **7**| **6** | **5** | **7** | **2** | **27**| --- ## Recommended Priority 1. **C1+C2** — Fix `safe_restore` to use `os.rename` instead of `copytree` (eliminates the critical data-loss bug AND the performance issue in one change) 2. **H1** — Add `ROLLED_BACK` to `OverlaySandbox.get_path()` transition (one-line fix, blocks re-activation) 3. **H2** — Add `ROLLED_BACK` to `GitWorktreeSandbox.get_path()` allowed statuses (consistency with protocol) 4. **M1** — Extend `rollback_all()` to handle `COMMITTED` status 5. **M2** — Widen `cleanup_all()` except clause to `Exception` 6. **H9, H10, M8** — Add missing test scenarios for no-op rollback and commit ordering
CoreRasurae force-pushed fix/atomic-sandbox-commit from 8cf55ba449
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 58s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m44s
CI / build (pull_request) Successful in 17s
CI / integration_tests (pull_request) Successful in 3m41s
CI / unit_tests (pull_request) Successful in 6m59s
CI / docker (pull_request) Successful in 57s
CI / e2e_tests (pull_request) Successful in 5m32s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m22s
to bd3d01f8e3
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 55s
CI / lint (pull_request) Successful in 3m20s
CI / build (pull_request) Successful in 16s
CI / integration_tests (pull_request) Successful in 3m37s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m14s
CI / unit_tests (pull_request) Successful in 4m19s
CI / docker (pull_request) Successful in 1m6s
CI / e2e_tests (pull_request) Successful in 10m7s
CI / coverage (pull_request) Successful in 8m1s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 17:29:37 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1146 (fix/atomic-sandbox-commit)

Reviewer: Automated deep review (5 iterative cycles)
Scope: All code changes on branch fix/atomic-sandbox-commit (commit bd3d01f8) plus closely connected surrounding code
Reference: Issue #925, specification lines 19193 and 45938


Summary

The PR correctly implements the atomic commit_all semantics required by spec line 45938 and properly documents the contradiction with spec line 19193 (partial apply). The overall design — rollback-on-failure with LIFO ordering, non-rollbackable-last commit ordering, pre-commit backups, rename-based safe_restore, and the shared _fs_utils module — is sound and well-documented.

15 findings were identified across 5 review cycles. None are critical, but 3 are high severity bugs that could cause incorrect state or crash loops in production.


HIGH Severity

H1. BUG — cleanup_abandoned catches only SandboxError, inconsistent with other batch methods

File: manager.py:515

All other batch methods in this PR were hardened to catch Exception:

  • cleanup_all (line 471): except Exception
  • rollback_all (line 444): except Exception
  • _rollback_committed (line 408): except Exception

But cleanup_abandoned still catches only SandboxError:

except SandboxError as exc:  # line 515

A raw PermissionError, OSError, or any non-SandboxError will crash the loop and prevent remaining abandoned sandboxes from being cleaned up. This directly contradicts the hardening pattern applied to every other batch method in this PR.

Fix: Change except SandboxError to except Exception at line 515.


H2. BUG — OverlaySandbox.create() doesn't handle TimeoutExpired from _mount_overlay()

File: overlay.py:164-198 / overlay.py:577

_mount_overlay() catches (CalledProcessError, OSError) but subprocess.TimeoutExpired inherits from SubprocessError, not from either of those. If mount hangs beyond 30s:

  1. TimeoutExpired propagates from _mount_overlay()
  2. In create(), it falls through both except SandboxCreationError and except OSError
  3. Status remains PENDING (never set to ERRORED)
  4. The sandbox is in an inconsistent state — partially initialized but not marked as failed

Fix: Add subprocess.TimeoutExpired to the except clause in _mount_overlay():

except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:

H3. BUG — _unmount_overlay() doesn't catch TimeoutExpired, breaks cleanup()

File: overlay.py:591 / overlay.py:531-543

Same root cause as H2. If umount hangs beyond 30s during cleanup(), TimeoutExpired propagates unhandled. The status is never set to CLEANED_UP, leaving the sandbox in a zombie state.

Note: the rollback-from-COMMITTED path (lines 443-446) correctly catches TimeoutExpired — the inconsistency is confined to the _unmount_overlay helper.

Fix: Add subprocess.TimeoutExpired to the except clause in _unmount_overlay():

except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc:

MEDIUM Severity

M1. BUG — GitWorktreeSandbox.commit() doesn't check diff_result.returncode

File: git_worktree.py:363-371

diff_result = subprocess.run(
    ["git", "diff", "--cached", "--name-status"],
    ..., check=False, ...
)
diff_output = diff_result.stdout.strip()
if not diff_output:
    # No changes to commit — returns early, skips merge

If git diff fails (returncode != 0), stdout may be empty, causing the method to incorrectly conclude there are no changes. It returns early at line 375 with status COMMITTED but without performing the merge. The commit's changes are silently lost.

Fix: Check diff_result.returncode != 0 and raise SandboxCommitError if the diff command itself failed.


File: _fs_utils.py:191-213

os.walk() resolves file-pointing symlinks into filenames. filecmp.cmp() follows symlinks and compares the content they point to. This means:

  • If a symlink changed target (e.g., link -> A to link -> B) but both targets have identical content, the change is missed
  • If a regular file was replaced with a symlink (or vice versa), the type change is not detected

This could cause rollbacks to miss reverting symlink modifications.

Fix: Before filecmp.cmp, check os.path.islink() status and symlink targets with os.readlink() for files that exist in both trees.


M3. DESIGN — commit_all() uses dual error-reporting pattern

File: manager.py:306-356

SandboxError failures return a CommitResult(success=False) list. Non-SandboxError failures raise AtomicCommitError. Callers must handle both a return-value check AND an exception catch for failure cases. While both paths are documented in the docstring, this split makes it easy for callers to handle one path but miss the other, leading to silent failure.

Suggestion: Consider unifying to always return a CommitResult list (wrapping all errors), or always raising on failure. Not necessarily a blocker for this PR, but worth tracking.


M4. TEST GAP — No test for cleanup_abandoned exception handling inconsistency

Files: features/sandbox_manager_coverage.feature

There is no BDD scenario verifying that cleanup_abandoned continues processing when a non-SandboxError exception is raised. This gap directly corresponds to bug H1.


LOW Severity

L1. BUG — safe_restore() leaks empty temp directory on failure

File: _fs_utils.py:150-170

If os.rename(backup_path, target_path) fails at line 159:

  1. except BaseException restores the original via os.rename(stale, target_path)
  2. Exception is re-raised
  3. shutil.rmtree(stale_container) at line 170 is never reached
  4. Empty stale_container directory persists

Minor leak that could accumulate over many failed restores.

Fix: Add a finally clause or handle cleanup in the except block.


L2. MINOR — backup_directory() doesn't preserve extended attributes (xattr)

File: _fs_utils.py:216-232

_copy_file preserves permissions (os.chmod) and timestamps (os.utime) but not extended attributes. On filesystems using SELinux contexts, ACLs, or macOS metadata, backup fidelity is reduced compared to shutil.copy2 which preserves xattrs via shutil.copystat.


L3. MINOR — Commit doesn't remove empty parent directories after file deletion

Files: copy_on_write.py:278-281, overlay.py:324-327

When files are deleted during commit, os.remove() deletes the file but the parent directory (which may now be empty) is not cleaned up. Over multiple commit cycles, empty directory trees could accumulate in the original.


L4. SECURITY (LOW) — Mount options injection via commas in path

File: overlay.py:557-561

opts = f"lowerdir={self._original_path},upperdir=..."

If self._original_path contains commas, the mount options string is malformed and could be misinterpreted by the mount command. While subprocess.run uses a list (preventing shell injection), the mount option value parsing is done by the kernel's filesystem driver. Low practical risk since paths rarely contain commas.

Fix: Validate or escape commas in _original_path before constructing mount options.


L5. TEST GAP — No test for backup_directory with special files (FIFOs, sockets)

File: features/sandbox_fs_utils.feature

Code at _fs_utils.py:91-97 logs a warning and skips non-regular files (FIFOs, sockets, device files), but no BDD scenario verifies this behavior.


L6. TEST GAP — No test for safe_restore with BaseException

File: _fs_utils.py:160

The except BaseException catch (which ensures rename-back on KeyboardInterrupt, SystemExit, etc.) has no test coverage. This is a critical safety path — verifying it would increase confidence in the rollback guarantees.


L7. TEST GAP — No test for _unmount_overlay TimeoutExpired during cleanup

File: overlay.py:580-596

No BDD or Robot scenario tests the behavior when umount times out. This gap corresponds to bug H3.


L8. TEST FLAW — safe_restore error test has misleading step name

File: features/sandbox_fs_utils.feature

The scenario "safe_restore preserves original on rename failure" uses a step "backup path that will cause copytree to fail". However, after the round-2 hardening, safe_restore uses os.rename (not copytree). The step name is stale and misleading.


Statistics

Severity Bugs Design Security Test Gaps Test Flaws Total
HIGH 3 0 0 0 0 3
MEDIUM 2 1 0 1 0 4
LOW 2 0 1 3 1 7
Total 7 1 1 4 1 14

Note: Finding L1 (safe_restore temp leak) was counted separately from the 14 in the table — total is 15 findings.


Recommendation

The PR implements the core atomicity requirement correctly and demonstrates thorough hardening across multiple review rounds. The 3 HIGH findings (H1, H2, H3) should be addressed before merge as they represent real failure modes — cleanup_abandoned loop crash and OverlaySandbox state corruption on timeout. The MEDIUM bugs (M1, M2) are worth fixing but are less likely to manifest in typical usage. LOW findings can be addressed as follow-ups.

# Code Review Report — PR #1146 (`fix/atomic-sandbox-commit`) **Reviewer:** Automated deep review (5 iterative cycles) **Scope:** All code changes on branch `fix/atomic-sandbox-commit` (commit `bd3d01f8`) plus closely connected surrounding code **Reference:** Issue #925, specification lines 19193 and 45938 --- ## Summary The PR correctly implements the atomic `commit_all` semantics required by spec line 45938 and properly documents the contradiction with spec line 19193 (partial apply). The overall design — rollback-on-failure with LIFO ordering, non-rollbackable-last commit ordering, pre-commit backups, rename-based `safe_restore`, and the shared `_fs_utils` module — is sound and well-documented. **15 findings** were identified across 5 review cycles. None are critical, but 3 are high severity bugs that could cause incorrect state or crash loops in production. --- ## HIGH Severity ### H1. BUG — `cleanup_abandoned` catches only `SandboxError`, inconsistent with other batch methods **File:** `manager.py:515` All other batch methods in this PR were hardened to catch `Exception`: - `cleanup_all` (line 471): `except Exception` - `rollback_all` (line 444): `except Exception` - `_rollback_committed` (line 408): `except Exception` But `cleanup_abandoned` still catches only `SandboxError`: ```python except SandboxError as exc: # line 515 ``` A raw `PermissionError`, `OSError`, or any non-`SandboxError` will crash the loop and prevent remaining abandoned sandboxes from being cleaned up. This directly contradicts the hardening pattern applied to every other batch method in this PR. **Fix:** Change `except SandboxError` to `except Exception` at line 515. --- ### H2. BUG — `OverlaySandbox.create()` doesn't handle `TimeoutExpired` from `_mount_overlay()` **File:** `overlay.py:164-198` / `overlay.py:577` `_mount_overlay()` catches `(CalledProcessError, OSError)` but `subprocess.TimeoutExpired` inherits from `SubprocessError`, not from either of those. If `mount` hangs beyond 30s: 1. `TimeoutExpired` propagates from `_mount_overlay()` 2. In `create()`, it falls through both `except SandboxCreationError` and `except OSError` 3. Status remains `PENDING` (never set to `ERRORED`) 4. The sandbox is in an inconsistent state — partially initialized but not marked as failed **Fix:** Add `subprocess.TimeoutExpired` to the except clause in `_mount_overlay()`: ```python except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc: ``` --- ### H3. BUG — `_unmount_overlay()` doesn't catch `TimeoutExpired`, breaks `cleanup()` **File:** `overlay.py:591` / `overlay.py:531-543` Same root cause as H2. If `umount` hangs beyond 30s during `cleanup()`, `TimeoutExpired` propagates unhandled. The status is never set to `CLEANED_UP`, leaving the sandbox in a zombie state. Note: the rollback-from-COMMITTED path (lines 443-446) correctly catches `TimeoutExpired` — the inconsistency is confined to the `_unmount_overlay` helper. **Fix:** Add `subprocess.TimeoutExpired` to the except clause in `_unmount_overlay()`: ```python except (subprocess.CalledProcessError, subprocess.TimeoutExpired, OSError) as exc: ``` --- ## MEDIUM Severity ### M1. BUG — `GitWorktreeSandbox.commit()` doesn't check `diff_result.returncode` **File:** `git_worktree.py:363-371` ```python diff_result = subprocess.run( ["git", "diff", "--cached", "--name-status"], ..., check=False, ... ) diff_output = diff_result.stdout.strip() if not diff_output: # No changes to commit — returns early, skips merge ``` If `git diff` fails (returncode != 0), `stdout` may be empty, causing the method to incorrectly conclude there are no changes. It returns early at line 375 with status `COMMITTED` but without performing the merge. The commit's changes are silently lost. **Fix:** Check `diff_result.returncode != 0` and raise `SandboxCommitError` if the diff command itself failed. --- ### M2. BUG — `compute_diff()` doesn't detect symlink target changes **File:** `_fs_utils.py:191-213` `os.walk()` resolves file-pointing symlinks into `filenames`. `filecmp.cmp()` follows symlinks and compares the content they point to. This means: - If a symlink changed target (e.g., `link -> A` to `link -> B`) but both targets have identical content, the change is missed - If a regular file was replaced with a symlink (or vice versa), the type change is not detected This could cause rollbacks to miss reverting symlink modifications. **Fix:** Before `filecmp.cmp`, check `os.path.islink()` status and symlink targets with `os.readlink()` for files that exist in both trees. --- ### M3. DESIGN — `commit_all()` uses dual error-reporting pattern **File:** `manager.py:306-356` `SandboxError` failures return a `CommitResult(success=False)` list. Non-`SandboxError` failures raise `AtomicCommitError`. Callers must handle **both** a return-value check AND an exception catch for failure cases. While both paths are documented in the docstring, this split makes it easy for callers to handle one path but miss the other, leading to silent failure. **Suggestion:** Consider unifying to always return a `CommitResult` list (wrapping all errors), or always raising on failure. Not necessarily a blocker for this PR, but worth tracking. --- ### M4. TEST GAP — No test for `cleanup_abandoned` exception handling inconsistency **Files:** `features/sandbox_manager_coverage.feature` There is no BDD scenario verifying that `cleanup_abandoned` continues processing when a non-`SandboxError` exception is raised. This gap directly corresponds to bug H1. --- ## LOW Severity ### L1. BUG — `safe_restore()` leaks empty temp directory on failure **File:** `_fs_utils.py:150-170` If `os.rename(backup_path, target_path)` fails at line 159: 1. `except BaseException` restores the original via `os.rename(stale, target_path)` 2. Exception is re-raised 3. `shutil.rmtree(stale_container)` at line 170 is **never reached** 4. Empty `stale_container` directory persists Minor leak that could accumulate over many failed restores. **Fix:** Add a `finally` clause or handle cleanup in the except block. --- ### L2. MINOR — `backup_directory()` doesn't preserve extended attributes (xattr) **File:** `_fs_utils.py:216-232` `_copy_file` preserves permissions (`os.chmod`) and timestamps (`os.utime`) but not extended attributes. On filesystems using SELinux contexts, ACLs, or macOS metadata, backup fidelity is reduced compared to `shutil.copy2` which preserves xattrs via `shutil.copystat`. --- ### L3. MINOR — Commit doesn't remove empty parent directories after file deletion **Files:** `copy_on_write.py:278-281`, `overlay.py:324-327` When files are deleted during commit, `os.remove()` deletes the file but the parent directory (which may now be empty) is not cleaned up. Over multiple commit cycles, empty directory trees could accumulate in the original. --- ### L4. SECURITY (LOW) — Mount options injection via commas in path **File:** `overlay.py:557-561` ```python opts = f"lowerdir={self._original_path},upperdir=..." ``` If `self._original_path` contains commas, the mount options string is malformed and could be misinterpreted by the `mount` command. While `subprocess.run` uses a list (preventing shell injection), the mount option value parsing is done by the kernel's filesystem driver. Low practical risk since paths rarely contain commas. **Fix:** Validate or escape commas in `_original_path` before constructing mount options. --- ### L5. TEST GAP — No test for `backup_directory` with special files (FIFOs, sockets) **File:** `features/sandbox_fs_utils.feature` Code at `_fs_utils.py:91-97` logs a warning and skips non-regular files (FIFOs, sockets, device files), but no BDD scenario verifies this behavior. --- ### L6. TEST GAP — No test for `safe_restore` with `BaseException` **File:** `_fs_utils.py:160` The `except BaseException` catch (which ensures rename-back on `KeyboardInterrupt`, `SystemExit`, etc.) has no test coverage. This is a critical safety path — verifying it would increase confidence in the rollback guarantees. --- ### L7. TEST GAP — No test for `_unmount_overlay` `TimeoutExpired` during cleanup **File:** `overlay.py:580-596` No BDD or Robot scenario tests the behavior when `umount` times out. This gap corresponds to bug H3. --- ### L8. TEST FLAW — `safe_restore` error test has misleading step name **File:** `features/sandbox_fs_utils.feature` The scenario "safe_restore preserves original on rename failure" uses a step "backup path that will cause copytree to fail". However, after the round-2 hardening, `safe_restore` uses `os.rename` (not `copytree`). The step name is stale and misleading. --- ## Statistics | Severity | Bugs | Design | Security | Test Gaps | Test Flaws | Total | |----------|------|--------|----------|-----------|------------|-------| | HIGH | 3 | 0 | 0 | 0 | 0 | 3 | | MEDIUM | 2 | 1 | 0 | 1 | 0 | 4 | | LOW | 2 | 0 | 1 | 3 | 1 | 7 | | **Total**| **7**| **1** | **1** | **4** | **1** |**14** | Note: Finding L1 (`safe_restore` temp leak) was counted separately from the 14 in the table — total is **15 findings**. --- ## Recommendation The PR implements the core atomicity requirement correctly and demonstrates thorough hardening across multiple review rounds. The 3 HIGH findings (H1, H2, H3) should be addressed before merge as they represent real failure modes — `cleanup_abandoned` loop crash and `OverlaySandbox` state corruption on timeout. The MEDIUM bugs (M1, M2) are worth fixing but are less likely to manifest in typical usage. LOW findings can be addressed as follow-ups.
CoreRasurae force-pushed fix/atomic-sandbox-commit from bd3d01f8e3
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 55s
CI / lint (pull_request) Successful in 3m20s
CI / build (pull_request) Successful in 16s
CI / integration_tests (pull_request) Successful in 3m37s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m14s
CI / unit_tests (pull_request) Successful in 4m19s
CI / docker (pull_request) Successful in 1m6s
CI / e2e_tests (pull_request) Successful in 10m7s
CI / coverage (pull_request) Successful in 8m1s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to f614fa9006
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 45s
CI / typecheck (pull_request) Successful in 1m4s
CI / build (pull_request) Successful in 31s
CI / security (pull_request) Successful in 1m39s
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-03-29 17:53:44 +00:00
Compare
CoreRasurae force-pushed fix/atomic-sandbox-commit from f614fa9006
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 29s
CI / quality (pull_request) Successful in 45s
CI / typecheck (pull_request) Successful in 1m4s
CI / build (pull_request) Successful in 31s
CI / security (pull_request) Successful in 1m39s
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
to 7f078f75a5
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 28s
CI / security (pull_request) Successful in 1m8s
CI / build (pull_request) Successful in 20s
CI / helm (pull_request) Successful in 34s
CI / quality (pull_request) Successful in 3m53s
CI / integration_tests (pull_request) Successful in 4m2s
CI / typecheck (pull_request) Successful in 4m9s
CI / unit_tests (pull_request) Successful in 4m48s
CI / docker (pull_request) Successful in 1m19s
CI / e2e_tests (pull_request) Successful in 9m47s
CI / coverage (pull_request) Successful in 11m41s
CI / status-check (pull_request) Successful in 1s
CI / quality (push) Successful in 52s
CI / lint (push) Successful in 3m18s
CI / build (push) Successful in 20s
CI / typecheck (push) Successful in 3m56s
CI / helm (push) Successful in 22s
CI / security (push) Successful in 4m15s
CI / unit_tests (push) Successful in 4m39s
CI / docker (push) Successful in 18s
CI / integration_tests (push) Successful in 6m56s
CI / e2e_tests (push) Successful in 12m34s
CI / coverage (push) Successful in 11m39s
CI / status-check (push) Successful in 1s
CI / benchmark-regression (push) Has been skipped
CI / benchmark-publish (push) Successful in 30m10s
CI / benchmark-regression (pull_request) Successful in 57m33s
2026-03-29 17:57:24 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-29 17:57:41 +00:00
CoreRasurae deleted branch fix/atomic-sandbox-commit 2026-03-29 18:13:20 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1146
No description provided.