fix(sandbox): make commit_all atomic per specification #1146
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!1146
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "fix/atomic-sandbox-commit"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Changed
SandboxManager.commit_all()from partial-commit semantics to all-or-nothing atomic operation per specification requirement. On partial failure, already-committed sandboxes are rolled back using strategy-specific mechanisms (filesystem backup for CoW/Overlay, git reset for worktree).Changes
SandboxManager.commit_all()now enforces atomicity with_rollback_committed()helperCOMMITTED → ROLLED_BACKstatus transition in protocolDesign Decision
Chose rollback-on-failure over pre-validation (can't guarantee success) or two-phase commit (too invasive for protocol). Each strategy implements rollback from COMMITTED state using its native mechanism.
Quality Gates
All nox stages pass: lint, typecheck, unit_tests (12,294 scenarios), integration_tests, coverage_report.
Closes #925
Review: APPROVED
Well-designed atomicity fix with appropriate rollback strategies per sandbox type: CoW/Overlay use pre-commit backup dirs, GitWorktree uses
git reset --hard, Transaction logs a warning (COMMIT is irreversible). The_rollback_committed()helper cleanly isolates per-sandbox error handling. State machine update to allowCOMMITTED -> ROLLED_BACKis correct. 3 BDD scenarios + 1 Robot integration test — both present.Minor Notes
CopyOnWriteSandbox._backup_directoryandOverlaySandbox._backup_directoryare nearly identical (~30 duplicated lines). Consider extracting to a shared utility in a follow-up.Pre-commit backups via
tempfile.mkdtemp+ full directory copy could be expensive for large workspaces. No size guard or warning — worth noting for future performance consideration.TransactionSandbox.rollback()fromCOMMITTEDstate is effectively a no-op (only logs a warning). The manager's_rollback_committedwill report it as rolled back when it isn't really. Callers should be aware that DB changes are permanent despite the "successful" rollback status. Consider surfacing this distinction in the return value.Review: fix(sandbox): make commit_all atomic per specification
Approved with comments. Important correctness fix with clean implementation.
Issues to Address
1. Duplicated
_backup_directoryimplementation (Medium)Both
CopyOnWriteSandbox._backup_directoryandOverlaySandbox._backup_directoryhave identical implementations. Extract to a shared utility function to follow DRY.2. TransactionSandbox "fake rollback" (Medium)
TransactionSandbox.rollback_committed()logs a warning that DB COMMIT cannot be undone, then transitions toROLLED_BACKanyway. The Manager's_rollback_committedwill count this as "successfully rolled back" inrolled_back_ids, which is misleading. Consider either:rolled_back_idsfor transaction sandboxes3. Backup performance concern (Low)
Backup creation during CopyOnWrite commit copies the entire original directory (
tempfile.mkdtemp+_backup_directory). For large repositories this could be slow and disk-intensive. Consider documenting size limits or adding a benchmark.What's Good
COMMITTED → [ROLLED_BACK, CLEANED_UP]._rollback_committedfor already-committed sandboxes.git reset --hardfor rollback._copy_file_bytes) avoids mock interference.Code Review Report: PR #1146 — Atomic
commit_allReviewer: Automated deep review (4 full review cycles)
Scope: Code changes in
fix/atomic-sandbox-commitbranch + immediate surrounding codeReferences: Issue #925,
docs/specification.md(lines 45938, 19193)Executive Summary
The PR correctly implements the core atomicity requirement from the specification (line 45938):
commit_all()now rolls back previously-committed sandboxes when one fails. The protocol transition map, all four real sandbox strategies, and theNoSandboxpass-through are addressed. Tests cover the happy path, rollback-on-failure, and rollback-error tolerance.However, the review identified 15 findings across 6 categories, including 3 high-severity bugs that can silently break the atomicity guarantee this PR is designed to deliver. These should be addressed before merge.
HIGH Severity
H1. BUG —
commit_allcatches onlySandboxError: atomicity silently broken for unexpected exceptionsFile:
manager.py:245If
sandbox.commit()raises any non-SandboxErrorexception (e.g.,RuntimeError,TypeError,KeyError, or an uncaughtOSErrorfrom a library call), the exception propagates without rolling back already-committed sandboxes. The caller sees a crash, but committed sandboxes remain committed — the atomicity guarantee is silently violated.Recommendation: Widen the catch to
Exception(or at minimum(SandboxError, OSError, RuntimeError)) with a final re-raise after rollback:H2. BUG —
TransactionSandbox.rollback()from COMMITTED falsely reports successFile:
transaction_sandbox.py:284-290When rolling back from
COMMITTED, the method logs a warning but still transitions toROLLED_BACK:The manager's
_rollback_committedthen adds this sandbox's ID torolled_back_ids, telling the caller "this sandbox was successfully rolled back." But the database changes are permanent and NOT actually reversed. The error report provides a false sense of atomicity.Recommendation: Either:
SandboxRollbackErrorto signal that rollback could not actually undo the committed changes, so the manager correctly excludes it fromrolled_back_ids, OR"rollback_noop"metadata list.H3. BUG — Rollback-from-COMMITTED uses non-atomic
rmtree-then-copytree: data loss riskFiles:
copy_on_write.py:313-319,overlay.py:366-372If
copytreefails afterrmtreesucceeds (e.g., disk full, permission error), the original directory is deleted and the backup restoration is incomplete. The data is irrecoverably lost — the backup still exists on disk but the original path is gone, and the sandbox transitions toERRORED(which does not permit rollback). Additionally,rmtree(..., ignore_errors=True)can leave a partially-deleted directory, causingcopytree(..., dirs_exist_ok=False)to fail on the remaining files.Recommendation: Use a rename-based approach:
MEDIUM Severity
M1. BUG —
_rollback_committedcatches onlySandboxError: remaining rollbacks skipped on unexpected exceptionFile:
manager.py:304Same class of issue as H1 but in the rollback loop. If one sandbox's
rollback()raises a non-SandboxError(e.g.,subprocess.SubprocessErrorfrom git,OSError), the exception propagates and remaining rollbacks are never attempted.Recommendation: Catch
Exceptionin the rollback loop.M2. BUG — No reporting of failed rollbacks in error metadata
File:
manager.py:260-268The error result's
metadataonly contains"rolled_back"(successfully rolled-back IDs). Sandboxes that failed to roll back are silently omitted. The caller has no programmatic way to identify which sandboxes are in an inconsistent state — critical operational information when atomicity itself fails.Recommendation: Add a
"rollback_failed"key:M3. BUG —
_backup_directorydoes not preserve symlinks or permissionsFiles:
copy_on_write.py:451-461,overlay.py:539-549os.walk()does not descend into symlinked directories, andopen(src, "rb")follows symlinks for files — so the backup converts symlinks to regular files._copy_file_bytesexplicitly preserves no metadata (permissions, timestamps). After rollback fromCOMMITTED, the original directory:0o644) instead of original permissionsThis contrasts with
ACTIVErollback which usesshutil.copytree(symlinks=True)and preserves permissions. The inconsistency means rollback-from-COMMITTED is less faithful than rollback-from-ACTIVE.Recommendation: Use
shutil.copy2or explicitly handle symlinks and permissions in the backup. At minimum, document the limitation.M4. DESIGN — Specification contradiction unaddressed
The specification contains two contradictory statements:
This PR correctly implements atomicity per line 45938, but the contradicting statement at line 19193 remains in the spec. Future developers may be confused about the intended behavior.
Recommendation: Open a follow-up issue to update line 19193 to align with the atomic behavior, or add a code comment referencing the known contradiction.
M5. CODE QUALITY — Duplicated
_backup_directoryacross two sandbox implementationsFiles:
copy_on_write.py:436-461,overlay.py:528-559Nearly identical
_backup_directoryimplementations (and inline_copy_file_bytesin overlay). This duplication increases maintenance burden and risk of divergence.Recommendation: Extract to a shared utility in a common module (e.g.,
sandbox/_fs_utils.py).LOW Severity
L1. TEST GAP — No test for real file-based rollback from COMMITTED state
All atomic commit tests use
MagicMocksandboxes. No Behave scenario or Robot test exercises the actualCopyOnWriteSandbox,OverlaySandbox, orGitWorktreeSandboxrollback()fromCOMMITTEDstate with real files on disk. The backup, rmtree, copytree, and git-reset paths are untested at the integration level.L2. TEST GAP — No test for NoSandbox/TransactionSandbox interaction with atomic commit
NoSandbox.rollback()always raisesSandboxRollbackError.TransactionSandbox.rollback()from COMMITTED is a silent no-op (see H2). Neither interaction is tested in an atomic commit context. The edge case of mixing sandbox strategies in a single plan is not covered.L3. TEST FRAGILITY — Behave scenario depends on Python dict iteration order
The scenario "Atomic commit rolls back already-committed sandboxes when one fails" (
sandbox_manager_coverage.feature:163) depends onres-001being committed beforeres-002to have any rollback candidates. This relies on Python 3.7+ dict insertion order — correct but implicit. If the manager ever switches to a non-ordered collection, the test silently becomes non-deterministic.Recommendation: Add a comment documenting the ordering assumption.
L4. TEST WEAKNESS — Robot integration test verifies mock call, not actual state restoration
_manager_atomic_commit()inhelper_sandbox_integration.pyassertsok_mock.rollback.calledbut does not verify actual file-system state restoration. Since it uses mocks, this is expected — but combined with L1, it means the rollback correctness is never verified end-to-end.L5. PERFORMANCE — Full directory backup doubles disk usage at commit time for CoW/Overlay
For
CopyOnWriteSandboxandOverlaySandbox, the entire original directory is backed up to a temp directory before commit. For large repositories this doubles disk usage and adds significant I/O latency. TheGitWorktreeSandboxavoids this by recording only a commit SHA (much more efficient). This is an inherent cost of the chosen approach but worth noting for large-scale deployments.L6. DOCUMENTATION — Status transition diagram in
protocol.pydocstring is ambiguousThe ASCII diagram in
SandboxStatusdocstring (lines 50-58) does not clearly show the newCOMMITTED -> ROLLED_BACKtransition added by this PR. While the code (valid_transitions()) is correct, the visual representation is potentially misleading to developers reading the docstring.Summary Table
manager.py:245commit_allonly catchesSandboxError— atomicity broken for unexpected exceptionstransaction_sandbox.py:284copy_on_write.py:313,overlay.py:366rmtree+copytreein rollback — data loss riskmanager.py:304_rollback_committedcatches onlySandboxError— remaining rollbacks skippedmanager.py:260copy_on_write.py:451,overlay.py:539copy_on_write.py:436,overlay.py:528_backup_directoryacross two fileshelper_sandbox_integration.pycopy_on_write.py,overlay.pyprotocol.py:50-58Verdict: The HIGH severity items (H1, H2, H3) should be addressed before merge. H1 and H3 are particularly concerning as they can silently violate the atomicity guarantee or cause data loss — the very things this PR is designed to prevent.
Review Findings — Resolution Report
All 10 applicable findings from the code review have been addressed and amended into the feature commit. The commit is ready for force-push.
Applied Fixes
commit_allcatches onlySandboxErrorexcept Exception; non-SandboxErrorre-raised after rollbackTransactionSandbox.rollback()from COMMITTED falsely reports successSandboxRollbackError— manager correctly excludes fromrolled_back_idsrmtree+copytreedata loss risksafe_restore()using rename-based atomic swap_rollback_committedcatches onlySandboxErrorexcept Exceptionso remaining rollbacks proceed_rollback_committednow returns(rolled_back_ids, failed_rollback_ids); metadata carries both keysbackup_directory()preserves symlinks viaos.readlink/os.symlinkand permissions viaos.chmod.. note::incommit_alldocstring referencing lines 45938 vs 19193_backup_directorycode_fs_utils.pyshared moduleprotocol.pywith explicitCOMMITTED -> ROLLED_BACKpathrollback_failedmetadata assertion; updated Robot test to verify both metadata keysNot Applied (with justification)
Quality Gates
nox -s lintnox -s typechecknox -s unit_testsnox -s integration_testsf23f0bc7e58b4e774575New commits pushed, approval review dismissed automatically according to repository settings
Code Review Report — PR #1146:
fix(sandbox): make commit_all atomic per specificationReviewer: Automated deep review (4 full cycles across all categories)
Scope: Code changes in branch
fix/atomic-sandbox-commit+ close connections to surrounding codeIssue: #925
Commit:
8b4e774Summary
The PR correctly implements atomic
commit_allsemantics as required by the specification (line 45938). The approach — pre-commit backup + rollback-on-failure — is sound. The COMMITTED → ROLLED_BACK transition is properly added to the state machine. The_fs_utilsextraction is clean. The specification contradiction (line 45938 vs 19193) is well-documented.Findings: 5 bugs, 1 security concern, 1 performance concern, 5 test coverage gaps, 3 robustness issues.
1. Bugs — MEDIUM Severity
1.1
_rollback_committedrolls back in forward order instead of reverseFile:
manager.py:324Description: When rolling back already-committed sandboxes, the iteration proceeds in the same (forward) order as the commits. Standard transactional rollback semantics use reverse (LIFO) order to properly unwind dependencies. If sandbox B's commit depends on sandbox A's committed state, rolling back A first can leave B's rollback in an inconsistent state.
Suggested fix:
for sandbox, _result in reversed(committed):1.2 Undocumented re-raise behavior for non-
SandboxErrorincommit_allFile:
manager.py:276-280Description: When a sandbox's
commit()raises a non-SandboxErrorexception (e.g.,RuntimeError,TypeError), the code performs the rollback and then re-raises the original exception. The caller receives an exception instead of a structuredCommitResult. This is reasonable but the method's docstring (lines 220-229) only documents theCommitResult-based return contract — it does not mention that exceptions may propagate. This is an API contract inconsistency.Suggested fix: Add to the docstring:
Raises: Exception — If a non-SandboxError exception occurs during commit, it is re-raised after rollback. The caller receives the exception, not a CommitResult.1.3 After atomic rollback, sandbox states prevent retry of the full batch
File:
manager.py:237-241and_rollback_committedDescription: After
commit_allfails and rolls back previously-committed sandboxes, those sandboxes transition toROLLED_BACKstatus. If the caller retriescommit_all, the committable filter (sb.status in (CREATED, ACTIVE)) skips allROLLED_BACKsandboxes, so only the previously-un-attempted sandboxes are retried. The full batch cannot be retried without manual intervention to re-activate the rolled-back sandboxes.This is inconsistent with the atomic guarantee — the expectation after an all-or-nothing failure is that the system returns to a state where the entire operation can be retried.
Suggested fix: Either (a) transition rolled-back sandboxes back to
ACTIVEin_rollback_committed, or (b) document clearly that after atomic failure the plan must be fully re-created, or (c) includeROLLED_BACKin the committable filter if it makes sense for the lifecycle.1.4 Backup directory not cleaned up on commit failure (resource leak)
Files:
copy_on_write.py:244-266,overlay.py:291-313Description: In both
CopyOnWriteSandbox.commit()andOverlaySandbox.commit(),backup_directorycreates a backup at line 244/291 before applying changes. If the commit subsequently fails withOSError(e.g., during file copy), the status transitions toERROREDandSandboxCommitErroris raised — but the backup temp directory is never cleaned up. It persists untilcleanup()is explicitly called.Suggested fix: Clean up the backup directory in the
except OSErrorhandler before raising.1.5
git reset --hardduring rollback from COMMITTED can affect other worktreesFile:
git_worktree.py:493-497Description: When rolling back a git worktree sandbox from COMMITTED,
git reset --hard <pre_merge_commit>is executed on the original branch. If other sandbox worktrees or external tooling are tracking the same branch, this hard reset will affect them. This is an inherent limitation of the git worktree approach but is worth documenting.Suggested fix: Add a docstring warning about multi-worktree safety.
2. Bugs — LOW Severity
2.1
backup_directorydoesn't preserve file timestamps (mtime/atime)File:
_fs_utils.py:58-59(_copy_file)Description:
_copy_filepreserves permissions viaos.chmodbut does not preserve modification timestamps. When restoring from backup, files will have the backup creation time. This could affect build tools, caches, ormake-like systems that rely on file modification timestamps to determine staleness.Suggested fix: Add
os.utime(dst, (src_stat.st_atime, src_stat.st_mtime))afteros.chmodin_copy_file.2.2
backup_directorydoesn't handle special files (FIFOs, sockets, device nodes)File:
_fs_utils.py:38-59Description: The function handles regular files and symlinks but silently skips special files. If the sandboxed directory contains FIFOs or sockets (unlikely but possible), the backup would be incomplete and the restore would not match the original.
Suggested fix: Log a warning when encountering special files, or raise explicitly.
2.3 Root directory permissions not preserved by
backup_directoryFile:
_fs_utils.py:21-59Description: The function receives a
dstdirectory (created bytempfile.mkdtemp) and copies contents into it. The permissions of the source root directory (src) are never applied todst. Whensafe_restoreusesshutil.copytreeto copy back, the restored root directory may have different permissions than the original.Suggested fix: Add
os.chmod(dst, stat.S_IMODE(os.stat(src).st_mode))at the start ofbackup_directory.3. Security — LOW Severity
3.1 Predictable stale path in
safe_restore(TOCTOU symlink concern)File:
_fs_utils.py:82Description: The stale path is deterministic (
target_path + ".atomic-rollback-old"). In a multi-user environment, an attacker could pre-create a symlink at this path pointing to a sensitive directory. The pre-existence cleanup (shutil.rmtree(stale, ignore_errors=True)on line 87) would follow the symlink and delete the target.Mitigating factors: Sandbox paths are typically in temp dirs or controlled workspaces, limiting exposure.
Suggested fix: Use
tempfile.mkdtemp(dir=os.path.dirname(target_path))for an unpredictable stale name, or add anos.path.islinkcheck beforermtree.4. Performance — MEDIUM Severity
4.1 Full backup copy before every commit doubles I/O and disk usage
Files:
copy_on_write.py:244-245,overlay.py:291-292Description: Both
CopyOnWriteSandbox.commit()andOverlaySandbox.commit()create a full copy of the original directory before applying changes, even when the batch succeeds and the backup is never used. The backup persists untilcleanup()is called. For large repositories, this doubles disk I/O during commit and increases temporary disk usage.Suggested fix: Consider (a) using hard links instead of copies for unchanged files (same-filesystem optimization), (b) deferring backup creation to the manager level (only backup when there are multiple sandboxes to commit), or (c) cleaning up backups eagerly on success in
commit_all.5. Test Coverage Gaps — MEDIUM Severity
5.1 No test for non-
SandboxErrorexception propagation incommit_allFile:
manager.py:276-280Description: The code path where a sandbox's
commit()raises a non-SandboxErrorexception (e.g.,RuntimeError) triggers rollback and then re-raises. No test verifies that (a) the rollback occurs and (b) the exception propagates.5.2 No dedicated unit tests for
_fs_utilsmoduleFile:
_fs_utils.pyDescription:
backup_directory,safe_restore, and_copy_filehave no direct unit tests. Their behavior is only tested indirectly through sandbox commit/rollback scenarios. Edge cases (symlinks, restricted permissions, empty directories, cross-filesystem scenarios, pre-existing stale paths) are not covered.5.3 No test for
safe_restorefailure/recovery pathFile:
_fs_utils.py:97-100Description: The fallback path where
shutil.copytreefails during restore and the original is renamed back from stale is not tested.5.4 No test for
GitWorktreeSandbox.rollback()from COMMITTED stateFile:
git_worktree.py:487-498Description: The new rollback-from-COMMITTED code path (
git reset --hardon the original branch) has no corresponding test scenario in the feature files or robot tests.5.5 No test for
TransactionSandboxin the atomiccommit_allflowFile:
transaction_sandbox.py:281-288Description:
TransactionSandbox.rollback()from COMMITTED raisesSandboxRollbackError(database commits are irreversible). When this sandbox type participates in an atomiccommit_allbatch, it will always appear inrollback_failed. No test verifies this interaction.6. Summary Table
manager.py:324manager.py:276manager.py:237copy_on_write.py:244,overlay.py:291git_worktree.py:493_fs_utils.py:59_fs_utils.py:38_fs_utils.py:21_fs_utils.py:82copy_on_write.py:244,overlay.py:291manager.py:276_fs_utils.py_fs_utils.py:97git_worktree.py:487transaction_sandbox.py:281Review performed by: automated 4-cycle deep analysis
Review scope: strictly limited to code changes in
fix/atomic-sandbox-commit+ immediate surrounding code8b4e7745757fecd4b17cCode Review Report — PR #1146
fix(sandbox): make commit_all atomic per specificationReviewer: Automated deep review (4 full cycles across all categories)
Scope: All code changes in branch
fix/atomic-sandbox-commitplus close surrounding contextRef: Issue #925, Specification
docs/specification.md(lines 45938, 19193, 24632)Summary
Overall this is a well-structured change that addresses a genuine specification gap. The atomic
commit_allprotocol, LIFO rollback ordering, COMMITTED→ROLLED_BACK state transition, and shared_fs_utilsmodule are all sound design choices. The commit message and documentation are thorough. However, the review identified 13 findings across 5 categories, including 2 high-severity bugs that should be addressed before merge.Findings by Severity
HIGH — Should be addressed before merge
H1.
NoSandbox.rollback()breaks atomicity guarantee in mixed batchesCategory: Bug / Spec Compliance
Files:
no_sandbox.py:190-200,manager.py:256-301NoSandbox.rollback()unconditionally raisesSandboxRollbackErrorregardless of status. In a batch containing aNoSandboxalongside a real sandbox (e.g.,CopyOnWriteSandbox), if the real sandbox fails to commit, theNoSandboxcannot be rolled back. Its changes (applied immediately, per the"none"strategy) persist while others are undone — violating the atomicity guarantee of spec line 45938.The specification itself acknowledges
nonehas "No automatic rollback" (line 45932), creating an inherent tension with the atomicity requirement. The code correctly reportsNoSandboxinrollback_failedmetadata, but the caller has no mechanism to recover.Suggested mitigations (pick one):
NoSandboxinstances last in the batch so they're never in the "already committed" set when a failure occurs.NoSandboxfrom atomic commit batches (fail fast with a clear error).NoSandboxis included in an atomic commit.H2. Pre-commit backup deleted on partial commit failure — data loss risk
Category: Bug
Files:
copy_on_write.py:262-266,overlay.py:309-313In
CopyOnWriteSandbox.commit()andOverlaySandbox.commit(), when file copying to the original directory fails midway (partial modification), theexcept OSErrorhandler deletes the pre-commit backup:At this point the original directory is partially corrupted (some files were copied before the error), and the backup that could restore it is now gone. The
cleanup()method already handles backup cleanup, so the explicit deletion here is both unnecessary and harmful.Suggested fix: Replace the backup deletion with a restore attempt:
Or at minimum, remove the
rmtreecall and letcleanup()handle it.MEDIUM — Should be addressed (non-blocking)
M1. Rollback from COMMITTED with missing backup falls through silently
Category: Bug
Files:
copy_on_write.py:316-335,overlay.py:368-396When
self._status == SandboxStatus.COMMITTEDbutself._pre_commit_backup is None, theif/elsefalls through to the regular ACTIVE rollback path, which re-copies from the already-modified original directory. This silently produces a sandbox that reflects the committed changes instead of the pre-commit state — a rollback that doesn't actually roll back.While this path should not occur in normal operation, it is a defensive gap. Consider raising
SandboxRollbackErrorwhen COMMITTED and backup is missing:M2. LIFO rollback order test uses fragile string comparison
Category: Test Flaw
File:
features/steps/sandbox_manager_coverage_steps.py:587-592This relies on resource-ID keys (
"plan-001:res-001","plan-001:res-002") sorting lexicographically in commit order. If future resource IDs don't follow this pattern (e.g., ULIDs, non-alphabetic names), the test breaks without any real code defect. Consider comparing against the known commit order explicitly:M3. Temp directory leak in
_fs_utilsBDD testsCategory: Test Flaw
File:
features/steps/sandbox_fs_utils_steps.py:38, 55, 64, 72Six scenarios create temp directories via
tempfile.mkdtemp()stored incontext._fs_tmpdirbut no cleanup handler is registered. Theafter_scenariohook inenvironment.pyonly cleanscontext.test_dir, not_fs_tmpdir. These directories accumulate across test runs.Suggested fix: Register a cleanup handler in each Given step:
M4. Missing BDD scenarios for strategy-specific COMMITTED rollback
Category: Test Coverage
Files: Feature files
The following rollback-from-COMMITTED paths are implemented but lack dedicated BDD scenarios:
OverlaySandbox.rollback()from COMMITTED (usessafe_restore)GitWorktreeSandbox.rollback()from COMMITTED (usesgit reset --hard)TransactionSandbox.rollback()from COMMITTED (raisesSandboxRollbackError)These are critical paths for the atomic commit protocol and should have explicit coverage.
M5. Full directory backup on every commit even with empty diffs
Category: Performance
Files:
copy_on_write.py:243-245,overlay.py:290-292backup_directoryruns unconditionally after_compute_diff, even when the diff is empty (no changed, added, or deleted files). For large project resources with thousands of files, this creates a complete copy that is immediately discarded on the successful (no-change) commit path. Consider guarding:LOW — Minor issues / nice-to-have improvements
L1.
GitWorktreeSandbox._pre_merge_commitnot cleared on commit failureCategory: Bug (latent)
File:
git_worktree.py:417-442If the merge fails,
_pre_merge_commitretains a stale value (the commit hash before the attempted merge). Status transitions to ERRORED, which only allows CLEANED_UP, so the stale value is currently harmless. But if future code adds recovery from ERRORED, this could cause incorrectgit reset --hardto a commit where no merge occurred.L2.
safe_restoredouble-failure leaves target at stale pathCategory: Bug (edge case)
File:
_fs_utils.py:92-103If
shutil.copytreefails and thenos.rename(stale, target_path)also fails (e.g., filesystem corruption), the original directory remains at the.atomic-rollback-oldsibling path andtarget_pathdoesn't exist. This is an extreme edge case but the function's docstring claims "the original is preserved" unconditionally.L3. Predictable stale-directory naming in
safe_restoreCategory: Security (low risk)
File:
_fs_utils.py:85The
.atomic-rollback-oldsuffix is deterministic. In scenarios where sandbox paths are attacker-influenced, a pre-created symlink at this path could causeshutil.rmtreeto follow the symlink. Mitigated by sandbox directories residing in controlled temp paths.L4. Integration test uses mocks instead of real sandbox implementations
Category: Test Fidelity
File:
robot/helper_sandbox_integration.py:617-683_manager_atomic_commitusesunittest.mock.MagicMockfor both the OK and failing sandboxes. This tests the manager's orchestration logic but not the actual rollback behavior of real sandbox implementations (e.g.,safe_restoreon filesystem,git reset --hard).L5. Commit ordering not enforced per specification
Category: Spec Compliance (pre-existing)
File:
manager.py:256Specification line 24632 requires commit to be bottom-up (children before parents) and rollback to be top-down.
commit_alliterates sandboxes in Python dict insertion order, which is not necessarily bottom-up topological order. The LIFO rollback only matches the spec if commit order was correct. This is a pre-existing concern not introduced by this PR, but it affects the correctness of the new LIFO rollback.Positive Observations
_fs_utilsextraction is clean and well-motivated; avoidingshutil.copytree/os.makedirsto prevent test-patch interference is thoughtful.COMMITTED → ROLLED_BACKtransition inprotocol.pyis correctly added to the transition graph, diagram, and docstring.commit_alldocstring (line 45938 vs 19193) is responsible documentation.rolled_back,rollback_failedkeys) gives callers actionable information for recovery decisions.Exception(not justSandboxError) in bothcommit_alland_rollback_committedis correct for robustness — unexpected errors cannot bypass rollback.7fecd4b17cc0614da1faCode Review Report -- PR #1146: fix(sandbox): make commit_all atomic per specification
Reviewer: Automated code review (requested by CoreRasurae)
Scope: Code changes in branch
fix/atomic-sandbox-commitand close connections to surrounding codeReference: Issue #925, specification lines 45938 (atomicity), 19193 (partial apply), 24632 (dependency ordering)
Review cycles: 4 full passes across all categories (bugs, security, performance, test coverage/flaws, spec compliance)
Summary
The PR successfully converts
SandboxManager.commit_all()from partial-commit to all-or-nothing semantics with LIFO rollback, adds a shared_fs_utilsmodule for backup/restore, and extends all 4 sandbox strategies with rollback-from-COMMITTED support. The specification contradiction between line 45938 (atomicity) and line 19193 (partial apply) is correctly documented.However, multiple review cycles revealed 12 findings across 5 categories, including 1 high-severity bug that can cause data corruption under disk-pressure conditions.
Findings by Severity
HIGH -- Bugs
BUG-1: Backup failure during commit can corrupt the original directory
Files:
copy_on_write.py:247-248,overlay.py:294-295Severity: HIGH -- Can cause data loss of the original resource
In both
CopyOnWriteSandbox.commit()andOverlaySandbox.commit(),self._pre_commit_backupis assigned beforebackup_directory()is called:If
backup_directory()fails partway (e.g., disk full, permission error), theexcept OSErrorhandler at line 265/312 checksself._pre_commit_backup is not None(True, since it was set before the call) and invokessafe_restore(). Since the original directory has not been modified yet at this point (file copy operations haven't started),safe_restore()replaces the intact original with the partial backup data -- corrupting it.Suggested fix: Assign
self._pre_commit_backuponly afterbackup_directory()succeeds:This ensures
safe_restore()is only attempted when the original was actually modified.MEDIUM -- Bugs
BUG-2: No-change commit followed by rollback falsely reports rollback failure
Files:
copy_on_write.py:246,331-335,overlay.py:293,384-388When
commit()detects no changes, the pre-commit backup is skipped (_pre_commit_backupremainsNone) and status transitions toCOMMITTED. If the atomiccommit_all()later fails on a different sandbox and triggers_rollback_committed(),rollback()is called on this no-change sandbox. The rollback check:raises
SandboxRollbackErroreven though the original was never modified and no rollback is needed. This results in the sandbox being reported infailed_rollback_idsmetadata, producing misleading error reports.Suggested fix: When there are no changes to apply, skip the backup-required check during rollback (the original is unmodified):
BUG-3:
commit_alllock released during commit loop is not fully thread-safeFile:
manager.py:242-317The
_lockis acquired to snapshot the sandbox list (line 242-243) then released before the commit/rollback loop. A concurrent thread callingrollback_all(),cleanup_all(), or modifying sandbox state during the commit loop could interfere with the atomic commit protocol. While the loop operates on a snapshot, individual sandbox objects are shared references and another thread could mutate their state.Impact: Low probability in practice since concurrent access to the same plan's sandboxes during commit is unusual, but the contract should be documented.
MEDIUM -- Test Coverage
TEST-1: Mock sandboxes have static status that doesn't reflect real state transitions
File:
sandbox_manager_coverage_steps.py:35PropertyMock(return_value=status)creates mocks with an immutable status. Whencommit_all()callssandbox.commit(), the mock returns aCommitResultbut the mock's.statusstaysACTIVE(never transitions toCOMMITTED). When_rollback_committed()then callssandbox.rollback(), it's rolling back a mock that appears to be inACTIVEstate, notCOMMITTEDstate. This means the tests don't exercise the real status-dependent branching inrollback()implementations (which behave differently for ACTIVE vs COMMITTED).TEST-2: No integration tests exercise real filesystem rollback from COMMITTED during atomic commit_all
Files: BDD feature files and Robot tests
All atomic commit rollback scenarios use mocks for the sandbox instances. No test exercises the actual filesystem-level
safe_restore()path during acommit_allfailure with realCopyOnWriteSandboxorOverlaySandboxinstances. This means BUG-1 above is not caught by existing tests. A test that creates real filesystem sandboxes, commits one successfully, fails the second, and verifies the first's original directory is correctly restored would close this gap.LOW -- Bugs
BUG-4:
backup_directorydoesn't preserve directory timestampsFile:
_fs_utils.py:51-54File timestamps are preserved via
os.utime()(line 126), but directory access/modification timestamps are not. Only directory permissions are preserved viaos.chmod()(line 54). For tools or validation logic that relies on directory mtime (e.g., make-like incremental builds), restored directories would have current timestamps instead of originals.LOW -- Performance
PERF-1: Full directory backup for any change
Files:
copy_on_write.py:247-248,overlay.py:294-295backup_directory()copies the entire original directory even if only one file changed. For large directories (the project's scaling tests target 10K-100K files), this adds significant I/O overhead to every commit. The no-change optimization (skip backup when diff is empty) helps, but a single-file change still triggers a full copy.Note: This is a known tradeoff for simplicity/correctness. A potential future optimization would be to back up only the files that will be modified (based on
_compute_diffresults), though this adds complexity to the restore logic.LOW -- Test Coverage
TEST-3: No test verifies the NoSandbox warning log in commit_all
File:
manager.py:259-266The code logs a warning when
NoSandboxinstances are present in a batch (since they weaken atomicity). No BDD or Robot test verifies this warning is emitted. Adding a log-capture assertion would confirm the warning is triggered and contains the expected message.TEST-4: No test for
TransactionSandbox.rollback()from COMMITTED raisingSandboxRollbackErrorFile:
transaction_sandbox.py:281-289The new behavior where
TransactionSandbox.rollback()from COMMITTED raisesSandboxRollbackError(because DB COMMIT is irreversible) is not directly tested. This is a critical path for the atomic commit protocol -- if aTransactionSandboxcommits successfully and a later sandbox fails, the rollback of theTransactionSandboxshould fail and be reported infailed_rollback_ids.LOW -- Spec Compliance
SPEC-1: LIFO rollback order doesn't match spec's "top-down" requirement
File:
manager.py:346The commit message cites "specification dependency ordering (line 24632: rollback is top-down)" as justification for LIFO rollback order. However, the spec's "top-down" refers to resource DAG traversal (parent before child), not reverse-of-commit-order. LIFO (reverse insertion order of
dict.values()) may produce a different ordering than top-down DAG traversal.In practice, since sandboxes in
commit_alloperate on independent resource boundaries, the ordering is unlikely to matter. But the docstring/commit message should clarify that LIFO is used for transaction-log-style undo rather than citing the DAG-based spec requirement.LOW -- Security
SEC-1: Symlink targets not validated during backup
File:
_fs_utils.py:48-49,58-60backup_directory()preserves symlinks without validating that their targets are within the source directory tree. A symlink pointing to/etc/passwdwould be faithfully backed up and restored. Practical risk is very low since sandboxes are not user-facing and symlinks in the original directory are pre-existing.SEC-2: Predictable stale directory naming in
safe_restoreFile:
_fs_utils.py:85The stale directory uses a fixed suffix
.atomic-rollback-old, making the name predictable. In a shared filesystem environment, another process could pre-create this path to interfere with the restore. Practical risk is negligible since the sandbox temp directories have unique names.Overall Assessment
The implementation correctly addresses the core requirement of atomic
commit_all()and is well-structured with clear documentation of the spec contradiction. BUG-1 should be fixed before merge as it can cause data corruption under disk-pressure conditions. BUG-2 produces misleading error reports but doesn't cause data loss. The test coverage gaps (TEST-1, TEST-2) should ideally be addressed to verify the real filesystem paths that BUG-1 affects.c0614da1fad58311d39aCode Review Report — PR #1146
fix/atomic-sandbox-commitReviewer: Automated deep review (3 full-cycle passes)
Scope: All code changes on branch
fix/atomic-sandbox-commitplus immediate surrounding codeCommit:
d58311d3by Luis MendesIssue: #925 — ensure atomic apply —
commit_allmust be all-or-nothingOverall Assessment
The implementation correctly converts
SandboxManager.commit_all()from partial-commit to all-or-nothing semantics. The design choices (rollback-on-failure over pre-validation, LIFO rollback order, strategy-specific rollback mechanisms) are sound. The code is well-documented with clear explanations of trade-offs and spec contradictions. The_fs_utilsextraction is a good refactoring that eliminates duplication.However, the review identified several issues across multiple categories that should be addressed before merge.
Findings by Severity
CRITICAL — Bugs
B1.
OverlaySandbox: stale merged directory after rollback from COMMITTEDFile:
overlay.py:394-409| Severity: CriticalWhen
rollback()is called fromCOMMITTEDstate,safe_restore()correctly restores the original directory from the pre-commit backup, but the merged directory (_merged_dir) is not reset. SinceOverlaySandbox.get_path()(line 238-242) allows theROLLED_BACKstatus (unlikeCopyOnWriteSandbox), a subsequentget_path()call would transition toACTIVEand return paths into the stale merged directory — exposing pre-rollback data to the caller.Impact: Data integrity violation — the sandbox appears clean but contains stale modifications.
Suggested fix: After restoring the original from backup in the COMMITTED rollback branch, also reset the merged directory (re-copy from the now-restored original, or recreate the overlay mount).
HIGH — Bugs
B2. Non-
SandboxErrorre-raise loses rollback metadataFile:
manager.py:298-302| Severity: HighWhen
commit()raises a non-SandboxErrorexception (e.g.,RuntimeError), the atomic rollback is performed but the original exception is re-raised directly. The caller receives the bare exception with no indication of which sandboxes were rolled back or which rollback operations failed. The detailederror_msgis logged but not attached to the exception.Impact: The caller cannot determine the system state after the exception — whether rollback succeeded, partially succeeded, or failed.
Suggested fix: Wrap the original exception in a
SandboxError(or a newAtomicCommitErrorsubclass) that includes the rollback metadata, or attach the metadata to the exception via__notes__(Python 3.11+).B3. Backup created in system temp dir — cross-filesystem
safe_restoreperformanceFile:
copy_on_write.py:253,overlay.py:300| Severity: High (Performance + Correctness risk)tempfile.mkdtemp()creates the backup directory in the system temp directory (typically/tmp). If the original directory is on a different filesystem (e.g.,/home,/data), then:backup_directory()must copy across filesystems (slow for large dirs)safe_restore()must useshutil.copytreefor the restore (slow again)If the backup were created on the same filesystem as the original (e.g., using the original's parent as
dir=argument tomkdtemp), the backup could potentially use hard links and the restore rename would be nearly instantaneous.Impact: For large project directories, the backup+restore cycle could take minutes instead of seconds.
HIGH — Spec Compliance / Design
S1.
NoSandboxandTransactionSandboxfundamentally break atomicityFile:
manager.py:254-266,no_sandbox.py:190-200,transaction_sandbox.py:281-289| Severity: High (Design)The commit correctly logs a warning when
NoSandboxinstances are in the batch, but the operation proceeds anyway. BothNoSandbox.rollback()andTransactionSandbox.rollback()fromCOMMITTEDraise unconditionally, meaning:NoSandboxin the batch makes atomicity impossible (changes applied in-place, cannot be undone)TransactionSandboxthat committed before a failure cannot be rolled back (database COMMIT is permanent)The warning message says atomicity is "weakened" — a more accurate statement would be that atomicity is broken.
Suggested action: Consider either:
commit_allwhen non-rollbackable sandboxes are present (strict mode), orMEDIUM — Bugs
B4.
_fs_utils.safe_restoreonly catchesOSError— non-OS exceptions leave directory renamedFile:
_fs_utils.py:103-106| Severity: MediumIf
shutil.copytree(line 97-102) raises a non-OSErrorexception (unlikely but possible — e.g.,MemoryErrorfrom a deeply nested tree), theexcept OSErrorblock won't catch it. The original directory has already been renamed tostale(line 95), and the rename-back at line 105 won't execute. This leavestarget_pathnon-existent and the original data only accessible via the.atomic-rollback-oldsuffix.Suggested fix: Broaden the except clause to
except BaseException(or at minimumexcept Exception) for the restore-rename, while still re-raising:MEDIUM — Performance
P1. Full directory backup for any change
File:
copy_on_write.py:252-259,overlay.py:299-306| Severity: MediumEven a single-file change triggers a complete directory backup via
backup_directory(). For large project directories (the spec references Firefox-scale codebases), this means copying potentially gigabytes of data for a one-line edit. The no-change optimization is good, but the common case (some changes) remains expensive.Considered: A per-file backup (only backing up files that will be overwritten) would be more efficient but significantly more complex. This is a known trade-off and may be acceptable for v1.
P2. Triple directory walk during commit
File:
copy_on_write.py:239-255,overlay.py:286-306| Severity: MediumThe commit path walks the directory tree three times:
_compute_diff()walks both sandbox and originalbackup_directory()walks the original againFor very large directories, consolidating into fewer walks could improve performance.
MEDIUM — Test Coverage
T1. No integration test for real filesystem rollback from COMMITTED
File:
features/sandbox_manager_coverage.feature| Severity: MediumAll atomic commit BDD scenarios use mock sandboxes. There is no test that exercises the actual
safe_restore()→ filesystem restoration path throughCopyOnWriteSandbox.rollback()fromCOMMITTED. The_fs_utilstests coversafe_restorein isolation, but the fullcommit() → partial-failure → rollback-from-COMMITTEDpath with real files is untested.T2. No test for
commit_allwithTransactionSandboxin batchSeverity: Medium
There is no scenario verifying that a
TransactionSandboxthat committed before a failure correctly ends up infailed_rollback_ids(since its rollback from COMMITTED always raisesSandboxRollbackError).T3. No test for
OverlaySandboxstale merged dir after COMMITTED rollbackSeverity: Medium (directly related to bug B1)
Missing test that would catch the stale merged directory issue described in B1.
LOW — Bugs / Robustness
B5.
backup_directorydoes not handle special files (FIFOs, sockets, devices)File:
_fs_utils.py:58-65| Severity: Lowbackup_directoryhandles regular files, symlinks, and directories, but_copy_filewould block or fail on named pipes, Unix domain sockets, or device files. These are unlikely in typical project directories but could appear in certain resource types.B6.
safe_restorestale cleanup has TOCTOU windowFile:
_fs_utils.py:92-93| Severity: Lowos.path.exists(stale)followed byshutil.rmtree(stale, ignore_errors=True)has a small TOCTOU (time-of-check-time-of-use) window. Theignore_errors=Truemitigates the practical impact — if the stale directory changes between check and removal, the rmtree failure is silenced and the subsequentos.renamewill fail with a clear error.LOW — Test Quality
T4. Rollback order test implicitly depends on Python dict insertion order
File:
features/steps/sandbox_manager_coverage_steps.py| Severity: LowThe LIFO rollback order test relies on
_committable_mock_keysbeing populated in the order the step definitions are called, and on_active_sandboxes[plan_id].values()iterating in insertion order. While Python 3.7+ guarantees dict insertion order, the test is implicitly testing a Python language guarantee rather than explicit ordering logic in the manager.T5. Robot integration test for atomic commit uses mocks
File:
robot/helper_sandbox_integration.py:333-407| Severity: LowThe "Manager Atomic Commit Rolls Back On Failure" Robot test replaces sandboxes with MagicMock objects, making it effectively a unit test despite being in the integration test suite. A true integration test would use real sandbox implementations with real filesystem operations.
T6. Cross-module step definition dependency
File:
features/steps/sandbox_manager_coverage_steps.py→lock_service_coverage_steps.py| Severity: LowThe "a RuntimeError should have been raised" step used in the atomic commit scenarios is defined in
lock_service_coverage_steps.py, creating a hidden cross-module dependency. If that file is refactored, the sandbox tests break. Consider defining this step in a shared steps module or locally.INFORMATIONAL — Documentation / Style
D1. Inline import in hot path
File:
manager.py:257| Severity: Informationalfrom cleveragents.infrastructure.sandbox.no_sandbox import NoSandboxis insidecommit_all(). Python caches imports after first load, so the performance impact is negligible after the first call. However, top-level imports are conventional and easier to trace. Since this avoids a circular import, it's acceptable but worth a comment explaining why.D2. Spec contradiction documented clearly
File:
manager.py:210-215| Severity: Informational (Positive)The spec contradiction between line 45938 (atomicity) and line 19193 (partial apply) is clearly documented in the docstring with specific line references. This is good practice.
D3. LIFO vs DAG-based rollback distinction documented
File:
manager.py:332-342| Severity: Informational (Positive)The distinction between LIFO batch undo and spec-defined DAG-based rollback is clearly explained in the
_rollback_committeddocstring.Summary Table
overlay.pymanager.pycopy_on_write.py,overlay.pymanager.py,no_sandbox.py,transaction_sandbox.py_fs_utils.pysafe_restoreexcept clause too narrowcopy_on_write.py,overlay.pycopy_on_write.py,overlay.pysandbox_manager_coverage.feature_fs_utils.py_fs_utils.pysandbox_manager_coverage_steps.pyhelper_sandbox_integration.pysandbox_manager_coverage_steps.pymanager.pyCritical: 1 | High: 3 | Medium: 6 | Low: 5 | Info: 1
d58311d39abaf3d9f9f1Code Review Report — PR #1146:
fix(sandbox): make commit_all atomic per specificationBranch:
fix/atomic-sandbox-commitIssue: #925
Commit:
baf3d9f9Review scope: All code changes in the branch + immediate surrounding context.
Review method: 3 iterative full-pass cycles covering bug detection, security, performance, test coverage, and documentation.
Summary
The PR implements all-or-nothing semantics for
SandboxManager.commit_all()per the specification's atomicity requirement (line 45938). The overall design is sound: rollbackable sandboxes commit first, non-rollbackable last; on failure, LIFO rollback is performed on already-committed sandboxes. The shared_fs_utilsmodule, pre-commit backups, andsafe_restorerename-swap pattern are well thought out.However, the review identified 1 high-severity bug, 5 medium-severity issues, and several lower-priority items across bugs, test coverage gaps, and documentation inconsistencies.
HIGH Severity
H1 — Bug:
TransactionSandboxmisclassified as rollbackable incommit_allFile:
manager.py:264-275commit_all()classifies sandboxes as non-rollbackable usingisinstance(sb, NoSandbox)only.TransactionSandbox.rollback()fromCOMMITTEDstate raisesSandboxRollbackError(line 281-289 oftransaction_sandbox.py) because a databaseCOMMITis irreversible. YetTransactionSandboxends up in therollbackablelist and is committed first in the batch.If a later sandbox in the batch fails, the atomic rollback attempts to undo the
TransactionSandboxcommit and always fails. Database changes become permanent while filesystem sandboxes are rolled back — breaking the atomicity guarantee this PR is implementing.Recommendation: Add
TransactionSandboxto the non-rollbackable classification (or introduce asupports_rollback_from_committedprotocol method) so it is committed last, alongsideNoSandbox.MEDIUM Severity
M1 — Bug:
OverlaySandboxCOMMITTED rollback doesn't remount OverlayFSFile:
overlay.py:400-427When rolling back from
COMMITTEDon a real OverlayFS sandbox, the code restores the original viasafe_restoreand then resets the merged directory with a plainshutil.copytree. The OverlayFS mount is not re-established. If the sandbox is subsequently reactivated (ROLLED_BACK→ACTIVEviaget_path()), writes will go to the plain copy rather than through the overlay filesystem — silently losing the copy-on-write isolation.The
upper_dirandwork_dirare also not reset, leaving them in a stale state.Recommendation: For real OverlayFS, the rollback from
COMMITTEDshould unmount, clean upper/work dirs, and remount (similar to theACTIVErollback path for real overlay).M2 — Bug: Merged dir reset uses
dirs_exist_ok=Falseafterrmtree(ignore_errors=True)File:
overlay.py:420-427If
rmtreefails silently (e.g., due to a locked file or permission issue),copytreewithdirs_exist_ok=FalseraisesFileExistsError, causing the rollback to fail withERROREDstatus. Since the backup has already been consumed bysafe_restoreat this point, there is no recovery path.Recommendation: Use
dirs_exist_ok=True, or perform the rmtree withoutignore_errorsand handle the error explicitly.M3 — Bug:
CopyOnWriteSandbox/OverlaySandboxrollback only catchesOSErrorFiles:
copy_on_write.py:375,overlay.py:448The rollback methods catch
except OSError as exc:butsafe_restorecan re-raise non-OSErrorexceptions. For example,shutil.copytreeinternally catchesBaseExceptioninsafe_restore, renames the original back, and re-raises. If the original exception isMemoryErroror another non-OSError, the rollback'sexcept OSErrorclause doesn't catch it. The sandbox status is never set toERRORED, and_pre_commit_backupis not cleared.Recommendation: Broaden the catch to
except Exception as exc:(or at minimumexcept BaseException as exc:) in the rollback methods, matching the pattern already used in_rollback_committed.M4 — Bug: Non-
SandboxErrorre-raise incommit_allloses rollback metadataFile:
manager.py:312-316When a non-
SandboxErrorexception triggers the atomic rollback, the rollback is performed androlled_back_ids/failed_rollback_idsare computed — but only logged. The original exception is re-raised as-is with no rollback metadata attached. The caller has no programmatic way to determine which sandboxes were rolled back.Recommendation: Wrap the original exception in a
SandboxError(or a newAtomicCommitErrorsubclass) that carries the rollback metadata, and chain the original as__cause__.M5 — Documentation:
TransactionSandbox.rollback()docstring contradicts implementationFile:
transaction_sandbox.py:264-289The docstring states:
But the code raises
SandboxRollbackErrorwithout transitioning toROLLED_BACK. The manager's_rollback_committedcatches this and reports it infailed_rollback_ids. The docstring should match the actual behavior.LOW Severity
L1 — Bug:
backup_directorydoesn't handle special filesFile:
_fs_utils.py:58-65os.walkyields FIFOs, sockets, and device files in thefilenameslist._copy_fileusesopen(src, "rb")which could hang on a FIFO or error on a socket. Sandbox directories are unlikely to contain these, but a defensiveos.path.isfile()check before_copy_filewould prevent surprises.L2 — Bug:
OverlaySandboxCOMMITTED rollback — no recovery if merged dir copytree fails after backup consumedFile:
overlay.py:420-427If
safe_restoresucceeds (consuming the backup) but the subsequentcopytreefor the merged directory fails, the original is correctly restored but the merged dir is in a broken state with no backup available for retry. Status transitions toERRORED, which is correct, but the sandbox cannot recover.L3 — Performance:
commit_allre-importsNoSandboxon every callFile:
manager.py:260from cleveragents.infrastructure.sandbox.no_sandbox import NoSandboxis insidecommit_all(). Python caches module imports, so the overhead is minimal (dict lookup), but moving it to module level (or using aTYPE_CHECKINGguard) would be cleaner. If circular import is the concern, aTYPE_CHECKINGguard could be used.L4 — Security:
safe_restoreuses predictable stale directory nameFile:
_fs_utils.py:88target_path + ".atomic-rollback-old"is a predictable sibling name. A local user could theoretically pre-create this as a symlink to interfere with the restore. The TOCTOU window betweenos.path.existsandshutil.rmtreeis small but present. Usingtempfile.mkdtemp(dir=os.path.dirname(target_path))for the stale name would eliminate this.Test Coverage Gaps
T1 (MEDIUM): No BDD test for
GitWorktreeSandboxrollback fromCOMMITTEDThe
git reset --hardto the pre-merge commit path is untested in BDD scenarios. Only the manager-level atomic commit is tested via mocks in the Robot test.T2 (MEDIUM): No BDD test for
TransactionSandboxrollback fromCOMMITTEDThe
SandboxRollbackErrorraised when attempting to undo a databaseCOMMITis untested in BDD. This is especially important given issue H1.T3 (MEDIUM): No test for
commit_allwith mixed sandbox typesA batch containing
CopyOnWriteSandbox+GitWorktreeSandbox+TransactionSandboxis untested. The interaction between rollbackable and non-rollbackable sandboxes of different types is not verified end-to-end.T4 (MEDIUM): Test steps silently skip when sandbox is
NoneFile:
sandbox_manager_coverage_steps.pyIn
step_given_sandbox_replaced_with_committable_mock(line 180) andstep_given_sandbox_commit_succeeds_rollback_fails(line 232), theif sandbox is not None:guard silently does nothing if the sandbox doesn't exist. This could mask test setup failures — a missing sandbox would cause the step to pass vacuously, and the scenario might pass with incorrect setup.Recommendation: Replace
if sandbox is not None:withassert sandbox is not None.T5 (LOW): Committable mock's
commit()returnsMagicMock, notCommitResultThe committable mock created in
step_given_sandbox_replaced_with_committable_mockhas no explicitcommit.return_value. In failure paths this doesn't matter (the mock results are discarded), but if any success-path test uses these mocks, the returnedMagicMockwould not match theCommitResultinterface.Informational
backup_directorydoes not preserve extended attributes (xattrs) or ACLs — only mode and timestamps are copied. This differs fromshutil.copy2behavior on platforms with xattr support.commit_alloperates without the lock during the commit loop (line 284+). This is a pre-existing pattern in the codebase (rollback_all,cleanup_all), but the atomic semantics make concurrent modification more impactful.git reset --hardduring COMMITTED rollback could affect concurrent processes tracking the same branch (documented in docstring warning — good).Verdict
The implementation is well-structured with thorough commit message documentation, good defensive patterns (safe_restore rename-swap, backup-before-modify), and comprehensive BDD coverage for the core scenarios. The H1 issue (
TransactionSandboxatomicity bypass) should be addressed before merge as it directly undermines the feature's purpose. The medium-severity items (M1–M5) are worth fixing but are in less-likely code paths.baf3d9f9f13a6dcd6d74Code Review Report — PR #1146 (
fix/atomic-sandbox-commit, issue #925)Reviewer: Automated deep review (4 global cycles across all categories)
Scope: Code changes in branch
fix/atomic-sandbox-commitplus immediate surrounding codeFiles reviewed:
_fs_utils.py,manager.py,protocol.py,copy_on_write.py,overlay.py,git_worktree.py,transaction_sandbox.py, BDD features/steps, Robot Framework testsReference: Specification
docs/specification.md(sandbox sections, lines 45938, 19193, 24632)Summary
The implementation achieves its primary goal:
commit_all()is now atomic with rollback support. The new_fs_utilsmodule,AtomicCommitErrorexception, and LIFO rollback logic are well-structured. The commit-ordering strategy (rollbackable-first, non-rollbackable-last) is sound. However, the review identified 17 findings across multiple categories.HIGH Severity
H1.
backup_directorydoes not preserve directory timestamps correctlyCategory: Bug
File:
src/cleveragents/infrastructure/sandbox/_fs_utils.py:54-57(subdirs), lines39-41(root)Directory timestamps are set inside the
os.walkloop, immediately after creating each subdirectory. However, subsequent file-copy operations into that directory update itsmtime(standard POSIX behavior). The root directory timestamps are set on line 41 before the walk even begins, so they are overwritten by every mkdir/file-create inside the root.Impact:
backup_directorydoes not actually preserve source directory timestamps. Any rollback based on this backup exposes directories with wrong mtimes.Fix: Collect directory (path, atime, mtime) tuples during the walk, then apply them all in a bottom-up post-walk pass (deepest directories first, root last), after all file copies are complete.
MEDIUM Severity
M1.
backup_directoryroot/directory permissions set too early — may cause backup failureCategory: Bug
File:
src/cleveragents/infrastructure/sandbox/_fs_utils.py:40(root),56(subdirs)Permissions are applied to the backup destination directory before content is written into it. If the source directory has restrictive permissions (e.g.,
0o500— read+execute only), the backup destination is chmod'd to0o500, and subsequent file-write operations inside it will fail withPermissionError.Impact: Backup (and therefore rollback) fails for directories with write-restrictive permissions.
Fix: Apply permissions in the same bottom-up post-walk pass as timestamps (after H1 fix).
M2. Thread-safety gap in
commit_allCategory: Bug / Concurrency
File:
src/cleveragents/infrastructure/sandbox/manager.py:243-294The method acquires
_lockto snapshot the sandboxes list (line 243-244), then releases the lock before iterating and committing. Between snapshot and commit, another thread could callcleanup_all,rollback_all, or anothercommit_allfor the sameplan_id, modifying the underlying sandboxes or_active_sandboxesdict.Impact: Concurrent same-plan calls could cause double-commit, commit-after-cleanup, or other race conditions.
Fix: Either hold the lock for the entire commit loop, or document that concurrent
commit_allcalls for the same plan are not supported (and add a per-plan guard).M3. Asymmetric error handling in
commit_all— dual return/raise contractCategory: Bug / API Design
File:
src/cleveragents/infrastructure/sandbox/manager.py:330-348When a
SandboxErroroccurs during commit, the method returns aCommitResult(success=False). When a non-SandboxErroroccurs, it raisesAtomicCommitError. Callers must handle both a failed result and an exception for full correctness. This inconsistent contract is easy to misuse.Impact: Callers that only check
CommitResult.successwill missAtomicCommitErrorexceptions (and vice versa).Suggestion: Consider unifying: always return
CommitResult(success=False)and attach the exception info, or always raise (wrappingSandboxErrortoo).M4.
OverlaySandbox.rollback()fromACTIVE—dirs_exist_ok=Falsewithignore_errors=TruermtreeCategory: Bug
File:
src/cleveragents/infrastructure/sandbox/overlay.py:455-460The ACTIVE rollback path calls
shutil.rmtree(self._merged_dir, ignore_errors=True)thenshutil.copytree(..., dirs_exist_ok=False). Ifrmtreesilently fails (e.g., permission error on a file),copytreewill raiseFileExistsErrorbecause the directory still exists.Note: The COMMITTED rollback path (line 441) correctly uses
dirs_exist_ok=Truefor exactly this reason.Fix: Use
dirs_exist_ok=Trueon line 460, or don't useignore_errors=Trueon the rmtree (so the error surfaces early).M5.
commit_alldoes not implement bottom-up commit ordering per specCategory: Specification Compliance
File:
src/cleveragents/infrastructure/sandbox/manager.py:294Spec ref: Line 24632 — "Sandbox commit: Bottom-up (child before parent)"
The specification requires bottom-up ordering (child resources committed before parents). The current implementation uses insertion order partitioned into rollbackable-first / non-rollbackable-last, with no DAG traversal to determine parent/child relationships.
Impact: If resources are inserted parent-first (the natural top-down creation order), they'll be committed parent-first, violating the spec. This could cause issues with cross-resource dependencies.
Suggestion: This may be acceptable as a known limitation (document it), or implement DAG-aware ordering using the boundary cache/resource registry.
M6.
backup_directoryinner try/except only catchesOSError— non-OSError leaks temp directoryCategory: Bug / Resource Leak
File:
src/cleveragents/infrastructure/sandbox/copy_on_write.py:260-264andoverlay.py:307-311If
backup_directoryraises a non-OSErrorexception (e.g.,MemoryError,TypeErrorfrom an unexpectedNone), the inner except doesn't fire, the outerexcept OSErroron line 282 doesn't fire either, andbackup_diris leaked on disk.Fix: Catch
Exception(orBaseException) in the inner handler, matching the broader catch philosophy used elsewhere in this PR.M7. BDD test for directory timestamp preservation passes by coincidence
Category: Test Flaw
File:
features/steps/sandbox_fs_utils_steps.py—step_then_dir_timestamps_matchThe test creates source directories and files, then checks
abs(src_mtime - dst_mtime) < 1.0. Because directories are freshly created and backup runs immediately, both source and destination directories have "approximately now" mtimes — the test passes because both timestamps are recent, not becausebackup_directorycorrectly preserved them.If the source directories had distinctive old timestamps (e.g.,
os.utime(src_dir, (2000000.0, 2000000.0))), the test would fail, exposing bug H1.Fix: Set distinctive directory timestamps in the Given step and verify they're preserved exactly (as is done for files).
LOW Severity
L1. TOCTOU race in
safe_restoremkdtemp+rmdir patternCategory: Security
File:
src/cleveragents/infrastructure/sandbox/_fs_utils.py:100-108safe_restorecallsmkdtemp()(creates dir),os.rmdir()(removes it), thenos.rename(target, stale). Between rmdir and rename, another process with write access to the parent directory could create a file/symlink at the stale path.Risk: Very low in practice due to random naming and narrow timing window.
L2. Duplicate
_compute_diffimplementationsCategory: Code Quality
Files:
copy_on_write.py:426-473andoverlay.py:561-596Both sandbox classes have functionally identical
_compute_diffstatic methods. The_fs_utilsmodule was introduced for shared backup/restore logic, but_compute_diffwasn't extracted.Suggestion: Extract to
_fs_utils.compute_diff()to eliminate duplication.L3. Inline imports in
commit_allCategory: Code Quality
File:
src/cleveragents/infrastructure/sandbox/manager.py:261-264NoSandboxandTransactionSandboxare imported inside the method body (to avoid circular imports). While Python caches imports, this is a code smell in a critical path.Suggestion: Consider restructuring the module dependency to allow top-level imports, or use a type-checking flag/protocol-based check instead of
isinstance.L4.
datetime.now()without timezoneCategory: Code Quality
Files:
manager.py:342,copy_on_write.py:161,321,overlay.py:373, etc.Multiple
CommitResultandSandboxContextinstances use naivedatetime.now(). While this matches existing codebase patterns, it's worth noting for future distributed/server-mode scenarios.L5. No test for
GitWorktreeSandboxrollback from COMMITTEDCategory: Test Coverage Gap
File:
git_worktree.py:496-506The
git reset --hard <pre_merge_commit>path for undoing a merge is not covered by any BDD or Robot Framework test in this PR. BothCopyOnWriteSandboxandOverlaySandboxhave explicit rollback-from-COMMITTED scenarios.L6. No test for
commit_allwith only non-rollbackable sandboxesCategory: Test Coverage Gap
File:
manager.py:266-293If all sandboxes are
NoSandboxorTransactionSandbox, therollbackablelist is empty and the commit loop only processes non-rollbackable entries. If one fails,_rollback_committedis called with an emptycommittedlist. This path is untested.L7. No test for
backup_directoryskipping non-regular filesCategory: Test Coverage Gap
File:
_fs_utils.py:66-72The code skips FIFOs, sockets, and device files with a warning. No BDD scenario verifies this behavior.
L8. Redundant
Nonechecks after assertions in test stepsCategory: Code Quality
Files:
sandbox_manager_coverage_steps.py:194,728Pattern:
assert sandbox is not Nonefollowed byif sandbox is not None:. Theifis dead code since the assertion would fail first.L9.
commit_alldocstring inaccurately describes exception behaviorCategory: Documentation
File:
src/cleveragents/infrastructure/sandbox/manager.py:234-238The docstring says "the original exception is re-raised", but the code actually wraps it in
AtomicCommitErrorand raises that instead. The caller receivesAtomicCommitError(with__cause__), not the raw exception.Informational
I1. Specification contradiction documented correctly
The commit message and
commit_alldocstring (lines 211-216) properly document the contradiction between spec line 45938 (atomic apply) and line 19193 (partial apply). The atomicity requirement from 45938 was chosen per issue #925. A spec update for line 19193 is recommended.I2. LIFO rollback order vs. spec's top-down rollback
The
_rollback_committeddocstring (lines 369-375) correctly distinguishes between LIFO batch-undo ordering and the spec's top-down DAG-traversal rollback ordering (line 24632). This is a reasonable design choice.Findings Summary
Recommended priority: Fix H1 and M1 together (they share the same root cause and fix), then address M4 and M6 (one-line fixes). M2 and M3 are design-level decisions that may warrant discussion.
3a6dcd6d747a47e9dcdcCode Review Report — PR #1146
fix(sandbox): make commit_all atomic per specificationReviewer: Automated code review (4 full iterative cycles)
Scope: All code changes in branch
fix/atomic-sandbox-commitplus close connections to surrounding codeReference: Issue #925, specification lines 45938 (atomic apply), 19193 (partial apply), 24632 (DAG ordering)
Summary
Overall this is a well-structured change that correctly addresses the specification atomicity requirement (line 45938). The LIFO rollback strategy, non-rollbackable sandbox ordering, and
AtomicCommitErrorexception design are sound. The_fs_utilsextraction is a solid refactor eliminating code duplication. However, several issues were identified across 4 review cycles that should be addressed before merge.Findings: 13 total — 2 High, 5 Medium, 6 Low
HIGH Severity
H1. BUG —
commit_alldoes not hold lock during the commit loop (Thread-safety violation)File:
manager.py:249-256Category: Bug / Concurrency
The lock is released after copying the sandbox list but before the status filter and the commit loop:
Another thread could (a) modify a sandbox's status between the status check and the
sandbox.commit()call, (b) add or remove sandboxes from the plan's tracking dict, or (c) callcommit_allon the same plan concurrently, resulting in double-commits or missed sandboxes.Recommendation: Either hold the lock for the entire
commit_alloperation (acceptable since sandboxcommit()does I/O and this prevents concurrentcommit_allfor the same plan — which is the intended atomic behavior), or re-verify each sandbox's status immediately before callingcommit()inside the loop and document the concurrency contract.H2. BUG — Non-
OSErrorexceptions during file-copy phase leave original directory partially corruptedFile:
copy_on_write.py:282,overlay.py:329Category: Bug / Data Integrity
The outer exception handler in
commit()only catchesOSError:If a non-
OSErrorexception occurs during the file-copy phase (lines 267-280 incopy_on_write.py, lines 314-327 inoverlay.py) — e.g.,MemoryError,KeyboardInterrupt(aBaseException), or an unexpectedTypeError— the backup exists butsafe_restoreis never called. The original directory is left partially modified (some files copied, some not). The exception propagates tocommit_all, which calls_rollback_committed, but since this sandbox was never added to thecommittedlist (it raised during commit), no rollback is attempted on it. The_pre_commit_backupis preserved but the original remains corrupted until manual intervention.The commit message states "Pre-commit backup exception handler catches Exception (not just OSError)" — this is accurate for the inner handler around
backup_directory()(line 262), but not for the outer handler that triggerssafe_restore.Recommendation: Change
except OSErrortoexcept Exception(orexcept BaseExceptionforKeyboardInterruptsafety) in bothcopy_on_write.py:282andoverlay.py:329, mirroring the pattern already used inrollback()(which catchesException).MEDIUM Severity
M1. SECURITY — TOCTOU window in
safe_restorebetweenrmdirandrenameFile:
_fs_utils.py:135-143Category: Security / Race Condition
The commit message states "safe_restore now uses a unique temporary name (via mkdtemp) for the stale directory instead of a predictable sibling, eliminating TOCTOU race conditions". While
mkdtempproduces an unpredictable name (mitigating the symlink-attack vector), thermdir+renamepair re-introduces a narrow TOCTOU window: betweenrmdirandrename, another process with write access toparent_dircould create a file or directory at the same path.Recommendation: Instead of
mkdtemp+rmdir, consider usingtempfile.mkdtemp()and thenos.rename(target_path, os.path.join(stale, "original"))to rename into the mkdtemp directory (which is guaranteed to exist and be owned by the current process). Alternatively, document the accepted residual risk.M2. BUG —
OverlaySandboxrollback from COMMITTED may double-mount if_unmount_overlayfails silentlyFile:
overlay.py:421-432Category: Bug / Resource Leak
_unmount_overlay()suppresses errors (line 559-564:logger.warningonly). If the unmount fails (e.g., busy mount, permission error), the subsequent_mount_overlay()will attempt to mount OverlayFS on a still-mounted path, leading to a double-mount or mount error. The mount error would then propagate asSandboxCreationError, leaving the sandbox in an inconsistent state.Recommendation: Either check the unmount result and bail out before remounting, or make
_unmount_overlayraise on failure when called from rollback (as opposed to cleanup where suppression is appropriate).M3. TEST FLAW —
_register_tmpdir_cleanupdoes not actually trigger cleanupFile:
features/steps/sandbox_fs_utils_steps.py:34-38Category: Test Flaw / Resource Leak
The cleanup lambdas are appended to
context._cleanup_handlers, but Behave does not automatically invoke a list named_cleanup_handlers. Behave's cleanup mechanism iscontext.add_cleanup(fn)or explicitafter_scenariohooks. These temporary directories will leak on every test run.Recommendation: Replace with
context.add_cleanup(lambda: shutil.rmtree(tmpdir, ignore_errors=True))which Behave invokes automatically at scenario teardown.M4. BUG —
compute_diffdoes not detect symlink target changesFile:
_fs_utils.py:164-204Category: Bug / Correctness
compute_diffusesos.walkto collect files andfilecmp.cmp(shallow=False)to compare content. However:os.path.isfile()inos.walk(default behavior), so they appear as regular files.filecmp.cmpcompares the resolved content, not the symlink targets. Two symlinks pointing to different paths but with identical resolved content would be reported as unchanged.backup_directorypreserves symlinks as symlinks (os.readlink+os.symlink), creating a mismatch between what is backed up and whatcompute_diffdetects.Recommendation: Add symlink target comparison in
compute_difffor entries whereos.path.islink()returnsTruein either tree.M5. BUG — Deleted directories not detected or cleaned up during commit
File:
copy_on_write.py:277-280,overlay.py:324-327Category: Bug / Correctness
compute_diffonly tracks files, not directories. If an entire subdirectory is deleted in the sandbox, the individual files are removed from the original, but the now-empty parent directories remain. After commit, the original may contain stale empty directories that did not exist in the sandbox.Recommendation: After processing deleted files, walk the deleted file paths in reverse depth order and remove empty parent directories up to (but not including) the original root.
LOW Severity
L1. PERFORMANCE — Full directory backup on every commit with changes
File:
copy_on_write.py:256-265,overlay.py:303-312Category: Performance
backup_directorycopies the entire original directory even when only a single file changed. For large directories (tens of thousands of files), this is expensive. The no-change skip optimization is good, but a targeted backup (copying only files that will be overwritten) would be significantly more efficient.Recommendation: Consider backing up only the files identified by
compute_diffaschanged_files+deleted_files(the files that will be modified or removed), rather than the entire directory tree. This would require updatingsafe_restoreto perform a targeted restore.L2. CODE QUALITY — Significant code duplication between
CopyOnWriteSandboxandOverlaySandboxFile:
copy_on_write.py,overlay.pyCategory: Code Quality / Maintainability
The
commit()method bodies (backup creation, file copy, error handling withsafe_restore) and therollback()COMMITTED path are nearly identical between the two classes. The_fs_utilsextraction is a good start but the remaining duplication means bug fixes must be applied in two places (as evidenced by H2 appearing in both files).Recommendation: Extract a shared base class or mixin (
FilesystemSandboxBase) containing the commoncommit()androllback()logic, with subclass hooks for strategy-specific behavior (e.g., overlay mount/unmount).L3. TEST GAP — No test for
backup_directoryskipping special files (FIFOs, sockets)File:
_fs_utils.py:91-97Category: Test Coverage
The commit message documents "backup_directory skips non-regular files (FIFOs, sockets, device files) with a warning to prevent hangs on special files", but there is no BDD scenario or step definition that creates a FIFO/socket and verifies it is skipped with a warning.
Recommendation: Add a scenario that creates a FIFO (
os.mkfifo) in the source directory and verifies it is absent from the backup.L4. TEST GAP — No test for concurrent
commit_allcallsCategory: Test Coverage
There is no test exercising concurrent calls to
commit_allfrom multiple threads, which is the scenario where H1 would manifest. Given thatSandboxManagerdocuments itself as "Thread-safe via an internal reentrant lock" (line 4), this gap is notable.Recommendation: Add a threading test that calls
commit_allfrom two threads simultaneously and verifies no double-commits or data corruption occur.L5. TEST GAP — No test for
safe_restorewith symlinked contentCategory: Test Coverage
safe_restorepassessymlinks=Truetoshutil.copytree, but no existing scenario verifies that symlinks in the backup are correctly restored as symlinks in the target.Recommendation: Add a scenario in
sandbox_fs_utils.featurethat creates a backup containing symlinks and verifies they are preserved aftersafe_restore.L6. DOCUMENTATION — Issue #925 body references wrong specification line
Category: Documentation
The issue body states "The specification (line 45844) requires..." but line 45844 is
self._subscriptions.setdefault(event_type, []).append(handler)(EventBus code). The correct reference is line 45938: "The Apply phase is an atomic operation — either all sandbox changes are committed to the real resources, or none are." The commit message correctly uses line 45938.Recommendation: Update issue #925 body to reference line 45938 for traceability.
Summary Table
manager.pycommit_alldoes not hold lock during commit loopcopy_on_write.py,overlay.py_fs_utils.pysafe_restorebetweenrmdirandrenameoverlay.py_unmount_overlayfails silentlysandbox_fs_utils_steps.py_register_tmpdir_cleanupnever triggers — temp dirs leak_fs_utils.pycompute_diffdoes not detect symlink target changescopy_on_write.py,overlay.pycopy_on_write.py,overlay.pycopy_on_write.py,overlay.py_fs_utils.pymanager.pycommit_all_fs_utils.pysafe_restorewith symlinks7a47e9dcdc1a7ebf1825Code Review Report — PR #1146
fix(sandbox): make commit_all atomic per specificationReview scope: All code changes in branch
fix/atomic-sandbox-commitplus close connections to surrounding code.Review method: 3 iterative global passes across all categories (bugs, security, performance, test coverage/flaws). Each cycle re-examined all categories until no new issues were found.
Specification references:
docs/specification.mdlines 45938 (atomic apply invariant), 19193 (partial apply contradiction), 24632 (DAG ordering), 19677-19761 (Apply phase), 28636-28663 (mid-execute correction).Overall Assessment
The PR delivers a well-thought-out implementation of the atomic
commit_allsemantics required by spec line 45938. The rollbackable-first / non-rollbackable-last commit ordering, LIFO rollback,AtomicCommitErrorwrapping, and the shared_fs_utilsextraction are all sound design choices. The spec contradiction (line 45938 vs 19193) is correctly documented. Thesafe_restorerename-into-mkdtemp approach eliminating the TOCTOU window is particularly well done.The findings below are organized by severity (High → Low) within each category.
1. Bugs
[BUG-1]
CopyOnWriteSandbox.rollback()from ACTIVE usesdirs_exist_ok=False— inconsistent with OverlaySandbox fix | MediumFile:
src/cleveragents/infrastructure/sandbox/copy_on_write.py:377The commit message states: "OverlaySandbox rollback from ACTIVE now uses
dirs_exist_ok=Truefor userspace fallback to preventFileExistsErrorwhenrmtreewithignore_errors=Truesilently fails."This fix was applied in
OverlaySandbox.rollback()(overlay.py:489) but not inCopyOnWriteSandbox.rollback()(copy_on_write.py:377), which still usesdirs_exist_ok=False. The same failure mode applies:shutil.rmtree(self._sandbox_path, ignore_errors=True)at line 370 can silently fail (e.g., permission error on a file), leaving the directory partially intact. The subsequentshutil.copytree(..., dirs_exist_ok=False)at line 377 would then raiseFileExistsError.Recommendation: Change
copy_on_write.py:377todirs_exist_ok=Truefor consistency.[BUG-2]
OverlaySandbox.rollback()from COMMITTED over-reports failure when merged dir reset fails after original was restored | MediumFile:
src/cleveragents/infrastructure/sandbox/overlay.py:404-496When rolling back from COMMITTED:
safe_restore()successfully restores the original directory (line 419)self._status = SandboxStatus.ERROREDand raisesSandboxRollbackError_rollback_committed()adds this sandbox tofailed_rollback_idsThe critical data (original directory) was correctly restored, but the error metadata reports this sandbox as a failed rollback. Callers of
commit_all()cannot distinguish between "original not restored" (actual data loss risk) and "original restored but sandbox workspace is inconsistent" (no data loss).Recommendation: Consider splitting the rollback into two phases: (1) restore original (critical), (2) reset workspace (non-critical). Only report rollback failure if phase 1 fails.
[BUG-3]
AtomicCommitErrornot exported fromsandbox/__init__.py| MediumFile:
src/cleveragents/infrastructure/sandbox/__init__.pyAtomicCommitErroris a new public exception that callers need to catch whencommit_all()encounters a non-SandboxErrorexception. It is defined inprotocol.pyand raised bymanager.py, but it is not imported or included in__all__in the package's__init__.py.Callers must use
from cleveragents.infrastructure.sandbox.protocol import AtomicCommitErrorinstead of the expectedfrom cleveragents.infrastructure.sandbox import AtomicCommitError.Recommendation: Add
AtomicCommitErrorto the imports and__all__in__init__.py.2. Test Coverage Gaps
[TEST-1] Missing test for
CopyOnWriteSandbox.rollback()from COMMITTED state | MediumThe PR adds a BDD scenario for
OverlaySandboxrollback from COMMITTED (overlay_sandbox.feature:200-210) but there is no corresponding test forCopyOnWriteSandbox.rollback()from COMMITTED with a real pre-commit backup. This is a critical new code path (copy_on_write.py:352-367) that callssafe_restore()and is untested with a real sandbox instance.The manager-level BDD tests use mocks, which don't exercise the actual
safe_restoreintegration inCopyOnWriteSandbox.[TEST-2] Missing test for
GitWorktreeSandbox.rollback()from COMMITTED state | MediumNo BDD or Robot test exercises the
git reset --hard <pre_merge_commit>rollback path ingit_worktree.py:497-506. This is a destructive operation (hard reset on the original branch) introduced by this PR. The commit message includes a docstring warning about multi-worktree safety, but there is no test verifying the merge is actually undone.[TEST-3] Missing test for
TransactionSandbox.rollback()from COMMITTED raisingSandboxRollbackError| MediumTransactionSandbox.rollback()now raisesSandboxRollbackErrorwhen called from COMMITTED state (transaction_sandbox.py:284-292). The existing testTransactionSandbox rollback in wrong state raises(database_resources.feature:186) tests rollback from PENDING state, not from COMMITTED state. The new raising behavior is untested.[TEST-4] Robot integration test uses mocks instead of real sandboxes | Medium
File:
robot/helper_sandbox_integration.py:336-405The
_manager_atomic_commit()function is labeled as an integration test but replaces both sandboxes withMagicMockobjects. This does not exercise the actual commit/rollback code paths of any real sandbox implementation. Only theOverlaySandboxCOMMITTED rollback scenario tests with a real sandbox.[TEST-5] Orphaned step definitions in
sandbox_fs_utils_steps.py| LowFile:
features/steps/sandbox_fs_utils_steps.py:108-124, 272-277Two step definitions are defined but not used by any
.featurefile:@given("a stale atomic-rollback-old directory exists")(line 108)@then("no stale rollback directory should remain")(line 272)Additionally, these steps reference the old
".atomic-rollback-old"naming pattern, butsafe_restorenow uses".atomic-rollback-old-"prefix with mkdtemp (different naming scheme). These appear to be leftovers from an earlier design iteration.[TEST-6] No direct test for
compute_diff()in_fs_utils| Lowcompute_diff()was extracted from bothCopyOnWriteSandboxandOverlaySandboxinto_fs_utils. While it is tested indirectly through commit scenarios, there is no BDD scenario insandbox_fs_utils.featurethat directly tests it, unlikebackup_directoryandsafe_restorewhich have dedicated scenarios.3. Performance
[PERF-1] Full directory backup before every commit with changes | Medium
Files:
copy_on_write.py:252-264,overlay.py:299-312,_fs_utils.py:21-107For every commit that detects changes,
backup_directory()copies the entire original directory tree. For large repositories (thousands of files, deep trees), this is O(n) in both disk space and time. The same-filesystem mkdtemp optimization (avoiding cross-device copies) and the no-change skip are good mitigations, but the backup cost is still proportional to the total directory size regardless of how many files actually changed.Recommendation: Consider a targeted backup strategy that only backs up files in the
changed_files + deleted_fileslists. This would reduce backup time and disk usage proportionally to the actual change set. Alternatively, document the performance characteristic for large directory scenarios.4. Design Observations (Non-blocking)
[DESIGN-1] Specification contradiction documented but not resolved
The
commit_alldocstring (manager.py:215-220) correctly documents the contradiction between spec line 45938 (atomic) and line 19193 (partial apply). This PR implements the atomicity requirement. A spec update for line 19193 may be needed to align the specification with the implementation.[DESIGN-2] LIFO rollback ordering vs spec DAG ordering
The
_rollback_committeddocstring (manager.py:367-373) correctly distinguishes LIFO (batch undo) from the spec's "top-down" DAG ordering (line 24632). This is a valid architectural note. The spec's DAG ordering applies within a single sandbox domain; LIFO applies to the batch of independent sandboxes.Summary: 3 medium bugs, 4 medium test gaps, 1 medium performance note, 3 low-severity items. No high-severity or security issues found.
1a7ebf1825614c350433Code Review Report — PR #1146
fix/atomic-sandbox-commitReviewer: AI-assisted review (4 full global cycles across all categories)
Scope: All code changes in branch
fix/atomic-sandbox-commitplus close connections to surrounding codeIssue: #925 —
commit_allmust be all-or-nothing atomicSpec reference:
docs/specification.md(lines 45938, 19193, 24632)Overall Assessment
This is a substantial and well-thought-out change. The commit message is exceptionally thorough, documenting every design decision. The atomicity guarantee for
commit_allis correctly implemented for the single-threaded case, and the LIFO rollback pattern is sound. The new_fs_utilsmodule withbackup_directory/safe_restore/compute_diffis a solid extraction that eliminates duplication. The BDD coverage is extensive.That said, 4 review cycles across all categories surfaced findings that warrant attention, organized below by severity and category.
SEVERITY: HIGH
H1 — Bug:
OverlaySandbox.rollback()from COMMITTED with no backup unnecessarily resets merged directoryFile:
overlay.py:404-466When
commit()detects no changes, it skips the pre-commit backup (_pre_commit_backupremainsNone). On rollback from COMMITTED, the no-backup path correctly logs a debug no-op message (line 410). However, the merged-directory reset block at lines 425-466 always executes regardless of whether a backup existed.For real OverlayFS mode, this attempts
umount(line 434) on a live mount that was never modified. Ifumountfails (e.g., busy mount, stale NFS handle),SandboxRollbackErroris raised — turning what should be a harmless no-op rollback into a failure. In thecommit_allatomic recovery path, this would place the sandbox infailed_rollback_ids, producing a false alarm.For userspace fallback, this deletes and re-copies the entire merged directory unnecessarily.
Suggested fix: Guard the merged-dir reset with the same
_pre_commit_backup is not Nonecondition, or use anelseclause, matching the pattern already used inCopyOnWriteSandbox.rollback().H2 — Bug: Dual error-reporting paths in
commit_allcreate asymmetric APIFile:
manager.py:296-346When a sandbox
commit()raisesSandboxError, the method returns aCommitResultlist withsuccess=Falseand rollback metadata inresult.metadata["rolled_back"]. When it raises a non-SandboxErrorexception, the method raisesAtomicCommitErrorwith rollback metadata inexc.rolled_back_ids.Callers must handle both a return-value error path and an exception path, with rollback metadata in structurally different locations (
dict["rolled_back"]vslistattribute). This is error-prone: a caller that only checks the return value will missAtomicCommitError, and vice versa.Suggested fix: Consider one of:
AtomicCommitError(or aSandboxCommitErrorsubclass) on failure, carrying all metadata in exception attributes.CommitResultlist, wrapping non-SandboxErrorexceptions in the result metadata.H3 — Bug:
CopyOnWriteSandbox.get_path()rejectsROLLED_BACKstatus, contradicting its own rollback docstringFile:
copy_on_write.py:191-194vscopy_on_write.py:330-337The
rollback()docstring (introduced in this commit) states: "The sandbox transitions toROLLED_BACKand can be re-activated viaget_path." Butget_path()only acceptsCREATEDandACTIVEstatuses —ROLLED_BACKis rejected withSandboxStateError.This is inconsistent with
OverlaySandbox.get_path()which correctly acceptsROLLED_BACK(line 238-242). The status transition table also allowsROLLED_BACK -> ACTIVE.Impact: After a COMMITTED rollback via
commit_all, theCopyOnWriteSandboxcannot be re-activated. Whilecommit_alldoesn't attempt re-activation (so atomicity is not broken), the docstring promise is violated and the behavioral inconsistency between sandbox implementations could cause issues for future callers.Suggested fix: Either add
SandboxStatus.ROLLED_BACKto the allowed statuses inCopyOnWriteSandbox.get_path(), or correct the docstring to state that re-activation is not supported.H4 — Bug:
commit_allnon-rollbackable classification only handles built-in typesFile:
manager.py:264-287The rollbackable/non-rollbackable partitioning uses
isinstance(sb, NoSandbox)andisinstance(sb, TransactionSandbox). Custom sandbox strategies registered viaSandboxStrategyRegistrythat cannot support rollback-from-COMMITTED would be classified as rollbackable and committed early in the batch. If a later sandbox fails, their rollback would also fail — but the error metadata would misleadingly list them infailed_rollback_idsrather thanrolled_backbeing empty.Suggested fix: Consider adding a protocol method or property (e.g.,
supports_committed_rollback: bool) to theSandboxprotocol so the classification is strategy-agnostic.H5 — Bug: Thread-safety gap in
commit_allFile:
manager.py:249-348The method acquires
_lockonly to snapshot the sandbox list (line 249-250), then releases it. The entire commit loop, including_rollback_committed, runs outside the lock. If two threads callcommit_allfor the sameplan_idconcurrently:commit()call would hitSandboxStateError(sandbox already COMMITTED), triggering rollback of any sandboxes it had already committed.The class docstring says "Thread-safe: all mutable state is protected by
_lock", but the sandbox objects themselves are mutable and not protected.Suggested fix: Hold the lock for the entire
commit_alloperation, or use a per-plan lock to serialize commit operations for the same plan. Alternatively, document thatcommit_allis not safe for concurrent calls on the sameplan_id.SEVERITY: MEDIUM
M1 — Bug:
OverlaySandbox.rollback()double-wrapsSandboxRollbackErrorFile:
overlay.py:433-449andoverlay.py:492-496When the
umountcall fails during COMMITTED rollback, the innerexceptblock raisesSandboxRollbackError(line 445). This propagates to the outerexcept Exception(line 492) which catches it and wraps it in anotherSandboxRollbackError, producing a confusing double-wrapped error chain:Suggested fix: Either re-raise
SandboxRollbackErrorwithout wrapping in the outer handler, or catch only non-SandboxRollbackErrorexceptions in the outer block.M2 — Bug:
CopyOnWriteSandbox.rollback()from COMMITTED doesn't reset sandbox copyFile:
copy_on_write.py:351-367After rollback from COMMITTED,
safe_restorerestores the original directory, but the sandbox copy at_sandbox_pathstill contains the pre-rollback modifications. Althoughget_path()currently blocks re-activation (H3), if H3 is fixed, the stale sandbox copy would cause previously committed changes to reappear incompute_diffon the next commit cycle.For consistency with
OverlaySandbox(which resets its merged directory after COMMITTED rollback), the sandbox copy should be reset from the restored original.M3 — Bug:
rollback_allcatches onlySandboxError, inconsistent with_rollback_committedFile:
manager.py:434vsmanager.py:398_rollback_committedcatchesException(line 398), butrollback_allcatches onlySandboxError(line 434). An unexpected non-SandboxErrorexception inrollback_allwould crash the caller and prevent remaining sandboxes from being rolled back.Suggested fix: Use
Exceptioninrollback_allfor consistency, or document the different behavior.M4 — Performance:
compute_diffwalks both directory trees fully with deep comparisonFile:
_fs_utils.py:165-205compute_diffwalks both sandbox and original directory trees to build complete file sets, then callsfilecmp.cmp(shallow=False)on every common file (reading full contents). For large directories with few changes, this is significantly more expensive than necessary.For commit-time use, a change-tracking approach (dirty flags, inotify, or git status) would avoid the full tree walk. This is a pre-existing pattern that was extracted into
_fs_utils, but worth flagging since the extraction makes it a shared concern.M5 — Bug:
OverlaySandbox.create()userspace fallback usessymlinks=FalseFile:
overlay.py:183-188The userspace copytree at creation uses
symlinks=False, following symlinks instead of preserving them. This means the merged directory has regular files where the original has symlinks. On commit,compute_diffcompares file contents (equal), but the file type differs. More critically,shutil.copy2during commit would copy the regular file back, replacing the original's symlink with a regular file — a silent behavioral change.This is a pre-existing issue not introduced by this commit, but
compute_diffwas extracted to_fs_utilsas part of this change, making it a shared concern.M6 — Test Quality: Integration test doesn't verify rollback mock was called
File:
robot/helper_sandbox_integration.py:336-402The
_manager_atomic_commitintegration test verifiesCommitResultmetadata containsrolled_back, but does not assert that the mock sandbox'srollback()method was actually called. If the manager populated the metadata without calling rollback, the test would still pass.Suggested fix: Add
mock_ok.rollback.assert_called_once()after the commit_all call.M7 — Test Quality: Committable mock sandboxes don't simulate status transitions
File:
features/steps/sandbox_manager_coverage_steps.py:178-211The
_make_mock_sandboxhelper creates mocks with a staticACTIVEstatus that never changes toCOMMITTEDaftercommit(). While this doesn't break current manager tests (the manager doesn't re-check status post-commit), it means the mocks don't accurately simulate real sandbox behavior and could mask bugs where post-commit status checks are added.SEVERITY: LOW
L1 — Test Coverage: No test for
backup_directorywith broken symlinksFile:
_fs_utils.py:86-88backup_directorycallsos.readlinkfor symlinks but doesn't handle the case where the symlink target doesn't exist (broken symlink). No BDD scenario covers this edge case. The function would preserve the broken symlink (which is correct), butcompute_diffwould fail withfilecmp.cmpon broken symlinks.L2 — Test Coverage: No test for
backup_directoryskipping non-regular filesFile:
_fs_utils.py:91-97The code logs a warning and skips FIFOs, sockets, and device files, but no BDD scenario verifies this behavior. The feature file
sandbox_fs_utils.featurecovers permissions, timestamps, symlinks, and restore — but not special file skipping.L3 — Test Coverage: No concurrency test for
SandboxManagerThe manager class is documented as thread-safe, but no test exercises concurrent operations. Given the thread-safety gap in H5, adding a concurrency test would help validate the guarantees.
L4 — Test Quality: Private attribute access in transaction sandbox steps
File:
features/steps/transaction_sandbox_coverage_steps.py(lines 143, 181, 255)Steps like
sb._connection = Noneandsb._connection = MagicMock(...)access private attributes, creating tight coupling to the internal implementation. If the attribute name changes, tests break silently. Consider adding a test-only method or using dependency injection for the connection.L5 — Test Quality: String-based error assertions are fragile
File:
features/steps/transaction_sandbox_coverage_steps.py:241, 274Assertions like
"cannot undo database commit" in str(context.caught_error).lower()break if error message wording changes. Consider asserting on exception type instead.L6 — Documentation: CHANGELOG entry has formatting inconsistencies
File:
CHANGELOG.md(lines 34, 56 of the diff)Two lines have inconsistent leading whitespace (three spaces instead of two):
"backup_directoryskips non-regular files..."(extra leading space)" on non-OSErrorfailures..."(extra leading space)L7 — Security:
backup_directorydoesn't preserve file ownershipFile:
_fs_utils.py:215-223The
_copy_filefunction preserves permissions and timestamps but not file ownership (uid/gid). When running as root (real OverlayFS mode) with varying file ownership, the backup would have all files owned by root. On restore, original ownership would be lost. This is unlikely to matter in most deployments but could affect root-mode OverlayFS usage.INFORMATIONAL NOTES
I1 — Specification Contradiction (Acknowledged)
The commit correctly notes the spec contradiction between line 45938 ("atomic apply — all or none") and line 19193 ("partial apply"). The implementation follows 45938 (atomicity). The docstring at
manager.py:215-220documents this. Consider filing a spec issue to resolve the contradiction.I2 — LIFO vs Spec's DAG-based Ordering
The
_rollback_committedmethod uses LIFO ordering for batch undo, while the spec's "top-down" rollback (line 24632) refers to resource DAG traversal within a single sandbox domain. The commit message correctly distinguishes these two contexts. The docstring atmanager.py:363-373documents this distinction.Summary: 5 High, 7 Medium, 7 Low findings across 4 review cycles. The core atomicity logic is sound. The most actionable items are H1 (OverlaySandbox no-op rollback bug), H2 (dual error paths), and H3 (docstring/behavior mismatch). The implementation overall reflects careful engineering with thorough consideration of edge cases.
614c3504338cf55ba449Code Review Report — PR #1146: Atomic
commit_all(#925)Reviewer: Automated deep review (3 global cycles across all categories)
Scope: Code changes in
fix/atomic-sandbox-commitbranch + close connections to surrounding codeCommit:
8cf55baSummary
The PR successfully implements the spec requirement (line 45938) for atomic
commit_allsemantics with comprehensive rollback support. The design is well-considered: rollbackable sandboxes commit first, non-rollbackable last; LIFO rollback order;AtomicCommitErrorwraps non-sandbox exceptions with metadata. However, three cycles of review uncovered 2 critical, 10 high, 9 medium, and 6 low severity findings across bugs, performance, security, test coverage, and documentation.Critical
C1 — BUG:
safe_restore()recovery fails withENOTEMPTYwhencopytreepartially succeeds_fs_utils.py:144-158When
shutil.copytree()(line 146-151) fails midway (e.g., disk full),target_pathis a partially-populated directory. The recovery path at line 157 callsos.rename(stale, target_path), but POSIXrename(2)requires the destination directory to be empty — it fails withENOTEMPTY/EEXISTiftarget_pathcontains any files from the partial copy.Impact: The original directory is stranded in the
stale_containertemp dir,target_pathcontains corrupt partial data, and the function's core invariant ("no data is lost") is violated. This affects every caller:CopyOnWriteSandbox.rollback(),OverlaySandbox.rollback(), and both sandboxes' commit-failure recovery paths.Fix: Add
shutil.rmtree(target_path, ignore_errors=True)before theos.rename(stale, target_path)in theexcept BaseExceptionblock. Or better — see C2.C2 — PERFORMANCE:
safe_restore()usesshutil.copytreewhereos.renamesuffices_fs_utils.py:146-151The backup is always on the same filesystem (callers use
dir=os.path.dirname(original_path)formkdtemp, confirmed atcopy_on_write.py:257-259,overlay.py:303-306, and documented in CHANGELOG line 41). Same-filesystemos.renameis O(1) and atomic. Yetsafe_restore()does:Replacing
copytreewithos.rename(backup_path, target_path)would:The module docstring (lines 5-7) also contradicts itself: it says functions "deliberately avoid
shutil.copytree" butsafe_restoreuses it.High
H1 — BUG:
OverlaySandbox.get_path()missingROLLED_BACK → ACTIVEtransitionoverlay.py:250-251This only handles
CREATED → ACTIVE. TheROLLED_BACKstatus is allowed by the guard at line 238, butget_path()never transitions it toACTIVE. Compare withCopyOnWriteSandbox.get_path()(line 203-204) which correctly handles both:Impact: After rollback,
OverlaySandboxis stuck inROLLED_BACK. Callingget_path()returns a valid path but the status never advances toACTIVE, so a subsequentcommit()(which requiresCREATEDorACTIVE) raisesSandboxStateError. The sandbox becomes permanently unusable after rollback, contradicting the state machine (ROLLED_BACK → ACTIVEatprotocol.py:117).H2 — BUG:
GitWorktreeSandbox.get_path()rejectsROLLED_BACKstatus entirelygit_worktree.py:303-306ROLLED_BACKis not in the allowed set, soget_path()raisesSandboxStateErrorafter any rollback. BothCopyOnWriteSandboxandOverlaySandboxincludeROLLED_BACKin their guards. The state machine explicitly allowsROLLED_BACK → ACTIVE(protocol.py:117), making this sandbox type unreusable after rollback — inconsistent with the protocol contract and with the other sandbox implementations.H3 — PERFORMANCE: Full directory backup for any change, regardless of size
copy_on_write.py:253-266,overlay.py:299-312If even one byte in one file changes,
backup_directory()copies the entire original directory. The diff is already computed at this point (changed_files,added_files,deleted_filesare known). A proportional approach would back up only files about to be overwritten (changed_files+deleted_filesdestinations), making cost O(changed) instead of O(total).For a 100GB directory with a 1-byte change, this generates 100GB of backup I/O.
H4 — PERFORMANCE:
compute_diff()usesshallow=False, reading all file contents_fs_utils.py:199filecmp.cmp(shallow=False)reads and byte-compares the entire contents of every common file. Sinceshutil.copytree(used at sandboxcreate()time) preservesmtimeandsize, any unmodified file has identical stat metadata. Usingshallow=True(stat-based comparison) would correctly identify unchanged files without I/O, only needing byte-level reads for files whose metadata differs.For 10,000 files with 5 modified, this currently reads all 10,000 files twice. With
shallow=True, only 5 files would need content comparison.H5 — PERFORMANCE: Triple disk usage during commit window
copy_on_write.py:240-275,overlay.py:286-321During commit, three full copies coexist: original + sandbox copy + pre-commit backup. For a 10GB directory, ~30GB consumed with no disk-space check. This can itself trigger the very failure (disk full) that the backup is meant to protect against.
H6 — SECURITY: Symlink preservation in
backup_directory/safe_restoreenables escape_fs_utils.py:69-70, 87-88backup_directorycopies symlinks with arbitrary targets. Whensafe_restore()recreates them viashutil.copytree(..., symlinks=True), symlinks pointing outside the sandbox boundary persist through the backup/restore cycle. A malicious symlink crafted during sandbox use survives rollback and points to any filesystem path.H7 — SECURITY: Setuid/setgid bit preservation during backup and restore
_fs_utils.py:106, 223os.chmod(dir_path, mode)andos.chmod(dst, stat.S_IMODE(src_stat.st_mode))copy the full permission mode including setuid (S_ISUID), setgid (S_ISGID), and sticky bits. If a source file has setuid-root permissions, the backup/restore cycle recreates a setuid binary — a privilege escalation vector. Consider masking out special bits:os.chmod(dst, stat.S_IMODE(st.st_mode) & ~(stat.S_ISUID | stat.S_ISGID)).H8 — SECURITY: TOCTOU race in
backup_directorybetweenislinkand_copy_file_fs_utils.py:86-90Between
os.path.islink(src_file)(line 86) and_copy_file(src_file, dst_file)(line 90), a local attacker could replace a regular file with a symlink._copy_fileusesopen(src, "rb")which follows symlinks, allowing exfiltration of arbitrary file contents into the backup directory. Standard mitigation: open withO_NOFOLLOWor useos.lstat()+os.open().H9 — TEST: No scenario tests the no-backup no-op rollback path for CopyOnWriteSandbox
copy_on_write.py:354-362When
commit()detects zero changes and skips the backup,rollback()fromCOMMITTEDtakes the_pre_commit_backup is Nonebranch (a debug-log no-op). No test exercises this path. All existing rollback-from-COMMITTED tests mutate a file first, ensuring a backup always exists.H10 — TEST: No scenario tests the no-backup no-op rollback path for OverlaySandbox
overlay.py:406-416Same gap as H9 but for OverlaySandbox. No scenario commits with zero changes and then rolls back.
Medium
M1 — BUG:
rollback_all()only rolls backACTIVEsandboxes, skipsCOMMITTEDmanager.py:439This skips
COMMITTEDsandboxes. The state machine allowsCOMMITTED → ROLLED_BACK(protocol.py:116), and_rollback_committedrelies on it. A caller invokingrollback_all()after manual commits (outsidecommit_all) would leave committed sandboxes un-rolled-back.M2 — BUG:
cleanup_all()catches onlySandboxError, may abort mid-loopmanager.py:471If a sandbox's
cleanup()raises a non-SandboxError(e.g., rawOSError,PermissionError), the exception propagates uncaught and aborts the loop. Remaining sandboxes are never cleaned up. Compare with_rollback_committed()(line 408) androllback_all()(line 444) which both correctly catchException.M3 — BUG: Git merge timeout can leave repository in
MERGE_HEADstategit_worktree.py:426-430, 540-596If
git mergetimes out, the original repository may be left withMERGE_HEADpresent (merge-in-progress state). The error handler clears_pre_merge_commit = None(line 433), losing the ability togit reset --hard.cleanup()removes the worktree/branch but never runsgit merge --aborton the original repo.M4 — PERFORMANCE:
backup_directoryre-walks the original tree thatcompute_diffjust walkedcopy_on_write.py:240-262,overlay.py:286-308The commit sequence does:
compute_diff()(walks both trees) thenbackup_directory()(walks original again). That is 3os.walktraversals where 2 would suffice. On directories with millions of entries, the redundant walk adds measurable latency.M5 — PERFORMANCE:
_copy_file()manual loop bypasses kernel zero-copy_fs_utils.py:215-219The manual
read(1MB)/writeloop avoidsshutil.copy2(to prevent test-patch interference per the docstring), but also bypassesos.sendfile()/copy_file_range()kernel zero-copy. For large files, this is 2-5x slower. Consider usingos.sendfile()directly (notshutil) for zero-copy without the patching concern.M6 — SECURITY: Inconsistent symlink handling between sandbox strategies
overlay.py:183,464,488vscopy_on_write.py:139,385CopyOnWriteSandboxusessymlinks=True(preserves symlinks);OverlaySandboxuserspace fallback usessymlinks=False(follows symlinks and copies target content). This means the same original directory produces different security characteristics depending on strategy. If the original containslink → /etc/shadow, the overlay sandbox materializes its contents into a temp directory.M7 — DOCUMENTATION: CHANGELOG entry is a single ~90-line bullet point
CHANGELOG.md:5-91The entire PR's behavioral changes are concatenated into one
- **Breaking (behavioral):**paragraph. This is effectively unreadable and defeats the purpose of a changelog. Consider splitting into sub-bullets per distinct change.M8 — TEST: No scenario tests ordered commit (rollbackable-first, non-rollbackable-last)
manager.py:302commit_all()sorts sandboxes so rollbackable ones commit first andNoSandbox/TransactionSandboxcommit last. No test verifies this ordering, so a regression that breaks the ordering would degrade the atomicity guarantee silently.M9 — TEST: No direct unit tests for
compute_diff()_fs_utils.py:165-205compute_diffis a public API in_fs_utilsbut has no dedicated test scenarios. It is only tested indirectly through commit paths. Edge cases (empty dirs, symlink-only trees, binary files) are never verified.Low
L1 — BUG:
git_worktree.pyerror handlers unconditionally clear_pre_merge_commitgit_worktree.py:433, 440Both timeout and error handlers set
self._pre_merge_commit = Noneeven if_pre_merge_commitwas already recorded and the merge partially succeeded. The lost reference makes even manual recovery harder.L2 — SECURITY:
backup_directory/safe_restoreno path boundary validation_fs_utils.py:21, 110Neither function validates that paths are within expected boundaries. A caller passing a crafted path could cause operations on arbitrary directories. The functions trust their callers completely.
L3 — TEST: No test for
safe_restorefailure during commit error recovery (double-failure path)copy_on_write.py:293-303,overlay.py:339-349If
commit()fails and thensafe_restore()also fails during recovery, the backup is preserved for manual recovery. No scenario covers this double-failure path.L4 — TEST: No test for
backup_directorynon-regular file skip path_fs_utils.py:91-97The
elsebranch logging "Skipping non-regular file during backup" for FIFOs/sockets/device files is never exercised.L5 — TEST: BDD steps manipulate 15+ private attributes of
SandboxManagersandbox_manager_coverage_steps.py(throughout)Steps access
context.manager._active_sandboxesdirectly for mock injection and assertions. This creates tight coupling to internal implementation details. Consider adding a test-only injection method.L6 — DOCUMENTATION:
safe_restoredocstring claims "atomically" but operation has non-atomic window_fs_utils.py:111Between
os.rename(target_path, stale)andcopytreecompletion,target_patheither doesn't exist or is partially populated. The composite operation is not atomic. Adopting theos.renamefix from C2 would narrow this window significantly.Summary Table
Recommended Priority
safe_restoreto useos.renameinstead ofcopytree(eliminates the critical data-loss bug AND the performance issue in one change)ROLLED_BACKtoOverlaySandbox.get_path()transition (one-line fix, blocks re-activation)ROLLED_BACKtoGitWorktreeSandbox.get_path()allowed statuses (consistency with protocol)rollback_all()to handleCOMMITTEDstatuscleanup_all()except clause toException8cf55ba449bd3d01f8e3Code Review Report — PR #1146 (
fix/atomic-sandbox-commit)Reviewer: Automated deep review (5 iterative cycles)
Scope: All code changes on branch
fix/atomic-sandbox-commit(commitbd3d01f8) plus closely connected surrounding codeReference: Issue #925, specification lines 19193 and 45938
Summary
The PR correctly implements the atomic
commit_allsemantics required by spec line 45938 and properly documents the contradiction with spec line 19193 (partial apply). The overall design — rollback-on-failure with LIFO ordering, non-rollbackable-last commit ordering, pre-commit backups, rename-basedsafe_restore, and the shared_fs_utilsmodule — is sound and well-documented.15 findings were identified across 5 review cycles. None are critical, but 3 are high severity bugs that could cause incorrect state or crash loops in production.
HIGH Severity
H1. BUG —
cleanup_abandonedcatches onlySandboxError, inconsistent with other batch methodsFile:
manager.py:515All other batch methods in this PR were hardened to catch
Exception:cleanup_all(line 471):except Exceptionrollback_all(line 444):except Exception_rollback_committed(line 408):except ExceptionBut
cleanup_abandonedstill catches onlySandboxError:A raw
PermissionError,OSError, or any non-SandboxErrorwill crash the loop and prevent remaining abandoned sandboxes from being cleaned up. This directly contradicts the hardening pattern applied to every other batch method in this PR.Fix: Change
except SandboxErrortoexcept Exceptionat line 515.H2. BUG —
OverlaySandbox.create()doesn't handleTimeoutExpiredfrom_mount_overlay()File:
overlay.py:164-198/overlay.py:577_mount_overlay()catches(CalledProcessError, OSError)butsubprocess.TimeoutExpiredinherits fromSubprocessError, not from either of those. Ifmounthangs beyond 30s:TimeoutExpiredpropagates from_mount_overlay()create(), it falls through bothexcept SandboxCreationErrorandexcept OSErrorPENDING(never set toERRORED)Fix: Add
subprocess.TimeoutExpiredto the except clause in_mount_overlay():H3. BUG —
_unmount_overlay()doesn't catchTimeoutExpired, breakscleanup()File:
overlay.py:591/overlay.py:531-543Same root cause as H2. If
umounthangs beyond 30s duringcleanup(),TimeoutExpiredpropagates unhandled. The status is never set toCLEANED_UP, leaving the sandbox in a zombie state.Note: the rollback-from-COMMITTED path (lines 443-446) correctly catches
TimeoutExpired— the inconsistency is confined to the_unmount_overlayhelper.Fix: Add
subprocess.TimeoutExpiredto the except clause in_unmount_overlay():MEDIUM Severity
M1. BUG —
GitWorktreeSandbox.commit()doesn't checkdiff_result.returncodeFile:
git_worktree.py:363-371If
git difffails (returncode != 0),stdoutmay be empty, causing the method to incorrectly conclude there are no changes. It returns early at line 375 with statusCOMMITTEDbut without performing the merge. The commit's changes are silently lost.Fix: Check
diff_result.returncode != 0and raiseSandboxCommitErrorif the diff command itself failed.M2. BUG —
compute_diff()doesn't detect symlink target changesFile:
_fs_utils.py:191-213os.walk()resolves file-pointing symlinks intofilenames.filecmp.cmp()follows symlinks and compares the content they point to. This means:link -> Atolink -> B) but both targets have identical content, the change is missedThis could cause rollbacks to miss reverting symlink modifications.
Fix: Before
filecmp.cmp, checkos.path.islink()status and symlink targets withos.readlink()for files that exist in both trees.M3. DESIGN —
commit_all()uses dual error-reporting patternFile:
manager.py:306-356SandboxErrorfailures return aCommitResult(success=False)list. Non-SandboxErrorfailures raiseAtomicCommitError. Callers must handle both a return-value check AND an exception catch for failure cases. While both paths are documented in the docstring, this split makes it easy for callers to handle one path but miss the other, leading to silent failure.Suggestion: Consider unifying to always return a
CommitResultlist (wrapping all errors), or always raising on failure. Not necessarily a blocker for this PR, but worth tracking.M4. TEST GAP — No test for
cleanup_abandonedexception handling inconsistencyFiles:
features/sandbox_manager_coverage.featureThere is no BDD scenario verifying that
cleanup_abandonedcontinues processing when a non-SandboxErrorexception is raised. This gap directly corresponds to bug H1.LOW Severity
L1. BUG —
safe_restore()leaks empty temp directory on failureFile:
_fs_utils.py:150-170If
os.rename(backup_path, target_path)fails at line 159:except BaseExceptionrestores the original viaos.rename(stale, target_path)shutil.rmtree(stale_container)at line 170 is never reachedstale_containerdirectory persistsMinor leak that could accumulate over many failed restores.
Fix: Add a
finallyclause or handle cleanup in the except block.L2. MINOR —
backup_directory()doesn't preserve extended attributes (xattr)File:
_fs_utils.py:216-232_copy_filepreserves permissions (os.chmod) and timestamps (os.utime) but not extended attributes. On filesystems using SELinux contexts, ACLs, or macOS metadata, backup fidelity is reduced compared toshutil.copy2which preserves xattrs viashutil.copystat.L3. MINOR — Commit doesn't remove empty parent directories after file deletion
Files:
copy_on_write.py:278-281,overlay.py:324-327When files are deleted during commit,
os.remove()deletes the file but the parent directory (which may now be empty) is not cleaned up. Over multiple commit cycles, empty directory trees could accumulate in the original.L4. SECURITY (LOW) — Mount options injection via commas in path
File:
overlay.py:557-561If
self._original_pathcontains commas, the mount options string is malformed and could be misinterpreted by themountcommand. Whilesubprocess.runuses a list (preventing shell injection), the mount option value parsing is done by the kernel's filesystem driver. Low practical risk since paths rarely contain commas.Fix: Validate or escape commas in
_original_pathbefore constructing mount options.L5. TEST GAP — No test for
backup_directorywith special files (FIFOs, sockets)File:
features/sandbox_fs_utils.featureCode at
_fs_utils.py:91-97logs a warning and skips non-regular files (FIFOs, sockets, device files), but no BDD scenario verifies this behavior.L6. TEST GAP — No test for
safe_restorewithBaseExceptionFile:
_fs_utils.py:160The
except BaseExceptioncatch (which ensures rename-back onKeyboardInterrupt,SystemExit, etc.) has no test coverage. This is a critical safety path — verifying it would increase confidence in the rollback guarantees.L7. TEST GAP — No test for
_unmount_overlayTimeoutExpiredduring cleanupFile:
overlay.py:580-596No BDD or Robot scenario tests the behavior when
umounttimes out. This gap corresponds to bug H3.L8. TEST FLAW —
safe_restoreerror test has misleading step nameFile:
features/sandbox_fs_utils.featureThe scenario "safe_restore preserves original on rename failure" uses a step "backup path that will cause copytree to fail". However, after the round-2 hardening,
safe_restoreusesos.rename(notcopytree). The step name is stale and misleading.Statistics
Note: Finding L1 (
safe_restoretemp leak) was counted separately from the 14 in the table — total is 15 findings.Recommendation
The PR implements the core atomicity requirement correctly and demonstrates thorough hardening across multiple review rounds. The 3 HIGH findings (H1, H2, H3) should be addressed before merge as they represent real failure modes —
cleanup_abandonedloop crash andOverlaySandboxstate corruption on timeout. The MEDIUM bugs (M1, M2) are worth fixing but are less likely to manifest in typical usage. LOW findings can be addressed as follow-ups.bd3d01f8e3f614fa9006f614fa90067f078f75a5