fix: [error-handling] Propagate exception when sandbox backup restore fails during commit #1291

Open
opened 2026-04-02 09:00:40 +00:00 by freemo · 1 comment
Owner

Bug Report: [error-handling] — Exception during sandbox restore is not propagated

Severity Assessment

  • Impact: In the rare event that the atomic restore operation fails during a sandbox commit, the exception is swallowed and only a warning is logged. The caller of the commit method is not notified of the failed restore, which could leave the original data in an inconsistent or corrupted state.
  • Likelihood: Low. This requires a failure in the safe_restore function, which is designed to be robust.
  • Priority: Medium

Location

  • File: src/cleveragents/infrastructure/sandbox/copy_on_write.py, src/cleveragents/infrastructure/sandbox/overlay.py
  • Function/Class: CopyOnWriteSandbox.commit, OverlaySandbox.commit
  • Lines: copy_on_write.py:297, overlay.py:343

Description

The commit method in CopyOnWriteSandbox and OverlaySandbox attempts to restore a backup of the original data if the commit operation fails. If this restore operation itself fails, the exception is caught, a warning is logged, and the original exception that triggered the rollback is re-raised. The information about the failed restore is lost, and the caller is not made aware of the potential data corruption.

Evidence

# src/cleveragents/infrastructure/sandbox/copy_on_write.py:291
            if self._pre_commit_backup is not None:
                try:
                    safe_restore(self._pre_commit_backup, self._original_path)
                    self._pre_commit_backup = None
                except Exception:
                    logger.warning(
                        "Failed to restore original from backup for sandbox %s; "
                        "backup preserved at %s for manual recovery",
                        self._sandbox_id,
                        self._pre_commit_backup,
                    )

Expected Behavior

If the safe_restore operation fails, the exception should be propagated by wrapping it in a SandboxCommitError that provides full context about the failure, including the sandbox ID, original path, and backup path.

Actual Behavior

The exception during restore is swallowed, and only a warning is logged.

Suggested Fix

                except Exception as restore_exc:
                    logger.warning(...)
                    raise SandboxCommitError(
                        f"Failed to restore original from backup for sandbox {self._sandbox_id}. "
                        f"The original data at {self._original_path} might be corrupted. "
                        f"A backup is available at {self._pre_commit_backup}."
                    ) from restore_exc

Category

error-handling


Metadata

  • Branch: fix/error-handling-propagate-sandbox-restore-exception
  • Commit Message: fix(sandbox): raise SandboxCommitError when backup restore fails during commit
  • Milestone: v3.3.0
  • Parent Epic: #362

Subtasks

  • Write a failing Behave scenario (TDD) for CopyOnWriteSandbox.commit that asserts a SandboxCommitError is raised when safe_restore throws during rollback
  • Write a failing Behave scenario (TDD) for OverlaySandbox.commit that asserts a SandboxCommitError is raised when safe_restore throws during rollback
  • Update CopyOnWriteSandbox.commit in copy_on_write.py (line ~297): catch Exception as restore_exc, log the warning, then raise SandboxCommitError(...) from restore_exc
  • Update OverlaySandbox.commit in overlay.py (line ~343): apply the same fix
  • Ensure SandboxCommitError message includes sandbox ID, original path, and backup path for operator recovery
  • Run nox -e typecheck and resolve any Pyright errors
  • Run nox -e unit_tests and confirm all Behave scenarios pass
  • Run nox -e coverage_report and confirm coverage >= 97%
  • Run nox (all default sessions) and confirm no regressions

Definition of Done

  • CopyOnWriteSandbox.commit raises SandboxCommitError (chained from the restore exception) when safe_restore fails, rather than silently swallowing the exception
  • OverlaySandbox.commit raises SandboxCommitError (chained from the restore exception) when safe_restore fails, rather than silently swallowing the exception
  • The raised SandboxCommitError message includes the sandbox ID, original path, and backup path to aid manual recovery
  • Behave unit test scenarios cover the new raise path for both sandbox implementations
  • All nox stages pass
  • Coverage >= 97%
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
## Bug Report: [error-handling] — Exception during sandbox restore is not propagated ### Severity Assessment - **Impact**: In the rare event that the atomic restore operation fails during a sandbox commit, the exception is swallowed and only a warning is logged. The caller of the `commit` method is not notified of the failed restore, which could leave the original data in an inconsistent or corrupted state. - **Likelihood**: Low. This requires a failure in the `safe_restore` function, which is designed to be robust. - **Priority**: Medium ### Location - **File**: `src/cleveragents/infrastructure/sandbox/copy_on_write.py`, `src/cleveragents/infrastructure/sandbox/overlay.py` - **Function/Class**: `CopyOnWriteSandbox.commit`, `OverlaySandbox.commit` - **Lines**: `copy_on_write.py:297`, `overlay.py:343` ### Description The `commit` method in `CopyOnWriteSandbox` and `OverlaySandbox` attempts to restore a backup of the original data if the commit operation fails. If this restore operation itself fails, the exception is caught, a warning is logged, and the original exception that triggered the rollback is re-raised. The information about the failed restore is lost, and the caller is not made aware of the potential data corruption. ### Evidence ```python # src/cleveragents/infrastructure/sandbox/copy_on_write.py:291 if self._pre_commit_backup is not None: try: safe_restore(self._pre_commit_backup, self._original_path) self._pre_commit_backup = None except Exception: logger.warning( "Failed to restore original from backup for sandbox %s; " "backup preserved at %s for manual recovery", self._sandbox_id, self._pre_commit_backup, ) ``` ### Expected Behavior If the `safe_restore` operation fails, the exception should be propagated by wrapping it in a `SandboxCommitError` that provides full context about the failure, including the sandbox ID, original path, and backup path. ### Actual Behavior The exception during restore is swallowed, and only a warning is logged. ### Suggested Fix ```python except Exception as restore_exc: logger.warning(...) raise SandboxCommitError( f"Failed to restore original from backup for sandbox {self._sandbox_id}. " f"The original data at {self._original_path} might be corrupted. " f"A backup is available at {self._pre_commit_backup}." ) from restore_exc ``` ### Category error-handling --- ## Metadata - **Branch**: `fix/error-handling-propagate-sandbox-restore-exception` - **Commit Message**: `fix(sandbox): raise SandboxCommitError when backup restore fails during commit` - **Milestone**: v3.3.0 - **Parent Epic**: #362 ## Subtasks - [ ] Write a failing Behave scenario (TDD) for `CopyOnWriteSandbox.commit` that asserts a `SandboxCommitError` is raised when `safe_restore` throws during rollback - [ ] Write a failing Behave scenario (TDD) for `OverlaySandbox.commit` that asserts a `SandboxCommitError` is raised when `safe_restore` throws during rollback - [ ] Update `CopyOnWriteSandbox.commit` in `copy_on_write.py` (line ~297): catch `Exception as restore_exc`, log the warning, then `raise SandboxCommitError(...) from restore_exc` - [ ] Update `OverlaySandbox.commit` in `overlay.py` (line ~343): apply the same fix - [ ] Ensure `SandboxCommitError` message includes sandbox ID, original path, and backup path for operator recovery - [ ] Run `nox -e typecheck` and resolve any Pyright errors - [ ] Run `nox -e unit_tests` and confirm all Behave scenarios pass - [ ] Run `nox -e coverage_report` and confirm coverage >= 97% - [ ] Run `nox` (all default sessions) and confirm no regressions ## Definition of Done - [ ] `CopyOnWriteSandbox.commit` raises `SandboxCommitError` (chained from the restore exception) when `safe_restore` fails, rather than silently swallowing the exception - [ ] `OverlaySandbox.commit` raises `SandboxCommitError` (chained from the restore exception) when `safe_restore` fails, rather than silently swallowing the exception - [ ] The raised `SandboxCommitError` message includes the sandbox ID, original path, and backup path to aid manual recovery - [ ] Behave unit test scenarios cover the new raise path for both sandbox implementations - [ ] All nox stages pass - [ ] Coverage >= 97% - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done.
freemo added this to the v3.3.0 milestone 2026-04-02 09:01:18 +00:00
Author
Owner

Starting implementation on branch fix/error-handling-propagate-sandbox-restore-exception.

Plan:

  1. Write failing Behave scenarios (TDD) for both CopyOnWriteSandbox and OverlaySandbox
  2. Fix CopyOnWriteSandbox.commit to raise SandboxCommitError when safe_restore fails
  3. Fix OverlaySandbox.commit to raise SandboxCommitError when safe_restore fails
  4. Ensure error messages include sandbox ID, original path, and backup path
  5. Run all quality gates
Starting implementation on branch `fix/error-handling-propagate-sandbox-restore-exception`. **Plan:** 1. Write failing Behave scenarios (TDD) for both `CopyOnWriteSandbox` and `OverlaySandbox` 2. Fix `CopyOnWriteSandbox.commit` to raise `SandboxCommitError` when `safe_restore` fails 3. Fix `OverlaySandbox.commit` to raise `SandboxCommitError` when `safe_restore` fails 4. Ensure error messages include sandbox ID, original path, and backup path 5. Run all quality gates
freemo self-assigned this 2026-04-02 18:45:25 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#1291
No description provided.