feat(sandbox): add checkpoint and rollback hooks #183

Closed
opened 2026-02-22 23:39:55 +00:00 by freemo · 2 comments
Owner

Metadata

  • Commit Message: feat(sandbox): add checkpoint and rollback hooks
  • Branch: feature/m4-checkpoints

Background

Sandbox checkpoint and rollback hooks are integrated into plan execute/apply flows with metadata capture. Checkpoints preserve sandbox state at key points, and rollback restores to a specific checkpoint with metadata preserved.
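
As a rough illustration of the flow described above, the sketch below wraps a single step with a pre-execute checkpoint, a post-execute checkpoint on success, and a rollback on failure. It is not the project's code: `execute_with_checkpoints`, the in-memory `State` dict, and the event log are hypothetical stand-ins, and returning `False` instead of re-raising is an assumption made for the example.

```python
import copy
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for sandbox state; the real sandbox is not a dict.
State = Dict[str, Any]
Event = Tuple[str, str]


def execute_with_checkpoints(
    state: State,
    step: Callable[[State], None],
    log: List[Event],
) -> bool:
    """Run `step` against `state`; restore the pre-execute copy if it raises."""
    pre = copy.deepcopy(state)  # pre-execute checkpoint
    log.append(("checkpoint", "pre_execute"))
    try:
        step(state)
    except Exception:
        # Rollback: discard whatever the step did and restore the checkpoint.
        state.clear()
        state.update(pre)
        log.append(("rollback", "pre_execute"))
        return False
    log.append(("checkpoint", "post_execute"))  # only recorded on success
    return True


def failing_step(state: State) -> None:
    state["files"].append("half-written.txt")
    raise RuntimeError("apply failed")


if __name__ == "__main__":
    sandbox_state: State = {"files": ["a.txt"]}
    events: List[Event] = []
    ok = execute_with_checkpoints(sandbox_state, failing_step, events)
    print(ok, sandbox_state, events)
```

The deep copy matters here: a shallow copy would share the `files` list with the live state, so the partial write would leak into the checkpoint.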

Acceptance Criteria

  • Add sandbox checkpoint/rollback hooks for plan execute/apply flows with metadata capture.
  • Update docs/reference/sandbox.md with checkpoint lifecycle and rollback behavior.

Definition of Done

This issue is complete when:

  • All subtasks below are completed and checked off.
  • A Git commit is created where the first line of the commit message matches
    the Commit Message in Metadata exactly, followed by a blank line, then
    additional lines providing relevant details about the implementation. The
    commit body should be appropriate in size for a commit message and relatively
    complete in describing what was done.
  • The commit is pushed to the remote on the branch matching the Branch in
    Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and
    merged before this issue is marked done.

Subtasks

  • Add sandbox checkpoint/rollback hooks for plan execute/apply flows with metadata capture.
  • Update docs/reference/sandbox.md with checkpoint lifecycle and rollback behavior.
  • Tests (Behave): Add scenarios for checkpoint creation and rollback on failure.
  • Tests (Robot): Add Robot test verifying rollback after failed apply.
  • Tests (ASV): Add benchmarks/sandbox_checkpoint_bench.py for checkpoint overhead.
  • Verify coverage >=97% via nox -s coverage_report. If coverage is below 97%, review the current unit test coverage report at build/coverage.xml and use it to write new Behave-based unit tests that improve coverage. Specifically, write descriptively named Behave-style unit tests that target the uncovered lines in whichever file has the most uncovered lines. Then rerun nox -s coverage_report to verify that all tests pass and coverage is >=97%. Mark this complete only once coverage is >=97%; otherwise repeat this task as many times as needed.
  • Run nox (all default sessions, including benchmark) and fix any errors so that nox passes across the entire code base. Do not ignore any failure, even if it seems unrelated to this commit; fix it.

Section: M4: Corrections + Subplans + Checkpoints (Day 22)
Status: Open

freemo added this to the v3.3.0 milestone 2026-02-22 23:39:55 +00:00
Author
Owner

Expected completion updated (Day 15 rebaseline): Day 31 / 2026-03-11 (previously Day 24 / 2026-03-04)

freemo added the due date 2026-03-03 2026-02-23 18:41:38 +00:00
freemo self-assigned this 2026-02-24 21:53:37 +00:00
Author
Owner

Implementation Notes (Day 19 — 2026-02-27)

PR #462: feat(sandbox): add checkpoint and rollback hooks

Branch: feature/m4-checkpoints
Commit: 0f4eb806782d1cf29840299911aa7948dba5ac74

Changes (11 files, +1503 lines)

  1. SandboxCheckpoint Model & CheckpointManager (infrastructure/sandbox/checkpoint.py):

    • SandboxCheckpoint frozen Pydantic model with ULID-based IDs, phase tracking, metadata capture
    • CheckpointManager with create_checkpoint, rollback_to, list_checkpoints, delete_checkpoint
    • Filesystem-based snapshot preservation
  2. Plan Execute Integration (plan_executor.py):

    • Pre-execute checkpoint creation
    • Post-execute checkpoint on success
    • Automatic rollback on failure
  3. Plan Apply Integration (plan_apply_service.py):

    • Pre-apply checkpoint creation
    • Rollback on apply failure
  4. Documentation: New docs/reference/sandbox.md with checkpoint lifecycle diagram and rollback behavior
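
To make item 1 more concrete, here is a minimal sketch of a filesystem-based checkpoint manager exposing the four method names listed above. It is written under several assumptions and is not the code from PR #462: a frozen dataclass stands in for the frozen Pydantic model, `uuid4` hex strings stand in for ULIDs, and the field names, constructor arguments, and `store_dir` layout are hypothetical.

```python
import shutil
import tempfile
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path
from types import MappingProxyType
from typing import Dict, List, Mapping, Optional


@dataclass(frozen=True)
class SandboxCheckpoint:
    """Immutable checkpoint record (dataclass stand-in for the Pydantic model)."""
    checkpoint_id: str  # assumption: uuid4 hex here, ULID in the real model
    phase: str          # e.g. "pre_execute", "post_execute", "pre_apply"
    created_at: datetime
    snapshot_path: Path
    metadata: Mapping[str, str]


class CheckpointManager:
    """Snapshots a sandbox directory by copying it into a separate store."""

    def __init__(self, sandbox_dir: Path, store_dir: Path) -> None:
        # store_dir must live outside sandbox_dir, or snapshots would nest.
        self._sandbox_dir = sandbox_dir
        self._store_dir = store_dir
        self._checkpoints: Dict[str, SandboxCheckpoint] = {}

    def create_checkpoint(
        self, phase: str, metadata: Optional[Mapping[str, str]] = None
    ) -> SandboxCheckpoint:
        checkpoint_id = uuid.uuid4().hex
        snapshot_path = self._store_dir / checkpoint_id
        shutil.copytree(self._sandbox_dir, snapshot_path)
        checkpoint = SandboxCheckpoint(
            checkpoint_id=checkpoint_id,
            phase=phase,
            created_at=datetime.now(timezone.utc),
            snapshot_path=snapshot_path,
            metadata=MappingProxyType(dict(metadata or {})),
        )
        self._checkpoints[checkpoint_id] = checkpoint
        return checkpoint

    def rollback_to(self, checkpoint_id: str) -> SandboxCheckpoint:
        """Replace the sandbox contents with the snapshot; metadata is kept."""
        checkpoint = self._checkpoints[checkpoint_id]
        shutil.rmtree(self._sandbox_dir)
        shutil.copytree(checkpoint.snapshot_path, self._sandbox_dir)
        return checkpoint

    def list_checkpoints(self) -> List[SandboxCheckpoint]:
        return list(self._checkpoints.values())  # creation order

    def delete_checkpoint(self, checkpoint_id: str) -> None:
        checkpoint = self._checkpoints.pop(checkpoint_id)
        shutil.rmtree(checkpoint.snapshot_path)


if __name__ == "__main__":
    root = Path(tempfile.mkdtemp())
    sandbox = root / "sandbox"
    sandbox.mkdir()
    (sandbox / "a.txt").write_text("v1")
    manager = CheckpointManager(sandbox, root / "store")
    cp = manager.create_checkpoint("pre_apply", {"plan": "demo"})
    (sandbox / "a.txt").write_text("v2")  # simulate a failed apply
    manager.rollback_to(cp.checkpoint_id)
    print((sandbox / "a.txt").read_text())
    shutil.rmtree(root)
```

A full directory copy is the simplest snapshot strategy but scales with sandbox size, which is presumably why checkpoint overhead is benchmarked separately.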

Tests

  • Behave: 12 scenarios — checkpoint creation, rollback on failure, multiple checkpoints, metadata capture
  • Robot: 5 test cases — checkpoint creation during execute, rollback after failed apply
  • ASV: 3 benchmarks — creation, rollback, listing overhead
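
For readers unfamiliar with ASV, the sketch below shows the general shape a checkpoint-creation benchmark could take. It is an assumption-laden illustration, not the contents of benchmarks/sandbox_checkpoint_bench.py: the class name, file counts, and the use of a plain `shutil.copytree` as the "checkpoint" are all hypothetical. The only ASV conventions relied on are that `setup` and `teardown` run around each benchmark and that methods prefixed with `time_` are timed.

```python
import shutil
import tempfile
from pathlib import Path


class CheckpointSuite:
    """Hypothetical ASV suite timing a directory-copy snapshot."""

    def setup(self) -> None:
        # Build a small throwaway sandbox: 50 files of 1 KiB each.
        self.root = Path(tempfile.mkdtemp())
        self.sandbox = self.root / "sandbox"
        self.sandbox.mkdir()
        for i in range(50):
            (self.sandbox / f"file_{i}.txt").write_text("x" * 1024)
        self.counter = 0

    def teardown(self) -> None:
        shutil.rmtree(self.root, ignore_errors=True)

    def time_create_checkpoint(self) -> None:
        # ASV may call this repeatedly, so each call needs a fresh target.
        self.counter += 1
        shutil.copytree(self.sandbox, self.root / f"snap_{self.counter}")
```

Because every timed call writes a new snapshot directory, later iterations run against a fuller disk; a real benchmark might clean up between iterations if that skews results.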

Quality Gates: All PASS (lint, typecheck, unit_tests, integration_tests)

Due date

2026-03-03

Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Depends on
Reference
cleveragents/cleveragents-core#183