feat(correction): wire checkpoint rollback into correction service revert flow #943

Closed
opened 2026-03-14 01:18:03 +00:00 by freemo · 1 comment
Owner

Background and Context

The specification defines plan correction with checkpoint rollback as a critical safety feature (spec §Core Concepts > Correction, §Behavior > Corrections). When a plan correction reverts one or more decisions, the system must: (1) create a checkpoint of the current workspace state, (2) archive artifacts from the reverted decisions, (3) roll back filesystem changes to the pre-decision state, and (4) re-execute from the corrected decision point.
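
The four steps above can be sketched as a single orchestration routine. This is only an illustration of the ordering: `CheckpointPort`, `ArtifactPort`, `RevertFlow`, and the `pre_decision_checkpoints` mapping are hypothetical names, not the project's actual API, and the sketch assumes `decision_ids` is ordered earliest first.

```python
from dataclasses import dataclass, field
from typing import Protocol


class CheckpointPort(Protocol):
    """Hypothetical interface; the real CheckpointService API may differ."""

    def create_checkpoint(self, label: str) -> str: ...

    def rollback(self, checkpoint_id: str) -> None: ...


class ArtifactPort(Protocol):
    """Hypothetical interface for physical artifact archival."""

    def archive(self, decision_id: str) -> None: ...


@dataclass
class RevertFlow:
    checkpoints: CheckpointPort
    artifacts: ArtifactPort
    # Assumed bookkeeping: decision id -> checkpoint taken before that decision ran.
    pre_decision_checkpoints: dict[str, str]
    log: list[str] = field(default_factory=list)

    def revert(self, decision_ids: list[str]) -> str:
        """Run the four steps in order and return the safety checkpoint id."""
        # (1) Snapshot the current workspace so the revert itself can be undone.
        safety_id = self.checkpoints.create_checkpoint("pre-revert")
        self.log.append(f"checkpoint:{safety_id}")
        # (2) Archive the artifacts of every reverted decision.
        for decision_id in decision_ids:
            self.artifacts.archive(decision_id)
            self.log.append(f"archive:{decision_id}")
        # (3) Roll back to the state before the earliest reverted decision.
        target = self.pre_decision_checkpoints[decision_ids[0]]
        self.checkpoints.rollback(target)
        self.log.append(f"rollback:{target}")
        # (4) Re-execution is left to the caller; only the restart point is recorded.
        self.log.append(f"reexecute-from:{decision_ids[0]}")
        return safety_id
```

Keeping the two collaborators behind small interfaces makes the ordering testable with in-memory fakes, without touching a real filesystem.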

The current implementation in src/cleveragents/application/services/correction_service.py (~85% complete) has:

  • CorrectionService — identifies affected decisions, marks them for re-execution
  • create_correction() — creates correction records with reason and scope
  • apply_correction() — re-executes affected decisions
  • Checkpoint model exists in database schema
  • Artifact archival writes metadata records

Missing:

  • Checkpoint rollback not wired — CorrectionService.revert_decisions() marks decisions as reverted in the database but does NOT invoke the checkpoint system to roll back filesystem changes
  • Artifact archival is metadata-only — Reverted decision artifacts are flagged in the DB but the actual files are not moved to an archive location
  • No workspace snapshot — The checkpoint system exists in the model layer but no service actually creates workspace snapshots before plan execution starts
  • No selective rollback — Cannot roll back to an arbitrary checkpoint; only full revert is possible
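
To make the "metadata-only" gap concrete, here is a minimal sketch of what physically moving artifacts could look like. The function name, the `archive_root/<decision_id>/` layout, and the choice to skip missing files are all assumptions made for illustration; the issue does not specify the archive layout.

```python
import shutil
from pathlib import Path


def archive_artifacts(
    workspace: Path,
    archive_root: Path,
    decision_id: str,
    relative_paths: list[str],
) -> list[Path]:
    """Move artifact files out of the workspace into archive_root/<decision_id>/."""
    moved: list[Path] = []
    target_dir = archive_root / decision_id  # Assumed layout: one folder per decision.
    for rel in relative_paths:
        src = workspace / rel
        if not src.is_file():
            continue  # File is already gone; nothing to move for this entry.
        dest = target_dir / rel
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dest))
        moved.append(dest)
    return moved
```

Returning the destination paths lets the caller update the existing metadata records with the archive location in the same step.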

Affected files

  • src/cleveragents/application/services/correction_service.py — CorrectionService
  • src/cleveragents/application/services/checkpoint_service.py — Exists but minimal
  • src/cleveragents/application/services/artifact_service.py — Artifact management
  • src/cleveragents/domain/models/core/checkpoint.py — Checkpoint model

Expected Behavior

When a plan correction reverts decisions, the filesystem must be rolled back to the pre-decision state using workspace checkpoints. Reverted artifacts must be physically archived. Selective rollback to any checkpoint must be possible.
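
One way to picture selective rollback: given an ordered checkpoint history, pick the target by ID and work out which decisions become invalid. The `Checkpoint` dataclass and `plan_selective_rollback` below are hypothetical stand-ins, not the project's model, and the sketch assumes each checkpoint is taken before its decision runs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical record: a checkpoint taken just before decision_id executed."""

    checkpoint_id: str
    decision_id: str


def plan_selective_rollback(
    history: list[Checkpoint], target_id: str
) -> tuple[Checkpoint, list[str]]:
    """Return the target checkpoint and the decisions that must be re-executed."""
    for index, checkpoint in enumerate(history):
        if checkpoint.checkpoint_id == target_id:
            # The checkpoint predates its own decision, so that decision and
            # every later one are invalidated by the rollback.
            invalidated = [c.decision_id for c in history[index:]]
            return checkpoint, invalidated
    raise KeyError(f"unknown checkpoint: {target_id}")
```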

Acceptance Criteria

  • Workspace checkpoints created automatically before each decision execution
  • CorrectionService.revert_decisions() invokes checkpoint rollback to restore filesystem state
  • Reverted decision artifacts physically moved to archive directory (not just metadata flag)
  • Selective rollback: can roll back to any named checkpoint, not just the last one
  • Checkpoint storage uses efficient diff-based snapshots (not full copies)
  • plan rollback --to-checkpoint <id> CLI command functional
  • Rollback is atomic: either fully succeeds or fully fails with no partial state
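
The atomicity criterion could be approached with a stage-then-swap pattern, sketched below under stated assumptions: the snapshot is a plain directory (not a diff chain), `atomic_restore` is a hypothetical helper, and the workspace's parent directory is writable and on the same filesystem. The swap uses two renames, so it is close to atomic rather than strictly atomic; a crash between them would leave the backup directory to recover from.

```python
import os
import shutil
import tempfile
import uuid
from pathlib import Path


def atomic_restore(workspace: Path, snapshot: Path) -> None:
    """Replace workspace with a copy of snapshot, or leave it untouched on failure."""
    parent = workspace.parent
    staging = Path(tempfile.mkdtemp(prefix=".restore-", dir=parent))
    backup = parent / f".backup-{uuid.uuid4().hex}"
    try:
        # Build the complete restored tree off to the side first.
        shutil.copytree(snapshot, staging, dirs_exist_ok=True)
    except Exception:
        shutil.rmtree(staging, ignore_errors=True)
        raise  # The live workspace has not been touched yet.
    # Park the live workspace, then move the staged tree into its place.
    os.rename(workspace, backup)
    try:
        os.rename(staging, workspace)
    except OSError:
        os.rename(backup, workspace)  # Put the original workspace back.
        shutil.rmtree(staging, ignore_errors=True)
        raise
    shutil.rmtree(backup)  # Swap succeeded; the old tree is no longer needed.
```

All slow, failure-prone work (copying files) happens before the live workspace is modified, which keeps the window for partial state as small as two directory renames.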

Metadata

  • Commit message: feat(correction): wire checkpoint rollback into correction service revert flow
  • Branch: feature/correction-checkpoint-rollback
  • Parent Epic: None (standalone feature)
  • Blocks: None
  • Blocked by: None

Subtasks

  • Implement workspace snapshot creation in CheckpointService before decision execution
  • Wire CorrectionService.revert_decisions() to call CheckpointService.rollback()
  • Implement filesystem rollback from checkpoint (restore files to snapshot state)
  • Implement physical artifact archival (move files to archive directory)
  • Implement selective rollback to arbitrary checkpoint by ID
  • Implement diff-based checkpoint storage (store only changed files)
  • Implement atomic rollback with transaction-like semantics
  • Add plan rollback --to-checkpoint CLI command
  • Tests (Behave): Add scenarios for checkpoint creation and rollback
  • Tests (Behave): Add scenarios for artifact archival
  • Tests (Unit): Add tests for diff-based snapshot storage
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
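
For the diff-based storage subtask, a content-hash comparison is one plausible starting point. The helpers below are hypothetical and only decide which paths a checkpoint would need to record; they assume SHA-256 over whole files and say nothing about how the changed content is stored.

```python
import hashlib
from pathlib import Path


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's POSIX-style relative path to the SHA-256 of its content."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def diff_snapshot(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, list[str]]:
    """Return the paths that differ between two digest maps."""
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "modified": sorted(
            path
            for path in current.keys() & previous.keys()
            if current[path] != previous[path]
        ),
    }
```

Only files listed under "added" or "modified" would need their content stored; "removed" entries need just the path, and unchanged files need nothing.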

Definition of Done

This issue is complete when plan corrections physically roll back filesystem state using workspace checkpoints, artifacts are properly archived, and selective rollback to any checkpoint is functional.

freemo added this to the v3.5.0 milestone 2026-03-14 01:18:26 +00:00
freemo self-assigned this 2026-03-29 02:30:31 +00:00
Author
Owner

Implementation submitted in PR #1199.

Branch: feature/correction-checkpoint-rollback
Commit: 7467fa304c888a51594f0bcd1d2fac7157964d78

Changes Summary

  1. CheckpointService — Added create_workspace_snapshot() for diff-based pre-decision snapshots, selective_rollback() with atomic semantics, archive_artifacts() for physical file archival, and _compute_diff_snapshot() for efficient diff-only storage
  2. CorrectionService — Added revert_decisions() high-level entry point that wires checkpoint rollback and artifact archival into the revert flow; added _archive_decision_artifacts() for physical artifact moves during revert
  3. Checkpoint model — Added pre_decision checkpoint type
  4. DI container — Wired checkpoint_service into CorrectionService (fixes bug #986)
  5. CLI — Added --to-checkpoint option to plan rollback; the "correct" command now uses the container-provided service
  6. TDD — Removed @tdd_expected_fail from wiring tests (bug now fixed)
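
As an illustration of the CLI change in item 5, the option could be parsed as below. This uses the standard library's argparse purely for the sketch; the project's real CLI framework is not stated in this issue, `build_parser` is a hypothetical name, and treating an omitted `--to-checkpoint` as `None` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser for `plan rollback --to-checkpoint <id>`."""
    parser = argparse.ArgumentParser(prog="plan")
    subcommands = parser.add_subparsers(dest="command", required=True)
    rollback = subcommands.add_parser("rollback", help="roll back workspace state")
    rollback.add_argument(
        "--to-checkpoint",
        metavar="ID",
        default=None,  # Assumption: no ID means the caller picks a default target.
        help="ID of the checkpoint to restore",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"command={args.command} checkpoint={args.to_checkpoint}")
```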

Nox Results

| Session | Result |
|---------|--------|
| lint | pass |
| format | pass |
| typecheck | 0 errors |
| dead_code | pass |
| security_scan | pass |
| BDD (new) | 12/12 scenarios pass |
| BDD (existing checkpoint) | 49/49 pass |
| BDD (existing correction) | 100/100 pass |
| BDD (TDD wiring) | 2/2 pass |
Reference
cleveragents/cleveragents-core#943