feat(checkpoint): add checkpointing and rollback #445

Merged
CoreRasurae merged 1 commit from feature/m6-checkpoint into master 2026-03-02 10:46:16 +00:00
Member

Summary

Implements the checkpoint and rollback subsystem for plan execution safety, including domain models, database persistence, CLI commands, and correction flow integration.

Changes

  • Checkpoint domain model: Checkpoint with ULID ID, plan association, sandbox reference, and metadata (reason, source tool, phase). CheckpointRetentionPolicy for auto-prune. RollbackResult for operation outcomes.
  • Database layer: CheckpointModel ORM mapping and CheckpointRepository with CRUD + prune operations
  • CheckpointService: Full checkpoint lifecycle management with creation, rollback, listing, and retention enforcement
  • CLI command: plan rollback <plan_id> <checkpoint_id> with output showing restored file counts and changed paths
  • Correction integration: Wired into CorrectionService for revert flow reuse
  • Safety guards: Rollback rejected when plan is applied or sandbox is missing
  • BDD Tests: 20 Behave scenarios for checkpoint/rollback behavior
  • Robot Tests: 10 integration test cases
  • ASV Benchmarks: Rollback latency benchmarks
  • Documentation: Reference documentation at docs/reference/checkpointing.md

Motivation

Checkpointing enables safe, reversible plan execution by capturing sandbox state at key points. The rollback capability allows recovering from failed execution steps without losing earlier progress, and integrates with the correction flow for targeted decision reversal.

Key Design Decisions

  • Checkpoint IDs use ULIDs for sortable, unique identification
  • Retention policy with auto-prune prevents unbounded checkpoint accumulation
  • Git-worktree snapshots capture commit hash for efficient storage and restore
  • Rollback guards enforce state safety (no rollback on applied plans)
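
As a rough illustration of how sortable IDs and auto-prune fit together, the sketch below picks the checkpoints to delete for one plan. It is not the code from this PR: `RetentionPolicy`, `max_per_plan`, and `select_prunable` are hypothetical names, and the short strings stand in for real ULIDs. It assumes lexicographic order matches creation order, which ULIDs give at millisecond granularity.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical stand-in for CheckpointRetentionPolicy."""

    max_per_plan: int


def select_prunable(checkpoint_ids: list[str], policy: RetentionPolicy) -> list[str]:
    """Return the IDs to delete, oldest first, keeping the newest max_per_plan."""
    if policy.max_per_plan < 0:
        raise ValueError("max_per_plan must be non-negative")
    # Assumption: IDs sort lexicographically in creation order (as ULIDs do).
    ordered = sorted(checkpoint_ids)
    excess = len(ordered) - policy.max_per_plan
    return ordered[:excess] if excess > 0 else []


# Placeholder IDs, not real ULIDs.
to_delete = select_prunable(["01C", "01A", "01B"], RetentionPolicy(max_per_plan=2))
```

Because the ordering comes from the ID itself, no separate timestamp column is needed to decide which checkpoints are oldest.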

Closes #206

CoreRasurae added this to the v3.5.0 milestone 2026-02-25 22:23:56 +00:00
CoreRasurae force-pushed feature/m6-checkpoint from 5427f8983a
Some checks failed
CI / lint (pull_request) Successful in 25s
CI / security (pull_request) Successful in 50s
CI / typecheck (pull_request) Successful in 1m4s
CI / quality (pull_request) Successful in 27s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / integration_tests (pull_request) Successful in 4m41s
CI / unit_tests (pull_request) Successful in 32m6s
CI / docker (pull_request) Successful in 1m0s
CI / benchmark-regression (pull_request) Successful in 22m51s
CI / coverage (pull_request) Failing after 1h37m38s
to 25fc4229f8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 20s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 57s
CI / integration_tests (pull_request) Failing after 1m42s
CI / unit_tests (pull_request) Failing after 13m32s
CI / docker (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 22m30s
CI / coverage (pull_request) Failing after 30m17s
2026-02-27 16:21:26 +00:00
Compare
brent.edwards left a comment

I asked GPT-5.2 Codex to do a review, but it was garbage. Trying again.

brent.edwards left a comment

Review Summary (commit 25fc4229f822d2fca2aa5)

Scoped to the checkpointing/rollback commit. The core models and service shape look good, but there are blockers around migration lineage and the rollback guard wiring that make the CLI path unusable in practice.

CI status isn’t visible via the API on my side. Please confirm required checks per docs/development/ci-cd.md are green (lint, typecheck, security, quality, unit_tests, integration_tests, coverage, build, docker).

Findings

P0:blocker — Alembic migration chain is broken. m6_001_checkpoint_metadata_table.py uses down_revision = "c3_001_actor_registry", but the repo only has c3_001_actor_registry_columns.py. This will fail migrations.

  • File: alembic/versions/m6_001_checkpoint_metadata_table.py
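
A minimal sketch of the kind of check that would catch this before merge: given each revision's parent, report parents that no revision defines. The revision IDs below are assumptions for illustration (the actual `revision` value inside `c3_001_actor_registry_columns.py` is not shown in this thread), and merge revisions with tuple parents are ignored.

```python
from __future__ import annotations

from typing import Optional


def find_dangling_parents(revisions: dict[str, Optional[str]]) -> dict[str, str]:
    """Map each revision to its down_revision when that parent is not defined anywhere."""
    known = set(revisions)
    return {
        rev: parent
        for rev, parent in revisions.items()
        if parent is not None and parent not in known
    }


# Assumed IDs: the parent revision is taken to be named after its file.
chain = {
    "c3_001_actor_registry_columns": None,
    "m6_001_checkpoint_metadata_table": "c3_001_actor_registry",
}
dangling = find_dangling_parents(chain)
```

An empty result means every `down_revision` resolves to a known revision; here the m6 revision would be reported with its unresolved parent.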

P1:must-fix — plan rollback will always fail with “sandbox is missing.” CheckpointService guards rely on in‑memory register_sandbox, but the DI container provides a new service instance per call (providers.Factory), and the CLI never registers sandbox refs. There’s no persistent sandbox registry lookup either, so rollback cannot succeed for real usage.

  • Files: src/cleveragents/application/services/checkpoint_service.py, src/cleveragents/application/container.py, src/cleveragents/cli/commands/plan.py
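
To make the failure mode concrete, here is a small stand-alone sketch. It does not use the project's container: `CheckpointGuards` and `per_call_factory` are hypothetical stand-ins, and the plain function only imitates the new-instance-per-resolution behavior described above. The second half shows one possible direction, a registry that outlives the service instance, with a dict standing in for a persistent store.

```python
class CheckpointGuards:
    """Hypothetical stand-in for the guard state inside CheckpointService."""

    def __init__(self, registry=None):
        # registry=None reproduces the reported design: state private to the instance.
        self._sandboxes = {} if registry is None else registry

    def register_sandbox(self, plan_id, path):
        self._sandboxes[plan_id] = path

    def can_rollback(self, plan_id):
        return plan_id in self._sandboxes


def per_call_factory():
    """Imitates a factory provider: a fresh instance on every resolution."""
    return CheckpointGuards()


# Registration happens on one instance, the check on another, so it is lost.
per_call_factory().register_sandbox("plan-1", "/tmp/sandbox-1")
lost = per_call_factory().can_rollback("plan-1")

# A registry that outlives the instance (a dict standing in for a DB lookup).
shared_registry = {}
CheckpointGuards(shared_registry).register_sandbox("plan-1", "/tmp/sandbox-1")
kept = CheckpointGuards(shared_registry).can_rollback("plan-1")
```

Switching the provider to a singleton would keep state within one process, but separate CLI invocations are separate processes, so a persisted lookup is likely the safer fix.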

P1:must-fix — The “plan already applied” guard is also in‑memory and never set in production flows. Even if the sandbox guard were fixed, applied plans could be rolled back unless you wire this to lifecycle state in the DB.

  • File: src/cleveragents/application/services/checkpoint_service.py
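
One way to wire the guard to persisted lifecycle state, sketched with the standard-library `sqlite3` module. The `plans` table, its `state` column, the `"applied"` value, and `RollbackRejected` are all assumptions for illustration and do not reflect the project's actual schema or exceptions.

```python
import sqlite3


class RollbackRejected(Exception):
    """Hypothetical error raised when a rollback precondition fails."""


def ensure_rollback_allowed(conn, plan_id):
    """Reject rollback when the persisted plan state is missing or 'applied'."""
    row = conn.execute(
        "SELECT state FROM plans WHERE id = ?", (plan_id,)
    ).fetchone()
    if row is None:
        raise RollbackRejected(f"plan {plan_id} not found")
    if row[0] == "applied":
        raise RollbackRejected(f"plan {plan_id} is already applied")


# In-memory database with an assumed schema, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE plans (id TEXT PRIMARY KEY, state TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO plans VALUES (?, ?)",
    [("plan-1", "executing"), ("plan-2", "applied")],
)
```

Reading the state at rollback time means the guard holds regardless of which service instance or process handles the command.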

P2:should-fix — Docs claim CorrectionService can delegate rollback to CheckpointService, but the service never calls it. Either implement delegation or remove/adjust the doc claim to avoid misleading behavior.

  • Files: src/cleveragents/application/services/correction_service.py, docs/reference/checkpointing.md

P3:should-fix — Retention policy defaults are documented but create_checkpoint() only prunes if a policy is explicitly passed. Consider defaulting to DEFAULT_RETENTION_POLICY to match docs.

  • Files: src/cleveragents/application/services/checkpoint_service.py, docs/reference/checkpointing.md
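
The suggested fallback could look roughly like this. `CheckpointStore` is a hypothetical in-memory stand-in, `max_per_plan` is an assumed field name, and the default value of 10 is invented for the example; only the `DEFAULT_RETENTION_POLICY` name and the `create_checkpoint()` entry point come from the finding.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class RetentionPolicy:
    max_per_plan: int  # assumed field name


# Assumed value; the documented default is not quoted in this thread.
DEFAULT_RETENTION_POLICY = RetentionPolicy(max_per_plan=10)


@dataclass
class CheckpointStore:
    """Hypothetical in-memory stand-in for the service plus repository."""

    ids: list[str] = field(default_factory=list)

    def create_checkpoint(
        self, checkpoint_id: str, policy: Optional[RetentionPolicy] = None
    ) -> list[str]:
        """Store the checkpoint, then prune with the given or default policy."""
        effective = policy if policy is not None else DEFAULT_RETENTION_POLICY
        self.ids.append(checkpoint_id)
        self.ids.sort()  # sortable IDs: oldest first
        overflow = len(self.ids) - effective.max_per_plan
        pruned = self.ids[:overflow] if overflow > 0 else []
        del self.ids[: len(pruned)]
        return pruned


store = CheckpointStore()
for i in range(12):
    last_pruned = store.create_checkpoint(f"{i:04d}")  # no policy passed
```

With the fallback in place, callers that never pass a policy still get the documented pruning behavior.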

Positive Notes

  • Domain model and audit metadata are clear and align with the spec.
  • Repository + DI wiring is consistent with ADR-007 patterns.
  • CLI output envelope follows the spec for rollback responses.
CoreRasurae force-pushed feature/m6-checkpoint from 25fc4229f8
to 830b29e4ae
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 27s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 36s
CI / build (pull_request) Successful in 24s
CI / integration_tests (pull_request) Successful in 3m37s
CI / unit_tests (pull_request) Successful in 11m34s
CI / docker (pull_request) Successful in 38s
CI / benchmark-regression (pull_request) Successful in 22m41s
CI / coverage (pull_request) Failing after 43m38s
2026-02-28 00:21:42 +00:00
Compare
brent.edwards approved these changes 2026-02-28 00:33:39 +00:00
Dismissed
brent.edwards left a comment

Approved.

CoreRasurae force-pushed feature/m6-checkpoint from 830b29e4ae
to 8cb008478e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 32s
CI / typecheck (pull_request) Successful in 33s
CI / integration_tests (pull_request) Successful in 4m29s
CI / unit_tests (pull_request) Failing after 9m15s
CI / docker (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 28m1s
CI / coverage (pull_request) Failing after 30m44s
2026-03-01 10:25:02 +00:00
Compare
CoreRasurae dismissed brent.edwards's review 2026-03-01 10:25:02 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-01 10:25:39 +00:00
CoreRasurae canceled auto merging this pull request when all checks succeed 2026-03-01 10:25:54 +00:00
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-01 10:26:03 +00:00
CoreRasurae force-pushed feature/m6-checkpoint from 8cb008478e
to b5371e53f5
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 17s
CI / build (pull_request) Successful in 18s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 44s
CI / integration_tests (pull_request) Successful in 4m23s
CI / unit_tests (pull_request) Successful in 13m40s
CI / docker (pull_request) Successful in 38s
CI / benchmark-regression (pull_request) Successful in 21m23s
CI / coverage (pull_request) Failing after 48m57s
2026-03-01 11:28:32 +00:00
Compare
CoreRasurae force-pushed feature/m6-checkpoint from b5371e53f5
to 32775fceb6
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 19s
CI / typecheck (pull_request) Successful in 33s
CI / security (pull_request) Successful in 35s
CI / integration_tests (pull_request) Successful in 2m53s
CI / unit_tests (pull_request) Failing after 12m54s
CI / docker (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 27m40s
CI / coverage (pull_request) Failing after 49m21s
2026-03-01 15:38:58 +00:00
Compare
CoreRasurae force-pushed feature/m6-checkpoint from 32775fceb6
to f6bd6ce7e1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 18s
CI / typecheck (pull_request) Successful in 34s
CI / security (pull_request) Successful in 44s
CI / benchmark-regression (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-01 23:35:10 +00:00
Compare
CoreRasurae force-pushed feature/m6-checkpoint from f6bd6ce7e1
to 81c09acc53
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 58s
CI / integration_tests (pull_request) Successful in 4m37s
CI / unit_tests (pull_request) Successful in 14m18s
CI / docker (pull_request) Successful in 38s
CI / benchmark-regression (pull_request) Successful in 25m49s
CI / coverage (pull_request) Failing after 48m35s
2026-03-01 23:36:15 +00:00
Compare
CoreRasurae force-pushed feature/m6-checkpoint from 81c09acc53
to 86e245c585
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 37s
CI / unit_tests (pull_request) Successful in 2m22s
CI / integration_tests (pull_request) Successful in 2m51s
CI / docker (pull_request) Successful in 38s
CI / coverage (pull_request) Successful in 3m34s
CI / benchmark-regression (pull_request) Successful in 22m10s
CI / lint (push) Successful in 12s
CI / build (push) Successful in 14s
CI / quality (push) Successful in 18s
CI / security (push) Successful in 32s
CI / typecheck (push) Successful in 34s
CI / benchmark-regression (push) Has been skipped
CI / unit_tests (push) Successful in 1m55s
CI / docker (push) Successful in 38s
CI / integration_tests (push) Successful in 2m52s
CI / coverage (push) Successful in 4m4s
CI / benchmark-publish (push) Successful in 13m13s
2026-03-02 10:23:25 +00:00
Compare
CoreRasurae deleted branch feature/m6-checkpoint 2026-03-02 10:46:17 +00:00
Reference
cleveragents/cleveragents-core!445