feat(checkpoint): add checkpointing and rollback #445
Reference
cleveragents/cleveragents-core!445
Summary
Implements the checkpoint and rollback subsystem for plan execution safety, including domain models, database persistence, CLI commands, and correction flow integration.
Changes
- `Checkpoint` with ULID ID, plan association, sandbox reference, and metadata (reason, source tool, phase). `CheckpointRetentionPolicy` for auto-prune. `RollbackResult` for operation outcomes.
- `CheckpointModel` ORM mapping and `CheckpointRepository` with CRUD + prune operations
- `plan rollback <plan_id> <checkpoint_id>` with output showing restored file counts and changed paths
- `CorrectionService` for revert flow reuse
- `docs/reference/checkpointing.md`
Motivation
Checkpointing enables safe, reversible plan execution by capturing sandbox state at key points. The rollback capability allows recovering from failed execution steps without losing earlier progress, and integrates with the correction flow for targeted decision reversal.
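To make the checkpoint and auto-prune idea concrete, here is a minimal sketch in plain Python. It is not the code from this PR: the field names, the `RetentionPolicy` defaults, and the `select_for_prune` helper are all hypothetical, and a plain string stands in for the ULID.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Checkpoint:
    """Illustrative stand-in for the PR's Checkpoint model (fields assumed)."""
    checkpoint_id: str   # a ULID in the PR; any unique string works here
    plan_id: str
    sandbox_ref: str
    reason: str
    created_at: datetime


@dataclass(frozen=True)
class RetentionPolicy:
    """Hypothetical policy: keep at most max_count checkpoints, none older than max_age."""
    max_count: int = 10
    max_age: timedelta = timedelta(days=7)


def select_for_prune(checkpoints, policy, now):
    """Return the checkpoints that fall outside the policy, oldest first."""
    newest_first = sorted(checkpoints, key=lambda c: c.created_at, reverse=True)
    kept = 0
    to_drop = []
    for cp in newest_first:
        too_old = now - cp.created_at > policy.max_age
        if too_old or kept >= policy.max_count:
            to_drop.append(cp)
        else:
            kept += 1
    return sorted(to_drop, key=lambda c: c.created_at)


if __name__ == "__main__":
    now = datetime(2025, 1, 10, tzinfo=timezone.utc)
    cps = [
        Checkpoint(f"cp-{age}", "plan-1", "sandbox-a", "pre-step", now - timedelta(days=age))
        for age in (0, 1, 2, 10)
    ]
    policy = RetentionPolicy(max_count=2, max_age=timedelta(days=7))
    for cp in select_for_prune(cps, policy, now):
        print("prune", cp.checkpoint_id)
```

Selecting the prune set separately from deleting it keeps the policy logic testable without a database; the actual repository in the PR may combine the two.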
Key Design Decisions
Closes #206
I asked GPT-5.2 Codex to do a review, but it was garbage. Trying again.
Review Summary (commit 25fc4229f822d2fca2aa5)
Scoped to the checkpointing/rollback commit. The core models and service shape look good, but there are blockers around migration lineage and the rollback guard wiring that make the CLI path unusable in practice.
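The guard wiring concern can be reproduced in a few lines. The sketch below is a simplified stand-in, not the project's code: `CheckpointServiceSketch`, `factory_provider`, and `singleton_provider` are hypothetical names, and plain functions stand in for the DI container's providers. It shows why state registered on one factory-built instance is invisible to the next one.

```python
class CheckpointServiceSketch:
    """Hypothetical service whose sandbox guard lives only in instance memory."""

    def __init__(self):
        self._sandboxes = {}

    def register_sandbox(self, plan_id, sandbox_ref):
        self._sandboxes[plan_id] = sandbox_ref

    def rollback(self, plan_id, checkpoint_id):
        if plan_id not in self._sandboxes:
            raise LookupError("sandbox is missing")
        return f"restored {checkpoint_id} in {self._sandboxes[plan_id]}"


def factory_provider():
    """Builds a new instance on every call, like a factory-style provider."""
    return CheckpointServiceSketch()


_shared = CheckpointServiceSketch()


def singleton_provider():
    """Returns the same instance on every call."""
    return _shared


def try_rollback(provider):
    # Register on one resolved instance, then roll back on the next one.
    provider().register_sandbox("plan-1", "sandbox-a")
    try:
        return provider().rollback("plan-1", "cp-1")
    except LookupError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    print(try_rollback(factory_provider))
    print(try_rollback(singleton_provider))
```

A shared instance only helps within one process. Each CLI invocation starts a fresh process, so a lookup backed by persistent storage is probably still needed for `plan rollback` to work in practice.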
CI status isn’t visible via the API on my side. Please confirm required checks per `docs/development/ci-cd.md` are green (lint, typecheck, security, quality, unit_tests, integration_tests, coverage, build, docker).
Findings
- P0:blocker - Alembic migration chain is broken. `m6_001_checkpoint_metadata_table.py` uses `down_revision = "c3_001_actor_registry"`, but the repo only has `c3_001_actor_registry_columns.py`. This will fail migrations.
  - `alembic/versions/m6_001_checkpoint_metadata_table.py`
- P1:must-fix - `plan rollback` will always fail with “sandbox is missing.” `CheckpointService` guards rely on in-memory `register_sandbox`, but the DI container provides a new service instance per call (`providers.Factory`), and the CLI never registers sandbox refs. There’s no persistent sandbox registry lookup either, so rollback cannot succeed for real usage.
  - `src/cleveragents/application/services/checkpoint_service.py`, `src/cleveragents/application/container.py`, `src/cleveragents/cli/commands/plan.py`
- P1:must-fix - The “plan already applied” guard is also in-memory and never set in production flows. Even if the sandbox guard were fixed, applied plans could be rolled back unless you wire this to lifecycle state in the DB.
  - `src/cleveragents/application/services/checkpoint_service.py`
- P2:should-fix - Docs claim `CorrectionService` can delegate rollback to `CheckpointService`, but the service never calls it. Either implement delegation or remove/adjust the doc claim to avoid misleading behavior.
  - `src/cleveragents/application/services/correction_service.py`, `docs/reference/checkpointing.md`
- P3:should-fix - Retention policy defaults are documented, but `create_checkpoint()` only prunes if a policy is explicitly passed. Consider defaulting to `DEFAULT_RETENTION_POLICY` to match docs.
  - `src/cleveragents/application/services/checkpoint_service.py`, `docs/reference/checkpointing.md`
Positive Notes
Approved.
New commits pushed, approval review dismissed automatically according to repository settings.