Epic: Checkpoint & Rollback System (v3.3.0) #8493

Open
opened 2026-04-13 20:09:16 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit message: feat(plans): implement checkpoint creation and plan rollback system
  • Branch name: feat/v3.3.0-checkpoint-rollback-system

Background and Context

As part of the v3.3.0 milestone (M4: Corrections + Subplans + Checkpoints), the plan execution engine must support fault-tolerant recovery through checkpointing and rollback. Currently, if a plan fails mid-execution, there is no mechanism to restore a previous known-good state.

This Epic covers the implementation of a checkpoint system that snapshots plan state at configurable points during execution, and a rollback system that restores the plan to a previously checkpointed state via the plan rollback CLI command. Checkpoints must be persisted to survive process restarts.

Additionally, this Epic covers the correction engine modes: plan correct --mode revert (reverts to last clean state) and plan correct --mode append (appends a correction step to the current plan).
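
The difference between the two correction modes can be sketched in a few lines. This is only an illustration, not the engine's actual design: the `Plan` class, the `last_clean_len` field, and the `correct` function are hypothetical names, and plan state is reduced to a list of step labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Plan:
    """Hypothetical, simplified plan: just an ordered list of step labels."""
    steps: List[str]
    # Number of steps that were present at the last clean (pre-error) checkpoint.
    last_clean_len: Optional[int] = None


def correct(plan: Plan, mode: str, correction_step: Optional[str] = None) -> Plan:
    """Return a corrected copy of the plan; the input plan is left untouched."""
    if mode == "revert":
        # Revert: drop everything recorded after the last clean checkpoint.
        if plan.last_clean_len is None:
            raise ValueError("no clean checkpoint to revert to")
        return Plan(plan.steps[: plan.last_clean_len], plan.last_clean_len)
    if mode == "append":
        # Append: keep the current steps and add one correction step at the end.
        if not correction_step:
            raise ValueError("append mode requires a correction step")
        return Plan(plan.steps + [correction_step], plan.last_clean_len)
    raise ValueError(f"unknown mode: {mode}")
```

In this sketch, revert discards work done after the clean point, while append preserves it and extends the plan, which matches the "without reverting" wording in Expected Behavior.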

This Epic BLOCKS Legendary #8486.

Expected Behavior

  • Checkpoints are automatically created at configurable execution milestones (e.g., before each major step, after subplan completion).
  • Checkpoints are persisted to durable storage and survive process restarts.
  • plan rollback CLI command lists available checkpoints and restores the plan to the selected checkpoint state.
  • plan correct --mode revert reverts the plan to the last clean (pre-error) checkpoint.
  • plan correct --mode append appends a user-defined correction step to the current plan without reverting.
  • Rollback is atomic: either the full checkpoint state is restored, or the rollback fails cleanly with no partial state.
  • Checkpoint metadata includes: timestamp, plan step index, plan state hash, and human-readable label.
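
One possible shape for the checkpoint metadata listed above is sketched here. The class and function names are assumptions made for illustration, as are the choices of SHA-256 over canonical JSON for the state hash and ISO 8601 UTC for the timestamp; the Epic does not specify either.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CheckpointMetadata:
    """Hypothetical metadata record with the four fields named in the Epic."""
    timestamp: str    # ISO 8601, UTC (assumed format)
    step_index: int   # index of the plan step the checkpoint was taken at
    state_hash: str   # digest of the snapshotted plan state
    label: str        # human-readable label


def hash_state(state: dict) -> str:
    """Hash a JSON-serializable state; sorted keys make the digest order-independent."""
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_metadata(state: dict, step_index: int, label: str) -> CheckpointMetadata:
    """Build metadata for a snapshot of `state` taken at `step_index`."""
    return CheckpointMetadata(
        timestamp=datetime.now(timezone.utc).isoformat(),
        step_index=step_index,
        state_hash=hash_state(state),
        label=label,
    )
```

Storing the hash alongside the snapshot lets a rollback verify that the persisted state has not been corrupted before restoring it.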

Acceptance Criteria

  • Checkpoint creation implemented and triggered at configurable execution milestones
  • Checkpoints persisted to durable storage (survive process restarts)
  • plan rollback CLI command implemented and functional
  • Rollback restores plan to selected checkpoint state atomically
  • plan correct --mode revert reverts to last clean checkpoint
  • plan correct --mode append appends correction step to current plan
  • Checkpoint metadata includes timestamp, step index, state hash, and label
  • Rollback is atomic (no partial state on failure)
  • Unit tests cover checkpoint creation, persistence, and rollback
  • Integration tests cover full checkpoint → failure → rollback flow
  • Test coverage >= 97% for all new modules in this Epic
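
The atomicity criterion and the checkpoint → failure → rollback flow can be illustrated with a small in-memory sketch. `PlanRunner`, `RollbackError`, and `_digest` are hypothetical names, and the state is assumed to be JSON-serializable; a real implementation would restore from the persistence layer rather than from a dictionary.

```python
import copy
import hashlib
import json


class RollbackError(Exception):
    """Raised when a rollback cannot be completed; live state is left unchanged."""


def _digest(state: dict) -> str:
    # Assumes the state is JSON-serializable.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode("utf-8")).hexdigest()


class PlanRunner:
    def __init__(self, state: dict):
        self.state = state
        self._checkpoints = {}  # label -> (snapshot, digest)

    def checkpoint(self, label: str) -> None:
        # Deep copy so later mutations of the live state cannot alter the snapshot.
        snapshot = copy.deepcopy(self.state)
        self._checkpoints[label] = (snapshot, _digest(snapshot))

    def rollback(self, label: str) -> None:
        if label not in self._checkpoints:
            raise RollbackError(f"unknown checkpoint: {label}")
        snapshot, expected = self._checkpoints[label]
        # Build and verify the restored state fully before touching self.state.
        candidate = copy.deepcopy(snapshot)
        if _digest(candidate) != expected:
            raise RollbackError(f"checkpoint {label} failed hash verification")
        # A single reference swap: either the whole state is replaced or nothing is.
        self.state = candidate
```

A usage example of the flow: call `runner.checkpoint("before-step-1")`, let a step mutate `runner.state` and fail, then call `runner.rollback("before-step-1")`. The key design choice is that every check happens on a separate copy, so a failed rollback never leaves partial state behind.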

Subtasks

  • Design checkpoint data model (metadata, state snapshot, persistence format)
  • Implement checkpoint creation at configurable execution milestones
  • Implement checkpoint persistence layer (durable storage, restart-safe)
  • Implement checkpoint listing and retrieval API
  • Implement plan rollback CLI command with checkpoint selection
  • Implement atomic rollback logic (restore state or fail cleanly)
  • Implement plan correct --mode revert (revert to last clean checkpoint)
  • Implement plan correct --mode append (append correction step)
  • Write unit tests for checkpoint creation, persistence, and retrieval
  • Write integration tests for checkpoint → failure → rollback end-to-end flow
  • Write tests for correction engine revert and append modes

Child Issues / Child Epics

Feature issues will be linked here as they are created.

  • Feature: Checkpoint creation and persistence — TBD
  • Feature: plan rollback CLI command — TBD
  • Feature: Correction engine (revert + append modes) — TBD

Definition of Done

This Epic is closed when:

  1. All subtasks above are complete and checked off.
  2. All Acceptance Criteria are verified.
  3. Test coverage >= 97% for all new modules.
  4. plan rollback and plan correct commands are confirmed functional in integration tests.
  5. This Epic is confirmed as unblocking Legendary #8486.

Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

HAL9000 added this to the v3.3.0 milestone 2026-04-13 20:09:38 +00:00
Author
Owner

[AUTO-OWNR-3] Triage Decision

Status: Verified

MoSCoW: Must Have
Priority: High

Rationale: Checkpoint creation and plan rollback are explicitly listed in the v3.3.0 milestone acceptance criteria: "Checkpoint creation and rollback (plan rollback) functional." The correction engine modes (plan correct --mode revert|append) are also v3.3.0 requirements. This Epic directly blocks Legendary #8486. Priority/High is confirmed: the work is required for milestone delivery, but it can proceed in parallel with #8491 (Three-Way Merge) because the two have separate implementation surfaces.

Next Steps: Checkpoint data model design (metadata, state snapshot, persistence format) can begin immediately in parallel with other Epics. The plan rollback CLI command and atomic rollback logic are the critical path items. Ensure checkpoint persistence survives process restarts — this is a hard acceptance criterion.
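
For the restart-safe persistence requirement, one common approach is to write each checkpoint to a temporary file, flush it to disk, and then atomically rename it into place. The sketch below assumes JSON files in a local directory; the function names and the file layout are hypothetical, and the Epic does not prescribe a storage backend.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(directory: Path, name: str, payload: dict) -> Path:
    """Persist `payload` so that readers see either the old file or the new one."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{name}.json"
    # Temp file in the same directory so the final rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # force the bytes to disk before renaming
        os.replace(tmp_name, target)   # atomic rename over any previous version
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def load_checkpoint(directory: Path, name: str) -> dict:
    """Read a checkpoint back, e.g. after a process restart."""
    with open(directory / f"{name}.json", encoding="utf-8") as handle:
        return json.load(handle)
```

Because the rename is the last step, a crash during the write leaves at most a stray `.tmp` file and never a truncated checkpoint.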


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

🧩 Child Issues Created for Epic #8493

The following child implementation issues have been created for this Epic. Each child issue blocks this Epic and is assigned to milestone v3.3.0.

| # | Issue | Labels |
|---|---|---|
| 1 | #8555 feat(plans): implement checkpoint creation for plan state snapshots | Type/Feature · Priority/High · MoSCoW/Must have · State/Verified |
| 2 | #8557 feat(plans): implement agents plan rollback command for checkpoint-based rollback | Type/Feature · Priority/High · MoSCoW/Must have · State/Verified |
| 3 | #8559 feat(plans): implement checkpoint listing and management CLI commands | Type/Feature · Priority/High · MoSCoW/Must have · State/Verified |

All three issues follow the CONTRIBUTING.md format with Metadata, Background and Context, Expected Behavior, Acceptance Criteria, Subtasks, and Definition of Done sections.


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

Reference
cleveragents/cleveragents-core#8493