UAT: agents plan rollback does not discard decisions or invalidate child plans after checkpoint — only filesystem is reverted #3326

Open
opened 2026-04-05 09:52:25 +00:00 by freemo · 3 comments
Owner

Metadata

  • Branch: bugfix/m3.3-checkpoint-rollback-completeness
  • Commit Message: fix(checkpoint): implement full rollback of decisions, child plans, and tool calls
  • Milestone: v3.3.0
  • Parent Epic: #368

Background and Context

The spec (docs/specification.md line 15953) defines the full semantics of plan rollback:

"Rollback a plan sandbox to a previous checkpoint. All changes made after the target checkpoint are reverted: files are restored or removed, decisions are discarded, and tool calls are undone. Child plans spawned after the checkpoint are invalidated."

The spec's correction flow (lines 28706–28708) further describes the full rollback sequence as four steps:

  1. Resource rollback: git reset --hard <sandbox_ref> (✅ implemented)
  2. Reasoning rollback: restore actor state (❌ not implemented)
  3. Supersede affected subtree: cascade superseding through target and all descendants (❌ not implemented)
  4. Inject guidance and resume (❌ not implemented)

CheckpointService.rollback_to_checkpoint() (src/cleveragents/application/services/checkpoint_service.py lines 259–368) performs only step 1 (filesystem revert via git reset --hard + git clean -fd). Steps 2–4 are entirely absent. The selective_rollback() method (lines 468–539) wraps rollback_to_checkpoint() with atomic semantics but also does not address the missing behaviors.
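As an illustration of step 3 (supersede affected subtree), the following is a minimal sketch of cascading a "superseded" status through a target node and all of its descendants. The DecisionNode class, the status strings, and supersede_subtree() are hypothetical names chosen for this example; they are not taken from the cleveragents codebase.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DecisionNode:
    """Hypothetical decision-tree node; not the project's real model."""

    decision_id: str
    status: str = "active"
    children: list[DecisionNode] = field(default_factory=list)


def supersede_subtree(target: DecisionNode) -> list[str]:
    """Mark the target and every descendant as superseded.

    Returns the ids whose status changed, in traversal order, so a caller
    could report exactly what the rollback touched.
    """
    changed: list[str] = []
    stack = [target]
    while stack:  # iterative walk, so deep trees do not hit the recursion limit
        node = stack.pop()
        if node.status != "superseded":
            node.status = "superseded"
            changed.append(node.decision_id)
        stack.extend(node.children)
    return changed
```

Nodes outside the target's subtree (ancestors and siblings) are left untouched, which is the property the cascade needs to preserve.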

Current Behavior

CheckpointService.rollback_to_checkpoint() performs only a filesystem revert:

  • Runs git reset --hard <sandbox_ref> on the sandbox
  • Runs git clean -fd to remove untracked files

It does not:

  1. Discard or supersede decisions made after the checkpoint — decisions remain active in the decision tree
  2. Invalidate child plans spawned after the checkpoint — child plans continue to exist and execute
  3. Mark tool call records after the checkpoint as undone — tool call records remain unchanged
  4. Transition the plan's processing state back to execute/queued
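For reference, the filesystem-only revert described above amounts to roughly the following. This is a sketch, not the code in checkpoint_service.py: the function name revert_sandbox_files() and the injectable run parameter are assumptions made so the example can be exercised without a real repository. Only the two git commands come from the issue text.

```python
import subprocess
from pathlib import Path
from typing import Callable


def revert_sandbox_files(
    sandbox_dir: Path,
    sandbox_ref: str,
    run: Callable[..., object] = subprocess.run,
) -> list[list[str]]:
    """Revert tracked files to sandbox_ref and delete untracked files.

    Nothing here touches decisions, child plans, tool call records, or the
    plan's processing state, which is the gap this issue describes.
    """
    commands = [
        ["git", "reset", "--hard", sandbox_ref],  # restore tracked files
        ["git", "clean", "-fd"],  # remove untracked files and directories
    ]
    for cmd in commands:
        # check=True makes subprocess.run raise CalledProcessError on failure
        run(cmd, cwd=sandbox_dir, check=True)
    return commands
```

Note that git clean -fd leaves ignored files in place (removing them would need -x), so even the filesystem step restores only what git tracks or considers untracked.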

Steps to Reproduce

  1. Create a plan with a sandbox
  2. Create a checkpoint (checkpoint A)
  3. Record several decisions after checkpoint A
  4. Spawn child plans after checkpoint A
  5. Run agents plan rollback <PLAN_ID> <CHECKPOINT_A_ID> --yes
  6. Observe: filesystem is reverted, but decisions and child plans from after checkpoint A still exist in the system

Expected Behavior

Per docs/specification.md lines 15952–15953 and 28686–28711:

  • All decisions recorded after checkpoint A are superseded/discarded
  • All child plans spawned after checkpoint A are invalidated
  • Tool call records after checkpoint A are marked as undone
  • Plan processing state transitions to execute/queued
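The expected post-rollback state can be modelled in memory as below. This is a sketch under stated assumptions: Record, PlanState, apply_logical_rollback(), the integer seq ordering, and the status strings "superseded", "invalidated", and "undone" are all hypothetical and only mirror the wording of this issue. The real models and repositories in cleveragents may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Hypothetical record with a creation-order sequence number."""

    record_id: str
    seq: int
    status: str = "active"


@dataclass
class PlanState:
    """Hypothetical in-memory view of a plan and what it has produced."""

    processing_state: str
    decisions: list[Record] = field(default_factory=list)
    child_plans: list[Record] = field(default_factory=list)
    tool_calls: list[Record] = field(default_factory=list)


def apply_logical_rollback(plan: PlanState, checkpoint_seq: int) -> None:
    """Apply the non-filesystem part of a rollback to records after the checkpoint."""
    for decision in plan.decisions:
        if decision.seq > checkpoint_seq:
            decision.status = "superseded"
    for child in plan.child_plans:
        if child.seq > checkpoint_seq:
            child.status = "invalidated"
    for call in plan.tool_calls:
        if call.seq > checkpoint_seq:
            call.status = "undone"
    # Assumed target state, taken from the expected behavior listed above.
    plan.processing_state = "execute/queued"
```

Records created at or before the checkpoint keep their status; only later ones change.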

Acceptance Criteria

  • After rollback, no decisions made after the target checkpoint remain active in the decision tree
  • After rollback, all child plans spawned after the target checkpoint are in an invalidated state
  • After rollback, tool call records created after the target checkpoint are marked as undone
  • After rollback, the plan's processing state is execute/queued
  • selective_rollback() propagates all four rollback behaviors atomically
  • The CLI rollback_plan command output accurately reflects the complete rollback (not just filesystem)
  • All behaviors are covered by Behave and Robot Framework tests
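One way to read the atomicity criterion is all-or-nothing application of the rollback steps. The sketch below shows that idea with a snapshot-and-restore over a plain dictionary. The run_atomically() helper and the dictionary shape are hypothetical; how selective_rollback() actually achieves its atomic semantics is not described in this issue.

```python
import copy
from typing import Callable


def run_atomically(state: dict, steps: list[Callable[[dict], None]]) -> None:
    """Apply every step to state, or leave state exactly as it was.

    If any step raises, the pre-rollback snapshot is restored in place and
    the exception is re-raised so the caller still sees the failure.
    """
    snapshot = copy.deepcopy(state)
    try:
        for step in steps:
            step(state)
    except Exception:
        state.clear()
        state.update(snapshot)
        raise
```

An in-memory snapshot like this cannot undo a git reset --hard that has already run, so a real implementation would presumably need to order the filesystem step carefully or rely on a database transaction for the record updates.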

Supporting Information

Code Locations:

  • src/cleveragents/application/services/checkpoint_service.py lines 259–368 (rollback_to_checkpoint)
  • src/cleveragents/application/services/checkpoint_service.py lines 468–539 (selective_rollback)
  • src/cleveragents/cli/commands/plan.py lines 3394–3447 (rollback_plan CLI command)

Spec References:

  • docs/specification.md lines 15952–15953 (rollback definition)
  • docs/specification.md lines 28686–28711 (correction flow / full rollback sequence)

Subtasks

  • Implement decision superseding in rollback_to_checkpoint() — discard/supersede all decisions made after the target checkpoint
  • Implement child plan invalidation in rollback_to_checkpoint() — invalidate all child plans spawned after the target checkpoint
  • Implement tool call record marking in rollback_to_checkpoint() — mark all tool calls recorded after the checkpoint as undone
  • Implement plan processing state transition — transition plan back to execute/queued after rollback completes
  • Update selective_rollback() to propagate all four rollback behaviors atomically
  • Update CLI rollback_plan command output and confirmation messaging to reflect complete rollback semantics
  • Tests (Behave): Add scenarios covering decision superseding, child plan invalidation, tool call marking, and state transition after rollback
  • Tests (Robot): Add integration test for the full rollback sequence end-to-end
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.3.0 milestone 2026-04-05 10:02:11 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — This is a fundamental correctness bug in the checkpoint/rollback system. The rollback only reverts the filesystem but leaves decisions, child plans, and tool call records in an inconsistent state. This directly violates the specification and blocks the v3.3.0 acceptance criterion "Checkpoint creation and rollback (plan rollback) functional."
  • Milestone: v3.3.0 — Keeping in v3.3.0 where checkpoint/rollback is scoped.
  • MoSCoW: Must Have — The specification explicitly requires full rollback semantics (decisions, child plans, tool calls). Without this, the rollback feature is fundamentally broken.
  • Parent Epic: #368 (Subplans & Parallelism) — Checkpoint rollback is part of the subplans/parallelism Epic.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — agents plan rollback only reverts the filesystem but does not discard decisions or invalidate child plans. This means a rollback leaves the plan in an inconsistent state where the decision tree references decisions that no longer correspond to the current filesystem state.
  • MoSCoW: Must Have — Per the specification, the plan lifecycle requires safe and reversible operations. A rollback that only partially reverts state is worse than no rollback at all — it creates silent inconsistencies that can lead to data corruption.

Paired with #3327 (TDD test).


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical (unchanged) — Plan rollback only reverts filesystem but does not discard decisions or invalidate child plans. This is a critical correctness issue for the plan lifecycle.
  • Milestone: v3.3.0 (already set)
  • Story Points: 8 — XL — Requires implementing decision discarding, child plan invalidation, and checkpoint-based state restoration. Complex plan lifecycle change. Estimated 2-4 days.
  • MoSCoW: Must Have (already set) — Confirmed. Plan rollback correctness is essential for the Decision Framework milestone.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.3.0 milestone 2026-04-06 23:59:45 +00:00
Reference
cleveragents/cleveragents-core#3326