bug(checkpoint): CheckpointManager not wired into PlanExecutor — no checkpoints created during execution #1253

Closed
opened 2026-04-01 23:32:18 +00:00 by hamza.khyari · 2 comments
Member

Metadata

  • Commit Message: fix(checkpoint): wire CheckpointManager into PlanExecutor execution path
  • Branch: bugfix/checkpoint-wiring

Background

The specification states (line 18668):

In the Execute phase, record_decision is treated as a write operation for checkpoint purposes. The system automatically creates a coordinated checkpoint group — one checkpoint per modified sandbox resource — at every Execute-phase decision point.

However, no checkpoints are created during plan execution because the checkpoint infrastructure is fully implemented but never wired into the execution path.

Root Cause

Three disconnected systems:

  1. CheckpointManager (infrastructure/sandbox/checkpoint.py) — creates in-memory sandbox snapshots. Fully implemented. But the CLI factory at cli/commands/plan.py:1311-1315 constructs PlanExecutor without passing checkpoint_manager, so it defaults to None. When None, _try_create_checkpoint() silently returns None (plan_executor.py:419-420).

  2. CheckpointService (application/services/checkpoint_service.py) — DB-backed checkpoints. IS wired in the DI container (container.py:605-610). But PlanExecutor uses CheckpointManager (infra layer), not CheckpointService (app layer). The two are not connected.

  3. PlanResumeService.record_step_checkpoint() — the only code path that sets plan.last_checkpoint_id (plan_resume_service.py:330). Has zero production callers — only called from test helpers and step definitions.

Additionally:

  • The concept of "coordinated checkpoint group" from the spec does not exist in the codebase (zero grep results).
  • checkpointable defaults to False on both tools and resources, so preflight guardrails would reject checkpoint-requiring plans even if wiring were fixed.
  • plan status --format json conditionally includes last_checkpoint_id only when non-None (plan.py:208-209), so it never appears in practice.

Expected Behavior

After plan execute completes the Execute phase, plan status --format json should include a non-null last_checkpoint_id field. Checkpoint creation should be automatic per the spec, not requiring explicit record_step_checkpoint() calls.

Acceptance Criteria

  • PlanExecutor receives a CheckpointManager instance when constructed from the CLI and DI container
  • _try_create_checkpoint() creates a real checkpoint during Execute-phase decision recording
  • plan.last_checkpoint_id is set after execution completes
  • plan status --format json includes last_checkpoint_id after a successful execution
  • plan rollback works against the auto-created checkpoint
  • Default checkpointable flag behavior is reviewed — tools/resources used in execution should be checkpointable by default or the flag should be set during registration

Subtasks

  • Wire CheckpointManager into PlanExecutor construction in CLI (plan.py:1311-1315)
  • Wire CheckpointManager into PlanExecutor via DI container (currently not registered)
  • Connect CheckpointManager checkpoint creation to plan.last_checkpoint_id (bridge infra -> domain)
  • Review whether record_step_checkpoint() should be called from _try_create_checkpoint() to set last_checkpoint_id
  • Review checkpointable default on tools/resources
  • Tests (Behave): Verify checkpoint creation during plan execution
  • Tests (Robot): E2E test verifying plan status shows last_checkpoint_id after execution
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
## Metadata - **Commit Message**: `fix(checkpoint): wire CheckpointManager into PlanExecutor execution path` - **Branch**: `bugfix/checkpoint-wiring` ## Background The specification states (line 18668): > In the Execute phase, `record_decision` is treated as a **write operation** for checkpoint purposes. The system **automatically creates** a **coordinated checkpoint group** — one checkpoint per modified sandbox resource — at **every Execute-phase decision point**. However, no checkpoints are created during plan execution because the checkpoint infrastructure is fully implemented but **never wired into the execution path**. ## Root Cause Three disconnected systems: 1. **`CheckpointManager`** (`infrastructure/sandbox/checkpoint.py`) — creates in-memory sandbox snapshots. Fully implemented. But the CLI factory at `cli/commands/plan.py:1311-1315` constructs `PlanExecutor` **without** passing `checkpoint_manager`, so it defaults to `None`. When `None`, `_try_create_checkpoint()` silently returns `None` (`plan_executor.py:419-420`). 2. **`CheckpointService`** (`application/services/checkpoint_service.py`) — DB-backed checkpoints. IS wired in the DI container (`container.py:605-610`). But `PlanExecutor` uses `CheckpointManager` (infra layer), not `CheckpointService` (app layer). The two are not connected. 3. **`PlanResumeService.record_step_checkpoint()`** — the **only** code path that sets `plan.last_checkpoint_id` (`plan_resume_service.py:330`). Has **zero production callers** — only called from test helpers and step definitions. Additionally: - The concept of "coordinated checkpoint group" from the spec does not exist in the codebase (zero grep results). - `checkpointable` defaults to `False` on both tools and resources, so preflight guardrails would reject checkpoint-requiring plans even if wiring were fixed. - `plan status --format json` conditionally includes `last_checkpoint_id` only when non-None (`plan.py:208-209`), so it never appears in practice. ## Expected Behavior After `plan execute` completes the Execute phase, `plan status --format json` should include a non-null `last_checkpoint_id` field. Checkpoint creation should be automatic per the spec, not requiring explicit `record_step_checkpoint()` calls. ## Acceptance Criteria - [ ] `PlanExecutor` receives a `CheckpointManager` instance when constructed from the CLI and DI container - [ ] `_try_create_checkpoint()` creates a real checkpoint during Execute-phase decision recording - [ ] `plan.last_checkpoint_id` is set after execution completes - [ ] `plan status --format json` includes `last_checkpoint_id` after a successful execution - [ ] `plan rollback` works against the auto-created checkpoint - [ ] Default `checkpointable` flag behavior is reviewed — tools/resources used in execution should be checkpointable by default or the flag should be set during registration ## Subtasks - [ ] Wire `CheckpointManager` into `PlanExecutor` construction in CLI (`plan.py:1311-1315`) - [ ] Wire `CheckpointManager` into `PlanExecutor` via DI container (currently not registered) - [ ] Connect `CheckpointManager` checkpoint creation to `plan.last_checkpoint_id` (bridge infra -> domain) - [ ] Review whether `record_step_checkpoint()` should be called from `_try_create_checkpoint()` to set `last_checkpoint_id` - [ ] Review `checkpointable` default on tools/resources - [ ] Tests (Behave): Verify checkpoint creation during plan execution - [ ] Tests (Robot): E2E test verifying `plan status` shows `last_checkpoint_id` after execution - [ ] Verify coverage >=97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done.
freemo added this to the v3.5.0 milestone 2026-04-02 08:08:53 +00:00
Owner

🏷️ Backlog Groomer — Added Type/Bug label and assigned to milestone v3.5.0. Assigned to @freemo.

This comment was posted automatically by the backlog groomer.

🏷️ **Backlog Groomer** — Added `Type/Bug` label and assigned to milestone `v3.5.0`. Assigned to @freemo. *This comment was posted automatically by the backlog groomer.*
freemo self-assigned this 2026-04-02 18:45:27 +00:00
Owner

Architect Guidance: Critical Wiring Gap — Checkpoint Infrastructure

This is a critical architectural gap for M4 (v3.3.0) and M5 (v3.4.0). The checkpoint system has three disconnected layers that need to be unified:

Architecture Decision

The correct wiring path is:

PlanExecutor
  → CheckpointService (application layer, DB-backed)
    → CheckpointManager (infrastructure layer, sandbox snapshots)

NOT PlanExecutor → CheckpointManager directly. The application layer service should orchestrate:

  1. Create sandbox snapshot via CheckpointManager (infra)
  2. Persist checkpoint metadata to the database (via repository)
  3. Update plan's last_checkpoint_id (via PlanResumeService or directly)
  4. Coordinate checkpoint groups — one checkpoint per modified sandbox resource per decision point

Required Changes

  1. CLI factory (cli/commands/plan.py:1311-1315): Pass CheckpointService (not CheckpointManager) to PlanExecutor
  2. PlanExecutor: Accept CheckpointService and call it at every Execute-phase decision point
  3. CheckpointService: Orchestrate CheckpointManager.create() + DB persistence + plan update
  4. Coordinated checkpoint groups: Implement the spec's concept — when a decision modifies multiple sandbox resources, create one checkpoint per resource, grouped by decision ID
  5. checkpointable defaults: Consider changing the default to True for tools that write to sandbox resources, since the whole point of sandboxing is reversibility

Priority

High — without checkpoints, plan resume (which we're adding to the spec) has nothing to resume from, and plan correct can't roll back Execute-phase work.


🤖 CleverAgents Bot (architect-1)

## Architect Guidance: Critical Wiring Gap — Checkpoint Infrastructure This is a **critical architectural gap** for M4 (v3.3.0) and M5 (v3.4.0). The checkpoint system has three disconnected layers that need to be unified: ### Architecture Decision The correct wiring path is: ``` PlanExecutor → CheckpointService (application layer, DB-backed) → CheckpointManager (infrastructure layer, sandbox snapshots) ``` **NOT** `PlanExecutor → CheckpointManager` directly. The application layer service should orchestrate: 1. **Create sandbox snapshot** via `CheckpointManager` (infra) 2. **Persist checkpoint metadata** to the database (via repository) 3. **Update plan's `last_checkpoint_id`** (via `PlanResumeService` or directly) 4. **Coordinate checkpoint groups** — one checkpoint per modified sandbox resource per decision point ### Required Changes 1. **CLI factory** (`cli/commands/plan.py:1311-1315`): Pass `CheckpointService` (not `CheckpointManager`) to `PlanExecutor` 2. **PlanExecutor**: Accept `CheckpointService` and call it at every Execute-phase decision point 3. **CheckpointService**: Orchestrate `CheckpointManager.create()` + DB persistence + plan update 4. **Coordinated checkpoint groups**: Implement the spec's concept — when a decision modifies multiple sandbox resources, create one checkpoint per resource, grouped by decision ID 5. **`checkpointable` defaults**: Consider changing the default to `True` for tools that write to sandbox resources, since the whole point of sandboxing is reversibility ### Priority **High** — without checkpoints, `plan resume` (which we're adding to the spec) has nothing to resume from, and `plan correct` can't roll back Execute-phase work. --- *🤖 CleverAgents Bot (architect-1)*
freemo removed their assignment 2026-04-08 19:28:46 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#1253
No description provided.