UAT: Automatic checkpoint triggers not implemented — only pre/post execute checkpoints created, not per-tool-write or event-based #3439

Closed
opened 2026-04-05 16:50:21 +00:00 by freemo · 3 comments
Owner

Metadata

  • Branch: fix/m3.3-checkpoint-auto-triggers
  • Commit Message: fix(executor): implement automatic per-tool-write and event-based checkpoint triggers
  • Milestone: v3.3.0

Background and Context

The specification defines four automatic checkpoint triggers that must fire during the Execute phase of the plan lifecycle. These triggers are essential to the checkpoint/rollback safety model: without per-tool-write checkpoints, a sandbox mutation cannot be rolled back to the state immediately before the offending tool ran — only to the state before the entire execute phase began. This severely limits the granularity and usefulness of the rollback system and directly violates the milestone v3.3.0 acceptance criterion "Checkpoint creation and rollback functional."

The four triggers defined in docs/specification.md (Checkpoint Creation → Automatic Creation) are:

  • before_tool_execute(writes=true) — Before executing any tool that modifies the sandbox.
  • after_tool_execute(writes=true) — After a tool successfully modifies the sandbox.
  • on_subplan_spawn — Immediately after a child plan is spawned.
  • on_error — After any unrecoverable error in the Execute phase.

Current Behavior

PlanExecutor._try_create_checkpoint() is only called at two points in the execute phase:

  1. pre_execute — before the entire execute phase begins (lines 679/736 of src/cleveragents/application/services/plan_executor.py)
  2. post_execute — after the entire execute phase completes (lines 701/771 of the same file)

There is no per-tool-write checkpoint creation. The on_subplan_spawn and on_error triggers are also absent. The tool runner (src/cleveragents/tool/runner.py) does not call CheckpointService.create_checkpoint() before or after individual tool execution, and no checkpoint hooks exist in src/cleveragents/tool/builtins/.

Steps to Reproduce:

  1. Create a plan with multiple tool-writing decisions.
  2. Execute the plan.
  3. Inspect checkpoints via CheckpointService.list_checkpoints().
  4. Observe: Only 2 checkpoints exist (pre_execute and post_execute), not one per tool write.
  5. Expected: A checkpoint before and after each tool that writes to the sandbox, plus checkpoints on subplan spawn and on unrecoverable error.

Expected Behavior

Checkpoints are automatically created:

  • Before and after each tool execution that writes to the sandbox (before_tool_execute / after_tool_execute).
  • Immediately when a child plan is spawned (on_subplan_spawn).
  • After any unrecoverable error in the Execute phase (on_error).

This enables fine-grained rollback to any point between individual tool writes, not just to the start of the execute phase.
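As a rough illustration of the per-tool-write behavior described above, the following minimal sketch wraps a single tool call with the two checkpoints. `RecordingCheckpointService`, `Tool`, and `run_tool` are hypothetical stand-ins invented for this example; the real `CheckpointService.create_checkpoint()` signature and the tool runner's structure are not shown in this issue and may differ.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecordingCheckpointService:
    """Hypothetical stand-in for CheckpointService; it only records trigger names."""

    created: List[str] = field(default_factory=list)

    def create_checkpoint(self, trigger: str) -> None:
        self.created.append(trigger)


@dataclass
class Tool:
    """Hypothetical tool description; `writes` marks sandbox-mutating tools."""

    name: str
    run: Callable[[], object]
    writes: bool = False


def run_tool(tool: Tool, checkpoints: RecordingCheckpointService) -> object:
    """Run one tool, checkpointing around it only when it writes to the sandbox."""
    if tool.writes:
        checkpoints.create_checkpoint("before_tool_execute")
    result = tool.run()  # an exception here skips the "after" checkpoint
    if tool.writes:
        checkpoints.create_checkpoint("after_tool_execute")
    return result


service = RecordingCheckpointService()
run_tool(Tool("read_file", lambda: "data"), service)              # read-only: no checkpoints
run_tool(Tool("write_file", lambda: None, writes=True), service)  # write: two checkpoints
print(service.created)  # ['before_tool_execute', 'after_tool_execute']
```

In this sketch a failing write tool leaves only the `before_tool_execute` checkpoint behind, which is the state a rollback would need to return to.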

Acceptance Criteria

  • CheckpointService.create_checkpoint() is called in the tool runner before executing any tool flagged as a write operation.
  • CheckpointService.create_checkpoint() is called in the tool runner after a write tool completes successfully.
  • A checkpoint is created immediately after a child plan is spawned (on_subplan_spawn).
  • A checkpoint is created after any unrecoverable error in the Execute phase (on_error).
  • A configuration key core.checkpoints.auto_create_on controls which triggers are active (all four enabled by default).
  • Behave tests cover all four automatic trigger scenarios.
  • All nox stages pass.
  • Coverage >= 97%.

Supporting Information

Affected code locations:

  • src/cleveragents/application/services/plan_executor.py lines 679, 701, 736, 771 — only pre/post execute checkpoints present
  • src/cleveragents/tool/runner.py — no checkpoint creation hooks
  • src/cleveragents/tool/builtins/ — no checkpoint hooks in tool execution

Spec reference: docs/specification.md — Checkpoint Creation → Automatic Creation section

Discovery method: Code-level analysis of plan_executor.py and src/cleveragents/tool/ against the spec's Checkpoint Creation section during UAT of the checkpoint-rollback-system feature area.

Subtasks

  • Add checkpoint creation hook in tool runner before executing any write tool (before_tool_execute)
  • Add checkpoint creation hook in tool runner after successful write tool execution (after_tool_execute)
  • Add checkpoint creation hook when a child plan is spawned (on_subplan_spawn)
  • Add checkpoint creation hook on unrecoverable error in Execute phase (on_error)
  • Wire CheckpointService into the tool runner
  • Add configuration support for core.checkpoints.auto_create_on to control which triggers are active
  • Write Behave tests covering all four automatic trigger scenarios
  • Run nox (all default sessions), fix any errors
  • Verify coverage >= 97% via nox -s coverage_report

Definition of Done

This issue is complete when:

  • All four automatic trigger types (before_tool_execute, after_tool_execute, on_subplan_spawn, on_error) create checkpoints as specified in docs/specification.md.
  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.3.0 milestone 2026-04-05 16:50:25 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical. Only the two phase-level checkpoints (pre/post execute) are implemented. The 4 fine-grained triggers the spec requires (before/after tool write, on subplan spawn, on error) are completely missing, severely limiting rollback granularity.
  • Milestone: v3.3.0 (already assigned)
  • Story Points: 5 (L) — Requires wiring CheckpointService into the tool runner, implementing 4 distinct trigger types, adding configuration support, and comprehensive BDD tests. Touches multiple modules (executor, tool runner, builtins).
  • MoSCoW: Must Have — The spec mandates all four automatic checkpoint triggers (§Checkpoint Creation → Automatic Creation). v3.3.0 acceptance criterion "Checkpoint creation and rollback functional" cannot pass without fine-grained checkpoint triggers.
  • Parent Epic: #368 (Subplans & Parallelism) — Checkpoints are integral to the subplan/correction system

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

PR #3474 created on branch fix/m3.3-checkpoint-auto-triggers. I will monitor and handle all review feedback until merged.

Implementation Summary:

All four automatic checkpoint triggers have been implemented:

  1. before_tool_execute: ToolRunner.execute() now creates a checkpoint before executing any tool with writes=True, when a CheckpointService is wired.

  2. after_tool_execute: ToolRunner.execute() creates a checkpoint after a write tool completes successfully.

  3. on_subplan_spawn: SubplanExecutionService._execute_one_with_retry() creates a checkpoint immediately after a child plan is spawned (before its first execution attempt).

  4. on_error: PlanExecutor._run_execute_with_stub() and _run_execute_with_runtime() create a checkpoint after any unrecoverable error in the Execute phase (before rollback).
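The ordering in item 4 (checkpoint first, then rollback) can be sketched roughly as follows. This is only an illustration under stated assumptions: `UnrecoverableError`, `run_execute_phase`, and the callback parameters are hypothetical names for this example and are not taken from `PlanExecutor`.

```python
from typing import Callable, Iterable, List


class UnrecoverableError(Exception):
    """Hypothetical marker for errors the Execute phase cannot recover from."""


def run_execute_phase(
    steps: Iterable[Callable[[], None]],
    create_checkpoint: Callable[[str], None],
    rollback: Callable[[], None],
) -> None:
    """Run steps in order; on an unrecoverable error, checkpoint and then roll back."""
    try:
        for step in steps:
            step()
    except UnrecoverableError:
        # Capture the failed state first so it stays inspectable after the rollback.
        create_checkpoint("on_error")
        rollback()
        raise


events: List[str] = []


def failing_step() -> None:
    raise UnrecoverableError("disk full")


try:
    run_execute_phase([failing_step], events.append, lambda: events.append("rollback"))
except UnrecoverableError:
    pass
print(events)  # ['on_error', 'rollback']
```

Taking the checkpoint before the rollback preserves the failed sandbox state for later diagnosis, which appears to be the intent of the "(before rollback)" note.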

Configuration: New config key core.checkpoints.auto_create_on (env: CLEVERAGENTS_CHECKPOINT_AUTO_CREATE_ON) controls which triggers are active. Default: all four enabled.
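One way such a setting could be resolved is sketched below. The comment above does not state the value format, so the comma-separated list is an assumption, as are the function signatures and the handling of an empty value. The names `DEFAULT_AUTO_TRIGGERS` and `is_trigger_active` echo identifiers mentioned in this thread, but their real definitions may differ.

```python
import os
from typing import FrozenSet, Mapping

ENV_VAR = "CLEVERAGENTS_CHECKPOINT_AUTO_CREATE_ON"

# The four trigger names from the specification; all are enabled by default.
DEFAULT_AUTO_TRIGGERS: FrozenSet[str] = frozenset(
    {"before_tool_execute", "after_tool_execute", "on_subplan_spawn", "on_error"}
)


def active_triggers(env: Mapping[str, str] = os.environ) -> FrozenSet[str]:
    """Return the enabled triggers, assuming a comma-separated value (hypothetical format)."""
    raw = env.get(ENV_VAR)
    if raw is None:
        return DEFAULT_AUTO_TRIGGERS
    names = frozenset(part.strip() for part in raw.split(",") if part.strip())
    unknown = names - DEFAULT_AUTO_TRIGGERS
    if unknown:
        raise ValueError(f"unknown checkpoint triggers: {sorted(unknown)}")
    return names  # an empty value disables every trigger in this sketch


def is_trigger_active(trigger: str, env: Mapping[str, str] = os.environ) -> bool:
    """Check a single trigger name against the resolved set."""
    return trigger in active_triggers(env)


print(sorted(active_triggers({ENV_VAR: "on_error,on_subplan_spawn"})))
# ['on_error', 'on_subplan_spawn']
```

Rejecting unknown names is a design choice made for this sketch, so that a typo in the setting fails loudly instead of silently disabling a trigger.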

Tests: 15 Behave scenarios in features/checkpoint_auto_triggers.feature covering all triggers, disable-trigger behavior, and config key registration.

Quality gates: nox -s lint and nox -s typecheck both pass.


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

Author
Owner

PR #3474 has been merged successfully! 🎉

Summary:

  • Implemented all four automatic checkpoint triggers (before_tool_execute, after_tool_execute, on_subplan_spawn, on_error)
  • Fixed module boundary violation in PlanExecutor (public is_trigger_active API)
  • Split 519-line step file into two files under 500 lines
  • Eliminated DRY violation with single DEFAULT_AUTO_TRIGGERS constant
  • Replaced Any type with PlanLifecycleProtocol for type safety
  • Added ISSUES CLOSED: footer to commit message
  • Rebased onto latest master

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-worker

Blocks
#368 Epic: Subplans & Parallelism
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3439