UAT: ToolRunner.execute() does not enforce require_checkpoints safety profile field — write tools run without checkpoints even when required #5086

Open
opened 2026-04-09 00:57:35 +00:00 by HAL9000 · 1 comment
Owner

Bug Report

Feature Area: Sandbox and Checkpoint
Tested By: UAT worker (uat-pool-1), feature area: Sandbox and Checkpoint
Severity: High (safety constraint not enforced — write tools execute without checkpoints when require_checkpoints=true)


What Was Tested

The ToolRunner.execute() method was analyzed to verify that it enforces the require_checkpoints field of the SafetyProfile, as defined in the specification (§Safety Profile, line 28515).

Expected Behavior (from spec)

The spec defines require_checkpoints (§Safety Profile, line 28515):

require_checkpoints | boolean | true | Require checkpointing during Execute | When true, tools must create checkpoints before writes. When false, checkpointing is optional.

When require_checkpoints=true (the default), the system must:

  1. Ensure a CheckpointService is available before executing any write tool
  2. Block execution of write tools if no checkpoint service is configured
  3. Create a checkpoint before each write tool execution
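
The three requirements above can be sketched as a single guard that runs before a write tool. This is a minimal illustration, not the project's implementation: CheckpointRequiredError, InMemoryCheckpointService, run_tool, and the shape of SafetyProfile shown here are hypothetical stand-ins, and the auto_checkpoint_triggers check is left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


class CheckpointRequiredError(RuntimeError):
    """Hypothetical error raised when a write tool is blocked."""


@dataclass
class SafetyProfile:
    # Assumed shape; the default of True follows the spec row quoted above.
    require_checkpoints: bool = True


@dataclass
class InMemoryCheckpointService:
    # Hypothetical stand-in for CheckpointService: records labels only.
    created: List[str] = field(default_factory=list)

    def create(self, label: str) -> None:
        self.created.append(label)


def run_tool(
    name: str,
    tool: Callable[[], str],
    writes: bool,
    profile: SafetyProfile,
    checkpoint_service: Optional[InMemoryCheckpointService],
) -> str:
    """Run a tool, enforcing require_checkpoints for write tools."""
    if writes:
        if checkpoint_service is None:
            if profile.require_checkpoints:
                # Requirements 1 and 2: no service means the write is blocked.
                raise CheckpointRequiredError(
                    "Cannot execute write tool: require_checkpoints=true "
                    "but no checkpoint service is configured"
                )
            # require_checkpoints=false: checkpointing is optional, so proceed.
        else:
            # Requirement 3: checkpoint before the write runs.
            checkpoint_service.create(f"before:{name}")
    return tool()
```

Read-only tools pass through untouched, and a profile with require_checkpoints=false keeps the current best-effort behavior, so the guard only changes the case this report describes.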

Actual Behavior (from code)

The checkpoint logic of ToolRunner.execute() in src/cleveragents/tool/runner.py (lines 472–524) creates a checkpoint only when all of the following hold:

  1. _tool_writes is True (tool has capabilities.writes=True)
  2. The trigger is active in auto_checkpoint_triggers
  3. self._checkpoint_service is not None

If self._checkpoint_service is None, the checkpoint creation is silently skipped (line 150):

def _try_create_tool_checkpoint(self, ...):
    if self._checkpoint_service is None:
        return  # ← silently skips, no error raised

The ToolRunner has no access to the SafetyProfile at all — it does not accept a safety_profile parameter and never checks require_checkpoints. A write tool will execute without any checkpoint even when require_checkpoints=true in the active safety profile.

Code Location

  • src/cleveragents/tool/runner.py, _try_create_tool_checkpoint() method (lines ~132–165)
  • src/cleveragents/tool/runner.py, execute() method (lines 220–530)

Steps to Reproduce

  1. Configure a plan with an automation profile where safety.require_checkpoints=true (the default)
  2. Execute the plan without wiring a CheckpointService to the ToolRunner
  3. Invoke a write tool (e.g., write_file)
  4. Expected: Error — "Cannot execute write tool: require_checkpoints=true but no checkpoint service is configured"
  5. Actual: Write tool executes successfully without creating any checkpoint
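
The reproduction steps can be modeled in isolation as follows. This is a simplified sketch of the reported behavior, not the real ToolRunner: ToolRunnerModel, RecordingCheckpointService, and the method signatures are assumptions made for illustration, since the report elides the actual parameters.

```python
from typing import Callable, List, Optional


class RecordingCheckpointService:
    """Hypothetical stand-in for CheckpointService."""

    def __init__(self) -> None:
        self.checkpoints: List[str] = []

    def create(self, label: str) -> None:
        self.checkpoints.append(label)


class ToolRunnerModel:
    """Simplified model of the behavior described in this report."""

    def __init__(
        self, checkpoint_service: Optional[RecordingCheckpointService] = None
    ) -> None:
        # As in the report, there is no safety_profile parameter.
        self._checkpoint_service = checkpoint_service

    def _try_create_tool_checkpoint(self, tool_name: str) -> None:
        if self._checkpoint_service is None:
            return  # silently skips, no error raised
        self._checkpoint_service.create(tool_name)

    def execute(self, tool_name: str, tool: Callable[[], None], writes: bool) -> None:
        if writes:
            self._try_create_tool_checkpoint(tool_name)
        tool()


written: List[str] = []
runner = ToolRunnerModel()  # step 2: no CheckpointService wired
runner.execute("write_file", lambda: written.append("data"), writes=True)  # step 3
print(written)  # the write went through: no checkpoint, no error
```

A regression test for the fix would presumably assert the opposite outcome: the same call raises, and nothing is written.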

Impact

  • Safety: The require_checkpoints safety constraint is silently ignored, making it impossible to guarantee rollback capability for plans that require it
  • Spec violation: The safety profile's require_checkpoints=true field has no effect on tool execution
  • Rollback: Plans that rely on checkpoints for rollback may have no checkpoints available, causing the "agents plan rollback" command to fail

Related Issue

See also #4882 (ToolRunner does not enforce require_sandbox), which shows the same pattern of safety profile fields being ignored at the tool execution layer.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.2.0 milestone 2026-04-09 01:01:26 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: High — Spec compliance bug that breaks documented behavior
  • Milestone: v3.2.0
  • Story Points: 3 — M
  • MoSCoW: Must Have — Spec compliance is required

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner

Reference
cleveragents/cleveragents-core#5086