UAT: ToolRuntime and safety profile enforcement (_enforce_capabilities) are not wired into the execution pipeline — safety constraints are never enforced at runtime #4026

Open
opened 2026-04-06 08:45:40 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/tool-runtime-execution-pipeline-integration
  • Commit Message: fix(tool-runtime): wire ToolRuntime and safety profile enforcement into execution pipeline
  • Milestone: (none — backlog)
  • Parent Epic: #3370

Bug Report

What Was Tested

The ToolRuntime class and its _enforce_capabilities() method were analyzed to determine whether safety profile constraints are actually enforced during plan execution.

Expected Behavior (from spec)

Per docs/specification.md lines 28379–28381 and the safety_profile.py docstring:

Safety profile enforcement is implemented in the tool execution pipeline. The resolve_safety_profile function selects the highest-precedence profile, and ToolRuntime._enforce_capabilities enforces the resolved profile's constraints at tool activation and execution time.

The ToolRuntime._enforce_capabilities() method should be called for every tool invocation during plan execution, enforcing:

  • Unsafe tool gating (allow_unsafe_tools)
  • Skill category allow-list
  • Sandbox requirement (require_sandbox)
  • Human approval requirement (require_human_approval)
  • Cost limits (max_cost_per_plan, max_total_cost)
  • Retry limits (max_retries_per_step)

Actual Behavior

A grep of the entire src/cleveragents/ directory shows that ToolRuntime is:

  1. Defined in src/cleveragents/tool/lifecycle.py
  2. Exported from src/cleveragents/tool/__init__.py
  3. Never imported or used in any application service, actor, or runtime code

The ToolRuntime class is only used in unit tests (via features/steps/). No production code in src/cleveragents/application/, src/cleveragents/runtime/, src/cleveragents/actor/, or src/cleveragents/agents/ imports or instantiates ToolRuntime.

This means:

  • Safety profile constraints (require_sandbox, allow_unsafe_tools, require_human_approval, cost limits, retry limits) are never enforced during actual plan execution
  • The ToolExecutionContext with safety_profile= is never created in production code
  • All safety profile enforcement tests pass because they test the ToolRuntime in isolation, but the runtime is not connected to the actual execution pipeline

Code Evidence

# ToolRuntime is only referenced in:
grep -rn "ToolRuntime" src/cleveragents/ --include="*.py"
# Results: only tool/lifecycle.py, tool/__init__.py, tool/wrapping.py (type hints only)
# No results in: application/, runtime/, actor/, agents/, langgraph/
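The grep above matches any textual mention, including comments and docstrings. A stricter check, sketched here with hypothetical helper names, parses each module and reports only those that import the name or reference it as an attribute. It assumes the package layout matches the paths quoted in this issue and could serve as a regression guard once the wiring is in place.

```python
import ast
from pathlib import Path
from typing import List


def imports_name(source: str, name: str) -> bool:
    """Return True if the source imports `name` via `from x import name`
    or references it as an attribute such as `module.name`."""
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom):
            if any(alias.name == name for alias in node.names):
                return True
        elif isinstance(node, ast.Attribute) and node.attr == name:
            return True
    return False


def modules_importing(root: Path, name: str) -> List[Path]:
    """List the Python files under `root` that import or reference `name`."""
    return sorted(
        path
        for path in root.rglob("*.py")
        if imports_name(path.read_text(encoding="utf-8"), name)
    )
```

For example, modules_importing(Path("src/cleveragents/actor"), "ToolRuntime") would be expected to return an empty list today and a non-empty one after the fix.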

Impact

This is a critical safety gap: users who configure require_sandbox=True, allow_unsafe_tools=False, or cost limits in their safety profiles will find these constraints are silently ignored during plan execution.

Code Location

  • src/cleveragents/tool/lifecycle.py: ToolRuntime class (defined but not wired)
  • src/cleveragents/tool/context.py: ToolExecutionContext (defined but not instantiated in production)

Fix Required

The ToolRuntime must be integrated into the actual tool execution pipeline. The actor implementations (strategy actor, execution actor) must:

  1. Create a ToolExecutionContext with the resolved safety_profile from the plan's automation profile
  2. Use ToolRuntime.activate(), ToolRuntime.execute(), and ToolRuntime.deactivate() for all tool invocations
  3. Pass the ToolExecutionContext through the execution chain
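The three steps above can be sketched as follows. The classes are stand-ins written for this example: the method names activate, execute, deactivate, and _enforce_capabilities come from this issue, but their signatures, the dict-based safety profile, and the ExecutionActor.run_step method are assumptions and may not match the real code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set, Tuple


@dataclass(frozen=True)
class ToolExecutionContext:
    # Stand-in: the issue only confirms a `safety_profile=` argument.
    safety_profile: Dict[str, Any]


class ToolRuntime:
    """Stand-in runtime; signatures are assumed, not taken from lifecycle.py."""

    def __init__(self, tools: Dict[str, Callable[..., Any]], unsafe: Set[str]):
        self._tools = tools
        self._unsafe = unsafe
        self.calls: List[Tuple[str, str]] = []  # records lifecycle order

    def _enforce_capabilities(self, tool_name: str, ctx: ToolExecutionContext) -> None:
        allowed = ctx.safety_profile.get("allow_unsafe_tools", False)
        if tool_name in self._unsafe and not allowed:
            raise PermissionError(f"{tool_name} blocked by safety profile")

    def activate(self, tool_name: str, ctx: ToolExecutionContext) -> None:
        self._enforce_capabilities(tool_name, ctx)
        self.calls.append(("activate", tool_name))

    def execute(self, tool_name: str, ctx: ToolExecutionContext, **kwargs: Any) -> Any:
        self._enforce_capabilities(tool_name, ctx)
        self.calls.append(("execute", tool_name))
        return self._tools[tool_name](**kwargs)

    def deactivate(self, tool_name: str, ctx: ToolExecutionContext) -> None:
        self.calls.append(("deactivate", tool_name))


class ExecutionActor:
    """Hypothetical actor with the runtime injected instead of calling tools directly."""

    def __init__(self, runtime: ToolRuntime):
        self._runtime = runtime

    def run_step(self, tool_name: str, safety_profile: Dict[str, Any], **kwargs: Any) -> Any:
        # Step 1: build the context from the resolved safety profile.
        ctx = ToolExecutionContext(safety_profile=safety_profile)
        # Steps 2 and 3: route the call through the runtime, passing the context along.
        self._runtime.activate(tool_name, ctx)
        try:
            return self._runtime.execute(tool_name, ctx, **kwargs)
        finally:
            self._runtime.deactivate(tool_name, ctx)
```

Injecting the runtime through the constructor keeps the actor testable with a fake runtime, and the try/finally ensures deactivate runs even when execution fails.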

Subtasks

  • Identify all actor implementations that invoke tools
  • Inject ToolRuntime into actor implementations
  • Create ToolExecutionContext with resolved safety profile in actor execution
  • Add integration tests verifying safety constraints are enforced during plan execution
  • Add BDD scenario: plan with allow_unsafe_tools=False blocks unsafe tool invocation
  • Run nox (all default sessions), fix any errors
  • Verify coverage >=97% via nox -s coverage_report

Definition of Done

  • ToolRuntime._enforce_capabilities() is called for every tool invocation during plan execution
  • Safety profile constraints are enforced in production code
  • Integration tests verify end-to-end safety enforcement
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation, with footer ISSUES CLOSED: #<this issue>
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done
  • All nox stages pass
  • Coverage >= 97%
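The commit message rule above can be checked mechanically. This is a minimal sketch with a hypothetical helper name. It assumes the footer must be the last line of the message and that at least one non-blank detail line is required, which is one reading of the rule rather than a documented project convention.

```python
from typing import List

# Subject line copied from the Metadata section of this issue.
EXPECTED_SUBJECT = (
    "fix(tool-runtime): wire ToolRuntime and safety profile "
    "enforcement into execution pipeline"
)


def check_commit_message(message: str, issue_number: int) -> List[str]:
    """Return a list of problems; an empty list means the message conforms."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    if lines[0] != EXPECTED_SUBJECT:
        problems.append("subject line does not match the Metadata commit message")
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line must be blank")
    footer = f"ISSUES CLOSED: #{issue_number}"
    if lines[-1].strip() != footer:
        problems.append(f"last line must be '{footer}'")
    # Everything between the blank line and the footer counts as the body.
    body = [line for line in lines[2:-1] if line.strip()]
    if not body:
        problems.append("body with implementation details is missing")
    return problems
```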

Backlog note: This issue was discovered during autonomous operation
on milestone (active). It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:49 +00:00
Reference
cleveragents/cleveragents-core#4026