UAT: Guard enforcement (denylist, budget caps, tool call limits) never invoked during tool execution — check_guard and evaluate_guard are dead code #3346

Open
opened 2026-04-05 10:24:39 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/m6-guard-enforcement-invoke-check-guard-in-runtime
  • Commit Message: fix(tool): invoke guard enforcement checks before each tool execution in actor runtime
  • Milestone: v3.5.0
  • Parent Epic: #397

Background and Context

Per docs/specification.md §Automation Guards:

"Guard evaluation is performed via AutomationProfile.check_guard(tool_name, is_write, cost_so_far, calls_so_far, scope). Evaluation order: denylist → allowlist → tool call limit → budget cap → write approval → apply approval."

The spec requires that before each tool invocation, the runtime checks the active automation profile's guards. If a tool is on the denylist, or the tool call limit is reached, or the budget cap is exceeded, the tool call must be blocked and requires_approval=True returned.
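The evaluation order quoted from the spec can be sketched as a chain of early returns. This is a minimal illustration, not the project's implementation: `GuardSketch`, `GuardDecision`, the `reason` strings, and every field name except `tool_denylist` are assumptions. Whether the limits compare with `>=` or `>`, and the meaning of `scope`, are also guesses based on the wording "limit is reached" and "cap is exceeded".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class GuardDecision:
    """Hypothetical result type; the issue only names `requires_approval`."""
    requires_approval: bool
    reason: Optional[str] = None


@dataclass
class GuardSketch:
    """Stand-in for AutomationGuard; all fields except tool_denylist are assumed names."""
    tool_denylist: list[str] = field(default_factory=list)
    tool_allowlist: Optional[list[str]] = None  # assumption: None means no allowlist
    max_tool_calls: Optional[int] = None
    budget_cap: Optional[float] = None
    require_write_approval: bool = False
    require_apply_approval: bool = False

    def check_guard(self, tool_name, is_write, cost_so_far, calls_so_far, scope=None):
        # Order from the spec: denylist -> allowlist -> tool call limit
        # -> budget cap -> write approval -> apply approval.
        if tool_name in self.tool_denylist:
            return GuardDecision(True, "denylist")
        if self.tool_allowlist is not None and tool_name not in self.tool_allowlist:
            return GuardDecision(True, "allowlist")
        # "Limit is reached": block once the count equals the limit (assumed).
        if self.max_tool_calls is not None and calls_so_far >= self.max_tool_calls:
            return GuardDecision(True, "tool_call_limit")
        # "Cap is exceeded": block only when strictly above the cap (assumed).
        if self.budget_cap is not None and cost_so_far > self.budget_cap:
            return GuardDecision(True, "budget_cap")
        if is_write and self.require_write_approval:
            return GuardDecision(True, "write_approval")
        # Assumption: a scope value of "apply" marks an apply-phase call.
        if scope == "apply" and self.require_apply_approval:
            return GuardDecision(True, "apply_approval")
        return GuardDecision(False)
```

Because the checks return early, a denylisted tool is reported as blocked by the denylist even when the call limit or budget cap would also have blocked it.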

Bug Description

The guard enforcement logic is implemented but is never called during actual tool execution: AutomationProfile.check_guard() lives in the domain model and AutomationProfileService.evaluate_guard() in the application service layer, yet neither has a call site in the tool execution path. The denylist, budget caps, and tool call limits defined in an AutomationGuard are silently ignored at runtime.

Code evidence:

$ grep -rn "check_guard\|evaluate_guard" src/cleveragents/ --include="*.py"
# Results: ONLY the method definitions themselves — no call sites

The tool execution path (src/cleveragents/tool/actor_runtime.py, src/cleveragents/tool/router.py, src/cleveragents/tool/runner.py) never invokes guard checks before executing tool calls. The ToolActorContext carries an automation_profile field but it is never used to evaluate guards.
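A text search like the one above matches definitions and calls alike, so the result has to be read by hand. As a rough complement, the sketch below uses Python's standard `ast` module to report only call expressions. The helper names `find_guard_call_sites` and `scan_tree` are illustrative and not part of the repository.

```python
import ast
from pathlib import Path

GUARD_METHODS = {"check_guard", "evaluate_guard"}


def find_guard_call_sites(source: str, filename: str = "<string>") -> list[tuple[str, int, str]]:
    """Return (filename, line, name) for each call to a guard method in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if isinstance(node, ast.Call):
            func = node.func
            # obj.check_guard(...) is an Attribute; check_guard(...) is a Name.
            if isinstance(func, ast.Attribute):
                name = func.attr
            else:
                name = getattr(func, "id", None)
            if name in GUARD_METHODS:
                hits.append((filename, node.lineno, name))
    return sorted(hits)


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Collect guard call sites from every .py file under `root`."""
    hits = []
    for path in Path(root).rglob("*.py"):
        hits.extend(find_guard_call_sites(path.read_text(encoding="utf-8"), str(path)))
    return hits
```

Running `scan_tree("src/cleveragents")` before and after the fix would show whether a call site has appeared in the tool execution modules. Method definitions alone produce no hits.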

Expected Behavior

Before each tool invocation, the runtime must:

  1. Resolve the active AutomationProfile from the ToolActorContext
  2. Call AutomationProfile.check_guard(tool_name, is_write, cost_so_far, calls_so_far, scope) (or delegate to AutomationProfileService.evaluate_guard())
  3. Evaluate in order: denylist → allowlist → tool call limit → budget cap → write approval → apply approval
  4. If any guard blocks the tool, return requires_approval=True without executing the tool
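The four steps above might be wired into the dispatch path roughly as follows. This is a sketch under stated assumptions: `dispatch_with_guard` and `ToolCallResult` are hypothetical names, the real hook point in `actor_runtime.py` still has to be identified, and the return type of `check_guard` is assumed to expose `requires_approval` (plus an optional `reason`). Treating a missing profile as "no guards" is also an assumption; failing closed may be the safer choice.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class ToolCallResult:
    """Hypothetical result envelope; only `requires_approval` comes from the issue."""
    requires_approval: bool
    output: Any = None
    blocked_by: Optional[str] = None


def dispatch_with_guard(
    context: Any,
    tool_name: str,
    tool_fn: Callable[..., Any],
    args: dict,
    *,
    is_write: bool,
    cost_so_far: float,
    calls_so_far: int,
    scope: Optional[str] = None,
) -> ToolCallResult:
    """Check the active profile's guards, then execute the tool only if allowed."""
    # Step 1: resolve the active profile from the context.
    profile = getattr(context, "automation_profile", None)
    if profile is not None:
        # Steps 2 and 3: the profile evaluates its guards in the spec's order.
        decision = profile.check_guard(tool_name, is_write, cost_so_far, calls_so_far, scope)
        if decision.requires_approval:
            # Step 4: report the block without ever invoking the tool.
            return ToolCallResult(
                requires_approval=True,
                blocked_by=getattr(decision, "reason", None),
            )
    return ToolCallResult(requires_approval=False, output=tool_fn(**args))
```

The guard check sits before the tool call rather than around it, so a blocked tool has no chance to produce side effects.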

Actual Behavior

Guard checks are never performed. Tools on the denylist execute freely. Budget caps are never enforced. Tool call limits are never checked. The AutomationGuard configuration is completely inert at runtime.

Steps to Reproduce

  1. Create an AutomationProfile with a guard that denylists a tool (e.g., tool_denylist: ["dangerous_tool"])
  2. Execute a plan that calls dangerous_tool
  3. Observe that dangerous_tool executes without any guard check
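The reproduction can be turned into an automated check by replacing `dangerous_tool` with a spy that records whether it ran. The sketch below is independent of the real runtime: `SpyTool`, `denylisted_tool_was_blocked`, and `unguarded_run_tool` are hypothetical, and the profile is modelled as a plain dict purely for illustration.

```python
from typing import Callable


class SpyTool:
    """Records each invocation so a test can prove the tool never ran."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.calls: list[dict] = []

    def __call__(self, **kwargs):
        self.calls.append(kwargs)
        return "executed"


def denylisted_tool_was_blocked(run_tool: Callable[..., object]) -> bool:
    """Drive `run_tool` with a denylisted spy and report whether it stayed unexecuted."""
    spy = SpyTool("dangerous_tool")
    profile = {"tool_denylist": ["dangerous_tool"]}  # dict stand-in for AutomationProfile
    result = run_tool(profile, spy)
    return spy.calls == [] and getattr(result, "requires_approval", False) is True


def unguarded_run_tool(profile, tool):
    """Mimics the reported behaviour: the profile is carried but never consulted."""
    return tool()
```

With the behaviour described in this issue, `denylisted_tool_was_blocked(unguarded_run_tool)` evaluates to `False`. A fixed runtime adapter passed in its place should make it `True`.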

Code Locations

  • src/cleveragents/domain/models/core/automation_profile.py, line 277 — check_guard method (defined but never called)
  • src/cleveragents/application/services/automation_profile_service.py, line 251 — evaluate_guard method (defined but never called)
  • src/cleveragents/tool/actor_runtime.py — tool execution loop (no guard checks)
  • src/cleveragents/tool/actor_context.py, line 117 — automation_profile field (carried but never used for guard evaluation)

Impact

  • The v3.5.0 milestone acceptance criterion "Guard enforcement works (denylist, budget caps, tool call limits)" is completely unmet
  • Security and cost controls are non-functional — any tool can be called regardless of denylist configuration
  • Budget caps set by operators are silently ignored

Subtasks

  • Audit actor_runtime.py tool execution loop to identify the correct pre-invocation hook point
  • Retrieve the active AutomationProfile from ToolActorContext at the hook point
  • Call AutomationProfileService.evaluate_guard() (or AutomationProfile.check_guard() directly) before each tool dispatch
  • Implement guard result handling: if blocked, return requires_approval=True without executing the tool
  • Track cost_so_far and calls_so_far state across tool calls within a single actor run
  • Ensure denylist enforcement blocks execution and returns correct approval signal
  • Ensure tool call limit enforcement blocks execution when limit is reached
  • Ensure budget cap enforcement blocks execution when cap is exceeded
  • Ensure write approval gate is respected for write-classified tools
  • Add unit tests (Behave/pytest) for each guard evaluation path (denylist, allowlist, call limit, budget cap, write approval)
  • Add integration test: plan with denylisted tool → tool is blocked, requires_approval=True returned
  • Run nox (all default sessions), fix any errors
  • Verify coverage ≥ 97% via nox -s coverage_report
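For the subtask on tracking `cost_so_far` and `calls_so_far`, one option is a small per-run accumulator that is updated only after a tool has actually executed. `RunUsage` and `run_calls` are hypothetical helpers, and the rule that blocked calls do not count toward the totals is an assumption.

```python
from dataclasses import dataclass


@dataclass
class RunUsage:
    """Per-run counters to feed into check_guard; a hypothetical helper."""
    calls_so_far: int = 0
    cost_so_far: float = 0.0

    def record(self, cost: float) -> None:
        """Account for one completed tool call."""
        if cost < 0:
            raise ValueError("cost must be non-negative")
        self.calls_so_far += 1
        self.cost_so_far += cost


def run_calls(costs, max_calls, budget_cap):
    """Simulate one actor run, stopping at the first call a limit would block."""
    usage = RunUsage()
    executed = 0
    for cost in costs:
        # Same comparisons a guard might apply: limit reached, cap exceeded.
        if usage.calls_so_far >= max_calls or usage.cost_so_far > budget_cap:
            break
        usage.record(cost)  # counted only after the call has run
        executed += 1
    return executed, usage
```

Keeping the counters on an object scoped to a single actor run avoids leaking totals between runs. A budget check of this shape can only stop the call after the one that crossed the cap, since the cost of a call is known only once it finishes.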

Definition of Done

  • check_guard / evaluate_guard is called before every tool invocation in the actor runtime
  • Denylisted tools are blocked and never executed
  • Tool call limit is enforced — execution stops when the limit is reached
  • Budget cap is enforced — execution stops when the cap is exceeded
  • Write approval gate is respected for write-classified tools
  • All guard evaluation paths are covered by tests
  • v3.5.0 acceptance criterion "Guard enforcement works (denylist, budget caps, tool call limits)" is met
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.5.0 milestone 2026-04-05 10:32:14 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — Guard enforcement is an explicit v3.5.0 acceptance criterion: "Guard enforcement works (denylist, budget caps, tool call limits)." This bug means that criterion is completely unmet. check_guard() and evaluate_guard() are dead code — never called in the execution pipeline.
  • Milestone: v3.5.0 (already set)
  • Story Points: 8 — XL — Requires auditing the tool execution pipeline, wiring guard checks at the correct hook points, implementing guard result handling, tracking cost/calls state, and comprehensive test coverage. Estimated 2-4 days.
  • MoSCoW: Must Have — This is a blocking acceptance criterion for v3.5.0. The milestone cannot be considered complete without functional guard enforcement.
  • Parent Epic: #397 (Server & Autonomy Infrastructure)

Note: This issue overlaps with #3372 (v3.6.0, Epic #3370). The implementation that fixes this bug should also close #3372. The implementer should reference both issues in the PR.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.5.0 milestone 2026-04-06 21:05:28 +00:00
No milestone
No project
No assignees
1 participant
Blocks
#397 Epic: Server & Autonomy Infrastructure
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3346