UAT: AutomationGuard denylist, budget caps, and tool call limits are never enforced during plan execution — check_guard() is never called in the execution pipeline #3294

Open
opened 2026-04-05 09:14:34 +00:00 by freemo · 3 comments
Owner

Metadata

  • Branch: fix/automation-guard-execution-enforcement
  • Commit Message: fix(executor): enforce AutomationGuard denylist, budget, and tool-call limits during execution
  • Milestone: v3.5.0
  • Parent Epic: #397

Bug Report

Feature Area: Guard Enforcement — Denylist, Budget Caps, Tool Call Limits (M6: Autonomy Hardening)

What was tested

The AutomationGuard enforcement during plan execution via AutomationProfile.check_guard() and AutomationProfileService.evaluate_guard().

Expected behavior (from spec)

Per docs/specification.md §Automation Profiles and the M6 acceptance criterion "Guard enforcement: denylist, budget caps, tool call limits": When a plan executes with an automation profile that has guards configured (tool denylist, allowlist, max tool calls per step, max total cost), those guards must be enforced before each tool invocation. Tools on the denylist must be blocked; budget caps must prevent tool calls that would exceed the budget; tool call limits must stop execution when the per-step limit is reached.
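Read literally, the expected behavior implies a per-invocation decision that is made before the tool runs. The sketch below encodes one plausible reading of that decision. GuardSketch and decide() are hypothetical names, not the real AutomationGuard model or the check_guard() signature. The following are all assumptions: the denylist takes precedence over the allowlist, an empty allowlist means "allow all", None means "no limit", and the budget is checked against an estimated cost for the pending call.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class GuardSketch:
    """Hypothetical stand-in for the four AutomationGuard fields named in this issue."""
    denylist: frozenset[str] = field(default_factory=frozenset)
    allowlist: frozenset[str] = field(default_factory=frozenset)  # assumption: empty = allow all
    max_tool_calls_per_step: Optional[int] = None  # assumption: None = unlimited
    max_total_cost: Optional[float] = None  # assumption: None = no budget cap


def decide(
    guard: GuardSketch,
    tool_name: str,
    calls_this_step: int,
    spent_so_far: float,
    estimated_cost: float,
) -> tuple[bool, str]:
    """Return (allowed, reason) for one tool invocation, evaluated before it runs."""
    # Assumption: the denylist is checked first, so it wins over the allowlist.
    if tool_name in guard.denylist:
        return False, f"tool '{tool_name}' is on the denylist"
    if guard.allowlist and tool_name not in guard.allowlist:
        return False, f"tool '{tool_name}' is not on the allowlist"
    # Per-step limit: refuse once this step has already made the maximum number of calls.
    if (
        guard.max_tool_calls_per_step is not None
        and calls_this_step >= guard.max_tool_calls_per_step
    ):
        return False, f"per-step limit of {guard.max_tool_calls_per_step} tool calls reached"
    # Budget cap: refuse a call that *would* push cumulative spend over the cap.
    if (
        guard.max_total_cost is not None
        and spent_so_far + estimated_cost > guard.max_total_cost
    ):
        return False, "call would exceed the max_total_cost budget"
    return True, "allowed"
```

The budget test uses a strict "would exceed" comparison, so a call that lands exactly on max_total_cost is still allowed. If the spec intends the cap to be exclusive, the comparison should be >= instead.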

Actual behavior (from code analysis)

AutomationProfile.check_guard() (lines 277-353 of automation_profile.py) and AutomationProfileService.evaluate_guard() (lines 251-288 of automation_profile_service.py) are never called anywhere in the execution pipeline:

# Searched all execution-layer files for calls to either guard method:
grep -nE 'check_guard|evaluate_guard' \
  src/cleveragents/application/services/plan_executor.py \
  src/cleveragents/application/services/plan_lifecycle_service.py \
  src/cleveragents/tool/actor_runtime.py \
  src/cleveragents/tool/container_executor.py \
  src/cleveragents/langgraph/graph.py
# Result: ZERO occurrences of check_guard or evaluate_guard

The guard logic itself is already implemented in three places:

  • AutomationGuard domain model (automation_guard.py) — denylist, allowlist, max_tool_calls_per_step, max_total_cost
  • AutomationProfile.check_guard() — evaluates guards for a tool invocation
  • AutomationProfileService.evaluate_guard() — service-level guard evaluation

None of these is called when tools are actually invoked during plan execution, so a plan whose automation profile has a denylist containing "shell_exec" can still run shell_exec.

Note: The AutonomyGuardrailService (a separate system) IS wired into PlanExecutor for step limits and wall-clock time. The AutomationGuard denylist/allowlist/budget/tool-call-limit checks are independent of it and, although implemented, are not invoked anywhere during execution.

Code location

  • src/cleveragents/domain/models/core/automation_guard.py — guard model (implemented, never used)
  • src/cleveragents/domain/models/core/automation_profile.py, lines 277-353 — check_guard() (implemented, never called)
  • src/cleveragents/application/services/automation_profile_service.py, lines 251-288 — evaluate_guard() (implemented, never called)
  • src/cleveragents/application/services/plan_executor.py — no guard enforcement
  • src/cleveragents/tool/actor_runtime.py — no guard enforcement

Steps to reproduce

  1. Create an automation profile with a denylist containing "shell_exec"
  2. Create and execute a plan using that profile
  3. Observe that shell_exec tool calls are NOT blocked — the denylist is ignored
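A regression check for these steps should assert that the denied tool's body never runs, not merely that a warning is logged. The sketch below shows that shape using a spy tool. run_plan() and GuardBlocked are hypothetical: run_plan() is a deliberately tiny executor that consults a denylist before dispatch, standing in for whatever the real PlanExecutor wiring turns out to be. It is not the project's API.

```python
from typing import Callable


class GuardBlocked(Exception):
    """Hypothetical error raised when a guard refuses a tool call."""


def run_plan(
    tool_calls: list[str],
    tools: dict[str, Callable[[], str]],
    denylist: set[str],
) -> list[str]:
    """Hypothetical executor: dispatch each tool call in order unless the denylist blocks it."""
    results = []
    for name in tool_calls:
        # The guard check happens BEFORE dispatch, so a denied tool's body never runs.
        if name in denylist:
            raise GuardBlocked(f"tool '{name}' is on the denylist")
        results.append(tools[name]())
    return results


def test_denylisted_tool_never_executes() -> None:
    invoked: list[str] = []  # spy: records every tool body that actually runs

    def shell_exec() -> str:
        invoked.append("shell_exec")
        return "ran"

    def read_file() -> str:
        invoked.append("read_file")
        return "contents"

    tools = {"shell_exec": shell_exec, "read_file": read_file}
    try:
        run_plan(["read_file", "shell_exec"], tools, denylist={"shell_exec"})
    except GuardBlocked:
        pass
    else:
        raise AssertionError("expected the denylist to block shell_exec")
    # read_file ran before the blocked call; shell_exec's body never did.
    assert invoked == ["read_file"]
```

Under pytest the test function is collected automatically. Per this report, the equivalent test written against the real executor would currently fail, because shell_exec's body runs.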

Impact

  • Critical: The M6 acceptance criterion "Guard enforcement: denylist, budget caps, tool call limits" is completely non-functional
  • Plans can invoke any tool regardless of denylist configuration
  • Budget caps are not enforced at the tool invocation level (only at the guardrail step level)
  • Tool call limits per step are not enforced
  • This is a core safety feature for autonomous plan execution

Subtasks

  • Wire AutomationProfile.check_guard() into the tool invocation path in PlanExecutor or actor_runtime.py
  • Block tool calls that are on the denylist
  • Block tool calls that would exceed the budget cap
  • Block tool calls that exceed the per-step limit
  • Add BDD tests verifying guard enforcement during execution
  • Add integration test verifying denylist blocks tool invocation
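One way to cover the first four subtasks without touching each tool individually is a single choke point that every tool dispatch passes through. The following is a sketch under stated assumptions. GuardedInvoker, GuardViolation, and the check callback are hypothetical names. check stands in for a call to AutomationProfile.check_guard(), whose real signature is not shown in this issue, and returns a refusal reason or None. The per-call cost is assumed to be estimable before dispatch.

```python
from typing import Any, Callable, Optional

# Hypothetical stand-in for AutomationProfile.check_guard():
# (tool_name, calls_this_step, spent_so_far, estimated_cost) -> refusal reason, or None to allow.
GuardCheck = Callable[[str, int, float, float], Optional[str]]


class GuardViolation(RuntimeError):
    """Raised in place of dispatching a tool call that the guard refuses."""


class GuardedInvoker:
    """Single choke point that every tool invocation during plan execution passes through."""

    def __init__(self, check: GuardCheck, dispatch: Callable[..., Any]) -> None:
        self._check = check
        self._dispatch = dispatch  # the existing, unguarded tool dispatcher
        self._calls_this_step = 0
        self._spent = 0.0

    def begin_step(self) -> None:
        """Call at the start of each plan step: the per-step count resets, the budget does not."""
        self._calls_this_step = 0

    def invoke(self, tool_name: str, estimated_cost: float, **kwargs: Any) -> Any:
        reason = self._check(tool_name, self._calls_this_step, self._spent, estimated_cost)
        if reason is not None:
            raise GuardViolation(reason)  # refused: the tool is never dispatched
        # Assumption: an allowed call counts against both limits even if the tool then fails.
        self._calls_this_step += 1
        self._spent += estimated_cost
        return self._dispatch(tool_name, **kwargs)
```

In this sketch, PlanExecutor or actor_runtime.py would only need to route dispatch through invoke() and call begin_step() when a step starts, while the guard evaluation itself stays in AutomationProfile. Whether a GuardViolation aborts the plan or is reported back as a failed tool result is a separate design decision for the fix.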

Definition of Done

  • Tools on the denylist are blocked during plan execution
  • Budget caps prevent tool calls that would exceed the limit
  • Per-step tool call limits are enforced
  • BDD tests verify all three guard types
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.5.0 milestone 2026-04-05 09:19:17 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical (keeping existing — guard enforcement is a core v3.5.0 acceptance criterion)
  • Milestone: v3.5.0 (already assigned, keeping)
  • MoSCoW: Must Have — v3.5.0 acceptance criteria explicitly require "Guard enforcement works (denylist, budget caps, tool call limits)." The AutomationGuard model and the check_guard() method exist, but check_guard() is never called during execution, so the guards are completely non-functional.
  • Parent Epic: #397 (Server & Autonomy Infrastructure)

This is a critical safety gap — plans can invoke any tool regardless of denylist configuration. Combined with the prompt injection vulnerability (#3236), this means the autonomy hardening milestone has significant safety holes.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — AutomationProfile.check_guard() is never called in the execution pipeline. This means the denylist, budget caps, and tool call limits are completely unenforced during plan execution. Any tool can be called, any budget can be exceeded, and any denied operation can proceed.
  • Milestone: v3.5.0 (already assigned)
  • MoSCoW: Must Have — The automation guard is a critical safety mechanism. Per the specification, safety profiles and automation guards MUST be enforced during execution. Without this, the system has no runtime safety boundaries.

This is now the 7th Critical Must Have bug in v3.5.0. The milestone has severe safety and correctness gaps that must be addressed before it can ship.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — AutomationGuard enforcement is completely non-functional. Denylist, budget caps, and tool call limits are never enforced during plan execution.
  • Milestone: v3.5.0 (already assigned — correct, as guard enforcement is a core M6 acceptance criterion)
  • MoSCoW: Must Have — the v3.5.0 acceptance criteria explicitly state: "Guard enforcement works (denylist, budget caps, tool call limits)". If AutomationGuard is never invoked, this criterion cannot be met. This is essential for milestone completion.

This is now the 4th Critical/Must Have issue blocking milestone completion. Combined with #3171 (v3.3.0), #3231 (v3.5.0), and #3156 (v3.2.0), these represent the most urgent work items across all milestones.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.5.0 milestone 2026-04-06 21:05:28 +00:00
No milestone
No project
No assignees
1 participant
Blocks
#397 Epic: Server & Autonomy Infrastructure
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3294