feat(safety): Uncertainty band LLM escalation for AutonomyController #6760

Open
opened 2026-04-10 02:01:49 +00:00 by drew · 4 comments
Member

Overview

Add an optional LLM-based second stage to the existing AutonomyController that activates only when the heuristic confidence score falls within an uncertainty band around the active automation profile threshold. This augments the existing Semantic Escalation system without replacing it.

Gap Being Filled

The current AutonomyController computes confidence from four heuristic factors: past success rate, codebase familiarity, risk assessment, and invariant complexity. These are population-level statistics about operation types on a project — they are completely blind to the specific content of the tool call being evaluated right now.

Example: The execute_command threshold is set to 0.7. The system scores 0.82 based on historical success rates on this project and proceeds automatically. But the specific command being attempted is git reset --hard HEAD~10 on a branch with uncommitted work. The heuristic score cannot see that. In ci or full-auto mode, this matters.

The gap: Semantic Escalation knows nothing about what is in the tool call — only the statistical history of that operation type on this project.

Proposed Design

```
Stage 1 (existing): Heuristic AutonomyController — always runs, zero LLM cost
├── score >= threshold + margin → proceed automatically (no LLM call)
├── score <= threshold - margin → escalate to user (no LLM call)
└── score within uncertainty band → Stage 2

Stage 2 (new): Cheap LLM evaluator reads actual tool call name + args + context
├── safe → proceed automatically
└── unsafe → escalate or request confirmation
```
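The Stage 1 routing in the diagram above can be sketched as a small pure function. This is only an illustration under stated assumptions: `Route` and `route_by_band` are hypothetical names, not part of the existing `AutonomyController`, and the default margin of 0.10 is taken from the example band in this proposal.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical outcome of the Stage 1 band check."""
    PROCEED = "proceed"    # confident enough: run without an LLM call
    ESCALATE = "escalate"  # clearly below threshold: ask the user
    CLASSIFY = "classify"  # inside the band: hand off to Stage 2


def route_by_band(score: float, threshold: float, margin: float = 0.10) -> Route:
    """Route a heuristic confidence score using the uncertainty band.

    Mirrors the diagram: at or above threshold + margin proceeds,
    at or below threshold - margin escalates, anything else goes to Stage 2.
    """
    if margin < 0:
        raise ValueError("margin must be non-negative")
    if score >= threshold + margin:
        return Route.PROCEED
    if score <= threshold - margin:
        return Route.ESCALATE
    return Route.CLASSIFY


if __name__ == "__main__":
    # Values from the example in this issue: threshold 0.7, score 0.82.
    print(route_by_band(0.82, 0.7))        # Route.PROCEED with the default band
    print(route_by_band(0.82, 0.7, 0.15))  # Route.CLASSIFY with a wider band
    print(route_by_band(0.75, 0.7))        # Route.CLASSIFY
    print(route_by_band(0.55, 0.7))        # Route.ESCALATE
```

One thing the sketch makes visible: with the example values from the gap analysis (threshold 0.7, score 0.82), a ±0.10 band would still proceed without reaching Stage 2, so the band width may need tuning per operation type to cover cases like that one.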

  • Uncertainty band (e.g., ±0.10 around the threshold, configurable) means the LLM is only invoked when the heuristic is genuinely ambiguous — roughly 10–20% of decisions in practice.
  • Result caching by hash of (tool name + argument fingerprint + operation type) with a configurable TTL (default: 1 hour) to avoid redundant calls for identical repeated operations.
  • Model selection via a new escalation.classifier.model config key, following the pattern of the existing context.summarize.model setting. Set to a cheap/fast model by default.
  • Integration point: Optional llm_classifier dependency injected into AutonomyController. When absent, behaviour is identical to today.
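A minimal sketch of how the caching and optional-dependency points above might fit together. Everything named here (`VerdictCache`, `decide`, the classifier signature, the string return values) is hypothetical and chosen for illustration; the real `AutonomyController` API is not shown in this issue. The fallback when `llm_classifier` is absent assumes that today's behaviour is a plain comparison against the threshold.

```python
import hashlib
import json
import time
from typing import Any, Callable, Optional

# Hypothetical classifier signature: returns True when the call is judged safe.
Classifier = Callable[[str, dict[str, Any]], bool]


class VerdictCache:
    """In-memory TTL cache keyed by a hash of the tool call."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[str, tuple[float, bool]] = {}

    @staticmethod
    def key(tool_name: str, args: dict[str, Any], operation_type: str) -> str:
        # Canonical JSON so that argument order does not change the fingerprint.
        payload = json.dumps([tool_name, args, operation_type],
                             sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key: str) -> Optional[bool]:
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, verdict = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[key]  # expired: drop and treat as a miss
            return None
        return verdict

    def put(self, key: str, verdict: bool) -> None:
        self._entries[key] = (self._clock(), verdict)


def decide(score: float, threshold: float, margin: float,
           tool_name: str, args: dict[str, Any], operation_type: str,
           cache: VerdictCache,
           llm_classifier: Optional[Classifier] = None) -> str:
    """Return "proceed" or "escalate" for one tool call."""
    outside_band = abs(score - threshold) >= margin
    if llm_classifier is None or outside_band:
        # Assumed current behaviour: plain threshold comparison, no LLM call.
        return "proceed" if score >= threshold else "escalate"
    cache_key = VerdictCache.key(tool_name, args, operation_type)
    safe = cache.get(cache_key)
    if safe is None:
        safe = llm_classifier(tool_name, args)
        cache.put(cache_key, safe)
    return "proceed" if safe else "escalate"
```

Injecting the clock keeps TTL expiry testable without sleeping, and hashing canonical JSON means two calls with the same arguments in a different order share one cache entry. Whether unsafe verdicts should be cached for the full TTL is a design question this sketch does not settle; it caches both outcomes.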

Inspiration from Claude Code

This is adapted from Claude Code's YOLO Classifier (yoloClassifier.ts), revealed in the March 2026 source map leak. CC uses a two-stage pipeline for auto-mode tool call evaluation:

  • Stage 1: Cheap model (Haiku), stop sequences forcing binary yes/no output, 64-token max, temperature 0. Cache hit rate ~60–80%, making most Stage 1 decisions free.
  • Stage 2: Frontier model, full chain-of-thought, 4,096 tokens. Only reached for borderline cases.

The adaptation fits within CleverAgents' existing Semantic Escalation architecture rather than introducing a parallel system. CC's YOLO is per-tool-call; this proposal makes the LLM an augmentation of the existing heuristic, not a replacement.

Recommendation

If approved, update:

  • Specification §Automation & Safety > Semantic Escalation (~line 28659): document the two-stage option
  • ADR-017 (Automation Profiles): note optional llm_classifier enhancement
  • Config reference: add escalation.classifier.model and escalation.classifier.uncertainty-band keys
  • Implementation: extend src/cleveragents/application/services/autonomy_controller.py
Owner

Thank you for filing this proposal, @drew. I have reviewed it.

This issue carries the Needs Feedback label, indicating it is a proposal awaiting project owner review. I will not modify its state, priority, or milestone — those decisions belong to the project owner (@freemo).

Summary of the proposal: Add an optional LLM-based second stage to the existing AutonomyController that activates only when the heuristic confidence score falls within an uncertainty band around the active automation profile threshold. This augments (not replaces) the existing Semantic Escalation system. The LLM is only invoked for genuinely ambiguous decisions (~10-20% in practice), with result caching to avoid redundant calls.

Completeness assessment: This proposal is well-structured and detailed. It includes:

  • Clear gap analysis with a concrete example (git reset --hard on uncommitted work)
  • Concrete proposed design with a decision tree diagram
  • Uncertainty band, result caching, and model selection details
  • Inspiration from Claude Code's YOLO Classifier two-stage pipeline
  • Specific files and spec sections to update if approved

What happens next: The project owner (@freemo) needs to review this proposal and either:

  1. Approve it — at which point the Needs Feedback label is removed and implementation issues are created
  2. Request changes — provide feedback in comments
  3. Decline it — close with a State/Wont Do label and explanation

I will monitor this issue and respond to any questions or comments.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Owner

Update: The architecture agent has reviewed this proposal and created PR #6884 with the corresponding specification update.

Status: PR #6884 is open and awaiting human review from @freemo. Once approved and merged, this issue will be closed and implementation issues will be created.

The spec PR covers all four proposals from @drew (#6760, #6761, #6763, #6765) in a single architectural cycle.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Owner

Label compliance fix applied: Added missing Priority/Backlog label. Feature proposals without a milestone default to backlog priority per CONTRIBUTING.md.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Owner

Verified — Feature discussion: uncertainty band LLM escalation for AutonomyController. MoSCoW: Could-have. Priority: Low — future enhancement.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

No milestone
No project
No assignees
2 participants
Reference
cleveragents/cleveragents-core#6760