UAT: PlanLifecycleService.should_auto_progress() ignores Semantic Escalation confidence — treats any threshold < 1.0 as fully automatic #4018

Open
opened 2026-04-06 08:40:08 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/plan-lifecycle-confidence-based-auto-progress
  • Commit Message: fix(plan-lifecycle): use AutonomyController confidence for should_auto_progress phase transitions
  • Milestone: (none — see backlog note below)
  • Parent Epic: #3370

Bug Report

What Was Tested

The PlanLifecycleService.should_auto_progress() method was analyzed against the specification's Semantic Escalation and automation profile threshold semantics.

Expected Behavior (from spec)

Per docs/specification.md lines 28354, 28362–28364:

Each threshold specifies the minimum confidence score (as computed by the Semantic Escalation system) at which the system proceeds automatically. When the computed confidence for a given operation falls below the profile's threshold for that flag, the system drops to manual mode.

For phase transitions:

  • create_tool: "When confidence >= threshold, Execute begins when Strategize completes."
  • select_tool: "When confidence >= threshold, Apply begins when Execute completes."
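Here is a minimal sketch of the threshold rule quoted above. It assumes thresholds and confidence scores are floats in [0.0, 1.0]. The `PlanPhase` enum is a local stand-in. `proceeds_automatically`, `transition_is_automatic`, and `TRANSITION_FLAG` are hypothetical names used only for illustration.

```python
from enum import Enum


class PlanPhase(Enum):
    # Hypothetical stand-in; only the two phases named in the spec excerpt.
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


# Which automation-profile flag gates the transition out of each phase.
TRANSITION_FLAG = {
    PlanPhase.STRATEGIZE: "create_tool",  # Strategize complete -> Execute
    PlanPhase.EXECUTE: "select_tool",     # Execute complete -> Apply
}


def proceeds_automatically(confidence: float, threshold: float) -> bool:
    """Spec rule: proceed automatically only when confidence >= threshold;
    otherwise the system drops to manual mode."""
    return confidence >= threshold


def transition_is_automatic(
    phase: PlanPhase, confidence: float, thresholds: dict[str, float]
) -> bool:
    """Look up the flag that gates this phase's transition and apply the rule."""
    flag = TRANSITION_FLAG[phase]
    return proceeds_automatically(confidence, thresholds[flag])
```

For example, with `{"create_tool": 0.7, "select_tool": 0.8}`, a confidence of 0.75 starts Execute automatically after Strategize. The same confidence does not start Apply automatically after Execute.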

The system should compute a confidence score using the AutonomyController (which weighs past_success_rate, codebase_familiarity, risk_assessment, invariant_complexity) and compare it against the profile threshold.
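The issue does not describe how AutonomyController combines its four factors. The sketch below is purely illustrative. It assumes an equally weighted average of factors in [0.0, 1.0], where higher `risk_assessment` and `invariant_complexity` lower confidence. `ConfidenceFactors`, `WEIGHTS`, and `compute_confidence` are hypothetical and are not the controller's actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceFactors:
    # Factor names come from the issue; the [0.0, 1.0] ranges are assumed.
    past_success_rate: float
    codebase_familiarity: float
    risk_assessment: float       # assumed: higher means riskier
    invariant_complexity: float  # assumed: higher means more complex


# Hypothetical equal weights; the real controller's weighting is not documented here.
WEIGHTS = {
    "past_success_rate": 0.25,
    "codebase_familiarity": 0.25,
    "risk_assessment": 0.25,
    "invariant_complexity": 0.25,
}


def compute_confidence(f: ConfidenceFactors) -> float:
    """Weighted average in which risk and complexity reduce confidence."""
    score = (
        WEIGHTS["past_success_rate"] * f.past_success_rate
        + WEIGHTS["codebase_familiarity"] * f.codebase_familiarity
        + WEIGHTS["risk_assessment"] * (1.0 - f.risk_assessment)
        + WEIGHTS["invariant_complexity"] * (1.0 - f.invariant_complexity)
    )
    # Clamp so out-of-range inputs cannot yield a score outside [0.0, 1.0].
    return max(0.0, min(1.0, score))
```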

Actual Behavior

src/cleveragents/application/services/plan_lifecycle_service.py, lines 2164–2177:

# Strategize complete -> auto-execute?
if (
    plan.phase == PlanPhase.STRATEGIZE
    and plan.processing_state == ProcessingState.COMPLETE
    and profile.create_tool < 1.0
):
    return True

# Execute complete -> auto-apply?
return bool(
    plan.phase == PlanPhase.EXECUTE
    and plan.processing_state == ProcessingState.COMPLETE
    and profile.select_tool < 1.0
)

The method treats any threshold < 1.0 as "always auto-progress" without computing confidence. This means:

  1. A profile with create_tool=0.5 (meaning "auto-progress when confidence >= 0.5") always auto-progresses regardless of actual confidence
  2. The AutonomyController (which exists and is implemented) is never consulted for phase transitions
  3. Intermediate threshold values (0.1–0.9) have no effect: every threshold below 1.0 behaves identically to 0.0 (always automatic)
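The divergence can be reproduced in isolation. In this sketch, `current_gate` mirrors the quoted `< 1.0` check and `spec_gate` applies the spec's `confidence >= threshold` rule. Both function names are hypothetical.

```python
def current_gate(threshold: float, confidence: float) -> bool:
    """Mirrors the `< 1.0` check quoted above; `confidence` is never read."""
    return threshold < 1.0


def spec_gate(threshold: float, confidence: float) -> bool:
    """Spec semantics: proceed only when confidence >= threshold."""
    return confidence >= threshold


# create_tool=0.5 with a low computed confidence of 0.2:
print(current_gate(0.5, 0.2))  # True  (auto-progresses anyway)
print(spec_gate(0.5, 0.2))     # False (should drop to manual mode)
```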

Impact

  • The cautious profile's intermediate thresholds (0.6–0.8) are effectively ignored for phase transitions. Under the cautious profile, the system always auto-progresses from Strategize to Execute and from Execute to Apply, which defeats the purpose of confidence-gated automation
  • The AutonomyController class is implemented but not used for phase-transition decisions

Code Location

  • src/cleveragents/application/services/plan_lifecycle_service.py lines 2137–2177

Fix Required

The should_auto_progress() method should:

  1. Accept an AutonomyController instance (or use one from the service's dependencies)
  2. Compute confidence factors for the current plan state
  3. Use AutonomyController.should_proceed_automatically() to make the decision
  4. Only fall back to the binary < 1.0 check when no confidence computation is available
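Here is a minimal sketch of the four steps. It uses local stand-ins for `Plan`, `PlanPhase`, `ProcessingState`, and the automation profile. It assumes `should_proceed_automatically()` takes a confidence and a threshold and returns a bool. The real signature is not given in this issue, so `AutonomyControllerLike`, `confidence_for`, `PlanLifecycleServiceSketch`, and `ThresholdController` should all be treated as hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional, Protocol


class PlanPhase(Enum):
    STRATEGIZE = "strategize"
    EXECUTE = "execute"
    APPLY = "apply"


class ProcessingState(Enum):
    RUNNING = "running"
    COMPLETE = "complete"


@dataclass
class Plan:
    phase: PlanPhase
    processing_state: ProcessingState


@dataclass
class AutomationProfile:
    create_tool: float
    select_tool: float


class AutonomyControllerLike(Protocol):
    # Assumed signature; the real should_proceed_automatically() may differ.
    def should_proceed_automatically(self, confidence: float, threshold: float) -> bool: ...


class ThresholdController:
    """Hypothetical controller applying the spec's confidence >= threshold rule."""

    def should_proceed_automatically(self, confidence: float, threshold: float) -> bool:
        return confidence >= threshold


class PlanLifecycleServiceSketch:
    def __init__(
        self,
        autonomy_controller: Optional[AutonomyControllerLike] = None,
        confidence_for: Optional[Callable[[Plan], float]] = None,
    ) -> None:
        # Step 1: the controller is injected as a dependency.
        self._autonomy = autonomy_controller
        # Step 2: hypothetical hook that computes confidence for the plan state.
        self._confidence_for = confidence_for

    def should_auto_progress(self, plan: Plan, profile: AutomationProfile) -> bool:
        if plan.processing_state != ProcessingState.COMPLETE:
            return False
        if plan.phase == PlanPhase.STRATEGIZE:
            threshold = profile.create_tool  # Strategize -> Execute
        elif plan.phase == PlanPhase.EXECUTE:
            threshold = profile.select_tool  # Execute -> Apply
        else:
            return False
        if self._autonomy is None or self._confidence_for is None:
            # Step 4: legacy binary check only when no confidence is available.
            return threshold < 1.0
        # Step 3: delegate the decision to the controller.
        confidence = self._confidence_for(plan)
        return self._autonomy.should_proceed_automatically(confidence, threshold)
```

Keeping the `< 1.0` check as the fallback path preserves today's behavior for callers that construct the service without a controller, as step 4 requires.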

Subtasks

  • Inject AutonomyController into PlanLifecycleService
  • Update should_auto_progress() to compute confidence and use AutonomyController
  • Add BDD scenarios: cautious profile with low confidence does not auto-progress
  • Add BDD scenarios: cautious profile with high confidence does auto-progress
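The two BDD scenarios reduce to one boundary check on each side of the threshold. This pytest-style sketch isolates that decision behind a stub. `StubConfidence`, `auto_progresses_after_strategize`, and the 0.7 threshold are illustrative assumptions, not the project's step definitions. The issue only states that cautious thresholds fall in 0.6–0.8.

```python
from dataclasses import dataclass

# Illustrative cautious create_tool threshold (assumed value within 0.6-0.8).
CAUTIOUS_CREATE_TOOL = 0.7


@dataclass
class StubConfidence:
    """Hypothetical test double returning a fixed computed confidence."""
    value: float


def auto_progresses_after_strategize(stub: StubConfidence, threshold: float) -> bool:
    # Hypothetical stand-in for the fixed Strategize -> Execute decision.
    return stub.value >= threshold


def test_cautious_low_confidence_does_not_auto_progress() -> None:
    # Given the cautious profile and a computed confidence below its threshold
    stub = StubConfidence(value=0.3)
    # When Strategize completes, then Execute does not begin automatically
    assert not auto_progresses_after_strategize(stub, CAUTIOUS_CREATE_TOOL)


def test_cautious_high_confidence_auto_progresses() -> None:
    # Given the cautious profile and a computed confidence above its threshold
    stub = StubConfidence(value=0.9)
    # When Strategize completes, then Execute begins automatically
    assert auto_progresses_after_strategize(stub, CAUTIOUS_CREATE_TOOL)
```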

Definition of Done

  • should_auto_progress() uses confidence-based decision making for intermediate thresholds
  • cautious profile correctly gates phase transitions based on computed confidence
  • All new BDD scenarios pass
  • All nox stages pass
  • Coverage >= 97%
  • PR merged

Backlog note: This issue was discovered during autonomous operation
on milestone v3.6.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:11:59 +00:00
No milestone
No project
No assignees
1 participant
Reference
cleveragents/cleveragents-core#4018