UAT: PlanLifecycleService.should_auto_progress() ignores AutonomyController — phase transitions use simplified threshold check instead of computed confidence score #3333

Open
opened 2026-04-05 10:01:01 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/autonomy-controller-phase-gating
  • Commit Message: fix(autonomy): wire AutonomyController into should_auto_progress for confidence-based phase gating
  • Milestone: v3.5.0
  • Parent Epic: #360

Summary

PlanLifecycleService.should_auto_progress() uses a simplified threshold < 1.0 check to decide whether to auto-advance a plan between phases. The spec requires that phase transitions be gated by comparing a computed confidence score (from AutonomyController) against the profile threshold. The AutonomyController class exists and is fully implemented but is never called during phase transitions.

Expected Behavior (from spec)

Per docs/specification.md lines 28362–28364:

  • decompose_task threshold: "When confidence >= threshold, Strategize begins immediately"
  • create_tool threshold: "When computed confidence >= threshold, Execute begins automatically"
  • select_tool threshold: "When computed confidence >= threshold, Apply begins automatically"

The system should compute a confidence score using AutonomyController.compute_confidence() (which weighs past_success_rate, codebase_familiarity, risk_assessment, and invariant_complexity) and compare it against the profile threshold.
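A minimal sketch of what a weighted confidence computation and threshold comparison could look like. It is not the real AutonomyController: the equal weights, the [0.0, 1.0] input range, and the inversion of risk_assessment and invariant_complexity (higher risk or complexity lowers confidence) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceFactors:
    """The four inputs named in the issue; each is assumed to lie in [0.0, 1.0]."""

    past_success_rate: float
    codebase_familiarity: float
    risk_assessment: float       # assumption: higher means riskier
    invariant_complexity: float  # assumption: higher means more complex


# Hypothetical equal weights; the real controller may weigh factors differently.
WEIGHTS = {
    "past_success_rate": 0.25,
    "codebase_familiarity": 0.25,
    "risk_assessment": 0.25,
    "invariant_complexity": 0.25,
}


def compute_confidence(factors: ConfidenceFactors) -> float:
    """Weighted sum of the factors, clamped to [0.0, 1.0]."""
    score = (
        WEIGHTS["past_success_rate"] * factors.past_success_rate
        + WEIGHTS["codebase_familiarity"] * factors.codebase_familiarity
        # Risk and complexity are inverted so that they reduce confidence.
        + WEIGHTS["risk_assessment"] * (1.0 - factors.risk_assessment)
        + WEIGHTS["invariant_complexity"] * (1.0 - factors.invariant_complexity)
    )
    return max(0.0, min(1.0, score))


def meets_threshold(confidence: float, threshold: float) -> bool:
    """Spec wording: the next phase begins when confidence >= threshold."""
    return confidence >= threshold
```

With this shape, a profile threshold acts as a real cut-off: the same plan context can pass under a lenient profile and be held for review under a stricter one.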

Actual Behavior

PlanLifecycleService.should_auto_progress() (line ~1914 in plan_lifecycle_service.py) uses:

# Strategize complete -> auto-execute?
if (
    plan.phase == PlanPhase.STRATEGIZE
    and plan.processing_state == ProcessingState.COMPLETE
    and profile.create_tool < 1.0  # ← simplified check, ignores confidence
):
    return True

This means:

  1. Any profile with create_tool < 1.0 (e.g., trusted with create_tool=0.0) will ALWAYS auto-progress, regardless of actual confidence
  2. Any profile with create_tool == 1.0 (e.g., manual) will NEVER auto-progress, even if confidence is 1.0
  3. The AutonomyController class (application/services/autonomy_controller.py) is completely unused in the phase-transition path
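The difference between the two gates can be illustrated with a small standalone comparison. The create_tool values for trusted, cautious, and manual are the ones quoted in this issue; the function names (simplified_gate, confidence_gate, compare) are hypothetical and exist only for this sketch.

```python
from typing import Dict, Tuple


def simplified_gate(threshold: float) -> bool:
    """Current behaviour quoted above: confidence is never consulted."""
    return threshold < 1.0


def confidence_gate(confidence: float, threshold: float) -> bool:
    """Behaviour the spec describes: computed confidence >= threshold."""
    return confidence >= threshold


# create_tool thresholds as stated in this issue.
PROFILES: Dict[str, float] = {"trusted": 0.0, "cautious": 0.7, "manual": 1.0}


def compare(confidence: float) -> Dict[str, Tuple[bool, bool]]:
    """Return (simplified result, confidence-gated result) per profile."""
    return {
        name: (simplified_gate(threshold), confidence_gate(confidence, threshold))
        for name, threshold in PROFILES.items()
    }


if __name__ == "__main__":
    for name, (current, expected) in compare(0.5).items():
        print(f"{name}: current={current} spec={expected}")
```

At a confidence of 0.5 the two gates disagree for cautious (the current check advances, the spec would hold), and at a confidence of 1.0 they disagree for manual (the current check holds, the spec would advance).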

Code Location

  • src/cleveragents/application/services/plan_lifecycle_service.py: should_auto_progress() method
  • src/cleveragents/application/services/autonomy_controller.py: AutonomyController class (unused in phase transitions)

Steps to Reproduce

  1. Create a plan with the cautious profile (create_tool=0.7)
  2. Complete the Strategize phase
  3. Observe: plan auto-progresses to Execute regardless of actual confidence score
  4. Expected: plan should only auto-progress if computed confidence >= 0.7

Impact

This is a critical gap in the full autonomy acceptance flow. The spec's confidence-based gating is the core mechanism that distinguishes between different automation profiles. Without it, cautious and trusted profiles behave identically to full-auto for phase transitions.

Subtasks

  • Wire AutonomyController into PlanLifecycleService.should_auto_progress()
  • Compute confidence factors from plan context (past success rate, codebase familiarity, risk assessment, invariant complexity)
  • Compare computed confidence against profile threshold for each phase transition
  • Add Behave scenarios covering confidence-gated phase transitions
  • Ensure nox -e typecheck and nox -e unit_tests pass
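One possible shape for the wiring, as a self-contained sketch with stand-in types. The Plan, Profile, PlanPhase, and ProcessingState classes below are simplified stubs, not the project's real ones, and the confidence callable stands in for AutonomyController.compute_confidence(), whose actual signature is not shown in this issue. The Execute-to-Apply mapping onto select_tool is inferred from the spec quote and is an assumption; the decompose_task transition is omitted because the issue does not name the phase that precedes Strategize.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class PlanPhase(Enum):  # stub: only the phases needed for the sketch
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class ProcessingState(Enum):  # stub
    RUNNING = "running"
    COMPLETE = "complete"


@dataclass(frozen=True)
class Profile:  # stub: only two of the profile thresholds
    create_tool: float
    select_tool: float


@dataclass(frozen=True)
class Plan:  # stub
    phase: PlanPhase
    processing_state: ProcessingState


# Completed phase -> profile field gating the next phase.
# STRATEGIZE -> create_tool matches the snippet quoted in this issue;
# EXECUTE -> select_tool is an assumption based on the spec wording.
THRESHOLD_FIELD = {
    PlanPhase.STRATEGIZE: "create_tool",
    PlanPhase.EXECUTE: "select_tool",
}


def should_auto_progress(
    plan: Plan,
    profile: Profile,
    compute_confidence: Callable[[Plan], float],
) -> bool:
    """Advance only when the phase is complete and confidence >= threshold."""
    if plan.processing_state is not ProcessingState.COMPLETE:
        return False
    field = THRESHOLD_FIELD.get(plan.phase)
    if field is None:
        return False  # no automatic transition defined for this phase
    threshold = getattr(profile, field)
    return compute_confidence(plan) >= threshold
```

Passing the confidence computation in as a callable keeps the gate testable without constructing a full controller, which may also help when writing the Behave scenarios.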

Definition of Done

  • should_auto_progress() uses AutonomyController.compute_confidence() to compute a score
  • The computed score is compared against the profile threshold (not just threshold < 1.0)
  • All 8 built-in profiles produce correct auto-progress behavior based on actual confidence
  • Behave scenarios cover the confidence-gated path
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.5.0 milestone 2026-04-05 10:02:58 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — PlanLifecycleService ignores AutonomyController, meaning the autonomy guardrails are bypassed during plan progression
  • Milestone: v3.5.0 (already assigned — correct, as autonomy hardening is M6)
  • MoSCoW: Should Have. Although this is a significant spec compliance gap, the core plan lifecycle still functions. The AutonomyController integration is important for safety, but the system can operate without it in controlled environments. Classified as Should Have rather than Must Have because the acceptance criteria focus on guard enforcement (covered by #3294) rather than auto-progression specifically.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.5.0 milestone 2026-04-06 21:05:28 +00:00
Reference
cleveragents/cleveragents-core#3333