feat(autonomy): implement semantic escalation with confidence scoring and threshold comparison #558

Merged
freemo merged 1 commit from feature/m6-semantic-escalation into master 2026-03-04 17:31:35 +00:00
Owner

Summary

Implements the semantic escalation system with confidence scoring as specified in docs/specification.md § Automation & Safety > Semantic Escalation.

Changes

  • AutonomyController class with should_proceed_automatically(operation, context) -> EscalationDecision
  • Confidence scoring from 4 weighted factors: past_success_rate (0.30), codebase_familiarity (0.20), risk_assessment (0.30, inverted), invariant_complexity (0.20, inverted)
  • EscalationDecision model: proceed: bool, confidence: float, factors: dict, explanation: str
  • Historical success tracking: thread-safe per-operation-type success rates for future past_success_rate computation
  • Integration with 8 built-in automation profiles via auto_threshold comparison
  • DI container integration as singleton service
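
The scoring and threshold comparison listed above can be sketched roughly as follows. This is a minimal illustration, not the merged implementation: the weights and the EscalationDecision fields come from the list above, while the [0, 1] factor range, the `1 - value` treatment of inverted factors, the inclusive `>=` comparison, the 0.5 default rate for unseen operation types, and the names `compute_confidence`, `decide`, and `SuccessHistory` are all assumptions made for this example.

```python
from dataclasses import dataclass
from threading import Lock

# Weights as listed in the PR description; they sum to 1.0.
WEIGHTS = {
    "past_success_rate": 0.30,
    "codebase_familiarity": 0.20,
    "risk_assessment": 0.30,
    "invariant_complexity": 0.20,
}
# Factors described as "inverted": a higher value lowers confidence.
INVERTED = {"risk_assessment", "invariant_complexity"}


@dataclass
class EscalationDecision:
    proceed: bool
    confidence: float
    factors: dict
    explanation: str


def compute_confidence(factors: dict) -> float:
    """Weighted sum of the four factors (assumed to lie in [0, 1])."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(1.0, max(0.0, factors[name]))  # clamp defensively
        total += weight * ((1.0 - value) if name in INVERTED else value)
    return total


def decide(factors: dict, auto_threshold: float) -> EscalationDecision:
    """Compare confidence to a profile's auto_threshold (assumed inclusive)."""
    confidence = compute_confidence(factors)
    proceed = confidence >= auto_threshold
    verdict = "proceeding automatically" if proceed else "escalating"
    explanation = (
        f"confidence {confidence:.2f} vs auto_threshold "
        f"{auto_threshold:.2f}: {verdict}"
    )
    return EscalationDecision(proceed, confidence, dict(factors), explanation)


class SuccessHistory:
    """Hypothetical thread-safe per-operation-type success tracker."""

    def __init__(self, default_rate: float = 0.5) -> None:
        self._lock = Lock()
        self._counts: dict[str, tuple[int, int]] = {}  # type -> (ok, total)
        self._default_rate = default_rate  # assumed value when no history

    def record(self, operation_type: str, success: bool) -> None:
        with self._lock:
            ok, total = self._counts.get(operation_type, (0, 0))
            self._counts[operation_type] = (ok + int(success), total + 1)

    def success_rate(self, operation_type: str) -> float:
        with self._lock:
            ok, total = self._counts.get(operation_type, (0, 0))
        return ok / total if total else self._default_rate
```

Under these assumptions, a rate from `SuccessHistory.success_rate(...)` would be passed in as the `past_success_rate` factor, and each automation profile would supply its own `auto_threshold` to `decide`. Guarding both reads and writes with one lock keeps the (ok, total) pair consistent when several threads record outcomes concurrently.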

Testing

  • Behave BDD scenarios for confidence computation, threshold comparison per profile, history tracking, escalation explanations
  • Robot Framework integration tests for end-to-end escalation
  • ASV benchmarks for confidence computation throughput

Quality Gates

  • nox -s lint — passes
  • nox -s typecheck — 0 errors (Pyright strict)
  • nox -s unit_tests — 8182 scenarios pass, 0 failures
  • nox -s integration_tests — 10/10 Robot tests pass
  • nox -s coverage_report — >= 97%

Closes #546

freemo added this to the v3.5.0 milestone 2026-03-04 04:57:12 +00:00
freemo scheduled this pull request to auto merge when all checks succeed 2026-03-04 15:37:39 +00:00
freemo force-pushed feature/m6-semantic-escalation from 647e69427e to db5e5c974f 2026-03-04 17:19:07 +00:00

Checks for 647e69427e: All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 15s
CI / security (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 35s
CI / unit_tests (pull_request) Successful in 3m13s
CI / docker (pull_request) Successful in 38s
CI / integration_tests (pull_request) Successful in 4m1s
CI / coverage (pull_request) Successful in 4m15s
CI / benchmark-regression (pull_request) Successful in 28m20s

Checks for db5e5c974f: All checks were successful
CI / lint (pull_request) Successful in 14s
CI / typecheck (pull_request) Successful in 31s
CI / security (pull_request) Successful in 30s
CI / quality (pull_request) Successful in 15s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / unit_tests (pull_request) Successful in 1m58s
CI / integration_tests (pull_request) Successful in 3m24s
CI / docker (pull_request) Successful in 40s
CI / coverage (pull_request) Successful in 4m3s
CI / benchmark-regression (pull_request) Successful in 31m33s
CI / lint (push) Successful in 12s
CI / typecheck (push) Successful in 33s
CI / quality (push) Successful in 15s
CI / security (push) Successful in 30s
CI / unit_tests (push) Successful in 2m7s
CI / build (push) Successful in 15s
CI / benchmark-regression (push) Has been skipped
CI / integration_tests (push) Successful in 2m57s
CI / docker (push) Successful in 43s
CI / coverage (push) Successful in 4m14s
CI / benchmark-publish (push) Successful in 14m12s
freemo merged commit db5e5c974f into master 2026-03-04 17:31:35 +00:00
freemo deleted branch feature/m6-semantic-escalation 2026-03-04 17:31:36 +00:00
Reference
cleveragents/cleveragents-core!558