refactor(autonomy): rename automation profile task flags to spec names #1147
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!1147
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m5-automation-profile-fields"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Renamed all 11 task-type confidence threshold fields in the
AutomationProfilemodel from phase-transition semantics (auto_strategize, auto_execute, etc.) to specification-defined task-type semantics (decompose_task, create_tool, etc.). Updated all 8 built-in profiles, CLI formatting, YAML schema, services, Alembic migration, and all Behave/Robot tests.Changes
m5_001_rename_automation_profile_fieldsfor column renamesQuality Gates
All nox stages pass: lint, typecheck, unit_tests (12,288 scenarios), integration_tests, coverage_report.
Closes #902
Review: APPROVED
Thorough mechanical rename across 42 files (+885/-840 lines, net near-zero). Every reference updated consistently: domain model, DB model +
from_domain/to_domain, repository, CLI commands, YAML profiles, schema, benchmarks, BDD features+steps, Robot tests+helpers. Alembic migration usesbatch_alter_tablefor SQLite compatibility — good practice.Important Note
Alembic migration conflict with PR #1145. Both PRs share the same
down_revision = "m4_003_plan_env_columns". Merging both will create multiple Alembic heads. Coordinate with the #1145 author to determine merge order and rebase accordingly.Minor Notes
YAML profile comments still say "Phase-transition thresholds" / "Decision-autonomy thresholds" / "Self-repair thresholds" — these are semantically wrong after the rename and should be updated to match the new task-type naming.
The semantic mapping between old and new names is non-obvious (e.g.,
auto_reversion_from_apply→access_network). A brief mapping table in the CHANGELOG or migration docstring would aid understanding.Review Scope
2d03b809(Luis) on branchfeature/m5-automation-profile-fieldsdocs/specification.mdIterative Review Cycles
uv run --with behave==1.3.3 python -m behave features/automation_profile_cli.feature features/semantic_escalation.featureuv run --with behave==1.3.3 python -m behave features/consolidated_automation_profile.feature features/consolidated_misc.featureFindings (Ordered by Severity and Category)
CRITICAL — Security/Safety: Legacy keys are silently ignored, producing permissive defaults
AutomationProfile.from_configdirectly callsmodel_validate(src/cleveragents/domain/models/core/automation_profile.py:278) and the model config does not forbid unknown fields (src/cleveragents/domain/models/core/automation_profile.py:307).auto_strategize/auto_execute/auto_applyare now unknown and silently dropped; threshold fields remain at default0.0.extra='forbid') and/or add explicit legacy-key alias/migration with warnings.HIGH — Spec/Contract Mismatch: Current implementation diverges from
docs/specification.mdauto_*names (docs/specification.md:28329,docs/specification.md:28371).src/cleveragents/domain/models/core/automation_profile.py:74,src/cleveragents/cli/commands/automation_profile.py:95,docs/schema/automation_profile.schema.yaml:25).docs/specification.mdwill provide keys that runtime now ignores (see critical finding), causing unsafe behavior and acceptance-criteria ambiguity.docs/specification.mdin lockstep with this PR, or keep backward-compatible aliases until spec docs are migrated.MEDIUM — Test Flaw/Coverage Gap: No regression guard for unknown/legacy threshold keys
features/steps/automation_profile_cli_steps.py:52) but do not assert rejection/migration of legacyauto_*keys.automation-profile addeither rejects legacy/unknown keys explicitly or validates a controlled compatibility path.Review Outcome
I found no additional issues after repeated global review cycles; the three findings above are the complete set for this branch scope.
Review Scope
8fe537c(Luis / CoreRasurae) on branchfeature/m5-automation-profile-fieldsdocs/specification.md2d03b809(the reviewed local commit is not yet the PR head)Iterative Global Cycles Performed
Findings (Ordered by Severity, then Category)
MEDIUM — Contract Correctness
Schema now rejects valid
guardsprofile configsdocs/schema/automation_profile.schema.yaml:8setsadditionalProperties: false, but the schema does not define a top-levelguardsproperty.guardsviaAutomationProfile.guards(src/cleveragents/domain/models/core/automation_profile.py:156).guards, but JSON Schema validation fails with:Additional properties are not allowed ('guards' was unexpected).Impact
Recommended fix
guardsobject definition todocs/schema/automation_profile.schema.yaml(preferred), or relax top-leveladditionalPropertiesuntil schema/runtime parity is restored.LOW — Test Coverage Gap
No regression guard for schema/runtime parity on
guardsImpact
Recommended fix
guardsagainst the published schema and runtime parser.LOW — Migration Documentation Consistency
Migration header text and actual revision chain differ
alembic/versions/m5_001_rename_automation_profile_fields.py:8saysRevises: a6_001_automation_profiles, while metadata usesdown_revision = "m4_003_plan_env_columns"(alembic/versions/m5_001_rename_automation_profile_fields.py:19).Impact
Recommended fix
down_revisionmetadata.Outcome
After repeated full-category review cycles, these are the complete findings for the requested scope.
2d03b8098d66294cac56New commits pushed, approval review dismissed automatically according to repository settings
Review: refactor(autonomy): rename automation profile task flags to spec names
Issues Requiring Resolution
1. Semantic mismatch between old and new field names (High)
The rename mapping maps old phase-transition concepts to completely unrelated task-type concepts. For example:
auto_reversion_from_apply(reverting applied changes) →access_network(network access permission)auto_child_plans(child plan spawning) →install_dependency(dependency installation)The threshold values in the YAML profiles were tuned for the old semantics but now have names suggesting different operations. For example, the
autoprofile hadauto_reversion_from_apply: 1.0(always require human for reversion), now it'saccess_network: 1.0(always require human for network access). These are conceptually different things. This needs clarification — is this an intentional semantic redesign or an error in the mapping?2. No backward-compatible aliases — breaking change (High)
If any external tooling, stored YAML files, or database records reference the old field names (
auto_strategize, etc.), they will break silently. The Pydantic model should useField(alias="auto_strategize")ormodel_config = ConfigDict(populate_by_name=True)with aliases for a graceful transition period.3. Alembic migration conflict — merge blocker (High)
down_revision = "m4_003_plan_env_columns"— this is the same parent as PRs #1074 and #1145. Three PRs branching from the same migration parent will cause an Alembic "multiple heads" error. These must be coordinated: first PR merged keeps itsdown_revision, subsequent PRs must be rebased.4. Stale YAML profile comments (Medium)
Profile YAML comments still reference "Phase-transition thresholds" / "Decision-autonomy thresholds" groupings, but the new field names no longer map to phases. Comments should be updated to reflect the new task-type semantics.
5. Branch naming inconsistency (Low)
Branch
feature/m5-automation-profile-fieldsusesfeature/prefix but the commit type isrefactor. Should berefactor/m5-automation-profile-fieldsfor consistency.What's Good
batch_alter_tablefor SQLite compatibility.Required Before Merge
down_revisionwith PRs #1074 and #1145Code Review Report — PR #1147: Rename automation profile task flags to spec names
Reviewer: Automated review (3 full cycles)
Scope: All code changes in
feature/m5-automation-profile-fieldsplus close surrounding contextReferences: Issue #902,
docs/specification.md(lines 28312–28476, 35517–35636)Executive Summary
The rename of 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is mechanically correct — all field mappings, all 8 built-in profile threshold values, and the Alembic migration mapping are consistent with the specification. However, the review identified 13 issues across 7 categories, including 1 critical issue (stale fixture files that will break M6 tests) and 4 major issues that impact maintainability or test coverage.
Spec compliance: All 11 field names match the specification. All 8 built-in profile threshold values match the specification table (lines 28369–28381). Safety profile values are unchanged. The YAML schema, CLI output, and example profiles are all aligned.
Findings by Severity
CRITICAL (must fix before merge)
C1 — M6 fixture files still use ALL old field names (24 occurrences)
features/fixtures/m6/automation_profiles.jsonauto_strategize,auto_execute,auto_apply,auto_decisions_strategize,auto_decisions_execute(12 occurrences acrossmanual_thresholds,full_auto_thresholds,custom_profile.config)features/fixtures/m6/autonomy_guardrails.jsonauto_strategize,auto_execute,auto_apply(12 occurrences across 4 guard profiles:denylist_guard,allowlist_guard,cost_budget_guard,write_approval_guard)Impact: Any M6 acceptance test that loads these fixtures into an
AutomationProfilewill fail withValidationErrorbecause the model now hasextra="forbid"(line 310 ofautomation_profile.py). Even if current tests only use these as raw JSON data, they represent incorrect documentation of the schema and will break when future tests construct profile objects from them.Fix: Rename all 24 old field references to the new spec-defined names.
MAJOR (should fix before merge)
M1 — Phase-transition logic uses task-type field names without explanatory comments
The
plan_lifecycle_service.pyfile uses 4 task-type threshold fields to control phase transitions and reversions. The new field names are semantically unrelated to the behavior they control, making the code opaque to future contributors:profile.create_toolprofile.select_toolprofile.access_networkprofile.delete_contentThe old field names (
auto_execute,auto_apply,auto_reversion_from_apply,auto_strategy_revision) were self-documenting. The new names require a semantic bridge.Fix: Add a mapping comment block at the top of the auto-progression section:
M2 —
extra="forbid"with no migration path for existing YAML configsFile:
automation_profile.py:310Adding
extra="forbid"to the Pydanticmodel_configmeans any user YAML config with old field names will be hard-rejected with Pydantic's generic error: "Extra inputs are not permitted". The BDD test atautomation_profile_cli.feature:49-53confirms the rejection works, but the error message provides no guidance on what to rename fields to.Impact: Users with custom automation profile YAML files created before this change will get an opaque error. This is a breaking change with no deprecation path.
Fix (recommended): Add a
@model_validator(mode="before")that detects old field names and raises aValueErrorwith the mapping, e.g.: "Field 'auto_strategize' was renamed to 'decompose_task'. Please update your YAML config."M3 — No database migration test (upgrade or downgrade)
File:
alembic/versions/m5_001_rename_automation_profile_fields.pyThe migration renames 11 columns in
automation_profiles. There is no test anywhere infeatures/,tests/,robot/, orbenchmarks/that:Impact: A subtle migration bug (e.g., data loss during SQLite table recreation in
batch_alter_table) would go undetected.M4 — Built-in profile tests only spot-check a subset of 11 thresholds
File:
features/consolidated_automation_profile.feature6 of 8 built-in profile scenarios test fewer than half of the 11 threshold fields:
Impact: A regression in any non-asserted threshold value (e.g., wrong mapping during rename) would go undetected. For profiles with trivial patterns (all 1.0 or all 0.0), exhaustive assertions are trivial to add and would provide strong regression protection.
MINOR (nice-to-have improvements)
m1 — Stale BDD step function names use old field semantics
File:
features/steps/automation_profile_steps.py:~130-1728 step function names retain old naming:
step_check_auto_val_fix(checkscreate_file),step_check_auto_strat_rev(checksdelete_content),step_check_auto_rev_apply(checksaccess_network), etc. Functionally correct but misleading for maintenance.m2 — Stale BDD step description text
features/automation_profile_cli_coverage.featuredecompose_task,create_tool,select_toolfeatures/steps/automation_profile_cli_coverage_steps.py@thendecoratorm3 — Dead check in Robot test
File:
robot/e2e/m6_acceptance.robot:96— checks for stringsafety_profilein output, but the actual model field issafety. This is a dead check in an OR chain — it never matches.m4 — Structured log messages lack field-name context
File:
plan_lifecycle_service.py:1731-1735, 1779-1783Log messages emit
threshold=<value>without indicating which field was checked. Debugging reversion issues from logs requires code inspection. Addingthreshold_field="access_network"(etc.) to log kwargs would improve observability.m5 — No idempotency guard on migration column rename
File:
m5_001_rename_automation_profile_fields.py:42-43batch_op.alter_column(old_name, new_column_name=new_name)will error ifold_namedoesn't exist (e.g., after partial failure or on a fresh DB created withBase.metadata.create_all()). Standard Alembic practice, but a column-existence check would make it more robust.NOTE (informational / pre-existing)
n1 —
autoprofile description is imprecise (pre-existing)File:
automation_profile.py:436— Description says "Fully automatic except reversion" but bothselect_tool=1.0(Apply gate) andaccess_network=1.0(Apply reversion gate) require human approval. This was already imprecise in the old code (auto_apply=1.0+auto_reversion_from_apply=1.0); the rename makes the imprecision more visible because the new field names don't convey the apply/reversion relationship. A more accurate description: "Fully automatic except apply and reversion".n2 — CLI list table headers lost phase-transition context
File:
automation_profile.py:407-409— Column headers changed from "Strategize / Execute / Apply" to "Decompose / Create Tool / Select Tool", which no longer convey the phase-transition semantics. This is by design per the spec but impacts discoverability for users familiar with the old UI.n3 —
float(cast(float, row.xxx))is stylistically noisy (pre-existing)File:
repositories.py:4453-4463—typing.castis a no-op at runtime; thefloat()wrapping is sufficient. Not a performance issue but adds visual noise.Verification Checklist
automation_profile.schema.yaml)src/Python code__init__.pyexportsAutonomyControlleris field-name-agnostic (getattr)Summary Table
fixtures/m6/*.jsonplan_lifecycle_service.pyautomation_profile.py:310extra="forbid"with no migration pathm5_001_rename_...pyconsolidated_automation_profile.featureautomation_profile_steps.pyautomation_profile_cli_coverage.feature/.pym6_acceptance.robot:96safety_profilecheckplan_lifecycle_service.pym5_001_rename_...pyautomation_profile.py:436autoprofile description imprecise (pre-existing)automation_profile.py:407-409repositories.py:4453float(cast(float, x))patternRecommendation: Fix C1 (critical) and M1–M4 (major) before merge. Minor and Note items can be addressed as follow-up.
66294cac56def66513bfCode Review Report — PR #1147
Branch:
feature/m5-automation-profile-fieldsCommit:
def66513(CoreRasurae)Issue: #902 — rename automation profile task flags to spec names
Reviewer scope: All code changes in the branch + close surrounding code
Review cycles completed: 3 full passes across all categories
Executive Summary
The rename of all 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is mechanically correct. All field mappings match the specification table (§28369–28381), all 8 built-in profiles preserve their exact threshold values, and the codebase-wide stale reference search found zero missed renames. The Alembic migration is correct and safely reversible. No security or performance issues were found.
The findings below center on documentation/spec alignment (the primary concern), test coverage gaps, and one CLI output format regression.
Findings by Severity
HIGH — Documentation: Missing Bridge Documentation (5 findings)
These are the most impactful issues. The task-type field names (
delete_content,access_network, etc.) are now used throughout ADRs, reference docs, and the spec to describe phase-transition behaviors (Execute→Strategize reversion,Apply→Strategize reversion). No documentation anywhere explains why a task-type name controls a phase-transition behavior.automation_profiles.md. Without it, readers encounteringdelete_contentas a "reversion gate" have no path to understanding.docs/adr/ADR-006-plan-lifecycle.md:105,107delete_contentdescribed as "Execute→Strategize reversion" gate;access_networkdescribed as "Apply→Strategize reversion" gate. No semantic link between the field name and the behavior it controls.docs/adr/ADR-017-automation-profiles.md:60–66create_tool→ "transition from Strategize to Execute").docs/reference/automation_profiles.md:17–29create_file→ "Gate for automatic validation fixes",install_dependency→ "Gate for spawning child plans",modify_config→ "Gate for retrying transient failures").docs/reference/phase_reversion.md:20–21access_networkanddelete_contentas column headers for reversion gates with no explanation.Recommendation: Add a "Field Naming Convention" section to ADR-017 and
automation_profiles.mdwith a full 3-column mapping table:Field Name|Spec Task Type|Runtime Behavior Controlled. Add parenthetical notes inphase_reversion.md,error_recovery.md, and ADR-006 referencing this mapping.MEDIUM — Bugs & Spec Mismatch (5 findings)
src/.../cli/commands/automation_profile.py:148showoutput format regressed. The specification (docs/specification.md:17104–17127) defines categorized output groups ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") forautomation-profile show. The code now outputs all 11 thresholds under a single flat header"Task-Type Confidence Thresholds". The old code had categories matching the spec. This is a regression introduced by this PR.docs/specification.md:17104–17230access_networkmissing from specshowoutput examples. All 4 format variants (Rich, Plain, JSON, YAML) of theautomation-profile showoutput display only 10 of 11 threshold fields.access_networkis not shown in any category group. (Pre-existing: was also missing asauto_reversion_from_applybefore the rename.)docs/schema/automation_profile.schema.yaml:26–94delete_content) but do not describe the actual runtime behavior (strategy revision gate). A user writing a custom profile YAML would not know thataccess_network: 1.0means "always require manual approval for Apply→Strategize reversion."docs/adr/ADR-017-automation-profiles.md:58–67create_file,install_dependency,modify_config,approve_plan.features/steps/repositories_uncovered_lines_steps.py:1501–1512decompose_task=0.9,create_tool=0.8, ...,approve_plan=0.15) but only assertsdecompose_task,create_tool, andselect_toolafter roundtrip. A serialization regression for the other 8 fields (e.g.,edit_code,access_network) would pass silently.LOW — Minor Improvements (5 findings)
features/steps/automation_profile_cli_steps.py:86–93_LEGACY_THRESHOLD_YAMLonly includes 3 of 11 legacy names (auto_strategize,auto_execute,auto_apply). Since the validator uses a simple dict lookup, a typo in any of the 8 untested map entries would go undetected.features/steps/plan_lifecycle_service_coverage_r2_steps.py:291–296, 374–379access_network=0.0anddelete_content=0.0(used as rationale for selectingciin reversion tests), but the CI profile BDD test inconsolidated_automation_profile.featuredoes not assert these specific values.docs/reference/error_recovery.md:38modify_configdescribed as controlling the "retry policy" threshold. No explanation of why a config-modification task type gates transient error retry.docs/reference/plan_execute.md:298modify_configdescribed as controlling "retry limits" without bridge explanation.docs/adr/ADR-007-decision-tree-and-correction.md:118–122edit_codegates all strategize-phase decisions (strategy_choice, resource_selection, subplan_spawn, invariant enforcement), not just code editing decisions. The old nameauto_decisions_strategizeclearly conveyed the broader scope.What Was Verified Clean
src/,features/,robot/,benchmarks/,docs/,examples/(excluding intentional references in migration, legacy map, and rejection tests)batch_alter_tablecorrect for SQLite; downgrade correctly reverses; no data loss riskreject_legacy_field_namescorrectly detects old keys and produces actionable error with rename mappingextra="forbid"on model +additionalProperties: falseon schema aligned; no injection or mutation risks in validator_to_domainand_to_model/_update_modelRecommended Actions
_print_profileto match the specification, OR update the specification output examples to reflect the new flat format. One or the other must be aligned.access_networkto the spec'sshowoutput examples in all 4 format variants.automation_profiles.md. Add brief parenthetical explanations in ADR-006, phase_reversion.md, error_recovery.md, and plan_execute.md.def66513bf4dbb6ef97eCode Review Report: PR #1147 — Rename Automation Profile Task Flags
Reviewer scope: All code changes in
feature/m5-automation-profile-fieldsbranch plus close connections to surrounding code.Reference: Issue #902,
docs/specification.md(Automation Profiles section, lines 28316-28485).Method: 4 full review cycles covering bugs, logic errors, semantic mismatches, security, test coverage, test flaws, performance, migration safety, and edge cases. Each cycle examined all categories globally until no new issues were found.
Overall Assessment
The core rename of all 11 task-type threshold fields from phase-transition semantics to spec-defined task-type semantics is correctly implemented. All 8 built-in profile threshold values match the specification table exactly. The semantic mapping in
PlanLifecycleServicephase-transition gates (create_toolfor Strategize->Execute,select_toolfor Execute->Apply,access_networkfor Apply reversion,delete_contentfor Execute reversion) is correct per spec. The legacy field detection validator and migration are well-designed. Documentation, fixtures, schema, and example profiles are all properly updated with no stale old field names remaining.However, 8 issues were identified that should be addressed before merge.
Findings by Severity
BUG -- Medium-High
B1. Benchmark
_make_profile()crashes due toextra="forbid"File:
benchmarks/automation_profile_bench.py:54-56Category: Bug
Impact: All ASV benchmark runs that call
_make_profile()will fail withValidationError.The helper passes
require_sandbox,require_checkpoints,allow_unsafe_toolsas top-level kwargs toAutomationProfile(). These areSafetyProfilesub-model fields, notAutomationProfilefields. Sinceextra="forbid"was added toAutomationProfile.model_config(line 355 ofautomation_profile.py), Pydantic rejects unknown top-level keys.This affects
ProfileValidationSuite.time_profile_construction(line 65) andProfileSerializationSuite.setup(line 94).SPEC COMPLIANCE -- Medium
S1. CLI JSON/YAML output uses flat structure; spec requires nested categorized grouping
File:
src/cleveragents/cli/commands/automation_profile.py:83-118Category: Spec compliance
Impact: JSON and YAML output from
agents automation-profile show --format json|yamldoes not match the specification's documented format.The
_profile_spec_dictfunction returns a flat dict with all 11 threshold fields as top-level keys. The specification (lines 17179-17200) defines a nested JSON structure with four category groupings:Additionally, the spec's
execution_controlsgroup mergesinstall_dependencywith safety fields, while the code keepssafetyas a separate nested dict. The Rich output (_print_profile) correctly uses the four category headers, but the machine-readable formats (JSON, YAML) do not.Note: This may be a pre-existing gap if the flat format preceded the spec update, but since the spec was updated in this PR (574 lines changed in
specification.md), the code should match the new spec.ROBUSTNESS -- Medium
R1.
_get_threshold()silently degrades to always-manual on unknown operation typesFile:
src/cleveragents/application/services/autonomy_controller.py:301-321Category: Robustness / Observability
Impact: If any component passes an old operation_type string (e.g.,
"auto_execute"instead of"create_tool"), thehasattrcheck returnsFalseand the method silently returns1.0(always-manual). No error, no warning, no log entry.Suggestion: Add a
logger.warning()on the fallback path, or validateoperation_typeagainstAutomationProfile._THRESHOLD_FIELDS.R2. Migration lacks idempotency guard
File:
alembic/versions/m5_001_rename_automation_profile_fields.py:43-50Category: Migration safety
Impact:
upgrade()will fail on databases that already have the new column names (schema drift, re-run scenarios, partial migrations). Neitherupgrade()nordowngrade()checks whether source columns exist before renaming.Suggestion: Inspect
inspector.get_columns("automation_profiles")before renaming, or wrap eachalter_columnin a try/except with appropriate handling.R3. Repository accesses renamed columns without fallback guard
File:
src/cleveragents/infrastructure/database/repositories.py:4453-4463Category: Operational safety
Impact: If the database migration hasn't been applied, accessing
row.decompose_task(etc.) raisesAttributeErrorat runtime with no recovery path. The file already useshasattrguards forsafety_jsonandguards_jsoncolumns (lines 4428, 4443) -- the same pattern should be applied for the renamed threshold columns.TEST COVERAGE -- Low
T1. Legacy field rejection test covers only 3 of 11 old names
File:
features/steps/automation_profile_cli_steps.py:86-93Category: Test coverage
Impact:
_LEGACY_THRESHOLD_YAMLonly includesauto_strategize,auto_execute,auto_apply. The remaining 8 old field names are not explicitly tested for rejection. While thereject_legacy_field_namesvalidator catches any key in_LEGACY_FIELD_MAP, the test only verifies the mechanism works for 3 of the 11 legacy names.Suggestion: Either add the remaining 8 fields to the fixture, or add a parameterized test that iterates all 11 legacy names.
T2. CLI show tests don't assert category grouping headers
File:
features/automation_profile_cli.feature:100-104Category: Test coverage
Impact: The
automation-profile showtests verify field names and values appear in output but do not assert that the category headers ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") are present in Rich output. If the category grouping regresses to a flat list, tests would still pass.DESIGN -- Low (Informational)
D1.
extra="forbid"without schema_version-aware relaxationFile:
src/cleveragents/domain/models/core/automation_profile.py:355Category: Forward compatibility
Impact: The
schema_versionfield (line 85) exists for forward compatibility, but there is no conditional logic that relaxes validation based on the version. YAML configs from a newer schema version that add new fields will be rejected by older code. This is a deliberate fail-fast design choice, but worth noting for future extensibility planning.Positive Observations
PlanLifecycleServiceis semantically correct per spec._LEGACY_FIELD_MAP+model_validator(mode="before")provides excellent UX for config migration -- actionable error messages listing the exact renames needed.extra="forbid"catches typos and unknown fields early, which is a good defensive choice.Summary Table
benchmarks/automation_profile_bench.py:54-56extra="forbid"cli/commands/automation_profile.py:83-118autonomy_controller.py:301-321_get_threshold()silently degrades on unknown operation typesm5_001_rename_automation_profile_fields.py:43-50repositories.py:4453-4463automation_profile_cli_steps.py:86-93automation_profile_cli.feature:100-104automation_profile.py:355extra="forbid"without schema_version-aware relaxationRecommendation: Fix B1 (benchmark crash) and address S1 (spec compliance) before merge. R1-R3 are recommended improvements. T1, T2, D1 are informational observations for future hardening.
Review Findings: Fix Application Report
Fixes Applied (amended into feature commit)
B1 -- Benchmark
_make_profile()crash (FIXED)File:
benchmarks/automation_profile_bench.py:54-56Fix: Wrapped
require_sandbox,require_checkpoints,allow_unsafe_toolsin aSafetyProfile(...)sub-model passed via thesafety=parameter, replacing the broken top-level kwargs that are rejected byextra="forbid".T1 -- Legacy rejection test fixture (IMPROVED)
File:
features/steps/automation_profile_cli_steps.py:86-101Fix: Extended
_LEGACY_THRESHOLD_YAMLfixture to include all 11 legacy field names instead of only 3, ensuring the legacy rejection validator is exercised against the complete set.CHANGELOG updated
Added the benchmark fix to the existing #902 changelog entry.
All quality gates pass: lint (0 errors), typecheck (0 errors, 1 pre-existing warning), unit_tests (12,293 scenarios, 0 failures), integration_tests (1,699 tests, 0 failures).
Findings NOT Applied (with justification)
S1 -- CLI JSON/YAML flat vs nested structure
Decision: Out of scope for issue #902.
Justification: Issue #902 is scoped to "rename automation profile task flags to match specification field names" -- a field name change, not an output structure change. The flat JSON/YAML output format is pre-existing behavior that was not introduced by this PR. While the specification (updated in this PR) now shows a nested JSON format with
phase_transitions/decision_automation/self_repair/execution_controlsgroupings, restructuring the output would be a new feature requiring its own issue, acceptance criteria, and separate testing. The Rich output already correctly uses the four category headers. Recommendation: File a separate issue for JSON/YAML output restructuring.R1 --
_get_threshold()silent degradationDecision: Out of scope for issue #902.
Justification: The
_get_thresholdmethod inautonomy_controller.pyhad its docstring updated (the only change from this PR), but its silentgetattr()fallback to1.0is pre-existing behavior. Adding alogger.warning()or field validation would introduce new functionality beyond the rename scope. Per CONTRIBUTING.md: "One logical change per commit."R2 -- Migration idempotency guard
Decision: Out of scope for issue #902.
Justification: Standard Alembic practice relies on the version table to track migration state rather than embedding idempotency guards in individual migrations. The
batch_alter_table+alter_columnpattern used is the canonical approach for SQLite column renames. Adding introspection guards would be non-standard and beyond the rename scope.R3 -- Repository
hasattrfallback for renamed columnsDecision: Out of scope for issue #902.
Justification: The pre-existing code also accessed the old column names (
auto_strategize, etc.) withouthasattrguards. The repository assumes the database schema matches the ORM model -- this is standard SQLAlchemy practice. Adding fallback guards for unmigrated databases would be a defensive improvement beyond the rename scope.T2 -- CLI show tests missing category header assertions
Decision: Out of scope (non-blocking).
Justification: The code correctly produces the categorized Rich output ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls"). The existing tests verify the correct field names and values appear in output. Adding header assertions would be a test-strengthening improvement, not a bug fix.
D1 --
extra="forbid"without schema_version-aware relaxationDecision: Design choice, not actionable.
Justification: The fail-fast behavior of
extra="forbid"is intentional -- it catches typos and unknown fields at config load time. Future schema versions with new fields can be handled by bumping theschema_versionand adding the fields to the model, which is the standard evolution path. No spec requirement mandates version-aware relaxation.4dbb6ef97e53c8d0d86cCode Review Report — PR #1147 (Issue #902)
Refactor: Rename automation profile task flags to spec names
Branch:
feature/m5-automation-profile-fieldsCommit:
53c8d0d8by CoreRasurae (Luis Mendes)Reviewer: Automated multi-pass review (3 global cycles across all categories)
Summary
This is a large, well-executed renaming refactor touching 56 files across the domain model, services, CLI, database layer, migrations, tests, benchmarks, and documentation. All 11 task-type threshold fields have been renamed from phase-transition semantics (
auto_strategize,auto_execute, etc.) to spec-defined task-type semantics (decompose_task,create_tool, etc.). The rename is consistent across all layers.Verified correct:
PlanLifecycleServiceare semantically correct_to_domain,_from_domain,_update_row) is consistentextra="forbid"SafetyProfilesub-model (fixesextra="forbid"incompatibility)Findings by Severity
MEDIUM — Spec Deviation
F1. CLI JSON/YAML output structure does not match spec's grouped format
src/cleveragents/cli/commands/automation_profile.py—_profile_spec_dict()(line ~92)automation-profile showwith thresholds grouped into nested objects (phase_transitions,decision_automation,self_repair,execution_controls). The code produces a flat structure with all 11 threshold fields at the top level alongside metadata._profile_spec_dict()to produce the grouped structure when called fromshow, or document the deviation. The flat structure is arguably simpler for YAML round-trip, but the spec is explicit.F2. CLI Rich
showoutput places safety fields in a separate section instead of within "Execution Controls"src/cleveragents/cli/commands/automation_profile.py—_print_profile()(line ~145)require_sandbox,require_checkpoints, andallow_unsafe_toolsinside the "Execution Controls (thresholds)" section alongsideinstall_dependency. The code puts them in a separate "Safety Profile (Composed Sub-Model)" section.LOW — Spec Clarification Needed
F3. "auto" profile description diverges from spec text
src/cleveragents/domain/models/core/automation_profile.py(line ~478)description="Fully automatic except apply and reversion". The spec (line 28406) says "Fully automatic except apply". However, the code description is arguably more accurate since the "auto" profile has bothselect_tool=1.0(Apply manual) andaccess_network=1.0(reversion from Apply manual). The spec's threshold table confirms both are 1.0.F4.
OperationContext.operation_typeaccepts arbitrary strings — no validation against known threshold fieldssrc/cleveragents/domain/models/core/escalation.py(line ~99),src/cleveragents/application/services/autonomy_controller.py—_get_threshold()(line ~300)operation_typeis a free-formstrwith onlymin_length=1validation._get_threshold()usesgetattr(profile, operation_type)with a fallback to1.0. This means:"creat_tool") silently defaults to 1.0 (always escalate)"auto_execute") silently defaults to 1.0warnings.warn()in_get_threshold()when the operation_type does not match a known_THRESHOLD_FIELDSentry, to aid debugging without breaking the safe-fail behavior.LOW — Documentation / Readability
F5. Bridge comments in test steps lost semantic context after rename
features/steps/plan_lifecycle_service_coverage_r2_steps.py(lines ~290, ~373)# ci profile has auto_reversion_from_apply=0.0to# ci profile has access_network=0.0. The old comments explained the semantic role (permits auto-reversion from apply). The new comments only state the field name without explaining what it controls.# ci profile has access_network=0.0 (auto-revert from Apply permitted).INFO — Observations (No Action Required)
F6.
extra="forbid"is a breaking change for custom YAML configs with extra fieldssrc/cleveragents/domain/models/core/automation_profile.py—model_configextra="forbid"means any previously-tolerated extra fields in custom YAML configs will now cause validation errors. This is correct for strict typing but is a behavioral change. The legacy validator provides clear errors for known old names; unknown extra fields get a generic Pydantic error.docs/schema/automation_profile.schema.yaml) was also updated withadditionalProperties: false, maintaining consistency between schema and model.F7.
_get_threshold()dynamic attribute lookup is resilient to non-threshold attributesisinstance(value, (int, float))guard in_get_threshold()correctly prevents non-numeric attributes (e.g.,name,description,safety) from being returned as thresholds. No security concern.F8. Alembic migration uses
batch_alter_table— correct for SQLitebatch_alter_tableis the right pattern for SQLite (which historically lackedALTER TABLE RENAME COLUMN). The migration correctly handles both upgrade and downgrade paths.Test Coverage Assessment
create_toolSafetyProfilesub-modelNo test coverage gaps identified for the changes in scope.
Acceptance Criteria Verification (Issue #902)
automation-profile showdisplays spec names_resolve_profile_for_planunchanged, tests pass_get_thresholdcompatible viagetattrVerdict
This is a thorough, well-structured refactoring. The core rename is correct and consistent across all layers. The findings above are primarily spec-alignment refinements (CLI output grouping) and defensive-coding suggestions (operation_type validation), none of which block merge. The two medium findings (F1, F2) about CLI output structure vs spec deserve attention but may be deferred to a follow-up issue if the current flat structure is preferred for technical reasons.
53c8d0d86cbd277b9469Code Review Report — PR #1147 (Issue #902)
refactor(autonomy): rename automation profile task flags to spec names
Branch:
feature/m5-automation-profile-fields| Author: Luis Mendes (@CoreRasurae)Review scope: All code changes in the branch + close connections to surrounding code
Review cycles: 3 full passes (bugs, security, performance, test coverage, test quality, documentation)
Executive Summary
The rename of all 11 automation profile threshold fields is mechanically correct — all 88 built-in profile threshold values match the specification table exactly, the Alembic migration is SQLite-safe and reversible, the legacy field validator is comprehensive, and no stale references to old
auto_*field names remain in production code. Theextra="forbid"onAutomationProfileprovides a strong safety net against typos.The main areas of concern are: (1) documentation inconsistencies introduced or carried forward by the PR across ADRs and reference docs, (2) test coverage that only verifies 3 of 11 threshold fields in most assertion paths, and (3) a processing-state guard asymmetry between the two reversion methods.
No security issues found. YAML loading uses
safe_load, the migration uses parameterized Alembic API calls, and the model rejects unknown fields.Findings by Severity
HIGH — Documentation Inconsistencies (within PR scope)
H1. ADR-017 built-in profiles summary text is inaccurate
docs/adr/ADR-017-automation-profiles.md:98-101Three profile rows have misleading "Key Settings" summaries:
supervisedselect_tool: 1.0, others lowcreate_tool,execute_command,create_file,delete_content,access_network,install_dependency,approve_plan)trustedselect_tool,delete_content,access_network,approve_plan)autoselect_tool=1.0,access_network=1.0)These summaries predate the rename but were not corrected when the field names were updated.
H2. Default profile name inconsistent across 3 documents
docs/reference/automation_profile_service.md:19manualdocs/adr/ADR-017-automation-profiles.md:126superviseddocs/reference/automation_profiles.md:95reviewThe code truth (
_DEFAULT_PROFILE = "manual") matches only the service doc. The other two are wrong.H3. Resolution precedence in
automation_profiles.mdis wrongdocs/reference/automation_profiles.md:90-95The doc describes: Plan → Project → Organization → System default (
review).The code and spec use: Plan → Action → Project → Global default (
manual).The Action level is entirely missing and "Organization" is fabricated.
MEDIUM — Code Quality
M1.
try_auto_revert_from_executemissing processing-state guardsrc/cleveragents/application/services/plan_lifecycle_service.py:1787try_auto_revert_from_apply(line 1737) guards onplan.processing_state != ProcessingState.CONSTRAINEDbefore proceeding. The symmetrictry_auto_revert_from_executeonly checksplan.phase != PlanPhase.EXECUTE— it will attempt reversion on a plan in any processing state (QUEUED, PROCESSING, COMPLETE, ERRORED). A caller could invoke this on a plan that is happily PROCESSING in Execute and it would revert it. Consider whether a processing-state guard (e.g., requiring ERRORED or CONSTRAINED) is needed here.M2. Schema sub-objects missing
additionalProperties: falsedocs/schema/automation_profile.schema.yaml:97-130, 132-174The top-level schema correctly has
additionalProperties: false(matchingextra="forbid"on the Pydantic model), but theguardsandsafetysub-objects lack this constraint. This creates an asymmetry where the top level strictly rejects unknown fields but sub-objects silently accept them.M3.
autoprofile description omitsaccess_network=1.0gateThree locations describe the
autoprofile inaccurately:automation_profile.py:481access_network=1.0(Apply→Strategize reversion also manual)examples/profiles/auto.yaml:6access_network=1.0The spec's own prose contradicts its own table. The
auto.yamlexample is actually the most accurate of the three but diverges from the code's description string.MEDIUM — Test Coverage Gaps
M4. Escalation test coverage: only 1 of 11 operations tested
features/semantic_escalation.feature:37-51The Scenario Outline evaluates all 8 profiles but only for
operation "create_tool". The other 10 task-type operations (decompose_task,select_tool,edit_code,execute_command,create_file,delete_content,access_network,install_dependency,modify_config,approve_plan) are never tested against profile thresholds. Ifgetattr(profile, op, 1.0)fails for a renamed field, that bug is invisible.M5. Legacy validator test only checks 1 of 11 keys
features/automation_profile_cli.feature:49-53The "Add profile with legacy threshold keys fails" scenario only asserts
auto_strategizeappears in the error output. It does not verify: the error shows the correct new name (decompose_task), all 11 legacy keys are detected, or the mapping advice is present. A validator that catches only one legacy key and silently ignores the other 10 would pass this test.M6. No boundary-value tests for 10 of 11 fields
features/consolidated_automation_profile.featureOnly
decompose_taskgets full boundary treatment (0.0, 0.5, 1.0 accepted; -0.1 and 1.1 rejected). The other 10 fields get single-sided rejection tests only. For example:approve_planonly tested for< 0.0, not> 1.0select_toolonly tested for< 0.0, not> 1.0modify_configonly tested for> 1.0, not< 0.0M7. No test for mixed old + new field names
No scenario tests what happens when a config contains BOTH old and new names (e.g.,
auto_strategize: 0.5ANDdecompose_task: 0.8). The model_validator checks for legacy keys invaluesbut doesn't handle the mixed case explicitly.M8. Robot/service-level tests validate only 3 of 11 thresholds
robot/helper_automation_profile.py:35-37—_test_profile()asserts range only fordecompose_task,create_tool,select_toolfeatures/steps/automation_profile_service_steps.py— assertion steps exist only forselect_tool,decompose_task,create_toolrobot/e2e/m6_acceptance.robot:170-172— show test checks only 3 field namesIf the rename broke any of the other 8 fields, these tests would still pass.
M9. M6 fixture files only specify 3 of 11 thresholds
features/fixtures/m6/automation_profiles.json,features/fixtures/m6/autonomy_guardrails.jsonmanual_thresholdsandfull_auto_thresholdsonly listdecompose_task,create_tool,select_tool+ safety fields. The other 8 threshold fields are absent from fixture validation. Guard profiles also only set the same 3 fields.LOW — Test Quality / Isolation
L1. Silent
getattrfallback masks bugs in escalation logicfeatures/steps/semantic_escalation_steps.py:145The
default=1.0means a typo or stale field name silently falls back to 1.0 (manual). For themanualprofile this is indistinguishable from correct behavior.L2. CLI output assertion case-sensitivity inconsistency
features/steps/automation_profile_cli_steps.py:510,539,548The generic "output should contain" step is case-insensitive while JSON/YAML-specific assertions are case-sensitive. Tests verifying exact field names through the generic step are unreliable.
L3. No test that
model_dump()/serialization uses new names exclusivelyfeatures/steps/automation_profile_steps.py:508-528The dump test only verifies
nameandselect_toolkeys exist. It doesn't verify all 11 new names are present or that no old names appear.L4. Threshold summary assertion only checks 3 of 11 fields
features/steps/automation_profile_cli_coverage_steps.py:119-123Only
decompose_task=,create_tool=,select_tool=are checked. If the summary omits the other 8, the test still passes.L5. Robot guard enforcement uses overly loose OR-chain
robot/e2e/m6_acceptance.robot:96The
has_profilecheck passes if any one of 7 keywords appears anywhere in stdout+stderr. An unrelated error message containingautomation_profilewould satisfy it.L6.
_create_in_memory_profile_service()duplicated 3 timesThe same helper is copy-pasted in
automation_profile_cli_steps.py:21-38,automation_profile_cli_coverage_steps.py:37-54, andautomation_profile_cli_coverage_boost_steps.py:27-44. Maintenance risk if the service setup changes.L7. Test step function names still use old-era naming
features/steps/automation_profile_steps.py— 16 functions (e.g.,step_try_create_profile_auto_dec_stratinstead of referencingedit_code). Not a failure risk but misleading for readers.L8. Feature file step text references old semantics
features/automation_profile_cli_coverage.feature:15— Says "strategize execute and apply values" but the assertion body checksdecompose_task=,create_tool=,select_tool=.LOW — Performance / Benchmarks
L9.
from_configbenchmark uses only 3 of 11 thresholdsbenchmarks/automation_profile_bench.py:114-120The config dict only specifies
decompose_task,create_tool,select_tool. The remaining 8 fields take Pydantic defaults. This doesn't measure the cost of validating a fully-populated config (the common real-world case).L10. CLI benchmark double-calls
_patch_service()benchmarks/automation_profile_cli_bench.py:106,117_patch_service()is called in bothsetup()andtime_add_from_config(). The per-iteration call contaminates timing measurements with service construction overhead.LOW — Code Quality
L11.
validate_thresholdvalidator is redundantsrc/cleveragents/domain/models/core/automation_profile.py:210-228The
@field_validatorchecking0.0 <= v <= 1.0is redundant with thege=0.0, le=1.0constraints already on eachField(). Pydantic validates constraints before field validators run.L12. Migration does not specify
existing_typealembic/versions/m5_001_rename_automation_profile_fields.py:42-43alter_column()omitsexisting_type=sa.Float(). Safe for SQLite (batch mode infers from existing column) but could cause issues on other DB backends.INFO — Observations
I1. Specification uses task-type names for phase-transition gates by design
The spec (lines 28333-28343) explicitly maps
create_tool→ Strategize→Execute gate,select_tool→ Execute→Apply gate,access_network→ Apply→Strategize reversion,delete_content→ Execute→Strategize reversion. The bridge comments inplan_lifecycle_service.py:1505-1511correctly document this. The semantic disconnect between field names and their actual behavioral meaning is a specification design choice, not a code bug.I2. 7 of 11 thresholds are unused in
plan_lifecycle_service.pyOnly
create_tool,select_tool,access_network, anddelete_contentare read by the lifecycle service. The other 7 (decompose_task,edit_code,execute_command,create_file,install_dependency,modify_config,approve_plan) are presumably consumed by actor execution, the AutonomyController, or other services. This is fine but worth documenting.I3. No stale references to old field names in production code
A full codebase sweep confirmed all occurrences of old
auto_*names exist only in intentional locations:_LEGACY_FIELD_MAP(migration detection), Alembic_COLUMN_RENAMES(DB migration), and_LEGACY_THRESHOLD_YAML(test fixture for rejection testing). The unrelatedProjectSettings.auto_applyandPlanConfig.auto_applyare different models and not affected.I4. All 88 threshold values across 8 profiles match the spec table exactly
Verified field-by-field. The code is faithfully implementing the specification.
Summary Table
Recommended Priority
try_auto_revert_from_executeprocessing-state guard asymmetry is a potential logic bug that could cause unintended reversions.bd277b94693b62bd989f3b62bd989fd8198457cdd8198457cdf9b4986b2af9b4986b2a0cf18592b7Review: APPROVED with Comments
Comprehensive rename refactor touching 62 files — high risk but methodically executed.
Notes
extra="forbid"with legacy key detection provides a user-friendly migration path with actionable error messages.auto_reversion_from_apply->access_network) may be non-obvious — ensure the mapping is documented.0cf18592b73501f1b6f0New commits pushed, approval review dismissed automatically according to repository settings
3501f1b6f05969cf49805969cf4980fe9f8ed536fe9f8ed536577466414c577466414c23da21e97e23da21e97e007af498b8