refactor(autonomy): rename automation profile task flags to spec names #1147

Merged
CoreRasurae merged 1 commit from feature/m5-automation-profile-fields into master 2026-03-30 12:34:21 +00:00
Member

Summary

Renamed all 11 task-type confidence threshold fields in the AutomationProfile model from phase-transition semantics (auto_strategize, auto_execute, etc.) to specification-defined task-type semantics (decompose_task, create_tool, etc.). Updated all 8 built-in profiles, CLI formatting, YAML schema, services, Alembic migration, and all Behave/Robot tests.

Changes

  • 42 files modified across domain models, services, CLI, infrastructure, schema, tests, and benchmarks
  • New Alembic migration m5_001_rename_automation_profile_fields for column renames
  • All threshold values preserved (only names changed)
  • Coverage: 98% (exceeds 97% requirement)

Quality Gates

All nox stages pass: lint, typecheck, unit_tests (12,288 scenarios), integration_tests, coverage_report.

Closes #902

## Summary Renamed all 11 task-type confidence threshold fields in the `AutomationProfile` model from phase-transition semantics (auto_strategize, auto_execute, etc.) to specification-defined task-type semantics (decompose_task, create_tool, etc.). Updated all 8 built-in profiles, CLI formatting, YAML schema, services, Alembic migration, and all Behave/Robot tests. ## Changes - 42 files modified across domain models, services, CLI, infrastructure, schema, tests, and benchmarks - New Alembic migration `m5_001_rename_automation_profile_fields` for column renames - All threshold values preserved (only names changed) - Coverage: **98%** (exceeds 97% requirement) ## Quality Gates All nox stages pass: lint, typecheck, unit_tests (12,288 scenarios), integration_tests, coverage_report. Closes #902
CoreRasurae added this to the v3.5.0 milestone 2026-03-24 03:45:24 +00:00
freemo approved these changes 2026-03-24 15:27:28 +00:00
Dismissed
freemo left a comment

Review: APPROVED

Thorough mechanical rename across 42 files (+885/-840 lines, net near-zero). Every reference updated consistently: domain model, DB model + from_domain/to_domain, repository, CLI commands, YAML profiles, schema, benchmarks, BDD features+steps, Robot tests+helpers. Alembic migration uses batch_alter_table for SQLite compatibility — good practice.

Important Note

Alembic migration conflict with PR #1145. Both PRs share the same down_revision = "m4_003_plan_env_columns". Merging both will create multiple Alembic heads. Coordinate with the #1145 author to determine merge order and rebase accordingly.

Minor Notes

  1. YAML profile comments still say "Phase-transition thresholds" / "Decision-autonomy thresholds" / "Self-repair thresholds" — these are semantically wrong after the rename and should be updated to match the new task-type naming.

  2. The semantic mapping between old and new names is non-obvious (e.g., auto_reversion_from_applyaccess_network). A brief mapping table in the CHANGELOG or migration docstring would aid understanding.

## Review: APPROVED Thorough mechanical rename across 42 files (+885/-840 lines, net near-zero). Every reference updated consistently: domain model, DB model + `from_domain`/`to_domain`, repository, CLI commands, YAML profiles, schema, benchmarks, BDD features+steps, Robot tests+helpers. Alembic migration uses `batch_alter_table` for SQLite compatibility — good practice. ### Important Note **Alembic migration conflict with PR #1145.** Both PRs share the same `down_revision = "m4_003_plan_env_columns"`. Merging both will create multiple Alembic heads. Coordinate with the #1145 author to determine merge order and rebase accordingly. ### Minor Notes 1. YAML profile comments still say "Phase-transition thresholds" / "Decision-autonomy thresholds" / "Self-repair thresholds" — these are semantically wrong after the rename and should be updated to match the new task-type naming. 2. The semantic mapping between old and new names is non-obvious (e.g., `auto_reversion_from_apply` → `access_network`). A brief mapping table in the CHANGELOG or migration docstring would aid understanding.
CoreRasurae left a comment

Review Scope

  • Commit reviewed: 2d03b809 (Luis) on branch feature/m5-automation-profile-fields
  • Scope enforced: branch diff + close surrounding automation-profile code paths only
  • Sources checked: issue #902 and docs/specification.md

Iterative Review Cycles

  1. Static diff audit (domain, CLI, DB model/repository, migration, schema, examples, tests)
  2. Runtime validation-path checks (config parsing and threshold resolution behavior)
  3. Verification runs (BDD):
    • uv run --with behave==1.3.3 python -m behave features/automation_profile_cli.feature features/semantic_escalation.feature
    • uv run --with behave==1.3.3 python -m behave features/consolidated_automation_profile.feature features/consolidated_misc.feature
    • Result: all scenarios passed in both runs
  4. Final global pass across bug/security/perf/test categories: no additional findings beyond the items below

Findings (Ordered by Severity and Category)

CRITICAL — Security/Safety: Legacy keys are silently ignored, producing permissive defaults

  • AutomationProfile.from_config directly calls model_validate (src/cleveragents/domain/models/core/automation_profile.py:278) and the model config does not forbid unknown fields (src/cleveragents/domain/models/core/automation_profile.py:307).
  • After this rename, legacy keys like auto_strategize/auto_execute/auto_apply are now unknown and silently dropped; threshold fields remain at default 0.0.
  • Reproduced locally with:
    cfg = {"name": "acme/test", "auto_strategize": 1.0, "auto_execute": 1.0, "auto_apply": 1.0}
    p = AutomationProfile.from_config(cfg)
    # p.decompose_task == 0.0, p.create_tool == 0.0, p.select_tool == 0.0
    
  • Impact: a profile intended to be manual/review can become effectively full-auto in key transitions.
  • Recommended fix: fail fast on unknown keys (extra='forbid') and/or add explicit legacy-key alias/migration with warnings.

HIGH — Spec/Contract Mismatch: Current implementation diverges from docs/specification.md

  • Spec still documents the threshold contract with auto_* names (docs/specification.md:28329, docs/specification.md:28371).
  • This branch switched runtime/CLI/schema/examples to task-type names (src/cleveragents/domain/models/core/automation_profile.py:74, src/cleveragents/cli/commands/automation_profile.py:95, docs/schema/automation_profile.schema.yaml:25).
  • Impact: a user following docs/specification.md will provide keys that runtime now ignores (see critical finding), causing unsafe behavior and acceptance-criteria ambiguity.
  • Recommended fix: update docs/specification.md in lockstep with this PR, or keep backward-compatible aliases until spec docs are migrated.

MEDIUM — Test Flaw/Coverage Gap: No regression guard for unknown/legacy threshold keys

  • Current test updates focus on new keys (e.g., valid YAML in features/steps/automation_profile_cli_steps.py:52) but do not assert rejection/migration of legacy auto_* keys.
  • Impact: the critical misconfiguration path is untested and can regress silently.
  • Recommended fix: add BDD scenarios ensuring automation-profile add either rejects legacy/unknown keys explicitly or validates a controlled compatibility path.

Review Outcome

I found no additional issues after repeated global review cycles; the three findings above are the complete set for this branch scope.

## Review Scope - Commit reviewed: `2d03b809` (Luis) on branch `feature/m5-automation-profile-fields` - Scope enforced: branch diff + close surrounding automation-profile code paths only - Sources checked: issue #902 and `docs/specification.md` ## Iterative Review Cycles 1. Static diff audit (domain, CLI, DB model/repository, migration, schema, examples, tests) 2. Runtime validation-path checks (config parsing and threshold resolution behavior) 3. Verification runs (BDD): - `uv run --with behave==1.3.3 python -m behave features/automation_profile_cli.feature features/semantic_escalation.feature` - `uv run --with behave==1.3.3 python -m behave features/consolidated_automation_profile.feature features/consolidated_misc.feature` - Result: all scenarios passed in both runs 4. Final global pass across bug/security/perf/test categories: no additional findings beyond the items below ## Findings (Ordered by Severity and Category) ### CRITICAL — Security/Safety: Legacy keys are silently ignored, producing permissive defaults - `AutomationProfile.from_config` directly calls `model_validate` (`src/cleveragents/domain/models/core/automation_profile.py:278`) and the model config does **not** forbid unknown fields (`src/cleveragents/domain/models/core/automation_profile.py:307`). - After this rename, legacy keys like `auto_strategize/auto_execute/auto_apply` are now unknown and silently dropped; threshold fields remain at default `0.0`. - Reproduced locally with: ```python cfg = {"name": "acme/test", "auto_strategize": 1.0, "auto_execute": 1.0, "auto_apply": 1.0} p = AutomationProfile.from_config(cfg) # p.decompose_task == 0.0, p.create_tool == 0.0, p.select_tool == 0.0 ``` - **Impact:** a profile intended to be manual/review can become effectively full-auto in key transitions. - **Recommended fix:** fail fast on unknown keys (`extra='forbid'`) and/or add explicit legacy-key alias/migration with warnings. ### HIGH — Spec/Contract Mismatch: Current implementation diverges from `docs/specification.md` - Spec still documents the threshold contract with `auto_*` names (`docs/specification.md:28329`, `docs/specification.md:28371`). - This branch switched runtime/CLI/schema/examples to task-type names (`src/cleveragents/domain/models/core/automation_profile.py:74`, `src/cleveragents/cli/commands/automation_profile.py:95`, `docs/schema/automation_profile.schema.yaml:25`). - **Impact:** a user following `docs/specification.md` will provide keys that runtime now ignores (see critical finding), causing unsafe behavior and acceptance-criteria ambiguity. - **Recommended fix:** update `docs/specification.md` in lockstep with this PR, or keep backward-compatible aliases until spec docs are migrated. ### MEDIUM — Test Flaw/Coverage Gap: No regression guard for unknown/legacy threshold keys - Current test updates focus on new keys (e.g., valid YAML in `features/steps/automation_profile_cli_steps.py:52`) but do not assert rejection/migration of legacy `auto_*` keys. - **Impact:** the critical misconfiguration path is untested and can regress silently. - **Recommended fix:** add BDD scenarios ensuring `automation-profile add` either rejects legacy/unknown keys explicitly or validates a controlled compatibility path. ## Review Outcome I found no additional issues after repeated global review cycles; the three findings above are the complete set for this branch scope.
CoreRasurae left a comment

Review Scope

  • Local commit reviewed: 8fe537c (Luis / CoreRasurae) on branch feature/m5-automation-profile-fields
  • Scope enforced: branch diff + closely connected automation-profile code paths only
  • References checked: issue #902 and docs/specification.md
  • Note: PR #1147 currently points to commit 2d03b809 (the reviewed local commit is not yet the PR head)

Iterative Global Cycles Performed

  1. Static diff audit (domain model, CLI, DB model/repository, migration, schema, tests/docs)
  2. Runtime validation checks (model parsing vs schema behavior, migration behavior)
  3. Security/performance/test-coverage pass
  4. Final global re-pass: no additional issues found

Findings (Ordered by Severity, then Category)

MEDIUM — Contract Correctness

Schema now rejects valid guards profile configs

  • docs/schema/automation_profile.schema.yaml:8 sets additionalProperties: false, but the schema does not define a top-level guards property.
  • Runtime still supports guards via AutomationProfile.guards (src/cleveragents/domain/models/core/automation_profile.py:156).
  • Reproduction (local): runtime accepts config with guards, but JSON Schema validation fails with: Additional properties are not allowed ('guards' was unexpected).

Impact

  • Tooling/docs consumers that validate against the published schema will reject configs that the runtime accepts.
  • This creates contract drift and can break profile onboarding pipelines.

Recommended fix

  • Add a guards object definition to docs/schema/automation_profile.schema.yaml (preferred), or relax top-level additionalProperties until schema/runtime parity is restored.

LOW — Test Coverage Gap

No regression guard for schema/runtime parity on guards

  • Coverage added for rejecting legacy threshold keys, but no scenario verifies that guarded profile configs remain valid under schema validation.

Impact

  • Schema/runtime drift like the issue above can pass CI unnoticed.

Recommended fix

  • Add one schema parity test (or BDD scenario) that validates a profile containing guards against the published schema and runtime parser.

LOW — Migration Documentation Consistency

Migration header text and actual revision chain differ

  • alembic/versions/m5_001_rename_automation_profile_fields.py:8 says Revises: a6_001_automation_profiles, while metadata uses down_revision = "m4_003_plan_env_columns" (alembic/versions/m5_001_rename_automation_profile_fields.py:19).

Impact

  • Functional behavior is fine, but this can mislead future migration audits.

Recommended fix

  • Align the human-readable header text with the actual down_revision metadata.

Outcome

After repeated full-category review cycles, these are the complete findings for the requested scope.

## Review Scope - Local commit reviewed: `8fe537c` (Luis / CoreRasurae) on branch `feature/m5-automation-profile-fields` - Scope enforced: branch diff + closely connected automation-profile code paths only - References checked: issue #902 and `docs/specification.md` - Note: PR #1147 currently points to commit `2d03b809` (the reviewed local commit is not yet the PR head) ## Iterative Global Cycles Performed 1. Static diff audit (domain model, CLI, DB model/repository, migration, schema, tests/docs) 2. Runtime validation checks (model parsing vs schema behavior, migration behavior) 3. Security/performance/test-coverage pass 4. Final global re-pass: no additional issues found ## Findings (Ordered by Severity, then Category) ### MEDIUM — Contract Correctness **Schema now rejects valid `guards` profile configs** - `docs/schema/automation_profile.schema.yaml:8` sets `additionalProperties: false`, but the schema does not define a top-level `guards` property. - Runtime still supports `guards` via `AutomationProfile.guards` (`src/cleveragents/domain/models/core/automation_profile.py:156`). - Reproduction (local): runtime accepts config with `guards`, but JSON Schema validation fails with: `Additional properties are not allowed ('guards' was unexpected)`. **Impact** - Tooling/docs consumers that validate against the published schema will reject configs that the runtime accepts. - This creates contract drift and can break profile onboarding pipelines. **Recommended fix** - Add a `guards` object definition to `docs/schema/automation_profile.schema.yaml` (preferred), or relax top-level `additionalProperties` until schema/runtime parity is restored. ### LOW — Test Coverage Gap **No regression guard for schema/runtime parity on `guards`** - Coverage added for rejecting legacy threshold keys, but no scenario verifies that guarded profile configs remain valid under schema validation. **Impact** - Schema/runtime drift like the issue above can pass CI unnoticed. **Recommended fix** - Add one schema parity test (or BDD scenario) that validates a profile containing `guards` against the published schema and runtime parser. ### LOW — Migration Documentation Consistency **Migration header text and actual revision chain differ** - `alembic/versions/m5_001_rename_automation_profile_fields.py:8` says `Revises: a6_001_automation_profiles`, while metadata uses `down_revision = "m4_003_plan_env_columns"` (`alembic/versions/m5_001_rename_automation_profile_fields.py:19`). **Impact** - Functional behavior is fine, but this can mislead future migration audits. **Recommended fix** - Align the human-readable header text with the actual `down_revision` metadata. ## Outcome After repeated full-category review cycles, these are the complete findings for the requested scope.
CoreRasurae force-pushed feature/m5-automation-profile-fields from 2d03b8098d
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 17s
CI / build (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 4m9s
CI / typecheck (pull_request) Successful in 4m21s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 4m31s
CI / integration_tests (pull_request) Successful in 5m57s
CI / unit_tests (pull_request) Successful in 7m26s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 1s
CI / e2e_tests (pull_request) Successful in 7m47s
to 66294cac56
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 47s
CI / typecheck (pull_request) Successful in 4m12s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 4m10s
CI / build (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 3m57s
CI / integration_tests (pull_request) Successful in 6m45s
CI / unit_tests (pull_request) Successful in 7m18s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 2s
CI / e2e_tests (pull_request) Successful in 9m49s
2026-03-27 15:39:57 +00:00
Compare
CoreRasurae dismissed freemo's review 2026-03-27 15:39:57 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

freemo requested changes 2026-03-27 17:08:26 +00:00
Dismissed
freemo left a comment

Review: refactor(autonomy): rename automation profile task flags to spec names

Issues Requiring Resolution

1. Semantic mismatch between old and new field names (High)
The rename mapping maps old phase-transition concepts to completely unrelated task-type concepts. For example:

  • auto_reversion_from_apply (reverting applied changes) → access_network (network access permission)
  • auto_child_plans (child plan spawning) → install_dependency (dependency installation)

The threshold values in the YAML profiles were tuned for the old semantics but now have names suggesting different operations. For example, the auto profile had auto_reversion_from_apply: 1.0 (always require human for reversion), now it's access_network: 1.0 (always require human for network access). These are conceptually different things. This needs clarification — is this an intentional semantic redesign or an error in the mapping?

2. No backward-compatible aliases — breaking change (High)
If any external tooling, stored YAML files, or database records reference the old field names (auto_strategize, etc.), they will break silently. The Pydantic model should use Field(alias="auto_strategize") or model_config = ConfigDict(populate_by_name=True) with aliases for a graceful transition period.

3. Alembic migration conflict — merge blocker (High)
down_revision = "m4_003_plan_env_columns" — this is the same parent as PRs #1074 and #1145. Three PRs branching from the same migration parent will cause an Alembic "multiple heads" error. These must be coordinated: first PR merged keeps its down_revision, subsequent PRs must be rebased.

4. Stale YAML profile comments (Medium)
Profile YAML comments still reference "Phase-transition thresholds" / "Decision-autonomy thresholds" groupings, but the new field names no longer map to phases. Comments should be updated to reflect the new task-type semantics.

5. Branch naming inconsistency (Low)
Branch feature/m5-automation-profile-fields uses feature/ prefix but the commit type is refactor. Should be refactor/m5-automation-profile-fields for consistency.

What's Good

  • Complete rename across all layers (domain, ORM, repository, CLI, YAML, schema, BDD, Robot, benchmarks).
  • Alembic migration uses batch_alter_table for SQLite compatibility.
  • All existing tests updated — no broken references.

Required Before Merge

  1. Clarify the semantic mapping rationale (are these intentionally different concepts or was the mapping table wrong?)
  2. Add backward-compatible Pydantic aliases for stored data
  3. Coordinate Alembic down_revision with PRs #1074 and #1145
  4. Update YAML profile comments to match new semantics
## Review: refactor(autonomy): rename automation profile task flags to spec names ### Issues Requiring Resolution **1. Semantic mismatch between old and new field names (High)** The rename mapping maps old phase-transition concepts to completely unrelated task-type concepts. For example: - `auto_reversion_from_apply` (reverting applied changes) → `access_network` (network access permission) - `auto_child_plans` (child plan spawning) → `install_dependency` (dependency installation) The threshold *values* in the YAML profiles were tuned for the old semantics but now have names suggesting different operations. For example, the `auto` profile had `auto_reversion_from_apply: 1.0` (always require human for reversion), now it's `access_network: 1.0` (always require human for network access). These are conceptually different things. This needs clarification — is this an intentional semantic redesign or an error in the mapping? **2. No backward-compatible aliases — breaking change (High)** If any external tooling, stored YAML files, or database records reference the old field names (`auto_strategize`, etc.), they will break silently. The Pydantic model should use `Field(alias="auto_strategize")` or `model_config = ConfigDict(populate_by_name=True)` with aliases for a graceful transition period. **3. Alembic migration conflict — merge blocker (High)** `down_revision = "m4_003_plan_env_columns"` — this is the same parent as PRs #1074 and #1145. Three PRs branching from the same migration parent will cause an Alembic "multiple heads" error. These must be coordinated: first PR merged keeps its `down_revision`, subsequent PRs must be rebased. **4. Stale YAML profile comments (Medium)** Profile YAML comments still reference "Phase-transition thresholds" / "Decision-autonomy thresholds" groupings, but the new field names no longer map to phases. Comments should be updated to reflect the new task-type semantics. **5. Branch naming inconsistency (Low)** Branch `feature/m5-automation-profile-fields` uses `feature/` prefix but the commit type is `refactor`. Should be `refactor/m5-automation-profile-fields` for consistency. ### What's Good - Complete rename across all layers (domain, ORM, repository, CLI, YAML, schema, BDD, Robot, benchmarks). - Alembic migration uses `batch_alter_table` for SQLite compatibility. - All existing tests updated — no broken references. ### Required Before Merge 1. Clarify the semantic mapping rationale (are these intentionally different concepts or was the mapping table wrong?) 2. Add backward-compatible Pydantic aliases for stored data 3. Coordinate Alembic `down_revision` with PRs #1074 and #1145 4. Update YAML profile comments to match new semantics
Author
Member

Code Review Report — PR #1147: Rename automation profile task flags to spec names

Reviewer: Automated review (3 full cycles)
Scope: All code changes in feature/m5-automation-profile-fields plus close surrounding context
References: Issue #902, docs/specification.md (lines 28312–28476, 35517–35636)


Executive Summary

The rename of 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is mechanically correct — all field mappings, all 8 built-in profile threshold values, and the Alembic migration mapping are consistent with the specification. However, the review identified 13 issues across 7 categories, including 1 critical issue (stale fixture files that will break M6 tests) and 4 major issues that impact maintainability or test coverage.

Spec compliance: All 11 field names match the specification. All 8 built-in profile threshold values match the specification table (lines 28369–28381). Safety profile values are unchanged. The YAML schema, CLI output, and example profiles are all aligned.


Findings by Severity

CRITICAL (must fix before merge)

C1 — M6 fixture files still use ALL old field names (24 occurrences)

File Lines Old fields present
features/fixtures/m6/automation_profiles.json 8–40 auto_strategize, auto_execute, auto_apply, auto_decisions_strategize, auto_decisions_execute (12 occurrences across manual_thresholds, full_auto_thresholds, custom_profile.config)
features/fixtures/m6/autonomy_guardrails.json 9–70 auto_strategize, auto_execute, auto_apply (12 occurrences across 4 guard profiles: denylist_guard, allowlist_guard, cost_budget_guard, write_approval_guard)

Impact: Any M6 acceptance test that loads these fixtures into an AutomationProfile will fail with ValidationError because the model now has extra="forbid" (line 310 of automation_profile.py). Even if current tests only use these as raw JSON data, they represent incorrect documentation of the schema and will break when future tests construct profile objects from them.

Fix: Rename all 24 old field references to the new spec-defined names.


MAJOR (should fix before merge)

M1 — Phase-transition logic uses task-type field names without explanatory comments

The plan_lifecycle_service.py file uses 4 task-type threshold fields to control phase transitions and reversions. The new field names are semantically unrelated to the behavior they control, making the code opaque to future contributors:

Location Field used Behavior controlled Confusion
Lines 1506, 1520 profile.create_tool Strategize → Execute transition "create tool" doesn't suggest phase progression
Lines 1507, 1528 profile.select_tool Execute → Apply transition "select tool" doesn't suggest phase progression
Lines 1713, 1731, 1735 profile.access_network Auto-revert from constrained Apply "access network" doesn't suggest phase reversion
Lines 1779, 1783 profile.delete_content Execute → Strategize reversion "delete content" doesn't suggest strategy revision

The old field names (auto_execute, auto_apply, auto_reversion_from_apply, auto_strategy_revision) were self-documenting. The new names require a semantic bridge.

Fix: Add a mapping comment block at the top of the auto-progression section:

# Task-type threshold fields used as phase-transition gates (per spec):
#   create_tool    → Strategize → Execute  (was auto_execute)
#   select_tool    → Execute → Apply       (was auto_apply)
#   access_network → auto-revert from Apply (was auto_reversion_from_apply)
#   delete_content → auto-revise strategy   (was auto_strategy_revision)

M2 — extra="forbid" with no migration path for existing YAML configs

File: automation_profile.py:310

Adding extra="forbid" to the Pydantic model_config means any user YAML config with old field names will be hard-rejected with Pydantic's generic error: "Extra inputs are not permitted". The BDD test at automation_profile_cli.feature:49-53 confirms the rejection works, but the error message provides no guidance on what to rename fields to.

Impact: Users with custom automation profile YAML files created before this change will get an opaque error. This is a breaking change with no deprecation path.

Fix (recommended): Add a @model_validator(mode="before") that detects old field names and raises a ValueError with the mapping, e.g.: "Field 'auto_strategize' was renamed to 'decompose_task'. Please update your YAML config."

M3 — No database migration test (upgrade or downgrade)

File: alembic/versions/m5_001_rename_automation_profile_fields.py

The migration renames 11 columns in automation_profiles. There is no test anywhere in features/, tests/, robot/, or benchmarks/ that:

  • Runs the upgrade against a database with old-named columns and verifies data survives
  • Runs the downgrade and verifies column names are restored
  • Verifies the Alembic revision chain is intact

Impact: A subtle migration bug (e.g., data loss during SQLite table recreation in batch_alter_table) would go undetected.

M4 — Built-in profile tests only spot-check a subset of 11 thresholds

File: features/consolidated_automation_profile.feature

6 of 8 built-in profile scenarios test fewer than half of the 11 threshold fields:

Profile Fields asserted Fields NOT asserted
manual 3 + safety 8 thresholds unchecked
review 4 7 thresholds unchecked
ci 2 + safety 9 thresholds unchecked
full-auto 2 + safety 9 thresholds unchecked

Impact: A regression in any non-asserted threshold value (e.g., wrong mapping during rename) would go undetected. For profiles with trivial patterns (all 1.0 or all 0.0), exhaustive assertions are trivial to add and would provide strong regression protection.


MINOR (nice-to-have improvements)

m1 — Stale BDD step function names use old field semantics

File: features/steps/automation_profile_steps.py:~130-172

8 step function names retain old naming: step_check_auto_val_fix (checks create_file), step_check_auto_strat_rev (checks delete_content), step_check_auto_rev_apply (checks access_network), etc. Functionally correct but misleading for maintenance.

m2 — Stale BDD step description text

File Line Issue
features/automation_profile_cli_coverage.feature 15 Says "strategize execute and apply values" but asserts decompose_task, create_tool, select_tool
features/steps/automation_profile_cli_coverage_steps.py 157 Same stale description in @then decorator

m3 — Dead check in Robot test

File: robot/e2e/m6_acceptance.robot:96 — checks for string safety_profile in output, but the actual model field is safety. This is a dead check in an OR chain — it never matches.

m4 — Structured log messages lack field-name context

File: plan_lifecycle_service.py:1731-1735, 1779-1783

Log messages emit threshold=<value> without indicating which field was checked. Debugging reversion issues from logs requires code inspection. Adding threshold_field="access_network" (etc.) to log kwargs would improve observability.

m5 — No idempotency guard on migration column rename

File: m5_001_rename_automation_profile_fields.py:42-43

batch_op.alter_column(old_name, new_column_name=new_name) will error if old_name doesn't exist (e.g., after partial failure or on a fresh DB created with Base.metadata.create_all()). Standard Alembic practice, but a column-existence check would make it more robust.


NOTE (informational / pre-existing)

n1 — auto profile description is imprecise (pre-existing)

File: automation_profile.py:436 — Description says "Fully automatic except reversion" but both select_tool=1.0 (Apply gate) and access_network=1.0 (Apply reversion gate) require human approval. This was already imprecise in the old code (auto_apply=1.0 + auto_reversion_from_apply=1.0); the rename makes the imprecision more visible because the new field names don't convey the apply/reversion relationship. A more accurate description: "Fully automatic except apply and reversion".

n2 — CLI list table headers lost phase-transition context

File: automation_profile.py:407-409 — Column headers changed from "Strategize / Execute / Apply" to "Decompose / Create Tool / Select Tool", which no longer convey the phase-transition semantics. This is by design per the spec but impacts discoverability for users familiar with the old UI.

n3 — float(cast(float, row.xxx)) is stylistically noisy (pre-existing)

File: repositories.py:4453-4463typing.cast is a no-op at runtime; the float() wrapping is sufficient. Not a performance issue but adds visual noise.


Verification Checklist

Check Result
All 11 field names match specification PASS
All 8 built-in profile values match spec table PASS
YAML schema updated (automation_profile.schema.yaml) PASS
CLI output uses new field names PASS
Example YAML files match spec values PASS (all 9 files verified)
Alembic migration mapping is correct PASS
No old field names in src/ Python code PASS
No old field names in __init__.py exports PASS
No SQL injection risk in migration PASS
Safety profile values unchanged PASS
AutonomyController is field-name-agnostic (getattr) PASS

Summary Table

ID Category Severity File(s) Description
C1 Stale Data Critical fixtures/m6/*.json 24 old field names in M6 fixtures
M1 Maintainability Major plan_lifecycle_service.py 4 phase-transition usages lack semantic comments
M2 Backward Compat Major automation_profile.py:310 extra="forbid" with no migration path
M3 Test Coverage Major m5_001_rename_...py No migration upgrade/downgrade test
M4 Test Coverage Major consolidated_automation_profile.feature 6/8 profiles test less than 50% of thresholds
m1 Code Quality Minor automation_profile_steps.py 8 stale step function names
m2 Code Quality Minor automation_profile_cli_coverage.feature/.py Stale step description text
m3 Code Quality Minor m6_acceptance.robot:96 Dead safety_profile check
m4 Observability Minor plan_lifecycle_service.py Log msgs lack field-name context
m5 Migration Safety Minor m5_001_rename_...py No idempotency guard
n1 Documentation Note automation_profile.py:436 auto profile description imprecise (pre-existing)
n2 UX Note automation_profile.py:407-409 CLI headers lost phase-transition context
n3 Style Note repositories.py:4453 Redundant float(cast(float, x)) pattern

Recommendation: Fix C1 (critical) and M1–M4 (major) before merge. Minor and Note items can be addressed as follow-up.

# Code Review Report — PR #1147: Rename automation profile task flags to spec names **Reviewer:** Automated review (3 full cycles) **Scope:** All code changes in `feature/m5-automation-profile-fields` plus close surrounding context **References:** Issue #902, `docs/specification.md` (lines 28312–28476, 35517–35636) --- ## Executive Summary The rename of 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is **mechanically correct** — all field mappings, all 8 built-in profile threshold values, and the Alembic migration mapping are consistent with the specification. However, the review identified **13 issues** across 7 categories, including **1 critical** issue (stale fixture files that will break M6 tests) and **4 major** issues that impact maintainability or test coverage. **Spec compliance:** All 11 field names match the specification. All 8 built-in profile threshold values match the specification table (lines 28369–28381). Safety profile values are unchanged. The YAML schema, CLI output, and example profiles are all aligned. --- ## Findings by Severity ### CRITICAL (must fix before merge) #### C1 — M6 fixture files still use ALL old field names (24 occurrences) | File | Lines | Old fields present | |------|-------|--------------------| | `features/fixtures/m6/automation_profiles.json` | 8–40 | `auto_strategize`, `auto_execute`, `auto_apply`, `auto_decisions_strategize`, `auto_decisions_execute` (12 occurrences across `manual_thresholds`, `full_auto_thresholds`, `custom_profile.config`) | | `features/fixtures/m6/autonomy_guardrails.json` | 9–70 | `auto_strategize`, `auto_execute`, `auto_apply` (12 occurrences across 4 guard profiles: `denylist_guard`, `allowlist_guard`, `cost_budget_guard`, `write_approval_guard`) | **Impact:** Any M6 acceptance test that loads these fixtures into an `AutomationProfile` will fail with `ValidationError` because the model now has `extra="forbid"` (line 310 of `automation_profile.py`). Even if current tests only use these as raw JSON data, they represent incorrect documentation of the schema and will break when future tests construct profile objects from them. **Fix:** Rename all 24 old field references to the new spec-defined names. --- ### MAJOR (should fix before merge) #### M1 — Phase-transition logic uses task-type field names without explanatory comments The `plan_lifecycle_service.py` file uses 4 task-type threshold fields to control phase transitions and reversions. The new field names are semantically unrelated to the behavior they control, making the code opaque to future contributors: | Location | Field used | Behavior controlled | Confusion | |----------|-----------|-------------------|-----------| | Lines 1506, 1520 | `profile.create_tool` | Strategize → Execute transition | "create tool" doesn't suggest phase progression | | Lines 1507, 1528 | `profile.select_tool` | Execute → Apply transition | "select tool" doesn't suggest phase progression | | Lines 1713, 1731, 1735 | `profile.access_network` | Auto-revert from constrained Apply | "access network" doesn't suggest phase reversion | | Lines 1779, 1783 | `profile.delete_content` | Execute → Strategize reversion | "delete content" doesn't suggest strategy revision | The old field names (`auto_execute`, `auto_apply`, `auto_reversion_from_apply`, `auto_strategy_revision`) were self-documenting. The new names require a semantic bridge. **Fix:** Add a mapping comment block at the top of the auto-progression section: ```python # Task-type threshold fields used as phase-transition gates (per spec): # create_tool → Strategize → Execute (was auto_execute) # select_tool → Execute → Apply (was auto_apply) # access_network → auto-revert from Apply (was auto_reversion_from_apply) # delete_content → auto-revise strategy (was auto_strategy_revision) ``` #### M2 — `extra="forbid"` with no migration path for existing YAML configs **File:** `automation_profile.py:310` Adding `extra="forbid"` to the Pydantic `model_config` means any user YAML config with old field names will be hard-rejected with Pydantic's generic error: *"Extra inputs are not permitted"*. The BDD test at `automation_profile_cli.feature:49-53` confirms the rejection works, but the error message provides no guidance on what to rename fields to. **Impact:** Users with custom automation profile YAML files created before this change will get an opaque error. This is a breaking change with no deprecation path. **Fix (recommended):** Add a `@model_validator(mode="before")` that detects old field names and raises a `ValueError` with the mapping, e.g.: *"Field 'auto_strategize' was renamed to 'decompose_task'. Please update your YAML config."* #### M3 — No database migration test (upgrade or downgrade) **File:** `alembic/versions/m5_001_rename_automation_profile_fields.py` The migration renames 11 columns in `automation_profiles`. There is no test anywhere in `features/`, `tests/`, `robot/`, or `benchmarks/` that: - Runs the upgrade against a database with old-named columns and verifies data survives - Runs the downgrade and verifies column names are restored - Verifies the Alembic revision chain is intact **Impact:** A subtle migration bug (e.g., data loss during SQLite table recreation in `batch_alter_table`) would go undetected. #### M4 — Built-in profile tests only spot-check a subset of 11 thresholds **File:** `features/consolidated_automation_profile.feature` 6 of 8 built-in profile scenarios test fewer than half of the 11 threshold fields: | Profile | Fields asserted | Fields NOT asserted | |---------|----------------|-------------------| | manual | 3 + safety | 8 thresholds unchecked | | review | 4 | 7 thresholds unchecked | | ci | 2 + safety | 9 thresholds unchecked | | full-auto | 2 + safety | 9 thresholds unchecked | **Impact:** A regression in any non-asserted threshold value (e.g., wrong mapping during rename) would go undetected. For profiles with trivial patterns (all 1.0 or all 0.0), exhaustive assertions are trivial to add and would provide strong regression protection. --- ### MINOR (nice-to-have improvements) #### m1 — Stale BDD step function names use old field semantics **File:** `features/steps/automation_profile_steps.py:~130-172` 8 step function names retain old naming: `step_check_auto_val_fix` (checks `create_file`), `step_check_auto_strat_rev` (checks `delete_content`), `step_check_auto_rev_apply` (checks `access_network`), etc. Functionally correct but misleading for maintenance. #### m2 — Stale BDD step description text | File | Line | Issue | |------|------|-------| | `features/automation_profile_cli_coverage.feature` | 15 | Says "strategize execute and apply values" but asserts `decompose_task`, `create_tool`, `select_tool` | | `features/steps/automation_profile_cli_coverage_steps.py` | 157 | Same stale description in `@then` decorator | #### m3 — Dead check in Robot test **File:** `robot/e2e/m6_acceptance.robot:96` — checks for string `safety_profile` in output, but the actual model field is `safety`. This is a dead check in an OR chain — it never matches. #### m4 — Structured log messages lack field-name context **File:** `plan_lifecycle_service.py:1731-1735, 1779-1783` Log messages emit `threshold=<value>` without indicating which field was checked. Debugging reversion issues from logs requires code inspection. Adding `threshold_field="access_network"` (etc.) to log kwargs would improve observability. #### m5 — No idempotency guard on migration column rename **File:** `m5_001_rename_automation_profile_fields.py:42-43` `batch_op.alter_column(old_name, new_column_name=new_name)` will error if `old_name` doesn't exist (e.g., after partial failure or on a fresh DB created with `Base.metadata.create_all()`). Standard Alembic practice, but a column-existence check would make it more robust. --- ### NOTE (informational / pre-existing) #### n1 — `auto` profile description is imprecise (pre-existing) **File:** `automation_profile.py:436` — Description says *"Fully automatic except reversion"* but both `select_tool=1.0` (Apply gate) and `access_network=1.0` (Apply reversion gate) require human approval. This was already imprecise in the old code (`auto_apply=1.0` + `auto_reversion_from_apply=1.0`); the rename makes the imprecision more visible because the new field names don't convey the apply/reversion relationship. A more accurate description: *"Fully automatic except apply and reversion"*. #### n2 — CLI list table headers lost phase-transition context **File:** `automation_profile.py:407-409` — Column headers changed from "Strategize / Execute / Apply" to "Decompose / Create Tool / Select Tool", which no longer convey the phase-transition semantics. This is by design per the spec but impacts discoverability for users familiar with the old UI. #### n3 — `float(cast(float, row.xxx))` is stylistically noisy (pre-existing) **File:** `repositories.py:4453-4463` — `typing.cast` is a no-op at runtime; the `float()` wrapping is sufficient. Not a performance issue but adds visual noise. --- ## Verification Checklist | Check | Result | |-------|--------| | All 11 field names match specification | ✅ PASS | | All 8 built-in profile values match spec table | ✅ PASS | | YAML schema updated (`automation_profile.schema.yaml`) | ✅ PASS | | CLI output uses new field names | ✅ PASS | | Example YAML files match spec values | ✅ PASS (all 9 files verified) | | Alembic migration mapping is correct | ✅ PASS | | No old field names in `src/` Python code | ✅ PASS | | No old field names in `__init__.py` exports | ✅ PASS | | No SQL injection risk in migration | ✅ PASS | | Safety profile values unchanged | ✅ PASS | | `AutonomyController` is field-name-agnostic (`getattr`) | ✅ PASS | --- ## Summary Table | ID | Category | Severity | File(s) | Description | |----|----------|----------|---------|-------------| | **C1** | Stale Data | **Critical** | `fixtures/m6/*.json` | 24 old field names in M6 fixtures | | **M1** | Maintainability | **Major** | `plan_lifecycle_service.py` | 4 phase-transition usages lack semantic comments | | **M2** | Backward Compat | **Major** | `automation_profile.py:310` | `extra="forbid"` with no migration path | | **M3** | Test Coverage | **Major** | `m5_001_rename_...py` | No migration upgrade/downgrade test | | **M4** | Test Coverage | **Major** | `consolidated_automation_profile.feature` | 6/8 profiles test less than 50% of thresholds | | m1 | Code Quality | Minor | `automation_profile_steps.py` | 8 stale step function names | | m2 | Code Quality | Minor | `automation_profile_cli_coverage.feature/.py` | Stale step description text | | m3 | Code Quality | Minor | `m6_acceptance.robot:96` | Dead `safety_profile` check | | m4 | Observability | Minor | `plan_lifecycle_service.py` | Log msgs lack field-name context | | m5 | Migration Safety | Minor | `m5_001_rename_...py` | No idempotency guard | | n1 | Documentation | Note | `automation_profile.py:436` | `auto` profile description imprecise (pre-existing) | | n2 | UX | Note | `automation_profile.py:407-409` | CLI headers lost phase-transition context | | n3 | Style | Note | `repositories.py:4453` | Redundant `float(cast(float, x))` pattern | **Recommendation:** Fix **C1** (critical) and **M1–M4** (major) before merge. Minor and Note items can be addressed as follow-up.
CoreRasurae force-pushed feature/m5-automation-profile-fields from 66294cac56
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Failing after 47s
CI / typecheck (pull_request) Successful in 4m12s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 4m10s
CI / build (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 3m57s
CI / integration_tests (pull_request) Successful in 6m45s
CI / unit_tests (pull_request) Successful in 7m18s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 2s
CI / e2e_tests (pull_request) Successful in 9m49s
to def66513bf
Some checks failed
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Failing after 25s
CI / quality (pull_request) Successful in 3m41s
CI / typecheck (pull_request) Successful in 3m55s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 4m1s
CI / integration_tests (pull_request) Successful in 6m49s
CI / e2e_tests (pull_request) Successful in 9m36s
CI / unit_tests (pull_request) Failing after 17m18s
2026-03-27 17:59:22 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1147

Branch: feature/m5-automation-profile-fields
Commit: def66513 (CoreRasurae)
Issue: #902 — rename automation profile task flags to spec names
Reviewer scope: All code changes in the branch + close surrounding code
Review cycles completed: 3 full passes across all categories


Executive Summary

The rename of all 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is mechanically correct. All field mappings match the specification table (§28369–28381), all 8 built-in profiles preserve their exact threshold values, and the codebase-wide stale reference search found zero missed renames. The Alembic migration is correct and safely reversible. No security or performance issues were found.

The findings below center on documentation/spec alignment (the primary concern), test coverage gaps, and one CLI output format regression.


Findings by Severity

HIGH — Documentation: Missing Bridge Documentation (5 findings)

These are the most impactful issues. The task-type field names (delete_content, access_network, etc.) are now used throughout ADRs, reference docs, and the spec to describe phase-transition behaviors (Execute→Strategize reversion, Apply→Strategize reversion). No documentation anywhere explains why a task-type name controls a phase-transition behavior.

ID File(s) Description
H-1 All ADRs and reference docs No bridge documentation exists. A mapping table (field name → spec task type → runtime behavior controlled) is needed in at least ADR-017 and automation_profiles.md. Without it, readers encountering delete_content as a "reversion gate" have no path to understanding.
H-2 docs/adr/ADR-006-plan-lifecycle.md:105,107 delete_content described as "Execute→Strategize reversion" gate; access_network described as "Apply→Strategize reversion" gate. No semantic link between the field name and the behavior it controls.
H-3 docs/adr/ADR-017-automation-profiles.md:60–66 Profile fields table pairs 7 task-type names with phase-transition descriptions (e.g., create_tool → "transition from Strategize to Execute").
H-4 docs/reference/automation_profiles.md:17–29 All 11 fields have task-type names paired with behavioral descriptions from the old naming scheme (e.g., create_file → "Gate for automatic validation fixes", install_dependency → "Gate for spawning child plans", modify_config → "Gate for retrying transient failures").
H-5 docs/reference/phase_reversion.md:20–21 Reversion threshold table uses access_network and delete_content as column headers for reversion gates with no explanation.

Recommendation: Add a "Field Naming Convention" section to ADR-017 and automation_profiles.md with a full 3-column mapping table: Field Name | Spec Task Type | Runtime Behavior Controlled. Add parenthetical notes in phase_reversion.md, error_recovery.md, and ADR-006 referencing this mapping.


MEDIUM — Bugs & Spec Mismatch (5 findings)

ID File(s) Category Description
M-1 src/.../cli/commands/automation_profile.py:148 Bug / Spec deviation CLI show output format regressed. The specification (docs/specification.md:17104–17127) defines categorized output groups ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") for automation-profile show. The code now outputs all 11 thresholds under a single flat header "Task-Type Confidence Thresholds". The old code had categories matching the spec. This is a regression introduced by this PR.
M-2 docs/specification.md:17104–17230 Bug / Missing field access_network missing from spec show output examples. All 4 format variants (Rich, Plain, JSON, YAML) of the automation-profile show output display only 10 of 11 threshold fields. access_network is not shown in any category group. (Pre-existing: was also missing as auto_reversion_from_apply before the rename.)
M-3 docs/schema/automation_profile.schema.yaml:26–94 Doc/code mismatch Schema descriptions use task-type semantics consistently (e.g., "Threshold for automatic content deletion" for delete_content) but do not describe the actual runtime behavior (strategy revision gate). A user writing a custom profile YAML would not know that access_network: 1.0 means "always require manual approval for Apply→Strategize reversion."
M-4 docs/adr/ADR-017-automation-profiles.md:58–67 Incomplete doc ADR-017 Profile Fields table lists only 7 of 11 threshold fields. Omits create_file, install_dependency, modify_config, approve_plan.
M-5 features/steps/repositories_uncovered_lines_steps.py:1501–1512 Test coverage gap Repository roundtrip test asserts only 3 of 11 threshold fields. The upsert test writes all 11 fields with distinct values (decompose_task=0.9, create_tool=0.8, ..., approve_plan=0.15) but only asserts decompose_task, create_tool, and select_tool after roundtrip. A serialization regression for the other 8 fields (e.g., edit_code, access_network) would pass silently.

LOW — Minor Improvements (5 findings)

ID File(s) Category Description
L-1 features/steps/automation_profile_cli_steps.py:86–93 Test coverage Legacy validator test _LEGACY_THRESHOLD_YAML only includes 3 of 11 legacy names (auto_strategize, auto_execute, auto_apply). Since the validator uses a simple dict lookup, a typo in any of the 8 untested map entries would go undetected.
L-2 features/steps/plan_lifecycle_service_coverage_r2_steps.py:291–296, 374–379 Test coverage Comments state CI profile has access_network=0.0 and delete_content=0.0 (used as rationale for selecting ci in reversion tests), but the CI profile BDD test in consolidated_automation_profile.feature does not assert these specific values.
L-3 docs/reference/error_recovery.md:38 Semantic confusion modify_config described as controlling the "retry policy" threshold. No explanation of why a config-modification task type gates transient error retry.
L-4 docs/reference/plan_execute.md:298 Semantic confusion Same issue: modify_config described as controlling "retry limits" without bridge explanation.
L-5 docs/adr/ADR-007-decision-tree-and-correction.md:118–122 Semantic confusion edit_code gates all strategize-phase decisions (strategy_choice, resource_selection, subplan_spawn, invariant enforcement), not just code editing decisions. The old name auto_decisions_strategize clearly conveyed the broader scope.

What Was Verified Clean

Area Result
Field mapping correctness All 11 old→new mappings match spec table (§28369–28381)
Built-in profile values All 8 profiles preserve exact threshold values under new names
Stale reference search Zero missed renames across src/, features/, robot/, benchmarks/, docs/, examples/ (excluding intentional references in migration, legacy map, and rejection tests)
Migration safety batch_alter_table correct for SQLite; downgrade correctly reverses; no data loss risk
Model validator reject_legacy_field_names correctly detects old keys and produces actionable error with rename mapping
Security extra="forbid" on model + additionalProperties: false on schema aligned; no injection or mutation risks in validator
Performance Model validator is a simple dict key membership check; negligible overhead
Repository serialization All 11 fields correctly mapped in both _to_domain and _to_model/_update_model

  1. (M-1) Restore categorized output groups in _print_profile to match the specification, OR update the specification output examples to reflect the new flat format. One or the other must be aligned.
  2. (M-2) Add access_network to the spec's show output examples in all 4 format variants.
  3. (H-1 through H-5) Add a bridge documentation section (mapping table) in ADR-017 and automation_profiles.md. Add brief parenthetical explanations in ADR-006, phase_reversion.md, error_recovery.md, and plan_execute.md.
  4. (M-5) Extend the repository roundtrip assertion to verify all 11 threshold fields.
  5. (M-4) Add the 4 missing fields to ADR-017's profile fields table.
# Code Review Report — PR #1147 **Branch:** `feature/m5-automation-profile-fields` **Commit:** `def66513` (CoreRasurae) **Issue:** #902 — rename automation profile task flags to spec names **Reviewer scope:** All code changes in the branch + close surrounding code **Review cycles completed:** 3 full passes across all categories --- ## Executive Summary The rename of all 11 automation profile threshold fields from phase-transition semantics to spec-defined task-type semantics is **mechanically correct**. All field mappings match the specification table (§28369–28381), all 8 built-in profiles preserve their exact threshold values, and the codebase-wide stale reference search found **zero missed renames**. The Alembic migration is correct and safely reversible. No security or performance issues were found. The findings below center on **documentation/spec alignment** (the primary concern), **test coverage gaps**, and one **CLI output format regression**. --- ## Findings by Severity ### HIGH — Documentation: Missing Bridge Documentation (5 findings) These are the most impactful issues. The task-type field names (`delete_content`, `access_network`, etc.) are now used throughout ADRs, reference docs, and the spec to describe phase-transition behaviors (`Execute→Strategize reversion`, `Apply→Strategize reversion`). No documentation anywhere explains *why* a task-type name controls a phase-transition behavior. | ID | File(s) | Description | |----|---------|-------------| | **H-1** | All ADRs and reference docs | **No bridge documentation exists.** A mapping table (field name → spec task type → runtime behavior controlled) is needed in at least ADR-017 and `automation_profiles.md`. Without it, readers encountering `delete_content` as a "reversion gate" have no path to understanding. | | **H-2** | `docs/adr/ADR-006-plan-lifecycle.md:105,107` | `delete_content` described as "Execute→Strategize reversion" gate; `access_network` described as "Apply→Strategize reversion" gate. No semantic link between the field name and the behavior it controls. | | **H-3** | `docs/adr/ADR-017-automation-profiles.md:60–66` | Profile fields table pairs 7 task-type names with phase-transition descriptions (e.g., `create_tool` → "transition from Strategize to Execute"). | | **H-4** | `docs/reference/automation_profiles.md:17–29` | All 11 fields have task-type names paired with behavioral descriptions from the old naming scheme (e.g., `create_file` → "Gate for automatic validation fixes", `install_dependency` → "Gate for spawning child plans", `modify_config` → "Gate for retrying transient failures"). | | **H-5** | `docs/reference/phase_reversion.md:20–21` | Reversion threshold table uses `access_network` and `delete_content` as column headers for reversion gates with no explanation. | **Recommendation:** Add a "Field Naming Convention" section to ADR-017 and `automation_profiles.md` with a full 3-column mapping table: `Field Name` | `Spec Task Type` | `Runtime Behavior Controlled`. Add parenthetical notes in `phase_reversion.md`, `error_recovery.md`, and ADR-006 referencing this mapping. --- ### MEDIUM — Bugs & Spec Mismatch (5 findings) | ID | File(s) | Category | Description | |----|---------|----------|-------------| | **M-1** | `src/.../cli/commands/automation_profile.py:148` | Bug / Spec deviation | **CLI `show` output format regressed.** The specification (`docs/specification.md:17104–17127`) defines categorized output groups ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") for `automation-profile show`. The code now outputs all 11 thresholds under a single flat header `"Task-Type Confidence Thresholds"`. The old code had categories matching the spec. This is a regression introduced by this PR. | | **M-2** | `docs/specification.md:17104–17230` | Bug / Missing field | **`access_network` missing from spec `show` output examples.** All 4 format variants (Rich, Plain, JSON, YAML) of the `automation-profile show` output display only 10 of 11 threshold fields. `access_network` is not shown in any category group. (Pre-existing: was also missing as `auto_reversion_from_apply` before the rename.) | | **M-3** | `docs/schema/automation_profile.schema.yaml:26–94` | Doc/code mismatch | Schema descriptions use task-type semantics consistently (e.g., "Threshold for automatic content deletion" for `delete_content`) but do **not** describe the actual runtime behavior (strategy revision gate). A user writing a custom profile YAML would not know that `access_network: 1.0` means "always require manual approval for Apply→Strategize reversion." | | **M-4** | `docs/adr/ADR-017-automation-profiles.md:58–67` | Incomplete doc | ADR-017 Profile Fields table lists only 7 of 11 threshold fields. Omits `create_file`, `install_dependency`, `modify_config`, `approve_plan`. | | **M-5** | `features/steps/repositories_uncovered_lines_steps.py:1501–1512` | Test coverage gap | **Repository roundtrip test asserts only 3 of 11 threshold fields.** The upsert test writes all 11 fields with distinct values (`decompose_task=0.9`, `create_tool=0.8`, ..., `approve_plan=0.15`) but only asserts `decompose_task`, `create_tool`, and `select_tool` after roundtrip. A serialization regression for the other 8 fields (e.g., `edit_code`, `access_network`) would pass silently. | --- ### LOW — Minor Improvements (5 findings) | ID | File(s) | Category | Description | |----|---------|----------|-------------| | **L-1** | `features/steps/automation_profile_cli_steps.py:86–93` | Test coverage | Legacy validator test `_LEGACY_THRESHOLD_YAML` only includes 3 of 11 legacy names (`auto_strategize`, `auto_execute`, `auto_apply`). Since the validator uses a simple dict lookup, a typo in any of the 8 untested map entries would go undetected. | | **L-2** | `features/steps/plan_lifecycle_service_coverage_r2_steps.py:291–296, 374–379` | Test coverage | Comments state CI profile has `access_network=0.0` and `delete_content=0.0` (used as rationale for selecting `ci` in reversion tests), but the CI profile BDD test in `consolidated_automation_profile.feature` does not assert these specific values. | | **L-3** | `docs/reference/error_recovery.md:38` | Semantic confusion | `modify_config` described as controlling the "retry policy" threshold. No explanation of why a config-modification task type gates transient error retry. | | **L-4** | `docs/reference/plan_execute.md:298` | Semantic confusion | Same issue: `modify_config` described as controlling "retry limits" without bridge explanation. | | **L-5** | `docs/adr/ADR-007-decision-tree-and-correction.md:118–122` | Semantic confusion | `edit_code` gates *all* strategize-phase decisions (strategy_choice, resource_selection, subplan_spawn, invariant enforcement), not just code editing decisions. The old name `auto_decisions_strategize` clearly conveyed the broader scope. | --- ## What Was Verified Clean | Area | Result | |------|--------| | **Field mapping correctness** | All 11 old→new mappings match spec table (§28369–28381) | | **Built-in profile values** | All 8 profiles preserve exact threshold values under new names | | **Stale reference search** | Zero missed renames across `src/`, `features/`, `robot/`, `benchmarks/`, `docs/`, `examples/` (excluding intentional references in migration, legacy map, and rejection tests) | | **Migration safety** | `batch_alter_table` correct for SQLite; downgrade correctly reverses; no data loss risk | | **Model validator** | `reject_legacy_field_names` correctly detects old keys and produces actionable error with rename mapping | | **Security** | `extra="forbid"` on model + `additionalProperties: false` on schema aligned; no injection or mutation risks in validator | | **Performance** | Model validator is a simple dict key membership check; negligible overhead | | **Repository serialization** | All 11 fields correctly mapped in both `_to_domain` and `_to_model`/`_update_model` | --- ## Recommended Actions 1. **(M-1)** Restore categorized output groups in `_print_profile` to match the specification, OR update the specification output examples to reflect the new flat format. One or the other must be aligned. 2. **(M-2)** Add `access_network` to the spec's `show` output examples in all 4 format variants. 3. **(H-1 through H-5)** Add a bridge documentation section (mapping table) in ADR-017 and `automation_profiles.md`. Add brief parenthetical explanations in ADR-006, phase_reversion.md, error_recovery.md, and plan_execute.md. 4. **(M-5)** Extend the repository roundtrip assertion to verify all 11 threshold fields. 5. **(M-4)** Add the 4 missing fields to ADR-017's profile fields table.
CoreRasurae force-pushed feature/m5-automation-profile-fields from def66513bf
Some checks failed
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Failing after 25s
CI / quality (pull_request) Successful in 3m41s
CI / typecheck (pull_request) Successful in 3m55s
CI / coverage (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been skipped
CI / security (pull_request) Successful in 4m1s
CI / integration_tests (pull_request) Successful in 6m49s
CI / e2e_tests (pull_request) Successful in 9m36s
CI / unit_tests (pull_request) Failing after 17m18s
to 4dbb6ef97e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m51s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m2s
CI / e2e_tests (pull_request) Successful in 5m24s
CI / coverage (pull_request) Successful in 7m55s
CI / integration_tests (pull_request) Failing after 20m26s
CI / unit_tests (pull_request) Failing after 20m29s
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-03-27 22:46:13 +00:00
Compare
Author
Member

Code Review Report: PR #1147 — Rename Automation Profile Task Flags

Reviewer scope: All code changes in feature/m5-automation-profile-fields branch plus close connections to surrounding code.
Reference: Issue #902, docs/specification.md (Automation Profiles section, lines 28316-28485).
Method: 4 full review cycles covering bugs, logic errors, semantic mismatches, security, test coverage, test flaws, performance, migration safety, and edge cases. Each cycle examined all categories globally until no new issues were found.


Overall Assessment

The core rename of all 11 task-type threshold fields from phase-transition semantics to spec-defined task-type semantics is correctly implemented. All 8 built-in profile threshold values match the specification table exactly. The semantic mapping in PlanLifecycleService phase-transition gates (create_tool for Strategize->Execute, select_tool for Execute->Apply, access_network for Apply reversion, delete_content for Execute reversion) is correct per spec. The legacy field detection validator and migration are well-designed. Documentation, fixtures, schema, and example profiles are all properly updated with no stale old field names remaining.

However, 8 issues were identified that should be addressed before merge.


Findings by Severity

BUG -- Medium-High

B1. Benchmark _make_profile() crashes due to extra="forbid"

File: benchmarks/automation_profile_bench.py:54-56
Category: Bug
Impact: All ASV benchmark runs that call _make_profile() will fail with ValidationError.

The helper passes require_sandbox, require_checkpoints, allow_unsafe_tools as top-level kwargs to AutomationProfile(). These are SafetyProfile sub-model fields, not AutomationProfile fields. Since extra="forbid" was added to AutomationProfile.model_config (line 355 of automation_profile.py), Pydantic rejects unknown top-level keys.

# Current (broken):
return AutomationProfile(
    ...,
    require_sandbox=True,       # NOT a top-level field
    require_checkpoints=True,   # NOT a top-level field
    allow_unsafe_tools=False,   # NOT a top-level field
)

# Fix:
return AutomationProfile(
    ...,
    safety=SafetyProfile(
        require_sandbox=True,
        require_checkpoints=True,
        allow_unsafe_tools=False,
    ),
)

This affects ProfileValidationSuite.time_profile_construction (line 65) and ProfileSerializationSuite.setup (line 94).


SPEC COMPLIANCE -- Medium

S1. CLI JSON/YAML output uses flat structure; spec requires nested categorized grouping

File: src/cleveragents/cli/commands/automation_profile.py:83-118
Category: Spec compliance
Impact: JSON and YAML output from agents automation-profile show --format json|yaml does not match the specification's documented format.

The _profile_spec_dict function returns a flat dict with all 11 threshold fields as top-level keys. The specification (lines 17179-17200) defines a nested JSON structure with four category groupings:

{
  "phase_transitions": { "decompose_task": 0.0, "create_tool": 0.0, "select_tool": 1.0 },
  "decision_automation": { "edit_code": 0.0, "execute_command": 0.0 },
  "self_repair": { "create_file": 0.0, "delete_content": 1.0, "access_network": 1.0, "modify_config": 0.0, "approve_plan": 1.0 },
  "execution_controls": { "install_dependency": 0.0, "require_sandbox": true, "require_checkpoints": true, "allow_unsafe_tools": false }
}

Additionally, the spec's execution_controls group merges install_dependency with safety fields, while the code keeps safety as a separate nested dict. The Rich output (_print_profile) correctly uses the four category headers, but the machine-readable formats (JSON, YAML) do not.

Note: This may be a pre-existing gap if the flat format preceded the spec update, but since the spec was updated in this PR (574 lines changed in specification.md), the code should match the new spec.


ROBUSTNESS -- Medium

R1. _get_threshold() silently degrades to always-manual on unknown operation types

File: src/cleveragents/application/services/autonomy_controller.py:301-321
Category: Robustness / Observability
Impact: If any component passes an old operation_type string (e.g., "auto_execute" instead of "create_tool"), the hasattr check returns False and the method silently returns 1.0 (always-manual). No error, no warning, no log entry.

@staticmethod
def _get_threshold(profile: AutomationProfile, operation_type: str) -> float:
    if hasattr(profile, operation_type):
        value = getattr(profile, operation_type)
        if isinstance(value, (int, float)):
            return float(value)
    return 1.0  # Silent fallback -- invisible failure mode

Suggestion: Add a logger.warning() on the fallback path, or validate operation_type against AutomationProfile._THRESHOLD_FIELDS.

R2. Migration lacks idempotency guard

File: alembic/versions/m5_001_rename_automation_profile_fields.py:43-50
Category: Migration safety
Impact: upgrade() will fail on databases that already have the new column names (schema drift, re-run scenarios, partial migrations). Neither upgrade() nor downgrade() checks whether source columns exist before renaming.

Suggestion: Inspect inspector.get_columns("automation_profiles") before renaming, or wrap each alter_column in a try/except with appropriate handling.

R3. Repository accesses renamed columns without fallback guard

File: src/cleveragents/infrastructure/database/repositories.py:4453-4463
Category: Operational safety
Impact: If the database migration hasn't been applied, accessing row.decompose_task (etc.) raises AttributeError at runtime with no recovery path. The file already uses hasattr guards for safety_json and guards_json columns (lines 4428, 4443) -- the same pattern should be applied for the renamed threshold columns.


TEST COVERAGE -- Low

T1. Legacy field rejection test covers only 3 of 11 old names

File: features/steps/automation_profile_cli_steps.py:86-93
Category: Test coverage
Impact: _LEGACY_THRESHOLD_YAML only includes auto_strategize, auto_execute, auto_apply. The remaining 8 old field names are not explicitly tested for rejection. While the reject_legacy_field_names validator catches any key in _LEGACY_FIELD_MAP, the test only verifies the mechanism works for 3 of the 11 legacy names.

Suggestion: Either add the remaining 8 fields to the fixture, or add a parameterized test that iterates all 11 legacy names.

T2. CLI show tests don't assert category grouping headers

File: features/automation_profile_cli.feature:100-104
Category: Test coverage
Impact: The automation-profile show tests verify field names and values appear in output but do not assert that the category headers ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") are present in Rich output. If the category grouping regresses to a flat list, tests would still pass.


DESIGN -- Low (Informational)

D1. extra="forbid" without schema_version-aware relaxation

File: src/cleveragents/domain/models/core/automation_profile.py:355
Category: Forward compatibility
Impact: The schema_version field (line 85) exists for forward compatibility, but there is no conditional logic that relaxes validation based on the version. YAML configs from a newer schema version that add new fields will be rejected by older code. This is a deliberate fail-fast design choice, but worth noting for future extensibility planning.


Positive Observations

  • All 8 built-in profile threshold values verified against spec table (lines 28373-28385) -- all match exactly.
  • Phase-transition gate mapping in PlanLifecycleService is semantically correct per spec.
  • _LEGACY_FIELD_MAP + model_validator(mode="before") provides excellent UX for config migration -- actionable error messages listing the exact renames needed.
  • extra="forbid" catches typos and unknown fields early, which is a good defensive choice.
  • All documentation files (ADR-017, automation_profiles.md, phase_reversion.md, error_recovery.md, plan_execute.md, CHANGELOG) are correctly updated.
  • YAML schema, fixtures, example profiles all clean -- no stale old field names in the codebase outside of intentional legacy detection contexts.
  • Repository roundtrip tests assert all 11 threshold fields individually.
  • Robot and Behave test helpers all properly updated.
  • Semantic escalation tests use new field names throughout.

Summary Table

ID Severity Category File Description
B1 Medium-High Bug benchmarks/automation_profile_bench.py:54-56 Benchmark crashes: safety fields passed as top-level kwargs; rejected by extra="forbid"
S1 Medium Spec Compliance cli/commands/automation_profile.py:83-118 JSON/YAML output is flat; spec requires nested category grouping
R1 Medium Robustness autonomy_controller.py:301-321 _get_threshold() silently degrades on unknown operation types
R2 Medium Migration Safety m5_001_rename_automation_profile_fields.py:43-50 No idempotency guard; fails on already-migrated databases
R3 Medium Operational Safety repositories.py:4453-4463 No hasattr fallback for renamed columns if migration not applied
T1 Low Test Coverage automation_profile_cli_steps.py:86-93 Legacy rejection tested for 3/11 old names only
T2 Low Test Coverage automation_profile_cli.feature:100-104 No assertion on CLI category grouping headers
D1 Low Design automation_profile.py:355 extra="forbid" without schema_version-aware relaxation

Recommendation: Fix B1 (benchmark crash) and address S1 (spec compliance) before merge. R1-R3 are recommended improvements. T1, T2, D1 are informational observations for future hardening.

# Code Review Report: PR #1147 — Rename Automation Profile Task Flags **Reviewer scope**: All code changes in `feature/m5-automation-profile-fields` branch plus close connections to surrounding code. **Reference**: Issue #902, `docs/specification.md` (Automation Profiles section, lines 28316-28485). **Method**: 4 full review cycles covering bugs, logic errors, semantic mismatches, security, test coverage, test flaws, performance, migration safety, and edge cases. Each cycle examined all categories globally until no new issues were found. --- ## Overall Assessment The core rename of all 11 task-type threshold fields from phase-transition semantics to spec-defined task-type semantics is **correctly implemented**. All 8 built-in profile threshold values match the specification table exactly. The semantic mapping in `PlanLifecycleService` phase-transition gates (`create_tool` for Strategize->Execute, `select_tool` for Execute->Apply, `access_network` for Apply reversion, `delete_content` for Execute reversion) is correct per spec. The legacy field detection validator and migration are well-designed. Documentation, fixtures, schema, and example profiles are all properly updated with no stale old field names remaining. However, 8 issues were identified that should be addressed before merge. --- ## Findings by Severity ### BUG -- Medium-High #### B1. Benchmark `_make_profile()` crashes due to `extra="forbid"` **File**: `benchmarks/automation_profile_bench.py:54-56` **Category**: Bug **Impact**: All ASV benchmark runs that call `_make_profile()` will fail with `ValidationError`. The helper passes `require_sandbox`, `require_checkpoints`, `allow_unsafe_tools` as **top-level kwargs** to `AutomationProfile()`. These are `SafetyProfile` sub-model fields, not `AutomationProfile` fields. Since `extra="forbid"` was added to `AutomationProfile.model_config` (line 355 of `automation_profile.py`), Pydantic rejects unknown top-level keys. ```python # Current (broken): return AutomationProfile( ..., require_sandbox=True, # NOT a top-level field require_checkpoints=True, # NOT a top-level field allow_unsafe_tools=False, # NOT a top-level field ) # Fix: return AutomationProfile( ..., safety=SafetyProfile( require_sandbox=True, require_checkpoints=True, allow_unsafe_tools=False, ), ) ``` This affects `ProfileValidationSuite.time_profile_construction` (line 65) and `ProfileSerializationSuite.setup` (line 94). --- ### SPEC COMPLIANCE -- Medium #### S1. CLI JSON/YAML output uses flat structure; spec requires nested categorized grouping **File**: `src/cleveragents/cli/commands/automation_profile.py:83-118` **Category**: Spec compliance **Impact**: JSON and YAML output from `agents automation-profile show --format json|yaml` does not match the specification's documented format. The `_profile_spec_dict` function returns a **flat** dict with all 11 threshold fields as top-level keys. The specification (lines 17179-17200) defines a **nested** JSON structure with four category groupings: ```json { "phase_transitions": { "decompose_task": 0.0, "create_tool": 0.0, "select_tool": 1.0 }, "decision_automation": { "edit_code": 0.0, "execute_command": 0.0 }, "self_repair": { "create_file": 0.0, "delete_content": 1.0, "access_network": 1.0, "modify_config": 0.0, "approve_plan": 1.0 }, "execution_controls": { "install_dependency": 0.0, "require_sandbox": true, "require_checkpoints": true, "allow_unsafe_tools": false } } ``` Additionally, the spec's `execution_controls` group merges `install_dependency` with safety fields, while the code keeps `safety` as a separate nested dict. The Rich output (`_print_profile`) correctly uses the four category headers, but the machine-readable formats (JSON, YAML) do not. Note: This may be a pre-existing gap if the flat format preceded the spec update, but since the spec was updated in this PR (574 lines changed in `specification.md`), the code should match the new spec. --- ### ROBUSTNESS -- Medium #### R1. `_get_threshold()` silently degrades to always-manual on unknown operation types **File**: `src/cleveragents/application/services/autonomy_controller.py:301-321` **Category**: Robustness / Observability **Impact**: If any component passes an old operation_type string (e.g., `"auto_execute"` instead of `"create_tool"`), the `hasattr` check returns `False` and the method silently returns `1.0` (always-manual). No error, no warning, no log entry. ```python @staticmethod def _get_threshold(profile: AutomationProfile, operation_type: str) -> float: if hasattr(profile, operation_type): value = getattr(profile, operation_type) if isinstance(value, (int, float)): return float(value) return 1.0 # Silent fallback -- invisible failure mode ``` **Suggestion**: Add a `logger.warning()` on the fallback path, or validate `operation_type` against `AutomationProfile._THRESHOLD_FIELDS`. #### R2. Migration lacks idempotency guard **File**: `alembic/versions/m5_001_rename_automation_profile_fields.py:43-50` **Category**: Migration safety **Impact**: `upgrade()` will fail on databases that already have the new column names (schema drift, re-run scenarios, partial migrations). Neither `upgrade()` nor `downgrade()` checks whether source columns exist before renaming. **Suggestion**: Inspect `inspector.get_columns("automation_profiles")` before renaming, or wrap each `alter_column` in a try/except with appropriate handling. #### R3. Repository accesses renamed columns without fallback guard **File**: `src/cleveragents/infrastructure/database/repositories.py:4453-4463` **Category**: Operational safety **Impact**: If the database migration hasn't been applied, accessing `row.decompose_task` (etc.) raises `AttributeError` at runtime with no recovery path. The file already uses `hasattr` guards for `safety_json` and `guards_json` columns (lines 4428, 4443) -- the same pattern should be applied for the renamed threshold columns. --- ### TEST COVERAGE -- Low #### T1. Legacy field rejection test covers only 3 of 11 old names **File**: `features/steps/automation_profile_cli_steps.py:86-93` **Category**: Test coverage **Impact**: `_LEGACY_THRESHOLD_YAML` only includes `auto_strategize`, `auto_execute`, `auto_apply`. The remaining 8 old field names are not explicitly tested for rejection. While the `reject_legacy_field_names` validator catches **any** key in `_LEGACY_FIELD_MAP`, the test only verifies the mechanism works for 3 of the 11 legacy names. **Suggestion**: Either add the remaining 8 fields to the fixture, or add a parameterized test that iterates all 11 legacy names. #### T2. CLI show tests don't assert category grouping headers **File**: `features/automation_profile_cli.feature:100-104` **Category**: Test coverage **Impact**: The `automation-profile show` tests verify field names and values appear in output but do not assert that the category headers ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls") are present in Rich output. If the category grouping regresses to a flat list, tests would still pass. --- ### DESIGN -- Low (Informational) #### D1. `extra="forbid"` without schema_version-aware relaxation **File**: `src/cleveragents/domain/models/core/automation_profile.py:355` **Category**: Forward compatibility **Impact**: The `schema_version` field (line 85) exists for forward compatibility, but there is no conditional logic that relaxes validation based on the version. YAML configs from a newer schema version that add new fields will be rejected by older code. This is a deliberate fail-fast design choice, but worth noting for future extensibility planning. --- ## Positive Observations - All 8 built-in profile threshold values verified against spec table (lines 28373-28385) -- **all match exactly**. - Phase-transition gate mapping in `PlanLifecycleService` is semantically correct per spec. - `_LEGACY_FIELD_MAP` + `model_validator(mode="before")` provides excellent UX for config migration -- actionable error messages listing the exact renames needed. - `extra="forbid"` catches typos and unknown fields early, which is a good defensive choice. - All documentation files (ADR-017, automation_profiles.md, phase_reversion.md, error_recovery.md, plan_execute.md, CHANGELOG) are correctly updated. - YAML schema, fixtures, example profiles all clean -- no stale old field names in the codebase outside of intentional legacy detection contexts. - Repository roundtrip tests assert all 11 threshold fields individually. - Robot and Behave test helpers all properly updated. - Semantic escalation tests use new field names throughout. --- ## Summary Table | ID | Severity | Category | File | Description | |-----|----------|----------|------|-------------| | B1 | **Medium-High** | Bug | `benchmarks/automation_profile_bench.py:54-56` | Benchmark crashes: safety fields passed as top-level kwargs; rejected by `extra="forbid"` | | S1 | **Medium** | Spec Compliance | `cli/commands/automation_profile.py:83-118` | JSON/YAML output is flat; spec requires nested category grouping | | R1 | **Medium** | Robustness | `autonomy_controller.py:301-321` | `_get_threshold()` silently degrades on unknown operation types | | R2 | **Medium** | Migration Safety | `m5_001_rename_automation_profile_fields.py:43-50` | No idempotency guard; fails on already-migrated databases | | R3 | **Medium** | Operational Safety | `repositories.py:4453-4463` | No hasattr fallback for renamed columns if migration not applied | | T1 | **Low** | Test Coverage | `automation_profile_cli_steps.py:86-93` | Legacy rejection tested for 3/11 old names only | | T2 | **Low** | Test Coverage | `automation_profile_cli.feature:100-104` | No assertion on CLI category grouping headers | | D1 | **Low** | Design | `automation_profile.py:355` | `extra="forbid"` without schema_version-aware relaxation | **Recommendation**: Fix B1 (benchmark crash) and address S1 (spec compliance) before merge. R1-R3 are recommended improvements. T1, T2, D1 are informational observations for future hardening.
Author
Member

Review Findings: Fix Application Report

Fixes Applied (amended into feature commit)

B1 -- Benchmark _make_profile() crash (FIXED)

File: benchmarks/automation_profile_bench.py:54-56
Fix: Wrapped require_sandbox, require_checkpoints, allow_unsafe_tools in a SafetyProfile(...) sub-model passed via the safety= parameter, replacing the broken top-level kwargs that are rejected by extra="forbid".

T1 -- Legacy rejection test fixture (IMPROVED)

File: features/steps/automation_profile_cli_steps.py:86-101
Fix: Extended _LEGACY_THRESHOLD_YAML fixture to include all 11 legacy field names instead of only 3, ensuring the legacy rejection validator is exercised against the complete set.

CHANGELOG updated

Added the benchmark fix to the existing #902 changelog entry.

All quality gates pass: lint (0 errors), typecheck (0 errors, 1 pre-existing warning), unit_tests (12,293 scenarios, 0 failures), integration_tests (1,699 tests, 0 failures).


Findings NOT Applied (with justification)

S1 -- CLI JSON/YAML flat vs nested structure

Decision: Out of scope for issue #902.
Justification: Issue #902 is scoped to "rename automation profile task flags to match specification field names" -- a field name change, not an output structure change. The flat JSON/YAML output format is pre-existing behavior that was not introduced by this PR. While the specification (updated in this PR) now shows a nested JSON format with phase_transitions/decision_automation/self_repair/execution_controls groupings, restructuring the output would be a new feature requiring its own issue, acceptance criteria, and separate testing. The Rich output already correctly uses the four category headers. Recommendation: File a separate issue for JSON/YAML output restructuring.

R1 -- _get_threshold() silent degradation

Decision: Out of scope for issue #902.
Justification: The _get_threshold method in autonomy_controller.py had its docstring updated (the only change from this PR), but its silent getattr() fallback to 1.0 is pre-existing behavior. Adding a logger.warning() or field validation would introduce new functionality beyond the rename scope. Per CONTRIBUTING.md: "One logical change per commit."

R2 -- Migration idempotency guard

Decision: Out of scope for issue #902.
Justification: Standard Alembic practice relies on the version table to track migration state rather than embedding idempotency guards in individual migrations. The batch_alter_table + alter_column pattern used is the canonical approach for SQLite column renames. Adding introspection guards would be non-standard and beyond the rename scope.

R3 -- Repository hasattr fallback for renamed columns

Decision: Out of scope for issue #902.
Justification: The pre-existing code also accessed the old column names (auto_strategize, etc.) without hasattr guards. The repository assumes the database schema matches the ORM model -- this is standard SQLAlchemy practice. Adding fallback guards for unmigrated databases would be a defensive improvement beyond the rename scope.

T2 -- CLI show tests missing category header assertions

Decision: Out of scope (non-blocking).
Justification: The code correctly produces the categorized Rich output ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls"). The existing tests verify the correct field names and values appear in output. Adding header assertions would be a test-strengthening improvement, not a bug fix.

D1 -- extra="forbid" without schema_version-aware relaxation

Decision: Design choice, not actionable.
Justification: The fail-fast behavior of extra="forbid" is intentional -- it catches typos and unknown fields at config load time. Future schema versions with new fields can be handled by bumping the schema_version and adding the fields to the model, which is the standard evolution path. No spec requirement mandates version-aware relaxation.

# Review Findings: Fix Application Report ## Fixes Applied (amended into feature commit) ### B1 -- Benchmark `_make_profile()` crash (FIXED) **File**: `benchmarks/automation_profile_bench.py:54-56` **Fix**: Wrapped `require_sandbox`, `require_checkpoints`, `allow_unsafe_tools` in a `SafetyProfile(...)` sub-model passed via the `safety=` parameter, replacing the broken top-level kwargs that are rejected by `extra="forbid"`. ### T1 -- Legacy rejection test fixture (IMPROVED) **File**: `features/steps/automation_profile_cli_steps.py:86-101` **Fix**: Extended `_LEGACY_THRESHOLD_YAML` fixture to include all 11 legacy field names instead of only 3, ensuring the legacy rejection validator is exercised against the complete set. ### CHANGELOG updated Added the benchmark fix to the existing #902 changelog entry. **All quality gates pass**: lint (0 errors), typecheck (0 errors, 1 pre-existing warning), unit_tests (12,293 scenarios, 0 failures), integration_tests (1,699 tests, 0 failures). --- ## Findings NOT Applied (with justification) ### S1 -- CLI JSON/YAML flat vs nested structure **Decision**: Out of scope for issue #902. **Justification**: Issue #902 is scoped to "rename automation profile task flags to match specification field names" -- a field **name** change, not an output **structure** change. The flat JSON/YAML output format is pre-existing behavior that was not introduced by this PR. While the specification (updated in this PR) now shows a nested JSON format with `phase_transitions`/`decision_automation`/`self_repair`/`execution_controls` groupings, restructuring the output would be a new feature requiring its own issue, acceptance criteria, and separate testing. The Rich output already correctly uses the four category headers. **Recommendation**: File a separate issue for JSON/YAML output restructuring. ### R1 -- `_get_threshold()` silent degradation **Decision**: Out of scope for issue #902. **Justification**: The `_get_threshold` method in `autonomy_controller.py` had its docstring updated (the only change from this PR), but its silent `getattr()` fallback to `1.0` is pre-existing behavior. Adding a `logger.warning()` or field validation would introduce new functionality beyond the rename scope. Per CONTRIBUTING.md: "One logical change per commit." ### R2 -- Migration idempotency guard **Decision**: Out of scope for issue #902. **Justification**: Standard Alembic practice relies on the version table to track migration state rather than embedding idempotency guards in individual migrations. The `batch_alter_table` + `alter_column` pattern used is the canonical approach for SQLite column renames. Adding introspection guards would be non-standard and beyond the rename scope. ### R3 -- Repository `hasattr` fallback for renamed columns **Decision**: Out of scope for issue #902. **Justification**: The pre-existing code also accessed the old column names (`auto_strategize`, etc.) without `hasattr` guards. The repository assumes the database schema matches the ORM model -- this is standard SQLAlchemy practice. Adding fallback guards for unmigrated databases would be a defensive improvement beyond the rename scope. ### T2 -- CLI show tests missing category header assertions **Decision**: Out of scope (non-blocking). **Justification**: The code correctly produces the categorized Rich output ("Phase Transitions", "Decision Automation", "Self-Repair", "Execution Controls"). The existing tests verify the correct field names and values appear in output. Adding header assertions would be a test-strengthening improvement, not a bug fix. ### D1 -- `extra="forbid"` without schema_version-aware relaxation **Decision**: Design choice, not actionable. **Justification**: The fail-fast behavior of `extra="forbid"` is intentional -- it catches typos and unknown fields at config load time. Future schema versions with new fields can be handled by bumping the `schema_version` and adding the fields to the model, which is the standard evolution path. No spec requirement mandates version-aware relaxation.
CoreRasurae force-pushed feature/m5-automation-profile-fields from 4dbb6ef97e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m51s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m2s
CI / e2e_tests (pull_request) Successful in 5m24s
CI / coverage (pull_request) Successful in 7m55s
CI / integration_tests (pull_request) Failing after 20m26s
CI / unit_tests (pull_request) Failing after 20m29s
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
to 53c8d0d86c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 45s
CI / build (pull_request) Successful in 31s
CI / lint (pull_request) Successful in 3m23s
CI / quality (pull_request) Successful in 3m44s
CI / unit_tests (pull_request) Successful in 3m47s
CI / security (pull_request) Successful in 4m1s
CI / docker (pull_request) Successful in 1m15s
CI / integration_tests (pull_request) Successful in 6m49s
CI / e2e_tests (pull_request) Successful in 8m24s
CI / coverage (pull_request) Successful in 11m4s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-27 23:21:47 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1147 (Issue #902)

Refactor: Rename automation profile task flags to spec names
Branch: feature/m5-automation-profile-fields
Commit: 53c8d0d8 by CoreRasurae (Luis Mendes)
Reviewer: Automated multi-pass review (3 global cycles across all categories)


Summary

This is a large, well-executed renaming refactor touching 56 files across the domain model, services, CLI, database layer, migrations, tests, benchmarks, and documentation. All 11 task-type threshold fields have been renamed from phase-transition semantics (auto_strategize, auto_execute, etc.) to spec-defined task-type semantics (decompose_task, create_tool, etc.). The rename is consistent across all layers.

Verified correct:

  • All 11 field name mappings match the specification's Automatable Tasks table
  • All 8 built-in profile threshold values match the spec table exactly
  • Phase-transition gate mappings in PlanLifecycleService are semantically correct
  • Repository layer (ORM model, _to_domain, _from_domain, _update_row) is consistent
  • Legacy field rejection validator works correctly with extra="forbid"
  • Alembic migration correctly renames all 11 columns
  • No stale old field names found outside of intentional contexts (migration, legacy map, legacy rejection test)
  • Benchmark fixes correctly use SafetyProfile sub-model (fixes extra="forbid" incompatibility)

Findings by Severity

MEDIUM — Spec Deviation

F1. CLI JSON/YAML output structure does not match spec's grouped format

  • Files: src/cleveragents/cli/commands/automation_profile.py_profile_spec_dict() (line ~92)
  • Issue: The specification (lines 17168-17205) defines the JSON output for automation-profile show with thresholds grouped into nested objects (phase_transitions, decision_automation, self_repair, execution_controls). The code produces a flat structure with all 11 threshold fields at the top level alongside metadata.
  • Impact: Clients parsing the JSON output expecting the spec's nested structure would break.
  • Suggestion: Either update _profile_spec_dict() to produce the grouped structure when called from show, or document the deviation. The flat structure is arguably simpler for YAML round-trip, but the spec is explicit.

F2. CLI Rich show output places safety fields in a separate section instead of within "Execution Controls"

  • Files: src/cleveragents/cli/commands/automation_profile.py_print_profile() (line ~145)
  • Issue: The specification (lines 17093-17131) shows require_sandbox, require_checkpoints, and allow_unsafe_tools inside the "Execution Controls (thresholds)" section alongside install_dependency. The code puts them in a separate "Safety Profile (Composed Sub-Model)" section.
  • Impact: Visual/UX deviation from spec. Low user impact but creates inconsistency with spec examples.
  • Suggestion: Consider merging the key safety booleans into Execution Controls for the Rich output, keeping the full safety sub-model detail in JSON/YAML outputs.

LOW — Spec Clarification Needed

F3. "auto" profile description diverges from spec text

  • Files: src/cleveragents/domain/models/core/automation_profile.py (line ~478)
  • Issue: The code says description="Fully automatic except apply and reversion". The spec (line 28406) says "Fully automatic except apply". However, the code description is arguably more accurate since the "auto" profile has both select_tool=1.0 (Apply manual) and access_network=1.0 (reversion from Apply manual). The spec's threshold table confirms both are 1.0.
  • Impact: Cosmetic. The code is more precise than the spec text. This may warrant updating the spec's prose to match.
  • Suggestion: Consider filing a spec clarification issue, or reverting the description to match the spec verbatim and letting the threshold values speak for themselves.

F4. OperationContext.operation_type accepts arbitrary strings — no validation against known threshold fields

  • Files: src/cleveragents/domain/models/core/escalation.py (line ~99), src/cleveragents/application/services/autonomy_controller.py_get_threshold() (line ~300)
  • Issue: operation_type is a free-form str with only min_length=1 validation. _get_threshold() uses getattr(profile, operation_type) with a fallback to 1.0. This means:
    • A typo (e.g., "creat_tool") silently defaults to 1.0 (always escalate)
    • Passing old field names (e.g., "auto_execute") silently defaults to 1.0
    • No compile-time or runtime guardrail catches these
  • Impact: Safe-fail by design (unknown types escalate to human), but could mask misconfiguration. Pre-existing design, not introduced by this PR.
  • Suggestion: Consider adding a warnings.warn() in _get_threshold() when the operation_type does not match a known _THRESHOLD_FIELDS entry, to aid debugging without breaking the safe-fail behavior.

LOW — Documentation / Readability

F5. Bridge comments in test steps lost semantic context after rename

  • Files: features/steps/plan_lifecycle_service_coverage_r2_steps.py (lines ~290, ~373)
  • Issue: Comments were updated from # ci profile has auto_reversion_from_apply=0.0 to # ci profile has access_network=0.0. The old comments explained the semantic role (permits auto-reversion from apply). The new comments only state the field name without explaining what it controls.
  • Suggestion: Update to something like # ci profile has access_network=0.0 (auto-revert from Apply permitted).

INFO — Observations (No Action Required)

F6. extra="forbid" is a breaking change for custom YAML configs with extra fields

  • Files: src/cleveragents/domain/models/core/automation_profile.pymodel_config
  • Observation: The addition of extra="forbid" means any previously-tolerated extra fields in custom YAML configs will now cause validation errors. This is correct for strict typing but is a behavioral change. The legacy validator provides clear errors for known old names; unknown extra fields get a generic Pydantic error.
  • Note: The YAML schema (docs/schema/automation_profile.schema.yaml) was also updated with additionalProperties: false, maintaining consistency between schema and model.

F7. _get_threshold() dynamic attribute lookup is resilient to non-threshold attributes

  • Observation: The isinstance(value, (int, float)) guard in _get_threshold() correctly prevents non-numeric attributes (e.g., name, description, safety) from being returned as thresholds. No security concern.

F8. Alembic migration uses batch_alter_table — correct for SQLite

  • Observation: Using batch_alter_table is the right pattern for SQLite (which historically lacked ALTER TABLE RENAME COLUMN). The migration correctly handles both upgrade and downgrade paths.

Test Coverage Assessment

Area Coverage Notes
Model field rename (all 11 fields) Covered BDD scenarios for valid/invalid thresholds on all fields
Built-in profiles (8 profiles) Covered Scenarios verify key thresholds for each profile
Legacy field rejection Covered New scenario "Add profile with legacy threshold keys fails"
YAML schema + model parity Covered New scenario "Guarded profile config passes docs schema and model"
Repository roundtrip (all 11 fields) Covered Extended assertion in roundtrip test
CLI show/list/add/remove Covered Existing scenarios updated to new field names
Semantic escalation with new names Covered All escalation scenarios use create_tool
Benchmark compatibility Covered Benchmarks updated with SafetyProfile sub-model
Robot integration (E2E) Covered m6_acceptance.robot updated

No test coverage gaps identified for the changes in scope.


Acceptance Criteria Verification (Issue #902)

Criterion Status Notes
All 11 task flag field names match spec PASS All match Automatable Tasks table
All 8 built-in profiles correct values PASS All values match spec threshold table
YAML configs use spec field names PASS Schema, examples, and tests updated
automation-profile show displays spec names PASS Rich/JSON/YAML/Plain outputs use new names
Profile resolution logic works PASS _resolve_profile_for_plan unchanged, tests pass
AutonomyController references updated PASS _get_threshold compatible via getattr
Database migration renames columns PASS Alembic m5_001 migration provided

Verdict

This is a thorough, well-structured refactoring. The core rename is correct and consistent across all layers. The findings above are primarily spec-alignment refinements (CLI output grouping) and defensive-coding suggestions (operation_type validation), none of which block merge. The two medium findings (F1, F2) about CLI output structure vs spec deserve attention but may be deferred to a follow-up issue if the current flat structure is preferred for technical reasons.

## Code Review Report — PR #1147 (Issue #902) **Refactor: Rename automation profile task flags to spec names** **Branch:** `feature/m5-automation-profile-fields` **Commit:** `53c8d0d8` by CoreRasurae (Luis Mendes) **Reviewer:** Automated multi-pass review (3 global cycles across all categories) --- ### Summary This is a large, well-executed renaming refactor touching 56 files across the domain model, services, CLI, database layer, migrations, tests, benchmarks, and documentation. All 11 task-type threshold fields have been renamed from phase-transition semantics (`auto_strategize`, `auto_execute`, etc.) to spec-defined task-type semantics (`decompose_task`, `create_tool`, etc.). The rename is consistent across all layers. **Verified correct:** - All 11 field name mappings match the specification's Automatable Tasks table - All 8 built-in profile threshold values match the spec table exactly - Phase-transition gate mappings in `PlanLifecycleService` are semantically correct - Repository layer (ORM model, `_to_domain`, `_from_domain`, `_update_row`) is consistent - Legacy field rejection validator works correctly with `extra="forbid"` - Alembic migration correctly renames all 11 columns - No stale old field names found outside of intentional contexts (migration, legacy map, legacy rejection test) - Benchmark fixes correctly use `SafetyProfile` sub-model (fixes `extra="forbid"` incompatibility) --- ### Findings by Severity #### MEDIUM — Spec Deviation **F1. CLI JSON/YAML output structure does not match spec's grouped format** - **Files:** `src/cleveragents/cli/commands/automation_profile.py` — `_profile_spec_dict()` (line ~92) - **Issue:** The specification (lines 17168-17205) defines the JSON output for `automation-profile show` with thresholds grouped into nested objects (`phase_transitions`, `decision_automation`, `self_repair`, `execution_controls`). The code produces a **flat** structure with all 11 threshold fields at the top level alongside metadata. - **Impact:** Clients parsing the JSON output expecting the spec's nested structure would break. - **Suggestion:** Either update `_profile_spec_dict()` to produce the grouped structure when called from `show`, or document the deviation. The flat structure is arguably simpler for YAML round-trip, but the spec is explicit. **F2. CLI Rich `show` output places safety fields in a separate section instead of within "Execution Controls"** - **Files:** `src/cleveragents/cli/commands/automation_profile.py` — `_print_profile()` (line ~145) - **Issue:** The specification (lines 17093-17131) shows `require_sandbox`, `require_checkpoints`, and `allow_unsafe_tools` **inside** the "Execution Controls (thresholds)" section alongside `install_dependency`. The code puts them in a separate "Safety Profile (Composed Sub-Model)" section. - **Impact:** Visual/UX deviation from spec. Low user impact but creates inconsistency with spec examples. - **Suggestion:** Consider merging the key safety booleans into Execution Controls for the Rich output, keeping the full safety sub-model detail in JSON/YAML outputs. #### LOW — Spec Clarification Needed **F3. "auto" profile description diverges from spec text** - **Files:** `src/cleveragents/domain/models/core/automation_profile.py` (line ~478) - **Issue:** The code says `description="Fully automatic except apply and reversion"`. The spec (line 28406) says "Fully automatic except apply". However, the code description is arguably **more accurate** since the "auto" profile has both `select_tool=1.0` (Apply manual) and `access_network=1.0` (reversion from Apply manual). The spec's threshold table confirms both are 1.0. - **Impact:** Cosmetic. The code is more precise than the spec text. This may warrant updating the spec's prose to match. - **Suggestion:** Consider filing a spec clarification issue, or reverting the description to match the spec verbatim and letting the threshold values speak for themselves. **F4. `OperationContext.operation_type` accepts arbitrary strings — no validation against known threshold fields** - **Files:** `src/cleveragents/domain/models/core/escalation.py` (line ~99), `src/cleveragents/application/services/autonomy_controller.py` — `_get_threshold()` (line ~300) - **Issue:** `operation_type` is a free-form `str` with only `min_length=1` validation. `_get_threshold()` uses `getattr(profile, operation_type)` with a fallback to `1.0`. This means: - A typo (e.g., `"creat_tool"`) silently defaults to 1.0 (always escalate) - Passing old field names (e.g., `"auto_execute"`) silently defaults to 1.0 - No compile-time or runtime guardrail catches these - **Impact:** Safe-fail by design (unknown types escalate to human), but could mask misconfiguration. Pre-existing design, not introduced by this PR. - **Suggestion:** Consider adding a `warnings.warn()` in `_get_threshold()` when the operation_type does not match a known `_THRESHOLD_FIELDS` entry, to aid debugging without breaking the safe-fail behavior. #### LOW — Documentation / Readability **F5. Bridge comments in test steps lost semantic context after rename** - **Files:** `features/steps/plan_lifecycle_service_coverage_r2_steps.py` (lines ~290, ~373) - **Issue:** Comments were updated from `# ci profile has auto_reversion_from_apply=0.0` to `# ci profile has access_network=0.0`. The old comments explained the **semantic role** (permits auto-reversion from apply). The new comments only state the field name without explaining what it controls. - **Suggestion:** Update to something like `# ci profile has access_network=0.0 (auto-revert from Apply permitted)`. #### INFO — Observations (No Action Required) **F6. `extra="forbid"` is a breaking change for custom YAML configs with extra fields** - **Files:** `src/cleveragents/domain/models/core/automation_profile.py` — `model_config` - **Observation:** The addition of `extra="forbid"` means any previously-tolerated extra fields in custom YAML configs will now cause validation errors. This is correct for strict typing but is a behavioral change. The legacy validator provides clear errors for known old names; unknown extra fields get a generic Pydantic error. - **Note:** The YAML schema (`docs/schema/automation_profile.schema.yaml`) was also updated with `additionalProperties: false`, maintaining consistency between schema and model. **F7. `_get_threshold()` dynamic attribute lookup is resilient to non-threshold attributes** - **Observation:** The `isinstance(value, (int, float))` guard in `_get_threshold()` correctly prevents non-numeric attributes (e.g., `name`, `description`, `safety`) from being returned as thresholds. No security concern. **F8. Alembic migration uses `batch_alter_table` — correct for SQLite** - **Observation:** Using `batch_alter_table` is the right pattern for SQLite (which historically lacked `ALTER TABLE RENAME COLUMN`). The migration correctly handles both upgrade and downgrade paths. --- ### Test Coverage Assessment | Area | Coverage | Notes | |------|----------|-------| | Model field rename (all 11 fields) | Covered | BDD scenarios for valid/invalid thresholds on all fields | | Built-in profiles (8 profiles) | Covered | Scenarios verify key thresholds for each profile | | Legacy field rejection | Covered | New scenario "Add profile with legacy threshold keys fails" | | YAML schema + model parity | Covered | New scenario "Guarded profile config passes docs schema and model" | | Repository roundtrip (all 11 fields) | Covered | Extended assertion in roundtrip test | | CLI show/list/add/remove | Covered | Existing scenarios updated to new field names | | Semantic escalation with new names | Covered | All escalation scenarios use `create_tool` | | Benchmark compatibility | Covered | Benchmarks updated with `SafetyProfile` sub-model | | Robot integration (E2E) | Covered | m6_acceptance.robot updated | **No test coverage gaps identified for the changes in scope.** --- ### Acceptance Criteria Verification (Issue #902) | Criterion | Status | Notes | |-----------|--------|-------| | All 11 task flag field names match spec | PASS | All match Automatable Tasks table | | All 8 built-in profiles correct values | PASS | All values match spec threshold table | | YAML configs use spec field names | PASS | Schema, examples, and tests updated | | `automation-profile show` displays spec names | PASS | Rich/JSON/YAML/Plain outputs use new names | | Profile resolution logic works | PASS | `_resolve_profile_for_plan` unchanged, tests pass | | AutonomyController references updated | PASS | `_get_threshold` compatible via `getattr` | | Database migration renames columns | PASS | Alembic m5_001 migration provided | --- ### Verdict This is a thorough, well-structured refactoring. The core rename is correct and consistent across all layers. The findings above are primarily **spec-alignment refinements** (CLI output grouping) and **defensive-coding suggestions** (operation_type validation), none of which block merge. The two medium findings (F1, F2) about CLI output structure vs spec deserve attention but may be deferred to a follow-up issue if the current flat structure is preferred for technical reasons.
CoreRasurae force-pushed feature/m5-automation-profile-fields from 53c8d0d86c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 45s
CI / build (pull_request) Successful in 31s
CI / lint (pull_request) Successful in 3m23s
CI / quality (pull_request) Successful in 3m44s
CI / unit_tests (pull_request) Successful in 3m47s
CI / security (pull_request) Successful in 4m1s
CI / docker (pull_request) Successful in 1m15s
CI / integration_tests (pull_request) Successful in 6m49s
CI / e2e_tests (pull_request) Successful in 8m24s
CI / coverage (pull_request) Successful in 11m4s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
to bd277b9469
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 41s
CI / lint (pull_request) Successful in 3m51s
CI / typecheck (pull_request) Successful in 4m28s
CI / quality (pull_request) Successful in 3m59s
CI / security (pull_request) Successful in 4m38s
CI / unit_tests (pull_request) Successful in 7m29s
CI / docker (pull_request) Successful in 1m9s
CI / integration_tests (pull_request) Successful in 8m7s
CI / e2e_tests (pull_request) Successful in 10m22s
CI / coverage (pull_request) Successful in 12m44s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m30s
2026-03-28 00:00:12 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1147 (Issue #902)

refactor(autonomy): rename automation profile task flags to spec names
Branch: feature/m5-automation-profile-fields | Author: Luis Mendes (@CoreRasurae)
Review scope: All code changes in the branch + close connections to surrounding code
Review cycles: 3 full passes (bugs, security, performance, test coverage, test quality, documentation)


Executive Summary

The rename of all 11 automation profile threshold fields is mechanically correct — all 88 built-in profile threshold values match the specification table exactly, the Alembic migration is SQLite-safe and reversible, the legacy field validator is comprehensive, and no stale references to old auto_* field names remain in production code. The extra="forbid" on AutomationProfile provides a strong safety net against typos.

The main areas of concern are: (1) documentation inconsistencies introduced or carried forward by the PR across ADRs and reference docs, (2) test coverage that only verifies 3 of 11 threshold fields in most assertion paths, and (3) a processing-state guard asymmetry between the two reversion methods.

No security issues found. YAML loading uses safe_load, the migration uses parameterized Alembic API calls, and the model rejects unknown fields.


Findings by Severity

HIGH — Documentation Inconsistencies (within PR scope)

H1. ADR-017 built-in profiles summary text is inaccurate
docs/adr/ADR-017-automation-profiles.md:98-101

Three profile rows have misleading "Key Settings" summaries:

Profile ADR-017 claims Actual (spec table + code)
supervised select_tool: 1.0, others low 7 fields at 1.0 (create_tool, execute_command, create_file, delete_content, access_network, install_dependency, approve_plan)
trusted Low thresholds (~0.3) 4 fields at 1.0 (select_tool, delete_content, access_network, approve_plan)
auto All thresholds 0.0 2 fields at 1.0 (select_tool=1.0, access_network=1.0)

These summaries predate the rename but were not corrected when the field names were updated.

H2. Default profile name inconsistent across 3 documents

Document Claimed Default
docs/reference/automation_profile_service.md:19 manual
docs/adr/ADR-017-automation-profiles.md:126 supervised
docs/reference/automation_profiles.md:95 review

The code truth (_DEFAULT_PROFILE = "manual") matches only the service doc. The other two are wrong.

H3. Resolution precedence in automation_profiles.md is wrong
docs/reference/automation_profiles.md:90-95

The doc describes: Plan → Project → Organization → System default (review).
The code and spec use: Plan → Action → Project → Global default (manual).
The Action level is entirely missing and "Organization" is fabricated.


MEDIUM — Code Quality

M1. try_auto_revert_from_execute missing processing-state guard
src/cleveragents/application/services/plan_lifecycle_service.py:1787

try_auto_revert_from_apply (line 1737) guards on plan.processing_state != ProcessingState.CONSTRAINED before proceeding. The symmetric try_auto_revert_from_execute only checks plan.phase != PlanPhase.EXECUTE — it will attempt reversion on a plan in any processing state (QUEUED, PROCESSING, COMPLETE, ERRORED). A caller could invoke this on a plan that is happily PROCESSING in Execute and it would revert it. Consider whether a processing-state guard (e.g., requiring ERRORED or CONSTRAINED) is needed here.

M2. Schema sub-objects missing additionalProperties: false
docs/schema/automation_profile.schema.yaml:97-130, 132-174

The top-level schema correctly has additionalProperties: false (matching extra="forbid" on the Pydantic model), but the guards and safety sub-objects lack this constraint. This creates an asymmetry where the top level strictly rejects unknown fields but sub-objects silently accept them.

M3. auto profile description omits access_network=1.0 gate

Three locations describe the auto profile inaccurately:

Location Description Missing
Code automation_profile.py:481 "Fully automatic except apply" access_network=1.0 (Apply→Strategize reversion also manual)
examples/profiles/auto.yaml:6 "Fully automatic except apply and reversion" Closer but inconsistent with code
Spec prose (line 28406) "All thresholds 0.0 except Apply (1.0)" access_network=1.0

The spec's own prose contradicts its own table. The auto.yaml example is actually the most accurate of the three but diverges from the code's description string.


MEDIUM — Test Coverage Gaps

M4. Escalation test coverage: only 1 of 11 operations tested
features/semantic_escalation.feature:37-51

The Scenario Outline evaluates all 8 profiles but only for operation "create_tool". The other 10 task-type operations (decompose_task, select_tool, edit_code, execute_command, create_file, delete_content, access_network, install_dependency, modify_config, approve_plan) are never tested against profile thresholds. If getattr(profile, op, 1.0) fails for a renamed field, that bug is invisible.

M5. Legacy validator test only checks 1 of 11 keys
features/automation_profile_cli.feature:49-53

The "Add profile with legacy threshold keys fails" scenario only asserts auto_strategize appears in the error output. It does not verify: the error shows the correct new name (decompose_task), all 11 legacy keys are detected, or the mapping advice is present. A validator that catches only one legacy key and silently ignores the other 10 would pass this test.

M6. No boundary-value tests for 10 of 11 fields
features/consolidated_automation_profile.feature

Only decompose_task gets full boundary treatment (0.0, 0.5, 1.0 accepted; -0.1 and 1.1 rejected). The other 10 fields get single-sided rejection tests only. For example:

  • approve_plan only tested for < 0.0, not > 1.0
  • select_tool only tested for < 0.0, not > 1.0
  • modify_config only tested for > 1.0, not < 0.0

M7. No test for mixed old + new field names

No scenario tests what happens when a config contains BOTH old and new names (e.g., auto_strategize: 0.5 AND decompose_task: 0.8). The model_validator checks for legacy keys in values but doesn't handle the mixed case explicitly.

M8. Robot/service-level tests validate only 3 of 11 thresholds

  • robot/helper_automation_profile.py:35-37_test_profile() asserts range only for decompose_task, create_tool, select_tool
  • features/steps/automation_profile_service_steps.py — assertion steps exist only for select_tool, decompose_task, create_tool
  • robot/e2e/m6_acceptance.robot:170-172 — show test checks only 3 field names

If the rename broke any of the other 8 fields, these tests would still pass.

M9. M6 fixture files only specify 3 of 11 thresholds
features/fixtures/m6/automation_profiles.json, features/fixtures/m6/autonomy_guardrails.json

manual_thresholds and full_auto_thresholds only list decompose_task, create_tool, select_tool + safety fields. The other 8 threshold fields are absent from fixture validation. Guard profiles also only set the same 3 fields.


LOW — Test Quality / Isolation

L1. Silent getattr fallback masks bugs in escalation logic
features/steps/semantic_escalation_steps.py:145

threshold = getattr(profile, op, 1.0)

The default=1.0 means a typo or stale field name silently falls back to 1.0 (manual). For the manual profile this is indistinguishable from correct behavior.

L2. CLI output assertion case-sensitivity inconsistency
features/steps/automation_profile_cli_steps.py:510,539,548

The generic "output should contain" step is case-insensitive while JSON/YAML-specific assertions are case-sensitive. Tests verifying exact field names through the generic step are unreliable.

L3. No test that model_dump()/serialization uses new names exclusively
features/steps/automation_profile_steps.py:508-528

The dump test only verifies name and select_tool keys exist. It doesn't verify all 11 new names are present or that no old names appear.

L4. Threshold summary assertion only checks 3 of 11 fields
features/steps/automation_profile_cli_coverage_steps.py:119-123

Only decompose_task=, create_tool=, select_tool= are checked. If the summary omits the other 8, the test still passes.

L5. Robot guard enforcement uses overly loose OR-chain
robot/e2e/m6_acceptance.robot:96

The has_profile check passes if any one of 7 keywords appears anywhere in stdout+stderr. An unrelated error message containing automation_profile would satisfy it.

L6. _create_in_memory_profile_service() duplicated 3 times

The same helper is copy-pasted in automation_profile_cli_steps.py:21-38, automation_profile_cli_coverage_steps.py:37-54, and automation_profile_cli_coverage_boost_steps.py:27-44. Maintenance risk if the service setup changes.

L7. Test step function names still use old-era naming
features/steps/automation_profile_steps.py — 16 functions (e.g., step_try_create_profile_auto_dec_strat instead of referencing edit_code). Not a failure risk but misleading for readers.

L8. Feature file step text references old semantics
features/automation_profile_cli_coverage.feature:15 — Says "strategize execute and apply values" but the assertion body checks decompose_task=, create_tool=, select_tool=.


LOW — Performance / Benchmarks

L9. from_config benchmark uses only 3 of 11 thresholds
benchmarks/automation_profile_bench.py:114-120

The config dict only specifies decompose_task, create_tool, select_tool. The remaining 8 fields take Pydantic defaults. This doesn't measure the cost of validating a fully-populated config (the common real-world case).

L10. CLI benchmark double-calls _patch_service()
benchmarks/automation_profile_cli_bench.py:106,117

_patch_service() is called in both setup() and time_add_from_config(). The per-iteration call contaminates timing measurements with service construction overhead.


LOW — Code Quality

L11. validate_threshold validator is redundant
src/cleveragents/domain/models/core/automation_profile.py:210-228

The @field_validator checking 0.0 <= v <= 1.0 is redundant with the ge=0.0, le=1.0 constraints already on each Field(). Pydantic validates constraints before field validators run.

L12. Migration does not specify existing_type
alembic/versions/m5_001_rename_automation_profile_fields.py:42-43

alter_column() omits existing_type=sa.Float(). Safe for SQLite (batch mode infers from existing column) but could cause issues on other DB backends.


INFO — Observations

I1. Specification uses task-type names for phase-transition gates by design

The spec (lines 28333-28343) explicitly maps create_tool → Strategize→Execute gate, select_tool → Execute→Apply gate, access_network → Apply→Strategize reversion, delete_content → Execute→Strategize reversion. The bridge comments in plan_lifecycle_service.py:1505-1511 correctly document this. The semantic disconnect between field names and their actual behavioral meaning is a specification design choice, not a code bug.

I2. 7 of 11 thresholds are unused in plan_lifecycle_service.py

Only create_tool, select_tool, access_network, and delete_content are read by the lifecycle service. The other 7 (decompose_task, edit_code, execute_command, create_file, install_dependency, modify_config, approve_plan) are presumably consumed by actor execution, the AutonomyController, or other services. This is fine but worth documenting.

I3. No stale references to old field names in production code

A full codebase sweep confirmed all occurrences of old auto_* names exist only in intentional locations: _LEGACY_FIELD_MAP (migration detection), Alembic _COLUMN_RENAMES (DB migration), and _LEGACY_THRESHOLD_YAML (test fixture for rejection testing). The unrelated ProjectSettings.auto_apply and PlanConfig.auto_apply are different models and not affected.

I4. All 88 threshold values across 8 profiles match the spec table exactly

Verified field-by-field. The code is faithfully implementing the specification.


Summary Table

Severity Count Categories
HIGH 3 Documentation
MEDIUM 9 Code quality (2), Test coverage (7)
LOW 12 Test quality (6), Performance (2), Code quality (2), Documentation (2)
INFO 4 Observations
Total 28
  1. Fix H1-H3 — Documentation inconsistencies are actively misleading. H2 (wrong default profile) and H3 (wrong precedence chain) could cause users to misconfigure their profiles.
  2. Address M1 — The try_auto_revert_from_execute processing-state guard asymmetry is a potential logic bug that could cause unintended reversions.
  3. Improve M4-M9 — Test coverage breadth. Most test paths only verify 3 of 11 fields, providing a thin safety net for future regressions on the other 8 fields.
  4. Low/Info items — Address at discretion. None are blockers.
# Code Review Report — PR #1147 (Issue #902) **refactor(autonomy): rename automation profile task flags to spec names** **Branch:** `feature/m5-automation-profile-fields` | **Author:** Luis Mendes (@CoreRasurae) **Review scope:** All code changes in the branch + close connections to surrounding code **Review cycles:** 3 full passes (bugs, security, performance, test coverage, test quality, documentation) --- ## Executive Summary The rename of all 11 automation profile threshold fields is **mechanically correct** — all 88 built-in profile threshold values match the specification table exactly, the Alembic migration is SQLite-safe and reversible, the legacy field validator is comprehensive, and no stale references to old `auto_*` field names remain in production code. The `extra="forbid"` on `AutomationProfile` provides a strong safety net against typos. The main areas of concern are: (1) documentation inconsistencies introduced or carried forward by the PR across ADRs and reference docs, (2) test coverage that only verifies 3 of 11 threshold fields in most assertion paths, and (3) a processing-state guard asymmetry between the two reversion methods. **No security issues found.** YAML loading uses `safe_load`, the migration uses parameterized Alembic API calls, and the model rejects unknown fields. --- ## Findings by Severity ### HIGH — Documentation Inconsistencies (within PR scope) **H1. ADR-017 built-in profiles summary text is inaccurate** `docs/adr/ADR-017-automation-profiles.md:98-101` Three profile rows have misleading "Key Settings" summaries: | Profile | ADR-017 claims | Actual (spec table + code) | |---------|---------------|---------------------------| | `supervised` | `select_tool: 1.0`, others low | **7 fields** at 1.0 (`create_tool`, `execute_command`, `create_file`, `delete_content`, `access_network`, `install_dependency`, `approve_plan`) | | `trusted` | Low thresholds (~0.3) | **4 fields** at 1.0 (`select_tool`, `delete_content`, `access_network`, `approve_plan`) | | `auto` | All thresholds 0.0 | **2 fields** at 1.0 (`select_tool=1.0`, `access_network=1.0`) | These summaries predate the rename but were not corrected when the field names were updated. **H2. Default profile name inconsistent across 3 documents** | Document | Claimed Default | |----------|----------------| | `docs/reference/automation_profile_service.md:19` | `manual` | | `docs/adr/ADR-017-automation-profiles.md:126` | `supervised` | | `docs/reference/automation_profiles.md:95` | `review` | The code truth (`_DEFAULT_PROFILE = "manual"`) matches only the service doc. The other two are wrong. **H3. Resolution precedence in `automation_profiles.md` is wrong** `docs/reference/automation_profiles.md:90-95` The doc describes: Plan → Project → Organization → System default (`review`). The code and spec use: Plan → Action → Project → Global default (`manual`). The Action level is entirely missing and "Organization" is fabricated. --- ### MEDIUM — Code Quality **M1. `try_auto_revert_from_execute` missing processing-state guard** `src/cleveragents/application/services/plan_lifecycle_service.py:1787` `try_auto_revert_from_apply` (line 1737) guards on `plan.processing_state != ProcessingState.CONSTRAINED` before proceeding. The symmetric `try_auto_revert_from_execute` only checks `plan.phase != PlanPhase.EXECUTE` — it will attempt reversion on a plan in **any** processing state (QUEUED, PROCESSING, COMPLETE, ERRORED). A caller could invoke this on a plan that is happily PROCESSING in Execute and it would revert it. Consider whether a processing-state guard (e.g., requiring ERRORED or CONSTRAINED) is needed here. **M2. Schema sub-objects missing `additionalProperties: false`** `docs/schema/automation_profile.schema.yaml:97-130, 132-174` The top-level schema correctly has `additionalProperties: false` (matching `extra="forbid"` on the Pydantic model), but the `guards` and `safety` sub-objects lack this constraint. This creates an asymmetry where the top level strictly rejects unknown fields but sub-objects silently accept them. **M3. `auto` profile description omits `access_network=1.0` gate** Three locations describe the `auto` profile inaccurately: | Location | Description | Missing | |----------|-------------|---------| | Code `automation_profile.py:481` | "Fully automatic except apply" | `access_network=1.0` (Apply→Strategize reversion also manual) | | `examples/profiles/auto.yaml:6` | "Fully automatic except apply and reversion" | Closer but inconsistent with code | | Spec prose (line 28406) | "All thresholds 0.0 except Apply (1.0)" | `access_network=1.0` | The spec's own prose contradicts its own table. The `auto.yaml` example is actually the most accurate of the three but diverges from the code's description string. --- ### MEDIUM — Test Coverage Gaps **M4. Escalation test coverage: only 1 of 11 operations tested** `features/semantic_escalation.feature:37-51` The Scenario Outline evaluates all 8 profiles but only for `operation "create_tool"`. The other 10 task-type operations (`decompose_task`, `select_tool`, `edit_code`, `execute_command`, `create_file`, `delete_content`, `access_network`, `install_dependency`, `modify_config`, `approve_plan`) are never tested against profile thresholds. If `getattr(profile, op, 1.0)` fails for a renamed field, that bug is invisible. **M5. Legacy validator test only checks 1 of 11 keys** `features/automation_profile_cli.feature:49-53` The "Add profile with legacy threshold keys fails" scenario only asserts `auto_strategize` appears in the error output. It does not verify: the error shows the correct new name (`decompose_task`), all 11 legacy keys are detected, or the mapping advice is present. A validator that catches only one legacy key and silently ignores the other 10 would pass this test. **M6. No boundary-value tests for 10 of 11 fields** `features/consolidated_automation_profile.feature` Only `decompose_task` gets full boundary treatment (0.0, 0.5, 1.0 accepted; -0.1 and 1.1 rejected). The other 10 fields get single-sided rejection tests only. For example: - `approve_plan` only tested for `< 0.0`, not `> 1.0` - `select_tool` only tested for `< 0.0`, not `> 1.0` - `modify_config` only tested for `> 1.0`, not `< 0.0` **M7. No test for mixed old + new field names** No scenario tests what happens when a config contains BOTH old and new names (e.g., `auto_strategize: 0.5` AND `decompose_task: 0.8`). The model_validator checks for legacy keys in `values` but doesn't handle the mixed case explicitly. **M8. Robot/service-level tests validate only 3 of 11 thresholds** - `robot/helper_automation_profile.py:35-37` — `_test_profile()` asserts range only for `decompose_task`, `create_tool`, `select_tool` - `features/steps/automation_profile_service_steps.py` — assertion steps exist only for `select_tool`, `decompose_task`, `create_tool` - `robot/e2e/m6_acceptance.robot:170-172` — show test checks only 3 field names If the rename broke any of the other 8 fields, these tests would still pass. **M9. M6 fixture files only specify 3 of 11 thresholds** `features/fixtures/m6/automation_profiles.json`, `features/fixtures/m6/autonomy_guardrails.json` `manual_thresholds` and `full_auto_thresholds` only list `decompose_task`, `create_tool`, `select_tool` + safety fields. The other 8 threshold fields are absent from fixture validation. Guard profiles also only set the same 3 fields. --- ### LOW — Test Quality / Isolation **L1. Silent `getattr` fallback masks bugs in escalation logic** `features/steps/semantic_escalation_steps.py:145` ```python threshold = getattr(profile, op, 1.0) ``` The `default=1.0` means a typo or stale field name silently falls back to 1.0 (manual). For the `manual` profile this is indistinguishable from correct behavior. **L2. CLI output assertion case-sensitivity inconsistency** `features/steps/automation_profile_cli_steps.py:510,539,548` The generic "output should contain" step is case-insensitive while JSON/YAML-specific assertions are case-sensitive. Tests verifying exact field names through the generic step are unreliable. **L3. No test that `model_dump()`/serialization uses new names exclusively** `features/steps/automation_profile_steps.py:508-528` The dump test only verifies `name` and `select_tool` keys exist. It doesn't verify all 11 new names are present or that no old names appear. **L4. Threshold summary assertion only checks 3 of 11 fields** `features/steps/automation_profile_cli_coverage_steps.py:119-123` Only `decompose_task=`, `create_tool=`, `select_tool=` are checked. If the summary omits the other 8, the test still passes. **L5. Robot guard enforcement uses overly loose OR-chain** `robot/e2e/m6_acceptance.robot:96` The `has_profile` check passes if **any one** of 7 keywords appears **anywhere** in stdout+stderr. An unrelated error message containing `automation_profile` would satisfy it. **L6. `_create_in_memory_profile_service()` duplicated 3 times** The same helper is copy-pasted in `automation_profile_cli_steps.py:21-38`, `automation_profile_cli_coverage_steps.py:37-54`, and `automation_profile_cli_coverage_boost_steps.py:27-44`. Maintenance risk if the service setup changes. **L7. Test step function names still use old-era naming** `features/steps/automation_profile_steps.py` — 16 functions (e.g., `step_try_create_profile_auto_dec_strat` instead of referencing `edit_code`). Not a failure risk but misleading for readers. **L8. Feature file step text references old semantics** `features/automation_profile_cli_coverage.feature:15` — Says "strategize execute and apply values" but the assertion body checks `decompose_task=`, `create_tool=`, `select_tool=`. --- ### LOW — Performance / Benchmarks **L9. `from_config` benchmark uses only 3 of 11 thresholds** `benchmarks/automation_profile_bench.py:114-120` The config dict only specifies `decompose_task`, `create_tool`, `select_tool`. The remaining 8 fields take Pydantic defaults. This doesn't measure the cost of validating a fully-populated config (the common real-world case). **L10. CLI benchmark double-calls `_patch_service()`** `benchmarks/automation_profile_cli_bench.py:106,117` `_patch_service()` is called in both `setup()` and `time_add_from_config()`. The per-iteration call contaminates timing measurements with service construction overhead. --- ### LOW — Code Quality **L11. `validate_threshold` validator is redundant** `src/cleveragents/domain/models/core/automation_profile.py:210-228` The `@field_validator` checking `0.0 <= v <= 1.0` is redundant with the `ge=0.0, le=1.0` constraints already on each `Field()`. Pydantic validates constraints before field validators run. **L12. Migration does not specify `existing_type`** `alembic/versions/m5_001_rename_automation_profile_fields.py:42-43` `alter_column()` omits `existing_type=sa.Float()`. Safe for SQLite (batch mode infers from existing column) but could cause issues on other DB backends. --- ### INFO — Observations **I1. Specification uses task-type names for phase-transition gates by design** The spec (lines 28333-28343) explicitly maps `create_tool` → Strategize→Execute gate, `select_tool` → Execute→Apply gate, `access_network` → Apply→Strategize reversion, `delete_content` → Execute→Strategize reversion. The bridge comments in `plan_lifecycle_service.py:1505-1511` correctly document this. The semantic disconnect between field names and their actual behavioral meaning is a specification design choice, not a code bug. **I2. 7 of 11 thresholds are unused in `plan_lifecycle_service.py`** Only `create_tool`, `select_tool`, `access_network`, and `delete_content` are read by the lifecycle service. The other 7 (`decompose_task`, `edit_code`, `execute_command`, `create_file`, `install_dependency`, `modify_config`, `approve_plan`) are presumably consumed by actor execution, the AutonomyController, or other services. This is fine but worth documenting. **I3. No stale references to old field names in production code** A full codebase sweep confirmed all occurrences of old `auto_*` names exist only in intentional locations: `_LEGACY_FIELD_MAP` (migration detection), Alembic `_COLUMN_RENAMES` (DB migration), and `_LEGACY_THRESHOLD_YAML` (test fixture for rejection testing). The unrelated `ProjectSettings.auto_apply` and `PlanConfig.auto_apply` are different models and not affected. **I4. All 88 threshold values across 8 profiles match the spec table exactly** Verified field-by-field. The code is faithfully implementing the specification. --- ## Summary Table | Severity | Count | Categories | |----------|-------|-----------| | **HIGH** | 3 | Documentation | | **MEDIUM** | 9 | Code quality (2), Test coverage (7) | | **LOW** | 12 | Test quality (6), Performance (2), Code quality (2), Documentation (2) | | **INFO** | 4 | Observations | | **Total** | 28 | | ## Recommended Priority 1. **Fix H1-H3** — Documentation inconsistencies are actively misleading. H2 (wrong default profile) and H3 (wrong precedence chain) could cause users to misconfigure their profiles. 2. **Address M1** — The `try_auto_revert_from_execute` processing-state guard asymmetry is a potential logic bug that could cause unintended reversions. 3. **Improve M4-M9** — Test coverage breadth. Most test paths only verify 3 of 11 fields, providing a thin safety net for future regressions on the other 8 fields. 4. **Low/Info items** — Address at discretion. None are blockers.
CoreRasurae force-pushed feature/m5-automation-profile-fields from bd277b9469
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 41s
CI / lint (pull_request) Successful in 3m51s
CI / typecheck (pull_request) Successful in 4m28s
CI / quality (pull_request) Successful in 3m59s
CI / security (pull_request) Successful in 4m38s
CI / unit_tests (pull_request) Successful in 7m29s
CI / docker (pull_request) Successful in 1m9s
CI / integration_tests (pull_request) Successful in 8m7s
CI / e2e_tests (pull_request) Successful in 10m22s
CI / coverage (pull_request) Successful in 12m44s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m30s
to 3b62bd989f
Some checks failed
CI / lint (pull_request) Successful in 27s
CI / typecheck (pull_request) Successful in 57s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / helm (pull_request) Successful in 30s
CI / security (pull_request) Successful in 4m10s
CI / quality (pull_request) Successful in 3m45s
CI / unit_tests (pull_request) Failing after 4m13s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 4m9s
CI / e2e_tests (pull_request) Failing after 9m34s
CI / coverage (pull_request) Successful in 11m56s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 09:42:44 +00:00
Compare
CoreRasurae force-pushed feature/m5-automation-profile-fields from 3b62bd989f
Some checks failed
CI / lint (pull_request) Successful in 27s
CI / typecheck (pull_request) Successful in 57s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / helm (pull_request) Successful in 30s
CI / security (pull_request) Successful in 4m10s
CI / quality (pull_request) Successful in 3m45s
CI / unit_tests (pull_request) Failing after 4m13s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Failing after 4m9s
CI / e2e_tests (pull_request) Failing after 9m34s
CI / coverage (pull_request) Successful in 11m56s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Has been cancelled
to d8198457cd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 19s
CI / helm (pull_request) Successful in 48s
CI / lint (pull_request) Successful in 3m32s
CI / quality (pull_request) Successful in 3m53s
CI / typecheck (pull_request) Successful in 4m8s
CI / security (pull_request) Successful in 4m9s
CI / integration_tests (pull_request) Failing after 6m35s
CI / unit_tests (pull_request) Failing after 6m58s
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Failing after 11m4s
CI / coverage (pull_request) Successful in 12m21s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Successful in 58m56s
2026-03-28 10:04:25 +00:00
Compare
CoreRasurae force-pushed feature/m5-automation-profile-fields from d8198457cd
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 19s
CI / helm (pull_request) Successful in 48s
CI / lint (pull_request) Successful in 3m32s
CI / quality (pull_request) Successful in 3m53s
CI / typecheck (pull_request) Successful in 4m8s
CI / security (pull_request) Successful in 4m9s
CI / integration_tests (pull_request) Failing after 6m35s
CI / unit_tests (pull_request) Failing after 6m58s
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Failing after 11m4s
CI / coverage (pull_request) Successful in 12m21s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Successful in 58m56s
to f9b4986b2a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 56s
CI / lint (pull_request) Successful in 3m21s
CI / build (pull_request) Successful in 15s
CI / helm (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 4m15s
CI / security (pull_request) Successful in 4m17s
CI / unit_tests (pull_request) Successful in 4m37s
CI / docker (pull_request) Successful in 1m26s
CI / integration_tests (pull_request) Successful in 7m3s
CI / e2e_tests (pull_request) Failing after 9m30s
CI / coverage (pull_request) Successful in 8m24s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 17:15:08 +00:00
Compare
CoreRasurae force-pushed feature/m5-automation-profile-fields from f9b4986b2a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 56s
CI / lint (pull_request) Successful in 3m21s
CI / build (pull_request) Successful in 15s
CI / helm (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 4m15s
CI / security (pull_request) Successful in 4m17s
CI / unit_tests (pull_request) Successful in 4m37s
CI / docker (pull_request) Successful in 1m26s
CI / integration_tests (pull_request) Successful in 7m3s
CI / e2e_tests (pull_request) Failing after 9m30s
CI / coverage (pull_request) Successful in 8m24s
CI / status-check (pull_request) Failing after 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 0cf18592b7
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 57s
CI / quality (pull_request) Successful in 3m42s
CI / build (pull_request) Successful in 15s
CI / security (pull_request) Successful in 4m14s
CI / integration_tests (pull_request) Successful in 3m59s
CI / helm (pull_request) Successful in 21s
CI / unit_tests (pull_request) Successful in 4m35s
CI / docker (pull_request) Successful in 1m22s
CI / e2e_tests (pull_request) Successful in 9m26s
CI / coverage (pull_request) Successful in 11m27s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 58m23s
2026-03-29 17:31:07 +00:00
Compare
freemo approved these changes 2026-03-30 04:22:03 +00:00
Dismissed
freemo left a comment

Review: APPROVED with Comments

Comprehensive rename refactor touching 62 files — high risk but methodically executed.

Notes

  1. Commit scope is very large: 62 files across model, DB, CLI, docs, tests, and examples. Per CONTRIBUTING.md §Atomic Commits, consider splitting into: 1) model + migration, 2) service/CLI updates, 3) doc updates, 4) test updates.
  2. Good: extra="forbid" with legacy key detection provides a user-friendly migration path with actionable error messages.
  3. Good: Alembic merge migration correctly resolves dual-head situation.
  4. Good: Specification, ADRs, reference docs, schema YAML, and example profiles all updated consistently.
  5. The rename semantics (e.g., auto_reversion_from_apply -> access_network) may be non-obvious — ensure the mapping is documented.
## Review: APPROVED with Comments Comprehensive rename refactor touching 62 files — high risk but methodically executed. ### Notes 1. **Commit scope is very large**: 62 files across model, DB, CLI, docs, tests, and examples. Per CONTRIBUTING.md §Atomic Commits, consider splitting into: 1) model + migration, 2) service/CLI updates, 3) doc updates, 4) test updates. 2. **Good**: `extra="forbid"` with legacy key detection provides a user-friendly migration path with actionable error messages. 3. **Good**: Alembic merge migration correctly resolves dual-head situation. 4. **Good**: Specification, ADRs, reference docs, schema YAML, and example profiles all updated consistently. 5. The rename semantics (e.g., `auto_reversion_from_apply` -> `access_network`) may be non-obvious — ensure the mapping is documented.
CoreRasurae force-pushed feature/m5-automation-profile-fields from 0cf18592b7
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 24s
CI / typecheck (pull_request) Successful in 57s
CI / quality (pull_request) Successful in 3m42s
CI / build (pull_request) Successful in 15s
CI / security (pull_request) Successful in 4m14s
CI / integration_tests (pull_request) Successful in 3m59s
CI / helm (pull_request) Successful in 21s
CI / unit_tests (pull_request) Successful in 4m35s
CI / docker (pull_request) Successful in 1m22s
CI / e2e_tests (pull_request) Successful in 9m26s
CI / coverage (pull_request) Successful in 11m27s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 58m23s
to 3501f1b6f0
Some checks failed
CI / lint (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 47s
CI / quality (pull_request) Successful in 3m44s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 4m14s
CI / integration_tests (pull_request) Successful in 4m3s
CI / helm (pull_request) Successful in 30s
CI / unit_tests (pull_request) Successful in 4m43s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 9m52s
CI / e2e_tests (pull_request) Successful in 16m58s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 09:17:47 +00:00
Compare
CoreRasurae dismissed freemo's review 2026-03-30 09:17:47 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae force-pushed feature/m5-automation-profile-fields from 3501f1b6f0
Some checks failed
CI / lint (pull_request) Successful in 23s
CI / typecheck (pull_request) Successful in 47s
CI / quality (pull_request) Successful in 3m44s
CI / build (pull_request) Successful in 24s
CI / security (pull_request) Successful in 4m14s
CI / integration_tests (pull_request) Successful in 4m3s
CI / helm (pull_request) Successful in 30s
CI / unit_tests (pull_request) Successful in 4m43s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 9m52s
CI / e2e_tests (pull_request) Successful in 16m58s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
to 5969cf4980
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / build (pull_request) Successful in 25s
CI / security (pull_request) Successful in 1m17s
CI / helm (pull_request) Successful in 36s
CI / e2e_tests (pull_request) Has been cancelled
CI / quality (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-30 10:23:04 +00:00
Compare
CoreRasurae force-pushed feature/m5-automation-profile-fields from 5969cf4980
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / build (pull_request) Successful in 25s
CI / security (pull_request) Successful in 1m17s
CI / helm (pull_request) Successful in 36s
CI / e2e_tests (pull_request) Has been cancelled
CI / quality (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to fe9f8ed536
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / helm (pull_request) Successful in 21s
CI / security (pull_request) Successful in 1m0s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m56s
CI / integration_tests (pull_request) Successful in 7m17s
CI / unit_tests (pull_request) Successful in 8m19s
CI / docker (pull_request) Successful in 1m41s
CI / e2e_tests (pull_request) Successful in 18m25s
CI / coverage (pull_request) Successful in 13m31s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 10:26:44 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 10:26:59 +00:00
CoreRasurae force-pushed feature/m5-automation-profile-fields from fe9f8ed536
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / helm (pull_request) Successful in 21s
CI / security (pull_request) Successful in 1m0s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m56s
CI / integration_tests (pull_request) Successful in 7m17s
CI / unit_tests (pull_request) Successful in 8m19s
CI / docker (pull_request) Successful in 1m41s
CI / e2e_tests (pull_request) Successful in 18m25s
CI / coverage (pull_request) Successful in 13m31s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 577466414c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 50s
CI / quality (pull_request) Successful in 44s
CI / security (pull_request) Successful in 54s
CI / build (pull_request) Successful in 14s
CI / helm (pull_request) Successful in 32s
CI / lint (pull_request) Successful in 3m19s
CI / integration_tests (pull_request) Successful in 3m58s
CI / unit_tests (pull_request) Successful in 8m7s
CI / docker (pull_request) Successful in 1m51s
CI / coverage (pull_request) Successful in 11m52s
CI / e2e_tests (pull_request) Successful in 18m36s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 10:58:12 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 10:58:29 +00:00
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 11:36:21 +00:00
CoreRasurae force-pushed feature/m5-automation-profile-fields from 577466414c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 50s
CI / quality (pull_request) Successful in 44s
CI / security (pull_request) Successful in 54s
CI / build (pull_request) Successful in 14s
CI / helm (pull_request) Successful in 32s
CI / lint (pull_request) Successful in 3m19s
CI / integration_tests (pull_request) Successful in 3m58s
CI / unit_tests (pull_request) Successful in 8m7s
CI / docker (pull_request) Successful in 1m51s
CI / coverage (pull_request) Successful in 11m52s
CI / e2e_tests (pull_request) Successful in 18m36s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 23da21e97e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 19s
CI / helm (pull_request) Successful in 21s
CI / security (pull_request) Successful in 58s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m55s
CI / unit_tests (pull_request) Successful in 4m14s
CI / docker (pull_request) Successful in 1m44s
CI / integration_tests (pull_request) Successful in 7m7s
CI / coverage (pull_request) Successful in 8m49s
CI / e2e_tests (pull_request) Successful in 19m50s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-30 11:36:45 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 11:37:04 +00:00
CoreRasurae force-pushed feature/m5-automation-profile-fields from 23da21e97e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 19s
CI / helm (pull_request) Successful in 21s
CI / security (pull_request) Successful in 58s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m55s
CI / unit_tests (pull_request) Successful in 4m14s
CI / docker (pull_request) Successful in 1m44s
CI / integration_tests (pull_request) Successful in 7m7s
CI / coverage (pull_request) Successful in 8m49s
CI / e2e_tests (pull_request) Successful in 19m50s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 007af498b8
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / helm (pull_request) Successful in 33s
CI / lint (pull_request) Successful in 3m42s
CI / security (pull_request) Successful in 4m8s
CI / quality (pull_request) Successful in 4m9s
CI / typecheck (pull_request) Successful in 4m20s
CI / integration_tests (pull_request) Successful in 7m2s
CI / unit_tests (pull_request) Successful in 7m55s
CI / docker (pull_request) Successful in 1m19s
CI / coverage (pull_request) Successful in 8m46s
CI / e2e_tests (pull_request) Successful in 16m1s
CI / status-check (pull_request) Successful in 1s
CI / build (push) Successful in 17s
CI / helm (push) Successful in 22s
CI / quality (push) Successful in 31s
CI / lint (push) Successful in 3m28s
CI / typecheck (push) Successful in 3m54s
CI / benchmark-regression (push) Has been skipped
CI / security (push) Successful in 4m5s
CI / integration_tests (push) Successful in 6m13s
CI / unit_tests (push) Successful in 6m28s
CI / docker (push) Successful in 1m34s
CI / coverage (push) Successful in 12m5s
CI / e2e_tests (push) Successful in 18m47s
CI / status-check (push) Successful in 1s
CI / benchmark-publish (push) Successful in 28m0s
CI / benchmark-regression (pull_request) Successful in 59m48s
2026-03-30 12:18:16 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 12:18:32 +00:00
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-30 12:18:52 +00:00
CoreRasurae deleted branch feature/m5-automation-profile-fields 2026-03-30 12:34:21 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1147
No description provided.