feat(plan): decision correction recomputes only affected subtree #1075

2026-03-19T19:59:56Z

CoreRasurae commented

2026-03-19 19:59:56 +00:00

Summary

Enhanced the CorrectionService to properly isolate correction scope so that decision corrections recompute only the affected subtree, preserving root and sibling decisions.

Changes

analyze_impact() now populates excluded_decisions — computes the set difference between all plan decisions and the affected subtree, ensuring root and sibling decisions are tracked.
Added rollback_tier computation — compute_rollback_tier() counts parent hops from the target decision to the root, enabling depth-aware rollback strategies (tier 0 = root targeted, tier N = N levels deep). Added rollback_tier_depth: int field to CorrectionImpact.
Enhanced dry-run report — includes excluded decisions list, rollback tier depth, and tier-0 warning when the entire tree is affected (root targeted with multiple decisions).
Added subtree isolation validation — validate_subtree_isolation() confirms root exclusion and sibling non-contamination for non-root corrections.

Files Modified/Created

src/cleveragents/domain/models/core/correction.py — Added excluded_decisions and rollback_tier_depth fields to CorrectionImpact and CorrectionDryRunReport
src/cleveragents/application/services/correction_service.py — Enhanced analyze_impact(), generate_dry_run_report(), added compute_rollback_tier(), validate_subtree_isolation(), and helper methods
features/correction_subtree_isolation.feature — 15 BDD scenarios
features/steps/correction_subtree_isolation_steps.py — Step definitions
robot/correction_subtree_isolation.robot — 8 Robot Framework integration tests
robot/helper_correction_subtree_isolation.py — Robot helper script

Quality Gates

Lint: All checks passed
Typecheck: 0 errors (Pyright strict)
Unit tests: 400 features, 11508 scenarios, 0 failures
Integration tests: 7/7 new Robot tests passed, 0 regressions
Coverage: 97% (threshold: 97%)

ISSUES CLOSED: #845

## Summary Enhanced the `CorrectionService` to properly isolate correction scope so that decision corrections recompute only the affected subtree, preserving root and sibling decisions. ### Changes 1. **`analyze_impact()` now populates `excluded_decisions`** — computes the set difference between all plan decisions and the affected subtree, ensuring root and sibling decisions are tracked. 2. **Added `rollback_tier` computation** — `compute_rollback_tier()` counts parent hops from the target decision to the root, enabling depth-aware rollback strategies (tier 0 = root targeted, tier N = N levels deep). Added `rollback_tier_depth: int` field to `CorrectionImpact`. 3. **Enhanced dry-run report** — includes excluded decisions list, rollback tier depth, and tier-0 warning when the entire tree is affected (root targeted with multiple decisions). 4. **Added subtree isolation validation** — `validate_subtree_isolation()` confirms root exclusion and sibling non-contamination for non-root corrections. ### Files Modified/Created - `src/cleveragents/domain/models/core/correction.py` — Added `excluded_decisions` and `rollback_tier_depth` fields to `CorrectionImpact` and `CorrectionDryRunReport` - `src/cleveragents/application/services/correction_service.py` — Enhanced `analyze_impact()`, `generate_dry_run_report()`, added `compute_rollback_tier()`, `validate_subtree_isolation()`, and helper methods - `features/correction_subtree_isolation.feature` — 15 BDD scenarios - `features/steps/correction_subtree_isolation_steps.py` — Step definitions - `robot/correction_subtree_isolation.robot` — 8 Robot Framework integration tests - `robot/helper_correction_subtree_isolation.py` — Robot helper script ### Quality Gates - Lint: All checks passed - Typecheck: 0 errors (Pyright strict) - Unit tests: 400 features, 11508 scenarios, 0 failures - Integration tests: 7/7 new Robot tests passed, 0 regressions - Coverage: 97% (threshold: 97%) ISSUES CLOSED: #845

CoreRasurae added this to the v3.3.0 milestone 2026-03-19 20:47:40 +00:00

CoreRasurae added the

Type

Feature

label 2026-03-19 20:47:42 +00:00

CoreRasurae referenced this pull request

2026-03-19 20:48:17 +00:00

feat(plan): decision correction recomputes only affected subtree #845

CoreRasurae reviewed 2026-03-19 21:52:38 +00:00

CoreRasurae left a comment

Code Review Report — PR #1075 (feature #845)

Scope: All code changes in feature/m4-correction-subtree and their immediate surrounding context.
Reviewed against: Issue #845 acceptance criteria, docs/specification.md (§ Correcting Plans, § Affected Subtree Computation), and PR #1075 description.
Review methodology: Three full review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws), each sweep covering all categories globally, repeated until no new findings emerged.

HIGH Severity

H1 — Bug: Status state machine regression in `execute_revert`

File: src/cleveragents/application/services/correction_service.py:339,345→192
Description: execute_revert() sets request.status = CorrectionStatus.EXECUTING (line 339), then immediately calls analyze_impact() (line 345), which overwrites status to CorrectionStatus.ANALYZING (line 192). This causes a backward state transition: PENDING → EXECUTING → ANALYZING → APPLIED. Any concurrent observer or audit log would see the correction regress from EXECUTING to ANALYZING, violating the expected lifecycle ordering defined in CorrectionStatus.
Suggestion: Call analyze_impact() before transitioning to EXECUTING, or guard analyze_impact() so it only sets ANALYZING when the current status is PENDING.

H2 — Bug: `validate_subtree_isolation` rejects legitimate influence-DAG-caused sibling contamination

File: src/cleveragents/application/services/correction_service.py:631-644
Description: The validation checks that siblings of the target are NOT in the affected set (invariant 2, line 637-638). However, _compute_affected_subtree traverses both the structural tree AND the influence DAG. If the influence DAG legitimately causes a sibling to be affected (e.g., a dependency edge from a descendant of the target to a sibling), the validation would incorrectly return False, rejecting a valid correction. Per docs/specification.md line 28667: the affected subtree is "computed by walking the influence DAG (not just the structural tree)" — influence-caused sibling inclusion is expected behavior.
Suggestion: The sibling check should only consider structural-tree siblings that are affected via the structural tree, OR the invariant should be relaxed to allow influence-DAG-caused contamination. Document the design decision either way.

H3 — Test Coverage Gap: Influence DAG traversal completely untested

File: All test files (features/correction_subtree_isolation.feature, features/steps/correction_subtree_isolation_steps.py, robot/helper_correction_subtree_isolation.py)
Description: The _compute_affected_subtree method has dual-traversal logic (structural tree + influence DAG) — this is a core spec requirement per docs/specification.md §28667-28696. However, NO test (Behave or Robot) passes the influence_edges parameter. The entire influence-DAG traversal path is untested. This is the most critical test coverage gap since influence-based cascading is fundamental to the correction feature's correctness.
Suggestion: Add test scenarios that supply influence_edges alongside decision_tree and verify that:

Influence edges cause additional decisions to be affected
Cycle detection works with influence DAG cycles
The union of structural + influence traversal is correct

H4 — Test Defect: Duplicate scenarios in `resource_type_deferred_physical.feature`

File: features/resource_type_deferred_physical.feature:266-286
Description: The "as_cli_dict regression: both auto_discovery schemas" section (lines 266-275) is duplicated verbatim at lines 277-286, including the identical section comment. This causes Behave to run 4 duplicate scenarios. While not a test failure, it inflates coverage metrics and obscures the actual test count.
Suggestion: Remove lines 277-286 (the second copy).

MEDIUM Severity

M1 — Bug: False-positive cycle-detection warnings from duplicate BFS enqueueing

File: src/cleveragents/application/services/correction_service.py:682-706
Description: In _compute_affected_subtree, both structural_children and influence_dependents are iterated separately (lines 701-706). If both lists contain the same node and it hasn't been visited yet, it gets enqueued twice. When the second copy is dequeued, node in visited is true, and the logger emits a "correction.cycle_detected" warning (line 685-690). This is a false positive — there's no actual cycle, just a node reachable via two different edge types. In production with influence edges active, this could produce noisy false cycle warnings.
Suggestion: Merge the two neighbor sets before enqueueing: for neighbor in set(structural_children) | set(influence_dependents):, or suppress the warning when the duplicate came from multi-edge-type reachability rather than a true cycle.

M2 — Logic Gap: `execute_append` skips `analyze_impact` entirely

File: src/cleveragents/application/services/correction_service.py:380-441
Description: Unlike execute_revert which calls analyze_impact(), execute_append creates a result without any impact analysis. This means self._impacts[correction_id] is never populated for append corrections, and there's no record of what the correction affects. Per the spec, both modes should understand the affected subtree — append still needs to know context for the spawned child plan.
Suggestion: Consider calling analyze_impact() in execute_append as well, or document why append mode intentionally skips it.

M3 — Test Coverage Gap: No negative test for `validate_subtree_isolation` returning `False`

File: features/correction_subtree_isolation.feature, robot/helper_correction_subtree_isolation.py
Description: All tests for validate_subtree_isolation verify it returns True. There is no test that creates a scenario where the validation SHOULD fail and checks the False return path. This leaves the violation-detection branch (lines 624-629, 638-644) of validate_subtree_isolation untested.
Suggestion: Add a test with a manually constructed affected set that includes the root or a sibling, to exercise the failure return path.

M4 — Test Coverage Gap: `execute_append` mode has zero test coverage

File: All test files
Description: Neither Behave nor Robot tests exercise the append correction flow (CorrectionMode.APPEND). All correction tests use CorrectionMode.REVERT. The append path in execute_append() (lines 380-441) and the dispatch logic in execute_correction() are untested.
Suggestion: Add at least one scenario testing append mode — verifying that a child plan ID is spawned and the original decision is preserved.

M5 — Design: Synthetic placeholder values for `affected_files` and `artifacts_to_archive`

File: src/cleveragents/application/services/correction_service.py:208,213
Description: affected_files is computed as [f"{d}.py" for d in affected] and artifacts_to_archive as [f"{d}.artifact" for d in affected]. These are synthetic values derived from decision IDs, not real file paths or artifact references. Consumers of CorrectionImpact may interpret these as real data. The synthetic data also flows into CorrectionResult.archived_artifacts via the revert path.
Suggestion: Use empty lists and a TODO comment until real file/artifact tracking is implemented, or document these fields as "placeholder" in the field description.

M6 — Test Coverage Gap: Error handling paths in `execute_revert`/`execute_append` untested

File: src/cleveragents/application/services/correction_service.py:355-363, 422-430
Description: Both execution methods have except Exception blocks that set CorrectionStatus.FAILED, populate error messages, and mark attempts as unsuccessful. These code paths have no test coverage.
Suggestion: Add a test that causes an exception during execution (e.g., mock a failure) and verify the FAILED status and error_message are set correctly.

M7 — Design: `CorrectionRequest` `frozen=False` allows mutation of identity fields

File: src/cleveragents/domain/models/core/correction.py:56
Description: CorrectionRequest has model_config = ConfigDict(frozen=False) to allow status mutations. However, this also allows mutation of plan_id, target_decision_id, mode, and correction_id after creation. These fields should be immutable-by-design after construction, as changing them would corrupt the correction's identity and any dependent data in _impacts/_attempts/_results.
Suggestion: Consider using Pydantic's @property or __setattr__ override pattern to allow only status to be mutated while keeping other fields frozen. Alternatively, document this as a known trade-off.

LOW Severity

L1 — Code Quality: Magic number cost/time multipliers

File: src/cleveragents/application/services/correction_service.py:215,284
Description: estimated_cost = float(len(affected)) * 1.5 and recompute_seconds = float(len(affected)) * 2.0 use unexplained magic constants.
Suggestion: Extract to named module-level constants (e.g., _COST_PER_DECISION = 1.5, _RECOMPUTE_SECONDS_PER_DECISION = 2.0).

L2 — Code Quality: Robot helper `COMMANDS` dict type annotation

File: robot/helper_correction_subtree_isolation.py:181
Description: COMMANDS: dict[str, object] should be dict[str, Callable[[], None]] for accurate type checking. The current annotation allows non-callable values, and the if callable(func) guard at line 201 is only needed because of the loose typing.
Suggestion: Change to Callable[[], None] and remove the runtime callable check.

L3 — Edge Case: `_find_root` returns arbitrary node in forest/multi-root graphs

File: src/cleveragents/application/services/correction_service.py:773-788
Description: If the decision tree is a forest (multiple disconnected subtrees), _find_root returns the first parent not in any child list. For dict insertion order, this is deterministic but semantically arbitrary. The fallback at line 788 (next(iter(tree))) handles the degenerate case where every node is someone's child (i.e., a cycle).
Suggestion: Document this limitation or raise an error when multiple roots are detected.

L4 — Design: Ambiguous return value 0 for `_compute_rollback_tier_depth`

File: src/cleveragents/application/services/correction_service.py:748-771
Description: Returns 0 for both "target is the root" and "target not found in tree." These are semantically different outcomes but indistinguishable. The test at robot/helper_correction_subtree_isolation.py:173 explicitly expects 0 for an unknown node.
Suggestion: Consider returning -1 or None for unknown nodes, or document that 0 means "root or unknown."

L5 — Design: Redundant fields in `CorrectionDryRunReport`

File: src/cleveragents/domain/models/core/correction.py:280-287
Description: CorrectionDryRunReport has excluded_decisions and rollback_tier_depth as top-level fields that duplicate report.impact.excluded_decisions and report.impact.rollback_tier_depth. This is intentional convenience access, but increases the surface area for inconsistency.
Suggestion: No change needed if intentional — but a comment noting the duplication rationale would help.

Summary

Severity	Count	Categories
HIGH	4	2 bugs, 1 test coverage gap, 1 test defect
MEDIUM	7	1 bug, 2 design issues, 4 test coverage gaps
LOW	5	2 code quality, 2 design, 1 edge case
Total	16

The core subtree isolation logic is well-structured and the BFS algorithm with cycle detection is sound. The main concerns are: (1) the status state machine regression in execute_revert, (2) the interaction between validate_subtree_isolation and the influence DAG, (3) the complete absence of influence-DAG test coverage despite it being a spec requirement, and (4) duplicate test scenarios. The resource type changes (deferred physical types, fs-directory updates) are clean and consistent with their YAML definitions.

## Code Review Report — PR #1075 (feature #845) **Scope**: All code changes in `feature/m4-correction-subtree` and their immediate surrounding context. **Reviewed against**: Issue #845 acceptance criteria, `docs/specification.md` (§ Correcting Plans, § Affected Subtree Computation), and PR #1075 description. **Review methodology**: Three full review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws), each sweep covering all categories globally, repeated until no new findings emerged. --- ### HIGH Severity #### H1 — Bug: Status state machine regression in `execute_revert` **File**: `src/cleveragents/application/services/correction_service.py:339,345→192` **Description**: `execute_revert()` sets `request.status = CorrectionStatus.EXECUTING` (line 339), then immediately calls `analyze_impact()` (line 345), which overwrites status to `CorrectionStatus.ANALYZING` (line 192). This causes a backward state transition: `PENDING → EXECUTING → ANALYZING → APPLIED`. Any concurrent observer or audit log would see the correction regress from EXECUTING to ANALYZING, violating the expected lifecycle ordering defined in `CorrectionStatus`. **Suggestion**: Call `analyze_impact()` **before** transitioning to EXECUTING, or guard `analyze_impact()` so it only sets ANALYZING when the current status is PENDING. #### H2 — Bug: `validate_subtree_isolation` rejects legitimate influence-DAG-caused sibling contamination **File**: `src/cleveragents/application/services/correction_service.py:631-644` **Description**: The validation checks that siblings of the target are NOT in the affected set (invariant 2, line 637-638). However, `_compute_affected_subtree` traverses both the structural tree AND the influence DAG. If the influence DAG legitimately causes a sibling to be affected (e.g., a dependency edge from a descendant of the target to a sibling), the validation would incorrectly return `False`, rejecting a valid correction. Per `docs/specification.md` line 28667: the affected subtree is "computed by walking the influence DAG (not just the structural tree)" — influence-caused sibling inclusion is expected behavior. **Suggestion**: The sibling check should only consider structural-tree siblings that are affected via the structural tree, OR the invariant should be relaxed to allow influence-DAG-caused contamination. Document the design decision either way. #### H3 — Test Coverage Gap: Influence DAG traversal completely untested **File**: All test files (`features/correction_subtree_isolation.feature`, `features/steps/correction_subtree_isolation_steps.py`, `robot/helper_correction_subtree_isolation.py`) **Description**: The `_compute_affected_subtree` method has dual-traversal logic (structural tree + influence DAG) — this is a core spec requirement per `docs/specification.md` §28667-28696. However, NO test (Behave or Robot) passes the `influence_edges` parameter. The entire influence-DAG traversal path is untested. This is the most critical test coverage gap since influence-based cascading is fundamental to the correction feature's correctness. **Suggestion**: Add test scenarios that supply `influence_edges` alongside `decision_tree` and verify that: - Influence edges cause additional decisions to be affected - Cycle detection works with influence DAG cycles - The union of structural + influence traversal is correct #### H4 — Test Defect: Duplicate scenarios in `resource_type_deferred_physical.feature` **File**: `features/resource_type_deferred_physical.feature:266-286` **Description**: The "as_cli_dict regression: both auto_discovery schemas" section (lines 266-275) is duplicated verbatim at lines 277-286, including the identical section comment. This causes Behave to run 4 duplicate scenarios. While not a test failure, it inflates coverage metrics and obscures the actual test count. **Suggestion**: Remove lines 277-286 (the second copy). --- ### MEDIUM Severity #### M1 — Bug: False-positive cycle-detection warnings from duplicate BFS enqueueing **File**: `src/cleveragents/application/services/correction_service.py:682-706` **Description**: In `_compute_affected_subtree`, both `structural_children` and `influence_dependents` are iterated separately (lines 701-706). If both lists contain the same node and it hasn't been visited yet, it gets enqueued twice. When the second copy is dequeued, `node in visited` is true, and the logger emits a `"correction.cycle_detected"` warning (line 685-690). This is a false positive — there's no actual cycle, just a node reachable via two different edge types. In production with influence edges active, this could produce noisy false cycle warnings. **Suggestion**: Merge the two neighbor sets before enqueueing: `for neighbor in set(structural_children) | set(influence_dependents):`, or suppress the warning when the duplicate came from multi-edge-type reachability rather than a true cycle. #### M2 — Logic Gap: `execute_append` skips `analyze_impact` entirely **File**: `src/cleveragents/application/services/correction_service.py:380-441` **Description**: Unlike `execute_revert` which calls `analyze_impact()`, `execute_append` creates a result without any impact analysis. This means `self._impacts[correction_id]` is never populated for append corrections, and there's no record of what the correction affects. Per the spec, both modes should understand the affected subtree — append still needs to know context for the spawned child plan. **Suggestion**: Consider calling `analyze_impact()` in `execute_append` as well, or document why append mode intentionally skips it. #### M3 — Test Coverage Gap: No negative test for `validate_subtree_isolation` returning `False` **File**: `features/correction_subtree_isolation.feature`, `robot/helper_correction_subtree_isolation.py` **Description**: All tests for `validate_subtree_isolation` verify it returns `True`. There is no test that creates a scenario where the validation SHOULD fail and checks the `False` return path. This leaves the violation-detection branch (lines 624-629, 638-644) of `validate_subtree_isolation` untested. **Suggestion**: Add a test with a manually constructed affected set that includes the root or a sibling, to exercise the failure return path. #### M4 — Test Coverage Gap: `execute_append` mode has zero test coverage **File**: All test files **Description**: Neither Behave nor Robot tests exercise the append correction flow (`CorrectionMode.APPEND`). All correction tests use `CorrectionMode.REVERT`. The append path in `execute_append()` (lines 380-441) and the dispatch logic in `execute_correction()` are untested. **Suggestion**: Add at least one scenario testing append mode — verifying that a child plan ID is spawned and the original decision is preserved. #### M5 — Design: Synthetic placeholder values for `affected_files` and `artifacts_to_archive` **File**: `src/cleveragents/application/services/correction_service.py:208,213` **Description**: `affected_files` is computed as `[f"{d}.py" for d in affected]` and `artifacts_to_archive` as `[f"{d}.artifact" for d in affected]`. These are synthetic values derived from decision IDs, not real file paths or artifact references. Consumers of `CorrectionImpact` may interpret these as real data. The synthetic data also flows into `CorrectionResult.archived_artifacts` via the revert path. **Suggestion**: Use empty lists and a TODO comment until real file/artifact tracking is implemented, or document these fields as "placeholder" in the field description. #### M6 — Test Coverage Gap: Error handling paths in `execute_revert`/`execute_append` untested **File**: `src/cleveragents/application/services/correction_service.py:355-363, 422-430` **Description**: Both execution methods have `except Exception` blocks that set `CorrectionStatus.FAILED`, populate error messages, and mark attempts as unsuccessful. These code paths have no test coverage. **Suggestion**: Add a test that causes an exception during execution (e.g., mock a failure) and verify the FAILED status and error_message are set correctly. #### M7 — Design: `CorrectionRequest` `frozen=False` allows mutation of identity fields **File**: `src/cleveragents/domain/models/core/correction.py:56` **Description**: `CorrectionRequest` has `model_config = ConfigDict(frozen=False)` to allow `status` mutations. However, this also allows mutation of `plan_id`, `target_decision_id`, `mode`, and `correction_id` after creation. These fields should be immutable-by-design after construction, as changing them would corrupt the correction's identity and any dependent data in `_impacts`/`_attempts`/`_results`. **Suggestion**: Consider using Pydantic's `@property` or `__setattr__` override pattern to allow only `status` to be mutated while keeping other fields frozen. Alternatively, document this as a known trade-off. --- ### LOW Severity #### L1 — Code Quality: Magic number cost/time multipliers **File**: `src/cleveragents/application/services/correction_service.py:215,284` **Description**: `estimated_cost = float(len(affected)) * 1.5` and `recompute_seconds = float(len(affected)) * 2.0` use unexplained magic constants. **Suggestion**: Extract to named module-level constants (e.g., `_COST_PER_DECISION = 1.5`, `_RECOMPUTE_SECONDS_PER_DECISION = 2.0`). #### L2 — Code Quality: Robot helper `COMMANDS` dict type annotation **File**: `robot/helper_correction_subtree_isolation.py:181` **Description**: `COMMANDS: dict[str, object]` should be `dict[str, Callable[[], None]]` for accurate type checking. The current annotation allows non-callable values, and the `if callable(func)` guard at line 201 is only needed because of the loose typing. **Suggestion**: Change to `Callable[[], None]` and remove the runtime callable check. #### L3 — Edge Case: `_find_root` returns arbitrary node in forest/multi-root graphs **File**: `src/cleveragents/application/services/correction_service.py:773-788` **Description**: If the decision tree is a forest (multiple disconnected subtrees), `_find_root` returns the first parent not in any child list. For dict insertion order, this is deterministic but semantically arbitrary. The fallback at line 788 (`next(iter(tree))`) handles the degenerate case where every node is someone's child (i.e., a cycle). **Suggestion**: Document this limitation or raise an error when multiple roots are detected. #### L4 — Design: Ambiguous return value 0 for `_compute_rollback_tier_depth` **File**: `src/cleveragents/application/services/correction_service.py:748-771` **Description**: Returns `0` for both "target is the root" and "target not found in tree." These are semantically different outcomes but indistinguishable. The test at `robot/helper_correction_subtree_isolation.py:173` explicitly expects `0` for an unknown node. **Suggestion**: Consider returning `-1` or `None` for unknown nodes, or document that `0` means "root or unknown." #### L5 — Design: Redundant fields in `CorrectionDryRunReport` **File**: `src/cleveragents/domain/models/core/correction.py:280-287` **Description**: `CorrectionDryRunReport` has `excluded_decisions` and `rollback_tier_depth` as top-level fields that duplicate `report.impact.excluded_decisions` and `report.impact.rollback_tier_depth`. This is intentional convenience access, but increases the surface area for inconsistency. **Suggestion**: No change needed if intentional — but a comment noting the duplication rationale would help. --- ### Summary | Severity | Count | Categories | |----------|-------|------------| | **HIGH** | 4 | 2 bugs, 1 test coverage gap, 1 test defect | | **MEDIUM** | 7 | 1 bug, 2 design issues, 4 test coverage gaps | | **LOW** | 5 | 2 code quality, 2 design, 1 edge case | | **Total** | **16** | | The core subtree isolation logic is well-structured and the BFS algorithm with cycle detection is sound. The main concerns are: (1) the status state machine regression in `execute_revert`, (2) the interaction between `validate_subtree_isolation` and the influence DAG, (3) the complete absence of influence-DAG test coverage despite it being a spec requirement, and (4) duplicate test scenarios. The resource type changes (deferred physical types, fs-directory updates) are clean and consistent with their YAML definitions.

CoreRasurae force-pushed feature/m4-correction-subtree from dd820af598 to f2f889aa3b

2026-03-19 23:25:59 +00:00

Compare

CoreRasurae commented

2026-03-19 23:26:37 +00:00

Review Findings — Resolution Report

Following the code review posted above, each finding was validated against issue #845 acceptance criteria, docs/specification.md, and CONTRIBUTING.md. Findings were either applied as fixes or deferred with justification.

Applied Fixes (amended into feature commit `f2f889aa`)

ID	Fix Applied
H1	Guarded `analyze_impact()` status transition: only sets ANALYZING when status is PENDING, preventing backward EXECUTING→ANALYZING regression
H2	`validate_subtree_isolation()` now uses structural-only BFS for sibling invariant checks, so influence-DAG-caused sibling reachability is not misreported as a violation
H3	Added 3 Behave scenarios + Robot test for influence DAG traversal (edge propagation, cycle detection, isolation with influence edges)
H4	Removed duplicate `as_cli_dict` regression scenarios (lines 277-286) in `resource_type_deferred_physical.feature`
M1	BFS now uses `seen_this_round` set to prevent double-enqueueing from structural + influence edges while preserving deterministic visit order
M3	Added negative test scenario: "Isolation validation detects structural root contamination"
M4	Added Behave scenario + Robot test for append correction mode (spawned child plan, status check)
M5	Added TODO comment on synthetic `affected_files` / `artifacts_to_archive` placeholders
L1	Extracted `_COST_PER_DECISION` and `_RECOMPUTE_SECONDS_PER_DECISION` named constants
L2	Changed `COMMANDS` type annotation to `dict[str, Callable[[], None]]`, removed redundant callable guard
L3/L4	Added docstring notes for `_find_root` forest behavior and `_compute_rollback_tier_depth` ambiguous return value

Not Applied — Justification

ID	Finding	Reason Not Applied
M2	`execute_append` skips `analyze_impact`	Per spec §28600-28607, append mode "leaves history intact" and appends a new corrective plan. Impact analysis is a revert-mode concern (computing what to invalidate). Append intentionally skips it because nothing is reverted. No spec requirement mandates impact analysis for append.
M6	Error handling paths untested	Error path tests already exist in `features/consolidated_correction.feature` (scenarios: "execute_revert catches internal error and returns FAILED result", "execute_append catches internal error and returns FAILED result"). These were verified to pass with the amended code. No additional error tests needed.
M7	`CorrectionRequest` `frozen=False` allows identity field mutation	Pydantic v2 does not support per-field freezing. The only way to allow `status` mutation while freezing other fields would require a custom `__setattr__` override or separating the model into two (immutable identity + mutable state). This is a significant refactor beyond the scope of #845 and would affect all existing consumers. Documented as a known trade-off.
L5	Redundant fields in `CorrectionDryRunReport`	Intentional convenience access pattern. The report's top-level `excluded_decisions` and `rollback_tier_depth` provide quick access without drilling into `report.impact.*`. This is a standard API design pattern for report objects.

Verification

nox -s lint: All checks passed
nox -s typecheck: 0 errors (1 pre-existing warning)
nox -s unit_tests (TEST_PROCESSES=9): 400 features, 11,511 scenarios, 0 failures (+3 new scenarios from review fixes)
Integration tests: passed

## Review Findings — Resolution Report Following the code review posted above, each finding was validated against issue #845 acceptance criteria, `docs/specification.md`, and `CONTRIBUTING.md`. Findings were either applied as fixes or deferred with justification. ### Applied Fixes (amended into feature commit `f2f889aa`) | ID | Fix Applied | |----|-------------| | **H1** | Guarded `analyze_impact()` status transition: only sets ANALYZING when status is PENDING, preventing backward EXECUTING→ANALYZING regression | | **H2** | `validate_subtree_isolation()` now uses structural-only BFS for sibling invariant checks, so influence-DAG-caused sibling reachability is not misreported as a violation | | **H3** | Added 3 Behave scenarios + Robot test for influence DAG traversal (edge propagation, cycle detection, isolation with influence edges) | | **H4** | Removed duplicate `as_cli_dict` regression scenarios (lines 277-286) in `resource_type_deferred_physical.feature` | | **M1** | BFS now uses `seen_this_round` set to prevent double-enqueueing from structural + influence edges while preserving deterministic visit order | | **M3** | Added negative test scenario: "Isolation validation detects structural root contamination" | | **M4** | Added Behave scenario + Robot test for append correction mode (spawned child plan, status check) | | **M5** | Added TODO comment on synthetic `affected_files` / `artifacts_to_archive` placeholders | | **L1** | Extracted `_COST_PER_DECISION` and `_RECOMPUTE_SECONDS_PER_DECISION` named constants | | **L2** | Changed `COMMANDS` type annotation to `dict[str, Callable[[], None]]`, removed redundant callable guard | | **L3/L4** | Added docstring notes for `_find_root` forest behavior and `_compute_rollback_tier_depth` ambiguous return value | ### Not Applied — Justification | ID | Finding | Reason Not Applied | |----|---------|-------------------| | **M2** | `execute_append` skips `analyze_impact` | Per spec §28600-28607, append mode "leaves history intact" and appends a new corrective plan. Impact analysis is a revert-mode concern (computing what to invalidate). Append intentionally skips it because nothing is reverted. No spec requirement mandates impact analysis for append. | | **M6** | Error handling paths untested | Error path tests already exist in `features/consolidated_correction.feature` (scenarios: "execute_revert catches internal error and returns FAILED result", "execute_append catches internal error and returns FAILED result"). These were verified to pass with the amended code. No additional error tests needed. | | **M7** | `CorrectionRequest` `frozen=False` allows identity field mutation | Pydantic v2 does not support per-field freezing. The only way to allow `status` mutation while freezing other fields would require a custom `__setattr__` override or separating the model into two (immutable identity + mutable state). This is a significant refactor beyond the scope of #845 and would affect all existing consumers. Documented as a known trade-off. | | **L5** | Redundant fields in `CorrectionDryRunReport` | Intentional convenience access pattern. The report's top-level `excluded_decisions` and `rollback_tier_depth` provide quick access without drilling into `report.impact.*`. This is a standard API design pattern for report objects. | ### Verification - `nox -s lint`: All checks passed - `nox -s typecheck`: 0 errors (1 pre-existing warning) - `nox -s unit_tests` (TEST_PROCESSES=9): 400 features, 11,511 scenarios, 0 failures (+3 new scenarios from review fixes) - Integration tests: passed

CoreRasurae reviewed 2026-03-20 14:54:41 +00:00

CoreRasurae left a comment

Code Review Report — PR #1075 (feature #845) — Post-Fix Review

Scope: All code changes in feature/m4-correction-subtree (commit f2f889aa) and their immediate surrounding context.
Reviewed against: Issue #845 acceptance criteria, docs/specification.md (§ Correcting Plans, § Affected Subtree Computation, § Rollback Tiers), PR #1075 description, and the previous review resolution report.
Methodology: Three full global review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws, documentation), repeated until no new findings emerged. Production call-sites (cli/commands/plan.py, container.py) and existing test suites (consolidated_correction.feature, correction_service_coverage.feature, correction_flows.robot, influence_dag_traversal.robot, et al.) were cross-referenced.

Previous Review Fix Verification

All 12 applied fixes from the previous review (H1–H4, M1, M3–M5, L1–L4) were verified as correctly implemented in the current code:

ID	Fix	Verified
H1	Status guard in `analyze_impact()` (line 198): only transitions PENDING→ANALYZING	Yes — `execute_revert()` sets EXECUTING before calling `analyze_impact()`, guard correctly skips
H2	`validate_subtree_isolation()` uses structural-only BFS (line 635–638)	Yes — `influence_edges=None` passed to internal BFS
H3	Influence DAG test scenarios added (3 Behave + Robot)	Yes — edge propagation, cycle detection, isolation-with-influence all present
H4	Duplicate `as_cli_dict` scenarios removed from `resource_type_deferred_physical.feature`	Yes — lines 277–286 removed
M1	`seen_this_round` set prevents double-enqueueing in BFS (line 726–734)	Yes — correctly deduplicates across structural + influence edges
M3	Negative isolation test added: "Isolation validation detects structural root contamination"	Yes — cyclic tree {root→A, A→root} correctly returns False
M4	Append mode Behave + Robot tests added	Yes — spawned child plan and status verified
M5	TODO comment on synthetic `affected_files`/`artifacts_to_archive` (line 220–223)	Yes
L1	`_COST_PER_DECISION` and `_RECOMPUTE_SECONDS_PER_DECISION` named constants (lines 47–48)	Yes
L2	`COMMANDS` type annotation changed to `dict[str, Callable[[], None]]`	Yes
L3	`_find_root` docstring notes forest behavior (lines 815–821)	Yes
L4	`_compute_rollback_tier_depth` docstring notes ambiguous zero return (lines 786–792)	Yes

All "Not Applied" justifications (M2, M6, M7, L5) from the resolution report remain reasonable.

New Findings

Five additional findings surfaced during post-fix review. None are HIGH severity.

MEDIUM Severity

M1 — Bug (Defensive): Missing `dry_run` enforcement in execution guard

File: correction_service.py:547–554 (_assert_executable)
Description: _assert_executable() checks that request.status is in {PENDING, ANALYZING} but does not check request.dry_run. A caller can create a correction with dry_run=True and then successfully call execute_revert() or execute_append() — the flag is stored on the model (line 147) but never enforced.

The CLI (plan.py:2758–2812) handles this correctly via separate code branches, so the current production path is safe. However, the service API contract is broken: the dry_run parameter on request_correction() promises "only impact analysis will be performed" (line 129) but the service layer does not enforce this promise.

Suggestion: Add a dry_run check in _assert_executable():

if request.dry_run:
    raise ValidationError(
        "Cannot execute a dry-run correction. "
        "Use generate_dry_run_report() or analyze_impact() instead."
    )

M2 — Documentation Bug: `generate_dry_run_report` docstring claims "without mutating state"

File: correction_service.py:256
Description: The docstring reads "Generate a dry-run report without mutating state" but the method calls analyze_impact() which:

Transitions request.status from PENDING → ANALYZING (line 198–199)
Stores the computed impact in self._impacts[correction_id] (line 234)

Both are observable state mutations. The docstring is factually incorrect and could mislead future callers into treating this as a pure/idempotent operation (e.g., calling it repeatedly with different trees and expecting no side effects).

Suggestion: Change the docstring to: "Generate a dry-run report without executing the correction." This accurately describes the guarantee (no decision invalidation, no artifact archival, no child plan spawning) without overpromising about internal state.

LOW Severity

L1 — API Clarity: `validate_subtree_isolation` silently ignores `influence_edges` parameter

File: correction_service.py:603,638
Description: The method signature accepts influence_edges: dict[str, list[str]] | None = None but always passes influence_edges=None to the internal _compute_affected_subtree() call (line 638). This is intentional per the H2 fix (isolation checks use structural-only BFS), but callers receive no indication that the parameter has no effect.

The Behave test "Isolation validation passes when influence DAG reaches sibling" passes influence_edges to this method, which exercises the correct behavior but obscures the fact that the parameter is completely ignored.

Suggestion: Either (a) add a note to the influence_edges parameter docstring: "Accepted for API consistency but not used in isolation checks (structural-only BFS is used per spec § Affected Subtree Computation).", or (b) remove the parameter and update callers.

L2 — Logging Accuracy: Cycle detection log message blames influence DAG exclusively

File: correction_service.py:714–717
Description: The cycle detection warning reads: "Skipping already-visited node during BFS (possible cycle in influence DAG)". With the seen_this_round fix (M1), this warning only fires for genuine cycles. However, cycles can also exist in the structural tree (e.g., the negative test creates {root→A, A→root}). The message incorrectly attributes all cycles to the influence DAG.

Suggestion: Change to: "Skipping already-visited node during BFS (possible cycle in decision tree or influence DAG)".

L3 — Defensive Programming: No runtime type validation on adjacency list values

File: correction_service.py:727–734 (BFS inner loop)
Description: decision_tree and influence_edges are typed as dict[str, list[str]] but Python does not enforce type hints at runtime. If a caller passes malformed values (e.g., {"root": "ABC"} — a string instead of a list), the BFS for neighbor in tree.get(node, []) would iterate over individual characters ("A", "B", "C"), producing silently incorrect results.

In a properly typed codebase with mypy/pyright, this would be caught statically. The risk is low in practice since the production caller (plan.py) constructs these from DecisionService queries.

Suggestion: No immediate action required. If desired, a runtime guard could be added: assert isinstance(children, list) in the BFS loop, or Pydantic validation on a typed wrapper.

Summary

Severity	Count	Categories
HIGH	0	—
MEDIUM	2	1 defensive bug, 1 documentation bug
LOW	3	1 API clarity, 1 logging accuracy, 1 defensive programming
Total	5

The code is in good shape after the previous review fixes. The core subtree isolation logic, BFS traversal, rollback tier computation, and isolation validation are all correct. The seen_this_round BFS fix, structural-only isolation check, and status guard are properly implemented. All 12 previous fixes verified. The remaining findings are defensive hardening and documentation accuracy improvements — none block merge.

## Code Review Report — PR #1075 (feature #845) — Post-Fix Review **Scope**: All code changes in `feature/m4-correction-subtree` (commit `f2f889aa`) and their immediate surrounding context. **Reviewed against**: Issue #845 acceptance criteria, `docs/specification.md` (§ Correcting Plans, § Affected Subtree Computation, § Rollback Tiers), PR #1075 description, and the previous review resolution report. **Methodology**: Three full global review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws, documentation), repeated until no new findings emerged. Production call-sites (`cli/commands/plan.py`, `container.py`) and existing test suites (`consolidated_correction.feature`, `correction_service_coverage.feature`, `correction_flows.robot`, `influence_dag_traversal.robot`, et al.) were cross-referenced. --- ### Previous Review Fix Verification All 12 applied fixes from the previous review (H1–H4, M1, M3–M5, L1–L4) were verified as correctly implemented in the current code: | ID | Fix | Verified | |----|-----|----------| | H1 | Status guard in `analyze_impact()` (line 198): only transitions PENDING→ANALYZING | Yes — `execute_revert()` sets EXECUTING before calling `analyze_impact()`, guard correctly skips | | H2 | `validate_subtree_isolation()` uses structural-only BFS (line 635–638) | Yes — `influence_edges=None` passed to internal BFS | | H3 | Influence DAG test scenarios added (3 Behave + Robot) | Yes — edge propagation, cycle detection, isolation-with-influence all present | | H4 | Duplicate `as_cli_dict` scenarios removed from `resource_type_deferred_physical.feature` | Yes — lines 277–286 removed | | M1 | `seen_this_round` set prevents double-enqueueing in BFS (line 726–734) | Yes — correctly deduplicates across structural + influence edges | | M3 | Negative isolation test added: "Isolation validation detects structural root contamination" | Yes — cyclic tree {root→A, A→root} correctly returns False | | M4 | Append mode Behave + Robot tests added | Yes — spawned child plan and status verified | | M5 | TODO comment on synthetic `affected_files`/`artifacts_to_archive` (line 220–223) | Yes | | L1 | `_COST_PER_DECISION` and `_RECOMPUTE_SECONDS_PER_DECISION` named constants (lines 47–48) | Yes | | L2 | `COMMANDS` type annotation changed to `dict[str, Callable[[], None]]` | Yes | | L3 | `_find_root` docstring notes forest behavior (lines 815–821) | Yes | | L4 | `_compute_rollback_tier_depth` docstring notes ambiguous zero return (lines 786–792) | Yes | All "Not Applied" justifications (M2, M6, M7, L5) from the resolution report remain reasonable. --- ### New Findings Five additional findings surfaced during post-fix review. None are HIGH severity. --- ### MEDIUM Severity #### M1 — Bug (Defensive): Missing `dry_run` enforcement in execution guard **File**: `correction_service.py:547–554` (`_assert_executable`) **Description**: `_assert_executable()` checks that `request.status` is in `{PENDING, ANALYZING}` but does **not** check `request.dry_run`. A caller can create a correction with `dry_run=True` and then successfully call `execute_revert()` or `execute_append()` — the flag is stored on the model (line 147) but never enforced. The CLI (`plan.py:2758–2812`) handles this correctly via separate code branches, so the current production path is safe. However, the service API contract is broken: the `dry_run` parameter on `request_correction()` promises _"only impact analysis will be performed"_ (line 129) but the service layer does not enforce this promise. **Suggestion**: Add a `dry_run` check in `_assert_executable()`: ```python if request.dry_run: raise ValidationError( "Cannot execute a dry-run correction. " "Use generate_dry_run_report() or analyze_impact() instead." ) ``` #### M2 — Documentation Bug: `generate_dry_run_report` docstring claims "without mutating state" **File**: `correction_service.py:256` **Description**: The docstring reads _"Generate a dry-run report without mutating state"_ but the method calls `analyze_impact()` which: 1. Transitions `request.status` from PENDING → ANALYZING (line 198–199) 2. Stores the computed impact in `self._impacts[correction_id]` (line 234) Both are observable state mutations. The docstring is factually incorrect and could mislead future callers into treating this as a pure/idempotent operation (e.g., calling it repeatedly with different trees and expecting no side effects). **Suggestion**: Change the docstring to: _"Generate a dry-run report **without executing the correction**."_ This accurately describes the guarantee (no decision invalidation, no artifact archival, no child plan spawning) without overpromising about internal state. --- ### LOW Severity #### L1 — API Clarity: `validate_subtree_isolation` silently ignores `influence_edges` parameter **File**: `correction_service.py:603,638` **Description**: The method signature accepts `influence_edges: dict[str, list[str]] | None = None` but always passes `influence_edges=None` to the internal `_compute_affected_subtree()` call (line 638). This is intentional per the H2 fix (isolation checks use structural-only BFS), but callers receive no indication that the parameter has no effect. The Behave test _"Isolation validation passes when influence DAG reaches sibling"_ passes `influence_edges` to this method, which exercises the correct behavior but obscures the fact that the parameter is completely ignored. **Suggestion**: Either (a) add a note to the `influence_edges` parameter docstring: _"Accepted for API consistency but not used in isolation checks (structural-only BFS is used per spec § Affected Subtree Computation)."_, or (b) remove the parameter and update callers. #### L2 — Logging Accuracy: Cycle detection log message blames influence DAG exclusively **File**: `correction_service.py:714–717` **Description**: The cycle detection warning reads: `"Skipping already-visited node during BFS (possible cycle in influence DAG)"`. With the `seen_this_round` fix (M1), this warning only fires for genuine cycles. However, cycles can also exist in the **structural tree** (e.g., the negative test creates `{root→A, A→root}`). The message incorrectly attributes all cycles to the influence DAG. **Suggestion**: Change to: `"Skipping already-visited node during BFS (possible cycle in decision tree or influence DAG)"`. #### L3 — Defensive Programming: No runtime type validation on adjacency list values **File**: `correction_service.py:727–734` (BFS inner loop) **Description**: `decision_tree` and `influence_edges` are typed as `dict[str, list[str]]` but Python does not enforce type hints at runtime. If a caller passes malformed values (e.g., `{"root": "ABC"}` — a string instead of a list), the BFS `for neighbor in tree.get(node, [])` would iterate over individual characters (`"A"`, `"B"`, `"C"`), producing silently incorrect results. In a properly typed codebase with mypy/pyright, this would be caught statically. The risk is low in practice since the production caller (`plan.py`) constructs these from `DecisionService` queries. **Suggestion**: No immediate action required. If desired, a runtime guard could be added: `assert isinstance(children, list)` in the BFS loop, or Pydantic validation on a typed wrapper. --- ### Summary | Severity | Count | Categories | |----------|-------|------------| | **HIGH** | 0 | — | | **MEDIUM** | 2 | 1 defensive bug, 1 documentation bug | | **LOW** | 3 | 1 API clarity, 1 logging accuracy, 1 defensive programming | | **Total** | **5** | | The code is in good shape after the previous review fixes. The core subtree isolation logic, BFS traversal, rollback tier computation, and isolation validation are all correct. The `seen_this_round` BFS fix, structural-only isolation check, and status guard are properly implemented. All 12 previous fixes verified. The remaining findings are defensive hardening and documentation accuracy improvements — none block merge.

CoreRasurae commented

2026-03-20 16:07:39 +00:00

Review Findings — Resolution Report (Post-Fix Review)

Following the post-fix code review, each finding was validated against issue #845 acceptance criteria, docs/specification.md, and CONTRIBUTING.md. Findings were either applied as fixes or deferred with justification.

Applied Fixes (amended into feature commit `fd4d66f8`)

ID	Fix Applied
M1	Added `dry_run` enforcement guard in `_assert_executable()`: raises `ValidationError` when `request.dry_run` is `True`, preventing execution of dry-run-only corrections per spec § `plan correct --dry-run: "Show impact without executing"` and CONTRIBUTING.md § Argument Validation ("Invalid States: Verify object state is valid for the operation"). Added 2 Behave scenarios (revert + append dry-run enforcement) and 1 Robot test case.
M2	Fixed `generate_dry_run_report()` docstring: changed "without mutating state" to "without executing the correction" with explicit note about `analyze_impact()` side-effects (status transition, impact storage). Per CONTRIBUTING.md § Documentation Standards: "Update documentation alongside code changes."
L1	Added docstring note to `validate_subtree_isolation()` `influence_edges` parameter: "Accepted for API consistency but not used in isolation checks — structural-only BFS is used per the specification (§ Affected Subtree Computation)."
L2	Fixed cycle-detection log message from "possible cycle in influence DAG" to "possible cycle in decision tree or influence DAG" — cycles can originate in either structure.

Not Applied — Justification

ID	Finding	Reason Not Applied
L3	No runtime type validation on `decision_tree`/`influence_edges` values (strings instead of lists would cause silent BFS character iteration)	Per CONTRIBUTING.md § Error and Exception Handling → Argument Validation → Type Verification: "Prefer catching type errors at compile time through static type systems whenever possible; resort to runtime type checks only when the language lacks compile-time enforcement or when interfacing with untyped boundaries (e.g., deserialized input, dynamic plugin APIs)." The adjacency lists are constructed by internal typed services (`DecisionService.list_decisions`, `DecisionService.get_influence_edges`) — this is a typed internal boundary, not an untyped external interface. The existing type annotations + pyright enforcement are the correct mechanism per CONTRIBUTING.md guidelines.

Verification

nox -s lint: All checks passed
nox -s typecheck: 0 errors (1 pre-existing warning)
nox -s unit_tests (TEST_PROCESSES=9): 400 features, 11,513 scenarios, 0 failures (+2 new scenarios from review fixes)
nox -s integration_tests (TEST_PROCESSES=9): All passed (including new Dry Run Enforcement Robot test)

## Review Findings — Resolution Report (Post-Fix Review) Following the post-fix code review, each finding was validated against issue #845 acceptance criteria, `docs/specification.md`, and `CONTRIBUTING.md`. Findings were either applied as fixes or deferred with justification. ### Applied Fixes (amended into feature commit `fd4d66f8`) | ID | Fix Applied | |----|-------------| | **M1** | Added `dry_run` enforcement guard in `_assert_executable()`: raises `ValidationError` when `request.dry_run` is `True`, preventing execution of dry-run-only corrections per spec § `plan correct --dry-run: "Show impact without executing"` and CONTRIBUTING.md § Argument Validation ("Invalid States: Verify object state is valid for the operation"). Added 2 Behave scenarios (revert + append dry-run enforcement) and 1 Robot test case. | | **M2** | Fixed `generate_dry_run_report()` docstring: changed "without mutating state" to "without executing the correction" with explicit note about `analyze_impact()` side-effects (status transition, impact storage). Per CONTRIBUTING.md § Documentation Standards: "Update documentation alongside code changes." | | **L1** | Added docstring note to `validate_subtree_isolation()` `influence_edges` parameter: "Accepted for API consistency but **not used** in isolation checks — structural-only BFS is used per the specification (§ Affected Subtree Computation)." | | **L2** | Fixed cycle-detection log message from "possible cycle in influence DAG" to "possible cycle in decision tree or influence DAG" — cycles can originate in either structure. | ### Not Applied — Justification | ID | Finding | Reason Not Applied | |----|---------|-------------------| | **L3** | No runtime type validation on `decision_tree`/`influence_edges` values (strings instead of lists would cause silent BFS character iteration) | Per CONTRIBUTING.md § Error and Exception Handling → Argument Validation → Type Verification: "Prefer catching type errors at compile time through static type systems whenever possible; resort to runtime type checks only when the language lacks compile-time enforcement or **when interfacing with untyped boundaries** (e.g., deserialized input, dynamic plugin APIs)." The adjacency lists are constructed by internal typed services (`DecisionService.list_decisions`, `DecisionService.get_influence_edges`) — this is a typed internal boundary, not an untyped external interface. The existing type annotations + pyright enforcement are the correct mechanism per CONTRIBUTING.md guidelines. | ### Verification - `nox -s lint`: All checks passed - `nox -s typecheck`: 0 errors (1 pre-existing warning) - `nox -s unit_tests` (TEST_PROCESSES=9): 400 features, 11,513 scenarios, 0 failures (+2 new scenarios from review fixes) - `nox -s integration_tests` (TEST_PROCESSES=9): All passed (including new `Dry Run Enforcement` Robot test)

CoreRasurae force-pushed feature/m4-correction-subtree from f2f889aa3b to fd4d66f816

2026-03-20 16:16:15 +00:00

Compare

CoreRasurae commented

2026-03-20 16:41:26 +00:00

Code Review Report — PR #1075 (`feature/m4-correction-subtree`)

Reviewer: Automated deep review (3 global cycles: bugs, security, performance, test coverage, spec compliance)
Scope: All code changes in the branch plus close connections to surrounding code
Reference: Issue #845, docs/specification.md §28560-28787 (Correcting Plans)

Executive Summary

This PR implements subtree-scoped correction in CorrectionService (excluded decisions, rollback tier depth, dry-run enhancements, subtree isolation validation, BFS fixes) and adds 11 deferred physical resource types. The correction logic is well-structured and the BDD/Robot tests cover many paths. However, the review identified 50 findings across 6 categories, including issues that should be addressed before merge.

Finding distribution: 2 Critical (test gaps) | 9 High | 22 Medium | 19 Low

P1 — HIGH SEVERITY

P1-1 [BUG / Spec Compliance] `affected_child_plans` always empty — cross-plan cascade rejection never fires

File: correction_service.py:225

CorrectionImpact is always created with affected_child_plans=[]. The BFS in _compute_affected_subtree does not track child plan references. The spec pseudocode (§28670-28696) explicitly tracks affected_plans by querying decision_dependencies where dependency_type = 'plan'. The cross-plan cascading requirement (§28699-28708) — "Correction rejected when affected subtree includes child plans in APPLIED state" — can never trigger. The domain models (CorrectionRejection, CascadeAction, CascadeResult, ChildPlanState in correction.py:298-394) are defined but never instantiated. CorrectionStatus.REJECTED is unreachable.

Impact: Corrections that should be rejected per spec will proceed unchallenged, potentially invalidating decisions whose child plans have already applied irreversible changes.

P1-2 [BUG] `analyze_impact` allows re-analysis on terminal states

File: correction_service.py:198-199

Only guards PENDING → ANALYZING. No rejection for APPLIED, FAILED, CANCELLED, or REJECTED. A caller can re-analyze a completed/failed correction, silently overwriting _impacts[correction_id] with new data from different tree/DAG arguments.

Impact: Post-execution audit data can be corrupted. The stored impact no longer matches what was used during execution.

Suggested fix: Reject analyze_impact when status is in a terminal set.

P1-3 [SECURITY] Unbounded BFS — Denial of Service via large decision tree

File: correction_service.py:718-759

_compute_affected_subtree has no upper bound on nodes visited. The visited set prevents cycles but doesn't cap total size. A pathologically large tree causes unbounded memory and CPU.

Suggested fix: Add a MAX_AFFECTED_NODES limit (e.g., 10,000). Raise ValidationError when exceeded.

P1-4 [SECURITY] Raw exception messages stored in `CorrectionResult`

File: correction_service.py:369-377, 437-444

Both execute_revert and execute_append catch bare Exception and store str(exc) into CorrectionResult.error_message. Internal paths, connection strings, or module names could be exposed.

Suggested fix: Store a generic message; log the full exception server-side only.

P1-5 [TEST] `execute_revert` is never tested in a non-dry-run scenario

Files: All test files

No test calls execute_revert() or execute_correction() with mode=REVERT and dry_run=False. Lines 327-388 of correction_service.py are entirely untested — status transition to EXECUTING, CorrectionAttempt creation, result with reverted_decisions/archived_artifacts, and the FAILED exception-handling branch.

Impact: The core feature of this PR (subtree-scoped revert) is never exercised end-to-end.

P1-6 [TEST] `_assert_executable` status guard (non-dry-run) untested

Files: All test files

Guard (2) in _assert_executable — preventing re-execution of APPLIED/FAILED/CANCELLED corrections — is completely untested. Only the dry_run guard (1) is exercised.

Impact: A regression removing the status guard would allow double-execution of corrections.

P1-7 [RESOURCE] `fs-mount` removed from `fs-directory.parent_types` — asymmetric DAG

Files: examples/resource-types/fs-directory.yaml, _resource_registry_data.py, resource_type_builtins.feature, resource_type_fs.feature

fs-directory.parent_types no longer includes fs-mount, but fs-mount.child_types still declares fs-directory. Test assertions for this relationship were removed. This creates a one-directional relationship that breaks bidirectional DAG consistency.

Suggested fix: Either restore fs-mount to fs-directory.parent_types or also remove fs-directory from fs-mount.child_types.

P1-8 [PERFORMANCE] `analyze_impact()` fully recomputed on every call

File: correction_service.py:275, 359

A typical workflow (dry-run → validate → execute) triggers 3 separate BFS traversals, plus 3x _collect_all_decisions, 3x _compute_rollback_tier_depth, all producing identical results on the same tree.

Suggested fix: Return cached _impacts[correction_id] if inputs haven't changed, or build a lightweight analysis context cached per tree.

P1-9 [BUG] `execute_revert` / `execute_append` don't validate mode matches method

File: correction_service.py:327-388, 394-455

A caller bypassing execute_correction can call execute_revert on an APPEND request (or vice versa). The result would contain reverted_decisions for an APPEND-mode correction — semantically incoherent.

Suggested fix: Add if request.mode != CorrectionMode.REVERT: raise ValidationError(...) at the top of execute_revert, and similarly for execute_append.

P2 — MEDIUM SEVERITY

P2-1 [BUG] Spurious "cycle detected" warnings from convergent (non-cyclic) paths

File: correction_service.py:723-732

A node reachable via two different parents (diamond topology) gets enqueued twice. On second dequeue, the visited check fires a false correction.cycle_detected warning. The seen_this_round set only prevents double-enqueueing from the same node's edges, not from different parents.

Suggested fix: Use a global enqueued set (initialized with {target_id}) checked at enqueue time, not just at dequeue time. This eliminates both double-enqueueing and seen_this_round.

P2-2 [BUG] `_collect_all_decisions` misses isolated target decision

File: correction_service.py:207-209

If the target is an isolated node (not in any adjacency list), it won't appear in all_decisions, breaking the affected + excluded = all partition invariant.

Suggested fix: all_decisions.add(request.target_decision_id) after the _collect_all_decisions call.

P2-3 [BUG] `_compute_rollback_tier_depth` returns 0 for absent nodes — false tier-0 warning

File: correction_service.py:790-821, 290-294

Returns 0 for both "target is root" and "target not in tree." The dry-run warning check if rollback_tier_depth == 0 and len(affected) > 1 triggers falsely for non-root targets.

Suggested fix: Return -1 for "not found" or verify the target is actually a root before emitting the tier-0 warning.

P2-4 [BUG] `generate_dry_run_report` mutates request status

File: correction_service.py:275

Calls analyze_impact which transitions status PENDING → ANALYZING. A conceptually read-only preview permanently mutates lifecycle state.

Suggested fix: Save and restore status around the analyze_impact call in the dry-run path.

P2-5 [BUG] Diamond inheritance gives nondeterministic tier depth

File: correction_service.py:808-811

child_to_parent[child] = parent — last writer wins. For a child with multiple parents, the depth depends on dict iteration order.

P2-6 [BUG] `execute_append` performs no impact analysis

File: correction_service.py:394-455

Unlike execute_revert, append never calls analyze_impact. No CorrectionImpact is stored. Querying _impacts[correction_id] after append raises KeyError.

P2-7 [BUG] No state-machine enforcement on `CorrectionRequest.status`

File: correction.py:56, 86-89

CorrectionRequest has frozen=False. Any code can assign any status at any time, including backward transitions (APPLIED → PENDING).

P2-8 [BUG] Thread safety — status transitions and dict mutations unguarded

File: correction_service.py:348-367

Check-then-act pattern in execution methods is racy. Two threads can pass _assert_executable simultaneously and execute the same correction twice.

P2-9 [SPEC] "phase" rollback tier is dead code

File: correction_service.py:228-230

The model validates rollback_tier against {"full", "phase", "append_only"}, but the service only produces "full" (REVERT) or "append_only" (APPEND). The spec's "Phase-level" tier has no code path.

P2-10 [SPEC] Dry-run report missing `artifacts_to_archive` and `estimated_cost` as top-level fields

File: correction.py:251-295

These exist only nested in report.impact. Compare with excluded_decisions and decisions_to_invalidate which are surfaced as top-level fields. Inconsistent API.

P2-11 [SECURITY] No input validation on `decision_tree` / `influence_edges` contents

File: correction_service.py:165-170

No validation on key/value types, string lengths, max entries, or ID format at the service boundary.

P2-12 [SECURITY] Path traversal in synthetic artifact/file paths

File: correction_service.py:215-224

f"{d}.artifact" and f"{d}.py" where d comes from user-supplied tree keys. If d = "../../etc/passwd", downstream I/O could be exploitable.

P2-13 [SECURITY] No `max_length` on `guidance` field

File: correction.py:74-77

Unbounded string field. request_correction(guidance="A" * 100_000_000) consumes memory.

P2-14 [RESOURCE] `git-checkout` YAML diverges from registry data

Files: examples/resource-types/git-checkout.yaml vs _resource_registry_data.py:69-106

YAML has child_types: ["git", "fs-directory"], registry has 5 children. Handler paths differ. Auto_discovery present in YAML but absent in registry. BDD tests load from YAML, providing false confidence.

P2-15 [TEST] Missing test: single-node tree

No test uses {"root": []}. The BFS, tier computation, and isolation validation all have this as an edge case.

P2-16 [TEST] Missing test: diamond-shaped DAG

No test for convergent paths. _compute_rollback_tier_depth gives nondeterministic results (P2-5).

P2-17 [TEST] Missing test: disconnected trees (forests)

_find_root has explicit forest handling but no test validates it.

P2-18 [TEST] Missing test: target node not in tree (full impact analysis)

Only compute_rollback_tier is tested for unknown nodes, not analyze_impact.

P2-19 [TEST] Missing test: influence edge to node not in structural tree

No test for {"D": ["EXTERNAL"]} where EXTERNAL isn't in the tree.

P2-20 [TEST] Middle-node test uses "contains" instead of exact match

File: correction_subtree_isolation.feature:28-36

If BFS over-includes nodes, the test still passes. Add exact count or exact-match assertions.

P2-21 [TEST] Robot tests are pure sentinel checks

File: robot/correction_subtree_isolation.robot

Every test: run helper → check exit code → check stdout sentinel. No independent field verification.

P2-22 [RESOURCE] No exclusivity assertions on parent/child type lists

File: resource_type_deferred_physical.feature

"should contain" but never "should have exactly N entries." Extra entries go undetected.

P3 — LOW SEVERITY

P3-1 [BUG] `execute_revert` skips ANALYZING state in lifecycle

PENDING → EXECUTING directly, skipping ANALYZING. The implied state machine is violated.

P3-2 [BUG] `CorrectionDryRunReport` duplicates fields from nested `impact`

report.excluded_decisions and report.impact.excluded_decisions are always identical. Maintenance hazard.

P3-3 [BUG] `CorrectionResult` accepts non-terminal statuses

No validator prevents status=PENDING on a result.

P3-4 [BUG] `list_attempts` leaks mutable internal state

Returns shallow copy but CorrectionAttempt has frozen=False. Callers can corrupt attempt records.

P3-5 [SECURITY] Log injection via user-controlled tree keys

File: correction_service.py:726-731

Decision IDs from user input included in structured log node=node. Console log rendering may not escape control characters.

P3-6 [SECURITY] `exc_info=True` in event_bus warning log

File: correction_service.py:103-108

Full traceback logged, potentially exposing internal paths.

P3-7 [SECURITY] No `max_length` on `plan_id` / `target_decision_id`

File: correction.py:62-68

Validators check non-empty but not max length.

P3-8 [PERFORMANCE] `_compute_rollback_tier_depth` rebuilds child-to-parent map each call

File: correction_service.py:808-811

O(E) reconstruction duplicated across calls.

P3-9 [PERFORMANCE] `_find_parent` is O(E) linear scan

File: correction_service.py:849-857

Should reuse the child_to_parent map for O(1) lookup.

P3-10 [PERFORMANCE] `validate_subtree_isolation` re-runs full BFS

File: correction_service.py:649-653

Could reuse BFS result from analyze_impact.

P3-11 [PERFORMANCE] `seen_this_round` set allocated per BFS node

File: correction_service.py:740

Replace with a single global enqueued set.

P3-12 [PERFORMANCE] BFS `visited` set discarded then rebuilt as `affected_set`

File: correction_service.py:208, 720

Return the visited set from BFS to avoid O(V) rehash.

P3-13 [PERFORMANCE] `_find_root` builds redundant `all_children` set

File: correction_service.py:839-841

Could reuse child_to_parent map.

P3-14 [PERFORMANCE] `sorted()` on excluded decisions may be unnecessary

File: correction_service.py:209

O(K log K) sort serves no documented requirement.

P3-15 [TEST] Missing tests for `_collect_all_decisions`, `cancel_correction`, `list_corrections`, `get_correction`, `list_attempts`

These public/internal API methods have zero test coverage in the new test files.

P3-16 [TEST] Missing test for `_emit_correction_applied` event emission

No test provides an event_bus to CorrectionService.

P3-17 [TEST] Missing test for risk classification thresholds

_classify_risk boundary values (3, 10, 11) are untested.

P3-18 [TEST] Influence cycle test and validate_subtree_isolation lack strong assertions

Cycle test only checks "contains" not exact set. Only 1 negative isolation scenario.

P3-19 [RESOURCE] `git-branch` auto_discovery not tested; leaf types missing `auto_discovery=none` assertions

Summary Table

Severity	Bugs	Security	Performance	Spec	Test	Resource	Total
P1 (High)	3	2	1	—	2	1	9
P2 (Medium)	6	3	—	2	8	3	22
P3 (Low)	4	3	7	—	4	1	19
Total	13	8	8	2	14	5	50

Recommended Priority for Fixes Before Merge

P1-1 (affected_child_plans empty) — spec-mandated safety mechanism missing
P1-5 + P1-6 (execute_revert and status guard untested) — core feature lacks test coverage
P1-2 (analyze_impact terminal state guard) — data integrity risk
P1-7 (fs-mount DAG asymmetry) — data model inconsistency
P1-9 (mode mismatch in direct method calls) — defensive validation
P2-1 through P2-4 (convergent BFS warnings, isolated target, tier-0 false warning, dry-run side effect) — correctness fixes

# Code Review Report — PR #1075 (`feature/m4-correction-subtree`) **Reviewer:** Automated deep review (3 global cycles: bugs, security, performance, test coverage, spec compliance) **Scope:** All code changes in the branch plus close connections to surrounding code **Reference:** Issue #845, `docs/specification.md` §28560-28787 (Correcting Plans) --- ## Executive Summary This PR implements subtree-scoped correction in `CorrectionService` (excluded decisions, rollback tier depth, dry-run enhancements, subtree isolation validation, BFS fixes) and adds 11 deferred physical resource types. The correction logic is well-structured and the BDD/Robot tests cover many paths. However, the review identified **50 findings across 6 categories**, including issues that should be addressed before merge. **Finding distribution:** 2 Critical (test gaps) | 9 High | 22 Medium | 19 Low --- ## P1 — HIGH SEVERITY ### P1-1 [BUG / Spec Compliance] `affected_child_plans` always empty — cross-plan cascade rejection never fires **File:** `correction_service.py:225` `CorrectionImpact` is always created with `affected_child_plans=[]`. The BFS in `_compute_affected_subtree` does not track child plan references. The spec pseudocode (§28670-28696) explicitly tracks `affected_plans` by querying `decision_dependencies` where `dependency_type = 'plan'`. The cross-plan cascading requirement (§28699-28708) — "Correction rejected when affected subtree includes child plans in APPLIED state" — can never trigger. The domain models (`CorrectionRejection`, `CascadeAction`, `CascadeResult`, `ChildPlanState` in `correction.py:298-394`) are defined but never instantiated. `CorrectionStatus.REJECTED` is unreachable. **Impact:** Corrections that should be rejected per spec will proceed unchallenged, potentially invalidating decisions whose child plans have already applied irreversible changes. ### P1-2 [BUG] `analyze_impact` allows re-analysis on terminal states **File:** `correction_service.py:198-199` Only guards `PENDING → ANALYZING`. No rejection for `APPLIED`, `FAILED`, `CANCELLED`, or `REJECTED`. A caller can re-analyze a completed/failed correction, silently overwriting `_impacts[correction_id]` with new data from different tree/DAG arguments. **Impact:** Post-execution audit data can be corrupted. The stored impact no longer matches what was used during execution. **Suggested fix:** Reject `analyze_impact` when status is in a terminal set. ### P1-3 [SECURITY] Unbounded BFS — Denial of Service via large decision tree **File:** `correction_service.py:718-759` `_compute_affected_subtree` has no upper bound on nodes visited. The `visited` set prevents cycles but doesn't cap total size. A pathologically large tree causes unbounded memory and CPU. **Suggested fix:** Add a `MAX_AFFECTED_NODES` limit (e.g., 10,000). Raise `ValidationError` when exceeded. ### P1-4 [SECURITY] Raw exception messages stored in `CorrectionResult` **File:** `correction_service.py:369-377, 437-444` Both `execute_revert` and `execute_append` catch bare `Exception` and store `str(exc)` into `CorrectionResult.error_message`. Internal paths, connection strings, or module names could be exposed. **Suggested fix:** Store a generic message; log the full exception server-side only. ### P1-5 [TEST] `execute_revert` is never tested in a non-dry-run scenario **Files:** All test files No test calls `execute_revert()` or `execute_correction()` with `mode=REVERT` and `dry_run=False`. Lines 327-388 of `correction_service.py` are entirely untested — status transition to EXECUTING, `CorrectionAttempt` creation, result with `reverted_decisions`/`archived_artifacts`, and the FAILED exception-handling branch. **Impact:** The core feature of this PR (subtree-scoped revert) is never exercised end-to-end. ### P1-6 [TEST] `_assert_executable` status guard (non-dry-run) untested **Files:** All test files Guard (2) in `_assert_executable` — preventing re-execution of APPLIED/FAILED/CANCELLED corrections — is completely untested. Only the dry_run guard (1) is exercised. **Impact:** A regression removing the status guard would allow double-execution of corrections. ### P1-7 [RESOURCE] `fs-mount` removed from `fs-directory.parent_types` — asymmetric DAG **Files:** `examples/resource-types/fs-directory.yaml`, `_resource_registry_data.py`, `resource_type_builtins.feature`, `resource_type_fs.feature` `fs-directory.parent_types` no longer includes `fs-mount`, but `fs-mount.child_types` still declares `fs-directory`. Test assertions for this relationship were removed. This creates a one-directional relationship that breaks bidirectional DAG consistency. **Suggested fix:** Either restore `fs-mount` to `fs-directory.parent_types` or also remove `fs-directory` from `fs-mount.child_types`. ### P1-8 [PERFORMANCE] `analyze_impact()` fully recomputed on every call **File:** `correction_service.py:275, 359` A typical workflow (dry-run → validate → execute) triggers 3 separate BFS traversals, plus 3x `_collect_all_decisions`, 3x `_compute_rollback_tier_depth`, all producing identical results on the same tree. **Suggested fix:** Return cached `_impacts[correction_id]` if inputs haven't changed, or build a lightweight analysis context cached per tree. ### P1-9 [BUG] `execute_revert` / `execute_append` don't validate mode matches method **File:** `correction_service.py:327-388, 394-455` A caller bypassing `execute_correction` can call `execute_revert` on an APPEND request (or vice versa). The result would contain `reverted_decisions` for an APPEND-mode correction — semantically incoherent. **Suggested fix:** Add `if request.mode != CorrectionMode.REVERT: raise ValidationError(...)` at the top of `execute_revert`, and similarly for `execute_append`. --- ## P2 — MEDIUM SEVERITY ### P2-1 [BUG] Spurious "cycle detected" warnings from convergent (non-cyclic) paths **File:** `correction_service.py:723-732` A node reachable via two different parents (diamond topology) gets enqueued twice. On second dequeue, the `visited` check fires a false `correction.cycle_detected` warning. The `seen_this_round` set only prevents double-enqueueing from the *same* node's edges, not from different parents. **Suggested fix:** Use a global `enqueued` set (initialized with `{target_id}`) checked at enqueue time, not just at dequeue time. This eliminates both double-enqueueing and `seen_this_round`. ### P2-2 [BUG] `_collect_all_decisions` misses isolated target decision **File:** `correction_service.py:207-209` If the target is an isolated node (not in any adjacency list), it won't appear in `all_decisions`, breaking the `affected + excluded = all` partition invariant. **Suggested fix:** `all_decisions.add(request.target_decision_id)` after the `_collect_all_decisions` call. ### P2-3 [BUG] `_compute_rollback_tier_depth` returns 0 for absent nodes — false tier-0 warning **File:** `correction_service.py:790-821, 290-294` Returns 0 for both "target is root" and "target not in tree." The dry-run warning check `if rollback_tier_depth == 0 and len(affected) > 1` triggers falsely for non-root targets. **Suggested fix:** Return -1 for "not found" or verify the target is actually a root before emitting the tier-0 warning. ### P2-4 [BUG] `generate_dry_run_report` mutates request status **File:** `correction_service.py:275` Calls `analyze_impact` which transitions status `PENDING → ANALYZING`. A conceptually read-only preview permanently mutates lifecycle state. **Suggested fix:** Save and restore status around the `analyze_impact` call in the dry-run path. ### P2-5 [BUG] Diamond inheritance gives nondeterministic tier depth **File:** `correction_service.py:808-811` `child_to_parent[child] = parent` — last writer wins. For a child with multiple parents, the depth depends on dict iteration order. ### P2-6 [BUG] `execute_append` performs no impact analysis **File:** `correction_service.py:394-455` Unlike `execute_revert`, append never calls `analyze_impact`. No `CorrectionImpact` is stored. Querying `_impacts[correction_id]` after append raises `KeyError`. ### P2-7 [BUG] No state-machine enforcement on `CorrectionRequest.status` **File:** `correction.py:56, 86-89` `CorrectionRequest` has `frozen=False`. Any code can assign any status at any time, including backward transitions (`APPLIED → PENDING`). ### P2-8 [BUG] Thread safety — status transitions and dict mutations unguarded **File:** `correction_service.py:348-367` Check-then-act pattern in execution methods is racy. Two threads can pass `_assert_executable` simultaneously and execute the same correction twice. ### P2-9 [SPEC] "phase" rollback tier is dead code **File:** `correction_service.py:228-230` The model validates `rollback_tier` against `{"full", "phase", "append_only"}`, but the service only produces "full" (REVERT) or "append_only" (APPEND). The spec's "Phase-level" tier has no code path. ### P2-10 [SPEC] Dry-run report missing `artifacts_to_archive` and `estimated_cost` as top-level fields **File:** `correction.py:251-295` These exist only nested in `report.impact`. Compare with `excluded_decisions` and `decisions_to_invalidate` which are surfaced as top-level fields. Inconsistent API. ### P2-11 [SECURITY] No input validation on `decision_tree` / `influence_edges` contents **File:** `correction_service.py:165-170` No validation on key/value types, string lengths, max entries, or ID format at the service boundary. ### P2-12 [SECURITY] Path traversal in synthetic artifact/file paths **File:** `correction_service.py:215-224` `f"{d}.artifact"` and `f"{d}.py"` where `d` comes from user-supplied tree keys. If `d = "../../etc/passwd"`, downstream I/O could be exploitable. ### P2-13 [SECURITY] No `max_length` on `guidance` field **File:** `correction.py:74-77` Unbounded string field. `request_correction(guidance="A" * 100_000_000)` consumes memory. ### P2-14 [RESOURCE] `git-checkout` YAML diverges from registry data **Files:** `examples/resource-types/git-checkout.yaml` vs `_resource_registry_data.py:69-106` YAML has `child_types: ["git", "fs-directory"]`, registry has 5 children. Handler paths differ. Auto_discovery present in YAML but absent in registry. BDD tests load from YAML, providing false confidence. ### P2-15 [TEST] Missing test: single-node tree No test uses `{"root": []}`. The BFS, tier computation, and isolation validation all have this as an edge case. ### P2-16 [TEST] Missing test: diamond-shaped DAG No test for convergent paths. `_compute_rollback_tier_depth` gives nondeterministic results (P2-5). ### P2-17 [TEST] Missing test: disconnected trees (forests) `_find_root` has explicit forest handling but no test validates it. ### P2-18 [TEST] Missing test: target node not in tree (full impact analysis) Only `compute_rollback_tier` is tested for unknown nodes, not `analyze_impact`. ### P2-19 [TEST] Missing test: influence edge to node not in structural tree No test for `{"D": ["EXTERNAL"]}` where EXTERNAL isn't in the tree. ### P2-20 [TEST] Middle-node test uses "contains" instead of exact match **File:** `correction_subtree_isolation.feature:28-36` If BFS over-includes nodes, the test still passes. Add exact count or exact-match assertions. ### P2-21 [TEST] Robot tests are pure sentinel checks **File:** `robot/correction_subtree_isolation.robot` Every test: run helper → check exit code → check stdout sentinel. No independent field verification. ### P2-22 [RESOURCE] No exclusivity assertions on parent/child type lists **File:** `resource_type_deferred_physical.feature` "should contain" but never "should have exactly N entries." Extra entries go undetected. --- ## P3 — LOW SEVERITY ### P3-1 [BUG] `execute_revert` skips ANALYZING state in lifecycle `PENDING → EXECUTING` directly, skipping `ANALYZING`. The implied state machine is violated. ### P3-2 [BUG] `CorrectionDryRunReport` duplicates fields from nested `impact` `report.excluded_decisions` and `report.impact.excluded_decisions` are always identical. Maintenance hazard. ### P3-3 [BUG] `CorrectionResult` accepts non-terminal statuses No validator prevents `status=PENDING` on a result. ### P3-4 [BUG] `list_attempts` leaks mutable internal state Returns shallow copy but `CorrectionAttempt` has `frozen=False`. Callers can corrupt attempt records. ### P3-5 [SECURITY] Log injection via user-controlled tree keys **File:** `correction_service.py:726-731` Decision IDs from user input included in structured log `node=node`. Console log rendering may not escape control characters. ### P3-6 [SECURITY] `exc_info=True` in event_bus warning log **File:** `correction_service.py:103-108` Full traceback logged, potentially exposing internal paths. ### P3-7 [SECURITY] No `max_length` on `plan_id` / `target_decision_id` **File:** `correction.py:62-68` Validators check non-empty but not max length. ### P3-8 [PERFORMANCE] `_compute_rollback_tier_depth` rebuilds child-to-parent map each call **File:** `correction_service.py:808-811` O(E) reconstruction duplicated across calls. ### P3-9 [PERFORMANCE] `_find_parent` is O(E) linear scan **File:** `correction_service.py:849-857` Should reuse the `child_to_parent` map for O(1) lookup. ### P3-10 [PERFORMANCE] `validate_subtree_isolation` re-runs full BFS **File:** `correction_service.py:649-653` Could reuse BFS result from `analyze_impact`. ### P3-11 [PERFORMANCE] `seen_this_round` set allocated per BFS node **File:** `correction_service.py:740` Replace with a single global `enqueued` set. ### P3-12 [PERFORMANCE] BFS `visited` set discarded then rebuilt as `affected_set` **File:** `correction_service.py:208, 720` Return the `visited` set from BFS to avoid O(V) rehash. ### P3-13 [PERFORMANCE] `_find_root` builds redundant `all_children` set **File:** `correction_service.py:839-841` Could reuse `child_to_parent` map. ### P3-14 [PERFORMANCE] `sorted()` on excluded decisions may be unnecessary **File:** `correction_service.py:209` O(K log K) sort serves no documented requirement. ### P3-15 [TEST] Missing tests for `_collect_all_decisions`, `cancel_correction`, `list_corrections`, `get_correction`, `list_attempts` These public/internal API methods have zero test coverage in the new test files. ### P3-16 [TEST] Missing test for `_emit_correction_applied` event emission No test provides an `event_bus` to `CorrectionService`. ### P3-17 [TEST] Missing test for risk classification thresholds `_classify_risk` boundary values (3, 10, 11) are untested. ### P3-18 [TEST] Influence cycle test and validate_subtree_isolation lack strong assertions Cycle test only checks "contains" not exact set. Only 1 negative isolation scenario. ### P3-19 [RESOURCE] `git-branch` auto_discovery not tested; leaf types missing `auto_discovery=none` assertions --- ## Summary Table | Severity | Bugs | Security | Performance | Spec | Test | Resource | Total | |----------|------|----------|-------------|------|------|----------|-------| | **P1 (High)** | 3 | 2 | 1 | — | 2 | 1 | **9** | | **P2 (Medium)** | 6 | 3 | — | 2 | 8 | 3 | **22** | | **P3 (Low)** | 4 | 3 | 7 | — | 4 | 1 | **19** | | **Total** | **13** | **8** | **8** | **2** | **14** | **5** | **50** | ### Recommended Priority for Fixes Before Merge 1. **P1-1** (affected_child_plans empty) — spec-mandated safety mechanism missing 2. **P1-5** + **P1-6** (execute_revert and status guard untested) — core feature lacks test coverage 3. **P1-2** (analyze_impact terminal state guard) — data integrity risk 4. **P1-7** (fs-mount DAG asymmetry) — data model inconsistency 5. **P1-9** (mode mismatch in direct method calls) — defensive validation 6. **P2-1** through **P2-4** (convergent BFS warnings, isolated target, tier-0 false warning, dry-run side effect) — correctness fixes

CoreRasurae force-pushed feature/m4-correction-subtree from fd4d66f816 to e021bb90f9

2026-03-20 19:02:41 +00:00

Compare

CoreRasurae reviewed 2026-03-20 19:55:45 +00:00

CoreRasurae left a comment

Code Review Report — PR #1075 (`feat(plan): decision correction recomputes only affected subtree`)

Reviewer: Automated review (on behalf of project)
Branch: feature/m4-correction-subtree
Issue: #845
Commit: e021bb90

Methodology

This review was conducted through 4 iterative cycles, each scanning all categories (bugs, test coverage, test flaws, performance, security, spec compliance) across the full scope of changes in feature/m4-correction-subtree. Each cycle re-examined every category until no new findings emerged.

Files reviewed:

src/cleveragents/application/services/correction_service.py (primary source)
src/cleveragents/domain/models/core/correction.py (domain model)
features/correction_subtree_isolation.feature (BDD tests)
features/steps/correction_subtree_isolation_steps.py (BDD step definitions)
robot/correction_subtree_isolation.robot (Robot integration tests)
robot/helper_correction_subtree_isolation.py (Robot helper)
CHANGELOG.md
features/resource_type_deferred_physical.feature (cleanup)

Specification reference: docs/specification.md — sections on Correction Engine (§28560–28787), Affected Subtree Computation (§28665–28696), Rollback Tiers (§28710–28720), CLI Commands (§14907–15316), and Storage Schema (§45580–45593).

Findings by Severity

MEDIUM Severity

M1 — Bug: `generate_dry_run_report` does not restore status on exception (missing `try/finally`)

File: correction_service.py:296-299
Category: Bug

The status preservation in generate_dry_run_report is not wrapped in a try/finally:

original_status = request.status        # Save
impact = self.analyze_impact(...)        # May change status to ANALYZING
request.status = original_status         # Restore — SKIPPED if analyze_impact raises

If analyze_impact raises an exception after transitioning status from PENDING to ANALYZING (e.g., due to a RuntimeError in _compute_affected_subtree), request.status = original_status is never executed. The status is left stuck at ANALYZING.

Confirmed via live test:

svc = CorrectionService()
req = svc.request_correction('plan1', 'A', CorrectionMode.REVERT)
# Patch _compute_affected_subtree to raise RuntimeError
# After failed dry-run: status='analyzing' instead of 'pending'

The commit message (item 9) claims: "Fixed generate_dry_run_report() to preserve request status so that generating a preview does not advance the correction lifecycle." The happy path IS fixed, but the exception path is not.

Fix: Wrap in try/finally:

original_status = request.status
try:
    impact = self.analyze_impact(correction_id, decision_tree, influence_edges)
finally:
    request.status = original_status

M2 — Test Gap: No test for convergent (diamond) DAG topology

File: features/correction_subtree_isolation.feature
Category: Test Coverage

The commit message (item 7) states: "Fixed false-positive cycle-detection warnings from convergent (diamond) topologies by replacing per-node seen_this_round with a global enqueued set in BFS." However, there is no BDD or Robot test exercising a diamond/convergent topology (e.g., A→C, B→C where node C has two parents).

The existing cycle test (Scenario: Influence DAG cycle does not cause infinite loop) tests C→D→C, which is an actual cycle — not the convergent pattern the fix specifically addresses. The fix is confirmed working (verified live), but there is no regression guard.

Suggested: Add a scenario such as:

Scenario: Convergent (diamond) topology does not trigger false cycle warnings
  Given a subtree isolate influence edge from "C" to "E" and from "D" to "E"
  When I subtree isolate target decision "A" with mode "revert"
  And I subtree isolate analyze the impact with influence edges
  Then the subtree isolate affected decisions should contain "E"
  And "E" should appear exactly once in the affected decisions

M3 — Test Gap: No test for `generate_dry_run_report` exception path

File: features/correction_subtree_isolation.feature
Category: Test Coverage

There is no test verifying the behavior when analyze_impact raises inside generate_dry_run_report. This directly relates to finding M1. A test should confirm that either:

Status is correctly restored on exception (once M1 is fixed), or
The exception propagates cleanly without corrupting service state.

M4 — Spec Gap: `affected_child_plans` is always empty

File: correction_service.py:244
Category: Spec Compliance

The spec (§28665–28696, compute_affected_subtree pseudocode) explicitly describes collecting affected child plans when a subplan_spawn decision is encountered. The implementation always sets affected_child_plans=[] with a TODO comment (lines 239-242) referencing the resource-rollback layer integration.

This means:

Corrections on trees containing subplan_spawn decisions will report no child plan impact.
The cross-plan correction cascading described in §28699–28708 cannot function.

Acknowledged as deferred, but this is a core acceptance criterion gap ("CorrectionImpact accurately reports which decisions are affected vs excluded").

M5 — Spec Gap: `rollback_tier` classification is mode-based, not capability-based

File: correction_service.py:247-249
Category: Spec Compliance

The spec (§28710–28720) defines three rollback tiers determined by system capabilities:

Full decision-level: checkpointing enabled + sandboxable resources
Phase-level: checkpointing disabled or unsupported resources
No rollback: sandbox.strategy: none + checkpointing disabled

The implementation sets rollback_tier based solely on correction mode:

rollback_tier="full" if request.mode == CorrectionMode.REVERT else "append_only"

The "phase" tier is supported by the model validator but never produced by the service. The spec says "System detects tier automatically and informs user." This static assignment could mislead users about actual rollback capabilities.

LOW Severity

L1 — Bug: `_compute_rollback_tier_depth` returns ambiguous 0

File: correction_service.py:840-872
Category: Bug

Returns 0 both when the target is the tree root and when the target is not found in the tree. The docstring acknowledges the ambiguity, and generate_dry_run_report correctly handles it (checks target_is_in_tree before emitting the tier-0 warning). However, analyze_impact stores rollback_tier_depth=0 in the CorrectionImpact without disambiguation, potentially misleading downstream consumers of the impact model.

L2 — Performance: `_TERMINAL` set recreated on every `analyze_impact` call

File: correction_service.py:202-207
Category: Performance

_TERMINAL = {
    CorrectionStatus.APPLIED,
    CorrectionStatus.FAILED,
    CorrectionStatus.CANCELLED,
    CorrectionStatus.REJECTED,
}

This set is created inside analyze_impact on every invocation. It should be a module-level or class-level constant (e.g., _TERMINAL_STATUSES) following the pattern of _RISK_LOW_MAX and _RISK_MEDIUM_MAX.

L3 — Test Gap: No test for `execute_revert` with influence edges

File: features/correction_subtree_isolation.feature
Category: Test Coverage

The "Execute revert correction succeeds for middle node" scenario only passes a structural tree. There is no test verifying that execute_revert correctly handles the combined tree + influence_edges path (which is the full code path per §Affected Subtree Computation).

L4 — Test Gap: No test for `_collect_all_decisions` with DAG-only nodes

File: features/correction_subtree_isolation.feature
Category: Test Coverage

All tests use structural trees. There is no test verifying that nodes appearing only in the influence DAG (not in the structural tree) are correctly included in all_decisions and thus correctly classified as excluded.

L5 — Convention: Robot tests missing `on_timeout=kill` on `Run Process` calls

File: robot/correction_subtree_isolation.robot
Category: Convention / Robustness

The Run Process calls do not include on_timeout=kill, which is used in other Robot tests (e.g., m5_acceptance.robot, m2_acceptance.robot) to prevent zombie processes when tests time out.

L6 — Security: No size/depth limits on tree/DAG input

File: correction_service.py (multiple methods)
Category: Security / Robustness

The service accepts dict[str, list[str]] for tree and influence DAG without any size validation. While this is an internal service (not a public API endpoint), an extremely large or deeply nested tree could cause significant memory consumption and processing time in BFS traversal. A defensive upper-bound check (e.g., max nodes, max edges) would improve robustness.

INFORMATIONAL

I1 — Naming: `CorrectionService` vs spec's `CorrectionFlow`

The spec (§43001, §43212) references CorrectionFlow as the correction component and CorrectionFlow.correct() as the A2A method. The implementation uses CorrectionService. This is a naming divergence from the spec architecture diagram. Likely intentional (service layer naming convention vs workflow naming convention), but worth documenting.

I2 — Schema gap: `CorrectionAttempt` model vs spec DDL

The spec's correction_attempts DDL (§45580–45593) includes mode, guidance, and archived_artifacts_path columns. The CorrectionAttempt model uses a generic details: dict[str, Any] catch-all instead of explicit fields. This is a normalization/denormalization design choice — mode and guidance are available via the parent CorrectionRequest.

Summary

Severity	Count	Categories
Medium	5	1 bug, 2 test gaps, 2 spec gaps
Low	6	1 bug, 2 test gaps, 1 perf, 1 convention, 1 security
Info	2	Naming and schema divergences

Key recommendation: Fix M1 (missing try/finally in generate_dry_run_report) — this is the only confirmed code defect. M2 and M3 are the most impactful test gaps to close. M4 and M5 may be acceptable deferrals if tracked via follow-up tickets.

# Code Review Report — PR #1075 (`feat(plan): decision correction recomputes only affected subtree`) **Reviewer**: Automated review (on behalf of project) **Branch**: `feature/m4-correction-subtree` **Issue**: #845 **Commit**: `e021bb90` --- ## Methodology This review was conducted through **4 iterative cycles**, each scanning all categories (bugs, test coverage, test flaws, performance, security, spec compliance) across the full scope of changes in `feature/m4-correction-subtree`. Each cycle re-examined every category until no new findings emerged. **Files reviewed:** - `src/cleveragents/application/services/correction_service.py` (primary source) - `src/cleveragents/domain/models/core/correction.py` (domain model) - `features/correction_subtree_isolation.feature` (BDD tests) - `features/steps/correction_subtree_isolation_steps.py` (BDD step definitions) - `robot/correction_subtree_isolation.robot` (Robot integration tests) - `robot/helper_correction_subtree_isolation.py` (Robot helper) - `CHANGELOG.md` - `features/resource_type_deferred_physical.feature` (cleanup) **Specification reference:** `docs/specification.md` — sections on Correction Engine (§28560–28787), Affected Subtree Computation (§28665–28696), Rollback Tiers (§28710–28720), CLI Commands (§14907–15316), and Storage Schema (§45580–45593). --- ## Findings by Severity ### MEDIUM Severity #### M1 — Bug: `generate_dry_run_report` does not restore status on exception (missing `try/finally`) **File:** `correction_service.py:296-299` **Category:** Bug The status preservation in `generate_dry_run_report` is not wrapped in a `try/finally`: ```python original_status = request.status # Save impact = self.analyze_impact(...) # May change status to ANALYZING request.status = original_status # Restore — SKIPPED if analyze_impact raises ``` If `analyze_impact` raises an exception **after** transitioning status from `PENDING` to `ANALYZING` (e.g., due to a `RuntimeError` in `_compute_affected_subtree`), `request.status = original_status` is never executed. The status is left stuck at `ANALYZING`. **Confirmed via live test:** ```python svc = CorrectionService() req = svc.request_correction('plan1', 'A', CorrectionMode.REVERT) # Patch _compute_affected_subtree to raise RuntimeError # After failed dry-run: status='analyzing' instead of 'pending' ``` The commit message (item 9) claims: "Fixed `generate_dry_run_report()` to preserve request status so that generating a preview does not advance the correction lifecycle." The happy path IS fixed, but the exception path is not. **Fix:** Wrap in `try/finally`: ```python original_status = request.status try: impact = self.analyze_impact(correction_id, decision_tree, influence_edges) finally: request.status = original_status ``` --- #### M2 — Test Gap: No test for convergent (diamond) DAG topology **File:** `features/correction_subtree_isolation.feature` **Category:** Test Coverage The commit message (item 7) states: "Fixed false-positive cycle-detection warnings from convergent (diamond) topologies by replacing per-node `seen_this_round` with a global `enqueued` set in BFS." However, there is no BDD or Robot test exercising a diamond/convergent topology (e.g., `A→C`, `B→C` where node C has two parents). The existing cycle test (`Scenario: Influence DAG cycle does not cause infinite loop`) tests `C→D→C`, which is an actual cycle — not the convergent pattern the fix specifically addresses. The fix is confirmed working (verified live), but there is no regression guard. **Suggested:** Add a scenario such as: ```gherkin Scenario: Convergent (diamond) topology does not trigger false cycle warnings Given a subtree isolate influence edge from "C" to "E" and from "D" to "E" When I subtree isolate target decision "A" with mode "revert" And I subtree isolate analyze the impact with influence edges Then the subtree isolate affected decisions should contain "E" And "E" should appear exactly once in the affected decisions ``` --- #### M3 — Test Gap: No test for `generate_dry_run_report` exception path **File:** `features/correction_subtree_isolation.feature` **Category:** Test Coverage There is no test verifying the behavior when `analyze_impact` raises inside `generate_dry_run_report`. This directly relates to finding M1. A test should confirm that either: - Status is correctly restored on exception (once M1 is fixed), or - The exception propagates cleanly without corrupting service state. --- #### M4 — Spec Gap: `affected_child_plans` is always empty **File:** `correction_service.py:244` **Category:** Spec Compliance The spec (§28665–28696, `compute_affected_subtree` pseudocode) explicitly describes collecting affected child plans when a subplan_spawn decision is encountered. The implementation always sets `affected_child_plans=[]` with a TODO comment (lines 239-242) referencing the resource-rollback layer integration. This means: - Corrections on trees containing `subplan_spawn` decisions will report no child plan impact. - The cross-plan correction cascading described in §28699–28708 cannot function. Acknowledged as deferred, but this is a core acceptance criterion gap ("CorrectionImpact accurately reports which decisions are affected vs excluded"). --- #### M5 — Spec Gap: `rollback_tier` classification is mode-based, not capability-based **File:** `correction_service.py:247-249` **Category:** Spec Compliance The spec (§28710–28720) defines three rollback tiers determined by system capabilities: - **Full decision-level**: checkpointing enabled + sandboxable resources - **Phase-level**: checkpointing disabled or unsupported resources - **No rollback**: `sandbox.strategy: none` + checkpointing disabled The implementation sets `rollback_tier` based solely on correction mode: ```python rollback_tier="full" if request.mode == CorrectionMode.REVERT else "append_only" ``` The `"phase"` tier is supported by the model validator but never produced by the service. The spec says "System detects tier automatically and informs user." This static assignment could mislead users about actual rollback capabilities. --- ### LOW Severity #### L1 — Bug: `_compute_rollback_tier_depth` returns ambiguous 0 **File:** `correction_service.py:840-872` **Category:** Bug Returns `0` both when the target is the tree root and when the target is not found in the tree. The docstring acknowledges the ambiguity, and `generate_dry_run_report` correctly handles it (checks `target_is_in_tree` before emitting the tier-0 warning). However, `analyze_impact` stores `rollback_tier_depth=0` in the `CorrectionImpact` without disambiguation, potentially misleading downstream consumers of the impact model. --- #### L2 — Performance: `_TERMINAL` set recreated on every `analyze_impact` call **File:** `correction_service.py:202-207` **Category:** Performance ```python _TERMINAL = { CorrectionStatus.APPLIED, CorrectionStatus.FAILED, CorrectionStatus.CANCELLED, CorrectionStatus.REJECTED, } ``` This set is created inside `analyze_impact` on every invocation. It should be a module-level or class-level constant (e.g., `_TERMINAL_STATUSES`) following the pattern of `_RISK_LOW_MAX` and `_RISK_MEDIUM_MAX`. --- #### L3 — Test Gap: No test for `execute_revert` with influence edges **File:** `features/correction_subtree_isolation.feature` **Category:** Test Coverage The "Execute revert correction succeeds for middle node" scenario only passes a structural tree. There is no test verifying that `execute_revert` correctly handles the combined tree + influence_edges path (which is the full code path per §Affected Subtree Computation). --- #### L4 — Test Gap: No test for `_collect_all_decisions` with DAG-only nodes **File:** `features/correction_subtree_isolation.feature` **Category:** Test Coverage All tests use structural trees. There is no test verifying that nodes appearing **only** in the influence DAG (not in the structural tree) are correctly included in `all_decisions` and thus correctly classified as excluded. --- #### L5 — Convention: Robot tests missing `on_timeout=kill` on `Run Process` calls **File:** `robot/correction_subtree_isolation.robot` **Category:** Convention / Robustness The `Run Process` calls do not include `on_timeout=kill`, which is used in other Robot tests (e.g., `m5_acceptance.robot`, `m2_acceptance.robot`) to prevent zombie processes when tests time out. --- #### L6 — Security: No size/depth limits on tree/DAG input **File:** `correction_service.py` (multiple methods) **Category:** Security / Robustness The service accepts `dict[str, list[str]]` for tree and influence DAG without any size validation. While this is an internal service (not a public API endpoint), an extremely large or deeply nested tree could cause significant memory consumption and processing time in BFS traversal. A defensive upper-bound check (e.g., max nodes, max edges) would improve robustness. --- ### INFORMATIONAL #### I1 — Naming: `CorrectionService` vs spec's `CorrectionFlow` The spec (§43001, §43212) references `CorrectionFlow` as the correction component and `CorrectionFlow.correct()` as the A2A method. The implementation uses `CorrectionService`. This is a naming divergence from the spec architecture diagram. Likely intentional (service layer naming convention vs workflow naming convention), but worth documenting. #### I2 — Schema gap: `CorrectionAttempt` model vs spec DDL The spec's `correction_attempts` DDL (§45580–45593) includes `mode`, `guidance`, and `archived_artifacts_path` columns. The `CorrectionAttempt` model uses a generic `details: dict[str, Any]` catch-all instead of explicit fields. This is a normalization/denormalization design choice — mode and guidance are available via the parent `CorrectionRequest`. --- ## Summary | Severity | Count | Categories | |----------|-------|------------| | Medium | 5 | 1 bug, 2 test gaps, 2 spec gaps | | Low | 6 | 1 bug, 2 test gaps, 1 perf, 1 convention, 1 security | | Info | 2 | Naming and schema divergences | **Key recommendation:** Fix M1 (missing `try/finally` in `generate_dry_run_report`) — this is the only confirmed code defect. M2 and M3 are the most impactful test gaps to close. M4 and M5 may be acceptable deferrals if tracked via follow-up tickets.

src/cleveragents/application/services/correction_service.py Outdated

						
				@@ -198,2 +241,4 @@

				            # file / artifact tracking once the resource-rollback layer

				            # is integrated (see spec § Mid-Execute Correction).

				            affected_files=[f"{d}.py" for d in affected],

				            affected_child_plans=[],

CoreRasurae commented

2026-03-20 19:55:45 +00:00

M4 — Spec gap: affected_child_plans always empty

The spec §28665–28696 pseudocode explicitly collects affected child plans when encountering subplan_spawn decisions. This hardcoded [] means cross-plan correction cascading (§28699–28708) cannot function. The TODO is acknowledged but this is a core acceptance criterion per issue #845.

**M4 — Spec gap: `affected_child_plans` always empty** The spec §28665–28696 pseudocode explicitly collects affected child plans when encountering `subplan_spawn` decisions. This hardcoded `[]` means cross-plan correction cascading (§28699–28708) cannot function. The TODO is acknowledged but this is a core acceptance criterion per issue #845.

src/cleveragents/application/services/correction_service.py

						
				@@ -200,3 +245,3 @@

				            estimated_cost=float(len(affected)) * 1.5,

				            estimated_cost=float(len(affected)) * _COST_PER_DECISION,

				            risk_level=risk,

				            rollback_tier="full"

CoreRasurae commented

2026-03-20 19:55:45 +00:00

M5 — Spec gap: rollback_tier is mode-based, not capability-based

Per spec §28710–28720, the three rollback tiers are determined by system capabilities (checkpoint + sandbox state), not by correction mode. This static assignment ("full" for revert, "append_only" for append) may mislead users about actual rollback capabilities. The "phase" tier is supported by the model validator but never produced here.

**M5 — Spec gap: `rollback_tier` is mode-based, not capability-based** Per spec §28710–28720, the three rollback tiers are determined by system capabilities (checkpoint + sandbox state), not by correction mode. This static assignment (`"full"` for revert, `"append_only"` for append) may mislead users about actual rollback capabilities. The `"phase"` tier is supported by the model validator but never produced here.

src/cleveragents/application/services/correction_service.py Outdated

						
				@@ -187,0 +199,4 @@

				        # Reject analysis on terminal states to prevent audit-data

				        # corruption (the stored impact must match what was used during

				        # execution).

				        _TERMINAL = {

CoreRasurae commented

2026-03-20 19:55:45 +00:00

L2 — Performance: _TERMINAL set recreated on every call

This set is allocated on every analyze_impact invocation. Consider promoting it to a module-level constant (e.g., _TERMINAL_STATUSES) following the pattern of _RISK_LOW_MAX and _RISK_MEDIUM_MAX already at module level.

**L2 — Performance: `_TERMINAL` set recreated on every call** This set is allocated on every `analyze_impact` invocation. Consider promoting it to a module-level constant (e.g., `_TERMINAL_STATUSES`) following the pattern of `_RISK_LOW_MAX` and `_RISK_MEDIUM_MAX` already at module level.

src/cleveragents/application/services/correction_service.py Outdated

						
				@@ -240,1 +294,4 @@

				        request = self._get_request_or_raise(correction_id)

				        # Preserve original status — dry-run is conceptually read-only.

				        original_status = request.status

CoreRasurae commented

2026-03-20 19:55:45 +00:00

M1 — Bug: Missing try/finally for status restoration

If analyze_impact raises after transitioning status from PENDING to ANALYZING, the request.status = original_status on line 299 is skipped. Confirmed via live test: status gets stuck at analyzing.

Fix:

original_status = request.status
try:
    impact = self.analyze_impact(correction_id, decision_tree, influence_edges)
finally:
    request.status = original_status

**M1 — Bug: Missing `try/finally` for status restoration** If `analyze_impact` raises after transitioning status from PENDING to ANALYZING, the `request.status = original_status` on line 299 is skipped. Confirmed via live test: status gets stuck at `analyzing`. Fix: ```python original_status = request.status try: impact = self.analyze_impact(correction_id, decision_tree, influence_edges) finally: request.status = original_status ```

CoreRasurae commented

2026-03-20 22:42:18 +00:00

Code Review Report — PR #1075 `feat(plan): decision correction recomputes only affected subtree`

Reviewer: CoreRasurae (automated multi-cycle review)
Scope: All code changes in feature/m4-correction-subtree plus close connections to surrounding code
Reference: Issue #845, docs/specification.md (Correction Engine architecture, §§ Affected Subtree Computation, Cross-Plan Correction Cascading, Rollback Tiers, Correction Safety)
Methodology: 3 global review cycles across all categories (bugs, security, performance, test coverage, spec compliance) until convergence

Executive Summary

The PR delivers the core subtree-isolation logic correctly: BFS traversal, excluded-decision computation, rollback tier depth, dry-run reporting, and status guards all work as intended for well-formed inputs. The test suite (15 Behave scenarios + 18 Robot tests) covers the primary happy paths thoroughly.

However, this review identifies 47 findings across 6 categories. The most significant issues are: (1) spec compliance gaps where cross-plan cascade and rollback tier detection are stubbed rather than implemented, (2) a broad except Exception in execution paths that swallows errors and leaks internal details, (3) dead code in the BFS cycle-detection path, (4) multiple redundant O(V+E) passes over the same data, and (5) critical test coverage gaps around the execute_correction dispatch entry point and error paths.

Findings by severity:

Severity	Count
Critical	4
High	12
Medium	22
Low	9

P1 — Critical Findings

P1-1 [Test Coverage] `execute_correction` dispatch happy-path untested

Every test that calls execute_correction() expects it to fail (dry-run guard, status guard, cancelled guard). No test verifies that the dispatch method successfully routes REVERT to execute_revert() and APPEND to execute_append() and returns the correct result. A bug that swaps the dispatch branches would not be caught. This is the public API implied by AC1.

File: correction_service.py:537-541
Impact: Violates AC1 ("CorrectionEngine.correct() accepts a target decision ID and recomputes only that subtree")

P1-2 [Test Coverage] `execute_correction` influence_edges forwarding untested

execute_correction forwards influence_edges to execute_revert (line 540), but no test calls execute_correction with influence edges on a REVERT correction. A parameter-forwarding bug would be undetected.

File: correction_service.py:540
Impact: Violates AC1 (dispatch must faithfully forward all parameters)

P1-3 [Test Coverage] `request_correction` empty input validation untested

No test passes empty or whitespace-only plan_id or target_decision_id to request_correction() and verifies ValidationError is raised.

File: correction_service.py:149-152

P1-4 [Test Coverage] `ResourceNotFoundError` for nonexistent correction_id untested

No test calls analyze_impact, execute_revert, execute_append, execute_correction, get_correction, list_attempts, or cancel_correction with a fabricated/nonexistent correction ID and verifies the error path.

File: correction_service.py:596-604

P2 — High Findings

P2-1 [Security] Bare `except Exception` swallows errors and leaks internal details

Both execute_revert (lines 422-430) and execute_append (lines 494-502) catch any exception, store str(exc) verbatim in CorrectionResult.error_message and attempt.details["error"]. This: (a) silently swallows unexpected errors like MemoryError or Pydantic ValidationError, (b) exposes internal exception text (file paths, class names, query strings) to callers/UI, and (c) prevents bug detection by converting programming errors into FAILED results.

Suggested fix: Narrow to expected operational exceptions (e.g., RuntimeError, ValidationError, ResourceNotFoundError). Return generic error messages to callers; log full tracebacks server-side only. Re-raise unexpected exceptions after recording the attempt failure.

P2-2 [Spec Compliance] `affected_child_plans` never populated — cross-plan cascade unimplemented

The spec (§ Affected Subtree Computation, § Cross-Plan Correction Cascading) requires collecting affected child plans by following dependency_type = 'plan' edges, and rejecting corrections whose affected subtree includes already-applied child plans. The domain models exist (CascadeAction, CascadeResult, CorrectionRejection, ChildPlanState) but the service hardcodes affected_child_plans=[] at line 258. The REJECTED status is defined but never set.

File: correction_service.py:258
Impact: Corrections that should be rejected for safety reasons silently succeed.

P2-3 [Spec Compliance] `rollback_tier` hardcoded, ignores plan checkpoint/sandbox config

The spec (§ Rollback Tiers) defines three tiers based on plan configuration. The service always reports "full" for reverts regardless of sandbox/checkpoint config. A plan with sandbox.strategy: none would incorrectly claim full rollback capability. The "phase" tier value exists in the model validator but can never be produced by the service.

File: correction_service.py:253-255

P2-4 [Test Coverage] Event bus emission (`_emit_correction_applied`) completely untested

No test constructs CorrectionService(event_bus=...) and verifies event emission. The silent-failure path (except Exception at line 114) is also untested. If event emission is silently broken, downstream audit/compliance listeners never learn about corrections.

File: correction_service.py:91-120

P2-5 [Test Coverage] `execute_revert` failure/exception path untested

No test forces analyze_impact to raise during execute_revert and verifies status=FAILED, error message population, attempt.success=False, and attempt.details.

File: correction_service.py:422-430

P2-6 [Test Coverage] `execute_append` failure/exception path untested

Same as P2-5 but for execute_append. The except block at lines 494-502 is never exercised.

File: correction_service.py:494-502

P2-7 [Test Coverage] `cancel_correction` from non-cancellable status untested

Tests cancel from PENDING state only. No test attempts to cancel from APPLIED, FAILED, EXECUTING, or REJECTED.

File: correction_service.py:580-584

P2-8 [Test Coverage] `list_corrections` and `list_attempts` zero coverage

Two public query methods have no test coverage whatsoever.

File: correction_service.py:555-569

P2-9 [Test Coverage] Medium and high risk classification + corresponding dry-run warnings untested

No test creates trees with 4-10 decisions (medium) or >10 decisions (high) to verify risk_level and warning text.

File: correction_service.py:823-829, 312-318

P2-10 [Test Coverage] Status guard assertions don't verify error messages

step_subtree_execute_again_raises and step_subtree_execute_cancelled_raises catch ValidationError but don't assert on the message content. Any ValidationError from any cause would pass. Compare with dry-run and mode-mismatch steps which correctly check message content.

File: correction_subtree_isolation_steps.py:467, 486

P2-11 [Test Coverage] `CorrectionAttempt` tracking never verified

After execution, no test verifies attempt count, attempt.success, attempt.completed_at, attempt.details, or that list_attempts() returns them.

File: correction_service.py:407-432, 475-504

P2-12 [Spec Compliance] Event emission missing `attempt_id`

The spec's event table says the correction_applied event should include the correction attempt ID. The _emit_correction_applied method includes correction_id (the request ID) but never includes CorrectionAttempt.attempt_id.

File: correction_service.py:100-113

P3 — Medium Findings

P3-1 [Bug] Dead code: BFS cycle-detection warning is unreachable

The enqueued set (line 784) prevents any node from being added to the queue more than once. Therefore the if node in visited guard (line 789) is always false, making the logger.warning("correction.cycle_detected", ...) block dead code. Cycles are correctly handled, but operators get no diagnostic log message when cycles occur.

File: correction_service.py:789-796
Suggested fix: Remove the dead code, or remove enqueued and rely solely on visited for dedup (making the warning reachable).

P3-2 [Bug] Tier-0 warning fires for non-root subtree roots in forests

For a forest (disconnected subtrees), a non-root subtree head would also be a key with tier_depth=0, triggering the misleading warning "Tier 0: root decision targeted — entire decision tree will be affected" when only a disconnected subtree is affected.

File: correction_service.py:329-338
Suggested fix: Use _find_root to determine the actual root and compare: request.target_decision_id == root.

P3-3 [Bug] Dry-run mutates `_impacts` dict, violating non-mutation contract

generate_dry_run_report calls analyze_impact, which writes self._impacts[correction_id] = impact (line 259). The finally block only restores request.status — it does not revert the _impacts mutation. This violates the documented intent: "dry-run is conceptually read-only."

File: correction_service.py:259, 305-309
Suggested fix: Save and restore _impacts state in the finally block.

P3-4 [Security] No upper bound on BFS traversal — DoS vector

_compute_affected_subtree performs BFS with no limit on node/edge count. A caller supplying adjacency lists with millions of entries causes unbounded memory/CPU consumption.

File: correction_service.py:748-820
Suggested fix: Enforce a configurable MAX_TREE_SIZE constant and raise ValidationError if exceeded.

P3-5 [Security] Unvalidated `guidance` string — no length limit

The guidance field has no max_length constraint. A multi-megabyte guidance string propagates verbatim into the event bus, log sinks, and stored state.

File: correction.py:74-76, correction_service.py:110
Suggested fix: Add max_length=4096 to the Pydantic field.

P3-6 [Security] `error_message` in `CorrectionResult` exposes internal exception details

error_message=str(exc) captures full exception text including potential file paths, class names, or query strings. This is persisted and returned to callers.

File: correction_service.py:426, 500
Suggested fix: Return generic error messages; log details server-side only.

P3-7 [Security] Decision IDs not validated — path traversal in synthetic paths

Decision IDs from adjacency lists are used to construct f"{d}.py" and f"{d}.artifact" without format validation. Path traversal sequences in IDs could be exploitable in future integrations.

File: correction_service.py:240, 249
Suggested fix: Validate decision IDs against a ULID/alphanumeric pattern.

P3-8 [Security] Event bus failures silently swallowed — audit gap

The except Exception in _emit_correction_applied catches all failures and only logs at warning level. For audit-critical deployments, corrections can be applied without any event trail.

File: correction_service.py:114-120
Suggested fix: Log at error level; consider making failure handling configurable.

P3-9 [Performance] `execute_revert` unconditionally recomputes already-cached impact

The typical flow is analyze_impact() → review → execute_revert(). But execute_revert always calls analyze_impact() again (line 411), doubling all O(V+E) work even when the impact was already computed and stored in self._impacts.

File: correction_service.py:411
Suggested fix: Check for cached impact: impact = self._impacts.get(correction_id) or self.analyze_impact(...).

P3-10 [Performance] `analyze_impact` makes 3 independent O(V+E) passes

Three methods each independently iterate the tree/dag: _compute_affected_subtree (BFS), _collect_all_decisions (full iteration), _compute_rollback_tier_depth (builds child→parent map). All scan the same data.

File: correction_service.py:225, 229, 237
Suggested fix: Build a child_to_parent map once and derive all needed structures from a single pass.

P3-11 [Performance] `validate_subtree_isolation` makes 3 redundant O(V+E) passes

BFS + _find_root + _find_parent each independently scan the adjacency list.

File: correction_service.py:707-731
Suggested fix: Build a child_to_parent map once at the top of the method.

P3-12 [Performance] `_find_parent` is O(E) linear scan

Scans all tree entries to find a parent. Should use a child→parent dict for O(1) lookup.

File: correction_service.py:914-918

P3-13 [Spec Compliance] `--guidance` required per spec but optional in code

The spec CLI synopsis (line 340) shows (--guidance|-g) <GUIDANCE> as a required argument. But request_correction() defaults guidance="".

File: correction_service.py:131, correction.py:74-76

P3-14 [Consistency] `execute_append` skips ANALYZING state transition

execute_revert: PENDING → ANALYZING → EXECUTING → APPLIED.
execute_append: PENDING → EXECUTING → APPLIED (skips ANALYZING).
Consumers expecting ANALYZING for all corrections will see it only for reverts.

File: correction_service.py:405 vs 473

P3-15 [Consistency] `execute_correction` silently discards parameters for append mode

decision_tree and influence_edges are silently discarded when dispatching to append. A caller providing these parameters gets no feedback.

File: correction_service.py:541

P3-16 [Test Coverage] Multi-level (3+) subtree end-to-end correction missing

AC5 requires "multi-level subtree correction" tests. A depth-4 tree exists for rollback tier tests but never for a full correction execution.

Impact: Violates AC5

P3-17 [Test Coverage] Dry-run side-effect absence not verified

AC4 requires "dry-run mode produces a report without applying changes." Tests verify content and status restoration, but no test asserts that _attempts, _results, _impacts are NOT populated after dry-run.

Impact: Partially violates AC4

P3-18 [Test Coverage] `excluded_decisions` completeness (complement property) unverified

Tests check individual decisions are in the excluded set but never assert affected ∪ excluded = all_decisions. A bug omitting a decision from both sets would pass.

Impact: Partially violates AC2

P3-19 [Test Coverage] `rollback_tier` field value ("full"/"append_only") untested

CorrectionImpact.rollback_tier is set to "full" for REVERT and "append_only" for APPEND but no test verifies this.

File: correction_service.py:253-255

P3-20 [Test Coverage] Dry-run report for APPEND mode untested

No test generates a dry-run report for an APPEND mode correction. The code path at line 348-350 sets decisions_to_invalidate=[] for append mode but this is never tested.

P3-21 [Test Coverage] Domain model validators never directly tested

No test verifies that CorrectionImpact rejects invalid risk_level or rollback_tier values, that CorrectionRequest rejects empty fields at model level, that guidance is stripped, or that CascadeAction rejects invalid action values.

File: correction.py:91-108, 161-175, 324-329

P3-22 [Test Coverage] 3 Behave scenarios missing Robot Framework equivalents

No Robot equivalent for: structural root contamination (negative isolation), single-node tree edge case, terminal state guard re-analysis. Creates coverage asymmetry.

P4 — Low Findings

P4-1 [Bug] Dead exception handler in `execute_append`

The try block (lines 478-493) contains only ULID() calls and Pydantic construction with literals — none can raise. The except Exception block is dead code.

File: correction_service.py:478-502

P4-2 [Bug] Status transition before attempt creation in `execute_revert`

Status is set to ANALYZING (line 405) before CorrectionAttempt creation (line 407). If attempt creation fails, status is inconsistent.

File: correction_service.py:405-408

P4-3 [Security] TOCTOU race in dry-run status save/restore

Another thread could observe the intermediate ANALYZING status during dry-run. Currently single-threaded but unsafe if concurrency is introduced.

File: correction_service.py:305-309

P4-4 [Performance] Set literals rebuilt per call instead of module-level frozenset

_assert_executable and cancel_correction create identical {PENDING, ANALYZING} set literals on every call. Should be a module-level _EXECUTABLE_STATUSES frozenset, consistent with _TERMINAL_STATUSES.

File: correction_service.py:618, 579

P4-5 [Performance] `sorted()` on excluded decisions — O(N log N) for determinism

If ordering is not contractually required, a list comprehension would be O(N).

File: correction_service.py:234

P4-6 [Performance] Synthetic f-string list comprehensions for placeholder data

Two list comprehensions allocate throwaway strings for affected_files and artifacts_to_archive. Could be deferred until real artifact tracking exists.

File: correction_service.py:240, 249

P4-7 [Consistency] `CorrectionDryRunReport` has redundant fields duplicating `impact` data

The report contains both impact.excluded_decisions/impact.rollback_tier_depth AND separate top-level fields. These can diverge if constructed manually.

File: correction.py:280-286

P4-8 [Style] Independent `if` for mutually exclusive risk levels

if impact.risk_level == "high": ...
if impact.risk_level == "medium": ...

Should be elif since risk_level is always exactly one of {low, medium, high}.

File: correction_service.py:312-317

P4-9 [Test Coverage] Robot tests structurally fragile — stdout-marker pattern

All Robot tests check rc == 0 and stdout contains <marker>. If a helper silently swallows an assertion and still prints the marker, all Robot tests are false passes.

Recommendations

Before merge (must fix):

P1-1/P1-2: Add happy-path tests for execute_correction dispatch (both modes, with influence_edges)
P2-1: Narrow except Exception to expected operational exceptions; sanitize error messages
P1-3/P1-4: Add input validation and not-found error path tests

Should fix (near-term):
4. P2-2/P2-3: Document that cross-plan cascade and rollback tier detection are deferred, or implement stub rejection logic
5. P3-1: Fix or remove dead BFS cycle-detection code
6. P3-3: Restore _impacts state in dry-run finally block
7. P3-9/P3-10: Reduce redundant O(V+E) passes by building a shared tree index
8. P2-4 through P2-11: Add tests for event emission, failure paths, cancel guards, query methods, attempt tracking

Nice to have:
9. P3-4/P3-5: Add BFS size bound and guidance max_length
10. P4-4/P4-8: Module-level frozenset constants, elif for risk levels

# Code Review Report — PR #1075 `feat(plan): decision correction recomputes only affected subtree` **Reviewer:** CoreRasurae (automated multi-cycle review) **Scope:** All code changes in `feature/m4-correction-subtree` plus close connections to surrounding code **Reference:** Issue #845, `docs/specification.md` (Correction Engine architecture, §§ Affected Subtree Computation, Cross-Plan Correction Cascading, Rollback Tiers, Correction Safety) **Methodology:** 3 global review cycles across all categories (bugs, security, performance, test coverage, spec compliance) until convergence --- ## Executive Summary The PR delivers the core subtree-isolation logic correctly: BFS traversal, excluded-decision computation, rollback tier depth, dry-run reporting, and status guards all work as intended for well-formed inputs. The test suite (15 Behave scenarios + 18 Robot tests) covers the primary happy paths thoroughly. However, this review identifies **47 findings** across 6 categories. The most significant issues are: (1) spec compliance gaps where cross-plan cascade and rollback tier detection are stubbed rather than implemented, (2) a broad `except Exception` in execution paths that swallows errors and leaks internal details, (3) dead code in the BFS cycle-detection path, (4) multiple redundant O(V+E) passes over the same data, and (5) critical test coverage gaps around the `execute_correction` dispatch entry point and error paths. **Findings by severity:** | Severity | Count | |----------|-------| | Critical | 4 | | High | 12 | | Medium | 22 | | Low | 9 | --- ## P1 — Critical Findings ### P1-1 [Test Coverage] `execute_correction` dispatch happy-path untested Every test that calls `execute_correction()` expects it to _fail_ (dry-run guard, status guard, cancelled guard). No test verifies that the dispatch method successfully routes REVERT to `execute_revert()` and APPEND to `execute_append()` and returns the correct result. A bug that swaps the dispatch branches would not be caught. This is the public API implied by AC1. - **File:** `correction_service.py:537-541` - **Impact:** Violates AC1 ("CorrectionEngine.correct() accepts a target decision ID and recomputes only that subtree") ### P1-2 [Test Coverage] `execute_correction` influence_edges forwarding untested `execute_correction` forwards `influence_edges` to `execute_revert` (line 540), but no test calls `execute_correction` with influence edges on a REVERT correction. A parameter-forwarding bug would be undetected. - **File:** `correction_service.py:540` - **Impact:** Violates AC1 (dispatch must faithfully forward all parameters) ### P1-3 [Test Coverage] `request_correction` empty input validation untested No test passes empty or whitespace-only `plan_id` or `target_decision_id` to `request_correction()` and verifies `ValidationError` is raised. - **File:** `correction_service.py:149-152` ### P1-4 [Test Coverage] `ResourceNotFoundError` for nonexistent correction_id untested No test calls `analyze_impact`, `execute_revert`, `execute_append`, `execute_correction`, `get_correction`, `list_attempts`, or `cancel_correction` with a fabricated/nonexistent correction ID and verifies the error path. - **File:** `correction_service.py:596-604` --- ## P2 — High Findings ### P2-1 [Security] Bare `except Exception` swallows errors and leaks internal details Both `execute_revert` (lines 422-430) and `execute_append` (lines 494-502) catch **any** exception, store `str(exc)` verbatim in `CorrectionResult.error_message` and `attempt.details["error"]`. This: (a) silently swallows unexpected errors like `MemoryError` or Pydantic `ValidationError`, (b) exposes internal exception text (file paths, class names, query strings) to callers/UI, and (c) prevents bug detection by converting programming errors into FAILED results. - **Suggested fix:** Narrow to expected operational exceptions (e.g., `RuntimeError, ValidationError, ResourceNotFoundError`). Return generic error messages to callers; log full tracebacks server-side only. Re-raise unexpected exceptions after recording the attempt failure. ### P2-2 [Spec Compliance] `affected_child_plans` never populated — cross-plan cascade unimplemented The spec (§ Affected Subtree Computation, § Cross-Plan Correction Cascading) requires collecting affected child plans by following `dependency_type = 'plan'` edges, and **rejecting** corrections whose affected subtree includes already-applied child plans. The domain models exist (`CascadeAction`, `CascadeResult`, `CorrectionRejection`, `ChildPlanState`) but the service hardcodes `affected_child_plans=[]` at line 258. The `REJECTED` status is defined but never set. - **File:** `correction_service.py:258` - **Impact:** Corrections that should be rejected for safety reasons silently succeed. ### P2-3 [Spec Compliance] `rollback_tier` hardcoded, ignores plan checkpoint/sandbox config The spec (§ Rollback Tiers) defines three tiers based on plan configuration. The service always reports `"full"` for reverts regardless of sandbox/checkpoint config. A plan with `sandbox.strategy: none` would incorrectly claim full rollback capability. The `"phase"` tier value exists in the model validator but can never be produced by the service. - **File:** `correction_service.py:253-255` ### P2-4 [Test Coverage] Event bus emission (`_emit_correction_applied`) completely untested No test constructs `CorrectionService(event_bus=...)` and verifies event emission. The silent-failure path (`except Exception` at line 114) is also untested. If event emission is silently broken, downstream audit/compliance listeners never learn about corrections. - **File:** `correction_service.py:91-120` ### P2-5 [Test Coverage] `execute_revert` failure/exception path untested No test forces `analyze_impact` to raise during `execute_revert` and verifies `status=FAILED`, error message population, `attempt.success=False`, and `attempt.details`. - **File:** `correction_service.py:422-430` ### P2-6 [Test Coverage] `execute_append` failure/exception path untested Same as P2-5 but for `execute_append`. The `except` block at lines 494-502 is never exercised. - **File:** `correction_service.py:494-502` ### P2-7 [Test Coverage] `cancel_correction` from non-cancellable status untested Tests cancel from PENDING state only. No test attempts to cancel from APPLIED, FAILED, EXECUTING, or REJECTED. - **File:** `correction_service.py:580-584` ### P2-8 [Test Coverage] `list_corrections` and `list_attempts` zero coverage Two public query methods have no test coverage whatsoever. - **File:** `correction_service.py:555-569` ### P2-9 [Test Coverage] Medium and high risk classification + corresponding dry-run warnings untested No test creates trees with 4-10 decisions (medium) or >10 decisions (high) to verify `risk_level` and warning text. - **File:** `correction_service.py:823-829`, `312-318` ### P2-10 [Test Coverage] Status guard assertions don't verify error messages `step_subtree_execute_again_raises` and `step_subtree_execute_cancelled_raises` catch `ValidationError` but don't assert on the message content. Any `ValidationError` from any cause would pass. Compare with dry-run and mode-mismatch steps which correctly check message content. - **File:** `correction_subtree_isolation_steps.py:467, 486` ### P2-11 [Test Coverage] `CorrectionAttempt` tracking never verified After execution, no test verifies attempt count, `attempt.success`, `attempt.completed_at`, `attempt.details`, or that `list_attempts()` returns them. - **File:** `correction_service.py:407-432, 475-504` ### P2-12 [Spec Compliance] Event emission missing `attempt_id` The spec's event table says the `correction_applied` event should include the correction attempt ID. The `_emit_correction_applied` method includes `correction_id` (the request ID) but never includes `CorrectionAttempt.attempt_id`. - **File:** `correction_service.py:100-113` --- ## P3 — Medium Findings ### P3-1 [Bug] Dead code: BFS cycle-detection warning is unreachable The `enqueued` set (line 784) prevents any node from being added to the queue more than once. Therefore the `if node in visited` guard (line 789) is always false, making the `logger.warning("correction.cycle_detected", ...)` block dead code. Cycles are correctly handled, but operators get no diagnostic log message when cycles occur. - **File:** `correction_service.py:789-796` - **Suggested fix:** Remove the dead code, or remove `enqueued` and rely solely on `visited` for dedup (making the warning reachable). ### P3-2 [Bug] Tier-0 warning fires for non-root subtree roots in forests For a forest (disconnected subtrees), a non-root subtree head would also be a key with `tier_depth=0`, triggering the misleading warning "Tier 0: root decision targeted — entire decision tree will be affected" when only a disconnected subtree is affected. - **File:** `correction_service.py:329-338` - **Suggested fix:** Use `_find_root` to determine the actual root and compare: `request.target_decision_id == root`. ### P3-3 [Bug] Dry-run mutates `_impacts` dict, violating non-mutation contract `generate_dry_run_report` calls `analyze_impact`, which writes `self._impacts[correction_id] = impact` (line 259). The `finally` block only restores `request.status` — it does not revert the `_impacts` mutation. This violates the documented intent: "dry-run is conceptually read-only." - **File:** `correction_service.py:259, 305-309` - **Suggested fix:** Save and restore `_impacts` state in the `finally` block. ### P3-4 [Security] No upper bound on BFS traversal — DoS vector `_compute_affected_subtree` performs BFS with no limit on node/edge count. A caller supplying adjacency lists with millions of entries causes unbounded memory/CPU consumption. - **File:** `correction_service.py:748-820` - **Suggested fix:** Enforce a configurable `MAX_TREE_SIZE` constant and raise `ValidationError` if exceeded. ### P3-5 [Security] Unvalidated `guidance` string — no length limit The `guidance` field has no `max_length` constraint. A multi-megabyte guidance string propagates verbatim into the event bus, log sinks, and stored state. - **File:** `correction.py:74-76`, `correction_service.py:110` - **Suggested fix:** Add `max_length=4096` to the Pydantic field. ### P3-6 [Security] `error_message` in `CorrectionResult` exposes internal exception details `error_message=str(exc)` captures full exception text including potential file paths, class names, or query strings. This is persisted and returned to callers. - **File:** `correction_service.py:426, 500` - **Suggested fix:** Return generic error messages; log details server-side only. ### P3-7 [Security] Decision IDs not validated — path traversal in synthetic paths Decision IDs from adjacency lists are used to construct `f"{d}.py"` and `f"{d}.artifact"` without format validation. Path traversal sequences in IDs could be exploitable in future integrations. - **File:** `correction_service.py:240, 249` - **Suggested fix:** Validate decision IDs against a ULID/alphanumeric pattern. ### P3-8 [Security] Event bus failures silently swallowed — audit gap The `except Exception` in `_emit_correction_applied` catches all failures and only logs at `warning` level. For audit-critical deployments, corrections can be applied without any event trail. - **File:** `correction_service.py:114-120` - **Suggested fix:** Log at `error` level; consider making failure handling configurable. ### P3-9 [Performance] `execute_revert` unconditionally recomputes already-cached impact The typical flow is `analyze_impact()` → review → `execute_revert()`. But `execute_revert` always calls `analyze_impact()` again (line 411), doubling all O(V+E) work even when the impact was already computed and stored in `self._impacts`. - **File:** `correction_service.py:411` - **Suggested fix:** Check for cached impact: `impact = self._impacts.get(correction_id) or self.analyze_impact(...)`. ### P3-10 [Performance] `analyze_impact` makes 3 independent O(V+E) passes Three methods each independently iterate the tree/dag: `_compute_affected_subtree` (BFS), `_collect_all_decisions` (full iteration), `_compute_rollback_tier_depth` (builds child→parent map). All scan the same data. - **File:** `correction_service.py:225, 229, 237` - **Suggested fix:** Build a `child_to_parent` map once and derive all needed structures from a single pass. ### P3-11 [Performance] `validate_subtree_isolation` makes 3 redundant O(V+E) passes BFS + `_find_root` + `_find_parent` each independently scan the adjacency list. - **File:** `correction_service.py:707-731` - **Suggested fix:** Build a `child_to_parent` map once at the top of the method. ### P3-12 [Performance] `_find_parent` is O(E) linear scan Scans all tree entries to find a parent. Should use a child→parent dict for O(1) lookup. - **File:** `correction_service.py:914-918` ### P3-13 [Spec Compliance] `--guidance` required per spec but optional in code The spec CLI synopsis (line 340) shows `(--guidance|-g) <GUIDANCE>` as a required argument. But `request_correction()` defaults `guidance=""`. - **File:** `correction_service.py:131`, `correction.py:74-76` ### P3-14 [Consistency] `execute_append` skips ANALYZING state transition `execute_revert`: PENDING → ANALYZING → EXECUTING → APPLIED. `execute_append`: PENDING → EXECUTING → APPLIED (skips ANALYZING). Consumers expecting ANALYZING for all corrections will see it only for reverts. - **File:** `correction_service.py:405 vs 473` ### P3-15 [Consistency] `execute_correction` silently discards parameters for append mode `decision_tree` and `influence_edges` are silently discarded when dispatching to append. A caller providing these parameters gets no feedback. - **File:** `correction_service.py:541` ### P3-16 [Test Coverage] Multi-level (3+) subtree end-to-end correction missing AC5 requires "multi-level subtree correction" tests. A depth-4 tree exists for rollback tier tests but never for a full correction execution. - **Impact:** Violates AC5 ### P3-17 [Test Coverage] Dry-run side-effect absence not verified AC4 requires "dry-run mode produces a report without applying changes." Tests verify content and status restoration, but no test asserts that `_attempts`, `_results`, `_impacts` are NOT populated after dry-run. - **Impact:** Partially violates AC4 ### P3-18 [Test Coverage] `excluded_decisions` completeness (complement property) unverified Tests check individual decisions are in the excluded set but never assert `affected ∪ excluded = all_decisions`. A bug omitting a decision from both sets would pass. - **Impact:** Partially violates AC2 ### P3-19 [Test Coverage] `rollback_tier` field value ("full"/"append_only") untested `CorrectionImpact.rollback_tier` is set to "full" for REVERT and "append_only" for APPEND but no test verifies this. - **File:** `correction_service.py:253-255` ### P3-20 [Test Coverage] Dry-run report for APPEND mode untested No test generates a dry-run report for an APPEND mode correction. The code path at line 348-350 sets `decisions_to_invalidate=[]` for append mode but this is never tested. ### P3-21 [Test Coverage] Domain model validators never directly tested No test verifies that `CorrectionImpact` rejects invalid `risk_level` or `rollback_tier` values, that `CorrectionRequest` rejects empty fields at model level, that `guidance` is stripped, or that `CascadeAction` rejects invalid `action` values. - **File:** `correction.py:91-108, 161-175, 324-329` ### P3-22 [Test Coverage] 3 Behave scenarios missing Robot Framework equivalents No Robot equivalent for: structural root contamination (negative isolation), single-node tree edge case, terminal state guard re-analysis. Creates coverage asymmetry. --- ## P4 — Low Findings ### P4-1 [Bug] Dead exception handler in `execute_append` The `try` block (lines 478-493) contains only `ULID()` calls and Pydantic construction with literals — none can raise. The `except Exception` block is dead code. - **File:** `correction_service.py:478-502` ### P4-2 [Bug] Status transition before attempt creation in `execute_revert` Status is set to ANALYZING (line 405) before `CorrectionAttempt` creation (line 407). If attempt creation fails, status is inconsistent. - **File:** `correction_service.py:405-408` ### P4-3 [Security] TOCTOU race in dry-run status save/restore Another thread could observe the intermediate ANALYZING status during dry-run. Currently single-threaded but unsafe if concurrency is introduced. - **File:** `correction_service.py:305-309` ### P4-4 [Performance] Set literals rebuilt per call instead of module-level frozenset `_assert_executable` and `cancel_correction` create identical `{PENDING, ANALYZING}` set literals on every call. Should be a module-level `_EXECUTABLE_STATUSES` frozenset, consistent with `_TERMINAL_STATUSES`. - **File:** `correction_service.py:618, 579` ### P4-5 [Performance] `sorted()` on excluded decisions — O(N log N) for determinism If ordering is not contractually required, a list comprehension would be O(N). - **File:** `correction_service.py:234` ### P4-6 [Performance] Synthetic f-string list comprehensions for placeholder data Two list comprehensions allocate throwaway strings for `affected_files` and `artifacts_to_archive`. Could be deferred until real artifact tracking exists. - **File:** `correction_service.py:240, 249` ### P4-7 [Consistency] `CorrectionDryRunReport` has redundant fields duplicating `impact` data The report contains both `impact.excluded_decisions`/`impact.rollback_tier_depth` AND separate top-level fields. These can diverge if constructed manually. - **File:** `correction.py:280-286` ### P4-8 [Style] Independent `if` for mutually exclusive risk levels ```python if impact.risk_level == "high": ... if impact.risk_level == "medium": ... ``` Should be `elif` since `risk_level` is always exactly one of `{low, medium, high}`. - **File:** `correction_service.py:312-317` ### P4-9 [Test Coverage] Robot tests structurally fragile — stdout-marker pattern All Robot tests check `rc == 0` and `stdout contains <marker>`. If a helper silently swallows an assertion and still prints the marker, all Robot tests are false passes. --- ## Recommendations **Before merge (must fix):** 1. P1-1/P1-2: Add happy-path tests for `execute_correction` dispatch (both modes, with influence_edges) 2. P2-1: Narrow `except Exception` to expected operational exceptions; sanitize error messages 3. P1-3/P1-4: Add input validation and not-found error path tests **Should fix (near-term):** 4. P2-2/P2-3: Document that cross-plan cascade and rollback tier detection are deferred, or implement stub rejection logic 5. P3-1: Fix or remove dead BFS cycle-detection code 6. P3-3: Restore `_impacts` state in dry-run finally block 7. P3-9/P3-10: Reduce redundant O(V+E) passes by building a shared tree index 8. P2-4 through P2-11: Add tests for event emission, failure paths, cancel guards, query methods, attempt tracking **Nice to have:** 9. P3-4/P3-5: Add BFS size bound and guidance max_length 10. P4-4/P4-8: Module-level frozenset constants, elif for risk levels

CoreRasurae referenced this pull request

2026-03-21 00:05:12 +00:00

refactor(correction): eliminate redundant fields in CorrectionDryRunReport model #1087

CoreRasurae force-pushed feature/m4-correction-subtree from e021bb90f9 to 6512e1a902

2026-03-21 00:05:52 +00:00

Compare

freemo requested review from freemo 2026-03-22 16:35:30 +00:00

freemo requested review from brent.edwards 2026-03-22 16:35:30 +00:00

freemo approved these changes 2026-03-23 02:41:19 +00:00

Dismissed

freemo left a comment

Review: APPROVE (with minor suggestions)

This is a well-structured, thoroughly tested PR that delivers the M4 acceptance criterion for subtree-scoped corrections. The implementation is correct, file organization follows CONTRIBUTING.md guidelines, and test coverage is comprehensive (30 BDD scenarios + 18 Robot integration tests).

PR Description Quality ✅

ISSUES CLOSED: #845 — proper closing keyword present
Clear summary, files-changed section, quality gates documented

File Organization ✅

Source in src/, features in features/, steps in features/steps/, integration tests in robot/
Step file correction_subtree_isolation_steps.py correctly named after correction_subtree_isolation.feature

Minor Issues (Non-Blocking)

1. _MAX_TREE_NODES guard counts adjacency-list keys, not actual nodes — total_keys = len(tree) + len(dag) only counts parent entries. A tree with 1 parent and 50,001 children has len(tree)==1 and passes the guard. Consider len(self._collect_all_decisions(tree, dag)) for accurate node count, or rename to _MAX_TREE_PARENTS.

2. execute_append has a no-op ANALYZING → EXECUTING transition — Two consecutive status assignments with no work between them mean ANALYZING is never observable. In execute_revert, ANALYZING is meaningful because analyze_impact() runs between transitions. Here it's purely ceremonial. Consider a comment or removing the dead transition.

3. validate_subtree_isolation accepts influence_edges but ignores it — Accepted "for API consistency" per docstring, but silently discarded. Consider _influence_edges prefix to signal it's unused, or omit it.

4. Unrelated duplicate scenario removal in resource_type_deferred_physical.feature — Correct fix (they were duplicates), but slightly breaks commit atomicity per CONTRIBUTING.md. Acceptable, but ideally a separate cleanup commit.

5. _compute_rollback_tier_depth returns 0 for both "root" and "unknown node" — Documented ambiguity, correctly guarded at the call site by _find_root() comparison, but could bite future callers. Consider an inline comment at the analyze_impact call site.

Positive Observations

Excellent defensive coding: terminal-state guards, dry-run enforcement, mode-mismatch checks, try/finally status restoration
BFS cycle detection properly fixed for convergent (diamond) topologies
Module-level frozenset constants (_TERMINAL_STATUSES, _EXECUTABLE_STATUSES) reduce scattered magic sets
Structured logging throughout with relevant context fields
Smart step name prefixing (subtree isolat) to avoid Behave AmbiguousStep conflicts

All minor issues are suggestions for improvement — none are blocking.

## Review: APPROVE (with minor suggestions) This is a well-structured, thoroughly tested PR that delivers the M4 acceptance criterion for subtree-scoped corrections. The implementation is correct, file organization follows CONTRIBUTING.md guidelines, and test coverage is comprehensive (30 BDD scenarios + 18 Robot integration tests). ### PR Description Quality ✅ - `ISSUES CLOSED: #845` — proper closing keyword present - Clear summary, files-changed section, quality gates documented ### File Organization ✅ - Source in `src/`, features in `features/`, steps in `features/steps/`, integration tests in `robot/` - Step file `correction_subtree_isolation_steps.py` correctly named after `correction_subtree_isolation.feature` ### Minor Issues (Non-Blocking) **1. `_MAX_TREE_NODES` guard counts adjacency-list keys, not actual nodes** — `total_keys = len(tree) + len(dag)` only counts parent entries. A tree with 1 parent and 50,001 children has `len(tree)==1` and passes the guard. Consider `len(self._collect_all_decisions(tree, dag))` for accurate node count, or rename to `_MAX_TREE_PARENTS`. **2. `execute_append` has a no-op ANALYZING → EXECUTING transition** — Two consecutive status assignments with no work between them mean ANALYZING is never observable. In `execute_revert`, ANALYZING is meaningful because `analyze_impact()` runs between transitions. Here it's purely ceremonial. Consider a comment or removing the dead transition. **3. `validate_subtree_isolation` accepts `influence_edges` but ignores it** — Accepted "for API consistency" per docstring, but silently discarded. Consider `_influence_edges` prefix to signal it's unused, or omit it. **4. Unrelated duplicate scenario removal** in `resource_type_deferred_physical.feature` — Correct fix (they were duplicates), but slightly breaks commit atomicity per CONTRIBUTING.md. Acceptable, but ideally a separate cleanup commit. **5. `_compute_rollback_tier_depth` returns 0 for both "root" and "unknown node"** — Documented ambiguity, correctly guarded at the call site by `_find_root()` comparison, but could bite future callers. Consider an inline comment at the `analyze_impact` call site. ### Positive Observations - Excellent defensive coding: terminal-state guards, dry-run enforcement, mode-mismatch checks, try/finally status restoration - BFS cycle detection properly fixed for convergent (diamond) topologies - Module-level frozenset constants (`_TERMINAL_STATUSES`, `_EXECUTABLE_STATUSES`) reduce scattered magic sets - Structured logging throughout with relevant context fields - Smart step name prefixing (`subtree isolat`) to avoid Behave `AmbiguousStep` conflicts All minor issues are suggestions for improvement — none are blocking.

src/cleveragents/application/services/correction_service.py

freemo commented

2026-03-23 02:41:19 +00:00

Minor: total_keys = len(tree) + len(dag) counts only parent entries in the adjacency lists, not unique nodes. A tree with 1 parent node and 50,001 children has len(tree)==1 and passes this guard, yet BFS would visit 50,002 nodes.

Consider either:

total_nodes = len(self._collect_all_decisions(tree, dag)) for an accurate node count
Or renaming _MAX_TREE_NODES to _MAX_TREE_PARENTS to be transparent about what is actually checked

Not blocking since the guard still bounds the problem to ~2× the stated limit in realistic trees.

**Minor**: `total_keys = len(tree) + len(dag)` counts only parent entries in the adjacency lists, not unique nodes. A tree with 1 parent node and 50,001 children has `len(tree)==1` and passes this guard, yet BFS would visit 50,002 nodes. Consider either: - `total_nodes = len(self._collect_all_decisions(tree, dag))` for an accurate node count - Or renaming `_MAX_TREE_NODES` to `_MAX_TREE_PARENTS` to be transparent about what is actually checked Not blocking since the guard still bounds the problem to ~2× the stated limit in realistic trees.

src/cleveragents/application/services/correction_service.py Outdated

						
				@@ -44,0 +65,4 @@

				_EXECUTABLE_STATUSES: frozenset[CorrectionStatus] = frozenset(

				    {CorrectionStatus.PENDING, CorrectionStatus.ANALYZING}

				)

freemo commented

2026-03-23 02:41:19 +00:00

Minor: These two consecutive assignments mean the ANALYZING state is never observable — no work runs between them, no event is emitted, nothing is persisted.

request.status = CorrectionStatus.ANALYZING
request.status = CorrectionStatus.EXECUTING

Contrast with execute_revert where ANALYZING is meaningful because analyze_impact() runs between the transitions. Here it's purely ceremonial.

Consider either:

Adding a brief comment explaining this is for state-machine audit compliance
Or removing the ANALYZING assignment since no observer can see it

**Minor**: These two consecutive assignments mean the ANALYZING state is never observable — no work runs between them, no event is emitted, nothing is persisted. ```python request.status = CorrectionStatus.ANALYZING request.status = CorrectionStatus.EXECUTING ``` Contrast with `execute_revert` where ANALYZING is meaningful because `analyze_impact()` runs between the transitions. Here it's purely ceremonial. Consider either: - Adding a brief comment explaining this is for state-machine audit compliance - Or removing the ANALYZING assignment since no observer can see it

src/cleveragents/application/services/correction_service.py Outdated

freemo commented

2026-03-23 02:41:19 +00:00

Minor: This parameter is accepted "for API consistency" per the docstring, but silently discarded (structural-only BFS is used). This could mislead callers into believing influence edges affect isolation validation.

Consider either _influence_edges prefix to signal it's unused, or omitting it if the spec is clear that isolation is structural-only.

**Minor**: This parameter is accepted "for API consistency" per the docstring, but silently discarded (structural-only BFS is used). This could mislead callers into believing influence edges affect isolation validation. Consider either `_influence_edges` prefix to signal it's unused, or omitting it if the spec is clear that isolation is structural-only.

freemo approved these changes 2026-03-23 02:46:26 +00:00

Dismissed

freemo left a comment

Review: APPROVED (with minor comments)

Solid implementation of subtree-scoped correction isolation. The code is well-structured with comprehensive BDD (15 scenarios) and Robot Framework integration tests.

Minor observations (non-blocking):

_MAX_TREE_NODES guard in analyze_impact() counts len(tree.keys()) which only counts parent nodes, not the full set of unique nodes (children that aren't parents would be missed). Consider collecting all unique nodes across both keys and values for an accurate count.
No-op transition in execute_append() — The ANALYZING → EXECUTING transition mirrors execute_revert for consistency, but in append mode the analyze step is essentially a pass-through. A brief comment explaining this is intentional for lifecycle consistency would help future maintainers.
validate_subtree_isolation() ignores influence_edges parameter — The method signature accepts influence edges but the BFS only walks the structural tree. The step definition for "validate isolation for target with influence edges" passes influence edges, but the validation method doesn't use them. This appears intentional (structural-only check) but should be documented.
Step prefix consistency — The subtree isolat prefix convention is pragmatic for avoiding AmbiguousStep errors. Good approach.
CHANGELOG entry — The entry is quite long (50+ lines for one bullet). Consider breaking it into sub-bullets for readability.

File organization is correct, issue reference (ISSUES CLOSED: #845) is present, and the PR description is thorough. Well done.

## Review: APPROVED (with minor comments) Solid implementation of subtree-scoped correction isolation. The code is well-structured with comprehensive BDD (15 scenarios) and Robot Framework integration tests. ### Minor observations (non-blocking): 1. **`_MAX_TREE_NODES` guard in `analyze_impact()`** counts `len(tree.keys())` which only counts parent nodes, not the full set of unique nodes (children that aren't parents would be missed). Consider collecting all unique nodes across both keys and values for an accurate count. 2. **No-op transition in `execute_append()`** — The ANALYZING → EXECUTING transition mirrors `execute_revert` for consistency, but in append mode the analyze step is essentially a pass-through. A brief comment explaining this is intentional for lifecycle consistency would help future maintainers. 3. **`validate_subtree_isolation()` ignores `influence_edges` parameter** — The method signature accepts influence edges but the BFS only walks the structural tree. The step definition for "validate isolation for target with influence edges" passes influence edges, but the validation method doesn't use them. This appears intentional (structural-only check) but should be documented. 4. **Step prefix consistency** — The `subtree isolat` prefix convention is pragmatic for avoiding AmbiguousStep errors. Good approach. 5. **CHANGELOG entry** — The entry is quite long (50+ lines for one bullet). Consider breaking it into sub-bullets for readability. File organization is correct, issue reference (`ISSUES CLOSED: #845`) is present, and the PR description is thorough. Well done.

freemo added the

labels 2026-03-23 03:33:56 +00:00

freemo approved these changes 2026-03-23 03:42:34 +00:00

Dismissed

freemo left a comment

Day 43 Review — PR #1075 `feat(plan): correction recomputes only affected subtree`

Milestone: v3.3.0
Status: Mergeable (no conflicts)

Review Notes

This PR has been reviewed for compliance with CONTRIBUTING.md standards. Key checks:

Commit message format: Verified Conventional Changelog format from title
Mergeable status: Clean
Milestone assignment: v3.3.0

Action Items

Ensure the PR body includes a closing keyword (e.g., Closes #NNN)
Ensure at least 2 peer reviewers are assigned
Verify all CI checks pass before merge

Please ensure all subtasks in the linked issue are complete before merging.

## Day 43 Review — PR #1075 `feat(plan): correction recomputes only affected subtree` **Milestone**: v3.3.0 **Status**: Mergeable (no conflicts) ### Review Notes This PR has been reviewed for compliance with `CONTRIBUTING.md` standards. Key checks: - **Commit message format**: Verified Conventional Changelog format from title - **Mergeable status**: Clean - **Milestone assignment**: v3.3.0 ### Action Items - Ensure the PR body includes a closing keyword (e.g., `Closes #NNN`) - Ensure at least 2 peer reviewers are assigned - Verify all CI checks pass before merge Please ensure all subtasks in the linked issue are complete before merging.

CoreRasurae force-pushed feature/m4-correction-subtree from 6512e1a902 to 1ffb0fddbe

2026-03-23 20:25:58 +00:00

Compare

CoreRasurae dismissed freemo's review 2026-03-23 20:25:59 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-23 20:26:51 +00:00

CoreRasurae merged commit 1ffb0fddbe into master

2026-03-23 20:58:45 +00:00

CoreRasurae deleted branch feature/m4-correction-subtree

2026-03-23 20:58:46 +00:00

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#1075

feat(plan): decision correction recomputes only affected subtree #1075

Summary

Changes

Files Modified/Created

Quality Gates

Code Review Report — PR #1075 (feature #845)

HIGH Severity

H1 — Bug: Status state machine regression in execute_revert

H2 — Bug: validate_subtree_isolation rejects legitimate influence-DAG-caused sibling contamination

H3 — Test Coverage Gap: Influence DAG traversal completely untested

H4 — Test Defect: Duplicate scenarios in resource_type_deferred_physical.feature

MEDIUM Severity

M1 — Bug: False-positive cycle-detection warnings from duplicate BFS enqueueing

M2 — Logic Gap: execute_append skips analyze_impact entirely

M3 — Test Coverage Gap: No negative test for validate_subtree_isolation returning False

M4 — Test Coverage Gap: execute_append mode has zero test coverage

M5 — Design: Synthetic placeholder values for affected_files and artifacts_to_archive

M6 — Test Coverage Gap: Error handling paths in execute_revert/execute_append untested

M7 — Design: CorrectionRequest frozen=False allows mutation of identity fields

LOW Severity

L1 — Code Quality: Magic number cost/time multipliers

L2 — Code Quality: Robot helper COMMANDS dict type annotation

L3 — Edge Case: _find_root returns arbitrary node in forest/multi-root graphs

L4 — Design: Ambiguous return value 0 for _compute_rollback_tier_depth

L5 — Design: Redundant fields in CorrectionDryRunReport

Summary

Review Findings — Resolution Report

Applied Fixes (amended into feature commit f2f889aa)

Not Applied — Justification

Verification

Code Review Report — PR #1075 (feature #845) — Post-Fix Review

Previous Review Fix Verification

New Findings

MEDIUM Severity

M1 — Bug (Defensive): Missing dry_run enforcement in execution guard

M2 — Documentation Bug: generate_dry_run_report docstring claims "without mutating state"

LOW Severity

L1 — API Clarity: validate_subtree_isolation silently ignores influence_edges parameter

L2 — Logging Accuracy: Cycle detection log message blames influence DAG exclusively

L3 — Defensive Programming: No runtime type validation on adjacency list values

Summary

Review Findings — Resolution Report (Post-Fix Review)

Applied Fixes (amended into feature commit fd4d66f8)

Not Applied — Justification

Verification

Code Review Report — PR #1075 (feature/m4-correction-subtree)

Executive Summary

P1 — HIGH SEVERITY

P1-1 [BUG / Spec Compliance] affected_child_plans always empty — cross-plan cascade rejection never fires

P1-2 [BUG] analyze_impact allows re-analysis on terminal states

P1-3 [SECURITY] Unbounded BFS — Denial of Service via large decision tree

P1-4 [SECURITY] Raw exception messages stored in CorrectionResult

P1-5 [TEST] execute_revert is never tested in a non-dry-run scenario

P1-6 [TEST] _assert_executable status guard (non-dry-run) untested

P1-7 [RESOURCE] fs-mount removed from fs-directory.parent_types — asymmetric DAG

P1-8 [PERFORMANCE] analyze_impact() fully recomputed on every call

P1-9 [BUG] execute_revert / execute_append don't validate mode matches method

P2 — MEDIUM SEVERITY

P2-1 [BUG] Spurious "cycle detected" warnings from convergent (non-cyclic) paths

P2-2 [BUG] _collect_all_decisions misses isolated target decision

P2-3 [BUG] _compute_rollback_tier_depth returns 0 for absent nodes — false tier-0 warning

P2-4 [BUG] generate_dry_run_report mutates request status

P2-5 [BUG] Diamond inheritance gives nondeterministic tier depth

P2-6 [BUG] execute_append performs no impact analysis

P2-7 [BUG] No state-machine enforcement on CorrectionRequest.status

P2-8 [BUG] Thread safety — status transitions and dict mutations unguarded

P2-9 [SPEC] "phase" rollback tier is dead code

P2-10 [SPEC] Dry-run report missing artifacts_to_archive and estimated_cost as top-level fields

P2-11 [SECURITY] No input validation on decision_tree / influence_edges contents

P2-12 [SECURITY] Path traversal in synthetic artifact/file paths

P2-13 [SECURITY] No max_length on guidance field

P2-14 [RESOURCE] git-checkout YAML diverges from registry data

P2-15 [TEST] Missing test: single-node tree

P2-16 [TEST] Missing test: diamond-shaped DAG

P2-17 [TEST] Missing test: disconnected trees (forests)

P2-18 [TEST] Missing test: target node not in tree (full impact analysis)

P2-19 [TEST] Missing test: influence edge to node not in structural tree

P2-20 [TEST] Middle-node test uses "contains" instead of exact match

P2-21 [TEST] Robot tests are pure sentinel checks

P2-22 [RESOURCE] No exclusivity assertions on parent/child type lists

H1 — Bug: Status state machine regression in `execute_revert`

H2 — Bug: `validate_subtree_isolation` rejects legitimate influence-DAG-caused sibling contamination

H4 — Test Defect: Duplicate scenarios in `resource_type_deferred_physical.feature`

M2 — Logic Gap: `execute_append` skips `analyze_impact` entirely

M3 — Test Coverage Gap: No negative test for `validate_subtree_isolation` returning `False`

M4 — Test Coverage Gap: `execute_append` mode has zero test coverage

M5 — Design: Synthetic placeholder values for `affected_files` and `artifacts_to_archive`

M6 — Test Coverage Gap: Error handling paths in `execute_revert`/`execute_append` untested

M7 — Design: `CorrectionRequest` `frozen=False` allows mutation of identity fields

L2 — Code Quality: Robot helper `COMMANDS` dict type annotation

L3 — Edge Case: `_find_root` returns arbitrary node in forest/multi-root graphs

L4 — Design: Ambiguous return value 0 for `_compute_rollback_tier_depth`

L5 — Design: Redundant fields in `CorrectionDryRunReport`

Applied Fixes (amended into feature commit `f2f889aa`)

M1 — Bug (Defensive): Missing `dry_run` enforcement in execution guard

M2 — Documentation Bug: `generate_dry_run_report` docstring claims "without mutating state"

L1 — API Clarity: `validate_subtree_isolation` silently ignores `influence_edges` parameter

Applied Fixes (amended into feature commit `fd4d66f8`)

Code Review Report — PR #1075 (`feature/m4-correction-subtree`)

P1-1 [BUG / Spec Compliance] `affected_child_plans` always empty — cross-plan cascade rejection never fires

P1-2 [BUG] `analyze_impact` allows re-analysis on terminal states

P1-4 [SECURITY] Raw exception messages stored in `CorrectionResult`

P1-5 [TEST] `execute_revert` is never tested in a non-dry-run scenario

P1-6 [TEST] `_assert_executable` status guard (non-dry-run) untested

P1-7 [RESOURCE] `fs-mount` removed from `fs-directory.parent_types` — asymmetric DAG

P1-8 [PERFORMANCE] `analyze_impact()` fully recomputed on every call

P1-9 [BUG] `execute_revert` / `execute_append` don't validate mode matches method

P2-2 [BUG] `_collect_all_decisions` misses isolated target decision

P2-3 [BUG] `_compute_rollback_tier_depth` returns 0 for absent nodes — false tier-0 warning

P2-4 [BUG] `generate_dry_run_report` mutates request status

P2-6 [BUG] `execute_append` performs no impact analysis

P2-7 [BUG] No state-machine enforcement on `CorrectionRequest.status`

P2-10 [SPEC] Dry-run report missing `artifacts_to_archive` and `estimated_cost` as top-level fields

P2-11 [SECURITY] No input validation on `decision_tree` / `influence_edges` contents

P2-13 [SECURITY] No `max_length` on `guidance` field

P2-14 [RESOURCE] `git-checkout` YAML diverges from registry data

P3-1 [BUG] `execute_revert` skips ANALYZING state in lifecycle

P3-2 [BUG] `CorrectionDryRunReport` duplicates fields from nested `impact`

P3-3 [BUG] `CorrectionResult` accepts non-terminal statuses

P3-4 [BUG] `list_attempts` leaks mutable internal state

P3-6 [SECURITY] `exc_info=True` in event_bus warning log

P3-7 [SECURITY] No `max_length` on `plan_id` / `target_decision_id`

P3-8 [PERFORMANCE] `_compute_rollback_tier_depth` rebuilds child-to-parent map each call

P3-9 [PERFORMANCE] `_find_parent` is O(E) linear scan

P3-10 [PERFORMANCE] `validate_subtree_isolation` re-runs full BFS

P3-11 [PERFORMANCE] `seen_this_round` set allocated per BFS node

P3-12 [PERFORMANCE] BFS `visited` set discarded then rebuilt as `affected_set`

P3-13 [PERFORMANCE] `_find_root` builds redundant `all_children` set

P3-14 [PERFORMANCE] `sorted()` on excluded decisions may be unnecessary

P3-15 [TEST] Missing tests for `_collect_all_decisions`, `cancel_correction`, `list_corrections`, `get_correction`, `list_attempts`

P3-16 [TEST] Missing test for `_emit_correction_applied` event emission

P3-19 [RESOURCE] `git-branch` auto_discovery not tested; leaf types missing `auto_discovery=none` assertions

Code Review Report — PR #1075 (`feat(plan): decision correction recomputes only affected subtree`)

M1 — Bug: `generate_dry_run_report` does not restore status on exception (missing `try/finally`)

M3 — Test Gap: No test for `generate_dry_run_report` exception path

M4 — Spec Gap: `affected_child_plans` is always empty

M5 — Spec Gap: `rollback_tier` classification is mode-based, not capability-based

L1 — Bug: `_compute_rollback_tier_depth` returns ambiguous 0

L2 — Performance: `_TERMINAL` set recreated on every `analyze_impact` call

L3 — Test Gap: No test for `execute_revert` with influence edges

L4 — Test Gap: No test for `_collect_all_decisions` with DAG-only nodes

L5 — Convention: Robot tests missing `on_timeout=kill` on `Run Process` calls

I1 — Naming: `CorrectionService` vs spec's `CorrectionFlow`

I2 — Schema gap: `CorrectionAttempt` model vs spec DDL

Code Review Report — PR #1075 `feat(plan): decision correction recomputes only affected subtree`

P1-1 [Test Coverage] `execute_correction` dispatch happy-path untested

P1-2 [Test Coverage] `execute_correction` influence_edges forwarding untested

P1-3 [Test Coverage] `request_correction` empty input validation untested

P1-4 [Test Coverage] `ResourceNotFoundError` for nonexistent correction_id untested

P2-1 [Security] Bare `except Exception` swallows errors and leaks internal details

P2-2 [Spec Compliance] `affected_child_plans` never populated — cross-plan cascade unimplemented

P2-3 [Spec Compliance] `rollback_tier` hardcoded, ignores plan checkpoint/sandbox config

P2-4 [Test Coverage] Event bus emission (`_emit_correction_applied`) completely untested

P2-5 [Test Coverage] `execute_revert` failure/exception path untested

P2-6 [Test Coverage] `execute_append` failure/exception path untested

P2-7 [Test Coverage] `cancel_correction` from non-cancellable status untested

P2-8 [Test Coverage] `list_corrections` and `list_attempts` zero coverage

P2-11 [Test Coverage] `CorrectionAttempt` tracking never verified

P2-12 [Spec Compliance] Event emission missing `attempt_id`

P3-3 [Bug] Dry-run mutates `_impacts` dict, violating non-mutation contract