feat(plan): decision correction recomputes only affected subtree #1075
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!1075
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m4-correction-subtree"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
Enhanced the
CorrectionServiceto properly isolate correction scope so that decision corrections recompute only the affected subtree, preserving root and sibling decisions.Changes
analyze_impact()now populatesexcluded_decisions— computes the set difference between all plan decisions and the affected subtree, ensuring root and sibling decisions are tracked.Added
rollback_tiercomputation —compute_rollback_tier()counts parent hops from the target decision to the root, enabling depth-aware rollback strategies (tier 0 = root targeted, tier N = N levels deep). Addedrollback_tier_depth: intfield toCorrectionImpact.Enhanced dry-run report — includes excluded decisions list, rollback tier depth, and tier-0 warning when the entire tree is affected (root targeted with multiple decisions).
Added subtree isolation validation —
validate_subtree_isolation()confirms root exclusion and sibling non-contamination for non-root corrections.Files Modified/Created
src/cleveragents/domain/models/core/correction.py— Addedexcluded_decisionsandrollback_tier_depthfields toCorrectionImpactandCorrectionDryRunReportsrc/cleveragents/application/services/correction_service.py— Enhancedanalyze_impact(),generate_dry_run_report(), addedcompute_rollback_tier(),validate_subtree_isolation(), and helper methodsfeatures/correction_subtree_isolation.feature— 15 BDD scenariosfeatures/steps/correction_subtree_isolation_steps.py— Step definitionsrobot/correction_subtree_isolation.robot— 8 Robot Framework integration testsrobot/helper_correction_subtree_isolation.py— Robot helper scriptQuality Gates
ISSUES CLOSED: #845
Code Review Report — PR #1075 (feature #845)
Scope: All code changes in
feature/m4-correction-subtreeand their immediate surrounding context.Reviewed against: Issue #845 acceptance criteria,
docs/specification.md(§ Correcting Plans, § Affected Subtree Computation), and PR #1075 description.Review methodology: Three full review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws), each sweep covering all categories globally, repeated until no new findings emerged.
HIGH Severity
H1 — Bug: Status state machine regression in
execute_revertFile:
src/cleveragents/application/services/correction_service.py:339,345→192Description:
execute_revert()setsrequest.status = CorrectionStatus.EXECUTING(line 339), then immediately callsanalyze_impact()(line 345), which overwrites status toCorrectionStatus.ANALYZING(line 192). This causes a backward state transition:PENDING → EXECUTING → ANALYZING → APPLIED. Any concurrent observer or audit log would see the correction regress from EXECUTING to ANALYZING, violating the expected lifecycle ordering defined inCorrectionStatus.Suggestion: Call
analyze_impact()before transitioning to EXECUTING, or guardanalyze_impact()so it only sets ANALYZING when the current status is PENDING.H2 — Bug:
validate_subtree_isolationrejects legitimate influence-DAG-caused sibling contaminationFile:
src/cleveragents/application/services/correction_service.py:631-644Description: The validation checks that siblings of the target are NOT in the affected set (invariant 2, line 637-638). However,
_compute_affected_subtreetraverses both the structural tree AND the influence DAG. If the influence DAG legitimately causes a sibling to be affected (e.g., a dependency edge from a descendant of the target to a sibling), the validation would incorrectly returnFalse, rejecting a valid correction. Perdocs/specification.mdline 28667: the affected subtree is "computed by walking the influence DAG (not just the structural tree)" — influence-caused sibling inclusion is expected behavior.Suggestion: The sibling check should only consider structural-tree siblings that are affected via the structural tree, OR the invariant should be relaxed to allow influence-DAG-caused contamination. Document the design decision either way.
H3 — Test Coverage Gap: Influence DAG traversal completely untested
File: All test files (
features/correction_subtree_isolation.feature,features/steps/correction_subtree_isolation_steps.py,robot/helper_correction_subtree_isolation.py)Description: The
_compute_affected_subtreemethod has dual-traversal logic (structural tree + influence DAG) — this is a core spec requirement perdocs/specification.md§28667-28696. However, NO test (Behave or Robot) passes theinfluence_edgesparameter. The entire influence-DAG traversal path is untested. This is the most critical test coverage gap since influence-based cascading is fundamental to the correction feature's correctness.Suggestion: Add test scenarios that supply
influence_edgesalongsidedecision_treeand verify that:H4 — Test Defect: Duplicate scenarios in
resource_type_deferred_physical.featureFile:
features/resource_type_deferred_physical.feature:266-286Description: The "as_cli_dict regression: both auto_discovery schemas" section (lines 266-275) is duplicated verbatim at lines 277-286, including the identical section comment. This causes Behave to run 4 duplicate scenarios. While not a test failure, it inflates coverage metrics and obscures the actual test count.
Suggestion: Remove lines 277-286 (the second copy).
MEDIUM Severity
M1 — Bug: False-positive cycle-detection warnings from duplicate BFS enqueueing
File:
src/cleveragents/application/services/correction_service.py:682-706Description: In
_compute_affected_subtree, bothstructural_childrenandinfluence_dependentsare iterated separately (lines 701-706). If both lists contain the same node and it hasn't been visited yet, it gets enqueued twice. When the second copy is dequeued,node in visitedis true, and the logger emits a"correction.cycle_detected"warning (line 685-690). This is a false positive — there's no actual cycle, just a node reachable via two different edge types. In production with influence edges active, this could produce noisy false cycle warnings.Suggestion: Merge the two neighbor sets before enqueueing:
for neighbor in set(structural_children) | set(influence_dependents):, or suppress the warning when the duplicate came from multi-edge-type reachability rather than a true cycle.M2 — Logic Gap:
execute_appendskipsanalyze_impactentirelyFile:
src/cleveragents/application/services/correction_service.py:380-441Description: Unlike
execute_revertwhich callsanalyze_impact(),execute_appendcreates a result without any impact analysis. This meansself._impacts[correction_id]is never populated for append corrections, and there's no record of what the correction affects. Per the spec, both modes should understand the affected subtree — append still needs to know context for the spawned child plan.Suggestion: Consider calling
analyze_impact()inexecute_appendas well, or document why append mode intentionally skips it.M3 — Test Coverage Gap: No negative test for
validate_subtree_isolationreturningFalseFile:
features/correction_subtree_isolation.feature,robot/helper_correction_subtree_isolation.pyDescription: All tests for
validate_subtree_isolationverify it returnsTrue. There is no test that creates a scenario where the validation SHOULD fail and checks theFalsereturn path. This leaves the violation-detection branch (lines 624-629, 638-644) ofvalidate_subtree_isolationuntested.Suggestion: Add a test with a manually constructed affected set that includes the root or a sibling, to exercise the failure return path.
M4 — Test Coverage Gap:
execute_appendmode has zero test coverageFile: All test files
Description: Neither Behave nor Robot tests exercise the append correction flow (
CorrectionMode.APPEND). All correction tests useCorrectionMode.REVERT. The append path inexecute_append()(lines 380-441) and the dispatch logic inexecute_correction()are untested.Suggestion: Add at least one scenario testing append mode — verifying that a child plan ID is spawned and the original decision is preserved.
M5 — Design: Synthetic placeholder values for
affected_filesandartifacts_to_archiveFile:
src/cleveragents/application/services/correction_service.py:208,213Description:
affected_filesis computed as[f"{d}.py" for d in affected]andartifacts_to_archiveas[f"{d}.artifact" for d in affected]. These are synthetic values derived from decision IDs, not real file paths or artifact references. Consumers ofCorrectionImpactmay interpret these as real data. The synthetic data also flows intoCorrectionResult.archived_artifactsvia the revert path.Suggestion: Use empty lists and a TODO comment until real file/artifact tracking is implemented, or document these fields as "placeholder" in the field description.
M6 — Test Coverage Gap: Error handling paths in
execute_revert/execute_appenduntestedFile:
src/cleveragents/application/services/correction_service.py:355-363, 422-430Description: Both execution methods have
except Exceptionblocks that setCorrectionStatus.FAILED, populate error messages, and mark attempts as unsuccessful. These code paths have no test coverage.Suggestion: Add a test that causes an exception during execution (e.g., mock a failure) and verify the FAILED status and error_message are set correctly.
M7 — Design:
CorrectionRequestfrozen=Falseallows mutation of identity fieldsFile:
src/cleveragents/domain/models/core/correction.py:56Description:
CorrectionRequesthasmodel_config = ConfigDict(frozen=False)to allowstatusmutations. However, this also allows mutation ofplan_id,target_decision_id,mode, andcorrection_idafter creation. These fields should be immutable-by-design after construction, as changing them would corrupt the correction's identity and any dependent data in_impacts/_attempts/_results.Suggestion: Consider using Pydantic's
@propertyor__setattr__override pattern to allow onlystatusto be mutated while keeping other fields frozen. Alternatively, document this as a known trade-off.LOW Severity
L1 — Code Quality: Magic number cost/time multipliers
File:
src/cleveragents/application/services/correction_service.py:215,284Description:
estimated_cost = float(len(affected)) * 1.5andrecompute_seconds = float(len(affected)) * 2.0use unexplained magic constants.Suggestion: Extract to named module-level constants (e.g.,
_COST_PER_DECISION = 1.5,_RECOMPUTE_SECONDS_PER_DECISION = 2.0).L2 — Code Quality: Robot helper
COMMANDSdict type annotationFile:
robot/helper_correction_subtree_isolation.py:181Description:
COMMANDS: dict[str, object]should bedict[str, Callable[[], None]]for accurate type checking. The current annotation allows non-callable values, and theif callable(func)guard at line 201 is only needed because of the loose typing.Suggestion: Change to
Callable[[], None]and remove the runtime callable check.L3 — Edge Case:
_find_rootreturns arbitrary node in forest/multi-root graphsFile:
src/cleveragents/application/services/correction_service.py:773-788Description: If the decision tree is a forest (multiple disconnected subtrees),
_find_rootreturns the first parent not in any child list. For dict insertion order, this is deterministic but semantically arbitrary. The fallback at line 788 (next(iter(tree))) handles the degenerate case where every node is someone's child (i.e., a cycle).Suggestion: Document this limitation or raise an error when multiple roots are detected.
L4 — Design: Ambiguous return value 0 for
_compute_rollback_tier_depthFile:
src/cleveragents/application/services/correction_service.py:748-771Description: Returns
0for both "target is the root" and "target not found in tree." These are semantically different outcomes but indistinguishable. The test atrobot/helper_correction_subtree_isolation.py:173explicitly expects0for an unknown node.Suggestion: Consider returning
-1orNonefor unknown nodes, or document that0means "root or unknown."L5 — Design: Redundant fields in
CorrectionDryRunReportFile:
src/cleveragents/domain/models/core/correction.py:280-287Description:
CorrectionDryRunReporthasexcluded_decisionsandrollback_tier_depthas top-level fields that duplicatereport.impact.excluded_decisionsandreport.impact.rollback_tier_depth. This is intentional convenience access, but increases the surface area for inconsistency.Suggestion: No change needed if intentional — but a comment noting the duplication rationale would help.
Summary
The core subtree isolation logic is well-structured and the BFS algorithm with cycle detection is sound. The main concerns are: (1) the status state machine regression in
execute_revert, (2) the interaction betweenvalidate_subtree_isolationand the influence DAG, (3) the complete absence of influence-DAG test coverage despite it being a spec requirement, and (4) duplicate test scenarios. The resource type changes (deferred physical types, fs-directory updates) are clean and consistent with their YAML definitions.dd820af598f2f889aa3bReview Findings — Resolution Report
Following the code review posted above, each finding was validated against issue #845 acceptance criteria,
docs/specification.md, andCONTRIBUTING.md. Findings were either applied as fixes or deferred with justification.Applied Fixes (amended into feature commit
f2f889aa)analyze_impact()status transition: only sets ANALYZING when status is PENDING, preventing backward EXECUTING→ANALYZING regressionvalidate_subtree_isolation()now uses structural-only BFS for sibling invariant checks, so influence-DAG-caused sibling reachability is not misreported as a violationas_cli_dictregression scenarios (lines 277-286) inresource_type_deferred_physical.featureseen_this_roundset to prevent double-enqueueing from structural + influence edges while preserving deterministic visit orderaffected_files/artifacts_to_archiveplaceholders_COST_PER_DECISIONand_RECOMPUTE_SECONDS_PER_DECISIONnamed constantsCOMMANDStype annotation todict[str, Callable[[], None]], removed redundant callable guard_find_rootforest behavior and_compute_rollback_tier_depthambiguous return valueNot Applied — Justification
execute_appendskipsanalyze_impactfeatures/consolidated_correction.feature(scenarios: "execute_revert catches internal error and returns FAILED result", "execute_append catches internal error and returns FAILED result"). These were verified to pass with the amended code. No additional error tests needed.CorrectionRequestfrozen=Falseallows identity field mutationstatusmutation while freezing other fields would require a custom__setattr__override or separating the model into two (immutable identity + mutable state). This is a significant refactor beyond the scope of #845 and would affect all existing consumers. Documented as a known trade-off.CorrectionDryRunReportexcluded_decisionsandrollback_tier_depthprovide quick access without drilling intoreport.impact.*. This is a standard API design pattern for report objects.Verification
nox -s lint: All checks passednox -s typecheck: 0 errors (1 pre-existing warning)nox -s unit_tests(TEST_PROCESSES=9): 400 features, 11,511 scenarios, 0 failures (+3 new scenarios from review fixes)Code Review Report — PR #1075 (feature #845) — Post-Fix Review
Scope: All code changes in
feature/m4-correction-subtree(commitf2f889aa) and their immediate surrounding context.Reviewed against: Issue #845 acceptance criteria,
docs/specification.md(§ Correcting Plans, § Affected Subtree Computation, § Rollback Tiers), PR #1075 description, and the previous review resolution report.Methodology: Three full global review cycles across all categories (bugs/logic, spec compliance, security, performance, test coverage/flaws, documentation), repeated until no new findings emerged. Production call-sites (
cli/commands/plan.py,container.py) and existing test suites (consolidated_correction.feature,correction_service_coverage.feature,correction_flows.robot,influence_dag_traversal.robot, et al.) were cross-referenced.Previous Review Fix Verification
All 12 applied fixes from the previous review (H1–H4, M1, M3–M5, L1–L4) were verified as correctly implemented in the current code:
analyze_impact()(line 198): only transitions PENDING→ANALYZINGexecute_revert()sets EXECUTING before callinganalyze_impact(), guard correctly skipsvalidate_subtree_isolation()uses structural-only BFS (line 635–638)influence_edges=Nonepassed to internal BFSas_cli_dictscenarios removed fromresource_type_deferred_physical.featureseen_this_roundset prevents double-enqueueing in BFS (line 726–734)affected_files/artifacts_to_archive(line 220–223)_COST_PER_DECISIONand_RECOMPUTE_SECONDS_PER_DECISIONnamed constants (lines 47–48)COMMANDStype annotation changed todict[str, Callable[[], None]]_find_rootdocstring notes forest behavior (lines 815–821)_compute_rollback_tier_depthdocstring notes ambiguous zero return (lines 786–792)All "Not Applied" justifications (M2, M6, M7, L5) from the resolution report remain reasonable.
New Findings
Five additional findings surfaced during post-fix review. None are HIGH severity.
MEDIUM Severity
M1 — Bug (Defensive): Missing
dry_runenforcement in execution guardFile:
correction_service.py:547–554(_assert_executable)Description:
_assert_executable()checks thatrequest.statusis in{PENDING, ANALYZING}but does not checkrequest.dry_run. A caller can create a correction withdry_run=Trueand then successfully callexecute_revert()orexecute_append()— the flag is stored on the model (line 147) but never enforced.The CLI (
plan.py:2758–2812) handles this correctly via separate code branches, so the current production path is safe. However, the service API contract is broken: thedry_runparameter onrequest_correction()promises "only impact analysis will be performed" (line 129) but the service layer does not enforce this promise.Suggestion: Add a
dry_runcheck in_assert_executable():M2 — Documentation Bug:
generate_dry_run_reportdocstring claims "without mutating state"File:
correction_service.py:256Description: The docstring reads "Generate a dry-run report without mutating state" but the method calls
analyze_impact()which:request.statusfrom PENDING → ANALYZING (line 198–199)self._impacts[correction_id](line 234)Both are observable state mutations. The docstring is factually incorrect and could mislead future callers into treating this as a pure/idempotent operation (e.g., calling it repeatedly with different trees and expecting no side effects).
Suggestion: Change the docstring to: "Generate a dry-run report without executing the correction." This accurately describes the guarantee (no decision invalidation, no artifact archival, no child plan spawning) without overpromising about internal state.
LOW Severity
L1 — API Clarity:
validate_subtree_isolationsilently ignoresinfluence_edgesparameterFile:
correction_service.py:603,638Description: The method signature accepts
influence_edges: dict[str, list[str]] | None = Nonebut always passesinfluence_edges=Noneto the internal_compute_affected_subtree()call (line 638). This is intentional per the H2 fix (isolation checks use structural-only BFS), but callers receive no indication that the parameter has no effect.The Behave test "Isolation validation passes when influence DAG reaches sibling" passes
influence_edgesto this method, which exercises the correct behavior but obscures the fact that the parameter is completely ignored.Suggestion: Either (a) add a note to the
influence_edgesparameter docstring: "Accepted for API consistency but not used in isolation checks (structural-only BFS is used per spec § Affected Subtree Computation).", or (b) remove the parameter and update callers.L2 — Logging Accuracy: Cycle detection log message blames influence DAG exclusively
File:
correction_service.py:714–717Description: The cycle detection warning reads:
"Skipping already-visited node during BFS (possible cycle in influence DAG)". With theseen_this_roundfix (M1), this warning only fires for genuine cycles. However, cycles can also exist in the structural tree (e.g., the negative test creates{root→A, A→root}). The message incorrectly attributes all cycles to the influence DAG.Suggestion: Change to:
"Skipping already-visited node during BFS (possible cycle in decision tree or influence DAG)".L3 — Defensive Programming: No runtime type validation on adjacency list values
File:
correction_service.py:727–734(BFS inner loop)Description:
decision_treeandinfluence_edgesare typed asdict[str, list[str]]but Python does not enforce type hints at runtime. If a caller passes malformed values (e.g.,{"root": "ABC"}— a string instead of a list), the BFSfor neighbor in tree.get(node, [])would iterate over individual characters ("A","B","C"), producing silently incorrect results.In a properly typed codebase with mypy/pyright, this would be caught statically. The risk is low in practice since the production caller (
plan.py) constructs these fromDecisionServicequeries.Suggestion: No immediate action required. If desired, a runtime guard could be added:
assert isinstance(children, list)in the BFS loop, or Pydantic validation on a typed wrapper.Summary
The code is in good shape after the previous review fixes. The core subtree isolation logic, BFS traversal, rollback tier computation, and isolation validation are all correct. The
seen_this_roundBFS fix, structural-only isolation check, and status guard are properly implemented. All 12 previous fixes verified. The remaining findings are defensive hardening and documentation accuracy improvements — none block merge.Review Findings — Resolution Report (Post-Fix Review)
Following the post-fix code review, each finding was validated against issue #845 acceptance criteria,
docs/specification.md, andCONTRIBUTING.md. Findings were either applied as fixes or deferred with justification.Applied Fixes (amended into feature commit
fd4d66f8)dry_runenforcement guard in_assert_executable(): raisesValidationErrorwhenrequest.dry_runisTrue, preventing execution of dry-run-only corrections per spec §plan correct --dry-run: "Show impact without executing"and CONTRIBUTING.md § Argument Validation ("Invalid States: Verify object state is valid for the operation"). Added 2 Behave scenarios (revert + append dry-run enforcement) and 1 Robot test case.generate_dry_run_report()docstring: changed "without mutating state" to "without executing the correction" with explicit note aboutanalyze_impact()side-effects (status transition, impact storage). Per CONTRIBUTING.md § Documentation Standards: "Update documentation alongside code changes."validate_subtree_isolation()influence_edgesparameter: "Accepted for API consistency but not used in isolation checks — structural-only BFS is used per the specification (§ Affected Subtree Computation)."Not Applied — Justification
decision_tree/influence_edgesvalues (strings instead of lists would cause silent BFS character iteration)DecisionService.list_decisions,DecisionService.get_influence_edges) — this is a typed internal boundary, not an untyped external interface. The existing type annotations + pyright enforcement are the correct mechanism per CONTRIBUTING.md guidelines.Verification
nox -s lint: All checks passednox -s typecheck: 0 errors (1 pre-existing warning)nox -s unit_tests(TEST_PROCESSES=9): 400 features, 11,513 scenarios, 0 failures (+2 new scenarios from review fixes)nox -s integration_tests(TEST_PROCESSES=9): All passed (including newDry Run EnforcementRobot test)f2f889aa3bfd4d66f816Code Review Report — PR #1075 (
feature/m4-correction-subtree)Reviewer: Automated deep review (3 global cycles: bugs, security, performance, test coverage, spec compliance)
Scope: All code changes in the branch plus close connections to surrounding code
Reference: Issue #845,
docs/specification.md§28560-28787 (Correcting Plans)Executive Summary
This PR implements subtree-scoped correction in
CorrectionService(excluded decisions, rollback tier depth, dry-run enhancements, subtree isolation validation, BFS fixes) and adds 11 deferred physical resource types. The correction logic is well-structured and the BDD/Robot tests cover many paths. However, the review identified 50 findings across 6 categories, including issues that should be addressed before merge.Finding distribution: 2 Critical (test gaps) | 9 High | 22 Medium | 19 Low
P1 — HIGH SEVERITY
P1-1 [BUG / Spec Compliance]
affected_child_plansalways empty — cross-plan cascade rejection never firesFile:
correction_service.py:225CorrectionImpactis always created withaffected_child_plans=[]. The BFS in_compute_affected_subtreedoes not track child plan references. The spec pseudocode (§28670-28696) explicitly tracksaffected_plansby queryingdecision_dependencieswheredependency_type = 'plan'. The cross-plan cascading requirement (§28699-28708) — "Correction rejected when affected subtree includes child plans in APPLIED state" — can never trigger. The domain models (CorrectionRejection,CascadeAction,CascadeResult,ChildPlanStateincorrection.py:298-394) are defined but never instantiated.CorrectionStatus.REJECTEDis unreachable.Impact: Corrections that should be rejected per spec will proceed unchallenged, potentially invalidating decisions whose child plans have already applied irreversible changes.
P1-2 [BUG]
analyze_impactallows re-analysis on terminal statesFile:
correction_service.py:198-199Only guards
PENDING → ANALYZING. No rejection forAPPLIED,FAILED,CANCELLED, orREJECTED. A caller can re-analyze a completed/failed correction, silently overwriting_impacts[correction_id]with new data from different tree/DAG arguments.Impact: Post-execution audit data can be corrupted. The stored impact no longer matches what was used during execution.
Suggested fix: Reject
analyze_impactwhen status is in a terminal set.P1-3 [SECURITY] Unbounded BFS — Denial of Service via large decision tree
File:
correction_service.py:718-759_compute_affected_subtreehas no upper bound on nodes visited. Thevisitedset prevents cycles but doesn't cap total size. A pathologically large tree causes unbounded memory and CPU.Suggested fix: Add a
MAX_AFFECTED_NODESlimit (e.g., 10,000). RaiseValidationErrorwhen exceeded.P1-4 [SECURITY] Raw exception messages stored in
CorrectionResultFile:
correction_service.py:369-377, 437-444Both
execute_revertandexecute_appendcatch bareExceptionand storestr(exc)intoCorrectionResult.error_message. Internal paths, connection strings, or module names could be exposed.Suggested fix: Store a generic message; log the full exception server-side only.
P1-5 [TEST]
execute_revertis never tested in a non-dry-run scenarioFiles: All test files
No test calls
execute_revert()orexecute_correction()withmode=REVERTanddry_run=False. Lines 327-388 ofcorrection_service.pyare entirely untested — status transition to EXECUTING,CorrectionAttemptcreation, result withreverted_decisions/archived_artifacts, and the FAILED exception-handling branch.Impact: The core feature of this PR (subtree-scoped revert) is never exercised end-to-end.
P1-6 [TEST]
_assert_executablestatus guard (non-dry-run) untestedFiles: All test files
Guard (2) in
_assert_executable— preventing re-execution of APPLIED/FAILED/CANCELLED corrections — is completely untested. Only the dry_run guard (1) is exercised.Impact: A regression removing the status guard would allow double-execution of corrections.
P1-7 [RESOURCE]
fs-mountremoved fromfs-directory.parent_types— asymmetric DAGFiles:
examples/resource-types/fs-directory.yaml,_resource_registry_data.py,resource_type_builtins.feature,resource_type_fs.featurefs-directory.parent_typesno longer includesfs-mount, butfs-mount.child_typesstill declaresfs-directory. Test assertions for this relationship were removed. This creates a one-directional relationship that breaks bidirectional DAG consistency.Suggested fix: Either restore
fs-mounttofs-directory.parent_typesor also removefs-directoryfromfs-mount.child_types.P1-8 [PERFORMANCE]
analyze_impact()fully recomputed on every callFile:
correction_service.py:275, 359A typical workflow (dry-run → validate → execute) triggers 3 separate BFS traversals, plus 3x
_collect_all_decisions, 3x_compute_rollback_tier_depth, all producing identical results on the same tree.Suggested fix: Return cached
_impacts[correction_id]if inputs haven't changed, or build a lightweight analysis context cached per tree.P1-9 [BUG]
execute_revert/execute_appenddon't validate mode matches methodFile:
correction_service.py:327-388, 394-455A caller bypassing
execute_correctioncan callexecute_reverton an APPEND request (or vice versa). The result would containreverted_decisionsfor an APPEND-mode correction — semantically incoherent.Suggested fix: Add
if request.mode != CorrectionMode.REVERT: raise ValidationError(...)at the top ofexecute_revert, and similarly forexecute_append.P2 — MEDIUM SEVERITY
P2-1 [BUG] Spurious "cycle detected" warnings from convergent (non-cyclic) paths
File:
correction_service.py:723-732A node reachable via two different parents (diamond topology) gets enqueued twice. On second dequeue, the
visitedcheck fires a falsecorrection.cycle_detectedwarning. Theseen_this_roundset only prevents double-enqueueing from the same node's edges, not from different parents.Suggested fix: Use a global
enqueuedset (initialized with{target_id}) checked at enqueue time, not just at dequeue time. This eliminates both double-enqueueing andseen_this_round.P2-2 [BUG]
_collect_all_decisionsmisses isolated target decisionFile:
correction_service.py:207-209If the target is an isolated node (not in any adjacency list), it won't appear in
all_decisions, breaking theaffected + excluded = allpartition invariant.Suggested fix:
all_decisions.add(request.target_decision_id)after the_collect_all_decisionscall.P2-3 [BUG]
_compute_rollback_tier_depthreturns 0 for absent nodes — false tier-0 warningFile:
correction_service.py:790-821, 290-294Returns 0 for both "target is root" and "target not in tree." The dry-run warning check
if rollback_tier_depth == 0 and len(affected) > 1triggers falsely for non-root targets.Suggested fix: Return -1 for "not found" or verify the target is actually a root before emitting the tier-0 warning.
P2-4 [BUG]
generate_dry_run_reportmutates request statusFile:
correction_service.py:275Calls
analyze_impactwhich transitions statusPENDING → ANALYZING. A conceptually read-only preview permanently mutates lifecycle state.Suggested fix: Save and restore status around the
analyze_impactcall in the dry-run path.P2-5 [BUG] Diamond inheritance gives nondeterministic tier depth
File:
correction_service.py:808-811child_to_parent[child] = parent— last writer wins. For a child with multiple parents, the depth depends on dict iteration order.P2-6 [BUG]
execute_appendperforms no impact analysisFile:
correction_service.py:394-455Unlike
execute_revert, append never callsanalyze_impact. NoCorrectionImpactis stored. Querying_impacts[correction_id]after append raisesKeyError.P2-7 [BUG] No state-machine enforcement on
CorrectionRequest.statusFile:
correction.py:56, 86-89CorrectionRequesthasfrozen=False. Any code can assign any status at any time, including backward transitions (APPLIED → PENDING).P2-8 [BUG] Thread safety — status transitions and dict mutations unguarded
File:
correction_service.py:348-367Check-then-act pattern in execution methods is racy. Two threads can pass
_assert_executablesimultaneously and execute the same correction twice.P2-9 [SPEC] "phase" rollback tier is dead code
File:
correction_service.py:228-230The model validates
rollback_tieragainst{"full", "phase", "append_only"}, but the service only produces "full" (REVERT) or "append_only" (APPEND). The spec's "Phase-level" tier has no code path.P2-10 [SPEC] Dry-run report missing
artifacts_to_archiveandestimated_costas top-level fieldsFile:
correction.py:251-295These exist only nested in
report.impact. Compare withexcluded_decisionsanddecisions_to_invalidatewhich are surfaced as top-level fields. Inconsistent API.P2-11 [SECURITY] No input validation on
decision_tree/influence_edgescontentsFile:
correction_service.py:165-170No validation on key/value types, string lengths, max entries, or ID format at the service boundary.
P2-12 [SECURITY] Path traversal in synthetic artifact/file paths
File:
correction_service.py:215-224f"{d}.artifact"andf"{d}.py"wheredcomes from user-supplied tree keys. Ifd = "../../etc/passwd", downstream I/O could be exploitable.P2-13 [SECURITY] No
max_lengthonguidancefieldFile:
correction.py:74-77Unbounded string field.
request_correction(guidance="A" * 100_000_000)consumes memory.P2-14 [RESOURCE]
git-checkoutYAML diverges from registry dataFiles:
examples/resource-types/git-checkout.yamlvs_resource_registry_data.py:69-106YAML has
child_types: ["git", "fs-directory"], registry has 5 children. Handler paths differ. Auto_discovery present in YAML but absent in registry. BDD tests load from YAML, providing false confidence.P2-15 [TEST] Missing test: single-node tree
No test uses
{"root": []}. The BFS, tier computation, and isolation validation all have this as an edge case.P2-16 [TEST] Missing test: diamond-shaped DAG
No test for convergent paths.
_compute_rollback_tier_depthgives nondeterministic results (P2-5).P2-17 [TEST] Missing test: disconnected trees (forests)
_find_roothas explicit forest handling but no test validates it.P2-18 [TEST] Missing test: target node not in tree (full impact analysis)
Only
compute_rollback_tieris tested for unknown nodes, notanalyze_impact.P2-19 [TEST] Missing test: influence edge to node not in structural tree
No test for
{"D": ["EXTERNAL"]}where EXTERNAL isn't in the tree.P2-20 [TEST] Middle-node test uses "contains" instead of exact match
File:
correction_subtree_isolation.feature:28-36If BFS over-includes nodes, the test still passes. Add exact count or exact-match assertions.
P2-21 [TEST] Robot tests are pure sentinel checks
File:
robot/correction_subtree_isolation.robotEvery test: run helper → check exit code → check stdout sentinel. No independent field verification.
P2-22 [RESOURCE] No exclusivity assertions on parent/child type lists
File:
resource_type_deferred_physical.feature"should contain" but never "should have exactly N entries." Extra entries go undetected.
P3 — LOW SEVERITY
P3-1 [BUG]
execute_revertskips ANALYZING state in lifecyclePENDING → EXECUTINGdirectly, skippingANALYZING. The implied state machine is violated.P3-2 [BUG]
CorrectionDryRunReportduplicates fields from nestedimpactreport.excluded_decisionsandreport.impact.excluded_decisionsare always identical. Maintenance hazard.P3-3 [BUG]
CorrectionResultaccepts non-terminal statusesNo validator prevents
status=PENDINGon a result.P3-4 [BUG]
list_attemptsleaks mutable internal stateReturns shallow copy but
CorrectionAttempthasfrozen=False. Callers can corrupt attempt records.P3-5 [SECURITY] Log injection via user-controlled tree keys
File:
correction_service.py:726-731Decision IDs from user input included in structured log
node=node. Console log rendering may not escape control characters.P3-6 [SECURITY]
exc_info=Truein event_bus warning logFile:
correction_service.py:103-108Full traceback logged, potentially exposing internal paths.
P3-7 [SECURITY] No
max_lengthonplan_id/target_decision_idFile:
correction.py:62-68Validators check non-empty but not max length.
P3-8 [PERFORMANCE]
_compute_rollback_tier_depthrebuilds child-to-parent map each callFile:
correction_service.py:808-811O(E) reconstruction duplicated across calls.
P3-9 [PERFORMANCE]
_find_parentis O(E) linear scanFile:
correction_service.py:849-857Should reuse the
child_to_parentmap for O(1) lookup.P3-10 [PERFORMANCE]
validate_subtree_isolationre-runs full BFSFile:
correction_service.py:649-653Could reuse BFS result from
analyze_impact.P3-11 [PERFORMANCE]
seen_this_roundset allocated per BFS nodeFile:
correction_service.py:740Replace with a single global
enqueuedset.P3-12 [PERFORMANCE] BFS
visitedset discarded then rebuilt asaffected_setFile:
correction_service.py:208, 720Return the
visitedset from BFS to avoid O(V) rehash.P3-13 [PERFORMANCE]
_find_rootbuilds redundantall_childrensetFile:
correction_service.py:839-841Could reuse
child_to_parentmap.P3-14 [PERFORMANCE]
sorted()on excluded decisions may be unnecessaryFile:
correction_service.py:209O(K log K) sort serves no documented requirement.
P3-15 [TEST] Missing tests for
_collect_all_decisions,cancel_correction,list_corrections,get_correction,list_attemptsThese public/internal API methods have zero test coverage in the new test files.
P3-16 [TEST] Missing test for
_emit_correction_appliedevent emissionNo test provides an
event_bustoCorrectionService.P3-17 [TEST] Missing test for risk classification thresholds
_classify_riskboundary values (3, 10, 11) are untested.P3-18 [TEST] Influence cycle test and validate_subtree_isolation lack strong assertions
Cycle test only checks "contains" not exact set. Only 1 negative isolation scenario.
P3-19 [RESOURCE]
git-branchauto_discovery not tested; leaf types missingauto_discovery=noneassertionsSummary Table
Recommended Priority for Fixes Before Merge
fd4d66f816e021bb90f9Code Review Report — PR #1075 (
feat(plan): decision correction recomputes only affected subtree)Reviewer: Automated review (on behalf of project)
Branch:
feature/m4-correction-subtreeIssue: #845
Commit:
e021bb90Methodology
This review was conducted through 4 iterative cycles, each scanning all categories (bugs, test coverage, test flaws, performance, security, spec compliance) across the full scope of changes in
feature/m4-correction-subtree. Each cycle re-examined every category until no new findings emerged.Files reviewed:
src/cleveragents/application/services/correction_service.py(primary source)src/cleveragents/domain/models/core/correction.py(domain model)features/correction_subtree_isolation.feature(BDD tests)features/steps/correction_subtree_isolation_steps.py(BDD step definitions)robot/correction_subtree_isolation.robot(Robot integration tests)robot/helper_correction_subtree_isolation.py(Robot helper)CHANGELOG.mdfeatures/resource_type_deferred_physical.feature(cleanup)Specification reference:
docs/specification.md— sections on Correction Engine (§28560–28787), Affected Subtree Computation (§28665–28696), Rollback Tiers (§28710–28720), CLI Commands (§14907–15316), and Storage Schema (§45580–45593).Findings by Severity
MEDIUM Severity
M1 — Bug:
generate_dry_run_reportdoes not restore status on exception (missingtry/finally)File:
correction_service.py:296-299Category: Bug
The status preservation in
generate_dry_run_reportis not wrapped in atry/finally:If
analyze_impactraises an exception after transitioning status fromPENDINGtoANALYZING(e.g., due to aRuntimeErrorin_compute_affected_subtree),request.status = original_statusis never executed. The status is left stuck atANALYZING.Confirmed via live test:
The commit message (item 9) claims: "Fixed
generate_dry_run_report()to preserve request status so that generating a preview does not advance the correction lifecycle." The happy path IS fixed, but the exception path is not.Fix: Wrap in
try/finally:M2 — Test Gap: No test for convergent (diamond) DAG topology
File:
features/correction_subtree_isolation.featureCategory: Test Coverage
The commit message (item 7) states: "Fixed false-positive cycle-detection warnings from convergent (diamond) topologies by replacing per-node
seen_this_roundwith a globalenqueuedset in BFS." However, there is no BDD or Robot test exercising a diamond/convergent topology (e.g.,A→C,B→Cwhere node C has two parents).The existing cycle test (
Scenario: Influence DAG cycle does not cause infinite loop) testsC→D→C, which is an actual cycle — not the convergent pattern the fix specifically addresses. The fix is confirmed working (verified live), but there is no regression guard.Suggested: Add a scenario such as:
M3 — Test Gap: No test for
generate_dry_run_reportexception pathFile:
features/correction_subtree_isolation.featureCategory: Test Coverage
There is no test verifying the behavior when
analyze_impactraises insidegenerate_dry_run_report. This directly relates to finding M1. A test should confirm that either:M4 — Spec Gap:
affected_child_plansis always emptyFile:
correction_service.py:244Category: Spec Compliance
The spec (§28665–28696,
compute_affected_subtreepseudocode) explicitly describes collecting affected child plans when a subplan_spawn decision is encountered. The implementation always setsaffected_child_plans=[]with a TODO comment (lines 239-242) referencing the resource-rollback layer integration.This means:
subplan_spawndecisions will report no child plan impact.Acknowledged as deferred, but this is a core acceptance criterion gap ("CorrectionImpact accurately reports which decisions are affected vs excluded").
M5 — Spec Gap:
rollback_tierclassification is mode-based, not capability-basedFile:
correction_service.py:247-249Category: Spec Compliance
The spec (§28710–28720) defines three rollback tiers determined by system capabilities:
sandbox.strategy: none+ checkpointing disabledThe implementation sets
rollback_tierbased solely on correction mode:The
"phase"tier is supported by the model validator but never produced by the service. The spec says "System detects tier automatically and informs user." This static assignment could mislead users about actual rollback capabilities.LOW Severity
L1 — Bug:
_compute_rollback_tier_depthreturns ambiguous 0File:
correction_service.py:840-872Category: Bug
Returns
0both when the target is the tree root and when the target is not found in the tree. The docstring acknowledges the ambiguity, andgenerate_dry_run_reportcorrectly handles it (checkstarget_is_in_treebefore emitting the tier-0 warning). However,analyze_impactstoresrollback_tier_depth=0in theCorrectionImpactwithout disambiguation, potentially misleading downstream consumers of the impact model.L2 — Performance:
_TERMINALset recreated on everyanalyze_impactcallFile:
correction_service.py:202-207Category: Performance
This set is created inside
analyze_impacton every invocation. It should be a module-level or class-level constant (e.g.,_TERMINAL_STATUSES) following the pattern of_RISK_LOW_MAXand_RISK_MEDIUM_MAX.L3 — Test Gap: No test for
execute_revertwith influence edgesFile:
features/correction_subtree_isolation.featureCategory: Test Coverage
The "Execute revert correction succeeds for middle node" scenario only passes a structural tree. There is no test verifying that
execute_revertcorrectly handles the combined tree + influence_edges path (which is the full code path per §Affected Subtree Computation).L4 — Test Gap: No test for
_collect_all_decisionswith DAG-only nodesFile:
features/correction_subtree_isolation.featureCategory: Test Coverage
All tests use structural trees. There is no test verifying that nodes appearing only in the influence DAG (not in the structural tree) are correctly included in
all_decisionsand thus correctly classified as excluded.L5 — Convention: Robot tests missing
on_timeout=killonRun ProcesscallsFile:
robot/correction_subtree_isolation.robotCategory: Convention / Robustness
The
Run Processcalls do not includeon_timeout=kill, which is used in other Robot tests (e.g.,m5_acceptance.robot,m2_acceptance.robot) to prevent zombie processes when tests time out.L6 — Security: No size/depth limits on tree/DAG input
File:
correction_service.py(multiple methods)Category: Security / Robustness
The service accepts
dict[str, list[str]]for tree and influence DAG without any size validation. While this is an internal service (not a public API endpoint), an extremely large or deeply nested tree could cause significant memory consumption and processing time in BFS traversal. A defensive upper-bound check (e.g., max nodes, max edges) would improve robustness.INFORMATIONAL
I1 — Naming:
CorrectionServicevs spec'sCorrectionFlowThe spec (§43001, §43212) references
CorrectionFlowas the correction component andCorrectionFlow.correct()as the A2A method. The implementation usesCorrectionService. This is a naming divergence from the spec architecture diagram. Likely intentional (service layer naming convention vs workflow naming convention), but worth documenting.I2 — Schema gap:
CorrectionAttemptmodel vs spec DDLThe spec's
correction_attemptsDDL (§45580–45593) includesmode,guidance, andarchived_artifacts_pathcolumns. TheCorrectionAttemptmodel uses a genericdetails: dict[str, Any]catch-all instead of explicit fields. This is a normalization/denormalization design choice — mode and guidance are available via the parentCorrectionRequest.Summary
Key recommendation: Fix M1 (missing
try/finallyingenerate_dry_run_report) — this is the only confirmed code defect. M2 and M3 are the most impactful test gaps to close. M4 and M5 may be acceptable deferrals if tracked via follow-up tickets.@ -198,2 +241,4 @@# file / artifact tracking once the resource-rollback layer# is integrated (see spec § Mid-Execute Correction).affected_files=[f"{d}.py" for d in affected],affected_child_plans=[],M4 — Spec gap:
affected_child_plansalways emptyThe spec §28665–28696 pseudocode explicitly collects affected child plans when encountering
subplan_spawndecisions. This hardcoded[]means cross-plan correction cascading (§28699–28708) cannot function. The TODO is acknowledged but this is a core acceptance criterion per issue #845.@ -200,3 +245,3 @@estimated_cost=float(len(affected)) * 1.5,estimated_cost=float(len(affected)) * _COST_PER_DECISION,risk_level=risk,rollback_tier="full"M5 — Spec gap:
rollback_tieris mode-based, not capability-basedPer spec §28710–28720, the three rollback tiers are determined by system capabilities (checkpoint + sandbox state), not by correction mode. This static assignment (
"full"for revert,"append_only"for append) may mislead users about actual rollback capabilities. The"phase"tier is supported by the model validator but never produced here.@ -187,0 +199,4 @@# Reject analysis on terminal states to prevent audit-data# corruption (the stored impact must match what was used during# execution)._TERMINAL = {L2 — Performance:
_TERMINALset recreated on every callThis set is allocated on every
analyze_impactinvocation. Consider promoting it to a module-level constant (e.g.,_TERMINAL_STATUSES) following the pattern of_RISK_LOW_MAXand_RISK_MEDIUM_MAXalready at module level.@ -240,1 +294,4 @@request = self._get_request_or_raise(correction_id)# Preserve original status — dry-run is conceptually read-only.original_status = request.statusM1 — Bug: Missing
try/finallyfor status restorationIf
analyze_impactraises after transitioning status from PENDING to ANALYZING, therequest.status = original_statuson line 299 is skipped. Confirmed via live test: status gets stuck atanalyzing.Fix:
Code Review Report — PR #1075
feat(plan): decision correction recomputes only affected subtreeReviewer: CoreRasurae (automated multi-cycle review)
Scope: All code changes in
feature/m4-correction-subtreeplus close connections to surrounding codeReference: Issue #845,
docs/specification.md(Correction Engine architecture, §§ Affected Subtree Computation, Cross-Plan Correction Cascading, Rollback Tiers, Correction Safety)Methodology: 3 global review cycles across all categories (bugs, security, performance, test coverage, spec compliance) until convergence
Executive Summary
The PR delivers the core subtree-isolation logic correctly: BFS traversal, excluded-decision computation, rollback tier depth, dry-run reporting, and status guards all work as intended for well-formed inputs. The test suite (15 Behave scenarios + 18 Robot tests) covers the primary happy paths thoroughly.
However, this review identifies 47 findings across 6 categories. The most significant issues are: (1) spec compliance gaps where cross-plan cascade and rollback tier detection are stubbed rather than implemented, (2) a broad
except Exceptionin execution paths that swallows errors and leaks internal details, (3) dead code in the BFS cycle-detection path, (4) multiple redundant O(V+E) passes over the same data, and (5) critical test coverage gaps around theexecute_correctiondispatch entry point and error paths.Findings by severity:
P1 — Critical Findings
P1-1 [Test Coverage]
execute_correctiondispatch happy-path untestedEvery test that calls
execute_correction()expects it to fail (dry-run guard, status guard, cancelled guard). No test verifies that the dispatch method successfully routes REVERT toexecute_revert()and APPEND toexecute_append()and returns the correct result. A bug that swaps the dispatch branches would not be caught. This is the public API implied by AC1.correction_service.py:537-541P1-2 [Test Coverage]
execute_correctioninfluence_edges forwarding untestedexecute_correctionforwardsinfluence_edgestoexecute_revert(line 540), but no test callsexecute_correctionwith influence edges on a REVERT correction. A parameter-forwarding bug would be undetected.correction_service.py:540P1-3 [Test Coverage]
request_correctionempty input validation untestedNo test passes empty or whitespace-only
plan_idortarget_decision_idtorequest_correction()and verifiesValidationErroris raised.correction_service.py:149-152P1-4 [Test Coverage]
ResourceNotFoundErrorfor nonexistent correction_id untestedNo test calls
analyze_impact,execute_revert,execute_append,execute_correction,get_correction,list_attempts, orcancel_correctionwith a fabricated/nonexistent correction ID and verifies the error path.correction_service.py:596-604P2 — High Findings
P2-1 [Security] Bare
except Exceptionswallows errors and leaks internal detailsBoth
execute_revert(lines 422-430) andexecute_append(lines 494-502) catch any exception, storestr(exc)verbatim inCorrectionResult.error_messageandattempt.details["error"]. This: (a) silently swallows unexpected errors likeMemoryErroror PydanticValidationError, (b) exposes internal exception text (file paths, class names, query strings) to callers/UI, and (c) prevents bug detection by converting programming errors into FAILED results.RuntimeError, ValidationError, ResourceNotFoundError). Return generic error messages to callers; log full tracebacks server-side only. Re-raise unexpected exceptions after recording the attempt failure.P2-2 [Spec Compliance]
affected_child_plansnever populated — cross-plan cascade unimplementedThe spec (§ Affected Subtree Computation, § Cross-Plan Correction Cascading) requires collecting affected child plans by following
dependency_type = 'plan'edges, and rejecting corrections whose affected subtree includes already-applied child plans. The domain models exist (CascadeAction,CascadeResult,CorrectionRejection,ChildPlanState) but the service hardcodesaffected_child_plans=[]at line 258. TheREJECTEDstatus is defined but never set.correction_service.py:258P2-3 [Spec Compliance]
rollback_tierhardcoded, ignores plan checkpoint/sandbox configThe spec (§ Rollback Tiers) defines three tiers based on plan configuration. The service always reports
"full"for reverts regardless of sandbox/checkpoint config. A plan withsandbox.strategy: nonewould incorrectly claim full rollback capability. The"phase"tier value exists in the model validator but can never be produced by the service.correction_service.py:253-255P2-4 [Test Coverage] Event bus emission (
_emit_correction_applied) completely untestedNo test constructs
CorrectionService(event_bus=...)and verifies event emission. The silent-failure path (except Exceptionat line 114) is also untested. If event emission is silently broken, downstream audit/compliance listeners never learn about corrections.correction_service.py:91-120P2-5 [Test Coverage]
execute_revertfailure/exception path untestedNo test forces
analyze_impactto raise duringexecute_revertand verifiesstatus=FAILED, error message population,attempt.success=False, andattempt.details.correction_service.py:422-430P2-6 [Test Coverage]
execute_appendfailure/exception path untestedSame as P2-5 but for
execute_append. Theexceptblock at lines 494-502 is never exercised.correction_service.py:494-502P2-7 [Test Coverage]
cancel_correctionfrom non-cancellable status untestedTests cancel from PENDING state only. No test attempts to cancel from APPLIED, FAILED, EXECUTING, or REJECTED.
correction_service.py:580-584P2-8 [Test Coverage]
list_correctionsandlist_attemptszero coverageTwo public query methods have no test coverage whatsoever.
correction_service.py:555-569P2-9 [Test Coverage] Medium and high risk classification + corresponding dry-run warnings untested
No test creates trees with 4-10 decisions (medium) or >10 decisions (high) to verify
risk_leveland warning text.correction_service.py:823-829,312-318P2-10 [Test Coverage] Status guard assertions don't verify error messages
step_subtree_execute_again_raisesandstep_subtree_execute_cancelled_raisescatchValidationErrorbut don't assert on the message content. AnyValidationErrorfrom any cause would pass. Compare with dry-run and mode-mismatch steps which correctly check message content.correction_subtree_isolation_steps.py:467, 486P2-11 [Test Coverage]
CorrectionAttempttracking never verifiedAfter execution, no test verifies attempt count,
attempt.success,attempt.completed_at,attempt.details, or thatlist_attempts()returns them.correction_service.py:407-432, 475-504P2-12 [Spec Compliance] Event emission missing
attempt_idThe spec's event table says the
correction_appliedevent should include the correction attempt ID. The_emit_correction_appliedmethod includescorrection_id(the request ID) but never includesCorrectionAttempt.attempt_id.correction_service.py:100-113P3 — Medium Findings
P3-1 [Bug] Dead code: BFS cycle-detection warning is unreachable
The
enqueuedset (line 784) prevents any node from being added to the queue more than once. Therefore theif node in visitedguard (line 789) is always false, making thelogger.warning("correction.cycle_detected", ...)block dead code. Cycles are correctly handled, but operators get no diagnostic log message when cycles occur.correction_service.py:789-796enqueuedand rely solely onvisitedfor dedup (making the warning reachable).P3-2 [Bug] Tier-0 warning fires for non-root subtree roots in forests
For a forest (disconnected subtrees), a non-root subtree head would also be a key with
tier_depth=0, triggering the misleading warning "Tier 0: root decision targeted — entire decision tree will be affected" when only a disconnected subtree is affected.correction_service.py:329-338_find_rootto determine the actual root and compare:request.target_decision_id == root.P3-3 [Bug] Dry-run mutates
_impactsdict, violating non-mutation contractgenerate_dry_run_reportcallsanalyze_impact, which writesself._impacts[correction_id] = impact(line 259). Thefinallyblock only restoresrequest.status— it does not revert the_impactsmutation. This violates the documented intent: "dry-run is conceptually read-only."correction_service.py:259, 305-309_impactsstate in thefinallyblock.P3-4 [Security] No upper bound on BFS traversal — DoS vector
_compute_affected_subtreeperforms BFS with no limit on node/edge count. A caller supplying adjacency lists with millions of entries causes unbounded memory/CPU consumption.correction_service.py:748-820MAX_TREE_SIZEconstant and raiseValidationErrorif exceeded.P3-5 [Security] Unvalidated
guidancestring — no length limitThe
guidancefield has nomax_lengthconstraint. A multi-megabyte guidance string propagates verbatim into the event bus, log sinks, and stored state.correction.py:74-76,correction_service.py:110max_length=4096to the Pydantic field.P3-6 [Security]
error_messageinCorrectionResultexposes internal exception detailserror_message=str(exc)captures full exception text including potential file paths, class names, or query strings. This is persisted and returned to callers.correction_service.py:426, 500P3-7 [Security] Decision IDs not validated — path traversal in synthetic paths
Decision IDs from adjacency lists are used to construct
f"{d}.py"andf"{d}.artifact"without format validation. Path traversal sequences in IDs could be exploitable in future integrations.correction_service.py:240, 249P3-8 [Security] Event bus failures silently swallowed — audit gap
The
except Exceptionin_emit_correction_appliedcatches all failures and only logs atwarninglevel. For audit-critical deployments, corrections can be applied without any event trail.correction_service.py:114-120errorlevel; consider making failure handling configurable.P3-9 [Performance]
execute_revertunconditionally recomputes already-cached impactThe typical flow is
analyze_impact()→ review →execute_revert(). Butexecute_revertalways callsanalyze_impact()again (line 411), doubling all O(V+E) work even when the impact was already computed and stored inself._impacts.correction_service.py:411impact = self._impacts.get(correction_id) or self.analyze_impact(...).P3-10 [Performance]
analyze_impactmakes 3 independent O(V+E) passesThree methods each independently iterate the tree/dag:
_compute_affected_subtree(BFS),_collect_all_decisions(full iteration),_compute_rollback_tier_depth(builds child→parent map). All scan the same data.correction_service.py:225, 229, 237child_to_parentmap once and derive all needed structures from a single pass.P3-11 [Performance]
validate_subtree_isolationmakes 3 redundant O(V+E) passesBFS +
_find_root+_find_parenteach independently scan the adjacency list.correction_service.py:707-731child_to_parentmap once at the top of the method.P3-12 [Performance]
_find_parentis O(E) linear scanScans all tree entries to find a parent. Should use a child→parent dict for O(1) lookup.
correction_service.py:914-918P3-13 [Spec Compliance]
--guidancerequired per spec but optional in codeThe spec CLI synopsis (line 340) shows
(--guidance|-g) <GUIDANCE>as a required argument. Butrequest_correction()defaultsguidance="".correction_service.py:131,correction.py:74-76P3-14 [Consistency]
execute_appendskips ANALYZING state transitionexecute_revert: PENDING → ANALYZING → EXECUTING → APPLIED.execute_append: PENDING → EXECUTING → APPLIED (skips ANALYZING).Consumers expecting ANALYZING for all corrections will see it only for reverts.
correction_service.py:405 vs 473P3-15 [Consistency]
execute_correctionsilently discards parameters for append modedecision_treeandinfluence_edgesare silently discarded when dispatching to append. A caller providing these parameters gets no feedback.correction_service.py:541P3-16 [Test Coverage] Multi-level (3+) subtree end-to-end correction missing
AC5 requires "multi-level subtree correction" tests. A depth-4 tree exists for rollback tier tests but never for a full correction execution.
P3-17 [Test Coverage] Dry-run side-effect absence not verified
AC4 requires "dry-run mode produces a report without applying changes." Tests verify content and status restoration, but no test asserts that
_attempts,_results,_impactsare NOT populated after dry-run.P3-18 [Test Coverage]
excluded_decisionscompleteness (complement property) unverifiedTests check individual decisions are in the excluded set but never assert
affected ∪ excluded = all_decisions. A bug omitting a decision from both sets would pass.P3-19 [Test Coverage]
rollback_tierfield value ("full"/"append_only") untestedCorrectionImpact.rollback_tieris set to "full" for REVERT and "append_only" for APPEND but no test verifies this.correction_service.py:253-255P3-20 [Test Coverage] Dry-run report for APPEND mode untested
No test generates a dry-run report for an APPEND mode correction. The code path at line 348-350 sets
decisions_to_invalidate=[]for append mode but this is never tested.P3-21 [Test Coverage] Domain model validators never directly tested
No test verifies that
CorrectionImpactrejects invalidrisk_levelorrollback_tiervalues, thatCorrectionRequestrejects empty fields at model level, thatguidanceis stripped, or thatCascadeActionrejects invalidactionvalues.correction.py:91-108, 161-175, 324-329P3-22 [Test Coverage] 3 Behave scenarios missing Robot Framework equivalents
No Robot equivalent for: structural root contamination (negative isolation), single-node tree edge case, terminal state guard re-analysis. Creates coverage asymmetry.
P4 — Low Findings
P4-1 [Bug] Dead exception handler in
execute_appendThe
tryblock (lines 478-493) contains onlyULID()calls and Pydantic construction with literals — none can raise. Theexcept Exceptionblock is dead code.correction_service.py:478-502P4-2 [Bug] Status transition before attempt creation in
execute_revertStatus is set to ANALYZING (line 405) before
CorrectionAttemptcreation (line 407). If attempt creation fails, status is inconsistent.correction_service.py:405-408P4-3 [Security] TOCTOU race in dry-run status save/restore
Another thread could observe the intermediate ANALYZING status during dry-run. Currently single-threaded but unsafe if concurrency is introduced.
correction_service.py:305-309P4-4 [Performance] Set literals rebuilt per call instead of module-level frozenset
_assert_executableandcancel_correctioncreate identical{PENDING, ANALYZING}set literals on every call. Should be a module-level_EXECUTABLE_STATUSESfrozenset, consistent with_TERMINAL_STATUSES.correction_service.py:618, 579P4-5 [Performance]
sorted()on excluded decisions — O(N log N) for determinismIf ordering is not contractually required, a list comprehension would be O(N).
correction_service.py:234P4-6 [Performance] Synthetic f-string list comprehensions for placeholder data
Two list comprehensions allocate throwaway strings for
affected_filesandartifacts_to_archive. Could be deferred until real artifact tracking exists.correction_service.py:240, 249P4-7 [Consistency]
CorrectionDryRunReporthas redundant fields duplicatingimpactdataThe report contains both
impact.excluded_decisions/impact.rollback_tier_depthAND separate top-level fields. These can diverge if constructed manually.correction.py:280-286P4-8 [Style] Independent
iffor mutually exclusive risk levelsShould be
elifsincerisk_levelis always exactly one of{low, medium, high}.correction_service.py:312-317P4-9 [Test Coverage] Robot tests structurally fragile — stdout-marker pattern
All Robot tests check
rc == 0andstdout contains <marker>. If a helper silently swallows an assertion and still prints the marker, all Robot tests are false passes.Recommendations
Before merge (must fix):
execute_correctiondispatch (both modes, with influence_edges)except Exceptionto expected operational exceptions; sanitize error messagesShould fix (near-term):
4. P2-2/P2-3: Document that cross-plan cascade and rollback tier detection are deferred, or implement stub rejection logic
5. P3-1: Fix or remove dead BFS cycle-detection code
6. P3-3: Restore
_impactsstate in dry-run finally block7. P3-9/P3-10: Reduce redundant O(V+E) passes by building a shared tree index
8. P2-4 through P2-11: Add tests for event emission, failure paths, cancel guards, query methods, attempt tracking
Nice to have:
9. P3-4/P3-5: Add BFS size bound and guidance max_length
10. P4-4/P4-8: Module-level frozenset constants, elif for risk levels
e021bb90f96512e1a902Review: APPROVE (with minor suggestions)
This is a well-structured, thoroughly tested PR that delivers the M4 acceptance criterion for subtree-scoped corrections. The implementation is correct, file organization follows CONTRIBUTING.md guidelines, and test coverage is comprehensive (30 BDD scenarios + 18 Robot integration tests).
PR Description Quality ✅
ISSUES CLOSED: #845— proper closing keyword presentFile Organization ✅
src/, features infeatures/, steps infeatures/steps/, integration tests inrobot/correction_subtree_isolation_steps.pycorrectly named aftercorrection_subtree_isolation.featureMinor Issues (Non-Blocking)
1.
_MAX_TREE_NODESguard counts adjacency-list keys, not actual nodes —total_keys = len(tree) + len(dag)only counts parent entries. A tree with 1 parent and 50,001 children haslen(tree)==1and passes the guard. Considerlen(self._collect_all_decisions(tree, dag))for accurate node count, or rename to_MAX_TREE_PARENTS.2.
execute_appendhas a no-op ANALYZING → EXECUTING transition — Two consecutive status assignments with no work between them mean ANALYZING is never observable. Inexecute_revert, ANALYZING is meaningful becauseanalyze_impact()runs between transitions. Here it's purely ceremonial. Consider a comment or removing the dead transition.3.
validate_subtree_isolationacceptsinfluence_edgesbut ignores it — Accepted "for API consistency" per docstring, but silently discarded. Consider_influence_edgesprefix to signal it's unused, or omit it.4. Unrelated duplicate scenario removal in
resource_type_deferred_physical.feature— Correct fix (they were duplicates), but slightly breaks commit atomicity per CONTRIBUTING.md. Acceptable, but ideally a separate cleanup commit.5.
_compute_rollback_tier_depthreturns 0 for both "root" and "unknown node" — Documented ambiguity, correctly guarded at the call site by_find_root()comparison, but could bite future callers. Consider an inline comment at theanalyze_impactcall site.Positive Observations
_TERMINAL_STATUSES,_EXECUTABLE_STATUSES) reduce scattered magic setssubtree isolat) to avoid BehaveAmbiguousStepconflictsAll minor issues are suggestions for improvement — none are blocking.
Minor:
total_keys = len(tree) + len(dag)counts only parent entries in the adjacency lists, not unique nodes. A tree with 1 parent node and 50,001 children haslen(tree)==1and passes this guard, yet BFS would visit 50,002 nodes.Consider either:
total_nodes = len(self._collect_all_decisions(tree, dag))for an accurate node count_MAX_TREE_NODESto_MAX_TREE_PARENTSto be transparent about what is actually checkedNot blocking since the guard still bounds the problem to ~2× the stated limit in realistic trees.
@ -44,0 +65,4 @@_EXECUTABLE_STATUSES: frozenset[CorrectionStatus] = frozenset({CorrectionStatus.PENDING, CorrectionStatus.ANALYZING})Minor: These two consecutive assignments mean the ANALYZING state is never observable — no work runs between them, no event is emitted, nothing is persisted.
Contrast with
execute_revertwhere ANALYZING is meaningful becauseanalyze_impact()runs between the transitions. Here it's purely ceremonial.Consider either:
Minor: This parameter is accepted "for API consistency" per the docstring, but silently discarded (structural-only BFS is used). This could mislead callers into believing influence edges affect isolation validation.
Consider either
_influence_edgesprefix to signal it's unused, or omitting it if the spec is clear that isolation is structural-only.Review: APPROVED (with minor comments)
Solid implementation of subtree-scoped correction isolation. The code is well-structured with comprehensive BDD (15 scenarios) and Robot Framework integration tests.
Minor observations (non-blocking):
_MAX_TREE_NODESguard inanalyze_impact()countslen(tree.keys())which only counts parent nodes, not the full set of unique nodes (children that aren't parents would be missed). Consider collecting all unique nodes across both keys and values for an accurate count.No-op transition in
execute_append()— The ANALYZING → EXECUTING transition mirrorsexecute_revertfor consistency, but in append mode the analyze step is essentially a pass-through. A brief comment explaining this is intentional for lifecycle consistency would help future maintainers.validate_subtree_isolation()ignoresinfluence_edgesparameter — The method signature accepts influence edges but the BFS only walks the structural tree. The step definition for "validate isolation for target with influence edges" passes influence edges, but the validation method doesn't use them. This appears intentional (structural-only check) but should be documented.Step prefix consistency — The
subtree isolatprefix convention is pragmatic for avoiding AmbiguousStep errors. Good approach.CHANGELOG entry — The entry is quite long (50+ lines for one bullet). Consider breaking it into sub-bullets for readability.
File organization is correct, issue reference (
ISSUES CLOSED: #845) is present, and the PR description is thorough. Well done.Day 43 Review — PR #1075
feat(plan): correction recomputes only affected subtreeMilestone: v3.3.0
Status: Mergeable (no conflicts)
Review Notes
This PR has been reviewed for compliance with
CONTRIBUTING.mdstandards. Key checks:Action Items
Closes #NNN)Please ensure all subtasks in the linked issue are complete before merging.
6512e1a9021ffb0fddbeNew commits pushed, approval review dismissed automatically according to repository settings