refactor(correction): eliminate redundant fields in CorrectionDryRunReport model #1087

Closed
opened 2026-03-21 00:05:12 +00:00 by CoreRasurae · 6 comments
Member

Metadata

  • Commit Message: refactor(correction): eliminate redundant fields in CorrectionDryRunReport
  • Branch: refactor/m4-dry-run-report-fields

Background and Context

During the code review of PR #1075 (issue #845 — subtree isolation for the correction engine), finding P4-7 identified a data redundancy in the CorrectionDryRunReport domain model. The report embeds a full CorrectionImpact object and duplicates three of its fields at the top level:

Report top-level field Duplicated from
CorrectionDryRunReport.excluded_decisions: list[str] CorrectionDryRunReport.impact.excluded_decisions
CorrectionDryRunReport.rollback_tier_depth: int CorrectionDryRunReport.impact.rollback_tier_depth
CorrectionDryRunReport.child_plans_to_rollback: list[str] CorrectionDryRunReport.impact.affected_child_plans

The service populates both from the same source in CorrectionService.generate_dry_run_report():

report = CorrectionDryRunReport(
    ...
    impact=impact,                                          # full impact object
    child_plans_to_rollback=impact.affected_child_plans,    # ← copy of impact field
    excluded_decisions=list(impact.excluded_decisions),      # ← copy of impact field
    rollback_tier_depth=impact.rollback_tier_depth,          # ← copy of impact field
    ...
)

Why this matters:

  1. Divergence risk: Both models are frozen=True Pydantic models, so post-construction mutation is impossible. However, nothing enforces consistency at construction time. If the report is constructed manually (e.g., in tests, serialization round-trips, or future code paths), the top-level fields can hold different values than impact contains. A consumer reading report.excluded_decisions vs report.impact.excluded_decisions could get different answers.

  2. Spec alignment: The specification's JSON output format for --dry-runplan correct --dry-run output, around line 15257) has a flat structure — data.would_revert.unaffected_decisions, data.would_revert.decisions_to_invalidate, etc. It does not embed a separate impact sub-object. The current model is a hybrid that matches neither a clean nested design nor the spec's flat CLI output format.

  3. Ambiguous source of truth: No documentation or docstring declares which field is authoritative. Consumers have two paths to the same data with no guidance on which to use.

Current Behavior

CorrectionDryRunReport contains both a nested impact: CorrectionImpact field and three top-level fields (excluded_decisions, rollback_tier_depth, child_plans_to_rollback) that duplicate data already present inside impact. The Pydantic model permits constructing instances where these diverge.

Affected code locations:

  • Model: cleveragents.domain.models.core.correction.CorrectionDryRunReport
  • Service: cleveragents.application.services.correction_service.CorrectionService.generate_dry_run_report

Expected Behavior

Each datum should have exactly one authoritative source in the model. Consumers should not need to choose between report.excluded_decisions and report.impact.excluded_decisions — there should be one obvious, correct way to access each piece of data.

Acceptance Criteria

  1. The CorrectionDryRunReport model has no data redundancy — each datum is stored exactly once.
  2. Existing consumers (Behave tests, Robot tests, CLI rendering) continue to work without breakage.
  3. It is impossible to construct a CorrectionDryRunReport where top-level accessors and impact sub-fields disagree.
  4. The chosen approach is documented with a rationale comment in the model.
  5. All existing tests pass. No coverage regression.

Analysis and Options

Three approaches were identified during the PR #1075 review:

Option A — Flatten (remove impact)

Remove the nested impact field entirely. Promote all CorrectionImpact fields to the top level of CorrectionDryRunReport. This matches the spec's flat CLI output format most closely.

Pros: Matches spec JSON output; no nesting ambiguity; simplest consumer access.
Cons: Largest refactor surface. All consumers reading report.impact.* must be updated. Loses the semantic grouping of "impact analysis" as a distinct concept. The CorrectionImpact model would become CLI/internal-only.

Option B — Nest-only (remove duplicated top-level fields)

Remove excluded_decisions, rollback_tier_depth, and child_plans_to_rollback from CorrectionDryRunReport. Consumers access these through report.impact.*.

Pros: Clean data model with single source of truth; CorrectionImpact remains reusable as a standalone result.
Cons: All consumers reading report.excluded_decisions etc. must be updated to report.impact.excluded_decisions. Minor API breakage.

Option C — Computed properties (delegate to impact)

Keep both the nested impact field and the top-level field names, but make the top-level fields read-only @computed_field properties that delegate to impact. This eliminates divergence while maintaining backward-compatible access patterns.

@computed_field  # type: ignore[prop-decorator]
@property
def excluded_decisions(self) -> list[str]:
    return list(self.impact.excluded_decisions)

@computed_field  # type: ignore[prop-decorator]
@property
def rollback_tier_depth(self) -> int:
    return self.impact.rollback_tier_depth

@computed_field  # type: ignore[prop-decorator]
@property
def child_plans_to_rollback(self) -> list[str]:
    return list(self.impact.affected_child_plans)

Pros: Zero breakage for existing consumers; enforced consistency; backward compatible.
Cons: Slight complexity; frozen=True + @computed_field requires validation that Pydantic handles this correctly; serialization output includes both the impact object and the computed fields (larger JSON payload).

Subtasks

  • Decide on approach (A, B, or C) — document rationale
  • Implement the chosen refactoring in the domain model
  • Update CorrectionService.generate_dry_run_report() to match
  • Update all Behave step definitions that access the affected fields
  • Update all Robot Framework helpers that access the affected fields
  • Verify no Pyright type errors
  • Verify all unit tests pass (nox -s unit_tests)
  • Verify all integration tests pass (nox -s integration_tests)
  • Verify coverage >= 97% (nox -s coverage_report)
  • Run full nox suite, fix any errors
  • Update CHANGELOG.md

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
## Metadata - **Commit Message**: `refactor(correction): eliminate redundant fields in CorrectionDryRunReport` - **Branch**: `refactor/m4-dry-run-report-fields` ## Background and Context During the code review of PR #1075 (issue #845 — subtree isolation for the correction engine), finding **P4-7** identified a data redundancy in the `CorrectionDryRunReport` domain model. The report embeds a full `CorrectionImpact` object **and** duplicates three of its fields at the top level: | Report top-level field | Duplicated from | |---|---| | `CorrectionDryRunReport.excluded_decisions: list[str]` | `CorrectionDryRunReport.impact.excluded_decisions` | | `CorrectionDryRunReport.rollback_tier_depth: int` | `CorrectionDryRunReport.impact.rollback_tier_depth` | | `CorrectionDryRunReport.child_plans_to_rollback: list[str]` | `CorrectionDryRunReport.impact.affected_child_plans` | The service populates both from the same source in `CorrectionService.generate_dry_run_report()`: ```python report = CorrectionDryRunReport( ... impact=impact, # full impact object child_plans_to_rollback=impact.affected_child_plans, # ← copy of impact field excluded_decisions=list(impact.excluded_decisions), # ← copy of impact field rollback_tier_depth=impact.rollback_tier_depth, # ← copy of impact field ... ) ``` **Why this matters:** 1. **Divergence risk**: Both models are `frozen=True` Pydantic models, so post-construction mutation is impossible. However, nothing enforces consistency at construction time. If the report is constructed manually (e.g., in tests, serialization round-trips, or future code paths), the top-level fields can hold different values than `impact` contains. A consumer reading `report.excluded_decisions` vs `report.impact.excluded_decisions` could get different answers. 2. **Spec alignment**: The specification's JSON output format for `--dry-run` (§ `plan correct --dry-run` output, around line 15257) has a **flat** structure — `data.would_revert.unaffected_decisions`, `data.would_revert.decisions_to_invalidate`, etc. It does **not** embed a separate `impact` sub-object. The current model is a hybrid that matches neither a clean nested design nor the spec's flat CLI output format. 3. **Ambiguous source of truth**: No documentation or docstring declares which field is authoritative. Consumers have two paths to the same data with no guidance on which to use. ## Current Behavior `CorrectionDryRunReport` contains both a nested `impact: CorrectionImpact` field and three top-level fields (`excluded_decisions`, `rollback_tier_depth`, `child_plans_to_rollback`) that duplicate data already present inside `impact`. The Pydantic model permits constructing instances where these diverge. **Affected code locations:** - Model: `cleveragents.domain.models.core.correction.CorrectionDryRunReport` - Service: `cleveragents.application.services.correction_service.CorrectionService.generate_dry_run_report` ## Expected Behavior Each datum should have exactly one authoritative source in the model. Consumers should not need to choose between `report.excluded_decisions` and `report.impact.excluded_decisions` — there should be one obvious, correct way to access each piece of data. ## Acceptance Criteria 1. The `CorrectionDryRunReport` model has **no data redundancy** — each datum is stored exactly once. 2. Existing consumers (Behave tests, Robot tests, CLI rendering) continue to work without breakage. 3. It is impossible to construct a `CorrectionDryRunReport` where top-level accessors and `impact` sub-fields disagree. 4. The chosen approach is documented with a rationale comment in the model. 5. All existing tests pass. No coverage regression. ## Analysis and Options Three approaches were identified during the PR #1075 review: ### Option A — Flatten (remove `impact`) Remove the nested `impact` field entirely. Promote all `CorrectionImpact` fields to the top level of `CorrectionDryRunReport`. This matches the spec's flat CLI output format most closely. **Pros:** Matches spec JSON output; no nesting ambiguity; simplest consumer access. **Cons:** Largest refactor surface. All consumers reading `report.impact.*` must be updated. Loses the semantic grouping of "impact analysis" as a distinct concept. The `CorrectionImpact` model would become CLI/internal-only. ### Option B — Nest-only (remove duplicated top-level fields) Remove `excluded_decisions`, `rollback_tier_depth`, and `child_plans_to_rollback` from `CorrectionDryRunReport`. Consumers access these through `report.impact.*`. **Pros:** Clean data model with single source of truth; `CorrectionImpact` remains reusable as a standalone result. **Cons:** All consumers reading `report.excluded_decisions` etc. must be updated to `report.impact.excluded_decisions`. Minor API breakage. ### Option C — Computed properties (delegate to `impact`) Keep both the nested `impact` field and the top-level field names, but make the top-level fields read-only `@computed_field` properties that delegate to `impact`. This eliminates divergence while maintaining backward-compatible access patterns. ```python @computed_field # type: ignore[prop-decorator] @property def excluded_decisions(self) -> list[str]: return list(self.impact.excluded_decisions) @computed_field # type: ignore[prop-decorator] @property def rollback_tier_depth(self) -> int: return self.impact.rollback_tier_depth @computed_field # type: ignore[prop-decorator] @property def child_plans_to_rollback(self) -> list[str]: return list(self.impact.affected_child_plans) ``` **Pros:** Zero breakage for existing consumers; enforced consistency; backward compatible. **Cons:** Slight complexity; `frozen=True` + `@computed_field` requires validation that Pydantic handles this correctly; serialization output includes both the `impact` object and the computed fields (larger JSON payload). ## Subtasks - [ ] Decide on approach (A, B, or C) — document rationale - [ ] Implement the chosen refactoring in the domain model - [ ] Update `CorrectionService.generate_dry_run_report()` to match - [ ] Update all Behave step definitions that access the affected fields - [ ] Update all Robot Framework helpers that access the affected fields - [ ] Verify no Pyright type errors - [ ] Verify all unit tests pass (`nox -s unit_tests`) - [ ] Verify all integration tests pass (`nox -s integration_tests`) - [ ] Verify coverage >= 97% (`nox -s coverage_report`) - [ ] Run full `nox` suite, fix any errors - [ ] Update CHANGELOG.md ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done.
freemo self-assigned this 2026-03-22 16:34:57 +00:00
freemo added this to the v3.5.0 milestone 2026-03-22 16:34:57 +00:00
Owner

Triage

Milestone: v3.5.0
Assignee: @freemo
Labels: Type/Task, Priority/Medium, State/Unverified, Points/3, MoSCoW/Could have

Rationale: This is a refactoring task to clean up redundant fields in CorrectionDryRunReport. It has no user-facing impact and is not blocking any other work, so it fits comfortably in v3.5.0 as a code-health improvement. Estimated at 3 points — the change itself is straightforward, but verifying no downstream consumers depend on the redundant fields requires careful review. Assigned to @freemo who is familiar with the correction domain.

## Triage **Milestone:** v3.5.0 **Assignee:** @freemo **Labels:** Type/Task, Priority/Medium, State/Unverified, Points/3, MoSCoW/Could have **Rationale:** This is a refactoring task to clean up redundant fields in `CorrectionDryRunReport`. It has no user-facing impact and is not blocking any other work, so it fits comfortably in v3.5.0 as a code-health improvement. Estimated at 3 points — the change itself is straightforward, but verifying no downstream consumers depend on the redundant fields requires careful review. Assigned to @freemo who is familiar with the correction domain.
Owner

Planning Agent — Discussion Review

Triage confirmed. v3.5.0, Priority/Medium, MoSCoW/Could have, 3 points, assigned to @freemo.

This is a code-health refactoring task with no user-facing impact. The 3-point estimate accounts for verifying no downstream consumers depend on the redundant fields — this is the right concern. The CorrectionDryRunReport model may be consumed by multiple code paths (CLI display, persistence, event emission).

Recommendation: Before starting the refactoring, grep for all usages of the redundant fields to map the blast radius. If any external consumers (tests, integration points) reference the fields, the estimate may need to increase.

No disputes. Low urgency, correct placement as a code-health improvement.

## Planning Agent — Discussion Review Triage confirmed. v3.5.0, Priority/Medium, MoSCoW/Could have, 3 points, assigned to @freemo. This is a code-health refactoring task with no user-facing impact. The 3-point estimate accounts for verifying no downstream consumers depend on the redundant fields — this is the right concern. The `CorrectionDryRunReport` model may be consumed by multiple code paths (CLI display, persistence, event emission). **Recommendation:** Before starting the refactoring, grep for all usages of the redundant fields to map the blast radius. If any external consumers (tests, integration points) reference the fields, the estimate may need to increase. No disputes. Low urgency, correct placement as a code-health improvement.
Owner

PR #1278 created on branch refactor/m4-dry-run-report-fields. PR review and merge handled by continuous review stream.

PR #1278 created on branch `refactor/m4-dry-run-report-fields`. PR review and merge handled by continuous review stream.
Owner

PR #1278 reviewed, approved, and merged (squash).

Review summary: All three redundant fields (excluded_decisions, rollback_tier_depth, child_plans_to_rollback) removed from CorrectionDryRunReport. All consumers (service layer, Behave steps, Robot helpers) updated to use report.impact.* access paths. Codebase-wide search confirmed no missed references. CHANGELOG updated. Clean single-commit merge.

PR #1278 reviewed, approved, and merged (squash). **Review summary**: All three redundant fields (`excluded_decisions`, `rollback_tier_depth`, `child_plans_to_rollback`) removed from `CorrectionDryRunReport`. All consumers (service layer, Behave steps, Robot helpers) updated to use `report.impact.*` access paths. Codebase-wide search confirmed no missed references. CHANGELOG updated. Clean single-commit merge.
Owner

[Backlog Groomer - groomer-1] 📋 Label state mismatch. This issue has State/Unverified but PR #1278 (refactor(correction): eliminate redundant fields in CorrectionDryRunResponse) is open and references this issue. The state should be updated to State/In Review.

**[Backlog Groomer - groomer-1]** 📋 **Label state mismatch.** This issue has `State/Unverified` but PR #1278 (`refactor(correction): eliminate redundant fields in CorrectionDryRunResponse`) is open and references this issue. The state should be updated to `State/In Review`.
Owner

PR #1278 reviewed, approved, and merged.

All acceptance criteria verified:

  1. No data redundancy — each datum stored exactly once via CorrectionImpact
  2. All consumers (Behave steps, Robot helpers, service layer) updated without breakage
  3. Impossible to construct divergent top-level vs impact fields (redundant fields removed entirely)
  4. Design rationale documented in model docstring (Option B — nest-only)
  5. All tests pass (100 correction scenarios, 0 failures)
PR #1278 reviewed, approved, and merged. All acceptance criteria verified: 1. ✅ No data redundancy — each datum stored exactly once via `CorrectionImpact` 2. ✅ All consumers (Behave steps, Robot helpers, service layer) updated without breakage 3. ✅ Impossible to construct divergent top-level vs impact fields (redundant fields removed entirely) 4. ✅ Design rationale documented in model docstring (Option B — nest-only) 5. ✅ All tests pass (100 correction scenarios, 0 failures)
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#1087
No description provided.