feat(db): add correction_attempts table per specification DDL #1145

Merged
CoreRasurae merged 1 commit from feat/correction-attempts-table into master 2026-03-29 15:36:58 +00:00
Member

Summary

Added CorrectionAttemptModel SQLAlchemy model implementing the spec-defined correction_attempts table for tracking decision correction workflows (revert/append modes). Includes all 10 columns from the spec DDL, FK constraints, index, Alembic migration, repository CRUD layer, and comprehensive tests.

Changes

  • CorrectionAttemptRecord domain model and CorrectionAttemptState enum
  • CorrectionAttemptModel with FK constraints to v3_plans and decisions
  • CorrectionAttemptRepository with full CRUD operations
  • Alembic migration m8_001_correction_attempts_table
  • 13 Behave scenarios + 5 Robot integration tests
  • Coverage: 98% (exceeds 97% requirement)

Quality Gates

All nox stages pass: lint, typecheck, unit_tests (12,303 scenarios), integration_tests, coverage_report.

Closes #920

## Summary Added `CorrectionAttemptModel` SQLAlchemy model implementing the spec-defined `correction_attempts` table for tracking decision correction workflows (revert/append modes). Includes all 10 columns from the spec DDL, FK constraints, index, Alembic migration, repository CRUD layer, and comprehensive tests. ## Changes - `CorrectionAttemptRecord` domain model and `CorrectionAttemptState` enum - `CorrectionAttemptModel` with FK constraints to `v3_plans` and `decisions` - `CorrectionAttemptRepository` with full CRUD operations - Alembic migration `m8_001_correction_attempts_table` - 13 Behave scenarios + 5 Robot integration tests - Coverage: **98%** (exceeds 97% requirement) ## Quality Gates All nox stages pass: lint, typecheck, unit_tests (12,303 scenarios), integration_tests, coverage_report. Closes #920
CoreRasurae added this to the v3.5.0 milestone 2026-03-24 03:45:25 +00:00
freemo approved these changes 2026-03-24 15:27:21 +00:00
Dismissed
freemo left a comment

Review: APPROVED

Excellent full-stack CRUD implementation. Alembic migration with proper upgrade/downgrade, CHECK constraints, and indexes. Domain model uses field_validator for non-empty required fields. Repository follows established patterns (session-factory, @database_retry, proper exception hierarchy). Error handling is thorough: DuplicateCorrectionAttemptError, CorrectionAttemptNotFoundError, IntegrityError with rollback, and OperationalError catch blocks on every operation. 13 BDD scenarios + 5 Robot test cases — both present.

Important Note

Alembic migration conflict with PR #1147. Both PRs share the same down_revision = "m4_003_plan_env_columns". Merging both will create multiple Alembic heads. One migration must be rebased onto the other before merging both. Coordinate with the #1147 author to determine merge order.

Minor Note

time.sleep(0.01) in the ordering test is a fragile timing hack. Consider using deterministic timestamps instead.

## Review: APPROVED Excellent full-stack CRUD implementation. Alembic migration with proper `upgrade`/`downgrade`, CHECK constraints, and indexes. Domain model uses `field_validator` for non-empty required fields. Repository follows established patterns (`session-factory`, `@database_retry`, proper exception hierarchy). Error handling is thorough: `DuplicateCorrectionAttemptError`, `CorrectionAttemptNotFoundError`, `IntegrityError` with rollback, and `OperationalError` catch blocks on every operation. 13 BDD scenarios + 5 Robot test cases — both present. ### Important Note **Alembic migration conflict with PR #1147.** Both PRs share the same `down_revision = "m4_003_plan_env_columns"`. Merging both will create multiple Alembic heads. One migration must be rebased onto the other before merging both. Coordinate with the #1147 author to determine merge order. ### Minor Note `time.sleep(0.01)` in the ordering test is a fragile timing hack. Consider using deterministic timestamps instead.
freemo approved these changes 2026-03-27 17:10:45 +00:00
Dismissed
freemo left a comment

Review: feat(db): add correction_attempts table per specification DDL

Approved with comments. Complete vertical slice implementation.

Issues to Address

1. Alembic migration conflict — merge blocker (High)
down_revision = "m4_003_plan_env_columns" — same parent as PRs #1074 and #1147. Must coordinate merge order. The first PR to merge keeps its down_revision; subsequent PRs must rebase.

2. Repository methods typed as Any (Medium)

def create(self, record: Any) -> Any:
def get(self, correction_attempt_id: str) -> Any:

These should use CorrectionAttemptRecord. Even if avoiding circular imports, use TYPE_CHECKING imports:

from __future__ import annotations
from typing import TYPE_CHECKING
if TYPE_CHECKING:
    from ...domain.models import CorrectionAttemptRecord

3. Fragile test ordering (Low)
time.sleep(0.01) in the "3 persisted correction attempts" scenario relies on timing for ordering. Should use explicit created_at values instead.

4. No cascade deletion test (Low)
The migration defines ON DELETE CASCADE for plan_id and original_decision_id, but no test verifies cascade behavior.

What's Good

  • Full vertical slice: domain model → ORM → repository → UoW → migration → tests.
  • CorrectionAttemptRecord Pydantic model has proper field validators for plan_id and original_decision_id.
  • Repository follows session_factory pattern (ADR-007) with @database_retry decorators.
  • Check constraints ensure mode ∈ {revert, append} and state ∈ {pending, executing, complete, failed}.
  • 12 BDD scenarios, 5 Robot test cases.
## Review: feat(db): add correction_attempts table per specification DDL **Approved with comments.** Complete vertical slice implementation. ### Issues to Address **1. Alembic migration conflict — merge blocker (High)** `down_revision = "m4_003_plan_env_columns"` — same parent as PRs #1074 and #1147. **Must coordinate merge order.** The first PR to merge keeps its `down_revision`; subsequent PRs must rebase. **2. Repository methods typed as `Any` (Medium)** ```python def create(self, record: Any) -> Any: def get(self, correction_attempt_id: str) -> Any: ``` These should use `CorrectionAttemptRecord`. Even if avoiding circular imports, use `TYPE_CHECKING` imports: ```python from __future__ import annotations from typing import TYPE_CHECKING if TYPE_CHECKING: from ...domain.models import CorrectionAttemptRecord ``` **3. Fragile test ordering (Low)** `time.sleep(0.01)` in the "3 persisted correction attempts" scenario relies on timing for ordering. Should use explicit `created_at` values instead. **4. No cascade deletion test (Low)** The migration defines `ON DELETE CASCADE` for `plan_id` and `original_decision_id`, but no test verifies cascade behavior. ### What's Good - Full vertical slice: domain model → ORM → repository → UoW → migration → tests. - `CorrectionAttemptRecord` Pydantic model has proper field validators for `plan_id` and `original_decision_id`. - Repository follows `session_factory` pattern (ADR-007) with `@database_retry` decorators. - Check constraints ensure `mode ∈ {revert, append}` and `state ∈ {pending, executing, complete, failed}`. - 12 BDD scenarios, 5 Robot test cases.
CoreRasurae force-pushed feat/correction-attempts-table from 1022b71eb5
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m18s
CI / build (pull_request) Successful in 21s
CI / typecheck (pull_request) Successful in 3m56s
CI / security (pull_request) Successful in 3m55s
CI / quality (pull_request) Successful in 4m6s
CI / e2e_tests (pull_request) Successful in 7m47s
CI / unit_tests (pull_request) Successful in 10m20s
CI / docker (pull_request) Successful in 52s
CI / integration_tests (pull_request) Successful in 10m15s
CI / coverage (pull_request) Successful in 10m9s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m7s
to fa4380cac2
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m13s
CI / quality (pull_request) Successful in 4m20s
CI / integration_tests (pull_request) Successful in 9m5s
CI / e2e_tests (pull_request) Successful in 9m2s
CI / unit_tests (pull_request) Successful in 9m17s
CI / docker (pull_request) Successful in 1m14s
CI / coverage (pull_request) Successful in 11m30s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 10:36:00 +00:00
Compare
CoreRasurae dismissed freemo's review 2026-03-28 10:36:00 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Reviewed commit: fa4380cafeat(db): add correction_attempts table per specification DDL
Issue: #920
Scope: All code changes on branch feat/correction-attempts-table plus close connections to surrounding code.
Review method: 3 full review cycles across all categories (bug detection, correctness, security, test coverage/flaws, performance, maintainability), iterating until no new findings emerged.


Summary

Overall, this is a well-structured PR that follows established codebase conventions. The ORM model, Alembic migration, repository CRUD layer, and domain model all align with the canonical spec DDL (Block #2, lines 45580–45593). The 13 BDD scenarios and 5 Robot tests provide solid coverage for the happy paths. The issues found are primarily around input validation gaps in the repository layer and a test infrastructure inconsistency in the Robot helper.

No security vulnerabilities were found. No critical-severity issues were found.


Findings by Severity

MEDIUM

M1 — Bug/Correctness: update_state() accepts completed_at as unvalidated raw string

  • Files: repositories.py:5754, models.py:3048–3049
  • Detail: The completed_at: str | None parameter in update_state() is written directly to the DB without any format validation. However, to_domain() later calls datetime.fromisoformat(raw_completed) to parse it back. If a malformed timestamp string is persisted, the write succeeds silently, but subsequent reads raise ValueError — a latent data corruption pattern where the error surfaces at read time, not write time.
  • Recommendation: Either accept datetime | None instead of str | None (converting internally), or validate the string as ISO-8601 before persisting.

M2 — Bug/Correctness: update_state() state parameter lacks enum validation

  • Files: repositories.py:5750–5753
  • Detail: The state: str parameter accepts arbitrary strings. The DB CHECK constraint (ck_correction_attempts_state) catches invalid values at flush time, but the error surfaces as a generic DatabaseError, not a clear validation error. The CorrectionAttemptState enum exists in the domain model but is not used for input validation in the repository.
  • Recommendation: Validate the state parameter against CorrectionAttemptState values before writing, or accept the enum type directly.

M3 — Test Flaw: Robot helper missing SQLite foreign key enforcement

  • File: robot/helper_correction_attempt_persistence.py:73–75
  • Detail: The _setup() function creates an in-memory SQLite engine without enabling PRAGMA foreign_keys=ON. The BDD tests (features/steps/correction_attempt_persistence_steps.py:65–68) correctly enable FK enforcement via @event.listens_for(engine, "connect"). Without this pragma, the Robot tests silently skip all FK constraint validation — any invalid plan_id or decision_id would be accepted without error.
  • Recommendation: Add the same @event.listens_for FK pragma to the Robot helper's _setup() function.

M4 — Bug/Performance: database_retry retries non-transient custom errors

  • Files: repositories.py:5627–5631, retry_patterns.py:128–131
  • Detail: Both CorrectionAttemptNotFoundError and DuplicateCorrectionAttemptError extend DatabaseError. The database_retry decorator retries all DatabaseError subclasses, causing 3 retries with 0.5s waits for deterministic failures (not-found, duplicate). This adds ~1 second of unnecessary latency to these error paths.
  • Note: This is a pre-existing pattern affecting all repositories in the codebase (e.g., CheckpointNotFoundError, DecisionNotFoundError, etc.). Not introduced by this PR, but the new code perpetuates it.
  • Recommendation: For a future cross-cutting fix, consider a non-retryable base class for domain-specific errors, or use retry_if_not_exception_type exclusions.

LOW

L1 — Missing Validation: guidance field lacks max_length and empty-string validation

  • File: correction.py:374–376
  • Detail: CorrectionAttemptRecord.guidance is required (...) but has no max_length constraint and no empty-string validator. The sibling model CorrectionRequest.guidance (line 74–76) has max_length=10_000 and a _guidance_stripped validator. In the same CorrectionAttemptRecord, plan_id and original_decision_id DO have empty-string validators, making this an inconsistency.
  • Recommendation: Add max_length=10_000 to match CorrectionRequest and add a non-empty/strip validator.

L2 — Code Smell: Redundant hasattr checks in from_domain()

  • File: models.py:3083–3090
  • Detail: record.mode.value if hasattr(record.mode, "value") else str(record.mode) — since record.mode is typed as CorrectionMode (a StrEnum), .value is always available. Same for record.state. The hasattr guard suggests type unsafety that doesn't exist per the method's type signature.
  • Recommendation: Remove hasattr guards; use .value directly.

L3 — Spec Deviation: Missing server_default for created_at

  • Files: models.py:3017, m8_001_correction_attempts_table.py:58
  • Detail: Spec DDL defines created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%f', 'now')). The implementation has no server_default. Python code always provides the value via default_factory, so there is no runtime impact for ORM operations. However, raw SQL inserts without created_at would fail with a NOT NULL violation rather than using a server-side default.
  • Recommendation: Add server_default=sa.text("(strftime('%Y-%m-%dT%H:%M:%f', 'now'))") to match the spec DDL.

L4 — Test Gap: No test for invalid state values

  • Detail: No BDD or Robot scenario verifies behavior when an invalid state string (e.g., "invalid") is passed to update_state(). The CHECK constraint error path is unverified.
  • Recommendation: Add a negative test scenario.

L5 — Test Gap: No test for FK constraint violation on new_decision_id

  • Detail: No test verifies that setting new_decision_id to a non-existent decision ID raises an appropriate error.
  • Recommendation: Add a negative test scenario.

L6 — Test Gap: list_by_plan() with zero results not explicitly tested

  • Detail: No BDD scenario tests list_by_plan() returning an empty list for a plan with no correction attempts.
  • Recommendation: Add an empty-result scenario.

L7 — Performance: list_by_plan() has no pagination support

  • File: repositories.py:5721–5745
  • Detail: Returns all results in one query with no limit/offset parameters. Unlikely to be a practical issue (plans rarely accumulate many correction attempts), but inconsistent with production-grade query patterns.
  • Recommendation: Add optional limit/offset parameters for future-proofing.

Findings NOT Raised (Confirmed Correct)

For completeness, the following areas were reviewed and found to be correct:

  • Alembic migration chain: m8_001 correctly chains from m4_003_plan_env_columns (verified as current head). upgrade() and downgrade() are symmetric.
  • FK references: v3_plans.plan_id and decisions.decision_id match the actual table names in the codebase (the spec uses logical name plans).
  • ondelete semantics: CASCADE on plan_id and original_decision_id, SET NULL on new_decision_id — correctly match the correction workflow semantics.
  • CHECK constraints: Both ck_correction_attempts_mode and ck_correction_attempts_state correctly match the spec enum values.
  • Index: idx_corrections_plan on plan_id matches the spec.
  • __init__.py exports: CorrectionAttemptRecord and CorrectionAttemptState correctly added.
  • unit_of_work.py integration: correction_attempts property correctly follows the session-factory pattern.
  • Domain model design: CorrectionAttemptRecord is correctly separated from the existing CorrectionAttempt and CorrectionRequest models (different abstraction levels).
  • Session management pattern: Follows Gen 2 (session-factory) pattern consistent with DecisionRepository, LifecyclePlanRepository, etc.
  • Security: No SQL injection risk (all queries use parameterized SQLAlchemy). No sensitive data exposure. archived_artifacts_path is persistence-only (security depends on downstream consumers).
  • Spec DDL alignment: Implementation correctly follows canonical DDL Block #2 (line 45580), not the earlier draft DDL Block #1 (line 18783) which has different column names/states.
  • String(30) for timestamps: Follows established codebase convention (used in LifecyclePlanModel, ActionModel, DecisionModel, etc.). SQLite does not enforce length limits. A potential portability concern for PostgreSQL, but pre-existing and out of scope for this PR.
# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Reviewed commit:** `fa4380ca` — `feat(db): add correction_attempts table per specification DDL` **Issue:** #920 **Scope:** All code changes on branch `feat/correction-attempts-table` plus close connections to surrounding code. **Review method:** 3 full review cycles across all categories (bug detection, correctness, security, test coverage/flaws, performance, maintainability), iterating until no new findings emerged. --- ## Summary Overall, this is a well-structured PR that follows established codebase conventions. The ORM model, Alembic migration, repository CRUD layer, and domain model all align with the canonical spec DDL (Block #2, lines 45580–45593). The 13 BDD scenarios and 5 Robot tests provide solid coverage for the happy paths. The issues found are primarily around **input validation gaps** in the repository layer and a **test infrastructure inconsistency** in the Robot helper. No security vulnerabilities were found. No critical-severity issues were found. --- ## Findings by Severity ### MEDIUM #### M1 — Bug/Correctness: `update_state()` accepts `completed_at` as unvalidated raw string - **Files:** `repositories.py:5754`, `models.py:3048–3049` - **Detail:** The `completed_at: str | None` parameter in `update_state()` is written directly to the DB without any format validation. However, `to_domain()` later calls `datetime.fromisoformat(raw_completed)` to parse it back. If a malformed timestamp string is persisted, the write succeeds silently, but **subsequent reads raise `ValueError`** — a latent data corruption pattern where the error surfaces at read time, not write time. - **Recommendation:** Either accept `datetime | None` instead of `str | None` (converting internally), or validate the string as ISO-8601 before persisting. #### M2 — Bug/Correctness: `update_state()` state parameter lacks enum validation - **Files:** `repositories.py:5750–5753` - **Detail:** The `state: str` parameter accepts arbitrary strings. The DB CHECK constraint (`ck_correction_attempts_state`) catches invalid values at flush time, but the error surfaces as a generic `DatabaseError`, not a clear validation error. The `CorrectionAttemptState` enum exists in the domain model but is not used for input validation in the repository. - **Recommendation:** Validate the `state` parameter against `CorrectionAttemptState` values before writing, or accept the enum type directly. #### M3 — Test Flaw: Robot helper missing SQLite foreign key enforcement - **File:** `robot/helper_correction_attempt_persistence.py:73–75` - **Detail:** The `_setup()` function creates an in-memory SQLite engine without enabling `PRAGMA foreign_keys=ON`. The BDD tests (`features/steps/correction_attempt_persistence_steps.py:65–68`) correctly enable FK enforcement via `@event.listens_for(engine, "connect")`. Without this pragma, the Robot tests silently skip all FK constraint validation — any invalid `plan_id` or `decision_id` would be accepted without error. - **Recommendation:** Add the same `@event.listens_for` FK pragma to the Robot helper's `_setup()` function. #### M4 — Bug/Performance: `database_retry` retries non-transient custom errors - **Files:** `repositories.py:5627–5631`, `retry_patterns.py:128–131` - **Detail:** Both `CorrectionAttemptNotFoundError` and `DuplicateCorrectionAttemptError` extend `DatabaseError`. The `database_retry` decorator retries all `DatabaseError` subclasses, causing 3 retries with 0.5s waits for deterministic failures (not-found, duplicate). This adds ~1 second of unnecessary latency to these error paths. - **Note:** This is a **pre-existing pattern** affecting all repositories in the codebase (e.g., `CheckpointNotFoundError`, `DecisionNotFoundError`, etc.). Not introduced by this PR, but the new code perpetuates it. - **Recommendation:** For a future cross-cutting fix, consider a non-retryable base class for domain-specific errors, or use `retry_if_not_exception_type` exclusions. --- ### LOW #### L1 — Missing Validation: `guidance` field lacks `max_length` and empty-string validation - **File:** `correction.py:374–376` - **Detail:** `CorrectionAttemptRecord.guidance` is required (`...`) but has no `max_length` constraint and no empty-string validator. The sibling model `CorrectionRequest.guidance` (line 74–76) has `max_length=10_000` and a `_guidance_stripped` validator. In the same `CorrectionAttemptRecord`, `plan_id` and `original_decision_id` DO have empty-string validators, making this an inconsistency. - **Recommendation:** Add `max_length=10_000` to match `CorrectionRequest` and add a non-empty/strip validator. #### L2 — Code Smell: Redundant `hasattr` checks in `from_domain()` - **File:** `models.py:3083–3090` - **Detail:** `record.mode.value if hasattr(record.mode, "value") else str(record.mode)` — since `record.mode` is typed as `CorrectionMode` (a `StrEnum`), `.value` is always available. Same for `record.state`. The `hasattr` guard suggests type unsafety that doesn't exist per the method's type signature. - **Recommendation:** Remove `hasattr` guards; use `.value` directly. #### L3 — Spec Deviation: Missing `server_default` for `created_at` - **Files:** `models.py:3017`, `m8_001_correction_attempts_table.py:58` - **Detail:** Spec DDL defines `created_at TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%f', 'now'))`. The implementation has no `server_default`. Python code always provides the value via `default_factory`, so there is no runtime impact for ORM operations. However, raw SQL inserts without `created_at` would fail with a NOT NULL violation rather than using a server-side default. - **Recommendation:** Add `server_default=sa.text("(strftime('%Y-%m-%dT%H:%M:%f', 'now'))")` to match the spec DDL. #### L4 — Test Gap: No test for invalid state values - **Detail:** No BDD or Robot scenario verifies behavior when an invalid state string (e.g., `"invalid"`) is passed to `update_state()`. The CHECK constraint error path is unverified. - **Recommendation:** Add a negative test scenario. #### L5 — Test Gap: No test for FK constraint violation on `new_decision_id` - **Detail:** No test verifies that setting `new_decision_id` to a non-existent decision ID raises an appropriate error. - **Recommendation:** Add a negative test scenario. #### L6 — Test Gap: `list_by_plan()` with zero results not explicitly tested - **Detail:** No BDD scenario tests `list_by_plan()` returning an empty list for a plan with no correction attempts. - **Recommendation:** Add an empty-result scenario. #### L7 — Performance: `list_by_plan()` has no pagination support - **File:** `repositories.py:5721–5745` - **Detail:** Returns all results in one query with no `limit`/`offset` parameters. Unlikely to be a practical issue (plans rarely accumulate many correction attempts), but inconsistent with production-grade query patterns. - **Recommendation:** Add optional `limit`/`offset` parameters for future-proofing. --- ## Findings NOT Raised (Confirmed Correct) For completeness, the following areas were reviewed and found to be correct: - **Alembic migration chain:** `m8_001` correctly chains from `m4_003_plan_env_columns` (verified as current head). `upgrade()` and `downgrade()` are symmetric. - **FK references:** `v3_plans.plan_id` and `decisions.decision_id` match the actual table names in the codebase (the spec uses logical name `plans`). - **`ondelete` semantics:** `CASCADE` on `plan_id` and `original_decision_id`, `SET NULL` on `new_decision_id` — correctly match the correction workflow semantics. - **CHECK constraints:** Both `ck_correction_attempts_mode` and `ck_correction_attempts_state` correctly match the spec enum values. - **Index:** `idx_corrections_plan` on `plan_id` matches the spec. - **`__init__.py` exports:** `CorrectionAttemptRecord` and `CorrectionAttemptState` correctly added. - **`unit_of_work.py` integration:** `correction_attempts` property correctly follows the session-factory pattern. - **Domain model design:** `CorrectionAttemptRecord` is correctly separated from the existing `CorrectionAttempt` and `CorrectionRequest` models (different abstraction levels). - **Session management pattern:** Follows Gen 2 (session-factory) pattern consistent with `DecisionRepository`, `LifecyclePlanRepository`, etc. - **Security:** No SQL injection risk (all queries use parameterized SQLAlchemy). No sensitive data exposure. `archived_artifacts_path` is persistence-only (security depends on downstream consumers). - **Spec DDL alignment:** Implementation correctly follows canonical DDL Block #2 (line 45580), not the earlier draft DDL Block #1 (line 18783) which has different column names/states. - **`String(30)` for timestamps:** Follows established codebase convention (used in `LifecyclePlanModel`, `ActionModel`, `DecisionModel`, etc.). SQLite does not enforce length limits. A potential portability concern for PostgreSQL, but pre-existing and out of scope for this PR.
@ -0,0 +70,4 @@
Base.metadata.create_all(engine)
sm = sessionmaker(bind=engine)
session = sm()
factory: Callable[[], Session] = lambda: session # noqa: E731
Author
Member

[M3] This _setup() function does NOT enable PRAGMA foreign_keys=ON. Compare with the BDD test setup at features/steps/correction_attempt_persistence_steps.py:65-68 which correctly uses @event.listens_for(engine, 'connect') to enable FK enforcement. Without this pragma, SQLite silently ignores all FK constraints — any invalid plan_id or decision_id would be accepted.

**[M3]** This `_setup()` function does NOT enable `PRAGMA foreign_keys=ON`. Compare with the BDD test setup at `features/steps/correction_attempt_persistence_steps.py:65-68` which correctly uses `@event.listens_for(engine, 'connect')` to enable FK enforcement. Without this pragma, SQLite silently ignores all FK constraints — any invalid `plan_id` or `decision_id` would be accepted.
Author
Member

[L1] Unlike the sibling CorrectionRequest.guidance (line 74-76) which has max_length=10_000 and a _guidance_stripped validator, this field has no max_length and no empty-string check. The plan_id and original_decision_id fields in this same model DO have empty-string validators, making this inconsistent.

**[L1]** Unlike the sibling `CorrectionRequest.guidance` (line 74-76) which has `max_length=10_000` and a `_guidance_stripped` validator, this field has no `max_length` and no empty-string check. The `plan_id` and `original_decision_id` fields in this same model DO have empty-string validators, making this inconsistent.
Author
Member

[L2] The hasattr(record.mode, 'value') guard is redundant — record.mode is typed as CorrectionMode (a StrEnum) which always has .value. Same for record.state. This defensive pattern suggests type unsafety that doesn't actually exist. Consider using .value directly.

**[L2]** The `hasattr(record.mode, 'value')` guard is redundant — `record.mode` is typed as `CorrectionMode` (a `StrEnum`) which always has `.value`. Same for `record.state`. This defensive pattern suggests type unsafety that doesn't actually exist. Consider using `.value` directly.
Author
Member

[M4] Both CorrectionAttemptNotFoundError and DuplicateCorrectionAttemptError extend DatabaseError, which is the exception type database_retry retries on (3 attempts, 0.5s waits). This means deterministic failures like not-found and duplicate-ID errors are retried unnecessarily, adding ~1s latency. Pre-existing codebase pattern, but worth noting for a future cross-cutting fix.

**[M4]** Both `CorrectionAttemptNotFoundError` and `DuplicateCorrectionAttemptError` extend `DatabaseError`, which is the exception type `database_retry` retries on (3 attempts, 0.5s waits). This means deterministic failures like not-found and duplicate-ID errors are retried unnecessarily, adding ~1s latency. Pre-existing codebase pattern, but worth noting for a future cross-cutting fix.
Author
Member

[M2] The state: str parameter accepts arbitrary strings. The CorrectionAttemptState enum exists in the domain model but isn't used for validation here. Invalid values will hit the DB CHECK constraint at flush time, producing a generic DatabaseError instead of a clear validation error. Consider validating against CorrectionAttemptState before writing.

**[M2]** The `state: str` parameter accepts arbitrary strings. The `CorrectionAttemptState` enum exists in the domain model but isn't used for validation here. Invalid values will hit the DB CHECK constraint at flush time, producing a generic `DatabaseError` instead of a clear validation error. Consider validating against `CorrectionAttemptState` before writing.
Author
Member

[M1] completed_at: str | None is written directly to the DB without format validation, but to_domain() later calls datetime.fromisoformat() on the stored value. A malformed string would persist successfully but cause ValueError on subsequent reads — a latent data corruption pattern. Consider accepting datetime | None and converting internally, or validating ISO-8601 format here.

**[M1]** `completed_at: str | None` is written directly to the DB without format validation, but `to_domain()` later calls `datetime.fromisoformat()` on the stored value. A malformed string would persist successfully but cause `ValueError` on subsequent reads — a latent data corruption pattern. Consider accepting `datetime | None` and converting internally, or validating ISO-8601 format here.
CoreRasurae force-pushed feat/correction-attempts-table from fa4380cac2
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m13s
CI / quality (pull_request) Successful in 4m20s
CI / integration_tests (pull_request) Successful in 9m5s
CI / e2e_tests (pull_request) Successful in 9m2s
CI / unit_tests (pull_request) Successful in 9m17s
CI / docker (pull_request) Successful in 1m14s
CI / coverage (pull_request) Successful in 11m30s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 7e50f679d8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 40s
CI / typecheck (pull_request) Successful in 53s
CI / security (pull_request) Successful in 53s
CI / build (pull_request) Successful in 26s
CI / quality (pull_request) Successful in 3m51s
CI / integration_tests (pull_request) Successful in 8m41s
CI / unit_tests (pull_request) Successful in 8m54s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Successful in 8m55s
CI / coverage (pull_request) Successful in 11m27s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 11:21:42 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Scope: Commit 7e50f67 by Luis Mendes on branch feat/correction-attempts-table
Issue: #920feat(db): implement missing correction_attempts table per spec DDL
Reviewed against: Specification DDL (canonical, docs/specification.md line 45579-45593), issue acceptance criteria, and surrounding codebase patterns.
Review cycles: 4 full passes across all categories (bugs, security, performance, test coverage, test flaws, spec compliance, design).


Overall Assessment

The implementation is solid and well-structured. It correctly follows the canonical spec DDL, maintains consistency with existing repository patterns (ADR-007 session-factory, @database_retry), and includes comprehensive BDD+Robot test coverage. The domain model, ORM model, Alembic migration, and repository CRUD layer are all well-aligned.

The findings below are organized by severity and category. None are blockers, but several are worth addressing or acknowledging before merge.


Findings

HIGH Severity

H1 — Bug: Timezone inconsistency in created_at round-trip

File: src/cleveragents/infrastructure/database/models.py:3065
Category: Bug

The to_domain() method parses created_at via datetime.fromisoformat(). When the domain model provides the value (timezone-aware UTC from datetime.now(UTC)), the round-trip preserves timezone info. However, if the SQLite server_default (strftime('%Y-%m-%dT%H:%M:%f', 'now')) is ever exercised, it produces a naive datetime string (no timezone). Parsing this via fromisoformat() yields a naive datetime, creating a potential timezone-aware vs. naive inconsistency in CorrectionAttemptRecord.created_at.

While the normal code path via from_domain() always provides an explicit timezone-aware value (making the server_default a fallback-only path), any code that compares created_at across records could encounter TypeError: can't compare offset-naive and offset-aware datetimes if one record went through the server_default path.

Suggested fix: In to_domain(), if the parsed created_at is naive, assume UTC:

dt = datetime.fromisoformat(cast(str, self.created_at))
if dt.tzinfo is None:
    dt = dt.replace(tzinfo=UTC)

H2 — Bug/Performance: database_retry retries deterministic NotFound/Duplicate errors

File: src/cleveragents/infrastructure/database/repositories.py:5630-5634
Category: Bug / Performance

CorrectionAttemptNotFoundError and DuplicateCorrectionAttemptError inherit from DatabaseError. The @database_retry decorator retries on DatabaseError (via retry_if_exception_type which uses isinstance, matching subclasses). This causes deterministic, non-transient errors (not-found, duplicate) to be retried 3 times with 0.5s waits — adding 1 second of unnecessary latency to every such error.

Note: This is a pre-existing pattern shared by DecisionNotFoundError, CheckpointNotFoundError, etc. across all repositories. It is not introduced by this PR but is replicated here. Flagging it for awareness.

Suggested fix (project-wide): Either (a) make NotFound/Duplicate exceptions inherit from a non-retryable base, or (b) add a retry_if_not_exception_type clause to exclude deterministic errors.


MEDIUM Severity

M1 — Bug: update_state() cannot clear optional fields to None

File: src/cleveragents/infrastructure/database/repositories.py:5789-5795
Category: Bug

The update_state() method uses None as the "don't update" sentinel for completed_at, new_decision_id, and archived_artifacts_path. This means it is impossible to set these fields back to None after they have been set (e.g., unlinking a new_decision_id).

if completed_at is not None:      # can never clear to None
    mutable.completed_at = ...
if new_decision_id is not None:    # can never clear to None
    mutable.new_decision_id = ...

Suggested fix: Use a sentinel object (_UNSET = object()) or accept explicit flags (e.g., clear_completed_at: bool = False).

M2 — Code quality: Redundant condition in duplicate detection

File: src/cleveragents/infrastructure/database/repositories.py:5677
Category: Code Quality

if "UNIQUE" in str(exc).upper() or "unique" in str(exc).lower():

The second condition ("unique" in str(exc).lower()) is logically redundant — str(exc).upper() already handles case-insensitive matching. Not harmful but indicates copy-paste without simplification.

M3 — Design: No state transition validation in update_state()

File: src/cleveragents/infrastructure/database/repositories.py:5753-5804
Category: Design

The repository's update_state() accepts any state transition without validation. The spec defines a clear lifecycle (pending → executing → complete|failed), but invalid transitions like complete → pending or failed → executing are silently accepted. While enforcement could live at the service layer, a simple guard here would prevent data corruption.

Suggested fix: Add a _VALID_TRANSITIONS map and validate before mutating:

_VALID_TRANSITIONS = {
    CorrectionAttemptState.PENDING: {CorrectionAttemptState.EXECUTING},
    CorrectionAttemptState.EXECUTING: {CorrectionAttemptState.COMPLETE, CorrectionAttemptState.FAILED},
}

M4 — Robustness: Missing session.rollback() in read-path error handlers

File: src/cleveragents/infrastructure/database/repositories.py:5716-5718, 5745-5748
Category: Robustness

The get() and list_by_plan() methods do not call session.rollback() when catching OperationalError/SQLAlchemyDatabaseError, unlike the write methods (create(), update_state(), delete()) which do. This can leave the session in an indeterminate state after a transient read error.

Note: This inconsistency exists in some other repositories too (e.g., DecisionRepository.get_tree() omits rollback), while others include it (e.g., CheckpointRepository.get() does rollback). Consider aligning for consistency.

M5 — Robustness: Missing FK violation differentiation in create()

File: src/cleveragents/infrastructure/database/repositories.py:5675-5683
Category: Robustness

The IntegrityError handler only distinguishes UNIQUE constraint violations (mapping to DuplicateCorrectionAttemptError). Other integrity errors — FK violations for non-existent plan_id, original_decision_id, or new_decision_id — fall through to a generic DatabaseError with an unhelpful message. Distinguishing FK violations would improve debuggability.


LOW Severity

L1 — Test coverage gap: No test for guidance max_length (10,000 chars)

File: features/correction_attempt_persistence.feature
Category: Test Coverage

The domain model specifies max_length=10_000 on guidance but no test verifies that exceeding this limit is rejected.

L2 — Test coverage gap: No test for invalid mode/state values at DB level

File: features/correction_attempt_persistence.feature
Category: Test Coverage

No test verifies that the CHECK constraints (ck_correction_attempts_mode, ck_correction_attempts_state) reject invalid values at the database level when bypassing the domain model.

L3 — Test coverage gap: No test for FK constraint violations

File: features/correction_attempt_persistence.feature
Category: Test Coverage

No test verifies behavior when creating a correction attempt with a non-existent plan_id or decision_id (FK violation path in create()).

L4 — Test coverage gap: No test for to_domain() with malformed DB data

File: features/correction_attempt_persistence.feature
Category: Test Coverage

No test verifies behavior when the DB contains values that don't map to domain enums (e.g., mode='invalid'). The to_domain() method would raise ValueError from CorrectionMode(...) — this path is untested.

L5 — Test flaw: Robot helper cmd_list_by_plan relies on implicit timestamp ordering

File: robot/helper_correction_attempt_persistence.py:185-196
Category: Test Flaw

Unlike the BDD test (which uses deterministic timestamps via timedelta), the Robot helper creates records in a tight loop relying on datetime.now(UTC) natural ordering. While unlikely to fail due to microsecond precision, explicit timestamps would make the test more robust and consistent with the BDD approach.


INFORMATIONAL

I1 — The mutable: Any = row type-erasure pattern

File: src/cleveragents/infrastructure/database/repositories.py:5788

The update_state() method uses mutable: Any = row to bypass mypy's type checker when modifying SQLAlchemy model attributes. This is a known workaround but defeats type safety at the mutation boundary.

I2 — Spec has two divergent DDL definitions

The specification contains two correction_attempts DDL definitions:

  • Lines 18782-18797 (Decision Tree Storage Schema): Uses attempt_id as PK, status column with value 'completed', original_subtree_snapshot, correction_reason
  • Lines 45579-45593 (Core tables DDL): Uses correction_attempt_id as PK, state column with value 'complete', mode, guidance, archived_artifacts_path

The implementation correctly follows the canonical DDL (line 45579). The spec inconsistency is worth flagging to the spec maintainer.

I3 — Migration naming vs milestone

The Alembic migration is named m8_001_correction_attempts_table but the issue belongs to milestone v3.5.0 (M6). The m8_ prefix suggests M8, which is a minor naming inconsistency.


Summary Table

ID Severity Category Description
H1 HIGH Bug Timezone-naive vs timezone-aware inconsistency in created_at round-trip
H2 HIGH Bug/Perf database_retry retries deterministic not-found/duplicate errors (pre-existing pattern)
M1 MEDIUM Bug update_state() cannot clear optional fields to None
M2 MEDIUM Code Quality Redundant or condition in duplicate detection
M3 MEDIUM Design No state transition validation in update_state()
M4 MEDIUM Robustness Missing session.rollback() in read-path error handlers
M5 MEDIUM Robustness FK violation not differentiated from UNIQUE in create()
L1 LOW Test Coverage No test for guidance max_length
L2 LOW Test Coverage No test for invalid mode/state at DB CHECK constraint level
L3 LOW Test Coverage No test for FK constraint violations
L4 LOW Test Coverage No test for to_domain() with malformed DB data
L5 LOW Test Flaw Robot helper relies on implicit timestamp ordering
I1 INFO Code Quality mutable: Any type-erasure pattern
I2 INFO Spec Two divergent DDL definitions in specification
I3 INFO Naming Migration prefix m8_ vs actual milestone M6

Review performed over 4 global cycles covering: bugs, security, performance, test coverage, test flaws, spec compliance, and design. No tests were executed during this review.

# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Scope**: Commit `7e50f67` by Luis Mendes on branch `feat/correction-attempts-table` **Issue**: #920 — `feat(db): implement missing correction_attempts table per spec DDL` **Reviewed against**: Specification DDL (canonical, `docs/specification.md` line 45579-45593), issue acceptance criteria, and surrounding codebase patterns. **Review cycles**: 4 full passes across all categories (bugs, security, performance, test coverage, test flaws, spec compliance, design). --- ## Overall Assessment The implementation is solid and well-structured. It correctly follows the canonical spec DDL, maintains consistency with existing repository patterns (ADR-007 session-factory, `@database_retry`), and includes comprehensive BDD+Robot test coverage. The domain model, ORM model, Alembic migration, and repository CRUD layer are all well-aligned. The findings below are organized by severity and category. None are blockers, but several are worth addressing or acknowledging before merge. --- ## Findings ### HIGH Severity #### H1 — Bug: Timezone inconsistency in `created_at` round-trip **File**: `src/cleveragents/infrastructure/database/models.py:3065` **Category**: Bug The `to_domain()` method parses `created_at` via `datetime.fromisoformat()`. When the domain model provides the value (timezone-aware UTC from `datetime.now(UTC)`), the round-trip preserves timezone info. However, if the SQLite `server_default` (`strftime('%Y-%m-%dT%H:%M:%f', 'now')`) is ever exercised, it produces a naive datetime string (no timezone). Parsing this via `fromisoformat()` yields a **naive** `datetime`, creating a potential timezone-aware vs. naive inconsistency in `CorrectionAttemptRecord.created_at`. While the normal code path via `from_domain()` always provides an explicit timezone-aware value (making the `server_default` a fallback-only path), any code that compares `created_at` across records could encounter `TypeError: can't compare offset-naive and offset-aware datetimes` if one record went through the server_default path. **Suggested fix**: In `to_domain()`, if the parsed `created_at` is naive, assume UTC: ```python dt = datetime.fromisoformat(cast(str, self.created_at)) if dt.tzinfo is None: dt = dt.replace(tzinfo=UTC) ``` #### H2 — Bug/Performance: `database_retry` retries deterministic `NotFound`/`Duplicate` errors **File**: `src/cleveragents/infrastructure/database/repositories.py:5630-5634` **Category**: Bug / Performance `CorrectionAttemptNotFoundError` and `DuplicateCorrectionAttemptError` inherit from `DatabaseError`. The `@database_retry` decorator retries on `DatabaseError` (via `retry_if_exception_type` which uses `isinstance`, matching subclasses). This causes deterministic, non-transient errors (not-found, duplicate) to be retried 3 times with 0.5s waits — adding **1 second of unnecessary latency** to every such error. **Note**: This is a **pre-existing pattern** shared by `DecisionNotFoundError`, `CheckpointNotFoundError`, etc. across all repositories. It is not introduced by this PR but is replicated here. Flagging it for awareness. **Suggested fix (project-wide)**: Either (a) make `NotFound`/`Duplicate` exceptions inherit from a non-retryable base, or (b) add a `retry_if_not_exception_type` clause to exclude deterministic errors. --- ### MEDIUM Severity #### M1 — Bug: `update_state()` cannot clear optional fields to `None` **File**: `src/cleveragents/infrastructure/database/repositories.py:5789-5795` **Category**: Bug The `update_state()` method uses `None` as the "don't update" sentinel for `completed_at`, `new_decision_id`, and `archived_artifacts_path`. This means it is impossible to set these fields back to `None` after they have been set (e.g., unlinking a `new_decision_id`). ```python if completed_at is not None: # can never clear to None mutable.completed_at = ... if new_decision_id is not None: # can never clear to None mutable.new_decision_id = ... ``` **Suggested fix**: Use a sentinel object (`_UNSET = object()`) or accept explicit flags (e.g., `clear_completed_at: bool = False`). #### M2 — Code quality: Redundant condition in duplicate detection **File**: `src/cleveragents/infrastructure/database/repositories.py:5677` **Category**: Code Quality ```python if "UNIQUE" in str(exc).upper() or "unique" in str(exc).lower(): ``` The second condition (`"unique" in str(exc).lower()`) is logically redundant — `str(exc).upper()` already handles case-insensitive matching. Not harmful but indicates copy-paste without simplification. #### M3 — Design: No state transition validation in `update_state()` **File**: `src/cleveragents/infrastructure/database/repositories.py:5753-5804` **Category**: Design The repository's `update_state()` accepts any state transition without validation. The spec defines a clear lifecycle (`pending → executing → complete|failed`), but invalid transitions like `complete → pending` or `failed → executing` are silently accepted. While enforcement could live at the service layer, a simple guard here would prevent data corruption. **Suggested fix**: Add a `_VALID_TRANSITIONS` map and validate before mutating: ```python _VALID_TRANSITIONS = { CorrectionAttemptState.PENDING: {CorrectionAttemptState.EXECUTING}, CorrectionAttemptState.EXECUTING: {CorrectionAttemptState.COMPLETE, CorrectionAttemptState.FAILED}, } ``` #### M4 — Robustness: Missing `session.rollback()` in read-path error handlers **File**: `src/cleveragents/infrastructure/database/repositories.py:5716-5718`, `5745-5748` **Category**: Robustness The `get()` and `list_by_plan()` methods do not call `session.rollback()` when catching `OperationalError`/`SQLAlchemyDatabaseError`, unlike the write methods (`create()`, `update_state()`, `delete()`) which do. This can leave the session in an indeterminate state after a transient read error. **Note**: This inconsistency exists in some other repositories too (e.g., `DecisionRepository.get_tree()` omits rollback), while others include it (e.g., `CheckpointRepository.get()` does rollback). Consider aligning for consistency. #### M5 — Robustness: Missing FK violation differentiation in `create()` **File**: `src/cleveragents/infrastructure/database/repositories.py:5675-5683` **Category**: Robustness The `IntegrityError` handler only distinguishes UNIQUE constraint violations (mapping to `DuplicateCorrectionAttemptError`). Other integrity errors — FK violations for non-existent `plan_id`, `original_decision_id`, or `new_decision_id` — fall through to a generic `DatabaseError` with an unhelpful message. Distinguishing FK violations would improve debuggability. --- ### LOW Severity #### L1 — Test coverage gap: No test for `guidance` max_length (10,000 chars) **File**: `features/correction_attempt_persistence.feature` **Category**: Test Coverage The domain model specifies `max_length=10_000` on guidance but no test verifies that exceeding this limit is rejected. #### L2 — Test coverage gap: No test for invalid mode/state values at DB level **File**: `features/correction_attempt_persistence.feature` **Category**: Test Coverage No test verifies that the CHECK constraints (`ck_correction_attempts_mode`, `ck_correction_attempts_state`) reject invalid values at the database level when bypassing the domain model. #### L3 — Test coverage gap: No test for FK constraint violations **File**: `features/correction_attempt_persistence.feature` **Category**: Test Coverage No test verifies behavior when creating a correction attempt with a non-existent `plan_id` or `decision_id` (FK violation path in `create()`). #### L4 — Test coverage gap: No test for `to_domain()` with malformed DB data **File**: `features/correction_attempt_persistence.feature` **Category**: Test Coverage No test verifies behavior when the DB contains values that don't map to domain enums (e.g., `mode='invalid'`). The `to_domain()` method would raise `ValueError` from `CorrectionMode(...)` — this path is untested. #### L5 — Test flaw: Robot helper `cmd_list_by_plan` relies on implicit timestamp ordering **File**: `robot/helper_correction_attempt_persistence.py:185-196` **Category**: Test Flaw Unlike the BDD test (which uses deterministic timestamps via `timedelta`), the Robot helper creates records in a tight loop relying on `datetime.now(UTC)` natural ordering. While unlikely to fail due to microsecond precision, explicit timestamps would make the test more robust and consistent with the BDD approach. --- ### INFORMATIONAL #### I1 — The `mutable: Any = row` type-erasure pattern **File**: `src/cleveragents/infrastructure/database/repositories.py:5788` The `update_state()` method uses `mutable: Any = row` to bypass mypy's type checker when modifying SQLAlchemy model attributes. This is a known workaround but defeats type safety at the mutation boundary. #### I2 — Spec has two divergent DDL definitions The specification contains two `correction_attempts` DDL definitions: - **Lines 18782-18797** (Decision Tree Storage Schema): Uses `attempt_id` as PK, `status` column with value `'completed'`, `original_subtree_snapshot`, `correction_reason` - **Lines 45579-45593** (Core tables DDL): Uses `correction_attempt_id` as PK, `state` column with value `'complete'`, `mode`, `guidance`, `archived_artifacts_path` The implementation correctly follows the canonical DDL (line 45579). The spec inconsistency is worth flagging to the spec maintainer. #### I3 — Migration naming vs milestone The Alembic migration is named `m8_001_correction_attempts_table` but the issue belongs to milestone v3.5.0 (M6). The `m8_` prefix suggests M8, which is a minor naming inconsistency. --- ## Summary Table | ID | Severity | Category | Description | |----|----------|----------|-------------| | H1 | HIGH | Bug | Timezone-naive vs timezone-aware inconsistency in `created_at` round-trip | | H2 | HIGH | Bug/Perf | `database_retry` retries deterministic not-found/duplicate errors (pre-existing pattern) | | M1 | MEDIUM | Bug | `update_state()` cannot clear optional fields to `None` | | M2 | MEDIUM | Code Quality | Redundant `or` condition in duplicate detection | | M3 | MEDIUM | Design | No state transition validation in `update_state()` | | M4 | MEDIUM | Robustness | Missing `session.rollback()` in read-path error handlers | | M5 | MEDIUM | Robustness | FK violation not differentiated from UNIQUE in `create()` | | L1 | LOW | Test Coverage | No test for guidance max_length | | L2 | LOW | Test Coverage | No test for invalid mode/state at DB CHECK constraint level | | L3 | LOW | Test Coverage | No test for FK constraint violations | | L4 | LOW | Test Coverage | No test for `to_domain()` with malformed DB data | | L5 | LOW | Test Flaw | Robot helper relies on implicit timestamp ordering | | I1 | INFO | Code Quality | `mutable: Any` type-erasure pattern | | I2 | INFO | Spec | Two divergent DDL definitions in specification | | I3 | INFO | Naming | Migration prefix `m8_` vs actual milestone M6 | --- *Review performed over 4 global cycles covering: bugs, security, performance, test coverage, test flaws, spec compliance, and design. No tests were executed during this review.*
CoreRasurae force-pushed feat/correction-attempts-table from 7e50f679d8
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 40s
CI / typecheck (pull_request) Successful in 53s
CI / security (pull_request) Successful in 53s
CI / build (pull_request) Successful in 26s
CI / quality (pull_request) Successful in 3m51s
CI / integration_tests (pull_request) Successful in 8m41s
CI / unit_tests (pull_request) Successful in 8m54s
CI / docker (pull_request) Successful in 1m8s
CI / e2e_tests (pull_request) Successful in 8m55s
CI / coverage (pull_request) Successful in 11m27s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 543d1308d0
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 4m3s
CI / security (pull_request) Successful in 4m5s
CI / quality (pull_request) Successful in 4m21s
CI / e2e_tests (pull_request) Successful in 8m58s
CI / integration_tests (pull_request) Successful in 9m6s
CI / unit_tests (pull_request) Successful in 9m13s
CI / docker (pull_request) Successful in 1m5s
CI / coverage (pull_request) Successful in 11m56s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 55m29s
2026-03-28 11:55:16 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Scope: All code changes in branch feat/correction-attempts-table (commit 543d130) plus close connections to surrounding code.
Methodology: Three global review cycles across all categories (bugs, spec compliance, security, performance, test coverage, design).
Reference: Issue #920, Spec DDL at lines 45580-45593, docs/specification.md.


Medium Severity

M1 — [Logic] update_state() allows completed_at on non-terminal transitions

File: repositories.py:5765-5816

The update_state() method accepts completed_at regardless of the target state. A caller can set completed_at when transitioning from pendingexecuting, which is semantically incorrect — only terminal states (complete, failed) should record a completion timestamp.

Suggestion: Add a guard:

if completed_at is not None and state.value not in {"complete", "failed"}:
    raise InvalidCorrectionStateTransitionError(
        "completed_at can only be set for terminal states (complete, failed)"
    )

M2 — [Spec Compliance] ON DELETE CASCADE on original_decision_id FK may silently destroy correction audit history

Files: models.py:2987-2990, m8_001_correction_attempts_table.py:42-47

The spec DDL (line 45583) specifies REFERENCES decisions(decision_id) with no ON DELETE clause. The implementation adds ondelete="CASCADE", meaning if the original decision is ever deleted (e.g., during subtree recomputation or a correction itself), the correction attempt record — which is audit/history data — is silently deleted too.

Suggestion: Consider ondelete="RESTRICT" or ondelete="SET NULL" to preserve correction history. If CASCADE is intentional, document the rationale.


M3 — [Test Coverage] No BDD scenario for executing → failed state transition

File: correction_attempt_persistence.feature

The spec lifecycle defines executing → complete|failed, but only the executing → complete path is tested. The executing → failed transition (the error/failure path) has zero test coverage.

Suggestion: Add a scenario:

Scenario: Update correction attempt state to failed with timestamp
  Given a persisted correction attempt in executing state
  When I update the correction attempt to failed with timestamp
  Then the correction attempt state should be "failed"
  And the correction attempt completed_at should be set

M4 — [Design] State transition logic in repository (infrastructure) layer

File: repositories.py:5650-5653

The _VALID_TRANSITIONS dict encoding the spec lifecycle (pending → executing → complete|failed) is business logic placed in the repository (infrastructure layer). Per clean architecture and the project's ADR-007, business rules should reside in the domain layer.

Suggestion: Move transition validation into CorrectionAttemptRecord (e.g., a validate_transition(new_state) method) or a domain service. The repository would call it before persisting.


Low Severity

L1 — [Spec Compliance] ON DELETE behaviors are implementation additions not in spec

Files: models.py:2980-2998, m8_001_correction_attempts_table.py:32-51

Three ON DELETE clauses (CASCADE on plan_id, CASCADE on original_decision_id, SET NULL on new_decision_id) are not present in the spec DDL. While plan_id CASCADE and new_decision_id SET NULL are reasonable defaults, these should be documented as intentional deviations from the spec DDL.


L2 — [Test Coverage] No test for terminal state rejection

File: correction_attempt_persistence.feature

There is no scenario verifying that transitions FROM terminal states (complete, failed) are properly rejected. For example, attempting complete → pending or failed → executing should raise InvalidCorrectionStateTransitionError, but this is never tested.


L3 — [Test Coverage] No test for guidance validation edge cases

File: correction_attempt_persistence.feature

No scenario tests:

  • Empty/whitespace-only guidance rejection (the validator _guidance_not_empty exists but is untested by BDD)
  • max_length=10_000 enforcement (exceeding the limit should raise a validation error)

L4 — [Documentation] Repository module docstring not updated

File: repositories.py:1-50

The module-level docstring has two tables — Repositories (lines 8-14) and Custom Exceptions (lines 40-48) — that do not list CorrectionAttemptRepository, CorrectionAttemptNotFoundError, DuplicateCorrectionAttemptError, or InvalidCorrectionStateTransitionError.


L5 — [Robustness] plan_id/original_decision_id validators don't strip whitespace

File: correction.py:396-408

The _plan_id_not_empty and _original_decision_id_not_empty validators check for emptiness using .strip() but return the original (unstripped) value. A plan_id like " 01HV... " would pass validation and be stored with leading/trailing whitespace, potentially causing FK lookup failures.

Suggestion: Either return the stripped value (like _guidance_not_empty does) or remove the .strip() from the check for consistency.


L6 — [Robustness] create() FK violation produces generic error message

File: repositories.py:5685-5691

When create() hits an IntegrityError for an FK violation (e.g., plan_id doesn't exist in v3_plans), the error falls through the "UNIQUE" string check and raises a generic DatabaseError("Failed to create correction attempt: ..."). A more specific error message indicating which FK constraint was violated would aid debugging.


Summary

Severity Count Categories
Medium 4 Logic (1), Spec Compliance (1), Test Coverage (1), Design (1)
Low 6 Spec Compliance (1), Test Coverage (2), Documentation (1), Robustness (2)
Total 10

Overall the implementation is solid — the table schema, Alembic migration, ORM model, domain model, repository CRUD, and tests are well-structured and largely spec-compliant. The most actionable items are M1 (completed_at guard), M2 (CASCADE on original_decision_id), and M3 (missing failed-path test).

## Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Scope**: All code changes in branch `feat/correction-attempts-table` (commit `543d130`) plus close connections to surrounding code. **Methodology**: Three global review cycles across all categories (bugs, spec compliance, security, performance, test coverage, design). **Reference**: Issue #920, Spec DDL at lines 45580-45593, `docs/specification.md`. --- ### Medium Severity #### M1 — [Logic] `update_state()` allows `completed_at` on non-terminal transitions **File**: `repositories.py:5765-5816` The `update_state()` method accepts `completed_at` regardless of the target state. A caller can set `completed_at` when transitioning from `pending` → `executing`, which is semantically incorrect — only terminal states (`complete`, `failed`) should record a completion timestamp. **Suggestion**: Add a guard: ```python if completed_at is not None and state.value not in {"complete", "failed"}: raise InvalidCorrectionStateTransitionError( "completed_at can only be set for terminal states (complete, failed)" ) ``` --- #### M2 — [Spec Compliance] `ON DELETE CASCADE` on `original_decision_id` FK may silently destroy correction audit history **Files**: `models.py:2987-2990`, `m8_001_correction_attempts_table.py:42-47` The spec DDL (line 45583) specifies `REFERENCES decisions(decision_id)` with **no** ON DELETE clause. The implementation adds `ondelete="CASCADE"`, meaning if the original decision is ever deleted (e.g., during subtree recomputation or a correction itself), the correction attempt record — which is audit/history data — is silently deleted too. **Suggestion**: Consider `ondelete="RESTRICT"` or `ondelete="SET NULL"` to preserve correction history. If CASCADE is intentional, document the rationale. --- #### M3 — [Test Coverage] No BDD scenario for `executing → failed` state transition **File**: `correction_attempt_persistence.feature` The spec lifecycle defines `executing → complete|failed`, but only the `executing → complete` path is tested. The `executing → failed` transition (the error/failure path) has zero test coverage. **Suggestion**: Add a scenario: ```gherkin Scenario: Update correction attempt state to failed with timestamp Given a persisted correction attempt in executing state When I update the correction attempt to failed with timestamp Then the correction attempt state should be "failed" And the correction attempt completed_at should be set ``` --- #### M4 — [Design] State transition logic in repository (infrastructure) layer **File**: `repositories.py:5650-5653` The `_VALID_TRANSITIONS` dict encoding the spec lifecycle (`pending → executing → complete|failed`) is business logic placed in the repository (infrastructure layer). Per clean architecture and the project's ADR-007, business rules should reside in the domain layer. **Suggestion**: Move transition validation into `CorrectionAttemptRecord` (e.g., a `validate_transition(new_state)` method) or a domain service. The repository would call it before persisting. --- ### Low Severity #### L1 — [Spec Compliance] ON DELETE behaviors are implementation additions not in spec **Files**: `models.py:2980-2998`, `m8_001_correction_attempts_table.py:32-51` Three ON DELETE clauses (`CASCADE` on `plan_id`, `CASCADE` on `original_decision_id`, `SET NULL` on `new_decision_id`) are not present in the spec DDL. While `plan_id CASCADE` and `new_decision_id SET NULL` are reasonable defaults, these should be documented as intentional deviations from the spec DDL. --- #### L2 — [Test Coverage] No test for terminal state rejection **File**: `correction_attempt_persistence.feature` There is no scenario verifying that transitions FROM terminal states (`complete`, `failed`) are properly rejected. For example, attempting `complete → pending` or `failed → executing` should raise `InvalidCorrectionStateTransitionError`, but this is never tested. --- #### L3 — [Test Coverage] No test for `guidance` validation edge cases **File**: `correction_attempt_persistence.feature` No scenario tests: - Empty/whitespace-only guidance rejection (the validator `_guidance_not_empty` exists but is untested by BDD) - `max_length=10_000` enforcement (exceeding the limit should raise a validation error) --- #### L4 — [Documentation] Repository module docstring not updated **File**: `repositories.py:1-50` The module-level docstring has two tables — Repositories (lines 8-14) and Custom Exceptions (lines 40-48) — that do not list `CorrectionAttemptRepository`, `CorrectionAttemptNotFoundError`, `DuplicateCorrectionAttemptError`, or `InvalidCorrectionStateTransitionError`. --- #### L5 — [Robustness] `plan_id`/`original_decision_id` validators don't strip whitespace **File**: `correction.py:396-408` The `_plan_id_not_empty` and `_original_decision_id_not_empty` validators check for emptiness using `.strip()` but return the **original** (unstripped) value. A plan_id like `" 01HV... "` would pass validation and be stored with leading/trailing whitespace, potentially causing FK lookup failures. **Suggestion**: Either return the stripped value (like `_guidance_not_empty` does) or remove the `.strip()` from the check for consistency. --- #### L6 — [Robustness] `create()` FK violation produces generic error message **File**: `repositories.py:5685-5691` When `create()` hits an `IntegrityError` for an FK violation (e.g., `plan_id` doesn't exist in `v3_plans`), the error falls through the "UNIQUE" string check and raises a generic `DatabaseError("Failed to create correction attempt: ...")`. A more specific error message indicating which FK constraint was violated would aid debugging. --- ### Summary | Severity | Count | Categories | |----------|-------|------------| | **Medium** | 4 | Logic (1), Spec Compliance (1), Test Coverage (1), Design (1) | | **Low** | 6 | Spec Compliance (1), Test Coverage (2), Documentation (1), Robustness (2) | | **Total** | **10** | | Overall the implementation is solid — the table schema, Alembic migration, ORM model, domain model, repository CRUD, and tests are well-structured and largely spec-compliant. The most actionable items are **M1** (completed_at guard), **M2** (CASCADE on original_decision_id), and **M3** (missing failed-path test).
Author
Member

Code Review Follow-up — Fixes Applied

The following fixes from the code review have been applied and verified locally (commit c0a30614, pending push):

Applied Fixes

ID Category Fix Description
M1 Logic update_state() now rejects completed_at on non-terminal transitions (pending→executing). Raises InvalidCorrectionStateTransitionError if completed_at is provided for a non-terminal target state.
M3 Test Coverage Added BDD scenario for executing → failed state transition with timestamp verification.
M4 Design Extracted state transition validation from repository to domain layer. Added validate_correction_state_transition() function and CORRECTION_ATTEMPT_VALID_TRANSITIONS / CORRECTION_ATTEMPT_TERMINAL_STATES constants to correction.py. Repository now delegates to the domain function.
L2 Test Coverage Added BDD scenarios verifying terminal-state rejection (complete→executing, failed→pending).
L3 Test Coverage Added BDD scenarios for empty/whitespace-only guidance rejection at domain model level.
L4 Documentation Updated repositories.py module docstring tables to include CorrectionAttemptRepository and its three custom exceptions.
L6 Robustness Improved FK-violation error message in create() to identify the violated constraint and suggest checking plan_id/decision IDs.

Not Applied (with justification)

ID Category Reason Not Applied
M2 Spec Compliance ON DELETE CASCADE on original_decision_id: The spec DDL has no ON DELETE clause (defaults vary by SQL dialect). Changing this to RESTRICT or SET NULL would require an Alembic migration change and break the existing cascade deletion test. In practice, original decisions are superseded during corrections, not deleted, so the CASCADE path is unlikely to fire. The current behavior is defensive and consistent with plan_id CASCADE.
L1 Spec Compliance ON DELETE behaviors as implementation additions: Same reasoning as M2 — these are reasonable implementation defaults. plan_id CASCADE ensures cleanup, new_decision_id SET NULL preserves the record. Documenting these as intentional is handled via existing code comments.
L5 Robustness plan_id/original_decision_id validators not stripping whitespace: The existing CorrectionRequest model follows the same pattern (check-but-don't-strip). Changing only the new code would create inconsistency with pre-existing validators. ULIDs don't contain whitespace, making this a theoretical concern.

Verification

  • nox -s lintPASS
  • nox -s typecheckPASS (0 errors, 1 pre-existing warning)
  • nox -s unit_testsPASS (12,312 scenarios, 0 failures)
  • nox -s integration_testsPASS (1,704 tests, 0 failures)

Total: 21 BDD scenarios (up from 15) and 5 Robot integration tests.

## Code Review Follow-up — Fixes Applied The following fixes from the code review have been applied and verified locally (commit `c0a30614`, pending push): ### Applied Fixes | ID | Category | Fix Description | |----|----------|-----------------| | **M1** | Logic | `update_state()` now rejects `completed_at` on non-terminal transitions (`pending→executing`). Raises `InvalidCorrectionStateTransitionError` if `completed_at` is provided for a non-terminal target state. | | **M3** | Test Coverage | Added BDD scenario for `executing → failed` state transition with timestamp verification. | | **M4** | Design | Extracted state transition validation from repository to domain layer. Added `validate_correction_state_transition()` function and `CORRECTION_ATTEMPT_VALID_TRANSITIONS` / `CORRECTION_ATTEMPT_TERMINAL_STATES` constants to `correction.py`. Repository now delegates to the domain function. | | **L2** | Test Coverage | Added BDD scenarios verifying terminal-state rejection (`complete→executing`, `failed→pending`). | | **L3** | Test Coverage | Added BDD scenarios for empty/whitespace-only guidance rejection at domain model level. | | **L4** | Documentation | Updated `repositories.py` module docstring tables to include `CorrectionAttemptRepository` and its three custom exceptions. | | **L6** | Robustness | Improved FK-violation error message in `create()` to identify the violated constraint and suggest checking `plan_id`/decision IDs. | ### Not Applied (with justification) | ID | Category | Reason Not Applied | |----|----------|-------------------| | **M2** | Spec Compliance | `ON DELETE CASCADE` on `original_decision_id`: The spec DDL has no `ON DELETE` clause (defaults vary by SQL dialect). Changing this to `RESTRICT` or `SET NULL` would require an Alembic migration change and break the existing cascade deletion test. In practice, original decisions are _superseded_ during corrections, not deleted, so the CASCADE path is unlikely to fire. The current behavior is defensive and consistent with `plan_id` CASCADE. | | **L1** | Spec Compliance | ON DELETE behaviors as implementation additions: Same reasoning as M2 — these are reasonable implementation defaults. `plan_id CASCADE` ensures cleanup, `new_decision_id SET NULL` preserves the record. Documenting these as intentional is handled via existing code comments. | | **L5** | Robustness | `plan_id`/`original_decision_id` validators not stripping whitespace: The existing `CorrectionRequest` model follows the same pattern (check-but-don't-strip). Changing only the new code would create inconsistency with pre-existing validators. ULIDs don't contain whitespace, making this a theoretical concern. | ### Verification - `nox -s lint` — **PASS** - `nox -s typecheck` — **PASS** (0 errors, 1 pre-existing warning) - `nox -s unit_tests` — **PASS** (12,312 scenarios, 0 failures) - `nox -s integration_tests` — **PASS** (1,704 tests, 0 failures) Total: **21 BDD scenarios** (up from 15) and **5 Robot integration tests**.
CoreRasurae force-pushed feat/correction-attempts-table from 543d1308d0
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 23s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 4m3s
CI / security (pull_request) Successful in 4m5s
CI / quality (pull_request) Successful in 4m21s
CI / e2e_tests (pull_request) Successful in 8m58s
CI / integration_tests (pull_request) Successful in 9m6s
CI / unit_tests (pull_request) Successful in 9m13s
CI / docker (pull_request) Successful in 1m5s
CI / coverage (pull_request) Successful in 11m56s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 55m29s
to c0a306149f
Some checks failed
CI / lint (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 45s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 26s
CI / security (pull_request) Successful in 4m11s
CI / quality (pull_request) Successful in 3m48s
CI / unit_tests (pull_request) Successful in 7m1s
CI / integration_tests (pull_request) Successful in 7m20s
CI / docker (pull_request) Successful in 1m25s
CI / e2e_tests (pull_request) Successful in 9m37s
CI / coverage (pull_request) Successful in 13m56s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 13:24:49 +00:00
Compare
CoreRasurae left a comment

Code Review Report -- PR #1145 feat(db): add correction_attempts table per specification DDL

Reviewer: Automated code review (3 full review cycles)
Branch: feat/correction-attempts-table
Commit: c0a3061 by Luis Mendes
Issue: #920
Spec reference: docs/specification.md DDL at line 45579


Summary

The PR correctly implements the correction_attempts table from the spec DDL (Version B, line 45579) with proper ORM model, Alembic migration, repository layer, domain model, and comprehensive BDD/Robot tests. The implementation follows existing codebase patterns (session-factory, ADR-007, @database_retry). Overall quality is high. Below are 15 findings organized by severity and category from 3 independent review cycles.


Findings by Severity

MEDIUM (5)

B-1. Bug: update_state() missing specific IntegrityError handling for FK violations

File: repositories.py:5840

In create(), IntegrityError is caught explicitly and FK violations produce a targeted error message identifying which constraint was violated. In update_state(), an IntegrityError from setting new_decision_id to a non-existent decision falls through to the generic (OperationalError, SQLAlchemyDatabaseError) catch at line 5844, producing only "Failed to update correction attempt ..." without identifying the FK violation. Since IntegrityError is a subclass of SQLAlchemy's DatabaseError, it IS caught, but the diagnostic quality is significantly worse than in create().

Suggestion: Add an explicit except IntegrityError handler in update_state() matching the pattern in create().


B-2. Bug: Guidance validation inconsistency at domain boundary

File: correction.py:74-76 (existing) vs correction.py:414-416 (new)

The existing CorrectionRequest.guidance allows empty guidance (default="", validator only strips). The new CorrectionAttemptRecord.guidance is required and rejects empty strings. When the correction flow eventually converts CorrectionRequest to CorrectionAttemptRecord, an empty-guidance request would produce a ValidationError at the domain boundary rather than a clear business-rule error.

This also creates a semantic divergence between two closely related domain models in the same module: CorrectionRequest uses CorrectionStatus (7 states) while CorrectionAttemptRecord uses CorrectionAttemptState (4 states) -- with the record being stricter about guidance.

Suggestion: Either align CorrectionRequest to also reject empty guidance (matching the spec CLI where --guidance is required), or document the intended contract at the boundary.


B-3. Bug: Timestamp format inconsistency between Python and SQLite server_default

File: models.py:3018-3022 and models.py:3098

from_domain() stores timestamps as datetime.isoformat(), producing "2026-03-24T03:33:08.123456+00:00" (with UTC offset). The server_default produces "2026-03-24T03:33:08.123456" (no offset). list_by_plan() orders by created_at as a string column. If any records are inserted via direct SQL (using server_default), the mixed formats break lexicographic sort order because + (ASCII 43) sorts before . (ASCII 46).

In practice, all records currently go through from_domain(), so the server_default is never triggered. This is a latent defect.

Suggestion: Either make from_domain() strip the timezone offset to match the SQLite format, or make the server_default produce a timezone-aware string.


P-1. Performance: Missing index on original_decision_id

File: m8_001_correction_attempts_table.py

The correction workflow is likely to look up correction attempts by the original decision being corrected (e.g., "show all corrections for decision X"). Without an index on original_decision_id, this requires a full table scan. The spec DDL only defines idx_corrections_plan on plan_id, but the implementation should consider query patterns.

Suggestion: Add CREATE INDEX idx_corrections_decision ON correction_attempts(original_decision_id) if decision-based lookups are expected.


T-1. Test Coverage: Missing update_state() not-found scenario

File: features/correction_attempt_persistence.feature

There is a "Getting a non-existent correction attempt raises error" scenario for get(), but no equivalent for update_state(). The CorrectionAttemptNotFoundError path at repositories.py:5808-5811 is untested.

Suggestion: Add a BDD scenario: "Updating state of a non-existent correction attempt raises error".


LOW (8)

S-1. Spec Compliance: FK ON DELETE behaviors not in spec DDL

File: models.py:2982, models.py:2989, models.py:2996

The implementation adds ON DELETE CASCADE for plan_id and original_decision_id, and ON DELETE SET NULL for new_decision_id. The spec DDL (Version B, line 45579) specifies bare REFERENCES without ON DELETE clauses (defaulting to NO ACTION). These are reasonable additions but represent undocumented spec deviations.


S-2. Spec Compliance: Two conflicting DDL definitions in specification

File: docs/specification.md lines 18782 and 45579

The spec contains TWO different DDL definitions for correction_attempts. Version A (line 18782, "Correction history") uses attempt_id, correction_reason, original_subtree_snapshot, and status (with 'completed'). Version B (line 45579, "Data Storage") uses correction_attempt_id, guidance, archived_artifacts_path, and state (with 'complete'). The implementation correctly follows Version B, but the spec inconsistency may cause confusion for future developers.


B-4. Bug: update_state() cannot null-out optional fields

File: repositories.py:5836-5839

The None sentinel for new_decision_id and archived_artifacts_path means "don't change", making it impossible to explicitly set these fields back to NULL. If a correction's new decision becomes invalid and needs to be disassociated, the current API provides no mechanism.

Suggestion: Consider a dedicated sentinel (e.g., _UNSET = object()) to distinguish "don't change" from "set to NULL".


T-2. Test Coverage: Missing FK violation test for update_state()

File: features/correction_attempt_persistence.feature

Setting new_decision_id to a non-existent decision ID during update_state() would trigger an IntegrityError. This error path is untested.


T-3. Test Coverage: No guidance max-length boundary test

File: features/correction_attempt_persistence.feature

The max_length=10_000 constraint on CorrectionAttemptRecord.guidance is not exercised at the boundary (10,000 chars accepted, 10,001 rejected).


Q-1. Test Quality: step_check_guidance_error catches generic Exception

File: features/steps/correction_attempt_persistence_steps.py:642,652

The guidance validation steps catch except Exception rather than pydantic.ValidationError. The assertion at line 657 checks is not None but not the exception type. A different exception (e.g., TypeError) would falsely pass the test.

Suggestion: Catch pydantic.ValidationError specifically, or assert the exception type in step_check_guidance_error.


CQ-1. Code Quality: mutable: Any = row type safety bypass

File: repositories.py:5832

The mutable: Any = row pattern defeats type checking for the attribute assignments that follow. While functional, this masks potential type errors.

Suggestion: Use setattr(row, "state", state.value) or add type stubs for the ORM model to avoid the Any escape hatch.


CQ-2. Code Quality: Transition map uses strings instead of enum values

File: correction.py:351-354

CORRECTION_ATTEMPT_VALID_TRANSITIONS uses str keys and frozenset[str] values, while the API (validate_correction_state_transition) operates on CorrectionAttemptState enum instances and accesses .value. Using the enum directly as keys would make the mapping self-documenting and prevent typo-related bugs.


INFO (2)

P-2. Performance: list_by_plan() unbounded

File: repositories.py:5765

No LIMIT or pagination support. While correction attempts per plan should be few, a defensive limit would prevent accidental full-table fetches.


SEC-1. Security: No validation on archived_artifacts_path

File: correction.py:419-421

The path is stored as-is with no format, length, or character validation. If used later for filesystem operations, this could be a path traversal vector. The validation should be at the service layer, but a domain-level format check would add defense in depth.


Checklist vs Issue #920 Acceptance Criteria

Criterion Status
CorrectionAttemptModel SQLAlchemy model created PASS
All columns match spec DDL PASS
FK constraints to v3_plans and decisions PASS (with undocumented CASCADE/SET NULL additions)
Index idx_corrections_plan on plan_id PASS
Alembic migration created PASS
Repository/service layer for CRUD PASS
Existing tests pass Not verified in this review (per instructions)

Totals: 5 Medium, 8 Low, 2 Info across 7 categories (Bug Detection, Spec Compliance, Test Coverage, Test Quality, Performance, Security, Code Quality).

None of the findings are critical or blocking. The Medium items warrant attention before merge; the Low/Info items can be addressed at the author's discretion or tracked as follow-up.

# Code Review Report -- PR #1145 `feat(db): add correction_attempts table per specification DDL` **Reviewer:** Automated code review (3 full review cycles) **Branch:** `feat/correction-attempts-table` **Commit:** `c0a3061` by Luis Mendes **Issue:** #920 **Spec reference:** `docs/specification.md` DDL at line 45579 --- ## Summary The PR correctly implements the `correction_attempts` table from the spec DDL (Version B, line 45579) with proper ORM model, Alembic migration, repository layer, domain model, and comprehensive BDD/Robot tests. The implementation follows existing codebase patterns (session-factory, ADR-007, `@database_retry`). Overall quality is high. Below are 15 findings organized by severity and category from 3 independent review cycles. --- ## Findings by Severity ### MEDIUM (5) #### B-1. Bug: `update_state()` missing specific `IntegrityError` handling for FK violations **File:** `repositories.py:5840` In `create()`, `IntegrityError` is caught explicitly and FK violations produce a targeted error message identifying which constraint was violated. In `update_state()`, an `IntegrityError` from setting `new_decision_id` to a non-existent decision falls through to the generic `(OperationalError, SQLAlchemyDatabaseError)` catch at line 5844, producing only `"Failed to update correction attempt ..."` without identifying the FK violation. Since `IntegrityError` is a subclass of SQLAlchemy's `DatabaseError`, it IS caught, but the diagnostic quality is significantly worse than in `create()`. **Suggestion:** Add an explicit `except IntegrityError` handler in `update_state()` matching the pattern in `create()`. --- #### B-2. Bug: Guidance validation inconsistency at domain boundary **File:** `correction.py:74-76` (existing) vs `correction.py:414-416` (new) The existing `CorrectionRequest.guidance` allows empty guidance (`default=""`, validator only strips). The new `CorrectionAttemptRecord.guidance` is required and rejects empty strings. When the correction flow eventually converts `CorrectionRequest` to `CorrectionAttemptRecord`, an empty-guidance request would produce a `ValidationError` at the domain boundary rather than a clear business-rule error. This also creates a semantic divergence between two closely related domain models in the same module: `CorrectionRequest` uses `CorrectionStatus` (7 states) while `CorrectionAttemptRecord` uses `CorrectionAttemptState` (4 states) -- with the record being stricter about guidance. **Suggestion:** Either align `CorrectionRequest` to also reject empty guidance (matching the spec CLI where `--guidance` is required), or document the intended contract at the boundary. --- #### B-3. Bug: Timestamp format inconsistency between Python and SQLite `server_default` **File:** `models.py:3018-3022` and `models.py:3098` `from_domain()` stores timestamps as `datetime.isoformat()`, producing `"2026-03-24T03:33:08.123456+00:00"` (with UTC offset). The `server_default` produces `"2026-03-24T03:33:08.123456"` (no offset). `list_by_plan()` orders by `created_at` as a string column. If any records are inserted via direct SQL (using server_default), the mixed formats break lexicographic sort order because `+` (ASCII 43) sorts before `.` (ASCII 46). In practice, all records currently go through `from_domain()`, so the server_default is never triggered. This is a latent defect. **Suggestion:** Either make `from_domain()` strip the timezone offset to match the SQLite format, or make the server_default produce a timezone-aware string. --- #### P-1. Performance: Missing index on `original_decision_id` **File:** `m8_001_correction_attempts_table.py` The correction workflow is likely to look up correction attempts by the original decision being corrected (e.g., "show all corrections for decision X"). Without an index on `original_decision_id`, this requires a full table scan. The spec DDL only defines `idx_corrections_plan` on `plan_id`, but the implementation should consider query patterns. **Suggestion:** Add `CREATE INDEX idx_corrections_decision ON correction_attempts(original_decision_id)` if decision-based lookups are expected. --- #### T-1. Test Coverage: Missing `update_state()` not-found scenario **File:** `features/correction_attempt_persistence.feature` There is a "Getting a non-existent correction attempt raises error" scenario for `get()`, but no equivalent for `update_state()`. The `CorrectionAttemptNotFoundError` path at `repositories.py:5808-5811` is untested. **Suggestion:** Add a BDD scenario: "Updating state of a non-existent correction attempt raises error". --- ### LOW (8) #### S-1. Spec Compliance: FK `ON DELETE` behaviors not in spec DDL **File:** `models.py:2982`, `models.py:2989`, `models.py:2996` The implementation adds `ON DELETE CASCADE` for `plan_id` and `original_decision_id`, and `ON DELETE SET NULL` for `new_decision_id`. The spec DDL (Version B, line 45579) specifies bare `REFERENCES` without `ON DELETE` clauses (defaulting to NO ACTION). These are reasonable additions but represent undocumented spec deviations. --- #### S-2. Spec Compliance: Two conflicting DDL definitions in specification **File:** `docs/specification.md` lines 18782 and 45579 The spec contains TWO different DDL definitions for `correction_attempts`. Version A (line 18782, "Correction history") uses `attempt_id`, `correction_reason`, `original_subtree_snapshot`, and `status` (with `'completed'`). Version B (line 45579, "Data Storage") uses `correction_attempt_id`, `guidance`, `archived_artifacts_path`, and `state` (with `'complete'`). The implementation correctly follows Version B, but the spec inconsistency may cause confusion for future developers. --- #### B-4. Bug: `update_state()` cannot null-out optional fields **File:** `repositories.py:5836-5839` The `None` sentinel for `new_decision_id` and `archived_artifacts_path` means "don't change", making it impossible to explicitly set these fields back to `NULL`. If a correction's new decision becomes invalid and needs to be disassociated, the current API provides no mechanism. **Suggestion:** Consider a dedicated sentinel (e.g., `_UNSET = object()`) to distinguish "don't change" from "set to NULL". --- #### T-2. Test Coverage: Missing FK violation test for `update_state()` **File:** `features/correction_attempt_persistence.feature` Setting `new_decision_id` to a non-existent decision ID during `update_state()` would trigger an `IntegrityError`. This error path is untested. --- #### T-3. Test Coverage: No `guidance` max-length boundary test **File:** `features/correction_attempt_persistence.feature` The `max_length=10_000` constraint on `CorrectionAttemptRecord.guidance` is not exercised at the boundary (10,000 chars accepted, 10,001 rejected). --- #### Q-1. Test Quality: `step_check_guidance_error` catches generic `Exception` **File:** `features/steps/correction_attempt_persistence_steps.py:642,652` The guidance validation steps catch `except Exception` rather than `pydantic.ValidationError`. The assertion at line 657 checks `is not None` but not the exception type. A different exception (e.g., `TypeError`) would falsely pass the test. **Suggestion:** Catch `pydantic.ValidationError` specifically, or assert the exception type in `step_check_guidance_error`. --- #### CQ-1. Code Quality: `mutable: Any = row` type safety bypass **File:** `repositories.py:5832` The `mutable: Any = row` pattern defeats type checking for the attribute assignments that follow. While functional, this masks potential type errors. **Suggestion:** Use `setattr(row, "state", state.value)` or add type stubs for the ORM model to avoid the `Any` escape hatch. --- #### CQ-2. Code Quality: Transition map uses strings instead of enum values **File:** `correction.py:351-354` `CORRECTION_ATTEMPT_VALID_TRANSITIONS` uses `str` keys and `frozenset[str]` values, while the API (`validate_correction_state_transition`) operates on `CorrectionAttemptState` enum instances and accesses `.value`. Using the enum directly as keys would make the mapping self-documenting and prevent typo-related bugs. --- ### INFO (2) #### P-2. Performance: `list_by_plan()` unbounded **File:** `repositories.py:5765` No `LIMIT` or pagination support. While correction attempts per plan should be few, a defensive limit would prevent accidental full-table fetches. --- #### SEC-1. Security: No validation on `archived_artifacts_path` **File:** `correction.py:419-421` The path is stored as-is with no format, length, or character validation. If used later for filesystem operations, this could be a path traversal vector. The validation should be at the service layer, but a domain-level format check would add defense in depth. --- ## Checklist vs Issue #920 Acceptance Criteria | Criterion | Status | |:--|:--| | `CorrectionAttemptModel` SQLAlchemy model created | PASS | | All columns match spec DDL | PASS | | FK constraints to `v3_plans` and `decisions` | PASS (with undocumented CASCADE/SET NULL additions) | | Index `idx_corrections_plan` on `plan_id` | PASS | | Alembic migration created | PASS | | Repository/service layer for CRUD | PASS | | Existing tests pass | Not verified in this review (per instructions) | --- **Totals:** 5 Medium, 8 Low, 2 Info across 7 categories (Bug Detection, Spec Compliance, Test Coverage, Test Quality, Performance, Security, Code Quality). None of the findings are critical or blocking. The Medium items warrant attention before merge; the Low/Info items can be addressed at the author's discretion or tracked as follow-up.
Author
Member

Review Fixes Applied (Fifth Review Round)

Addressed 8 of the 15 findings from the automated code review. The commit has been amended on feat/correction-attempts-table.

Fixes Applied

ID Category Fix
B-1 Bug Added explicit IntegrityError handling in update_state() for FK violations (e.g. non-existent new_decision_id), matching the diagnostic quality of create()
B-3 Bug Normalised timestamp format in from_domain() to use strftime("%Y-%m-%dT%H:%M:%S.%f") without timezone offset, matching SQLite server_default format for consistent string-based ordering in list_by_plan()
CQ-1 Code Quality Replaced mutable: Any = row pattern with direct attribute assignment using # type: ignore[assignment], matching the established codebase convention (e.g. ActionRepository.update())
CQ-2 Code Quality Changed CORRECTION_ATTEMPT_VALID_TRANSITIONS and CORRECTION_ATTEMPT_TERMINAL_STATES from dict[str, frozenset[str]] to dict[CorrectionAttemptState, frozenset[CorrectionAttemptState]] for type-safe lookups
Q-1 Test Quality Changed BDD guidance validation steps to catch pydantic.ValidationError instead of generic Exception, added isinstance assertion in step_check_guidance_error
T-1 Test Coverage Added BDD scenario: "Updating state of a non-existent correction attempt raises error"
T-2 Test Coverage Added BDD scenario: "Updating with non-existent new_decision_id raises FK error"
T-3 Test Coverage Added BDD scenarios: "Guidance at max length is accepted" and "Guidance exceeding max length is rejected at domain model level"

Quality Gates

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS (464 features, 12,316 scenarios, 47,059 steps)
nox -s integration_tests PASS (1,704 tests)

BDD Scenarios

Now 25 BDD scenarios (was 21), with 4 new scenarios covering update_state() not-found, FK violation, and guidance max-length boundary (accepted + rejected).

## Review Fixes Applied (Fifth Review Round) Addressed 8 of the 15 findings from the automated code review. The commit has been amended on `feat/correction-attempts-table`. ### Fixes Applied | ID | Category | Fix | |:--|:--|:--| | B-1 | Bug | Added explicit `IntegrityError` handling in `update_state()` for FK violations (e.g. non-existent `new_decision_id`), matching the diagnostic quality of `create()` | | B-3 | Bug | Normalised timestamp format in `from_domain()` to use `strftime("%Y-%m-%dT%H:%M:%S.%f")` without timezone offset, matching SQLite `server_default` format for consistent string-based ordering in `list_by_plan()` | | CQ-1 | Code Quality | Replaced `mutable: Any = row` pattern with direct attribute assignment using `# type: ignore[assignment]`, matching the established codebase convention (e.g. `ActionRepository.update()`) | | CQ-2 | Code Quality | Changed `CORRECTION_ATTEMPT_VALID_TRANSITIONS` and `CORRECTION_ATTEMPT_TERMINAL_STATES` from `dict[str, frozenset[str]]` to `dict[CorrectionAttemptState, frozenset[CorrectionAttemptState]]` for type-safe lookups | | Q-1 | Test Quality | Changed BDD guidance validation steps to catch `pydantic.ValidationError` instead of generic `Exception`, added `isinstance` assertion in `step_check_guidance_error` | | T-1 | Test Coverage | Added BDD scenario: "Updating state of a non-existent correction attempt raises error" | | T-2 | Test Coverage | Added BDD scenario: "Updating with non-existent new_decision_id raises FK error" | | T-3 | Test Coverage | Added BDD scenarios: "Guidance at max length is accepted" and "Guidance exceeding max length is rejected at domain model level" | ### Quality Gates | Gate | Result | |:--|:--| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (464 features, 12,316 scenarios, 47,059 steps) | | `nox -s integration_tests` | PASS (1,704 tests) | ### BDD Scenarios Now 25 BDD scenarios (was 21), with 4 new scenarios covering update_state() not-found, FK violation, and guidance max-length boundary (accepted + rejected).
CoreRasurae force-pushed feat/correction-attempts-table from c0a306149f
Some checks failed
CI / lint (pull_request) Successful in 39s
CI / typecheck (pull_request) Successful in 45s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 26s
CI / security (pull_request) Successful in 4m11s
CI / quality (pull_request) Successful in 3m48s
CI / unit_tests (pull_request) Successful in 7m1s
CI / integration_tests (pull_request) Successful in 7m20s
CI / docker (pull_request) Successful in 1m25s
CI / e2e_tests (pull_request) Successful in 9m37s
CI / coverage (pull_request) Successful in 13m56s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to a60af23449
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m32s
CI / build (pull_request) Successful in 23s
CI / security (pull_request) Successful in 4m2s
CI / typecheck (pull_request) Successful in 4m15s
CI / quality (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 6m42s
CI / integration_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 1m21s
CI / e2e_tests (pull_request) Successful in 8m39s
CI / coverage (pull_request) Successful in 11m50s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 14:08:42 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Reviewed commit: a60af234 by Luis Mendes
Issue: #920
Branch: feat/correction-attempts-table
Spec reference: docs/specification.md lines 45579–45593 (canonical DDL)
Scope: All code changes in the branch plus close connections to surrounding code.
Method: 4 full global review cycles across all categories (bugs, security, performance, test coverage/flaws, spec compliance, design). No tests were executed.


Context

This is a review of the latest commit (a60af234) which incorporates fixes from 5 rounds of prior review feedback. Many earlier findings (typed parameters, state transition validation, completed_at guard, FK violation messages, deterministic BDD timestamps, domain-layer transitions, PRAGMA foreign_keys in Robot, guidance validators, etc.) have been addressed. This report focuses on newly identified issues and previously raised items that remain unresolved in the current code.


Findings by Severity

MEDIUM (3)

M1 — Bug: __init__.py __all__ missing 5 new exports

File: src/cleveragents/domain/models/core/__init__.py
Category: Bug / API Surface

The 5 new symbols added to the import block (lines 75–86) are not added to the __all__ list (lines 312–536):

  • CORRECTION_ATTEMPT_TERMINAL_STATES
  • CORRECTION_ATTEMPT_VALID_TRANSITIONS
  • CorrectionAttemptRecord
  • CorrectionAttemptState
  • validate_correction_state_transition

This means from cleveragents.domain.models.core import * will not expose these types, breaking the convention followed by every other type in the module. Direct imports from the submodule work, but any code relying on the package-level wildcard or introspecting __all__ will miss these.

Not raised in prior reviews.


M2 — Bug: from_domain() silently discards timezone for non-UTC datetimes

File: src/cleveragents/infrastructure/database/models.pyCorrectionAttemptModel.from_domain() (around line 3095)
Category: Bug / Data Integrity

from_domain() uses record.created_at.strftime("%Y-%m-%dT%H:%M:%S.%f") which strips timezone information. to_domain() then assumes any naive datetime is UTC via replace(tzinfo=UTC).

Round-trip for UTC: datetime(2026, 1, 1, 12, 0, tzinfo=UTC)"2026-01-01T12:00:00.000000"datetime(2026, 1, 1, 12, 0, tzinfo=UTC)correct.

Round-trip for non-UTC: datetime(2026, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=5)))"2026-01-01T12:00:00.000000"datetime(2026, 1, 1, 12, 0, tzinfo=UTC)data corruption (the real UTC time is 07:00, not 12:00).

The commit message notes this was intentional for SQLite format consistency. The default factory and typical callers use UTC. However, the CorrectionAttemptRecord API accepts any datetime, creating an implicit contract that is nowhere enforced or documented.

Suggestion: Either (a) add a field_validator on created_at/completed_at that rejects non-UTC datetimes or converts to UTC, or (b) have from_domain() normalize to UTC before stripping: value.astimezone(UTC).strftime(...).

Previously raised in prior reviews (H1 in review 2816) for the to_domain() side. The to_domain() side was fixed (naive→UTC normalization). The from_domain() input side — which is the actual data-loss vector — remains unaddressed.


M3 — Design: update_state() doesn't enforce completed_at on terminal transitions

File: src/cleveragents/infrastructure/database/repositories.pyupdate_state()
Category: Design / Business Logic

The method correctly rejects completed_at on non-terminal transitions (the guard added per prior review M1/2818). However, it does not require completed_at when transitioning to terminal states (COMPLETE, FAILED). A caller can do:

repo.update_state(attempt_id, state=CorrectionAttemptState.COMPLETE)
# → state='complete', completed_at=NULL

This produces a semantically inconsistent record: "completed" with no completion timestamp. The spec DDL allows completed_at TEXT (nullable), so this is technically valid at the DB level, but violates the expected invariant that terminal records always have a completion timestamp.

Suggestion: Either (a) auto-set completed_at = datetime.now(UTC) when transitioning to a terminal state if not provided, or (b) raise an error requiring it on terminal transitions.

Partially related to prior review M1/2818 (which addressed the negative case). The positive enforcement case is new.


LOW (5)

L1 — Bug: plan_id / original_decision_id validators don't return stripped values

File: src/cleveragents/domain/models/core/correction.py:442–453
Category: Bug / Input Validation

The _plan_id_not_empty and _original_decision_id_not_empty validators use .strip() for the emptiness check but return the original unstripped value:

def _plan_id_not_empty(cls, v: str) -> str:
    if not v or not v.strip():
        raise ValueError("plan_id must not be empty")
    return v  # ← returns "  01HV...  " with whitespace intact

A plan_id like " 01HV000000000000000000CA01 " passes validation and is stored with leading/trailing whitespace, which would cause FK lookup failures. The _guidance_not_empty validator (line 458–462) correctly returns the stripped value.

Previously raised as L5 in review 2818. Remains unaddressed in commit a60af234.


L2 — Test Flaw: Robot cmd_list_by_plan uses implicit timestamp ordering

File: robot/helper_correction_attempt_persistence.py:185–196
Category: Test Flaw

The Robot helper creates 3 correction attempts in a tight loop without explicit created_at values, relying on datetime.now(UTC) natural ordering. The BDD equivalent (step_three_attempts) was fixed to use deterministic timestamps with explicit timedelta offsets, but the Robot test was not updated. If records are created within the same microsecond, the ordering assertion results[i].created_at <= results[i + 1].created_at could pass by coincidence or fail non-deterministically.

Previously raised as L5 in review 2819. Remains unaddressed in commit a60af234.


L3 — Design: update_state() None sentinel prevents nulling optional fields

File: src/cleveragents/infrastructure/database/repositories.pyupdate_state()
Category: Design

The None default for new_decision_id and archived_artifacts_path means "don't change". This makes it impossible to explicitly set these fields back to NULL (e.g., disassociating a new_decision_id that became invalid).

if new_decision_id is not None:      # can never set to NULL
    row.new_decision_id = new_decision_id

Previously raised as M1 in review 2816 and B-4 in review 2819. Remains unaddressed in commit a60af234.


L4 — Design: No ORM relationship on CorrectionAttemptModel to parent tables

File: src/cleveragents/infrastructure/database/models.pyCorrectionAttemptModel
Category: Design / Consistency

The model defines FK columns to v3_plans and decisions but declares no SQLAlchemy relationship() back-references. Other child models in the codebase (e.g., PlanProjectModel, PlanInvariantModel, DecisionModel) define relationships to their parent. This prevents ORM-level navigation like plan.correction_attempts and is inconsistent with established patterns.

While the repository uses explicit query filters (making relationships non-essential for current functionality), the inconsistency could cause confusion.

Not raised in prior reviews.


L5 — Performance / Security: list_by_plan() unbounded + archived_artifacts_path unvalidated

File: src/cleveragents/infrastructure/database/repositories.py
Category: Performance / Security

Two minor items previously noted but still present:

  1. list_by_plan() returns all results with no LIMIT/OFFSET parameters. While correction attempts per plan are expected to be few, a defensive limit would prevent unbounded queries.
  2. archived_artifacts_path accepts arbitrary strings with no format, length, or character validation. If consumed by filesystem operations downstream, this is a potential path traversal vector.

Previously raised as P-2 and SEC-1 in review 2819. Acknowledged as informational.


Addressed Since Prior Reviews

The following items from prior reviews have been confirmed fixed in commit a60af234:

Prior Finding Status
Any type annotations → typed CorrectionAttemptRecord signatures Fixed
time.sleep(0.01) in BDD ordering test → deterministic timestamps Fixed
Missing cascade deletion test Fixed
Robot helper missing PRAGMA foreign_keys=ON Fixed
update_state() accepts raw strings → typed enum + datetime Fixed
Missing guidance non-empty validator and max_length=10_000 Fixed
Redundant hasattr guards in from_domain() Fixed
Missing server_default for created_at Fixed
Missing BDD scenario for empty list_by_plan() Fixed
Timezone inconsistency in to_domain() (naive→UTC normalization) Fixed
Redundant duplicate-detection condition Fixed
No state transition validation Fixed (domain-layer with validate_correction_state_transition)
Missing session.rollback() in read-path error handlers Fixed
Missing BDD scenarios for terminal-state rejection Fixed
Completed_at guard on non-terminal transitions Fixed
FK-violation error message in create() Fixed
Repository module docstring tables not updated Fixed
Explicit IntegrityError handling in update_state() Fixed
Transition maps using strings instead of enum values Fixed
mutable: Any = row pattern → direct assignment with type: ignore Fixed
BDD guidance validation catching generic Exceptionpydantic.ValidationError Fixed
Missing BDD scenarios: update_state not-found, FK violation update, guidance max_length boundary/exceeding Fixed

Summary Table

ID Severity Category Description New?
M1 MEDIUM Bug __init__.py __all__ missing 5 new exports New
M2 MEDIUM Bug from_domain() silently discards timezone for non-UTC datetimes Residual
M3 MEDIUM Design update_state() doesn't enforce completed_at on terminal transitions New
L1 LOW Bug plan_id/original_decision_id validators don't return stripped values Residual
L2 LOW Test Flaw Robot cmd_list_by_plan implicit timestamp ordering Residual
L3 LOW Design update_state() None sentinel prevents nulling optional fields Residual
L4 LOW Design No ORM relationship on CorrectionAttemptModel New
L5 LOW Perf/Sec list_by_plan() unbounded + archived_artifacts_path unvalidated Residual

Totals: 3 Medium, 5 Low. 4 new findings, 4 residual from prior reviews.


Review performed over 4 global cycles across all categories. Prior review history (reviews 2719, 2790, 2815, 2816, 2818, 2819) was analyzed to avoid duplication and track resolution status. No tests were executed during this review.

# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Reviewed commit:** `a60af234` by Luis Mendes **Issue:** #920 **Branch:** `feat/correction-attempts-table` **Spec reference:** `docs/specification.md` lines 45579–45593 (canonical DDL) **Scope:** All code changes in the branch plus close connections to surrounding code. **Method:** 4 full global review cycles across all categories (bugs, security, performance, test coverage/flaws, spec compliance, design). No tests were executed. --- ## Context This is a review of the **latest commit** (`a60af234`) which incorporates fixes from 5 rounds of prior review feedback. Many earlier findings (typed parameters, state transition validation, completed_at guard, FK violation messages, deterministic BDD timestamps, domain-layer transitions, PRAGMA foreign_keys in Robot, guidance validators, etc.) have been addressed. This report focuses on **newly identified issues** and **previously raised items that remain unresolved** in the current code. --- ## Findings by Severity ### MEDIUM (3) #### M1 — Bug: `__init__.py` `__all__` missing 5 new exports **File:** `src/cleveragents/domain/models/core/__init__.py` **Category:** Bug / API Surface The 5 new symbols added to the import block (lines 75–86) are **not** added to the `__all__` list (lines 312–536): - `CORRECTION_ATTEMPT_TERMINAL_STATES` - `CORRECTION_ATTEMPT_VALID_TRANSITIONS` - `CorrectionAttemptRecord` - `CorrectionAttemptState` - `validate_correction_state_transition` This means `from cleveragents.domain.models.core import *` will not expose these types, breaking the convention followed by every other type in the module. Direct imports from the submodule work, but any code relying on the package-level wildcard or introspecting `__all__` will miss these. **Not raised in prior reviews.** --- #### M2 — Bug: `from_domain()` silently discards timezone for non-UTC datetimes **File:** `src/cleveragents/infrastructure/database/models.py` — `CorrectionAttemptModel.from_domain()` (around line 3095) **Category:** Bug / Data Integrity `from_domain()` uses `record.created_at.strftime("%Y-%m-%dT%H:%M:%S.%f")` which strips timezone information. `to_domain()` then assumes any naive datetime is UTC via `replace(tzinfo=UTC)`. Round-trip for UTC: `datetime(2026, 1, 1, 12, 0, tzinfo=UTC)` → `"2026-01-01T12:00:00.000000"` → `datetime(2026, 1, 1, 12, 0, tzinfo=UTC)` — **correct**. Round-trip for non-UTC: `datetime(2026, 1, 1, 12, 0, tzinfo=timezone(timedelta(hours=5)))` → `"2026-01-01T12:00:00.000000"` → `datetime(2026, 1, 1, 12, 0, tzinfo=UTC)` — **data corruption** (the real UTC time is 07:00, not 12:00). The commit message notes this was intentional for SQLite format consistency. The default factory and typical callers use UTC. However, the `CorrectionAttemptRecord` API accepts any `datetime`, creating an implicit contract that is nowhere enforced or documented. **Suggestion:** Either (a) add a `field_validator` on `created_at`/`completed_at` that rejects non-UTC datetimes or converts to UTC, or (b) have `from_domain()` normalize to UTC before stripping: `value.astimezone(UTC).strftime(...)`. **Previously raised in prior reviews (H1 in review 2816) for the `to_domain()` side. The `to_domain()` side was fixed (naive→UTC normalization). The `from_domain()` input side — which is the actual data-loss vector — remains unaddressed.** --- #### M3 — Design: `update_state()` doesn't enforce `completed_at` on terminal transitions **File:** `src/cleveragents/infrastructure/database/repositories.py` — `update_state()` **Category:** Design / Business Logic The method correctly rejects `completed_at` on non-terminal transitions (the guard added per prior review M1/2818). However, it does **not** require `completed_at` when transitioning to terminal states (`COMPLETE`, `FAILED`). A caller can do: ```python repo.update_state(attempt_id, state=CorrectionAttemptState.COMPLETE) # → state='complete', completed_at=NULL ``` This produces a semantically inconsistent record: "completed" with no completion timestamp. The spec DDL allows `completed_at TEXT` (nullable), so this is technically valid at the DB level, but violates the expected invariant that terminal records always have a completion timestamp. **Suggestion:** Either (a) auto-set `completed_at = datetime.now(UTC)` when transitioning to a terminal state if not provided, or (b) raise an error requiring it on terminal transitions. **Partially related to prior review M1/2818 (which addressed the negative case). The positive enforcement case is new.** --- ### LOW (5) #### L1 — Bug: `plan_id` / `original_decision_id` validators don't return stripped values **File:** `src/cleveragents/domain/models/core/correction.py:442–453` **Category:** Bug / Input Validation The `_plan_id_not_empty` and `_original_decision_id_not_empty` validators use `.strip()` for the emptiness check but return the **original unstripped** value: ```python def _plan_id_not_empty(cls, v: str) -> str: if not v or not v.strip(): raise ValueError("plan_id must not be empty") return v # ← returns " 01HV... " with whitespace intact ``` A plan_id like `" 01HV000000000000000000CA01 "` passes validation and is stored with leading/trailing whitespace, which would cause FK lookup failures. The `_guidance_not_empty` validator (line 458–462) correctly returns the stripped value. **Previously raised as L5 in review 2818. Remains unaddressed in commit `a60af234`.** --- #### L2 — Test Flaw: Robot `cmd_list_by_plan` uses implicit timestamp ordering **File:** `robot/helper_correction_attempt_persistence.py:185–196` **Category:** Test Flaw The Robot helper creates 3 correction attempts in a tight loop without explicit `created_at` values, relying on `datetime.now(UTC)` natural ordering. The BDD equivalent (`step_three_attempts`) was fixed to use deterministic timestamps with explicit `timedelta` offsets, but the Robot test was not updated. If records are created within the same microsecond, the ordering assertion `results[i].created_at <= results[i + 1].created_at` could pass by coincidence or fail non-deterministically. **Previously raised as L5 in review 2819. Remains unaddressed in commit `a60af234`.** --- #### L3 — Design: `update_state()` `None` sentinel prevents nulling optional fields **File:** `src/cleveragents/infrastructure/database/repositories.py` — `update_state()` **Category:** Design The `None` default for `new_decision_id` and `archived_artifacts_path` means "don't change". This makes it impossible to explicitly set these fields back to `NULL` (e.g., disassociating a `new_decision_id` that became invalid). ```python if new_decision_id is not None: # can never set to NULL row.new_decision_id = new_decision_id ``` **Previously raised as M1 in review 2816 and B-4 in review 2819. Remains unaddressed in commit `a60af234`.** --- #### L4 — Design: No ORM relationship on `CorrectionAttemptModel` to parent tables **File:** `src/cleveragents/infrastructure/database/models.py` — `CorrectionAttemptModel` **Category:** Design / Consistency The model defines FK columns to `v3_plans` and `decisions` but declares no SQLAlchemy `relationship()` back-references. Other child models in the codebase (e.g., `PlanProjectModel`, `PlanInvariantModel`, `DecisionModel`) define relationships to their parent. This prevents ORM-level navigation like `plan.correction_attempts` and is inconsistent with established patterns. While the repository uses explicit query filters (making relationships non-essential for current functionality), the inconsistency could cause confusion. **Not raised in prior reviews.** --- #### L5 — Performance / Security: `list_by_plan()` unbounded + `archived_artifacts_path` unvalidated **File:** `src/cleveragents/infrastructure/database/repositories.py` **Category:** Performance / Security Two minor items previously noted but still present: 1. `list_by_plan()` returns all results with no `LIMIT`/`OFFSET` parameters. While correction attempts per plan are expected to be few, a defensive limit would prevent unbounded queries. 2. `archived_artifacts_path` accepts arbitrary strings with no format, length, or character validation. If consumed by filesystem operations downstream, this is a potential path traversal vector. **Previously raised as P-2 and SEC-1 in review 2819. Acknowledged as informational.** --- ## Addressed Since Prior Reviews The following items from prior reviews have been **confirmed fixed** in commit `a60af234`: | Prior Finding | Status | |:--|:--| | `Any` type annotations → typed `CorrectionAttemptRecord` signatures | Fixed | | `time.sleep(0.01)` in BDD ordering test → deterministic timestamps | Fixed | | Missing cascade deletion test | Fixed | | Robot helper missing `PRAGMA foreign_keys=ON` | Fixed | | `update_state()` accepts raw strings → typed enum + datetime | Fixed | | Missing `guidance` non-empty validator and `max_length=10_000` | Fixed | | Redundant `hasattr` guards in `from_domain()` | Fixed | | Missing `server_default` for `created_at` | Fixed | | Missing BDD scenario for empty `list_by_plan()` | Fixed | | Timezone inconsistency in `to_domain()` (naive→UTC normalization) | Fixed | | Redundant duplicate-detection condition | Fixed | | No state transition validation | Fixed (domain-layer with `validate_correction_state_transition`) | | Missing `session.rollback()` in read-path error handlers | Fixed | | Missing BDD scenarios for terminal-state rejection | Fixed | | Completed_at guard on non-terminal transitions | Fixed | | FK-violation error message in `create()` | Fixed | | Repository module docstring tables not updated | Fixed | | Explicit `IntegrityError` handling in `update_state()` | Fixed | | Transition maps using strings instead of enum values | Fixed | | `mutable: Any = row` pattern → direct assignment with `type: ignore` | Fixed | | BDD guidance validation catching generic `Exception` → `pydantic.ValidationError` | Fixed | | Missing BDD scenarios: update_state not-found, FK violation update, guidance max_length boundary/exceeding | Fixed | --- ## Summary Table | ID | Severity | Category | Description | New? | |:---|:---------|:---------|:------------|:-----| | M1 | MEDIUM | Bug | `__init__.py` `__all__` missing 5 new exports | New | | M2 | MEDIUM | Bug | `from_domain()` silently discards timezone for non-UTC datetimes | Residual | | M3 | MEDIUM | Design | `update_state()` doesn't enforce `completed_at` on terminal transitions | New | | L1 | LOW | Bug | `plan_id`/`original_decision_id` validators don't return stripped values | Residual | | L2 | LOW | Test Flaw | Robot `cmd_list_by_plan` implicit timestamp ordering | Residual | | L3 | LOW | Design | `update_state()` `None` sentinel prevents nulling optional fields | Residual | | L4 | LOW | Design | No ORM relationship on `CorrectionAttemptModel` | New | | L5 | LOW | Perf/Sec | `list_by_plan()` unbounded + `archived_artifacts_path` unvalidated | Residual | **Totals:** 3 Medium, 5 Low. 4 new findings, 4 residual from prior reviews. --- *Review performed over 4 global cycles across all categories. Prior review history (reviews 2719, 2790, 2815, 2816, 2818, 2819) was analyzed to avoid duplication and track resolution status. No tests were executed during this review.*
CoreRasurae force-pushed feat/correction-attempts-table from a60af23449
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m32s
CI / build (pull_request) Successful in 23s
CI / security (pull_request) Successful in 4m2s
CI / typecheck (pull_request) Successful in 4m15s
CI / quality (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 6m42s
CI / integration_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 1m21s
CI / e2e_tests (pull_request) Successful in 8m39s
CI / coverage (pull_request) Successful in 11m50s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to d1b0fe4952
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / benchmark-regression (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m29s
CI / build (pull_request) Successful in 51s
CI / quality (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 6m42s
CI / integration_tests (pull_request) Successful in 7m16s
CI / e2e_tests (pull_request) Successful in 9m39s
CI / typecheck (pull_request) Failing after 13m4s
2026-03-28 14:43:37 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 feat(db): add correction_attempts table per specification DDL

Reviewer: Automated multi-cycle code review (3 global passes)
Scope: All code changes in branch feat/correction-attempts-table plus close connections to surrounding code
Reference: Issue #920, docs/specification.md DDL (lines 45579-45599)


Summary

Solid implementation overall. The CorrectionAttemptModel, CorrectionAttemptRecord domain model, CorrectionAttemptRepository, Alembic migration, and test suite are well-structured and follow established codebase patterns. The state machine validation is correctly extracted to the domain layer, and the test coverage is comprehensive (25 BDD scenarios + 5 Robot tests).

Below are the findings organized by severity and category after three complete review cycles.


HIGH Severity

[H1] Bug / Data Integrity — Timestamp format inconsistency between from_domain() and SQLite server_default

Files: models.py:3021 vs models.py:3093-3106

from_domain() produces timestamps via Python's strftime("%Y-%m-%dT%H:%M:%S.%f") which yields 6 fractional digits (e.g., 2026-03-28T12:34:56.789000).

The SQLite server_default uses strftime('%Y-%m-%dT%H:%M:%f', 'now') where SQLite's %f produces SS.SSS — only 3 fractional digits (e.g., 2026-03-28T12:34:56.789).

Impact: list_by_plan() uses string-based ORDER BY created_at. Records created via application code (from_domain) and records created via DB default will have different string lengths for the same timestamp precision, causing inconsistent lexicographic ordering. The from_domain() docstring at line 3082 incorrectly claims the format "matches the SQLite server_default."

Suggestion: Normalize both paths to the same precision — either truncate Python output to 3 fractional digits to match SQLite, or change the server_default to produce 6 digits via a custom expression. The simplest fix:

# In from_domain(), line 3106:
created_at_utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:23]  # Truncate to match SQLite's SS.SSS

Apply the same fix at line 3093 for completed_at.


MEDIUM Severity

[M1] Security — No validation of archived_artifacts_path for path traversal

Files: correction.py:425-428, repositories.py:5847

Neither the CorrectionAttemptRecord domain model nor the repository validates the archived_artifacts_path string. A malicious or buggy caller could store path traversal strings like ../../etc/passwd or absolute paths outside expected directories. While this field is currently only persisted (not consumed for file operations in this PR), downstream consumers of these records could trust the path value.

Suggestion: Add a field_validator on CorrectionAttemptRecord.archived_artifacts_path that rejects paths containing .. sequences or that are absolute when relative paths are expected. At minimum, strip whitespace and reject empty-after-strip values similar to the guidance validator.

[M2] Performance — list_by_plan() lacks pagination

File: repositories.py:5747

list_by_plan() returns all correction attempts for a plan without limit/offset parameters. Other repositories in the codebase consistently support pagination (e.g., LifecyclePlanRepository.list_plans() at line 1515, ResourceRepository.list_resources() at line 2176).

Suggestion: Add optional limit and offset parameters with sensible defaults for consistency and to prevent unbounded result sets.

[M3] Test Coverage — No test for from_domain() timezone normalization

File: models.py:3092-3095

The changelog highlights that from_domain() normalizes timestamps to UTC via astimezone(UTC), but no test verifies behavior when a non-UTC timezone-aware datetime is passed as created_at or completed_at. This is an explicitly called-out feature that lacks coverage.

Suggestion: Add a BDD scenario or Robot test that creates a CorrectionAttemptRecord with a non-UTC timezone (e.g., timezone(timedelta(hours=5))), persists it, retrieves it, and verifies the stored value is in UTC.

[M4] Test Coverage — No test for archived_artifacts_path round-trip on create()

Tests only set archived_artifacts_path via update_state(), but the "Persisted correction attempt preserves all fields" scenario doesn't verify this field survives a full create()get() round-trip when set at creation time.

[M5] Design — update_state() cannot clear new_decision_id or archived_artifacts_path back to None

File: repositories.py:5845-5848

The conditional guards if new_decision_id is not None and if archived_artifacts_path is not None mean callers can never reset these fields to NULL once populated. The None sentinel conflates "don't change" with "set to null." If the correction workflow ever needs to unlink a new decision (e.g., on retry), this API would need a sentinel value or a separate method.

Suggestion: Document this as intentional behavior (correction attempts accumulate state, never retract), or introduce an UNSET sentinel constant if clearing is a valid operation.


LOW Severity

[L1] Spec Compliance — Two conflicting DDL definitions in the specification

The specification contains two different DDL definitions for correction_attempts:

  • DDL #1 (spec line ~18782): PK = attempt_id, column = correction_reason, state value = 'completed'
  • DDL #2 (spec line ~45579): PK = correction_attempt_id, columns = mode + guidance, state value = 'complete'

This implementation correctly follows DDL #2 (the "Database Schema Design" section, referenced by issue #920). However, the spec itself has an internal inconsistency that should be flagged for documentation cleanup.

[L2] Bug — update_state() FK error message can be misleading

File: repositories.py:5856-5861

The IntegrityError handler assumes a FOREIGN KEY violation is always about new_decision_id:

f"verify that new_decision_id '{new_decision_id}' exists"

But new_decision_id could be None if the FK error originated from another cause (e.g., the row's existing plan_id was concurrently deleted). The message would then say "verify that new_decision_id 'None' exists", which is misleading.

[L3] Inconsistency — get() calls session.rollback() unlike other repos

File: repositories.py:5739

CorrectionAttemptRepository.get() calls session.rollback() in its error handler. Other repos' read-only get() methods (e.g., DecisionRepository.get(), CheckpointRepository.get()) do not rollback on read errors. This is not harmful but is inconsistent.

[L4] Inconsistency — No ORM relationship from parent models

File: models.py:2960-3035

CorrectionAttemptModel has FK references to v3_plans and decisions but defines no SQLAlchemy relationship() back to LifecyclePlanModel or DecisionModel. Other FK child tables (e.g., PlanProjectModel, PlanInvariantModel) define bidirectional relationships. While DB-level CASCADE works correctly, the lack of ORM relationships means no eager/lazy loading support and no ORM-level cascade verification.

[L5] Test Coverage — No test for DB-level CHECK constraints

The ck_correction_attempts_mode and ck_correction_attempts_state CHECK constraints are untested at the database level. Domain model validation via CorrectionMode/CorrectionAttemptState enums catches invalid values first, but the CHECK constraints (the last line of defense against bypassed validation) are never exercised in tests.

[L6] Test Coverage — No test for Alembic migration downgrade() path

The migration file defines both upgrade() and downgrade(), but no test verifies the downgrade correctly drops the index and table in the right order.

[L7] Test Coverage — Missing minimum-boundary guidance test

Tests cover empty guidance, whitespace-only guidance, max-length (10,000), and over-max-length (10,001), but there is no test for guidance with exactly 1 character (the minimum valid length after stripping).

[L8] Test Coverage — Robot helper cmd_domain_roundtrip weak timestamp assertion

File: helper_correction_attempt_persistence.py:~286

The round-trip test only asserts retrieved.created_at is not None but does not verify the timestamp value is within an acceptable delta of the original. Given the format mismatch in H1, a stronger assertion would catch drift.


Confirmed OK (no issues found)

  • Alembic migration chain: down_revision = "m4_003_plan_env_columns" is the correct single head.
  • CHECK constraints in __table_args__ match Alembic migration constraints exactly.
  • Domain enum values match DB CHECK constraint values.
  • to_domain() / from_domain() round-trip is field-complete (all 10 columns mapped).
  • State transition validation order in update_state() is safe (all checks before mutation).
  • UnitOfWork integration: correction_attempts property correctly wired in unit_of_work.py:294-303.
  • __init__.py exports: All new domain exports added to __all__.
  • BDD step error-capture pattern: No silently-passing "try" steps — all assertion steps verify error is not None.
  • All 5 Robot helper subcommands are fully implemented with proper assertions.

Disposition: The implementation is fundamentally sound. H1 (timestamp format) should be fixed before merge as it affects data ordering correctness. M1-M5 are recommended improvements. L1-L8 are quality observations that can be addressed in follow-up work.

# Code Review Report — PR #1145 `feat(db): add correction_attempts table per specification DDL` **Reviewer:** Automated multi-cycle code review (3 global passes) **Scope:** All code changes in branch `feat/correction-attempts-table` plus close connections to surrounding code **Reference:** Issue #920, `docs/specification.md` DDL (lines 45579-45599) --- ## Summary Solid implementation overall. The `CorrectionAttemptModel`, `CorrectionAttemptRecord` domain model, `CorrectionAttemptRepository`, Alembic migration, and test suite are well-structured and follow established codebase patterns. The state machine validation is correctly extracted to the domain layer, and the test coverage is comprehensive (25 BDD scenarios + 5 Robot tests). Below are the findings organized by **severity** and **category** after three complete review cycles. --- ## HIGH Severity ### [H1] Bug / Data Integrity — Timestamp format inconsistency between `from_domain()` and SQLite `server_default` **Files:** `models.py:3021` vs `models.py:3093-3106` `from_domain()` produces timestamps via Python's `strftime("%Y-%m-%dT%H:%M:%S.%f")` which yields **6 fractional digits** (e.g., `2026-03-28T12:34:56.789000`). The SQLite `server_default` uses `strftime('%Y-%m-%dT%H:%M:%f', 'now')` where SQLite's `%f` produces `SS.SSS` — only **3 fractional digits** (e.g., `2026-03-28T12:34:56.789`). **Impact:** `list_by_plan()` uses string-based `ORDER BY created_at`. Records created via application code (`from_domain`) and records created via DB default will have different string lengths for the same timestamp precision, causing **inconsistent lexicographic ordering**. The `from_domain()` docstring at line 3082 incorrectly claims the format "matches the SQLite `server_default`." **Suggestion:** Normalize both paths to the same precision — either truncate Python output to 3 fractional digits to match SQLite, or change the `server_default` to produce 6 digits via a custom expression. The simplest fix: ```python # In from_domain(), line 3106: created_at_utc.strftime("%Y-%m-%dT%H:%M:%S.%f")[:23] # Truncate to match SQLite's SS.SSS ``` Apply the same fix at line 3093 for `completed_at`. --- ## MEDIUM Severity ### [M1] Security — No validation of `archived_artifacts_path` for path traversal **Files:** `correction.py:425-428`, `repositories.py:5847` Neither the `CorrectionAttemptRecord` domain model nor the repository validates the `archived_artifacts_path` string. A malicious or buggy caller could store path traversal strings like `../../etc/passwd` or absolute paths outside expected directories. While this field is currently only persisted (not consumed for file operations in this PR), downstream consumers of these records could trust the path value. **Suggestion:** Add a `field_validator` on `CorrectionAttemptRecord.archived_artifacts_path` that rejects paths containing `..` sequences or that are absolute when relative paths are expected. At minimum, strip whitespace and reject empty-after-strip values similar to the `guidance` validator. ### [M2] Performance — `list_by_plan()` lacks pagination **File:** `repositories.py:5747` `list_by_plan()` returns **all** correction attempts for a plan without `limit`/`offset` parameters. Other repositories in the codebase consistently support pagination (e.g., `LifecyclePlanRepository.list_plans()` at line 1515, `ResourceRepository.list_resources()` at line 2176). **Suggestion:** Add optional `limit` and `offset` parameters with sensible defaults for consistency and to prevent unbounded result sets. ### [M3] Test Coverage — No test for `from_domain()` timezone normalization **File:** `models.py:3092-3095` The changelog highlights that `from_domain()` normalizes timestamps to UTC via `astimezone(UTC)`, but no test verifies behavior when a non-UTC timezone-aware datetime is passed as `created_at` or `completed_at`. This is an explicitly called-out feature that lacks coverage. **Suggestion:** Add a BDD scenario or Robot test that creates a `CorrectionAttemptRecord` with a non-UTC timezone (e.g., `timezone(timedelta(hours=5))`), persists it, retrieves it, and verifies the stored value is in UTC. ### [M4] Test Coverage — No test for `archived_artifacts_path` round-trip on `create()` Tests only set `archived_artifacts_path` via `update_state()`, but the `"Persisted correction attempt preserves all fields"` scenario doesn't verify this field survives a full `create()` → `get()` round-trip when set at creation time. ### [M5] Design — `update_state()` cannot clear `new_decision_id` or `archived_artifacts_path` back to `None` **File:** `repositories.py:5845-5848` The conditional guards `if new_decision_id is not None` and `if archived_artifacts_path is not None` mean callers can never reset these fields to `NULL` once populated. The `None` sentinel conflates "don't change" with "set to null." If the correction workflow ever needs to unlink a new decision (e.g., on retry), this API would need a sentinel value or a separate method. **Suggestion:** Document this as intentional behavior (correction attempts accumulate state, never retract), or introduce an `UNSET` sentinel constant if clearing is a valid operation. --- ## LOW Severity ### [L1] Spec Compliance — Two conflicting DDL definitions in the specification The specification contains **two different DDL definitions** for `correction_attempts`: - **DDL #1** (spec line ~18782): PK = `attempt_id`, column = `correction_reason`, state value = `'completed'` - **DDL #2** (spec line ~45579): PK = `correction_attempt_id`, columns = `mode` + `guidance`, state value = `'complete'` This implementation correctly follows DDL #2 (the "Database Schema Design" section, referenced by issue #920). However, the spec itself has an internal inconsistency that should be flagged for documentation cleanup. ### [L2] Bug — `update_state()` FK error message can be misleading **File:** `repositories.py:5856-5861` The `IntegrityError` handler assumes a FOREIGN KEY violation is always about `new_decision_id`: ```python f"verify that new_decision_id '{new_decision_id}' exists" ``` But `new_decision_id` could be `None` if the FK error originated from another cause (e.g., the row's existing `plan_id` was concurrently deleted). The message would then say `"verify that new_decision_id 'None' exists"`, which is misleading. ### [L3] Inconsistency — `get()` calls `session.rollback()` unlike other repos **File:** `repositories.py:5739` `CorrectionAttemptRepository.get()` calls `session.rollback()` in its error handler. Other repos' read-only `get()` methods (e.g., `DecisionRepository.get()`, `CheckpointRepository.get()`) do **not** rollback on read errors. This is not harmful but is inconsistent. ### [L4] Inconsistency — No ORM relationship from parent models **File:** `models.py:2960-3035` `CorrectionAttemptModel` has FK references to `v3_plans` and `decisions` but defines no SQLAlchemy `relationship()` back to `LifecyclePlanModel` or `DecisionModel`. Other FK child tables (e.g., `PlanProjectModel`, `PlanInvariantModel`) define bidirectional relationships. While DB-level CASCADE works correctly, the lack of ORM relationships means no eager/lazy loading support and no ORM-level cascade verification. ### [L5] Test Coverage — No test for DB-level CHECK constraints The `ck_correction_attempts_mode` and `ck_correction_attempts_state` CHECK constraints are untested at the database level. Domain model validation via `CorrectionMode`/`CorrectionAttemptState` enums catches invalid values first, but the CHECK constraints (the last line of defense against bypassed validation) are never exercised in tests. ### [L6] Test Coverage — No test for Alembic migration `downgrade()` path The migration file defines both `upgrade()` and `downgrade()`, but no test verifies the downgrade correctly drops the index and table in the right order. ### [L7] Test Coverage — Missing minimum-boundary guidance test Tests cover empty guidance, whitespace-only guidance, max-length (10,000), and over-max-length (10,001), but there is no test for guidance with exactly 1 character (the minimum valid length after stripping). ### [L8] Test Coverage — Robot helper `cmd_domain_roundtrip` weak timestamp assertion **File:** `helper_correction_attempt_persistence.py:~286` The round-trip test only asserts `retrieved.created_at is not None` but does not verify the timestamp value is within an acceptable delta of the original. Given the format mismatch in H1, a stronger assertion would catch drift. --- ## Confirmed OK (no issues found) - Alembic migration chain: `down_revision = "m4_003_plan_env_columns"` is the correct single head. - CHECK constraints in `__table_args__` match Alembic migration constraints exactly. - Domain enum values match DB CHECK constraint values. - `to_domain()` / `from_domain()` round-trip is field-complete (all 10 columns mapped). - State transition validation order in `update_state()` is safe (all checks before mutation). - UnitOfWork integration: `correction_attempts` property correctly wired in `unit_of_work.py:294-303`. - `__init__.py` exports: All new domain exports added to `__all__`. - BDD step error-capture pattern: No silently-passing "try" steps — all assertion steps verify error is `not None`. - All 5 Robot helper subcommands are fully implemented with proper assertions. --- **Disposition:** The implementation is fundamentally sound. H1 (timestamp format) should be fixed before merge as it affects data ordering correctness. M1-M5 are recommended improvements. L1-L8 are quality observations that can be addressed in follow-up work.
CoreRasurae force-pushed feat/correction-attempts-table from d1b0fe4952
Some checks failed
CI / coverage (pull_request) Blocked by required conditions
CI / benchmark-regression (pull_request) Blocked by required conditions
CI / docker (pull_request) Blocked by required conditions
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m29s
CI / build (pull_request) Successful in 51s
CI / quality (pull_request) Successful in 4m4s
CI / security (pull_request) Successful in 4m25s
CI / unit_tests (pull_request) Successful in 6m42s
CI / integration_tests (pull_request) Successful in 7m16s
CI / e2e_tests (pull_request) Successful in 9m39s
CI / typecheck (pull_request) Failing after 13m4s
to 21864a5a0e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m23s
CI / integration_tests (pull_request) Successful in 3m45s
CI / quality (pull_request) Successful in 3m49s
CI / build (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 4m19s
CI / security (pull_request) Successful in 4m33s
CI / unit_tests (pull_request) Successful in 6m54s
CI / e2e_tests (pull_request) Successful in 5m54s
CI / docker (pull_request) Successful in 1m4s
CI / coverage (pull_request) Successful in 8m6s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m11s
2026-03-28 19:54:29 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Reviewer: Automated Code Review (3 full review cycles)
Scope: All code changes in branch feat/correction-attempts-table plus immediate surrounding context
Issue: #920


Overview

Solid implementation of the correction_attempts table per the spec DDL. The ORM model, Alembic migration, repository layer, domain model, and test suite are thorough and well-aligned with the codebase conventions. The following findings are organized by severity and category.


MEDIUM Severity

M1 — Bug: from_domain() crashes on naive datetimes

File: src/cleveragents/infrastructure/database/models.py:3094-3099

CorrectionAttemptModel.from_domain() calls record.created_at.astimezone(UTC) (line 3099) and record.completed_at.astimezone(UTC) (line 3094). Both calls will raise ValueError: astimezone() cannot be applied to a naive datetime if the datetime has no tzinfo.

While the CorrectionAttemptRecord default factory always produces UTC-aware datetimes, the created_at field has no validator enforcing timezone awareness. A caller providing a naive datetime (e.g., datetime(2026, 1, 1, 12, 0, 0)) would trigger an unhandled crash.

Recommendation: Add a defensive guard or Pydantic validator:

# Option A: Defensive guard in from_domain()
created_at_utc = (
    record.created_at.astimezone(UTC)
    if record.created_at.tzinfo is not None
    else record.created_at.replace(tzinfo=UTC)
)
# Option B: Pydantic validator on CorrectionAttemptRecord
@field_validator("created_at")
@classmethod
def _ensure_tz_aware(cls, v: datetime) -> datetime:
    if v.tzinfo is None:
        return v.replace(tzinfo=UTC)
    return v

M2 — Design limitation: Cannot update new_decision_id/archived_artifacts_path in EXECUTING state

File: src/cleveragents/infrastructure/database/repositories.py:5774-5848

update_state() requires a mandatory state parameter and validates the transition. The valid transitions map only allows EXECUTING → {COMPLETE, FAILED}. This means that if an attempt is in EXECUTING state and the caller needs to set new_decision_id (e.g., the correction engine produced a new decision during execution), they must also transition to a terminal state.

There is no way to update new_decision_id or archived_artifacts_path on an EXECUTING attempt without completing or failing it, because EXECUTING → EXECUTING is not a valid transition.

Recommendation: Either:

  1. Add EXECUTING as a valid self-transition: CorrectionAttemptState.EXECUTING: frozenset({EXECUTING, COMPLETE, FAILED}), or
  2. Add a separate update_fields() method that allows updating non-state fields without requiring a state transition.

LOW Severity

L1 — Test flaw: Overly generous timestamp tolerance in Robot helper

File: robot/helper_correction_attempt_persistence.py:288-289

delta = abs((retrieved.created_at - now).total_seconds())
assert delta < 1.0, f"created_at drift {delta}s exceeds 1s tolerance"

The from_domain() method truncates timestamps to millisecond precision. A 1-second tolerance could mask real precision or format issues. Consider tightening to < 0.01 (10ms) to verify the millisecond-precision round-trip works correctly.


L2 — Code quality: Redundant in-method import in update_state()

File: src/cleveragents/infrastructure/database/repositories.py:5810-5812

from cleveragents.domain.models.core.correction import (
    CorrectionAttemptState as _CAS,
)

CorrectionAttemptState is already loaded at module level via the top-level import of CORRECTION_ATTEMPT_TERMINAL_STATES (line 131), which is a frozenset[CorrectionAttemptState]. The in-method import is redundant.


L3 — Robustness: to_domain() does not defensively handle invalid enum values

File: src/cleveragents/infrastructure/database/models.py:3067-3070

mode=CorrectionMode(cast(str, self.mode)),
state=CorrectionAttemptState(cast(str, self.state)),

If the database contains an invalid mode or state value (e.g., due to manual DB editing or a future migration issue), these lines will raise an unhandled ValueError. Other models in the codebase (e.g., LifecyclePlanModel.to_domain()) include defensive coercion with warning-level logging for legacy values.

Recommendation: Add a try/except with a sensible fallback or at minimum a descriptive error message.


L4 — Test coverage gap: No test for delete() in non-pending state

Files: features/correction_attempt_persistence.feature, features/steps/correction_attempt_persistence_steps.py

All delete scenarios operate on correction attempts in pending state. There is no test verifying that deletion works for attempts in executing, complete, or failed states. While the implementation has no state guard, an explicit test would prevent regressions.


L5 — Test coverage gap: No cross-plan isolation test for list_by_plan()

File: features/correction_attempt_persistence.feature

There is no scenario verifying that list_by_plan(plan_A) does not return correction attempts belonging to plan_B. The empty-list scenario partially addresses this but doesn't exercise the case where attempts exist for a different plan.


INFORMATIONAL

I1 — No audit logging for correction attempt state transitions

File: src/cleveragents/infrastructure/database/repositories.py:5651-5907

The CorrectionAttemptRepository does not log state transitions via structlog. For correction workflows — which modify the decision tree and are security-sensitive — audit logging of create(), update_state(), and delete() operations would aid observability and compliance. The AuditLogModel table already supports correction_applied events.

I2 — No ORM relationship from CorrectionAttemptModel to parent models

File: src/cleveragents/infrastructure/database/models.py:2960-3035

Unlike other child tables (e.g., PlanProjectModel, PlanInvariantModel), CorrectionAttemptModel does not declare SQLAlchemy relationship() back-references to LifecyclePlanModel or DecisionModel. This means navigating from a plan to its correction attempts requires a separate query rather than ORM traversal. This is consistent with CheckpointModel but could be improved for future usability.

I3 — No pagination on list_by_plan()

File: src/cleveragents/infrastructure/database/repositories.py:5745-5770

Unlike LifecyclePlanRepository.list_plans() which supports limit/offset, list_by_plan() returns all matching records. Expected to be fine for typical correction attempt counts, but worth noting for future scaling.


Summary Table

ID Severity Category Location Description
M1 Medium Bug models.py:3094-3099 from_domain() crashes on naive datetimes
M2 Medium Design repositories.py:5774 Cannot update fields in EXECUTING state without state transition
L1 Low Test Flaw helper_*.py:288 1s tolerance too generous for ms-precision round-trip
L2 Low Code Quality repositories.py:5810 Redundant in-method import
L3 Low Robustness models.py:3067-3070 No defensive enum handling in to_domain()
L4 Low Test Coverage feature file No delete test for non-pending state
L5 Low Test Coverage feature file No cross-plan isolation test for list_by_plan()
I1 Info Enhancement repositories.py No audit logging for state transitions
I2 Info Design models.py No ORM back-reference relationships
I3 Info Design repositories.py No pagination on list_by_plan()
# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Reviewer:** Automated Code Review (3 full review cycles) **Scope:** All code changes in branch `feat/correction-attempts-table` plus immediate surrounding context **Issue:** #920 --- ## Overview Solid implementation of the `correction_attempts` table per the spec DDL. The ORM model, Alembic migration, repository layer, domain model, and test suite are thorough and well-aligned with the codebase conventions. The following findings are organized by severity and category. --- ## MEDIUM Severity ### M1 — Bug: `from_domain()` crashes on naive datetimes **File:** `src/cleveragents/infrastructure/database/models.py:3094-3099` `CorrectionAttemptModel.from_domain()` calls `record.created_at.astimezone(UTC)` (line 3099) and `record.completed_at.astimezone(UTC)` (line 3094). Both calls will raise `ValueError: astimezone() cannot be applied to a naive datetime` if the datetime has no `tzinfo`. While the `CorrectionAttemptRecord` default factory always produces UTC-aware datetimes, the `created_at` field has **no validator enforcing timezone awareness**. A caller providing a naive datetime (e.g., `datetime(2026, 1, 1, 12, 0, 0)`) would trigger an unhandled crash. **Recommendation:** Add a defensive guard or Pydantic validator: ```python # Option A: Defensive guard in from_domain() created_at_utc = ( record.created_at.astimezone(UTC) if record.created_at.tzinfo is not None else record.created_at.replace(tzinfo=UTC) ) ``` ```python # Option B: Pydantic validator on CorrectionAttemptRecord @field_validator("created_at") @classmethod def _ensure_tz_aware(cls, v: datetime) -> datetime: if v.tzinfo is None: return v.replace(tzinfo=UTC) return v ``` --- ### M2 — Design limitation: Cannot update `new_decision_id`/`archived_artifacts_path` in EXECUTING state **File:** `src/cleveragents/infrastructure/database/repositories.py:5774-5848` `update_state()` requires a mandatory `state` parameter and validates the transition. The valid transitions map only allows EXECUTING → {COMPLETE, FAILED}. This means that if an attempt is in EXECUTING state and the caller needs to set `new_decision_id` (e.g., the correction engine produced a new decision during execution), they **must** also transition to a terminal state. There is no way to update `new_decision_id` or `archived_artifacts_path` on an EXECUTING attempt without completing or failing it, because EXECUTING → EXECUTING is not a valid transition. **Recommendation:** Either: 1. Add EXECUTING as a valid self-transition: `CorrectionAttemptState.EXECUTING: frozenset({EXECUTING, COMPLETE, FAILED})`, or 2. Add a separate `update_fields()` method that allows updating non-state fields without requiring a state transition. --- ## LOW Severity ### L1 — Test flaw: Overly generous timestamp tolerance in Robot helper **File:** `robot/helper_correction_attempt_persistence.py:288-289` ```python delta = abs((retrieved.created_at - now).total_seconds()) assert delta < 1.0, f"created_at drift {delta}s exceeds 1s tolerance" ``` The `from_domain()` method truncates timestamps to millisecond precision. A 1-second tolerance could mask real precision or format issues. Consider tightening to `< 0.01` (10ms) to verify the millisecond-precision round-trip works correctly. --- ### L2 — Code quality: Redundant in-method import in `update_state()` **File:** `src/cleveragents/infrastructure/database/repositories.py:5810-5812` ```python from cleveragents.domain.models.core.correction import ( CorrectionAttemptState as _CAS, ) ``` `CorrectionAttemptState` is already loaded at module level via the top-level import of `CORRECTION_ATTEMPT_TERMINAL_STATES` (line 131), which is a `frozenset[CorrectionAttemptState]`. The in-method import is redundant. --- ### L3 — Robustness: `to_domain()` does not defensively handle invalid enum values **File:** `src/cleveragents/infrastructure/database/models.py:3067-3070` ```python mode=CorrectionMode(cast(str, self.mode)), state=CorrectionAttemptState(cast(str, self.state)), ``` If the database contains an invalid `mode` or `state` value (e.g., due to manual DB editing or a future migration issue), these lines will raise an unhandled `ValueError`. Other models in the codebase (e.g., `LifecyclePlanModel.to_domain()`) include defensive coercion with warning-level logging for legacy values. **Recommendation:** Add a try/except with a sensible fallback or at minimum a descriptive error message. --- ### L4 — Test coverage gap: No test for `delete()` in non-pending state **Files:** `features/correction_attempt_persistence.feature`, `features/steps/correction_attempt_persistence_steps.py` All delete scenarios operate on correction attempts in `pending` state. There is no test verifying that deletion works for attempts in `executing`, `complete`, or `failed` states. While the implementation has no state guard, an explicit test would prevent regressions. --- ### L5 — Test coverage gap: No cross-plan isolation test for `list_by_plan()` **File:** `features/correction_attempt_persistence.feature` There is no scenario verifying that `list_by_plan(plan_A)` does **not** return correction attempts belonging to `plan_B`. The empty-list scenario partially addresses this but doesn't exercise the case where attempts exist for a different plan. --- ## INFORMATIONAL ### I1 — No audit logging for correction attempt state transitions **File:** `src/cleveragents/infrastructure/database/repositories.py:5651-5907` The `CorrectionAttemptRepository` does not log state transitions via `structlog`. For correction workflows — which modify the decision tree and are security-sensitive — audit logging of `create()`, `update_state()`, and `delete()` operations would aid observability and compliance. The `AuditLogModel` table already supports `correction_applied` events. ### I2 — No ORM relationship from `CorrectionAttemptModel` to parent models **File:** `src/cleveragents/infrastructure/database/models.py:2960-3035` Unlike other child tables (e.g., `PlanProjectModel`, `PlanInvariantModel`), `CorrectionAttemptModel` does not declare SQLAlchemy `relationship()` back-references to `LifecyclePlanModel` or `DecisionModel`. This means navigating from a plan to its correction attempts requires a separate query rather than ORM traversal. This is consistent with `CheckpointModel` but could be improved for future usability. ### I3 — No pagination on `list_by_plan()` **File:** `src/cleveragents/infrastructure/database/repositories.py:5745-5770` Unlike `LifecyclePlanRepository.list_plans()` which supports `limit`/`offset`, `list_by_plan()` returns all matching records. Expected to be fine for typical correction attempt counts, but worth noting for future scaling. --- ## Summary Table | ID | Severity | Category | Location | Description | |---|---|---|---|---| | M1 | Medium | Bug | `models.py:3094-3099` | `from_domain()` crashes on naive datetimes | | M2 | Medium | Design | `repositories.py:5774` | Cannot update fields in EXECUTING state without state transition | | L1 | Low | Test Flaw | `helper_*.py:288` | 1s tolerance too generous for ms-precision round-trip | | L2 | Low | Code Quality | `repositories.py:5810` | Redundant in-method import | | L3 | Low | Robustness | `models.py:3067-3070` | No defensive enum handling in `to_domain()` | | L4 | Low | Test Coverage | feature file | No delete test for non-pending state | | L5 | Low | Test Coverage | feature file | No cross-plan isolation test for `list_by_plan()` | | I1 | Info | Enhancement | `repositories.py` | No audit logging for state transitions | | I2 | Info | Design | `models.py` | No ORM back-reference relationships | | I3 | Info | Design | `repositories.py` | No pagination on `list_by_plan()` |
CoreRasurae force-pushed feat/correction-attempts-table from 21864a5a0e
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m23s
CI / integration_tests (pull_request) Successful in 3m45s
CI / quality (pull_request) Successful in 3m49s
CI / build (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 4m19s
CI / security (pull_request) Successful in 4m33s
CI / unit_tests (pull_request) Successful in 6m54s
CI / e2e_tests (pull_request) Successful in 5m54s
CI / docker (pull_request) Successful in 1m4s
CI / coverage (pull_request) Successful in 8m6s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 51m11s
to 71ffed8cf3
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 54s
CI / build (pull_request) Successful in 26s
CI / integration_tests (pull_request) Successful in 3m46s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m23s
CI / e2e_tests (pull_request) Successful in 6m59s
CI / unit_tests (pull_request) Successful in 7m29s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 11m16s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 22:00:50 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Scope: Code changes in branch feat/correction-attempts-table + close connections to surrounding code.
Ref: Issue #920, spec DDL at docs/specification.md lines 45580-45593.
Commit: 71ffed8 by Luis Mendes
Review cycles: 3 full passes across all categories until no new findings.


Summary

The implementation is solid overall. The CorrectionAttemptModel, CorrectionAttemptRecord, CorrectionAttemptRepository, and Alembic migration correctly implement the spec DDL with appropriate domain validation, state transition enforcement, and comprehensive test coverage. The code follows established codebase patterns (session-factory, database_retry, defensive enum coercion in to_domain()).

Findings: 2 Medium, 7 Low, 4 Informational.


Medium Severity

B-1: ON DELETE CASCADE on original_decision_id FK deviates from spec DDL

Files: models.py:2989, alembic/versions/m8_001_correction_attempts_table.py:43

The spec DDL (line 45583) defines:

original_decision_id TEXT NOT NULL REFERENCES decisions(decision_id)

No ON DELETE clause — defaults to NO ACTION/RESTRICT. The implementation adds ondelete="CASCADE".

Other models in the codebase that reference decisions.decision_id (lines 2577, 2610, 2616 in models.py) use ondelete="SET NULL", not CASCADE. CASCADE is only used for decision-to-decision dependency relationships (lines 2803, 2808). The correction_attempts table is not a decision dependency — it is an audit/tracking record.

Impact: If the original decision is ever deleted (e.g., during plan cleanup, subtree recomputation), the correction attempt audit trail is silently lost. The spec's implicit RESTRICT would prevent deletion, preserving the audit history. Consider whether SET NULL or RESTRICT would be more appropriate here, matching both the spec and the codebase convention for non-dependency references to decisions.

Note: new_decision_id correctly uses ondelete="SET NULL".


T-1: Auto-set completed_at for terminal states is untested

Files: repositories.py:5828-5833, correction_attempt_persistence.feature, helper_correction_attempt_persistence.py

update_state() contains logic to auto-set completed_at when transitioning to terminal states (complete/failed) if completed_at is not explicitly provided:

# repositories.py:5828-5833
effective_completed_at = completed_at
if (
    effective_completed_at is None
    and state in CORRECTION_ATTEMPT_TERMINAL_STATES
):
    effective_completed_at = datetime.now(UTC)

All BDD scenarios and Robot helper tests that transition to terminal states always provide an explicit completed_at=datetime.now(UTC):

  • step_update_complete_with_timestamp (line 349-351)
  • step_update_failed_with_timestamp (line 569-570)
  • step_persisted_complete (line 589-590)
  • step_persisted_failed (line 611-612)
  • cmd_update_state() in Robot helper (line 214-215)

The auto-set code path is never exercised. A scenario transitioning to complete or failed without providing completed_at would cover this.


Low Severity

B-2: to_domain() defaults unknown DB state to PENDING — allows re-execution of corrupted records

File: models.py:3074-3081

When the DB contains an unrecognized state value, to_domain() defaults to PENDING with a warning log. If a previously-terminal record has a corrupted state (e.g., a typo like "complte" instead of "complete"), it would be treated as PENDING and accept new state transitions, potentially causing re-execution.

The pattern is consistent with LifecyclePlanModel.to_domain(), and the warning log provides observability. Low practical risk since it requires DB corruption.


B-3: No non-empty validation on new_decision_id field

File: correction.py:412-415

Unlike plan_id and original_decision_id, new_decision_id has no strip-and-validate validator. A whitespace-only string would pass domain validation but fail at the DB FK constraint. Low risk since ULIDs are auto-generated, and the FK constraint provides defense-in-depth.


T-2: No test for FK violation during create() with non-existent plan_id or original_decision_id

The tests verify FK cascade behavior (plan deletion cascades) and FK violations during update_state(), but don't test the create() path when plan_id or original_decision_id reference non-existent records. This code path (lines 5697-5702 in repositories.py) which generates a descriptive FK-violation error message is untested.


T-3: No test for invalid mode value at domain model level

CorrectionMode is a StrEnum with values "revert" and "append". Passing an invalid value (e.g., "invalid") would raise a Pydantic ValidationError, but there's no explicit boundary test for this.


P-1: list_by_plan() returns unbounded result set

File: repositories.py:5746-5770

No LIMIT clause or pagination support. For plans with many correction attempts, this could cause high memory usage. In practice, the number of correction attempts per plan is expected to be small, but adding a limit parameter (defaulting to a reasonable max) would be a defensive measure.


S-1: Alembic migration uses SQLite-specific strftime() server default

File: alembic/versions/m8_001_correction_attempts_table.py:60

server_default=sa.text("(strftime('%Y-%m-%dT%H:%M:%f', 'now'))")

SQLite's strftime('%f') produces SS.SSS (seconds with milliseconds). This function doesn't exist in PostgreSQL. The server mode migration path would need a PostgreSQL-compatible default (e.g., now() AT TIME ZONE 'UTC'). This is consistent with all other migrations in the codebase.


C-1: Fragile string-based IntegrityError detection

File: repositories.py:5692-5702, 5851-5862

exc_str = str(exc).upper()
if "UNIQUE" in exc_str:
    ...
if "FOREIGN KEY" in exc_str or "FOREIGN_KEY" in exc_str:
    ...

Error classification depends on database engine error message format. While functional for SQLite, different engines may format these differently. This pattern is consistent with other repositories in the codebase.


Informational

I-1: CorrectionAttemptRecord is mutable (frozen=False) unlike most correction domain models

File: correction.py:398

Most correction-related models use frozen=True (CorrectionImpact, CorrectionResult, CorrectionDryRunReport, CascadeAction, CorrectionRejection, CascadeResult). CorrectionAttemptRecord uses frozen=False, which is needed for tests that override created_at for deterministic ordering. This is a conscious choice but could allow accidental mutation in production code.

I-2: Two different spec DDL schemas exist for correction_attempts

  • Line 18783: Uses attempt_id (PK), original_subtree_snapshot, correction_reason, status (with value 'completed')
  • Line 45580: Uses correction_attempt_id (PK), mode, guidance, archived_artifacts_path, state (with value 'complete')

The implementation correctly follows the canonical DDL at line 45580 per issue #920 acceptance criteria.

I-3: CorrectionAttempt vs CorrectionAttemptRecord naming overlap

Two similarly-named classes exist in correction.py:

  • CorrectionAttempt (line 218) — in-memory retry tracking, used by CorrectionService
  • CorrectionAttemptRecord (line 389) — spec-aligned DB record, used by the new repository

These serve different purposes but the naming proximity could cause confusion.

I-4: Repository wired to UnitOfWork but not yet consumed by CorrectionService

CorrectionAttemptRepository is accessible via UnitOfWork.correction_attempts but CorrectionService still uses its in-memory _attempts: dict[str, list[CorrectionAttempt]] dictionary. Integration is expected to be a separate ticket.


Security

No significant security issues found. The code uses SQLAlchemy ORM with parameterized queries throughout. Input validation is handled via Pydantic validators. The archived_artifacts_path field is stored as-is without path validation, which is acceptable at the persistence layer (path traversal protection would be at the service/filesystem layer).


Spec Conformance

All 7 acceptance criteria from issue #920 are addressed:

  • CorrectionAttemptModel created in models.py
  • All 10 spec DDL columns present
  • FK constraints to v3_plans and decisions tables
  • Index idx_corrections_plan on plan_id
  • Alembic migration created (m8_001_correction_attempts)
  • Repository layer with full CRUD operations
  • BDD (28 scenarios) + Robot (5 integration tests) coverage

The implementation goes beyond the spec requirements with domain-level state transition validation, timezone normalization, millisecond-precision timestamp formatting, defensive enum coercion, and max-length guidance validation.

# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Scope:** Code changes in branch `feat/correction-attempts-table` + close connections to surrounding code. **Ref:** Issue #920, spec DDL at `docs/specification.md` lines 45580-45593. **Commit:** `71ffed8` by Luis Mendes **Review cycles:** 3 full passes across all categories until no new findings. --- ## Summary The implementation is solid overall. The `CorrectionAttemptModel`, `CorrectionAttemptRecord`, `CorrectionAttemptRepository`, and Alembic migration correctly implement the spec DDL with appropriate domain validation, state transition enforcement, and comprehensive test coverage. The code follows established codebase patterns (session-factory, `database_retry`, defensive enum coercion in `to_domain()`). **Findings: 2 Medium, 7 Low, 4 Informational.** --- ## Medium Severity ### B-1: `ON DELETE CASCADE` on `original_decision_id` FK deviates from spec DDL **Files:** `models.py:2989`, `alembic/versions/m8_001_correction_attempts_table.py:43` The spec DDL (line 45583) defines: ```sql original_decision_id TEXT NOT NULL REFERENCES decisions(decision_id) ``` No `ON DELETE` clause — defaults to `NO ACTION`/`RESTRICT`. The implementation adds `ondelete="CASCADE"`. Other models in the codebase that reference `decisions.decision_id` (lines 2577, 2610, 2616 in `models.py`) use `ondelete="SET NULL"`, not CASCADE. CASCADE is only used for decision-to-decision dependency relationships (lines 2803, 2808). The correction_attempts table is not a decision dependency — it is an audit/tracking record. **Impact:** If the original decision is ever deleted (e.g., during plan cleanup, subtree recomputation), the correction attempt audit trail is silently lost. The spec's implicit RESTRICT would prevent deletion, preserving the audit history. Consider whether `SET NULL` or `RESTRICT` would be more appropriate here, matching both the spec and the codebase convention for non-dependency references to decisions. Note: `new_decision_id` correctly uses `ondelete="SET NULL"`. --- ### T-1: Auto-set `completed_at` for terminal states is untested **Files:** `repositories.py:5828-5833`, `correction_attempt_persistence.feature`, `helper_correction_attempt_persistence.py` `update_state()` contains logic to auto-set `completed_at` when transitioning to terminal states (`complete`/`failed`) if `completed_at` is not explicitly provided: ```python # repositories.py:5828-5833 effective_completed_at = completed_at if ( effective_completed_at is None and state in CORRECTION_ATTEMPT_TERMINAL_STATES ): effective_completed_at = datetime.now(UTC) ``` All BDD scenarios and Robot helper tests that transition to terminal states **always** provide an explicit `completed_at=datetime.now(UTC)`: - `step_update_complete_with_timestamp` (line 349-351) - `step_update_failed_with_timestamp` (line 569-570) - `step_persisted_complete` (line 589-590) - `step_persisted_failed` (line 611-612) - `cmd_update_state()` in Robot helper (line 214-215) The auto-set code path is never exercised. A scenario transitioning to `complete` or `failed` **without** providing `completed_at` would cover this. --- ## Low Severity ### B-2: `to_domain()` defaults unknown DB state to `PENDING` — allows re-execution of corrupted records **File:** `models.py:3074-3081` When the DB contains an unrecognized `state` value, `to_domain()` defaults to `PENDING` with a warning log. If a previously-terminal record has a corrupted state (e.g., a typo like `"complte"` instead of `"complete"`), it would be treated as `PENDING` and accept new state transitions, potentially causing re-execution. The pattern is consistent with `LifecyclePlanModel.to_domain()`, and the warning log provides observability. Low practical risk since it requires DB corruption. --- ### B-3: No non-empty validation on `new_decision_id` field **File:** `correction.py:412-415` Unlike `plan_id` and `original_decision_id`, `new_decision_id` has no strip-and-validate validator. A whitespace-only string would pass domain validation but fail at the DB FK constraint. Low risk since ULIDs are auto-generated, and the FK constraint provides defense-in-depth. --- ### T-2: No test for FK violation during `create()` with non-existent `plan_id` or `original_decision_id` The tests verify FK cascade behavior (plan deletion cascades) and FK violations during `update_state()`, but don't test the `create()` path when `plan_id` or `original_decision_id` reference non-existent records. This code path (lines 5697-5702 in `repositories.py`) which generates a descriptive FK-violation error message is untested. --- ### T-3: No test for invalid `mode` value at domain model level `CorrectionMode` is a `StrEnum` with values `"revert"` and `"append"`. Passing an invalid value (e.g., `"invalid"`) would raise a Pydantic `ValidationError`, but there's no explicit boundary test for this. --- ### P-1: `list_by_plan()` returns unbounded result set **File:** `repositories.py:5746-5770` No `LIMIT` clause or pagination support. For plans with many correction attempts, this could cause high memory usage. In practice, the number of correction attempts per plan is expected to be small, but adding a `limit` parameter (defaulting to a reasonable max) would be a defensive measure. --- ### S-1: Alembic migration uses SQLite-specific `strftime()` server default **File:** `alembic/versions/m8_001_correction_attempts_table.py:60` ```python server_default=sa.text("(strftime('%Y-%m-%dT%H:%M:%f', 'now'))") ``` SQLite's `strftime('%f')` produces `SS.SSS` (seconds with milliseconds). This function doesn't exist in PostgreSQL. The server mode migration path would need a PostgreSQL-compatible default (e.g., `now() AT TIME ZONE 'UTC'`). This is consistent with all other migrations in the codebase. --- ### C-1: Fragile string-based `IntegrityError` detection **File:** `repositories.py:5692-5702, 5851-5862` ```python exc_str = str(exc).upper() if "UNIQUE" in exc_str: ... if "FOREIGN KEY" in exc_str or "FOREIGN_KEY" in exc_str: ... ``` Error classification depends on database engine error message format. While functional for SQLite, different engines may format these differently. This pattern is consistent with other repositories in the codebase. --- ## Informational ### I-1: `CorrectionAttemptRecord` is mutable (`frozen=False`) unlike most correction domain models **File:** `correction.py:398` Most correction-related models use `frozen=True` (`CorrectionImpact`, `CorrectionResult`, `CorrectionDryRunReport`, `CascadeAction`, `CorrectionRejection`, `CascadeResult`). `CorrectionAttemptRecord` uses `frozen=False`, which is needed for tests that override `created_at` for deterministic ordering. This is a conscious choice but could allow accidental mutation in production code. ### I-2: Two different spec DDL schemas exist for `correction_attempts` - **Line 18783:** Uses `attempt_id` (PK), `original_subtree_snapshot`, `correction_reason`, `status` (with value `'completed'`) - **Line 45580:** Uses `correction_attempt_id` (PK), `mode`, `guidance`, `archived_artifacts_path`, `state` (with value `'complete'`) The implementation correctly follows the canonical DDL at line 45580 per issue #920 acceptance criteria. ### I-3: `CorrectionAttempt` vs `CorrectionAttemptRecord` naming overlap Two similarly-named classes exist in `correction.py`: - `CorrectionAttempt` (line 218) — in-memory retry tracking, used by `CorrectionService` - `CorrectionAttemptRecord` (line 389) — spec-aligned DB record, used by the new repository These serve different purposes but the naming proximity could cause confusion. ### I-4: Repository wired to `UnitOfWork` but not yet consumed by `CorrectionService` `CorrectionAttemptRepository` is accessible via `UnitOfWork.correction_attempts` but `CorrectionService` still uses its in-memory `_attempts: dict[str, list[CorrectionAttempt]]` dictionary. Integration is expected to be a separate ticket. --- ## Security No significant security issues found. The code uses SQLAlchemy ORM with parameterized queries throughout. Input validation is handled via Pydantic validators. The `archived_artifacts_path` field is stored as-is without path validation, which is acceptable at the persistence layer (path traversal protection would be at the service/filesystem layer). --- ## Spec Conformance All 7 acceptance criteria from issue #920 are addressed: - [x] `CorrectionAttemptModel` created in `models.py` - [x] All 10 spec DDL columns present - [x] FK constraints to `v3_plans` and `decisions` tables - [x] Index `idx_corrections_plan` on `plan_id` - [x] Alembic migration created (`m8_001_correction_attempts`) - [x] Repository layer with full CRUD operations - [x] BDD (28 scenarios) + Robot (5 integration tests) coverage The implementation goes beyond the spec requirements with domain-level state transition validation, timezone normalization, millisecond-precision timestamp formatting, defensive enum coercion, and max-length guidance validation.
CoreRasurae force-pushed feat/correction-attempts-table from 71ffed8cf3
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 54s
CI / build (pull_request) Successful in 26s
CI / integration_tests (pull_request) Successful in 3m46s
CI / security (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m23s
CI / e2e_tests (pull_request) Successful in 6m59s
CI / unit_tests (pull_request) Successful in 7m29s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 11m16s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to f03897364d
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 56s
CI / security (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 22s
CI / unit_tests (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m12s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 6m7s
CI / integration_tests (pull_request) Successful in 6m59s
CI / coverage (pull_request) Successful in 11m10s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-28 22:48:00 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Reviewer: Automated code review (multi-cycle analysis)
Scope: All code changes in feat/correction-attempts-table branch plus immediate surrounding code
Cycles completed: 3 full global sweeps across all categories (bugs, security, performance, test coverage/flaws, spec conformance)
Issue: #920


Overall Assessment

The implementation is solid and well-structured. It correctly follows the spec DDL at lines 45580-45593, the session-factory repository pattern used by the rest of the codebase, and the domain model conventions. The 35 BDD scenarios and 5 Robot integration tests provide good coverage of CRUD, state transitions, constraints, and edge cases. The code has gone through nine rounds of self-review addressing real feedback, and it shows.

That said, I identified the following findings organized by severity:


MEDIUM Severity

M-1. Bug: update_state() bypasses domain validation for new_decision_id

Category: Bug / Data Integrity
File: src/cleveragents/infrastructure/database/repositories.py:5841-5842

The update_state() method sets row.new_decision_id = new_decision_id directly on the ORM model, bypassing the CorrectionAttemptRecord._new_decision_id_not_empty Pydantic validator. If a caller passes new_decision_id="" or new_decision_id=" " (whitespace-only), the value is stored raw in the DB.

The method then calls row.to_domain() at line 5846, which constructs a CorrectionAttemptRecord — the Pydantic validator strips the value to "" and raises ValueError (wrapped as ValidationError). This ValidationError is not caught by any of the except blocks, so it propagates as an unhandled exception. The DB has been flushed but not committed, leaving a dirty session.

Impact: Callers receive an unexpected ValidationError instead of a meaningful DatabaseError. If the session were ever committed despite the error (e.g., in an auto-commit context), the corrupted data would persist and break all subsequent get()/list_by_plan() reads for that record.

Suggested fix: Validate new_decision_id before assignment:

if new_decision_id is not None:
    stripped = new_decision_id.strip()
    if not stripped:
        raise DatabaseError("new_decision_id must not be empty when set")
    row.new_decision_id = stripped

LOW Severity

L-1. Test Flaw: BDD step reuses _ca_guidance_error for mode validation

Category: Test Flaw / Naming
File: features/steps/correction_attempt_persistence_steps.pystep_create_invalid_mode (line 845)

The step for scenario "Invalid mode value is rejected at domain model level" stores the caught ValidationError in context._ca_guidance_error, then the Then step "a guidance validation error should be raised" asserts isinstance(..., ValidationError). While this works technically, using _ca_guidance_error for a mode validation error is misleading. The assertion only checks the exception type, not whether the error is actually about the mode field. A future regression where guidance validation fires instead of mode validation would pass silently.

Suggested fix: Either use a separate variable (e.g., _ca_mode_error) with a dedicated Then step, or add a field-name assertion to the existing check:

assert any("mode" in str(e["loc"]) for e in context._ca_guidance_error.errors())

L-2. Design: update_state() conflates state transitions with field updates

Category: Design Limitation
File: src/cleveragents/infrastructure/database/repositories.py:5765-5846

There is no way to update new_decision_id or archived_artifacts_path without also performing a state transition. The state parameter is required, and validate_correction_state_transition() rejects same-state "transitions" (e.g., executing → executing). This means you cannot set new_decision_id on an already-executing record without changing its state.

For the current correction lifecycle this works (fields are set during transitions), but it limits extensibility. A minor design concern, not a bug.

L-3. Security: archived_artifacts_path has no path validation

Category: Security / Input Validation
File: src/cleveragents/domain/models/core/correction.py:425-428

The archived_artifacts_path field accepts arbitrary strings with no sanitization (no path traversal checks, no max-length, no allowed-prefix validation). While the model is a persistence record and does not perform I/O, if downstream consumers use this path for file operations without their own validation, path traversal could be a concern. The guidance field has both strip and max_length validation; archived_artifacts_path has neither.

Suggested: Consider adding max_length and/or a field_validator that rejects obviously malicious patterns (e.g., .. components) — or document that consumers must validate the path before use.

L-4. Test Coverage Gap: No test for whitespace-only new_decision_id via update_state()

Category: Test Coverage
Related to: M-1

No BDD scenario or Robot test exercises calling update_state() with new_decision_id="" or new_decision_id=" ". This is the path that triggers the M-1 bug.

L-5. Test Coverage Gap: No test for combined new_decision_id + archived_artifacts_path in single update_state() call

Category: Test Coverage
File: features/correction_attempt_persistence.feature

Existing scenarios test new_decision_id and archived_artifacts_path updates individually but never together in a single update_state() call. While unlikely to cause issues, it's an untested code path.


INFORMATIONAL

I-1. Spec has two divergent DDL blocks for correction_attempts

Category: Spec Conformance

The spec contains two definitions of correction_attempts:

  • Line 18783 (earlier, within the decision tree section): uses attempt_id as PK, status column (with value 'completed'), and has original_subtree_snapshot/correction_reason columns.
  • Line 45580 (later, in the persistence DDL section): uses correction_attempt_id as PK, state column (with value 'complete'), and has mode/guidance columns.

The implementation correctly follows the line 45580 version as referenced by issue #920. The earlier block appears to be an older draft. Not a code issue, but worth noting for spec maintainers.

I-2. ON DELETE behaviors are implementation decisions beyond the spec DDL

Category: Spec Conformance

The spec DDL at line 45580 has no explicit ON DELETE clauses (defaulting to NO ACTION). The implementation adds:

  • plan_id: CASCADE — deleting a plan auto-deletes its correction attempts
  • original_decision_id: RESTRICT — prevents deleting a decision that has correction attempts
  • new_decision_id: SET NULL — nullifies the link if the new decision is deleted

These are reasonable choices well-documented in the code comments. They go beyond the literal spec but align with codebase conventions (e.g., plan_id FKs on other tables also use CASCADE). The RESTRICT on original_decision_id is particularly well-motivated — it preserves the correction audit trail.


Summary Table

ID Severity Category File Description
M-1 Medium Bug repositories.py:5841 update_state() bypasses domain validation for new_decision_id
L-1 Low Test Flaw correction_attempt_persistence_steps.py:845 Misleading _ca_guidance_error variable used for mode validation
L-2 Low Design repositories.py:5765 update_state() conflates state transitions with field updates
L-3 Low Security correction.py:425 archived_artifacts_path has no path validation
L-4 Low Test Gap No test for whitespace new_decision_id via update_state()
L-5 Low Test Gap correction_attempt_persistence.feature No combined field update test
I-1 Info Spec Two divergent DDL blocks in specification
I-2 Info Spec ON DELETE behaviors beyond spec (well-justified)

Recommendation: Address M-1 before merge (small fix, prevents data integrity issue). L-1 through L-5 are deferrable but worth tracking.

## Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Reviewer**: Automated code review (multi-cycle analysis) **Scope**: All code changes in `feat/correction-attempts-table` branch plus immediate surrounding code **Cycles completed**: 3 full global sweeps across all categories (bugs, security, performance, test coverage/flaws, spec conformance) **Issue**: #920 --- ### Overall Assessment The implementation is **solid** and **well-structured**. It correctly follows the spec DDL at lines 45580-45593, the session-factory repository pattern used by the rest of the codebase, and the domain model conventions. The 35 BDD scenarios and 5 Robot integration tests provide good coverage of CRUD, state transitions, constraints, and edge cases. The code has gone through nine rounds of self-review addressing real feedback, and it shows. That said, I identified the following findings organized by severity: --- ## MEDIUM Severity ### M-1. Bug: `update_state()` bypasses domain validation for `new_decision_id` **Category**: Bug / Data Integrity **File**: `src/cleveragents/infrastructure/database/repositories.py:5841-5842` The `update_state()` method sets `row.new_decision_id = new_decision_id` directly on the ORM model, bypassing the `CorrectionAttemptRecord._new_decision_id_not_empty` Pydantic validator. If a caller passes `new_decision_id=""` or `new_decision_id=" "` (whitespace-only), the value is stored raw in the DB. The method then calls `row.to_domain()` at line 5846, which constructs a `CorrectionAttemptRecord` — the Pydantic validator strips the value to `""` and raises `ValueError` (wrapped as `ValidationError`). This `ValidationError` is **not caught** by any of the `except` blocks, so it propagates as an unhandled exception. The DB has been flushed but not committed, leaving a dirty session. **Impact**: Callers receive an unexpected `ValidationError` instead of a meaningful `DatabaseError`. If the session were ever committed despite the error (e.g., in an auto-commit context), the corrupted data would persist and break all subsequent `get()`/`list_by_plan()` reads for that record. **Suggested fix**: Validate `new_decision_id` before assignment: ```python if new_decision_id is not None: stripped = new_decision_id.strip() if not stripped: raise DatabaseError("new_decision_id must not be empty when set") row.new_decision_id = stripped ``` --- ## LOW Severity ### L-1. Test Flaw: BDD step reuses `_ca_guidance_error` for mode validation **Category**: Test Flaw / Naming **File**: `features/steps/correction_attempt_persistence_steps.py` — `step_create_invalid_mode` (line 845) The step for scenario "Invalid mode value is rejected at domain model level" stores the caught `ValidationError` in `context._ca_guidance_error`, then the `Then` step "a guidance validation error should be raised" asserts `isinstance(..., ValidationError)`. While this works technically, using `_ca_guidance_error` for a **mode** validation error is misleading. The assertion only checks the exception type, not whether the error is actually about the `mode` field. A future regression where guidance validation fires instead of mode validation would pass silently. **Suggested fix**: Either use a separate variable (e.g., `_ca_mode_error`) with a dedicated `Then` step, or add a field-name assertion to the existing check: ```python assert any("mode" in str(e["loc"]) for e in context._ca_guidance_error.errors()) ``` ### L-2. Design: `update_state()` conflates state transitions with field updates **Category**: Design Limitation **File**: `src/cleveragents/infrastructure/database/repositories.py:5765-5846` There is no way to update `new_decision_id` or `archived_artifacts_path` without also performing a state transition. The `state` parameter is required, and `validate_correction_state_transition()` rejects same-state "transitions" (e.g., `executing → executing`). This means you cannot set `new_decision_id` on an already-executing record without changing its state. For the current correction lifecycle this works (fields are set during transitions), but it limits extensibility. A minor design concern, not a bug. ### L-3. Security: `archived_artifacts_path` has no path validation **Category**: Security / Input Validation **File**: `src/cleveragents/domain/models/core/correction.py:425-428` The `archived_artifacts_path` field accepts arbitrary strings with no sanitization (no path traversal checks, no max-length, no allowed-prefix validation). While the model is a persistence record and does not perform I/O, if downstream consumers use this path for file operations without their own validation, path traversal could be a concern. The `guidance` field has both strip and max_length validation; `archived_artifacts_path` has neither. **Suggested**: Consider adding `max_length` and/or a `field_validator` that rejects obviously malicious patterns (e.g., `..` components) — or document that consumers must validate the path before use. ### L-4. Test Coverage Gap: No test for whitespace-only `new_decision_id` via `update_state()` **Category**: Test Coverage **Related to**: M-1 No BDD scenario or Robot test exercises calling `update_state()` with `new_decision_id=""` or `new_decision_id=" "`. This is the path that triggers the M-1 bug. ### L-5. Test Coverage Gap: No test for combined `new_decision_id` + `archived_artifacts_path` in single `update_state()` call **Category**: Test Coverage **File**: `features/correction_attempt_persistence.feature` Existing scenarios test `new_decision_id` and `archived_artifacts_path` updates individually but never together in a single `update_state()` call. While unlikely to cause issues, it's an untested code path. --- ## INFORMATIONAL ### I-1. Spec has two divergent DDL blocks for `correction_attempts` **Category**: Spec Conformance The spec contains two definitions of `correction_attempts`: - **Line 18783** (earlier, within the decision tree section): uses `attempt_id` as PK, `status` column (with value `'completed'`), and has `original_subtree_snapshot`/`correction_reason` columns. - **Line 45580** (later, in the persistence DDL section): uses `correction_attempt_id` as PK, `state` column (with value `'complete'`), and has `mode`/`guidance` columns. The implementation correctly follows the **line 45580** version as referenced by issue #920. The earlier block appears to be an older draft. Not a code issue, but worth noting for spec maintainers. ### I-2. `ON DELETE` behaviors are implementation decisions beyond the spec DDL **Category**: Spec Conformance The spec DDL at line 45580 has no explicit `ON DELETE` clauses (defaulting to `NO ACTION`). The implementation adds: - `plan_id`: `CASCADE` — deleting a plan auto-deletes its correction attempts - `original_decision_id`: `RESTRICT` — prevents deleting a decision that has correction attempts - `new_decision_id`: `SET NULL` — nullifies the link if the new decision is deleted These are **reasonable** choices well-documented in the code comments. They go beyond the literal spec but align with codebase conventions (e.g., `plan_id` FKs on other tables also use `CASCADE`). The `RESTRICT` on `original_decision_id` is particularly well-motivated — it preserves the correction audit trail. --- ### Summary Table | ID | Severity | Category | File | Description | |----|----------|----------|------|-------------| | M-1 | **Medium** | Bug | `repositories.py:5841` | `update_state()` bypasses domain validation for `new_decision_id` | | L-1 | Low | Test Flaw | `correction_attempt_persistence_steps.py:845` | Misleading `_ca_guidance_error` variable used for mode validation | | L-2 | Low | Design | `repositories.py:5765` | `update_state()` conflates state transitions with field updates | | L-3 | Low | Security | `correction.py:425` | `archived_artifacts_path` has no path validation | | L-4 | Low | Test Gap | — | No test for whitespace `new_decision_id` via `update_state()` | | L-5 | Low | Test Gap | `correction_attempt_persistence.feature` | No combined field update test | | I-1 | Info | Spec | — | Two divergent DDL blocks in specification | | I-2 | Info | Spec | — | ON DELETE behaviors beyond spec (well-justified) | **Recommendation**: Address M-1 before merge (small fix, prevents data integrity issue). L-1 through L-5 are deferrable but worth tracking.
CoreRasurae force-pushed feat/correction-attempts-table from f03897364d
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 56s
CI / security (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 22s
CI / unit_tests (pull_request) Successful in 4m11s
CI / typecheck (pull_request) Successful in 4m12s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 6m7s
CI / integration_tests (pull_request) Successful in 6m59s
CI / coverage (pull_request) Successful in 11m10s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to 421a73b0ac
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 47s
CI / quality (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m21s
CI / unit_tests (pull_request) Successful in 4m9s
CI / integration_tests (pull_request) Successful in 3m53s
CI / e2e_tests (pull_request) Successful in 6m42s
CI / docker (pull_request) Successful in 1m41s
CI / coverage (pull_request) Successful in 11m41s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 53m9s
2026-03-28 23:39:09 +00:00
Compare
CoreRasurae left a comment

Code Review Report: feat(db): add correction_attempts table per specification DDL

PR: #1145 | Issue: #920 | Branch: feat/correction-attempts-table
Reviewed: 3 full review cycles across all categories (bug detection, security, performance, test coverage, code quality)
Scope: All 11 changed files plus close connections to surrounding code (repositories, models, domain, migration chain, retry decorator, unit-of-work)


Summary

Solid implementation overall. The correction_attempts table, ORM model, domain model, repository CRUD, Alembic migration, and comprehensive BDD/Robot tests are well-structured and follow established codebase conventions. The spec DDL from issue #920 (lines 45580-45593) is faithfully implemented with appropriate defensive additions (CHECK constraints, timezone normalization, lifecycle state machine).

One medium-severity bug was found in update_state() error handling. The remainder are low-severity observations and minor test coverage gaps.


Findings by Severity

MEDIUM

M1 -- Dirty session on new_decision_id validation failure in update_state() (Bug)

File: repositories.py:5826-5846

In update_state(), the ORM row is mutated (row.state, possibly row.completed_at) before the new_decision_id emptiness validation on line 5843. If validation raises DatabaseError, the session is left dirty with partial state changes and no rollback occurs within the method:

row.state = state.value            # line 5826: session dirty
# ... row.completed_at may be set  # lines 5834-5840: more dirty
if new_decision_id is not None:
    stripped_decision_id = new_decision_id.strip()
    if not stripped_decision_id:
        raise DatabaseError(...)   # line 5844: propagates with dirty session

The raised DatabaseError is not caught by any of the except clauses (they catch CorrectionAttemptNotFoundError, InvalidCorrectionStateTransitionError, IntegrityError, OperationalError, and SQLAlchemyDatabaseError -- but not the base application DatabaseError).

Impact: The caller receives the exception but the session retains the partial state/completed_at mutations. If the caller doesn't explicitly rollback, subsequent operations on that session may see or persist inconsistent data.

Suggested fix: Either (a) validate all inputs before mutating the row, or (b) add session.rollback() before raising the validation DatabaseError.


M2 -- Validation error incorrectly triggers @database_retry (Bug)

File: repositories.py:5844 + retry_patterns.py:128-131

The DatabaseError raised for empty/whitespace new_decision_id is an instance of cleveragents.core.exceptions.DatabaseError. The @database_retry decorator retries on DatabaseError (line 131 of retry_patterns.py). This means the non-transient validation error will be retried 3 times with 0.5s delays, wasting ~1.5 seconds before ultimately re-raising.

Note: This retry-on-business-logic-error pattern is pre-existing across the codebase (all *NotFoundError, Duplicate*Error classes inherit from DatabaseError). However, using the base DatabaseError class directly for a validation check (rather than a specific subclass) makes this instance particularly opaque.

Suggested fix: Introduce a dedicated exception (e.g. InvalidCorrectionFieldError(DatabaseError)) or validate before entering the retry-decorated code path.


LOW

L1 -- Inconsistent error type for new_decision_id validation (Code Quality)

File: repositories.py:5844

Empty/whitespace new_decision_id raises DatabaseError, but this is fundamentally a validation error, not a database error. Compare with InvalidCorrectionStateTransitionError which has a dedicated class for state validation. Using a specific exception subclass would improve error diagnostics and allow callers to handle validation vs. DB failures differently.


L2 -- No input validation for archived_artifacts_path in update_state() (Code Quality / Consistency)

File: repositories.py:5848-5849

new_decision_id gets strip-and-validate treatment (rejecting empty/whitespace), but archived_artifacts_path is accepted as-is without any validation. An empty string "" would be stored. For consistency with the new_decision_id validation pattern and CONTRIBUTING.md argument validation guidelines, archived_artifacts_path could benefit from the same guard.


L3 -- Missing BDD test for self-transition rejection (Test Coverage)

File: correction_attempt_persistence.feature

The test suite covers pending -> complete (rejected), complete -> executing (rejected), failed -> pending (rejected), etc. However, self-transitions like executing -> executing or pending -> pending are not tested. These should be rejected per the valid transitions map but there is no explicit scenario verifying this.


L4 -- list_by_plan() ordering relies on lexicographic string comparison (Robustness)

File: repositories.py:5763

list_by_plan() orders by the created_at string column. The YYYY-MM-DDTHH:MM:SS.mmm format ensures lexicographic = chronological ordering, but this is inherently fragile. If a timestamp is ever written in a different format (e.g. with timezone offset suffix), ordering would silently break. The code has mitigations (UTC normalization, format truncation in from_domain), but the invariant is not enforced at the DB level.


INFORMATIONAL

I1 -- Specification has two inconsistent DDL definitions for correction_attempts

Spec lines: 18783-18797 vs. 45580-45593

The specification contains two separate DDL definitions for correction_attempts with different column names:

  • Line 18783: attempt_id, original_subtree_snapshot, correction_reason, status (value 'completed')
  • Line 45580: correction_attempt_id, guidance, archived_artifacts_path, state (value 'complete'), mode

The implementation correctly follows the line 45580 DDL (as referenced in issue #920). This discrepancy is in the specification, not the code, but is worth noting for spec maintenance.


I2 -- to_domain() defensive coercion silently changes invalid DB values

File: models.py:3067-3084

When invalid mode/state values are found in the DB, to_domain() defaults to REVERT/PENDING with a warning log. This follows the established LifecyclePlanModel.to_domain() pattern. The risk is that a read-modify-write cycle would silently "fix" corrupted data without explicit correction. Acceptable for defensive resilience but worth documenting.


I3 -- No pagination on list_by_plan()

File: repositories.py:5746-5770

If a plan accumulates a large number of correction attempts, list_by_plan() loads all records into memory. In practice this is unlikely (corrections are manual operations), but a limit/offset parameter would future-proof the API.


I4 -- archived_artifacts_path not validated for path traversal

File: correction.py:425-428, repositories.py:5848

The archived_artifacts_path field accepts any string without path traversal validation (e.g. ../../etc/passwd). Since this is a persistence layer (not executing filesystem operations), the actual risk depends on downstream consumers. Defense-in-depth validation could be added at the domain model level.


I5 -- CorrectionAttemptRecord does not validate correction_attempt_id format

File: correction.py:400-403

The correction_attempt_id defaults to str(ULID()) but has no validator ensuring the format when explicitly provided. The String(26) DB column could silently truncate longer strings in some configurations. Low risk since the default factory always produces valid ULIDs.


Conclusion

The implementation is well-crafted and thorough, with 38 BDD scenarios and 5 Robot integration tests providing strong coverage. The key actionable items are M1 (dirty session bug) and M2 (retry-on-validation). The low-severity and informational items are suggestions for incremental improvement.

# Code Review Report: `feat(db): add correction_attempts table per specification DDL` **PR:** #1145 | **Issue:** #920 | **Branch:** `feat/correction-attempts-table` **Reviewed:** 3 full review cycles across all categories (bug detection, security, performance, test coverage, code quality) **Scope:** All 11 changed files plus close connections to surrounding code (repositories, models, domain, migration chain, retry decorator, unit-of-work) --- ## Summary Solid implementation overall. The `correction_attempts` table, ORM model, domain model, repository CRUD, Alembic migration, and comprehensive BDD/Robot tests are well-structured and follow established codebase conventions. The spec DDL from issue #920 (lines 45580-45593) is faithfully implemented with appropriate defensive additions (CHECK constraints, timezone normalization, lifecycle state machine). One medium-severity bug was found in `update_state()` error handling. The remainder are low-severity observations and minor test coverage gaps. --- ## Findings by Severity ### MEDIUM #### M1 -- Dirty session on `new_decision_id` validation failure in `update_state()` (Bug) **File:** `repositories.py:5826-5846` In `update_state()`, the ORM row is mutated (`row.state`, possibly `row.completed_at`) **before** the `new_decision_id` emptiness validation on line 5843. If validation raises `DatabaseError`, the session is left dirty with partial state changes and **no rollback** occurs within the method: ```python row.state = state.value # line 5826: session dirty # ... row.completed_at may be set # lines 5834-5840: more dirty if new_decision_id is not None: stripped_decision_id = new_decision_id.strip() if not stripped_decision_id: raise DatabaseError(...) # line 5844: propagates with dirty session ``` The raised `DatabaseError` is not caught by any of the `except` clauses (they catch `CorrectionAttemptNotFoundError`, `InvalidCorrectionStateTransitionError`, `IntegrityError`, `OperationalError`, and `SQLAlchemyDatabaseError` -- but not the base application `DatabaseError`). **Impact:** The caller receives the exception but the session retains the partial `state`/`completed_at` mutations. If the caller doesn't explicitly rollback, subsequent operations on that session may see or persist inconsistent data. **Suggested fix:** Either (a) validate all inputs **before** mutating the row, or (b) add `session.rollback()` before raising the validation `DatabaseError`. --- #### M2 -- Validation error incorrectly triggers `@database_retry` (Bug) **File:** `repositories.py:5844` + `retry_patterns.py:128-131` The `DatabaseError` raised for empty/whitespace `new_decision_id` is an instance of `cleveragents.core.exceptions.DatabaseError`. The `@database_retry` decorator retries on `DatabaseError` (line 131 of `retry_patterns.py`). This means the **non-transient validation error** will be retried 3 times with 0.5s delays, wasting ~1.5 seconds before ultimately re-raising. **Note:** This retry-on-business-logic-error pattern is pre-existing across the codebase (all `*NotFoundError`, `Duplicate*Error` classes inherit from `DatabaseError`). However, using the **base** `DatabaseError` class directly for a validation check (rather than a specific subclass) makes this instance particularly opaque. **Suggested fix:** Introduce a dedicated exception (e.g. `InvalidCorrectionFieldError(DatabaseError)`) or validate before entering the retry-decorated code path. --- ### LOW #### L1 -- Inconsistent error type for `new_decision_id` validation (Code Quality) **File:** `repositories.py:5844` Empty/whitespace `new_decision_id` raises `DatabaseError`, but this is fundamentally a **validation** error, not a database error. Compare with `InvalidCorrectionStateTransitionError` which has a dedicated class for state validation. Using a specific exception subclass would improve error diagnostics and allow callers to handle validation vs. DB failures differently. --- #### L2 -- No input validation for `archived_artifacts_path` in `update_state()` (Code Quality / Consistency) **File:** `repositories.py:5848-5849` `new_decision_id` gets strip-and-validate treatment (rejecting empty/whitespace), but `archived_artifacts_path` is accepted as-is without any validation. An empty string `""` would be stored. For consistency with the `new_decision_id` validation pattern and CONTRIBUTING.md argument validation guidelines, `archived_artifacts_path` could benefit from the same guard. --- #### L3 -- Missing BDD test for self-transition rejection (Test Coverage) **File:** `correction_attempt_persistence.feature` The test suite covers `pending -> complete` (rejected), `complete -> executing` (rejected), `failed -> pending` (rejected), etc. However, **self-transitions** like `executing -> executing` or `pending -> pending` are not tested. These should be rejected per the valid transitions map but there is no explicit scenario verifying this. --- #### L4 -- `list_by_plan()` ordering relies on lexicographic string comparison (Robustness) **File:** `repositories.py:5763` `list_by_plan()` orders by the `created_at` string column. The `YYYY-MM-DDTHH:MM:SS.mmm` format ensures lexicographic = chronological ordering, but this is inherently fragile. If a timestamp is ever written in a different format (e.g. with timezone offset suffix), ordering would silently break. The code has mitigations (UTC normalization, format truncation in `from_domain`), but the invariant is not enforced at the DB level. --- ### INFORMATIONAL #### I1 -- Specification has two inconsistent DDL definitions for `correction_attempts` **Spec lines:** 18783-18797 vs. 45580-45593 The specification contains two separate DDL definitions for `correction_attempts` with different column names: - Line 18783: `attempt_id`, `original_subtree_snapshot`, `correction_reason`, `status` (value `'completed'`) - Line 45580: `correction_attempt_id`, `guidance`, `archived_artifacts_path`, `state` (value `'complete'`), `mode` The implementation correctly follows the line 45580 DDL (as referenced in issue #920). This discrepancy is in the specification, not the code, but is worth noting for spec maintenance. --- #### I2 -- `to_domain()` defensive coercion silently changes invalid DB values **File:** `models.py:3067-3084` When invalid `mode`/`state` values are found in the DB, `to_domain()` defaults to `REVERT`/`PENDING` with a warning log. This follows the established `LifecyclePlanModel.to_domain()` pattern. The risk is that a read-modify-write cycle would silently "fix" corrupted data without explicit correction. Acceptable for defensive resilience but worth documenting. --- #### I3 -- No pagination on `list_by_plan()` **File:** `repositories.py:5746-5770` If a plan accumulates a large number of correction attempts, `list_by_plan()` loads all records into memory. In practice this is unlikely (corrections are manual operations), but a `limit`/`offset` parameter would future-proof the API. --- #### I4 -- `archived_artifacts_path` not validated for path traversal **File:** `correction.py:425-428`, `repositories.py:5848` The `archived_artifacts_path` field accepts any string without path traversal validation (e.g. `../../etc/passwd`). Since this is a persistence layer (not executing filesystem operations), the actual risk depends on downstream consumers. Defense-in-depth validation could be added at the domain model level. --- #### I5 -- `CorrectionAttemptRecord` does not validate `correction_attempt_id` format **File:** `correction.py:400-403` The `correction_attempt_id` defaults to `str(ULID())` but has no validator ensuring the format when explicitly provided. The `String(26)` DB column could silently truncate longer strings in some configurations. Low risk since the default factory always produces valid ULIDs. --- ## Conclusion The implementation is well-crafted and thorough, with 38 BDD scenarios and 5 Robot integration tests providing strong coverage. The key actionable items are **M1** (dirty session bug) and **M2** (retry-on-validation). The low-severity and informational items are suggestions for incremental improvement.
CoreRasurae force-pushed feat/correction-attempts-table from 421a73b0ac
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 47s
CI / quality (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m21s
CI / unit_tests (pull_request) Successful in 4m9s
CI / integration_tests (pull_request) Successful in 3m53s
CI / e2e_tests (pull_request) Successful in 6m42s
CI / docker (pull_request) Successful in 1m41s
CI / coverage (pull_request) Successful in 11m41s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 53m9s
to 543c7332c2
Some checks are pending
CI / benchmark-publish (pull_request) Waiting to run
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m43s
CI / security (pull_request) Successful in 4m4s
CI / build (pull_request) Successful in 21s
CI / typecheck (pull_request) Successful in 4m14s
CI / benchmark-regression (pull_request) Waiting to run
CI / integration_tests (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 6m29s
CI / coverage (pull_request) Successful in 12m17s
CI / status-check (pull_request) Successful in 1s
2026-03-29 09:17:40 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Scope: All code changes on feat/correction-attempts-table branch (commit 543c733) plus close connections to surrounding code.
Methodology: 3 full review cycles across all categories (bugs, security, performance, test coverage, spec compliance, code quality). No tests were executed.


Finding #1archived_artifacts_path not stripped before storage in update_state()

Attribute Value
Severity HIGH
Category Bug / Data Integrity
File repositories.py:5812–5866

The argument validation block at lines 5812–5817 computes stripped_path = archived_artifacts_path.strip() and correctly rejects empty/whitespace-only values. However, the mutation block at line 5865–5866 writes the original unstripped value to the database row:

# Line 5812-5817 — validation (correct)
if archived_artifacts_path is not None:
    stripped_path = archived_artifacts_path.strip()
    if not stripped_path:
        raise DatabaseError(...)

# Line 5865-5866 — mutation (BUG: uses original, not stripped)
if archived_artifacts_path is not None:
    row.archived_artifacts_path = archived_artifacts_path  # should be stripped_path

Compare with new_decision_id which correctly uses the stripped value:

# Line 5863-5864 — mutation (correct: uses stripped_decision_id)
if stripped_decision_id is not None:
    row.new_decision_id = stripped_decision_id

Impact: Leading/trailing whitespace in archived_artifacts_path will be persisted to the database, creating data inconsistency between the validation logic and the stored value. The stripped_path variable is computed but discarded.

Suggested fix: Change line 5865–5866 to use stripped_path:

if stripped_path is not None:
    row.archived_artifacts_path = stripped_path

(Note: the outer condition should also change from archived_artifacts_path to stripped_path to be consistent with the new_decision_id pattern.)


Finding #2InvalidCorrectionStateTransitionError inherits from DatabaseError

Attribute Value
Severity MEDIUM
Category Design / Architecture
File repositories.py:5647

InvalidCorrectionStateTransitionError represents a business rule violation (invalid state machine transition per the spec lifecycle pending -> executing -> complete|failed), not a database error. It inherits from DatabaseError:

class InvalidCorrectionStateTransitionError(DatabaseError):

The analogous domain-level exception InvalidJobTransitionError (in domain/models/core/async_job.py) does not inherit from DatabaseError. This inconsistency means callers catching DatabaseError broadly (e.g., for retry logic) would inadvertently catch business logic violations, preventing proper differentiation between "retry-worthy" database errors and "non-retryable" validation failures.

Suggested fix: Consider making InvalidCorrectionStateTransitionError inherit from a domain exception base class rather than DatabaseError, or introduce a RepositoryValidationError intermediate class to distinguish business rule violations from actual database failures.


Finding #3 — Spec has two conflicting correction_attempts DDL definitions

Attribute Value
Severity INFO
Category Specification Compliance
Location docs/specification.md:18783 vs docs/specification.md:45580

The specification contains two different DDL definitions for correction_attempts:

Aspect Line 18783 ("Correction history") Line 45580 (Main DDL) Implementation
PK name attempt_id correction_attempt_id correction_attempt_id
State value 'completed' 'complete' 'complete'
Additional columns original_subtree_snapshot, correction_reason mode, guidance, archived_artifacts_path mode, guidance, archived_artifacts_path
Status column name status state state

The implementation correctly follows the main DDL (line 45580) as referenced by issue #920. The earlier DDL at line 18783 appears to be a superseded version. No action required — noted for awareness only. The spec discrepancy itself may warrant a documentation fix outside this PR.


Finding #4plan_id FK uses CASCADE vs spec default

Attribute Value
Severity INFO
Category Specification Compliance
File models.py:2986, m8_001_correction_attempts_table.py:34

The spec DDL at line 45582 specifies REFERENCES plans(plan_id) without an ON DELETE clause, which defaults to SQL NO ACTION (equivalent to RESTRICT). The implementation uses ondelete="CASCADE". While CASCADE is semantically appropriate (orphaned correction attempts after plan deletion are meaningless), it is a literal deviation from the spec DDL. The commit message documents the intentional choice for original_decision_id (changed to RESTRICT) but does not document the CASCADE choice for plan_id.


Finding #5 — Missing test for whitespace-padded archived_artifacts_path

Attribute Value
Severity LOW
Category Test Coverage
File correction_attempt_persistence.feature

There are BDD scenarios for empty and whitespace-only archived_artifacts_path being rejected (lines 303–316), but no scenario testing a whitespace-padded valid path (e.g., " /path/to/file "). This gap is directly related to Finding #1 — a test with a whitespace-padded path would expose the stripping inconsistency.


Finding #6 — Robot helper covers only happy paths

Attribute Value
Severity LOW
Category Test Coverage
File robot/helper_correction_attempt_persistence.py

The Robot integration tests (5 test cases) cover only happy-path CRUD operations. Error scenarios (invalid state transitions, FK violations, not-found cases, duplicate detection) are covered exclusively by BDD tests (42 scenarios). While BDD coverage is comprehensive, having zero error coverage in Robot integration tests is a gap compared to other repository test suites in the codebase.


Finding #7 — No pagination for list_by_plan()

Attribute Value
Severity LOW
Category Performance
File repositories.py:5746

list_by_plan() loads all correction attempts for a plan into memory with no limit/offset support. While consistent with the codebase convention (e.g., CheckpointRepository.list_by_plan()) and unlikely to be a practical issue (plans rarely have hundreds of correction attempts), it is worth noting for future scalability.


Finding #8 — Magic number [:23] for timestamp truncation

Attribute Value
Severity LOW
Category Code Quality
Files models.py:3121,3135 and repositories.py:5862

The slice [:23] appears in 3 places to truncate Python's 6-digit microsecond strftime output to 3-digit millisecond precision matching SQLite's strftime('%f') format. While each occurrence has a comment explaining the purpose, a named constant would improve maintainability:

_SQLITE_TIMESTAMP_LEN = 23  # "YYYY-MM-DDTHH:MM:SS.mmm"

Finding #9update_state() method length

Attribute Value
Severity LOW
Category Code Quality
File repositories.py:5775–5892

The update_state() method is approximately 80 lines, handling argument validation, DB row lookup, state transition validation, completed_at guard, multiple optional field mutations, and three exception handler blocks. While still readable, the argument validation block (lines 5803–5817) could be extracted into a private _validate_update_args() helper to improve separation of concerns and testability.


Finding #10 — Defensive to_domain() defaults could mask data corruption

Attribute Value
Severity LOW
Category Security / Data Integrity
File models.py:3069–3084

When unknown mode or state values are found in the database, to_domain() logs a warning and defaults to REVERT/PENDING respectively. While consistent with the codebase convention (LifecyclePlanModel.to_domain()), defaulting to PENDING specifically could make a corrupted record appear actionable — it would pass the update_state() state transition validation and could be "progressed" through the lifecycle as if it were a legitimate pending attempt.


Summary

# Severity Category Description
1 HIGH Bug archived_artifacts_path not stripped before DB write in update_state()
2 MEDIUM Design InvalidCorrectionStateTransitionError inherits from DatabaseError
3 INFO Spec Two conflicting DDL definitions in spec (implementation follows correct one)
4 INFO Spec plan_id FK uses CASCADE vs spec implicit NO ACTION
5 LOW Test Missing whitespace-padded archived_artifacts_path test
6 LOW Test Robot helper covers only happy paths
7 Low Performance No pagination for list_by_plan()
8 Low Code Quality Magic number [:23] for timestamp truncation
9 Low Code Quality update_state() method length
10 Low Security Defensive defaults in to_domain() could mask corruption

Overall the implementation is thorough, well-documented, and demonstrates strong adherence to the specification and codebase conventions. The BDD test suite is comprehensive (42 scenarios) covering lifecycle transitions, edge cases, and error paths. Finding #1 is the only actionable bug that should be fixed before merge.

## Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Scope:** All code changes on `feat/correction-attempts-table` branch (commit `543c733`) plus close connections to surrounding code. **Methodology:** 3 full review cycles across all categories (bugs, security, performance, test coverage, spec compliance, code quality). No tests were executed. --- ### Finding #1 — `archived_artifacts_path` not stripped before storage in `update_state()` | Attribute | Value | |-----------|-------| | **Severity** | HIGH | | **Category** | Bug / Data Integrity | | **File** | `repositories.py:5812–5866` | The argument validation block at lines 5812–5817 computes `stripped_path = archived_artifacts_path.strip()` and correctly rejects empty/whitespace-only values. However, the mutation block at line 5865–5866 writes the **original unstripped** value to the database row: ```python # Line 5812-5817 — validation (correct) if archived_artifacts_path is not None: stripped_path = archived_artifacts_path.strip() if not stripped_path: raise DatabaseError(...) # Line 5865-5866 — mutation (BUG: uses original, not stripped) if archived_artifacts_path is not None: row.archived_artifacts_path = archived_artifacts_path # should be stripped_path ``` Compare with `new_decision_id` which **correctly** uses the stripped value: ```python # Line 5863-5864 — mutation (correct: uses stripped_decision_id) if stripped_decision_id is not None: row.new_decision_id = stripped_decision_id ``` **Impact:** Leading/trailing whitespace in `archived_artifacts_path` will be persisted to the database, creating data inconsistency between the validation logic and the stored value. The `stripped_path` variable is computed but discarded. **Suggested fix:** Change line 5865–5866 to use `stripped_path`: ```python if stripped_path is not None: row.archived_artifacts_path = stripped_path ``` (Note: the outer condition should also change from `archived_artifacts_path` to `stripped_path` to be consistent with the `new_decision_id` pattern.) --- ### Finding #2 — `InvalidCorrectionStateTransitionError` inherits from `DatabaseError` | Attribute | Value | |-----------|-------| | **Severity** | MEDIUM | | **Category** | Design / Architecture | | **File** | `repositories.py:5647` | `InvalidCorrectionStateTransitionError` represents a **business rule violation** (invalid state machine transition per the spec lifecycle `pending -> executing -> complete|failed`), not a database error. It inherits from `DatabaseError`: ```python class InvalidCorrectionStateTransitionError(DatabaseError): ``` The analogous domain-level exception `InvalidJobTransitionError` (in `domain/models/core/async_job.py`) does **not** inherit from `DatabaseError`. This inconsistency means callers catching `DatabaseError` broadly (e.g., for retry logic) would inadvertently catch business logic violations, preventing proper differentiation between "retry-worthy" database errors and "non-retryable" validation failures. **Suggested fix:** Consider making `InvalidCorrectionStateTransitionError` inherit from a domain exception base class rather than `DatabaseError`, or introduce a `RepositoryValidationError` intermediate class to distinguish business rule violations from actual database failures. --- ### Finding #3 — Spec has two conflicting `correction_attempts` DDL definitions | Attribute | Value | |-----------|-------| | **Severity** | INFO | | **Category** | Specification Compliance | | **Location** | `docs/specification.md:18783` vs `docs/specification.md:45580` | The specification contains two different DDL definitions for `correction_attempts`: | Aspect | Line 18783 ("Correction history") | Line 45580 (Main DDL) | Implementation | |--------|-------|---------|------| | PK name | `attempt_id` | `correction_attempt_id` | `correction_attempt_id` | | State value | `'completed'` | `'complete'` | `'complete'` | | Additional columns | `original_subtree_snapshot`, `correction_reason` | `mode`, `guidance`, `archived_artifacts_path` | `mode`, `guidance`, `archived_artifacts_path` | | Status column name | `status` | `state` | `state` | The implementation correctly follows the main DDL (line 45580) as referenced by issue #920. The earlier DDL at line 18783 appears to be a superseded version. **No action required** — noted for awareness only. The spec discrepancy itself may warrant a documentation fix outside this PR. --- ### Finding #4 — `plan_id` FK uses `CASCADE` vs spec default | Attribute | Value | |-----------|-------| | **Severity** | INFO | | **Category** | Specification Compliance | | **File** | `models.py:2986`, `m8_001_correction_attempts_table.py:34` | The spec DDL at line 45582 specifies `REFERENCES plans(plan_id)` without an `ON DELETE` clause, which defaults to SQL `NO ACTION` (equivalent to `RESTRICT`). The implementation uses `ondelete="CASCADE"`. While `CASCADE` is semantically appropriate (orphaned correction attempts after plan deletion are meaningless), it is a literal deviation from the spec DDL. The commit message documents the intentional choice for `original_decision_id` (changed to `RESTRICT`) but does not document the `CASCADE` choice for `plan_id`. --- ### Finding #5 — Missing test for whitespace-padded `archived_artifacts_path` | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Test Coverage | | **File** | `correction_attempt_persistence.feature` | There are BDD scenarios for **empty** and **whitespace-only** `archived_artifacts_path` being rejected (lines 303–316), but no scenario testing a **whitespace-padded** valid path (e.g., `" /path/to/file "`). This gap is directly related to Finding #1 — a test with a whitespace-padded path would expose the stripping inconsistency. --- ### Finding #6 — Robot helper covers only happy paths | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Test Coverage | | **File** | `robot/helper_correction_attempt_persistence.py` | The Robot integration tests (5 test cases) cover only happy-path CRUD operations. Error scenarios (invalid state transitions, FK violations, not-found cases, duplicate detection) are covered exclusively by BDD tests (42 scenarios). While BDD coverage is comprehensive, having zero error coverage in Robot integration tests is a gap compared to other repository test suites in the codebase. --- ### Finding #7 — No pagination for `list_by_plan()` | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Performance | | **File** | `repositories.py:5746` | `list_by_plan()` loads all correction attempts for a plan into memory with no `limit`/`offset` support. While consistent with the codebase convention (e.g., `CheckpointRepository.list_by_plan()`) and unlikely to be a practical issue (plans rarely have hundreds of correction attempts), it is worth noting for future scalability. --- ### Finding #8 — Magic number `[:23]` for timestamp truncation | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Code Quality | | **Files** | `models.py:3121,3135` and `repositories.py:5862` | The slice `[:23]` appears in 3 places to truncate Python's 6-digit microsecond `strftime` output to 3-digit millisecond precision matching SQLite's `strftime('%f')` format. While each occurrence has a comment explaining the purpose, a named constant would improve maintainability: ```python _SQLITE_TIMESTAMP_LEN = 23 # "YYYY-MM-DDTHH:MM:SS.mmm" ``` --- ### Finding #9 — `update_state()` method length | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Code Quality | | **File** | `repositories.py:5775–5892` | The `update_state()` method is approximately 80 lines, handling argument validation, DB row lookup, state transition validation, `completed_at` guard, multiple optional field mutations, and three exception handler blocks. While still readable, the argument validation block (lines 5803–5817) could be extracted into a private `_validate_update_args()` helper to improve separation of concerns and testability. --- ### Finding #10 — Defensive `to_domain()` defaults could mask data corruption | Attribute | Value | |-----------|-------| | **Severity** | LOW | | **Category** | Security / Data Integrity | | **File** | `models.py:3069–3084` | When unknown `mode` or `state` values are found in the database, `to_domain()` logs a warning and defaults to `REVERT`/`PENDING` respectively. While consistent with the codebase convention (`LifecyclePlanModel.to_domain()`), defaulting to `PENDING` specifically could make a corrupted record appear actionable — it would pass the `update_state()` state transition validation and could be "progressed" through the lifecycle as if it were a legitimate pending attempt. --- ### Summary | # | Severity | Category | Description | |---|----------|----------|-------------| | 1 | **HIGH** | Bug | `archived_artifacts_path` not stripped before DB write in `update_state()` | | 2 | **MEDIUM** | Design | `InvalidCorrectionStateTransitionError` inherits from `DatabaseError` | | 3 | INFO | Spec | Two conflicting DDL definitions in spec (implementation follows correct one) | | 4 | INFO | Spec | `plan_id` FK uses `CASCADE` vs spec implicit `NO ACTION` | | 5 | LOW | Test | Missing whitespace-padded `archived_artifacts_path` test | | 6 | LOW | Test | Robot helper covers only happy paths | | 7 | Low | Performance | No pagination for `list_by_plan()` | | 8 | Low | Code Quality | Magic number `[:23]` for timestamp truncation | | 9 | Low | Code Quality | `update_state()` method length | | 10 | Low | Security | Defensive defaults in `to_domain()` could mask corruption | Overall the implementation is thorough, well-documented, and demonstrates strong adherence to the specification and codebase conventions. The BDD test suite is comprehensive (42 scenarios) covering lifecycle transitions, edge cases, and error paths. Finding #1 is the only actionable bug that should be fixed before merge.
@ -5618,0 +5644,4 @@
"""Raised when a correction attempt ID already exists."""
class InvalidCorrectionStateTransitionError(DatabaseError):
Author
Member

DESIGN (Finding #2): This exception represents a business rule violation (invalid state machine transition), not a database error. The analogous InvalidJobTransitionError in the domain layer does not inherit from DatabaseError. This makes it difficult for callers to distinguish retryable DB errors from non-retryable validation failures when catching DatabaseError broadly.

**DESIGN (Finding #2):** This exception represents a business rule violation (invalid state machine transition), not a database error. The analogous `InvalidJobTransitionError` in the domain layer does not inherit from `DatabaseError`. This makes it difficult for callers to distinguish retryable DB errors from non-retryable validation failures when catching `DatabaseError` broadly.
@ -5618,0 +5863,4 @@
if stripped_decision_id is not None:
row.new_decision_id = stripped_decision_id # type: ignore[assignment]
if archived_artifacts_path is not None:
row.archived_artifacts_path = archived_artifacts_path # type: ignore[assignment]
Author
Member

BUG (Finding #1): archived_artifacts_path is written unstripped here. The stripped_path computed at line 5813 is validated but discarded. This line should use stripped_path instead of the original archived_artifacts_path — matching the pattern used for new_decision_id at line 5863-5864 which correctly uses stripped_decision_id.

Suggested fix:

if stripped_path is not None:
    row.archived_artifacts_path = stripped_path
**BUG (Finding #1):** `archived_artifacts_path` is written unstripped here. The `stripped_path` computed at line 5813 is validated but discarded. This line should use `stripped_path` instead of the original `archived_artifacts_path` — matching the pattern used for `new_decision_id` at line 5863-5864 which correctly uses `stripped_decision_id`. Suggested fix: ```python if stripped_path is not None: row.archived_artifacts_path = stripped_path ```
CoreRasurae force-pushed feat/correction-attempts-table from 543c7332c2
Some checks are pending
CI / benchmark-publish (pull_request) Waiting to run
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m43s
CI / security (pull_request) Successful in 4m4s
CI / build (pull_request) Successful in 21s
CI / typecheck (pull_request) Successful in 4m14s
CI / benchmark-regression (pull_request) Waiting to run
CI / integration_tests (pull_request) Successful in 3m46s
CI / unit_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 6m29s
CI / coverage (pull_request) Successful in 12m17s
CI / status-check (pull_request) Successful in 1s
to ad68e198c5
Some checks failed
CI / lint (pull_request) Successful in 24s
CI / quality (pull_request) Successful in 56s
CI / security (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 22s
CI / typecheck (pull_request) Successful in 4m0s
CI / integration_tests (pull_request) Successful in 3m41s
CI / unit_tests (pull_request) Successful in 7m22s
CI / docker (pull_request) Successful in 1m1s
CI / e2e_tests (pull_request) Successful in 9m20s
CI / coverage (pull_request) Successful in 12m25s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 10:03:30 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 feat(db): add correction_attempts table per specification DDL

Reviewer: Automated code review (OpenCode)
Scope: Code changes in branch feat/correction-attempts-table plus close surrounding code
Spec reference: docs/specification.md lines 45580–45593 (authoritative DDL), issue #920
Cycles performed: 3 full global passes across all categories (bugs, spec compliance, security, performance, test coverage/flaws, design)


Summary

The implementation is thorough and well-structured across 11 changed files. The CorrectionAttemptModel, CorrectionAttemptRecord, CorrectionAttemptRepository, Alembic migration, and test suites (45 BDD scenarios + 5 Robot tests) demonstrate strong attention to spec alignment and defensive coding. No Critical or High severity issues were found. No security vulnerabilities were identified. Performance is appropriate (index exists for the only list-query pattern). The findings below are 3 Medium and 7 Low severity items.


Findings by Category and Severity

1. BUGS

1.1 [Medium] InvalidCorrectionStateTransitionError inherits from DatabaseError instead of BusinessRuleViolation

File: src/cleveragents/infrastructure/database/repositories.py:5648

class InvalidCorrectionStateTransitionError(DatabaseError):

State transition validation is a business rule, not a database error. The codebase already has precedent for this distinction: ActionInUseError(BusinessRuleViolation) in the same file. Having this exception inherit from DatabaseError means callers catching DatabaseError (for retry or generic error handling) will inadvertently catch business rule violations, potentially masking the specific error or triggering incorrect retry behavior.

Recommendation: Change to InvalidCorrectionStateTransitionError(BusinessRuleViolation).


1.2 [Low] Defensive to_domain() coercion defaults corrupted state to PENDING

File: src/cleveragents/infrastructure/database/models.py:3081–3089

except ValueError:
    _logger.warning(
        "Unknown correction state '%s' for attempt %s; defaulting to 'pending'",
        raw_state, self.correction_attempt_id,
    )
    state_enum = CorrectionAttemptState.PENDING

If a DB record has a corrupted state value (e.g., was previously complete or failed but the value is mangled), loading it as PENDING could allow the correction to be re-transitioned through EXECUTINGCOMPLETE, effectively re-executing a finished correction. A terminal state like FAILED would be a safer default since it prevents further transitions.

Recommendation: Default corrupted state to CorrectionAttemptState.FAILED instead of PENDING.


1.3 [Low] Input validation errors in update_state() raised as DatabaseError

File: src/cleveragents/infrastructure/database/repositories.py:5810–5818

if not stripped_decision_id:
    raise DatabaseError("new_decision_id must not be empty or whitespace-only")
...
if not stripped_artifacts_path:
    raise DatabaseError("archived_artifacts_path must not be empty or whitespace-only")

These are input validation errors, not database errors. Using DatabaseError conflates the exception semantics and could trigger retry logic (via @database_retry) that doesn't make sense for validation failures.

Recommendation: Use ValueError or BusinessRuleViolation for input validation failures.


2. SPEC COMPLIANCE

2.1 [Medium] new_decision_id FK uses SET NULL instead of spec-implied RESTRICT

Files: src/cleveragents/infrastructure/database/models.py:3002–3005, alembic/versions/m8_001_correction_attempts_table.py:43–47

The spec DDL (line 45584) defines:

new_decision_id TEXT REFERENCES decisions(decision_id)

No ON DELETE clause means the SQL default: RESTRICT / NO ACTION.

The implementation uses ondelete="SET NULL", which silently nullifies the reference when the referenced decision is deleted. This is inconsistent with the approach taken for original_decision_id, which was explicitly changed to RESTRICT in review round 9 to match the spec DDL default. The same reasoning should apply to new_decision_id.

Recommendation: Change new_decision_id FK to ondelete="RESTRICT" (or remove the ondelete clause) for consistency with the spec and with original_decision_id. Update the Alembic migration accordingly.


2.2 [Low] plan_id FK uses CASCADE, spec implies RESTRICT

Files: src/cleveragents/infrastructure/database/models.py:2985–2989, alembic/versions/m8_001_correction_attempts_table.py:31–35

The spec DDL REFERENCES plans(plan_id) with no ON DELETE implies RESTRICT. The implementation uses CASCADE. While this is consistent with other v3_plans child tables in the codebase (PlanProjectModel, PlanArgumentModel, etc.) and is a pragmatic design choice, it is a deviation from the literal spec DDL.

Recommendation: Document this deviation explicitly in a code comment explaining the codebase convention, or align with the spec if correction audit trail preservation is desired.


3. TEST COVERAGE GAPS

3.1 [Medium] No test for original_decision_id RESTRICT FK behavior

File: features/correction_attempt_persistence.feature

The original_decision_id FK was explicitly changed from CASCADE to RESTRICT in review round 9 to preserve the correction audit trail when decisions are cleaned up. However, there is no BDD or Robot test verifying that deleting a decision referenced by original_decision_id is actually blocked. This is the most important FK behavioral change in the PR and it has no corresponding test.

Recommendation: Add a BDD scenario like:

Scenario: Deleting a decision referenced by original_decision_id is blocked
  Given a persisted correction attempt in pending state
  When I try to delete the original decision
  Then a database integrity error should be raised
  And the correction attempt should still exist

3.2 [Low] Weak cross-plan isolation test

File: features/correction_attempt_persistence.feature:225–228

Scenario: list_by_plan returns only attempts for the specified plan
    Given a persisted correction attempt in pending state
    When I list correction attempts for a plan with no correction attempts
    Then I should get 0 correction attempts

This scenario queries a non-existent plan ID (01HV000000000000000NOATTEMPT) which trivially returns empty. It doesn't prove actual cross-plan isolation. A stronger test would create attempts for two different plans and verify each list_by_plan call returns only its own.

Recommendation: Create a second prerequisite plan, add attempts to both plans, and verify list_by_plan returns only the targeted plan's attempts.


4. TEST FLAWS

4.1 [Low] Hardcoded assertion value in step_check_archived_path

File: features/steps/correction_attempt_persistence_steps.py:412–417

@then("the correction attempt archived_artifacts_path should be set")
def step_check_archived_path(context: Context) -> None:
    refreshed = context._ca_repo.get(context._ca_persisted.correction_attempt_id)
    assert refreshed.archived_artifacts_path == "/tmp/archived/correction"

The expected value "/tmp/archived/correction" is hardcoded rather than stored in a context variable. If the setup step changes the path, this assertion will fail for the wrong reason.

Recommendation: Store the path in a context variable in the When step and compare against it in the Then step.


5. DESIGN

5.1 [Low] Timestamp formatting logic duplicated across module boundaries

Files: models.py:3126–3128 and repositories.py:5860–5864

The millisecond-precision timestamp formatting (strftime(...)[:_SQLITE_TIMESTAMP_MS_LEN]) appears in both CorrectionAttemptModel.from_domain() and CorrectionAttemptRepository.update_state(). This duplicates the format logic and creates a maintenance risk if the format ever changes.

Recommendation: Extract a shared helper (e.g., _format_sqlite_timestamp(dt: datetime) -> str on the model class or in a shared utilities module).


5.2 [Low] Private constant _SQLITE_TIMESTAMP_MS_LEN imported across modules

Files: models.py:2962, repositories.py:93

The underscore prefix convention marks _SQLITE_TIMESTAMP_MS_LEN as module-internal, but it is imported by repositories.py. This creates coupling to an implementation detail.

Recommendation: Either remove the underscore prefix (making it a public constant) or move it to a shared constants module.


Areas with No Issues Found

Category Assessment
Security No SQL injection (parameterized ORM queries), no path traversal at this layer, proper input validation on all string fields
Performance idx_corrections_plan index covers the list_by_plan() query path; no N+1 queries; consistent timestamp format ensures correct string-based ordering
Alembic Migration Migration matches model definition; CHECK constraints match enum values; downgrade correctly reverses upgrade
Domain Model Field validators comprehensive (whitespace, empty, max_length, timezone normalization); state transition logic correctly extracted to domain layer
CRUD Operations Create, get, list_by_plan, update_state, delete all follow existing repository patterns; retry decorator applied; proper session rollback on errors

Severity Summary

Severity Count IDs
Critical 0
High 0
Medium 3 1.1, 2.1, 3.1
Low 7 1.2, 1.3, 2.2, 3.2, 4.1, 5.1, 5.2
Total 10
# Code Review Report — PR #1145 `feat(db): add correction_attempts table per specification DDL` **Reviewer**: Automated code review (OpenCode) **Scope**: Code changes in branch `feat/correction-attempts-table` plus close surrounding code **Spec reference**: `docs/specification.md` lines 45580–45593 (authoritative DDL), issue #920 **Cycles performed**: 3 full global passes across all categories (bugs, spec compliance, security, performance, test coverage/flaws, design) --- ## Summary The implementation is thorough and well-structured across 11 changed files. The `CorrectionAttemptModel`, `CorrectionAttemptRecord`, `CorrectionAttemptRepository`, Alembic migration, and test suites (45 BDD scenarios + 5 Robot tests) demonstrate strong attention to spec alignment and defensive coding. No **Critical** or **High** severity issues were found. **No security vulnerabilities were identified.** Performance is appropriate (index exists for the only list-query pattern). The findings below are 3 **Medium** and 7 **Low** severity items. --- ## Findings by Category and Severity ### 1. BUGS #### 1.1 [Medium] `InvalidCorrectionStateTransitionError` inherits from `DatabaseError` instead of `BusinessRuleViolation` **File**: `src/cleveragents/infrastructure/database/repositories.py:5648` ```python class InvalidCorrectionStateTransitionError(DatabaseError): ``` State transition validation is a **business rule**, not a database error. The codebase already has precedent for this distinction: `ActionInUseError(BusinessRuleViolation)` in the same file. Having this exception inherit from `DatabaseError` means callers catching `DatabaseError` (for retry or generic error handling) will inadvertently catch business rule violations, potentially masking the specific error or triggering incorrect retry behavior. **Recommendation**: Change to `InvalidCorrectionStateTransitionError(BusinessRuleViolation)`. --- #### 1.2 [Low] Defensive `to_domain()` coercion defaults corrupted state to `PENDING` **File**: `src/cleveragents/infrastructure/database/models.py:3081–3089` ```python except ValueError: _logger.warning( "Unknown correction state '%s' for attempt %s; defaulting to 'pending'", raw_state, self.correction_attempt_id, ) state_enum = CorrectionAttemptState.PENDING ``` If a DB record has a corrupted state value (e.g., was previously `complete` or `failed` but the value is mangled), loading it as `PENDING` could allow the correction to be re-transitioned through `EXECUTING` → `COMPLETE`, effectively re-executing a finished correction. A terminal state like `FAILED` would be a safer default since it prevents further transitions. **Recommendation**: Default corrupted state to `CorrectionAttemptState.FAILED` instead of `PENDING`. --- #### 1.3 [Low] Input validation errors in `update_state()` raised as `DatabaseError` **File**: `src/cleveragents/infrastructure/database/repositories.py:5810–5818` ```python if not stripped_decision_id: raise DatabaseError("new_decision_id must not be empty or whitespace-only") ... if not stripped_artifacts_path: raise DatabaseError("archived_artifacts_path must not be empty or whitespace-only") ``` These are input validation errors, not database errors. Using `DatabaseError` conflates the exception semantics and could trigger retry logic (via `@database_retry`) that doesn't make sense for validation failures. **Recommendation**: Use `ValueError` or `BusinessRuleViolation` for input validation failures. --- ### 2. SPEC COMPLIANCE #### 2.1 [Medium] `new_decision_id` FK uses `SET NULL` instead of spec-implied RESTRICT **Files**: `src/cleveragents/infrastructure/database/models.py:3002–3005`, `alembic/versions/m8_001_correction_attempts_table.py:43–47` The spec DDL (line 45584) defines: ```sql new_decision_id TEXT REFERENCES decisions(decision_id) ``` No `ON DELETE` clause means the SQL default: `RESTRICT` / `NO ACTION`. The implementation uses `ondelete="SET NULL"`, which silently nullifies the reference when the referenced decision is deleted. This is **inconsistent** with the approach taken for `original_decision_id`, which was explicitly changed to `RESTRICT` in review round 9 to match the spec DDL default. The same reasoning should apply to `new_decision_id`. **Recommendation**: Change `new_decision_id` FK to `ondelete="RESTRICT"` (or remove the `ondelete` clause) for consistency with the spec and with `original_decision_id`. Update the Alembic migration accordingly. --- #### 2.2 [Low] `plan_id` FK uses `CASCADE`, spec implies RESTRICT **Files**: `src/cleveragents/infrastructure/database/models.py:2985–2989`, `alembic/versions/m8_001_correction_attempts_table.py:31–35` The spec DDL `REFERENCES plans(plan_id)` with no `ON DELETE` implies RESTRICT. The implementation uses `CASCADE`. While this is consistent with other `v3_plans` child tables in the codebase (PlanProjectModel, PlanArgumentModel, etc.) and is a pragmatic design choice, it is a **deviation** from the literal spec DDL. **Recommendation**: Document this deviation explicitly in a code comment explaining the codebase convention, or align with the spec if correction audit trail preservation is desired. --- ### 3. TEST COVERAGE GAPS #### 3.1 [Medium] No test for `original_decision_id` RESTRICT FK behavior **File**: `features/correction_attempt_persistence.feature` The `original_decision_id` FK was explicitly changed from `CASCADE` to `RESTRICT` in review round 9 to preserve the correction audit trail when decisions are cleaned up. However, there is **no BDD or Robot test** verifying that deleting a decision referenced by `original_decision_id` is actually blocked. This is the most important FK behavioral change in the PR and it has no corresponding test. **Recommendation**: Add a BDD scenario like: ```gherkin Scenario: Deleting a decision referenced by original_decision_id is blocked Given a persisted correction attempt in pending state When I try to delete the original decision Then a database integrity error should be raised And the correction attempt should still exist ``` --- #### 3.2 [Low] Weak cross-plan isolation test **File**: `features/correction_attempt_persistence.feature:225–228` ```gherkin Scenario: list_by_plan returns only attempts for the specified plan Given a persisted correction attempt in pending state When I list correction attempts for a plan with no correction attempts Then I should get 0 correction attempts ``` This scenario queries a non-existent plan ID (`01HV000000000000000NOATTEMPT`) which trivially returns empty. It doesn't prove actual cross-plan isolation. A stronger test would create attempts for **two different plans** and verify each `list_by_plan` call returns only its own. **Recommendation**: Create a second prerequisite plan, add attempts to both plans, and verify `list_by_plan` returns only the targeted plan's attempts. --- ### 4. TEST FLAWS #### 4.1 [Low] Hardcoded assertion value in `step_check_archived_path` **File**: `features/steps/correction_attempt_persistence_steps.py:412–417` ```python @then("the correction attempt archived_artifacts_path should be set") def step_check_archived_path(context: Context) -> None: refreshed = context._ca_repo.get(context._ca_persisted.correction_attempt_id) assert refreshed.archived_artifacts_path == "/tmp/archived/correction" ``` The expected value `"/tmp/archived/correction"` is hardcoded rather than stored in a context variable. If the setup step changes the path, this assertion will fail for the wrong reason. **Recommendation**: Store the path in a context variable in the `When` step and compare against it in the `Then` step. --- ### 5. DESIGN #### 5.1 [Low] Timestamp formatting logic duplicated across module boundaries **Files**: `models.py:3126–3128` and `repositories.py:5860–5864` The millisecond-precision timestamp formatting (`strftime(...)[:_SQLITE_TIMESTAMP_MS_LEN]`) appears in both `CorrectionAttemptModel.from_domain()` and `CorrectionAttemptRepository.update_state()`. This duplicates the format logic and creates a maintenance risk if the format ever changes. **Recommendation**: Extract a shared helper (e.g., `_format_sqlite_timestamp(dt: datetime) -> str` on the model class or in a shared utilities module). --- #### 5.2 [Low] Private constant `_SQLITE_TIMESTAMP_MS_LEN` imported across modules **Files**: `models.py:2962`, `repositories.py:93` The underscore prefix convention marks `_SQLITE_TIMESTAMP_MS_LEN` as module-internal, but it is imported by `repositories.py`. This creates coupling to an implementation detail. **Recommendation**: Either remove the underscore prefix (making it a public constant) or move it to a shared constants module. --- ## Areas with No Issues Found | Category | Assessment | |---|---| | **Security** | No SQL injection (parameterized ORM queries), no path traversal at this layer, proper input validation on all string fields | | **Performance** | `idx_corrections_plan` index covers the `list_by_plan()` query path; no N+1 queries; consistent timestamp format ensures correct string-based ordering | | **Alembic Migration** | Migration matches model definition; CHECK constraints match enum values; downgrade correctly reverses upgrade | | **Domain Model** | Field validators comprehensive (whitespace, empty, max_length, timezone normalization); state transition logic correctly extracted to domain layer | | **CRUD Operations** | Create, get, list_by_plan, update_state, delete all follow existing repository patterns; retry decorator applied; proper session rollback on errors | --- ## Severity Summary | Severity | Count | IDs | |---|---|---| | Critical | 0 | — | | High | 0 | — | | **Medium** | **3** | 1.1, 2.1, 3.1 | | **Low** | **7** | 1.2, 1.3, 2.2, 3.2, 4.1, 5.1, 5.2 | | **Total** | **10** | |
CoreRasurae force-pushed feat/correction-attempts-table from ad68e198c5
Some checks failed
CI / lint (pull_request) Successful in 24s
CI / quality (pull_request) Successful in 56s
CI / security (pull_request) Successful in 1m1s
CI / build (pull_request) Successful in 22s
CI / typecheck (pull_request) Successful in 4m0s
CI / integration_tests (pull_request) Successful in 3m41s
CI / unit_tests (pull_request) Successful in 7m22s
CI / docker (pull_request) Successful in 1m1s
CI / e2e_tests (pull_request) Successful in 9m20s
CI / coverage (pull_request) Successful in 12m25s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
to 971ec1029e
Some checks failed
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 4m9s
CI / quality (pull_request) Successful in 3m40s
CI / security (pull_request) Successful in 4m10s
CI / build (pull_request) Successful in 18s
CI / unit_tests (pull_request) Successful in 3m55s
CI / integration_tests (pull_request) Successful in 3m46s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 5m36s
CI / coverage (pull_request) Successful in 8m14s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 10:33:02 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1145 (feat/correction-attempts-table)

Scope: All changes in branch feat/correction-attempts-table (commit 971ec102) plus close connections to surrounding code.
Issue: #920feat(db): implement missing correction_attempts table per spec DDL
Spec reference: docs/specification.md lines 45580-45593 (canonical DDL)

Files reviewed: correction.py (domain), models.py (ORM), repositories.py (CRUD), unit_of_work.py, __init__.py (exports), Alembic migration m8_001, BDD feature + steps (45 scenarios), Robot tests + helper (5 tests), CHANGELOG.md.

Methodology: 3 full review cycles across all categories (bugs, security, performance, test coverage/flaws, spec compliance, documentation). No tests were executed.


Summary

Severity Count
High 1
Medium 3
Low 4

Overall the implementation is solid: the Alembic migration chain is correct (m4_003 is the current head), column definitions match the spec DDL, CHECK constraints are consistent between migration and ORM model, FK semantics are well-reasoned (with documented deviations), the domain model has comprehensive validators, and test coverage is extensive with 45 BDD + 5 Robot scenarios. The findings below are areas for improvement.


HIGH Severity

H-1: Missing ORM-level relationship/cascade on LifecyclePlanModel for correction_attempts

Category: Bug / Data Integrity
Files: models.py (LifecyclePlanModel, lines ~559-716), models.py (CorrectionAttemptModel, line 2985)

All other v3_plans child tables (PlanProjectModel, PlanArgumentModel, PlanInvariantModel) have both DB-level FK CASCADE and an ORM-level relationship(..., cascade="all, delete-orphan") on LifecyclePlanModel. The new CorrectionAttemptModel only has the DB-level CASCADE FK — there is no corresponding relationship() on LifecyclePlanModel.

SQLite's PRAGMA foreign_keys is OFF by default. Without the ORM relationship, calling session.delete(plan) will not cascade-delete correction attempts unless FK enforcement is explicitly enabled at the connection level. The BDD cascade test passes because the test helper explicitly sets PRAGMA foreign_keys=ON, but production code paths that don't enable this pragma will leave orphaned correction_attempts rows.

Recommendation: Add a relationship("CorrectionAttemptModel", ..., cascade="all, delete-orphan") on LifecyclePlanModel, consistent with the other child tables.


MEDIUM Severity

M-1: to_domain() lacks defensive handling for corrupted guidance

Category: Bug / Robustness
File: models.py:3115-3126

The to_domain() method has defensive coercion for mode (lines 3096-3104, falls back to REVERT) and state (lines 3105-3113, falls back to FAILED) when the DB contains invalid values. However, guidance is passed directly to CorrectionAttemptRecord at line 3121 without any guard. If the DB contains whitespace-only or empty guidance (possible via direct SQL or a future migration bug), the Pydantic _guidance_not_empty validator will raise a ValidationError, crashing deserialization.

Recommendation: Add a defensive guard similar to mode/state — e.g., log a warning and default to a placeholder like "[corrupted]" if guidance is empty after stripping.

M-2: list_by_plan() lacks pagination support

Category: Performance
File: repositories.py:5747-5771

list_by_plan() returns all correction attempts for a plan without limit/offset parameters. Other list methods in the same module (e.g., list_plans() at line 5516) support pagination. While a single plan is unlikely to have thousands of corrections, the lack of pagination is inconsistent with the codebase pattern and could become a memory issue at scale.

Recommendation: Add optional limit and offset parameters with sensible defaults, matching the list_plans() signature.

M-3: format_sqlite_timestamp() has no guard against naive datetimes

Category: Bug / Robustness
File: models.py:2968-2982

The public helper calls dt.astimezone(UTC) which silently interprets naive datetimes as local time on Python 3.x. The CorrectionAttemptRecord validators ensure timestamps are tz-aware before reaching from_domain(), but format_sqlite_timestamp() is a public utility that could be called from other code paths without this guarantee.

Recommendation: Add an explicit check: if dt.tzinfo is None, either raise a ValueError or call dt.replace(tzinfo=UTC) with a logged warning, rather than relying on astimezone() guessing local time.


LOW Severity

L-1: Stale spec DDL line reference in docstring

Category: Documentation
File: models.py:2989

The CorrectionAttemptModel docstring says "specification DDL (lines 45486-45499)" but the actual canonical DDL is at lines 45580-45593 in docs/specification.md.

L-2: Duplicated docstring on SQLITE_TIMESTAMP_MS_LEN constant

Category: Documentation
File: models.py:2960-2965

The constant has two overlapping documentation blocks that repeat the same information:

#: precision: ``YYYY-MM-DDTHH:MM:SS.mmm`` (23 characters).  Matches the
#: output of SQLite ``strftime('%Y-%m-%dT%H:%M:%f', 'now')``.
#: Length of the millisecond-precision ISO-8601 timestamp string:
#: ``YYYY-MM-DDTHH:MM:SS.mmm`` (23 characters).  Matches the output
#: of SQLite ``strftime('%Y-%m-%dT%H:%M:%f', 'now')``.
SQLITE_TIMESTAMP_MS_LEN: int = 23

L-3: No DB-level CHECK constraint for non-empty guidance

Category: Spec Compliance / Defense in Depth
Files: Migration m8_001 (line 65), models.py:3036

mode and state have CHECK constraints enforcing their allowed values at the DB level. guidance has no equivalent constraint — the DB allows inserting empty or whitespace-only guidance via direct SQL, though the domain model rejects it. Adding CHECK(length(trim(guidance)) > 0) would provide defense in depth.

L-4: plan_id FK uses CASCADE instead of spec's implicit RESTRICT

Category: Spec Compliance (documented deviation)
Files: models.py:3004-3010, migration lines 33-37

The spec DDL (REFERENCES plans(plan_id) with no ON DELETE clause) implies NO ACTION/RESTRICT, meaning correction records should survive plan deletion for audit purposes. The implementation uses CASCADE, deleting correction records when the parent plan is deleted. This is documented in a code comment as an intentional deviation for codebase consistency with other child tables — but it does mean the correction audit trail is lost on plan deletion, which may conflict with the spec's intent for correction history traceability (see ADR-034, which discusses using correction_attempts for historical tree reconstruction).


Items Verified as Correct

  • Migration chain: m8_001 correctly extends from m4_003 (the current head). No orphaned branch.
  • All 10 spec DDL columns present with correct types, nullability, and defaults.
  • idx_corrections_plan index matches spec.
  • CHECK constraints on mode and state are consistent between migration and ORM model.
  • original_decision_id and new_decision_id FKs use RESTRICT, matching spec default.
  • server_default for created_at and state match spec DDL.
  • Timestamp normalization (UTC, millisecond precision) is consistent between from_domain() and update_state().
  • State transition validation is extracted to domain-level validate_correction_state_transition().
  • InvalidCorrectionStateTransitionError inherits from BusinessRuleViolation (not DatabaseError), correct per CONTRIBUTING.md.
  • update_state() validates inputs before ORM mutations, preventing dirty session state.
  • Domain model exports in __init__.py and __all__ are complete.
  • UnitOfWork integration is correctly wired.
  • BDD and Robot test coverage is comprehensive (45 + 5 scenarios covering CRUD, state transitions, FK violations, cascade, validation, edge cases).
# Code Review Report — PR #1145 (`feat/correction-attempts-table`) **Scope**: All changes in branch `feat/correction-attempts-table` (commit `971ec102`) plus close connections to surrounding code. **Issue**: #920 — `feat(db): implement missing correction_attempts table per spec DDL` **Spec reference**: `docs/specification.md` lines 45580-45593 (canonical DDL) **Files reviewed**: `correction.py` (domain), `models.py` (ORM), `repositories.py` (CRUD), `unit_of_work.py`, `__init__.py` (exports), Alembic migration `m8_001`, BDD feature + steps (45 scenarios), Robot tests + helper (5 tests), `CHANGELOG.md`. **Methodology**: 3 full review cycles across all categories (bugs, security, performance, test coverage/flaws, spec compliance, documentation). No tests were executed. --- ## Summary | Severity | Count | |----------|-------| | High | 1 | | Medium | 3 | | Low | 4 | Overall the implementation is solid: the Alembic migration chain is correct (`m4_003` is the current head), column definitions match the spec DDL, CHECK constraints are consistent between migration and ORM model, FK semantics are well-reasoned (with documented deviations), the domain model has comprehensive validators, and test coverage is extensive with 45 BDD + 5 Robot scenarios. The findings below are areas for improvement. --- ## HIGH Severity ### H-1: Missing ORM-level relationship/cascade on `LifecyclePlanModel` for `correction_attempts` **Category**: Bug / Data Integrity **Files**: `models.py` (LifecyclePlanModel, lines ~559-716), `models.py` (CorrectionAttemptModel, line 2985) All other `v3_plans` child tables (`PlanProjectModel`, `PlanArgumentModel`, `PlanInvariantModel`) have **both** DB-level FK `CASCADE` **and** an ORM-level `relationship(..., cascade="all, delete-orphan")` on `LifecyclePlanModel`. The new `CorrectionAttemptModel` only has the DB-level `CASCADE` FK — there is no corresponding `relationship()` on `LifecyclePlanModel`. SQLite's `PRAGMA foreign_keys` is `OFF` by default. Without the ORM relationship, calling `session.delete(plan)` will **not** cascade-delete correction attempts unless FK enforcement is explicitly enabled at the connection level. The BDD cascade test passes because the test helper explicitly sets `PRAGMA foreign_keys=ON`, but production code paths that don't enable this pragma will leave orphaned correction_attempts rows. **Recommendation**: Add a `relationship("CorrectionAttemptModel", ..., cascade="all, delete-orphan")` on `LifecyclePlanModel`, consistent with the other child tables. --- ## MEDIUM Severity ### M-1: `to_domain()` lacks defensive handling for corrupted `guidance` **Category**: Bug / Robustness **File**: `models.py:3115-3126` The `to_domain()` method has defensive coercion for `mode` (lines 3096-3104, falls back to `REVERT`) and `state` (lines 3105-3113, falls back to `FAILED`) when the DB contains invalid values. However, `guidance` is passed directly to `CorrectionAttemptRecord` at line 3121 without any guard. If the DB contains whitespace-only or empty guidance (possible via direct SQL or a future migration bug), the Pydantic `_guidance_not_empty` validator will raise a `ValidationError`, crashing deserialization. **Recommendation**: Add a defensive guard similar to mode/state — e.g., log a warning and default to a placeholder like `"[corrupted]"` if guidance is empty after stripping. ### M-2: `list_by_plan()` lacks pagination support **Category**: Performance **File**: `repositories.py:5747-5771` `list_by_plan()` returns **all** correction attempts for a plan without `limit`/`offset` parameters. Other list methods in the same module (e.g., `list_plans()` at line 5516) support pagination. While a single plan is unlikely to have thousands of corrections, the lack of pagination is inconsistent with the codebase pattern and could become a memory issue at scale. **Recommendation**: Add optional `limit` and `offset` parameters with sensible defaults, matching the `list_plans()` signature. ### M-3: `format_sqlite_timestamp()` has no guard against naive datetimes **Category**: Bug / Robustness **File**: `models.py:2968-2982` The public helper calls `dt.astimezone(UTC)` which **silently interprets naive datetimes as local time** on Python 3.x. The `CorrectionAttemptRecord` validators ensure timestamps are tz-aware before reaching `from_domain()`, but `format_sqlite_timestamp()` is a public utility that could be called from other code paths without this guarantee. **Recommendation**: Add an explicit check: if `dt.tzinfo is None`, either raise a `ValueError` or call `dt.replace(tzinfo=UTC)` with a logged warning, rather than relying on `astimezone()` guessing local time. --- ## LOW Severity ### L-1: Stale spec DDL line reference in docstring **Category**: Documentation **File**: `models.py:2989` The `CorrectionAttemptModel` docstring says *"specification DDL (lines 45486-45499)"* but the actual canonical DDL is at **lines 45580-45593** in `docs/specification.md`. ### L-2: Duplicated docstring on `SQLITE_TIMESTAMP_MS_LEN` constant **Category**: Documentation **File**: `models.py:2960-2965` The constant has two overlapping documentation blocks that repeat the same information: ```python #: precision: ``YYYY-MM-DDTHH:MM:SS.mmm`` (23 characters). Matches the #: output of SQLite ``strftime('%Y-%m-%dT%H:%M:%f', 'now')``. #: Length of the millisecond-precision ISO-8601 timestamp string: #: ``YYYY-MM-DDTHH:MM:SS.mmm`` (23 characters). Matches the output #: of SQLite ``strftime('%Y-%m-%dT%H:%M:%f', 'now')``. SQLITE_TIMESTAMP_MS_LEN: int = 23 ``` ### L-3: No DB-level `CHECK` constraint for non-empty `guidance` **Category**: Spec Compliance / Defense in Depth **Files**: Migration `m8_001` (line 65), `models.py:3036` `mode` and `state` have `CHECK` constraints enforcing their allowed values at the DB level. `guidance` has no equivalent constraint — the DB allows inserting empty or whitespace-only guidance via direct SQL, though the domain model rejects it. Adding `CHECK(length(trim(guidance)) > 0)` would provide defense in depth. ### L-4: `plan_id` FK uses `CASCADE` instead of spec's implicit `RESTRICT` **Category**: Spec Compliance (documented deviation) **Files**: `models.py:3004-3010`, migration lines 33-37 The spec DDL (`REFERENCES plans(plan_id)` with no `ON DELETE` clause) implies `NO ACTION`/`RESTRICT`, meaning correction records should survive plan deletion for audit purposes. The implementation uses `CASCADE`, deleting correction records when the parent plan is deleted. This is documented in a code comment as an intentional deviation for codebase consistency with other child tables — but it does mean the correction audit trail is lost on plan deletion, which may conflict with the spec's intent for correction history traceability (see ADR-034, which discusses using `correction_attempts` for historical tree reconstruction). --- ## Items Verified as Correct - Migration chain: `m8_001` correctly extends from `m4_003` (the current head). No orphaned branch. - All 10 spec DDL columns present with correct types, nullability, and defaults. - `idx_corrections_plan` index matches spec. - `CHECK` constraints on `mode` and `state` are consistent between migration and ORM model. - `original_decision_id` and `new_decision_id` FKs use `RESTRICT`, matching spec default. - `server_default` for `created_at` and `state` match spec DDL. - Timestamp normalization (UTC, millisecond precision) is consistent between `from_domain()` and `update_state()`. - State transition validation is extracted to domain-level `validate_correction_state_transition()`. - `InvalidCorrectionStateTransitionError` inherits from `BusinessRuleViolation` (not `DatabaseError`), correct per CONTRIBUTING.md. - `update_state()` validates inputs before ORM mutations, preventing dirty session state. - Domain model exports in `__init__.py` and `__all__` are complete. - UnitOfWork integration is correctly wired. - BDD and Robot test coverage is comprehensive (45 + 5 scenarios covering CRUD, state transitions, FK violations, cascade, validation, edge cases).
CoreRasurae force-pushed feat/correction-attempts-table from 971ec1029e
Some checks failed
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 4m9s
CI / quality (pull_request) Successful in 3m40s
CI / security (pull_request) Successful in 4m10s
CI / build (pull_request) Successful in 18s
CI / unit_tests (pull_request) Successful in 3m55s
CI / integration_tests (pull_request) Successful in 3m46s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 5m36s
CI / coverage (pull_request) Successful in 8m14s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
to 58ba6c6ff5
All checks were successful
CI / lint (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 52s
CI / quality (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / build (pull_request) Successful in 33s
CI / integration_tests (pull_request) Successful in 3m43s
CI / e2e_tests (pull_request) Successful in 6m7s
CI / unit_tests (pull_request) Successful in 7m32s
CI / docker (pull_request) Successful in 58s
CI / benchmark-publish (pull_request) Has been skipped
CI / coverage (pull_request) Successful in 9m25s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 52m38s
2026-03-29 11:13:00 +00:00
Compare
CoreRasurae left a comment

Code Review Report - PR #1145: feat(db): add correction_attempts table per specification DDL

Reviewer: Automated deep review (multi-cycle analysis)
Scope: All code changes in feat/correction-attempts-table branch (11 files, +2726 lines) plus close surrounding code context.
Cross-referenced: Issue #920 acceptance criteria, docs/specification.md DDL (lines 45580-45593), CONTRIBUTING.md exception/validation guidelines.


Spec DDL Alignment Summary

The implementation correctly maps all 10 spec-defined columns, the idx_corrections_plan index, and FK constraints. The plan_id FK uses CASCADE (vs spec default RESTRICT), which is documented as a codebase convention matching all other v3_plans child tables. The Alembic migration and SQLAlchemy model are consistent with each other. The server_default for created_at and state matches the spec DDL.


Findings by Severity

MEDIUM Severity

B-1: Unhandled ValueError in update_state() from corrupted DB state

Category: Bug | File: repositories.py:5832

current_state = CorrectionAttemptState(cast(str, row.state))  # <-- no ValueError handler
try:
    validate_correction_state_transition(current_state, state)
except ValueError as exc:
    raise InvalidCorrectionStateTransitionError(...) from exc

CorrectionAttemptState(cast(str, row.state)) is outside the inner try/except ValueError block. If the database contains a corrupted state value (bypassing the CheckConstraint), this raises an unhandled ValueError that propagates through all outer except clauses (none catch ValueError).

This is inconsistent with to_domain() (in models.py:3125-3133) which defensively coerces corrupted state values to CorrectionAttemptState.FAILED with a warning log. The same defensive pattern should be applied in update_state().

Suggested fix: Wrap the enum construction in a try/except ValueError block, coercing to FAILED and logging a warning, or raise InvalidCorrectionStateTransitionError with a descriptive message.


T-1: No tests for to_domain() defensive coercion paths

Category: Test Coverage Gap | File: models.py:3116-3139

to_domain() contains three defensive coercion paths for corrupted DB data:

  1. Unknown mode -> defaults to CorrectionMode.REVERT (line 3118)
  2. Unknown state -> defaults to CorrectionAttemptState.FAILED (line 3126)
  3. Empty/whitespace guidance -> defaults to "[corrupted]" (line 3134)

None of these paths are exercised by any BDD scenario or Robot test. Testing them requires inserting raw SQL with corrupted values and then reading through the repository. These are important safety mechanisms (as called out in the commit message: "Defensive to_domain() coercion now defaults corrupted state to 'failed' (terminal) instead of 'pending', preventing re-execution").


T-2: Missing vulture_whitelist.py entries

Category: Test/CI Gap | File: vulture_whitelist.py

The following new public symbols are not referenced in vulture_whitelist.py and may trigger false-positive "unused code" warnings in CI:

  • correction_attempts_rel - SQLAlchemy relationship attribute on LifecyclePlanModel, only referenced internally by SQLAlchemy (not in Python code).
  • SQLITE_TIMESTAMP_MS_LEN - public constant, only used within format_sqlite_timestamp() in the same file.
  • CORRECTION_ATTEMPT_VALID_TRANSITIONS - exported from __init__.py but only consumed internally by validate_correction_state_transition().

F-1: RESTRICT FK test has weak assertion

Category: Test Flaw | File: correction_attempt_persistence_steps.py:987-988

@then("a database integrity error should be raised for the decision")
def step_check_integrity_error(context: Context) -> None:
    assert context._ca_integrity_error is not None  # only checks "some error"

The step name says "database integrity error" but the assertion only verifies any exception was raised. The When step (line 975) catches a bare Exception. This could produce a false pass if an unrelated error (e.g., TypeError, AttributeError) were raised instead of an IntegrityError/DatabaseError.

Suggested fix: Assert the exception type, e.g.:

from sqlalchemy.exc import IntegrityError
assert isinstance(context._ca_integrity_error, (IntegrityError, DatabaseError))

LOW Severity

B-2: to_domain() lacks defensive coercion for corrupted timestamps

Category: Bug | File: models.py:3109

created_at_dt = datetime.fromisoformat(cast(str, self.created_at))

If created_at contains an unparseable string, datetime.fromisoformat() raises ValueError unhandled. Defensive coercion exists for mode, state, and guidance, but not for timestamps. The same applies to completed_at (line 3105). While the NOT NULL + server_default + CheckConstraint-guarded write path makes corruption unlikely, this is inconsistent with the defensive posture elsewhere in to_domain().


T-3: format_sqlite_timestamp() naive datetime rejection not tested

Category: Test Coverage Gap | File: models.py:2984-2988

The ValueError guard in format_sqlite_timestamp() for naive datetimes (added per CONTRIBUTING.md fail-fast guidelines) has no corresponding test. A simple unit test passing a naive datetime and asserting ValueError would close this gap.


T-4: Domain model naive datetime validators not directly tested

Category: Test Coverage Gap | File: correction.py:470-481

The _created_at_tz_aware and _completed_at_tz_aware field validators normalize naive datetimes to UTC. The existing timezone test (Non-UTC timezone datetimes are normalised to UTC on persist) uses a timezone-aware (non-UTC) datetime, which exercises the astimezone() path in format_sqlite_timestamp, but not the naive-to-UTC coercion path in the Pydantic validators.


S-1: No content validation on archived_artifacts_path

Category: Security (Defense-in-depth) | File: repositories.py:update_state()

The archived_artifacts_path field is stripped of whitespace but has no content validation (e.g., path traversal patterns like ../../). While this is a persistence-only field and actual file operations should validate separately, defense-in-depth at the domain model or repository level would prevent storing obviously malicious paths.


D-1: No direct pending -> failed transition in state machine

Category: Design Concern | File: correction.py:351-357

The spec lifecycle pending -> executing -> complete|failed has no direct pending -> failed path. If a correction attempt fails during setup (before execution begins), the only option is pending -> executing -> failed. This is spec-faithful, but if the system crashes between the two transitions, the record is stuck in pending with no recovery path. Consider whether a direct pending -> failed transition should be proposed as a spec amendment.


F-2: Multiple When/Then pairs in single BDD scenario

Category: Test Style | File: correction_attempt_persistence.feature:335-348

Scenario: list_by_plan returns only attempts for each specific plan
    ...
    When I list correction attempts for the first plan
    Then I should get exactly 1 correction attempt for the first plan
    When I list correction attempts for the second plan
    Then I should get exactly 1 correction attempt for the second plan

Two When/Then pairs in one scenario. While Behave handles this correctly, idiomatic BDD recommends a single When/Then per scenario. Consider splitting into two scenarios or using a Scenario Outline.


T-5: No test for database_retry triggered retry cycle

Category: Test Coverage Gap | File: repositories.py

The @database_retry decorator retries on DatabaseError (up to 3 attempts, 0.5s fixed wait). After an IntegrityError -> session.rollback() -> DatabaseError raise, the retry decorator catches the DatabaseError and retries the entire method. The session state after rollback and the behavior on retry are not tested. This is a lower priority since the retry decorator is a shared infrastructure component tested elsewhere.


Summary Table

# Severity Category Description File
B-1 Medium Bug Unhandled ValueError in update_state() from corrupted DB state repositories.py:5832
T-1 Medium Test Gap No tests for to_domain() defensive coercion (corrupted mode/state/guidance) models.py:3116-3139
T-2 Medium CI Gap Missing vulture_whitelist.py entries for 3 symbols vulture_whitelist.py
F-1 Medium Test Flaw RESTRICT FK test assertion only checks "any error", not error type steps.py:987
B-2 Low Bug to_domain() no defensive coercion for corrupted timestamps models.py:3109
T-3 Low Test Gap format_sqlite_timestamp() naive datetime rejection not tested models.py:2984
T-4 Low Test Gap Domain model naive datetime validators not directly tested correction.py:470-481
S-1 Low Security No content validation on archived_artifacts_path repositories.py
D-1 Low Design No direct pending -> failed transition correction.py:351
F-2 Low Test Style Multiple When/Then pairs in single BDD scenario .feature:335-348
T-5 Low Test Gap No test for retry cycle behavior after IntegrityError repositories.py

Overall assessment: The implementation is solid, well-documented, and thoroughly tested (45 BDD + 5 Robot). The spec DDL alignment is correct. The medium-severity items (B-1, T-1, T-2, F-1) are actionable and should be addressed before merge. The low-severity items are quality improvements that can be deferred if needed.

## Code Review Report - PR #1145: `feat(db): add correction_attempts table per specification DDL` **Reviewer**: Automated deep review (multi-cycle analysis) **Scope**: All code changes in `feat/correction-attempts-table` branch (11 files, +2726 lines) plus close surrounding code context. **Cross-referenced**: Issue #920 acceptance criteria, `docs/specification.md` DDL (lines 45580-45593), `CONTRIBUTING.md` exception/validation guidelines. --- ### Spec DDL Alignment Summary The implementation correctly maps all 10 spec-defined columns, the `idx_corrections_plan` index, and FK constraints. The `plan_id` FK uses `CASCADE` (vs spec default RESTRICT), which is documented as a codebase convention matching all other `v3_plans` child tables. The Alembic migration and SQLAlchemy model are consistent with each other. The `server_default` for `created_at` and `state` matches the spec DDL. --- ## Findings by Severity ### MEDIUM Severity #### B-1: Unhandled `ValueError` in `update_state()` from corrupted DB state **Category**: Bug | **File**: `repositories.py:5832` ```python current_state = CorrectionAttemptState(cast(str, row.state)) # <-- no ValueError handler try: validate_correction_state_transition(current_state, state) except ValueError as exc: raise InvalidCorrectionStateTransitionError(...) from exc ``` `CorrectionAttemptState(cast(str, row.state))` is **outside** the inner `try/except ValueError` block. If the database contains a corrupted state value (bypassing the `CheckConstraint`), this raises an unhandled `ValueError` that propagates through all outer `except` clauses (none catch `ValueError`). This is inconsistent with `to_domain()` (in `models.py:3125-3133`) which defensively coerces corrupted state values to `CorrectionAttemptState.FAILED` with a warning log. The same defensive pattern should be applied in `update_state()`. **Suggested fix**: Wrap the enum construction in a `try/except ValueError` block, coercing to `FAILED` and logging a warning, or raise `InvalidCorrectionStateTransitionError` with a descriptive message. --- #### T-1: No tests for `to_domain()` defensive coercion paths **Category**: Test Coverage Gap | **File**: `models.py:3116-3139` `to_domain()` contains three defensive coercion paths for corrupted DB data: 1. Unknown `mode` -> defaults to `CorrectionMode.REVERT` (line 3118) 2. Unknown `state` -> defaults to `CorrectionAttemptState.FAILED` (line 3126) 3. Empty/whitespace `guidance` -> defaults to `"[corrupted]"` (line 3134) None of these paths are exercised by any BDD scenario or Robot test. Testing them requires inserting raw SQL with corrupted values and then reading through the repository. These are important safety mechanisms (as called out in the commit message: "Defensive to_domain() coercion now defaults corrupted state to 'failed' (terminal) instead of 'pending', preventing re-execution"). --- #### T-2: Missing `vulture_whitelist.py` entries **Category**: Test/CI Gap | **File**: `vulture_whitelist.py` The following new public symbols are not referenced in `vulture_whitelist.py` and may trigger false-positive "unused code" warnings in CI: - `correction_attempts_rel` - SQLAlchemy relationship attribute on `LifecyclePlanModel`, only referenced internally by SQLAlchemy (not in Python code). - `SQLITE_TIMESTAMP_MS_LEN` - public constant, only used within `format_sqlite_timestamp()` in the same file. - `CORRECTION_ATTEMPT_VALID_TRANSITIONS` - exported from `__init__.py` but only consumed internally by `validate_correction_state_transition()`. --- #### F-1: RESTRICT FK test has weak assertion **Category**: Test Flaw | **File**: `correction_attempt_persistence_steps.py:987-988` ```python @then("a database integrity error should be raised for the decision") def step_check_integrity_error(context: Context) -> None: assert context._ca_integrity_error is not None # only checks "some error" ``` The step name says "database integrity error" but the assertion only verifies *any* exception was raised. The `When` step (line 975) catches a bare `Exception`. This could produce a false pass if an unrelated error (e.g., `TypeError`, `AttributeError`) were raised instead of an `IntegrityError`/`DatabaseError`. **Suggested fix**: Assert the exception type, e.g.: ```python from sqlalchemy.exc import IntegrityError assert isinstance(context._ca_integrity_error, (IntegrityError, DatabaseError)) ``` --- ### LOW Severity #### B-2: `to_domain()` lacks defensive coercion for corrupted timestamps **Category**: Bug | **File**: `models.py:3109` ```python created_at_dt = datetime.fromisoformat(cast(str, self.created_at)) ``` If `created_at` contains an unparseable string, `datetime.fromisoformat()` raises `ValueError` unhandled. Defensive coercion exists for `mode`, `state`, and `guidance`, but not for timestamps. The same applies to `completed_at` (line 3105). While the `NOT NULL` + `server_default` + `CheckConstraint`-guarded write path makes corruption unlikely, this is inconsistent with the defensive posture elsewhere in `to_domain()`. --- #### T-3: `format_sqlite_timestamp()` naive datetime rejection not tested **Category**: Test Coverage Gap | **File**: `models.py:2984-2988` The `ValueError` guard in `format_sqlite_timestamp()` for naive datetimes (added per CONTRIBUTING.md fail-fast guidelines) has no corresponding test. A simple unit test passing a naive `datetime` and asserting `ValueError` would close this gap. --- #### T-4: Domain model naive datetime validators not directly tested **Category**: Test Coverage Gap | **File**: `correction.py:470-481` The `_created_at_tz_aware` and `_completed_at_tz_aware` field validators normalize naive datetimes to UTC. The existing timezone test (`Non-UTC timezone datetimes are normalised to UTC on persist`) uses a timezone-aware (non-UTC) datetime, which exercises the `astimezone()` path in `format_sqlite_timestamp`, but not the naive-to-UTC coercion path in the Pydantic validators. --- #### S-1: No content validation on `archived_artifacts_path` **Category**: Security (Defense-in-depth) | **File**: `repositories.py:update_state()` The `archived_artifacts_path` field is stripped of whitespace but has no content validation (e.g., path traversal patterns like `../../`). While this is a persistence-only field and actual file operations should validate separately, defense-in-depth at the domain model or repository level would prevent storing obviously malicious paths. --- #### D-1: No direct `pending -> failed` transition in state machine **Category**: Design Concern | **File**: `correction.py:351-357` The spec lifecycle `pending -> executing -> complete|failed` has no direct `pending -> failed` path. If a correction attempt fails during setup (before execution begins), the only option is `pending -> executing -> failed`. This is spec-faithful, but if the system crashes between the two transitions, the record is stuck in `pending` with no recovery path. Consider whether a direct `pending -> failed` transition should be proposed as a spec amendment. --- #### F-2: Multiple When/Then pairs in single BDD scenario **Category**: Test Style | **File**: `correction_attempt_persistence.feature:335-348` ```gherkin Scenario: list_by_plan returns only attempts for each specific plan ... When I list correction attempts for the first plan Then I should get exactly 1 correction attempt for the first plan When I list correction attempts for the second plan Then I should get exactly 1 correction attempt for the second plan ``` Two When/Then pairs in one scenario. While Behave handles this correctly, idiomatic BDD recommends a single When/Then per scenario. Consider splitting into two scenarios or using a Scenario Outline. --- #### T-5: No test for `database_retry` triggered retry cycle **Category**: Test Coverage Gap | **File**: `repositories.py` The `@database_retry` decorator retries on `DatabaseError` (up to 3 attempts, 0.5s fixed wait). After an `IntegrityError` -> `session.rollback()` -> `DatabaseError` raise, the retry decorator catches the `DatabaseError` and retries the entire method. The session state after rollback and the behavior on retry are not tested. This is a lower priority since the retry decorator is a shared infrastructure component tested elsewhere. --- ### Summary Table | # | Severity | Category | Description | File | |---|----------|----------|-------------|------| | B-1 | Medium | Bug | Unhandled `ValueError` in `update_state()` from corrupted DB state | `repositories.py:5832` | | T-1 | Medium | Test Gap | No tests for `to_domain()` defensive coercion (corrupted mode/state/guidance) | `models.py:3116-3139` | | T-2 | Medium | CI Gap | Missing `vulture_whitelist.py` entries for 3 symbols | `vulture_whitelist.py` | | F-1 | Medium | Test Flaw | RESTRICT FK test assertion only checks "any error", not error type | `steps.py:987` | | B-2 | Low | Bug | `to_domain()` no defensive coercion for corrupted timestamps | `models.py:3109` | | T-3 | Low | Test Gap | `format_sqlite_timestamp()` naive datetime rejection not tested | `models.py:2984` | | T-4 | Low | Test Gap | Domain model naive datetime validators not directly tested | `correction.py:470-481` | | S-1 | Low | Security | No content validation on `archived_artifacts_path` | `repositories.py` | | D-1 | Low | Design | No direct `pending -> failed` transition | `correction.py:351` | | F-2 | Low | Test Style | Multiple When/Then pairs in single BDD scenario | `.feature:335-348` | | T-5 | Low | Test Gap | No test for retry cycle behavior after IntegrityError | `repositories.py` | --- **Overall assessment**: The implementation is solid, well-documented, and thoroughly tested (45 BDD + 5 Robot). The spec DDL alignment is correct. The medium-severity items (B-1, T-1, T-2, F-1) are actionable and should be addressed before merge. The low-severity items are quality improvements that can be deferred if needed.
CoreRasurae force-pushed feat/correction-attempts-table from 58ba6c6ff5
All checks were successful
CI / lint (pull_request) Successful in 26s
CI / typecheck (pull_request) Successful in 52s
CI / quality (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m2s
CI / build (pull_request) Successful in 33s
CI / integration_tests (pull_request) Successful in 3m43s
CI / e2e_tests (pull_request) Successful in 6m7s
CI / unit_tests (pull_request) Successful in 7m32s
CI / docker (pull_request) Successful in 58s
CI / benchmark-publish (pull_request) Has been skipped
CI / coverage (pull_request) Successful in 9m25s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 52m38s
to bb94c10117
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 56s
CI / lint (pull_request) Successful in 3m21s
CI / build (pull_request) Successful in 16s
CI / integration_tests (pull_request) Successful in 3m37s
CI / security (pull_request) Successful in 4m10s
CI / typecheck (pull_request) Successful in 4m15s
CI / unit_tests (pull_request) Successful in 7m26s
CI / e2e_tests (pull_request) Successful in 9m20s
CI / docker (pull_request) Successful in 1m26s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-29 14:56:31 +00:00
Compare
CoreRasurae force-pushed feat/correction-attempts-table from bb94c10117
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / quality (pull_request) Successful in 56s
CI / lint (pull_request) Successful in 3m21s
CI / build (pull_request) Successful in 16s
CI / integration_tests (pull_request) Successful in 3m37s
CI / security (pull_request) Successful in 4m10s
CI / typecheck (pull_request) Successful in 4m15s
CI / unit_tests (pull_request) Successful in 7m26s
CI / e2e_tests (pull_request) Successful in 9m20s
CI / docker (pull_request) Successful in 1m26s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Has been cancelled
to f678d611bb
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / typecheck (pull_request) Successful in 58s
CI / quality (pull_request) Successful in 57s
CI / security (pull_request) Successful in 1m3s
CI / build (pull_request) Successful in 25s
CI / helm (pull_request) Successful in 34s
CI / lint (pull_request) Successful in 3m20s
CI / integration_tests (pull_request) Successful in 4m2s
CI / unit_tests (pull_request) Successful in 4m27s
CI / docker (pull_request) Successful in 1m18s
CI / coverage (pull_request) Successful in 8m39s
CI / e2e_tests (pull_request) Successful in 12m37s
CI / status-check (pull_request) Successful in 1s
CI / typecheck (push) Successful in 47s
CI / lint (push) Successful in 3m16s
CI / security (push) Successful in 4m7s
CI / quality (push) Successful in 3m46s
CI / build (push) Successful in 21s
CI / helm (push) Successful in 24s
CI / unit_tests (push) Successful in 4m19s
CI / integration_tests (push) Successful in 3m55s
CI / docker (push) Successful in 1m19s
CI / e2e_tests (push) Successful in 12m30s
CI / coverage (push) Successful in 11m48s
CI / status-check (push) Successful in 1s
CI / benchmark-publish (push) Successful in 29m36s
CI / benchmark-regression (push) Has been skipped
CI / benchmark-regression (pull_request) Successful in 56m32s
2026-03-29 15:23:20 +00:00
Compare
CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-03-29 15:31:26 +00:00
CoreRasurae deleted branch feat/correction-attempts-table 2026-03-29 15:36:58 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1145
No description provided.