feat(plan): enforce decision type phase-gating at recording time #973

Merged
hurui200320 merged 2 commits from feature/m4-decision-phase-gating into master 2026-03-19 07:53:44 +00:00
Member

Summary

Adds phase-gating validation to DecisionService.record_decision() that enforces the specification's constraint: certain decision types are only valid during specific plan phases. This prevents invalid decisions (e.g., tool_invocation during Strategize, strategy_choice during Execute) from being persisted.

Changes

  • Exception (cleveragents.core.exceptions): Added DecisionPhaseViolationError(BusinessRuleViolation) with decision_type, plan_phase, and allowed_types attributes.
  • Phase constants (cleveragents.domain.models.core.decision):
    • resource_selection added to EXECUTE_TYPES — now phase-agnostic (Strategize or Execute) per ADR-007 L72 and ADR-033 L74.
    • subplan_spawn / subplan_parallel_spawn in both sets; code comment documents divergence from ADRs per M4 subplan model (ticket #931).
    • USER_INTERVENTION remains phase-agnostic (both sets).
    • Module-level docstring table updated to match actual assignments.
    • is_any_phase_type property updated to check membership in both sets dynamically (was hardcoded to USER_INTERVENTION only).
  • Phase-gating module (cleveragents.application.services.phase_gating):
    • Extracted from DecisionService to reduce decision_service.py line count (1010 → 913) and isolate the phase-gating concern.
    • PHASE_ALLOWED_TYPES typed as Mapping[PlanPhase, frozenset[DecisionType]].
    • resolve_plan_phase() helper: supports explicit parameter, DB lookup, and graceful skip.
    • validate_phase_gating() enforcement raises DecisionPhaseViolationError.
    • Exception narrowing: DB lookup catches (DatabaseError, OperationalError, OSError) instead of bare except Exception — only absorbs infrastructure failures, not programming errors.
    • # TODO(pg-migration): marker on TOCTOU race documentation for future PostgreSQL migration.
  • Decision service (cleveragents.application.services.decision_service):
    • Added plan_phase parameter to record_decision().
    • Invalid plan_phase string now raises ValidationError (was uncaught ValueError).
    • Imports and delegates to phase_gating module for all phase-gating logic.
    • PHASE_ALLOWED_TYPES re-exported in __all__ for backward compatibility.
  • CHANGELOG: Added behavioral change entry for resource_selection reclassification.
  • Backward compatibility: Phase-gating is opt-in — when neither plan_phase is provided nor a UnitOfWork is wired, validation is skipped, preserving all existing callers.
  • Unrelated drive-by reverted: Removed ULID_PATTERN from decision.py __all__ (was an unrelated export addition).
  • Tests:
    • 38 Behave scenarios covering valid/invalid types per phase, phase-agnostic acceptance, DB-based resolution (Strategize and Execute plans), unknown plan in DB, PlanPhase enum pass-through, error attributes, ungated phases, invalid plan_phase string validation, and DB error resilience in resolve_plan_phase.
    • 11 new Behave scenarios for is_any_phase_type: 4 dual-phase types (true) + 7 single-phase types (false), including prompt_definition root test.
    • 6 Robot Framework integration tests with stderr assertions.
    • Updated consolidated_decision.feature for new EXECUTE_TYPES member count (8 members).
    • Test cleanup now calls uow.engine.dispose() before file deletion.
    • tempfile.mktemp() replaced with tempfile.mkstemp().
    • Inline imports moved to module top-level per CONTRIBUTING.md.
    • Flaky concurrency test timing increased in subplan_execution_steps.py.
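
The phase-gating rule described above can be sketched as follows. This is a minimal illustration and not the merged implementation: the exact membership of each set is an assumption (chosen only to match the stated 8-member EXECUTE_TYPES and the 4 dual-phase types), the enum string values are guesses, `COMPLETE` is a hypothetical ungated phase, and the exception derives from plain `Exception` rather than `BusinessRuleViolation`.

```python
from enum import Enum
from types import MappingProxyType
from typing import Mapping, Optional


class PlanPhase(str, Enum):
    STRATEGIZE = "strategize"   # assumed string values
    EXECUTE = "execute"
    COMPLETE = "complete"       # hypothetical phase, used only to show an ungated phase


class DecisionType(str, Enum):
    PROMPT_DEFINITION = "prompt_definition"
    INVARIANT_ENFORCED = "invariant_enforced"
    STRATEGY_CHOICE = "strategy_choice"
    IMPLEMENTATION_CHOICE = "implementation_choice"
    TOOL_INVOCATION = "tool_invocation"
    ERROR_RECOVERY = "error_recovery"
    VALIDATION_RESPONSE = "validation_response"
    RESOURCE_SELECTION = "resource_selection"
    SUBPLAN_SPAWN = "subplan_spawn"
    SUBPLAN_PARALLEL_SPAWN = "subplan_parallel_spawn"
    USER_INTERVENTION = "user_intervention"


class DecisionPhaseViolationError(Exception):
    """Stand-in for the real exception; carries the three documented attributes."""

    def __init__(self, decision_type, plan_phase, allowed_types):
        self.decision_type = decision_type
        self.plan_phase = plan_phase
        self.allowed_types = allowed_types
        super().__init__(
            f"{decision_type.value} is not allowed during {plan_phase.value}"
        )


# The four types the PR describes as valid in both phases.
_DUAL = frozenset({
    DecisionType.RESOURCE_SELECTION,
    DecisionType.SUBPLAN_SPAWN,
    DecisionType.SUBPLAN_PARALLEL_SPAWN,
    DecisionType.USER_INTERVENTION,
})

# ASSUMPTION: the single-phase split below is illustrative only.
STRATEGIZE_TYPES = _DUAL | {
    DecisionType.PROMPT_DEFINITION,
    DecisionType.INVARIANT_ENFORCED,
    DecisionType.STRATEGY_CHOICE,
}
EXECUTE_TYPES = _DUAL | {
    DecisionType.IMPLEMENTATION_CHOICE,
    DecisionType.TOOL_INVOCATION,
    DecisionType.ERROR_RECOVERY,
    DecisionType.VALIDATION_RESPONSE,
}

PHASE_ALLOWED_TYPES: Mapping[PlanPhase, frozenset] = MappingProxyType({
    PlanPhase.STRATEGIZE: STRATEGIZE_TYPES,
    PlanPhase.EXECUTE: EXECUTE_TYPES,
})


def validate_phase_gating(
    decision_type: DecisionType, plan_phase: Optional[PlanPhase]
) -> None:
    """Raise if decision_type is not allowed in plan_phase; skip when unknown."""
    if plan_phase is None:
        return  # opt-in: no phase known, no gating
    allowed = PHASE_ALLOWED_TYPES.get(plan_phase)
    if allowed is None:
        return  # phase not present in the mapping: ungated
    if decision_type not in allowed:
        raise DecisionPhaseViolationError(decision_type, plan_phase, allowed)
```

Wrapping the dict in `MappingProxyType` is one way to make the `Mapping` read-only at runtime; whether the real module does this is not stated in the PR.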

Review Round 1 + 2 Fixes

| # | Finding | Resolution |
|---|---------|------------|
| P1-1 | except Exception too broad in _resolve_plan_phase | Narrowed to (DatabaseError, OperationalError, OSError) — matches codebase pattern |
| P2-2 | decision_service.py at 1010 lines | Extracted to phase_gating.py module (1010 → 913 lines) |
| P2-3 | TOCTOU race — no programmatic guard | Added # TODO(pg-migration): marker with actionable guidance |
| P2-4 | resource_selection reclassification needs CHANGELOG | Added CHANGELOG entry documenting behavioral change |
| P2-7 | is_any_phase_type BDD gap for dual-phase types | Added 11 parametrized scenarios covering all 4 dual-phase + 7 single-phase types |
| P3-5 | ULID_PATTERN export is unrelated drive-by | Reverted — removed from decision.py __all__ |
| P3-6 | decision.py at 514 lines (now 513) | No action — reviewer accepted as marginally over |

Coverage Fix (Round 3)

Coverage was at 96.9446% (displayed as 97% but rounded to 96.9% at 1-decimal precision, failing the ≥97% threshold). Added 2 new Behave scenarios to cover previously untested paths in phase_gating.py:

| Scenario | Lines Covered | Description |
|----------|---------------|-------------|
| Invalid plan_phase string raises ValidationError | Lines 72-75 | Tests that passing an invalid string like "not_a_real_phase" to plan_phase raises ValidationError with the invalid value in the message |
| resolve_plan_phase gracefully handles database errors | Lines 101, 106-109 | Tests that resolve_plan_phase() catches DatabaseError from a corrupted SQLite DB and returns None (skip gating) instead of propagating |

Coverage moved from 96.9446% → 96.9587%, which rounds to 97.0% and passes the threshold.
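
The rounding effect can be checked with a few lines of arithmetic. This sketch assumes the gate compares the coverage figure rounded to one decimal place against the threshold, as the description above implies; the function names are hypothetical.

```python
def displayed_whole_percent(percent: float) -> str:
    """Format coverage with no decimals, as a summary line might show it."""
    return f"{percent:.0f}%"


def passes_threshold(percent: float, threshold: float = 97.0) -> bool:
    """Assumed gate: round to one decimal place, then compare to the threshold."""
    return round(percent, 1) >= threshold


before, after = 96.9446, 96.9587

print(displayed_whole_percent(before), round(before, 1), passes_threshold(before))
print(displayed_whole_percent(after), round(after, 1), passes_threshold(after))
```

Both values display as 97% at whole-number precision, but only the second reaches 97.0 at one decimal place, because 96.9587 lies above the 96.95 rounding boundary and 96.9446 lies below it.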

Quality Gates

| Session | Result |
|---------|--------|
| lint | PASS |
| typecheck | PASS (0 errors) |
| unit_tests | PASS (11,250 scenarios, 0 failures) |
| integration_tests | PASS (1,563 tests, 0 failures) |
| e2e_tests | PASS (16 tests, 0 failures) |
| coverage_report | PASS (97.0%, threshold: 97%) |

Closes #931

hurui200320 added this to the v3.3.0 milestone 2026-03-16 08:12:14 +00:00
freemo left a comment

PM Day 36 Triage: Decision phase-gating implementation. Closes #931. M4 scope. Reviewer needed: @freemo (decision framework expert). Verify alignment with ADR-007 decision tree and ADR-033 decision recording protocol.

hurui200320 force-pushed feature/m4-decision-phase-gating from 031339fe06
All checks were successful
CI / lint (pull_request) Successful in 23s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 14s
CI / quality (pull_request) Successful in 50s
CI / typecheck (pull_request) Successful in 55s
CI / security (pull_request) Successful in 55s
CI / e2e_tests (pull_request) Successful in 1m42s
CI / unit_tests (pull_request) Successful in 3m24s
CI / docker (pull_request) Successful in 55s
CI / integration_tests (pull_request) Successful in 5m49s
CI / coverage (pull_request) Successful in 5m52s
CI / benchmark-regression (pull_request) Successful in 37m2s
to 20b6f0d5b9
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 36s
CI / security (pull_request) Successful in 41s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Successful in 1m19s
CI / unit_tests (pull_request) Successful in 3m18s
CI / integration_tests (pull_request) Successful in 3m46s
CI / docker (pull_request) Successful in 1m18s
CI / coverage (pull_request) Successful in 5m55s
CI / benchmark-regression (pull_request) Successful in 37m53s
2026-03-17 07:27:04 +00:00
Author
Member

Self-QA Review — Approved

Iterations: 2 review/fix cycles
Final verdict: Approve


Cycle 1: 16 issues found → 16 fixed

The initial review identified 1 critical, 4 major, 6 minor, and 4 nit issues. The critical finding was that resource_selection was incorrectly excluded from EXECUTE_TYPES, violating ADR-007, ADR-033, and the specification. All 16 issues were fixed in a single amended commit.

Key fixes applied:

  • Critical: Added DecisionType.RESOURCE_SELECTION to EXECUTE_TYPES; updated BDD scenarios and consolidated feature member counts
  • Major: Documented subplan_spawn/subplan_parallel_spawn ADR divergence with ticket reference; added TOCTOU race condition comment; added 2 missing DB-resolution test scenarios (empty DB + Execute phase)
  • Minor: Fixed is_any_phase_type to use dynamic set membership; wrapped PlanPhase() in try/except for ValidationError; added DB error resilience; updated stale docstring table; added engine disposal in test cleanup; added PlanPhase enum direct-pass test
  • Nits: Replaced deprecated tempfile.mktemp(); changed to Mapping type hint; moved imports to top-level; added stderr checks in Robot tests

Cycle 2: 0 critical/major issues — Approved

All 16 previous fixes verified correct. 8 minor style/coverage gaps and 5 nits remain — all non-blocking (defensive code path coverage, Robot Framework conventions, cosmetic code patterns).

Quality Gates

| Session | Result |
|---------|--------|
| nox -e lint | PASS |
| nox -e typecheck | PASS (0 errors) |
| nox -e unit_tests | PASS (10,851 scenarios) |
| nox -e integration_tests | PASS (1,517 tests) |
| nox -e e2e_tests | PASS (4 tests) |
| nox -e coverage_report | PASS (97%) |

Full implementation notes posted on ticket #931 (https://git.cleverthis.com/cleveragents/cleveragents-core/issues/931#issuecomment-66057).

hurui200320 force-pushed feature/m4-decision-phase-gating from 20b6f0d5b9
to cb7edf2227
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 18s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 37s
CI / security (pull_request) Successful in 52s
CI / typecheck (pull_request) Successful in 1m4s
CI / e2e_tests (pull_request) Successful in 1m34s
CI / unit_tests (pull_request) Successful in 3m5s
CI / integration_tests (pull_request) Successful in 3m35s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 6m43s
CI / benchmark-regression (pull_request) Successful in 37m16s
2026-03-17 08:24:47 +00:00
brent.edwards requested changes 2026-03-17 19:52:59 +00:00
Dismissed
brent.edwards left a comment

Code Review — PR #973 feat(plan): enforce decision type phase-gating at recording time

Reviewer: @brent.edwards | Size: L (+802/−9, 8 files) | Focus: Domain invariants, service design, backward compat


P1:must-fix (1)

1. except Exception too broad in _resolve_plan_phase
decision_service.py:~807 — The comment says "database errors" but catches everything including TypeError, AttributeError, and other programming errors. The opt-in contract ("don't break if DB is unavailable") should only absorb infrastructure failures. Narrow to (OperationalError, DatabaseError, OSError) — the same pattern PR #971 correctly uses in _build_skill_service.


P2:should-fix (3)

2. decision_service.py at 1010 lines — well over the 500-line guideline. _resolve_plan_phase + _validate_phase_gating (~50 lines) could be extracted to a PhaseGatingPolicy class or phase_gating.py module.

3. TOCTOU race is documented but not programmatically guarded — the code assumes SQLite single-writer serialization. If someone switches to PostgreSQL, this assumption silently breaks. Add a # TODO(pg-migration): marker.

4. resource_selection reclassification to both-phases is a semantic breaking change — is_strategize_type and is_execute_type now return True for more types. The ticket #931 rationale is documented in code, but this deserves a CHANGELOG entry as a behavioral change.


P3:nit (2)

5. ULID_PATTERN export added to __all__ is an unrelated drive-by fix.
6. decision.py at 514 lines — marginally over guideline, acceptable.


Positive Observations

  • DecisionPhaseViolationError with structured attributes (decision_type, plan_phase, allowed_types: frozenset) — excellent for programmatic error handling
  • Phase-gating is fully opt-in: no plan_phase + no UoW = skip — backward compatible
  • PHASE_ALLOWED_TYPES as Mapping[PlanPhase, frozenset[DecisionType]] — immutable, O(1) lookup
  • Behave scenario outlines cover all type×phase combinations — thorough
  • tempfile.mktemp() → tempfile.mkstemp() and uow.engine.dispose() before file deletion — good cleanup fixes

Verdict: REQUEST_CHANGES — P1-1 is a targeted fix.

brent.edwards left a comment

Code Review Round 2 — PR #973 feat(plan): enforce decision type phase-gating

Reviewer: @brent.edwards | Focus: Verification of Round 1 fixes + deep second pass


Prior Findings: 0 of 6 resolved

All Round 1 findings remain open:

| # | Finding | Status |
|---|---------|--------|
| P1-1 | except Exception too broad in _resolve_plan_phase | OPEN — still bare except Exception at line ~807. Comment added explaining intent but catch scope unchanged. |
| P2-2 | decision_service.py at 1010 lines | OPEN |
| P2-3 | TOCTOU race — no programmatic SQLite assertion | OPEN (excellent inline comment added, but still aspirational, not enforceable) |
| P2-4 | resource_selection reclassification needs CHANGELOG | OPEN |
| P3-5 | ULID_PATTERN export is unrelated | OPEN |
| P3-6 | decision.py at 514 lines | OPEN |

New Finding (1)

P2: is_any_phase_type BDD gap for 3 newly dual-phase types
The PR promotes resource_selection, subplan_spawn, and subplan_parallel_spawn from single-phase to both-phase, changing is_any_phase_type from False → True for all three. But the only BDD scenario testing is_any_phase_type uses user_intervention. If someone later removes one of these from a phase set, the regression goes undetected.
Fix: Add a parametrized scenario covering all 4 dual-phase types.
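
The regression this finding guards against can be illustrated with a small sketch of a dynamically computed is_any_phase_type plus a parametrized check. The split of the seven single-phase types between the two sets is an assumption made for illustration; only the four dual-phase names and the total of 11 members come from this thread.

```python
from enum import Enum


class DecisionType(str, Enum):
    PROMPT_DEFINITION = "prompt_definition"
    INVARIANT_ENFORCED = "invariant_enforced"
    STRATEGY_CHOICE = "strategy_choice"
    IMPLEMENTATION_CHOICE = "implementation_choice"
    TOOL_INVOCATION = "tool_invocation"
    ERROR_RECOVERY = "error_recovery"
    VALIDATION_RESPONSE = "validation_response"
    RESOURCE_SELECTION = "resource_selection"
    SUBPLAN_SPAWN = "subplan_spawn"
    SUBPLAN_PARALLEL_SPAWN = "subplan_parallel_spawn"
    USER_INTERVENTION = "user_intervention"

    @property
    def is_any_phase_type(self) -> bool:
        # Derived from set membership rather than a hardcoded member, so the
        # answer follows the phase sets whenever they change.
        return self in STRATEGIZE_TYPES and self in EXECUTE_TYPES


DUAL_PHASE = frozenset({
    DecisionType.RESOURCE_SELECTION,
    DecisionType.SUBPLAN_SPAWN,
    DecisionType.SUBPLAN_PARALLEL_SPAWN,
    DecisionType.USER_INTERVENTION,
})

# ASSUMPTION: illustrative assignment of the single-phase types.
STRATEGIZE_TYPES = DUAL_PHASE | {
    DecisionType.PROMPT_DEFINITION,
    DecisionType.INVARIANT_ENFORCED,
    DecisionType.STRATEGY_CHOICE,
}
EXECUTE_TYPES = DUAL_PHASE | {
    DecisionType.IMPLEMENTATION_CHOICE,
    DecisionType.TOOL_INVOCATION,
    DecisionType.ERROR_RECOVERY,
    DecisionType.VALIDATION_RESPONSE,
}


def check_all_members() -> None:
    """Parametrized check: every member is asserted, not just user_intervention."""
    for member in DecisionType:
        expected = member in DUAL_PHASE
        assert member.is_any_phase_type is expected, member.value


check_all_members()
```

Because the loop covers every enum member, dropping a dual-phase type from either set would fail the check for that member by name.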


Confirmed Clean from Second Pass

| Area | Verdict |
|------|---------|
| Phase constants (STRATEGIZE_TYPES ∪ EXECUTE_TYPES) | Correct — all 11 DecisionType members covered, no orphans |
| record_decision backward compat | Safe — plan_phase is keyword-only, defaults to None, gating skipped when None |
| DecisionPhaseViolationError upstream handling | Correct — caught by all upstream BusinessRuleViolation handlers (a2a, CLI, error_handling) |
| is_any_phase_type implementation | Dynamically correct — checks both-set membership for 4 types |
| Test quality | Good — real DecisionService with real UnitOfWork + SQLite, no mocking of service layer |
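
The backward-compatibility row can be illustrated with a sketch of a keyword-only, default-None plan_phase parameter. Everything here is a simplified assumption: the real record_decision() takes more arguments and persists the decision, ValidationError is a local stand-in for the project's exception, and the enum values are guesses.

```python
from enum import Enum
from typing import Optional, Union


class PlanPhase(str, Enum):
    STRATEGIZE = "strategize"  # assumed string values
    EXECUTE = "execute"


class ValidationError(Exception):
    """Local stand-in for the project's ValidationError."""


def coerce_plan_phase(
    plan_phase: Union[PlanPhase, str, None]
) -> Optional[PlanPhase]:
    """Accept an enum member, a string, or None; reject unknown strings."""
    if plan_phase is None or isinstance(plan_phase, PlanPhase):
        return plan_phase
    try:
        return PlanPhase(plan_phase)
    except ValueError as exc:
        # Surface a domain error instead of the raw ValueError.
        raise ValidationError(f"Invalid plan_phase: {plan_phase!r}") from exc


def record_decision(
    decision_type: str, *, plan_phase: Union[PlanPhase, str, None] = None
) -> dict:
    """Hypothetical, heavily simplified signature for illustration only."""
    phase = coerce_plan_phase(plan_phase)
    return {
        "decision_type": decision_type,
        "phase": phase,
        "gated": phase is not None,  # gating is skipped when no phase is known
    }


# An existing caller that never passes plan_phase keeps working unchanged.
legacy = record_decision("tool_invocation")
# A new caller opts in, by string or by enum member.
gated = record_decision("tool_invocation", plan_phase="execute")
```

Making the parameter keyword-only means no existing positional call can bind a value to it by accident, which is what keeps the change additive.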

Verdict: REQUEST_CHANGES — P1-1 (except Exception at line ~807) remains the sole merge-blocker. Narrowing to except (OperationalError, DatabaseError, OSError) resolves it. All other findings are P2/P3 and can be tracked as follow-up.

hurui200320 force-pushed feature/m4-decision-phase-gating from cb7edf2227
to ef37aa3dfc
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 47s
CI / unit_tests (pull_request) Successful in 3m0s
CI / integration_tests (pull_request) Successful in 3m33s
CI / e2e_tests (pull_request) Successful in 4m3s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-18 07:22:28 +00:00
Author
Member

Review Fix Summary — Addressing Rounds 1 & 2 (@brent.edwards)

All 7 findings from both review rounds have been addressed. Branch rebased onto latest master and force-pushed.

P1-1 (must-fix): except Exception too broad — RESOLVED

Narrowed the catch in resolve_plan_phase() from bare except Exception to except (DatabaseError, OperationalError, OSError). This follows the established codebase pattern from repositories.py:

  • DatabaseError (from cleveragents.core.exceptions) — catches repository-wrapped DB errors
  • OperationalError (from sqlalchemy.exc) — catches raw SQLAlchemy connection/operation errors from the UoW layer
  • OSError — catches filesystem-level SQLite access failures

Programming errors (TypeError, AttributeError, etc.) now correctly propagate instead of being silently swallowed.
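
A minimal sketch of the narrowed catch, under stated assumptions: DatabaseError and OperationalError are local stand-ins for cleveragents.core.exceptions.DatabaseError and sqlalchemy.exc.OperationalError so the example runs without either package, and the lookup callable is a hypothetical replacement for the UnitOfWork query.

```python
from typing import Callable, Optional


class DatabaseError(Exception):
    """Stand-in for cleveragents.core.exceptions.DatabaseError."""


class OperationalError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError."""


def resolve_plan_phase(
    explicit_phase: Optional[str],
    lookup: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[str]:
    """Return the phase to gate against, or None to skip gating."""
    if explicit_phase is not None:
        return explicit_phase          # 1. explicit parameter wins
    if lookup is None:
        return None                    # 2. nothing wired: graceful skip
    try:
        return lookup()                # 3. DB lookup
    except (DatabaseError, OperationalError, OSError):
        # Only infrastructure failures are absorbed; gating is skipped.
        return None


def broken_db() -> Optional[str]:
    raise DatabaseError("file is not a database")


def buggy_code() -> Optional[str]:
    raise TypeError("programming error")


print(resolve_plan_phase(None, broken_db))   # infrastructure failure is absorbed
try:
    resolve_plan_phase(None, buggy_code)
except TypeError as exc:
    print("propagated:", exc)                # programming error surfaces
```

The tuple in the except clause is the whole fix: anything outside it, including TypeError and AttributeError, reaches the caller.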

P2-2 (should-fix): decision_service.py at 1010 lines — RESOLVED

Extracted phase-gating concern to new module cleveragents.application.services.phase_gating:

  • PHASE_ALLOWED_TYPES constant
  • resolve_plan_phase() (was _resolve_plan_phase instance method)
  • validate_phase_gating() (was _validate_phase_gating static method)

decision_service.py reduced from 1010 → 913 lines. The service imports and delegates to the new module. PHASE_ALLOWED_TYPES is re-exported in decision_service.__all__ for backward compatibility.

P2-3 (should-fix): TOCTOU race — no programmatic guard — RESOLVED

Added # TODO(pg-migration): marker to the TOCTOU comment with actionable guidance: "At minimum, a single-writer assertion or advisory lock should guard this section under multi-writer engines."
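
One possible shape for the single-writer assertion mentioned in that guidance is sketched below. This is hypothetical and not part of the PR: the function name and the allow-list are assumptions, and the caller is assumed to pass the database dialect name (with SQLAlchemy this is available as engine.dialect.name).

```python
# Dialects assumed to serialize writers, which is what the check-then-write
# sequence relies on. ASSUMPTION: only SQLite qualifies here.
SINGLE_WRITER_DIALECTS = frozenset({"sqlite"})


def assert_single_writer(dialect_name: str) -> None:
    """Fail loudly if phase gating runs on a multi-writer engine without a lock."""
    if dialect_name not in SINGLE_WRITER_DIALECTS:
        raise RuntimeError(
            f"Phase gating assumes single-writer serialization; "
            f"dialect {dialect_name!r} needs an advisory lock or equivalent guard."
        )


assert_single_writer("sqlite")  # passes silently
try:
    assert_single_writer("postgresql")
except RuntimeError as exc:
    print(exc)
```

A guard like this turns the silent assumption into an explicit failure at the point where the race would otherwise become possible.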

P2-4 (should-fix): resource_selection reclassification needs CHANGELOG — RESOLVED

Added CHANGELOG entry under ## Unreleased documenting the behavioral change: resource_selection reclassified from Execute-only to phase-agnostic, with impact note for code relying on is_strategize_type/is_execute_type.

P2-7 (new in Round 2): is_any_phase_type BDD gap — RESOLVED

Added 11 new Behave scenarios to consolidated_decision.feature:

  • Scenario Outline: All dual-phase types report is_any_phase_type true — parametrized over resource_selection, subplan_spawn, subplan_parallel_spawn, user_intervention (4 scenarios)
  • Scenario Outline: Single-phase types report is_any_phase_type false — parametrized over invariant_enforced, strategy_choice, implementation_choice, tool_invocation, error_recovery, validation_response (6 scenarios)
  • prompt_definition tested separately (existing root scenario) with added is_any_phase_type should be false assertion

P3-5 (nit): ULID_PATTERN export is unrelated drive-by — RESOLVED

Removed ULID_PATTERN from decision.py __all__.

P3-6 (nit): decision.py at 514 lines — No action

Reviewer accepted as marginally over guideline.

Quality Gates

| Session | Result |
|---------|--------|
| lint | PASS |
| typecheck | PASS (0 errors) |
| unit_tests | PASS (11,153 scenarios, 0 failures, 0 errors) |
| integration_tests | PASS (1,563 tests, 0 failures) |
| e2e_tests | PASS (16 tests, 0 failures) |
| coverage_report | PASS (97%) |
hurui200320 force-pushed feature/m4-decision-phase-gating from ef37aa3dfc
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 41s
CI / security (pull_request) Successful in 47s
CI / unit_tests (pull_request) Successful in 3m0s
CI / integration_tests (pull_request) Successful in 3m33s
CI / e2e_tests (pull_request) Successful in 4m3s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
to 1f016bea33
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 34s
CI / security (pull_request) Successful in 49s
CI / typecheck (pull_request) Successful in 54s
CI / unit_tests (pull_request) Successful in 3m22s
CI / integration_tests (pull_request) Successful in 3m55s
CI / docker (pull_request) Successful in 1m7s
CI / e2e_tests (pull_request) Successful in 4m46s
CI / coverage (pull_request) Successful in 8m0s
CI / benchmark-regression (pull_request) Successful in 38m31s
2026-03-18 07:29:13 +00:00
hurui200320 force-pushed feature/m4-decision-phase-gating from 1f016bea33
to 35eb7b762a
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 53s
CI / integration_tests (pull_request) Successful in 4m7s
CI / e2e_tests (pull_request) Successful in 5m5s
CI / unit_tests (pull_request) Successful in 5m40s
CI / docker (pull_request) Successful in 1m40s
CI / coverage (pull_request) Successful in 6m52s
CI / benchmark-regression (pull_request) Successful in 40m53s
2026-03-18 08:26:06 +00:00
freemo approved these changes 2026-03-19 04:56:07 +00:00
Dismissed
freemo left a comment

Code Review — PR #973 feat(plan): enforce decision type phase-gating at recording time

Cleanly scoped feature with good architectural separation. The extraction of phase-gating logic into phase_gating.py (148 lines) with clear API boundary (resolve_plan_phase(), validate_phase_gating(), PHASE_ALLOWED_TYPES) is well-done. The opt-in design (gating skipped when neither plan_phase nor UoW is provided) ensures backward compatibility.

The resource_selection reclassification to phase-agnostic is a breaking behavioral change, but it's properly documented in the CHANGELOG with ADR references (ADR-007 L72, ADR-033 L74). The TOCTOU race condition is documented with a clear TODO(pg-migration) marker.

36 Behave scenarios + 6 Robot tests + proper exception hierarchy (DecisionPhaseViolationError(BusinessRuleViolation)) demonstrate thorough implementation.

Approved. No issues found.
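
For readers following the review, the opt-in gating described above can be pictured with a minimal sketch. This is not the code in `phase_gating.py`: the enum members, the two-argument signature of `validate_phase_gating()`, and the use of plain `Exception` as the base class (the PR uses `BusinessRuleViolation`) are assumptions made so the example is self-contained.

```python
from enum import Enum
from types import MappingProxyType
from typing import Mapping, Optional


class PlanPhase(Enum):  # hypothetical stand-in for the project's enum
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class DecisionType(Enum):  # only three of the real decision types, for brevity
    STRATEGY_CHOICE = "strategy_choice"
    TOOL_INVOCATION = "tool_invocation"
    RESOURCE_SELECTION = "resource_selection"


class DecisionPhaseViolationError(Exception):
    """Stand-in; the PR derives this from BusinessRuleViolation."""

    def __init__(self, decision_type, plan_phase, allowed_types):
        super().__init__(
            f"{decision_type.value} is not allowed during {plan_phase.value}"
        )
        self.decision_type = decision_type
        self.plan_phase = plan_phase
        self.allowed_types = allowed_types


# Read-only mapping from phase to the decision types it accepts.
# resource_selection appears in both sets, i.e. it is phase-agnostic.
PHASE_ALLOWED_TYPES: Mapping[PlanPhase, frozenset] = MappingProxyType({
    PlanPhase.STRATEGIZE: frozenset(
        {DecisionType.STRATEGY_CHOICE, DecisionType.RESOURCE_SELECTION}
    ),
    PlanPhase.EXECUTE: frozenset(
        {DecisionType.TOOL_INVOCATION, DecisionType.RESOURCE_SELECTION}
    ),
})


def validate_phase_gating(
    decision_type: DecisionType, plan_phase: Optional[PlanPhase]
) -> None:
    """Raise if decision_type is not allowed in plan_phase (assumed signature)."""
    if plan_phase is None:
        return  # opt-in: no phase known, so gating is skipped
    allowed = PHASE_ALLOWED_TYPES[plan_phase]
    if decision_type not in allowed:
        raise DecisionPhaseViolationError(decision_type, plan_phase, allowed)
```

Passing `None` for the phase models the backward-compatible path in which neither `plan_phase` nor a UoW is supplied; a phase-agnostic type such as `resource_selection` passes in either phase.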

hurui200320 force-pushed feature/m4-decision-phase-gating from 35eb7b762a
to 52f1bb2abb
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / build (pull_request) Successful in 23s
CI / quality (pull_request) Successful in 40s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Successful in 3m28s
CI / integration_tests (pull_request) Successful in 3m39s
CI / e2e_tests (pull_request) Successful in 3m42s
CI / docker (pull_request) Successful in 56s
CI / coverage (pull_request) Failing after 7m1s
CI / benchmark-regression (pull_request) Successful in 38m17s
2026-03-19 05:46:57 +00:00
hurui200320 dismissed freemo's review 2026-03-19 05:46:57 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-19 05:50:58 +00:00
hurui200320 force-pushed feature/m4-decision-phase-gating from 52f1bb2abb
to 1ec6b2ac27
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 25s
CI / typecheck (pull_request) Successful in 41s
CI / quality (pull_request) Successful in 44s
CI / security (pull_request) Successful in 48s
CI / integration_tests (pull_request) Successful in 3m38s
CI / unit_tests (pull_request) Successful in 3m39s
CI / e2e_tests (pull_request) Successful in 4m8s
CI / docker (pull_request) Successful in 1m5s
CI / coverage (pull_request) Successful in 6m59s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-19 06:56:28 +00:00
hurui200320 force-pushed feature/m4-decision-phase-gating from 1ec6b2ac27
to 231c3656e0
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 52s
CI / unit_tests (pull_request) Successful in 3m36s
CI / integration_tests (pull_request) Successful in 3m40s
CI / benchmark-regression (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
2026-03-19 07:10:54 +00:00
hurui200320 force-pushed feature/m4-decision-phase-gating from 231c3656e0
to 296daebe59
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 47s
CI / security (pull_request) Successful in 53s
CI / unit_tests (pull_request) Successful in 3m13s
CI / integration_tests (pull_request) Successful in 3m50s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 5m47s
CI / coverage (pull_request) Successful in 7m12s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-19 07:14:58 +00:00
Author
Member

Coverage Fix — Branch rebased and pushed

What changed

The coverage gate was failing at 96.9%: the report displayed 97%, but the precise value, 96.9446%, rounds to 96.9% at one decimal place. Two untested paths in phase_gating.py accounted for the gap:

  1. Invalid plan_phase string → ValidationError — Added scenario testing that plan_phase="not_a_real_phase" raises ValidationError with the invalid value in the message.
  2. DB error resilience in resolve_plan_phase — Added scenario that corrupts a real SQLite DB and verifies resolve_plan_phase() catches the DatabaseError and returns None (graceful skip).
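
The two paths listed above can be sketched as follows. This is an illustrative stand-in, not the project's `resolve_plan_phase()`: the `lookup` callable, the keyword names, and the locally defined `ValidationError` and `DatabaseError` classes are assumptions, and SQLAlchemy's `OperationalError` is left out of the caught tuple so the example runs on the standard library alone.

```python
from enum import Enum
from typing import Callable, Optional


class PlanPhase(Enum):  # hypothetical stand-in for the project's enum
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class ValidationError(ValueError):
    """Stand-in for the project's ValidationError."""


class DatabaseError(Exception):
    """Stand-in for cleveragents.core.exceptions.DatabaseError."""


def resolve_plan_phase(
    plan_phase: Optional[str] = None,
    lookup: Optional[Callable[[], str]] = None,
) -> Optional[PlanPhase]:
    """Return the phase to gate against, or None to skip gating."""
    if plan_phase is None and lookup is not None:
        try:
            plan_phase = lookup()  # models the DB lookup through the UoW
        except (DatabaseError, OSError):
            # Infrastructure failure only: skip gating gracefully.
            # TypeError, AttributeError, etc. are not caught and propagate.
            return None
    if plan_phase is None:
        return None  # nothing supplied: gating is skipped
    try:
        return PlanPhase(plan_phase)
    except ValueError:
        # Surface the offending value in the message, as the new scenario checks.
        raise ValidationError(f"invalid plan_phase: {plan_phase!r}") from None
```

A scenario for path 1 would call `resolve_plan_phase("not_a_real_phase")` and expect `ValidationError`; a scenario for path 2 would pass a `lookup` that raises `DatabaseError` and expect `None`.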

Rebase

Branch rebased onto latest origin/master (cbf8bcc9). Resolved CHANGELOG.md conflict (kept both entries).

All quality gates pass (post-rebase)

  • lint
  • typecheck (0 errors)
  • unit_tests (11,264 scenarios)
  • integration_tests
  • e2e_tests (37 tests)
  • coverage_report: 97.0% (was 96.9%)
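
The rounding discrepancy behind the failing gate can be reproduced in a few lines. The 97.0 threshold and the comparison at one decimal place are assumptions inferred from the figures in this comment, not values read from the CI configuration.

```python
precise = 96.9446  # precise coverage value quoted in this comment

# Zero decimal places, as a summary table would display it.
displayed = f"{precise:.0f}%"

# One decimal place, as the gate is described as comparing.
one_decimal = round(precise, 1)

THRESHOLD = 97.0  # hypothetical gate threshold
gate_passes = one_decimal >= THRESHOLD

print(displayed, one_decimal, gate_passes)
```

With these inputs the displayed value is `97%` while the one-decimal value is `96.9`, so a gate at 97.0 fails even though the summary appears to meet it.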

CI running: https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/2381

Merge branch 'master' into feature/m4-decision-phase-gating
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 30s
CI / security (pull_request) Successful in 51s
CI / typecheck (pull_request) Successful in 1m3s
CI / unit_tests (pull_request) Successful in 3m25s
CI / integration_tests (pull_request) Successful in 3m45s
CI / docker (pull_request) Successful in 55s
CI / e2e_tests (pull_request) Successful in 5m19s
CI / coverage (pull_request) Successful in 7m1s
CI / benchmark-regression (pull_request) Successful in 38m21s
a3706a393d
hurui200320 deleted branch feature/m4-decision-phase-gating 2026-03-19 07:53:44 +00:00
Reference
cleveragents/cleveragents-core!973