feat(plan): enforce decision type phase-gating at recording time #973

2026-03-16T08:12:06Z

hurui200320 commented

2026-03-16 08:12:06 +00:00

Summary

Adds phase-gating validation to DecisionService.record_decision() that enforces the specification's constraint: certain decision types are only valid during specific plan phases. This prevents invalid decisions (e.g., tool_invocation during Strategize, strategy_choice during Execute) from being persisted.

Changes

Exception (cleveragents.core.exceptions): Added DecisionPhaseViolationError(BusinessRuleViolation) with decision_type, plan_phase, and allowed_types attributes.
Phase constants (cleveragents.domain.models.core.decision):
- resource_selection added to EXECUTE_TYPES — now phase-agnostic (Strategize or Execute) per ADR-007 L72 and ADR-033 L74.
- subplan_spawn / subplan_parallel_spawn in both sets; code comment documents divergence from ADRs per M4 subplan model (ticket #931).
- USER_INTERVENTION remains phase-agnostic (both sets).
- Module-level docstring table updated to match actual assignments.
- is_any_phase_type property updated to check membership in both sets dynamically (was hardcoded to USER_INTERVENTION only).
Phase-gating module (cleveragents.application.services.phase_gating):
- Extracted from DecisionService to reduce decision_service.py line count (1010 → 913) and isolate the phase-gating concern.
- PHASE_ALLOWED_TYPES typed as Mapping[PlanPhase, frozenset[DecisionType]].
- resolve_plan_phase() helper: supports explicit parameter, DB lookup, and graceful skip.
- validate_phase_gating() enforcement raises DecisionPhaseViolationError.
- Exception narrowing: DB lookup catches (DatabaseError, OperationalError, OSError) instead of a broad except Exception, so it absorbs only infrastructure failures, not programming errors.
- Added a # TODO(pg-migration): marker to the TOCTOU race comment, flagging it for a future PostgreSQL migration.
Decision service (cleveragents.application.services.decision_service):
- Added plan_phase parameter to record_decision().
- Invalid plan_phase string now raises ValidationError (was uncaught ValueError).
- Imports and delegates to phase_gating module for all phase-gating logic.
- PHASE_ALLOWED_TYPES re-exported in __all__ for backward compatibility.
CHANGELOG: Added behavioral change entry for resource_selection reclassification.
Backward compatibility: Phase-gating is opt-in — when neither plan_phase is provided nor a UnitOfWork is wired, validation is skipped, preserving all existing callers.
Unrelated drive-by reverted: Removed ULID_PATTERN from decision.py __all__ (was an unrelated export addition).
Tests:
- 38 Behave scenarios covering valid/invalid types per phase, phase-agnostic acceptance, DB-based resolution (Strategize and Execute plans), unknown plan in DB, PlanPhase enum pass-through, error attributes, ungated phases, invalid plan_phase string validation, and DB error resilience in resolve_plan_phase.
- 11 new Behave scenarios for is_any_phase_type: 4 dual-phase types (true) + 7 single-phase types (false), including prompt_definition root test.
- 6 Robot Framework integration tests with stderr assertions.
- Updated consolidated_decision.feature for new EXECUTE_TYPES member count (8 members).
- Test cleanup now calls uow.engine.dispose() before file deletion.
- tempfile.mktemp() replaced with tempfile.mkstemp().
- Inline imports moved to module top-level per CONTRIBUTING.md.
- Flaky concurrency test timing increased in subplan_execution_steps.py.
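
The gating check described in the list above can be pictured with a minimal sketch. The names `PHASE_ALLOWED_TYPES`, `validate_phase_gating`, and `DecisionPhaseViolationError` come from this PR; the enum members, the `OTHER` phase, the base class, and the function body are assumptions for illustration and are not taken from the project's code.

```python
from enum import Enum
from types import MappingProxyType
from typing import Mapping, Optional


class PlanPhase(Enum):
    # Assumed members; OTHER is a hypothetical phase with no gating rules.
    STRATEGIZE = "strategize"
    EXECUTE = "execute"
    OTHER = "other"


class DecisionType(Enum):
    # Assumed subset of the real enum, which has 11 members.
    STRATEGY_CHOICE = "strategy_choice"
    TOOL_INVOCATION = "tool_invocation"
    USER_INTERVENTION = "user_intervention"


class BusinessRuleViolation(Exception):
    """Stand-in for the project's base exception (assumption)."""


class DecisionPhaseViolationError(BusinessRuleViolation):
    """Carries the three attributes listed in the PR description."""

    def __init__(self, decision_type, plan_phase, allowed_types):
        self.decision_type = decision_type
        self.plan_phase = plan_phase
        self.allowed_types = allowed_types
        super().__init__(
            f"{decision_type.value} is not allowed during {plan_phase.value}"
        )


# Read-only mapping from each gated phase to the decision types it accepts.
PHASE_ALLOWED_TYPES: Mapping[PlanPhase, frozenset] = MappingProxyType(
    {
        PlanPhase.STRATEGIZE: frozenset(
            {DecisionType.STRATEGY_CHOICE, DecisionType.USER_INTERVENTION}
        ),
        PlanPhase.EXECUTE: frozenset(
            {DecisionType.TOOL_INVOCATION, DecisionType.USER_INTERVENTION}
        ),
    }
)


def validate_phase_gating(
    decision_type: DecisionType, plan_phase: Optional[PlanPhase]
) -> None:
    """Raise if decision_type is not allowed in plan_phase; otherwise return."""
    if plan_phase is None:
        return  # opt-in: no phase known, so gating is skipped
    allowed = PHASE_ALLOWED_TYPES.get(plan_phase)
    if allowed is None:
        return  # this phase has no gating rules
    if decision_type not in allowed:
        raise DecisionPhaseViolationError(decision_type, plan_phase, allowed)
```

Because the error carries `allowed_types`, a caller can report which types would have been accepted without parsing the message.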

Review Round 1 + 2 Fixes

#	Finding	Resolution
P1-1	`except Exception` too broad in `_resolve_plan_phase`	Narrowed to `(DatabaseError, OperationalError, OSError)` — matches codebase pattern
P2-2	`decision_service.py` at 1010 lines	Extracted to `phase_gating.py` module (1010 → 913 lines)
P2-3	TOCTOU race — no programmatic guard	Added `# TODO(pg-migration):` marker with actionable guidance
P2-4	`resource_selection` reclassification needs CHANGELOG	Added CHANGELOG entry documenting behavioral change
P2-7	`is_any_phase_type` BDD gap for dual-phase types	Added 11 parametrized scenarios covering all 4 dual-phase + 7 single-phase types
P3-5	`ULID_PATTERN` export is unrelated drive-by	Reverted — removed from `decision.py` `__all__`
P3-6	`decision.py` at 514 lines (now 513)	No action — reviewer accepted as marginally over

Coverage Fix (Round 3)

Coverage was 96.9446%: the report displayed it as 97%, but at one-decimal precision it rounds to 96.9%, which failed the ≥97% threshold. Added 2 new Behave scenarios to cover previously untested paths in phase_gating.py:

Scenario	Lines Covered	Description
Invalid plan_phase string raises ValidationError	Lines 72-75	Tests that passing an invalid string like `"not_a_real_phase"` to `plan_phase` raises `ValidationError` with the invalid value in the message
resolve_plan_phase gracefully handles database errors	Lines 101, 106-109	Tests that `resolve_plan_phase()` catches `DatabaseError` from a corrupted SQLite DB and returns `None` (skip gating) instead of propagating
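
The first scenario in the table exercises the conversion of a `plan_phase` string into the enum. A minimal sketch of that conversion follows; `coerce_plan_phase` is a hypothetical helper name, the enum values are assumed, and `ValidationError` here is a local stand-in for the project's exception.

```python
from enum import Enum
from typing import Optional, Union


class PlanPhase(Enum):
    # Assumed string values for the two gated phases.
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class ValidationError(Exception):
    """Stand-in for the project's ValidationError (assumption)."""


def coerce_plan_phase(
    plan_phase: Union[PlanPhase, str, None],
) -> Optional[PlanPhase]:
    """Accept an enum member, a string, or None and return a PlanPhase or None."""
    if plan_phase is None or isinstance(plan_phase, PlanPhase):
        return plan_phase  # enum members pass through unchanged
    try:
        return PlanPhase(plan_phase)
    except ValueError as exc:
        # Surface a domain error that names the offending value,
        # instead of leaking the raw ValueError from the Enum lookup.
        raise ValidationError(f"Invalid plan_phase: {plan_phase!r}") from exc
```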

Coverage moved from 96.9446% to 96.9587%, which rounds to 97.0% at one decimal place and passes the threshold.
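
The rounding behaviour behind this fix can be checked with a short calculation. This sketch assumes half-up rounding; the coverage tool's actual rounding mode is not stated in the thread, but neither value sits on a rounding boundary, so the mode does not change the result here.

```python
from decimal import Decimal, ROUND_HALF_UP


def rounded(value: str, step: str) -> Decimal:
    """Round a decimal string to the precision given by step ("1" or "0.1")."""
    return Decimal(value).quantize(Decimal(step), rounding=ROUND_HALF_UP)


THRESHOLD = Decimal("97")

before = "96.9446"
after = "96.9587"

before_whole = rounded(before, "1")      # precision of the displayed figure
before_one_dp = rounded(before, "0.1")   # precision the gate compares at
after_one_dp = rounded(after, "0.1")

gate_before = before_one_dp >= THRESHOLD  # False: 96.9 is below 97
gate_after = after_one_dp >= THRESHOLD    # True: 97.0 meets 97
```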

Quality Gates

Session	Result
lint	PASS
typecheck	PASS (0 errors)
unit_tests	PASS (11,250 scenarios, 0 failures)
integration_tests	PASS (1,563 tests, 0 failures)
e2e_tests	PASS (16 tests, 0 failures)
coverage_report	PASS (97.0%, threshold: 97%)

Closes #931


hurui200320 added this to the v3.3.0 milestone 2026-03-16 08:12:14 +00:00

hurui200320 added the Type Feature label 2026-03-16 08:12:15 +00:00

freemo reviewed 2026-03-16 16:14:08 +00:00

freemo left a comment

PM Day 36 Triage: Decision phase-gating implementation. Closes #931. M4 scope. Reviewer needed: @freemo (decision framework expert). Verify alignment with ADR-007 decision tree and ADR-033 decision recording protocol.

hurui200320 force-pushed feature/m4-decision-phase-gating from 031339fe06 to 20b6f0d5b9

2026-03-17 07:27:04 +00:00


hurui200320 commented

2026-03-17 08:04:00 +00:00

Self-QA Review — ✅ Approved

Iterations: 2 review/fix cycles
Final verdict: Approve

Cycle 1: 16 issues found → 16 fixed

The initial review identified 1 critical, 4 major, 6 minor, and 4 nit issues. The critical finding was that resource_selection was incorrectly excluded from EXECUTE_TYPES, violating ADR-007, ADR-033, and the specification. All 16 issues were fixed in a single amended commit.

Key fixes applied:

Critical: Added DecisionType.RESOURCE_SELECTION to EXECUTE_TYPES; updated BDD scenarios and consolidated feature member counts
Major: Documented subplan_spawn/subplan_parallel_spawn ADR divergence with ticket reference; added TOCTOU race condition comment; added 2 missing DB-resolution test scenarios (empty DB + Execute phase)
Minor: Fixed is_any_phase_type to use dynamic set membership; wrapped PlanPhase() in try/except for ValidationError; added DB error resilience; updated stale docstring table; added engine disposal in test cleanup; added PlanPhase enum direct-pass test
Nits: Replaced deprecated tempfile.mktemp(); changed to Mapping type hint; moved imports to top-level; added stderr checks in Robot tests

Cycle 2: 0 critical/major issues — Approved

All 16 previous fixes verified correct. 8 minor style/coverage gaps and 5 nits remain — all non-blocking (defensive code path coverage, Robot Framework conventions, cosmetic code patterns).

Quality Gates

Session	Result
`nox -e lint`	✅ PASS
`nox -e typecheck`	✅ PASS (0 errors)
`nox -e unit_tests`	✅ PASS (10,851 scenarios)
`nox -e integration_tests`	✅ PASS (1,517 tests)
`nox -e e2e_tests`	✅ PASS (4 tests)
`nox -e coverage_report`	✅ PASS (97%)

Full implementation notes posted on ticket #931 (https://git.cleverthis.com/cleveragents/cleveragents-core/issues/931#issuecomment-66057).


hurui200320 force-pushed feature/m4-decision-phase-gating from 20b6f0d5b9 to cb7edf2227

2026-03-17 08:24:47 +00:00


brent.edwards requested changes 2026-03-17 19:52:59 +00:00

Dismissed

brent.edwards left a comment

Code Review — PR #973 `feat(plan): enforce decision type phase-gating at recording time`

Reviewer: @brent.edwards | Size: L (+802/−9, 8 files) | Focus: Domain invariants, service design, backward compat

P1:must-fix (1)

1. except Exception too broad in _resolve_plan_phase
decision_service.py:~807 — The comment says "database errors" but catches everything including TypeError, AttributeError, and other programming errors. The opt-in contract ("don't break if DB is unavailable") should only absorb infrastructure failures. Narrow to (OperationalError, DatabaseError, OSError) — the same pattern PR #971 correctly uses in _build_skill_service.

P2:should-fix (3)

2. decision_service.py at 1010 lines — well over the 500-line guideline. _resolve_plan_phase + _validate_phase_gating (~50 lines) could be extracted to a PhaseGatingPolicy class or phase_gating.py module.

3. TOCTOU race is documented but not programmatically guarded — the code assumes SQLite single-writer serialization. If someone switches to PostgreSQL, this assumption silently breaks. Add a # TODO(pg-migration): marker.

4. resource_selection reclassification to both-phases is a semantic breaking change — is_strategize_type and is_execute_type now return True for more types. The ticket #931 rationale is documented in code, but this deserves a CHANGELOG entry as a behavioral change.

P3:nit (2)

5. ULID_PATTERN export added to __all__ is an unrelated drive-by fix.
6. decision.py at 514 lines — marginally over guideline, acceptable.

Positive Observations

DecisionPhaseViolationError with structured attributes (decision_type, plan_phase, allowed_types: frozenset) — excellent for programmatic error handling
Phase-gating is fully opt-in: no plan_phase + no UoW = skip — backward compatible
PHASE_ALLOWED_TYPES as Mapping[PlanPhase, frozenset[DecisionType]] — immutable, O(1) lookup
Behave scenario outlines cover all type×phase combinations — thorough
tempfile.mktemp() → tempfile.mkstemp() and uow.engine.dispose() before file deletion — good cleanup fixes
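
The two cleanup fixes praised in the last item can be sketched with the standard library alone. Here `sqlite3` stands in for the project's UnitOfWork, and closing the connection plays the role of `uow.engine.dispose()`; the helper names and the `plans` table are hypothetical.

```python
import os
import sqlite3
import tempfile


def make_temp_db() -> str:
    """Create a temp file atomically and return its path.

    mkstemp() creates the file itself, so no other process can claim the
    name between picking it and opening it (the race mktemp() allows).
    """
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)  # sqlite3 opens its own handle to the path
    return path


def cleanup(conn: sqlite3.Connection, path: str) -> None:
    """Release the database handle first, then delete the file."""
    conn.close()  # analogous to disposing the engine before deletion
    os.remove(path)


path = make_temp_db()
conn = sqlite3.connect(path)
conn.execute("CREATE TABLE plans (id TEXT PRIMARY KEY, phase TEXT)")
conn.commit()
cleanup(conn, path)
```

Releasing the handle before deleting matters most on platforms that refuse to remove a file that is still open.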

Verdict: REQUEST_CHANGES — P1-1 is a targeted fix.


brent.edwards requested changes 2026-03-17 20:36:41 +00:00

brent.edwards left a comment

Code Review Round 2 — PR #973 `feat(plan): enforce decision type phase-gating`

Reviewer: @brent.edwards | Focus: Verification of Round 1 fixes + deep second pass

Prior Findings: 0 of 6 resolved

All Round 1 findings remain open:

#	Finding	Status
P1-1	`except Exception` too broad in `_resolve_plan_phase`	OPEN — still bare `except Exception` at line ~807. Comment added explaining intent but catch scope unchanged.
P2-2	`decision_service.py` at 1010 lines	OPEN
P2-3	TOCTOU race — no programmatic SQLite assertion	OPEN (excellent inline comment added, but still aspirational not enforceable)
P2-4	`resource_selection` reclassification needs CHANGELOG	OPEN
P3-5	`ULID_PATTERN` export is unrelated	OPEN
P3-6	`decision.py` at 514 lines	OPEN

New Finding (1)

P2: is_any_phase_type BDD gap for 3 newly dual-phase types
The PR promotes resource_selection, subplan_spawn, and subplan_parallel_spawn from single-phase to both-phase, changing is_any_phase_type from False → True for all three. But the only BDD scenario testing is_any_phase_type uses user_intervention. If someone later removes one of these from a phase set, the regression goes undetected.
Fix: Add a parametrized scenario covering all 4 dual-phase types.
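
A sketch of the dynamic membership check and of the kind of parametrized test requested here. The 11 type names and the four dual-phase types come from this thread, and the thread confirms `strategy_choice` as a Strategize type and `tool_invocation` as an Execute type. The placement of the remaining single-phase types below is an assumption, chosen so that `EXECUTE_TYPES` has the 8 members the PR description mentions.

```python
from enum import Enum


class DecisionType(Enum):
    PROMPT_DEFINITION = "prompt_definition"
    INVARIANT_ENFORCED = "invariant_enforced"
    STRATEGY_CHOICE = "strategy_choice"
    IMPLEMENTATION_CHOICE = "implementation_choice"
    TOOL_INVOCATION = "tool_invocation"
    ERROR_RECOVERY = "error_recovery"
    VALIDATION_RESPONSE = "validation_response"
    RESOURCE_SELECTION = "resource_selection"
    SUBPLAN_SPAWN = "subplan_spawn"
    SUBPLAN_PARALLEL_SPAWN = "subplan_parallel_spawn"
    USER_INTERVENTION = "user_intervention"

    @property
    def is_any_phase_type(self) -> bool:
        # Derived from the sets, so it cannot drift from them.
        return self in STRATEGIZE_TYPES and self in EXECUTE_TYPES


# Assumed assignment of single-phase types (see the note above).
STRATEGIZE_TYPES = frozenset({
    DecisionType.PROMPT_DEFINITION,
    DecisionType.INVARIANT_ENFORCED,
    DecisionType.STRATEGY_CHOICE,
    DecisionType.RESOURCE_SELECTION,
    DecisionType.SUBPLAN_SPAWN,
    DecisionType.SUBPLAN_PARALLEL_SPAWN,
    DecisionType.USER_INTERVENTION,
})
EXECUTE_TYPES = frozenset({
    DecisionType.IMPLEMENTATION_CHOICE,
    DecisionType.TOOL_INVOCATION,
    DecisionType.ERROR_RECOVERY,
    DecisionType.VALIDATION_RESPONSE,
    DecisionType.RESOURCE_SELECTION,
    DecisionType.SUBPLAN_SPAWN,
    DecisionType.SUBPLAN_PARALLEL_SPAWN,
    DecisionType.USER_INTERVENTION,
})

# Expectation table, playing the role of a scenario outline's examples.
EXPECTED_DUAL_PHASE = {
    "resource_selection",
    "subplan_spawn",
    "subplan_parallel_spawn",
    "user_intervention",
}


def mismatches() -> list:
    """Return the types whose property disagrees with the expectation table."""
    return [
        dt.value
        for dt in DecisionType
        if dt.is_any_phase_type != (dt.value in EXPECTED_DUAL_PHASE)
    ]
```

If a later change removes one of the dual-phase types from either set, `mismatches()` names it, which is the regression the finding is worried about.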

Confirmed Clean from Second Pass

Area	Verdict
Phase constants (STRATEGIZE_TYPES ∪ EXECUTE_TYPES)	Correct — all 11 DecisionType members covered, no orphans
`record_decision` backward compat	Safe — `plan_phase` is keyword-only, defaults to `None`, gating skipped when `None`
`DecisionPhaseViolationError` upstream handling	Correct — caught by all upstream `BusinessRuleViolation` handlers (a2a, CLI, error_handling)
`is_any_phase_type` implementation	Dynamically correct — checks both-set membership for 4 types
Test quality	Good — real `DecisionService` with real `UnitOfWork` + SQLite, no mocking of service layer
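
The backward-compatibility row above depends on `plan_phase` being keyword-only with a `None` default. A simplified sketch of that opt-in shape follows, using plain strings instead of the project's enums. `phase_lookup` is a hypothetical stand-in for the UnitOfWork-backed lookup, and the boolean return value exists only to make the skip visible.

```python
from typing import Callable, Mapping, Optional

# Assumed, abbreviated allow-list keyed by phase name.
ALLOWED: Mapping[str, frozenset] = {
    "strategize": frozenset({"strategy_choice", "user_intervention"}),
    "execute": frozenset({"tool_invocation", "user_intervention"}),
}


class PhaseViolation(Exception):
    """Hypothetical stand-in for DecisionPhaseViolationError."""


def record_decision(
    decision_type: str,
    *,  # everything after this marker must be passed by keyword
    plan_phase: Optional[str] = None,
    phase_lookup: Optional[Callable[[], Optional[str]]] = None,
) -> bool:
    """Return True if gating ran, False if it was skipped."""
    phase = plan_phase
    if phase is None and phase_lookup is not None:
        phase = phase_lookup()  # fall back to the stored plan phase
    if phase is None:
        return False  # nothing to gate against: legacy callers land here
    allowed = ALLOWED.get(phase)
    if allowed is not None and decision_type not in allowed:
        raise PhaseViolation(f"{decision_type} not allowed during {phase}")
    return True
```

Existing call sites that pass only positional arguments keep working, and a stray positional `plan_phase` fails fast with a `TypeError` rather than being misread as another parameter.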

Verdict: REQUEST_CHANGES — P1-1 (except Exception at line ~807) remains the sole merge-blocker. Narrowing to except (OperationalError, DatabaseError, OSError) resolves it. All other findings are P2/P3 and can be tracked as follow-up.


hurui200320 force-pushed feature/m4-decision-phase-gating from cb7edf2227 to ef37aa3dfc

2026-03-18 07:22:28 +00:00


hurui200320 commented

2026-03-18 07:23:26 +00:00

Review Fix Summary — Addressing Rounds 1 & 2 (@brent.edwards)

All 7 findings from both review rounds have been addressed. Branch rebased onto latest master and force-pushed.

P1-1 (must-fix): `except Exception` too broad — ✅ RESOLVED

Narrowed the catch in resolve_plan_phase() from bare except Exception to except (DatabaseError, OperationalError, OSError). This follows the established codebase pattern from repositories.py:

DatabaseError (from cleveragents.core.exceptions) — catches repository-wrapped DB errors
OperationalError (from sqlalchemy.exc) — catches raw SQLAlchemy connection/operation errors from the UoW layer
OSError — catches filesystem-level SQLite access failures

Programming errors (TypeError, AttributeError, etc.) now correctly propagate instead of being silently swallowed.
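
The effect of the narrowed catch can be shown in isolation. In this sketch the two exception classes are local stand-ins for `cleveragents.core.exceptions.DatabaseError` and `sqlalchemy.exc.OperationalError`, and the `lookup` callable is a hypothetical replacement for the real repository call.

```python
from typing import Callable, Optional


class DatabaseError(Exception):
    """Stand-in for the project's repository-level error (assumption)."""


class OperationalError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError (assumption)."""


def resolve_plan_phase(lookup: Callable[[], Optional[str]]) -> Optional[str]:
    """Return the stored phase, or None when the database is unavailable."""
    try:
        return lookup()
    except (DatabaseError, OperationalError, OSError):
        # Infrastructure failure: skip gating rather than break the caller.
        return None
    # TypeError, AttributeError and similar are deliberately not caught,
    # so programming errors surface instead of silently disabling gating.


def broken_disk() -> Optional[str]:
    raise OSError("disk unavailable")


def buggy_lookup() -> Optional[str]:
    return None.phase  # AttributeError: a programming error
```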

P2-2 (should-fix): `decision_service.py` at 1010 lines — ✅ RESOLVED

Extracted phase-gating concern to new module cleveragents.application.services.phase_gating:

PHASE_ALLOWED_TYPES constant
resolve_plan_phase() (was _resolve_plan_phase instance method)
validate_phase_gating() (was _validate_phase_gating static method)

decision_service.py reduced from 1010 → 913 lines. The service imports and delegates to the new module. PHASE_ALLOWED_TYPES is re-exported in decision_service.__all__ for backward compatibility.

P2-3 (should-fix): TOCTOU race — no programmatic guard — ✅ RESOLVED

Added # TODO(pg-migration): marker to the TOCTOU comment with actionable guidance: "At minimum, a single-writer assertion or advisory lock should guard this section under multi-writer engines."
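
One possible shape for the single-writer assertion mentioned in that guidance is sketched below. Nothing like it exists in the PR; the function, the exception, and the dialect allow-list are hypothetical. With SQLAlchemy the dialect name would typically be read from `engine.dialect.name`, but the sketch takes a plain string so that it stays dependency-free.

```python
# Dialects assumed to serialize writers, which the check-then-write
# sequence in the gating code relies on.
SINGLE_WRITER_DIALECTS = frozenset({"sqlite"})


class MultiWriterEngineError(RuntimeError):
    """Raised when phase gating runs on an engine without writer serialization."""


def assert_single_writer(dialect_name: str) -> None:
    """Fail loudly instead of letting the TOCTOU assumption break silently."""
    if dialect_name not in SINGLE_WRITER_DIALECTS:
        raise MultiWriterEngineError(
            f"phase gating assumes a single-writer engine, got {dialect_name!r}; "
            "guard the check-then-write section with a lock before enabling it"
        )
```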

P2-4 (should-fix): `resource_selection` reclassification needs CHANGELOG — ✅ RESOLVED

Added CHANGELOG entry under ## Unreleased documenting the behavioral change: resource_selection reclassified from Execute-only to phase-agnostic, with impact note for code relying on is_strategize_type/is_execute_type.

P2-7 (new in Round 2): `is_any_phase_type` BDD gap — ✅ RESOLVED

Added 11 new Behave scenarios to consolidated_decision.feature:

Scenario Outline: All dual-phase types report is_any_phase_type true — parametrized over resource_selection, subplan_spawn, subplan_parallel_spawn, user_intervention (4 scenarios)
Scenario Outline: Single-phase types report is_any_phase_type false — parametrized over invariant_enforced, strategy_choice, implementation_choice, tool_invocation, error_recovery, validation_response (6 scenarios)
prompt_definition tested separately (existing root scenario) with added is_any_phase_type should be false assertion

P3-5 (nit): `ULID_PATTERN` export is unrelated drive-by — ✅ RESOLVED

Removed ULID_PATTERN from decision.py __all__.

P3-6 (nit): `decision.py` at 514 lines — No action

Reviewer accepted as marginally over guideline.

Quality Gates

Session	Result
lint	✅ PASS
typecheck	✅ PASS (0 errors)
unit_tests	✅ PASS (11,153 scenarios, 0 failures, 0 errors)
integration_tests	✅ PASS (1,563 tests, 0 failures)
e2e_tests	✅ PASS (16 tests, 0 failures)
coverage_report	✅ PASS (97%)


hurui200320 referenced this pull request

2026-03-18 07:23:50 +00:00

feat(plan): enforce decision type phase-gating at recording time #931

hurui200320 force-pushed feature/m4-decision-phase-gating from ef37aa3dfc to 1f016bea33

2026-03-18 07:29:13 +00:00


hurui200320 force-pushed feature/m4-decision-phase-gating from 1f016bea33 to 35eb7b762a

2026-03-18 08:26:06 +00:00


freemo approved these changes 2026-03-19 04:56:07 +00:00

Dismissed

freemo left a comment

Code Review — PR #973 `feat(plan): enforce decision type phase-gating at recording time`

Cleanly scoped feature with good architectural separation. The extraction of phase-gating logic into phase_gating.py (148 lines) with clear API boundary (resolve_plan_phase(), validate_phase_gating(), PHASE_ALLOWED_TYPES) is well-done. The opt-in design (gating skipped when neither plan_phase nor UoW is provided) ensures backward compatibility.

The resource_selection reclassification to phase-agnostic is a breaking behavioral change, but it's properly documented in the CHANGELOG with ADR references (ADR-007 L72, ADR-033 L74). The TOCTOU race condition is documented with a clear TODO(pg-migration) marker.

36 Behave scenarios + 6 Robot tests + proper exception hierarchy (DecisionPhaseViolationError(BusinessRuleViolation)) demonstrate thorough implementation.

Approved. No issues found.


freemo requested review from brent.edwards 2026-03-19 05:19:32 +00:00

freemo requested review from freemo 2026-03-19 05:19:32 +00:00

hurui200320 force-pushed feature/m4-decision-phase-gating from 35eb7b762a to 52f1bb2abb

2026-03-19 05:46:57 +00:00

hurui200320 dismissed freemo's review 2026-03-19 05:46:57 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-19 05:50:58 +00:00

hurui200320 force-pushed feature/m4-decision-phase-gating from 52f1bb2abb to 1ec6b2ac27

2026-03-19 06:56:28 +00:00

hurui200320 force-pushed feature/m4-decision-phase-gating from 1ec6b2ac27 to 231c3656e0

2026-03-19 07:10:54 +00:00

hurui200320 force-pushed feature/m4-decision-phase-gating from 231c3656e0 to 296daebe59

2026-03-19 07:14:58 +00:00

hurui200320 commented

2026-03-19 07:15:47 +00:00

Coverage Fix — Branch rebased and pushed

What changed

The coverage gate was failing at 96.9% (displayed as 97% but the precise value 96.9446% rounds to 96.9% at 1 decimal). Two untested paths in phase_gating.py were the gap:

1. Invalid plan_phase string → ValidationError: Added a scenario testing that plan_phase="not_a_real_phase" raises ValidationError with the invalid value in the message.
2. DB error resilience in resolve_plan_phase: Added a scenario that corrupts a real SQLite DB and verifies that resolve_plan_phase() catches the DatabaseError and returns None (graceful skip).
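
The second scenario can be approximated with the standard library alone. The sketch below is an illustration, not the Behave step added in this PR: it uses sqlite3 directly rather than the project's database layer, and the `plans` table, its `phase` column, and both function names are hypothetical.

```python
import os
import sqlite3
import tempfile
from typing import Optional, Tuple


def resolve_plan_phase_sketch(db_path: str, plan_id: str) -> Optional[str]:
    """Look up a plan's phase; return None if the database is unusable."""
    try:
        conn = sqlite3.connect(db_path)
        try:
            row = conn.execute(
                "SELECT phase FROM plans WHERE id = ?", (plan_id,)
            ).fetchone()
        finally:
            conn.close()
    except (sqlite3.DatabaseError, OSError):
        # Infrastructure failure only: skip gating instead of failing the write.
        # sqlite3.OperationalError is a subclass of sqlite3.DatabaseError.
        return None
    return row[0] if row else None


def demo_corruption() -> Tuple[Optional[str], Optional[str]]:
    """Return the resolved phase before and after corrupting the DB file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "plans.db")
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE plans (id TEXT PRIMARY KEY, phase TEXT)")
        conn.execute("INSERT INTO plans VALUES ('p1', 'execute')")
        conn.commit()
        conn.close()
        before = resolve_plan_phase_sketch(path, "p1")

        # Overwrite the file so it no longer has a valid SQLite header.
        with open(path, "wb") as fh:
            fh.write(b"not a sqlite database " * 200)
        after = resolve_plan_phase_sketch(path, "p1")
        return before, after
```

Catching a narrow tuple of infrastructure exceptions, rather than a bare except Exception, means a programming error such as a TypeError inside the lookup would still surface.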

Rebase

Branch rebased onto latest origin/master (cbf8bcc9). Resolved CHANGELOG.md conflict (kept both entries).

All quality gates pass (post-rebase)

✅ lint
✅ typecheck (0 errors)
✅ unit_tests (11,264 scenarios)
✅ integration_tests
✅ e2e_tests (37 tests)
✅ coverage_report: 97.0% (was 96.9%)
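
The 96.9% versus 97% discrepancy mentioned above is a rounding-precision effect, which a few lines of arithmetic make concrete. The 97.0 threshold below is an assumption inferred from the before and after figures in this comment; the gate's configured value is not stated here.

```python
from decimal import Decimal, ROUND_HALF_UP

precise = Decimal("96.9446")  # precise coverage value quoted in the comment

# Rounded to a whole percent, the value displays as 97.
whole_percent = precise.quantize(Decimal("1"), rounding=ROUND_HALF_UP)

# Rounded to one decimal place, it is 96.9.
one_decimal = precise.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)

# Assumed gate threshold (not confirmed by the source).
threshold = Decimal("97.0")
gate_passes = one_decimal >= threshold
```

With these values the whole-percent display reads 97 while the one-decimal comparison falls short of the assumed threshold, matching the failure described.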

CI running: https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/2381

hurui200320 added 1 commit 2026-03-19 07:45:36 +00:00

Merge branch 'master' into feature/m4-decision-phase-gating

CI / benchmark-publish (pull_request) Has been skipped

CI / lint (pull_request) Successful in 15s

CI / build (pull_request) Successful in 17s

CI / quality (pull_request) Successful in 30s

CI / security (pull_request) Successful in 51s

CI / typecheck (pull_request) Successful in 1m3s

CI / unit_tests (pull_request) Successful in 3m25s

CI / integration_tests (pull_request) Successful in 3m45s

CI / docker (pull_request) Successful in 55s

CI / e2e_tests (pull_request) Successful in 5m19s

CI / coverage (pull_request) Successful in 7m1s

CI / benchmark-regression (pull_request) Successful in 38m21s

a3706a393d

hurui200320 merged commit 3837327564 into master

2026-03-19 07:53:44 +00:00

hurui200320 deleted branch feature/m4-decision-phase-gating

2026-03-19 07:53:44 +00:00

hurui200320 referenced this issue from a commit

2026-03-19 07:53:45 +00:00

feat(plan): enforce decision type phase-gating at recording time (#973)

CoreRasurae referenced this pull request

2026-03-19 21:55:35 +00:00

refactor(cli): align actor run signature with spec positional args #1072

3 Participants

Reference: cleveragents/cleveragents-core#973
