UAT: Definition of Done (DoD) gating not enforced before Apply phase transition #3600

Open
opened 2026-04-05 20:16:54 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/dod-gating-not-enforced-apply-phase
  • Commit Message: fix(dod): wire DoDEvaluator into apply gate — DoD criteria not enforced before Apply phase transition
  • Milestone: v3.2.0
  • Parent Epic: #395

Background

The Definition of Done (DoD) is not evaluated or enforced before a plan is allowed to transition to the Apply phase. The DoDEvaluator, DoDSummary, and TextMatchEvaluator classes and the parse_dod_criteria function exist in the domain model (definition_of_done.py), but nothing in the application services layer calls them.

Per docs/timeline.md line 2007: "DoD gating enforcement (#178 merged 2026-02-25)" — the apply phase should be blocked when the plan's definition_of_done criteria have not been met. Issue #178 was closed as complete, but the implementation is missing from the application services layer.

Per the spec, the definition_of_done field on an action/plan defines testable completion criteria. Before a plan can transition to Apply, the DoD criteria must be evaluated and all must pass. If any DoD criterion fails, the apply should be blocked (similar to how required validation failures block apply).

The DoDSummary.to_validation_summary() method (definition_of_done.py, lines 196–208) already converts DoD results into the validation_summary format, which confirms that DoD evaluation was originally meant to feed the apply gate.
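For illustration only, here is a minimal sketch of what such a conversion could look like. It assumes that DoDSummary holds a list of per-criterion results and that validation_summary is a plain dict. The names CriterionResult, results, passed, message, dod_passed, and required_failures are hypothetical; only dod_evaluated comes from this issue's reproduction steps. The real method in definition_of_done.py may differ.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class CriterionResult:
    """Outcome of evaluating one DoD criterion (hypothetical shape)."""
    criterion: str
    passed: bool
    message: str = ""


@dataclass
class DoDSummary:
    """Hypothetical stand-in for the domain DoDSummary, not the real class."""
    results: list[CriterionResult] = field(default_factory=list)

    @property
    def all_passed(self) -> bool:
        # An empty criteria list counts as a pass (the pass-through case).
        return all(r.passed for r in self.results)

    def to_validation_summary(self) -> dict[str, Any]:
        # Failed DoD criteria are reported as *required* failures, so an apply
        # gate that already blocks on required failures also blocks on DoD.
        failures = [
            {"name": f"dod:{r.criterion}", "message": r.message or "DoD criterion not met"}
            for r in self.results
            if not r.passed
        ]
        return {
            "dod_evaluated": True,
            "dod_passed": self.all_passed,
            "required_failures": failures,
        }


summary = DoDSummary([CriterionResult("Coverage reaches 85%", passed=False, message="coverage is 0%")])
print(summary.to_validation_summary())
```

Tagging each DoD failure with a `dod:` prefix keeps DoD-derived entries distinguishable from ordinary validation failures after the two are merged.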

Actual Behavior

  1. apply_plan() in plan_lifecycle_service.py (lines 1679–1728) does not call any DoD evaluation — it only calls _run_invariant_reconciliation().
  2. apply_with_validation_gate() in plan_apply_service.py checks validation_summary for required validation failures but does not evaluate the definition_of_done text against any context.
  3. DoDEvaluator, parse_dod_criteria, and TextMatchEvaluator are exported from domain/models/core/__init__.py but are never imported or called from any application service.
  4. A plan with definition_of_done: "Coverage reaches 85%" can be applied even if coverage is 0%.

Expected Behavior

Before apply_plan() transitions a plan to the Apply phase:

  1. The plan's definition_of_done criteria are parsed via parse_dod_criteria().
  2. Each criterion is evaluated via DoDEvaluator / TextMatchEvaluator against the current plan context.
  3. The resulting DoDSummary is converted to a ValidationSummary via DoDSummary.to_validation_summary().
  4. If any DoD criterion fails, apply_with_validation_gate() blocks the apply and surfaces a clear error — identical to how required validation failures block apply today.
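The four steps above can be sketched end to end as follows. This is a hypothetical, self-contained model, not the project's code. The criteria format (one per line or per `;`), the `met_criteria` context key, the matching semantics given to TextMatchEvaluator, and the names evaluate_dod, apply_gate, and ApplyBlockedError are all assumptions made for illustration.

```python
import re
from typing import Any, Optional


class ApplyBlockedError(RuntimeError):
    """Hypothetical error raised when the apply gate refuses a transition."""


def parse_dod_criteria(text: Optional[str]) -> list[str]:
    """Split definition_of_done into criteria (assumed: one per line or per ';')."""
    criteria = []
    for part in re.split(r"[;\n]", text or ""):
        cleaned = part.strip().lstrip("-*").strip()  # drop list bullets
        if cleaned:
            criteria.append(cleaned)
    return criteria


class TextMatchEvaluator:
    """Assumed semantics: a criterion passes if the context lists it as met."""

    def evaluate(self, criterion: str, context: dict[str, Any]) -> bool:
        met = {item.casefold() for item in context.get("met_criteria", [])}
        return criterion.casefold() in met


def evaluate_dod(definition_of_done: Optional[str], context: dict[str, Any],
                 evaluator: Optional[TextMatchEvaluator] = None) -> dict[str, Any]:
    """Steps 1-3: parse, evaluate each criterion, build a validation_summary dict."""
    evaluator = evaluator or TextMatchEvaluator()
    failures = [
        {"name": f"dod:{c}", "message": f"Definition of Done not met: {c}"}
        for c in parse_dod_criteria(definition_of_done)
        if not evaluator.evaluate(c, context)
    ]
    return {"dod_evaluated": True, "required_failures": failures}


def apply_gate(validation_summary: dict[str, Any]) -> None:
    """Step 4: block on any required failure, whether DoD-derived or not."""
    failures = validation_summary.get("required_failures", [])
    if failures:
        messages = "; ".join(f["message"] for f in failures)
        raise ApplyBlockedError(f"Apply blocked: {messages}")


summary = evaluate_dod("Coverage reaches 85%", {"met_criteria": []})
try:
    apply_gate(summary)
except ApplyBlockedError as exc:
    print(exc)  # Apply blocked: Definition of Done not met: Coverage reaches 85%
```

Because DoD failures flow through the same `required_failures` list, the gate needs no DoD-specific branch. A plan with no criteria produces an empty list and passes through unchanged.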

Code Locations

  • /app/src/cleveragents/application/services/plan_lifecycle_service.py, lines 1679–1728: apply_plan() is missing the DoD check
  • /app/src/cleveragents/application/services/plan_apply_service.py: apply_with_validation_gate() is missing DoD evaluation
  • /app/src/cleveragents/domain/models/core/definition_of_done.py: DoD infrastructure exists but is unused

Steps to Reproduce

  1. Create an action with definition_of_done: "Coverage reaches 85%"
  2. Create a plan from the action
  3. Execute the plan (strategize + execute phases)
  4. Call apply_plan() — it succeeds even though DoD criteria were never evaluated
  5. Observe: validation_summary in the plan does not have dod_evaluated: True

Verification

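# Search application services for any usage of DoDEvaluator or parse_dod_criteria (result: 0 matches in /app/src/cleveragents/application/)
import pathlib
import re
hits = [p for p in pathlib.Path("/app/src/cleveragents/application").rglob("*.py") if re.search(r"DoDEvaluator|parse_dod_criteria", p.read_text())]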
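A plain text search also matches comments and strings. For the audit subtask, a stricter check can parse each module with the standard-library `ast` module and report only real `from ... import` statements, bare names, and attribute accesses that refer to any of the four DoD symbols. This is a sketch: the default path is taken from this issue, and the helper name find_dod_references is hypothetical.

```python
import ast
import pathlib
import sys

TARGETS = {"DoDEvaluator", "DoDSummary", "parse_dod_criteria", "TextMatchEvaluator"}


def find_dod_references(root: pathlib.Path) -> list[tuple[str, int, str]]:
    """Return (file, line, name) for every import or reference to a DoD symbol."""
    hits = []
    for path in sorted(root.rglob("*.py")):
        tree = ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        for node in ast.walk(tree):
            if isinstance(node, ast.ImportFrom):
                names = [alias.name for alias in node.names]
            elif isinstance(node, ast.Name):
                names = [node.id]
            elif isinstance(node, ast.Attribute):
                names = [node.attr]
            else:
                continue  # comments and string literals never reach this point
            for name in names:
                if name in TARGETS:
                    hits.append((str(path), node.lineno, name))
    return hits


if __name__ == "__main__":
    root = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else "/app/src/cleveragents/application")
    if root.is_dir():
        matches = find_dod_references(root)
        for file, line, name in matches:
            print(f"{file}:{line}: {name}")
        print(f"{len(matches)} matches")
    else:
        print(f"not a directory: {root}")
```

Once the fix lands, the same script run against the application package should report the new imports in plan_lifecycle_service.py and plan_apply_service.py.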

Subtasks

  • Audit plan_lifecycle_service.py::apply_plan() and plan_apply_service.py::apply_with_validation_gate() to confirm zero DoD calls
  • Wire parse_dod_criteria() into apply_plan() to extract DoD criteria from the plan's definition_of_done field
  • Instantiate and invoke DoDEvaluator / TextMatchEvaluator against the current plan context within the apply gate
  • Convert DoDSummary to ValidationSummary via DoDSummary.to_validation_summary() and merge into the plan's validation_summary
  • Ensure apply_with_validation_gate() treats DoD failures as blocking (same severity path as required validation failures)
  • Add Behave BDD scenarios: DoD gate pass, DoD gate fail (apply blocked), DoD gate with no criteria (pass-through)
  • Add Robot Framework integration test covering DoD gate CLI output on agents plan apply
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions) and fix any errors
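The merge and blocking subtasks interact in one subtle way: re-running the gate must replace earlier DoD results rather than append duplicates, and it must never drop ordinary validation failures. Below is a minimal sketch under stated assumptions. It assumes validation_summary is a dict with a `required_failures` list, and that DoD-derived entries are tagged with a hypothetical `dod:` name prefix. The function names are illustrative and do not belong to the project's API.

```python
from typing import Any

DOD_PREFIX = "dod:"  # hypothetical tag marking DoD-derived failures


def merge_dod_into_validation_summary(existing: dict[str, Any], dod: dict[str, Any]) -> dict[str, Any]:
    """Return a new validation_summary combining existing results with DoD results.

    Earlier DoD-derived failures are dropped first, so re-running the gate
    replaces, rather than duplicates, previous DoD results. `existing` is not mutated.
    """
    kept = [
        f for f in existing.get("required_failures", [])
        if not f["name"].startswith(DOD_PREFIX)
    ]
    merged = dict(existing)
    merged["required_failures"] = kept + list(dod.get("required_failures", []))
    merged["dod_evaluated"] = dod.get("dod_evaluated", False)
    return merged


def is_apply_blocked(validation_summary: dict[str, Any]) -> bool:
    # Same severity path: any required failure blocks apply, whatever its source.
    return bool(validation_summary.get("required_failures"))


existing = {"required_failures": [{"name": "lint", "message": "3 errors"}]}
dod = {"dod_evaluated": True,
       "required_failures": [{"name": "dod:Coverage reaches 85%", "message": "coverage is 0%"}]}
merged = merge_dod_into_validation_summary(existing, dod)
print(is_apply_blocked(merged))  # True
```

Returning a new dict instead of mutating the plan's summary in place keeps the merge easy to unit-test. It also leaves the persistence decision to the caller.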

Definition of Done

  • apply_plan() calls parse_dod_criteria() and DoDEvaluator before transitioning to Apply phase
  • apply_with_validation_gate() blocks apply when any DoD criterion fails, surfacing a clear error message
  • DoDSummary.to_validation_summary() is used to merge DoD results into the plan's validation_summary
  • Behave scenarios cover: DoD pass, DoD fail (blocked), no DoD criteria (pass-through)
  • Robot Framework integration test covers DoD gate CLI output
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Acting on behalf of: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.2.0 milestone 2026-04-05 20:16:59 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — Definition of Done (DoD) gating is not enforced before Apply phase transition. Plans can proceed to Apply without meeting their DoD criteria, which is a fundamental plan lifecycle violation.
  • Milestone: v3.2.0 (already assigned)
  • Story Points: 5 — L — Requires implementing DoD validation in the phase transition logic.
  • MoSCoW: Must Have — DoD gating is a core spec requirement for the plan lifecycle. Without it, plans can apply incomplete or invalid work. This directly impacts the integrity of the Strategize → Execute → Apply pipeline.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Blocks
#395 Epic: Validation & Quality Gating
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3600