UAT: PlanApplyService.apply_with_validation_gate() silently fails to transition plan state — returns outcome=applied while plan remains in Apply/QUEUED state #1910

Open
opened 2026-04-03 00:11:53 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/uat-plan-apply-service-silent-state-transition-failure
  • Commit Message: fix(plan): call start_apply() before complete_apply() in apply_with_validation_gate() to prevent silent state transition failure
  • Milestone: v3.2.0
  • Parent Epic: #372

Background and Context

PlanApplyService.apply_with_validation_gate() in src/cleveragents/application/services/plan_apply_service.py silently fails to transition the plan's processing state to APPLIED after a successful apply. The method returns ApplyResult(outcome=APPLIED) even though the plan remains in Apply/QUEUED state.

Root Cause:
The method calls self._lifecycle.complete_apply() directly without first calling self._lifecycle.start_apply(). The complete_apply() method requires the plan to be in PROCESSING state, but the plan is in QUEUED state when apply_with_validation_gate() is called. The resulting PlanError is caught and silently swallowed (logged at DEBUG level only), and the method returns a false success result.

Code Location:

  • src/cleveragents/application/services/plan_apply_service.py, lines ~632–651 (the apply_with_validation_gate method)
  • The complete_apply() call at line ~634 fails because start_apply() was never called
  • The exception is caught at line ~640 and silently swallowed

Expected Behavior (from spec and CLI implementation):
The correct flow (as implemented in src/cleveragents/cli/commands/plan.py lines ~826–832) is:

  1. Call start_apply(plan_id) — transitions plan from QUEUED to PROCESSING
  2. Call complete_apply(plan_id) — transitions plan from PROCESSING to APPLIED
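To make the ordering constraint concrete, here is a minimal in-memory sketch of the Apply-phase transitions. It is not the project's PlanLifecycleService. InMemoryLifecycle, add_plan() and the local ProcessingState and PlanError classes are hypothetical stand-ins. Only the guard described in the root cause is modelled: complete_apply() requires PROCESSING, and start_apply() moves a plan there from QUEUED.

```python
from enum import Enum


class ProcessingState(Enum):
    # Hypothetical stand-in for the project's ProcessingState; only the
    # states named in this issue are modelled.
    QUEUED = "queued"
    PROCESSING = "processing"
    APPLIED = "applied"


class PlanError(Exception):
    """Hypothetical stand-in for the project's PlanError."""


class InMemoryLifecycle:
    """Toy lifecycle enforcing QUEUED -> PROCESSING -> APPLIED (assumed model)."""

    def __init__(self) -> None:
        self.states: dict[str, ProcessingState] = {}

    def add_plan(self, plan_id: str) -> None:
        # In this sketch a plan enters the Apply phase in QUEUED state.
        self.states[plan_id] = ProcessingState.QUEUED

    def _transition(self, plan_id: str, required: ProcessingState,
                    target: ProcessingState) -> None:
        current = self.states[plan_id]
        if current is not required:
            raise PlanError(
                f"Plan {plan_id} is not {required.value} (current: {current.value})"
            )
        self.states[plan_id] = target

    def start_apply(self, plan_id: str) -> None:
        self._transition(plan_id, ProcessingState.QUEUED, ProcessingState.PROCESSING)

    def complete_apply(self, plan_id: str) -> None:
        self._transition(plan_id, ProcessingState.PROCESSING, ProcessingState.APPLIED)


lifecycle = InMemoryLifecycle()
lifecycle.add_plan("p1")
try:
    lifecycle.complete_apply("p1")  # skipping start_apply() is rejected
except PlanError as exc:
    print(exc)  # Plan p1 is not processing (current: queued)

lifecycle.start_apply("p1")     # QUEUED -> PROCESSING
lifecycle.complete_apply("p1")  # PROCESSING -> APPLIED
print(lifecycle.states["p1"])   # ProcessingState.APPLIED
```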

Actual Behavior:

  1. apply_with_validation_gate() calls complete_apply() directly (skipping start_apply())
  2. complete_apply() raises PlanError: Plan X is not processing (current: queued)
  3. The exception is caught and logged at DEBUG level: "Could not transition to applied (plan may not be in Apply phase)"
  4. The method returns ApplyResult(outcome=APPLIED) — a false success
  5. The plan remains stuck in Apply/QUEUED state indefinitely
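The false-success pattern in the steps above can be reproduced in a few lines. This sketch is hypothetical: ToyLifecycle and apply_buggy() only mimic the described behavior (complete_apply() called while the plan is QUEUED, PlanError caught and logged at DEBUG, success returned regardless). They are not the actual code in plan_apply_service.py.

```python
import logging

logger = logging.getLogger("plan_apply_sketch")


class PlanError(Exception):
    """Hypothetical stand-in for the project's PlanError."""


class ToyLifecycle:
    """Hypothetical single-plan lifecycle; only complete_apply() is modelled."""

    def __init__(self) -> None:
        self.state = "queued"  # state the plan is in when the gate is reached

    def complete_apply(self, plan_id: str) -> None:
        if self.state != "processing":
            raise PlanError(f"Plan {plan_id} is not processing (current: {self.state})")
        self.state = "applied"


def apply_buggy(lifecycle: ToyLifecycle, plan_id: str) -> str:
    # Mirrors the reported flow: start_apply() is skipped and the PlanError
    # is swallowed, so the caller always sees "applied".
    try:
        lifecycle.complete_apply(plan_id)
    except PlanError:
        logger.debug("Could not transition to applied (plan may not be in Apply phase)")
    return "applied"


lifecycle = ToyLifecycle()
outcome = apply_buggy(lifecycle, "p1")
print(outcome, lifecycle.state)  # applied queued
```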

Steps to Reproduce:

```python
from cleveragents.application.services.plan_apply_service import PlanApplyService
from cleveragents.application.services.plan_executor import PlanExecutor
from cleveragents.application.services.plan_lifecycle_service import PlanLifecycleService
from cleveragents.config.settings import Settings
from cleveragents.domain.models.core.plan import ProjectLink, ProcessingState

settings = Settings()
lifecycle = PlanLifecycleService(settings=settings)
action = lifecycle.create_action(
    name='apply-test', description='Test', definition_of_done='- Step 1',
    strategy_actor='local/strategy-stub', execution_actor='local/execute-stub',
)
plan = lifecycle.use_action(action_name='apply-test', project_links=[ProjectLink(project_name='test-project')])
plan_id = plan.identity.plan_id

executor = PlanExecutor(lifecycle_service=lifecycle)
executor.run_strategize(plan_id)
lifecycle.execute_plan(plan_id)
executor.run_execute(plan_id)
lifecycle.apply_plan(plan_id)

apply_service = PlanApplyService(lifecycle_service=lifecycle)
result = apply_service.apply_with_validation_gate(plan_id, allow_empty=True)

print(f'Result outcome: {result.outcome}')  # Prints: applied (FALSE SUCCESS)
plan_final = lifecycle.get_plan(plan_id)
print(f'Plan state: {plan_final.processing_state}')  # Prints: queued (BUG: should be applied)
```

Impact:

  • Any code path that uses apply_with_validation_gate() (rather than the CLI's manual start_apply + complete_apply sequence) will silently fail to apply the plan
  • Plans get stuck in Apply/QUEUED state permanently
  • Callers receive a false success result (outcome=APPLIED) with no indication of failure

Why Tests Don't Catch This:
The existing BDD tests for apply_with_validation_gate use mocked lifecycle services where complete_apply is patched to succeed regardless of plan state. The bug only manifests with a real (non-mocked) PlanLifecycleService.
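The sketch below illustrates why a mocked lifecycle hides the bug, using only the standard unittest.mock module. apply_buggy() and the local PlanError are hypothetical simplifications of the reported flow, not the real service. A MagicMock accepts any method call and never raises unless configured to, so an outcome-only assertion passes; asserting the recorded call sequence does expose the missing start_apply().

```python
from unittest.mock import MagicMock, call


class PlanError(Exception):
    """Hypothetical stand-in for the project's PlanError."""


def apply_buggy(lifecycle, plan_id: str) -> str:
    # Simplified stand-in for the reported flow: start_apply() is never called
    # and any PlanError from complete_apply() is swallowed.
    try:
        lifecycle.complete_apply(plan_id)
    except PlanError:
        pass
    return "applied"


mock_lifecycle = MagicMock()
outcome = apply_buggy(mock_lifecycle, "p1")

# The mock never enforces plan state, so this check passes despite the bug.
assert outcome == "applied"

# Checking the recorded call sequence reveals that start_apply() was skipped.
expected = [call.start_apply("p1"), call.complete_apply("p1")]
print(mock_lifecycle.mock_calls == expected)  # False
print(mock_lifecycle.mock_calls)              # [call.complete_apply('p1')]
```

A call-order assertion is only a cheap complement for mocked suites. The regression test requested below should still drive the flow through a real PlanLifecycleService so the state guard itself is exercised.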

Subtasks

  • Reproduce the bug with a real (non-mocked) PlanLifecycleService and confirm the false-success return value
  • Add a call to self._lifecycle.start_apply(plan_id) before complete_apply() in apply_with_validation_gate() in src/cleveragents/application/services/plan_apply_service.py
  • Verify the fix transitions the plan correctly through QUEUED → PROCESSING → APPLIED
  • Add or update BDD scenario(s) for apply_with_validation_gate using a real (non-mocked) PlanLifecycleService to prevent regression
  • Ensure exception handling around the state transition is tightened so silent swallowing of PlanError is no longer possible (raise or surface the error appropriately)
  • Run the full nox suite and confirm coverage ≥ 97%
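A minimal sketch of the intended fix, under stated assumptions: ToyLifecycle, ApplyResultSketch and apply_with_gate_fixed() are hypothetical stand-ins for PlanLifecycleService, ApplyResult and apply_with_validation_gate(), and plain strings replace the real ProcessingState enum. It calls start_apply() before complete_apply(), lets PlanError propagate instead of swallowing it, and returns an applied outcome only after checking the stored state.

```python
from dataclasses import dataclass


class PlanError(Exception):
    """Hypothetical stand-in for the project's PlanError."""


class ToyLifecycle:
    """Hypothetical in-memory lifecycle enforcing queued -> processing -> applied."""

    # operation name -> (required current state, resulting state)
    _STEPS = {
        "start_apply": ("queued", "processing"),
        "complete_apply": ("processing", "applied"),
    }

    def __init__(self) -> None:
        self.states: dict[str, str] = {}

    def _step(self, plan_id: str, op: str) -> None:
        required, target = self._STEPS[op]
        current = self.states[plan_id]
        if current != required:
            raise PlanError(f"Plan {plan_id} is not {required} (current: {current})")
        self.states[plan_id] = target

    def start_apply(self, plan_id: str) -> None:
        self._step(plan_id, "start_apply")

    def complete_apply(self, plan_id: str) -> None:
        self._step(plan_id, "complete_apply")


@dataclass
class ApplyResultSketch:
    outcome: str


def apply_with_gate_fixed(lifecycle: ToyLifecycle, plan_id: str) -> ApplyResultSketch:
    # Same order as the CLI flow: start_apply() first, then complete_apply().
    # Any PlanError propagates to the caller instead of being logged and dropped.
    lifecycle.start_apply(plan_id)
    lifecycle.complete_apply(plan_id)
    # Defensive post-condition: report success only if the state really changed.
    if lifecycle.states[plan_id] != "applied":
        raise PlanError(f"Plan {plan_id} did not reach applied state")
    return ApplyResultSketch(outcome="applied")


lifecycle = ToyLifecycle()
lifecycle.states["p1"] = "queued"
print(apply_with_gate_fixed(lifecycle, "p1").outcome, lifecycle.states["p1"])  # applied applied

# Re-applying an already-applied plan now fails loudly instead of faking success.
try:
    apply_with_gate_fixed(lifecycle, "p1")
except PlanError as exc:
    print(exc)  # Plan p1 is not queued (current: applied)
```

Raising is only one option. Returning a non-APPLIED result that carries the error would also satisfy the subtask, as long as the failure is visible to the caller.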

Definition of Done

  • apply_with_validation_gate() correctly calls start_apply() before complete_apply()
  • The plan transitions to Apply/APPLIED (not Apply/QUEUED) after a successful apply via apply_with_validation_gate()
  • ApplyResult(outcome=APPLIED) is only returned when the plan state has actually been transitioned to APPLIED
  • A regression test using a real PlanLifecycleService (not a mock) covers this exact flow
  • No PlanError from state transition is silently swallowed in apply_with_validation_gate()
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

freemo added this to the v3.2.0 milestone 2026-04-03 00:12:17 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • MoSCoW: MoSCoW/Should Have — bug or error handling improvement.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Reference
cleveragents/cleveragents-core#1910