UAT: PlanApplyService.apply_with_validation_gate silently swallows complete_apply failure, reporting false APPLIED status #3702

Open
opened 2026-04-05 22:14:49 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/plan-apply-service-false-success
  • Commit Message: fix(plan): call start_apply before complete_apply in apply_with_validation_gate
  • Milestone: None (Backlog)
  • Parent Epic: #362

Background and Context

PlanApplyService.apply_with_validation_gate in src/cleveragents/application/services/plan_apply_service.py (lines 482–662) calls self._lifecycle.complete_apply() directly without first calling self._lifecycle.start_apply(). complete_apply() requires ProcessingState.PROCESSING (enforced at line 1790 of plan_lifecycle_service.py), but the plan is still in ProcessingState.QUEUED after apply_plan() transitions it into the Apply phase, so complete_apply() raises a PlanError. The exception is caught and silently swallowed (lines 642–648), yet the method still returns ApplyResult(outcome=ApplyOutcome.APPLIED, ...), reporting success when the plan was NOT actually applied.

This was discovered during UAT testing of the Plan Lifecycle feature area.

Current Behavior

When apply_with_validation_gate is called on a plan in Apply/QUEUED state:

  1. persist_apply_summary() is called (no state change)
  2. complete_apply() is called — raises PlanError because plan is in QUEUED not PROCESSING
  3. The PlanError is caught and swallowed (lines 642–648 of plan_apply_service.py)
  4. The method returns ApplyResult(outcome=ApplyOutcome.APPLIED, ...), which is a false success
  5. The plan remains in Apply/QUEUED state, not APPLIED

# plan_apply_service.py lines 633-648 (the bug)
try:
    self._lifecycle.complete_apply(
        plan_id,
        files_changed=files_changed,
        ...
    )
except (PlanError, ValidationError):
    # If lifecycle transition fails, attempt rollback
    self._try_rollback(plan_id)
    self._logger.debug(
        "Could not transition to applied (plan may not be in Apply phase)",
        plan_id=plan_id,
    )
# Returns APPLIED outcome even though complete_apply failed!
return ApplyResult(outcome=ApplyOutcome.APPLIED, ...)
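
The control flow above can be reproduced in isolation. The following is a minimal sketch, not the project's code: FakeLifecycle, buggy_apply, and the string return value are hypothetical stand-ins, and only the state guard in complete_apply and the fall-through after the except block are modelled on the behavior described in this issue.

```python
from enum import Enum


class ProcessingState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    APPLIED = "applied"


class PlanError(Exception):
    """Stand-in for the project's PlanError (hypothetical definition)."""


class FakeLifecycle:
    """Hypothetical stand-in for the lifecycle service; tracks a single plan state."""

    def __init__(self):
        # State the issue reports after apply_plan(): Apply/QUEUED.
        self.state = ProcessingState.QUEUED

    def start_apply(self, plan_id):
        self.state = ProcessingState.PROCESSING

    def complete_apply(self, plan_id):
        # Mirrors the guard described in the issue: PROCESSING is required.
        if self.state is not ProcessingState.PROCESSING:
            raise PlanError(f"plan {plan_id} is {self.state.name}, not PROCESSING")
        self.state = ProcessingState.APPLIED


def buggy_apply(lifecycle, plan_id):
    """Same shape as the reported bug: the except block falls through to a success return."""
    try:
        lifecycle.complete_apply(plan_id)
    except PlanError:
        pass  # swallowed; the real code also attempts a rollback and logs at debug level
    return "APPLIED"


lifecycle = FakeLifecycle()
outcome = buggy_apply(lifecycle, "plan-1")
print(outcome, lifecycle.state.name)  # prints "APPLIED QUEUED"
```

The reported outcome and the actual plan state disagree, which is the mismatch described in steps 4 and 5.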

Expected Behavior

apply_with_validation_gate should call start_apply() before complete_apply() to properly transition the plan through Apply/PROCESSING → Apply/APPLIED. The correct sequence is:

  1. self._lifecycle.start_apply(plan_id) → Apply/PROCESSING
  2. self._lifecycle.complete_apply(plan_id, ...) → Apply/APPLIED

Alternatively, if complete_apply fails, the method should NOT return ApplyOutcome.APPLIED — it should propagate the error or return ApplyOutcome.CONSTRAINED.
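
A sketch of the expected sequence, under the assumption that returning ApplyOutcome.CONSTRAINED is the chosen failure handling (propagating the error is the other option named above). FakeLifecycle, its fail_on_complete flag, and apply_with_gate are hypothetical stand-ins for illustration; the real method takes more arguments and returns an ApplyResult.

```python
from enum import Enum


class ProcessingState(Enum):
    QUEUED = "queued"
    PROCESSING = "processing"
    APPLIED = "applied"


class ApplyOutcome(Enum):
    APPLIED = "applied"
    CONSTRAINED = "constrained"


class PlanError(Exception):
    """Stand-in for the project's PlanError (hypothetical definition)."""


class FakeLifecycle:
    """Hypothetical lifecycle stand-in; fail_on_complete simulates a failing transition."""

    def __init__(self, fail_on_complete=False):
        self.state = ProcessingState.QUEUED
        self.fail_on_complete = fail_on_complete

    def start_apply(self, plan_id):
        self.state = ProcessingState.PROCESSING

    def complete_apply(self, plan_id):
        if self.fail_on_complete or self.state is not ProcessingState.PROCESSING:
            raise PlanError(f"cannot complete apply for plan {plan_id}")
        self.state = ProcessingState.APPLIED


def apply_with_gate(lifecycle, plan_id):
    """Transition QUEUED -> PROCESSING -> APPLIED; never report APPLIED on failure."""
    lifecycle.start_apply(plan_id)
    try:
        lifecycle.complete_apply(plan_id)
    except PlanError:
        # The failure is surfaced through the outcome instead of being swallowed.
        return ApplyOutcome.CONSTRAINED
    return ApplyOutcome.APPLIED


ok = FakeLifecycle()
ok_outcome = apply_with_gate(ok, "plan-1")

failing = FakeLifecycle(fail_on_complete=True)
failed_outcome = apply_with_gate(failing, "plan-2")
```

With this shape, APPLIED is only reachable when complete_apply returns normally, so the outcome cannot diverge from the plan state.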

Steps to Reproduce

  1. Create an action and use it to create a plan
  2. Run the plan through Strategize and Execute phases
  3. Transition to Apply phase via apply_plan()
  4. Call apply_with_validation_gate() with a valid changeset and no failed validations
  5. Observe: method returns ApplyResult(outcome=APPLIED) but plan is still in Apply/QUEUED state

Code Location

  • File: src/cleveragents/application/services/plan_apply_service.py
  • Lines: 482–662 (apply_with_validation_gate method)
  • Related: src/cleveragents/application/services/plan_lifecycle_service.py lines 1730–1750 (start_apply), 1752–1843 (complete_apply)

Subtasks

  • Add self._lifecycle.start_apply(plan_id) call before complete_apply in apply_with_validation_gate
  • Remove the silent exception swallowing or change it to propagate the error properly
  • Add/update unit test in features/ to verify the correct state transition sequence
  • Verify nox -e unit_tests passes
  • Verify nox -e typecheck passes
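
For the test subtask, one possible way to assert the call order is with unittest.mock from the standard library. This is only a sketch: apply_with_gate is a hypothetical simplification of the method under test, and the project's actual test layout under features/ may call for a different framework.

```python
from unittest.mock import Mock, call


def apply_with_gate(lifecycle, plan_id):
    """Hypothetical simplified method under test."""
    lifecycle.start_apply(plan_id)
    lifecycle.complete_apply(plan_id)


lifecycle = Mock()
apply_with_gate(lifecycle, "plan-1")

# mock_calls records method calls in the order they happened,
# so this fails if complete_apply runs first or start_apply is missing.
expected_calls = [call.start_apply("plan-1"), call.complete_apply("plan-1")]
assert lifecycle.mock_calls == expected_calls
```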

Definition of Done

  • apply_with_validation_gate calls start_apply before complete_apply
  • If complete_apply fails, the method does NOT return ApplyOutcome.APPLIED
  • Unit test coverage for the correct Apply phase state transition sequence
  • All nox quality gates pass
  • PR merged

Backlog note: This issue was discovered during autonomous operation
on milestone v3.3.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3702