UAT: PlanResumeService.resume_plan directly mutates processing_state bypassing lifecycle service methods, skipping pre-flight guardrails and invariant reconciliation #3728

Open
opened 2026-04-05 22:20:17 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/plan-resume-lifecycle-methods
  • Commit Message: fix(plan): use proper lifecycle start methods in PlanResumeService.resume_plan
  • Milestone: None (Backlog)
  • Parent Epic: #362

Background and Context

PlanResumeService.resume_plan in src/cleveragents/application/services/plan_resume_service.py (lines 218–277) directly mutates plan.processing_state = ProcessingState.PROCESSING without calling the appropriate lifecycle service methods (start_strategize, start_execute, or start_apply). This bypasses critical safety checks that are required by the spec.

This was discovered during UAT testing of the Plan Lifecycle feature area.

Current Behavior

resume_plan (lines 256–267) directly sets state:

if plan.processing_state == ProcessingState.ERRORED:
    plan.processing_state = ProcessingState.PROCESSING
    plan.error_message = None
    plan.timestamps.updated_at = datetime.now()
    self._lifecycle._commit_plan(plan)

elif plan.processing_state == ProcessingState.QUEUED:
    plan.processing_state = ProcessingState.PROCESSING
    plan.timestamps.updated_at = datetime.now()
    self._lifecycle._commit_plan(plan)

This bypasses:

  1. Pre-flight guardrail checks (7 checks in start_strategize including action availability, actor registry, automation profile validation)
  2. Invariant reconciliation (Invariant Reconciliation Actor is not invoked)
  3. Phase-specific timestamp updates (e.g., execute_started_at is not set)
  4. Decision recording (no decision is recorded for the resume action)
  5. Event emission (no PLAN_STATE_CHANGED event is emitted)
  6. Encapsulation (_commit_plan is a private method of the lifecycle service, yet it is called here from an external service)

Expected Behavior

resume_plan should call the appropriate phase start method based on the current plan phase:

if plan.phase == PlanPhase.STRATEGIZE:
    self._lifecycle.start_strategize(plan_id)
elif plan.phase == PlanPhase.EXECUTE:
    self._lifecycle.start_execute(plan_id)
elif plan.phase == PlanPhase.APPLY:
    self._lifecycle.start_apply(plan_id)

This ensures all lifecycle invariants, guardrails, and side effects are properly applied when resuming a plan.
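The dispatch above has no fallback for a phase without a start method. A minimal, self-contained sketch of one way to make that case fail loudly is shown below. The `PlanPhase` members mirror the names used in this issue, but `RecordingLifecycle` and `resume_via_lifecycle` are hypothetical stand-ins for illustration, not the project's actual classes or signatures.

```python
from enum import Enum, auto


class PlanPhase(Enum):
    """Stand-in for the project's PlanPhase enum (members assumed from this issue)."""
    STRATEGIZE = auto()
    EXECUTE = auto()
    APPLY = auto()


class RecordingLifecycle:
    """Hypothetical stub that only records which start_* method was called."""

    def __init__(self):
        self.calls = []

    def start_strategize(self, plan_id):
        self.calls.append(("start_strategize", plan_id))

    def start_execute(self, plan_id):
        self.calls.append(("start_execute", plan_id))

    def start_apply(self, plan_id):
        self.calls.append(("start_apply", plan_id))


def resume_via_lifecycle(lifecycle, plan_id, phase):
    """Delegate a resume to the public start method for the given phase."""
    starters = {
        PlanPhase.STRATEGIZE: lifecycle.start_strategize,
        PlanPhase.EXECUTE: lifecycle.start_execute,
        PlanPhase.APPLY: lifecycle.start_apply,
    }
    try:
        starter = starters[phase]
    except KeyError:
        # Refuse to resume rather than silently mutating state.
        raise ValueError(
            f"cannot resume plan {plan_id!r} in phase {phase!r}"
        ) from None
    return starter(plan_id)
```

Because only public `start_*` methods are called, the resume service no longer needs to reach into `_commit_plan`.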

Steps to Reproduce

  1. Create a plan and start the Execute phase
  2. Interrupt execution (set plan to ERRORED state)
  3. Call PlanResumeService.resume_plan(plan_id)
  4. Observe: plan transitions to PROCESSING without pre-flight checks, invariant reconciliation, or event emission
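The reproduction can be approximated in isolation. The sketch below copies the two branches quoted under Current Behavior into a free function and runs them against a `unittest.mock.MagicMock` in place of the lifecycle service. The `ProcessingState` enum and the plan object are simplified stand-ins (assumptions), so this illustrates the symptom rather than serving as a test against the real service.

```python
from datetime import datetime
from enum import Enum, auto
from types import SimpleNamespace
from unittest.mock import MagicMock


class ProcessingState(Enum):
    """Stand-in enum; only the members mentioned in this issue."""
    QUEUED = auto()
    PROCESSING = auto()
    ERRORED = auto()


def resume_plan_current(lifecycle, plan):
    """Mirrors the direct-mutation branches quoted under Current Behavior."""
    if plan.processing_state == ProcessingState.ERRORED:
        plan.processing_state = ProcessingState.PROCESSING
        plan.error_message = None
        plan.timestamps.updated_at = datetime.now()
        lifecycle._commit_plan(plan)
    elif plan.processing_state == ProcessingState.QUEUED:
        plan.processing_state = ProcessingState.PROCESSING
        plan.timestamps.updated_at = datetime.now()
        lifecycle._commit_plan(plan)


# Hypothetical plan interrupted during the Execute phase.
plan = SimpleNamespace(
    processing_state=ProcessingState.ERRORED,
    error_message="interrupted",
    timestamps=SimpleNamespace(updated_at=None, execute_started_at=None),
)
lifecycle = MagicMock()

resume_plan_current(lifecycle, plan)

# The plan is now PROCESSING, yet no start_* method ever ran.
lifecycle.start_execute.assert_not_called()
lifecycle._commit_plan.assert_called_once_with(plan)
```

A regression test for the fix would invert the last two assertions: `start_execute` called once, `_commit_plan` never called from the resume service.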

Code Location

  • File: src/cleveragents/application/services/plan_resume_service.py
  • Lines: 218–277 (resume_plan method)
  • Related: src/cleveragents/application/services/plan_lifecycle_service.py (start_strategize, start_execute, start_apply)

Subtasks

  • Replace direct plan.processing_state = ProcessingState.PROCESSING mutations with calls to the appropriate start_* lifecycle method
  • Handle the case where the plan is in ERRORED state (may need to reset error_message before calling start method)
  • Add/update unit test in features/ to verify pre-flight checks are run on resume
  • Verify nox -e unit_tests passes
  • Verify nox -e typecheck passes
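For the ERRORED subtask, one possible approach (an assumption, not something the spec or this issue prescribes) is to clear the stale error before delegating and to restore it if the start method refuses the transition, so the original failure is not lost. `GuardrailError` and `resume_errored_plan` are hypothetical names used only for this sketch.

```python
from types import SimpleNamespace


class GuardrailError(Exception):
    """Hypothetical exception standing in for a failed pre-flight check."""


def resume_errored_plan(start_method, plan, plan_id):
    """Clear the stale error, delegate, and restore the error if the start is refused."""
    previous_error = plan.error_message
    plan.error_message = None
    try:
        return start_method(plan_id)
    except GuardrailError:
        # Keep the original failure visible when the resume is rejected.
        plan.error_message = previous_error
        raise


# Usage with a stand-in plan and a start method that accepts the resume.
plan = SimpleNamespace(error_message="interrupted")
resume_errored_plan(lambda plan_id: None, plan, "plan-1")
```

Whether the real start methods expect `error_message` to be cleared beforehand, or clear it themselves, needs to be checked against plan_lifecycle_service.py.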

Definition of Done

  • resume_plan calls start_strategize, start_execute, or start_apply based on current phase
  • Pre-flight guardrails are enforced on resume
  • Invariant reconciliation runs on resume
  • Phase-specific timestamps are set correctly
  • Unit test coverage for resume lifecycle method delegation
  • All nox quality gates pass
  • PR merged

Backlog note: This issue was discovered during autonomous operation on milestone v3.3.0. It does not block milestone completion and has been placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium (confirmed) — This is a correctness bug in plan lifecycle management. It bypasses guardrails and invariant reconciliation on resume, but only affects the resume path (not the primary execution path).
  • Milestone: Recommend v3.3.0 — This directly relates to the Corrections + Subplans + Checkpoints milestone. Plan resume is a core lifecycle operation that must respect the same guardrails as initial execution.
  • Story Points: 5 — L — Requires refactoring the resume_plan method to delegate to lifecycle service methods, handling error state reset edge cases, and adding Behave tests for the guardrail enforcement path.
  • MoSCoW: Should Have — The spec requires lifecycle guardrails to be enforced on all state transitions. Bypassing them on resume is a spec violation, but the resume path is not the primary user flow. Important to fix but not blocking the milestone demo.
  • Parent Epic: #362 (Epic: Security & Safety Hardening) — Correct, since this is about enforcing safety guardrails.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3728