UAT: CrossPlanCorrectionService._rollback_completed_actions() is a stub — cascade atomicity guarantee broken #3720

Open
opened 2026-04-05 22:17:53 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/cross-plan-correction-rollback-stub
  • Commit Message: fix(services): implement _rollback_completed_actions in CrossPlanCorrectionService — restore cascade atomicity
  • Milestone: (none — backlog)
  • Parent Epic: #394

Background

CrossPlanCorrectionService.execute_cascade() is documented as atomic: "if any action fails mid-way, previously completed actions are undone before raising." However, the _rollback_completed_actions() method that is supposed to implement this undo logic is a stub — it only logs a message and never actually calls cancel_child_plan or rollback_child_plan_sandbox to undo the completed actions.

This means the atomicity guarantee is entirely broken: if a cascade fails mid-way, already-cancelled child plans are not restored, leaving the system in an inconsistent state.

Affected File

  • src/cleveragents/application/services/cross_plan_correction_service.py

Code Location

CrossPlanCorrectionService._rollback_completed_actions() (near end of file):

def _rollback_completed_actions(
    self, completed_actions: list[CascadeAction]
) -> None:
    """Undo previously completed cascade actions for atomicity.

    Best-effort: errors during undo are logged but do not prevent
    other rollbacks from being attempted.

    Args:
        completed_actions: Actions that were successfully executed.
    """
    for action in reversed(completed_actions):
        try:
            logger.info(
                "cross_plan_correction.rollback_action",
                child_plan_id=action.child_plan_id,
            )
        except Exception as rollback_exc:
            logger.error(
                "cross_plan_correction.rollback_action_failed",
                child_plan_id=action.child_plan_id,
                error=str(rollback_exc),
            )

The try block only calls logger.info() — it never calls self._plan_canceller.cancel_child_plan() or self._sandbox_rollbacker.rollback_child_plan_sandbox() to actually undo the actions.

Steps to Reproduce

  1. Create a CrossPlanCorrectionService with mock dependencies
  2. Set up 3 child plans: plan_A (NOT_STARTED), plan_B (IN_PROGRESS), plan_C (IN_PROGRESS)
  3. Make the cancel_child_plan call for plan_C raise a RuntimeError
  4. Call execute_cascade(correction_id, [plan_A_id, plan_B_id, plan_C_id])
  5. Observe that the exception is raised (correct) but plan_A and plan_B are NOT restored (incorrect — atomicity broken)

Expected Behaviour

When execute_cascade() fails mid-way (e.g., plan_C's cancellation raises), _rollback_completed_actions() should:

  1. Iterate completed_actions in reverse order
  2. For each action, attempt to restore the child plan (e.g., by re-activating it or signalling it was incorrectly cancelled)
  3. For cancel_and_rollback actions, also attempt to undo the sandbox rollback

Actual Behaviour

_rollback_completed_actions() only logs a message. No actual undo operations are performed. The system is left in an inconsistent state where some child plans are cancelled but the parent correction was not applied.

Subtasks

  • Implement actual undo logic in _rollback_completed_actions() — call appropriate reversal operations on _plan_canceller and _sandbox_rollbacker
  • Define a ChildPlanRestorer protocol (or extend existing protocols) to support undo operations
  • Add a Behave scenario in features/cross_plan_correction.feature that verifies atomicity: mid-cascade failure leaves no child plans in cancelled state
  • Verify nox -e unit_tests passes

Definition of Done

  • _rollback_completed_actions() performs actual undo operations (not just logging)
  • Atomicity is verified by a Behave test scenario
  • nox -e unit_tests passes
  • PR merged

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/cross-plan-correction-rollback-stub` - **Commit Message**: `fix(services): implement _rollback_completed_actions in CrossPlanCorrectionService — restore cascade atomicity` - **Milestone**: *(none — backlog)* - **Parent Epic**: #394 ## Background `CrossPlanCorrectionService.execute_cascade()` is documented as **atomic**: "if any action fails mid-way, previously completed actions are undone before raising." However, the `_rollback_completed_actions()` method that is supposed to implement this undo logic is a **stub** — it only logs a message and never actually calls `cancel_child_plan` or `rollback_child_plan_sandbox` to undo the completed actions. This means the atomicity guarantee is entirely broken: if a cascade fails mid-way, already-cancelled child plans are **not** restored, leaving the system in an inconsistent state. ## Affected File - `src/cleveragents/application/services/cross_plan_correction_service.py` ## Code Location `CrossPlanCorrectionService._rollback_completed_actions()` (near end of file): ```python def _rollback_completed_actions( self, completed_actions: list[CascadeAction] ) -> None: """Undo previously completed cascade actions for atomicity. Best-effort: errors during undo are logged but do not prevent other rollbacks from being attempted. Args: completed_actions: Actions that were successfully executed. """ for action in reversed(completed_actions): try: logger.info( "cross_plan_correction.rollback_action", child_plan_id=action.child_plan_id, ) except Exception as rollback_exc: logger.error( "cross_plan_correction.rollback_action_failed", child_plan_id=action.child_plan_id, error=str(rollback_exc), ) ``` The `try` block only calls `logger.info()` — it never calls `self._plan_canceller.cancel_child_plan()` or `self._sandbox_rollbacker.rollback_child_plan_sandbox()` to actually undo the actions. ## Steps to Reproduce 1. Create a `CrossPlanCorrectionService` with mock dependencies 2. Set up 3 child plans: plan_A (NOT_STARTED), plan_B (IN_PROGRESS), plan_C (IN_PROGRESS) 3. Make the `cancel_child_plan` call for plan_C raise a `RuntimeError` 4. Call `execute_cascade(correction_id, [plan_A_id, plan_B_id, plan_C_id])` 5. Observe that the exception is raised (correct) but plan_A and plan_B are NOT restored (incorrect — atomicity broken) ## Expected Behaviour When `execute_cascade()` fails mid-way (e.g., plan_C's cancellation raises), `_rollback_completed_actions()` should: 1. Iterate `completed_actions` in reverse order 2. For each action, attempt to restore the child plan (e.g., by re-activating it or signalling it was incorrectly cancelled) 3. For `cancel_and_rollback` actions, also attempt to undo the sandbox rollback ## Actual Behaviour `_rollback_completed_actions()` only logs a message. No actual undo operations are performed. The system is left in an inconsistent state where some child plans are cancelled but the parent correction was not applied. ## Subtasks - [ ] Implement actual undo logic in `_rollback_completed_actions()` — call appropriate reversal operations on `_plan_canceller` and `_sandbox_rollbacker` - [ ] Define a `ChildPlanRestorer` protocol (or extend existing protocols) to support undo operations - [ ] Add a Behave scenario in `features/cross_plan_correction.feature` that verifies atomicity: mid-cascade failure leaves no child plans in cancelled state - [ ] Verify `nox -e unit_tests` passes ## Definition of Done - `_rollback_completed_actions()` performs actual undo operations (not just logging) - Atomicity is verified by a Behave test scenario - `nox -e unit_tests` passes - PR merged > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.5.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-new-issue-creator
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#394 Epic: Decision Framework
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3720
No description provided.