BUG-HUNT: error-handling — CrossPlanCorrectionService._rollback_completed_actions() is a no-op #7712

Open
opened 2026-04-12 03:19:38 +00:00 by HAL9000 · 4 comments
Owner

Bug Report: error-handling — CrossPlanCorrectionService._rollback_completed_actions() is a no-op

Severity Assessment

  • Impact: The promised atomicity guarantee of execute_cascade() is completely broken. If any cascade action fails mid-way, the method calls _rollback_completed_actions() but nothing is actually undone — previously cancelled child plans remain cancelled and rolled-back sandboxes remain rolled back. The docstring claims "All cascade operations are atomic" but the compensating rollback is empty.
  • Likelihood: Any cross-plan cascade that fails partway through (e.g., a sandbox rollback raises RuntimeError) will silently leave the system in a partially-applied, inconsistent state.
  • Priority: Critical

Location

  • File: src/cleveragents/application/services/cross_plan_correction_service.py
  • Function/Class: CrossPlanCorrectionService._rollback_completed_actions
  • Lines: 395–417

Description

execute_cascade() claims atomicity: if any child-plan action fails, all completed actions are undone. However, _rollback_completed_actions() only calls logger.info(...) — it does not actually call _plan_canceller, _sandbox_rollbacker, or any undo mechanism. The try/except inside iterates completed actions and logs, but the except block only fires on logger failures (which also just logs). No actual compensation occurs.

Evidence

def _rollback_completed_actions(
    self, completed_actions: list[CascadeAction]
) -> None:
    """Undo previously completed cascade actions for atomicity.

    Best-effort: errors during undo are logged but do not prevent
    other rollbacks from being attempted.
    """
    for action in reversed(completed_actions):
        try:
            logger.info(
                "cross_plan_correction.rollback_action",
                child_plan_id=action.child_plan_id,
            )
        except Exception as rollback_exc:
            logger.error(
                "cross_plan_correction.rollback_action_failed",
                child_plan_id=action.child_plan_id,
                error=str(rollback_exc),
            )

The try block only logs; no undo call is made. The except catches logger failures, not rollback failures, because there is no rollback call to fail.

Expected Behavior

When execute_cascade() fails mid-cascade, all previously completed cancel/rollback actions should be reversed (e.g., by re-spawning cancelled plans or restoring sandbox state).

Actual Behavior

The rollback is entirely skipped. Child plans that were already cancelled remain cancelled. The correction is re-raised as if rollback succeeded, leaving the system in an inconsistent state with no compensating action taken.

Suggested Fix

The _rollback_completed_actions method must contain actual compensation calls. Since plan-cancellation typically cannot be perfectly reversed, at minimum the method should record the compensation attempt or raise a RollbackError. For full atomicity, each completed action should be reversed:

def _rollback_completed_actions(self, completed_actions: list[CascadeAction]) -> None:
    for action in reversed(completed_actions):
        try:
            # Re-open / restore cancelled plan
            self._plan_canceller.restore_child_plan(action.child_plan_id)  # or equivalent
            if action.sandbox_rolled_back:
                self._sandbox_rollbacker.restore_child_plan_sandbox(action.child_plan_id)
            logger.info(...)
        except Exception as rollback_exc:
            logger.error(...)

Category

error-handling

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

## Bug Report: error-handling — `CrossPlanCorrectionService._rollback_completed_actions()` is a no-op ### Severity Assessment - **Impact**: The promised atomicity guarantee of `execute_cascade()` is completely broken. If any cascade action fails mid-way, the method calls `_rollback_completed_actions()` but nothing is actually undone — previously cancelled child plans remain cancelled and rolled-back sandboxes remain rolled back. The docstring claims "All cascade operations are atomic" but the compensating rollback is empty. - **Likelihood**: Any cross-plan cascade that fails partway through (e.g., a sandbox rollback raises `RuntimeError`) will silently leave the system in a partially-applied, inconsistent state. - **Priority**: Critical ### Location - **File**: `src/cleveragents/application/services/cross_plan_correction_service.py` - **Function/Class**: `CrossPlanCorrectionService._rollback_completed_actions` - **Lines**: 395–417 ### Description `execute_cascade()` claims atomicity: if any child-plan action fails, all completed actions are undone. However, `_rollback_completed_actions()` only calls `logger.info(...)` — it does **not** actually call `_plan_canceller`, `_sandbox_rollbacker`, or any undo mechanism. The `try/except` inside iterates completed actions and logs, but the `except` block only fires on logger failures (which also just logs). No actual compensation occurs. ### Evidence ```python def _rollback_completed_actions( self, completed_actions: list[CascadeAction] ) -> None: """Undo previously completed cascade actions for atomicity. Best-effort: errors during undo are logged but do not prevent other rollbacks from being attempted. """ for action in reversed(completed_actions): try: logger.info( "cross_plan_correction.rollback_action", child_plan_id=action.child_plan_id, ) except Exception as rollback_exc: logger.error( "cross_plan_correction.rollback_action_failed", child_plan_id=action.child_plan_id, error=str(rollback_exc), ) ``` The `try` block only logs; **no undo call is made**. The `except` catches logger failures, not rollback failures, because there is no rollback call to fail. ### Expected Behavior When `execute_cascade()` fails mid-cascade, all previously completed cancel/rollback actions should be reversed (e.g., by re-spawning cancelled plans or restoring sandbox state). ### Actual Behavior The rollback is entirely skipped. Child plans that were already cancelled remain cancelled. The correction is re-raised as if rollback succeeded, leaving the system in an inconsistent state with no compensating action taken. ### Suggested Fix The `_rollback_completed_actions` method must contain actual compensation calls. Since plan-cancellation typically cannot be perfectly reversed, at minimum the method should record the compensation attempt or raise a `RollbackError`. For full atomicity, each completed action should be reversed: ```python def _rollback_completed_actions(self, completed_actions: list[CascadeAction]) -> None: for action in reversed(completed_actions): try: # Re-open / restore cancelled plan self._plan_canceller.restore_child_plan(action.child_plan_id) # or equivalent if action.sandbox_rolled_back: self._sandbox_rollbacker.restore_child_plan_sandbox(action.child_plan_id) logger.info(...) except Exception as rollback_exc: logger.error(...) ``` ### Category error-handling ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: bug-hunter
HAL9000 added this to the v3.2.0 milestone 2026-04-12 03:24:18 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Critical — broken atomicity guarantee in cascade correction; any mid-cascade failure leaves system in inconsistent state with no compensation
  • Milestone: v3.2.0 — CrossPlanCorrectionService is part of the correction engine required for M3 (plan correct functionality)
  • Story Points: 5 — L — requires implementing actual compensation logic with proper undo calls for each cascade action type
  • MoSCoW: Must Have — plan correction atomicity is a core correctness guarantee; broken rollback means data corruption risk
  • Parent Epic: Correction engine epic (v3.2.0 scope)

The _rollback_completed_actions() method is entirely a no-op — it only logs. This means execute_cascade() atomicity claim is false. Any cascade failure leaves child plans cancelled with no recovery path.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: Critical — broken atomicity guarantee in cascade correction; any mid-cascade failure leaves system in inconsistent state with no compensation - **Milestone**: v3.2.0 — CrossPlanCorrectionService is part of the correction engine required for M3 (plan correct functionality) - **Story Points**: 5 — L — requires implementing actual compensation logic with proper undo calls for each cascade action type - **MoSCoW**: Must Have — plan correction atomicity is a core correctness guarantee; broken rollback means data corruption risk - **Parent Epic**: Correction engine epic (v3.2.0 scope) The `_rollback_completed_actions()` method is entirely a no-op — it only logs. This means `execute_cascade()` atomicity claim is false. Any cascade failure leaves child plans cancelled with no recovery path. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner
Author
Owner

Verified — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — rollback never actually happens. MoSCoW: Must-have. Priority: High. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7712
No description provided.