CrossPlanCorrectionService._rollback_completed_actions() is a no-op — atomic cascade guarantee violated #9376

Closed
opened 2026-04-14 16:16:43 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit Message: fix(correction): implement actual undo logic in _rollback_completed_actions to restore atomic cascade guarantee
  • Branch: fix/cross-plan-cascade-atomic-rollback

Background and Context

The CrossPlanCorrectionService in src/cleveragents/application/services/cross_plan_correction_service.py documents and advertises atomic cascade semantics: "All cascading actions are atomic: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state."

The execute_cascade() method calls _rollback_completed_actions() when a mid-cascade failure occurs. However, the _rollback_completed_actions() method is a no-op — it only calls logger.info() inside the try block and never actually invokes any undo operations on the _plan_canceller or _sandbox_rollbacker dependencies.

This was discovered during UAT testing of the Cross-Plan Correction Cascading feature (features/cross_plan_correction.feature).

Current Behavior

When a cascade fails mid-way (e.g., CP2 fails to cancel after CP1 was already cancelled), the _rollback_completed_actions() method is called but performs no actual undo. The previously cancelled child plans (CP1) remain in their cancelled state, leaving the system in a partially mutated, inconsistent state.

Relevant code in src/cleveragents/application/services/cross_plan_correction_service.py (method _rollback_completed_actions, lines ~395–417):

def _rollback_completed_actions(
    self, completed_actions: list[CascadeAction]
) -> None:
    for action in reversed(completed_actions):
        try:
            logger.info(
                "cross_plan_correction.rollback_action",
                child_plan_id=action.child_plan_id,
            )
        except Exception as rollback_exc:
            logger.error(
                "cross_plan_correction.rollback_action_failed",
                child_plan_id=action.child_plan_id,
                error=str(rollback_exc),
            )

The try block contains only a logger.info() call — which cannot raise — making the except block unreachable and the entire method a no-op.

Expected Behavior

_rollback_completed_actions() must actually undo the completed cascade actions in reverse order. For each completed CascadeAction:

  • If the action was cancel or cancel_and_rollback, the undo should attempt to re-activate or restore the child plan (or at minimum signal the failure state clearly).
  • The method should call the appropriate dependency (_plan_canceller, _sandbox_rollbacker) to reverse the completed work.
  • Errors during undo should be logged but should not prevent other undos from being attempted (best-effort semantics, as the docstring states).

The feature specification (features/cross_plan_correction.feature, Scenario: "Atomic cascade — failure during one cancel rolls back all previous cancels") requires that when CP2 fails, the cascade error is raised and CP1's cancellation is rolled back.

Acceptance Criteria

  • _rollback_completed_actions() calls the appropriate undo operations on _plan_canceller and/or _sandbox_rollbacker for each completed action in reverse order
  • The method is no longer a no-op — it performs real undo work
  • Errors during individual undo steps are caught and logged (best-effort), but do not prevent other undos from running
  • The BDD scenario "Atomic cascade — failure during one cancel rolls back all previous cancels" in features/cross_plan_correction.feature passes and verifies the rollback actually occurred
  • All existing cross_plan_correction.feature scenarios continue to pass
  • nox -s unit_tests -- features/cross_plan_correction.feature passes with no failures

Supporting Information

  • File: src/cleveragents/application/services/cross_plan_correction_service.py
  • Method: CrossPlanCorrectionService._rollback_completed_actions()
  • Feature file: features/cross_plan_correction.feature
  • Spec reference: Feature docstring: "All cascading actions are atomic: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state."
  • Discovered by: UAT Test Pool — Cross-Plan Correction Cascading feature area test

Subtasks

  • Implement actual undo logic in _rollback_completed_actions() — call _plan_canceller and/or _sandbox_rollbacker to reverse completed actions
  • Add/update BDD step to verify rollback actually occurred (not just that an error was raised)
  • Run nox -s unit_tests -- features/cross_plan_correction.feature and confirm all scenarios pass
  • Verify coverage ≥97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot Supervisor: UAT Test Pool | Agent: uat-test-pool-supervisor

## Metadata - **Commit Message**: `fix(correction): implement actual undo logic in _rollback_completed_actions to restore atomic cascade guarantee` - **Branch**: `fix/cross-plan-cascade-atomic-rollback` ## Background and Context The `CrossPlanCorrectionService` in `src/cleveragents/application/services/cross_plan_correction_service.py` documents and advertises **atomic cascade semantics**: "All cascading actions are atomic: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state." The `execute_cascade()` method calls `_rollback_completed_actions()` when a mid-cascade failure occurs. However, the `_rollback_completed_actions()` method is a **no-op** — it only calls `logger.info()` inside the `try` block and never actually invokes any undo operations on the `_plan_canceller` or `_sandbox_rollbacker` dependencies. This was discovered during UAT testing of the Cross-Plan Correction Cascading feature (`features/cross_plan_correction.feature`). ## Current Behavior When a cascade fails mid-way (e.g., `CP2` fails to cancel after `CP1` was already cancelled), the `_rollback_completed_actions()` method is called but performs no actual undo. The previously cancelled child plans (`CP1`) remain in their cancelled state, leaving the system in a partially mutated, inconsistent state. Relevant code in `src/cleveragents/application/services/cross_plan_correction_service.py` (method `_rollback_completed_actions`, lines ~395–417): ```python def _rollback_completed_actions( self, completed_actions: list[CascadeAction] ) -> None: for action in reversed(completed_actions): try: logger.info( "cross_plan_correction.rollback_action", child_plan_id=action.child_plan_id, ) except Exception as rollback_exc: logger.error( "cross_plan_correction.rollback_action_failed", child_plan_id=action.child_plan_id, error=str(rollback_exc), ) ``` The `try` block contains only a `logger.info()` call — which cannot raise — making the `except` block unreachable and the entire method a no-op. ## Expected Behavior `_rollback_completed_actions()` must actually undo the completed cascade actions in reverse order. For each completed `CascadeAction`: - If the action was `cancel` or `cancel_and_rollback`, the undo should attempt to re-activate or restore the child plan (or at minimum signal the failure state clearly). - The method should call the appropriate dependency (`_plan_canceller`, `_sandbox_rollbacker`) to reverse the completed work. - Errors during undo should be logged but should not prevent other undos from being attempted (best-effort semantics, as the docstring states). The feature specification (`features/cross_plan_correction.feature`, Scenario: "Atomic cascade — failure during one cancel rolls back all previous cancels") requires that when `CP2` fails, the cascade error is raised and `CP1`'s cancellation is rolled back. ## Acceptance Criteria - [ ] `_rollback_completed_actions()` calls the appropriate undo operations on `_plan_canceller` and/or `_sandbox_rollbacker` for each completed action in reverse order - [ ] The method is no longer a no-op — it performs real undo work - [ ] Errors during individual undo steps are caught and logged (best-effort), but do not prevent other undos from running - [ ] The BDD scenario "Atomic cascade — failure during one cancel rolls back all previous cancels" in `features/cross_plan_correction.feature` passes and verifies the rollback actually occurred - [ ] All existing `cross_plan_correction.feature` scenarios continue to pass - [ ] `nox -s unit_tests -- features/cross_plan_correction.feature` passes with no failures ## Supporting Information - **File**: `src/cleveragents/application/services/cross_plan_correction_service.py` - **Method**: `CrossPlanCorrectionService._rollback_completed_actions()` - **Feature file**: `features/cross_plan_correction.feature` - **Spec reference**: Feature docstring: "All cascading actions are **atomic**: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state." - **Discovered by**: UAT Test Pool — Cross-Plan Correction Cascading feature area test ## Subtasks - [ ] Implement actual undo logic in `_rollback_completed_actions()` — call `_plan_canceller` and/or `_sandbox_rollbacker` to reverse completed actions - [ ] Add/update BDD step to verify rollback actually occurred (not just that an error was raised) - [ ] Run `nox -s unit_tests -- features/cross_plan_correction.feature` and confirm all scenarios pass - [ ] Verify coverage ≥97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. --- **Automated by CleverAgents Bot** Supervisor: UAT Test Pool | Agent: uat-test-pool-supervisor
HAL9000 added this to the v3.3.0 milestone 2026-04-14 16:22:35 +00:00
Author
Owner

Triage: Verified [AUTO-OWNR-1]

Valid critical bug: CrossPlanCorrectionService._rollback_completed_actions() is a no-op — it only calls logger.info() inside the try block and never invokes any actual undo operations on _plan_canceller or _sandbox_rollbacker. This violates the documented atomic cascade guarantee: "All cascading actions are atomic: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state."

When a cascade fails mid-way, the previously completed actions are NOT rolled back, leaving the system in a partially mutated, inconsistent state. This is a data integrity bug.

Assigning to v3.3.0 (Corrections + Subplans + Checkpoints) as cross-plan correction is a core M4 feature. Priority High — the atomic cascade guarantee is violated, leading to data inconsistency.

MoSCoW: Must Have — atomic cascade semantics are a core correctness guarantee for the correction workflow. Without it, failed corrections leave the system in an inconsistent state.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Triage: Verified** [AUTO-OWNR-1] Valid critical bug: `CrossPlanCorrectionService._rollback_completed_actions()` is a no-op — it only calls `logger.info()` inside the `try` block and never invokes any actual undo operations on `_plan_canceller` or `_sandbox_rollbacker`. This violates the documented atomic cascade guarantee: "All cascading actions are atomic: either every child plan action succeeds, or the entire cascade is rolled back to the pre-correction state." When a cascade fails mid-way, the previously completed actions are NOT rolled back, leaving the system in a partially mutated, inconsistent state. This is a data integrity bug. Assigning to **v3.3.0** (Corrections + Subplans + Checkpoints) as cross-plan correction is a core M4 feature. Priority **High** — the atomic cascade guarantee is violated, leading to data inconsistency. MoSCoW: **Must Have** — atomic cascade semantics are a core correctness guarantee for the correction workflow. Without it, failed corrections leave the system in an inconsistent state. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core#9376
No description provided.