BUG: [data-flow] Potential for Inconsistent State in actor remove — Missing Transactional Boundary #3753

Open
opened 2026-04-05 22:28:34 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/actor-remove-transactional-boundary
  • Commit Message: fix(cli): wrap actor remove impact assessment and deletion in transactional boundary
  • Milestone: (none — see backlog note below)
  • Parent Epic: #392

Description

The remove command in src/cleveragents/cli/commands/actor.py (lines 781–841) first computes the impact of removing an actor (session count, active plan count, action count) and then performs the removal. These are two separate, non-atomic operations, and there is no transactional boundary around them.

If the removal fails after the impact assessment has already been computed and displayed, the reported impact no longer matches the actual state of the system. Under concurrent access or a partial failure, this can mislead the operator about what was actually affected.
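
The partial-failure case can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `actors`, `sessions`, `compute_impact`, `remove_actor`, `PartialFailure`, and `remove_without_boundary` are stand-ins invented for illustration, and the real `remove_actor` may not fail in this way. It only shows how an impact count computed up front can diverge from the state left behind when nothing rolls back a half-finished removal.

```python
class PartialFailure(Exception):
    """Hypothetical error raised midway through a removal."""


# Hypothetical in-memory state standing in for the real registry/service.
actors = {"alice"}
sessions = {"alice": ["s1", "s2"]}


def compute_impact(name: str) -> int:
    """Count the sessions that removing `name` would affect."""
    return len(sessions.get(name, []))


def remove_actor(name: str) -> None:
    """Two-step removal that fails between its writes (assumed behaviour)."""
    sessions.pop(name, None)  # step 1: dependent sessions are deleted
    raise PartialFailure(f"failed before deleting actor {name!r}")
    # step 2 (never reached): actors.discard(name)


def remove_without_boundary(name: str) -> tuple[int, int]:
    """Mirror the current flow: compute impact, then remove, with no rollback."""
    reported = compute_impact(name)
    try:
        remove_actor(name)
    except PartialFailure:
        pass  # nothing restores the deleted sessions
    return reported, compute_impact(name)


reported, actual = remove_without_boundary("alice")
print(reported, actual, "alice" in actors)  # prints: 2 0 True
```

The reported impact is 2 sessions, yet after the failure the actor still exists with 0 sessions, so neither "removed" nor "untouched" describes the system.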

Evidence

@app.command()
def remove(name: Annotated[str, typer.Argument(help="Actor name to remove")]) -> None:
    service, registry = _get_services()
    try:
        # Compute impact counts before removal so the display reflects
        # what was actually affected at the time of removal.
        session_count, active_plan_count, action_count = _compute_actor_impact(name)

        # Perform the removal
        if registry:
            registry.remove_actor(name)
        else:
            service.remove_actor(name)
        # ...
    except (ValidationError, BusinessRuleViolation, NotFoundError) as exc:
        console.print(f"[red]Error:[/red] {exc}")
        raise typer.Abort() from exc

Expected Behaviour

The impact assessment and the removal operation must be wrapped in a single transactional boundary (Unit of Work). If the removal fails, the transaction must be rolled back so the system remains in a consistent state and no misleading impact summary is surfaced to the user.

Suggested Fix

Introduce a Unit of Work / transaction context around both _compute_actor_impact and the remove_actor call. Leverage the existing UoW pattern already present in the service layer (per the specification's Repository + Unit of Work design patterns) so that either both operations succeed atomically or neither is committed.
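
As a rough sketch of the shape this could take, the example below wraps both steps in one context manager that restores state on failure. All names here are assumptions made for illustration (`FakeActorStore`, `InMemoryUnitOfWork`, `RemovalError`, `remove_actor_atomically`); none of them is taken from the cleveragents codebase, and the UoW actually provided by the service layer may have a different interface and commit semantics.

```python
from dataclasses import dataclass, field


class RemovalError(Exception):
    """Stand-in for the domain errors raised by remove_actor (hypothetical)."""


@dataclass
class FakeActorStore:
    """Hypothetical in-memory store mapping actor name -> impact counts."""

    actors: dict[str, tuple[int, int, int]] = field(default_factory=dict)
    fail_on_remove: bool = False

    def compute_impact(self, name: str) -> tuple[int, int, int]:
        return self.actors[name]

    def remove_actor(self, name: str) -> None:
        del self.actors[name]
        if self.fail_on_remove:
            # Simulate a failure that occurs after a write has happened.
            raise RemovalError(f"could not remove {name!r}")


class InMemoryUnitOfWork:
    """Hypothetical UoW: snapshot state on entry, restore it on error."""

    def __init__(self, store: FakeActorStore) -> None:
        self.store = store
        self._snapshot: dict[str, tuple[int, int, int]] = {}

    def __enter__(self) -> "InMemoryUnitOfWork":
        self._snapshot = dict(self.store.actors)
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        if exc_type is not None:
            self.store.actors = self._snapshot  # roll back
        return False  # let the caller see the original exception


def remove_actor_atomically(
    store: FakeActorStore, name: str
) -> tuple[int, int, int]:
    """Compute impact and remove inside one boundary; return impact on success."""
    with InMemoryUnitOfWork(store):
        impact = store.compute_impact(name)
        store.remove_actor(name)
    # Reached only if the block committed, so the summary is safe to display.
    return impact
```

With this shape the impact tuple is only returned, and therefore only displayed, after the removal has succeeded; on failure the exception propagates and the store is left as it was before the command ran.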

Subtasks

  • Add a TDD issue-capture Behave scenario tagged @tdd_expected_fail that demonstrates the inconsistent-state bug (impact computed but removal fails)
  • Identify the correct Unit of Work / transaction context available in the service layer for actor operations
  • Refactor remove in src/cleveragents/cli/commands/actor.py to wrap _compute_actor_impact and remove_actor inside a single transactional boundary
  • Remove the @tdd_expected_fail tag once the fix is in place and verify the scenario passes
  • Add / update Robot Framework integration test to cover the failure-rollback path
  • Ensure nox -e typecheck passes (no # type: ignore suppressions)
  • Ensure nox -e lint passes
  • Ensure nox -e unit_tests and nox -e integration_tests pass
  • Verify nox -e coverage_report reports coverage ≥ 97%

Definition of Done

  • TDD capture scenario exists and is tagged @tdd_expected_fail before the fix
  • remove in actor.py wraps impact assessment and deletion in a single transactional boundary
  • On removal failure the transaction is rolled back and no impact summary is displayed
  • All Behave unit scenarios pass (nox -e unit_tests)
  • All Robot Framework integration tests pass (nox -e integration_tests)
  • All nox stages pass
  • Coverage ≥ 97%
  • PR merged and this issue closed

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

Blocks
#392 Epic: Actor YAML & Compiler
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3753