UAT: ActorService.remove_actor() silently swallows event bus emission exceptions — violates fail-fast principle #3729

Open
opened 2026-04-05 22:20:36 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: fix/actor-service-remove-event-fail-fast
  • Commit Message: fix(actor): propagate event bus exceptions in ActorService.remove_actor() per fail-fast principle
  • Milestone: None (Backlog)
  • Parent Epic: #230

Background and Context

CONTRIBUTING.md states:

"Do not suppress errors. Exceptions should propagate to the top-level execution where they can be logged and handled."
"Only catch exceptions when they can be meaningfully handled (e.g., for retry logic or resource cleanup). Do not catch exceptions just to log and re-raise."

The ActorService.remove_actor() method in src/cleveragents/application/services/actor_service.py catches all exceptions from event bus emission and only logs a warning, silently swallowing the error.

Current Behavior

ActorService.remove_actor() (approximately lines 130–155 of actor_service.py):

def remove_actor(self, name: str) -> None:
    """Remove a custom actor if allowed."""
    normalized = self._normalize_name(name, allow_built_in=False)
    with self.unit_of_work.transaction() as ctx:
        actor = ctx.actors.get_by_name(normalized)
        if not actor:
            raise NotFoundError(resource_type="actor", resource_id=normalized)
        try:
            ctx.actors.delete(normalized)
        except ValueError as exc:
            raise BusinessRuleViolation(str(exc)) from exc
    if self._event_bus is not None:
        try:
            self._event_bus.emit(
                DomainEvent(
                    event_type=EventType.ENTITY_DELETED,
                    details={
                        "entity_type": "actor",
                        "entity_name": normalized,
                    },
                )
            )
        except Exception:
            _logger.warning("audit_emit_failed", event_type="ENTITY_DELETED")  # BUG: swallows exception

When self._event_bus.emit() raises any exception, it is caught by the broad except Exception clause and only a warning is logged. The exception is not re-raised.

Expected Behavior

Per CONTRIBUTING.md's fail-fast principle, exceptions from event bus emission should propagate to the caller unless there is a specific, meaningful reason to suppress them (e.g., the event bus is optional/best-effort and the spec explicitly allows silent failure).

If the event bus emission is intended to be best-effort (i.e., actor removal should succeed even if the event cannot be emitted), this must be explicitly documented and the exception handling must be intentional. The current code provides no such documentation.

The correct approach is either:

  1. Remove the try/except and let exceptions propagate (fail-fast), OR
  2. Document the intent that event emission is best-effort and add a specific exception type (not bare except Exception) with a clear comment explaining why silent failure is acceptable here.

Impact

  • Event bus failures are silently swallowed, making it impossible to detect event bus connectivity issues during actor removal.
  • The SessionManager (which listens for actor removal events to invalidate sessions) may not receive the event, leading to stale session state.
  • Violates CONTRIBUTING.md's explicit prohibition on catching exceptions just to log and re-raise (or in this case, just to log and discard).

Code Location

src/cleveragents/application/services/actor_service.pyremove_actor() method, event emission block (approximately lines 145–155)

Subtasks

  • Write a TDD failing Behave scenario (@tdd_expected_fail) asserting that exceptions from event_bus.emit() propagate out of remove_actor()
  • Determine the intended behavior: is event emission best-effort or required?
    • If required: Remove the try/except block entirely and let exceptions propagate
    • If best-effort: Replace except Exception with a specific exception type, add a docstring comment explaining why silent failure is acceptable, and log at error level (not warning)
  • Remove @tdd_expected_fail tag once fix is in place
  • Verify nox -e typecheck passes
  • Verify nox -e unit_tests passes with coverage >= 97%
  • Run nox (all default sessions), fix any errors

Definition of Done

  • ActorService.remove_actor() either propagates event bus exceptions or has explicit, documented justification for suppressing them
  • No bare except Exception clause that silently discards exceptions without re-raising
  • Behave BDD scenarios cover the event emission behavior
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous UAT testing of the Actor System feature area. It does not block milestone completion and has been placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/actor-service-remove-event-fail-fast` - **Commit Message**: `fix(actor): propagate event bus exceptions in ActorService.remove_actor() per fail-fast principle` - **Milestone**: None (Backlog) - **Parent Epic**: #230 ## Background and Context `CONTRIBUTING.md` states: > "Do not suppress errors. Exceptions should propagate to the top-level execution where they can be logged and handled." > "Only catch exceptions when they can be meaningfully handled (e.g., for retry logic or resource cleanup). Do not catch exceptions just to log and re-raise." The `ActorService.remove_actor()` method in `src/cleveragents/application/services/actor_service.py` catches all exceptions from event bus emission and only logs a warning, silently swallowing the error. ## Current Behavior `ActorService.remove_actor()` (approximately lines 130–155 of `actor_service.py`): ```python def remove_actor(self, name: str) -> None: """Remove a custom actor if allowed.""" normalized = self._normalize_name(name, allow_built_in=False) with self.unit_of_work.transaction() as ctx: actor = ctx.actors.get_by_name(normalized) if not actor: raise NotFoundError(resource_type="actor", resource_id=normalized) try: ctx.actors.delete(normalized) except ValueError as exc: raise BusinessRuleViolation(str(exc)) from exc if self._event_bus is not None: try: self._event_bus.emit( DomainEvent( event_type=EventType.ENTITY_DELETED, details={ "entity_type": "actor", "entity_name": normalized, }, ) ) except Exception: _logger.warning("audit_emit_failed", event_type="ENTITY_DELETED") # BUG: swallows exception ``` When `self._event_bus.emit()` raises any exception, it is caught by the broad `except Exception` clause and only a warning is logged. The exception is not re-raised. ## Expected Behavior Per `CONTRIBUTING.md`'s fail-fast principle, exceptions from event bus emission should propagate to the caller unless there is a specific, meaningful reason to suppress them (e.g., the event bus is optional/best-effort and the spec explicitly allows silent failure). If the event bus emission is intended to be best-effort (i.e., actor removal should succeed even if the event cannot be emitted), this must be explicitly documented and the exception handling must be intentional. The current code provides no such documentation. The correct approach is either: 1. **Remove the try/except** and let exceptions propagate (fail-fast), OR 2. **Document the intent** that event emission is best-effort and add a specific exception type (not bare `except Exception`) with a clear comment explaining why silent failure is acceptable here. ## Impact - Event bus failures are silently swallowed, making it impossible to detect event bus connectivity issues during actor removal. - The `SessionManager` (which listens for actor removal events to invalidate sessions) may not receive the event, leading to stale session state. - Violates `CONTRIBUTING.md`'s explicit prohibition on catching exceptions just to log and re-raise (or in this case, just to log and discard). ## Code Location `src/cleveragents/application/services/actor_service.py` — `remove_actor()` method, event emission block (approximately lines 145–155) ## Subtasks - [ ] Write a TDD failing Behave scenario (`@tdd_expected_fail`) asserting that exceptions from `event_bus.emit()` propagate out of `remove_actor()` - [ ] Determine the intended behavior: is event emission best-effort or required? - If **required**: Remove the `try/except` block entirely and let exceptions propagate - If **best-effort**: Replace `except Exception` with a specific exception type, add a docstring comment explaining why silent failure is acceptable, and log at `error` level (not `warning`) - [ ] Remove `@tdd_expected_fail` tag once fix is in place - [ ] Verify `nox -e typecheck` passes - [ ] Verify `nox -e unit_tests` passes with coverage >= 97% - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done - [ ] `ActorService.remove_actor()` either propagates event bus exceptions or has explicit, documented justification for suppressing them - [ ] No bare `except Exception` clause that silently discards exceptions without re-raising - [ ] Behave BDD scenarios cover the event emission behavior - [ ] All nox stages pass - [ ] Coverage >= 97% > **Backlog note:** This issue was discovered during autonomous UAT testing of the Actor System feature area. It does not block milestone completion and has been placed in the backlog for human review and future milestone assignment. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-new-issue-creator
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#230 Stage A1: Plan Data Model
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3729
No description provided.