[Bug Hunt][Cycle 2][Reactive] Error Swallowing in A2A Event Callback Execution #7114

Open
opened 2026-04-10 07:52:55 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: bugfix/m3-a2a-event-callback-error-swallowing
  • Commit Message: fix(a2a): propagate exceptions from event callbacks instead of swallowing them
  • Milestone: (none — see backlog note below)
  • Parent Epic: #7052

Background and Context

A2aEventQueue.publish() (src/cleveragents/a2a/events.py) iterates over registered
callbacks and wraps each invocation in a broad except Exception handler that only logs
the error and continues. This violates the project's Exception Propagation policy
(CONTRIBUTING.md §Error and Exception Handling):

"Do not suppress errors. Let exceptions propagate to top-level execution."
"Never catch exceptions just to log them — let them bubble up for centralized handling."

Critical bugs in event handlers are silently hidden, making debugging extremely difficult
and masking serious application errors.

Evidence

# src/cleveragents/a2a/events.py, lines 72-79
# Catches all exceptions from callbacks — error swallowing
for sub_id, callback in self._subscriptions.items():
    try:
        callback(event)
    except Exception:
        logger.exception(
            "a2a.event.callback_error",
            subscription_id=sub_id,
        )

The exception is caught, logged, and discarded. The caller of publish() receives no
indication that a callback failed. Any RuntimeError, ValueError, AssertionError, or
other exception raised inside a subscriber is silently swallowed.
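
The behaviour can be reproduced in isolation with a minimal sketch. SwallowingQueue, subscribe(), broken_subscriber and the event payload are hypothetical stand-ins rather than the project's code, and the standard library logging module is assumed in place of the project's logger:

```python
import logging

logger = logging.getLogger("a2a.repro")


class SwallowingQueue:
    """Hypothetical stand-in for A2aEventQueue; mirrors the loop quoted above."""

    def __init__(self):
        self._subscriptions = {}

    def subscribe(self, sub_id, callback):
        self._subscriptions[sub_id] = callback

    def publish(self, event):
        for sub_id, callback in self._subscriptions.items():
            try:
                callback(event)
            except Exception:
                # Logged, then discarded: the caller never sees the failure.
                logger.exception(
                    "a2a.event.callback_error subscription_id=%s", sub_id
                )


def broken_subscriber(event):
    raise ValueError("subscriber bug")


queue = SwallowingQueue()
queue.subscribe("sub-1", broken_subscriber)

# publish() returns normally even though the only subscriber raised.
result = queue.publish({"type": "example.event"})
```

The only evidence of the ValueError is the log record; the call site has no way to react to it.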

Impact

  • Critical bugs in event handlers are silently hidden, making debugging extremely difficult.
  • Callers of publish() cannot distinguish between successful delivery and delivery with
    errors — they always see a clean return.
  • Violates the fail-fast principle: the system continues in a potentially inconsistent state
    after a callback failure.
  • Contradicts CONTRIBUTING.md §Error and Exception Handling: "Only catch exceptions when
    you can meaningfully handle them (e.g., retry logic, resource cleanup, adding context).
    Otherwise, let them propagate."

Proposed Fix

Remove the broad except Exception swallow, using one of the following approaches:

  1. Re-raise after logging — log the error for observability, then re-raise so the
    caller is aware of the failure.
  2. Collect and raise — iterate all callbacks, collect any exceptions, then raise a
    composite error after all callbacks have been attempted (preserving fan-out delivery
    while still surfacing failures).
  3. Remove the try/except entirely — let exceptions propagate naturally per the
    fail-fast principle.

The chosen approach must be consistent with the project's error-handling guidelines and
must not silently discard exceptions.
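
As an illustration of option 2, the sketch below keeps fan-out delivery and raises once at the end. CallbackErrors and publish_collecting are hypothetical names introduced here, not existing project APIs, and the subscription mapping is assumed to have the same shape as self._subscriptions in the evidence snippet:

```python
class CallbackErrors(Exception):
    """Hypothetical composite error carrying every callback failure."""

    def __init__(self, failures):
        # failures is a list of (subscription_id, exception) pairs.
        self.failures = failures
        ids = ", ".join(sub_id for sub_id, _ in failures)
        super().__init__(f"{len(failures)} event callback(s) failed: {ids}")


def publish_collecting(subscriptions, event):
    """Call every callback, then raise once if any of them failed."""
    failures = []
    for sub_id, callback in subscriptions.items():
        try:
            callback(event)
        except Exception as exc:
            # Held for later, not discarded: remaining subscribers still run.
            failures.append((sub_id, exc))
    if failures:
        raise CallbackErrors(failures)


def failing(event):
    raise ValueError("subscriber bug")


delivered = []
subscriptions = {"sub-1": failing, "sub-2": delivered.append}

caught = None
try:
    publish_collecting(subscriptions, "event-1")
except CallbackErrors as err:
    caught = err
```

Here sub-2 still receives the event after sub-1 fails, and the caller gets a single exception listing the failures. On Python 3.11 or later the built-in ExceptionGroup could play the role of the composite error. Option 3 is the same loop with the try/except removed, which stops at the first failing callback.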

Subtasks

  • Write TDD issue-capture test (see TDD issue — this issue depends on it)
  • Remove or correct the broad except Exception swallow in A2aEventQueue.publish()
  • Ensure exceptions from callbacks propagate to the caller of publish()
  • Update EventBusBridge._on_domain_event() if needed (it calls publish() inside
    contextlib.suppress(RuntimeError) — verify this is still appropriate after the fix)
  • Update BDD feature file and step definitions for the new behaviour
  • Run nox and confirm all stages pass with coverage ≥ 97%
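
The EventBusBridge subtask matters because contextlib.suppress(RuntimeError) filters by exception type. The sketch below shows the effect once publish() propagates errors; publish_propagating and both callbacks are hypothetical, and the actual body of _on_domain_event() is not reproduced here:

```python
import contextlib


def publish_propagating(callbacks, event):
    """Hypothetical post-fix publish(): no try/except, so errors propagate."""
    for callback in callbacks:
        callback(event)


def raises_runtime_error(event):
    raise RuntimeError("callback failure")


def raises_value_error(event):
    raise ValueError("subscriber bug")


# A RuntimeError raised by a callback is absorbed by the suppress block,
# so the bridge would still hide this class of subscriber failure.
with contextlib.suppress(RuntimeError):
    publish_propagating([raises_runtime_error], "event-1")
runtime_error_hidden = True  # reached only because suppress absorbed the error

# Any other exception type escapes the suppress block and reaches the caller.
value_error_escaped = False
try:
    with contextlib.suppress(RuntimeError):
        publish_propagating([raises_value_error], "event-1")
except ValueError:
    value_error_escaped = True
```

Under these assumptions, leaving the suppress block unchanged would reintroduce swallowing for RuntimeError specifically, while every other exception type would start surfacing through the bridge.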

Definition of Done

  • A2aEventQueue.publish() no longer silently swallows callback exceptions
  • Exceptions from event callbacks propagate to the caller (or are collected and
    re-raised) rather than being discarded after logging
  • BDD scenarios cover the error-propagation behaviour
  • @tdd_expected_fail tag removed from the TDD issue-capture test
  • All nox stages pass
  • Coverage ≥ 97%

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: Acting on behalf of: Bug Hunt | Agent: new-issue-creator

Author
Owner

Verified — Critical bug: A2A event callback errors swallowed silently. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7114