BUG-HUNT: [error-handling] A2aEventQueue swallows exceptions from subscribers #1772

Open
opened 2026-04-02 23:46:44 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/error-handling-a2a-event-queue-subscriber-exceptions
  • Commit Message: fix(a2a): implement configurable error handler for A2aEventQueue to prevent silent subscriber exception swallowing
  • Milestone: v3.7.0
  • Parent Epic: #1669

Bug Report: [error-handling] — A2aEventQueue swallows exceptions from subscribers

Severity Assessment

  • Impact: A faulty event subscriber fails silently. The publisher and other components never learn that the callback failed, so they cannot react to the failure, and the resulting behavior is difficult to debug.
  • Likelihood: High. Any exception in a subscriber callback will be caught and silenced.
  • Priority: Medium

Location

  • File: src/cleveragents/a2a/events.py
  • Function/Class: A2aEventQueue.publish
  • Lines: 118-125

Description

The publish method in A2aEventQueue iterates through its subscribers and calls them within a try...except Exception block. If a subscriber raises an exception, it is caught, logged, and then ignored. This "catch-and-log" approach completely swallows the exception, preventing it from propagating.

Per the project's error-handling standards (CONTRIBUTING.md), exceptions should not be suppressed. Exceptions should propagate to the top-level execution for centralized handling and logging, and should only be caught when they can be handled meaningfully (e.g., for retries or resource cleanup).

Evidence

# src/cleveragents/a2a/events.py:120
        for sub_id, callback in self._subscriptions.items():
            try:
                callback(event)
            except Exception:
                logger.exception(
                    "a2a.event.callback_error",
                    subscription_id=sub_id,
                )

Expected Behavior

The system should provide a mechanism to handle subscriber errors without completely silencing them. Options include:

  1. Error Channel: Publish subscriber errors to a separate error channel or event queue.
  2. Configurable Handler: Allow an optional error handler to be configured on the A2aEventQueue to process exceptions.
  3. Circuit Breaker: Implement a circuit breaker to automatically disable faulty subscribers after a certain number of failures.
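
As an illustration of option 3, the sketch below counts failures per subscription and calls a disable hook once a threshold is reached. It is written as a callable with the (subscription_id, exception) shape proposed in the Subtasks, so it could in principle be plugged in as the error handler. The class name SubscriberCircuitBreaker, the max_failures parameter, and the disable hook are hypothetical; the issue does not show how A2aEventQueue removes a subscription.

```python
from __future__ import annotations

from collections import Counter
from typing import Callable


class SubscriberCircuitBreaker:
    """Hypothetical handler: disables a subscriber after repeated failures."""

    def __init__(self, max_failures: int, disable: Callable[[str], None]) -> None:
        if max_failures < 1:
            raise ValueError("max_failures must be at least 1")
        self._max_failures = max_failures
        # Assumption: the caller supplies whatever removes a subscription.
        self._disable = disable
        self._failures: Counter[str] = Counter()
        self._tripped: set[str] = set()

    def __call__(self, subscription_id: str, exc: Exception) -> None:
        # A subscriber that has already been disabled is not counted again.
        if subscription_id in self._tripped:
            return
        self._failures[subscription_id] += 1
        if self._failures[subscription_id] >= self._max_failures:
            self._tripped.add(subscription_id)
            self._disable(subscription_id)


# Usage: the third failure trips the breaker; later failures are ignored.
disabled: list[str] = []
breaker = SubscriberCircuitBreaker(max_failures=3, disable=disabled.append)
for _ in range(5):
    breaker("sub-1", RuntimeError("boom"))
```

This sketch never resets the failure count. A real breaker would probably want a reset after a success or a cool-down period, which is outside the scope of this issue.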

Actual Behavior

Exceptions in subscriber callbacks are caught and logged, but are otherwise ignored. The publisher and other subscribers are unaware that an error occurred.
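
A minimal reproduction of this behavior, assuming a stand-in class built around the loop quoted under Evidence. SwallowingQueue and its subscribe method are hypothetical, and the standard library logger replaces the project's structured logger:

```python
from __future__ import annotations

import logging
from typing import Any, Callable

logger = logging.getLogger("a2a.events.repro")


class SwallowingQueue:
    """Hypothetical stand-in that mirrors the catch-and-log loop."""

    def __init__(self) -> None:
        self._subscriptions: dict[str, Callable[[Any], None]] = {}

    def subscribe(self, sub_id: str, callback: Callable[[Any], None]) -> None:
        self._subscriptions[sub_id] = callback

    def publish(self, event: Any) -> None:
        for sub_id, callback in self._subscriptions.items():
            try:
                callback(event)
            except Exception:
                # The traceback is logged, then the exception is dropped.
                logger.exception(
                    "a2a.event.callback_error subscription_id=%s", sub_id
                )


def broken(event: Any) -> None:
    raise ValueError("boom")


received: list[Any] = []
queue = SwallowingQueue()
queue.subscribe("broken", broken)
queue.subscribe("ok", received.append)

# publish() returns normally: the caller cannot tell that "broken" failed.
queue.publish({"type": "ping"})
```

The second subscriber still receives the event, so the only trace of the failure is the log entry.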

Suggested Fix

Implement a configurable error handler for the A2aEventQueue. This would allow the application to define custom logic for handling subscriber exceptions, such as re-raising, logging to a different system, or disabling the faulty subscriber.

Category

error-handling

Subtasks

  • Add an optional on_subscriber_error callback parameter to A2aEventQueue.__init__ (typed as Callable[[str, Exception], None] | None)
  • Update A2aEventQueue.publish to invoke on_subscriber_error when a subscriber raises an exception, passing the subscription_id and the exception
  • If no on_subscriber_error is configured, re-raise the exception (fail-fast default) rather than silently swallowing it
  • Update all existing usages of A2aEventQueue to either supply an error handler or accept the new fail-fast default
  • Write Behave unit tests: scenario for subscriber raising an exception with no handler (expect re-raise), scenario with a custom handler (expect handler invoked, no re-raise)
  • Ensure nox -e typecheck passes (no # type: ignore suppression)
  • Ensure nox -e unit_tests passes
  • Ensure nox -e coverage_report passes with coverage >= 97%
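
The first three subtasks could be sketched roughly as follows. This is not the actual A2aEventQueue: EventQueueSketch, its subscribe method, and the ErrorHandler alias are hypothetical, and only the on_subscriber_error parameter and its type come from this issue.

```python
from __future__ import annotations

from typing import Any, Callable

# Handler shape taken from the subtask: (subscription_id, exception) -> None.
ErrorHandler = Callable[[str, Exception], None]


class EventQueueSketch:
    """Hypothetical queue showing the proposed error-handling contract."""

    def __init__(self, on_subscriber_error: ErrorHandler | None = None) -> None:
        self._subscriptions: dict[str, Callable[[Any], None]] = {}
        self._on_subscriber_error = on_subscriber_error

    def subscribe(self, subscription_id: str, callback: Callable[[Any], None]) -> None:
        self._subscriptions[subscription_id] = callback

    def publish(self, event: Any) -> None:
        # Assumption: iterate over a snapshot so a handler may safely
        # remove a subscription while the event is being published.
        for sub_id, callback in list(self._subscriptions.items()):
            try:
                callback(event)
            except Exception as exc:
                if self._on_subscriber_error is None:
                    # Fail-fast default: no handler, so propagate.
                    raise
                self._on_subscriber_error(sub_id, exc)


def broken(event: Any) -> None:
    raise ValueError("boom")


# With a handler: the handler is invoked and publish() returns normally.
errors: list[tuple[str, Exception]] = []
handled = EventQueueSketch(
    on_subscriber_error=lambda sub_id, exc: errors.append((sub_id, exc))
)
handled.subscribe("broken", broken)
handled.publish("ping")

# Without a handler: the subscriber's exception reaches the caller.
strict = EventQueueSketch()
strict.subscribe("broken", broken)
try:
    strict.publish("ping")
except ValueError:
    propagated = True
else:
    propagated = False
```

One consequence of the fail-fast default in this sketch: when an exception propagates, subscribers registered after the failing one do not receive the event. Existing callers that rely on delivery to every subscriber would need to supply a handler.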

Definition of Done

  • A2aEventQueue accepts an optional on_subscriber_error callback
  • Without a handler, subscriber exceptions propagate (fail-fast behaviour restored)
  • With a handler, the handler is invoked and the exception does not propagate
  • All Behave scenarios for the new behaviour are green
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

freemo added this to the v3.7.0 milestone 2026-04-02 23:47:04 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Priority/High
  • MoSCoW: MoSCoW/Should Have. Swallowing exceptions violates the fail-fast principle.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Blocks
#1669 Bug Hunting Session
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#1772