Error Handling: Silent Suppression of RuntimeError in EventBusBridge #8855

Open
opened 2026-04-14 02:50:31 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit Message: fix(a2a): log suppressed RuntimeError in EventBusBridge._on_domain_event
  • Branch: bugfix/m6-eventbusbridge-silent-runtimeerror

Background and Context

In EventBusBridge._on_domain_event (located in cleveragents.a2a.events), a RuntimeError raised by _event_queue.publish is silently suppressed using contextlib.suppress(RuntimeError). This pattern violates the project's error handling principles, which explicitly state:

CRITICAL: Do not suppress errors. Let exceptions propagate to top-level execution.
No Silent Failures: Avoid returning null or default values when an error condition exists — raise exceptions or return explicit error types.

If the event queue is closed unexpectedly (e.g., due to a lifecycle management issue or race condition), this silent suppression will:

  1. Hide the error entirely — no log entry, no warning, no indication that anything went wrong.
  2. Silently drop events — domain events that should be forwarded to A2A subscribers are lost without any trace.
  3. Make debugging extremely difficult — operators and developers have no way to know events are being dropped.
  4. Risk data loss — in scenarios where event delivery is critical (e.g., task status updates), silent drops can lead to inconsistent client state.

Additionally, the import contextlib statement is placed inside the method body rather than at the top of the file, which violates the project's import guidelines.

Code Evidence:

# src/cleveragents/a2a/events.py — EventBusBridge._on_domain_event
import contextlib

with contextlib.suppress(RuntimeError):
    self._event_queue.publish(a2a_event)
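
To make the failure mode concrete, here is a minimal, self-contained sketch of the same pattern. `ClosedQueue` and `forward_silently` are hypothetical stand-ins invented for illustration, not code from `cleveragents.a2a.events`; the only assumption is that `publish` raises `RuntimeError` once the queue is closed.

```python
import contextlib


class ClosedQueue:
    """Hypothetical stand-in for the event queue after it has been closed."""

    def publish(self, event: str) -> None:
        raise RuntimeError("queue is closed")


def forward_silently(queue: ClosedQueue, event: str) -> bool:
    """Mimic the current pattern; return True only if publish succeeded."""
    delivered = False
    with contextlib.suppress(RuntimeError):
        queue.publish(event)
        delivered = True  # skipped when publish raises
    return delivered


# The call returns normally and emits no log output, so the caller
# has no signal that the event was dropped.
result = forward_silently(ClosedQueue(), "task_status_updated")
```

The real method does not return a delivery flag, so in production there is no equivalent of `result` to inspect at all.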

Recommendation:
Replace the silent suppression with a WARNING-level log that records the dropped event and the reason (closed queue), so that operators can detect and diagnose the issue:

try:
    self._event_queue.publish(a2a_event)
except RuntimeError:
    logger.warning(
        "event_dropped_queue_closed",
        event_type=a2a_event.event_type,
        plan_id=str(a2a_event.plan_id) if a2a_event.plan_id else None,
    )
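
The recommendation can be exercised end to end with a small runnable sketch. It assumes nothing about the real classes: `A2AEvent`, `ClosedQueue`, `ListHandler`, and `forward_with_warning` are hypothetical names, and the standard-library `logging` module stands in for the project's `structlog` logger (structured fields are passed through `extra` instead of keyword arguments). The `reason` field is an illustrative addition that is not part of the recommendation above.

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger("a2a.events.sketch")
logger.setLevel(logging.WARNING)
logger.propagate = False  # keep the sketch's output out of the root logger


@dataclass
class A2AEvent:
    """Hypothetical event carrying the two fields the log call reads."""

    event_type: str
    plan_id: Optional[int] = None


class ClosedQueue:
    """Hypothetical queue whose publish always fails, as if closed."""

    def publish(self, event: A2AEvent) -> None:
        raise RuntimeError("queue is closed")


class ListHandler(logging.Handler):
    """Collect log records in memory so they can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.records: list = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


def forward_with_warning(queue: ClosedQueue, event: A2AEvent) -> None:
    """Publish the event; log a WARNING instead of failing silently."""
    try:
        queue.publish(event)
    except RuntimeError as exc:
        logger.warning(
            "event_dropped_queue_closed",
            extra={
                "event_type": event.event_type,
                "plan_id": str(event.plan_id) if event.plan_id else None,
                "reason": str(exc),  # illustrative extra field
            },
        )


handler = ListHandler()
logger.addHandler(handler)
forward_with_warning(ClosedQueue(), A2AEvent("task_status_updated", plan_id=42))
record = handler.records[0]
```

After the call, `record` holds one WARNING entry whose attributes include the event type and plan ID, which is the visibility the current code lacks.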

Expected Behavior

When _event_queue.publish raises a RuntimeError (e.g., because the queue is closed), the exception is caught and logged at WARNING level with relevant context (event type, plan ID). The event may still be dropped, but the drop is visible in logs, so operators and developers can detect and diagnose the issue. No event is lost without a log entry.

Acceptance Criteria

  • The contextlib.suppress(RuntimeError) block in EventBusBridge._on_domain_event is replaced with an explicit try/except RuntimeError block.
  • The except block logs the suppressed exception at WARNING level using the existing structlog logger, including event_type and plan_id as structured fields.
  • The import contextlib statement is removed from inside the method body (it is no longer needed).
  • No other import statements remain inside method bodies in cleveragents.a2a.events.
  • All existing BDD scenarios for EventBusBridge continue to pass.
  • A new BDD scenario is added that verifies a WARNING log entry is emitted when publish raises RuntimeError.
  • nox passes with no errors (lint, typecheck, unit tests, coverage ≥ 97%).
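
One way to check the "no imports inside method bodies" criterion mechanically is to walk the module's syntax tree. The sketch below is an illustration only: `find_nested_imports` is a hypothetical helper, and the `sample` string is a made-up module that mirrors the code evidence rather than the real file contents.

```python
import ast


def find_nested_imports(source: str) -> list:
    """Return (function_name, line_number) for each import inside a function body."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for inner in ast.walk(node):
                if isinstance(inner, (ast.Import, ast.ImportFrom)):
                    found.append((node.name, inner.lineno))
    return found


# Made-up module text; the leading newline makes "import logging" line 2.
sample = '''
import logging

class EventBusBridge:
    def _on_domain_event(self, event):
        import contextlib
        with contextlib.suppress(RuntimeError):
            self._event_queue.publish(event)
'''

hits = find_nested_imports(sample)
```

Run against the text of `src/cleveragents/a2a/events.py`, an empty result would indicate the criterion is met. An import inside a nested function is reported once per enclosing function, which is acceptable for a pass/fail check.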

Subtasks

  • Replace contextlib.suppress(RuntimeError) with try/except RuntimeError in EventBusBridge._on_domain_event
  • Add logger.warning(...) call in the except block with structured fields (event_type, plan_id)
  • Remove the inline import contextlib from inside the method body
  • Tests (Behave): Add BDD scenario verifying WARNING log is emitted when queue is closed and publish raises RuntimeError
  • Tests (Behave): Verify all existing EventBusBridge scenarios still pass
  • Verify coverage ≥ 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors
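
The assertion the new scenario needs to make can be sketched with the standard library alone. This is not a Behave scenario and does not use `structlog`: `on_domain_event` is a hypothetical free-function stand-in for the fixed handler body, and `unittest` with `assertLogs` replaces the project's actual test tooling. With `structlog`, `structlog.testing.capture_logs` could serve the same purpose.

```python
import logging
import unittest
from types import SimpleNamespace
from unittest import mock

logger = logging.getLogger("a2a.events.test_sketch")


def on_domain_event(event_queue, a2a_event) -> None:
    """Hypothetical stand-in for the fixed handler body."""
    try:
        event_queue.publish(a2a_event)
    except RuntimeError:
        logger.warning("event_dropped_queue_closed")


class DroppedEventIsLogged(unittest.TestCase):
    def test_warning_emitted_when_publish_raises(self) -> None:
        # Given a queue whose publish raises RuntimeError
        queue = mock.Mock()
        queue.publish.side_effect = RuntimeError("queue is closed")
        event = SimpleNamespace(event_type="task_status_updated", plan_id=None)

        # When the handler forwards a domain event
        with self.assertLogs(logger, level="WARNING") as captured:
            on_domain_event(queue, event)

        # Then exactly one WARNING entry records the dropped event
        self.assertEqual(len(captured.records), 1)
        self.assertEqual(
            captured.records[0].getMessage(), "event_dropped_queue_closed"
        )
        queue.publish.assert_called_once_with(event)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(DroppedEventIsLogged)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

The Given/When/Then comments map directly onto the steps a Behave scenario would define.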

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(a2a): log suppressed RuntimeError in EventBusBridge._on_domain_event), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (bugfix/m6-eventbusbridge-silent-runtimeerror).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-14 02:54:51 +00:00
Author
Owner

Triage Decision: VERIFIED — MoSCoW/Must Have

Real bug: EventBusBridge._on_domain_event silently suppresses RuntimeError exceptions, so failures in the event bridge go unnoticed. This can lead to data loss and hard-to-debug failures in the A2A event system.

Priority/High — Silent error suppression in event handling is a reliability concern for the A2A system.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8855