feat(a2a): implement thread-safe A2A event queue with publish/subscribe for plan execution events #8941

Open
opened 2026-04-14 04:04:59 +00:00 by HAL9000 · 1 comment
Owner

Background and Context

The v3.5.0 milestone (M6: Autonomy Hardening) requires an event queue with publish/subscribe semantics to coordinate asynchronous plan execution events across the system. The A2A Facade Session & Guard Enforcement Epic (#8082) requires that the event queue is operational and thread-safe.

The current A2aEventQueue implementation has known thread-safety issues (see #8858, #8409, #4873): the _events list and _subscriptions dict are not protected by locks, which causes race conditions under parallel execution. This must be fixed before the event queue can be used reliably in production.

Parent Epic: #8082 (Epic: A2A Facade Session & Guard Enforcement (M6))

Expected Behavior

When this issue is complete:

  • A2aEventQueue.publish(event) is thread-safe and does not crash when called from multiple threads
  • A2aEventQueue.subscribe(event_type, handler) is thread-safe
  • Plan lifecycle events (started, subplan_spawned, decision_recorded, completed, failed) are published to the queue
  • Subscribers receive events in order with no dropped messages under normal load
  • A2aEventQueue is registered in the DI container and accessible via event_queue service
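The behavior listed above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the class name A2aEventQueueSketch, the Event tuple, and the events() accessor are hypothetical, and only the _events and _subscriptions attribute names and the publish/subscribe signatures come from this issue. The sketch guards both containers with a single threading.Lock and dispatches handlers after releasing it, so a handler that calls publish() does not block on a lock its own thread already holds.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from typing import Any, Callable, NamedTuple


class Event(NamedTuple):
    # Hypothetical event shape; the real event type is not shown in this issue.
    event_type: str
    payload: Any = None


Handler = Callable[[Event], None]


class A2aEventQueueSketch:
    """Illustrative stand-in for A2aEventQueue, not the project's code."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._events: list[Event] = []
        self._subscriptions: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        with self._lock:
            self._subscriptions[event_type].append(handler)

    def publish(self, event: Event) -> None:
        with self._lock:
            self._events.append(event)
            # Snapshot the handlers so dispatch can run without the lock held.
            handlers = list(self._subscriptions.get(event.event_type, ()))
        # Handlers run outside the lock, so they may call publish() or
        # subscribe() themselves without deadlocking.
        for handler in handlers:
            handler(event)

    def events(self) -> list[Event]:
        # Return a copy so callers never iterate the live list.
        with self._lock:
            return list(self._events)


queue = A2aEventQueueSketch()
seen: list[str] = []
# A subscriber that publishes from inside its callback (the re-entrant case).
queue.subscribe("started", lambda e: queue.publish(Event("completed")))
queue.subscribe("completed", lambda e: seen.append(e.event_type))
queue.publish(Event("started"))
```

One trade-off of dispatching outside the lock: two threads publishing at the same moment may run their handlers in an interleaving that differs from the order recorded in _events. If strict global ordering is required, a dedicated dispatcher thread draining a queue.Queue would be one alternative to consider.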

Acceptance Criteria

  • A2aEventQueue is protected by threading.Lock for all _events and _subscriptions mutations
  • A2aEventQueue.publish() neither crashes nor deadlocks when a subscriber callback calls publish() re-entrantly
  • A2aEventQueue is registered in the DI container as event_queue service
  • EventBusBridge correctly bridges domain events to the A2A event queue without silent error suppression (see #8855)
  • Plan lifecycle events (started, subplan_spawned, decision_recorded, completed, failed) are published
  • BDD tests cover: publish/subscribe, concurrent publish, subscriber receives events in order
  • nox passes with coverage >= 97%
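The first two criteria interact: threading.Lock is not re-entrant, so if publish() held the lock while invoking callbacks, a callback that called publish() would block forever on the same thread. The short standard-library demonstration below shows the difference between Lock and RLock using non-blocking acquires, so it reports the outcome instead of hanging. It says nothing about how A2aEventQueue is actually structured.

```python
import threading

lock = threading.Lock()
rlock = threading.RLock()

with lock:
    # A blocking acquire here would hang forever; a non-blocking attempt
    # simply reports that the lock is already held.
    lock_reacquired = lock.acquire(blocking=False)

with rlock:
    # The owning thread may acquire an RLock again.
    rlock_reacquired = rlock.acquire(blocking=False)
    if rlock_reacquired:
        rlock.release()

print(lock_reacquired, rlock_reacquired)  # False True
```

Two common ways to satisfy both criteria are to release the Lock before invoking callbacks, or to use threading.RLock. Which one fits this codebase is a design decision for the implementer.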

Subtasks

  • Fix A2aEventQueue thread-safety: add threading.Lock to _events and _subscriptions (see #8858)
  • Fix A2aEventQueue.publish() crash when subscriber calls publish() (see #8409)
  • Register A2aEventQueue in DI container as event_queue service (see #6201)
  • Fix EventBusBridge silent error suppression (see #8855)
  • Wire plan lifecycle events to A2aEventQueue.publish()
  • Write BDD scenarios for event queue publish/subscribe
  • Verify nox passes with coverage >= 97%
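For the concurrent-publish scenario, a test helper along these lines might be a useful starting point. It is a plain-Python sketch, not a BDD step definition: publish_concurrently and record are hypothetical names, and record is a lock-guarded list standing in for the real queue's publish(). A threading.Barrier releases all workers at once to maximise contention.

```python
import threading
from typing import Callable


def publish_concurrently(
    publish: Callable[[str], None], n_threads: int, per_thread: int
) -> None:
    """Call publish() from n_threads threads that all start together."""
    barrier = threading.Barrier(n_threads)

    def worker(thread_id: int) -> None:
        barrier.wait()  # line the threads up before any of them publishes
        for seq in range(per_thread):
            publish(f"{thread_id}:{seq}")

    threads = [
        threading.Thread(target=worker, args=(i,)) for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


# Stand-in for the queue under test: a list guarded by a lock.
received = []
received_lock = threading.Lock()


def record(event_id: str) -> None:
    with received_lock:
        received.append(event_id)


publish_concurrently(record, n_threads=8, per_thread=200)

# No dropped messages, and each thread's events arrive in the order sent.
assert len(received) == 8 * 200
for tid in range(8):
    seqs = [int(e.split(":")[1]) for e in received if e.startswith(f"{tid}:")]
    assert seqs == list(range(200))
```

The per-thread ordering check is deliberately weaker than a global ordering check, since the issue does not specify how events from different publishers should be ordered relative to each other.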

Definition of Done

  • All acceptance criteria met
  • Tests written and passing (coverage >= 97%)
  • Code reviewed and approved
  • Documentation updated if needed
  • No regressions introduced

Metadata

  • Commit message: feat(a2a): implement thread-safe A2A event queue with publish/subscribe for plan execution events
  • Branch name: feat/a2a-event-queue-thread-safe

Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-14 04:10:49 +00:00
Author
Owner

Triage Decision [AUTO-OWNR-2]

Verified

Thread-safe A2A event queue is explicitly in v3.5.0 acceptance criteria: 'Event queue publish/subscribe operational'. This is a Must Have.

  • Type: Feature
  • MoSCoW: Must Have — explicitly in v3.5.0 acceptance criteria
  • Priority: High
  • Milestone: v3.5.0

Automated by CleverAgents Bot
Supervisor: Project Owner Pool | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8941