BUG-HUNT: [concurrency] Potential race condition in A2aLocalFacade #5983

Open
opened 2026-04-09 12:52:48 +00:00 by HAL9000 (Owner) · 1 comment

Metadata

  • Branch: fix/a2a-local-facade-race-condition
  • Commit Message: fix(a2a): add threading lock to protect concurrent access to _services in A2aLocalFacade
  • Milestone: (backlog — see note below)
  • Parent Epic: #4949

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Bug Report: [concurrency] — Potential race condition in A2aLocalFacade

Severity Assessment

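  • Impact: In a multi-threaded environment, concurrent calls to dispatch and register_service could lead to inconsistent behavior, where a dispatch operation might not see a newly registered service, or could observe _services partway through an update. Because a single dictionary read or write is atomic in CPython (under the GIL), the concrete hazards are multi-step sequences (for example, a dispatch that reads the same entry twice while register_service replaces it) and iteration over _services while a registration changes its size, which raises RuntimeError ("dictionary changed size during iteration").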
  • Likelihood: Low. The A2aLocalFacade is primarily used in a single-threaded context in the CLI. However, if the facade is ever used in a multi-threaded server or application, this bug could manifest.
  • Priority: Medium

Location

  • File: src/cleveragents/a2a/facade.py
  • Function/Class: A2aLocalFacade
  • Lines: 123-130, 205-214

Description

The A2aLocalFacade class has a potential race condition. The _services dictionary is mutable and is modified by the register_service method. The dispatch method reads from this dictionary. There is no locking mechanism to protect concurrent access to self._services.
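
Since individual dictionary operations are atomic in CPython, the hazard is easiest to see in a multi-step read. The sketch below assumes, hypothetically, that dispatch reads the same _services entry twice (for example once to validate a request and once to invoke the service); the Evidence only says it reads "via property methods", so whether such a sequence exists is something the audit subtask should confirm. The interleave callback stands in for a register_service call from another thread, which makes the interleaving deterministic.

```python
from collections.abc import Callable


def dispatch_with_two_reads(
    services: dict[str, str], interleave: Callable[[], None]
) -> tuple[str, str]:
    """Hypothetical dispatch that reads the same _services entry twice."""
    validated_against = services["echo"]  # first read, e.g. a capability check
    interleave()                          # a concurrent register_service lands here
    invoked = services["echo"]            # second read, e.g. the actual call
    return validated_against, invoked


services = {"echo": "echo-v1"}
result = dispatch_with_two_reads(services, lambda: services.update(echo="echo-v2"))
print(result)  # ('echo-v1', 'echo-v2')
```

Holding one lock across the whole sequence rules this out; locking each dictionary access separately would not.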

Evidence

# In A2aLocalFacade:

def __init__(self, services: dict[str, Any] | None = None) -> None:
    # ...
    self._services: dict[str, Any] = dict(services) if services else {}
    # ...

def dispatch(self, request: A2aRequest) -> A2aResponse:
    # ...
    # Reads from self._services via property methods
    # ...

def register_service(self, name: str, service: Any) -> None:
    # ...
    self._services[name] = service # Unprotected write
    # ...

Expected Behavior

Access to the _services dictionary should be thread-safe. A lock should be used to protect reads and writes to the dictionary to ensure that dispatch and register_service can be called from multiple threads without causing race conditions.

Actual Behavior

The _services dictionary is accessed and modified without any synchronization, making the A2aLocalFacade not thread-safe.

Suggested Fix

Introduce a threading.Lock to protect access to self._services.

import threading
from typing import Any

class A2aLocalFacade:
    def __init__(self, services: dict[str, Any] | None = None) -> None:
        # ...
        self._services: dict[str, Any] = dict(services) if services else {}
        # Guards every read and write of self._services.
        self._lock = threading.Lock()
        # ...

    def dispatch(self, request: A2aRequest) -> A2aResponse:
        with self._lock:
            # ... existing dispatch logic, including every read of self._services ...
            ...

    def register_service(self, name: str, service: Any) -> None:
        with self._lock:
            # ...
            self._services[name] = service
            # ...

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be
created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>,
and @tdd_expected_fail to prove the bug exists before fixing it.
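
A test that simply starts many threads rarely hits the race window in CPython. One way to make the reproduction deterministic is to pause the reading side at the point of interest with threading.Event and let the writer run there. The sketch assumes some facade method iterates over _services (the hypothetical names() below; the real facade may have no such method) and uses UnsafeRegistry and LockedRegistry as hypothetical stand-ins for the facade before and after the fix. Wiring it into Behave steps is not shown.

```python
import threading
from collections.abc import Callable


class UnsafeRegistry:
    """Hypothetical model of the current, unguarded _services pattern."""

    def __init__(self) -> None:
        self._services: dict[str, object] = {"echo": object()}

    def register_service(self, name: str, service: object) -> None:
        self._services[name] = service

    def names(self, on_item: Callable[[], None]) -> list[str]:
        result: list[str] = []
        for name in self._services:
            on_item()  # test hook: the point where another thread may interleave
            result.append(name)
        return result


class LockedRegistry(UnsafeRegistry):
    """Same model with one lock guarding every read and write."""

    def __init__(self) -> None:
        super().__init__()
        self._lock = threading.Lock()

    def register_service(self, name: str, service: object) -> None:
        with self._lock:
            super().register_service(name, service)

    def names(self, on_item: Callable[[], None]) -> list[str]:
        with self._lock:
            return super().names(on_item)


def race(registry: UnsafeRegistry) -> str:
    """Run register_service from another thread while names() is mid-iteration."""
    reached, registered = threading.Event(), threading.Event()

    def registrar() -> None:
        reached.wait()
        registry.register_service("late", object())
        registered.set()

    worker = threading.Thread(target=registrar)
    worker.start()

    def on_item() -> None:
        reached.set()
        # Unguarded: the registrar finishes inside this wait.
        # Guarded: the registrar is blocked on the lock, so the wait times out.
        registered.wait(timeout=1.0)

    try:
        registry.names(on_item)
        outcome = "ok"
    except RuntimeError as exc:
        outcome = str(exc)
    worker.join()
    return outcome


print(race(UnsafeRegistry()))  # dictionary changed size during iteration
print(race(LockedRegistry()))  # ok
```

The unguarded case fails on every run, which suits a @tdd_expected_fail scenario; the guarded case passes once the lock is in place.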

Subtasks

  • Reproduce the race condition in a test (Behave scenario with @tdd_issue, @tdd_issue_<N>, @tdd_expected_fail tags)
  • Add threading.Lock (self._lock) to A2aLocalFacade.__init__
  • Wrap the dispatch method body in the lock's context manager (with self._lock:)
  • Wrap the register_service method body in the lock's context manager (with self._lock:)
  • Audit any other methods in A2aLocalFacade that read or write self._services and protect them with the lock
  • Remove @tdd_expected_fail tag from the TDD test once the fix is in place
  • Ensure all nox stages pass
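
For the audit subtask, a method that needs to iterate over _services (for example, to list registered services) can copy the data while holding the lock and iterate the copy afterwards, so the lock is never held while callers loop. A minimal sketch; SnapshotExample and service_names are hypothetical names, not existing facade methods.

```python
import threading
from typing import Any


class SnapshotExample:
    """Hypothetical illustration of the snapshot-under-lock pattern."""

    def __init__(self) -> None:
        self._services: dict[str, Any] = {}
        self._lock = threading.Lock()

    def register_service(self, name: str, service: Any) -> None:
        with self._lock:
            self._services[name] = service

    def service_names(self) -> list[str]:
        # Copy the keys under the lock; the returned list is independent of
        # _services, so later registrations cannot change it mid-iteration.
        with self._lock:
            return sorted(self._services)


registry = SnapshotExample()
registry.register_service("echo", object())
for name in registry.service_names():
    registry.register_service(name + "-copy", object())  # safe: iterating the copy
print(registry.service_names())  # ['echo', 'echo-copy']
```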

Definition of Done

  • A2aLocalFacade._services is protected by a threading.Lock for all reads and writes
  • A Behave unit test (tagged @tdd_issue and @tdd_issue_<N>) demonstrates thread-safety of dispatch and register_service
  • No # type: ignore suppressions introduced
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-09 13:39:39 +00:00
Comment by HAL9000 (Author, Owner):

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Reference: cleveragents/cleveragents-core#5983