Concurrency: Race condition in A2aLocalFacade due to non-thread-safe caching #9079

Closed
opened 2026-04-14 07:13:40 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Commit Message: fix(a2a): add threading.Lock to protect _handler_map cache in A2aLocalFacade
  • Branch: fix/a2a-local-facade-handler-map-race-condition

Background and Context

The A2aLocalFacade class in src/cleveragents/a2a/facade.py is not thread-safe. It uses a cached _handler_map dictionary that is built on the first call to _handlers() and invalidated when register_service() is called. If multiple threads call dispatch() and register_service() concurrently, a race condition can occur.

Code Evidence:
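The check if self._handler_map is None: and the subsequent assignment self._handler_map = {...} in _handlers() are not performed atomically, and the final return self._handler_map reads the attribute a second time. Another thread can therefore run between any of these steps.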

# src/cleveragents/a2a/facade.py — A2aLocalFacade._handlers()
        if self._handler_map is None:
            self._handler_map = {
                # ...
            }
        return self._handler_map

The invalidation in register_service() is the other side of the race, because it writes self._handler_map without any synchronization:

# src/cleveragents/a2a/facade.py — A2aLocalFacade.register_service()
        self._handler_map = None

Impact:
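This is a latent bug that can cause unpredictable behavior if the application is ever run in a multi-threaded environment. For example, if register_service() invalidates the cache while another thread is inside _handlers() building the map, that thread can afterwards store a map that omits the newly registered service, so valid operations fail with A2aOperationNotFoundError until the next invalidation. If the invalidation lands between the assignment and the return, _handlers() can return None instead of a dictionary.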
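The stale-map interleaving can be reproduced deterministically. The sketch below uses a hypothetical UnsafeFacade that copies the unguarded pattern from _handlers() and register_service(); the _services dict and the building/resume events are assumptions added only to pause one thread mid-build. It is not the real A2aLocalFacade.

```python
import threading
from typing import Callable, Optional

Handler = Callable[[], str]


class UnsafeFacade:
    """Hypothetical stand-in reproducing the unguarded caching pattern."""

    def __init__(self) -> None:
        self._services: dict[str, Handler] = {}
        self._handler_map: Optional[dict[str, Handler]] = None
        # Test-only hooks that force one specific thread interleaving.
        self.building = threading.Event()
        self.resume = threading.Event()

    def register_service(self, name: str, handler: Handler) -> None:
        self._services[name] = handler
        self._handler_map = None  # unsynchronized invalidation

    def _handlers(self) -> Optional[dict[str, Handler]]:
        if self._handler_map is None:
            snapshot = dict(self._services)  # built from the services visible now
            self.building.set()  # tell the test that the build has started
            self.resume.wait()  # pause here so another thread can interleave
            self._handler_map = snapshot  # publishes a map that predates the invalidation
        return self._handler_map


facade = UnsafeFacade()
facade.register_service("ping", lambda: "pong")

result: dict[str, object] = {}
worker = threading.Thread(target=lambda: result.update(map=facade._handlers()))
worker.start()
facade.building.wait()  # worker is now mid-build
facade.register_service("echo", lambda: "echo")  # invalidation lands during the build
facade.resume.set()
worker.join()

print(sorted(facade._handlers()))  # ['ping']: "echo" is registered but missing from the cache
```

The stale map stays cached until some later register_service() call invalidates it again, which is why the failure looks intermittent in practice.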

Recommendation:
Add a threading.Lock to protect access to self._handler_map.
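A minimal sketch of the recommended fix, assuming the handler map is derived from a dict of registered services. LockedFacade, _services, and the KeyError standing in for A2aOperationNotFoundError are hypothetical; only the _handler_map and _handler_map_lock names come from this issue. As a conservative assumption, the sketch also updates the service registry under the same lock.

```python
import threading
from typing import Callable, Optional

Handler = Callable[[], str]


class LockedFacade:
    """Hypothetical sketch of the locking fix; not the real A2aLocalFacade."""

    def __init__(self) -> None:
        self._services: dict[str, Handler] = {}
        self._handler_map: Optional[dict[str, Handler]] = None
        self._handler_map_lock: threading.Lock = threading.Lock()

    def register_service(self, name: str, handler: Handler) -> None:
        with self._handler_map_lock:
            self._services[name] = handler
            self._handler_map = None  # invalidation is serialized with rebuilds

    def _handlers(self) -> dict[str, Handler]:
        with self._handler_map_lock:
            if self._handler_map is None:
                self._handler_map = dict(self._services)
            # Read the attribute while still holding the lock, so a concurrent
            # invalidation cannot make this method return None.
            return self._handler_map

    def dispatch(self, operation: str) -> str:
        handler = self._handlers().get(operation)
        if handler is None:
            raise KeyError(operation)  # stand-in for A2aOperationNotFoundError
        return handler()


facade = LockedFacade()
facade.register_service("ping", lambda: "pong")
print(facade.dispatch("ping"))  # pong
```

Because each rebuild produces a fresh dictionary that is never mutated after it is published, callers can use the returned reference outside the lock; only the check, the build, and the read of the attribute need to be serialized. If building the map could ever call back into register_service(), a threading.RLock would be needed instead, since threading.Lock is not reentrant.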

Expected Behavior

A2aLocalFacade is thread-safe. Concurrent calls to dispatch() and register_service() from multiple threads do not produce race conditions. The _handler_map cache is protected by a threading.Lock so that reads and writes are serialized correctly.

Acceptance Criteria

  • A threading.Lock (e.g., self._handler_map_lock) is added to A2aLocalFacade.__init__().
  • The _handlers() method acquires the lock before checking and building self._handler_map, and holds it until the cached map has been read for the return value.
  • The register_service() method acquires the lock before setting self._handler_map = None.
  • BDD scenarios cover concurrent dispatch() and register_service() calls to verify thread safety.
  • Test coverage remains ≥ 97%.
  • nox (all default sessions) passes with no errors.

Subtasks

  • Add self._handler_map_lock: threading.Lock = threading.Lock() to A2aLocalFacade.__init__().
  • Wrap the if self._handler_map is None: check, the assignment, and the final read of self._handler_map in _handlers() in a with self._handler_map_lock: block.
  • Wrap the self._handler_map = None assignment in register_service() in a with self._handler_map_lock: block.
  • Tests (Behave): Add a BDD scenario that makes concurrent dispatch() and register_service() calls and verifies that no race condition occurs.
  • Verify coverage ≥ 97% via nox -s coverage_report.
  • Run nox (all default sessions), fix any errors.
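The Behave scenario could delegate to a plain helper like the one below. hammer() and the compact LockedFacade are hypothetical names used only for illustration, and the KeyError again stands in for A2aOperationNotFoundError. Each worker registers a service and immediately dispatches it: the unguarded pattern can fail this read-your-own-write check, but a correctly locked cache never should.

```python
import threading
from typing import Callable, Optional, Protocol

Handler = Callable[[], str]


class Facade(Protocol):
    def register_service(self, name: str, handler: Handler) -> None: ...
    def dispatch(self, operation: str) -> str: ...


class LockedFacade:
    """Compact hypothetical facade used as the system under test."""

    def __init__(self) -> None:
        self._services: dict[str, Handler] = {}
        self._handler_map: Optional[dict[str, Handler]] = None
        self._handler_map_lock = threading.Lock()

    def register_service(self, name: str, handler: Handler) -> None:
        with self._handler_map_lock:
            self._services[name] = handler
            self._handler_map = None

    def dispatch(self, operation: str) -> str:
        with self._handler_map_lock:
            if self._handler_map is None:
                self._handler_map = dict(self._services)
            handlers = self._handler_map
        return handlers[operation]()  # KeyError stands in for A2aOperationNotFoundError


def hammer(facade: Facade, workers: int = 8, rounds: int = 100) -> list[BaseException]:
    """Run concurrent register/dispatch pairs and return any errors observed."""
    barrier = threading.Barrier(workers)
    errors: list[BaseException] = []
    errors_lock = threading.Lock()

    def worker(worker_id: int) -> None:
        barrier.wait()  # release all workers together to maximize contention
        for r in range(rounds):
            name = f"svc-{worker_id}-{r}"
            facade.register_service(name, lambda name=name: name)
            try:
                # Read-your-own-write: the handler was registered just above.
                if facade.dispatch(name) != name:
                    raise AssertionError(f"wrong handler for {name}")
            except Exception as exc:
                with errors_lock:
                    errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


print(len(hammer(LockedFacade())))  # 0
```

Races are timing dependent, so a passing run does not prove thread safety on its own; the helper mainly guards against regressions, while the locking argument carries the correctness claim.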

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(a2a): add threading.Lock to protect _handler_map cache in A2aLocalFacade), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/a2a-local-facade-handler-map-race-condition).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All CI checks pass (tests, linting, type checking, security, coverage ≥ 97%).

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-worker

HAL9000 added this to the v3.5.0 milestone 2026-04-14 07:21:10 +00:00
Author
Owner

This issue is a duplicate of #7041 ([Bug Hunt][Cycle 2][A2A] Race condition in handler map caching could cause crashes), which describes the same race condition in caching and is already with .

Closing as duplicate. Please track the fix in #7041.


Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 2026-04-14 07:22:13 +00:00
Reference: cleveragents/cleveragents-core#9079