[Bug Hunt][Cycle 2][A2A] Race condition in handler map caching could cause crashes #7041

Open
opened 2026-04-10 07:23:28 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: [Concurrency] — Race condition in handler map caching could cause crashes

Severity Assessment

  • Impact: System crashes, inconsistent request routing, potential data corruption
  • Likelihood: Medium - occurs under concurrent load when services are being registered
  • Priority: Critical

Location

  • File: src/cleveragents/a2a/facade.py
  • Function/Class: A2aLocalFacade.register_service() and _handlers()
  • Lines: 148-154 and 273-330

Description

The A2aLocalFacade class uses a cached handler map (_handler_map) that is invalidated by setting it to None in register_service() and rebuilt in _handlers(). This implementation is not thread-safe and can cause race conditions when multiple threads access the facade concurrently.

Evidence

def register_service(self, name: str, service: Any) -> None:
    """Register a named service for operation routing."""
    if not name or not isinstance(name, str):
        raise ValueError("name must be a non-empty string")
    self._services[name] = service
    # PERF-1 fix: invalidate cached handler map so new service
    # wiring is picked up on the next dispatch.
    self._handler_map = None  # ← NOT THREAD SAFE
    logger.debug("a2a.local.service_registered", service_name=name)

def _handlers(self) -> dict[str, Any]:
    """Return the cached operation -> handler mapping."""
    if self._handler_map is None:  # ← RACE CONDITION HERE
        self._handler_map = {
            # ... build handler map
        }
    return self._handler_map

Expected Behavior

Handler map caching should be thread-safe with proper synchronization to prevent race conditions during concurrent access.

Actual Behavior

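Multiple threads can observe _handler_map as None at the same time and each rebuild the handler map. More seriously, _handlers() reads _handler_map twice (once for the None check, once for the return), so a register_service() call that lands between the two reads makes _handlers() return None to its caller. A rebuild that started before a registration can also overwrite the invalidation and leave a stale map cached. Either outcome can surface as a crash or as inconsistent request routing.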
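One of these interleavings can be reproduced deterministically. The sketch below is not the project's code: `RacyFacade` is a hypothetical stand-in that mirrors only the caching logic from the Evidence section, the build step is replaced by a plain copy of the registered services, and the two `threading.Event` hooks exist solely to force the ordering.

```python
import threading
from typing import Any


class RacyFacade:
    """Hypothetical stand-in for A2aLocalFacade; only the caching logic is mirrored."""

    def __init__(self) -> None:
        self._services: dict[str, Any] = {}
        self._handler_map: dict[str, Any] | None = None
        # Test-only hooks used to force a specific thread interleaving.
        self.build_started = threading.Event()
        self.resume_build = threading.Event()

    def register_service(self, name: str, service: Any) -> None:
        self._services[name] = service
        self._handler_map = None  # unsynchronized invalidation, as in the report

    def _handlers(self) -> dict[str, Any]:
        if self._handler_map is None:
            # Placeholder build step: snapshot the services known right now.
            snapshot = dict(self._services)
            self.build_started.set()
            self.resume_build.wait(timeout=5)  # pause in the middle of the rebuild
            self._handler_map = snapshot  # overwrites a newer invalidation
        return self._handler_map


def demonstrate_lost_invalidation() -> dict[str, Any]:
    facade = RacyFacade()
    facade.register_service("alpha", object())

    # The worker starts a rebuild and pauses after taking its snapshot.
    worker = threading.Thread(target=facade._handlers)
    worker.start()
    facade.build_started.wait(timeout=5)

    # A registration arrives while the rebuild is still in flight.
    facade.register_service("beta", object())
    facade.resume_build.set()
    worker.join(timeout=5)

    # The cache now holds the stale snapshot, so "beta" is never picked up.
    return facade._handlers()


stale_map = demonstrate_lost_invalidation()
print(sorted(stale_map))  # ['alpha']
```

The late registration is lost because the worker stores its snapshot after the invalidation has already happened, and nothing invalidates the cache again.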

Suggested Fix

Implement proper thread synchronization using a lock or use atomic operations for handler map management:

import threading

def __init__(self, services: dict[str, Any] | None = None) -> None:
    # ... existing code ...
    self._handler_map_lock = threading.RLock()

def register_service(self, name: str, service: Any) -> None:
    with self._handler_map_lock:
        # ... existing validation ...
        self._services[name] = service
        self._handler_map = None

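def _handlers(self) -> dict[str, Any]:
    # Read the attribute once: a concurrent register_service() may reset it
    # to None at any time, so never return a second read of self._handler_map.
    handler_map = self._handler_map
    if handler_map is None:
        with self._handler_map_lock:
            handler_map = self._handler_map
            if handler_map is None:  # Double-checked locking pattern
                handler_map = {
                    # ... build handler map
                }
                self._handler_map = handler_map
    return handler_map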
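For reference, a self-contained sketch of the lock-based approach with a small stress check. `LockedFacade` and `stress` are hypothetical names, and the build step is again a placeholder copy of the registered services because the real mapping is elided in this report. The sketch reads the cached reference into a local variable once, so the value returned is always the value that was checked.

```python
import threading
from typing import Any


class LockedFacade:
    """Hypothetical stand-in for A2aLocalFacade with a lock-guarded cache."""

    def __init__(self) -> None:
        self._services: dict[str, Any] = {}
        self._handler_map: dict[str, Any] | None = None
        self._handler_map_lock = threading.RLock()

    def register_service(self, name: str, service: Any) -> None:
        if not name or not isinstance(name, str):
            raise ValueError("name must be a non-empty string")
        with self._handler_map_lock:
            # Mutation and invalidation happen atomically with respect to rebuilds.
            self._services[name] = service
            self._handler_map = None

    def _handlers(self) -> dict[str, Any]:
        handler_map = self._handler_map  # single unlocked read (fast path)
        if handler_map is None:
            with self._handler_map_lock:
                handler_map = self._handler_map  # re-check under the lock
                if handler_map is None:
                    # Placeholder build step; the real mapping is elided in the issue.
                    handler_map = dict(self._services)
                    self._handler_map = handler_map
        return handler_map


def stress(facade: LockedFacade, registrations: int = 200, readers: int = 4):
    """Register services while reader threads call _handlers() in a loop."""
    none_results: list[None] = []
    stop = threading.Event()

    def read_loop() -> None:
        while not stop.is_set():
            if facade._handlers() is None:
                none_results.append(None)

    threads = [threading.Thread(target=read_loop) for _ in range(readers)]
    for thread in threads:
        thread.start()
    for index in range(registrations):
        facade.register_service(f"svc-{index}", object())
    stop.set()
    for thread in threads:
        thread.join()
    return none_results, facade._handlers()


none_results, final_map = stress(LockedFacade())
print(len(none_results), len(final_map))
```

Because the rebuild and the invalidation both run under the same lock, a rebuild can no longer store a snapshot that predates a registration, and readers never receive None. A passing stress run does not prove the absence of races; it only fails to find one.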

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

Author
Owner

Verified — Critical concurrency bug: race condition in handler map caching. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7041