BUG-HUNT: [concurrency] MCPToolAdapter.discover_tools() holds global RLock during blocking transport call #7753

Open
opened 2026-04-12 03:24:47 +00:00 by HAL9000 · 3 comments
Owner

Bug Report: Concurrency — MCPToolAdapter.discover_tools() Holds Global RLock During Blocking Transport Call

Severity Assessment

  • Impact: The entire discover_tools() body runs under self._lock, including the blocking network call self._transport.call("tools/list", {}). This blocks all other adapter operations (including is_connected, capabilities, and invoke()) for the full duration of the discovery network round-trip. Health monitoring and concurrent tool invocations are serialized.
  • Likelihood: High — discover_tools() is called on startup, on reconnect, and repeatedly by the health-check timer (McpClient._check_health calls it at line 433). Any concurrent invoke() or property access blocks.
  • Priority: High

Location

  • File: src/cleveragents/mcp/adapter.py
  • Function/Class: MCPToolAdapter.discover_tools
  • Lines: 408–449

Description

The discover_tools() method acquires self._lock at line 424 and holds it until line 449, spanning the blocking call self._transport.call("tools/list", {}) at line 432. This is a network round-trip with indeterminate latency.

Consequences:

  1. If a background health-check timer fires during an active tool invocation, _check_health calls discover_tools(), which tries to acquire self._lock. Since invoke() already holds it (see related bug), discover_tools() blocks until invoke() completes.
  2. Conversely, if a health-check discovery is in progress, any invoke() call blocks.
  3. The self._tools dict is replaced atomically at line 448, but the replacement shares one critical section with the network call, so no reads of _tools can interleave with discovery: readers wait for the full network round-trip rather than only for the swap itself.
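
The blocking described in these consequences can be reproduced with a small model. This is a minimal sketch, not the real adapter: `SlowTransport`, `LockedAdapter`, and `measure_property_wait` are hypothetical names, and the sleep stands in for network latency.

```python
import threading
import time


class SlowTransport:
    """Hypothetical stand-in for the real transport; sleeps to mimic latency."""

    def __init__(self, delay):
        self.delay = delay

    def call(self, method, params):
        time.sleep(self.delay)
        return {"tools": [{"name": "echo"}]}


class LockedAdapter:
    """Simplified model of the reported pattern, not the real MCPToolAdapter."""

    def __init__(self, transport):
        self._lock = threading.RLock()
        self._transport = transport
        self._connected = True
        self._tools = {}

    @property
    def is_connected(self):
        with self._lock:
            return self._connected

    def discover_tools(self):
        with self._lock:  # held across the blocking call, as in the report
            result = self._transport.call("tools/list", {})
            self._tools = {t["name"]: t for t in result.get("tools", [])}
            return list(self._tools)


def measure_property_wait(adapter, delay_before_read=0.05):
    """Start discovery in a thread, then time one is_connected read."""
    worker = threading.Thread(target=adapter.discover_tools)
    worker.start()
    time.sleep(delay_before_read)  # give the worker time to take the lock
    start = time.monotonic()
    _ = adapter.is_connected
    waited = time.monotonic() - start
    worker.join()
    return waited


if __name__ == "__main__":
    waited = measure_property_wait(LockedAdapter(SlowTransport(0.3)))
    print(f"is_connected waited {waited:.2f}s")
```

In this model the property read waits for roughly the remainder of the simulated round-trip, which is the serialization the report describes.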

Evidence

# src/cleveragents/mcp/adapter.py, lines 408-449 (abridged; lock held from line 424 to 449)
def discover_tools(self, tool_filter=None):
    with self._lock:                                          # lock acquired
        if not self._connected:
            raise RuntimeError(...)

        result = self._transport.call("tools/list", {})      # BLOCKING network I/O
        raw_tools = result.get("tools", [])

        descriptors = []
        for raw in raw_tools:
            desc = MCPToolDescriptor(...)
            descriptors.append(desc)

        if tool_filter:
            descriptors = self._apply_filter(descriptors, tool_filter)

        self._tools = {d.name: d for d in descriptors}       # state update
        return descriptors
    # lock released

Expected Behavior

The blocking network call should happen outside self._lock. Only the state mutation (self._tools = ...) requires the lock.

Actual Behavior

The blocking network call executes under self._lock, preventing any concurrent adapter operations.

Suggested Fix

def discover_tools(self, tool_filter=None):
    # Hold the lock only long enough to check connection state.
    with self._lock:
        if not self._connected:
            raise RuntimeError(...)

    # Lock released: the blocking network call no longer stalls other callers.
    # The connection may drop after the check above, so the transport call
    # must still be allowed to fail on its own.
    result = self._transport.call("tools/list", {})
    raw_tools = result.get("tools", [])
    descriptors = [MCPToolDescriptor(...) for raw in raw_tools]
    if tool_filter:
        descriptors = self._apply_filter(descriptors, tool_filter)

    # Re-acquire only for the state mutation.
    with self._lock:
        self._tools = {d.name: d for d in descriptors}
    return descriptors
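
To illustrate the effect of the suggested fix, the sketch below applies the same check, call, then swap structure to a simplified model. `SlowTransport`, `UnlockedIOAdapter`, and `measure_property_wait` are hypothetical names used only for illustration, and the tool payload is invented.

```python
import threading
import time


class SlowTransport:
    """Hypothetical stand-in for the real transport; sleeps to mimic latency."""

    def __init__(self, delay):
        self.delay = delay

    def call(self, method, params):
        time.sleep(self.delay)
        return {"tools": [{"name": "echo"}]}


class UnlockedIOAdapter:
    """Simplified model of the suggested fix, not the real MCPToolAdapter."""

    def __init__(self, transport):
        self._lock = threading.RLock()
        self._transport = transport
        self._connected = True
        self._tools = {}

    @property
    def is_connected(self):
        with self._lock:
            return self._connected

    def discover_tools(self):
        with self._lock:  # short critical section: state check only
            if not self._connected:
                raise RuntimeError("adapter is not connected")
            transport = self._transport
        # Blocking call runs with the lock released.
        result = transport.call("tools/list", {})
        tools = {t["name"]: t for t in result.get("tools", [])}
        with self._lock:  # short critical section: swap in the new dict
            self._tools = tools
        return list(tools)


def measure_property_wait(adapter, delay_before_read=0.05):
    """Start discovery in a thread, then time one is_connected read."""
    worker = threading.Thread(target=adapter.discover_tools)
    worker.start()
    time.sleep(delay_before_read)
    start = time.monotonic()
    _ = adapter.is_connected
    waited = time.monotonic() - start
    worker.join()
    return waited


if __name__ == "__main__":
    waited = measure_property_wait(UnlockedIOAdapter(SlowTransport(0.3)))
    print(f"is_connected waited {waited:.3f}s")
```

One design point this sketch does not address: with the lock released during the call, two overlapping discoveries can finish out of order, so the last writer wins. Whether that matters for the real adapter is not stated in this report.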

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD.
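
As a possible starting point for that testing issue, a regression check could probe the lock from a second thread while the transport call is in flight. This is only a sketch under assumptions: `LockProbeTransport`, `discover_holding_lock`, `discover_releasing_lock`, and `lock_free_during_call` are hypothetical, and a real test would need access to the adapter's lock and an injectable transport. The probe must run in a separate thread because an RLock can always be re-acquired by the thread that already owns it.

```python
import threading


class LockProbeTransport:
    """Hypothetical test double: records whether the lock is free during call()."""

    def __init__(self, lock):
        self.lock = lock
        self.lock_was_free = None

    def call(self, method, params):
        def probe():
            # Runs in another thread, so RLock reentrancy does not apply.
            acquired = self.lock.acquire(timeout=0.2)
            if acquired:
                self.lock.release()
            self.lock_was_free = acquired

        thread = threading.Thread(target=probe)
        thread.start()
        thread.join()
        return {"tools": []}


def discover_holding_lock(lock, transport):
    """Models the reported behavior: lock held across the transport call."""
    with lock:
        return transport.call("tools/list", {})


def discover_releasing_lock(lock, transport):
    """Models the suggested fix: lock released before the transport call."""
    with lock:
        pass  # the connection check would go here
    return transport.call("tools/list", {})


def lock_free_during_call(discover):
    """Return True if another thread could take the lock during call()."""
    lock = threading.RLock()
    transport = LockProbeTransport(lock)
    discover(lock, transport)
    return transport.lock_was_free


if __name__ == "__main__":
    print("buggy pattern, lock free:", lock_free_during_call(discover_holding_lock))
    print("fixed pattern, lock free:", lock_free_during_call(discover_releasing_lock))
```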


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

HAL9000 added this to the v3.2.0 milestone 2026-04-12 03:41:52 +00:00
Author
Owner

Verified — Concurrency bug: MCPToolAdapter holds global RLock during blocking I/O — deadlock risk. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Concurrency bug: MCPToolAdapter holds global RLock during blocking I/O — deadlock risk. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Concurrency bug: MCPToolAdapter holds global RLock during blocking I/O — deadlock risk. MoSCoW: Must-have. Priority: High.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#7753