BUG-HUNT: [concurrency] McpClient._ensure_started() releases lock between check and start — double-start race condition #7436

Open
opened 2026-04-10 19:19:32 +00:00 by HAL9000 · 3 comments
Owner

Bug Report: Concurrency — McpClient Double-Start Race Condition

Severity Assessment

  • Impact: Two concurrent call_tool() calls can both observe _started = False, both call start(), causing double adapter connection, double tool discovery, and potentially duplicate timer registration. This can cause the MCP server to receive duplicate connection requests, double-registering health check timers.
  • Likelihood: Medium — any multi-threaded code calling call_tool() concurrently before the first start completes
  • Priority: High

Location

  • File: src/cleveragents/mcp/client.py
  • Function: McpClient._ensure_started()
  • Lines: 198–215

Description

The _ensure_started() method has a classic TOCTOU race:

def _ensure_started(self) -> None:
    with self._lock:
        if self._started:
            return
        if not self._config.lazy_start:
            raise RuntimeError(...)
    # ← LOCK IS RELEASED HERE before calling start()!
    
    self.start()   # ← Thread A and Thread B both reach here!

The lock is released between checking self._started and calling self.start(). Two threads that both see _started = False can both proceed to call start().

The start() method has a similar issue:

def start(self) -> None:
    with self._lock:
        if self._started:  # ← Thread B checks here AFTER Thread A has locked, but...
            return
        self._state = McpClientState.STARTING
    # ← Lock released again!
    
    try:
        self._adapter.connect()     # ← Both threads may reach here
        self._adapter.discover_tools()
    except Exception:
        ...
    
    with self._lock:
        self._started = True        # ← Both threads set this True
        ...

The problem is that start() exits the lock before _adapter.connect(), allowing two concurrent start() calls to both begin connecting.

Evidence

# src/cleveragents/mcp/client.py

def _ensure_started(self) -> None:
    with self._lock:
        if self._started:
            return
        if not self._config.lazy_start:
            raise RuntimeError(...)
    # ← LOCK RELEASED — race window opens here
    self.start()  # ← Both Thread A and Thread B can reach this

def start(self) -> None:
    with self._lock:
        if self._started:
            return
        self._state = McpClientState.STARTING
    # ← LOCK RELEASED — another race window!
    
    try:
        self._adapter.connect()       # ← Double connect possible!
        self._adapter.discover_tools()
    except Exception:
        ...

Expected Behavior

Only one thread should initiate the start sequence; all other concurrent callers should wait until the first start completes.

Actual Behavior

Multiple threads can race through _ensure_started()start()_adapter.connect() simultaneously.

Suggested Fix

Use a separate "starting" event to serialize concurrent starts:

def __init__(self, ...):
    self._starting_event = threading.Event()
    self._starting_event.set()  # Not starting initially
    
def start(self) -> None:
    with self._lock:
        if self._started:
            return
        if not self._starting_event.is_set():
            # Another thread is already starting — wait for it
            pass  # fall through to wait below
        self._starting_event.clear()  # Signal "starting in progress"
        self._state = McpClientState.STARTING
    
    # Only one thread proceeds here; others waiting
    try:
        self._adapter.connect()
        ...
    finally:
        self._starting_event.set()  # Signal done (success or failure)

Or simpler: hold the lock across the entire connect+discover sequence (ensure connect/discover are non-blocking/timeoutted).

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_, @tdd_expected_fail.


Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: Concurrency — McpClient Double-Start Race Condition ### Severity Assessment - **Impact**: Two concurrent `call_tool()` calls can both observe `_started = False`, both call `start()`, causing double adapter connection, double tool discovery, and potentially duplicate timer registration. This can cause the MCP server to receive duplicate connection requests, double-registering health check timers. - **Likelihood**: Medium — any multi-threaded code calling `call_tool()` concurrently before the first start completes - **Priority**: High ### Location - **File**: `src/cleveragents/mcp/client.py` - **Function**: `McpClient._ensure_started()` - **Lines**: 198–215 ### Description The `_ensure_started()` method has a classic TOCTOU race: ```python def _ensure_started(self) -> None: with self._lock: if self._started: return if not self._config.lazy_start: raise RuntimeError(...) # ← LOCK IS RELEASED HERE before calling start()! self.start() # ← Thread A and Thread B both reach here! ``` The lock is released between checking `self._started` and calling `self.start()`. Two threads that both see `_started = False` can both proceed to call `start()`. The `start()` method has a similar issue: ```python def start(self) -> None: with self._lock: if self._started: # ← Thread B checks here AFTER Thread A has locked, but... return self._state = McpClientState.STARTING # ← Lock released again! try: self._adapter.connect() # ← Both threads may reach here self._adapter.discover_tools() except Exception: ... with self._lock: self._started = True # ← Both threads set this True ... ``` The problem is that `start()` exits the lock before `_adapter.connect()`, allowing two concurrent `start()` calls to both begin connecting. ### Evidence ```python # src/cleveragents/mcp/client.py def _ensure_started(self) -> None: with self._lock: if self._started: return if not self._config.lazy_start: raise RuntimeError(...) # ← LOCK RELEASED — race window opens here self.start() # ← Both Thread A and Thread B can reach this def start(self) -> None: with self._lock: if self._started: return self._state = McpClientState.STARTING # ← LOCK RELEASED — another race window! try: self._adapter.connect() # ← Double connect possible! self._adapter.discover_tools() except Exception: ... ``` ### Expected Behavior Only one thread should initiate the start sequence; all other concurrent callers should wait until the first start completes. ### Actual Behavior Multiple threads can race through `_ensure_started()` → `start()` → `_adapter.connect()` simultaneously. ### Suggested Fix Use a separate "starting" event to serialize concurrent starts: ```python def __init__(self, ...): self._starting_event = threading.Event() self._starting_event.set() # Not starting initially def start(self) -> None: with self._lock: if self._started: return if not self._starting_event.is_set(): # Another thread is already starting — wait for it pass # fall through to wait below self._starting_event.clear() # Signal "starting in progress" self._state = McpClientState.STARTING # Only one thread proceeds here; others waiting try: self._adapter.connect() ... finally: self._starting_event.set() # Signal done (success or failure) ``` Or simpler: hold the lock across the entire connect+discover sequence (ensure connect/discover are non-blocking/timeoutted). ### Category concurrency ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_<this-issue-number>, @tdd_expected_fail. --- **Automated by CleverAgents Bot** Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor
HAL9000 added this to the v3.5.0 milestone 2026-04-10 19:34:58 +00:00
HAL9000 self-assigned this 2026-04-11 03:21:07 +00:00
Author
Owner

Verified — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Author
Owner

Verified — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Critical concurrency bug: McpClient double-start race condition. MoSCoW: Must-have. Priority: Critical. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7436
No description provided.