TDD: MCPToolAdapter holds RLock during entire transport call, blocking concurrent operations #10510

Open
opened 2026-04-18 10:24:33 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: tdd/mcp-adapter-lock-held-during-transport
  • Commit Message: TDD: Add test for RLock held during transport call blocking concurrent operations
  • Related Bug: (to be linked after bug issue is created)

Background and Context

MCPToolAdapter.invoke() and MCPToolAdapter.discover_tools() hold self._lock (an RLock) for the entire duration of the transport call. Since self._transport.call() is a blocking I/O operation (network call to MCP server), this serializes all concurrent operations and blocks health checks during tool invocations.

Expected Behavior

Concurrent calls to invoke() and discover_tools() should be able to run in parallel (or at least not be serialized by the lock held during I/O). The lock should only protect shared state access, not the entire duration of the blocking transport call.
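A minimal sketch of the narrower lock scope described above. It assumes the transport itself is safe to call from several threads at once. `SlowTransport`, `NarrowLockAdapter`, `run_both`, and the `"tools/list"` method name are hypothetical stand-ins for illustration, not the actual code in adapter.py.

```python
import threading
import time
from typing import Any


class SlowTransport:
    """Hypothetical stand-in for the MCP transport; sleeps to mimic network I/O."""

    def __init__(self, delay: float) -> None:
        self.delay = delay

    def call(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        time.sleep(self.delay)
        return {"method": method, "params": params}


class NarrowLockAdapter:
    """Illustrative adapter (not the real MCPToolAdapter) with a narrow lock scope."""

    def __init__(self, transport: SlowTransport) -> None:
        self._lock = threading.RLock()
        self._transport = transport
        self._connected = True
        self._tools: dict[str, Any] = {"echo": {}}
        self._last_discovery: dict[str, Any] = {}

    def _snapshot_transport(self) -> SlowTransport:
        # Shared state is read under the lock, which is released on return.
        with self._lock:
            if not self._connected:
                raise RuntimeError("adapter is not connected")
            return self._transport

    def invoke(self, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        with self._lock:
            if tool_name not in self._tools:
                raise KeyError(tool_name)
        transport = self._snapshot_transport()
        # Blocking I/O happens with the lock released.
        return transport.call("tools/call", {"name": tool_name, "arguments": arguments})

    def discover_tools(self) -> dict[str, Any]:
        transport = self._snapshot_transport()
        result = transport.call("tools/list", {})
        # The lock is re-acquired only to publish the result to shared state.
        with self._lock:
            self._last_discovery = result
        return result


def run_both(adapter: NarrowLockAdapter) -> float:
    """Run invoke() and discover_tools() in two threads; return elapsed seconds."""
    threads = [
        threading.Thread(target=adapter.invoke, args=("echo", {})),
        threading.Thread(target=adapter.discover_tools),
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


elapsed = run_both(NarrowLockAdapter(SlowTransport(delay=0.3)))
print(f"two 0.3s calls took {elapsed:.2f}s")
```

With the lock released during I/O, the two 0.3s calls overlap instead of adding up. The trade-off is that shared state can change while a call is in flight (for example, a disconnect() arriving mid-call), so a real fix has to decide how to handle that case.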

Summary

This TDD issue captures the concurrency defect in MCPToolAdapter.invoke() and MCPToolAdapter.discover_tools() where the adapter's RLock is held for the entire duration of the transport call. This serializes all concurrent operations and blocks health checks during tool invocations.

Root Cause

In src/cleveragents/mcp/adapter.py, both invoke() and discover_tools() hold self._lock across the entire self._transport.call() invocation:

def invoke(self, tool_name: str, arguments: dict[str, Any]) -> MCPToolResult:
    with self._lock:                          # ← lock acquired
        if not self._connected:
            ...
        descriptor = self._tools.get(tool_name)
        ...
        start = time.monotonic()
        try:
            result = self._transport.call(    # ← slow I/O inside lock!
                "tools/call",
                {"name": tool_name, "arguments": arguments},
            )
        ...
        return MCPToolResult(...)             # ← lock released here

Since self._transport.call() is a blocking I/O operation (network call to MCP server), holding the lock during this call means:

  • No concurrent tool invocations are possible
  • disconnect() is blocked while a tool call is in progress
  • Health check _check_health() → discover_tools() is blocked during any tool invocation
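The blocking effect listed above can be reproduced in isolation. The sketch below uses a bare `threading.RLock` and `time.sleep` in place of the adapter and transport. `invoke_holding_lock` and `try_disconnect` are hypothetical helpers that only mimic the locking pattern, not the real methods.

```python
import threading
import time

lock = threading.RLock()
io_started = threading.Event()


def invoke_holding_lock(delay: float) -> None:
    """Mimics the reported invoke(): the lock is held across the blocking call."""
    with lock:
        io_started.set()
        time.sleep(delay)  # stands in for self._transport.call(...)


def try_disconnect(timeout: float) -> bool:
    """Mimics a method that needs the same lock; True if it got the lock in time."""
    acquired = lock.acquire(timeout=timeout)
    if acquired:
        lock.release()
    return acquired


worker = threading.Thread(target=invoke_holding_lock, args=(0.3,))
worker.start()
io_started.wait()  # the worker now owns the lock and is "doing I/O"

# An RLock is reentrant only for its owning thread, so this thread must wait.
blocked = not try_disconnect(timeout=0.05)
worker.join()
free_after = try_disconnect(timeout=0.05)

print(f"blocked during call: {blocked}; acquirable afterwards: {free_after}")
```

The reentrancy of an `RLock` does not help here: it only lets the owning thread re-acquire the lock, so any other thread stays blocked until the transport call returns.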

Test to Write

A test tagged with @tdd_issue, @tdd_issue_<N>, and @tdd_expected_fail that:

  1. Creates an MCPToolAdapter with a slow mock transport (e.g., sleeps 0.5s in call())
  2. Starts two concurrent threads: one calling invoke(), one calling discover_tools()
  3. Asserts that both calls complete in well under 1 second of total wall-clock time, close to the 0.5s of a single call (i.e., they run concurrently, not serially)
  4. The test should currently fail because the lock serializes them (total time ≈ 1s, not ≈ 0.5s)
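The steps above might be sketched roughly as follows. Since the real `MCPToolAdapter` constructor and the test framework behind the `@tdd_*` tags are not shown in this issue, the sketch uses a hypothetical `LockHoldingAdapter` that reproduces the reported locking pattern and leaves tagging out. The 0.2s delay and the 1.5× threshold are illustrative assumptions.

```python
import threading
import time
from typing import Any

DELAY = 0.2  # seconds the mock transport sleeps per call (assumed value)


class SlowMockTransport:
    """Mock transport whose call() blocks for DELAY seconds."""

    def call(self, method: str, params: dict[str, Any]) -> dict[str, Any]:
        time.sleep(DELAY)
        return {}


class LockHoldingAdapter:
    """Hypothetical stand-in reproducing the reported pattern.

    A real test would construct MCPToolAdapter with the mock transport instead.
    """

    def __init__(self, transport: SlowMockTransport) -> None:
        self._lock = threading.RLock()
        self._transport = transport

    def invoke(self, tool_name: str, arguments: dict[str, Any]) -> dict[str, Any]:
        with self._lock:  # lock held across the blocking call
            return self._transport.call(
                "tools/call", {"name": tool_name, "arguments": arguments}
            )

    def discover_tools(self) -> dict[str, Any]:
        with self._lock:  # same lock, so this waits for invoke() to finish
            return self._transport.call("tools/list", {})


def measure_concurrent_calls(adapter: Any) -> float:
    """Start invoke() and discover_tools() together; return wall-clock seconds."""
    threads = [
        threading.Thread(target=adapter.invoke, args=("echo", {})),
        threading.Thread(target=adapter.discover_tools),
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return time.monotonic() - start


def calls_overlap(adapter: Any) -> bool:
    """The condition the TDD test would assert: clearly faster than serial."""
    return measure_concurrent_calls(adapter) < 1.5 * DELAY


overlaps = calls_overlap(LockHoldingAdapter(SlowMockTransport()))
print(f"calls overlap: {overlaps}")
```

Placing the threshold midway between one delay and two leaves headroom on both sides, which should make the check less sensitive to scheduler jitter than a threshold equal to the serialized time.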

Subtasks

  • Write a concurrent test using a slow mock transport (sleep in call())
  • Measure wall-clock time for concurrent invoke() + discover_tools() calls
  • Assert that concurrent calls complete faster than sequential calls
  • Tag the test with @tdd_issue, @tdd_issue_<N>, and @tdd_expected_fail
  • Verify the test fails (proving the bug exists)
  • Create tdd/mcp-adapter-lock-held-during-transport branch and open PR

Acceptance Criteria

  • Test demonstrates that concurrent invoke() and discover_tools() calls are serialized (proving the bug)
  • Test is tagged with @tdd_issue, @tdd_issue_<N>, and @tdd_expected_fail
  • Test passes CI with @tdd_expected_fail tag (inverted result)

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor
Tag: [AUTO-BUG-7]


Automated by CleverAgents Bot
Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-18 10:27:39 +00:00
Author
Owner

[GROOMED] Quality Analysis Complete

Issue Assessment: VALID & ACTIONABLE

This is a Priority/Critical TDD issue describing a real concurrency defect in MCPToolAdapter where an RLock is held during the entire blocking transport call, serializing concurrent operations and blocking health checks.

Triage Findings

Current State:

  • State: State/Unverified (requires transition to State/Verified)
  • Type: Type/Testing (correct for TDD issue)
  • Priority: Priority/Critical (appropriate for concurrency blocker)
  • Milestone: v3.5.0 (Autonomy Hardening) — ASSIGNED
  • Assignee: None (awaiting assignment)

Quality Checks

| Check | Status | Details |
|-------|--------|---------|
| Validity | PASS | Well-documented concurrency defect with clear root cause |
| Required Labels | PASS | Has State/, Type/, and Priority/ labels |
| Milestone | PASS | Assigned to v3.5.0 (Autonomy Hardening) |
| Orphan Status | PASS | Belongs to v3.5.0 milestone |
| Completeness | PASS | Includes metadata, background, root cause, test requirements, subtasks, DoD |

Issue Summary

Problem: MCPToolAdapter.invoke() and discover_tools() hold self._lock (RLock) for the entire duration of self._transport.call(), a blocking I/O operation. This serializes all concurrent operations and blocks health checks during tool invocations.

Impact:

  • No concurrent tool invocations possible
  • disconnect() blocked during tool calls
  • Health checks blocked during any tool invocation

Solution Approach: TDD. Write a test demonstrating the bug, then narrow the lock scope so that it protects only shared state access, not the entire I/O operation.

Actions Taken

Milestone Assignment: Assigned to v3.5.0 (Autonomy Hardening)
⚠️ Label Update: State/Unverified → State/Verified (requires manual update due to API restrictions)

Recommendation

READY FOR DEVELOPMENT — This issue is valid, well-scoped, and ready for a developer to:

  1. Create the tdd/mcp-adapter-lock-held-during-transport branch
  2. Write the concurrent test with slow mock transport
  3. Verify the test fails (proving the bug)
  4. Submit PR with @tdd_expected_fail tag

Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

Reference
cleveragents/cleveragents-core#10510