BUG-HUNT: [resource] InlineToolExecutor._run_with_timeout calls proc.wait() after proc.kill() — deadlock when subprocess produces large output before timeout #6590

Open
opened 2026-04-09 21:52:34 +00:00 by HAL9000 · 0 comments
Owner

Bug Report: Resource — proc.wait() After proc.kill() Can Deadlock

Severity Assessment

  • Impact: The parent process hangs indefinitely when an inline tool produces enough stdout/stderr output to fill the OS pipe buffer before the timeout fires. This prevents the executor from returning and blocks the entire calling thread permanently.
  • Likelihood: Medium — any inline tool that writes more than ~64 KB of output (the typical Linux pipe buffer) before its timeout will trigger this. Long-running or verbose tools are particularly susceptible.
  • Priority: High
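
The ~64 KB figure above can be checked on a given host. The following is a minimal sketch, not part of the executor: `measure_pipe_capacity` is a hypothetical helper, and it assumes a POSIX system where `os.set_blocking` works on pipe descriptors. It fills an empty pipe with non-blocking writes and reports how many bytes fit before a write would block.

```python
import os


def measure_pipe_capacity(chunk_size: int = 4096) -> int:
    """Return how many bytes fit in an empty OS pipe before a write would block.

    Hypothetical helper for illustration; assumes a POSIX platform.
    """
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking mode turns "pipe is full" into BlockingIOError
        # instead of a blocked write() call.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(f"pipe capacity: {measure_pipe_capacity()} bytes")
```

The result depends on the operating system and its configuration, so the threshold in the likelihood estimate is best read as an order of magnitude.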

Location

  • File: src/cleveragents/skills/inline_executor.py
  • Function: InlineToolExecutor._run_with_timeout
  • Lines: ~344–357

Description

When proc.communicate() raises subprocess.TimeoutExpired, the code kills the child process with proc.kill() and then calls proc.wait() to reap the zombie:

# inline_executor.py  lines 344–357
except subprocess.TimeoutExpired:
    proc.kill()
    proc.wait()       # ← DEADLOCK
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return InlineToolResult(
        success=False,
        error_message=(
            f"Execution timed out after {self._max_runtime_seconds}s"
        ),
        duration_ms=elapsed_ms,
    )

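The process was started with stdout=subprocess.PIPE, stderr=subprocess.PIPE. proc.wait() only waits for the process to exit; it does not read from or close the pipes. A writer that fills the OS pipe buffer blocks in write() until the parent reads, which is the situation the Popen.wait() documentation warns about. Note for verification: SIGKILL normally terminates a direct child even while it is blocked in write(), so the indefinite hang still needs a reproduction. What is certain is that wait() leaves the unread output and the open pipe file descriptors behind, whereas communicate() drains and closes them.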

The Python documentation for Popen.wait() explicitly warns:

    Warning: This will deadlock when using stdout=PIPE or stderr=PIPE and the child process generates enough output to fill a pipe buffer. Use Popen.communicate() when using pipes to avoid that.
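
The documented hazard can be observed in isolation. The sketch below is an illustration under stated assumptions, not code from inline_executor.py: `NOISY_CHILD` and `wait_then_communicate` are hypothetical names, and the child is simply a Python one-liner that writes 1 MiB to stdout. The child is not killed here. The sketch only shows that `wait()` does not return while a live child is blocked on a full pipe, and that `communicate()` then drains the pipe so the child can finish.

```python
import subprocess
import sys

# Hypothetical child: writes 1 MiB to stdout, far more than a typical pipe buffer.
NOISY_CHILD = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]


def wait_then_communicate(wait_timeout: float = 1.0):
    """Call wait() with a timeout first, then communicate(), and report what happened."""
    proc = subprocess.Popen(
        NOISY_CHILD, stdout=subprocess.PIPE, stderr=subprocess.PIPE
    )
    try:
        # Nobody reads the pipe, so the child blocks in write() and cannot exit.
        proc.wait(timeout=wait_timeout)
        wait_timed_out = False
    except subprocess.TimeoutExpired:
        wait_timed_out = True
    # communicate() reads both pipes to EOF, which unblocks the child.
    stdout_bytes, _ = proc.communicate()
    return wait_timed_out, len(stdout_bytes), proc.returncode


if __name__ == "__main__":
    print(wait_then_communicate())
```

The timeout on `wait()` is there only so the demonstration terminates. Without it, the call would wait for as long as the child stays blocked.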

Evidence

proc = subprocess.Popen(
    [...],
    stdout=subprocess.PIPE,  # pipe buffer can fill up
    stderr=subprocess.PIPE,
    ...
)
try:
    stdout_bytes, stderr_bytes = proc.communicate(
        timeout=self._max_runtime_seconds
    )
except subprocess.TimeoutExpired:
    proc.kill()
    proc.wait()   # ← documented deadlock risk: pipes not drained

Expected Behavior

After killing the timed-out child process, the executor should drain the pipes and return the timeout error promptly.

Actual Behavior

proc.wait() blocks indefinitely if the child process filled the pipe buffer before being killed.

Suggested Fix

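Replace proc.wait() with proc.communicate() after the kill. communicate() reads both pipes to EOF, closes them, and then waits for the child, so no unread output or open pipe is left behind. The Python docs for Popen.communicate() show this kill-then-communicate pattern for handling TimeoutExpired: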

except subprocess.TimeoutExpired:
    proc.kill()
    # Drain pipes to prevent deadlock; discard output
    proc.communicate()
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return InlineToolResult(
        success=False,
        error_message=(
            f"Execution timed out after {self._max_runtime_seconds}s"
        ),
        duration_ms=elapsed_ms,
    )
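
For reference, the same pattern is sketched below as a self-contained function that can be run outside the code base. It is not the project's implementation: `ToolResult` is a hypothetical stand-in for InlineToolResult, and `run_with_timeout` is a simplified, free-standing approximation of `_run_with_timeout` that takes the command and time limit as arguments.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class ToolResult:
    """Hypothetical stand-in for InlineToolResult."""
    success: bool
    error_message: str
    duration_ms: float


def run_with_timeout(cmd, max_runtime_seconds: float) -> ToolResult:
    """Run cmd with captured output; on timeout, kill it and drain the pipes."""
    start = time.monotonic()
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    try:
        proc.communicate(timeout=max_runtime_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Drain and close both pipes and reap the child; output is discarded.
        proc.communicate()
        elapsed_ms = (time.monotonic() - start) * 1000.0
        return ToolResult(
            success=False,
            error_message=f"Execution timed out after {max_runtime_seconds}s",
            duration_ms=elapsed_ms,
        )
    elapsed_ms = (time.monotonic() - start) * 1000.0
    return ToolResult(
        success=proc.returncode == 0, error_message="", duration_ms=elapsed_ms
    )


if __name__ == "__main__":
    # Hypothetical verbose tool: prints about 200 KB, then sleeps past the limit.
    noisy = [
        sys.executable,
        "-c",
        "import sys, time; sys.stdout.write('x' * 200000); "
        "sys.stdout.flush(); time.sleep(30)",
    ]
    print(run_with_timeout(noisy, 0.5))
```

One design point to weigh: `communicate()` returns only when both pipes reach EOF. If a tool spawns grandchildren that inherit the pipes and outlive the kill, the second `communicate()` could itself take a long time, so passing it a short timeout may be worth considering.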

Category

resource

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

HAL9000 added this to the v3.2.0 milestone 2026-04-09 22:13:19 +00:00
Reference
cleveragents/cleveragents-core#6590