BUG-HUNT: [concurrency] Race condition in StdioTransport.is_alive property causes AttributeError #7179

Open
opened 2026-04-10 08:33:39 +00:00 by HAL9000 · 3 comments
Owner

Metadata

  • Branch: bugfix/m7-stdio-transport-is-alive-race-condition
  • Commit Message: fix(lsp): eliminate race condition in StdioTransport.is_alive by capturing local process reference
  • Milestone: v3.6.0
  • Parent Epic: #824

Bug Report: [Concurrency] — Race condition in StdioTransport.is_alive property causes AttributeError

Severity Assessment

  • Impact: AttributeError crash when multiple threads access LSP transport concurrently
  • Likelihood: High in multi-threaded LSP runtime scenarios
  • Priority: Critical

Location

  • File: src/cleveragents/lsp/transport.py
  • Function/Class: StdioTransport.is_alive property
  • Lines: ~61-63

Description

The is_alive property in StdioTransport has a race condition between checking _process is not None and calling _process.poll(). Another thread can call stop() and set _process = None between these operations, leading to AttributeError on None.poll().

Evidence

@property
def is_alive(self) -> bool:
    return self._process is not None and self._process.poll() is None

If Thread A executes self._process is not None (returns True), then Thread B calls stop() which sets self._process = None, then Thread A continues with self._process.poll() → AttributeError.
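
The interleaving above can be reproduced deterministically without real threads. The sketch below is illustrative only: FakeProcess, RacyTransport and reproduce are hypothetical stand-ins, not code from transport.py. The _process property simulates stop() landing between the two reads by returning the process on the first read and None on every later read.

```python
class FakeProcess:
    """Hypothetical stand-in for subprocess.Popen; poll() is None while running."""

    def poll(self):
        return None


class RacyTransport:
    """Hypothetical minimal transport using the double-read is_alive."""

    def __init__(self) -> None:
        self._proc = FakeProcess()
        self._reads = 0

    @property
    def _process(self):
        # Simulate another thread's stop() between the two reads:
        # the first read sees the process, later reads see None.
        self._reads += 1
        return self._proc if self._reads == 1 else None

    @property
    def is_alive(self) -> bool:
        # Same shape as the property quoted in Evidence: two separate reads.
        return self._process is not None and self._process.poll() is None


def reproduce() -> str:
    """Return the error message raised by the racy property, or 'no error'."""
    try:
        RacyTransport().is_alive
    except AttributeError as exc:
        return str(exc)
    return "no error"
```

Calling reproduce() returns the text of the AttributeError raised on the second read.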

Expected Behavior

The property should safely handle concurrent access without raising AttributeError.

Actual Behavior

Race condition causes AttributeError: 'NoneType' object has no attribute 'poll'

Suggested Fix

Capture a local reference to avoid TOCTOU:

@property
def is_alive(self) -> bool:
    # Read the attribute once so the None check and poll() use the same object.
    process = self._process
    return process is not None and process.poll() is None

Category

concurrency

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.

Background and Context

The StdioTransport class in src/cleveragents/lsp/transport.py manages the lifecycle of LSP server subprocesses. The is_alive property is used throughout the LSP subsystem to check whether the underlying process is still running before performing operations. When the LSP runtime handles operations concurrently, which is the norm, several threads read this property at the same time.

The transport's docstring explicitly notes it is "not thread-safe by itself" and that callers must serialise access. However, the is_alive property is used in guards and checks throughout the codebase, and callers cannot always guarantee serialisation at the point of checking liveness. The race window is small but real and reproducible under load.
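
For reference, caller-side serialisation of the kind the docstring asks for could look roughly like the sketch below. It is an assumption about how a caller might do it, not code from the repository: FakeTransport and SerialisedTransport are hypothetical names, and the wrapper only helps if every call site goes through it, which is the guarantee the paragraph above says callers cannot always give.

```python
import threading


class FakeTransport:
    """Hypothetical stand-in exposing only is_alive and stop()."""

    def __init__(self) -> None:
        self._process = object()

    @property
    def is_alive(self) -> bool:
        return self._process is not None

    def stop(self) -> None:
        self._process = None


class SerialisedTransport:
    """Hypothetical caller-side wrapper: one lock guards every access."""

    def __init__(self, transport: FakeTransport) -> None:
        self._transport = transport
        self._lock = threading.Lock()

    def is_alive(self) -> bool:
        # stop() cannot run while this check holds the lock.
        with self._lock:
            return self._transport.is_alive

    def stop(self) -> None:
        with self._lock:
            self._transport.stop()
```

Any code path that reads transport.is_alive directly bypasses the lock, which is why hardening the property itself is still worthwhile.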

Current Behavior

When Thread A reads self._process is not None (evaluates to True) and Thread B concurrently calls stop() setting self._process = None, Thread A then evaluates self._process.poll() on a None reference, raising:

AttributeError: 'NoneType' object has no attribute 'poll'

This crash propagates up through the LSP lifecycle manager and can destabilise the entire LSP subsystem.

Expected Behavior

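The is_alive property must safely handle concurrent access. Reading self._process once into a local variable before the short-circuit evaluation removes the window in which AttributeError can occur: the None check and the poll() call then operate on the same object, and a local variable belongs to the calling thread's frame, so stop() running on another thread cannot rebind it. The returned value can still be out of date by the time the caller acts on it, so is_alive remains a point-in-time check.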
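
A deterministic way to see this is sketched below, assuming a hypothetical stand-in in which every read of _process after the first returns None, as if stop() ran on another thread immediately after the first read. FakeProcess and SingleReadTransport are illustrative names, not code from transport.py.

```python
class FakeProcess:
    """Hypothetical stand-in for subprocess.Popen; poll() is None while running."""

    def poll(self):
        return None


class SingleReadTransport:
    """Hypothetical minimal transport using the single-read is_alive."""

    def __init__(self) -> None:
        self._proc = FakeProcess()
        self.reads = 0

    @property
    def _process(self):
        # Simulated concurrent stop(): only the first read sees the process.
        self.reads += 1
        return self._proc if self.reads == 1 else None

    @property
    def is_alive(self) -> bool:
        # One read of the attribute; both checks use the captured local.
        process = self._process
        return process is not None and process.poll() is None


transport = SingleReadTransport()
alive = transport.is_alive  # does not raise, although a second read would see None
```

The property reads the attribute exactly once and returns without raising. The answer reflects the moment of the read, so a caller may still see True for a transport that has just been stopped.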

Acceptance Criteria

  • StdioTransport.is_alive captures self._process into a local variable before evaluating
  • No AttributeError is raised when stop() is called concurrently with is_alive access
  • Existing tests continue to pass
  • New Behave scenario demonstrates the race condition is fixed (tagged @tdd_issue, @tdd_issue_<N>)
  • All nox stages pass
  • Coverage >= 97%
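
The concurrency criterion could be exercised with a plain threading harness along the lines of the sketch below. This is an assumption about the test's shape, not the project's actual scenario: FakeProcess, Transport and hammer are hypothetical names, the Behave step wiring and tags are omitted, and the real StdioTransport would replace the stand-in class.

```python
import threading


class FakeProcess:
    """Hypothetical stand-in for subprocess.Popen; poll() is None while running."""

    def poll(self):
        return None


class Transport:
    """Hypothetical transport with the single-read is_alive."""

    def __init__(self) -> None:
        self._process = FakeProcess()

    @property
    def is_alive(self) -> bool:
        process = self._process
        return process is not None and process.poll() is None

    def start(self) -> None:
        self._process = FakeProcess()

    def stop(self) -> None:
        self._process = None


def hammer(iterations: int = 20_000) -> list:
    """Toggle stop()/start() on one thread while another polls is_alive."""
    transport = Transport()
    errors = []
    done = threading.Event()

    def toggler() -> None:
        try:
            for _ in range(iterations):
                transport.stop()
                transport.start()
        finally:
            done.set()  # always release the checker thread

    def checker() -> None:
        while not done.is_set():
            try:
                transport.is_alive
            except AttributeError as exc:
                errors.append(exc)
                return

    threads = [threading.Thread(target=toggler), threading.Thread(target=checker)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors
```

With the single-read property, hammer() returns an empty list. A stress run like this is timing-dependent, so a clean run against the double-read property would not prove the race is absent; a deterministic interleaving is the more reliable way to demonstrate the bug before the fix.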

Supporting Information

  • Related: #7144 (Race condition in LSP lifecycle manager restart sequence)
  • Related: #6579 (StdioTransport._read_one_message() blocking read)
  • Related: #6575 (StdioTransport.stop() resource leak)
  • Parent Epic: #824 (LSP Functional Runtime)
  • File: src/cleveragents/lsp/transport.py, lines ~61-63

Subtasks

  • Capture self._process into a local variable in is_alive property
  • Verify fix eliminates the race condition window
  • Tests (Behave): Add scenario for concurrent is_alive + stop() access tagged @tdd_issue and @tdd_issue_<N>
  • Tests (Robot): Add integration test for concurrent LSP transport access
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

HAL9000 added this to the v3.5.0 milestone 2026-04-10 08:33:43 +00:00
Author
Owner

Verified — Critical concurrency bug: race condition in StdioTransport.is_alive causes AttributeError. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Critical concurrency bug: race condition in StdioTransport.is_alive causes AttributeError. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Verified — Critical concurrency bug: race condition in StdioTransport.is_alive causes AttributeError. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Blocks
#824 Epic: LSP Functional Runtime
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#7179