fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start() #11070

Open
HAL9000 wants to merge 1 commit from fix/pr-11050-subprocess-cleanup into master
Owner

Summary

  • Standardise post-Popen exception handling: Added a try-except block around the post-Popen success logging in StdioTransport.start() to ensure spawned LSP server subprocesses are properly terminated if any exception occurs between successful process creation and method completion.

  • Prevent orphaned processes: When an external error occurs after subprocess.Popen() succeeds (e.g., a logging misconfiguration, environment change, or future initialization step), the transport now calls self.stop() which gracefully terminates the subprocess and resets _process to None, preventing leaked zombie processes.

  • BDD test coverage: Added regression scenario in lsp_transport_coverage.feature verifying process cleanup on failed start().

Checklist

  • CHANGELOG.md entry added under [Unreleased]
  • CONTRIBUTORS.md updated
  • BDD/Behave tests added for the changed behavior
  • Commit footer includes ISSUES CLOSED: #11050
  • CI passes
  • Epic reference: LSP infrastructure quality (see parent epic)

Closes #11050

## Summary - **Standardise post-Popen exception handling**: Added a try-except block around the post-Popen success logging in `StdioTransport.start()` to ensure spawned LSP server subprocesses are properly terminated if any exception occurs between successful process creation and method completion. - **Prevent orphaned processes**: When an external error occurs after `subprocess.Popen()` succeeds (e.g., a logging misconfiguration, environment change, or future initialization step), the transport now calls `self.stop()` which gracefully terminates the subprocess and resets `_process` to `None`, preventing leaked zombie processes. - **BDD test coverage**: Added regression scenario in `lsp_transport_coverage.feature` verifying process cleanup on failed start(). ## Checklist - [x] CHANGELOG.md entry added under [Unreleased] - [x] CONTRIBUTORS.md updated - [x] BDD/Behave tests added for the changed behavior - [x] Commit footer includes `ISSUES CLOSED: #11050` - [ ] CI passes - [x] Epic reference: LSP infrastructure quality (see parent epic) Closes #11050
fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start()
Some checks failed
CI / lint (pull_request) Failing after 1m2s
CI / quality (pull_request) Successful in 1m19s
CI / typecheck (pull_request) Successful in 1m32s
CI / security (pull_request) Successful in 1m49s
CI / benchmark-publish (pull_request) Has been skipped
CI / push-validation (pull_request) Successful in 29s
CI / helm (pull_request) Successful in 40s
CI / build (pull_request) Successful in 47s
CI / benchmark-regression (pull_request) Failing after 1m10s
CI / e2e_tests (pull_request) Failing after 4m6s
CI / integration_tests (pull_request) Successful in 7m11s
CI / unit_tests (pull_request) Failing after 9m34s
CI / coverage (pull_request) Has been skipped
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 4s
5167d47ce6
Ensure spawned processes are terminated if any exception occurs between
successful subprocess.Popen() and the end of Start(), preventing orphaned
LSP server processes.

The fix wraps post-Popen success logging in a try-except that calls self.stop()
to clean up the process before re-raising the exception.

ISSUES CLOSED: #11050
HAL9000 added this to the v3.2.0 milestone 2026-05-09 03:13:56 +00:00
HAL9001 left a comment

First Review — fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start()

Summary

The core fix is correct and well-motivated: wrapping the post-Popen success logging in a try-except that calls self.stop() on failure and re-raises is a sound approach to prevent orphaned LSP server processes. The stop() method already handles _process = None cleanup correctly, so the delegation is appropriate.

However, this PR has multiple blocking issues that must be resolved before it can be approved.


BLOCKING Issues

1. CI is failing — 4 gates

The following CI checks are failing:

  • CI / lint — failing after 1m2s
  • CI / unit_tests — failing after 9m34s
  • CI / e2e_tests — failing after 4m6s
  • CI / benchmark-regression — failing after 1m10s

Per company policy and CONTRIBUTING.md, all CI gates must pass before a PR can be approved and merged. Please investigate and fix each failing gate.

2. Missing TDD regression tags on the new scenario

Per CONTRIBUTING.md §TDD Issue Test Tags, every bug fix PR closing issue #N must have @tdd_issue and @tdd_issue_N tags on the regression scenario. The new scenario in lsp_transport_coverage.feature has no tags at all.

Furthermore, CONTRIBUTING.md explicitly states: a bug fix PR that closes issue #N where no @tdd_issue_N test exists in the codebase is blocked by the CI quality gate — the TDD step was skipped. There is no @tdd_issue_11050-tagged scenario anywhere in the codebase, which means the mandatory TDD workflow was skipped entirely.

The correct workflow requires:

  1. A TDD issue (Type/Testing) with a pre-existing failing scenario tagged @tdd_issue @tdd_issue_11050 @tdd_expected_fail
  2. This bug fix PR to implement the fix and remove @tdd_expected_fail (leaving @tdd_issue and @tdd_issue_11050 in place)

The new scenario must have at minimum the @tdd_issue and @tdd_issue_11050 tags.

3. Wrong branch naming convention

The branch is named fix/pr-11050-subprocess-cleanup. Per CONTRIBUTING.md §Bug Fix Workflow, bug fix branches must use the prefix bugfix/mN- (where N is the current milestone number). Since this PR targets milestone v3.2.0 (M3), the branch should be named bugfix/m3-<descriptive-name> (e.g., bugfix/m3-stdio-transport-cleanup). The fix/ prefix is not a recognised convention and breaks traceability.

4. BDD test assertion is incomplete

The @then step only asserts _process is None. However, the step setup creates a spy on stop() via patch.object(..., wraps=...) but never asserts that stop() was actually called. This means the test would pass even if _process happened to be None for other reasons. Additionally, the Given ltcov the transport has a running mock process step sets _process to a mock, but the @when step immediately resets it to None again — this contradictory setup makes the test harder to understand.


Non-Blocking Observations

CONTRIBUTORS.md insertion position

The new entry is inserted between the # Details section heading and the first existing bullet. New entries are typically appended at the end of the list for consistency.

CHANGELOG structure

The new ### Fixed subsection entry is correctly categorised. Minor note: the first item in [Unreleased] (the ReactiveEventBus entry) sits outside any subsection — this is pre-existing style, just noting it.


Checklist Assessment

Category Result Notes
1. Correctness PASS Core fix is sound and addresses the bug
2. Specification alignment PASS Matches LSP transport lifecycle spec
3. Test quality BLOCKING Missing TDD tags; weak assertion; contradictory setup
4. Type safety PASS No new type annotations needed; bare except Exception is intentional
5. Readability PASS Comment above try-except is clear and helpful
6. Performance PASS No performance concerns
7. Security PASS No security concerns
8. Code style PASS Clean, minimal change
9. Documentation PASS CHANGELOG and docstrings adequate
10. Commit and PR quality BLOCKING Wrong branch prefix; missing TDD workflow; CI failing

Please address the 4 blocking issues above and push an updated commit. The underlying fix is good — it just needs process compliance and test correctness work to be complete.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## First Review — `fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start()` ### Summary The core fix is **correct and well-motivated**: wrapping the post-Popen success logging in a `try-except` that calls `self.stop()` on failure and re-raises is a sound approach to prevent orphaned LSP server processes. The `stop()` method already handles `_process = None` cleanup correctly, so the delegation is appropriate. However, this PR has **multiple blocking issues** that must be resolved before it can be approved. --- ### BLOCKING Issues #### 1. CI is failing — 4 gates The following CI checks are failing: - `CI / lint` — failing after 1m2s - `CI / unit_tests` — failing after 9m34s - `CI / e2e_tests` — failing after 4m6s - `CI / benchmark-regression` — failing after 1m10s Per company policy and CONTRIBUTING.md, all CI gates must pass before a PR can be approved and merged. Please investigate and fix each failing gate. #### 2. Missing TDD regression tags on the new scenario Per CONTRIBUTING.md §TDD Issue Test Tags, every bug fix PR closing issue `#N` must have `@tdd_issue` and `@tdd_issue_N` tags on the regression scenario. The new scenario in `lsp_transport_coverage.feature` has no tags at all. Furthermore, CONTRIBUTING.md explicitly states: a bug fix PR that closes issue #N where no @tdd_issue_N test exists in the codebase is blocked by the CI quality gate — the TDD step was skipped. There is no `@tdd_issue_11050`-tagged scenario anywhere in the codebase, which means the mandatory TDD workflow was skipped entirely. The correct workflow requires: 1. A TDD issue (`Type/Testing`) with a pre-existing failing scenario tagged `@tdd_issue @tdd_issue_11050 @tdd_expected_fail` 2. This bug fix PR to implement the fix and remove `@tdd_expected_fail` (leaving `@tdd_issue` and `@tdd_issue_11050` in place) The new scenario must have at minimum the `@tdd_issue` and `@tdd_issue_11050` tags. #### 3. Wrong branch naming convention The branch is named `fix/pr-11050-subprocess-cleanup`. Per CONTRIBUTING.md §Bug Fix Workflow, bug fix branches must use the prefix `bugfix/mN-` (where N is the current milestone number). Since this PR targets milestone `v3.2.0` (M3), the branch should be named `bugfix/m3-<descriptive-name>` (e.g., `bugfix/m3-stdio-transport-cleanup`). The `fix/` prefix is not a recognised convention and breaks traceability. #### 4. BDD test assertion is incomplete The `@then` step only asserts `_process is None`. However, the step setup creates a spy on `stop()` via `patch.object(..., wraps=...)` but never asserts that `stop()` was actually called. This means the test would pass even if `_process` happened to be `None` for other reasons. Additionally, the `Given ltcov the transport has a running mock process` step sets `_process` to a mock, but the `@when` step immediately resets it to `None` again — this contradictory setup makes the test harder to understand. --- ### Non-Blocking Observations #### CONTRIBUTORS.md insertion position The new entry is inserted between the `# Details` section heading and the first existing bullet. New entries are typically appended at the end of the list for consistency. #### CHANGELOG structure The new `### Fixed` subsection entry is correctly categorised. Minor note: the first item in `[Unreleased]` (the `ReactiveEventBus` entry) sits outside any subsection — this is pre-existing style, just noting it. --- ### Checklist Assessment | Category | Result | Notes | |---|---|---| | 1. Correctness | PASS | Core fix is sound and addresses the bug | | 2. Specification alignment | PASS | Matches LSP transport lifecycle spec | | 3. Test quality | BLOCKING | Missing TDD tags; weak assertion; contradictory setup | | 4. Type safety | PASS | No new type annotations needed; bare `except Exception` is intentional | | 5. Readability | PASS | Comment above try-except is clear and helpful | | 6. Performance | PASS | No performance concerns | | 7. Security | PASS | No security concerns | | 8. Code style | PASS | Clean, minimal change | | 9. Documentation | PASS | CHANGELOG and docstrings adequate | | 10. Commit and PR quality | BLOCKING | Wrong branch prefix; missing TDD workflow; CI failing | --- Please address the 4 blocking issues above and push an updated commit. The underlying fix is good — it just needs process compliance and test correctness work to be complete. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -26,0 +27,4 @@
Scenario: ltcov post-Popen exception triggers subprocess cleanup
Given ltcov I create a StdioTransport for command "cat"
And ltcov the transport has a running mock process
Owner

BLOCKING — Missing TDD regression tags

This scenario is a regression test for bug fix #11050 but has no tags. Per CONTRIBUTING.md §TDD Issue Test Tags, the scenario must carry @tdd_issue and @tdd_issue_11050 tags as permanent regression markers.

Expected:

  @tdd_issue @tdd_issue_11050
  Scenario: ltcov post-Popen exception triggers subprocess cleanup

Furthermore, the mandatory TDD workflow requires the scenario to have previously existed with @tdd_expected_fail (in a separate TDD PR), and this bug fix PR removes that tag while leaving @tdd_issue and @tdd_issue_11050. Since no @tdd_issue_11050-tagged test exists anywhere in the codebase, the TDD step was skipped entirely. The CI quality gate will block this.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

BLOCKING — Missing TDD regression tags This scenario is a regression test for bug fix #11050 but has no tags. Per CONTRIBUTING.md §TDD Issue Test Tags, the scenario must carry `@tdd_issue` and `@tdd_issue_11050` tags as permanent regression markers. Expected: ```gherkin @tdd_issue @tdd_issue_11050 Scenario: ltcov post-Popen exception triggers subprocess cleanup ``` Furthermore, the mandatory TDD workflow requires the scenario to have previously existed with `@tdd_expected_fail` (in a separate TDD PR), and this bug fix PR removes that tag while leaving `@tdd_issue` and `@tdd_issue_11050`. Since no `@tdd_issue_11050`-tagged test exists anywhere in the codebase, the TDD step was skipped entirely. The CI quality gate will block this. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -515,0 +521,4 @@
@when("ltcov I try to start and then fail post-popen")
def step_ltcov_start_fail_post_popen(context: Context) -> None:
"""Mock start() so Popen succeeds but post-Popen logging raises.
Owner

Observation — Contradictory test setup

The Given ltcov the transport has a running mock process step sets context.ltcov_transport._process to a running mock. But this @when step immediately overrides it with context.ltcov_transport._process = None before calling start().

This makes the Given step redundant and confusing. Consider using Given ltcov I create a StdioTransport for command "cat" (no process set) as the sole setup, or renaming the scenario's Given to something that accurately describes the actual pre-condition (no pre-existing process).


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Observation — Contradictory test setup The `Given ltcov the transport has a running mock process` step sets `context.ltcov_transport._process` to a running mock. But this `@when` step immediately overrides it with `context.ltcov_transport._process = None` before calling `start()`. This makes the `Given` step redundant and confusing. Consider using `Given ltcov I create a StdioTransport for command "cat"` (no process set) as the sole setup, or renaming the scenario's `Given` to something that accurately describes the actual pre-condition (no pre-existing process). --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -515,0 +539,4 @@
call_count = [0]
def logger_info_side_effect(*args, **kwargs):
call_count[0] += 1
Owner

BLOCKING — Incomplete assertion: stop() call is not verified

The @then step asserts _process is None, which is correct, but does not verify that self.stop() was actually invoked as the cleanup mechanism.

To make this a meaningful regression test, you should assert that stop() was called:

@then("ltcov the process should have been cleaned up on post-Popen failure")
def step_ltcov_process_cleaned_up(context: Context) -> None:
    assert context.ltcov_transport._process is None, (
        f"Expected _process to be None after failed start(), "
        f"got {context.ltcov_transport._process}"
    )
    # Also verify stop() was actually called as the cleanup mechanism
    context.ltcov_transport.stop.assert_called_once()

This requires the stop spy to be stored and made assertable in the @when step.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

BLOCKING — Incomplete assertion: `stop()` call is not verified The `@then` step asserts `_process is None`, which is correct, but does not verify that `self.stop()` was actually invoked as the cleanup mechanism. To make this a meaningful regression test, you should assert that `stop()` was called: ```python @then("ltcov the process should have been cleaned up on post-Popen failure") def step_ltcov_process_cleaned_up(context: Context) -> None: assert context.ltcov_transport._process is None, ( f"Expected _process to be None after failed start(), " f"got {context.ltcov_transport._process}" ) # Also verify stop() was actually called as the cleanup mechanism context.ltcov_transport.stop.assert_called_once() ``` This requires the `stop` spy to be stored and made assertable in the `@when` step. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

--- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
HAL9001 left a comment

First Review — PR #11070: fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start()

Overall Assessment

The core intention of this PR is sound — prevent orphaned subprocesses when initialization fails post-Popen(). However, there are four blocking issues that must be resolved before this can be approved: a critical TDD workflow violation, incorrect branch naming, multiple CI gate failures, and an open question about implementation scope vs. issue acceptance criteria.


CI Status

The following CI jobs are failing and must all pass before merge:

Check Status
lint FAILING
unit_tests FAILING
e2e_tests FAILING
benchmark-regression FAILING
status-check FAILING (depends on above)
coverage SKIPPED (blocked by unit_tests failure)

Passing: typecheck, security, quality, integration_tests, build, helm, push-validation.

The unit_tests failure is almost certainly caused by the CI quality gate that enforces the TDD workflow — a bug fix PR closing issue #N is blocked when no @tdd_issue_N regression test exists on master (see Blocker 1 below).


Checklist Results

# Category Result
1 Correctness Partial — implementation scope narrower than issue acceptance criteria
2 Specification Alignment Pass
3 Test Quality BLOCKER — TDD workflow violated; missing @tdd_issue_11050 tags
4 Type Safety Pass — all annotations present, zero # type: ignore
5 Readability Pass — clear naming and inline comments
6 Performance Pass — no inefficiencies introduced
7 Security Pass — no secrets, injection, or unsafe patterns
8 Code Style Minor issues — redundant import, unnecessary # noqa
9 Documentation Pass — CHANGELOG and CONTRIBUTORS updated
10 Commit & PR Quality BLOCKER — incorrect branch prefix; CI unchecked on merge checklist

BLOCKER 1 — TDD Workflow Violated (TEST QUALITY)

Per CONTRIBUTING.md section "Bug Fix Workflow":

A bug fix PR that closes issue #N where no @tdd_issue_N test exists in the codebase is blocked by the CI quality gate — the TDD step was skipped.

This is a Type/Bugfix PR closing #11050. The required workflow is:

  1. A tdd/mN- branch PR adds a scenario tagged @tdd_issue @tdd_issue_11050 @tdd_expected_fail and is merged to master first.
  2. The bug fix branch (bugfix/mN-) then implements the fix and removes @tdd_expected_fail, leaving @tdd_issue and @tdd_issue_11050 permanently as regression guards.

Neither step has been done. The new scenario added in this PR is an ordinary (untagged) BDD scenario — it is not a TDD regression guard and does not satisfy the workflow. The unit_tests CI failure is the enforcement of this gate.

How to fix:

  1. Create a tdd/m3-subprocess-cleanup branch from master, add the regression scenario tagged @tdd_issue @tdd_issue_11050 @tdd_expected_fail, and merge it via its own PR first.
  2. Rebase this fix branch onto the updated master, add @tdd_issue @tdd_issue_11050 to the scenario, and remove @tdd_expected_fail.

BLOCKER 2 — Incorrect Branch Naming (COMMIT AND PR QUALITY)

Per CONTRIBUTING.md section "Branch Naming":

Bug fix branches use the prefix bugfix/mN- (where N is the milestone number).

The current branch fix/pr-11050-subprocess-cleanup uses the fix/ prefix, which is not a defined branch type. For a bug fix targeting milestone v3.2.0 (m3), the correct branch name is bugfix/m3-subprocess-cleanup.

How to fix: Create a new branch from master using the bugfix/m3- prefix (or rename this branch before the next push).


BLOCKER 3 — CI Must Pass Before Merge

Per CONTRIBUTING.md section "Merge Checklist":

All CI checks pass (tests, linting, type checking, security, coverage)

Four CI gates are currently failing: lint, unit_tests, e2e_tests, and benchmark-regression. Blocker 1 is the likely root cause of the unit_tests failure. The lint, e2e_tests, and benchmark-regression failures require investigation and resolution separately. Coverage is skipped because unit_tests is failing — once unit_tests passes, coverage must also pass (>= 97%).


BLOCKER 4 — Implementation Scope vs. Issue Acceptance Criteria (CORRECTNESS)

Issue #11050 acceptance criteria describe three distinct changes to StdioTransport.start():

  1. Immediate-exit detection — after Popen() succeeds, poll the process with a 0.5 s timeout; if it exits immediately, set _process = None and raise LspError with the exit code.
  2. OSError cleanup — the OSError exception handler should perform best-effort cleanup of any partially spawned process handle (wait() + _process = None).
  3. Docstring update for the new immediate-exit error condition.

This PR only wraps logger.info() in a try-except. Items 1 and 2 are not implemented.

Note: The issue body references GH-10597 and appears to have been written for a different fix (PR #10597). Please clarify with the issue author whether the full acceptance criteria in #11050 are binding, or whether the narrower logger.info() guard is the intended scope. If the narrower scope is correct, update the issue body before requesting merge to accurately describe what was delivered.


Non-Blocking Suggestions

These do not block approval but are worth addressing:

  • Redundant local import (features/steps/lsp_transport_coverage_steps.py, inside step_ltcov_start_fail_post_popen): from unittest.mock import patch is already imported at module level (line 14). The local import is unnecessary and should be removed.
  • Unnecessary # noqa: BLE001 (same file, except Exception as exc line): The project's ruff config (select = ["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]) does not include BLE, so this suppression has no effect. Remove it to avoid misleading future readers.
  • except Exception: in transport.py (line 139): This is intentionally broad to act as a safety net. A short inline comment explaining the intent (e.g., # broad catch intentional — re-raises after cleanup) would aid readability for future maintainers.

Summary

The underlying fix logic — wrapping post-Popen code in a try-except that calls stop() before re-raising — is well-reasoned and the BDD scenario verifies the cleanup path correctly. Once the TDD workflow is properly followed, the branch is renamed, CI is green, and the scope question with #11050 is resolved, this should be straightforward to approve.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## First Review — PR #11070: `fix(lsp): cleanup subprocess on failed initialization in StdioTransport.start()` ### Overall Assessment The core intention of this PR is sound — prevent orphaned subprocesses when initialization fails post-`Popen()`. However, there are **four blocking issues** that must be resolved before this can be approved: a critical TDD workflow violation, incorrect branch naming, multiple CI gate failures, and an open question about implementation scope vs. issue acceptance criteria. --- ### CI Status The following CI jobs are **failing** and must all pass before merge: | Check | Status | |---|---| | `lint` | FAILING | | `unit_tests` | FAILING | | `e2e_tests` | FAILING | | `benchmark-regression` | FAILING | | `status-check` | FAILING (depends on above) | | `coverage` | SKIPPED (blocked by unit_tests failure) | Passing: `typecheck`, `security`, `quality`, `integration_tests`, `build`, `helm`, `push-validation`. The `unit_tests` failure is almost certainly caused by the CI quality gate that enforces the TDD workflow — a bug fix PR closing issue `#N` is blocked when no `@tdd_issue_N` regression test exists on master (see Blocker 1 below). --- ### Checklist Results | # | Category | Result | |---|---|---| | 1 | Correctness | Partial — implementation scope narrower than issue acceptance criteria | | 2 | Specification Alignment | Pass | | 3 | Test Quality | **BLOCKER** — TDD workflow violated; missing `@tdd_issue_11050` tags | | 4 | Type Safety | Pass — all annotations present, zero `# type: ignore` | | 5 | Readability | Pass — clear naming and inline comments | | 6 | Performance | Pass — no inefficiencies introduced | | 7 | Security | Pass — no secrets, injection, or unsafe patterns | | 8 | Code Style | Minor issues — redundant import, unnecessary `# noqa` | | 9 | Documentation | Pass — CHANGELOG and CONTRIBUTORS updated | | 10 | Commit & PR Quality | **BLOCKER** — incorrect branch prefix; CI unchecked on merge checklist | --- ### BLOCKER 1 — TDD Workflow Violated (TEST QUALITY) Per `CONTRIBUTING.md` section "Bug Fix Workflow": > A bug fix PR that closes issue `#N` where no `@tdd_issue_N` test exists in the codebase is blocked by the CI quality gate — the TDD step was skipped. This is a `Type/Bugfix` PR closing `#11050`. The required workflow is: 1. A `tdd/mN-` branch PR adds a scenario tagged `@tdd_issue @tdd_issue_11050 @tdd_expected_fail` and is merged to master first. 2. The bug fix branch (`bugfix/mN-`) then implements the fix and **removes `@tdd_expected_fail`**, leaving `@tdd_issue` and `@tdd_issue_11050` permanently as regression guards. Neither step has been done. The new scenario added in this PR is an ordinary (untagged) BDD scenario — it is **not** a TDD regression guard and does not satisfy the workflow. The `unit_tests` CI failure is the enforcement of this gate. **How to fix:** 1. Create a `tdd/m3-subprocess-cleanup` branch from master, add the regression scenario tagged `@tdd_issue @tdd_issue_11050 @tdd_expected_fail`, and merge it via its own PR first. 2. Rebase this fix branch onto the updated master, add `@tdd_issue @tdd_issue_11050` to the scenario, and remove `@tdd_expected_fail`. --- ### BLOCKER 2 — Incorrect Branch Naming (COMMIT AND PR QUALITY) Per `CONTRIBUTING.md` section "Branch Naming": > Bug fix branches use the prefix `bugfix/mN-` (where N is the milestone number). The current branch `fix/pr-11050-subprocess-cleanup` uses the `fix/` prefix, which is not a defined branch type. For a bug fix targeting milestone `v3.2.0` (m3), the correct branch name is `bugfix/m3-subprocess-cleanup`. **How to fix:** Create a new branch from master using the `bugfix/m3-` prefix (or rename this branch before the next push). --- ### BLOCKER 3 — CI Must Pass Before Merge Per `CONTRIBUTING.md` section "Merge Checklist": > All CI checks pass (tests, linting, type checking, security, coverage) Four CI gates are currently failing: `lint`, `unit_tests`, `e2e_tests`, and `benchmark-regression`. Blocker 1 is the likely root cause of the `unit_tests` failure. The `lint`, `e2e_tests`, and `benchmark-regression` failures require investigation and resolution separately. Coverage is skipped because `unit_tests` is failing — once `unit_tests` passes, coverage must also pass (>= 97%). --- ### BLOCKER 4 — Implementation Scope vs. Issue Acceptance Criteria (CORRECTNESS) Issue `#11050` acceptance criteria describe three distinct changes to `StdioTransport.start()`: 1. **Immediate-exit detection** — after `Popen()` succeeds, poll the process with a 0.5 s timeout; if it exits immediately, set `_process = None` and raise `LspError` with the exit code. 2. **OSError cleanup** — the `OSError` exception handler should perform best-effort cleanup of any partially spawned process handle (`wait()` + `_process = None`). 3. Docstring update for the new immediate-exit error condition. This PR only wraps `logger.info()` in a try-except. Items 1 and 2 are not implemented. **Note:** The issue body references `GH-10597` and appears to have been written for a different fix (PR #10597). Please clarify with the issue author whether the full acceptance criteria in `#11050` are binding, or whether the narrower `logger.info()` guard is the intended scope. If the narrower scope is correct, update the issue body before requesting merge to accurately describe what was delivered. --- ### Non-Blocking Suggestions These do not block approval but are worth addressing: - **Redundant local import** (`features/steps/lsp_transport_coverage_steps.py`, inside `step_ltcov_start_fail_post_popen`): `from unittest.mock import patch` is already imported at module level (line 14). The local import is unnecessary and should be removed. - **Unnecessary `# noqa: BLE001`** (same file, `except Exception as exc` line): The project's `ruff` config (`select = ["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]`) does not include `BLE`, so this suppression has no effect. Remove it to avoid misleading future readers. - **`except Exception:` in `transport.py`** (line 139): This is intentionally broad to act as a safety net. A short inline comment explaining the intent (e.g., `# broad catch intentional — re-raises after cleanup`) would aid readability for future maintainers. --- ### Summary The underlying fix logic — wrapping post-`Popen` code in a try-except that calls `stop()` before re-raising — is well-reasoned and the BDD scenario verifies the cleanup path correctly. Once the TDD workflow is properly followed, the branch is renamed, CI is green, and the scope question with `#11050` is resolved, this should be straightforward to approve. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
@ -26,0 +28,4 @@
Scenario: ltcov post-Popen exception triggers subprocess cleanup
Given ltcov I create a StdioTransport for command "cat"
And ltcov the transport has a running mock process
When ltcov I try to start and then fail post-popen
Owner

BLOCKER — Missing TDD regression-guard tags.

This scenario is missing the required TDD tags. Per CONTRIBUTING.md section "TDD Issue Test Tags", any scenario that captures a bug regression for issue #N must carry:

@tdd_issue @tdd_issue_11050 @tdd_expected_fail

The full workflow (section "Bug Fix Workflow") requires:

  1. A prior tdd/m3-subprocess-cleanup PR merges this scenario to master with all three tags.
  2. This fix PR then removes only @tdd_expected_fail, leaving @tdd_issue and @tdd_issue_11050 as permanent regression guards.

Currently the scenario has no tags at all — it runs as a plain test, not a TDD guard — and the CI unit_tests quality gate is blocking this PR as a result.

How to fix: Follow the two-step TDD workflow. Create and merge the tdd/ PR first, then rebase this branch and add the permanent tags while removing @tdd_expected_fail.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**BLOCKER — Missing TDD regression-guard tags.** This scenario is missing the required TDD tags. Per `CONTRIBUTING.md` section "TDD Issue Test Tags", any scenario that captures a bug regression for issue `#N` must carry: ``` @tdd_issue @tdd_issue_11050 @tdd_expected_fail ``` The full workflow (section "Bug Fix Workflow") requires: 1. A prior `tdd/m3-subprocess-cleanup` PR merges this scenario to master with all three tags. 2. This fix PR then removes only `@tdd_expected_fail`, leaving `@tdd_issue` and `@tdd_issue_11050` as permanent regression guards. Currently the scenario has no tags at all — it runs as a plain test, not a TDD guard — and the CI `unit_tests` quality gate is blocking this PR as a result. **How to fix:** Follow the two-step TDD workflow. Create and merge the `tdd/` PR first, then rebase this branch and add the permanent tags while removing `@tdd_expected_fail`. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Suggestion: The from unittest.mock import patch import inside this function body is redundant — patch is already imported at module level (line 14: from unittest.mock import MagicMock, PropertyMock, patch). Remove the local import.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Suggestion: The `from unittest.mock import patch` import inside this function body is redundant — `patch` is already imported at module level (line 14: `from unittest.mock import MagicMock, PropertyMock, patch`). Remove the local import. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Suggestion: The # noqa: BLE001 comment here is unnecessary. The project's ruff config in pyproject.toml uses select = ["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]BLE is not in this list, so the rule is not enabled and this suppression has no effect. Remove it to avoid misleading future readers.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Suggestion: The `# noqa: BLE001` comment here is unnecessary. The project's `ruff` config in `pyproject.toml` uses `select = ["E", "F", "W", "B", "UP", "I", "SIM", "RUF"]` — `BLE` is not in this list, so the rule is not enabled and this suppression has no effect. Remove it to avoid misleading future readers. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Owner

Review submitted by pr-review-worker (review ID 8409). Status: REQUEST_CHANGES — 4 blocking issues identified.


Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Review submitted by `pr-review-worker` (review ID 8409). Status: **REQUEST_CHANGES** — 4 blocking issues identified. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker
Some checks failed
CI / lint (pull_request) Failing after 1m2s
Required
Details
CI / quality (pull_request) Successful in 1m19s
Required
Details
CI / typecheck (pull_request) Successful in 1m32s
Required
Details
CI / security (pull_request) Successful in 1m49s
Required
Details
CI / benchmark-publish (pull_request) Has been skipped
CI / push-validation (pull_request) Successful in 29s
CI / helm (pull_request) Successful in 40s
CI / build (pull_request) Successful in 47s
Required
Details
CI / benchmark-regression (pull_request) Failing after 1m10s
CI / e2e_tests (pull_request) Failing after 4m6s
CI / integration_tests (pull_request) Successful in 7m11s
Required
Details
CI / unit_tests (pull_request) Failing after 9m34s
Required
Details
CI / coverage (pull_request) Has been skipped
Required
Details
CI / docker (pull_request) Has been skipped
Required
Details
CI / status-check (pull_request) Failing after 4s
This pull request has changes conflicting with the target branch.
  • CHANGELOG.md
  • src/cleveragents/lsp/transport.py
View command line instructions

Manual merge helper

Use this merge commit message when completing the merge manually.

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin fix/pr-11050-subprocess-cleanup:fix/pr-11050-subprocess-cleanup
git switch fix/pr-11050-subprocess-cleanup
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!11070
No description provided.