BUG-HUNT: [resource] Document leak in LSP runtime operations due to missing cleanup #7129

Open
opened 2026-04-10 08:02:01 +00:00 by HAL9000 · 2 comments
Owner

Bug Report: [resource] — Missing or incomplete try/finally in all four LspRuntime file-based operations leaves documents permanently open on LSP servers

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

⚠️ Related issue: #6581 covers the get_diagnostics() and get_completions() aspect of this same bug (missing try/finally). This issue is broader in scope — it covers all four methods including the incomplete try/finally protection in get_hover() and get_definitions(), and frames the problem as a resource-management concern rather than an error-handling inconsistency. Consider consolidating with #6581 or closing one in favour of the other after triage.


Metadata

  • Branch: fix/lsp-runtime-document-leak-missing-cleanup
  • Commit Message: fix(lsp): wrap all did_open/did_close patterns in try/finally for guaranteed document cleanup
  • Milestone: (none — backlog)
  • Parent Epic: (orphan — no LSP Epic found; requires manual linking)

Severity Assessment

  • Severity: High (resource leak)
  • Impact: LSP servers accumulate open documents on any exception during LSP operations, degrading performance and potentially causing memory leaks. On the next call for the same URI, did_open is sent again — many LSP servers reject or warn on duplicate didOpen for the same URI, leading to incorrect server-side state.
  • Likelihood: High — occurs on any exception during LSP operations (server crash, timeout, LspError). These are exactly the scenarios _get_healthy_client is designed to handle via restart_server.

Location

  • File: src/cleveragents/lsp/runtime.py
  • Functions: get_diagnostics(), get_completions(), get_hover(), get_definitions()
  • Lines: 158–164, 219–224, 274–278, 328–332

Description

The LSP runtime uses a pattern of did_open() → LSP operation → did_close() for all file-based operations. This pattern requires try/finally protection to guarantee that did_close() is always sent to the LSP server — even when the intervening operation raises an exception.

The bug manifests in two forms across the four affected methods:

Form 1 — Completely missing try/finally (get_diagnostics, get_completions)

get_diagnostics() and get_completions() have no try/finally protection at all. If the LSP operation raises, did_close() is never called.

# get_diagnostics() — lines 158–164 (BROKEN: no try/finally)
client.did_open(uri, language_id, version=1, text=text)

# Give the server time to process and push diagnostics
diagnostics = client.get_diagnostics(uri)  # ← raises on server crash/timeout

# Close the document — NEVER REACHED if get_diagnostics() raises
client.did_close(uri)
# get_completions() — lines 219–224 (BROKEN: no try/finally)
client.did_open(uri, language_id, version=1, text=text)

completions = client.get_completions(uri, line - 1, column - 1)  # ← raises on crash

client.did_close(uri)   # NEVER REACHED if get_completions() raises

Form 2 — Incomplete try/finally (get_hover, get_definitions)

get_hover() and get_definitions() have a try/finally block, but it only wraps the LSP query call — not the did_open() call. If did_open() itself raises, did_close() is still skipped.

# get_hover() — lines 274–278 (INCOMPLETE: try/finally doesn't cover did_open)
client.did_open(uri, language_id, version=1, text=text)  # ← if THIS raises, did_close is skipped
try:
    hover = client.get_hover(uri, line - 1, column - 1)
finally:
    client.did_close(uri)    # Only called if did_open succeeded
# get_definitions() — lines 328–332 (INCOMPLETE: try/finally doesn't cover did_open)
client.did_open(uri, language_id, version=1, text=text)  # ← if THIS raises, did_close is skipped
try:
    definitions = client.get_definitions(uri, line - 1, column - 1)
finally:
    client.did_close(uri)    # Only called if did_open succeeded

Expected Behavior

Documents should always be closed on the LSP server even if any operation (including did_open itself) fails. Every didOpen notification must be paired with a didClose per the LSP specification.

Actual Behavior

  • get_diagnostics() and get_completions(): Failed operations leave documents open (no try/finally at all).
  • get_hover() and get_definitions(): A did_open() failure leaves documents open (incomplete try/finally).

Suggested Fix

Wrap the entire did_open → operation → did_close sequence in a single try/finally for all four methods:

# Correct pattern for all four methods
client.did_open(uri, language_id, version=1, text=text)
try:
    result = client.<operation>(uri, ...)
finally:
    try:
        client.did_close(uri)
    except Exception:
        logger.warning("Failed to close document %s during cleanup", uri)

This ensures:

  1. did_close() is always called after a successful did_open().
  2. Cleanup failures during exception handling are logged but do not suppress the original exception.

Subtasks

  • Wrap get_diagnostics() did_open/did_close in a try/finally block
  • Wrap get_completions() did_open/did_close in a try/finally block
  • Extend get_hover() try/finally to cover the did_open() call as well
  • Extend get_definitions() try/finally to cover the did_open() call as well
  • Add logging when did_close() fails during exception-path cleanup
  • Write BDD scenario (Behave/Gherkin) capturing the resource leak behaviour with @tdd_issue, @tdd_issue_<N>, and @tdd_expected_fail tags
  • Update inline documentation / docstrings to reflect the guaranteed-cleanup contract
  • Verify no duplicate didOpen warnings appear in LSP server logs under failure conditions

Definition of Done

  • All four methods (get_diagnostics, get_completions, get_hover, get_definitions) use try/finally to guarantee did_close() is called after every successful did_open()
  • did_close() is called even when did_open() itself raises an exception
  • Cleanup failures during exception handling are logged at WARNING level and do not suppress the original exception
  • BDD tests added with @tdd_issue and @tdd_issue_<N> tags; @tdd_expected_fail removed once fix is applied
  • No LSP server accumulates open documents after repeated exception scenarios (verified by integration test or manual inspection)
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

## Bug Report: [resource] — Missing or incomplete `try/finally` in all four `LspRuntime` file-based operations leaves documents permanently open on LSP servers > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.2.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. > **⚠️ Related issue:** #6581 covers the `get_diagnostics()` and `get_completions()` aspect of this same bug (missing `try/finally`). This issue is broader in scope — it covers all four methods including the **incomplete** `try/finally` protection in `get_hover()` and `get_definitions()`, and frames the problem as a resource-management concern rather than an error-handling inconsistency. Consider consolidating with #6581 or closing one in favour of the other after triage. --- ## Metadata - **Branch**: `fix/lsp-runtime-document-leak-missing-cleanup` - **Commit Message**: `fix(lsp): wrap all did_open/did_close patterns in try/finally for guaranteed document cleanup` - **Milestone**: *(none — backlog)* - **Parent Epic**: *(orphan — no LSP Epic found; requires manual linking)* --- ## Severity Assessment - **Severity**: High (resource leak) - **Impact**: LSP servers accumulate open documents on any exception during LSP operations, degrading performance and potentially causing memory leaks. On the next call for the same URI, `did_open` is sent again — many LSP servers reject or warn on duplicate `didOpen` for the same URI, leading to incorrect server-side state. - **Likelihood**: High — occurs on any exception during LSP operations (server crash, timeout, `LspError`). These are exactly the scenarios `_get_healthy_client` is designed to handle via `restart_server`. --- ## Location - **File**: `src/cleveragents/lsp/runtime.py` - **Functions**: `get_diagnostics()`, `get_completions()`, `get_hover()`, `get_definitions()` - **Lines**: 158–164, 219–224, 274–278, 328–332 --- ## Description The LSP runtime uses a pattern of `did_open()` → LSP operation → `did_close()` for all file-based operations. This pattern requires `try/finally` protection to guarantee that `did_close()` is always sent to the LSP server — even when the intervening operation raises an exception. The bug manifests in two forms across the four affected methods: ### Form 1 — Completely missing `try/finally` (get_diagnostics, get_completions) `get_diagnostics()` and `get_completions()` have **no** `try/finally` protection at all. If the LSP operation raises, `did_close()` is never called. ```python # get_diagnostics() — lines 158–164 (BROKEN: no try/finally) client.did_open(uri, language_id, version=1, text=text) # Give the server time to process and push diagnostics diagnostics = client.get_diagnostics(uri) # ← raises on server crash/timeout # Close the document — NEVER REACHED if get_diagnostics() raises client.did_close(uri) ``` ```python # get_completions() — lines 219–224 (BROKEN: no try/finally) client.did_open(uri, language_id, version=1, text=text) completions = client.get_completions(uri, line - 1, column - 1) # ← raises on crash client.did_close(uri) # NEVER REACHED if get_completions() raises ``` ### Form 2 — Incomplete `try/finally` (get_hover, get_definitions) `get_hover()` and `get_definitions()` have a `try/finally` block, but it only wraps the LSP query call — **not** the `did_open()` call. If `did_open()` itself raises, `did_close()` is still skipped. ```python # get_hover() — lines 274–278 (INCOMPLETE: try/finally doesn't cover did_open) client.did_open(uri, language_id, version=1, text=text) # ← if THIS raises, did_close is skipped try: hover = client.get_hover(uri, line - 1, column - 1) finally: client.did_close(uri) # Only called if did_open succeeded ``` ```python # get_definitions() — lines 328–332 (INCOMPLETE: try/finally doesn't cover did_open) client.did_open(uri, language_id, version=1, text=text) # ← if THIS raises, did_close is skipped try: definitions = client.get_definitions(uri, line - 1, column - 1) finally: client.did_close(uri) # Only called if did_open succeeded ``` --- ## Expected Behavior Documents should **always** be closed on the LSP server even if any operation (including `did_open` itself) fails. Every `didOpen` notification must be paired with a `didClose` per the LSP specification. ## Actual Behavior - `get_diagnostics()` and `get_completions()`: Failed operations leave documents open (no `try/finally` at all). - `get_hover()` and `get_definitions()`: A `did_open()` failure leaves documents open (incomplete `try/finally`). --- ## Suggested Fix Wrap the entire `did_open` → operation → `did_close` sequence in a single `try/finally` for all four methods: ```python # Correct pattern for all four methods client.did_open(uri, language_id, version=1, text=text) try: result = client.<operation>(uri, ...) finally: try: client.did_close(uri) except Exception: logger.warning("Failed to close document %s during cleanup", uri) ``` This ensures: 1. `did_close()` is always called after a successful `did_open()`. 2. Cleanup failures during exception handling are logged but do not suppress the original exception. --- ## Subtasks - [ ] Wrap `get_diagnostics()` `did_open`/`did_close` in a `try/finally` block - [ ] Wrap `get_completions()` `did_open`/`did_close` in a `try/finally` block - [ ] Extend `get_hover()` `try/finally` to cover the `did_open()` call as well - [ ] Extend `get_definitions()` `try/finally` to cover the `did_open()` call as well - [ ] Add logging when `did_close()` fails during exception-path cleanup - [ ] Write BDD scenario (Behave/Gherkin) capturing the resource leak behaviour with `@tdd_issue`, `@tdd_issue_<N>`, and `@tdd_expected_fail` tags - [ ] Update inline documentation / docstrings to reflect the guaranteed-cleanup contract - [ ] Verify no duplicate `didOpen` warnings appear in LSP server logs under failure conditions ## Definition of Done - [ ] All four methods (`get_diagnostics`, `get_completions`, `get_hover`, `get_definitions`) use `try/finally` to guarantee `did_close()` is called after every successful `did_open()` - [ ] `did_close()` is called even when `did_open()` itself raises an exception - [ ] Cleanup failures during exception handling are logged at `WARNING` level and do not suppress the original exception - [ ] BDD tests added with `@tdd_issue` and `@tdd_issue_<N>` tags; `@tdd_expected_fail` removed once fix is applied - [ ] No LSP server accumulates open documents after repeated exception scenarios (verified by integration test or manual inspection) - [ ] All nox stages pass - [ ] Coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: new-issue-creator
Author
Owner

⚠️ Orphan Issue — Requires Manual Parent Epic Linking

This issue was created by an automated agent and no parent Epic was found for the LSP runtime module at the time of creation.

Per project conventions in CONTRIBUTING.md, every issue must be linked to a parent Epic using Forgejo's dependency system (child blocks parent). This issue is currently an orphan.

Action required: A project maintainer should:

  1. Identify or create the appropriate LSP/runtime Epic for this issue.
  2. Link this issue as a dependency: this issue (#7129) blocks the parent Epic.

Suggested search terms for finding a parent Epic: LSP, lsp runtime, Type/Epic

Related issue: #6581 covers the get_diagnostics() and get_completions() subset of this same bug. Consider whether this issue should be consolidated with #6581 or whether #6581 should be closed in favour of this broader-scoped issue.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

## ⚠️ Orphan Issue — Requires Manual Parent Epic Linking This issue was created by an automated agent and **no parent Epic was found** for the LSP runtime module at the time of creation. Per project conventions in `CONTRIBUTING.md`, every issue must be linked to a parent Epic using Forgejo's dependency system (child **blocks** parent). This issue is currently an orphan. **Action required:** A project maintainer should: 1. Identify or create the appropriate LSP/runtime Epic for this issue. 2. Link this issue as a dependency: this issue (#7129) **blocks** the parent Epic. **Suggested search terms for finding a parent Epic:** `LSP`, `lsp runtime`, `Type/Epic` **Related issue:** #6581 covers the `get_diagnostics()` and `get_completions()` subset of this same bug. Consider whether this issue should be consolidated with #6581 or whether #6581 should be closed in favour of this broader-scoped issue. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: new-issue-creator
Author
Owner

Verified — Resource leak: document leak in LSP runtime operations. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Resource leak: document leak in LSP runtime operations. MoSCoW: Should-have. Priority: Medium. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7129
No description provided.