UAT: LspLifecycleManager.restart_server() does not re-open tracked documents after restart as specified #3532

Open
opened 2026-04-05 19:00:45 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: bug/lsp-restart-reopen-tracked-documents
  • Commit Message: fix(lsp): re-open tracked documents after server restart in LspLifecycleManager
  • Milestone: (none — backlog)
  • Parent Epic: #824

Backlog note: This issue was discovered during autonomous operation
on milestone v3.6.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Background and Context

Per docs/specification.md (LSP Server Lifecycle section, "Crash Recovery"):

If a language server process crashes, the LSP Runtime restarts it automatically, re-sends the initialize handshake, re-opens tracked documents, and resumes operations without disrupting the actor's execution.

LspLifecycleManager.restart_server() in src/cleveragents/lsp/lifecycle.py currently stops the old transport, spawns a new process, and re-sends the initialize handshake, but it omits the final recovery step: re-opening the documents that were open before the crash. The _ManagedServer dataclass has no field to track open documents, so the restarted language server has no knowledge of them.

Current Behavior

LspLifecycleManager.restart_server() (src/cleveragents/lsp/lifecycle.py, lines 148–215):

  1. Stops the old transport
  2. Spawns a new process
  3. Re-sends the initialize handshake
  4. Does NOT re-open tracked documents

The _ManagedServer class (lines 28–50) stores config, transport, client, workspace_path, and ref_count — but has no open_documents field. After restart, the language server has no knowledge of previously opened documents, causing subsequent get_diagnostics(), get_completions(), etc. calls to fail or return empty results.

Steps to reproduce:

  1. Start an LSP server and open a document via did_open()
  2. Simulate a server crash (kill the process)
  3. Call restart_server() — the server restarts and re-initializes
  4. Call get_diagnostics() for the previously opened document
  5. Observe that diagnostics are empty or the call fails because the document was never re-opened
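The steps above can be illustrated without a real language server. The sketch below uses a hypothetical in-memory fake (FakeLanguageServer and restart_without_reopen are illustration-only names, not project code) to show why diagnostics disappear when a fresh process never receives did_open() again.

```python
class FakeLanguageServer:
    """Hypothetical in-memory stand-in for a language server process."""

    def __init__(self) -> None:
        self.documents: dict[str, str] = {}

    def did_open(self, uri: str, text: str) -> None:
        self.documents[uri] = text

    def get_diagnostics(self, uri: str) -> list[str]:
        # The fake only reports on documents it has been told about.
        if uri not in self.documents:
            return []
        return ["unused import"] if "import os" in self.documents[uri] else []


def restart_without_reopen(old: FakeLanguageServer) -> FakeLanguageServer:
    """Mimics the reported behavior: a fresh process with no document replay."""
    return FakeLanguageServer()


uri = "file:///workspace/example.py"
server = FakeLanguageServer()
server.did_open(uri, "import os\n")
before = server.get_diagnostics(uri)  # the document is known to the server

server = restart_without_reopen(server)
after = server.get_diagnostics(uri)  # the new process never saw the document
```

In this fake, before holds one diagnostic and after is empty, which mirrors step 5 of the reproduction.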

Expected Behavior

After restart_server() completes:

  • All documents that were open before the crash are re-opened on the new server process via did_open()
  • Subsequent LSP operations (get_diagnostics(), get_completions(), etc.) return correct results for those documents
  • The actor's execution continues without disruption, as specified
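A rough sketch of the replay step, under stated assumptions: the did_open(uri, language_id, version, text) signature is assumed from the fields LSP's textDocument/didOpen carries, not taken from the actual LspClient, and reopen_tracked_documents and RecordingClient are hypothetical names.

```python
from typing import Protocol


class SupportsDidOpen(Protocol):
    """Assumed shape of the client; the real LspClient signature may differ."""

    def did_open(self, uri: str, language_id: str, version: int, text: str) -> None: ...


def reopen_tracked_documents(
    client: SupportsDidOpen, open_documents: dict[str, dict]
) -> list[str]:
    """Replay did_open() for every tracked document; return the URIs re-opened."""
    reopened: list[str] = []
    # Iterate over a snapshot so tracking updates during replay cannot break the loop.
    for uri, doc in list(open_documents.items()):
        client.did_open(uri, doc["language_id"], doc["version"], doc["text"])
        reopened.append(uri)
    return reopened


class RecordingClient:
    """Test double that records every did_open() call it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, int, str]] = []

    def did_open(self, uri: str, language_id: str, version: int, text: str) -> None:
        self.calls.append((uri, language_id, version, text))


tracked = {
    "file:///workspace/a.py": {"language_id": "python", "version": 3, "text": "x = 1\n"},
    "file:///workspace/b.py": {"language_id": "python", "version": 1, "text": ""},
}
client = RecordingClient()
reopened = reopen_tracked_documents(client, tracked)
```

In restart_server() a call like this would run only after the initialize handshake has completed on the new process. An empty open_documents mapping makes the function a no-op.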

Subtasks

  • Add open_documents: dict[str, dict] field to _ManagedServer to track open documents (URI → {language_id, version, text})
  • Update LspClient.did_open() call-sites to record documents in _ManagedServer.open_documents
  • Update LspClient.did_close() call-sites to remove documents from _ManagedServer.open_documents
  • In restart_server(), after re-initialization, iterate open_documents and re-open each via did_open()
  • Tests (Behave): Add BDD scenarios covering crash recovery with document re-opening (happy path + empty document set)
  • Tests (Robot): Add integration test verifying end-to-end crash recovery with document re-synchronization
  • Verify coverage >= 97% via nox -e coverage_report
  • Run nox (all default sessions), fix any errors
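One possible way to handle the did_open()/did_close() bookkeeping subtasks is a thin wrapper that mirrors each notification into the tracking dictionary. This is only a sketch: TrackingClient and NullClient are hypothetical, and the real change may instead update the call-sites directly, as the subtasks describe.

```python
class NullClient:
    """Hypothetical stand-in for LspClient that accepts notifications silently."""

    def did_open(self, uri: str, language_id: str, version: int, text: str) -> None:
        pass

    def did_close(self, uri: str) -> None:
        pass


class TrackingClient:
    """Forwards did_open/did_close and keeps open_documents in sync."""

    def __init__(self, client, open_documents: dict[str, dict]) -> None:
        self._client = client
        self._open_documents = open_documents

    def did_open(self, uri: str, language_id: str, version: int, text: str) -> None:
        self._client.did_open(uri, language_id, version, text)
        # Record only after the notification was sent without raising.
        self._open_documents[uri] = {
            "language_id": language_id,
            "version": version,
            "text": text,
        }

    def did_close(self, uri: str) -> None:
        self._client.did_close(uri)
        # pop() with a default tolerates closing a document that was never tracked.
        self._open_documents.pop(uri, None)


open_documents: dict[str, dict] = {}
tracking = TrackingClient(NullClient(), open_documents)
tracking.did_open("file:///workspace/a.py", "python", 1, "x = 1\n")
tracking.did_open("file:///workspace/b.py", "python", 1, "y = 2\n")
tracking.did_close("file:///workspace/a.py")
```

After these calls only b.py remains tracked, so a later restart would replay exactly the documents still open.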

Definition of Done

This issue is complete when:

  • _ManagedServer tracks open documents via an open_documents field
  • restart_server() re-opens all tracked documents after restart, matching the spec's crash recovery guarantee
  • All subtasks above are completed and checked off
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional implementation details
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Medium — Crash recovery without document re-opening means LSP operations return stale results after a server restart. This is a correctness issue but only manifests during crash recovery scenarios.
  • Milestone: v3.6.0 — LSP lifecycle management is in scope for M7 (Epic #824).
  • Story Points: 3 — M — Requires adding a tracking field to _ManagedServer, updating did_open/did_close call-sites, and adding re-open logic to restart_server(). Well-scoped.
  • MoSCoW: Should Have — The spec explicitly describes crash recovery with document re-opening. Important for reliability but only affects the crash recovery path.
  • Parent Epic: #824 (LSP Functional Runtime)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo added this to the v3.6.0 milestone 2026-04-05 19:37:45 +00:00
freemo removed this from the v3.6.0 milestone 2026-04-06 23:43:20 +00:00
Blocks
#824 Epic: LSP Functional Runtime
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3532