BUG-HUNT: [security] Header injection vulnerability in LSP transport ASCII decoding #7112

Open
opened 2026-04-10 07:50:36 +00:00 by HAL9000 · 2 comments
Owner

Background and Context

The LSP transport layer in src/cleveragents/lsp/transport.py uses ASCII decoding with errors="replace" when parsing LSP message headers inside _read_one_message(). This silently converts every non-ASCII byte to the Unicode replacement character (U+FFFD, often rendered as ?) rather than rejecting the header outright.

Because the replacement is silent, a malicious or misbehaving peer can craft headers that contain non-ASCII byte sequences in positions that affect protocol parsing — most critically inside the Content-Length value. The int() conversion that follows will then either raise ValueError (triggering a silent return None that desynchronises the stream) or, in more subtle cases, produce an unexpected integer if the replacement character happens to land in a way that still parses. Either outcome allows an attacker to manipulate the transport's view of message boundaries, enabling protocol-level injection or desynchronisation attacks.

The vulnerability is present in the _read_one_message() method and is confirmed by direct inspection of the source.

Evidence

# src/cleveragents/lsp/transport.py, line 240
decoded = line.decode("ascii", errors="replace").strip()
if not decoded:
    break  # Empty line = end of headers
if decoded.lower().startswith("content-length:"):
    try:
        content_length = int(decoded.split(":", 1)[1].strip())
    except ValueError:
        logger.warning(
            "lsp.transport.invalid_content_length",
            header=decoded,
        )
        return None

Attack Scenario

  1. Attacker sends an LSP header line containing non-ASCII bytes embedded in the Content-Length value (e.g., Content-Length: 10\xc0\n).
  2. errors="replace" silently converts \xc0 to the replacement character U+FFFD, yielding Content-Length: 10\ufffd.
  3. int("10\ufffd") raises ValueError; the transport logs a warning, returns None, and discards the message, desynchronising the stream without any error surfaced to the caller.
  4. Alternatively, carefully crafted multi-byte sequences can produce replacement characters that, combined with surrounding digits, still parse as a valid integer, causing the transport to read the wrong number of body bytes and corrupt subsequent messages.
  5. In both cases the attacker can force the transport into an inconsistent state, enabling protocol manipulation or denial of service.
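
Steps 1 to 3 can be reproduced with the standard library alone. The snippet below is a minimal sketch that mirrors the decode-and-parse logic quoted under Evidence; it does not import the real transport, and it does not attempt to reproduce step 4.

```python
# Header line from step 1, with a single non-ASCII byte after the digits.
raw = b"Content-Length: 10\xc0\r\n"

# Same call as the quoted transport code: invalid bytes are replaced, not rejected.
decoded = raw.decode("ascii", errors="replace").strip()

# The invalid byte has become U+FFFD, which is not a literal question mark.
assert decoded == "Content-Length: 10\ufffd"

value = decoded.split(":", 1)[1].strip()
try:
    content_length = int(value)
except ValueError:
    # Mirrors the transport's `return None` path.
    content_length = None

print(repr(decoded), content_length)
```

The replacement character survives strip() because it is not whitespace, so int() rejects the value and the message is dropped.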

Current Behavior

Non-ASCII bytes in LSP headers are silently replaced with the replacement character U+FFFD. The corrupted header string is then passed to int(), which either fails (the transport returns None and the stream desynchronises) or, in edge cases, produces an incorrect Content-Length value. No error is raised to the caller; the only trace is the lsp.transport.invalid_content_length warning shown under Evidence.

Expected Behavior

Any non-ASCII byte in an LSP header MUST cause an immediate, explicit rejection of the message. The transport should raise a well-typed exception (e.g., ValueError or a dedicated LspProtocolError) that propagates to the caller, rather than silently corrupting the header and continuing.
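
One possible shape for this behaviour is sketched below. Both LspProtocolError and decode_header_line are hypothetical names introduced for illustration; the project may already define an equivalent error type, and the real fix would live inside _read_one_message().

```python
class LspProtocolError(Exception):
    """Hypothetical error type; the real project may name it differently."""


def decode_header_line(line: bytes) -> str:
    """Decode one header line, rejecting any non-ASCII byte (hypothetical helper)."""
    try:
        return line.decode("ascii", errors="strict").strip()
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that failed to decode.
        raise LspProtocolError(
            f"non-ASCII byte 0x{line[exc.start]:02x} at offset {exc.start} in LSP header"
        ) from exc


# Usage: the crafted header from the attack scenario is now rejected loudly.
try:
    decode_header_line(b"Content-Length: 10\xc0\r\n")
except LspProtocolError as err:
    rejected_message = str(err)
    print(rejected_message)
```

Chaining with `from exc` keeps the original UnicodeDecodeError available to callers that want the exact byte offset.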

Acceptance Criteria

  • _read_one_message() decodes header lines using errors="strict" (or equivalent explicit validation).
  • Any UnicodeDecodeError raised during header decoding is caught and re-raised as a typed protocol error (not swallowed).
  • An additional guard validates that the decoded header contains only printable ASCII characters (codepoints 0x20–0x7E plus \r\n) before any further parsing.
  • All existing LSP transport BDD scenarios continue to pass.
  • New BDD scenarios cover: (a) non-ASCII byte in Content-Length value, (b) non-ASCII byte in header name, (c) non-ASCII byte in an unrecognised header — all must result in a protocol error, not silent corruption.
  • nox passes with no failures; coverage ≥ 97%.
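
The printable-ASCII guard matters because strict ASCII decoding still accepts control bytes such as NUL. The sketch below shows one way to write the guard on the raw bytes, checked against cases (a) to (c); is_printable_ascii_header is a hypothetical name, and the sample header lines are illustrative rather than taken from the project's test suite.

```python
def is_printable_ascii_header(line: bytes) -> bool:
    """Return True if the line holds only bytes 0x20-0x7E, ignoring one trailing CRLF or LF."""
    if line.endswith(b"\r\n"):
        body = line[:-2]
    elif line.endswith(b"\n"):
        body = line[:-1]
    else:
        body = line
    # Iterating over bytes yields integers, so the range check is direct.
    return all(0x20 <= byte <= 0x7E for byte in body)


# Illustrative inputs, one per acceptance-criteria case plus two extras.
cases = {
    "valid header": b"Content-Length: 10\r\n",
    "(a) non-ASCII in Content-Length value": b"Content-Length: 10\xc0\r\n",
    "(b) non-ASCII in header name": b"Content-\xffLength: 10\r\n",
    "(c) non-ASCII in unrecognised header": b"X-Custom: caf\xc3\xa9\r\n",
    "control byte that strict decoding accepts": b"Content-Length: 1\x000\r\n",
}

results = {name: is_printable_ascii_header(line) for name, line in cases.items()}
for name, ok in results.items():
    print(f"{name}: {'accepted' if ok else 'rejected'}")
```

Only the first case is accepted. The last case decodes without error under errors="strict", which is why the guard is listed as a separate criterion.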

Supporting Information

  • File: src/cleveragents/lsp/transport.py
  • Method: _read_one_message()
  • Line: 240
  • Severity: Critical — protocol injection / transport desynchronisation
  • Likelihood: Medium — requires a crafted peer sending non-ASCII header bytes
  • Related issues: #7083 (DoS in LSP message reading), #7101 (path traversal in LSP runtime), #7102 (race condition in lifecycle manager)
  • CWE: CWE-116 (Improper Encoding or Escaping of Output), CWE-20 (Improper Input Validation)

Metadata

  • Branch: bugfix/m3.6.0-lsp-transport-header-injection-ascii
  • Commit Message: fix(lsp): reject non-ASCII header bytes in transport to prevent header injection
  • Milestone: v3.6.0
  • Parent Epic: #824

Subtasks

  • Write failing BDD scenarios (Behave) covering non-ASCII bytes in Content-Length, header name, and unknown header — tagged @tdd_issue, @tdd_issue_<N>, @tdd_expected_fail
  • Create TDD issue (Type/Testing, title prefixed TDD:) and link it as a dependency of this issue
  • Change line.decode("ascii", errors="replace") to line.decode("ascii", errors="strict") in _read_one_message()
  • Catch UnicodeDecodeError and re-raise as a typed LspProtocolError (or equivalent)
  • Add explicit printable-ASCII guard on the decoded header string before parsing
  • Remove @tdd_expected_fail tag from all @tdd_issue_<N> scenarios once fix is in place
  • Update docstring for _read_one_message() to document strict ASCII enforcement
  • Verify nox passes with no failures and coverage ≥ 97%

Definition of Done

  • Failing BDD scenarios exist and are tagged correctly (@tdd_issue, @tdd_issue_<N>, @tdd_expected_fail)
  • TDD issue created and linked as dependency of this issue
  • _read_one_message() uses errors="strict" (or equivalent) — no silent replacement
  • UnicodeDecodeError propagates as a typed protocol error, not swallowed
  • Printable-ASCII guard in place before header parsing
  • @tdd_expected_fail removed from all @tdd_issue_<N> scenarios after fix
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: new-issue-creator

HAL9000 added this to the v3.6.0 milestone 2026-04-10 07:50:57 +00:00
Author
Owner

Verified — Critical security bug: header injection in LSP transport ASCII decoding. MoSCoW: Must-have. Priority: Critical.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Author
Owner

Dependency Note

PR #10608 should BLOCK issue #7112 (correct dependency direction: PR → blocks → issue).
The Forgejo dependencies API returned an error during automated setup. Manual verification required:

  • Open the issue and confirm PR #10608 appears under "depends on".
  • If missing, add it manually via the Forgejo UI.

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

Blocks
#824 Epic: LSP Functional Runtime
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#7112