BUG-HUNT: [security] PromptSanitizer applies HTML escaping AFTER injection pattern check — bypass possible via HTML-encoded injection payloads #7408

Open
opened 2026-04-10 19:05:27 +00:00 by HAL9000 · 0 comments
Owner

Bug Report: Security — Injection Detection Order in PromptSanitizer

Severity Assessment

  • Impact: An attacker can craft an HTML-encoded prompt injection payload that bypasses the injection pattern detector. The sanitized output fed to the LLM will then contain the decoded injection after the LLM processes the HTML entities.
  • Likelihood: Low–Medium — requires attacker to control user input AND the LLM must "decode" HTML entities in its context, which some models do
  • Priority: High (security)

Location

  • File: src/cleveragents/application/services/prompt_sanitizer.py
  • Function: PromptSanitizer.sanitize_user_input()
  • Lines: 83–112

Description

The injection check is performed on the pre-HTML-escaping text, which the docstring explicitly states is intentional ("check for injection patterns BEFORE HTML escaping so that patterns like <|im_start|> are detected on the raw text"). However, this creates an incomplete defense:

The order is:

  1. Strip control characters ✓
  2. Check injection patterns on raw text ← misses encoded variants
  3. Escape HTML entities

A payload like &#x59;ou are now a will pass step 2 (not matching you\s+are\s+now\s+a), and after HTML escaping (step 3), the string becomes &amp;#x59;ou are now a which is still not dangerous. However, a payload like:

  • IGNORE ALL previous instructions → caught ✓
  • IGNORE&#x20;ALL&#x20;previous&#x20;instructions → NOT caught (HTML entities between words bypass the regex)

Even more critically, since wrap_user_content() is called AFTER sanitization, an attacker who can slip past the regex (e.g., using Unicode homoglyphs or zero-width joiners that aren't stripped) can embed injection content within the boundary markers.

Evidence

# src/cleveragents/application/services/prompt_sanitizer.py, lines 88–112

# Step 2: check for injection patterns BEFORE HTML escaping
# so that patterns like <|im_start|> are detected on the raw text.
for pattern_name, pattern_re in self._INJECTION_PATTERNS:
    match = pattern_re.search(working)  # working = raw text, no HTML encoding yet
    if match:
        raise PromptInjectionDetected(...)

# Step 3: escape HTML entities
working = html.escape(working, quote=True)

The _INJECTION_PATTERNS include re.IGNORECASE but don't handle:

  • Unicode lookalike characters (e.g., "іgnore" with Cyrillic "і")
  • HTML numeric character references embedded in the text
  • Zero-width joiners/non-joiners between words

Expected Behavior

The sanitizer should perform injection pattern detection that is robust against Unicode normalization and encoded variants, or clearly document that it only defends against naive injection and relies on other mechanisms for sophisticated attacks.

Actual Behavior

The injection pattern detector can be bypassed using Unicode homoglyphs or partially encoded content, as the regex operates on raw strings without Unicode normalization.

Suggested Fix

  1. Apply Unicode normalization (NFKC) before pattern detection to collapse lookalikes
  2. Also scan the post-HTML-escape result for encoded injection patterns
  3. Add patterns for common encoding bypasses (HTML entities, URL encoding)
  4. Document clearly that this is a defense-in-depth measure, not a complete defense

Category

security

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_, @tdd_expected_fail.


Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: Security — Injection Detection Order in PromptSanitizer ### Severity Assessment - **Impact**: An attacker can craft an HTML-encoded prompt injection payload that bypasses the injection pattern detector. The sanitized output fed to the LLM will then contain the decoded injection after the LLM processes the HTML entities. - **Likelihood**: Low–Medium — requires attacker to control user input AND the LLM must "decode" HTML entities in its context, which some models do - **Priority**: High (security) ### Location - **File**: `src/cleveragents/application/services/prompt_sanitizer.py` - **Function**: `PromptSanitizer.sanitize_user_input()` - **Lines**: 83–112 ### Description The injection check is performed on the **pre-HTML-escaping** text, which the docstring explicitly states is intentional ("check for injection patterns BEFORE HTML escaping so that patterns like `<|im_start|>` are detected on the raw text"). However, this creates an incomplete defense: The order is: 1. Strip control characters ✓ 2. **Check injection patterns on raw text** ← misses encoded variants 3. Escape HTML entities A payload like `&#x59;ou are now a` will pass step 2 (not matching `you\s+are\s+now\s+a`), and after HTML escaping (step 3), the string becomes `&amp;#x59;ou are now a` which is still not dangerous. However, a payload like: - `IGNORE ALL previous instructions` → caught ✓ - `IGNORE&#x20;ALL&#x20;previous&#x20;instructions` → NOT caught (HTML entities between words bypass the regex) Even more critically, since `wrap_user_content()` is called AFTER sanitization, an attacker who can slip past the regex (e.g., using Unicode homoglyphs or zero-width joiners that aren't stripped) can embed injection content within the boundary markers. ### Evidence ```python # src/cleveragents/application/services/prompt_sanitizer.py, lines 88–112 # Step 2: check for injection patterns BEFORE HTML escaping # so that patterns like <|im_start|> are detected on the raw text. for pattern_name, pattern_re in self._INJECTION_PATTERNS: match = pattern_re.search(working) # working = raw text, no HTML encoding yet if match: raise PromptInjectionDetected(...) # Step 3: escape HTML entities working = html.escape(working, quote=True) ``` The `_INJECTION_PATTERNS` include `re.IGNORECASE` but don't handle: - Unicode lookalike characters (e.g., "іgnore" with Cyrillic "і") - HTML numeric character references embedded in the text - Zero-width joiners/non-joiners between words ### Expected Behavior The sanitizer should perform injection pattern detection that is robust against Unicode normalization and encoded variants, or clearly document that it only defends against naive injection and relies on other mechanisms for sophisticated attacks. ### Actual Behavior The injection pattern detector can be bypassed using Unicode homoglyphs or partially encoded content, as the regex operates on raw strings without Unicode normalization. ### Suggested Fix 1. Apply Unicode normalization (NFKC) before pattern detection to collapse lookalikes 2. Also scan the post-HTML-escape result for encoded injection patterns 3. Add patterns for common encoding bypasses (HTML entities, URL encoding) 4. Document clearly that this is a defense-in-depth measure, not a complete defense ### Category security ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_<this-issue-number>, @tdd_expected_fail. --- **Automated by CleverAgents Bot** Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor
HAL9000 added this to the v3.5.0 milestone 2026-04-10 19:34:04 +00:00
HAL9000 self-assigned this 2026-04-11 03:21:09 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7408
No description provided.