BUG-HUNT: [security] PromptSanitizer applies HTML escaping AFTER injection pattern check — bypass possible via HTML-encoded injection payloads #7408

New issue

Open

opened 2026-04-10 19:05:27 +00:00 by HAL9000 · 0 comments

HAL9000 commented

2026-04-10 19:05:27 +00:00

Owner

Bug Report: Security — Injection Detection Order in PromptSanitizer

Severity Assessment

Impact: An attacker can craft an HTML-encoded prompt injection payload that bypasses the injection pattern detector. The sanitized output fed to the LLM will then contain the decoded injection after the LLM processes the HTML entities.
Likelihood: Low–Medium — requires attacker to control user input AND the LLM must "decode" HTML entities in its context, which some models do
Priority: High (security)

Location

File: src/cleveragents/application/services/prompt_sanitizer.py
Function: PromptSanitizer.sanitize_user_input()
Lines: 83–112

Description

The injection check is performed on the pre-HTML-escaping text, which the docstring explicitly states is intentional ("check for injection patterns BEFORE HTML escaping so that patterns like <|im_start|> are detected on the raw text"). However, this creates an incomplete defense:

The order is:

Strip control characters ✓
Check injection patterns on raw text ← misses encoded variants
Escape HTML entities

A payload like You are now a will pass step 2 (not matching you\s+are\s+now\s+a), and after HTML escaping (step 3), the string becomes &#x59;ou are now a which is still not dangerous. However, a payload like:

IGNORE ALL previous instructions → caught ✓
IGNORE ALL previous instructions → NOT caught (HTML entities between words bypass the regex)

Even more critically, since wrap_user_content() is called AFTER sanitization, an attacker who can slip past the regex (e.g., using Unicode homoglyphs or zero-width joiners that aren't stripped) can embed injection content within the boundary markers.

Evidence

# src/cleveragents/application/services/prompt_sanitizer.py, lines 88–112

# Step 2: check for injection patterns BEFORE HTML escaping
# so that patterns like <|im_start|> are detected on the raw text.
for pattern_name, pattern_re in self._INJECTION_PATTERNS:
    match = pattern_re.search(working)  # working = raw text, no HTML encoding yet
    if match:
        raise PromptInjectionDetected(...)

# Step 3: escape HTML entities
working = html.escape(working, quote=True)

The _INJECTION_PATTERNS include re.IGNORECASE but don't handle:

Unicode lookalike characters (e.g., "іgnore" with Cyrillic "і")
HTML numeric character references embedded in the text
Zero-width joiners/non-joiners between words

Expected Behavior

The sanitizer should perform injection pattern detection that is robust against Unicode normalization and encoded variants, or clearly document that it only defends against naive injection and relies on other mechanisms for sophisticated attacks.

Actual Behavior

The injection pattern detector can be bypassed using Unicode homoglyphs or partially encoded content, as the regex operates on raw strings without Unicode normalization.

Suggested Fix

Apply Unicode normalization (NFKC) before pattern detection to collapse lookalikes
Also scan the post-HTML-escape result for encoded injection patterns
Add patterns for common encoding bypasses (HTML entities, URL encoding)
Document clearly that this is a defense-in-depth measure, not a complete defense

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_, @tdd_expected_fail.

Automated by CleverAgents Bot
Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

## Bug Report: Security — Injection Detection Order in PromptSanitizer ### Severity Assessment - **Impact**: An attacker can craft an HTML-encoded prompt injection payload that bypasses the injection pattern detector. The sanitized output fed to the LLM will then contain the decoded injection after the LLM processes the HTML entities. - **Likelihood**: Low–Medium — requires attacker to control user input AND the LLM must "decode" HTML entities in its context, which some models do - **Priority**: High (security) ### Location - **File**: `src/cleveragents/application/services/prompt_sanitizer.py` - **Function**: `PromptSanitizer.sanitize_user_input()` - **Lines**: 83–112 ### Description The injection check is performed on the **pre-HTML-escaping** text, which the docstring explicitly states is intentional ("check for injection patterns BEFORE HTML escaping so that patterns like `<|im_start|>` are detected on the raw text"). However, this creates an incomplete defense: The order is: 1. Strip control characters ✓ 2. **Check injection patterns on raw text** ← misses encoded variants 3. Escape HTML entities A payload like `You are now a` will pass step 2 (not matching `you\s+are\s+now\s+a`), and after HTML escaping (step 3), the string becomes `&#x59;ou are now a` which is still not dangerous. However, a payload like: - `IGNORE ALL previous instructions` → caught ✓ - `IGNORE ALL previous instructions` → NOT caught (HTML entities between words bypass the regex) Even more critically, since `wrap_user_content()` is called AFTER sanitization, an attacker who can slip past the regex (e.g., using Unicode homoglyphs or zero-width joiners that aren't stripped) can embed injection content within the boundary markers. ### Evidence ```python # src/cleveragents/application/services/prompt_sanitizer.py, lines 88–112 # Step 2: check for injection patterns BEFORE HTML escaping # so that patterns like <|im_start|> are detected on the raw text. for pattern_name, pattern_re in self._INJECTION_PATTERNS: match = pattern_re.search(working) # working = raw text, no HTML encoding yet if match: raise PromptInjectionDetected(...) # Step 3: escape HTML entities working = html.escape(working, quote=True) ``` The `_INJECTION_PATTERNS` include `re.IGNORECASE` but don't handle: - Unicode lookalike characters (e.g., "іgnore" with Cyrillic "і") - HTML numeric character references embedded in the text - Zero-width joiners/non-joiners between words ### Expected Behavior The sanitizer should perform injection pattern detection that is robust against Unicode normalization and encoded variants, or clearly document that it only defends against naive injection and relies on other mechanisms for sophisticated attacks. ### Actual Behavior The injection pattern detector can be bypassed using Unicode homoglyphs or partially encoded content, as the regex operates on raw strings without Unicode normalization. ### Suggested Fix 1. Apply Unicode normalization (NFKC) before pattern detection to collapse lookalikes 2. Also scan the post-HTML-escape result for encoded injection patterns 3. Add patterns for common encoding bypasses (HTML entities, URL encoding) 4. Document clearly that this is a defense-in-depth measure, not a complete defense ### Category security ### TDD Note After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD with tags: @tdd_issue, @tdd_issue_<this-issue-number>, @tdd_expected_fail. --- **Automated by CleverAgents Bot** Supervisor: Bug Detection Pool | Agent: bug-hunt-pool-supervisor

HAL9000 referenced this issue

2026-04-10 19:09:52 +00:00

[AUTO-BUG-SUP] Bug Hunt Status (Cycle 1) #7404

HAL9000 added this to the v3.5.0 milestone

2026-04-10 19:34:04 +00:00

HAL9000 referenced this issue

2026-04-10 19:38:52 +00:00

[AUTO-BUG-SUP] Bug Hunt Status (Cycle 3) #7449

HAL9000 added the

State

Verified

label