UAT: builtin/file-search compiles user-controlled regex without timeout or validation — ReDoS (Denial of Service) vulnerability #4116

Open
opened 2026-04-06 10:27:54 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/security-file-search-redos
  • Commit Message: fix(security): add regex timeout and complexity limit to builtin/file-search
  • Milestone: (none — backlog)
  • Parent Epic: #362

Bug Report

What Was Tested

Security of the builtin/file-search tool's regex pattern handling.

Expected Behavior (from spec)

Per docs/specification.md §Security Model — Sandbox Isolation and general security hardening principles, all user-provided inputs must be validated before use. Regex patterns provided by agents/users must not be able to cause unbounded CPU consumption.

Actual Behavior

_handle_file_search() in src/cleveragents/tool/builtins/file_tools.py compiles the user-provided pattern parameter directly as a Python regex without any validation, complexity limit, or timeout:

# file_tools.py lines ~175-185
def _handle_file_search(inputs: dict[str, Any]) -> dict[str, Any]:
    """Search files for content."""
    path = validate_path(inputs["path"], inputs.get("sandbox_root"))
    pattern: str = inputs["pattern"]
    include: str = inputs.get("include", "*")
    max_results: int = inputs.get("max_results", 100)

    if not path.is_dir():
        raise ValueError(f"Path is not a directory: {path}")

    compiled = re.compile(pattern)  # ← user-controlled regex, NO validation
    matches: list[dict[str, Any]] = []

    for file_path in sorted(path.rglob(include)):
        ...
        for line_no, line in enumerate(content.splitlines(), start=1):
            if compiled.search(line):  # ← applied to every line of every file

Attack Scenario (ReDoS)

An agent or user can provide a catastrophic backtracking regex pattern that causes the Python regex engine to consume exponential CPU time:

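# Example catastrophic patterns (search() only blows up when the input almost matches and then fails):
pattern = "(a+)+$"         # Catastrophic backtracking on "aaaa...aaab"
pattern = "(a|aa)+$"       # Catastrophic backtracking on "aaaa...aaab"
pattern = "([a-zA-Z]+)*$"  # Catastrophic on a long run of letters followed by a non-letter
pattern = "(x+x+)+y"       # Catastrophic on "xxxx...xxxx" (long run with no trailing "y")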

When compiled.search(line) is called with such a pattern on a long line, the Python regex engine can take minutes or hours (or effectively forever) to return. Since the search iterates over every line of every file in the directory, this can permanently hang the agent process.

Proof of Concept

import re
import time

# Catastrophic backtracking pattern
pattern = re.compile(r"(a+)+$")
evil_input = "a" * 30 + "b"  # 31 chars

start = time.perf_counter()
# search() raises nothing; it returns None only after on the order of 2**30 backtracking steps.
# No bare "except:" here, so Ctrl+C can still interrupt the run.
pattern.search(evil_input)
print(f"Elapsed: {time.perf_counter() - start:.2f}s")  # Roughly a minute or more; doubles per extra "a"
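
The exponential growth can also be observed at input sizes that finish quickly. The following is a minimal sketch, not part of the project code; the helper name time_failed_search and the chosen sizes are assumptions made for illustration. Absolute timings depend on hardware, so none are quoted here.

```python
import re
import time

PATTERN = re.compile(r"(a+)+$")  # same pattern as the proof of concept


def time_failed_search(n: int) -> float:
    """Return seconds spent searching n copies of "a" followed by "b"."""
    text = "a" * n + "b"
    start = time.perf_counter()
    result = PATTERN.search(text)
    elapsed = time.perf_counter() - start
    assert result is None  # the trailing "b" guarantees the match fails
    return elapsed


if __name__ == "__main__":
    # Each extra pair of characters should roughly quadruple the elapsed time.
    for n in range(14, 23, 2):
        print(f"n={n:2d}  {time_failed_search(n):.4f}s")
```

Keeping n at or below the low twenties keeps the whole run short while still showing the trend that makes a 30-character line effectively unbounded.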

Code Location

  • src/cleveragents/tool/builtins/file_tools.py: _handle_file_search() function, re.compile(pattern) call

Fix Required

Apply one or more of the following mitigations:

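Option 1: Use a per-call timeout. The standard-library re module has no timeout parameter in any Python version, so this requires the third-party regex package, whose matching methods accept a timeout argument:

import regex  # third-party package (pip install regex); stdlib re has no timeout

compiled = regex.compile(pattern)
for line_no, line in enumerate(content.splitlines(), start=1):
    try:
        match = compiled.search(line, timeout=1.0)  # 1 second limit per call
    except TimeoutError:
        continue  # Skip lines that cause timeout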

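Option 2: Validate regex complexity before compiling (a heuristic that can miss dangerous patterns and reject safe ones, so it should not be the only defense):

# Reject patterns with common catastrophic backtracking structures:
# a quantified group that itself contains a quantifier or an alternation.
_DANGEROUS_PATTERNS = re.compile(
    r'(\(.*[+*].*\)[+*]|\(.*\|.*\)[+*]|\(.*\)\{[0-9]+,\})'
)
if _DANGEROUS_PATTERNS.search(pattern):
    raise ValueError(f"Regex pattern {pattern!r} may cause catastrophic backtracking")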

Option 3: Limit pattern length (the re module has no TIMEOUT flag, so this must be paired with another option to bound run time):

MAX_PATTERN_LENGTH = 256
if len(pattern) > MAX_PATTERN_LENGTH:
    raise ValueError(f"Regex pattern too long: {len(pattern)} > {MAX_PATTERN_LENGTH}")
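
The length check can be folded into a single validation step that also turns regex syntax errors into the same exception type. This is a minimal sketch under stated assumptions: the helper name compile_search_pattern is hypothetical and does not exist in file_tools.py, and the 256 limit is simply the value used above. A length limit alone does not stop ReDoS, since "(a+)+$" is only six characters long.

```python
import re

MAX_PATTERN_LENGTH = 256  # value taken from Option 3; tune as needed


def compile_search_pattern(pattern: str) -> re.Pattern:
    """Hypothetical helper: length-check, then compile, a user-supplied regex."""
    if not isinstance(pattern, str):
        raise ValueError("Regex pattern must be a string")
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError(
            f"Regex pattern too long: {len(pattern)} > {MAX_PATTERN_LENGTH}"
        )
    try:
        return re.compile(pattern)
    except re.error as exc:
        # Report syntax errors with the same exception type as other input errors.
        raise ValueError(f"Invalid regex pattern: {exc}") from exc
```

A caller would use it in place of the bare re.compile(pattern) call, for example compiled = compile_search_pattern(inputs["pattern"]).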

Option 4: Run the regex search in a subprocess with a hard timeout (most robust, and needs no third-party dependency):
Use the existing InlineToolExecutor subprocess pattern to run the regex search with a hard timeout.
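
The idea behind Option 4 can be sketched with the standard library alone. This sketch does not use InlineToolExecutor, whose API is not shown in this issue; the names search_lines_isolated and _CHILD, the JSON-over-stdin protocol, and the default timeout are all assumptions made for illustration.

```python
import json
import subprocess
import sys

# Child program: reads {"pattern": ..., "lines": [...]} as JSON from stdin
# and writes the 1-based numbers of the matching lines as JSON to stdout.
_CHILD = r"""
import json, re, sys
request = json.load(sys.stdin)
compiled = re.compile(request["pattern"])
hits = [n for n, line in enumerate(request["lines"], start=1)
        if compiled.search(line)]
json.dump(hits, sys.stdout)
"""


def search_lines_isolated(pattern, lines, timeout_s=5.0):
    """Return matching line numbers, or raise TimeoutError if the child overruns."""
    payload = json.dumps({"pattern": pattern, "lines": list(lines)})
    try:
        proc = subprocess.run(
            [sys.executable, "-c", _CHILD],
            input=payload,
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
            check=True,         # an invalid pattern surfaces as CalledProcessError
        )
    except subprocess.TimeoutExpired as exc:
        raise TimeoutError(f"regex search exceeded {timeout_s}s") from exc
    return json.loads(proc.stdout)
```

Because the backtracking happens in a separate process, the parent stays responsive and the operating system reclaims the worker when it is killed. The cost is one interpreter start-up per call, so a real implementation would probably send a whole file, or a batch of files, per subprocess rather than one line at a time.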

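The recommended fix is Option 1 (a per-search timeout; the standard-library re module does not provide one, so this means the third-party regex package) combined with Option 3 (length limit) as defense-in-depth. If adding a dependency is not acceptable, Option 4 provides the timeout with the standard library only.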

Impact

  • Availability: An agent can permanently hang the CleverAgents process by providing a catastrophic backtracking regex pattern to builtin/file-search.
  • OWASP Category: A05:2021 Security Misconfiguration (ReDoS; also classified as CWE-1333, Inefficient Regular Expression Complexity)
  • Severity: High — can cause complete Denial of Service of the agent runtime

Subtasks

  • Add maximum pattern length validation (e.g., 256 chars, as in Option 3) in _handle_file_search()
  • Apply a per-search timeout to compiled.search(line) calls (the stdlib re module offers none; use the regex package's timeout argument or a subprocess with a hard timeout)
  • Add BDD scenario verifying that catastrophic backtracking patterns are rejected or time-limited
  • Verify nox -e unit_tests passes

Definition of Done

  • _handle_file_search() validates or limits the pattern parameter before compiling
  • The regex search operation has a bounded execution time
  • A BDD scenario exists verifying the protection
  • nox -e unit_tests and nox -e typecheck pass
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous UAT security audit.
It does not block milestone completion and has been placed in the backlog
for human review and future milestone assignment.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

freemo added this to the v3.3.0 milestone 2026-04-06 18:06:36 +00:00
Author
Owner

Milestone Triage Decision: Moved to Backlog

This security logging issue has been moved out of v3.3.0 during aggressive milestone triage. While important for security, it does not relate to the core focus of Corrections + Subplans + Checkpoints.

Reasoning:

  • v3.3.0 focus: Essential corrections, subplan management, and checkpoint functionality
  • This issue: Security logging enhancement - important but not milestone-blocking
  • Impact: Security observability improvement, not core corrections/subplans/checkpoints functionality

Will be addressed in a future milestone focused on security hardening and observability.

freemo removed this from the v3.3.0 milestone 2026-04-06 20:42:57 +00:00
HAL9000 added this to the v3.5.0 milestone 2026-04-09 03:10:50 +00:00
Reference
cleveragents/cleveragents-core#4116