BUG-HUNT: [security] Potential for prompt injection in agent LLM calls #3319

New issue

Open

opened 2026-04-05 09:42:40 +00:00 by freemo · 1 comment

freemo commented

2026-04-05 09:42:40 +00:00

Owner

Metadata

Branch: fix/security-prompt-injection-agent-llm-calls
Commit Message: fix(security): sanitize user input in agent LLM prompt construction
Milestone: v3.6.0
Parent Epic: #400

Bug Report: [security] — Potential for prompt injection in agent LLM calls

Severity Assessment

Impact: A malicious user could inject prompts to cause the LLM to generate malicious code, reveal sensitive information, or perform other unintended actions.
Likelihood: High, as user input is directly concatenated with system prompts.
Priority: Critical

Location

File: src/cleveragents/agents/graphs/auto_debug.py, src/cleveragents/agents/graphs/context_analysis.py, src/cleveragents/agents/graphs/plan_generation.py
Function/Class: _analyze_error, _generate_fix, _validate_fix, _analyze_dependencies, _score_relevance, _summarize_context, _analyze_requirements, _generate_plan, _validate

Description

The agents construct prompts by concatenating user-provided input with system instructions. While plan_generation.py uses a PromptSanitizer, its effectiveness is not guaranteed, and the other agents do not appear to use any sanitization. This makes them vulnerable to prompt injection attacks.

Evidence

# Example from src/cleveragents/agents/graphs/auto_debug.py
HumanMessage(
    content=f'''Error Message:
{error_msg}

Code Context:
{code_ctx}

Analyze this error and provide insights.'''
),

Expected Behavior

User input should be properly sanitized or isolated to prevent prompt injection.

Actual Behavior

User input is directly included in the prompt, allowing for potential injection.

Suggested Fix

Implement robust input sanitization and consider using techniques like prompt templating with clear separation of user input and system instructions.

Subtasks

Audit all agent graph files (auto_debug.py, context_analysis.py, plan_generation.py) for unsanitized user input in LLM prompt construction
Evaluate effectiveness of existing PromptSanitizer in plan_generation.py and identify gaps
Implement or extend PromptSanitizer to cover all identified injection vectors across all three agent files
Apply sanitization to all affected functions: _analyze_error, _generate_fix, _validate_fix, _analyze_dependencies, _score_relevance, _summarize_context, _analyze_requirements, _generate_plan, _validate
Add structural separation between system instructions and user-provided content (e.g., delimiters, templating)
Tests (Behave): Add BDD scenarios for prompt injection attempts and verify sanitization blocks them
Tests (Robot): Add integration tests verifying sanitized prompts reach the LLM correctly
Verify coverage >= 97% via nox -s coverage_report
Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

All subtasks above are completed and checked off.
A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(security): sanitize user input in agent LLM prompt construction), followed by a blank line, then additional lines providing relevant details about the implementation.
The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/security-prompt-injection-agent-llm-calls).
The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
All nox stages pass.
Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/security-prompt-injection-agent-llm-calls` - **Commit Message**: `fix(security): sanitize user input in agent LLM prompt construction` - **Milestone**: v3.6.0 - **Parent Epic**: #400 ## Bug Report: [security] — Potential for prompt injection in agent LLM calls ### Severity Assessment - **Impact**: A malicious user could inject prompts to cause the LLM to generate malicious code, reveal sensitive information, or perform other unintended actions. - **Likelihood**: High, as user input is directly concatenated with system prompts. - **Priority**: Critical ### Location - **File**: `src/cleveragents/agents/graphs/auto_debug.py`, `src/cleveragents/agents/graphs/context_analysis.py`, `src/cleveragents/agents/graphs/plan_generation.py` - **Function/Class**: `_analyze_error`, `_generate_fix`, `_validate_fix`, `_analyze_dependencies`, `_score_relevance`, `_summarize_context`, `_analyze_requirements`, `_generate_plan`, `_validate` ### Description The agents construct prompts by concatenating user-provided input with system instructions. While `plan_generation.py` uses a `PromptSanitizer`, its effectiveness is not guaranteed, and the other agents do not appear to use any sanitization. This makes them vulnerable to prompt injection attacks. ### Evidence ```python # Example from src/cleveragents/agents/graphs/auto_debug.py HumanMessage( content=f'''Error Message: {error_msg} Code Context: {code_ctx} Analyze this error and provide insights.''' ), ``` ### Expected Behavior User input should be properly sanitized or isolated to prevent prompt injection. ### Actual Behavior User input is directly included in the prompt, allowing for potential injection. ### Suggested Fix Implement robust input sanitization and consider using techniques like prompt templating with clear separation of user input and system instructions. ### Category security ## Subtasks - [ ] Audit all agent graph files (`auto_debug.py`, `context_analysis.py`, `plan_generation.py`) for unsanitized user input in LLM prompt construction - [ ] Evaluate effectiveness of existing `PromptSanitizer` in `plan_generation.py` and identify gaps - [ ] Implement or extend `PromptSanitizer` to cover all identified injection vectors across all three agent files - [ ] Apply sanitization to all affected functions: `_analyze_error`, `_generate_fix`, `_validate_fix`, `_analyze_dependencies`, `_score_relevance`, `_summarize_context`, `_analyze_requirements`, `_generate_plan`, `_validate` - [ ] Add structural separation between system instructions and user-provided content (e.g., delimiters, templating) - [ ] Tests (Behave): Add BDD scenarios for prompt injection attempts and verify sanitization blocks them - [ ] Tests (Robot): Add integration tests verifying sanitized prompts reach the LLM correctly - [ ] Verify coverage >= 97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly (`fix(security): sanitize user input in agent LLM prompt construction`), followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly (`fix/security-prompt-injection-agent-llm-calls`). - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage >= 97%. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator

freemo added this to the v3.6.0 milestone

2026-04-05 09:44:56 +00:00

freemo added a new dependency

2026-04-05 09:51:52 +00:00

#400 Epic: Post-MVP Security

freemo commented

2026-04-05 09:54:25 +00:00

Author

Owner

Issue triaged by project owner:

State: Verified — valid security concern with clear code evidence showing unsanitized user input in LLM prompt construction
Priority: Critical (unchanged) — security vulnerability. User input is directly concatenated into LLM prompts without sanitization in multiple agent graph modules.
Milestone: v3.6.0 (already set) — safety profiles and security hardening are v3.6.0 scope
MoSCoW: Should Have — while this is a security concern, prompt injection in LLM calls is a known industry-wide challenge. The existing PromptSanitizer in plan_generation.py shows awareness of the issue. Elevating to Should Have rather than Must Have because: (1) the system is not yet in production, (2) the attack surface requires a malicious user with direct CLI access, and (3) complete prompt injection prevention is an ongoing research problem. However, extending sanitization to all agent graphs is important before any production deployment.
Parent Epic: #400 (referenced in issue body)

Note: Downgrading from the bug hunter's "Critical" assessment to "Should Have" MoSCoW because this is a pre-production system where the attack surface is limited to users with direct CLI access. The priority remains Critical to ensure it's addressed before production.

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified — valid security concern with clear code evidence showing unsanitized user input in LLM prompt construction - **Priority**: Critical (unchanged) — security vulnerability. User input is directly concatenated into LLM prompts without sanitization in multiple agent graph modules. - **Milestone**: v3.6.0 (already set) — safety profiles and security hardening are v3.6.0 scope - **MoSCoW**: Should Have — while this is a security concern, prompt injection in LLM calls is a known industry-wide challenge. The existing `PromptSanitizer` in `plan_generation.py` shows awareness of the issue. Elevating to Should Have rather than Must Have because: (1) the system is not yet in production, (2) the attack surface requires a malicious user with direct CLI access, and (3) complete prompt injection prevention is an ongoing research problem. However, extending sanitization to all agent graphs is important before any production deployment. - **Parent Epic**: #400 (referenced in issue body) Note: Downgrading from the bug hunter's "Critical" assessment to "Should Have" MoSCoW because this is a pre-production system where the attack surface is limited to users with direct CLI access. The priority remains Critical to ensure it's addressed before production. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner

freemo referenced this issue

2026-04-06 07:57:10 +00:00

UAT: ReactiveStreamRouter._route_to_llm() applies prompt boundary markers (mechanism 2) but skips sanitize_user_input() (mechanism 1) — prompt injection mechanism 1 bypassed in reactive routing path #3965

freemo removed this from the v3.6.0 milestone

2026-04-06 23:59:53 +00:00