BUG-HUNT: [security] Potential for prompt injection in agent LLM calls #3319

Open
opened 2026-04-05 09:42:40 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/security-prompt-injection-agent-llm-calls
  • Commit Message: fix(security): sanitize user input in agent LLM prompt construction
  • Milestone: v3.6.0
  • Parent Epic: #400

Bug Report: [security] — Potential for prompt injection in agent LLM calls

Severity Assessment

  • Impact: A malicious user could inject prompts to cause the LLM to generate malicious code, reveal sensitive information, or perform other unintended actions.
  • Likelihood: High, as user input is directly concatenated with system prompts.
  • Priority: Critical

Location

  • File: src/cleveragents/agents/graphs/auto_debug.py, src/cleveragents/agents/graphs/context_analysis.py, src/cleveragents/agents/graphs/plan_generation.py
  • Function/Class: _analyze_error, _generate_fix, _validate_fix, _analyze_dependencies, _score_relevance, _summarize_context, _analyze_requirements, _generate_plan, _validate

Description

The agents construct prompts by concatenating user-provided input with system instructions. While plan_generation.py uses a PromptSanitizer, its effectiveness is not guaranteed, and the other agents do not appear to use any sanitization. This makes them vulnerable to prompt injection attacks.

Evidence

# Example from src/cleveragents/agents/graphs/auto_debug.py
HumanMessage(
    content=f'''Error Message:
{error_msg}

Code Context:
{code_ctx}

Analyze this error and provide insights.'''
),

Expected Behavior

User input should be properly sanitized or isolated to prevent prompt injection.

Actual Behavior

User input is directly included in the prompt, allowing for potential injection.

Suggested Fix

Implement robust input sanitization and consider using techniques like prompt templating with clear separation of user input and system instructions.

Category

security

Subtasks

  • Audit all agent graph files (auto_debug.py, context_analysis.py, plan_generation.py) for unsanitized user input in LLM prompt construction
  • Evaluate effectiveness of existing PromptSanitizer in plan_generation.py and identify gaps
  • Implement or extend PromptSanitizer to cover all identified injection vectors across all three agent files
  • Apply sanitization to all affected functions: _analyze_error, _generate_fix, _validate_fix, _analyze_dependencies, _score_relevance, _summarize_context, _analyze_requirements, _generate_plan, _validate
  • Add structural separation between system instructions and user-provided content (e.g., delimiters, templating)
  • Tests (Behave): Add BDD scenarios for prompt injection attempts and verify sanitization blocks them
  • Tests (Robot): Add integration tests verifying sanitized prompts reach the LLM correctly
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly (fix(security): sanitize user input in agent LLM prompt construction), followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly (fix/security-prompt-injection-agent-llm-calls).
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage >= 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/security-prompt-injection-agent-llm-calls` - **Commit Message**: `fix(security): sanitize user input in agent LLM prompt construction` - **Milestone**: v3.6.0 - **Parent Epic**: #400 ## Bug Report: [security] — Potential for prompt injection in agent LLM calls ### Severity Assessment - **Impact**: A malicious user could inject prompts to cause the LLM to generate malicious code, reveal sensitive information, or perform other unintended actions. - **Likelihood**: High, as user input is directly concatenated with system prompts. - **Priority**: Critical ### Location - **File**: `src/cleveragents/agents/graphs/auto_debug.py`, `src/cleveragents/agents/graphs/context_analysis.py`, `src/cleveragents/agents/graphs/plan_generation.py` - **Function/Class**: `_analyze_error`, `_generate_fix`, `_validate_fix`, `_analyze_dependencies`, `_score_relevance`, `_summarize_context`, `_analyze_requirements`, `_generate_plan`, `_validate` ### Description The agents construct prompts by concatenating user-provided input with system instructions. While `plan_generation.py` uses a `PromptSanitizer`, its effectiveness is not guaranteed, and the other agents do not appear to use any sanitization. This makes them vulnerable to prompt injection attacks. ### Evidence ```python # Example from src/cleveragents/agents/graphs/auto_debug.py HumanMessage( content=f'''Error Message: {error_msg} Code Context: {code_ctx} Analyze this error and provide insights.''' ), ``` ### Expected Behavior User input should be properly sanitized or isolated to prevent prompt injection. ### Actual Behavior User input is directly included in the prompt, allowing for potential injection. ### Suggested Fix Implement robust input sanitization and consider using techniques like prompt templating with clear separation of user input and system instructions. ### Category security ## Subtasks - [ ] Audit all agent graph files (`auto_debug.py`, `context_analysis.py`, `plan_generation.py`) for unsanitized user input in LLM prompt construction - [ ] Evaluate effectiveness of existing `PromptSanitizer` in `plan_generation.py` and identify gaps - [ ] Implement or extend `PromptSanitizer` to cover all identified injection vectors across all three agent files - [ ] Apply sanitization to all affected functions: `_analyze_error`, `_generate_fix`, `_validate_fix`, `_analyze_dependencies`, `_score_relevance`, `_summarize_context`, `_analyze_requirements`, `_generate_plan`, `_validate` - [ ] Add structural separation between system instructions and user-provided content (e.g., delimiters, templating) - [ ] Tests (Behave): Add BDD scenarios for prompt injection attempts and verify sanitization blocks them - [ ] Tests (Robot): Add integration tests verifying sanitized prompts reach the LLM correctly - [ ] Verify coverage >= 97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly (`fix(security): sanitize user input in agent LLM prompt construction`), followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly (`fix/security-prompt-injection-agent-llm-calls`). - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage >= 97%. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
freemo added this to the v3.6.0 milestone 2026-04-05 09:44:56 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified — valid security concern with clear code evidence showing unsanitized user input in LLM prompt construction
  • Priority: Critical (unchanged) — security vulnerability. User input is directly concatenated into LLM prompts without sanitization in multiple agent graph modules.
  • Milestone: v3.6.0 (already set) — safety profiles and security hardening are v3.6.0 scope
  • MoSCoW: Should Have — while this is a security concern, prompt injection in LLM calls is a known industry-wide challenge. The existing PromptSanitizer in plan_generation.py shows awareness of the issue. Elevating to Should Have rather than Must Have because: (1) the system is not yet in production, (2) the attack surface requires a malicious user with direct CLI access, and (3) complete prompt injection prevention is an ongoing research problem. However, extending sanitization to all agent graphs is important before any production deployment.
  • Parent Epic: #400 (referenced in issue body)

Note: Downgrading from the bug hunter's "Critical" assessment to "Should Have" MoSCoW because this is a pre-production system where the attack surface is limited to users with direct CLI access. The priority remains Critical to ensure it's addressed before production.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified — valid security concern with clear code evidence showing unsanitized user input in LLM prompt construction - **Priority**: Critical (unchanged) — security vulnerability. User input is directly concatenated into LLM prompts without sanitization in multiple agent graph modules. - **Milestone**: v3.6.0 (already set) — safety profiles and security hardening are v3.6.0 scope - **MoSCoW**: Should Have — while this is a security concern, prompt injection in LLM calls is a known industry-wide challenge. The existing `PromptSanitizer` in `plan_generation.py` shows awareness of the issue. Elevating to Should Have rather than Must Have because: (1) the system is not yet in production, (2) the attack surface requires a malicious user with direct CLI access, and (3) complete prompt injection prevention is an ongoing research problem. However, extending sanitization to all agent graphs is important before any production deployment. - **Parent Epic**: #400 (referenced in issue body) Note: Downgrading from the bug hunter's "Critical" assessment to "Should Have" MoSCoW because this is a pre-production system where the attack surface is limited to users with direct CLI access. The priority remains Critical to ensure it's addressed before production. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner
freemo removed this from the v3.6.0 milestone 2026-04-06 23:59:53 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#400 Epic: Post-MVP Security
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3319
No description provided.