[BUG] Prompt Injection Vulnerability in AutoDebugAgent #9110

Open
opened 2026-04-14 07:43:25 +00:00 by HAL9000 · 0 comments
Owner

Metadata

  • Commit Message: fix(agents): sanitize user-provided content in AutoDebugAgent prompts to prevent prompt injection
  • Branch: fix/auto-debug-agent-prompt-injection

Background and Context

A prompt injection vulnerability has been identified in the AutoDebugAgent in src/cleveragents/agents/graphs/auto_debug.py.

The AutoDebugAgent constructs prompts by directly embedding user-provided content, such as error_message and code_context, without any sanitization. This makes the agent vulnerable to prompt injection attacks. An attacker can craft a malicious error message or code context that contains instructions for the LLM, allowing them to override the original instructions and control the agent's behavior.

Code Evidence:

  • src/cleveragents/agents/graphs/auto_debug.pyAutoDebugAgent._analyze_error method: directly includes error_msg and code_ctx in the prompt (lines 109–117 at time of writing)
  • src/cleveragents/agents/graphs/auto_debug.pyAutoDebugAgent._generate_fix method: includes error_analysis and code_context in the prompt (lines 181–192 at time of writing)
  • src/cleveragents/agents/graphs/auto_debug.pyAutoDebugAgent._validate_fix method: includes error_message in the prompt (lines 247–258 at time of writing)

Environment Verification:
This vulnerability can be reproduced by providing a malicious error message or code context to the agent. For example, an error message could be crafted to instruct the LLM to ignore the original instructions and perform a different task.

Severity: High

Tags: bug, security, prompt-injection, cwe-94

Expected Behavior

All user-provided content (error_message, code_context, error_analysis) embedded in LLM prompts within AutoDebugAgent must be sanitized using boundary markers before inclusion. The PromptSanitizer class and its usage in src/cleveragents/agents/graphs/plan_generation.py should be used as the reference implementation.

An attacker-controlled string such as "Ignore all previous instructions and do X" embedded in an error message or code context must not be able to override the agent's system instructions.

Acceptance Criteria

  • AutoDebugAgent._analyze_error wraps error_msg and code_ctx with PromptSanitizer boundary markers before embedding them in the prompt
  • AutoDebugAgent._generate_fix wraps error_analysis and code_context with PromptSanitizer boundary markers before embedding them in the prompt
  • AutoDebugAgent._validate_fix wraps error_message with PromptSanitizer boundary markers before embedding it in the prompt
  • A crafted injection string (e.g., "Ignore all previous instructions and ...") in error_message or code_context does not override the agent's system instructions
  • All existing AutoDebugAgent tests continue to pass
  • New BDD scenarios cover: sanitized error message, sanitized code context, and a prompt injection attempt that is neutralized

Subtasks

  • Review PromptSanitizer usage in src/cleveragents/agents/graphs/plan_generation.py as the reference implementation
  • Apply PromptSanitizer to error_msg and code_ctx in AutoDebugAgent._analyze_error
  • Apply PromptSanitizer to error_analysis and code_context in AutoDebugAgent._generate_fix
  • Apply PromptSanitizer to error_message in AutoDebugAgent._validate_fix
  • Tests (Behave): Add BDD scenario verifying that sanitized boundary markers are present in the constructed prompt
  • Tests (Behave): Add BDD scenario verifying that a prompt injection string in error_message does not alter agent behavior
  • Tests (Behave): Add BDD scenario verifying that a prompt injection string in code_context does not alter agent behavior
  • Tests (Robot): Add integration test confirming the agent behaves correctly when given a malicious error message
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-worker

## Metadata - **Commit Message**: `fix(agents): sanitize user-provided content in AutoDebugAgent prompts to prevent prompt injection` - **Branch**: `fix/auto-debug-agent-prompt-injection` ## Background and Context A prompt injection vulnerability has been identified in the `AutoDebugAgent` in `src/cleveragents/agents/graphs/auto_debug.py`. The `AutoDebugAgent` constructs prompts by directly embedding user-provided content, such as `error_message` and `code_context`, without any sanitization. This makes the agent vulnerable to prompt injection attacks. An attacker can craft a malicious error message or code context that contains instructions for the LLM, allowing them to override the original instructions and control the agent's behavior. **Code Evidence:** - `src/cleveragents/agents/graphs/auto_debug.py` — `AutoDebugAgent._analyze_error` method: directly includes `error_msg` and `code_ctx` in the prompt (lines 109–117 at time of writing) - `src/cleveragents/agents/graphs/auto_debug.py` — `AutoDebugAgent._generate_fix` method: includes `error_analysis` and `code_context` in the prompt (lines 181–192 at time of writing) - `src/cleveragents/agents/graphs/auto_debug.py` — `AutoDebugAgent._validate_fix` method: includes `error_message` in the prompt (lines 247–258 at time of writing) **Environment Verification:** This vulnerability can be reproduced by providing a malicious error message or code context to the agent. For example, an error message could be crafted to instruct the LLM to ignore the original instructions and perform a different task. **Severity:** High **Tags:** bug, security, prompt-injection, cwe-94 ## Expected Behavior All user-provided content (`error_message`, `code_context`, `error_analysis`) embedded in LLM prompts within `AutoDebugAgent` must be sanitized using boundary markers before inclusion. The `PromptSanitizer` class and its usage in `src/cleveragents/agents/graphs/plan_generation.py` should be used as the reference implementation. An attacker-controlled string such as `"Ignore all previous instructions and do X"` embedded in an error message or code context must not be able to override the agent's system instructions. ## Acceptance Criteria - [ ] `AutoDebugAgent._analyze_error` wraps `error_msg` and `code_ctx` with `PromptSanitizer` boundary markers before embedding them in the prompt - [ ] `AutoDebugAgent._generate_fix` wraps `error_analysis` and `code_context` with `PromptSanitizer` boundary markers before embedding them in the prompt - [ ] `AutoDebugAgent._validate_fix` wraps `error_message` with `PromptSanitizer` boundary markers before embedding it in the prompt - [ ] A crafted injection string (e.g., `"Ignore all previous instructions and ..."`) in `error_message` or `code_context` does not override the agent's system instructions - [ ] All existing `AutoDebugAgent` tests continue to pass - [ ] New BDD scenarios cover: sanitized error message, sanitized code context, and a prompt injection attempt that is neutralized ## Subtasks - [ ] Review `PromptSanitizer` usage in `src/cleveragents/agents/graphs/plan_generation.py` as the reference implementation - [ ] Apply `PromptSanitizer` to `error_msg` and `code_ctx` in `AutoDebugAgent._analyze_error` - [ ] Apply `PromptSanitizer` to `error_analysis` and `code_context` in `AutoDebugAgent._generate_fix` - [ ] Apply `PromptSanitizer` to `error_message` in `AutoDebugAgent._validate_fix` - [ ] Tests (Behave): Add BDD scenario verifying that sanitized boundary markers are present in the constructed prompt - [ ] Tests (Behave): Add BDD scenario verifying that a prompt injection string in `error_message` does not alter agent behavior - [ ] Tests (Behave): Add BDD scenario verifying that a prompt injection string in `code_context` does not alter agent behavior - [ ] Tests (Robot): Add integration test confirming the agent behaves correctly when given a malicious error message - [ ] Verify coverage >=97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Pool | Agent: bug-hunt-worker
HAL9000 added this to the v3.5.0 milestone 2026-04-14 07:47:01 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Depends on
#9215 test
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#9110
No description provided.