BUG: [security] Potential prompt injection vulnerability in agent graphs #3236

Open
opened 2026-04-05 08:13:08 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/security-prompt-injection-agent-graphs
  • Commit Message: fix(agents): sanitize prompt construction to prevent prompt injection in agent graphs
  • Milestone: (none — backlog)
  • Parent Epic: #362

Background and Context

During autonomous bug hunting, a potential prompt injection vulnerability was identified in the agent graph prompt construction code. The PlanGenerationGraph and AutoDebugAgent classes use Python f-strings to directly concatenate user-provided data into LLM prompt strings. This pattern is a well-known security anti-pattern for LLM-based systems: an attacker who controls any portion of the interpolated input can craft a payload that overrides system instructions, exfiltrates context, or causes the LLM to execute unintended commands.

This is particularly relevant to CleverAgents' architecture because actors and agent graphs are the core execution units of the plan lifecycle (Strategize / Execute / Apply phases). A compromised prompt in PlanGenerationGraph._create_prompts or AutoDebugAgent._analyze_error / _generate_fix / _validate_fix could subvert plan decisions, invariant enforcement, or sandbox safety controls.

Current Behavior

User-controlled input (e.g., error messages, code context, user-supplied plan arguments) is directly interpolated into prompt strings using f-strings:

# src/cleveragents/agents/graphs/plan_generation.py:L217
f"Request: {_bs}\n{{prompt}}\n{_be}\n"
# src/cleveragents/agents/graphs/auto_debug.py:L111
f"""Error Message:
{error_msg}

Code Context:
{code_ctx}

Analyze this error and provide insights."""

There is no separation between system instructions and user-provided data, and no escaping or sanitization of the interpolated values.

Affected locations:

  • src/cleveragents/agents/graphs/plan_generation.pyPlanGenerationGraph._create_prompts (lines 209–252)
  • src/cleveragents/agents/graphs/auto_debug.pyAutoDebugAgent._analyze_error (lines 98–118), AutoDebugAgent._generate_fix (lines 164–193), AutoDebugAgent._validate_fix (lines 231–259)

Expected Behavior

User-provided data must be clearly separated from system instructions in all LLM prompt construction. The recommended approach is to use a structured multi-turn message format (e.g., a list of role-keyed message dicts: [{"role": "system", "content": "..."}, {"role": "user", "content": user_data}]) so that the LLM runtime can enforce role boundaries. This prevents user content from being interpreted as system instructions regardless of its content.

Acceptance Criteria

  • All f-string prompt constructions that interpolate user-controlled data in plan_generation.py and auto_debug.py are refactored to use structured message lists with explicit role separation.
  • No user-provided value is concatenated directly into a system-role prompt string.
  • The refactored prompt construction is covered by BDD scenarios that verify role separation is maintained even when the user input contains adversarial content (e.g., "Ignore previous instructions and...").
  • All existing tests continue to pass after the refactor.
  • Static analysis (linting, type checking, security scanning) passes with no new violations.

Supporting Information

  • Severity: High (attacker can manipulate LLM behavior; depends on user input reaching these code paths)
  • Likelihood: Medium (requires attacker-controlled input to reach the affected functions)
  • CWE: CWE-77 (Improper Neutralization of Special Elements used in a Command)
  • Related pattern: Prompt injection is analogous to SQL injection — the fix is the same: separate data from instructions using structured formats rather than string concatenation.
  • Discovered by: ca-bug-hunter during autonomous bug hunting on the codebase.

Backlog note: This issue was discovered during autonomous operation
on milestone v3.5.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Subtasks

  • Audit all f-string prompt constructions in plan_generation.py (PlanGenerationGraph._create_prompts, lines 209–252) and refactor to structured message lists
  • Audit all f-string prompt constructions in auto_debug.py (AutoDebugAgent._analyze_error, _generate_fix, _validate_fix) and refactor to structured message lists
  • Verify no other agent graph files use the same unsafe pattern (extend fix if found)
  • Tests (Behave): Add BDD scenarios for prompt construction with adversarial user input to verify role separation
  • Tests (Robot): Add integration test verifying the LLM receives correctly structured messages
  • Verify coverage ≥ 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
  • All nox stages pass.
  • Coverage ≥ 97%.

Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: ca-new-issue-creator

## Metadata - **Branch**: `fix/security-prompt-injection-agent-graphs` - **Commit Message**: `fix(agents): sanitize prompt construction to prevent prompt injection in agent graphs` - **Milestone**: *(none — backlog)* - **Parent Epic**: #362 ## Background and Context During autonomous bug hunting, a potential prompt injection vulnerability was identified in the agent graph prompt construction code. The `PlanGenerationGraph` and `AutoDebugAgent` classes use Python f-strings to directly concatenate user-provided data into LLM prompt strings. This pattern is a well-known security anti-pattern for LLM-based systems: an attacker who controls any portion of the interpolated input can craft a payload that overrides system instructions, exfiltrates context, or causes the LLM to execute unintended commands. This is particularly relevant to CleverAgents' architecture because actors and agent graphs are the core execution units of the plan lifecycle (Strategize / Execute / Apply phases). A compromised prompt in `PlanGenerationGraph._create_prompts` or `AutoDebugAgent._analyze_error` / `_generate_fix` / `_validate_fix` could subvert plan decisions, invariant enforcement, or sandbox safety controls. ## Current Behavior User-controlled input (e.g., error messages, code context, user-supplied plan arguments) is directly interpolated into prompt strings using f-strings: ```python # src/cleveragents/agents/graphs/plan_generation.py:L217 f"Request: {_bs}\n{{prompt}}\n{_be}\n" ``` ```python # src/cleveragents/agents/graphs/auto_debug.py:L111 f"""Error Message: {error_msg} Code Context: {code_ctx} Analyze this error and provide insights.""" ``` There is no separation between system instructions and user-provided data, and no escaping or sanitization of the interpolated values. **Affected locations:** - `src/cleveragents/agents/graphs/plan_generation.py` — `PlanGenerationGraph._create_prompts` (lines 209–252) - `src/cleveragents/agents/graphs/auto_debug.py` — `AutoDebugAgent._analyze_error` (lines 98–118), `AutoDebugAgent._generate_fix` (lines 164–193), `AutoDebugAgent._validate_fix` (lines 231–259) ## Expected Behavior User-provided data must be clearly separated from system instructions in all LLM prompt construction. The recommended approach is to use a structured multi-turn message format (e.g., a list of role-keyed message dicts: `[{"role": "system", "content": "..."}, {"role": "user", "content": user_data}]`) so that the LLM runtime can enforce role boundaries. This prevents user content from being interpreted as system instructions regardless of its content. ## Acceptance Criteria - [ ] All f-string prompt constructions that interpolate user-controlled data in `plan_generation.py` and `auto_debug.py` are refactored to use structured message lists with explicit role separation. - [ ] No user-provided value is concatenated directly into a system-role prompt string. - [ ] The refactored prompt construction is covered by BDD scenarios that verify role separation is maintained even when the user input contains adversarial content (e.g., `"Ignore previous instructions and..."`). - [ ] All existing tests continue to pass after the refactor. - [ ] Static analysis (linting, type checking, security scanning) passes with no new violations. ## Supporting Information - **Severity**: High (attacker can manipulate LLM behavior; depends on user input reaching these code paths) - **Likelihood**: Medium (requires attacker-controlled input to reach the affected functions) - **CWE**: CWE-77 (Improper Neutralization of Special Elements used in a Command) - **Related pattern**: Prompt injection is analogous to SQL injection — the fix is the same: separate data from instructions using structured formats rather than string concatenation. - Discovered by: `ca-bug-hunter` during autonomous bug hunting on the codebase. > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.5.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. ## Subtasks - [ ] Audit all f-string prompt constructions in `plan_generation.py` (`PlanGenerationGraph._create_prompts`, lines 209–252) and refactor to structured message lists - [ ] Audit all f-string prompt constructions in `auto_debug.py` (`AutoDebugAgent._analyze_error`, `_generate_fix`, `_validate_fix`) and refactor to structured message lists - [ ] Verify no other agent graph files use the same unsafe pattern (extend fix if found) - [ ] Tests (Behave): Add BDD scenarios for prompt construction with adversarial user input to verify role separation - [ ] Tests (Robot): Add integration test verifying the LLM receives correctly structured messages - [ ] Verify coverage ≥ 97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage ≥ 97%. --- **Automated by CleverAgents Bot** Supervisor: Bug Hunting | Agent: ca-new-issue-creator
freemo added this to the v3.6.0 milestone 2026-04-05 08:26:27 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: High (elevated from Backlog — prompt injection is a serious security concern)
  • Milestone: v3.6.0 (Post-MVP Security scope)
  • MoSCoW: Should Have — prompt injection vulnerabilities in agent graphs can subvert plan decisions and safety controls. While the likelihood depends on attacker-controlled input reaching these code paths, the impact is high. This should be addressed before production use.
  • Parent Epic: #362 (Security & Safety Hardening) / #400 (Post-MVP Security)

This is the second security issue found in the agent graphs (alongside #3089 path traversal). Both should be prioritized together as part of a security hardening pass.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

Issue triaged by project owner: - **State**: Verified - **Priority**: High (elevated from Backlog — prompt injection is a serious security concern) - **Milestone**: v3.6.0 (Post-MVP Security scope) - **MoSCoW**: Should Have — prompt injection vulnerabilities in agent graphs can subvert plan decisions and safety controls. While the likelihood depends on attacker-controlled input reaching these code paths, the impact is high. This should be addressed before production use. - **Parent Epic**: #362 (Security & Safety Hardening) / #400 (Post-MVP Security) This is the second security issue found in the agent graphs (alongside #3089 path traversal). Both should be prioritized together as part of a security hardening pass. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: ca-project-owner
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Blocks
#362 Epic: Security & Safety Hardening
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3236
No description provided.