Critical Security Vulnerability: Arbitrary Code Execution in Tool and Validation Models #8248

Open
opened 2026-04-13 06:21:29 +00:00 by HAL9000 · 2 comments
Owner

Metadata

  • Commit Message: fix(security): sandbox user-provided tool and validation code
  • Branch Name: bugfix/sandbox-user-code

Background and Context

The Tool and Validation models in src/cleveragents/domain/models/core/tool.py allow users to define custom tools and validations with inline Python code. The code field in the Tool model and the transform field in the Validation model are strings that contain Python code.

Neither field is executed in a proper sandbox, which results in a critical arbitrary code execution vulnerability:

  • The code field in the Tool model is executed in a subprocess with a timeout, as seen in src/cleveragents/skills/inline_executor.py. While this provides some isolation, it does not prevent the code from accessing the filesystem, network, or other system resources with the same permissions as the user running the application.
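  • The transform field in the Validation model is executed in a so-called "sandboxed context", as seen in src/cleveragents/tool/wrapping.py. However, this "sandbox" only restricts the set of available built-in names. Restricting builtins alone is generally not a security boundary in Python, because code can still reach other objects through attribute introspection, so the restriction can be easily bypassed.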
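
To illustrate the second point, here is a minimal sketch of why a restricted-builtins context is a weak barrier. It is not the code from wrapping.py; the helper names and the single allowed builtin are assumptions made for the example. It only shows that a blocked name (`open`) does not stop an expression from enumerating every loaded class through attribute access.

```python
from typing import Any


def eval_with_restricted_builtins(expression: str) -> Any:
    """Evaluate an expression with only `len` exposed as a builtin.

    Hypothetical stand-in for a "restricted builtins" context; the real
    wrapping.py implementation may differ.
    """
    restricted_globals = {"__builtins__": {"len": len}}
    return eval(expression, restricted_globals)


def is_name_blocked(expression: str) -> bool:
    """Return True if evaluating the expression fails with NameError."""
    try:
        eval_with_restricted_builtins(expression)
    except NameError:
        return True
    return False


# Looking up a removed builtin by name fails as intended.
open_blocked = is_name_blocked("open")

# Attribute access needs no builtins, so the expression can still walk from
# a tuple literal to `object` and list every subclass currently loaded.
reachable = eval_with_restricted_builtins("().__class__.__base__.__subclasses__()")
reachable_count = len(reachable)
```

Because arbitrary classes remain reachable this way, any audit of wrapping.py should treat name filtering as defence in depth at most, not as isolation.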

Steps to Reproduce

  1. Create a new tool with a malicious code field, for example:
    name: local/malicious-tool
    description: A malicious tool
    source: custom
    code: |
      import os
      os.system('echo "malicious code executed" > /tmp/pwned')
    
  2. Run the tool.
  3. Observe that the file /tmp/pwned is created.

Expected Behavior

All user-provided code should be executed in a properly sandboxed environment that prevents it from accessing sensitive resources or performing malicious actions. This could be achieved using technologies like Docker containers, gVisor, or a more robust sandboxing library.

The sandbox must:

  • Prevent access to the filesystem (beyond a designated working directory)
  • Prevent access to the network
  • Prevent access to other sensitive system resources
  • Enforce a strict timeout to prevent denial-of-service attacks
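
If Docker were the chosen technology, the requirements above might map onto `docker run` options roughly as follows. This is a sketch under stated assumptions, not a proposed implementation: the function name, the `python:3.12-slim` image, the `/work` mount point, the `tool.py` script name, and the resource limits are all illustrative. The function only builds the argument list and does not start a container.

```python
from pathlib import Path
from typing import List


def build_docker_argv(
    workdir: Path,
    script_name: str = "tool.py",      # hypothetical file holding the user code
    image: str = "python:3.12-slim",   # assumed base image
    memory: str = "256m",              # illustrative limit
    pids_limit: int = 64,              # illustrative limit
) -> List[str]:
    """Build a `docker run` command line for one untrusted script."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                      # no network access
        "--read-only",                            # read-only root filesystem
        "--cap-drop", "ALL",                      # drop all Linux capabilities
        "--security-opt", "no-new-privileges",    # block privilege escalation
        "--memory", memory,                       # cap memory usage
        "--pids-limit", str(pids_limit),          # cap process count
        "--user", "65534:65534",                  # run as an unprivileged uid:gid
        "--volume", f"{workdir.resolve()}:/work:rw",  # only writable path
        "--workdir", "/work",
        image,
        "python", "-I", script_name,
    ]


argv = build_docker_argv(Path("/tmp/job"))
```

Note that `docker run` has no wall-clock timeout of its own, so the timeout requirement would still have to be enforced by the caller, for example by running the command with a timeout and killing the container when it expires.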

Acceptance Criteria

  • All user-provided code in the Tool.code field is executed in a properly sandboxed environment.
  • All user-provided code in the Validation.transform field is executed in a properly sandboxed environment.
  • The sandbox prevents access to the filesystem (outside a designated working directory), network, and other sensitive resources.
  • The sandbox enforces a strict timeout to prevent denial-of-service attacks.
  • The existing unsandboxed subprocess execution path in inline_executor.py is replaced or hardened.
  • The restricted-builtins mechanism in wrapping.py, which can be bypassed, is replaced with a robust sandbox.
  • The implementation of the sandbox is well-documented.
  • Unit and integration tests cover the sandboxing behaviour, including attempted bypasses.
  • Test coverage remains ≥ 97%.
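
For the "replaced or hardened" and timeout criteria, the following sketch shows what a hardened subprocess call could look like as an interim layer. It is not the current inline_executor.py code; `run_untrusted` and `RunResult` are hypothetical names. It runs the interpreter in isolated mode with an empty environment and a throwaway working directory, and enforces a wall-clock timeout. On its own it does not block filesystem or network access, so it would not satisfy the isolation criteria without an OS-level sandbox around it.

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunResult:
    stdout: str
    returncode: Optional[int]  # None when the run was killed on timeout
    timed_out: bool


def run_untrusted(code: str, timeout_s: float = 5.0) -> RunResult:
    """Run code in a child interpreter with a strict wall-clock timeout."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode (ignores PYTHON* variables and user site-packages)
                [sys.executable, "-I", "-c", code],
                cwd=workdir,          # throwaway working directory
                env={},               # do not leak the parent's environment
                capture_output=True,
                text=True,
                timeout=timeout_s,    # child is killed when this expires
            )
        except subprocess.TimeoutExpired:
            return RunResult(stdout="", returncode=None, timed_out=True)
    return RunResult(stdout=proc.stdout, returncode=proc.returncode, timed_out=False)


ok = run_untrusted("print(1 + 1)")
hung = run_untrusted("while True: pass", timeout_s=0.5)
```

A bypass test suite could build on the same shape: run a snippet that attempts a forbidden action and assert on the result object rather than on side effects on the host.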

Subtasks

  • Audit src/cleveragents/skills/inline_executor.py and document all current sandbox escape vectors.
  • Audit src/cleveragents/tool/wrapping.py and document all current sandbox escape vectors.
  • Research and select an appropriate sandboxing technology (e.g., Docker, gVisor, RestrictedPython, seccomp).
  • Design the sandboxed execution interface and document it (ADR or inline docstring).
  • Implement sandboxed execution for Tool.code (replace/harden inline_executor.py).
  • Implement sandboxed execution for Validation.transform (replace/harden wrapping.py).
  • Add timeout enforcement to both sandboxed execution paths.
  • Write unit tests for the new sandbox, including attempted bypass scenarios.
  • Write integration tests verifying end-to-end tool and validation execution within the sandbox.
  • Update documentation and inline comments to describe the sandboxing approach.
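
As a starting point for the interface design subtask, one possible shape is a small protocol that both Tool.code and Validation.transform execution could depend on, so the backend (container, gVisor, or other) can be swapped without touching callers. Every name below is hypothetical and nothing here exists in the codebase; the rejecting executor is a test double, not a sandbox.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Protocol


@dataclass(frozen=True)
class SandboxPolicy:
    working_dir: Path            # the only directory the code may write to
    timeout_s: float = 5.0       # strict wall-clock limit
    allow_network: bool = False  # network is denied by default


@dataclass(frozen=True)
class SandboxResult:
    stdout: str
    stderr: str
    exit_code: Optional[int]     # None if the code never ran or was killed
    timed_out: bool


class SandboxExecutor(Protocol):
    """Anything that can run user code under a policy."""

    def execute(self, code: str, policy: SandboxPolicy) -> SandboxResult:
        ...


class RejectingExecutor:
    """Test double that refuses to run anything."""

    def execute(self, code: str, policy: SandboxPolicy) -> SandboxResult:
        return SandboxResult(
            stdout="", stderr="execution disabled", exit_code=None, timed_out=False
        )


def run_tool_code(
    executor: SandboxExecutor, code: str, policy: SandboxPolicy
) -> SandboxResult:
    """Single entry point; rejects policies that would open the network."""
    if policy.allow_network:
        raise ValueError("network access is not permitted for user code")
    return executor.execute(code, policy)


result = run_tool_code(RejectingExecutor(), "print(1)", SandboxPolicy(Path("/tmp/job")))
```

Keeping the policy as an explicit value object would also give the ADR a concrete list of knobs to justify.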

Definition of Done

This issue should be closed when:

  1. All acceptance criteria above are met and verified by passing tests.
  2. No known sandbox escape vectors remain in inline_executor.py or wrapping.py.
  3. The chosen sandboxing approach is documented (ADR or equivalent).
  4. CI passes with test coverage ≥ 97%.
  5. A security-focused code review has been completed and approved.

Automated by CleverAgents Bot
Agent: new-issue-creator

Author
Owner

[AUTO-EPIC] Epic Linkage Assessment

This is a critical security vulnerability in the Tool and Validation model sandboxing. It is a cross-cutting security concern that affects multiple milestones.

Assessment: This issue is a foundational security fix that should be addressed before any milestone that uses Tool or Validation execution. It is appropriate to keep it in the backlog (no milestone) for urgent prioritization.

Recommended Epic: This issue may warrant its own Epic given its scope (requires ADR, sandboxing technology selection, and implementation across multiple modules). Consider creating a dedicated "Security Hardening" Epic if multiple security issues accumulate.

Note: This issue has Priority/Critical and State/Verified — it should be worked on immediately.


Automated by CleverAgents Bot
Supervisor: Epic Planning | Agent: epic-planning-pool-supervisor

HAL9000 added this to the v3.5.0 milestone 2026-04-13 06:45:52 +00:00
Author
Owner

🔒 Milestone Assigned: v3.5.0 — Arbitrary code execution in Tool and Validation models is a critical security vulnerability. Assigning to v3.5.0 (Autonomy Hardening) as this is a security hardening requirement that must be resolved before autonomous operation. The sandbox implementation requires careful design — an ADR should be created before implementation begins.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#8248