UAT: CopyOnWriteSandbox does not skip .git, node_modules, __pycache__ — spec-required protected directories are copied into and committed from sandbox #6649

Open
opened 2026-04-09 22:45:19 +00:00 by HAL9000 · 0 comments
Owner

Summary

The spec requires that sandbox file-writing skip protected directories (.git, node_modules, __pycache__). The CopyOnWriteSandbox implementation does not provide this protection: it copies every directory, including .git, node_modules, and __pycache__, into the sandbox, and commits all changes (including changes inside these directories) back to the original.

Spec Reference

From the Security & Safety specification:

Protected directory skipping: .git, node_modules, __pycache__ not written to

Expected Behavior

When the CopyOnWriteSandbox creates a sandbox copy and commits changes back:

  1. .git directory should NOT be copied into the sandbox (to prevent git history corruption)
  2. node_modules and __pycache__ should NOT be copied (performance + security)
  3. Changes to these directories in the sandbox should NOT be committed back to the original
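The third expectation could be enforced with a path check applied to every entry in the diff before it is written back. The following is a minimal sketch, not the project's implementation: `PROTECTED_DIRS` and `is_protected` are hypothetical names, and it assumes diff entries are paths relative to the sandbox root.

```python
from pathlib import PurePath

# Hypothetical constant; the names come from the spec line quoted above.
PROTECTED_DIRS = frozenset({".git", "node_modules", "__pycache__"})


def is_protected(rel_path: str) -> bool:
    """Return True if any component of rel_path is a protected name.

    Note: this also flags a regular file whose name is exactly one of the
    protected names (for example a `.git` file in a submodule checkout).
    """
    return any(part in PROTECTED_DIRS for part in PurePath(rel_path).parts)


# Example: drop protected entries from a set of changed paths.
changed = {"src/main.py", ".git/config", "pkg/__pycache__/mod.pyc"}
safe_to_commit = {p for p in changed if not is_protected(p)}
```

Matching on whole path components, rather than substrings, keeps files such as `.gitignore` eligible for commit.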

Actual Behavior

create() in copy_on_write.py (line ~100):

shutil.copytree(
    self._original_path,
    self._sandbox_path,
    symlinks=True,
    dirs_exist_ok=False,
)

This copies ALL directories including .git, node_modules, __pycache__ into the sandbox.

compute_diff() in _fs_utils.py (line ~130):

sandbox_files: set[str] = {
    os.path.relpath(os.path.join(dp, f), sandbox_dir)
    for dp, _, fnames in os.walk(sandbox_dir)
    for f in fnames
}

os.walk() traverses ALL directories with no exclusions, so changes to .git, node_modules, __pycache__ are included in the diff and committed back.

Code Locations

  • src/cleveragents/infrastructure/sandbox/copy_on_write.py: create() method uses shutil.copytree() with no exclusions
  • src/cleveragents/infrastructure/sandbox/_fs_utils.py: compute_diff() uses os.walk() with no exclusions
  • src/cleveragents/infrastructure/sandbox/_fs_utils.py: backup_directory() uses os.walk() with no exclusions

Security Impact

Copying .git into the sandbox and then committing changes back could:

  1. Corrupt git history if the agent modifies .git/ contents in the sandbox
  2. Expose git credentials stored in .git/config to the sandbox environment
  3. Cause unexpected behavior when the sandbox is committed back (git objects may be corrupted)

Steps to Reproduce

  1. Create a CopyOnWriteSandbox for a git repository
  2. Call create(plan_id) — observe that .git is copied into the sandbox
  3. Modify a file in the sandbox
  4. Call commit() — observe that .git changes (if any) are synced back
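The copy step in isolation can be checked without the sandbox class. The sketch below reproduces only the shutil.copytree() call quoted under Actual Behavior, against a throwaway directory; the helper name `sandbox_entries_after_copy` and the file layout are made up for illustration, and the CopyOnWriteSandbox constructor is deliberately not used because its signature is not shown in this report.

```python
import os
import shutil
import tempfile


def sandbox_entries_after_copy() -> list[str]:
    """Copy a fake repository the way create() does and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "original")
        sandbox = os.path.join(tmp, "sandbox")

        # Build a minimal stand-in for a git repository.
        os.makedirs(os.path.join(original, ".git"))
        with open(os.path.join(original, ".git", "config"), "w") as fh:
            fh.write("[core]\n")
        with open(os.path.join(original, "main.py"), "w") as fh:
            fh.write("print('hello')\n")

        # Same arguments as the call quoted from copy_on_write.py.
        shutil.copytree(
            original,
            sandbox,
            symlinks=True,
            dirs_exist_ok=False,
        )
        return sorted(os.listdir(sandbox))


entries = sandbox_entries_after_copy()
print(entries)
```

With no `ignore` argument, the listing includes `.git` alongside `main.py`.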

Fix Suggestion

Add an ignore parameter to shutil.copytree() in create():

shutil.copytree(
    self._original_path,
    self._sandbox_path,
    symlinks=True,
    dirs_exist_ok=False,
    ignore=shutil.ignore_patterns('.git', 'node_modules', '__pycache__'),
)

Also filter os.walk() in compute_diff() and backup_directory() so that it does not descend into these directories.
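One way to filter os.walk() is to prune the directory list in place during a top-down walk, which stops the walk from descending into the pruned directories. This is a sketch under assumptions: `PROTECTED_DIRS` and `walk_files_skipping_protected` are hypothetical names, and the real compute_diff() and backup_directory() may need a different shape.

```python
import os

# Hypothetical constant shared by create(), compute_diff(), backup_directory().
PROTECTED_DIRS = frozenset({".git", "node_modules", "__pycache__"})


def walk_files_skipping_protected(root: str) -> set[str]:
    """Return file paths relative to root, skipping protected directories."""
    found: set[str] = set()
    for dirpath, dirnames, filenames in os.walk(root):
        # Slice assignment mutates the list os.walk iterates next, so the
        # protected directories are never entered (top-down walk only).
        dirnames[:] = [d for d in dirnames if d not in PROTECTED_DIRS]
        for name in filenames:
            found.add(os.path.relpath(os.path.join(dirpath, name), root))
    return found
```

The set comprehension quoted from compute_diff() discards the directory list (`_`), so it would need to become an explicit loop like this one, or call a shared helper, for the pruning to take effect.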


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.2.0 milestone 2026-04-09 23:04:01 +00:00
Reference
cleveragents/cleveragents-core#6649