BUG-HUNT: [security] _git_ls_files constructs absolute paths without validating they remain inside resource_location — path traversal possible with malformed git output #6448

Open
opened 2026-04-09 21:04:14 +00:00 by HAL9000 · 1 comment
Owner

Bug Report: [security] _git_ls_files missing path boundary check — path traversal via malicious git output

Severity Assessment

  • Impact: If git ls-files returns a path containing ../ sequences, os.path.join(resource_location, rel_path) yields a path that points to a file outside the resource directory. The hydrator then reads that file and indexes its contents as LLM context. An attacker who can influence git configuration (e.g., via a malicious .gitmodules, a compromised git submodule config, or a crafted worktree setup) could cause the system to index /etc/passwd, SSH private keys, or other host-level secrets.
  • Likelihood: Low (requires unusual git configuration or a compromised repository), but the impact is critical if triggered (host-level secret exposure to LLM).
  • Priority: High

Location

  • File: src/cleveragents/application/services/context_tier_hydrator.py
  • Function: hydrate_tiers_from_project (caller) and _git_ls_files (path provider)
  • Lines: 117–130 (boundary check missing in caller) and 264–276 (_git_ls_files returns unvalidated paths)

Description

_git_ls_files returns paths from git ls-files --cached --others --exclude-standard without validating that they stay within the root directory:

def _git_ls_files(root: str) -> list[str] | None:
    try:
        result = subprocess.run(
            ["git", "ls-files", "--cached", "--others", "--exclude-standard"],
            cwd=root,
            ...
        )
        ...
        files = []
        for line in result.stdout.strip().split("\n"):
            line = line.strip()
            if not line:
                continue
            ext = os.path.splitext(line)[1].lower()
            if ext in _BINARY_EXTS:
                continue
            files.append(line)   # ← no path boundary check
        return files

The caller then builds an absolute path with no verification:

for rel_path in files:
    abs_path = os.path.join(resource_location, rel_path)  # ← no realpath/boundary check
    ...
    content = Path(abs_path).read_text(encoding="utf-8")  # ← reads the file

Demonstration:

import os
resource_location = '/tmp/safe_resource'
rel_path = '../../../etc/passwd'           # hypothetical malicious git output
abs_path = os.path.join(resource_location, rel_path)
# abs_path = '/tmp/safe_resource/../../../etc/passwd'
resolved = os.path.realpath(abs_path)
# resolved = '/etc/passwd'  ← OUTSIDE safe_resource
print(resolved.startswith(os.path.realpath(resource_location)))  # False
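A related case that the same boundary check would also cover, sketched here under the assumption that a line in the file list could be an absolute path (the report does not demonstrate this, and the value below is hypothetical): os.path.join discards every earlier component once it meets an absolute one, so no ../ sequence is needed to leave the root.

```python
import os

resource_location = "/tmp/safe_resource"
rel_path = "/etc/passwd"  # hypothetical: an absolute path in the file list

# os.path.join drops resource_location because rel_path is absolute
abs_path = os.path.join(resource_location, rel_path)
print(abs_path)
```

On a POSIX system the joined path no longer contains resource_location at all, which is why validating the resolved path is more reliable than inspecting rel_path for ../ substrings.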

In practice, standard git ls-files should not return ../ paths for a normal repository. However:

  1. Git sparse-checkout with --no-cone mode can produce relative paths.
  2. A compromised or crafted repository could have unusual core.worktree settings.
  3. --others (untracked files) could surface files in unusual locations depending on git's CWD resolution.

The _walk_files fallback path is less exposed, because os.walk() yields each dirpath rooted at the directory it was given and os.path.relpath() is computed against that root. It is not fully safe either: a symlink inside the root that points outside it is not detected on this path.
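The symlink concern can be reproduced in a throwaway directory. The following is a minimal sketch, not code from the project: resolves_inside is a hypothetical helper name, and the directory layout is invented for illustration. It shows that a relative path with no ../ in it can still resolve outside the root once symlinks are followed.

```python
import os
import tempfile


def resolves_inside(root: str, rel_path: str) -> bool:
    """Hypothetical helper: True if rel_path stays under root after symlink resolution."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(os.path.join(root, rel_path))
    return real_path == real_root or real_path.startswith(real_root + os.sep)


with tempfile.TemporaryDirectory() as base:
    root = os.path.join(base, "repo")
    outside = os.path.join(base, "outside")
    os.makedirs(root)
    os.makedirs(outside)
    with open(os.path.join(outside, "secret.txt"), "w") as fh:
        fh.write("secret")
    with open(os.path.join(root, "ok.txt"), "w") as fh:
        fh.write("fine")
    # A symlink that lives inside the root but points at a directory outside it
    os.symlink(outside, os.path.join(root, "link"))

    inside_ok = resolves_inside(root, "ok.txt")
    link_ok = resolves_inside(root, "link/secret.txt")
    print(inside_ok, link_ok)
```

Because the check runs on the realpath-resolved path, it would apply equally to paths coming from git ls-files and from the walk fallback.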

Expected Behavior

All paths returned by _git_ls_files (and used in hydrate_tiers_from_project) should be validated to confirm that the resolved absolute path lies within resource_location:

real_root = os.path.realpath(resource_location)
abs_path = os.path.realpath(os.path.join(resource_location, rel_path))
if not abs_path.startswith(real_root + os.sep) and abs_path != real_root:
    logger.warning("context_hydrator.path_traversal_blocked", path=rel_path)
    continue
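The `+ os.sep` in the check above matters: a bare prefix comparison also accepts sibling directories whose names merely begin with the root's name. A small sketch of the difference, using hypothetical function names and paths:

```python
import os


def naive_inside(root: str, path: str) -> bool:
    # Plain prefix test: also matches siblings that share the prefix
    return path.startswith(root)


def strict_inside(root: str, path: str) -> bool:
    # Same comparison as the check above: the root itself, or root + separator
    return path == root or path.startswith(root + os.sep)


root = "/tmp/safe_resource"
sibling = "/tmp/safe_resource_backup/secrets.txt"  # hypothetical sibling directory

print(naive_inside(root, sibling))
print(strict_inside(root, sibling))
```

The naive version accepts the sibling path while the strict version rejects it, so the separator should be kept in any variant of the fix.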

Actual Behavior

No boundary check is performed. Any path returned by git ls-files — including paths with ../ traversal — is read and stored as context.

Suggested Fix

Add a boundary validation step in hydrate_tiers_from_project before the read_text call:

real_root = os.path.realpath(resource_location)

for rel_path in files:
    abs_path = os.path.join(resource_location, rel_path)
    # Validate path stays within resource root (prevent path traversal)
    real_path = os.path.realpath(abs_path)
    if not (real_path == real_root or real_path.startswith(real_root + os.sep)):
        logger.warning(
            "context_hydrator.path_traversal_blocked",
            rel_path=rel_path,
            resolved=real_path,
            resource_root=real_root,
        )
        continue
    # ... rest of processing
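One possible way to package the same check so it can be unit-tested on its own is sketched below. This is an assumption about how the fix might be factored, not the project's code: is_within_root is a hypothetical name, and os.path.commonpath is swapped in for the startswith comparison as an equivalent containment test.

```python
import os
import tempfile


def is_within_root(root: str, rel_path: str) -> bool:
    """Hypothetical helper: True if rel_path resolves to root or a path beneath it."""
    real_root = os.path.realpath(root)
    real_path = os.path.realpath(os.path.join(root, rel_path))
    try:
        # commonpath compares whole path components, so sibling prefixes do not match
        return os.path.commonpath([real_root, real_path]) == real_root
    except ValueError:
        # Raised for paths that cannot be compared (e.g. different drives on Windows)
        return False


with tempfile.TemporaryDirectory() as resource_location:
    candidates = ["src/app.py", "../outside.txt", "/etc/passwd", "."]
    results = {p: is_within_root(resource_location, p) for p in candidates}
    print(results)
```

In the loop from the suggested fix, the caller would log and skip any rel_path for which such a helper returns False.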

Category

security

TDD Note

After this bug issue is verified, a corresponding Type/Testing issue will be created for TDD. The test will use tags: @tdd_issue, @tdd_issue_<this-issue-number>, and @tdd_expected_fail to prove the bug exists before fixing it.


Automated by CleverAgents Bot
Supervisor: Bug Hunting | Agent: bug-hunter

Author
Owner

Verified — Critical security bug. Path traversal possible with malformed git output. MoSCoW: Must Have — security vulnerability that must be fixed.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

HAL9000 added this to the v3.5.0 milestone 2026-04-17 08:45:07 +00:00
Reference
cleveragents/cleveragents-core#6448