BUG-HUNT: [Concurrency] TOCTOU race condition in file permission checking allows inconsistent security decisions #7131

Open
opened 2026-04-10 08:05:14 +00:00 by HAL9000 · 1 comment
Owner

Metadata

  • Branch: bugfix/toctou-race-condition-file-permission-checking
  • Commit Message: fix(cli): eliminate TOCTOU race condition in _check_file_permissions() with atomic stat-based checking
  • Milestone: backlog
  • Parent Epic: #7023

Backlog note: This issue was discovered during autonomous operation
on milestone v3.2.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Bug Report: Concurrency — TOCTOU Race Condition in File Permission Checking

Severity Assessment

  • Impact: Inconsistent security decisions when directory state changes between existence check and permission check
  • Likelihood: Medium in concurrent environments or systems with active file management
  • Priority: High

Location

  • File: src/cleveragents/cli/commands/system.py
  • Function: _check_file_permissions()
  • Lines: ~336–365

Description

The _check_file_permissions() function contains a Time-of-Check-Time-of-Use (TOCTOU) race condition. The function first checks whether the data directory exists (if not data_dir.exists()), then separately calls os.access(data_dir, os.R_OK|os.W_OK) to verify permissions. These two operations are not atomic — a window exists between them during which the directory could be deleted, renamed, have its permissions changed, or be replaced by a symlink, leading to inconsistent or incorrect security decisions.

A related TOCTOU pattern also exists in the _check_database() function in the same file.

Evidence

# Non-atomic check sequence in _check_file_permissions():
if not data_dir.exists():          # Check 1: existence (line ~336)
    return { ... }                 # Early return if missing

readable = os.access(data_dir, os.R_OK)   # Check 2: permissions (line ~344)
writable = os.access(data_dir, os.W_OK)   # Check 3: permissions (line ~345)
# Race window: directory could disappear or change between Check 1 and Checks 2/3

Expected Behavior

Permission checking should be atomic or should gracefully handle the case where the directory disappears or changes between checks. The function should not make security decisions based on stale state.

Actual Behavior

Separate existence check and permission check create a race window. If the directory is deleted, modified, or has permissions changed between the two checks, the function may:

  • Raise an unhandled OSError / PermissionError from os.access()
  • Return an incorrect permission status based on stale state
  • Make inconsistent security decisions in concurrent environments

Suggested Fix

  1. Wrap os.access() calls in try/except to handle OSError for cases where the directory disappears between checks
  2. Use os.stat() for atomic file info retrieval — a single stat() call retrieves existence and permission bits atomically
  3. Document the race condition if it is acceptable for the diagnostic use case
  4. Apply the same fix to _check_database() which has a similar TOCTOU pattern

Example atomic approach:

import stat as stat_module

def _check_file_permissions() -> dict[str, Any]:
    from cleveragents.config.settings import get_settings
    settings = get_settings()
    data_dir = settings.data_dir
    try:
        st = data_dir.stat()  # Atomic: raises FileNotFoundError if missing
        mode = st.st_mode
        readable = bool(mode & stat_module.S_IRUSR)
        writable = bool(mode & stat_module.S_IWUSR)
        ...
    except FileNotFoundError:
        return {"name": "File permissions", "status": CheckStatus.WARN, "details": "data dir does not exist"}
    except OSError as exc:
        return {"name": "File permissions", "status": CheckStatus.ERROR, "details": str(exc)}
  • Similar TOCTOU pattern exists in _check_database() in the same file — should be fixed in the same commit

Subtasks

  • Audit _check_file_permissions() for all non-atomic check sequences
  • Audit _check_database() for the related TOCTOU pattern
  • Replace non-atomic existence + permission checks with a single os.stat() call or equivalent
  • Add try/except around os.access() / os.stat() calls to handle OSError gracefully
  • Update or add Behave BDD scenarios covering the race condition edge cases (directory disappears mid-check, permission denied mid-check)
  • Ensure all nox stages pass after the fix

Definition of Done

  • _check_file_permissions() uses atomic stat-based permission checking with no TOCTOU window
  • _check_database() TOCTOU pattern addressed in the same commit
  • OSError / FileNotFoundError raised during permission checks are caught and returned as structured error results
  • Behave BDD scenarios cover: directory missing, directory disappears mid-check, permission denied
  • No new # type: ignore suppressions introduced
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Bug Hunt Cycle 2 | Agent: new-issue-creator

## Metadata - **Branch**: `bugfix/toctou-race-condition-file-permission-checking` - **Commit Message**: `fix(cli): eliminate TOCTOU race condition in _check_file_permissions() with atomic stat-based checking` - **Milestone**: backlog - **Parent Epic**: #7023 > **Backlog note:** This issue was discovered during autonomous operation > on milestone v3.2.0. It does not block milestone completion and has been > placed in the backlog for human review and future milestone assignment. ## Bug Report: Concurrency — TOCTOU Race Condition in File Permission Checking ### Severity Assessment - **Impact**: Inconsistent security decisions when directory state changes between existence check and permission check - **Likelihood**: Medium in concurrent environments or systems with active file management - **Priority**: High ### Location - **File**: `src/cleveragents/cli/commands/system.py` - **Function**: `_check_file_permissions()` - **Lines**: ~336–365 ### Description The `_check_file_permissions()` function contains a Time-of-Check-Time-of-Use (TOCTOU) race condition. The function first checks whether the data directory exists (`if not data_dir.exists()`), then separately calls `os.access(data_dir, os.R_OK|os.W_OK)` to verify permissions. These two operations are not atomic — a window exists between them during which the directory could be deleted, renamed, have its permissions changed, or be replaced by a symlink, leading to inconsistent or incorrect security decisions. A related TOCTOU pattern also exists in the `_check_database()` function in the same file. ### Evidence ```python # Non-atomic check sequence in _check_file_permissions(): if not data_dir.exists(): # Check 1: existence (line ~336) return { ... } # Early return if missing readable = os.access(data_dir, os.R_OK) # Check 2: permissions (line ~344) writable = os.access(data_dir, os.W_OK) # Check 3: permissions (line ~345) # Race window: directory could disappear or change between Check 1 and Checks 2/3 ``` ### Expected Behavior Permission checking should be atomic or should gracefully handle the case where the directory disappears or changes between checks. The function should not make security decisions based on stale state. ### Actual Behavior Separate existence check and permission check create a race window. If the directory is deleted, modified, or has permissions changed between the two checks, the function may: - Raise an unhandled `OSError` / `PermissionError` from `os.access()` - Return an incorrect permission status based on stale state - Make inconsistent security decisions in concurrent environments ### Suggested Fix 1. **Wrap `os.access()` calls in try/except** to handle `OSError` for cases where the directory disappears between checks 2. **Use `os.stat()` for atomic file info retrieval** — a single `stat()` call retrieves existence and permission bits atomically 3. **Document the race condition** if it is acceptable for the diagnostic use case 4. **Apply the same fix to `_check_database()`** which has a similar TOCTOU pattern Example atomic approach: ```python import stat as stat_module def _check_file_permissions() -> dict[str, Any]: from cleveragents.config.settings import get_settings settings = get_settings() data_dir = settings.data_dir try: st = data_dir.stat() # Atomic: raises FileNotFoundError if missing mode = st.st_mode readable = bool(mode & stat_module.S_IRUSR) writable = bool(mode & stat_module.S_IWUSR) ... except FileNotFoundError: return {"name": "File permissions", "status": CheckStatus.WARN, "details": "data dir does not exist"} except OSError as exc: return {"name": "File permissions", "status": CheckStatus.ERROR, "details": str(exc)} ``` ### Related Issues - Similar TOCTOU pattern exists in `_check_database()` in the same file — should be fixed in the same commit ## Subtasks - [ ] Audit `_check_file_permissions()` for all non-atomic check sequences - [ ] Audit `_check_database()` for the related TOCTOU pattern - [ ] Replace non-atomic existence + permission checks with a single `os.stat()` call or equivalent - [ ] Add try/except around `os.access()` / `os.stat()` calls to handle `OSError` gracefully - [ ] Update or add Behave BDD scenarios covering the race condition edge cases (directory disappears mid-check, permission denied mid-check) - [ ] Ensure all nox stages pass after the fix ## Definition of Done - [ ] `_check_file_permissions()` uses atomic stat-based permission checking with no TOCTOU window - [ ] `_check_database()` TOCTOU pattern addressed in the same commit - [ ] `OSError` / `FileNotFoundError` raised during permission checks are caught and returned as structured error results - [ ] Behave BDD scenarios cover: directory missing, directory disappears mid-check, permission denied - [ ] No new `# type: ignore` suppressions introduced - [ ] All nox stages pass - [ ] Coverage >= 97% --- **Automated by CleverAgents Bot** Supervisor: Bug Hunt Cycle 2 | Agent: new-issue-creator
Author
Owner

Verified — Concurrency bug: TOCTOU race condition in file permission checking. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

✅ **Verified** — Concurrency bug: TOCTOU race condition in file permission checking. MoSCoW: Should-have. Priority: Medium. --- **Automated by CleverAgents Bot** Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#7131
No description provided.