UAT: UKOIndexer instantiated without content_reader in DI container — LocationContentReader defaults to no base_dir, allowing unrestricted filesystem access #3962

Open
opened 2026-04-06 07:54:29 +00:00 by freemo · 0 comments

Metadata

  • Branch: fix/security-uko-indexer-base-dir
  • Commit Message: fix(security): wire base_dir into UKOIndexer LocationContentReader in DI container
  • Milestone: (none — backlog)
  • Parent Epic: #362

Bug Report

What Was Tested

Path containment security in the UKO Indexer's content reader.

Expected Behavior (from spec)

Per docs/specification.md §Security Model — Sandbox Isolation:

The sandbox path is derived deterministically from the plan ULID, preventing path traversal.

The LocationContentReader class itself logs a warning stating that base_dir must be set in production to prevent unrestricted file access:

# From uko_indexer_protocols.py:
if self._base_dir is None:
    logger.warning(
        "content_reader.no_base_dir",
        hint="No base_dir set — reader can access any file the "
        "process can read. Set base_dir in production.",
    )

Actual Behavior

UKOIndexer is wired in the DI container (application/container.py line 799) without a content_reader parameter:

# application/container.py lines 799-805
uko_indexer = providers.Singleton(
    UKOIndexer,
    analyzer_registry=analyzer_registry,
    graph_backend=index_graph_backend,
    text_backend=index_text_backend,
    vector_backend=index_vector_backend,
    # ← No content_reader provided!
)

This causes UKOIndexer.__init__ to fall back to LocationContentReader() with no base_dir:

# uko_indexer.py line 85
self._content_reader: ContentReader = content_reader or LocationContentReader()  # ← No base_dir!
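The fallback can be reproduced in isolation. The sketch below uses hypothetical stand-ins, LocationContentReaderStandIn and UKOIndexerStandIn. They model only the base_dir attribute and the `content_reader or LocationContentReader()` expression from line 85, and they are not the real classes.

```python
from __future__ import annotations

from pathlib import Path


class LocationContentReaderStandIn:
    """Hypothetical stand-in: models only the base_dir attribute."""

    def __init__(self, base_dir: Path | None = None) -> None:
        self.base_dir = base_dir


class UKOIndexerStandIn:
    """Hypothetical stand-in mirroring the fallback on uko_indexer.py line 85."""

    def __init__(self, content_reader: LocationContentReaderStandIn | None = None) -> None:
        # Same pattern as the real code: a missing reader silently becomes an
        # unrestricted one instead of raising an error.
        self._content_reader = content_reader or LocationContentReaderStandIn()


# Wired like container.py today: no content_reader argument.
unrestricted = UKOIndexerStandIn()
print(unrestricted._content_reader.base_dir)  # None

# Wired like the proposed fix: an explicit reader with a base_dir.
restricted = UKOIndexerStandIn(LocationContentReaderStandIn(base_dir=Path("/srv/data")))
print(restricted._content_reader.base_dir)
```

The default is constructed silently, so the only runtime signal is the content_reader.no_base_dir warning. A constructor that required content_reader would instead turn the same wiring gap into a startup error.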

As a result, the UKO indexer can read any file the process can access — including files outside the project's resource directories. A resource with a crafted location value (e.g., ../../../../etc/passwd) would be read without restriction.
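The traversal can be shown with plain path arithmetic. This is a minimal sketch that assumes POSIX paths and a hypothetical resource directory. It only manipulates strings and never opens a file. The issue does not state how the real reader interprets a relative location when base_dir is None.

```python
import posixpath

# Hypothetical resource directory, used only for illustration.
resource_dir = "/srv/project/resources"
location = "../../../../etc/passwd"

# normpath collapses each ".." lexically; surplus ".." at "/" are dropped,
# so the joined path lands outside resource_dir.
target = posixpath.normpath(posixpath.join(resource_dir, location))
print(target)  # /etc/passwd

# An absolute location is just as dangerous: join() discards the prefix.
absolute = posixpath.join(resource_dir, "/etc/passwd")
print(absolute)  # /etc/passwd

# Only an explicit containment check notices the escape; with base_dir=None
# there is no root to compare against.
escapes = posixpath.commonpath([resource_dir, target]) != resource_dir
print(escapes)  # True
```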

Code Locations

  • src/cleveragents/application/container.py:799-805 (DI wiring missing content_reader)
  • src/cleveragents/application/services/uko_indexer.py:85 (fallback to LocationContentReader() without base_dir)
  • src/cleveragents/application/services/uko_indexer_protocols.py:153-220 (LocationContentReader class with base_dir enforcement)

Fix Required

Wire a LocationContentReader with an appropriate base_dir in the DI container:

# application/container.py
from cleveragents.application.services.uko_indexer_protocols import LocationContentReader

uko_content_reader = providers.Singleton(
    LocationContentReader,
    base_dir=settings.provided.data_dir,  # assumes a settings provider exposing data_dir; or another base directory (see Subtasks)
)

uko_indexer = providers.Singleton(
    UKOIndexer,
    analyzer_registry=analyzer_registry,
    graph_backend=index_graph_backend,
    text_backend=index_text_backend,
    vector_backend=index_vector_backend,
    content_reader=uko_content_reader,  # ← Add this
)
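This issue does not show how LocationContentReader enforces base_dir (uko_indexer_protocols.py:153-220). The following is therefore only a sketch of the kind of check a non-None base_dir is expected to enable. resolve_within is a hypothetical helper, and raising PermissionError is an assumption about the failure mode.

```python
import tempfile
from pathlib import Path


def resolve_within(base_dir: Path, location: str) -> Path:
    """Hypothetical helper: resolve `location` under `base_dir` or raise.

    Both sides are resolve()d first, which collapses ".." segments and follows
    symlinks, so the comparison is made on the real target path.
    """
    base = base_dir.resolve()
    # An absolute `location` replaces `base` entirely in the join; the check
    # below still catches it.
    candidate = (base / location).resolve()
    # Path.is_relative_to compares path components (Python 3.9+), so a sibling
    # such as /srv/data-evil is not mistaken for a child of /srv/data.
    if not candidate.is_relative_to(base):
        raise PermissionError(f"{location!r} resolves outside {base}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    print(resolve_within(root, "docs/readme.md").name)  # readme.md
    for bad in ("../../../../etc/passwd", "/etc/passwd"):
        try:
            resolve_within(root, bad)
        except PermissionError:
            print(f"rejected {bad}")
```

Comparing resolved paths component by component avoids the classic pitfall of a string prefix test. A prefix test for base /srv/data would wrongly accept /srv/data-evil/file.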

Subtasks

  • Determine the appropriate base_dir for the LocationContentReader (likely the configured data directory or project root)
  • Wire LocationContentReader with base_dir in application/container.py
  • Add BDD scenario verifying that the UKO indexer rejects resource locations outside the base directory
  • Verify nox -e unit_tests passes
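The BDD scenario from the subtasks could bottom out in assertions like the ones below. This is a sketch only. ContainedReaderStandIn is a hypothetical stand-in, not the real LocationContentReader, and this issue does not show the real reader's rejection behavior (exception type, message). The actual scenario should exercise the UKOIndexer built by the container and be wrapped in whatever BDD framework the project already uses.

```python
import tempfile
from pathlib import Path


class ContainedReaderStandIn:
    """Hypothetical stand-in for a LocationContentReader configured with base_dir."""

    def __init__(self, base_dir: Path) -> None:
        self._base = base_dir.resolve()

    def read(self, location: str) -> str:
        target = (self._base / location).resolve()
        if not target.is_relative_to(self._base):  # Python 3.9+
            raise PermissionError(f"{location!r} is outside base_dir")
        return target.read_text(encoding="utf-8")


def test_rejects_location_outside_base_dir() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        base = root / "resources"
        base.mkdir()
        (base / "ok.txt").write_text("inside", encoding="utf-8")
        (root / "secret.txt").write_text("outside", encoding="utf-8")

        reader = ContainedReaderStandIn(base)
        # Given a resource inside base_dir, Then its content is read.
        assert reader.read("ok.txt") == "inside"

        # Given a resource whose location escapes base_dir, Then it is rejected.
        try:
            reader.read("../secret.txt")
        except PermissionError:
            pass
        else:
            raise AssertionError("traversal outside base_dir was not rejected")


test_rejects_location_outside_base_dir()
```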

Definition of Done

  • UKOIndexer in the DI container receives a LocationContentReader with a non-None base_dir
  • The content_reader.no_base_dir warning is no longer emitted in production
  • A BDD scenario exists verifying path containment in the UKO indexer
  • nox -e unit_tests and nox -e typecheck pass
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

Reference
cleveragents/cleveragents-core#3962