feat(acms): implement Real-time Index Sync / UKOIndexer with pluggable analyzers #612

P2:should-fix — File move to different directory leaves file unwatched.

When a FileMovedEvent fires, _handle_fs_event updates _watched_paths to map dest_path → (resource_id, project) but does NOT schedule a watchdog watch on dest_path's parent directory (unlike watch() which does check and schedule). If the file moved to a directory not already monitored, future modifications at the new location won't be detected.

Suggested fix: After updating _watched_paths for a move, check if the new parent directory needs a watch:

if isinstance(event, FileMovedEvent):
    dest_path = str(Path(str(event.dest_path)).resolve())
    self._watched_paths.pop(src_path, None)
    self._watched_paths[dest_path] = (resource_id, project)
    # Schedule watch on new parent if needed
    new_parent = str(Path(dest_path).parent)
    if (self._running and self._observer is not None
            and new_parent not in self._dir_watches):
        handler = _ResourceChangeHandler(self)
        handle = self._observer.schedule(handler, new_parent, recursive=False)
        self._dir_watches[new_parent] = handle


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +444,4 @@

				            self._resource_subjects.pop(resource_id, None)

				            self._indexed_resources.pop(resource_id, None)

				            self._resource_analyzer.pop(resource_id, None)

				            self._resource_locks.pop(resource_id, None)

brent.edwards commented

P1:must-fix — Per-resource lock deleted while caller still holds it.

_remove_resource_internal pops the resource's lock from _resource_locks (line 447). But the callers (index_resource, reindex_resource) invoke _remove_resource_internal inside a with res_lock: block, meaning they still hold the old lock object.

Race scenario:

1. Thread A calls index_resource(res1), acquires res_lock_A from _resource_locks["res1"]
2. Thread A calls _remove_resource_internal → pops _resource_locks["res1"]
3. Thread B calls index_resource(res1), calls _resource_lock("res1") → creates new res_lock_B (old one was popped)
4. Thread B acquires res_lock_B (a different object than res_lock_A)
5. Both threads now run _index_resource_core concurrently for the same resource, which risks data corruption

Suggested fix: Remove the _resource_locks.pop(resource_id, None) line. Let per-resource locks persist (they're lightweight threading.Lock objects). If cleanup is desired, do it lazily or in a separate maintenance method.
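
A minimal sketch of the "let locks persist" option, assuming a small registry guarded by its own lock. `ResourceLockRegistry` and `lock_for` are hypothetical names for illustration, not the PR's actual API:

```python
import threading


class ResourceLockRegistry:
    """Hypothetical helper: hands out one persistent lock per resource id."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = {}

    def lock_for(self, resource_id: str) -> threading.Lock:
        # Lookup and creation happen under the guard, so two threads can
        # never end up with different lock objects for the same id.
        with self._guard:
            lock = self._locks.get(resource_id)
            if lock is None:
                lock = threading.Lock()
                self._locks[resource_id] = lock
            return lock


registry = ResourceLockRegistry()
first = registry.lock_for("res1")
with first:
    # Nothing pops the entry while the lock is held, so a concurrent
    # caller would receive this same object and block on it.
    second = registry.lock_for("res1")
```

Because entries are never removed, every caller for a given resource id contends on the same lock object, which is the property the race above breaks.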


src/cleveragents/application/services/uko_indexer_internals.py Outdated

						
				@@ -0,0 +41,4 @@

				    """Fire *on_indexed* lifecycle hook, guarding against failures."""

				    try:

				        hook.on_indexed(result)

				    except Exception:

brent.edwards commented

P3:nit — The except Exception block here (and in fire_on_removed, fire_on_error) swallows the exception without logging its message or traceback. This makes debugging hook failures very difficult — you'll see "lifecycle_hook_error" in logs but no indication of what went wrong.

Consider logger.warning(..., exc_info=True) or at minimum including error=str(exc) in the log event.
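
A rough sketch of the guarded call with the traceback preserved. It uses the standard-library `logging` module as an assumption (the project may use a structured logger instead), and `fire_hook_safely` is a hypothetical name:

```python
import logging

logger = logging.getLogger("uko_indexer_sketch")  # assumed logger name


def fire_hook_safely(hook_fn, *args) -> bool:
    """Call a lifecycle hook; log message and traceback if it raises."""
    try:
        hook_fn(*args)
        return True
    except Exception as exc:
        # exc_info=True attaches the active traceback to the log record,
        # so the log shows what went wrong, not only that something did.
        logger.warning("lifecycle_hook_error: %s", exc, exc_info=True)
        return False
```

The hook failure is still swallowed, as in the current code; only the diagnostics change.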


src/cleveragents/application/services/uko_indexer_protocols.py Outdated

						
				@@ -0,0 +224,4 @@

				        # Re-open in text mode for proper UTF-8 decoding (the raw fd

				        # was opened non-blocking only for the fstat check).

				        os.close(fd)

brent.edwards commented

P1:must-fix — TOCTOU vulnerability: fd validated then closed, path re-opened.

The code opens with O_NONBLOCK, validates via fstat that it's a regular file, then closes the fd and re-opens the path in text mode. Between os.close(fd) and open(resolved, ...), the file could be replaced with:

- A FIFO (causing indefinite block on the text-mode open)
- A symlink to a file outside base_dir (bypassing security check)

The code comment on line 208 claims to "eliminate the TOCTOU window" but this close-and-reopen creates a new one.

Suggested fix — keep the fd and wrap it:

# Instead of os.close(fd) + open(resolved)
fh = os.fdopen(fd, 'r', encoding='utf-8')
try:
    content = fh.read(self._max_content_size + 1)
finally:
    fh.close()

This ensures the security check and the content read operate on the same inode.


src/cleveragents/domain/models/acms/analyzers.py Outdated

						
				@@ -86,3 +83,1 @@

				        if not value:

				            raise ValueError("predicate must not be empty.")

				        return value

				    @model_validator(mode="after")

brent.edwards commented

P2:should-fix — New behavioral constraint that may break existing callers.

This _validate_object model validator is a NEW requirement: previously, UKOTriple could be created with both object_uri and object_value empty/default. This is a breaking change for any existing code that relies on creating triples with no object (e.g., partial construction patterns).

Please add a note to CHANGELOG.md under Breaking Changes so downstream consumers are aware. If any existing analyzers produce triples with no object, they'll fail validation at runtime.


src/cleveragents/domain/models/acms/index_backends.py Outdated

						
				@@ -0,0 +74,4 @@

				        if not self.doc_id:

				            raise ValueError("doc_id must be a non-empty string")

				        if not (0.0 <= self.score <= 1.0):

				            raise ValueError(f"score must be between 0.0 and 1.0, got {self.score}")

brent.edwards commented

P3:nit — metadata is a mutable dict[str, str] inside a @dataclass(frozen=True). The frozen=True prevents attribute reassignment (result.metadata = {} fails) but not mutation of the dict itself (result.metadata['key'] = 'val' succeeds). For true immutability, consider types.MappingProxyType or document this as intentional.
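
A sketch of the `types.MappingProxyType` option. `SearchResultSketch` is a hypothetical stand-in, not the real dataclass, and only the fields needed for the illustration are shown:

```python
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Mapping


@dataclass(frozen=True)
class SearchResultSketch:
    doc_id: str
    score: float
    metadata: Mapping[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # frozen=True blocks ordinary assignment, so object.__setattr__ is
        # used to swap in a read-only view over a private copy of the dict.
        object.__setattr__(
            self, "metadata", MappingProxyType(dict(self.metadata))
        )


result = SearchResultSketch("doc-1", 0.5, {"lang": "python"})
```

Reads such as `result.metadata["lang"]` still work, while item assignment on the proxy raises `TypeError`. Copying the input also means later changes to the caller's dict do not leak into the result.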


src/cleveragents/domain/models/acms/index_stubs.py Outdated

						
				@@ -0,0 +305,4 @@

				        """Total number of stored embeddings (test helper)."""

				        return len(self._embeddings)

brent.edwards commented

P3:nit — This logger.debug fires even when the triple was deduplicated (already in _triple_sets). Consider moving the log inside the if triple not in self._triple_sets branch, or adding a deduplicated=True/False field to the log event.


brent.edwards requested changes 2026-03-10 19:05:09 +00:00

Dismissed

brent.edwards left a comment

Supplemental Review — Deep Re-review of PR #612

After a line-by-line re-read of every production file in this PR, here are 19 additional findings not covered in the initial review (review ID 2102). Grouped by file/area, with severity per the review playbook.

`uko_indexer.py` — Resource removal & idempotency

#10 · P2:should-fix — Two remove_triples calls share one try block; first failure skips bulk removal (_remove_resource_internal, lines 407-422)

The provenance-link removal (uko:sourceResource) and the bulk subject removal (predicate=None, obj=None) are in the same try block. If the first call fails, the second is skipped and all data triples for that subject remain in the graph.

Fix: Wrap each remove_triples call in its own try/except so the bulk removal always runs.

#11 · P2:should-fix — rdfs:label triples use object_uri as subject but that subject is never tracked in _resource_subjects (uko_indexer_internals.py, lines 154-163 → uko_indexer.py, line 312)

In index_graph, when both object_uri and object_value are set, an rdfs:label triple is stored with t.object_uri as subject. But only t.subject_uri is added to the subjects set (line 148). The object_uri-keyed triple is never tracked in _resource_subjects, so _remove_resource_internal cannot clean it up — it leaks permanently.

Fix: subjects.add(t.object_uri) after storing the rdfs:label triple.

#12 · P2:should-fix — Content reader exception narrower than protocol allows; data loss on unexpected exception (uko_indexer.py, lines 229-231)

except (OSError, ValueError) doesn't cover all exceptions a custom ContentReader implementation might raise. Since the idempotency path at line 202-204 already removed old data, an uncaught exception here means old data is gone and new data was never stored — data loss.

Fix: Catch Exception (like the analyzer error path at line 245), or document that ContentReader.read_content() implementations MUST only raise OSError/ValueError.

#13 · P3:nit — fire_on_removed called outside per-resource lock (uko_indexer.py, line 371)

fire_on_removed runs after releasing res_lock. Worse, the lock itself was already deleted by _remove_resource_internal (line 447). A concurrent index_resource call for the same resource could create a new lock and fire on_indexed before on_removed completes — lifecycle event ordering inversion for custom hooks.

#14 · P3:nit — Removal errors in idempotency path not surfaced in IndexResult (uko_indexer.py, lines 197-206)

When index_resource removes stale data before re-indexing, any errors from _remove_resource_internal are logged but not included in the returned IndexResult. Callers relying on IndexResult.errors for monitoring miss partial removal failures.

#15 · P3:nit — Backend failures never fire on_error lifecycle hook (uko_indexer.py, lines 276-322 via uko_indexer_internals.py)

Content-read and analyzer failures fire on_error, but graph/text/vector backend failures only append to the errors list. Custom lifecycle hooks monitoring via on_error miss backend failures.

`resource_file_watcher.py` — Debounce & concurrency

#16 · P2:should-fix — Debounce timer keyed on stale src_path after FileMovedEvent (lines 331-343)

On FileMovedEvent, _watched_paths is updated to key on dest_path (line 332), but the debounce timer is stored under src_path (line 343). A subsequent FileModifiedEvent at dest_path will:

1. Find the entry in _watched_paths[dest_path] ✓
2. Try to cancel _pending_timers.pop(dest_path) and find nothing (the timer is under src_path)
3. Create a second timer

Result: two callbacks fire for the same logical change — duplicate re-indexing.

Fix: Store the timer under dest_path when a move event updates the watched-paths mapping.

#17 · P2:should-fix — _fire_change callback can outlive stop() (lines 360-411 vs 250-268)

_fire_change checks _running under the lock, then releases the lock before executing the callback. stop() sets _running=False and returns without waiting. If stop() runs after _fire_change has released the lock but before the callback finishes, the callback is still executing (or has yet to start) when stop() returns, violating the caller's expectation that all activity has ceased.

Fix: Track in-flight callbacks with a counter or event; have stop() wait for them.
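
One possible shape for the counter approach, sketched with `threading.Condition`. `InFlightTracker`, `run`, and the `timeout` parameter are hypothetical names; the real watcher would fold this into `_fire_change` and `stop()`:

```python
import threading
from typing import Callable, Optional


class InFlightTracker:
    """Hypothetical helper: lets stop() wait for callbacks already running."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._running = True
        self._in_flight = 0

    def run(self, callback: Callable[[], None]) -> bool:
        # The _running check and the counter increment share one critical
        # section, so stop() either sees this call or prevents it.
        with self._cond:
            if not self._running:
                return False
            self._in_flight += 1
        try:
            callback()
        finally:
            with self._cond:
                self._in_flight -= 1
                self._cond.notify_all()
        return True

    def stop(self, timeout: Optional[float] = None) -> bool:
        with self._cond:
            self._running = False
            # wait_for returns False if the timeout expires first.
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

One caveat with this design: a callback that calls `stop()` on the same tracker would wait on itself, so a timeout (or a re-entrancy check) is worth keeping.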

#18 · P3:nit — _fire_change doesn't verify path still in _watched_paths (lines 369-373)

If unwatch() races with a timer firing, the timer's cancel() may arrive after the timer has already started. The callback then fires for a path that was already unwatched. Benign (extra re-index) but inconsistent with unwatch semantics.

#19 · P3:nit — observer.join(timeout=5.0) timeout silently ignored (line 267)

If the observer thread doesn't stop within 5 seconds, the code continues without warning. Since _handle_fs_event doesn't check _running, the zombie observer can still create debounce timers (they'll be no-ops in _fire_change, but wasteful). Adding a log warning on timeout would aid debugging.
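
A small sketch of the timeout warning, assuming the observer behaves like a `threading.Thread` (`join` plus `is_alive`). `join_with_warning` and the log event name are hypothetical:

```python
import logging
import threading

logger = logging.getLogger("resource_file_watcher_sketch")  # assumed name


def join_with_warning(thread: threading.Thread, timeout: float) -> bool:
    """Join *thread*; log and return False if it is still alive afterwards."""
    thread.join(timeout=timeout)
    # join() returns None either way, so is_alive() is the only way to
    # tell a clean exit from an expired timeout.
    if thread.is_alive():
        logger.warning("observer_join_timeout timeout=%.1fs", timeout)
        return False
    return True
```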

#20 · P3:nit — unwatch() re-resolves path; may differ from watch() resolution (lines 136 vs 177)

Both watch() and unwatch() call path.resolve() independently. If a symlink target or mount changes between calls, the resolved paths differ and unwatch() silently fails to remove the watch entry.

#21 · P3:nit — fd leaked on BaseException in LocationContentReader (uko_indexer_protocols.py, lines 211-223)

The try/except only catches OSError and ValueError. A BaseException (e.g., KeyboardInterrupt during os.fstat) leaks the file descriptor. A finally block would be safer. (Related to but distinct from initial finding #2 — this is about fd leakage, not TOCTOU.)
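
A sketch that covers both this leak and the close-and-reopen problem from the initial review: open once, validate the descriptor, and read from that same descriptor. `read_regular_file` and `max_size` are hypothetical names, `os.O_NONBLOCK` is POSIX-only, and the base_dir containment check is omitted:

```python
import os
import stat


def read_regular_file(path: str, max_size: int) -> str:
    """Open once, validate the fd, then read from that same fd."""
    # O_NONBLOCK keeps the open from hanging if the path is a FIFO.
    fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            raise ValueError(f"not a regular file: {path}")
    except BaseException:
        # Covers KeyboardInterrupt too: the raw fd is closed on any
        # failure before ownership passes to a file object.
        os.close(fd)
        raise
    # The file object now owns fd; leaving the with-block closes it.
    with os.fdopen(fd, "r", encoding="utf-8") as fh:
        # One extra character lets the caller detect oversize content.
        return fh.read(max_size + 1)
```

The validation and the read operate on the same open file, so replacing the path in between has no effect on what is read.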

`index_backends.py` & `index_stubs.py` — Validation gaps

#22 · P2:should-fix — IndexedDocument and SearchResult accept whitespace-only strings (index_backends.py, lines 48-54, 73-77)

__post_init__ uses if not self.project which doesn't catch " " (whitespace-only). The protocol docstrings specify "empty or whitespace-only" should be rejected, and the stubs' _require_non_empty correctly does .strip() — but the dataclasses themselves don't.

Fix: Change to if not self.project or not self.project.strip() (or similar).
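
For illustration, a sketch of the stricter check on a stand-in dataclass. `IndexedDocumentSketch` and `_require_text` are hypothetical, and the field list is reduced to two fields:

```python
from dataclasses import dataclass


def _require_text(name: str, value: str) -> None:
    # str.strip() turns whitespace-only input into "", which is falsy.
    if not value or not value.strip():
        raise ValueError(f"{name} must be a non-empty, non-whitespace string")


@dataclass(frozen=True)
class IndexedDocumentSketch:
    project: str
    doc_id: str

    def __post_init__(self) -> None:
        _require_text("project", self.project)
        _require_text("doc_id", self.doc_id)
```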

#23 · P3:nit — remove_triples accepts empty-string filters (index_stubs.py, lines 406-411)

remove_triples("proj", "", "", "") passes the all-None guard (they're not None) but matches nothing — a silent no-op that the caller probably didn't intend.

#24 · P3:nit — search_similar does not validate min_relevance range (index_stubs.py, lines 234-260)

No check that min_relevance is in [0.0, 1.0]. Values > 1.0 silently return no results; negative values silently accept everything.

`analyzers.py` — Registry & URI helpers

#25 · P2:should-fix — get_for_extension / get_for_resource is case-sensitive (analyzers.py diff, get_for_resource)

PurePosixPath(location).suffix preserves the original case. A file named FOO.PY yields extension .PY, which won't match .py in the registry. On case-insensitive filesystems (macOS, Windows) this causes silent missed indexing.

Fix: Normalize with ext.lower() in get_for_resource (or in get_for_extension).
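
A sketch of the normalized lookup. The `registry` dict and `analyzer_for` are hypothetical stand-ins for the real registry:

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical registry contents; keys are lowercase with a leading dot.
registry = {".py": "python-analyzer"}


def analyzer_for(location: str) -> Optional[str]:
    # suffix keeps the original case ("FOO.PY" -> ".PY"), so lower() is
    # applied before the lookup.
    ext = PurePosixPath(location).suffix.lower()
    return registry.get(ext)
```

With this change `analyzer_for("src/FOO.PY")` and `analyzer_for("src/foo.py")` resolve to the same analyzer, and a location without a suffix returns `None`.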

#26 · P3:nit — register doesn't validate extension format (analyzers.py, lines 194-209 in diff)

Extensions without a leading dot (e.g., "py" instead of ".py") are silently stored but never matched by get_for_resource (which uses PurePosixPath.suffix).

#27 · P3:nit — safe_uri_segment truncation can leave trailing underscores (analyzers.py, line 106)

_SAFE_URI_RE.sub("_", text).strip("_")[:120] — the [:120] truncation happens after strip("_"), so if the truncation point falls in a run of underscores, the result has trailing underscores.
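
One way to avoid this is to strip again after truncating. In the sketch below the regex is an assumed pattern (the real `_SAFE_URI_RE` is not shown in this review) and `safe_uri_segment_fixed` is a hypothetical name:

```python
import re

# Assumed pattern: anything outside this set becomes an underscore.
_SAFE_URI_RE = re.compile(r"[^A-Za-z0-9_-]+")


def safe_uri_segment_fixed(text: str, max_len: int = 120) -> str:
    cleaned = _SAFE_URI_RE.sub("_", text).strip("_")
    # Truncate first, then strip the right edge again in case the cut
    # landed on an underscore.
    return cleaned[:max_len].rstrip("_")
```

The result can end up slightly shorter than `max_len`, which seems an acceptable trade for a clean segment boundary.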

#28 · P3:nit — ProvenanceMetadata.source_range has no format validation (provenance.py, lines 57-59)

source_range is documented as "10-25" format but accepts any string, including garbage.
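
If validation is wanted, a sketch along these lines could work. It assumes the documented `"10-25"` form means two non-negative integers with start <= end; that ordering rule is an assumption, and `parse_source_range` is a hypothetical name:

```python
import re

_RANGE_RE = re.compile(r"(\d+)-(\d+)")


def parse_source_range(value: str) -> tuple[int, int]:
    """Parse "start-end"; raise ValueError for anything else."""
    match = _RANGE_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"source_range must look like '10-25', got {value!r}")
    start, end = int(match.group(1)), int(match.group(2))
    # Assumption: a range that runs backwards is treated as invalid.
    if end < start:
        raise ValueError(f"source_range end precedes start: {value!r}")
    return start, end
```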

Summary

| Severity | Count | Action |
|----------|-------|--------|
| P2:should-fix | 7 | Fix in follow-up PR within 3 days |
| P3:nit | 12 | Author discretion |

Combined with the initial review's 3 P1s, 2 P2s, and 4 P3s, the full tally is:

P1:must-fix: 3 (all from initial review — DI wiring, TOCTOU, lock deletion)
P2:should-fix: 9 (2 initial + 7 new)
P3:nit: 16 (4 initial + 12 new)

The P1s from the initial review remain the blockers. The new P2s should be tracked for follow-up. REQUEST_CHANGES stance unchanged.


src/cleveragents/application/services/resource_file_watcher.py Outdated

						
				@@ -0,0 +331,4 @@

				                self._watched_paths.pop(src_path, None)

				                self._watched_paths[dest_path] = (resource_id, project)

				            existing = self._pending_timers.pop(src_path, None)

brent.edwards commented

#16 · P2 — On FileMovedEvent, _watched_paths is updated to key on dest_path (line 332), but the debounce timer is stored under src_path (line 343). A subsequent event at dest_path can't cancel this timer → duplicate callbacks.

# Fix: use the effective path for the timer key
timer_key = dest_path if dest_path is not None else src_path
existing = self._pending_timers.pop(timer_key, None)
# also cancel any timer under the old key on move:
if dest_path is not None:
    old_timer = self._pending_timers.pop(src_path, None)
    if old_timer is not None:
        old_timer.cancel()
...
self._pending_timers[timer_key] = timer


src/cleveragents/application/services/resource_file_watcher.py Outdated

						
				@@ -0,0 +366,4 @@

				        dest_path: str | None = None,

				    ) -> None:

				        """Fire the change callback and/or EventBus event after debounce."""

				        with self._lock:

brent.edwards commented

#17 · P2 — After _fire_change releases the lock (line 372) and before the callback completes (lines 382-411), stop() can set _running=False and return. The caller of stop() assumes all activity has ceased, but the callback is still in flight.


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +228,4 @@

				        # 2. Read resource content (I/O — outside global lock)

				        try:

				            content = self._content_reader.read_content(resource)

				        except (OSError, ValueError) as exc:

brent.edwards commented

#12 · P2 — except (OSError, ValueError) is narrower than the except Exception used for analyzer errors (line 245). A custom ContentReader raising e.g. RuntimeError would propagate uncaught. Since the idempotency path (lines 202-204) already removed old data, this means data loss.

Fix: widen to except Exception or add a protocol constraint that only OSError/ValueError are allowed.


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +405,4 @@

				        # leave the resource permanently stuck (P1 #4).

				        errors: list[str] = []

				        for subj in tracked_subjects:

				            try:

brent.edwards commented

#10 · P2 — The two remove_triples calls share one try block. If the provenance-link removal (lines 409-414) throws, the bulk data-triple removal (lines 415-420) is skipped and those triples leak.

Fix: use separate try blocks.

```python
for subj in tracked_subjects:
    # Remove the provenance link first; a failure here must not
    # prevent the data-triple removal below.
    try:
        self._graph_backend.remove_triples(
            project, subject=subj,
            predicate="uko:sourceResource", obj=resource_uri,
        )
    except Exception as exc:
        errors.append(f"graph remove provenance {subj}: {exc}")
    # Remove all remaining data triples for this subject.
    try:
        self._graph_backend.remove_triples(
            project, subject=subj, predicate=None, obj=None,
        )
    except Exception as exc:
        errors.append(f"graph remove data {subj}: {exc}")
```

src/cleveragents/application/services/uko_indexer_internals.py Outdated

						
				@@ -0,0 +151,4 @@

				        # URI-identified resource with a literal label), store the

				        # literal as a separate ``rdfs:label`` triple so no data is

				        # silently discarded.

				        if t.object_uri and t.object_value:

brent.edwards commented

#11 · P2 — When both object_uri and object_value are set, an rdfs:label triple is stored with t.object_uri as subject (line 158). But subjects.add(t.subject_uri) at line 148 only tracks subject_uri. The object_uri-keyed triple is never in _resource_subjects and leaks on resource removal.

Fix: add subjects.add(t.object_uri) after line 161.
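A rough sketch of the tracking change, assuming triples expose `subject_uri`, `object_uri`, and `object_value` as in the diff. The `Triple` tuple and `collect_subjects` helper are hypothetical stand-ins, not the PR's actual types.

```python
from typing import Iterable, NamedTuple, Optional, Set


class Triple(NamedTuple):
    """Hypothetical reduced stand-in for UKOTriple."""

    subject_uri: str
    object_uri: Optional[str] = None
    object_value: Optional[str] = None


def collect_subjects(triples: Iterable[Triple]) -> Set[str]:
    """Return every URI that ends up as a triple subject in the graph."""
    subjects: Set[str] = set()
    for t in triples:
        subjects.add(t.subject_uri)
        if t.object_uri and t.object_value:
            # The rdfs:label triple is keyed by object_uri, so it must be
            # tracked too or it survives resource removal.
            subjects.add(t.object_uri)
    return subjects
```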


src/cleveragents/domain/models/acms/analyzers.py Outdated

brent.edwards commented

#25 · P2 — PurePosixPath(location).suffix preserves case. A file FOO.PY yields .PY which won't match .py in the registry. This silently skips indexing on case-insensitive filesystems.

Fix: ext = PurePosixPath(location).suffix.lower()
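A small illustration of the mismatch and the fix. The `REGISTRY` mapping and `lookup_domain` function are hypothetical stand-ins for the analyzer registry lookup.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical registry keyed by lowercase extension.
REGISTRY = {".py": "python"}


def lookup_domain(location: str) -> Optional[str]:
    """Look up an analyzer domain by case-normalized file extension."""
    ext = PurePosixPath(location).suffix.lower()
    return REGISTRY.get(ext)
```

Without `.lower()`, `PurePosixPath("src/FOO.PY").suffix` is `".PY"` and the lookup misses.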


src/cleveragents/domain/models/acms/index_backends.py Outdated

						
				@@ -0,0 +46,4 @@

				    char_count: int = 0

				    def __post_init__(self) -> None:

				        if not self.project:

brent.edwards commented

#22 · P2 — if not self.project doesn't catch whitespace-only strings like " ". The protocol docstrings say "empty or whitespace-only" should be rejected. The stubs' _require_non_empty helper correctly uses .strip() but these dataclasses don't.

Fix: if not self.project or not self.project.strip():
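A minimal sketch of the stricter guard. `TextDocument` is a hypothetical dataclass name; only the `project` and `char_count` fields are taken from the diff.

```python
from dataclasses import dataclass


@dataclass
class TextDocument:
    """Hypothetical dataclass mirroring the fields shown in the diff."""

    project: str
    char_count: int = 0

    def __post_init__(self) -> None:
        # .strip() makes whitespace-only values fail like empty ones.
        if not self.project or not self.project.strip():
            raise ValueError("project must not be empty or whitespace-only")
```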


brent.edwards requested changes 2026-03-10 19:29:12 +00:00

Dismissed

brent.edwards left a comment

Third-Pass Exhaustive Review — PR #612

Ran 7 specialized parallel deep-reviews (cross-file data flow, concurrency, security, protocol compliance, state machine, test coverage gaps, API contracts). After deduplication against the 28 findings from the initial + supplemental reviews, 32 genuinely new findings remain.

P1:must-fix (2 new)

#29 · P1 — Deadlock: fire_on_indexed/fire_on_error called inside per-resource lock (uko_indexer.py:225,234,249,322)

_index_resource_core calls fire_on_indexed (lines 225, 322) and fire_on_error (lines 234, 249) while the caller holds res_lock (a non-reentrant threading.Lock). If a custom IndexLifecycleHook.on_indexed or on_error calls back into index_resource/reindex_resource/remove_resource for the same resource_id, the thread deadlocks permanently. The protocol docstring places no restriction on what hooks may do.

Note: this is the opposite of known #13 (fire_on_removed outside lock). Calling hooks inside the lock and calling them outside it are each problematic in their own way.

Fix: Move all lifecycle hook calls outside with res_lock:, or use RLock, or document the re-entrance restriction in the protocol.

#30 · P1 — Analyzer-produced URIs pass unsanitized to graph backend (uko_indexer_internals.py:124-148)

index_graph passes t.subject_uri, t.predicate, and t.object_uri/t.object_value from analyzer output directly to graph_backend.add_triple(). For in-memory stubs this is harmless, but with a real SPARQL/Cypher backend, a malicious custom analyzer can inject:

UKOTriple(subject_uri='x"> . } DELETE WHERE { ?s ?p ?o } #', ...)

The GraphIndexBackend.query() docstring correctly warns about sanitization, but add_triple() has no equivalent guidance, and the indexer performs no URI-format validation.

Fix: Validate URI format (e.g., no angle brackets, braces, newlines) before passing to backend, or add mandatory sanitization guidance to the add_triple protocol docstring.

P2:should-fix (18 new)

#31 · P2 — Container wires empty AnalyzerRegistry (container.py diff)

AnalyzerRegistry() is created with zero registered analyzers. No code in the codebase calls register() on the container-provided instance (grep confirmed). UKOIndexer.index_resource therefore always hits the no-analyzer early return (lines 219-226), making the indexer functionally a no-op.

#32 · P2 — _handle_fs_event Path.resolve() unguarded (resource_file_watcher.py:305,330)

Path.resolve() can raise OSError (permission denied, broken mount). Neither the handler nor the watchdog dispatch wraps it. The event is silently lost (or crashes the observer on older watchdog versions).

#33 · P2 — RESOURCE_MODIFIED emitted for all change types including deletion (resource_file_watcher.py:401-402)

_fire_change always emits EventType.RESOURCE_MODIFIED. For FileDeletedEvent, subscribers may attempt to re-read a deleted file. The change_type is buried in details but EventBus.subscribe filters by event_type, so subscribers can't exclude deletions.

#34 · P2 — Timer race: expired _fire_change steals replacement timer's entry (resource_file_watcher.py:334-343,369-370)

Scenario: Timer T1 for path P fires but hasn't acquired the lock yet. New event for P creates T2, cancels T1 (no-op — already fired), stores T2 at _pending_timers[P]. T1 now acquires lock, pop(P) steals T2's entry. Both T1 and T2 fire. This is distinct from known #16 (move-specific) — occurs on normal FileModifiedEvent sequences.

#35 · P2 — stop()/start() race creates two concurrent observers (resource_file_watcher.py:253-268,220-242)

stop() releases lock before observer.join(). start() can acquire the lock, see _running=False, and create a new observer while the old one is still running. Both deliver events.

#36 · P2 — Per-resource lock memory leak for failed/untracked resources (uko_indexer.py:143-150,219-226,361-366)

_resource_lock() always creates and stores a Lock. If indexing fails before the tracking dict is populated (no analyzer, content read failure, analyzer error), or if remove_resource is called for a never-indexed resource, the lock is never cleaned up. Repeated calls accumulate unbounded orphan locks.

#37 · P2 — Watchdog OS ops called under lock (resource_file_watcher.py:155-160,192,232-240)

observer.schedule(), observer.unschedule(), and observer.start() are OS-level operations that can block (inotify syscalls on loaded systems). Calling them inside self._lock serializes all watcher operations behind these potentially slow calls.

#38 · P2 — max_triples cap applied after full materialization (uko_indexer.py:244,257-263)

analyzer.analyze() returns the complete list before truncation. A pathological analyzer returning hundreds of millions of triples causes OOM before the cap at line 257 is reached.

#39 · P2 — FailingTextBackend mock missing rebuild_index (features/mocks/uko_indexer_mocks.py:107-132)

TextIndexBackend is @runtime_checkable and requires rebuild_index. The mock omits it, so isinstance(FailingTextBackend(), TextIndexBackend) returns False.
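The isinstance behaviour can be reproduced with a reduced protocol. In this sketch only `rebuild_index` comes from the PR; `index_document` and both mock class names are assumptions made for illustration.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class TextIndexBackend(Protocol):
    """Reduced, hypothetical version of the protocol (two methods only)."""

    def index_document(self, project: str, doc_id: str, text: str) -> None: ...

    def rebuild_index(self, project: str) -> None: ...


class IncompleteMock:
    """Omits rebuild_index, like the mock described above."""

    def index_document(self, project: str, doc_id: str, text: str) -> None:
        raise RuntimeError("simulated failure")


class CompleteMock(IncompleteMock):
    """Adds the missing method so the runtime check passes."""

    def rebuild_index(self, project: str) -> None:
        raise RuntimeError("simulated failure")
```

A runtime-checkable protocol only verifies that every member exists, so adding a stub `rebuild_index` is enough to make the mock pass isinstance checks.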

#40 · P2 — ContentReader protocol documents only OSError; impl raises ValueError (uko_indexer_protocols.py:42-54 vs 172-236)

The protocol's Raises section only mentions OSError. LocationContentReader raises ValueError in 4 cases. The consumer catches both (line 231). Third-party readers written against the protocol wouldn't know ValueError is expected.

Fix: Add ValueError to protocol docstring, or wrap validation failures in OSError.

#41 · P2 — ProvenanceMetadata/IndexResult lack str_strip_whitespace (provenance.py:70,146)

Both use min_length=1 without str_strip_whitespace=True. A single space " " passes min_length=1. Contrast with UKOTriple which correctly uses str_strip_whitespace=True. Different from known #22 (dataclass __post_init__ issue) — this is Pydantic min_length + missing strip.

#42 · P2 — watch() silently overwrites resource_id (resource_file_watcher.py:148)

Calling watch(path, resource_id="R1") then watch(path, resource_id="R2") silently replaces R1's entry. R1 loses all notifications without warning.

#43 · P2 — IndexResult doesn't indicate max_triples truncation (uko_indexer.py:257-263,314-321)

When triples are capped, the warning log exists but IndexResult has no field/flag/error indicating truncation occurred. Programmatic consumers can't distinguish "produced 50K" from "produced 200K, 150K dropped."

#44 · P2 — remove_resource docstring is factually wrong (uko_indexer.py:339-343)

Docstring: "Uses the caller's project for backend cleanup." Code (line 369): always uses stored_project, never the caller's project. The first sentence is false.

#45 · P2 — ProvenanceMetadata.is_current never set to False (provenance.py:44-45)

Docstring: "Set to False when a resource is removed or superseded." No code path in the codebase ever sets it to False (confirmed via grep). _remove_resource_internal deletes triples rather than marking them. The field is dead weight.

#46 · P2 — on_indexed fires for no-analyzer case (uko_indexer.py:221-226)

When no analyzer matches, fire_on_indexed is called with triple_count=0, analyzer_domain="none". The on_indexed docstring says "Called after a resource is successfully indexed". Skipping is not indexing — misleading for custom hooks.

#47 · P2 — TextIndexBackend.rebuild_index never called (index_backends.py:159-171)

Required protocol method with destructive semantics (drops and recreates index). No caller exists. Imposes implementation burden for dead functionality.

#48 · P2 — FailingGraphBackend.remove_triples is no-op (features/mocks/uko_indexer_mocks.py:91-98)

Test mock's remove_triples is pass — never raises. The entire removal error path in _remove_resource_internal (lines 408-433) is untested. Regressions in the try/except structure would be invisible.

P3:nit (12 new)

#49 · P3 — remove_resource logs caller's project but uses stored_project — misleading structured logs (uko_indexer.py:358,369)

#50 · P3 — indexed_resource_count transiently dips during reindex_resource (uko_indexer.py:130,443-447,308-312)

#51 · P3 — Removal errors include raw str(exc) (file paths, connection strings) vs indexing path which redacts to type(exc).__name__ (uko_indexer.py:422,428,433 vs uko_indexer_internals.py:137,207,250)

#52 · P3 — text_backend, vector_backend, content_reader, lifecycle_hook skip isinstance checks in constructor; analyzer_registry and graph_backend are checked (uko_indexer.py:77-94)

#53 · P3 — Empty-string resource.location (not None) resolves to CWD via Path("").resolve() with confusing error message (uko_indexer_protocols.py:187-215)

#54 · P3 — UKOTriple.confidence is silently discarded — never passed to graph_backend.add_triple or stored anywhere (uko_indexer_internals.py:124-148)

#55 · P3 — DEFAULT_MAX_CONTENT_SIZE missing from uko_indexer_protocols.__all__ (uko_indexer_protocols.py:138,251-256)

#56 · P3 — analyzers.py is the only new module without __all__ declaration

#57 · P3 — max_triples < 1 constructor guard untested (uko_indexer.py:95-96)

#58 · P3 — No concurrency tests despite "All public methods are thread-safe" docstring (uko_indexer.py:56-64)

#59 · P3 — No test covers FileMovedEvent followed by FileModifiedEvent to verify debounce coherence

#60 · P3 — index_graph dual-object branch (object_uri + object_value → rdfs:label) never exercised by any test (uko_indexer_internals.py:154-163)

Cumulative Tally (all three reviews combined)

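| Severity | Initial | Supplemental | This review | Total |
|----------|---------|--------------|-------------|-------|
| P0:blocker | 0 | 0 | 0 | 0 |
| P1:must-fix | 3 | 0 | 2 | 5 |
| P2:should-fix | 2 | 7 | 18 | 27 |
| P3:nit | 4 | 12 | 12 | 28 |
| Total | 9 | 19 | 32 | 60 |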

The 5 P1s that must be fixed before merge:

1. DI wiring: read-side stubs as write-side backends (initial #1)
2. TOCTOU: fd closed then path re-opened in LocationContentReader (initial #2)
3. Per-resource lock deleted while caller still holds it (initial #3)
4. Deadlock: lifecycle hooks called inside per-resource lock (this review #29)
5. Analyzer URIs unsanitized → injection risk (this review #30)


src/cleveragents/application/services/resource_file_watcher.py Outdated

						
				@@ -0,0 +259,4 @@

				            observer = self._observer

				            self._observer = None

				            self._dir_watches.clear()

				            self._running = False

brent.edwards commented

#35 · P2 — stop()/start() race creates two concurrent observers

stop() releases self._lock at line 262 before observer.join() at line 267. start() can acquire the lock, see _running=False, and create a new observer while the old one is still running. Both deliver events.

Fix: add a _stopping sentinel checked by start(), or hold the lock across join (with appropriate deadlock prevention).
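One way the `_stopping` sentinel might look. This is a sketch under assumptions: `_FakeObserver` stands in for the watchdog observer, and the `Watcher` class is a hypothetical reduction of the real watcher to its start/stop logic.

```python
import threading


class _FakeObserver(threading.Thread):
    """Hypothetical stand-in for the watchdog observer thread."""

    def __init__(self) -> None:
        super().__init__(daemon=True)
        self._stop_event = threading.Event()

    def run(self) -> None:
        self._stop_event.wait()

    def stop(self) -> None:
        self._stop_event.set()


class Watcher:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._running = False
        self._stopping = False
        self._observer = None

    def start(self) -> bool:
        with self._lock:
            # Refuse to start while a previous observer is still joining.
            if self._running or self._stopping:
                return False
            self._observer = _FakeObserver()
            self._observer.start()
            self._running = True
            return True

    def stop(self) -> None:
        with self._lock:
            if not self._running:
                return
            observer = self._observer
            self._observer = None
            self._running = False
            self._stopping = True
        try:
            # Join outside the lock so event handlers cannot deadlock on it.
            observer.stop()
            observer.join()
        finally:
            with self._lock:
                self._stopping = False
```

In this variant `start()` returns `False` during shutdown instead of blocking; whether callers should retry or wait is a design choice the sketch leaves open.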


src/cleveragents/application/services/resource_file_watcher.py Outdated

						
				@@ -0,0 +340,4 @@

				                args=(src_path, resource_id, project, change_type, dest_path),

				            )

				            timer.daemon = True

				            self._pending_timers[src_path] = timer

brent.edwards commented

#34 · P2 — Timer race: expired _fire_change steals replacement timer's entry

Scenario: Timer T1 for path P fires (hasn't acquired lock yet). New event for P creates T2, cancels T1 (no-op), stores T2 here. T1 acquires lock, pop(P) steals T2's entry. Both fire.

Distinct from #16 (move-specific) — occurs on normal FileModifiedEvent sequences.

Fix: use a generation counter to verify the executing timer is the currently registered one.
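A sketch of the generation-counter idea, assuming a per-path debounce built on `threading.Timer`. The `Debouncer` class and its method names are hypothetical, not the watcher's real API.

```python
import threading
from typing import Callable, Dict, Tuple


class Debouncer:
    """Per-path debounce where stale timers detect they were superseded."""

    def __init__(self, delay: float, callback: Callable[[str], None]) -> None:
        self._delay = delay
        self._callback = callback
        self._lock = threading.Lock()
        self._generation = 0
        # path -> (generation, timer) for the currently registered timer
        self._pending: Dict[str, Tuple[int, threading.Timer]] = {}

    def notify(self, path: str) -> None:
        with self._lock:
            old = self._pending.get(path)
            if old is not None:
                old[1].cancel()  # may be a no-op if it already fired
            self._generation += 1
            gen = self._generation
            timer = threading.Timer(self._delay, self._fire, args=(path, gen))
            timer.daemon = True
            self._pending[path] = (gen, timer)
            timer.start()

    def _fire(self, path: str, gen: int) -> None:
        with self._lock:
            current = self._pending.get(path)
            if current is None or current[0] != gen:
                return  # a newer timer owns this path; do not steal its entry
            del self._pending[path]
        self._callback(path)
```

A late T1 now finds that its generation does not match the registered one and exits without popping T2's entry.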


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +145,4 @@

				        with self._lock:

				            lock = self._resource_locks.get(resource_id)

				            if lock is None:

				                lock = threading.Lock()

brent.edwards commented

#36 · P2 — Per-resource lock memory leak

_resource_lock() always creates + stores a Lock. If indexing fails before tracking-dict population (no analyzer, content error, analyzer error) or remove_resource is called for a never-indexed resource, the lock is never cleaned up. Repeated calls accumulate unbounded orphan locks.

Fix: clean up the lock when no data was stored, or add periodic sweeps.
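One possible cleanup strategy, sketched with reference counting rather than the "no data stored" check suggested above: the lock entry is dropped once no thread is using it, which also avoids deleting a lock another caller still holds. `ResourceLocks` and its methods are hypothetical names.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator, List


class ResourceLocks:
    """Per-resource locks that are discarded when no thread uses them."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # resource_id -> [lock, number of threads holding or waiting]
        self._entries: Dict[str, List] = {}

    @contextmanager
    def hold(self, resource_id: str) -> Iterator[None]:
        with self._lock:
            entry = self._entries.setdefault(resource_id, [threading.Lock(), 0])
            entry[1] += 1
        try:
            with entry[0]:
                yield
        finally:
            with self._lock:
                entry[1] -= 1
                if entry[1] == 0:
                    # Last user gone: drop the entry so failed or
                    # never-indexed resources leave nothing behind.
                    del self._entries[resource_id]

    def size(self) -> int:
        with self._lock:
            return len(self._entries)
```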


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +222,4 @@

				                resource_id=resource.resource_id,

				                analyzer_domain="none",

				            )

				            fire_on_indexed(self._lifecycle_hook, result)

brent.edwards commented

#29 · P1 — Deadlock on re-entrant lifecycle hooks

_index_resource_core calls fire_on_indexed (here, and line 322) and fire_on_error (lines 234, 249) while the caller holds res_lock — a non-reentrant threading.Lock. If a custom hook calls back into index_resource/remove_resource for the same resource_id, the thread deadlocks permanently.

This is the opposite of known #13 (fire_on_removed outside lock). Both directions are independently dangerous.

Fix: move all fire_on_* calls outside with res_lock:, or use RLock, or document the re-entrance restriction.
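A reduced sketch of the first option (hooks fired after the lock is released). The `Indexer` class, the hook signature, and `reentrant_hook` are hypothetical; the real code uses per-resource locks and `fire_on_*` helpers.

```python
import threading
from typing import Callable, Dict, List


class Indexer:
    """Hypothetical reduction: one non-reentrant lock, one lifecycle hook."""

    def __init__(self, hook: Callable[["Indexer", Dict[str, str]], None]) -> None:
        self._res_lock = threading.Lock()
        self._hook = hook
        self.indexed: List[str] = []

    def index_resource(self, resource_id: str) -> Dict[str, str]:
        with self._res_lock:
            # All state mutation happens under the lock.
            self.indexed.append(resource_id)
            result = {"resource_id": resource_id}
        # Lock released: the hook may call back into index_resource.
        self._hook(self, result)
        return result


def reentrant_hook(indexer: Indexer, result: Dict[str, str]) -> None:
    """Hook that re-enters once for the same resource."""
    if len(indexer.indexed) < 2:
        indexer.index_resource(result["resource_id"])
```

Had the hook been invoked inside `with self._res_lock:`, the nested call would block forever on the non-reentrant lock.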


src/cleveragents/application/services/uko_indexer.py Outdated

						
				@@ -0,0 +241,4 @@

				        # 3. Produce UKO triples (CPU — outside global lock)

				        resource_uri = f"uko://resource/{resource.resource_id}"

				        try:

				            triples = analyzer.analyze(content, resource_uri)

brent.edwards commented

#38 · P2 — max_triples cap after full materialization

analyzer.analyze() returns the complete list before truncation at line 257. A pathological analyzer returning hundreds of millions of triples causes OOM before the cap runs.

Fix: pass max_triples to analyzers, or wrap analyze() output with a counting iterator that stops early.
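A sketch of the early-stopping wrapper, assuming analyzers can be changed to yield triples lazily (a wrapper cannot help if `analyze()` already built a full list). `cap_triples` is a hypothetical helper; its second return value is a truncation flag that callers could surface.

```python
from itertools import islice
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")
_SENTINEL = object()


def cap_triples(triples: Iterable[T], max_triples: int) -> Tuple[List[T], bool]:
    """Consume at most max_triples items and report whether more existed."""
    it = iter(triples)
    kept = list(islice(it, max_triples))
    # Pull one extra item only to detect truncation; it is discarded.
    truncated = next(it, _SENTINEL) is not _SENTINEL
    return kept, truncated
```

At most `max_triples + 1` items are ever pulled from the analyzer, so memory stays bounded even for an unbounded generator.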


src/cleveragents/application/services/uko_indexer_internals.py Outdated

						
				@@ -0,0 +127,4 @@

				        if not obj:

				            continue

				        try:

				            graph_backend.add_triple(

brent.edwards commented

#30 · P1 — Analyzer URIs pass unsanitized to graph backend

t.subject_uri, t.predicate, and obj flow directly from analyzer output to graph_backend.add_triple() with no URI-format validation. A malicious custom analyzer can inject SPARQL/Cypher payloads:

UKOTriple(subject_uri='x"> . } DELETE WHERE { ?s ?p ?o } #', ...)

GraphIndexBackend.query() warns about sanitization, but add_triple() has no equivalent guidance.

Fix: validate URI format before calling backend, or mandate sanitization in the add_triple contract.
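A minimal sketch of a pre-backend check, assuming a deny-list is acceptable. The character set follows, to my understanding, what SPARQL excludes from an IRI reference (angle brackets, double quote, braces, pipe, caret, backtick, backslash, and control characters or space); `is_safe_uri` is a hypothetical helper name.

```python
import re

# Characters that must not appear inside a URI passed to the graph backend.
_FORBIDDEN = re.compile(r'[\x00-\x20<>"{}|\\^`]')


def is_safe_uri(uri: str) -> bool:
    """Return True if the URI is non-empty and free of forbidden characters."""
    return bool(uri) and _FORBIDDEN.search(uri) is None
```

This covers `subject_uri`, `predicate`, and `object_uri` only. Literal `object_value` strings would still need escaping or parameter binding in the backend.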


src/cleveragents/domain/models/acms/provenance.py Outdated

						
				@@ -0,0 +67,4 @@

				        description="Whether this triple reflects the latest state.",

				    )

				    model_config = ConfigDict(frozen=True)

brent.edwards commented