UAT: UKOInferenceEngine sibling inference only infers first pair per type — O(n²) avoidance silently drops most sibling relationships #5875

Open
opened 2026-04-09 11:17:37 +00:00 by HAL9000 · 0 comments
Owner

Bug Report

What Was Tested

UKOInferenceEngine.infer() in src/cleveragents/application/services/uko_inference.py — specifically the co-occurrence (sibling) inference rule.

Expected Behavior (from spec)

Per docs/specification.md §45521:

UKOInferenceEngine — Semantic analysis service that infers implicit relationships from UKO triples produced by domain analyzers. Infers three relationship types: uko:implicitSiblingOf (co-occurrence), uko:implicitContains (URI prefix containment), and uko:implicitDependsOn (URI reference in object values).

The spec describes uko:implicitSiblingOf as a co-occurrence relationship — subjects that share the same type are siblings. This implies all pairs of subjects sharing a type should be related.

Actual Behavior

The sibling inference code (lines 96–116) only infers the relationship for the first two subjects sharing each type:

for _type_uri, type_subjects in type_to_subjects.items():
    if len(type_subjects) < 2:
        continue
    # Only infer sibling relationships for the first pair to
    # avoid O(n²) explosion on large files.
    s1, s2 = type_subjects[0], type_subjects[1]
    if s1 != s2:
        inferred.append(
            UKOTriple(
                subject_uri=s1,
                predicate=_PRED_SIBLING,
                object_uri=s2,
                confidence=_IMPLICIT_CONFIDENCE,
            )
        )

Example: If a Python file has 10 classes (PythonClass type), only the first two classes get a uko:implicitSiblingOf relationship: one triple out of the 45 possible unordered pairs. The remaining 8 classes have no sibling relationships inferred, even though they all share the same type.

The comment acknowledges this: "Only infer sibling relationships for the first pair to avoid O(n²) explosion on large files."
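The behavior can be reproduced in isolation with a minimal sketch. The UKOTriple dataclass below is a stand-in built from the field names in the excerpt above, the value of _IMPLICIT_CONFIDENCE is a placeholder, and infer_first_pair and the example URIs are hypothetical names used only for illustration. The real definitions in uko_inference.py may differ.

```python
from dataclasses import dataclass

# Stand-ins for names used in uko_inference.py; the real definitions may differ.
_PRED_SIBLING = "uko:implicitSiblingOf"
_IMPLICIT_CONFIDENCE = 0.5  # placeholder value, not taken from the codebase


@dataclass(frozen=True)
class UKOTriple:
    subject_uri: str
    predicate: str
    object_uri: str
    confidence: float


def infer_first_pair(type_to_subjects: dict[str, list[str]]) -> list[UKOTriple]:
    """Mirror the first-pair logic quoted in this report."""
    inferred = []
    for _type_uri, type_subjects in type_to_subjects.items():
        if len(type_subjects) < 2:
            continue
        s1, s2 = type_subjects[0], type_subjects[1]
        if s1 != s2:
            inferred.append(UKOTriple(s1, _PRED_SIBLING, s2, _IMPLICIT_CONFIDENCE))
    return inferred


# Ten subjects that share one type, as in the example above.
classes = [f"file.py#Class{i}" for i in range(10)]
triples = infer_first_pair({"PythonClass": classes})

# Subjects that appear in at least one inferred sibling triple.
covered = {t.subject_uri for t in triples} | {t.object_uri for t in triples}
print(len(triples), len(classes) - len(covered))  # prints "1 8"
```

One triple is produced and eight of the ten subjects appear in no sibling triple at all.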

Impact

  • For any file with more than two subjects of the same type, all but one sibling relationship per type is silently dropped; large Python files with many classes/functions are affected most
  • The breadth-depth-navigator context strategy relies on sibling relationships to discover related code — missing siblings means incomplete context assembly
  • The spec does not document this limitation; users and strategies expect all co-occurring siblings to be inferred

Suggested Fix

Options:

  1. Infer all pairs: n(n−1)/2 triples for a type with n subjects. Acceptable for small files, but the count grows quadratically (4,950 triples for 100 subjects of one type)
  2. Infer a hub-and-spoke pattern: pick one representative subject per type and link every other subject to it (n−1 triples; all subjects of the type stay connected through the hub)
  3. Document the limitation in the spec and add a max_siblings_per_type configuration parameter
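As a rough illustration of options 1 and 3 combined, the sketch below generates all unordered pairs and optionally caps the number of subjects considered per type. The issue does not define the semantics of max_siblings_per_type; treating it as "use only the first k subjects" is an assumption, and sibling_pairs is a hypothetical helper, not an existing function in the codebase.

```python
from itertools import combinations
from typing import Optional


def sibling_pairs(
    subjects: list[str],
    max_siblings_per_type: Optional[int] = None,
) -> list[tuple[str, str]]:
    """Return unordered sibling pairs for the subjects of one type.

    Assumed semantics: when max_siblings_per_type is set, only the first
    k distinct subjects are paired, which bounds output at k(k-1)/2.
    """
    # Drop duplicate URIs while preserving first-seen order.
    unique = list(dict.fromkeys(subjects))
    if max_siblings_per_type is not None:
        unique = unique[:max_siblings_per_type]
    return list(combinations(unique, 2))


classes = [f"file.py#Class{i}" for i in range(10)]
print(len(sibling_pairs(classes)))     # 45 pairs with no cap
print(len(sibling_pairs(classes, 4)))  # 6 pairs when capped at 4 subjects
```

With a cap in place, subjects beyond the first k still get no sibling relationships, so the limitation would need to be documented as option 3 proposes.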

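The hub-and-spoke approach (option 2) would produce O(n) triples while keeping every subject of a type within two hops of every other. This preserves full sibling reachability only if consumers such as breadth-depth-navigator treat uko:implicitSiblingOf as symmetric and follow it transitively; the spec excerpt above does not say whether they do, so that should be confirmed before choosing this option.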
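A minimal sketch of the hub-and-spoke idea, using plain (subject, object) tuples instead of UKOTriple so that it runs standalone. The helper names hub_and_spoke and reachable are hypothetical, and choosing the first subject as the hub is an arbitrary assumption. The reachability check treats edges as undirected, which matches the intended behavior only if consumers read the sibling predicate as symmetric.

```python
from collections import defaultdict, deque


def hub_and_spoke(subjects: list[str]) -> list[tuple[str, str]]:
    """Link the first distinct subject (the hub) to every other subject."""
    unique = list(dict.fromkeys(subjects))  # dedupe, keep first-seen order
    if len(unique) < 2:
        return []
    hub, *spokes = unique
    return [(hub, spoke) for spoke in spokes]


def reachable(edges: list[tuple[str, str]], start: str) -> set[str]:
    """Breadth-first traversal that treats each edge as undirected."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node] - seen:
            seen.add(neighbour)
            queue.append(neighbour)
    return seen


classes = [f"file.py#Class{i}" for i in range(10)]
edges = hub_and_spoke(classes)
print(len(edges))                                # 9 edges for 10 subjects
print(len(reachable(edges, "file.py#Class7")))   # 10: every class is reachable
```

In the real engine each tuple would presumably be wrapped in a UKOTriple with _PRED_SIBLING and _IMPLICIT_CONFIDENCE, as in the existing code. A consumer that follows only direct, one-directional edges would still see siblings only from the hub, which is the trade-off against inferring all pairs.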

Code Location

  • src/cleveragents/application/services/uko_inference.py, lines 96–116

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

Reference
cleveragents/cleveragents-core#5875