UAT: UKO vector indexer uses a constant placeholder embedding [1.0] — real embedding model is never called #4143

Open
opened 2026-04-06 10:48:21 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/uko-real-embedding-model
  • Commit Message: fix(acms): integrate real embedding model into UKO vector indexer
  • Milestone: (none — backlog)
  • Parent Epic: #539 (ACMS v1 + Context Scaling M5)

Bug Report

What Was Tested

The UKO (Universal Knowledge Ontology) vector indexer's embedding generation for semantic context search.

Expected Behavior (from spec)

Per docs/specification.md §UKO:

An RDF-based, inheritance-driven ontology representing resources at multiple abstraction levels with provenance and temporal versioning. Semantically aware — implicit relationships are inferred from content analysis.

The UKO vector indexer should generate real semantic embeddings from resource content, enabling meaningful semantic similarity search across the knowledge graph.

Actual Behavior

index_vector() in src/cleveragents/application/services/uko_indexer_internals.py (line 333) uses a constant placeholder embedding instead of calling a real embedding model:

def index_vector(
    vector_backend: VectorIndexBackend | None,
    project: str,
    resource: Resource,
    content: str,
    errors: list[str],
    resource_uri: str,
) -> int:
    """Index resource embedding in the vector backend."""
    if vector_backend is None:
        return 0
    # TODO(#578): integrate real embedding model — placeholder vector
    # avoids leaking content size metadata by using a constant.
    # See docs/reference/uko_indexer.md § Known Limitations.
    placeholder_embedding = [1.0]  # ← CONSTANT PLACEHOLDER
    try:
        vector_backend.index_embedding(
            project,
            resource_uri,
            placeholder_embedding,  # ← Always [1.0] regardless of content
            ...
        )

The placeholder [1.0] is a one-dimensional constant vector that:

  1. Does not encode any semantic information from the resource content
  2. Makes all resources appear identical in vector space (distance = 0 between all resources)
  3. Renders semantic similarity search completely non-functional
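The three points above can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the repository: the sample strings and the helper names `cosine_similarity` and `euclidean_distance` are made up for illustration. It shows that when every resource is stored as [1.0], every pairwise score is the same, so a ranking carries no information.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


PLACEHOLDER = [1.0]

# Hypothetical resource contents; the placeholder ignores them entirely.
contents = [
    "def parse(): ...",
    "Quarterly revenue report",
    "README for the indexer",
]
embeddings = {text: list(PLACEHOLDER) for text in contents}

# Any query is embedded the same way, so it matches everything equally.
query = list(PLACEHOLDER)
scores = {text: cosine_similarity(query, vec) for text, vec in embeddings.items()}
distances = {text: euclidean_distance(query, vec) for text, vec in embeddings.items()}

for text in contents:
    print(f"{text!r}: similarity={scores[text]}, distance={distances[text]}")
```

Because every score ties, whichever order a backend returns is an artifact of storage order or tie-breaking, not of content.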

Impact

  • Semantic context search (ACMS vector strategy) returns meaningless results
  • All resources have identical embeddings, so similarity search cannot distinguish between them
  • The "semantically aware" UKO feature described in the spec is non-functional
  • The acms_search_by_vector() method in VectorStoreService returns arbitrary results

Code Location

  • src/cleveragents/application/services/uko_indexer_internals.py: index_vector() function, line 333
  • TODO comment references issue #578

Steps to Reproduce

  1. Index two resources with different content via the UKO indexer
  2. Query for semantic similarity between them
  3. Observe that all resources have identical embeddings and similarity scores are meaningless
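The reproduction steps can be approximated without the real services. This is a sketch under stated assumptions: `RecordingBackend` is a hypothetical stand-in for VectorIndexBackend with a simplified `index_embedding` signature (the real call takes further arguments that are elided in this report), and `index_with_placeholder` only mimics the behaviour described above.

```python
class RecordingBackend:
    """Hypothetical stand-in for VectorIndexBackend; records what gets indexed."""

    def __init__(self):
        self.stored = {}

    def index_embedding(self, project, resource_uri, embedding):
        # Simplified signature: the real backend accepts more arguments.
        self.stored[resource_uri] = list(embedding)


def index_with_placeholder(backend, project, resource_uri, content):
    """Mimics the reported behaviour: `content` is never used."""
    placeholder_embedding = [1.0]
    backend.index_embedding(project, resource_uri, placeholder_embedding)


backend = RecordingBackend()

# Step 1: index two resources with clearly different content.
index_with_placeholder(backend, "demo", "uko://doc/a", "How to configure logging")
index_with_placeholder(backend, "demo", "uko://doc/b", "Recipe for sourdough bread")

# Steps 2 and 3: compare what was stored for each resource.
identical = backend.stored["uko://doc/a"] == backend.stored["uko://doc/b"]
print("embeddings identical:", identical)
```

A regression test for the fix could assert the opposite condition against the real indexer.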

Subtasks

  • Integrate the configured embedding model (from index.embedding.provider config) into index_vector()
  • Use VectorStoreService._create_acms_embeddings() or equivalent to generate real embeddings
  • Remove the placeholder_embedding = [1.0] constant
  • Add BDD scenario verifying that different content produces different embeddings
  • Update docs/reference/uko_indexer.md to remove the "Known Limitations" note
  • Run nox (all default sessions), fix any errors
  • Verify coverage >= 97% via nox -s coverage_report
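One possible shape for the first three subtasks is to pass an embedding callable into the indexing function. The sketch below is illustrative only: `index_vector_sketch`, `InMemoryBackend`, `Embedder`, and `toy_embedder` are hypothetical names, the backend signature is simplified, and returning 1 on success is an assumption about what the int return value means. The toy embedder is a hashed bag-of-words used purely so the example runs; it is not semantic and is not a substitute for the configured model or for `VectorStoreService._create_acms_embeddings()`, whose signature is not shown in this report.

```python
import hashlib
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical type for whatever produces embeddings from text.
Embedder = Callable[[str], Sequence[float]]


class InMemoryBackend:
    """Hypothetical stand-in for VectorIndexBackend with a simplified signature."""

    def __init__(self) -> None:
        self.stored: Dict[str, List[float]] = {}

    def index_embedding(
        self, project: str, resource_uri: str, embedding: Sequence[float]
    ) -> None:
        self.stored[resource_uri] = list(embedding)


def toy_embedder(text: str, dim: int = 16) -> List[float]:
    """Toy hashed bag-of-words vector. NOT semantic; for illustration only."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def index_vector_sketch(
    backend: Optional[InMemoryBackend],
    project: str,
    content: str,
    errors: List[str],
    resource_uri: str,
    embed: Embedder,
) -> int:
    """Embed `content` and index it; return 1 on success, 0 otherwise (assumed)."""
    if backend is None:
        return 0
    try:
        embedding = embed(content)  # derived from content, not a constant
        backend.index_embedding(project, resource_uri, embedding)
    except Exception as exc:  # collect the failure instead of raising
        errors.append(f"vector indexing failed for {resource_uri}: {exc}")
        return 0
    return 1


backend = InMemoryBackend()
errors: List[str] = []
docs = {
    "uko://doc/a": "configure logging handlers",
    "uko://doc/b": "bake sourdough bread at home",
}
indexed = sum(
    index_vector_sketch(backend, "demo", text, errors, uri, toy_embedder)
    for uri, text in docs.items()
)
print(indexed, errors)
print(backend.stored["uko://doc/a"] != backend.stored["uko://doc/b"])
```

Injecting the embedder keeps the indexing function testable with a deterministic stub, which is one way the BDD scenario "different content produces different embeddings" could be exercised without loading a real model.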

Definition of Done

  • index_vector() generates real embeddings from resource content
  • Different resources produce different embedding vectors
  • Semantic similarity search returns meaningful results
  • A Git commit is created where the first line matches the Commit Message in Metadata exactly, followed by a blank line, then additional details
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done
  • All nox stages pass
  • Coverage >= 97%

Backlog note: This issue was discovered during autonomous operation on milestone M5 (ACMS v1 + Context Scaling). It does not block milestone completion and has been placed in the backlog for human review and future milestone assignment.

The TODO comment references issue #578. The placeholder was intentional during early development but needs to be replaced with real embeddings.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-new-issue-creator

freemo added this to the v3.4.0 milestone 2026-04-06 18:07:12 +00:00
Author
Owner

Milestone Triage Decision: Moved to Backlog

This CLI enhancement issue has been moved out of v3.3.0 during aggressive milestone triage. While useful for user experience, it does not relate to the core focus of Corrections + Subplans + Checkpoints.

Reasoning:

  • v3.3.0 focus: Essential corrections, subplan management, and checkpoint functionality
  • This issue: CLI error handling enhancement - user experience improvement
  • Impact: UX enhancement, not core corrections/subplans/checkpoints functionality

Will be addressed in a future milestone focused on CLI polish and user experience enhancements.

freemo removed this from the v3.4.0 milestone 2026-04-06 20:39:48 +00:00

Reference
cleveragents/cleveragents-core#4143