UAT: UKO vector embeddings use constant placeholder [1.0] — semantic ACMS strategies non-functional #6335

Open
opened 2026-04-09 20:11:19 +00:00 by HAL9000 · 0 comments
Owner

Bug Report

Spec Reference

docs/specification.md — ACMS > UKO > Real-time Index Synchronization; ACMS Context Assembly Pipeline

The spec defines the ACMS pipeline with multiple context strategies including semantic/vector-based retrieval. The UKO indexer is required to produce embeddings that allow semantic similarity queries across indexed resources.

Code Location

File: /app/src/cleveragents/application/services/uko_indexer_internals.py
Lines: 333–341 (index_vector function)

def index_vector(...) -> int:
    ...
    # TODO(#578): integrate real embedding model — placeholder vector
    # avoids leaking content size metadata by using a constant.
    # See docs/reference/uko_indexer.md § Known Limitations.
    placeholder_embedding = [1.0]
    try:
        vector_backend.index_embedding(
            project,
            resource_uri,
            placeholder_embedding,   # ← always [1.0] regardless of content
            ...
        )

Finding

Every resource indexed by UKOIndexer.index_resource() stores the identical embedding vector [1.0] into the VectorIndexBackend, completely independent of the resource's actual content. The TODO references issue #578 for real embedding model integration, which has not been implemented.

Impact

All ACMS context strategies that rely on vector similarity (semantic search) are non-functional:

  • Querying the vector backend for "resources similar to X" returns the same similarity score for every resource because every embedding is [1.0], so any resulting ranking is arbitrary.
  • The VectorIndexBackend is populated (the has_vector_backend property returns True and IndexResult.embeddings_indexed is incremented), giving a false impression that semantic search is available.
  • Any ContextStrategy that calls vector_backend.query_similar() will receive meaningless results.
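To make the degeneracy concrete, here is a minimal sketch. It assumes the backend ranks by cosine similarity and that the query is also a one-dimensional vector; neither is confirmed by this report. The URIs and the `cosine_similarity` helper are hypothetical and are not part of the CleverAgents codebase.

```python
import math


def cosine_similarity(a, b):
    """Plain cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Every resource carries the same placeholder, as in index_vector today.
PLACEHOLDER = [1.0]
index = {
    "file:///README.md": PLACEHOLDER,        # hypothetical URIs
    "file:///src/main.py": PLACEHOLDER,
    "file:///docs/spec.md": PLACEHOLDER,
}

query = [1.0]  # assumption: the query vector has the same single dimension
scores = {uri: cosine_similarity(query, emb) for uri, emb in index.items()}

# All scores are identical, so no resource can be ranked above another.
print(scores)
```

Under these assumptions every score is the same, which is why a strategy built on `vector_backend.query_similar()` cannot distinguish relevant resources from irrelevant ones.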

Steps to Reproduce

  1. Index a repository: agents repo index local/my-repo
  2. Observe that IndexResult.embeddings_indexed > 0 (backend appears populated)
  3. Attempt a semantic similarity query via any ACMS strategy — all resources return identical similarity regardless of content
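Steps 2 and 3 could be turned into an automated check along the following lines. This is only a sketch: it assumes the stored embeddings can be read back as a mapping from resource URI to vector, which this report does not confirm, and `embeddings_are_degenerate` is a hypothetical helper name.

```python
from typing import Mapping, Sequence


def embeddings_are_degenerate(embeddings: Mapping[str, Sequence[float]]) -> bool:
    """Return True when two or more resources all share one identical vector."""
    distinct = {tuple(vec) for vec in embeddings.values()}
    return len(embeddings) > 1 and len(distinct) == 1


# Hypothetical snapshot of what the backend holds after indexing a repository.
stored = {
    "file:///README.md": [1.0],
    "file:///src/main.py": [1.0],
}
print(embeddings_are_degenerate(stored))
```

A check like this would fail for the current placeholder behaviour and pass once embeddings are derived from content.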

Expected

UKOIndexer.index_resource() calls an embedding model (e.g., a local sentence-transformer or a configured LLM embedding endpoint) to produce a content-derived embedding vector before calling vector_backend.index_embedding().
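One possible shape for the expected behaviour is sketched below. Everything here is an assumption made for illustration: the `Embedder` protocol, the toy `HashingEmbedder` (a stand-in for a real model such as a sentence-transformer), the `InMemoryVectorBackend`, and the three-argument `index_embedding` call. The real `index_embedding` signature takes further arguments that are elided in the excerpt above, and the meaning of the `int` return value is assumed to be the number of embeddings indexed.

```python
import hashlib
import math
from typing import Protocol


class Embedder(Protocol):
    """Hypothetical interface for any embedding model."""

    def embed(self, text: str) -> list[float]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: hashes tokens into a fixed-size vector."""

    def __init__(self, dim: int = 64) -> None:
        self.dim = dim

    def embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for token in text.lower().split():
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dim
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        # Normalise to unit length; empty content stays a zero vector.
        return [v / norm for v in vec] if norm else vec


class InMemoryVectorBackend:
    """Hypothetical backend used only to make this sketch runnable."""

    def __init__(self) -> None:
        self.embeddings: dict[tuple[str, str], list[float]] = {}

    def index_embedding(self, project: str, resource_uri: str,
                        embedding: list[float]) -> None:
        self.embeddings[(project, resource_uri)] = list(embedding)


def index_vector_sketch(vector_backend, project: str, resource_uri: str,
                        content: str, embedder: Embedder) -> int:
    """Embed the resource content, then store the content-derived vector."""
    embedding = embedder.embed(content)
    vector_backend.index_embedding(project, resource_uri, embedding)
    return 1  # assumption: the return value counts embeddings indexed


backend = InMemoryVectorBackend()
count = index_vector_sketch(
    backend, "local/my-repo", "file:///README.md",
    "UKO indexer produces embeddings", HashingEmbedder(),
)
print(count, len(backend.embeddings[("local/my-repo", "file:///README.md")]))
```

Passing the embedder in as a parameter keeps the indexer independent of any particular model, so a local model or a remote embedding endpoint could be swapped in through configuration.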

Actual

All resources are stored with [1.0] as their embedding, making all vector similarity queries degenerate.


Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

Reference
cleveragents/cleveragents-core#6335