UAT: UKO vector embeddings use constant placeholder `[1.0]` — semantic ACMS strategies non-functional #6335

Open

opened 2026-04-09 20:11:19 +00:00 by HAL9000 · 0 comments


## Bug Report

### Spec Reference

`docs/specification.md` — ACMS > UKO > Real-time Index Synchronization; ACMS Context Assembly Pipeline

The spec defines the ACMS pipeline with multiple context strategies, including semantic (vector-based) retrieval. The UKO indexer is required to produce embeddings that support semantic similarity queries across indexed resources.
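For reference, a semantic similarity query commonly ranks stored vectors by cosine similarity to a query vector. The sketch below assumes cosine similarity as the metric; `cosine`, `top_k_similar`, and the example URIs and vectors are hypothetical illustrations, not the project's `VectorIndexBackend.query_similar()` API.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_similar(query: list[float], index: dict[str, list[float]], k: int = 2):
    """Return (uri, score) pairs for the k stored vectors most similar to query."""
    scored = [(uri, cosine(query, vec)) for uri, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical content-derived embeddings: two auth-related files, one README.
index = {
    "file:///auth.py": [0.9, 0.1, 0.0],
    "file:///login.py": [0.8, 0.2, 0.1],
    "file:///README.md": [0.0, 0.1, 0.9],
}
print(top_k_similar([1.0, 0.0, 0.0], index))
```

With distinct, content-derived vectors the two auth-related files rank above the README, which is the behaviour semantic strategies depend on.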

### Code Location

**File**: `/app/src/cleveragents/application/services/uko_indexer_internals.py`
**Lines**: 333–341 (`index_vector` function)

```python
def index_vector(...) -> int:
    ...
    # TODO(#578): integrate real embedding model — placeholder vector
    # avoids leaking content size metadata by using a constant.
    # See docs/reference/uko_indexer.md § Known Limitations.
    placeholder_embedding = [1.0]
    try:
        vector_backend.index_embedding(
            project,
            resource_uri,
            placeholder_embedding,   # ← always [1.0] regardless of content
            ...
        )
```

### Finding

Every resource indexed by `UKOIndexer.index_resource()` stores the identical embedding vector `[1.0]` into the `VectorIndexBackend`, completely independent of the resource's actual content. The TODO references issue `#578` for real embedding model integration, which has not been implemented.

### Impact

All ACMS context strategies that rely on vector similarity (semantic search) are non-functional:

- Querying the vector backend for "resources similar to X" returns the same similarity score for every resource, because every embedding is `[1.0]`; any ranking among them is arbitrary.
- The `VectorIndexBackend` is populated (the `has_vector_backend` property returns `True` and `IndexResult.embeddings_indexed` is incremented), giving the false impression that semantic search is available.
- Any `ContextStrategy` that calls `vector_backend.query_similar()` receives meaningless results.
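The degeneracy follows directly from the geometry: the cosine similarity between any nonzero one-dimensional query and `[1.0]` is 1.0 or -1.0, depending only on the query's sign, so every stored resource ties. A minimal, self-contained demonstration, again assuming a cosine metric; the URIs and query value are hypothetical:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


placeholder_embedding = [1.0]  # what index_vector stores for every resource
index = {uri: placeholder_embedding for uri in ["auth.py", "login.py", "README.md"]}

# A positive query scores every resource identically (about 1.0),
# regardless of what each resource actually contains.
scores = {uri: cosine([0.42], vec) for uri, vec in index.items()}
print(scores)
```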

### Steps to Reproduce

1. Index a repository: `agents repo index local/my-repo`
2. Observe that `IndexResult.embeddings_indexed > 0` (backend appears populated)
3. Attempt a semantic similarity query via any ACMS strategy — all resources return identical similarity regardless of content
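Step 3 could also be captured as a regression check that indexes two resources with unrelated content and asserts their stored embeddings differ. This is a sketch under stated assumptions: `RecordingVectorBackend` is a hypothetical test double whose `index_embedding(project, resource_uri, embedding)` records what it receives (the real `index_embedding` takes further arguments that are elided in the excerpt), `index_with` is a hypothetical stand-in for the indexing path, and `embed_fn` represents whatever embedding call #578 introduces.

```python
class RecordingVectorBackend:
    """Hypothetical test double that records the embedding stored per resource URI."""

    def __init__(self) -> None:
        self.embeddings: dict[str, list[float]] = {}

    def index_embedding(self, project: str, resource_uri: str, embedding: list[float]) -> None:
        self.embeddings[resource_uri] = list(embedding)


def index_with(embed_fn, backend, project: str, resources: dict[str, str]) -> int:
    """Hypothetical indexing path: embed each resource's content, then store it."""
    for uri, content in resources.items():
        backend.index_embedding(project, uri, embed_fn(content))
    return len(resources)


def embeddings_are_content_derived(embed_fn) -> bool:
    """True if two resources with different content get different embeddings."""
    backend = RecordingVectorBackend()
    resources = {
        "file:///auth.py": "def login(user, password): ...",
        "file:///README.md": "# Project overview and installation notes",
    }
    index_with(embed_fn, backend, "local/my-repo", resources)
    first, second = backend.embeddings.values()
    return first != second


# Current behaviour: the placeholder ignores content, so the check fails.
print(embeddings_are_content_derived(lambda content: [1.0]))
```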

### Expected

`UKOIndexer.index_resource()` calls an embedding model (e.g., a local sentence-transformer or a configured LLM embedding endpoint) to produce a content-derived embedding vector before calling `vector_backend.index_embedding()`.
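One possible shape for the fix is to inject an embedding callable and call it on the resource content before storing the result. In the sketch below, `Embedder`, `VectorBackend`, `hashed_bag_of_words`, `index_vector_sketch`, and `InMemoryBackend` are all hypothetical names. The default embedder is plain feature hashing, chosen only so the example runs without a model: it is deterministic and content-derived but not semantic, and a real fix would substitute a sentence-transformer or an embedding endpoint as described above. Its fixed dimension and L2 normalisation mean the vector's length and magnitude do not grow with document size, which is one way to address the size-leak concern noted in the TODO; whether that fully satisfies the concern is a judgment for #578.

```python
import hashlib
import math
from collections.abc import Callable
from typing import Protocol

# Hypothetical: any function mapping resource content to a fixed-size vector.
Embedder = Callable[[str], list[float]]


class VectorBackend(Protocol):
    """Hypothetical minimal protocol; the real backend's method takes more arguments."""

    def index_embedding(self, project: str, resource_uri: str, embedding: list[float]) -> None: ...


def hashed_bag_of_words(text: str, dim: int = 64) -> list[float]:
    """Feature-hashing embedder: deterministic and content-derived, NOT semantic."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim  # which dimension to update
        sign = 1.0 if digest[4] & 1 else -1.0  # signed hashing reduces collision bias
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    # L2-normalise so magnitude does not scale with document size.
    return [x / norm for x in vec] if norm else vec


def index_vector_sketch(backend: VectorBackend, project: str, resource_uri: str,
                        content: str, embed: Embedder = hashed_bag_of_words) -> int:
    """Embed the resource content, then store it; returns the number of embeddings indexed."""
    embedding = embed(content)
    backend.index_embedding(project, resource_uri, embedding)
    return 1


class InMemoryBackend:
    """Hypothetical backend used only for the usage example."""

    def __init__(self) -> None:
        self.store: dict[str, list[float]] = {}

    def index_embedding(self, project: str, resource_uri: str, embedding: list[float]) -> None:
        self.store[resource_uri] = embedding


backend = InMemoryBackend()
index_vector_sketch(backend, "local/my-repo", "file:///auth.py", "def login(user, password)")
index_vector_sketch(backend, "local/my-repo", "file:///README.md", "project overview")
print(backend.store["file:///auth.py"] != backend.store["file:///README.md"])
```

Injecting the embedder keeps `index_vector` testable with a cheap deterministic function while letting production configuration supply a real model.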

### Actual

All resources are stored with `[1.0]` as their embedding, making all vector similarity queries degenerate.

---
**Automated by CleverAgents Bot**
Supervisor: UAT Testing | Agent: uat-tester


HAL9000 referenced this issue from [AUTO-UAT-POOL] UAT Testing Report (Cycle 3) #6294 on 2026-04-09 20:18:09 +00:00

HAL9000 added labels on 2026-04-09 20:18:36 +00:00