UAT: UKO indexer uses placeholder embedding vector instead of real embedding model — semantic search is non-functional #3933

Open
opened 2026-04-06 07:35:47 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: feat/uko-real-embedding-model
  • Commit Message: feat(uko): integrate real embedding model for UKO node indexing
  • Milestone: Backlog (see note below)
  • Parent Epic: #396

Background and Context

The UKO (Universal Knowledge Ontology) indexer in src/cleveragents/application/services/uko_indexer_internals.py (lines 333–341) uses a single-element placeholder vector [1.0] for all UKO node embeddings instead of a real embedding model. This was discovered during UAT code-level analysis.

docs/specification.md describes the UKO as "Semantically aware — implicit relationships are inferred from content analysis." The spec also describes the ACMS as having a vector backend for semantic similarity search, and the ACMS semantic-embedding context strategy relies on vector embeddings to perform that search. The current placeholder implementation renders this capability entirely non-functional.

A TODO(#578) comment in the source code acknowledges this gap.

Current Behavior

# src/cleveragents/application/services/uko_indexer_internals.py lines 333-341:
# TODO(#578): integrate real embedding model — placeholder vector
placeholder_embedding = [1.0]

All UKO nodes receive the identical single-element placeholder embedding [1.0]. As a result:

  1. All UKO nodes have identical embeddings — no semantic differentiation is possible.
  2. Semantic similarity search returns meaningless results: every node is equidistant from any query, so any ranking is arbitrary.
  3. The semantic-embedding ACMS context strategy cannot function correctly.
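
The effect can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the node names and the `cosine_similarity` helper are hypothetical, and it assumes the query is embedded with the same placeholder.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical node names; each one receives the same placeholder vector,
# mirroring placeholder_embedding = [1.0] in the indexer.
placeholder_index = {
    "parse_config": [1.0],
    "render_template": [1.0],
    "open_socket": [1.0],
}

# Assumption: the query is embedded with the same placeholder.
query_embedding = [1.0]

scores = {
    name: cosine_similarity(query_embedding, vector)
    for name, vector in placeholder_index.items()
}

# Every node scores identically, so there is nothing to rank on.
print(scores)
```

Because all scores are equal, whatever order the search returns reflects storage or iteration order rather than content.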

Expected Behavior

Per docs/specification.md, the UKO indexer should use a real embedding model (e.g., a sentence-transformer or equivalent) to produce meaningful multi-dimensional vector embeddings for each UKO node. These embeddings should enable the ACMS semantic-embedding context strategy to perform genuine semantic similarity search, distinguishing between different code entities based on their content.
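
One possible shape for this is a small provider interface that the indexer calls instead of assigning a constant. The sketch below is illustrative only: `EmbeddingProvider`, `HashingEmbedder`, and `rank_nodes` are hypothetical names that do not exist in the codebase, and the hashing embedder is a toy stand-in for a real model (it only captures token overlap, not meaning) so that the example runs without external dependencies.

```python
import hashlib
import math
from typing import Protocol, Sequence


class EmbeddingProvider(Protocol):
    """Hypothetical interface the indexer could depend on."""

    dimension: int

    def embed(self, texts: Sequence[str]) -> list[list[float]]: ...


class HashingEmbedder:
    """Toy stand-in for a real model: a hashed, L2-normalised bag of tokens."""

    def __init__(self, dimension: int = 64) -> None:
        self.dimension = dimension

    def embed(self, texts: Sequence[str]) -> list[list[float]]:
        vectors: list[list[float]] = []
        for text in texts:
            vector = [0.0] * self.dimension
            for token in text.lower().split():
                digest = hashlib.sha256(token.encode("utf-8")).digest()
                bucket = int.from_bytes(digest[:4], "big") % self.dimension
                vector[bucket] += 1.0
            # Normalise so a dot product equals cosine similarity.
            norm = math.sqrt(sum(v * v for v in vector)) or 1.0
            vectors.append([v / norm for v in vector])
        return vectors


def rank_nodes(
    provider: EmbeddingProvider, query: str, nodes: dict[str, str]
) -> list[tuple[str, float]]:
    """Return (node name, similarity) pairs, most similar first."""
    names = list(nodes)
    node_vectors = provider.embed([nodes[name] for name in names])
    (query_vector,) = provider.embed([query])
    scored = [
        (name, sum(q * v for q, v in zip(query_vector, vector)))
        for name, vector in zip(names, node_vectors)
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical node contents for illustration.
nodes = {
    "parse_config": "parse yaml config file",
    "open_socket": "open tcp socket connection",
}
ranked = rank_nodes(HashingEmbedder(), "parse config", nodes)
print(ranked)
```

Unlike the placeholder, scores now vary with node content, so a node that shares tokens with the query would be expected to rank first (barring hash collisions). A real provider would implement the same `embed` method on top of the chosen model.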

Steps to Reproduce

  1. Open src/cleveragents/application/services/uko_indexer_internals.py.
  2. Navigate to lines 333–341.
  3. Observe placeholder_embedding = [1.0] used for all UKO node embeddings.

Impact

The semantic embedding capability of the ACMS is entirely non-functional. All UKO nodes have identical placeholder embeddings, making semantic similarity search return meaningless results. The semantic-embedding context strategy cannot distinguish between different code entities.

Code locations affected:

  • src/cleveragents/application/services/uko_indexer_internals.py lines 333–341 (placeholder embedding)
  • Referenced issue: #578 (mentioned in TODO comment)

Backlog note: This issue was discovered during autonomous operation
on milestone v3.4.0. It does not block milestone completion and has been
placed in the backlog for human review and future milestone assignment.

Subtasks

  • Select and integrate an appropriate embedding model (e.g., sentence-transformers, OpenAI embeddings, or a configurable provider abstraction)
  • Replace the placeholder_embedding = [1.0] stub in uko_indexer_internals.py lines 333–341 with a call to the real embedding model
  • Ensure the embedding model is configurable (model name, provider, dimensionality) via the ACMS configuration layer
  • Update the ACMS vector store schema/backend to accept the correct embedding dimensionality from the real model
  • Write BDD unit tests (Behave/Gherkin) in features/ covering embedding generation and UKO node indexing with real vectors
  • Write Robot Framework integration tests in robot/ verifying semantic similarity search returns meaningful ranked results
  • Update type annotations throughout the embedding pipeline to satisfy Pyright (nox -e typecheck)
  • Verify nox -e coverage_report reports >= 97% coverage
  • Update any relevant documentation or spec references
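
For the configurability and dimensionality subtasks, a guard along the following lines could catch mismatches early. This is a sketch under assumptions: `EmbeddingSettings` and `validate_embedding` are hypothetical names, not part of the ACMS configuration layer, and the model shown is only an example (the `all-MiniLM-L6-v2` sentence-transformers model produces 384-dimensional vectors).

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical settings object; field names are illustrative only."""

    provider: str
    model_name: str
    dimension: int

    def __post_init__(self) -> None:
        # A one-element vector is exactly the placeholder case to rule out.
        if self.dimension < 2:
            raise ValueError("embedding dimension must be at least 2")


def validate_embedding(
    vector: Sequence[float], settings: EmbeddingSettings
) -> list[float]:
    """Reject vectors whose length does not match the configured dimension."""
    if len(vector) != settings.dimension:
        raise ValueError(
            f"expected {settings.dimension} dimensions, got {len(vector)}"
        )
    return list(vector)


# Example values only; the actual provider and model are still to be selected.
settings = EmbeddingSettings(
    provider="sentence-transformers",
    model_name="all-MiniLM-L6-v2",
    dimension=384,
)

try:
    validate_embedding([1.0], settings)
except ValueError as error:
    print(f"rejected: {error}")
```

Running the same check at the vector store boundary would keep a leftover placeholder, or a model swap that changes dimensionality, from silently writing incompatible vectors.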

Definition of Done

  • All subtasks above are completed
  • placeholder_embedding = [1.0] no longer exists in the codebase
  • Real embedding model is integrated and produces multi-dimensional vectors for UKO nodes
  • Semantic similarity search via the semantic-embedding ACMS context strategy returns meaningful, ranked results
  • All BDD unit tests pass (nox -e unit_tests)
  • All Robot Framework integration tests pass (nox -e integration_tests)
  • All nox stages pass
  • Coverage >= 97%
  • The corresponding pull request has been successfully merged

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

Blocks
#396 Epic: ACMS Context Pipeline
cleveragents/cleveragents-core
Reference
cleveragents/cleveragents-core#3933