feat(acms): implement Real-time Index Sync / UKOIndexer with pluggable analyzers #578

Closed
opened 2026-03-04 23:43:50 +00:00 by freemo · 0 comments
Owner

Metadata

| Field | Value |
|-------|-------|
| Commit Message | feat(acms): implement Real-time Index Sync / UKOIndexer with pluggable analyzers |
| Branch | feature/m5-realtime-index-sync-ukoindexer |

Summary

Implement the UKOIndexer class that produces UKO triples from resources using pluggable domain-specific analyzers, adds provenance metadata, and simultaneously indexes into text, vector, and graph backends. Also implement the index lifecycle (resource add/change/remove/maintenance triggers) and the graceful degradation fallback chain.

Spec Reference

Section: Architecture > ACMS > Real-time Index Synchronization
Lines: ~43205-43300

Current State

  • No UKOIndexer class exists in the codebase.
  • BAL backend stubs exist (TextBackend, VectorBackend, GraphBackend in domain/models/acms/backends.py) but are Protocol definitions only — no real indexing implementations.
  • ScopedBackendView exists for scoped queries.
  • No file-watching or change-detection triggers exist for incremental index updates.
  • No analyzer framework or pluggable analyzer registry exists.

Description

UKOIndexer Core

class UKOIndexer:
    """Produces UKO triples from resources using pluggable analyzers."""

    def index_resource(self, resource: Resource):
        # 1. Determine the best analyzer for this resource
        analyzer = self.analyzers.get_for_resource(resource)
        # 2. Produce UKO triples using the most specific vocabulary
        triples = analyzer.analyze(resource)
        # 3. Add provenance (sourceResource, sourcePath, sourceRange, validFrom, isCurrent)
        for triple in triples:
            self._add_provenance(triple, resource)
        # 4. Store in graph backend
        self.graph_backend.add_triples(triples)
        # 5. Also index into text and vector backends
        self.text_indexer.index(resource, triples)
        self.vector_indexer.index(resource, triples)
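
Step 1 of the method above depends on a registry that returns the most suitable analyzer for a resource. The following is a minimal sketch of how that lookup might work, assuming analyzers are keyed by file suffix with a generic fallback. The `Resource` fields, the `Triple` shape, the `GenericAnalyzer` and `PythonAnalyzer` classes, and the `uko:` predicate names are all hypothetical stand-ins, not the project's actual types or vocabulary.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Protocol


@dataclass(frozen=True)
class Resource:
    """Hypothetical stand-in for the project's Resource type."""
    name: str
    path: str
    text: str


# Assumed triple shape: (subject, predicate, object).
Triple = tuple[str, str, str]


class Analyzer(Protocol):
    """Structural interface: anything with a matching analyze() qualifies."""
    def analyze(self, resource: Resource) -> list[Triple]: ...


class GenericAnalyzer:
    """Fallback analyzer: emits a single triple describing the file itself."""
    def analyze(self, resource: Resource) -> list[Triple]:
        return [(resource.path, "rdf:type", "uko:File")]


class PythonAnalyzer:
    """Toy domain analyzer: one triple per top-level `def` line."""
    def analyze(self, resource: Resource) -> list[Triple]:
        triples: list[Triple] = [(resource.path, "rdf:type", "uko:PythonModule")]
        for line in resource.text.splitlines():
            if line.startswith("def "):
                func = line[4:].split("(", 1)[0].strip()
                triples.append((resource.path, "uko:defines", func))
        return triples


class AnalyzerRegistry:
    """Maps file suffixes to analyzers and falls back to a default."""
    def __init__(self, default: Analyzer) -> None:
        self._default = default
        self._by_suffix: dict[str, Analyzer] = {}

    def register(self, suffix: str, analyzer: Analyzer) -> None:
        self._by_suffix[suffix.lower()] = analyzer

    def get_for_resource(self, resource: Resource) -> Analyzer:
        suffix = PurePosixPath(resource.path).suffix.lower()
        return self._by_suffix.get(suffix, self._default)


registry = AnalyzerRegistry(default=GenericAnalyzer())
registry.register(".py", PythonAnalyzer())
```

Because `Analyzer` is a `typing.Protocol`, new analyzers only need a matching `analyze()` method and do not have to inherit from a base class, which is one way to keep the plugin surface small.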

Index Lifecycle

| Stage | Trigger | Action |
|-------|---------|--------|
| Resource added | agents resource add / agents project link-resource | Immediate full indexing |
| Code changed | File modification detected | Immediate incremental update |
| Resource removed | agents resource remove / unlink | Immediate cleanup, UKO nodes marked historical |
| Maintenance | Scheduled | Full reindex, consistency check |
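
The issue does not say how file modifications are detected. One possible approach, sketched here purely as an assumption, is to compare stored content digests against the current file contents and map each difference to a lifecycle action from the table above. The function names and the action strings are hypothetical.

```python
import hashlib


def digest(content: bytes) -> str:
    """Content hash used to decide whether a file has changed."""
    return hashlib.sha256(content).hexdigest()


def plan_actions(previous: dict[str, str], current: dict[str, bytes]) -> dict[str, str]:
    """Compare stored digests with current contents and return path -> action.

    previous: path -> digest recorded at the last indexing run
    current:  path -> file contents as they are now
    """
    actions: dict[str, str] = {}
    for path, content in current.items():
        if path not in previous:
            actions[path] = "full_index"          # resource added
        elif previous[path] != digest(content):
            actions[path] = "incremental_update"  # code changed
    for path in previous:
        if path not in current:
            actions[path] = "cleanup"             # resource removed
    return actions


previous = {"a.py": digest(b"x"), "b.py": digest(b"y"), "c.py": digest(b"z")}
current = {"a.py": b"x", "b.py": b"changed", "d.py": b"new"}
print(plan_actions(previous, current))
```

Unchanged files (`a.py` here) produce no action, so a scheduled maintenance pass would still be needed for the full reindex and consistency check.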

Graceful Degradation Fallback Chain

When advanced features are unavailable, the pipeline's ConfidenceWeightedSelector automatically routes to simpler strategies:

  1. Try arce (requires all backends) → if unavailable:
  2. Try breadth-depth-navigator (requires graph) → if unavailable:
  3. Try semantic-embedding (requires vector) → if unavailable:
  4. Fall back to simple-keyword (requires only text search / ripgrep)
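
The chain above can be read as an ordered list of strategies, each with a set of required backends. The sketch below illustrates only that ordering. It is not the `ConfidenceWeightedSelector` itself, whose confidence weighting is not described in this issue, and it assumes that "all backends" means text, vector, and graph. `select_strategy` is a hypothetical helper.

```python
# Ordered from most to least capable; the first satisfiable entry wins.
STRATEGY_REQUIREMENTS: list[tuple[str, frozenset[str]]] = [
    ("arce", frozenset({"text", "vector", "graph"})),  # assumed meaning of "all backends"
    ("breadth-depth-navigator", frozenset({"graph"})),
    ("semantic-embedding", frozenset({"vector"})),
    ("simple-keyword", frozenset({"text"})),
]


def select_strategy(available: set[str]) -> str:
    """Return the first strategy whose required backends are all available."""
    for name, required in STRATEGY_REQUIREMENTS:
        if required <= available:  # subset test
            return name
    raise RuntimeError("no strategy available: text search is the minimum requirement")


print(select_strategy({"text", "vector", "graph"}))  # arce
print(select_strategy({"text", "graph"}))            # breadth-depth-navigator
print(select_strategy({"text", "vector"}))           # semantic-embedding
print(select_strategy({"text"}))                     # simple-keyword
```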

Performance Targets (from spec)

  • Text indexing: 10,000 files/minute
  • Vector indexing: 1,000 files/minute (with GPU)
  • Graph indexing: 5,000 files/minute
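
To make these targets easier to reason about, they can be converted into an average time budget per file. The calculation below assumes a single sequential worker, which the spec excerpt does not state, so parallel indexing would relax the per-file figure accordingly.

```python
# Throughput targets quoted above, in files per minute.
TARGETS_FILES_PER_MINUTE = {"text": 10_000, "vector": 1_000, "graph": 5_000}


def per_file_budget_ms(files_per_minute: int) -> float:
    """Average milliseconds available per file at the given throughput."""
    return 60_000 / files_per_minute


for backend, rate in TARGETS_FILES_PER_MINUTE.items():
    print(f"{backend}: {per_file_budget_ms(rate):.0f} ms per file")
# text: 6 ms per file
# vector: 60 ms per file
# graph: 12 ms per file
```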

Acceptance Criteria

  • UKOIndexer class with index_resource(), remove_resource(), reindex_resource() methods
  • Pluggable Analyzer Protocol, resolved through a registry-based lookup: get_for_resource(resource) -> Analyzer
  • Provenance metadata automatically added to all produced triples (sourceResource, sourcePath, sourceRange, validFrom, isCurrent)
  • Multi-backend indexing: triples stored in graph backend, text content indexed in text backend, embeddings in vector backend
  • Index lifecycle hooks: immediate indexing on resource add, incremental update on change, cleanup on remove
  • File-watching integration: detect file modifications and trigger incremental re-indexing
  • Graceful degradation: system works with only text search available, enhances progressively as backends are added
  • Fallback chain: arce → breadth-depth-navigator → semantic-embedding → simple-keyword
  • Configuration: index.text.backend, index.vector.backend, index.graph.backend config keys
  • Unit tests for analyzer selection, triple production, provenance attachment
  • Integration test: add a resource, verify all three backends are populated, modify a file, verify incremental update
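
The provenance criterion above names five fields, and the lifecycle requires that removed resources leave their UKO nodes marked historical. One way this could be modeled is sketched below, assuming immutable records and a `(start_line, end_line)` tuple for `sourceRange`. The class and function names are hypothetical; only the five field names come from this issue.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Provenance:
    sourceResource: str
    sourcePath: str
    sourceRange: Optional[tuple[int, int]]  # assumed (start_line, end_line)
    validFrom: datetime
    isCurrent: bool = True


@dataclass(frozen=True)
class ProvenancedTriple:
    subject: str
    predicate: str
    obj: str
    provenance: Provenance


def add_provenance(
    triple: tuple[str, str, str],
    resource_name: str,
    path: str,
    line_range: Optional[tuple[int, int]] = None,
    now: Optional[datetime] = None,
) -> ProvenancedTriple:
    """Wrap a bare triple with provenance; validFrom defaults to the current UTC time."""
    stamp = now or datetime.now(timezone.utc)
    subject, predicate, obj = triple
    return ProvenancedTriple(
        subject, predicate, obj, Provenance(resource_name, path, line_range, stamp)
    )


def mark_historical(triples: list[ProvenancedTriple]) -> list[ProvenancedTriple]:
    """Return copies with isCurrent=False; the originals are left untouched."""
    return [
        replace(t, provenance=replace(t.provenance, isCurrent=False)) for t in triples
    ]
```

Keeping the records frozen means a removal produces new historical copies instead of mutating indexed data in place, which may simplify the consistency check in the maintenance stage.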

Related Issues

  • Depends on: UKO Layer 1 Domain Ontologies, BAL backend implementations
  • Parent epic: #396 (ACMS Context Pipeline)
  • Related: ACMS Domain-Specific Analyzer Implementations (the analyzers this indexer uses)
  • Prerequisite for: All context strategies that query indices

Suggested Milestone

v3.4.0

Priority

High

Suggested Assignee

@hamza.khyari

Subtasks

  • Code: Implement UKOIndexer class with index_resource(), remove_resource(), reindex_resource() methods and provenance metadata attachment
  • Code: Implement pluggable Analyzer Protocol with registry-based lookup (get_for_resource())
  • Code: Implement multi-backend indexing (graph, text, vector) and index lifecycle hooks (add/change/remove/maintenance)
  • Code: Implement graceful degradation fallback chain (arce → breadth-depth-navigator → semantic-embedding → simple-keyword)
  • Code: Add configuration keys: index.text.backend, index.vector.backend, index.graph.backend
  • Docs: Document the UKOIndexer architecture, analyzer plugin system, and fallback chain
  • Behave tests: Add BDD feature file features/acms/realtime_index_sync.feature covering indexing, incremental updates, and fallback
  • Robot tests: Add Robot Framework integration test: add resource, verify all 3 backends populated, modify file, verify incremental update
  • ASV benchmarks: Add ASV benchmark benchmarks/bench_uko_indexer.py measuring indexing throughput against spec targets (10K files/min text, 5K files/min graph, 1K files/min vector)
  • Quality: coverage ≥97%: Verify via nox -s coverage_report
  • Quality: nox full suite: Run nox (all default sessions), fix any errors
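
For the ASV subtask above, a benchmark file might take roughly the following shape. This is a sketch only: `index_text` is a hypothetical stand-in for the real text indexing path, and the synthetic corpus is an assumption. ASV discovers benchmarks by method prefix, timing `time_*` methods and recording the value returned by `track_*` methods.

```python
import time


def index_text(files: list[str]) -> int:
    """Hypothetical stand-in for the text indexing path; counts tokens."""
    return sum(len(f.split()) for f in files)


class TextIndexingThroughput:
    """Benchmark suite for the text backend, with setup() run before each benchmark."""

    def setup(self):
        # Synthetic corpus; a real benchmark would load representative files.
        self.files = ["def f():\n    return 1\n"] * 1_000

    def time_index_text(self):
        # ASV measures the wall-clock time of this call.
        index_text(self.files)

    def track_files_per_minute(self):
        # Returns a throughput figure comparable to the spec target.
        start = time.perf_counter()
        index_text(self.files)
        elapsed = max(time.perf_counter() - start, 1e-9)  # avoid division by zero
        return len(self.files) / elapsed * 60

    track_files_per_minute.unit = "files/minute"
```

Equivalent suites for the graph and vector backends would let each of the three targets be tracked separately.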

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo added this to the v3.4.0 milestone 2026-03-05 00:30:13 +00:00
Reference
cleveragents/cleveragents-core#578