feat(acms): implement graph backend (Blazegraph or Neo4j) #872

Open
opened 2026-03-13 22:56:31 +00:00 by freemo · 4 comments
Owner

Metadata

  • Commit Message: feat(acms): implement graph backend (Blazegraph or Neo4j)
  • Branch: feature/m7-graph-backend

Background and Context

The specification names Blazegraph and Neo4j as graph backends for the ACMS, supporting SPARQL queries over the UKO ontology. Configuration keys for Neo4j exist in config_service.py (lines 524-533) but no implementation class exists.

Currently, InMemoryGraphBackend returns empty GraphResult objects. The ACMS pipeline's breadth-depth-navigator and arce strategies depend on graph traversal for structural code queries and UKO relationship navigation.

The GraphBackend protocol, with its SPARQL-based query interface, is defined in #498.

Expected Behavior

A production graph backend implementing the GraphBackend protocol that:

  • Stores UKO triples (RDF) in Blazegraph or Neo4j
  • Supports SPARQL queries with variable bindings
  • Supports graph traversal (ancestors, descendants, related nodes)
  • Integrates with the GraphIndexBackend protocol for write operations
  • Registers as a DI container provider, replacing InMemoryGraphBackend
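A minimal, stdlib-only sketch of what "variable bindings" and "traversal" mean over UKO-style triples. It is not SPARQL and not the #498 GraphBackend API. `match_pattern` matches a single triple pattern (the smallest SPARQL basic graph pattern), and `traverse` walks one predicate breadth-first. The function names and the `uko:`/`rdfs:` prefixed names are illustrative assumptions.

```python
from collections import deque
from typing import Iterable

# Illustrative triple shape: (subject, predicate, object) as prefixed-name strings.
Triple = tuple[str, str, str]
Binding = dict[str, str]


def match_pattern(triples: Iterable[Triple], pattern: Triple) -> list[Binding]:
    """Match one triple pattern; terms starting with '?' are variables."""
    results: list[Binding] = []
    for triple in triples:
        binding: Binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable used twice in the pattern must bind to the same value.
                if binding.setdefault(term[1:], value) != value:
                    break
            elif term != value:
                break  # a constant term does not match this triple
        else:
            results.append(binding)
    return results


def traverse(triples: Iterable[Triple], start: str, predicate: str,
             reverse: bool = False) -> set[str]:
    """Collect every node reachable from `start` along `predicate` edges.

    Forward follows subject -> object; reverse follows object -> subject.
    """
    edges: dict[str, list[str]] = {}
    for s, p, o in triples:
        if p == predicate:
            src, dst = (o, s) if reverse else (s, o)
            edges.setdefault(src, []).append(dst)
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


uko = [
    ("uko:Dog", "rdfs:subClassOf", "uko:Mammal"),
    ("uko:Mammal", "rdfs:subClassOf", "uko:Animal"),
    ("uko:Cat", "rdfs:subClassOf", "uko:Mammal"),
]
print(match_pattern(uko, ("?cls", "rdfs:subClassOf", "uko:Mammal")))
# [{'cls': 'uko:Dog'}, {'cls': 'uko:Cat'}]
print(sorted(traverse(uko, "uko:Dog", "rdfs:subClassOf")))
# ['uko:Animal', 'uko:Mammal']
print(sorted(traverse(uko, "uko:Animal", "rdfs:subClassOf", reverse=True)))
# ['uko:Cat', 'uko:Dog', 'uko:Mammal']
```

Following `rdfs:subClassOf` forward yields a class's ancestors; `reverse=True` yields its descendants. A real backend would push both into the store (for example a SPARQL 1.1 property path such as `rdfs:subClassOf+`) rather than scanning triples in Python.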

Acceptance Criteria

  • Graph backend implements the GraphBackend protocol (query, get_by_uri, count)
  • GraphIndexBackend implementation for write operations (index triples, remove, clear)
  • SPARQL queries return correct GraphResult objects
  • UKO ontology triples are persisted and queryable
  • Backend is registered in the DI container
  • Configuration sourced from config keys (Neo4j connection URL, credentials)
  • Graceful degradation when the graph database is not available
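The acceptance criteria split reads (query, get_by_uri, count) from writes (index, remove, clear). A sketch of that split using `typing.Protocol`, under simplified assumptions: the real `GraphBackend` query takes SPARQL, whereas here `query` takes optional subject/predicate/object filters, and `GraphResult`'s fields, `SetGraphStore`, and all parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Protocol, runtime_checkable

Triple = tuple[str, str, str]


@dataclass(frozen=True)
class GraphResult:
    """Stand-in result type; the real GraphResult from #498 may carry more fields."""
    triples: tuple[Triple, ...] = ()


@runtime_checkable
class GraphBackend(Protocol):
    """Read side (simplified: filters instead of a SPARQL string)."""
    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> GraphResult: ...
    def get_by_uri(self, uri: str) -> GraphResult: ...
    def count(self) -> int: ...


@runtime_checkable
class GraphIndexBackend(Protocol):
    """Write side: index, remove, clear."""
    def index(self, triples: Iterable[Triple]) -> None: ...
    def remove(self, triples: Iterable[Triple]) -> None: ...
    def clear(self) -> None: ...


class SetGraphStore:
    """Illustrative in-process store that satisfies both protocols structurally."""

    def __init__(self) -> None:
        self._triples: set[Triple] = set()

    def query(self, subject: Optional[str] = None, predicate: Optional[str] = None,
              obj: Optional[str] = None) -> GraphResult:
        wanted = (subject, predicate, obj)
        # None acts as a wildcard; sorting keeps the result order deterministic.
        hits = sorted(t for t in self._triples
                      if all(w is None or w == v for w, v in zip(wanted, t)))
        return GraphResult(tuple(hits))

    def get_by_uri(self, uri: str) -> GraphResult:
        # In this sketch, "by URI" means every triple whose subject is that URI.
        return self.query(subject=uri)

    def count(self) -> int:
        return len(self._triples)

    def index(self, triples: Iterable[Triple]) -> None:
        self._triples.update(triples)

    def remove(self, triples: Iterable[Triple]) -> None:
        self._triples.difference_update(triples)

    def clear(self) -> None:
        self._triples.clear()
```

Because both protocols are structural, one class can satisfy both, and `runtime_checkable` lets a DI registration assert that at startup (it checks that the methods exist, not their signatures).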

Subtasks

  • Choose backend (Blazegraph vs. Neo4j) based on deployment simplicity and Python driver maturity
  • Implement graph backend class
  • Implement graph index backend for write operations
  • Map UKO ontology model to graph database schema
  • Register backend in DI container with conditional activation
  • Wire into ACMSPipeline strategy execution
  • Tests (Behave): Add scenarios for triple storage, SPARQL queries, graceful degradation
  • Verify coverage >= 97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
freemo added this to the v3.7.0 milestone 2026-03-13 22:56:42 +00:00
Member

@freemo Clarification needed on graph backend choice before implementation begins.

Issue title says: "Blazegraph or Neo4j"
Specification (docs/specification.md) says: accepted values for index.graph.backend are neo4j, rdflib, or none — Blazegraph is not listed as a config option.

The config keys in config_service.py (lines 527-553) already define:

  • graph.backend → accepts neo4j, rdflib, none
  • graph.neo4j-url → Neo4j server URL
  • graph.neo4j-auth → Neo4j auth credentials
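A sketch of validating these three keys before any backend is built, so an unsupported value such as `blazegraph` fails at startup rather than at first query. It assumes the config arrives as a flat `Mapping[str, str]` keyed exactly as listed and that a missing `graph.backend` means `none`; the real `config_service.py` API and key namespace (the spec uses `index.graph.backend`) may differ. `GraphSettings` and `load_graph_settings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Literal, Mapping, Optional

GraphBackendKind = Literal["neo4j", "rdflib", "none"]

# Maps raw strings onto the Literal type without a cast or a `# type: ignore`.
_KINDS: dict[str, GraphBackendKind] = {"neo4j": "neo4j", "rdflib": "rdflib", "none": "none"}


@dataclass(frozen=True)
class GraphSettings:
    backend: GraphBackendKind
    neo4j_url: Optional[str]
    neo4j_auth: Optional[str]


def load_graph_settings(config: Mapping[str, str]) -> GraphSettings:
    """Validate the graph.* keys from a flat mapping (hypothetical config access)."""
    # Defaulting to "none" when the key is absent is an assumption of this sketch.
    raw = config.get("graph.backend", "none").strip().lower()
    kind = _KINDS.get(raw)
    if kind is None:
        raise ValueError(f"graph.backend must be one of {sorted(_KINDS)}, got {raw!r}")
    url = config.get("graph.neo4j-url")
    if kind == "neo4j" and not url:
        raise ValueError("graph.neo4j-url is required when graph.backend is 'neo4j'")
    return GraphSettings(kind, url, config.get("graph.neo4j-auth"))
```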

The spec (lines 43652-43661) describes the technology stack as:

  • Neo4j — external graph database for production (structural code relationships, class hierarchies, call graphs)
  • rdflib >= 7.1.4 — in-process Python RDF library, no external dependencies, lightweight graph queries

Blazegraph appears only in the ACMS architecture diagram as one of several physical data store options but has no config support, no Python driver ecosystem, and is not referenced as an accepted config value.

My proposed approach: Implement rdflib as the graph backend for this issue. Rationale:

  1. rdflib has native SPARQL support (parser + query engine built in), which aligns directly with the GraphBackend.sparql_query() protocol method.
  2. No external server required — simpler deployment, easier testing.
  3. Mature Python library (>= 7.1.4 per spec).
  4. Follows the same pattern as the FAISS vector backend: a factory function reads the config and returns the real implementation, or falls back to the in-memory backend.
  5. A separate Neo4j backend could be added later as a follow-up issue if production scale demands it.
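Item 4's FAISS-style factory, sketched under assumptions: `RdflibGraphBackend` is a hypothetical class name, and the protocol is trimmed to one method. The factory returns the rdflib backend only when `graph.backend` is `rdflib` and the package is importable; otherwise it falls back to `InMemoryGraphBackend`. Annotating the return type with the protocol rather than `Any` keeps every caller type-checked.

```python
import importlib.util
from typing import Callable, Mapping, Protocol


class GraphBackend(Protocol):
    """Trimmed stand-in for the real protocol: just enough for the return type."""
    def count(self) -> int: ...


class InMemoryGraphBackend:
    """Stub that reports an empty graph."""
    def count(self) -> int:
        return 0


class RdflibGraphBackend:
    """Hypothetical name for the rdflib-backed implementation."""
    def count(self) -> int:
        return 0  # a real implementation would count triples in its rdflib Graph


def _module_available(name: str) -> bool:
    # find_spec checks importability without importing the package.
    return importlib.util.find_spec(name) is not None


def create_graph_backend(
    config: Mapping[str, str],
    is_available: Callable[[str], bool] = _module_available,
) -> GraphBackend:  # a protocol type, not Any
    kind = config.get("graph.backend", "none")
    if kind not in ("neo4j", "rdflib", "none"):
        raise ValueError(f"unsupported graph.backend: {kind!r}")
    if kind == "rdflib" and is_available("rdflib"):
        return RdflibGraphBackend()
    # "none", "neo4j" (out of scope for this proposal), or a missing optional
    # dependency all fall back to the in-memory stub.
    return InMemoryGraphBackend()
```

Injecting `is_available` keeps the fallback path testable in Behave scenarios without uninstalling rdflib.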

If you intended for this issue to specifically target Neo4j or Blazegraph instead, please let me know before I proceed.

freemo self-assigned this 2026-04-02 06:14:03 +00:00
Author
Owner

PR #1282 has been reviewed. Changes requested — the implementation is solid but has blocking issues:

  1. 7+ # type: ignore comments in neo4j_graph_backend.py violating the project's strict no-type-ignore rule
  2. Broad except Exception catch-all blocks that suppress errors instead of propagating them (violates fail-fast principle)
  3. Factory function return types use Any instead of protocol types

Full review details in the PR comment: https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/1282#issuecomment-77324
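Blocker 2 and the "graceful degradation" acceptance criterion can coexist if only connectivity failures are handled and everything else propagates. A sketch under assumptions: the stdlib `ConnectionError` and `TimeoutError` stand in for whatever exceptions the real driver raises, and `GraphUnavailableError`, `run_graph_call`, and `count_or_zero` are hypothetical names.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


class GraphUnavailableError(RuntimeError):
    """Hypothetical domain error: the graph database could not be reached."""


def run_graph_call(call: Callable[[], T]) -> T:
    try:
        return call()
    except (ConnectionError, TimeoutError) as exc:
        # Only connectivity failures are translated. A TypeError or KeyError
        # caused by a bug is not caught here, so it propagates (fail-fast).
        raise GraphUnavailableError("graph database unavailable") from exc


def count_or_zero(call: Callable[[], int]) -> int:
    """Degrade on one explicit, named failure mode instead of swallowing everything."""
    try:
        return run_graph_call(call)
    except GraphUnavailableError:
        logger.warning("graph backend unavailable; treating graph as empty")
        return 0
```

A programming error such as a `KeyError` still surfaces immediately, while an unreachable database degrades to an empty result with a logged warning and the original exception preserved as `__cause__`.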
Author
Owner

PR #1282 reviewed (round 2) — changes requested. The PR has not been updated since the prior review. Three blocking issues remain:

  1. 9 # type: ignore comments across source and test files (CONTRIBUTING.md violation)
  2. Broad except Exception catch-all blocks in 6 backend methods suppress programming errors (fail-fast violation)
  3. Factory function return types use Any instead of protocol types

See the PR #1282 review comment (https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/1282#issuecomment-78494) for full details and fix guidance.
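One common way to remove `# type: ignore` comments, sketched under assumptions: the backend depends only on a small `typing.Protocol` describing the one call it needs, so its own code is fully typed, and converting the real driver's results into those types is the job of a separate adapter (not shown). `RowSource`, `fetch_rows`, `TypedGraphBackend`, and the Cypher schema (a `uri` node property, relationship type as predicate) are hypothetical; they are not the neo4j driver's API.

```python
from typing import Mapping, Protocol

Triple = tuple[str, str, str]


class RowSource(Protocol):
    """Hypothetical typed boundary: the one call the backend needs from an adapter."""
    def fetch_rows(self, statement: str, params: Mapping[str, str]) -> list[dict[str, str]]: ...


class TypedGraphBackend:
    # Illustrative schema: nodes carry a `uri` property; relationship type = predicate.
    _BY_URI = (
        "MATCH (s {uri: $uri})-[p]->(o) "
        "RETURN s.uri AS subject, type(p) AS predicate, o.uri AS object"
    )

    def __init__(self, source: RowSource) -> None:
        self._source = source

    def get_by_uri(self, uri: str) -> list[Triple]:
        rows = self._source.fetch_rows(self._BY_URI, {"uri": uri})
        # Every value is already typed as str, so no cast or ignore is needed.
        return [(r["subject"], r["predicate"], r["object"]) for r in rows]
```

In tests, any plain fake with a matching `fetch_rows` method satisfies `RowSource`, so no database or driver is required.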
Author
Owner

PR #1282 Review (Round 3): Changes requested.

The PR has not been updated since the original commit. Three blocking issues remain unresolved:

  1. 9 # type: ignore comments — strictly forbidden by CONTRIBUTING.md
  2. Broad except Exception blocks in 6 backend methods — violates fail-fast principle
  3. Factory function return types use Any instead of protocol types

Detailed review with fix instructions posted on PR #1282. The implementation is otherwise well-structured — once these three issues are fixed, the PR should be ready to merge.

Blocks
#367 Epic: Multi-Agent RDF System (cleveragents/cleveragents-core)
Reference: cleveragents/cleveragents-core#872