feat(uko): add analyzer plugin framework and initial domain analyzers (Python, Markdown) #551

Closed
opened 2026-03-04 01:03:12 +00:00 by freemo · 1 comment
Owner

Metadata

  • Commit Message: feat(uko): add analyzer plugin framework and initial domain analyzers
  • Branch: feature/m6-uko-analyzers
Field        Value
Type         Feature
Priority     Medium
MoSCoW       Should Have
Points       8
Milestone    v3.5.0
Assignee     freemo
Parent Epic  #396 (Epic: ACMS Context Pipeline)
Depends On   #189 (UKO ontology scaffolding), #466 (UKO ontology scaffolding PR)

Background

The specification (§ Extensibility > ACMS Extensions > Analyzers, lines 44210-44261) defines a plugin system for domain analyzers that parse resources into UKO triples. It lists the following analyzers, grouped by domain:

  • Code: PythonAnalyzer, TypeScriptAnalyzer, RustAnalyzer, CustomDSLAnalyzer
  • Document: MarkdownAnalyzer, ReStructuredTextAnalyzer, HTMLDocumentAnalyzer
  • Data: PostgreSQLAnalyzer, MySQLAnalyzer, SQLiteAnalyzer
  • Infrastructure: DockerComposeAnalyzer, KubernetesAnalyzer

UKO ontology scaffolding is tracked (#189/#466), but no issues exist yet for the analyzer plugin framework or for any individual analyzer.

This issue covers:

  1. The analyzer plugin framework (registration, discovery, invocation)
  2. PythonAnalyzer — most critical for code projects (parses Python files into UKO triples: modules, classes, functions, imports, dependencies)
  3. MarkdownAnalyzer — parses Markdown documents into UKO triples: sections, headings, code blocks, links

Acceptance Criteria

  1. AnalyzerProtocol is defined with an analyze(resource) -> list[UKOTriple] method.
  2. An analyzer registry allows registration by resource type / file extension.
  3. Auto-discovery: analyzers can be registered via config.toml or plugin entry points.
  4. PythonAnalyzer extracts: modules, classes, functions, methods, imports, dependencies, docstrings.
  5. MarkdownAnalyzer extracts: sections (by heading level), code blocks, links, references.
  6. Both analyzers produce well-formed UKO triples with proper URI schemes.
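
Criteria 1 and 2 could be met with something like the following minimal sketch. It assumes a UKOTriple with subject/predicate/object string fields and a hypothetical Resource wrapper holding a path and its decoded text; the issue fixes neither shape. LineCountAnalyzer is a toy analyzer included only to exercise the registry.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import PurePath
from typing import Protocol, runtime_checkable


@dataclass(frozen=True)
class UKOTriple:
    """One subject-predicate-object statement (field names are assumptions)."""

    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Resource:
    """Hypothetical resource wrapper: a path plus its decoded text."""

    path: str
    content: str


@runtime_checkable
class AnalyzerProtocol(Protocol):
    supported_extensions: frozenset[str]  # e.g. frozenset({".py"})
    domain: str  # e.g. "code" or "docs"

    def analyze(self, resource: Resource) -> list[UKOTriple]: ...


class AnalyzerRegistry:
    """Indexes analyzers by lowercase file extension; last registration wins."""

    def __init__(self) -> None:
        self._by_extension: dict[str, AnalyzerProtocol] = {}

    def register(self, analyzer: AnalyzerProtocol) -> None:
        # runtime_checkable only checks that the members exist, not their types.
        if not isinstance(analyzer, AnalyzerProtocol):
            raise TypeError(f"{analyzer!r} does not satisfy AnalyzerProtocol")
        for ext in analyzer.supported_extensions:
            self._by_extension[ext.lower()] = analyzer

    def for_path(self, path: str) -> AnalyzerProtocol | None:
        return self._by_extension.get(PurePath(path).suffix.lower())

    def analyze(self, resource: Resource) -> list[UKOTriple]:
        analyzer = self.for_path(resource.path)
        return [] if analyzer is None else analyzer.analyze(resource)


class LineCountAnalyzer:
    """Toy analyzer used only to exercise the registry."""

    supported_extensions = frozenset({".txt"})
    domain = "docs"

    def analyze(self, resource: Resource) -> list[UKOTriple]:
        count = len(resource.content.splitlines())
        subject = f"uko://docs/file/{resource.path}"
        return [UKOTriple(subject, "uko:lineCount", str(count))]
```

Because lookup is keyed on the lowercased suffix, readme.TXT and readme.txt resolve to the same analyzer. In this sketch a second analyzer registered for the same extension silently replaces the first; the real registry may prefer to raise or keep a priority order.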

Subtasks

1. Design

  • Design AnalyzerProtocol interface
  • Design analyzer registry and auto-discovery mechanism
  • Design UKO triple format for code and document domains

2. Implementation

  • Implement AnalyzerProtocol and analyzer registry
  • Implement PythonAnalyzer using AST parsing
  • Implement MarkdownAnalyzer using markdown parsing
  • Wire analyzers into UKO graph population workflow
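
A minimal sketch of AST-based extraction for PythonAnalyzer. The predicate names (uko:defines, uko:imports, uko:inheritsFrom, uko:docstring) and the uko://code/class/..., uko://code/function/... and uko://code/method/... URIs are assumptions extrapolated from the uko://code/module/... scheme; the actual UKO vocabulary comes from #189. Resource is the same hypothetical path-plus-text wrapper.

```python
import ast
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class UKOTriple:
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Resource:  # hypothetical wrapper: path plus decoded source text
    path: str
    content: str


def module_name(path: str) -> str:
    """'pkg/sub/mod.py' -> 'pkg.sub.mod'; assumes paths relative to the source root."""
    parts = PurePosixPath(path).with_suffix("").parts
    if parts and parts[-1] == "__init__":
        parts = parts[:-1]
    return ".".join(parts)


class PythonAnalyzerSketch:
    supported_extensions = frozenset({".py"})
    domain = "code"

    def analyze(self, resource: Resource) -> list[UKOTriple]:
        # ast.parse only builds a syntax tree; it never executes the source.
        tree = ast.parse(resource.content, filename=resource.path)
        qualname = module_name(resource.path)
        uri = f"uko://code/module/{qualname}"
        out = [UKOTriple(uri, "rdf:type", "uko:Module")]
        if (doc := ast.get_docstring(tree)) is not None:
            out.append(UKOTriple(uri, "uko:docstring", doc))
        self._visit(tree.body, uri, qualname, out, in_class=False)
        return out

    def _visit(self, body: list[ast.stmt], parent: str, qualname: str,
               out: list[UKOTriple], in_class: bool) -> None:
        for node in body:
            if isinstance(node, ast.Import):
                out.extend(UKOTriple(parent, "uko:imports", a.name) for a in node.names)
            elif isinstance(node, ast.ImportFrom):
                # Relative imports keep their leading dots, e.g. ".util".
                target = "." * node.level + (node.module or "")
                out.append(UKOTriple(parent, "uko:imports", target))
            elif isinstance(node, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                if isinstance(node, ast.ClassDef):
                    kind = "class"
                else:
                    kind = "method" if in_class else "function"
                name = f"{qualname}.{node.name}"
                uri = f"uko://code/{kind}/{name}"
                out.append(UKOTriple(parent, "uko:defines", uri))
                out.append(UKOTriple(uri, "rdf:type", f"uko:{kind.capitalize()}"))
                if isinstance(node, ast.ClassDef):
                    for base in node.bases:
                        out.append(UKOTriple(uri, "uko:inheritsFrom", ast.unparse(base)))
                if (doc := ast.get_docstring(node)) is not None:
                    out.append(UKOTriple(uri, "uko:docstring", doc))
                # Recurse for nested definitions; only direct class members are methods.
                self._visit(node.body, uri, name, out, isinstance(node, ast.ClassDef))
```

Only the direct statements of each body are scanned, so imports nested inside if TYPE_CHECKING: or try blocks are missed; using ast.walk for imports would catch those. ast.parse raises SyntaxError on invalid source, which the caller is expected to handle.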

3. Testing

  • Unit tests for analyzer registry
  • Unit tests for PythonAnalyzer with various Python file structures
  • Unit tests for MarkdownAnalyzer with various document structures
  • Integration test: file → analyzer → UKO graph

4. Documentation

  • Analyzer plugin development guide
  • Supported Python/Markdown constructs

5. Integration

  • Wire into resource indexing workflow (#195)
  • Verify UKO triple compatibility with ontology (#189)
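
For the compatibility check against the ontology (and criterion 6, "proper URI schemes"), a test helper along these lines could flag malformed triples. The allowed domains and predicates below are placeholders; a real check would derive them from the UKO ontology in #189 rather than hard-code them.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class UKOTriple:
    subject: str
    predicate: str
    object: str


# Placeholder vocabulary: the real lists should come from the UKO ontology (#189).
KNOWN_DOMAINS = frozenset({"code", "docs"})
KNOWN_PREDICATES = frozenset({
    "rdf:type", "uko:defines", "uko:imports", "uko:inheritsFrom",
    "uko:docstring", "uko:hasSection", "uko:title", "uko:level",
    "uko:linksTo", "uko:containsCode", "uko:language",
})


def triple_problems(triple: UKOTriple) -> list[str]:
    """Return human-readable problems; an empty list means the triple passed."""
    problems: list[str] = []
    parts = urlsplit(triple.subject)
    if parts.scheme != "uko":
        problems.append(f"subject {triple.subject!r} is not a uko:// URI")
    elif parts.netloc not in KNOWN_DOMAINS:
        problems.append(f"subject domain {parts.netloc!r} is not recognized")
    elif not parts.path.strip("/"):
        problems.append(f"subject {triple.subject!r} has no path")
    if triple.predicate not in KNOWN_PREDICATES:
        problems.append(f"unknown predicate {triple.predicate!r}")
    return problems


def check_all(triples: list[UKOTriple]) -> dict[UKOTriple, list[str]]:
    """Map each failing triple to its problems, for use in tests or CI."""
    return {t: p for t in triples if (p := triple_problems(t))}
```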

6. Observability

  • Log analyzer invocations, triple counts, parse errors
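
The observability item could be covered by a thin wrapper around each analyzer call. This is a sketch: the logger name, the key=value message format, and the choice of exceptions treated as parse errors are assumptions.

```python
import logging
import time
from typing import Callable, TypeVar

# Hypothetical logger name; any module-level logger works.
logger = logging.getLogger("cleveragents.acms.analyzers")

T = TypeVar("T")

# Exceptions treated as "this resource could not be parsed" (an assumption).
# UnicodeDecodeError is already covered because it subclasses ValueError.
PARSE_ERRORS: tuple[type[BaseException], ...] = (SyntaxError, ValueError, RecursionError)


def run_logged(analyzer: str, path: str, analyze: Callable[[], list[T]]) -> list[T]:
    """Run one analyzer call, logging the invocation, triple count and parse errors."""
    logger.info("analyzer.start analyzer=%s path=%s", analyzer, path)
    start = time.perf_counter()
    try:
        triples = analyze()
    except PARSE_ERRORS as exc:
        logger.warning("analyzer.parse_error analyzer=%s path=%s error=%r",
                       analyzer, path, exc)
        return []
    elapsed_ms = (time.perf_counter() - start) * 1000
    logger.info("analyzer.done analyzer=%s path=%s triples=%d elapsed_ms=%.1f",
                analyzer, path, len(triples), elapsed_ms)
    return triples
```

Returning an empty list on a parse error keeps one broken file from aborting indexing of a whole project, while the warning still surfaces it.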

7. Security

  • Analyzers execute read-only (no side effects)
  • Input validation on resource content
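
For input validation, content could be checked before any analyzer sees it. The 1 MiB cap, the NUL-byte heuristic for binary files, and the strict UTF-8 requirement below are assumptions, not limits stated in this issue. Reading through a binary read-only handle also keeps the analyzer free of side effects on the resource.

```python
from pathlib import Path

# Hypothetical cap; the issue does not specify a limit.
MAX_RESOURCE_BYTES = 1_048_576


class InvalidResourceError(ValueError):
    """Raised when resource content is rejected before any parsing happens."""


def validate_content(raw: bytes, max_bytes: int = MAX_RESOURCE_BYTES) -> str:
    """Check size, reject binary data, and decode strictly as UTF-8."""
    if len(raw) > max_bytes:
        raise InvalidResourceError(f"resource exceeds {max_bytes} bytes")
    if b"\x00" in raw:
        raise InvalidResourceError("resource contains NUL bytes (likely binary)")
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidResourceError(f"resource is not valid UTF-8: {exc}") from exc
    return text.removeprefix("\ufeff")  # tolerate a leading byte-order mark


def read_resource(path: Path, max_bytes: int = MAX_RESOURCE_BYTES) -> str:
    """Open read-only and read at most max_bytes + 1 bytes, so oversized
    files are rejected without loading them fully into memory."""
    with path.open("rb") as handle:
        raw = handle.read(max_bytes + 1)
    return validate_content(raw, max_bytes)
```

Subclassing ValueError lets callers that already treat ValueError as a parse failure handle rejected input the same way. Note that ast.parse never executes the source it parses, so the Python analyzer stays read-only as long as it never imports or execs the analyzed module.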

Definition of Done

  • All acceptance criteria met
  • All subtask checkboxes checked
  • Tests pass in CI
  • Code reviewed and approved
freemo added this to the v3.5.0 milestone 2026-03-04 01:03:31 +00:00
freemo self-assigned this 2026-03-04 01:41:13 +00:00
Author
Owner

Implementation Notes

Changes Made

Implemented the analyzer plugin framework with AnalyzerProtocol, AnalyzerRegistry, PythonAnalyzer (AST-based), and MarkdownAnalyzer (regex-based).

Created or modified files (9):

  • src/cleveragents/domain/models/acms/analyzers.py (219 lines) — UKOTriple, AnalyzerProtocol, AnalyzerRegistry
  • src/cleveragents/domain/models/acms/python_analyzer.py (365 lines) — AST-based Python file analyzer
  • src/cleveragents/domain/models/acms/markdown_analyzer.py (275 lines) — Regex-based Markdown analyzer
  • src/cleveragents/domain/models/acms/__init__.py — Updated exports
  • features/uko_analyzers.feature (193 lines) — 39 Behave scenarios
  • features/steps/uko_analyzers_steps.py (468 lines) — Step definitions
  • robot/uko_analyzers.robot (65 lines) — 7 Robot Framework tests
  • robot/helper_uko_analyzers.py (241 lines) — Robot helper
  • benchmarks/uko_analyzers_bench.py (157 lines) — ASV benchmarks

Design Decisions

  1. AnalyzerProtocol: @runtime_checkable Protocol with supported_extensions, domain, and analyze() method
  2. PythonAnalyzer: uses ast.parse() for reliable extraction of modules, classes, functions, imports, docstrings
  3. MarkdownAnalyzer: uses regex patterns for heading, code block, and link extraction (no external markdown parsing library needed)
  4. UKO URI scheme: uko://code/module/... for Python, uko://docs/section/... for Markdown
  5. Registration by extension: AnalyzerRegistry.register(analyzer) uses supported_extensions to index analyzers
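
The regex-based approach in decision 3 can be sketched as a single line-by-line pass that tracks open code fences (so a "#" line inside a code block is not taken for a heading) and a stack of open sections (so each heading attaches to its nearest shallower parent). The uko://docs/document/... URI, the "#slug" suffix on uko://docs/section/... URIs, the predicate names, and the Resource wrapper are assumptions.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class UKOTriple:
    subject: str
    predicate: str
    object: str


@dataclass(frozen=True)
class Resource:  # hypothetical wrapper: path plus decoded text
    path: str
    content: str


# ATX heading; a trailing run of '#' is dropped only when preceded by whitespace.
HEADING_RE = re.compile(r"^(#{1,6})[ \t]+(.*?)(?:[ \t]+#+)?[ \t]*$")
# Opening code fence (three or more backticks or tildes) plus optional info string.
FENCE_RE = re.compile(r"^ {0,3}(`{3,}|~{3,})[ \t]*([\w+-]*)")
# Inline link [text](target "optional title"); the lookbehind skips images.
LINK_RE = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)[^)]*\)")


def slugify(title: str) -> str:
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-") or "section"


class MarkdownAnalyzerSketch:
    supported_extensions = frozenset({".md", ".markdown"})
    domain = "docs"

    def analyze(self, resource: Resource) -> list[UKOTriple]:
        path = resource.path
        doc = f"uko://docs/document/{path}"
        out = [UKOTriple(doc, "rdf:type", "uko:Document")]
        stack: list[tuple[int, str]] = []  # open sections as (level, uri)
        used: set[str] = set()  # slugs already taken, to keep URIs unique
        fence: str | None = None  # opening fence while inside a code block
        blocks = 0
        for line in resource.content.splitlines():
            current = stack[-1][1] if stack else doc
            if fence is not None:
                if line.strip().startswith(fence):
                    fence = None
                continue  # nothing inside a code block is a heading or link
            if m := FENCE_RE.match(line):
                fence, blocks = m.group(1), blocks + 1
                block = f"{doc}#code-{blocks}"
                out.append(UKOTriple(current, "uko:containsCode", block))
                out.append(UKOTriple(block, "uko:language", m.group(2) or "text"))
            elif m := HEADING_RE.match(line):
                level, title = len(m.group(1)), m.group(2)
                base = slugify(title)
                slug, n = base, 0
                while slug in used:
                    n += 1
                    slug = f"{base}-{n}"
                used.add(slug)
                uri = f"uko://docs/section/{path}#{slug}"
                while stack and stack[-1][0] >= level:
                    stack.pop()  # close sibling and deeper sections
                parent = stack[-1][1] if stack else doc
                out.append(UKOTriple(parent, "uko:hasSection", uri))
                out.append(UKOTriple(uri, "uko:title", title))
                out.append(UKOTriple(uri, "uko:level", str(level)))
                stack.append((level, uri))
            else:
                for link in LINK_RE.finditer(line):
                    out.append(UKOTriple(current, "uko:linksTo", link.group(2)))
        return out
```

Setext headings (underlined with = or -), reference-style links, and indented code blocks are not handled here; covering them with regexes gets fiddly, which is the usual argument for a CommonMark-compliant parser if coverage ever needs to grow.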

Verification

  • lint: passed
  • typecheck: 0 errors (Pyright strict)
  • unit_tests: 39 scenarios, 103 steps passed
  • integration_tests: 7/7 passed
  • Total: 2,004 lines added across 9 files

PR: #597

Blocks: #369 Epic: Large Project Autonomy & Context (cleveragents/cleveragents-core)
Reference: cleveragents/cleveragents-core#551