feat(validation): add semantic validation service #449

2026-02-25T23:49:54Z

CoreRasurae commented

2026-02-25 23:49:54 +00:00

Description

Adds the SemanticValidationService — a new application-layer service that provides
AST-based semantic analysis checks for Python projects during the strategize/execute plan
phases. Semantic checks are exposed as Validation tools (the read-only Tool subtype
defined in the spec and ADR-013) attachable per resource via the Tool Registry, integrating
into the existing ValidationPipeline as informational by default.

Closes #207

Core Service (`semantic_validation_service.py`)

- `SemanticValidationService`: orchestrator that runs all registered rules against source files, maps results through the severity/mode pipeline, and returns normalised output compatible with `ValidationPipeline`.
- `SemanticRuleRegistry`: register/lookup/list pluggable rule implementations by name; ships pre-loaded with all built-in rules via `create_default_registry()`.
- `SemanticValidationCache`: file-hash-keyed LRU cache (thread-safe, bounded with LRU eviction) to skip re-analysis of unchanged files.
- `SemanticValidationRule`: `Protocol` for pluggable rules: each rule exposes a `name` property and a `check(source, filename) -> SemanticCheckResult` method.
- Helper functions `resolve_severity()` and `map_severity_to_mode()` for mapping rule severity to `required` vs `informational` validation mode.
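
As a rough illustration of the protocol-plus-registry shape described above, here is a minimal sketch. It is not the PR's code: `CheckResult`, `Rule`, `RuleRegistry`, and `NoTabsRule` are hypothetical stand-ins, and only the `name` property and the `check(source, filename)` method come from the description.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CheckResult:
    """Hypothetical stand-in for SemanticCheckResult."""

    passed: bool
    message: str = ""


class Rule(Protocol):
    """Structural type: any object with a name and a check() method."""

    @property
    def name(self) -> str: ...

    def check(self, source: str, filename: str) -> CheckResult: ...


class RuleRegistry:
    """Minimal name-keyed registry supporting register / lookup / list."""

    def __init__(self) -> None:
        self._rules: dict[str, Rule] = {}

    def register(self, rule: Rule) -> None:
        if rule.name in self._rules:
            raise ValueError(f"rule already registered: {rule.name}")
        self._rules[rule.name] = rule

    def get(self, name: str) -> Rule | None:
        return self._rules.get(name)

    def names(self) -> list[str]:
        return sorted(self._rules)


class NoTabsRule:
    """Toy rule for illustration only; it is not one of the built-in rules."""

    @property
    def name(self) -> str:
        return "no_tabs"

    def check(self, source: str, filename: str) -> CheckResult:
        if "\t" in source:
            return CheckResult(passed=False, message=f"{filename}: tab found")
        return CheckResult(passed=True)


registry = RuleRegistry()
registry.register(NoTabsRule())
rule = registry.get("no_tabs")
result = rule.check("x = 1\n", "demo.py") if rule is not None else None
```

Because `Rule` is a `Protocol`, `NoTabsRule` satisfies it structurally without inheriting from it, which is presumably what makes third-party rules pluggable.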

Built-in Rules (`semantic_validation_rules.py`)

Six AST-based, lightweight heuristic rules that avoid executing user code:

| Rule | What it detects |
|---|---|
| `SyntaxCheckRule` | Syntax errors via `ast.parse` |
| `MissingImportRule` | Suspicious imports referencing private non-stdlib modules |
| `BrokenReferenceRule` | References to undefined names (scope-aware, walks all nesting levels) |
| `DuplicateImportRule` | Duplicate relative import statements within a single file |
| `APIMisuseRule` | Dangerous function calls (`eval`, `exec`, `os.system`, `pickle.load`, etc.) via AST analysis |
| `MissingSymbolRule` | Undefined names in functions/methods (including class methods and nested functions) |

DependencyCycleRule is retained as a backward-compatible alias for DuplicateImportRule.
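
To make the AST-based approach concrete, the following sketch shows one way a dangerous-call check could walk `ast.Call` nodes. The lookup tables and the `find_dangerous_calls` helper are assumptions made for illustration; the actual contents of `APIMisuseRule` are not shown in this PR text.

```python
import ast

# Hypothetical lookup tables, limited to calls named in the table above.
DANGEROUS_NAMES = {"eval", "exec"}
DANGEROUS_ATTRS = {"os": {"system"}, "pickle": {"load"}}


def find_dangerous_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, call) pairs for calls matching the tables."""
    hits: list[tuple[int, str]] = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Bare call such as eval(...)
        if isinstance(func, ast.Name) and func.id in DANGEROUS_NAMES:
            hits.append((node.lineno, func.id))
        # Attribute call such as os.system(...)
        elif (
            isinstance(func, ast.Attribute)
            and isinstance(func.value, ast.Name)
            and func.attr in DANGEROUS_ATTRS.get(func.value.id, set())
        ):
            hits.append((node.lineno, f"{func.value.id}.{func.attr}"))
    return sorted(hits)


sample = 'import os\nlabel = "eval"\nos.system("ls")\neval("1 + 1")\n'
print(find_dangerous_calls(sample))  # [(3, 'os.system'), (4, 'eval')]
```

The string literal `"eval"` on line 2 of the sample is not reported because it is an `ast.Constant`, not an `ast.Call`. The source is only parsed, never executed.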

Configuration

validation.semantic.enabled — global toggle (default: true)
validation.semantic.python.enabled — Python-specific toggle (default: true)
validation.semantic.severity_mapping — per-rule severity overrides (info/warn/error)

Severity and Pipeline Integration

Each rule's severity determines how it integrates with the ValidationPipeline:

error → required mode (failure blocks the operation)
warn/info → informational mode (reported but does not block)
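
The mapping above can be sketched as two small functions. The function names match the helpers named in this PR, but the signatures, the `VALID_SEVERITIES` constant, and the rule names in the example mapping are assumptions.

```python
from collections.abc import Mapping

VALID_SEVERITIES = ("info", "warn", "error")


def resolve_severity(rule_name: str, mapping: Mapping[str, str]) -> str:
    """Look up a rule's severity, falling back to "info" when the rule is
    missing from the mapping or its configured value is not recognised."""
    value = mapping.get(rule_name, "info")
    return value if value in VALID_SEVERITIES else "info"


def map_severity_to_mode(severity: str) -> str:
    """Only "error" blocks the operation; everything else is informational."""
    return "required" if severity == "error" else "informational"


# Hypothetical rule names, used only to exercise the three cases.
severity_mapping = {"syntax_check": "error", "api_misuse": "warn", "typo_rule": "fatal"}

for rule_name in ("syntax_check", "api_misuse", "typo_rule", "unlisted_rule"):
    severity = resolve_severity(rule_name, severity_mapping)
    print(rule_name, severity, map_severity_to_mode(severity))
```

Under these assumptions, only `syntax_check` resolves to `required`; the invalid value `"fatal"` and the unlisted rule both fall back to `info` and stay informational.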

Public API

All new types are re-exported from cleveragents.application.services.__init__ for clean
downstream imports.

Spec Alignment

This implementation fulfils the Validation abstraction defined in docs/specification.md:

- Returns structured JSON with mandatory `passed` boolean, optional `data`, and optional `message` fields.
- Always read-only (`writes=false`, `checkpointable=false`).
- Attachable per resource, optionally scoped to a project or plan.
- Integrates as `informational` by default; severity mapping allows promoting individual rules to `required` mode.

Type of Change

- [x] New feature (non-breaking change which adds functionality)

Quality Checklist

- [x] Code follows the project's coding standards (see CONTRIBUTING.md)
- [x] All public/protected methods have argument validation
- [x] Static typing is complete (no `Any` unless justified)
- [x] `nox -s typecheck` passes with no errors
- [x] `nox -s lint` passes with no errors
- [x] Unit tests written/updated (Behave scenarios in `features/`)
- [x] Integration tests written/updated (Robot suites in `robot/`) if applicable
- [x] Coverage remains above 85% (`nox -s coverage_report`)
- [x] No security issues introduced (`nox -s security_scan`)
- [x] No dead code introduced (`nox -s dead_code`)
- [x] Documentation updated if behavior changed

Testing

76 Behave BDD scenarios covering all rules, registry, cache, severity mapping, config keys,
pipeline integration, safe initialisation/cleanup, and edge-case code patterns. Robot Framework
integration tests (10 tests) and ASV performance benchmarks (5 suites) are also included.

Coverage for the two new source modules:

- `semantic_validation_service.py`: 100% (163 stmts, 28 branches)
- `semantic_validation_rules.py`: 99% (237 stmts, 134 branches; 3 partial branch exits remain on fully-covered lines)

Test Commands Run

nox -s unit_tests       # Behave tests
nox -s typecheck        # Type checking
nox -s lint             # Linting
nox -s format           # Formatting
nox -s coverage_report  # Coverage

Related Issues

Closes #207

Implementation Notes

- Non-Python files are automatically skipped (`check_file()` returns `[]` immediately).
- The `SemanticValidationCache` uses an `OrderedDict`-based LRU with a configurable `max_size` (default 4096). All operations are thread-safe via a `threading.Lock`.
- The `APIMisuseRule` inspects `ast.Call` nodes rather than using regex, so it does not false-positive on string literals containing function names like `"eval"`.
- The `BrokenReferenceRule` performs scope-aware name collection across all nesting levels (functions, classes, loops, comprehensions, exception handlers) to minimise false positives.
- `resolve_severity()` gracefully falls back to `info` for unknown rules or invalid severity values in the configuration mapping.
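
For readers unfamiliar with the `OrderedDict`-plus-lock pattern mentioned in the notes above, here is a minimal sketch. `HashKeyedLRU` and its method names are hypothetical, and the use of SHA-256 for the file hash is an assumption; only the `OrderedDict`, the `threading.Lock`, and the `max_size` default of 4096 come from the notes.

```python
from __future__ import annotations

import hashlib
import threading
from collections import OrderedDict


class HashKeyedLRU:
    """Bounded LRU cache keyed by a content hash, guarded by a single lock."""

    def __init__(self, max_size: int = 4096) -> None:
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self._max_size = max_size
        self._lock = threading.Lock()
        self._entries: OrderedDict[str, list[str]] = OrderedDict()

    @staticmethod
    def key_for(source: str) -> str:
        # Assumption: SHA-256 of the file contents serves as the cache key.
        return hashlib.sha256(source.encode("utf-8")).hexdigest()

    def get(self, key: str) -> list[str] | None:
        with self._lock:
            if key not in self._entries:
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]

    def put(self, key: str, value: list[str]) -> None:
        with self._lock:
            self._entries[key] = value
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used


cache = HashKeyedLRU(max_size=2)
key_a, key_b, key_c = (HashKeyedLRU.key_for(s) for s in ("a = 1", "b = 2", "c = 3"))
cache.put(key_a, [])
cache.put(key_b, ["finding"])
cache.get(key_a)      # touching key_a makes key_b the oldest entry
cache.put(key_c, [])  # exceeds max_size, so key_b is evicted
```

Keying on a content hash rather than a path means an unchanged file hits the cache even if it was renamed, while any edit produces a new key.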


CoreRasurae added the Type/Feature label 2026-02-25 23:50:08 +00:00

CoreRasurae added this to the v3.5.0 milestone 2026-02-25 23:50:09 +00:00

CoreRasurae referenced this pull request

2026-02-25 23:50:28 +00:00

feat(validation): add semantic validation service #207

CoreRasurae force-pushed feature/m6-semantic-validation from fce31611ff to 81adaf0cac

2026-02-25 23:55:31 +00:00


CoreRasurae force-pushed feature/m6-semantic-validation from 81adaf0cac to 84fa79515f

2026-02-27 14:25:31 +00:00


CoreRasurae requested review from hamza.khyari 2026-02-27 14:26:58 +00:00

hamza.khyari requested changes 2026-02-27 15:13:24 +00:00

Dismissed

hamza.khyari left a comment

Review: `feat(validation): add semantic validation service`

Pre-PR Checklist (PROTOCOL.md §21)

| # | Check | Status |
|---|-------|--------|
| 1 | Commit message first line matches issue Metadata exactly | PASS |
| 2 | Issue reference footer `ISSUES CLOSED: #207` | PASS |
| 3 | CHANGELOG.md updated | FAIL (not modified) |
| 5 | PR labels: exactly one `Type/`, no `MoSCoW/`, `Points/`, `Priority/`, `enhancement` | PASS (`Type/Feature` only) |
| 6 | No `Any` types in signatures | ISSUES (see P1-2) |
| 7 | Full type annotations on all functions including Robot helpers | ISSUES (see P2-5) |

P1 — Must Fix

P1-1: Missing CHANGELOG.md update
Pre-PR checklist item 3 requires CHANGELOG.md to be updated. It is not touched in this PR.

P1-2: Any in public function signatures

semantic_validation_service.py:288 — config: dict[str, Any] | None = None in SemanticValidationService.__init__
semantic_validation_service.py:385 — as_pipeline_results returns list[dict[str, Any]]
semantic_validation_service.py:406 — normalise_output returns dict[str, Any]
semantic_validation_rules.py:438 — data: dict[str, Any] | None in SemanticCheckResult model

Per DANGER_ZONE.md: "No Any types in signatures." Consider introducing typed dicts (PipelineResultDict, NormalisedOutputDict) or at minimum Mapping[str, object] for the return types. The data field on the Pydantic model is arguably acceptable since it is genuinely unstructured, but the config and return types should be tightened.
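
One possible shape for the typed-dict suggestion is sketched below. The `NormalisedOutputDict` name comes from the comment above; its fields are an assumption taken from the `passed` / `data` / `message` contract in the PR description, and the `normalise` helper is hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import TypedDict


class _RequiredOutput(TypedDict):
    passed: bool  # mandatory per the spec alignment section


class NormalisedOutputDict(_RequiredOutput, total=False):
    data: Mapping[str, object]  # optional, unstructured payload
    message: str                # optional, human-readable summary


def normalise(
    passed: bool,
    message: str | None = None,
    data: Mapping[str, object] | None = None,
) -> NormalisedOutputDict:
    """Build the output dict, omitting optional keys that were not supplied."""
    out: NormalisedOutputDict = {"passed": passed}
    if message is not None:
        out["message"] = message
    if data is not None:
        out["data"] = data
    return out


ok = normalise(True)
failed = normalise(False, message="syntax error", data={"line": 3})
```

Splitting required and optional keys across two classes works on Python versions that predate `typing.NotRequired`, and a type checker can then reject a result that lacks `passed`.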

P1-3: __init__.py exports are incomplete — missing rule classes
SyntaxCheckRule, MissingImportRule, BrokenReferenceRule, APIMisuseRule, MissingSymbolRule, DependencyCycleRule are not exported from __init__.py, yet they are listed in vulture_whitelist.py and used by the Robot helper and Behave steps via direct imports. The __init__.py only exports DuplicateImportRule from the rules module. All six rule classes should be exported and included in __all__.

P1-4: logging used instead of structlog
semantic_validation_service.py:32 uses import logging and logging.getLogger(__name__). The rest of the codebase uses structlog.get_logger(__name__) (see decision_service.py, plan_lifecycle_service.py, etc.). This should use structlog for consistency and to benefit from the secrets-masking structlog processor.

P1-5: Docs claim subprocess.call/run/Popen are detected, but the code does not detect them
docs/reference/semantic_validation.md lists subprocess.call(), subprocess.run(), subprocess.Popen() as detected dangerous calls. APIMisuseRule._DANGEROUS_ATTRS only has entries for os, pickle, and marshal; subprocess is not covered. Either add subprocess detection to the rule or remove the claim from the docs.

P2 — Should Fix

P2-1: DuplicateImportRule registered under misleading name dependency_cycle
The rule's name property returns "dependency_cycle" but it does not detect dependency cycles — it detects duplicate relative imports within a single file. The docstring even says: "This rule does not perform cross-file cycle detection." Consider renaming to "duplicate_import" with DependencyCycleRule as backward-compat alias, or at minimum document the mismatch more prominently.

P2-2: SemanticValidationRule Protocol not exported from __init__.py
The protocol class SemanticValidationRule is defined in the service module but not re-exported through __init__.py. Custom rule authors need it.

P2-3: _is_python_file uses str.endswith with a loop instead of tuple
semantic_validation_service.py:317: any(filename.endswith(ext) for ext in _PYTHON_EXTENSIONS) — more idiomatic as filename.endswith((".py", ".pyi")).
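
A small sketch of the tuple form follows. `is_python_file` is a hypothetical standalone version of the helper, and the extension tuple assumes the two suffixes quoted in the suggestion above.

```python
_PYTHON_EXTENSIONS: tuple[str, ...] = (".py", ".pyi")


def is_python_file(filename: str) -> bool:
    # str.endswith accepts a tuple of suffixes, so no explicit loop is needed.
    return filename.endswith(_PYTHON_EXTENSIONS)


print(is_python_file("service.py"), is_python_file("stubs.pyi"), is_python_file("notes.md"))
# True True False
```

Note that `str.endswith` requires a tuple specifically; passing a list or set raises `TypeError`.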

P2-4: Cache default max_size inconsistency between code and docs
Code: _DEFAULT_CACHE_MAX_SIZE = 4096 (line 162). Docs: "default: 512 entries" (semantic_validation.md). Pick one and align.

P2-5: Behave step type annotations use bare Any for context
All step functions in features/steps/semantic_validation_steps.py annotate context as Any. Should be behave.runner.Context (with TYPE_CHECKING guard if needed).

P2-6: BrokenReferenceRule hardcodes dunder exclusions
Lines 692-698 exclude __name__, __file__, __doc__, __all__, __annotations__. This is incomplete — misses __spec__, __loader__, __package__, __builtins__, __cached__, __path__ which are commonly used at module level.

P2-7: MissingSymbolRule comprehension variable handling
Comprehension iteration variables ([x for x in items]) define x inside the comprehension scope in Python 3. The _collect_function_local_names helper picks up ast.Name nodes in Store context via ast.walk, which should cover this, but it is worth adding an explicit test scenario to confirm that comprehension-scoped variables are not falsely flagged.
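
The behaviour in question can be checked in isolation with the standard `ast` module. This sketch does not use the PR's `_collect_function_local_names`; it only demonstrates that `ast.walk` over a function body visits the comprehension target as an `ast.Name` in `Store` context.

```python
import ast
import builtins

source = """
def total(items):
    return sum([x * 2 for x in items])
"""

func = ast.parse(source).body[0]
assert isinstance(func, ast.FunctionDef)

names = [n for n in ast.walk(func) if isinstance(n, ast.Name)]
stored = {n.id for n in names if isinstance(n.ctx, ast.Store)}
loaded = {n.id for n in names if isinstance(n.ctx, ast.Load)}
params = {a.arg for a in func.args.args}

# Names that are read but never bound locally, passed in, or built in.
undefined = loaded - stored - params - set(dir(builtins))
print(sorted(stored), sorted(undefined))  # ['x'] []
```

Here `x` is collected as a stored name, so a naive "loaded minus stored" check would not flag it. The sketch ignores scoping subtleties, such as a comprehension variable being read outside the comprehension, which is the kind of case an explicit test scenario would pin down.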

P3 — Nit / Optional

P3-1: PR body is empty — should have a summary with bullet points describing the change.

P3-2: _collect_defined_names and _collect_all_scope_names have overlapping logic with subtle differences. Consider refactoring to share a common walker.

P3-3: SemanticCheckResult Pydantic model uses use_enum_values=False which is the default — the explicit setting is unnecessary.

Summary

The code is well-structured with good separation (rules module vs. service module), proper caching, thread safety, and thorough test coverage (48 Behave scenarios, 16 Robot tests, 7 benchmark suites). The architecture follows existing service patterns.

5 P1s and 7 P2s need attention before merge.


CoreRasurae force-pushed feature/m6-semantic-validation from 84fa79515f to e013190ac1

2026-02-27 16:23:28 +00:00


hamza.khyari approved these changes 2026-02-27 17:38:40 +00:00

Dismissed

hamza.khyari commented

2026-02-27 17:39:23 +00:00

The minimal change needed is to add a PR summary and reference the ticket.

CoreRasurae force-pushed feature/m6-semantic-validation from e013190ac1 to 2f54e01270

2026-02-27 18:22:35 +00:00


CoreRasurae dismissed hamza.khyari's review 2026-02-27 18:22:35 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

CoreRasurae force-pushed feature/m6-semantic-validation from 2f54e01270 to f2b85bf587

2026-02-28 16:12:32 +00:00


CoreRasurae force-pushed feature/m6-semantic-validation from f2b85bf587 to a2be3e67b0

2026-02-28 17:47:08 +00:00


CoreRasurae scheduled this pull request to auto merge when all checks succeed 2026-02-28 19:06:11 +00:00

CoreRasurae merged commit a2be3e67b0 into master

2026-02-28 19:06:59 +00:00

freemo added the State/Completed label 2026-03-04 00:58:40 +00:00


Reference: cleveragents/cleveragents-core#449
