Design doc gaps and boundary violations in cleveractors-core #4

Closed
opened 2026-05-21 05:51:43 +00:00 by hurui200320 · 1 comment
Member

Background

cleveractors-core was extracted from cleveragents-core to provide a pure-domain library (actor YAML parsing, validation, Jinja2 preprocessing, and LangGraph compilation) shared by both cleveragents-core and cleverrouter. The extraction was done by an LLM and the result has several documentation gaps and at least one code boundary violation relative to the decisions recorded in ADR-001 and ADR-002.

This issue tracks everything that needs to be fixed or added before the library can be considered properly documented and internally consistent.


1. Boundary violation — acms/index.py must not be in this library

Severity: high (code + doc inconsistency)

ADR-001 explicitly states:

cleveractors.acms — Pure UKO vocabulary data;
the indexer / storage / strategy runtime stays in cleveragents-core.

The file src/cleveractors/acms/index.py implements exactly the prohibited "indexer": ACMSIndex (storage + query engine), FileTraversalEngine (filesystem traversal), and supporting data models (IndexEntry, FileType, TierLevel). This implementation is bound to the CLI use case of cleveragents-core and has no relationship to actor YAML parsing or LangGraph compilation. The library's role is to provide building blocks (basic data types, interfaces, etc.); the actual indexer implementation depends on the host's use case.

Action required:

  • Move src/cleveractors/acms/index.py back to cleveragents-core under its ACMS subsystem.
  • Update src/cleveractors/acms/__init__.py to remove any re-export of these symbols.
  • Verify no other module in cleveractors imports from acms.index.

2. Missing port — cleveractors.ports.provider_registry

Severity: high (incomplete public surface)

docs/index.md references cleveractors.ports.provider_registry but the file does not exist. Without a pluggable provider mechanism, consumers cannot make actual LLM calls through the library — the library would be unusable as a standalone compile-and-run unit.

A ProviderRegistryPort Protocol (analogous to ToolRegistryPort in ADR-002) should be introduced so that hosts (cleveragents-core, cleverrouter, or third-party consumers) can supply their own LangChain provider configuration
without the library hardcoding a dependency on any particular provider registry.

Action required:

  • Create src/cleveractors/ports/provider_registry.py with a ProviderRegistryPort Protocol exposing the minimum surface the library needs (e.g. get(provider: str, model: str) -> BaseChatModel | None).
  • Add a corresponding ADR (ADR-003 or similar) documenting the decision.
  • Update docs/specification.md to list the new port in the "Extension points (Protocols)" table.
  • Update docs/index.md to remove the dangling "or supply via ..." note and replace it with the correct reference once the port exists.

3. Missing ADR — Actor Abstraction Definition

Severity: medium (missing design doc)

cleveractors is the canonical home for actor parsing and compilation, yet it has no document formally defining what an actor is. The foundational definition lives only in cleveragents-core/docs/adr/ADR-031. Consumers of this library need to understand:

  • The canonical actor definition (YAML-configured conversational unit, actor-as-graph principle)
  • Hierarchical composition rules (named references, load order, circular reference prohibition)
  • Actor vs Agent distinction
  • Actor roles in the plan lifecycle
  • Node types (AGENT, TOOL, CONDITIONAL, SUBGRAPH) and what they represent
  • Capability acquisition (via skills for tools, via LSP for language intelligence)

Action required:
Add docs/adr/ADR-003-actor-abstraction-definition.md derived from cleveragents-core ADR-031, scoped to what the library implements (parsing, schema, compilation) and cross-referencing the host for runtime concerns (plan lifecycle, invariant reconciliation, etc.).


4. Missing ADR — Jinja2 YAML Template Preprocessing

Severity: medium (missing design doc)

The two-phase preprocessing pipeline is a significant design decision implemented here (templates/secure_renderer.py, actor/yaml_template_engine.py) but has no ADR in this repo. Relevant decisions include:

  • Sandboxed Jinja2 execution (SandboxedEnvironment) — security constraint
  • Template protection for system_prompt fields (deferred rendering with sentinel markers)
  • Phase ordering: Jinja2 first, then ${VAR} env-var interpolation
  • Custom filters (yaml, indent, sum, selectattr) and built-in exposure (range, len, etc.)
  • Automatic type coercion of env-var substitution results

Action required:
Add docs/adr/ADR-004-jinja2-yaml-template-preprocessing.md derived from cleveragents-core ADR-032.


5. Missing reference docs — Actor YAML schema, compiler, config guide, hierarchy extensions, and examples

Severity: medium (missing documentation)

The library owns the actor YAML format and compiler but ships only a single trivial example (docs/my-actor.yaml). Consumers have no reference for:

Missing doc Content
docs/reference/actors_schema.md All top-level fields, actor types (llm, tool, graph), node type schemas, edge schema, validation rules, error messages
docs/reference/actor_compiler.md Compilation pipeline steps, node-type mappings, LSP binding extraction, error types and when they fire
docs/reference/actor_config.md Practical guide: basic structure, hierarchical graph examples, graph topology rules, common authoring errors
docs/reference/actor_hierarchy.md Per-node LSP bindings, tool-source references (skill/mcp/builtin), subgraph actor references, actor-level skills/lsp/lsp_capabilities fields
docs/reference/actors_examples.md Simple LLM actors, strategist/executor/reviewer/estimation actors, tool-only actors, linear and branching graph workflows, hierarchical actor graphs

Action required:
Port and adapt the five reference docs listed above from cleveragents-core. Scope each doc to what the library provides (schema + compile); strip references to host-only concerns (plan lifecycle, CLI commands, actor registry, reactive routing).


6. Missing API reference — public surface

Severity: low (missing documentation)

No API reference documents the public classes and functions:

  • ActorConfigSchema — Pydantic model fields and validators
  • ActorLoaderdiscover(), load(), validate() methods and their parameters
  • compile_actor() / CompiledActor — inputs, outputs, what CompilationMetadata contains
  • Error hierarchy — ActorCompilationError, MissingNodeError, InvalidEntryExitError, SubgraphCycleError
  • LspBinding, VocabularyRegistry, ToolRegistryPort, ProviderRegistryPort — data shapes and usage

Action required:
Add docs/api/actor.md documenting the above. Can be derived from cleveragents-core/docs/api/actor.md, removing the InvariantReconciliationActorsection (host-only) and updating import paths tocleveractors.*`.


7. Undocumented internal modules not mentioned in spec

Severity: low (documentation clarity)

docs/specification.md lists the public API surface but is silent on several modules that exist in the source tree. Consumers cannot tell if these are part of the public API or internal implementation details:

  • cleveractors.actor.role_validation
  • cleveractors.actor.utils
  • cleveractors.actor.yaml_template_engine
  • cleveractors.langgraph.dynamic_router
  • cleveractors.langgraph.message_router
  • cleveractors.langgraph.pure_graph
  • cleveractors.langgraph.routing_adapter

Action required:
Add a section to docs/specification.md titled "Internal modules (not public API)" listing the above with a one-line description of each, or promote any that are intentionally public into the public surface table.


Checklist

  • Move src/cleveractors/acms/index.py back to cleveragents-core (item 1)
  • Create src/cleveractors/ports/provider_registry.py with
    ProviderRegistryPort Protocol (item 2)
  • Add docs/adr/ADR-003-actor-abstraction-definition.md (item 3)
  • Add docs/adr/ADR-004-jinja2-yaml-template-preprocessing.md (item 4)
  • Add docs/reference/actors_schema.md (item 5)
  • Add docs/reference/actor_compiler.md (item 5)
  • Add docs/reference/actor_config.md (item 5)
  • Add docs/reference/actor_hierarchy.md (item 5)
  • Add docs/reference/actors_examples.md (item 5)
  • Add docs/api/actor.md (item 6)
  • Clarify public vs internal modules in docs/specification.md (item 7)
## Background `cleveractors-core` was extracted from `cleveragents-core` to provide a pure-domain library (actor YAML parsing, validation, Jinja2 preprocessing, and LangGraph compilation) shared by both `cleveragents-core` and `cleverrouter`. The extraction was done by an LLM and the result has several documentation gaps and at least one code boundary violation relative to the decisions recorded in ADR-001 and ADR-002. This issue tracks everything that needs to be fixed or added before the library can be considered properly documented and internally consistent. --- ## 1. Boundary violation — `acms/index.py` must not be in this library **Severity: high (code + doc inconsistency)** ADR-001 explicitly states: > `cleveractors.acms` — Pure UKO vocabulary data; > **the indexer / storage / strategy runtime stays in cleveragents-core.** The file `src/cleveractors/acms/index.py` implements exactly the prohibited "indexer": `ACMSIndex` (storage + query engine), `FileTraversalEngine` (filesystem traversal), and supporting data models (`IndexEntry`, `FileType`, `TierLevel`). This implementation is bound to the CLI use case of `cleveragents-core` and has no relationship to actor YAML parsing or LangGraph compilation. The library's role is to provide building blocks (basic data types, interfaces, etc.); the actual indexer implementation depends on the host's use case. **Action required:** - Move `src/cleveractors/acms/index.py` back to `cleveragents-core` under its ACMS subsystem. - Update `src/cleveractors/acms/__init__.py` to remove any re-export of these symbols. - Verify no other module in `cleveractors` imports from `acms.index`. --- ## 2. Missing port — `cleveractors.ports.provider_registry` **Severity: high (incomplete public surface)** `docs/index.md` references `cleveractors.ports.provider_registry` but the file does not exist. Without a pluggable provider mechanism, consumers cannot make actual LLM calls through the library — the library would be unusable as a standalone compile-and-run unit. A `ProviderRegistryPort` Protocol (analogous to `ToolRegistryPort` in ADR-002) should be introduced so that hosts (`cleveragents-core`, `cleverrouter`, or third-party consumers) can supply their own LangChain provider configuration without the library hardcoding a dependency on any particular provider registry. **Action required:** - Create `src/cleveractors/ports/provider_registry.py` with a `ProviderRegistryPort` Protocol exposing the minimum surface the library needs (e.g. `get(provider: str, model: str) -> BaseChatModel | None`). - Add a corresponding ADR (ADR-003 or similar) documenting the decision. - Update `docs/specification.md` to list the new port in the "Extension points (Protocols)" table. - Update `docs/index.md` to remove the dangling "or supply via ..." note and replace it with the correct reference once the port exists. --- ## 3. Missing ADR — Actor Abstraction Definition **Severity: medium (missing design doc)** `cleveractors` is the canonical home for actor parsing and compilation, yet it has no document formally defining what an actor *is*. The foundational definition lives only in `cleveragents-core/docs/adr/ADR-031`. Consumers of this library need to understand: - The canonical actor definition (YAML-configured conversational unit, actor-as-graph principle) - Hierarchical composition rules (named references, load order, circular reference prohibition) - Actor vs Agent distinction - Actor roles in the plan lifecycle - Node types (AGENT, TOOL, CONDITIONAL, SUBGRAPH) and what they represent - Capability acquisition (via skills for tools, via LSP for language intelligence) **Action required:** Add `docs/adr/ADR-003-actor-abstraction-definition.md` derived from `cleveragents-core` ADR-031, scoped to what the library implements (parsing, schema, compilation) and cross-referencing the host for runtime concerns (plan lifecycle, invariant reconciliation, etc.). --- ## 4. Missing ADR — Jinja2 YAML Template Preprocessing **Severity: medium (missing design doc)** The two-phase preprocessing pipeline is a significant design decision implemented here (`templates/secure_renderer.py`, `actor/yaml_template_engine.py`) but has no ADR in this repo. Relevant decisions include: - Sandboxed Jinja2 execution (`SandboxedEnvironment`) — security constraint - Template protection for `system_prompt` fields (deferred rendering with sentinel markers) - Phase ordering: Jinja2 first, then `${VAR}` env-var interpolation - Custom filters (`yaml`, `indent`, `sum`, `selectattr`) and built-in exposure (`range`, `len`, etc.) - Automatic type coercion of env-var substitution results **Action required:** Add `docs/adr/ADR-004-jinja2-yaml-template-preprocessing.md` derived from `cleveragents-core` ADR-032. --- ## 5. Missing reference docs — Actor YAML schema, compiler, config guide, hierarchy extensions, and examples **Severity: medium (missing documentation)** The library owns the actor YAML format and compiler but ships only a single trivial example (`docs/my-actor.yaml`). Consumers have no reference for: | Missing doc | Content | |---|---| | `docs/reference/actors_schema.md` | All top-level fields, actor types (`llm`, `tool`, `graph`), node type schemas, edge schema, validation rules, error messages | | `docs/reference/actor_compiler.md` | Compilation pipeline steps, node-type mappings, LSP binding extraction, error types and when they fire | | `docs/reference/actor_config.md` | Practical guide: basic structure, hierarchical graph examples, graph topology rules, common authoring errors | | `docs/reference/actor_hierarchy.md` | Per-node LSP bindings, tool-source references (`skill`/`mcp`/`builtin`), subgraph actor references, actor-level `skills`/`lsp`/`lsp_capabilities` fields | | `docs/reference/actors_examples.md` | Simple LLM actors, strategist/executor/reviewer/estimation actors, tool-only actors, linear and branching graph workflows, hierarchical actor graphs | **Action required:** Port and adapt the five reference docs listed above from `cleveragents-core`. Scope each doc to what the library provides (schema + compile); strip references to host-only concerns (plan lifecycle, CLI commands, actor registry, reactive routing). --- ## 6. Missing API reference — public surface **Severity: low (missing documentation)** No API reference documents the public classes and functions: - `ActorConfigSchema` — Pydantic model fields and validators - `ActorLoader` — `discover()`, `load()`, `validate()` methods and their parameters - `compile_actor()` / `CompiledActor` — inputs, outputs, what `CompilationMetadata` contains - Error hierarchy — `ActorCompilationError`, `MissingNodeError`, `InvalidEntryExitError`, `SubgraphCycleError` - `LspBinding`, `VocabularyRegistry`, `ToolRegistryPort`, `ProviderRegistryPort` — data shapes and usage **Action required:** Add `docs/api/actor.md` documenting the above. Can be derived from cleveragents-core/docs/api/actor.md`, removing the `InvariantReconciliationActor` section (host-only) and updating import paths to `cleveractors.*`. --- ## 7. Undocumented internal modules not mentioned in spec **Severity: low (documentation clarity)** `docs/specification.md` lists the public API surface but is silent on several modules that exist in the source tree. Consumers cannot tell if these are part of the public API or internal implementation details: - `cleveractors.actor.role_validation` - `cleveractors.actor.utils` - `cleveractors.actor.yaml_template_engine` - `cleveractors.langgraph.dynamic_router` - `cleveractors.langgraph.message_router` - `cleveractors.langgraph.pure_graph` - `cleveractors.langgraph.routing_adapter` **Action required:** Add a section to `docs/specification.md` titled "Internal modules (not public API)" listing the above with a one-line description of each, or promote any that are intentionally public into the public surface table. --- ## Checklist - [ ] Move `src/cleveractors/acms/index.py` back to `cleveragents-core` (item 1) - [ ] Create `src/cleveractors/ports/provider_registry.py` with `ProviderRegistryPort` Protocol (item 2) - [ ] Add `docs/adr/ADR-003-actor-abstraction-definition.md` (item 3) - [ ] Add `docs/adr/ADR-004-jinja2-yaml-template-preprocessing.md` (item 4) - [ ] Add `docs/reference/actors_schema.md` (item 5) - [ ] Add `docs/reference/actor_compiler.md` (item 5) - [ ] Add `docs/reference/actor_config.md` (item 5) - [ ] Add `docs/reference/actor_hierarchy.md` (item 5) - [ ] Add `docs/reference/actors_examples.md` (item 5) - [ ] Add `docs/api/actor.md` (item 6) - [ ] Clarify public vs internal modules in `docs/specification.md` (item 7)
Author
Member

Post-review fixes applied (commit 6d24090)

The following issues identified during multi-agent review of PR #5 have been addressed:

1. provider field missing from reference docs (Major)

Files: docs/reference/actors_schema.md, docs/reference/actor_config.md

Added provider as a required field to LLM Actor and GRAPH Actor sections in actors_schema.md. Added full provider field definition with description, common values, and cross-reference to ADR-005. Added provider to all four major YAML examples. Added missing provider validation error message. Updated actor_config.md with provider in basic structure and hierarchical graph examples, and added the missing provider error case.

Both commits squashed into a single commit with proper ISSUES CLOSED: #4 footer, per the project's Conventional Changelog standard. The single-commit structure also resolves the branch hygiene violation (two commits for one issue).

3. Prohibited # type: ignore in new file smoke_steps.py (Major)

File: features/steps/smoke_steps.py

Removed # type: ignore[import-untyped] from the behave import. Since features/ is excluded from pyright's scope in pyrightconfig.json, no suppression is needed.

4. Weak BDD assertion (Major)

Files: features/smoke.feature, features/steps/smoke_steps.py

Changed then_has_assistant_message to then_has_stub_assistant_message which verifies the precise content "stub response" rather than just checking for role == "assistant". This ensures the registry was actually resolved and the agent invoked, not that the graceful fallback path triggered.

5. Fallback message format now matches ADR-005 (Minor)

File: src/cleveractors/langgraph/nodes.py

Fixed _execute_agent fallback to produce "Agent {provider}/{model} not found" when both provider and model are available in metadata, matching the ADR-005 specification. Previously it emitted "Agent {model} not found".

6. Three new BDD scenarios added (Minor)

File: features/smoke.feature

Added BDD scenarios for:

  • Graceful fallback when provider_registry.get() returns None
  • Graceful fallback when no provider_registry is supplied
  • Pre-resolved agents dict takes precedence over provider registry (registry not called)

Total: 7 scenarios, 30 steps. All pass.

7. PR title "WIP:" prefix removed (Minor)

PR title changed from "WIP: Sync doc..." to "fix: sync doc with extraction scope, remove boundary violation, add ProviderRegistryPort".

8. Type/Documentation label added to PR

Quality gates (all passing)

nox -e lint       ✅ PASS
nox -e typecheck  ✅ PASS (0 errors, 0 warnings)
nox -e unit_tests ✅ PASS (7 scenarios, 30 steps)
nox -e coverage_report ✅ PASS

Deferred items

  • The two pre-existing # type: ignore comments in src/cleveractors/langgraph/nodes.py (lines 17, 20) were not removed — they are pre-existing and fixing them requires a stub file for cleveractors.agents.tool, which is out of scope for this PR.
  • features/mocks/ directory not created — this repo does not have a features/mocks/ convention in its project configuration.
  • Moving acms/index.py back into cleveragents-core — the file already exists there at cleveragents/acms/index.py. Any updates needed are tracked separately.
## Post-review fixes applied (commit `6d24090`) The following issues identified during multi-agent review of PR #5 have been addressed: ### 1. `provider` field missing from reference docs (Major) **Files:** `docs/reference/actors_schema.md`, `docs/reference/actor_config.md` Added `provider` as a required field to LLM Actor and GRAPH Actor sections in `actors_schema.md`. Added full `provider` field definition with description, common values, and cross-reference to ADR-005. Added `provider` to all four major YAML examples. Added missing `provider` validation error message. Updated `actor_config.md` with `provider` in basic structure and hierarchical graph examples, and added the missing `provider` error case. ### 2. Missing `ISSUES CLOSED: #4` footer in commits (Major) Both commits squashed into a single commit with proper `ISSUES CLOSED: #4` footer, per the project's Conventional Changelog standard. The single-commit structure also resolves the branch hygiene violation (two commits for one issue). ### 3. Prohibited `# type: ignore` in new file smoke_steps.py (Major) **File:** `features/steps/smoke_steps.py` Removed `# type: ignore[import-untyped]` from the `behave` import. Since `features/` is excluded from pyright's scope in `pyrightconfig.json`, no suppression is needed. ### 4. Weak BDD assertion (Major) **Files:** `features/smoke.feature`, `features/steps/smoke_steps.py` Changed `then_has_assistant_message` to `then_has_stub_assistant_message` which verifies the precise content `"stub response"` rather than just checking for `role == "assistant"`. This ensures the registry was actually resolved and the agent invoked, not that the graceful fallback path triggered. ### 5. Fallback message format now matches ADR-005 (Minor) **File:** `src/cleveractors/langgraph/nodes.py` Fixed `_execute_agent` fallback to produce `"Agent {provider}/{model} not found"` when both `provider` and `model` are available in metadata, matching the ADR-005 specification. Previously it emitted `"Agent {model} not found"`. ### 6. Three new BDD scenarios added (Minor) **File:** `features/smoke.feature` Added BDD scenarios for: - Graceful fallback when `provider_registry.get()` returns `None` - Graceful fallback when no `provider_registry` is supplied - Pre-resolved agents dict takes precedence over provider registry (registry not called) Total: 7 scenarios, 30 steps. All pass. ### 7. PR title "WIP:" prefix removed (Minor) PR title changed from `"WIP: Sync doc..."` to `"fix: sync doc with extraction scope, remove boundary violation, add ProviderRegistryPort"`. ### 8. `Type/Documentation` label added to PR ### Quality gates (all passing) ``` nox -e lint ✅ PASS nox -e typecheck ✅ PASS (0 errors, 0 warnings) nox -e unit_tests ✅ PASS (7 scenarios, 30 steps) nox -e coverage_report ✅ PASS ``` ### Deferred items - The two pre-existing `# type: ignore` comments in `src/cleveractors/langgraph/nodes.py` (lines 17, 20) were not removed — they are pre-existing and fixing them requires a stub file for `cleveractors.agents.tool`, which is out of scope for this PR. - `features/mocks/` directory not created — this repo does not have a `features/mocks/` convention in its project configuration. - Moving `acms/index.py` back into `cleveragents-core` — the file already exists there at `cleveragents/acms/index.py`. Any updates needed are tracked separately.
hurui200320 2026-05-22 03:25:52 +00:00
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveractors-core#4
No description provided.