[AUTO-INF-8] Add missing integration and benchmark coverage for core graph and TUI modules #9781

Open
opened 2026-04-15 15:35:38 +00:00 by HAL9000 · 0 comments

Overview

  • Current CI failure rate is 69.7%; missing integration and benchmark coverage in critical modules (LangGraph, Reactive, application workflows, TUI) allows regressions to pass until late stages.
  • Repository already has 624 Behave feature files, 316 Robot suites, and 231 ASV benchmarks, yet several high-churn packages still lack one or more test levels, creating blind spots that map directly to recent failures.

Duplicate Check

  • Searched open issues for the tag [AUTO-INF-8] and found only #9688 (cryptography CVE patch), which targets dependency security rather than test coverage. No existing issue covers multi-level test gaps by module.

Module Gap Findings

| Module / Package | Missing Levels | Notes |
| --- | --- | --- |
| `cleveragents/langgraph` | Integration, Benchmark | Behave features exist, but there are no `langgraph_*.robot` suites and no ASV benches for graph execution, routing, or adapter performance. |
| `cleveragents/reactive` | Integration, Benchmark | Streaming/router code is only covered by unit Behave tests; no dedicated Robot coverage or streaming throughput benchmarks. |
| `cleveragents/agents` (graphs submodule) | Integration, Benchmark | Context analysis, plan generation, and auto-debug graphs lack end-to-end Robot suites and performance baselines. |
| `cleveragents/application` (container & workflows) | Integration, Benchmark | Dependency-injection container and workflow orchestrations have Behave coverage only; no Robot validation of wiring and no ASV cold-start timings. |
| `cleveragents/tui` & subpackages | Benchmarks; integration gaps for persona/input/permissions | Only two smoke Robot suites exist; no performance benchmarks for startup or slash catalog rendering; persona/input/permissions lack Robot coverage. |
| `cleveragents/infrastructure/observability` | Benchmark | Metrics emitter/processor code has unit + Robot tests but no latency or throughput benchmarks. |
| `cleveragents/lsp`, `cleveragents/mcp` | Benchmark (partial) | Only registry/runtime benches exist; transport/client components are unmeasured. |
| `cleveragents/shared` (`redaction.py`) | Dedicated Unit, Integration, Benchmark | Functionality verified indirectly; lacks focused Behave/Robot/ASV coverage of redaction hot paths. |
| `cleveragents/domain/models/{auth,orguserconfig}` | Integration, Benchmark | Unit coverage present; missing persistence round-trip Robot coverage and serialization benchmarks. |
| `cleveragents/templates` | Benchmark | Template rendering lacks ASV timing coverage. |
| `cleveragents/acp`, `cleveragents/runtime` | All levels | Packages only contain `__pycache__`; they should either be implemented or documented as stubs to avoid silent omissions. |
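
The last row (packages that contain only `__pycache__`) could be detected automatically rather than by inspection. The following is a minimal sketch, not an existing repository script; the function names `is_placeholder_package` and `find_placeholder_packages` are hypothetical.

```python
from pathlib import Path


def is_placeholder_package(package_dir: Path) -> bool:
    """Return True when the directory holds nothing except __pycache__."""
    entries = [p.name for p in package_dir.iterdir()]
    # An empty directory, or one whose only entry is __pycache__, has no source.
    return all(name == "__pycache__" for name in entries)


def find_placeholder_packages(src_root: Path) -> list[str]:
    """List immediate subdirectories of src_root that contain no source files."""
    return sorted(
        child.name
        for child in src_root.iterdir()
        if child.is_dir()
        and child.name != "__pycache__"
        and is_placeholder_package(child)
    )
```

Pointed at `src/cleveragents`, a check like this would be expected to report `acp` and `runtime` if the finding above still holds.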

Recommendations

High Impact (execute next)

  1. LangGraph: add robot/langgraph_graph_execution.robot and robot/langgraph_routing.robot suites plus benchmarks/langgraph_graph_bench.py to measure graph.invoke() and router latency.
  2. Reactive streaming: author Robot suites for application.py and route_bridge.py, and add benchmarks/reactive_stream_bench.py for throughput checks.
  3. Agents graphs: create Robot coverage for context-analysis and auto-debug graphs, then consolidate graph invocation benchmarks in benchmarks/agents_graph_bench.py.
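
As a rough illustration of what a graph benchmark module might look like, the sketch below follows ASV's convention of a `setup()` method plus methods prefixed with `time_`. The `_StubGraph` class is a hypothetical stand-in so the example is self-contained; a real `benchmarks/langgraph_graph_bench.py` would build the project's actual graph in `setup()` instead.

```python
class _StubGraph:
    """Hypothetical stand-in for a compiled graph; not the cleveragents API."""

    def __init__(self, steps):
        self._steps = steps

    def invoke(self, state):
        # Run each node in order, threading the state through.
        for step in self._steps:
            state = step(state)
        return state


class GraphInvokeSuite:
    """ASV runs setup() before each benchmark and times the time_* methods."""

    def setup(self):
        increment = lambda s: {**s, "count": s["count"] + 1}
        self.graph = _StubGraph([increment] * 50)  # 50-node linear graph
        self.state = {"count": 0}

    def time_invoke(self):
        self.graph.invoke(self.state)
```

Keeping graph construction in `setup()` means only the `invoke()` call is timed, which is the quantity the LangGraph recommendation asks to measure.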

Medium Impact

  1. Application container/workflows: introduce a Robot smoke suite covering DI wiring and workflow orchestration; add benchmarks/application_container_bench.py to keep startup latency within target.
  2. TUI: extend the Robot smoke suites to persona/permissions/input widgets and add startup & slash-catalog benchmarks (benchmarks/tui_startup_bench.py, benchmarks/tui_slash_catalog_bench.py).
  3. Observability: capture emitter/processor throughput via benchmarks/observability_metrics_bench.py to detect regressions that can back up services.
  4. LSP/MCP transports: add ASV modules for client serialization/deserialization and registry lookup latency.
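
For throughput-style measurements such as the observability item, ASV's `track_` benchmarks return a number instead of being timed. The sketch below assumes a simple in-memory queue as a placeholder for the emitter/processor pair; the class name and event shape are hypothetical and would be replaced by the real components.

```python
import time
from collections import deque


class MetricsThroughputSuite:
    """ASV records the value returned by track_* methods; unit labels it."""

    unit = "events/second"

    def setup(self):
        self.events = [{"name": "latency_ms", "value": i} for i in range(10_000)]

    def track_emit_throughput(self):
        queue = deque()
        start = time.perf_counter()
        for event in self.events:
            queue.append(event)  # placeholder for the emitter
        while queue:
            queue.popleft()  # placeholder for the processor
        elapsed = time.perf_counter() - start
        # Guard against a zero reading on very coarse clocks.
        return len(self.events) / max(elapsed, 1e-9)
```

Tracking events per second rather than wall time makes a regression read as a drop in the plotted value, which matches the "back up services" failure mode described in the observability item.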

Lower Impact (schedule opportunistically)

  1. Shared redaction: add a focused Behave feature, Robot CLI coverage, and benchmarks/shared_redaction_bench.py for large payload scenarios.
  2. Domain auth/orguserconfig models: extend existing Robot sessions to cover DB round trips and add lightweight serialization benchmarks.
  3. ACP / runtime placeholders: either implement scoped behaviour with tests or replace with documented stubs to avoid a false sense of coverage.
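
The "large payload scenarios" for redaction map naturally onto ASV's parameterized benchmarks, where `params` values are passed to both `setup()` and the benchmark method. In the sketch below the `redact` helper and its pattern are hypothetical, since the actual rules in `redaction.py` are not shown in this issue.

```python
import re

# Hypothetical pattern; the real redaction rules may differ.
_SECRET = re.compile(r"(api_key|token)=\S+")


def redact(text: str) -> str:
    """Replace the value of each matched key with a fixed marker."""
    return _SECRET.sub(r"\1=[REDACTED]", text)


class RedactionSuite:
    """ASV runs each benchmark once per entry in params."""

    params = [1_000, 100_000]
    param_names = ["lines"]

    def setup(self, lines):
        self.payload = "\n".join(f"user=u{i} token=abc{i}" for i in range(lines))

    def time_redact(self, lines):
        redact(self.payload)
```

Running the same benchmark at two payload sizes makes it easier to tell a constant overhead regression from one that scales with input length.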

Evidence & Metrics

  • ls features/*.feature | wc -l → 624 unit BDD features in active use.
  • ls robot/*.robot | wc -l → 316 integration suites (e2e/ adds 16 more end-to-end suites).
  • ls benchmarks/*.py | wc -l → 231 ASV benchmarks; none for LangGraph, Reactive, TUI, or application container modules.
  • ls -la src/cleveragents/{acp,runtime}/ shows empty placeholders (only __pycache__).
  • Grep across robot/ and benchmarks/ confirms the absence of langgraph, reactive, application_container, and TUI benchmark files.
  • noxfile.py enforces a 97% coverage threshold; added integration and benchmark coverage helps keep regressions from slipping past that gate unnoticed.
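
The counts and absence checks above could be reproduced in a small script so they can be re-run as coverage is added. This is a sketch under assumptions: the helper names are hypothetical, and it matches keywords against file names only, whereas the grep described above may also have searched file contents.

```python
from pathlib import Path


def count_files(directory: Path, pattern: str) -> int:
    """Equivalent of `ls <directory>/<pattern> | wc -l`."""
    return sum(1 for _ in directory.glob(pattern))


def missing_keywords(directory: Path, pattern: str, keywords: list[str]) -> list[str]:
    """Return the keywords that appear in no matching file name."""
    names = [p.name.lower() for p in directory.glob(pattern)]
    return [kw for kw in keywords if not any(kw in name for name in names)]
```

For example, `missing_keywords(Path("benchmarks"), "*.py", ["langgraph", "reactive", "application_container", "tui"])` would be expected to return all four keywords while the gaps listed in this issue remain open.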
Reference
cleveragents/cleveragents-core#9781