Add ASV benchmarks for LangGraph and TUI #9046

Open
opened 2026-04-14 06:29:01 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • LangGraph orchestrator code is covered only by Behave scenarios (e.g. features/consolidated_langgraph.feature) and incidental Robot tests; it has no ASV benchmarks to catch performance regressions in langgraph.graph and langgraph.nodes.
  • The Textual TUI package has multiple Behave specs (features/tui_app_coverage.feature, tui_shell_exec_coverage.feature) and Robot smoke tests (robot/tui_smoke.robot), yet the benchmarks/ suite contains no entries that exercise TUI rendering or command routing.

Findings

| Module | Behave coverage | Robot coverage | Missing level |
| --- | --- | --- | --- |
| langgraph | features/consolidated_langgraph.feature exercises bridge, dynamic router, nodes, and state helpers. | robot/rxpy_route_validation.robot runs actor run with LangGraph routes (lines 65-82) and robot/plan_generation_graph.robot hits the LangGraph checkpoint path. | No benchmarks/*langgraph* files (grep over benchmarks/*.py returned no matches). |
| tui | features/tui_app_coverage.feature and related specs cover the Textual app, session view, and slash catalog. | robot/tui_smoke.robot launches python -m cleveragents tui --headless and exercises widgets and screens. | No benchmarks/*tui* files in the ASV suite (grep over benchmarks/*.py returned no matches). |

The Behave/Robot suites verify that both modules behave correctly, but performance regressions (e.g., in graph execution throughput or TUI render latency) would land unnoticed without ASV coverage. Both modules rely on event loops and rendering hooks that commonly regress under load.

Proposal

  1. Add ASV benchmarks targeting LangGraph execution hotspots (e.g., GraphExecutor.step, nodes.RouterNode.route, checkpoint read/write) using representative plan graphs.
  2. Add ASV benchmarks for the TUI layer that measure layout/render cycle time and slash-command catalog hydration with realistic persona/session fixtures.
  3. Wire the new benchmarks into asv.conf.json and ensure they run under the existing nox -s benchmark session.
  4. Document the new benchmark entry points in docs/development/testing.md so infrastructure workers can run them manually when investigating regressions.
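
As a rough illustration of items 1 and 2, the sketch below shows the general shape an ASV benchmark module could take. The file name, the toy graph walker, and the catalog loader are hypothetical stand-ins, not project code; a real suite would import the project's executor and slash-catalog code instead. Only the ASV conventions are taken as given: `time_*` methods are timed, `setup()` runs outside the timed region, and `params`/`param_names` parameterize each run.

```python
"""Sketch of benchmarks/bench_langgraph_tui.py (hypothetical file name)."""

# Stand-ins only: the real suite would import the project's executor and
# catalog code instead of these toy functions.


def build_chain_graph(n_nodes):
    """Return a linear graph as {node: next_node}, ending in None."""
    return {i: (i + 1 if i + 1 < n_nodes else None) for i in range(n_nodes)}


def run_graph(graph, start=0):
    """Walk the graph one step at a time and return the number of steps."""
    steps = 0
    node = start
    while node is not None:
        node = graph[node]
        steps += 1
    return steps


def hydrate_catalog(raw_commands):
    """Build a '/name' -> description lookup, as a catalog loader might."""
    return {f"/{name}": desc for name, desc in raw_commands}


class GraphStepSuite:
    # ASV runs each time_* method once per value in params.
    params = [10, 100, 1000]
    param_names = ["n_nodes"]

    def setup(self, n_nodes):
        # setup() is excluded from the timed region.
        self.graph = build_chain_graph(n_nodes)

    def time_run_graph(self, n_nodes):
        run_graph(self.graph)


class SlashCatalogSuite:
    params = [50, 500]
    param_names = ["n_commands"]

    def setup(self, n_commands):
        self.raw = [(f"cmd{i}", f"description {i}") for i in range(n_commands)]

    def time_hydrate_catalog(self, n_commands):
        hydrate_catalog(self.raw)
```

Parameterizing over graph size and catalog size lets ASV report how cost scales with input, which tends to be more useful for spotting regressions than a single data point.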

Additional Context

  • Behave evidence: features/consolidated_langgraph.feature (lines 1-37) and features/tui_app_coverage.feature (lines 1-39).
  • Robot evidence: robot/rxpy_route_validation.robot (lines 65-82) for LangGraph routes, robot/tui_smoke.robot (lines 6-57) for TUI widgets.
  • Benchmark gap: grep 'langgraph' benchmarks/*.py and grep 'tui' benchmarks/*.py both returned 'No files found' on commit c76fded5022c9f3d7f54efcf91746197e864af4e.
  • Closed issue #8333 flagged similar gaps but was closed as 'superseded by next cycle' without the benchmarks ever landing; the absence persists on the current master snapshot.
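
The benchmark-gap check above can be approximated in Python for anyone who wants to re-run it as part of a script. This is a minimal sketch, assuming a plain case-sensitive substring match over the top-level `*.py` files in a directory; the function name `files_mentioning` is hypothetical.

```python
from pathlib import Path


def files_mentioning(directory, term):
    """Return sorted names of *.py files in directory whose text contains term."""
    hits = []
    for path in sorted(Path(directory).glob("*.py")):
        # Ignore undecodable bytes so one odd file does not abort the scan.
        text = path.read_text(encoding="utf-8", errors="ignore")
        if term in text:
            hits.append(path.name)
    return hits
```

Calling `files_mentioning("benchmarks", "langgraph")` and `files_mentioning("benchmarks", "tui")` from the repository root would return an empty list while the gap described here persists.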

Duplicate Check

  • 2026-04-14: GET /api/v1/repos/cleveragents/cleveragents-core/issues?state=open&limit=50&page=1..3 filtered for langgraph + benchmark → no matches.
  • 2026-04-14: Same query filtered for tui + benchmark → no matches.
  • 2026-04-14: Reviewed closed issue #8333 ([AUTO-INF-12] Missing Test Levels Identified) — closed as superseded; gaps still present in current tree.
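
The duplicate check can be sketched as two small helpers: one that builds the paged issue-list URLs and one that filters the returned issues by keyword. The host name is a placeholder, the helper names are hypothetical, and the exact filtering the bot applied is not stated in this issue, so the case-insensitive title/body match below is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "https://git.example.invalid"  # hypothetical host, not from the issue


def issue_page_urls(owner, repo, pages=(1, 2, 3), limit=50, state="open"):
    """Build one issue-list URL per page for the given repository."""
    path = f"/api/v1/repos/{owner}/{repo}/issues"
    return [
        f"{BASE_URL}{path}?{urlencode({'state': state, 'limit': limit, 'page': page})}"
        for page in pages
    ]


def matching_issues(issues, *terms):
    """Keep issues whose title or body contains every term (case-insensitive)."""
    matches = []
    for issue in issues:
        haystack = f"{issue.get('title', '')} {issue.get('body') or ''}".lower()
        if all(term.lower() in haystack for term in terms):
            matches.append(issue)
    return matches
```

With the decoded JSON from each page collected into one list, `matching_issues(issues, "langgraph", "benchmark")` returning an empty list corresponds to the "no matches" result recorded above.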

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker

HAL9000 added this to the v3.9.0 milestone 2026-04-14 06:43:43 +00:00
Author
Owner

Verified — Test coverage: ASV benchmarks for LangGraph and TUI. MoSCoW: Could-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#9046