[AUTO-INF-9] Optimize CI Execution Time: benchmark_regression job #8334

Closed
opened 2026-04-13 09:11:20 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • The benchmark_regression job routinely extends CI wall-clock time by nearly an hour (57 minutes in run 12591) because it replays the full ASV suite and uploads ~94 MB of artifacts on every pull request.
  • The job reinstalls 90+ Python packages and performs multiple aws s3 sync operations even when no benchmark code changed, compounding runtime and runner IO cost.
  • Making the job optional or incremental would trim the longest recent CI runs (e.g., workflow run #4821 at 2 h 11 m) by more than 40%.
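The "more than 40%" estimate follows from the two durations quoted above. This sketch assumes the same 57-minute benchmark job sits on the critical path of run #4821 and that removing it removes the full 57 minutes from that run. The evidence below does not state that directly.

```python
# Figures quoted in the Summary.
benchmark_job_minutes = 57          # benchmark_regression session, run 12591
longest_run_minutes = 2 * 60 + 11   # workflow run #4821, 2 h 11 m

# Assumption: the benchmark job is on run #4821's critical path, so making it
# optional would shorten that run by the job's full duration.
saved_fraction = benchmark_job_minutes / longest_run_minutes
print(f"{saved_fraction:.1%}")  # prints 43.5%
```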

Evidence

  • /app/ci_logs/pr6313-benchmark-regression_(pull_request)-run12591-job8.log: nox > Session benchmark_regression was successful in 57 minutes (2026-04-10T18:46:25Z – 19:44:31Z).
  • Artifact upload from the same run: asv-results-pr sized 93 645 621 bytes.
  • Forgejo Actions run https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/1915 shows the overall workflow stretched to 18 m 43 s despite other jobs completing in <6 m.
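The raw figures in the evidence can be normalized with a few lines of standard-library Python. The log excerpt gives the end timestamp without a date, so this sketch assumes it falls on the same day (2026-04-10).

```python
from datetime import datetime

# Timestamps and artifact size quoted from run 12591.
# Assumption: the end time (19:44:31Z) is on the same date as the start.
start = datetime.fromisoformat("2026-04-10T18:46:25+00:00")
end = datetime.fromisoformat("2026-04-10T19:44:31+00:00")
artifact_bytes = 93_645_621  # asv-results-pr

# Split the elapsed wall-clock time into whole minutes and seconds.
minutes, seconds = divmod(int((end - start).total_seconds()), 60)
print(f"log span: {minutes} min {seconds} s")

# Report the artifact in decimal megabytes and binary mebibytes.
print(f"artifact: {artifact_bytes / 1e6:.1f} MB ({artifact_bytes / 2**20:.1f} MiB)")
```

The timestamps span 58 min 6 s, in line with the "nearly an hour" figure, and the artifact works out to about 93.6 MB (89.3 MiB), matching the ~94 MB quoted in the Summary.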

Root Cause

  • The ASV regression workflow executes the full benchmark matrix for every PR, regardless of whether benchmark-related code changed, and rebuilds dependencies from scratch. The job also serially syncs historical benchmark data to and from S3, turning what should be a targeted performance guard into the longest part of the pipeline.

Recommendations

  1. Gate the full ASV run behind a label or schedule (e.g., run-benchmarks or nightly) and replace the default PR path with a short smoke benchmark (asv run --quick against top regressions).
  2. Pre-bake a benchmark runner image, or cache the .nox/benchmark_regression environment and previously synced ASV results as a Forgejo cache or artifact, to avoid repeated 90+ package installs and large S3 transfers.
  3. Scope benchmarks by diff: wire ASV’s --bench filter to the changed modules so only affected suites execute.
  4. Surface summary metrics only: upload a condensed comparison (e.g., the output of asv compare) instead of the entire 245 MB tarball unless the full results are explicitly requested.
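Recommendations 1 and 3 together amount to a small decision step that runs before ASV is invoked. The sketch below is hypothetical throughout. It assumes source modules live under src/cleveragents/<module>/, that each has a benchmark file benchmarks/bench_<module>.py, that asv.conf.json and noxfile.py are the suite-wide files, and that the CI event names are the GitHub-style "schedule" and "pull_request" that Forgejo Actions mirrors. The command lists only illustrate the asv run --quick and --bench flags. The actual invocation (commit range, machine and environment options) depends on the existing nox session, which this issue does not show.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

FULL_RUN_LABEL = "run-benchmarks"  # label name suggested in Recommendation 1

# Hypothetical layout: modules under src/cleveragents/<module>/ and one ASV
# benchmark file per module at benchmarks/bench_<module>.py.
SOURCE_ROOT = PurePosixPath("src/cleveragents")
BENCH_ROOT = PurePosixPath("benchmarks")
# Hypothetical files whose changes can affect every benchmark.
SUITE_WIDE_FILES = {"asv.conf.json", "noxfile.py"}


def changed_modules(changed_files: list[str]) -> set[str]:
    """Map changed file paths to the module names their benchmarks use."""
    modules = set()
    for name in changed_files:
        path = PurePosixPath(name)
        if path.is_relative_to(SOURCE_ROOT):
            # First component below the source root: a package dir or a .py file.
            rel = path.relative_to(SOURCE_ROOT)
            modules.add(PurePosixPath(rel.parts[0]).stem)
        elif path.parent == BENCH_ROOT and path.stem.startswith("bench_"):
            # A benchmark file itself changed: rerun that suite.
            modules.add(path.stem.removeprefix("bench_"))
    return modules


def plan_asv_command(
    event: str, labels: set[str], changed_files: list[str]
) -> Optional[list[str]]:
    """Return an asv command for this CI event, or None to skip the job."""
    # Nightly schedule or explicit opt-in label: full benchmark run.
    if event == "schedule" or FULL_RUN_LABEL in labels:
        return ["asv", "run"]
    # Suite-wide configuration changed: quick smoke run of every benchmark.
    if any(f in SUITE_WIDE_FILES for f in changed_files):
        return ["asv", "run", "--quick"]
    modules = changed_modules(changed_files)
    if not modules:
        return None  # nothing benchmark-relevant changed
    # Default PR path: quick run limited to the affected benchmark modules.
    # ASV benchmark names look like bench_<module>.<Class>.<method>.
    alternatives = "|".join(sorted(re.escape(m) for m in modules))
    return ["asv", "run", "--quick", "--bench", f"^bench_({alternatives})\\."]


print(plan_asv_command("pull_request", set(), ["src/cleveragents/agents/core.py"]))
```

Returning None for PRs that touch no benchmarked module is the main saving under these assumptions. The opt-in label still gives reviewers a full run on demand, and the nightly schedule keeps the historical ASV results complete.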

Duplicate Check

  • Queried open issues for benchmark_regression / ASV (April 13, 2026) — no matches found.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker

Owner

superseded by next cycle

Reference
cleveragents/cleveragents-core#8334