[AUTO-INF-1] Reduce benchmark_regression job time by targeting impacted ASV benches #8243

Open
opened 2026-04-13 06:16:02 +00:00 by HAL9000 · 0 comments
Owner

Summary

  • The PR CI pipeline spends roughly 57 minutes on the benchmark_regression job, dominating wall-clock time for every pull request.
  • The job currently runs all 231 Airspeed Velocity benchmark modules and re-syncs ~205 MiB of historical results on every PR, even if the change only touches a single subsystem.

Evidence

  • .forgejo/workflows/ci.yml lines 409-453 configure benchmark-regression to run on each pull_request.
  • ci_logs/pr6363_benchmark-regression.log:2699606 → nox > Session benchmark_regression was successful in 57 minutes.
  • ci_logs/pr6363_benchmark-regression.log:999-1537 → aws s3 sync downloads ~204.8 MiB (297 files) before benchmarks start.
  • uv_uv_run python -c "import glob; print(len(glob.glob('benchmarks/*.py')))" → 231 benchmark modules executed in every run.

Proposal

  1. Add a change-detection helper (e.g. scripts/select_benchmarks.py) that maps the file list printed by git diff --name-only $ASV_BASE_SHA to the relevant benchmark name patterns.
  2. Update the benchmark_regression job to call nox -s benchmark_regression -- --bench <pattern> for only the impacted suites, defaulting to a lightweight smoke set (for example the existing m1_* benches) when no mapping is found.
  3. Provide an escape hatch:
    • run the full suite when the PR is labeled full-benchmark or when changes touch benchmarks/ or the selector helper; and
    • keep the nightly benchmark-publish workflow as-is so the entire suite still runs regularly.
  4. Emit the selected bench patterns in the job log so reviewers can confirm coverage.
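
A minimal sketch of what the selector in steps 1-4 could look like, offered as a starting point rather than a finished design. The path globs and the parser_*/scheduler_* patterns in PATH_TO_BENCH are hypothetical placeholders, since the real mapping depends on the repository layout. Only m1_*, the full-benchmark label, benchmarks/, and scripts/select_benchmarks.py come from the proposal above. Returning None to mean "run the full suite" is also an assumption.

```python
import subprocess
from fnmatch import fnmatch

# Hypothetical mapping from source path globs to benchmark name patterns.
# The real entries would have to be derived from the repository layout.
PATH_TO_BENCH = {
    "src/parser/*": ["parser_*"],
    "src/scheduler/*": ["scheduler_*"],
}

SMOKE_SET = ["m1_*"]              # fallback named in the proposal
FULL_RUN_LABEL = "full-benchmark"  # PR label named in the proposal
# Touching the benchmarks or the selector itself forces a full run.
FULL_RUN_PREFIXES = ("benchmarks/", "scripts/select_benchmarks.py")


def changed_files_from_git(base_sha):
    """Return the paths printed by `git diff --name-only <base_sha>`."""
    result = subprocess.run(
        ["git", "diff", "--name-only", base_sha],
        check=True, capture_output=True, text=True,
    )
    return [line for line in result.stdout.splitlines() if line]


def select_benchmarks(changed_files, labels=()):
    """Return bench patterns to run, or None to request the full suite."""
    if FULL_RUN_LABEL in labels:
        return None
    if any(path.startswith(FULL_RUN_PREFIXES) for path in changed_files):
        return None

    selected = []
    for path in changed_files:
        for glob, patterns in PATH_TO_BENCH.items():
            if fnmatch(path, glob):
                for pattern in patterns:
                    if pattern not in selected:  # keep order, drop repeats
                        selected.append(pattern)
    # No mapping found: fall back to the lightweight smoke set.
    return selected or list(SMOKE_SET)


def describe_selection(selection):
    """Format the selection for the job log so reviewers can check coverage."""
    if selection is None:
        return "benchmark selection: FULL SUITE"
    return "benchmark selection: " + ", ".join(selection)
```

In CI, the job would call changed_files_from_git with $ASV_BASE_SHA, print describe_selection(...), and pass each returned pattern to nox -s benchmark_regression -- --bench <pattern>. How those patterns are interpreted depends on how the nox session forwards them to asv, which this sketch does not cover.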

Duplicate Check

  • curl --compressed ...issues?state=open&limit=50 filtered for benchmark → only ACMS performance feature requests (#8201/#8202), nothing about CI runtime.
  • curl --compressed ...issues?state=open&limit=50 filtered for asv → no results.
  • curl --compressed ...issues?state=closed&limit=50 filtered for benchmark → automation status ticket #7923 only.
  • curl --compressed ...issues?state=open&labels=TEST-INFRA → no CI benchmarking tickets.
  • Confident this gap has not been filed elsewhere.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker

Reference
cleveragents/cleveragents-core#8243