TEST-INFRA: [ci-execution-time] Make benchmark regression job on-demand #2331

Open

opened 2026-04-03 14:32:26 +00:00 by freemo · 1 comment

freemo commented

2026-04-03 14:32:26 +00:00

Owner

Metadata

Branch: task/ci-benchmark-regression-on-demand
Commit Message: ci(ci-execution-time): make benchmark regression job on-demand via PR label
Milestone: v3.8.0
Parent Epic: #1678

Background and Context

The benchmark regression job currently runs on every pull request as part of the standard CI pipeline (see #1991). While this ensures performance regressions are caught early, the benchmark suite is inherently time-consuming — it must execute the full ASV benchmark suite, compare results against a stored baseline, and report deltas. For the vast majority of PRs (documentation updates, minor refactors, configuration changes, test-only changes), this overhead is unnecessary and adds significant wall-clock time to every CI run.

Making the benchmark regression job on-demand would:

Reduce CI execution time for the common case (PRs that do not touch performance-sensitive code paths).
Save shared runner capacity by avoiding expensive benchmark runs on changes that cannot plausibly affect performance.
Preserve correctness guarantees by ensuring benchmarks still run whenever a contributor or reviewer explicitly requests them, as they should when changes touch performance-critical subsystems (plan lifecycle, actor runtime, resource registry, decision recording).

Proposed Solution

Gate the benchmark regression CI job behind a PR label (e.g., run-benchmarks). The Forgejo workflow condition would be:

if: contains(github.event.pull_request.labels.*.name, 'run-benchmarks')

When a contributor or reviewer adds the run-benchmarks label to a PR, the benchmark job is triggered. For all other PRs, the job is skipped entirely. The benchmark job continues to run unconditionally on pushes to master (to keep the stored baseline up to date).
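
One detail worth checking when implementing this: on a `push` event there is no `pull_request` payload, so the label check above would likely evaluate to false and skip the job on `master`. A minimal sketch of a combined condition follows. It assumes Forgejo Actions honours GitHub-style `types:` filters and the `github.event_name` context. The job id `benchmark-regression`, the `docker` runner label, and the `nox -s benchmark_regression` command are hypothetical placeholders, not taken from the existing `ci.yml`.

```yaml
# Fragment of .forgejo/workflows/ci.yml (sketch, not the actual file).
on:
  push:
    branches: [master]
  pull_request:
    # `labeled` is included so that adding the label to an already-open PR
    # starts a new run; the other three are the usual default activity types.
    types: [opened, synchronize, reopened, labeled]

jobs:
  benchmark-regression:        # hypothetical job id
    # Always run on pushes (here limited to master by the trigger above);
    # on pull requests, run only when the run-benchmarks label is present.
    if: github.event_name == 'push' || contains(github.event.pull_request.labels.*.name, 'run-benchmarks')
    runs-on: docker            # assumed runner label
    steps:
      - uses: actions/checkout@v4
      - run: nox -s benchmark_regression   # hypothetical session name
```

Listing `labeled` at the workflow level means that adding any label re-runs the whole workflow, not only the benchmark job. If that proves too noisy, moving the benchmark job into its own workflow file is one possible alternative.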

Subtasks

Add run-benchmarks label to the repository's label set (colour suggestion: #0075ca)
Update .forgejo/workflows/ci.yml to add an if: condition on the benchmark regression job so it only runs when the run-benchmarks label is present on the PR (or on push to master)
Update features/ci_workflow_validation.feature to assert the on-demand condition is present on the benchmark job
Update CONTRIBUTING.md (or relevant docs) to document the run-benchmarks label and when contributors should apply it
Verify all other nox stages continue to pass
Confirm coverage ≥ 97%

Definition of Done

The run-benchmarks label exists in the repository
The benchmark regression CI job is skipped on PRs that do not carry the run-benchmarks label
The benchmark regression CI job runs unconditionally on pushes to master
features/ci_workflow_validation.feature is updated to cover the on-demand condition
CONTRIBUTING.md documents the run-benchmarks label and guidance on when to apply it
All nox stages pass
Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone

2026-04-03 14:32:32 +00:00

freemo commented

2026-04-03 14:33:41 +00:00

Author

Owner

Issue triaged by project owner:

State: Verified
Priority: Low (confirmed)
Milestone: v3.8.0 (confirmed — CI infrastructure)
MoSCoW: Could Have. Making the benchmark regression job on-demand is a CI optimization that does not block any deliverables.
Parent Epic: #1678 (confirmed correct)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo added the MoSCoW/Could have label

2026-04-03 14:33:41 +00:00

freemo added a new dependency

2026-04-03 14:34:23 +00:00

#1678 Epic: CI Execution Time Optimization — Timeouts, Concurrency, and Coverage Artifact Sharing

freemo removed the MoSCoW/Could have label