TEST-INFRA: [ci-execution-time] Make benchmark regression job on-demand #2331

Open
opened 2026-04-03 14:32:26 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: task/ci-benchmark-regression-on-demand
  • Commit Message: ci(ci-execution-time): make benchmark regression job on-demand via PR label
  • Milestone: v3.8.0
  • Parent Epic: #1678

Background and Context

The benchmark regression job currently runs on every pull request as part of the standard CI pipeline (see #1991). While this ensures performance regressions are caught early, the job is inherently time-consuming: it must execute the full ASV benchmark suite, compare results against a stored baseline, and report deltas. For the vast majority of PRs (documentation updates, minor refactors, configuration changes, test-only changes), this overhead is unnecessary and adds significant wall-clock time to the CI run.

Making the benchmark regression job on-demand would:

  • Reduce CI execution time for the common case (PRs that do not touch performance-sensitive code paths).
  • Save shared runner capacity by avoiding expensive benchmark runs on changes that cannot plausibly affect performance.
  • Preserve correctness guarantees by ensuring benchmarks still run whenever a contributor or reviewer explicitly requests them, which is expected for changes that touch performance-critical subsystems (plan lifecycle, actor runtime, resource registry, decision recording).

Proposed Solution

Gate the benchmark regression CI job behind a PR label (e.g., run-benchmarks). The Forgejo workflow condition would be:

if: contains(github.event.pull_request.labels.*.name, 'run-benchmarks')

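When a contributor or reviewer adds the run-benchmarks label to a PR, the benchmark job is triggered; this requires the workflow to react to label changes on pull requests, not only to new commits. For all other PRs, the job is skipped entirely. The benchmark job continues to run unconditionally on pushes to master (to keep the stored baseline up to date). Because github.event.pull_request is not populated on push events, the label check on its own would also skip the job on master, so the final condition must additionally allow push events.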
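
A minimal sketch of how the gate might look in .forgejo/workflows/ci.yml, combining the label check with the push-to-master case. The job name, runner label, and nox session name are hypothetical, and the trigger list assumes Forgejo Actions accepts the same pull_request activity types as GitHub Actions; both should be checked against the existing workflow and the Forgejo Actions reference.

```yaml
# Sketch only: the job name, runner label, and nox session below are
# assumptions, not taken from the repository's actual ci.yml.
on:
  push:
    branches: [master]
  pull_request:
    # 'labeled' re-runs the workflow when a label is added to an open PR,
    # so applying run-benchmarks takes effect without a new commit.
    types: [opened, synchronize, reopened, labeled]

jobs:
  benchmark-regression:          # hypothetical job name
    runs-on: docker              # hypothetical runner label
    # Run on pushes (restricted to master by the trigger above), or on
    # pull requests that carry the run-benchmarks label.
    if: >-
      github.event_name == 'push' ||
      contains(github.event.pull_request.labels.*.name, 'run-benchmarks')
    steps:
      - uses: actions/checkout@v4
      - run: nox -s benchmark_regression   # hypothetical session name
```

With this shape, an unlabeled PR still triggers the workflow for its other jobs, but the benchmark job itself is reported as skipped.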

Subtasks

  • Add run-benchmarks label to the repository's label set (colour suggestion: #0075ca)
  • Update .forgejo/workflows/ci.yml to add an if: condition on the benchmark regression job so it only runs when the run-benchmarks label is present on the PR (or on push to master)
  • Update features/ci_workflow_validation.feature to assert the on-demand condition is present on the benchmark job
  • Update CONTRIBUTING.md (or relevant docs) to document the run-benchmarks label and when contributors should apply it
  • Verify all other nox stages continue to pass
  • Confirm coverage ≥ 97%

Definition of Done

  • The run-benchmarks label exists in the repository
  • The benchmark regression CI job is skipped on PRs that do not carry the run-benchmarks label
  • The benchmark regression CI job runs unconditionally on pushes to master
  • features/ci_workflow_validation.feature is updated to cover the on-demand condition
  • CONTRIBUTING.md documents the run-benchmarks label and guidance on when to apply it
  • All nox stages pass
  • Coverage ≥ 97%

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-03 14:32:32 +00:00
Author
Owner

Issue triaged by project owner:

  • State: Verified
  • Priority: Low (confirmed)
  • Milestone: v3.8.0 (confirmed — CI infrastructure)
  • MoSCoW: Could Have — Making benchmark regression on-demand is a CI optimization. Not blocking any deliverables.
  • Parent Epic: #1678 (confirmed correct)

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo removed this from the v3.8.0 milestone 2026-04-07 01:01:25 +00:00
Reference
cleveragents/cleveragents-core#2331