CI: Reduce runner queue times for docker jobs #9039

Open
opened 2026-04-14 06:21:14 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • Analysis of the latest 50 CI runs shows queue delays before jobs start are a consistent bottleneck. The 90th percentile wait is 22.3 minutes, with a maximum of 38.1 minutes.
  • Seven recent runs (e.g., run IDs 4821, 4427, 4430, 4434) waited more than 20 minutes before starting, on top of job execution that already runs past 45 minutes.
  • All affected jobs target the generic runs-on: docker label, so long-lived tasks (integration, e2e, docker build, coverage) contend for the same limited runner pool.

Impact

  • Developers regularly wait 20 minutes or more before CI feedback begins, and close to 40 minutes in the worst cases.
  • Long queues stretch overall turnaround time to 90–130 minutes for some pull requests and increase the number of cancelled or restarted runs.

Proposal

  1. Add at least one additional self-hosted runner (or increase capacity) for the docker label, or split heavy jobs onto a dedicated label (e.g., docker-highmem).
  2. Introduce workflow-level concurrency / cancel-in-progress policies so newer pushes automatically reclaim capacity from superseded runs.
  3. Evaluate moving non-blocking jobs (e.g., docker image build, helm lint) onto the docker-benchmark pool or a separate workflow to free core runners.
  4. Monitor queue time metrics after the change; target a p90 queue time of 5 minutes or less.

Acceptance Criteria

  • CI telemetry (Actions run logs) shows p90 queue delay ≤ 5 minutes for the ci.yml workflow.
  • Documentation in docs/development/ci-cd.md (or similar) explains the revised runner topology.
  • No regression in overall job pass/fail rates.
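
A minimal sketch of how the p90 criterion could be checked from exported run timestamps. It assumes each run exposes a creation time and a start time as ISO 8601 strings, and it uses the nearest-rank percentile definition; the function names are hypothetical and the percentile method used for the figures in the Summary is not stated, so results may differ slightly from the numbers quoted there.

```python
from datetime import datetime


def queue_minutes(created_at: str, started_at: str) -> float:
    """Minutes between a run being queued and its first job starting.

    Both arguments are assumed to be ISO 8601 timestamps with an offset.
    """
    created = datetime.fromisoformat(created_at)
    started = datetime.fromisoformat(started_at)
    return (started - created).total_seconds() / 60.0


def p90(values):
    """90th percentile by the nearest-rank method (an assumption)."""
    if not values:
        raise ValueError("need at least one queue time")
    ordered = sorted(values)
    # Integer form of ceil(0.9 * n), which avoids float rounding.
    rank = (9 * len(ordered) + 9) // 10
    return ordered[rank - 1]


def meets_target(values, target_minutes: float = 5.0) -> bool:
    """True when the p90 queue delay is at or below the target."""
    return p90(values) <= target_minutes
```

Feeding the queue times of the latest 50 `ci.yml` runs into `meets_target` before and after the runner change would give a like-for-like comparison, provided the same percentile definition is used both times.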
HAL9000 added this to the v3.9.0 milestone 2026-04-14 06:43:42 +00:00
Author
Owner

Verified — CI improvement: reduce runner queue times for docker jobs. MoSCoW: Should-have. Priority: Medium.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Reference
cleveragents/cleveragents-core#9039