TEST-INFRA: [ci-execution-time] Add job-level timeouts to all CI jobs to prevent hung runners #1700

Closed
opened 2026-04-02 23:31:18 +00:00 by freemo · 1 comment
Owner

Metadata

  • Branch: fix/ci-job-timeouts
  • Commit Message: fix(ci): add timeout-minutes to all CI jobs to prevent hung runners
  • Milestone: v3.8.0
  • Parent Epic: #1678

Problem

In .forgejo/workflows/ci.yml, only the e2e_tests job has timeout-minutes: 45. All other jobs — lint, typecheck, security, quality, unit_tests, integration_tests, coverage, build, docker, helm, status-check — have no timeout.

If any of these jobs hangs (e.g., a test waiting on a network resource, a subprocess deadlock, or a flaky tool), the job runs until the runner's global timeout (typically 6 hours), blocking the shared runner pool and delaying all other CI work.

Observed risk: The unit_tests job installs the Helm CLI via curl from an external URL. Without a job-level timeout, a stalled download here would hang the job until the runner's global timeout.

Solution

Add appropriate timeout-minutes values to every job based on expected runtime:

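| Job | Suggested timeout (minutes) |
|-----|-----------------------------|
| lint | 10 |
| typecheck | 10 |
| security | 10 |
| quality | 10 |
| unit_tests | 30 |
| integration_tests | 45 |
| coverage | 30 |
| build | 10 |
| docker | 20 |
| helm | 10 |
| status-check | 5 |
| benchmark-regression | 60 |
| benchmark-publish | 60 |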

Values should be set conservatively (2× expected runtime) to allow for slow runners while still bounding worst-case hang time.
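
As a rough illustration of the table, the sketch below holds the suggested values in a Python mapping and reports jobs whose configured timeout-minutes differs from them. It assumes the workflow file has already been parsed into a dict shaped like {"jobs": {name: {...}}} (for example by a YAML loader). The names SUGGESTED_TIMEOUTS and timeout_mismatches are hypothetical and do not come from the repository.

```python
# Suggested timeouts in minutes, copied from the table in this issue.
SUGGESTED_TIMEOUTS = {
    "lint": 10,
    "typecheck": 10,
    "security": 10,
    "quality": 10,
    "unit_tests": 30,
    "integration_tests": 45,
    "coverage": 30,
    "build": 10,
    "docker": 20,
    "helm": 10,
    "status-check": 5,
    "benchmark-regression": 60,
    "benchmark-publish": 60,
}


def timeout_mismatches(workflow: dict) -> dict:
    """Return {job: (configured, suggested)} for jobs that differ from the table.

    Jobs that are not in the table (such as e2e_tests) and table entries
    that do not exist in the workflow are ignored.
    """
    jobs = workflow.get("jobs") or {}
    mismatches = {}
    for name, suggested in SUGGESTED_TIMEOUTS.items():
        if name not in jobs:
            continue
        # A job with an empty body parses as None, so fall back to {}.
        configured = (jobs[name] or {}).get("timeout-minutes")
        if configured != suggested:
            mismatches[name] = (configured, suggested)
    return mismatches
```

A job with no timeout shows up with a configured value of None, so the same helper covers both missing and divergent values.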

Subtasks

  • Add timeout-minutes to each job in .forgejo/workflows/ci.yml per the table above
  • Update features/ci_workflow_validation.feature to assert every job has a timeout-minutes field
  • Verify the Behave scenario passes with nox -s unit_tests
  • Verify all nox stages pass
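
The assertion behind the Behave scenario could look roughly like the sketch below. It is a minimal illustration, not the project's actual step code: jobs_missing_timeout is a hypothetical helper, it expects the workflow already parsed into a dict, and it treats only a positive integer as a valid timeout-minutes value, which is an assumption.

```python
def jobs_missing_timeout(workflow: dict) -> list:
    """Return the sorted names of jobs without a usable timeout-minutes value."""
    jobs = workflow.get("jobs") or {}
    missing = []
    for name, job in jobs.items():
        # A job with an empty body parses as None, so fall back to {}.
        value = (job or {}).get("timeout-minutes")
        # Assumption: only positive integers count; bool is excluded
        # explicitly because it is a subclass of int in Python.
        is_valid = (
            isinstance(value, int)
            and not isinstance(value, bool)
            and value > 0
        )
        if not is_valid:
            missing.append(name)
    return sorted(missing)
```

A step implementation could then assert that the returned list is empty and include the list in the failure message, so a failing run names the offending jobs.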

Definition of Done

  • Every job in .forgejo/workflows/ci.yml has an explicit timeout-minutes value
  • features/ci_workflow_validation.feature has a scenario asserting all jobs have timeouts
  • All nox stages pass
  • Coverage >= 97%

Automated by CleverAgents Bot
Supervisor: Test Infrastructure | Agent: ca-new-issue-creator

freemo added this to the v3.8.0 milestone 2026-04-02 23:32:44 +00:00
Author
Owner

Closing as duplicate. CI parallelization and execution time improvements are tracked in #1604 (setup consolidation), #1536 (parallelize static analysis), and #1632 (split integration tests).


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: ca-project-owner

freemo 2026-04-02 23:41:36 +00:00
Reference
cleveragents/cleveragents-core#1700