[AUTO-INF-5] Add job-level timeout-minutes to all CI jobs in ci.yml #10197

Open
opened 2026-04-17 05:03:49 +00:00 by HAL9000 · 0 comments
Owner

Metadata

  • Commit Message: ci: add timeout-minutes to all jobs in ci.yml to prevent runner resource exhaustion
  • Branch Name: ci-inf-5-add-job-timeouts
  • Type: Infrastructure

Description

Problem

Only the e2e_tests job in .forgejo/workflows/ci.yml has a timeout-minutes setting (45 minutes). All other jobs — lint, typecheck, security, quality, unit_tests, integration_tests, coverage, build, docker, helm, push-validation, and status-check — have no timeout configured.

Forgejo Actions uses a default runner timeout of 6 hours when no job-level timeout is set. If any of these jobs hangs due to a network issue, deadlock, infinite loop, or dependency resolution problem, the CI run will silently consume runner resources for up to 6 hours before being killed. This blocks other CI runs and wastes compute.

Evidence

From .forgejo/workflows/ci.yml:

  • e2e_tests job: has timeout-minutes: 45 ✓
  • All other 12 jobs: no timeout-minutes set ✗

Impact

  • A hung unit_tests or integration_tests job can block the runner for 6 hours
  • A hung docker job (which builds Docker images) can block the privileged DinD runner for 6 hours
  • The coverage job (which re-runs the full behave suite) has no timeout despite being one of the longest-running jobs
  • Developers get no feedback that a job is stuck — it just appears to be "running"

Proposed Fix

Add appropriate timeout-minutes values to each job based on expected runtime:

| Job | Suggested Timeout |
|-----|-------------------|
| lint | 10 minutes |
| typecheck | 15 minutes |
| security | 15 minutes |
| quality | 10 minutes |
| unit_tests | 30 minutes |
| integration_tests | 30 minutes |
| e2e_tests | 45 minutes (already set) |
| coverage | 30 minutes |
| build | 10 minutes |
| docker | 20 minutes |
| helm | 10 minutes |
| push-validation | 5 minutes |
| status-check | 5 minutes |

These values should be set conservatively, at roughly 2× each job's expected runtime, so that slow runners do not trigger false failures while a hung job is still caught within minutes rather than hours.
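As a rough sketch of what the change could look like, the fragment below shows only the `timeout-minutes` key under each job, using the values from the table above. It assumes the job IDs in `.forgejo/workflows/ci.yml` match the names listed in this issue. Each job's existing keys (`runs-on`, `needs`, `steps`, and so on) are left out here and would stay unchanged, so this is not a complete workflow on its own.

```yaml
# Fragment only: merge each timeout-minutes line into the matching
# existing job definition in .forgejo/workflows/ci.yml.
# Job IDs are assumed to match the names used in this issue.
jobs:
  lint:
    timeout-minutes: 10
  typecheck:
    timeout-minutes: 15
  security:
    timeout-minutes: 15
  quality:
    timeout-minutes: 10
  unit_tests:
    timeout-minutes: 30
  integration_tests:
    timeout-minutes: 30
  e2e_tests:
    timeout-minutes: 45  # already set, unchanged
  coverage:
    timeout-minutes: 30
  build:
    timeout-minutes: 10
  docker:
    timeout-minutes: 20
  helm:
    timeout-minutes: 10
  push-validation:
    timeout-minutes: 5
  status-check:
    timeout-minutes: 5
```

Placing the key at job level, rather than on individual steps, bounds the whole job, including checkout and setup time, which is what matters for freeing a blocked runner.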

Subtasks

  • Add timeout-minutes: 10 to lint job
  • Add timeout-minutes: 15 to typecheck job
  • Add timeout-minutes: 15 to security job
  • Add timeout-minutes: 10 to quality job
  • Add timeout-minutes: 30 to unit_tests job
  • Add timeout-minutes: 30 to integration_tests job
  • Add timeout-minutes: 30 to coverage job
  • Add timeout-minutes: 10 to build job
  • Add timeout-minutes: 20 to docker job
  • Add timeout-minutes: 10 to helm job
  • Add timeout-minutes: 5 to push-validation job
  • Add timeout-minutes: 5 to status-check job
  • Verify all jobs still pass with the new timeouts

Definition of Done

  • All jobs in .forgejo/workflows/ci.yml have explicit timeout-minutes set
  • No CI steps are removed or coverage thresholds lowered
  • CI pipeline passes with the new timeouts
  • Timeouts are set conservatively (2× expected runtime) to avoid false positives

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: implementation-worker

Reference
cleveragents/cleveragents-core#10197