CI: Gate coverage and post-build jobs on test success #9379

Open
opened 2026-04-14 16:21:41 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • The CI workflow lets downstream jobs (coverage, build, helm, push-validation) run even when unit_tests or integration_tests fail, because those jobs do not declare needs: [...] on the test jobs.
  • Run https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/13035 (2026-04-13) failed in unit_tests after 10m30s, yet coverage ran for another 10m41s, build/helm/push-validation all executed, and status-check finally failed.
  • Every red PR currently spends an extra ~11 minutes running downstream jobs and uploading artifacts that will be discarded.

Data

  • Source: job_4_attempt1_pretty.json (extracted from run 13035).
  • Key durations: unit_tests 11m02s (failed), coverage 10m41s (success), build 27s, helm 26s, push-validation 25s.
  • Coverage alone adds ~29% to the failing run's wall-clock time.

Proposal

  1. Add needs: [unit_tests, integration_tests, e2e_tests] and if: ${{ success() }} (or equivalent) to the coverage, build, helm, and push-validation jobs in .forgejo/workflows/ci.yml.
  2. Keep coverage for mainline builds by letting nightly-quality continue to run the full suite.
  3. Optionally emit a short summary comment when tests fail early so contributors still see the required checks.
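
A minimal sketch of step 1, assuming the job names listed in this issue and GitHub Actions-compatible syntax in .forgejo/workflows/ci.yml. The runs-on label and every step body are placeholders, not the real workflow contents. In GitHub Actions a job with needs: is already skipped when a needed job fails, so the explicit if: mainly documents intent; whether the Forgejo runner behaves the same way should be confirmed before relying on it.

```yaml
# .forgejo/workflows/ci.yml (excerpt, illustrative only)
# Assumption: "docker" is a placeholder runner label and the echo steps
# stand in for the real commands, which this issue does not show.
jobs:
  unit_tests:
    runs-on: docker
    steps:
      - run: echo "placeholder for the unit test command"

  integration_tests:
    runs-on: docker
    steps:
      - run: echo "placeholder for the integration test command"

  e2e_tests:
    runs-on: docker
    steps:
      - run: echo "placeholder for the e2e test command"

  coverage:
    # Wait for all test jobs and run only if every one of them succeeded.
    needs: [unit_tests, integration_tests, e2e_tests]
    if: ${{ success() }}
    runs-on: docker
    steps:
      - run: echo "placeholder for the coverage command"

  build:
    needs: [unit_tests, integration_tests, e2e_tests]
    if: ${{ success() }}
    runs-on: docker
    steps:
      - run: echo "placeholder for the build command"

  helm:
    needs: [unit_tests, integration_tests, e2e_tests]
    if: ${{ success() }}
    runs-on: docker
    steps:
      - run: echo "placeholder for the helm command"

  push-validation:
    needs: [unit_tests, integration_tests, e2e_tests]
    if: ${{ success() }}
    runs-on: docker
    steps:
      - run: echo "placeholder for the push validation command"
```

With this gating the four downstream jobs would be reported as skipped on a red run. If status-check aggregates their results, it may need its own condition to handle skipped jobs; that part is not sketched here.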

Impact

  • Saves ~11 minutes of runner time on every red PR.
  • Reduces unnecessary artifacts (coverage logs, build outputs) when key tests are already red.
  • Makes the CI feedback loop tighter by failing as soon as the first critical job fails.
HAL9000 self-assigned this 2026-04-14 16:21:41 +00:00
Author
Owner

🔍 Triage Decision — Verified

Decision: Verified | MoSCoW: Should Have | Priority: Medium

This is a confirmed CI efficiency issue backed by concrete run data. Run #13035 (2026-04-13) shows unit_tests failed after 11 minutes, yet coverage ran for another 10m41s, and build/helm/push-validation all executed — wasting ~29% additional wall-clock time on a run that was already red.

Rationale:

  • The data is concrete (job durations extracted from run #13035)
  • Downstream jobs running after test failure produce artifacts that are immediately discarded
  • The fix (needs: [unit_tests, integration_tests, e2e_tests] + if: ${{ success() }}) is standard CI practice
  • Classified as Should Have: important for CI efficiency and developer feedback loop
  • Assigned to v3.2.0 as a medium-priority CI improvement

Next steps: Add needs: [unit_tests, integration_tests, e2e_tests] and if: ${{ success() }} to the coverage, build, helm, and push-validation jobs in .forgejo/workflows/ci.yml. Keep nightly workflow unaffected.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor
Worker: [AUTO-OWNR-1]

HAL9000 added this to the v3.2.0 milestone 2026-04-16 12:38:25 +00:00