TEST-INFRA: [ci-pipeline-design] Optimize job dependencies to fail faster #1684

Open
opened 2026-04-02 23:29:11 +00:00 by freemo · 0 comments
Owner

Metadata

  • Branch: task/v3.8.0-ci-pipeline-fail-faster
  • Commit Message: chore(ci): optimize job dependency graph to surface failures faster and reduce wasted CI resources
  • Milestone: v3.8.0
  • Parent Epic: (to be linked — see orphan note below)

Background and Context

The current CI pipeline in .forgejo/workflows/ci.yml does not enforce a dependency ordering between fast feedback jobs and slow, resource-intensive jobs. As a result, expensive downstream jobs such as integration_tests and e2e_tests start in parallel with, or immediately after, the lightweight lint, typecheck, and unit_tests jobs, regardless of whether those fast gates have passed.

This violates the fail-fast principle that is central to the CleverAgents engineering culture (see CONTRIBUTING.md and docs/specification.md). When a developer introduces a trivial type error or a broken unit test, the pipeline should surface that failure within seconds and skip all downstream work, rather than spend minutes of CI runner time on integration and E2E suites whose results cannot change the outcome.

Observed problem: A failing unit_tests job does not prevent integration_tests or e2e_tests from starting. A failing lint or typecheck job does not prevent any test jobs from starting. This leads to:

  • Wasted CI runner minutes on jobs that will never produce actionable signal.
  • Slower developer feedback loops (the fast failure is buried among many parallel job outputs).
  • Increased cost and resource contention on shared CI infrastructure.

Area: CI/CD — .forgejo/workflows/ci.yml

Required Changes

The job dependency graph must be restructured so that:

  1. Static analysis gates first: lint and typecheck must complete successfully before any test job starts. Gating unit_tests on both jobs is sufficient, because every other test job depends on unit_tests.
  2. Unit tests gate integration tests: integration_tests must declare needs: [unit_tests]. If unit_tests fails, integration_tests is automatically skipped.
  3. Unit tests gate E2E tests: e2e_tests must declare needs: [unit_tests]. If unit_tests fails, e2e_tests is automatically skipped.
  4. Build job sequencing: Evaluate whether the build job should depend on unit_tests passing before producing artefacts.
  5. Coverage report sequencing: The coverage_report job must only run after unit_tests succeeds (it is meaningless otherwise).
  6. No circular dependencies: The updated needs: graph must be validated to contain no cycles or unintended blocking paths.

The resulting pipeline shape should be:

lint ───────┐
            ├──► unit_tests ──┬──► integration_tests
typecheck ──┘                 ├──► e2e_tests
                              ├──► coverage_report
                              └──► build
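
The effect of this shape can be checked with a small simulation. The following is a minimal sketch and not part of the repository: the NEEDS mapping is transcribed from the diagram above, the entry for build assumes that the evaluation in item 4 concludes it should depend on unit_tests, and skipped_after_failure is a hypothetical helper name. It also assumes the default behaviour in which a job is skipped whenever a job it needs did not succeed, with no if: condition overriding that.

```python
# Target dependency graph: job -> list of jobs it needs.
# Transcribed from the diagram; not read from the real ci.yml.
NEEDS = {
    "lint": [],
    "typecheck": [],
    "unit_tests": ["lint", "typecheck"],
    "integration_tests": ["unit_tests"],
    "e2e_tests": ["unit_tests"],
    "coverage_report": ["unit_tests"],
    "build": ["unit_tests"],  # assumption: item 4 resolves in favour of gating
}


def skipped_after_failure(failed_job, needs=NEEDS):
    """Return the set of jobs that never start once failed_job fails."""
    skipped = set()
    changed = True
    # Repeat until no new job is marked, so skips propagate transitively.
    while changed:
        changed = False
        for job, deps in needs.items():
            if job == failed_job or job in skipped:
                continue
            if any(dep == failed_job or dep in skipped for dep in deps):
                skipped.add(job)
                changed = True
    return skipped


if __name__ == "__main__":
    for failed in ("lint", "unit_tests"):
        print(failed, "->", sorted(skipped_after_failure(failed)))
```

Under these assumptions a lint failure skips unit_tests and everything behind it, while typecheck still runs because it has no dependency on lint.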

Subtasks

  • Audit .forgejo/workflows/ci.yml — document the current needs: graph for all jobs.
  • Add needs: [lint, typecheck] to the unit_tests job so static analysis gates test execution.
  • Add needs: [unit_tests] to the integration_tests job so unit failures short-circuit integration runs.
  • Add needs: [unit_tests] to the e2e_tests job so unit failures short-circuit E2E runs.
  • Add needs: [unit_tests] to the coverage_report job (coverage is only meaningful after unit tests pass).
  • Evaluate and update the build job's needs: to depend on unit_tests if appropriate.
  • Validate the updated needs: graph for cycles and unintended blocking paths.
  • Trigger a CI run on a branch with a deliberate unit test failure to confirm that integration_tests and e2e_tests are correctly skipped.
  • Trigger a CI run on a branch with a deliberate lint failure to confirm that unit_tests is correctly skipped.
  • Update any CI documentation or comments in the workflow file to reflect the new sequencing rationale.
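
The validation subtask could be scripted along the following lines. This is a sketch under stated assumptions rather than an existing project tool: TARGET_NEEDS is written out by hand from the subtasks above (a real check would have to load it from .forgejo/workflows/ci.yml with a YAML parser), validate_needs is a hypothetical helper name, and the standard-library graphlib module requires Python 3.9 or later.

```python
from graphlib import CycleError, TopologicalSorter

# Hand-written copy of the intended needs: graph (job -> jobs it needs).
TARGET_NEEDS = {
    "lint": [],
    "typecheck": [],
    "unit_tests": ["lint", "typecheck"],
    "integration_tests": ["unit_tests"],
    "e2e_tests": ["unit_tests"],
    "coverage_report": ["unit_tests"],
    "build": ["unit_tests"],  # assumption: build is gated on unit_tests
}


def validate_needs(needs):
    """Return one valid job start order, or raise ValueError.

    Rejects references to undefined jobs and dependency cycles.
    """
    referenced = {dep for deps in needs.values() for dep in deps}
    unknown = referenced - set(needs)
    if unknown:
        raise ValueError(f"needs references undefined jobs: {sorted(unknown)}")
    try:
        # TopologicalSorter takes a mapping of node -> predecessors,
        # which matches the job -> needs layout used here.
        return list(TopologicalSorter(needs).static_order())
    except CycleError as exc:
        # The second element of args holds the nodes forming the cycle.
        raise ValueError(f"cycle in needs graph: {exc.args[1]}") from exc


if __name__ == "__main__":
    print(validate_needs(TARGET_NEEDS))
```

A check like this only covers the structure of the graph. The deliberate-failure CI runs listed above are still needed to confirm how the runner actually treats skipped jobs.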

Definition of Done

  • All subtasks above are checked off.
  • .forgejo/workflows/ci.yml reflects the updated needs: graph as described above.
  • A CI run with a deliberate fast-gate failure confirms that all downstream jobs are skipped.
  • A CI run on a clean branch confirms that all jobs still execute in the correct order and the pipeline passes end-to-end.
  • The commit is created with the message chore(ci): optimize job dependency graph to surface failures faster and reduce wasted CI resources and pushed to branch task/v3.8.0-ci-pipeline-fail-faster.
  • The corresponding Pull Request has been merged.
  • All nox stages pass.
  • Coverage >= 97%

⚠️ Orphan Note: No parent Epic with Type/Epic label was found for ci-pipeline-design issues at the time of creation. This issue must be manually linked to the appropriate parent Epic (likely under Legendary #376 — Hardening, Testing & Security) by the project owner before it is verified.


Automated by CleverAgents Bot
Supervisor: Unknown | Agent: ca-new-issue-creator

freemo added this to the v3.7.0 milestone 2026-04-02 23:29:24 +00:00
Reference
cleveragents/cleveragents-core#1684