[AUTO-INF-9] Optimize CI Execution Time: integration_tests job #8336

Closed
opened 2026-04-13 09:11:47 +00:00 by HAL9000 · 1 comment
Owner

Summary

  • The integration_tests job accumulates 1 h 27 m of Robot Framework suite time per run (summed across parallel workers); parallelism keeps wall-clock time near 4 minutes, so any drop in concurrency or a single flaky test immediately balloons CI runtime.
  • Every invocation reinstalls system dependencies (git, Helm, curl, perl, krb5, etc.) and provisions a fresh .nox/integration_tests env before tests begin, adding ~2 minutes of setup latency.
  • Breaking the suite into targeted shards and pre-baking the runner image would stabilise run time and prevent the multi-hour workflow spikes seen in runs #4821 / #4427 when retries occur.

Evidence

  • /app/ci-log-run12770-job5.log: Total testing: 1 hour 27 minutes 27.80 seconds, against a wall-clock window of 01:06:11Z → 01:10:33Z (about 4 m 22 s) on 2026-04-11; the run first installed 60+ Debian packages and rebuilt the nox env.
  • /app/ci_logs/pr6729_integration_tests.log: similar summed suite time (Total testing: 1 hour 18 minutes 23.70 seconds) and the same full apt bootstrap, even though the run succeeded.
  • Forgejo Actions run https://git.cleverthis.com/cleveragents/cleveragents-core/actions/runs/666 shows the overall workflow exceeding 2 h when the integration tests were retried after a single scenario failure.
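As a consistency check on the run 12770 figures, the summed suite time can be divided by the wall-clock window to estimate the effective parallelism the job currently relies on. This is a minimal sketch that assumes the quoted timestamps bracket only the test phase; if they also include setup, the true test-phase parallelism is somewhat higher.

```python
from datetime import datetime, timezone

# Figures quoted from /app/ci-log-run12770-job5.log (2026-04-11).
# "Total testing: 1 hour 27 minutes 27.80 seconds" -> summed suite time in seconds.
TOTAL_TESTING_S = 1 * 3600 + 27 * 60 + 27.80

# Wall-clock window 01:06:11Z -> 01:10:33Z (assumed to cover the test phase only).
start = datetime(2026, 4, 11, 1, 6, 11, tzinfo=timezone.utc)
end = datetime(2026, 4, 11, 1, 10, 33, tzinfo=timezone.utc)

wall_s = (end - start).total_seconds()
# Effective parallelism = how many suite-seconds were executed per wall-clock second.
effective_parallelism = TOTAL_TESTING_S / wall_s

if __name__ == "__main__":
    print(f"wall clock:            {wall_s:.0f} s")
    print(f"summed suite time:     {TOTAL_TESTING_S:.1f} s")
    print(f"effective parallelism: {effective_parallelism:.1f}x")
```

With these inputs the ratio is roughly 20x, which is why losing most of that concurrency on a retry turns a few-minute job into one measured in hours.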

Root Cause

  • The integration job bundles all Robot suites into one massive session (nox -s integration_tests) that depends on heavyweight tooling (Helm, kubeconform, git) installed ad hoc inside the runner container. High parallelism masks the duration until any failure or resource throttling forces retries, at which point total suite time dominates the pipeline.
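The masking effect follows from simple arithmetic: in the ideal case, wall-clock time is the summed suite time divided by the number of parallel workers. The sketch below assumes perfect load balancing and uses the ~87.5-minute total from run 12770; real suites never divide this evenly, so treat the results as lower bounds rather than predictions.

```python
def wall_minutes(total_suite_minutes: float, parallelism: int) -> float:
    """Idealised wall-clock time: summed suite time spread evenly over workers."""
    if parallelism < 1:
        raise ValueError("parallelism must be at least 1")
    return total_suite_minutes / parallelism


# ~1 h 27 m 28 s of summed suite time from run 12770, rounded to 87.5 minutes.
TOTAL_SUITE_MIN = 87.5

if __name__ == "__main__":
    # Show how quickly wall time grows as effective concurrency drops,
    # e.g. when throttling or a serialized retry reduces the available workers.
    for workers in (20, 10, 4, 1):
        print(f"{workers:>2} workers: ~{wall_minutes(TOTAL_SUITE_MIN, workers):.1f} min")
```

Dropping from 20 workers to 1 turns a ~4-minute job into the full ~87-minute suite time, which is consistent with the multi-hour workflows observed when retries occur.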

Recommendations

  1. Pre-bake a Docker image for integration_tests with Helm, kubeconform, git, curl, krb5, and cached uv wheels to eliminate the repeated per-run install step.
  2. Shard Robot suites into logical groups (e.g., platform bootstrap vs CLI vs workflow) so a failure reruns only its ~20-minute shard instead of the entire 1 h+ corpus.
  3. Introduce a PR smoke subset (e.g., tests tagged critical) and reserve the full matrix for nightly or labeled runs; this reduces calls made with the external LLM secrets and keeps PR CI predictable.
  4. Persist .nox/integration_tests as a cache keyed on pyproject.toml + robot/ so the env isn’t rebuilt when neither input has changed.
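Recommendations 2 and 3 can be combined in a single selection step: each CI matrix job runs one shard, and pull-request runs add a tag filter so only critical tests execute. This is a hypothetical sketch: the shard names, the robot/... directory paths, the full-integration label, and the mapping of the schedule event to "nightly" are illustrative assumptions, not the repository's actual layout. Only --outputdir and --include are taken from the real Robot Framework CLI.

```python
# Hypothetical shard layout: directory names are illustrative and must be
# replaced with the real suite paths under the repository's robot/ tree.
SHARDS: dict[str, list[str]] = {
    "platform": ["robot/platform_bootstrap"],
    "cli": ["robot/cli"],
    "workflow": ["robot/workflow"],
}

# Hypothetical PR label that opts a pull request into the full matrix.
FULL_RUN_LABEL = "full-integration"


def wants_full_matrix(event_name: str, pr_labels: set[str]) -> bool:
    """Scheduled (nightly) runs and explicitly labeled PRs get the full corpus."""
    return event_name == "schedule" or FULL_RUN_LABEL in pr_labels


def robot_command(shard: str, event_name: str, pr_labels: set[str]) -> list[str]:
    """Build the `robot` command line for a single shard."""
    if shard not in SHARDS:
        raise KeyError(f"unknown shard: {shard!r}")
    cmd = ["robot", "--outputdir", f"results/{shard}"]
    if not wants_full_matrix(event_name, pr_labels):
        # PR smoke subset: only tests tagged `critical` are selected.
        cmd += ["--include", "critical"]
    return cmd + SHARDS[shard]


if __name__ == "__main__":
    for shard in SHARDS:
        print(" ".join(robot_command(shard, "pull_request", set())))
```

Each shard would then map to one entry in the workflow's job matrix, so a failure reruns only that shard. Robot Framework treats a tag filter that matches no tests as an error unless --runemptysuite is also passed, so shards without critical tests would need that flag or would have to be skipped on PRs.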
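For recommendation 4, the cache key only has to change when pyproject.toml or anything under robot/ changes. The sketch below computes such a key in Python to make the inputs explicit; the nox-integration-tests prefix is a hypothetical name, and in a real workflow the same idea would normally be expressed with the CI cache step's own file-hashing support.

```python
import hashlib
import tempfile
from pathlib import Path


def nox_cache_key(repo_root: Path, prefix: str = "nox-integration-tests") -> str:
    """Return a cache key that changes whenever pyproject.toml or robot/ changes.

    `prefix` is a hypothetical naming convention, not an existing CI setting.
    """
    inputs = [repo_root / "pyproject.toml"]
    robot_dir = repo_root / "robot"
    if robot_dir.is_dir():
        # Sort by relative POSIX path so the key is independent of filesystem order.
        inputs += sorted(
            (p for p in robot_dir.rglob("*") if p.is_file()),
            key=lambda p: p.relative_to(repo_root).as_posix(),
        )

    digest = hashlib.sha256()
    for path in inputs:
        # Hash the relative path as well as the bytes, so renames change the key too.
        digest.update(path.relative_to(repo_root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return f"{prefix}-{digest.hexdigest()[:16]}"


if __name__ == "__main__":
    # Demonstrate on a throwaway tree: editing a Robot file changes the key.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pyproject.toml").write_text('[project]\nname = "demo"\n')
        (root / "robot").mkdir()
        suite = root / "robot" / "smoke.robot"
        suite.write_text("*** Test Cases ***\n")
        before = nox_cache_key(root)
        suite.write_text("*** Test Cases ***\nExample\n    No Operation\n")
        print(before, "->", nox_cache_key(root))
```

Restoring the cached directory is only half the job: nox recreates session environments by default, so the session would also need to run with env reuse enabled (for example nox -r -s integration_tests) for the restored .nox/integration_tests to be used.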

Duplicate Check

  • Searched open issues mentioning integration_tests / Robot CI on April 13, 2026: no existing tickets found.

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker

Owner

superseded by next cycle

Reference
cleveragents/cleveragents-core#8336