[AUTO-INF-5] Improve Docker caching, template DB reuse, and release SBOMs #9890

Open
opened 2026-04-15 23:36:32 +00:00 by HAL9000 · 0 comments
Owner

Summary

  • Both Docker images (CLI and server) are built on every run without any BuildKit cache, so the docker job rebuilds every layer on each workflow execution.
  • Every nox session (unit, integration, e2e, coverage) recreates build/.template-migrated.db from scratch, paying the full Alembic migration cost multiple times per workflow.
  • Release artifacts (wheel + Docker image) are published without SBOM generation, vulnerability scanning, or signing, leaving a supply-chain gap.

Findings

1. Docker builds run without cache reuse

.forgejo/workflows/ci.yml defines the docker job with manual docker build calls inside a DinD container:

    docker:
        needs: [lint, typecheck, security, quality, unit_tests]
        runs-on: docker
        container:
            image: docker:dind
            options: --privileged
        steps:
            - name: Start Docker daemon and install dependencies
              run: |
                  dockerd &
                  apk add --no-cache git nodejs
                  sleep 3
            - name: Build Docker image (CLI)
              run: |
                  docker build -t cleverernie:test .
            - name: Build Docker image (Server)
              run: |
                  docker build -f Dockerfile.server -t cleveragents-server:test .

The DinD daemon starts with an empty image store on every run, so both builds start from scratch: base images, OS packages, and pip installs are downloaded and rebuilt on every workflow execution.

2. Template database is rebuilt in every nox session

noxfile.py calls _create_template_db() in unit_tests, integration_tests, e2e_tests, and coverage_report:

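    # noxfile.py (excerpt)
    from pathlib import Path

    import nox


    def _create_template_db(session: nox.Session) -> str:
        # Runs the template script on every call; there is no check for an
        # existing, up-to-date template.
        template_path = str(Path('build/.template-migrated.db').resolve())
        session.run(
            'python',
            'scripts/create_template_db.py',
            template_path,
            silent=True,
        )
        return template_path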

Because no cache is restored, scripts/create_template_db.py runs four times per workflow, reapplying roughly 25 migrations even though the template contents are identical within the same revision.
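One way to let _create_template_db() skip redundant work is to record a digest of the migration inputs next to the template and rerun the script only when that digest changes. The sketch below is illustrative, not the project's current code: the sidecar stamp file (.template-migrated.db.sha256) and the names inputs_digest, ensure_template_db, and TEMPLATE_INPUTS are hypothetical, and the rebuild step is passed in as a callable so it can wrap the existing session.run(...) call.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical default inputs; mirrors the cache key proposed for CI.
TEMPLATE_INPUTS = ("alembic/versions/**/*.py", "scripts/create_template_db.py")


def inputs_digest(root: Path, patterns: Iterable[str]) -> str:
    """SHA-256 over each matched file's relative path and content hash."""
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    h = hashlib.sha256()
    for path in files:
        # Relative POSIX paths keep the digest stable across runner workspaces.
        h.update(path.relative_to(root).as_posix().encode() + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return h.hexdigest()


def ensure_template_db(
    root: Path,
    template: Path,
    patterns: Iterable[str],
    regenerate: Callable[[Path], None],
) -> bool:
    """Run `regenerate` only when the inputs changed; return True if it ran."""
    digest = inputs_digest(root, patterns)
    stamp = template.with_name(template.name + ".sha256")  # hypothetical sidecar file
    if template.exists() and stamp.exists() and stamp.read_text() == digest:
        return False  # existing template already matches these migrations
    template.parent.mkdir(parents=True, exist_ok=True)
    regenerate(template)
    stamp.write_text(digest)  # written only after the rebuild succeeds
    return True
```

In noxfile.py, regenerate could be a lambda that forwards the path to the existing session.run('python', 'scripts/create_template_db.py', ...) call. If the stamp file is cached together with the template, a restored cache also skips the rebuild.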

3. Release workflow lacks SBOM, scanning, or signing

.forgejo/workflows/release.yml builds the wheel and Docker image, then uploads them, but there are no steps for CycloneDX/Syft SBOMs, Trivy or pip-audit scans, or cosign/sigstore signing. That leaves downstream consumers without provenance data or automated vulnerability checks.

Proposed Improvements

A. Enable BuildKit caching for Docker builds

  • Use docker/setup-buildx-action@v3 and docker/build-push-action@v5 with cache-from / cache-to (a registry-backed cache, or the GitHub Actions cache exporter type=gha) so the CLI and server builds reuse base layers across runs.
  • Keep a shared cache key (for example a hash of Dockerfile*, pyproject.toml, and uv.lock) so both images can reuse layers warmed earlier in the same workflow run.
  • Convert the existing docker build steps to BuildKit invocations to eliminate duplicate package installs and shorten the job.
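A minimal sketch of the command lines such a step might run, assuming a docker-container builder like the one setup-buildx-action creates by default. It reuses the example cache repository registry.example.com/ci-buildcache from the Implementation Recommendations below; the per-image cache tags (cli, server) are hypothetical and keep the two builds from overwriting each other's cache manifest. mode=max also exports layers from intermediate build stages.

```python
import shlex
from dataclasses import dataclass
from typing import List

CACHE_REPO = "registry.example.com/ci-buildcache"  # example ref from this issue


@dataclass(frozen=True)
class Image:
    tag: str
    dockerfile: str
    cache_tag: str  # hypothetical per-image cache tag


IMAGES = (
    Image("cleverernie:test", "Dockerfile", "cli"),
    Image("cleveragents-server:test", "Dockerfile.server", "server"),
)


def buildx_command(image: Image, context: str = ".") -> List[str]:
    """Build one image, reading and writing its own registry cache."""
    cache_ref = f"{CACHE_REPO}:{image.cache_tag}"
    return [
        "docker", "buildx", "build",
        "--file", image.dockerfile,
        "--tag", image.tag,
        "--cache-from", f"type=registry,ref={cache_ref}",
        "--cache-to", f"type=registry,ref={cache_ref},mode=max",
        "--load",  # make the image available to later test steps
        context,
    ]


if __name__ == "__main__":
    for image in IMAGES:
        print(shlex.join(buildx_command(image)))
```

With build-push-action the same values map onto its file, tags, cache-from, cache-to, and load inputs instead of raw CLI flags.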

B. Cache the pre-migrated template database

  • Introduce an actions/cache@v3 step keyed on the current Alembic migration set (for example hashFiles('alembic/versions/**/*.py', 'scripts/create_template_db.py')).
  • On a cache hit, actions/cache restores build/.template-migrated.db to its configured path before nox runs; on a miss, generate it once and save it so downstream jobs (integration/e2e/coverage) reuse the same artifact. Because _create_template_db() currently runs the script unconditionally, it would also need to skip regeneration when an up-to-date template is already present.
  • Optionally centralize this in a lightweight prepare-template-db job that restores or saves the cache and exposes the template path via job outputs.
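A prepare-template-db job could compute its own cache key and publish it, together with the template path, as job outputs. This is a sketch under assumptions: the key is analogous to, not byte-identical with, what hashFiles() produces; the output names cache-key and template-path are hypothetical; and it assumes the runner provides the GitHub-compatible GITHUB_OUTPUT file, to which steps append name=value lines.

```python
import hashlib
import os
from pathlib import Path
from typing import Iterable, Mapping, Optional

PATTERNS = ("alembic/versions/**/*.py", "scripts/create_template_db.py")


def template_cache_key(
    root: Path, patterns: Iterable[str] = PATTERNS, prefix: str = "template-db"
) -> str:
    """Return '<prefix>-<sha256>' derived from the migration inputs."""
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    if not files:
        raise FileNotFoundError("no migration inputs matched the patterns")
    h = hashlib.sha256()
    for path in files:
        h.update(path.relative_to(root).as_posix().encode() + b"\0")
        h.update(hashlib.sha256(path.read_bytes()).digest())
    return f"{prefix}-{h.hexdigest()}"


def write_outputs(values: Mapping[str, str], output_file: Optional[str] = None) -> None:
    """Append name=value lines to the step output file."""
    target = output_file or os.environ.get("GITHUB_OUTPUT")
    if not target:
        raise RuntimeError("GITHUB_OUTPUT is not set")
    with open(target, "a", encoding="utf-8") as fh:
        for name, value in values.items():
            fh.write(f"{name}={value}\n")
```

A step in that job would call write_outputs({"cache-key": template_cache_key(Path.cwd()), "template-path": "build/.template-migrated.db"}), and the test jobs would pass the published key to their own actions/cache restore step.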

C. Add SBOM, vulnerability scanning, and signing to releases

  • After building the wheel and Docker image, run anchore/sbom-action (which wraps Syft) or CycloneDX tooling to produce SBOMs for both artifacts and upload them alongside the release.
  • Before publishing, run aquasecurity/trivy-action (pinned to a release tag rather than @master) against the image, and pip-audit or safety against the wheel's dependencies; fail the workflow on findings of HIGH severity or above.
  • Use sigstore/cosign-installer plus cosign sign (keyless or key-based) to sign the container image, which stores the signature in the registry next to the image, and cosign sign-blob to sign the wheel, attaching the resulting signature or bundle as a release asset so downstream automation can verify provenance.
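trivy-action can already fail a job through its severity and exit-code inputs. The sketch below shows the same HIGH-or-above gate applied to a JSON report (for example one written by trivy image --format json --output report.json IMAGE), which can help when the report is also uploaded as a release asset. It relies on the Results / Vulnerabilities / Severity fields of Trivy's JSON output; the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List

BLOCKING = frozenset({"HIGH", "CRITICAL"})


def blocking_findings(report: Dict[str, Any], blocking: Iterable[str] = BLOCKING) -> List[str]:
    """List 'ID package (SEVERITY)' for every vulnerability at a blocking level."""
    levels = {s.upper() for s in blocking}
    found = []
    for result in report.get("Results") or []:
        # Targets without vulnerabilities may omit the key or set it to null.
        for vuln in result.get("Vulnerabilities") or []:
            severity = str(vuln.get("Severity", "")).upper()
            if severity in levels:
                found.append(f"{vuln.get('VulnerabilityID')} {vuln.get('PkgName')} ({severity})")
    return found


def main(path: str) -> int:
    """Print blocking findings from a Trivy JSON report; return 1 if any exist."""
    with open(path, encoding="utf-8") as fh:
        findings = blocking_findings(json.load(fh))
    for line in findings:
        print(line)
    return 1 if findings else 0
```

A release step would pass main("report.json") to sys.exit so that any blocking finding stops the publish.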

Implementation Recommendations

  • Split the current docker job into: (1) setup-buildx, (2) a cached BuildKit build-and-test step, and (3) optional push, all sharing a cache stored in the registry (for example type=registry,ref=registry.example.com/ci-buildcache).
  • Add a reusable action or job (.forgejo/actions/prepare-template-db) that restores the cached database and only runs scripts/create_template_db.py on cache miss; make unit_tests, integration_tests, e2e_tests, and coverage depend on that preparation step.
  • Extend release.yml with SBOM generation, Trivy scans, and cosign signing, uploading SBOMs and signatures via the existing artifact/release upload steps so they ship with every tag.
  • Document the new cache keys and security expectations in docs/development/ci-cd.md so contributors understand how to troubleshoot cache misses or signature failures.
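For troubleshooting cache misses, one aid the documentation could describe is to keep a per-file digest manifest of the cache-key inputs. actions/cache only reports the final key, so diffing two manifests shows which file invalidated it. The sketch below is hypothetical; digest_manifest and explain_cache_miss are not existing project helpers.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def digest_manifest(root: Path, patterns: Iterable[str]) -> Dict[str, str]:
    """Map each matched file (relative POSIX path) to its SHA-256 hex digest."""
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in files
    }


def explain_cache_miss(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which inputs were added, removed, or modified between two manifests."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }
```

Saving the manifest as JSON next to the cached artifact would let a later run compare its fresh manifest against the one that produced the cache.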

Duplicate Check

| Query | Result |
| --- | --- |
| CI pipeline design | Found #9767 '[AUTO-INF-3] Harden CI workflow reliability...'; it focuses on runner bootstrap and missing secrets and does not cover Docker caching, template DB reuse, or release supply chain. |
| docker build cache | No matches (checked search pages 1–3 via /repos/issues/search). |
| helm cache | No matches (checked search pages 1–3). |
| docker buildx | No matches (checked search pages 1–3). |
| uv cache | Found #9782, #9527, #8329; these address uv/nox caching and runtime reductions but not template DB caching or release artifact security. |

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-pool-supervisor

Reference
cleveragents/cleveragents-core#9890