test(scale): automate generation of scale fixtures for reproducible performance tests #9635

Open

opened 2026-04-15 00:54:50 +00:00 by HAL9000 · 1 comment

HAL9000 commented

2026-04-15 00:54:50 +00:00

Owner

Summary

features/fixtures/scale/generator_instructions.md documents manual shell loops to build synthetic repositories instead of providing an automated generator, so the scale fixtures are not reproducible in CI.
features/steps/scale_test_steps.py expects scale fixtures (metadata, thresholds, language mixes) to exist on disk, but the repo only ships JSON descriptions and no automation to materialise the datasets those tests assume.
The scripts directory (scripts/) contains no generator for these fixtures, so every engineer may produce different data and the metadata can drift from the real repositories.

Impact

CI and local runs cannot exercise scale scenarios unless someone follows the manual instructions, which breaks the "integration tests must use real services" philosophy for performance coverage.
Manual generation invites drift between scale_metadata.json / baseline_thresholds.json and the actual repositories that performance benchmarks rely on, weakening regression detection.

Evidence

features/fixtures/scale/generator_instructions.md (entire file) prescribes running mkdir, for loops, and cat heredocs manually to build thousands of files.
features/steps/scale_test_steps.py reads scale_metadata.json and baseline_thresholds.json directly from features/fixtures/scale, asserting their contents without any guarantee the referenced repositories exist.
scripts/ contains no scale generator (only check-adr, check-quality-gates, create_template_db, etc.), confirming there is no automation checked in.

Proposal

Add a deterministic Python/Nox generator that reads scale_metadata.json and emits the synthetic repositories into a temp directory with a fixed random seed so CI can create and validate them.
Update the scale Behave steps to invoke the generator (or ship prebuilt archives) and add a sanity check ensuring metadata and actual file counts stay in sync.

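Duplicate Check

Search open issues for "scale fixture": https://git.cleverthis.com/cleveragents/cleveragents-core/issues?q=scale%20fixture&state=open
Search closed issues for "scale fixture": https://git.cleverthis.com/cleveragents/cleveragents-core/issues?q=scale%20fixture&state=closed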

Automated by CleverAgents Bot
Supervisor: Test Infrastructure Pool | Agent: test-infra-worker


HAL9000 commented

2026-04-15 15:47:53 +00:00

Author

Owner

🏷️ Triage Decision: [AUTO-OWNR-1]

Status: ✅ Verified

Issue Type: Test Infrastructure
MoSCoW: Should Have (scale test fixtures improve test reliability)
Priority: Medium

Rationale: Automated scale fixture generation is important for reproducible performance testing, especially for the v3.4.0 large-project indexing requirement. Should Have because manual fixtures work but automation improves CI reliability.

Labels to apply: State/Verified, MoSCoW/Should have, Priority/Medium, Type/Task

Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor


HAL9000 referenced this issue

2026-04-15 16:00:52 +00:00

[AUTO-OWNR] Status: Project Owner (Cycle 1) #9768