feat(perf): large project scaling tests #859

Closed
opened 2026-03-13 22:01:43 +00:00 by freemo (Owner) · 1 comment

Metadata

  • Commit Message: feat(perf): large project scaling tests
  • Branch: feature/m7-scaling-tests

Background

M7 (v3.6.0) acceptance criterion: large project scaling tests must validate that the system handles production-scale projects (10K to 100K+ files). This covers indexing performance, context assembly performance, and execution throughput on large codebases.

Expected Behavior

  1. Scaling tests defined for 10K+, 50K+, and 100K+ file projects
  2. Indexing performance benchmarks pass within defined thresholds
  3. Context assembly performance scales sub-linearly
  4. Execution throughput measured and baselined
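Expected behavior 3 does not say how "sub-linear" is checked. One possible approach, sketched here as an assumption rather than the project's method, fits a power law t ≈ c·n^k to timings measured at each fixture size and requires k < 1. The helper names scaling_exponent and is_sublinear are hypothetical.

```python
import math


def scaling_exponent(sizes, timings):
    """Fit t = c * n**k by least squares on log-log data and return k.

    k < 1 means sub-linear growth, k == 1 linear, k > 1 super-linear.
    """
    if len(sizes) != len(timings) or len(sizes) < 2:
        raise ValueError("need at least two (size, timing) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in timings]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    if var_x == 0:
        raise ValueError("sizes must not all be equal")
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov_xy / var_x


def is_sublinear(sizes, timings, max_exponent=1.0):
    """True when the fitted exponent is strictly below max_exponent."""
    return scaling_exponent(sizes, timings) < max_exponent
```

Feeding the median timing at each size (rather than a single run) keeps the fit stable. Because a strict k < 1 cutoff is sensitive to noise, a team might prefer a margin such as max_exponent=0.9; that value is illustrative, not a threshold from this issue.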

Acceptance Criteria

  • Scaling test suite defined for multiple project sizes
  • Indexing performance benchmarks pass
  • Context assembly performance scales sub-linearly
  • Execution throughput baselined
  • Results documented with baseline metrics
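Once baseline metrics are documented, later runs can be checked against them. The sketch below is hypothetical: find_regressions, the 10% default tolerance, and any metric names are illustrative, not taken from the project. Lower values count as better unless a metric is listed in higher_is_better (throughput figures such as plans/second).

```python
import math
from typing import Dict, Iterable, Mapping


def find_regressions(
    baseline: Mapping[str, float],
    current: Mapping[str, float],
    tolerance: float = 0.10,
    higher_is_better: Iterable[str] = (),
) -> Dict[str, float]:
    """Return {metric: worsening_factor} for metrics worse than baseline by > tolerance.

    A factor of 1.25 means "25% worse than baseline", in whichever direction
    is worse for that metric. Metrics missing from `current` are skipped.
    """
    higher = set(higher_is_better)
    regressions = {}
    for name, base in baseline.items():
        if name not in current or base <= 0:
            continue
        value = current[name]
        if name in higher:
            # Throughput-style metric: a drop is a regression.
            factor = math.inf if value <= 0 else base / value
        else:
            # Time/memory-style metric: an increase is a regression.
            factor = value / base
        if factor > 1.0 + tolerance:
            regressions[name] = factor
    return regressions
```

ASV's own `asv compare` command performs a similar ratio check between two recorded commits; a helper like this is only useful when baselines are stored outside ASV's results directory.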

Subtasks

  • Define scaling test fixtures (10K, 50K, 100K files)
  • Implement indexing performance benchmarks
  • Implement context assembly performance tests
  • Implement execution throughput tests
  • Document baseline metrics
  • Verify coverage >=97% via nox -s coverage_report
  • Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

  • All subtasks above are completed and checked off.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
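The commit-message rule above can be checked locally before pushing. A minimal sketch, assuming the message is available as a string (for example, read inside a commit-msg hook; the hook wiring is not part of this issue). check_commit_message is a hypothetical helper.

```python
def check_commit_message(message: str, expected_subject: str) -> list:
    """Return a list of problems; an empty list means the message conforms."""
    problems = []
    lines = message.splitlines()
    # Rule 1: the first line must equal the Metadata commit message exactly.
    if not lines or lines[0] != expected_subject:
        problems.append("first line does not match the expected commit message exactly")
    # Rule 2: a blank line must separate the subject from the body.
    if len(lines) < 2 or lines[1] != "":
        problems.append("second line must be blank")
    # Rule 3: at least one non-empty body line must describe the implementation.
    if not any(line.strip() for line in lines[2:]):
        problems.append("no body lines describing the implementation")
    return problems
```

For this issue, expected_subject would be "feat(perf): large project scaling tests".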
freemo added this to the v3.6.0 milestone 2026-03-13 22:02:10 +00:00
Member

Implementation Notes

Benchmarks Added

Three new ASV (airspeed velocity) benchmark suites extend coverage from 5K to 100K files:

IndexingScalingSuite (large_project_scaling_bench.py): Parameterized at 1K/10K/50K/100K files. Creates realistic directory structures (src/, tests/, and docs/ containing Python, Markdown, and JSON files). Measures walk_and_index() throughput, incremental refresh with 1% of files modified, file count tracking, and tokens/second.
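This suite description maps onto ASV's conventions: a params list, setup/teardown run per parameter value, time_* methods for timings, and track_* methods that return a number. The sketch below follows those conventions but is not the code in large_project_scaling_bench.py. build_fixture, the file templates, the 500-file bucketing, and the stand-in walk_and_index (the project's real walk_and_index() signature is not shown here) are all assumptions.

```python
import os
import shutil
import tempfile
import time

# Hypothetical (top-level dir, extension, template) per file kind.
KINDS = [
    ("src", ".py", "def f{i}():\n    return {i}\n"),
    ("tests", ".py", "def test_{i}():\n    assert True\n"),
    ("docs", ".md", "# Page {i}\n\nSome text.\n"),
    ("docs", ".json", '{{"id": {i}}}\n'),
]


def build_fixture(root, n_files):
    """Write n_files files round-robin across src/, tests/ and docs/."""
    paths = []
    for i in range(n_files):
        top, ext, template = KINDS[i % len(KINDS)]
        directory = os.path.join(root, top, f"pkg{i // 500}")  # 500-file buckets
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, f"file_{i}{ext}")
        with open(path, "w", encoding="utf-8") as fh:
            fh.write(template.format(i=i))
        paths.append(path)
    return paths


def walk_and_index(root, only=None):
    """Stand-in indexer: counts files and whitespace-separated tokens.

    If `only` is given, just those paths are re-read (a crude incremental refresh).
    """
    if only is None:
        only = [os.path.join(d, n) for d, _, names in os.walk(root) for n in names]
    tokens = 0
    for path in only:
        with open(path, encoding="utf-8") as fh:
            tokens += len(fh.read().split())
    return {"files": len(only), "tokens": tokens}


class IndexingScalingSuite:
    # ASV runs each benchmark method once per value in `params`.
    params = [1_000, 10_000, 50_000, 100_000]
    param_names = ["n_files"]
    timeout = 1800  # seconds; building the 100K fixture is slow

    def setup(self, n_files):
        self.root = tempfile.mkdtemp(prefix="scaling-")
        self.paths = build_fixture(self.root, n_files)
        # Treat the first 1% of files as modified for the refresh benchmark.
        self.modified = self.paths[: max(1, n_files // 100)]

    def teardown(self, n_files):
        shutil.rmtree(self.root, ignore_errors=True)

    def time_full_index(self, n_files):
        walk_and_index(self.root)

    def time_incremental_refresh(self, n_files):
        walk_and_index(self.root, only=self.modified)

    def track_file_count(self, n_files):
        return walk_and_index(self.root)["files"]

    def track_tokens_per_second(self, n_files):
        start = time.perf_counter()
        tokens = walk_and_index(self.root)["tokens"]
        return tokens / max(time.perf_counter() - start, 1e-9)

    track_tokens_per_second.unit = "tokens/s"
```

Building the fixture in setup keeps each parameter isolated but is costly at 100K files, which is why the sketch raises ASV's default timeout.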

ContextAssemblyScalingSuite (context_assembly_scaling_bench.py): Parameterized at 100/1K/5K/10K fragments. Measures the full ACMS pipeline, the tiered and recency strategies, assembled token count, and fragments/second.
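A benchmark of this shape can be sketched around a recency strategy. The comment does not describe how the ACMS recency strategy works, so the greedy "newest first within a token budget" rule, the Fragment type, and the 8,000-token budget below are hypothetical.

```python
import random
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    # Hypothetical fragment record; the real ACMS type is not shown in the issue.
    fragment_id: int
    timestamp: float
    tokens: int


def assemble_recency(fragments, token_budget):
    """Greedy recency strategy: newest fragments first, skip any that overflow."""
    selected, used = [], 0
    for frag in sorted(fragments, key=lambda f: f.timestamp, reverse=True):
        if used + frag.tokens <= token_budget:
            selected.append(frag)
            used += frag.tokens
    return selected


class ContextAssemblyScalingSuite:
    params = [100, 1_000, 5_000, 10_000]
    param_names = ["n_fragments"]
    token_budget = 8_000  # hypothetical context budget

    def setup(self, n_fragments):
        rng = random.Random(42)  # fixed seed keeps runs comparable
        self.fragments = [
            Fragment(i, rng.random(), rng.randint(20, 400)) for i in range(n_fragments)
        ]

    def time_recency_strategy(self, n_fragments):
        assemble_recency(self.fragments, self.token_budget)

    def track_assembled_tokens(self, n_fragments):
        chosen = assemble_recency(self.fragments, self.token_budget)
        return sum(f.tokens for f in chosen)

    def track_fragments_per_second(self, n_fragments):
        start = time.perf_counter()
        assemble_recency(self.fragments, self.token_budget)
        return n_fragments / max(time.perf_counter() - start, 1e-9)

    track_fragments_per_second.unit = "fragments/s"
```

The sort makes this sketch O(n log n) in fragment count; with a fixed budget, assembled tokens stay bounded while fragments/second shows how selection cost grows.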

ExecutionThroughputSuite (execution_throughput_bench.py): Parameterized at 10/50/100 plans. Measures sequential plan creation, executor construction, and decision tree scaling.
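At 10 to 100 plans a single timing run is noisy, so a throughput baseline is usually taken over repeats. A sketch of one way to do that, assuming a stand-in create_plans (the real plan and executor types are not shown in this issue): repeat each measurement and keep the median rate.

```python
import statistics
import time


def create_plans(n_plans):
    """Hypothetical stand-in for sequential plan creation."""
    return [{"id": i, "steps": [f"step-{j}" for j in range(5)]} for i in range(n_plans)]


def measure_throughput(n_plans, repeats=7, make=create_plans):
    """Median plans/second over `repeats` runs; the median damps scheduler noise."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        make(n_plans)
        elapsed = max(time.perf_counter() - start, 1e-9)
        rates.append(n_plans / elapsed)
    return statistics.median(rates)


def baseline_table(sizes=(10, 50, 100)):
    """Throughput baseline keyed by plan count, matching the suite's parameters."""
    return {n: measure_throughput(n) for n in sizes}
```

The median is preferred over the mean here because one preempted run can drag a mean far from typical behavior.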

Scale Fixture Updates

  • Added xlarge (50K files/2.5 GB) and xxlarge (100K files/5 GB) profiles to scale_metadata.json
  • Added corresponding thresholds: indexing p50 of 90 s (50K) and 200 s (100K); memory of 3.8 GB (50K) and 7.7 GB (100K)
  • Added context_assembly and execution_throughput threshold sections
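These thresholds can be enforced by comparing measured results against the values in scale_metadata.json. The JSON layout below is a hypothetical excerpt: only the numbers (50K files/2.5 GB, 100K files/5 GB, indexing p50 of 90 s and 200 s, memory of 3.8 GB and 7.7 GB) come from this comment, and the real key names may differ. check_profile is a hypothetical helper that takes p50 as the median of repeated indexing runs.

```python
import json
import statistics

# Hypothetical excerpt of scale_metadata.json; the real key names may differ.
SCALE_METADATA = json.loads("""
{
  "profiles": {
    "xlarge":  {"files": 50000,  "size_gb": 2.5},
    "xxlarge": {"files": 100000, "size_gb": 5.0}
  },
  "thresholds": {
    "indexing_p50_s": {"xlarge": 90,  "xxlarge": 200},
    "memory_gb":      {"xlarge": 3.8, "xxlarge": 7.7}
  }
}
""")


def check_profile(profile, index_timings_s, peak_memory_gb, meta=SCALE_METADATA):
    """Return failure messages for one profile; an empty list means pass."""
    limits = meta["thresholds"]
    max_p50 = limits["indexing_p50_s"][profile]
    max_mem = limits["memory_gb"][profile]
    p50 = statistics.median(index_timings_s)  # p50 = median of repeated runs
    failures = []
    if p50 > max_p50:
        failures.append(f"{profile}: indexing p50 {p50:.1f}s exceeds {max_p50}s")
    if peak_memory_gb > max_mem:
        failures.append(f"{profile}: peak memory {peak_memory_gb:.2f}GB exceeds {max_mem}GB")
    return failures
```

Keeping thresholds in the JSON file rather than in test code lets new profiles be added without touching the checker.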

Commit

e3db3242 on branch feature/m7-scaling-tests

PR

PR #984 (https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/984)
Reference
cleveragents/cleveragents-core#859