feat(perf): large project scaling tests #984

Merged
brent.edwards merged 8 commits from feature/m7-scaling-tests into master 2026-03-21 04:46:45 +00:00
Member

Summary

Add large project scaling benchmarks and tests at production scale (10K–100K files).

New ASV Benchmarks

IndexingScalingSuite (large_project_scaling_bench.py):

  • time_walk_and_index at 1K/10K/50K/100K files
  • time_incremental_refresh (1% modified files)
  • track_indexed_file_count, track_tokens_per_second
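
For readers unfamiliar with ASV, a parameterized suite of this shape could look roughly like the sketch below. It is not the contents of large_project_scaling_bench.py: the class name, the temporary file tree, and the `_walk_and_index` helper are illustrative stand-ins, and only the `params` / `param_names` / `setup` / `time_*` / `track_*` conventions come from ASV itself.

```python
import os
import tempfile


class IndexingScalingSketch:
    """Illustrative ASV-style suite; not the project's IndexingScalingSuite."""

    # ASV runs each time_*/track_* method once per entry in `params`.
    params = [1_000, 10_000, 50_000, 100_000]
    param_names = ["n_files"]

    def setup(self, n_files):
        # Build a throwaway tree of small files, 100 per directory.
        self._tmp = tempfile.TemporaryDirectory()
        for i in range(n_files):
            subdir = os.path.join(self._tmp.name, f"pkg_{i // 100:04d}")
            os.makedirs(subdir, exist_ok=True)
            with open(os.path.join(subdir, f"mod_{i}.py"), "w") as fh:
                fh.write(f"VALUE = {i}\n")

    def teardown(self, n_files):
        self._tmp.cleanup()

    def _walk_and_index(self):
        # Hypothetical stand-in for the real indexer: map path -> size in bytes.
        index = {}
        for root, _dirs, files in os.walk(self._tmp.name):
            for name in files:
                path = os.path.join(root, name)
                index[path] = os.path.getsize(path)
        return index

    def time_walk_and_index(self, n_files):
        # Timed by ASV; the return value is ignored.
        self._walk_and_index()

    def track_indexed_file_count(self, n_files):
        # Recorded by ASV as a tracked value rather than a timing.
        return len(self._walk_and_index())

    track_indexed_file_count.unit = "files"
```

The methods can also be called directly with a small parameter (for example `setup(250)`) to smoke-test the suite without waiting for the 100K-file case.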

ContextAssemblyScalingSuite (context_assembly_scaling_bench.py):

  • time_full_pipeline at 100/1K/5K/10K fragments
  • time_tiered_strategy, time_recency_strategy
  • track_assembled_tokens, track_fragments_per_second

ExecutionThroughputSuite (execution_throughput_bench.py):

  • time_sequential_plans at 10/50/100 plans
  • time_executor_construction, time_decision_tree_scaling

Scale Fixture Updates

  • Added xlarge (50K files) and xxlarge (100K files) profiles to scale_metadata.json
  • Added 50K/100K thresholds to baseline_thresholds.json
  • Added context_assembly and execution_throughput threshold sections

Tests & Documentation

  • 15 Behave scenarios validating profiles, thresholds, monotonicity, memory budgets
  • 6 Robot integration tests including live 1K-file indexing throughput check
  • docs/reference/scaling_baselines.md documenting all baseline metrics
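
The monotonicity validation mentioned above can be pictured with a small sketch. The layout of baseline_thresholds.json is not shown in this PR, so the mapping below (file count to a time budget in seconds) and every number in it are assumptions made purely for illustration.

```python
def is_monotonic_non_decreasing(thresholds):
    """Return True if larger profiles never get a smaller budget.

    `thresholds` maps a file count to a budget (hypothetical shape).
    """
    ordered = [thresholds[size] for size in sorted(thresholds)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Illustrative numbers only; not taken from baseline_thresholds.json.
example_thresholds = {1_000: 2.0, 10_000: 15.0, 50_000: 70.0, 100_000: 150.0}
broken_thresholds = {1_000: 2.0, 10_000: 15.0, 50_000: 10.0, 100_000: 150.0}

print(is_monotonic_non_decreasing(example_thresholds))
print(is_monotonic_non_decreasing(broken_thresholds))
```

A check like this catches a threshold that was accidentally set lower for a larger profile than for a smaller one.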

Quality Gates

| Session | Result |
|---|---|
| nox -s lint | PASS |
| nox -s typecheck | PASS (0 errors) |
| nox -s unit_tests | PASS (10,910 scenarios) |
| nox -s integration_tests | PASS (1,526 tests) |
| nox -s coverage_report | 97% (>= 97%) |

Closes #859

brent.edwards force-pushed feature/m7-scaling-tests from 70d4d885ab
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 19s
CI / quality (pull_request) Successful in 29s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 52s
CI / e2e_tests (pull_request) Has been cancelled
CI / unit_tests (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
to e3db32424b
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 18s
CI / build (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 45s
CI / security (pull_request) Successful in 52s
CI / e2e_tests (pull_request) Successful in 1m23s
CI / unit_tests (pull_request) Successful in 3m16s
CI / integration_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 58s
CI / coverage (pull_request) Successful in 5m46s
CI / benchmark-regression (pull_request) Successful in 54m38s
2026-03-16 22:27:21 +00:00
brent.edwards added this to the v3.6.0 milestone 2026-03-16 22:27:36 +00:00
Owner

PM Status — Day 37 — Rebase Required

This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.

@brent.edwards — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.


PM rebase request — Day 37

brent.edwards force-pushed feature/m7-scaling-tests from e3db32424b
to ebdbc67d92
Some checks failed
CI / lint (pull_request) Successful in 16s
CI / typecheck (pull_request) Successful in 42s
CI / security (pull_request) Successful in 49s
CI / quality (pull_request) Successful in 25s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / e2e_tests (pull_request) Failing after 4m16s
CI / unit_tests (pull_request) Failing after 5m13s
CI / docker (pull_request) Has been skipped
CI / integration_tests (pull_request) Successful in 5m45s
CI / coverage (pull_request) Successful in 9m5s
CI / benchmark-regression (pull_request) Successful in 56m44s
2026-03-18 03:28:53 +00:00
freemo approved these changes 2026-03-19 04:57:49 +00:00
Dismissed
freemo left a comment

Code Review — PR #984

Large project scaling tests. Proper labels, milestone, and issue linkage (#859).

Approved with one note: this PR has merge conflicts (mergeable: false). Please rebase against current master before merge.

brent.edwards force-pushed feature/m7-scaling-tests from ebdbc67d92
to e77cf8bb1c
Some checks failed
CI / lint (pull_request) Successful in 21s
CI / quality (pull_request) Successful in 47s
CI / typecheck (pull_request) Successful in 50s
CI / security (pull_request) Successful in 1m6s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 22s
CI / unit_tests (pull_request) Successful in 3m44s
CI / integration_tests (pull_request) Successful in 3m50s
CI / docker (pull_request) Successful in 1m0s
CI / e2e_tests (pull_request) Successful in 6m26s
CI / coverage (pull_request) Successful in 8m0s
CI / benchmark-regression (pull_request) Failing after 59m59s
2026-03-19 22:25:44 +00:00
brent.edwards dismissed freemo's review 2026-03-19 22:25:44 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards force-pushed feature/m7-scaling-tests from 5c6f23f9d5
Some checks failed
CI / lint (pull_request) Successful in 22s
CI / quality (pull_request) Successful in 47s
CI / typecheck (pull_request) Successful in 53s
CI / benchmark-publish (pull_request) Has been skipped
CI / security (pull_request) Successful in 54s
CI / build (pull_request) Successful in 22s
CI / integration_tests (pull_request) Successful in 5m21s
CI / e2e_tests (pull_request) Successful in 5m6s
CI / unit_tests (pull_request) Successful in 5m55s
CI / docker (pull_request) Successful in 1m48s
CI / coverage (pull_request) Failing after 13m45s
CI / benchmark-regression (pull_request) Failing after 1h3m39s
to 34e1ace676
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 28s
CI / typecheck (pull_request) Successful in 46s
CI / security (pull_request) Successful in 49s
CI / integration_tests (pull_request) Successful in 2m53s
CI / unit_tests (pull_request) Successful in 3m45s
CI / docker (pull_request) Successful in 1m3s
CI / coverage (pull_request) Failing after 17m2s
CI / e2e_tests (pull_request) Failing after 17m48s
CI / benchmark-regression (pull_request) Failing after 27m2s
2026-03-21 00:24:31 +00:00
Author
Member

Fixed 4 benchmark failures. The coverage job failure was a CI server death, so no code fix was needed there.

1. phase_reversion_bench.py: AutoRevertSuite.time_auto_revert_from_apply
complete_strategize() auto-progresses the plan to EXECUTE, so the subsequent execute_plan() call failed with InvalidPhaseTransitionError: execute to execute. Added a phase check to skip execute_plan() if already in EXECUTE.
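
A minimal sketch of that guard follows. The `Phase` enum, the `Plan` class, and the `advance_to_execute` helper are hypothetical stand-ins written for this example; only the method names `complete_strategize()` and `execute_plan()` and the error name come from the description above.

```python
from enum import Enum


class Phase(Enum):
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class InvalidPhaseTransitionError(Exception):
    """Raised when a plan is asked to move to the phase it is already in."""


class Plan:
    """Hypothetical stand-in for the project's plan object."""

    def __init__(self):
        self.phase = Phase.STRATEGIZE

    def complete_strategize(self):
        # Assumed behaviour: finishing STRATEGIZE auto-progresses to EXECUTE.
        self.phase = Phase.EXECUTE

    def execute_plan(self):
        if self.phase is Phase.EXECUTE:
            raise InvalidPhaseTransitionError("execute to execute")
        self.phase = Phase.EXECUTE


def advance_to_execute(plan):
    """Reach EXECUTE without triggering a redundant transition."""
    plan.complete_strategize()
    if plan.phase is not Phase.EXECUTE:  # the added phase check
        plan.execute_plan()
    return plan.phase
```

The guard keeps the benchmark valid whether or not `complete_strategize()` auto-progresses, so it does not depend on that upstream behaviour staying fixed.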

2. plan_explain_bench.py: PlanExplainSuite.time_explain_with_context
_build_explain_dict() no longer accepts a show_alternatives keyword argument (signature changed upstream). Removed the stale kwarg.

3. resource_cli_tree_bench.py: ResourceInspectSuite and ResourceTreeSuite
The hardcoded resource_id="01HBENCH0000000000RESOURCE" contained characters (O and U) that are not valid in Crockford Base32 ULID format (^[0-9A-HJKMNP-TV-Z]{26}$). Replaced with a valid ULID-like string 01HBENCH0000000000RES0RCE0.
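
The validity check can be reproduced with the regular expression quoted above. This is a small standalone sketch; the helper names `is_valid_ulid` and `invalid_chars` are made up for the example and are not part of the project.

```python
import re

# Crockford Base32 omits I, L, O and U to avoid look-alike characters.
CROCKFORD_ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
ULID_PATTERN = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")


def is_valid_ulid(value):
    """True if `value` is exactly 26 Crockford Base32 characters."""
    return ULID_PATTERN.fullmatch(value) is not None


def invalid_chars(value):
    """Sorted list of distinct characters outside the Crockford alphabet."""
    return sorted({ch for ch in value if ch not in CROCKFORD_ALPHABET})


old_id = "01HBENCH0000000000RESOURCE"
new_id = "01HBENCH0000000000RES0RCE0"

print(is_valid_ulid(old_id), invalid_chars(old_id))
print(is_valid_ulid(new_id), invalid_chars(new_id))
```

Listing the offending characters, rather than only returning a boolean, makes it easier to see why a hand-written ID is rejected.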

4. security_async_cleanup_bench.py: TimeRegisterBatch.time_register_100
Despite number = 1, ASV re-runs the benchmark across iterations sharing the same tracker, causing ValueError: Resource 'batch-0' is already registered. Fixed by creating a fresh AsyncResourceTracker inside the benchmark method itself.
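
The shape of that fix can be sketched as follows. The `AsyncResourceTracker` below is a simplified, synchronous stand-in written for this example (the real class is not shown in this PR), and the resource objects are placeholders.

```python
class AsyncResourceTracker:
    """Simplified stand-in: rejects a name that is already registered."""

    def __init__(self):
        self._resources = {}

    def register(self, name, resource):
        if name in self._resources:
            raise ValueError(f"Resource {name!r} is already registered")
        self._resources[name] = resource


class TimeRegisterBatch:
    # ASV attribute: call the timed method once per sample.
    number = 1

    def time_register_100(self):
        # A fresh tracker per call, so repeated invocations never collide
        # on names such as 'batch-0'.
        tracker = AsyncResourceTracker()
        for i in range(100):
            tracker.register(f"batch-{i}", object())
```

Constructing the tracker inside the method adds its construction cost to the measured time, which is the trade-off for making every invocation independent of the previous one.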

Merge remote-tracking branch 'origin/master' into feature/m7-scaling-tests
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m4s
CI / unit_tests (pull_request) Successful in 7m7s
CI / integration_tests (pull_request) Successful in 6m41s
CI / docker (pull_request) Successful in 1m11s
CI / e2e_tests (pull_request) Successful in 8m57s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
6e62bd5259
# Conflicts:
#	CHANGELOG.md
Merge remote-tracking branch 'origin/master' into feature/m7-scaling-tests
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / lint (pull_request) Successful in 3m17s
CI / quality (pull_request) Successful in 3m41s
CI / integration_tests (pull_request) Successful in 3m43s
CI / typecheck (pull_request) Successful in 3m51s
CI / security (pull_request) Successful in 3m57s
CI / e2e_tests (pull_request) Successful in 4m34s
CI / unit_tests (pull_request) Successful in 6m49s
CI / docker (pull_request) Successful in 1m0s
CI / coverage (pull_request) Successful in 11m52s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
155aca2b84
# Conflicts:
#	CHANGELOG.md
Merge branch 'master' into feature/m7-scaling-tests
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 3m17s
CI / integration_tests (pull_request) Successful in 3m34s
CI / unit_tests (pull_request) Successful in 3m45s
CI / quality (pull_request) Successful in 3m48s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m0s
CI / e2e_tests (pull_request) Successful in 5m31s
CI / docker (pull_request) Successful in 2m11s
CI / coverage (pull_request) Successful in 10m49s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 47m32s
877844a8f7
# Conflicts:
#	CHANGELOG.md
Merge branch 'feature/m7-scaling-tests' of https://git.cleverthis.com/cleveragents/cleveragents-core into feature/m7-scaling-tests
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 4m15s
CI / quality (pull_request) Successful in 4m12s
CI / security (pull_request) Successful in 4m17s
CI / integration_tests (pull_request) Successful in 7m24s
CI / e2e_tests (pull_request) Successful in 9m42s
CI / coverage (pull_request) Failing after 16m34s
CI / unit_tests (pull_request) Failing after 20m34s
CI / benchmark-regression (pull_request) Successful in 48m3s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 1s
2c184ec657
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-21 03:17:06 +00:00
ci: trigger CI re-run
All checks were successful
CI / build (pull_request) Successful in 28s
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 4m15s
CI / typecheck (pull_request) Successful in 4m30s
CI / security (pull_request) Successful in 4m44s
CI / integration_tests (pull_request) Successful in 7m23s
CI / e2e_tests (pull_request) Successful in 8m30s
CI / unit_tests (pull_request) Successful in 8m49s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 11m19s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 47m35s
344f05ebb0
brent.edwards deleted branch feature/m7-scaling-tests 2026-03-21 04:46:46 +00:00
Reference
cleveragents/cleveragents-core!984