feat(perf): large project scaling tests #984

2026-03-16T22:26:14Z

brent.edwards commented

2026-03-16 22:26:14 +00:00

Summary

Add large project scaling benchmarks and tests at production scale (10K–100K files).

New ASV Benchmarks

IndexingScalingSuite (large_project_scaling_bench.py):

time_walk_and_index at 1K/10K/50K/100K files
time_incremental_refresh (1% modified files)
track_indexed_file_count, track_tokens_per_second

ContextAssemblyScalingSuite (context_assembly_scaling_bench.py):

time_full_pipeline at 100/1K/5K/10K fragments
time_tiered_strategy, time_recency_strategy
track_assembled_tokens, track_fragments_per_second

ExecutionThroughputSuite (execution_throughput_bench.py):

time_sequential_plans at 10/50/100 plans
time_executor_construction, time_decision_tree_scaling
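
For readers unfamiliar with ASV's conventions, a parameterized suite in the spirit of IndexingScalingSuite might look like the sketch below. The `params`/`param_names` attributes, the `setup`/`teardown` hooks, and the `time_`/`track_` method prefixes are standard ASV conventions. The class name, the file generation, and the `_walk_and_index` helper are hypothetical stand-ins and are not the code added in this PR.

```python
import os
import shutil
import tempfile


class IndexingScalingSketch:
    """Illustrative ASV-style suite; NOT the suite added in this PR."""

    # ASV runs every benchmark method once per value in `params`.
    # The PR benchmarks 1K/10K/50K/100K files; small sizes are used here.
    params = [10, 100]
    param_names = ["file_count"]

    def setup(self, file_count):
        # Build a throwaway project tree containing `file_count` small files.
        self.root = tempfile.mkdtemp(prefix="scaling_sketch_")
        for i in range(file_count):
            path = os.path.join(self.root, f"module_{i}.py")
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(f"VALUE = {i}\n")

    def teardown(self, file_count):
        shutil.rmtree(self.root, ignore_errors=True)

    def _walk_and_index(self):
        # Hypothetical stand-in for the real indexer: map path -> size in bytes.
        index = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                index[full] = os.path.getsize(full)
        return index

    def time_walk_and_index(self, file_count):
        # `time_` prefix: ASV measures how long this call takes.
        self._walk_and_index()

    def track_indexed_file_count(self, file_count):
        # `track_` prefix: ASV records the returned number as a metric.
        return len(self._walk_and_index())
```

Under ASV, each method receives the current parameter value, so one class covers every project size without duplicating benchmark code.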

Scale Fixture Updates

Added xlarge (50K files) and xxlarge (100K files) profiles to scale_metadata.json
Added 50K/100K thresholds to baseline_thresholds.json
Added context_assembly and execution_throughput threshold sections

Tests & Documentation

15 Behave scenarios validating profiles, thresholds, monotonicity, memory budgets
6 Robot integration tests including live 1K-file indexing throughput check
docs/reference/scaling_baselines.md documenting all baseline metrics
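
As an illustration of what a monotonicity check on thresholds could involve, here is a minimal sketch. It assumes a hypothetical layout in which each file count maps to a maximum allowed time in seconds. The actual schema of `baseline_thresholds.json` is not shown in this PR description, and the numbers below are made up for illustration only.

```python
def is_monotonic_non_decreasing(thresholds):
    """Return True if the time budget never shrinks as file count grows.

    `thresholds` maps file count -> max allowed seconds (hypothetical layout).
    """
    ordered = [thresholds[size] for size in sorted(thresholds)]
    return all(a <= b for a, b in zip(ordered, ordered[1:]))


# Made-up values for illustration; these are NOT the PR's baselines.
example_thresholds = {1_000: 2.0, 10_000: 15.0, 50_000: 80.0, 100_000: 170.0}
print(is_monotonic_non_decreasing(example_thresholds))
```

A check like this catches a threshold file in which a larger profile was accidentally given a tighter budget than a smaller one.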

Quality Gates

| Session | Result |
|---|---|
| `nox -s lint` | PASS |
| `nox -s typecheck` | PASS (0 errors) |
| `nox -s unit_tests` | PASS (10,910 scenarios) |
| `nox -s integration_tests` | PASS (1,526 tests) |
| `nox -s coverage_report` | 97% (>= 97%) |

Closes #859

brent.edwards force-pushed feature/m7-scaling-tests from 70d4d885ab to e3db32424b

2026-03-16 22:27:21 +00:00

brent.edwards self-assigned this 2026-03-16 22:27:36 +00:00

brent.edwards added this to the v3.6.0 milestone 2026-03-16 22:27:36 +00:00

brent.edwards added labels 2026-03-16 22:27:36 +00:00

brent.edwards referenced this pull request

2026-03-16 22:27:56 +00:00

feat(perf): large project scaling tests #859

freemo commented

2026-03-17 18:23:00 +00:00

PM Status — Day 37 — Rebase Required

This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.

@brent.edwards — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.

PM rebase request — Day 37

brent.edwards force-pushed feature/m7-scaling-tests from e3db32424b to ebdbc67d92

2026-03-18 03:28:53 +00:00

freemo approved these changes 2026-03-19 04:57:49 +00:00

Dismissed

freemo left a comment

Code Review — PR #984

Large project scaling tests. Proper labels, milestone, and issue linkage (#859).

Approved with one note: this PR has merge conflicts (mergeable: false). Please rebase against current master before merge.

freemo requested review from freemo 2026-03-19 05:21:42 +00:00

freemo requested review from hamza.khyari 2026-03-19 05:21:42 +00:00

brent.edwards force-pushed feature/m7-scaling-tests from ebdbc67d92 to e77cf8bb1c

2026-03-19 22:25:44 +00:00

brent.edwards dismissed freemo's review 2026-03-19 22:25:44 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards force-pushed feature/m7-scaling-tests from 5c6f23f9d5 to 34e1ace676

2026-03-21 00:24:31 +00:00

brent.edwards commented

2026-03-21 00:24:50 +00:00

Fixed 4 benchmark failures. The coverage log shows a server death rather than a test failure, so no code fix was needed there.

1. phase_reversion_bench.py — AutoRevertSuite.time_auto_revert_from_apply
complete_strategize() auto-progresses the plan to EXECUTE, so the subsequent execute_plan() call failed with InvalidPhaseTransitionError: execute to execute. Added a phase check to skip execute_plan() if already in EXECUTE.
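
The guard described above can be sketched as follows. Everything here is a hypothetical stand-in: `Phase`, `FakePlan`, and `advance_to_execute` are invented for illustration, and only the method names `complete_strategize()` and `execute_plan()` and the error name come from the comment. The real plan API may differ.

```python
from enum import Enum


class Phase(Enum):  # hypothetical; stands in for the project's phase type
    STRATEGIZE = "strategize"
    EXECUTE = "execute"


class InvalidPhaseTransitionError(Exception):
    """Stand-in for the project's error of the same name."""


class FakePlan:
    """Toy plan that mimics the behavior described in the comment."""

    def __init__(self):
        self.phase = Phase.STRATEGIZE

    def complete_strategize(self):
        # Auto-progresses straight to EXECUTE, as described above.
        self.phase = Phase.EXECUTE

    def execute_plan(self):
        if self.phase is Phase.EXECUTE:
            raise InvalidPhaseTransitionError("execute to execute")
        self.phase = Phase.EXECUTE


def advance_to_execute(plan):
    plan.complete_strategize()
    # Guard: request the transition only if it has not already happened.
    if plan.phase is not Phase.EXECUTE:
        plan.execute_plan()
    return plan.phase
```

Checking the phase first keeps the benchmark working whether or not `complete_strategize()` auto-progresses, which may matter if that behavior changes again upstream.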

2. plan_explain_bench.py — PlanExplainSuite.time_explain_with_context
_build_explain_dict() no longer accepts a show_alternatives keyword argument (signature changed upstream). Removed the stale kwarg.

3. resource_cli_tree_bench.py — ResourceInspectSuite and ResourceTreeSuite
The hardcoded resource_id="01HBENCH0000000000RESOURCE" contained characters (O and U) that are not valid in the Crockford Base32 ULID format (^[0-9A-HJKMNP-TV-Z]{26}$). Replaced it with a valid ULID-like string, 01HBENCH0000000000RES0RCE0.
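
To check an identifier against the pattern quoted above, a small helper like the following can list the characters that fall outside the allowed alphabet. The pattern and both identifiers are copied from the comment. The helper name `invalid_ulid_chars` is an assumption introduced here and is not part of the project.

```python
import re

# Pattern quoted in the comment: 26 characters from the Crockford Base32
# alphabet, which omits the letters I, L, O, and U.
ULID_PATTERN = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")
ALLOWED_CHAR = re.compile(r"[0-9A-HJKMNP-TV-Z]")


def invalid_ulid_chars(candidate):
    """Return the sorted, de-duplicated characters the alphabet rejects."""
    return sorted({ch for ch in candidate if not ALLOWED_CHAR.fullmatch(ch)})


old_id = "01HBENCH0000000000RESOURCE"
new_id = "01HBENCH0000000000RES0RCE0"

for resource_id in (old_id, new_id):
    print(resource_id, bool(ULID_PATTERN.match(resource_id)), invalid_ulid_chars(resource_id))
```

Listing the offending characters, rather than only reporting a failed match, makes it easier to see why a hand-written identifier was rejected.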

4. security_async_cleanup_bench.py — TimeRegisterBatch.time_register_100
Despite number = 1, ASV re-runs the benchmark across iterations sharing the same tracker, causing ValueError: Resource 'batch-0' is already registered. Fixed by creating a fresh AsyncResourceTracker inside the benchmark method itself.
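
The shape of that fix can be sketched as below. The `AsyncResourceTracker` here is a toy stand-in written for this example, not the project's class, and `TimeRegisterBatchSketch` is a hypothetical name. Only the `number` attribute and the `time_` prefix are ASV conventions.

```python
class AsyncResourceTracker:
    """Toy stand-in that rejects duplicate names like the real error implies."""

    def __init__(self):
        self._resources = {}

    def register(self, name, resource):
        if name in self._resources:
            raise ValueError(f"Resource '{name}' is already registered")
        self._resources[name] = resource


class TimeRegisterBatchSketch:
    number = 1  # ASV attribute: calls per timing sample

    def time_register_100(self):
        # A fresh tracker per call, so repeated samples cannot collide
        # on names such as 'batch-0'.
        tracker = AsyncResourceTracker()
        for i in range(100):
            tracker.register(f"batch-{i}", object())
```

One trade-off of this approach is that the tracker's construction cost is now included in the timed region, which is usually negligible next to 100 registrations but is worth keeping in mind when comparing against older results.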

brent.edwards added 1 commit 2026-03-21 01:10:25 +00:00

Merge remote-tracking branch 'origin/master' into feature/m7-scaling-tests

CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m4s
CI / unit_tests (pull_request) Successful in 7m7s
CI / integration_tests (pull_request) Successful in 6m41s
CI / docker (pull_request) Successful in 1m11s
CI / e2e_tests (pull_request) Successful in 8m57s
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled

6e62bd5259

# Conflicts:
#	CHANGELOG.md

brent.edwards added 1 commit 2026-03-21 01:21:08 +00:00

Merge remote-tracking branch 'origin/master' into feature/m7-scaling-tests

CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / lint (pull_request) Successful in 3m17s
CI / quality (pull_request) Successful in 3m41s
CI / integration_tests (pull_request) Successful in 3m43s
CI / typecheck (pull_request) Successful in 3m51s
CI / security (pull_request) Successful in 3m57s
CI / e2e_tests (pull_request) Successful in 4m34s
CI / unit_tests (pull_request) Successful in 6m49s
CI / docker (pull_request) Successful in 1m0s
CI / coverage (pull_request) Successful in 11m52s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled

155aca2b84

# Conflicts:
#	CHANGELOG.md

brent.edwards added 1 commit 2026-03-21 01:42:53 +00:00

Merge branch 'master' into feature/m7-scaling-tests

CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 3m17s
CI / integration_tests (pull_request) Successful in 3m34s
CI / unit_tests (pull_request) Successful in 3m45s
CI / quality (pull_request) Successful in 3m48s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m0s
CI / e2e_tests (pull_request) Successful in 5m31s
CI / docker (pull_request) Successful in 2m11s
CI / coverage (pull_request) Successful in 10m49s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 47m32s

877844a8f7

brent.edwards added 2 commits 2026-03-21 03:16:27 +00:00

Merge remote-tracking branch 'origin/master' into feature/m7-scaling-tests 9b61feaaae

# Conflicts:
#	CHANGELOG.md

Merge branch 'feature/m7-scaling-tests' of https://git.cleverthis.com/cleveragents/cleveragents-core into feature/m7-scaling-tests

CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 3m21s
CI / typecheck (pull_request) Successful in 4m15s
CI / quality (pull_request) Successful in 4m12s
CI / security (pull_request) Successful in 4m17s
CI / integration_tests (pull_request) Successful in 7m24s
CI / e2e_tests (pull_request) Successful in 9m42s
CI / coverage (pull_request) Failing after 16m34s
CI / unit_tests (pull_request) Failing after 20m34s
CI / benchmark-regression (pull_request) Successful in 48m3s
CI / docker (pull_request) Has been skipped
CI / status-check (pull_request) Failing after 1s

2c184ec657

brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-21 03:17:06 +00:00

brent.edwards added 1 commit 2026-03-21 04:30:51 +00:00

ci: trigger CI re-run

CI / build (pull_request) Successful in 28s
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 4m15s
CI / typecheck (pull_request) Successful in 4m30s
CI / security (pull_request) Successful in 4m44s
CI / integration_tests (pull_request) Successful in 7m23s
CI / e2e_tests (pull_request) Successful in 8m30s
CI / unit_tests (pull_request) Successful in 8m49s
CI / docker (pull_request) Successful in 1m9s
CI / coverage (pull_request) Successful in 11m19s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 47m35s

344f05ebb0

brent.edwards merged commit b88bc0ec1b into master

2026-03-21 04:46:45 +00:00

brent.edwards deleted branch feature/m7-scaling-tests

2026-03-21 04:46:46 +00:00

brent.edwards referenced this issue from a commit

2026-03-21 04:46:46 +00:00

feat(perf): large project scaling tests (#984)

Reference: cleveragents/cleveragents-core#984