feat(perf): large project scaling tests #984
Reference
cleveragents/cleveragents-core!984
Branch: feature/m7-scaling-tests
Summary
Add large project scaling benchmarks and tests at production scale (10K–100K files).

New ASV Benchmarks
- IndexingScalingSuite (large_project_scaling_bench.py):
  - time_walk_and_index at 1K/10K/50K/100K files
  - time_incremental_refresh (1% modified files)
  - track_indexed_file_count, track_tokens_per_second
- ContextAssemblyScalingSuite (context_assembly_scaling_bench.py):
  - time_full_pipeline at 100/1K/5K/10K fragments
  - time_tiered_strategy, time_recency_strategy
  - track_assembled_tokens, track_fragments_per_second
- ExecutionThroughputSuite (execution_throughput_bench.py):
  - time_sequential_plans at 10/50/100 plans
  - time_executor_construction, time_decision_tree_scaling

Scale Fixture Updates
- Added xlarge (50K files) and xxlarge (100K files) profiles to scale_metadata.json
- baseline_thresholds.json: context_assembly and execution_throughput threshold sections

Tests & Documentation
- docs/reference/scaling_baselines.md documenting all baseline metrics

Quality Gates
- nox -s lint
- nox -s typecheck
- nox -s unit_tests
- nox -s integration_tests
- nox -s coverage_report

Closes #859
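For readers unfamiliar with ASV, the parameterized suites listed in the description can be pictured roughly as follows. This is a minimal sketch, not the PR's code: the class name `IndexingScalingSketch`, the fixture layout, and the placeholder "index" are all hypothetical. It relies only on standard ASV conventions (`params`, `param_names`, `setup`/`teardown`, and the `time_`/`track_` method prefixes).

```python
import os
import shutil
import tempfile


class IndexingScalingSketch:
    """Hypothetical stand-in for an ASV scaling suite; not the PR's actual code."""

    # ASV runs every time_*/track_* method once per parameter value.
    params = [1_000, 10_000, 50_000, 100_000]
    param_names = ["file_count"]

    def setup(self, file_count):
        # Build a throwaway tree of small files, 100 per directory.
        self.root = tempfile.mkdtemp(prefix="scaling_bench_")
        for i in range(file_count):
            subdir = os.path.join(self.root, f"pkg_{i // 100:04d}")
            os.makedirs(subdir, exist_ok=True)
            with open(os.path.join(subdir, f"mod_{i}.py"), "w") as fh:
                fh.write(f"VALUE = {i}\n")

    def teardown(self, file_count):
        # Remove the fixture tree so repeated runs start clean.
        shutil.rmtree(self.root, ignore_errors=True)

    def _walk_and_index(self):
        # Placeholder "index": map each file path to its size in bytes.
        index = {}
        for dirpath, _dirnames, filenames in os.walk(self.root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                index[path] = os.path.getsize(path)
        return index

    def time_walk_and_index(self, file_count):
        # time_* methods are timed by ASV; the return value is ignored.
        self._walk_and_index()

    def track_indexed_file_count(self, file_count):
        # track_* methods report the returned number as the metric.
        return len(self._walk_and_index())

    track_indexed_file_count.unit = "files"
```

Generating the fixture inside `setup` keeps file creation out of the timed region, which matters at the 50K and 100K sizes where fixture creation could otherwise dominate the measurement.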
70d4d885ab e3db32424b
PM Status - Day 37 - Rebase Required
This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.
@brent.edwards: Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.
PM rebase request - Day 37
e3db32424b ebdbc67d92
Code Review - PR #984
Large project scaling tests. Proper labels, milestone, and issue linkage (#859).
Approved with one note: this PR has merge conflicts (mergeable: false). Please rebase against current master before merge.
ebdbc67d92 e77cf8bb1c
New commits pushed, approval review dismissed automatically according to repository settings
5c6f23f9d5 34e1ace676
Fixed 4 benchmark failures (coverage log was a server death; no code fix needed there).
1. phase_reversion_bench.py: AutoRevertSuite.time_auto_revert_from_apply
complete_strategize() auto-progresses the plan to EXECUTE, so the subsequent execute_plan() call failed with InvalidPhaseTransitionError: execute to execute. Added a phase check to skip execute_plan() if already in EXECUTE.

2. plan_explain_bench.py: PlanExplainSuite.time_explain_with_context
_build_explain_dict() no longer accepts a show_alternatives keyword argument (signature changed upstream). Removed the stale kwarg.

3. resource_cli_tree_bench.py: ResourceInspectSuite and ResourceTreeSuite
The hardcoded resource_id="01HBENCH0000000000RESOURCE" contained characters (O and U) not valid in Crockford Base32 ULID format (^[0-9A-HJKMNP-TV-Z]{26}$). Replaced with a valid ULID-like string 01HBENCH0000000000RES0RCE0.

4. security_async_cleanup_bench.py: TimeRegisterBatch.time_register_100
Despite number = 1, ASV re-runs the benchmark across iterations sharing the same tracker, causing ValueError: Resource 'batch-0' is already registered. Fixed by creating a fresh AsyncResourceTracker inside the benchmark method itself.
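The resource ID fix in item 3 can be checked directly against the quoted pattern. The sketch below uses only the regex from the comment; the helper names `invalid_ulid_chars` and `is_valid_ulid_like` are hypothetical and are not part of the project's API.

```python
import re

# Pattern quoted in the comment above: 26 characters drawn from Crockford's
# Base32 alphabet, which omits the letters I, L, O and U.
ULID_PATTERN = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")
ALLOWED_CHAR = re.compile(r"[0-9A-HJKMNP-TV-Z]")


def invalid_ulid_chars(candidate: str) -> list[str]:
    """Return the characters outside the allowed alphabet, in order of appearance."""
    return [ch for ch in candidate if not ALLOWED_CHAR.fullmatch(ch)]


def is_valid_ulid_like(candidate: str) -> bool:
    """True when the whole string matches the quoted ULID pattern."""
    return ULID_PATTERN.match(candidate) is not None


old_id = "01HBENCH0000000000RESOURCE"
new_id = "01HBENCH0000000000RES0RCE0"

print(invalid_ulid_chars(old_id))   # ['O', 'U']
print(is_valid_ulid_like(old_id))   # False
print(is_valid_ulid_like(new_id))   # True
```

Both strings are 26 characters long, so the old ID failed only on its alphabet: the letters O and U are excluded from Crockford Base32.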