feat(acms): add ACMS v1 context pipeline #465
No reviewers
Labels
No labels
auto/needs-reevaluation
controller-managed
auto/blocked-by-deps
auto/ci-timeout
auto/claimed-implementer
auto/claimed-merge
auto/claimed-reviewer
auto/driver-down
auto/invariant-violation
auto/last-attempt-tier-0
auto/last-attempt-tier-1
auto/last-attempt-tier-2
auto/last-attempt-tier-min
Automation Tracking
auto/needs-conflict-resolution
auto/needs-implementer
auto/postmortem
auto/ready-to-merge
auto/restart-throttled
auto/revert
auto/sentinel
auto/stale-inactivity
auto/unstable
Blocked
Bounty
$100
Bounty
$1000
Bounty
$10000
Bounty
$20
Bounty
$2000
Bounty
$250
Bounty
$50
Bounty
$500
Bounty
$5000
Bounty
$750
MoSCoW
Could have
MoSCoW
Must have
MoSCoW
Should have
Needs Feedback
Points
1
Points
13
Points
2
Points
21
Points
3
Points
34
Points
5
Points
55
Points
8
Points
88
Priority
Backlog
Priority
CI Blocker
Priority
Critical
Priority
High
Priority
Low
Priority
Medium
Signed-off: Owner
Signed-off: Scrum Master
Signed-off: Tech Lead
Spike
State
Completed
State
Duplicate
State
In Progress
State
In Review
State
Paused
State
Unverified
State
Verified
State
Wont Do
Type
Automation
Type
Bug
Type
Discussion
Type
Documentation
Type
Epic
Type
Feature
Type
Legendary
Type
Refactor
Type
Support
Type
Task
Type
Testing
No project
No assignees
2 participants
Notifications
Due date
No due date set.
Dependencies
No dependencies set.
Reference
cleveragents/cleveragents-core!465
Loading…
Add table
Add a link
Reference in a new issue
No description provided.
Delete branch "feature/m5-acms-context"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Summary
ContextFragment,ContextBudget, andContextPayloadfrozen Pydantic v2 domain modelsACMSPipelineservice with three built-in fusion strategies: relevance (highest score first), recency (most recent first), tiered (hot > warm > cold priority)register_strategy()for custom fusion strategiesTesting
robot/acms_pipeline.robot)benchmarks/acms_pipeline_bench.py)Files Changed
src/cleveragents/domain/models/core/context_fragment.py— Domain models (ContextFragment, ContextBudget, ContextPayload)src/cleveragents/application/services/acms_service.py— ACMSPipeline with fusion strategiesfeatures/acms_pipeline.feature+features/steps/acms_pipeline_steps.py— BDD testsrobot/acms_pipeline.robot+robot/helper_acms_pipeline.py— Integration testsbenchmarks/acms_pipeline_bench.py— ASV benchmarksdocs/reference/acms.md— Reference documentationCHANGELOG.md— UpdatedCloses #188
Code Review Report: Issue #188 —
feat(acms): add ACMS v1 context pipelineBranch:
feature/m5-acms-contextAuthor: Hamza Khyari (
khyari hamza <khyarihamza3@gmail.com>)Commits reviewed (3):
89bc078feat(acms): add ACMS v1 context pipelinec628856fix(acms): address review findings for ACMS context pipeline34d3d76style(acms): fix import sorting in core __init__Files changed: 10 files, +1301 lines
Forgejo issue: #188 — "feat(acms): add ACMS v1 context pipeline"
CRITICAL — Specification Compliance
C1.
ContextFragmentis missing key fields from the specificationSeverity: Critical | File:
src/cleveragents/domain/models/core/context_fragment.py:17-28The specification (
docs/specification.md~line 24743) definesContextFragmentwith these fields:The implementation deviates significantly:
uko_nodeis replaced withsource: str— a plain label ("file","decision") instead of a UKO URI. This removes all UKO integration. The issue title and commit message both claim "UKO and CRP integration" but UKO is absent.detail_depthis completely missing — a core concept in the spec's multi-level rendering system (MODULE_LISTING(0) through FULL_SOURCE(9)).provenance: FragmentProvenanceis completely missing — the spec mandates traceability back to resource+location for every fragment.token_countrenamed totoken_estimate— a semantic difference. The spec tracks actual counts; the implementation stores estimates.C2.
ContextPayloaddiverges from spec'sAssembledContextSeverity: Critical | File:
src/cleveragents/domain/models/core/context_fragment.py:52-71The spec (~line 24760) defines the output as
AssembledContextwith:Missing from
ContextPayload:budget_used— no fractional budget metricstrategies_used— stores a singlestrategy: strinstead of a list (spec allows multi-strategy assembly)context_hash— no cryptographic hash for snapshot integrity/cachingpreamble— no preamble supportprovenance_map— no provenance trackingC3. Pipeline is missing the 10-component architecture
Severity: Critical | File:
src/cleveragents/application/services/acms_service.pyThe specification defines a 10-component pluggable pipeline across 3 phases:
The implementation collapses all of this into a single
assemble()method (~15 lines) that callsstrategy.rank()and sums tokens. None of the 10 components exist. There is no deduplication, no depth resolution, no scoring, no ordering, no preamble generation, and no skeleton compression.C4.
FusionStrategyprotocol is severely stripped down from spec'sContextStrategySeverity: High | File:
src/cleveragents/application/services/acms_service.py:27-35The spec defines:
The implementation:
Missing:
name,capabilities,can_handle,explain, and the ability to query backends. Strategies receive pre-fetched fragments rather than accessing data stores.C5. CRP acronym is wrong in the reference documentation
Severity: Medium | File:
docs/reference/acms.md:5The documentation says "CRP (Context Relevance Processor)" but the specification consistently defines CRP as "Context Request Protocol" (see
docs/specification.mdlines 31, 175, 24530, 24777). This is a factual error in the reference docs.BUG DETECTION
B1.
RecencyFusionStrategysorts by string comparison of ISO timestampsSeverity: Medium | File:
src/cleveragents/application/services/acms_service.py:63-71created_atis astrfield. String comparison of ISO 8601 timestamps works correctly ONLY when all timestamps use the same timezone format. The default factory usesdatetime.now(UTC).isoformat(), which produces+00:00offsets. But externally created fragments (e.g., from deserialized data) might useZsuffix or different offsets like+05:30. Under mixed formats, string sorting produces incorrect ordering. The strategy should parse todatetimeobjects for comparison.B2.
ContextBudgetdefaults create an implicit minimummax_tokensconstraintSeverity: Low | File:
src/cleveragents/domain/models/core/context_fragment.py:30-44The
ge=1constraint onmax_tokenscombined with the defaultreserved_tokens=512meansContextBudget(max_tokens=100)raises aValidationErrorbecause 512 >= 100. The effective minimum formax_tokensis 513 (not 1) unlessreserved_tokensis explicitly overridden. This is undocumented and surprising.B3.
token_estimate=0default allows unlimited fragment inclusionSeverity: Low | File:
src/cleveragents/domain/models/core/context_fragment.py:24Fragments with
token_estimate=0pass every budget check and will all be included. In production, this could lead to payloads with far more fragments than the budget intends to allow if fragments are created without token estimates.TEST COVERAGE AND TEST FLAWS
T1. Inconsistent error attribute naming in Behave steps
Severity: Medium | File:
features/steps/acms_pipeline_steps.pyValidation error steps use two different context attribute names:
context.validation_error— set bystep_create_fragment_bad_score(line 168) andstep_create_fragment_bad_tokens(line 176)context.acms_error— set bystep_budget_reserved_equals_max(line 148) andstep_invalid_tier(line 157)The assertion step then checks both via
getattr:This is fragile. If a previous scenario set
context.validation_errorand a subsequent scenario only intends to setcontext.acms_errorbut the code path doesn't raise, the test still passes because the stalevalidation_errorfrom a prior scenario is found. Behave'sContextdoes NOT reset custom attributes between scenarios.T2. No Robot Framework test for the
recencystrategySeverity: Medium | File:
robot/acms_pipeline.robotRobot tests cover:
fragment-create,budget-calc,assemble-relevance,assemble-tiered,payload-budget-check. Therecencystrategy has zero Robot integration coverage. Correspondingly,robot/helper_acms_pipeline.pyhas no_cmd_assemble_recencyhandler.T3. No ASV benchmark for the
recencystrategySeverity: Low | File:
benchmarks/acms_pipeline_bench.pyThe benchmark suite covers relevance (default, lines 92-104) and tiered (lines 116-126) strategies but omits
recency. Given that recency sorting relies on string comparison (see B1), performance characteristics under large datasets should be measured.T4.
plan_id,payload_id, andassembled_atare never assertedSeverity: Low | Files:
features/steps/acms_pipeline_steps.py,robot/helper_acms_pipeline.pyNo Behave scenario or Robot test verifies that
plan_idis correctly propagated to the payload, thatpayload_idis generated, or thatassembled_atis set. These are structural fields ofContextPayloadwith no test coverage.T5. No test for overwriting a built-in strategy name
Severity: Low | File:
features/acms_pipeline.featureregister_strategy("relevance", SomeOtherClass)silently overwrites the built-in. There is no test verifying this behavior or guarding against it.T6. No test for
reserved_tokens > max_tokens(only==is tested)Severity: Low | File:
features/acms_pipeline.feature:71The scenario "Budget rejects reserved_tokens >= max_tokens" only tests the
==case (max_tokens=100, reserved_tokens=100). The>case (e.g.,max_tokens=50, reserved_tokens=100) is untested, though the validator uses>=.PERFORMANCE
P1. Strategy objects are instantiated on every
assemble()callSeverity: Low | File:
src/cleveragents/application/services/acms_service.py:145-146All three built-in strategies are stateless. Creating a new instance per call adds unnecessary GC pressure on hot paths. Strategy instances could be cached after first creation.
P2. No early termination when budget is fully consumed
Severity: Low | File:
src/cleveragents/application/services/acms_service.py:49-53(and similar in all strategies)The loop continues iterating all remaining fragments even after
total == budget.available_tokens. Anelif total >= budget.available_tokens: breakwould avoid unnecessary iterations on large fragment sets where the budget fills early.SECURITY
S1.
sourcefield has no validation against allowed valuesSeverity: Low | File:
src/cleveragents/domain/models/core/context_fragment.py:20The docs and spec indicate
sourceshould be one of"file","decision","resource","user". The field is a plainstrwith no constraint. For a domain model, this should beLiteral["file", "decision", "resource", "user"]or at least validated.S2. No size limits on
contentormetadataSeverity: Low | File:
src/cleveragents/domain/models/core/context_fragment.py:21,26content: strandmetadata: dict[str, str]have no upper-bound constraints. A caller can create fragments with arbitrarily large content strings or metadata dicts, potentially exhausting memory. Production systems should enforce max lengths.ARCHITECTURAL / STYLE
A1.
ACMSPipelinedoes not follow the project's service conventionsSeverity: Medium | File:
src/cleveragents/application/services/acms_service.py:104-126Every other service in
application/services/follows the pattern:ACMSPipelinetakes onlydefault_strategy: str, has nosettings/unit_of_workinjection, and has zero logging. Nostructlogcalls anywhere in the service. This will make DI integration, observability, and debugging harder as the pipeline matures.A2. Tiers are fragment tags, not actual storage tiers
Severity: Medium | Conceptual
The spec defines hot/warm/cold as storage tiers with retention policies (warm: 24h, cold: 90d). The implementation treats
tieras a simple label onContextFragmentused only for sorting priority. There is no storage tier system, no retention, no promotion/demotion. This gap should be acknowledged in the documentation.SUMMARY TABLE
ContextFragmentmissinguko_node,detail_depth,provenance, renamestoken_countContextPayloadmissingbudget_used,strategies_used,context_hash,preamble,provenance_mapFusionStrategyseverely stripped from spec'sContextStrategymax_tokens=513due to defaultreserved_tokenstoken_estimate=0default allows unlimited fragmentsrecencystrategyrecencystrategyplan_id,payload_id,assembled_atnever asserted==case, not>assemble()callsourcefield accepts arbitrary stringscontentormetadatastructlog, DI pattern (settings/unit_of_work)Critical items (C1-C3) represent fundamental gaps between the implementation and the specification. The commit message claims "UKO and CRP integration" but neither UKO nor CRP is present. The "v1" framing may justify a simplified approach, but the delta between the implementation and the spec should be explicitly documented as known limitations so the milestone tracking is accurate.
Code Review Report -- Commit
8c6b4122(branchfeature/m5-acms-context)Reviewed commit:
8c6b4122--fix(acms): length-prefix fragment content in context hashAuthor: Hamza Khyari
Forgejo Issue: #188 --
feat(acms): add ACMS v1 context pipelineFiles in branch diff (vs origin/master): 10 files, +2,266 lines
Review scope: test coverage, test flaws, bugs, spec alignment, performance, security
1. TEST COVERAGE GAPS (4 findings)
T1 [HIGH] -- No dedicated test for
compute_context_hashverifying the fix this commit introducesLocation:
src/cleveragents/domain/models/core/context_fragment.py:208-220This function is the sole target of commit
8c6b4122, yet no test verifies the collision the commit claims to fix.The existing test (
features/steps/acms_pipeline_steps.py:459-464) only asserts the hash is non-empty and 64 chars:Missing test cases:
["AB", "C"]must produce a different hash than["A", "BC"]. This is the exact vulnerability this commit fixes -- yet no test proves it.["A", "B"]vs["B", "A"]must differ.compute_context_hash(())should return a stable hash.T2 [MEDIUM] -- No test with multi-byte UTF-8 content
The fix uses
len(encoded)(byte length) as the prefix, which correctly differs from character length for multi-byte sequences. However, no test exercises content like"\u00e9"(2 bytes),"\u4e16"(3 bytes), or"\U0001f600"(4 bytes) to confirm the byte-length prefix handles them correctly.T3 [MEDIUM] -- Robot helper hash assertion is weaker than Behave equivalent
robot/helper_acms_pipeline.py:230-231checks onlynot payload.context_hash(truthy). Unlike the Behave step, it does not verify the length is 64 or that the string is valid hex. A corrupt hash could pass undetected in Robot tests.T4 [LOW] -- Benchmark does not exercise
compute_context_hashin isolationbenchmarks/acms_pipeline_bench.pybenchmarkspipeline.assemblethroughput, which includes hashing. After this commit added a.to_bytes(8, "big")call per fragment, the hash function now does more work. There is no isolated benchmark to detect regressions in the hash function itself (e.g., at 10,000+ fragments, matching the milestone's "Projects with 10,000+ files" criterion from the Milestone v3.4.0 description).2. BUGS (3 findings)
B1 [HIGH] --
provenance_mapkeying deviates from specSpec (
docs/specification.md:42914):Implementation (
src/cleveragents/domain/models/core/context_fragment.py:223-234):The spec keys by
uko_node(UKO URI); the implementation keys byfragment_id(ULID). This is a semantic mismatch: consumers reading the spec would look up by UKO URI but would find ULIDs instead. Additionally, multiple fragments can share the sameuko_node(at different depths), so the spec's keying could lose entries -- but the implementation should document this deviation explicitly if intentional.B2 [MEDIUM] --
strategies_usedandprovenance_mapare mutable inside a frozen modelContextPayloadis declaredfrozen=True, and the docstring calls these "frozen Pydantic v2 value objects." However,strategies_used: list[str]andprovenance_map: dict[str, Any]are mutable containers. Pydantic'sfrozen=Trueonly prevents attribute reassignment; it does not prevent:This violates the stated immutability contract and could corrupt snapshot integrity (the
context_hashwould no longer match the actual payload state). The same issue applies toContextFragment.metadata: dict[str, str].Fix: Use
tuple[str, ...]instead oflist[str]forstrategies_used, and useMappingProxyTypeor Pydantic's frozen-dict mechanism for the dicts, or at minimum add amodel_validatorthat freezes them.B3 [LOW] -- Dead code in
budget_usedcalculationsrc/cleveragents/application/services/acms_service.py:438:The
else 0.0branch is unreachable.ContextBudgetvalidatesreserved_tokens < max_tokensandmax_tokens >= 1, guaranteeingavailable_tokens >= 1. The dead branch obscures the invariant; removing it (or adding an assertion) would make the code clearer.3. SPEC ALIGNMENT (4 findings)
S1 [MEDIUM] --
ContextStrategyprotocol method signatures differ from specdocs/specification.md:25176-25189)acms_service.py:78-92)can_handle(request: ContextRequest, backends: BackendSet) -> float(request: dict[str, Any]) -> floatassemble(request: ContextRequest, backends: BackendSet, budget: int, plan_context: PlanContext) -> list[ContextFragment](fragments: Sequence[ContextFragment], budget: ContextBudget) -> Sequence[ContextFragment]The v1 simplification passes pre-fetched fragments and a
ContextBudgetobject rather than raw request/backend parameters. This is reasonable for v1 but should be documented indocs/reference/acms.mdas a known deviation with a path to spec conformance.S2 [MEDIUM] --
StrategyCapabilitiesfield names differ from specdocs/specification.md:25193-25204)acms_service.py:48-54)uses_text,uses_vector,uses_graph,uses_temporal,uko_levels,resource_types,supports_depth_breadth,supports_plan_hierarchy,quality_scoresupports_semantic_search,supports_graph_navigation,supports_temporal_archaeology,max_fragmentsMany spec-defined fields are missing; the implementation has its own field names. This prevents future interop with spec-compliant strategies without an adapter.
S3 [LOW] -- Pipeline component protocols have simplified signatures vs spec
Several Phase 2/3 protocol method signatures omit parameters present in the spec:
DetailDepthResolver.resolve()-- missingbudgetparameter (spec:docs/specification.md:42731-42738)FragmentScorer.score()-- missingplan_contextparam, returnsSequence[ContextFragment]instead oflist[ScoredFragment]PreambleGenerator.generate()-- missingstrategies_used,budget_used,max_tokensparams (spec:docs/specification.md:42795-42803)These are acceptable v1 simplifications but should be called out in
docs/reference/acms.mdunder a "v1 limitations" section for the future upgrade path.S4 [LOW] --
ContextFragmentmodel adds fields beyond specThe implementation adds
fragment_id,tier, andcreated_atwhich are not in the spec'sContextFragmentdefinition (docs/specification.md:25081-25089). These are useful additions but represent scope beyond the spec dataclass. Worth documenting.4. PERFORMANCE (2 findings)
P1 [LOW] --
_pack_budgetskips fragments that don't fit instead of trying smaller onessrc/cleveragents/application/services/acms_service.py:108-113: The early-termination guard at line 109 (if total >= available: break) causes the loop to exit once the budget is fully consumed, even if subsequent fragments withtoken_count=0exist. Zero-token fragments (e.g., empty structural markers) would be incorrectly excluded. This is an unlikely but valid edge case.P2 [LOW] -- Double iteration to compute total_tokens
src/cleveragents/application/services/acms_service.py:436recomputessum(f.token_count for f in final_fragments), but_pack_budgetalready computes the running total internally and discards it. For very large fragment lists, this is a wasted O(n) pass. Could be addressed by having_pack_budgetreturn a(list, total)tuple.5. SECURITY (1 finding)
SEC1 [LOW] --
register_strategyaccepts unvalidated objectssrc/cleveragents/application/services/acms_service.py:463-470: The method insertsstrategyintoself._strategieswithout validating it satisfies theContextStrategyprotocol. SinceContextStrategyis@runtime_checkable, anisinstancecheck would be trivial and would prevent lateAttributeErrorexplosions at assembly time:6. TEST FLAWS (1 finding)
TF1 [MEDIUM] -- BDD test for "non-existent strategy" catches wrong exception type
features/steps/acms_pipeline_steps.py:290-302catchesValueErrorfromassemble, stored incontext.assemble_error. The assertion at line 404-407 checkscontext.assemble_error is not None. However, the When step catchesValueErrorbut does NOT catchKeyError,AttributeError, or other exceptions that a malformed or missing strategy could raise. If the implementation were refactored to raise a different exception type, the test would pass with an uncaught exception (Behave would report it as a step failure, not an assertion failure), masking the real issue.Summary Table
provenance_mapkeyed byfragment_idvs spec'suko_nodelist/dictin frozen models violates immutability contractelse 0.0branch in budget_used calculationContextStrategymethod signatures differ from specStrategyCapabilitiesfield names differ from specContextFragmentnot in specregister_strategyaccepts unvalidated objectsHigh-priority items to address before merge: T1 (test the fix this commit introduces) and B1 (provenance_map keying contradiction with spec).
Code Review Report —
6f318e36Commit:
6f318e36—fix(acms): address review findings for ACMS v1 context pipelineAuthor: Hamza Khyari
Branch:
feature/m5-acms-contextIssue: #188
Files changed: 10 files, +372 / -31 lines
Summary
The commit adds all 10 spec-defined pipeline component Protocols + Default implementations (Phase 1: StrategySelector, BudgetAllocator, StrategyExecutor; Phase 2: BudgetPacker; Phase 3: SkeletonCompressor), refactors
assemble()to invoke them in order, switchesstrategies_usedto immutable tuple, adds defensive copy validators, integrates structlog + DI params, adds structural Behave scenarios, and updates the reference docs. The overall structure is solid and well-organized, but there are several issues worth addressing.B — Bugs & Logic Errors
B1 [HIGH] —
DefaultBudgetAllocatorduplicates total budget to every candidatesrc/cleveragents/application/services/acms_service.py:397-405The comment says "proportional share" but the code gives every candidate the full
total_budget. With N candidates, the sum of allocated tokens = N * total_budget, violating the spec constraint (specification.mdline 42689: "Sum of allocated_tokens must not exceed total_budget"). In v1 only one strategy is passed so this is latent, but it will break immediately when multi-strategy support is wired in. Either distribute proportionally or document that v1 intentionally allocates the full budget to a single candidate and add a guard/assertion.B2 [MEDIUM] —
DefaultStrategyExecutorignores allocated token budgetsrc/cleveragents/application/services/acms_service.py:416-426The
_tokensvariable (the per-strategy allocation from the BudgetAllocator) is discarded. The strategy receives the fullContextBudgetobject instead of the allocated subset. This means budget allocation is effectively a no-op: even if a custom BudgetAllocator properly divides the budget, the executor ignores it. The fix would be to construct a newContextBudgetfrom_tokensor pass it through.B3 [MEDIUM] —
assemble()passes only one strategy to the selector, defeating the 10-component architecturesrc/cleveragents/application/services/acms_service.py:610-615The StrategySelector always receives a single-element list. A custom StrategySelector that implements multi-strategy selection (as the spec envisions with
ConfidenceWeightedSelector) would never see the other registered strategies. The pipeline should passlist(self._strategies.values())or the entire strategies dict to the selector.B4 [LOW] —
provenance_mapdefensive copy is shallow onlysrc/cleveragents/domain/models/core/context_fragment.py:193-197The
provenance_maphas typedict[str, Any]where values are mutable dicts (as produced bybuild_provenance_map).dict(v)is a shallow copy — the nested value dicts remain shared. A caller could still dopayload.provenance_map["some_id"]["resource_uri"] = "evil"and mutate the internal state. For a frozen model, consider{k: dict(val) if isinstance(val, dict) else val for k, val in v.items()}or usecopy.deepcopy.T — Test Coverage & Test Flaws
T1 [HIGH] — Zero tests for custom pipeline component injection (DI)
The commit adds 10 injectable pipeline component constructor params (
strategy_selector,budget_allocator,strategy_executor,packer,skeleton_compressor, etc.), but no Behave scenario, Robot test, or unit test verifies that a custom-injected component is actually invoked duringassemble(). This is the main extensibility contract of the 10-component architecture and it is entirely untested.T2 [HIGH] — No tests for
SkeletonCompressorThe
SkeletonCompressorprotocol andDefaultSkeletonCompressorare added but never exercised by any test. There is no Behave scenario and no Robot test that callscompress()or verifies skeleton compression behavior. This leaves a gap in both unit and integration coverage.T3 [MEDIUM] — No tests for multi-candidate allocation path
DefaultBudgetAllocatorandDefaultStrategySelectorare never tested with multiple strategies. A test passing 2+ strategies to the selector and allocator would immediately surface Bug B1 above.T4 [MEDIUM] —
settingsandunit_of_workDI params are untested dead codesrc/cleveragents/application/services/acms_service.py:558-559These parameters are accepted in the constructor and stored but never read by any method. No test verifies their presence or behavior. If these are placeholders for future milestones, they should be documented as such. Dead code creates confusion about the service's actual dependency surface.
T5 [LOW] — Import inside function body in test step
features/steps/acms_pipeline_steps.py:496datetimeis already imported at the module level (line 10). Thefrom datetime import UTC, timedeltashould be at the top of the file with the other imports for consistency and to avoid repeated import overhead.T6 [LOW] — Robot tests do not cover Phase 1 pipeline components
The Robot helper (
robot/helper_acms_pipeline.py) exercises only fragment creation, budget calculation, and the 3 built-in strategies. There are no Robot tests for the Phase 1 orchestration path (selector -> allocator -> executor) or any of the new default components.SC — Spec Conformance Issues
SC1 [MEDIUM] — Multiple protocol signature divergences not documented in Known Limitations
The
docs/reference/acms.mdKnown Limitations table (line 151-161) documents deviations forContextStrategy,StrategyCapabilities, and some Phase 2/3 components, but omits the following significant divergences:specification.md~42660-42820)StrategySelector.select()(strategies, request: dict)(strategies, request: ContextRequest, backends: BackendSet)BudgetAllocator.allocate()(candidates, total_budget)(candidates, total_budget, request: ContextRequest)StrategyExecutor.execute()(allocations, fragments, budget)(allocations, request, backends, plan_context)BudgetPacker.pack()(fragments: Sequence[ContextFragment], budget: ContextBudget)(scored_fragments: list[ScoredFragment], budget: int, detail_level_maps)SkeletonCompressor.compress()(fragments: tuple, skeleton_budget: int) -> tuple(parent_context: AssembledContext, child_focus: list[str], skeleton_budget: int) -> AssembledContextAll of these should be added to the Known Limitations table for traceability. The
StrategyExecutordivergence is particularly significant because it takes a completely different set of parameters.SC2 [LOW] — Duplicated section comment in feature file
features/acms_pipeline.feature:324-326andfeatures/acms_pipeline.feature:339-341both have the same section comment# Budget validation edge case, but the first section (lines 328-337) is actually the "Payload structural assertions" test forplan_id,payload_id, andassembled_at. The comment is misleading.S — Security Issues
S1 [MEDIUM] — No input validation on
plan_idsrc/cleveragents/application/services/acms_service.py:587andsrc/cleveragents/domain/models/core/context_fragment.py:160The
plan_idfield onContextPayloadand theplan_idparameter onassemble()accept any arbitrary string with no validation — nomin_length, nomax_length, no format constraints. A caller could pass an empty string, a multi-megabyte string, or a string with control characters. Sinceplan_idappears in structlog output (acms_service.py:604) and potentially in serialized payloads, this is a log-injection/DoS vector. At minimum, addmin_length=1and a reasonablemax_lengthconstraint onContextPayload.plan_id.S2 [LOW] — Fragment
contentup to 1MB allowed per fragment with no fragment count limitsrc/cleveragents/domain/models/core/context_fragment.py:23, 69-72MAX_CONTENT_LENGTH = 1_000_000per fragment. With no limit on the number of fragments passed toassemble(), a caller could pass thousands of 1MB fragments. While_pack_budgetlimits what gets into the final payload, the hash computation (compute_context_hash) and sorting still operate on the full input list. This is more of a resource consumption concern than a security bug, but worth noting for hardening.P — Performance Issues
P1 [LOW] —
DefaultStrategySelector.select()callscan_handle()on every strategy per assemblysrc/cleveragents/application/services/acms_service.py:382-387In v1 only one strategy is passed (due to B3), so this is O(1). But if the pipeline is fixed to pass all registered strategies,
can_handle()will be called on every strategy even for simple requests. Consider caching results or short-circuiting when a single strategy is specified.P2 [LOW] — No pipeline component result caching
The pipeline runs all 10 components sequentially on every
assemble()call even when fragments and budget haven't changed. For repeated assemblies with the same input (common during iterative planning), a memoization layer oncompute_context_hashor the full pipeline output would eliminate redundant work.Overall Assessment
The commit is structurally sound and delivers the core 10-component architecture as specified by issue #188. The code is well-documented, the naming is consistent, and the Behave/Robot/ASV test layers all received updates. However, there are 3 bugs that will surface as soon as multi-strategy support is added (B1-B3), a complete absence of DI tests for the pipeline components (T1), no test coverage for SkeletonCompressor (T2), 5 undocumented spec divergences (SC1), and a missing validation on
plan_idthat could lead to log injection (S1). These should be resolved before merging.e386c4c8ce0836502eceCode Review Report — Commit
0836502eCommit:
0836502e—fix(acms): address second review — pipeline bugs, DI tests, spec docsAuthor: khyari hamza
Branch:
feature/m5-acms-contextIssue: #188 — feat(acms): add ACMS v1 context pipeline
Spec reference:
docs/specification.md(ACMS pipeline ~lines 42613–42945)Files changed (6):
acms_service.py,context_fragment.py,core/__init__.py,acms_pipeline.feature,acms_pipeline_steps.py,docs/reference/acms.mdBUGS (5 findings)
BUG-1 (Medium):
DefaultBudgetAllocatorleaks tokens due toint()truncationsrc/cleveragents/application/services/acms_service.py:417Using
int()truncates toward zero. With certain inputs, the sum of allocated tokens is strictly less thantotal_budget. Example: budget=999, confidences=(0.33, 0.33, 0.34) yields allocations (329, 329, 339) = 997, wasting 2 tokens. While this does not violate the spec constraint ("sum must not exceedtotal_budget"), it causes silent under-utilization. The spec'sProportionalBudgetAllocatoris expected to distribute the full budget. A largest-remainder allocation method would fix this.The zero-confidence fallback at line 415 (
total_budget // len(candidates)) has the same issue — integer division can lose up toN-1tokens.BUG-2 (Medium):
DefaultStrategyExecutormasks zero-allocation withmax(allocated_tokens, 1)src/cleveragents/application/services/acms_service.py:440If the allocator assigns 0 tokens (possible from BUG-1 rounding), the strategy still receives 1 token and is invoked. This can produce unexpected micro-fragments from a strategy that should have been excluded. A more correct approach would be to skip strategies with 0 allocation entirely.
BUG-3 (Low):
provenance_mapdefensive copy is only one level deepsrc/cleveragents/domain/models/core/context_fragment.py:197The docstring claims "Deep defensive copy" but only dict values are copied one level deeper. Other mutable types (
list,set, nested dicts-within-dicts) are passed by reference. Since the field is typeddict[str, Any], a caller could pass{"key": [mutable_list]}and later mutate the model's internal state despite the frozen flag. Either usecopy.deepcopy(v)or restrict the type todict[str, dict[str, str]].BUG-4 (Low):
assemble()discards custom selector's multi-strategy resultssrc/cleveragents/application/services/acms_service.py:646-648The method passes all registered strategies to the selector (line 636–641), then immediately filters the result back to a single strategy using identity comparison (
is). This creates a misleading API contract: a customStrategySelectorthat implements multi-strategy fusion will have its output silently discarded. Theiscomparison (vs==) adds fragility — if a custom selector wraps or copies strategy objects, filtering always falls back to(resolved, 1.0)regardless of the selector's confidence computation. The selector should either receive only the single strategy (like before) or the filtering should be clearly documented in the Protocol docstring.BUG-5 (Low):
DefaultStrategyExecutor.execute()ignores itsbudgetparametersrc/cleveragents/application/services/acms_service.py:429-444The
budget: ContextBudgetparameter is accepted but never referenced in the method body. Each strategy gets a scoped budget from its allocation tokens, but there is no aggregate enforcement. If multiple strategies each produce fragments up to their allocation, the combined output could exceed the original total budget. The downstreamBudgetPackeris expected to enforce this, but since the packer is a no-op in v1 (DefaultBudgetPackerreturns fragments unchanged at line 510), there is currently no aggregate budget enforcement at all after the executor. Fragments can exceed the budget until a real packer is wired in.TEST COVERAGE GAPS (7 findings)
TEST-1 (High): No test for
plan_idvalidation boundariesThe S1 security fix adds
min_length=1, max_length=256toContextPayload.plan_id, but no BDD scenario tests the boundary conditions: empty string rejection, 1-char acceptance, 256-char acceptance, or 257-char rejection. This is a validation change with zero test coverage.TEST-2 (High): No test for
provenance_mapdefensive copyThe B4 fix deep-copies nested dicts, but no test verifies that mutating a nested dict value after
ContextPayloadcreation does not affect the payload's internal state. Without this, a regression could silently reintroduce the shared-reference issue.TEST-3 (Medium): No test for allocator rounding edge cases
The T3 scenarios test exact 0.8/0.2 splits (which divide cleanly into 800/200). No test exercises odd budgets (e.g., budget=999 with 3 candidates), the zero-confidence equal-split path, or very small budgets where rounding dominates.
TEST-4 (Medium): No test for
assemble()strategy filtering / fallback behaviorThe commit changes
assemble()to pass all strategies to the selector then filter to only the requested one. No scenario tests:(resolved, 1.0)fallback).TEST-5 (Medium): DI selector scenario name/behavior mismatch
features/acms_pipeline.feature:349— The scenario is named "Custom strategy selector that picks only recency" but the_TrackingSelectoratfeatures/steps/acms_pipeline_steps.py:518-531simply delegates toDefaultStrategySelector— it does not pick "only recency". The scenario name is misleading and the test doesn't verify selective behavior, only that the DI hook fires.TEST-6 (Low): SkeletonCompressor tests only exercise test doubles
The T2 scenarios test
DefaultSkeletonCompressor(which is a trivial no-op pass-through) and an inline_TruncatingCompressor(a test double that isn't production code). Neither test exercises any real compression logic. While understandable for v1 stubs, this provides a false sense of coverage.TEST-7 (Low):
strategies_usedassertion is vacuousfeatures/acms_pipeline.feature:356assertsthe payload strategies used should include "relevance", butassemble()atacms_service.py:687hardcodesstrategies_used=(strategy_name,)from the caller's input — not from the selector's output. This assertion will always pass regardless of what the custom selector does, making it a tautological test.SECURITY (2 findings)
SEC-1 (Medium):
plan_idlacks character/pattern validationsrc/cleveragents/domain/models/core/context_fragment.py:160While length is validated, no pattern restriction prevents whitespace-only strings (e.g.
" "), control characters, null bytes, or path-traversal sequences. Ifplan_idis later used in file paths, database queries, or log messages, this opens injection vectors. Add apatternconstraint (e.g.,r"^[\w.:\-/]+$"or similar) or a field validator that strips/rejects non-printable characters.SEC-2 (Low):
strategy_namelogged without sanitizationsrc/cleveragents/application/services/acms_service.py:624-630and lines 672–679The
strategy_nameandplan_idare passed directly tostructlogviaself._logger.info(...). While structlog generally handles structured output safely, if downstream log sinks render as plaintext, craftedplan_idorstrategy_namevalues containing newlines or ANSI escape sequences could cause log injection or log spoofing. This is low severity given structured logging but worth noting.PERFORMANCE (1 finding)
PERF-1 (Low):
DefaultStrategySelectorsorts the full strategy registry on every call, then result is discardedsrc/cleveragents/application/services/acms_service.py:388-393/ line 636select()evaluatescan_handle()and sorts all registered strategies, butassemble()immediately filters the result to a single strategy at line 646. This means Ncan_handle()calls and an N-log-N sort are performed on every assembly, only to keep one result. With 3 built-in strategies this is negligible, but the code path will become wasteful as more strategies are registered. Consider either passing only the resolved strategy to the selector (as before), or cachingcan_handleresults.SPEC COMPLIANCE NOTES (2 findings)
SPEC-1 (Info): Aggregate budget enforcement gap in v1 pipeline
The spec defines the 10-component pipeline with
BudgetPacker(spec ~line 42937:GreedyKnapsackPacker) as the component ensuring budget compliance. In v1, theDefaultBudgetPackeris a no-op (returns fragments unchanged,acms_service.py:505-510), andDefaultStrategyExecutorcreates scoped budgets per-strategy but ignores the aggregate budget. This means the v1 pipeline has no hard budget enforcement after strategy execution. Fragments produced by strategies can exceed the original budget, and the no-op packer will pass them through. This is partially mitigated because each strategy calls_pack_budget()internally, but a custom strategy without that behavior would break the budget guarantee.SPEC-2 (Info):
__init__.py# fmt: skipsuppresses formatter on checkpoint importsrc/cleveragents/domain/models/core/__init__.py:43adds# fmt: skipto the checkpoint import block. This is a minor style concern — suppressing the formatter can mask import ordering issues. The reason for this appears to be preventing the formatter from inserting blank lines between the checkpoint and context import blocks after the duplicatecontext_fragmentimport removal.Summary
acms_service.pyacms_service.pycontext_fragment.pyacms_service.pyacms_service.pycontext_fragment.pycontext_fragment.pyacms_pipeline_steps.pyacms_service.pyacms_pipeline.featureacms_pipeline_steps.pyacms_pipeline.featurecontext_fragment.pyacms_service.pyacms_service.pyacms_service.py__init__.pyRecommended priority: Address BUG-1, BUG-2, SEC-1, TEST-1, and TEST-2 before merge. The remaining items are lower priority but worth tracking.
Code Review — Commit
b5bb147a(third review round)Scope: Bug fixes (allocator rounding, executor skip-zero, deepcopy provenance, name-based comparison), security (plan_id validation), and test additions.
Test status: 50/50 BDD scenarios passing (228 steps, 0 failures).
Summary of Findings
plan_idregex allows path traversal sequences (../)plan_idinSandboxCheckpointandChangeEntrymodelsplan_idtype diverges from spec's ULID requirementprovenance_mapkeyed byfragment_id, spec usesuko_nodeBudgetAllocator,StrategyExecutor) — expected v1 simplification_EmptySelectorfallback test lacks invocation verificationmin_useful_budgetexclusion not enforcedcopy.deepcopyon everyContextPayloadconstructionSEC-1 [HIGH]:
plan_idregex allows path traversalThe regex
^[\w.:/\-]+$onContextPayload.plan_idpermits/and.characters. Values like../../etc/passwdpass validation.All other models in the codebase use the strict ULID pattern
^[0-9A-HJKMNP-TV-Z]{26}$:PlanIdentity.plan_id— ULID patternDecision.plan_id— ULID pattern + validatorCheckpoint.plan_id— ULID patternThe specification (
specification.md:43360-43362) mandates filesystem paths:When those paths are implemented, a
plan_idof../../etcwould yield<data-dir>/checkpoints/../../etc/— a classic path traversal.Recommendation: Constrain to the ULID pattern used everywhere else, or at minimum reject
..sequences and add arealpath()containment check in any future path construction.SEC-2 [MEDIUM]: Unvalidated
plan_idin related modelsSandboxCheckpoint.plan_id(sandbox/checkpoint.py:73) andChangeEntry.plan_id(change.py:181-183) accept arbitrary strings with no validation pattern.SPEC-1 [MEDIUM]:
plan_idtype diverges from specificationThe spec (
specification.md:18158) definesplan_idas type ULID — "Unique, immutable identifier." All example values use ULID format (e.g.,01HXM8C2ZK4Q7C2B3F2R4VYV6J). The new regex accepts lowercase, underscores, dots, colons, slashes, and hyphens — most invalid in Crockford Base32. The BDD test validates with plan_id"x", which is not a valid ULID.If this divergence is intentional (allowing flexible identifiers in the context payload), it should be documented as a known deviation.
SPEC-2 [LOW]:
provenance_mapkey divergenceThe spec (
specification.md:42914) constructs provenance_map as:The implementation uses
fragment_idas key and destructures into a dict of selected fields. Usingfragment_idis arguably safer (multiple fragments can share auko_node), but diverges from the spec. This pre-dates this commit but is closely related to the BUG-3 provenance changes.TEST-GAP-1 [MEDIUM]: No path-traversal test for plan_id
The 5 boundary scenarios test length and whitespace, but none test traversal patterns. Recommended additions:
../../../etc/passwd— should be rejectedplan_id/with/slashes— should be rejected or documented as safea\x00b— verify the regex catches itTEST-GAP-2 [LOW]: Empty selector test lacks invocation check
The
_EmptySelector(TEST-4) doesn't track whetherselect()was called. Compare with_TrackingSelectorwhich sets a flag. The test could pass even if the selector were bypassed entirely.TEST-GAP-3 [LOW]: Allocator tests verify total, not distribution
For the 999/3 scenario, the test asserts
total == 999but not that individual allocations are333or334. A buggy distribution of998, 1, 0would also sum to 999.BUG-EDGE-1 [LOW]:
min_useful_budgetnot enforcedThe spec (
specification.md:42927) states the allocator "Guarantees each strategy receives at leastmin_useful_budgettokens or is excluded" (default 500). The executor only skips strategies withallocated_tokens <= 0. A strategy receiving 100 tokens still executes. The commit docstring correctly references spec §42927, but a log warning or TODO would prevent future callers from assuming the constraint is enforced.BUG-EDGE-2 [LOW]: Potential name collision in strategy comparison
Changing from identity to name-based comparison is correct for wrapped/copied strategies, but if
register_strategyis called with a name matching a built-in, the selector output could contain two different objects with the same name, and the filter at line 674 would include both. Risk is low sinceregister_strategyoverwrites the dict entry.PERF-1 [LOW]:
copy.deepcopyon construction hot path_freeze_provenance_mapcallscopy.deepcopy(v)on everyContextPayloadinstantiation. This is ~10-50x slower than the previous shallow approach. The previous approach failed on nested dicts, sodeepcopyis the correct fix. No action needed unless profiling flags it.DOC-1 [LOW]: Commit message numbering gap
The commit message references TEST-1 through TEST-5 and TEST-7, skipping TEST-6.
Positive Observations
DefaultBudgetAllocatoris mathematically correct and guaranteessum(allocations) == total_budget.max(..., 1)masking.copy.deepcopyproperly addresses the nested-dict mutation bug.@ -0,0 +923,4 @@request: dict[str, Any],) -> list[tuple[Any, float]]:return []TEST-GAP-2 [LOW]: Unlike
_TrackingSelectorwhich records invocation,_EmptySelectorhas no mechanism to verifyselect()was actually called. The fallback test could pass even if the selector were bypassed entirely.Suggestion: Add
self.called = Falsein__init__andself.called = Trueinselect(), then assertcontext.pipeline._strategy_selector.calledin the Then step.@ -0,0 +461,4 @@return []collected: list[ContextFragment] = []for strategy, _confidence, allocated_tokens in allocations:if allocated_tokens <= 0:BUG-EDGE-1 [LOW]: The spec (§42927) states the
ProportionalBudgetAllocator"Guarantees each strategy receives at leastmin_useful_budgettokens or is excluded" (default 500). This guard only skips strategies with 0 tokens. A strategy allocated e.g. 100 tokens would still execute, potentially producing low-quality fragments.Consider adding a log warning or TODO here for when
allocated_tokens < min_useful_budget.@ -0,0 +671,4 @@# for v1 single-strategy execution; future multi-strategy support# will remove this filter. Compare by name rather than identity# so custom selectors that wrap/copy strategy objects still match.candidates = [(s, c) for s, c in candidates if s.name == strategy_name] or [BUG-EDGE-2 [LOW]: Name-based comparison is correct for wrapped/copied strategies, but introduces a subtle edge: if two different strategy objects share the same
name(e.g., a custom selector returns both a built-in and a wrapper with the same name), both would pass this filter. In v1 single-strategy mode this could lead to duplicate execution. Risk is low sinceregister_strategyoverwrites the dict entry by name.@ -0,0 +162,4 @@...,min_length=1,max_length=256,pattern=r"^[\w.:/\-]+$",SEC-1 [HIGH]: This regex
^[\w.:/\-]+$permits path traversal sequences. Aplan_idof../../etc/passwdpasses this validation.Every other model in the codebase (
PlanIdentity,Decision,Checkpoint) uses the strict ULID pattern^[0-9A-HJKMNP-TV-Z]{26}$. The spec (specification.md:18158) explicitly typesplan_idas ULID.The spec also mandates filesystem paths like
<data-dir>/checkpoints/<plan_id>/(line 43360), making this a latent path-traversal vulnerability.Recommendation: Use the ULID pattern
^[0-9A-HJKMNP-TV-Z]{26}$consistent with other models, or at minimum reject..sequences.@ -0,0 +204,4 @@@classmethoddef _freeze_provenance_map(cls, v: dict[str, Any]) -> dict[str, Any]:"""Deep defensive copy so callers cannot mutate nested state."""return copy.deepcopy(v)PERF-1 [LOW]:
copy.deepcopyis ~10-50x slower than the previous shallow-copy approach. This validator runs on the hot path of every context assembly cycle. Correct fix for the nested-dict mutation bug, but worth noting for future profiling if provenance maps grow large.Code Review Report — Commit
9ad3ba11Commit:
9ad3ba11—fix(acms): enforce ULID pattern on plan_id, address reviewer findingsAuthor: Hamza Khyari (
khyarihamza3@gmail.com)Date: 2026-03-05
Branch:
feature/m5-acms-contextIssue: #188 —
feat(acms): add ACMS v1 context pipelineFiles changed: 5 (89 insertions, 59 deletions)
Summary
The commit hardens
ContextPayload.plan_idvalidation from a permissive ASCII pattern (^[a-zA-Z0-9_.:/\-]+$, 1–256 chars) to a strict Crockford Base32 ULID pattern (^[0-9A-HJKMNP-TV-Z]{26}$), consistent withPlanIdentity,Decision, andCheckpointmodels. It also improves test coverage for plan_id boundaries, allocator rounding, and strategy selector fallback verification. The changes are well-motivated and the security improvement is genuine. However, 6 findings were identified that are worth addressing.Findings
F-1:
ULID_PATTERNdefined locally instead of imported — Medium (Maintainability)File:
src/cleveragents/domain/models/core/context_fragment.py:24The commit adds a local
ULID_PATTERNconstant rather than importing from the canonical source. The codebase now has 6 independent definitions of the same pattern:core/plan.py:74core/decision.py:81core/session.py:49core/resource.py:41core/context_fragment.py:24observability/llm_trace.py:24core/checkpoint.py:23plan.pycore/resume.py:21plan.pycheckpoint.pyandresume.pyalready demonstrate the preferred pattern of importing fromplan.py. If the ULID pattern ever needs updating, 6 files must change in sync. A centralized constant (in a sharedconstants.pyorvalidators.py) imported everywhere would eliminate drift risk.Recommendation: Import
ULID_PATTERNfromplan.pyor extract to a shared module.F-2: No early
plan_idvalidation inACMSPipeline.assemble()— Medium (Performance / Fail-fast)File:
src/cleveragents/application/services/acms_service.py—assemble()methodThe
assemble()method acceptsplan_id: strwith no validation. The ULID constraint is only enforced whenContextPayloadis constructed at the very end of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation).If a caller passes an invalid
plan_id, all pipeline work executes to completion before aValidationErroris raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller.Recommendation: Add an early guard at the top of
assemble():F-3: Incomplete Crockford excluded-character test coverage — Medium (Test Flaw)
File:
features/acms_pipeline.feature:478-482The scenario is named "Payload rejects plan_id with excluded Crockford Base32 chars (I, L, O, U)" but the test string
"01JQTESTPN0000000000000LIA"only contains L and I. Characters O and U are never tested. If the regex had a defect specifically aroundOorU(e.g., a typo in the character classP-TV-Zthat accidentally includedU), this test would not catch it.Recommendation: Either split into 4 individual scenarios (one per excluded char) or use a
Scenario Outlinewith anExamplestable covering all four:F-4: Non-isolated negative test cases — Low (Test Isolation)
File:
features/acms_pipeline.feature:484-494Two security scenarios reject input based on multiple simultaneous violations, not the specific one claimed:
../../etc/passwdplan/sub:v1.2-beta_rc0000Neither test isolates the targeted behavior. If the character validation were broken but the length check worked, these tests would still pass — giving false confidence.
Recommendation: Use 26-character strings that are valid ULIDs except for the targeted violation:
F-5: Broadened exception catch in test step — Low (Test Quality)
File:
features/steps/acms_pipeline_steps.py:305The change from
except ValueErrortoexcept (ValueError, ValidationError)instep_assemble_with_strategywas necessary because the stricter plan_id pattern now raisesValidationErrorinstead ofValueErrorin some paths. However, this broader catch could mask genuineValueErrorexceptions from pipeline logic (e.g., strategy execution failures) by silently treating them as expected.Recommendation: If the test is specifically validating plan_id rejection, catch
ValidationErroronly. If it needs to handle both pipeline errors and validation errors, consider distinguishing them in the assertion step (e.g., storing the exception type alongside the error).F-6: Coverage verification not confirmed — Low (Process)
Issue #188 requires "Verify coverage >= 97% via
nox -s coverage_report" as a subtask. Thebuild/coverage.xmlfile does not exist on the branch, meaning the coverage session has not been run (or its output was not committed). While coverage artifacts are typically generated in CI and not committed, there is no evidence in the commit or branch that the 97% threshold was verified after these changes.Recommendation: Confirm the coverage session passes after the test changes in this commit. The commit removed two step definitions (
step_create_payload_plan_id_lengthandstep_created_payload_success) — ensure no other feature files reference these steps, which would cause runtime failures reducing coverage.Positive Aspects
[a-zA-Z0-9_.:/\-]+pattern (which allowed slashes and dots).ContextPayload.plan_idnow uses the same validation asPlanIdentity,Decision, andCheckpoint, ensuring consistency across the domain model.self.calledtracking to_EmptySelectorand asserting invocation is a genuine test quality improvement — the previous test only checked output behavior, not that the custom selector code path was exercised.333 or 334) to the 999/3 rounding scenario catches rounding bugs that thetotal == 999check alone would miss.list(context.fragments)defensive copy in test steps prevents accidental mutation during pipeline execution.Verdict
The commit is a solid incremental improvement addressing legitimate security and test quality concerns. The 3 medium-severity findings (F-1 through F-3) are worth addressing before merge — F-1 and F-2 affect production code maintainability and robustness, while F-3 represents a gap in the claimed test coverage. The 3 low-severity findings (F-4 through F-6) are improvements that could be addressed in a follow-up.
ULID_PATTERNduplicated locally instead of importedplan_idvalidation inassemble()— fails lateexceptclause could mask pipeline errorsI am approving the PR on the condition that following issue is solved before merging to master:
F-2: No early plan_id validation in ACMSPipeline.assemble() (Performance / Fail-fast) -- Medium
File: src/cleveragents/application/services/acms_service.py -- assemble() method
The assemble() method accepts plan_id: str with no validation. The ULID constraint is only enforced when ContextPayload is constructed at the very end of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation).
If a caller passes an invalid plan_id, all pipeline work executes to completion before a ValidationError is raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller.
Recommendation: Add an early guard at the top of assemble():
import re
if not re.match(ULID_PATTERN, plan_id):
raise ValueError(f"plan_id must be a valid ULID, got {plan_id!r}")
i am approving the PR, on the condition that the following is solved:
F-2 No early plan_id validation in assemble() -- fails late Medium Performance / Fail-fast
New commits pushed, approval review dismissed automatically according to repository settings
43227833ee292419aa0b292419aa0bfebea8950f