feat(acms): add ACMS v1 context pipeline #465

2026-02-27T11:47:42Z

hamza.khyari commented

2026-02-27 11:47:42 +00:00

Summary

Add ACMS v1 context assembly pipeline with ContextFragment, ContextBudget, and ContextPayload frozen Pydantic v2 domain models
Implement ACMSPipeline service with three built-in fusion strategies: relevance (highest score first), recency (most recent first), tiered (hot > warm > cold priority)
Extensible strategy registration via register_strategy() for custom fusion strategies
Budget-constrained assembly that fits fragments within token limits

Testing

17 Behave BDD scenarios / 81 steps (all pass)
Robot Framework smoke tests (robot/acms_pipeline.robot)
ASV benchmarks (benchmarks/acms_pipeline_bench.py)
Lint clean (ruff)

Files Changed

src/cleveragents/domain/models/core/context_fragment.py — Domain models (ContextFragment, ContextBudget, ContextPayload)
src/cleveragents/application/services/acms_service.py — ACMSPipeline with fusion strategies
features/acms_pipeline.feature + features/steps/acms_pipeline_steps.py — BDD tests
robot/acms_pipeline.robot + robot/helper_acms_pipeline.py — Integration tests
benchmarks/acms_pipeline_bench.py — ASV benchmarks
docs/reference/acms.md — Reference documentation
CHANGELOG.md — Updated

Closes #188

## Summary - Add ACMS v1 context assembly pipeline with `ContextFragment`, `ContextBudget`, and `ContextPayload` frozen Pydantic v2 domain models - Implement `ACMSPipeline` service with three built-in fusion strategies: relevance (highest score first), recency (most recent first), tiered (hot > warm > cold priority) - Extensible strategy registration via `register_strategy()` for custom fusion strategies - Budget-constrained assembly that fits fragments within token limits ## Testing - 17 Behave BDD scenarios / 81 steps (all pass) - Robot Framework smoke tests (`robot/acms_pipeline.robot`) - ASV benchmarks (`benchmarks/acms_pipeline_bench.py`) - Lint clean (ruff) ## Files Changed - `src/cleveragents/domain/models/core/context_fragment.py` — Domain models (ContextFragment, ContextBudget, ContextPayload) - `src/cleveragents/application/services/acms_service.py` — ACMSPipeline with fusion strategies - `features/acms_pipeline.feature` + `features/steps/acms_pipeline_steps.py` — BDD tests - `robot/acms_pipeline.robot` + `robot/helper_acms_pipeline.py` — Integration tests - `benchmarks/acms_pipeline_bench.py` — ASV benchmarks - `docs/reference/acms.md` — Reference documentation - `CHANGELOG.md` — Updated Closes #188

hamza.khyari added this to the v3.4.0 milestone 2026-02-27 11:47:42 +00:00

hamza.khyari added the

Type

Feature

label 2026-02-27 11:47:42 +00:00

hamza.khyari self-assigned this 2026-02-27 11:48:47 +00:00

CoreRasurae commented

2026-02-27 15:44:45 +00:00

Code Review Report: Issue #188 — `feat(acms): add ACMS v1 context pipeline`

Branch: feature/m5-acms-context
Author: Hamza Khyari (khyari hamza <khyarihamza3@gmail.com>)
Commits reviewed (3):

SHA	Message	Date
`89bc078`	`feat(acms): add ACMS v1 context pipeline`	2026-02-27 11:35
`c628856`	`fix(acms): address review findings for ACMS context pipeline`	2026-02-27 12:12
`34d3d76`	`style(acms): fix import sorting in core __init__`	2026-02-27 12:24

Files changed: 10 files, +1301 lines
Forgejo issue: #188 — "feat(acms): add ACMS v1 context pipeline"

CRITICAL — Specification Compliance

C1. `ContextFragment` is missing key fields from the specification

Severity: Critical | File: src/cleveragents/domain/models/core/context_fragment.py:17-28

The specification (docs/specification.md ~line 24743) defines ContextFragment with these fields:

# Spec
uko_node: str                    # UKO URI of the source node
content: str
detail_depth: int                # Resolved integer depth
token_count: int                 # Actual token count
relevance_score: float
provenance: FragmentProvenance   # Trace back to resource + location
metadata: dict

The implementation deviates significantly:

uko_node is replaced with source: str — a plain label ("file", "decision") instead of a UKO URI. This removes all UKO integration. The issue title and commit message both claim "UKO and CRP integration" but UKO is absent.
detail_depth is completely missing — a core concept in the spec's multi-level rendering system (MODULE_LISTING(0) through FULL_SOURCE(9)).
provenance: FragmentProvenance is completely missing — the spec mandates traceability back to resource+location for every fragment.
token_count renamed to token_estimate — a semantic difference. The spec tracks actual counts; the implementation stores estimates.

C2. `ContextPayload` diverges from spec's `AssembledContext`

Severity: Critical | File: src/cleveragents/domain/models/core/context_fragment.py:52-71

The spec (~line 24760) defines the output as AssembledContext with:

budget_used: float          # Fraction 0.0-1.0
strategies_used: list[str]  # Which strategies contributed
context_hash: str           # Cryptographic hash for snapshot
preamble: str | None        # Optional structure summary
provenance_map: dict        # Fragment -> resource/location

Missing from ContextPayload:

budget_used — no fractional budget metric
strategies_used — stores a single strategy: str instead of a list (spec allows multi-strategy assembly)
context_hash — no cryptographic hash for snapshot integrity/caching
preamble — no preamble support
provenance_map — no provenance tracking

C3. Pipeline is missing the 10-component architecture

Severity: Critical | File: src/cleveragents/application/services/acms_service.py

The specification defines a 10-component pluggable pipeline across 3 phases:

Phase	Components
Strategy Orchestration	StrategySelector, BudgetAllocator, StrategyExecutor
Fragment Fusion	FragmentDeduplicator, DetailDepthResolver, FragmentScorer, BudgetPacker, FragmentOrderer
Context Finalization	PreambleGenerator, SkeletonCompressor

The implementation collapses all of this into a single assemble() method (~15 lines) that calls strategy.rank() and sums tokens. None of the 10 components exist. There is no deduplication, no depth resolution, no scoring, no ordering, no preamble generation, and no skeleton compression.

C4. `FusionStrategy` protocol is severely stripped down from spec's `ContextStrategy`

Severity: High | File: src/cleveragents/application/services/acms_service.py:27-35

The spec defines:

class ContextStrategy(Protocol):
    name: str
    capabilities: StrategyCapabilities
    def can_handle(request, backends) -> float  # 0.0-1.0 confidence
    def assemble(request, backends, budget, plan_context) -> list[ContextFragment]
    def explain() -> str

The implementation:

class FusionStrategy(Protocol):
    def rank(fragments, budget) -> Sequence[ContextFragment]

Missing: name, capabilities, can_handle, explain, and the ability to query backends. Strategies receive pre-fetched fragments rather than accessing data stores.

C5. CRP acronym is wrong in the reference documentation

Severity: Medium | File: docs/reference/acms.md:5

The documentation says "CRP (Context Relevance Processor)" but the specification consistently defines CRP as "Context Request Protocol" (see docs/specification.md lines 31, 175, 24530, 24777). This is a factual error in the reference docs.

BUG DETECTION

B1. `RecencyFusionStrategy` sorts by string comparison of ISO timestamps

Severity: Medium | File: src/cleveragents/application/services/acms_service.py:63-71

sorted_frags = sorted(fragments, key=lambda f: f.created_at, reverse=True)

created_at is a str field. String comparison of ISO 8601 timestamps works correctly ONLY when all timestamps use the same timezone format. The default factory uses datetime.now(UTC).isoformat(), which produces +00:00 offsets. But externally created fragments (e.g., from deserialized data) might use Z suffix or different offsets like +05:30. Under mixed formats, string sorting produces incorrect ordering. The strategy should parse to datetime objects for comparison.

B2. `ContextBudget` defaults create an implicit minimum `max_tokens` constraint

Severity: Low | File: src/cleveragents/domain/models/core/context_fragment.py:30-44

max_tokens: int = Field(ge=1, default=4096)
reserved_tokens: int = Field(ge=0, default=512)

The ge=1 constraint on max_tokens combined with the default reserved_tokens=512 means ContextBudget(max_tokens=100) raises a ValidationError because 512 >= 100. The effective minimum for max_tokens is 513 (not 1) unless reserved_tokens is explicitly overridden. This is undocumented and surprising.

B3. `token_estimate=0` default allows unlimited fragment inclusion

Severity: Low | File: src/cleveragents/domain/models/core/context_fragment.py:24

Fragments with token_estimate=0 pass every budget check and will all be included. In production, this could lead to payloads with far more fragments than the budget intends to allow if fragments are created without token estimates.

TEST COVERAGE AND TEST FLAWS

T1. Inconsistent error attribute naming in Behave steps

Severity: Medium | File: features/steps/acms_pipeline_steps.py

Validation error steps use two different context attribute names:

context.validation_error — set by step_create_fragment_bad_score (line 168) and step_create_fragment_bad_tokens (line 176)
context.acms_error — set by step_budget_reserved_equals_max (line 148) and step_invalid_tier (line 157)

The assertion step then checks both via getattr:

error = getattr(context, "validation_error", None) or getattr(context, "acms_error", None)

This is fragile. If a previous scenario set context.validation_error and a subsequent scenario only intends to set context.acms_error but the code path doesn't raise, the test still passes because the stale validation_error from a prior scenario is found. Behave's Context does NOT reset custom attributes between scenarios.

T2. No Robot Framework test for the `recency` strategy

Severity: Medium | File: robot/acms_pipeline.robot

Robot tests cover: fragment-create, budget-calc, assemble-relevance, assemble-tiered, payload-budget-check. The recency strategy has zero Robot integration coverage. Correspondingly, robot/helper_acms_pipeline.py has no _cmd_assemble_recency handler.

T3. No ASV benchmark for the `recency` strategy

Severity: Low | File: benchmarks/acms_pipeline_bench.py

The benchmark suite covers relevance (default, lines 92-104) and tiered (lines 116-126) strategies but omits recency. Given that recency sorting relies on string comparison (see B1), performance characteristics under large datasets should be measured.

T4. `plan_id`, `payload_id`, and `assembled_at` are never asserted

Severity: Low | Files: features/steps/acms_pipeline_steps.py, robot/helper_acms_pipeline.py

No Behave scenario or Robot test verifies that plan_id is correctly propagated to the payload, that payload_id is generated, or that assembled_at is set. These are structural fields of ContextPayload with no test coverage.

T5. No test for overwriting a built-in strategy name

Severity: Low | File: features/acms_pipeline.feature

register_strategy("relevance", SomeOtherClass) silently overwrites the built-in. There is no test verifying this behavior or guarding against it.

T6. No test for `reserved_tokens > max_tokens` (only `==` is tested)

Severity: Low | File: features/acms_pipeline.feature:71

The scenario "Budget rejects reserved_tokens >= max_tokens" only tests the == case (max_tokens=100, reserved_tokens=100). The > case (e.g., max_tokens=50, reserved_tokens=100) is untested, though the validator uses >=.

PERFORMANCE

P1. Strategy objects are instantiated on every `assemble()` call

Severity: Low | File: src/cleveragents/application/services/acms_service.py:145-146

strategy_cls = self._strategies[strategy_name]
ranker: FusionStrategy = strategy_cls()

All three built-in strategies are stateless. Creating a new instance per call adds unnecessary GC pressure on hot paths. Strategy instances could be cached after first creation.

P2. No early termination when budget is fully consumed

Severity: Low | File: src/cleveragents/application/services/acms_service.py:49-53 (and similar in all strategies)

for frag in sorted_frags:
    if total + frag.token_estimate <= budget.available_tokens:
        result.append(frag)
        total += frag.token_estimate

The loop continues iterating all remaining fragments even after total == budget.available_tokens. An elif total >= budget.available_tokens: break would avoid unnecessary iterations on large fragment sets where the budget fills early.

SECURITY

S1. `source` field has no validation against allowed values

Severity: Low | File: src/cleveragents/domain/models/core/context_fragment.py:20

The docs and spec indicate source should be one of "file", "decision", "resource", "user". The field is a plain str with no constraint. For a domain model, this should be Literal["file", "decision", "resource", "user"] or at least validated.

S2. No size limits on `content` or `metadata`

Severity: Low | File: src/cleveragents/domain/models/core/context_fragment.py:21,26

content: str and metadata: dict[str, str] have no upper-bound constraints. A caller can create fragments with arbitrarily large content strings or metadata dicts, potentially exhausting memory. Production systems should enforce max lengths.

ARCHITECTURAL / STYLE

A1. `ACMSPipeline` does not follow the project's service conventions

Severity: Medium | File: src/cleveragents/application/services/acms_service.py:104-126

Every other service in application/services/ follows the pattern:

class SomeService:
    def __init__(self, settings: object, unit_of_work: UnitOfWork) -> None:
        self.settings = settings
        self.unit_of_work = unit_of_work
        self._logger = structlog.get_logger(__name__).bind(service="some")

ACMSPipeline takes only default_strategy: str, has no settings/unit_of_work injection, and has zero logging. No structlog calls anywhere in the service. This will make DI integration, observability, and debugging harder as the pipeline matures.

A2. Tiers are fragment tags, not actual storage tiers

Severity: Medium | Conceptual

The spec defines hot/warm/cold as storage tiers with retention policies (warm: 24h, cold: 90d). The implementation treats tier as a simple label on ContextFragment used only for sorting priority. There is no storage tier system, no retention, no promotion/demotion. This gap should be acknowledged in the documentation.

SUMMARY TABLE

#	Category	Severity	Finding
C1	Spec compliance	Critical	`ContextFragment` missing `uko_node`, `detail_depth`, `provenance`, renames `token_count`
C2	Spec compliance	Critical	`ContextPayload` missing `budget_used`, `strategies_used`, `context_hash`, `preamble`, `provenance_map`
C3	Spec compliance	Critical	10-component pipeline architecture not implemented
C4	Spec compliance	High	`FusionStrategy` severely stripped from spec's `ContextStrategy`
C5	Spec compliance	Medium	CRP expanded as "Context Relevance Processor" instead of "Context Request Protocol" in docs
B1	Bug	Medium	Recency strategy sorts timestamps as strings; breaks with mixed timezone formats
B2	Bug	Low	Implicit min `max_tokens=513` due to default `reserved_tokens`
B3	Bug	Low	`token_estimate=0` default allows unlimited fragments
T1	Test flaw	Medium	Inconsistent error attribute names risk false-positive assertions
T2	Test coverage	Medium	No Robot test for `recency` strategy
T3	Test coverage	Low	No ASV benchmark for `recency` strategy
T4	Test coverage	Low	`plan_id`, `payload_id`, `assembled_at` never asserted
T5	Test coverage	Low	No test for overwriting built-in strategy names
T6	Test coverage	Low	Budget validation only tests `==` case, not `>`
P1	Performance	Low	Strategy instantiated on every `assemble()` call
P2	Performance	Low	No early termination when budget is fully consumed
S1	Security	Low	`source` field accepts arbitrary strings
S2	Security	Low	No size limits on `content` or `metadata`
A1	Architecture	Medium	Service lacks `structlog`, DI pattern (`settings`/`unit_of_work`)
A2	Architecture	Medium	Tiers are sort-labels, not actual storage tiers as spec requires

Critical items (C1-C3) represent fundamental gaps between the implementation and the specification. The commit message claims "UKO and CRP integration" but neither UKO nor CRP is present. The "v1" framing may justify a simplified approach, but the delta between the implementation and the spec should be explicitly documented as known limitations so the milestone tracking is accurate.

# Code Review Report: Issue #188 — `feat(acms): add ACMS v1 context pipeline` **Branch:** `feature/m5-acms-context` **Author:** Hamza Khyari (`khyari hamza <khyarihamza3@gmail.com>`) **Commits reviewed (3):** | SHA | Message | Date | |-----|---------|------| | `89bc078` | `feat(acms): add ACMS v1 context pipeline` | 2026-02-27 11:35 | | `c628856` | `fix(acms): address review findings for ACMS context pipeline` | 2026-02-27 12:12 | | `34d3d76` | `style(acms): fix import sorting in core __init__` | 2026-02-27 12:24 | **Files changed:** 10 files, +1301 lines **Forgejo issue:** #188 — "feat(acms): add ACMS v1 context pipeline" --- ## CRITICAL — Specification Compliance ### C1. `ContextFragment` is missing key fields from the specification **Severity: Critical** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:17-28` The specification (`docs/specification.md` ~line 24743) defines `ContextFragment` with these fields: ```python # Spec uko_node: str # UKO URI of the source node content: str detail_depth: int # Resolved integer depth token_count: int # Actual token count relevance_score: float provenance: FragmentProvenance # Trace back to resource + location metadata: dict ``` The implementation deviates significantly: - **`uko_node` is replaced with `source: str`** — a plain label (`"file"`, `"decision"`) instead of a UKO URI. This removes all UKO integration. The issue title and commit message both claim "UKO and CRP integration" but UKO is absent. - **`detail_depth` is completely missing** — a core concept in the spec's multi-level rendering system (MODULE_LISTING(0) through FULL_SOURCE(9)). - **`provenance: FragmentProvenance` is completely missing** — the spec mandates traceability back to resource+location for every fragment. - **`token_count` renamed to `token_estimate`** — a semantic difference. The spec tracks actual counts; the implementation stores estimates. ### C2. `ContextPayload` diverges from spec's `AssembledContext` **Severity: Critical** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:52-71` The spec (~line 24760) defines the output as `AssembledContext` with: ``` budget_used: float # Fraction 0.0-1.0 strategies_used: list[str] # Which strategies contributed context_hash: str # Cryptographic hash for snapshot preamble: str | None # Optional structure summary provenance_map: dict # Fragment -> resource/location ``` Missing from `ContextPayload`: - **`budget_used`** — no fractional budget metric - **`strategies_used`** — stores a single `strategy: str` instead of a list (spec allows multi-strategy assembly) - **`context_hash`** — no cryptographic hash for snapshot integrity/caching - **`preamble`** — no preamble support - **`provenance_map`** — no provenance tracking ### C3. Pipeline is missing the 10-component architecture **Severity: Critical** | **File:** `src/cleveragents/application/services/acms_service.py` The specification defines a 10-component pluggable pipeline across 3 phases: | Phase | Components | |-------|-----------| | Strategy Orchestration | StrategySelector, BudgetAllocator, StrategyExecutor | | Fragment Fusion | FragmentDeduplicator, DetailDepthResolver, FragmentScorer, BudgetPacker, FragmentOrderer | | Context Finalization | PreambleGenerator, SkeletonCompressor | The implementation collapses all of this into a single `assemble()` method (~15 lines) that calls `strategy.rank()` and sums tokens. None of the 10 components exist. There is no deduplication, no depth resolution, no scoring, no ordering, no preamble generation, and no skeleton compression. ### C4. `FusionStrategy` protocol is severely stripped down from spec's `ContextStrategy` **Severity: High** | **File:** `src/cleveragents/application/services/acms_service.py:27-35` The spec defines: ```python class ContextStrategy(Protocol): name: str capabilities: StrategyCapabilities def can_handle(request, backends) -> float # 0.0-1.0 confidence def assemble(request, backends, budget, plan_context) -> list[ContextFragment] def explain() -> str ``` The implementation: ```python class FusionStrategy(Protocol): def rank(fragments, budget) -> Sequence[ContextFragment] ``` Missing: `name`, `capabilities`, `can_handle`, `explain`, and the ability to query backends. Strategies receive pre-fetched fragments rather than accessing data stores. ### C5. CRP acronym is wrong in the reference documentation **Severity: Medium** | **File:** `docs/reference/acms.md:5` The documentation says **"CRP (Context Relevance Processor)"** but the specification consistently defines CRP as **"Context Request Protocol"** (see `docs/specification.md` lines 31, 175, 24530, 24777). This is a factual error in the reference docs. --- ## BUG DETECTION ### B1. `RecencyFusionStrategy` sorts by string comparison of ISO timestamps **Severity: Medium** | **File:** `src/cleveragents/application/services/acms_service.py:63-71` ```python sorted_frags = sorted(fragments, key=lambda f: f.created_at, reverse=True) ``` `created_at` is a `str` field. String comparison of ISO 8601 timestamps works correctly ONLY when all timestamps use the same timezone format. The default factory uses `datetime.now(UTC).isoformat()`, which produces `+00:00` offsets. But externally created fragments (e.g., from deserialized data) might use `Z` suffix or different offsets like `+05:30`. Under mixed formats, string sorting produces incorrect ordering. The strategy should parse to `datetime` objects for comparison. ### B2. `ContextBudget` defaults create an implicit minimum `max_tokens` constraint **Severity: Low** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:30-44` ```python max_tokens: int = Field(ge=1, default=4096) reserved_tokens: int = Field(ge=0, default=512) ``` The `ge=1` constraint on `max_tokens` combined with the default `reserved_tokens=512` means `ContextBudget(max_tokens=100)` raises a `ValidationError` because 512 >= 100. The effective minimum for `max_tokens` is 513 (not 1) unless `reserved_tokens` is explicitly overridden. This is undocumented and surprising. ### B3. `token_estimate=0` default allows unlimited fragment inclusion **Severity: Low** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:24` Fragments with `token_estimate=0` pass every budget check and will all be included. In production, this could lead to payloads with far more fragments than the budget intends to allow if fragments are created without token estimates. --- ## TEST COVERAGE AND TEST FLAWS ### T1. Inconsistent error attribute naming in Behave steps **Severity: Medium** | **File:** `features/steps/acms_pipeline_steps.py` Validation error steps use two different context attribute names: - `context.validation_error` — set by `step_create_fragment_bad_score` (line 168) and `step_create_fragment_bad_tokens` (line 176) - `context.acms_error` — set by `step_budget_reserved_equals_max` (line 148) and `step_invalid_tier` (line 157) The assertion step then checks both via `getattr`: ```python error = getattr(context, "validation_error", None) or getattr(context, "acms_error", None) ``` This is fragile. If a previous scenario set `context.validation_error` and a subsequent scenario only intends to set `context.acms_error` but the code path doesn't raise, the test still passes because the stale `validation_error` from a prior scenario is found. Behave's `Context` does NOT reset custom attributes between scenarios. ### T2. No Robot Framework test for the `recency` strategy **Severity: Medium** | **File:** `robot/acms_pipeline.robot` Robot tests cover: `fragment-create`, `budget-calc`, `assemble-relevance`, `assemble-tiered`, `payload-budget-check`. The `recency` strategy has zero Robot integration coverage. Correspondingly, `robot/helper_acms_pipeline.py` has no `_cmd_assemble_recency` handler. ### T3. No ASV benchmark for the `recency` strategy **Severity: Low** | **File:** `benchmarks/acms_pipeline_bench.py` The benchmark suite covers relevance (default, lines 92-104) and tiered (lines 116-126) strategies but omits `recency`. Given that recency sorting relies on string comparison (see B1), performance characteristics under large datasets should be measured. ### T4. `plan_id`, `payload_id`, and `assembled_at` are never asserted **Severity: Low** | **Files:** `features/steps/acms_pipeline_steps.py`, `robot/helper_acms_pipeline.py` No Behave scenario or Robot test verifies that `plan_id` is correctly propagated to the payload, that `payload_id` is generated, or that `assembled_at` is set. These are structural fields of `ContextPayload` with no test coverage. ### T5. No test for overwriting a built-in strategy name **Severity: Low** | **File:** `features/acms_pipeline.feature` `register_strategy("relevance", SomeOtherClass)` silently overwrites the built-in. There is no test verifying this behavior or guarding against it. ### T6. No test for `reserved_tokens > max_tokens` (only `==` is tested) **Severity: Low** | **File:** `features/acms_pipeline.feature:71` The scenario "Budget rejects reserved_tokens >= max_tokens" only tests the `==` case (`max_tokens=100, reserved_tokens=100`). The `>` case (e.g., `max_tokens=50, reserved_tokens=100`) is untested, though the validator uses `>=`. --- ## PERFORMANCE ### P1. Strategy objects are instantiated on every `assemble()` call **Severity: Low** | **File:** `src/cleveragents/application/services/acms_service.py:145-146` ```python strategy_cls = self._strategies[strategy_name] ranker: FusionStrategy = strategy_cls() ``` All three built-in strategies are stateless. Creating a new instance per call adds unnecessary GC pressure on hot paths. Strategy instances could be cached after first creation. ### P2. No early termination when budget is fully consumed **Severity: Low** | **File:** `src/cleveragents/application/services/acms_service.py:49-53` (and similar in all strategies) ```python for frag in sorted_frags: if total + frag.token_estimate <= budget.available_tokens: result.append(frag) total += frag.token_estimate ``` The loop continues iterating all remaining fragments even after `total == budget.available_tokens`. An `elif total >= budget.available_tokens: break` would avoid unnecessary iterations on large fragment sets where the budget fills early. --- ## SECURITY ### S1. `source` field has no validation against allowed values **Severity: Low** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:20` The docs and spec indicate `source` should be one of `"file"`, `"decision"`, `"resource"`, `"user"`. The field is a plain `str` with no constraint. For a domain model, this should be `Literal["file", "decision", "resource", "user"]` or at least validated. ### S2. No size limits on `content` or `metadata` **Severity: Low** | **File:** `src/cleveragents/domain/models/core/context_fragment.py:21,26` `content: str` and `metadata: dict[str, str]` have no upper-bound constraints. A caller can create fragments with arbitrarily large content strings or metadata dicts, potentially exhausting memory. Production systems should enforce max lengths. --- ## ARCHITECTURAL / STYLE ### A1. `ACMSPipeline` does not follow the project's service conventions **Severity: Medium** | **File:** `src/cleveragents/application/services/acms_service.py:104-126` Every other service in `application/services/` follows the pattern: ```python class SomeService: def __init__(self, settings: object, unit_of_work: UnitOfWork) -> None: self.settings = settings self.unit_of_work = unit_of_work self._logger = structlog.get_logger(__name__).bind(service="some") ``` `ACMSPipeline` takes only `default_strategy: str`, has no `settings`/`unit_of_work` injection, and has **zero logging**. No `structlog` calls anywhere in the service. This will make DI integration, observability, and debugging harder as the pipeline matures. ### A2. Tiers are fragment tags, not actual storage tiers **Severity: Medium** | **Conceptual** The spec defines hot/warm/cold as **storage tiers** with retention policies (warm: 24h, cold: 90d). The implementation treats `tier` as a simple label on `ContextFragment` used only for sorting priority. There is no storage tier system, no retention, no promotion/demotion. This gap should be acknowledged in the documentation. --- ## SUMMARY TABLE | # | Category | Severity | Finding | |---|----------|----------|---------| | C1 | Spec compliance | **Critical** | `ContextFragment` missing `uko_node`, `detail_depth`, `provenance`, renames `token_count` | | C2 | Spec compliance | **Critical** | `ContextPayload` missing `budget_used`, `strategies_used`, `context_hash`, `preamble`, `provenance_map` | | C3 | Spec compliance | **Critical** | 10-component pipeline architecture not implemented | | C4 | Spec compliance | High | `FusionStrategy` severely stripped from spec's `ContextStrategy` | | C5 | Spec compliance | Medium | CRP expanded as "Context Relevance Processor" instead of "Context Request Protocol" in docs | | B1 | Bug | Medium | Recency strategy sorts timestamps as strings; breaks with mixed timezone formats | | B2 | Bug | Low | Implicit min `max_tokens=513` due to default `reserved_tokens` | | B3 | Bug | Low | `token_estimate=0` default allows unlimited fragments | | T1 | Test flaw | Medium | Inconsistent error attribute names risk false-positive assertions | | T2 | Test coverage | Medium | No Robot test for `recency` strategy | | T3 | Test coverage | Low | No ASV benchmark for `recency` strategy | | T4 | Test coverage | Low | `plan_id`, `payload_id`, `assembled_at` never asserted | | T5 | Test coverage | Low | No test for overwriting built-in strategy names | | T6 | Test coverage | Low | Budget validation only tests `==` case, not `>` | | P1 | Performance | Low | Strategy instantiated on every `assemble()` call | | P2 | Performance | Low | No early termination when budget is fully consumed | | S1 | Security | Low | `source` field accepts arbitrary strings | | S2 | Security | Low | No size limits on `content` or `metadata` | | A1 | Architecture | Medium | Service lacks `structlog`, DI pattern (`settings`/`unit_of_work`) | | A2 | Architecture | Medium | Tiers are sort-labels, not actual storage tiers as spec requires | **Critical items (C1-C3)** represent fundamental gaps between the implementation and the specification. The commit message claims "UKO and CRP integration" but neither UKO nor CRP is present. The "v1" framing may justify a simplified approach, but the delta between the implementation and the spec should be explicitly documented as known limitations so the milestone tracking is accurate.

freemo added the

labels 2026-03-01 04:36:08 +00:00

CoreRasurae commented

2026-03-04 15:30:34 +00:00

Code Review Report -- Commit `8c6b4122` (branch `feature/m5-acms-context`)

Reviewed commit: 8c6b4122 -- fix(acms): length-prefix fragment content in context hash
Author: Hamza Khyari
Forgejo Issue: #188 -- feat(acms): add ACMS v1 context pipeline
Files in branch diff (vs origin/master): 10 files, +2,266 lines
Review scope: test coverage, test flaws, bugs, spec alignment, performance, security

1. TEST COVERAGE GAPS (4 findings)

T1 [HIGH] -- No dedicated test for `compute_context_hash` verifying the fix this commit introduces

Location: src/cleveragents/domain/models/core/context_fragment.py:208-220

This function is the sole target of commit 8c6b4122, yet no test verifies the collision the commit claims to fix.

The existing test (features/steps/acms_pipeline_steps.py:459-464) only asserts the hash is non-empty and 64 chars:

assert context.payload.context_hash
assert len(context.payload.context_hash) == 64

Missing test cases:

Boundary-split collision: ["AB", "C"] must produce a different hash than ["A", "BC"]. This is the exact vulnerability this commit fixes -- yet no test proves it.
Determinism: same fragments in same order must always produce the same hash.
Order sensitivity: ["A", "B"] vs ["B", "A"] must differ.
Empty tuple: compute_context_hash(()) should return a stable hash.

T2 [MEDIUM] -- No test with multi-byte UTF-8 content

The fix uses len(encoded) (byte length) as the prefix, which correctly differs from character length for multi-byte sequences. However, no test exercises content like "\u00e9" (2 bytes), "\u4e16" (3 bytes), or "\U0001f600" (4 bytes) to confirm the byte-length prefix handles them correctly.

T3 [MEDIUM] -- Robot helper hash assertion is weaker than Behave equivalent

robot/helper_acms_pipeline.py:230-231 checks only not payload.context_hash (truthy). Unlike the Behave step, it does not verify the length is 64 or that the string is valid hex. A corrupt hash could pass undetected in Robot tests.

T4 [LOW] -- Benchmark does not exercise `compute_context_hash` in isolation

benchmarks/acms_pipeline_bench.py benchmarks pipeline.assemble throughput, which includes hashing. After this commit added a .to_bytes(8, "big") call per fragment, the hash function now does more work. There is no isolated benchmark to detect regressions in the hash function itself (e.g., at 10,000+ fragments, matching the milestone's "Projects with 10,000+ files" criterion from the Milestone v3.4.0 description).

2. BUGS (3 findings)

B1 [HIGH] -- `provenance_map` keying deviates from spec

Spec (docs/specification.md:42914):

provenance_map={f.uko_node: f.provenance for f in ordered}

Implementation (src/cleveragents/domain/models/core/context_fragment.py:223-234):

return {frag.fragment_id: {...} for frag in fragments}

The spec keys by uko_node (UKO URI); the implementation keys by fragment_id (ULID). This is a semantic mismatch: consumers reading the spec would look up by UKO URI but would find ULIDs instead. Additionally, multiple fragments can share the same uko_node (at different depths), so the spec's keying could lose entries -- but the implementation should document this deviation explicitly if intentional.

B2 [MEDIUM] -- `strategies_used` and `provenance_map` are mutable inside a frozen model

ContextPayload is declared frozen=True, and the docstring calls these "frozen Pydantic v2 value objects." However, strategies_used: list[str] and provenance_map: dict[str, Any] are mutable containers. Pydantic's frozen=True only prevents attribute reassignment; it does not prevent:

payload.strategies_used.append("injected")       # succeeds
payload.provenance_map["fake"] = {"evil": True}   # succeeds

This violates the stated immutability contract and could corrupt snapshot integrity (the context_hash would no longer match the actual payload state). The same issue applies to ContextFragment.metadata: dict[str, str].

Fix: Use tuple[str, ...] instead of list[str] for strategies_used, and use MappingProxyType or Pydantic's frozen-dict mechanism for the dicts, or at minimum add a model_validator that freezes them.

B3 [LOW] -- Dead code in `budget_used` calculation

src/cleveragents/application/services/acms_service.py:438:

budget_used = total_tokens / available if available > 0 else 0.0

The else 0.0 branch is unreachable. ContextBudget validates reserved_tokens < max_tokens and max_tokens >= 1, guaranteeing available_tokens >= 1. The dead branch obscures the invariant; removing it (or adding an assertion) would make the code clearer.

3. SPEC ALIGNMENT (4 findings)

S1 [MEDIUM] -- `ContextStrategy` protocol method signatures differ from spec

Method	Spec (`docs/specification.md:25176-25189`)	Implementation (`acms_service.py:78-92`)
`can_handle`	`(request: ContextRequest, backends: BackendSet) -> float`	`(request: dict[str, Any]) -> float`
`assemble`	`(request: ContextRequest, backends: BackendSet, budget: int, plan_context: PlanContext) -> list[ContextFragment]`	`(fragments: Sequence[ContextFragment], budget: ContextBudget) -> Sequence[ContextFragment]`

The v1 simplification passes pre-fetched fragments and a ContextBudget object rather than raw request/backend parameters. This is reasonable for v1 but should be documented in docs/reference/acms.md as a known deviation with a path to spec conformance.

S2 [MEDIUM] -- `StrategyCapabilities` field names differ from spec

Spec (`docs/specification.md:25193-25204`)	Implementation (`acms_service.py:48-54`)
`uses_text`, `uses_vector`, `uses_graph`, `uses_temporal`, `uko_levels`, `resource_types`, `supports_depth_breadth`, `supports_plan_hierarchy`, `quality_score`	`supports_semantic_search`, `supports_graph_navigation`, `supports_temporal_archaeology`, `max_fragments`

Many spec-defined fields are missing; the implementation has its own field names. This prevents future interop with spec-compliant strategies without an adapter.

S3 [LOW] -- Pipeline component protocols have simplified signatures vs spec

Several Phase 2/3 protocol method signatures omit parameters present in the spec:

DetailDepthResolver.resolve() -- missing budget parameter (spec: docs/specification.md:42731-42738)
FragmentScorer.score() -- missing plan_context param, returns Sequence[ContextFragment] instead of list[ScoredFragment]
PreambleGenerator.generate() -- missing strategies_used, budget_used, max_tokens params (spec: docs/specification.md:42795-42803)

These are acceptable v1 simplifications but should be called out in docs/reference/acms.md under a "v1 limitations" section for the future upgrade path.

S4 [LOW] -- `ContextFragment` model adds fields beyond spec

The implementation adds fragment_id, tier, and created_at which are not in the spec's ContextFragment definition (docs/specification.md:25081-25089). These are useful additions but represent scope beyond the spec dataclass. Worth documenting.

4. PERFORMANCE (2 findings)

P1 [LOW] -- `_pack_budget` skips fragments that don't fit instead of trying smaller ones

src/cleveragents/application/services/acms_service.py:108-113: The early-termination guard at line 109 (if total >= available: break) causes the loop to exit once the budget is fully consumed, even if subsequent fragments with token_count=0 exist. Zero-token fragments (e.g., empty structural markers) would be incorrectly excluded. This is an unlikely but valid edge case.

P2 [LOW] -- Double iteration to compute total_tokens

src/cleveragents/application/services/acms_service.py:436 recomputes sum(f.token_count for f in final_fragments), but _pack_budget already computes the running total internally and discards it. For very large fragment lists, this is a wasted O(n) pass. Could be addressed by having _pack_budget return a (list, total) tuple.

5. SECURITY (1 finding)

SEC1 [LOW] -- `register_strategy` accepts unvalidated objects

src/cleveragents/application/services/acms_service.py:463-470: The method inserts strategy into self._strategies without validating it satisfies the ContextStrategy protocol. Since ContextStrategy is @runtime_checkable, an isinstance check would be trivial and would prevent late AttributeError explosions at assembly time:

if not isinstance(strategy, ContextStrategy):
    raise TypeError(f"Expected ContextStrategy, got {type(strategy).__name__}")

6. TEST FLAWS (1 finding)

TF1 [MEDIUM] -- BDD test for "non-existent strategy" catches wrong exception type

features/steps/acms_pipeline_steps.py:290-302 catches ValueError from assemble, stored in context.assemble_error. The assertion at line 404-407 checks context.assemble_error is not None. However, the When step catches ValueError but does NOT catch KeyError, AttributeError, or other exceptions that a malformed or missing strategy could raise. If the implementation were refactored to raise a different exception type, the test would pass with an uncaught exception (Behave would report it as a step failure, not an assertion failure), masking the real issue.

Summary Table

ID	Category	Severity	Description
T1	Test Coverage	HIGH	No test verifies the length-prefix collision fix (the commit's reason for existence)
T2	Test Coverage	MEDIUM	No multi-byte UTF-8 content in hash tests
T3	Test Coverage	MEDIUM	Robot hash assertion weaker than Behave
T4	Test Coverage	LOW	No isolated hash benchmark for performance regression
B1	Bug	HIGH	`provenance_map` keyed by `fragment_id` vs spec's `uko_node`
B2	Bug	MEDIUM	Mutable `list`/`dict` in frozen models violates immutability contract
B3	Bug	LOW	Dead `else 0.0` branch in budget_used calculation
S1	Spec Alignment	MEDIUM	`ContextStrategy` method signatures differ from spec
S2	Spec Alignment	MEDIUM	`StrategyCapabilities` field names differ from spec
S3	Spec Alignment	LOW	Pipeline component protocol signatures simplified vs spec
S4	Spec Alignment	LOW	Extra fields on `ContextFragment` not in spec
P1	Performance	LOW	Zero-token fragments skipped by early-termination guard
P2	Performance	LOW	Redundant O(n) pass to re-sum token counts
SEC1	Security	LOW	`register_strategy` accepts unvalidated objects
TF1	Test Flaw	MEDIUM	Exception type mismatch risk in strategy error test

High-priority items to address before merge: T1 (test the fix this commit introduces) and B1 (provenance_map keying contradiction with spec).

# Code Review Report -- Commit `8c6b4122` (branch `feature/m5-acms-context`) **Reviewed commit:** `8c6b4122` -- `fix(acms): length-prefix fragment content in context hash` **Author:** Hamza Khyari **Forgejo Issue:** #188 -- `feat(acms): add ACMS v1 context pipeline` **Files in branch diff (vs origin/master):** 10 files, +2,266 lines **Review scope:** test coverage, test flaws, bugs, spec alignment, performance, security --- ## 1. TEST COVERAGE GAPS (4 findings) ### T1 [HIGH] -- No dedicated test for `compute_context_hash` verifying the fix this commit introduces **Location:** `src/cleveragents/domain/models/core/context_fragment.py:208-220` This function is the *sole target* of commit `8c6b4122`, yet no test verifies the collision the commit claims to fix. The existing test (`features/steps/acms_pipeline_steps.py:459-464`) only asserts the hash is non-empty and 64 chars: ```python assert context.payload.context_hash assert len(context.payload.context_hash) == 64 ``` Missing test cases: - **Boundary-split collision**: `["AB", "C"]` must produce a different hash than `["A", "BC"]`. This is the exact vulnerability this commit fixes -- yet no test proves it. - **Determinism**: same fragments in same order must always produce the same hash. - **Order sensitivity**: `["A", "B"]` vs `["B", "A"]` must differ. - **Empty tuple**: `compute_context_hash(())` should return a stable hash. ### T2 [MEDIUM] -- No test with multi-byte UTF-8 content The fix uses `len(encoded)` (byte length) as the prefix, which correctly differs from character length for multi-byte sequences. However, no test exercises content like `"\u00e9"` (2 bytes), `"\u4e16"` (3 bytes), or `"\U0001f600"` (4 bytes) to confirm the byte-length prefix handles them correctly. ### T3 [MEDIUM] -- Robot helper hash assertion is weaker than Behave equivalent `robot/helper_acms_pipeline.py:230-231` checks only `not payload.context_hash` (truthy). Unlike the Behave step, it does not verify the length is 64 or that the string is valid hex. A corrupt hash could pass undetected in Robot tests. ### T4 [LOW] -- Benchmark does not exercise `compute_context_hash` in isolation `benchmarks/acms_pipeline_bench.py` benchmarks `pipeline.assemble` throughput, which includes hashing. After this commit added a `.to_bytes(8, "big")` call per fragment, the hash function now does more work. There is no isolated benchmark to detect regressions in the hash function itself (e.g., at 10,000+ fragments, matching the milestone's "Projects with 10,000+ files" criterion from the Milestone v3.4.0 description). --- ## 2. BUGS (3 findings) ### B1 [HIGH] -- `provenance_map` keying deviates from spec **Spec** (`docs/specification.md:42914`): ```python provenance_map={f.uko_node: f.provenance for f in ordered} ``` **Implementation** (`src/cleveragents/domain/models/core/context_fragment.py:223-234`): ```python return {frag.fragment_id: {...} for frag in fragments} ``` The spec keys by `uko_node` (UKO URI); the implementation keys by `fragment_id` (ULID). This is a semantic mismatch: consumers reading the spec would look up by UKO URI but would find ULIDs instead. Additionally, multiple fragments can share the same `uko_node` (at different depths), so the spec's keying could lose entries -- but the implementation should document this deviation explicitly if intentional. ### B2 [MEDIUM] -- `strategies_used` and `provenance_map` are mutable inside a frozen model `ContextPayload` is declared `frozen=True`, and the docstring calls these "frozen Pydantic v2 value objects." However, `strategies_used: list[str]` and `provenance_map: dict[str, Any]` are mutable containers. Pydantic's `frozen=True` only prevents attribute reassignment; it does **not** prevent: ```python payload.strategies_used.append("injected") # succeeds payload.provenance_map["fake"] = {"evil": True} # succeeds ``` This violates the stated immutability contract and could corrupt snapshot integrity (the `context_hash` would no longer match the actual payload state). The same issue applies to `ContextFragment.metadata: dict[str, str]`. **Fix:** Use `tuple[str, ...]` instead of `list[str]` for `strategies_used`, and use `MappingProxyType` or Pydantic's frozen-dict mechanism for the dicts, or at minimum add a `model_validator` that freezes them. ### B3 [LOW] -- Dead code in `budget_used` calculation `src/cleveragents/application/services/acms_service.py:438`: ```python budget_used = total_tokens / available if available > 0 else 0.0 ``` The `else 0.0` branch is unreachable. `ContextBudget` validates `reserved_tokens < max_tokens` and `max_tokens >= 1`, guaranteeing `available_tokens >= 1`. The dead branch obscures the invariant; removing it (or adding an assertion) would make the code clearer. --- ## 3. SPEC ALIGNMENT (4 findings) ### S1 [MEDIUM] -- `ContextStrategy` protocol method signatures differ from spec | Method | Spec (`docs/specification.md:25176-25189`) | Implementation (`acms_service.py:78-92`) | |--------|-----|-----| | `can_handle` | `(request: ContextRequest, backends: BackendSet) -> float` | `(request: dict[str, Any]) -> float` | | `assemble` | `(request: ContextRequest, backends: BackendSet, budget: int, plan_context: PlanContext) -> list[ContextFragment]` | `(fragments: Sequence[ContextFragment], budget: ContextBudget) -> Sequence[ContextFragment]` | The v1 simplification passes pre-fetched fragments and a `ContextBudget` object rather than raw request/backend parameters. This is reasonable for v1 but should be documented in `docs/reference/acms.md` as a known deviation with a path to spec conformance. ### S2 [MEDIUM] -- `StrategyCapabilities` field names differ from spec | Spec (`docs/specification.md:25193-25204`) | Implementation (`acms_service.py:48-54`) | |-----|-----| | `uses_text`, `uses_vector`, `uses_graph`, `uses_temporal`, `uko_levels`, `resource_types`, `supports_depth_breadth`, `supports_plan_hierarchy`, `quality_score` | `supports_semantic_search`, `supports_graph_navigation`, `supports_temporal_archaeology`, `max_fragments` | Many spec-defined fields are missing; the implementation has its own field names. This prevents future interop with spec-compliant strategies without an adapter. ### S3 [LOW] -- Pipeline component protocols have simplified signatures vs spec Several Phase 2/3 protocol method signatures omit parameters present in the spec: - `DetailDepthResolver.resolve()` -- missing `budget` parameter (spec: `docs/specification.md:42731-42738`) - `FragmentScorer.score()` -- missing `plan_context` param, returns `Sequence[ContextFragment]` instead of `list[ScoredFragment]` - `PreambleGenerator.generate()` -- missing `strategies_used`, `budget_used`, `max_tokens` params (spec: `docs/specification.md:42795-42803`) These are acceptable v1 simplifications but should be called out in `docs/reference/acms.md` under a "v1 limitations" section for the future upgrade path. ### S4 [LOW] -- `ContextFragment` model adds fields beyond spec The implementation adds `fragment_id`, `tier`, and `created_at` which are not in the spec's `ContextFragment` definition (`docs/specification.md:25081-25089`). These are useful additions but represent scope beyond the spec dataclass. Worth documenting. --- ## 4. PERFORMANCE (2 findings) ### P1 [LOW] -- `_pack_budget` skips fragments that don't fit instead of trying smaller ones `src/cleveragents/application/services/acms_service.py:108-113`: The early-termination guard at line 109 (`if total >= available: break`) causes the loop to exit once the budget is *fully* consumed, even if subsequent fragments with `token_count=0` exist. Zero-token fragments (e.g., empty structural markers) would be incorrectly excluded. This is an unlikely but valid edge case. ### P2 [LOW] -- Double iteration to compute total_tokens `src/cleveragents/application/services/acms_service.py:436` recomputes `sum(f.token_count for f in final_fragments)`, but `_pack_budget` already computes the running total internally and discards it. For very large fragment lists, this is a wasted O(n) pass. Could be addressed by having `_pack_budget` return a `(list, total)` tuple. --- ## 5. SECURITY (1 finding) ### SEC1 [LOW] -- `register_strategy` accepts unvalidated objects `src/cleveragents/application/services/acms_service.py:463-470`: The method inserts `strategy` into `self._strategies` without validating it satisfies the `ContextStrategy` protocol. Since `ContextStrategy` is `@runtime_checkable`, an `isinstance` check would be trivial and would prevent late `AttributeError` explosions at assembly time: ```python if not isinstance(strategy, ContextStrategy): raise TypeError(f"Expected ContextStrategy, got {type(strategy).__name__}") ``` --- ## 6. TEST FLAWS (1 finding) ### TF1 [MEDIUM] -- BDD test for "non-existent strategy" catches wrong exception type `features/steps/acms_pipeline_steps.py:290-302` catches `ValueError` from `assemble`, stored in `context.assemble_error`. The assertion at line 404-407 checks `context.assemble_error is not None`. However, the When step catches `ValueError` but does NOT catch `KeyError`, `AttributeError`, or other exceptions that a malformed or missing strategy could raise. If the implementation were refactored to raise a different exception type, the test would pass with an uncaught exception (Behave would report it as a step failure, not an assertion failure), masking the real issue. --- ## Summary Table | ID | Category | Severity | Description | |----|----------|----------|-------------| | **T1** | Test Coverage | **HIGH** | No test verifies the length-prefix collision fix (the commit's reason for existence) | | **T2** | Test Coverage | MEDIUM | No multi-byte UTF-8 content in hash tests | | **T3** | Test Coverage | MEDIUM | Robot hash assertion weaker than Behave | | **T4** | Test Coverage | LOW | No isolated hash benchmark for performance regression | | **B1** | Bug | **HIGH** | `provenance_map` keyed by `fragment_id` vs spec's `uko_node` | | **B2** | Bug | MEDIUM | Mutable `list`/`dict` in frozen models violates immutability contract | | **B3** | Bug | LOW | Dead `else 0.0` branch in budget_used calculation | | **S1** | Spec Alignment | MEDIUM | `ContextStrategy` method signatures differ from spec | | **S2** | Spec Alignment | MEDIUM | `StrategyCapabilities` field names differ from spec | | **S3** | Spec Alignment | LOW | Pipeline component protocol signatures simplified vs spec | | **S4** | Spec Alignment | LOW | Extra fields on `ContextFragment` not in spec | | **P1** | Performance | LOW | Zero-token fragments skipped by early-termination guard | | **P2** | Performance | LOW | Redundant O(n) pass to re-sum token counts | | **SEC1** | Security | LOW | `register_strategy` accepts unvalidated objects | | **TF1** | Test Flaw | MEDIUM | Exception type mismatch risk in strategy error test | **High-priority items to address before merge:** T1 (test the fix this commit introduces) and B1 (provenance_map keying contradiction with spec).

hamza.khyari referenced this issue from a commit

2026-03-04 20:44:44 +00:00

fix(acms): address review findings for ACMS v1 context pipeline

hamza.khyari removed the

labels 2026-03-04 20:50:07 +00:00

CoreRasurae commented

2026-03-04 21:02:38 +00:00

Code Review Report — `6f318e36`

Commit: 6f318e36 — fix(acms): address review findings for ACMS v1 context pipeline
Author: Hamza Khyari
Branch: feature/m5-acms-context
Issue: #188
Files changed: 10 files, +372 / -31 lines

Summary

The commit adds all 10 spec-defined pipeline component Protocols + Default implementations (Phase 1: StrategySelector, BudgetAllocator, StrategyExecutor; Phase 2: BudgetPacker; Phase 3: SkeletonCompressor), refactors assemble() to invoke them in order, switches strategies_used to immutable tuple, adds defensive copy validators, integrates structlog + DI params, adds structural Behave scenarios, and updates the reference docs. The overall structure is solid and well-organized, but there are several issues worth addressing.

B — Bugs & Logic Errors

B1 [HIGH] — `DefaultBudgetAllocator` duplicates total budget to every candidate

src/cleveragents/application/services/acms_service.py:397-405

def allocate(self, candidates, total_budget):
    return [(s, c, total_budget) for s, c in candidates]

The comment says "proportional share" but the code gives every candidate the full total_budget. With N candidates, the sum of allocated tokens = N * total_budget, violating the spec constraint (specification.md line 42689: "Sum of allocated_tokens must not exceed total_budget"). In v1 only one strategy is passed so this is latent, but it will break immediately when multi-strategy support is wired in. Either distribute proportionally or document that v1 intentionally allocates the full budget to a single candidate and add a guard/assertion.

B2 [MEDIUM] — `DefaultStrategyExecutor` ignores allocated token budget

src/cleveragents/application/services/acms_service.py:416-426

strategy, _confidence, _tokens = allocations[0]
return strategy.assemble(fragments, budget)

The _tokens variable (the per-strategy allocation from the BudgetAllocator) is discarded. The strategy receives the full ContextBudget object instead of the allocated subset. This means budget allocation is effectively a no-op: even if a custom BudgetAllocator properly divides the budget, the executor ignores it. The fix would be to construct a new ContextBudget from _tokens or pass it through.

B3 [MEDIUM] — `assemble()` passes only one strategy to the selector, defeating the 10-component architecture

src/cleveragents/application/services/acms_service.py:610-615

strat = self._strategies[strategy_name]
candidates = self._strategy_selector.select(
    [strat],
    {"strategy": strategy_name},
)

The StrategySelector always receives a single-element list. A custom StrategySelector that implements multi-strategy selection (as the spec envisions with ConfidenceWeightedSelector) would never see the other registered strategies. The pipeline should pass list(self._strategies.values()) or the entire strategies dict to the selector.

B4 [LOW] — `provenance_map` defensive copy is shallow only

src/cleveragents/domain/models/core/context_fragment.py:193-197

def _freeze_provenance_map(cls, v: dict[str, Any]) -> dict[str, Any]:
    return dict(v)

The provenance_map has type dict[str, Any] where values are mutable dicts (as produced by build_provenance_map). dict(v) is a shallow copy — the nested value dicts remain shared. A caller could still do payload.provenance_map["some_id"]["resource_uri"] = "evil" and mutate the internal state. For a frozen model, consider {k: dict(val) if isinstance(val, dict) else val for k, val in v.items()} or use copy.deepcopy.

T — Test Coverage & Test Flaws

T1 [HIGH] — Zero tests for custom pipeline component injection (DI)

The commit adds 10 injectable pipeline component constructor params (strategy_selector, budget_allocator, strategy_executor, packer, skeleton_compressor, etc.), but no Behave scenario, Robot test, or unit test verifies that a custom-injected component is actually invoked during assemble(). This is the main extensibility contract of the 10-component architecture and it is entirely untested.

T2 [HIGH] — No tests for `SkeletonCompressor`

The SkeletonCompressor protocol and DefaultSkeletonCompressor are added but never exercised by any test. There is no Behave scenario and no Robot test that calls compress() or verifies skeleton compression behavior. This leaves a gap in both unit and integration coverage.

T3 [MEDIUM] — No tests for multi-candidate allocation path

DefaultBudgetAllocator and DefaultStrategySelector are never tested with multiple strategies. A test passing 2+ strategies to the selector and allocator would immediately surface Bug B1 above.

T4 [MEDIUM] — `settings` and `unit_of_work` DI params are untested dead code

src/cleveragents/application/services/acms_service.py:558-559

self._settings = settings
self._unit_of_work = unit_of_work

These parameters are accepted in the constructor and stored but never read by any method. No test verifies their presence or behavior. If these are placeholders for future milestones, they should be documented as such. Dead code creates confusion about the service's actual dependency surface.

T5 [LOW] — Import inside function body in test step

features/steps/acms_pipeline_steps.py:496

def step_payload_assembled_at(context: Context) -> None:
    ...
    from datetime import UTC, timedelta  # import inside function

datetime is already imported at the module level (line 10). The from datetime import UTC, timedelta should be at the top of the file with the other imports for consistency and to avoid repeated import overhead.

T6 [LOW] — Robot tests do not cover Phase 1 pipeline components

The Robot helper (robot/helper_acms_pipeline.py) exercises only fragment creation, budget calculation, and the 3 built-in strategies. There are no Robot tests for the Phase 1 orchestration path (selector -> allocator -> executor) or any of the new default components.

SC — Spec Conformance Issues

SC1 [MEDIUM] — Multiple protocol signature divergences not documented in Known Limitations

The docs/reference/acms.md Known Limitations table (line 151-161) documents deviations for ContextStrategy, StrategyCapabilities, and some Phase 2/3 components, but omits the following significant divergences:

Component	v1 Signature	Spec Signature (`specification.md` ~42660-42820)
`StrategySelector.select()`	`(strategies, request: dict)`	`(strategies, request: ContextRequest, backends: BackendSet)`
`BudgetAllocator.allocate()`	`(candidates, total_budget)`	`(candidates, total_budget, request: ContextRequest)`
`StrategyExecutor.execute()`	`(allocations, fragments, budget)`	`(allocations, request, backends, plan_context)`
`BudgetPacker.pack()`	`(fragments: Sequence[ContextFragment], budget: ContextBudget)`	`(scored_fragments: list[ScoredFragment], budget: int, detail_level_maps)`
`SkeletonCompressor.compress()`	`(fragments: tuple, skeleton_budget: int) -> tuple`	`(parent_context: AssembledContext, child_focus: list[str], skeleton_budget: int) -> AssembledContext`

All of these should be added to the Known Limitations table for traceability. The StrategyExecutor divergence is particularly significant because it takes a completely different set of parameters.

SC2 [LOW] — Duplicated section comment in feature file

features/acms_pipeline.feature:324-326 and features/acms_pipeline.feature:339-341 both have the same section comment # Budget validation edge case, but the first section (lines 328-337) is actually the "Payload structural assertions" test for plan_id, payload_id, and assembled_at. The comment is misleading.

S — Security Issues

S1 [MEDIUM] — No input validation on `plan_id`

src/cleveragents/application/services/acms_service.py:587 and src/cleveragents/domain/models/core/context_fragment.py:160

# acms_service.py
def assemble(self, plan_id: str, ...):

# context_fragment.py
plan_id: str

The plan_id field on ContextPayload and the plan_id parameter on assemble() accept any arbitrary string with no validation — no min_length, no max_length, no format constraints. A caller could pass an empty string, a multi-megabyte string, or a string with control characters. Since plan_id appears in structlog output (acms_service.py:604) and potentially in serialized payloads, this is a log-injection/DoS vector. At minimum, add min_length=1 and a reasonable max_length constraint on ContextPayload.plan_id.

S2 [LOW] — Fragment `content` up to 1MB allowed per fragment with no fragment count limit

src/cleveragents/domain/models/core/context_fragment.py:23, 69-72

MAX_CONTENT_LENGTH = 1_000_000 per fragment. With no limit on the number of fragments passed to assemble(), a caller could pass thousands of 1MB fragments. While _pack_budget limits what gets into the final payload, the hash computation (compute_context_hash) and sorting still operate on the full input list. This is more of a resource consumption concern than a security bug, but worth noting for hardening.

P — Performance Issues

P1 [LOW] — `DefaultStrategySelector.select()` calls `can_handle()` on every strategy per assembly

src/cleveragents/application/services/acms_service.py:382-387

In v1 only one strategy is passed (due to B3), so this is O(1). But if the pipeline is fixed to pass all registered strategies, can_handle() will be called on every strategy even for simple requests. Consider caching results or short-circuiting when a single strategy is specified.

P2 [LOW] — No pipeline component result caching

The pipeline runs all 10 components sequentially on every assemble() call even when fragments and budget haven't changed. For repeated assemblies with the same input (common during iterative planning), a memoization layer on compute_context_hash or the full pipeline output would eliminate redundant work.

Overall Assessment

The commit is structurally sound and delivers the core 10-component architecture as specified by issue #188. The code is well-documented, the naming is consistent, and the Behave/Robot/ASV test layers all received updates. However, there are 3 bugs that will surface as soon as multi-strategy support is added (B1-B3), a complete absence of DI tests for the pipeline components (T1), no test coverage for SkeletonCompressor (T2), 5 undocumented spec divergences (SC1), and a missing validation on plan_id that could lead to log injection (S1). These should be resolved before merging.

# Code Review Report — `6f318e36` **Commit:** `6f318e36` — `fix(acms): address review findings for ACMS v1 context pipeline` **Author:** Hamza Khyari **Branch:** `feature/m5-acms-context` **Issue:** #188 **Files changed:** 10 files, +372 / -31 lines --- ## Summary The commit adds all 10 spec-defined pipeline component Protocols + Default implementations (Phase 1: StrategySelector, BudgetAllocator, StrategyExecutor; Phase 2: BudgetPacker; Phase 3: SkeletonCompressor), refactors `assemble()` to invoke them in order, switches `strategies_used` to immutable tuple, adds defensive copy validators, integrates structlog + DI params, adds structural Behave scenarios, and updates the reference docs. The overall structure is solid and well-organized, but there are several issues worth addressing. --- ## B — Bugs & Logic Errors ### B1 [HIGH] — `DefaultBudgetAllocator` duplicates total budget to every candidate `src/cleveragents/application/services/acms_service.py:397-405` ```python def allocate(self, candidates, total_budget): return [(s, c, total_budget) for s, c in candidates] ``` The comment says "proportional share" but the code gives **every** candidate the **full** `total_budget`. With N candidates, the sum of allocated tokens = N * total_budget, violating the spec constraint (`specification.md` line 42689: "Sum of allocated_tokens must not exceed total_budget"). In v1 only one strategy is passed so this is latent, but it will break immediately when multi-strategy support is wired in. Either distribute proportionally or document that v1 intentionally allocates the full budget to a single candidate and add a guard/assertion. ### B2 [MEDIUM] — `DefaultStrategyExecutor` ignores allocated token budget `src/cleveragents/application/services/acms_service.py:416-426` ```python strategy, _confidence, _tokens = allocations[0] return strategy.assemble(fragments, budget) ``` The `_tokens` variable (the per-strategy allocation from the BudgetAllocator) is discarded. The strategy receives the full `ContextBudget` object instead of the allocated subset. This means budget allocation is effectively a no-op: even if a custom BudgetAllocator properly divides the budget, the executor ignores it. The fix would be to construct a new `ContextBudget` from `_tokens` or pass it through. ### B3 [MEDIUM] — `assemble()` passes only one strategy to the selector, defeating the 10-component architecture `src/cleveragents/application/services/acms_service.py:610-615` ```python strat = self._strategies[strategy_name] candidates = self._strategy_selector.select( [strat], {"strategy": strategy_name}, ) ``` The StrategySelector always receives a single-element list. A custom StrategySelector that implements multi-strategy selection (as the spec envisions with `ConfidenceWeightedSelector`) would never see the other registered strategies. The pipeline should pass `list(self._strategies.values())` or the entire strategies dict to the selector. ### B4 [LOW] — `provenance_map` defensive copy is shallow only `src/cleveragents/domain/models/core/context_fragment.py:193-197` ```python def _freeze_provenance_map(cls, v: dict[str, Any]) -> dict[str, Any]: return dict(v) ``` The `provenance_map` has type `dict[str, Any]` where values are mutable dicts (as produced by `build_provenance_map`). `dict(v)` is a shallow copy — the nested value dicts remain shared. A caller could still do `payload.provenance_map["some_id"]["resource_uri"] = "evil"` and mutate the internal state. For a frozen model, consider `{k: dict(val) if isinstance(val, dict) else val for k, val in v.items()}` or use `copy.deepcopy`. --- ## T — Test Coverage & Test Flaws ### T1 [HIGH] — Zero tests for custom pipeline component injection (DI) The commit adds 10 injectable pipeline component constructor params (`strategy_selector`, `budget_allocator`, `strategy_executor`, `packer`, `skeleton_compressor`, etc.), but no Behave scenario, Robot test, or unit test verifies that a custom-injected component is actually invoked during `assemble()`. This is the main extensibility contract of the 10-component architecture and it is entirely untested. ### T2 [HIGH] — No tests for `SkeletonCompressor` The `SkeletonCompressor` protocol and `DefaultSkeletonCompressor` are added but never exercised by any test. There is no Behave scenario and no Robot test that calls `compress()` or verifies skeleton compression behavior. This leaves a gap in both unit and integration coverage. ### T3 [MEDIUM] — No tests for multi-candidate allocation path `DefaultBudgetAllocator` and `DefaultStrategySelector` are never tested with multiple strategies. A test passing 2+ strategies to the selector and allocator would immediately surface Bug B1 above. ### T4 [MEDIUM] — `settings` and `unit_of_work` DI params are untested dead code `src/cleveragents/application/services/acms_service.py:558-559` ```python self._settings = settings self._unit_of_work = unit_of_work ``` These parameters are accepted in the constructor and stored but never read by any method. No test verifies their presence or behavior. If these are placeholders for future milestones, they should be documented as such. Dead code creates confusion about the service's actual dependency surface. ### T5 [LOW] — Import inside function body in test step `features/steps/acms_pipeline_steps.py:496` ```python def step_payload_assembled_at(context: Context) -> None: ... from datetime import UTC, timedelta # import inside function ``` `datetime` is already imported at the module level (line 10). The `from datetime import UTC, timedelta` should be at the top of the file with the other imports for consistency and to avoid repeated import overhead. ### T6 [LOW] — Robot tests do not cover Phase 1 pipeline components The Robot helper (`robot/helper_acms_pipeline.py`) exercises only fragment creation, budget calculation, and the 3 built-in strategies. There are no Robot tests for the Phase 1 orchestration path (selector -> allocator -> executor) or any of the new default components. --- ## SC — Spec Conformance Issues ### SC1 [MEDIUM] — Multiple protocol signature divergences not documented in Known Limitations The `docs/reference/acms.md` Known Limitations table (line 151-161) documents deviations for `ContextStrategy`, `StrategyCapabilities`, and some Phase 2/3 components, but **omits** the following significant divergences: | Component | v1 Signature | Spec Signature (`specification.md` ~42660-42820) | |---|---|---| | `StrategySelector.select()` | `(strategies, request: dict)` | `(strategies, request: ContextRequest, backends: BackendSet)` | | `BudgetAllocator.allocate()` | `(candidates, total_budget)` | `(candidates, total_budget, request: ContextRequest)` | | `StrategyExecutor.execute()` | `(allocations, fragments, budget)` | `(allocations, request, backends, plan_context)` | | `BudgetPacker.pack()` | `(fragments: Sequence[ContextFragment], budget: ContextBudget)` | `(scored_fragments: list[ScoredFragment], budget: int, detail_level_maps)` | | `SkeletonCompressor.compress()` | `(fragments: tuple, skeleton_budget: int) -> tuple` | `(parent_context: AssembledContext, child_focus: list[str], skeleton_budget: int) -> AssembledContext` | All of these should be added to the Known Limitations table for traceability. The `StrategyExecutor` divergence is particularly significant because it takes a completely different set of parameters. ### SC2 [LOW] — Duplicated section comment in feature file `features/acms_pipeline.feature:324-326` and `features/acms_pipeline.feature:339-341` both have the same section comment `# Budget validation edge case`, but the first section (lines 328-337) is actually the "Payload structural assertions" test for `plan_id`, `payload_id`, and `assembled_at`. The comment is misleading. --- ## S — Security Issues ### S1 [MEDIUM] — No input validation on `plan_id` `src/cleveragents/application/services/acms_service.py:587` and `src/cleveragents/domain/models/core/context_fragment.py:160` ```python # acms_service.py def assemble(self, plan_id: str, ...): # context_fragment.py plan_id: str ``` The `plan_id` field on `ContextPayload` and the `plan_id` parameter on `assemble()` accept any arbitrary string with no validation — no `min_length`, no `max_length`, no format constraints. A caller could pass an empty string, a multi-megabyte string, or a string with control characters. Since `plan_id` appears in structlog output (`acms_service.py:604`) and potentially in serialized payloads, this is a log-injection/DoS vector. At minimum, add `min_length=1` and a reasonable `max_length` constraint on `ContextPayload.plan_id`. ### S2 [LOW] — Fragment `content` up to 1MB allowed per fragment with no fragment count limit `src/cleveragents/domain/models/core/context_fragment.py:23, 69-72` `MAX_CONTENT_LENGTH = 1_000_000` per fragment. With no limit on the number of fragments passed to `assemble()`, a caller could pass thousands of 1MB fragments. While `_pack_budget` limits what gets into the final payload, the hash computation (`compute_context_hash`) and sorting still operate on the full input list. This is more of a resource consumption concern than a security bug, but worth noting for hardening. --- ## P — Performance Issues ### P1 [LOW] — `DefaultStrategySelector.select()` calls `can_handle()` on every strategy per assembly `src/cleveragents/application/services/acms_service.py:382-387` In v1 only one strategy is passed (due to B3), so this is O(1). But if the pipeline is fixed to pass all registered strategies, `can_handle()` will be called on every strategy even for simple requests. Consider caching results or short-circuiting when a single strategy is specified. ### P2 [LOW] — No pipeline component result caching The pipeline runs all 10 components sequentially on every `assemble()` call even when fragments and budget haven't changed. For repeated assemblies with the same input (common during iterative planning), a memoization layer on `compute_context_hash` or the full pipeline output would eliminate redundant work. --- ## Overall Assessment The commit is structurally sound and delivers the core 10-component architecture as specified by issue #188. The code is well-documented, the naming is consistent, and the Behave/Robot/ASV test layers all received updates. However, there are **3 bugs that will surface as soon as multi-strategy support is added** (B1-B3), a **complete absence of DI tests for the pipeline components** (T1), **no test coverage for SkeletonCompressor** (T2), **5 undocumented spec divergences** (SC1), and a **missing validation on `plan_id`** that could lead to log injection (S1). These should be resolved before merging.

hamza.khyari requested review from CoreRasurae 2026-03-04 21:39:10 +00:00

hamza.khyari force-pushed feature/m5-acms-context from e386c4c8ce to 0836502ece

2026-03-04 22:04:19 +00:00

Compare

hamza.khyari referenced this issue from a commit

2026-03-04 22:04:19 +00:00

fix(acms): address review findings for ACMS v1 context pipeline

CoreRasurae commented

2026-03-04 22:36:58 +00:00

Code Review Report — Commit `0836502e`

Commit: 0836502e — fix(acms): address second review — pipeline bugs, DI tests, spec docs
Author: khyari hamza
Branch: feature/m5-acms-context
Issue: #188 — feat(acms): add ACMS v1 context pipeline
Spec reference: docs/specification.md (ACMS pipeline ~lines 42613–42945)

Files changed (6): acms_service.py, context_fragment.py, core/__init__.py, acms_pipeline.feature, acms_pipeline_steps.py, docs/reference/acms.md

BUGS (5 findings)

BUG-1 (Medium): `DefaultBudgetAllocator` leaks tokens due to `int()` truncation

src/cleveragents/application/services/acms_service.py:417

return [(s, c, int(total_budget * c / total_confidence)) for s, c in candidates]

Using int() truncates toward zero. With certain inputs, the sum of allocated tokens is strictly less than total_budget. Example: budget=999, confidences=(0.33, 0.33, 0.34) yields allocations (329, 329, 339) = 997, wasting 2 tokens. While this does not violate the spec constraint ("sum must not exceed total_budget"), it causes silent under-utilization. The spec's ProportionalBudgetAllocator is expected to distribute the full budget. A largest-remainder allocation method would fix this.

The zero-confidence fallback at line 415 (total_budget // len(candidates)) has the same issue — integer division can lose up to N-1 tokens.

BUG-2 (Medium): `DefaultStrategyExecutor` masks zero-allocation with `max(allocated_tokens, 1)`

src/cleveragents/application/services/acms_service.py:440

scoped_budget = ContextBudget(
    max_tokens=max(allocated_tokens, 1),
    reserved_tokens=0,
)

If the allocator assigns 0 tokens (possible from BUG-1 rounding), the strategy still receives 1 token and is invoked. This can produce unexpected micro-fragments from a strategy that should have been excluded. A more correct approach would be to skip strategies with 0 allocation entirely.

BUG-3 (Low): `provenance_map` defensive copy is only one level deep

src/cleveragents/domain/models/core/context_fragment.py:197

return {k: dict(val) if isinstance(val, dict) else val for k, val in v.items()}

The docstring claims "Deep defensive copy" but only dict values are copied one level deeper. Other mutable types (list, set, nested dicts-within-dicts) are passed by reference. Since the field is typed dict[str, Any], a caller could pass {"key": [mutable_list]} and later mutate the model's internal state despite the frozen flag. Either use copy.deepcopy(v) or restrict the type to dict[str, dict[str, str]].

BUG-4 (Low): `assemble()` discards custom selector's multi-strategy results

src/cleveragents/application/services/acms_service.py:646-648

candidates = [(s, c) for s, c in candidates if s is resolved] or [
    (resolved, 1.0)
]

The method passes all registered strategies to the selector (line 636–641), then immediately filters the result back to a single strategy using identity comparison (is). This creates a misleading API contract: a custom StrategySelector that implements multi-strategy fusion will have its output silently discarded. The is comparison (vs ==) adds fragility — if a custom selector wraps or copies strategy objects, filtering always falls back to (resolved, 1.0) regardless of the selector's confidence computation. The selector should either receive only the single strategy (like before) or the filtering should be clearly documented in the Protocol docstring.

BUG-5 (Low): `DefaultStrategyExecutor.execute()` ignores its `budget` parameter

src/cleveragents/application/services/acms_service.py:429-444

The budget: ContextBudget parameter is accepted but never referenced in the method body. Each strategy gets a scoped budget from its allocation tokens, but there is no aggregate enforcement. If multiple strategies each produce fragments up to their allocation, the combined output could exceed the original total budget. The downstream BudgetPacker is expected to enforce this, but since the packer is a no-op in v1 (DefaultBudgetPacker returns fragments unchanged at line 510), there is currently no aggregate budget enforcement at all after the executor. Fragments can exceed the budget until a real packer is wired in.

TEST COVERAGE GAPS (7 findings)

TEST-1 (High): No test for `plan_id` validation boundaries

The S1 security fix adds min_length=1, max_length=256 to ContextPayload.plan_id, but no BDD scenario tests the boundary conditions: empty string rejection, 1-char acceptance, 256-char acceptance, or 257-char rejection. This is a validation change with zero test coverage.

TEST-2 (High): No test for `provenance_map` defensive copy

The B4 fix deep-copies nested dicts, but no test verifies that mutating a nested dict value after ContextPayload creation does not affect the payload's internal state. Without this, a regression could silently reintroduce the shared-reference issue.

TEST-3 (Medium): No test for allocator rounding edge cases

The T3 scenarios test exact 0.8/0.2 splits (which divide cleanly into 800/200). No test exercises odd budgets (e.g., budget=999 with 3 candidates), the zero-confidence equal-split path, or very small budgets where rounding dominates.

TEST-4 (Medium): No test for `assemble()` strategy filtering / fallback behavior

The commit changes assemble() to pass all strategies to the selector then filter to only the requested one. No scenario tests:

What happens when the selector returns candidates that don't include the requested strategy (the (resolved, 1.0) fallback).
That the selector actually receives all registered strategies (not just the requested one).

TEST-5 (Medium): DI selector scenario name/behavior mismatch

features/acms_pipeline.feature:349 — The scenario is named "Custom strategy selector that picks only recency" but the _TrackingSelector at features/steps/acms_pipeline_steps.py:518-531 simply delegates to DefaultStrategySelector — it does not pick "only recency". The scenario name is misleading and the test doesn't verify selective behavior, only that the DI hook fires.

TEST-6 (Low): SkeletonCompressor tests only exercise test doubles

The T2 scenarios test DefaultSkeletonCompressor (which is a trivial no-op pass-through) and an inline _TruncatingCompressor (a test double that isn't production code). Neither test exercises any real compression logic. While understandable for v1 stubs, this provides a false sense of coverage.

TEST-7 (Low): `strategies_used` assertion is vacuous

features/acms_pipeline.feature:356 asserts the payload strategies used should include "relevance", but assemble() at acms_service.py:687 hardcodes strategies_used=(strategy_name,) from the caller's input — not from the selector's output. This assertion will always pass regardless of what the custom selector does, making it a tautological test.

SECURITY (2 findings)

SEC-1 (Medium): `plan_id` lacks character/pattern validation

src/cleveragents/domain/models/core/context_fragment.py:160

plan_id: str = Field(..., min_length=1, max_length=256)

While length is validated, no pattern restriction prevents whitespace-only strings (e.g. " "), control characters, null bytes, or path-traversal sequences. If plan_id is later used in file paths, database queries, or log messages, this opens injection vectors. Add a pattern constraint (e.g., r"^[\w.:\-/]+$" or similar) or a field validator that strips/rejects non-printable characters.

SEC-2 (Low): `strategy_name` logged without sanitization

src/cleveragents/application/services/acms_service.py:624-630 and lines 672–679

The strategy_name and plan_id are passed directly to structlog via self._logger.info(...). While structlog generally handles structured output safely, if downstream log sinks render as plaintext, crafted plan_id or strategy_name values containing newlines or ANSI escape sequences could cause log injection or log spoofing. This is low severity given structured logging but worth noting.

PERFORMANCE (1 finding)

PERF-1 (Low): `DefaultStrategySelector` sorts the full strategy registry on every call, then result is discarded

src/cleveragents/application/services/acms_service.py:388-393 / line 636

select() evaluates can_handle() and sorts all registered strategies, but assemble() immediately filters the result to a single strategy at line 646. This means N can_handle() calls and an N-log-N sort are performed on every assembly, only to keep one result. With 3 built-in strategies this is negligible, but the code path will become wasteful as more strategies are registered. Consider either passing only the resolved strategy to the selector (as before), or caching can_handle results.

SPEC COMPLIANCE NOTES (2 findings)

SPEC-1 (Info): Aggregate budget enforcement gap in v1 pipeline

The spec defines the 10-component pipeline with BudgetPacker (spec ~line 42937: GreedyKnapsackPacker) as the component ensuring budget compliance. In v1, the DefaultBudgetPacker is a no-op (returns fragments unchanged, acms_service.py:505-510), and DefaultStrategyExecutor creates scoped budgets per-strategy but ignores the aggregate budget. This means the v1 pipeline has no hard budget enforcement after strategy execution. Fragments produced by strategies can exceed the original budget, and the no-op packer will pass them through. This is partially mitigated because each strategy calls _pack_budget() internally, but a custom strategy without that behavior would break the budget guarantee.

SPEC-2 (Info): `init.py` `# fmt: skip` suppresses formatter on checkpoint import

src/cleveragents/domain/models/core/__init__.py:43 adds # fmt: skip to the checkpoint import block. This is a minor style concern — suppressing the formatter can mask import ordering issues. The reason for this appears to be preventing the formatter from inserting blank lines between the checkpoint and context import blocks after the duplicate context_fragment import removal.

Summary

Category	ID	Severity	File	Line
Bug	BUG-1	Medium	`acms_service.py`	417
Bug	BUG-2	Medium	`acms_service.py`	440
Bug	BUG-3	Low	`context_fragment.py`	197
Bug	BUG-4	Low	`acms_service.py`	646
Bug	BUG-5	Low	`acms_service.py`	429
Test Gap	TEST-1	High	`context_fragment.py`	160
Test Gap	TEST-2	High	`context_fragment.py`	195
Test Gap	TEST-3	Medium	`acms_pipeline_steps.py`	731
Test Gap	TEST-4	Medium	`acms_service.py`	646
Test Gap	TEST-5	Medium	`acms_pipeline.feature`	349
Test Gap	TEST-6	Low	`acms_pipeline_steps.py`	662
Test Gap	TEST-7	Low	`acms_pipeline.feature`	356
Security	SEC-1	Medium	`context_fragment.py`	160
Security	SEC-2	Low	`acms_service.py`	624
Performance	PERF-1	Low	`acms_service.py`	388
Spec	SPEC-1	Info	`acms_service.py`	505
Spec	SPEC-2	Info	`__init__.py`	43

Recommended priority: Address BUG-1, BUG-2, SEC-1, TEST-1, and TEST-2 before merge. The remaining items are lower priority but worth tracking.

# Code Review Report — Commit `0836502e` **Commit**: `0836502e` — `fix(acms): address second review — pipeline bugs, DI tests, spec docs` **Author**: khyari hamza **Branch**: `feature/m5-acms-context` **Issue**: #188 — feat(acms): add ACMS v1 context pipeline **Spec reference**: `docs/specification.md` (ACMS pipeline ~lines 42613–42945) **Files changed (6)**: `acms_service.py`, `context_fragment.py`, `core/__init__.py`, `acms_pipeline.feature`, `acms_pipeline_steps.py`, `docs/reference/acms.md` --- ## BUGS (5 findings) ### BUG-1 (Medium): `DefaultBudgetAllocator` leaks tokens due to `int()` truncation `src/cleveragents/application/services/acms_service.py:417` ```python return [(s, c, int(total_budget * c / total_confidence)) for s, c in candidates] ``` Using `int()` truncates toward zero. With certain inputs, the sum of allocated tokens is strictly less than `total_budget`. Example: budget=999, confidences=(0.33, 0.33, 0.34) yields allocations (329, 329, 339) = 997, wasting 2 tokens. While this does not violate the spec constraint ("sum must not exceed `total_budget`"), it causes silent under-utilization. The spec's `ProportionalBudgetAllocator` is expected to distribute the full budget. A largest-remainder allocation method would fix this. The zero-confidence fallback at line 415 (`total_budget // len(candidates)`) has the same issue — integer division can lose up to `N-1` tokens. --- ### BUG-2 (Medium): `DefaultStrategyExecutor` masks zero-allocation with `max(allocated_tokens, 1)` `src/cleveragents/application/services/acms_service.py:440` ```python scoped_budget = ContextBudget( max_tokens=max(allocated_tokens, 1), reserved_tokens=0, ) ``` If the allocator assigns 0 tokens (possible from BUG-1 rounding), the strategy still receives 1 token and is invoked. This can produce unexpected micro-fragments from a strategy that should have been excluded. A more correct approach would be to skip strategies with 0 allocation entirely. --- ### BUG-3 (Low): `provenance_map` defensive copy is only one level deep `src/cleveragents/domain/models/core/context_fragment.py:197` ```python return {k: dict(val) if isinstance(val, dict) else val for k, val in v.items()} ``` The docstring claims "Deep defensive copy" but only dict values are copied one level deeper. Other mutable types (`list`, `set`, nested dicts-within-dicts) are passed by reference. Since the field is typed `dict[str, Any]`, a caller could pass `{"key": [mutable_list]}` and later mutate the model's internal state despite the frozen flag. Either use `copy.deepcopy(v)` or restrict the type to `dict[str, dict[str, str]]`. --- ### BUG-4 (Low): `assemble()` discards custom selector's multi-strategy results `src/cleveragents/application/services/acms_service.py:646-648` ```python candidates = [(s, c) for s, c in candidates if s is resolved] or [ (resolved, 1.0) ] ``` The method passes **all registered strategies** to the selector (line 636–641), then immediately filters the result back to a single strategy using identity comparison (`is`). This creates a misleading API contract: a custom `StrategySelector` that implements multi-strategy fusion will have its output silently discarded. The `is` comparison (vs `==`) adds fragility — if a custom selector wraps or copies strategy objects, filtering always falls back to `(resolved, 1.0)` regardless of the selector's confidence computation. The selector should either receive only the single strategy (like before) or the filtering should be clearly documented in the Protocol docstring. --- ### BUG-5 (Low): `DefaultStrategyExecutor.execute()` ignores its `budget` parameter `src/cleveragents/application/services/acms_service.py:429-444` The `budget: ContextBudget` parameter is accepted but never referenced in the method body. Each strategy gets a scoped budget from its allocation tokens, but there is no aggregate enforcement. If multiple strategies each produce fragments up to their allocation, the combined output could exceed the original total budget. The downstream `BudgetPacker` is expected to enforce this, but since the packer is a no-op in v1 (`DefaultBudgetPacker` returns fragments unchanged at line 510), **there is currently no aggregate budget enforcement at all after the executor**. Fragments can exceed the budget until a real packer is wired in. --- ## TEST COVERAGE GAPS (7 findings) ### TEST-1 (High): No test for `plan_id` validation boundaries The S1 security fix adds `min_length=1, max_length=256` to `ContextPayload.plan_id`, but no BDD scenario tests the boundary conditions: empty string rejection, 1-char acceptance, 256-char acceptance, or 257-char rejection. This is a validation change with zero test coverage. --- ### TEST-2 (High): No test for `provenance_map` defensive copy The B4 fix deep-copies nested dicts, but no test verifies that mutating a nested dict value after `ContextPayload` creation does not affect the payload's internal state. Without this, a regression could silently reintroduce the shared-reference issue. --- ### TEST-3 (Medium): No test for allocator rounding edge cases The T3 scenarios test exact 0.8/0.2 splits (which divide cleanly into 800/200). No test exercises odd budgets (e.g., budget=999 with 3 candidates), the zero-confidence equal-split path, or very small budgets where rounding dominates. --- ### TEST-4 (Medium): No test for `assemble()` strategy filtering / fallback behavior The commit changes `assemble()` to pass all strategies to the selector then filter to only the requested one. No scenario tests: - What happens when the selector returns candidates that don't include the requested strategy (the `(resolved, 1.0)` fallback). - That the selector actually receives all registered strategies (not just the requested one). --- ### TEST-5 (Medium): DI selector scenario name/behavior mismatch `features/acms_pipeline.feature:349` — The scenario is named "Custom strategy selector that picks only recency" but the `_TrackingSelector` at `features/steps/acms_pipeline_steps.py:518-531` simply delegates to `DefaultStrategySelector` — it does not pick "only recency". The scenario name is misleading and the test doesn't verify selective behavior, only that the DI hook fires. --- ### TEST-6 (Low): SkeletonCompressor tests only exercise test doubles The T2 scenarios test `DefaultSkeletonCompressor` (which is a trivial no-op pass-through) and an inline `_TruncatingCompressor` (a test double that isn't production code). Neither test exercises any real compression logic. While understandable for v1 stubs, this provides a false sense of coverage. --- ### TEST-7 (Low): `strategies_used` assertion is vacuous `features/acms_pipeline.feature:356` asserts `the payload strategies used should include "relevance"`, but `assemble()` at `acms_service.py:687` hardcodes `strategies_used=(strategy_name,)` from the caller's input — not from the selector's output. This assertion will always pass regardless of what the custom selector does, making it a tautological test. --- ## SECURITY (2 findings) ### SEC-1 (Medium): `plan_id` lacks character/pattern validation `src/cleveragents/domain/models/core/context_fragment.py:160` ```python plan_id: str = Field(..., min_length=1, max_length=256) ``` While length is validated, no pattern restriction prevents whitespace-only strings (e.g. `" "`), control characters, null bytes, or path-traversal sequences. If `plan_id` is later used in file paths, database queries, or log messages, this opens injection vectors. Add a `pattern` constraint (e.g., `r"^[\w.:\-/]+$"` or similar) or a field validator that strips/rejects non-printable characters. --- ### SEC-2 (Low): `strategy_name` logged without sanitization `src/cleveragents/application/services/acms_service.py:624-630` and lines 672–679 The `strategy_name` and `plan_id` are passed directly to `structlog` via `self._logger.info(...)`. While structlog generally handles structured output safely, if downstream log sinks render as plaintext, crafted `plan_id` or `strategy_name` values containing newlines or ANSI escape sequences could cause log injection or log spoofing. This is low severity given structured logging but worth noting. --- ## PERFORMANCE (1 finding) ### PERF-1 (Low): `DefaultStrategySelector` sorts the full strategy registry on every call, then result is discarded `src/cleveragents/application/services/acms_service.py:388-393` / line 636 `select()` evaluates `can_handle()` and sorts **all** registered strategies, but `assemble()` immediately filters the result to a single strategy at line 646. This means N `can_handle()` calls and an N-log-N sort are performed on every assembly, only to keep one result. With 3 built-in strategies this is negligible, but the code path will become wasteful as more strategies are registered. Consider either passing only the resolved strategy to the selector (as before), or caching `can_handle` results. --- ## SPEC COMPLIANCE NOTES (2 findings) ### SPEC-1 (Info): Aggregate budget enforcement gap in v1 pipeline The spec defines the 10-component pipeline with `BudgetPacker` (spec ~line 42937: `GreedyKnapsackPacker`) as the component ensuring budget compliance. In v1, the `DefaultBudgetPacker` is a no-op (returns fragments unchanged, `acms_service.py:505-510`), and `DefaultStrategyExecutor` creates scoped budgets per-strategy but ignores the aggregate budget. This means the v1 pipeline has **no hard budget enforcement** after strategy execution. Fragments produced by strategies can exceed the original budget, and the no-op packer will pass them through. This is partially mitigated because each strategy calls `_pack_budget()` internally, but a custom strategy without that behavior would break the budget guarantee. --- ### SPEC-2 (Info): `__init__.py` `# fmt: skip` suppresses formatter on checkpoint import `src/cleveragents/domain/models/core/__init__.py:43` adds `# fmt: skip` to the checkpoint import block. This is a minor style concern — suppressing the formatter can mask import ordering issues. The reason for this appears to be preventing the formatter from inserting blank lines between the checkpoint and context import blocks after the duplicate `context_fragment` import removal. --- ## Summary | Category | ID | Severity | File | Line | |---|---|---|---|---| | Bug | BUG-1 | Medium | `acms_service.py` | 417 | | Bug | BUG-2 | Medium | `acms_service.py` | 440 | | Bug | BUG-3 | Low | `context_fragment.py` | 197 | | Bug | BUG-4 | Low | `acms_service.py` | 646 | | Bug | BUG-5 | Low | `acms_service.py` | 429 | | Test Gap | TEST-1 | High | `context_fragment.py` | 160 | | Test Gap | TEST-2 | High | `context_fragment.py` | 195 | | Test Gap | TEST-3 | Medium | `acms_pipeline_steps.py` | 731 | | Test Gap | TEST-4 | Medium | `acms_service.py` | 646 | | Test Gap | TEST-5 | Medium | `acms_pipeline.feature` | 349 | | Test Gap | TEST-6 | Low | `acms_pipeline_steps.py` | 662 | | Test Gap | TEST-7 | Low | `acms_pipeline.feature` | 356 | | Security | SEC-1 | Medium | `context_fragment.py` | 160 | | Security | SEC-2 | Low | `acms_service.py` | 624 | | Performance | PERF-1 | Low | `acms_service.py` | 388 | | Spec | SPEC-1 | Info | `acms_service.py` | 505 | | Spec | SPEC-2 | Info | `__init__.py` | 43 | **Recommended priority**: Address BUG-1, BUG-2, SEC-1, TEST-1, and TEST-2 before merge. The remaining items are lower priority but worth tracking.

CoreRasurae reviewed 2026-03-04 23:42:59 +00:00

CoreRasurae left a comment

Code Review — Commit `b5bb147a` (third review round)

Scope: Bug fixes (allocator rounding, executor skip-zero, deepcopy provenance, name-based comparison), security (plan_id validation), and test additions.
Test status: 50/50 BDD scenarios passing (228 steps, 0 failures).

Summary of Findings

ID	Severity	Category	Description
SEC-1	HIGH	Security	`plan_id` regex allows path traversal sequences (`../`)
SEC-2	MEDIUM	Security	Unvalidated `plan_id` in `SandboxCheckpoint` and `ChangeEntry` models
SPEC-1	MEDIUM	Spec conformance	`plan_id` type diverges from spec's ULID requirement
SPEC-2	LOW	Spec conformance	`provenance_map` keyed by `fragment_id`, spec uses `uko_node`
SPEC-3	LOW	Spec conformance	Protocol signature divergences (`BudgetAllocator`, `StrategyExecutor`) — expected v1 simplification
TEST-GAP-1	MEDIUM	Test coverage	No plan_id path-traversal test scenarios
TEST-GAP-2	LOW	Test coverage	`_EmptySelector` fallback test lacks invocation verification
TEST-GAP-3	LOW	Test coverage	Allocator tests don't verify individual allocation correctness
BUG-EDGE-1	LOW	Bug edge case	`min_useful_budget` exclusion not enforced
BUG-EDGE-2	LOW	Bug edge case	Name collision possible in strategy comparison
PERF-1	LOW	Performance	`copy.deepcopy` on every `ContextPayload` construction
DOC-1	LOW	Documentation	Commit message skips TEST-6 numbering

SEC-1 [HIGH]: `plan_id` regex allows path traversal

The regex ^[\w.:/\-]+$ on ContextPayload.plan_id permits / and . characters. Values like ../../etc/passwd pass validation.

All other models in the codebase use the strict ULID pattern ^[0-9A-HJKMNP-TV-Z]{26}$:

PlanIdentity.plan_id — ULID pattern
Decision.plan_id — ULID pattern + validator
Checkpoint.plan_id — ULID pattern

The specification (specification.md:43360-43362) mandates filesystem paths:

Checkpoints: <data-dir>/checkpoints/<plan_id>/
Artifacts:   <data-dir>/artifacts/<plan_id>/

When those paths are implemented, a plan_id of ../../etc would yield <data-dir>/checkpoints/../../etc/ — a classic path traversal.

Recommendation: Constrain to the ULID pattern used everywhere else, or at minimum reject .. sequences and add a realpath() containment check in any future path construction.

SEC-2 [MEDIUM]: Unvalidated `plan_id` in related models

SandboxCheckpoint.plan_id (sandbox/checkpoint.py:73) and ChangeEntry.plan_id (change.py:181-183) accept arbitrary strings with no validation pattern.

SPEC-1 [MEDIUM]: `plan_id` type diverges from specification

The spec (specification.md:18158) defines plan_id as type ULID — "Unique, immutable identifier." All example values use ULID format (e.g., 01HXM8C2ZK4Q7C2B3F2R4VYV6J). The new regex accepts lowercase, underscores, dots, colons, slashes, and hyphens — most invalid in Crockford Base32. The BDD test validates with plan_id "x", which is not a valid ULID.

If this divergence is intentional (allowing flexible identifiers in the context payload), it should be documented as a known deviation.

SPEC-2 [LOW]: `provenance_map` key divergence

The spec (specification.md:42914) constructs provenance_map as:

provenance_map={f.uko_node: f.provenance for f in ordered}

The implementation uses fragment_id as key and destructures into a dict of selected fields. Using fragment_id is arguably safer (multiple fragments can share a uko_node), but diverges from the spec. This pre-dates this commit but is closely related to the BUG-3 provenance changes.

TEST-GAP-1 [MEDIUM]: No path-traversal test for plan_id

The 5 boundary scenarios test length and whitespace, but none test traversal patterns. Recommended additions:

../../../etc/passwd — should be rejected
plan_id/with/slashes — should be rejected or documented as safe
Null byte injection a\x00b — verify the regex catches it

TEST-GAP-2 [LOW]: Empty selector test lacks invocation check

The _EmptySelector (TEST-4) doesn't track whether select() was called. Compare with _TrackingSelector which sets a flag. The test could pass even if the selector were bypassed entirely.

TEST-GAP-3 [LOW]: Allocator tests verify total, not distribution

For the 999/3 scenario, the test asserts total == 999 but not that individual allocations are 333 or 334. A buggy distribution of 998, 1, 0 would also sum to 999.

BUG-EDGE-1 [LOW]: `min_useful_budget` not enforced

The spec (specification.md:42927) states the allocator "Guarantees each strategy receives at least min_useful_budget tokens or is excluded" (default 500). The executor only skips strategies with allocated_tokens <= 0. A strategy receiving 100 tokens still executes. The commit docstring correctly references spec §42927, but a log warning or TODO would prevent future callers from assuming the constraint is enforced.

BUG-EDGE-2 [LOW]: Potential name collision in strategy comparison

Changing from identity to name-based comparison is correct for wrapped/copied strategies, but if register_strategy is called with a name matching a built-in, the selector output could contain two different objects with the same name, and the filter at line 674 would include both. Risk is low since register_strategy overwrites the dict entry.

PERF-1 [LOW]: `copy.deepcopy` on construction hot path

_freeze_provenance_map calls copy.deepcopy(v) on every ContextPayload instantiation. This is ~10-50x slower than the previous shallow approach. The previous approach failed on nested dicts, so deepcopy is the correct fix. No action needed unless profiling flags it.

DOC-1 [LOW]: Commit message numbering gap

The commit message references TEST-1 through TEST-5 and TEST-7, skipping TEST-6.

Positive Observations

The largest-remainder method in DefaultBudgetAllocator is mathematically correct and guarantees sum(allocations) == total_budget.
The zero-token skip is cleaner than the previous max(..., 1) masking.
The name-based comparison is the correct solution for custom selectors that clone/wrap strategy objects.
copy.deepcopy properly addresses the nested-dict mutation bug.
All 50 BDD scenarios pass with zero failures.

## Code Review — Commit `b5bb147a` (third review round) **Scope:** Bug fixes (allocator rounding, executor skip-zero, deepcopy provenance, name-based comparison), security (plan_id validation), and test additions. **Test status:** 50/50 BDD scenarios passing (228 steps, 0 failures). --- ### Summary of Findings | ID | Severity | Category | Description | |:---|:---------|:---------|:------------| | SEC-1 | **HIGH** | Security | `plan_id` regex allows path traversal sequences (`../`) | | SEC-2 | MEDIUM | Security | Unvalidated `plan_id` in `SandboxCheckpoint` and `ChangeEntry` models | | SPEC-1 | MEDIUM | Spec conformance | `plan_id` type diverges from spec's ULID requirement | | SPEC-2 | LOW | Spec conformance | `provenance_map` keyed by `fragment_id`, spec uses `uko_node` | | SPEC-3 | LOW | Spec conformance | Protocol signature divergences (`BudgetAllocator`, `StrategyExecutor`) — expected v1 simplification | | TEST-GAP-1 | MEDIUM | Test coverage | No plan_id path-traversal test scenarios | | TEST-GAP-2 | LOW | Test coverage | `_EmptySelector` fallback test lacks invocation verification | | TEST-GAP-3 | LOW | Test coverage | Allocator tests don't verify individual allocation correctness | | BUG-EDGE-1 | LOW | Bug edge case | `min_useful_budget` exclusion not enforced | | BUG-EDGE-2 | LOW | Bug edge case | Name collision possible in strategy comparison | | PERF-1 | LOW | Performance | `copy.deepcopy` on every `ContextPayload` construction | | DOC-1 | LOW | Documentation | Commit message skips TEST-6 numbering | --- ### SEC-1 [HIGH]: `plan_id` regex allows path traversal The regex `^[\w.:/\-]+$` on `ContextPayload.plan_id` permits `/` and `.` characters. Values like `../../etc/passwd` pass validation. All other models in the codebase use the strict ULID pattern `^[0-9A-HJKMNP-TV-Z]{26}$`: - `PlanIdentity.plan_id` — ULID pattern - `Decision.plan_id` — ULID pattern + validator - `Checkpoint.plan_id` — ULID pattern The specification (`specification.md:43360-43362`) mandates filesystem paths: ``` Checkpoints: <data-dir>/checkpoints/<plan_id>/ Artifacts: <data-dir>/artifacts/<plan_id>/ ``` When those paths are implemented, a `plan_id` of `../../etc` would yield `<data-dir>/checkpoints/../../etc/` — a classic path traversal. **Recommendation:** Constrain to the ULID pattern used everywhere else, or at minimum reject `..` sequences and add a `realpath()` containment check in any future path construction. ### SEC-2 [MEDIUM]: Unvalidated `plan_id` in related models `SandboxCheckpoint.plan_id` (`sandbox/checkpoint.py:73`) and `ChangeEntry.plan_id` (`change.py:181-183`) accept arbitrary strings with no validation pattern. --- ### SPEC-1 [MEDIUM]: `plan_id` type diverges from specification The spec (`specification.md:18158`) defines `plan_id` as type **ULID** — "Unique, immutable identifier." All example values use ULID format (e.g., `01HXM8C2ZK4Q7C2B3F2R4VYV6J`). The new regex accepts lowercase, underscores, dots, colons, slashes, and hyphens — most invalid in Crockford Base32. The BDD test validates with plan_id `"x"`, which is not a valid ULID. If this divergence is intentional (allowing flexible identifiers in the context payload), it should be documented as a known deviation. ### SPEC-2 [LOW]: `provenance_map` key divergence The spec (`specification.md:42914`) constructs provenance_map as: ```python provenance_map={f.uko_node: f.provenance for f in ordered} ``` The implementation uses `fragment_id` as key and destructures into a dict of selected fields. Using `fragment_id` is arguably safer (multiple fragments can share a `uko_node`), but diverges from the spec. This pre-dates this commit but is closely related to the BUG-3 provenance changes. --- ### TEST-GAP-1 [MEDIUM]: No path-traversal test for plan_id The 5 boundary scenarios test length and whitespace, but none test traversal patterns. Recommended additions: - `../../../etc/passwd` — should be rejected - `plan_id/with/slashes` — should be rejected or documented as safe - Null byte injection `a\x00b` — verify the regex catches it ### TEST-GAP-2 [LOW]: Empty selector test lacks invocation check The `_EmptySelector` (TEST-4) doesn't track whether `select()` was called. Compare with `_TrackingSelector` which sets a flag. The test could pass even if the selector were bypassed entirely. ### TEST-GAP-3 [LOW]: Allocator tests verify total, not distribution For the 999/3 scenario, the test asserts `total == 999` but not that individual allocations are `333` or `334`. A buggy distribution of `998, 1, 0` would also sum to 999. --- ### BUG-EDGE-1 [LOW]: `min_useful_budget` not enforced The spec (`specification.md:42927`) states the allocator "Guarantees each strategy receives at least `min_useful_budget` tokens or is excluded" (default 500). The executor only skips strategies with `allocated_tokens <= 0`. A strategy receiving 100 tokens still executes. The commit docstring correctly references spec §42927, but a log warning or TODO would prevent future callers from assuming the constraint is enforced. ### BUG-EDGE-2 [LOW]: Potential name collision in strategy comparison Changing from identity to name-based comparison is correct for wrapped/copied strategies, but if `register_strategy` is called with a name matching a built-in, the selector output could contain two different objects with the same name, and the filter at line 674 would include both. Risk is low since `register_strategy` overwrites the dict entry. --- ### PERF-1 [LOW]: `copy.deepcopy` on construction hot path `_freeze_provenance_map` calls `copy.deepcopy(v)` on every `ContextPayload` instantiation. This is ~10-50x slower than the previous shallow approach. The previous approach failed on nested dicts, so `deepcopy` is the correct fix. No action needed unless profiling flags it. ### DOC-1 [LOW]: Commit message numbering gap The commit message references TEST-1 through TEST-5 and TEST-7, skipping TEST-6. --- ### Positive Observations - The **largest-remainder method** in `DefaultBudgetAllocator` is mathematically correct and guarantees `sum(allocations) == total_budget`. - The **zero-token skip** is cleaner than the previous `max(..., 1)` masking. - The **name-based comparison** is the correct solution for custom selectors that clone/wrap strategy objects. - **`copy.deepcopy`** properly addresses the nested-dict mutation bug. - All 50 BDD scenarios pass with zero failures.

features/steps/acms_pipeline_steps.py Outdated

						
				@@ -0,0 +923,4 @@

				        request: dict[str, Any],

				    ) -> list[tuple[Any, float]]:

				        return []

CoreRasurae commented

2026-03-04 23:42:59 +00:00

TEST-GAP-2 [LOW]: Unlike _TrackingSelector which records invocation, _EmptySelector has no mechanism to verify select() was actually called. The fallback test could pass even if the selector were bypassed entirely.

Suggestion: Add self.called = False in __init__ and self.called = True in select(), then assert context.pipeline._strategy_selector.called in the Then step.

**TEST-GAP-2 [LOW]:** Unlike `_TrackingSelector` which records invocation, `_EmptySelector` has no mechanism to verify `select()` was actually called. The fallback test could pass even if the selector were bypassed entirely. **Suggestion:** Add `self.called = False` in `__init__` and `self.called = True` in `select()`, then assert `context.pipeline._strategy_selector.called` in the Then step.

src/cleveragents/application/services/acms_service.py Outdated

						
				@@ -0,0 +461,4 @@

				            return []

				        collected: list[ContextFragment] = []

				        for strategy, _confidence, allocated_tokens in allocations:

				            if allocated_tokens <= 0:

CoreRasurae commented

2026-03-04 23:42:59 +00:00

BUG-EDGE-1 [LOW]: The spec (§42927) states the ProportionalBudgetAllocator "Guarantees each strategy receives at least min_useful_budget tokens or is excluded" (default 500). This guard only skips strategies with 0 tokens. A strategy allocated e.g. 100 tokens would still execute, potentially producing low-quality fragments.

Consider adding a log warning or TODO here for when allocated_tokens < min_useful_budget.

**BUG-EDGE-1 [LOW]:** The spec (§42927) states the `ProportionalBudgetAllocator` "Guarantees each strategy receives at least `min_useful_budget` tokens or is excluded" (default 500). This guard only skips strategies with 0 tokens. A strategy allocated e.g. 100 tokens would still execute, potentially producing low-quality fragments. Consider adding a log warning or TODO here for when `allocated_tokens < min_useful_budget`.

src/cleveragents/application/services/acms_service.py Outdated

						
				@@ -0,0 +671,4 @@

				        # for v1 single-strategy execution; future multi-strategy support

				        # will remove this filter.  Compare by name rather than identity

				        # so custom selectors that wrap/copy strategy objects still match.

				        candidates = [(s, c) for s, c in candidates if s.name == strategy_name] or [

CoreRasurae commented

2026-03-04 23:42:59 +00:00

BUG-EDGE-2 [LOW]: Name-based comparison is correct for wrapped/copied strategies, but introduces a subtle edge: if two different strategy objects share the same name (e.g., a custom selector returns both a built-in and a wrapper with the same name), both would pass this filter. In v1 single-strategy mode this could lead to duplicate execution. Risk is low since register_strategy overwrites the dict entry by name.

**BUG-EDGE-2 [LOW]:** Name-based comparison is correct for wrapped/copied strategies, but introduces a subtle edge: if two different strategy objects share the same `name` (e.g., a custom selector returns both a built-in and a wrapper with the same name), both would pass this filter. In v1 single-strategy mode this could lead to duplicate execution. Risk is low since `register_strategy` overwrites the dict entry by name.

src/cleveragents/domain/models/core/context_fragment.py Outdated

						
				@@ -0,0 +162,4 @@

				        ...,

				        min_length=1,

				        max_length=256,

				        pattern=r"^[\w.:/\-]+$",

CoreRasurae commented

2026-03-04 23:42:59 +00:00

SEC-1 [HIGH]: This regex ^[\w.:/\-]+$ permits path traversal sequences. A plan_id of ../../etc/passwd passes this validation.

Every other model in the codebase (PlanIdentity, Decision, Checkpoint) uses the strict ULID pattern ^[0-9A-HJKMNP-TV-Z]{26}$. The spec (specification.md:18158) explicitly types plan_id as ULID.

The spec also mandates filesystem paths like <data-dir>/checkpoints/<plan_id>/ (line 43360), making this a latent path-traversal vulnerability.

Recommendation: Use the ULID pattern ^[0-9A-HJKMNP-TV-Z]{26}$ consistent with other models, or at minimum reject .. sequences.

**SEC-1 [HIGH]:** This regex `^[\w.:/\-]+$` permits path traversal sequences. A `plan_id` of `../../etc/passwd` passes this validation. Every other model in the codebase (`PlanIdentity`, `Decision`, `Checkpoint`) uses the strict ULID pattern `^[0-9A-HJKMNP-TV-Z]{26}$`. The spec (`specification.md:18158`) explicitly types `plan_id` as ULID. The spec also mandates filesystem paths like `<data-dir>/checkpoints/<plan_id>/` (line 43360), making this a latent path-traversal vulnerability. **Recommendation:** Use the ULID pattern `^[0-9A-HJKMNP-TV-Z]{26}$` consistent with other models, or at minimum reject `..` sequences.

src/cleveragents/domain/models/core/context_fragment.py Outdated

						
				@@ -0,0 +204,4 @@

				    @classmethod

				    def _freeze_provenance_map(cls, v: dict[str, Any]) -> dict[str, Any]:

				        """Deep defensive copy so callers cannot mutate nested state."""

				        return copy.deepcopy(v)

CoreRasurae commented

2026-03-04 23:42:59 +00:00

PERF-1 [LOW]: copy.deepcopy is ~10-50x slower than the previous shallow-copy approach. This validator runs on the hot path of every context assembly cycle. Correct fix for the nested-dict mutation bug, but worth noting for future profiling if provenance maps grow large.

**PERF-1 [LOW]:** `copy.deepcopy` is ~10-50x slower than the previous shallow-copy approach. This validator runs on the hot path of every context assembly cycle. Correct fix for the nested-dict mutation bug, but worth noting for future profiling if provenance maps grow large.

CoreRasurae commented

2026-03-05 00:25:11 +00:00

Code Review Report — Commit `9ad3ba11`

Commit: 9ad3ba11 — fix(acms): enforce ULID pattern on plan_id, address reviewer findings
Author: Hamza Khyari (khyarihamza3@gmail.com)
Date: 2026-03-05
Branch: feature/m5-acms-context
Issue: #188 — feat(acms): add ACMS v1 context pipeline
Files changed: 5 (89 insertions, 59 deletions)

Summary

The commit hardens ContextPayload.plan_id validation from a permissive ASCII pattern (^[a-zA-Z0-9_.:/\-]+$, 1–256 chars) to a strict Crockford Base32 ULID pattern (^[0-9A-HJKMNP-TV-Z]{26}$), consistent with PlanIdentity, Decision, and Checkpoint models. It also improves test coverage for plan_id boundaries, allocator rounding, and strategy selector fallback verification. The changes are well-motivated and the security improvement is genuine. However, 6 findings were identified that are worth addressing.

Findings

F-1: `ULID_PATTERN` defined locally instead of imported — Medium (Maintainability)

File: src/cleveragents/domain/models/core/context_fragment.py:24

The commit adds a local ULID_PATTERN constant rather than importing from the canonical source. The codebase now has 6 independent definitions of the same pattern:

File	Type
`core/plan.py:74`	Definition (canonical)
`core/decision.py:81`	Duplicate definition
`core/session.py:49`	Duplicate definition
`core/resource.py:41`	Duplicate definition
`core/context_fragment.py:24`	Duplicate definition (this commit)
`observability/llm_trace.py:24`	Duplicate definition
`core/checkpoint.py:23`	Import from `plan.py`
`core/resume.py:21`	Import from `plan.py`

checkpoint.py and resume.py already demonstrate the preferred pattern of importing from plan.py. If the ULID pattern ever needs updating, 6 files must change in sync. A centralized constant (in a shared constants.py or validators.py) imported everywhere would eliminate drift risk.

Recommendation: Import ULID_PATTERN from plan.py or extract to a shared module.

F-2: No early `plan_id` validation in `ACMSPipeline.assemble()` — Medium (Performance / Fail-fast)

File: src/cleveragents/application/services/acms_service.py — assemble() method

The assemble() method accepts plan_id: str with no validation. The ULID constraint is only enforced when ContextPayload is constructed at the very end of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation).

If a caller passes an invalid plan_id, all pipeline work executes to completion before a ValidationError is raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller.

Recommendation: Add an early guard at the top of assemble():

import re
if not re.match(ULID_PATTERN, plan_id):
    raise ValueError(f"plan_id must be a valid ULID, got {plan_id!r}")

F-3: Incomplete Crockford excluded-character test coverage — Medium (Test Flaw)

File: features/acms_pipeline.feature:478-482

The scenario is named "Payload rejects plan_id with excluded Crockford Base32 chars (I, L, O, U)" but the test string "01JQTESTPN0000000000000LIA" only contains L and I. Characters O and U are never tested. If the regex had a defect specifically around O or U (e.g., a typo in the character class P-TV-Z that accidentally included U), this test would not catch it.

Recommendation: Either split into 4 individual scenarios (one per excluded char) or use a Scenario Outline with an Examples table covering all four:

Scenario Outline: Payload rejects plan_id with excluded char <char>
  Given the ACMS pipeline modules are available
  When I create a payload with plan_id "<plan_id>"
  Then an ACMS validation error should be raised

  Examples:
    | char | plan_id                    |
    | I    | 01JQTESTPN000000000000I0AA |
    | L    | 01JQTESTPN000000000000L0AA |
    | O    | 01JQTESTPN000000000000O0AA |
    | U    | 01JQTESTPN000000000000U0AA |

F-4: Non-isolated negative test cases — Low (Test Isolation)

File: features/acms_pipeline.feature:484-494

Two security scenarios reject input based on multiple simultaneous violations, not the specific one claimed:

Scenario	Test Input	Length	Violations
"Payload rejects path traversal"	`../../etc/passwd`	16	Invalid chars + wrong length
"Payload rejects plan_id with slashes"	`plan/sub:v1.2-beta_rc0000`	25	Invalid chars + wrong length

Neither test isolates the targeted behavior. If the character validation were broken but the length check worked, these tests would still pass — giving false confidence.

Recommendation: Use 26-character strings that are valid ULIDs except for the targeted violation:

"01JQTESTPN00000000/../00AA"  # exactly 26 chars, contains path traversal
"01JQTESTPN00000/0000000AA"   # exactly 26 chars, contains slash

F-5: Broadened exception catch in test step — Low (Test Quality)

File: features/steps/acms_pipeline_steps.py:305

The change from except ValueError to except (ValueError, ValidationError) in step_assemble_with_strategy was necessary because the stricter plan_id pattern now raises ValidationError instead of ValueError in some paths. However, this broader catch could mask genuine ValueError exceptions from pipeline logic (e.g., strategy execution failures) by silently treating them as expected.

Recommendation: If the test is specifically validating plan_id rejection, catch ValidationError only. If it needs to handle both pipeline errors and validation errors, consider distinguishing them in the assertion step (e.g., storing the exception type alongside the error).

F-6: Coverage verification not confirmed — Low (Process)

Issue #188 requires "Verify coverage >= 97% via nox -s coverage_report" as a subtask. The build/coverage.xml file does not exist on the branch, meaning the coverage session has not been run (or its output was not committed). While coverage artifacts are typically generated in CI and not committed, there is no evidence in the commit or branch that the 97% threshold was verified after these changes.

Recommendation: Confirm the coverage session passes after the test changes in this commit. The commit removed two step definitions (step_create_payload_plan_id_length and step_created_payload_success) — ensure no other feature files reference these steps, which would cause runtime failures reducing coverage.

Positive Aspects

SEC-1 is correctly addressed: The ULID constraint eliminates the path-traversal attack surface that existed with the previous [a-zA-Z0-9_.:/\-]+ pattern (which allowed slashes and dots).
SPEC-1 alignment: ContextPayload.plan_id now uses the same validation as PlanIdentity, Decision, and Checkpoint, ensuring consistency across the domain model.
TEST-GAP-2: Adding self.called tracking to _EmptySelector and asserting invocation is a genuine test quality improvement — the previous test only checked output behavior, not that the custom selector code path was exercised.
TEST-GAP-3: Adding individual allocation bounds assertion (333 or 334) to the 999/3 rounding scenario catches rounding bugs that the total == 999 check alone would miss.
Immutable fragment passing: The list(context.fragments) defensive copy in test steps prevents accidental mutation during pipeline execution.

Verdict

The commit is a solid incremental improvement addressing legitimate security and test quality concerns. The 3 medium-severity findings (F-1 through F-3) are worth addressing before merge — F-1 and F-2 affect production code maintainability and robustness, while F-3 represents a gap in the claimed test coverage. The 3 low-severity findings (F-4 through F-6) are improvements that could be addressed in a follow-up.

#	Finding	Severity	Category
F-1	`ULID_PATTERN` duplicated locally instead of imported	Medium	Maintainability
F-2	No early `plan_id` validation in `assemble()` — fails late	Medium	Performance / Fail-fast
F-3	Excluded Crockford chars test only covers I/L, misses O/U	Medium	Test Coverage Gap
F-4	Path-traversal and slash tests fail on length, not just chars	Low	Test Isolation
F-5	Broadened `except` clause could mask pipeline errors	Low	Test Quality
F-6	97% coverage threshold not demonstrably verified	Low	Process

# Code Review Report — Commit `9ad3ba11` **Commit:** `9ad3ba11` — `fix(acms): enforce ULID pattern on plan_id, address reviewer findings` **Author:** Hamza Khyari (`khyarihamza3@gmail.com`) **Date:** 2026-03-05 **Branch:** `feature/m5-acms-context` **Issue:** #188 — `feat(acms): add ACMS v1 context pipeline` **Files changed:** 5 (89 insertions, 59 deletions) --- ## Summary The commit hardens `ContextPayload.plan_id` validation from a permissive ASCII pattern (`^[a-zA-Z0-9_.:/\-]+$`, 1–256 chars) to a strict Crockford Base32 ULID pattern (`^[0-9A-HJKMNP-TV-Z]{26}$`), consistent with `PlanIdentity`, `Decision`, and `Checkpoint` models. It also improves test coverage for plan_id boundaries, allocator rounding, and strategy selector fallback verification. The changes are well-motivated and the security improvement is genuine. However, **6 findings** were identified that are worth addressing. --- ## Findings ### F-1: `ULID_PATTERN` defined locally instead of imported — Medium (Maintainability) **File:** `src/cleveragents/domain/models/core/context_fragment.py:24` The commit adds a local `ULID_PATTERN` constant rather than importing from the canonical source. The codebase now has **6 independent definitions** of the same pattern: | File | Type | |---|---| | `core/plan.py:74` | Definition (canonical) | | `core/decision.py:81` | Duplicate definition | | `core/session.py:49` | Duplicate definition | | `core/resource.py:41` | Duplicate definition | | **`core/context_fragment.py:24`** | **Duplicate definition (this commit)** | | `observability/llm_trace.py:24` | Duplicate definition | | `core/checkpoint.py:23` | Import from `plan.py` | | `core/resume.py:21` | Import from `plan.py` | `checkpoint.py` and `resume.py` already demonstrate the preferred pattern of importing from `plan.py`. If the ULID pattern ever needs updating, 6 files must change in sync. A centralized constant (in a shared `constants.py` or `validators.py`) imported everywhere would eliminate drift risk. **Recommendation:** Import `ULID_PATTERN` from `plan.py` or extract to a shared module. --- ### F-2: No early `plan_id` validation in `ACMSPipeline.assemble()` — Medium (Performance / Fail-fast) **File:** `src/cleveragents/application/services/acms_service.py` — `assemble()` method The `assemble()` method accepts `plan_id: str` with no validation. The ULID constraint is only enforced when `ContextPayload` is constructed at the **very end** of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation). If a caller passes an invalid `plan_id`, all pipeline work executes to completion before a `ValidationError` is raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller. **Recommendation:** Add an early guard at the top of `assemble()`: ```python import re if not re.match(ULID_PATTERN, plan_id): raise ValueError(f"plan_id must be a valid ULID, got {plan_id!r}") ``` --- ### F-3: Incomplete Crockford excluded-character test coverage — Medium (Test Flaw) **File:** `features/acms_pipeline.feature:478-482` The scenario is named *"Payload rejects plan_id with excluded Crockford Base32 chars (I, L, O, U)"* but the test string `"01JQTESTPN0000000000000LIA"` only contains **L and I**. Characters **O and U** are never tested. If the regex had a defect specifically around `O` or `U` (e.g., a typo in the character class `P-TV-Z` that accidentally included `U`), this test would not catch it. **Recommendation:** Either split into 4 individual scenarios (one per excluded char) or use a `Scenario Outline` with an `Examples` table covering all four: ```gherkin Scenario Outline: Payload rejects plan_id with excluded char <char> Given the ACMS pipeline modules are available When I create a payload with plan_id "<plan_id>" Then an ACMS validation error should be raised Examples: | char | plan_id | | I | 01JQTESTPN000000000000I0AA | | L | 01JQTESTPN000000000000L0AA | | O | 01JQTESTPN000000000000O0AA | | U | 01JQTESTPN000000000000U0AA | ``` --- ### F-4: Non-isolated negative test cases — Low (Test Isolation) **File:** `features/acms_pipeline.feature:484-494` Two security scenarios reject input based on **multiple** simultaneous violations, not the specific one claimed: | Scenario | Test Input | Length | Violations | |---|---|---|---| | "Payload rejects path traversal" | `../../etc/passwd` | 16 | Invalid chars **+** wrong length | | "Payload rejects plan_id with slashes" | `plan/sub:v1.2-beta_rc0000` | 25 | Invalid chars **+** wrong length | Neither test isolates the targeted behavior. If the character validation were broken but the length check worked, these tests would still pass — giving false confidence. **Recommendation:** Use 26-character strings that are valid ULIDs except for the targeted violation: ``` "01JQTESTPN00000000/../00AA" # exactly 26 chars, contains path traversal "01JQTESTPN00000/0000000AA" # exactly 26 chars, contains slash ``` --- ### F-5: Broadened exception catch in test step — Low (Test Quality) **File:** `features/steps/acms_pipeline_steps.py:305` The change from `except ValueError` to `except (ValueError, ValidationError)` in `step_assemble_with_strategy` was necessary because the stricter plan_id pattern now raises `ValidationError` instead of `ValueError` in some paths. However, this broader catch could mask genuine `ValueError` exceptions from pipeline logic (e.g., strategy execution failures) by silently treating them as expected. **Recommendation:** If the test is specifically validating plan_id rejection, catch `ValidationError` only. If it needs to handle both pipeline errors and validation errors, consider distinguishing them in the assertion step (e.g., storing the exception type alongside the error). --- ### F-6: Coverage verification not confirmed — Low (Process) Issue #188 requires *"Verify coverage >= 97% via `nox -s coverage_report`"* as a subtask. The `build/coverage.xml` file does not exist on the branch, meaning the coverage session has not been run (or its output was not committed). While coverage artifacts are typically generated in CI and not committed, there is no evidence in the commit or branch that the 97% threshold was verified after these changes. **Recommendation:** Confirm the coverage session passes after the test changes in this commit. The commit removed two step definitions (`step_create_payload_plan_id_length` and `step_created_payload_success`) — ensure no other feature files reference these steps, which would cause runtime failures reducing coverage. --- ## Positive Aspects - **SEC-1 is correctly addressed**: The ULID constraint eliminates the path-traversal attack surface that existed with the previous `[a-zA-Z0-9_.:/\-]+` pattern (which allowed slashes and dots). - **SPEC-1 alignment**: `ContextPayload.plan_id` now uses the same validation as `PlanIdentity`, `Decision`, and `Checkpoint`, ensuring consistency across the domain model. - **TEST-GAP-2**: Adding `self.called` tracking to `_EmptySelector` and asserting invocation is a genuine test quality improvement — the previous test only checked output behavior, not that the custom selector code path was exercised. - **TEST-GAP-3**: Adding individual allocation bounds assertion (`333 or 334`) to the 999/3 rounding scenario catches rounding bugs that the `total == 999` check alone would miss. - **Immutable fragment passing**: The `list(context.fragments)` defensive copy in test steps prevents accidental mutation during pipeline execution. --- ## Verdict The commit is a solid incremental improvement addressing legitimate security and test quality concerns. The **3 medium-severity findings (F-1 through F-3)** are worth addressing before merge — F-1 and F-2 affect production code maintainability and robustness, while F-3 represents a gap in the claimed test coverage. The **3 low-severity findings (F-4 through F-6)** are improvements that could be addressed in a follow-up. | # | Finding | Severity | Category | |---|---------|----------|----------| | F-1 | `ULID_PATTERN` duplicated locally instead of imported | Medium | Maintainability | | F-2 | No early `plan_id` validation in `assemble()` — fails late | Medium | Performance / Fail-fast | | F-3 | Excluded Crockford chars test only covers I/L, misses O/U | Medium | Test Coverage Gap | | F-4 | Path-traversal and slash tests fail on length, not just chars | Low | Test Isolation | | F-5 | Broadened `except` clause could mask pipeline errors | Low | Test Quality | | F-6 | 97% coverage threshold not demonstrably verified | Low | Process |

CoreRasurae commented

2026-03-05 00:25:15 +00:00

I am approving the PR on the condition that following issue is solved before merging to master:

F-2: No early plan_id validation in ACMSPipeline.assemble() (Performance / Fail-fast) -- Medium
File: src/cleveragents/application/services/acms_service.py -- assemble() method
The assemble() method accepts plan_id: str with no validation. The ULID constraint is only enforced when ContextPayload is constructed at the very end of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation).
If a caller passes an invalid plan_id, all pipeline work executes to completion before a ValidationError is raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller.
Recommendation: Add an early guard at the top of assemble():
import re
if not re.match(ULID_PATTERN, plan_id):
raise ValueError(f"plan_id must be a valid ULID, got {plan_id!r}")

I am approving the PR on the condition that following issue is solved before merging to master: **F-2: No early plan_id validation in ACMSPipeline.assemble() (Performance / Fail-fast) -- Medium** File: src/cleveragents/application/services/acms_service.py -- assemble() method The assemble() method accepts plan_id: str with no validation. The ULID constraint is only enforced when ContextPayload is constructed at the very end of the pipeline (after strategy selection, budget allocation, parallel strategy execution, deduplication, depth resolution, scoring, budget packing, ordering, preamble generation, and hash computation). If a caller passes an invalid plan_id, all pipeline work executes to completion before a ValidationError is raised. This wastes compute and produces a confusing error traceback pointing at the Pydantic model rather than the caller. Recommendation: Add an early guard at the top of assemble(): import re if not re.match(ULID_PATTERN, plan_id): raise ValueError(f"plan_id must be a valid ULID, got {plan_id!r}")

CoreRasurae approved these changes 2026-03-05 00:27:26 +00:00

Dismissed

CoreRasurae left a comment

i am approving the PR, on the condition that the following is solved:
F-2 No early plan_id validation in assemble() -- fails late Medium Performance / Fail-fast

i am approving the PR, on the condition that the following is solved: **F-2** *No early plan_id validation in assemble() -- fails late* **Medium Performance / Fail-fast**

hamza.khyari dismissed CoreRasurae's review 2026-03-05 00:49:17 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

hamza.khyari requested review from CoreRasurae 2026-03-05 00:54:04 +00:00

hamza.khyari requested review from brent.edwards 2026-03-05 00:54:12 +00:00

hamza.khyari force-pushed feature/m5-acms-context from 43227833ee to 292419aa0b

2026-03-05 00:55:00 +00:00

Compare

hamza.khyari scheduled this pull request to auto merge when all checks succeed 2026-03-05 00:58:18 +00:00

hamza.khyari force-pushed feature/m5-acms-context from 292419aa0b to febea8950f

2026-03-05 01:28:54 +00:00

Compare

hamza.khyari merged commit 8764fbed7b into master

2026-03-05 01:33:54 +00:00

hamza.khyari deleted branch feature/m5-acms-context

2026-03-05 01:39:46 +00:00

brent.edwards referenced this issue from a commit

2026-03-05 01:46:01 +00:00

Merge pull request 'feat(acms): add ACMS v1 context pipeline' (#465) from feature/m5-acms-context into master

freemo referenced this pull request

2026-03-09 15:28:00 +00:00

feat(acms): add ACMS v1 context pipeline #188

Sign in to join this conversation.

2 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#465

feat(acms): add ACMS v1 context pipeline #465

Summary

Testing

Files Changed

Code Review Report: Issue #188 — feat(acms): add ACMS v1 context pipeline

CRITICAL — Specification Compliance

C1. ContextFragment is missing key fields from the specification

C2. ContextPayload diverges from spec's AssembledContext

C3. Pipeline is missing the 10-component architecture

C4. FusionStrategy protocol is severely stripped down from spec's ContextStrategy

C5. CRP acronym is wrong in the reference documentation

BUG DETECTION

B1. RecencyFusionStrategy sorts by string comparison of ISO timestamps

B2. ContextBudget defaults create an implicit minimum max_tokens constraint

B3. token_estimate=0 default allows unlimited fragment inclusion

TEST COVERAGE AND TEST FLAWS

T1. Inconsistent error attribute naming in Behave steps

T2. No Robot Framework test for the recency strategy

T3. No ASV benchmark for the recency strategy

T4. plan_id, payload_id, and assembled_at are never asserted

T5. No test for overwriting a built-in strategy name

T6. No test for reserved_tokens > max_tokens (only == is tested)

PERFORMANCE

P1. Strategy objects are instantiated on every assemble() call

P2. No early termination when budget is fully consumed

SECURITY

S1. source field has no validation against allowed values

S2. No size limits on content or metadata

ARCHITECTURAL / STYLE

A1. ACMSPipeline does not follow the project's service conventions

A2. Tiers are fragment tags, not actual storage tiers

SUMMARY TABLE

Code Review Report -- Commit 8c6b4122 (branch feature/m5-acms-context)

1. TEST COVERAGE GAPS (4 findings)

T1 [HIGH] -- No dedicated test for compute_context_hash verifying the fix this commit introduces

T2 [MEDIUM] -- No test with multi-byte UTF-8 content

T3 [MEDIUM] -- Robot helper hash assertion is weaker than Behave equivalent

T4 [LOW] -- Benchmark does not exercise compute_context_hash in isolation

2. BUGS (3 findings)

B1 [HIGH] -- provenance_map keying deviates from spec

B2 [MEDIUM] -- strategies_used and provenance_map are mutable inside a frozen model

B3 [LOW] -- Dead code in budget_used calculation

3. SPEC ALIGNMENT (4 findings)

S1 [MEDIUM] -- ContextStrategy protocol method signatures differ from spec

S2 [MEDIUM] -- StrategyCapabilities field names differ from spec

S3 [LOW] -- Pipeline component protocols have simplified signatures vs spec

S4 [LOW] -- ContextFragment model adds fields beyond spec

4. PERFORMANCE (2 findings)

P1 [LOW] -- _pack_budget skips fragments that don't fit instead of trying smaller ones

P2 [LOW] -- Double iteration to compute total_tokens

5. SECURITY (1 finding)

SEC1 [LOW] -- register_strategy accepts unvalidated objects

6. TEST FLAWS (1 finding)

TF1 [MEDIUM] -- BDD test for "non-existent strategy" catches wrong exception type

Summary Table

Code Review Report — 6f318e36

Summary

B — Bugs & Logic Errors

B1 [HIGH] — DefaultBudgetAllocator duplicates total budget to every candidate

B2 [MEDIUM] — DefaultStrategyExecutor ignores allocated token budget

B3 [MEDIUM] — assemble() passes only one strategy to the selector, defeating the 10-component architecture

B4 [LOW] — provenance_map defensive copy is shallow only

T — Test Coverage & Test Flaws

T1 [HIGH] — Zero tests for custom pipeline component injection (DI)

T2 [HIGH] — No tests for SkeletonCompressor

T3 [MEDIUM] — No tests for multi-candidate allocation path

T4 [MEDIUM] — settings and unit_of_work DI params are untested dead code

T5 [LOW] — Import inside function body in test step

T6 [LOW] — Robot tests do not cover Phase 1 pipeline components

SC — Spec Conformance Issues

SC1 [MEDIUM] — Multiple protocol signature divergences not documented in Known Limitations

SC2 [LOW] — Duplicated section comment in feature file

S — Security Issues

S1 [MEDIUM] — No input validation on plan_id

S2 [LOW] — Fragment content up to 1MB allowed per fragment with no fragment count limit

P — Performance Issues

P1 [LOW] — DefaultStrategySelector.select() calls can_handle() on every strategy per assembly

P2 [LOW] — No pipeline component result caching

Overall Assessment

Code Review Report — Commit 0836502e

Code Review Report: Issue #188 — `feat(acms): add ACMS v1 context pipeline`

C1. `ContextFragment` is missing key fields from the specification

C2. `ContextPayload` diverges from spec's `AssembledContext`

C4. `FusionStrategy` protocol is severely stripped down from spec's `ContextStrategy`

B1. `RecencyFusionStrategy` sorts by string comparison of ISO timestamps

B2. `ContextBudget` defaults create an implicit minimum `max_tokens` constraint

B3. `token_estimate=0` default allows unlimited fragment inclusion

T2. No Robot Framework test for the `recency` strategy

T3. No ASV benchmark for the `recency` strategy

T4. `plan_id`, `payload_id`, and `assembled_at` are never asserted

T6. No test for `reserved_tokens > max_tokens` (only `==` is tested)

P1. Strategy objects are instantiated on every `assemble()` call

S1. `source` field has no validation against allowed values

S2. No size limits on `content` or `metadata`

A1. `ACMSPipeline` does not follow the project's service conventions

Code Review Report -- Commit `8c6b4122` (branch `feature/m5-acms-context`)

T1 [HIGH] -- No dedicated test for `compute_context_hash` verifying the fix this commit introduces

T4 [LOW] -- Benchmark does not exercise `compute_context_hash` in isolation

B1 [HIGH] -- `provenance_map` keying deviates from spec

B2 [MEDIUM] -- `strategies_used` and `provenance_map` are mutable inside a frozen model

B3 [LOW] -- Dead code in `budget_used` calculation

S1 [MEDIUM] -- `ContextStrategy` protocol method signatures differ from spec

S2 [MEDIUM] -- `StrategyCapabilities` field names differ from spec

S4 [LOW] -- `ContextFragment` model adds fields beyond spec

P1 [LOW] -- `_pack_budget` skips fragments that don't fit instead of trying smaller ones

SEC1 [LOW] -- `register_strategy` accepts unvalidated objects

Code Review Report — `6f318e36`

B1 [HIGH] — `DefaultBudgetAllocator` duplicates total budget to every candidate

B2 [MEDIUM] — `DefaultStrategyExecutor` ignores allocated token budget

B3 [MEDIUM] — `assemble()` passes only one strategy to the selector, defeating the 10-component architecture

B4 [LOW] — `provenance_map` defensive copy is shallow only

T2 [HIGH] — No tests for `SkeletonCompressor`

T4 [MEDIUM] — `settings` and `unit_of_work` DI params are untested dead code

S1 [MEDIUM] — No input validation on `plan_id`

S2 [LOW] — Fragment `content` up to 1MB allowed per fragment with no fragment count limit

P1 [LOW] — `DefaultStrategySelector.select()` calls `can_handle()` on every strategy per assembly

Code Review Report — Commit `0836502e`

BUG-1 (Medium): `DefaultBudgetAllocator` leaks tokens due to `int()` truncation

BUG-2 (Medium): `DefaultStrategyExecutor` masks zero-allocation with `max(allocated_tokens, 1)`

BUG-3 (Low): `provenance_map` defensive copy is only one level deep

BUG-4 (Low): `assemble()` discards custom selector's multi-strategy results

BUG-5 (Low): `DefaultStrategyExecutor.execute()` ignores its `budget` parameter

TEST-1 (High): No test for `plan_id` validation boundaries

TEST-2 (High): No test for `provenance_map` defensive copy

TEST-4 (Medium): No test for `assemble()` strategy filtering / fallback behavior

TEST-7 (Low): `strategies_used` assertion is vacuous

SEC-1 (Medium): `plan_id` lacks character/pattern validation

SEC-2 (Low): `strategy_name` logged without sanitization

PERF-1 (Low): `DefaultStrategySelector` sorts the full strategy registry on every call, then result is discarded

SPEC-2 (Info): `init.py` `# fmt: skip` suppresses formatter on checkpoint import

Code Review — Commit `b5bb147a` (third review round)

SEC-1 [HIGH]: `plan_id` regex allows path traversal

SEC-2 [MEDIUM]: Unvalidated `plan_id` in related models

SPEC-1 [MEDIUM]: `plan_id` type diverges from specification

SPEC-2 [LOW]: `provenance_map` key divergence

BUG-EDGE-1 [LOW]: `min_useful_budget` not enforced

PERF-1 [LOW]: `copy.deepcopy` on construction hot path

Code Review Report — Commit `9ad3ba11`

F-1: `ULID_PATTERN` defined locally instead of imported — Medium (Maintainability)

F-2: No early `plan_id` validation in `ACMSPipeline.assemble()` — Medium (Performance / Fail-fast)