UAT: m5_acceptance.robot has only structural/plumbing tests; behavioral ACMS validation deferred with no tracking issue (#6021)

Open · opened 2026-04-09 13:44:19 +00:00 by HAL9000 · 1 comment

## Bug Report

**Feature Area**: E2E Workflow Specification Tests, M5 Acceptance (ACMS Behavioral Validation)

**Milestone**: v3.6.0 (M7): E2E workflow specification tests

**Severity**: Priority/Backlog (test quality gap, not blocking runtime)

### What Was Tested

Code-level analysis of `robot/e2e/m5_acceptance.robot` against the M5 milestone acceptance criteria.

### Expected Behavior (from spec)

The M5 (v3.4.0) milestone acceptance criteria include:

- Context policies configurable with view-specific settings ✓ (tested structurally)
- Budget enforcement works (`max_file_size`, `max_total_size` constraints): **behavioral test missing**
- Context assembly CLI functional ✓ (tested structurally)
- Context analysis produces meaningful summaries: **behavioral test missing**
- Plan execution leverages ACMS context for LLM calls: **behavioral test missing**
- Projects with 10,000+ files index without timeout: **behavioral test missing**
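The 10,000-file criterion needs a fixture large enough to exercise indexing plus a check on elapsed time and output. A minimal Python sketch, assuming the real indexer can be wrapped as a callable that returns a fragment count: `make_synthetic_project` and `index_within_deadline` are hypothetical helpers, not part of the project, and the 600 s default is the deadline from the Definition of Done.

```python
import time
from pathlib import Path
from typing import Callable


def make_synthetic_project(root: Path, file_count: int = 10_000, per_dir: int = 500) -> int:
    """Create `file_count` small Python files under `root`, `per_dir` per subdirectory.

    Hypothetical fixture helper; the layout is an assumption, not the real test data.
    """
    for i in range(file_count):
        pkg = root / f"pkg_{i // per_dir:03d}"
        pkg.mkdir(parents=True, exist_ok=True)
        (pkg / f"module_{i:05d}.py").write_text(
            f"def func_{i}():\n    return {i}\n", encoding="utf-8"
        )
    return file_count


def index_within_deadline(
    index_fn: Callable[[Path], int], root: Path, deadline_s: float = 600.0
) -> tuple[int, float]:
    """Run a (hypothetical) indexing callable and check both parts of the criterion:
    it finishes within `deadline_s` and reports a non-zero fragment count.

    Note: elapsed time is checked after the call returns; an overrunning
    call is not interrupted.
    """
    start = time.monotonic()
    fragment_count = index_fn(root)
    elapsed = time.monotonic() - start
    if elapsed > deadline_s:
        raise AssertionError(f"indexing took {elapsed:.1f}s, deadline is {deadline_s}s")
    if fragment_count <= 0:
        raise AssertionError("indexing produced no fragments")
    return fragment_count, elapsed
```

In use, `make_synthetic_project(tmp_root)` would be followed by `index_within_deadline(indexer, tmp_root)`, where `indexer` stands in for whatever entry point the fix for #1028 exposes.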

### Actual Behavior

The `m5_acceptance.robot` test file explicitly documents that most of its tests are "structural / plumbing" validations only:

```
**Structural vs. behavioural scope:** Tests in sections 1b–4 that use
``project context simulate`` or ``inspect`` are *structural / plumbing*
validations — they verify CLI execution, JSON serialization, and stored
configuration but do **not** exercise actual ACMS indexing or budget
enforcement because the ``ContextTierService`` is an in-memory singleton
that starts empty per CLI process.
```
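Why an in-memory singleton "starts empty per CLI process" can be shown in isolation. The sketch below is a toy model: `TierServiceSketch` and `run_cli` are hypothetical and only mimic the pattern described in the suite documentation, not the real `ContextTierService`. Each CLI invocation is a fresh interpreter, so whatever one invocation stores in module-level state is gone when the next one starts.

```python
import subprocess
import sys

# Toy "CLI" program: a module-level singleton that lives only as long as its process.
CLI_SOURCE = r'''
import sys

class TierServiceSketch:
    """Hypothetical stand-in for an in-memory singleton service."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.fragments = []
        return cls._instance

command = sys.argv[1]
service = TierServiceSketch()
if command == "index":
    service.fragments.extend(["a.py#0", "b.py#0"])
# Every command reports what *this process's* singleton holds.
print(len(service.fragments))
'''


def run_cli(command: str) -> int:
    """Run the toy CLI in a fresh interpreter, as a shell invocation would."""
    result = subprocess.run(
        [sys.executable, "-c", CLI_SOURCE, command],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())
```

`run_cli("index")` reports 2 fragments inside its own process, yet a following `run_cli("simulate")` reports 0, the same shape as the always-zero counters listed below. A behavioral test can only observe real values if indexing and querying share a process or the service state is persisted between invocations.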

Specific limitations documented in the test:

1. **Context Scaling (10K files)**: `fragment_count` is always 0 because the ACMS indexing pipeline is not wired.
2. **Budget Enforcement**: `max_file_size` filtering is not enforced in the `simulate` command.
3. **Context Inspect**: `tier_metrics` always has zero-value counters.
4. **Context Simulate**: `total_tokens` is always 0.
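For reference, the budget behavior a real Budget Enforcement test would need to observe can be stated as a small pure function. This is a sketch under assumptions: the issue does not specify ACMS's selection order, units, or tie-breaking, so here sizes are bytes, files are considered in the order given, and `Candidate`/`apply_budget` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    path: str
    size: int  # assumed unit: bytes


def apply_budget(
    files: list[Candidate], max_file_size: int, max_total_size: int
) -> tuple[list[str], list[str]]:
    """Return (included, excluded) paths.

    A file is excluded if it alone exceeds max_file_size, or if adding it
    would push the running total past max_total_size.
    """
    included: list[str] = []
    excluded: list[str] = []
    total = 0
    for f in files:
        if f.size > max_file_size or total + f.size > max_total_size:
            excluded.append(f.path)
            continue
        included.append(f.path)
        total += f.size
    return included, excluded
```

With `small.py` (200), `large_file.py` (50,000) and `utils.py` (900), a `max_file_size` of 10,000 always drops `large_file.py`; a `max_total_size` of 1,000 additionally drops `utils.py`, while 2,000 keeps it.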

All 14 test cases in sections 1b–4 are tagged `tdd_expected_fail` and document that they test JSON schema correctness only, not actual ACMS behavior.

A separate TDD test file (`tdd_acms_behavioral_validation.robot`) captures the behavioral bugs (issue #1028), but its tests are also tagged `tdd_expected_fail`: they are expected to fail, and CI inverts those failures into passes.
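The masking effect can be modeled in a few lines. This is a simplified model, not the project's CI hook (the issue does not describe how the inversion is implemented), and the handling of an unexpected pass is an assumption borrowed from strict expected-failure conventions.

```python
EXPECTED_FAIL_TAG = "tdd_expected_fail"


def ci_status(test_passed: bool, tags: set[str]) -> str:
    """Simplified model: map a raw test result plus tags to a CI status."""
    if EXPECTED_FAIL_TAG in tags:
        # Inverted: failing is the "expected" outcome and counts as green.
        # Assumption: an unexpected pass is flagged so the tag gets removed.
        return "FAIL" if test_passed else "PASS"
    return "PASS" if test_passed else "FAIL"


def suite_status(results: list[tuple[bool, set[str]]]) -> str:
    """The suite is green only if every test maps to PASS."""
    return "PASS" if all(ci_status(p, t) == "PASS" for p, t in results) else "FAIL"
```

Under this model, a suite in which every behavioral assertion fails still reports PASS as long as each test carries the tag, which is exactly why CI cannot distinguish "pipeline not wired" from "pipeline works".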

### Code Location

- `robot/e2e/m5_acceptance.robot`: lines 9-17 (suite documentation), lines 172-596 (structural-only tests)
- `robot/e2e/tdd_acms_behavioral_validation.robot`: behavioral tests, all tagged `tdd_expected_fail`

### Impact

The M5 acceptance test suite provides no behavioral coverage of the ACMS v1 pipeline. None of the core M5 acceptance criteria is exercised against real behavior: that projects with 10,000+ files index without timeout, that budget enforcement actually excludes oversized files, or that context analysis produces meaningful summaries.

Because the tests in both `m5_acceptance.robot` and `tdd_acms_behavioral_validation.robot` are tagged `tdd_expected_fail`, CI passes even though the ACMS indexing pipeline is not wired into the CLI. This creates a false sense of M5 completion.

### Definition of Done

1. Fix issue #1028 (ACMS indexing pipeline not wired into CLI).
2. Once #1028 is fixed, remove the `tdd_expected_fail` tags from `tdd_acms_behavioral_validation.robot` and verify that the behavioral assertions pass.
3. Update `m5_acceptance.robot` to replace structural assertions with behavioral assertions:
   - `fragment_count > 0` after indexing a project with Python files
   - `total_tokens > 0` after indexing
   - `large_file.py` excluded when `max_file_size` < file size
   - 10K-file project indexes within 600 s with a non-zero fragment count
4. Remove the `tdd_expected_fail` tags from the `m5_acceptance.robot` tests once the behavioral assertions pass.
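The first three behavioral assertions in item 3 can be expressed as checks on the parsed JSON the CLI emits. Only `fragment_count`, `total_tokens`, `max_file_size` and `large_file.py` come from this issue; the `files` list with `path`/`included` entries is a hypothetical output shape, so the keys would need adjusting to whatever the commands actually report once #1028 lands. `check_behavioral` and `check_cli_output` are hypothetical helpers.

```python
import json
from pathlib import PurePosixPath
from typing import Any


def check_behavioral(payload: dict[str, Any], large_file: str = "large_file.py") -> list[str]:
    """Return the list of failed behavioral checks (empty means all passed).

    Keys other than fragment_count / total_tokens are hypothetical.
    """
    failures: list[str] = []
    if payload.get("fragment_count", 0) <= 0:
        failures.append("fragment_count should be > 0 after indexing")
    if payload.get("total_tokens", 0) <= 0:
        failures.append("total_tokens should be > 0 after indexing")
    # Hypothetical shape: [{"path": "...", "included": bool}, ...].
    # A file that is omitted entirely is treated as excluded.
    for entry in payload.get("files", []):
        if PurePosixPath(entry.get("path", "")).name == large_file and entry.get("included"):
            failures.append(f"{large_file} should be excluded by max_file_size")
    return failures


def check_cli_output(stdout: str) -> list[str]:
    """Parse the command's JSON stdout and run the behavioral checks."""
    return check_behavioral(json.loads(stdout))
```

Against today's documented output (both counters at 0), every check fails, which is the signal the structural tests currently suppress.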

---

**Automated by CleverAgents Bot**
Supervisor: UAT Testing | Agent: uat-tester

