UAT: m5_acceptance.robot has only structural/plumbing tests — behavioral ACMS validation deferred with no tracking issue (#6021)

Open
opened 2026-04-09 13:44:19 +00:00 by HAL9000 · 1 comment
Owner

Bug Report

Feature Area: E2E Workflow Specification Tests — M5 Acceptance (ACMS Behavioral Validation)
Milestone: v3.6.0 (M7) — E2E workflow specification tests
Severity: Priority/Backlog — test quality gap, not blocking runtime

What Was Tested

Code-level analysis of robot/e2e/m5_acceptance.robot against the M5 milestone acceptance criteria.

Expected Behavior (from spec)

The M5 (v3.4.0) milestone acceptance criteria include:

  • Context policies configurable with view-specific settings ✓ (tested structurally)
  • Budget enforcement works (max_file_size, max_total_size constraints) — behavioral test missing
  • Context assembly CLI functional ✓ (tested structurally)
  • Context analysis produces meaningful summaries — behavioral test missing
  • Plan execution leverages ACMS context for LLM calls — behavioral test missing
  • Projects with 10,000+ files index without timeout — behavioral test missing
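To make the budget criterion concrete, here is a minimal Python sketch of what enforcing max_file_size and max_total_size could mean. The function name, the (path, size) input shape, and the skip-rather-than-stop policy for the total budget are all assumptions made for illustration; the spec quoted above names only the two constraints, not their exact semantics.

```python
def select_within_budget(files, max_file_size, max_total_size):
    """Return the paths admitted under both limits (hypothetical semantics).

    files: iterable of (path, size_in_bytes) pairs, in assembly order.
    """
    admitted = []
    total = 0
    for path, size in files:
        if size > max_file_size:
            continue  # a single oversized file is excluded outright
        if total + size > max_total_size:
            continue  # admitting this file would exceed the overall budget
        admitted.append(path)
        total += size
    return admitted


files = [("small.py", 100), ("large_file.py", 5000), ("medium.py", 900)]
tight = select_within_budget(files, max_file_size=1000, max_total_size=950)
loose = select_within_budget(files, max_file_size=1000, max_total_size=2000)
```

A behavioral test would assert on the admitted set, not just on the shape of the JSON that reports it.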

Actual Behavior

The m5_acceptance.robot test file explicitly documents that most tests are "structural / plumbing" validations only:

**Structural vs. behavioural scope:** Tests in sections 1b–4 that use
``project context simulate`` or ``inspect`` are *structural / plumbing*
validations — they verify CLI execution, JSON serialization, and stored
configuration but do **not** exercise actual ACMS indexing or budget
enforcement because the ``ContextTierService`` is an in-memory singleton
that starts empty per CLI process.
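The effect described in that docstring can be reproduced with a small Python sketch. TierService below is a hypothetical stand-in, not the real ContextTierService; the only property it shares with it is the one the docstring states, namely that state lives in process memory. Each run_cli call starts a fresh interpreter, in the way separate CLI invocations would.

```python
import subprocess
import sys

# Hypothetical stand-in for an in-memory singleton service.
SERVICE_SOURCE = '''
class TierService:
    _instance = None

    def __init__(self):
        self.fragments = []

    @classmethod
    def get(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
'''


def run_cli(body):
    """Run `body` in a fresh interpreter, like one CLI invocation."""
    result = subprocess.run(
        [sys.executable, "-c", SERVICE_SOURCE + body],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# First "command": index one file, then report the fragment count.
index_count = run_cli(
    "svc = TierService.get(); svc.fragments.append('a.py'); print(len(svc.fragments))"
)
# Second "command": inspect. The singleton is rebuilt empty in the new process.
inspect_count = run_cli("print(len(TierService.get().fragments))")
```

Under these assumptions the second invocation always sees an empty service, which matches the zero-valued fragment_count, tier_metrics, and total_tokens reported in this issue.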

Specific limitations documented in the test:

  1. Context Scaling (10K files): fragment_count is always 0 — ACMS indexing pipeline not wired
  2. Budget Enforcement: max_file_size filtering not enforced in simulate command
  3. Context Inspect: tier_metrics always has zero-value counters
  4. Context Simulate: total_tokens is always 0

All 14 test cases in sections 1b-4 are tagged tdd_expected_fail and document that they test JSON schema correctness only, not actual ACMS behavior.

A separate TDD test file (tdd_acms_behavioral_validation.robot) captures the behavioral bugs (issue #1028), but its tests are also tagged tdd_expected_fail, so their expected failures are inverted and reported to CI as passes.
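The inversion can be modelled in a few lines of Python. This is a sketch of the general expected-failure pattern, not the project's actual CI hook; CaseResult and ci_outcome are hypothetical names, and only the tdd_expected_fail tag comes from the test files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    name: str
    passed: bool        # raw outcome of the test body
    tags: frozenset


def ci_outcome(result):
    """Outcome CI sees after expected-failure inversion (assumed behavior)."""
    if "tdd_expected_fail" in result.tags:
        return not result.passed  # a failing tagged test is reported as a pass
    return result.passed


behavioral = CaseResult(
    name="fragment_count is non-zero after indexing",
    passed=False,  # fails today because the pipeline is not wired
    tags=frozenset({"tdd_expected_fail"}),
)
suite_green = all(ci_outcome(r) for r in [behavioral])
```

With every behavioral test tagged this way, a green suite says only that the known failures are still failing.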

Code Location

  • robot/e2e/m5_acceptance.robot — lines 9-17 (suite documentation), lines 172-596 (structural-only tests)
  • robot/e2e/tdd_acms_behavioral_validation.robot — behavioral tests all tagged tdd_expected_fail

Impact

The M5 acceptance test suite provides no behavioral coverage of the ACMS v1 pipeline. The core M5 acceptance criteria (that projects with 10,000+ files index without timeout, that budget enforcement actually excludes oversized files, and that context analysis produces meaningful summaries) are never exercised against real ACMS behavior.

The tdd_expected_fail tags on both m5_acceptance.robot and tdd_acms_behavioral_validation.robot mean that CI passes even though the ACMS indexing pipeline is not wired. This creates a false sense of M5 completion.

Definition of Done

  1. Fix issue #1028 (ACMS indexing pipeline not wired into CLI)
  2. Once #1028 is fixed, update tdd_acms_behavioral_validation.robot to remove tdd_expected_fail tags and verify behavioral assertions pass
  3. Update m5_acceptance.robot to replace structural assertions with behavioral assertions:
    • fragment_count > 0 after indexing a project with Python files
    • total_tokens > 0 after indexing
    • large_file.py excluded when max_file_size < file size
    • 10K file project indexes within 600s with non-zero fragment count
  4. Remove tdd_expected_fail tags from m5_acceptance.robot tests once behavioral assertions pass
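The behavioral assertions in item 3 could be expressed roughly as follows. This Python sketch checks a parsed JSON report; fragment_count and total_tokens are the field names quoted in this issue, while the "files" key, the function name, and its parameters are assumptions made for illustration and may not match the real CLI output.

```python
def behavioral_failures(report, must_exclude=(), elapsed_s=None, timeout_s=600):
    """Return a list of human-readable failures; an empty list means pass."""
    failures = []
    if report.get("fragment_count", 0) <= 0:
        failures.append("fragment_count must be > 0 after indexing")
    if report.get("total_tokens", 0) <= 0:
        failures.append("total_tokens must be > 0 after indexing")
    # "files" is a hypothetical key listing the paths included in the context.
    leaked = sorted(set(report.get("files", [])) & set(must_exclude))
    if leaked:
        failures.append("files over max_file_size were included: " + ", ".join(leaked))
    if elapsed_s is not None and elapsed_s > timeout_s:
        failures.append(f"indexing took {elapsed_s:.0f}s, limit is {timeout_s}s")
    return failures


# Shape of today's structural-only output versus a wired pipeline (made-up values).
structural_only = {"fragment_count": 0, "total_tokens": 0, "files": []}
wired = {"fragment_count": 12, "total_tokens": 3400, "files": ["small.py"]}

today = behavioral_failures(structural_only)
after_fix = behavioral_failures(wired, must_exclude=["large_file.py"], elapsed_s=42.0)
```

A report that is schema-valid but all zeros fails these checks, which is the distinction the structural tests currently cannot make.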

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

Author
Owner

🏷️ Label compliance fix applied by backlog groomer (cycle 64)

Added missing labels: State/Verified, Type/Bug, Priority/High

This issue was missing the State/ and Type/ labels. Labels have been applied based on issue content (UAT-identified M5 acceptance test suite providing only structural/plumbing coverage with no behavioral ACMS validation).


Automated by CleverAgents Bot
Supervisor: Label Management | Agent: forgejo-label-manager

HAL9000 added this to the v3.6.0 milestone 2026-04-09 15:36:11 +00:00
Reference
cleveragents/cleveragents-core#6021