test(e2e): TDD behavioral test proving ACMS indexing pipeline is not wired into CLI (bug #1028) #1124

Merged
hurui200320 merged 3 commits from tdd/m5-acms-cli-indexing-pipeline-wiring into master 2026-03-25 11:13:37 +00:00
Member

Summary

This PR adds a Robot Framework E2E test suite (robot/e2e/tdd_acms_behavioral_validation.robot) that proves bug #1028 exists — the ACMS indexing pipeline is not wired into the CLI, so ContextTierService starts empty on every invocation.

Changes

  • New file: robot/e2e/tdd_acms_behavioral_validation.robot — 4 E2E test cases tagged tdd_expected_fail, tdd_bug, tdd_bug_1028, E2E
  • Modified: robot/e2e/common_e2e.resource — Extracted shared keywords (Run CLI, Extract JSON From Stdout, Link Resource To Project, Create Synthetic Codebase) from both m5_acceptance.robot and tdd_acms_behavioral_validation.robot to eliminate ~97 lines of duplication. Create Synthetic Codebase is parameterized with project_label.
  • Modified: robot/e2e/m5_acceptance.robot — Removed duplicated keywords now provided by common_e2e.resource.
  • CHANGELOG.md: Added entry under ## Unreleased documenting the TDD tests for #1029.

Test Cases

  • Test 1: Context Simulate Returns Non-Empty Tier Data — asserts fragment_count > 0 (fails, proving bug)
  • Test 2: Context Inspect Shows Indexed Resources — asserts tier metrics total > 0 (fails, proving bug)
  • Test 3: Budget Enforcement Excludes Oversized Files — asserts fragment_count > 0 with max_file_size policy (fails, proving bug). Includes TODO comment for post-fix exclusion assertion.
  • Test 4: Large Project Indexes Without Timeout — generates 10K+ files, asserts fragment_count > 0 (fails, proving bug). Includes explicit developer responsibility note about git-tracking of generated files.

All tests pass CI through result inversion by the tdd_expected_fail_listener.py — failing assertions (bug confirmed) are inverted to PASS.

Documentation & Robustness Improvements

  • Suite-level documentation expanded with known limitation section explaining tdd_expected_fail result inversion scope.
  • Suite setup error messages include (rc=${var.rc}). Check DEBUG logs above. for debugging consistency with m5_acceptance.robot.
  • Redundant exit code assertions after Run CLI calls removed (Run CLI already validates rc internally).
  • Run CLI keyword documentation includes API key security notes.
  • Budget enforcement test (Test 3) includes TODO(bugfix/...) comment for the bug-fix developer.
  • Large project test (Test 4) includes explicit NOTE assigning responsibility to the bug-fix developer to evaluate filesystem vs. git-tracked content indexing.

Review Fix Round

Addressed all findings from Luis's review (review #2691):

  • C1 (CRITICAL): Restored the #845 CorrectionService changelog entry (50 lines) accidentally deleted during merge conflict resolution.
  • L1 (LOW): Extracted ~97 lines of duplicated keywords into common_e2e.resource (parameterized Create Synthetic Codebase, shared Run CLI, Extract JSON From Stdout, Link Resource To Project).
  • M1 (MEDIUM): Strengthened NOTE comment about 10K files not being git-committed — explicit developer MUST responsibility.
  • L3 (LOW): Made suite setup error messages verbose with (rc=...). Check DEBUG logs above.
  • L2 (LOW): Removed redundant exit code assertions after Run CLI calls.
  • M2 (MEDIUM): Acknowledged — partial assertion is acceptable for TDD capture phase (TODO documents the gap).
  • I1 (INFO): Acknowledged — tdd_expected_fail masking is documented and mitigated.

Motivation

Per the Bug Fix Workflow in CONTRIBUTING.md, this TDD issue (#1029) is the prerequisite for bug fix #1028. The tests capture the buggy behavior so that when the fix is implemented, removing the tdd_expected_fail tag will cause the tests to pass normally.

Quality Gates

Gate Result
lint PASS
typecheck PASS
unit_tests PASS
integration_tests PASS
e2e_tests PASS (41 tests, 4 TDD)
coverage_report PASS (>=97%)

Closes #1029

## Summary This PR adds a Robot Framework E2E test suite (`robot/e2e/tdd_acms_behavioral_validation.robot`) that proves bug #1028 exists — the ACMS indexing pipeline is not wired into the CLI, so `ContextTierService` starts empty on every invocation. ### Changes - **New file**: `robot/e2e/tdd_acms_behavioral_validation.robot` — 4 E2E test cases tagged `tdd_expected_fail`, `tdd_bug`, `tdd_bug_1028`, `E2E` - **Modified**: `robot/e2e/common_e2e.resource` — Extracted shared keywords (`Run CLI`, `Extract JSON From Stdout`, `Link Resource To Project`, `Create Synthetic Codebase`) from both `m5_acceptance.robot` and `tdd_acms_behavioral_validation.robot` to eliminate ~97 lines of duplication. `Create Synthetic Codebase` is parameterized with `project_label`. - **Modified**: `robot/e2e/m5_acceptance.robot` — Removed duplicated keywords now provided by `common_e2e.resource`. - **CHANGELOG.md**: Added entry under `## Unreleased` documenting the TDD tests for #1029. ### Test Cases - **Test 1**: Context Simulate Returns Non-Empty Tier Data — asserts `fragment_count > 0` (fails, proving bug) - **Test 2**: Context Inspect Shows Indexed Resources — asserts tier metrics total > 0 (fails, proving bug) - **Test 3**: Budget Enforcement Excludes Oversized Files — asserts `fragment_count > 0` with `max_file_size` policy (fails, proving bug). Includes TODO comment for post-fix exclusion assertion. - **Test 4**: Large Project Indexes Without Timeout — generates 10K+ files, asserts `fragment_count > 0` (fails, proving bug). Includes explicit developer responsibility note about git-tracking of generated files. All tests pass CI through result inversion by the `tdd_expected_fail_listener.py` — failing assertions (bug confirmed) are inverted to PASS. ### Documentation & Robustness Improvements - Suite-level documentation expanded with **known limitation** section explaining `tdd_expected_fail` result inversion scope. - Suite setup error messages include `(rc=${var.rc}). Check DEBUG logs above.` for debugging consistency with `m5_acceptance.robot`. - Redundant exit code assertions after `Run CLI` calls removed (Run CLI already validates rc internally). - `Run CLI` keyword documentation includes API key security notes. - Budget enforcement test (Test 3) includes `TODO(bugfix/...)` comment for the bug-fix developer. - Large project test (Test 4) includes explicit NOTE assigning responsibility to the bug-fix developer to evaluate filesystem vs. git-tracked content indexing. ### Review Fix Round Addressed all findings from Luis's review (review #2691): - **C1 (CRITICAL)**: Restored the #845 `CorrectionService` changelog entry (50 lines) accidentally deleted during merge conflict resolution. - **L1 (LOW)**: Extracted ~97 lines of duplicated keywords into `common_e2e.resource` (parameterized `Create Synthetic Codebase`, shared `Run CLI`, `Extract JSON From Stdout`, `Link Resource To Project`). - **M1 (MEDIUM)**: Strengthened NOTE comment about 10K files not being git-committed — explicit developer MUST responsibility. - **L3 (LOW)**: Made suite setup error messages verbose with `(rc=...). Check DEBUG logs above.` - **L2 (LOW)**: Removed redundant exit code assertions after `Run CLI` calls. - **M2 (MEDIUM)**: Acknowledged — partial assertion is acceptable for TDD capture phase (TODO documents the gap). - **I1 (INFO)**: Acknowledged — `tdd_expected_fail` masking is documented and mitigated. ### Motivation Per the Bug Fix Workflow in CONTRIBUTING.md, this TDD issue (#1029) is the prerequisite for bug fix #1028. The tests capture the buggy behavior so that when the fix is implemented, removing the `tdd_expected_fail` tag will cause the tests to pass normally. ### Quality Gates | Gate | Result | |------|--------| | lint | PASS | | typecheck | PASS | | unit_tests | PASS | | integration_tests | PASS | | e2e_tests | PASS (41 tests, 4 TDD) | | coverage_report | PASS (>=97%) | Closes #1029
hurui200320 added this to the v3.4.0 milestone 2026-03-23 05:06:32 +00:00
hurui200320 force-pushed tdd/m5-acms-cli-indexing-pipeline-wiring from 39bfe4c3a1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m28s
CI / typecheck (pull_request) Successful in 4m1s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 4m5s
CI / security (pull_request) Successful in 4m26s
CI / unit_tests (pull_request) Successful in 7m3s
CI / docker (pull_request) Successful in 1m10s
CI / integration_tests (pull_request) Successful in 6m58s
CI / e2e_tests (pull_request) Successful in 10m3s
CI / coverage (pull_request) Successful in 11m4s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 46m20s
to a2f91cdb27
All checks were successful
CI / build (pull_request) Successful in 19s
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m41s
CI / typecheck (pull_request) Successful in 4m23s
CI / security (pull_request) Successful in 4m23s
CI / integration_tests (pull_request) Successful in 6m16s
CI / unit_tests (pull_request) Successful in 6m59s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 9m10s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 53m33s
2026-03-23 06:18:39 +00:00
Compare
hurui200320 force-pushed tdd/m5-acms-cli-indexing-pipeline-wiring from a2f91cdb27
All checks were successful
CI / build (pull_request) Successful in 19s
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m41s
CI / typecheck (pull_request) Successful in 4m23s
CI / security (pull_request) Successful in 4m23s
CI / integration_tests (pull_request) Successful in 6m16s
CI / unit_tests (pull_request) Successful in 6m59s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 9m10s
CI / coverage (pull_request) Successful in 11m8s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 53m33s
to 640e90a2a5
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m39s
CI / security (pull_request) Successful in 3m55s
CI / typecheck (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 6m48s
CI / integration_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 9s
CI / e2e_tests (pull_request) Successful in 8m19s
CI / coverage (pull_request) Successful in 11m34s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h0m28s
2026-03-24 05:41:37 +00:00
Compare
CoreRasurae left a comment

Code Review Report — PR #1124 (TDD Bug #1028 / Issue #1029)

Branch: tdd/m5-acms-cli-indexing-pipeline-wiring
Commit: 640e90a by Rui Hu
Review scope: Branch diff vs origin/master (2 files changed: CHANGELOG.md, robot/e2e/tdd_acms_behavioral_validation.robot)
Methodology: 3 iterative full-spectrum review cycles covering bug detection, test flaws, test coverage, performance, security, spec compliance, and code quality. Findings stabilised after cycle 2.


Positive Observations

  • All 4 test cases are correctly aligned with the issue #1029 specification (test case names, tag conventions, and behavioral assertions match exactly).
  • Tag convention (tdd_expected_fail, tdd_bug, tdd_bug_1028, E2E) follows CONTRIBUTING.md > TDD Bug Test Tags rules and will pass the listener's _validate_tdd_tags() checks.
  • The tdd_expected_fail_listener.py IS registered in the e2e_tests nox session (line 723 of noxfile.py), so result inversion will work correctly.
  • CLI commands used (project context simulate, project context inspect, project context set with --include-path, --max-file-size, --max-total-size) all match the specification command synopsis (lines 237–271 of docs/specification.md).
  • The known limitation of tdd_expected_fail masking setup failures is explicitly documented in the suite header and mitigated by m5_acceptance.robot.
  • Security: NO_COLOR=1 is set, stdout/stderr are not embedded in assertion messages, and API key logging risk is mitigated by DEBUG-level logging.
  • Commit message matches the issue metadata exactly, branch name matches, and ISSUES CLOSED: #1029 is correct per the TDD workflow.

Findings by Severity


CRITICAL — Bug Detection

C1. CHANGELOG regression: #845 CorrectionService entry accidentally deleted

  • File: CHANGELOG.md, lines 72–121 of merge-base content
  • Description: The branch removes 50 lines from the ## Unreleased section — the entire #845 CorrectionService subtree isolation entry. This entry documents numerous fixes (BFS cycle detection, dry-run enforcement, terminal-state guards, mode validation, status restoration, etc.) from PR #845 merged on 2026-03-19.
  • Evidence: git diff origin/master...HEAD -- CHANGELOG.md shows the merge-base CHANGELOG has 946 lines; the branch HEAD has 900 lines (946 − 50 deleted + 4 added = 900).
  • Root cause: Likely a merge conflict resolution error when adding the 4-line #1029 entry at the top of the Unreleased section. The #845 entry, which was adjacent to the conflict region, was dropped during resolution.
  • Impact: If merged, the #845 changelog documentation will be permanently lost from release notes.
  • Action required: Restore the #845 entry from origin/master's CHANGELOG.md before merging.

MEDIUM — Test Flaws

M1. Large Project test: 10K generated files are not committed to the git repo

  • File: robot/e2e/tdd_acms_behavioral_validation.robot, lines 317–338
  • Description: The "Large Project Indexes Without Timeout" test generates 10,000 .py files on the filesystem but does not git add + git commit them. The workspace resource is registered as git-checkout type (line 83). If the bug #1028 fix routes indexing through the git sandbox (which only exposes git-tracked content), these files will not be visible and the test will fail for the wrong reason (files not tracked vs. pipeline not wired).
  • Mitigating factors:
    • The current walk_and_index in repo_indexing_utils.py uses os.walk() (filesystem), not git ls-files, so this may not manifest.
    • Lines 331–338 document this risk with commented-out git add/commit commands ready to uncomment.
    • m5_acceptance.robot has the same pattern (neither suite commits generated files).
  • Recommendation: Consider uncommitting the git add/commit lines now (or adding a comment making it clear this is a deliberate risk acceptance for the TDD phase, and the bug-fix developer must evaluate it). Current phrasing ("If the ACMS indexing pipeline reads only git-tracked content...") is ambiguous about responsibility.

M2. Budget enforcement test is a partial assertion

  • File: tdd_acms_behavioral_validation.robot, lines 260–301
  • Description: The "Budget Enforcement Excludes Oversized Files" test only asserts fragment_count > 0. It does not verify that large_file.py (>1 KiB) is excluded from the fragment list while smaller files are included. The test title claims it tests exclusion, but the assertion only tests inclusion.
  • Mitigating factor: The TODO comment on lines 295–298 documents this gap and defers the second assertion to the bug-fix branch.
  • Recommendation: Acceptable for TDD capture phase. Ensure the bug-fix developer is aware this test needs a second assertion for the large_file.py exclusion check.

LOW — Code Quality

L1. ~97 lines of keyword/setup code duplicated from m5_acceptance.robot

  • File: tdd_acms_behavioral_validation.robot, keyword section (lines 46–167)
  • Description: Four keywords are character-for-character identical to m5_acceptance.robot: Run CLI (18 lines), Extract JSON From Stdout (15 lines), Link Resource To Project (6 lines), Suite Teardown (3 lines). Create Synthetic Codebase and Suite Setup differ only in trivial string literals (~55 lines). Total: ~97 duplicated lines out of ~105 keyword-definition lines.
  • Recommendation: Extract the common keywords into common_e2e.resource (which already exists and is imported by both files). The keywords can be parameterized with suite-specific names (resource name, workspace label) passed as arguments. This would reduce maintenance burden when CLI interfaces change.

L2. Redundant exit code assertions

  • File: tdd_acms_behavioral_validation.robot, lines 201, 238, 275, 341
  • Description: Should Be Equal As Integers ${result.rc} 0 is called immediately after Run CLI which already validates the exit code internally via Should Be Equal As Integers ${result.rc} ${expected_rc} (default 0). These assertions are always true if Run CLI didn't fail.
  • Impact: No functional issue; adds minor visual clutter.

L3. Suite setup error messages less verbose than m5_acceptance.robot

  • File: tdd_acms_behavioral_validation.robot, lines 66, 71, 73
  • Description: Error messages use msg=git init failed vs. m5_acceptance.robot's msg=git init failed (rc=${git_init.rc}). Check DEBUG logs above. The less-verbose messages omit the actual return code, making debugging harder if git commands fail in CI.
  • Recommendation: Add (rc=${variable.rc}) and "Check DEBUG logs" to match the m5_acceptance.robot pattern for consistency.

INFORMATIONAL — Known Limitations (documented, no action required)

I1. tdd_expected_fail listener masks unrelated failures

  • If suite setup fails or a CLI infrastructure error occurs, individual test failures are inverted to PASS by the listener, producing false-positive "bug confirmed" results.
  • Explicitly documented in suite header (lines 20–31) and test case comments (lines 170–182).
  • Mitigated by m5_acceptance.robot running the same CLI plumbing without tdd_expected_fail inversion.

Verdict

Request changes — the CHANGELOG regression (C1) must be fixed before merge. The #845 entry needs to be restored from origin/master. All other findings are medium/low severity and can be addressed at the reviewer's discretion.


Review performed using 3 iterative full-spectrum cycles (bug detection, test flaws, test coverage, performance, security, spec compliance, code quality). Findings stabilised after cycle 2 with no new issues found in cycle 3.

## Code Review Report — PR #1124 (TDD Bug #1028 / Issue #1029) **Branch:** `tdd/m5-acms-cli-indexing-pipeline-wiring` **Commit:** `640e90a` by Rui Hu **Review scope:** Branch diff vs `origin/master` (2 files changed: `CHANGELOG.md`, `robot/e2e/tdd_acms_behavioral_validation.robot`) **Methodology:** 3 iterative full-spectrum review cycles covering bug detection, test flaws, test coverage, performance, security, spec compliance, and code quality. Findings stabilised after cycle 2. --- ### Positive Observations - All 4 test cases are correctly aligned with the issue #1029 specification (test case names, tag conventions, and behavioral assertions match exactly). - Tag convention (`tdd_expected_fail`, `tdd_bug`, `tdd_bug_1028`, `E2E`) follows `CONTRIBUTING.md > TDD Bug Test Tags` rules and will pass the listener's `_validate_tdd_tags()` checks. - The `tdd_expected_fail_listener.py` IS registered in the `e2e_tests` nox session (line 723 of `noxfile.py`), so result inversion will work correctly. - CLI commands used (`project context simulate`, `project context inspect`, `project context set` with `--include-path`, `--max-file-size`, `--max-total-size`) all match the specification command synopsis (lines 237–271 of `docs/specification.md`). - The known limitation of `tdd_expected_fail` masking setup failures is explicitly documented in the suite header and mitigated by `m5_acceptance.robot`. - Security: `NO_COLOR=1` is set, stdout/stderr are not embedded in assertion messages, and API key logging risk is mitigated by DEBUG-level logging. - Commit message matches the issue metadata exactly, branch name matches, and `ISSUES CLOSED: #1029` is correct per the TDD workflow. --- ### Findings by Severity --- #### CRITICAL — Bug Detection **C1. CHANGELOG regression: #845 CorrectionService entry accidentally deleted** - **File:** `CHANGELOG.md`, lines 72–121 of merge-base content - **Description:** The branch removes **50 lines** from the `## Unreleased` section — the entire `#845` `CorrectionService` subtree isolation entry. This entry documents numerous fixes (BFS cycle detection, dry-run enforcement, terminal-state guards, mode validation, status restoration, etc.) from PR #845 merged on 2026-03-19. - **Evidence:** `git diff origin/master...HEAD -- CHANGELOG.md` shows the merge-base CHANGELOG has 946 lines; the branch HEAD has 900 lines (946 − 50 deleted + 4 added = 900). - **Root cause:** Likely a merge conflict resolution error when adding the 4-line `#1029` entry at the top of the Unreleased section. The `#845` entry, which was adjacent to the conflict region, was dropped during resolution. - **Impact:** If merged, the `#845` changelog documentation will be **permanently lost** from release notes. - **Action required:** Restore the `#845` entry from `origin/master`'s `CHANGELOG.md` before merging. --- #### MEDIUM — Test Flaws **M1. Large Project test: 10K generated files are not committed to the git repo** - **File:** `robot/e2e/tdd_acms_behavioral_validation.robot`, lines 317–338 - **Description:** The "Large Project Indexes Without Timeout" test generates 10,000 `.py` files on the filesystem but does **not** `git add` + `git commit` them. The workspace resource is registered as `git-checkout` type (line 83). If the bug #1028 fix routes indexing through the git sandbox (which only exposes git-tracked content), these files will not be visible and the test will fail for the **wrong reason** (files not tracked vs. pipeline not wired). - **Mitigating factors:** - The current `walk_and_index` in `repo_indexing_utils.py` uses `os.walk()` (filesystem), not `git ls-files`, so this may not manifest. - Lines 331–338 document this risk with commented-out `git add/commit` commands ready to uncomment. - `m5_acceptance.robot` has the same pattern (neither suite commits generated files). - **Recommendation:** Consider uncommitting the `git add/commit` lines now (or adding a comment making it clear this is a **deliberate** risk acceptance for the TDD phase, and the bug-fix developer **must** evaluate it). Current phrasing ("If the ACMS indexing pipeline reads only git-tracked content...") is ambiguous about responsibility. **M2. Budget enforcement test is a partial assertion** - **File:** `tdd_acms_behavioral_validation.robot`, lines 260–301 - **Description:** The "Budget Enforcement Excludes Oversized Files" test only asserts `fragment_count > 0`. It does **not** verify that `large_file.py` (>1 KiB) is excluded from the fragment list while smaller files are included. The test title claims it tests exclusion, but the assertion only tests inclusion. - **Mitigating factor:** The TODO comment on lines 295–298 documents this gap and defers the second assertion to the bug-fix branch. - **Recommendation:** Acceptable for TDD capture phase. Ensure the bug-fix developer is aware this test needs a second assertion for the `large_file.py` exclusion check. --- #### LOW — Code Quality **L1. ~97 lines of keyword/setup code duplicated from `m5_acceptance.robot`** - **File:** `tdd_acms_behavioral_validation.robot`, keyword section (lines 46–167) - **Description:** Four keywords are **character-for-character identical** to `m5_acceptance.robot`: `Run CLI` (18 lines), `Extract JSON From Stdout` (15 lines), `Link Resource To Project` (6 lines), `Suite Teardown` (3 lines). `Create Synthetic Codebase` and `Suite Setup` differ only in trivial string literals (~55 lines). Total: ~97 duplicated lines out of ~105 keyword-definition lines. - **Recommendation:** Extract the common keywords into `common_e2e.resource` (which already exists and is imported by both files). The keywords can be parameterized with suite-specific names (resource name, workspace label) passed as arguments. This would reduce maintenance burden when CLI interfaces change. **L2. Redundant exit code assertions** - **File:** `tdd_acms_behavioral_validation.robot`, lines 201, 238, 275, 341 - **Description:** `Should Be Equal As Integers ${result.rc} 0` is called immediately after `Run CLI` which already validates the exit code internally via `Should Be Equal As Integers ${result.rc} ${expected_rc}` (default 0). These assertions are always true if `Run CLI` didn't fail. - **Impact:** No functional issue; adds minor visual clutter. **L3. Suite setup error messages less verbose than `m5_acceptance.robot`** - **File:** `tdd_acms_behavioral_validation.robot`, lines 66, 71, 73 - **Description:** Error messages use `msg=git init failed` vs. `m5_acceptance.robot`'s `msg=git init failed (rc=${git_init.rc}). Check DEBUG logs above.` The less-verbose messages omit the actual return code, making debugging harder if git commands fail in CI. - **Recommendation:** Add `(rc=${variable.rc})` and "Check DEBUG logs" to match the `m5_acceptance.robot` pattern for consistency. --- #### INFORMATIONAL — Known Limitations (documented, no action required) **I1. `tdd_expected_fail` listener masks unrelated failures** - If suite setup fails or a CLI infrastructure error occurs, individual test failures are inverted to PASS by the listener, producing false-positive "bug confirmed" results. - Explicitly documented in suite header (lines 20–31) and test case comments (lines 170–182). - Mitigated by `m5_acceptance.robot` running the same CLI plumbing without `tdd_expected_fail` inversion. --- ### Verdict **Request changes** — the CHANGELOG regression (C1) **must** be fixed before merge. The #845 entry needs to be restored from `origin/master`. All other findings are medium/low severity and can be addressed at the reviewer's discretion. --- *Review performed using 3 iterative full-spectrum cycles (bug detection, test flaws, test coverage, performance, security, spec compliance, code quality). Findings stabilised after cycle 2 with no new issues found in cycle 3.*
CHANGELOG.md Outdated
@ -3,2 +3,4 @@
## Unreleased
- Added TDD bug-capture E2E tests for bug #1028 — ACMS indexing pipeline not
wired into CLI. Four Robot Framework E2E tests prove ContextTierService starts
Member

CRITICAL (C1): This diff removes the entire #845 CorrectionService changelog entry (50 lines). The merge-base (9e316b1) contains this entry at lines 72-121, but it is absent from the branch HEAD. This is likely a merge conflict resolution error — when adding the 4-line #1029 entry, the adjacent #845 block was accidentally dropped.

Action required: Restore the #845 entry from origin/master before merging. You can extract it with:

git show origin/master:CHANGELOG.md

and re-insert the block between the (#331) entry and the deferred physical resource types entry.

**CRITICAL (C1):** This diff removes the entire #845 CorrectionService changelog entry (50 lines). The merge-base (`9e316b1`) contains this entry at lines 72-121, but it is absent from the branch HEAD. This is likely a merge conflict resolution error — when adding the 4-line #1029 entry, the adjacent #845 block was accidentally dropped. **Action required:** Restore the #845 entry from `origin/master` before merging. You can extract it with: ``` git show origin/master:CHANGELOG.md ``` and re-insert the block between the `(#331)` entry and the deferred physical resource types entry.
@ -0,0 +63,4 @@
Create Synthetic Codebase ${ws}
# Initialize a git repository
${git_init}= Run Process git init cwd=${ws} timeout=60s on_timeout=kill
Should Be Equal As Integers ${git_init.rc} 0 msg=git init failed
Member

LOW (L3): Error message is msg=git init failed — less verbose than m5_acceptance.robot's equivalent msg=git init failed (rc=${git_init.rc}). Check DEBUG logs above. Consider adding the return code and log hint for debugging consistency.

**LOW (L3):** Error message is `msg=git init failed` — less verbose than `m5_acceptance.robot`'s equivalent `msg=git init failed (rc=${git_init.rc}). Check DEBUG logs above.` Consider adding the return code and log hint for debugging consistency.
@ -0,0 +116,4 @@
${large_content}= Evaluate "# auto-generated large file\\n" + ("x = 1\\n" * 250)
Create File ${base_dir}${/}large_file.py ${large_content}
Run CLI
Member

LOW (L1): This Run CLI keyword is character-for-character identical to the one in m5_acceptance.robot (lines 109-126). Along with Extract JSON From Stdout, Link Resource To Project, Create Synthetic Codebase, and the Suite Setup/Teardown, approximately 97 lines of keyword code are duplicated between the two files. Consider extracting the common keywords into common_e2e.resource (already imported by both files) with parameterized suite-specific names.

**LOW (L1):** This `Run CLI` keyword is character-for-character identical to the one in `m5_acceptance.robot` (lines 109-126). Along with `Extract JSON From Stdout`, `Link Resource To Project`, `Create Synthetic Codebase`, and the Suite Setup/Teardown, approximately **97 lines** of keyword code are duplicated between the two files. Consider extracting the common keywords into `common_e2e.resource` (already imported by both files) with parameterized suite-specific names.
@ -0,0 +257,4 @@
Should Be True ${total_indexed} > 0
... msg=Bug #1028: total indexed fragments is ${total_indexed} (expected > 0). tier_metrics: hot=${hot_count}, warm=${warm_count}, cold=${cold_count}. ACMS indexing pipeline is not wired into CLI.
Budget Enforcement Excludes Oversized Files
Member

MEDIUM (M2): This test title says "Excludes Oversized Files" but the assertion only checks fragment_count > 0 (that some files are indexed). It does not verify that large_file.py is actually excluded. The TODO on lines 295-298 documents this gap — ensure the bug-fix developer adds the exclusion assertion when wiring the pipeline.

**MEDIUM (M2):** This test title says "Excludes Oversized Files" but the assertion only checks `fragment_count > 0` (that *some* files are indexed). It does not verify that `large_file.py` is actually *excluded*. The TODO on lines 295-298 documents this gap — ensure the bug-fix developer adds the exclusion assertion when wiring the pipeline.
@ -0,0 +328,4 @@
${gen}= Run Process ${PYTHON} ${gen_script} ${scale_dir}
... timeout=120s on_timeout=kill
Should Be Equal As Integers ${gen.rc} 0 msg=10K file generation failed
# NOTE: The 10K generated files are written to the filesystem but NOT
Member

MEDIUM (M1): The 10K generated files are written to the filesystem but NOT committed to the workspace git repo. The resource is registered as git-checkout type (line 83). If the bug-fix routes indexing through the git sandbox, these untracked files won't be visible and the test will fail for the wrong reason.

The NOTE comment here is good and documents the risk, but consider making the responsibility assignment more explicit — e.g.: "The bug-fix developer MUST evaluate whether the fix uses filesystem or git-tracked content and uncomment these lines if needed."

**MEDIUM (M1):** The 10K generated files are written to the filesystem but NOT committed to the workspace git repo. The resource is registered as `git-checkout` type (line 83). If the bug-fix routes indexing through the git sandbox, these untracked files won't be visible and the test will fail for the wrong reason. The NOTE comment here is good and documents the risk, but consider making the responsibility assignment more explicit — e.g.: "The bug-fix developer MUST evaluate whether the fix uses filesystem or git-tracked content and uncomment these lines if needed."
hurui200320 force-pushed tdd/m5-acms-cli-indexing-pipeline-wiring from 640e90a2a5
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m39s
CI / security (pull_request) Successful in 3m55s
CI / typecheck (pull_request) Successful in 4m18s
CI / unit_tests (pull_request) Successful in 6m48s
CI / integration_tests (pull_request) Successful in 6m44s
CI / docker (pull_request) Successful in 9s
CI / e2e_tests (pull_request) Successful in 8m19s
CI / coverage (pull_request) Successful in 11m34s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h0m28s
to 5602c9a87c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m27s
CI / integration_tests (pull_request) Successful in 3m34s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m53s
CI / security (pull_request) Successful in 4m1s
CI / unit_tests (pull_request) Successful in 6m43s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 8m58s
CI / coverage (pull_request) Successful in 10m3s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 49m58s
2026-03-24 11:12:30 +00:00
Compare
Author
Member

Response to Review #2691 (Luis)

Thanks for the thorough review! All findings addressed in commit 5602c9a8. Summary:

C1 (CRITICAL) — CHANGELOG #845 entry deleted Fixed

Reset CHANGELOG.md to origin/master and re-added the #1029 entry. The #845 CorrectionService entry (50 lines) is now fully restored.

M1 (MEDIUM) — 10K files not committed to git Fixed

Replaced the ambiguous NOTE with an explicit responsibility assignment: "The bug-fix developer MUST evaluate whether the fix routes indexing through the git sandbox (which only exposes git-tracked content) or the filesystem."

M2 (MEDIUM) — Budget enforcement partial assertion ℹ️ Acknowledged

Agreed — the TODO comment already documents the gap, and the assertion is appropriate for the TDD capture phase. The bug-fix developer will add the large_file.py exclusion assertion when wiring the pipeline.

L1 (LOW) — ~97 lines duplicated from m5_acceptance.robot Fixed

Extracted Run CLI, Extract JSON From Stdout, Link Resource To Project, and Create Synthetic Codebase into common_e2e.resource. Create Synthetic Codebase is parameterized with ${project_label} for suite-specific literals. Both m5_acceptance.robot and tdd_acms_behavioral_validation.robot now use the shared keywords.

L2 (LOW) — Redundant exit code assertions Fixed

Removed all 12 instances of redundant Should Be Equal As Integers ${result.rc} 0 that followed Run CLI calls.

L3 (LOW) — Suite setup error messages less verbose Fixed

All git command error messages now include (rc=${var.rc}). Check DEBUG logs above. matching the m5_acceptance.robot pattern.

I1 (INFO) — tdd_expected_fail listener masks ℹ️ Acknowledged

Already documented. No action needed.

## Response to Review #2691 (Luis) Thanks for the thorough review! All findings addressed in commit `5602c9a8`. Summary: ### C1 (CRITICAL) — CHANGELOG #845 entry deleted ✅ Fixed Reset `CHANGELOG.md` to `origin/master` and re-added the `#1029` entry. The #845 `CorrectionService` entry (50 lines) is now fully restored. ### M1 (MEDIUM) — 10K files not committed to git ✅ Fixed Replaced the ambiguous NOTE with an explicit responsibility assignment: _"The bug-fix developer MUST evaluate whether the fix routes indexing through the git sandbox (which only exposes git-tracked content) or the filesystem."_ ### M2 (MEDIUM) — Budget enforcement partial assertion ℹ️ Acknowledged Agreed — the TODO comment already documents the gap, and the assertion is appropriate for the TDD capture phase. The bug-fix developer will add the `large_file.py` exclusion assertion when wiring the pipeline. ### L1 (LOW) — ~97 lines duplicated from m5_acceptance.robot ✅ Fixed Extracted `Run CLI`, `Extract JSON From Stdout`, `Link Resource To Project`, and `Create Synthetic Codebase` into `common_e2e.resource`. `Create Synthetic Codebase` is parameterized with `${project_label}` for suite-specific literals. Both `m5_acceptance.robot` and `tdd_acms_behavioral_validation.robot` now use the shared keywords. ### L2 (LOW) — Redundant exit code assertions ✅ Fixed Removed all 12 instances of redundant `Should Be Equal As Integers ${result.rc} 0` that followed `Run CLI` calls. ### L3 (LOW) — Suite setup error messages less verbose ✅ Fixed All git command error messages now include `(rc=${var.rc}). Check DEBUG logs above.` matching the `m5_acceptance.robot` pattern. ### I1 (INFO) — `tdd_expected_fail` listener masks ℹ️ Acknowledged Already documented. No action needed.
freemo approved these changes 2026-03-24 15:29:11 +00:00
Dismissed
freemo left a comment

Review: APPROVED

TDD E2E behavioral test proving ACMS indexing pipeline is not wired into CLI (bug #1028). Clean test structure with proper tags and clear documentation of what the test proves.

## Review: APPROVED TDD E2E behavioral test proving ACMS indexing pipeline is not wired into CLI (bug #1028). Clean test structure with proper tags and clear documentation of what the test proves.
hurui200320 force-pushed tdd/m5-acms-cli-indexing-pipeline-wiring from 5602c9a87c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 3m27s
CI / integration_tests (pull_request) Successful in 3m34s
CI / quality (pull_request) Successful in 3m42s
CI / typecheck (pull_request) Successful in 3m53s
CI / security (pull_request) Successful in 4m1s
CI / unit_tests (pull_request) Successful in 6m43s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 8m58s
CI / coverage (pull_request) Successful in 10m3s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 49m58s
to 90d18f06d1
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m40s
CI / build (pull_request) Failing after 12s
CI / typecheck (pull_request) Successful in 4m17s
CI / quality (pull_request) Successful in 3m52s
CI / security (pull_request) Successful in 5m17s
CI / e2e_tests (pull_request) Failing after 6m4s
CI / integration_tests (pull_request) Successful in 6m56s
CI / unit_tests (pull_request) Successful in 9m0s
CI / benchmark-regression (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
2026-03-25 05:39:39 +00:00
Compare
hurui200320 dismissed freemo's review 2026-03-25 05:39:39 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

fix: nox -s build
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 20s
CI / security (pull_request) Successful in 56s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m55s
CI / typecheck (pull_request) Successful in 3m56s
CI / e2e_tests (pull_request) Failing after 5m6s
CI / integration_tests (pull_request) Successful in 5m40s
CI / coverage (pull_request) Successful in 11m20s
CI / unit_tests (pull_request) Failing after 18m44s
CI / benchmark-regression (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / status-check (pull_request) Has been cancelled
f56f2052a6
hurui200320 scheduled this pull request to auto merge when all checks succeed 2026-03-25 05:50:53 +00:00
fix: switch to openai llm for e2e test since currently anthrpoic api has no balance
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 25s
CI / build (pull_request) Successful in 24s
CI / quality (pull_request) Successful in 42s
CI / typecheck (pull_request) Successful in 54s
CI / integration_tests (pull_request) Successful in 6m42s
CI / security (pull_request) Successful in 4m14s
CI / unit_tests (pull_request) Successful in 7m30s
CI / e2e_tests (pull_request) Successful in 7m45s
CI / docker (pull_request) Successful in 2m4s
CI / coverage (pull_request) Successful in 11m19s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-regression (pull_request) Successful in 54m1s
dc4c6e6280
hurui200320 deleted branch tdd/m5-acms-cli-indexing-pipeline-wiring 2026-03-25 11:13:38 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!1124
No description provided.