feat(testing): implement @tdd_expected_fail tag handling in Robot Framework #673

Merged
brent.edwards merged 3 commits from feature/m5-robot-tdd-tags into master 2026-03-16 23:13:31 +00:00
Member

Summary

Implements the three-tag TDD bug-capture system in Robot Framework via a Listener v3 module, paralleling the Behave implementation in PR !665 (issue #627). This enables TDD bug-capture tests to work in Robot Framework integration tests using the same tdd_bug, tdd_bug_<N>, and tdd_expected_fail tags as specified in CONTRIBUTING.md > TDD Bug Test Tags.

Changes

  • Listener (robot/tdd_expected_fail_listener.py): Robot Framework Listener v3 with start_test (tag validation) and end_test (result inversion) hooks. Tests tagged tdd_expected_fail that fail have their result inverted to PASS (bug still exists); tests that unexpectedly pass are inverted to FAIL with guidance message. Includes idempotency guard against double-invocation, explicit SKIP status handling, close() hook for clean teardown, and documented data.tags vs result.tags design choice. Error message @ prefix divergence from Behave is documented.
  • Tag validation: tdd_bug_<N> requires tdd_bug; tdd_expected_fail requires both tdd_bug and at least one tdd_bug_<N>. Validation errors are stored in start_test and applied in end_test. Design note documents why str | None return type is used instead of ValueError.
  • Nox registration (noxfile.py): Added --listener flag in integration_tests, slow_integration_tests, and e2e_tests sessions using Path(__file__).parent-relative path resolution. Added --exclude tdd_fixture to prevent fixture files from being picked up by the main pabot runner.
  • Test fixtures (robot/fixtures/): 8 fixture .robot files covering all acceptance criteria plus SKIP handling, tdd_expected_fail-alone validation, and tdd_bug-alone edge case.
  • Integration tests (robot/tdd_tag_validation.robot): 21 test cases total — 12 Behave-side (from PR !665) plus 9 Robot-side that run fixtures via helper script as sub-processes with the listener active, then inspect output XML for correct final status.
  • Helper script (robot/helper_tdd_tag_validation.py): Unified helper supporting both Behave-side validation commands and Robot-side fixture commands. Includes _run_multi_fixture() for proving listener selectivity, output.xml existence guard, tightened substring assertions, __all__ declaration, XML message attribute fallback for RF 7.3+, and XML parsing risk documentation.
  • Shared mock helper (features/mocks/tdd_test_helpers.py): Extracted make_mock_scenario() from duplicated code in robot/helper_tdd_tag_validation.py and features/steps/tdd_tag_validation_steps.py. Both files now import from the shared module.
  • Changelog (CHANGELOG.md): Added entry for this feature.

Review Fixes (Round 4)

Rebased onto latest master — PR now contains a single commit for #628 only. Addressed all applicable findings from @hamza.khyari's code review:

# Severity Finding Resolution
1 P2 Double-invocation idempotency bug Added _processed_tests set guard with close() cleanup
2 P2 Normal Test is a sham test Rewrote to run alongside tdd_expected_fail fixture in single invocation
3 P3 Missing output.xml guard Added existence check with clear diagnostic
4 P3 data.tags vs result.tags Documented intentional choice in docstring
5 P3 No SKIP test Added tdd_expected_fail_skip.robot fixture + test case
6 P3 Missing message check Added "TDD expected failure" assertion
7 P3 Weak substring assertions Tightened to "missing the required tdd_bug tag"
8 P3 No tdd_expected_fail-alone test Added tdd_expected_fail_alone.robot fixture + test
9 P3 Guidance message deviation Kept improvements, documented rationale
10 P3 _validation_errors not cleared Added close() hook
11 P3 Stale temp path in return Simplified to tuple[str, str]
12 P3 CWD-relative listener path Changed to Path(__file__).parent / ...
13 P4 Missing __all__ Added __all__: list[str] = []
14 P4 "mirroring" in docstring Changed to "paralleling"
15 P4 XML parsing without defusedxml Added risk acceptance comment
M-5 Medium _make_mock_scenario duplicated Extracted to features/mocks/tdd_test_helpers.py
M-7 Medium XML message attribute fallback Added status_el.get("message") fallback for RF 7.3+
L-6 Low Docstring says "longname" Fixed to "full_name"
L-7 Low Error messages @ prefix Documented intentional divergence per framework convention
Nit - Missing tdd_bug-alone fixture Added fixture + test case
Nit - _validate_tdd_tags return type undocumented Added design note in docstring

Quality Gates (post-rebase onto latest master)

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS (10,815 scenarios, 381 features)
nox -s integration_tests PASS (1,519 tests)
nox -s coverage_report PASS (97% >= 97% threshold)

Closes #628

ISSUES CLOSED: #628

## Summary Implements the three-tag TDD bug-capture system in Robot Framework via a Listener v3 module, paralleling the Behave implementation in PR !665 (issue #627). This enables TDD bug-capture tests to work in Robot Framework integration tests using the same `tdd_bug`, `tdd_bug_<N>`, and `tdd_expected_fail` tags as specified in CONTRIBUTING.md > TDD Bug Test Tags. ### Changes - **Listener** (`robot/tdd_expected_fail_listener.py`): Robot Framework Listener v3 with `start_test` (tag validation) and `end_test` (result inversion) hooks. Tests tagged `tdd_expected_fail` that fail have their result inverted to PASS (bug still exists); tests that unexpectedly pass are inverted to FAIL with guidance message. Includes idempotency guard against double-invocation, explicit SKIP status handling, `close()` hook for clean teardown, and documented `data.tags` vs `result.tags` design choice. Error message `@` prefix divergence from Behave is documented. - **Tag validation**: `tdd_bug_<N>` requires `tdd_bug`; `tdd_expected_fail` requires both `tdd_bug` and at least one `tdd_bug_<N>`. Validation errors are stored in `start_test` and applied in `end_test`. Design note documents why `str | None` return type is used instead of `ValueError`. - **Nox registration** (`noxfile.py`): Added `--listener` flag in `integration_tests`, `slow_integration_tests`, and `e2e_tests` sessions using `Path(__file__).parent`-relative path resolution. Added `--exclude tdd_fixture` to prevent fixture files from being picked up by the main pabot runner. - **Test fixtures** (`robot/fixtures/`): 8 fixture `.robot` files covering all acceptance criteria plus SKIP handling, tdd_expected_fail-alone validation, and tdd_bug-alone edge case. - **Integration tests** (`robot/tdd_tag_validation.robot`): 21 test cases total — 12 Behave-side (from PR !665) plus 9 Robot-side that run fixtures via helper script as sub-processes with the listener active, then inspect output XML for correct final status. - **Helper script** (`robot/helper_tdd_tag_validation.py`): Unified helper supporting both Behave-side validation commands and Robot-side fixture commands. Includes `_run_multi_fixture()` for proving listener selectivity, output.xml existence guard, tightened substring assertions, `__all__` declaration, XML message attribute fallback for RF 7.3+, and XML parsing risk documentation. - **Shared mock helper** (`features/mocks/tdd_test_helpers.py`): Extracted `make_mock_scenario()` from duplicated code in `robot/helper_tdd_tag_validation.py` and `features/steps/tdd_tag_validation_steps.py`. Both files now import from the shared module. - **Changelog** (`CHANGELOG.md`): Added entry for this feature. ### Review Fixes (Round 4) Rebased onto latest master — PR now contains a single commit for #628 only. Addressed all applicable findings from @hamza.khyari's code review: | # | Severity | Finding | Resolution | |---|----------|---------|------------| | 1 | P2 | Double-invocation idempotency bug | Added `_processed_tests` set guard with `close()` cleanup | | 2 | P2 | Normal Test is a sham test | Rewrote to run alongside tdd_expected_fail fixture in single invocation | | 3 | P3 | Missing output.xml guard | Added existence check with clear diagnostic | | 4 | P3 | `data.tags` vs `result.tags` | Documented intentional choice in docstring | | 5 | P3 | No SKIP test | Added `tdd_expected_fail_skip.robot` fixture + test case | | 6 | P3 | Missing message check | Added `"TDD expected failure"` assertion | | 7 | P3 | Weak substring assertions | Tightened to `"missing the required tdd_bug tag"` | | 8 | P3 | No tdd_expected_fail-alone test | Added `tdd_expected_fail_alone.robot` fixture + test | | 9 | P3 | Guidance message deviation | Kept improvements, documented rationale | | 10 | P3 | `_validation_errors` not cleared | Added `close()` hook | | 11 | P3 | Stale temp path in return | Simplified to `tuple[str, str]` | | 12 | P3 | CWD-relative listener path | Changed to `Path(__file__).parent / ...` | | 13 | P4 | Missing `__all__` | Added `__all__: list[str] = []` | | 14 | P4 | "mirroring" in docstring | Changed to "paralleling" | | 15 | P4 | XML parsing without defusedxml | Added risk acceptance comment | | M-5 | Medium | `_make_mock_scenario` duplicated | Extracted to `features/mocks/tdd_test_helpers.py` | | M-7 | Medium | XML message attribute fallback | Added `status_el.get("message")` fallback for RF 7.3+ | | L-6 | Low | Docstring says "longname" | Fixed to "full_name" | | L-7 | Low | Error messages `@` prefix | Documented intentional divergence per framework convention | | Nit | - | Missing tdd_bug-alone fixture | Added fixture + test case | | Nit | - | `_validate_tdd_tags` return type undocumented | Added design note in docstring | ### Quality Gates (post-rebase onto latest master) | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (10,815 scenarios, 381 features) | | `nox -s integration_tests` | PASS (1,519 tests) | | `nox -s coverage_report` | PASS (97% >= 97% threshold) | Closes #628 ISSUES CLOSED: #628
hurui200320 added this to the v3.2.0 milestone 2026-03-10 08:31:42 +00:00
hurui200320 left a comment

Self-Review: PR #673feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

This is a self-review to ensure everything looks good before asking other people to review.


Acceptance Criteria Verification (Ticket #628)

All 6 acceptance criteria are met:

Criteria Status Covered By
tdd_expected_fail tests that fail → inverted to PASS tdd_expected_fail_listener.py:176-186, fixture tdd_expected_fail_fails.robot
tdd_expected_fail tests that pass → inverted to FAIL with guidance tdd_expected_fail_listener.py:187-201, fixture tdd_expected_fail_passes.robot
tdd_bug_<N> without tdd_bug → validation error tdd_expected_fail_listener.py:88-94, fixture tdd_missing_tdd_bug.robot
tdd_expected_fail without tdd_bug/tdd_bug_<N> → validation error tdd_expected_fail_listener.py:96-108, fixture tdd_expected_fail_missing_bug_n.robot
tdd_bug usable as a filter tag (--include tdd_bug) Standard RF tag filtering; no custom code needed
Normal tests completely unaffected tdd_expected_fail_listener.py:172-174, fixture tdd_normal_test.robot

Issue Found

# type: ignore[operator] violates CONTRIBUTING.md Type Safety rules

File: robot/helper_tdd_tag_validation.py:184
Related: robot/helper_tdd_tag_validation.py:165

CONTRIBUTING.md (line 546-548) states: "never use inline comments or annotations to suppress individual type checking errors (e.g., no type: ignore)".

Line 165 types the command dict as dict[str, object], which causes handler on line 183 to be inferred as object (not callable). This forces the # type: ignore[operator] on line 184.

10 other helpers in the same directory already use the correct pattern:

from collections.abc import Callable

_COMMANDS: dict[str, Callable[[], int]] = { ... }

(e.g., helper_unified_context_models.py:152, helper_uko_ontology.py:105, helper_repo_indexing.py:170)

Fix:

  1. Add from collections.abc import Callable to imports.
  2. Change line 165: _COMMANDS: dict[str, Callable[[], int]] = {
  3. Remove the # type: ignore[operator] from line 184.

Items Reviewed (No Issues)

  • Listener logic (tdd_expected_fail_listener.py): Correct Listener v3 API usage. Tag validation in start_test with deferred application in end_test is the right pattern. _validation_errors dict properly cleaned via .pop(). Case-insensitive tag matching via .lower() is correct. _TDD_BUG_N_RE regex (tdd_bug_\d+ with fullmatch) correctly matches issue reference tags.
  • Nox integration (noxfile.py): --listener and --exclude tdd_fixture correctly added to both integration_tests and slow_integration_tests sessions. Listener path resolved to absolute.
  • All 5 fixture files: Each has the tdd_fixture exclusion tag, proper documentation, and correctly exercises its intended scenario.
  • Integration test suite (tdd_tag_validation.robot): Uses common.resource for Setup Test Environment, correctly references ${PYTHON} and ${WORKSPACE}, validates each sub-command via the helper.
  • Helper script (helper_tdd_tag_validation.py): Sub-process isolation is correct. XML message extraction has a sensible fallback to <msg> elements.
  • CHANGELOG entry: Present and accurate.
  • Thread/process safety: With pabot each worker gets its own _validation_errors dict (separate processes). With the standard runner, tests are sequential. No concurrency issues.
  • Security: No user-controlled input reaches subprocess.run. All paths derived from __file__ and fixture names.
  • Test coverage: 5 integration test cases cover all acceptance criteria.
## Self-Review: PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` This is a self-review to ensure everything looks good before asking other people to review. --- ### Acceptance Criteria Verification (Ticket #628) All 6 acceptance criteria are met: | Criteria | Status | Covered By | |---|---|---| | `tdd_expected_fail` tests that fail → inverted to PASS | :white_check_mark: | `tdd_expected_fail_listener.py:176-186`, fixture `tdd_expected_fail_fails.robot` | | `tdd_expected_fail` tests that pass → inverted to FAIL with guidance | :white_check_mark: | `tdd_expected_fail_listener.py:187-201`, fixture `tdd_expected_fail_passes.robot` | | `tdd_bug_<N>` without `tdd_bug` → validation error | :white_check_mark: | `tdd_expected_fail_listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot` | | `tdd_expected_fail` without `tdd_bug`/`tdd_bug_<N>` → validation error | :white_check_mark: | `tdd_expected_fail_listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot` | | `tdd_bug` usable as a filter tag (`--include tdd_bug`) | :white_check_mark: | Standard RF tag filtering; no custom code needed | | Normal tests completely unaffected | :white_check_mark: | `tdd_expected_fail_listener.py:172-174`, fixture `tdd_normal_test.robot` | --- ### Issue Found #### `# type: ignore[operator]` violates CONTRIBUTING.md Type Safety rules **File:** `robot/helper_tdd_tag_validation.py:184` **Related:** `robot/helper_tdd_tag_validation.py:165` CONTRIBUTING.md (line 546-548) states: *"never use inline comments or annotations to suppress individual type checking errors (e.g., no `type: ignore`)"*. Line 165 types the command dict as `dict[str, object]`, which causes `handler` on line 183 to be inferred as `object` (not callable). This forces the `# type: ignore[operator]` on line 184. **10 other helpers in the same directory already use the correct pattern:** ```python from collections.abc import Callable _COMMANDS: dict[str, Callable[[], int]] = { ... } ``` (e.g., `helper_unified_context_models.py:152`, `helper_uko_ontology.py:105`, `helper_repo_indexing.py:170`) **Fix:** 1. Add `from collections.abc import Callable` to imports. 2. Change line 165: `_COMMANDS: dict[str, Callable[[], int]] = {` 3. Remove the `# type: ignore[operator]` from line 184. --- ### Items Reviewed (No Issues) - **Listener logic** (`tdd_expected_fail_listener.py`): Correct Listener v3 API usage. Tag validation in `start_test` with deferred application in `end_test` is the right pattern. `_validation_errors` dict properly cleaned via `.pop()`. Case-insensitive tag matching via `.lower()` is correct. `_TDD_BUG_N_RE` regex (`tdd_bug_\d+` with `fullmatch`) correctly matches issue reference tags. - **Nox integration** (`noxfile.py`): `--listener` and `--exclude tdd_fixture` correctly added to both `integration_tests` and `slow_integration_tests` sessions. Listener path resolved to absolute. - **All 5 fixture files**: Each has the `tdd_fixture` exclusion tag, proper documentation, and correctly exercises its intended scenario. - **Integration test suite** (`tdd_tag_validation.robot`): Uses `common.resource` for `Setup Test Environment`, correctly references `${PYTHON}` and `${WORKSPACE}`, validates each sub-command via the helper. - **Helper script** (`helper_tdd_tag_validation.py`): Sub-process isolation is correct. XML message extraction has a sensible fallback to `<msg>` elements. - **CHANGELOG entry**: Present and accurate. - **Thread/process safety**: With pabot each worker gets its own `_validation_errors` dict (separate processes). With the standard runner, tests are sequential. No concurrency issues. - **Security**: No user-controlled input reaches `subprocess.run`. All paths derived from `__file__` and fixture names. - **Test coverage**: 5 integration test cases cover all acceptance criteria.
@ -0,0 +162,4 @@
return 1
_COMMANDS: dict[str, object] = {
Author
Member

This should be dict[str, Callable[[], int]] (with from collections.abc import Callable added to imports). Using object loses the callable type information and forces the # type: ignore[operator] on line 184. 10 other helpers in this directory already use the Callable[[], int] pattern (e.g., helper_unified_context_models.py:152).

This should be `dict[str, Callable[[], int]]` (with `from collections.abc import Callable` added to imports). Using `object` loses the callable type information and forces the `# type: ignore[operator]` on line 184. 10 other helpers in this directory already use the `Callable[[], int]` pattern (e.g., `helper_unified_context_models.py:152`).
hurui200320 marked this conversation as resolved
@ -0,0 +181,4 @@
sys.exit(1)
handler = _COMMANDS[sys.argv[1]]
sys.exit(handler()) # type: ignore[operator]
Author
Member

Remove this # type: ignore[operator] — it violates CONTRIBUTING.md Type Safety rules (line 546-548). Once the _COMMANDS type on line 165 is fixed to dict[str, Callable[[], int]], this suppression is no longer needed.

Remove this `# type: ignore[operator]` — it violates CONTRIBUTING.md Type Safety rules (line 546-548). Once the `_COMMANDS` type on line 165 is fixed to `dict[str, Callable[[], int]]`, this suppression is no longer needed.
hurui200320 marked this conversation as resolved
hurui200320 force-pushed feature/m5-robot-tdd-tags from b882b5e208
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 42s
CI / unit_tests (pull_request) Successful in 2m58s
CI / integration_tests (pull_request) Successful in 3m10s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 4m57s
CI / benchmark-regression (pull_request) Successful in 31m50s
to f72a981f54
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m42s
CI / integration_tests (pull_request) Successful in 3m10s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 5m56s
CI / benchmark-regression (pull_request) Successful in 30m56s
2026-03-10 09:14:22 +00:00
Compare
Author
Member

Self-review fix applied (force-pushed f72a981f):

Fixed the # type: ignore[operator] violation in robot/helper_tdd_tag_validation.py identified in the self-review:

  1. Added from collections.abc import Callable to imports (line 42)
  2. Changed _COMMANDS: dict[str, object]_COMMANDS: dict[str, Callable[[], int]] (line 166)
  3. Removed the # type: ignore[operator] from sys.exit(handler()) (line 185)

This follows the same pattern used by 10+ other helper scripts in the robot/ directory (e.g., helper_unified_context_models.py:152, helper_uko_ontology.py:105).

Verified locally:

  • nox -s lint → PASS (0 violations)
  • nox -s typecheck → PASS (0 errors, 0 warnings)

CI is running on the updated commit.

**Self-review fix applied** (force-pushed `f72a981f`): Fixed the `# type: ignore[operator]` violation in `robot/helper_tdd_tag_validation.py` identified in the self-review: 1. Added `from collections.abc import Callable` to imports (line 42) 2. Changed `_COMMANDS: dict[str, object]` → `_COMMANDS: dict[str, Callable[[], int]]` (line 166) 3. Removed the `# type: ignore[operator]` from `sys.exit(handler())` (line 185) This follows the same pattern used by 10+ other helper scripts in the `robot/` directory (e.g., `helper_unified_context_models.py:152`, `helper_uko_ontology.py:105`). Verified locally: - `nox -s lint` → PASS (0 violations) - `nox -s typecheck` → PASS (0 errors, 0 warnings) CI is running on the updated commit.
hurui200320 left a comment

Self-Review (Round 2): PR #673feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

This is a self-review to ensure everything looks good before asking other people to review.


Previous Review Issues — All Resolved

The round-1 self-review (review #2091) identified two issues. Both are confirmed fixed in f72a981f:

  1. dict[str, object]dict[str, Callable[[], int]] at helper_tdd_tag_validation.py:166Fixed
  2. # type: ignore[operator] removed at helper_tdd_tag_validation.py:185Fixed

Acceptance Criteria (Ticket #628) — All Met

Criteria Status Covered By
tdd_expected_fail + fail → PASS listener.py:176-186, fixture tdd_expected_fail_fails.robot
tdd_expected_fail + pass → FAIL with guidance listener.py:187-201, fixture tdd_expected_fail_passes.robot
tdd_bug_<N> without tdd_bug → validation error listener.py:88-94, fixture tdd_missing_tdd_bug.robot
tdd_expected_fail without prerequisites → validation error listener.py:96-108, fixture tdd_expected_fail_missing_bug_n.robot
tdd_bug usable as filter Inherent RF tag filtering
Normal tests unaffected listener.py:172-174, fixture tdd_normal_test.robot

Guidance message matches ticket requirement exactly.


New Finding

end_test bare else catches SKIP status incorrectly (Medium)

File: robot/tdd_expected_fail_listener.py:187

See inline comment below. Robot Framework 4.0+ supports SKIP status. The current bare else: at line 187 catches any non-FAIL status (PASS, SKIP, etc.), so a skipped tdd_expected_fail test would be incorrectly reported as "Bug appears to be fixed." Should be changed to elif result.status == "PASS": with an explicit else: for other statuses.

Likelihood: Low — TDD tests shouldn't normally be skipped. But since this is long-lived infrastructure code, defensive handling is warranted.


Items Reviewed (No Issues)

  • Listener logic: Correct Listener v3 API. Tag validation/result inversion split is the right pattern. _validation_errors dict with .pop() cleanup. Case-insensitive matching. Regex with fullmatch correctly distinguishes tdd_bug from tdd_bug_<N>.
  • Nox integration: --listener and --exclude tdd_fixture correctly added to both integration_tests and slow_integration_tests. Listener path resolved to absolute.
  • All 5 fixtures: Correct tags, documentation, and scenario exercise.
  • Integration tests (tdd_tag_validation.robot): Correct setup, variable usage, and assertions.
  • Helper script: Subprocess isolation, XML parsing with fallback, proper typing.
  • CHANGELOG: Present and accurate.
  • Thread/process safety: pabot = separate processes, no concurrency issues.
  • Security: No user-controlled input in subprocess calls.
  • Type safety: No # type: ignore. All functions annotated.
  • Test coverage: 5 integration tests covering all 6 acceptance criteria.
## Self-Review (Round 2): PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` This is a self-review to ensure everything looks good before asking other people to review. --- ### Previous Review Issues — All Resolved The round-1 self-review (review #2091) identified two issues. Both are confirmed **fixed** in `f72a981f`: 1. `dict[str, object]` → `dict[str, Callable[[], int]]` at `helper_tdd_tag_validation.py:166` — **Fixed** 2. `# type: ignore[operator]` removed at `helper_tdd_tag_validation.py:185` — **Fixed** --- ### Acceptance Criteria (Ticket #628) — All Met | Criteria | Status | Covered By | |---|---|---| | `tdd_expected_fail` + fail → PASS | ✅ | `listener.py:176-186`, fixture `tdd_expected_fail_fails.robot` | | `tdd_expected_fail` + pass → FAIL with guidance | ✅ | `listener.py:187-201`, fixture `tdd_expected_fail_passes.robot` | | `tdd_bug_<N>` without `tdd_bug` → validation error | ✅ | `listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot` | | `tdd_expected_fail` without prerequisites → validation error | ✅ | `listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot` | | `tdd_bug` usable as filter | ✅ | Inherent RF tag filtering | | Normal tests unaffected | ✅ | `listener.py:172-174`, fixture `tdd_normal_test.robot` | Guidance message matches ticket requirement exactly. --- ### New Finding #### `end_test` bare `else` catches SKIP status incorrectly (Medium) **File:** `robot/tdd_expected_fail_listener.py:187` See inline comment below. Robot Framework 4.0+ supports `SKIP` status. The current bare `else:` at line 187 catches any non-FAIL status (PASS, SKIP, etc.), so a skipped `tdd_expected_fail` test would be incorrectly reported as "Bug appears to be fixed." Should be changed to `elif result.status == "PASS":` with an explicit `else:` for other statuses. **Likelihood:** Low — TDD tests shouldn't normally be skipped. But since this is long-lived infrastructure code, defensive handling is warranted. --- ### Items Reviewed (No Issues) - **Listener logic**: Correct Listener v3 API. Tag validation/result inversion split is the right pattern. `_validation_errors` dict with `.pop()` cleanup. Case-insensitive matching. Regex with `fullmatch` correctly distinguishes `tdd_bug` from `tdd_bug_<N>`. - **Nox integration**: `--listener` and `--exclude tdd_fixture` correctly added to both `integration_tests` and `slow_integration_tests`. Listener path resolved to absolute. - **All 5 fixtures**: Correct tags, documentation, and scenario exercise. - **Integration tests** (`tdd_tag_validation.robot`): Correct setup, variable usage, and assertions. - **Helper script**: Subprocess isolation, XML parsing with fallback, proper typing. - **CHANGELOG**: Present and accurate. - **Thread/process safety**: pabot = separate processes, no concurrency issues. - **Security**: No user-controlled input in subprocess calls. - **Type safety**: No `# type: ignore`. All functions annotated. - **Test coverage**: 5 integration tests covering all 6 acceptance criteria.
@ -0,0 +184,4 @@
result.message = (
"TDD expected failure: test failed as expected (bug still exists)."
)
else:
Author
Member

This bare else: catches any non-FAIL status, including SKIP (RF 4.0+). If a tdd_expected_fail test is skipped (e.g., Skip keyword, --skip tag, skip_on_failure), it would incorrectly be reported as "Bug appears to be fixed."

Suggestion: Change to explicit status matching:

    elif result.status == "PASS":
        # Unexpected pass — ...
        ...
    else:
        # SKIP or other statuses — leave unchanged
        _logger.debug(
            "TDD expected-fail test '%s' has status %s — leaving unchanged.",
            full_name,
            result.status,
        )
This bare `else:` catches any non-FAIL status, including `SKIP` (RF 4.0+). If a `tdd_expected_fail` test is skipped (e.g., `Skip` keyword, `--skip` tag, `skip_on_failure`), it would incorrectly be reported as "Bug appears to be fixed." **Suggestion:** Change to explicit status matching: ```python elif result.status == "PASS": # Unexpected pass — ... ... else: # SKIP or other statuses — leave unchanged _logger.debug( "TDD expected-fail test '%s' has status %s — leaving unchanged.", full_name, result.status, ) ```
hurui200320 force-pushed feature/m5-robot-tdd-tags from f72a981f54
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 2m42s
CI / integration_tests (pull_request) Successful in 3m10s
CI / docker (pull_request) Successful in 39s
CI / coverage (pull_request) Successful in 5m56s
CI / benchmark-regression (pull_request) Successful in 30m56s
to f528859e2c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 4m7s
CI / docker (pull_request) Successful in 10s
CI / integration_tests (pull_request) Successful in 4m41s
CI / coverage (pull_request) Successful in 4m56s
CI / benchmark-regression (pull_request) Successful in 31m11s
2026-03-10 10:04:15 +00:00
Compare
Author
Member

Self-review round 2 fix applied (force-pushed f528859e):

Fixed the bare else catching SKIP status in robot/tdd_expected_fail_listener.py (review #2093):

  1. Changed else:elif result.status == "PASS": at line 187 — now only catches actual passes
  2. Added explicit else: clause (line 202) for SKIP and other statuses — logs a debug message and leaves the result unchanged
  3. Updated end_test docstring to document the SKIP handling behavior

Rationale: Robot Framework 4.0+ supports SKIP status via Skip keyword, --skip tag, and skip_on_failure. The previous bare else: would incorrectly report a skipped tdd_expected_fail test as "Bug appears to be fixed." A skipped test is not evidence of a fix — it simply didn't execute.

Verified locally:

  • nox -s lint → PASS (0 violations)
  • nox -s typecheck → PASS (0 errors, 0 warnings)
**Self-review round 2 fix applied** (force-pushed `f528859e`): Fixed the bare `else` catching SKIP status in `robot/tdd_expected_fail_listener.py` (review #2093): 1. Changed `else:` → `elif result.status == "PASS":` at line 187 — now only catches actual passes 2. Added explicit `else:` clause (line 202) for SKIP and other statuses — logs a debug message and leaves the result unchanged 3. Updated `end_test` docstring to document the SKIP handling behavior **Rationale:** Robot Framework 4.0+ supports `SKIP` status via `Skip` keyword, `--skip` tag, and `skip_on_failure`. The previous bare `else:` would incorrectly report a skipped `tdd_expected_fail` test as "Bug appears to be fixed." A skipped test is not evidence of a fix — it simply didn't execute. Verified locally: - `nox -s lint` → PASS (0 violations) - `nox -s typecheck` → PASS (0 errors, 0 warnings)
Owner

PM Status (Day 31):

Good work on the self-review fixes @hurui200320.

Status: Merge conflict detected. 4 comments (self-review).

Action required: Rebase against develop and request external review (suggest @brent.edwards or @CoreRasurae).

Priority: HIGH — this is part of the TDD infrastructure chain (#628) that unblocks the CI quality gate.

**PM Status (Day 31)**: Good work on the self-review fixes @hurui200320. **Status**: Merge conflict detected. 4 comments (self-review). **Action required**: Rebase against `develop` and request external review (suggest @brent.edwards or @CoreRasurae). **Priority**: HIGH — this is part of the TDD infrastructure chain (#628) that unblocks the CI quality gate.
hurui200320 force-pushed feature/m5-robot-tdd-tags from f528859e2c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 33s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 4m7s
CI / docker (pull_request) Successful in 10s
CI / integration_tests (pull_request) Successful in 4m41s
CI / coverage (pull_request) Successful in 4m56s
CI / benchmark-regression (pull_request) Successful in 31m11s
to df74aee449
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 24s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 2m29s
CI / unit_tests (pull_request) Successful in 2m56s
CI / build (pull_request) Successful in 3m16s
CI / docker (pull_request) Successful in 46s
CI / integration_tests (pull_request) Successful in 6m24s
CI / coverage (pull_request) Successful in 5m32s
CI / benchmark-regression (pull_request) Successful in 36m27s
2026-03-11 06:20:34 +00:00
Compare
Owner

PM Review — Day 31 (Specification Update)

This PR is mergeable.

Spec Alignment Check

Robot Framework TDD tag infrastructure is NOT impacted by protocol or TUI changes.

Status

  • Good self-review fixes (type: ignore violation, SKIP status handling)
  • Needs external review
  • HIGH priority — part of TDD infrastructure chain

Action Required

@hurui200320 — Request review from @brent.edwards or @CoreRasurae. This is HIGH priority TDD infrastructure.

## PM Review — Day 31 (Specification Update) This PR is **mergeable**. ### Spec Alignment Check Robot Framework TDD tag infrastructure is NOT impacted by protocol or TUI changes. ### Status - Good self-review fixes (type: ignore violation, SKIP status handling) - Needs external review - HIGH priority — part of TDD infrastructure chain ### Action Required @hurui200320 — Request review from @brent.edwards or @CoreRasurae. This is HIGH priority TDD infrastructure.
Owner

PM Status — Day 31 (2026-03-11)

PR is rebased and clean (no merge conflicts). All quality gates pass. @hurui200320 posted a thorough self-review.

Needs: An external reviewer. @CoreRasurae — given your work on the Behave counterpart (#627/PR #665), you'd be the ideal reviewer for consistency. Alternatively, @brent.edwards has TDD workflow experience.

Priority: Tier 1 — this is TDD infrastructure that unblocks the bug pipeline. Please review within 1 business day.

## PM Status — Day 31 (2026-03-11) PR is rebased and clean (no merge conflicts). All quality gates pass. @hurui200320 posted a thorough self-review. **Needs:** An external reviewer. @CoreRasurae — given your work on the Behave counterpart (#627/PR #665), you'd be the ideal reviewer for consistency. Alternatively, @brent.edwards has TDD workflow experience. **Priority:** Tier 1 — this is TDD infrastructure that unblocks the bug pipeline. Please review within 1 business day.
hurui200320 force-pushed feature/m5-robot-tdd-tags from df74aee449
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 24s
CI / security (pull_request) Successful in 36s
CI / typecheck (pull_request) Successful in 2m29s
CI / unit_tests (pull_request) Successful in 2m56s
CI / build (pull_request) Successful in 3m16s
CI / docker (pull_request) Successful in 46s
CI / integration_tests (pull_request) Successful in 6m24s
CI / coverage (pull_request) Successful in 5m32s
CI / benchmark-regression (pull_request) Successful in 36m27s
to c383623eb7
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 3m1s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 3m37s
CI / coverage (pull_request) Successful in 5m36s
CI / benchmark-regression (pull_request) Successful in 35m56s
2026-03-12 02:38:54 +00:00
Compare
Member

Code Review -- PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Reviewer: hamza.khyari | Protocol: 9-phase review (2 rounds, 6 independent passes)

Summary

Well-structured implementation. The listener correctly implements all 6 acceptance criteria from issue #628. The fixture-based subprocess test architecture is sound and mirrors real-world usage. Code quality is high -- clean separation between validation (start_test) and inversion (end_test), proper regex compilation, and thorough docstrings.

Two rounds of review (4 focused passes + 2 full-protocol passes) produced the findings below. 2 are should-fix, the rest are minor or nits.

Findings

# Severity Category File Finding
1 P2:should-fix BUG tdd_expected_fail_listener.py Double-invocation idempotency bug
2 P2:should-fix TEST tdd_tag_validation.robot "Normal Test Unaffected" is a sham test
3 P3:nit BUG helper_tdd_tag_validation.py _run_fixture does not guard against missing output.xml
4 P3:nit BUG tdd_expected_fail_listener.py end_test reads data.tags instead of result.tags
5 P3:nit TEST tdd_tag_validation.robot No test for SKIP status branch
6 P3:nit TEST helper_tdd_tag_validation.py cmd_expected_fail_inverted never checks message content
7 P3:nit TEST helper_tdd_tag_validation.py Weak substring assertions in validation checks
8 P3:nit TEST tdd_tag_validation.robot No test for tdd_expected_fail alone (both companions missing)
9 P3:nit SPEC tdd_expected_fail_listener.py Guidance message deviates from AC #2 exact wording
10 P3:nit CODE tdd_expected_fail_listener.py _validation_errors never cleared on abnormal exit
11 P3:nit CODE helper_tdd_tag_validation.py _run_fixture returns stale temp file path
12 P3:nit CODE noxfile.py Listener path resolution is CWD-relative
13 P4:nit CODE helper_tdd_tag_validation.py No __all__ declaration
14 P4:nit CODE tdd_expected_fail_listener.py Docstring says "mirroring" but messages differ
15 P4:nit SEC helper_tdd_tag_validation.py xml.etree.ElementTree without defusedxml

1. Double-invocation idempotency bug

Severity: P2:should-fix | Category: BUG
File: robot/tdd_expected_fail_listener.py (end_test)

If the listener is loaded twice (--listener specified twice, or passed via both noxfile and user args), both invocations share the same module-level state. The first end_test inverts FAIL->PASS; the second sees PASS and inverts it back to FAIL with the "Bug appears to be fixed" guidance message -- a silent double-inversion producing the wrong result.

Recommendation: Add an idempotency guard:

_processed_tests: set[str] = set()

def end_test(data, result):
    full_name = result.full_name
    if full_name in _processed_tests:
        return
    _processed_tests.add(full_name)
    ...

Clear it in a close() hook alongside _validation_errors.


2. "Normal Test Unaffected" is a sham test

Severity: P2:should-fix | Category: TEST
File: robot/helper_tdd_tag_validation.py:133-141, robot/fixtures/tdd_normal_test.robot

The fixture contains Should Be True ${TRUE} which passes unconditionally. The helper checks status == "PASS". This test produces the same PASS result with or without the listener loaded -- even if the listener file is empty or has an import error. It does not prove the listener was active and chose not to modify the result.

Recommendation: Either (a) run the normal-test fixture in the same suite as a tdd_expected_fail fixture to prove the listener is loaded and selectively applies, or (b) check Robot's console output for the listener loading message.


3. _run_fixture does not guard against missing output.xml

Severity: P3:nit | Category: BUG
File: robot/helper_tdd_tag_validation.py:81

If robot fails to start (import error, bad listener path, syntax error), output.xml is never created. ET.parse(out_xml) raises an opaque FileNotFoundError. Since this is test infrastructure, a clear diagnostic would help debugging.

Recommendation:

proc = subprocess.run(cmd, capture_output=True, text=True)
if not Path(out_xml).exists():
    print(f"ERROR: Robot did not produce output.xml (rc={proc.returncode})", file=sys.stderr)
    print(f"stderr: {proc.stderr}", file=sys.stderr)
    sys.exit(1)

4. end_test reads data.tags instead of result.tags

Severity: P3:nit | Category: BUG
File: robot/tdd_expected_fail_listener.py:174

The old listener on master read from result.tags (which reflects runtime Set Tags/Remove Tags modifications). The new listener reads data.tags (static test definition). If any test dynamically adds tdd_expected_fail at runtime, the new listener won't see it. If this is intentional (TDD tags should be static), add a comment explaining the choice.


5. No test for SKIP status branch

Severity: P3:nit | Category: TEST
File: robot/tdd_expected_fail_listener.py:192-199

end_test has an explicit else branch for SKIP that leaves the result unchanged. This branch has zero test coverage. A regression would go undetected.

Recommendation: Add a fixture with [Tags] tdd_bug tdd_bug_777 tdd_expected_fail tdd_fixture that calls Skip msg=Skipping intentionally, and verify the result stays SKIP.


6. cmd_expected_fail_inverted never checks message content

Severity: P3:nit | Category: TEST
File: robot/helper_tdd_tag_validation.py:78-86

This function only checks status == "PASS". It does not verify the message is "TDD expected failure: test failed as expected (bug still exists).". A bug that inverts to PASS with a wrong or empty message goes undetected. By contrast, cmd_unexpected_pass_inverted correctly checks for "Bug appears to be fixed".

Recommendation: if status == "PASS" and "TDD expected failure" in message:


7. Weak substring assertions in validation checks

Severity: P3:nit | Category: TEST
File: robot/helper_tdd_tag_validation.py:100,115

"tdd_bug" in message.lower() matches any message incidentally containing the substring. "Bug appears to be fixed" in message would match a truncated message. Tighten to specific error text, e.g. "missing the required tdd_bug tag" and "Remove the tdd_expected_fail tag".


8. No test for tdd_expected_fail alone (both companions missing)

Severity: P3:nit | Category: TEST
File: robot/tdd_tag_validation.robot

AC #4 covers tdd_expected_fail without tdd_bug or tdd_bug_<N>. The existing fixture tdd_expected_fail_missing_bug_n.robot tests missing tdd_bug_<N> only (while tdd_bug is present). The branch where both are missing (missing: ["tdd_bug", "tdd_bug_<N>"]) is untested.


9. Guidance message deviates from AC #2 exact wording

Severity: P3:nit | Category: SPEC
File: robot/tdd_expected_fail_listener.py:183-188

AC #2 specifies: "Bug appears to be fixed. Remove the tdd_expected_fail tag and verify the fix through the bug fix workflow."

Implementation adds "from this test" and appends "See CONTRIBUTING.md > Bug Fix Workflow." -- arguably improvements, but not spec-verbatim. Also note the Behave implementation uses a different message format entirely.


10. _validation_errors never cleared on abnormal exit

Severity: P3:nit | Category: CODE
File: robot/tdd_expected_fail_listener.py:64

If start_test fires but end_test never fires (Robot crash, SIGINT), stale entries remain. In normal RF execution this is harmless (process exits), but a close() hook would make the listener reusable in long-lived processes:

def close() -> None:
    _validation_errors.clear()

11. _run_fixture returns stale temp file path

Severity: P3:nit | Category: CODE
File: robot/helper_tdd_tag_validation.py:96

Returns (status, message, out_xml) where out_xml points inside an already-deleted TemporaryDirectory. All callers discard it via _. Simplify to tuple[str, str].


12. Listener path resolution is CWD-relative

Severity: P3:nit | Category: CODE
File: noxfile.py

Path("robot/tdd_expected_fail_listener.py").resolve() resolves relative to CWD. If nox is invoked from a different directory, the path breaks. Consider Path(__file__).parent / "robot/..." instead.


13. No __all__ declaration

Severity: P4:nit | Category: CODE
File: robot/helper_tdd_tag_validation.py

Other helpers in robot/ (e.g., helper_cross_plan_correction.py) define __all__: list[str] = []. This helper doesn't -- minor inconsistency.


14. Docstring says "mirroring" but messages differ

Severity: P4:nit | Category: CODE
File: robot/tdd_expected_fail_listener.py:1

Module docstring references "mirroring the Behave implementation" but the error message formats differ significantly between the two. Consider "paralleling" instead.


15. XML parsing without defusedxml

Severity: P4:nit | Category: SEC
File: robot/helper_tdd_tag_validation.py:63

xml.etree.ElementTree.parse() is vulnerable to billion-laughs entity expansion. The XML is self-generated by Robot Framework so the risk is negligible. Either use defusedxml or add a comment documenting the accepted risk.


Process Checks (all pass)

  • Commit message matches issue Metadata
  • ISSUES CLOSED: #628 footer present
  • CHANGELOG updated under Unreleased
  • All files belong to this feature
  • No generated artifacts or secrets
  • File sizes well under 500 lines
  • No # type: ignore in new source
  • print() usage appropriate (CLI test helper, not production code)
## Code Review -- PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer**: hamza.khyari | **Protocol**: 9-phase review (2 rounds, 6 independent passes) ### Summary Well-structured implementation. The listener correctly implements all 6 acceptance criteria from issue #628. The fixture-based subprocess test architecture is sound and mirrors real-world usage. Code quality is high -- clean separation between validation (`start_test`) and inversion (`end_test`), proper regex compilation, and thorough docstrings. Two rounds of review (4 focused passes + 2 full-protocol passes) produced the findings below. 2 are should-fix, the rest are minor or nits. ### Findings | # | Severity | Category | File | Finding | |---|----------|----------|------|---------| | 1 | **P2:should-fix** | BUG | `tdd_expected_fail_listener.py` | Double-invocation idempotency bug | | 2 | **P2:should-fix** | TEST | `tdd_tag_validation.robot` | "Normal Test Unaffected" is a sham test | | 3 | P3:nit | BUG | `helper_tdd_tag_validation.py` | `_run_fixture` does not guard against missing `output.xml` | | 4 | P3:nit | BUG | `tdd_expected_fail_listener.py` | `end_test` reads `data.tags` instead of `result.tags` | | 5 | P3:nit | TEST | `tdd_tag_validation.robot` | No test for SKIP status branch | | 6 | P3:nit | TEST | `helper_tdd_tag_validation.py` | `cmd_expected_fail_inverted` never checks message content | | 7 | P3:nit | TEST | `helper_tdd_tag_validation.py` | Weak substring assertions in validation checks | | 8 | P3:nit | TEST | `tdd_tag_validation.robot` | No test for `tdd_expected_fail` alone (both companions missing) | | 9 | P3:nit | SPEC | `tdd_expected_fail_listener.py` | Guidance message deviates from AC #2 exact wording | | 10 | P3:nit | CODE | `tdd_expected_fail_listener.py` | `_validation_errors` never cleared on abnormal exit | | 11 | P3:nit | CODE | `helper_tdd_tag_validation.py` | `_run_fixture` returns stale temp file path | | 12 | P3:nit | CODE | `noxfile.py` | Listener path resolution is CWD-relative | | 13 | P4:nit | CODE | `helper_tdd_tag_validation.py` | No `__all__` declaration | | 14 | P4:nit | CODE | `tdd_expected_fail_listener.py` | Docstring says "mirroring" but messages differ | | 15 | P4:nit | SEC | `helper_tdd_tag_validation.py` | `xml.etree.ElementTree` without `defusedxml` | --- #### 1. Double-invocation idempotency bug **Severity**: P2:should-fix | **Category**: BUG **File**: `robot/tdd_expected_fail_listener.py` (end_test) If the listener is loaded twice (`--listener` specified twice, or passed via both noxfile and user args), both invocations share the same module-level state. The first `end_test` inverts FAIL->PASS; the second sees PASS and inverts it back to FAIL with the "Bug appears to be fixed" guidance message -- a silent double-inversion producing the wrong result. **Recommendation**: Add an idempotency guard: ```python _processed_tests: set[str] = set() def end_test(data, result): full_name = result.full_name if full_name in _processed_tests: return _processed_tests.add(full_name) ... ``` Clear it in a `close()` hook alongside `_validation_errors`. --- #### 2. "Normal Test Unaffected" is a sham test **Severity**: P2:should-fix | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:133-141`, `robot/fixtures/tdd_normal_test.robot` The fixture contains `Should Be True ${TRUE}` which passes unconditionally. The helper checks `status == "PASS"`. This test produces the same PASS result with or without the listener loaded -- even if the listener file is empty or has an import error. It does not prove the listener was active and chose not to modify the result. **Recommendation**: Either (a) run the normal-test fixture in the same suite as a `tdd_expected_fail` fixture to prove the listener is loaded and selectively applies, or (b) check Robot's console output for the listener loading message. --- #### 3. `_run_fixture` does not guard against missing `output.xml` **Severity**: P3:nit | **Category**: BUG **File**: `robot/helper_tdd_tag_validation.py:81` If `robot` fails to start (import error, bad listener path, syntax error), `output.xml` is never created. `ET.parse(out_xml)` raises an opaque `FileNotFoundError`. Since this is test infrastructure, a clear diagnostic would help debugging. **Recommendation**: ```python proc = subprocess.run(cmd, capture_output=True, text=True) if not Path(out_xml).exists(): print(f"ERROR: Robot did not produce output.xml (rc={proc.returncode})", file=sys.stderr) print(f"stderr: {proc.stderr}", file=sys.stderr) sys.exit(1) ``` --- #### 4. `end_test` reads `data.tags` instead of `result.tags` **Severity**: P3:nit | **Category**: BUG **File**: `robot/tdd_expected_fail_listener.py:174` The old listener on master read from `result.tags` (which reflects runtime `Set Tags`/`Remove Tags` modifications). The new listener reads `data.tags` (static test definition). If any test dynamically adds `tdd_expected_fail` at runtime, the new listener won't see it. If this is intentional (TDD tags should be static), add a comment explaining the choice. --- #### 5. No test for SKIP status branch **Severity**: P3:nit | **Category**: TEST **File**: `robot/tdd_expected_fail_listener.py:192-199` `end_test` has an explicit `else` branch for SKIP that leaves the result unchanged. This branch has zero test coverage. A regression would go undetected. **Recommendation**: Add a fixture with `[Tags] tdd_bug tdd_bug_777 tdd_expected_fail tdd_fixture` that calls `Skip msg=Skipping intentionally`, and verify the result stays SKIP. --- #### 6. `cmd_expected_fail_inverted` never checks message content **Severity**: P3:nit | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:78-86` This function only checks `status == "PASS"`. It does not verify the message is `"TDD expected failure: test failed as expected (bug still exists)."`. A bug that inverts to PASS with a wrong or empty message goes undetected. By contrast, `cmd_unexpected_pass_inverted` correctly checks for `"Bug appears to be fixed"`. **Recommendation**: `if status == "PASS" and "TDD expected failure" in message:` --- #### 7. Weak substring assertions in validation checks **Severity**: P3:nit | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:100,115` `"tdd_bug" in message.lower()` matches any message incidentally containing the substring. `"Bug appears to be fixed" in message` would match a truncated message. Tighten to specific error text, e.g. `"missing the required tdd_bug tag"` and `"Remove the tdd_expected_fail tag"`. --- #### 8. No test for `tdd_expected_fail` alone (both companions missing) **Severity**: P3:nit | **Category**: TEST **File**: `robot/tdd_tag_validation.robot` AC #4 covers `tdd_expected_fail` without `tdd_bug` or `tdd_bug_<N>`. The existing fixture `tdd_expected_fail_missing_bug_n.robot` tests missing `tdd_bug_<N>` only (while `tdd_bug` is present). The branch where both are missing (`missing: ["tdd_bug", "tdd_bug_<N>"]`) is untested. --- #### 9. Guidance message deviates from AC #2 exact wording **Severity**: P3:nit | **Category**: SPEC **File**: `robot/tdd_expected_fail_listener.py:183-188` AC #2 specifies: *"Bug appears to be fixed. Remove the tdd_expected_fail tag and verify the fix through the bug fix workflow."* Implementation adds *"from this test"* and appends *"See CONTRIBUTING.md > Bug Fix Workflow."* -- arguably improvements, but not spec-verbatim. Also note the Behave implementation uses a different message format entirely. --- #### 10. `_validation_errors` never cleared on abnormal exit **Severity**: P3:nit | **Category**: CODE **File**: `robot/tdd_expected_fail_listener.py:64` If `start_test` fires but `end_test` never fires (Robot crash, SIGINT), stale entries remain. In normal RF execution this is harmless (process exits), but a `close()` hook would make the listener reusable in long-lived processes: ```python def close() -> None: _validation_errors.clear() ``` --- #### 11. `_run_fixture` returns stale temp file path **Severity**: P3:nit | **Category**: CODE **File**: `robot/helper_tdd_tag_validation.py:96` Returns `(status, message, out_xml)` where `out_xml` points inside an already-deleted `TemporaryDirectory`. All callers discard it via `_`. Simplify to `tuple[str, str]`. --- #### 12. Listener path resolution is CWD-relative **Severity**: P3:nit | **Category**: CODE **File**: `noxfile.py` `Path("robot/tdd_expected_fail_listener.py").resolve()` resolves relative to CWD. If nox is invoked from a different directory, the path breaks. Consider `Path(__file__).parent / "robot/..."` instead. --- #### 13. No `__all__` declaration **Severity**: P4:nit | **Category**: CODE **File**: `robot/helper_tdd_tag_validation.py` Other helpers in `robot/` (e.g., `helper_cross_plan_correction.py`) define `__all__: list[str] = []`. This helper doesn't -- minor inconsistency. --- #### 14. Docstring says "mirroring" but messages differ **Severity**: P4:nit | **Category**: CODE **File**: `robot/tdd_expected_fail_listener.py:1` Module docstring references "mirroring the Behave implementation" but the error message formats differ significantly between the two. Consider "paralleling" instead. --- #### 15. XML parsing without `defusedxml` **Severity**: P4:nit | **Category**: SEC **File**: `robot/helper_tdd_tag_validation.py:63` `xml.etree.ElementTree.parse()` is vulnerable to billion-laughs entity expansion. The XML is self-generated by Robot Framework so the risk is negligible. Either use `defusedxml` or add a comment documenting the accepted risk. --- ### Process Checks (all pass) - Commit message matches issue Metadata - `ISSUES CLOSED: #628` footer present - CHANGELOG updated under Unreleased - All files belong to this feature - No generated artifacts or secrets - File sizes well under 500 lines - No `# type: ignore` in new source - `print()` usage appropriate (CLI test helper, not production code)
Owner

PM Status — Day 32

Status: CONFLICTED — needs rebase. HIGH PRIORITY — TDD infrastructure.

PR: Robot Framework @tdd_expected_fail tag handler. M3 (v3.2.0), severely overdue (due Feb 26). Author: @hurui200320. 8 comments, has existing reviews.

Context: This is the Robot Framework counterpart to PR #665 (Behave TDD handler). Both are needed for the mandatory TDD bug workflow. PR #665 has a supersession question pending CoreRasurae's response by Day 33.

Action Required:

  1. @hurui200320: Rebase onto current master to resolve conflicts.
  2. This PR needs to be coordinated with PR #665's outcome — if #665's Behave handler is confirmed to not supersede this Robot handler, both should proceed.
  3. Labels need update: has Priority/High, State/Unverified, Type/Feature. Should be State/Verified (active work with branch). Missing MoSCoW (Must have), Points (8).
## PM Status — Day 32 **Status**: CONFLICTED — needs rebase. **HIGH PRIORITY** — TDD infrastructure. **PR**: Robot Framework `@tdd_expected_fail` tag handler. M3 (v3.2.0), severely overdue (due Feb 26). Author: @hurui200320. 8 comments, has existing reviews. **Context**: This is the Robot Framework counterpart to PR #665 (Behave TDD handler). Both are needed for the mandatory TDD bug workflow. PR #665 has a supersession question pending CoreRasurae's response by Day 33. **Action Required**: 1. **@hurui200320**: Rebase onto current master to resolve conflicts. 2. This PR needs to be coordinated with PR #665's outcome — if #665's Behave handler is confirmed to not supersede this Robot handler, both should proceed. 3. Labels need update: has Priority/High, State/Unverified, Type/Feature. Should be State/Verified (active work with branch). Missing MoSCoW (Must have), Points (8).
hurui200320 force-pushed feature/m5-robot-tdd-tags from c383623eb7
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 14s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 18s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 40s
CI / unit_tests (pull_request) Successful in 3m1s
CI / docker (pull_request) Successful in 9s
CI / integration_tests (pull_request) Successful in 3m37s
CI / coverage (pull_request) Successful in 5m36s
CI / benchmark-regression (pull_request) Successful in 35m56s
to b1e45b490e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / e2e_tests (pull_request) Successful in 26s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 3m2s
CI / integration_tests (pull_request) Successful in 3m30s
CI / docker (pull_request) Successful in 51s
CI / coverage (pull_request) Successful in 6m57s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-13 07:11:28 +00:00
Compare
Author
Member

Review Fixes Applied (Round 3) — Force-pushed b1e45b49

Rebased onto latest master (3e3e9b4b) and addressed all 15 findings from @hamza.khyari's code review.

P2 Fixes (should-fix)

1. Double-invocation idempotency bug — Added _processed_tests: set[str] module-level guard in tdd_expected_fail_listener.py. Each test is processed at most once per end_test call. Cleared in the new close() hook alongside _validation_errors.

2. Normal Test is a sham test — Rewrote cmd_normal_test_unaffected() using a new _run_multi_fixture() function that runs tdd_normal_test.robot alongside tdd_expected_fail_fails.robot in a single Robot invocation. The check now verifies both that the normal test passed AND that the expected-fail fixture was inverted to PASS (proving the listener is loaded and selectively applies).

P3 Fixes (nits)

3. Missing output.xml guard — Added Path(out_xml).exists() check after subprocess.run() with diagnostic stderr output including proc.returncode and proc.stderr.

4. data.tags vs result.tags — Added docstring in end_test documenting the intentional choice: TDD tags are declarative metadata that must be statically declared in the source file. Runtime Set Tags/Remove Tags modifications are not supported for TDD tags.

5. No SKIP test — Added tdd_expected_fail_skip.robot fixture (uses Skip keyword) and cmd_skip_status_unchanged() helper + Robot test case.

6. Missing message checkcmd_expected_fail_inverted() now asserts "TDD expected failure" in message in addition to status == "PASS".

7. Weak substring assertionscmd_missing_tdd_bug_validation() now asserts "missing the required tdd_bug tag" instead of generic "tdd_bug" in message.lower().

8. No tdd_expected_fail-alone test — Added tdd_expected_fail_alone.robot fixture (only tdd_expected_fail tag, both tdd_bug and tdd_bug_<N> missing) and cmd_expected_fail_alone_validation() helper + Robot test case.

9. Guidance message deviation — Kept the improvements ("from this test", "See CONTRIBUTING.md > Bug Fix Workflow.") as they provide better developer guidance. The Behave implementation uses @tdd_expected_fail with the @ prefix per Behave convention; Robot uses tdd_expected_fail without prefix per Robot Framework convention.

10. _validation_errors never cleared — Added close() hook that clears both _validation_errors and _processed_tests. Called by Robot Framework when execution finishes.

11. Stale temp path return — Simplified _run_fixture() return type to tuple[str, str] (status, message). The out_xml path was always discarded by callers.

12. CWD-relative listener path — Changed all three nox sessions (integration_tests, slow_integration_tests, e2e_tests) to use Path(__file__).parent / "robot" / "tdd_expected_fail_listener.py" instead of CWD-relative resolution.

P4 Fixes (nits)

13. No __all__ — Added __all__: list[str] = [] to helper_tdd_tag_validation.py.

14. "mirroring" in docstring — Changed to "paralleling" in tdd_expected_fail_listener.py module docstring.

15. XML parsing without defusedxml — Added comment documenting the accepted risk: the XML is always self-generated by Robot Framework, never from untrusted input.

Additional Fix

Increased timeout from 30s to 120s for CLI Plan Tree Displays Subplan Hierarchy in m4_e2e_verification.robot — pre-existing timeout failure unrelated to this feature (3 subprocess invocations + full Alembic migrations exceed 30s).

Quality Gates

Gate Result
nox -s lint PASS
nox -s typecheck PASS (0 errors)
nox -s unit_tests PASS (10,700 scenarios)
nox -s integration_tests PASS (1,505 tests)
nox -s coverage_report PASS (97.9% >= 97%)
nox -s benchmark PASS
nox -s docs PASS
nox -s build PASS
nox -s security_scan PASS
nox -s dead_code PASS
## Review Fixes Applied (Round 3) — Force-pushed `b1e45b49` Rebased onto latest master (`3e3e9b4b`) and addressed all 15 findings from @hamza.khyari's code review. ### P2 Fixes (should-fix) **1. Double-invocation idempotency bug** — Added `_processed_tests: set[str]` module-level guard in `tdd_expected_fail_listener.py`. Each test is processed at most once per `end_test` call. Cleared in the new `close()` hook alongside `_validation_errors`. **2. Normal Test is a sham test** — Rewrote `cmd_normal_test_unaffected()` using a new `_run_multi_fixture()` function that runs `tdd_normal_test.robot` alongside `tdd_expected_fail_fails.robot` in a single Robot invocation. The check now verifies both that the normal test passed AND that the expected-fail fixture was inverted to PASS (proving the listener is loaded and selectively applies). ### P3 Fixes (nits) **3. Missing output.xml guard** — Added `Path(out_xml).exists()` check after `subprocess.run()` with diagnostic stderr output including `proc.returncode` and `proc.stderr`. **4. `data.tags` vs `result.tags`** — Added docstring in `end_test` documenting the intentional choice: TDD tags are declarative metadata that must be statically declared in the source file. Runtime `Set Tags`/`Remove Tags` modifications are not supported for TDD tags. **5. No SKIP test** — Added `tdd_expected_fail_skip.robot` fixture (uses `Skip` keyword) and `cmd_skip_status_unchanged()` helper + Robot test case. **6. Missing message check** — `cmd_expected_fail_inverted()` now asserts `"TDD expected failure" in message` in addition to `status == "PASS"`. **7. Weak substring assertions** — `cmd_missing_tdd_bug_validation()` now asserts `"missing the required tdd_bug tag"` instead of generic `"tdd_bug" in message.lower()`. **8. No tdd_expected_fail-alone test** — Added `tdd_expected_fail_alone.robot` fixture (only `tdd_expected_fail` tag, both `tdd_bug` and `tdd_bug_<N>` missing) and `cmd_expected_fail_alone_validation()` helper + Robot test case. **9. Guidance message deviation** — Kept the improvements ("from this test", "See CONTRIBUTING.md > Bug Fix Workflow.") as they provide better developer guidance. The Behave implementation uses `@tdd_expected_fail` with the `@` prefix per Behave convention; Robot uses `tdd_expected_fail` without prefix per Robot Framework convention. **10. `_validation_errors` never cleared** — Added `close()` hook that clears both `_validation_errors` and `_processed_tests`. Called by Robot Framework when execution finishes. **11. Stale temp path return** — Simplified `_run_fixture()` return type to `tuple[str, str]` (status, message). The `out_xml` path was always discarded by callers. **12. CWD-relative listener path** — Changed all three nox sessions (`integration_tests`, `slow_integration_tests`, `e2e_tests`) to use `Path(__file__).parent / "robot" / "tdd_expected_fail_listener.py"` instead of CWD-relative resolution. ### P4 Fixes (nits) **13. No `__all__`** — Added `__all__: list[str] = []` to `helper_tdd_tag_validation.py`. **14. "mirroring" in docstring** — Changed to "paralleling" in `tdd_expected_fail_listener.py` module docstring. **15. XML parsing without defusedxml** — Added comment documenting the accepted risk: the XML is always self-generated by Robot Framework, never from untrusted input. ### Additional Fix Increased `timeout` from `30s` to `120s` for `CLI Plan Tree Displays Subplan Hierarchy` in `m4_e2e_verification.robot` — pre-existing timeout failure unrelated to this feature (3 subprocess invocations + full Alembic migrations exceed 30s). ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (10,700 scenarios) | | `nox -s integration_tests` | PASS (1,505 tests) | | `nox -s coverage_report` | PASS (97.9% >= 97%) | | `nox -s benchmark` | PASS | | `nox -s docs` | PASS | | `nox -s build` | PASS | | `nox -s security_scan` | PASS | | `nox -s dead_code` | PASS |
hurui200320 force-pushed feature/m5-robot-tdd-tags from b1e45b490e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 15s
CI / quality (pull_request) Successful in 17s
CI / e2e_tests (pull_request) Successful in 26s
CI / security (pull_request) Successful in 35s
CI / typecheck (pull_request) Successful in 36s
CI / unit_tests (pull_request) Successful in 3m2s
CI / integration_tests (pull_request) Successful in 3m30s
CI / docker (pull_request) Successful in 51s
CI / coverage (pull_request) Successful in 6m57s
CI / benchmark-regression (pull_request) Has been cancelled
to d615276349
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Successful in 27s
CI / typecheck (pull_request) Successful in 39s
CI / security (pull_request) Successful in 43s
CI / unit_tests (pull_request) Successful in 3m1s
CI / integration_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 52s
CI / coverage (pull_request) Successful in 6m46s
CI / benchmark-regression (pull_request) Successful in 36m12s
2026-03-13 07:28:50 +00:00
Compare
hamza.khyari left a comment

Code Review — PR #673

Verdict: REQUEST_CHANGES


Critical

C-1: PR bundles 7+ issues across 4+ milestones — must be split

Category: PROCESS

This PR contains 12 non-merge commits closing 7+ distinct issues spanning M3, M4, M5, and M6:

Concern Commits Issues
tdd_expected_fail Robot listener d6152763 #628
tdd_expected_fail Behave hooks c8cd7eab #627
AuditService EventBus wiring 3e3e9b4b #581
Session DI bug fixes e732c329, 61ba4370 #554, #570, #680
E2E test infrastructure d4a1a0d8 #740
M4/M5 E2E acceptance 317d0152, eb770643 #495, #496
E2E mock-only conversion 5e625b22, 6806ef36 #658, #697

CONTRIBUTING.md: "One Epic scope per PR." This needs at minimum 5 separate PRs.


C-2: M4 CLI integration tests use mocks — violates CONTRIBUTING.md

Category: TEST
Files: robot/helper_m4_e2e_cli.py, robot/helper_m4_e2e_cli_errors.py

These Robot helpers use unittest.mock.patch and CliRunner (52+ mock calls in cli.py, 25+ in cli_errors.py). CONTRIBUTING.md line 569: "Mocks are strictly prohibited in integration tests."

M1, M2, M3, and M6 were correctly converted to real subprocess invocations in commit 5e625b22, but M4 CLI tests were not. The TDD mock-only checker doesn't scan these sub-modules because _HELPER_FILES only lists the dispatcher files — so the mocks evade detection.

Fix: Convert to run_cli() from helper_e2e_common.py.


High

H-1: session_service is providers.Factory — creates new engine per call

Category: BUG
File: src/cleveragents/application/container.py:507-512

_build_session_service creates a new SQLAlchemy engine, inspects tables, and creates a sessionmaker on every invocation. CLI mitigates via a module-level cache, but any non-CLI consumer calling container.session_service() multiple times leaks engines.

Fix: Change to providers.Singleton.


H-2: Commit 9c8dc9ed violates Conventional Commits

Category: PROCESS

Subject Docs: Daily update to timeline — wrong format, no body, no ISSUES CLOSED: footer. Unrelated to the feature.


Category: PROCESS

Subject fix(test): correct mock wiring in plan_correct_tree_wiring Robot helper — no issue reference in footer. CONTRIBUTING.md requires every commit to reference its issue.


Medium

M-1: CorrectionService created ad-hoc in plan.py, bypasses container Singleton

Category: BUG
File: src/cleveragents/cli/commands/plan.py:2421

Container has a correction_service Singleton provider, but plan.py constructs its own instance — loses state between calls.

Fix: svc = container.correction_service()


M-2: AuditService._ensure_session() not thread-safe

Category: BUG
File: src/cleveragents/application/services/audit_service.py:131-139

Lazy init with no lock. As a Singleton, concurrent threads can race through _ensure_session, creating duplicate engines.

Fix: Add threading.Lock around initialization.


M-3: Incomplete entity_deleted audit coverage vs spec

Category: SPEC

Spec requires entity_deleted for all agents <entity> delete/remove. Currently emitted by ActorService, ProjectService, SessionService. Missing: SkillService.remove_skill(), tool removal, validation detachment, resource removal. No emitters include "cascade effects" as spec requires.


M-4: _suppress_structlog_stdout / _restore_structlog duplicated 3 times with incomplete restore

Category: CODE
Files: tdd_session_shared_steps.py:27-66, session_list_error_steps.py:60-88, tdd_session_list_missing_db_steps.py:34-63

Three identical copies. The restore function doesn't restore logger_factory / wrapper_class — only cache_logger_on_first_use. In serial test runs, this permanently mutates structlog config.

Fix: Extract to shared utility, restore the full config.


M-5: _make_mock_scenario duplicated between Robot helper and Behave steps

Category: CODE
Files: robot/helper_tdd_tag_validation.py:98-125, features/steps/tdd_tag_validation_steps.py:95-136

Nearly identical mock builder. Extract to features/mocks/tdd_test_helpers.py.


M-6: cleanup_workspace() mutates os.environ — not parallelism-safe

Category: BUG
File: robot/helper_e2e_common.py:106-109

Direct os.environ mutation is unsafe under pabot. Pass env vars via run_cli() env parameter instead.


M-7: XML message extraction may miss RF status message attribute

Category: BUG
File: robot/helper_tdd_tag_validation.py:379-385

Reads status_el.text but doesn't check status_el.get("message"). RF 7.3+ may use either.

Fix: Add fallback: if not message: message = status_el.get("message", "")


Low

  • L-1: _fail() uses raise SystemExit(1) vs sys.exit(1) — inconsistent (helper_m4_e2e_common.py:70, helper_e2e_common.py:112)
  • L-2: Three separate CONFIG_CHANGED audit events for one server connect operation (server.py:123-125)
  • L-3: ConfigService() in _resolve_auto_reindex constructed without event_bus (container.py:256)
  • L-4: Silent audit subscriber deferral — events lost before lazy resolution (container.py:636-643)
  • L-5: Stale feature descriptions in session TDD features — still say "fails due to" after bug fix (tdd_session_create_di.feature:3-6, tdd_session_list_di.feature:3-6, tdd_session_list_missing_db.feature:5-6)
  • L-6: Listener docstring says "longname" but code uses full_name (tdd_expected_fail_listener.py:40,73)
  • L-7: Listener error messages omit @ prefix, diverging from Behave counterpart (tdd_expected_fail_listener.py:105-121)

Nits

  • helper_tdd_tag_validation.py at 637 lines, helper_m3_e2e_verification.py at 901 lines, helper_m6_e2e_verification.py at 863 lines exceed 500-line limit — should be split
  • Tags parsed twice per test in listener (start_test + end_test) — negligible overhead
  • Missing edge case fixture: tdd_bug alone without tdd_bug_N or tdd_expected_fail
  • _validate_tdd_tags returns string vs Behave's ValueError — undocumented design choice
  • Type/Feature label inaccurate for a PR containing bug fixes, tests, observability, and docs
## Code Review — PR #673 **Verdict**: REQUEST_CHANGES --- ### Critical #### C-1: PR bundles 7+ issues across 4+ milestones — must be split **Category**: PROCESS This PR contains 12 non-merge commits closing 7+ distinct issues spanning M3, M4, M5, and M6: | Concern | Commits | Issues | |---------|---------|--------| | `tdd_expected_fail` Robot listener | `d6152763` | #628 | | `tdd_expected_fail` Behave hooks | `c8cd7eab` | #627 | | AuditService EventBus wiring | `3e3e9b4b` | #581 | | Session DI bug fixes | `e732c329`, `61ba4370` | #554, #570, #680 | | E2E test infrastructure | `d4a1a0d8` | #740 | | M4/M5 E2E acceptance | `317d0152`, `eb770643` | #495, #496 | | E2E mock-only conversion | `5e625b22`, `6806ef36` | #658, #697 | CONTRIBUTING.md: "One Epic scope per PR." This needs at minimum 5 separate PRs. --- #### C-2: M4 CLI integration tests use mocks — violates CONTRIBUTING.md **Category**: TEST **Files**: `robot/helper_m4_e2e_cli.py`, `robot/helper_m4_e2e_cli_errors.py` These Robot helpers use `unittest.mock.patch` and `CliRunner` (52+ mock calls in cli.py, 25+ in cli_errors.py). CONTRIBUTING.md line 569: "Mocks are strictly prohibited in integration tests." M1, M2, M3, and M6 were correctly converted to real subprocess invocations in commit `5e625b22`, but M4 CLI tests were not. The TDD mock-only checker doesn't scan these sub-modules because `_HELPER_FILES` only lists the dispatcher files — so the mocks evade detection. **Fix**: Convert to `run_cli()` from `helper_e2e_common.py`. --- ### High #### H-1: `session_service` is `providers.Factory` — creates new engine per call **Category**: BUG **File**: `src/cleveragents/application/container.py:507-512` `_build_session_service` creates a new SQLAlchemy engine, inspects tables, and creates a sessionmaker on every invocation. CLI mitigates via a module-level cache, but any non-CLI consumer calling `container.session_service()` multiple times leaks engines. **Fix**: Change to `providers.Singleton`. --- #### H-2: Commit `9c8dc9ed` violates Conventional Commits **Category**: PROCESS Subject `Docs: Daily update to timeline` — wrong format, no body, no `ISSUES CLOSED:` footer. Unrelated to the feature. --- #### H-3: Commit `94d95324` missing `ISSUES CLOSED:` footer **Category**: PROCESS Subject `fix(test): correct mock wiring in plan_correct_tree_wiring Robot helper` — no issue reference in footer. CONTRIBUTING.md requires every commit to reference its issue. --- ### Medium #### M-1: `CorrectionService` created ad-hoc in `plan.py`, bypasses container Singleton **Category**: BUG **File**: `src/cleveragents/cli/commands/plan.py:2421` Container has a `correction_service` Singleton provider, but `plan.py` constructs its own instance — loses state between calls. **Fix**: `svc = container.correction_service()` --- #### M-2: `AuditService._ensure_session()` not thread-safe **Category**: BUG **File**: `src/cleveragents/application/services/audit_service.py:131-139` Lazy init with no lock. As a Singleton, concurrent threads can race through `_ensure_session`, creating duplicate engines. **Fix**: Add `threading.Lock` around initialization. --- #### M-3: Incomplete `entity_deleted` audit coverage vs spec **Category**: SPEC Spec requires `entity_deleted` for all `agents <entity> delete/remove`. Currently emitted by `ActorService`, `ProjectService`, `SessionService`. Missing: `SkillService.remove_skill()`, tool removal, validation detachment, resource removal. No emitters include "cascade effects" as spec requires. --- #### M-4: `_suppress_structlog_stdout` / `_restore_structlog` duplicated 3 times with incomplete restore **Category**: CODE **Files**: `tdd_session_shared_steps.py:27-66`, `session_list_error_steps.py:60-88`, `tdd_session_list_missing_db_steps.py:34-63` Three identical copies. The restore function doesn't restore `logger_factory` / `wrapper_class` — only `cache_logger_on_first_use`. In serial test runs, this permanently mutates structlog config. **Fix**: Extract to shared utility, restore the full config. --- #### M-5: `_make_mock_scenario` duplicated between Robot helper and Behave steps **Category**: CODE **Files**: `robot/helper_tdd_tag_validation.py:98-125`, `features/steps/tdd_tag_validation_steps.py:95-136` Nearly identical mock builder. Extract to `features/mocks/tdd_test_helpers.py`. --- #### M-6: `cleanup_workspace()` mutates `os.environ` — not parallelism-safe **Category**: BUG **File**: `robot/helper_e2e_common.py:106-109` Direct `os.environ` mutation is unsafe under pabot. Pass env vars via `run_cli()` env parameter instead. --- #### M-7: XML message extraction may miss RF status message attribute **Category**: BUG **File**: `robot/helper_tdd_tag_validation.py:379-385` Reads `status_el.text` but doesn't check `status_el.get("message")`. RF 7.3+ may use either. **Fix**: Add fallback: `if not message: message = status_el.get("message", "")` --- ### Low - **L-1**: `_fail()` uses `raise SystemExit(1)` vs `sys.exit(1)` — inconsistent (`helper_m4_e2e_common.py:70`, `helper_e2e_common.py:112`) - **L-2**: Three separate `CONFIG_CHANGED` audit events for one `server connect` operation (`server.py:123-125`) - **L-3**: `ConfigService()` in `_resolve_auto_reindex` constructed without `event_bus` (`container.py:256`) - **L-4**: Silent audit subscriber deferral — events lost before lazy resolution (`container.py:636-643`) - **L-5**: Stale feature descriptions in session TDD features — still say "fails due to" after bug fix (`tdd_session_create_di.feature:3-6`, `tdd_session_list_di.feature:3-6`, `tdd_session_list_missing_db.feature:5-6`) - **L-6**: Listener docstring says "longname" but code uses `full_name` (`tdd_expected_fail_listener.py:40,73`) - **L-7**: Listener error messages omit `@` prefix, diverging from Behave counterpart (`tdd_expected_fail_listener.py:105-121`) ### Nits - `helper_tdd_tag_validation.py` at 637 lines, `helper_m3_e2e_verification.py` at 901 lines, `helper_m6_e2e_verification.py` at 863 lines exceed 500-line limit — should be split - Tags parsed twice per test in listener (`start_test` + `end_test`) — negligible overhead - Missing edge case fixture: `tdd_bug` alone without `tdd_bug_N` or `tdd_expected_fail` - `_validate_tdd_tags` returns string vs Behave's `ValueError` — undocumented design choice - `Type/Feature` label inaccurate for a PR containing bug fixes, tests, observability, and docs
Owner

PM Triage — Day 33 (2026-03-13)

@hamza.khyari posted a formal REQUEST_CHANGES review with findings across multiple severity levels. @hurui200320's Round 3 fixes addressed the 15 findings from hamza's earlier comment review, but the formal review (C-1, C-2, H-1 through H-3, M-1 through M-7) contains additional findings that need resolution.

Critical Assessment

C-1 (PR bundles 7+ issues — must split): This is the most significant finding. The PR description confirms 12 non-merge commits spanning #628, #627, #581, #554, #570, #680, #740, #495, #496, #658, #697 across M3–M6. CONTRIBUTING.md does require one Epic scope per PR.

PM ruling on C-1: At Day 33, splitting this PR into 5+ PRs would create significant risk and delay TDD infrastructure that 5 Critical TDD issues (#838–#842) now depend on. Recommended path forward:

  1. Extract the core TDD deliverable (commits for #628 — the Robot listener, fixtures, nox integration, helper) into a clean single-issue PR if feasible within 4 hours
  2. If extraction is impractical due to commit entanglement, accept the PR as-is with a follow-up issue to track the process debt
  3. The non-#628 changes should have their own PRs filed as follow-up

C-2 (M4 CLI tests use mocks): Valid — unittest.mock.patch in integration tests violates CONTRIBUTING.md line 569. However, this is pre-existing code from other commits, not the TDD listener feature. Non-blocking for the TDD merge but must be filed as a separate issue.

Blocking vs Non-Blocking

Finding Blocking? Rationale
C-1 Conditional See ruling above
C-2 No Pre-existing code from other commits
H-1 No session_service Factory bug is pre-existing, not introduced by TDD feature
H-2 No Commit format issue — can't be fixed retroactively without rewrite
H-3 No Same as H-2
M-1 through M-7 No All either pre-existing or non-TDD-listener code

Action Required

@hurui200320: Please respond to hamza's formal REQUEST_CHANGES review within 24 hours. Specifically:

  1. Acknowledge C-1 — state whether the TDD listener commits can be cleanly extracted into a standalone PR, or whether the commits are too entangled
  2. Acknowledge C-2 — agree this needs a follow-up issue (will be filed by PM)
  3. For H-1 through M-7 — indicate which are addressed by Round 3 fixes vs. which are new findings from the formal review

@hamza.khyari: Excellent review depth. Once Rui responds to C-1, would you be willing to re-review if the TDD-only subset is extracted? Your review of the listener code itself (findings 1–15 from your earlier comment) has been fully addressed per Round 3.

Priority: This remains Tier 1 — TDD infrastructure. Issues #838, #839, #840, #841, #842 (all Critical/Must have) are blocked until the TDD tag handler is on master.

## PM Triage — Day 33 (2026-03-13) @hamza.khyari posted a formal REQUEST_CHANGES review with findings across multiple severity levels. @hurui200320's Round 3 fixes addressed the 15 findings from hamza's earlier comment review, but the formal review (C-1, C-2, H-1 through H-3, M-1 through M-7) contains **additional findings** that need resolution. ### Critical Assessment **C-1 (PR bundles 7+ issues — must split)**: This is the most significant finding. The PR description confirms 12 non-merge commits spanning #628, #627, #581, #554, #570, #680, #740, #495, #496, #658, #697 across M3–M6. CONTRIBUTING.md does require one Epic scope per PR. **PM ruling on C-1**: At Day 33, splitting this PR into 5+ PRs would create significant risk and delay TDD infrastructure that 5 Critical TDD issues (#838–#842) now depend on. **Recommended path forward:** 1. **Extract the core TDD deliverable** (commits for #628 — the Robot listener, fixtures, nox integration, helper) into a clean single-issue PR if feasible within 4 hours 2. If extraction is impractical due to commit entanglement, **accept the PR as-is** with a follow-up issue to track the process debt 3. The non-#628 changes should have their own PRs filed as follow-up **C-2 (M4 CLI tests use mocks)**: Valid — `unittest.mock.patch` in integration tests violates CONTRIBUTING.md line 569. However, this is pre-existing code from other commits, not the TDD listener feature. **Non-blocking for the TDD merge** but must be filed as a separate issue. ### Blocking vs Non-Blocking | Finding | Blocking? | Rationale | |---------|-----------|-----------| | C-1 | Conditional | See ruling above | | C-2 | No | Pre-existing code from other commits | | H-1 | No | `session_service` Factory bug is pre-existing, not introduced by TDD feature | | H-2 | No | Commit format issue — can't be fixed retroactively without rewrite | | H-3 | No | Same as H-2 | | M-1 through M-7 | No | All either pre-existing or non-TDD-listener code | ### Action Required **@hurui200320**: Please respond to hamza's formal REQUEST_CHANGES review within 24 hours. Specifically: 1. Acknowledge C-1 — state whether the TDD listener commits can be cleanly extracted into a standalone PR, or whether the commits are too entangled 2. Acknowledge C-2 — agree this needs a follow-up issue (will be filed by PM) 3. For H-1 through M-7 — indicate which are addressed by Round 3 fixes vs. which are new findings from the formal review **@hamza.khyari**: Excellent review depth. Once Rui responds to C-1, would you be willing to re-review if the TDD-only subset is extracted? Your review of the listener code itself (findings 1–15 from your earlier comment) has been fully addressed per Round 3. **Priority**: This remains **Tier 1 — TDD infrastructure**. Issues #838, #839, #840, #841, #842 (all Critical/Must have) are blocked until the TDD tag handler is on master.
Owner

PM Status Update — Day 34

This PR is Tier 1 priority — it unblocks 5 Critical bug TDD issues (#838-#842). The technical quality is solid (all 15 of Hamza's findings addressed, 10 quality gates green).

Blocking item: Hamza's REQUEST_CHANGES (C-1 — scope bundling)
The PR bundles 7+ issues across M3-M6 milestones. CONTRIBUTING.md requires one-Epic-scope-per-PR. This is the single thing preventing merge.

PM ruling on C-1 (reaffirming Day 33 position):
Given we are Day 34 and this blocks 5 Critical bugs, splitting is not justified if it takes more than ~4 hours of work. The priority is unblocking the TDD pipeline.

@hurui200320 — Please respond to Hamza's formal review within 24 hours:

  1. Can the TDD listener commits (#628) be cleanly cherry-picked into a standalone PR? If yes and doable in <4 hours, do it.
  2. If not feasible due to commit entanglement, document that in a comment and we will accept as-is with a process-debt follow-up issue.
  3. C-2 (M4 CLI tests using unittest.mock.patch): File a follow-up issue and note the issue number here.

@hamza.khyari — Given the context (5 Critical bugs blocked), would you accept the PR if Rui addresses C-1 with either approach above? Your review quality is outstanding, but the process-purity vs. schedule tradeoff favors merge here.

Label update: State should be State/Verified, not State/Unverified.

**PM Status Update — Day 34** This PR is **Tier 1 priority** — it unblocks 5 Critical bug TDD issues (#838-#842). The technical quality is solid (all 15 of Hamza's findings addressed, 10 quality gates green). **Blocking item: Hamza's REQUEST_CHANGES (C-1 — scope bundling)** The PR bundles 7+ issues across M3-M6 milestones. CONTRIBUTING.md requires one-Epic-scope-per-PR. This is the single thing preventing merge. **PM ruling on C-1 (reaffirming Day 33 position):** Given we are Day 34 and this blocks 5 Critical bugs, splitting is not justified if it takes more than ~4 hours of work. The priority is unblocking the TDD pipeline. **@hurui200320** — Please respond to Hamza's formal review within 24 hours: 1. **Can the TDD listener commits (#628) be cleanly cherry-picked** into a standalone PR? If yes and doable in <4 hours, do it. 2. If not feasible due to commit entanglement, document that in a comment and we will accept as-is with a process-debt follow-up issue. 3. **C-2** (M4 CLI tests using `unittest.mock.patch`): File a follow-up issue and note the issue number here. **@hamza.khyari** — Given the context (5 Critical bugs blocked), would you accept the PR if Rui addresses C-1 with either approach above? Your review quality is outstanding, but the process-purity vs. schedule tradeoff favors merge here. **Label update:** State should be `State/Verified`, not `State/Unverified`.
freemo left a comment

PM Review — Day 34: Scope Bundling Ruling (C-1)

@hurui200320 @hamza.khyari — This PR has been in review for 4 days with 13 comments. All 15 technical findings from Hamza's review have been addressed by Rui. The remaining open discussion is the C-1 scope bundling concern (Hamza's REQUEST_CHANGES): whether the Behave integration tests and Robot handler should be split into separate PRs.

PM Ruling

Accept as-is with a process-debt follow-up. Rationale:

  1. Critical path impact: This PR blocks 5 Critical TDD bug pipelines (#838, #839, #840, #841, #842). Every day of delay cascades to 5 bug fixes.
  2. All technical findings resolved: The code quality is verified — 15/15 review findings addressed, all nox quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage at 97.9%).
  3. Splitting cost exceeds benefit: The Behave integration tests and Robot handler share test infrastructure (helper_tdd_tag_validation.py). Splitting would require duplicating this infrastructure or creating a dependency chain between 2 PRs — adding complexity, not reducing it.
  4. Time constraint: If splitting takes >4 hours of developer time, it's a net schedule loss given the 5 blocked Critical bugs.

Required Action

  • @hamza.khyari: Please convert your REQUEST_CHANGES to APPROVED (or COMMENT). If you have remaining concerns about the scope bundling, please open a follow-up issue to track the process-debt item — we will address it in the continuous improvement track.
  • @hurui200320: PR is mergeable. Once Hamza's review state is resolved, this can go to Jeff for merge.

PR Status

  • Mergeable: Yes
  • Quality gates: All pass
  • Reviews: 15/15 findings addressed
  • Blocking: 5 Critical TDD issues (#838-#842)
  • Recommendation: Merge at earliest opportunity

PM review — Day 34 schedule adherence

## PM Review — Day 34: Scope Bundling Ruling (C-1) @hurui200320 @hamza.khyari — This PR has been in review for 4 days with 13 comments. All 15 technical findings from Hamza's review have been addressed by Rui. The remaining open discussion is the **C-1 scope bundling concern** (Hamza's REQUEST_CHANGES): whether the Behave integration tests and Robot handler should be split into separate PRs. ### PM Ruling **Accept as-is with a process-debt follow-up.** Rationale: 1. **Critical path impact**: This PR blocks 5 Critical TDD bug pipelines (#838, #839, #840, #841, #842). Every day of delay cascades to 5 bug fixes. 2. **All technical findings resolved**: The code quality is verified — 15/15 review findings addressed, all nox quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage at 97.9%). 3. **Splitting cost exceeds benefit**: The Behave integration tests and Robot handler share test infrastructure (`helper_tdd_tag_validation.py`). Splitting would require duplicating this infrastructure or creating a dependency chain between 2 PRs — adding complexity, not reducing it. 4. **Time constraint**: If splitting takes >4 hours of developer time, it's a net schedule loss given the 5 blocked Critical bugs. ### Required Action - **@hamza.khyari**: Please convert your REQUEST_CHANGES to APPROVED (or COMMENT). If you have remaining concerns about the scope bundling, please open a follow-up issue to track the process-debt item — we will address it in the continuous improvement track. - **@hurui200320**: PR is mergeable. Once Hamza's review state is resolved, this can go to Jeff for merge. ### PR Status - Mergeable: **Yes** - Quality gates: **All pass** - Reviews: 15/15 findings addressed - Blocking: **5 Critical TDD issues (#838-#842)** - Recommendation: **Merge at earliest opportunity** --- *PM review — Day 34 schedule adherence*
hurui200320 force-pushed feature/m5-robot-tdd-tags from d615276349
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 15s
CI / build (pull_request) Successful in 16s
CI / quality (pull_request) Successful in 19s
CI / e2e_tests (pull_request) Successful in 27s
CI / typecheck (pull_request) Successful in 39s
CI / security (pull_request) Successful in 43s
CI / unit_tests (pull_request) Successful in 3m1s
CI / integration_tests (pull_request) Successful in 3m31s
CI / docker (pull_request) Successful in 52s
CI / coverage (pull_request) Successful in 6m46s
CI / benchmark-regression (pull_request) Successful in 36m12s
to c7bdb62b73
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 42s
CI / security (pull_request) Successful in 49s
CI / quality (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Successful in 1m29s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m38s
CI / docker (pull_request) Successful in 1m50s
CI / coverage (pull_request) Successful in 6m16s
CI / benchmark-regression (pull_request) Failing after 1m39s
2026-03-16 06:18:11 +00:00
Compare
Author
Member

Review Response (Round 4) — Force-pushed c7bdb62b

Rebased onto latest master (3b6b1d2) and addressed Hamza's formal REQUEST_CHANGES review (review #2190). After rebase, this PR contains a single commit (c7bdb62b) implementing only #628 — all other commits from the previous branch history are now on master.

C-1 and C-2: Resolved by rebase

After rebasing onto latest master, this PR now contains only the #628 commit — 16 files changed, 1004 insertions, 168 deletions. The scope-bundling concern (C-1) and mock-in-integration-tests concern (C-2) are no longer applicable because those commits are already on master.

Findings addressed in this round

# Severity Finding Resolution
M-5 Medium _make_mock_scenario duplicated between Robot helper and Behave steps Fixed: Extracted to features/mocks/tdd_test_helpers.py as make_mock_scenario(). Both robot/helper_tdd_tag_validation.py and features/steps/tdd_tag_validation_steps.py now import from the shared module. Updated features/mocks/__init__.py to export it.
M-7 Medium XML message extraction may miss RF status message attribute Fixed: Added status_el.get("message", "") fallback in both _run_fixture() and _run_multi_fixture(). Now checks element text → message attribute (RF 7.3+) → child <msg> element.
L-6 Low Listener docstring says "longname" but code uses full_name Fixed: Changed "keyed by test longname" to "keyed by test full_name" in the Implementation Notes docstring.
L-7 Low Listener error messages omit @ prefix, diverging from Behave counterpart Documented: Added comment in _validate_tdd_tags explaining the intentional divergence — Robot Framework tags don't use the @ prefix, while Behave does. Each framework's error messages use its own convention.
Nit - Missing edge case fixture: tdd_bug alone without tdd_bug_N or tdd_expected_fail Fixed: Added robot/fixtures/tdd_bug_alone.robot fixture, cmd_tdd_bug_alone_valid() helper command, and "TDD Bug Tag Alone Is Valid" test case. Total Robot-side integration tests: 9.
Nit - _validate_tdd_tags returns string vs Behave's ValueError — undocumented design choice Documented: Added design note in _validate_tdd_tags docstring explaining why `str

Findings now out of scope (resolved by rebase)

After rebasing, the following findings are no longer part of this PR's diff — they exist on master from previously-merged commits:

# Finding Status
H-1 session_service Factory → Singleton Master code (from #554/#570/#680). Recommend filing a separate issue.
H-2 Commit 9c8dc9ed violates Conventional Commits Historical commit, now on master.
H-3 Commit 94d95324 missing ISSUES CLOSED: footer Historical commit, now on master.
M-1 CorrectionService bypasses container Singleton Master code (from plan.py).
M-2 AuditService._ensure_session() not thread-safe Master code (from #581).
M-3 Incomplete entity_deleted audit coverage Spec gap, not introduced by #628.
M-4 _suppress_structlog_stdout duplicated 3 times Master code (from #554/#570/#680).
M-6 cleanup_workspace() mutates os.environ Master code (from helper_e2e_common.py).
L-1 _fail() uses raise SystemExit(1) vs sys.exit(1) Master code.
L-2 Three separate CONFIG_CHANGED audit events Master code.
L-3 ConfigService() without event_bus Master code.
L-4 Silent audit subscriber deferral Master code.
L-5 Stale feature descriptions in session TDD features Master code (from #627).
Nit Type/Feature label inaccurate for multi-type PR No longer applicable — PR is now single-issue (#628), purely Type/Feature.
Nit helper_m3_e2e_verification.py / helper_m6_e2e_verification.py line counts Master code, not part of #628.

I recommend @hamza.khyari or PM file separate issues for H-1, M-1, M-2, M-3, M-4, and M-6 to track these as independent follow-ups.

Quality Gates (post-rebase + review fixes)

Gate Result
nox -s lint PASS (0 violations)
nox -s typecheck PASS (0 errors, 1 pre-existing warning)
nox -s unit_tests PASS (10,815 scenarios, 381 features)
nox -s integration_tests PASS (1,519 tests)
nox -s coverage_report PASS (97% >= 97% threshold)

PR scope after rebase

This PR now contains a single commit for a single issue (#628):

  • 8 new fixture .robot files (including tdd_bug_alone.robot)
  • 1 new shared mock helper (features/mocks/tdd_test_helpers.py)
  • Listener (robot/tdd_expected_fail_listener.py) — modified
  • Helper (robot/helper_tdd_tag_validation.py) — modified
  • Test suite (robot/tdd_tag_validation.robot) — modified
  • Nox config — modified (listener path + tdd_fixture exclude)
  • CHANGELOG — modified
  • features/mocks/__init__.py — modified (export shared helper)
  • features/steps/tdd_tag_validation_steps.py — modified (use shared helper)
## Review Response (Round 4) — Force-pushed `c7bdb62b` Rebased onto latest master (`3b6b1d2`) and addressed Hamza's formal REQUEST_CHANGES review (review #2190). After rebase, this PR contains a **single commit** (`c7bdb62b`) implementing only #628 — all other commits from the previous branch history are now on master. ### C-1 and C-2: Resolved by rebase After rebasing onto latest master, this PR now contains **only the #628 commit** — 16 files changed, 1004 insertions, 168 deletions. The scope-bundling concern (C-1) and mock-in-integration-tests concern (C-2) are no longer applicable because those commits are already on master. ### Findings addressed in this round | # | Severity | Finding | Resolution | |---|----------|---------|------------| | M-5 | Medium | `_make_mock_scenario` duplicated between Robot helper and Behave steps | **Fixed**: Extracted to `features/mocks/tdd_test_helpers.py` as `make_mock_scenario()`. Both `robot/helper_tdd_tag_validation.py` and `features/steps/tdd_tag_validation_steps.py` now import from the shared module. Updated `features/mocks/__init__.py` to export it. | | M-7 | Medium | XML message extraction may miss RF status message attribute | **Fixed**: Added `status_el.get("message", "")` fallback in both `_run_fixture()` and `_run_multi_fixture()`. Now checks element text → `message` attribute (RF 7.3+) → child `<msg>` element. | | L-6 | Low | Listener docstring says "longname" but code uses `full_name` | **Fixed**: Changed "keyed by test longname" to "keyed by test ``full_name``" in the Implementation Notes docstring. | | L-7 | Low | Listener error messages omit `@` prefix, diverging from Behave counterpart | **Documented**: Added comment in `_validate_tdd_tags` explaining the intentional divergence — Robot Framework tags don't use the `@` prefix, while Behave does. Each framework's error messages use its own convention. | | Nit | - | Missing edge case fixture: `tdd_bug` alone without `tdd_bug_N` or `tdd_expected_fail` | **Fixed**: Added `robot/fixtures/tdd_bug_alone.robot` fixture, `cmd_tdd_bug_alone_valid()` helper command, and "TDD Bug Tag Alone Is Valid" test case. Total Robot-side integration tests: 9. | | Nit | - | `_validate_tdd_tags` returns string vs Behave's `ValueError` — undocumented design choice | **Documented**: Added design note in `_validate_tdd_tags` docstring explaining why `str | None` is used instead of raising `ValueError`: the Robot Framework Listener v3 API does not propagate exceptions cleanly from `start_test`. | ### Findings now out of scope (resolved by rebase) After rebasing, the following findings are no longer part of this PR's diff — they exist on master from previously-merged commits: | # | Finding | Status | |---|---------|--------| | H-1 | `session_service` Factory → Singleton | Master code (from #554/#570/#680). Recommend filing a separate issue. | | H-2 | Commit `9c8dc9ed` violates Conventional Commits | Historical commit, now on master. | | H-3 | Commit `94d95324` missing `ISSUES CLOSED:` footer | Historical commit, now on master. | | M-1 | `CorrectionService` bypasses container Singleton | Master code (from plan.py). | | M-2 | `AuditService._ensure_session()` not thread-safe | Master code (from #581). | | M-3 | Incomplete `entity_deleted` audit coverage | Spec gap, not introduced by #628. | | M-4 | `_suppress_structlog_stdout` duplicated 3 times | Master code (from #554/#570/#680). | | M-6 | `cleanup_workspace()` mutates `os.environ` | Master code (from `helper_e2e_common.py`). | | L-1 | `_fail()` uses `raise SystemExit(1)` vs `sys.exit(1)` | Master code. | | L-2 | Three separate `CONFIG_CHANGED` audit events | Master code. | | L-3 | `ConfigService()` without `event_bus` | Master code. | | L-4 | Silent audit subscriber deferral | Master code. | | L-5 | Stale feature descriptions in session TDD features | Master code (from #627). | | Nit | `Type/Feature` label inaccurate for multi-type PR | No longer applicable — PR is now single-issue (#628), purely `Type/Feature`. | | Nit | `helper_m3_e2e_verification.py` / `helper_m6_e2e_verification.py` line counts | Master code, not part of #628. | I recommend @hamza.khyari or PM file separate issues for H-1, M-1, M-2, M-3, M-4, and M-6 to track these as independent follow-ups. ### Quality Gates (post-rebase + review fixes) | Gate | Result | |------|--------| | `nox -s lint` | PASS (0 violations) | | `nox -s typecheck` | PASS (0 errors, 1 pre-existing warning) | | `nox -s unit_tests` | PASS (10,815 scenarios, 381 features) | | `nox -s integration_tests` | PASS (1,519 tests) | | `nox -s coverage_report` | PASS (97% >= 97% threshold) | ### PR scope after rebase This PR now contains a single commit for a single issue (#628): - 8 new fixture `.robot` files (including `tdd_bug_alone.robot`) - 1 new shared mock helper (`features/mocks/tdd_test_helpers.py`) - Listener (`robot/tdd_expected_fail_listener.py`) — modified - Helper (`robot/helper_tdd_tag_validation.py`) — modified - Test suite (`robot/tdd_tag_validation.robot`) — modified - Nox config — modified (listener path + tdd_fixture exclude) - CHANGELOG — modified - `features/mocks/__init__.py` — modified (export shared helper) - `features/steps/tdd_tag_validation_steps.py` — modified (use shared helper)
Author
Member

Self-Review (Round 5) — PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Reviewer: hurui200320 | Protocol: 5-agent parallel review (test coverage/flaws, performance, bug detection, security, spec/code quality)

Verdict: APPROVED — with non-blocking suggestions below.


Overall Assessment

All 6 acceptance criteria from ticket #628 are correctly satisfied. Code quality is high — clean separation between validation (start_test) and inversion (end_test), proper regex compilation at module level, thorough docstrings, and comprehensive Robot + Behave test coverage. All 15 findings from @hamza.khyari's previous review have been properly addressed in Round 4. The rebase cleanly scoped this PR to a single commit for #628.

Performance: No issues. Listener overhead is negligible (~microseconds per test across 1500+ tests).
Security: No issues. Subprocess invocation uses list args (no shell injection), fixture names are hardcoded, temp dirs managed via context managers, XML is self-generated.
Bug Detection: No new bugs found. All 8 previously-reported bugs properly fixed.

6 new suggestions are listed below — 2 Medium, 4 Low/Nit. None are blocking.


Findings

# Severity Category File Finding
1 Medium CODE helper_tdd_tag_validation.py File exceeds 500-line limit (629 lines)
2 Medium TEST tdd_expected_fail_listener.py Idempotency guard (_processed_tests) is untested
3 Low TEST tdd_expected_fail_listener.py close() cleanup function is untested
4 Low TEST helper_tdd_tag_validation.py:554 Weak assertion in expected-fail-alone validation
5 Low TEST tdd_expected_fail_listener.py:161 Tag case-normalization (.lower()) never exercised by tests
6 Low/Nit CODE helper_tdd_tag_validation.py:347,413 Duplicated XML message-extraction logic

1. File exceeds 500-line limit (Medium)

File: robot/helper_tdd_tag_validation.py (629 lines)

CONTRIBUTING.md mandates files stay under 500 lines. This was also noted as a nit by @hamza.khyari in the previous review.

Recommendation: Split along the natural Behave/Robot boundary, or extract the Robot-side fixture runner infrastructure (_run_fixture, _run_multi_fixture) into a small _tdd_fixture_runner.py utility. Extracting the XML message-extraction helper (Finding #6) would also reduce line count.


2. Idempotency guard untested (Medium)

File: robot/tdd_expected_fail_listener.py, lines 79, 193–195

The _processed_tests set guard prevents double-inversion if the listener is loaded twice. No test exercises this — each helper subprocess starts fresh, so the guard is never triggered. If this guard broke (e.g., someone clears the set prematurely), a tdd_expected_fail test would silently double-invert (FAIL→PASS→FAIL), producing wrong results with no test catching it.

Recommendation: Add a Robot-side test that passes --listener <path> --listener <path> and verifies the fixture is still correctly inverted to PASS (not double-inverted back to FAIL).


3. close() cleanup untested (Low)

File: robot/tdd_expected_fail_listener.py, lines 254–262

close() clears both _validation_errors and _processed_tests. No test verifies this behavior. Each test subprocess starts fresh, so close() cleanup never observably matters in the current test topology.

Recommendation: Low priority. Consider adding a note documenting that close() is a safety net for embedded/long-lived Robot processes, or add a simple direct-import test that populates the dicts, calls close(), and verifies they are empty.


4. Weak expected-fail-alone assertion (Low)

File: robot/helper_tdd_tag_validation.py, line 554

if status == "FAIL" and "tdd_bug" in message and "tdd_bug_<N>" in message:

The first check ("tdd_bug" in message) is always satisfied when the second ("tdd_bug_<N>" in message) is true, because "tdd_bug" is a substring of "tdd_bug_<N>". The assertion doesn't truly verify both tags are listed independently in the error message.

Recommendation: Tighten to "tdd_bug, tdd_bug_<N>" in message or check for the combined phrase from the actual error output.


5. Tag case-normalization never exercised (Low)

File: robot/tdd_expected_fail_listener.py, lines 161, 213

All 8 fixture files use lowercase tags exclusively. The .lower() normalization in start_test / end_test is never tested with mixed-case input. If it were removed, no test would fail.

Recommendation: Modify one fixture to use mixed-case tags (e.g., TDD_Expected_Fail, TDD_BUG_999) and verify the listener correctly handles them.


6. Duplicated XML message-extraction (Low/Nit)

File: robot/helper_tdd_tag_validation.py, lines 347–353 and 413–419

The RF output XML message extraction pattern (check status_el.textstatus_el.get("message")<msg> child) is copy-pasted between _run_fixture and _run_multi_fixture. Any future RF output-format change must be updated in two places.

Recommendation: Extract to _extract_message(test_el: ET.Element, status_el: ET.Element) -> str. Also helps reduce line count (Finding #1).


Previous Review Comments — Status

All of @hamza.khyari's 15 findings from the formal review have been addressed in Round 4. The C-1 (scope bundling) and C-2 (mock in integration tests) concerns are resolved by rebase — the PR now contains only the #628 commit. PM (@freemo) has ruled the PR mergeable.

Process Checks (all pass)

  • Commit message matches issue Metadata ✓
  • ISSUES CLOSED: #628 footer present ✓
  • CHANGELOG updated under Unreleased ✓
  • All files belong to this feature ✓
  • No generated artifacts or secrets ✓
  • No # type: ignore in new source ✓
  • Mocking code in features/mocks/ only ✓
  • Robot for integration tests, Behave for unit tests ✓
  • Quality gates reported passing (lint, typecheck, unit_tests, integration_tests, coverage 97%) ✓
## Self-Review (Round 5) — PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer**: hurui200320 | **Protocol**: 5-agent parallel review (test coverage/flaws, performance, bug detection, security, spec/code quality) ### Verdict: **APPROVED** — with non-blocking suggestions below. --- ### Overall Assessment All 6 acceptance criteria from ticket #628 are correctly satisfied. Code quality is high — clean separation between validation (`start_test`) and inversion (`end_test`), proper regex compilation at module level, thorough docstrings, and comprehensive Robot + Behave test coverage. All 15 findings from @hamza.khyari's previous review have been properly addressed in Round 4. The rebase cleanly scoped this PR to a single commit for #628. **Performance**: No issues. Listener overhead is negligible (~microseconds per test across 1500+ tests). **Security**: No issues. Subprocess invocation uses list args (no shell injection), fixture names are hardcoded, temp dirs managed via context managers, XML is self-generated. **Bug Detection**: No new bugs found. All 8 previously-reported bugs properly fixed. **6 new suggestions** are listed below — 2 Medium, 4 Low/Nit. None are blocking. --- ### Findings | # | Severity | Category | File | Finding | |---|----------|----------|------|---------| | 1 | **Medium** | CODE | `helper_tdd_tag_validation.py` | File exceeds 500-line limit (629 lines) | | 2 | **Medium** | TEST | `tdd_expected_fail_listener.py` | Idempotency guard (`_processed_tests`) is untested | | 3 | Low | TEST | `tdd_expected_fail_listener.py` | `close()` cleanup function is untested | | 4 | Low | TEST | `helper_tdd_tag_validation.py:554` | Weak assertion in `expected-fail-alone` validation | | 5 | Low | TEST | `tdd_expected_fail_listener.py:161` | Tag case-normalization (`.lower()`) never exercised by tests | | 6 | Low/Nit | CODE | `helper_tdd_tag_validation.py:347,413` | Duplicated XML message-extraction logic | --- #### 1. File exceeds 500-line limit (Medium) **File**: `robot/helper_tdd_tag_validation.py` (629 lines) CONTRIBUTING.md mandates files stay under 500 lines. This was also noted as a nit by @hamza.khyari in the previous review. **Recommendation**: Split along the natural Behave/Robot boundary, or extract the Robot-side fixture runner infrastructure (`_run_fixture`, `_run_multi_fixture`) into a small `_tdd_fixture_runner.py` utility. Extracting the XML message-extraction helper (Finding #6) would also reduce line count. --- #### 2. Idempotency guard untested (Medium) **File**: `robot/tdd_expected_fail_listener.py`, lines 79, 193–195 The `_processed_tests` set guard prevents double-inversion if the listener is loaded twice. No test exercises this — each helper subprocess starts fresh, so the guard is never triggered. If this guard broke (e.g., someone clears the set prematurely), a `tdd_expected_fail` test would silently double-invert (FAIL→PASS→FAIL), producing wrong results with no test catching it. **Recommendation**: Add a Robot-side test that passes `--listener <path> --listener <path>` and verifies the fixture is still correctly inverted to PASS (not double-inverted back to FAIL). --- #### 3. `close()` cleanup untested (Low) **File**: `robot/tdd_expected_fail_listener.py`, lines 254–262 `close()` clears both `_validation_errors` and `_processed_tests`. No test verifies this behavior. Each test subprocess starts fresh, so `close()` cleanup never observably matters in the current test topology. **Recommendation**: Low priority. Consider adding a note documenting that `close()` is a safety net for embedded/long-lived Robot processes, or add a simple direct-import test that populates the dicts, calls `close()`, and verifies they are empty. --- #### 4. Weak `expected-fail-alone` assertion (Low) **File**: `robot/helper_tdd_tag_validation.py`, line 554 ```python if status == "FAIL" and "tdd_bug" in message and "tdd_bug_<N>" in message: ``` The first check (`"tdd_bug" in message`) is always satisfied when the second (`"tdd_bug_<N>" in message`) is true, because `"tdd_bug"` is a substring of `"tdd_bug_<N>"`. The assertion doesn't truly verify both tags are listed independently in the error message. **Recommendation**: Tighten to `"tdd_bug, tdd_bug_<N>" in message` or check for the combined phrase from the actual error output. --- #### 5. Tag case-normalization never exercised (Low) **File**: `robot/tdd_expected_fail_listener.py`, lines 161, 213 All 8 fixture files use lowercase tags exclusively. The `.lower()` normalization in `start_test` / `end_test` is never tested with mixed-case input. If it were removed, no test would fail. **Recommendation**: Modify one fixture to use mixed-case tags (e.g., `TDD_Expected_Fail`, `TDD_BUG_999`) and verify the listener correctly handles them. --- #### 6. Duplicated XML message-extraction (Low/Nit) **File**: `robot/helper_tdd_tag_validation.py`, lines 347–353 and 413–419 The RF output XML message extraction pattern (check `status_el.text` → `status_el.get("message")` → `<msg>` child) is copy-pasted between `_run_fixture` and `_run_multi_fixture`. Any future RF output-format change must be updated in two places. **Recommendation**: Extract to `_extract_message(test_el: ET.Element, status_el: ET.Element) -> str`. Also helps reduce line count (Finding #1). --- ### Previous Review Comments — Status All of @hamza.khyari's 15 findings from the formal review have been addressed in Round 4. The C-1 (scope bundling) and C-2 (mock in integration tests) concerns are resolved by rebase — the PR now contains only the #628 commit. PM (@freemo) has ruled the PR mergeable. ### Process Checks (all pass) - Commit message matches issue Metadata ✓ - `ISSUES CLOSED: #628` footer present ✓ - CHANGELOG updated under Unreleased ✓ - All files belong to this feature ✓ - No generated artifacts or secrets ✓ - No `# type: ignore` in new source ✓ - Mocking code in `features/mocks/` only ✓ - Robot for integration tests, Behave for unit tests ✓ - Quality gates reported passing (lint, typecheck, unit_tests, integration_tests, coverage 97%) ✓
hurui200320 force-pushed feature/m5-robot-tdd-tags from c7bdb62b73
Some checks failed
CI / lint (pull_request) Successful in 15s
CI / typecheck (pull_request) Successful in 42s
CI / security (pull_request) Successful in 49s
CI / quality (pull_request) Successful in 29s
CI / e2e_tests (pull_request) Successful in 1m29s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 15s
CI / unit_tests (pull_request) Successful in 3m12s
CI / integration_tests (pull_request) Successful in 3m38s
CI / docker (pull_request) Successful in 1m50s
CI / coverage (pull_request) Successful in 6m16s
CI / benchmark-regression (pull_request) Failing after 1m39s
to 5e4453d0a4
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 18s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 29s
CI / security (pull_request) Successful in 42s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Successful in 1m24s
CI / integration_tests (pull_request) Successful in 3m34s
CI / unit_tests (pull_request) Successful in 4m53s
CI / docker (pull_request) Successful in 54s
CI / coverage (pull_request) Successful in 7m8s
CI / benchmark-regression (pull_request) Successful in 37m10s
2026-03-16 08:11:08 +00:00
Compare
freemo approved these changes 2026-03-16 16:13:34 +00:00
Dismissed
freemo left a comment

PM Review — Robot Framework TDD Tag Handler (#628)

PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Review Summary

This PR implements the Robot Framework counterpart of the Behave TDD expected-fail system. It has been through 5 rounds of review with all 15 findings from @hamza.khyari addressed. The PR has been rebased to a single commit for #628 only (resolving the C-1 scope bundling concern). All quality gates pass. Approving.

TDD Infrastructure Verification

  • Listener v3 module implements result inversion for tdd_expected_fail tags
  • Tag validation enforces: tdd_bug_<N> requires tdd_bug, tdd_expected_fail requires both
  • Idempotency guard against double-invocation
  • SKIP status handling
  • 8 fixture files covering all acceptance criteria
  • 21 integration test cases (12 Behave-side + 9 Robot-side)
  • Shared mock helper extracted to features/mocks/tdd_test_helpers.py
  • Nox sessions updated with --listener flag
  • All quality gates pass (10,815 scenarios, 97% coverage)

Critical Path Impact

This PR unblocks 5 Critical bug TDD pipelines (#838-#842) that depend on Robot Framework TDD tag support. Its merge is a prerequisite for the following TDD issues:

Previous Scope Concern Resolution

The C-1 scope bundling concern from @hamza.khyari has been resolved — @hurui200320 rebased to include only the single commit for issue #628. The remaining out-of-scope findings were correctly deferred.

Verdict: APPROVED. This is a critical-path merge. Recommend @hamza.khyari convert his previous REQUEST_CHANGES to APPROVED given that all 15 findings have been addressed and the scope concern is resolved. Merge target: Day 37 (2026-03-17).

## PM Review — Robot Framework TDD Tag Handler (#628) **PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework** ### Review Summary This PR implements the Robot Framework counterpart of the Behave TDD expected-fail system. It has been through 5 rounds of review with all 15 findings from @hamza.khyari addressed. The PR has been rebased to a single commit for #628 only (resolving the C-1 scope bundling concern). All quality gates pass. Approving. ### TDD Infrastructure Verification - [x] Listener v3 module implements result inversion for `tdd_expected_fail` tags - [x] Tag validation enforces: `tdd_bug_<N>` requires `tdd_bug`, `tdd_expected_fail` requires both - [x] Idempotency guard against double-invocation - [x] SKIP status handling - [x] 8 fixture files covering all acceptance criteria - [x] 21 integration test cases (12 Behave-side + 9 Robot-side) - [x] Shared mock helper extracted to `features/mocks/tdd_test_helpers.py` - [x] Nox sessions updated with `--listener` flag - [x] All quality gates pass (10,815 scenarios, 97% coverage) ### Critical Path Impact This PR **unblocks 5 Critical bug TDD pipelines** (#838-#842) that depend on Robot Framework TDD tag support. Its merge is a prerequisite for the following TDD issues: - #840 (TDD for ACMS tier bug #821) - #841 (TDD for actor list bug #797) - #842 (TDD for init --yes bug #783) - Plus newly created #977, #978, #979 (TDD for plan CLI bugs #967-#969) ### Previous Scope Concern Resolution The C-1 scope bundling concern from @hamza.khyari has been resolved — @hurui200320 rebased to include only the single commit for issue #628. The remaining out-of-scope findings were correctly deferred. **Verdict: APPROVED.** This is a critical-path merge. Recommend @hamza.khyari convert his previous REQUEST_CHANGES to APPROVED given that all 15 findings have been addressed and the scope concern is resolved. Merge target: Day 37 (2026-03-17).
brent.edwards approved these changes 2026-03-16 22:30:01 +00:00
Dismissed
brent.edwards left a comment

Code Review — PR #673 feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Reviewer: @brent.edwards | Size: L (1,002 lines, 15 files) | Focus: Listener correctness, test quality, process compliance


Verification of Previous Reviews

All prior review findings have been addressed:

Source Findings Status
Rui self-review R1 # type: ignore violation Fixed — Callable[[], int] typing
Rui self-review R2 Bare else catches SKIP incorrectly Fixed — explicit elif PASS + else SKIP
Hamza R1 (15 findings) P2 idempotency, sham test, + 13 nits All 15 addressed in Round 3
Hamza formal review (C-1) Scope bundling Resolved — rebased to single commit for #628
Rui R4 M-5 shared mock, M-7 XML fallback, L-6/L-7, new fixtures All addressed
Rui R5 self-review 629-line file, untested idempotency guard Acknowledged — see below

No P0 or P1 findings.

The listener module is well-designed:

  • Validation-first in end_test — invalid tag combinations always produce FAIL, regardless of test outcome
  • data.tags over result.tags — correctly uses static test definition tags, not runtime-mutable tags (intentional, documented)
  • fullmatch() on regex — correctly distinguishes tdd_bug from tdd_bug_\<N\> without false matches
  • Process-local state_processed_tests and _validation_errors are per-process (pabot uses subprocesses), no concurrency risk
  • close() cleanup — RF Listener v3 API calls close() on execution finish
  • No # type: ignore — clean type safety
  • No # noqa in listener — helper has 3 justified # noqa: E402 for post-sys.path imports

P2:should-fix (3)

1. cmd_skip_status_unchanged and cmd_tdd_bug_alone_valid can false-positive without listener loaded
Both tests verify non-modification of results (SKIP stays SKIP, PASS stays PASS) but run the fixture alone without a positive control. Unlike cmd_normal_test_unaffected (which uses _run_multi_fixture with a companion tdd_expected_fail_fails fixture to PROVE the listener was active and selectively applied), these two commands would produce the same result if the --listener flag were missing entirely.
Fix: Use _run_multi_fixture with tdd_expected_fail_fails as a co-run fixture (same pattern as cmd_normal_test_unaffected), asserting the companion was inverted.

2. subprocess.run calls missing timeout
Both _run_fixture and _run_multi_fixture call subprocess.run(cmd, capture_output=True, text=True) without timeout. If a fixture hangs, the subprocess blocks forever, which blocks the parent Robot test, which blocks CI with no diagnostic. All other Robot helpers in the repo use timeout=120 or similar.
Fix: Add timeout=120 and handle subprocess.TimeoutExpired.

3. helper_tdd_tag_validation.py at 629 lines exceeds the 500-line limit
Already flagged in Rui's self-review R5. The natural split point is around line 270 (Behave-side functions vs Robot-side fixture runners). Extracting _run_fixture/_run_multi_fixture/_extract_message into a _tdd_fixture_runner.py utility would solve this and eliminate the duplicated XML extraction (R5 finding #6).


P3:nit (1)

4. features/mocks/__init__.py doesn't re-export make_mock_scenario
Every other mock in the package is re-exported from __init__.py. tdd_test_helpers.make_mock_scenario is imported by full module path, so this works, but breaks the package convention. Minor inconsistency.


Summary

Severity Count
P0:blocker 0
P1:must-fix 0
P2:should-fix 3
P3:nit 1

Verdict: APPROVED — The listener implementation is correct, well-tested (8 fixtures, 21 integration tests), and all prior review findings have been thoroughly addressed. The 3 P2 findings are genuine quality improvements but do not block merge. This is critical-path TDD infrastructure that unblocks 5+ bug TDD pipelines.

The code demonstrates strong engineering: clean validation/inversion separation, proper RF Listener v3 API usage, comprehensive edge-case coverage (SKIP, double-invocation, tdd_bug-alone, tdd_expected_fail-alone), and an extracted shared mock helper. Well done @hurui200320.

## Code Review — PR #673 `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer:** @brent.edwards | **Size:** L (1,002 lines, 15 files) | **Focus:** Listener correctness, test quality, process compliance --- ### Verification of Previous Reviews All prior review findings have been addressed: | Source | Findings | Status | |--------|----------|--------| | Rui self-review R1 | `# type: ignore` violation | ✅ Fixed — `Callable[[], int]` typing | | Rui self-review R2 | Bare `else` catches SKIP incorrectly | ✅ Fixed — explicit `elif PASS` + `else SKIP` | | Hamza R1 (15 findings) | P2 idempotency, sham test, + 13 nits | ✅ All 15 addressed in Round 3 | | Hamza formal review (C-1) | Scope bundling | ✅ Resolved — rebased to single commit for #628 | | Rui R4 | M-5 shared mock, M-7 XML fallback, L-6/L-7, new fixtures | ✅ All addressed | | Rui R5 self-review | 629-line file, untested idempotency guard | Acknowledged — see below | --- ### No P0 or P1 findings. The listener module is well-designed: - **Validation-first** in `end_test` — invalid tag combinations always produce FAIL, regardless of test outcome - **`data.tags` over `result.tags`** — correctly uses static test definition tags, not runtime-mutable tags (intentional, documented) - **`fullmatch()` on regex** — correctly distinguishes `tdd_bug` from `tdd_bug_\<N\>` without false matches - **Process-local state** — `_processed_tests` and `_validation_errors` are per-process (pabot uses subprocesses), no concurrency risk - **`close()` cleanup** — RF Listener v3 API calls `close()` on execution finish - **No `# type: ignore`** — clean type safety - **No `# noqa` in listener** — helper has 3 justified `# noqa: E402` for post-`sys.path` imports --- ### P2:should-fix (3) **1. `cmd_skip_status_unchanged` and `cmd_tdd_bug_alone_valid` can false-positive without listener loaded** Both tests verify non-modification of results (SKIP stays SKIP, PASS stays PASS) but run the fixture alone without a positive control. Unlike `cmd_normal_test_unaffected` (which uses `_run_multi_fixture` with a companion `tdd_expected_fail_fails` fixture to PROVE the listener was active and selectively applied), these two commands would produce the same result if the `--listener` flag were missing entirely. **Fix:** Use `_run_multi_fixture` with `tdd_expected_fail_fails` as a co-run fixture (same pattern as `cmd_normal_test_unaffected`), asserting the companion was inverted. **2. `subprocess.run` calls missing `timeout`** Both `_run_fixture` and `_run_multi_fixture` call `subprocess.run(cmd, capture_output=True, text=True)` without `timeout`. If a fixture hangs, the subprocess blocks forever, which blocks the parent Robot test, which blocks CI with no diagnostic. All other Robot helpers in the repo use `timeout=120` or similar. **Fix:** Add `timeout=120` and handle `subprocess.TimeoutExpired`. **3. `helper_tdd_tag_validation.py` at 629 lines exceeds the 500-line limit** Already flagged in Rui's self-review R5. The natural split point is around line 270 (Behave-side functions vs Robot-side fixture runners). Extracting `_run_fixture`/`_run_multi_fixture`/`_extract_message` into a `_tdd_fixture_runner.py` utility would solve this and eliminate the duplicated XML extraction (R5 finding #6). --- ### P3:nit (1) **4. `features/mocks/__init__.py` doesn't re-export `make_mock_scenario`** Every other mock in the package is re-exported from `__init__.py`. `tdd_test_helpers.make_mock_scenario` is imported by full module path, so this works, but breaks the package convention. Minor inconsistency. --- ### Summary | Severity | Count | |----------|-------| | P0:blocker | 0 | | P1:must-fix | 0 | | P2:should-fix | 3 | | P3:nit | 1 | **Verdict: APPROVED** — The listener implementation is correct, well-tested (8 fixtures, 21 integration tests), and all prior review findings have been thoroughly addressed. The 3 P2 findings are genuine quality improvements but do not block merge. This is critical-path TDD infrastructure that unblocks 5+ bug TDD pipelines. The code demonstrates strong engineering: clean validation/inversion separation, proper RF Listener v3 API usage, comprehensive edge-case coverage (SKIP, double-invocation, tdd_bug-alone, tdd_expected_fail-alone), and an extracted shared mock helper. Well done @hurui200320.
brent.edwards force-pushed feature/m5-robot-tdd-tags from 5e4453d0a4
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 18s
CI / build (pull_request) Successful in 27s
CI / quality (pull_request) Successful in 29s
CI / security (pull_request) Successful in 42s
CI / typecheck (pull_request) Successful in 43s
CI / e2e_tests (pull_request) Successful in 1m24s
CI / integration_tests (pull_request) Successful in 3m34s
CI / unit_tests (pull_request) Successful in 4m53s
CI / docker (pull_request) Successful in 54s
CI / coverage (pull_request) Successful in 7m8s
CI / benchmark-regression (pull_request) Successful in 37m10s
to a5de448856
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 51s
CI / security (pull_request) Successful in 53s
CI / e2e_tests (pull_request) Successful in 1m20s
CI / unit_tests (pull_request) Successful in 3m24s
CI / benchmark-regression (pull_request) Failing after 2m32s
CI / integration_tests (pull_request) Successful in 3m33s
CI / docker (pull_request) Successful in 15s
CI / coverage (pull_request) Successful in 6m52s
2026-03-16 22:45:59 +00:00
Compare
brent.edwards dismissed freemo's review 2026-03-16 22:45:59 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards dismissed brent.edwards's review 2026-03-16 22:45:59 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Member

Review Fixes Applied — a5de448

Addressed all 3 P2 and 1 P3 findings from @brent.edwards's review:

# Sev Finding Fix
P2-1 should-fix cmd_skip_status_unchanged and cmd_tdd_bug_alone_valid can false-positive without listener loaded Converted both to use run_multi_fixture with tdd_expected_fail_fails as positive control — same pattern as cmd_normal_test_unaffected
P2-2 should-fix subprocess.run missing timeout Added timeout=120 to both run_fixture and run_multi_fixture with subprocess.TimeoutExpired handling
P2-3 should-fix helper_tdd_tag_validation.py at 629 lines (limit 500) Extracted run_fixture, run_multi_fixture, and _extract_message to new robot/_tdd_fixture_runner.py (186 lines). Helper now 499 lines.
P3-4 nit make_mock_scenario not re-exported from features/mocks/__init__.py Added import + __all__ entry

Line counts after fix

  • robot/helper_tdd_tag_validation.py: 499 (was 629)
  • robot/_tdd_fixture_runner.py: 186 (new)

Quality gates

  • Lint: PASS (0 violations)
  • Format: PASS
  • Typecheck: PASS (0 errors)
## Review Fixes Applied — `a5de448` Addressed all 3 P2 and 1 P3 findings from @brent.edwards's review: | # | Sev | Finding | Fix | |---|-----|---------|-----| | **P2-1** | should-fix | `cmd_skip_status_unchanged` and `cmd_tdd_bug_alone_valid` can false-positive without listener loaded | Converted both to use `run_multi_fixture` with `tdd_expected_fail_fails` as positive control — same pattern as `cmd_normal_test_unaffected` | | **P2-2** | should-fix | `subprocess.run` missing `timeout` | Added `timeout=120` to both `run_fixture` and `run_multi_fixture` with `subprocess.TimeoutExpired` handling | | **P2-3** | should-fix | `helper_tdd_tag_validation.py` at 629 lines (limit 500) | Extracted `run_fixture`, `run_multi_fixture`, and `_extract_message` to new `robot/_tdd_fixture_runner.py` (186 lines). Helper now **499 lines**. | | **P3-4** | nit | `make_mock_scenario` not re-exported from `features/mocks/__init__.py` | Added import + `__all__` entry | ### Line counts after fix - `robot/helper_tdd_tag_validation.py`: **499** (was 629) - `robot/_tdd_fixture_runner.py`: **186** (new) ### Quality gates - Lint: PASS (0 violations) - Format: PASS - Typecheck: PASS (0 errors)
Merge branch 'master' into feature/m5-robot-tdd-tags
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 16s
CI / lint (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 29s
CI / unit_tests (pull_request) Has been cancelled
CI / e2e_tests (pull_request) Has been cancelled
CI / typecheck (pull_request) Has been cancelled
CI / security (pull_request) Has been cancelled
CI / integration_tests (pull_request) Has been cancelled
CI / docker (pull_request) Has been cancelled
CI / coverage (pull_request) Has been cancelled
CI / benchmark-regression (pull_request) Has been cancelled
d0ca129d90
Merge branch 'master' into feature/m5-robot-tdd-tags
Some checks failed
CI / lint (pull_request) Successful in 25s
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 40s
CI / typecheck (pull_request) Successful in 47s
CI / security (pull_request) Successful in 45s
CI / e2e_tests (pull_request) Successful in 2m4s
CI / benchmark-regression (pull_request) Failing after 1m41s
CI / unit_tests (pull_request) Successful in 5m2s
CI / integration_tests (pull_request) Successful in 5m30s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 6m9s
c13a62a2f2
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-16 23:06:42 +00:00
brent.edwards deleted branch feature/m5-robot-tdd-tags 2026-03-16 23:13:31 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!673
No description provided.