feat(testing): implement @tdd_expected_fail tag handling in Robot Framework #673

2026-03-10T08:31:34Z

hurui200320 commented

2026-03-10 08:31:34 +00:00

Summary

Implements the three-tag TDD bug-capture system in Robot Framework via a Listener v3 module, paralleling the Behave implementation in PR !665 (issue #627). This enables TDD bug-capture tests to work in Robot Framework integration tests using the same tdd_bug, tdd_bug_<N>, and tdd_expected_fail tags as specified in CONTRIBUTING.md > TDD Bug Test Tags.

Changes

Listener (robot/tdd_expected_fail_listener.py): Robot Framework Listener v3 with start_test (tag validation) and end_test (result inversion) hooks. Tests tagged tdd_expected_fail that fail have their result inverted to PASS (bug still exists); tests that unexpectedly pass are inverted to FAIL with guidance message. Includes idempotency guard against double-invocation, explicit SKIP status handling, close() hook for clean teardown, and documented data.tags vs result.tags design choice. Error message @ prefix divergence from Behave is documented.
Tag validation: tdd_bug_<N> requires tdd_bug; tdd_expected_fail requires both tdd_bug and at least one tdd_bug_<N>. Validation errors are stored in start_test and applied in end_test. Design note documents why str | None return type is used instead of ValueError.
Nox registration (noxfile.py): Added --listener flag in integration_tests, slow_integration_tests, and e2e_tests sessions using Path(__file__).parent-relative path resolution. Added --exclude tdd_fixture to prevent fixture files from being picked up by the main pabot runner.
Test fixtures (robot/fixtures/): 8 fixture .robot files covering all acceptance criteria plus SKIP handling, tdd_expected_fail-alone validation, and tdd_bug-alone edge case.
Integration tests (robot/tdd_tag_validation.robot): 21 test cases total — 12 Behave-side (from PR !665) plus 9 Robot-side that run fixtures via helper script as sub-processes with the listener active, then inspect output XML for correct final status.
Helper script (robot/helper_tdd_tag_validation.py): Unified helper supporting both Behave-side validation commands and Robot-side fixture commands. Includes _run_multi_fixture() for proving listener selectivity, output.xml existence guard, tightened substring assertions, __all__ declaration, XML message attribute fallback for RF 7.3+, and XML parsing risk documentation.
Shared mock helper (features/mocks/tdd_test_helpers.py): Extracted make_mock_scenario() from duplicated code in robot/helper_tdd_tag_validation.py and features/steps/tdd_tag_validation_steps.py. Both files now import from the shared module.
Changelog (CHANGELOG.md): Added entry for this feature.

Review Fixes (Round 4)

Rebased onto latest master — PR now contains a single commit for #628 only. Addressed all applicable findings from @hamza.khyari's code review:

#	Severity	Finding	Resolution
1	P2	Double-invocation idempotency bug	Added `_processed_tests` set guard with `close()` cleanup
2	P2	Normal Test is a sham test	Rewrote to run alongside tdd_expected_fail fixture in single invocation
3	P3	Missing output.xml guard	Added existence check with clear diagnostic
4	P3	`data.tags` vs `result.tags`	Documented intentional choice in docstring
5	P3	No SKIP test	Added `tdd_expected_fail_skip.robot` fixture + test case
6	P3	Missing message check	Added `"TDD expected failure"` assertion
7	P3	Weak substring assertions	Tightened to `"missing the required tdd_bug tag"`
8	P3	No tdd_expected_fail-alone test	Added `tdd_expected_fail_alone.robot` fixture + test
9	P3	Guidance message deviation	Kept improvements, documented rationale
10	P3	`_validation_errors` not cleared	Added `close()` hook
11	P3	Stale temp path in return	Simplified to `tuple[str, str]`
12	P3	CWD-relative listener path	Changed to `Path(__file__).parent / ...`
13	P4	Missing `__all__`	Added `__all__: list[str] = []`
14	P4	"mirroring" in docstring	Changed to "paralleling"
15	P4	XML parsing without defusedxml	Added risk acceptance comment
M-5	Medium	`_make_mock_scenario` duplicated	Extracted to `features/mocks/tdd_test_helpers.py`
M-7	Medium	XML message attribute fallback	Added `status_el.get("message")` fallback for RF 7.3+
L-6	Low	Docstring says "longname"	Fixed to "full_name"
L-7	Low	Error messages `@` prefix	Documented intentional divergence per framework convention
Nit	-	Missing tdd_bug-alone fixture	Added fixture + test case
Nit	-	`_validate_tdd_tags` return type undocumented	Added design note in docstring

Quality Gates (post-rebase onto latest master)

Gate	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)
`nox -s unit_tests`	PASS (10,815 scenarios, 381 features)
`nox -s integration_tests`	PASS (1,519 tests)
`nox -s coverage_report`	PASS (97% >= 97% threshold)

Closes #628

ISSUES CLOSED: #628

## Summary Implements the three-tag TDD bug-capture system in Robot Framework via a Listener v3 module, paralleling the Behave implementation in PR !665 (issue #627). This enables TDD bug-capture tests to work in Robot Framework integration tests using the same `tdd_bug`, `tdd_bug_<N>`, and `tdd_expected_fail` tags as specified in CONTRIBUTING.md > TDD Bug Test Tags. ### Changes - **Listener** (`robot/tdd_expected_fail_listener.py`): Robot Framework Listener v3 with `start_test` (tag validation) and `end_test` (result inversion) hooks. Tests tagged `tdd_expected_fail` that fail have their result inverted to PASS (bug still exists); tests that unexpectedly pass are inverted to FAIL with guidance message. Includes idempotency guard against double-invocation, explicit SKIP status handling, `close()` hook for clean teardown, and documented `data.tags` vs `result.tags` design choice. Error message `@` prefix divergence from Behave is documented. - **Tag validation**: `tdd_bug_<N>` requires `tdd_bug`; `tdd_expected_fail` requires both `tdd_bug` and at least one `tdd_bug_<N>`. Validation errors are stored in `start_test` and applied in `end_test`. Design note documents why `str | None` return type is used instead of `ValueError`. - **Nox registration** (`noxfile.py`): Added `--listener` flag in `integration_tests`, `slow_integration_tests`, and `e2e_tests` sessions using `Path(__file__).parent`-relative path resolution. Added `--exclude tdd_fixture` to prevent fixture files from being picked up by the main pabot runner. - **Test fixtures** (`robot/fixtures/`): 8 fixture `.robot` files covering all acceptance criteria plus SKIP handling, tdd_expected_fail-alone validation, and tdd_bug-alone edge case. - **Integration tests** (`robot/tdd_tag_validation.robot`): 21 test cases total — 12 Behave-side (from PR !665) plus 9 Robot-side that run fixtures via helper script as sub-processes with the listener active, then inspect output XML for correct final status. - **Helper script** (`robot/helper_tdd_tag_validation.py`): Unified helper supporting both Behave-side validation commands and Robot-side fixture commands. Includes `_run_multi_fixture()` for proving listener selectivity, output.xml existence guard, tightened substring assertions, `__all__` declaration, XML message attribute fallback for RF 7.3+, and XML parsing risk documentation. - **Shared mock helper** (`features/mocks/tdd_test_helpers.py`): Extracted `make_mock_scenario()` from duplicated code in `robot/helper_tdd_tag_validation.py` and `features/steps/tdd_tag_validation_steps.py`. Both files now import from the shared module. - **Changelog** (`CHANGELOG.md`): Added entry for this feature. ### Review Fixes (Round 4) Rebased onto latest master — PR now contains a single commit for #628 only. Addressed all applicable findings from @hamza.khyari's code review: | # | Severity | Finding | Resolution | |---|----------|---------|------------| | 1 | P2 | Double-invocation idempotency bug | Added `_processed_tests` set guard with `close()` cleanup | | 2 | P2 | Normal Test is a sham test | Rewrote to run alongside tdd_expected_fail fixture in single invocation | | 3 | P3 | Missing output.xml guard | Added existence check with clear diagnostic | | 4 | P3 | `data.tags` vs `result.tags` | Documented intentional choice in docstring | | 5 | P3 | No SKIP test | Added `tdd_expected_fail_skip.robot` fixture + test case | | 6 | P3 | Missing message check | Added `"TDD expected failure"` assertion | | 7 | P3 | Weak substring assertions | Tightened to `"missing the required tdd_bug tag"` | | 8 | P3 | No tdd_expected_fail-alone test | Added `tdd_expected_fail_alone.robot` fixture + test | | 9 | P3 | Guidance message deviation | Kept improvements, documented rationale | | 10 | P3 | `_validation_errors` not cleared | Added `close()` hook | | 11 | P3 | Stale temp path in return | Simplified to `tuple[str, str]` | | 12 | P3 | CWD-relative listener path | Changed to `Path(__file__).parent / ...` | | 13 | P4 | Missing `__all__` | Added `__all__: list[str] = []` | | 14 | P4 | "mirroring" in docstring | Changed to "paralleling" | | 15 | P4 | XML parsing without defusedxml | Added risk acceptance comment | | M-5 | Medium | `_make_mock_scenario` duplicated | Extracted to `features/mocks/tdd_test_helpers.py` | | M-7 | Medium | XML message attribute fallback | Added `status_el.get("message")` fallback for RF 7.3+ | | L-6 | Low | Docstring says "longname" | Fixed to "full_name" | | L-7 | Low | Error messages `@` prefix | Documented intentional divergence per framework convention | | Nit | - | Missing tdd_bug-alone fixture | Added fixture + test case | | Nit | - | `_validate_tdd_tags` return type undocumented | Added design note in docstring | ### Quality Gates (post-rebase onto latest master) | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (10,815 scenarios, 381 features) | | `nox -s integration_tests` | PASS (1,519 tests) | | `nox -s coverage_report` | PASS (97% >= 97% threshold) | Closes #628 ISSUES CLOSED: #628

hurui200320 added this to the v3.2.0 milestone 2026-03-10 08:31:42 +00:00

hurui200320 added the

Type

Feature

label 2026-03-10 08:31:52 +00:00

hurui200320 referenced this pull request

2026-03-10 08:34:37 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

hurui200320 self-assigned this 2026-03-10 09:03:22 +00:00

hurui200320 reviewed 2026-03-10 09:10:16 +00:00

hurui200320 left a comment

Self-Review: PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

This is a self-review to ensure everything looks good before asking other people to review.

Acceptance Criteria Verification (Ticket #628)

All 6 acceptance criteria are met:

Criteria	Status	Covered By
`tdd_expected_fail` tests that fail → inverted to PASS	✅	`tdd_expected_fail_listener.py:176-186`, fixture `tdd_expected_fail_fails.robot`
`tdd_expected_fail` tests that pass → inverted to FAIL with guidance	✅	`tdd_expected_fail_listener.py:187-201`, fixture `tdd_expected_fail_passes.robot`
`tdd_bug_<N>` without `tdd_bug` → validation error	✅	`tdd_expected_fail_listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot`
`tdd_expected_fail` without `tdd_bug`/`tdd_bug_<N>` → validation error	✅	`tdd_expected_fail_listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot`
`tdd_bug` usable as a filter tag (`--include tdd_bug`)	✅	Standard RF tag filtering; no custom code needed
Normal tests completely unaffected	✅	`tdd_expected_fail_listener.py:172-174`, fixture `tdd_normal_test.robot`

Issue Found

`# type: ignore[operator]` violates CONTRIBUTING.md Type Safety rules

File: robot/helper_tdd_tag_validation.py:184
Related: robot/helper_tdd_tag_validation.py:165

CONTRIBUTING.md (line 546-548) states: "never use inline comments or annotations to suppress individual type checking errors (e.g., no type: ignore)".

Line 165 types the command dict as dict[str, object], which causes handler on line 183 to be inferred as object (not callable). This forces the # type: ignore[operator] on line 184.

10 other helpers in the same directory already use the correct pattern:

from collections.abc import Callable

_COMMANDS: dict[str, Callable[[], int]] = { ... }

(e.g., helper_unified_context_models.py:152, helper_uko_ontology.py:105, helper_repo_indexing.py:170)

Fix:

Add from collections.abc import Callable to imports.
Change line 165: _COMMANDS: dict[str, Callable[[], int]] = {
Remove the # type: ignore[operator] from line 184.

Items Reviewed (No Issues)

Listener logic (tdd_expected_fail_listener.py): Correct Listener v3 API usage. Tag validation in start_test with deferred application in end_test is the right pattern. _validation_errors dict properly cleaned via .pop(). Case-insensitive tag matching via .lower() is correct. _TDD_BUG_N_RE regex (tdd_bug_\d+ with fullmatch) correctly matches issue reference tags.
Nox integration (noxfile.py): --listener and --exclude tdd_fixture correctly added to both integration_tests and slow_integration_tests sessions. Listener path resolved to absolute.
All 5 fixture files: Each has the tdd_fixture exclusion tag, proper documentation, and correctly exercises its intended scenario.
Integration test suite (tdd_tag_validation.robot): Uses common.resource for Setup Test Environment, correctly references ${PYTHON} and ${WORKSPACE}, validates each sub-command via the helper.
Helper script (helper_tdd_tag_validation.py): Sub-process isolation is correct. XML message extraction has a sensible fallback to <msg> elements.
CHANGELOG entry: Present and accurate.
Thread/process safety: With pabot each worker gets its own _validation_errors dict (separate processes). With the standard runner, tests are sequential. No concurrency issues.
Security: No user-controlled input reaches subprocess.run. All paths derived from __file__ and fixture names.
Test coverage: 5 integration test cases cover all acceptance criteria.

## Self-Review: PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` This is a self-review to ensure everything looks good before asking other people to review. --- ### Acceptance Criteria Verification (Ticket #628) All 6 acceptance criteria are met: | Criteria | Status | Covered By | |---|---|---| | `tdd_expected_fail` tests that fail → inverted to PASS | :white_check_mark: | `tdd_expected_fail_listener.py:176-186`, fixture `tdd_expected_fail_fails.robot` | | `tdd_expected_fail` tests that pass → inverted to FAIL with guidance | :white_check_mark: | `tdd_expected_fail_listener.py:187-201`, fixture `tdd_expected_fail_passes.robot` | | `tdd_bug_<N>` without `tdd_bug` → validation error | :white_check_mark: | `tdd_expected_fail_listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot` | | `tdd_expected_fail` without `tdd_bug`/`tdd_bug_<N>` → validation error | :white_check_mark: | `tdd_expected_fail_listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot` | | `tdd_bug` usable as a filter tag (`--include tdd_bug`) | :white_check_mark: | Standard RF tag filtering; no custom code needed | | Normal tests completely unaffected | :white_check_mark: | `tdd_expected_fail_listener.py:172-174`, fixture `tdd_normal_test.robot` | --- ### Issue Found #### `# type: ignore[operator]` violates CONTRIBUTING.md Type Safety rules **File:** `robot/helper_tdd_tag_validation.py:184` **Related:** `robot/helper_tdd_tag_validation.py:165` CONTRIBUTING.md (line 546-548) states: *"never use inline comments or annotations to suppress individual type checking errors (e.g., no `type: ignore`)"*. Line 165 types the command dict as `dict[str, object]`, which causes `handler` on line 183 to be inferred as `object` (not callable). This forces the `# type: ignore[operator]` on line 184. **10 other helpers in the same directory already use the correct pattern:** ```python from collections.abc import Callable _COMMANDS: dict[str, Callable[[], int]] = { ... } ``` (e.g., `helper_unified_context_models.py:152`, `helper_uko_ontology.py:105`, `helper_repo_indexing.py:170`) **Fix:** 1. Add `from collections.abc import Callable` to imports. 2. Change line 165: `_COMMANDS: dict[str, Callable[[], int]] = {` 3. Remove the `# type: ignore[operator]` from line 184. --- ### Items Reviewed (No Issues) - **Listener logic** (`tdd_expected_fail_listener.py`): Correct Listener v3 API usage. Tag validation in `start_test` with deferred application in `end_test` is the right pattern. `_validation_errors` dict properly cleaned via `.pop()`. Case-insensitive tag matching via `.lower()` is correct. `_TDD_BUG_N_RE` regex (`tdd_bug_\d+` with `fullmatch`) correctly matches issue reference tags. - **Nox integration** (`noxfile.py`): `--listener` and `--exclude tdd_fixture` correctly added to both `integration_tests` and `slow_integration_tests` sessions. Listener path resolved to absolute. - **All 5 fixture files**: Each has the `tdd_fixture` exclusion tag, proper documentation, and correctly exercises its intended scenario. - **Integration test suite** (`tdd_tag_validation.robot`): Uses `common.resource` for `Setup Test Environment`, correctly references `${PYTHON}` and `${WORKSPACE}`, validates each sub-command via the helper. - **Helper script** (`helper_tdd_tag_validation.py`): Sub-process isolation is correct. XML message extraction has a sensible fallback to `<msg>` elements. - **CHANGELOG entry**: Present and accurate. - **Thread/process safety**: With pabot each worker gets its own `_validation_errors` dict (separate processes). With the standard runner, tests are sequential. No concurrency issues. - **Security**: No user-controlled input reaches `subprocess.run`. All paths derived from `__file__` and fixture names. - **Test coverage**: 5 integration test cases cover all acceptance criteria.

robot/helper_tdd_tag_validation.py Outdated

						
				@@ -0,0 +162,4 @@

				    return 1

				_COMMANDS: dict[str, object] = {

hurui200320 commented

2026-03-10 09:10:16 +00:00

This should be dict[str, Callable[[], int]] (with from collections.abc import Callable added to imports). Using object loses the callable type information and forces the # type: ignore[operator] on line 184. 10 other helpers in this directory already use the Callable[[], int] pattern (e.g., helper_unified_context_models.py:152).

This should be `dict[str, Callable[[], int]]` (with `from collections.abc import Callable` added to imports). Using `object` loses the callable type information and forces the `# type: ignore[operator]` on line 184. 10 other helpers in this directory already use the `Callable[[], int]` pattern (e.g., `helper_unified_context_models.py:152`).

hurui200320 marked this conversation as resolved

robot/helper_tdd_tag_validation.py Outdated

						
				@@ -0,0 +181,4 @@

				        sys.exit(1)

				    handler = _COMMANDS[sys.argv[1]]

				    sys.exit(handler())  # type: ignore[operator]

hurui200320 commented

2026-03-10 09:10:16 +00:00

Remove this # type: ignore[operator] — it violates CONTRIBUTING.md Type Safety rules (line 546-548). Once the _COMMANDS type on line 165 is fixed to dict[str, Callable[[], int]], this suppression is no longer needed.

Remove this `# type: ignore[operator]` — it violates CONTRIBUTING.md Type Safety rules (line 546-548). Once the `_COMMANDS` type on line 165 is fixed to `dict[str, Callable[[], int]]`, this suppression is no longer needed.

hurui200320 marked this conversation as resolved

hurui200320 force-pushed feature/m5-robot-tdd-tags from b882b5e208 to f72a981f54

2026-03-10 09:14:22 +00:00

Compare

hurui200320 commented

2026-03-10 09:16:50 +00:00

Self-review fix applied (force-pushed f72a981f):

Fixed the # type: ignore[operator] violation in robot/helper_tdd_tag_validation.py identified in the self-review:

Added from collections.abc import Callable to imports (line 42)
Changed _COMMANDS: dict[str, object] → _COMMANDS: dict[str, Callable[[], int]] (line 166)
Removed the # type: ignore[operator] from sys.exit(handler()) (line 185)

This follows the same pattern used by 10+ other helper scripts in the robot/ directory (e.g., helper_unified_context_models.py:152, helper_uko_ontology.py:105).

Verified locally:

nox -s lint → PASS (0 violations)
nox -s typecheck → PASS (0 errors, 0 warnings)

CI is running on the updated commit.

**Self-review fix applied** (force-pushed `f72a981f`): Fixed the `# type: ignore[operator]` violation in `robot/helper_tdd_tag_validation.py` identified in the self-review: 1. Added `from collections.abc import Callable` to imports (line 42) 2. Changed `_COMMANDS: dict[str, object]` → `_COMMANDS: dict[str, Callable[[], int]]` (line 166) 3. Removed the `# type: ignore[operator]` from `sys.exit(handler())` (line 185) This follows the same pattern used by 10+ other helper scripts in the `robot/` directory (e.g., `helper_unified_context_models.py:152`, `helper_uko_ontology.py:105`). Verified locally: - `nox -s lint` → PASS (0 violations) - `nox -s typecheck` → PASS (0 errors, 0 warnings) CI is running on the updated commit.

hurui200320 reviewed 2026-03-10 10:01:56 +00:00

hurui200320 left a comment

Self-Review (Round 2): PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

This is a self-review to ensure everything looks good before asking other people to review.

Previous Review Issues — All Resolved

The round-1 self-review (review #2091) identified two issues. Both are confirmed fixed in f72a981f:

dict[str, object] → dict[str, Callable[[], int]] at helper_tdd_tag_validation.py:166 — Fixed
# type: ignore[operator] removed at helper_tdd_tag_validation.py:185 — Fixed

Acceptance Criteria (Ticket #628) — All Met

Criteria	Status	Covered By
`tdd_expected_fail` + fail → PASS	✅	`listener.py:176-186`, fixture `tdd_expected_fail_fails.robot`
`tdd_expected_fail` + pass → FAIL with guidance	✅	`listener.py:187-201`, fixture `tdd_expected_fail_passes.robot`
`tdd_bug_<N>` without `tdd_bug` → validation error	✅	`listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot`
`tdd_expected_fail` without prerequisites → validation error	✅	`listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot`
`tdd_bug` usable as filter	✅	Inherent RF tag filtering
Normal tests unaffected	✅	`listener.py:172-174`, fixture `tdd_normal_test.robot`

Guidance message matches ticket requirement exactly.

New Finding

`end_test` bare `else` catches SKIP status incorrectly (Medium)

File: robot/tdd_expected_fail_listener.py:187

See inline comment below. Robot Framework 4.0+ supports SKIP status. The current bare else: at line 187 catches any non-FAIL status (PASS, SKIP, etc.), so a skipped tdd_expected_fail test would be incorrectly reported as "Bug appears to be fixed." Should be changed to elif result.status == "PASS": with an explicit else: for other statuses.

Likelihood: Low — TDD tests shouldn't normally be skipped. But since this is long-lived infrastructure code, defensive handling is warranted.

Items Reviewed (No Issues)

Listener logic: Correct Listener v3 API. Tag validation/result inversion split is the right pattern. _validation_errors dict with .pop() cleanup. Case-insensitive matching. Regex with fullmatch correctly distinguishes tdd_bug from tdd_bug_<N>.
Nox integration: --listener and --exclude tdd_fixture correctly added to both integration_tests and slow_integration_tests. Listener path resolved to absolute.
All 5 fixtures: Correct tags, documentation, and scenario exercise.
Integration tests (tdd_tag_validation.robot): Correct setup, variable usage, and assertions.
Helper script: Subprocess isolation, XML parsing with fallback, proper typing.
CHANGELOG: Present and accurate.
Thread/process safety: pabot = separate processes, no concurrency issues.
Security: No user-controlled input in subprocess calls.
Type safety: No # type: ignore. All functions annotated.
Test coverage: 5 integration tests covering all 6 acceptance criteria.

## Self-Review (Round 2): PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` This is a self-review to ensure everything looks good before asking other people to review. --- ### Previous Review Issues — All Resolved The round-1 self-review (review #2091) identified two issues. Both are confirmed **fixed** in `f72a981f`: 1. `dict[str, object]` → `dict[str, Callable[[], int]]` at `helper_tdd_tag_validation.py:166` — **Fixed** 2. `# type: ignore[operator]` removed at `helper_tdd_tag_validation.py:185` — **Fixed** --- ### Acceptance Criteria (Ticket #628) — All Met | Criteria | Status | Covered By | |---|---|---| | `tdd_expected_fail` + fail → PASS | ✅ | `listener.py:176-186`, fixture `tdd_expected_fail_fails.robot` | | `tdd_expected_fail` + pass → FAIL with guidance | ✅ | `listener.py:187-201`, fixture `tdd_expected_fail_passes.robot` | | `tdd_bug_<N>` without `tdd_bug` → validation error | ✅ | `listener.py:88-94`, fixture `tdd_missing_tdd_bug.robot` | | `tdd_expected_fail` without prerequisites → validation error | ✅ | `listener.py:96-108`, fixture `tdd_expected_fail_missing_bug_n.robot` | | `tdd_bug` usable as filter | ✅ | Inherent RF tag filtering | | Normal tests unaffected | ✅ | `listener.py:172-174`, fixture `tdd_normal_test.robot` | Guidance message matches ticket requirement exactly. --- ### New Finding #### `end_test` bare `else` catches SKIP status incorrectly (Medium) **File:** `robot/tdd_expected_fail_listener.py:187` See inline comment below. Robot Framework 4.0+ supports `SKIP` status. The current bare `else:` at line 187 catches any non-FAIL status (PASS, SKIP, etc.), so a skipped `tdd_expected_fail` test would be incorrectly reported as "Bug appears to be fixed." Should be changed to `elif result.status == "PASS":` with an explicit `else:` for other statuses. **Likelihood:** Low — TDD tests shouldn't normally be skipped. But since this is long-lived infrastructure code, defensive handling is warranted. --- ### Items Reviewed (No Issues) - **Listener logic**: Correct Listener v3 API. Tag validation/result inversion split is the right pattern. `_validation_errors` dict with `.pop()` cleanup. Case-insensitive matching. Regex with `fullmatch` correctly distinguishes `tdd_bug` from `tdd_bug_<N>`. - **Nox integration**: `--listener` and `--exclude tdd_fixture` correctly added to both `integration_tests` and `slow_integration_tests`. Listener path resolved to absolute. - **All 5 fixtures**: Correct tags, documentation, and scenario exercise. - **Integration tests** (`tdd_tag_validation.robot`): Correct setup, variable usage, and assertions. - **Helper script**: Subprocess isolation, XML parsing with fallback, proper typing. - **CHANGELOG**: Present and accurate. - **Thread/process safety**: pabot = separate processes, no concurrency issues. - **Security**: No user-controlled input in subprocess calls. - **Type safety**: No `# type: ignore`. All functions annotated. - **Test coverage**: 5 integration tests covering all 6 acceptance criteria.

robot/tdd_expected_fail_listener.py Outdated

						
				@@ -0,0 +184,4 @@

				        result.message = (

				            "TDD expected failure: test failed as expected (bug still exists)."

				        )

				    else:

hurui200320 commented

2026-03-10 10:01:56 +00:00

This bare else: catches any non-FAIL status, including SKIP (RF 4.0+). If a tdd_expected_fail test is skipped (e.g., Skip keyword, --skip tag, skip_on_failure), it would incorrectly be reported as "Bug appears to be fixed."

Suggestion: Change to explicit status matching:

    elif result.status == "PASS":
        # Unexpected pass — ...
        ...
    else:
        # SKIP or other statuses — leave unchanged
        _logger.debug(
            "TDD expected-fail test '%s' has status %s — leaving unchanged.",
            full_name,
            result.status,
        )

This bare `else:` catches any non-FAIL status, including `SKIP` (RF 4.0+). If a `tdd_expected_fail` test is skipped (e.g., `Skip` keyword, `--skip` tag, `skip_on_failure`), it would incorrectly be reported as "Bug appears to be fixed." **Suggestion:** Change to explicit status matching: ```python elif result.status == "PASS": # Unexpected pass — ... ... else: # SKIP or other statuses — leave unchanged _logger.debug( "TDD expected-fail test '%s' has status %s — leaving unchanged.", full_name, result.status, ) ```

hurui200320 force-pushed feature/m5-robot-tdd-tags from f72a981f54 to f528859e2c

2026-03-10 10:04:15 +00:00

Compare

hurui200320 commented

2026-03-10 10:04:25 +00:00

Self-review round 2 fix applied (force-pushed f528859e):

Fixed the bare else catching SKIP status in robot/tdd_expected_fail_listener.py (review #2093):

Changed else: → elif result.status == "PASS": at line 187 — now only catches actual passes
Added explicit else: clause (line 202) for SKIP and other statuses — logs a debug message and leaves the result unchanged
Updated end_test docstring to document the SKIP handling behavior

Rationale: Robot Framework 4.0+ supports SKIP status via Skip keyword, --skip tag, and skip_on_failure. The previous bare else: would incorrectly report a skipped tdd_expected_fail test as "Bug appears to be fixed." A skipped test is not evidence of a fix — it simply didn't execute.

Verified locally:

nox -s lint → PASS (0 violations)
nox -s typecheck → PASS (0 errors, 0 warnings)

**Self-review round 2 fix applied** (force-pushed `f528859e`): Fixed the bare `else` catching SKIP status in `robot/tdd_expected_fail_listener.py` (review #2093): 1. Changed `else:` → `elif result.status == "PASS":` at line 187 — now only catches actual passes 2. Added explicit `else:` clause (line 202) for SKIP and other statuses — logs a debug message and leaves the result unchanged 3. Updated `end_test` docstring to document the SKIP handling behavior **Rationale:** Robot Framework 4.0+ supports `SKIP` status via `Skip` keyword, `--skip` tag, and `skip_on_failure`. The previous bare `else:` would incorrectly report a skipped `tdd_expected_fail` test as "Bug appears to be fixed." A skipped test is not evidence of a fix — it simply didn't execute. Verified locally: - `nox -s lint` → PASS (0 violations) - `nox -s typecheck` → PASS (0 errors, 0 warnings)

freemo referenced this pull request

2026-03-11 05:42:11 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

freemo commented

2026-03-11 05:51:06 +00:00

PM Status (Day 31):

Good work on the self-review fixes @hurui200320.

Status: Merge conflict detected. 4 comments (self-review).

Action required: Rebase against develop and request external review (suggest @brent.edwards or @CoreRasurae).

Priority: HIGH — this is part of the TDD infrastructure chain (#628) that unblocks the CI quality gate.

**PM Status (Day 31)**: Good work on the self-review fixes @hurui200320. **Status**: Merge conflict detected. 4 comments (self-review). **Action required**: Rebase against `develop` and request external review (suggest @brent.edwards or @CoreRasurae). **Priority**: HIGH — this is part of the TDD infrastructure chain (#628) that unblocks the CI quality gate.

hurui200320 added a new dependency 2026-03-11 05:54:32 +00:00

#628 Implement @tdd_expected_fail tag handling in Robot Framework

hurui200320 force-pushed feature/m5-robot-tdd-tags from f528859e2c to df74aee449

2026-03-11 06:20:34 +00:00

Compare

hurui200320 referenced this pull request

2026-03-11 06:21:04 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

freemo added the

labels 2026-03-11 18:15:29 +00:00

freemo commented

2026-03-11 18:17:58 +00:00

PM Review — Day 31 (Specification Update)

This PR is mergeable.

Spec Alignment Check

Robot Framework TDD tag infrastructure is NOT impacted by protocol or TUI changes.

Status

Good self-review fixes (type: ignore violation, SKIP status handling)
Needs external review
HIGH priority — part of TDD infrastructure chain

Action Required

@hurui200320 — Request review from @brent.edwards or @CoreRasurae. This is HIGH priority TDD infrastructure.

## PM Review — Day 31 (Specification Update) This PR is **mergeable**. ### Spec Alignment Check Robot Framework TDD tag infrastructure is NOT impacted by protocol or TUI changes. ### Status - Good self-review fixes (type: ignore violation, SKIP status handling) - Needs external review - HIGH priority — part of TDD infrastructure chain ### Action Required @hurui200320 — Request review from @brent.edwards or @CoreRasurae. This is HIGH priority TDD infrastructure.

freemo referenced this pull request

2026-03-11 20:25:36 +00:00

Implement @tdd_expected_fail tag handling in Behave environment #627

freemo referenced this pull request

2026-03-11 20:25:40 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

freemo referenced this pull request

2026-03-11 20:25:46 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

freemo commented

2026-03-11 20:29:02 +00:00

PM Status — Day 31 (2026-03-11)

PR is rebased and clean (no merge conflicts). All quality gates pass. @hurui200320 posted a thorough self-review.

Needs: An external reviewer. @CoreRasurae — given your work on the Behave counterpart (#627/PR #665), you'd be the ideal reviewer for consistency. Alternatively, @brent.edwards has TDD workflow experience.

Priority: Tier 1 — this is TDD infrastructure that unblocks the bug pipeline. Please review within 1 business day.

## PM Status — Day 31 (2026-03-11) PR is rebased and clean (no merge conflicts). All quality gates pass. @hurui200320 posted a thorough self-review. **Needs:** An external reviewer. @CoreRasurae — given your work on the Behave counterpart (#627/PR #665), you'd be the ideal reviewer for consistency. Alternatively, @brent.edwards has TDD workflow experience. **Priority:** Tier 1 — this is TDD infrastructure that unblocks the bug pipeline. Please review within 1 business day.

hurui200320 force-pushed feature/m5-robot-tdd-tags from df74aee449 to c383623eb7

2026-03-12 02:38:54 +00:00

Compare

hamza.khyari commented

2026-03-12 13:08:43 +00:00

Code Review -- PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

Reviewer: hamza.khyari | Protocol: 9-phase review (2 rounds, 6 independent passes)

Summary

Well-structured implementation. The listener correctly implements all 6 acceptance criteria from issue #628. The fixture-based subprocess test architecture is sound and mirrors real-world usage. Code quality is high -- clean separation between validation (start_test) and inversion (end_test), proper regex compilation, and thorough docstrings.

Two rounds of review (4 focused passes + 2 full-protocol passes) produced the findings below. 2 are should-fix, the rest are minor or nits.

Findings

#	Severity	Category	File	Finding
1	P2:should-fix	BUG	`tdd_expected_fail_listener.py`	Double-invocation idempotency bug
2	P2:should-fix	TEST	`tdd_tag_validation.robot`	"Normal Test Unaffected" is a sham test
3	P3:nit	BUG	`helper_tdd_tag_validation.py`	`_run_fixture` does not guard against missing `output.xml`
4	P3:nit	BUG	`tdd_expected_fail_listener.py`	`end_test` reads `data.tags` instead of `result.tags`
5	P3:nit	TEST	`tdd_tag_validation.robot`	No test for SKIP status branch
6	P3:nit	TEST	`helper_tdd_tag_validation.py`	`cmd_expected_fail_inverted` never checks message content
7	P3:nit	TEST	`helper_tdd_tag_validation.py`	Weak substring assertions in validation checks
8	P3:nit	TEST	`tdd_tag_validation.robot`	No test for `tdd_expected_fail` alone (both companions missing)
9	P3:nit	SPEC	`tdd_expected_fail_listener.py`	Guidance message deviates from AC #2 exact wording
10	P3:nit	CODE	`tdd_expected_fail_listener.py`	`_validation_errors` never cleared on abnormal exit
11	P3:nit	CODE	`helper_tdd_tag_validation.py`	`_run_fixture` returns stale temp file path
12	P3:nit	CODE	`noxfile.py`	Listener path resolution is CWD-relative
13	P4:nit	CODE	`helper_tdd_tag_validation.py`	No `__all__` declaration
14	P4:nit	CODE	`tdd_expected_fail_listener.py`	Docstring says "mirroring" but messages differ
15	P4:nit	SEC	`helper_tdd_tag_validation.py`	`xml.etree.ElementTree` without `defusedxml`

1. Double-invocation idempotency bug

Severity: P2:should-fix | Category: BUG
File: robot/tdd_expected_fail_listener.py (end_test)

If the listener is loaded twice (--listener specified twice, or passed via both noxfile and user args), both invocations share the same module-level state. The first end_test inverts FAIL->PASS; the second sees PASS and inverts it back to FAIL with the "Bug appears to be fixed" guidance message -- a silent double-inversion producing the wrong result.

Recommendation: Add an idempotency guard:

_processed_tests: set[str] = set()

def end_test(data, result):
    full_name = result.full_name
    if full_name in _processed_tests:
        return
    _processed_tests.add(full_name)
    ...

Clear it in a close() hook alongside _validation_errors.

2. "Normal Test Unaffected" is a sham test

Severity: P2:should-fix | Category: TEST
File: robot/helper_tdd_tag_validation.py:133-141, robot/fixtures/tdd_normal_test.robot

The fixture contains Should Be True ${TRUE} which passes unconditionally. The helper checks status == "PASS". This test produces the same PASS result with or without the listener loaded -- even if the listener file is empty or has an import error. It does not prove the listener was active and chose not to modify the result.

Recommendation: Either (a) run the normal-test fixture in the same suite as a tdd_expected_fail fixture to prove the listener is loaded and selectively applies, or (b) check Robot's console output for the listener loading message.

3. `_run_fixture` does not guard against missing `output.xml`

Severity: P3:nit | Category: BUG
File: robot/helper_tdd_tag_validation.py:81

If robot fails to start (import error, bad listener path, syntax error), output.xml is never created. ET.parse(out_xml) raises an opaque FileNotFoundError. Since this is test infrastructure, a clear diagnostic would help debugging.

Recommendation:

proc = subprocess.run(cmd, capture_output=True, text=True)
if not Path(out_xml).exists():
    print(f"ERROR: Robot did not produce output.xml (rc={proc.returncode})", file=sys.stderr)
    print(f"stderr: {proc.stderr}", file=sys.stderr)
    sys.exit(1)

4. `end_test` reads `data.tags` instead of `result.tags`

Severity: P3:nit | Category: BUG
File: robot/tdd_expected_fail_listener.py:174

The old listener on master read from result.tags (which reflects runtime Set Tags/Remove Tags modifications). The new listener reads data.tags (static test definition). If any test dynamically adds tdd_expected_fail at runtime, the new listener won't see it. If this is intentional (TDD tags should be static), add a comment explaining the choice.

5. No test for SKIP status branch

Severity: P3:nit | Category: TEST
File: robot/tdd_expected_fail_listener.py:192-199

end_test has an explicit else branch for SKIP that leaves the result unchanged. This branch has zero test coverage. A regression would go undetected.

Recommendation: Add a fixture with [Tags] tdd_bug tdd_bug_777 tdd_expected_fail tdd_fixture that calls Skip msg=Skipping intentionally, and verify the result stays SKIP.

6. `cmd_expected_fail_inverted` never checks message content

Severity: P3:nit | Category: TEST
File: robot/helper_tdd_tag_validation.py:78-86

This function only checks status == "PASS". It does not verify the message is "TDD expected failure: test failed as expected (bug still exists).". A bug that inverts to PASS with a wrong or empty message goes undetected. By contrast, cmd_unexpected_pass_inverted correctly checks for "Bug appears to be fixed".

Recommendation: if status == "PASS" and "TDD expected failure" in message:

7. Weak substring assertions in validation checks

Severity: P3:nit | Category: TEST
File: robot/helper_tdd_tag_validation.py:100,115

"tdd_bug" in message.lower() matches any message incidentally containing the substring. "Bug appears to be fixed" in message would match a truncated message. Tighten to specific error text, e.g. "missing the required tdd_bug tag" and "Remove the tdd_expected_fail tag".

8. No test for `tdd_expected_fail` alone (both companions missing)

Severity: P3:nit | Category: TEST
File: robot/tdd_tag_validation.robot

AC #4 covers tdd_expected_fail without tdd_bug or tdd_bug_<N>. The existing fixture tdd_expected_fail_missing_bug_n.robot tests missing tdd_bug_<N> only (while tdd_bug is present). The branch where both are missing (missing: ["tdd_bug", "tdd_bug_<N>"]) is untested.

9. Guidance message deviates from AC #2 exact wording

Severity: P3:nit | Category: SPEC
File: robot/tdd_expected_fail_listener.py:183-188

AC #2 specifies: "Bug appears to be fixed. Remove the tdd_expected_fail tag and verify the fix through the bug fix workflow."

Implementation adds "from this test" and appends "See CONTRIBUTING.md > Bug Fix Workflow." -- arguably improvements, but not spec-verbatim. Also note the Behave implementation uses a different message format entirely.

10. `_validation_errors` never cleared on abnormal exit

Severity: P3:nit | Category: CODE
File: robot/tdd_expected_fail_listener.py:64

If start_test fires but end_test never fires (Robot crash, SIGINT), stale entries remain. In normal RF execution this is harmless (process exits), but a close() hook would make the listener reusable in long-lived processes:

def close() -> None:
    _validation_errors.clear()

11. `_run_fixture` returns stale temp file path

Severity: P3:nit | Category: CODE
File: robot/helper_tdd_tag_validation.py:96

Returns (status, message, out_xml) where out_xml points inside an already-deleted TemporaryDirectory. All callers discard it via _. Simplify to tuple[str, str].

12. Listener path resolution is CWD-relative

Severity: P3:nit | Category: CODE
File: noxfile.py

Path("robot/tdd_expected_fail_listener.py").resolve() resolves relative to CWD. If nox is invoked from a different directory, the path breaks. Consider Path(__file__).parent / "robot/..." instead.

13. No `all` declaration

Severity: P4:nit | Category: CODE
File: robot/helper_tdd_tag_validation.py

Other helpers in robot/ (e.g., helper_cross_plan_correction.py) define __all__: list[str] = []. This helper doesn't -- minor inconsistency.

14. Docstring says "mirroring" but messages differ

Severity: P4:nit | Category: CODE
File: robot/tdd_expected_fail_listener.py:1

Module docstring references "mirroring the Behave implementation" but the error message formats differ significantly between the two. Consider "paralleling" instead.

15. XML parsing without `defusedxml`

Severity: P4:nit | Category: SEC
File: robot/helper_tdd_tag_validation.py:63

xml.etree.ElementTree.parse() is vulnerable to billion-laughs entity expansion. The XML is self-generated by Robot Framework so the risk is negligible. Either use defusedxml or add a comment documenting the accepted risk.

Process Checks (all pass)

Commit message matches issue Metadata
ISSUES CLOSED: #628 footer present
CHANGELOG updated under Unreleased
All files belong to this feature
No generated artifacts or secrets
File sizes well under 500 lines
No # type: ignore in new source
print() usage appropriate (CLI test helper, not production code)

## Code Review -- PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer**: hamza.khyari | **Protocol**: 9-phase review (2 rounds, 6 independent passes) ### Summary Well-structured implementation. The listener correctly implements all 6 acceptance criteria from issue #628. The fixture-based subprocess test architecture is sound and mirrors real-world usage. Code quality is high -- clean separation between validation (`start_test`) and inversion (`end_test`), proper regex compilation, and thorough docstrings. Two rounds of review (4 focused passes + 2 full-protocol passes) produced the findings below. 2 are should-fix, the rest are minor or nits. ### Findings | # | Severity | Category | File | Finding | |---|----------|----------|------|---------| | 1 | **P2:should-fix** | BUG | `tdd_expected_fail_listener.py` | Double-invocation idempotency bug | | 2 | **P2:should-fix** | TEST | `tdd_tag_validation.robot` | "Normal Test Unaffected" is a sham test | | 3 | P3:nit | BUG | `helper_tdd_tag_validation.py` | `_run_fixture` does not guard against missing `output.xml` | | 4 | P3:nit | BUG | `tdd_expected_fail_listener.py` | `end_test` reads `data.tags` instead of `result.tags` | | 5 | P3:nit | TEST | `tdd_tag_validation.robot` | No test for SKIP status branch | | 6 | P3:nit | TEST | `helper_tdd_tag_validation.py` | `cmd_expected_fail_inverted` never checks message content | | 7 | P3:nit | TEST | `helper_tdd_tag_validation.py` | Weak substring assertions in validation checks | | 8 | P3:nit | TEST | `tdd_tag_validation.robot` | No test for `tdd_expected_fail` alone (both companions missing) | | 9 | P3:nit | SPEC | `tdd_expected_fail_listener.py` | Guidance message deviates from AC #2 exact wording | | 10 | P3:nit | CODE | `tdd_expected_fail_listener.py` | `_validation_errors` never cleared on abnormal exit | | 11 | P3:nit | CODE | `helper_tdd_tag_validation.py` | `_run_fixture` returns stale temp file path | | 12 | P3:nit | CODE | `noxfile.py` | Listener path resolution is CWD-relative | | 13 | P4:nit | CODE | `helper_tdd_tag_validation.py` | No `__all__` declaration | | 14 | P4:nit | CODE | `tdd_expected_fail_listener.py` | Docstring says "mirroring" but messages differ | | 15 | P4:nit | SEC | `helper_tdd_tag_validation.py` | `xml.etree.ElementTree` without `defusedxml` | --- #### 1. Double-invocation idempotency bug **Severity**: P2:should-fix | **Category**: BUG **File**: `robot/tdd_expected_fail_listener.py` (end_test) If the listener is loaded twice (`--listener` specified twice, or passed via both noxfile and user args), both invocations share the same module-level state. The first `end_test` inverts FAIL->PASS; the second sees PASS and inverts it back to FAIL with the "Bug appears to be fixed" guidance message -- a silent double-inversion producing the wrong result. **Recommendation**: Add an idempotency guard: ```python _processed_tests: set[str] = set() def end_test(data, result): full_name = result.full_name if full_name in _processed_tests: return _processed_tests.add(full_name) ... ``` Clear it in a `close()` hook alongside `_validation_errors`. --- #### 2. "Normal Test Unaffected" is a sham test **Severity**: P2:should-fix | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:133-141`, `robot/fixtures/tdd_normal_test.robot` The fixture contains `Should Be True ${TRUE}` which passes unconditionally. The helper checks `status == "PASS"`. This test produces the same PASS result with or without the listener loaded -- even if the listener file is empty or has an import error. It does not prove the listener was active and chose not to modify the result. **Recommendation**: Either (a) run the normal-test fixture in the same suite as a `tdd_expected_fail` fixture to prove the listener is loaded and selectively applies, or (b) check Robot's console output for the listener loading message. --- #### 3. `_run_fixture` does not guard against missing `output.xml` **Severity**: P3:nit | **Category**: BUG **File**: `robot/helper_tdd_tag_validation.py:81` If `robot` fails to start (import error, bad listener path, syntax error), `output.xml` is never created. `ET.parse(out_xml)` raises an opaque `FileNotFoundError`. Since this is test infrastructure, a clear diagnostic would help debugging. **Recommendation**: ```python proc = subprocess.run(cmd, capture_output=True, text=True) if not Path(out_xml).exists(): print(f"ERROR: Robot did not produce output.xml (rc={proc.returncode})", file=sys.stderr) print(f"stderr: {proc.stderr}", file=sys.stderr) sys.exit(1) ``` --- #### 4. `end_test` reads `data.tags` instead of `result.tags` **Severity**: P3:nit | **Category**: BUG **File**: `robot/tdd_expected_fail_listener.py:174` The old listener on master read from `result.tags` (which reflects runtime `Set Tags`/`Remove Tags` modifications). The new listener reads `data.tags` (static test definition). If any test dynamically adds `tdd_expected_fail` at runtime, the new listener won't see it. If this is intentional (TDD tags should be static), add a comment explaining the choice. --- #### 5. No test for SKIP status branch **Severity**: P3:nit | **Category**: TEST **File**: `robot/tdd_expected_fail_listener.py:192-199` `end_test` has an explicit `else` branch for SKIP that leaves the result unchanged. This branch has zero test coverage. A regression would go undetected. **Recommendation**: Add a fixture with `[Tags] tdd_bug tdd_bug_777 tdd_expected_fail tdd_fixture` that calls `Skip msg=Skipping intentionally`, and verify the result stays SKIP. --- #### 6. `cmd_expected_fail_inverted` never checks message content **Severity**: P3:nit | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:78-86` This function only checks `status == "PASS"`. It does not verify the message is `"TDD expected failure: test failed as expected (bug still exists)."`. A bug that inverts to PASS with a wrong or empty message goes undetected. By contrast, `cmd_unexpected_pass_inverted` correctly checks for `"Bug appears to be fixed"`. **Recommendation**: `if status == "PASS" and "TDD expected failure" in message:` --- #### 7. Weak substring assertions in validation checks **Severity**: P3:nit | **Category**: TEST **File**: `robot/helper_tdd_tag_validation.py:100,115` `"tdd_bug" in message.lower()` matches any message incidentally containing the substring. `"Bug appears to be fixed" in message` would match a truncated message. Tighten to specific error text, e.g. `"missing the required tdd_bug tag"` and `"Remove the tdd_expected_fail tag"`. --- #### 8. No test for `tdd_expected_fail` alone (both companions missing) **Severity**: P3:nit | **Category**: TEST **File**: `robot/tdd_tag_validation.robot` AC #4 covers `tdd_expected_fail` without `tdd_bug` or `tdd_bug_<N>`. The existing fixture `tdd_expected_fail_missing_bug_n.robot` tests missing `tdd_bug_<N>` only (while `tdd_bug` is present). The branch where both are missing (`missing: ["tdd_bug", "tdd_bug_<N>"]`) is untested. --- #### 9. Guidance message deviates from AC #2 exact wording **Severity**: P3:nit | **Category**: SPEC **File**: `robot/tdd_expected_fail_listener.py:183-188` AC #2 specifies: *"Bug appears to be fixed. Remove the tdd_expected_fail tag and verify the fix through the bug fix workflow."* Implementation adds *"from this test"* and appends *"See CONTRIBUTING.md > Bug Fix Workflow."* -- arguably improvements, but not spec-verbatim. Also note the Behave implementation uses a different message format entirely. --- #### 10. `_validation_errors` never cleared on abnormal exit **Severity**: P3:nit | **Category**: CODE **File**: `robot/tdd_expected_fail_listener.py:64` If `start_test` fires but `end_test` never fires (Robot crash, SIGINT), stale entries remain. In normal RF execution this is harmless (process exits), but a `close()` hook would make the listener reusable in long-lived processes: ```python def close() -> None: _validation_errors.clear() ``` --- #### 11. `_run_fixture` returns stale temp file path **Severity**: P3:nit | **Category**: CODE **File**: `robot/helper_tdd_tag_validation.py:96` Returns `(status, message, out_xml)` where `out_xml` points inside an already-deleted `TemporaryDirectory`. All callers discard it via `_`. Simplify to `tuple[str, str]`. --- #### 12. Listener path resolution is CWD-relative **Severity**: P3:nit | **Category**: CODE **File**: `noxfile.py` `Path("robot/tdd_expected_fail_listener.py").resolve()` resolves relative to CWD. If nox is invoked from a different directory, the path breaks. Consider `Path(__file__).parent / "robot/..."` instead. --- #### 13. No `__all__` declaration **Severity**: P4:nit | **Category**: CODE **File**: `robot/helper_tdd_tag_validation.py` Other helpers in `robot/` (e.g., `helper_cross_plan_correction.py`) define `__all__: list[str] = []`. This helper doesn't -- minor inconsistency. --- #### 14. Docstring says "mirroring" but messages differ **Severity**: P4:nit | **Category**: CODE **File**: `robot/tdd_expected_fail_listener.py:1` Module docstring references "mirroring the Behave implementation" but the error message formats differ significantly between the two. Consider "paralleling" instead. --- #### 15. XML parsing without `defusedxml` **Severity**: P4:nit | **Category**: SEC **File**: `robot/helper_tdd_tag_validation.py:63` `xml.etree.ElementTree.parse()` is vulnerable to billion-laughs entity expansion. The XML is self-generated by Robot Framework so the risk is negligible. Either use `defusedxml` or add a comment documenting the accepted risk. --- ### Process Checks (all pass) - Commit message matches issue Metadata - `ISSUES CLOSED: #628` footer present - CHANGELOG updated under Unreleased - All files belong to this feature - No generated artifacts or secrets - File sizes well under 500 lines - No `# type: ignore` in new source - `print()` usage appropriate (CLI test helper, not production code)

freemo commented

2026-03-12 20:46:16 +00:00

PM Status — Day 32

Status: CONFLICTED — needs rebase. HIGH PRIORITY — TDD infrastructure.

PR: Robot Framework @tdd_expected_fail tag handler. M3 (v3.2.0), severely overdue (due Feb 26). Author: @hurui200320. 8 comments, has existing reviews.

Context: This is the Robot Framework counterpart to PR #665 (Behave TDD handler). Both are needed for the mandatory TDD bug workflow. PR #665 has a supersession question pending CoreRasurae's response by Day 33.

Action Required:

@hurui200320: Rebase onto current master to resolve conflicts.
This PR needs to be coordinated with PR #665's outcome — if #665's Behave handler is confirmed to not supersede this Robot handler, both should proceed.
Labels need update: has Priority/High, State/Unverified, Type/Feature. Should be State/Verified (active work with branch). Missing MoSCoW (Must have), Points (8).

## PM Status — Day 32 **Status**: CONFLICTED — needs rebase. **HIGH PRIORITY** — TDD infrastructure. **PR**: Robot Framework `@tdd_expected_fail` tag handler. M3 (v3.2.0), severely overdue (due Feb 26). Author: @hurui200320. 8 comments, has existing reviews. **Context**: This is the Robot Framework counterpart to PR #665 (Behave TDD handler). Both are needed for the mandatory TDD bug workflow. PR #665 has a supersession question pending CoreRasurae's response by Day 33. **Action Required**: 1. **@hurui200320**: Rebase onto current master to resolve conflicts. 2. This PR needs to be coordinated with PR #665's outcome — if #665's Behave handler is confirmed to not supersede this Robot handler, both should proceed. 3. Labels need update: has Priority/High, State/Unverified, Type/Feature. Should be State/Verified (active work with branch). Missing MoSCoW (Must have), Points (8).

freemo added the

labels 2026-03-12 20:57:14 +00:00

hurui200320 force-pushed feature/m5-robot-tdd-tags from c383623eb7 to b1e45b490e

2026-03-13 07:11:28 +00:00

Compare

hurui200320 referenced this issue from a commit

2026-03-13 07:11:28 +00:00

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

hurui200320 commented

2026-03-13 07:12:25 +00:00

Review Fixes Applied (Round 3) — Force-pushed `b1e45b49`

Rebased onto latest master (3e3e9b4b) and addressed all 15 findings from @hamza.khyari's code review.

P2 Fixes (should-fix)

1. Double-invocation idempotency bug — Added _processed_tests: set[str] module-level guard in tdd_expected_fail_listener.py. Each test is processed at most once per end_test call. Cleared in the new close() hook alongside _validation_errors.

2. Normal Test is a sham test — Rewrote cmd_normal_test_unaffected() using a new _run_multi_fixture() function that runs tdd_normal_test.robot alongside tdd_expected_fail_fails.robot in a single Robot invocation. The check now verifies both that the normal test passed AND that the expected-fail fixture was inverted to PASS (proving the listener is loaded and selectively applies).

P3 Fixes (nits)

3. Missing output.xml guard — Added Path(out_xml).exists() check after subprocess.run() with diagnostic stderr output including proc.returncode and proc.stderr.

4. data.tags vs result.tags — Added docstring in end_test documenting the intentional choice: TDD tags are declarative metadata that must be statically declared in the source file. Runtime Set Tags/Remove Tags modifications are not supported for TDD tags.

5. No SKIP test — Added tdd_expected_fail_skip.robot fixture (uses Skip keyword) and cmd_skip_status_unchanged() helper + Robot test case.

6. Missing message check — cmd_expected_fail_inverted() now asserts "TDD expected failure" in message in addition to status == "PASS".

7. Weak substring assertions — cmd_missing_tdd_bug_validation() now asserts "missing the required tdd_bug tag" instead of generic "tdd_bug" in message.lower().

8. No tdd_expected_fail-alone test — Added tdd_expected_fail_alone.robot fixture (only tdd_expected_fail tag, both tdd_bug and tdd_bug_<N> missing) and cmd_expected_fail_alone_validation() helper + Robot test case.

9. Guidance message deviation — Kept the improvements ("from this test", "See CONTRIBUTING.md > Bug Fix Workflow.") as they provide better developer guidance. The Behave implementation uses @tdd_expected_fail with the @ prefix per Behave convention; Robot uses tdd_expected_fail without prefix per Robot Framework convention.

10. _validation_errors never cleared — Added close() hook that clears both _validation_errors and _processed_tests. Called by Robot Framework when execution finishes.

11. Stale temp path return — Simplified _run_fixture() return type to tuple[str, str] (status, message). The out_xml path was always discarded by callers.

12. CWD-relative listener path — Changed all three nox sessions (integration_tests, slow_integration_tests, e2e_tests) to use Path(__file__).parent / "robot" / "tdd_expected_fail_listener.py" instead of CWD-relative resolution.

P4 Fixes (nits)

13. No __all__ — Added __all__: list[str] = [] to helper_tdd_tag_validation.py.

14. "mirroring" in docstring — Changed to "paralleling" in tdd_expected_fail_listener.py module docstring.

15. XML parsing without defusedxml — Added comment documenting the accepted risk: the XML is always self-generated by Robot Framework, never from untrusted input.

Additional Fix

Increased timeout from 30s to 120s for CLI Plan Tree Displays Subplan Hierarchy in m4_e2e_verification.robot — pre-existing timeout failure unrelated to this feature (3 subprocess invocations + full Alembic migrations exceed 30s).

Quality Gates

Gate	Result
`nox -s lint`	PASS
`nox -s typecheck`	PASS (0 errors)
`nox -s unit_tests`	PASS (10,700 scenarios)
`nox -s integration_tests`	PASS (1,505 tests)
`nox -s coverage_report`	PASS (97.9% >= 97%)
`nox -s benchmark`	PASS
`nox -s docs`	PASS
`nox -s build`	PASS
`nox -s security_scan`	PASS
`nox -s dead_code`	PASS

## Review Fixes Applied (Round 3) — Force-pushed `b1e45b49` Rebased onto latest master (`3e3e9b4b`) and addressed all 15 findings from @hamza.khyari's code review. ### P2 Fixes (should-fix) **1. Double-invocation idempotency bug** — Added `_processed_tests: set[str]` module-level guard in `tdd_expected_fail_listener.py`. Each test is processed at most once per `end_test` call. Cleared in the new `close()` hook alongside `_validation_errors`. **2. Normal Test is a sham test** — Rewrote `cmd_normal_test_unaffected()` using a new `_run_multi_fixture()` function that runs `tdd_normal_test.robot` alongside `tdd_expected_fail_fails.robot` in a single Robot invocation. The check now verifies both that the normal test passed AND that the expected-fail fixture was inverted to PASS (proving the listener is loaded and selectively applies). ### P3 Fixes (nits) **3. Missing output.xml guard** — Added `Path(out_xml).exists()` check after `subprocess.run()` with diagnostic stderr output including `proc.returncode` and `proc.stderr`. **4. `data.tags` vs `result.tags`** — Added docstring in `end_test` documenting the intentional choice: TDD tags are declarative metadata that must be statically declared in the source file. Runtime `Set Tags`/`Remove Tags` modifications are not supported for TDD tags. **5. No SKIP test** — Added `tdd_expected_fail_skip.robot` fixture (uses `Skip` keyword) and `cmd_skip_status_unchanged()` helper + Robot test case. **6. Missing message check** — `cmd_expected_fail_inverted()` now asserts `"TDD expected failure" in message` in addition to `status == "PASS"`. **7. Weak substring assertions** — `cmd_missing_tdd_bug_validation()` now asserts `"missing the required tdd_bug tag"` instead of generic `"tdd_bug" in message.lower()`. **8. No tdd_expected_fail-alone test** — Added `tdd_expected_fail_alone.robot` fixture (only `tdd_expected_fail` tag, both `tdd_bug` and `tdd_bug_<N>` missing) and `cmd_expected_fail_alone_validation()` helper + Robot test case. **9. Guidance message deviation** — Kept the improvements ("from this test", "See CONTRIBUTING.md > Bug Fix Workflow.") as they provide better developer guidance. The Behave implementation uses `@tdd_expected_fail` with the `@` prefix per Behave convention; Robot uses `tdd_expected_fail` without prefix per Robot Framework convention. **10. `_validation_errors` never cleared** — Added `close()` hook that clears both `_validation_errors` and `_processed_tests`. Called by Robot Framework when execution finishes. **11. Stale temp path return** — Simplified `_run_fixture()` return type to `tuple[str, str]` (status, message). The `out_xml` path was always discarded by callers. **12. CWD-relative listener path** — Changed all three nox sessions (`integration_tests`, `slow_integration_tests`, `e2e_tests`) to use `Path(__file__).parent / "robot" / "tdd_expected_fail_listener.py"` instead of CWD-relative resolution. ### P4 Fixes (nits) **13. No `__all__`** — Added `__all__: list[str] = []` to `helper_tdd_tag_validation.py`. **14. "mirroring" in docstring** — Changed to "paralleling" in `tdd_expected_fail_listener.py` module docstring. **15. XML parsing without defusedxml** — Added comment documenting the accepted risk: the XML is always self-generated by Robot Framework, never from untrusted input. ### Additional Fix Increased `timeout` from `30s` to `120s` for `CLI Plan Tree Displays Subplan Hierarchy` in `m4_e2e_verification.robot` — pre-existing timeout failure unrelated to this feature (3 subprocess invocations + full Alembic migrations exceed 30s). ### Quality Gates | Gate | Result | |------|--------| | `nox -s lint` | PASS | | `nox -s typecheck` | PASS (0 errors) | | `nox -s unit_tests` | PASS (10,700 scenarios) | | `nox -s integration_tests` | PASS (1,505 tests) | | `nox -s coverage_report` | PASS (97.9% >= 97%) | | `nox -s benchmark` | PASS | | `nox -s docs` | PASS | | `nox -s build` | PASS | | `nox -s security_scan` | PASS | | `nox -s dead_code` | PASS |

hurui200320 requested review from CoreRasurae 2026-03-13 07:13:34 +00:00

hurui200320 requested review from brent.edwards 2026-03-13 07:13:35 +00:00

hurui200320 referenced this issue from a commit

2026-03-13 07:28:50 +00:00

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

hurui200320 force-pushed feature/m5-robot-tdd-tags from b1e45b490e to d615276349

2026-03-13 07:28:50 +00:00

Compare

hamza.khyari requested changes 2026-03-13 14:30:06 +00:00

hamza.khyari left a comment

Code Review — PR #673

Verdict: REQUEST_CHANGES

Critical

C-1: PR bundles 7+ issues across 4+ milestones — must be split

Category: PROCESS

This PR contains 12 non-merge commits closing 7+ distinct issues spanning M3, M4, M5, and M6:

Concern	Commits	Issues
`tdd_expected_fail` Robot listener	`d6152763`	#628
`tdd_expected_fail` Behave hooks	`c8cd7eab`	#627
AuditService EventBus wiring	`3e3e9b4b`	#581
Session DI bug fixes	`e732c329`, `61ba4370`	#554, #570, #680
E2E test infrastructure	`d4a1a0d8`	#740
M4/M5 E2E acceptance	`317d0152`, `eb770643`	#495, #496
E2E mock-only conversion	`5e625b22`, `6806ef36`	#658, #697

CONTRIBUTING.md: "One Epic scope per PR." This needs at minimum 5 separate PRs.

C-2: M4 CLI integration tests use mocks — violates CONTRIBUTING.md

Category: TEST
Files: robot/helper_m4_e2e_cli.py, robot/helper_m4_e2e_cli_errors.py

These Robot helpers use unittest.mock.patch and CliRunner (52+ mock calls in cli.py, 25+ in cli_errors.py). CONTRIBUTING.md line 569: "Mocks are strictly prohibited in integration tests."

M1, M2, M3, and M6 were correctly converted to real subprocess invocations in commit 5e625b22, but M4 CLI tests were not. The TDD mock-only checker doesn't scan these sub-modules because _HELPER_FILES only lists the dispatcher files — so the mocks evade detection.

Fix: Convert to run_cli() from helper_e2e_common.py.

High

H-1: `session_service` is `providers.Factory` — creates new engine per call

Category: BUG
File: src/cleveragents/application/container.py:507-512

_build_session_service creates a new SQLAlchemy engine, inspects tables, and creates a sessionmaker on every invocation. CLI mitigates via a module-level cache, but any non-CLI consumer calling container.session_service() multiple times leaks engines.

Fix: Change to providers.Singleton.

H-2: Commit `9c8dc9ed` violates Conventional Commits

Category: PROCESS

Subject Docs: Daily update to timeline — wrong format, no body, no ISSUES CLOSED: footer. Unrelated to the feature.

H-3: Commit `94d95324` missing `ISSUES CLOSED:` footer

Category: PROCESS

Subject fix(test): correct mock wiring in plan_correct_tree_wiring Robot helper — no issue reference in footer. CONTRIBUTING.md requires every commit to reference its issue.

Medium

M-1: `CorrectionService` created ad-hoc in `plan.py`, bypasses container Singleton

Category: BUG
File: src/cleveragents/cli/commands/plan.py:2421

Container has a correction_service Singleton provider, but plan.py constructs its own instance — loses state between calls.

Fix: svc = container.correction_service()

M-2: `AuditService._ensure_session()` not thread-safe

Category: BUG
File: src/cleveragents/application/services/audit_service.py:131-139

Lazy init with no lock. As a Singleton, concurrent threads can race through _ensure_session, creating duplicate engines.

Fix: Add threading.Lock around initialization.

M-3: Incomplete `entity_deleted` audit coverage vs spec

Category: SPEC

Spec requires entity_deleted for all agents <entity> delete/remove. Currently emitted by ActorService, ProjectService, SessionService. Missing: SkillService.remove_skill(), tool removal, validation detachment, resource removal. No emitters include "cascade effects" as spec requires.

M-4: `_suppress_structlog_stdout` / `_restore_structlog` duplicated 3 times with incomplete restore

Category: CODE
Files: tdd_session_shared_steps.py:27-66, session_list_error_steps.py:60-88, tdd_session_list_missing_db_steps.py:34-63

Three identical copies. The restore function doesn't restore logger_factory / wrapper_class — only cache_logger_on_first_use. In serial test runs, this permanently mutates structlog config.

Fix: Extract to shared utility, restore the full config.

M-5: `_make_mock_scenario` duplicated between Robot helper and Behave steps

Category: CODE
Files: robot/helper_tdd_tag_validation.py:98-125, features/steps/tdd_tag_validation_steps.py:95-136

Nearly identical mock builder. Extract to features/mocks/tdd_test_helpers.py.

M-6: `cleanup_workspace()` mutates `os.environ` — not parallelism-safe

Category: BUG
File: robot/helper_e2e_common.py:106-109

Direct os.environ mutation is unsafe under pabot. Pass env vars via run_cli() env parameter instead.

M-7: XML message extraction may miss RF status message attribute

Category: BUG
File: robot/helper_tdd_tag_validation.py:379-385

Reads status_el.text but doesn't check status_el.get("message"). RF 7.3+ may use either.

Fix: Add fallback: if not message: message = status_el.get("message", "")

Low

L-1: _fail() uses raise SystemExit(1) vs sys.exit(1) — inconsistent (helper_m4_e2e_common.py:70, helper_e2e_common.py:112)
L-2: Three separate CONFIG_CHANGED audit events for one server connect operation (server.py:123-125)
L-3: ConfigService() in _resolve_auto_reindex constructed without event_bus (container.py:256)
L-4: Silent audit subscriber deferral — events lost before lazy resolution (container.py:636-643)
L-5: Stale feature descriptions in session TDD features — still say "fails due to" after bug fix (tdd_session_create_di.feature:3-6, tdd_session_list_di.feature:3-6, tdd_session_list_missing_db.feature:5-6)
L-6: Listener docstring says "longname" but code uses full_name (tdd_expected_fail_listener.py:40,73)
L-7: Listener error messages omit @ prefix, diverging from Behave counterpart (tdd_expected_fail_listener.py:105-121)

Nits

helper_tdd_tag_validation.py at 637 lines, helper_m3_e2e_verification.py at 901 lines, helper_m6_e2e_verification.py at 863 lines exceed 500-line limit — should be split
Tags parsed twice per test in listener (start_test + end_test) — negligible overhead
Missing edge case fixture: tdd_bug alone without tdd_bug_N or tdd_expected_fail
_validate_tdd_tags returns string vs Behave's ValueError — undocumented design choice
Type/Feature label inaccurate for a PR containing bug fixes, tests, observability, and docs

## Code Review — PR #673 **Verdict**: REQUEST_CHANGES --- ### Critical #### C-1: PR bundles 7+ issues across 4+ milestones — must be split **Category**: PROCESS This PR contains 12 non-merge commits closing 7+ distinct issues spanning M3, M4, M5, and M6: | Concern | Commits | Issues | |---------|---------|--------| | `tdd_expected_fail` Robot listener | `d6152763` | #628 | | `tdd_expected_fail` Behave hooks | `c8cd7eab` | #627 | | AuditService EventBus wiring | `3e3e9b4b` | #581 | | Session DI bug fixes | `e732c329`, `61ba4370` | #554, #570, #680 | | E2E test infrastructure | `d4a1a0d8` | #740 | | M4/M5 E2E acceptance | `317d0152`, `eb770643` | #495, #496 | | E2E mock-only conversion | `5e625b22`, `6806ef36` | #658, #697 | CONTRIBUTING.md: "One Epic scope per PR." This needs at minimum 5 separate PRs. --- #### C-2: M4 CLI integration tests use mocks — violates CONTRIBUTING.md **Category**: TEST **Files**: `robot/helper_m4_e2e_cli.py`, `robot/helper_m4_e2e_cli_errors.py` These Robot helpers use `unittest.mock.patch` and `CliRunner` (52+ mock calls in cli.py, 25+ in cli_errors.py). CONTRIBUTING.md line 569: "Mocks are strictly prohibited in integration tests." M1, M2, M3, and M6 were correctly converted to real subprocess invocations in commit `5e625b22`, but M4 CLI tests were not. The TDD mock-only checker doesn't scan these sub-modules because `_HELPER_FILES` only lists the dispatcher files — so the mocks evade detection. **Fix**: Convert to `run_cli()` from `helper_e2e_common.py`. --- ### High #### H-1: `session_service` is `providers.Factory` — creates new engine per call **Category**: BUG **File**: `src/cleveragents/application/container.py:507-512` `_build_session_service` creates a new SQLAlchemy engine, inspects tables, and creates a sessionmaker on every invocation. CLI mitigates via a module-level cache, but any non-CLI consumer calling `container.session_service()` multiple times leaks engines. **Fix**: Change to `providers.Singleton`. --- #### H-2: Commit `9c8dc9ed` violates Conventional Commits **Category**: PROCESS Subject `Docs: Daily update to timeline` — wrong format, no body, no `ISSUES CLOSED:` footer. Unrelated to the feature. --- #### H-3: Commit `94d95324` missing `ISSUES CLOSED:` footer **Category**: PROCESS Subject `fix(test): correct mock wiring in plan_correct_tree_wiring Robot helper` — no issue reference in footer. CONTRIBUTING.md requires every commit to reference its issue. --- ### Medium #### M-1: `CorrectionService` created ad-hoc in `plan.py`, bypasses container Singleton **Category**: BUG **File**: `src/cleveragents/cli/commands/plan.py:2421` Container has a `correction_service` Singleton provider, but `plan.py` constructs its own instance — loses state between calls. **Fix**: `svc = container.correction_service()` --- #### M-2: `AuditService._ensure_session()` not thread-safe **Category**: BUG **File**: `src/cleveragents/application/services/audit_service.py:131-139` Lazy init with no lock. As a Singleton, concurrent threads can race through `_ensure_session`, creating duplicate engines. **Fix**: Add `threading.Lock` around initialization. --- #### M-3: Incomplete `entity_deleted` audit coverage vs spec **Category**: SPEC Spec requires `entity_deleted` for all `agents <entity> delete/remove`. Currently emitted by `ActorService`, `ProjectService`, `SessionService`. Missing: `SkillService.remove_skill()`, tool removal, validation detachment, resource removal. No emitters include "cascade effects" as spec requires. --- #### M-4: `_suppress_structlog_stdout` / `_restore_structlog` duplicated 3 times with incomplete restore **Category**: CODE **Files**: `tdd_session_shared_steps.py:27-66`, `session_list_error_steps.py:60-88`, `tdd_session_list_missing_db_steps.py:34-63` Three identical copies. The restore function doesn't restore `logger_factory` / `wrapper_class` — only `cache_logger_on_first_use`. In serial test runs, this permanently mutates structlog config. **Fix**: Extract to shared utility, restore the full config. --- #### M-5: `_make_mock_scenario` duplicated between Robot helper and Behave steps **Category**: CODE **Files**: `robot/helper_tdd_tag_validation.py:98-125`, `features/steps/tdd_tag_validation_steps.py:95-136` Nearly identical mock builder. Extract to `features/mocks/tdd_test_helpers.py`. --- #### M-6: `cleanup_workspace()` mutates `os.environ` — not parallelism-safe **Category**: BUG **File**: `robot/helper_e2e_common.py:106-109` Direct `os.environ` mutation is unsafe under pabot. Pass env vars via `run_cli()` env parameter instead. --- #### M-7: XML message extraction may miss RF status message attribute **Category**: BUG **File**: `robot/helper_tdd_tag_validation.py:379-385` Reads `status_el.text` but doesn't check `status_el.get("message")`. RF 7.3+ may use either. **Fix**: Add fallback: `if not message: message = status_el.get("message", "")` --- ### Low - **L-1**: `_fail()` uses `raise SystemExit(1)` vs `sys.exit(1)` — inconsistent (`helper_m4_e2e_common.py:70`, `helper_e2e_common.py:112`) - **L-2**: Three separate `CONFIG_CHANGED` audit events for one `server connect` operation (`server.py:123-125`) - **L-3**: `ConfigService()` in `_resolve_auto_reindex` constructed without `event_bus` (`container.py:256`) - **L-4**: Silent audit subscriber deferral — events lost before lazy resolution (`container.py:636-643`) - **L-5**: Stale feature descriptions in session TDD features — still say "fails due to" after bug fix (`tdd_session_create_di.feature:3-6`, `tdd_session_list_di.feature:3-6`, `tdd_session_list_missing_db.feature:5-6`) - **L-6**: Listener docstring says "longname" but code uses `full_name` (`tdd_expected_fail_listener.py:40,73`) - **L-7**: Listener error messages omit `@` prefix, diverging from Behave counterpart (`tdd_expected_fail_listener.py:105-121`) ### Nits - `helper_tdd_tag_validation.py` at 637 lines, `helper_m3_e2e_verification.py` at 901 lines, `helper_m6_e2e_verification.py` at 863 lines exceed 500-line limit — should be split - Tags parsed twice per test in listener (`start_test` + `end_test`) — negligible overhead - Missing edge case fixture: `tdd_bug` alone without `tdd_bug_N` or `tdd_expected_fail` - `_validate_tdd_tags` returns string vs Behave's `ValueError` — undocumented design choice - `Type/Feature` label inaccurate for a PR containing bug fixes, tests, observability, and docs

freemo commented

2026-03-13 21:36:23 +00:00

PM Triage — Day 33 (2026-03-13)

@hamza.khyari posted a formal REQUEST_CHANGES review with findings across multiple severity levels. @hurui200320's Round 3 fixes addressed the 15 findings from hamza's earlier comment review, but the formal review (C-1, C-2, H-1 through H-3, M-1 through M-7) contains additional findings that need resolution.

Critical Assessment

C-1 (PR bundles 7+ issues — must split): This is the most significant finding. The PR description confirms 12 non-merge commits spanning #628, #627, #581, #554, #570, #680, #740, #495, #496, #658, #697 across M3–M6. CONTRIBUTING.md does require one Epic scope per PR.

PM ruling on C-1: At Day 33, splitting this PR into 5+ PRs would create significant risk and delay TDD infrastructure that 5 Critical TDD issues (#838–#842) now depend on. Recommended path forward:

Extract the core TDD deliverable (commits for #628 — the Robot listener, fixtures, nox integration, helper) into a clean single-issue PR if feasible within 4 hours
If extraction is impractical due to commit entanglement, accept the PR as-is with a follow-up issue to track the process debt
The non-#628 changes should have their own PRs filed as follow-up

C-2 (M4 CLI tests use mocks): Valid — unittest.mock.patch in integration tests violates CONTRIBUTING.md line 569. However, this is pre-existing code from other commits, not the TDD listener feature. Non-blocking for the TDD merge but must be filed as a separate issue.

Blocking vs Non-Blocking

Finding	Blocking?	Rationale
C-1	Conditional	See ruling above
C-2	No	Pre-existing code from other commits
H-1	No	`session_service` Factory bug is pre-existing, not introduced by TDD feature
H-2	No	Commit format issue — can't be fixed retroactively without rewrite
H-3	No	Same as H-2
M-1 through M-7	No	All either pre-existing or non-TDD-listener code

Action Required

@hurui200320: Please respond to hamza's formal REQUEST_CHANGES review within 24 hours. Specifically:

Acknowledge C-1 — state whether the TDD listener commits can be cleanly extracted into a standalone PR, or whether the commits are too entangled
Acknowledge C-2 — agree this needs a follow-up issue (will be filed by PM)
For H-1 through M-7 — indicate which are addressed by Round 3 fixes vs. which are new findings from the formal review

@hamza.khyari: Excellent review depth. Once Rui responds to C-1, would you be willing to re-review if the TDD-only subset is extracted? Your review of the listener code itself (findings 1–15 from your earlier comment) has been fully addressed per Round 3.

Priority: This remains Tier 1 — TDD infrastructure. Issues #838, #839, #840, #841, #842 (all Critical/Must have) are blocked until the TDD tag handler is on master.

## PM Triage — Day 33 (2026-03-13) @hamza.khyari posted a formal REQUEST_CHANGES review with findings across multiple severity levels. @hurui200320's Round 3 fixes addressed the 15 findings from hamza's earlier comment review, but the formal review (C-1, C-2, H-1 through H-3, M-1 through M-7) contains **additional findings** that need resolution. ### Critical Assessment **C-1 (PR bundles 7+ issues — must split)**: This is the most significant finding. The PR description confirms 12 non-merge commits spanning #628, #627, #581, #554, #570, #680, #740, #495, #496, #658, #697 across M3–M6. CONTRIBUTING.md does require one Epic scope per PR. **PM ruling on C-1**: At Day 33, splitting this PR into 5+ PRs would create significant risk and delay TDD infrastructure that 5 Critical TDD issues (#838–#842) now depend on. **Recommended path forward:** 1. **Extract the core TDD deliverable** (commits for #628 — the Robot listener, fixtures, nox integration, helper) into a clean single-issue PR if feasible within 4 hours 2. If extraction is impractical due to commit entanglement, **accept the PR as-is** with a follow-up issue to track the process debt 3. The non-#628 changes should have their own PRs filed as follow-up **C-2 (M4 CLI tests use mocks)**: Valid — `unittest.mock.patch` in integration tests violates CONTRIBUTING.md line 569. However, this is pre-existing code from other commits, not the TDD listener feature. **Non-blocking for the TDD merge** but must be filed as a separate issue. ### Blocking vs Non-Blocking | Finding | Blocking? | Rationale | |---------|-----------|-----------| | C-1 | Conditional | See ruling above | | C-2 | No | Pre-existing code from other commits | | H-1 | No | `session_service` Factory bug is pre-existing, not introduced by TDD feature | | H-2 | No | Commit format issue — can't be fixed retroactively without rewrite | | H-3 | No | Same as H-2 | | M-1 through M-7 | No | All either pre-existing or non-TDD-listener code | ### Action Required **@hurui200320**: Please respond to hamza's formal REQUEST_CHANGES review within 24 hours. Specifically: 1. Acknowledge C-1 — state whether the TDD listener commits can be cleanly extracted into a standalone PR, or whether the commits are too entangled 2. Acknowledge C-2 — agree this needs a follow-up issue (will be filed by PM) 3. For H-1 through M-7 — indicate which are addressed by Round 3 fixes vs. which are new findings from the formal review **@hamza.khyari**: Excellent review depth. Once Rui responds to C-1, would you be willing to re-review if the TDD-only subset is extracted? Your review of the listener code itself (findings 1–15 from your earlier comment) has been fully addressed per Round 3. **Priority**: This remains **Tier 1 — TDD infrastructure**. Issues #838, #839, #840, #841, #842 (all Critical/Must have) are blocked until the TDD tag handler is on master.

freemo commented

2026-03-14 04:23:42 +00:00

PM Status Update — Day 34

This PR is Tier 1 priority — it unblocks 5 Critical bug TDD issues (#838-#842). The technical quality is solid (all 15 of Hamza's findings addressed, 10 quality gates green).

Blocking item: Hamza's REQUEST_CHANGES (C-1 — scope bundling)
The PR bundles 7+ issues across M3-M6 milestones. CONTRIBUTING.md requires one-Epic-scope-per-PR. This is the single thing preventing merge.

PM ruling on C-1 (reaffirming Day 33 position):
Given we are Day 34 and this blocks 5 Critical bugs, splitting is not justified if it takes more than ~4 hours of work. The priority is unblocking the TDD pipeline.

@hurui200320 — Please respond to Hamza's formal review within 24 hours:

Can the TDD listener commits (#628) be cleanly cherry-picked into a standalone PR? If yes and doable in <4 hours, do it.
If not feasible due to commit entanglement, document that in a comment and we will accept as-is with a process-debt follow-up issue.
C-2 (M4 CLI tests using unittest.mock.patch): File a follow-up issue and note the issue number here.

@hamza.khyari — Given the context (5 Critical bugs blocked), would you accept the PR if Rui addresses C-1 with either approach above? Your review quality is outstanding, but the process-purity vs. schedule tradeoff favors merge here.

Label update: State should be State/Verified, not State/Unverified.

**PM Status Update — Day 34** This PR is **Tier 1 priority** — it unblocks 5 Critical bug TDD issues (#838-#842). The technical quality is solid (all 15 of Hamza's findings addressed, 10 quality gates green). **Blocking item: Hamza's REQUEST_CHANGES (C-1 — scope bundling)** The PR bundles 7+ issues across M3-M6 milestones. CONTRIBUTING.md requires one-Epic-scope-per-PR. This is the single thing preventing merge. **PM ruling on C-1 (reaffirming Day 33 position):** Given we are Day 34 and this blocks 5 Critical bugs, splitting is not justified if it takes more than ~4 hours of work. The priority is unblocking the TDD pipeline. **@hurui200320** — Please respond to Hamza's formal review within 24 hours: 1. **Can the TDD listener commits (#628) be cleanly cherry-picked** into a standalone PR? If yes and doable in <4 hours, do it. 2. If not feasible due to commit entanglement, document that in a comment and we will accept as-is with a process-debt follow-up issue. 3. **C-2** (M4 CLI tests using `unittest.mock.patch`): File a follow-up issue and note the issue number here. **@hamza.khyari** — Given the context (5 Critical bugs blocked), would you accept the PR if Rui addresses C-1 with either approach above? Your review quality is outstanding, but the process-purity vs. schedule tradeoff favors merge here. **Label update:** State should be `State/Verified`, not `State/Unverified`.

freemo reviewed 2026-03-14 21:51:13 +00:00

freemo left a comment

PM Review — Day 34: Scope Bundling Ruling (C-1)

@hurui200320 @hamza.khyari — This PR has been in review for 4 days with 13 comments. All 15 technical findings from Hamza's review have been addressed by Rui. The remaining open discussion is the C-1 scope bundling concern (Hamza's REQUEST_CHANGES): whether the Behave integration tests and Robot handler should be split into separate PRs.

PM Ruling

Accept as-is with a process-debt follow-up. Rationale:

Critical path impact: This PR blocks 5 Critical TDD bug pipelines (#838, #839, #840, #841, #842). Every day of delay cascades to 5 bug fixes.
All technical findings resolved: The code quality is verified — 15/15 review findings addressed, all nox quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage at 97.9%).
Splitting cost exceeds benefit: The Behave integration tests and Robot handler share test infrastructure (helper_tdd_tag_validation.py). Splitting would require duplicating this infrastructure or creating a dependency chain between 2 PRs — adding complexity, not reducing it.
Time constraint: If splitting takes >4 hours of developer time, it's a net schedule loss given the 5 blocked Critical bugs.

Required Action

@hamza.khyari: Please convert your REQUEST_CHANGES to APPROVED (or COMMENT). If you have remaining concerns about the scope bundling, please open a follow-up issue to track the process-debt item — we will address it in the continuous improvement track.
@hurui200320: PR is mergeable. Once Hamza's review state is resolved, this can go to Jeff for merge.

PR Status

Mergeable: Yes
Quality gates: All pass
Reviews: 15/15 findings addressed
Blocking: 5 Critical TDD issues (#838-#842)
Recommendation: Merge at earliest opportunity

PM review — Day 34 schedule adherence

## PM Review — Day 34: Scope Bundling Ruling (C-1) @hurui200320 @hamza.khyari — This PR has been in review for 4 days with 13 comments. All 15 technical findings from Hamza's review have been addressed by Rui. The remaining open discussion is the **C-1 scope bundling concern** (Hamza's REQUEST_CHANGES): whether the Behave integration tests and Robot handler should be split into separate PRs. ### PM Ruling **Accept as-is with a process-debt follow-up.** Rationale: 1. **Critical path impact**: This PR blocks 5 Critical TDD bug pipelines (#838, #839, #840, #841, #842). Every day of delay cascades to 5 bug fixes. 2. **All technical findings resolved**: The code quality is verified — 15/15 review findings addressed, all nox quality gates pass (lint, typecheck, unit_tests, integration_tests, coverage at 97.9%). 3. **Splitting cost exceeds benefit**: The Behave integration tests and Robot handler share test infrastructure (`helper_tdd_tag_validation.py`). Splitting would require duplicating this infrastructure or creating a dependency chain between 2 PRs — adding complexity, not reducing it. 4. **Time constraint**: If splitting takes >4 hours of developer time, it's a net schedule loss given the 5 blocked Critical bugs. ### Required Action - **@hamza.khyari**: Please convert your REQUEST_CHANGES to APPROVED (or COMMENT). If you have remaining concerns about the scope bundling, please open a follow-up issue to track the process-debt item — we will address it in the continuous improvement track. - **@hurui200320**: PR is mergeable. Once Hamza's review state is resolved, this can go to Jeff for merge. ### PR Status - Mergeable: **Yes** - Quality gates: **All pass** - Reviews: 15/15 findings addressed - Blocking: **5 Critical TDD issues (#838-#842)** - Recommendation: **Merge at earliest opportunity** --- *PM review — Day 34 schedule adherence*

hurui200320 force-pushed feature/m5-robot-tdd-tags from d615276349 to c7bdb62b73

2026-03-16 06:18:11 +00:00

Compare

hurui200320 referenced this issue from a commit

2026-03-16 06:18:11 +00:00

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

hurui200320 commented

2026-03-16 06:18:57 +00:00

Review Response (Round 4) — Force-pushed `c7bdb62b`

Rebased onto latest master (3b6b1d2) and addressed Hamza's formal REQUEST_CHANGES review (review #2190). After rebase, this PR contains a single commit (c7bdb62b) implementing only #628 — all other commits from the previous branch history are now on master.

C-1 and C-2: Resolved by rebase

After rebasing onto latest master, this PR now contains only the #628 commit — 16 files changed, 1004 insertions, 168 deletions. The scope-bundling concern (C-1) and mock-in-integration-tests concern (C-2) are no longer applicable because those commits are already on master.

Findings addressed in this round

#	Severity	Finding	Resolution
M-5	Medium	`_make_mock_scenario` duplicated between Robot helper and Behave steps	Fixed: Extracted to `features/mocks/tdd_test_helpers.py` as `make_mock_scenario()`. Both `robot/helper_tdd_tag_validation.py` and `features/steps/tdd_tag_validation_steps.py` now import from the shared module. Updated `features/mocks/__init__.py` to export it.
M-7	Medium	XML message extraction may miss RF status message attribute	Fixed: Added `status_el.get("message", "")` fallback in both `_run_fixture()` and `_run_multi_fixture()`. Now checks element text → `message` attribute (RF 7.3+) → child `<msg>` element.
L-6	Low	Listener docstring says "longname" but code uses `full_name`	Fixed: Changed "keyed by test longname" to "keyed by test `full_name`" in the Implementation Notes docstring.
L-7	Low	Listener error messages omit `@` prefix, diverging from Behave counterpart	Documented: Added comment in `_validate_tdd_tags` explaining the intentional divergence — Robot Framework tags don't use the `@` prefix, while Behave does. Each framework's error messages use its own convention.
Nit	-	Missing edge case fixture: `tdd_bug` alone without `tdd_bug_N` or `tdd_expected_fail`	Fixed: Added `robot/fixtures/tdd_bug_alone.robot` fixture, `cmd_tdd_bug_alone_valid()` helper command, and "TDD Bug Tag Alone Is Valid" test case. Total Robot-side integration tests: 9.
Nit	-	`_validate_tdd_tags` returns string vs Behave's `ValueError` — undocumented design choice	Documented: Added design note in `_validate_tdd_tags` docstring explaining why `str

Findings now out of scope (resolved by rebase)

After rebasing, the following findings are no longer part of this PR's diff — they exist on master from previously-merged commits:

#	Finding	Status
H-1	`session_service` Factory → Singleton	Master code (from #554/#570/#680). Recommend filing a separate issue.
H-2	Commit `9c8dc9ed` violates Conventional Commits	Historical commit, now on master.
H-3	Commit `94d95324` missing `ISSUES CLOSED:` footer	Historical commit, now on master.
M-1	`CorrectionService` bypasses container Singleton	Master code (from plan.py).
M-2	`AuditService._ensure_session()` not thread-safe	Master code (from #581).
M-3	Incomplete `entity_deleted` audit coverage	Spec gap, not introduced by #628.
M-4	`_suppress_structlog_stdout` duplicated 3 times	Master code (from #554/#570/#680).
M-6	`cleanup_workspace()` mutates `os.environ`	Master code (from `helper_e2e_common.py`).
L-1	`_fail()` uses `raise SystemExit(1)` vs `sys.exit(1)`	Master code.
L-2	Three separate `CONFIG_CHANGED` audit events	Master code.
L-3	`ConfigService()` without `event_bus`	Master code.
L-4	Silent audit subscriber deferral	Master code.
L-5	Stale feature descriptions in session TDD features	Master code (from #627).
Nit	`Type/Feature` label inaccurate for multi-type PR	No longer applicable — PR is now single-issue (#628), purely `Type/Feature`.
Nit	`helper_m3_e2e_verification.py` / `helper_m6_e2e_verification.py` line counts	Master code, not part of #628.

I recommend @hamza.khyari or PM file separate issues for H-1, M-1, M-2, M-3, M-4, and M-6 to track these as independent follow-ups.

Quality Gates (post-rebase + review fixes)

Gate	Result
`nox -s lint`	PASS (0 violations)
`nox -s typecheck`	PASS (0 errors, 1 pre-existing warning)
`nox -s unit_tests`	PASS (10,815 scenarios, 381 features)
`nox -s integration_tests`	PASS (1,519 tests)
`nox -s coverage_report`	PASS (97% >= 97% threshold)

PR scope after rebase

This PR now contains a single commit for a single issue (#628):

8 new fixture .robot files (including tdd_bug_alone.robot)
1 new shared mock helper (features/mocks/tdd_test_helpers.py)
Listener (robot/tdd_expected_fail_listener.py) — modified
Helper (robot/helper_tdd_tag_validation.py) — modified
Test suite (robot/tdd_tag_validation.robot) — modified
Nox config — modified (listener path + tdd_fixture exclude)
CHANGELOG — modified
features/mocks/__init__.py — modified (export shared helper)
features/steps/tdd_tag_validation_steps.py — modified (use shared helper)

## Review Response (Round 4) — Force-pushed `c7bdb62b` Rebased onto latest master (`3b6b1d2`) and addressed Hamza's formal REQUEST_CHANGES review (review #2190). After rebase, this PR contains a **single commit** (`c7bdb62b`) implementing only #628 — all other commits from the previous branch history are now on master. ### C-1 and C-2: Resolved by rebase After rebasing onto latest master, this PR now contains **only the #628 commit** — 16 files changed, 1004 insertions, 168 deletions. The scope-bundling concern (C-1) and mock-in-integration-tests concern (C-2) are no longer applicable because those commits are already on master. ### Findings addressed in this round | # | Severity | Finding | Resolution | |---|----------|---------|------------| | M-5 | Medium | `_make_mock_scenario` duplicated between Robot helper and Behave steps | **Fixed**: Extracted to `features/mocks/tdd_test_helpers.py` as `make_mock_scenario()`. Both `robot/helper_tdd_tag_validation.py` and `features/steps/tdd_tag_validation_steps.py` now import from the shared module. Updated `features/mocks/__init__.py` to export it. | | M-7 | Medium | XML message extraction may miss RF status message attribute | **Fixed**: Added `status_el.get("message", "")` fallback in both `_run_fixture()` and `_run_multi_fixture()`. Now checks element text → `message` attribute (RF 7.3+) → child `<msg>` element. | | L-6 | Low | Listener docstring says "longname" but code uses `full_name` | **Fixed**: Changed "keyed by test longname" to "keyed by test ``full_name``" in the Implementation Notes docstring. | | L-7 | Low | Listener error messages omit `@` prefix, diverging from Behave counterpart | **Documented**: Added comment in `_validate_tdd_tags` explaining the intentional divergence — Robot Framework tags don't use the `@` prefix, while Behave does. Each framework's error messages use its own convention. | | Nit | - | Missing edge case fixture: `tdd_bug` alone without `tdd_bug_N` or `tdd_expected_fail` | **Fixed**: Added `robot/fixtures/tdd_bug_alone.robot` fixture, `cmd_tdd_bug_alone_valid()` helper command, and "TDD Bug Tag Alone Is Valid" test case. Total Robot-side integration tests: 9. | | Nit | - | `_validate_tdd_tags` returns string vs Behave's `ValueError` — undocumented design choice | **Documented**: Added design note in `_validate_tdd_tags` docstring explaining why `str | None` is used instead of raising `ValueError`: the Robot Framework Listener v3 API does not propagate exceptions cleanly from `start_test`. | ### Findings now out of scope (resolved by rebase) After rebasing, the following findings are no longer part of this PR's diff — they exist on master from previously-merged commits: | # | Finding | Status | |---|---------|--------| | H-1 | `session_service` Factory → Singleton | Master code (from #554/#570/#680). Recommend filing a separate issue. | | H-2 | Commit `9c8dc9ed` violates Conventional Commits | Historical commit, now on master. | | H-3 | Commit `94d95324` missing `ISSUES CLOSED:` footer | Historical commit, now on master. | | M-1 | `CorrectionService` bypasses container Singleton | Master code (from plan.py). | | M-2 | `AuditService._ensure_session()` not thread-safe | Master code (from #581). | | M-3 | Incomplete `entity_deleted` audit coverage | Spec gap, not introduced by #628. | | M-4 | `_suppress_structlog_stdout` duplicated 3 times | Master code (from #554/#570/#680). | | M-6 | `cleanup_workspace()` mutates `os.environ` | Master code (from `helper_e2e_common.py`). | | L-1 | `_fail()` uses `raise SystemExit(1)` vs `sys.exit(1)` | Master code. | | L-2 | Three separate `CONFIG_CHANGED` audit events | Master code. | | L-3 | `ConfigService()` without `event_bus` | Master code. | | L-4 | Silent audit subscriber deferral | Master code. | | L-5 | Stale feature descriptions in session TDD features | Master code (from #627). | | Nit | `Type/Feature` label inaccurate for multi-type PR | No longer applicable — PR is now single-issue (#628), purely `Type/Feature`. | | Nit | `helper_m3_e2e_verification.py` / `helper_m6_e2e_verification.py` line counts | Master code, not part of #628. | I recommend @hamza.khyari or PM file separate issues for H-1, M-1, M-2, M-3, M-4, and M-6 to track these as independent follow-ups. ### Quality Gates (post-rebase + review fixes) | Gate | Result | |------|--------| | `nox -s lint` | PASS (0 violations) | | `nox -s typecheck` | PASS (0 errors, 1 pre-existing warning) | | `nox -s unit_tests` | PASS (10,815 scenarios, 381 features) | | `nox -s integration_tests` | PASS (1,519 tests) | | `nox -s coverage_report` | PASS (97% >= 97% threshold) | ### PR scope after rebase This PR now contains a single commit for a single issue (#628): - 8 new fixture `.robot` files (including `tdd_bug_alone.robot`) - 1 new shared mock helper (`features/mocks/tdd_test_helpers.py`) - Listener (`robot/tdd_expected_fail_listener.py`) — modified - Helper (`robot/helper_tdd_tag_validation.py`) — modified - Test suite (`robot/tdd_tag_validation.robot`) — modified - Nox config — modified (listener path + tdd_fixture exclude) - CHANGELOG — modified - `features/mocks/__init__.py` — modified (export shared helper) - `features/steps/tdd_tag_validation_steps.py` — modified (use shared helper)

hurui200320 referenced this pull request

2026-03-16 06:19:52 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

hurui200320 commented

2026-03-16 06:51:16 +00:00

Self-Review (Round 5) — PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

Reviewer: hurui200320 | Protocol: 5-agent parallel review (test coverage/flaws, performance, bug detection, security, spec/code quality)

Verdict: APPROVED — with non-blocking suggestions below.

Overall Assessment

All 6 acceptance criteria from ticket #628 are correctly satisfied. Code quality is high — clean separation between validation (start_test) and inversion (end_test), proper regex compilation at module level, thorough docstrings, and comprehensive Robot + Behave test coverage. All 15 findings from @hamza.khyari's previous review have been properly addressed in Round 4. The rebase cleanly scoped this PR to a single commit for #628.

Performance: No issues. Listener overhead is negligible (~microseconds per test across 1500+ tests).
Security: No issues. Subprocess invocation uses list args (no shell injection), fixture names are hardcoded, temp dirs managed via context managers, XML is self-generated.
Bug Detection: No new bugs found. All 8 previously-reported bugs properly fixed.

6 new suggestions are listed below — 2 Medium, 4 Low/Nit. None are blocking.

Findings

#	Severity	Category	File	Finding
1	Medium	CODE	`helper_tdd_tag_validation.py`	File exceeds 500-line limit (629 lines)
2	Medium	TEST	`tdd_expected_fail_listener.py`	Idempotency guard (`_processed_tests`) is untested
3	Low	TEST	`tdd_expected_fail_listener.py`	`close()` cleanup function is untested
4	Low	TEST	`helper_tdd_tag_validation.py:554`	Weak assertion in `expected-fail-alone` validation
5	Low	TEST	`tdd_expected_fail_listener.py:161`	Tag case-normalization (`.lower()`) never exercised by tests
6	Low/Nit	CODE	`helper_tdd_tag_validation.py:347,413`	Duplicated XML message-extraction logic

1. File exceeds 500-line limit (Medium)

File: robot/helper_tdd_tag_validation.py (629 lines)

CONTRIBUTING.md mandates files stay under 500 lines. This was also noted as a nit by @hamza.khyari in the previous review.

Recommendation: Split along the natural Behave/Robot boundary, or extract the Robot-side fixture runner infrastructure (_run_fixture, _run_multi_fixture) into a small _tdd_fixture_runner.py utility. Extracting the XML message-extraction helper (Finding #6) would also reduce line count.

2. Idempotency guard untested (Medium)

File: robot/tdd_expected_fail_listener.py, lines 79, 193–195

The _processed_tests set guard prevents double-inversion if the listener is loaded twice. No test exercises this — each helper subprocess starts fresh, so the guard is never triggered. If this guard broke (e.g., someone clears the set prematurely), a tdd_expected_fail test would silently double-invert (FAIL→PASS→FAIL), producing wrong results with no test catching it.

Recommendation: Add a Robot-side test that passes --listener <path> --listener <path> and verifies the fixture is still correctly inverted to PASS (not double-inverted back to FAIL).

3. `close()` cleanup untested (Low)

File: robot/tdd_expected_fail_listener.py, lines 254–262

close() clears both _validation_errors and _processed_tests. No test verifies this behavior. Each test subprocess starts fresh, so close() cleanup never observably matters in the current test topology.

Recommendation: Low priority. Consider adding a note documenting that close() is a safety net for embedded/long-lived Robot processes, or add a simple direct-import test that populates the dicts, calls close(), and verifies they are empty.

4. Weak `expected-fail-alone` assertion (Low)

File: robot/helper_tdd_tag_validation.py, line 554

if status == "FAIL" and "tdd_bug" in message and "tdd_bug_<N>" in message:

The first check ("tdd_bug" in message) is always satisfied when the second ("tdd_bug_<N>" in message) is true, because "tdd_bug" is a substring of "tdd_bug_<N>". The assertion doesn't truly verify both tags are listed independently in the error message.

Recommendation: Tighten to "tdd_bug, tdd_bug_<N>" in message or check for the combined phrase from the actual error output.

5. Tag case-normalization never exercised (Low)

File: robot/tdd_expected_fail_listener.py, lines 161, 213

All 8 fixture files use lowercase tags exclusively. The .lower() normalization in start_test / end_test is never tested with mixed-case input. If it were removed, no test would fail.

Recommendation: Modify one fixture to use mixed-case tags (e.g., TDD_Expected_Fail, TDD_BUG_999) and verify the listener correctly handles them.

6. Duplicated XML message-extraction (Low/Nit)

File: robot/helper_tdd_tag_validation.py, lines 347–353 and 413–419

The RF output XML message extraction pattern (check status_el.text → status_el.get("message") → <msg> child) is copy-pasted between _run_fixture and _run_multi_fixture. Any future RF output-format change must be updated in two places.

Recommendation: Extract to _extract_message(test_el: ET.Element, status_el: ET.Element) -> str. Also helps reduce line count (Finding #1).

Previous Review Comments — Status

All of @hamza.khyari's 15 findings from the formal review have been addressed in Round 4. The C-1 (scope bundling) and C-2 (mock in integration tests) concerns are resolved by rebase — the PR now contains only the #628 commit. PM (@freemo) has ruled the PR mergeable.

Process Checks (all pass)

Commit message matches issue Metadata ✓
ISSUES CLOSED: #628 footer present ✓
CHANGELOG updated under Unreleased ✓
All files belong to this feature ✓
No generated artifacts or secrets ✓
No # type: ignore in new source ✓
Mocking code in features/mocks/ only ✓
Robot for integration tests, Behave for unit tests ✓
Quality gates reported passing (lint, typecheck, unit_tests, integration_tests, coverage 97%) ✓

## Self-Review (Round 5) — PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer**: hurui200320 | **Protocol**: 5-agent parallel review (test coverage/flaws, performance, bug detection, security, spec/code quality) ### Verdict: **APPROVED** — with non-blocking suggestions below. --- ### Overall Assessment All 6 acceptance criteria from ticket #628 are correctly satisfied. Code quality is high — clean separation between validation (`start_test`) and inversion (`end_test`), proper regex compilation at module level, thorough docstrings, and comprehensive Robot + Behave test coverage. All 15 findings from @hamza.khyari's previous review have been properly addressed in Round 4. The rebase cleanly scoped this PR to a single commit for #628. **Performance**: No issues. Listener overhead is negligible (~microseconds per test across 1500+ tests). **Security**: No issues. Subprocess invocation uses list args (no shell injection), fixture names are hardcoded, temp dirs managed via context managers, XML is self-generated. **Bug Detection**: No new bugs found. All 8 previously-reported bugs properly fixed. **6 new suggestions** are listed below — 2 Medium, 4 Low/Nit. None are blocking. --- ### Findings | # | Severity | Category | File | Finding | |---|----------|----------|------|---------| | 1 | **Medium** | CODE | `helper_tdd_tag_validation.py` | File exceeds 500-line limit (629 lines) | | 2 | **Medium** | TEST | `tdd_expected_fail_listener.py` | Idempotency guard (`_processed_tests`) is untested | | 3 | Low | TEST | `tdd_expected_fail_listener.py` | `close()` cleanup function is untested | | 4 | Low | TEST | `helper_tdd_tag_validation.py:554` | Weak assertion in `expected-fail-alone` validation | | 5 | Low | TEST | `tdd_expected_fail_listener.py:161` | Tag case-normalization (`.lower()`) never exercised by tests | | 6 | Low/Nit | CODE | `helper_tdd_tag_validation.py:347,413` | Duplicated XML message-extraction logic | --- #### 1. File exceeds 500-line limit (Medium) **File**: `robot/helper_tdd_tag_validation.py` (629 lines) CONTRIBUTING.md mandates files stay under 500 lines. This was also noted as a nit by @hamza.khyari in the previous review. **Recommendation**: Split along the natural Behave/Robot boundary, or extract the Robot-side fixture runner infrastructure (`_run_fixture`, `_run_multi_fixture`) into a small `_tdd_fixture_runner.py` utility. Extracting the XML message-extraction helper (Finding #6) would also reduce line count. --- #### 2. Idempotency guard untested (Medium) **File**: `robot/tdd_expected_fail_listener.py`, lines 79, 193–195 The `_processed_tests` set guard prevents double-inversion if the listener is loaded twice. No test exercises this — each helper subprocess starts fresh, so the guard is never triggered. If this guard broke (e.g., someone clears the set prematurely), a `tdd_expected_fail` test would silently double-invert (FAIL→PASS→FAIL), producing wrong results with no test catching it. **Recommendation**: Add a Robot-side test that passes `--listener <path> --listener <path>` and verifies the fixture is still correctly inverted to PASS (not double-inverted back to FAIL). --- #### 3. `close()` cleanup untested (Low) **File**: `robot/tdd_expected_fail_listener.py`, lines 254–262 `close()` clears both `_validation_errors` and `_processed_tests`. No test verifies this behavior. Each test subprocess starts fresh, so `close()` cleanup never observably matters in the current test topology. **Recommendation**: Low priority. Consider adding a note documenting that `close()` is a safety net for embedded/long-lived Robot processes, or add a simple direct-import test that populates the dicts, calls `close()`, and verifies they are empty. --- #### 4. Weak `expected-fail-alone` assertion (Low) **File**: `robot/helper_tdd_tag_validation.py`, line 554 ```python if status == "FAIL" and "tdd_bug" in message and "tdd_bug_<N>" in message: ``` The first check (`"tdd_bug" in message`) is always satisfied when the second (`"tdd_bug_<N>" in message`) is true, because `"tdd_bug"` is a substring of `"tdd_bug_<N>"`. The assertion doesn't truly verify both tags are listed independently in the error message. **Recommendation**: Tighten to `"tdd_bug, tdd_bug_<N>" in message` or check for the combined phrase from the actual error output. --- #### 5. Tag case-normalization never exercised (Low) **File**: `robot/tdd_expected_fail_listener.py`, lines 161, 213 All 8 fixture files use lowercase tags exclusively. The `.lower()` normalization in `start_test` / `end_test` is never tested with mixed-case input. If it were removed, no test would fail. **Recommendation**: Modify one fixture to use mixed-case tags (e.g., `TDD_Expected_Fail`, `TDD_BUG_999`) and verify the listener correctly handles them. --- #### 6. Duplicated XML message-extraction (Low/Nit) **File**: `robot/helper_tdd_tag_validation.py`, lines 347–353 and 413–419 The RF output XML message extraction pattern (check `status_el.text` → `status_el.get("message")` → `<msg>` child) is copy-pasted between `_run_fixture` and `_run_multi_fixture`. Any future RF output-format change must be updated in two places. **Recommendation**: Extract to `_extract_message(test_el: ET.Element, status_el: ET.Element) -> str`. Also helps reduce line count (Finding #1). --- ### Previous Review Comments — Status All of @hamza.khyari's 15 findings from the formal review have been addressed in Round 4. The C-1 (scope bundling) and C-2 (mock in integration tests) concerns are resolved by rebase — the PR now contains only the #628 commit. PM (@freemo) has ruled the PR mergeable. ### Process Checks (all pass) - Commit message matches issue Metadata ✓ - `ISSUES CLOSED: #628` footer present ✓ - CHANGELOG updated under Unreleased ✓ - All files belong to this feature ✓ - No generated artifacts or secrets ✓ - No `# type: ignore` in new source ✓ - Mocking code in `features/mocks/` only ✓ - Robot for integration tests, Behave for unit tests ✓ - Quality gates reported passing (lint, typecheck, unit_tests, integration_tests, coverage 97%) ✓

hurui200320 referenced this issue from a commit

2026-03-16 08:11:08 +00:00

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

hurui200320 force-pushed feature/m5-robot-tdd-tags from c7bdb62b73 to 5e4453d0a4

2026-03-16 08:11:08 +00:00

Compare

freemo referenced this pull request

2026-03-16 09:36:24 +00:00

Implement @tdd_expected_fail tag handling in Robot Framework #628

freemo referenced this pull request

2026-03-16 10:24:53 +00:00

Implement TDD bug tag quality gate for bug fix PRs #629

freemo approved these changes 2026-03-16 16:13:34 +00:00

Dismissed

freemo left a comment

PM Review — Robot Framework TDD Tag Handler (#628)

PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Review Summary

This PR implements the Robot Framework counterpart of the Behave TDD expected-fail system. It has been through 5 rounds of review with all 15 findings from @hamza.khyari addressed. The PR has been rebased to a single commit for #628 only (resolving the C-1 scope bundling concern). All quality gates pass. Approving.

TDD Infrastructure Verification

Listener v3 module implements result inversion for tdd_expected_fail tags
Tag validation enforces: tdd_bug_<N> requires tdd_bug, tdd_expected_fail requires both
Idempotency guard against double-invocation
SKIP status handling
8 fixture files covering all acceptance criteria
21 integration test cases (12 Behave-side + 9 Robot-side)
Shared mock helper extracted to features/mocks/tdd_test_helpers.py
Nox sessions updated with --listener flag
All quality gates pass (10,815 scenarios, 97% coverage)

Critical Path Impact

This PR unblocks 5 Critical bug TDD pipelines (#838-#842) that depend on Robot Framework TDD tag support. Its merge is a prerequisite for the following TDD issues:

#840 (TDD for ACMS tier bug #821)
#841 (TDD for actor list bug #797)
#842 (TDD for init --yes bug #783)
Plus newly created #977, #978, #979 (TDD for plan CLI bugs #967-#969)

Previous Scope Concern Resolution

The C-1 scope bundling concern from @hamza.khyari has been resolved — @hurui200320 rebased to include only the single commit for issue #628. The remaining out-of-scope findings were correctly deferred.

Verdict: APPROVED. This is a critical-path merge. Recommend @hamza.khyari convert his previous REQUEST_CHANGES to APPROVED given that all 15 findings have been addressed and the scope concern is resolved. Merge target: Day 37 (2026-03-17).

## PM Review — Robot Framework TDD Tag Handler (#628) **PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework** ### Review Summary This PR implements the Robot Framework counterpart of the Behave TDD expected-fail system. It has been through 5 rounds of review with all 15 findings from @hamza.khyari addressed. The PR has been rebased to a single commit for #628 only (resolving the C-1 scope bundling concern). All quality gates pass. Approving. ### TDD Infrastructure Verification - [x] Listener v3 module implements result inversion for `tdd_expected_fail` tags - [x] Tag validation enforces: `tdd_bug_<N>` requires `tdd_bug`, `tdd_expected_fail` requires both - [x] Idempotency guard against double-invocation - [x] SKIP status handling - [x] 8 fixture files covering all acceptance criteria - [x] 21 integration test cases (12 Behave-side + 9 Robot-side) - [x] Shared mock helper extracted to `features/mocks/tdd_test_helpers.py` - [x] Nox sessions updated with `--listener` flag - [x] All quality gates pass (10,815 scenarios, 97% coverage) ### Critical Path Impact This PR **unblocks 5 Critical bug TDD pipelines** (#838-#842) that depend on Robot Framework TDD tag support. Its merge is a prerequisite for the following TDD issues: - #840 (TDD for ACMS tier bug #821) - #841 (TDD for actor list bug #797) - #842 (TDD for init --yes bug #783) - Plus newly created #977, #978, #979 (TDD for plan CLI bugs #967-#969) ### Previous Scope Concern Resolution The C-1 scope bundling concern from @hamza.khyari has been resolved — @hurui200320 rebased to include only the single commit for issue #628. The remaining out-of-scope findings were correctly deferred. **Verdict: APPROVED.** This is a critical-path merge. Recommend @hamza.khyari convert his previous REQUEST_CHANGES to APPROVED given that all 15 findings have been addressed and the scope concern is resolved. Merge target: Day 37 (2026-03-17).

brent.edwards approved these changes 2026-03-16 22:30:01 +00:00

Dismissed

brent.edwards left a comment

Code Review — PR #673 `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

Reviewer: @brent.edwards | Size: L (1,002 lines, 15 files) | Focus: Listener correctness, test quality, process compliance

Verification of Previous Reviews

All prior review findings have been addressed:

Source	Findings	Status
Rui self-review R1	`# type: ignore` violation	✅ Fixed — `Callable[[], int]` typing
Rui self-review R2	Bare `else` catches SKIP incorrectly	✅ Fixed — explicit `elif PASS` + `else SKIP`
Hamza R1 (15 findings)	P2 idempotency, sham test, + 13 nits	✅ All 15 addressed in Round 3
Hamza formal review (C-1)	Scope bundling	✅ Resolved — rebased to single commit for #628
Rui R4	M-5 shared mock, M-7 XML fallback, L-6/L-7, new fixtures	✅ All addressed
Rui R5 self-review	629-line file, untested idempotency guard	Acknowledged — see below

No P0 or P1 findings.

The listener module is well-designed:

Validation-first in end_test — invalid tag combinations always produce FAIL, regardless of test outcome
data.tags over result.tags — correctly uses static test definition tags, not runtime-mutable tags (intentional, documented)
fullmatch() on regex — correctly distinguishes tdd_bug from tdd_bug_\<N\> without false matches
Process-local state — _processed_tests and _validation_errors are per-process (pabot uses subprocesses), no concurrency risk
close() cleanup — RF Listener v3 API calls close() on execution finish
No # type: ignore — clean type safety
No # noqa in listener — helper has 3 justified # noqa: E402 for post-sys.path imports

P2:should-fix (3)

1. cmd_skip_status_unchanged and cmd_tdd_bug_alone_valid can false-positive without listener loaded
Both tests verify non-modification of results (SKIP stays SKIP, PASS stays PASS) but run the fixture alone without a positive control. Unlike cmd_normal_test_unaffected (which uses _run_multi_fixture with a companion tdd_expected_fail_fails fixture to PROVE the listener was active and selectively applied), these two commands would produce the same result if the --listener flag were missing entirely.
Fix: Use _run_multi_fixture with tdd_expected_fail_fails as a co-run fixture (same pattern as cmd_normal_test_unaffected), asserting the companion was inverted.

2. subprocess.run calls missing timeout
Both _run_fixture and _run_multi_fixture call subprocess.run(cmd, capture_output=True, text=True) without timeout. If a fixture hangs, the subprocess blocks forever, which blocks the parent Robot test, which blocks CI with no diagnostic. All other Robot helpers in the repo use timeout=120 or similar.
Fix: Add timeout=120 and handle subprocess.TimeoutExpired.

3. helper_tdd_tag_validation.py at 629 lines exceeds the 500-line limit
Already flagged in Rui's self-review R5. The natural split point is around line 270 (Behave-side functions vs Robot-side fixture runners). Extracting _run_fixture/_run_multi_fixture/_extract_message into a _tdd_fixture_runner.py utility would solve this and eliminate the duplicated XML extraction (R5 finding #6).

P3:nit (1)

4. features/mocks/__init__.py doesn't re-export make_mock_scenario
Every other mock in the package is re-exported from __init__.py. tdd_test_helpers.make_mock_scenario is imported by full module path, so this works, but breaks the package convention. Minor inconsistency.

Summary

Severity	Count
P0:blocker	0
P1:must-fix	0
P2:should-fix	3
P3:nit	1

Verdict: APPROVED — The listener implementation is correct, well-tested (8 fixtures, 21 integration tests), and all prior review findings have been thoroughly addressed. The 3 P2 findings are genuine quality improvements but do not block merge. This is critical-path TDD infrastructure that unblocks 5+ bug TDD pipelines.

The code demonstrates strong engineering: clean validation/inversion separation, proper RF Listener v3 API usage, comprehensive edge-case coverage (SKIP, double-invocation, tdd_bug-alone, tdd_expected_fail-alone), and an extracted shared mock helper. Well done @hurui200320.

## Code Review — PR #673 `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework` **Reviewer:** @brent.edwards | **Size:** L (1,002 lines, 15 files) | **Focus:** Listener correctness, test quality, process compliance --- ### Verification of Previous Reviews All prior review findings have been addressed: | Source | Findings | Status | |--------|----------|--------| | Rui self-review R1 | `# type: ignore` violation | ✅ Fixed — `Callable[[], int]` typing | | Rui self-review R2 | Bare `else` catches SKIP incorrectly | ✅ Fixed — explicit `elif PASS` + `else SKIP` | | Hamza R1 (15 findings) | P2 idempotency, sham test, + 13 nits | ✅ All 15 addressed in Round 3 | | Hamza formal review (C-1) | Scope bundling | ✅ Resolved — rebased to single commit for #628 | | Rui R4 | M-5 shared mock, M-7 XML fallback, L-6/L-7, new fixtures | ✅ All addressed | | Rui R5 self-review | 629-line file, untested idempotency guard | Acknowledged — see below | --- ### No P0 or P1 findings. The listener module is well-designed: - **Validation-first** in `end_test` — invalid tag combinations always produce FAIL, regardless of test outcome - **`data.tags` over `result.tags`** — correctly uses static test definition tags, not runtime-mutable tags (intentional, documented) - **`fullmatch()` on regex** — correctly distinguishes `tdd_bug` from `tdd_bug_\<N\>` without false matches - **Process-local state** — `_processed_tests` and `_validation_errors` are per-process (pabot uses subprocesses), no concurrency risk - **`close()` cleanup** — RF Listener v3 API calls `close()` on execution finish - **No `# type: ignore`** — clean type safety - **No `# noqa` in listener** — helper has 3 justified `# noqa: E402` for post-`sys.path` imports --- ### P2:should-fix (3) **1. `cmd_skip_status_unchanged` and `cmd_tdd_bug_alone_valid` can false-positive without listener loaded** Both tests verify non-modification of results (SKIP stays SKIP, PASS stays PASS) but run the fixture alone without a positive control. Unlike `cmd_normal_test_unaffected` (which uses `_run_multi_fixture` with a companion `tdd_expected_fail_fails` fixture to PROVE the listener was active and selectively applied), these two commands would produce the same result if the `--listener` flag were missing entirely. **Fix:** Use `_run_multi_fixture` with `tdd_expected_fail_fails` as a co-run fixture (same pattern as `cmd_normal_test_unaffected`), asserting the companion was inverted. **2. `subprocess.run` calls missing `timeout`** Both `_run_fixture` and `_run_multi_fixture` call `subprocess.run(cmd, capture_output=True, text=True)` without `timeout`. If a fixture hangs, the subprocess blocks forever, which blocks the parent Robot test, which blocks CI with no diagnostic. All other Robot helpers in the repo use `timeout=120` or similar. **Fix:** Add `timeout=120` and handle `subprocess.TimeoutExpired`. **3. `helper_tdd_tag_validation.py` at 629 lines exceeds the 500-line limit** Already flagged in Rui's self-review R5. The natural split point is around line 270 (Behave-side functions vs Robot-side fixture runners). Extracting `_run_fixture`/`_run_multi_fixture`/`_extract_message` into a `_tdd_fixture_runner.py` utility would solve this and eliminate the duplicated XML extraction (R5 finding #6). --- ### P3:nit (1) **4. `features/mocks/__init__.py` doesn't re-export `make_mock_scenario`** Every other mock in the package is re-exported from `__init__.py`. `tdd_test_helpers.make_mock_scenario` is imported by full module path, so this works, but breaks the package convention. Minor inconsistency. --- ### Summary | Severity | Count | |----------|-------| | P0:blocker | 0 | | P1:must-fix | 0 | | P2:should-fix | 3 | | P3:nit | 1 | **Verdict: APPROVED** — The listener implementation is correct, well-tested (8 fixtures, 21 integration tests), and all prior review findings have been thoroughly addressed. The 3 P2 findings are genuine quality improvements but do not block merge. This is critical-path TDD infrastructure that unblocks 5+ bug TDD pipelines. The code demonstrates strong engineering: clean validation/inversion separation, proper RF Listener v3 API usage, comprehensive edge-case coverage (SKIP, double-invocation, tdd_bug-alone, tdd_expected_fail-alone), and an extracted shared mock helper. Well done @hurui200320.

brent.edwards referenced this issue from a commit

2026-03-16 22:45:58 +00:00

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

brent.edwards force-pushed feature/m5-robot-tdd-tags from 5e4453d0a4 to a5de448856

2026-03-16 22:45:59 +00:00

Compare

brent.edwards dismissed freemo's review 2026-03-16 22:45:59 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards dismissed brent.edwards's review 2026-03-16 22:45:59 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards commented

2026-03-16 22:46:27 +00:00

Review Fixes Applied — `a5de448`

Addressed all 3 P2 and 1 P3 findings from @brent.edwards's review:

#	Sev	Finding	Fix
P2-1	should-fix	`cmd_skip_status_unchanged` and `cmd_tdd_bug_alone_valid` can false-positive without listener loaded	Converted both to use `run_multi_fixture` with `tdd_expected_fail_fails` as positive control — same pattern as `cmd_normal_test_unaffected`
P2-2	should-fix	`subprocess.run` missing `timeout`	Added `timeout=120` to both `run_fixture` and `run_multi_fixture` with `subprocess.TimeoutExpired` handling
P2-3	should-fix	`helper_tdd_tag_validation.py` at 629 lines (limit 500)	Extracted `run_fixture`, `run_multi_fixture`, and `_extract_message` to new `robot/_tdd_fixture_runner.py` (186 lines). Helper now 499 lines.
P3-4	nit	`make_mock_scenario` not re-exported from `features/mocks/__init__.py`	Added import + `__all__` entry

Line counts after fix

robot/helper_tdd_tag_validation.py: 499 (was 629)
robot/_tdd_fixture_runner.py: 186 (new)

Quality gates

Lint: PASS (0 violations)
Format: PASS
Typecheck: PASS (0 errors)

## Review Fixes Applied — `a5de448` Addressed all 3 P2 and 1 P3 findings from @brent.edwards's review: | # | Sev | Finding | Fix | |---|-----|---------|-----| | **P2-1** | should-fix | `cmd_skip_status_unchanged` and `cmd_tdd_bug_alone_valid` can false-positive without listener loaded | Converted both to use `run_multi_fixture` with `tdd_expected_fail_fails` as positive control — same pattern as `cmd_normal_test_unaffected` | | **P2-2** | should-fix | `subprocess.run` missing `timeout` | Added `timeout=120` to both `run_fixture` and `run_multi_fixture` with `subprocess.TimeoutExpired` handling | | **P2-3** | should-fix | `helper_tdd_tag_validation.py` at 629 lines (limit 500) | Extracted `run_fixture`, `run_multi_fixture`, and `_extract_message` to new `robot/_tdd_fixture_runner.py` (186 lines). Helper now **499 lines**. | | **P3-4** | nit | `make_mock_scenario` not re-exported from `features/mocks/__init__.py` | Added import + `__all__` entry | ### Line counts after fix - `robot/helper_tdd_tag_validation.py`: **499** (was 629) - `robot/_tdd_fixture_runner.py`: **186** (new) ### Quality gates - Lint: PASS (0 violations) - Format: PASS - Typecheck: PASS (0 errors)

brent.edwards added 1 commit 2026-03-16 23:05:46 +00:00

Merge branch 'master' into feature/m5-robot-tdd-tags

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 16s

Details

CI / lint (pull_request) Successful in 20s

Details

CI / quality (pull_request) Successful in 29s

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / e2e_tests (pull_request) Has been cancelled

Details

CI / typecheck (pull_request) Has been cancelled

Details

CI / security (pull_request) Has been cancelled

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

d0ca129d90

brent.edwards added 1 commit 2026-03-16 23:06:21 +00:00

Merge branch 'master' into feature/m5-robot-tdd-tags

CI / lint (pull_request) Successful in 25s

Details

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 40s

Details

CI / typecheck (pull_request) Successful in 47s

Details

CI / security (pull_request) Successful in 45s

Details

CI / e2e_tests (pull_request) Successful in 2m4s

Details

CI / benchmark-regression (pull_request) Failing after 1m41s

Details

CI / unit_tests (pull_request) Successful in 5m2s

Details

CI / integration_tests (pull_request) Successful in 5m30s

Details

CI / docker (pull_request) Successful in 55s

Details

CI / coverage (pull_request) Successful in 6m9s

Details

c13a62a2f2

brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-16 23:06:42 +00:00

brent.edwards merged commit c19c2b2e2c into master

2026-03-16 23:13:31 +00:00

brent.edwards deleted branch feature/m5-robot-tdd-tags

2026-03-16 23:13:31 +00:00

brent.edwards referenced this issue from a commit

2026-03-16 23:13:32 +00:00

Merge pull request 'feat(testing): implement @tdd_expected_fail tag handling in Robot Framework' (#673) from feature/m5-robot-tdd-tags into master

brent.edwards referenced this pull request

2026-03-16 23:23:11 +00:00

fix(test): remove eager tdd_test_helpers import from mocks __init__ #985

Sign in to join this conversation.

No Reviewers

hurui200320

CoreRasurae

hamza.khyari

freemo

4 Participants

Notifications

Due Date

No due date set.

Blocks

#628 Implement @tdd_expected_fail tag handling in Robot Framework

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#673

feat(testing): implement @tdd_expected_fail tag handling in Robot Framework #673

Summary

Changes

Review Fixes (Round 4)

Quality Gates (post-rebase onto latest master)

Self-Review: PR #673 — feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Acceptance Criteria Verification (Ticket #628)

Issue Found

# type: ignore[operator] violates CONTRIBUTING.md Type Safety rules

Items Reviewed (No Issues)

Self-Review (Round 2): PR #673 — feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Previous Review Issues — All Resolved

Acceptance Criteria (Ticket #628) — All Met

New Finding

end_test bare else catches SKIP status incorrectly (Medium)

Items Reviewed (No Issues)

PM Review — Day 31 (Specification Update)

Spec Alignment Check

Status

Action Required

PM Status — Day 31 (2026-03-11)

Code Review -- PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Summary

Findings

1. Double-invocation idempotency bug

2. "Normal Test Unaffected" is a sham test

3. _run_fixture does not guard against missing output.xml

4. end_test reads data.tags instead of result.tags

5. No test for SKIP status branch

6. cmd_expected_fail_inverted never checks message content

7. Weak substring assertions in validation checks

8. No test for tdd_expected_fail alone (both companions missing)

9. Guidance message deviates from AC #2 exact wording

10. _validation_errors never cleared on abnormal exit

11. _run_fixture returns stale temp file path

12. Listener path resolution is CWD-relative

13. No __all__ declaration

14. Docstring says "mirroring" but messages differ

15. XML parsing without defusedxml

Process Checks (all pass)

PM Status — Day 32

Review Fixes Applied (Round 3) — Force-pushed b1e45b49

P2 Fixes (should-fix)

P3 Fixes (nits)

P4 Fixes (nits)

Additional Fix

Quality Gates

Code Review — PR #673

Critical

C-1: PR bundles 7+ issues across 4+ milestones — must be split

C-2: M4 CLI integration tests use mocks — violates CONTRIBUTING.md

High

H-1: session_service is providers.Factory — creates new engine per call

H-2: Commit 9c8dc9ed violates Conventional Commits

H-3: Commit 94d95324 missing ISSUES CLOSED: footer

Medium

M-1: CorrectionService created ad-hoc in plan.py, bypasses container Singleton

M-2: AuditService._ensure_session() not thread-safe

M-3: Incomplete entity_deleted audit coverage vs spec

M-4: _suppress_structlog_stdout / _restore_structlog duplicated 3 times with incomplete restore

M-5: _make_mock_scenario duplicated between Robot helper and Behave steps

M-6: cleanup_workspace() mutates os.environ — not parallelism-safe

M-7: XML message extraction may miss RF status message attribute

Low

Nits

PM Triage — Day 33 (2026-03-13)

Critical Assessment

Blocking vs Non-Blocking

Action Required

PM Review — Day 34: Scope Bundling Ruling (C-1)

PM Ruling

Required Action

PR Status

Review Response (Round 4) — Force-pushed c7bdb62b

C-1 and C-2: Resolved by rebase

Findings addressed in this round

Findings now out of scope (resolved by rebase)

Quality Gates (post-rebase + review fixes)

PR scope after rebase

Self-Review (Round 5) — PR #673: feat(testing): implement @tdd_expected_fail tag handling in Robot Framework

Self-Review: PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

`# type: ignore[operator]` violates CONTRIBUTING.md Type Safety rules

Self-Review (Round 2): PR #673 — `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

`end_test` bare `else` catches SKIP status incorrectly (Medium)

Code Review -- PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

3. `_run_fixture` does not guard against missing `output.xml`

4. `end_test` reads `data.tags` instead of `result.tags`

6. `cmd_expected_fail_inverted` never checks message content

8. No test for `tdd_expected_fail` alone (both companions missing)

10. `_validation_errors` never cleared on abnormal exit

11. `_run_fixture` returns stale temp file path

13. No `all` declaration

15. XML parsing without `defusedxml`

Review Fixes Applied (Round 3) — Force-pushed `b1e45b49`

H-1: `session_service` is `providers.Factory` — creates new engine per call

H-2: Commit `9c8dc9ed` violates Conventional Commits

H-3: Commit `94d95324` missing `ISSUES CLOSED:` footer

M-1: `CorrectionService` created ad-hoc in `plan.py`, bypasses container Singleton

M-2: `AuditService._ensure_session()` not thread-safe

M-3: Incomplete `entity_deleted` audit coverage vs spec

M-4: `_suppress_structlog_stdout` / `_restore_structlog` duplicated 3 times with incomplete restore

M-5: `_make_mock_scenario` duplicated between Robot helper and Behave steps

M-6: `cleanup_workspace()` mutates `os.environ` — not parallelism-safe

Review Response (Round 4) — Force-pushed `c7bdb62b`

Self-Review (Round 5) — PR #673: `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

3. `close()` cleanup untested (Low)

4. Weak `expected-fail-alone` assertion (Low)

Code Review — PR #673 `feat(testing): implement @tdd_expected_fail tag handling in Robot Framework`

Review Fixes Applied — `a5de448`