test: add TDD bug-capture test for #1038 — validation add --required flag #1133

Merged
brent.edwards merged 2 commits from tdd/m5-validation-required-flag into master 2026-03-28 18:09:28 +00:00
Member

Summary

Adds TDD bug-capture tests for bug #1038 — the agents validation add command is missing --required/--informational flags that the specification describes at line 22334 and in numerous walkthrough examples.

Behave BDD Tests

Four Behave BDD scenarios verify the correct expected behavior:

  1. --required flag is accepted — invokes validation add --config <file> --required and asserts exit code 0 and mode "required" in output.
  2. --informational flag is accepted — same pattern with --informational.
  3. --required overrides YAML config mode — creates a YAML config with mode: informational, invokes with --required, and asserts the CLI output and service layer both reflect mode: "required".
  4. --informational overrides YAML config mode — creates a YAML config with mode: required, invokes with --informational, and asserts the CLI output and service layer both reflect mode: "informational".

Robot Framework Integration Tests

Four Robot Framework integration tests exercise the same CLI paths as the Behave scenarios via a subprocess helper script:

  1. TDD Validation Add Required Flag Accepted — verifies --required is recognised by the CLI.
  2. TDD Validation Add Informational Flag Accepted — verifies --informational is recognised.
  3. TDD Validation Add Required Flag Overrides YAML Config — verifies --required overrides a YAML config with mode: informational.
  4. TDD Validation Add Informational Flag Overrides YAML Config — verifies --informational overrides a YAML config with mode: required.

The helper script (robot/helper_tdd_validation_required_flag.py) uses typer.testing.CliRunner with mocked services, following the same pattern as other TDD Robot helpers (e.g., helper_tdd_plan_apply_yes_flag.py).

Tags

All scenarios (Behave and Robot) are tagged @tdd_bug @tdd_bug_1038 @tdd_expected_fail. Because the bug is still present, the underlying assertions fail (proving the bug exists) and the @tdd_expected_fail tag inverts the result so CI passes. Once bug #1038 is fixed, the tag must be removed.

Spec Contradiction Note

Rui Hu's investigation (issue #1038 comment #70755) found that the formal CLI reference (specification.md lines 9279–9290) does NOT include --required/--informational flags — they appear only in walkthrough examples and specification.md line 22334. Additionally, specification.md line 30761 states: "For entity registration commands (actor add, skill add, tool add, validation add, …), the YAML configuration file is the sole source of truth — the --config file fully defines the entity and no CLI override flags are accepted." The resolution may be to add the flags to the CLI OR to clean up the spec. This is documented in both the feature file header and step definition module docstring.

Key Design Decisions

  • YAML configs use flat mode key — matches what Validation.from_config() actually reads (config.get("mode", "required")). The specification's Validation Configuration schema (specification.md line 34109) nests mode under a validation key, but the current from_config() reads a flat top-level mode key. A comment in the step definitions documents this discrepancy.
  • Mock uses side_effect = lambda v: v — the register_tool mock returns the actual Validation object passed by the CLI, so the CLI output reflects real CLI processing rather than hard-coded mock configuration. This prevents false positives where the mock masks an incomplete fix.
  • False-positive guard — the Then step verifying mode checks both CLI output (via f"mode: {expected_mode}" prefix match) AND register_tool call args to prevent the mock from masking an incomplete fix.
  • Robot helper follows established pattern — uses typer.testing.CliRunner with mocked _get_tool_registry_service, matching the pattern from helper_tdd_plan_apply_yes_flag.py. Reports real outcome (exit 0 + sentinel on bug fixed, exit 1 on bug present) and relies on tdd_expected_fail_listener for result inversion.
  • Symmetric override scenarios — both directions are tested: --required overriding informational YAML config, and --informational overriding required YAML config.
  • Positional NAME omitted — the bug report includes a positional NAME argument, but that is a separate spec inconsistency not under test here.
  • Mutual exclusivity deferred — testing what happens when both --required and --informational are passed simultaneously is deferred to the bug-fix PR for #1038.

Closes #1102

## Summary Adds TDD bug-capture tests for bug #1038 — the `agents validation add` command is missing `--required`/`--informational` flags that the specification describes at line 22334 and in numerous walkthrough examples. ### Behave BDD Tests Four Behave BDD scenarios verify the correct expected behavior: 1. **`--required` flag is accepted** — invokes `validation add --config <file> --required` and asserts exit code 0 and mode `"required"` in output. 2. **`--informational` flag is accepted** — same pattern with `--informational`. 3. **`--required` overrides YAML config mode** — creates a YAML config with `mode: informational`, invokes with `--required`, and asserts the CLI output and service layer both reflect `mode: "required"`. 4. **`--informational` overrides YAML config mode** — creates a YAML config with `mode: required`, invokes with `--informational`, and asserts the CLI output and service layer both reflect `mode: "informational"`. ### Robot Framework Integration Tests Four Robot Framework integration tests exercise the same CLI paths as the Behave scenarios via a subprocess helper script: 1. **TDD Validation Add Required Flag Accepted** — verifies `--required` is recognised by the CLI. 2. **TDD Validation Add Informational Flag Accepted** — verifies `--informational` is recognised. 3. **TDD Validation Add Required Flag Overrides YAML Config** — verifies `--required` overrides a YAML config with `mode: informational`. 4. **TDD Validation Add Informational Flag Overrides YAML Config** — verifies `--informational` overrides a YAML config with `mode: required`. The helper script (`robot/helper_tdd_validation_required_flag.py`) uses `typer.testing.CliRunner` with mocked services, following the same pattern as other TDD Robot helpers (e.g., `helper_tdd_plan_apply_yes_flag.py`). ### Tags All scenarios (Behave and Robot) are tagged `@tdd_bug @tdd_bug_1038 @tdd_expected_fail`. Because the bug is still present, the underlying assertions fail (proving the bug exists) and the `@tdd_expected_fail` tag inverts the result so CI passes. Once bug #1038 is fixed, the tag must be removed. ### Spec Contradiction Note Rui Hu's investigation (issue #1038 comment #70755) found that the formal CLI reference (specification.md lines 9279–9290) does NOT include `--required`/`--informational` flags — they appear only in walkthrough examples and specification.md line 22334. Additionally, specification.md line 30761 states: "For entity registration commands (`actor add`, `skill add`, `tool add`, `validation add`, …), the YAML configuration file is the sole source of truth — the `--config` file fully defines the entity and no CLI override flags are accepted." The resolution may be to add the flags to the CLI OR to clean up the spec. This is documented in both the feature file header and step definition module docstring. ### Key Design Decisions - **YAML configs use flat `mode` key** — matches what `Validation.from_config()` actually reads (`config.get("mode", "required")`). The specification's Validation Configuration schema (specification.md line 34109) nests mode under a `validation` key, but the current `from_config()` reads a flat top-level `mode` key. A comment in the step definitions documents this discrepancy. - **Mock uses `side_effect = lambda v: v`** — the `register_tool` mock returns the actual Validation object passed by the CLI, so the CLI output reflects real CLI processing rather than hard-coded mock configuration. This prevents false positives where the mock masks an incomplete fix. - **False-positive guard** — the Then step verifying mode checks both CLI output (via `f"mode: {expected_mode}"` prefix match) AND `register_tool` call args to prevent the mock from masking an incomplete fix. - **Robot helper follows established pattern** — uses `typer.testing.CliRunner` with mocked `_get_tool_registry_service`, matching the pattern from `helper_tdd_plan_apply_yes_flag.py`. Reports real outcome (exit 0 + sentinel on bug fixed, exit 1 on bug present) and relies on `tdd_expected_fail_listener` for result inversion. - **Symmetric override scenarios** — both directions are tested: `--required` overriding informational YAML config, and `--informational` overriding required YAML config. - **Positional NAME omitted** — the bug report includes a positional NAME argument, but that is a separate spec inconsistency not under test here. - **Mutual exclusivity deferred** — testing what happens when both `--required` and `--informational` are passed simultaneously is deferred to the bug-fix PR for #1038. Closes #1102
brent.edwards added this to the v3.5.0 milestone 2026-03-23 12:36:28 +00:00
brent.edwards force-pushed tdd/m5-validation-required-flag from 2b7f239bea
Some checks failed
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 3m17s
CI / typecheck (pull_request) Successful in 3m46s
CI / security (pull_request) Successful in 3m56s
CI / unit_tests (pull_request) Successful in 6m54s
CI / integration_tests (pull_request) Successful in 7m5s
CI / docker (pull_request) Successful in 1m10s
CI / e2e_tests (pull_request) Successful in 8m21s
CI / quality (pull_request) Failing after 15m39s
CI / coverage (pull_request) Successful in 10m27s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 1h2m3s
to 6fff40a2e7
Some checks failed
CI / build (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m39s
CI / typecheck (pull_request) Successful in 3m57s
CI / security (pull_request) Successful in 4m3s
CI / unit_tests (pull_request) Successful in 6m31s
CI / integration_tests (pull_request) Successful in 6m39s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 8m26s
CI / coverage (pull_request) Successful in 11m6s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-23 17:14:04 +00:00
Compare
brent.edwards force-pushed tdd/m5-validation-required-flag from 6fff40a2e7
Some checks failed
CI / build (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m39s
CI / typecheck (pull_request) Successful in 3m57s
CI / security (pull_request) Successful in 4m3s
CI / unit_tests (pull_request) Successful in 6m31s
CI / integration_tests (pull_request) Successful in 6m39s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 8m26s
CI / coverage (pull_request) Successful in 11m6s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Has been cancelled
to 982f46c35c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m37s
CI / typecheck (pull_request) Successful in 4m13s
CI / quality (pull_request) Successful in 4m14s
CI / build (pull_request) Successful in 14s
CI / security (pull_request) Successful in 4m58s
CI / unit_tests (pull_request) Successful in 6m11s
CI / integration_tests (pull_request) Successful in 7m15s
CI / docker (pull_request) Successful in 1m7s
CI / e2e_tests (pull_request) Successful in 7m45s
CI / coverage (pull_request) Successful in 11m12s
CI / status-check (pull_request) Successful in 4s
CI / benchmark-regression (pull_request) Failing after 23m5s
2026-03-23 18:05:53 +00:00
Compare
freemo requested changes 2026-03-24 15:24:46 +00:00
Dismissed
freemo left a comment

Review: REQUEST CHANGES

The Behave tests are excellent — 4 well-designed scenarios covering --required, --informational, and override-mode edge cases. The TDD tags are all correct (@tdd_bug @tdd_bug_1038 @tdd_expected_fail), step definitions are fully implemented with real assertions (not placeholders), and the feature file has a thorough 30-line header comment documenting the spec contradiction. This is high-quality test work.

Blocking Issue

Missing Robot Framework integration tests. This is the only TDD bug-capture PR in the current batch that does not include a .robot file + helper script. Per CONTRIBUTING.md §BDD Framework: integration tests use Robot Framework under robot/. Per §Testing Philosophy: "Every coding task must include or update tests at multiple levels: unit tests, integration tests, and performance benchmarks."

All other recent TDD PRs (e.g., #1148, #1144, #1110, #1113, #1109, #1130) include both Behave and Robot tests. This PR should follow the same pattern for consistency and to satisfy the multi-level testing mandate.

Action Item

  • Add a Robot Framework test file (robot/tdd_validation_required_flag.robot) and corresponding helper script that exercises the validation add --required CLI path via subprocess, with matching TDD tags.

Everything else about this PR is solid — once the Robot tests are added, this is ready to approve.

## Review: REQUEST CHANGES The Behave tests are excellent — 4 well-designed scenarios covering `--required`, `--informational`, and override-mode edge cases. The TDD tags are all correct (`@tdd_bug @tdd_bug_1038 @tdd_expected_fail`), step definitions are fully implemented with real assertions (not placeholders), and the feature file has a thorough 30-line header comment documenting the spec contradiction. This is high-quality test work. ### Blocking Issue **Missing Robot Framework integration tests.** This is the only TDD bug-capture PR in the current batch that does not include a `.robot` file + helper script. Per CONTRIBUTING.md §BDD Framework: integration tests use Robot Framework under `robot/`. Per §Testing Philosophy: *"Every coding task must include or update tests at multiple levels: unit tests, integration tests, and performance benchmarks."* All other recent TDD PRs (e.g., #1148, #1144, #1110, #1113, #1109, #1130) include both Behave and Robot tests. This PR should follow the same pattern for consistency and to satisfy the multi-level testing mandate. ### Action Item - [ ] Add a Robot Framework test file (`robot/tdd_validation_required_flag.robot`) and corresponding helper script that exercises the `validation add --required` CLI path via subprocess, with matching TDD tags. Everything else about this PR is solid — once the Robot tests are added, this is ready to approve.
Author
Member

Self-QA status: QA process was started but interrupted before completing. The PR has not yet been reviewed by the self-QA loop. Manual review recommended.

**Self-QA status:** QA process was started but interrupted before completing. The PR has not yet been reviewed by the self-QA loop. Manual review recommended.
brent.edwards force-pushed tdd/m5-validation-required-flag from 982f46c35c
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 3m37s
CI / typecheck (pull_request) Successful in 4m13s
CI / quality (pull_request) Successful in 4m14s
CI / build (pull_request) Successful in 14s
CI / security (pull_request) Successful in 4m58s
CI / unit_tests (pull_request) Successful in 6m11s
CI / integration_tests (pull_request) Successful in 7m15s
CI / docker (pull_request) Successful in 1m7s
CI / e2e_tests (pull_request) Successful in 7m45s
CI / coverage (pull_request) Successful in 11m12s
CI / status-check (pull_request) Successful in 4s
CI / benchmark-regression (pull_request) Failing after 23m5s
to d81f783917
Some checks failed
CI / status-check (pull_request) Blocked by required conditions
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Failing after 26s
CI / lint (pull_request) Successful in 4m12s
CI / quality (pull_request) Successful in 4m37s
CI / typecheck (pull_request) Successful in 4m48s
CI / coverage (pull_request) Has started running
CI / benchmark-regression (pull_request) Has started running
CI / security (pull_request) Successful in 4m59s
CI / integration_tests (pull_request) Successful in 9m43s
CI / unit_tests (pull_request) Successful in 9m59s
CI / docker (pull_request) Successful in 1m7s
CI / e2e_tests (pull_request) Successful in 11m6s
2026-03-25 01:46:59 +00:00
Compare
brent.edwards force-pushed tdd/m5-validation-required-flag from b11d41386e
Some checks are pending
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 13s
CI / lint (pull_request) Successful in 3m23s
CI / typecheck (pull_request) Successful in 3m53s
CI / benchmark-regression (pull_request) Waiting to run
CI / quality (pull_request) Successful in 3m42s
CI / security (pull_request) Successful in 4m7s
CI / integration_tests (pull_request) Successful in 7m8s
CI / unit_tests (pull_request) Successful in 8m10s
CI / docker (pull_request) Successful in 1m3s
CI / e2e_tests (pull_request) Successful in 9m10s
CI / coverage (pull_request) Successful in 11m38s
CI / status-check (pull_request) Successful in 1s
to 54c748f5c4
All checks were successful
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 3m57s
CI / security (pull_request) Successful in 3m59s
CI / integration_tests (pull_request) Successful in 7m29s
CI / unit_tests (pull_request) Successful in 7m36s
CI / docker (pull_request) Successful in 1m10s
CI / e2e_tests (pull_request) Successful in 10m15s
CI / coverage (pull_request) Successful in 10m53s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 1h16m51s
2026-03-25 21:58:23 +00:00
Compare
Author
Member

Review Response

@freemo — thanks for the thorough review and the good catch on the missing Robot tests. You're right that this was the only TDD bug-capture PR without .robot coverage.

Addressed: Missing Robot Framework integration tests

Added robot/tdd_validation_required_flag.robot and robot/helper_tdd_validation_required_flag.py with four test cases exercising the same CLI paths as the Behave scenarios:

Robot Test Case Behave Scenario Helper Subcommand
TDD Validation Add Required Flag Accepted --required flag is accepted check-required
TDD Validation Add Informational Flag Accepted --informational flag is accepted check-informational
TDD Validation Add Required Flag Overrides YAML Config --required overrides YAML config mode check-required-overrides-config
TDD Validation Add Informational Flag Overrides YAML Config --informational overrides YAML config mode check-informational-overrides-config

All test cases are tagged tdd_bug, tdd_bug_1038, tdd_expected_fail. The helper follows the established pattern from helper_tdd_plan_apply_yes_flag.py — reports the real outcome and relies on tdd_expected_fail_listener for result inversion.

All 4 new Robot tests pass via the expected-fail mechanism (underlying assertions fail correctly, confirming the bug exists).

Quality Gates

  • lint: | typecheck: | unit_tests: | integration_tests: (new tests pass; pre-existing failures only) | coverage: 98%

Branch rebased onto latest origin/master and force-pushed.

## Review Response @freemo — thanks for the thorough review and the good catch on the missing Robot tests. You're right that this was the only TDD bug-capture PR without `.robot` coverage. ### Addressed: Missing Robot Framework integration tests ✅ Added `robot/tdd_validation_required_flag.robot` and `robot/helper_tdd_validation_required_flag.py` with four test cases exercising the same CLI paths as the Behave scenarios: | Robot Test Case | Behave Scenario | Helper Subcommand | |---|---|---| | TDD Validation Add Required Flag Accepted | `--required` flag is accepted | `check-required` | | TDD Validation Add Informational Flag Accepted | `--informational` flag is accepted | `check-informational` | | TDD Validation Add Required Flag Overrides YAML Config | `--required` overrides YAML config mode | `check-required-overrides-config` | | TDD Validation Add Informational Flag Overrides YAML Config | `--informational` overrides YAML config mode | `check-informational-overrides-config` | All test cases are tagged `tdd_bug`, `tdd_bug_1038`, `tdd_expected_fail`. The helper follows the established pattern from `helper_tdd_plan_apply_yes_flag.py` — reports the real outcome and relies on `tdd_expected_fail_listener` for result inversion. All 4 new Robot tests pass via the expected-fail mechanism (underlying assertions fail correctly, confirming the bug exists). ### Quality Gates - `lint`: ✅ | `typecheck`: ✅ | `unit_tests`: ✅ | `integration_tests`: ✅ (new tests pass; pre-existing failures only) | `coverage`: 98% Branch rebased onto latest `origin/master` and force-pushed.
freemo approved these changes 2026-03-27 17:12:36 +00:00
Dismissed
freemo left a comment

Review: test: add TDD bug-capture test for #1038 — validation add --required flag

Approved. Clean TDD bug-capture test with proper conventional commit format, @tdd_expected_fail tags, and issue reference.

## Review: test: add TDD bug-capture test for #1038 — validation add --required flag **Approved.** Clean TDD bug-capture test with proper conventional commit format, `@tdd_expected_fail` tags, and issue reference.
brent.edwards force-pushed tdd/m5-validation-required-flag from 54c748f5c4
All checks were successful
CI / build (pull_request) Successful in 24s
CI / lint (pull_request) Successful in 3m19s
CI / typecheck (pull_request) Successful in 3m56s
CI / quality (pull_request) Successful in 3m57s
CI / security (pull_request) Successful in 3m59s
CI / integration_tests (pull_request) Successful in 7m29s
CI / unit_tests (pull_request) Successful in 7m36s
CI / docker (pull_request) Successful in 1m10s
CI / e2e_tests (pull_request) Successful in 10m15s
CI / coverage (pull_request) Successful in 10m53s
CI / status-check (pull_request) Successful in 2s
CI / benchmark-publish (pull_request) Has been skipped
CI / benchmark-regression (pull_request) Successful in 1h16m51s
to 55fd37c54a
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 34s
CI / helm (pull_request) Successful in 34s
CI / lint (pull_request) Successful in 3m20s
CI / quality (pull_request) Successful in 3m51s
CI / typecheck (pull_request) Successful in 3m57s
CI / benchmark-regression (pull_request) Has started running
CI / security (pull_request) Successful in 4m15s
CI / integration_tests (pull_request) Failing after 5m59s
CI / unit_tests (pull_request) Failing after 6m7s
CI / docker (pull_request) Has been skipped
CI / e2e_tests (pull_request) Successful in 8m24s
CI / coverage (pull_request) Successful in 12m9s
CI / status-check (pull_request) Failing after 1s
2026-03-28 02:41:01 +00:00
Compare
brent.edwards dismissed freemo's review 2026-03-28 02:41:01 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards force-pushed tdd/m5-validation-required-flag from 45bcbc457e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / helm (pull_request) Successful in 34s
CI / lint (pull_request) Successful in 3m55s
CI / typecheck (pull_request) Successful in 3m55s
CI / quality (pull_request) Successful in 4m29s
CI / security (pull_request) Successful in 4m52s
CI / integration_tests (pull_request) Successful in 7m33s
CI / unit_tests (pull_request) Successful in 8m9s
CI / docker (pull_request) Successful in 1m30s
CI / e2e_tests (pull_request) Successful in 12m58s
CI / coverage (pull_request) Successful in 11m52s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Failing after 35m44s
to 1507425240
All checks were successful
CI / lint (pull_request) Successful in 3m18s
CI / typecheck (pull_request) Successful in 4m6s
CI / security (pull_request) Successful in 4m15s
CI / quality (pull_request) Successful in 3m41s
CI / unit_tests (pull_request) Successful in 9m9s
CI / build (pull_request) Successful in 15s
CI / integration_tests (pull_request) Successful in 8m55s
CI / helm (pull_request) Successful in 22s
CI / e2e_tests (pull_request) Successful in 12m45s
CI / benchmark-publish (pull_request) Has been skipped
CI / docker (pull_request) Successful in 1m32s
CI / coverage (pull_request) Successful in 12m46s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h16m51s
2026-03-28 05:55:08 +00:00
Compare
brent.edwards scheduled this pull request to auto merge when all checks succeed 2026-03-28 05:55:59 +00:00
brent.edwards deleted branch tdd/m5-validation-required-flag 2026-03-28 18:09:28 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!1133
No description provided.