test(cli): add failing tests for agents init --yes missing option (#536) #566

Nit: No teardown() method to clean up self._tmpdir. ASV calls setup() before each iteration, so repeated runs will accumulate orphan temp directories. Consider adding:

def teardown(self) -> None:
    import shutil
    shutil.rmtree(self._tmpdir, ignore_errors=True)

Nit: No `teardown()` method to clean up `self._tmpdir`. ASV calls `setup()` before each iteration, so repeated runs will accumulate orphan temp directories. Consider adding: ```python def teardown(self) -> None: import shutil shutil.rmtree(self._tmpdir, ignore_errors=True) ```

brent.edwards marked this conversation as resolved

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +32,4 @@

				        self._runner = CliRunner()

				        self._tmpdir = tempfile.mkdtemp()

				    def time_init_yes_flag(self) -> None:

brent.edwards commented

Minor: The mock setup block (lines 37-45 / 51-59) is duplicated between the two benchmark methods. Could extract a helper like _invoke_with_mock(self, args: list[str]) to reduce duplication. Not blocking — the two methods are short enough that readability isn't harmed.

Minor: The mock setup block (lines 37-45 / 51-59) is duplicated between the two benchmark methods. Could extract a helper like `_invoke_with_mock(self, args: list[str])` to reduce duplication. Not blocking — the two methods are short enough that readability isn't harmed.

brent.edwards marked this conversation as resolved

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +15,4 @@

				    Given I have a temporary project directory for init

				    When I run agents init with the --yes flag

				    Then the init command should exit with code 0

				    And the init output should contain "initialized successfully"

brent.edwards commented

The expected output strings ("initialized successfully", "Location:", "Database:", "Project Initialized") are reasonable guesses based on the specification, but since the --yes code path doesn't exist yet, the bug-fix author for #522 will need to ensure their output matches these expectations — or coordinate to update these assertions. Just flagging so this is on the radar for whoever picks up #522.

Resolved — Corrected in commit 429ba8db per F1 feedback from @CoreRasurae. These were not guesses — the spec provides exact expected output at docs/specification.md:1381-1402 and Workflow Example 1 at 34317-34326. Assertions now match the specification: "Initialized (non-interactive)", "Data Dir:", "Database:", "Initialized" (box title).

The expected output strings (`"initialized successfully"`, `"Location:"`, `"Database:"`, `"Project Initialized"`) are reasonable guesses based on the specification, but since the `--yes` code path doesn't exist yet, the bug-fix author for #522 will need to ensure their output matches these expectations — or coordinate to update these assertions. Just flagging so this is on the radar for whoever picks up #522. --- **Resolved** — Corrected in commit `429ba8db` per F1 feedback from @CoreRasurae. These were not guesses — the spec provides exact expected output at `docs/specification.md:1381-1402` and Workflow Example 1 at `34317-34326`. Assertions now match the specification: `"Initialized (non-interactive)"`, `"Data Dir:"`, `"Database:"`, `"Initialized"` (box title).

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +19,4 @@

				@given("I have a temporary project directory for init")

				def step_temp_project_directory(context):

				    """Create a temporary directory and store it on *context*."""

				    context.temp_dir = tempfile.mkdtemp()

brent.edwards commented

Minor: context.original_cwd is saved here but never restored (no os.chdir(context.original_cwd) in a cleanup/after step). The existing project_commands_coverage_steps.py has the same pattern, so this is consistent — but if environment.py's after_scenario doesn't handle CWD restoration and the temp dir gets cleaned up first, subsequent scenarios could see a stale CWD.

Worth confirming after_scenario handles this, or adding a restore in a @then cleanup step or after_scenario hook. Low priority since it matches existing convention.

Resolved — Addressed in commit 435093c2. Added context.add_cleanup(_restore_cwd, context) which restores the original CWD and cleans up the temp directory via a Behave cleanup callback, ensuring proper teardown regardless of scenario outcome.

Minor: `context.original_cwd` is saved here but never restored (no `os.chdir(context.original_cwd)` in a cleanup/after step). The existing `project_commands_coverage_steps.py` has the same pattern, so this is consistent — but if `environment.py`'s `after_scenario` doesn't handle CWD restoration and the temp dir gets cleaned up first, subsequent scenarios could see a stale CWD. Worth confirming `after_scenario` handles this, or adding a restore in a `@then` cleanup step or `after_scenario` hook. Low priority since it matches existing convention. --- **Resolved** — Addressed in commit `435093c2`. Added `context.add_cleanup(_restore_cwd, context)` which restores the original CWD and cleans up the temp directory via a Behave cleanup callback, ensuring proper teardown regardless of scenario outcome.

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +65,4 @@

				    assert text in output, f"Expected '{text}' in output:\n{output}"

				@then("no interactive prompt should have been presented")

brent.edwards commented

2026-03-04 20:40:57 +00:00

Nit: step_no_interactive_prompt currently only re-asserts exit_code == 0, which the preceding step in the scenario already checks. This makes the step a tautology — it doesn't actually verify that no prompts were presented.

When the bug-fix author implements --yes, consider strengthening this to something like:

# Verify no prompt-like tokens appeared in output
output = context.result["output"]
for prompt_token in ("[Y/n]", "[y/N]", "? ", "Enter "):
    assert prompt_token not in output, (
        f"Unexpected prompt token '{prompt_token}' found in output:\n{output}"
    )

That way the step actually exercises the "no prompts" invariant rather than duplicating the exit-code check. Not blocking since the fix PR will likely revisit this step anyway.

Resolved — Addressed in commit 435093c2. The step now checks for prompt-like tokens ([Y/n], [y/N], Continue? , Proceed? , Enter , Confirm , (yes/no)) instead of duplicating the exit-code assertion. The overly broad "? " token was subsequently refined in commit 429ba8db per F5 feedback from @CoreRasurae.

Nit: `step_no_interactive_prompt` currently only re-asserts `exit_code == 0`, which the preceding step in the scenario already checks. This makes the step a tautology — it doesn't actually verify that no prompts were presented. When the bug-fix author implements `--yes`, consider strengthening this to something like: ```python # Verify no prompt-like tokens appeared in output output = context.result["output"] for prompt_token in ("[Y/n]", "[y/N]", "? ", "Enter "): assert prompt_token not in output, ( f"Unexpected prompt token '{prompt_token}' found in output:\n{output}" ) ``` That way the step actually exercises the "no prompts" invariant rather than duplicating the exit-code check. Not blocking since the fix PR will likely revisit this step anyway. --- **Resolved** — Addressed in commit `435093c2`. The step now checks for prompt-like tokens (`[Y/n]`, `[y/N]`, `Continue? `, `Proceed? `, `Enter `, `Confirm `, `(yes/no)`) instead of duplicating the exit-code assertion. The overly broad `"? "` token was subsequently refined in commit `429ba8db` per F5 feedback from @CoreRasurae.

brent.edwards referenced this issue from a commit

refactor(test): address PR #566 review feedback

brent.edwards referenced this pull request

2026-03-04 20:57:32 +00:00

test(resource): add failing tests for built-in fs-directory type bootstrap (#537) #567

brent.edwards referenced this pull request

2026-03-04 20:57:32 +00:00

test(resource): add failing tests for built-in fs-directory type bootstrap (#537) #567

CoreRasurae commented

2026-03-04 21:17:13 +00:00

Code Review Report: Commit `435093c` by Brent Edwards

Branch: feature/m3-test-init-yes-flag
Issue: #536 | Related Bug: #522
Scope: TDD failing tests for agents init --yes missing option + PR review feedback fixes

Context

Brent's last commit (435093c) addresses PR #566 self-review feedback on top of the original test commit (27d8fd4). The PR adds TDD-style failing tests (Behave BDD, Robot Framework, ASV benchmarks) for the not-yet-implemented agents init --yes flag. The intent is that these tests fail now and will pass once bug #522 is fixed.

The refactor commit specifically:

Strengthened step_no_interactive_prompt to check for prompt-like tokens
Added CWD restoration cleanup via context.add_cleanup()
Added teardown() to the ASV benchmark suite
Extracted _invoke_with_mock() helper to remove duplication

Findings

F1 -- CRITICAL: Test Assertions Contradict the Specification

The expected output strings in both the Behave feature and the Robot test do not match the specification at docs/specification.md (lines 1374-1450). Since this is a TDD approach where tests are supposed to drive the correct implementation, these wrong expectations will either force the fix author to implement the wrong output or require rewriting these tests -- defeating TDD's purpose.

Assertion in Tests	Specification (line 1381-1402)	Status
`"initialized successfully"`	`"Initialized (non-interactive)"`	WRONG
`"Location:"`	`"Data Dir:"`	WRONG
`"Project Initialized"`	`"Initialized"` (box title)	WRONG
`"Database:"`	`"Database:"`	Correct

Affected files:

features/cli_init_yes_flag.feature:20,26,27 -- asserts "initialized successfully", "Location:", "Project Initialized"
robot/cli_init_yes_flag.robot:28 -- asserts "initialized successfully"

The PR self-review comment (#53258) acknowledged these as "reasonable guesses" and deferred to the #522 fix author. But they are not guesses -- the spec provides exact example output for agents init --yes at lines 1374-1450 and again in Workflow Example 1 at lines 34317-34326. The correct strings are documented.

Recommendation: Update assertions to match the spec now. The whole value of TDD here is that the tests serve as an executable spec.

F2 -- HIGH: No Test Coverage for `-y` Short-Form Alias

The specification at docs/specification.md:1217 defines:

agents init [--yes|-y]

No scenario in the feature file, Robot test, or benchmark exercises the -y short form. If the fix author only implements --yes but forgets -y, these tests won't catch it.

Recommendation: Add at least one scenario that invokes agents init -y.

F3 -- HIGH: Mock Bypasses Service Logic -- No Behavioral Verification

features/steps/cli_init_yes_flag_steps.py:38-50 -- The mock patches get_container entirely and provides a fake project_service, but never verifies that:

initialize_project was actually called (assert_called_once())
The --yes flag was propagated as a non-interactive parameter (assert_called_with(...))
The mock's return value was actually consumed by the CLI

This means the test only confirms --yes is accepted as a CLI option (no NoSuchOption), but does NOT confirm it triggers non-interactive behavior. A CLI implementation that simply accepts --yes and ignores it would pass all three scenarios.

Recommendation: Add mock_service.initialize_project.assert_called_once() at minimum. Ideally, assert the call included a non-interactive/yes parameter.

F4 -- MEDIUM: Robot Framework Tests Leak Temp Directories

robot/cli_init_yes_flag.robot:15,23 -- Both Robot test cases create temp directories via Evaluate __import__('tempfile').mkdtemp(...) but never clean them up. There is no [Teardown] keyword on either test case, and the Suite Teardown calls Cleanup Test Environment from common.resource, which does not know about per-test temp directories.

This contrasts with the Behave steps and ASV benchmarks, which now have proper cleanup (addressed in the latest commit for those files, but not Robot).

Recommendation: Add [Teardown] Remove Directory ${tmpdir} recursive=True to each Robot test case.

F5 -- MEDIUM: Prompt Token `"? "` Is Overly Broad

features/steps/cli_init_yes_flag_steps.py:82 -- The prompt_tokens tuple includes "? " (question mark + space). This could match legitimate non-prompt output such as error messages or help text (e.g., "Unexpected state? Check logs"). This risks false test failures when the fix is applied if any informational message contains "? ".

prompt_tokens = ("[Y/n]", "[y/N]", "? ", "Enter ", "Confirm ", "(yes/no)")

Recommendation: Replace "? " with more specific prompt patterns like "Continue? " or "Proceed? ", or remove it and rely on the more specific tokens.

F6 -- MEDIUM: Benchmark Silently Measures Error-Path Latency

benchmarks/cli_init_yes_bench.py:52-53 -- The _invoke_with_mock() helper discards the CliRunner.invoke() result entirely. Since --yes does not exist yet, the benchmark currently measures how fast the CLI raises NoSuchOption -- not actual init latency. When the fix is applied, benchmarks will show an apparent regression (successful init naturally takes longer than an instant error).

Recommendation: Store and assert the result exit code, or at minimum log a warning when the command fails, so benchmark data is meaningful.

F7 -- LOW: PR Self-Review Comment About Output Strings Left Unresolved

The self-review flagged the output string mismatch (Finding F1) but the comment was neither resolved nor explicitly deferred via a follow-up issue. Comments about teardown and duplication were resolved; comments about prompt assertion and CWD restoration were addressed in code but also left unresolved.

Recommendation: Resolve all review comments explicitly -- either fix them or create tracked follow-up issues.

F8 -- LOW: Premature `ISSUES CLOSED: #536` in Commit Messages

Both commits 27d8fd4 and 60e78e4 include ISSUES CLOSED: #536 in their commit messages. However, issue #536's Definition of Done has two unchecked quality subtasks:

Coverage >= 97% verification
Full nox pass across the entire codebase

Brent's PR comment acknowledges this is intentional for TDD, but the commit trailer creates an inconsistency -- automated tooling parsing ISSUES CLOSED may prematurely close the issue.

Recommendation: Remove ISSUES CLOSED: #536 from commit messages and let the PR merge trigger issue closure, or use Refs: #536 instead.

Summary

#	Severity	Category	Finding
F1	CRITICAL	Test Correctness	Output assertions contradict `docs/specification.md`
F2	HIGH	Test Coverage	`-y` short-form alias not tested
F3	HIGH	Test Flaw	Mock has no behavioral verification (`assert_called`)
F4	MEDIUM	Resource Leak	Robot test temp directories never cleaned up
F5	MEDIUM	Test Flaw	`"? "` prompt token too broad -- false positive risk
F6	MEDIUM	Performance	Benchmark discards result, measures error path
F7	LOW	Process	PR review comments left unresolved
F8	LOW	Process	Premature `ISSUES CLOSED` in commit trailer

Positive observations: The commit correctly addresses 3 of 5 PR review items (teardown, deduplication, CWD cleanup). Code structure is clean, docstrings are present, and the TDD approach is sound in principle. The Behave/Robot/ASV test trifecta follows the project's established pattern.

Bottom line: F1 should be fixed before merge -- the tests assert the wrong expected output, which undermines the core TDD value proposition.

## Code Review Report: Commit `435093c` by Brent Edwards **Branch:** `feature/m3-test-init-yes-flag` **Issue:** #536 | **Related Bug:** #522 **Scope:** TDD failing tests for `agents init --yes` missing option + PR review feedback fixes --- ### Context Brent's last commit (`435093c`) addresses PR #566 self-review feedback on top of the original test commit (`27d8fd4`). The PR adds TDD-style failing tests (Behave BDD, Robot Framework, ASV benchmarks) for the not-yet-implemented `agents init --yes` flag. The intent is that these tests fail now and will pass once bug #522 is fixed. The refactor commit specifically: - Strengthened `step_no_interactive_prompt` to check for prompt-like tokens - Added CWD restoration cleanup via `context.add_cleanup()` - Added `teardown()` to the ASV benchmark suite - Extracted `_invoke_with_mock()` helper to remove duplication --- ### Findings #### F1 -- CRITICAL: Test Assertions Contradict the Specification The expected output strings in both the Behave feature and the Robot test **do not match** the specification at `docs/specification.md` (lines 1374-1450). Since this is a TDD approach where tests are supposed to drive the correct implementation, these wrong expectations will either force the fix author to implement the wrong output or require rewriting these tests -- defeating TDD's purpose. | Assertion in Tests | Specification (line 1381-1402) | Status | |---|---|---| | `"initialized successfully"` | `"Initialized (non-interactive)"` | **WRONG** | | `"Location:"` | `"Data Dir:"` | **WRONG** | | `"Project Initialized"` | `"Initialized"` (box title) | **WRONG** | | `"Database:"` | `"Database:"` | Correct | **Affected files:** - `features/cli_init_yes_flag.feature:20,26,27` -- asserts `"initialized successfully"`, `"Location:"`, `"Project Initialized"` - `robot/cli_init_yes_flag.robot:28` -- asserts `"initialized successfully"` The PR self-review comment (#53258) acknowledged these as "reasonable guesses" and deferred to the #522 fix author. But they are **not guesses** -- the spec provides exact example output for `agents init --yes` at lines 1374-1450 and again in Workflow Example 1 at lines 34317-34326. The correct strings are documented. **Recommendation:** Update assertions to match the spec now. The whole value of TDD here is that the tests serve as an executable spec. --- #### F2 -- HIGH: No Test Coverage for `-y` Short-Form Alias The specification at `docs/specification.md:1217` defines: ``` agents init [--yes|-y] ``` No scenario in the feature file, Robot test, or benchmark exercises the `-y` short form. If the fix author only implements `--yes` but forgets `-y`, these tests won't catch it. **Recommendation:** Add at least one scenario that invokes `agents init -y`. --- #### F3 -- HIGH: Mock Bypasses Service Logic -- No Behavioral Verification `features/steps/cli_init_yes_flag_steps.py:38-50` -- The mock patches `get_container` entirely and provides a fake `project_service`, but never verifies that: 1. `initialize_project` was actually called (`assert_called_once()`) 2. The `--yes` flag was propagated as a non-interactive parameter (`assert_called_with(...)`) 3. The mock's return value was actually consumed by the CLI This means the test only confirms `--yes` is accepted as a CLI option (no `NoSuchOption`), but does NOT confirm it triggers non-interactive behavior. A CLI implementation that simply accepts `--yes` and ignores it would pass all three scenarios. **Recommendation:** Add `mock_service.initialize_project.assert_called_once()` at minimum. Ideally, assert the call included a non-interactive/yes parameter. --- #### F4 -- MEDIUM: Robot Framework Tests Leak Temp Directories `robot/cli_init_yes_flag.robot:15,23` -- Both Robot test cases create temp directories via `Evaluate __import__('tempfile').mkdtemp(...)` but never clean them up. There is no `[Teardown]` keyword on either test case, and the `Suite Teardown` calls `Cleanup Test Environment` from `common.resource`, which does not know about per-test temp directories. This contrasts with the Behave steps and ASV benchmarks, which now have proper cleanup (addressed in the latest commit for those files, but not Robot). **Recommendation:** Add `[Teardown] Remove Directory ${tmpdir} recursive=True` to each Robot test case. --- #### F5 -- MEDIUM: Prompt Token `"? "` Is Overly Broad `features/steps/cli_init_yes_flag_steps.py:82` -- The prompt_tokens tuple includes `"? "` (question mark + space). This could match legitimate non-prompt output such as error messages or help text (e.g., `"Unexpected state? Check logs"`). This risks false test failures when the fix is applied if any informational message contains `"? "`. ```python prompt_tokens = ("[Y/n]", "[y/N]", "? ", "Enter ", "Confirm ", "(yes/no)") ``` **Recommendation:** Replace `"? "` with more specific prompt patterns like `"Continue? "` or `"Proceed? "`, or remove it and rely on the more specific tokens. --- #### F6 -- MEDIUM: Benchmark Silently Measures Error-Path Latency `benchmarks/cli_init_yes_bench.py:52-53` -- The `_invoke_with_mock()` helper discards the `CliRunner.invoke()` result entirely. Since `--yes` does not exist yet, the benchmark currently measures how fast the CLI raises `NoSuchOption` -- not actual init latency. When the fix is applied, benchmarks will show an apparent regression (successful init naturally takes longer than an instant error). **Recommendation:** Store and assert the result exit code, or at minimum log a warning when the command fails, so benchmark data is meaningful. --- #### F7 -- LOW: PR Self-Review Comment About Output Strings Left Unresolved The self-review flagged the output string mismatch (Finding F1) but the comment was neither resolved nor explicitly deferred via a follow-up issue. Comments about teardown and duplication were resolved; comments about prompt assertion and CWD restoration were addressed in code but also left unresolved. **Recommendation:** Resolve all review comments explicitly -- either fix them or create tracked follow-up issues. --- #### F8 -- LOW: Premature `ISSUES CLOSED: #536` in Commit Messages Both commits `27d8fd4` and `60e78e4` include `ISSUES CLOSED: #536` in their commit messages. However, issue #536's Definition of Done has two unchecked quality subtasks: - Coverage >= 97% verification - Full `nox` pass across the entire codebase Brent's PR comment acknowledges this is intentional for TDD, but the commit trailer creates an inconsistency -- automated tooling parsing `ISSUES CLOSED` may prematurely close the issue. **Recommendation:** Remove `ISSUES CLOSED: #536` from commit messages and let the PR merge trigger issue closure, or use `Refs: #536` instead. --- ### Summary | # | Severity | Category | Finding | |---|----------|----------|---------| | F1 | **CRITICAL** | Test Correctness | Output assertions contradict `docs/specification.md` | | F2 | **HIGH** | Test Coverage | `-y` short-form alias not tested | | F3 | **HIGH** | Test Flaw | Mock has no behavioral verification (`assert_called`) | | F4 | **MEDIUM** | Resource Leak | Robot test temp directories never cleaned up | | F5 | **MEDIUM** | Test Flaw | `"? "` prompt token too broad -- false positive risk | | F6 | **MEDIUM** | Performance | Benchmark discards result, measures error path | | F7 | **LOW** | Process | PR review comments left unresolved | | F8 | **LOW** | Process | Premature `ISSUES CLOSED` in commit trailer | **Positive observations:** The commit correctly addresses 3 of 5 PR review items (teardown, deduplication, CWD cleanup). Code structure is clean, docstrings are present, and the TDD approach is sound in principle. The Behave/Robot/ASV test trifecta follows the project's established pattern. **Bottom line:** F1 should be fixed before merge -- the tests assert the wrong expected output, which undermines the core TDD value proposition.

brent.edwards commented

2026-03-04 22:05:07 +00:00

Thanks @CoreRasurae for the thorough review. All findings F1–F7 are addressed in commit 429ba8db. Here's the disposition of each:

Findings Addressed

F1 CRITICAL — Output assertions corrected ✅

You're right — these weren't "reasonable guesses", the spec provides exact expected output at docs/specification.md:1381-1402 and Workflow Example 1 at 34317-34326. Fixed:

Before	After (matches spec)
`"initialized successfully"`	`"Initialized (non-interactive)"`
`"Location:"`	`"Data Dir:"`
`"Project Initialized"`	`"Initialized"`

Updated in features/cli_init_yes_flag.feature (scenarios 2 and 4) and robot/cli_init_yes_flag.robot (test case 2).

F2 HIGH — `-y` short-form alias coverage added ✅

New Behave scenario: "-y short-form alias completes without error"
New Robot test case: "Init Short Y Flag Exits Without Error"
Shared _run_init_with_flag(context, flag) helper serves both --yes and -y steps

F3 HIGH — Mock behavioral verification added ✅

mock_service is now stored on context after invocation
New Then step: "the project service initialize_project should have been called" — calls assert_called_once()
Added to the first scenario ("agents init --yes completes without error")

F4 MEDIUM — Robot temp directory cleanup added ✅

Added [Teardown] Remove Directory ${tmpdir} recursive=True to all three Robot test cases

F5 MEDIUM — `"? "` prompt token replaced ✅

Replaced the overly broad "? " with specific "Continue? " and "Proceed? " to avoid false positives from informational messages

F6 MEDIUM — Benchmark exit code captured ✅

_invoke_with_mock now stores result.exit_code on self._last_exit_code so benchmark data can be correlated with command success/failure

F7 LOW — Prior self-review comments ✅

All five original self-review items from review #1939 are now addressed by the combination of the previous commit (435093c) and this one (429ba8db).

F8 LOW — `ISSUES CLOSED` in commit trailers

Acknowledged. The existing commits (27d8fd4, 60e78e4) cannot be safely amended since they've been pushed and the PR has active reviews. This new commit uses Refs: #536 instead. Going forward, TDD-style PRs will use Refs: rather than ISSUES CLOSED: to avoid premature closure by automation.

Verification

nox -s lint — passed
nox -s typecheck — passed (0 errors, 0 warnings)

Thanks @CoreRasurae for the thorough review. All findings F1–F7 are addressed in commit `429ba8db`. Here's the disposition of each: ## Findings Addressed ### F1 CRITICAL — Output assertions corrected ✅ You're right — these weren't "reasonable guesses", the spec provides exact expected output at `docs/specification.md:1381-1402` and Workflow Example 1 at `34317-34326`. Fixed: | Before | After (matches spec) | |--------|----------------------| | `"initialized successfully"` | `"Initialized (non-interactive)"` | | `"Location:"` | `"Data Dir:"` | | `"Project Initialized"` | `"Initialized"` | Updated in `features/cli_init_yes_flag.feature` (scenarios 2 and 4) and `robot/cli_init_yes_flag.robot` (test case 2). ### F2 HIGH — `-y` short-form alias coverage added ✅ - New Behave scenario: *"-y short-form alias completes without error"* - New Robot test case: *"Init Short Y Flag Exits Without Error"* - Shared `_run_init_with_flag(context, flag)` helper serves both `--yes` and `-y` steps ### F3 HIGH — Mock behavioral verification added ✅ - `mock_service` is now stored on `context` after invocation - New Then step: *"the project service initialize_project should have been called"* — calls `assert_called_once()` - Added to the first scenario ("agents init --yes completes without error") ### F4 MEDIUM — Robot temp directory cleanup added ✅ - Added `[Teardown] Remove Directory ${tmpdir} recursive=True` to all three Robot test cases ### F5 MEDIUM — `"? "` prompt token replaced ✅ - Replaced the overly broad `"? "` with specific `"Continue? "` and `"Proceed? "` to avoid false positives from informational messages ### F6 MEDIUM — Benchmark exit code captured ✅ - `_invoke_with_mock` now stores `result.exit_code` on `self._last_exit_code` so benchmark data can be correlated with command success/failure ### F7 LOW — Prior self-review comments ✅ All five original self-review items from review #1939 are now addressed by the combination of the previous commit (`435093c`) and this one (`429ba8db`). ### F8 LOW — `ISSUES CLOSED` in commit trailers Acknowledged. The existing commits (`27d8fd4`, `60e78e4`) cannot be safely amended since they've been pushed and the PR has active reviews. This new commit uses `Refs: #536` instead. Going forward, TDD-style PRs will use `Refs:` rather than `ISSUES CLOSED:` to avoid premature closure by automation. ## Verification - `nox -s lint` — passed - `nox -s typecheck` — passed (0 errors, 0 warnings)

brent.edwards referenced this pull request

2026-03-04 22:19:47 +00:00

test(resource): add failing tests for built-in git-checkout type bootstrap (#524) #568

brent.edwards referenced this pull request

2026-03-04 22:19:47 +00:00

test(resource): add failing tests for built-in git-checkout type bootstrap (#524) #568

brent.edwards referenced this issue from a commit

2026-03-04 22:40:18 +00:00

fix(test): address self-review findings on git-checkout bootstrap tests

brent.edwards referenced this pull request

2026-03-04 22:40:49 +00:00

test(resource): add failing tests for built-in git-checkout type bootstrap (#524) #568

brent.edwards referenced this pull request

2026-03-04 22:41:08 +00:00

test(resource): add failing tests for built-in fs-directory type bootstrap (#537) #567

CoreRasurae requested changes 2026-03-04 23:13:32 +00:00

CoreRasurae left a comment

Code Review — commit `429ba8db` (fix(test): address code review findings F1-F7)

Reviewed against: Issue #536 acceptance criteria, docs/specification.md § agents init (lines 1215–1450), and project conventions in CONTRIBUTING.md.

The commit does a solid job addressing the prior F1–F7 review findings — output strings now match the spec's Initialized (non-interactive) / Data Dir: wording, -y coverage was added, mock verification was introduced, and Robot teardowns are in place. The remaining gaps are primarily about completeness against the specification and parity between --yes and -y verification depth.

Summary of findings

ID	Severity	File(s)	Finding
F1	HIGH	`cli_init_yes_flag.feature`, `cli_init_yes_flag.robot`	Missing `Config:` and `Directories:` output assertions required by spec and issue AC #3
F2	HIGH	`cli_init_yes_flag.feature`	`-y` scenario lacks mock behavioral verification (`initialize_project` call check)
F3	HIGH	`cli_init_yes_flag.robot`	Robot "Produces Summary Output" test only checks status message, no spec output fields
F4	MEDIUM	`cli_init_yes_bench.py`	Benchmark stores exit code but never validates it — will silently report error-path latency
F5	MEDIUM	`cli_init_yes_bench.py`	No `-y` short-form benchmark variant
F6	MEDIUM	`cli_init_yes_flag_steps.py`	Prompt token `"Enter "` risks false positives on legitimate output
F7	MEDIUM	`cli_init_yes_flag.robot`	`[Teardown]` placed between action and assertion steps — functionally correct but non-idiomatic placement
F8	LOW	`cli_init_yes_flag.feature`	No `@wip` or CI filter tag to prevent known-failing TDD tests from polluting CI reports
F9	LOW	`cli_init_yes_flag.feature`	Prompt-absence step in "skips prompts" scenario is unreachable until fix is applied — inherent TDD limitation, just be aware

Detailed findings are posted as inline comments on the relevant lines below.

## Code Review — commit `429ba8db` (fix(test): address code review findings F1-F7) Reviewed against: [Issue #536](https://git.cleverthis.com/cleveragents/cleveragents-core/issues/536) acceptance criteria, `docs/specification.md` § `agents init` (lines 1215–1450), and project conventions in `CONTRIBUTING.md`. The commit does a solid job addressing the prior F1–F7 review findings — output strings now match the spec's `Initialized (non-interactive)` / `Data Dir:` wording, `-y` coverage was added, mock verification was introduced, and Robot teardowns are in place. The remaining gaps are primarily about **completeness against the specification** and **parity between `--yes` and `-y` verification depth**. --- ### Summary of findings | ID | Severity | File(s) | Finding | |:---|:---------|:--------|:--------| | F1 | **HIGH** | `cli_init_yes_flag.feature`, `cli_init_yes_flag.robot` | Missing `Config:` and `Directories:` output assertions required by spec and issue AC #3 | | F2 | **HIGH** | `cli_init_yes_flag.feature` | `-y` scenario lacks mock behavioral verification (`initialize_project` call check) | | F3 | **HIGH** | `cli_init_yes_flag.robot` | Robot "Produces Summary Output" test only checks status message, no spec output fields | | F4 | **MEDIUM** | `cli_init_yes_bench.py` | Benchmark stores exit code but never validates it — will silently report error-path latency | | F5 | **MEDIUM** | `cli_init_yes_bench.py` | No `-y` short-form benchmark variant | | F6 | **MEDIUM** | `cli_init_yes_flag_steps.py` | Prompt token `"Enter "` risks false positives on legitimate output | | F7 | **MEDIUM** | `cli_init_yes_flag.robot` | `[Teardown]` placed between action and assertion steps — functionally correct but non-idiomatic placement | | F8 | **LOW** | `cli_init_yes_flag.feature` | No `@wip` or CI filter tag to prevent known-failing TDD tests from polluting CI reports | | F9 | **LOW** | `cli_init_yes_flag.feature` | Prompt-absence step in "skips prompts" scenario is unreachable until fix is applied — inherent TDD limitation, just be aware | --- ### Detailed findings are posted as inline comments on the relevant lines below.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +59,4 @@

				    def time_init_yes_flag(self) -> None:

				        """Measure end-to-end latency of ``agents init --yes``."""

				        self._invoke_with_mock(["init", "--yes"], "bench-project")

CoreRasurae commented

2026-03-04 23:13:32 +00:00

F5 — MEDIUM: Missing -y short-form benchmark variant

The suite benchmarks --yes (long form) and --yes --path but not -y (short form). If alias resolution introduces different overhead, this would go undetected.

Suggested fix — add a -y benchmark:

def time_init_short_y_flag(self) -> None:
    """Measure end-to-end latency of ``agents init -y``."""
    self._invoke_with_mock(["init", "-y"], "bench-project")

**F5 — MEDIUM: Missing `-y` short-form benchmark variant** The suite benchmarks `--yes` (long form) and `--yes --path` but not `-y` (short form). If alias resolution introduces different overhead, this would go undetected. **Suggested fix — add a `-y` benchmark:** ```python def time_init_short_y_flag(self) -> None: """Measure end-to-end latency of ``agents init -y``.""" self._invoke_with_mock(["init", "-y"], "bench-project") ```

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +55,4 @@

				            mock_container.return_value.project_service.return_value = mock_service

				            result = self._runner.invoke(app, args)

				            self._last_exit_code = result.exit_code

CoreRasurae commented

F4 — MEDIUM: Benchmark stores exit code but never validates it

The docstring correctly notes that benchmark data is only meaningful on exit code 0, but _last_exit_code is stored and never checked. After the --yes flag is implemented, a regression that reintroduces NoSuchOption would silently cause the benchmark to report error-path latency with no signal to the developer.

Suggested fix — consider one of:

Add a track_exit_code method so ASV tracks the value as a metric:

def track_exit_code(self) -> int:
    self._invoke_with_mock(["init", "--yes"], "bench-project")
    return self._last_exit_code

Or add an assertion inside _invoke_with_mock gated by a flag (disabled while TDD-failing, enabled after fix).

**F4 — MEDIUM: Benchmark stores exit code but never validates it** The docstring correctly notes that benchmark data is only meaningful on exit code 0, but `_last_exit_code` is stored and never checked. After the `--yes` flag is implemented, a regression that reintroduces `NoSuchOption` would silently cause the benchmark to report error-path latency with no signal to the developer. **Suggested fix — consider one of:** 1. Add a `track_exit_code` method so ASV tracks the value as a metric: ```python def track_exit_code(self) -> int: self._invoke_with_mock(["init", "--yes"], "bench-project") return self._last_exit_code ``` 2. Or add an assertion inside `_invoke_with_mock` gated by a flag (disabled while TDD-failing, enabled after fix).

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +4,4 @@

				  I want to run "agents init --yes" for non-interactive initialization

				  So that I can skip interactive prompts and use sensible defaults

				  @tdd @bug522

CoreRasurae commented

F8 — LOW: No @wip or expected-failure tag for CI filtering

All scenarios use @tdd @bug522 but there is no tag that the CI runner / nox session can use to skip or expect-fail these tests. Without a filter, CI will report 4 failures on every run until #522 is fixed.

Consider adding @wip and configuring nox -s unit_tests to pass --tags='not @wip' (or equivalent), or to treat @wip failures as expected.

**F8 — LOW: No `@wip` or expected-failure tag for CI filtering** All scenarios use `@tdd @bug522` but there is no tag that the CI runner / nox session can use to skip or expect-fail these tests. Without a filter, CI will report 4 failures on every run until #522 is fixed. Consider adding `@wip` and configuring `nox -s unit_tests` to pass `--tags='not @wip'` (or equivalent), or to treat `@wip` failures as expected.

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +17,4 @@

				    When I run agents init with the --yes flag

				    Then the init command should exit with code 0

				    And the init output should contain "Initialized (non-interactive)"

				    And no interactive prompt should have been presented

CoreRasurae commented

F9 — LOW: Prompt-absence check is unreachable until fix is applied

The no interactive prompt should have been presented step (line 20) will never execute in the current TDD state because the preceding exit-code assertion at line 18 fails first (since --yes doesn't exist yet). This means the prompt-suppression logic has zero validation that the step itself works correctly until the fix arrives.

This is an inherent limitation of TDD failing tests — not something that needs fixing now, but worth being aware of when the fix PR is submitted. At that point, ensure the prompt check is manually verified to be exercising real logic.

**F9 — LOW: Prompt-absence check is unreachable until fix is applied** The `no interactive prompt should have been presented` step (line 20) will never execute in the current TDD state because the preceding exit-code assertion at line 18 fails first (since `--yes` doesn't exist yet). This means the prompt-suppression logic has zero validation that the step itself works correctly until the fix arrives. This is an inherent limitation of TDD failing tests — not something that needs fixing now, but worth being aware of when the fix PR is submitted. At that point, ensure the prompt check is manually verified to be exercising real logic.

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +24,4 @@

				    Given I have a temporary project directory for init

				    When I run agents init with the -y flag

				    Then the init command should exit with code 0

				    And the init output should contain "Initialized (non-interactive)"

CoreRasurae commented

F2 — HIGH: No mock behavioral verification for -y short-form path

The --yes scenario (line 8) includes And the project service initialize_project should have been called, verifying the mock was actually exercised. This scenario lacks that step.

Without it, the -y code path could produce correct output text through a fallback/error path without ever calling initialize_project, and the test would still pass.

Suggested fix — add mock verification:

    And the project service initialize_project should have been called

**F2 — HIGH: No mock behavioral verification for `-y` short-form path** The `--yes` scenario (line 8) includes `And the project service initialize_project should have been called`, verifying the mock was actually exercised. This scenario lacks that step. Without it, the `-y` code path could produce correct output text through a fallback/error path without ever calling `initialize_project`, and the test would still pass. **Suggested fix — add mock verification:** ```gherkin And the project service initialize_project should have been called ```

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +33,4 @@

				    Then the init command should exit with code 0

				    And the init output should contain "Data Dir:"

				    And the init output should contain "Database:"

				    And the init output should contain "Initialized"

CoreRasurae commented

2026-03-04 23:13:32 +00:00

F1 — HIGH: Missing Config: and Directories: output assertions against specification

The spec (docs/specification.md:1381-1400) defines four output fields for agents init --yes:

Spec field	Tested here?
`Data Dir:`	Yes (line 34)
`Config:`	No
`Database:`	Yes (line 35)
`Directories:`	No

The issue's acceptance criterion #3 explicitly requires: "produces the expected output (data dir, config, database, directories)". Two of the four are missing.

Suggested fix — add two more assertion steps:

    And the init output should contain "Config:"
    And the init output should contain "Directories:"

**F1 — HIGH: Missing `Config:` and `Directories:` output assertions against specification** The spec (`docs/specification.md:1381-1400`) defines four output fields for `agents init --yes`: | Spec field | Tested here? | |:-----------|:-------------| | `Data Dir:` | Yes (line 34) | | `Config:` | **No** | | `Database:` | Yes (line 35) | | `Directories:` | **No** | The issue's acceptance criterion #3 explicitly requires: *"produces the expected output (data dir, config, database, directories)"*. Two of the four are missing. **Suggested fix — add two more assertion steps:** ```gherkin And the init output should contain "Config:" And the init output should contain "Directories:" ```

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +104,4 @@

				        "[y/N]",

				        "Continue? ",

				        "Proceed? ",

				        "Enter ",

CoreRasurae commented

F6 — MEDIUM: Prompt token "Enter " risks false positives

The substring "Enter " could match legitimate non-prompt output such as "Entered configuration..." or "Enter key configured". The spec's interactive prompt pattern is Continue? [y/N]: — more precise patterns would reduce false-positive risk.

Suggested fix — narrow the token or switch to a pattern:

prompt_tokens = (
    "[Y/n]",
    "[y/N]",
    "Continue? ",
    "Proceed? ",
    "Enter a ",       # more specific than bare "Enter "
    "Confirm ",
    "(yes/no)",
)

Alternatively, a regex like r"(?:Enter|Confirm|Proceed|Continue)\s.*[:?]" would match actual prompt structures more precisely.

**F6 — MEDIUM: Prompt token `"Enter "` risks false positives** The substring `"Enter "` could match legitimate non-prompt output such as `"Entered configuration..."` or `"Enter key configured"`. The spec's interactive prompt pattern is `Continue? [y/N]:` — more precise patterns would reduce false-positive risk. **Suggested fix — narrow the token or switch to a pattern:** ```python prompt_tokens = ( "[Y/n]", "[y/N]", "Continue? ", "Proceed? ", "Enter a ", # more specific than bare "Enter " "Confirm ", "(yes/no)", ) ``` Alternatively, a regex like `r"(?:Enter|Confirm|Proceed|Continue)\s.*[:?]"` would match actual prompt structures more precisely.

robot/cli_init_yes_flag.robot Outdated

						
				@@ -0,0 +15,4 @@

				    ${tmpdir}=    Evaluate    __import__('tempfile').mkdtemp(prefix='init_yes_')

				    ${result}=    Run Process    ${PYTHON}    -m    cleveragents    init    --yes

				    ...    timeout=60s    cwd=${tmpdir}

				    [Teardown]    Remove Directory    ${tmpdir}    recursive=True

CoreRasurae commented

F7 — MEDIUM: [Teardown] placed between action and assertion steps

In Robot Framework, [Teardown] is a test setting that always runs after the full test body, regardless of where it appears in the source. Placing it between Run Process and Should Be Equal is functionally correct but visually misleading — it looks like teardown happens before the assertions.

Every Robot Framework style guide places [Teardown] as the last line of the test case. This same pattern applies to all three test cases in this file (lines 18, 27, 38).

Suggested fix: Move [Teardown] to the final line of each test case.

**F7 — MEDIUM: `[Teardown]` placed between action and assertion steps** In Robot Framework, `[Teardown]` is a test setting that always runs after the full test body, regardless of where it appears in the source. Placing it between `Run Process` and `Should Be Equal` is functionally correct but visually misleading — it looks like teardown happens before the assertions. Every Robot Framework style guide places `[Teardown]` as the last line of the test case. This same pattern applies to all three test cases in this file (lines 18, 27, 38). **Suggested fix:** Move `[Teardown]` to the final line of each test case.

robot/cli_init_yes_flag.robot Outdated

						
				@@ -0,0 +27,4 @@

				    [Teardown]    Remove Directory    ${tmpdir}    recursive=True

				    Should Be Equal As Integers    ${result.rc}    0

				    ...    msg=Expected exit code 0 but got ${result.rc}. stderr: ${result.stderr}

				    Should Contain    ${result.stdout}    Initialized (non-interactive)

CoreRasurae commented

2026-03-04 23:19:19 +00:00

F3 — HIGH: Robot "Produces Summary Output" only checks status message, not spec fields

This test verifies Initialized (non-interactive) but none of the four spec-required output fields (Data Dir:, Config:, Database:, Directories:). This makes it nearly identical to the exit-code-only test above.

Suggested fix — add output field assertions:

    Should Contain    ${result.stdout}    Data Dir:
    ...    msg=Output should contain Data Dir field per spec
    Should Contain    ${result.stdout}    Config:
    ...    msg=Output should contain Config field per spec
    Should Contain    ${result.stdout}    Database:
    ...    msg=Output should contain Database field per spec
    Should Contain    ${result.stdout}    Directories:
    ...    msg=Output should contain Directories field per spec

**F3 — HIGH: Robot "Produces Summary Output" only checks status message, not spec fields** This test verifies `Initialized (non-interactive)` but none of the four spec-required output fields (`Data Dir:`, `Config:`, `Database:`, `Directories:`). This makes it nearly identical to the exit-code-only test above. **Suggested fix — add output field assertions:** ```robot Should Contain ${result.stdout} Data Dir: ... msg=Output should contain Data Dir field per spec Should Contain ${result.stdout} Config: ... msg=Output should contain Config field per spec Should Contain ${result.stdout} Database: ... msg=Output should contain Database field per spec Should Contain ${result.stdout} Directories: ... msg=Output should contain Directories field per spec ```

brent.edwards commented

Thanks @CoreRasurae for the second round. All 9 findings addressed in commit 6428971c.

Disposition of findings (review #1945)

F1 HIGH — Missing `Config:` and `Directories:` output assertions

Fixed. Added And the init output should contain "Config:" and And the init output should contain "Directories:" to the "Output includes expected initialization summary" scenario (lines 36-38). All four spec-required fields (Data Dir:, Config:, Database:, Directories:) from docs/specification.md:1381-1400 are now asserted.

F2 HIGH — No mock behavioral verification for `-y` scenario

Fixed. Added And the project service initialize_project should have been called to the -y short-form scenario, giving it the same behavioral verification as the --yes scenario.

F3 HIGH — Robot summary test only checks status message

Fixed. Added Should Contain assertions for all four spec fields (Data Dir:, Config:, Database:, Directories:) to the "Init Yes Flag Produces Summary Output" Robot test case.

F4 MEDIUM — Benchmark stores exit code but never validates

Fixed. Added track_exit_code() method that invokes the CLI and returns self._last_exit_code. ASV will track this as a metric, making exit code regressions visible in benchmark history without needing an in-benchmark assertion that would conflict with the TDD-failing state.

F5 MEDIUM — Missing `-y` short-form benchmark variant

Fixed. Added time_init_short_y_flag() benchmark method that exercises ["init", "-y"], covering the alias resolution path.

F6 MEDIUM — `"Enter "` prompt token too broad

Fixed. Narrowed from "Enter " to "Enter a " to avoid matching legitimate output like "Entered configuration...". The remaining tokens ("Continue? ", "Proceed? ", "Confirm ") are already specific to prompt contexts.

F7 MEDIUM — `[Teardown]` placement non-idiomatic

Fixed. Moved [Teardown] to the last line of all three Robot test cases. While functionally equivalent (Robot always runs teardown after the test body regardless of source position), this follows the conventional placement that every Robot Framework style guide recommends.

F8 LOW — No `@wip` tag for CI filtering

Fixed. Added @wip to all four scenarios alongside the existing @tdd @bug522 tags. CI can now use --tags='not @wip' to exclude these known-failing tests from polluting reports.

F9 LOW — Prompt-absence step unreachable until fix

Acknowledged, no change. This is an inherent limitation of TDD failing tests — the exit-code assertion fails first, so the prompt-suppression step never executes. When the #522 fix PR is submitted, the fix author should verify that the prompt check exercises real logic.

Verification

nox -s lint — passed
nox -s typecheck — passed (0 errors, 0 warnings, 0 informations)

Thanks @CoreRasurae for the second round. All 9 findings addressed in commit `6428971c`. ## Disposition of findings (review #1945) ### F1 HIGH — Missing `Config:` and `Directories:` output assertions **Fixed.** Added `And the init output should contain "Config:"` and `And the init output should contain "Directories:"` to the "Output includes expected initialization summary" scenario (lines 36-38). All four spec-required fields (`Data Dir:`, `Config:`, `Database:`, `Directories:`) from `docs/specification.md:1381-1400` are now asserted. ### F2 HIGH — No mock behavioral verification for `-y` scenario **Fixed.** Added `And the project service initialize_project should have been called` to the `-y` short-form scenario, giving it the same behavioral verification as the `--yes` scenario. ### F3 HIGH — Robot summary test only checks status message **Fixed.** Added `Should Contain` assertions for all four spec fields (`Data Dir:`, `Config:`, `Database:`, `Directories:`) to the "Init Yes Flag Produces Summary Output" Robot test case. ### F4 MEDIUM — Benchmark stores exit code but never validates **Fixed.** Added `track_exit_code()` method that invokes the CLI and returns `self._last_exit_code`. ASV will track this as a metric, making exit code regressions visible in benchmark history without needing an in-benchmark assertion that would conflict with the TDD-failing state. ### F5 MEDIUM — Missing `-y` short-form benchmark variant **Fixed.** Added `time_init_short_y_flag()` benchmark method that exercises `["init", "-y"]`, covering the alias resolution path. ### F6 MEDIUM — `"Enter "` prompt token too broad **Fixed.** Narrowed from `"Enter "` to `"Enter a "` to avoid matching legitimate output like `"Entered configuration..."`. The remaining tokens (`"Continue? "`, `"Proceed? "`, `"Confirm "`) are already specific to prompt contexts. ### F7 MEDIUM — `[Teardown]` placement non-idiomatic **Fixed.** Moved `[Teardown]` to the last line of all three Robot test cases. While functionally equivalent (Robot always runs teardown after the test body regardless of source position), this follows the conventional placement that every Robot Framework style guide recommends. ### F8 LOW — No `@wip` tag for CI filtering **Fixed.** Added `@wip` to all four scenarios alongside the existing `@tdd @bug522` tags. CI can now use `--tags='not @wip'` to exclude these known-failing tests from polluting reports. ### F9 LOW — Prompt-absence step unreachable until fix **Acknowledged, no change.** This is an inherent limitation of TDD failing tests — the exit-code assertion fails first, so the prompt-suppression step never executes. When the #522 fix PR is submitted, the fix author should verify that the prompt check exercises real logic. ## Verification - `nox -s lint` — passed - `nox -s typecheck` — passed (0 errors, 0 warnings, 0 informations)

brent.edwards referenced this pull request

2026-03-04 23:47:46 +00:00

test(resource): add failing tests for built-in fs-directory type bootstrap (#537) #567

CoreRasurae reviewed 2026-03-04 23:53:58 +00:00

CoreRasurae left a comment

Code Review -- Commit `6428971c` (F1-F9 review fixes)

Reviewed the latest commit by Brent against issue #536 acceptance criteria, the specification at docs/specification.md (lines 1215-1450, 34303-34327), and codebase conventions. The commit competently addresses the prior CoreRasurae review findings. Six new findings are noted below -- one medium, four low, one informational.

Summary

ID	Category	Severity	File	Line(s)
F1	Bug	Medium	`benchmarks/cli_init_yes_bench.py`	74-83
F2	Bug	Low	`benchmarks/cli_init_yes_bench.py`	32-34
F3	Test Flaw	Low	`features/steps/cli_init_yes_flag_steps.py`	107
F4	Test Coverage	Low	`features/cli_init_yes_flag.feature`	35-38
F5	Test Flaw	Low	`features/steps/cli_init_yes_flag_steps.py`	28-33
F6	Informational	Info	`features/steps/cli_init_yes_flag_steps.py`	40-47

Details are provided as inline comments on the relevant lines.

## Code Review -- Commit `6428971c` (F1-F9 review fixes) Reviewed the latest commit by Brent against issue #536 acceptance criteria, the specification at `docs/specification.md` (lines 1215-1450, 34303-34327), and codebase conventions. The commit competently addresses the prior CoreRasurae review findings. Six new findings are noted below -- one medium, four low, one informational. ### Summary | ID | Category | Severity | File | Line(s) | |:---|:---------|:---------|:-----|:--------| | F1 | Bug | **Medium** | `benchmarks/cli_init_yes_bench.py` | 74-83 | | F2 | Bug | Low | `benchmarks/cli_init_yes_bench.py` | 32-34 | | F3 | Test Flaw | Low | `features/steps/cli_init_yes_flag_steps.py` | 107 | | F4 | Test Coverage | Low | `features/cli_init_yes_flag.feature` | 35-38 | | F5 | Test Flaw | Low | `features/steps/cli_init_yes_flag_steps.py` | 28-33 | | F6 | Informational | Info | `features/steps/cli_init_yes_flag_steps.py` | 40-47 | Details are provided as inline comments on the relevant lines.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +29,4 @@

				    timeout = 30.0

				    def setup(self) -> None:

CoreRasurae commented

F2 -- Bug (Low): _last_exit_code not defensively initialized in setup()

self._last_exit_code is only assigned inside _invoke_with_mock() (line 58). While track_exit_code() calls _invoke_with_mock() first so under normal execution it works, if the mock setup or runner.invoke() raises an exception before line 58, the subsequent return self._last_exit_code will raise AttributeError rather than a meaningful error.

Consider initializing it here:

def setup(self) -> None:
    self._runner = CliRunner()
    self._tmpdir = tempfile.mkdtemp()
    self._last_exit_code = -1

This follows the defensive pattern seen in other benchmark suites and ensures ASV always gets a numeric value.

**F2 -- Bug (Low): `_last_exit_code` not defensively initialized in `setup()`** `self._last_exit_code` is only assigned inside `_invoke_with_mock()` (line 58). While `track_exit_code()` calls `_invoke_with_mock()` first so under normal execution it works, if the mock setup or `runner.invoke()` raises an exception before line 58, the subsequent `return self._last_exit_code` will raise `AttributeError` rather than a meaningful error. Consider initializing it here: ```python def setup(self) -> None: self._runner = CliRunner() self._tmpdir = tempfile.mkdtemp() self._last_exit_code = -1 ``` This follows the defensive pattern seen in other benchmark suites and ensures ASV always gets a numeric value.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +71,4 @@

				            ["init", "--yes", "--path", self._tmpdir], "bench-path-project"

				        )

				    def track_exit_code(self) -> int:

CoreRasurae commented

F1 -- Bug (Medium): Missing ASV .unit metadata on track_exit_code()

Every track_* method across all benchmark files in this codebase declares a .unit class attribute for ASV metric labeling. Examples:

bench_unit_tests.py:90 -- StepModuleDiscoverySuite.track_step_file_count.unit = "files"
bench_unit_tests.py:200 -- TrackTestSuiteMetrics.track_scenario_line_count.unit = "scenarios"
bench_coverage_report.py:205 -- CoverageFileSuite.track_source_file_count.unit = "files"
bench_subprocess_overhead.py:55 -- SubprocessCountSuite.track_subprocess_count.unit = "subprocesses"

This new track_exit_code() is missing the attribute. Without it, ASV will display the metric with no unit label in its HTML reports and JSON results. Add after the class body:

InitYesFlagSuite.track_exit_code.unit = "exit_code"

**F1 -- Bug (Medium): Missing ASV `.unit` metadata on `track_exit_code()`** Every `track_*` method across all benchmark files in this codebase declares a `.unit` class attribute for ASV metric labeling. Examples: - `bench_unit_tests.py:90` -- `StepModuleDiscoverySuite.track_step_file_count.unit = "files"` - `bench_unit_tests.py:200` -- `TrackTestSuiteMetrics.track_scenario_line_count.unit = "scenarios"` - `bench_coverage_report.py:205` -- `CoverageFileSuite.track_source_file_count.unit = "files"` - `bench_subprocess_overhead.py:55` -- `SubprocessCountSuite.track_subprocess_count.unit = "subprocesses"` This new `track_exit_code()` is missing the attribute. Without it, ASV will display the metric with no unit label in its HTML reports and JSON results. Add after the class body: ```python InitYesFlagSuite.track_exit_code.unit = "exit_code" ```

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +32,4 @@

				    Given I have a temporary project directory for init

				    When I run agents init with the --yes flag

				    Then the init command should exit with code 0

				    And the init output should contain "Data Dir:"

CoreRasurae commented

F4 -- Test Coverage (Low): Spec-required output value formats not verified

The spec at docs/specification.md:1381-1388 and the Workflow Example at line 34319-34326 define specific value content for each field:

Field	Spec value	Tested?
`Data Dir:`	`.../.cleveragents (created)`	Key only
`Config:`	`.../.cleveragents/config.toml`	Key only
`Database:`	`initialized (schema v3)`	Key only
`Directories:`	`logs, cache, sessions, contexts`	Key only

All four assertions check only that the label exists, not the value structure. The tests would pass even if the output printed Database: ERROR or Directories: with no value. Consider adding at least one assertion that spot-checks a value format, e.g.:

And the init output should contain "initialized (schema v3)"

**F4 -- Test Coverage (Low): Spec-required output value formats not verified** The spec at `docs/specification.md:1381-1388` and the Workflow Example at line 34319-34326 define specific value content for each field: | Field | Spec value | Tested? | |:------|:-----------|:--------| | `Data Dir:` | `.../.cleveragents (created)` | Key only | | `Config:` | `.../.cleveragents/config.toml` | Key only | | `Database:` | `initialized (schema v3)` | Key only | | `Directories:` | `logs, cache, sessions, contexts` | Key only | All four assertions check only that the label exists, not the value structure. The tests would pass even if the output printed `Database: ERROR` or `Directories:` with no value. Consider adding at least one assertion that spot-checks a value format, e.g.: ```gherkin And the init output should contain "initialized (schema v3)" ```

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +29,4 @@

				    """Create a temporary directory and store it on *context*."""

				    context.temp_dir = tempfile.mkdtemp()

				    context.original_cwd = Path.cwd()

				    os.chdir(context.temp_dir)

CoreRasurae commented

F5 -- Test Flaw (Low): Missing CLEVERAGENTS_HOME environment isolation

The Robot tests properly isolate via CLEVERAGENTS_HOME set in common.resource:23. This Behave step relies on os.chdir(context.temp_dir) but never sets CLEVERAGENTS_HOME.

Currently mitigated because get_container() is mocked at line 40, so the real init path is never exercised. However, when the #522 bug fix is applied and mocks are potentially adjusted, if any code path resolves paths via CLEVERAGENTS_HOME instead of CWD, the test could leak into the developer's real home directory. The before_all() hook in features/environment.py does not set CLEVERAGENTS_HOME either.

Consider adding:

os.environ["CLEVERAGENTS_HOME"] = context.temp_dir

and restoring it during cleanup, matching the Robot test isolation pattern.

**F5 -- Test Flaw (Low): Missing `CLEVERAGENTS_HOME` environment isolation** The Robot tests properly isolate via `CLEVERAGENTS_HOME` set in `common.resource:23`. This Behave step relies on `os.chdir(context.temp_dir)` but never sets `CLEVERAGENTS_HOME`. Currently mitigated because `get_container()` is mocked at line 40, so the real init path is never exercised. However, when the #522 bug fix is applied and mocks are potentially adjusted, if any code path resolves paths via `CLEVERAGENTS_HOME` instead of CWD, the test could leak into the developer's real home directory. The `before_all()` hook in `features/environment.py` does not set `CLEVERAGENTS_HOME` either. Consider adding: ```python os.environ["CLEVERAGENTS_HOME"] = context.temp_dir ``` and restoring it during cleanup, matching the Robot test isolation pattern.

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +37,4 @@

				    """Invoke ``agents init`` with the given flag via the Typer test runner."""

				    runner = CliRunner()

				    with patch("cleveragents.application.container.get_container") as mock_container:

CoreRasurae commented

F6 -- Informational: Spec vs. implementation alignment risk in mock target

The spec at docs/specification.md:1219-1224 describes agents init as a global environment reset: "Initialize or reset the global CleverAgents environment. This wipes any existing data and re-creates the global config and database."

These steps mock project_service.initialize_project(), which aligns with the current implementation at src/cleveragents/cli/main.py:351-415 where init delegates to project_init_command(). However, the spec semantics (global environment reset) differ from the implementation semantics (project initialization).

If the #522 fix remodels init to match the spec's global-reset semantics (wipe data, re-create config/database, create directories), the mock target would need to change. This is inherent to the TDD approach but worth noting so the fix implementer is aware the mock strategy may need adaptation.

**F6 -- Informational: Spec vs. implementation alignment risk in mock target** The spec at `docs/specification.md:1219-1224` describes `agents init` as a **global environment reset**: *"Initialize or reset the global CleverAgents environment. This wipes any existing data and re-creates the global config and database."* These steps mock `project_service.initialize_project()`, which aligns with the current implementation at `src/cleveragents/cli/main.py:351-415` where `init` delegates to `project_init_command()`. However, the spec semantics (global environment reset) differ from the implementation semantics (project initialization). If the #522 fix remodels `init` to match the spec's global-reset semantics (wipe data, re-create config/database, create directories), the mock target would need to change. This is inherent to the TDD approach but worth noting so the fix implementer is aware the mock strategy may need adaptation.

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +104,4 @@

				        "[y/N]",

				        "Continue? ",

				        "Proceed? ",

				        "Enter a ",

CoreRasurae commented

2026-03-05 00:00:39 +00:00

F3 -- Test Flaw (Low): Prompt token narrowing reduces negative-assertion coverage

Changing from "Enter " to "Enter a " addresses false-positive risk on legitimate output (e.g., "Enterprise"), but it now misses realistic interactive prompt patterns like:

"Enter project name: "
"Enter path: "
"Enter value: "

A more targeted approach -- such as a regex like r"Enter\s+\S+.*[:\?]" or checking for "Enter " only at line boundaries -- would preserve detection coverage without the false-positive risk. As-is, the negative assertion has a blind spot for a common prompt pattern.

**F3 -- Test Flaw (Low): Prompt token narrowing reduces negative-assertion coverage** Changing from `"Enter "` to `"Enter a "` addresses false-positive risk on legitimate output (e.g., "Enterprise"), but it now misses realistic interactive prompt patterns like: - `"Enter project name: "` - `"Enter path: "` - `"Enter value: "` A more targeted approach -- such as a regex like `r"Enter\s+\S+.*[:\?]"` or checking for `"Enter "` only at line boundaries -- would preserve detection coverage without the false-positive risk. As-is, the negative assertion has a blind spot for a common prompt pattern.

brent.edwards commented

Thanks @CoreRasurae — all 6 findings addressed in commit 100d5574.

Disposition of findings (review #1948)

F1 MEDIUM — Missing `.unit` metadata on `track_exit_code()`

Fixed. Added InitYesFlagSuite.track_exit_code.unit = "exit_code" after the class body, matching the convention used by every other track_* method in the benchmark suite (bench_unit_tests.py, bench_coverage_report.py, bench_subprocess_overhead.py, etc.).

F2 LOW — `_last_exit_code` not defensively initialized

Fixed. Added self._last_exit_code = -1 in setup(). If _invoke_with_mock() raises before the assignment, ASV will now get -1 instead of an AttributeError.

F3 LOW — Prompt token narrowing reduces coverage

Fixed. Replaced the "Enter a " substring check with a regex: r"Enter\s+\S+.*[:\?]". This catches realistic prompt patterns like "Enter project name:", "Enter path:", "Enter value?" without false-positiving on "Entered configuration" or "Enterprise". The regex is applied as a separate re.search() after the substring token loop so its match group can be included in the assertion message.

F4 LOW — Spec output value formats not verified

Fixed. Added And the init output should contain "logs, cache, sessions, contexts" to the output summary scenario, spot-checking the Directories: field value per docs/specification.md:34323. This ensures the output isn't just printing labels with empty/wrong values.

F5 LOW — Missing `CLEVERAGENTS_HOME` environment isolation

Fixed. The Given step now sets os.environ["CLEVERAGENTS_HOME"] = context.temp_dir and the cleanup restores the original value (or removes it if it wasn't set). This matches the Robot test isolation pattern in common.resource:23 and prevents any code path that resolves via CLEVERAGENTS_HOME from leaking into the developer's real home directory.

F6 INFO — Mock target alignment risk

Acknowledged, no change. The mock targets project_service.initialize_project() which aligns with the current init implementation at main.py:351-415. If the #522 fix remodels init to match the spec's global-reset semantics, the mock strategy will need adaptation — noted for the fix implementer.

Verification

nox -s lint — passed
nox -s typecheck — passed (0 errors, 0 warnings, 0 informations)

Thanks @CoreRasurae — all 6 findings addressed in commit `100d5574`. ## Disposition of findings (review #1948) ### F1 MEDIUM — Missing `.unit` metadata on `track_exit_code()` **Fixed.** Added `InitYesFlagSuite.track_exit_code.unit = "exit_code"` after the class body, matching the convention used by every other `track_*` method in the benchmark suite (`bench_unit_tests.py`, `bench_coverage_report.py`, `bench_subprocess_overhead.py`, etc.). ### F2 LOW — `_last_exit_code` not defensively initialized **Fixed.** Added `self._last_exit_code = -1` in `setup()`. If `_invoke_with_mock()` raises before the assignment, ASV will now get `-1` instead of an `AttributeError`. ### F3 LOW — Prompt token narrowing reduces coverage **Fixed.** Replaced the `"Enter a "` substring check with a regex: `r"Enter\s+\S+.*[:\?]"`. This catches realistic prompt patterns like `"Enter project name:"`, `"Enter path:"`, `"Enter value?"` without false-positiving on `"Entered configuration"` or `"Enterprise"`. The regex is applied as a separate `re.search()` after the substring token loop so its match group can be included in the assertion message. ### F4 LOW — Spec output value formats not verified **Fixed.** Added `And the init output should contain "logs, cache, sessions, contexts"` to the output summary scenario, spot-checking the `Directories:` field value per `docs/specification.md:34323`. This ensures the output isn't just printing labels with empty/wrong values. ### F5 LOW — Missing `CLEVERAGENTS_HOME` environment isolation **Fixed.** The Given step now sets `os.environ["CLEVERAGENTS_HOME"] = context.temp_dir` and the cleanup restores the original value (or removes it if it wasn't set). This matches the Robot test isolation pattern in `common.resource:23` and prevents any code path that resolves via `CLEVERAGENTS_HOME` from leaking into the developer's real home directory. ### F6 INFO — Mock target alignment risk **Acknowledged, no change.** The mock targets `project_service.initialize_project()` which aligns with the current `init` implementation at `main.py:351-415`. If the #522 fix remodels init to match the spec's global-reset semantics, the mock strategy will need adaptation — noted for the fix implementer. ## Verification - `nox -s lint` — passed - `nox -s typecheck` — passed (0 errors, 0 warnings, 0 informations)

freemo added the

labels 2026-03-05 03:28:33 +00:00

CoreRasurae was assigned by freemo

2026-03-05 03:32:37 +00:00

freemo reviewed 2026-03-05 03:35:39 +00:00

freemo left a comment

PM Review — Day 25

Status

Mergeable — no conflicts.
Active review cycle: 4 rounds of back-and-forth between @brent.edwards and @CoreRasurae. Initial REQUEST_CHANGES (9 findings) is now stale after multiple fix commits. Follow-up COMMENT review raised 6 more findings.
Brent has addressed findings across commits 429ba8db, 6428971c, and 100d5574.

Action Item

@CoreRasurae: Please do a final re-review of the current state and submit a formal APPROVED verdict if satisfied. The REQUEST_CHANGES is stale and blocking merge.

Notes

This is a TDD test PR for #536. The tests are intentionally failing (by design) — they validate the agents init --yes missing option bug.
Empty PR body — acceptable for TDD test PRs where the self-review comment serves as documentation.

## PM Review — Day 25 ### Status - **Mergeable** — no conflicts. - **Active review cycle**: 4 rounds of back-and-forth between @brent.edwards and @CoreRasurae. Initial REQUEST_CHANGES (9 findings) is now stale after multiple fix commits. Follow-up COMMENT review raised 6 more findings. - Brent has addressed findings across commits `429ba8db`, `6428971c`, and `100d5574`. ### Action Item **@CoreRasurae**: Please do a final re-review of the current state and submit a formal APPROVED verdict if satisfied. The REQUEST_CHANGES is stale and blocking merge. ### Notes - This is a TDD test PR for #536. The tests are intentionally failing (by design) — they validate the `agents init --yes` missing option bug. - Empty PR body — acceptable for TDD test PRs where the self-review comment serves as documentation.

CoreRasurae reviewed 2026-03-05 10:46:54 +00:00

CoreRasurae left a comment

Code Review — Commit `100d5574` (fix(test): address CoreRasurae review #1948 findings F1-F6)

Reviewed against: Issue #536 acceptance criteria, docs/specification.md (lines 1215-1450, 34317-34326), and project test conventions.

Scope: The commit correctly addresses the F1-F6 review findings from CoreRasurae review #1948. The CLEVERAGENTS_HOME env var isolation, the regex improvement for prompt detection, the spec-value assertion for "logs, cache, sessions, contexts" (grounded in specification.md:1385 and specification.md:34323), and the ASV .unit metadata all follow established project conventions.

Findings Summary

ID	Severity	Category	File	Description
F1	MEDIUM	Test Flaw	`cli_init_yes_flag_steps.py:123`	Greedy `.*` regex may over-match in prompt detection
F2	MEDIUM	Test Flaw	`cli_init_yes_flag_steps.py:48-54`	Mock project lacks spec-required fields for when #522 fix lands
F3	MEDIUM	Test Design	`cli_init_yes_bench.py:69-73`	Benchmark tests `--path` flag not present in spec for `agents init`
F4	MEDIUM	Test Flaw	`cli_init_yes_flag_steps.py:36`	`context.original_cwd` type inconsistency with `environment.py`
F5	LOW	Style	`cli_init_yes_flag_steps.py:123`	Redundant `\?` escape inside character class
F6	LOW	Coverage	`cli_init_yes_flag.feature`	No negative scenario for interactive mode (without `--yes`)
F7	INFO	Positive	`cli_init_yes_flag.robot`	Robot test environment isolation is correct
F8	INFO	Positive	`cli_init_yes_bench.py`	ASV conventions properly followed

Detail on each finding

F1 — MEDIUM | Greedy `.*` regex may over-match (inline comment below)

The regex r"Enter\s+\S+.*[:\?]" uses greedy .* which consumes the entire remainder of the line up to the last : or ?. If the init output ever contains a line like "Enter defaults mode: ok — Config: /path/config.toml", the match would extend all the way to the final :, producing a misleading assertion failure message.

Recommendation:

enter_prompt = re.search(r"Enter\s+\S+.*?[:?]", output)

Non-greedy .*? stops at the first : or ?, matching only the prompt itself. This also drops the redundant \? escape (F5).

F2 — MEDIUM | Mock project lacks spec-required fields (inline comment below)

The mock sets only .name and .path, but specification.md:1381-1386 defines four output fields for init --yes:

Data Dir: → needs a data directory path
Config: → needs a config file path
Database: → needs "initialized (schema v3)"
Directories: → needs "logs, cache, sessions, contexts"

When the #522 fix remodels init_command() to produce this output, the MagicMock will return sub-mock objects for unset attributes, stringifying as <MagicMock id='...'>. Output assertion tests would fail for the wrong reason (mock incompleteness, not a real bug). The same issue applies identically to the benchmark mock (cli_init_yes_bench.py:51-56).

The commit's F6 comment acknowledges the risk, but no future-proofing is applied.

Recommendation: Pre-populate mock with spec-required fields:

mock_project.data_dir = Path(context.temp_dir)
mock_project.config_path = Path(context.temp_dir) / "config.toml"
mock_project.database_status = "initialized (schema v3)"
mock_project.directories = ["logs", "cache", "sessions", "contexts"]

F3 — MEDIUM | Benchmark tests undocumented `--path` flag (inline comment below)

The spec defines agents init [--yes|-y] (specification.md:1217) with no --path option. The current implementation has --path because it treats init as a project-scoped operation, but the spec's init is a global environment reset (specification.md:1219-1224: "wipes all existing data and re-creates the global config and database"). When #522 aligns the init command with the spec, this flag may be removed.

Recommendation: Add a comment documenting that --path is an implementation detail that may not survive the #522 refactor, or remove this benchmark.

F4 — MEDIUM | `context.original_cwd` type inconsistency (inline comment below)

before_scenario in environment.py:222 stores os.getcwd() (returns str). The Given step overwrites it with Path.cwd() (returns Path). While os.chdir() accepts both types, this creates:

A type inconsistency — after_scenario restores using the Path value while the rest of the framework expects str.
An attribute collision — the step silently overwrites the framework's stored CWD.

Recommendation: Use a step-private attribute name and match the framework's type:

context._init_original_cwd = os.getcwd()

F5 — LOW | Redundant `\?` escape

[:\?] — inside a character class [], ? is a literal and needs no escape. Should be [:?]. Covered by the F1 fix.

F6 — LOW | No negative scenario for interactive mode

The feature file only tests the --yes / -y non-interactive path. No scenario verifies that agents init without --yes presents a confirmation prompt (the interactive path per specification.md:1237-1238). Not part of the #536 acceptance criteria, but would strengthen the test contract.

Overall Assessment

The commit is well-structured and addresses the prior review findings correctly. The four MEDIUM findings (F1-F4) are not blocking for this TDD branch in its current state (tests are expected to fail), but F1 and F2 should be addressed before or alongside the #522 fix to ensure the tests pass correctly once the flag is implemented.

## Code Review — Commit `100d5574` (fix(test): address CoreRasurae review #1948 findings F1-F6) **Reviewed against:** Issue #536 acceptance criteria, `docs/specification.md` (lines 1215-1450, 34317-34326), and project test conventions. **Scope:** The commit correctly addresses the F1-F6 review findings from CoreRasurae review #1948. The `CLEVERAGENTS_HOME` env var isolation, the regex improvement for prompt detection, the spec-value assertion for `"logs, cache, sessions, contexts"` (grounded in `specification.md:1385` and `specification.md:34323`), and the ASV `.unit` metadata all follow established project conventions. --- ### Findings Summary | ID | Severity | Category | File | Description | |---|---|---|---|---| | **F1** | MEDIUM | Test Flaw | `cli_init_yes_flag_steps.py:123` | Greedy `.*` regex may over-match in prompt detection | | **F2** | MEDIUM | Test Flaw | `cli_init_yes_flag_steps.py:48-54` | Mock project lacks spec-required fields for when #522 fix lands | | **F3** | MEDIUM | Test Design | `cli_init_yes_bench.py:69-73` | Benchmark tests `--path` flag not present in spec for `agents init` | | **F4** | MEDIUM | Test Flaw | `cli_init_yes_flag_steps.py:36` | `context.original_cwd` type inconsistency with `environment.py` | | **F5** | LOW | Style | `cli_init_yes_flag_steps.py:123` | Redundant `\?` escape inside character class | | **F6** | LOW | Coverage | `cli_init_yes_flag.feature` | No negative scenario for interactive mode (without `--yes`) | | **F7** | INFO | Positive | `cli_init_yes_flag.robot` | Robot test environment isolation is correct | | **F8** | INFO | Positive | `cli_init_yes_bench.py` | ASV conventions properly followed | --- ### Detail on each finding #### F1 — MEDIUM | Greedy `.*` regex may over-match (inline comment below) The regex `r"Enter\s+\S+.*[:\?]"` uses greedy `.*` which consumes the entire remainder of the line up to the **last** `:` or `?`. If the init output ever contains a line like `"Enter defaults mode: ok — Config: /path/config.toml"`, the match would extend all the way to the final `:`, producing a misleading assertion failure message. **Recommendation:** ```python enter_prompt = re.search(r"Enter\s+\S+.*?[:?]", output) ``` Non-greedy `.*?` stops at the first `:` or `?`, matching only the prompt itself. This also drops the redundant `\?` escape (F5). #### F2 — MEDIUM | Mock project lacks spec-required fields (inline comment below) The mock sets only `.name` and `.path`, but `specification.md:1381-1386` defines four output fields for `init --yes`: - `Data Dir:` → needs a data directory path - `Config:` → needs a config file path - `Database:` → needs `"initialized (schema v3)"` - `Directories:` → needs `"logs, cache, sessions, contexts"` When the #522 fix remodels `init_command()` to produce this output, the `MagicMock` will return sub-mock objects for unset attributes, stringifying as `<MagicMock id='...'>`. Output assertion tests would fail for the **wrong reason** (mock incompleteness, not a real bug). The same issue applies identically to the benchmark mock (`cli_init_yes_bench.py:51-56`). The commit's F6 comment acknowledges the risk, but no future-proofing is applied. **Recommendation:** Pre-populate mock with spec-required fields: ```python mock_project.data_dir = Path(context.temp_dir) mock_project.config_path = Path(context.temp_dir) / "config.toml" mock_project.database_status = "initialized (schema v3)" mock_project.directories = ["logs", "cache", "sessions", "contexts"] ``` #### F3 — MEDIUM | Benchmark tests undocumented `--path` flag (inline comment below) The spec defines `agents init [--yes|-y]` (`specification.md:1217`) with **no** `--path` option. The current implementation has `--path` because it treats init as a project-scoped operation, but the spec's `init` is a **global environment** reset (`specification.md:1219-1224`: "wipes all existing data and re-creates the global config and database"). When #522 aligns the init command with the spec, this flag may be removed. **Recommendation:** Add a comment documenting that `--path` is an implementation detail that may not survive the #522 refactor, or remove this benchmark. #### F4 — MEDIUM | `context.original_cwd` type inconsistency (inline comment below) `before_scenario` in `environment.py:222` stores `os.getcwd()` (returns `str`). The Given step overwrites it with `Path.cwd()` (returns `Path`). While `os.chdir()` accepts both types, this creates: 1. A **type inconsistency** — `after_scenario` restores using the `Path` value while the rest of the framework expects `str`. 2. An **attribute collision** — the step silently overwrites the framework's stored CWD. **Recommendation:** Use a step-private attribute name and match the framework's type: ```python context._init_original_cwd = os.getcwd() ``` #### F5 — LOW | Redundant `\?` escape `[:\?]` — inside a character class `[]`, `?` is a literal and needs no escape. Should be `[:?]`. Covered by the F1 fix. #### F6 — LOW | No negative scenario for interactive mode The feature file only tests the `--yes` / `-y` non-interactive path. No scenario verifies that `agents init` **without** `--yes` presents a confirmation prompt (the interactive path per `specification.md:1237-1238`). Not part of the #536 acceptance criteria, but would strengthen the test contract. --- ### Overall Assessment The commit is well-structured and addresses the prior review findings correctly. The four MEDIUM findings (F1-F4) are not blocking for this TDD branch in its current state (tests are expected to fail), but **F1 and F2 should be addressed before or alongside the #522 fix** to ensure the tests pass correctly once the flag is implemented.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +68,4 @@

				    def time_init_yes_flag_with_path(self) -> None:

				        """Measure latency of ``agents init --yes --path <dir>``."""

				        self._invoke_with_mock(

CoreRasurae commented

F3 (MEDIUM): The spec defines agents init [--yes|-y] with no --path option (specification.md:1217). The spec's init is a global environment reset, not a project init. When #522 aligns the implementation with the spec, --path may be removed, making this benchmark invalid.

Suggestion: Add a comment noting --path is an implementation detail subject to change with #522, or remove this benchmark.

**F3 (MEDIUM):** The spec defines `agents init [--yes|-y]` with **no** `--path` option (`specification.md:1217`). The spec's `init` is a global environment reset, not a project init. When #522 aligns the implementation with the spec, `--path` may be removed, making this benchmark invalid. **Suggestion:** Add a comment noting `--path` is an implementation detail subject to change with #522, or remove this benchmark.

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +33,4 @@

				def step_temp_project_directory(context):

				    """Create a temporary directory and store it on *context*."""

				    context.temp_dir = tempfile.mkdtemp()

				    context.original_cwd = Path.cwd()

CoreRasurae commented

F4 (MEDIUM): This overwrites context.original_cwd set by environment.py:222 (os.getcwd() → str) with a Path object, creating a type inconsistency. It also silently collides with the framework attribute.

Suggestion: Use a step-private name and match the framework's type:

context._init_original_cwd = os.getcwd()

**F4 (MEDIUM):** This overwrites `context.original_cwd` set by `environment.py:222` (`os.getcwd()` → `str`) with a `Path` object, creating a type inconsistency. It also silently collides with the framework attribute. **Suggestion:** Use a step-private name and match the framework's type: ```python context._init_original_cwd = os.getcwd() ```

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +47,4 @@

				    with patch("cleveragents.application.container.get_container") as mock_container:

				        mock_service = MagicMock()

				        mock_project = MagicMock()

				        mock_project.name = Path(context.temp_dir).name

CoreRasurae commented

F2 (MEDIUM): The mock only sets .name and .path, but specification.md:1381-1386 requires the init output to include Data Dir:, Config:, Database:, and Directories: fields. When the #522 fix remodels init_command(), unset MagicMock attributes will stringify as <MagicMock id='...'>, causing output assertions to fail for the wrong reason.

Suggestion: Pre-populate spec-required fields:

mock_project.data_dir = Path(context.temp_dir)
mock_project.config_path = Path(context.temp_dir) / "config.toml"
mock_project.database_status = "initialized (schema v3)"
mock_project.directories = ["logs", "cache", "sessions", "contexts"]

**F2 (MEDIUM):** The mock only sets `.name` and `.path`, but `specification.md:1381-1386` requires the init output to include `Data Dir:`, `Config:`, `Database:`, and `Directories:` fields. When the #522 fix remodels `init_command()`, unset MagicMock attributes will stringify as `<MagicMock id='...'>`, causing output assertions to fail for the wrong reason. **Suggestion:** Pre-populate spec-required fields: ```python mock_project.data_dir = Path(context.temp_dir) mock_project.config_path = Path(context.temp_dir) / "config.toml" mock_project.database_status = "initialized (schema v3)" mock_project.directories = ["logs", "cache", "sessions", "contexts"] ```

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +120,4 @@

				        )

				    # Regex catches "Enter project name:", "Enter path:", "Enter value?"

				    # without false-positiving on "Entered configuration" or "Enterprise".

				    enter_prompt = re.search(r"Enter\s+\S+.*[:\?]", output)

CoreRasurae commented

2026-03-06 01:58:29 +00:00

F1 (MEDIUM) + F5 (LOW): The greedy .* will consume the entire remainder of the line up to the last : or ?, which can over-match if the init output contains colons in non-prompt contexts (e.g., Config: /path). Also, \? is redundant inside [].

Suggestion:

enter_prompt = re.search(r"Enter\s+\S+.*?[:?]", output)

Non-greedy .*? stops at the first : or ?, matching only the actual prompt.

**F1 (MEDIUM) + F5 (LOW):** The greedy `.*` will consume the entire remainder of the line up to the **last** `:` or `?`, which can over-match if the init output contains colons in non-prompt contexts (e.g., `Config: /path`). Also, `\?` is redundant inside `[]`. **Suggestion:** ```python enter_prompt = re.search(r"Enter\s+\S+.*?[:?]", output) ``` Non-greedy `.*?` stops at the first `:` or `?`, matching only the actual prompt.

brent.edwards reviewed 2026-03-05 19:45:03 +00:00

brent.edwards left a comment

Addressed: CoreRasurae review #1970 findings F1-F6

Commit 67730a6a addresses all findings from the latest review:

Finding	Severity	Resolution
F1+F5	MEDIUM+LOW	Regex changed to non-greedy `.*?[:?]` — stops at first `:` or `?` to avoid over-matching. Redundant `\?` escape removed.
F2	MEDIUM	Mock project pre-populated with `data_dir`, `config_path`, `database_status`, `directories` in both `cli_init_yes_flag_steps.py` and `cli_init_yes_bench.py` per `specification.md:1381-1386`.
F3	MEDIUM	Added docstring comment on `time_init_yes_flag_with_path` documenting that `--path` is an implementation detail not in the spec and may be removed with #522.
F4	MEDIUM	Renamed to `context._init_original_cwd` (step-private) and switched to `os.getcwd()` (str) to avoid type inconsistency and attribute collision with `environment.py`.
F6	LOW	Added negative scenario "Interactive mode without --yes presents a prompt" that verifies the non-interactive marker is absent when `--yes` is omitted.

Quality gates: nox -s lint PASS, nox -s typecheck PASS (0 errors).

## Addressed: CoreRasurae review #1970 findings F1-F6 Commit `67730a6a` addresses all findings from the latest review: | Finding | Severity | Resolution | |---------|----------|------------| | **F1+F5** | MEDIUM+LOW | Regex changed to non-greedy `.*?[:?]` — stops at first `:` or `?` to avoid over-matching. Redundant `\?` escape removed. | | **F2** | MEDIUM | Mock project pre-populated with `data_dir`, `config_path`, `database_status`, `directories` in both `cli_init_yes_flag_steps.py` and `cli_init_yes_bench.py` per `specification.md:1381-1386`. | | **F3** | MEDIUM | Added docstring comment on `time_init_yes_flag_with_path` documenting that `--path` is an implementation detail not in the spec and may be removed with #522. | | **F4** | MEDIUM | Renamed to `context._init_original_cwd` (step-private) and switched to `os.getcwd()` (str) to avoid type inconsistency and attribute collision with `environment.py`. | | **F6** | LOW | Added negative scenario "Interactive mode without --yes presents a prompt" that verifies the non-interactive marker is absent when `--yes` is omitted. | Quality gates: `nox -s lint` PASS, `nox -s typecheck` PASS (0 errors).

hamza.khyari requested changes 2026-03-05 22:07:35 +00:00

Dismissed

hamza.khyari left a comment

Review: PR #566 — `feature/m3-test-init-yes-flag`

Reviewer: reviewer-1
Verdict: Changes Requested — 1 blocker (cross-cutting), 4 minor, 2 nit

Prior CoreRasurae findings (F1-F9 across reviews #1945, #1948, #1970) are all resolved. The following are new.

BUG-1: `@wip` tags have no exclusion — breaks `nox -s unit_tests`

Severity: Major (blocker)
Category: BUG
File: features/cli_init_yes_flag.feature:7,14,22,30,42

CoreRasurae #1945 F8 asked for @wip tags — they were added but no exclusion mechanism was created. behave.ini has no tags = ~@wip, nox passes no --tags, environment.py has no hook. Result:

0 features passed, 1 failed
1 scenario passed, 4 failed

CI fails on every run. This is the same cross-cutting issue raised in PR #568 review.

Fix: Add to behave.ini:

tags = ~@wip

TEST-1: Scenario 5 passes today but is tagged `@wip`

Severity: Minor
Category: TEST
File: features/cli_init_yes_flag.feature:42-46

Scenario 5 ("Interactive mode without --yes presents a prompt") was added to address CoreRasurae #1970 F6. It passes today — agents init without --yes doesn't produce "Initialized (non-interactive)". This is a regression guard, not a TDD failing test, and should not be @wip.

Fix: Remove @wip from Scenario 5.

TEST-2: Scenario 5 doesn't verify exit code — masks crashes

Severity: Minor
Category: TEST
File: features/cli_init_yes_flag.feature:42-46

Scenario 5 only asserts the negative ("output should indicate interactive mode" checks "Initialized (non-interactive)" not in output). If the command crashes with a traceback and exit code 1, the assertion still passes because the crash output won't contain "Initialized (non-interactive)" either. The scenario should also assert exit code 0 to confirm the command actually succeeded.

Fix: Add a step:

  @tdd @bug522
  Scenario: Interactive mode without --yes presents a prompt
    Given I have a temporary project directory for init
    When I run agents init without the --yes flag
    Then the init command should exit with code 0
    Then the init output should indicate interactive mode

TEST-3: Bare `MagicMock()` for service and project objects

Severity: Minor
Category: TEST
File: features/steps/cli_init_yes_flag_steps.py:51-52,116-117

CoreRasurae #1970 F2 flagged missing spec fields on the mock — those were added. But the mock itself is still bare MagicMock() without spec=. Per review workflow Phase 6.5: "Never use bare MagicMock() for known types." Without spec=, the mock silently accepts any attribute access, so tests can't detect when the production code changes method signatures or attribute names.

Fix:

from unittest.mock import create_autospec
from cleveragents.application.services.project_service import ProjectService
mock_service = create_autospec(ProjectService, instance=True)

TEST-4: Fix author must change BOTH flag AND output format — no note explains this

Severity: Minor
Category: TEST
File: features/cli_init_yes_flag.feature:31-40, src/cleveragents/cli/commands/project.py:203-213

Scenario 4 asserts output fields ("Data Dir:", "Config:", "Database:", "Directories:", "logs, cache, sessions, contexts") that match the spec (docs/specification.md:1382-1402) but are completely absent from the current init_command output, which produces a different format entirely:

Current output (project.py:203-213):

┌── Project Initialized ──┐
│ ✓ Project 'foo' initialized successfully!
│ Location: /path/.cleveragents
│ Database: SQLite
│ Status: Ready
└──────────────────────────┘

Spec-defined output (what tests assert):

│ Data Dir: /home/alex/.cleveragents (created)  │
│ Config:   ...                                  │
│ Database: ...                                  │
│ Directories: logs, cache, sessions, contexts   │
✓ OK Initialized (non-interactive)

This means Scenarios 1-4 will fail for TWO independent reasons: (a) missing --yes flag, and (b) wrong output format. A fix author who only adds the --yes flag will still see test failures and may be confused. Consider adding a comment in the feature file or a NOTE_FOR_FIX_AUTHOR.md explaining both required changes.

CODE-1: Duplicate mock setup

Severity: Nit
Category: CODE
File: features/steps/cli_init_yes_flag_steps.py:46-73,110-134

Container/service/project mock setup is copy-pasted between _run_init_with_flag (line 50-64) and step_run_init_no_yes (line 115-126). Extract a shared helper to reduce maintenance burden.

CODE-2: Generic `context.result` — collision risk

Severity: Nit
Category: CODE
File: features/steps/cli_init_yes_flag_steps.py:68-73

CoreRasurae #1970 F4 caught context.original_cwd collision and Brent fixed it with _init_ prefix. But context.result, context.raw_result, context.mock_service still use generic names. Should use init_yes_ prefix for consistency with the established convention.

Severity	Count	IDs	Blocking?
Major	1	BUG-1	Yes
Minor	4	TEST-1, TEST-2, TEST-3, TEST-4	No
Nit	2	CODE-1, CODE-2	No

## Review: PR #566 — `feature/m3-test-init-yes-flag` **Reviewer**: reviewer-1 **Verdict**: **Changes Requested** — 1 blocker (cross-cutting), 4 minor, 2 nit Prior CoreRasurae findings (F1-F9 across reviews #1945, #1948, #1970) are all resolved. The following are new. --- ### BUG-1: `@wip` tags have no exclusion — breaks `nox -s unit_tests` **Severity**: Major (blocker) **Category**: BUG **File**: `features/cli_init_yes_flag.feature:7,14,22,30,42` CoreRasurae #1945 F8 asked for `@wip` tags — they were added but no exclusion mechanism was created. `behave.ini` has no `tags = ~@wip`, nox passes no `--tags`, `environment.py` has no hook. Result: ``` 0 features passed, 1 failed 1 scenario passed, 4 failed ``` CI fails on every run. This is the same cross-cutting issue raised in PR #568 review. **Fix**: Add to `behave.ini`: ```ini tags = ~@wip ``` --- ### TEST-1: Scenario 5 passes today but is tagged `@wip` **Severity**: Minor **Category**: TEST **File**: `features/cli_init_yes_flag.feature:42-46` Scenario 5 ("Interactive mode without --yes presents a prompt") was added to address CoreRasurae #1970 F6. It passes today — `agents init` without `--yes` doesn't produce "Initialized (non-interactive)". This is a regression guard, not a TDD failing test, and should not be `@wip`. **Fix**: Remove `@wip` from Scenario 5. --- ### TEST-2: Scenario 5 doesn't verify exit code — masks crashes **Severity**: Minor **Category**: TEST **File**: `features/cli_init_yes_flag.feature:42-46` Scenario 5 only asserts the negative ("output should indicate interactive mode" checks `"Initialized (non-interactive)" not in output`). If the command crashes with a traceback and exit code 1, the assertion still passes because the crash output won't contain "Initialized (non-interactive)" either. The scenario should also assert exit code 0 to confirm the command actually succeeded. **Fix**: Add a step: ```gherkin @tdd @bug522 Scenario: Interactive mode without --yes presents a prompt Given I have a temporary project directory for init When I run agents init without the --yes flag Then the init command should exit with code 0 Then the init output should indicate interactive mode ``` --- ### TEST-3: Bare `MagicMock()` for service and project objects **Severity**: Minor **Category**: TEST **File**: `features/steps/cli_init_yes_flag_steps.py:51-52,116-117` CoreRasurae #1970 F2 flagged missing spec fields on the mock — those were added. But the mock itself is still bare `MagicMock()` without `spec=`. Per review workflow Phase 6.5: "Never use bare `MagicMock()` for known types." Without `spec=`, the mock silently accepts any attribute access, so tests can't detect when the production code changes method signatures or attribute names. **Fix**: ```python from unittest.mock import create_autospec from cleveragents.application.services.project_service import ProjectService mock_service = create_autospec(ProjectService, instance=True) ``` --- ### TEST-4: Fix author must change BOTH flag AND output format — no note explains this **Severity**: Minor **Category**: TEST **File**: `features/cli_init_yes_flag.feature:31-40`, `src/cleveragents/cli/commands/project.py:203-213` Scenario 4 asserts output fields ("Data Dir:", "Config:", "Database:", "Directories:", "logs, cache, sessions, contexts") that match the spec (`docs/specification.md:1382-1402`) but are completely absent from the current `init_command` output, which produces a different format entirely: **Current output** (`project.py:203-213`): ``` ┌── Project Initialized ──┐ │ ✓ Project 'foo' initialized successfully! │ Location: /path/.cleveragents │ Database: SQLite │ Status: Ready └──────────────────────────┘ ``` **Spec-defined output** (what tests assert): ``` │ Data Dir: /home/alex/.cleveragents (created) │ │ Config: ... │ │ Database: ... │ │ Directories: logs, cache, sessions, contexts │ ✓ OK Initialized (non-interactive) ``` This means Scenarios 1-4 will fail for TWO independent reasons: (a) missing `--yes` flag, and (b) wrong output format. A fix author who only adds the `--yes` flag will still see test failures and may be confused. Consider adding a comment in the feature file or a `NOTE_FOR_FIX_AUTHOR.md` explaining both required changes. --- ### CODE-1: Duplicate mock setup **Severity**: Nit **Category**: CODE **File**: `features/steps/cli_init_yes_flag_steps.py:46-73,110-134` Container/service/project mock setup is copy-pasted between `_run_init_with_flag` (line 50-64) and `step_run_init_no_yes` (line 115-126). Extract a shared helper to reduce maintenance burden. --- ### CODE-2: Generic `context.result` — collision risk **Severity**: Nit **Category**: CODE **File**: `features/steps/cli_init_yes_flag_steps.py:68-73` CoreRasurae #1970 F4 caught `context.original_cwd` collision and Brent fixed it with `_init_` prefix. But `context.result`, `context.raw_result`, `context.mock_service` still use generic names. Should use `init_yes_` prefix for consistency with the established convention. --- | Severity | Count | IDs | Blocking? | |----------|-------|-----|-----------| | Major | 1 | BUG-1 | **Yes** | | Minor | 4 | TEST-1, TEST-2, TEST-3, TEST-4 | No | | Nit | 2 | CODE-1, CODE-2 | No |

brent.edwards commented

Thanks @hamza.khyari for the review. All 7 findings addressed in commit e3ec4576.

Disposition of findings (review #1992)

BUG-1 BLOCKER — `@wip` tag exclusion mechanism added ✅

This was the cross-cutting issue. Fixed in three places:

behave.ini: Added tags = ~@wip so Behave excludes @wip-tagged scenarios from all runs (unit_tests, coverage_report).
noxfile.py: Added --exclude wip to the pabot invocation in the integration_tests session, giving Robot Framework the same exclusion.
robot/cli_init_yes_flag.robot: Added [Tags] wip to all three TDD test cases so they are excluded by the new pabot filter.

Verified: nox -s unit_tests now shows 4 skipped scenarios (the @wip ones), 0 failed. nox -s integration_tests shows 1250 passed, 0 failed (the 3 wip Robot tests excluded).

TEST-1 — Scenario 5 `@wip` removed ✅

Removed @wip from "Interactive mode without --yes presents a prompt." It passes today (it's a regression guard checking "Initialized (non-interactive)" not in output), so it should not be excluded. Only @tdd @bug522 tags remain on this scenario.

TEST-2 — Scenario 5 exit code assertion added ✅

Added Then the init command should exit with code 0 before the interactive mode check. This prevents crashes (exit code 1 + traceback) from silently passing the negative assertion.

TEST-3 — `create_autospec` replaces bare `MagicMock()` ✅

Service mock: Both cli_init_yes_flag_steps.py and cli_init_yes_bench.py now use create_autospec(ProjectService, instance=True) instead of bare MagicMock(). This validates method signatures against the real ProjectService interface.
Project mock: Uses a typed _MockProject class with annotated attributes instead of MagicMock(). We cannot use create_autospec(Project) because the spec-required output fields (data_dir, config_path, database_status, directories) do not yet exist on the legacy Project model — they'll be added when #522 aligns the model with the spec. The typed class prevents silent auto-attribute creation that bare MagicMock allows.

TEST-4 — Fix author note added ✅

Added a NOTE FOR FIX AUTHOR (#522) block at the top of the feature file explaining that scenarios 1-4 will fail for TWO independent reasons: (a) missing --yes flag and (b) wrong output format. The fix must address both to pass these tests.

CODE-1 — Shared mock helper extracted ✅

Extracted _create_init_mocks(context) that creates and configures the container, service, and project mocks. Both _run_init_with_flag() and step_run_init_no_yes() now call this shared helper, eliminating the copy-pasted mock setup.

CODE-2 — Context attributes prefixed ✅

Renamed to avoid collision risk with other step files:

context.result → context.init_yes_result
context.raw_result → context.init_yes_raw_result
context.mock_service → context.init_yes_mock_service

This follows the same _init_ prefix convention established for context._init_original_cwd in the prior review round.

Verification

nox -s lint — passed
nox -s typecheck — passed (0 errors, 0 warnings, 0 informations)
nox -s unit_tests — passed (8701 scenarios passed, 0 failed, 4 skipped)
nox -s integration_tests — passed (1250 tests, 1250 passed, 0 failed)
nox -s coverage_report — passed (97% coverage)

Thanks @hamza.khyari for the review. All 7 findings addressed in commit `e3ec4576`. ## Disposition of findings (review #1992) ### BUG-1 BLOCKER — `@wip` tag exclusion mechanism added ✅ This was the cross-cutting issue. Fixed in three places: - **`behave.ini`**: Added `tags = ~@wip` so Behave excludes `@wip`-tagged scenarios from all runs (unit_tests, coverage_report). - **`noxfile.py`**: Added `--exclude wip` to the pabot invocation in the `integration_tests` session, giving Robot Framework the same exclusion. - **`robot/cli_init_yes_flag.robot`**: Added `[Tags] wip` to all three TDD test cases so they are excluded by the new pabot filter. Verified: `nox -s unit_tests` now shows 4 skipped scenarios (the `@wip` ones), 0 failed. `nox -s integration_tests` shows 1250 passed, 0 failed (the 3 wip Robot tests excluded). ### TEST-1 — Scenario 5 `@wip` removed ✅ Removed `@wip` from "Interactive mode without --yes presents a prompt." It passes today (it's a regression guard checking `"Initialized (non-interactive)" not in output`), so it should not be excluded. Only `@tdd @bug522` tags remain on this scenario. ### TEST-2 — Scenario 5 exit code assertion added ✅ Added `Then the init command should exit with code 0` before the interactive mode check. This prevents crashes (exit code 1 + traceback) from silently passing the negative assertion. ### TEST-3 — `create_autospec` replaces bare `MagicMock()` ✅ - **Service mock**: Both `cli_init_yes_flag_steps.py` and `cli_init_yes_bench.py` now use `create_autospec(ProjectService, instance=True)` instead of bare `MagicMock()`. This validates method signatures against the real `ProjectService` interface. - **Project mock**: Uses a typed `_MockProject` class with annotated attributes instead of `MagicMock()`. We cannot use `create_autospec(Project)` because the spec-required output fields (`data_dir`, `config_path`, `database_status`, `directories`) do not yet exist on the legacy `Project` model — they'll be added when #522 aligns the model with the spec. The typed class prevents silent auto-attribute creation that bare `MagicMock` allows. ### TEST-4 — Fix author note added ✅ Added a `NOTE FOR FIX AUTHOR (#522)` block at the top of the feature file explaining that scenarios 1-4 will fail for TWO independent reasons: (a) missing `--yes` flag and (b) wrong output format. The fix must address both to pass these tests. ### CODE-1 — Shared mock helper extracted ✅ Extracted `_create_init_mocks(context)` that creates and configures the container, service, and project mocks. Both `_run_init_with_flag()` and `step_run_init_no_yes()` now call this shared helper, eliminating the copy-pasted mock setup. ### CODE-2 — Context attributes prefixed ✅ Renamed to avoid collision risk with other step files: - `context.result` → `context.init_yes_result` - `context.raw_result` → `context.init_yes_raw_result` - `context.mock_service` → `context.init_yes_mock_service` This follows the same `_init_` prefix convention established for `context._init_original_cwd` in the prior review round. ## Verification - `nox -s lint` — passed - `nox -s typecheck` — passed (0 errors, 0 warnings, 0 informations) - `nox -s unit_tests` — passed (8701 scenarios passed, 0 failed, 4 skipped) - `nox -s integration_tests` — passed (1250 tests, 1250 passed, 0 failed) - `nox -s coverage_report` — passed (97% coverage)

hurui200320 requested changes 2026-03-06 06:44:37 +00:00

hurui200320 left a comment

Code Review — PR #566 (`feature/m3-test-init-yes-flag`)

Reviewer: @hurui200320
Reviewed against: Issue #536 acceptance criteria, docs/specification.md §agents init (lines 1215–1450, 34303–34327), and CONTRIBUTING.md project conventions.
Verdict: REQUEST CHANGES — 4 blockers, 2 high, 3 medium, 2 low.

The PR adds TDD-style failing tests for the agents init --yes missing option (bug #522). The test code itself is well-structured after multiple review rounds — the Behave scenarios are clear, output assertions now align with the spec, mock isolation is reasonable, and the @wip exclusion mechanism works correctly for both Behave (behave.ini) and Robot (noxfile.py). However, the PR has 4 blocking process violations that must be resolved before merge.

BLOCKER Findings

B1: Branch contains 4 merge commits

Commits: 75bd0d2e, b01bc6e4, a026fafe, e49bd099

CONTRIBUTING.md states: "Each branch must not contain merge commits — instead as master drifts the branches should always align with master via rebase, strictly avoiding merge commits in a branch's history."

Recommendation: Rebase the branch on current master using git rebase origin/master, resolving conflicts along the way.

B2: 12 commits (8 non-merge) for a single issue

The branch has: 1 original implementation + 1 refactor + 1 docs commit + 5 review-fix commits + 4 merge commits = 12 total. CONTRIBUTING.md states: "Every commit must completely implement an issue and close it — at no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch."

Recommendation: Interactive-rebase and squash all work into a single commit with the prescribed commit message test(cli): add failing tests for agents init --yes missing option.

B3: CHANGELOG.md deletes entries from other merged PRs

The branch's CHANGELOG.md is missing entries for #495 (M4 acceptance criteria), #193 (scoped backend view filtering), and #191 (context strategy registry) that exist on current master. These were lost during merge conflict resolution. If merged as-is, this PR would delete valid changelog entries from other teams' merged work.

Recommendation: Rebasing on current master (B1 fix) will resolve this — add the new entry at the top of the Unreleased section without removing existing entries.

B4: Empty PR description

The PR body is empty (""). CONTRIBUTING.md requires: "Every PR must include a clear, descriptive body that explains the purpose of the change... PRs submitted without a description or without an issue reference will not be reviewed."

Recommendation: Add a proper PR description with:

Summary of changes and motivation
Closes #536 closing keyword
Forgejo dependency link (PR blocks issue #536)

HIGH Findings

H1: Unnecessary `# type: ignore[attr-defined]` in benchmark file

File: benchmarks/cli_init_yes_bench.py:118

InitYesFlagSuite.track_exit_code.unit = "exit_code"  # type: ignore[attr-defined]

CONTRIBUTING.md states: "Never use # type: ignore or disable type checking." Furthermore, pyrightconfig.json sets "include": ["src"], meaning benchmarks/ is not included in type checking at all. The # type: ignore comment is both a rule violation and unnecessary since Pyright never processes this file.

Recommendation: Remove the # type: ignore[attr-defined] comment entirely.

H2: Issue quality subtasks remain unchecked

Two quality subtasks in issue #536 are still unchecked: "Verify coverage >=97%" and "Run nox (all default sessions)". Brent's latest comment (PR comment #55003) claims nox -s unit_tests (8701 passed), nox -s integration_tests (1250 passed), and nox -s coverage_report (97%) all pass. If these quality gates were indeed verified, the subtasks should be checked off in the issue body.

Recommendation: Mark the quality subtasks as checked if they have been verified.

MEDIUM Findings

M1: `_MockProject` class duplicated in two files

Files: features/steps/cli_init_yes_flag_steps.py:51-59, benchmarks/cli_init_yes_bench.py:29-42

The identical _MockProject class is defined in both files. CONTRIBUTING.md states mocking code belongs in features/mocks/. When #522 adds the real fields to the Project model, both copies must be updated independently.

Recommendation: Extract _MockProject to features/mocks/mock_project.py and import from both locations.

M2: Benchmark tests non-spec `--path` flag

File: benchmarks/cli_init_yes_bench.py:93-104

time_init_yes_flag_with_path benchmarks agents init --yes --path <dir>, but the spec defines only agents init [--yes|-y] with no --path option (specification.md:1217). The comment documents this caveat, but TDD tests should drive toward spec behavior rather than benchmarking implementation details that will be removed.

Recommendation: Remove time_init_yes_flag_with_path to keep the benchmark suite aligned with the spec.

M3: Mock strategy coupled to pre-spec implementation

Files: features/steps/cli_init_yes_flag_steps.py:73-75, benchmarks/cli_init_yes_bench.py:66-80

The mocks patch container.get_container and access .project_service().initialize_project(). The spec defines agents init as a global environment reset/wipe operation (specification.md:1219-1224), not a project-level initialization. When #522 aligns the command with the spec, the mock strategy will likely need rewriting. The feature file header note for fix author (#522) partially addresses this, but no similar note exists in the step definitions or benchmark file.

Recommendation: Add a comment in _create_init_mocks() and _invoke_with_mock() noting the mock target coupling and that it will change when #522 refactors init to match global-reset spec semantics.

LOW Findings

L1: Gherkin uses `Then ... Then` instead of `Then ... And`

File: features/cli_init_yes_flag.feature:55-56

    Then the init command should exit with code 0
    Then the init output should indicate interactive mode

Gherkin convention uses And for consecutive assertion steps.

Recommendation: Change line 56 to And the init output should indicate interactive mode.

L2: CHANGELOG entry says "three scenarios" but there are five

File: CHANGELOG.md

The entry states "Three scenarios verify..." but the feature file has 5 scenarios (4 @wip + 1 regression guard). The text was written before later review rounds added scenarios.

Recommendation: Update to accurately describe the 5 scenarios.

Positive Observations

Behave scenarios clearly map to spec requirements at specification.md:1381-1402
Output assertions match the spec exactly (Data Dir:, Config:, Database:, Directories:, Initialized (non-interactive))
Both --yes and -y short-form are tested across Behave, Robot, and ASV
@wip exclusion mechanism correctly implemented in both behave.ini and noxfile.py
_create_init_mocks() shared helper eliminates duplication in step definitions
create_autospec(ProjectService) validates method signatures
Robot teardowns properly clean up temp directories
Note for fix author (#522) in the feature file header is helpful context

Severity	Count	IDs	Blocking?
BLOCKER	4	B1, B2, B3, B4	Yes
HIGH	2	H1, H2	No (should fix)
MEDIUM	3	M1, M2, M3	No
LOW	2	L1, L2	No

Bottom line: The test content is good — it went through thorough review cycles with @CoreRasurae and @hamza.khyari, and the code quality reflects that. The blockers are all process/hygiene issues: the branch needs a rebase + squash on current master to produce a clean single-commit history, the CHANGELOG conflict must be properly resolved, and the PR description must be populated.

## Code Review — PR #566 (`feature/m3-test-init-yes-flag`) **Reviewer:** @hurui200320 **Reviewed against:** Issue #536 acceptance criteria, `docs/specification.md` §`agents init` (lines 1215–1450, 34303–34327), and `CONTRIBUTING.md` project conventions. **Verdict:** **REQUEST CHANGES** — 4 blockers, 2 high, 3 medium, 2 low. --- The PR adds TDD-style failing tests for the `agents init --yes` missing option (bug #522). The *test code itself* is well-structured after multiple review rounds — the Behave scenarios are clear, output assertions now align with the spec, mock isolation is reasonable, and the `@wip` exclusion mechanism works correctly for both Behave (`behave.ini`) and Robot (`noxfile.py`). However, the PR has **4 blocking process violations** that must be resolved before merge. --- ### BLOCKER Findings #### B1: Branch contains 4 merge commits **Commits:** `75bd0d2e`, `b01bc6e4`, `a026fafe`, `e49bd099` `CONTRIBUTING.md` states: *"Each branch must not contain merge commits — instead as master drifts the branches should always align with master via rebase, strictly avoiding merge commits in a branch's history."* **Recommendation:** Rebase the branch on current `master` using `git rebase origin/master`, resolving conflicts along the way. #### B2: 12 commits (8 non-merge) for a single issue The branch has: 1 original implementation + 1 refactor + 1 docs commit + 5 review-fix commits + 4 merge commits = 12 total. `CONTRIBUTING.md` states: *"Every commit must completely implement an issue and close it — at no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch."* **Recommendation:** Interactive-rebase and squash all work into a single commit with the prescribed commit message `test(cli): add failing tests for agents init --yes missing option`. #### B3: CHANGELOG.md deletes entries from other merged PRs The branch's `CHANGELOG.md` is missing entries for #495 (M4 acceptance criteria), #193 (scoped backend view filtering), and #191 (context strategy registry) that exist on current `master`. These were lost during merge conflict resolution. If merged as-is, this PR would **delete valid changelog entries** from other teams' merged work. **Recommendation:** Rebasing on current `master` (B1 fix) will resolve this — add the new entry at the top of the Unreleased section without removing existing entries. #### B4: Empty PR description The PR body is empty (`""`). `CONTRIBUTING.md` requires: *"Every PR must include a clear, descriptive body that explains the purpose of the change... PRs submitted without a description or without an issue reference will not be reviewed."* **Recommendation:** Add a proper PR description with: - Summary of changes and motivation - `Closes #536` closing keyword - Forgejo dependency link (PR blocks issue #536) --- ### HIGH Findings #### H1: Unnecessary `# type: ignore[attr-defined]` in benchmark file **File:** `benchmarks/cli_init_yes_bench.py:118` ```python InitYesFlagSuite.track_exit_code.unit = "exit_code" # type: ignore[attr-defined] ``` `CONTRIBUTING.md` states: *"Never use `# type: ignore` or disable type checking."* Furthermore, `pyrightconfig.json` sets `"include": ["src"]`, meaning `benchmarks/` is **not included** in type checking at all. The `# type: ignore` comment is both a rule violation and unnecessary since Pyright never processes this file. **Recommendation:** Remove the `# type: ignore[attr-defined]` comment entirely. #### H2: Issue quality subtasks remain unchecked Two quality subtasks in issue #536 are still unchecked: "Verify coverage >=97%" and "Run nox (all default sessions)". Brent's latest comment (PR comment #55003) claims `nox -s unit_tests` (8701 passed), `nox -s integration_tests` (1250 passed), and `nox -s coverage_report` (97%) all pass. If these quality gates were indeed verified, the subtasks should be checked off in the issue body. **Recommendation:** Mark the quality subtasks as checked if they have been verified. --- ### MEDIUM Findings #### M1: `_MockProject` class duplicated in two files **Files:** `features/steps/cli_init_yes_flag_steps.py:51-59`, `benchmarks/cli_init_yes_bench.py:29-42` The identical `_MockProject` class is defined in both files. `CONTRIBUTING.md` states mocking code belongs in `features/mocks/`. When #522 adds the real fields to the Project model, both copies must be updated independently. **Recommendation:** Extract `_MockProject` to `features/mocks/mock_project.py` and import from both locations. #### M2: Benchmark tests non-spec `--path` flag **File:** `benchmarks/cli_init_yes_bench.py:93-104` `time_init_yes_flag_with_path` benchmarks `agents init --yes --path <dir>`, but the spec defines only `agents init [--yes|-y]` with no `--path` option (`specification.md:1217`). The comment documents this caveat, but TDD tests should drive toward spec behavior rather than benchmarking implementation details that will be removed. **Recommendation:** Remove `time_init_yes_flag_with_path` to keep the benchmark suite aligned with the spec. #### M3: Mock strategy coupled to pre-spec implementation **Files:** `features/steps/cli_init_yes_flag_steps.py:73-75`, `benchmarks/cli_init_yes_bench.py:66-80` The mocks patch `container.get_container` and access `.project_service().initialize_project()`. The spec defines `agents init` as a **global environment** reset/wipe operation (`specification.md:1219-1224`), not a project-level initialization. When #522 aligns the command with the spec, the mock strategy will likely need rewriting. The feature file header note for fix author (#522) partially addresses this, but no similar note exists in the step definitions or benchmark file. **Recommendation:** Add a comment in `_create_init_mocks()` and `_invoke_with_mock()` noting the mock target coupling and that it will change when #522 refactors init to match global-reset spec semantics. --- ### LOW Findings #### L1: Gherkin uses `Then ... Then` instead of `Then ... And` **File:** `features/cli_init_yes_flag.feature:55-56` ```gherkin Then the init command should exit with code 0 Then the init output should indicate interactive mode ``` Gherkin convention uses `And` for consecutive assertion steps. **Recommendation:** Change line 56 to `And the init output should indicate interactive mode`. #### L2: CHANGELOG entry says "three scenarios" but there are five **File:** `CHANGELOG.md` The entry states *"Three scenarios verify..."* but the feature file has 5 scenarios (4 `@wip` + 1 regression guard). The text was written before later review rounds added scenarios. **Recommendation:** Update to accurately describe the 5 scenarios. --- ### Positive Observations - Behave scenarios clearly map to spec requirements at `specification.md:1381-1402` - Output assertions match the spec exactly (`Data Dir:`, `Config:`, `Database:`, `Directories:`, `Initialized (non-interactive)`) - Both `--yes` and `-y` short-form are tested across Behave, Robot, and ASV - `@wip` exclusion mechanism correctly implemented in both `behave.ini` and `noxfile.py` - `_create_init_mocks()` shared helper eliminates duplication in step definitions - `create_autospec(ProjectService)` validates method signatures - Robot teardowns properly clean up temp directories - Note for fix author (#522) in the feature file header is helpful context --- | Severity | Count | IDs | Blocking? | |----------|-------|-----|-----------| | BLOCKER | 4 | B1, B2, B3, B4 | **Yes** | | HIGH | 2 | H1, H2 | No (should fix) | | MEDIUM | 3 | M1, M2, M3 | No | | LOW | 2 | L1, L2 | No | **Bottom line:** The test content is good — it went through thorough review cycles with @CoreRasurae and @hamza.khyari, and the code quality reflects that. The blockers are all process/hygiene issues: the branch needs a rebase + squash on current `master` to produce a clean single-commit history, the CHANGELOG conflict must be properly resolved, and the PR description must be populated.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +90,4 @@

				        """Measure end-to-end latency of ``agents init -y``."""

				        self._invoke_with_mock(["init", "-y"], "bench-project")

				    def time_init_yes_flag_with_path(self) -> None:

hurui200320 commented

M2: This benchmark tests --path which is not in the spec (specification.md:1217 defines only agents init [--yes|-y]). TDD benchmarks should drive toward spec behavior. Recommend removing this method.

**M2**: This benchmark tests `--path` which is not in the spec (`specification.md:1217` defines only `agents init [--yes|-y]`). TDD benchmarks should drive toward spec behavior. Recommend removing this method.

benchmarks/cli_init_yes_bench.py Outdated

						
				@@ -0,0 +115,4 @@

				        return self._last_exit_code

				InitYesFlagSuite.track_exit_code.unit = "exit_code"  # type: ignore[attr-defined]

hurui200320 commented

H1: # type: ignore[attr-defined] — pyrightconfig.json sets "include": ["src"], so benchmarks/ is not type-checked by Pyright. This comment is both unnecessary and a CONTRIBUTING.md violation ("Never use # type: ignore"). Remove it.

**H1**: `# type: ignore[attr-defined]` — `pyrightconfig.json` sets `"include": ["src"]`, so `benchmarks/` is not type-checked by Pyright. This comment is both unnecessary and a `CONTRIBUTING.md` violation ("Never use `# type: ignore`"). Remove it.

features/cli_init_yes_flag.feature Outdated

						
				@@ -0,0 +53,4 @@

				    Given I have a temporary project directory for init

				    When I run agents init without the --yes flag

				    Then the init command should exit with code 0

				    Then the init output should indicate interactive mode

hurui200320 commented

L1: Gherkin convention uses And for consecutive assertion steps. This should be And the init output should indicate interactive mode.

**L1**: Gherkin convention uses `And` for consecutive assertion steps. This should be `And the init output should indicate interactive mode`.

features/steps/cli_init_yes_flag_steps.py Outdated

						
				@@ -0,0 +70,4 @@

				    mock_project.directories = ["logs", "cache", "sessions", "contexts"]

				    mock_service.initialize_project.return_value = mock_project

				    patcher = patch("cleveragents.application.container.get_container")

hurui200320 commented