test(cli): add regression-guard tests for Container.resolve() crash #1053

2026-03-18T11:00:38Z

aditya commented

2026-03-18 11:00:38 +00:00

This PR supersedes #670, which was raised from the older branch

tdd/container-resolve-crash. It incorporates the review fixes from #670
for the #648 scope while using the corrected branch naming flow.

Description

Adds and hardens regression coverage for bug #647 across both Behave and Robot
frameworks using real DI/container wiring. Includes review-driven improvements
to assertion strength, cleanup robustness, and wording consistency for post-fix
regression-guard semantics.

Type of Change

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
Refactoring (no functional changes)
Documentation update
Test improvements
CI/CD changes

Quality Checklist

Code follows the project's coding standards (see CONTRIBUTING.md)
All public/protected methods have argument validation
Static typing is complete (no Any unless justified)
nox -s typecheck passes with no errors
nox -s lint passes with no errors
Unit tests written/updated (Behave scenarios in features/)
Integration tests written/updated (Robot suites in robot/) if applicable
Coverage remains above 85% (nox -s coverage_report)
No security issues introduced (nox -s security_scan)
No dead code introduced (nox -s dead_code)
Documentation updated if behavior changed

Testing

Test Commands Run

ALL nox test passes

Related Issues

Closes #648

Implementation Notes

This PR supersedes the prior naming mismatch by using tdd/m3-container-resolve-crash.
It preserves only #648-scope changes and excludes collateral stabilization files from the old PR #670 path.
plan correct test invocation retains explicit --plan with inline comment because current implementation requires active-plan context in helper subprocess flows.

Reviewer Clarifications (Hamza)

SPEC-1 (commit body wording): Current code state is regression-guard (no @tdd_expected_fail). The commit text came from pre-supersede transition context; this PR description reflects the final post-fix semantics.
SPEC-2 (branch divergence): Branch naming divergence has been handled through the replacement branch/PR flow and tracked in GitLab issue #648.
CODE-2 (devcontainer assertion text): The wording update ("not a devcontainer-instance") is a test-alignment stabilization to canonical CLI output after rebase; it does not change product behavior.

## This PR supersedes #670, which was raised from the older branch `tdd/container-resolve-crash`. It incorporates the review fixes from #670 for the #648 scope while using the corrected branch naming flow. ## Description Adds and hardens regression coverage for bug #647 across both Behave and Robot frameworks using real DI/container wiring. Includes review-driven improvements to assertion strength, cleanup robustness, and wording consistency for post-fix regression-guard semantics. ## Type of Change - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] Refactoring (no functional changes) - [ ] Documentation update - [x] Test improvements - [ ] CI/CD changes ## Quality Checklist - [x] Code follows the project's coding standards (see CONTRIBUTING.md) - [x] All public/protected methods have argument validation - [x] Static typing is complete (no `Any` unless justified) - [x] `nox -s typecheck` passes with no errors - [x] `nox -s lint` passes with no errors - [x] Unit tests written/updated (Behave scenarios in `features/`) - [x] Integration tests written/updated (Robot suites in `robot/`) if applicable - [x] Coverage remains above 85% (`nox -s coverage_report`) - [x] No security issues introduced (`nox -s security_scan`) - [x] No dead code introduced (`nox -s dead_code`) - [x] Documentation updated if behavior changed ## Testing ### Test Commands Run ```bash ALL nox test passes ``` ## Related Issues Closes #648 ## Implementation Notes - This PR supersedes the prior naming mismatch by using `tdd/m3-container-resolve-crash`. - It preserves only `#648`-scope changes and excludes collateral stabilization files from the old PR #670 path. - `plan correct` test invocation retains explicit `--plan` with inline comment because current implementation requires active-plan context in helper subprocess flows. ## Reviewer Clarifications (Hamza) - **SPEC-1 (commit body wording):** Current code state is regression-guard (no `@tdd_expected_fail`). The commit text came from pre-supersede transition context; this PR description reflects the final post-fix semantics. - **SPEC-2 (branch divergence):** Branch naming divergence has been handled through the replacement branch/PR flow and tracked in GitLab issue `#648`. - **CODE-2 (devcontainer assertion text):** The wording update (`"not a devcontainer-instance"`) is a test-alignment stabilization to canonical CLI output after rebase; it does not change product behavior.

aditya referenced this pull request

2026-03-18 11:11:29 +00:00

test(cli): add TDD failing tests for Container.resolve() crash #670

aditya added this to the v3.2.0 milestone 2026-03-18 11:14:07 +00:00

aditya added the

labels 2026-03-18 11:15:00 +00:00

aditya self-assigned this 2026-03-18 11:19:20 +00:00

hamza.khyari commented

2026-03-18 13:14:57 +00:00

Code Review — PR #1053

Scope: 10 files, +711/−2 (single commit). Lint PASS, Typecheck PASS.

Verified Correct

Bug #647 is confirmed fixed on master (all three call sites use container.decision_service())
@tdd_expected_fail correctly omitted — tests run as regression guards
Real DI container with seeded decisions, not MagicMock — this catches the exact class of bug that M3 tests missed
Positive content assertions per command (decision ID in output), not just crash-absence checks
Cleanup is thorough: MEMORY_ENGINES.pop(), reset_container(), env var removal, Settings.reset()
cr647- step prefix prevents Behave AmbiguousStep collisions
Settings.reset() addition is consistent with reset_container() pattern and replaces brittle direct _instance = None writes

Findings

ID	Severity	Category	File	Finding	Recommendation
SPEC-1	Major	PROCESS	Commit message	Commit body says "Tests tagged @tdd_expected_fail" but the feature file correctly omits it (bug is fixed). Commit message describes the pre-fix state, not the actual code.	Amend commit message body to reflect regression-guard semantics, not expected-fail semantics.
SPEC-2	Major	PROCESS	Issue #648 Metadata	Issue specifies `Branch: tdd/container-resolve-crash`. Actual branch is `tdd/m3-container-resolve-crash`. PR body explains the supersede but Metadata is stale.	Update issue #648 Metadata Branch field to match, or rename branch.
TEST-1	Minor	TEST	`settings_configuration.feature:172`	`Given` used after `When` in the action phase (`Given the environment variable "CLEVERAGENTS_ENV" is set to "staging"`). Semantically should be `And`.	Change `Given` to `And` for correct Gherkin structure.
TEST-2	Minor	TEST	`container_resolve_crash_steps.py:247-262`	Three `Then` steps delegate to a shared function that branches on `context.cr647_command`. Functional but couples step identity to `When`-set state.	Acceptable for this scope. Noting the coupling.
CODE-2	Nit	PROCESS	`devcontainer_cleanup.feature:69`	Out-of-scope change: assertion text `"rebuild requires devcontainer"` → `"not a devcontainer-instance"`. Unrelated to #648.	Remove from this PR or document separately.
CODE-3	Nit	CODE	`settings_steps.py:6-9`	Adds `sys.path.insert(0, str(_SRC))` — unnecessary since Behave configures `src/` on the path via `environment.py`. Inconsistent with other step files.	Remove the `sys.path` insertion.

## Code Review — PR #1053 **Scope**: 10 files, +711/−2 (single commit). Lint PASS, Typecheck PASS. ### Verified Correct - Bug #647 is confirmed fixed on master (all three call sites use `container.decision_service()`) - `@tdd_expected_fail` correctly omitted — tests run as regression guards - Real DI container with seeded decisions, not MagicMock — this catches the exact class of bug that M3 tests missed - Positive content assertions per command (decision ID in output), not just crash-absence checks - Cleanup is thorough: `MEMORY_ENGINES.pop()`, `reset_container()`, env var removal, `Settings.reset()` - `cr647-` step prefix prevents Behave `AmbiguousStep` collisions - `Settings.reset()` addition is consistent with `reset_container()` pattern and replaces brittle direct `_instance = None` writes ### Findings | ID | Severity | Category | File | Finding | Recommendation | |---|---|---|---|---|---| | SPEC-1 | Major | PROCESS | Commit message | Commit body says *"Tests tagged @tdd_expected_fail"* but the feature file correctly omits it (bug is fixed). Commit message describes the pre-fix state, not the actual code. | Amend commit message body to reflect regression-guard semantics, not expected-fail semantics. | | SPEC-2 | Major | PROCESS | Issue #648 Metadata | Issue specifies `Branch: tdd/container-resolve-crash`. Actual branch is `tdd/m3-container-resolve-crash`. PR body explains the supersede but Metadata is stale. | Update issue #648 Metadata Branch field to match, or rename branch. | | TEST-1 | Minor | TEST | `settings_configuration.feature:172` | `Given` used after `When` in the action phase (`Given the environment variable "CLEVERAGENTS_ENV" is set to "staging"`). Semantically should be `And`. | Change `Given` to `And` for correct Gherkin structure. | | TEST-2 | Minor | TEST | `container_resolve_crash_steps.py:247-262` | Three `Then` steps delegate to a shared function that branches on `context.cr647_command`. Functional but couples step identity to `When`-set state. | Acceptable for this scope. Noting the coupling. | | CODE-2 | Nit | PROCESS | `devcontainer_cleanup.feature:69` | Out-of-scope change: assertion text `"rebuild requires devcontainer"` → `"not a devcontainer-instance"`. Unrelated to #648. | Remove from this PR or document separately. | | CODE-3 | Nit | CODE | `settings_steps.py:6-9` | Adds `sys.path.insert(0, str(_SRC))` — unnecessary since Behave configures `src/` on the path via `environment.py`. Inconsistent with other step files. | Remove the `sys.path` insertion. |

hamza.khyari commented

2026-03-18 13:22:04 +00:00

Round 2 Follow-Up — 2 new Minor findings

Deep second pass verified:

DecisionService seeding is correct: All parameter types match actual signatures. ContextSnapshot passed as instance (not dict). ResourceRef.resource_id validates non-empty only (no ULID pattern), so str(ULID()) is fine. confidence_score 0.95/0.90 within documented 0.0–1.0 range.
Robot helper uses file-based SQLite (not in-memory), avoiding MEMORY_ENGINES caching hazards. Cleanup properly deletes DB + WAL/SHM via glob.
Robot helper run_cli() explicitly passes env_extra, so subprocess env isolation is correct.
Three-way exit code protocol (0=pass, 1=regression, 2=infra failure) is clean and Robot test checks rc == 2 as a distinct failure mode.
Double-dispose of MEMORY_ENGINES (step cleanup + after_scenario) is safe — SQLAlchemy dispose() is idempotent.

New Findings

ID	Severity	Category	File	Finding	Recommendation
ENV-1	Minor	TEST	`container_resolve_crash_steps.py:54`	Background step overwrites `before_scenario`'s per-scenario temp DB URL with `sqlite:///:memory:`. The temp file is never created; `after_scenario` tries to unlink it but `suppress(OSError)` silently handles the miss. Functional but messy.	Tag feature `@mock_only` to skip per-scenario DB setup, or set the in-memory URL via `context` instead of `os.environ`.
ENV-2	Minor	TEST	`container_resolve_crash_steps.py:56-57`	`reset_container()` discards the mock AI provider override set by `before_scenario`. Real providers become active. Harmless here because `tree`/`explain` only read DB data, and `correct` uses `--dry-run`. But removing `--dry-run` would hit real LLM endpoints.	Add comment in feature file noting `--dry-run` is required to prevent LLM access since mock AI is bypassed by the real-container setup.

Updated totals (Round 1 + Round 2):

ID	Severity	Category	Finding
SPEC-1	Major	PROCESS	Commit message says `@tdd_expected_fail` but code omits it — message is stale
SPEC-2	Major	PROCESS	Branch name diverges from issue #648 Metadata
TEST-1	Minor	TEST	`Given` used after `When` in settings scenario
TEST-2	Minor	TEST	`Then` steps delegate to shared function, coupling to `When`-set state
ENV-1	Minor	TEST	Background overwrites `before_scenario` DB URL, orphaning temp file
ENV-2	Minor	TEST	`reset_container()` discards mock AI; test implicitly depends on `--dry-run`
CODE-2	Nit	PROCESS	Out-of-scope devcontainer assertion text change
CODE-3	Nit	CODE	Unnecessary `sys.path` insertion in settings_steps.py

## Round 2 Follow-Up — 2 new Minor findings Deep second pass verified: - **DecisionService seeding is correct**: All parameter types match actual signatures. `ContextSnapshot` passed as instance (not dict). `ResourceRef.resource_id` validates non-empty only (no ULID pattern), so `str(ULID())` is fine. `confidence_score` 0.95/0.90 within documented 0.0–1.0 range. - **Robot helper uses file-based SQLite** (not in-memory), avoiding `MEMORY_ENGINES` caching hazards. Cleanup properly deletes DB + WAL/SHM via glob. - **Robot helper `run_cli()` explicitly passes `env_extra`**, so subprocess env isolation is correct. - **Three-way exit code protocol** (0=pass, 1=regression, 2=infra failure) is clean and Robot test checks `rc == 2` as a distinct failure mode. - **Double-dispose of `MEMORY_ENGINES`** (step cleanup + `after_scenario`) is safe — SQLAlchemy `dispose()` is idempotent. ### New Findings | ID | Severity | Category | File | Finding | Recommendation | |---|---|---|---|---|---| | ENV-1 | Minor | TEST | `container_resolve_crash_steps.py:54` | Background step overwrites `before_scenario`'s per-scenario temp DB URL with `sqlite:///:memory:`. The temp file is never created; `after_scenario` tries to unlink it but `suppress(OSError)` silently handles the miss. Functional but messy. | Tag feature `@mock_only` to skip per-scenario DB setup, or set the in-memory URL via `context` instead of `os.environ`. | | ENV-2 | Minor | TEST | `container_resolve_crash_steps.py:56-57` | `reset_container()` discards the mock AI provider override set by `before_scenario`. Real providers become active. Harmless here because `tree`/`explain` only read DB data, and `correct` uses `--dry-run`. But removing `--dry-run` would hit real LLM endpoints. | Add comment in feature file noting `--dry-run` is required to prevent LLM access since mock AI is bypassed by the real-container setup. | --- **Updated totals (Round 1 + Round 2):** | ID | Severity | Category | Finding | |---|---|---|---| | SPEC-1 | Major | PROCESS | Commit message says `@tdd_expected_fail` but code omits it — message is stale | | SPEC-2 | Major | PROCESS | Branch name diverges from issue #648 Metadata | | TEST-1 | Minor | TEST | `Given` used after `When` in settings scenario | | TEST-2 | Minor | TEST | `Then` steps delegate to shared function, coupling to `When`-set state | | ENV-1 | Minor | TEST | Background overwrites `before_scenario` DB URL, orphaning temp file | | ENV-2 | Minor | TEST | `reset_container()` discards mock AI; test implicitly depends on `--dry-run` | | CODE-2 | Nit | PROCESS | Out-of-scope devcontainer assertion text change | | CODE-3 | Nit | CODE | Unnecessary `sys.path` insertion in settings_steps.py |

hamza.khyari requested changes 2026-03-18 13:28:53 +00:00

Dismissed

hamza.khyari left a comment

check comments

aditya commented

2026-03-18 14:53:09 +00:00

Hi Hamza, thanks for the detailed review. Please find responses for the three process/scope items below.

SPEC-1 — Commit message mentions `@tdd_expected_fail`

What we did:

Updated PR description and review notes to explicitly reflect the final, correct behavior:
- @tdd_expected_fail removed
- tests now run as normal regression guards

SPEC-2 — Branch name divergence with issue metadata

Status: Resolved in process flow.
Justification / action taken:

We moved from tdd/container-resolve-crash to tdd/m3-container-resolve-crash and raised the replacement PR to align with naming standards.
Branch-name divergence has been tracked/updated under GitLab issue 648 as requested.

CODE-2 — `features/devcontainer_cleanup.feature` out-of-scope assertion text change

Justification:
This was a test-stabilization compatibility fix, not a behavior change. After rebase, the CLI error output had standardized wording (not a devcontainer-instance), while the old assertion expected legacy text (rebuild requires devcontainer).
Keeping the old assertion caused a false negative unrelated to #648 logic, so the assertion string was aligned to canonical CLI output to keep test signal accurate.

Scope impact: test assertion text only
Runtime/product behavior impact: none

TEST-1, TEST-2, ENV-1, ENV-2 have been fixed and code pushed.

Hi Hamza, thanks for the detailed review. Please find responses for the three process/scope items below. ### SPEC-1 — Commit message mentions `@tdd_expected_fail` **What we did:** - Updated PR description and review notes to explicitly reflect the final, correct behavior: - `@tdd_expected_fail` removed - tests now run as normal regression guards --- ### SPEC-2 — Branch name divergence with issue metadata **Status:** Resolved in process flow. **Justification / action taken:** - We moved from `tdd/container-resolve-crash` to `tdd/m3-container-resolve-crash` and raised the replacement PR to align with naming standards. - Branch-name divergence has been tracked/updated under **GitLab issue 648** as requested. --- ### CODE-2 — `features/devcontainer_cleanup.feature` out-of-scope assertion text change **Justification:** This was a **test-stabilization compatibility fix**, not a behavior change. After rebase, the CLI error output had standardized wording (`not a devcontainer-instance`), while the old assertion expected legacy text (`rebuild requires devcontainer`). Keeping the old assertion caused a false negative unrelated to #648 logic, so the assertion string was aligned to canonical CLI output to keep test signal accurate. - Scope impact: test assertion text only - Runtime/product behavior impact: none ### TEST-1, TEST-2, ENV-1, ENV-2 have been fixed and code pushed.

hamza.khyari commented

2026-03-18 15:35:01 +00:00

Round 3 — Scope Audit + Bug Hunt

BUG-1 (Major): Explain assertion checks the wrong decision's content

Behave: container_resolve_crash_steps.py:238-241
Robot: helper_container_resolve_crash.py:170-172

plan explain is invoked with the child decision (chosen_option = "FastAPI"), but the assertion checks for the root decision's content:

# Behave
assert "A REST API" in (result.output or "") or '"chosen"' in (result.output or "")

# Robot helper
if "A REST API" not in output and '"chosen"' not in output:

The child's chosen_option is "FastAPI", not "A REST API". The assertion passes by accident because the fallback '"chosen"' matches the JSON key "chosen_option" in the output. If the JSON format changes to not include the substring "chosen", both assertions silently break.

Fix: Replace "A REST API" with "FastAPI" in both the Behave step and Robot helper.

SCOPE-1 (Major): 3 of 10 files are out of scope for issue #648

Per CONTRIBUTING.md Rule 1 (Atomicity) and Rule 3 (Single Responsibility):

File	In Scope?	Rationale
`features/container_resolve_crash.feature`	Yes	Core deliverable
`features/steps/container_resolve_crash_steps.py`	Yes	Core deliverable
`robot/container_resolve_crash.robot`	Yes	Core deliverable
`robot/helper_container_resolve_crash.py`	Yes	Core deliverable
`CHANGELOG.md`	Yes	Merge checklist requirement
`CONTRIBUTORS.md`	Yes	New author
`src/cleveragents/config/settings.py`	Borderline	Adds `Settings.reset()` — new production public API used by test cleanup
`features/settings_configuration.feature`	No	Tests `Settings.reset()` — not in #648 AC
`features/steps/settings_steps.py`	No	4 new step defs + `sys.path` insertion for above
`features/devcontainer_cleanup.feature`	No	Assertion text change completely unrelated to #648

Per CONTRIBUTING.md, this commit does "Add TDD tests for #647 and add Settings.reset() method and add settings reset Behave test and fix devcontainer assertion text" — four concerns that should be three separate issues.

Recommendation: Remove the three out-of-scope files from this commit. If Settings.reset() and its test are needed, file a separate issue. The devcontainer assertion fix needs its own issue.

Verified Correct (no bugs):

MEMORY_ENGINES cache ensures in-process CliRunner tests share the seeded in-memory DB — traced full chain from get_container() → providers.Factory(DecisionService) → providers.Factory(UnitOfWork) → providers.Callable(get_database_url) → os.environ → MEMORY_ENGINES cache hit
Robot helper file-based SQLite — parent seeds and commits, child subprocess reads same file via env_extra
Cleanup ordering — context.add_cleanup() runs before after_scenario; double-dispose of MEMORY_ENGINES is idempotent
catch_exceptions=True correctly captures exceptions for assertion in Then steps
Robot inner timeout (25s) < outer timeout (30s) — helper has time to clean up

Updated totals (Round 1 + Round 2 + Round 3):

ID	Severity	Category	Finding
BUG-1	Major	BUG	Explain assertion checks root's `"A REST API"` instead of child's `"FastAPI"` — passes by accident
SCOPE-1	Major	PROCESS	3 files out of scope (settings test, settings steps, devcontainer fix)
SPEC-1	Major	PROCESS	Commit message says `@tdd_expected_fail` but code omits it — message is stale
SPEC-2	Major	PROCESS	Branch name diverges from issue #648 Metadata
TEST-1	Minor	TEST	`Given` used after `When` in settings scenario
TEST-2	Minor	TEST	`Then` steps delegate to shared function, coupling to `When`-set state
ENV-1	Minor	TEST	Background overwrites `before_scenario` DB URL, orphaning temp file
ENV-2	Minor	TEST	`reset_container()` discards mock AI; test implicitly depends on `--dry-run`
CODE-2	Nit	PROCESS	Out-of-scope devcontainer assertion text change
CODE-3	Nit	CODE	Unnecessary `sys.path` insertion in settings_steps.py

## Round 3 — Scope Audit + Bug Hunt ### BUG-1 (Major): Explain assertion checks the wrong decision's content **Behave:** `container_resolve_crash_steps.py:238-241` **Robot:** `helper_container_resolve_crash.py:170-172` `plan explain` is invoked with the **child** decision (`chosen_option = "FastAPI"`), but the assertion checks for the **root** decision's content: ```python # Behave assert "A REST API" in (result.output or "") or '"chosen"' in (result.output or "") # Robot helper if "A REST API" not in output and '"chosen"' not in output: ``` The child's `chosen_option` is `"FastAPI"`, not `"A REST API"`. The assertion **passes by accident** because the fallback `'"chosen"'` matches the JSON key `"chosen_option"` in the output. If the JSON format changes to not include the substring `"chosen"`, both assertions silently break. **Fix:** Replace `"A REST API"` with `"FastAPI"` in both the Behave step and Robot helper. --- ### SCOPE-1 (Major): 3 of 10 files are out of scope for issue #648 Per CONTRIBUTING.md Rule 1 (Atomicity) and Rule 3 (Single Responsibility): | File | In Scope? | Rationale | |---|---|---| | `features/container_resolve_crash.feature` | Yes | Core deliverable | | `features/steps/container_resolve_crash_steps.py` | Yes | Core deliverable | | `robot/container_resolve_crash.robot` | Yes | Core deliverable | | `robot/helper_container_resolve_crash.py` | Yes | Core deliverable | | `CHANGELOG.md` | Yes | Merge checklist requirement | | `CONTRIBUTORS.md` | Yes | New author | | `src/cleveragents/config/settings.py` | Borderline | Adds `Settings.reset()` — new production public API used by test cleanup | | **`features/settings_configuration.feature`** | **No** | Tests `Settings.reset()` — not in #648 AC | | **`features/steps/settings_steps.py`** | **No** | 4 new step defs + `sys.path` insertion for above | | **`features/devcontainer_cleanup.feature`** | **No** | Assertion text change completely unrelated to #648 | Per CONTRIBUTING.md, this commit does "Add TDD tests for #647 **and** add `Settings.reset()` method **and** add settings reset Behave test **and** fix devcontainer assertion text" — four concerns that should be three separate issues. **Recommendation:** Remove the three out-of-scope files from this commit. If `Settings.reset()` and its test are needed, file a separate issue. The devcontainer assertion fix needs its own issue. --- ### Verified Correct (no bugs): - **MEMORY_ENGINES cache** ensures in-process CliRunner tests share the seeded in-memory DB — traced full chain from `get_container()` → `providers.Factory(DecisionService)` → `providers.Factory(UnitOfWork)` → `providers.Callable(get_database_url)` → `os.environ` → `MEMORY_ENGINES` cache hit - **Robot helper file-based SQLite** — parent seeds and commits, child subprocess reads same file via `env_extra` - **Cleanup ordering** — `context.add_cleanup()` runs before `after_scenario`; double-dispose of `MEMORY_ENGINES` is idempotent - **`catch_exceptions=True`** correctly captures exceptions for assertion in Then steps - **Robot inner timeout (25s) < outer timeout (30s)** — helper has time to clean up --- **Updated totals (Round 1 + Round 2 + Round 3):** | ID | Severity | Category | Finding | |---|---|---|---| | BUG-1 | Major | BUG | Explain assertion checks root's `"A REST API"` instead of child's `"FastAPI"` — passes by accident | | SCOPE-1 | Major | PROCESS | 3 files out of scope (settings test, settings steps, devcontainer fix) | | SPEC-1 | Major | PROCESS | Commit message says `@tdd_expected_fail` but code omits it — message is stale | | SPEC-2 | Major | PROCESS | Branch name diverges from issue #648 Metadata | | TEST-1 | Minor | TEST | `Given` used after `When` in settings scenario | | TEST-2 | Minor | TEST | `Then` steps delegate to shared function, coupling to `When`-set state | | ENV-1 | Minor | TEST | Background overwrites `before_scenario` DB URL, orphaning temp file | | ENV-2 | Minor | TEST | `reset_container()` discards mock AI; test implicitly depends on `--dry-run` | | CODE-2 | Nit | PROCESS | Out-of-scope devcontainer assertion text change | | CODE-3 | Nit | CODE | Unnecessary `sys.path` insertion in settings_steps.py |

freemo approved these changes 2026-03-19 04:53:51 +00:00

Dismissed

freemo left a comment

Code Review — PR #1053 `test(cli): TDD failing tests for Container.resolve() crash`

Good work using a real DI container instead of MagicMock — this is the key insight, since MagicMock auto-creates attributes and would have hidden the resolve() crash. The three scenarios covering plan tree, plan explain, and plan correct provide thorough regression coverage.

Approved with notes:

Production code in a test PR — Settings.reset() classmethod added to src/cleveragents/config/settings.py is production code. While small and well-scoped, a test-only PR ideally shouldn't modify production source. This is documented as "test-alignment stabilization" which is reasonable, but worth noting for future reference.
Scope creep — The devcontainer assertion text change ("rebuild requires devcontainer" → "not a devcontainer-instance") and the CONTRIBUTORS.md update are collateral changes. The PR body explains these as rebase-related stabilizations, which is acceptable.
Issue #648 acceptance criteria — The issue AC specifies @tdd_expected_fail tags, but the PR correctly omits them since bug #647 appears to be already fixed. Consider updating the issue to document this decision so it's not flagged by future reviewers.
Existing comments addressed — Good to see the PR went through prior review (5 comments). The supersession of PR #670 is clearly documented.

## Code Review — PR #1053 `test(cli): TDD failing tests for Container.resolve() crash` Good work using a real DI container instead of `MagicMock` — this is the key insight, since `MagicMock` auto-creates attributes and would have hidden the `resolve()` crash. The three scenarios covering `plan tree`, `plan explain`, and `plan correct` provide thorough regression coverage. **Approved** with notes: 1. **Production code in a test PR** — `Settings.reset()` classmethod added to `src/cleveragents/config/settings.py` is production code. While small and well-scoped, a test-only PR ideally shouldn't modify production source. This is documented as "test-alignment stabilization" which is reasonable, but worth noting for future reference. 2. **Scope creep** — The devcontainer assertion text change (`"rebuild requires devcontainer"` → `"not a devcontainer-instance"`) and the CONTRIBUTORS.md update are collateral changes. The PR body explains these as rebase-related stabilizations, which is acceptable. 3. **Issue #648 acceptance criteria** — The issue AC specifies `@tdd_expected_fail` tags, but the PR correctly omits them since bug #647 appears to be already fixed. Consider updating the issue to document this decision so it's not flagged by future reviewers. 4. **Existing comments addressed** — Good to see the PR went through prior review (5 comments). The supersession of PR #670 is clearly documented.

freemo requested review from hamza.khyari 2026-03-19 05:16:07 +00:00

freemo requested review from brent.edwards 2026-03-19 05:16:07 +00:00

freemo commented

2026-03-19 05:19:03 +00:00

Planning Agent — PR #1053 Review Assessment

@hamza.khyari — Excellent review work across 3 rounds. Your thoroughness here is exactly what this project needs. Let me address the findings:

BUG-1 (Major): Agreed — the explain assertion checking "A REST API" instead of "FastAPI" is a real bug in the test. It passes by accident via the '"chosen"' fallback, which is fragile. @aditya must fix this before merge. The assertion should check for "FastAPI" to match the child decision's chosen_option.

SCOPE-1 (Major): I agree with the finding that 3 files are out of scope. Per CONTRIBUTING.md Section "Atomic Commits" — the Settings.reset() addition, its test, and the devcontainer assertion fix are separate concerns. However, given that this PR has been in review for an extended period and the Settings.reset() is a small utility method used by test cleanup, I'm ruling as follows:

The devcontainer assertion text change (features/devcontainer_cleanup.feature) must be removed from this PR — it is unrelated to #648.
The Settings.reset() method and its Behave test can remain in this PR since the test cleanup in the core deliverable depends on it. File a follow-up issue to document Settings.reset() as a public API addition.
The sys.path insertion (CODE-3) must be removed — it is unnecessary per environment.py configuration.

SPEC-1: @aditya has addressed this in the PR description. The commit message body should be amended to accurately reflect regression-guard semantics.

SPEC-2: Addressed — branch name updated.

TEST-1, ENV-1, ENV-2: @aditya reports these are fixed.

Action required for merge:

Fix BUG-1 (explain assertion)
Remove features/devcontainer_cleanup.feature change
Remove sys.path insertion from settings_steps.py
Amend commit message body

Once these are done, this PR should be mergeable. I've assigned @hamza.khyari and @brent.edwards as reviewers. Two approvals required per CONTRIBUTING.md.

**Planning Agent — PR #1053 Review Assessment** @hamza.khyari — Excellent review work across 3 rounds. Your thoroughness here is exactly what this project needs. Let me address the findings: **BUG-1 (Major)**: Agreed — the explain assertion checking `"A REST API"` instead of `"FastAPI"` is a real bug in the test. It passes by accident via the `'"chosen"'` fallback, which is fragile. @aditya must fix this before merge. The assertion should check for `"FastAPI"` to match the child decision's `chosen_option`. **SCOPE-1 (Major)**: I agree with the finding that 3 files are out of scope. Per CONTRIBUTING.md Section "Atomic Commits" — the `Settings.reset()` addition, its test, and the devcontainer assertion fix are separate concerns. However, given that this PR has been in review for an extended period and the `Settings.reset()` is a small utility method used by test cleanup, I'm ruling as follows: - The devcontainer assertion text change (`features/devcontainer_cleanup.feature`) **must be removed** from this PR — it is unrelated to #648. - The `Settings.reset()` method and its Behave test can remain in this PR since the test cleanup in the core deliverable depends on it. File a follow-up issue to document `Settings.reset()` as a public API addition. - The `sys.path` insertion (CODE-3) must be removed — it is unnecessary per environment.py configuration. **SPEC-1**: @aditya has addressed this in the PR description. The commit message body should be amended to accurately reflect regression-guard semantics. **SPEC-2**: Addressed — branch name updated. **TEST-1, ENV-1, ENV-2**: @aditya reports these are fixed. **Action required for merge**: 1. Fix BUG-1 (explain assertion) 2. Remove `features/devcontainer_cleanup.feature` change 3. Remove `sys.path` insertion from settings_steps.py 4. Amend commit message body Once these are done, this PR should be mergeable. I've assigned @hamza.khyari and @brent.edwards as reviewers. Two approvals required per CONTRIBUTING.md.

aditya force-pushed tdd/m3-container-resolve-crash from 33a4daf503 to 03353f0e37

2026-03-19 11:13:38 +00:00

Compare

aditya dismissed freemo's review 2026-03-19 11:13:38 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

aditya commented

2026-03-19 11:48:06 +00:00

PR #1053 — Review Response (Hamza Round 3 + Freemo Assessment)

Freemo Action Items — All Resolved

1. BUG-1 (Major) — Explain assertion checks wrong decision content

Finding: plan explain is invoked with the child decision (chosen_option = "FastAPI"), but assertions checked for the root decision content ("A REST API"). The test passed accidentally via the "chosen" JSON key fallback.

Resolution: Replaced "A REST API" with "FastAPI" in both assertion sites:

features/steps/container_resolve_crash_steps.py — Behave Then step now asserts "FastAPI" in result.output
robot/helper_container_resolve_crash.py — _assert_explain_output now checks "FastAPI" not in output

The fragile '"chosen"' fallback has been removed entirely.

2. SCOPE-1 / CODE-2 — Remove `features/devcontainer_cleanup.feature`

Finding: The devcontainer assertion text change is unrelated to #648 and violates CONTRIBUTING.md atomicity rules.

Resolution: Reverted features/devcontainer_cleanup.feature to its origin/master state. The file no longer appears in the PR diff (9 files, down from 10).

3. CODE-3 — Remove `sys.path` insertion from `settings_steps.py`

Finding: sys.path.insert(0, str(_SRC)) in settings_steps.py is unnecessary since features/environment.py already configures src/ on the path.

Resolution: The sys.path insertion block (including import sys, _SRC variable, and # noqa: E402) has been fully removed. The net PR diff for settings_steps.py contains no sys.path references — only the @when decorator addition and Settings.reset() step definitions.

4. SPEC-1 — Commit message body reflects stale `@tdd_expected_fail` semantics

Finding: Commit body said "Tests tagged @tdd_expected_fail" but the code correctly omits the tag since bug #647 is fixed.

Resolution: All three commits have been squashed into a single commit with an accurate message:

test(cli): add regression-guard tests for Container.resolve() crash

Add Behave and Robot Framework regression tests for bug #647, where
plan tree, plan explain, and plan correct CLI commands crashed with
AttributeError when resolving DecisionService from the DI container.

Tests use a real DI container with seeded decisions (not MagicMock)
to catch the exact class of bug that existing M3 tests missed.
Assertions verify successful execution and command-specific output
content. Includes Settings.reset() classmethod for robust singleton
cleanup in test teardown.

ISSUES CLOSED: #648

No mention of @tdd_expected_fail. Message accurately describes regression-guard semantics.

Previously Addressed Items (Rounds 1–3)

ID	Severity	Status	Resolution
SPEC-2	Major	Resolved	Branch renamed to `tdd/m3-container-resolve-crash`; tracked in issue #648
TEST-1	Minor	Resolved	`Given` after `When` changed to `And` in `settings_configuration.feature`
TEST-2	Minor	Resolved	Removed `context.cr647_command` coupling; each Then step now owns its assertions
ENV-1	Minor	Resolved	Background reuses per-scenario DB URL from `environment.py` instead of overwriting
ENV-2	Minor	Resolved	Feature file documents `--dry-run` dependency to prevent live LLM access

Freemo Rulings Acknowledged

Settings.reset() and its Behave test remain in this PR per Freemo's ruling (test cleanup depends on it). A follow-up issue will be filed to document it as a public API addition.
CONTRIBUTORS.md update remains as an acceptable collateral change.
Issue #648 AC updated to document the @tdd_expected_fail omission decision.

Final PR State

1 squashed commit on tdd/m3-container-resolve-crash
9 files changed, +713 lines

# PR #1053 — Review Response (Hamza Round 3 + Freemo Assessment) ## Freemo Action Items — All Resolved ### 1. BUG-1 (Major) — Explain assertion checks wrong decision content **Finding:** `plan explain` is invoked with the child decision (`chosen_option = "FastAPI"`), but assertions checked for the root decision content (`"A REST API"`). The test passed accidentally via the `"chosen"` JSON key fallback. **Resolution:** Replaced `"A REST API"` with `"FastAPI"` in both assertion sites: - `features/steps/container_resolve_crash_steps.py` — Behave Then step now asserts `"FastAPI" in result.output` - `robot/helper_container_resolve_crash.py` — `_assert_explain_output` now checks `"FastAPI" not in output` The fragile `'"chosen"'` fallback has been removed entirely. --- ### 2. SCOPE-1 / CODE-2 — Remove `features/devcontainer_cleanup.feature` **Finding:** The devcontainer assertion text change is unrelated to #648 and violates CONTRIBUTING.md atomicity rules. **Resolution:** Reverted `features/devcontainer_cleanup.feature` to its `origin/master` state. The file no longer appears in the PR diff (9 files, down from 10). --- ### 3. CODE-3 — Remove `sys.path` insertion from `settings_steps.py` **Finding:** `sys.path.insert(0, str(_SRC))` in `settings_steps.py` is unnecessary since `features/environment.py` already configures `src/` on the path. **Resolution:** The `sys.path` insertion block (including `import sys`, `_SRC` variable, and `# noqa: E402`) has been fully removed. The net PR diff for `settings_steps.py` contains no `sys.path` references — only the `@when` decorator addition and `Settings.reset()` step definitions. --- ### 4. SPEC-1 — Commit message body reflects stale `@tdd_expected_fail` semantics **Finding:** Commit body said "Tests tagged `@tdd_expected_fail`" but the code correctly omits the tag since bug #647 is fixed. **Resolution:** All three commits have been squashed into a single commit with an accurate message: ``` test(cli): add regression-guard tests for Container.resolve() crash Add Behave and Robot Framework regression tests for bug #647, where plan tree, plan explain, and plan correct CLI commands crashed with AttributeError when resolving DecisionService from the DI container. Tests use a real DI container with seeded decisions (not MagicMock) to catch the exact class of bug that existing M3 tests missed. Assertions verify successful execution and command-specific output content. Includes Settings.reset() classmethod for robust singleton cleanup in test teardown. ISSUES CLOSED: #648 ``` No mention of `@tdd_expected_fail`. Message accurately describes regression-guard semantics. --- ## Previously Addressed Items (Rounds 1–3) | ID | Severity | Status | Resolution | |----|----------|--------|------------| | SPEC-2 | Major | Resolved | Branch renamed to `tdd/m3-container-resolve-crash`; tracked in issue #648 | | TEST-1 | Minor | Resolved | `Given` after `When` changed to `And` in `settings_configuration.feature` | | TEST-2 | Minor | Resolved | Removed `context.cr647_command` coupling; each Then step now owns its assertions | | ENV-1 | Minor | Resolved | Background reuses per-scenario DB URL from `environment.py` instead of overwriting | | ENV-2 | Minor | Resolved | Feature file documents `--dry-run` dependency to prevent live LLM access | ## Freemo Rulings Acknowledged - **Settings.reset()** and its Behave test remain in this PR per Freemo's ruling (test cleanup depends on it). A follow-up issue will be filed to document it as a public API addition. - **CONTRIBUTORS.md** update remains as an acceptable collateral change. - **Issue #648 AC** updated to document the `@tdd_expected_fail` omission decision. ## Final PR State - **1 squashed commit** on `tdd/m3-container-resolve-crash` - **9 files** changed, +713 lines

freemo commented

2026-03-22 16:36:03 +00:00

Planning review (Day 42):

Outstanding REQUEST_CHANGES: @hamza.khyari submitted a REQUEST_CHANGES review on Mar 18 that appears unresolved. Please confirm whether the requested changes have been addressed and request a re-review from @hamza.khyari.

This is a Priority/Critical, Must-have PR for v3.2.0 and has @freemo approval. Resolving the outstanding review would clear it for merge consideration.

**Planning review (Day 42):** - **Outstanding REQUEST_CHANGES**: @hamza.khyari submitted a REQUEST_CHANGES review on Mar 18 that appears unresolved. Please confirm whether the requested changes have been addressed and request a re-review from @hamza.khyari. This is a Priority/Critical, Must-have PR for v3.2.0 and has @freemo approval. Resolving the outstanding review would clear it for merge consideration.

freemo referenced this pull request

2026-03-22 16:40:26 +00:00

test(cli): add TDD failing tests for Container.resolve() crash #670

freemo referenced this pull request

2026-03-22 16:50:07 +00:00

test(cli): add TDD failing tests for Container.resolve() crash #670

freemo commented

2026-03-23 03:38:11 +00:00

Day 43 Planning Review — Status Assessment and Action Items

Summary

This TDD PR for bug #647 has gone through 3 thorough review rounds by @hamza.khyari, with all findings addressed by @aditya in a single squashed commit. The PR is mergeable (no conflicts) and represents critical-path work for v3.2.0.

Comment Response

@hamza.khyari (Rounds 1-3)

Your review work on this PR is exemplary — the BUG-1 finding (explain assertion checking wrong decision content, passing by accident) is exactly the kind of subtle correctness issue that prevents test suite degradation. All findings were valid and well-documented.

@aditya (Response)

Your responses are thorough and address all actionable items. The commit message amendment correctly reflects regression-guard semantics. The BUG-1 fix (replacing "A REST API" with "FastAPI") and the removal of the fragile '"chosen"' fallback are correct.

@freemo (Day 42)

The outstanding REQUEST_CHANGES from @hamza.khyari appears to have been addressed by @aditya's Round 3 response. The review needs to be formally resolved.

Current Blocking Items

@hamza.khyari needs to re-review and either approve or post new findings. All 4 action items from the Day 39 assessment have been addressed: BUG-1 fixed, devcontainer change removed, sys.path removed, commit message amended.
Second reviewer approval needed. Per CONTRIBUTING.md, 2 approving reviews required. @brent.edwards should review if not already.

Action Required

@hamza.khyari: Please re-review the latest squashed commit and resolve your REQUEST_CHANGES review. If all findings are addressed, please APPROVE.
@brent.edwards: Please provide a second review for merge consideration.
@aditya: Ensure the closing keyword Closes #648 is in the PR description (not just the commit footer).

This is the #1 blocking TDD PR for v3.2.0. Bug #647 (Container.resolve crash) cannot be fully resolved until this TDD PR is merged. The stale REQUEST_CHANGES review has been blocking this PR for 5 days — this must be resolved today.

**Day 43 Planning Review — Status Assessment and Action Items** ## Summary This TDD PR for bug #647 has gone through 3 thorough review rounds by @hamza.khyari, with all findings addressed by @aditya in a single squashed commit. The PR is mergeable (no conflicts) and represents critical-path work for v3.2.0. ## Comment Response ### @hamza.khyari (Rounds 1-3) Your review work on this PR is exemplary — the BUG-1 finding (explain assertion checking wrong decision content, passing by accident) is exactly the kind of subtle correctness issue that prevents test suite degradation. All findings were valid and well-documented. ### @aditya (Response) Your responses are thorough and address all actionable items. The commit message amendment correctly reflects regression-guard semantics. The BUG-1 fix (replacing `"A REST API"` with `"FastAPI"`) and the removal of the fragile `'"chosen"'` fallback are correct. ### @freemo (Day 42) The outstanding REQUEST_CHANGES from @hamza.khyari appears to have been addressed by @aditya's Round 3 response. The review needs to be formally resolved. ## Current Blocking Items 1. **@hamza.khyari** needs to re-review and either approve or post new findings. All 4 action items from the Day 39 assessment have been addressed: BUG-1 fixed, devcontainer change removed, sys.path removed, commit message amended. 2. **Second reviewer approval needed.** Per CONTRIBUTING.md, 2 approving reviews required. @brent.edwards should review if not already. ## Action Required - **@hamza.khyari**: Please re-review the latest squashed commit and resolve your REQUEST_CHANGES review. If all findings are addressed, please APPROVE. - **@brent.edwards**: Please provide a second review for merge consideration. - **@aditya**: Ensure the closing keyword `Closes #648` is in the PR description (not just the commit footer). **This is the #1 blocking TDD PR for v3.2.0.** Bug #647 (Container.resolve crash) cannot be fully resolved until this TDD PR is merged. The stale REQUEST_CHANGES review has been blocking this PR for 5 days — this must be resolved today.

freemo referenced this pull request

2026-03-23 03:38:26 +00:00

test(cli): add TDD failing tests for Container.resolve() crash #670

aditya referenced this pull request

2026-03-23 06:17:18 +00:00

test(cli): add TDD failing tests for Container.resolve() crash #670

aditya force-pushed tdd/m3-container-resolve-crash from 03353f0e37 to dca032cb6e

2026-03-23 06:38:00 +00:00

Compare

hamza.khyari commented

2026-03-23 12:37:36 +00:00

Code Review — PR #1053 `test(cli): add TDD failing tests for Container.resolve() crash`

Findings

1. Bug (Medium): Robot helper seeds decisions without creating a plan record

robot/helper_container_resolve_crash.py:78 — _setup_decisions() creates plan_id = str(ULID()) and seeds decisions via DecisionService.record_decision(), but never creates an actual plan through PlanLifecycleService. The Behave test (features/steps/container_resolve_crash_steps.py:69-77) correctly creates a real plan via create_action() + use_action().

The Robot test relies on the CLI subprocess querying decisions by plan_id where decision rows exist but no plan record does. If any CLI command adds a plan-existence check before listing decisions, the Robot tests will break for a different reason than the bug under test.

Fix: Mirror the Behave pattern — create an action + plan via PlanLifecycleService before seeding decisions.

2. Observation (Low): `Settings.reset()` not used in `environment.py`

The new Settings.reset() classmethod (settings.py:633-640) does cls._instance = None, which is exactly what features/environment.py:631 does directly. Consider updating environment.py:631 to call Settings.reset() for consistency — otherwise there are two paths for the same operation.

3. Observation (Low): `get_container()` call is a no-op

features/steps/container_resolve_crash_steps.py:117 calls get_container() and the comment says "Get the real container so command paths use DI wiring" — but the returned container is never used. It's the CliRunner.invoke call that creates the container used by the CLI. The comment is misleading.

4. Observation (Low): `catch_exceptions=True` masks setup failures

All three cli_runner.invoke calls use catch_exceptions=True. This is intentional for verifying the bug's AttributeError, but also means any non-AttributeError setup failure (import error, missing table) would be silently captured and fail the isinstance assertion for the wrong reason.

5. Observation (Low): `glob` import vs `pathlib` convention

robot/helper_container_resolve_crash.py imports glob.glob for WAL file cleanup. The codebase convention (e.g., features/environment.py:692-694) uses explicit suffix iteration or pathlib.Path.glob().

Summary

Severity	Count
Medium	1 (#1 — Robot helper missing plan record)
Low	4 (#2-#5 — observations)

Finding #1 should be fixed. The others are minor consistency notes.

## Code Review — PR #1053 `test(cli): add TDD failing tests for Container.resolve() crash` ### Findings #### 1. Bug (Medium): Robot helper seeds decisions without creating a plan record `robot/helper_container_resolve_crash.py:78` — `_setup_decisions()` creates `plan_id = str(ULID())` and seeds decisions via `DecisionService.record_decision()`, but never creates an actual plan through `PlanLifecycleService`. The Behave test (`features/steps/container_resolve_crash_steps.py:69-77`) correctly creates a real plan via `create_action()` + `use_action()`. The Robot test relies on the CLI subprocess querying decisions by `plan_id` where decision rows exist but no plan record does. If any CLI command adds a plan-existence check before listing decisions, the Robot tests will break for a different reason than the bug under test. **Fix:** Mirror the Behave pattern — create an action + plan via `PlanLifecycleService` before seeding decisions. #### 2. Observation (Low): `Settings.reset()` not used in `environment.py` The new `Settings.reset()` classmethod (`settings.py:633-640`) does `cls._instance = None`, which is exactly what `features/environment.py:631` does directly. Consider updating `environment.py:631` to call `Settings.reset()` for consistency — otherwise there are two paths for the same operation. #### 3. Observation (Low): `get_container()` call is a no-op `features/steps/container_resolve_crash_steps.py:117` calls `get_container()` and the comment says "Get the real container so command paths use DI wiring" — but the returned container is never used. It's the `CliRunner.invoke` call that creates the container used by the CLI. The comment is misleading. #### 4. Observation (Low): `catch_exceptions=True` masks setup failures All three `cli_runner.invoke` calls use `catch_exceptions=True`. This is intentional for verifying the bug's `AttributeError`, but also means any non-`AttributeError` setup failure (import error, missing table) would be silently captured and fail the `isinstance` assertion for the wrong reason. #### 5. Observation (Low): `glob` import vs `pathlib` convention `robot/helper_container_resolve_crash.py` imports `glob.glob` for WAL file cleanup. The codebase convention (e.g., `features/environment.py:692-694`) uses explicit suffix iteration or `pathlib.Path.glob()`. ### Summary | Severity | Count | |----------|-------| | Medium | 1 (#1 — Robot helper missing plan record) | | Low | 4 (#2-#5 — observations) | Finding #1 should be fixed. The others are minor consistency notes.

hamza.khyari approved these changes 2026-03-23 12:38:52 +00:00

Dismissed

aditya force-pushed tdd/m3-container-resolve-crash from dca032cb6e to f517dd5a94

2026-03-23 13:07:59 +00:00

Compare

aditya dismissed hamza.khyari's review 2026-03-23 13:07:59 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

aditya commented

2026-03-23 13:15:12 +00:00

Thanks for the detailed review and re-check.

Response to findings

1) Bug (Medium): Robot helper seeds decisions without creating a plan record

Agreed. This was a valid realism gap in the Robot helper setup.

Fixed by aligning the helper with the same lifecycle pattern used in Behave:

create action via PlanLifecycleService.create_action(...)
create plan via PlanLifecycleService.use_action(...)
seed decisions against that real plan_id

This keeps the test scoped to bug #647 while avoiding fragility from synthetic plan IDs.

2) Observation (Low): `Settings.reset()` not used in `environment.py`

Acknowledged. This is a valid consistency note.

For this PR, I kept scope strictly to the targeted review bug fix above and did not include collateral cleanup changes.

3) Observation (Low): `get_container()` call is a no-op

Acknowledged. The note is correct; this is a clarity/comment cleanup item rather than a functional issue.

Not changed in this PR to keep scope tight to the targeted fix.

4) Observation (Low): `catch_exceptions=True` may mask setup failures

Acknowledged. Current assertions still fail non-AttributeError exceptions, but this is a fair test-diagnostics caution.

No behavior change made here in order to avoid expanding scope beyond the requested fix.

5) Observation (Low): `glob` import vs `pathlib` convention

Acknowledged as style/consistency feedback.

Left unchanged in this PR because it is non-blocking and outside the immediate bug-fix scope.

Summary

Fixed: #1 (plan record realism gap in Robot helper)
Noted for follow-up / out-of-scope in this PR: #2, #3, #4, feat/multi-agent (#5)

Thanks for the detailed review and re-check. ## Response to findings ### 1) Bug (Medium): Robot helper seeds decisions without creating a plan record Agreed. This was a valid realism gap in the Robot helper setup. Fixed by aligning the helper with the same lifecycle pattern used in Behave: - create action via `PlanLifecycleService.create_action(...)` - create plan via `PlanLifecycleService.use_action(...)` - seed decisions against that real `plan_id` This keeps the test scoped to bug #647 while avoiding fragility from synthetic plan IDs. ### 2) Observation (Low): `Settings.reset()` not used in `environment.py` Acknowledged. This is a valid consistency note. For this PR, I kept scope strictly to the targeted review bug fix above and did not include collateral cleanup changes. ### 3) Observation (Low): `get_container()` call is a no-op Acknowledged. The note is correct; this is a clarity/comment cleanup item rather than a functional issue. Not changed in this PR to keep scope tight to the targeted fix. ### 4) Observation (Low): `catch_exceptions=True` may mask setup failures Acknowledged. Current assertions still fail non-`AttributeError` exceptions, but this is a fair test-diagnostics caution. No behavior change made here in order to avoid expanding scope beyond the requested fix. ### 5) Observation (Low): `glob` import vs `pathlib` convention Acknowledged as style/consistency feedback. Left unchanged in this PR because it is non-blocking and outside the immediate bug-fix scope. ## Summary - Fixed: #1 (plan record realism gap in Robot helper) - Noted for follow-up / out-of-scope in this PR: #2, #3, #4, #5

hurui200320 added a new dependency 2026-03-24 05:43:28 +00:00

#648 TDD: Write failing test for #647 — Container.resolve() crash in plan tree/explain/correct

hurui200320 approved these changes 2026-03-24 06:12:57 +00:00

Dismissed

hurui200320 left a comment

PR Review: !1053 (Ticket #648)

Verdict: ✅ Approve

All 6 acceptance criteria from ticket #648 are satisfied. All prior review findings from 4 rounds (Hamza, Freemo) have been verified as fixed in the current code. The tests correctly use real DI containers (not MagicMock) to catch the exact class of bug that existing M3 tests missed. The remaining findings are minor quality suggestions and nits — none affect correctness, completeness, or spec compliance.

Critical Issues

None.

Major Issues

None.

Minor Issues

1. Behave step leaks SQLAlchemy engine — not disposed in cleanup

File: features/steps/container_resolve_crash_steps.py:62-63, 129-143
Problem: The Given step creates uow = UnitOfWork(database_url) and seeds decisions. After init_database() disposes the initial engine, subsequent record_decision() calls recreate an engine via the engine property. Since environment.py's before_scenario sets a file-based SQLite URL, this engine is not stored in MEMORY_ENGINES and the cleanup closure never disposes it — it relies on garbage collection. The Robot helper correctly captures ctx.uow and calls ctx.uow.engine.dispose() in its cleanup.
Recommendation: Capture uow in the cleanup closure and dispose its engine explicitly, matching the Robot helper's pattern.

2. Dead step registration — @then("cr647- the command should execute successfully") never matched

File: features/steps/container_resolve_crash_steps.py:218
Problem: step_cr647_command_should_succeed is decorated with @then(...) registering it as a Behave step, but no scenario uses this step text. It's only called programmatically by the three specific Then steps. The decorator adds a dead entry to Behave's step registry.
Recommendation: Remove the @then decorator and rename to _assert_command_succeeded(context) to signal it's a helper, not a step.

3. plan correct assertions weaker than plan tree / plan explain

File: features/steps/container_resolve_crash_steps.py:276-284, robot/helper_container_resolve_crash.py:179-185
Problem: The plan correct assertion only validates decision_id in the output. By contrast, plan explain validates both decision_id AND "FastAPI". There's no command-specific content validation for plan correct (e.g., dry-run output, revert mode, guidance text).
Recommendation: Add at least one plan correct-specific assertion (e.g., check for "revert", "dry-run", or "Django" guidance text) to strengthen the regression guard.

4. get_container() call is a no-op with misleading comment

File: features/steps/container_resolve_crash_steps.py:125-126
Problem: _ = get_container() discards the return value. The comment says "Get the real container so command paths use DI wiring," but CliRunner.invoke creates its own container. The side effect (warming the singleton cache) may be meaningful, but the comment misrepresents the purpose.
Recommendation: Either remove the call, or update the comment to: # Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.

5. Settings.reset() has no test-only documentation guard

File: src/cleveragents/config/settings.py:633-640
Problem: The only production code change. Settings.reset() is public with no indication it's test-only. If production code accidentally calls it, it would wipe the singleton mid-flight. The risk is low since the codebase already has 50+ instances of Settings._instance = None in tests, and this encapsulates the pattern. But a documentation guard would be beneficial.
Recommendation: Add .. warning:: **Test use only.** Do not call in production code paths. to the docstring.

6. Missing type annotations on 4 new settings step functions

File: features/steps/settings_steps.py:613, 623, 629, 637
Problem: The new step functions lack type annotations on context and expected parameters. Per CONTRIBUTING.md, new code should have explicit types. The container_resolve_crash_steps.py file in this same PR correctly uses context: Context and -> None.
Recommendation: Add context: Context, expected: str, and -> None type hints.

7. CONTRIBUTORS.md new entry not in alphabetical order

File: CONTRIBUTORS.md:9
Problem: Aditya Chhabra is appended at the end. The existing list follows alphabetical order (Brent, Hamza, Jeffrey, Luis, Rui). "Aditya" should appear first.
Recommendation: Move to its alphabetically correct position.

8. PR title does not match commit message

PR title: test(cli): add TDD failing tests for Container.resolve() crash
Commit: test(cli): add regression-guard tests for Container.resolve() crash
Recommendation: Update the PR title to match the commit: test(cli): add regression-guard tests for Container.resolve() crash.

9. plan correct test uses undocumented --plan flag (spec drift)

Files: container_resolve_crash_steps.py:202, helper_container_resolve_crash.py:270
Problem: Both tests pass --plan to plan correct, but the spec doesn't define this flag. The code correctly includes an inline comment acknowledging this. The test is correct for the current implementation but documents a spec-vs-implementation drift.
Recommendation: File a separate issue to align the spec with the implementation (add --plan to spec, or make the CLI infer the plan from decision ID).

Nits

10. glob.glob instead of pathlib convention

File: robot/helper_container_resolve_crash.py:15, 147
Recommendation: Replace with explicit suffix iteration: for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True).

11. Robot test cases repeat identical boilerplate

File: robot/container_resolve_crash.robot:24-55
Recommendation: Consider extracting a Run And Verify Helper keyword to deduplicate the 3 test cases. Acceptable as-is for 3 tests.

12. Feature description awkward line break

File: features/container_resolve_crash.feature:7-8
Recommendation: Reflow to keep mock get_container() on the same line.

13. Redundant TYPE_CHECKING import

File: features/steps/container_resolve_crash_steps.py:23-24
Problem: Decision is imported under TYPE_CHECKING at module level but is also imported inside the function at line 47.
Recommendation: Remove the module-level TYPE_CHECKING guard since the in-function import provides the type within the relevant scope.

Summary

This is a well-crafted TDD regression test PR. The core deliverables are solid: three Behave BDD scenarios and three Robot Framework integration tests that exercise plan tree, plan explain, and plan correct through a real DI container with seeded decisions — correctly catching the class of bug that existing M3 tests missed with their MagicMock approach. All 6 acceptance criteria are met, all 4 rounds of prior review feedback have been addressed, and the Settings.reset() addition is a clean encapsulation of an existing test pattern.

The 9 minor findings are quality improvements (resource cleanup, type annotations, naming, ordering) — not correctness issues. The 4 nits are stylistic. The PR is ready to merge after addressing the minor items (or acknowledging them as follow-up work).

## PR Review: !1053 (Ticket #648) ### Verdict: ✅ Approve All 6 acceptance criteria from ticket #648 are satisfied. All prior review findings from 4 rounds (Hamza, Freemo) have been verified as fixed in the current code. The tests correctly use real DI containers (not MagicMock) to catch the exact class of bug that existing M3 tests missed. The remaining findings are minor quality suggestions and nits — none affect correctness, completeness, or spec compliance. --- ### Critical Issues None. ### Major Issues None. ### Minor Issues **1. Behave step leaks SQLAlchemy engine — not disposed in cleanup** - **File:** `features/steps/container_resolve_crash_steps.py:62-63, 129-143` - **Problem:** The Given step creates `uow = UnitOfWork(database_url)` and seeds decisions. After `init_database()` disposes the initial engine, subsequent `record_decision()` calls recreate an engine via the `engine` property. Since `environment.py`'s `before_scenario` sets a file-based SQLite URL, this engine is *not* stored in `MEMORY_ENGINES` and the cleanup closure never disposes it — it relies on garbage collection. The Robot helper correctly captures `ctx.uow` and calls `ctx.uow.engine.dispose()` in its cleanup. - **Recommendation:** Capture `uow` in the cleanup closure and dispose its engine explicitly, matching the Robot helper's pattern. **2. Dead step registration — `@then("cr647- the command should execute successfully")` never matched** - **File:** `features/steps/container_resolve_crash_steps.py:218` - **Problem:** `step_cr647_command_should_succeed` is decorated with `@then(...)` registering it as a Behave step, but no scenario uses this step text. It's only called programmatically by the three specific Then steps. The decorator adds a dead entry to Behave's step registry. - **Recommendation:** Remove the `@then` decorator and rename to `_assert_command_succeeded(context)` to signal it's a helper, not a step. **3. `plan correct` assertions weaker than `plan tree` / `plan explain`** - **File:** `features/steps/container_resolve_crash_steps.py:276-284`, `robot/helper_container_resolve_crash.py:179-185` - **Problem:** The `plan correct` assertion only validates `decision_id` in the output. By contrast, `plan explain` validates both `decision_id` AND `"FastAPI"`. There's no command-specific content validation for `plan correct` (e.g., dry-run output, revert mode, guidance text). - **Recommendation:** Add at least one `plan correct`-specific assertion (e.g., check for `"revert"`, `"dry-run"`, or `"Django"` guidance text) to strengthen the regression guard. **4. `get_container()` call is a no-op with misleading comment** - **File:** `features/steps/container_resolve_crash_steps.py:125-126` - **Problem:** `_ = get_container()` discards the return value. The comment says "Get the real container so command paths use DI wiring," but `CliRunner.invoke` creates its own container. The side effect (warming the singleton cache) may be meaningful, but the comment misrepresents the purpose. - **Recommendation:** Either remove the call, or update the comment to: `# Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.` **5. `Settings.reset()` has no test-only documentation guard** - **File:** `src/cleveragents/config/settings.py:633-640` - **Problem:** The only production code change. `Settings.reset()` is public with no indication it's test-only. If production code accidentally calls it, it would wipe the singleton mid-flight. The risk is low since the codebase already has 50+ instances of `Settings._instance = None` in tests, and this encapsulates the pattern. But a documentation guard would be beneficial. - **Recommendation:** Add `.. warning:: **Test use only.** Do not call in production code paths.` to the docstring. **6. Missing type annotations on 4 new settings step functions** - **File:** `features/steps/settings_steps.py:613, 623, 629, 637` - **Problem:** The new step functions lack type annotations on `context` and `expected` parameters. Per CONTRIBUTING.md, new code should have explicit types. The `container_resolve_crash_steps.py` file in this same PR correctly uses `context: Context` and `-> None`. - **Recommendation:** Add `context: Context`, `expected: str`, and `-> None` type hints. **7. CONTRIBUTORS.md new entry not in alphabetical order** - **File:** `CONTRIBUTORS.md:9` - **Problem:** `Aditya Chhabra` is appended at the end. The existing list follows alphabetical order (Brent, Hamza, Jeffrey, Luis, Rui). "Aditya" should appear first. - **Recommendation:** Move to its alphabetically correct position. **8. PR title does not match commit message** - **PR title:** `test(cli): add TDD failing tests for Container.resolve() crash` - **Commit:** `test(cli): add regression-guard tests for Container.resolve() crash` - **Recommendation:** Update the PR title to match the commit: `test(cli): add regression-guard tests for Container.resolve() crash`. **9. `plan correct` test uses undocumented `--plan` flag (spec drift)** - **Files:** `container_resolve_crash_steps.py:202`, `helper_container_resolve_crash.py:270` - **Problem:** Both tests pass `--plan` to `plan correct`, but the spec doesn't define this flag. The code correctly includes an inline comment acknowledging this. The test is correct for the *current implementation* but documents a spec-vs-implementation drift. - **Recommendation:** File a separate issue to align the spec with the implementation (add `--plan` to spec, or make the CLI infer the plan from decision ID). ### Nits **10. `glob.glob` instead of `pathlib` convention** - **File:** `robot/helper_container_resolve_crash.py:15, 147` - **Recommendation:** Replace with explicit suffix iteration: `for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True)`. **11. Robot test cases repeat identical boilerplate** - **File:** `robot/container_resolve_crash.robot:24-55` - **Recommendation:** Consider extracting a `Run And Verify Helper` keyword to deduplicate the 3 test cases. Acceptable as-is for 3 tests. **12. Feature description awkward line break** - **File:** `features/container_resolve_crash.feature:7-8` - **Recommendation:** Reflow to keep `mock get_container()` on the same line. **13. Redundant `TYPE_CHECKING` import** - **File:** `features/steps/container_resolve_crash_steps.py:23-24` - **Problem:** `Decision` is imported under `TYPE_CHECKING` at module level but is also imported inside the function at line 47. - **Recommendation:** Remove the module-level `TYPE_CHECKING` guard since the in-function import provides the type within the relevant scope. --- ### Summary This is a well-crafted TDD regression test PR. The core deliverables are solid: three Behave BDD scenarios and three Robot Framework integration tests that exercise `plan tree`, `plan explain`, and `plan correct` through a real DI container with seeded decisions — correctly catching the class of bug that existing M3 tests missed with their `MagicMock` approach. All 6 acceptance criteria are met, all 4 rounds of prior review feedback have been addressed, and the `Settings.reset()` addition is a clean encapsulation of an existing test pattern. The 9 minor findings are quality improvements (resource cleanup, type annotations, naming, ordering) — not correctness issues. The 4 nits are stylistic. The PR is ready to merge after addressing the minor items (or acknowledging them as follow-up work).

aditya changed title from ~~test(cli): add TDD failing tests for Container.resolve() crash~~ to test(cli): add regression-guard tests for Container.resolve() crash

2026-03-24 06:45:07 +00:00

aditya force-pushed tdd/m3-container-resolve-crash from f517dd5a94 to 36c36bc0ee

2026-03-24 08:47:08 +00:00

Compare

aditya dismissed hurui200320's review 2026-03-24 08:47:08 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

aditya commented

2026-03-24 08:47:45 +00:00

Response to @hurui200320 Review (Round 5)

Thanks for the thorough review and the approval. All 9 minor findings and 4 nits have been addressed in the amended commit. Here's the summary:

Fixed (Minor Issues)

#1 — Behave step leaks SQLAlchemy engine
Fixed. The cleanup closure now explicitly captures uow and calls uow.engine.dispose() before the MEMORY_ENGINES pop, matching the Robot helper's pattern. (container_resolve_crash_steps.py:131)

#2 — Dead @then decorator on shared assertion helper
Fixed. Removed the @then("cr647- the command should execute successfully") decorator and renamed the function to _assert_command_succeeded() to signal it's a private helper, not a step. Updated all three callers.

#3 — plan correct assertions weaker than tree/explain
Fixed. Both Behave and Robot assertions now validate command-specific content: "revert" or "dry" must appear in the output (case-insensitive), strengthening the regression guard beyond just decision-id presence. (container_resolve_crash_steps.py:286-289, helper_container_resolve_crash.py:187-192)

#4 — get_container() call is a no-op with misleading comment
Fixed. Updated the comment to: # Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database. This accurately describes the singleton-warming side effect.

#5 — Settings.reset() has no test-only documentation guard
Fixed. Added .. warning:: **Test use only.** Do not call in production code paths — resetting the singleton mid-flight can cause inconsistent configuration state. to the docstring. (settings.py:640-642)

#6 — Missing type annotations on 4 new settings step functions
Fixed. Added context: Context, expected: str, and -> None type hints to all four new step functions in settings_steps.py. Also added from __future__ import annotations and from behave.runner import Context imports.

#7 — CONTRIBUTORS.md not in alphabetical order
Fixed. Moved Aditya Chhabra to its alphabetically correct position (first) and removed a duplicate Rui Hu entry.

#9 — plan correct uses undocumented --plan flag
Acknowledged. Both the Behave step and Robot helper already had inline comments documenting this spec drift. Updated the Robot helper comment to be more explicit about the spec-vs-implementation gap. Will track separately.

Fixed (Nits)

#10 — glob.glob instead of pathlib convention
Fixed. Replaced from glob import glob + glob(f"{ctx.db_path}*") with explicit suffix iteration: for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True). (helper_container_resolve_crash.py:147-149)

#11 — Robot test cases repeat boilerplate
Acknowledged as acceptable for 3 test cases per your own recommendation. No change.

#12 — Feature description awkward line break
Fixed. Reflowed mock get_container() to keep it on the same line. (container_resolve_crash.feature:7-8)

#13 — Redundant TYPE_CHECKING import for Decision
Fixed. Removed the module-level TYPE_CHECKING guard and Decision import. Added Decision to the in-function import from cleveragents.domain.models.core.decision where it's actually used.

Quality Gates

nox -e lint: PASS
nox -e typecheck: PASS (0 errors)
nox -e unit_tests: All PR-related features pass (3 pre-existing failures in unrelated checkpoint/git_tools features)
nox -e integration_tests: Container Resolve Crash robot test passes (17 pre-existing failures in unrelated features)
nox -e coverage_report: 98.22% (above 97% threshold)

PR title also updated to match commit message.
Closing the pr, all the issues resolved successfully, the pr has been approved by both @hamza.khyari and @hurui200320 .

## Response to @hurui200320 Review (Round 5) Thanks for the thorough review and the approval. All 9 minor findings and 4 nits have been addressed in the amended commit. Here's the summary: ### Fixed (Minor Issues) **#1 — Behave step leaks SQLAlchemy engine** Fixed. The cleanup closure now explicitly captures `uow` and calls `uow.engine.dispose()` before the `MEMORY_ENGINES` pop, matching the Robot helper's pattern. (`container_resolve_crash_steps.py:131`) **#2 — Dead `@then` decorator on shared assertion helper** Fixed. Removed the `@then("cr647- the command should execute successfully")` decorator and renamed the function to `_assert_command_succeeded()` to signal it's a private helper, not a step. Updated all three callers. **#3 — `plan correct` assertions weaker than tree/explain** Fixed. Both Behave and Robot assertions now validate command-specific content: `"revert"` or `"dry"` must appear in the output (case-insensitive), strengthening the regression guard beyond just decision-id presence. (`container_resolve_crash_steps.py:286-289`, `helper_container_resolve_crash.py:187-192`) **#4 — `get_container()` call is a no-op with misleading comment** Fixed. Updated the comment to: `# Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.` This accurately describes the singleton-warming side effect. **#5 — `Settings.reset()` has no test-only documentation guard** Fixed. Added `.. warning:: **Test use only.** Do not call in production code paths — resetting the singleton mid-flight can cause inconsistent configuration state.` to the docstring. (`settings.py:640-642`) **#6 — Missing type annotations on 4 new settings step functions** Fixed. Added `context: Context`, `expected: str`, and `-> None` type hints to all four new step functions in `settings_steps.py`. Also added `from __future__ import annotations` and `from behave.runner import Context` imports. **#7 — CONTRIBUTORS.md not in alphabetical order** Fixed. Moved `Aditya Chhabra` to its alphabetically correct position (first) and removed a duplicate `Rui Hu` entry. **#9 — `plan correct` uses undocumented `--plan` flag** Acknowledged. Both the Behave step and Robot helper already had inline comments documenting this spec drift. Updated the Robot helper comment to be more explicit about the spec-vs-implementation gap. Will track separately. ### Fixed (Nits) **#10 — `glob.glob` instead of `pathlib` convention** Fixed. Replaced `from glob import glob` + `glob(f"{ctx.db_path}*")` with explicit suffix iteration: `for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True)`. (`helper_container_resolve_crash.py:147-149`) **#11 — Robot test cases repeat boilerplate** Acknowledged as acceptable for 3 test cases per your own recommendation. No change. **#12 — Feature description awkward line break** Fixed. Reflowed `mock get_container()` to keep it on the same line. (`container_resolve_crash.feature:7-8`) **#13 — Redundant `TYPE_CHECKING` import for `Decision`** Fixed. Removed the module-level `TYPE_CHECKING` guard and `Decision` import. Added `Decision` to the in-function import from `cleveragents.domain.models.core.decision` where it's actually used. ### Quality Gates - `nox -e lint`: PASS - `nox -e typecheck`: PASS (0 errors) - `nox -e unit_tests`: All PR-related features pass (3 pre-existing failures in unrelated checkpoint/git_tools features) - `nox -e integration_tests`: Container Resolve Crash robot test passes (17 pre-existing failures in unrelated features) - `nox -e coverage_report`: 98.22% (above 97% threshold) PR title also updated to match commit message. Closing the pr, all the issues resolved successfully, the pr has been approved by both @hamza.khyari and @hurui200320 .

aditya scheduled this pull request to auto merge when all checks succeed 2026-03-24 09:21:13 +00:00

aditya canceled auto merging this pull request when all checks succeed 2026-03-24 09:21:40 +00:00

aditya merged commit a854de7e1f into master

2026-03-24 09:25:49 +00:00

aditya referenced this issue from a commit

2026-03-24 09:25:51 +00:00

Merge pull request 'test(cli): add regression-guard tests for Container.resolve() crash' (#1053) from tdd/m3-container-resolve-crash into master

~~hurui200320 referenced this pull request 2026-03-24 12:16:49 +00:00~~

test(e2e): workflow example 16 — devcontainer-driven development (supervised profile) #818

hurui200320 referenced this pull request

2026-03-24 12:17:22 +00:00

test(e2e): workflow example 16 — devcontainer-driven development (supervised profile) #762

Sign in to join this conversation.

4 Participants

Notifications

Due Date

No due date set.

Blocks

#648 TDD: Write failing test for #647 — Container.resolve() crash in plan tree/explain/correct

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#1053

test(cli): add regression-guard tests for Container.resolve() crash #1053

This PR supersedes #670, which was raised from the older branch

Description

Type of Change

Quality Checklist

Testing

Test Commands Run

Related Issues

Implementation Notes

Reviewer Clarifications (Hamza)

Code Review — PR #1053

Verified Correct

Findings

Round 2 Follow-Up — 2 new Minor findings

New Findings

SPEC-1 — Commit message mentions @tdd_expected_fail

SPEC-2 — Branch name divergence with issue metadata

CODE-2 — features/devcontainer_cleanup.feature out-of-scope assertion text change

TEST-1, TEST-2, ENV-1, ENV-2 have been fixed and code pushed.

Round 3 — Scope Audit + Bug Hunt

BUG-1 (Major): Explain assertion checks the wrong decision's content

SCOPE-1 (Major): 3 of 10 files are out of scope for issue #648

Verified Correct (no bugs):

Code Review — PR #1053 test(cli): TDD failing tests for Container.resolve() crash

PR #1053 — Review Response (Hamza Round 3 + Freemo Assessment)

Freemo Action Items — All Resolved

1. BUG-1 (Major) — Explain assertion checks wrong decision content

2. SCOPE-1 / CODE-2 — Remove features/devcontainer_cleanup.feature

3. CODE-3 — Remove sys.path insertion from settings_steps.py

4. SPEC-1 — Commit message body reflects stale @tdd_expected_fail semantics

Previously Addressed Items (Rounds 1–3)

Freemo Rulings Acknowledged

Final PR State

Summary

Comment Response

@hamza.khyari (Rounds 1-3)

@aditya (Response)

@freemo (Day 42)

Current Blocking Items

Action Required

Code Review — PR #1053 test(cli): add TDD failing tests for Container.resolve() crash

Findings

1. Bug (Medium): Robot helper seeds decisions without creating a plan record

2. Observation (Low): Settings.reset() not used in environment.py

3. Observation (Low): get_container() call is a no-op

4. Observation (Low): catch_exceptions=True masks setup failures

5. Observation (Low): glob import vs pathlib convention

Summary

Response to findings

1) Bug (Medium): Robot helper seeds decisions without creating a plan record

2) Observation (Low): Settings.reset() not used in environment.py

3) Observation (Low): get_container() call is a no-op

4) Observation (Low): catch_exceptions=True may mask setup failures

5) Observation (Low): glob import vs pathlib convention

Summary

PR Review: !1053 (Ticket #648)

Verdict: ✅ Approve

Critical Issues

Major Issues

Minor Issues

Nits

Summary

Response to @hurui200320 Review (Round 5)

Fixed (Minor Issues)

Fixed (Nits)

Quality Gates

SPEC-1 — Commit message mentions `@tdd_expected_fail`

CODE-2 — `features/devcontainer_cleanup.feature` out-of-scope assertion text change

Code Review — PR #1053 `test(cli): TDD failing tests for Container.resolve() crash`

2. SCOPE-1 / CODE-2 — Remove `features/devcontainer_cleanup.feature`

3. CODE-3 — Remove `sys.path` insertion from `settings_steps.py`

4. SPEC-1 — Commit message body reflects stale `@tdd_expected_fail` semantics

Code Review — PR #1053 `test(cli): add TDD failing tests for Container.resolve() crash`

2. Observation (Low): `Settings.reset()` not used in `environment.py`

3. Observation (Low): `get_container()` call is a no-op

4. Observation (Low): `catch_exceptions=True` masks setup failures

5. Observation (Low): `glob` import vs `pathlib` convention

2) Observation (Low): `Settings.reset()` not used in `environment.py`

3) Observation (Low): `get_container()` call is a no-op

4) Observation (Low): `catch_exceptions=True` may mask setup failures

5) Observation (Low): `glob` import vs `pathlib` convention