test(cli): add regression-guard tests for Container.resolve() crash #1053

Merged
aditya merged 1 commit from tdd/m3-container-resolve-crash into master 2026-03-24 09:25:49 +00:00
Member

This PR supersedes #670, which was raised from the older branch

tdd/container-resolve-crash. It incorporates the review fixes from #670
for the #648 scope while using the corrected branch naming flow.

Description

Adds and hardens regression coverage for bug #647 across both Behave and Robot
frameworks using real DI/container wiring. Includes review-driven improvements
to assertion strength, cleanup robustness, and wording consistency for post-fix
regression-guard semantics.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (no functional changes)
  • Documentation update
  • Test improvements
  • CI/CD changes

Quality Checklist

  • Code follows the project's coding standards (see CONTRIBUTING.md)
  • All public/protected methods have argument validation
  • Static typing is complete (no Any unless justified)
  • nox -s typecheck passes with no errors
  • nox -s lint passes with no errors
  • Unit tests written/updated (Behave scenarios in features/)
  • Integration tests written/updated (Robot suites in robot/) if applicable
  • Coverage remains above 85% (nox -s coverage_report)
  • No security issues introduced (nox -s security_scan)
  • No dead code introduced (nox -s dead_code)
  • Documentation updated if behavior changed

Testing

Test Commands Run

ALL nox test passes

Closes #648

Implementation Notes

  • This PR supersedes the prior naming mismatch by using tdd/m3-container-resolve-crash.
  • It preserves only #648-scope changes and excludes collateral stabilization files from the old PR #670 path.
  • plan correct test invocation retains explicit --plan with inline comment because current implementation requires active-plan context in helper subprocess flows.

Reviewer Clarifications (Hamza)

  • SPEC-1 (commit body wording): Current code state is regression-guard (no @tdd_expected_fail). The commit text came from pre-supersede transition context; this PR description reflects the final post-fix semantics.
  • SPEC-2 (branch divergence): Branch naming divergence has been handled through the replacement branch/PR flow and tracked in GitLab issue #648.
  • CODE-2 (devcontainer assertion text): The wording update ("not a devcontainer-instance") is a test-alignment stabilization to canonical CLI output after rebase; it does not change product behavior.
## This PR supersedes #670, which was raised from the older branch `tdd/container-resolve-crash`. It incorporates the review fixes from #670 for the #648 scope while using the corrected branch naming flow. ## Description Adds and hardens regression coverage for bug #647 across both Behave and Robot frameworks using real DI/container wiring. Includes review-driven improvements to assertion strength, cleanup robustness, and wording consistency for post-fix regression-guard semantics. ## Type of Change - [ ] Bug fix (non-breaking change which fixes an issue) - [ ] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] Refactoring (no functional changes) - [ ] Documentation update - [x] Test improvements - [ ] CI/CD changes ## Quality Checklist - [x] Code follows the project's coding standards (see CONTRIBUTING.md) - [x] All public/protected methods have argument validation - [x] Static typing is complete (no `Any` unless justified) - [x] `nox -s typecheck` passes with no errors - [x] `nox -s lint` passes with no errors - [x] Unit tests written/updated (Behave scenarios in `features/`) - [x] Integration tests written/updated (Robot suites in `robot/`) if applicable - [x] Coverage remains above 85% (`nox -s coverage_report`) - [x] No security issues introduced (`nox -s security_scan`) - [x] No dead code introduced (`nox -s dead_code`) - [x] Documentation updated if behavior changed ## Testing ### Test Commands Run ```bash ALL nox test passes ``` ## Related Issues Closes #648 ## Implementation Notes - This PR supersedes the prior naming mismatch by using `tdd/m3-container-resolve-crash`. - It preserves only `#648`-scope changes and excludes collateral stabilization files from the old PR #670 path. - `plan correct` test invocation retains explicit `--plan` with inline comment because current implementation requires active-plan context in helper subprocess flows. ## Reviewer Clarifications (Hamza) - **SPEC-1 (commit body wording):** Current code state is regression-guard (no `@tdd_expected_fail`). The commit text came from pre-supersede transition context; this PR description reflects the final post-fix semantics. - **SPEC-2 (branch divergence):** Branch naming divergence has been handled through the replacement branch/PR flow and tracked in GitLab issue `#648`. - **CODE-2 (devcontainer assertion text):** The wording update (`"not a devcontainer-instance"`) is a test-alignment stabilization to canonical CLI output after rebase; it does not change product behavior.
aditya added this to the v3.2.0 milestone 2026-03-18 11:14:07 +00:00
aditya self-assigned this 2026-03-18 11:19:20 +00:00
Member

Code Review — PR #1053

Scope: 10 files, +711/−2 (single commit). Lint PASS, Typecheck PASS.

Verified Correct

  • Bug #647 is confirmed fixed on master (all three call sites use container.decision_service())
  • @tdd_expected_fail correctly omitted — tests run as regression guards
  • Real DI container with seeded decisions, not MagicMock — this catches the exact class of bug that M3 tests missed
  • Positive content assertions per command (decision ID in output), not just crash-absence checks
  • Cleanup is thorough: MEMORY_ENGINES.pop(), reset_container(), env var removal, Settings.reset()
  • cr647- step prefix prevents Behave AmbiguousStep collisions
  • Settings.reset() addition is consistent with reset_container() pattern and replaces brittle direct _instance = None writes

Findings

ID Severity Category File Finding Recommendation
SPEC-1 Major PROCESS Commit message Commit body says "Tests tagged @tdd_expected_fail" but the feature file correctly omits it (bug is fixed). Commit message describes the pre-fix state, not the actual code. Amend commit message body to reflect regression-guard semantics, not expected-fail semantics.
SPEC-2 Major PROCESS Issue #648 Metadata Issue specifies Branch: tdd/container-resolve-crash. Actual branch is tdd/m3-container-resolve-crash. PR body explains the supersede but Metadata is stale. Update issue #648 Metadata Branch field to match, or rename branch.
TEST-1 Minor TEST settings_configuration.feature:172 Given used after When in the action phase (Given the environment variable "CLEVERAGENTS_ENV" is set to "staging"). Semantically should be And. Change Given to And for correct Gherkin structure.
TEST-2 Minor TEST container_resolve_crash_steps.py:247-262 Three Then steps delegate to a shared function that branches on context.cr647_command. Functional but couples step identity to When-set state. Acceptable for this scope. Noting the coupling.
CODE-2 Nit PROCESS devcontainer_cleanup.feature:69 Out-of-scope change: assertion text "rebuild requires devcontainer""not a devcontainer-instance". Unrelated to #648. Remove from this PR or document separately.
CODE-3 Nit CODE settings_steps.py:6-9 Adds sys.path.insert(0, str(_SRC)) — unnecessary since Behave configures src/ on the path via environment.py. Inconsistent with other step files. Remove the sys.path insertion.
## Code Review — PR #1053 **Scope**: 10 files, +711/−2 (single commit). Lint PASS, Typecheck PASS. ### Verified Correct - Bug #647 is confirmed fixed on master (all three call sites use `container.decision_service()`) - `@tdd_expected_fail` correctly omitted — tests run as regression guards - Real DI container with seeded decisions, not MagicMock — this catches the exact class of bug that M3 tests missed - Positive content assertions per command (decision ID in output), not just crash-absence checks - Cleanup is thorough: `MEMORY_ENGINES.pop()`, `reset_container()`, env var removal, `Settings.reset()` - `cr647-` step prefix prevents Behave `AmbiguousStep` collisions - `Settings.reset()` addition is consistent with `reset_container()` pattern and replaces brittle direct `_instance = None` writes ### Findings | ID | Severity | Category | File | Finding | Recommendation | |---|---|---|---|---|---| | SPEC-1 | Major | PROCESS | Commit message | Commit body says *"Tests tagged @tdd_expected_fail"* but the feature file correctly omits it (bug is fixed). Commit message describes the pre-fix state, not the actual code. | Amend commit message body to reflect regression-guard semantics, not expected-fail semantics. | | SPEC-2 | Major | PROCESS | Issue #648 Metadata | Issue specifies `Branch: tdd/container-resolve-crash`. Actual branch is `tdd/m3-container-resolve-crash`. PR body explains the supersede but Metadata is stale. | Update issue #648 Metadata Branch field to match, or rename branch. | | TEST-1 | Minor | TEST | `settings_configuration.feature:172` | `Given` used after `When` in the action phase (`Given the environment variable "CLEVERAGENTS_ENV" is set to "staging"`). Semantically should be `And`. | Change `Given` to `And` for correct Gherkin structure. | | TEST-2 | Minor | TEST | `container_resolve_crash_steps.py:247-262` | Three `Then` steps delegate to a shared function that branches on `context.cr647_command`. Functional but couples step identity to `When`-set state. | Acceptable for this scope. Noting the coupling. | | CODE-2 | Nit | PROCESS | `devcontainer_cleanup.feature:69` | Out-of-scope change: assertion text `"rebuild requires devcontainer"` → `"not a devcontainer-instance"`. Unrelated to #648. | Remove from this PR or document separately. | | CODE-3 | Nit | CODE | `settings_steps.py:6-9` | Adds `sys.path.insert(0, str(_SRC))` — unnecessary since Behave configures `src/` on the path via `environment.py`. Inconsistent with other step files. | Remove the `sys.path` insertion. |
Member

Round 2 Follow-Up — 2 new Minor findings

Deep second pass verified:

  • DecisionService seeding is correct: All parameter types match actual signatures. ContextSnapshot passed as instance (not dict). ResourceRef.resource_id validates non-empty only (no ULID pattern), so str(ULID()) is fine. confidence_score 0.95/0.90 within documented 0.0–1.0 range.
  • Robot helper uses file-based SQLite (not in-memory), avoiding MEMORY_ENGINES caching hazards. Cleanup properly deletes DB + WAL/SHM via glob.
  • Robot helper run_cli() explicitly passes env_extra, so subprocess env isolation is correct.
  • Three-way exit code protocol (0=pass, 1=regression, 2=infra failure) is clean and Robot test checks rc == 2 as a distinct failure mode.
  • Double-dispose of MEMORY_ENGINES (step cleanup + after_scenario) is safe — SQLAlchemy dispose() is idempotent.

New Findings

ID Severity Category File Finding Recommendation
ENV-1 Minor TEST container_resolve_crash_steps.py:54 Background step overwrites before_scenario's per-scenario temp DB URL with sqlite:///:memory:. The temp file is never created; after_scenario tries to unlink it but suppress(OSError) silently handles the miss. Functional but messy. Tag feature @mock_only to skip per-scenario DB setup, or set the in-memory URL via context instead of os.environ.
ENV-2 Minor TEST container_resolve_crash_steps.py:56-57 reset_container() discards the mock AI provider override set by before_scenario. Real providers become active. Harmless here because tree/explain only read DB data, and correct uses --dry-run. But removing --dry-run would hit real LLM endpoints. Add comment in feature file noting --dry-run is required to prevent LLM access since mock AI is bypassed by the real-container setup.

Updated totals (Round 1 + Round 2):

ID Severity Category Finding
SPEC-1 Major PROCESS Commit message says @tdd_expected_fail but code omits it — message is stale
SPEC-2 Major PROCESS Branch name diverges from issue #648 Metadata
TEST-1 Minor TEST Given used after When in settings scenario
TEST-2 Minor TEST Then steps delegate to shared function, coupling to When-set state
ENV-1 Minor TEST Background overwrites before_scenario DB URL, orphaning temp file
ENV-2 Minor TEST reset_container() discards mock AI; test implicitly depends on --dry-run
CODE-2 Nit PROCESS Out-of-scope devcontainer assertion text change
CODE-3 Nit CODE Unnecessary sys.path insertion in settings_steps.py
## Round 2 Follow-Up — 2 new Minor findings Deep second pass verified: - **DecisionService seeding is correct**: All parameter types match actual signatures. `ContextSnapshot` passed as instance (not dict). `ResourceRef.resource_id` validates non-empty only (no ULID pattern), so `str(ULID())` is fine. `confidence_score` 0.95/0.90 within documented 0.0–1.0 range. - **Robot helper uses file-based SQLite** (not in-memory), avoiding `MEMORY_ENGINES` caching hazards. Cleanup properly deletes DB + WAL/SHM via glob. - **Robot helper `run_cli()` explicitly passes `env_extra`**, so subprocess env isolation is correct. - **Three-way exit code protocol** (0=pass, 1=regression, 2=infra failure) is clean and Robot test checks `rc == 2` as a distinct failure mode. - **Double-dispose of `MEMORY_ENGINES`** (step cleanup + `after_scenario`) is safe — SQLAlchemy `dispose()` is idempotent. ### New Findings | ID | Severity | Category | File | Finding | Recommendation | |---|---|---|---|---|---| | ENV-1 | Minor | TEST | `container_resolve_crash_steps.py:54` | Background step overwrites `before_scenario`'s per-scenario temp DB URL with `sqlite:///:memory:`. The temp file is never created; `after_scenario` tries to unlink it but `suppress(OSError)` silently handles the miss. Functional but messy. | Tag feature `@mock_only` to skip per-scenario DB setup, or set the in-memory URL via `context` instead of `os.environ`. | | ENV-2 | Minor | TEST | `container_resolve_crash_steps.py:56-57` | `reset_container()` discards the mock AI provider override set by `before_scenario`. Real providers become active. Harmless here because `tree`/`explain` only read DB data, and `correct` uses `--dry-run`. But removing `--dry-run` would hit real LLM endpoints. | Add comment in feature file noting `--dry-run` is required to prevent LLM access since mock AI is bypassed by the real-container setup. | --- **Updated totals (Round 1 + Round 2):** | ID | Severity | Category | Finding | |---|---|---|---| | SPEC-1 | Major | PROCESS | Commit message says `@tdd_expected_fail` but code omits it — message is stale | | SPEC-2 | Major | PROCESS | Branch name diverges from issue #648 Metadata | | TEST-1 | Minor | TEST | `Given` used after `When` in settings scenario | | TEST-2 | Minor | TEST | `Then` steps delegate to shared function, coupling to `When`-set state | | ENV-1 | Minor | TEST | Background overwrites `before_scenario` DB URL, orphaning temp file | | ENV-2 | Minor | TEST | `reset_container()` discards mock AI; test implicitly depends on `--dry-run` | | CODE-2 | Nit | PROCESS | Out-of-scope devcontainer assertion text change | | CODE-3 | Nit | CODE | Unnecessary `sys.path` insertion in settings_steps.py |
hamza.khyari requested changes 2026-03-18 13:28:53 +00:00
Dismissed
hamza.khyari left a comment

check comments

check comments
Author
Member

Hi Hamza, thanks for the detailed review. Please find responses for the three process/scope items below.

SPEC-1 — Commit message mentions @tdd_expected_fail

What we did:

  • Updated PR description and review notes to explicitly reflect the final, correct behavior:
    • @tdd_expected_fail removed
    • tests now run as normal regression guards

SPEC-2 — Branch name divergence with issue metadata

Status: Resolved in process flow.
Justification / action taken:

  • We moved from tdd/container-resolve-crash to tdd/m3-container-resolve-crash and raised the replacement PR to align with naming standards.
  • Branch-name divergence has been tracked/updated under GitLab issue 648 as requested.

CODE-2 — features/devcontainer_cleanup.feature out-of-scope assertion text change

Justification:
This was a test-stabilization compatibility fix, not a behavior change. After rebase, the CLI error output had standardized wording (not a devcontainer-instance), while the old assertion expected legacy text (rebuild requires devcontainer).
Keeping the old assertion caused a false negative unrelated to #648 logic, so the assertion string was aligned to canonical CLI output to keep test signal accurate.

  • Scope impact: test assertion text only
  • Runtime/product behavior impact: none

TEST-1, TEST-2, ENV-1, ENV-2 have been fixed and code pushed.

Hi Hamza, thanks for the detailed review. Please find responses for the three process/scope items below. ### SPEC-1 — Commit message mentions `@tdd_expected_fail` **What we did:** - Updated PR description and review notes to explicitly reflect the final, correct behavior: - `@tdd_expected_fail` removed - tests now run as normal regression guards --- ### SPEC-2 — Branch name divergence with issue metadata **Status:** Resolved in process flow. **Justification / action taken:** - We moved from `tdd/container-resolve-crash` to `tdd/m3-container-resolve-crash` and raised the replacement PR to align with naming standards. - Branch-name divergence has been tracked/updated under **GitLab issue 648** as requested. --- ### CODE-2 — `features/devcontainer_cleanup.feature` out-of-scope assertion text change **Justification:** This was a **test-stabilization compatibility fix**, not a behavior change. After rebase, the CLI error output had standardized wording (`not a devcontainer-instance`), while the old assertion expected legacy text (`rebuild requires devcontainer`). Keeping the old assertion caused a false negative unrelated to #648 logic, so the assertion string was aligned to canonical CLI output to keep test signal accurate. - Scope impact: test assertion text only - Runtime/product behavior impact: none ### TEST-1, TEST-2, ENV-1, ENV-2 have been fixed and code pushed.
Member

Round 3 — Scope Audit + Bug Hunt

BUG-1 (Major): Explain assertion checks the wrong decision's content

Behave: container_resolve_crash_steps.py:238-241
Robot: helper_container_resolve_crash.py:170-172

plan explain is invoked with the child decision (chosen_option = "FastAPI"), but the assertion checks for the root decision's content:

# Behave
assert "A REST API" in (result.output or "") or '"chosen"' in (result.output or "")

# Robot helper
if "A REST API" not in output and '"chosen"' not in output:

The child's chosen_option is "FastAPI", not "A REST API". The assertion passes by accident because the fallback '"chosen"' matches the JSON key "chosen_option" in the output. If the JSON format changes to not include the substring "chosen", both assertions silently break.

Fix: Replace "A REST API" with "FastAPI" in both the Behave step and Robot helper.


SCOPE-1 (Major): 3 of 10 files are out of scope for issue #648

Per CONTRIBUTING.md Rule 1 (Atomicity) and Rule 3 (Single Responsibility):

File In Scope? Rationale
features/container_resolve_crash.feature Yes Core deliverable
features/steps/container_resolve_crash_steps.py Yes Core deliverable
robot/container_resolve_crash.robot Yes Core deliverable
robot/helper_container_resolve_crash.py Yes Core deliverable
CHANGELOG.md Yes Merge checklist requirement
CONTRIBUTORS.md Yes New author
src/cleveragents/config/settings.py Borderline Adds Settings.reset() — new production public API used by test cleanup
features/settings_configuration.feature No Tests Settings.reset() — not in #648 AC
features/steps/settings_steps.py No 4 new step defs + sys.path insertion for above
features/devcontainer_cleanup.feature No Assertion text change completely unrelated to #648

Per CONTRIBUTING.md, this commit does "Add TDD tests for #647 and add Settings.reset() method and add settings reset Behave test and fix devcontainer assertion text" — four concerns that should be three separate issues.

Recommendation: Remove the three out-of-scope files from this commit. If Settings.reset() and its test are needed, file a separate issue. The devcontainer assertion fix needs its own issue.


Verified Correct (no bugs):

  • MEMORY_ENGINES cache ensures in-process CliRunner tests share the seeded in-memory DB — traced full chain from get_container()providers.Factory(DecisionService)providers.Factory(UnitOfWork)providers.Callable(get_database_url)os.environMEMORY_ENGINES cache hit
  • Robot helper file-based SQLite — parent seeds and commits, child subprocess reads same file via env_extra
  • Cleanup orderingcontext.add_cleanup() runs before after_scenario; double-dispose of MEMORY_ENGINES is idempotent
  • catch_exceptions=True correctly captures exceptions for assertion in Then steps
  • Robot inner timeout (25s) < outer timeout (30s) — helper has time to clean up

Updated totals (Round 1 + Round 2 + Round 3):

ID Severity Category Finding
BUG-1 Major BUG Explain assertion checks root's "A REST API" instead of child's "FastAPI" — passes by accident
SCOPE-1 Major PROCESS 3 files out of scope (settings test, settings steps, devcontainer fix)
SPEC-1 Major PROCESS Commit message says @tdd_expected_fail but code omits it — message is stale
SPEC-2 Major PROCESS Branch name diverges from issue #648 Metadata
TEST-1 Minor TEST Given used after When in settings scenario
TEST-2 Minor TEST Then steps delegate to shared function, coupling to When-set state
ENV-1 Minor TEST Background overwrites before_scenario DB URL, orphaning temp file
ENV-2 Minor TEST reset_container() discards mock AI; test implicitly depends on --dry-run
CODE-2 Nit PROCESS Out-of-scope devcontainer assertion text change
CODE-3 Nit CODE Unnecessary sys.path insertion in settings_steps.py
## Round 3 — Scope Audit + Bug Hunt ### BUG-1 (Major): Explain assertion checks the wrong decision's content **Behave:** `container_resolve_crash_steps.py:238-241` **Robot:** `helper_container_resolve_crash.py:170-172` `plan explain` is invoked with the **child** decision (`chosen_option = "FastAPI"`), but the assertion checks for the **root** decision's content: ```python # Behave assert "A REST API" in (result.output or "") or '"chosen"' in (result.output or "") # Robot helper if "A REST API" not in output and '"chosen"' not in output: ``` The child's `chosen_option` is `"FastAPI"`, not `"A REST API"`. The assertion **passes by accident** because the fallback `'"chosen"'` matches the JSON key `"chosen_option"` in the output. If the JSON format changes to not include the substring `"chosen"`, both assertions silently break. **Fix:** Replace `"A REST API"` with `"FastAPI"` in both the Behave step and Robot helper. --- ### SCOPE-1 (Major): 3 of 10 files are out of scope for issue #648 Per CONTRIBUTING.md Rule 1 (Atomicity) and Rule 3 (Single Responsibility): | File | In Scope? | Rationale | |---|---|---| | `features/container_resolve_crash.feature` | Yes | Core deliverable | | `features/steps/container_resolve_crash_steps.py` | Yes | Core deliverable | | `robot/container_resolve_crash.robot` | Yes | Core deliverable | | `robot/helper_container_resolve_crash.py` | Yes | Core deliverable | | `CHANGELOG.md` | Yes | Merge checklist requirement | | `CONTRIBUTORS.md` | Yes | New author | | `src/cleveragents/config/settings.py` | Borderline | Adds `Settings.reset()` — new production public API used by test cleanup | | **`features/settings_configuration.feature`** | **No** | Tests `Settings.reset()` — not in #648 AC | | **`features/steps/settings_steps.py`** | **No** | 4 new step defs + `sys.path` insertion for above | | **`features/devcontainer_cleanup.feature`** | **No** | Assertion text change completely unrelated to #648 | Per CONTRIBUTING.md, this commit does "Add TDD tests for #647 **and** add `Settings.reset()` method **and** add settings reset Behave test **and** fix devcontainer assertion text" — four concerns that should be three separate issues. **Recommendation:** Remove the three out-of-scope files from this commit. If `Settings.reset()` and its test are needed, file a separate issue. The devcontainer assertion fix needs its own issue. --- ### Verified Correct (no bugs): - **MEMORY_ENGINES cache** ensures in-process CliRunner tests share the seeded in-memory DB — traced full chain from `get_container()` → `providers.Factory(DecisionService)` → `providers.Factory(UnitOfWork)` → `providers.Callable(get_database_url)` → `os.environ` → `MEMORY_ENGINES` cache hit - **Robot helper file-based SQLite** — parent seeds and commits, child subprocess reads same file via `env_extra` - **Cleanup ordering** — `context.add_cleanup()` runs before `after_scenario`; double-dispose of `MEMORY_ENGINES` is idempotent - **`catch_exceptions=True`** correctly captures exceptions for assertion in Then steps - **Robot inner timeout (25s) < outer timeout (30s)** — helper has time to clean up --- **Updated totals (Round 1 + Round 2 + Round 3):** | ID | Severity | Category | Finding | |---|---|---|---| | BUG-1 | Major | BUG | Explain assertion checks root's `"A REST API"` instead of child's `"FastAPI"` — passes by accident | | SCOPE-1 | Major | PROCESS | 3 files out of scope (settings test, settings steps, devcontainer fix) | | SPEC-1 | Major | PROCESS | Commit message says `@tdd_expected_fail` but code omits it — message is stale | | SPEC-2 | Major | PROCESS | Branch name diverges from issue #648 Metadata | | TEST-1 | Minor | TEST | `Given` used after `When` in settings scenario | | TEST-2 | Minor | TEST | `Then` steps delegate to shared function, coupling to `When`-set state | | ENV-1 | Minor | TEST | Background overwrites `before_scenario` DB URL, orphaning temp file | | ENV-2 | Minor | TEST | `reset_container()` discards mock AI; test implicitly depends on `--dry-run` | | CODE-2 | Nit | PROCESS | Out-of-scope devcontainer assertion text change | | CODE-3 | Nit | CODE | Unnecessary `sys.path` insertion in settings_steps.py |
freemo approved these changes 2026-03-19 04:53:51 +00:00
Dismissed
freemo left a comment

Code Review — PR #1053 test(cli): TDD failing tests for Container.resolve() crash

Good work using a real DI container instead of MagicMock — this is the key insight, since MagicMock auto-creates attributes and would have hidden the resolve() crash. The three scenarios covering plan tree, plan explain, and plan correct provide thorough regression coverage.

Approved with notes:

  1. Production code in a test PRSettings.reset() classmethod added to src/cleveragents/config/settings.py is production code. While small and well-scoped, a test-only PR ideally shouldn't modify production source. This is documented as "test-alignment stabilization" which is reasonable, but worth noting for future reference.

  2. Scope creep — The devcontainer assertion text change ("rebuild requires devcontainer""not a devcontainer-instance") and the CONTRIBUTORS.md update are collateral changes. The PR body explains these as rebase-related stabilizations, which is acceptable.

  3. Issue #648 acceptance criteria — The issue AC specifies @tdd_expected_fail tags, but the PR correctly omits them since bug #647 appears to be already fixed. Consider updating the issue to document this decision so it's not flagged by future reviewers.

  4. Existing comments addressed — Good to see the PR went through prior review (5 comments). The supersession of PR #670 is clearly documented.

## Code Review — PR #1053 `test(cli): TDD failing tests for Container.resolve() crash` Good work using a real DI container instead of `MagicMock` — this is the key insight, since `MagicMock` auto-creates attributes and would have hidden the `resolve()` crash. The three scenarios covering `plan tree`, `plan explain`, and `plan correct` provide thorough regression coverage. **Approved** with notes: 1. **Production code in a test PR** — `Settings.reset()` classmethod added to `src/cleveragents/config/settings.py` is production code. While small and well-scoped, a test-only PR ideally shouldn't modify production source. This is documented as "test-alignment stabilization" which is reasonable, but worth noting for future reference. 2. **Scope creep** — The devcontainer assertion text change (`"rebuild requires devcontainer"` → `"not a devcontainer-instance"`) and the CONTRIBUTORS.md update are collateral changes. The PR body explains these as rebase-related stabilizations, which is acceptable. 3. **Issue #648 acceptance criteria** — The issue AC specifies `@tdd_expected_fail` tags, but the PR correctly omits them since bug #647 appears to be already fixed. Consider updating the issue to document this decision so it's not flagged by future reviewers. 4. **Existing comments addressed** — Good to see the PR went through prior review (5 comments). The supersession of PR #670 is clearly documented.
Owner

Planning Agent — PR #1053 Review Assessment

@hamza.khyari — Excellent review work across 3 rounds. Your thoroughness here is exactly what this project needs. Let me address the findings:

BUG-1 (Major): Agreed — the explain assertion checking "A REST API" instead of "FastAPI" is a real bug in the test. It passes by accident via the '"chosen"' fallback, which is fragile. @aditya must fix this before merge. The assertion should check for "FastAPI" to match the child decision's chosen_option.

SCOPE-1 (Major): I agree with the finding that 3 files are out of scope. Per CONTRIBUTING.md Section "Atomic Commits" — the Settings.reset() addition, its test, and the devcontainer assertion fix are separate concerns. However, given that this PR has been in review for an extended period and the Settings.reset() is a small utility method used by test cleanup, I'm ruling as follows:

  • The devcontainer assertion text change (features/devcontainer_cleanup.feature) must be removed from this PR — it is unrelated to #648.
  • The Settings.reset() method and its Behave test can remain in this PR since the test cleanup in the core deliverable depends on it. File a follow-up issue to document Settings.reset() as a public API addition.
  • The sys.path insertion (CODE-3) must be removed — it is unnecessary per environment.py configuration.

SPEC-1: @aditya has addressed this in the PR description. The commit message body should be amended to accurately reflect regression-guard semantics.

SPEC-2: Addressed — branch name updated.

TEST-1, ENV-1, ENV-2: @aditya reports these are fixed.

Action required for merge:

  1. Fix BUG-1 (explain assertion)
  2. Remove features/devcontainer_cleanup.feature change
  3. Remove sys.path insertion from settings_steps.py
  4. Amend commit message body

Once these are done, this PR should be mergeable. I've assigned @hamza.khyari and @brent.edwards as reviewers. Two approvals required per CONTRIBUTING.md.

**Planning Agent — PR #1053 Review Assessment** @hamza.khyari — Excellent review work across 3 rounds. Your thoroughness here is exactly what this project needs. Let me address the findings: **BUG-1 (Major)**: Agreed — the explain assertion checking `"A REST API"` instead of `"FastAPI"` is a real bug in the test. It passes by accident via the `'"chosen"'` fallback, which is fragile. @aditya must fix this before merge. The assertion should check for `"FastAPI"` to match the child decision's `chosen_option`. **SCOPE-1 (Major)**: I agree with the finding that 3 files are out of scope. Per CONTRIBUTING.md Section "Atomic Commits" — the `Settings.reset()` addition, its test, and the devcontainer assertion fix are separate concerns. However, given that this PR has been in review for an extended period and the `Settings.reset()` is a small utility method used by test cleanup, I'm ruling as follows: - The devcontainer assertion text change (`features/devcontainer_cleanup.feature`) **must be removed** from this PR — it is unrelated to #648. - The `Settings.reset()` method and its Behave test can remain in this PR since the test cleanup in the core deliverable depends on it. File a follow-up issue to document `Settings.reset()` as a public API addition. - The `sys.path` insertion (CODE-3) must be removed — it is unnecessary per environment.py configuration. **SPEC-1**: @aditya has addressed this in the PR description. The commit message body should be amended to accurately reflect regression-guard semantics. **SPEC-2**: Addressed — branch name updated. **TEST-1, ENV-1, ENV-2**: @aditya reports these are fixed. **Action required for merge**: 1. Fix BUG-1 (explain assertion) 2. Remove `features/devcontainer_cleanup.feature` change 3. Remove `sys.path` insertion from settings_steps.py 4. Amend commit message body Once these are done, this PR should be mergeable. I've assigned @hamza.khyari and @brent.edwards as reviewers. Two approvals required per CONTRIBUTING.md.
aditya force-pushed tdd/m3-container-resolve-crash from 33a4daf503
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 17s
CI / build (pull_request) Successful in 17s
CI / quality (pull_request) Successful in 31s
CI / typecheck (pull_request) Successful in 43s
CI / security (pull_request) Successful in 43s
CI / unit_tests (pull_request) Successful in 2m2s
CI / docker (pull_request) Successful in 55s
CI / integration_tests (pull_request) Successful in 3m38s
CI / e2e_tests (pull_request) Successful in 3m58s
CI / coverage (pull_request) Successful in 6m39s
CI / benchmark-regression (pull_request) Successful in 37m52s
to 03353f0e37
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 54s
CI / security (pull_request) Successful in 57s
CI / unit_tests (pull_request) Successful in 3m47s
CI / integration_tests (pull_request) Successful in 4m7s
CI / docker (pull_request) Successful in 1m3s
CI / e2e_tests (pull_request) Successful in 5m18s
CI / coverage (pull_request) Successful in 7m26s
CI / benchmark-regression (pull_request) Successful in 38m17s
2026-03-19 11:13:38 +00:00
Compare
aditya dismissed freemo's review 2026-03-19 11:13:38 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

PR #1053 — Review Response (Hamza Round 3 + Freemo Assessment)

Freemo Action Items — All Resolved

1. BUG-1 (Major) — Explain assertion checks wrong decision content

Finding: plan explain is invoked with the child decision (chosen_option = "FastAPI"), but assertions checked for the root decision content ("A REST API"). The test passed accidentally via the "chosen" JSON key fallback.

Resolution: Replaced "A REST API" with "FastAPI" in both assertion sites:

  • features/steps/container_resolve_crash_steps.py — Behave Then step now asserts "FastAPI" in result.output
  • robot/helper_container_resolve_crash.py_assert_explain_output now checks "FastAPI" not in output

The fragile '"chosen"' fallback has been removed entirely.


2. SCOPE-1 / CODE-2 — Remove features/devcontainer_cleanup.feature

Finding: The devcontainer assertion text change is unrelated to #648 and violates CONTRIBUTING.md atomicity rules.

Resolution: Reverted features/devcontainer_cleanup.feature to its origin/master state. The file no longer appears in the PR diff (9 files, down from 10).


3. CODE-3 — Remove sys.path insertion from settings_steps.py

Finding: sys.path.insert(0, str(_SRC)) in settings_steps.py is unnecessary since features/environment.py already configures src/ on the path.

Resolution: The sys.path insertion block (including import sys, _SRC variable, and # noqa: E402) has been fully removed. The net PR diff for settings_steps.py contains no sys.path references — only the @when decorator addition and Settings.reset() step definitions.


4. SPEC-1 — Commit message body reflects stale @tdd_expected_fail semantics

Finding: Commit body said "Tests tagged @tdd_expected_fail" but the code correctly omits the tag since bug #647 is fixed.

Resolution: All three commits have been squashed into a single commit with an accurate message:

test(cli): add regression-guard tests for Container.resolve() crash

Add Behave and Robot Framework regression tests for bug #647, where
plan tree, plan explain, and plan correct CLI commands crashed with
AttributeError when resolving DecisionService from the DI container.

Tests use a real DI container with seeded decisions (not MagicMock)
to catch the exact class of bug that existing M3 tests missed.
Assertions verify successful execution and command-specific output
content. Includes Settings.reset() classmethod for robust singleton
cleanup in test teardown.

ISSUES CLOSED: #648

No mention of @tdd_expected_fail. Message accurately describes regression-guard semantics.


Previously Addressed Items (Rounds 1–3)

ID Severity Status Resolution
SPEC-2 Major Resolved Branch renamed to tdd/m3-container-resolve-crash; tracked in issue #648
TEST-1 Minor Resolved Given after When changed to And in settings_configuration.feature
TEST-2 Minor Resolved Removed context.cr647_command coupling; each Then step now owns its assertions
ENV-1 Minor Resolved Background reuses per-scenario DB URL from environment.py instead of overwriting
ENV-2 Minor Resolved Feature file documents --dry-run dependency to prevent live LLM access

Freemo Rulings Acknowledged

  • Settings.reset() and its Behave test remain in this PR per Freemo's ruling (test cleanup depends on it). A follow-up issue will be filed to document it as a public API addition.
  • CONTRIBUTORS.md update remains as an acceptable collateral change.
  • Issue #648 AC updated to document the @tdd_expected_fail omission decision.

Final PR State

  • 1 squashed commit on tdd/m3-container-resolve-crash
  • 9 files changed, +713 lines
# PR #1053 — Review Response (Hamza Round 3 + Freemo Assessment) ## Freemo Action Items — All Resolved ### 1. BUG-1 (Major) — Explain assertion checks wrong decision content **Finding:** `plan explain` is invoked with the child decision (`chosen_option = "FastAPI"`), but assertions checked for the root decision content (`"A REST API"`). The test passed accidentally via the `"chosen"` JSON key fallback. **Resolution:** Replaced `"A REST API"` with `"FastAPI"` in both assertion sites: - `features/steps/container_resolve_crash_steps.py` — Behave Then step now asserts `"FastAPI" in result.output` - `robot/helper_container_resolve_crash.py` — `_assert_explain_output` now checks `"FastAPI" not in output` The fragile `'"chosen"'` fallback has been removed entirely. --- ### 2. SCOPE-1 / CODE-2 — Remove `features/devcontainer_cleanup.feature` **Finding:** The devcontainer assertion text change is unrelated to #648 and violates CONTRIBUTING.md atomicity rules. **Resolution:** Reverted `features/devcontainer_cleanup.feature` to its `origin/master` state. The file no longer appears in the PR diff (9 files, down from 10). --- ### 3. CODE-3 — Remove `sys.path` insertion from `settings_steps.py` **Finding:** `sys.path.insert(0, str(_SRC))` in `settings_steps.py` is unnecessary since `features/environment.py` already configures `src/` on the path. **Resolution:** The `sys.path` insertion block (including `import sys`, `_SRC` variable, and `# noqa: E402`) has been fully removed. The net PR diff for `settings_steps.py` contains no `sys.path` references — only the `@when` decorator addition and `Settings.reset()` step definitions. --- ### 4. SPEC-1 — Commit message body reflects stale `@tdd_expected_fail` semantics **Finding:** Commit body said "Tests tagged `@tdd_expected_fail`" but the code correctly omits the tag since bug #647 is fixed. **Resolution:** All three commits have been squashed into a single commit with an accurate message: ``` test(cli): add regression-guard tests for Container.resolve() crash Add Behave and Robot Framework regression tests for bug #647, where plan tree, plan explain, and plan correct CLI commands crashed with AttributeError when resolving DecisionService from the DI container. Tests use a real DI container with seeded decisions (not MagicMock) to catch the exact class of bug that existing M3 tests missed. Assertions verify successful execution and command-specific output content. Includes Settings.reset() classmethod for robust singleton cleanup in test teardown. ISSUES CLOSED: #648 ``` No mention of `@tdd_expected_fail`. Message accurately describes regression-guard semantics. --- ## Previously Addressed Items (Rounds 1–3) | ID | Severity | Status | Resolution | |----|----------|--------|------------| | SPEC-2 | Major | Resolved | Branch renamed to `tdd/m3-container-resolve-crash`; tracked in issue #648 | | TEST-1 | Minor | Resolved | `Given` after `When` changed to `And` in `settings_configuration.feature` | | TEST-2 | Minor | Resolved | Removed `context.cr647_command` coupling; each Then step now owns its assertions | | ENV-1 | Minor | Resolved | Background reuses per-scenario DB URL from `environment.py` instead of overwriting | | ENV-2 | Minor | Resolved | Feature file documents `--dry-run` dependency to prevent live LLM access | ## Freemo Rulings Acknowledged - **Settings.reset()** and its Behave test remain in this PR per Freemo's ruling (test cleanup depends on it). A follow-up issue will be filed to document it as a public API addition. - **CONTRIBUTORS.md** update remains as an acceptable collateral change. - **Issue #648 AC** updated to document the `@tdd_expected_fail` omission decision. ## Final PR State - **1 squashed commit** on `tdd/m3-container-resolve-crash` - **9 files** changed, +713 lines
Owner

Planning review (Day 42):

  • Outstanding REQUEST_CHANGES: @hamza.khyari submitted a REQUEST_CHANGES review on Mar 18 that appears unresolved. Please confirm whether the requested changes have been addressed and request a re-review from @hamza.khyari.

This is a Priority/Critical, Must-have PR for v3.2.0 and has @freemo approval. Resolving the outstanding review would clear it for merge consideration.

**Planning review (Day 42):** - **Outstanding REQUEST_CHANGES**: @hamza.khyari submitted a REQUEST_CHANGES review on Mar 18 that appears unresolved. Please confirm whether the requested changes have been addressed and request a re-review from @hamza.khyari. This is a Priority/Critical, Must-have PR for v3.2.0 and has @freemo approval. Resolving the outstanding review would clear it for merge consideration.
Owner

Day 43 Planning Review — Status Assessment and Action Items

Summary

This TDD PR for bug #647 has gone through 3 thorough review rounds by @hamza.khyari, with all findings addressed by @aditya in a single squashed commit. The PR is mergeable (no conflicts) and represents critical-path work for v3.2.0.

Comment Response

@hamza.khyari (Rounds 1-3)

Your review work on this PR is exemplary — the BUG-1 finding (explain assertion checking wrong decision content, passing by accident) is exactly the kind of subtle correctness issue that prevents test suite degradation. All findings were valid and well-documented.

@aditya (Response)

Your responses are thorough and address all actionable items. The commit message amendment correctly reflects regression-guard semantics. The BUG-1 fix (replacing "A REST API" with "FastAPI") and the removal of the fragile '"chosen"' fallback are correct.

@freemo (Day 42)

The outstanding REQUEST_CHANGES from @hamza.khyari appears to have been addressed by @aditya's Round 3 response. The review needs to be formally resolved.

Current Blocking Items

  1. @hamza.khyari needs to re-review and either approve or post new findings. All 4 action items from the Day 39 assessment have been addressed: BUG-1 fixed, devcontainer change removed, sys.path removed, commit message amended.
  2. Second reviewer approval needed. Per CONTRIBUTING.md, 2 approving reviews required. @brent.edwards should review if not already.

Action Required

  • @hamza.khyari: Please re-review the latest squashed commit and resolve your REQUEST_CHANGES review. If all findings are addressed, please APPROVE.
  • @brent.edwards: Please provide a second review for merge consideration.
  • @aditya: Ensure the closing keyword Closes #648 is in the PR description (not just the commit footer).

This is the #1 blocking TDD PR for v3.2.0. Bug #647 (Container.resolve crash) cannot be fully resolved until this TDD PR is merged. The stale REQUEST_CHANGES review has been blocking this PR for 5 days — this must be resolved today.

**Day 43 Planning Review — Status Assessment and Action Items** ## Summary This TDD PR for bug #647 has gone through 3 thorough review rounds by @hamza.khyari, with all findings addressed by @aditya in a single squashed commit. The PR is mergeable (no conflicts) and represents critical-path work for v3.2.0. ## Comment Response ### @hamza.khyari (Rounds 1-3) Your review work on this PR is exemplary — the BUG-1 finding (explain assertion checking wrong decision content, passing by accident) is exactly the kind of subtle correctness issue that prevents test suite degradation. All findings were valid and well-documented. ### @aditya (Response) Your responses are thorough and address all actionable items. The commit message amendment correctly reflects regression-guard semantics. The BUG-1 fix (replacing `"A REST API"` with `"FastAPI"`) and the removal of the fragile `'"chosen"'` fallback are correct. ### @freemo (Day 42) The outstanding REQUEST_CHANGES from @hamza.khyari appears to have been addressed by @aditya's Round 3 response. The review needs to be formally resolved. ## Current Blocking Items 1. **@hamza.khyari** needs to re-review and either approve or post new findings. All 4 action items from the Day 39 assessment have been addressed: BUG-1 fixed, devcontainer change removed, sys.path removed, commit message amended. 2. **Second reviewer approval needed.** Per CONTRIBUTING.md, 2 approving reviews required. @brent.edwards should review if not already. ## Action Required - **@hamza.khyari**: Please re-review the latest squashed commit and resolve your REQUEST_CHANGES review. If all findings are addressed, please APPROVE. - **@brent.edwards**: Please provide a second review for merge consideration. - **@aditya**: Ensure the closing keyword `Closes #648` is in the PR description (not just the commit footer). **This is the #1 blocking TDD PR for v3.2.0.** Bug #647 (Container.resolve crash) cannot be fully resolved until this TDD PR is merged. The stale REQUEST_CHANGES review has been blocking this PR for 5 days — this must be resolved today.
aditya force-pushed tdd/m3-container-resolve-crash from 03353f0e37
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 17s
CI / lint (pull_request) Successful in 18s
CI / quality (pull_request) Successful in 30s
CI / typecheck (pull_request) Successful in 54s
CI / security (pull_request) Successful in 57s
CI / unit_tests (pull_request) Successful in 3m47s
CI / integration_tests (pull_request) Successful in 4m7s
CI / docker (pull_request) Successful in 1m3s
CI / e2e_tests (pull_request) Successful in 5m18s
CI / coverage (pull_request) Successful in 7m26s
CI / benchmark-regression (pull_request) Successful in 38m17s
to dca032cb6e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 3m31s
CI / quality (pull_request) Successful in 3m48s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 3m56s
CI / integration_tests (pull_request) Successful in 6m0s
CI / unit_tests (pull_request) Successful in 6m55s
CI / e2e_tests (pull_request) Successful in 7m59s
CI / docker (pull_request) Successful in 1m7s
CI / coverage (pull_request) Successful in 11m12s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
2026-03-23 06:38:00 +00:00
Compare
Member

Code Review — PR #1053 test(cli): add TDD failing tests for Container.resolve() crash

Findings

1. Bug (Medium): Robot helper seeds decisions without creating a plan record

robot/helper_container_resolve_crash.py:78_setup_decisions() creates plan_id = str(ULID()) and seeds decisions via DecisionService.record_decision(), but never creates an actual plan through PlanLifecycleService. The Behave test (features/steps/container_resolve_crash_steps.py:69-77) correctly creates a real plan via create_action() + use_action().

The Robot test relies on the CLI subprocess querying decisions by plan_id where decision rows exist but no plan record does. If any CLI command adds a plan-existence check before listing decisions, the Robot tests will break for a different reason than the bug under test.

Fix: Mirror the Behave pattern — create an action + plan via PlanLifecycleService before seeding decisions.

2. Observation (Low): Settings.reset() not used in environment.py

The new Settings.reset() classmethod (settings.py:633-640) does cls._instance = None, which is exactly what features/environment.py:631 does directly. Consider updating environment.py:631 to call Settings.reset() for consistency — otherwise there are two paths for the same operation.

3. Observation (Low): get_container() call is a no-op

features/steps/container_resolve_crash_steps.py:117 calls get_container() and the comment says "Get the real container so command paths use DI wiring" — but the returned container is never used. It's the CliRunner.invoke call that creates the container used by the CLI. The comment is misleading.

4. Observation (Low): catch_exceptions=True masks setup failures

All three cli_runner.invoke calls use catch_exceptions=True. This is intentional for verifying the bug's AttributeError, but also means any non-AttributeError setup failure (import error, missing table) would be silently captured and fail the isinstance assertion for the wrong reason.

5. Observation (Low): glob import vs pathlib convention

robot/helper_container_resolve_crash.py imports glob.glob for WAL file cleanup. The codebase convention (e.g., features/environment.py:692-694) uses explicit suffix iteration or pathlib.Path.glob().

Summary

Severity Count
Medium 1 (#1 — Robot helper missing plan record)
Low 4 (#2-#5 — observations)

Finding #1 should be fixed. The others are minor consistency notes.

## Code Review — PR #1053 `test(cli): add TDD failing tests for Container.resolve() crash` ### Findings #### 1. Bug (Medium): Robot helper seeds decisions without creating a plan record `robot/helper_container_resolve_crash.py:78` — `_setup_decisions()` creates `plan_id = str(ULID())` and seeds decisions via `DecisionService.record_decision()`, but never creates an actual plan through `PlanLifecycleService`. The Behave test (`features/steps/container_resolve_crash_steps.py:69-77`) correctly creates a real plan via `create_action()` + `use_action()`. The Robot test relies on the CLI subprocess querying decisions by `plan_id` where decision rows exist but no plan record does. If any CLI command adds a plan-existence check before listing decisions, the Robot tests will break for a different reason than the bug under test. **Fix:** Mirror the Behave pattern — create an action + plan via `PlanLifecycleService` before seeding decisions. #### 2. Observation (Low): `Settings.reset()` not used in `environment.py` The new `Settings.reset()` classmethod (`settings.py:633-640`) does `cls._instance = None`, which is exactly what `features/environment.py:631` does directly. Consider updating `environment.py:631` to call `Settings.reset()` for consistency — otherwise there are two paths for the same operation. #### 3. Observation (Low): `get_container()` call is a no-op `features/steps/container_resolve_crash_steps.py:117` calls `get_container()` and the comment says "Get the real container so command paths use DI wiring" — but the returned container is never used. It's the `CliRunner.invoke` call that creates the container used by the CLI. The comment is misleading. #### 4. Observation (Low): `catch_exceptions=True` masks setup failures All three `cli_runner.invoke` calls use `catch_exceptions=True`. This is intentional for verifying the bug's `AttributeError`, but also means any non-`AttributeError` setup failure (import error, missing table) would be silently captured and fail the `isinstance` assertion for the wrong reason. #### 5. Observation (Low): `glob` import vs `pathlib` convention `robot/helper_container_resolve_crash.py` imports `glob.glob` for WAL file cleanup. The codebase convention (e.g., `features/environment.py:692-694`) uses explicit suffix iteration or `pathlib.Path.glob()`. ### Summary | Severity | Count | |----------|-------| | Medium | 1 (#1 — Robot helper missing plan record) | | Low | 4 (#2-#5 — observations) | Finding #1 should be fixed. The others are minor consistency notes.
hamza.khyari approved these changes 2026-03-23 12:38:52 +00:00
Dismissed
aditya force-pushed tdd/m3-container-resolve-crash from dca032cb6e
Some checks failed
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 28s
CI / lint (pull_request) Successful in 3m31s
CI / quality (pull_request) Successful in 3m48s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 3m56s
CI / integration_tests (pull_request) Successful in 6m0s
CI / unit_tests (pull_request) Successful in 6m55s
CI / e2e_tests (pull_request) Successful in 7m59s
CI / docker (pull_request) Successful in 1m7s
CI / coverage (pull_request) Successful in 11m12s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Has been cancelled
to f517dd5a94
All checks were successful
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m44s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m5s
CI / benchmark-publish (pull_request) Has been skipped
CI / e2e_tests (pull_request) Successful in 8m31s
CI / unit_tests (pull_request) Successful in 8m54s
CI / integration_tests (pull_request) Successful in 8m55s
CI / docker (pull_request) Successful in 1m21s
CI / coverage (pull_request) Successful in 11m20s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h2m4s
2026-03-23 13:07:59 +00:00
Compare
aditya dismissed hamza.khyari's review 2026-03-23 13:07:59 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Thanks for the detailed review and re-check.

Response to findings

1) Bug (Medium): Robot helper seeds decisions without creating a plan record

Agreed. This was a valid realism gap in the Robot helper setup.

Fixed by aligning the helper with the same lifecycle pattern used in Behave:

  • create action via PlanLifecycleService.create_action(...)
  • create plan via PlanLifecycleService.use_action(...)
  • seed decisions against that real plan_id

This keeps the test scoped to bug #647 while avoiding fragility from synthetic plan IDs.

2) Observation (Low): Settings.reset() not used in environment.py

Acknowledged. This is a valid consistency note.

For this PR, I kept scope strictly to the targeted review bug fix above and did not include collateral cleanup changes.

3) Observation (Low): get_container() call is a no-op

Acknowledged. The note is correct; this is a clarity/comment cleanup item rather than a functional issue.

Not changed in this PR to keep scope tight to the targeted fix.

4) Observation (Low): catch_exceptions=True may mask setup failures

Acknowledged. Current assertions still fail non-AttributeError exceptions, but this is a fair test-diagnostics caution.

No behavior change made here in order to avoid expanding scope beyond the requested fix.

5) Observation (Low): glob import vs pathlib convention

Acknowledged as style/consistency feedback.

Left unchanged in this PR because it is non-blocking and outside the immediate bug-fix scope.

Summary

  • Fixed: #1 (plan record realism gap in Robot helper)
  • Noted for follow-up / out-of-scope in this PR: #2, #3, #4, #5
Thanks for the detailed review and re-check. ## Response to findings ### 1) Bug (Medium): Robot helper seeds decisions without creating a plan record Agreed. This was a valid realism gap in the Robot helper setup. Fixed by aligning the helper with the same lifecycle pattern used in Behave: - create action via `PlanLifecycleService.create_action(...)` - create plan via `PlanLifecycleService.use_action(...)` - seed decisions against that real `plan_id` This keeps the test scoped to bug #647 while avoiding fragility from synthetic plan IDs. ### 2) Observation (Low): `Settings.reset()` not used in `environment.py` Acknowledged. This is a valid consistency note. For this PR, I kept scope strictly to the targeted review bug fix above and did not include collateral cleanup changes. ### 3) Observation (Low): `get_container()` call is a no-op Acknowledged. The note is correct; this is a clarity/comment cleanup item rather than a functional issue. Not changed in this PR to keep scope tight to the targeted fix. ### 4) Observation (Low): `catch_exceptions=True` may mask setup failures Acknowledged. Current assertions still fail non-`AttributeError` exceptions, but this is a fair test-diagnostics caution. No behavior change made here in order to avoid expanding scope beyond the requested fix. ### 5) Observation (Low): `glob` import vs `pathlib` convention Acknowledged as style/consistency feedback. Left unchanged in this PR because it is non-blocking and outside the immediate bug-fix scope. ## Summary - Fixed: #1 (plan record realism gap in Robot helper) - Noted for follow-up / out-of-scope in this PR: #2, #3, #4, #5
hurui200320 approved these changes 2026-03-24 06:12:57 +00:00
Dismissed
hurui200320 left a comment

PR Review: !1053 (Ticket #648)

Verdict: Approve

All 6 acceptance criteria from ticket #648 are satisfied. All prior review findings from 4 rounds (Hamza, Freemo) have been verified as fixed in the current code. The tests correctly use real DI containers (not MagicMock) to catch the exact class of bug that existing M3 tests missed. The remaining findings are minor quality suggestions and nits — none affect correctness, completeness, or spec compliance.


Critical Issues

None.

Major Issues

None.

Minor Issues

1. Behave step leaks SQLAlchemy engine — not disposed in cleanup

  • File: features/steps/container_resolve_crash_steps.py:62-63, 129-143
  • Problem: The Given step creates uow = UnitOfWork(database_url) and seeds decisions. After init_database() disposes the initial engine, subsequent record_decision() calls recreate an engine via the engine property. Since environment.py's before_scenario sets a file-based SQLite URL, this engine is not stored in MEMORY_ENGINES and the cleanup closure never disposes it — it relies on garbage collection. The Robot helper correctly captures ctx.uow and calls ctx.uow.engine.dispose() in its cleanup.
  • Recommendation: Capture uow in the cleanup closure and dispose its engine explicitly, matching the Robot helper's pattern.

2. Dead step registration — @then("cr647- the command should execute successfully") never matched

  • File: features/steps/container_resolve_crash_steps.py:218
  • Problem: step_cr647_command_should_succeed is decorated with @then(...) registering it as a Behave step, but no scenario uses this step text. It's only called programmatically by the three specific Then steps. The decorator adds a dead entry to Behave's step registry.
  • Recommendation: Remove the @then decorator and rename to _assert_command_succeeded(context) to signal it's a helper, not a step.

3. plan correct assertions weaker than plan tree / plan explain

  • File: features/steps/container_resolve_crash_steps.py:276-284, robot/helper_container_resolve_crash.py:179-185
  • Problem: The plan correct assertion only validates decision_id in the output. By contrast, plan explain validates both decision_id AND "FastAPI". There's no command-specific content validation for plan correct (e.g., dry-run output, revert mode, guidance text).
  • Recommendation: Add at least one plan correct-specific assertion (e.g., check for "revert", "dry-run", or "Django" guidance text) to strengthen the regression guard.

4. get_container() call is a no-op with misleading comment

  • File: features/steps/container_resolve_crash_steps.py:125-126
  • Problem: _ = get_container() discards the return value. The comment says "Get the real container so command paths use DI wiring," but CliRunner.invoke creates its own container. The side effect (warming the singleton cache) may be meaningful, but the comment misrepresents the purpose.
  • Recommendation: Either remove the call, or update the comment to: # Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.

5. Settings.reset() has no test-only documentation guard

  • File: src/cleveragents/config/settings.py:633-640
  • Problem: The only production code change. Settings.reset() is public with no indication it's test-only. If production code accidentally calls it, it would wipe the singleton mid-flight. The risk is low since the codebase already has 50+ instances of Settings._instance = None in tests, and this encapsulates the pattern. But a documentation guard would be beneficial.
  • Recommendation: Add .. warning:: **Test use only.** Do not call in production code paths. to the docstring.

6. Missing type annotations on 4 new settings step functions

  • File: features/steps/settings_steps.py:613, 623, 629, 637
  • Problem: The new step functions lack type annotations on context and expected parameters. Per CONTRIBUTING.md, new code should have explicit types. The container_resolve_crash_steps.py file in this same PR correctly uses context: Context and -> None.
  • Recommendation: Add context: Context, expected: str, and -> None type hints.

7. CONTRIBUTORS.md new entry not in alphabetical order

  • File: CONTRIBUTORS.md:9
  • Problem: Aditya Chhabra is appended at the end. The existing list follows alphabetical order (Brent, Hamza, Jeffrey, Luis, Rui). "Aditya" should appear first.
  • Recommendation: Move to its alphabetically correct position.

8. PR title does not match commit message

  • PR title: test(cli): add TDD failing tests for Container.resolve() crash
  • Commit: test(cli): add regression-guard tests for Container.resolve() crash
  • Recommendation: Update the PR title to match the commit: test(cli): add regression-guard tests for Container.resolve() crash.

9. plan correct test uses undocumented --plan flag (spec drift)

  • Files: container_resolve_crash_steps.py:202, helper_container_resolve_crash.py:270
  • Problem: Both tests pass --plan to plan correct, but the spec doesn't define this flag. The code correctly includes an inline comment acknowledging this. The test is correct for the current implementation but documents a spec-vs-implementation drift.
  • Recommendation: File a separate issue to align the spec with the implementation (add --plan to spec, or make the CLI infer the plan from decision ID).

Nits

10. glob.glob instead of pathlib convention

  • File: robot/helper_container_resolve_crash.py:15, 147
  • Recommendation: Replace with explicit suffix iteration: for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True).

11. Robot test cases repeat identical boilerplate

  • File: robot/container_resolve_crash.robot:24-55
  • Recommendation: Consider extracting a Run And Verify Helper keyword to deduplicate the 3 test cases. Acceptable as-is for 3 tests.

12. Feature description awkward line break

  • File: features/container_resolve_crash.feature:7-8
  • Recommendation: Reflow to keep mock get_container() on the same line.

13. Redundant TYPE_CHECKING import

  • File: features/steps/container_resolve_crash_steps.py:23-24
  • Problem: Decision is imported under TYPE_CHECKING at module level but is also imported inside the function at line 47.
  • Recommendation: Remove the module-level TYPE_CHECKING guard since the in-function import provides the type within the relevant scope.

Summary

This is a well-crafted TDD regression test PR. The core deliverables are solid: three Behave BDD scenarios and three Robot Framework integration tests that exercise plan tree, plan explain, and plan correct through a real DI container with seeded decisions — correctly catching the class of bug that existing M3 tests missed with their MagicMock approach. All 6 acceptance criteria are met, all 4 rounds of prior review feedback have been addressed, and the Settings.reset() addition is a clean encapsulation of an existing test pattern.

The 9 minor findings are quality improvements (resource cleanup, type annotations, naming, ordering) — not correctness issues. The 4 nits are stylistic. The PR is ready to merge after addressing the minor items (or acknowledging them as follow-up work).

## PR Review: !1053 (Ticket #648) ### Verdict: ✅ Approve All 6 acceptance criteria from ticket #648 are satisfied. All prior review findings from 4 rounds (Hamza, Freemo) have been verified as fixed in the current code. The tests correctly use real DI containers (not MagicMock) to catch the exact class of bug that existing M3 tests missed. The remaining findings are minor quality suggestions and nits — none affect correctness, completeness, or spec compliance. --- ### Critical Issues None. ### Major Issues None. ### Minor Issues **1. Behave step leaks SQLAlchemy engine — not disposed in cleanup** - **File:** `features/steps/container_resolve_crash_steps.py:62-63, 129-143` - **Problem:** The Given step creates `uow = UnitOfWork(database_url)` and seeds decisions. After `init_database()` disposes the initial engine, subsequent `record_decision()` calls recreate an engine via the `engine` property. Since `environment.py`'s `before_scenario` sets a file-based SQLite URL, this engine is *not* stored in `MEMORY_ENGINES` and the cleanup closure never disposes it — it relies on garbage collection. The Robot helper correctly captures `ctx.uow` and calls `ctx.uow.engine.dispose()` in its cleanup. - **Recommendation:** Capture `uow` in the cleanup closure and dispose its engine explicitly, matching the Robot helper's pattern. **2. Dead step registration — `@then("cr647- the command should execute successfully")` never matched** - **File:** `features/steps/container_resolve_crash_steps.py:218` - **Problem:** `step_cr647_command_should_succeed` is decorated with `@then(...)` registering it as a Behave step, but no scenario uses this step text. It's only called programmatically by the three specific Then steps. The decorator adds a dead entry to Behave's step registry. - **Recommendation:** Remove the `@then` decorator and rename to `_assert_command_succeeded(context)` to signal it's a helper, not a step. **3. `plan correct` assertions weaker than `plan tree` / `plan explain`** - **File:** `features/steps/container_resolve_crash_steps.py:276-284`, `robot/helper_container_resolve_crash.py:179-185` - **Problem:** The `plan correct` assertion only validates `decision_id` in the output. By contrast, `plan explain` validates both `decision_id` AND `"FastAPI"`. There's no command-specific content validation for `plan correct` (e.g., dry-run output, revert mode, guidance text). - **Recommendation:** Add at least one `plan correct`-specific assertion (e.g., check for `"revert"`, `"dry-run"`, or `"Django"` guidance text) to strengthen the regression guard. **4. `get_container()` call is a no-op with misleading comment** - **File:** `features/steps/container_resolve_crash_steps.py:125-126` - **Problem:** `_ = get_container()` discards the return value. The comment says "Get the real container so command paths use DI wiring," but `CliRunner.invoke` creates its own container. The side effect (warming the singleton cache) may be meaningful, but the comment misrepresents the purpose. - **Recommendation:** Either remove the call, or update the comment to: `# Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.` **5. `Settings.reset()` has no test-only documentation guard** - **File:** `src/cleveragents/config/settings.py:633-640` - **Problem:** The only production code change. `Settings.reset()` is public with no indication it's test-only. If production code accidentally calls it, it would wipe the singleton mid-flight. The risk is low since the codebase already has 50+ instances of `Settings._instance = None` in tests, and this encapsulates the pattern. But a documentation guard would be beneficial. - **Recommendation:** Add `.. warning:: **Test use only.** Do not call in production code paths.` to the docstring. **6. Missing type annotations on 4 new settings step functions** - **File:** `features/steps/settings_steps.py:613, 623, 629, 637` - **Problem:** The new step functions lack type annotations on `context` and `expected` parameters. Per CONTRIBUTING.md, new code should have explicit types. The `container_resolve_crash_steps.py` file in this same PR correctly uses `context: Context` and `-> None`. - **Recommendation:** Add `context: Context`, `expected: str`, and `-> None` type hints. **7. CONTRIBUTORS.md new entry not in alphabetical order** - **File:** `CONTRIBUTORS.md:9` - **Problem:** `Aditya Chhabra` is appended at the end. The existing list follows alphabetical order (Brent, Hamza, Jeffrey, Luis, Rui). "Aditya" should appear first. - **Recommendation:** Move to its alphabetically correct position. **8. PR title does not match commit message** - **PR title:** `test(cli): add TDD failing tests for Container.resolve() crash` - **Commit:** `test(cli): add regression-guard tests for Container.resolve() crash` - **Recommendation:** Update the PR title to match the commit: `test(cli): add regression-guard tests for Container.resolve() crash`. **9. `plan correct` test uses undocumented `--plan` flag (spec drift)** - **Files:** `container_resolve_crash_steps.py:202`, `helper_container_resolve_crash.py:270` - **Problem:** Both tests pass `--plan` to `plan correct`, but the spec doesn't define this flag. The code correctly includes an inline comment acknowledging this. The test is correct for the *current implementation* but documents a spec-vs-implementation drift. - **Recommendation:** File a separate issue to align the spec with the implementation (add `--plan` to spec, or make the CLI infer the plan from decision ID). ### Nits **10. `glob.glob` instead of `pathlib` convention** - **File:** `robot/helper_container_resolve_crash.py:15, 147` - **Recommendation:** Replace with explicit suffix iteration: `for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True)`. **11. Robot test cases repeat identical boilerplate** - **File:** `robot/container_resolve_crash.robot:24-55` - **Recommendation:** Consider extracting a `Run And Verify Helper` keyword to deduplicate the 3 test cases. Acceptable as-is for 3 tests. **12. Feature description awkward line break** - **File:** `features/container_resolve_crash.feature:7-8` - **Recommendation:** Reflow to keep `mock get_container()` on the same line. **13. Redundant `TYPE_CHECKING` import** - **File:** `features/steps/container_resolve_crash_steps.py:23-24` - **Problem:** `Decision` is imported under `TYPE_CHECKING` at module level but is also imported inside the function at line 47. - **Recommendation:** Remove the module-level `TYPE_CHECKING` guard since the in-function import provides the type within the relevant scope. --- ### Summary This is a well-crafted TDD regression test PR. The core deliverables are solid: three Behave BDD scenarios and three Robot Framework integration tests that exercise `plan tree`, `plan explain`, and `plan correct` through a real DI container with seeded decisions — correctly catching the class of bug that existing M3 tests missed with their `MagicMock` approach. All 6 acceptance criteria are met, all 4 rounds of prior review feedback have been addressed, and the `Settings.reset()` addition is a clean encapsulation of an existing test pattern. The 9 minor findings are quality improvements (resource cleanup, type annotations, naming, ordering) — not correctness issues. The 4 nits are stylistic. The PR is ready to merge after addressing the minor items (or acknowledging them as follow-up work).
aditya changed title from test(cli): add TDD failing tests for Container.resolve() crash to test(cli): add regression-guard tests for Container.resolve() crash 2026-03-24 06:45:07 +00:00
aditya force-pushed tdd/m3-container-resolve-crash from f517dd5a94
All checks were successful
CI / build (pull_request) Successful in 21s
CI / lint (pull_request) Successful in 3m18s
CI / quality (pull_request) Successful in 3m44s
CI / typecheck (pull_request) Successful in 3m55s
CI / security (pull_request) Successful in 4m5s
CI / benchmark-publish (pull_request) Has been skipped
CI / e2e_tests (pull_request) Successful in 8m31s
CI / unit_tests (pull_request) Successful in 8m54s
CI / integration_tests (pull_request) Successful in 8m55s
CI / docker (pull_request) Successful in 1m21s
CI / coverage (pull_request) Successful in 11m20s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 1h2m4s
to 36c36bc0ee
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / build (pull_request) Successful in 18s
CI / lint (pull_request) Successful in 3m19s
CI / quality (pull_request) Successful in 3m41s
CI / typecheck (pull_request) Successful in 3m54s
CI / security (pull_request) Successful in 4m2s
CI / integration_tests (pull_request) Successful in 6m41s
CI / unit_tests (pull_request) Successful in 6m49s
CI / docker (pull_request) Successful in 1m9s
CI / e2e_tests (pull_request) Successful in 9m19s
CI / coverage (pull_request) Successful in 11m17s
CI / status-check (pull_request) Successful in 1s
CI / benchmark-regression (pull_request) Successful in 50m38s
2026-03-24 08:47:08 +00:00
Compare
aditya dismissed hurui200320's review 2026-03-24 08:47:08 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Response to @hurui200320 Review (Round 5)

Thanks for the thorough review and the approval. All 9 minor findings and 4 nits have been addressed in the amended commit. Here's the summary:

Fixed (Minor Issues)

#1 — Behave step leaks SQLAlchemy engine
Fixed. The cleanup closure now explicitly captures uow and calls uow.engine.dispose() before the MEMORY_ENGINES pop, matching the Robot helper's pattern. (container_resolve_crash_steps.py:131)

#2 — Dead @then decorator on shared assertion helper
Fixed. Removed the @then("cr647- the command should execute successfully") decorator and renamed the function to _assert_command_succeeded() to signal it's a private helper, not a step. Updated all three callers.

#3plan correct assertions weaker than tree/explain
Fixed. Both Behave and Robot assertions now validate command-specific content: "revert" or "dry" must appear in the output (case-insensitive), strengthening the regression guard beyond just decision-id presence. (container_resolve_crash_steps.py:286-289, helper_container_resolve_crash.py:187-192)

#4get_container() call is a no-op with misleading comment
Fixed. Updated the comment to: # Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database. This accurately describes the singleton-warming side effect.

#5Settings.reset() has no test-only documentation guard
Fixed. Added .. warning:: **Test use only.** Do not call in production code paths — resetting the singleton mid-flight can cause inconsistent configuration state. to the docstring. (settings.py:640-642)

#6 — Missing type annotations on 4 new settings step functions
Fixed. Added context: Context, expected: str, and -> None type hints to all four new step functions in settings_steps.py. Also added from __future__ import annotations and from behave.runner import Context imports.

#7 — CONTRIBUTORS.md not in alphabetical order
Fixed. Moved Aditya Chhabra to its alphabetically correct position (first) and removed a duplicate Rui Hu entry.

#9plan correct uses undocumented --plan flag
Acknowledged. Both the Behave step and Robot helper already had inline comments documenting this spec drift. Updated the Robot helper comment to be more explicit about the spec-vs-implementation gap. Will track separately.

Fixed (Nits)

#10glob.glob instead of pathlib convention
Fixed. Replaced from glob import glob + glob(f"{ctx.db_path}*") with explicit suffix iteration: for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True). (helper_container_resolve_crash.py:147-149)

#11 — Robot test cases repeat boilerplate
Acknowledged as acceptable for 3 test cases per your own recommendation. No change.

#12 — Feature description awkward line break
Fixed. Reflowed mock get_container() to keep it on the same line. (container_resolve_crash.feature:7-8)

#13 — Redundant TYPE_CHECKING import for Decision
Fixed. Removed the module-level TYPE_CHECKING guard and Decision import. Added Decision to the in-function import from cleveragents.domain.models.core.decision where it's actually used.

Quality Gates

  • nox -e lint: PASS
  • nox -e typecheck: PASS (0 errors)
  • nox -e unit_tests: All PR-related features pass (3 pre-existing failures in unrelated checkpoint/git_tools features)
  • nox -e integration_tests: Container Resolve Crash robot test passes (17 pre-existing failures in unrelated features)
  • nox -e coverage_report: 98.22% (above 97% threshold)

PR title also updated to match commit message.
Closing the pr, all the issues resolved successfully, the pr has been approved by both @hamza.khyari and @hurui200320 .

## Response to @hurui200320 Review (Round 5) Thanks for the thorough review and the approval. All 9 minor findings and 4 nits have been addressed in the amended commit. Here's the summary: ### Fixed (Minor Issues) **#1 — Behave step leaks SQLAlchemy engine** Fixed. The cleanup closure now explicitly captures `uow` and calls `uow.engine.dispose()` before the `MEMORY_ENGINES` pop, matching the Robot helper's pattern. (`container_resolve_crash_steps.py:131`) **#2 — Dead `@then` decorator on shared assertion helper** Fixed. Removed the `@then("cr647- the command should execute successfully")` decorator and renamed the function to `_assert_command_succeeded()` to signal it's a private helper, not a step. Updated all three callers. **#3 — `plan correct` assertions weaker than tree/explain** Fixed. Both Behave and Robot assertions now validate command-specific content: `"revert"` or `"dry"` must appear in the output (case-insensitive), strengthening the regression guard beyond just decision-id presence. (`container_resolve_crash_steps.py:286-289`, `helper_container_resolve_crash.py:187-192`) **#4 — `get_container()` call is a no-op with misleading comment** Fixed. Updated the comment to: `# Pre-initialize the container singleton so CLI commands get DI wiring from the seeded database.` This accurately describes the singleton-warming side effect. **#5 — `Settings.reset()` has no test-only documentation guard** Fixed. Added `.. warning:: **Test use only.** Do not call in production code paths — resetting the singleton mid-flight can cause inconsistent configuration state.` to the docstring. (`settings.py:640-642`) **#6 — Missing type annotations on 4 new settings step functions** Fixed. Added `context: Context`, `expected: str`, and `-> None` type hints to all four new step functions in `settings_steps.py`. Also added `from __future__ import annotations` and `from behave.runner import Context` imports. **#7 — CONTRIBUTORS.md not in alphabetical order** Fixed. Moved `Aditya Chhabra` to its alphabetically correct position (first) and removed a duplicate `Rui Hu` entry. **#9 — `plan correct` uses undocumented `--plan` flag** Acknowledged. Both the Behave step and Robot helper already had inline comments documenting this spec drift. Updated the Robot helper comment to be more explicit about the spec-vs-implementation gap. Will track separately. ### Fixed (Nits) **#10 — `glob.glob` instead of `pathlib` convention** Fixed. Replaced `from glob import glob` + `glob(f"{ctx.db_path}*")` with explicit suffix iteration: `for suffix in ("", "-journal", "-wal", "-shm"): Path(ctx.db_path + suffix).unlink(missing_ok=True)`. (`helper_container_resolve_crash.py:147-149`) **#11 — Robot test cases repeat boilerplate** Acknowledged as acceptable for 3 test cases per your own recommendation. No change. **#12 — Feature description awkward line break** Fixed. Reflowed `mock get_container()` to keep it on the same line. (`container_resolve_crash.feature:7-8`) **#13 — Redundant `TYPE_CHECKING` import for `Decision`** Fixed. Removed the module-level `TYPE_CHECKING` guard and `Decision` import. Added `Decision` to the in-function import from `cleveragents.domain.models.core.decision` where it's actually used. ### Quality Gates - `nox -e lint`: PASS - `nox -e typecheck`: PASS (0 errors) - `nox -e unit_tests`: All PR-related features pass (3 pre-existing failures in unrelated checkpoint/git_tools features) - `nox -e integration_tests`: Container Resolve Crash robot test passes (17 pre-existing failures in unrelated features) - `nox -e coverage_report`: 98.22% (above 97% threshold) PR title also updated to match commit message. Closing the pr, all the issues resolved successfully, the pr has been approved by both @hamza.khyari and @hurui200320 .
aditya scheduled this pull request to auto merge when all checks succeed 2026-03-24 09:21:13 +00:00
aditya canceled auto merging this pull request when all checks succeed 2026-03-24 09:21:40 +00:00
aditya merged commit a854de7e1f into master 2026-03-24 09:25:49 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Reference
cleveragents/cleveragents-core!1053
No description provided.