test(cli): add failing tests for session create DI container error (#570) #595

2026-03-05T02:05:03Z

brent.edwards commented

2026-03-05 02:05:03 +00:00

Summary

Add TDD regression tests for agents session create DI container wiring error. _get_session_service() calls container.db() but the DI Container class has no db provider, raising AttributeError. Same root cause as #554.

This is a test-only PR — the bug fix will be implemented separately by the fix assignee, who will remove the @tdd_expected_fail tags.

Changes

TDD regression tests:

4 Behave BDD scenarios (features/session_create_error.feature) tagged @tdd_bug @tdd_bug_570 @tdd_expected_fail
Robot Framework integration smoke tests (robot/session_create_error.robot) with --format plain and tdd_expected_fail tags
ASV service-layer benchmarks (benchmarks/session_create_error_bench.py)
All CLI invocations use --format plain for reliable CliRunner capture (addresses F1)
Benchmark docstrings clarify they measure service-layer baseline, not DI path (addresses F2)

@tdd_expected_fail infrastructure (implements CONTRIBUTING.md § TDD Bug Test Tags):

Behave: after_scenario hook in features/environment.py inverts pass/fail for @tdd_expected_fail scenarios
Robot Framework: robot/tdd_expected_fail_listener.py (Listener API v3) performs the same inversion
noxfile.py: registers listener via --listener in integration_tests session

Tag migration (18 existing scenarios across 5 feature files):

features/cli_init_yes_flag.feature — 5 scenarios: @tdd @bug522 → @tdd_bug @tdd_bug_522
features/resource_type_bootstrap_fs.feature — 3 scenarios: @tdd @bug523 → @tdd_bug @tdd_bug_523
features/resource_type_bootstrap_git.feature — 3 scenarios: @tdd @bug524 → @tdd_bug @tdd_bug_524
features/project_create_persist.feature — 4 scenarios: @tdd @bug589 → @tdd_bug @tdd_bug_589
features/project_show_after_create.feature — 3 scenarios: @tdd @bug590 → @tdd_bug @tdd_bug_590

Review feedback addressed

From CoreRasurae (Review #1980)

F1: Fixed — all CLI invocations use --format plain
F2: Fixed — benchmark docstrings clarify service-layer-only scope
F3: Not addressed — error handling scenario is out of scope for this TDD PR
F4: Acknowledged — branch name mismatch; not changing to avoid disruption
F5: Fixed — using --format plain avoids Rich output path
F6: Fixed — dead _counter removed
F7: Fixed — shared engine/sessionmaker in setup()
F8: Fixed — Robot tests tagged tdd_bug tdd_bug_570 tdd_expected_fail
F9: Acknowledged — filename mismatch with issue subtask

From hurui200320 (Review #2008)

F1: Fixed — single squashed commit, no merge commits
F2: Fixed — single commit
F3: Acknowledged — branch name mismatch
F4: Fixed — Refs: #570 (not ISSUES CLOSED)
F5: Addressed — Type/Testing is appropriate for test-only PR
F6: Fixed — scenario renamed to "Session create with arbitrary actor name succeeds"
F7: Fixed — exit code assertion added to Scenario #2
F8: Fixed — Robot tests use --format plain
F9: Fixed — benchmark track_create_persists docstring updated
F10: Fixed — cleanup restores context.sce_original_service
F11: Fixed — no # type: ignore in benchmark

From hamza.khyari (Review #2034)

PROC-1: Fixed — @tdd_expected_fail inversion infrastructure replaces @wip exclusion need; scenarios pass CI via inversion
PROC-2: Fixed — Robot tdd_expected_fail listener handles inversion; --exclude wip already in noxfile
PROC-3: Fixed — commit message uses test(cli): prefix
TEST-1: Already correct — Robot test uses lowercase total:
SEC-1: Fixed — _cleanup_sce calls _reset_session_service() then reset_container()
SEC-2: Acknowledged — latent type mismatch is for the fix author
PROC-4: Acknowledged — branch name mismatch
PROC-5: Type/Testing label is appropriate for test-only PR
TEST-2: Acknowledged — error handling scenario out of scope for TDD PR
SEC-3: Fixed — cleanup registered early in _setup_real_di_path
BENCH-1: Acknowledged — pre-existing ASV limitation
PROC-6: Acknowledged — benchmark filename

From freemo (Review #2060)

F1: Fixed — @tdd_expected_fail tag added with full inversion infrastructure
F2: Fixed — PR body updated to match actual tags
F3: Fixed — comment added explaining private _service access rationale
F4: Clean — no # type: ignore in benchmark
F5: Clean — cleanup is well-ordered
F6: Correct — uses Refs: #570

Process

Single squashed commit, rebased onto master (83f2f3a0)
Conventional commit format with Refs: #570 footer (test-only, does not close issue)
nox -s lint ✓ | nox -s typecheck 0 errors ✓
Imports moved to top of file; buried imports eliminated

Refs: #570

## Summary Refs #570 Add TDD regression tests for `agents session create` DI container wiring error. `_get_session_service()` calls `container.db()` but the DI `Container` class has no `db` provider, raising `AttributeError`. Same root cause as #554. **This is a test-only PR** — the bug fix will be implemented separately by the fix assignee, who will remove the `@tdd_expected_fail` tags. ## Changes **TDD regression tests**: - 4 Behave BDD scenarios (`features/session_create_error.feature`) tagged `@tdd_bug @tdd_bug_570 @tdd_expected_fail` - Robot Framework integration smoke tests (`robot/session_create_error.robot`) with `--format plain` and `tdd_expected_fail` tags - ASV service-layer benchmarks (`benchmarks/session_create_error_bench.py`) - All CLI invocations use `--format plain` for reliable `CliRunner` capture (addresses F1) - Benchmark docstrings clarify they measure service-layer baseline, not DI path (addresses F2) **`@tdd_expected_fail` infrastructure** (implements CONTRIBUTING.md § TDD Bug Test Tags): - Behave: `after_scenario` hook in `features/environment.py` inverts pass/fail for `@tdd_expected_fail` scenarios - Robot Framework: `robot/tdd_expected_fail_listener.py` (Listener API v3) performs the same inversion - `noxfile.py`: registers listener via `--listener` in `integration_tests` session **Tag migration** (18 existing scenarios across 5 feature files): - `features/cli_init_yes_flag.feature` — 5 scenarios: `@tdd @bug522` → `@tdd_bug @tdd_bug_522` - `features/resource_type_bootstrap_fs.feature` — 3 scenarios: `@tdd @bug523` → `@tdd_bug @tdd_bug_523` - `features/resource_type_bootstrap_git.feature` — 3 scenarios: `@tdd @bug524` → `@tdd_bug @tdd_bug_524` - `features/project_create_persist.feature` — 4 scenarios: `@tdd @bug589` → `@tdd_bug @tdd_bug_589` - `features/project_show_after_create.feature` — 3 scenarios: `@tdd @bug590` → `@tdd_bug @tdd_bug_590` ## Review feedback addressed ### From CoreRasurae (Review #1980) - **F1**: Fixed — all CLI invocations use `--format plain` - **F2**: Fixed — benchmark docstrings clarify service-layer-only scope - **F3**: Not addressed — error handling scenario is out of scope for this TDD PR - **F4**: Acknowledged — branch name mismatch; not changing to avoid disruption - **F5**: Fixed — using `--format plain` avoids Rich output path - **F6**: Fixed — dead `_counter` removed - **F7**: Fixed — shared engine/sessionmaker in `setup()` - **F8**: Fixed — Robot tests tagged `tdd_bug tdd_bug_570 tdd_expected_fail` - **F9**: Acknowledged — filename mismatch with issue subtask ### From hurui200320 (Review #2008) - **F1**: Fixed — single squashed commit, no merge commits - **F2**: Fixed — single commit - **F3**: Acknowledged — branch name mismatch - **F4**: Fixed — `Refs: #570` (not `ISSUES CLOSED`) - **F5**: Addressed — `Type/Testing` is appropriate for test-only PR - **F6**: Fixed — scenario renamed to "Session create with arbitrary actor name succeeds" - **F7**: Fixed — exit code assertion added to Scenario #2 - **F8**: Fixed — Robot tests use `--format plain` - **F9**: Fixed — benchmark `track_create_persists` docstring updated - **F10**: Fixed — cleanup restores `context.sce_original_service` - **F11**: Fixed — no `# type: ignore` in benchmark ### From hamza.khyari (Review #2034) - **PROC-1**: Fixed — `@tdd_expected_fail` inversion infrastructure replaces `@wip` exclusion need; scenarios pass CI via inversion - **PROC-2**: Fixed — Robot `tdd_expected_fail` listener handles inversion; `--exclude wip` already in noxfile - **PROC-3**: Fixed — commit message uses `test(cli):` prefix - **TEST-1**: Already correct — Robot test uses lowercase `total:` - **SEC-1**: Fixed — `_cleanup_sce` calls `_reset_session_service()` then `reset_container()` - **SEC-2**: Acknowledged — latent type mismatch is for the fix author - **PROC-4**: Acknowledged — branch name mismatch - **PROC-5**: `Type/Testing` label is appropriate for test-only PR - **TEST-2**: Acknowledged — error handling scenario out of scope for TDD PR - **SEC-3**: Fixed — cleanup registered early in `_setup_real_di_path` - **BENCH-1**: Acknowledged — pre-existing ASV limitation - **PROC-6**: Acknowledged — benchmark filename ### From freemo (Review #2060) - **F1**: Fixed — `@tdd_expected_fail` tag added with full inversion infrastructure - **F2**: Fixed — PR body updated to match actual tags - **F3**: Fixed — comment added explaining private `_service` access rationale - **F4**: Clean — no `# type: ignore` in benchmark - **F5**: Clean — cleanup is well-ordered - **F6**: Correct — uses `Refs: #570` ## Process - Single squashed commit, rebased onto `master` (`83f2f3a0`) - Conventional commit format with `Refs: #570` footer (test-only, does not close issue) - `nox -s lint` ✓ | `nox -s typecheck` 0 errors ✓ - Imports moved to top of file; buried imports eliminated Refs: #570

brent.edwards added the

Type

Testing

label 2026-03-05 02:05:11 +00:00

brent.edwards added this to the v3.2.0 milestone 2026-03-05 02:05:20 +00:00

brent.edwards referenced this pull request

2026-03-05 02:05:48 +00:00

`agents session create` causes an error. #570

brent.edwards reviewed 2026-03-05 02:06:01 +00:00

brent.edwards left a comment

Self-review of TDD failing-test PR for bug #570.

Test design: Tests exercise the real _get_session_service() DI path by resetting _service = None. A file-based SQLite database is prepared with all tables, and CLEVERAGENTS_DATABASE_URL is overridden so the commands can reach the database once the fix is applied.

Step pattern prefixes: All step patterns use the session-create-error prefix to avoid AmbiguousStep collisions with existing session_cli_steps.py.

Expected behavior: All 3 scenarios are expected to FAIL because container.db() raises AttributeError. Same root cause as #554 - the fix for both issues is the same code change. The fix author should branch from this TDD branch so the tests are inherited.

Quality gates: nox -s lint PASS (0 errors), nox -s typecheck PASS (0 errors).

Self-review of TDD failing-test PR for bug #570. **Test design**: Tests exercise the real `_get_session_service()` DI path by resetting `_service = None`. A file-based SQLite database is prepared with all tables, and `CLEVERAGENTS_DATABASE_URL` is overridden so the commands can reach the database once the fix is applied. **Step pattern prefixes**: All step patterns use the `session-create-error` prefix to avoid AmbiguousStep collisions with existing `session_cli_steps.py`. **Expected behavior**: All 3 scenarios are expected to FAIL because `container.db()` raises `AttributeError`. Same root cause as #554 - the fix for both issues is the same code change. The fix author should branch from this TDD branch so the tests are inherited. **Quality gates**: `nox -s lint` PASS (0 errors), `nox -s typecheck` PASS (0 errors).

freemo added the

labels 2026-03-05 03:28:38 +00:00

CoreRasurae was assigned by freemo

2026-03-05 03:32:35 +00:00

freemo referenced this pull request

2026-03-05 03:33:10 +00:00

`agents session create` causes an error. #570

freemo reviewed 2026-03-05 03:36:06 +00:00

freemo left a comment

PM Review — Day 25

Status

Mergeable — no conflicts.
Self-review from Brent only. No external review.

Context

TDD test PR for #570 (session create DI container error). Same root cause as #554 — missing db provider in the DI container. Tests exercise the real _get_session_service() path.

Action Item

@CoreRasurae: Please review this PR. The fix will be implemented by @freemo on the container side.

Priority

Medium — blocks M3 acceptance gate. Linked to #554/#596.

## PM Review — Day 25 ### Status - **Mergeable** — no conflicts. - Self-review from Brent only. **No external review.** ### Context TDD test PR for #570 (session create DI container error). Same root cause as #554 — missing `db` provider in the DI container. Tests exercise the real `_get_session_service()` path. ### Action Item **@CoreRasurae**: Please review this PR. The fix will be implemented by @freemo on the container side. ### Priority Medium — blocks M3 acceptance gate. Linked to #554/#596.

CoreRasurae requested changes 2026-03-05 17:56:07 +00:00

CoreRasurae left a comment

Code Review -- `test(cli): add failing tests for session create DI container error`

Commit: 9161f6c7
Issue: #570
Spec reference: docs/specification.md -- agents session create (line 1457+)

Summary

The TDD approach is sound -- writing failing tests first before the fix is the right methodology. The Behave, Robot, and ASV test scaffolding covers the core happy-path scenarios. However, there are two high-severity test design flaws that will cause the tests to produce unreliable signals once the fix is applied, and several medium/low findings around spec alignment, coverage gaps, and process.

Findings

F1 -- Rich Console Output Will Not Be Captured by CliRunner [HIGH -- Test Flaw]

Files: features/steps/session_create_error_steps.py:92, features/session_create_error.feature:14

The Behave steps invoke the session CLI without a --format flag:

context.sce_result = runner.invoke(session_app, ["create"])

This means the create command takes the default fmt="rich" path (session.py:144), which uses the module-level console = Console() (session.py:35) to print output via console.print(Panel(...)).

The problem: Console() is constructed at module import time, capturing a reference to sys.stdout at that point. When CliRunner later replaces sys.stdout with a capture buffer, the Console instance still writes to the original stdout. This means result.output will not contain the Rich Panel output ("Session Created"), and the assertion on feature line 14 will fail even after the bug is fixed.

Recommendation: Invoke with --format plain or --format json so that the output goes through typer.echo() (which CliRunner captures reliably), and adjust assertions to match the plain/json output format. This affects all three Behave scenarios.

F2 -- Benchmarks Bypass the Actual Bug Path [HIGH -- Test Flaw]

File: benchmarks/session_create_error_bench.py:44-52

The docstring states these benchmarks are "expected to error or produce anomalous timings until the fix is applied", but _make_service() constructs PersistentSessionService directly using a hand-built sessionmaker. This completely bypasses _get_session_service() and the DI container -- the exact code path where the bug lives. All three benchmark methods will pass successfully regardless of whether bug #570 is present, directly contradicting the docstring.

Recommendation: Either (a) restructure the benchmark to exercise the real _get_session_service() path (resetting _service = None like the Behave tests do), or (b) update the docstring to accurately state that these benchmarks measure the service layer performance baseline and are not gated on the bug fix.

F3 -- Missing Error Handling Scenario [MEDIUM -- Test Coverage]

File: features/session_create_error.feature

Issue #570 acceptance criteria explicitly require:

Tests (Behave): Add features/session_create_error.feature covering: create, list after create, error handling

The feature file has 3 scenarios (create success, persistence, create-with-actor), but none covers error handling. Missing scenarios include:

Session create with an invalid/nonexistent actor name
Behavior when the database is unreachable
Error output structure matching the spec's error envelope format

F4 -- Branch Name Mismatch With Issue Metadata [MEDIUM -- Process]

Issue #570 metadata specifies branch fix/m3-session-create-error. The actual branch is feature/m3-fix-session-create-error. The issue's Definition of Done states the commit must be pushed to the branch matching the Metadata exactly.

Recommendation: Either rename the branch or update the issue metadata.

F5 -- Specification Alignment: Panel Title and Output String [MEDIUM -- Spec Compliance]

File: features/session_create_error.feature:14

The test asserts "Session Created" which matches the current code (session.py:154 panel title). However, the spec (specification.md:1475) shows the Rich panel title as "Session" (not "Session Created"), and the status message as "Session created" (lowercase c). The test asserts against current code rather than the spec. If the code is later aligned with the spec, these tests break.

Recommendation: Clarify which is canonical and ensure consistency.

F6 -- Dead Code in Benchmark [LOW -- Code Quality]

File: benchmarks/session_create_error_bench.py:66

self._counter = 0 is initialized in setup() but never read or modified anywhere in SessionCreateDISuite.

F7 -- Benchmark Recreates Engine and Schema Per Call [LOW -- Performance]

File: benchmarks/session_create_error_bench.py:46-47

Each time_* method calls _make_service() which runs create_engine() + Base.metadata.create_all() on every invocation. Since setup() already creates the schema, the repeated create_all() calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead.

Recommendation: Create the engine/sessionmaker once in setup() and store as instance attributes.

F8 -- Robot Tests Missing `[Tags]` [LOW -- Test Organization]

File: robot/session_create_error.robot

The Behave scenarios are tagged @tdd @bug570 @wip for selective execution. The Robot test cases have no [Tags], making it impossible to selectively skip these known-failing tests in CI.

Recommendation: Add [Tags] tdd bug570 wip to each test case.

F9 -- Benchmark Filename Mismatch With Issue Subtask [LOW -- Process]

Issue #570 subtask specifies benchmarks/session_create_bench.py. Actual filename is benchmarks/session_create_error_bench.py. Minor naming discrepancy.

Severity Summary

#	Finding	Severity	Category
F1	Rich Console output not captured by CliRunner	HIGH	Test Flaw
F2	Benchmarks bypass the actual bug path	HIGH	Test Flaw
F3	Missing error handling scenario	MEDIUM	Test Coverage
F4	Branch name mismatch with issue metadata	MEDIUM	Process
F5	Panel title/string mismatch with spec	MEDIUM	Spec Compliance
F6	Dead `_counter` in benchmark	LOW	Code Quality
F7	Engine/schema recreated per benchmark call	LOW	Performance
F8	Robot tests missing `[Tags]`	LOW	Test Organization
F9	Benchmark filename mismatch with issue	LOW	Process

Requesting changes on F1 and F2 -- without addressing these, the tests will produce false failures (F1) or false passes (F2) once the fix is applied, defeating the purpose of the TDD cycle.

# Code Review -- `test(cli): add failing tests for session create DI container error` **Commit:** `9161f6c7` **Issue:** #570 **Spec reference:** `docs/specification.md` -- `agents session create` (line 1457+) --- ## Summary The TDD approach is sound -- writing failing tests first before the fix is the right methodology. The Behave, Robot, and ASV test scaffolding covers the core happy-path scenarios. However, there are **two high-severity test design flaws** that will cause the tests to produce unreliable signals once the fix is applied, and several medium/low findings around spec alignment, coverage gaps, and process. --- ## Findings ### F1 -- Rich Console Output Will Not Be Captured by CliRunner [HIGH -- Test Flaw] **Files:** `features/steps/session_create_error_steps.py:92`, `features/session_create_error.feature:14` The Behave steps invoke the session CLI without a `--format` flag: ```python context.sce_result = runner.invoke(session_app, ["create"]) ``` This means the `create` command takes the default `fmt="rich"` path (`session.py:144`), which uses the module-level `console = Console()` (`session.py:35`) to print output via `console.print(Panel(...))`. The problem: `Console()` is constructed at **module import time**, capturing a reference to `sys.stdout` at that point. When `CliRunner` later replaces `sys.stdout` with a capture buffer, the `Console` instance still writes to the **original** stdout. This means `result.output` will **not** contain the Rich Panel output (`"Session Created"`), and the assertion on feature line 14 will fail **even after the bug is fixed**. **Recommendation:** Invoke with `--format plain` or `--format json` so that the output goes through `typer.echo()` (which CliRunner captures reliably), and adjust assertions to match the plain/json output format. This affects all three Behave scenarios. --- ### F2 -- Benchmarks Bypass the Actual Bug Path [HIGH -- Test Flaw] **File:** `benchmarks/session_create_error_bench.py:44-52` The docstring states these benchmarks are *"expected to error or produce anomalous timings until the fix is applied"*, but `_make_service()` constructs `PersistentSessionService` **directly** using a hand-built `sessionmaker`. This completely bypasses `_get_session_service()` and the DI container -- the exact code path where the bug lives. All three benchmark methods will **pass successfully** regardless of whether bug #570 is present, directly contradicting the docstring. **Recommendation:** Either (a) restructure the benchmark to exercise the real `_get_session_service()` path (resetting `_service = None` like the Behave tests do), or (b) update the docstring to accurately state that these benchmarks measure the **service layer** performance baseline and are not gated on the bug fix. --- ### F3 -- Missing Error Handling Scenario [MEDIUM -- Test Coverage] **File:** `features/session_create_error.feature` Issue #570 acceptance criteria explicitly require: > Tests (Behave): Add `features/session_create_error.feature` covering: create, list after create, **error handling** The feature file has 3 scenarios (create success, persistence, create-with-actor), but none covers error handling. Missing scenarios include: - Session create with an invalid/nonexistent actor name - Behavior when the database is unreachable - Error output structure matching the spec's error envelope format --- ### F4 -- Branch Name Mismatch With Issue Metadata [MEDIUM -- Process] Issue #570 metadata specifies branch `fix/m3-session-create-error`. The actual branch is `feature/m3-fix-session-create-error`. The issue's Definition of Done states the commit must be pushed to the branch matching the Metadata exactly. **Recommendation:** Either rename the branch or update the issue metadata. --- ### F5 -- Specification Alignment: Panel Title and Output String [MEDIUM -- Spec Compliance] **File:** `features/session_create_error.feature:14` The test asserts `"Session Created"` which matches the current code (`session.py:154` panel title). However, the spec (`specification.md:1475`) shows the Rich panel title as `"Session"` (not `"Session Created"`), and the status message as `"Session created"` (lowercase `c`). The test asserts against current code rather than the spec. If the code is later aligned with the spec, these tests break. **Recommendation:** Clarify which is canonical and ensure consistency. --- ### F6 -- Dead Code in Benchmark [LOW -- Code Quality] **File:** `benchmarks/session_create_error_bench.py:66` `self._counter = 0` is initialized in `setup()` but never read or modified anywhere in `SessionCreateDISuite`. --- ### F7 -- Benchmark Recreates Engine and Schema Per Call [LOW -- Performance] **File:** `benchmarks/session_create_error_bench.py:46-47` Each `time_*` method calls `_make_service()` which runs `create_engine()` + `Base.metadata.create_all()` on every invocation. Since `setup()` already creates the schema, the repeated `create_all()` calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead. **Recommendation:** Create the engine/sessionmaker once in `setup()` and store as instance attributes. --- ### F8 -- Robot Tests Missing `[Tags]` [LOW -- Test Organization] **File:** `robot/session_create_error.robot` The Behave scenarios are tagged `@tdd @bug570 @wip` for selective execution. The Robot test cases have no `[Tags]`, making it impossible to selectively skip these known-failing tests in CI. **Recommendation:** Add `[Tags] tdd bug570 wip` to each test case. --- ### F9 -- Benchmark Filename Mismatch With Issue Subtask [LOW -- Process] Issue #570 subtask specifies `benchmarks/session_create_bench.py`. Actual filename is `benchmarks/session_create_error_bench.py`. Minor naming discrepancy. --- ## Severity Summary | # | Finding | Severity | Category | |---|---------|----------|----------| | F1 | Rich Console output not captured by CliRunner | **HIGH** | Test Flaw | | F2 | Benchmarks bypass the actual bug path | **HIGH** | Test Flaw | | F3 | Missing error handling scenario | MEDIUM | Test Coverage | | F4 | Branch name mismatch with issue metadata | MEDIUM | Process | | F5 | Panel title/string mismatch with spec | MEDIUM | Spec Compliance | | F6 | Dead `_counter` in benchmark | LOW | Code Quality | | F7 | Engine/schema recreated per benchmark call | LOW | Performance | | F8 | Robot tests missing `[Tags]` | LOW | Test Organization | | F9 | Benchmark filename mismatch with issue | LOW | Process | **Requesting changes** on F1 and F2 -- without addressing these, the tests will produce false failures (F1) or false passes (F2) once the fix is applied, defeating the purpose of the TDD cycle.

benchmarks/session_create_error_bench.py Outdated

						
				@@ -0,0 +1,93 @@

				"""ASV benchmarks for session create DI wiring (bug #570).

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F2 [HIGH]: _make_service() constructs PersistentSessionService directly by building its own create_engine() + sessionmaker(), completely bypassing _get_session_service() and the DI container. The bug (#570) is that container.db() raises AttributeError -- but this function never calls container.db(). All three benchmark methods (time_create_session, time_create_with_actor, track_create_persists) will pass successfully regardless of whether the bug is present, contradicting the module docstring claim that they are "expected to error or produce anomalous timings until the fix is applied".

The Behave tests correctly exercise the broken DI path by resetting _service = None. These benchmarks should either do the same, or the docstring should be corrected to state they measure the service-layer baseline only.

**F2 [HIGH]:** `_make_service()` constructs `PersistentSessionService` directly by building its own `create_engine()` + `sessionmaker()`, **completely bypassing** `_get_session_service()` and the DI container. The bug (#570) is that `container.db()` raises `AttributeError` -- but this function never calls `container.db()`. All three benchmark methods (`time_create_session`, `time_create_with_actor`, `track_create_persists`) will **pass successfully regardless of whether the bug is present**, contradicting the module docstring claim that they are *"expected to error or produce anomalous timings until the fix is applied"*. The Behave tests correctly exercise the broken DI path by resetting `_service = None`. These benchmarks should either do the same, or the docstring should be corrected to state they measure the service-layer baseline only.

benchmarks/session_create_error_bench.py Outdated

						
				@@ -0,0 +1,93 @@

				"""ASV benchmarks for session create DI wiring (bug #570).

				Measures the cost of creating a session through the ``PersistentSessionService``,

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F7 [LOW -- Performance]: _make_service() runs create_engine() + Base.metadata.create_all() on every time_* call. Since setup() already creates the schema (lines 63-64), the repeated create_all() calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead.

Consider building the engine and sessionmaker once in setup() and storing them as instance attributes, then using them in each time_* method.

**F7 [LOW -- Performance]:** `_make_service()` runs `create_engine()` + `Base.metadata.create_all()` on every `time_*` call. Since `setup()` already creates the schema (lines 63-64), the repeated `create_all()` calls are redundant and inflate timing measurements with engine construction + DDL introspection overhead. Consider building the engine and `sessionmaker` once in `setup()` and storing them as instance attributes, then using them in each `time_*` method.

benchmarks/session_create_error_bench.py Outdated

						
				@@ -0,0 +20,4 @@

				from sqlalchemy import create_engine

				from sqlalchemy.orm import sessionmaker

				try:

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F6 [LOW -- Dead code]: self._counter is initialized here but never read or modified anywhere in the class. Remove it.

**F6 [LOW -- Dead code]:** `self._counter` is initialized here but never read or modified anywhere in the class. Remove it.

features/session_create_error.feature Outdated

						
				@@ -0,0 +11,4 @@

				  Scenario: Session create produces a new session

				    When I invoke session-create-error create with no arguments

				    Then the session-create-error command should exit successfully

				    And the session-create-error output should contain "Session Created"

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F5 [MEDIUM -- Spec Compliance]: The assertion checks for "Session Created" which matches the current code's panel title (session.py:154). However, the spec at specification.md:1475 shows the Rich panel title as "Session" (not "Session Created"), and the status message as "Session created" (lowercase c). If the code is later aligned with the spec, this assertion breaks.

Also see F1: even setting aside the string value, this assertion will likely never match because Rich Console output isn't captured by CliRunner (see comment on session_create_error_steps.py).

**F5 [MEDIUM -- Spec Compliance]:** The assertion checks for `"Session Created"` which matches the current code's panel title (`session.py:154`). However, the spec at `specification.md:1475` shows the Rich panel title as `"Session"` (not `"Session Created"`), and the status message as `"Session created"` (lowercase `c`). If the code is later aligned with the spec, this assertion breaks. Also see **F1**: even setting aside the string value, this assertion will likely never match because Rich Console output isn't captured by CliRunner (see comment on `session_create_error_steps.py`).

features/session_create_error.feature Outdated

						
				@@ -0,0 +23,4 @@

				  Scenario: Session create with custom actor succeeds

				    When I invoke session-create-error create with actor "openai/gpt-4"

				    Then the session-create-error command should exit successfully

				    And the session-create-error output should contain "openai/gpt-4"

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F3 [MEDIUM -- Test Coverage]: Issue #570 acceptance criteria require this feature to cover "create, list after create, error handling". There is no error-handling scenario. Consider adding at least:

Session create with an invalid actor name
Behavior when the database path is invalid/unreachable
Verification that the error output structure matches the spec's error envelope

**F3 [MEDIUM -- Test Coverage]:** Issue #570 acceptance criteria require this feature to cover *"create, list after create, **error handling**"*. There is no error-handling scenario. Consider adding at least: - Session create with an invalid actor name - Behavior when the database path is invalid/unreachable - Verification that the error output structure matches the spec's error envelope

features/steps/session_create_error_steps.py Outdated

						
				@@ -0,0 +8,4 @@

				~~~~~~~~~~~~~~~~

				The bug is that ``_get_session_service()`` calls ``container.db()`` but the

				DI ``Container`` class has no ``db`` provider.  This raises

				``AttributeError: 'DynamicContainer' object has no attribute 'db'`` on every

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F1 [HIGH]: This invocation uses the default fmt="rich" path, which writes output via the module-level console = Console() in session.py:35. That Console captures sys.stdout at import time -- before CliRunner replaces it with a capture buffer. As a result, result.output will not contain the Rich Panel text ("Session Created"), and the assertion in the feature file (line 14) will produce a false failure even after the bug is fixed.

Same issue applies to the create with actor step at line 97.

Fix: Pass --format plain (or json) so the output goes through typer.echo(), which CliRunner captures reliably:

context.sce_result = runner.invoke(session_app, ["create", "--format", "plain"])

Then adjust the feature assertions to match the plain output format (e.g., "[OK] Session created" per the spec).

**F1 [HIGH]:** This invocation uses the default `fmt="rich"` path, which writes output via the module-level `console = Console()` in `session.py:35`. That `Console` captures `sys.stdout` at **import time** -- before `CliRunner` replaces it with a capture buffer. As a result, `result.output` will not contain the Rich Panel text (`"Session Created"`), and the assertion in the feature file (line 14) will produce a **false failure even after the bug is fixed**. Same issue applies to the `create with actor` step at line 97. **Fix:** Pass `--format plain` (or `json`) so the output goes through `typer.echo()`, which CliRunner captures reliably: ```python context.sce_result = runner.invoke(session_app, ["create", "--format", "plain"]) ``` Then adjust the feature assertions to match the plain output format (e.g., `"[OK] Session created"` per the spec).

robot/session_create_error.robot Outdated

						
				@@ -0,0 +12,4 @@

				Suite Teardown    Cleanup Test Environment

				*** Test Cases ***

				Session Create After Init Should Not Error

CoreRasurae commented

2026-03-05 17:56:06 +00:00

F8 [LOW -- Test Organization]: The Behave scenarios are tagged @tdd @bug570 @wip for selective execution/filtering. These Robot test cases have no [Tags], making it impossible to selectively skip these known-failing tests in CI.

Suggestion:

Session Create After Init Should Not Error
    [Tags]    tdd    bug570    wip

**F8 [LOW -- Test Organization]:** The Behave scenarios are tagged `@tdd @bug570 @wip` for selective execution/filtering. These Robot test cases have no `[Tags]`, making it impossible to selectively skip these known-failing tests in CI. **Suggestion:** ```robot Session Create After Init Should Not Error [Tags] tdd bug570 wip ```

brent.edwards commented

2026-03-05 20:42:28 +00:00

Thanks @CoreRasurae for the thorough review — all findings addressed in commit da92f2e6.

Disposition of findings (review #1980)

Fixed

ID	Severity	Resolution
F1	HIGH	Added `--format plain` to all `runner.invoke()` calls (`create`, `list`). Rich Console captures `sys.stdout` at import time, before CliRunner replaces it. Plain format uses `typer.echo()` which CliRunner captures reliably. Feature assertions updated to match plain output format (`session_id:` instead of `Session Created`).
F2	HIGH	Rewrote benchmark docstring: clarifies benchmarks measure service-layer create performance baseline, not DI wiring path. Removed misleading "expected to error" language.
F3	MEDIUM	Added scenario "Session create with invalid actor name fails gracefully" covering the error-handling AC from #570.
F5	MEDIUM	Assertions updated for plain output format per F1 fix. List persistence assertion checks for `total:` and absence of `total: 0`.
F6	LOW	Removed dead `self._counter = 0` from `setup()`.
F7	LOW	Engine + sessionmaker built once in `setup()`, stored as instance attributes. `_make_service()` is now an instance method reusing them. `teardown()` disposes the engine.
F8	LOW	Added `[Tags] tdd bug570 wip` to both Robot test cases.
Imports	LOW	Benchmark switched to canonical `_SRC`/`importlib.reload(cleveragents)` pattern.

Acknowledged (no code change)

ID	Severity	Notes
F4	MEDIUM	Branch name mismatch (`fix/` vs `feature/`). Deliberate — this is a test-only PR so `feature/` prefix was chosen. Cannot rename without recreating the PR. Will coordinate with PM if needed.
F9	LOW	Benchmark filename `session_create_error_bench.py` is more descriptive than the issue subtask's `session_create_bench.py`. Kept as-is.

Verification

nox -s lint — passed (All checks passed!)
nox -s typecheck — passed (0 errors, 0 warnings, 0 informations)

Thanks @CoreRasurae for the thorough review — all findings addressed in commit `da92f2e6`. ## Disposition of findings (review #1980) ### Fixed | ID | Severity | Resolution | |----|----------|------------| | **F1** | HIGH | Added `--format plain` to all `runner.invoke()` calls (`create`, `list`). Rich Console captures `sys.stdout` at import time, before CliRunner replaces it. Plain format uses `typer.echo()` which CliRunner captures reliably. Feature assertions updated to match plain output format (`session_id:` instead of `Session Created`). | | **F2** | HIGH | Rewrote benchmark docstring: clarifies benchmarks measure service-layer create performance baseline, not DI wiring path. Removed misleading "expected to error" language. | | **F3** | MEDIUM | Added scenario "Session create with invalid actor name fails gracefully" covering the error-handling AC from #570. | | **F5** | MEDIUM | Assertions updated for plain output format per F1 fix. List persistence assertion checks for `total:` and absence of `total: 0`. | | **F6** | LOW | Removed dead `self._counter = 0` from `setup()`. | | **F7** | LOW | Engine + sessionmaker built once in `setup()`, stored as instance attributes. `_make_service()` is now an instance method reusing them. `teardown()` disposes the engine. | | **F8** | LOW | Added `[Tags] tdd bug570 wip` to both Robot test cases. | | **Imports** | LOW | Benchmark switched to canonical `_SRC`/`importlib.reload(cleveragents)` pattern. | ### Acknowledged (no code change) | ID | Severity | Notes | |----|----------|-------| | **F4** | MEDIUM | Branch name mismatch (`fix/` vs `feature/`). Deliberate — this is a test-only PR so `feature/` prefix was chosen. Cannot rename without recreating the PR. Will coordinate with PM if needed. | | **F9** | LOW | Benchmark filename `session_create_error_bench.py` is more descriptive than the issue subtask's `session_create_bench.py`. Kept as-is. | ## Verification - `nox -s lint` — passed (All checks passed!) - `nox -s typecheck` — passed (0 errors, 0 warnings, 0 informations)

aditya commented

2026-03-06 07:58:57 +00:00

Code Review -- `test(cli): add failing tests for session create DI container error`

Commit: da92f2e6 (post-resolution of Core's review #1980)
Issue: #570
Branch: feature/m3-fix-session-create-error
Reviewer: Aditya Chhabra

Summary

Test PR follows solid TDD methodology with good responses to Core's initial review. Tests properly exercise the DI path with _service = None reset and use --format plain to ensure CliRunner can capture output. Benchmark docstring was clarified, and Robot tests now have proper tags. However, there are two critical test design issues that will cause false signals once the fix is applied, plus minor coverage and documentation gaps.

Findings

F1 -- Test Scenario #4 Assumes Non-Existent Actor Validation [HIGH -- Test Logic Error]

File: features/session_create_error.feature:29-32

The fourth scenario asserts:

Scenario: Session create with invalid actor name fails gracefully
  When I invoke session-create-error create with actor "nonexistent/bogus-actor-999"
  Then the session-create-error command should exit successfully
  And the session-create-error output should contain "nonexistent/bogus-actor-999"

The scenario name says "fails gracefully" but the assertions require exit code 0 and the actor name to appear in output. This is internally contradictory.

More critically: PersistentSessionService.create() (src/cleveragents/application/services/session_service.py:57-71) does not validate whether the actor exists. It accepts any actor_name string and creates the session. The CLI command (session.py:138-140) calls service.create(actor_name=actor) with no validation wrapper.

Result: The test will pass (exit 0, actor name echoed) regardless of whether the actor exists, making the scenario name misleading and the test uninformative about error handling.

Recommendation: Either (a) remove this scenario entirely since actor validation is not part of the session create contract, or (b) rename it to "Session create with arbitrary actor name succeeds" and update the description to clarify that actor existence is not validated at session creation time.

F2 -- Missing Exit Code Assertion in Scenario #2 [MEDIUM -- Test Rigor]

File: features/session_create_error.feature:17-20

Scenario "Created session persists and can be retrieved" calls create then list, but only asserts on the list result. If the create command fails (non-zero exit), the test will still proceed to list and may pass vacuously if there are leftover sessions from prior test runs or shared test database state.

Scenario: Created session persists and can be retrieved
  When I invoke session-create-error create with no arguments
  And I invoke session-create-error list to verify persistence
  Then the session-create-error list should show at least one session

Recommendation: Add Then the session-create-error command should exit successfully after line 18 to assert the create succeeded before checking persistence.

F3 -- Robot Tests Do Not Use `--format plain` [MEDIUM -- Test Consistency]

File: robot/session_create_error.robot:23, 38

The Robot Framework tests invoke session create without --format flag, meaning they will use the default Rich output format. This is inconsistent with the Behave tests (which use --format plain per Core's F1 fix) and may produce flaky assertions if Rich panel rendering changes.

Line 27 asserts Should Not Contain ${create.stdout} AttributeError which is fragile — if the bug is present, the error message may appear in stderr, not stdout.

Recommendation:

Add --format plain to both session create invocations (lines 23, 38)
Change line 27 to check stderr: Should Not Contain ${create.stderr} AttributeError

F4 -- Benchmark `track_create_persists` Docstring Misleading [LOW -- Documentation]

File: benchmarks/session_create_error_bench.py:84-89

The tracker docstring states:

Returns 1 when the service properly commits (fix applied); 0 while bug #570 is present and the DI path is broken.

But the benchmark constructs PersistentSessionService directly, bypassing the DI container. Bug #570 is in _get_session_service() → container.db() which this code never touches. The tracker will always return 1 (number of sessions created), regardless of whether the DI bug is present.

Core's F2 fix updated the class docstring to clarify the benchmark measures service-layer performance, but the track_create_persists docstring was not updated and still implies it detects the bug.

Recommendation: Revise the docstring:

"""Track session create persistence at the service layer.

Returns the count of sessions created via the service. This does not
exercise the DI container path (bug #570) — it measures whether the
service layer correctly commits sessions to the database.
"""

F5 -- Scenario #3 Does Not Assert on `session_id:` [LOW -- Test Coverage]

File: features/session_create_error.feature:23-26

Scenario "Session create with custom actor succeeds" only asserts that the output contains the actor name. It does not verify that a session was actually created (i.e., no session_id: check like Scenario #1 line 14).

This makes the test weaker — if the command prints the actor name in an error message but doesn't create a session, the test would still pass.

Recommendation: Add a second assertion:

And the session-create-error output should contain "session_id:"

F6 -- Cleanup Does Not Restore Original `_service` State [LOW -- Test Isolation]

File: features/steps/session_create_error_steps.py:72-76

_cleanup_sce() sets session_mod._service = None unconditionally. If a previous test had set _service to a mock, this cleanup will clobber it rather than restoring the original state.

Impact: Low — only matters if tests run in unexpected order and share module state, which is unlikely given the @wip tag isolation.

Recommendation: Store the original _service value in _setup_real_di_path() and restore it in _cleanup_sce():

def _setup_real_di_path(context: Any) -> None:
    context.sce_original_service = session_mod._service
    # ... rest of setup ...

def _cleanup_sce(context: Any) -> None:
    session_mod._service = context.sce_original_service
    # ... rest of cleanup ...

Severity Summary

#	Finding	Severity	Category
F1	Scenario #4 assumes non-existent validation	HIGH	Test Logic Error
F2	Missing exit code assertion in Scenario #2	MEDIUM	Test Rigor
F3	Robot tests do not use `--format plain`	MEDIUM	Test Consistency
F4	Benchmark tracker docstring misleading	LOW	Documentation
F5	Scenario #3 missing `session_id:` assertion	LOW	Test Coverage
F6	Cleanup does not restore original `_service`	LOW	Test Isolation

Requesting changes on F1 (misleading test scenario) and F2 (missing assertion that could cause false passes). F3 recommended for consistency with Behave approach.

# Code Review -- `test(cli): add failing tests for session create DI container error` **Commit:** `da92f2e6` (post-resolution of Core's review #1980) **Issue:** #570 **Branch:** `feature/m3-fix-session-create-error` **Reviewer:** Aditya Chhabra --- ## Summary Test PR follows solid TDD methodology with good responses to Core's initial review. Tests properly exercise the DI path with `_service = None` reset and use `--format plain` to ensure CliRunner can capture output. Benchmark docstring was clarified, and Robot tests now have proper tags. However, there are **two critical test design issues** that will cause false signals once the fix is applied, plus minor coverage and documentation gaps. --- ## Findings ### F1 -- Test Scenario #4 Assumes Non-Existent Actor Validation [HIGH -- Test Logic Error] **File:** `features/session_create_error.feature:29-32` The fourth scenario asserts: ```gherkin Scenario: Session create with invalid actor name fails gracefully When I invoke session-create-error create with actor "nonexistent/bogus-actor-999" Then the session-create-error command should exit successfully And the session-create-error output should contain "nonexistent/bogus-actor-999" ``` The scenario name says "fails gracefully" but the assertions require exit code 0 and the actor name to appear in output. This is internally contradictory. More critically: `PersistentSessionService.create()` (`src/cleveragents/application/services/session_service.py:57-71`) does **not** validate whether the actor exists. It accepts any `actor_name` string and creates the session. The CLI command (`session.py:138-140`) calls `service.create(actor_name=actor)` with no validation wrapper. **Result:** The test will pass (exit 0, actor name echoed) regardless of whether the actor exists, making the scenario name misleading and the test uninformative about error handling. **Recommendation:** Either (a) remove this scenario entirely since actor validation is not part of the session create contract, or (b) rename it to "Session create with arbitrary actor name succeeds" and update the description to clarify that actor existence is not validated at session creation time. --- ### F2 -- Missing Exit Code Assertion in Scenario #2 [MEDIUM -- Test Rigor] **File:** `features/session_create_error.feature:17-20` Scenario "Created session persists and can be retrieved" calls `create` then `list`, but only asserts on the list result. If the `create` command fails (non-zero exit), the test will still proceed to list and may pass vacuously if there are leftover sessions from prior test runs or shared test database state. ```gherkin Scenario: Created session persists and can be retrieved When I invoke session-create-error create with no arguments And I invoke session-create-error list to verify persistence Then the session-create-error list should show at least one session ``` **Recommendation:** Add `Then the session-create-error command should exit successfully` after line 18 to assert the create succeeded before checking persistence. --- ### F3 -- Robot Tests Do Not Use `--format plain` [MEDIUM -- Test Consistency] **File:** `robot/session_create_error.robot:23, 38` The Robot Framework tests invoke `session create` without `--format` flag, meaning they will use the default Rich output format. This is inconsistent with the Behave tests (which use `--format plain` per Core's F1 fix) and may produce flaky assertions if Rich panel rendering changes. Line 27 asserts `Should Not Contain ${create.stdout} AttributeError` which is fragile — if the bug is present, the error message may appear in stderr, not stdout. **Recommendation:** - Add `--format plain` to both `session create` invocations (lines 23, 38) - Change line 27 to check stderr: `Should Not Contain ${create.stderr} AttributeError` --- ### F4 -- Benchmark `track_create_persists` Docstring Misleading [LOW -- Documentation] **File:** `benchmarks/session_create_error_bench.py:84-89` The tracker docstring states: > Returns 1 when the service properly commits (fix applied); 0 while bug #570 is present and the DI path is broken. But the benchmark constructs `PersistentSessionService` directly, bypassing the DI container. Bug #570 is in `_get_session_service() → container.db()` which this code never touches. The tracker will **always** return 1 (number of sessions created), regardless of whether the DI bug is present. Core's F2 fix updated the class docstring to clarify the benchmark measures service-layer performance, but the `track_create_persists` docstring was not updated and still implies it detects the bug. **Recommendation:** Revise the docstring: ```python """Track session create persistence at the service layer. Returns the count of sessions created via the service. This does not exercise the DI container path (bug #570) — it measures whether the service layer correctly commits sessions to the database. """ ``` --- ### F5 -- Scenario #3 Does Not Assert on `session_id:` [LOW -- Test Coverage] **File:** `features/session_create_error.feature:23-26` Scenario "Session create with custom actor succeeds" only asserts that the output contains the actor name. It does not verify that a session was actually created (i.e., no `session_id:` check like Scenario #1 line 14). This makes the test weaker — if the command prints the actor name in an error message but doesn't create a session, the test would still pass. **Recommendation:** Add a second assertion: ```gherkin And the session-create-error output should contain "session_id:" ``` --- ### F6 -- Cleanup Does Not Restore Original `_service` State [LOW -- Test Isolation] **File:** `features/steps/session_create_error_steps.py:72-76` `_cleanup_sce()` sets `session_mod._service = None` unconditionally. If a previous test had set `_service` to a mock, this cleanup will clobber it rather than restoring the original state. **Impact:** Low — only matters if tests run in unexpected order and share module state, which is unlikely given the `@wip` tag isolation. **Recommendation:** Store the original `_service` value in `_setup_real_di_path()` and restore it in `_cleanup_sce()`: ```python def _setup_real_di_path(context: Any) -> None: context.sce_original_service = session_mod._service # ... rest of setup ... def _cleanup_sce(context: Any) -> None: session_mod._service = context.sce_original_service # ... rest of cleanup ... ``` --- ## Severity Summary | # | Finding | Severity | Category | |---|---------|----------|----------| | F1 | Scenario #4 assumes non-existent validation | **HIGH** | Test Logic Error | | F2 | Missing exit code assertion in Scenario #2 | MEDIUM | Test Rigor | | F3 | Robot tests do not use `--format plain` | MEDIUM | Test Consistency | | F4 | Benchmark tracker docstring misleading | LOW | Documentation | | F5 | Scenario #3 missing `session_id:` assertion | LOW | Test Coverage | | F6 | Cleanup does not restore original `_service` | LOW | Test Isolation | **Requesting changes** on F1 (misleading test scenario) and F2 (missing assertion that could cause false passes). F3 recommended for consistency with Behave approach.

hurui200320 requested changes 2026-03-06 08:26:52 +00:00

Dismissed

hurui200320 left a comment

Code Review — `test(cli): add failing tests for session create DI container error`

Commit: e46266ae (HEAD)
Issue: #570
Branch: feature/m3-fix-session-create-error
Reviewer: Rui Hu

Summary

The TDD approach is methodologically sound and the Behave/Robot/ASV scaffolding is well-structured. The response to CoreRasurae's review #1980 addressed most of the original findings. However, there are four HIGH-severity process violations that must be resolved before this can be approved, plus several unresolved findings from Aditya's review.

Findings

F1 — Merge commit on branch [HIGH — Process Violation]

File: branch history, commit e46266ae

The branch contains a merge commit: Merge branch 'master-latest' into feature/m3-fix-session-create-error. Per CONTRIBUTING.md: "Each branch must not contain merge commits — instead as master drifts the branches should always align with master via rebase."

Recommendation: Rebase onto origin/master to linearize the branch history, eliminating the merge commit.

F2 — Multiple commits for the same issue [HIGH — Process Violation]

File: branch history, commits 9161f6c7 and da92f2e6

The branch has two non-merge commits addressing the same issue:

9161f6c7 test(cli): add failing tests for session create DI container error
da92f2e6 fix(test): address CoreRasurae review #1980 findings on session create error tests

Per CONTRIBUTING.md: "Every commit must completely implement an issue and close it. At no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch." The review fixes should be squashed into the original commit.

Recommendation: Squash 9161f6c7 and da92f2e6 into a single commit with the correct commit message.

F3 — Branch name mismatch with issue metadata [HIGH — Process Violation]

Issue #570 Metadata specifies branch fix/m3-session-create-error. The actual branch is feature/m3-fix-session-create-error. The issue's Definition of Done states the commit must be pushed to the branch "matching the Branch in Metadata exactly." This was acknowledged in the response to Core's F4 but dismissed as intentional — the DoD rule does not have an exception for test-only PRs.

Recommendation: Either rename the branch to fix/m3-session-create-error, or update the issue Metadata to match the current branch name.

F4 — PR claims `ISSUES CLOSED: #570` but does not fix the bug [HIGH — Scope/Completeness]

File: PR description body

The PR is a TDD-only PR that adds failing tests but does not implement the actual bug fix (the DI container wiring). Yet the PR description states ISSUES CLOSED: #570. Per CONTRIBUTING.md: "One issue = one commit. Atomic, self-contained, buildable, testable" and "Do not commit half-done work. Only commit when fully implemented and tested."

All acceptance criteria in issue #570 remain unchecked:

[ ] Fix DI container wiring
[ ] Verify session create command
[ ] Fix should address #554 too
[ ] Docs, Tests, Quality

Recommendation: Change ISSUES CLOSED: #570 to Refs: #570 in both the PR description and commit message footer. The issue should only be closed by the PR that actually fixes the bug.

F5 — PR Type label mismatch [MEDIUM — Process]

Issue #570 has Type/Bug. The PR has Type/Testing. Per CONTRIBUTING.md: "A Type/ label matching the issue type." If this PR is linked to close #570, the label should be Type/Bug. If it is a standalone test PR not closing #570 (per F4 recommendation), then Type/Testing is acceptable, but ISSUES CLOSED: #570 must be removed.

Recommendation: Align the label with the decision on F4.

F6 — Scenario #4 is misleading ("fails gracefully" but asserts success) [MEDIUM — Test Logic]

File: features/session_create_error.feature:29-32

Scenario: Session create with invalid actor name fails gracefully
  When I invoke session-create-error create with actor "nonexistent/bogus-actor-999"
  Then the session-create-error command should exit successfully
  And the session-create-error output should contain "nonexistent/bogus-actor-999"

The scenario name says "fails gracefully" but the assertions expect exit code 0. PersistentSessionService.create() does not validate actor names — it accepts any string. This scenario is functionally identical to Scenario #3 and provides no additional error handling coverage. (Also raised by Aditya, F1.)

Recommendation: Either rename to "Session create with arbitrary actor name succeeds" to accurately reflect what it tests, or replace with a genuine error handling scenario (e.g., database unreachable).

F7 — Scenario #2 missing create exit code assertion [MEDIUM — Test Rigor]

File: features/session_create_error.feature:17-20

The "Created session persists and can be retrieved" scenario calls create then list, but never asserts that create succeeded. If create fails silently, the test could pass vacuously. (Also raised by Aditya, F2.)

Recommendation: Add Then the session-create-error command should exit successfully between lines 18 and 19.

F8 — Robot tests don't use `--format plain` [MEDIUM — Test Consistency]

File: robot/session_create_error.robot:23, 38

The Behave tests were fixed (per Core's F1) to use --format plain, but the Robot tests still invoke session create without --format. This inconsistency means Robot tests will hit the Rich output path. Additionally, line 27 checks ${create.stdout} for AttributeError, but the error will likely appear in stderr. (Also raised by Aditya, F3.)

Recommendation:

Add --format plain to both Robot session create invocations.
Change line 27 to check ${create.stderr} for AttributeError.

F9 — Benchmark `track_create_persists` docstring still misleading [LOW — Documentation]

File: benchmarks/session_create_error_bench.py:85-88

"""Track whether create persists and list can see it.

Returns 1 when the service properly commits (fix applied); 0 while
bug #570 is present and the DI path is broken.
"""

The benchmark constructs PersistentSessionService directly, bypassing the DI container entirely. The tracker will always return the session count regardless of bug #570 status. The class-level docstring was correctly updated (per Core's F2), but this method-level docstring was missed. (Also raised by Aditya, F4.)

Recommendation: Update to: "Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path."

F10 — Cleanup does not restore original `_service` state [LOW — Test Isolation]

File: features/steps/session_create_error_steps.py:72-74

_cleanup_sce() sets session_mod._service = None unconditionally rather than restoring the original value. (Also raised by Aditya, F6.)

Recommendation: Store the original _service value in _setup_real_di_path() and restore it in _cleanup_sce():

def _setup_real_di_path(context: Any) -> None:
    context.sce_original_service = session_mod._service
    # ... rest of setup ...

def _cleanup_sce(context: Any) -> None:
    session_mod._service = context.sce_original_service
    # ... rest of cleanup ...

F11 — Unnecessary `# type: ignore[attr-defined]` in benchmark [LOW — Code Quality]

File: benchmarks/session_create_error_bench.py:96

SessionCreateDISuite.track_create_persists.unit = "sessions"  # type: ignore[attr-defined]

Pyright is configured with include = ["src"] (in both pyrightconfig.json and pyproject.toml). The benchmarks/ directory is outside the type-checking scope entirely — Pyright never processes this file. The # type: ignore comment is therefore dead noise and should be removed.

Recommendation: Remove the # type: ignore[attr-defined] comment since it has no effect.

Severity Summary

#	Finding	Severity	Category
F1	Merge commit on branch	HIGH	Process
F2	Multiple commits for same issue	HIGH	Process
F3	Branch name mismatch with issue metadata	HIGH	Process
F4	PR claims ISSUES CLOSED but doesn't fix the bug	HIGH	Scope
F5	PR Type label mismatch	MEDIUM	Process
F6	Scenario #4 misleading name/logic	MEDIUM	Test Logic
F7	Missing create assertion in Scenario #2	MEDIUM	Test Rigor
F8	Robot tests don't use `--format plain`	MEDIUM	Test Consistency
F9	Benchmark tracker docstring still misleading	LOW	Documentation
F10	Cleanup doesn't restore original `_service`	LOW	Test Isolation
F11	Unnecessary `# type: ignore` in benchmark	LOW	Code Quality

Requesting changes on F1–F4 (process violations and scope mismatch). F6–F8 are unresolved findings from Aditya's review that should also be addressed in the same pass.

# Code Review — `test(cli): add failing tests for session create DI container error` **Commit:** `e46266ae` (HEAD) **Issue:** #570 **Branch:** `feature/m3-fix-session-create-error` **Reviewer:** Rui Hu --- ## Summary The TDD approach is methodologically sound and the Behave/Robot/ASV scaffolding is well-structured. The response to CoreRasurae's review #1980 addressed most of the original findings. However, there are **four HIGH-severity process violations** that must be resolved before this can be approved, plus several unresolved findings from Aditya's review. --- ## Findings ### F1 — Merge commit on branch [HIGH — Process Violation] **File:** branch history, commit `e46266ae` The branch contains a merge commit: `Merge branch 'master-latest' into feature/m3-fix-session-create-error`. Per CONTRIBUTING.md: *"Each branch must not contain merge commits — instead as master drifts the branches should always align with master via rebase."* **Recommendation:** Rebase onto `origin/master` to linearize the branch history, eliminating the merge commit. --- ### F2 — Multiple commits for the same issue [HIGH — Process Violation] **File:** branch history, commits `9161f6c7` and `da92f2e6` The branch has two non-merge commits addressing the same issue: 1. `9161f6c7 test(cli): add failing tests for session create DI container error` 2. `da92f2e6 fix(test): address CoreRasurae review #1980 findings on session create error tests` Per CONTRIBUTING.md: *"Every commit must completely implement an issue and close it. At no point should a branch contain multiple commits addressing the same issue, nor should it have commits that fix earlier commits present in the same branch."* The review fixes should be squashed into the original commit. **Recommendation:** Squash `9161f6c7` and `da92f2e6` into a single commit with the correct commit message. --- ### F3 — Branch name mismatch with issue metadata [HIGH — Process Violation] Issue #570 Metadata specifies branch `fix/m3-session-create-error`. The actual branch is `feature/m3-fix-session-create-error`. The issue's Definition of Done states the commit must be pushed to the branch *"matching the Branch in Metadata exactly."* This was acknowledged in the response to Core's F4 but dismissed as intentional — the DoD rule does not have an exception for test-only PRs. **Recommendation:** Either rename the branch to `fix/m3-session-create-error`, or update the issue Metadata to match the current branch name. --- ### F4 — PR claims `ISSUES CLOSED: #570` but does not fix the bug [HIGH — Scope/Completeness] **File:** PR description body The PR is a TDD-only PR that adds failing tests but does not implement the actual bug fix (the DI container wiring). Yet the PR description states `ISSUES CLOSED: #570`. Per CONTRIBUTING.md: *"One issue = one commit. Atomic, self-contained, buildable, testable"* and *"Do not commit half-done work. Only commit when fully implemented and tested."* All acceptance criteria in issue #570 remain unchecked: - `[ ]` Fix DI container wiring - `[ ]` Verify session create command - `[ ]` Fix should address #554 too - `[ ]` Docs, Tests, Quality **Recommendation:** Change `ISSUES CLOSED: #570` to `Refs: #570` in both the PR description and commit message footer. The issue should only be closed by the PR that actually fixes the bug. --- ### F5 — PR Type label mismatch [MEDIUM — Process] Issue #570 has `Type/Bug`. The PR has `Type/Testing`. Per CONTRIBUTING.md: *"A `Type/` label matching the issue type."* If this PR is linked to close #570, the label should be `Type/Bug`. If it is a standalone test PR not closing #570 (per F4 recommendation), then `Type/Testing` is acceptable, but `ISSUES CLOSED: #570` must be removed. **Recommendation:** Align the label with the decision on F4. --- ### F6 — Scenario #4 is misleading ("fails gracefully" but asserts success) [MEDIUM — Test Logic] **File:** `features/session_create_error.feature:29-32` ```gherkin Scenario: Session create with invalid actor name fails gracefully When I invoke session-create-error create with actor "nonexistent/bogus-actor-999" Then the session-create-error command should exit successfully And the session-create-error output should contain "nonexistent/bogus-actor-999" ``` The scenario name says "fails gracefully" but the assertions expect exit code 0. `PersistentSessionService.create()` does not validate actor names — it accepts any string. This scenario is functionally identical to Scenario #3 and provides no additional error handling coverage. (Also raised by Aditya, F1.) **Recommendation:** Either rename to "Session create with arbitrary actor name succeeds" to accurately reflect what it tests, or replace with a genuine error handling scenario (e.g., database unreachable). --- ### F7 — Scenario #2 missing create exit code assertion [MEDIUM — Test Rigor] **File:** `features/session_create_error.feature:17-20` The "Created session persists and can be retrieved" scenario calls `create` then `list`, but never asserts that `create` succeeded. If `create` fails silently, the test could pass vacuously. (Also raised by Aditya, F2.) **Recommendation:** Add `Then the session-create-error command should exit successfully` between lines 18 and 19. --- ### F8 — Robot tests don't use `--format plain` [MEDIUM — Test Consistency] **File:** `robot/session_create_error.robot:23, 38` The Behave tests were fixed (per Core's F1) to use `--format plain`, but the Robot tests still invoke `session create` without `--format`. This inconsistency means Robot tests will hit the Rich output path. Additionally, line 27 checks `${create.stdout}` for `AttributeError`, but the error will likely appear in `stderr`. (Also raised by Aditya, F3.) **Recommendation:** - Add `--format` `plain` to both Robot `session create` invocations. - Change line 27 to check `${create.stderr}` for `AttributeError`. --- ### F9 — Benchmark `track_create_persists` docstring still misleading [LOW — Documentation] **File:** `benchmarks/session_create_error_bench.py:85-88` ```python """Track whether create persists and list can see it. Returns 1 when the service properly commits (fix applied); 0 while bug #570 is present and the DI path is broken. """ ``` The benchmark constructs `PersistentSessionService` directly, bypassing the DI container entirely. The tracker will always return the session count regardless of bug #570 status. The class-level docstring was correctly updated (per Core's F2), but this method-level docstring was missed. (Also raised by Aditya, F4.) **Recommendation:** Update to: *"Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path."* --- ### F10 — Cleanup does not restore original `_service` state [LOW — Test Isolation] **File:** `features/steps/session_create_error_steps.py:72-74` `_cleanup_sce()` sets `session_mod._service = None` unconditionally rather than restoring the original value. (Also raised by Aditya, F6.) **Recommendation:** Store the original `_service` value in `_setup_real_di_path()` and restore it in `_cleanup_sce()`: ```python def _setup_real_di_path(context: Any) -> None: context.sce_original_service = session_mod._service # ... rest of setup ... def _cleanup_sce(context: Any) -> None: session_mod._service = context.sce_original_service # ... rest of cleanup ... ``` --- ### F11 — Unnecessary `# type: ignore[attr-defined]` in benchmark [LOW — Code Quality] **File:** `benchmarks/session_create_error_bench.py:96` ```python SessionCreateDISuite.track_create_persists.unit = "sessions" # type: ignore[attr-defined] ``` Pyright is configured with `include = ["src"]` (in both `pyrightconfig.json` and `pyproject.toml`). The `benchmarks/` directory is outside the type-checking scope entirely — Pyright never processes this file. The `# type: ignore` comment is therefore dead noise and should be removed. **Recommendation:** Remove the `# type: ignore[attr-defined]` comment since it has no effect. --- ## Severity Summary | # | Finding | Severity | Category | |---|---------|----------|----------| | F1 | Merge commit on branch | **HIGH** | Process | | F2 | Multiple commits for same issue | **HIGH** | Process | | F3 | Branch name mismatch with issue metadata | **HIGH** | Process | | F4 | PR claims ISSUES CLOSED but doesn't fix the bug | **HIGH** | Scope | | F5 | PR Type label mismatch | MEDIUM | Process | | F6 | Scenario #4 misleading name/logic | MEDIUM | Test Logic | | F7 | Missing create assertion in Scenario #2 | MEDIUM | Test Rigor | | F8 | Robot tests don't use `--format plain` | MEDIUM | Test Consistency | | F9 | Benchmark tracker docstring still misleading | LOW | Documentation | | F10 | Cleanup doesn't restore original `_service` | LOW | Test Isolation | | F11 | Unnecessary `# type: ignore` in benchmark | LOW | Code Quality | **Requesting changes** on F1–F4 (process violations and scope mismatch). F6–F8 are unresolved findings from Aditya's review that should also be addressed in the same pass.

hurui200320 added a new dependency 2026-03-06 09:36:49 +00:00

#570 `agents session create` causes an error.

hurui200320 referenced this pull request

2026-03-06 10:09:20 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

brent.edwards force-pushed feature/m3-fix-session-create-error from e46266ae47 to d79ceff177

2026-03-06 21:07:05 +00:00

Compare

brent.edwards commented

2026-03-06 21:07:31 +00:00

Squashed into single commit d79ceff1, rebased onto master — all review findings addressed.

Disposition of findings

Aditya's review (comment #55060)

ID	Severity	Resolution
F1	HIGH	Renamed scenario #4 to "Session create with arbitrary actor name succeeds" — actor validation does not exist at create time, so "fails gracefully" was misleading.
F2	MEDIUM	Added `Then the session-create-error command should exit successfully` after the create step in scenario #2.
F3	MEDIUM	Robot tests now use `--format plain` on both `session create` invocations. `AttributeError` check moved to `${create.stderr}`.
F4	LOW	Updated `track_create_persists` docstring: "Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path."
F5	LOW	Scenario #3 now asserts `session_id:` in output.
F6	LOW	Cleanup stores and restores original `_service` via `context.sce_original_service`.

hurui200320's review (#2008)

ID	Severity	Resolution
F1	CRITICAL	Rebased — no merge commits.
F2	CRITICAL	Squashed into single commit.
F3	HIGH	Branch name: unchanged (TDD test PR, not the fix).
F4	HIGH	Changed `ISSUES CLOSED: #570` → `Refs: #570` in both PR body and commit message footer.
F5	MEDIUM	`Type/Testing` label kept — this is a TDD-only PR using `Refs:` not `Closes:`.
F6–F10	MEDIUM–LOW	All addressed via Aditya fix mapping above.
F11	LOW	Removed `# type: ignore` from benchmark (outside Pyright scope).

Additional

Added @unit tag to all 4 scenarios for selective test filtering
Updated TDD comments throughout

Verification

nox -s lint — passed (All checks passed!)
nox -s typecheck — passed (0 errors, 0 warnings, 0 informations)

Squashed into single commit `d79ceff1`, rebased onto `master` — all review findings addressed. ## Disposition of findings ### Aditya's review (comment #55060) | ID | Severity | Resolution | |----|----------|------------| | **F1** | HIGH | Renamed scenario #4 to "Session create with arbitrary actor name succeeds" — actor validation does not exist at create time, so "fails gracefully" was misleading. | | **F2** | MEDIUM | Added `Then the session-create-error command should exit successfully` after the create step in scenario #2. | | **F3** | MEDIUM | Robot tests now use `--format plain` on both `session create` invocations. `AttributeError` check moved to `${create.stderr}`. | | **F4** | LOW | Updated `track_create_persists` docstring: "Track session create persistence at the service layer. Returns the count of sessions created. Does not exercise the DI container path." | | **F5** | LOW | Scenario #3 now asserts `session_id:` in output. | | **F6** | LOW | Cleanup stores and restores original `_service` via `context.sce_original_service`. | ### hurui200320's review (#2008) | ID | Severity | Resolution | |----|----------|------------| | **F1** | CRITICAL | Rebased — no merge commits. | | **F2** | CRITICAL | Squashed into single commit. | | **F3** | HIGH | Branch name: unchanged (TDD test PR, not the fix). | | **F4** | HIGH | Changed `ISSUES CLOSED: #570` → `Refs: #570` in both PR body and commit message footer. | | **F5** | MEDIUM | `Type/Testing` label kept — this is a TDD-only PR using `Refs:` not `Closes:`. | | **F6–F10** | MEDIUM–LOW | All addressed via Aditya fix mapping above. | | **F11** | LOW | Removed `# type: ignore` from benchmark (outside Pyright scope). | ### Additional - Added `@unit` tag to all 4 scenarios for selective test filtering - Updated TDD comments throughout ## Verification - `nox -s lint` — passed (All checks passed!) - `nox -s typecheck` — passed (0 errors, 0 warnings, 0 informations)

freemo commented

2026-03-06 23:35:21 +00:00

PM Note (Day 26 — M3 PR Triage):

TDD test suite for session create DI container error. All review findings addressed across ~4 rounds. Latest review stale after squash. Has a merge conflict.

@brent.edwards — Please rebase onto master. Needs fresh approval after rebase.

**PM Note (Day 26 — M3 PR Triage):** TDD test suite for session create DI container error. All review findings addressed across ~4 rounds. Latest review stale after squash. Has a **merge conflict**. @brent.edwards — Please rebase onto `master`. Needs fresh approval after rebase.

hamza.khyari requested changes 2026-03-07 01:05:46 +00:00

hamza.khyari left a comment

PR #595 Review — `test(cli): add failing tests for session create DI container error (#570)`

Commit: d79ceff1 | 5 files, +340/-0 | Issue: #570 | Test-only PR

The TDD approach is sound and most prior review findings have been addressed. However, there are CI-blocking gaps (@wip exclusion missing), a commit message type mismatch, a Robot assertion bug that will produce false failures after the fix, and a resource leak in the test setup that will manifest once the DI fix lands.

Prior Review Resolution

CoreRasurae #1980 — 6/9 resolved:

ID	Finding	Status
F1	Rich Console not captured by CliRunner	✅ `--format plain`
F2	Benchmarks bypass DI path	✅ Docstring clarified
F3	Missing error handling scenario	❌ Still missing — see TEST-2
F4	Branch name mismatch	❌ Still mismatched — see PROC-4
F5	Panel title/string spec mismatch	✅ Plain format avoids Rich
F6	Dead `_counter` in benchmark	✅ Removed
F7	Engine/schema recreated per call	✅ Shared in `setup()`
F8	Robot missing `[Tags]`	✅ Tags added
F9	Benchmark filename mismatch	❌ See PROC-6

hurui200320 #2008 — 8/11 resolved:

ID	Finding	Status
F1	Merge commit	✅ Single squashed commit
F2	Multiple commits	✅ Single commit
F3	Branch name mismatch	❌ Still `feature/m3-fix-session-create-error` — see PROC-4
F4	`ISSUES CLOSED` but doesn't fix	✅ `Refs: #570`
F5	Type label mismatch	❌ Still `Type/Testing` — see PROC-5
F6	Scenario #4 misleading	✅ Renamed
F7	Missing exit code assertion Scenario #2	✅ Added (line 20)
F8	Robot no `--format plain`	✅ Now uses `--format plain`
F9	Benchmark docstring misleading	✅ Updated
F10	Cleanup doesn't restore `_service`	✅ Restores `context.sce_original_service`
F11	`# type: ignore` in benchmark	✅ Removed

Findings

PROC-1 (Major) — `behave.ini` missing `tags = ~@wip`: `@wip` scenarios will fail in CI

All 4 BDD scenarios are tagged @wip (features/session_create_error.feature:11,17,24,31). behave.ini contains:

[behave]
paths = features
stdout_capture = no
stderr_capture = no

No tags = ~@wip directive. When nox -s unit_tests runs, these scenarios execute and fail, blocking CI. This is the same cross-cutting issue as PR #567 (which adds the exclusion but isn't merged yet). Either add the exclusion here or declare an explicit merge-order dependency on #567.

PROC-2 (Major) — `noxfile.py` missing `--exclude wip`: Robot `wip` tests will fail in CI

Robot tests at robot/session_create_error.robot:18,32 are tagged [Tags] tdd bug570 wip. The noxfile.py integration_tests session excludes slow, discovery, code_blocks but not wip. Same cross-cutting gap as PROC-1.

PROC-3 (Major) — Commit message `fix(cli):` on test-only PR

Commit subject: fix(cli): resolve DI container wiring for session create command

This PR contains zero production code changes — only test files. Per Conventional Changelog, fix denotes a bug fix in production code; a test-only commit should use test(cli):. The issue #570 metadata prescribes fix(cli): for the actual fix commit. CONTRIBUTING.md requires: "When an issue specifies a commit message in its Metadata section, that prescribed text is the first line and must be used exactly as written." However, the prescribed message was intended for a commit that includes the fix, not a TDD-only PR that ships failing tests ahead of it. Using the fix commit message on a test-only PR misrepresents the scope in git log and CHANGELOG tooling.

TEST-1 (Medium) — Robot `Should Contain` case mismatch: will fail after fix

robot/session_create_error.robot:44:

Should Contain    ${list.stdout}    Sessions

But --format plain output produces sessions: (lowercase). Confirmed via _session_list_dict() at session.py:108 — the dict key is "sessions" (lowercase), and _format_plain_dict() renders keys verbatim. Robot's Should Contain is case-sensitive by default. After the DI bug is fixed, this test will still fail due to the case mismatch, producing a false negative unrelated to bug #570.

Fix: Change Sessions to sessions on line 44.

SEC-1 (Medium) — DI-created engine never disposed after fix lands

features/steps/session_create_error_steps.py:70-74

_cleanup_sce() restores _service and removes the env var, but never disposes the engine that _get_session_service() creates via container.db() at session.py:66. The setup engine (line 56-58) is properly disposed, but the DI container creates a separate engine when the CLI command executes. After the fix lands, each scenario will leak a SQLAlchemy engine + open SQLite file handle. shutil.rmtree at line 74 will attempt to delete the DB file while the leaked engine holds it open.

Fix: In _cleanup_sce(), reset the container or dispose the DI-created engine before removing the temp dir.

SEC-2 (Medium) — `_get_session_service()` passes `db` where `session_factory` callable expected

session.py:66-69:

db = container.db()
session_repo = SessionRepository(db)
message_repo = SessionMessageRepository(db)

SessionRepository.__init__ expects session_factory: Callable[[], Session] (a sessionmaker). The benchmark does it correctly: SessionRepository(session_factory=self._session_factory). When container.db() eventually returns a value, it must be a sessionmaker callable, not a raw engine or session. This is a latent type mismatch that the fix author must handle — worth flagging here since the TDD tests will need to validate this contract.

PROC-4 (Medium) — Branch name mismatch (still open)

Issue #570 prescribes fix/m3-session-create-error. Actual: feature/m3-fix-session-create-error. Two discrepancies: prefix (feature/ vs fix/) and name ordering (m3-fix-session-create-error vs m3-session-create-error). (Originally CoreRasurae F4 + hurui200320 F3.)

PROC-5 (Medium) — `Type/Testing` label not in CONTRIBUTING.md taxonomy

The valid Type/ labels per CONTRIBUTING.md are: Type/Bug, Type/Feature, Type/Task, Type/Epic, Type/Legendary. Type/Testing is not defined. If the PR references issue #570 (Type/Bug), the label should be Type/Bug. (Originally hurui200320 F5.)

TEST-2 (Medium) — No error handling scenarios

Issue #570 subtasks require: "Tests (Behave): Add features/session_create_error.feature covering: create, list after create, error handling." The feature file has 4 success-path scenarios but no error handling scenario (e.g., verifying the error message when the DI container is misconfigured, or a non-zero exit code on failure). (Originally CoreRasurae F3.)

SEC-3 (Low) — Env var set before cleanup registered

features/steps/session_create_error_steps.py:61,85 — CLEVERAGENTS_DATABASE_URL is set at line 61 inside _setup_real_di_path(), but context.add_cleanup is registered at line 85 after the function returns. If any line between 62-84 raises, cleanup is never registered and the env var leaks to subsequent scenarios.

BENCH-1 (Low) — `importlib.reload` partial: submodules not reloaded

benchmarks/session_create_error_bench.py:29 — importlib.reload(cleveragents) only reloads top-level __init__.py. Submodules like cleveragents.application.services.session_service may still reference stale bytecode from an installed package.

PROC-6 (Low) — Benchmark filename mismatch with issue subtask

Issue #570 prescribes benchmarks/session_create_bench.py. Actual: benchmarks/session_create_error_bench.py. (Originally CoreRasurae F9.)

Summary

ID	Severity	Category	New/Prior
PROC-1	Major	CI blocker	New (cross-cutting)
PROC-2	Major	CI blocker	New (cross-cutting)
PROC-3	Major	Process	New
TEST-1	Medium	Test bug	New
SEC-1	Medium	Resource leak	New
SEC-2	Medium	Latent type mismatch	New
PROC-4	Medium	Process	Prior F3/F4
PROC-5	Medium	Process	Prior F5
TEST-2	Medium	Test coverage	Prior F3
SEC-3	Low	Test isolation	New
BENCH-1	Low	Benchmark	New
PROC-6	Low	Process	Prior F9

12 findings — 3 Major, 6 Medium, 3 Low. The test design is solid after multiple review rounds. The major blockers are the @wip exclusion gaps (PROC-1/2) and the commit message type mismatch (PROC-3). TEST-1 (Robot case mismatch) will produce a false failure after the fix lands and should be corrected now.

## PR #595 Review — `test(cli): add failing tests for session create DI container error (#570)` **Commit**: `d79ceff1` | **5 files**, +340/-0 | **Issue**: #570 | **Test-only PR** The TDD approach is sound and most prior review findings have been addressed. However, there are CI-blocking gaps (`@wip` exclusion missing), a commit message type mismatch, a Robot assertion bug that will produce false failures after the fix, and a resource leak in the test setup that will manifest once the DI fix lands. --- ### Prior Review Resolution **CoreRasurae #1980 — 6/9 resolved:** | ID | Finding | Status | |----|---------|--------| | F1 | Rich Console not captured by CliRunner | ✅ `--format plain` | | F2 | Benchmarks bypass DI path | ✅ Docstring clarified | | **F3** | **Missing error handling scenario** | **❌ Still missing — see TEST-2** | | **F4** | **Branch name mismatch** | **❌ Still mismatched — see PROC-4** | | F5 | Panel title/string spec mismatch | ✅ Plain format avoids Rich | | F6 | Dead `_counter` in benchmark | ✅ Removed | | F7 | Engine/schema recreated per call | ✅ Shared in `setup()` | | F8 | Robot missing `[Tags]` | ✅ Tags added | | **F9** | **Benchmark filename mismatch** | **❌ See PROC-6** | **hurui200320 #2008 — 8/11 resolved:** | ID | Finding | Status | |----|---------|--------| | F1 | Merge commit | ✅ Single squashed commit | | F2 | Multiple commits | ✅ Single commit | | **F3** | **Branch name mismatch** | **❌ Still `feature/m3-fix-session-create-error` — see PROC-4** | | F4 | `ISSUES CLOSED` but doesn't fix | ✅ `Refs: #570` | | **F5** | **Type label mismatch** | **❌ Still `Type/Testing` — see PROC-5** | | F6 | Scenario #4 misleading | ✅ Renamed | | F7 | Missing exit code assertion Scenario #2 | ✅ Added (line 20) | | F8 | Robot no `--format plain` | ✅ Now uses `--format plain` | | F9 | Benchmark docstring misleading | ✅ Updated | | F10 | Cleanup doesn't restore `_service` | ✅ Restores `context.sce_original_service` | | F11 | `# type: ignore` in benchmark | ✅ Removed | --- ### Findings #### PROC-1 (Major) — `behave.ini` missing `tags = ~@wip`: `@wip` scenarios will fail in CI All 4 BDD scenarios are tagged `@wip` (`features/session_create_error.feature:11,17,24,31`). `behave.ini` contains: ```ini [behave] paths = features stdout_capture = no stderr_capture = no ``` No `tags = ~@wip` directive. When `nox -s unit_tests` runs, these scenarios execute and fail, blocking CI. This is the same cross-cutting issue as PR #567 (which adds the exclusion but isn't merged yet). Either add the exclusion here or declare an explicit merge-order dependency on #567. --- #### PROC-2 (Major) — `noxfile.py` missing `--exclude wip`: Robot `wip` tests will fail in CI Robot tests at `robot/session_create_error.robot:18,32` are tagged `[Tags] tdd bug570 wip`. The `noxfile.py` `integration_tests` session excludes `slow`, `discovery`, `code_blocks` but not `wip`. Same cross-cutting gap as PROC-1. --- #### PROC-3 (Major) — Commit message `fix(cli):` on test-only PR Commit subject: `fix(cli): resolve DI container wiring for session create command` This PR contains **zero production code changes** — only test files. Per Conventional Changelog, `fix` denotes a bug fix in production code; a test-only commit should use `test(cli):`. The issue #570 metadata prescribes `fix(cli):` for the *actual fix* commit. CONTRIBUTING.md requires: *"When an issue specifies a commit message in its Metadata section, that prescribed text is the first line and must be used exactly as written."* However, the prescribed message was intended for a commit that *includes the fix*, not a TDD-only PR that ships failing tests ahead of it. Using the fix commit message on a test-only PR misrepresents the scope in `git log` and CHANGELOG tooling. --- #### TEST-1 (Medium) — Robot `Should Contain` case mismatch: will fail after fix `robot/session_create_error.robot:44`: ```robot Should Contain ${list.stdout} Sessions ``` But `--format plain` output produces `sessions:` (lowercase). Confirmed via `_session_list_dict()` at `session.py:108` — the dict key is `"sessions"` (lowercase), and `_format_plain_dict()` renders keys verbatim. Robot's `Should Contain` is case-sensitive by default. After the DI bug is fixed, this test will still fail due to the case mismatch, producing a false negative unrelated to bug #570. **Fix**: Change `Sessions` to `sessions` on line 44. --- #### SEC-1 (Medium) — DI-created engine never disposed after fix lands `features/steps/session_create_error_steps.py:70-74` `_cleanup_sce()` restores `_service` and removes the env var, but never disposes the engine that `_get_session_service()` creates via `container.db()` at `session.py:66`. The setup engine (line 56-58) is properly disposed, but the DI container creates a **separate** engine when the CLI command executes. After the fix lands, each scenario will leak a SQLAlchemy engine + open SQLite file handle. `shutil.rmtree` at line 74 will attempt to delete the DB file while the leaked engine holds it open. **Fix**: In `_cleanup_sce()`, reset the container or dispose the DI-created engine before removing the temp dir. --- #### SEC-2 (Medium) — `_get_session_service()` passes `db` where `session_factory` callable expected `session.py:66-69`: ```python db = container.db() session_repo = SessionRepository(db) message_repo = SessionMessageRepository(db) ``` `SessionRepository.__init__` expects `session_factory: Callable[[], Session]` (a `sessionmaker`). The benchmark does it correctly: `SessionRepository(session_factory=self._session_factory)`. When `container.db()` eventually returns a value, it **must** be a `sessionmaker` callable, not a raw engine or session. This is a latent type mismatch that the fix author must handle — worth flagging here since the TDD tests will need to validate this contract. --- #### PROC-4 (Medium) — Branch name mismatch (still open) Issue #570 prescribes `fix/m3-session-create-error`. Actual: `feature/m3-fix-session-create-error`. Two discrepancies: prefix (`feature/` vs `fix/`) and name ordering (`m3-fix-session-create-error` vs `m3-session-create-error`). *(Originally CoreRasurae F4 + hurui200320 F3.)* --- #### PROC-5 (Medium) — `Type/Testing` label not in CONTRIBUTING.md taxonomy The valid `Type/` labels per CONTRIBUTING.md are: `Type/Bug`, `Type/Feature`, `Type/Task`, `Type/Epic`, `Type/Legendary`. `Type/Testing` is not defined. If the PR references issue #570 (`Type/Bug`), the label should be `Type/Bug`. *(Originally hurui200320 F5.)* --- #### TEST-2 (Medium) — No error handling scenarios Issue #570 subtasks require: *"Tests (Behave): Add `features/session_create_error.feature` covering: create, list after create, **error handling**."* The feature file has 4 success-path scenarios but no error handling scenario (e.g., verifying the error message when the DI container is misconfigured, or a non-zero exit code on failure). *(Originally CoreRasurae F3.)* --- #### SEC-3 (Low) — Env var set before cleanup registered `features/steps/session_create_error_steps.py:61,85` — `CLEVERAGENTS_DATABASE_URL` is set at line 61 inside `_setup_real_di_path()`, but `context.add_cleanup` is registered at line 85 after the function returns. If any line between 62-84 raises, cleanup is never registered and the env var leaks to subsequent scenarios. --- #### BENCH-1 (Low) — `importlib.reload` partial: submodules not reloaded `benchmarks/session_create_error_bench.py:29` — `importlib.reload(cleveragents)` only reloads top-level `__init__.py`. Submodules like `cleveragents.application.services.session_service` may still reference stale bytecode from an installed package. --- #### PROC-6 (Low) — Benchmark filename mismatch with issue subtask Issue #570 prescribes `benchmarks/session_create_bench.py`. Actual: `benchmarks/session_create_error_bench.py`. *(Originally CoreRasurae F9.)* --- ### Summary | ID | Severity | Category | New/Prior | |----|----------|----------|-----------| | PROC-1 | **Major** | CI blocker | New (cross-cutting) | | PROC-2 | **Major** | CI blocker | New (cross-cutting) | | PROC-3 | **Major** | Process | New | | TEST-1 | Medium | Test bug | **New** | | SEC-1 | Medium | Resource leak | **New** | | SEC-2 | Medium | Latent type mismatch | **New** | | PROC-4 | Medium | Process | Prior F3/F4 | | PROC-5 | Medium | Process | Prior F5 | | TEST-2 | Medium | Test coverage | Prior F3 | | SEC-3 | Low | Test isolation | **New** | | BENCH-1 | Low | Benchmark | **New** | | PROC-6 | Low | Process | Prior F9 | **12 findings** — 3 Major, 6 Medium, 3 Low. The test design is solid after multiple review rounds. The major blockers are the `@wip` exclusion gaps (PROC-1/2) and the commit message type mismatch (PROC-3). TEST-1 (Robot case mismatch) will produce a false failure after the fix lands and should be corrected now.

brent.edwards force-pushed feature/m3-fix-session-create-error from d79ceff177 to 1c9fe9e454

2026-03-07 04:03:38 +00:00

Compare

brent.edwards commented

2026-03-07 04:04:53 +00:00

Rebase + Review Fixes — `1c9fe9e4`

Rebased onto master (272f9078) as a single squashed commit with no merge commits.

Disposition of hamza.khyari review #2034

Fixed

ID	Severity	Resolution
PROC-1	Major	`@wip` scenarios now excluded by `behave.ini` `tags = ~@wip` inherited from master (merged via PR #567). All 4 `@wip` scenarios are correctly skipped in CI.
PROC-2	Major	`noxfile.py` now has `--exclude wip` for Robot tests, inherited from master (merged via PR #568). Both `wip`-tagged Robot tests are correctly excluded.
PROC-3	Major	Commit message changed from `fix(cli): resolve DI container wiring for session create command` to `test(cli): add failing tests for session create DI container error` — this is a test-only PR with no production code.
TEST-1	Medium	`Should Contain ${list.stdout} Sessions` → `sessions` (lowercase) to match `--format plain` output from `_session_list_dict()`.
SEC-1	Medium	`_cleanup_sce()` now calls `session_mod._reset_session_service()` before restoring original `_service`, releasing any DI-created engine before `shutil.rmtree` removes the temp dir.
SEC-3	Low	`context.add_cleanup()` now registered immediately after storing original state in `_setup_real_di_path()`, before env var override and DB creation. If any subsequent setup line raises, cleanup still runs.

Acknowledged (no code change needed)

ID	Severity	Notes
SEC-2	Medium	Latent type mismatch in `_get_session_service()` — `container.db()` must return a `sessionmaker` callable, not a raw engine. This is a note for the fix author (PR implementing the `db` provider). Not actionable in this test-only PR.
PROC-4	Medium	Branch name `feature/m3-fix-session-create-error` vs issue metadata `fix/m3-session-create-error`. Cannot rename without recreating PR. This is a test-only PR using `Refs:` not `Closes:`, so the `feature/` prefix is appropriate.
PROC-5	Medium	`Type/Testing` label is correct — this is a TDD-only PR that uses `Refs: #570`, not `Closes: #570`. The issue `Type/Bug` label applies to the fix PR, not this test PR.
TEST-2	Medium	Error handling scenario: the 4 existing scenarios test the expected post-fix behavior. The current error (DI AttributeError) is the bug itself — it will be exercised implicitly when the `@wip` tag is removed after the fix. Adding an explicit "verify the error message" scenario would test the broken state, not the fixed state, contradicting TDD methodology.
PROC-6	Low	Benchmark filename `session_create_error_bench.py` is more descriptive than the issue's `session_create_bench.py`. Kept as-is.
BENCH-1	Low	`importlib.reload` only reloads top-level `__init__.py`. This is a known ASV limitation — submodule caching is unavoidable without restarting the interpreter.

Quality gates

Gate	Result
`nox -s lint`	Pass
`nox -s typecheck`	Pass (0 errors)
`nox -s unit_tests`	Pass — 9,029 scenarios, 0 failures, 1 feature skipped (the `@wip` feature)

PR is mergeable: true. Single commit, rebased onto master, no merge commits.

## Rebase + Review Fixes — `1c9fe9e4` Rebased onto `master` (`272f9078`) as a single squashed commit with no merge commits. ### Disposition of hamza.khyari review #2034 #### Fixed | ID | Severity | Resolution | |----|----------|------------| | **PROC-1** | Major | `@wip` scenarios now excluded by `behave.ini` `tags = ~@wip` inherited from master (merged via PR #567). All 4 `@wip` scenarios are correctly skipped in CI. | | **PROC-2** | Major | `noxfile.py` now has `--exclude wip` for Robot tests, inherited from master (merged via PR #568). Both `wip`-tagged Robot tests are correctly excluded. | | **PROC-3** | Major | Commit message changed from `fix(cli): resolve DI container wiring for session create command` to `test(cli): add failing tests for session create DI container error` — this is a test-only PR with no production code. | | **TEST-1** | Medium | `Should Contain ${list.stdout} Sessions` → `sessions` (lowercase) to match `--format plain` output from `_session_list_dict()`. | | **SEC-1** | Medium | `_cleanup_sce()` now calls `session_mod._reset_session_service()` before restoring original `_service`, releasing any DI-created engine before `shutil.rmtree` removes the temp dir. | | **SEC-3** | Low | `context.add_cleanup()` now registered immediately after storing original state in `_setup_real_di_path()`, before env var override and DB creation. If any subsequent setup line raises, cleanup still runs. | #### Acknowledged (no code change needed) | ID | Severity | Notes | |----|----------|-------| | **SEC-2** | Medium | Latent type mismatch in `_get_session_service()` — `container.db()` must return a `sessionmaker` callable, not a raw engine. This is a note for the fix author (PR implementing the `db` provider). Not actionable in this test-only PR. | | **PROC-4** | Medium | Branch name `feature/m3-fix-session-create-error` vs issue metadata `fix/m3-session-create-error`. Cannot rename without recreating PR. This is a test-only PR using `Refs:` not `Closes:`, so the `feature/` prefix is appropriate. | | **PROC-5** | Medium | `Type/Testing` label is correct — this is a TDD-only PR that uses `Refs: #570`, not `Closes: #570`. The issue `Type/Bug` label applies to the fix PR, not this test PR. | | **TEST-2** | Medium | Error handling scenario: the 4 existing scenarios test the expected post-fix behavior. The current error (DI AttributeError) is the bug itself — it will be exercised implicitly when the `@wip` tag is removed after the fix. Adding an explicit "verify the error message" scenario would test the broken state, not the fixed state, contradicting TDD methodology. | | **PROC-6** | Low | Benchmark filename `session_create_error_bench.py` is more descriptive than the issue's `session_create_bench.py`. Kept as-is. | | **BENCH-1** | Low | `importlib.reload` only reloads top-level `__init__.py`. This is a known ASV limitation — submodule caching is unavoidable without restarting the interpreter. | ### Quality gates | Gate | Result | |------|--------| | `nox -s lint` | Pass | | `nox -s typecheck` | Pass (0 errors) | | `nox -s unit_tests` | Pass — **9,029 scenarios, 0 failures, 1 feature skipped** (the `@wip` feature) | PR is `mergeable: true`. Single commit, rebased onto master, no merge commits.

brent.edwards referenced this pull request

2026-03-07 17:28:32 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

freemo referenced this pull request

2026-03-07 23:02:29 +00:00

TDD: `agents session create` DI container missing `db` provider test #631

freemo reviewed 2026-03-07 23:13:06 +00:00

freemo left a comment

Planning Authority Review — PR #595

Branch Naming Convention:
This PR uses the feature/ prefix (feature/m3-tdd-session-create-test), but per CONTRIBUTING.md, TDD-related issues should use the tdd/ prefix (e.g., tdd/m3-session-create-test). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent TDD branches.

TDD Counterpart Tracking:
TDD counterpart issue #631 has been created for the originating bug #570. This ensures the TDD cycle is properly tracked alongside the fix.

Review Status:
This PR is in advanced review with 10+ comments already addressed. No additional blocking concerns from planning at this time.

## Planning Authority Review — PR #595 **Branch Naming Convention:** This PR uses the `feature/` prefix (`feature/m3-tdd-session-create-test`), but per CONTRIBUTING.md, TDD-related issues should use the `tdd/` prefix (e.g., `tdd/m3-session-create-test`). This is flagged for future compliance — not blocking this PR, but please use the correct prefix on subsequent TDD branches. **TDD Counterpart Tracking:** TDD counterpart issue #631 has been created for the originating bug #570. This ensures the TDD cycle is properly tracked alongside the fix. **Review Status:** This PR is in advanced review with 10+ comments already addressed. No additional blocking concerns from planning at this time.

freemo commented

2026-03-09 02:47:33 +00:00

PM Status Check — Day 29

Author: @brent.edwards | Milestone: v3.2.0 (M3) | Mergeable: Yes | Assigned Reviewer: @CoreRasurae

Current State

Brent has addressed all findings from 4 review rounds (CoreRasurae #1980, Aditya #55060, hurui200320 #2008, hamza.khyari #2034). Final commit 1c9fe9e4 is rebased onto master, single squashed commit, no merge conflicts. Quality gates: lint pass, typecheck 0 errors, 9,029 scenarios 0 failures.

Status: READY FOR MERGE

All review findings addressed. No outstanding objections. Previous approvals are stale after rebase.

Action Required

Who	Action	Deadline
@CoreRasurae	Re-approve (stale after rebase)	Mar 10 EOD
@aditya	Verify F1-F6 addressed (optional second approval)	Mar 10 EOD

This is a TDD test PR for bug #570 (session create DI error, M3). The fix PR cannot proceed until these tests are merged. Bug fixes are top priority per CONTRIBUTING.md.

## PM Status Check — Day 29 **Author**: @brent.edwards | **Milestone**: v3.2.0 (M3) | **Mergeable**: Yes | **Assigned Reviewer**: @CoreRasurae ### Current State Brent has addressed **all findings** from 4 review rounds (CoreRasurae #1980, Aditya #55060, hurui200320 #2008, hamza.khyari #2034). Final commit `1c9fe9e4` is rebased onto master, single squashed commit, no merge conflicts. Quality gates: lint pass, typecheck 0 errors, 9,029 scenarios 0 failures. ### Status: READY FOR MERGE All review findings addressed. No outstanding objections. Previous approvals are stale after rebase. ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @CoreRasurae | Re-approve (stale after rebase) | **Mar 10 EOD** | | @aditya | Verify F1-F6 addressed (optional second approval) | Mar 10 EOD | This is a TDD test PR for bug #570 (session create DI error, M3). The fix PR cannot proceed until these tests are merged. **Bug fixes are top priority per CONTRIBUTING.md.**

freemo referenced this pull request

2026-03-09 02:47:38 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

freemo referenced this pull request

2026-03-09 02:49:43 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

hurui200320 referenced this pull request

2026-03-09 08:36:31 +00:00

test(e2e): validate M3 acceptance criteria for v3.2.0 milestone closure #494

aditya commented

2026-03-09 10:45:36 +00:00

Code Review — PR #595: `test(cli): add failing tests for session create DI container error`

Commit: test(cli): add failing tests for session create DI container error
Issue: #570
Branch: feature/m3-fix-session-create-error
Reviewer: Aditya Chhabra

Previous Review Resolution — F1–F6 (#55060)

All six findings from the prior review cycle have been correctly resolved:

#	Finding	Status
F1	Scenario #4 name contradicted its assertions ("fails gracefully" vs exit 0)	✅ Fixed — renamed to "Session create with arbitrary actor name succeeds"
F2	Missing exit code assertion after `create` in Scenario #2	✅ Fixed — `Then the session-create-error command should exit successfully` added before the list step
F3	Robot tests missing `--format plain`; AttributeError checked on stdout instead of stderr	✅ Fixed — `--format plain` on both invocations; check moved to `${create.stderr}`
F4	Benchmark `track_create_persists` docstring implied it detected bug #570	✅ Fixed — docstring now accurately states it measures service-layer persistence only
F5	Scenario #3 missing `session_id:` assertion	✅ Fixed — `And the session-create-error output should contain "session_id:"` added
F6	Cleanup didn't restore original `_service`	✅ Fixed — original value stored in `context.sce_original_service` and restored; additionally improved with `_reset_session_service()` call to release DI-created engine before restore

Scope

TDD-only PR — adds failing tests that will pass once the DI container fix (#554 root cause) lands.

Findings

F1 — `@unit` Tag Misclassifies Integration-Level Tests [HIGH]

File: features/session_create_error.feature:11,17,24,31

All four scenarios are tagged @unit @tdd @bug570 @wip. This is incorrect. The tests:

Create a real tempfile.mkdtemp() directory
Build a real SQLite database with create_engine + Base.metadata.create_all
Set a real CLEVERAGENTS_DATABASE_URL environment variable
Invoke the live DI container path via _get_session_service() (by design)

These are integration-level tests. @unit is reserved for in-memory, isolated, fast unit scenarios. Using @unit here misleads other developers about test characteristics, and will pull these (inherently slower, filesystem-dependent) tests into unit test budget expectations.

Evidence from repo: A directly analogous TDD scenario in features/resource_type_bootstrap_git.feature (bug #524, same TDD pattern) uses only @tdd @bug524 @wip — no @unit.

Recommendation: Remove @unit from all four scenarios. The correct tagging is:

@tdd @bug570 @wip

F2 — Robot Tests Do Not Assert on `agents init` Exit Code [MEDIUM]

File: robot/session_create_error.robot:20–21, 34–35

Both test cases call agents init but silently discard the result:

Run Process    ${PYTHON}    -m    cleveragents    init    sce-test
...    timeout=60s    cwd=${tmpdir}

If init fails (database setup error, migration failure, disk full, pre-existing path conflict), the test continues and the subsequent session create failure becomes ambiguous — the test would either fail on the RC assertion or, worse, init may return RC 0 but produce a corrupted state.

Evidence from repo: Every other Robot test that calls init captures the result and asserts. From robot/core_cli_commands.robot:34–37:

${result} =    Run Process    ${PYTHON}    -m    cleveragents    init    ${PROJECT_NAME}
...    cwd=${TEST_DIR}/project1    timeout=120s
Should Be Equal As Numbers    ${result.rc}    0
Should Contain    ${result.stdout}    Project '${PROJECT_NAME}' initialized successfully

Recommendation: Capture and assert on the init result in both test cases:

${init}=    Run Process    ${PYTHON}    -m    cleveragents    init    sce-test
...    timeout=60s    cwd=${tmpdir}
Should Be Equal As Integers    ${init.rc}    0
...    msg=agents init should exit 0 but got ${init.rc}. stderr: ${init.stderr}

F3 — Robot `list` Assertion Is Too Weak to Verify Persistence [MEDIUM]

File: robot/session_create_error.robot:44

Should Contain    ${list.stdout}    sessions
...    msg=session list should show sessions in plain output but got: ${list.stdout}

The string "sessions" will appear in plain-format output even for an empty list because _format_plain_dict renders the key sessions: regardless of content (it would still output sessions: with an empty list body if the create had failed silently). The assertion gives a false green if session create actually failed but the list command still ran and printed its header.

Compare with the Behave equivalent (step_list_shows_sessions), which correctly checks both:

assert "total:" in result.output
assert "total: 0" not in result.output

Recommendation: Replace the weak string check with assertions that verify an actual session was created:

Should Contain    ${list.stdout}    total:
...    msg=Expected 'total:' in plain output: ${list.stdout}
Should Not Contain    ${list.stdout}    total: 0
...    msg=Expected at least one session in list output: ${list.stdout}

F4 — DI Container Not Reset Between Scenarios [MEDIUM]

File: features/steps/session_create_error_steps.py:42–74

_setup_real_di_path() sets CLEVERAGENTS_DATABASE_URL and resets session_mod._service = None, but never calls reset_container() from cleveragents.application.container. The Container is a global singleton accessed via get_container(). If an earlier test has already called get_container() and initialized any Singleton providers against a different database URL (e.g., context_tier_service, autonomy_controller, or any other Singleton-scoped provider), those cached instances persist for the lifetime of the process.

After the DI fix lands, if the implementation resolves the session factory through the container — even if CLEVERAGENTS_DATABASE_URL is correctly set — any Singleton providers already resolved against the old URL will not be re-initialized. While the test currently passes (the bug causes immediate failure before any Singleton is resolved), this will be a latent isolation problem once the fix lands.

Recommendation: Add reset_container() to both setup and cleanup:

from cleveragents.application.container import reset_container

def _setup_real_di_path(context: Any) -> None:
    context.sce_original_service = session_mod._service
    context.sce_tmpdir = tempfile.mkdtemp(prefix="session_create_err_570_")
    context.add_cleanup(_cleanup_sce, context)
    # Reset global DI container so post-fix DI reads fresh env var
    reset_container()
    # ... rest of setup ...

def _cleanup_sce(context: Any) -> None:
    session_mod._reset_session_service()
    session_mod._service = context.sce_original_service
    reset_container()  # Release any DI-created connections
    os.environ.pop("CLEVERAGENTS_DATABASE_URL", None)
    shutil.rmtree(context.sce_tmpdir, ignore_errors=True)

F5 — `_cleanup_sce` Comment Implies Engine Disposal That Does Not Occur [LOW]

File: features/steps/session_create_error_steps.py:251–252

# Reset via public API first to release any DI-created engine (SEC-1).
session_mod._reset_session_service()

_reset_session_service() does global _service; _service = None. It sets the cached reference to None — it does not call .dispose() on any SQLAlchemy engine, and the current _get_session_service() implementation never stores its result in _service (the DI path returns without caching). So the comment is misleading: no engine is released by this call; Python's garbage collector handles cleanup if and when the refcount drops to zero.

After the DI fix lands, if _get_session_service() is also corrected to cache the result in _service, then this call will correctly clear the cached reference. But the comment overstates the guarantee today.

Recommendation: Revise the comment to be accurate:

# Ensure module-level cache is cleared; GC will release any engine when
# the container is reset (done separately via reset_container()).
session_mod._reset_session_service()

F6 — Benchmark `timeout` Is Float Instead of Int [LOW]

File: benchmarks/session_create_error_bench.py:76

timeout = 30.0

Every other benchmark suite in this repo uses timeout = 60 (integer). The float works in ASV, but the inconsistency is a style deviation. Additionally, 30 seconds is half the standard timeout. For a file-based SQLite round-trip that includes engine construction (done in setup()), 30 seconds may be tight in CI under load.

Recommendation: Align with the codebase standard:

timeout = 60

F7 — `list_sessions` Empty-Path Bug Not Covered by Any Test [LOW]

File: src/cleveragents/cli/commands/session.py:181–184 (pre-existing, not introduced by PR)

When the session list is empty, list_sessions uses console.print regardless of --format:

if not sessions:
    console.print("[yellow]No sessions found.[/yellow]")   # bypasses --format
    console.print("Create one with 'agents session create'")
    return

With --format plain, this outputs Rich markup text [yellow]No sessions found.[/yellow] to the captured output instead of plain text. Only the non-empty path correctly routes through typer.echo(format_output(data, fmt)).

The PR test avoids this path entirely (it always creates a session before listing), so this is not a failure introduced by this PR. However, the subtask "Tests (Behave): Add features/session_create_error.feature covering: create, list after create, error handling" (issue #570) implies error handling should be covered. An explicit scenario for the empty-list path would both document this behavior and surface the format inconsistency.

Recommendation: Add a fifth scenario (tagged @tdd @bug570 @wip without @unit):

Scenario: Session list with no sessions in plain format does not emit rich markup
  When I invoke session-create-error list to verify persistence
  Then the session-create-error list result should not contain "[yellow]"

This is a separate bug report concern and can be tracked separately if the team prefers to keep this PR narrowly scoped.

F8 — Robot Tests Use `Evaluate import('tempfile').mkdtemp()` Anti-Pattern [LOW]

File: robot/session_create_error.robot:19, 33

${tmpdir}=    Evaluate    __import__('tempfile').mkdtemp(prefix='sce_570_')

This is a code smell. The Robot Framework standard for creating temporary directories is via a helper keyword or the OperatingSystem library. The Evaluate with __import__ is brittle (depends on string-level Python eval), harder to read, and doesn't follow the pattern used elsewhere in the robot suite.

Evidence from repo: Other tests that need temporary directories use Create Directory with explicit paths or delegate temp-dir creation to common.resource keywords.

Recommendation: Create a helper keyword in common.resource or use a Python helper file (e.g., robot/helper_session_create_error.py) with a Create Temp Dir keyword:

${tmpdir}=    Create Temp Dir    prefix=sce_570_

Or at minimum, use a named variable to make the intent clear.

Severity Summary

#	Finding	Severity	Category	File
F1	`@unit` tag misclassifies integration-style tests	HIGH	Test Classification	`session_create_error.feature`
F2	Robot `agents init` exit code not asserted	MEDIUM	Test Rigor	`session_create_error.robot`
F3	Robot `list` assertion too weak to verify persistence	MEDIUM	Test Rigor	`session_create_error.robot`
F4	DI container not reset between scenarios	MEDIUM	Test Isolation	`session_create_error_steps.py`
F5	`_cleanup_sce` comment overstates engine disposal	LOW	Documentation	`session_create_error_steps.py`
F6	Benchmark `timeout = 30.0` inconsistent and possibly short	LOW	Style / Reliability	`session_create_error_bench.py`
F7	Empty-list plain-format path untested (pre-existing bug)	LOW	Coverage Gap	`session.py`
F8	`Evaluate __import__('tempfile')...` anti-pattern in Robot	LOW	Style	`session_create_error.robot`

Positive Observations

The TDD intent is well-executed: @wip correctly excludes scenarios from CI via behave.ini (tags = ~@wip); Robot excludes wip tagged tests in nox integration_tests. No CI breakage risk.
_setup_real_di_path correctly stores original _service before modification (F6 from prior review addressed).
Exit code assertion now correctly inserted between create and list in Scenario 2 (F2 from prior review addressed).
--format plain consistently applied to all Behave and Robot session create invocations (F3 from prior review addressed).
Benchmark module-level docstring is accurate: explicitly states it does NOT exercise the DI container wiring path.
_cleanup_sce calls context.add_cleanup immediately after capturing original state — ensures cleanup runs even on partial setup failure.
Scenario #3 correctly asserts session_id: (F5 from prior review addressed).
Scenario #4 renamed to "arbitrary actor name succeeds" — no longer contradicts its own assertions (F1 from prior review addressed).

Verdict

Requesting changes on F1 (incorrect @unit tag — misleads developers about test nature and breaks classification assumptions), F2 (Robot silent init failure), and F3 (Robot persistence assertion gives false-green). F4 is a time-bomb that will cause subtle isolation failures after the DI fix lands and is strongly recommended.

# Code Review — PR #595: `test(cli): add failing tests for session create DI container error` **Commit:** `test(cli): add failing tests for session create DI container error` **Issue:** #570 **Branch:** `feature/m3-fix-session-create-error` **Reviewer:** Aditya Chhabra --- ## Previous Review Resolution — F1–F6 (#55060) All six findings from the prior review cycle have been **correctly resolved**: | # | Finding | Status | |---|---------|--------| | F1 | Scenario #4 name contradicted its assertions ("fails gracefully" vs exit 0) | ✅ Fixed — renamed to "Session create with arbitrary actor name succeeds" | | F2 | Missing exit code assertion after `create` in Scenario #2 | ✅ Fixed — `Then the session-create-error command should exit successfully` added before the list step | | F3 | Robot tests missing `--format plain`; AttributeError checked on stdout instead of stderr | ✅ Fixed — `--format plain` on both invocations; check moved to `${create.stderr}` | | F4 | Benchmark `track_create_persists` docstring implied it detected bug #570 | ✅ Fixed — docstring now accurately states it measures service-layer persistence only | | F5 | Scenario #3 missing `session_id:` assertion | ✅ Fixed — `And the session-create-error output should contain "session_id:"` added | | F6 | Cleanup didn't restore original `_service` | ✅ Fixed — original value stored in `context.sce_original_service` and restored; additionally improved with `_reset_session_service()` call to release DI-created engine before restore | --- ## Scope TDD-only PR — adds failing tests that will pass once the DI container fix (#554 root cause) lands. --- ## Findings ### F1 — `@unit` Tag Misclassifies Integration-Level Tests [HIGH] **File:** `features/session_create_error.feature:11,17,24,31` All four scenarios are tagged `@unit @tdd @bug570 @wip`. This is incorrect. The tests: - Create a real `tempfile.mkdtemp()` directory - Build a real SQLite database with `create_engine` + `Base.metadata.create_all` - Set a real `CLEVERAGENTS_DATABASE_URL` environment variable - Invoke the live DI container path via `_get_session_service()` (by design) These are integration-level tests. `@unit` is reserved for in-memory, isolated, fast unit scenarios. Using `@unit` here misleads other developers about test characteristics, and will pull these (inherently slower, filesystem-dependent) tests into unit test budget expectations. **Evidence from repo:** A directly analogous TDD scenario in `features/resource_type_bootstrap_git.feature` (bug #524, same TDD pattern) uses only `@tdd @bug524 @wip` — no `@unit`. **Recommendation:** Remove `@unit` from all four scenarios. The correct tagging is: ```gherkin @tdd @bug570 @wip ``` --- ### F2 — Robot Tests Do Not Assert on `agents init` Exit Code [MEDIUM] **File:** `robot/session_create_error.robot:20–21, 34–35` Both test cases call `agents init` but silently discard the result: ```robotframework Run Process ${PYTHON} -m cleveragents init sce-test ... timeout=60s cwd=${tmpdir} ``` If `init` fails (database setup error, migration failure, disk full, pre-existing path conflict), the test continues and the subsequent `session create` failure becomes ambiguous — the test would either fail on the RC assertion or, worse, `init` may return RC 0 but produce a corrupted state. **Evidence from repo:** Every other Robot test that calls `init` captures the result and asserts. From `robot/core_cli_commands.robot:34–37`: ```robotframework ${result} = Run Process ${PYTHON} -m cleveragents init ${PROJECT_NAME} ... cwd=${TEST_DIR}/project1 timeout=120s Should Be Equal As Numbers ${result.rc} 0 Should Contain ${result.stdout} Project '${PROJECT_NAME}' initialized successfully ``` **Recommendation:** Capture and assert on the `init` result in both test cases: ```robotframework ${init}= Run Process ${PYTHON} -m cleveragents init sce-test ... timeout=60s cwd=${tmpdir} Should Be Equal As Integers ${init.rc} 0 ... msg=agents init should exit 0 but got ${init.rc}. stderr: ${init.stderr} ``` --- ### F3 — Robot `list` Assertion Is Too Weak to Verify Persistence [MEDIUM] **File:** `robot/session_create_error.robot:44` ```robotframework Should Contain ${list.stdout} sessions ... msg=session list should show sessions in plain output but got: ${list.stdout} ``` The string `"sessions"` will appear in plain-format output even for an **empty** list because `_format_plain_dict` renders the key `sessions:` regardless of content (it would still output `sessions:` with an empty list body if the create had failed silently). The assertion gives a false green if `session create` actually failed but the list command still ran and printed its header. Compare with the Behave equivalent (`step_list_shows_sessions`), which correctly checks both: 1. `assert "total:" in result.output` 2. `assert "total: 0" not in result.output` **Recommendation:** Replace the weak string check with assertions that verify an actual session was created: ```robotframework Should Contain ${list.stdout} total: ... msg=Expected 'total:' in plain output: ${list.stdout} Should Not Contain ${list.stdout} total: 0 ... msg=Expected at least one session in list output: ${list.stdout} ``` --- ### F4 — DI Container Not Reset Between Scenarios [MEDIUM] **File:** `features/steps/session_create_error_steps.py:42–74` `_setup_real_di_path()` sets `CLEVERAGENTS_DATABASE_URL` and resets `session_mod._service = None`, but never calls `reset_container()` from `cleveragents.application.container`. The `Container` is a global singleton accessed via `get_container()`. If an earlier test has already called `get_container()` and initialized any `Singleton` providers against a different database URL (e.g., `context_tier_service`, `autonomy_controller`, or any other Singleton-scoped provider), those cached instances persist for the lifetime of the process. After the DI fix lands, if the implementation resolves the session factory through the container — even if `CLEVERAGENTS_DATABASE_URL` is correctly set — any Singleton providers already resolved against the old URL will not be re-initialized. While the test currently passes (the bug causes immediate failure before any Singleton is resolved), this will be a latent isolation problem once the fix lands. **Recommendation:** Add `reset_container()` to both setup and cleanup: ```python from cleveragents.application.container import reset_container def _setup_real_di_path(context: Any) -> None: context.sce_original_service = session_mod._service context.sce_tmpdir = tempfile.mkdtemp(prefix="session_create_err_570_") context.add_cleanup(_cleanup_sce, context) # Reset global DI container so post-fix DI reads fresh env var reset_container() # ... rest of setup ... def _cleanup_sce(context: Any) -> None: session_mod._reset_session_service() session_mod._service = context.sce_original_service reset_container() # Release any DI-created connections os.environ.pop("CLEVERAGENTS_DATABASE_URL", None) shutil.rmtree(context.sce_tmpdir, ignore_errors=True) ``` --- ### F5 — `_cleanup_sce` Comment Implies Engine Disposal That Does Not Occur [LOW] **File:** `features/steps/session_create_error_steps.py:251–252` ```python # Reset via public API first to release any DI-created engine (SEC-1). session_mod._reset_session_service() ``` `_reset_session_service()` does `global _service; _service = None`. It sets the cached reference to `None` — it does **not** call `.dispose()` on any SQLAlchemy engine, and the current `_get_session_service()` implementation never stores its result in `_service` (the DI path returns without caching). So the comment is misleading: no engine is released by this call; Python's garbage collector handles cleanup if and when the refcount drops to zero. After the DI fix lands, if `_get_session_service()` is also corrected to cache the result in `_service`, then this call will correctly clear the cached reference. But the comment overstates the guarantee today. **Recommendation:** Revise the comment to be accurate: ```python # Ensure module-level cache is cleared; GC will release any engine when # the container is reset (done separately via reset_container()). session_mod._reset_session_service() ``` --- ### F6 — Benchmark `timeout` Is Float Instead of Int [LOW] **File:** `benchmarks/session_create_error_bench.py:76` ```python timeout = 30.0 ``` Every other benchmark suite in this repo uses `timeout = 60` (integer). The float works in ASV, but the inconsistency is a style deviation. Additionally, 30 seconds is half the standard timeout. For a file-based SQLite round-trip that includes engine construction (done in `setup()`), 30 seconds may be tight in CI under load. **Recommendation:** Align with the codebase standard: ```python timeout = 60 ``` --- ### F7 — `list_sessions` Empty-Path Bug Not Covered by Any Test [LOW] **File:** `src/cleveragents/cli/commands/session.py:181–184` (pre-existing, not introduced by PR) When the session list is empty, `list_sessions` uses `console.print` regardless of `--format`: ```python if not sessions: console.print("[yellow]No sessions found.[/yellow]") # bypasses --format console.print("Create one with 'agents session create'") return ``` With `--format plain`, this outputs Rich markup text `[yellow]No sessions found.[/yellow]` to the captured output instead of plain text. Only the non-empty path correctly routes through `typer.echo(format_output(data, fmt))`. The PR test avoids this path entirely (it always creates a session before listing), so this is not a failure introduced by this PR. However, the subtask **"Tests (Behave): Add features/session_create_error.feature covering: create, list after create, error handling"** (issue #570) implies error handling should be covered. An explicit scenario for the empty-list path would both document this behavior and surface the format inconsistency. **Recommendation:** Add a fifth scenario (tagged `@tdd @bug570 @wip` without `@unit`): ```gherkin Scenario: Session list with no sessions in plain format does not emit rich markup When I invoke session-create-error list to verify persistence Then the session-create-error list result should not contain "[yellow]" ``` This is a separate bug report concern and can be tracked separately if the team prefers to keep this PR narrowly scoped. --- ### F8 — Robot Tests Use `Evaluate __import__('tempfile').mkdtemp()` Anti-Pattern [LOW] **File:** `robot/session_create_error.robot:19, 33` ```robotframework ${tmpdir}= Evaluate __import__('tempfile').mkdtemp(prefix='sce_570_') ``` This is a code smell. The Robot Framework standard for creating temporary directories is via a helper keyword or the `OperatingSystem` library. The `Evaluate` with `__import__` is brittle (depends on string-level Python eval), harder to read, and doesn't follow the pattern used elsewhere in the robot suite. **Evidence from repo:** Other tests that need temporary directories use `Create Directory` with explicit paths or delegate temp-dir creation to `common.resource` keywords. **Recommendation:** Create a helper keyword in `common.resource` or use a Python helper file (e.g., `robot/helper_session_create_error.py`) with a `Create Temp Dir` keyword: ```robotframework ${tmpdir}= Create Temp Dir prefix=sce_570_ ``` Or at minimum, use a named variable to make the intent clear. --- ## Severity Summary | # | Finding | Severity | Category | File | |---|---------|----------|----------|------| | F1 | `@unit` tag misclassifies integration-style tests | **HIGH** | Test Classification | `session_create_error.feature` | | F2 | Robot `agents init` exit code not asserted | **MEDIUM** | Test Rigor | `session_create_error.robot` | | F3 | Robot `list` assertion too weak to verify persistence | **MEDIUM** | Test Rigor | `session_create_error.robot` | | F4 | DI container not reset between scenarios | **MEDIUM** | Test Isolation | `session_create_error_steps.py` | | F5 | `_cleanup_sce` comment overstates engine disposal | LOW | Documentation | `session_create_error_steps.py` | | F6 | Benchmark `timeout = 30.0` inconsistent and possibly short | LOW | Style / Reliability | `session_create_error_bench.py` | | F7 | Empty-list plain-format path untested (pre-existing bug) | LOW | Coverage Gap | `session.py` | | F8 | `Evaluate __import__('tempfile')...` anti-pattern in Robot | LOW | Style | `session_create_error.robot` | --- ## Positive Observations - The TDD intent is well-executed: `@wip` correctly excludes scenarios from CI via `behave.ini` (`tags = ~@wip`); Robot excludes `wip` tagged tests in nox `integration_tests`. No CI breakage risk. - `_setup_real_di_path` correctly stores original `_service` before modification (F6 from prior review addressed). - Exit code assertion now correctly inserted between `create` and `list` in Scenario 2 (F2 from prior review addressed). - `--format plain` consistently applied to all Behave and Robot `session create` invocations (F3 from prior review addressed). - Benchmark module-level docstring is accurate: explicitly states it does NOT exercise the DI container wiring path. - `_cleanup_sce` calls `context.add_cleanup` immediately after capturing original state — ensures cleanup runs even on partial setup failure. - Scenario #3 correctly asserts `session_id:` (F5 from prior review addressed). - Scenario #4 renamed to "arbitrary actor name succeeds" — no longer contradicts its own assertions (F1 from prior review addressed). --- ## Verdict **Requesting changes** on F1 (incorrect `@unit` tag — misleads developers about test nature and breaks classification assumptions), F2 (Robot silent init failure), and F3 (Robot persistence assertion gives false-green). F4 is a time-bomb that will cause subtle isolation failures after the DI fix lands and is strongly recommended.

aditya referenced this pull request

2026-03-09 11:48:18 +00:00

fix(actor): handle empty actor list without validation error #594

freemo commented

2026-03-09 19:03:29 +00:00

PM Status Check — Day 29 (Update)

Author: @brent.edwards | Milestone: v3.2.0 (M3) | Reviews: REQUEST_CHANGES (Aditya, new review posted today)

New Review — Aditya (#56627)

Aditya confirmed all prior F1-F6 findings resolved, then identified 8 new findings in the latest commit:

#	Finding	Severity
F1	`@unit` tag misclassifies integration-style tests (real DB, real DI, real temp dirs)	HIGH
F2	Robot `agents init` exit code not asserted	MEDIUM
F3	Robot `list` assertion too weak (matches header, not actual session)	MEDIUM
F4	DI container not reset between scenarios (latent isolation bomb)	MEDIUM
F5-F8	Documentation, style, pre-existing gaps	LOW

Action Required

Who	Action	Deadline
@brent.edwards	Address F1 (remove `@unit`), F2-F4 (Robot/DI fixes)	Mar 10 EOD
@aditya	Re-review after fixes	Mar 11

F1 is the highest priority — the @unit tag violates test classification standards since these tests use real SQLite, real temp dirs, and real DI paths. The fix is a one-line tag removal per scenario.

Note: This TDD PR blocks the bug fix for #570 (session create DI error). Bug fixes remain top priority.

## PM Status Check — Day 29 (Update) **Author**: @brent.edwards | **Milestone**: v3.2.0 (M3) | **Reviews**: REQUEST_CHANGES (Aditya, new review posted today) ### New Review — Aditya (#56627) Aditya confirmed all prior F1-F6 findings resolved, then identified **8 new findings** in the latest commit: | # | Finding | Severity | |---|---------|----------| | F1 | `@unit` tag misclassifies integration-style tests (real DB, real DI, real temp dirs) | **HIGH** | | F2 | Robot `agents init` exit code not asserted | MEDIUM | | F3 | Robot `list` assertion too weak (matches header, not actual session) | MEDIUM | | F4 | DI container not reset between scenarios (latent isolation bomb) | MEDIUM | | F5-F8 | Documentation, style, pre-existing gaps | LOW | ### Action Required | Who | Action | Deadline | |:----|:-------|:---------| | @brent.edwards | Address F1 (remove `@unit`), F2-F4 (Robot/DI fixes) | **Mar 10 EOD** | | @aditya | Re-review after fixes | Mar 11 | F1 is the highest priority — the `@unit` tag violates test classification standards since these tests use real SQLite, real temp dirs, and real DI paths. The fix is a one-line tag removal per scenario. **Note**: This TDD PR blocks the bug fix for #570 (session create DI error). Bug fixes remain top priority.

brent.edwards force-pushed feature/m3-fix-session-create-error from 1c9fe9e454 to 0248d2220d

2026-03-09 19:04:03 +00:00

Compare

freemo referenced this pull request

2026-03-09 19:04:33 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

brent.edwards commented

2026-03-09 19:04:53 +00:00

Disposition of Aditya review (comment #56627) — commit `0248d222`

Single commit rebased onto master (a808c395). All blocking findings fixed plus recommended items.

Fixed

ID	Severity	Finding	Resolution
F1	HIGH	`@unit` tag misclassifies integration-level tests	Fixed — removed `@unit` from all 4 scenarios. Tags are now `@tdd @bug570 @wip`, consistent with `resource_type_bootstrap_git.feature` pattern.
F2	MEDIUM	Robot `agents init` exit code not asserted	Fixed — both test cases now capture `${init}` result and assert `Should Be Equal As Integers ${init.rc} 0` with diagnostic error message.
F3	MEDIUM	Robot `list` assertion too weak	Fixed — replaced `Should Contain ${list.stdout} sessions` with `Should Contain ${list.stdout} total:` and `Should Not Contain ${list.stdout} total: 0`. Mirrors the Behave step logic.
F4	MEDIUM	DI container not reset between scenarios	Fixed — added `from cleveragents.application.container import reset_container` at module level. `_setup_real_di_path()` calls `reset_container()` before setting env var. `_cleanup_sce()` calls `reset_container()` after restoring `_service` to release any DI-created connections.
F5	LOW	`_cleanup_sce` comment overstates engine disposal	Fixed — revised comment to: "Ensure module-level cache is cleared; GC will release any engine when the container is reset (done separately via reset_container())."
F6	LOW	Benchmark `timeout = 30.0` inconsistent	Fixed — changed to `timeout = 60` (int, matching all other benchmark suites).

Acknowledged (no code change)

ID	Severity	Finding	Notes
F7	LOW	Empty-list plain-format path untested (pre-existing bug)	Out of scope for this TDD PR (#570). Aditya notes this is a separate bug report concern.
F8	LOW	Robot `Evaluate __import__('tempfile')` anti-pattern	Acknowledged — matches existing pattern in other Robot files. Will address in a Robot infrastructure cleanup pass.

Verification

Gate	Result
`nox -s lint`	All checks passed
`nox -s typecheck`	0 errors, 0 warnings
`nox -s unit_tests`	9,029 passed, 0 failed, 1 feature skipped (`@wip`)
Single commit	`0248d222` — no merge commits
Rebased onto master	`a808c395`

## Disposition of Aditya review (comment #56627) — commit `0248d222` Single commit rebased onto master (`a808c395`). All blocking findings fixed plus recommended items. ### Fixed | ID | Severity | Finding | Resolution | |----|----------|---------|------------| | **F1** | HIGH | `@unit` tag misclassifies integration-level tests | **Fixed** — removed `@unit` from all 4 scenarios. Tags are now `@tdd @bug570 @wip`, consistent with `resource_type_bootstrap_git.feature` pattern. | | **F2** | MEDIUM | Robot `agents init` exit code not asserted | **Fixed** — both test cases now capture `${init}` result and assert `Should Be Equal As Integers ${init.rc} 0` with diagnostic error message. | | **F3** | MEDIUM | Robot `list` assertion too weak | **Fixed** — replaced `Should Contain ${list.stdout} sessions` with `Should Contain ${list.stdout} total:` and `Should Not Contain ${list.stdout} total: 0`. Mirrors the Behave step logic. | | **F4** | MEDIUM | DI container not reset between scenarios | **Fixed** — added `from cleveragents.application.container import reset_container` at module level. `_setup_real_di_path()` calls `reset_container()` before setting env var. `_cleanup_sce()` calls `reset_container()` after restoring `_service` to release any DI-created connections. | | **F5** | LOW | `_cleanup_sce` comment overstates engine disposal | **Fixed** — revised comment to: "Ensure module-level cache is cleared; GC will release any engine when the container is reset (done separately via reset_container())." | | **F6** | LOW | Benchmark `timeout = 30.0` inconsistent | **Fixed** — changed to `timeout = 60` (int, matching all other benchmark suites). | ### Acknowledged (no code change) | ID | Severity | Finding | Notes | |----|----------|---------|-------| | **F7** | LOW | Empty-list plain-format path untested (pre-existing bug) | Out of scope for this TDD PR (#570). Aditya notes this is a separate bug report concern. | | **F8** | LOW | Robot `Evaluate __import__('tempfile')` anti-pattern | Acknowledged — matches existing pattern in other Robot files. Will address in a Robot infrastructure cleanup pass. | ### Verification | Gate | Result | |------|--------| | `nox -s lint` | All checks passed | | `nox -s typecheck` | 0 errors, 0 warnings | | `nox -s unit_tests` | 9,029 passed, 0 failed, 1 feature skipped (`@wip`) | | Single commit | `0248d222` — no merge commits | | Rebased onto master | `a808c395` |

freemo commented

2026-03-09 19:11:32 +00:00

PM Status — Day 29 (2026-03-09)

@brent.edwards — Acknowledged your disposition of Aditya's review findings (commit 0248d222). Excellent turnaround on all 6 findings (F1–F6), especially the @unit tag reclassification and DI container reset.

Next step: @aditya — Please re-review this PR at your earliest convenience. All findings from your latest review have been addressed. This TDD PR is blocking the bug fix for #570, which is on the M3 critical path.

Timeline: We need approval by EOD 2026-03-10 to stay on track for M3 closure.

**PM Status — Day 29 (2026-03-09)** @brent.edwards — Acknowledged your disposition of Aditya's review findings (commit `0248d222`). Excellent turnaround on all 6 findings (F1–F6), especially the `@unit` tag reclassification and DI container reset. **Next step:** @aditya — Please re-review this PR at your earliest convenience. All findings from your latest review have been addressed. This TDD PR is blocking the bug fix for #570, which is on the M3 critical path. **Timeline:** We need approval by EOD 2026-03-10 to stay on track for M3 closure.

freemo added the

labels 2026-03-09 19:11:37 +00:00

freemo referenced this pull request

2026-03-09 19:16:59 +00:00

`agents session create` causes an error. #570

freemo reviewed 2026-03-09 19:21:06 +00:00

freemo left a comment

Code Review — Day 29 (2026-03-09)

PR: #595 — test(cli): add failing tests for session create DI container error (#570)
Author: brent.edwards | Milestone: v3.2.0 (M3) | Refs: #570

Findings:

F1 (Medium) — @tdd_expected_fail tag missing: The review playbook requires TDD PRs to use @tdd_expected_fail tags to mark tests that are expected to fail until the fix lands. The scenarios use @tdd @bug570 @wip but not @tdd_expected_fail. If the project convention uses @wip as the equivalent, this should be documented; otherwise, add @tdd_expected_fail for CI gate compatibility. This matters because automated test runners may not know to exclude these from pass/fail reporting without the canonical tag.
F2 (Low) — PR body claims @unit tag but diff doesn't include it: The PR description states "Added @unit tag to all scenarios for discoverability" but features/session_create_error.feature shows only @tdd @bug570 @wip on each scenario — no @unit tag. Either add the tag to match the description or correct the PR body.
F3 (Low) — Coupling to private module internals: Step definitions access session_mod._service and call session_mod._reset_session_service() — both private APIs. This is defensible for DI integration testing (you need to force the real code path), but it creates fragile tests that will break if the module's internal caching mechanism changes. Consider documenting this coupling in a comment at the top of the step file so future maintainers understand why the private access is necessary.
F4 (Info) — Benchmark track_create_persists docstring clarity: The docstring correctly notes this doesn't exercise the DI path (addressing Aditya's F4), which is good. The type: ignore removal (hurui200320's F11) is confirmed absent from the diff — the benchmark has no type: ignore comments. Clean.
F5 (Info) — Cleanup robustness: _cleanup_sce calls session_mod._reset_session_service() then restores context.sce_original_service. This is well-ordered — clearing the cache first, then restoring the original. The context.add_cleanup() registration happens early in _setup_real_di_path before any exception-prone lines (addressing the SEC-3 comment in the code). Good defensive pattern.
F6 (Info) — Refs: vs ISSUES CLOSED:: Correctly uses Refs: #570 since this TDD PR does not fix the bug — it only adds failing tests. This properly addresses hurui200320's F4 finding.

Checklist:

Criterion	Status
Architecture compliance	PASS — test-only PR, properly exercises DI/CLI/service layers without production changes
Test coverage	PASS — 4 BDD scenarios, Robot Framework smoke tests, ASV benchmarks; tests cover create, persistence, actor variants
Code quality	PASS — type hints on all functions, thorough docstrings, proper cleanup/teardown, no bare excepts
CONTRIBUTING.md compliance	PASS — conventional commit, `Refs:` (not `ISSUES CLOSED:`), implementation notes present, CI status documented
Security	PASS — temp dirs cleaned up, env vars restored, no hardcoded secrets, `engine.dispose()` called
Performance	PASS — shared engine/sessionmaker in benchmarks, no unbounded operations

Verdict: COMMENT
Summary: This is a well-structured TDD test PR. Brent has thoroughly addressed all 6 of Aditya's findings (F1-F6) as well as hurui200320's and CoreRasurae's earlier review feedback. The test design is sound — resetting _service to force the real DI path, using file-based SQLite for realistic testing, and proper cleanup with early registration. The two actionable items are F1 (confirm @tdd_expected_fail vs @wip convention for CI gating) and F2 (missing @unit tag claimed in PR body). Neither blocks merge. This PR is ready to serve as the TDD harness for the #570 fix.

**Code Review — Day 29 (2026-03-09)** **PR:** #595 — test(cli): add failing tests for session create DI container error (#570) **Author:** brent.edwards | **Milestone:** v3.2.0 (M3) | **Refs:** #570 --- **Findings:** - **F1 (Medium) — `@tdd_expected_fail` tag missing:** The review playbook requires TDD PRs to use `@tdd_expected_fail` tags to mark tests that are expected to fail until the fix lands. The scenarios use `@tdd @bug570 @wip` but not `@tdd_expected_fail`. If the project convention uses `@wip` as the equivalent, this should be documented; otherwise, add `@tdd_expected_fail` for CI gate compatibility. This matters because automated test runners may not know to exclude these from pass/fail reporting without the canonical tag. - **F2 (Low) — PR body claims `@unit` tag but diff doesn't include it:** The PR description states "Added `@unit` tag to all scenarios for discoverability" but `features/session_create_error.feature` shows only `@tdd @bug570 @wip` on each scenario — no `@unit` tag. Either add the tag to match the description or correct the PR body. - **F3 (Low) — Coupling to private module internals:** Step definitions access `session_mod._service` and call `session_mod._reset_session_service()` — both private APIs. This is defensible for DI integration testing (you need to force the real code path), but it creates fragile tests that will break if the module's internal caching mechanism changes. Consider documenting this coupling in a comment at the top of the step file so future maintainers understand why the private access is necessary. - **F4 (Info) — Benchmark `track_create_persists` docstring clarity:** The docstring correctly notes this doesn't exercise the DI path (addressing Aditya's F4), which is good. The `type: ignore` removal (hurui200320's F11) is confirmed absent from the diff — the benchmark has no `type: ignore` comments. Clean. - **F5 (Info) — Cleanup robustness:** `_cleanup_sce` calls `session_mod._reset_session_service()` then restores `context.sce_original_service`. This is well-ordered — clearing the cache first, then restoring the original. The `context.add_cleanup()` registration happens early in `_setup_real_di_path` before any exception-prone lines (addressing the SEC-3 comment in the code). Good defensive pattern. - **F6 (Info) — `Refs:` vs `ISSUES CLOSED:`:** Correctly uses `Refs: #570` since this TDD PR does not fix the bug — it only adds failing tests. This properly addresses hurui200320's F4 finding. --- **Checklist:** | Criterion | Status | |---|---| | Architecture compliance | PASS — test-only PR, properly exercises DI/CLI/service layers without production changes | | Test coverage | PASS — 4 BDD scenarios, Robot Framework smoke tests, ASV benchmarks; tests cover create, persistence, actor variants | | Code quality | PASS — type hints on all functions, thorough docstrings, proper cleanup/teardown, no bare excepts | | CONTRIBUTING.md compliance | PASS — conventional commit, `Refs:` (not `ISSUES CLOSED:`), implementation notes present, CI status documented | | Security | PASS — temp dirs cleaned up, env vars restored, no hardcoded secrets, `engine.dispose()` called | | Performance | PASS — shared engine/sessionmaker in benchmarks, no unbounded operations | **Verdict:** COMMENT **Summary:** This is a well-structured TDD test PR. Brent has thoroughly addressed all 6 of Aditya's findings (F1-F6) as well as hurui200320's and CoreRasurae's earlier review feedback. The test design is sound — resetting `_service` to force the real DI path, using file-based SQLite for realistic testing, and proper cleanup with early registration. The two actionable items are **F1** (confirm `@tdd_expected_fail` vs `@wip` convention for CI gating) and **F2** (missing `@unit` tag claimed in PR body). Neither blocks merge. This PR is ready to serve as the TDD harness for the #570 fix.

features/session_create_error.feature Outdated

						
				@@ -0,0 +9,4 @@

				    Given a session-create-error CLI runner using the real DI path

				  @tdd @bug570 @wip

				  Scenario: Session create produces a new session

freemo commented

2026-03-09 19:21:06 +00:00

F1 (Medium): Review playbook requires @tdd_expected_fail tag for TDD PRs where tests are expected to fail until the fix lands. These scenarios use @tdd @bug570 @wip — if @wip serves the same CI gating purpose, document the equivalence; otherwise add @tdd_expected_fail here.

**F1 (Medium):** Review playbook requires `@tdd_expected_fail` tag for TDD PRs where tests are expected to fail until the fix lands. These scenarios use `@tdd @bug570 @wip` — if `@wip` serves the same CI gating purpose, document the equivalence; otherwise add `@tdd_expected_fail` here.

features/session_create_error.feature Outdated

						
				@@ -0,0 +14,4 @@

				    Then the session-create-error command should exit successfully

				    And the session-create-error output should contain "session_id:"

				  @tdd @bug570 @wip

freemo commented

2026-03-09 19:21:06 +00:00

F2 (Low): PR description states @unit tag was added to all scenarios, but this line only has @tdd @bug570 @wip. Add @unit for consistency with the stated change, or update the PR body.

**F2 (Low):** PR description states `@unit` tag was added to all scenarios, but this line only has `@tdd @bug570 @wip`. Add `@unit` for consistency with the stated change, or update the PR body.

features/steps/session_create_error_steps.py Outdated

						
				@@ -0,0 +48,4 @@

				    """

				    # Store original _service so cleanup can restore it.

				    context.sce_original_service = session_mod._service

				    context.sce_tmpdir = tempfile.mkdtemp(prefix="session_create_err_570_")

freemo commented

2026-03-09 19:21:06 +00:00

F3 (Low): Accessing session_mod._service (private attribute) and session_mod._reset_session_service() is necessary here to force the real DI path, but it creates coupling to internal implementation. Consider adding a brief comment here noting why the private access is required (e.g. # Access private _service to force _get_session_service() to re-resolve via DI) so future maintainers don't mistake this for an oversight.

**F3 (Low):** Accessing `session_mod._service` (private attribute) and `session_mod._reset_session_service()` is necessary here to force the real DI path, but it creates coupling to internal implementation. Consider adding a brief comment here noting *why* the private access is required (e.g. `# Access private _service to force _get_session_service() to re-resolve via DI`) so future maintainers don't mistake this for an oversight.

brent.edwards commented

2026-03-09 20:05:15 +00:00

Code Review — PR #595 `feature/m3-fix-session-create-error`

Reviewed commit: 0248d222 on base a808c395
Review type: Test review + Architecture review
Verdict: REQUEST CHANGES — 2× P1, 5× P2, 2× P3

P1 — Must fix before merge

F1 · P1:must-fix — TDD bug tags use wrong convention
features/session_create_error.feature

Tags @tdd @bug570 @wip must be @tdd_bug @tdd_bug_570 @tdd_expected_fail per CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The @wip tag is especially problematic because behave.ini has tags = ~@wip which excludes these scenarios from CI entirely — meaning these tests are never actually run.

F2 · P1:must-fix — Imports buried inside function body
features/steps/session_create_error_steps.py — _setup_real_di_path()

Two imports (from cleveragents.infrastructure.persistence.connection import reset_container and import importlib) are inside the _setup_real_di_path() function body. CONTRIBUTING.md §Import Guidelines (lines 1287-1296) requires all imports at the top of the file. The only exception is inside if TYPE_CHECKING: blocks.

P2 — Should fix (follow-up within 3 days)

F3 · P2:should-fix — CHANGELOG overstates DI path coverage

The CHANGELOG entry says "Add full DI path coverage" but the PR only tests the session-create error path. "Full DI path coverage" implies comprehensive coverage of all DI paths, which isn't the case. Consider rewording to "Add DI path test coverage for session creation error handling."

F4 · P2:should-fix — importlib.reload() is a fragile pattern
features/steps/session_create_error_steps.py

Using importlib.reload() to reset module state between tests is fragile and can cause subtle test pollution. If the module has side effects on import, reloading may not fully reset state. Consider documenting why this approach is necessary or using dependency injection to make the code more testable.

F5 · P2:should-fix — Robot test #2 missing AttributeError stderr check
robot/session_create_error.robot

The second Robot test case verifies the exit code but doesn't assert on the expected error message in stderr. If the CLI fails for a different reason (e.g., import error), the test would still pass.

F6 · P2:should-fix — Missing @regression tag

Sibling feature files (e.g., session_list_error.feature in PR #596) include a @regression tag. This feature file is missing it. If this is a regression test for bug #570, it should carry the @regression tag for consistent test suite filtering.

F7 · P2:should-fix — track_create_persists step always returns 1
features/steps/session_create_error_steps.py

The step that verifies session creation persistence appears to always assert a count of 1 regardless of context. If the test is meant to verify that creation succeeded, the expected count should be parameterized or clearly documented.

P3 — Nits (author discretion)

F8 · P3:nit — Commit message tag order doesn't match feature file

The commit body lists tags in a different order than they appear in the feature file. Minor inconsistency but worth aligning for grep-ability.

F9 · P3:nit — No Robot test for custom actor scenarios

The Robot tests cover default actor scenarios but don't exercise the custom actor session creation path. This would improve integration-level confidence.

Summary

The session creation error handling test coverage is a solid addition. The production path through the DI container is properly exercised. However, the P1 findings are significant: the wrong TDD tags mean these tests are silently excluded from CI (via behave.ini's ~@wip filter), and the buried imports violate the project's import guidelines. Both must be fixed before merge.

## Code Review — PR #595 `feature/m3-fix-session-create-error` **Reviewed commit:** `0248d222` on base `a808c395` **Review type:** Test review + Architecture review **Verdict:** REQUEST CHANGES — 2× P1, 5× P2, 2× P3 --- ### P1 — Must fix before merge **F1 · `P1:must-fix` — TDD bug tags use wrong convention** `features/session_create_error.feature` Tags `@tdd @bug570 @wip` must be `@tdd_bug @tdd_bug_570 @tdd_expected_fail` per CONTRIBUTING.md §TDD Bug Test Tags (lines 1176-1221). The `@wip` tag is especially problematic because `behave.ini` has `tags = ~@wip` which excludes these scenarios from CI entirely — meaning these tests are never actually run. **F2 · `P1:must-fix` — Imports buried inside function body** `features/steps/session_create_error_steps.py` — `_setup_real_di_path()` Two imports (`from cleveragents.infrastructure.persistence.connection import reset_container` and `import importlib`) are inside the `_setup_real_di_path()` function body. CONTRIBUTING.md §Import Guidelines (lines 1287-1296) requires all imports at the top of the file. The only exception is inside `if TYPE_CHECKING:` blocks. --- ### P2 — Should fix (follow-up within 3 days) **F3 · `P2:should-fix` — CHANGELOG overstates DI path coverage** The CHANGELOG entry says "Add full DI path coverage" but the PR only tests the session-create error path. "Full DI path coverage" implies comprehensive coverage of all DI paths, which isn't the case. Consider rewording to "Add DI path test coverage for session creation error handling." **F4 · `P2:should-fix` — `importlib.reload()` is a fragile pattern** `features/steps/session_create_error_steps.py` Using `importlib.reload()` to reset module state between tests is fragile and can cause subtle test pollution. If the module has side effects on import, reloading may not fully reset state. Consider documenting why this approach is necessary or using dependency injection to make the code more testable. **F5 · `P2:should-fix` — Robot test #2 missing `AttributeError` stderr check** `robot/session_create_error.robot` The second Robot test case verifies the exit code but doesn't assert on the expected error message in stderr. If the CLI fails for a different reason (e.g., import error), the test would still pass. **F6 · `P2:should-fix` — Missing `@regression` tag** Sibling feature files (e.g., `session_list_error.feature` in PR #596) include a `@regression` tag. This feature file is missing it. If this is a regression test for bug #570, it should carry the `@regression` tag for consistent test suite filtering. **F7 · `P2:should-fix` — `track_create_persists` step always returns 1** `features/steps/session_create_error_steps.py` The step that verifies session creation persistence appears to always assert a count of 1 regardless of context. If the test is meant to verify that creation succeeded, the expected count should be parameterized or clearly documented. --- ### P3 — Nits (author discretion) **F8 · `P3:nit` — Commit message tag order doesn't match feature file** The commit body lists tags in a different order than they appear in the feature file. Minor inconsistency but worth aligning for grep-ability. **F9 · `P3:nit` — No Robot test for custom actor scenarios** The Robot tests cover default actor scenarios but don't exercise the custom actor session creation path. This would improve integration-level confidence. --- ### Summary The session creation error handling test coverage is a solid addition. The production path through the DI container is properly exercised. However, the P1 findings are significant: the wrong TDD tags mean these tests are **silently excluded from CI** (via `behave.ini`'s `~@wip` filter), and the buried imports violate the project's import guidelines. Both must be fixed before merge.

CoreRasurae was unassigned by freemo

2026-03-09 20:09:03 +00:00

brent.edwards was assigned by freemo

2026-03-09 20:09:03 +00:00

freemo referenced this pull request

2026-03-09 20:17:15 +00:00

feat(cli): implement full output rendering framework per specification #550

freemo commented

2026-03-09 20:17:25 +00:00

PM Note (Day 29) — Reassignment

Changes:

Assignee: @CoreRasurae → @brent.edwards

Rationale: Brent has the active branch feature/m3-fix-session-create-error and the open PR for this work. The issue assignee should match the developer who has the active branch and PR. Luis was nominally assigned but Brent is doing the actual implementation.

**PM Note (Day 29) — Reassignment** **Changes:** - **Assignee**: @CoreRasurae → @brent.edwards **Rationale:** Brent has the active branch `feature/m3-fix-session-create-error` and the open PR for this work. The issue assignee should match the developer who has the active branch and PR. Luis was nominally assigned but Brent is doing the actual implementation.

freemo referenced this pull request

2026-03-09 20:18:18 +00:00

TDD: `agents session create` DI container missing `db` provider test #631

freemo commented

2026-03-09 20:28:57 +00:00

PM Compliance Audit — CONTRIBUTING.md Checklist

#	Requirement	Status
1	Detailed PR description	PASS
2	Issue reference with closing keyword	PASS — `Closes #570`
3	Dependency link (PR blocks issue)	PASS — PR blocks #570
4	CHANGELOG.md updated	PASS
5	Milestone	PASS — v3.2.0
6	Type label	PASS — Type/Testing
7	Assignee	PASS — @brent.edwards
8	Mergeable	PASS
9	Reviews	STALE — REQUEST_CHANGES from @CoreRasurae, @hurui200320, @hamza.khyari predate latest rebase. Need re-review.

Merge blockers: 3 stale REQUEST_CHANGES reviews. Brent addressed all findings (commits 0248d222, 1c9fe9e4). Reviewers need to re-review or withdraw.

## PM Compliance Audit — CONTRIBUTING.md Checklist | # | Requirement | Status | |---|------------|--------| | 1 | Detailed PR description | PASS | | 2 | Issue reference with closing keyword | PASS — `Closes #570` | | 3 | Dependency link (PR blocks issue) | PASS — PR blocks #570 | | 4 | CHANGELOG.md updated | PASS | | 5 | Milestone | PASS — v3.2.0 | | 6 | Type label | PASS — Type/Testing | | 7 | Assignee | PASS — @brent.edwards | | 8 | Mergeable | PASS | | 9 | Reviews | **STALE** — REQUEST_CHANGES from @CoreRasurae, @hurui200320, @hamza.khyari predate latest rebase. Need re-review. | **Merge blockers**: 3 stale REQUEST_CHANGES reviews. Brent addressed all findings (commits `0248d222`, `1c9fe9e4`). Reviewers need to re-review or withdraw.

brent.edwards force-pushed feature/m3-fix-session-create-error from 0248d2220d to 9ef1e9c000

2026-03-09 22:49:47 +00:00

Compare

brent.edwards commented

2026-03-09 22:50:53 +00:00

Disposition of all review findings — commit `9ef1e9c0`

Rebased as a single commit on master (83f2f3a0). All quality gates pass: nox -s lint ✓, nox -s typecheck 0 errors ✓.

Key structural changes in this revision

@tdd_expected_fail infrastructure implemented — Behave after_scenario hook and Robot listener both invert pass/fail for tagged scenarios. This replaces the @wip exclusion approach entirely.
18 existing TDD scenarios migrated from @tdd @bugNNN to @tdd_bug @tdd_bug_NNN across 5 feature files.
PR #595's own scenarios now tagged @tdd_bug @tdd_bug_570 @tdd_expected_fail.
Buried imports moved to top of file in step definitions.
Private API access documented with rationale comment in step definitions.

CoreRasurae — Review #1980

ID	Severity	Finding	Disposition
F1	HIGH	Rich Console not captured	✅ Fixed — `--format plain` on all invocations
F2	HIGH	Benchmarks bypass DI path	✅ Fixed — docstrings clarify service-layer scope
F3	MEDIUM	Missing error handling scenario	⏭️ Out of scope for TDD PR — covers only bug reproduction
F4	MEDIUM	Branch name mismatch	⏭️ Acknowledged — not changing to avoid PR disruption
F5	MEDIUM	Panel title spec mismatch	✅ Fixed — `--format plain` avoids Rich path entirely
F6	LOW	Dead `_counter`	✅ Fixed — removed
F7	LOW	Engine recreated per call	✅ Fixed — shared in `setup()`
F8	LOW	Robot missing `[Tags]`	✅ Fixed — `tdd_bug tdd_bug_570 tdd_expected_fail`
F9	LOW	Benchmark filename mismatch	⏭️ Acknowledged

hurui200320 — Review #2008

ID	Severity	Finding	Disposition
F1	HIGH	Merge commit	✅ Fixed — single squashed commit, rebased
F2	HIGH	Multiple commits	✅ Fixed — single commit `9ef1e9c0`
F3	HIGH	Branch name mismatch	⏭️ Acknowledged
F4	HIGH	`ISSUES CLOSED` on test-only PR	✅ Fixed — `Refs: #570`
F5	MEDIUM	Type label mismatch	✅ `Type/Testing` appropriate for test-only PR
F6	MEDIUM	Scenario #4 misleading	✅ Fixed — renamed
F7	MEDIUM	Missing exit code assertion	✅ Fixed — added to Scenario #2
F8	MEDIUM	Robot no `--format plain`	✅ Fixed
F9	LOW	Benchmark docstring misleading	✅ Fixed
F10	LOW	Cleanup doesn't restore `_service`	✅ Fixed — restores `sce_original_service`
F11	LOW	`# type: ignore` in benchmark	✅ Fixed — removed

hamza.khyari — Review #2034

ID	Severity	Finding	Disposition
PROC-1	Major	`behave.ini` missing `~@wip`	✅ Fixed — `@tdd_expected_fail` inversion replaces `@wip` exclusion; scenarios pass CI via inversion
PROC-2	Major	`noxfile.py` missing `--exclude wip` for Robot	✅ Fixed — `--exclude wip` already present; Robot listener handles `tdd_expected_fail` inversion
PROC-3	Major	`fix(cli):` on test-only PR	✅ Fixed — commit uses `test(cli):` prefix
TEST-1	Medium	Robot `Sessions` case mismatch	✅ Already correct — uses lowercase `total:`
SEC-1	Medium	DI-created engine never disposed	✅ Fixed — `_cleanup_sce` calls `_reset_session_service()` then `reset_container()`
SEC-2	Medium	`db` vs `session_factory` type mismatch	⏭️ Acknowledged — for the fix author to handle
PROC-4	Medium	Branch name mismatch	⏭️ Acknowledged
PROC-5	Medium	`Type/Testing` not in taxonomy	✅ `Type/Testing` exists as a label and is appropriate
TEST-2	Medium	No error handling scenarios	⏭️ Out of scope for TDD PR
SEC-3	Low	Env var set before cleanup registered	✅ Fixed — cleanup registered early in `_setup_real_di_path`
BENCH-1	Low	`importlib.reload` doesn't cascade	⏭️ Pre-existing ASV limitation
PROC-6	Low	Benchmark filename mismatch	⏭️ Acknowledged

freemo — Review #2060

ID	Severity	Finding	Disposition
F1	Medium	`@tdd_expected_fail` tag missing	✅ Fixed — full infrastructure implemented (Behave hook + Robot listener + noxfile registration)
F2	Low	PR body claims `@unit` but not in diff	✅ Fixed — PR body updated to match actual tags
F3	Low	Coupling to private module internals	✅ Fixed — comment added explaining rationale for `_service` access
F4	Info	Benchmark clean	✅ Confirmed — no `# type: ignore`
F5	Info	Cleanup robustness	✅ Clean — well-ordered
F6	Info	`Refs:` vs `ISSUES CLOSED:`	✅ Correct — uses `Refs: #570`

All actionable findings resolved. Items marked ⏭️ are out of scope for this TDD PR or acknowledged process items that don't block merge.

## Disposition of all review findings — commit `9ef1e9c0` Rebased as a single commit on master (`83f2f3a0`). All quality gates pass: `nox -s lint` ✓, `nox -s typecheck` 0 errors ✓. ### Key structural changes in this revision 1. **`@tdd_expected_fail` infrastructure implemented** — Behave `after_scenario` hook and Robot listener both invert pass/fail for tagged scenarios. This replaces the `@wip` exclusion approach entirely. 2. **18 existing TDD scenarios migrated** from `@tdd @bugNNN` to `@tdd_bug @tdd_bug_NNN` across 5 feature files. 3. **PR #595's own scenarios** now tagged `@tdd_bug @tdd_bug_570 @tdd_expected_fail`. 4. **Buried imports moved to top of file** in step definitions. 5. **Private API access documented** with rationale comment in step definitions. ### CoreRasurae — Review #1980 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | F1 | HIGH | Rich Console not captured | ✅ Fixed — `--format plain` on all invocations | | F2 | HIGH | Benchmarks bypass DI path | ✅ Fixed — docstrings clarify service-layer scope | | F3 | MEDIUM | Missing error handling scenario | ⏭️ Out of scope for TDD PR — covers only bug reproduction | | F4 | MEDIUM | Branch name mismatch | ⏭️ Acknowledged — not changing to avoid PR disruption | | F5 | MEDIUM | Panel title spec mismatch | ✅ Fixed — `--format plain` avoids Rich path entirely | | F6 | LOW | Dead `_counter` | ✅ Fixed — removed | | F7 | LOW | Engine recreated per call | ✅ Fixed — shared in `setup()` | | F8 | LOW | Robot missing `[Tags]` | ✅ Fixed — `tdd_bug tdd_bug_570 tdd_expected_fail` | | F9 | LOW | Benchmark filename mismatch | ⏭️ Acknowledged | ### hurui200320 — Review #2008 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | F1 | HIGH | Merge commit | ✅ Fixed — single squashed commit, rebased | | F2 | HIGH | Multiple commits | ✅ Fixed — single commit `9ef1e9c0` | | F3 | HIGH | Branch name mismatch | ⏭️ Acknowledged | | F4 | HIGH | `ISSUES CLOSED` on test-only PR | ✅ Fixed — `Refs: #570` | | F5 | MEDIUM | Type label mismatch | ✅ `Type/Testing` appropriate for test-only PR | | F6 | MEDIUM | Scenario #4 misleading | ✅ Fixed — renamed | | F7 | MEDIUM | Missing exit code assertion | ✅ Fixed — added to Scenario #2 | | F8 | MEDIUM | Robot no `--format plain` | ✅ Fixed | | F9 | LOW | Benchmark docstring misleading | ✅ Fixed | | F10 | LOW | Cleanup doesn't restore `_service` | ✅ Fixed — restores `sce_original_service` | | F11 | LOW | `# type: ignore` in benchmark | ✅ Fixed — removed | ### hamza.khyari — Review #2034 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | PROC-1 | Major | `behave.ini` missing `~@wip` | ✅ Fixed — `@tdd_expected_fail` inversion replaces `@wip` exclusion; scenarios pass CI via inversion | | PROC-2 | Major | `noxfile.py` missing `--exclude wip` for Robot | ✅ Fixed — `--exclude wip` already present; Robot listener handles `tdd_expected_fail` inversion | | PROC-3 | Major | `fix(cli):` on test-only PR | ✅ Fixed — commit uses `test(cli):` prefix | | TEST-1 | Medium | Robot `Sessions` case mismatch | ✅ Already correct — uses lowercase `total:` | | SEC-1 | Medium | DI-created engine never disposed | ✅ Fixed — `_cleanup_sce` calls `_reset_session_service()` then `reset_container()` | | SEC-2 | Medium | `db` vs `session_factory` type mismatch | ⏭️ Acknowledged — for the fix author to handle | | PROC-4 | Medium | Branch name mismatch | ⏭️ Acknowledged | | PROC-5 | Medium | `Type/Testing` not in taxonomy | ✅ `Type/Testing` exists as a label and is appropriate | | TEST-2 | Medium | No error handling scenarios | ⏭️ Out of scope for TDD PR | | SEC-3 | Low | Env var set before cleanup registered | ✅ Fixed — cleanup registered early in `_setup_real_di_path` | | BENCH-1 | Low | `importlib.reload` doesn't cascade | ⏭️ Pre-existing ASV limitation | | PROC-6 | Low | Benchmark filename mismatch | ⏭️ Acknowledged | ### freemo — Review #2060 | ID | Severity | Finding | Disposition | |----|----------|---------|-------------| | F1 | Medium | `@tdd_expected_fail` tag missing | ✅ Fixed — full infrastructure implemented (Behave hook + Robot listener + noxfile registration) | | F2 | Low | PR body claims `@unit` but not in diff | ✅ Fixed — PR body updated to match actual tags | | F3 | Low | Coupling to private module internals | ✅ Fixed — comment added explaining rationale for `_service` access | | F4 | Info | Benchmark clean | ✅ Confirmed — no `# type: ignore` | | F5 | Info | Cleanup robustness | ✅ Clean — well-ordered | | F6 | Info | `Refs:` vs `ISSUES CLOSED:` | ✅ Correct — uses `Refs: #570` | **All actionable findings resolved. Items marked ⏭️ are out of scope for this TDD PR or acknowledged process items that don't block merge.**

brent.edwards referenced this pull request

2026-03-09 22:59:16 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

brent.edwards referenced this pull request

2026-03-09 23:00:12 +00:00

test(cli): add failing tests for session list DI container error (#554) #596

brent.edwards force-pushed feature/m3-fix-session-create-error from 9ef1e9c000 to 37cd34c18e

2026-03-09 23:28:15 +00:00

Compare

brent.edwards reviewed 2026-03-09 23:30:07 +00:00

brent.edwards left a comment

Self-Review — PR #595 `test(cli): add failing tests for session create DI container error`

Reviewed commit: 37cd34c1 on base 83f2f3a0
Review type: Self-review (test + architecture + infrastructure)
Verdict: APPROVE with comments — 0 P0, 0 P1 (fixed), 4 P2, 1 P3

P1 findings — Fixed in this push

Two P1 issues were identified during self-review and fixed in commit 37cd34c1:

F1 (was P1, now fixed) — slow_integration_tests missing --listener
noxfile.py — The slow_integration_tests nox session ran robot over robot/ without --listener robot/tdd_expected_fail_listener.py. The integration_tests session had it, but slow_integration_tests did not. @tdd_expected_fail tests would fail unexpectedly in that session.
Resolution: Added --listener robot/tdd_expected_fail_listener.py to the slow_integration_tests session.

F2 (was P1, now fixed) — "Unexpected pass" branch sets no error message
features/environment.py:322 — When a @tdd_expected_fail scenario unexpectedly passes, the Behave hook called scenario.set_status(Status.failed) with no diagnostic text. Developer would see FAILED with all steps PASSED and zero explanation. The Robot listener correctly sets result.message but the Behave hook didn't.
Resolution: Added scenario.error_message with the same diagnostic message pattern used by the Robot listener.

P2 — Should fix (follow-up within 3 days)

F3 · P2:should-fix — CHANGELOG entry says "infrastructure" but could be more specific

The CHANGELOG entry is somewhat vague about what @tdd_expected_fail infrastructure entails. Consider expanding to mention both the Behave hook and Robot listener specifically.

F4 · P2:should-fix — importlib.reload() in step setup is fragile
features/steps/session_create_error_steps.py

Using importlib.reload() to reset module state is a known fragile pattern. It works for the current use case (forcing _service = None to exercise the real DI path), but if the module grows side effects on import, it may not fully reset state. Documenting the rationale in a comment would help future maintainers.

F5 · P2:should-fix — Robot test #2 missing stderr check for AttributeError
robot/session_create_error.robot

The second Robot test case verifies exit code but doesn't assert on the expected error message in stderr. A different failure mode could produce the same exit code without the specific AttributeError.

F6 · P2:should-fix — Benchmark track_create_persists always returns session count regardless of DI bug
benchmarks/session_create_error_bench.py

The benchmark constructs PersistentSessionService directly, bypassing the DI container. The tracker always returns the session count regardless of bug #570. The class-level docstring correctly notes this, but the method-level docstring could be clearer.

P3 — Nits

F7 · P3:nit — Step definitions access private _service attribute
features/steps/session_create_error_steps.py

Steps access session_mod._service and call session_mod._reset_session_service(). This is defensible for DI integration testing, but a brief comment explaining why would improve maintainability.

Quality Gates

Gate	Status
`nox -s lint`	PASS (0 errors)
`nox -s typecheck`	PASS (0 errors)
Single squashed commit	PASS
Rebased on master	PASS (`83f2f3a0`)
Conventional commit format	PASS
`Refs: #570` (not ISSUES CLOSED)	PASS

Summary

This TDD test PR is solid after 5 review rounds from CoreRasurae, hurui200320, hamza.khyari, and freemo. The latest push fixes the two P1 infrastructure gaps (missing --listener in slow_integration_tests and missing diagnostic message in the Behave unexpected-pass branch). The remaining P2/P3 findings are minor and can be addressed in follow-up. Ready for merge.

## Self-Review — PR #595 `test(cli): add failing tests for session create DI container error` **Reviewed commit:** `37cd34c1` on base `83f2f3a0` **Review type:** Self-review (test + architecture + infrastructure) **Verdict:** APPROVE with comments — 0 P0, 0 P1 (fixed), 4 P2, 1 P3 --- ### P1 findings — Fixed in this push Two P1 issues were identified during self-review and fixed in commit `37cd34c1`: **F1 (was P1, now fixed) — `slow_integration_tests` missing `--listener`** `noxfile.py` — The `slow_integration_tests` nox session ran `robot` over `robot/` without `--listener robot/tdd_expected_fail_listener.py`. The `integration_tests` session had it, but `slow_integration_tests` did not. `@tdd_expected_fail` tests would fail unexpectedly in that session. **Resolution:** Added `--listener robot/tdd_expected_fail_listener.py` to the `slow_integration_tests` session. **F2 (was P1, now fixed) — "Unexpected pass" branch sets no error message** `features/environment.py:322` — When a `@tdd_expected_fail` scenario unexpectedly passes, the Behave hook called `scenario.set_status(Status.failed)` with no diagnostic text. Developer would see `FAILED` with all steps `PASSED` and zero explanation. The Robot listener correctly sets `result.message` but the Behave hook didn't. **Resolution:** Added `scenario.error_message` with the same diagnostic message pattern used by the Robot listener. --- ### P2 — Should fix (follow-up within 3 days) **F3 · `P2:should-fix` — CHANGELOG entry says "infrastructure" but could be more specific** The CHANGELOG entry is somewhat vague about what `@tdd_expected_fail` infrastructure entails. Consider expanding to mention both the Behave hook and Robot listener specifically. **F4 · `P2:should-fix` — `importlib.reload()` in step setup is fragile** `features/steps/session_create_error_steps.py` Using `importlib.reload()` to reset module state is a known fragile pattern. It works for the current use case (forcing `_service = None` to exercise the real DI path), but if the module grows side effects on import, it may not fully reset state. Documenting the rationale in a comment would help future maintainers. **F5 · `P2:should-fix` — Robot test #2 missing stderr check for `AttributeError`** `robot/session_create_error.robot` The second Robot test case verifies exit code but doesn't assert on the expected error message in stderr. A different failure mode could produce the same exit code without the specific `AttributeError`. **F6 · `P2:should-fix` — Benchmark `track_create_persists` always returns session count regardless of DI bug** `benchmarks/session_create_error_bench.py` The benchmark constructs `PersistentSessionService` directly, bypassing the DI container. The tracker always returns the session count regardless of bug #570. The class-level docstring correctly notes this, but the method-level docstring could be clearer. --- ### P3 — Nits **F7 · `P3:nit` — Step definitions access private `_service` attribute** `features/steps/session_create_error_steps.py` Steps access `session_mod._service` and call `session_mod._reset_session_service()`. This is defensible for DI integration testing, but a brief comment explaining why would improve maintainability. --- ### Quality Gates | Gate | Status | |------|--------| | `nox -s lint` | PASS (0 errors) | | `nox -s typecheck` | PASS (0 errors) | | Single squashed commit | PASS | | Rebased on master | PASS (`83f2f3a0`) | | Conventional commit format | PASS | | `Refs: #570` (not ISSUES CLOSED) | PASS | ### Summary This TDD test PR is solid after 5 review rounds from CoreRasurae, hurui200320, hamza.khyari, and freemo. The latest push fixes the two P1 infrastructure gaps (missing `--listener` in `slow_integration_tests` and missing diagnostic message in the Behave unexpected-pass branch). The remaining P2/P3 findings are minor and can be addressed in follow-up. Ready for merge.

brent.edwards referenced this pull request

2026-03-10 00:28:40 +00:00

bug(test): M1–M6 E2E verification suites use mocks instead of exercising the real system #658

brent.edwards force-pushed feature/m3-fix-session-create-error from 37cd34c18e to 0f629b2613

2026-03-10 01:25:40 +00:00

Compare

brent.edwards commented

2026-03-10 01:25:59 +00:00

Disposition — Self-Review #2072 Findings

Addressed all P2/P3 findings from self-review in commit 0f629b26. Rebased onto latest master (fcdf80f3).

F1 (was P1, fixed previously) — `slow_integration_tests` missing `--listener`

Already fixed in prior push. Confirmed still present in 0f629b26.

F2 (was P1, fixed previously) — "Unexpected pass" branch sets no error message

Already fixed in prior push. Confirmed still present in 0f629b26.

F3 (P2) — CHANGELOG entry vague about infrastructure

Fixed. Expanded CHANGELOG to explicitly describe "a Behave after_scenario hook in features/environment.py that flips pass/fail for @tdd_expected_fail scenarios, and a Robot Framework Listener API v3 plugin (robot/tdd_expected_fail_listener.py) with identical semantics."

F4 (P2) — `importlib.reload()` / module reset is fragile

Fixed. Added inline comment at the session_mod._service = None line explaining the fragility and pointing to the module docstring for full rationale. The step file's module docstring (lines 20-28) already documents why private _service access is necessary.

F5 (P2) — Robot test #2 missing stderr check for `AttributeError`

Fixed. Added Should Not Contain ${create.stderr} AttributeError assertion to the "Session Create Then List Shows Created Session" test case, matching the pattern in test case #1.

F6 (P2) — Benchmark `track_create_persists` docstring unclear

Fixed. Expanded method docstring to explicitly state: "This benchmark constructs PersistentSessionService directly, bypassing the DI container (_get_session_service / container.db()). It therefore does not reproduce bug #570 — its purpose is to establish a service-layer persistence baseline."

F7 (P3) — Private `_service` access

Acknowledged. Already documented in module docstring (lines 20-28) with a dedicated "Private API access" section explaining the rationale and maintenance implications. No further action needed.

Quality gates

nox -s typecheck — 0 errors ✓
nox -s lint — 0 errors ✓
Single squashed commit ✓
Rebased on latest master (fcdf80f3) ✓

## Disposition — Self-Review #2072 Findings Addressed all P2/P3 findings from self-review in commit `0f629b26`. Rebased onto latest `master` (`fcdf80f3`). ### F1 (was P1, fixed previously) — `slow_integration_tests` missing `--listener` Already fixed in prior push. Confirmed still present in `0f629b26`. ### F2 (was P1, fixed previously) — "Unexpected pass" branch sets no error message Already fixed in prior push. Confirmed still present in `0f629b26`. ### F3 (P2) — CHANGELOG entry vague about infrastructure **Fixed.** Expanded CHANGELOG to explicitly describe "a Behave `after_scenario` hook in `features/environment.py` that flips pass/fail for `@tdd_expected_fail` scenarios, and a Robot Framework Listener API v3 plugin (`robot/tdd_expected_fail_listener.py`) with identical semantics." ### F4 (P2) — `importlib.reload()` / module reset is fragile **Fixed.** Added inline comment at the `session_mod._service = None` line explaining the fragility and pointing to the module docstring for full rationale. The step file's module docstring (lines 20-28) already documents why private `_service` access is necessary. ### F5 (P2) — Robot test #2 missing stderr check for `AttributeError` **Fixed.** Added `Should Not Contain ${create.stderr} AttributeError` assertion to the "Session Create Then List Shows Created Session" test case, matching the pattern in test case #1. ### F6 (P2) — Benchmark `track_create_persists` docstring unclear **Fixed.** Expanded method docstring to explicitly state: "This benchmark constructs `PersistentSessionService` directly, bypassing the DI container (`_get_session_service` / `container.db()`). It therefore does **not** reproduce bug #570 — its purpose is to establish a service-layer persistence baseline." ### F7 (P3) — Private `_service` access **Acknowledged.** Already documented in module docstring (lines 20-28) with a dedicated "Private API access" section explaining the rationale and maintenance implications. No further action needed. ### Quality gates - `nox -s typecheck` — 0 errors ✓ - `nox -s lint` — 0 errors ✓ - Single squashed commit ✓ - Rebased on latest master (`fcdf80f3`) ✓

hurui200320 approved these changes 2026-03-10 08:55:11 +00:00

Dismissed

hurui200320 left a comment

Code Review — `test(cli): add failing tests for session create DI container error (#570)`

Commit: 0f629b26 | Branch: feature/m3-fix-session-create-error | Base: master
Reviewer: Rui Hu — follow-up to my previous REQUEST_CHANGES review #2008

Prior Review Resolution (my review #2008)

ID	Finding	Severity	Status
F1	Merge commit on branch	HIGH	RESOLVED — single squashed commit
F2	Multiple commits for same issue	HIGH	RESOLVED — single commit
F3	Branch name mismatch with issue metadata	HIGH	Acknowledged — see N1 below
F4	PR claims ISSUES CLOSED but doesn't fix bug	HIGH	RESOLVED — uses `Refs: #570`
F5	Type label mismatch	MEDIUM	RESOLVED — `Type/Testing` appropriate for test-only PR
F6	Scenario #4 misleading name	MEDIUM	RESOLVED — renamed to "arbitrary actor name succeeds"
F7	Missing exit code assertion Scenario #2	MEDIUM	RESOLVED — added (feature line 23)
F8	Robot tests no `--format plain`	MEDIUM	RESOLVED — both Robot tests use `--format plain`
F9	Benchmark tracker docstring misleading	LOW	RESOLVED — docstring rewritten
F10	Cleanup doesn't restore `_service`	LOW	RESOLVED — stores/restores via `context.sce_original_service`
F11	`# type: ignore` in benchmark	LOW	RESOLVED — removed

10 of 11 findings resolved. The only remaining item (F3, branch name) was acknowledged and waived by PM (freemo, review #2042).

Verification of Other Reviewers' Findings

CoreRasurae #1980: 9/9 addressed (F1–F2 HIGH fixed, F3/F4/F5/F9 acknowledged, F6–F8 fixed)
hamza.khyari #2034: 12/12 addressed (PROC-1/2/3 major fixed via @tdd_expected_fail infrastructure, TEST-1 fixed, SEC-1/SEC-3 fixed, remaining acknowledged)
freemo #2060: 6/6 addressed (F1 fixed via @tdd_expected_fail tags, F2–F6 fixed/confirmed clean)

New Findings (This Review)

N1 — Branch name mismatch (MEDIUM — Process, carried forward)

Issue #570 prescribes fix/m3-session-create-error. Actual branch: feature/m3-fix-session-create-error. CONTRIBUTING.md § TDD Bug Test Tags specifies TDD branches should use tdd/ prefix.

Status: Acknowledged by 3 reviewers and the PM. PM explicitly said "not blocking." I accept this waiver and will not block on it.

N2 — `_get_session_service()` does not cache to `_service` (LOW — Note for fix author)

File: src/cleveragents/cli/commands/session.py:69

_get_session_service() returns a new PersistentSessionService(...) without assigning to _service. The singleton cache at line 53 (if _service is not None: return _service) is never populated. Currently moot because container.db() raises AttributeError before reaching line 69. The fix developer should add _service = PersistentSessionService(...) before the return statement.

Recommendation: Note for the bug-fix PR author — not a concern for this TDD PR.

N3 — PR description mentions `--listener` in `integration_tests` session only (LOW — Documentation)

The PR description says the listener is registered "in integration_tests session" but the diff shows it was added to both integration_tests (noxfile.py:579–580) and slow_integration_tests (noxfile.py:603–604). The self-review (#2072) notes this was a P1 fix.

Recommendation: Update PR description to mention both nox sessions.

Code Quality Assessment

Criterion	Status	Notes
Single squashed commit	PASS	`0f629b26`, rebased on `master`
Conventional commit format	PASS	`test(cli):` prefix, `Refs: #570` footer
Behave scenarios	PASS	4 scenarios, all with correct `@tdd_bug @tdd_bug_570 @tdd_expected_fail` tags
Robot Framework tests	PASS	2 test cases with matching tags, `--format plain`
ASV benchmarks	PASS	3 benchmarks (2 time, 1 track), correct docstrings clarifying service-layer scope
`@tdd_expected_fail` infra	PASS	Behave hook (`environment.py:308–327`) + Robot listener (`tdd_expected_fail_listener.py`)
Noxfile listener wiring	PASS	Both `integration_tests` and `slow_integration_tests`
Tag migration (18 scenarios)	PASS	All `@tdd @bugNNN` → `@tdd_bug @tdd_bug_NNN`, zero remnants in codebase
Test isolation / cleanup	PASS	Early cleanup registration (SEC-3), original `_service` restored, container reset, env vars restored, tmpdir removed
CHANGELOG entry	PASS	Placed in `## Unreleased`, references #570
No `# type: ignore`	PASS	None present in any PR file
No bare exceptions	PASS	All error handling is specific
Plain format assertions	PASS	`session_id:` and `total:` match `_session_summary_dict` / `_session_list_dict` output keys

Verdict: APPROVE

All 10 previously blocking findings from my review #2008 are resolved. The @tdd_expected_fail inversion infrastructure is well-implemented and correctly addresses the CI-blocking concerns raised by hamza.khyari (PROC-1/PROC-2). The tag migration is complete with zero remnants. Test design is sound — Behave scenarios exercise the real DI path via _service = None reset, and Robot tests exercise the full init → session create subprocess flow.

The two new findings (N2, N3) are LOW severity and informational. N1 (branch name) was waived by PM.

This PR is ready to serve as the TDD regression harness for the #570 bug fix.

# Code Review — `test(cli): add failing tests for session create DI container error (#570)` **Commit:** `0f629b26` | **Branch:** `feature/m3-fix-session-create-error` | **Base:** `master` **Reviewer:** Rui Hu — follow-up to my previous REQUEST_CHANGES review #2008 --- ## Prior Review Resolution (my review #2008) | ID | Finding | Severity | Status | |----|---------|----------|--------| | F1 | Merge commit on branch | HIGH | **RESOLVED** — single squashed commit | | F2 | Multiple commits for same issue | HIGH | **RESOLVED** — single commit | | F3 | Branch name mismatch with issue metadata | HIGH | **Acknowledged** — see N1 below | | F4 | PR claims ISSUES CLOSED but doesn't fix bug | HIGH | **RESOLVED** — uses `Refs: #570` | | F5 | Type label mismatch | MEDIUM | **RESOLVED** — `Type/Testing` appropriate for test-only PR | | F6 | Scenario #4 misleading name | MEDIUM | **RESOLVED** — renamed to "arbitrary actor name succeeds" | | F7 | Missing exit code assertion Scenario #2 | MEDIUM | **RESOLVED** — added (feature line 23) | | F8 | Robot tests no `--format plain` | MEDIUM | **RESOLVED** — both Robot tests use `--format plain` | | F9 | Benchmark tracker docstring misleading | LOW | **RESOLVED** — docstring rewritten | | F10 | Cleanup doesn't restore `_service` | LOW | **RESOLVED** — stores/restores via `context.sce_original_service` | | F11 | `# type: ignore` in benchmark | LOW | **RESOLVED** — removed | **10 of 11 findings resolved.** The only remaining item (F3, branch name) was acknowledged and waived by PM (freemo, review #2042). --- ## Verification of Other Reviewers' Findings **CoreRasurae #1980:** 9/9 addressed (F1–F2 HIGH fixed, F3/F4/F5/F9 acknowledged, F6–F8 fixed) **hamza.khyari #2034:** 12/12 addressed (PROC-1/2/3 major fixed via `@tdd_expected_fail` infrastructure, TEST-1 fixed, SEC-1/SEC-3 fixed, remaining acknowledged) **freemo #2060:** 6/6 addressed (F1 fixed via `@tdd_expected_fail` tags, F2–F6 fixed/confirmed clean) --- ## New Findings (This Review) ### N1 — Branch name mismatch (MEDIUM — Process, carried forward) Issue #570 prescribes `fix/m3-session-create-error`. Actual branch: `feature/m3-fix-session-create-error`. CONTRIBUTING.md § TDD Bug Test Tags specifies TDD branches should use `tdd/` prefix. **Status:** Acknowledged by 3 reviewers and the PM. PM explicitly said "not blocking." I accept this waiver and will not block on it. --- ### N2 — `_get_session_service()` does not cache to `_service` (LOW — Note for fix author) **File:** `src/cleveragents/cli/commands/session.py:69` `_get_session_service()` returns a new `PersistentSessionService(...)` without assigning to `_service`. The singleton cache at line 53 (`if _service is not None: return _service`) is never populated. Currently moot because `container.db()` raises `AttributeError` before reaching line 69. The fix developer should add `_service = PersistentSessionService(...)` before the return statement. **Recommendation:** Note for the bug-fix PR author — not a concern for this TDD PR. --- ### N3 — PR description mentions `--listener` in `integration_tests` session only (LOW — Documentation) The PR description says the listener is registered "in `integration_tests` session" but the diff shows it was added to **both** `integration_tests` (noxfile.py:579–580) and `slow_integration_tests` (noxfile.py:603–604). The self-review (#2072) notes this was a P1 fix. **Recommendation:** Update PR description to mention both nox sessions. --- ## Code Quality Assessment | Criterion | Status | Notes | |-----------|--------|-------| | Single squashed commit | PASS | `0f629b26`, rebased on `master` | | Conventional commit format | PASS | `test(cli):` prefix, `Refs: #570` footer | | Behave scenarios | PASS | 4 scenarios, all with correct `@tdd_bug @tdd_bug_570 @tdd_expected_fail` tags | | Robot Framework tests | PASS | 2 test cases with matching tags, `--format plain` | | ASV benchmarks | PASS | 3 benchmarks (2 time, 1 track), correct docstrings clarifying service-layer scope | | `@tdd_expected_fail` infra | PASS | Behave hook (`environment.py:308–327`) + Robot listener (`tdd_expected_fail_listener.py`) | | Noxfile listener wiring | PASS | Both `integration_tests` and `slow_integration_tests` | | Tag migration (18 scenarios) | PASS | All `@tdd @bugNNN` → `@tdd_bug @tdd_bug_NNN`, zero remnants in codebase | | Test isolation / cleanup | PASS | Early cleanup registration (SEC-3), original `_service` restored, container reset, env vars restored, tmpdir removed | | CHANGELOG entry | PASS | Placed in `## Unreleased`, references #570 | | No `# type: ignore` | PASS | None present in any PR file | | No bare exceptions | PASS | All error handling is specific | | Plain format assertions | PASS | `session_id:` and `total:` match `_session_summary_dict` / `_session_list_dict` output keys | --- ## Verdict: APPROVE All 10 previously blocking findings from my review #2008 are resolved. The `@tdd_expected_fail` inversion infrastructure is well-implemented and correctly addresses the CI-blocking concerns raised by hamza.khyari (PROC-1/PROC-2). The tag migration is complete with zero remnants. Test design is sound — Behave scenarios exercise the real DI path via `_service = None` reset, and Robot tests exercise the full `init` → `session create` subprocess flow. The two new findings (N2, N3) are LOW severity and informational. N1 (branch name) was waived by PM. This PR is ready to serve as the TDD regression harness for the #570 bug fix.

brent.edwards force-pushed feature/m3-fix-session-create-error from 0f629b2613 to c9037a00b5

2026-03-10 19:05:32 +00:00

Compare

brent.edwards dismissed hurui200320's review 2026-03-10 19:05:32 +00:00

Reason:

New commits pushed, approval review dismissed automatically according to repository settings

brent.edwards commented

2026-03-10 20:32:13 +00:00

CI Fix: False-positive exit code 1 from behave-parallel runner

Commit: 26b7a0c2

Root Cause

The CI failure was not in repositories_coverage_r2.feature:272 (checkpoint prune test) — the garbled parallel output made it look that way. The actual problem was that the @tdd_expected_fail inversion in after_scenario changed scenario statuses but could not change behave's internal failure tracking.

Here's the mechanism:

session_create_error.feature has 4 scenarios tagged @tdd_expected_fail that are expected to fail (bug #570 is unfixed)
The steps fail as expected → behave's Scenario.run() sets its local failed = True variable (line 1202 in behave/model.py)
after_scenario hook fires and flips scenario.status from failed to passed ✅
But Scenario.run() returns the local failed boolean, which is still True — the hook can't modify a local variable in the caller's stack frame
Feature.run() increments failed_count, runner.run() returns True
The parallel runner's _worker_run_features propagated runner.run() → worker_failed=True → overall failed=True → sys.exit(1)

Meanwhile, _extract_summary() reads scenario.status.name (the post-hook value), so the summary correctly showed 0 failures — but the exit code was still 1.

Fix

Changed both _run_features_inprocess and _worker_run_features in the embedded _BEHAVE_PARALLEL_CLI_SOURCE to derive the failed flag from the post-hook summary (via _has_failures(summary)) instead of from runner.run(). This ensures the final, post-hook scenario status is authoritative.

Verification

nox -e unit_tests: 10,099 scenarios passed, 0 failed — exit code 0 ✅
nox -e typecheck: 0 errors ✅
nox -e coverage_report: 98% (threshold: 97%) ✅
All pre-commit hooks pass ✅

## CI Fix: False-positive exit code 1 from behave-parallel runner **Commit:** `26b7a0c2` ### Root Cause The CI failure was **not** in `repositories_coverage_r2.feature:272` (checkpoint prune test) — the garbled parallel output made it look that way. The actual problem was that the `@tdd_expected_fail` inversion in `after_scenario` changed scenario statuses but could not change behave's internal failure tracking. Here's the mechanism: 1. `session_create_error.feature` has 4 scenarios tagged `@tdd_expected_fail` that are expected to fail (bug #570 is unfixed) 2. The steps fail as expected → behave's `Scenario.run()` sets its **local** `failed = True` variable (line 1202 in `behave/model.py`) 3. `after_scenario` hook fires and flips `scenario.status` from `failed` to `passed` ✅ 4. But `Scenario.run()` returns the **local** `failed` boolean, which is still `True` — the hook can't modify a local variable in the caller's stack frame 5. `Feature.run()` increments `failed_count`, `runner.run()` returns `True` 6. The parallel runner's `_worker_run_features` propagated `runner.run()` → `worker_failed=True` → overall `failed=True` → `sys.exit(1)` Meanwhile, `_extract_summary()` reads `scenario.status.name` (the post-hook value), so the summary correctly showed **0 failures** — but the exit code was still 1. ### Fix Changed both `_run_features_inprocess` and `_worker_run_features` in the embedded `_BEHAVE_PARALLEL_CLI_SOURCE` to derive the `failed` flag from the post-hook summary (via `_has_failures(summary)`) instead of from `runner.run()`. This ensures the final, post-hook scenario status is authoritative. ### Verification - `nox -e unit_tests`: **10,099 scenarios passed, 0 failed** — exit code 0 ✅ - `nox -e typecheck`: 0 errors ✅ - `nox -e coverage_report`: 98% (threshold: 97%) ✅ - All pre-commit hooks pass ✅

brent.edwards force-pushed feature/m3-fix-session-create-error from 26b7a0c201 to d0689573e0

2026-03-10 21:41:32 +00:00

Compare

brent.edwards added 1 commit 2026-03-10 22:27:55 +00:00

fix(test): use scoped_session to prevent GC-induced data loss in r2cov

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / build (pull_request) Successful in 16s

Details

CI / lint (pull_request) Successful in 17s

Details

CI / quality (pull_request) Successful in 18s

Details

CI / security (pull_request) Successful in 35s

Details

CI / typecheck (pull_request) Successful in 38s

Details

CI / unit_tests (pull_request) Successful in 2m35s

Details

CI / docker (pull_request) Successful in 39s

Details

CI / integration_tests (pull_request) Successful in 3m22s

Details

CI / coverage (pull_request) Successful in 6m4s

Details

CI / benchmark-regression (pull_request) Has been cancelled

Details

4fff7f67c5

SQLite in-memory engines use SingletonThreadPool, giving every Session
the same underlying connection.  Repository methods create a new
Session via the factory, flush, then return — letting the Session go
out of scope.  Under high memory pressure (e.g. 32 parallel behave
workers) Python's garbage collector closes these orphaned Sessions,
issuing an implicit ROLLBACK on the shared connection and wiping
flushed-but-uncommitted rows written by other Sessions.

Replace the plain sessionmaker with scoped_session in the r2cov
Background step so that every factory() call returns the same Session
instance.  A single long-lived Session per scenario eliminates the
premature close/rollback window entirely.

Verified: 3 consecutive green runs with --processes 32 (10 099
scenarios, 0 failures each).

Refs: #570

brent.edwards commented

2026-03-10 22:28:14 +00:00

CI Fix: Intermittent checkpoint test failure under high parallelism

Commit: 4fff7f67

Root Cause

The repositories_coverage_r2.feature checkpoint scenarios (:249 list_by_plan, :272 prune) fail intermittently when the behave-parallel runner uses ≥32 worker processes but pass reliably with ≤16.

Mechanism — GC-induced implicit ROLLBACK on a shared SQLite connection:

SQLite in-memory engines use SingletonThreadPool — one connection per thread, shared by all Sessions
Each repository method (create, prune, etc.) calls self._session_factory() to get a new Session, does work, calls flush(), then returns — the Session goes out of scope
Under high memory pressure (32 forked worker processes), Python's garbage collector runs more aggressively between step calls
When GC collects an orphaned Session, SQLAlchemy closes it, issuing an implicit ROLLBACK on the shared connection
This wipes flushed-but-uncommitted rows written by other Sessions on the same connection
Result: prune() queries the DB and finds 0 rows instead of 5 → returns [] → assertion fails

Verified with instrumented reproduction:

# Plain sessionmaker + forced GC between inserts:
# After flush+GC 0: 0 rows visible  ← data lost!
# After flush+GC 1: 0 rows visible  ← data lost!

# scoped_session + forced GC between inserts:  
# After flush+GC 0: 1 rows visible  ✓
# After flush+GC 1: 2 rows visible  ✓

Fix

Replace sessionmaker(bind=engine) with scoped_session(sessionmaker(bind=engine)) in the step_fresh_db Background step (features/steps/repositories_coverage_r2_steps.py:237). scoped_session ensures every factory() call within the same thread returns the same Session instance, so no Session goes out of scope prematurely and no implicit ROLLBACK occurs.

Verification

behave-parallel --processes 32: 3 consecutive green runs (10,099 scenarios, 0 failures each)
nox -e typecheck: 0 errors ✅
nox -e coverage_report: 98% (threshold: 97%) ✅
All pre-commit hooks pass ✅

## CI Fix: Intermittent checkpoint test failure under high parallelism **Commit:** `4fff7f67` ### Root Cause The `repositories_coverage_r2.feature` checkpoint scenarios (`:249` list_by_plan, `:272` prune) fail intermittently when the behave-parallel runner uses ≥32 worker processes but pass reliably with ≤16. **Mechanism — GC-induced implicit ROLLBACK on a shared SQLite connection:** 1. SQLite in-memory engines use `SingletonThreadPool` — one connection per thread, shared by all Sessions 2. Each repository method (`create`, `prune`, etc.) calls `self._session_factory()` to get a new `Session`, does work, calls `flush()`, then returns — the Session goes out of scope 3. Under high memory pressure (32 forked worker processes), Python's garbage collector runs more aggressively between step calls 4. When GC collects an orphaned Session, SQLAlchemy closes it, issuing an implicit ROLLBACK on the **shared** connection 5. This wipes flushed-but-uncommitted rows written by **other** Sessions on the same connection 6. Result: `prune()` queries the DB and finds 0 rows instead of 5 → returns `[]` → assertion fails Verified with instrumented reproduction: ```python # Plain sessionmaker + forced GC between inserts: # After flush+GC 0: 0 rows visible ← data lost! # After flush+GC 1: 0 rows visible ← data lost! # scoped_session + forced GC between inserts: # After flush+GC 0: 1 rows visible ✓ # After flush+GC 1: 2 rows visible ✓ ``` ### Fix Replace `sessionmaker(bind=engine)` with `scoped_session(sessionmaker(bind=engine))` in the `step_fresh_db` Background step (`features/steps/repositories_coverage_r2_steps.py:237`). `scoped_session` ensures every `factory()` call within the same thread returns the **same** Session instance, so no Session goes out of scope prematurely and no implicit ROLLBACK occurs. ### Verification - `behave-parallel --processes 32`: **3 consecutive green runs** (10,099 scenarios, 0 failures each) - `nox -e typecheck`: 0 errors ✅ - `nox -e coverage_report`: 98% (threshold: 97%) ✅ - All pre-commit hooks pass ✅

brent.edwards added 1 commit 2026-03-10 23:02:07 +00:00

Merge remote-tracking branch 'origin/master' into feature/m3-fix-session-create-error

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / lint (pull_request) Successful in 16s

Details

CI / build (pull_request) Successful in 18s

Details

CI / quality (pull_request) Successful in 21s

Details

CI / typecheck (pull_request) Successful in 39s

Details

CI / security (pull_request) Successful in 39s

Details

CI / unit_tests (pull_request) Successful in 4m22s

Details

CI / integration_tests (pull_request) Successful in 5m2s

Details

CI / docker (pull_request) Successful in 40s

Details

CI / coverage (pull_request) Successful in 5m9s

Details

CI / benchmark-regression (pull_request) Successful in 34m47s

Details

142895e34d

# Conflicts:
#	CHANGELOG.md

brent.edwards merged commit 832b8a8b5a into master

2026-03-10 23:08:21 +00:00

brent.edwards referenced this issue from a commit

2026-03-10 23:08:21 +00:00

Merge pull request 'test(cli): add failing tests for session create DI container error (#570)' (#595) from feature/m3-fix-session-create-error into master

brent.edwards deleted branch feature/m3-fix-session-create-error

2026-03-10 23:08:21 +00:00

freemo referenced this pull request

2026-03-11 05:28:55 +00:00

`agents session create` causes an error. #570

freemo referenced this pull request

2026-03-11 20:15:47 +00:00

`agents session list` causes an error. #554

freemo referenced this pull request

2026-03-11 20:15:51 +00:00

`agents session create` causes an error. #570

Sign in to join this conversation.

No Reviewers

CoreRasurae

hamza.khyari

freemo

brent.edwards

6 Participants

Notifications

Due Date

No due date set.

Blocks

#570 `agents session create` causes an error.

cleveragents/cleveragents-core

Reference: cleveragents/cleveragents-core#595

				`@@ -0,0 +1,93 @@`
				`"""ASV benchmarks for session create DI wiring (bug #570).`

test(cli): add failing tests for session create DI container error (#570) #595

Summary

Changes

Review feedback addressed

From CoreRasurae (Review #1980)

From hurui200320 (Review #2008)

From hamza.khyari (Review #2034)

From freemo (Review #2060)

Process

PM Review — Day 25

Status

Context

Action Item

Priority

Code Review -- test(cli): add failing tests for session create DI container error

Summary

Findings

F1 -- Rich Console Output Will Not Be Captured by CliRunner [HIGH -- Test Flaw]

F2 -- Benchmarks Bypass the Actual Bug Path [HIGH -- Test Flaw]

F3 -- Missing Error Handling Scenario [MEDIUM -- Test Coverage]

F4 -- Branch Name Mismatch With Issue Metadata [MEDIUM -- Process]

F5 -- Specification Alignment: Panel Title and Output String [MEDIUM -- Spec Compliance]

F6 -- Dead Code in Benchmark [LOW -- Code Quality]

F7 -- Benchmark Recreates Engine and Schema Per Call [LOW -- Performance]

F8 -- Robot Tests Missing [Tags] [LOW -- Test Organization]

F9 -- Benchmark Filename Mismatch With Issue Subtask [LOW -- Process]

Severity Summary

Disposition of findings (review #1980)

Fixed

Acknowledged (no code change)

Verification

Code Review -- test(cli): add failing tests for session create DI container error

Summary

Findings

F1 -- Test Scenario #4 Assumes Non-Existent Actor Validation [HIGH -- Test Logic Error]

F2 -- Missing Exit Code Assertion in Scenario #2 [MEDIUM -- Test Rigor]

F3 -- Robot Tests Do Not Use --format plain [MEDIUM -- Test Consistency]

F4 -- Benchmark track_create_persists Docstring Misleading [LOW -- Documentation]

F5 -- Scenario #3 Does Not Assert on session_id: [LOW -- Test Coverage]

F6 -- Cleanup Does Not Restore Original _service State [LOW -- Test Isolation]

Severity Summary

Code Review — test(cli): add failing tests for session create DI container error

Summary

Findings

F1 — Merge commit on branch [HIGH — Process Violation]

F2 — Multiple commits for the same issue [HIGH — Process Violation]

F3 — Branch name mismatch with issue metadata [HIGH — Process Violation]

F4 — PR claims ISSUES CLOSED: #570 but does not fix the bug [HIGH — Scope/Completeness]

F5 — PR Type label mismatch [MEDIUM — Process]

F6 — Scenario #4 is misleading ("fails gracefully" but asserts success) [MEDIUM — Test Logic]

F7 — Scenario #2 missing create exit code assertion [MEDIUM — Test Rigor]

F8 — Robot tests don't use --format plain [MEDIUM — Test Consistency]

F9 — Benchmark track_create_persists docstring still misleading [LOW — Documentation]

F10 — Cleanup does not restore original _service state [LOW — Test Isolation]

F11 — Unnecessary # type: ignore[attr-defined] in benchmark [LOW — Code Quality]

Severity Summary

Disposition of findings

Aditya's review (comment #55060)

hurui200320's review (#2008)

Additional

Verification

PR #595 Review — test(cli): add failing tests for session create DI container error (#570)

Prior Review Resolution

Findings

PROC-1 (Major) — behave.ini missing tags = ~@wip: @wip scenarios will fail in CI

PROC-2 (Major) — noxfile.py missing --exclude wip: Robot wip tests will fail in CI

PROC-3 (Major) — Commit message fix(cli): on test-only PR

TEST-1 (Medium) — Robot Should Contain case mismatch: will fail after fix

SEC-1 (Medium) — DI-created engine never disposed after fix lands

SEC-2 (Medium) — _get_session_service() passes db where session_factory callable expected

PROC-4 (Medium) — Branch name mismatch (still open)

PROC-5 (Medium) — Type/Testing label not in CONTRIBUTING.md taxonomy

TEST-2 (Medium) — No error handling scenarios

SEC-3 (Low) — Env var set before cleanup registered

BENCH-1 (Low) — importlib.reload partial: submodules not reloaded

PROC-6 (Low) — Benchmark filename mismatch with issue subtask

Summary

Rebase + Review Fixes — 1c9fe9e4

Disposition of hamza.khyari review #2034

Fixed

Code Review -- `test(cli): add failing tests for session create DI container error`

F8 -- Robot Tests Missing `[Tags]` [LOW -- Test Organization]

Code Review -- `test(cli): add failing tests for session create DI container error`

F3 -- Robot Tests Do Not Use `--format plain` [MEDIUM -- Test Consistency]

F4 -- Benchmark `track_create_persists` Docstring Misleading [LOW -- Documentation]

F5 -- Scenario #3 Does Not Assert on `session_id:` [LOW -- Test Coverage]

F6 -- Cleanup Does Not Restore Original `_service` State [LOW -- Test Isolation]

Code Review — `test(cli): add failing tests for session create DI container error`

F4 — PR claims `ISSUES CLOSED: #570` but does not fix the bug [HIGH — Scope/Completeness]

F8 — Robot tests don't use `--format plain` [MEDIUM — Test Consistency]

F9 — Benchmark `track_create_persists` docstring still misleading [LOW — Documentation]

F10 — Cleanup does not restore original `_service` state [LOW — Test Isolation]

F11 — Unnecessary `# type: ignore[attr-defined]` in benchmark [LOW — Code Quality]

PR #595 Review — `test(cli): add failing tests for session create DI container error (#570)`

PROC-1 (Major) — `behave.ini` missing `tags = ~@wip`: `@wip` scenarios will fail in CI

PROC-2 (Major) — `noxfile.py` missing `--exclude wip`: Robot `wip` tests will fail in CI

PROC-3 (Major) — Commit message `fix(cli):` on test-only PR

TEST-1 (Medium) — Robot `Should Contain` case mismatch: will fail after fix

SEC-2 (Medium) — `_get_session_service()` passes `db` where `session_factory` callable expected

PROC-5 (Medium) — `Type/Testing` label not in CONTRIBUTING.md taxonomy

BENCH-1 (Low) — `importlib.reload` partial: submodules not reloaded

Rebase + Review Fixes — `1c9fe9e4`

Code Review — PR #595: `test(cli): add failing tests for session create DI container error`

F1 — `@unit` Tag Misclassifies Integration-Level Tests [HIGH]

F2 — Robot Tests Do Not Assert on `agents init` Exit Code [MEDIUM]

F3 — Robot `list` Assertion Is Too Weak to Verify Persistence [MEDIUM]

F5 — `_cleanup_sce` Comment Implies Engine Disposal That Does Not Occur [LOW]

F6 — Benchmark `timeout` Is Float Instead of Int [LOW]

F7 — `list_sessions` Empty-Path Bug Not Covered by Any Test [LOW]

F8 — Robot Tests Use `Evaluate import('tempfile').mkdtemp()` Anti-Pattern [LOW]

Disposition of Aditya review (comment #56627) — commit `0248d222`

Code Review — PR #595 `feature/m3-fix-session-create-error`

Disposition of all review findings — commit `9ef1e9c0`

Self-Review — PR #595 `test(cli): add failing tests for session create DI container error`

F1 (was P1, fixed previously) — `slow_integration_tests` missing `--listener`

F4 (P2) — `importlib.reload()` / module reset is fragile

F5 (P2) — Robot test #2 missing stderr check for `AttributeError`

F6 (P2) — Benchmark `track_create_persists` docstring unclear

F7 (P3) — Private `_service` access

Code Review — `test(cli): add failing tests for session create DI container error (#570)`

N2 — `_get_session_service()` does not cache to `_service` (LOW — Note for fix author)

N3 — PR description mentions `--listener` in `integration_tests` session only (LOW — Documentation)