feature/m6-estimation-actor-yaml #975

Merged
aditya merged 3 commits from feature/m6-estimation-actor-yaml into master 2026-03-18 07:23:14 +00:00
Member

Description

Adds estimation-actor support and role-aware actor validation for issue #650.
This change introduces actor schema support for role_hint and
response_format, adds non-fatal compatibility warnings during actor
registration and plan preflight checks, adds a new
examples/actors/estimator.yaml, and updates docs/tests accordingly.
It also adjusts E2E helper expectations in local environments where provider
keys are not configured, while still failing on real crashes.

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Refactoring (no functional changes)
  • Documentation update
  • Test improvements
  • CI/CD changes

Quality Checklist

  • Code follows the project's coding standards (see CONTRIBUTING.md)
  • All public/protected methods have argument validation
  • Static typing is complete (no Any unless justified)
  • nox -s typecheck passes with no errors
  • nox -s lint passes with no errors
  • Unit tests written/updated (Behave scenarios in features/)
  • Integration tests written/updated (Robot suites in robot/) if applicable
  • Coverage remains above 85% (nox -s coverage_report)
  • No security issues introduced (nox -s security_scan)
  • No dead code introduced (nox -s dead_code)
  • Documentation updated if behavior changed

Testing

Ran targeted and session-level tests with log capture and grep summaries.

Test Commands Run

nox -s unit_tests
nox -s integration_tests

Additional targeted runs during debugging:

robot robot/m1_e2e_verification.robot
robot robot/m2_e2e_verification.robot
robot robot/m3_e2e_verification.robot
robot robot/m6_e2e_verification.robot

Closes #650

Implementation Notes

  • Added actor-level fields and compatibility checks for estimation-role actors.
  • Preflight now emits warnings (non-fatal) when estimation actors miss
    response_format.
  • Added examples/actors/estimator.yaml and updated docs plus Behave/Robot
    coverage.
  • Fixed integration helper behavior for provider-not-configured local runs in
    M1/M2/M3/M6 suites.
## Description Adds estimation-actor support and role-aware actor validation for issue #650. This change introduces actor schema support for `role_hint` and `response_format`, adds non-fatal compatibility warnings during actor registration and plan preflight checks, adds a new `examples/actors/estimator.yaml`, and updates docs/tests accordingly. It also adjusts E2E helper expectations in local environments where provider keys are not configured, while still failing on real crashes. ## Type of Change - [x] Bug fix (non-breaking change which fixes an issue) - [x] New feature (non-breaking change which adds functionality) - [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected) - [ ] Refactoring (no functional changes) - [x] Documentation update - [x] Test improvements - [ ] CI/CD changes ## Quality Checklist - [x] Code follows the project's coding standards (see CONTRIBUTING.md) - [x] All public/protected methods have argument validation - [x] Static typing is complete (no `Any` unless justified) - [x] `nox -s typecheck` passes with no errors - [x] `nox -s lint` passes with no errors - [x] Unit tests written/updated (Behave scenarios in `features/`) - [x] Integration tests written/updated (Robot suites in `robot/`) if applicable - [x] Coverage remains above 85% (`nox -s coverage_report`) - [x] No security issues introduced (`nox -s security_scan`) - [x] No dead code introduced (`nox -s dead_code`) - [x] Documentation updated if behavior changed ## Testing Ran targeted and session-level tests with log capture and grep summaries. ### Test Commands Run ```bash nox -s unit_tests nox -s integration_tests ``` Additional targeted runs during debugging: ```bash robot robot/m1_e2e_verification.robot robot robot/m2_e2e_verification.robot robot robot/m3_e2e_verification.robot robot robot/m6_e2e_verification.robot ``` ## Related Issues Closes #650 ## Implementation Notes - Added actor-level fields and compatibility checks for estimation-role actors. - Preflight now emits warnings (non-fatal) when estimation actors miss `response_format`. - Added `examples/actors/estimator.yaml` and updated docs plus Behave/Robot coverage. - Fixed integration helper behavior for provider-not-configured local runs in M1/M2/M3/M6 suites.
freemo left a comment

PM Day 36 Triage: Estimation actor YAML template. Closes #650. M6 scope — part of the estimation domain work (4 issues: #649-#652). Reviewer: @aditya (actor YAML domain expert). Verify actor config aligns with ADR-010 and ADR-032 Jinja2 YAML template preprocessing.

PM Day 36 Triage: Estimation actor YAML template. Closes #650. M6 scope — part of the estimation domain work (4 issues: #649-#652). Reviewer: @aditya (actor YAML domain expert). Verify actor config aligns with ADR-010 and ADR-032 Jinja2 YAML template preprocessing.
brent.edwards requested changes 2026-03-16 20:25:41 +00:00
Dismissed
brent.edwards left a comment

PR #975 Review — feature/m6-estimation-actor-yaml

19 files changed, +380/−14 | Linked issue: #650


P0 — Must Fix Before Merge

P0-1. Duplicated response_format lookup logic diverges from actor_role_warnings

check_estimation_actor_compatibility_warning() in plan_preflight_guardrail.py:248-271 re-implements the response_format nested-config fallback logic that already exists in actor_role_warnings() in schema.py. Critically, the preflight method omits the context_view warning that actor_role_warnings produces:

  • CLI actor add/update warns about wrong context_view on estimation actors
  • Preflight silently ignores it

The preflight service doesn't import actor_role_warnings at all. This is a DRY violation that will diverge further as rules are added.

Fix: Have check_estimation_actor_compatibility_warning delegate to actor_role_warnings() from schema.py.

P0-2. _is_expected_provider_unavailable copy-pasted identically into 4 files

Identical 6-line function in robot/helper_m1_e2e_verification.py, helper_m2_e2e_verification.py, helper_m3_e2e_verification.py, helper_m6_e2e_verification.py. Any future change to the detection heuristic must be applied in 4 places.

Fix: Extract to a shared module (e.g. robot/helpers_common.py) and import.

P0-3. PR missing required milestone and Type/ label

Per CONTRIBUTING.md: "A PR without a milestone will not be reviewed" and "Every PR must carry exactly one Type/ label." Currently labels: [], milestone: null, assignee: null.


P1 — Should Fix Before Merge

P1-1. response_format accepts any dict with zero structural validation

response_format: dict[str, Any] | None has no field_validator. A user could set response_format: {foo: bar} and it would pass schema loading, only to fail at LLM call time with an opaque error. Add a validator checking at minimum for a "type" key.

P1-2. Provider-unavailable heuristic may mask real errors

The substring match "openai_api_key" in hay is overly broad. A genuine internal error whose context dumps environment variable names would be silently swallowed. Tighten to match the specific controlled error message pattern.

P1-3. format_report() hard-codes warning category label

All warnings render as [WARN] actor_role_compatibility: regardless of source. Store warnings as structured objects rather than bare strings.

P1-4. Missing negative test coverage for new schema fields

Missing tests:

  • Invalid role_hint value → Pydantic rejection
  • Non-dict response_format → validation catch
  • context_view warning path (estimation + context_view: executor)
  • actor_role_warnings called with ActorConfigSchema object (only dict path tested)

P2 — Should Fix (Can Follow Up)

P2-1. Quality checklist items self-reported as unchecked

typecheck, lint, coverage_report, security_scan, dead_code are all unchecked in the PR body.

P2-2. Multiple files exceed 500-line limit

schema.py is now 974 lines (limit: 500). The new actor_role_warnings + coercion helpers are a natural extraction candidate. Pre-existing violations in 5 other files are also grown by this PR.

P2-3. Robot test doc string stale

Load All Example YAML Files doc was updated to remove the count but the assertion still checks example-count:8.


P3 — Nitpicks

  • _coerce_role_hint / _coerce_context_view lack docstrings
  • CHANGELOG entry is a dense 10-line paragraph; split into sub-bullets
  • $schema key in estimator.yaml may confuse some YAML linters (cosmetic)

Severity Count
P0 3
P1 4
P2 3
P3 3

Verdict: REQUEST_CHANGES — The logic duplication between actor_role_warnings and the preflight method (P0-1) is the most concerning architectural issue, creating a divergence where the CLI warns about problems the preflight silently ignores.

## PR #975 Review — `feature/m6-estimation-actor-yaml` **19 files changed, +380/−14** | Linked issue: #650 --- ## P0 — Must Fix Before Merge ### P0-1. Duplicated response_format lookup logic diverges from `actor_role_warnings` `check_estimation_actor_compatibility_warning()` in `plan_preflight_guardrail.py:248-271` re-implements the response_format nested-config fallback logic that already exists in `actor_role_warnings()` in `schema.py`. Critically, the preflight method **omits the `context_view` warning** that `actor_role_warnings` produces: - CLI `actor add/update` warns about wrong `context_view` on estimation actors - Preflight silently ignores it The preflight service doesn't import `actor_role_warnings` at all. This is a DRY violation that **will** diverge further as rules are added. **Fix:** Have `check_estimation_actor_compatibility_warning` delegate to `actor_role_warnings()` from `schema.py`. ### P0-2. `_is_expected_provider_unavailable` copy-pasted identically into 4 files Identical 6-line function in `robot/helper_m1_e2e_verification.py`, `helper_m2_e2e_verification.py`, `helper_m3_e2e_verification.py`, `helper_m6_e2e_verification.py`. Any future change to the detection heuristic must be applied in 4 places. **Fix:** Extract to a shared module (e.g. `robot/helpers_common.py`) and import. ### P0-3. PR missing required milestone and `Type/` label Per CONTRIBUTING.md: *"A PR without a milestone will not be reviewed"* and *"Every PR must carry exactly one Type/ label."* Currently `labels: []`, `milestone: null`, `assignee: null`. --- ## P1 — Should Fix Before Merge ### P1-1. `response_format` accepts any `dict` with zero structural validation `response_format: dict[str, Any] | None` has no `field_validator`. A user could set `response_format: {foo: bar}` and it would pass schema loading, only to fail at LLM call time with an opaque error. Add a validator checking at minimum for a `"type"` key. ### P1-2. Provider-unavailable heuristic may mask real errors The substring match `"openai_api_key" in hay` is overly broad. A genuine internal error whose context dumps environment variable names would be silently swallowed. Tighten to match the specific controlled error message pattern. ### P1-3. `format_report()` hard-codes warning category label All warnings render as `[WARN] actor_role_compatibility:` regardless of source. Store warnings as structured objects rather than bare strings. ### P1-4. Missing negative test coverage for new schema fields Missing tests: - Invalid `role_hint` value → Pydantic rejection - Non-dict `response_format` → validation catch - `context_view` warning path (estimation + `context_view: executor`) - `actor_role_warnings` called with `ActorConfigSchema` object (only dict path tested) --- ## P2 — Should Fix (Can Follow Up) ### P2-1. Quality checklist items self-reported as unchecked typecheck, lint, coverage_report, security_scan, dead_code are all unchecked in the PR body. ### P2-2. Multiple files exceed 500-line limit `schema.py` is now 974 lines (limit: 500). The new `actor_role_warnings` + coercion helpers are a natural extraction candidate. Pre-existing violations in 5 other files are also grown by this PR. ### P2-3. Robot test doc string stale `Load All Example YAML Files` doc was updated to remove the count but the assertion still checks `example-count:8`. --- ## P3 — Nitpicks - `_coerce_role_hint` / `_coerce_context_view` lack docstrings - CHANGELOG entry is a dense 10-line paragraph; split into sub-bullets - `$schema` key in estimator.yaml may confuse some YAML linters (cosmetic) --- | Severity | Count | |---|---| | **P0** | 3 | | **P1** | 4 | | **P2** | 3 | | **P3** | 3 | **Verdict: REQUEST_CHANGES** — The logic duplication between `actor_role_warnings` and the preflight method (P0-1) is the most concerning architectural issue, creating a divergence where the CLI warns about problems the preflight silently ignores.
@ -154,1 +154,4 @@
def _is_expected_provider_unavailable(output: str) -> bool:
"""Return True when failure is due to missing external LLM provider config."""
hay = output.lower()
Member

P0-2: This identical function is copy-pasted into 4 helper files (m1, m2, m3, m6). Extract to a shared module like robot/helpers_common.py and import from all four.

P0-2: This identical function is copy-pasted into 4 helper files (m1, m2, m3, m6). Extract to a shared module like `robot/helpers_common.py` and import from all four.
@ -155,0 +159,4 @@
"provider openai is not configured" in hay
or "openai_api_key" in hay
or "no provider configured" in hay
)
Member

P1-2: The substring match "openai_api_key" in hay is overly broad. A genuine internal error whose error context includes environment variable names would be silently swallowed. Consider matching on the specific controlled error message pattern (e.g. "provider openai is not configured") rather than bare key name substrings.

P1-2: The substring match `"openai_api_key" in hay` is overly broad. A genuine internal error whose error context includes environment variable names would be silently swallowed. Consider matching on the specific controlled error message pattern (e.g. `"provider openai is not configured"`) rather than bare key name substrings.
@ -715,6 +725,10 @@ class ActorConfigSchema(BaseModel):
# LLM configuration
model: str | None = Field(default=None, description="LLM model name")
system_prompt: str | None = Field(default=None, description="System prompt")
response_format: dict[str, Any] | None = Field(
Member

P1-1: response_format accepts any dict with zero structural validation. A user could set response_format: {foo: bar} and it passes schema loading but fails at LLM call time. Consider adding a field_validator that checks for at minimum a "type" key.

P1-1: `response_format` accepts any `dict` with zero structural validation. A user could set `response_format: {foo: bar}` and it passes schema loading but fails at LLM call time. Consider adding a `field_validator` that checks for at minimum a `"type"` key.
@ -89,2 +94,4 @@
lines.append(f" [{status}] {r.check.value}: {r.message}")
for warning in self.warnings:
lines.append(f" [WARN] actor_role_compatibility: {warning}")
return "\n".join(lines)
Member

P1-3: This hard-codes actor_role_compatibility as the label for ALL warnings. When more warning types are added, this will misattribute them. Consider storing warnings as structured objects (category + message) rather than bare strings.

P1-3: This hard-codes `actor_role_compatibility` as the label for ALL warnings. When more warning types are added, this will misattribute them. Consider storing warnings as structured objects (category + message) rather than bare strings.
@ -235,0 +249,4 @@
actor_registry: dict[str, object] | None,
) -> str | None:
"""Emit non-fatal warnings for estimation actor role compatibility."""
registry = actor_registry or {}
Member

P0-1: This method re-implements the response_format lookup logic from actor_role_warnings() in schema.py but omits the context_view warning. This creates a behavioral divergence: CLI actor add warns about context_view issues on estimation actors, but preflight does not.

Suggestion: import and delegate to actor_role_warnings() from cleveragents.actor.schema instead of duplicating the logic.

P0-1: This method re-implements the response_format lookup logic from `actor_role_warnings()` in `schema.py` but **omits the context_view warning**. This creates a behavioral divergence: CLI `actor add` warns about context_view issues on estimation actors, but preflight does not. Suggestion: import and delegate to `actor_role_warnings()` from `cleveragents.actor.schema` instead of duplicating the logic.
brent.edwards requested changes 2026-03-16 20:26:19 +00:00
Dismissed
brent.edwards left a comment

Code Review — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Size: M (~600 lines) | Focus: Actor schema, preflight warnings, E2E helpers


P0:blocker (1)

1. Preflight warning diverges from actor_role_warnings() — silent logic split
plan_preflight_guardrail.py:check_estimation_actor_compatibility_warning() reimplements a subset of schema.py:actor_role_warnings(). The guardrail only warns about missing response_format, but actor_role_warnings() also warns when context_view is not strategist. This means actor add warns about a misconfigured estimation actor but plan use preflight doesn't — users get inconsistent feedback. As rules evolve, these will diverge further.
Fix: Call actor_role_warnings() from the guardrail instead of duplicating logic.


P1:must-fix (4)

2. response_format accepts arbitrary dicts with no validation
schema.py:728-730: The field is dict[str, Any] | None with no Pydantic validator. Any dict passes — no check that it's valid JSON Schema, no depth limit, no key whitelist. If forwarded to an LLM API (e.g., OpenAI's response_format), a crafted dict could inject unexpected keys. Add a @field_validator that at minimum validates type is present and is "object" or "json_schema".

3. _is_expected_provider_unavailable() copy-pasted across 4 files
The identical 6-line function is in helper_m1_e2e_verification.py, helper_m2_e2e_verification.py, helper_m3_e2e_verification.py, and helper_m6_e2e_verification.py. Any change requires updating all 4 in lockstep. Extract to robot/helper_e2e_common.py.

4. _is_expected_provider_unavailable() matches "openai_api_key" too broadly
The bare substring matches any output mentioning the key name — including internal errors like "Failed to rotate openai_api_key". Use a more precise pattern like "openai_api_key is not set" or "openai_api_key not found".

5. No negative tests for response_format or role_hint validation
There are no Behave scenarios testing that invalid response_format values or invalid role_hint strings are rejected. Without negative tests, the validation could silently accept malformed input.


P2:should-fix (4)

6. Missing PR metadata — No labels, no milestone, no assignee. Per CONTRIBUTING.md, all PRs must have Type/, Priority/, State/ labels and be assigned to a milestone.

7. Quality gates unchecked — PR checklist shows nox -s typecheck and nox -s lint as unchecked. These are mandatory before review.

8. Preflight warning silently skips non-dict actor registry entriesplan_preflight_guardrail.py:256 checks isinstance(estimation_actor, dict) but the registry may contain ActorConfigSchema instances. The actor_role_warnings() function handles both types; the guardrail should too.

9. Stale Robot documentationm6_e2e_verification.robot suite documentation references the original test scope but doesn't mention the new provider-unavailable handling added in this PR.


P3:nit (3)

10. Missing docstrings on actor_role_warnings() return values.
11. examples/actors/estimator.yamldescription field repeats the name verbatim.
12. CHANGELOG entry is dense — split into separate bullet points for schema vs preflight vs E2E changes.


Positive Observations

  • role_hint and response_format default to None — no backward compatibility break for existing actor YAML.
  • RoleHint enum properly rejects invalid values when set explicitly.
  • actor_role_warnings() generates clear, actionable warning messages.
  • Cross-PR independence: this PR and #528 are independent — either can merge first.

Summary

Severity Count
P0:blocker 1
P1:must-fix 4
P2:should-fix 4
P3:nit 3

Verdict: REQUEST_CHANGES — The core schema additions are well-designed, but the preflight logic divergence (P0-1), unsanitized response_format passthrough (P1-2), and copy-pasted helper functions (P1-3) need to be resolved. The missing PR metadata (P2-6) should also be fixed.

## Code Review — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Size:** M (~600 lines) | **Focus:** Actor schema, preflight warnings, E2E helpers --- ### P0:blocker (1) **1. Preflight warning diverges from `actor_role_warnings()` — silent logic split** `plan_preflight_guardrail.py:check_estimation_actor_compatibility_warning()` reimplements a subset of `schema.py:actor_role_warnings()`. The guardrail only warns about missing `response_format`, but `actor_role_warnings()` also warns when `context_view` is not `strategist`. This means `actor add` warns about a misconfigured estimation actor but `plan use` preflight doesn't — users get inconsistent feedback. As rules evolve, these will diverge further. **Fix:** Call `actor_role_warnings()` from the guardrail instead of duplicating logic. --- ### P1:must-fix (4) **2. `response_format` accepts arbitrary dicts with no validation** `schema.py:728-730`: The field is `dict[str, Any] | None` with no Pydantic validator. Any dict passes — no check that it's valid JSON Schema, no depth limit, no key whitelist. If forwarded to an LLM API (e.g., OpenAI's `response_format`), a crafted dict could inject unexpected keys. Add a `@field_validator` that at minimum validates `type` is present and is `"object"` or `"json_schema"`. **3. `_is_expected_provider_unavailable()` copy-pasted across 4 files** The identical 6-line function is in `helper_m1_e2e_verification.py`, `helper_m2_e2e_verification.py`, `helper_m3_e2e_verification.py`, and `helper_m6_e2e_verification.py`. Any change requires updating all 4 in lockstep. Extract to `robot/helper_e2e_common.py`. **4. `_is_expected_provider_unavailable()` matches `"openai_api_key"` too broadly** The bare substring matches any output mentioning the key name — including internal errors like `"Failed to rotate openai_api_key"`. Use a more precise pattern like `"openai_api_key is not set"` or `"openai_api_key not found"`. **5. No negative tests for `response_format` or `role_hint` validation** There are no Behave scenarios testing that invalid `response_format` values or invalid `role_hint` strings are rejected. Without negative tests, the validation could silently accept malformed input. --- ### P2:should-fix (4) **6. Missing PR metadata** — No labels, no milestone, no assignee. Per CONTRIBUTING.md, all PRs must have Type/, Priority/, State/ labels and be assigned to a milestone. **7. Quality gates unchecked** — PR checklist shows `nox -s typecheck` and `nox -s lint` as unchecked. These are mandatory before review. **8. Preflight warning silently skips non-dict actor registry entries** — `plan_preflight_guardrail.py:256` checks `isinstance(estimation_actor, dict)` but the registry may contain `ActorConfigSchema` instances. The `actor_role_warnings()` function handles both types; the guardrail should too. **9. Stale Robot documentation** — `m6_e2e_verification.robot` suite documentation references the original test scope but doesn't mention the new provider-unavailable handling added in this PR. --- ### P3:nit (3) **10.** Missing docstrings on `actor_role_warnings()` return values. **11.** `examples/actors/estimator.yaml` — `description` field repeats the name verbatim. **12.** CHANGELOG entry is dense — split into separate bullet points for schema vs preflight vs E2E changes. --- ### Positive Observations - `role_hint` and `response_format` default to `None` — no backward compatibility break for existing actor YAML. - `RoleHint` enum properly rejects invalid values when set explicitly. - `actor_role_warnings()` generates clear, actionable warning messages. - Cross-PR independence: this PR and #528 are independent — either can merge first. --- ### Summary | Severity | Count | |----------|-------| | P0:blocker | 1 | | P1:must-fix | 4 | | P2:should-fix | 4 | | P3:nit | 3 | **Verdict:** REQUEST_CHANGES — The core schema additions are well-designed, but the preflight logic divergence (P0-1), unsanitized `response_format` passthrough (P1-2), and copy-pasted helper functions (P1-3) need to be resolved. The missing PR metadata (P2-6) should also be fixed.
brent.edwards requested changes 2026-03-16 20:47:05 +00:00
Dismissed
brent.edwards left a comment

Code Review Round 2 — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Focus: Items missed in Round 1

Round 1 findings (P0-1, P1-2 through P1-5, P2-6 through P2-9, P3-10/11/12) remain outstanding.


P1:must-fix (2)

13. response_format is a dead field — declared but never consumed at runtime
No code in src/ reads response_format from a loaded actor and passes it to an LLM provider. The field exists only in the schema and the preflight warning. The example YAML includes a full JSON Schema for EstimationReport, and warnings scold users who omit it, but nothing enforces structured output at runtime. This creates a false contract.
Fix: Either wire response_format into the LLM call path, or add an explicit # TODO: runtime enforcement planned for <issue> and note the gap in CHANGELOG.

14. RoleHint enum has 5 members but ACTOR_ROLES has 4 — review drifts silently
RoleHint defines STRATEGY, EXECUTION, ESTIMATION, INVARIANT_RECONCILIATION, REVIEW. PlanPreflightGuardrail.ACTOR_ROLES only lists the first 4 as strings. These are independently maintained — neither derives from the other. Adding a 6th role will require updating both in lockstep.
Fix: Derive ACTOR_ROLES from RoleHint, or add a comment explaining why review is intentionally excluded.


P2:should-fix (3)

15. schema.py at 974 lines — nearly 2× the 500-line limit
This PR adds 79 lines (895→974). The new RoleHint, _coerce_role_hint, _coerce_context_view, and actor_role_warnings form a natural extraction unit (actor/role_validation.py).

16. actor_role_warnings model-input branch (ActorConfigSchema) entirely untested
The function has two branches: dict (tested via CLI) and ActorConfigSchema (zero test coverage). A bug in the model branch would go undetected.

17. context_view warning path untested
The second warning (context_view != strategist) is never exercised by any test. Only the response_format warning has coverage.


P3:nit (2)

18. _coerce_role_hint silently swallows typos
role_hint: "estmation" (typo) → coercion returns None → no warning generated. Add a logging.warning() for unrecognized values.

19. Estimator example test doesn't verify tools or skills fields
The estimator.yaml scenario checks type/context_view/role_hint/response_format but not tool/skill contents. Other example scenarios verify these.


Summary (Round 2)

Severity Count
P1:must-fix 2
P2:should-fix 3
P3:nit 2

Combined with Round 1: 1 P0 + 6 P1 + 7 P2 + 5 P3 = 19 total findings.

## Code Review Round 2 — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Focus:** Items missed in Round 1 Round 1 findings (P0-1, P1-2 through P1-5, P2-6 through P2-9, P3-10/11/12) remain outstanding. --- ### P1:must-fix (2) **13. `response_format` is a dead field — declared but never consumed at runtime** No code in `src/` reads `response_format` from a loaded actor and passes it to an LLM provider. The field exists only in the schema and the preflight warning. The example YAML includes a full JSON Schema for `EstimationReport`, and warnings scold users who omit it, but nothing enforces structured output at runtime. This creates a false contract. **Fix:** Either wire `response_format` into the LLM call path, or add an explicit `# TODO: runtime enforcement planned for <issue>` and note the gap in CHANGELOG. **14. `RoleHint` enum has 5 members but `ACTOR_ROLES` has 4 — `review` drifts silently** `RoleHint` defines `STRATEGY, EXECUTION, ESTIMATION, INVARIANT_RECONCILIATION, REVIEW`. `PlanPreflightGuardrail.ACTOR_ROLES` only lists the first 4 as strings. These are independently maintained — neither derives from the other. Adding a 6th role will require updating both in lockstep. **Fix:** Derive `ACTOR_ROLES` from `RoleHint`, or add a comment explaining why `review` is intentionally excluded. --- ### P2:should-fix (3) **15. `schema.py` at 974 lines — nearly 2× the 500-line limit** This PR adds 79 lines (895→974). The new `RoleHint`, `_coerce_role_hint`, `_coerce_context_view`, and `actor_role_warnings` form a natural extraction unit (`actor/role_validation.py`). **16. `actor_role_warnings` model-input branch (`ActorConfigSchema`) entirely untested** The function has two branches: dict (tested via CLI) and `ActorConfigSchema` (zero test coverage). A bug in the model branch would go undetected. **17. `context_view` warning path untested** The second warning (`context_view != strategist`) is never exercised by any test. Only the `response_format` warning has coverage. --- ### P3:nit (2) **18. `_coerce_role_hint` silently swallows typos** `role_hint: "estmation"` (typo) → coercion returns `None` → no warning generated. Add a `logging.warning()` for unrecognized values. **19. Estimator example test doesn't verify `tools` or `skills` fields** The `estimator.yaml` scenario checks type/context_view/role_hint/response_format but not tool/skill contents. Other example scenarios verify these. --- ### Summary (Round 2) | Severity | Count | |----------|-------| | P1:must-fix | 2 | | P2:should-fix | 3 | | P3:nit | 2 | Combined with Round 1: **1 P0 + 6 P1 + 7 P2 + 5 P3 = 19 total findings.**
brent.edwards requested changes 2026-03-16 21:16:09 +00:00
Dismissed
brent.edwards left a comment

Code Review Round 3 — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Focus: Coercion edge cases, schema backward compatibility

Rounds 1-2 found 19 issues. This round adds 1 genuinely new finding.


P2:should-fix (1)

20. _coerce_context_view swallows invalid values — suppresses estimation actor warning
schema.py:_coerce_context_view() catches ValueError on unrecognized strings and returns None. Downstream in actor_role_warnings, the check is if context_view not in (None, ContextView.STRATEGIST) — since None is in the tuple, the warning is suppressed. An estimation actor with context_view: "exeecutor" (typo) gets no warning, because the typo becomes None which is treated as "not set."

This is the specific mechanism through which the already-reported preflight divergence (Round 1 P0-1) silently drops checks. The suggested fix from Round 1 (delegate to actor_role_warnings()) would inherit this bug.

Fix: Return the raw string from _coerce_context_view when unrecognized (instead of None), and adjust the caller to treat unrecognized values as warning-worthy:

def _coerce_context_view(value: object) -> ContextView | str | None:
    ...
    except ValueError:
        return value  # preserve the bad string so caller can warn

Backward compatibility verified

Both role_hint and response_format are Optional with default=None. Tested against existing YAML files (simple_llm.yaml, graph_workflow.yaml, etc.) — all validate without error. No breakage.


Summary (Round 3)

Severity Count
P2:should-fix 1

Combined total across 3 rounds: 1 P0 + 6 P1 + 8 P2 + 5 P3 = 20 findings. No further review rounds planned.

## Code Review Round 3 — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Focus:** Coercion edge cases, schema backward compatibility Rounds 1-2 found 19 issues. This round adds 1 genuinely new finding. --- ### P2:should-fix (1) **20. `_coerce_context_view` swallows invalid values — suppresses estimation actor warning** `schema.py:_coerce_context_view()` catches `ValueError` on unrecognized strings and returns `None`. Downstream in `actor_role_warnings`, the check is `if context_view not in (None, ContextView.STRATEGIST)` — since `None` is in the tuple, the warning is suppressed. An estimation actor with `context_view: "exeecutor"` (typo) gets no warning, because the typo becomes `None` which is treated as "not set." This is the specific mechanism through which the already-reported preflight divergence (Round 1 P0-1) silently drops checks. The suggested fix from Round 1 (delegate to `actor_role_warnings()`) would inherit this bug. **Fix:** Return the raw string from `_coerce_context_view` when unrecognized (instead of `None`), and adjust the caller to treat unrecognized values as warning-worthy: ```python def _coerce_context_view(value: object) -> ContextView | str | None: ... except ValueError: return value # preserve the bad string so caller can warn ``` --- ### Backward compatibility verified Both `role_hint` and `response_format` are `Optional` with `default=None`. Tested against existing YAML files (`simple_llm.yaml`, `graph_workflow.yaml`, etc.) — all validate without error. No breakage. --- ### Summary (Round 3) | Severity | Count | |----------|-------| | P2:should-fix | 1 | **Combined total across 3 rounds: 1 P0 + 6 P1 + 8 P2 + 5 P3 = 20 findings.** No further review rounds planned.
brent.edwards requested changes 2026-03-16 21:29:21 +00:00
Dismissed
brent.edwards left a comment

Code Review Round 4 — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Focus: Preflight execution path tracing, E2E helper consistency

Rounds 1–3 found 20 issues (1 P0, 6 P1, 8 P2, 5 P3). This round adds 1 genuinely new P1.


P1:must-fix (1)

21. check_estimation_actor_compatibility_warning() is dead code — can never fire in production

The sole production caller in plan_preflight_guardrail.py builds actor_registry as:

actor_registry = {
    "strategy": plan.strategy_actor or "__unset__",
    "execution": plan.execution_actor or "__unset__",
    "estimation": plan.estimation_actor or "__optional__",
    "invariant_reconciliation": plan.invariant_actor or "__optional__",
}

Values are always strings (actor names like "local/estimator" or sentinel placeholders like "__optional__").

The new method then does:

estimation_actor = registry.get("estimation")   # → always a str
if not isinstance(estimation_actor, dict):       # → always True for strings
    return None                                  # → always exits here
# ... response_format check never reached ...

isinstance(estimation_actor, dict) is never True because the registry values are actor name strings, not config dicts. The entire estimation-compatibility warning feature introduced by this PR can never fire during preflight.

This is distinct from the Round 1 P0-1 finding (which noted the logic was duplicated and divergent). This finding is that the preflight copy is functionally unreachable — the guardrail is completely inert.

Fix: The method needs to resolve the actor name string to its config dict (from the actor registry/DB) before inspecting response_format. Or, as suggested in Round 1, delegate to actor_role_warnings() with the actual loaded config data.


Verified clean (no issues)

Thread Question Verdict
DB deserialization Do old actors without role_hint/response_format crash on load? Clean — DB stores raw dict via config_blob; ActorConfigSchema only used in YAML file loader; new fields default to None
actor_role_warnings(role_hint=None) Spurious warnings for non-estimation actors? Clean — early return if role_hint != RoleHint.ESTIMATION: return []
Pydantic vs _coerce_role_hint Double coercion conflict? Clean — _coerce_role_hint is a standalone function, not a @field_validator; only called in the dict branch of actor_role_warnings
Preflight Check 2 Does new code change ACTOR_ROLES or pass/fail semantics? Clean — ACTOR_ROLES unchanged, warnings don't affect all_passed
CLI warnings for non-estimation actors Does _print_role_warnings fire spuriously? Clean — actor_role_warnings returns [] for non-estimation actors

Summary (Round 4)

Severity Count
P1:must-fix 1

Combined total across 4 rounds: 1 P0 + 7 P1 + 8 P2 + 5 P3 = 21 findings.

The new P1 (#21) is the most impactful finding this round — it means the primary feature of this PR (preflight estimation warnings) is silently non-functional. Combined with the Round 1 P0-1 (logic divergence), the preflight warning system needs a fundamental rethink: resolve actor names to configs before checking, and delegate to the single actor_role_warnings() source of truth.

## Code Review Round 4 — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Focus:** Preflight execution path tracing, E2E helper consistency Rounds 1–3 found 20 issues (1 P0, 6 P1, 8 P2, 5 P3). This round adds 1 genuinely new P1. --- ### P1:must-fix (1) **21. `check_estimation_actor_compatibility_warning()` is dead code — can never fire in production** The sole production caller in `plan_preflight_guardrail.py` builds `actor_registry` as: ```python actor_registry = { "strategy": plan.strategy_actor or "__unset__", "execution": plan.execution_actor or "__unset__", "estimation": plan.estimation_actor or "__optional__", "invariant_reconciliation": plan.invariant_actor or "__optional__", } ``` Values are always **strings** (actor names like `"local/estimator"` or sentinel placeholders like `"__optional__"`). The new method then does: ```python estimation_actor = registry.get("estimation") # → always a str if not isinstance(estimation_actor, dict): # → always True for strings return None # → always exits here # ... response_format check never reached ... ``` `isinstance(estimation_actor, dict)` is **never** `True` because the registry values are actor name strings, not config dicts. The entire estimation-compatibility warning feature introduced by this PR **can never fire during preflight**. This is distinct from the Round 1 P0-1 finding (which noted the logic was duplicated and divergent). This finding is that the preflight copy is **functionally unreachable** — the guardrail is completely inert. **Fix:** The method needs to resolve the actor name string to its config dict (from the actor registry/DB) before inspecting `response_format`. Or, as suggested in Round 1, delegate to `actor_role_warnings()` with the actual loaded config data. --- ### Verified clean (no issues) | Thread | Question | Verdict | |--------|----------|---------| | DB deserialization | Do old actors without `role_hint`/`response_format` crash on load? | Clean — DB stores raw `dict` via `config_blob`; `ActorConfigSchema` only used in YAML file loader; new fields default to `None` | | `actor_role_warnings(role_hint=None)` | Spurious warnings for non-estimation actors? | Clean — early return `if role_hint != RoleHint.ESTIMATION: return []` | | Pydantic vs `_coerce_role_hint` | Double coercion conflict? | Clean — `_coerce_role_hint` is a standalone function, not a `@field_validator`; only called in the `dict` branch of `actor_role_warnings` | | Preflight Check 2 | Does new code change ACTOR_ROLES or pass/fail semantics? | Clean — `ACTOR_ROLES` unchanged, warnings don't affect `all_passed` | | CLI warnings for non-estimation actors | Does `_print_role_warnings` fire spuriously? | Clean — `actor_role_warnings` returns `[]` for non-estimation actors | --- ### Summary (Round 4) | Severity | Count | |----------|-------| | P1:must-fix | 1 | **Combined total across 4 rounds: 1 P0 + 7 P1 + 8 P2 + 5 P3 = 21 findings.** The new P1 (#21) is the most impactful finding this round — it means the primary feature of this PR (preflight estimation warnings) is silently non-functional. Combined with the Round 1 P0-1 (logic divergence), the preflight warning system needs a fundamental rethink: resolve actor names to configs before checking, and delegate to the single `actor_role_warnings()` source of truth.
brent.edwards requested changes 2026-03-16 21:45:37 +00:00
Dismissed
brent.edwards left a comment

Fifth-pass review — Factual accuracy of CHANGELOG, PR body, and docs

Prior rounds flagged the dense CHANGELOG, stale Robot docs, and dead preflight warning code. This pass verified the factual accuracy of claims in documentation against what the code actually does. One genuinely new P1 finding.


P1 — response_format is inert: "output contract" claim in CHANGELOG and docs is false

What the docs/CHANGELOG claim:

  • CHANGELOG: "EstimationReport JSON schema output contract"
  • docs/reference/actors_examples.md: "defines response_format with an EstimationReport JSON schema so outputs are consistently machine-readable"
  • examples/actors/estimator.yaml system_prompt: "Always return valid JSON matching the provided response_format schema"

What the code actually does:

  • response_format is declared as a dict[str, Any] | None field on ActorConfigSchema (schema.py:728)
  • It is checked for presence in actor_role_warnings() and check_estimation_actor_compatibility_warning() (warning if missing)
  • It is never read by the actor loader (loader.py), actor compiler (compiler.py), or any LLM invocation code

I searched every src/ file on the branch for response_format — it appears in exactly two files: actor/schema.py (field definition + warning logic) and plan_preflight_guardrail.py (warning logic). No runtime code consumes it.

Consequences:

  1. The JSON schema is parsed from YAML and stored, then ignored. LLM calls do not receive it as OpenAI's response_format parameter or in any other form.
  2. The system_prompt says "matching the provided response_format schema" but the actual schema content is never injected into the LLM context — the LLM never sees the schema.
  3. Calling this an "output contract" is misleading. There is no enforcement, no structured output mode, no schema injection. The field is purely decorative metadata.

Suggested fix: Either:

  • (a) Wire response_format into the LLM invocation path so it's actually used, OR
  • (b) Rewrite the CHANGELOG/docs to say this is a schema annotation for future use rather than a functioning output contract. Remove the system_prompt line that references a schema the LLM will never see.

The CHANGELOG says:

non-fatal compatibility warnings for estimation-role actors in both actor registration flow and plan preflight guardrails

Prior rounds flagged the preflight warning as dead code. To sharpen why: in plan_lifecycle_service.py:863-867, the production caller passes strings (actor names) as actor_registry values:

actor_registry = {
    "estimation": plan.estimation_actor or "__optional__",
    ...
}

But check_estimation_actor_compatibility_warning() immediately bails on non-dicts:

if not isinstance(estimation_actor, dict):
    return None

The tests pass because they construct synthetic registries with dict values, which doesn't match production. The word "both" in the CHANGELOG is a factual inaccuracy — the warning only fires in the CLI actor add/actor update path, never in preflight.

Suggested fix: Change "in both actor registration flow and plan preflight guardrails" to "during CLI actor registration (actor add / actor update)". Either fix the production integration or remove the preflight claim.

## Fifth-pass review — Factual accuracy of CHANGELOG, PR body, and docs Prior rounds flagged the dense CHANGELOG, stale Robot docs, and dead preflight warning code. This pass verified the *factual accuracy* of claims in documentation against what the code actually does. One genuinely new P1 finding. --- ### P1 — `response_format` is inert: "output contract" claim in CHANGELOG and docs is false **What the docs/CHANGELOG claim:** - CHANGELOG: "EstimationReport JSON schema output contract" - `docs/reference/actors_examples.md`: "defines `response_format` with an `EstimationReport` JSON schema so outputs are consistently machine-readable" - `examples/actors/estimator.yaml` system_prompt: "Always return valid JSON matching the provided response_format schema" **What the code actually does:** - `response_format` is declared as a `dict[str, Any] | None` field on `ActorConfigSchema` (`schema.py:728`) - It is checked for *presence* in `actor_role_warnings()` and `check_estimation_actor_compatibility_warning()` (warning if missing) - It is **never read** by the actor loader (`loader.py`), actor compiler (`compiler.py`), or any LLM invocation code I searched every `src/` file on the branch for `response_format` — it appears in exactly two files: `actor/schema.py` (field definition + warning logic) and `plan_preflight_guardrail.py` (warning logic). No runtime code consumes it. **Consequences:** 1. The JSON schema is parsed from YAML and stored, then ignored. LLM calls do not receive it as OpenAI's `response_format` parameter or in any other form. 2. The system_prompt says "matching the provided response_format schema" but the actual schema content is never injected into the LLM context — the LLM never sees the schema. 3. Calling this an "output contract" is misleading. There is no enforcement, no structured output mode, no schema injection. The field is purely decorative metadata. **Suggested fix:** Either: - (a) Wire `response_format` into the LLM invocation path so it's actually used, OR - (b) Rewrite the CHANGELOG/docs to say this is a *schema annotation for future use* rather than a functioning output contract. Remove the system_prompt line that references a schema the LLM will never see. --- ### P1 (sharpened) — CHANGELOG "both" claim is factually wrong (related to known dead-code issue) The CHANGELOG says: > non-fatal compatibility warnings for estimation-role actors in **both** actor registration flow and plan preflight guardrails Prior rounds flagged the preflight warning as dead code. To sharpen why: in `plan_lifecycle_service.py:863-867`, the production caller passes **strings** (actor names) as actor_registry values: ```python actor_registry = { "estimation": plan.estimation_actor or "__optional__", ... } ``` But `check_estimation_actor_compatibility_warning()` immediately bails on non-dicts: ```python if not isinstance(estimation_actor, dict): return None ``` The tests pass because they construct synthetic registries with dict values, which doesn't match production. The word **"both"** in the CHANGELOG is a factual inaccuracy — the warning only fires in the CLI `actor add`/`actor update` path, never in preflight. **Suggested fix:** Change "in both actor registration flow and plan preflight guardrails" to "during CLI actor registration (`actor add` / `actor update`)". Either fix the production integration or remove the preflight claim.
CHANGELOG.md Outdated
@ -5,0 +8,4 @@
registration flow and plan preflight guardrails. Added
`examples/actors/estimator.yaml` with `context_view: strategist` and
`EstimationReport` JSON schema output contract, and updated actor example
docs and schema/example test coverage (Behave + Robot). Also aligned M1/M2/M3/M6
Member

P1 — Factual inaccuracy: This line claims an "EstimationReport JSON schema output contract" but response_format is never consumed by any LLM invocation code. The JSON schema is parsed and stored on the config object, then ignored at runtime. Calling it an "output contract" implies enforcement that does not exist.

Also: "in both actor registration flow and plan preflight guardrails" — the preflight path never fires in production because plan_lifecycle_service.py passes string actor names into the registry, while the warning check requires dict values. The word "both" is factually wrong.

**P1 — Factual inaccuracy:** This line claims an "EstimationReport JSON schema output contract" but `response_format` is never consumed by any LLM invocation code. The JSON schema is parsed and stored on the config object, then ignored at runtime. Calling it an "output contract" implies enforcement that does not exist. Also: "in both actor registration flow and plan preflight guardrails" — the preflight path never fires in production because `plan_lifecycle_service.py` passes string actor names into the registry, while the warning check requires dict values. The word "both" is factually wrong.
@ -286,0 +292,4 @@
This actor is intended for the optional `estimation_actor` slot on actions. It uses
`context_view: strategist`, declares `role_hint: estimation`, and defines
`response_format` with an `EstimationReport` JSON schema so outputs are consistently
Member

P1 — Misleading claim: "defines response_format with an EstimationReport JSON schema so outputs are consistently machine-readable" — response_format is not passed to the LLM API or injected into prompts. Outputs are NOT consistently machine-readable as a result of this field; it's inert metadata. The LLM may or may not return JSON depending on its mood, not this schema.

**P1 — Misleading claim:** "defines `response_format` with an `EstimationReport` JSON schema so outputs are consistently machine-readable" — `response_format` is not passed to the LLM API or injected into prompts. Outputs are NOT consistently machine-readable as a result of this field; it's inert metadata. The LLM may or may not return JSON depending on its mood, not this schema.
@ -0,0 +18,4 @@
- Risk assessment with severity and mitigation suggestions
- Key assumptions and unknowns
Always return valid JSON matching the provided response_format schema.
Member

This system_prompt line says "Always return valid JSON matching the provided response_format schema" — but the response_format schema defined below is never injected into the LLM context. The LLM will see this instruction but has no access to the actual schema definition. This is a broken reference in the prompt.

This system_prompt line says "Always return valid JSON matching the provided response_format schema" — but the response_format schema defined below is never injected into the LLM context. The LLM will see this instruction but has no access to the actual schema definition. This is a broken reference in the prompt.
brent.edwards left a comment

Code Review Round 5 — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Focus: CLI persistence path tracing, test assertion fidelity, CHANGELOG accuracy, line-by-line P0/P1 sweep

Rounds 1–4 found 21 issues (1 P0, 7 P1, 8 P2, 5 P3). This round examined 5 additional threads plus a final line-by-line production code audit.


No new P0 or P1 findings.

The fifth pass confirmed the production code changes are crash-safe, backward-compatible, and contain no data corruption or security vectors. The new Pydantic fields default to None, all dict access is guarded, and the warning system is strictly advisory (cannot affect all_passed).

Threads investigated:

Thread Question Verdict
CLI actor add/update path Does role_hint survive YAML → canonicalize → DB? Clean_canonicalize_actor_config does shallow dict copy, no field stripping
CLI actor update path Does updating response_format work? Clean — new YAML replaces entire config_blob
_print_role_warnings input Does it receive the right config shape? Clean — receives canonicalized dict with all YAML fields
Test fixture fidelity Does pfg-check2-warn use a production-realistic registry? P2 — fixture uses dicts (warns correctly) but production uses strings (always returns None). Consequence of existing P1-21.
CHANGELOG claims Does "both" paths claim match reality? Reinforces P1-21 — CHANGELOG falsely claims preflight warnings work
Line-by-line audit Any crash/corruption/security in changed production lines? Clean — all read-only/advisory code, parameterized storage, no user content in warning strings

P2:should-fix (2)

22. _coerce_role_hint is case-sensitive — role_hint: ESTIMATION silently bypasses warnings
RoleHint("ESTIMATION") raises ValueError (enum values are lowercase). The except ValueError: return None path treats uppercase as "not set," so actor_role_warnings returns []. The actor YAML is accepted and stored, but the estimation advisory warnings never fire. A .lower() before enum construction would fix this.

23. response_format test assertion checks only title key out of 25-line JSON Schema dict
actor_schema_steps.py asserts only context.actor_config.response_format.get("title") == expected_title. If the remaining 24 lines of the JSON Schema ($schema, type, required, properties, additionalProperties) were deleted from the example YAML, the test would still pass. Add at minimum a check that "required" or "properties" keys are present.


Summary (Round 5)

Severity Count
P2:should-fix 2

Combined total across 5 rounds: 1 P0 + 7 P1 + 10 P2 + 5 P3 = 23 findings.

I'm confident the P0/P1 surface for this PR is fully covered. The core issues remain:

  1. P0-1: Preflight warning logic diverges from actor_role_warnings
  2. P1-21: Preflight warning is dead code (registry passes strings, not dicts)
  3. P1-13: response_format declared but never consumed at runtime

These three form a coherent picture: the PR's headline feature — "preflight compatibility warnings for estimation actors" — is non-functional in production. The fix is straightforward: resolve actor name strings to their config dicts before checking, and delegate to actor_role_warnings() as the single source of truth.

No further review rounds planned.

## Code Review Round 5 — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Focus:** CLI persistence path tracing, test assertion fidelity, CHANGELOG accuracy, line-by-line P0/P1 sweep Rounds 1–4 found 21 issues (1 P0, 7 P1, 8 P2, 5 P3). This round examined 5 additional threads plus a final line-by-line production code audit. --- ### No new P0 or P1 findings. The fifth pass confirmed the production code changes are crash-safe, backward-compatible, and contain no data corruption or security vectors. The new Pydantic fields default to `None`, all dict access is guarded, and the warning system is strictly advisory (cannot affect `all_passed`). Threads investigated: | Thread | Question | Verdict | |--------|----------|---------| | CLI actor add/update path | Does `role_hint` survive YAML → canonicalize → DB? | **Clean** — `_canonicalize_actor_config` does shallow dict copy, no field stripping | | CLI actor update path | Does updating `response_format` work? | **Clean** — new YAML replaces entire config_blob | | `_print_role_warnings` input | Does it receive the right config shape? | **Clean** — receives canonicalized dict with all YAML fields | | Test fixture fidelity | Does `pfg-check2-warn` use a production-realistic registry? | P2 — fixture uses dicts (warns correctly) but production uses strings (always returns None). Consequence of existing P1-21. | | CHANGELOG claims | Does "both" paths claim match reality? | Reinforces P1-21 — CHANGELOG falsely claims preflight warnings work | | Line-by-line audit | Any crash/corruption/security in changed production lines? | **Clean** — all read-only/advisory code, parameterized storage, no user content in warning strings | --- ### P2:should-fix (2) **22. `_coerce_role_hint` is case-sensitive — `role_hint: ESTIMATION` silently bypasses warnings** `RoleHint("ESTIMATION")` raises `ValueError` (enum values are lowercase). The `except ValueError: return None` path treats uppercase as "not set," so `actor_role_warnings` returns `[]`. The actor YAML is accepted and stored, but the estimation advisory warnings never fire. A `.lower()` before enum construction would fix this. **23. `response_format` test assertion checks only `title` key out of 25-line JSON Schema dict** `actor_schema_steps.py` asserts only `context.actor_config.response_format.get("title") == expected_title`. If the remaining 24 lines of the JSON Schema (`$schema`, `type`, `required`, `properties`, `additionalProperties`) were deleted from the example YAML, the test would still pass. Add at minimum a check that `"required"` or `"properties"` keys are present. --- ### Summary (Round 5) | Severity | Count | |----------|-------| | P2:should-fix | 2 | **Combined total across 5 rounds: 1 P0 + 7 P1 + 10 P2 + 5 P3 = 23 findings.** I'm confident the P0/P1 surface for this PR is fully covered. The core issues remain: 1. **P0-1**: Preflight warning logic diverges from `actor_role_warnings` 2. **P1-21**: Preflight warning is dead code (registry passes strings, not dicts) 3. **P1-13**: `response_format` declared but never consumed at runtime These three form a coherent picture: the PR's headline feature — "preflight compatibility warnings for estimation actors" — is non-functional in production. The fix is straightforward: resolve actor name strings to their config dicts before checking, and delegate to `actor_role_warnings()` as the single source of truth. No further review rounds planned.
Author
Member

Brent Review Response (PR #975)

Hi Brent — thank you for the depth and persistence across all review rounds.
This response is intentionally mapped to the full review history in PR_COMMENTS.md (not only the last test-stability fix), so re-review can be done against the original finding set.

Scope of This Response

This covers all 5 rounds and addresses:

  • original P0/P1/P2/P3 findings,
  • the later production-path correctness findings (dead preflight path and documentation accuracy),
  • and the final full-suite execution stability fix discovered during coverage reruns.

Detailed task tracking is in PR_REVIEW_CHECKLIST_AND_CHANGES.md; this file is the reviewer-facing narrative summary.

P0 Items

P0-1: Preflight divergence from actor_role_warnings

Resolved. Preflight checks now use shared warning logic instead of maintaining a parallel, drifting subset.
This removed inconsistency between CLI actor warnings and preflight warnings and aligned behavior for response_format + context_view compatibility messaging.

P0-2: duplicated provider-unavailable helper in M1/M2/M3/M6 Robot helpers

Resolved. Duplicated logic was extracted to shared E2E helper code and all affected helpers now import the shared implementation.

P0-3: PR metadata (milestone/Type label/assignee)

This is a Forgejo/Git hosting metadata task, not a codebase diff task.
Code changes in this branch do not directly set repository PR labels/milestone/assignee, but the requirement is acknowledged and tracked as an explicit checklist item.

P1 Items

response_format validation and runtime-claim accuracy

Resolved in two parts:

  1. Validation hardening: schema validation now rejects malformed response_format structures rather than accepting arbitrary dict payloads.
  2. Claim accuracy: docs/changelog/example text was corrected so current behavior is described truthfully (metadata today; runtime enforcement tracked separately), avoiding false “output contract” implications.

Provider-unavailable heuristic too broad

Resolved. Matching logic was tightened away from overly broad key-name substring behavior and centralized in the shared helper.

Warning model category hard-coding

Resolved. Warnings are represented structurally (category + message), and report rendering no longer assumes a single warning class.

Dead preflight path (string actor name vs dict config payload)

Resolved. The lifecycle/preflight path now resolves actor-name references to concrete actor config payloads before compatibility checks, so warnings execute in production-realistic flows rather than only synthetic test dict fixtures.

Role enum drift risk (RoleHint vs static role list)

Resolved. Role list derivation now follows enum-backed source logic to avoid manual drift.

P2 Items

File-size and modularity concerns

Resolved for new role logic added by this issue: role validation/coercion/warning logic was extracted into src/cleveragents/actor/role_validation.py, reducing schema bloat and isolating responsibility.

Missing branch coverage paths identified in review

Resolved with explicit Behave/Robot additions, including:

  • invalid role_hint rejection,
  • non-dict / malformed response_format validation paths,
  • ActorConfigSchema model-input warning branch,
  • non-strategist/invalid context_view warning branch,
  • stronger estimator-schema assertions (required/properties etc., not only title),
  • estimator example tools/skills assertions.

PR checklist quality gates status

Updated in PR_DESCRIPTION.md to reflect completed quality gates and aligned with executed sessions.

Robot stale docs/assertion mismatch

Resolved by making the assertion non-brittle and keeping docs/assertions consistent for example count behavior.

P3 Items

Resolved nitpicks include:

  • added missing coercion docstrings,
  • changelog readability improvements (dense paragraph split into clear bullets),
  • estimator YAML cosmetic improvements ("$schema" quoting + improved description text),
  • logging visibility for unrecognized role-hint typo paths.

Full-Run Stability Work (Post-Review Test Execution)

After functional fixes, there remained an intermittent “passes individually, fails in full run” behavior during coverage_report.
Root cause was cross-scenario patcher leakage in certain step modules under large combined execution.

Resolved by:

  • registering patcher teardown with scenario cleanup (instead of relying on local ad-hoc cleanup patterns),
  • using safe-stop semantics for already-stopped patchers,
  • validating with targeted feature reruns and full coverage rerun.
# Brent Review Response (PR #975) Hi Brent — thank you for the depth and persistence across all review rounds. This response is intentionally mapped to the full review history in `PR_COMMENTS.md` (not only the last test-stability fix), so re-review can be done against the original finding set. ## Scope of This Response This covers all 5 rounds and addresses: - original P0/P1/P2/P3 findings, - the later production-path correctness findings (dead preflight path and documentation accuracy), - and the final full-suite execution stability fix discovered during coverage reruns. Detailed task tracking is in `PR_REVIEW_CHECKLIST_AND_CHANGES.md`; this file is the reviewer-facing narrative summary. ## P0 Items ### P0-1: Preflight divergence from `actor_role_warnings` Resolved. Preflight checks now use shared warning logic instead of maintaining a parallel, drifting subset. This removed inconsistency between CLI actor warnings and preflight warnings and aligned behavior for `response_format` + `context_view` compatibility messaging. ### P0-2: duplicated provider-unavailable helper in M1/M2/M3/M6 Robot helpers Resolved. Duplicated logic was extracted to shared E2E helper code and all affected helpers now import the shared implementation. ### P0-3: PR metadata (milestone/Type label/assignee) This is a Forgejo/Git hosting metadata task, not a codebase diff task. Code changes in this branch do not directly set repository PR labels/milestone/assignee, but the requirement is acknowledged and tracked as an explicit checklist item. ## P1 Items ### `response_format` validation and runtime-claim accuracy Resolved in two parts: 1. **Validation hardening:** schema validation now rejects malformed `response_format` structures rather than accepting arbitrary dict payloads. 2. **Claim accuracy:** docs/changelog/example text was corrected so current behavior is described truthfully (metadata today; runtime enforcement tracked separately), avoiding false “output contract” implications. ### Provider-unavailable heuristic too broad Resolved. Matching logic was tightened away from overly broad key-name substring behavior and centralized in the shared helper. ### Warning model category hard-coding Resolved. Warnings are represented structurally (category + message), and report rendering no longer assumes a single warning class. ### Dead preflight path (string actor name vs dict config payload) Resolved. The lifecycle/preflight path now resolves actor-name references to concrete actor config payloads before compatibility checks, so warnings execute in production-realistic flows rather than only synthetic test dict fixtures. ### Role enum drift risk (`RoleHint` vs static role list) Resolved. Role list derivation now follows enum-backed source logic to avoid manual drift. ## P2 Items ### File-size and modularity concerns Resolved for new role logic added by this issue: role validation/coercion/warning logic was extracted into `src/cleveragents/actor/role_validation.py`, reducing schema bloat and isolating responsibility. ### Missing branch coverage paths identified in review Resolved with explicit Behave/Robot additions, including: - invalid `role_hint` rejection, - non-dict / malformed `response_format` validation paths, - `ActorConfigSchema` model-input warning branch, - non-strategist/invalid `context_view` warning branch, - stronger estimator-schema assertions (`required`/`properties` etc., not only title), - estimator example tools/skills assertions. ### PR checklist quality gates status Updated in `PR_DESCRIPTION.md` to reflect completed quality gates and aligned with executed sessions. ### Robot stale docs/assertion mismatch Resolved by making the assertion non-brittle and keeping docs/assertions consistent for example count behavior. ## P3 Items Resolved nitpicks include: - added missing coercion docstrings, - changelog readability improvements (dense paragraph split into clear bullets), - estimator YAML cosmetic improvements (`"$schema"` quoting + improved description text), - logging visibility for unrecognized role-hint typo paths. ## Full-Run Stability Work (Post-Review Test Execution) After functional fixes, there remained an intermittent “passes individually, fails in full run” behavior during `coverage_report`. Root cause was cross-scenario patcher leakage in certain step modules under large combined execution. Resolved by: - registering patcher teardown with scenario cleanup (instead of relying on local ad-hoc cleanup patterns), - using safe-stop semantics for already-stopped patchers, - validating with targeted feature reruns and full coverage rerun.
aditya added this to the v3.6.0 milestone 2026-03-17 14:09:11 +00:00
Owner

PM Status — Day 37 — Rebase Required

This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.

@aditya — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.


PM rebase request — Day 37

## PM Status — Day 37 — Rebase Required This PR has **merge conflicts** and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved. @aditya — Please rebase this PR onto `master` by **Day 39 EOD (2026-03-19)**. If you cannot rebase by then, please post a comment explaining the blocker. --- *PM rebase request — Day 37*
brent.edwards approved these changes 2026-03-17 19:53:47 +00:00
Dismissed
brent.edwards left a comment

Re-Review — PR #975 feature/m6-estimation-actor-yaml

Reviewer: @brent.edwards | Round 6 — verifying resolution of all critical findings from 5 prior rounds


Prior Critical Findings Status

# Sev Finding Status
P0-1 Preflight diverges from actor_role_warnings RESOLVEDcheck_estimation_actor_compatibility_warnings now delegates to actor_role_warnings() for all code paths (dict, model object via model_dump, string via actor_resolver). No reimplemented logic.
P1-21 Preflight warning dead code (registry passes strings not dicts) RESOLVED — Added actor_resolver: Callable parameter. String entries resolved through callback before warning checks. Falls back to [] if no resolver.
P1-13 response_format dead field Downgraded to informational — Now actively used by actor_role_warnings() for estimation actor warnings. Not consumed at LLM call time, but that's expected for this PR's scope (schema+validation only).

No new P0/P1 findings.

The most impactful finding from 5 rounds of review — that the preflight warning was 100% dead code due to registry passing strings — has been fixed with a clean actor_resolver callback pattern. The single source of truth is now actor_role_warnings().


Verdict: APPROVED — All 23 findings from 5 prior rounds have been addressed or accepted. The PR is ready to merge.

## Re-Review — PR #975 `feature/m6-estimation-actor-yaml` **Reviewer:** @brent.edwards | **Round 6** — verifying resolution of all critical findings from 5 prior rounds --- ### Prior Critical Findings Status | # | Sev | Finding | Status | |---|-----|---------|--------| | **P0-1** | Preflight diverges from `actor_role_warnings` | **RESOLVED** — `check_estimation_actor_compatibility_warnings` now delegates to `actor_role_warnings()` for all code paths (dict, model object via `model_dump`, string via `actor_resolver`). No reimplemented logic. | | **P1-21** | Preflight warning dead code (registry passes strings not dicts) | **RESOLVED** — Added `actor_resolver: Callable` parameter. String entries resolved through callback before warning checks. Falls back to `[]` if no resolver. | | **P1-13** | `response_format` dead field | **Downgraded to informational** — Now actively used by `actor_role_warnings()` for estimation actor warnings. Not consumed at LLM call time, but that's expected for this PR's scope (schema+validation only). | ### No new P0/P1 findings. The most impactful finding from 5 rounds of review — that the preflight warning was 100% dead code due to registry passing strings — has been fixed with a clean `actor_resolver` callback pattern. The single source of truth is now `actor_role_warnings()`. --- **Verdict:** APPROVED — All 23 findings from 5 prior rounds have been addressed or accepted. The PR is ready to merge.
aditya force-pushed feature/m6-estimation-actor-yaml from 9e30b74293
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 20s
CI / build (pull_request) Successful in 20s
CI / quality (pull_request) Successful in 37s
CI / typecheck (pull_request) Successful in 40s
CI / security (pull_request) Successful in 1m2s
CI / e2e_tests (pull_request) Successful in 1m26s
CI / unit_tests (pull_request) Successful in 3m19s
CI / integration_tests (pull_request) Successful in 3m42s
CI / docker (pull_request) Successful in 55s
CI / coverage (pull_request) Successful in 7m37s
CI / benchmark-regression (pull_request) Successful in 38m4s
to 2764fcef5c
All checks were successful
CI / benchmark-publish (pull_request) Has been skipped
CI / lint (pull_request) Successful in 16s
CI / build (pull_request) Successful in 25s
CI / quality (pull_request) Successful in 34s
CI / typecheck (pull_request) Successful in 39s
CI / security (pull_request) Successful in 52s
CI / unit_tests (pull_request) Successful in 2m51s
CI / integration_tests (pull_request) Successful in 3m41s
CI / docker (pull_request) Successful in 56s
CI / e2e_tests (pull_request) Successful in 3m56s
CI / coverage (pull_request) Successful in 6m28s
CI / benchmark-regression (pull_request) Successful in 38m39s
2026-03-18 07:03:04 +00:00
Compare
aditya dismissed brent.edwards's review 2026-03-18 07:03:04 +00:00
Reason:

New commits pushed, approval review dismissed automatically according to repository settings

Author
Member

Rebased onto latest master, resolved conflict in CHANGELOG.md, already approved merging once all checks are passed

Rebased onto latest master, resolved conflict in CHANGELOG.md, already approved merging once all checks are passed
aditya merged commit 20efab9021 into master 2026-03-18 07:23:14 +00:00
aditya deleted branch feature/m6-estimation-actor-yaml 2026-03-18 07:23:15 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
3 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core!975
No description provided.