cleveragents/cleveragents-core

Fork 3

UAT: `ApplyValidationGate` uses `DefaultValidationRunner` stub — apply-phase validation does not execute registered validation tools #3490

New issue

Open

opened 2026-04-05 18:36:35 +00:00 by freemo · 3 comments

freemo commented

2026-04-05 18:36:35 +00:00

Owner

Metadata

Branch: fix/m3-apply-validation-gate-real-runner
Commit Message: fix(validation): replace DefaultValidationRunner stub with ToolRegistry-backed runner in ApplyValidationGate
Milestone: v3.2.0
Parent Epic: #357

Bug Description

The ApplyValidationGate class (src/cleveragents/application/services/validation_apply.py) is designed to gate the Apply phase by running validation attachments. However, its default runner implementation (DefaultValidationRunner) is a stub that uses text-matching against context strings to determine pass/fail — it does not actually invoke the registered validation tools from the Tool Registry.

The DefaultValidationRunner.run_validation() method contains this comment in the code:

"This is a placeholder — real implementations invoke the actual validation tool."

And its logic is:

context_text = " ".join(f"{k}={v}" for k, v in context.items()).lower()
val_name = attachment.validation_name.lower()
passed = val_name in context_text

This means a validation named local/lint-check will "pass" if the string "local/lint-check" appears anywhere in the context dict values, and "fail" otherwise — completely unrelated to whether the actual lint check passes.

Spec References (docs/specification.md):

Line 22322: "a plain Tool cannot be used where a Validation is specifically required" — implies real tool execution
Line 22560: "If the validation is informational, the error is still recorded but does not block." — implies real execution results
Line 22808: "final_validation_results | plan.apply | Snapshot of the final validation state at the time execution completed (all required validations passing)."

Expected Behavior:
ApplyValidationGate should use a real ValidationRunner implementation that:

Looks up the validation tool from the Tool Registry by name
Executes the tool via ToolRunner.execute() with the appropriate inputs
Interprets the tool's return value (checking for passed: bool in the output)
Returns an ApplyValidationResult based on the actual tool execution result

Actual Behavior:
ApplyValidationGate uses DefaultValidationRunner which performs text-matching against context strings. Registered validation tools are never actually executed. The Apply phase validation gate produces meaningless results based on string matching rather than real validation outcomes.

Code Location:

src/cleveragents/application/services/validation_apply.py, class DefaultValidationRunner (lines ~270–320)
The DefaultValidationRunner is the default runner passed to ApplyValidationGate.__init__() in all usages
A real ToolRegistryValidationRunner needs to be implemented that uses ToolRunner.execute() and checks the passed boolean in the output

Impact: Critical — the Apply phase validation gate is non-functional. Required validations that should block apply never actually run their registered tool logic. Plans can be applied even when their resources fail real validation checks.

Subtasks

Implement ToolRegistryValidationRunner in src/cleveragents/application/services/validation_apply.py that looks up validation tools from the Tool Registry by name
Wire ToolRegistryValidationRunner to use ToolRunner.execute() to invoke the validation tool with appropriate inputs
Parse the tool's return value and check for passed: bool in the output to determine pass/fail
Replace DefaultValidationRunner as the default runner in ApplyValidationGate.__init__() with ToolRegistryValidationRunner
Retain DefaultValidationRunner (or remove it) only as a test double in features/mocks/ if needed
Add static type annotations to all new/modified methods; verify with nox -e typecheck
Tests (Behave): Add/update scenarios in features/ covering real tool invocation, passed: True, passed: False, and missing tool cases
Tests (Robot): Add/update integration test in robot/ verifying end-to-end apply-phase validation gate with a real registered validation tool
Verify coverage ≥ 97% via nox -s coverage_report
Run nox (all default sessions), fix any errors

Definition of Done

This issue is complete when:

All subtasks above are completed and checked off.
ApplyValidationGate uses a ValidationRunner implementation that actually invokes registered validation tools via ToolRunner.execute() and interprets the passed: bool result — no text-matching stub logic remains in the production code path.
A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation.
The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
The commit is submitted as a pull request to master, reviewed, and merged before this issue is marked done.
All nox stages pass.
Coverage ≥ 97%.

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: ca-uat-tester

## Metadata - **Branch**: `fix/m3-apply-validation-gate-real-runner` - **Commit Message**: `fix(validation): replace DefaultValidationRunner stub with ToolRegistry-backed runner in ApplyValidationGate` - **Milestone**: v3.2.0 - **Parent Epic**: #357 ## Bug Description The `ApplyValidationGate` class (`src/cleveragents/application/services/validation_apply.py`) is designed to gate the Apply phase by running validation attachments. However, its default runner implementation (`DefaultValidationRunner`) is a stub that uses text-matching against context strings to determine pass/fail — it does **not** actually invoke the registered validation tools from the Tool Registry. The `DefaultValidationRunner.run_validation()` method contains this comment in the code: > "This is a placeholder — real implementations invoke the actual validation tool." And its logic is: ```python context_text = " ".join(f"{k}={v}" for k, v in context.items()).lower() val_name = attachment.validation_name.lower() passed = val_name in context_text ``` This means a validation named `local/lint-check` will "pass" if the string `"local/lint-check"` appears anywhere in the context dict values, and "fail" otherwise — completely unrelated to whether the actual lint check passes. **Spec References** (`docs/specification.md`): - Line 22322: "a plain Tool cannot be used where a Validation is specifically required" — implies real tool execution - Line 22560: "If the validation is informational, the error is still recorded but does not block." — implies real execution results - Line 22808: "`final_validation_results` | `plan.apply` | Snapshot of the final validation state at the time execution completed (all required validations passing)." **Expected Behavior**: `ApplyValidationGate` should use a real `ValidationRunner` implementation that: 1. Looks up the validation tool from the Tool Registry by name 2. Executes the tool via `ToolRunner.execute()` with the appropriate inputs 3. Interprets the tool's return value (checking for `passed: bool` in the output) 4. Returns an `ApplyValidationResult` based on the actual tool execution result **Actual Behavior**: `ApplyValidationGate` uses `DefaultValidationRunner` which performs text-matching against context strings. Registered validation tools are never actually executed. The Apply phase validation gate produces meaningless results based on string matching rather than real validation outcomes. **Code Location**: - `src/cleveragents/application/services/validation_apply.py`, class `DefaultValidationRunner` (lines ~270–320) - The `DefaultValidationRunner` is the default runner passed to `ApplyValidationGate.__init__()` in all usages - A real `ToolRegistryValidationRunner` needs to be implemented that uses `ToolRunner.execute()` and checks the `passed` boolean in the output **Impact**: Critical — the Apply phase validation gate is non-functional. Required validations that should block apply never actually run their registered tool logic. Plans can be applied even when their resources fail real validation checks. ## Subtasks - [ ] Implement `ToolRegistryValidationRunner` in `src/cleveragents/application/services/validation_apply.py` that looks up validation tools from the Tool Registry by name - [ ] Wire `ToolRegistryValidationRunner` to use `ToolRunner.execute()` to invoke the validation tool with appropriate inputs - [ ] Parse the tool's return value and check for `passed: bool` in the output to determine pass/fail - [ ] Replace `DefaultValidationRunner` as the default runner in `ApplyValidationGate.__init__()` with `ToolRegistryValidationRunner` - [ ] Retain `DefaultValidationRunner` (or remove it) only as a test double in `features/mocks/` if needed - [ ] Add static type annotations to all new/modified methods; verify with `nox -e typecheck` - [ ] Tests (Behave): Add/update scenarios in `features/` covering real tool invocation, `passed: True`, `passed: False`, and missing tool cases - [ ] Tests (Robot): Add/update integration test in `robot/` verifying end-to-end apply-phase validation gate with a real registered validation tool - [ ] Verify coverage ≥ 97% via `nox -s coverage_report` - [ ] Run `nox` (all default sessions), fix any errors ## Definition of Done This issue is complete when: - All subtasks above are completed and checked off. - `ApplyValidationGate` uses a `ValidationRunner` implementation that actually invokes registered validation tools via `ToolRunner.execute()` and interprets the `passed: bool` result — no text-matching stub logic remains in the production code path. - A Git commit is created where the **first line** of the commit message matches the Commit Message in Metadata exactly, followed by a blank line, then additional lines providing relevant details about the implementation. - The commit is pushed to the remote on the branch matching the **Branch** in Metadata exactly. - The commit is submitted as a **pull request** to `master`, reviewed, and **merged** before this issue is marked done. - All nox stages pass. - Coverage ≥ 97%. --- **Automated by CleverAgents Bot** Supervisor: UAT Testing | Agent: ca-uat-tester

freemo added this to the v3.2.0 milestone

2026-04-05 18:36:40 +00:00

freemo added the

labels

2026-04-05 18:36:47 +00:00

freemo added a new dependency

2026-04-05 18:36:51 +00:00

#357 Epic: Decisions + Validations + Invariants (M3)

freemo referenced this issue

2026-04-05 18:39:51 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3377

freemo referenced this issue

2026-04-05 18:45:07 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3377

freemo referenced this issue

2026-04-05 18:52:57 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3377

freemo referenced this issue

2026-04-05 19:05:02 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3377

freemo removed the

State

Unverified

label

2026-04-05 19:32:37 +00:00

freemo referenced this issue

2026-04-05 19:33:25 +00:00

UAT: ValidationPipeline not integrated into plan lifecycle — validations never execute during plan phases #3487

freemo commented

2026-04-05 19:33:28 +00:00

Author

Owner

Issue verified and triaged:

Priority: Critical — ApplyValidationGate uses text-matching stub instead of real tool execution, making apply-phase validation meaningless.
Milestone: v3.2.0 (already assigned)
Story Points: 5 (L) — requires implementing ToolRegistryValidationRunner with real tool invocation and result interpretation.
Parent Epic: #357 (already linked)
Dependency: Part of the validation subsystem fix cluster with #3487 and #3482.
Next step: This issue is now ready for implementation.

Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: ca-human-liaison

Issue verified and triaged: - **Priority**: Critical — ApplyValidationGate uses text-matching stub instead of real tool execution, making apply-phase validation meaningless. - **Milestone**: v3.2.0 (already assigned) - **Story Points**: 5 (L) — requires implementing ToolRegistryValidationRunner with real tool invocation and result interpretation. - **Parent Epic**: #357 (already linked) - **Dependency**: Part of the validation subsystem fix cluster with #3487 and #3482. - **Next step**: This issue is now ready for implementation. --- **Automated by CleverAgents Bot** Supervisor: Human Liaison | Agent: ca-human-liaison

freemo referenced this issue

2026-04-05 19:33:30 +00:00

UAT: agents validation attach accepts plain tools (tool_type="tool") — spec requires rejection #3482

freemo referenced this issue

2026-04-05 19:49:58 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3377

freemo referenced this issue

2026-04-05 20:03:42 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3572

freemo referenced this issue

2026-04-05 20:05:52 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3572

freemo referenced this issue

2026-04-05 20:07:13 +00:00

[Automated] CleverAgents Build Session - 2026-04-05 #3572

freemo commented

2026-04-05 20:18:22 +00:00

Author

Owner

Transition request: State/Verified -> State/In Progress for issue #3490 (cleveragents/cleveragents-core).

Current state label: State/Verified.
Other labels present: Priority/Critical, Type/Bug.

Preconditions:

Blocked label is not present, so there is no blocking issue to unlock.

Planned label changes (to be executed by maintainers or via API):

Remove old state label(s): State/Verified
Add new state label: State/In Progress

Note: This action requires label operations via the Forgejo API. The current toolset does not expose direct label editing endpoints. If you approve, I can either (a) attempt to apply these label changes through an extended API wrapper, or (b) leave a formal transition record here and proceed once label endpoints are available. For now, I am documenting the intended transition and will re-run when label-edit capability is accessible.

Blockers/Linked issues: None (Blocker not set).

Reason for transition: Verifies that work can commence on the issue and aligns with the Ticket Lifecycle: Unverified → Verified → In Progress.

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: ca-issue-state-updater

Transition request: State/Verified -> State/In Progress for issue #3490 (cleveragents/cleveragents-core). Current state label: State/Verified. Other labels present: Priority/Critical, Type/Bug. Preconditions: - Blocked label is not present, so there is no blocking issue to unlock. Planned label changes (to be executed by maintainers or via API): - Remove old state label(s): State/Verified - Add new state label: State/In Progress Note: This action requires label operations via the Forgejo API. The current toolset does not expose direct label editing endpoints. If you approve, I can either (a) attempt to apply these label changes through an extended API wrapper, or (b) leave a formal transition record here and proceed once label endpoints are available. For now, I am documenting the intended transition and will re-run when label-edit capability is accessible. Blockers/Linked issues: None (Blocker not set). Reason for transition: Verifies that work can commence on the issue and aligns with the Ticket Lifecycle: Unverified → Verified → In Progress. --- **Automated by CleverAgents Bot** Supervisor: Implementation | Agent: ca-issue-state-updater

freemo commented

2026-04-06 18:51:56 +00:00

Author

Owner

Milestone Triage Decision: Moved to Backlog

This issue has been moved out of v3.3.0 during aggressive milestone triage. While important for completeness, it does not directly relate to the core focus of Corrections + Subplans + Checkpoints.

Reasoning:

v3.3.0 focus: Essential corrections, subplan management, and checkpoint functionality
This issue: Infrastructure improvement not directly blocking core milestone functionality
Impact: System enhancement, not core corrections/subplans/checkpoints functionality

Will be addressed in a future milestone after core corrections, subplans, and checkpoints are stable.

**Milestone Triage Decision: Moved to Backlog** This issue has been moved out of v3.3.0 during aggressive milestone triage. While important for completeness, it does not directly relate to the core focus of Corrections + Subplans + Checkpoints. **Reasoning:** - v3.3.0 focus: Essential corrections, subplan management, and checkpoint functionality - This issue: Infrastructure improvement not directly blocking core milestone functionality - Impact: System enhancement, not core corrections/subplans/checkpoints functionality Will be addressed in a future milestone after core corrections, subplans, and checkpoints are stable.