agents/graphs/plan_generation: _validate always passes for code longer than 10 characters, making LLM validation ineffective #10480

Open
opened 2026-04-18 10:05:15 +00:00 by HAL9000 · 5 comments
Owner

Bug Report

Summary

PlanGenerationGraph._validate() contains a logic error that causes validation to always pass for any generated code longer than 10 characters, regardless of the LLM's validation response. This makes the entire validation step ineffective and allows invalid, broken, or insecure code to pass validation.

Affected File

src/cleveragents/agents/graphs/plan_generation.py

Code Evidence

def _validate(self, state: PlanGenerationState) -> dict[str, Any]:
    # ...
    result = chain.invoke({"generated_code": all_code})
    validation = str(result)

    # Simple validation check (in real implementation, parse the LLM response)
    is_valid = "PASS" in validation.upper() or len(all_code) > 10  # BUG: always True for real code

Impact

Since virtually all generated code is longer than 10 characters, the condition len(all_code) > 10 is effectively always True. Because it is joined to the PASS check with or, is_valid is True regardless of the LLM's response. This means:

  1. Validation is disabled: Even if the LLM returns "FAIL: syntax error", the code passes validation.
  2. Retry logic never triggers: The _should_retry function checks validation_result["status"] == "FAIL" to decide whether to retry. Since validation always passes, retries never happen even for genuinely broken code.
  3. Broken code is accepted: Invalid, syntactically incorrect, or insecure generated code will always be accepted.
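
The three effects above can be reproduced in isolation. The following is a minimal sketch, not the project's code: buggy_is_valid copies the expression quoted under Code Evidence, while should_retry is a hypothetical stand-in that assumes only the status check described in item 2.

```python
def buggy_is_valid(validation: str, all_code: str) -> bool:
    # Expression copied from the issue: the length check overrides the verdict.
    return "PASS" in validation.upper() or len(all_code) > 10


def should_retry(validation_result: dict) -> bool:
    # Hypothetical stand-in for _should_retry, based only on the check quoted above.
    return validation_result["status"] == "FAIL"


all_code = "def broken(: pass"  # syntactically invalid, longer than 10 characters
responses = ["PASS", "FAIL: syntax error", ""]

outcomes = []
for response in responses:
    status = "PASS" if buggy_is_valid(response, all_code) else "FAIL"
    outcomes.append((response, status, should_retry({"status": status})))

for response, status, retry in outcomes:
    print(f"{response!r:25} status={status} retry={retry}")
```

Every response, including the explicit FAIL and the empty string, ends up with status PASS, so the retry branch is never taken.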

Reproduction

from langchain_community.llms import FakeListLLM
from cleveragents.agents.graphs.plan_generation import PlanGenerationGraph

# LLM that always returns FAIL
mock_llm = FakeListLLM(responses=["FAIL: this code is completely broken"])
graph = PlanGenerationGraph(llm=mock_llm)

# Evaluate the same expression _validate() uses; any code > 10 chars will pass
all_code = "def broken(): syntax error here"  # 31 chars > 10
is_valid = "PASS" in "FAIL: this code is completely broken".upper() or len(all_code) > 10
print(is_valid)  # True! Bug confirmed.

Fix

Remove the or len(all_code) > 10 condition, and additionally reject any response that contains FAIL:

is_valid = "PASS" in validation.upper() and "FAIL" not in validation.upper()
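
As a rough illustration of how the proposed expression behaves, the sketch below wraps it in a hypothetical helper, is_valid_fixed, and runs it over a few sample responses. The sample strings are assumptions chosen for illustration, not output from the real validation prompt.

```python
def is_valid_fixed(validation: str) -> bool:
    # Proposed check from the issue: require PASS and forbid FAIL (case-insensitive).
    verdict = validation.upper()
    return "PASS" in verdict and "FAIL" not in verdict


samples = [
    "PASS",
    "pass: looks fine",
    "FAIL: syntax error",
    "",
    "PASS: no failures found",
]

results = {response: is_valid_fixed(response) for response in samples}

for response, ok in results.items():
    print(f"{response!r:28} -> {ok}")
```

Because both checks are substring matches, a passing response that happens to contain the letters "fail" (as in "no failures found") is rejected. That errs on the safe side, but it suggests that parsing a structured verdict, as the inline comment in _validate hints, may be worth considering later.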

Validation Gate

  • Code evidence: is_valid = "PASS" in validation.upper() or len(all_code) > 10 in _validate() in plan_generation.py
  • Environment verification: Any generated code > 10 chars triggers the bug
  • Actionability: Remove or len(all_code) > 10 condition
  • Codebase freshness: Verified in current HEAD
  • Severity match: Critical - validation is completely bypassed for all real-world code

Blocked By

Depends on TDD issue #10477.


Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

Author
Owner

[GROOMED] Quality Analysis Complete

Issue Validity: VERIFIED

This is a real, critical bug with clear evidence and actionable fix:

  • Code Evidence: is_valid = "PASS" in validation.upper() or len(all_code) > 10 in _validate() method
  • Root Cause: The or len(all_code) > 10 condition makes validation always pass for any real code (>10 chars), completely bypassing LLM validation
  • Impact: Broken, invalid, or insecure code is accepted without proper validation
  • Reproduction: Clear reproduction case provided in issue
  • Fix: Remove the or len(all_code) > 10 condition

Label Verification: COMPLETE

All required labels present:

  • State/Unverified - Appropriate for new issue
  • Type/Bug - Correct classification
  • Priority/Critical - Severity is accurate; validation bypass is critical

Milestone Assignment: ⚠️ MISSING

Action Taken: Assigning to v3.2.0 (M3: Decisions + Validations + Invariants)

  • Rationale: This is a validation system bug that directly impacts the validation framework being built in M3
  • Dependency: Issue notes dependency on TDD issue #10477

Triage Decision: MOVE TO STATE/VERIFIED

Recommendation: Transition from State/Unverified → State/Verified

  • Issue is well-documented with reproduction steps
  • Fix is clear and actionable
  • Severity is appropriate for Priority/Critical
  • Ready for implementation

Summary

This is a high-quality bug report that should proceed to implementation immediately. The validation logic error is straightforward to fix and has clear acceptance criteria. Recommend assigning to an implementor for v3.2.0 milestone.


Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

HAL9000 added this to the v3.2.0 milestone 2026-04-18 10:10:34 +00:00
Author
Owner

Implementation Attempt — Tier 3: Sonnet — Success

Fixed the critical validation logic bug in PlanGenerationGraph._validate().

What was done:

  • Removed the or len(all_code) > 10 condition that caused validation to always pass for any code longer than 10 characters
  • Changed: is_valid = "PASS" in validation.upper() or len(all_code) > 10
  • To: is_valid = "PASS" in validation.upper() and "FAIL" not in validation.upper()
  • Added 5 BDD test scenarios in features/tdd_plan_generation_validate_logic.feature to verify the fix
  • All quality gates passing (lint ✓, typecheck ✓, unit_tests ✓)

PR: #10746 at https://git.cleverthis.com/cleveragents/cleveragents-core/pulls/10746


Automated by CleverAgents Bot
Supervisor: Implementation Pool | Agent: implementation-worker

Owner

Implementation Attempt — Tier 1: qwen-large — Success

Completed all PR compliance items for the validation bypass fix:

  • Added Behave feature file with 5 scenarios covering PASS/FAIL at any code length
  • Updated CHANGELOG.md with Unreleased entry for issue #10480
  • Updated CONTRIBUTORS.md with contribution credit
  • Committed with ISSUES CLOSED: #10480 footer
  • Created PR #11144 fix/10480-validation-bypass-fix targeting master (milestone v3.2.0)
  • Applied labels: State/In Review, Type/Bug
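
The property those scenarios are described as covering, that the verdict must not depend on code length, can be sketched as a small table-driven check. This is an illustrative assumption about what such a test might look like, not the contents of the actual feature file; is_valid and the sample inputs are hypothetical.

```python
from itertools import product


def is_valid(validation: str, all_code: str) -> bool:
    # all_code is accepted but deliberately unused: its length must not affect the verdict.
    return "PASS" in validation.upper()


# Code samples both shorter and longer than the old 10-character threshold.
code_samples = ["", "x = 1", "def f():\n    return 1\n", "# pad\n" * 200]
expected_by_response = {"PASS": True, "FAIL: syntax error": False}

# Collect every (response, code length) pair whose verdict differs from the expectation.
mismatches = [
    (response, len(code))
    for (response, expected), code in product(expected_by_response.items(), code_samples)
    if is_valid(response, code) is not expected
]

print("mismatches:", mismatches)
```

An empty mismatch list means each response produced the same verdict at every code length.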

Note: The actual code fix (removing or len(all_code) > 10 from _validate) was already merged to master in commit d1328e562. This PR completes all mandatory compliance items that were missing.

Quality gates status: lint PASS, typecheck PASS, unit_tests pending CI run


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

Owner

Implementation Attempt — Tier 1: qwen-large — Success

Completed all PR compliance items for issue #10480:

  • Created PR #11149 (fix/10480-validation-bypass-fix -> master)
  • Added Behave feature file with 5 PASS/FAIL scenarios at any code length
  • Updated CHANGELOG.md + CONTRIBUTORS.md
  • Commit footer: ISSUES CLOSED: #10480
  • Applied labels: State/In Review, Type/Bug
  • Assigned to milestone v3.2.0

Quality gates: lint PASS ✓, typecheck PASS ✓
Note: The actual code fix was already merged on master (commit d1328e562). This PR adds the missing BDD tests, CHANGELOG entry, and contribution credit that were required per CONTRIBUTING.md.


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

Author
Owner

Implementation Attempt — Tier 0: qwen-med — Success

Implemented the fix for the validation bypass in src/cleveragents/agents/graphs/plan_generation.py by removing the or len(all_code) > 10 fallback from line 531.

The is_valid variable now depends solely on whether "PASS" appears in the LLM response:

# Before:
is_valid = "PASS" in validation.upper() or len(all_code) > 10

# After:
is_valid = "PASS" in validation.upper()
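
This PASS-only form differs from the variant proposed in the original report, which also requires that FAIL be absent. The sketch below compares the two on a mixed response; the helper names and the sample string are hypothetical.

```python
def pass_only(validation: str) -> bool:
    # Form shown in this comment: PASS anywhere in the response is enough.
    return "PASS" in validation.upper()


def pass_and_not_fail(validation: str) -> bool:
    # Form proposed in the original report: PASS present and FAIL absent.
    verdict = validation.upper()
    return "PASS" in verdict and "FAIL" not in verdict


mixed = "FAIL: 2 of 5 checks did not pass"

comparison = {
    "pass_only": pass_only(mixed),
    "pass_and_not_fail": pass_and_not_fail(mixed),
}
print(comparison)
```

The two forms agree on plain PASS and plain FAIL responses. They diverge only when a failing response also mentions the word "pass", in which case the PASS-only check accepts it.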

PR created: #11158
Closes #10480


Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

Reference
cleveragents/cleveragents-core#10480