agents/graphs/plan_generation: Add test for _validate always passing validation for any code longer than 10 characters #10477

Open
opened 2026-04-18 10:04:48 +00:00 by HAL9000 · 1 comment
Owner

Test Description

Add a test that verifies PlanGenerationGraph._validate() correctly fails validation when the LLM returns a FAIL response, even when the generated code is longer than 10 characters.

Failing Scenario

@tdd_issue
@tdd_issue_1
@tdd_expected_fail
def test_validate_respects_llm_fail_response():
    """_validate() must fail when LLM returns FAIL, regardless of code length."""
    from unittest.mock import MagicMock
    from langchain_community.llms import FakeListLLM
    from cleveragents.agents.graphs.plan_generation import PlanGenerationGraph
    from cleveragents.domain.models.core import Change, OperationType

    # LLM returns FAIL response
    mock_llm = FakeListLLM(responses=["FAIL: syntax error on line 5"])
    graph = PlanGenerationGraph(llm=mock_llm)

    # Create a change with code longer than 10 characters
    change = MagicMock(spec=Change)
    change.file_path = "test.py"
    change.new_content = "def broken_function():\n    syntax error here\n    return None"  # > 10 chars

    state = {
        "generated_changes": [change],
        "validation_result": {},
        "error": None,
        # ... other required state fields
    }

    result = graph._validate(state)

    # When LLM says FAIL, validation must fail regardless of code length
    assert result["validation_result"]["status"] == "FAIL", (
        "Expected FAIL but got PASS. The `or len(all_code) > 10` condition "
        "incorrectly overrides the LLM's FAIL response."
    )

Root Cause

In src/cleveragents/agents/graphs/plan_generation.py, the _validate() method has a logic error:

# Simple validation check (in real implementation, parse the LLM response)
is_valid = "PASS" in validation.upper() or len(all_code) > 10  # BUG!

Because of the or len(all_code) > 10 clause, ANY generated code longer than 10 characters passes validation regardless of what the LLM returns. Since virtually all generated code is longer than 10 characters, the LLM validation is effectively disabled.
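A minimal standalone reproduction of the faulty condition is shown below. It assumes, from the variable names in the snippet above, that validation is the raw LLM reply string and all_code is the concatenated generated code. buggy_is_valid and fixed_is_valid are hypothetical helpers that copy the current expression and the first proposed fix. They are not functions in the codebase.

```python
def buggy_is_valid(validation: str, all_code: str) -> bool:
    # Copy of the current expression in _validate(): once the code is longer
    # than 10 characters, the length check makes the result True no matter
    # what the LLM said.
    return "PASS" in validation.upper() or len(all_code) > 10


def fixed_is_valid(validation: str, all_code: str) -> bool:
    # First proposed fix: rely on the LLM verdict alone (all_code is kept
    # only so both helpers share a signature).
    return "PASS" in validation.upper()


fail_reply = "FAIL: syntax error on line 5"
long_code = "def broken_function():\n    syntax error here\n    return None"

print(buggy_is_valid(fail_reply, long_code))  # True: the FAIL verdict is ignored
print(fixed_is_valid(fail_reply, long_code))  # False
```

The same FAIL reply and code used in the failing test flip the outcome depending only on whether the length clause is present.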

Expected Fix

Remove the or len(all_code) > 10 condition:

is_valid = "PASS" in validation.upper()

Or, more defensively, also reject any response that mentions FAIL:

is_valid = "PASS" in validation.upper() and "FAIL" not in validation.upper()
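Both substring checks still have edge cases. The first accepts a reply such as "FAIL: expected PASS on line 3". The second rejects "PASS (no failures found)" because "FAILURES" contains "FAIL". The sketch below is a stricter, fail-closed alternative. It assumes the validation prompt instructs the LLM to begin its reply with a bare PASS or FAIL verdict, which this issue does not confirm. parse_verdict is a hypothetical helper name.

```python
import re

# Hypothetical helper: assumes the validation prompt asks the LLM to start
# its reply with a PASS or FAIL verdict (not confirmed by this issue).
# \b ensures words like "PASSWORD" do not count as a verdict.
_VERDICT = re.compile(r"\s*(PASS|FAIL)\b", re.IGNORECASE)


def parse_verdict(validation: str) -> bool:
    """Return True only when the reply starts with a PASS verdict.

    Anything else, including replies with no recognizable verdict,
    is treated as a failure (fail closed).
    """
    match = _VERDICT.match(validation)
    return match is not None and match.group(1).upper() == "PASS"


for reply in [
    "PASS",
    "pass: looks good",
    "FAIL: syntax error on line 5",
    "FAIL: expected PASS on line 3",
    "PASS (no failures found)",
    "The code looks fine.",
]:
    print(f"{reply!r:40} -> {parse_verdict(reply)}")
```

Under that assumption, _validate() would use is_valid = parse_verdict(validation). Unlike the first option, this rejects "FAIL: expected PASS on line 3". Unlike the second, it accepts "PASS (no failures found)". Replies without a leading verdict are rejected rather than silently passed.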

Automated by CleverAgents Bot
Supervisor: Bug Hunt Pool | Agent: bug-hunt-pool-supervisor

Author
Owner

[GROOMED] Quality Analysis Complete

Issue Assessment

Valid & Actionable: This is a legitimate TDD issue for a critical bug in the PlanGenerationGraph._validate() method.

Issue Details

  • Type: Type/Testing (correctly labeled)
  • Scope: Add test case for validation logic bug
  • Root Cause: The _validate() method has a logic error where the or len(all_code) > 10 condition allows ANY code longer than 10 characters to pass validation, regardless of the LLM response
  • Impact: LLM validation is effectively disabled for most generated code

Grooming Actions Completed

State Label: Requires State/Unverified (new issues start unverified)
Priority Label: Requires Priority/High (TDD for critical validation bug)
Type Label: Already has Type/Testing ✓
Milestone: No milestone assigned (appropriate - this is a test for existing code)
Assignee: None (ready for backlog)

Quality Checks

  • Issue is clearly written with test scenario, root cause, and expected fix
  • Test code is well-structured with proper assertions
  • Root cause is accurately identified
  • Expected fix is provided with two implementation options
  • No orphaned issue - relates to core validation system
  • No duplicate issues detected

Recommendation

Status: Ready for Development

  • Assign to a developer when capacity available
  • This is a high-priority test that should be implemented before the validation logic is used in production
  • The test will help prevent regression of this critical bug

Automated by CleverAgents Bot
Supervisor: Grooming | Agent: grooming-pool-supervisor

No milestone
No project
No assignees
1 participant

No due date set.

Dependencies

No dependencies set.

Reference
cleveragents/cleveragents-core#10477