feat(safety): Connect Safety Profile to ShellSafetyService for autonomous-mode hard gates #6763

Open
opened 2026-04-10 02:03:23 +00:00 by drew · 4 comments
Member

Overview

Bridge the existing SafetyProfile (which knows the automation level) and the existing ShellSafetyService (which knows dangerous commands) so that shell danger detection escalates from advisory to blocking in headless autonomous execution modes.

Gap Being Filled

The specification explicitly states that shell danger detection is "advisory only — it never prevents command execution" (§TUI > Shell Danger Detection, ~line 30062). This is the correct design for the TUI where a human is watching.

However, in headless automation profiles such as ci or full-auto, no human is present. The sandbox makes most operations reversible, but not everything: a command that recursively deletes the sandbox root, formats a filesystem mount, or pipes output to /dev/sda destroys sandbox contents before any checkpoint can help. The advisory model plus sandbox does not fully cover this case.

The gap: The SafetyProfile (allow_unsafe_tools, automation level) and the ShellSafetyService (DangerousPatternDetector) are two systems that currently do not talk to each other. Autonomous modes are where that missing connection matters.

Proposed Design

When a plan is executed with allow_unsafe_tools: false AND the active automation profile is ci, auto, or full-auto (i.e., headless/autonomous), the ShellSafetyService should be configured in blocking mode for CRITICAL and HIGH danger-level patterns:

```python
# Current behaviour (advisory everywhere)
block_level = ShellDangerLevel.CRITICAL  # Nothing actually blocks today

# Proposed: autonomous mode escalation
if safety_profile.allow_unsafe_tools is False and profile.is_headless:
    block_level = ShellDangerLevel.HIGH  # CRITICAL + HIGH patterns hard-block
else:
    block_level = ShellDangerLevel.CRITICAL  # Advisory-only (existing TUI behaviour)
```

  • No new system is needed. ShellSafetyService already has a block_level parameter — it just needs to be set correctly based on the Safety Profile context.
  • The existing DangerousPatternDetector and pattern registry are unchanged.
  • Blocked commands in autonomous mode produce a structured denial that is fed back into the actor's context (same pattern as the existing guardrail denial feedback), allowing the actor to reason about alternatives.
  • This is not an AST parser. It uses the existing regex-based pattern detector. The sandbox model handles the vast majority of cases; this hard gate covers the small set of patterns that can destroy the sandbox itself.
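
The hard gate and the structured denial described in the bullets above could look roughly like the following. This is a minimal sketch under stated assumptions, not the actual ShellSafetyService code: the ordering of ShellDangerLevel, the three regex patterns, the ShellDenial shape, the check_command function, and the explicit enforce flag are all hypothetical. The enforce flag is used because the issue does not show how the real service distinguishes advisory from blocking behaviour when block_level is CRITICAL.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ShellDangerLevel(IntEnum):
    # Hypothetical ordering; the real enum may be defined differently.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical stand-in for the existing regex-based pattern registry.
PATTERNS = [
    (re.compile(r"\brm\s+-\w*r\w*\s+/(\s|$)"), ShellDangerLevel.CRITICAL,
     "recursive delete of the filesystem root"),
    (re.compile(r">\s*/dev/sd[a-z]"), ShellDangerLevel.CRITICAL,
     "write to a raw block device"),
    (re.compile(r"\bmkfs(\.\w+)?\b"), ShellDangerLevel.HIGH,
     "filesystem format"),
]


@dataclass(frozen=True)
class ShellDenial:
    """Structured denial that can be fed back into the actor's context."""
    command: str
    level: ShellDangerLevel
    reason: str

    def to_feedback(self) -> dict:
        return {
            "type": "shell_command_denied",
            "command": self.command,
            "danger_level": self.level.name,
            "reason": self.reason,
        }


def check_command(command: str, block_level: ShellDangerLevel,
                  enforce: bool) -> Optional[ShellDenial]:
    """Return a denial if the command must be blocked, otherwise None."""
    worst = None
    for pattern, level, reason in PATTERNS:
        # Keep the most severe matching pattern.
        if pattern.search(command) and (worst is None or level > worst[0]):
            worst = (level, reason)
    # Advisory mode, no match, or a match below the threshold: do not block.
    if worst is None or not enforce or worst[0] < block_level:
        return None
    return ShellDenial(command, worst[0], worst[1])
```

With block_level set to HIGH and enforce=True, both CRITICAL and HIGH matches return a denial; with enforce=False the same commands pass through, which mirrors the advisory behaviour in the TUI.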

Inspiration from Claude Code

Claude Code has a 5-layer hard-block system for shell commands culminating in a 4,437-line hand-rolled recursive descent bash AST parser that is fail-closed by default. This would be impractical to replicate and largely redundant given CleverAgents' sandbox model.

The specific CC pattern being adapted is much simpler: CC's Layer 5 (Dangerous Pattern Detection) — a hardcoded blocklist that can override the permission system for known-dangerous operations. The CC insight is that certain commands should be hard-blocked regardless of user configuration. CleverAgents already has this list; what's missing is connecting it to the autonomy level so it becomes a hard gate rather than a warning when no human is watching.

Recommendation

If approved, update:

  • Specification §TUI > Shell Danger Detection: clarify that advisory-only applies to the TUI; add a note that autonomous execution contexts may enforce blocking
  • Specification §Automation & Safety > Safety Profile: document that allow_unsafe_tools: false in headless profiles implies shell blocking for CRITICAL/HIGH patterns
  • ADR-041 (Safety Profile Extraction): add the ShellSafetyService integration as a safety profile enforcement point
  • Implementation: Connect ShellSafetyService block level to Safety Profile in src/cleveragents/application/services/plan_executor.py or the relevant execution entry point. Existing files: src/cleveragents/tui/shell_safety/safety_service.py, src/cleveragents/tui/shell_safety/pattern_registry.py
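
For the implementation item, the wiring at the execution entry point might be as small as one pure function that maps the Safety Profile to a gate configuration. The sketch below is illustrative only: the SafetyProfile fields, the HEADLESS_PROFILES set, ShellGateConfig, and shell_gate_for are assumed names, and the "interactive" profile name used in the comments is a placeholder for whatever the non-headless profile is actually called.

```python
from dataclasses import dataclass
from enum import IntEnum


class ShellDangerLevel(IntEnum):
    # Hypothetical ordering; the real enum may be defined differently.
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Profiles the proposal treats as headless/autonomous.
HEADLESS_PROFILES = frozenset({"ci", "auto", "full-auto"})


@dataclass(frozen=True)
class SafetyProfile:
    # Assumed shape; the real SafetyProfile may expose these differently.
    automation_profile: str
    allow_unsafe_tools: bool

    @property
    def is_headless(self) -> bool:
        return self.automation_profile in HEADLESS_PROFILES


@dataclass(frozen=True)
class ShellGateConfig:
    block_level: ShellDangerLevel
    enforce: bool  # False keeps the existing advisory-only behaviour


def shell_gate_for(profile: SafetyProfile) -> ShellGateConfig:
    """Escalate to blocking only for headless runs with unsafe tools disallowed."""
    if profile.is_headless and not profile.allow_unsafe_tools:
        return ShellGateConfig(ShellDangerLevel.HIGH, enforce=True)
    # Interactive runs, and headless runs that opt in to unsafe tools,
    # stay advisory, as in the proposal's else branch.
    return ShellGateConfig(ShellDangerLevel.CRITICAL, enforce=False)
```

Keeping the decision in a single function with no side effects makes the four combinations (headless or not, unsafe tools allowed or not) easy to cover in unit tests before touching the plan executor.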
Owner

Thank you for filing this proposal, @drew. I have reviewed it.

This issue carries the Needs Feedback label, indicating it is a proposal awaiting project owner review. I will not modify its state, priority, or milestone — those decisions belong to the project owner (@freemo).

Summary of the proposal: Bridge the existing SafetyProfile and ShellSafetyService so that shell danger detection escalates from advisory to blocking in headless autonomous execution modes (ci, auto, full-auto). No new system needed — ShellSafetyService already has a block_level parameter that just needs to be set based on the Safety Profile context.

Completeness assessment: This proposal is well-structured and detailed. It includes:

  • Clear gap analysis with the specific spec section referenced (§TUI > Shell Danger Detection)
  • Concrete proposed design with Python pseudocode
  • Inspiration from Claude Code's Layer 5 dangerous pattern detection
  • Specific files and spec sections to update if approved

What happens next: The project owner (@freemo) needs to review this proposal and either:

  1. Approve it — at which point the Needs Feedback label is removed and implementation issues are created
  2. Request changes — provide feedback in comments
  3. Decline it — close with a State/Wont Do label and explanation

I will monitor this issue and respond to any questions or comments.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Owner

Update: The architecture agent has reviewed this proposal and created PR #6884 with the corresponding specification update.

Status: PR #6884 is open and awaiting human review from @freemo. Once approved and merged, this issue will be closed and implementation issues will be created.

The spec PR covers all four proposals from @drew (#6760, #6761, #6763, #6765) in a single architectural cycle.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison

Owner

Label compliance fix applied: Added missing Priority/Backlog label. Feature proposals without a milestone default to backlog priority per CONTRIBUTING.md.


Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Owner

Verified — Feature discussion: connect Safety Profile to ShellSafetyService. MoSCoW: Could-have. Priority: Low — future enhancement.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

No milestone
No project
No assignees
2 participants
Reference
cleveragents/cleveragents-core#6763