UAT: No plugin isolation mechanism — plugin crashes propagate directly to core with only basic try/except protection #5726

Open
opened 2026-04-09 08:50:17 +00:00 by HAL9000 · 1 comment
Owner

Bug Report

Feature Area: Plugin Architecture — Plugin Isolation
Priority: Backlog (non-critical for MVP, but important for production stability)

What Was Tested

Analyzed src/cleveragents/infrastructure/plugins/manager.py for plugin isolation mechanisms (subprocess isolation, exception sandboxing, resource limits).

Expected Behavior (from spec)

The spec (§Extensibility, §Plugin Architecture Overview) states:

"The system is open for extension through well-defined interfaces, but closed for modification of core behavior."

This implies that plugins should not be able to break core functionality. A robust plugin system should provide:

  1. Exception isolation: Plugin exceptions should be caught and converted to PluginError, not allowed to propagate as arbitrary exceptions
  2. Resource isolation: Plugins should not be able to consume unbounded memory/CPU
  3. State isolation: A crashed plugin should not corrupt the PluginManager state

Actual Behavior

PluginManager.activate_plugin() has only basic exception handling:

def activate_plugin(self, name: str) -> None:
    try:
        cls = self._loader.load_class(...)
        instance = cls()  # ← If cls() raises, only PluginError wraps it
        ...
    except (PluginLoadError, ProtocolMismatchError):
        descriptor.state = PluginState.ERRORED
        raise  # ← Re-raises original exception type
    except Exception as exc:
        descriptor.state = PluginState.ERRORED
        msg = f"Failed to activate plugin '{name}': {exc}"
        raise PluginError(msg) from exc  # ← Wraps in PluginError

Issues:

  1. No execution isolation: There is no execute_plugin() method, so callers invoke plugin methods directly on the instance (self._instances[name]). Any exception from a plugin method propagates directly to the caller with no wrapping.
  2. No resource limits: Plugins can consume unbounded memory/CPU during activation or execution.
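  3. No subprocess isolation: Plugins run in the same process as core. SystemExit derives from BaseException rather than Exception, so the except Exception handler above does not catch it, and a plugin that calls sys.exit() or raises SystemExit will terminate the entire application.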
  4. No timeout: Plugin activation has no timeout, so a plugin that hangs during __init__ will hang the entire application.
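
Issues 3 and 4 can be illustrated with a minimal sketch of process-level isolation, assuming the plugin's entry point can be run in a separate interpreter. The helper name `run_isolated` and the inline source snippets are hypothetical and not part of `manager.py`; only the standard-library `subprocess` and `sys` modules are used.

```python
import subprocess
import sys
from typing import Optional, Tuple


def run_isolated(source: str, timeout: float = 5.0) -> Tuple[str, Optional[int]]:
    """Run Python source in a child interpreter and report how it ended.

    Hypothetical helper: returns ("ok" | "failed" | "timeout", exit code).
    """
    try:
        completed = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            timeout=timeout,  # the child is killed if it runs longer than this
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    status = "ok" if completed.returncode == 0 else "failed"
    return (status, completed.returncode)


if __name__ == "__main__":
    # sys.exit() ends only the child process; the parent keeps running.
    print(run_isolated("import sys; sys.exit(3)"))
    # A busy loop is cut off by the timeout instead of hanging the parent.
    print(run_isolated("while True: pass", timeout=1.0))
```

With this shape, `sys.exit()` and an infinite loop both become ordinary return values in the parent. The cost is that plugin inputs and outputs must cross a process boundary, which a real design would have to account for.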

Code Location

  • src/cleveragents/infrastructure/plugins/manager.py: activate_plugin() (lines 222–273)
  • src/cleveragents/infrastructure/plugins/manager.py: get_plugin_instance() (lines 175–185), which returns the raw instance with no isolation wrapper

Impact

A misbehaving plugin can:

  1. Raise arbitrary exceptions that propagate to core code
  2. Hang the application during activation
  3. Call sys.exit() to terminate the process
  4. Consume unbounded memory/CPU
  5. Corrupt shared state

Steps to Reproduce

  1. Create a plugin whose __init__ raises SystemExit(1)
  2. Register and activate it — the entire application terminates
  3. Create a plugin whose __init__ runs while True: pass — application hangs
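
The first two steps can be reproduced without the real PluginManager. The sketch below is a stand-in, not the project's code: `activate` only mirrors the `except Exception` handler shape quoted under Actual Behavior, and `PluginError`, `ExitingPlugin`, `FailingPlugin`, and `outcome` are hypothetical names defined locally. The hanging case from step 3 is deliberately left out.

```python
class PluginError(Exception):
    """Local stand-in for the project's PluginError (assumption)."""


class ExitingPlugin:
    """Hypothetical plugin whose constructor raises SystemExit."""

    def __init__(self) -> None:
        raise SystemExit(1)


class FailingPlugin:
    """Hypothetical plugin whose constructor raises an ordinary exception."""

    def __init__(self) -> None:
        raise ValueError("bad config")


def activate(cls: type) -> object:
    """Mirror the quoted handler shape: only Exception subclasses are caught."""
    try:
        return cls()
    except Exception as exc:
        msg = f"Failed to activate plugin '{cls.__name__}': {exc}"
        raise PluginError(msg) from exc


def outcome(cls: type) -> str:
    """Report the type of whatever escapes activate()."""
    try:
        activate(cls)
    except BaseException as exc:  # broad on purpose, to observe what escapes
        return type(exc).__name__
    return "activated"


if __name__ == "__main__":
    print(outcome(FailingPlugin))  # PluginError: the ValueError was wrapped
    print(outcome(ExitingPlugin))  # SystemExit: it bypassed `except Exception`
```

The ordinary exception is wrapped as intended, while `SystemExit` passes through untouched because it derives from `BaseException` rather than `Exception`.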

Suggested Fix

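  1. Add a timeout to activate_plugin() using concurrent.futures. The executor should not be used as a context manager here, because leaving the with block calls shutdown(wait=True), which blocks on the hung plugin and defeats the timeout:
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
future = executor.submit(cls)  # run the plugin constructor in a worker thread
try:
    instance = future.result(timeout=30.0)  # 30-second activation timeout
except concurrent.futures.TimeoutError as exc:
    raise PluginLoadError(f"Plugin '{name}' activation timed out after 30s") from exc
finally:
    executor.shutdown(wait=False)  # return immediately instead of joining the worker
     A timed-out worker thread cannot be killed and keeps running in the background, so this bounds the wait but does not stop the plugin; stopping it requires process-level isolation.
  2. Catch SystemExit and KeyboardInterrupt in activation:
except (SystemExit, KeyboardInterrupt) as exc:
    descriptor.state = PluginState.ERRORED
    raise PluginLoadError(f"Plugin '{name}' raised {type(exc).__name__} during activation") from exc
  3. Add an execute_plugin() method that wraps all plugin method calls with exception isolation.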
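
The last suggestion could look roughly like the sketch below. Everything here is an assumption made for illustration: the `execute_plugin(name, method, *args, **kwargs)` signature, the reduced `PluginManagerSketch` class, the local `PluginError` stand-in, and the `EchoPlugin` test double. Only the `_instances` mapping is taken from the report.

```python
from typing import Any, Dict


class PluginError(Exception):
    """Local stand-in for the project's PluginError (assumption)."""


class PluginManagerSketch:
    """Hypothetical, reduced manager holding only what the sketch needs."""

    def __init__(self) -> None:
        self._instances: Dict[str, Any] = {}

    def execute_plugin(self, name: str, method: str, *args: Any, **kwargs: Any) -> Any:
        """Call a plugin method and convert any failure into PluginError."""
        try:
            instance = self._instances[name]
        except KeyError:
            raise PluginError(f"Plugin '{name}' is not active") from None
        try:
            return getattr(instance, method)(*args, **kwargs)
        except KeyboardInterrupt:
            raise  # let the user interrupt the application
        except BaseException as exc:  # also covers SystemExit from sys.exit()
            msg = f"Plugin '{name}' failed in {method}(): {exc!r}"
            raise PluginError(msg) from exc


class EchoPlugin:
    """Hypothetical plugin used to exercise the wrapper."""

    def echo(self, text: str) -> str:
        return text

    def boom(self) -> None:
        raise RuntimeError("boom")

    def quit(self) -> None:
        raise SystemExit(2)


if __name__ == "__main__":
    manager = PluginManagerSketch()
    manager._instances["echo"] = EchoPlugin()
    print(manager.execute_plugin("echo", "echo", "hi"))
    try:
        manager.execute_plugin("echo", "quit")
    except PluginError as exc:
        print(f"isolated: {exc}")
```

This covers exception isolation only. A wrapper of this kind does nothing about hangs or unbounded memory/CPU use, which still need a timeout or a separate process.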

Automated by CleverAgents Bot
Supervisor: UAT Testing | Agent: uat-tester

HAL9000 added this to the v3.5.0 milestone 2026-04-09 09:05:18 +00:00
Author
Owner

Label compliance fix applied:

  • Added missing labels and/or milestone to bring issue into compliance with CONTRIBUTING.md

Automated by CleverAgents Bot
Supervisor: Backlog Grooming | Agent: backlog-groomer

Reference
cleveragents/cleveragents-core#5726