test(e2e): E2E acceptance criteria for M4 (v3.3.0) — corrections, subplans, and checkpoints #814
Closed
freemo wants to merge 19 commits from test/e2e-m4-acceptance into master
pull from: test/e2e-m4-acceptance
merge into: cleveragents:master
cleveragents:bugfix/m3-migration-runner-check-same-thread
cleveragents:feature/issue-10952-fix-database-migration-runner-check-same-thread
cleveragents:fix/dependency-security-aiohttp-cves
cleveragents:test/uko-persistence-coverage
cleveragents:fix/security-b608-sql-fstring-migration-plan-phases
cleveragents:fix/cli-legacy-removal
cleveragents:feature/m39-auto-arch-23-minor-clarifications
cleveragents:bugfix/m3-langgraph-execute-state-bypass
cleveragents:feat/issue-6370-actor-context-clear
cleveragents:feat/acms-hot-storage-tier-lru-cache
cleveragents:feature/m3111-milestone-based-pr-prioritization
cleveragents:bugfix/m3-actor-run-response
cleveragents:fix/issue-7524-invariant-service-thread-safety-v2
cleveragents:pr-fix-10746
cleveragents:fix/tui-auto-generate-presets-actor-schema
cleveragents:feat/agent-card-discovery
cleveragents:feature/pr-10916-close-reactive-event-bus
cleveragents:feature/issue-1917-optimize-robot-actor-context-management-tests
cleveragents:feature/issue-10803-fix-nox-sessions-use-uv-sync-frozen
cleveragents:feature/issue-1923-missing-test-levels-core-module
cleveragents:feature/1928-add-test-coverage-for-tui-module
cleveragents:chore/ci-dockerfile-server-security-scan
cleveragents:task/ci-centralize-tool-versions
cleveragents:feature/m9-langgraph-platform
cleveragents:bugfix/m5-validation-attach-output-format
cleveragents:test/ci-execution-time-optimize-benchmark-regression
cleveragents:feature/issue-3105-add-mandatory-labels-to-supervisor-tracking-issue-creation
cleveragents:feat/acms-context-policy-configuration-schema
cleveragents:feat/context-sliding-window-strategy
cleveragents:feature/issue-5163-align-checkpoint-trigger-names
cleveragents:feature/issue-4221-docs-add-showcase-example-for-audit-log-and-security-commands
cleveragents:bugfix/m3-output-plan-results
cleveragents:fix/action-archive-output-panels
cleveragents:pr/9912-fix
cleveragents:fix/concurrency-catalog-cache-lock-7590
cleveragents:bugfix/executor-error-details-overwrite-mini-max
cleveragents:fix-10866-permissions-screen
cleveragents:feature/issue-7957-bug-hunt-pool-supervisor-tracking-prefix
cleveragents:fix-pr-10852
cleveragents:fix/10922-conversation-state-mgmt
cleveragents:pr-check
cleveragents:bugfix/10931-preserve-strategy-decisions-json
cleveragents:fix/10903-nox-showcase-docs
cleveragents:pr/10885-pyyaml-upgrade
cleveragents:pr-fix-10931
cleveragents:bugfix/executor-error-details-overwrite-qwen
cleveragents:fix-orchestrator-scaling-32-workers
cleveragents:fix-pr-1107-asgi-uvicorn
cleveragents:feature/m9-timeline-day-99
cleveragents:feat/issue-6369-actor-context-show
cleveragents:improvement/agent-label-compliance
cleveragents:fix-9912-branch
cleveragents:bugfix/10821-fix-tui-keybinding
cleveragents:feat/issue-6450-tui-escape-cascade
cleveragents:bugfix/m8-shell-safety-service-integration
cleveragents:fix/redaction-pattern-exception-handling
cleveragents:bugfix/m8-tui-on-input-changed
cleveragents:fix/action-schema-env-var-exfiltration
cleveragents:feature/spec-timeline-6003
cleveragents:feature/spec-timeline-6008
cleveragents:feature/issue-4746-update-spec-agents-diagnostics-all-9-providers
cleveragents:feat/v3.6.0/gemini-provider
cleveragents:pr/8194
cleveragents:tdd/prompt-input-textarea
cleveragents:feat/v3.6.0/cost-reporting-cli
cleveragents:fix/lsp-transport-security
cleveragents:feat/v3.6.0/semantic-context-strategy
cleveragents:feature/issue-10820-chore-agents-fix-bug-hunt-pool-supervisor-tracking-prefix-auto-bug-pool-to-auto-bug-sup-complete-fix
cleveragents:tdd/mN-registry-thread-safety
cleveragents:fix/v360/remove-acp-module
cleveragents:temp-squash
cleveragents:fix/v360/lsp-runtime-instantiation
cleveragents:feat/690-jsonrpc-routing
cleveragents:feat/v3.6.0-anthropic-gemini-backends
cleveragents:build/agents-system-rewrite
cleveragents:feat/v3.3.0-plan-rollback-cli
cleveragents:feat/v3.3.0-parallel-subplan-scheduler
cleveragents:feature/issue-10846-optimize-benchmark-regression-test-suite
cleveragents:feature/issue-10826-docs-spec-align-checkpoint-trigger-names-and-config-key-path-with-implementation
cleveragents:feature/issue-10744-fix-tui-convert-permissionsscreen-from-static-widget-to-proper-textual-screen-subclass
cleveragents:feature/issue-10794-feat-a2a-implement-a2a-http-transport-for-server-mode
cleveragents:fix/tui-preset-cycling
cleveragents:pr-10820
cleveragents:feature/696-implement-a2a-http-transport-for-server-mode
cleveragents:feature/issue-10792-feat-server-langgraph-platform-remotegraph-integration
cleveragents:feature/issue-1486-fix-v3-7-0-resourcehandler-return-type-1444
cleveragents:feature/issue-1488-fix-v3-7-0-resolve-issue-1432
cleveragents:bugfix/m1-plan-execute-sandbox-root
cleveragents:feature/issue-4663-day-97-schedule-adherence-update
cleveragents:feature/issue-10858-devops-run-linter
cleveragents:docs/milestone-v3.6.0-v3.7.0
cleveragents:feature/issue-10835-add-milestone-based-pr-prioritization
cleveragents:pr-8701-head
cleveragents:fix/7927-apply-phase-dod-gating
cleveragents:fix/sse-formatter-json-rpc-2.0
cleveragents:feat/v3.6.0/scope-chain-assembler-integration
cleveragents:fix/tui-bindings-block-cursor-navigation
cleveragents:fix/v360/compute-actor-impact-exceptions
cleveragents:feat/v360/openrouter-provider
cleveragents:docs/v360/cli-version-info-diagnostics
cleveragents:feat/context-semantic-chunking-strategy
cleveragents:feat/acms-cli-context-show-clear
cleveragents:feature/m7-actor-management-showcase-metadata
cleveragents:feature/m6-4213-resource-skill-showcase
cleveragents:feat/v360/anthropic-gemini-backends
cleveragents:feat/v3.6.0/safety-profile-enforcement
cleveragents:feat/context-dynamic-budget-allocation
cleveragents:refactor/v360/unify-error-handling-cli
cleveragents:fix/v370/tui-materializer-a2a
cleveragents:fix/auto-debug-agent-prompt-injection
cleveragents:refactor/v360/unify-api-naming
cleveragents:test/cli-docstring-example-validation
cleveragents:fix/v360/resource-kind-field
cleveragents:feat/v3.6.0/context-relevance-scoring
cleveragents:fix/v360/plugin-state-executing
cleveragents:fix/v360/lsp-path-traversal-file-reading
cleveragents:feat/acms-semantic-chunking-context-strategy
cleveragents:refactor/v360/unify-service-initialization
cleveragents:bugfix/m3.6.0-lsp-server-dos-message-read-timeout
cleveragents:feat/v360/pluggable-scope-chain-api-v2
cleveragents:docs/v360/actor-management-showcase
cleveragents:docs/v360/actor-removal-impact
cleveragents:docs/v360/align-depth-reduction-devcontainer
cleveragents:tdd/issue-10413-dollar-prefix-shell-mode
cleveragents:fix/issue-10503-session-export-json-stdout
cleveragents:fix/pr-10755
cleveragents:feat/v370/tui-web-mode
cleveragents:feat/v360/plugin-cli-discovery
cleveragents:fix/v360/llm-trace-latency-type
cleveragents:feat/v3.6.0/ollama-mistral-providers
cleveragents:feat/v3.6.0/adaptive-context-selector
cleveragents:feat/tui-v370/persona-registry-merge-v2
cleveragents:feat/v3.6.0/cost-tracker
cleveragents:fix/v360/resource-type-cycle-detection
cleveragents:refactor/auto-guard-1-address-todo-fixme-comments
cleveragents:feat/v3.6.0/pluggable-scope-chain
cleveragents:fix/v360/scope-chain-resolver-registration
cleveragents:test/v360/e2e-a2a-context-management
cleveragents:fix/v360/lsp-env-var-injection
cleveragents:feature/m6-sandbox-correction-invariant-docs
cleveragents:feature/m3-timeline-day97-update
cleveragents:fix/10480-validate-logic-error
cleveragents:feat/acms-cli-context-add
cleveragents:feat/acms-core-pipeline-components
cleveragents:feature/m4652-module-guides
cleveragents:feature/m5-extend-agents-diagnostics-example
cleveragents:feature/m5832-add-unreleased-changelog-entries
cleveragents:docs/add-repo-indexing-showcase
cleveragents:improvement/agent-pr-self-reviewer-blocking-vs-nonblocking
cleveragents:feature/issue-8225-validation-gate-empty-summary
cleveragents:spec/resource-type-yaml-format-canonical-5622
cleveragents:bugfix/m8179-fix-data-integrity-remove-session-rollback-calls-from-projectrepository
cleveragents:feat/v3.6.0/context-policy-strategy-config
cleveragents:test/v3.6.0/a2a-rename-regression-tests
cleveragents:fix/plan-lifecycle-root-decision-type
cleveragents:bugfix/cancel-worktree-cleanup
cleveragents:pr-10586
cleveragents:pr-9215
cleveragents:feat/issue-6357-tui-loading-states
cleveragents:temp-bug2-combined
cleveragents:timeline/day-105-2026-04-15-auto-time-1-v2
cleveragents:docs/consolidated-all-documentation
cleveragents:bugfix/m6-sandbox-reexecute-cleanup
cleveragents:fix/issue-9963-memory-service-timestamp-guards
cleveragents:docs/context-management-deep-dive-v2
cleveragents:docs/context-management-deep-dive
cleveragents:docs/agent-development-guide
cleveragents:feature/10008-file-level-correction-diff
cleveragents:feat/acms-scope-resolution-context-inheritance
cleveragents:docs/a2a-protocol-guide
cleveragents:fix/tui-bindings-reload-settings
cleveragents:docs/tui-user-guide-keybindings
cleveragents:fix/plan-generation-validate-logic
cleveragents:bugfix/issue-10408-dollar-prefix-shell-mode
cleveragents:test/issue-10500-persona-state-reset-tdd
cleveragents:docs/getting-started-tutorial
cleveragents:test/tdd-session-create-suppress-exception
cleveragents:fix/issue-10485-fallback-selector-budget-limits
cleveragents:docs/error-codes-guide
cleveragents:docs/common-tasks-recipes-guide
cleveragents:bugfix/mN-registry-thread-safety
cleveragents:test/migration-runner-sqlite-threading
cleveragents:docs/configuration-reference
cleveragents:pr-10678
cleveragents:pr-10681
cleveragents:test/issue-10510-mcptooladapter-rlock-tdd
cleveragents:feature/tui-screens-directory
cleveragents:fix/issue-10511-suppress-runtimeerror
cleveragents:pr-10676
cleveragents:fix/tui-block-cursor-bindings
cleveragents:pr-10680
cleveragents:test/issue-10502-session-export-json-tdd
cleveragents:fix/issue-10507-sqlite-check-same-thread
cleveragents:docs/installation-setup
cleveragents:test/v3.6.0/scope-chain-integration-tests
cleveragents:fix/v370/loading-throbber-restore
cleveragents:feat/v370/tui-settings-sessions-screens
cleveragents:fix/v370/tui-session-persistence
cleveragents:fix/v360/context-strategy-unification
cleveragents:fix/v370/shell-safety-regex
cleveragents:feat/v370/tui-rebase-merge
cleveragents:feat/v370/tui-complete-squashed
cleveragents:fix/v370/tui-shell-async
cleveragents:feat/v3.6.0/budget-enforcement
cleveragents:refactor/v360/decouple-cli-services
cleveragents:feat/v370/tui-session-persistence
cleveragents:auto-arch-1-spec-module-definitions
cleveragents:docs/v3.6.0-v3.7.0-updates
cleveragents:auto-time/timeline-update-2026-04-18-c3
cleveragents:auto-docs-2/add-changelog-contributing
cleveragents:auto-time/timeline-update-2026-04-18-c2
cleveragents:auto-docs-1/fix-mkdocs-nav-and-links
cleveragents:pr-5968
cleveragents:docs/timeline-day-107-2026-04-17
cleveragents:fix/issue-6323-project-context-show-output
cleveragents:improvement/agent-bug-hunt-pool-supervisor-tracking-prefix
cleveragents:auto-time/update-2026-04-17
cleveragents:docs/auto-docs-8-a2a-rename-documentation
cleveragents:auto-docs-3-v340-v350
cleveragents:docs/timeline-update-2026-04-15
cleveragents:auto-docs/initial-documentation-assessment
cleveragents:feature/m1-initial-documentation
cleveragents:fix/agent-task-list-memory-leak
cleveragents:bugfix/m4-plan-diff-correction-stub
cleveragents:pr-9247
cleveragents:docs/timeline-update-2026-04-17
cleveragents:timeline/day-106-2026-04-17-auto-time-1
cleveragents:fix/quality-gates-click82-compat
cleveragents:auto-arch-14/spec-anonymous-tool-enforcement
cleveragents:fix/issue-6441-session-create-json-output
cleveragents:fix/issue-6331-invariant-add-scope
cleveragents:timeline/day-106-2026-04-16-auto-time-1-v2
cleveragents:spec/auto-arch-23-minor-clarifications
cleveragents:timeline/day-106-2026-04-16-auto-time-2
cleveragents:docs/auto-docs-2-v380-v390
cleveragents:timeline/day-104-2026-04-14-auto-time-1
cleveragents:bugfix/m3-actor-add-v3-schema-validation
cleveragents:timeline/day-106-2026-04-16-auto-time-1
cleveragents:auto-docs/changelog-architecture-readme
cleveragents:spec/auto-arch-21-v350-autonomy-hardening
cleveragents:chore/timeline-day-105-2026-04-15
cleveragents:docs/timeline-update-2026-04-15-auto-time-1
cleveragents:timeline/day-105-2026-04-15-auto-time-1
cleveragents:benchmark-ci
cleveragents:fix/plan-phase-migration-raw-sql-root-plan-id
cleveragents:auto-arch-12/spec-acms-context-tier-hydrator
cleveragents:timeline/day-106-2026-04-15-auto-time-1
cleveragents:feat/invariant-enforcement-strategize
cleveragents:feat/plan-tree-decision-rendering
cleveragents:feat/plan-correct-revert-append-modes
cleveragents:docs/auto-docs-4-fix-conflicts
cleveragents:docs/auto-docs-1-milestone-docs-v3.0.0-v3.1.0
cleveragents:feat/v3.4.0-acms-lifecycle-policy
cleveragents:pr-9220
cleveragents:fix/a2a-facade-optional-param-validation
cleveragents:feat/ci-guard-llm-secrets
cleveragents:pr-9214
cleveragents:feat/v3.3.0-subplan-status-tracking
cleveragents:feat/v3.3.0-merge-conflict-detection
cleveragents:uat/checkpoint-rollback-merge-tests
cleveragents:fix/pr-review-pool-supervisor-prefix-mismatch
cleveragents:feat/v3.3.0-spawn-subplan-step
cleveragents:auto-time-1-day103-cycle1-session6
cleveragents:feat/v3.8.0-agent-card-endpoint
cleveragents:docs/auto-docs-cycle-24-showcase-nav
cleveragents:auto-inf-3-consolidate-behave-fixtures
cleveragents:fix/issue-7663-docs-writer-missing
cleveragents:auto-time-1-day103-cycle2
cleveragents:docs/timeline-day-104-auto-time-1
cleveragents:auto-arch-16/spec-xml-prompt-injection-mitigation
cleveragents:bugfix/m4-invariant-persistence
cleveragents:uat-a2a-facade-tests-v350
cleveragents:bugfix/m3-behave-parallel-failed-chunk-logs
cleveragents:bugfix/7664-automation-tracking-label-requirements
cleveragents:docs/auto-time-1-timeline-update-2026-04-14
cleveragents:docs/auto-docs-1-milestone-v3-updates
cleveragents:fix/issue-6344-plan-execute-rich-output
cleveragents:docs/action-config-schema-api
cleveragents:fix/bug-hunt-supervisor-nonexistent-file-preflight
cleveragents:fix/retry-policy-model-missing-fields
cleveragents:docs/validation-gate-empty-run-guard
cleveragents:auto-arch-15/spec-retry-policy-canonical-fields
cleveragents:docs/lockservice-advisory-locking
cleveragents:docs/changelog-plan-fix-4197
cleveragents:spec/milestone-plan-section
cleveragents:docs/update-changelog-recent-features
cleveragents:fix/test-infra-remove-redundant-python-variable-robot-files
cleveragents:timeline/day-104-2026-04-14-cycle2
cleveragents:fix/bdd-feature-file-tags
cleveragents:auto-arch-13/spec-default-automation-profile
cleveragents:docs/auto-docs-cycle-1-2026-04-12
cleveragents:docs/cycle-1-git-worktree-sandbox
cleveragents:spec/architecture-critical-gap-fixes
cleveragents:docs/timeline-day-104-auto-time-2
cleveragents:auto-arch-1/add-v380-v390-milestone-plan
cleveragents:docs/developer-setup-guide
cleveragents:fix/auto-profile-spec-prose-description
cleveragents:auto-arch-10/spec-tui-a2a-integration-layer
cleveragents:spec/resource-event-types-clarification
cleveragents:auto-docs-4/changelog-and-observability
cleveragents:auto-arch-4/adr-049-layered-boundary-enforcement
cleveragents:docs/a2a-protocol-autonomy-hardening
cleveragents:auto-arch-9/spec-v3.8.0-milestone-plan
cleveragents:docs/auto-docs-3-reference-index
cleveragents:auto-arch-7/spec-apply-git-worktree
cleveragents:docs/timeline-day104-cycle1-auto-time-4
cleveragents:docs/auto-docs-cycle-1-changelog-updates
cleveragents:auto-arch-6/adr-049-spec-restructuring
cleveragents:docs/auto-docs-1-v340-acms-context-management
cleveragents:docs/auto-docs-1-v320-v330-cli-reference
cleveragents:auto-arch-5/v3.9.0-milestone-plan
cleveragents:test/create-scripts
cleveragents:auto-time-1-day104
cleveragents:timeline/day-104-2026-04-14
cleveragents:docs/auto-time-4-day103-cycle5
cleveragents:auto-time-3-day103-cycle4
cleveragents:auto-docs-5-architecture-overview
cleveragents:spec/three-way-merge-strategy-v3.3.0
cleveragents:spec/checkpoint-system-v3.3.0
cleveragents:auto-docs-4-api-docs-update
cleveragents:auto-docs-1-changelog-expansion
cleveragents:spec/invariant-management-system-v3.2.0
cleveragents:pr-8289
cleveragents:spec/plan-correction-engine-v3.2.0
cleveragents:spec/layered-architecture-boundary-policy
cleveragents:spec/tui-materializer-a2a-integration-v3.7.0
cleveragents:spec/decision-recording-system-v3.2.0
cleveragents:docs/auto-docs-1-milestone-overview
cleveragents:pr-7484
cleveragents:pr-4212
cleveragents:auto-arch-3/v3.8.0-milestone-plan
cleveragents:auto-docs-6/troubleshooting-and-config
cleveragents:auto-time-1-day103-session5
cleveragents:auto-docs-5/contributor-guide-and-readme
cleveragents:docs/plan-tree-ulid-examples
cleveragents:docs/m3-spec-clarify-path-datetime-plugin-contracts
cleveragents:docs/auto-docs-cycle-10-diagnostics-ref
cleveragents:auto-docs-3/user-guide-and-architecture
cleveragents:docs/cycle-7-changelog-update
cleveragents:spec/reconciliation-failure-behavior
cleveragents:auto-docs-2/api-documentation
cleveragents:auto-arch-2/adr-053-repositories-decomposition
cleveragents:auto-docs-1/release-notes-v3.0-v3.1
cleveragents:spec/update-validation-attach-project-delete
cleveragents:spec/architecture-cycle2-impl-clarifications
cleveragents:auto-arch-1/adr-049-052-violations
cleveragents:auto-time-1-day103
cleveragents:docs/auto-docs-cycle-13-updates
cleveragents:docs/timeline-day-102-auto-time
cleveragents:timeline/day-103-2026-04-13
cleveragents:spec/arch-invariant-cli-completeness
cleveragents:spec/update-cycle1-validation-attach-project-delete
cleveragents:docs/add-session-management-showcase
cleveragents:spec/arch-sandbox-path-correction-cycle9
cleveragents:spec/architecture-v380-milestone-plan
cleveragents:docs/auto-docs-cycle-12-updates
cleveragents:docs/cycle-1-validation-gate-fix
cleveragents:docs/2026-04-08-unreleased-changelog
cleveragents:docs/auto-docs-cycle-2-2026-04-10
cleveragents:docs/session-4615-2026-04-08-cycle1
cleveragents:feat/issue-6361-shell-safety-service-tui
cleveragents:spec/architecture-cycle-25-new-features
cleveragents:fix/issue-6345-automation-profile-add-output
cleveragents:docs/timeline-day-102-2026-04-12
cleveragents:docs/cycle-2-git-worktree-acms-hydrator
cleveragents:spec/arch-sandbox-cleanup-discovery
cleveragents:docs/timeline-day96-2026-04-08
cleveragents:docs/auto-docs-cycle-11
cleveragents:spec/fix-sandbox-strategy-protocol-name
cleveragents:spec/arch-acms-tier-hydration
cleveragents:fix/v3.4.0/context-settings-defaults
cleveragents:docs/add-example-repl-and-actor-run
cleveragents:docs/auto-docs-cycle-10-updates
cleveragents:docs/session-4-2026-04-08-updates
cleveragents:docs/showcase-all-examples-consolidated
cleveragents:docs/timeline-day-97
cleveragents:docs/acms-context-hydrator-cycle2
cleveragents:docs/add-example-output-format-flags
cleveragents:spec/arch-failfast-cancel-semantics
cleveragents:timeline/day-101-2026-04-11
cleveragents:docs/timeline-day99-2026-04-09-v2
cleveragents:docs/auto-docs-cycle-2-worktree-acms
cleveragents:spec/architecture-v3.8.0-milestone-plan
cleveragents:docs/api-lsp-acms-reference
cleveragents:improvement/agent-bug-hunt-pool-supervisor-yaml-syntax-fix
cleveragents:spec/project-delete-deleted-at-field
cleveragents:spec/architecture-provider-registry-tui-materializer
cleveragents:spec/document-reconciliation-blocked-error-5942
cleveragents:fix/issue-7482-git-log-injection
cleveragents:spec/devcontainer-auto-discovery-schema
cleveragents:feat/issue-6350-conversation-content-pruning
cleveragents:docs/update-module-guides-2026-04-10
cleveragents:timeline/day-100-2026-04-10-auto-time-cycle1
cleveragents:timeline/day-99-2026-04-09-auto-time-v2
cleveragents:docs/cycle-3-module-guides
cleveragents:timeline/day-99-2026-04-09-auto-time
cleveragents:pr-4226
cleveragents:spec/additional-llm-providers-gemini-groq-cohere-together-ollama-mistral
cleveragents:spec/document-context-tier-hydrator-6175
cleveragents:docs/timeline-day99-2026-04-09
cleveragents:spec/invariant-cli-clarifications
cleveragents:docs/add-example-project-init-and-context-management
cleveragents:spec/reconciliation-blocked-error-documentation
cleveragents:spec/fix-invariant-precedence-reference-5861
cleveragents:spec/fix-plan-correct-accepts-plan-id-5558
cleveragents:spec/fix-validation-attach-synopsis-5328
cleveragents:docs/timeline-day-99-cycle-1
cleveragents:docs/timeline-day-99-cycle-2
cleveragents:fix/actor-context-list-regex-arg
cleveragents:docs/timeline-day-99-cycle-3
cleveragents:spec/arch-security-mode-init
cleveragents:docs/auto-docs-cycle-9-updates
cleveragents:fix-resource-fix-resource-remove-to-check-correct-edge-table
cleveragents:feat/issue-6434-tui-env-var-expansion
cleveragents:fix/issue-6321-plan-prompt-timing-field
cleveragents:fix/issue-6322-resource-add-url-flag
cleveragents:feat/issue-6348-sessions-screen
cleveragents:spec/plan-show-command
cleveragents:temp
cleveragents:feat/harden-label-restrictions-1775753628
cleveragents:spec/invariant-reconciliation-failure-behavior
cleveragents:spec/add-reconciliation-failure-behavior-5942
cleveragents:spec/architecture-corrections-cycle3
cleveragents:spec/checkpoint-trigger-names-and-config-key-fix
cleveragents:spec/fix-ai-provider-interface-5801
cleveragents:spec/azure-api-version-default-update
cleveragents:docs/auto-docs-writer-cycle1-labels
cleveragents:spec/fix-resource-type-yaml-format-5622
cleveragents:spec/add-plan-revert-resume-commands-5574
cleveragents:docs/auto-docs-cycle-1-2026-04-09
cleveragents:spec/plan-correct-plan-id-or-decision-id-5558
cleveragents:spec/fix-subgraph-node-actor-ref-field-5427
cleveragents:issue/5284-master-ci-fix
cleveragents:timeline/day-99-2026-04-09-v2
cleveragents:merge-me
cleveragents:docs/session-3377-initial-docs-update
cleveragents:fix/llm-provider-subpackage-exports
cleveragents:spec/arce-acronym-and-tui-keybinding-fixes
cleveragents:spec/architecture-corrections-cycle2
cleveragents:spec/architecture-corrections-cycle1
cleveragents:docs/cycle-1-updates
cleveragents:spec/tui-clarifications-session-export-persona
cleveragents:docs/session-4940-2026-04-08-cycle1
cleveragents:spec/architecture-milestone-plan-v3.2-v3.7
cleveragents:docs/session-4743-2026-04-08-cycle1
cleveragents:docs/timeline-day-98
cleveragents:fix/plan-lifecycle-service-rollback-method
cleveragents:docs/timeline-day98-2026-04-08-v2
cleveragents:docs/add-example-action-and-plan-management
cleveragents:docs/session-2026-04-06-updates
cleveragents:docs/ca-docs-writer-v3.8.1-2026-04-05
cleveragents:fix/session-tell-stub-missing-panels-and-actor-execution
cleveragents:improvement/agent-arch-guard-clone-failure-handling
cleveragents:improvement/agent-test-infra-health-spam-fix-v2
cleveragents:fix-tdd-invert-non-assertion-exceptions
cleveragents:improvement/agent-arch-guard-clone-failure
cleveragents:bugfix/3472-fix-tdd-inversion-logic
cleveragents:bugfix/989-fix-persistence-json-decode-error
cleveragents:improvement/agent-supervisor-tracking-labels-v2
cleveragents:docs/timeline-day95-v2
cleveragents:docs/timeline-day95-final
cleveragents:docs/update-lsp-api-and-changelog
cleveragents:fix/lsp-resource-handler-module-missing
cleveragents:docs/timeline-day95-final-2026-04-05
cleveragents:fix/a2a-plan-correct-rollback-wiring
cleveragents:docs/add-lsp-api-and-changelog-2026-04-05
cleveragents:fix/tool-registry-validation-type-discriminator
cleveragents:docs/v3.7.0-documentation-update
cleveragents:docs/ca-docs-writer-2026-04-05-cycle2
cleveragents:fix/invariant-set-merge-action-scope
cleveragents:docs/unreleased-feature-docs
cleveragents:fix/concurrency-cost-tracker-record-usage-race-condition
cleveragents:improvement/agent-ca-test-infra-improver-failure-handling
cleveragents:docs/update-changelog-mcp-plan-ci-2026-04-05
cleveragents:improvement/agent-pr-reviewer-milestone-prioritization
cleveragents:docs/timeline-day95-refresh-2026-04-05
cleveragents:improvement/agent-mandatory-labels-tracking-issues
cleveragents:docs/api-domain-providers-changelog-2026-04-05
cleveragents:docs/ca-docs-writer-2026-04-05
cleveragents:docs/timeline-day95-refresh
cleveragents:fix/skill-add-include-validation
cleveragents:docs/timeline-day-95-2026-04-05-update3
cleveragents:docs/timeline-day-95-2026-04-05-update2
cleveragents:docs/ci-incident-runbook-2597
cleveragents:improvement/agent-ca-test-infra-improver-worker-api-mode
cleveragents:docs/shell-safety-api-and-readme-highlights
cleveragents:docs/timeline-day-55-2026-04-04-v2
cleveragents:docs/timeline-day-55-2026-04-04
cleveragents:docs/timeline-day54-update3
cleveragents:improvement/agent-ca-test-infra-improver-fixes
cleveragents:spec/restructure-monolithic-to-split
cleveragents:docs/timeline-day54-update-v2
cleveragents:docs/timeline-day54-update
cleveragents:fix-agents
cleveragents:docs/shell-safety-and-domain-base-model
cleveragents:fix/1452-impl
cleveragents:fix/1473-plan-cancel
cleveragents:fix/1425-test
cleveragents:fix/1426-config
cleveragents:fix/1421-perf
cleveragents:fix/1424-impl
cleveragents:test/int-wf16-devcontainer
cleveragents:feature/m8-tui-persona-export
cleveragents:feature/m7-post-resource-equivalence
cleveragents:feature/m6-tantivy-backend
cleveragents:feature/m6-estimation
cleveragents:feature/m6-estimation-report-model
cleveragents:feature/observability-prometheus-audit
cleveragents:feat/server-auth-namespace
cleveragents:feature/m8-session-editing
cleveragents:feature/llm-actor-subplan-wiring
cleveragents:feature/m8-tui-first-run-actor-selection
cleveragents:feature/m8-tui-conversation-block-catalog
cleveragents:feature/m8-tui-settings-screen
cleveragents:feature/m7-e2e-porting
cleveragents:feature/m6-estimation-historical-stats
cleveragents:feature/m8-tui-persona-export-import
cleveragents:feature/m8-tui-sessions-screen
cleveragents:feature/m7-graph-backend
cleveragents:feature/m8-tui-block-context-menu
cleveragents:feature/m8-tui-tool-call-expand
cleveragents:feature/m4-missing-builtin-tools
cleveragents:docs/v3.7.0-release-docs
cleveragents:feature/m8-tui-session-export
cleveragents:test/e2e-wf15-disaster-recovery
cleveragents:test/e2e-wf03-refactoring
cleveragents:test/e2e-m3-acceptance
cleveragents:feature/m8-tui-prompt-history
cleveragents:feature/m8-tui-actor-thought-block-rendering
cleveragents:bugfix/m6-build-hierarchy-child-ids
cleveragents:feature/resource-inheritance-wiring
cleveragents:test/e2e-wf09-session
cleveragents:test/e2e-wf06-doc-generation
cleveragents:test/e2e-wf08-cloud-infra
cleveragents:test/e2e-wf02-test-generation
cleveragents:test/e2e-wf13-custom-profile
cleveragents:test/e2e-wf11-graph-actor
cleveragents:test/e2e-wf01-hello-world
cleveragents:test/int-wf17-explicit-container
cleveragents:test/int-wf12-hierarchical
cleveragents:test/int-wf15-disaster-recovery
cleveragents:test/int-wf13-custom-profile
cleveragents:test/int-wf03-refactoring
cleveragents:test/int-wf11-graph-actor
cleveragents:test/int-wf10-batch
cleveragents:test/int-wf09-session
cleveragents:feature/m3-tdd-issue-consistency-gate
cleveragents:feature/m3-invariant-enforcement-strategize
cleveragents:test/int-wf18-container-clone
cleveragents:test/int-wf01-hello-world
cleveragents:feature/m6-diagnostic-dashboard-health-categories
cleveragents:feature/m6-cli-polish
cleveragents:fix/e2e-db-isolation
cleveragents:feature/m7-post-tui
cleveragents:feature/m9-asgi-endpoint
cleveragents:feature/m7-post-server
cleveragents:tdd/m7-audit-session-race
cleveragents:tdd/m3-skill-add-regression
cleveragents:feature/m9-remote-repos
cleveragents:feature/fs-mount-file-types
cleveragents:tdd/container-resolve-crash
cleveragents:test/e2e-m1-acceptance
cleveragents:test/e2e-m2-acceptance
cleveragents:eugen.thaci-patch-3
cleveragents:eugen.thaci-patch-2
cleveragents:eugen.thaci-patch-1
cleveragents:aditya-fix-latest
cleveragents:feature/m4-secret-masking-llm-context
cleveragents:aditya-fix
cleveragents:refactor/m3-replace-mktemp
cleveragents:refactor/m3-remove-unittest-mock-integration
cleveragents:refactor/m3-remove-robot-mock-imports
cleveragents:refactor/m3-remove-mock-llm-integration
Summary
E2E acceptance test for M4 (v3.3.0) — corrections, subplans, and checkpoints. Tests both revert and append correction modes on the plan lifecycle.
Closes #744
ISSUES CLOSED: #744
Manual Verification
Prerequisites
- OPENAI_API_KEY or GEMINI_API_KEY environment variable set
Commands
What to Look For
- plan correct --mode revert completes without error
- plan correct --mode append completes without error
- No Traceback in any command's stderr
PM Review — Day 34
Status: Mergeable, 0 reviews, M4 (v3.3.0)
Closes: #744 | Author: @freemo
E2E acceptance test for M4: subplan spawning, correction flows (revert/append), checkpoints, decision tree, plan lifecycle. Single clean commit. PR body includes thorough manual verification steps.
Structurally sound. Requires real LLM API keys at runtime (not hardcoded). Depends on PR #710 merging first for CI execution.
Action Items
PM Status — Day 36 (2026-03-16)
Day 34 review assignment deadline check. This PR has 0 reviewer activity after 2 days.
Priority note: M3 PRs take precedence. Reviewers should complete M3 reviews first, then address M4+ PRs in milestone order.
Assigned reviewer: Please acknowledge and provide an ETA for your review, or flag if reassignment is needed.
@hamza.khyari I am going to have you take over this PR. It is mostly completed but is waiting on #628 and #966; one is yours and one is Brent's. Please be sure to get this PR and the two blocking PRs I listed in ASAP, thanks.
- Add init step with --path ${SUITE_HOME} for DB creation (#1023)
- Pass --plan explicitly to plan correct for isolated env (#1025)
- Fix terminal state assertion: lifecycle-apply produces apply/queued not applied
- Tag test with tdd_bug_1023, tdd_bug_1024, tdd_bug_1025 for infra bug tracking
ISSUES CLOSED: #744
PR #814 — M4 E2E Acceptance Review
Summary
Test-only PR adding m4_acceptance.robot (248 lines). Scope is clean — only new robot file + CHANGELOG. No production code modified. Does not touch common_e2e.resource or other test files.
P1 — Missing Skip If No LLM Keys guard
File: robot/e2e/m4_acceptance.robot
Same issue as PR #799. The test uses real LLM calls (openai/gpt-4) but never calls Skip If No LLM Keys. CI without API keys will hard-FAIL. M6 calls this on every LLM-dependent test.
Fix: Add Skip If No LLM Keys as the first keyword call in the test case body.
P1 — Conditional checkpoint rollback is a silent skip path
File: robot/e2e/m4_acceptance.robot (step 11)
The M4 milestone AC explicitly states: "Checkpoint creation and rollback (plan rollback) functional." However, step 11 silently skips rollback if no checkpoint was created:
This means the test can pass without ever exercising checkpoint rollback — a core M4 AC. The WARN log is buried in Robot output and won't cause CI to flag it.
Suggestion: Either (a) assert a checkpoint exists after execution (Should Be True ${has_checkpoint} > 0) to enforce the AC, or (b) if checkpoints are genuinely optional at this stage, add a dedicated test case for checkpoint rollback tagged tdd_expected_fail so the gap is tracked.
P1 — Subplan spawning is not verified
File: robot/e2e/m4_acceptance.robot (step 6)
The M4 AC says: "Plans spawn child subplans during execution" and "Subplan status tracking works." The test tree inspection is purely diagnostic:
There is no assertion at all — the count is only logged. The test can pass with zero subplans, which violates the milestone's core acceptance criteria.
Suggestion: Either assert Should Be True ${subplan_count} > 0 or, if the action definition needs to be crafted to guarantee subplan spawning, modify the action YAML to explicitly require decomposition.
P2 — Does not use shared Extract Plan Id from common_e2e.resource
File: robot/e2e/m4_acceptance.robot (lines 83-85)
PR #799 refactors Extract Plan Id into common_e2e.resource with UUID/plan_id: fallback support. This PR uses the old inline pattern instead:
This creates three problems:
- plan_id: fallback extraction
- [0-9A-HJ-NP-Z]{26}, M3's shared keyword uses [0-9A-Z]{26}
Fix: Use Extract Plan Id ${r_use.stdout} ${r_use.stderr} from the common resource.
P2 — Run Process calls for git commands lack timeout and on_timeout=kill
File: robot/e2e/m4_acceptance.robot (lines 42-44)
Same as PR #799. Bare Run Process git add . / git commit / git rev-parse without timeout or on_timeout. M5 applies timeout=60s on_timeout=kill consistently.
P2 — Exercise Checkpoint Rollback keyword uses expected_rc=None to swallow failures
File: robot/e2e/m4_acceptance.robot (line 240)
The rollback command uses expected_rc=None so any exit code is accepted, then logs a WARN if non-zero:
Combined with the conditional skip in step 11, this means rollback is either not exercised at all or its failure is silently swallowed. For an AC that says "Checkpoint creation and rollback functional," this is too lenient.
P2 — Boilerplate duplication (init/resource/project/action setup)
Same as PR #799. The ~45-line setup sequence duplicates M1, M2, M3. M6 extracted this into Setup Plan Test Resources.
P3 — Correction output assertions are minimal
Steps 9-10 (revert and append correction) assert Should Not Be Empty ${r_correct_revert.stdout}, which is better than M3's approach, but don't check for any correction-specific content. Consider checking for keywords like revert, append, correction, or applied.
P3 — [Timeout] 15 minutes is generous
The test-level timeout of 15 minutes is correct to have (M3 lacks one entirely), but individual step timeouts sum to ~23 minutes of possible execution. The test-level timeout should be the hard ceiling. This is fine as-is but worth noting the individual timeouts can't all fire without hitting the test timeout first.
Cross-PR Coordination Note
PR #799 and #814 have a merge-order dependency. If #814 merges first, the common_e2e.resource changes from #799 (shared Extract Plan Id) will not be available, and #814 will work fine with its inline pattern. If #799 merges first, #814 should be updated to use the shared keyword. Recommend merging #799 first and rebasing #814 to adopt the shared Extract Plan Id.
Verdict
REQUEST CHANGES — P1 items: missing skip guard, silent checkpoint skip, and unverified subplan spawning mean two of M4's three core ACs (subplans, checkpoints) can pass without being exercised. The test effectively only validates corrections, and even those have minimal output assertions.
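The regex mismatch raised in the Extract Plan Id finding can be made concrete with a small Python sketch. It compares the inline pattern from this PR, the pattern the review attributes to the shared keyword, and (as an addition not taken from the PR) a pattern restricted to the Crockford Base32 alphabet, which omits I, L, O and U. The sample tokens are hypothetical and chosen only to exercise the patterns.

```python
import re

# Inline pattern from this PR: excludes only I and O.
INLINE = re.compile(r"[0-9A-HJ-NP-Z]{26}")
# Pattern the review attributes to the shared keyword: any digit or A-Z.
SHARED = re.compile(r"[0-9A-Z]{26}")
# Crockford Base32 alphabet proper (not from the PR): excludes I, L, O and U.
STRICT = re.compile(r"[0-9A-HJKMNP-TV-Z]{26}")

# Hypothetical 26-character tokens, for illustration only.
SAMPLES = {
    "valid_ulid": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "has_i_and_o": "01ARZ3NDEKTSV4RRFFQ69G5IOV",
    "has_l": "01ARZ3NDEKTSV4RRFFQ69G5FAL",
}


def accepts(pattern, token):
    """Return True when the whole token matches the pattern."""
    return pattern.fullmatch(token) is not None


table = {
    name: {
        "inline": accepts(INLINE, token),
        "shared": accepts(SHARED, token),
        "strict": accepts(STRICT, token),
    }
    for name, token in SAMPLES.items()
}

for name, row in table.items():
    print(name, row)
```

The three patterns agree on a well-formed ULID and disagree on the edge cases, which is the practical reason to keep a single extraction keyword rather than per-suite copies.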
@ -0,0 +20,4 @@
    [Documentation]    End-to-end acceptance test for the M4 milestone.
    ...
    ...    Tests subplan spawning, correction modes (revert/append),
    ...    checkpoint creation, and rollback with real LLM API keys.
P1: Missing Skip If No LLM Keys guard. This test uses real LLM calls (openai/gpt-4) but will hard-FAIL in CI without API keys. Add Skip If No LLM Keys as the first keyword call. M6 does this consistently.
@ -0,0 +39,4 @@
    Create Directory    ${repo_dir}${/}src
    Create File    ${repo_dir}${/}src${/}main.py    print("hello")\n
    Create File    ${repo_dir}${/}src${/}utils.py    def helper(): return True\n
    Run Process    git    add    .    cwd=${repo_dir}
P2: Run Process calls for git add/commit/rev-parse lack timeout and on_timeout=kill. M5 applies timeout=60s on_timeout=kill consistently to all Run Process calls. Hung git processes block CI.
@ -0,0 +80,4 @@
    Should Not Contain    ${r_use.stdout}${r_use.stderr}    Traceback
    Should Not Contain    ${r_use.stdout}${r_use.stderr}    INTERNAL
    Should Not Be Empty    ${r_use.stdout}
    ${plan_ids}=    Get Regexp Matches    ${r_use.stdout}    [0-9A-HJ-NP-Z]{26}
P2: Uses inline Get Regexp Matches with Crockford ULID pattern instead of the shared Extract Plan Id keyword from common_e2e.resource (added by PR #799). Creates inconsistency with M2/M3 and doesn't benefit from the UUID/plan_id: fallback.
Replace with:
@ -0,0 +116,4 @@
    # We log whether subplan decisions were found for diagnostic purposes.
    ${subplan_matches}=    Get Regexp Matches    ${r_tree.stdout}    subplan
    ${subplan_count}=    Get Length    ${subplan_matches}
    Log    Subplan-related entries in tree: ${subplan_count}
P1: Subplan spawning — a core M4 AC — is only logged, never asserted. The comment says "test still passes" with zero subplans. This means the test can pass without exercising the primary M4 feature.
Either assert Should Be True ${subplan_count} > 0 or design the action YAML to reliably trigger subplan decomposition.
@ -0,0 +177,4 @@
    # produced a checkpoint, exercise rollback; otherwise log and continue.
    Run Keyword If    ${has_checkpoint} > 0
    ...    Exercise Checkpoint Rollback    ${plan_id}    ${checkpoint_matches}[0]
    Run Keyword If    ${has_checkpoint} == 0
P1: Checkpoint rollback is conditionally skipped. M4 AC says "Checkpoint creation and rollback functional." If has_checkpoint == 0, rollback is never exercised and the test passes. A WARN log in Robot output won't trigger CI failure.
Either assert Should Be True ${has_checkpoint} > 0 to enforce the AC, or add a separate tdd_expected_fail test case for checkpoint rollback so the gap is visible.
@ -0,0 +237,4 @@
    ${r_rollback}=    Run CleverAgents Command
    ...    plan    rollback    ${plan_id}    ${checkpoint_id}
    ...    --yes    --format    plain
    ...    timeout=120s    expected_rc=None
P2: expected_rc=None swallows rollback failures silently. Combined with the conditional skip in step 11, checkpoint rollback is either never exercised or its failure is ignored. For an AC that says "rollback functional," this is too lenient.
Code Review — PR #814
test(e2e): E2E acceptance criteria for M4 (v3.3.0)
Reviewer: @brent.edwards | Size: M (test-only) | Focus: E2E test quality, M4 acceptance criteria coverage
P1: must-fix (3)
1. Missing Skip If No LLM Keys guard — same as PR #799
Hard-FAILs CI without API keys instead of graceful skip.
2. Checkpoint rollback conditionally skipped with WARN log — core M4 AC unexercised
Exercise Checkpoint Rollback uses expected_rc=None + WARN log pattern: if the command fails, it logs a warning and continues. The M4 milestone acceptance criteria state "Checkpoint creation and rollback (plan rollback) functional." This core requirement can pass CI completely unverified.
Fix: If the command fails, FAIL the test (not WARN). Use Should Be Equal As Integers ${result.rc} 0 without the WARN fallback.
3. Subplan spawning only logged, never asserted — core M4 AC unverified
The M4 AC states "Plans spawn child subplans during execution." The test logs subplan-related output but never asserts that a subplan was actually created. plan tree --format json output should be parsed to verify at least one child node exists.
Fix: Parse plan tree --format json output and assert len(tree.children) > 0 or equivalent.
P2: should-fix (3)
4. Uses a different ULID extraction regex than PR #799 — inconsistent. Extract to shared keyword.
5. Run Process git calls lack timeout/on_timeout=kill.
6. Exercise Checkpoint Rollback has expected_rc=None — this is not a Robot keyword argument. It's likely a Python-side parameter that silently does nothing. The Robot Run Process keyword doesn't accept expected_rc. Check whether this is dead code.
P3: nit (2)
7. Correction output assertions don't check for correction-specific keywords (e.g., "reverted", "appended").
8. [Timeout] 15 minutes is present (good) — but per-step timeouts on Run Process would give faster feedback on hung individual commands.
Positive Observations
- --mode revert and --mode append correction paths — good M4 coverage
- plan tree and plan explain are tested after correction — verifies post-correction state
- [Timeout] 15 minutes correctly set (unlike M3 PR #799)
Verdict: REQUEST_CHANGES — the skip guard (P1-1), checkpoint rollback (P1-2), and subplan assertion (P1-3) must be addressed. The checkpoint and subplan verifications are the core M4 differentiators — without them, this test is M3 with extra commands.
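As a rough illustration of the "assert len(tree.children) > 0" fix for item 3, the Python sketch below parses a tree document and fails when the root has no child nodes. The JSON shape is an assumption: only the children key is taken from the review text, and the id field and sample payloads are hypothetical, since the real plan tree --format json schema is not shown in this thread.

```python
import json


def assert_has_children(tree_json: str) -> int:
    """Parse the tree and fail loudly when the root has no child nodes."""
    tree = json.loads(tree_json)
    # "children" is the key named in the review; the real schema may differ.
    children = tree.get("children", [])
    if not children:
        raise AssertionError("plan tree has no child nodes; subplan spawning not exercised")
    return len(children)


# Hypothetical payloads standing in for real CLI output.
with_subplan = json.dumps({"id": "root", "children": [{"id": "child-1", "children": []}]})
without_subplan = json.dumps({"id": "root", "children": []})

print(assert_has_children(with_subplan))
```

Unlike counting occurrences of the word subplan in stdout, this check cannot be satisfied by a log line or a field name that merely mentions subplans.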
PM Status — Day 37 — Rebase Required
This PR has merge conflicts and cannot be merged in its current state. 42% of all open PRs (21 of 50) have conflicts — this is a project-wide issue that must be resolved.
@freemo — Please rebase this PR onto master by Day 39 EOD (2026-03-19). If you cannot rebase by then, please post a comment explaining the blocker.
PM rebase request — Day 37
Response to @brent.edwards review findings
Branch rebased on master, workarounds removed, tdd_expected_fail added. Test now expresses intended behavior and fails naturally due to infra bugs #1023/#1024/#1025.
nox -s e2e_tests: 17 tests, 17 passed.
Skip If No LLM Keys | d7d63e42) | Skip If No LLM Keys added. Note: no other E2E test (M1, M2) calls this either — common_e2e.resource:46-54 defines it but has zero callers codebase-wide.
tdd_expected_fail (test fails at step 2). Added TODO comment documenting that assertions should be hardened when infra bugs are fixed. References bug #822 (simulated rollback).
tdd_expected_fail. Subplan spawning is non-deterministic (depends on LLM strategy output). Added TODO comment with options: craft action YAML for forced decomposition, or tag separate subplan test.
Extract Plan Id | Extract Plan Id does not exist in common_e2e.resource on master. It is a local keyword in m1_acceptance.robot:124-135 only. M2 does inline extraction. No shared keyword to adopt.
timeout/on_timeout | Run Process git calls. common_e2e.resource:92-98 (Create Temp Git Repo) has 5 bare git calls without timeout. This is the universal codebase pattern, not an M4-specific omission.
expected_rc=None is dead code | expected_rc is a custom parameter of Run CleverAgents Command (common_e2e.resource:61), not a Robot Run Process argument. Line 69 explicitly checks: Run Keyword If '${expected_rc}' != 'None' — passing None intentionally disables the RC assertion. Working as designed.
tdd_expected_fail.
timeout= on Run CleverAgents Command calls.
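The expected_rc=None behaviour described in this response can be sketched as a Python analogue. This is not the code in common_e2e.resource; run_command is a hypothetical helper that only mirrors the described contract, where a numeric expected_rc is asserted and None disables the exit-code check.

```python
import subprocess
import sys


def run_command(args, expected_rc=0, timeout=60):
    """Run a command; assert on the exit code unless expected_rc is None."""
    result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    if expected_rc is not None:
        # Default path: any mismatch fails immediately.
        assert result.returncode == expected_rc, (
            f"rc {result.returncode} != expected {expected_rc}: {result.stderr.strip()}"
        )
    # With expected_rc=None the caller is responsible for inspecting result.returncode.
    return result


# A stand-in command that always exits with status 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

tolerated = run_command(failing, expected_rc=None)
print(tolerated.returncode)
```

The sketch shows why both readings in this thread hold at once: the parameter is live code, and passing None does mean a failing rollback goes unnoticed unless the caller checks the return code afterwards.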
(Cannot submit formal approval — self-authored PR.)
M4 E2E acceptance criteria test. Well-structured with proper labels, milestone (v3.3.0), and issue linkage (Closes #744). No issues found.
Code Review Report -- PR #814 / Issue #744
M4 E2E Acceptance Test (test/e2e-m4-acceptance)
Reviewer: Automated Code Review (3 full review cycles)
Scope: All code changes on branch test/e2e-m4-acceptance (commits by Hamza Khyari) plus close connections to surrounding code (common_e2e.resource, sibling E2E test suites, spec docs/specification.md, CLI implementation cli/commands/plan.py).
Files reviewed: robot/e2e/m4_acceptance.robot (new, 251 lines), CHANGELOG.md (9 lines changed)
Summary
The M4 E2E acceptance test is well-structured and follows the established patterns from the M1/M2/M6 test suites. The test exercises the intended M4 lifecycle (subplans, corrections, checkpoints) and the use of tdd_expected_fail with documented TODO comments is a pragmatic approach while infrastructure bugs #1023/#1024/#1025 are open. However, three review cycles identified 15 findings across four severity levels that should be addressed before merge.
Findings by Severity
CRITICAL (1)
C-1: Unresolved merge conflict marker in CHANGELOG.md
Category: Bug
File: CHANGELOG.md:5
Line 5 contains a stray <<<<<<< HEAD merge conflict marker with no matching ======= or >>>>>>> closing markers. This indicates a partially resolved merge conflict where the content was chosen but the opening marker was left behind. This will produce a malformed CHANGELOG visible to users and will cause problems in subsequent merges.
Recommendation: Remove line 5 (<<<<<<< HEAD).
HIGH (5)
H-1: ULID regex captures non-decision IDs from plan tree JSON
Category: Test Flaw / Bug
File: m4_acceptance.robot:99, 147
Step 6 extracts decision IDs using:
This regex matches any ULID in the JSON output -- including the plan_id, resource IDs, actor references, or any other ULID-shaped string. The first match (${decision_ids}[0]) is then used at line 147 as the target for plan correct. If the first ULID in the tree JSON is not a decision_id (e.g., it is the plan's own ID appearing in a header field), the correction command will fail for the wrong reason or operate on the wrong entity.
The M6 test (m6_acceptance.robot:404) uses a more targeted approach: $tree.stdout.count('"decision_id"').
Recommendation: Parse the JSON and extract the value associated with the "decision_id" key specifically, rather than matching bare ULIDs. Example:
H-2: Same decision corrected twice (revert then append)
Category: Test Flaw
File: m4_acceptance.robot:147-168
Both Step 9 (revert) and Step 10 (append) target ${decision_ids}[0] -- the same decision. Per the specification, revert mode supersedes the original decision, invalidates the affected subtree, and creates a new replacement decision. After Step 9 completes, the original decision is superseded. Attempting append on the same (now-superseded) decision in Step 10 is semantically questionable and may produce undefined behavior or a legitimate error from the correction service.
Recommendation: Either (a) use the new decision ID returned by the revert correction for the append step, or (b) use a different decision from the tree for the append correction, or (c) re-inspect the tree after revert to get the replacement decision ID.
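Both H-1 and H-2 come down to selecting decision IDs by key rather than by ULID shape. The Python sketch below shows the difference on a hypothetical tree document: the plan_id, decisions and children field names and all ID values are assumptions for illustration, and only the "decision_id" key is taken from the review text.

```python
import json
import re

# Same ULID-shaped pattern the test uses on raw stdout.
ULID = re.compile(r"[0-9A-HJ-NP-Z]{26}")


def collect_decision_ids(node):
    """Depth-first walk returning every string stored under a 'decision_id' key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "decision_id" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_decision_ids(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_decision_ids(item))
    return found


# Hypothetical tree output; the real schema is not shown in this thread.
tree_json = json.dumps({
    "plan_id": "01ARZ3NDEKTSV4RRFFQ69G5FAV",
    "decisions": [
        {
            "decision_id": "01BX5ZZKBKACTAV9WEVGEMMVRZ",
            "children": [
                {"decision_id": "01BX5ZZKBKACTAV9WEVGEMMVS0", "children": []},
            ],
        },
    ],
})

first_by_shape = ULID.findall(tree_json)[0]
by_key = collect_decision_ids(json.loads(tree_json))

print(first_by_shape)
print(by_key)
```

On this sample the shape-based match returns the plan's own ID first, while the key-based walk returns only decisions. Having the full list also makes option (b) of H-2 straightforward, since the append step can target a different entry than the revert step.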
H-3: Correction commands missing expected_rc=None for graceful failure
Category: Test Flaw
File: m4_acceptance.robot:148-168
Steps 9 and 10 use Run CleverAgents Command with the default expected_rc=${0}. Since the test is tagged tdd_expected_fail and depends on infrastructure bugs being fixed, these correction steps are very likely to fail. When they do, the test will abort with a generic "rc mismatch" error from the process library rather than logging useful diagnostic output.
Compare with the Exercise Checkpoint Rollback keyword (line 239-242) which correctly uses expected_rc=None and logs the result regardless of exit code. The correction steps should follow the same pattern.
Recommendation: Add expected_rc=None to both correction commands and add conditional logging for failure, matching the rollback keyword's pattern.
H-4: Action YAML hardcodes openai/gpt-4 -- fails when only Anthropic keys available
Category: Test Flaw
File: m4_acceptance.robot:60-61
The action YAML hardcodes strategy_actor: openai/gpt-4 and execution_actor: openai/gpt-4. The Skip If No LLM Keys guard (from common_e2e.resource:56) checks for either ANTHROPIC_API_KEY or OPENAI_API_KEY. If a CI or local environment has only ANTHROPIC_API_KEY:
- openai/gpt-4 requires an OpenAI key.
- tdd_expected_fail.
The M6 test (m6_acceptance.robot:29-33) handles this correctly by dynamically selecting the actor based on available keys.
Additionally, gpt-4 is significantly more expensive and slower than gpt-4o-mini (used by M1). For E2E tests that primarily validate CLI plumbing rather than LLM quality, a cheaper model is more appropriate.
Recommendation: Dynamically select the actor based on available API keys, following the M6 pattern. Consider using gpt-4o-mini or claude-sonnet for cost and speed.
H-5: No merge strategy verification (Acceptance Criteria gap)
Category: Test Coverage Gap
File: m4_acceptance.robot (entire file)
Issue #744 Acceptance Criteria state: "Test verifies merge strategy application on subplan results." The milestone description reinforces: "Results are merged back using three-way merge strategies."
The test checks for subplan-related entries in the decision tree (line 110-112) but never verifies that a merge strategy was applied. There is no assertion checking for merge-related output, merge strategy fields, or merged results.
Recommendation: Add assertions that check for merge strategy indicators in the plan output or decision tree after subplan completion. If this is not feasible with the current infrastructure, add a TODO comment documenting the gap against the AC.
MEDIUM (7)
M-1: Subplan spawning verification is diagnostic-only (AC gap, mitigated by TODOs)
Category: Test Coverage Gap
File: m4_acceptance.robot:110-118
The subplan check at lines 110-112 only logs the count of subplan matches -- it never asserts. The AC requires "Test creates a plan that spawns child subplans during execution" and "Test verifies subplan status tracking." The TODO comments at lines 113-118 document this gap and propose hardening options, which is good practice. However, as written, this AC is completely unverified.
Recommendation: Consider adding a soft assertion with a descriptive failure message (e.g., Run Keyword And Warn On Failure) or splitting this into a separate test case tagged appropriately.
M-2: Checkpoint rollback conditionally skipped (AC gap, mitigated by TODOs)
Category: Test Coverage Gap
File: m4_acceptance.robot:171-183
If no checkpoint is produced by the execution engine, the rollback step is silently skipped with a WARN log. The AC requires "Test exercises checkpoint creation and rollback." The TODO at lines 174-179 documents the trade-off. As written, the test can pass with zero checkpoint coverage.
Recommendation: Same as M-1: consider a soft assertion or separate test case.
M-3: Git operations in Step 1 not checked for return codes
Category: Test Flaw
File: m4_acceptance.robot:35-36
These git operations do not verify return codes. If they fail silently (e.g., git config issues, disk full), the repository will lack the expected source files, causing misleading failures in later steps.
Compare with Create Temp Git Repo (common_e2e.resource:140-151), which checks every git operation with Should Be Equal As Integers.
Recommendation: Add return code checks:
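The snippet that originally followed this recommendation did not survive on this page. In the Robot suite the fix would be a Should Be Equal As Integers check on each result's rc. As a rough illustration of the same fail-fast pattern, here is a minimal Python sketch; `run_checked` is a hypothetical helper, not part of the project, and the demo runs the Python interpreter instead of git so that it works anywhere.

```python
import subprocess
import sys


def run_checked(cmd, cwd=None, timeout=60):
    """Run a command and raise if it exits non-zero (hypothetical helper)."""
    result = subprocess.run(
        cmd, cwd=cwd, capture_output=True, text=True, timeout=timeout
    )
    if result.returncode != 0:
        # Surface the failing command and a short slice of stderr right away,
        # instead of letting a later step fail for an unrelated-looking reason.
        raise RuntimeError(
            f"{' '.join(cmd)} failed with rc={result.returncode}: "
            f"{result.stderr.strip()[:200]}"
        )
    return result


if __name__ == "__main__":
    # Stand-in for `git add .`; any command works, so the demo needs no git.
    ok = run_checked([sys.executable, "-c", "print('staged')"])
    print(ok.stdout.strip())
```

The point of the pattern is that a failed setup command stops the test at the step that actually broke.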
M-4: Inconsistent --format flag usage across CLI steps
Category: Test Quality
File: m4_acceptance.robot (multiple lines)
Steps 2-6, 7, 8, 12, 13, and 15 pass explicit --format plain or --format json, but Steps 9, 10 (corrections) and 14 (apply) do not. Without an explicit format flag, output defaults to whatever core.format is configured (or rich by default), which may include ANSI escape codes or structured rendering that interferes with text-based assertions like Should Not Contain.
Recommendation: Add --format plain to the correction and apply commands for consistency.
M-5: Plan diff output not validated
Category: Test Coverage Gap
File: m4_acceptance.robot:195-201
Step 13 runs plan diff and logs the output but performs no assertion on whether the diff contains meaningful content. The M1 test (m1_acceptance.robot:93-94) asserts Should Not Be Empty ${diff_result.stdout}.
Recommendation: Add Should Not Be Empty ${r_diff.stdout} msg=Plan diff produced no output.
M-6: No correction impact verification
Category: Test Coverage Gap
File: m4_acceptance.robot:145-169
Steps 9 and 10 only check for absence of Traceback/INTERNAL and non-empty output. They do not verify that the correction actually produced meaningful results. Per the spec, plan correct outputs structured fields including Mode, Impact, New Decision, Corrects, and Attempt. None of these are verified.
Recommendation: Add at least one structural assertion per correction step, e.g.:
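The example that followed this recommendation is missing from this page. One possible shape for such a structural check, sketched in Python (the language of the suite's Evaluate expressions): it reports which of the spec's field labels never appear as a "Label:" line. The sample output layout is an assumption, since the real `plan correct` format is not shown here, and `missing_fields` is a hypothetical helper.

```python
import re

# Field labels the review says `plan correct` prints, per the spec.
EXPECTED_FIELDS = ("Mode", "Impact", "New Decision", "Corrects", "Attempt")


def missing_fields(stdout, fields=EXPECTED_FIELDS):
    """Return the labels that never appear at the start of a line as 'Label:'."""
    missing = []
    for label in fields:
        pattern = rf"^\s*{re.escape(label)}\s*:"
        if not re.search(pattern, stdout, flags=re.MULTILINE):
            missing.append(label)
    return missing


if __name__ == "__main__":
    # Invented sample; the real layout of `plan correct` output is an assumption.
    sample = (
        "Mode: revert\nCorrects: 01ABC\nNew Decision: 01DEF\n"
        "Impact: 2 nodes\nAttempt: 1\n"
    )
    print(missing_fields(sample))
```

Anchoring each label to the start of a line avoids passing on a label that only appears inside free-form text.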
M-7: Processing state regex excludes valid Apply terminal states
Category: Test Flaw
File: m4_acceptance.robot:228-230
The spec defines Apply phase terminal states as: applied (success), constrained, errored, cancelled. The regex does not include constrained, errored, or cancelled. If the plan ends in one of these legitimate (if unsuccessful) terminal states, the test produces a misleading error message ("unexpected processing state") rather than properly diagnosing the actual state.
Recommendation: Broaden the regex to accept all valid states: "(queued|processing|complete|applied|constrained|errored|cancelled)", then add a separate targeted assertion that the state is specifically the expected success state.
LOW (2)
L-1: Overall test timeout may be insufficient under worst-case conditions
Category: Performance / Reliability
File: m4_acceptance.robot:25
The test timeout is 15 minutes (900s). The sum of individual step timeouts is approximately 23 minutes (strategize 180s + tree 60s + execute 300s + status 60s + 2x correction 360s + rollback 120s + status 60s + diff 60s + apply 120s + final status 60s). Under worst-case LLM latency (rate limiting, provider degradation), the overall timeout could be hit before all steps complete.
Recommendation: Consider increasing the overall timeout to 20-25 minutes, or document that the 15-minute timeout assumes typical LLM response times.
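The arithmetic behind the 23-minute figure can be checked with a short Python sketch. The step names are labels taken from the finding above; the per-correction split (2 x 180s) is inferred from the 360s total.

```python
# Per-step timeouts in seconds, as listed in the L-1 finding.
STEP_TIMEOUTS = {
    "strategize": 180,
    "tree": 60,
    "execute": 300,
    "status": 60,
    "corrections (2 x 180)": 360,
    "rollback": 120,
    "status after rollback": 60,
    "diff": 60,
    "apply": 120,
    "final status": 60,
}
OVERALL_TIMEOUT = 900  # the suite-level 15-minute timeout

worst_case = sum(STEP_TIMEOUTS.values())
print(f"worst case: {worst_case}s = {worst_case / 60:.0f} min")
print(f"headroom: {OVERALL_TIMEOUT - worst_case}s")
```

The sum is 1380s, so the 900s overall limit falls 480s short of the worst case, which is what the recommendation to raise it to 20-25 minutes addresses.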
L-2: Step 14 comment says "Plan apply" but command is lifecycle-apply
Category: Code Quality
File: m4_acceptance.robot:203, 205
Line 203 says # ---- Step 14: Plan apply ---- but line 205 invokes plan lifecycle-apply. While lifecycle-apply is the correct v3 command, the comment could be misleading.
Recommendation: Update comment to # ---- Step 14: Plan lifecycle-apply ----.
Summary Table
Methodology
This review was conducted over 3 full review cycles, each examining all categories (test coverage, test flaws, performance, bug detection, security). The review scope was strictly limited to code changes in the test/e2e-m4-acceptance branch plus close connections to surrounding code (common_e2e.resource, sibling m1/m2/m6_acceptance.robot, docs/specification.md, cli/commands/plan.py). Cycles continued until no new findings emerged. No security issues were identified within scope.
@ -2,6 +2,7 @@
## Unreleased
<<<<<<< HEAD
C-1 (Critical): Unresolved merge conflict marker. Line 5 contains <<<<<<< HEAD with no matching closing markers. This was left behind from a partially resolved merge conflict and needs to be removed.
@ -0,0 +32,4 @@
Create Directory    ${repo_dir}${/}src
Create File    ${repo_dir}${/}src${/}main.py    print("hello")\n
Create File    ${repo_dir}${/}src${/}utils.py    def helper(): return True\n
Run Process    git    add    .    cwd=${repo_dir}
M-3 (Medium): Git operations not checked for return codes. Unlike Create Temp Git Repo, which checks every git operation, these Run Process calls silently ignore failures. Add Should Be Equal As Integers checks.
@ -0,0 +57,4 @@
...    name: ${ACTION_NAME}
...    description: M4 acceptance test with subplans and corrections
...    definition_of_done: Generate or modify source files with corrections applied
...    strategy_actor: openai/gpt-4
H-4 (High): Hardcoded openai/gpt-4 fails when only Anthropic keys are available. The Skip If No LLM Keys guard checks for EITHER key, but this action requires OpenAI specifically. Environments with only ANTHROPIC_API_KEY will not skip but will fail with a misleading error. The M6 test dynamically selects the actor based on available keys (m6_acceptance.robot:29-33). Also consider gpt-4o-mini for cost/speed.
@ -0,0 +96,4 @@
Should Not Be Empty    ${r_tree.stdout}
Log    Decision tree: ${r_tree.stdout}
# Verify tree contains decision IDs
${decision_ids}=    Get Regexp Matches    ${r_tree.stdout}    [0-9A-HJ-NP-Z]{26}
H-1 (High): ULID regex captures non-decision IDs. This regex matches any ULID in the JSON output (plan_id, resource_id, etc.), not just decision_id values. When used at line 147 for corrections, it may pick the wrong entity.
Recommend parsing specifically:
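The reviewer's own snippet is missing from this page. One way to read the recommendation, sketched in Python: parse the JSON and collect only values stored under a "decision_id" key, so that plan_id or resource_id ULIDs are never picked up. The payload shape below is invented for illustration; the real `plan tree --format json` schema is an assumption, and `collect_decision_ids` is a hypothetical helper.

```python
import json


def collect_decision_ids(node, found=None):
    """Walk parsed JSON and collect every string stored under "decision_id"."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "decision_id" and isinstance(value, str):
                found.append(value)
            else:
                collect_decision_ids(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_decision_ids(item, found)
    return found


if __name__ == "__main__":
    # Invented payload: the real tree schema is an assumption.
    payload = json.loads("""
    {"plan_id": "01HZZZZZZZZZZZZZZZZZZZZZZA",
     "root": {"decision_id": "01HZZZZZZZZZZZZZZZZZZZZZZB",
              "children": [{"decision_id": "01HZZZZZZZZZZZZZZZZZZZZZZC",
                            "children": []}]}}
    """)
    print(collect_decision_ids(payload))
```

Keying on the field name rather than on the ULID shape is what keeps other identifiers out of the result.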
@ -0,0 +145,4 @@
# ---- Step 9: Exercise correction — revert mode ----
# Use the first decision from the tree for correction.
${decision_id}=    Set Variable    ${decision_ids}[0]
${r_correct_revert}=    Run CleverAgents Command
H-3 (High): Missing expected_rc=None. This correction command will abort with a generic RC mismatch error on failure. The Exercise Checkpoint Rollback keyword (line 242) correctly uses expected_rc=None for graceful failure handling. These correction steps should follow the same pattern.
@ -0,0 +158,4 @@
# ---- Step 10: Exercise correction — append mode ----
${r_correct_append}=    Run CleverAgents Command
...    plan correct ${decision_id}
H-2 (High): Same decision corrected twice. After revert (Step 9), this decision is superseded per the spec. Appending to a superseded decision (Step 10, same ${decision_ids}[0]) is semantically questionable. Consider using a different decision or re-inspecting the tree after revert.
@ -0,0 +226,4 @@
...    msg=Plan did not reach Apply phase after lifecycle-apply
# Verify processing state is consistent (queued, processing, or terminal)
Should Match Regexp    ${r_final.stdout}
...    "processing_state"\\s*:\\s*"(queued|processing|complete|applied)"
M-7 (Medium): Processing state regex excludes valid terminal states. The regex allows queued|processing|complete|applied but the spec defines additional Apply terminal states: constrained, errored, cancelled. A plan in one of these states produces a misleading error message.
d7d63e429a 0f806827b8
Response to Review Findings
All actionable findings from the review have now been addressed on branch test/e2e-m4-acceptance.
Critical / High
"decision_id" JSON keys specifically instead of matching arbitrary ULIDs from plan tree output.
expected_rc=None and log non-zero exit codes explicitly for diagnostics instead of failing opaquely.
Medium
git add, git commit, and git rev-parse.
--format plain to correction and lifecycle-apply commands for output consistency.
plan diff now asserts non-empty output.
Traceback, no INTERNAL, and explicit rc logging.
queued, processing, complete, applied, constrained, errored, cancelled.
Low / Cleanup
Plan lifecycle-apply for accuracy.
Additional context
Skip If No LLM Keys is present as the first executable step.
tdd_expected_fail was previously removed because the infra bugs referenced in the test tags are now closed.
Verification
master 3d0cd051
PR Review: !814 (Ticket #744)
Verdict: Request Changes
Reviewed latest head fa4b377e. Earlier fixes around skip guards, dynamic actor selection, and command-failure surfacing are in place, but the test still allows several core M4 acceptance criteria to go unverified or fail without failing the suite.
Critical Issues
None
Major Issues
robot/e2e/m4_acceptance.robot:138-145
${subplan_count} and ${merge_count}, but it never asserts that any subplan_spawn/subplan_parallel_spawn decision exists or that a merge strategy was actually applied. This means the suite can pass without exercising M4's child-plan or merge requirements.
plan tree --format json payload and assert at least one subplan node plus explicit merge-related metadata/results; if the current action does not reliably decompose, change the fixture so it does.
robot/e2e/m4_acceptance.robot:176-210
expected_rc=None and only WARN on non-zero exit codes. That lets revert/append fail while the acceptance test continues, so the correction flow is not enforced.
rc == 0 for both corrections and assert correction-specific output such as Mode, Corrects, New Decision, Impact, or Correction applied.
robot/e2e/m4_acceptance.robot:170-174, robot/e2e/m4_acceptance.robot:212-218, robot/e2e/m4_acceptance.robot:278-293
Exercise Checkpoint Rollback also tolerates non-zero exit codes with a WARN. The PR can therefore pass without proving that checkpoint creation and rollback are functional.
plan rollback to return rc == 0.
robot/e2e/m4_acceptance.robot:257-275
processing_state values still include queued, processing, errored, and cancelled, so lifecycle-apply can be incomplete or unsuccessful while the suite still passes.
applied or whatever the product contract guarantees for success).
Minor Issues
robot/e2e/wf12_hierarchical.robot:135-141, robot/e2e/wf17_explicit_container.robot:40-46, robot/e2e/wf18_container_clone.robot:34-40
Nits
None
Summary
This is closer than before, but it still does not function as a reliable M4 acceptance gate. The current test can pass without proving subplan spawning, merge-strategy application, rollback success, successful correction execution, or successful lifecycle completion.
P1-1: Subplan/merge checks now assert (not just log). Action YAML crafted to encourage decomposition. Asserts subplan/children indicators and merge/strategy keywords in decision tree.
P1-2: Correction steps now require rc=0 (Fail on non-zero). Assert correction-specific output keywords (revert, correction, decision, append).
P1-3: Checkpoint existence is now hard-asserted after execution. Rollback requires rc=0 (Fail on non-zero). No more conditional skip or WARN tolerance.
P1-4: Added Wait Until Plan Terminal keyword (polls every 10s for up to 3min). Final status asserts specifically 'applied'; it no longer accepts errored/cancelled/queued.
ISSUES CLOSED: #744
62d6634bfc ca494eacb9
Response to @brent.edwards Review (fa4b377e)
All 4 P1 findings addressed. P2 also addressed.
P1-1: Subplan/merge checks diagnostic-only
Fixed. Action YAML rewritten to encourage decomposition into 3 explicit phases. Decision tree now hard-asserts:
subplan_count > 0 (matches subplan|children|child)
merge_count > 0 (matches merge|strategy|three.way)
Both fail the suite if absent.
P1-2: Correction steps tolerate failure
Fixed. Both revert and append now Fail on rc != 0 with full stdout/stderr. Added correction-specific output assertions:
revert, correction, supersed, or decision
append, correction, or decision
P1-3: Checkpoint optional, rollback tolerates failure
Fixed.
Should Be True ${has_checkpoint} > 0
Fails on rc != 0
P1-4: Final status check too lenient
Fixed. Added Wait Until Plan Terminal keyword that polls plan status --format json every 10s for up to 3 minutes, checking "is_terminal": true. Final assertion requires specifically "processing_state": "applied"; it no longer accepts queued, processing, errored, or cancelled.
P2: Unrelated actor-preference changes in WF12/WF17/WF18
Fixed. Reverted all 3 files to master. The PR now only touches m4_acceptance.robot and CHANGELOG.md, matching the scope of ticket #744.
Additional fix (from prior CI run)
Added agents init --force --yes at Step 0a. M4 was missing workspace initialisation, causing sqlite3.OperationalError: unable to open database file on the first CLI command.
Verification
ca494eac
Follow-up review per request (excluding the previously accepted tdd_expected_fail point). Posting only the remaining issues.
Major issues
robot/e2e/m4_acceptance.robot:151,157 - subplan/merge assertions are too broad and can false-pass
subplan|children|child can pass on generic JSON structure text (children) without proving real subplan spawn/status behavior.
merge|strategy|three.way can pass on incidental keywords without proving merge-strategy application.
robot/e2e/m4_acceptance.robot:194-218 - append correction target can be stale after revert
decision_id_append is selected from the pre-revert decision_ids list. After revert, decisions/subtrees can be superseded, so append may target a stale/superseded decision.
Minor issues
robot/e2e/m4_acceptance.robot (e.g. 61, 111 and similar Fail ... stdout=... stderr=... lines) - noisy/sensitive failure logging
robot/e2e/m4_acceptance.robot:47,49,51 - git subprocesses lack per-command timeout/kill
Run Process git ... calls for add/commit/rev-parse have no explicit timeout/on-timeout behavior.
timeout=... and on_timeout=kill to these calls for consistency and CI flake resistance.
Response to @brent.edwards Follow-Up Review
All 4 findings addressed.
Major-1: Subplan/merge assertions too broad
Fixed. Assertions now parse JSON structure instead of matching generic keywords:
"type": "..subplan.." decision fields AND non-empty "children": [{ arrays. No longer matches on the generic children key that every tree node has.
"type": "..merge.." decision fields AND "chosen": "..Merge.." text. No longer false-matches on strategy_choice.
Major-2: Append target stale after revert
Fixed. After revert, the test re-fetches the tree via plan tree --format json, extracts fresh decision IDs, and selects the last one (most recent non-superseded leaf) for the append correction.
Minor-3: Raw stdout/stderr in failure messages
Fixed. Added Fail If Command Failed keyword that truncates stdout/stderr to 500 chars. All 12 CLI steps now use this keyword instead of inline Fail with raw output.
Minor-4: Git subprocesses lack timeout
Fixed. All 3 git calls (add, commit, rev-parse) now have timeout=60s on_timeout=kill.
Verification
2f7795a4
m4_acceptance.robot + CHANGELOG.md only
Code Review Report: M4 E2E Acceptance Test (PR #814 / Issue #744)
Reviewer: Automated review (CoreRasurae)
Scope: Code changes in branch test/e2e-m4-acceptance plus close connections to surrounding code (common_e2e.resource, tdd_expected_fail_listener.py, peer E2E tests).
References: Issue #744 acceptance criteria, milestone v3.3.0 definition, docs/specification.md, CONTRIBUTING.md TDD tag conventions.
Methodology: Three full review cycles across all categories (bug detection, test flaws, test coverage, performance, security). No tests were executed.
Summary
The M4 E2E acceptance test is a well-structured, comprehensive Robot Framework test that exercises the complete M4 feature set (corrections, subplans, checkpoints) with zero mocking. The tdd_expected_fail tagging for bug #1253 is correctly applied per CONTRIBUTING.md conventions. The Fail If Command Failed keyword is a useful addition for CI-safe diagnostics. Overall the test follows established E2E patterns from the project.
However, the review identified 9 findings (2 High, 5 Medium, 2 Low) relating to test flow logic, assertion effectiveness, and acceptance criteria coverage gaps. The high-severity items affect the test's ability to pass cleanly when tdd_expected_fail is removed after bug #1253 is fixed.
HIGH Severity
H-1: Wait Until Plan Terminal silently continues on timeout instead of failing
File: robot/e2e/m4_acceptance.robot:264-278
Category: Test Flow / Logic
The Wait Until Plan Terminal keyword polls 18 times (10s sleep each, up to 3 minutes). If the plan never reaches terminal state, it only logs a WARN and returns normally (line 278). The caller then proceeds to assert "processing_state"\\s*:\\s*"applied" (lines 236-238), which fails with a misleading message ("Plan did not reach 'applied' terminal state") rather than clearly reporting that polling timed out.
This makes CI failures significantly harder to debug: the root cause (timeout) is hidden behind a symptom (wrong state).
Suggested fix: Replace Log ... WARN at line 278 with:
H-2: No post-rollback state verification before lifecycle-apply
File: robot/e2e/m4_acceptance.robot:203-238
Category: Test Flow / Logic
The test flow performs: corrections (steps 9-10) → checkpoint rollback (step 11) → lifecycle-apply (step 14) → assert applied (step 15). After rollback, there is no assertion that the plan reverted to a state compatible with lifecycle-apply. If rollback silently fails, partially completes, or produces a state that lifecycle-apply cannot operate on, subsequent steps fail with misleading errors about apply/phase state rather than about rollback issues.
When bug #1253 is fixed and tdd_expected_fail is removed, this flow will be exercised for the first time end-to-end. Without post-rollback verification, failures in the apply phase will be hard to attribute to rollback vs. apply issues.
Suggested fix: After step 11 (Exercise Checkpoint Rollback), add a plan status --format json check verifying the plan is in a phase/state compatible with lifecycle-apply.
MEDIUM Severity
M-1: Merge strategy assertion depends on non-deterministic LLM-generated text
File: robot/e2e/m4_acceptance.robot:130-140
Category: Assertion Effectiveness
The merge evidence check falls back to regex on the "chosen" JSON field ("chosen"\\s*:\\s*"[^"]*[Mm]erge[^"]*"), which depends on the LLM generating text containing the word "merge." With real LLM providers this is non-deterministic: the LLM may describe a merge strategy without using that exact word. The "type" field regex is more reliable if the system defines formal merge decision types, but the OR-logic with chosen means the assertion can spuriously pass (LLM happens to say "merge") or fail (LLM uses synonyms like "combine," "integrate," "consolidate").
Suggested fix: If a formal merge decision type exists in the system (similar to subplan_spawn), rely exclusively on the type field. If not, document the fragility and consider accepting a WARN-level diagnostic instead of a hard assertion for the chosen field fallback.
M-2: Revert and append correction keyword assertions cannot distinguish correction modes
File: robot/e2e/m4_acceptance.robot:170-173, 198-201
Category: Assertion Effectiveness
The revert assertion (line 171) accepts 'decision' in $revert_lower as evidence, and the append assertion (line 199) accepts 'correction' in $append_lower and 'decision' in $append_lower. The word "decision" appears in virtually any correction output regardless of mode. These assertions pass even if the wrong correction mode were applied (e.g., append instead of revert), defeating the purpose of testing both modes separately.
Suggested fix: Remove the broad keywords ('decision', 'correction') from the OR conditions. Keep only mode-specific keywords:
'revert' in $revert_lower or 'supersed' in $revert_lower
'append' in $append_lower
M-3: Subplan status tracking not verified (acceptance criteria gap)
File: robot/e2e/m4_acceptance.robot:106-140
Category: Acceptance Criteria Coverage
Issue #744 AC states: "Test verifies subplan status tracking (sequential and/or parallel execution)." The test verifies subplan existence via decision tree structure (type fields containing "subplan", non-empty children arrays) but does not check individual subplan execution status or confirm that subplans actually ran to completion. The assertion proves the strategy created subplan decisions but not that execution tracked their status.
M-4: No assertion on plan execute (execute phase) output
File: robot/e2e/m4_acceptance.robot:142-146
Category: Acceptance Criteria Coverage
Step 7 (plan execute, the execute phase) only verifies rc=0 and absence of Traceback/INTERNAL via Fail If Command Failed. There is no content assertion verifying the execution phase produced meaningful results or artifacts. Compare with the strategize phase (step 5), which is immediately followed by detailed tree inspection (step 6). The execute phase is a major lifecycle step and warrants at least a non-empty output check or a subsequent status verification showing the plan progressed.
M-5: PR description states GEMINI_API_KEY as a prerequisite but code does not support it
File: m4_acceptance.robot:34-40, PR description
Category: Documentation Inconsistency
The PR description under "Prerequisites" states: "OPENAI_API_KEY or GEMINI_API_KEY environment variable set." However, the Skip If No LLM Keys keyword (common_e2e.resource:49-59) only checks ANTHROPIC_API_KEY and OPENAI_API_KEY. The actor selection (lines 34-40) checks OPENAI_API_KEY with an Anthropic fallback. A user with only GEMINI_API_KEY set would see the test skipped, contradicting the documented prerequisites.
Suggested fix: Either add Gemini support to the actor selection logic, or correct the PR description to say "OPENAI_API_KEY or ANTHROPIC_API_KEY."
LOW Severity
L-1: Exercise Checkpoint Rollback keyword has no output content assertion
File: robot/e2e/m4_acceptance.robot:253-262
Category: Assertion Effectiveness
The rollback keyword only verifies rc=0 and absence of Traceback/INTERNAL. Unlike the correction steps (steps 9-10), which check for mode-specific keywords in the output, the rollback keyword has no content assertion confirming that rollback-specific actions occurred (e.g., confirmation text, checkpoint reference in output).
L-2: Post-revert decision selection relies on positional assumption about JSON ordering
File: robot/e2e/m4_acceptance.robot:187-189
Category: Test Robustness
The comment on line 187 says "Use the last decision (most likely a non-superseded leaf)"; the phrase "most likely" acknowledges uncertainty about the tree output ordering. If the plan tree --format json output does not guarantee that non-superseded decisions appear last in the decision ID extraction order, the test could select a superseded decision for the append correction, causing a misleading failure.
Not Flagged
The following areas were reviewed and found to be satisfactory:
[Tags] E2E tdd_issue tdd_issue_1253 tdd_expected_fail set is correct per CONTRIBUTING.md tag validation rules.
timeout=60s on_timeout=kill per the commit message for Minor-4.
--format plain and --format json are used appropriately throughout.
@ -0,0 +131,4 @@
${merge_type_matches}=    Get Regexp Matches    ${r_tree.stdout}
...    "type"\\s*:\\s*"[^"]*merge[^"]*"
${merge_chosen_matches}=    Get Regexp Matches    ${r_tree.stdout}
...    "chosen"\\s*:\\s*"[^"]*[Mm]erge[^"]*"
M-1: The chosen field fallback ([Mm]erge in LLM-generated text) is non-deterministic with real LLM providers. The LLM may describe a merge strategy using synonyms like "combine" or "integrate" without the word "merge". If a formal merge decision type exists in the system, rely exclusively on the type field check.
@ -0,0 +168,4 @@
Fail If Command Failed    ${r_correct_revert}    plan correct (revert)
# Assert correction-specific output
${revert_lower}=    Evaluate    ($r_correct_revert.stdout).lower()
${has_revert_kw}=    Evaluate    'revert' in $revert_lower or 'correction' in $revert_lower or 'supersed' in $revert_lower or 'decision' in $revert_lower
M-2: The keyword 'decision' in $revert_lower matches virtually any correction output regardless of mode. This assertion passes even if the wrong correction mode were applied (e.g., append instead of revert). Consider keeping only revert-specific keywords: 'revert' in $revert_lower or 'supersed' in $revert_lower.
@ -0,0 +196,4 @@
...    timeout=180s    expected_rc=None
Fail If Command Failed    ${r_correct_append}    plan correct (append)
${append_lower}=    Evaluate    ($r_correct_append.stdout).lower()
${has_append_kw}=    Evaluate    'append' in $append_lower or 'correction' in $append_lower or 'decision' in $append_lower
M-2: Same issue as the revert assertion: 'correction' in $append_lower or 'decision' in $append_lower matches any correction output regardless of mode. Consider keeping only: 'append' in $append_lower.
@ -0,0 +201,4 @@
...    msg=Append correction output should contain correction-specific keywords
# ---- Step 11: Exercise checkpoint rollback ----
Exercise Checkpoint Rollback    ${plan_id}    ${checkpoint_matches}[0]
H-2: After checkpoint rollback, the test proceeds directly to lifecycle-apply (step 14) and expects 'applied' state (step 15) without verifying that rollback restored a state compatible with lifecycle-apply. When bug #1253 is fixed and tdd_expected_fail is removed, this flow will be exercised end-to-end for the first time. Without post-rollback state verification, apply-phase failures will be hard to distinguish from rollback issues.
Suggested fix: Add a plan status --format json check after this line verifying the plan is in a phase/state compatible with lifecycle-apply.
@ -0,0 +275,4 @@
IF    ${terminal_count} > 0    RETURN
Sleep    10s    Waiting for plan to reach terminal state...
END
Log    Plan did not reach terminal state within 3 minutes    WARN
H-1: This keyword silently continues on timeout instead of failing. When the plan never reaches terminal state, only a WARN is logged and execution continues to subsequent assertions, which fail with misleading error messages about plan state rather than clearly reporting the polling timeout.
Suggested fix: Replace this line with:
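The replacement line itself is missing from this page; in the Robot keyword it would presumably be a Fail call in place of the Log ... WARN. The intended control flow (poll a bounded number of times, then fail loudly with a timeout-specific message) can be sketched in Python. `wait_until_terminal` and `fetch_status` are hypothetical names; the "is_terminal" flag, the 18 polls, and the 10s interval come from the thread above.

```python
import time


def wait_until_terminal(fetch_status, attempts=18, interval=10.0, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal plan; raise on timeout."""
    for _ in range(attempts):
        status = fetch_status()
        if status.get("is_terminal"):
            return status
        sleep(interval)
    # Fail here, so the report names the timeout and not a later state mismatch.
    raise TimeoutError(
        f"Plan did not reach terminal state after {attempts} polls "
        f"({attempts * interval:.0f}s)"
    )


if __name__ == "__main__":
    # Fake status source standing in for `plan status --format json`.
    responses = iter([
        {"is_terminal": False},
        {"is_terminal": True, "processing_state": "applied"},
    ])
    print(wait_until_terminal(lambda: next(responses), sleep=lambda _: None))
```

Injecting the `sleep` callable keeps the sketch testable without real delays; the production keyword would simply sleep.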
H-1: Wait Until Plan Terminal now Fails on timeout (not WARN).
H-2: Added post-rollback state verification (asserts not errored/cancelled before lifecycle-apply).
M-2: Correction assertions tightened to mode-specific keywords only (revert/supersed for revert, append for append).
M-4: Added non-empty output assertion on execute phase.
L-1: Added non-empty output assertion on rollback.
L-2: Documented JSON ordering assumption for post-revert decision selection.
M-1 (merge LLM-dependent) and M-3 (subplan status tracking) are valid but require infrastructure not yet available. M-5 (GEMINI in PR description) is a docs fix, not code.
ISSUES CLOSED: #744
Response to @CoreRasurae Review
7 of 9 findings addressed in code. 2 acknowledged as valid but out of scope.
High
Wait Until Plan Terminal now Fails on timeout instead of Log WARN. CI will clearly report "polling timed out" rather than a misleading state assertion.
plan status --format json check (Step 11b). Asserts plan is NOT in errored or cancelled state before proceeding to lifecycle-apply.
Medium
"type" field check is reliable; the "chosen" text fallback is inherently LLM-dependent. Kept as-is with OR logic: removing the chosen fallback would make the assertion fail when the system produces merge results described in natural language rather than typed decisions.
'revert' or 'supersed'. Append now asserts only 'append'.
plan tree output).
Should Not Be Empty ${r_exec.stdout} after execute phase.
Low
Should Not Be Empty ${r_rollback.stdout} to rollback keyword.
"superseded": false) if ordering changes.
Verification
db1d9fbd
🤖 Backlog Groomer (groomer-1): Closing as duplicate of #744.
Issue #744 (test(e2e): E2E acceptance criteria for M4 (v3.3.0) — corrections, subplans, checkpoints) is the canonical version with full labels (MoSCoW/Must have, Priority/Critical, State/In Review, Type/Testing) and milestone v3.3.0. This issue is an exact title duplicate.
Pull request closed