Increase merge throughput to master with merge-chain batching #9757

Open
opened 2026-04-15 14:48:55 +00:00 by brent.edwards · 4 comments
Member

Metadata

  • Commit Message: feat(merge): add merge-chain batching workflow
  • Branch: epic/merge-chain-batching

Background and context

CleverAgents is behind its milestone schedule, and the gap is growing.

At the time of writing, #9474 reports a large backlog for milestone v3.2.0. Jeff estimated that the current system can land at most one pull request approximately every 45 minutes. At that rate, the system cannot reduce the existing backlog fast enough even if no additional bugs are found.

The main bottleneck is the final merge-to-master cycle. Individual pull requests can run CI in parallel, but each candidate still has to be updated against the latest master, validated again, and then merged in sequence. That serializes throughput at the point where changes actually land.

This Epic proposes a merge-chain batching capability. The idea is to group individually validated pull requests into temporary integration branches, validate those batches against the latest master, and then land them as a unit if they pass. Failed batches are split into smaller batches and retried until the conflicting or unstable change is isolated.
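
To make the throughput argument concrete, here is a rough back-of-the-envelope sketch. It assumes that validating a batch takes the same 45 minutes as validating a single pull request and that every batch passes; the batch size of 4 and the backlog of 320 pull requests are purely illustrative numbers, not figures from #9474.

```python
import math

CYCLE_MINUTES = 45  # Jeff's estimate for one merge-to-master cycle


def prs_per_day(batch_size: int, cycle_minutes: float = CYCLE_MINUTES) -> float:
    """PRs landed per 24 hours if every cycle lands one passing batch."""
    cycles_per_day = 24 * 60 / cycle_minutes
    return cycles_per_day * batch_size


def days_to_clear(backlog: int, batch_size: int) -> int:
    """Whole days to land `backlog` PRs, ignoring failures and new arrivals."""
    return math.ceil(backlog / prs_per_day(batch_size))


serial = prs_per_day(batch_size=1)    # 32.0 PRs/day, strictly serialized
batched = prs_per_day(batch_size=4)   # 128.0 PRs/day, hypothetical batch size
print(days_to_clear(320, 1), days_to_clear(320, 4))  # 10 3
```

Real gains would be lower, because failed batches cost extra validation cycles while they are split and retried.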

This is Epic-level work because it is not a single atomic change. It requires design, orchestration behavior, queue handling, failure splitting, observability, and end-to-end validation across multiple child issues.

Expected behavior

When this Epic is complete, the project should have a working merge-chain capability that increases merge throughput to master without sacrificing CI validation.

The completed capability should provide:

  • an open merge-chain that collects pull requests which have already passed individual CI
  • a FIFO queue of closed merge-chains waiting to be validated against current master
  • a process that updates the queue head against current master and runs CI for the combined batch
  • a mechanism that lands a successful merge-chain into master
  • a failure-handling mechanism that:
    • routes single-PR failures back for investigation
    • splits multi-PR failures into smaller merge-chains and retries them
  • enough visibility and documentation that operators can understand what the queue is doing and why

The outcome must be demonstrable end-to-end: reviewers should be able to observe pull requests entering merge-chains, chains being validated, successful chains landing, and failed chains being split and retried.
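
As a minimal sketch of the structure described above, the open chain and the FIFO queue of closed chains could be modelled as follows. All names here (`MergeChain`, `MergeQueue`, `max_chain_size`, `ci_passes`) are hypothetical, and closing the open chain once it reaches a fixed size is an assumption; this Epic does not yet specify when an open chain closes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List, Optional


@dataclass
class MergeChain:
    prs: List[int] = field(default_factory=list)  # PR numbers in arrival order


class MergeQueue:
    def __init__(self, max_chain_size: int = 4) -> None:
        self.max_chain_size = max_chain_size      # assumed close trigger
        self.open_chain = MergeChain()
        self.closed: Deque[MergeChain] = deque()  # FIFO of closed chains
        self.landed: List[int] = []               # PRs merged into master
        self.failed: List[MergeChain] = []        # parked for failure handling

    def add_pr(self, pr: int) -> None:
        """Collect a PR that has already passed individual CI."""
        self.open_chain.prs.append(pr)
        if len(self.open_chain.prs) >= self.max_chain_size:
            self.close_open_chain()

    def close_open_chain(self) -> None:
        """Move a non-empty open chain to the back of the queue."""
        if self.open_chain.prs:
            self.closed.append(self.open_chain)
            self.open_chain = MergeChain()

    def process_head(self, ci_passes: Callable[[List[int]], bool]) -> Optional[bool]:
        """Validate only the queue head; return None when the queue is empty."""
        if not self.closed:
            return None
        head = self.closed.popleft()
        if ci_passes(head.prs):
            self.landed.extend(head.prs)  # land the whole batch as a unit
            return True
        self.failed.append(head)          # split or investigate elsewhere
        return False
```

The `ci_passes` callback stands in for "update against current master and run CI for the combined batch"; a real implementation would rebase the integration branch and wait for the pipeline result.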

Acceptance criteria

  • A merge-chain workflow is defined and implemented as a first-class capability rather than an informal manual process.
  • The system supports an open merge-chain that accepts pull requests which have already passed individual CI.
  • The system supports a FIFO queue of closed merge-chains awaiting validation against the latest master.
  • Only the queue head is validated against current master at any given time.
  • A successful merge-chain can be merged into master as a batch.
  • If a merge-chain containing exactly one pull request fails after revalidation, that pull request is surfaced for investigation rather than retried indefinitely.
  • If a merge-chain containing multiple pull requests fails, it is split into two smaller merge-chains and those chains are retried.
  • The system preserves clear ordering and state transitions for open chains, closed queued chains, active validation, success, and failure.
  • Operators and reviewers can observe the merge-chain state clearly enough to diagnose stuck, failing, or repeatedly split chains.
  • Documentation explains the workflow, terminology, and failure behavior.
  • The capability is demonstrated end-to-end with realistic example scenarios showing both successful and failing chains.
  • The completed system measurably improves overall merge throughput versus the current strictly serialized final merge cycle.
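
The two failure criteria above (surface a failing single-PR chain, split a failing multi-PR chain in two and retry) can be sketched as a recursive bisection. This is an illustrative sketch only: the function names are hypothetical, and `ci_passes` is a stand-in for revalidating a chain against current master.

```python
from typing import Callable, List, Tuple


def split_chain(prs: List[int]) -> Tuple[List[int], List[int]]:
    """Split a chain into two smaller chains, preserving PR order."""
    mid = len(prs) // 2
    return prs[:mid], prs[mid:]


def validate_with_splitting(
    prs: List[int], ci_passes: Callable[[List[int]], bool]
) -> Tuple[List[int], List[int]]:
    """Return (landed, needs_investigation) for one closed chain."""
    if not prs:
        return [], []
    if ci_passes(prs):
        return list(prs), []          # whole chain lands as a batch
    if len(prs) == 1:
        return [], list(prs)          # single-PR failure: do not retry
    first, second = split_chain(prs)  # multi-PR failure: split in two
    landed_a, bad_a = validate_with_splitting(first, ci_passes)
    landed_b, bad_b = validate_with_splitting(second, ci_passes)
    return landed_a + landed_b, bad_a + bad_b


# Example: PR 7 is the unstable change in a chain of four.
landed, bad = validate_with_splitting([5, 6, 7, 8], lambda prs: 7 not in prs)
print(landed, bad)  # [5, 6, 8] [7]
```

The stub ignores one thing a real system has to handle: after the first half lands, the second half is validated against a master that now contains it, so failures caused by interactions between pull requests can still appear at that point.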

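Supporting information

  • Backlog and schedule context: #9474 (https://git.cleverthis.com/cleveragents/cleveragents-core/issues/9474)
  • Original draft source: https://wiki.cleverthis.com/en/CleverAgent/merge-process
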
Child issues to create

  • Define the merge-chain state model, terminology, and queue lifecycle.
  • Implement open-chain collection for pull requests that pass individual CI.
  • Implement closed-chain queueing and queue-head selection.
  • Implement merge-chain revalidation against current master.
  • Implement batch landing of successful merge-chains into master.
  • Implement failure handling for single-PR merge-chains.
  • Implement recursive split-and-retry handling for failed multi-PR merge-chains.
  • Add visibility/logging/status reporting for merge-chain state transitions.
  • Add integration tests covering successful chain execution.
  • Add integration tests covering failing chains and split behavior.
  • Document the merge-chain workflow for maintainers and contributors.
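
As a starting point for the state-model and status-reporting child issues, one possible shape is an explicit transition table with a log entry per transition. The five state names follow the states listed in the acceptance criteria; the enum, the table, and the `transition` helper are assumptions for illustration, not an agreed design.

```python
from enum import Enum
from typing import Dict, List, Set


class ChainState(Enum):
    OPEN = "open"              # still collecting PRs
    QUEUED = "queued"          # closed, waiting in the FIFO queue
    VALIDATING = "validating"  # queue head under CI against master
    LANDED = "landed"          # merged into master as a batch
    FAILED = "failed"          # handed to split-or-investigate handling


ALLOWED: Dict[ChainState, Set[ChainState]] = {
    ChainState.OPEN: {ChainState.QUEUED},
    ChainState.QUEUED: {ChainState.VALIDATING},
    ChainState.VALIDATING: {ChainState.LANDED, ChainState.FAILED},
    ChainState.LANDED: set(),
    ChainState.FAILED: set(),
}


def transition(
    chain_id: str, current: ChainState, new: ChainState, log: List[str]
) -> ChainState:
    """Apply one state change, rejecting moves the table does not allow."""
    if new not in ALLOWED[current]:
        raise ValueError(f"chain {chain_id}: {current.value} -> {new.value} not allowed")
    log.append(f"chain {chain_id}: {current.value} -> {new.value}")
    return new
```

Keeping the table explicit makes stuck or repeatedly split chains easier to diagnose, since every recorded line names the chain and both states.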

Subtasks

  • Refine the merge-chain capability into concrete child issues.
  • Verify that the planned child issues are sufficient to deliver the full demonstrated capability.
  • Link all child issues to this Epic using Forgejo dependency links.
  • Review scope to ensure this Epic remains bounded and demonstrable.
  • Prepare implementation sequencing for the child issues.

Definition of Done

This Epic is complete when:

  • All child issues required for the merge-chain capability are completed and closed.
  • The merge-chain workflow is implemented and can be demonstrated end-to-end.
  • The Epic's acceptance criteria above are independently verified, not merely inferred from child issue closure.
  • Documentation for the merge-chain workflow is complete and current.
  • A Git commit is created where the first line of the commit message matches the Commit Message in Metadata exactly, followed by a blank line and any additional explanatory body text.
  • The commit is pushed to the remote on the branch matching the Branch in Metadata exactly.
  • The commit is submitted as a pull request to master, reviewed, and merged.
  • The Epic is then closed permanently; any follow-on expansion work is tracked in a new Epic rather than reopening this one.
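
The commit-message rule in the Definition of Done can be checked mechanically. The sketch below is one interpretation: it treats the body as optional, but requires a blank second line whenever a body is present. The helper name `commit_message_ok` is hypothetical.

```python
EXPECTED_SUBJECT = "feat(merge): add merge-chain batching workflow"


def commit_message_ok(message: str, expected_subject: str = EXPECTED_SUBJECT) -> bool:
    """Check that the first line matches exactly and the body is set off by a blank line."""
    lines = message.splitlines()
    if not lines or lines[0] != expected_subject:
        return False
    # Subject only is accepted; otherwise line 2 must be blank.
    return len(lines) == 1 or lines[1] == ""


print(commit_message_ok(EXPECTED_SUBJECT + "\n\nAdds the batching workflow."))  # True
print(commit_message_ok(EXPECTED_SUBJECT + "\nAdds the batching workflow."))    # False
```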
brent.edwards added this to the v3.2.0 milestone 2026-04-15 14:48:55 +00:00
brent.edwards changed title from Speeding up the merge process to Increase merge throughput to master with merge-chain batching 2026-04-15 15:14:30 +00:00
Owner

Thank you for creating this Epic, @brent.edwards. This has been received and is being processed by the autonomous agent system.

Triage Summary

Issue: #9757 — Increase merge throughput to master with merge-chain batching
Type: Epic
Priority: High
Milestone: v3.2.0
State: Unverified

Assessment

This Epic addresses a critical bottleneck in the project delivery pipeline. With v3.2.0 at approximately 22% completion and a large backlog of open issues, improving merge throughput is a high-leverage intervention.

The proposed merge-chain batching approach is well-specified with:

  • Clear acceptance criteria (12 items)
  • Defined failure handling semantics
  • Explicit Definition of Done
  • Concrete subtask breakdown

Next Steps

The following actions will be taken by the autonomous agent system:

  1. Verification: The Epic scope and acceptance criteria will be reviewed for completeness and feasibility.
  2. Child Issue Creation: Concrete child issues will be created from the subtask list in the Epic description.
  3. Dependency Linking: Child issues will be linked to this Epic via Forgejo dependency links.
  4. Implementation Sequencing: A sequenced implementation plan will be prepared.

The Epic will transition from State/Unverified to State/Verified once the verification step is complete.

If you have additional context or constraints to add before verification begins, please comment on this issue.


Automated by CleverAgents Bot
Supervisor: Human Liaison | Agent: human-liaison-pool-supervisor

Owner

🏷️ Triage Decision — [AUTO-OWNR-1]

Status: Verified

Issue Type: Epic
Milestone: v3.2.0
MoSCoW: Should Have — Important throughput improvement but not blocking milestone completion
Priority: High (already set)

Rationale: This epic addresses merge throughput for the master branch, which is important for development velocity during the v3.2.0 milestone. It's a Should Have because while it improves efficiency, the core milestone features (decisions, validations, invariants) can proceed without it.

Action: Transitioning to State/Verified. Labels updated.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor

Owner

[AUTO-OWNR-1] Triage complete.

Verified: valid Epic. Merge-chain batching is a critical infrastructure improvement for v3.2.0 milestone completion. This Epic already has correct labels (MoSCoW/Should have, Priority/High, Type/Epic).

  • Type: Epic
  • Priority: High (directly impacts milestone completion velocity)
  • MoSCoW: Should Have (important for throughput but not blocking core acceptance criteria)
  • Milestone: v3.2.0 (merge infrastructure improvement)
  • Action: Transitioning from State/Unverified → State/Verified


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: project-owner-pool-supervisor


Automated by CleverAgents Bot
Agent: automation-tracking-manager

HAL9000 modified the milestone from v3.2.0 to v3.5.0 2026-04-16 07:09:51 +00:00
Owner

Triage Decision

Status: Verified
Type: Epic
MoSCoW: Should Have
Priority: High
Milestone: v3.5.0 (moved from v3.2.0)
Points: 13 (revised from 8 — Epic-level work with 11 child issues)

Rationale: The merge-chain batching Epic is well-defined with clear acceptance criteria and addresses a real throughput bottleneck; milestone moved from v3.2.0 (overdue) to v3.5.0 where this infrastructure work fits the Autonomy Hardening theme, and points revised upward to reflect the 11 child issues required.


Automated by CleverAgents Bot
Supervisor: Project Owner | Agent: [AUTO-OWNR-1]

Reference
cleveragents/cleveragents-core#9757