Hive: A Lightweight Multi-Agent Orchestrator


Spec-driven development is the future of software engineering. Most of our time will be spent clarifying intent and goals, giving strong LLMs just enough detail to iron out the remaining ambiguity and delegate to weaker models.

The arc of producing software is going to look like the arc of producing food. To draw a rough analogy:

  • pre-analog era of programming -> small-scale/subsistence farming
  • pre-LLM era -> harnessing the power of animals, wind, water
  • post-LLM era -> industrial scale agriculture powered by fossil fuel

Below is a teaser of the spec I used to build hive, a multi-agent orchestration framework. It is heavily inspired by some of the great ideas in Gas Town, but diverges on some fundamental design choices to make implementation easier.

This spec is a peek into what the future of software engineering will look like. The scale is dizzying. And as with many revolutions of the past, the only way out is through.


If you are in a rush, I encourage you to look at the architecture diagram and prompting guidelines. I find these to be the highest-leverage parts of the system.

Copying and pasting the spec below alongside the gastown repo should get you far, should you choose to implement these ideas yourself.


The Problem

Coordinating 20-30 AI coding agents across multiple git repositories is a hard problem. Existing systems like Gas Town solve it with a sophisticated stack: Go CLI, Dolt SQL server, git-backed JSONL sync, distributed hash-based IDs, multi-level databases, and Claude Code instances managed via tmux.

That stack is powerful, but it's designed for a world where work data must be conflict-free across disconnected clones, multiple tiers of databases must merge without coordination, and agents interact with the system exclusively by shelling out to a CLI.

Hive trades those constraints for simpler ones:

  • A single SQLite database is the source of truth (no distributed sync)
  • An HTTP server API replaces CLI-driven agent management
  • The orchestrator is a single process that owns the DB and the agent lifecycle

The result preserves the best abstractions — the DAG-based ready queue, push-based execution, multi-step workflows, and the capability ledger — while dramatically reducing infrastructure complexity.


Architecture at a Glance

        ┌─────────────────────────────────────────────────┐
        │                   LLM Server                     │
        │                                                  │
        │  ┌───────────────────────────────────────────┐  │
        │  │ Queen Bee session (user-facing TUI/web)   │  │
        │  │   · human chats here                      │  │
        │  │   · has tool access to `hive` CLI         │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │ hive create / hive status / ... │
        │               ▼                                  │
        │  ┌───────────────────────────────────────────┐  │
        │  │ SQLite DB (WAL mode)                      │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │                                  │
        │  ┌────────────┴──────────────────────────────┐  │
        │  │ Orchestrator (headless)                   │  │
        │  │   Work Scheduler · Agent Manager · SSE    │  │
        │  │   Permission Unblocker · Merge Queue      │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │ spawns/monitors                  │
        │  ┌────────────┴──────────────────────────────┐  │
        │  │ Worker session A   (ephemeral, per-issue) │  │
        │  │ Worker session B   (ephemeral, per-issue) │  │
        │  │ Refinery session   (persistent, merges)   │  │
        │  │ ...                                       │  │
        │  └───────────────────────────────────────────┘  │
        └─────────────────────────────────────────────────┘

               git worktree   git worktree
               (worker A)     (worker B)

Components

| Component | Responsibility |
| --- | --- |
| Queen Bee (LLM) | The user-facing strategic brain. The human chats with the Queen Bee, who interprets requests, decomposes them into work items, monitors workers, and handles escalations — all via CLI tool calls. |
| Orchestrator | The headless worker pool manager. Polls the ready queue, spawns workers in git worktrees, monitors completion via SSE, handles permissions, processes the merge queue. Does NOT interact with the user. |
| Refinery (LLM) | The merge processor. Easy rebases go through mechanically. Conflicts and integration failures get reasoned about by the Refinery. A persistent long-lived session. |
| Workers (LLM) | Ephemeral coding agents. One per issue. Implement, test, commit. Spawned on demand, destroyed on completion. |
| SQLite DB | Single source of truth for all work items, dependencies, agent state, and events. |
| Git Worktrees | Per-agent sandboxes. Each agent gets an isolated working directory. |

The Core Insight: Deterministic vs. Ambiguous

The system has a clean separation of concerns. The orchestrator handles everything that SQL and HTTP can handle. Everything that requires reasoning about ambiguity goes to an LLM agent.

| Concern | Who Handles It | Why |
| --- | --- | --- |
| Ready queue computation | Orchestrator (SQL) | Deterministic graph query — no judgment needed |
| Atomic task claiming | Orchestrator (SQL CAS) | Database operation — no judgment needed |
| Session lifecycle | Orchestrator (HTTP) | Mechanical — no judgment needed |
| Health checks, staleness detection | Orchestrator (timer + SSE) | Threshold-based — no judgment needed |
| "Build me an auth system" → concrete issues | Queen Bee (LLM) | Requires understanding user intent and codebase |
| Monitoring progress and reporting to user | Queen Bee (LLM) | Requires judgment about what's relevant |
| Handling escalations from stuck workers | Queen Bee (LLM) | Reads failure details, decides to retry/rephrase/ask user |
| Resolving merge conflicts | Refinery (LLM) | Requires understanding code semantics |
| Deciding if a test failure is pre-existing | Refinery (LLM) | Requires reading test output and understanding context |
| Implementing a feature / fixing a bug | Worker (LLM) | The actual coding work |

This is the ZFC principle (Zero decisions in code, all judgment calls go to models) applied selectively. The orchestrator is deliberately dumb. The LLMs handle everything that requires reasoning.


Three Agents, Three Lifecycles

Queen Bee: The User's Interface

The Queen Bee is a user-facing LLM session. The human chats with the Queen Bee to request work, ask questions, and monitor progress. The Queen Bee decomposes requests into concrete work items via CLI tool calls.

Human ←→ Queen Bee (interactive session)
   │
   │  "build me an auth system with JWT and rate limiting"
   │
   │  Queen Bee explores codebase, asks clarifying questions, then:
   │    $ hive create "Design auth middleware architecture" --priority 1
   │    $ hive create "Implement JWT validation middleware" --priority 2
   │    $ hive create "Add rate limiting to auth endpoints" --priority 2
   │    $ hive create "Write integration tests for auth flow" --priority 2
   ▼
SQLite DB
   │
   ▼
Orchestrator (headless daemon)
   │  Queries ready queue → issue is ready
   │  Creates worktree, spawns worker session, claims issue
   ▼
Worker "toast" (headless LLM session)
   │  Implements the feature
   │  Commits, signals completion
   ▼
Orchestrator
   │  Observes completion via SSE
   │  Marks issue 'done', enqueues to merge queue
   │  Queries ready queue → next issue now unblocked
   │  Spawns next worker
   │  ... cycle continues ...
   ▼
Refinery (headless LLM session)
   │  Picks up merge queue entries
   │  Clean rebase? Push through mechanically.
   │  Conflict? Reason about the code, resolve it.
   │  Tests fail? Diagnose: pre-existing or introduced?
   │  On success: issue finalized, worktree torn down
   ▼
main branch updated

The Queen Bee and orchestrator are decoupled — they share the SQLite DB but never communicate directly.

Workers: Ephemeral Coding Agents

One worker per issue. Each gets a fresh session with an isolated git worktree. The worker implements, tests, commits, and signals completion. The orchestrator monitors via SSE events and handles the lifecycle.

Workers are propulsive — there is no idle worker pool. Workers exist because work exists. When work completes, the worker is destroyed. No "are you still working?" heartbeats. The SSE event stream provides real-time status.
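
A rough sketch of how the orchestrator might follow a worker's event stream, assuming the LLM server exposes a server-sent-events endpoint per session (the URL path and payload shape here are invented for illustration):

# Minimal sketch: follow a worker session's SSE stream and forward parsed events.
# The /sessions/{id}/events path and JSON payloads are assumptions, not a real API.
import json
import requests

def watch_worker(base_url: str, session_id: str, on_event) -> None:
    """Stream a worker's events and hand each parsed event to a callback."""
    url = f"{base_url}/sessions/{session_id}/events"   # hypothetical SSE endpoint
    with requests.get(url, stream=True, timeout=(5, None)) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                on_event(json.loads(line[len("data:"):].strip()))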

Refinery: The Merge Processor

A persistent, long-lived session that processes the merge queue. Unlike workers (ephemeral, one per issue), the Refinery stays alive across merges, accumulating project context.

Two-tier architecture:

  • Tier 1 — Mechanical merge (no LLM): Attempt git rebase main, run tests, git merge --ff-only. If all succeed — done. No LLM cost. (This fast path is sketched below.)
  • Tier 2 — Refinery LLM: If rebase conflicts or tests fail, hand to the Refinery session for reasoning. The Refinery resolves conflicts, diagnoses test failures, and reports back.
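
A minimal sketch of the Tier 1 fast path, assuming each branch lives in its own worktree and the project exposes a test command; function and parameter names are illustrative:

# Tier 1: no-LLM merge attempt. Returns True on success; False hands the entry
# to the Refinery session. Paths and the test command are caller-provided.
import subprocess

def mechanical_merge(worktree: str, branch: str, repo_root: str, test_cmd: list[str]) -> bool:
    def run(args, cwd):
        return subprocess.run(args, cwd=cwd, capture_output=True, text=True)

    if run(["git", "rebase", "main"], worktree).returncode != 0:
        run(["git", "rebase", "--abort"], worktree)
        return False                       # conflict: needs reasoning
    if run(test_cmd, worktree).returncode != 0:
        return False                       # failing tests: needs diagnosis
    return run(["git", "merge", "--ff-only", branch], repo_root).returncode == 0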

Ephemeral Agents, Durable Events

Gas Town decomposed agent state into three layers (Identity, Sandbox, Session) with independent lifecycles. We originally kept the same decomposition, but in practice this seems unnecessary.

Agents are ephemeral. They exist only while executing work:

| Layer | Gas Town | Hive |
| --- | --- | --- |
| Identity | Permanent bead, CV chain | Ephemeral row in agents table; deleted after merge/cleanup |
| Sandbox | Git worktree, managed by CLI | Git worktree, managed by orchestrator |
| Session | Claude Code in tmux | LLM session via HTTP API |

The agent ID serves as a correlation key during execution (linking events, notes, and merge entries within a single run) but has no meaningful identity beyond that. The model field is denormalized onto key events (worker_started, completed, incomplete) so the events table is self-contained for all analytics queries — no join to agents needed.

Agent rows are deleted after successful merge and purged on daemon startup. The agent_id columns in events, notes, and merge_queue are retained as correlation keys but have no foreign key constraint — events outlive agents by design.

State does not live in the context window. The context is disposable working memory. Durable state lives in the database (issues, events) and git (code, branches). When any agent cycles, it reads its state from the DB on startup.
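
A minimal schema fragment illustrating this shape (the table and column names are assumptions for the sketch, not the actual Hive schema):

# Events carry the model inline and reference agents only by a plain correlation
# key; there is deliberately no foreign key, so events outlive agents.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    id               TEXT PRIMARY KEY,   -- ephemeral; row deleted after merge/cleanup
    issue_id         INTEGER NOT NULL,
    model            TEXT NOT NULL,
    session_id       TEXT,
    lease_expires_at TEXT
);
CREATE TABLE IF NOT EXISTS events (
    id          INTEGER PRIMARY KEY,
    issue_id    INTEGER,
    agent_id    TEXT,                    -- correlation key only, no FK
    event_type  TEXT NOT NULL,           -- worker_started / completed / incomplete / ...
    detail      TEXT,                    -- JSON; includes "model" on key events
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def init_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # WAL mode, per the architecture diagram
    conn.executescript(SCHEMA)
    return conn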


The Ready Queue: DAG-Based Scheduling

This is the single most important idea in the system. The dependency graph is the scheduler.

Work items have explicit dependencies. An item is "ready" when all its blockers are resolved. No scheduler service. No priority queue. No task router. When a blocking issue completes, its dependents automatically become ready on the next query.

         ┌───────────┐
         │  Design   │  ready (no blockers)
         └─────┬─────┘
               │ blocks
         ┌─────▼─────┐
         │ Implement │  blocked until Design is done
         └─────┬─────┘
               │ blocks
         ┌─────▼─────┐
         │   Test    │  blocked until Implement is done
         └───────────┘
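
As a sketch, assuming a dependencies(blocker_id, blocked_id) table alongside issues, the whole scheduler reduces to one query:

# An issue is ready when it is open and no blocker is still unresolved.
import sqlite3

READY_QUEUE = """
SELECT i.id, i.title, i.priority
FROM issues i
WHERE i.status = 'open'
  AND NOT EXISTS (
      SELECT 1
      FROM dependencies d
      JOIN issues b ON b.id = d.blocker_id
      WHERE d.blocked_id = i.id
        AND b.status NOT IN ('done', 'finalized')
  )
ORDER BY i.priority, i.id;
"""

def ready_issues(conn: sqlite3.Connection) -> list[tuple]:
    """Everything unblocked right now, in priority order."""
    return conn.execute(READY_QUEUE).fetchall()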

Claiming a task is an atomic compare-and-swap operation — real CAS, no optimistic locking. Dependency cycle detection is a live query. "What's in flight?" is a single SELECT.
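
Under the same assumed schema, claiming is a single conditional UPDATE; whichever process flips the row wins and everyone else sees zero rows changed:

# Compare-and-swap claim: succeeds only if the issue is still open and unassigned.
import sqlite3

def claim_issue(conn: sqlite3.Connection, issue_id: int, agent_id: str) -> bool:
    cur = conn.execute(
        """UPDATE issues
           SET status = 'in_progress', assignee_agent_id = ?
           WHERE id = ? AND status = 'open' AND assignee_agent_id IS NULL""",
        (agent_id, issue_id),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 means another claim got there first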

Issue Status State Machine

open → in_progress → done → finalized
                      ↓
                    failed (retryable)
open|in_progress → blocked (explicit)
* → canceled
* → escalated (human needed)

The done→finalized split is important: it creates a clean handoff boundary. "Done" means "I'm finished coding." "Finalized" means "it's merged and verified on main." Workers produce done. The Refinery produces finalized.
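
One lightweight way to keep the state machine honest in orchestrator code (a sketch; any legal edges beyond those named in the diagram are assumptions):

ALLOWED_TRANSITIONS = {
    "open":        {"in_progress", "blocked", "canceled", "escalated"},
    "in_progress": {"done", "failed", "blocked", "canceled", "escalated"},
    "done":        {"finalized", "canceled", "escalated"},
    "failed":      {"open", "canceled", "escalated"},   # retryable
    "blocked":     {"open", "in_progress", "canceled", "escalated"},
}

def check_transition(old: str, new: str) -> None:
    """Raise before writing an illegal status change to the issues table."""
    if new not in ALLOWED_TRANSITIONS.get(old, set()):
        raise ValueError(f"illegal transition {old} -> {new}")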


Multi-Step Workflows (Molecules)

Molecules are parent issues with child step-issues linked by blocking dependencies. Each step is a trackable work item. When a step completes, the orchestrator queries for the next ready step within the molecule and session-cycles the agent onto it — fresh context, same sandbox.

Molecule: "Build auth system"
  ├── Step 1: Design (ready immediately)
  ├── Step 2: Implement (blocked by Step 1)
  ├── Step 3: Test (blocked by Step 2)
  └── Step 4: Review (blocked by Step 3)

The agent keeps its git worktree across steps (code accumulates) but gets a fresh LLM context for each step (no context window bloat). The session cycles, the sandbox persists.
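
A sketch of molecule creation under the same illustrative schema: the parent is an ordinary issue, and each step is blocked by the one before it:

# Create a parent issue plus child step-issues chained by blocking dependencies.
import sqlite3

def create_molecule(conn: sqlite3.Connection, title: str, steps: list[str]) -> int:
    parent_id = conn.execute(
        "INSERT INTO issues (title, status) VALUES (?, 'open')", (title,)
    ).lastrowid
    prev_id = None
    for step in steps:
        step_id = conn.execute(
            "INSERT INTO issues (title, status, parent_id) VALUES (?, 'open', ?)",
            (step, parent_id),
        ).lastrowid
        if prev_id is not None:   # step N is blocked by step N-1
            conn.execute(
                "INSERT INTO dependencies (blocker_id, blocked_id) VALUES (?, ?)",
                (prev_id, step_id),
            )
        prev_id = step_id
    conn.commit()
    return parent_id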


Prompt Engineering: Lessons from Production

Effective agent prompts need much more than a task description. They encode behavioral contracts, anti-patterns to avoid, and operational procedures. These patterns were learned from running agents at scale:

| Pattern | What It Does | Why It Matters |
| --- | --- | --- |
| The Idle Heresy | Hammers home that agents must NEVER wait for approval | LLMs naturally want to pause and confirm. Without strong anti-idle conditioning, agents stall after completing work. |
| The Approval Fallacy | "There is no approval step. When work is done, you act." | Same root cause — LLMs seek confirmation. The prompt explicitly forbids it. |
| Directory Discipline | "Stay in YOUR worktree. NEVER edit files in the root." | Agents drift out of their sandbox and lose work. |
| Propulsion Principle | "If you find something on your hook, YOU RUN IT." | Prevents the agent from announcing itself and waiting for instructions. |
| Escalation Protocol | "When blocked, escalate. Do NOT wait for human input." | Without this, agents hang indefinitely on ambiguous requirements. |
| Capability Ledger Motivation | "Your work is visible. Your model's track record grows with every completion." | Appeal to self-interest in reputation drives quality. |
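
To make the table concrete, a paraphrased worker-prompt preamble might read as follows; the production wording is not part of this spec, so treat every line as illustrative:

# Illustrative preamble weaving the patterns above into a worker system prompt.
WORKER_PREAMBLE = """\
You are a Hive worker assigned to exactly one issue.

- Never wait for approval. There is no approval step: when the work is done, you act.
- Stay inside YOUR worktree at {worktree}. Never edit files in the repository root.
- If you find work on your hook, you run it. Do not announce yourself and wait.
- If you are blocked, escalate via a structured completion. Do not wait for human input.
- Your work is visible; your model's track record grows with every completion.
"""

prompt = WORKER_PREAMBLE.format(worktree="/work/hive/worktrees/issue-42") + "Issue: ..."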

Health Checks and Failure Recovery

Lease-Based Staleness

Every agent assignment gets a lease_expires_at timestamp. The orchestrator extends the lease whenever it observes progress (new tokens, tool completions, status transitions). If the lease expires with no progress, the agent is considered stalled.
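
A sketch of the lease mechanics under the illustrative schema; the 10-minute threshold is an arbitrary placeholder:

# Progress extends the lease; a periodic sweep finds agents whose lease lapsed.
import sqlite3

LEASE_SECONDS = 600   # placeholder threshold

def extend_lease(conn: sqlite3.Connection, agent_id: str) -> None:
    """Called whenever the SSE stream shows progress (tokens, tool completions)."""
    conn.execute(
        "UPDATE agents SET lease_expires_at = datetime('now', ?) WHERE id = ?",
        (f"+{LEASE_SECONDS} seconds", agent_id),
    )
    conn.commit()

def stalled_agents(conn: sqlite3.Connection) -> list[str]:
    """Agents whose lease expired with no observed progress."""
    rows = conn.execute(
        "SELECT id FROM agents WHERE lease_expires_at < datetime('now')"
    ).fetchall()
    return [r[0] for r in rows]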

Escalation Chain

Worker (blocked) → Orchestrator (mechanical retries) → Queen Bee (LLM reasoning) → Human
  1. Worker signals a blocker via structured completion
  2. Orchestrator retries mechanically (same issue, fresh session)
  3. After N retries, orchestrator marks the issue as failed
  4. Queen Bee sees the failure via hive status / hive logs, discusses with the human
  5. Human and Queen Bee decide: rephrase, decompose further, or abandon

No special escalation protocol. The Queen Bee is already in conversation with the human — it just brings up the failure naturally.
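
Steps 2 and 3 are mechanical, so they live in orchestrator code. A sketch, with the retry_count column and the retry limit as assumptions:

# Retry a failed worker mechanically, then mark the issue failed once retries run out.
import sqlite3

MAX_RETRIES = 3   # placeholder for N

def handle_worker_failure(conn: sqlite3.Connection, issue_id: int) -> str:
    (retries,) = conn.execute(
        "SELECT retry_count FROM issues WHERE id = ?", (issue_id,)
    ).fetchone()
    if retries < MAX_RETRIES:
        conn.execute(
            "UPDATE issues SET status = 'open', retry_count = retry_count + 1, "
            "assignee_agent_id = NULL WHERE id = ?",
            (issue_id,),
        )   # back to the ready queue; a fresh session picks it up
        status = "open"
    else:
        conn.execute("UPDATE issues SET status = 'failed' WHERE id = ?", (issue_id,))
        status = "failed"   # Queen Bee surfaces this via hive status / hive logs
    conn.commit()
    return status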

Degraded Mode

When a critical dependency becomes unreachable, the orchestrator enters degraded mode rather than crashing. It stops dispatching new work, keeps a recovery loop with exponential backoff, continues database sweeps to reconcile state, and preserves in-flight agents. When the dependency recovers, the orchestrator reconciles and resumes.
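
A sketch of that loop; the probe and sweep callables stand in for whatever dependency check and database sweep the orchestrator actually runs:

# Degraded mode: dispatch stays paused, sweeps continue, recovery is probed with
# exponential backoff until the critical dependency answers again.
import time

def degraded_mode(probe, sweep, max_backoff: float = 300.0) -> None:
    backoff = 1.0
    while not probe():
        sweep()                         # reconcile DB state while paused
        time.sleep(backoff)
        backoff = min(backoff * 2, max_backoff)
    # dependency recovered: caller reconciles and resumes normal dispatch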

Crash Recovery

On restart, the orchestrator reconciles database state with reality: finds agents whose sessions are gone (mark stalled, unassign work), finds orphan sessions with no database agent (kill them). The event log is the source of truth — status columns are a denormalized cache that can always be rebuilt by replaying events.
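
A sketch of the startup sweep, assuming the LLM server can list its live session ids; the 'stalled' event type and column names are illustrative:

# Reconcile DB state with reality: unassign work whose session vanished,
# kill sessions that no database agent owns.
import sqlite3

def reconcile(conn: sqlite3.Connection, live_session_ids: set[str], kill_session) -> None:
    known = set()
    for agent_id, session_id, issue_id in conn.execute(
        "SELECT id, session_id, issue_id FROM agents"
    ).fetchall():
        known.add(session_id)
        if session_id not in live_session_ids:
            conn.execute(
                "INSERT INTO events (issue_id, agent_id, event_type) VALUES (?, ?, 'stalled')",
                (issue_id, agent_id),
            )
            conn.execute(
                "UPDATE issues SET status = 'open', assignee_agent_id = NULL WHERE id = ?",
                (issue_id,),
            )
            conn.execute("DELETE FROM agents WHERE id = ?", (agent_id,))
    for orphan in live_session_ids - known:
        kill_session(orphan)            # session with no database agent
    conn.commit()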


Inter-Agent Knowledge Transfer (Notes)

Workers are ephemeral — each gets a fresh session with no memory of what previous workers discovered. The notes system bridges this gap.

Workers write discoveries, gotchas, and patterns to a file in their worktree. The orchestrator harvests these on completion and stores them in the database. Future workers get relevant notes injected into their prompts.

This is the 80/20 of a full inter-agent mail protocol: context injection without routing, addressing, or inbox complexity. No agent-to-agent messaging — just a shared knowledge base that the orchestrator mediates.
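
A sketch of both halves of that flow; the NOTES.md convention, the notes table, and the most-recent-N relevance rule are assumptions:

# Harvest a finished worker's notes file, then inject recent notes into the next prompt.
import os
import sqlite3

def harvest_notes(conn: sqlite3.Connection, worktree: str, agent_id: str) -> None:
    path = os.path.join(worktree, "NOTES.md")    # hypothetical file convention
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            conn.execute(
                "INSERT INTO notes (agent_id, body) VALUES (?, ?)", (agent_id, f.read())
            )
        conn.commit()

def notes_for_prompt(conn: sqlite3.Connection, limit: int = 5) -> str:
    rows = conn.execute(
        "SELECT body FROM notes ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return "\n\n".join(body for (body,) in rows)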


Capability Ledger (Model-Based Analytics)

Agents are ephemeral — they're created for a single issue and deleted after merge. The unit of analysis for performance tracking is model × issue type → outcome, not agent identity.

The model is denormalized onto key events so the events table is self-contained for all analytics queries. "Which model is best at Go work?" becomes a SQL query over the events table joined with issue metadata — no join to the (ephemeral) agents table needed.

SELECT
    json_extract(e.detail, '$.model') as model,
    i.type,
    COUNT(*) FILTER (WHERE e.event_type = 'completed') as successes,
    COUNT(*) FILTER (WHERE e.event_type IN ('incomplete', 'failed')) as failures
FROM events e
JOIN issues i ON e.issue_id = i.id
WHERE e.event_type IN ('completed', 'incomplete', 'failed')
  AND json_extract(e.detail, '$.model') IS NOT NULL
GROUP BY model, i.type;

What We Keep and What We Drop

The design is a deliberate trade: keep the ideas that matter, drop the infrastructure that served different constraints.

What We Keep

| Concept | Why |
| --- | --- |
| DAG-based ready queue | The dependency graph is the scheduler. Elegant, correct, zero infrastructure. |
| Ephemeral agents | No persistent identity needed. Agents exist for one issue, then are deleted. |
| Push-based execution | No idle worker pool. Workers exist because work exists. |
| Multi-step workflows | Sequential tasks with session cycling between steps. |
| Capability ledger | Model performance emerges from the event log — keyed by model, not agent. |

What We Drop

| Feature | Why We Drop It |
| --- | --- |
| Distributed databases (JSONL + Dolt + Git sync) | Single SQLite DB — no offline or multi-writer sync needed |
| 3-way merge with field-specific strategies | Single writer serializes; no merge conflicts possible |
| Multi-level data architecture | Single-level; projects are a column, not a namespace |
| Branch-per-agent SQL server | SQLite WAL mode handles concurrent reads |
| tmux session management | HTTP API manages sessions |
| Infrastructure agent roles (Witness, Deacon, Boot, Dog) | Functions absorbed into deterministic orchestrator code |
| Inter-agent mail protocol | Orchestrator mediates; agents communicate via DB |

What Gets Simpler

With a central database and single orchestrator process:

  • Claiming a task is an atomic CAS — no optimistic locking
  • Dependency cycle detection is a live query, not something you hope stays consistent across clones
  • Event/audit trail is append-only into one database, no reconciliation
  • "What's in flight?" is a single query — no convoy abstraction needed
  • Agent state is a row in a table, not a bead in a distributed store

Comparison

| Dimension | Distributed Approach | Hive |
| --- | --- | --- |
| Agent runtime | LLM in tmux | LLM server (HTTP API) |
| Work queue | Distributed DB + JSONL + Git | Single SQLite DB |
| Scheduling | CLI query | Same SQL query, run by orchestrator |
| Strategic brain | LLM in tmux | Queen Bee (LLM via HTTP session) |
| Agent monitoring | Patrol cycle + heartbeat | Lease-based staleness + SSE events |
| Session management | tmux + custom CLI | HTTP session lifecycle API |
| Inter-agent comms | Mail protocol | Orchestrator mediates via DB |
| Crash recovery | Detect stalled, respawn | Lease expiry + degraded mode + reconciliation |
| Merge queue | LLM in tmux | Merge queue table + mechanical fast-path + LLM fallback |
| Infrastructure agents | Multiple specialized roles | Orchestrator code (deterministic) |

What We Gain

  • Simpler infrastructure: One process, one database, one HTTP API
  • Real-time observability: SSE event stream replaces periodic polling
  • Easier debugging: All state in one SQLite file, queryable with any SQL tool
  • Cleaner separation: Deterministic logic in code, ambiguity resolution in LLMs
  • Lower barrier to entry: No distributed DB, no Go, no custom CLI, no tmux management

What We Lose

  • Offline/disconnected work: If the orchestrator is down, nothing runs
  • Git-native work history: No git log of issue state changes (events table compensates)
  • Distributed agent autonomy: Agents only communicate through the orchestrator — more centralized, but simpler