Hive: A Lightweight Multi-Agent Orchestrator


Spec-driven development is the future of software engineering. Most of our time will be spent clarifying intent and goals, giving strong LLMs just enough detail to iron out the remaining ambiguity and delegate to weaker models.

The arc of producing software is going to look like the arc of producing food. To draw a rough analogy:

  • pre-analog era of programming -> small-scale/subsistence farming
  • pre-LLM era -> harnessing the power of animals, wind, water
  • post-LLM era -> industrial scale agriculture powered by fossil fuel

Below is a teaser of the spec I used to build hive, a multi-agent orchestration framework. It is heavily inspired by some of the great ideas in Gas Town, but diverges on some fundamental design choices to make implementation easier.

This spec is a peek into what the future of software engineering will look like. The scale is dizzying. And as with many revolutions of the past, the only way out is through.


If you are in a rush, I encourage you to look at the architecture diagram and prompting guidelines. I find these to be the highest-leverage parts of the system.

Copying and pasting the spec below alongside the gastown repo should get you far, should you choose to implement these ideas yourself.


The Problem

Coordinating 20-30 AI coding agents across multiple git repositories is a hard problem. Existing systems like Gas Town solve it with a sophisticated stack: Go CLI, Dolt SQL server, git-backed JSONL sync, distributed hash-based IDs, multi-level databases, and Claude Code instances managed via tmux.

That stack is powerful, but it's designed for a world where work data must be conflict-free across disconnected clones, multiple tiers of databases must merge without coordination, and agents interact with the system exclusively by shelling out to a CLI.

Hive trades those constraints for simpler ones:

  • A single SQLite database is the source of truth (no distributed sync)
  • An HTTP server API replaces CLI-driven agent management
  • The orchestrator is a single process that owns the DB and the agent lifecycle

The result preserves the best abstractions — the DAG-based ready queue, push-based execution, multi-step workflows, and the capability ledger — while dramatically reducing infrastructure complexity.


Architecture at a Glance

        ┌─────────────────────────────────────────────────┐
        │                   LLM Server                     │
        │                                                  │
        │  ┌───────────────────────────────────────────┐  │
        │  │ Queen Bee session (user-facing TUI/web)   │  │
        │  │   · human chats here                      │  │
        │  │   · has tool access to `hive` CLI         │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │ hive create / hive status / ... │
        │               ▼                                  │
        │  ┌───────────────────────────────────────────┐  │
        │  │ SQLite DB (WAL mode)                      │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │                                  │
        │  ┌────────────┴──────────────────────────────┐  │
        │  │ Orchestrator (headless)                   │  │
        │  │   Work Scheduler · Agent Manager · SSE    │  │
        │  │   Permission Unblocker · Merge Queue      │  │
        │  └────────────┬──────────────────────────────┘  │
        │               │ spawns/monitors                  │
        │  ┌────────────┴──────────────────────────────┐  │
        │  │ Worker session A   (ephemeral, per-issue) │  │
        │  │ Worker session B   (ephemeral, per-issue) │  │
        │  │ Refinery session   (persistent, merges)   │  │
        │  │ ...                                       │  │
        │  └───────────────────────────────────────────┘  │
        └─────────────────────────────────────────────────┘

               git worktree   git worktree
               (worker A)     (worker B)

Components

| Component | Responsibility |
| --- | --- |
| Queen Bee (LLM) | The user-facing strategic brain. The human chats with the Queen Bee, who interprets requests, decomposes them into work items, monitors workers, and handles escalations — all via CLI tool calls. |
| Orchestrator | The headless worker pool manager. Polls the ready queue, spawns workers in git worktrees, monitors completion via SSE, handles permissions, processes the merge queue. Does NOT interact with the user. |
| Refinery (LLM) | The merge processor. Easy rebases go through mechanically. Conflicts and integration failures get reasoned about by the Refinery. A persistent long-lived session. |
| Workers (LLM) | Ephemeral coding agents. One per issue. Implement, test, commit. Spawned on demand, destroyed on completion. |
| SQLite DB | Single source of truth for all work items, dependencies, agent state, and events. |
| Git Worktrees | Per-agent sandboxes. Each agent gets an isolated working directory. |

The Core Insight: Deterministic vs. Ambiguous

The system has a clean separation of concerns. The orchestrator handles everything that SQL and HTTP can handle. Everything that requires reasoning about ambiguity goes to an LLM agent.

| Concern | Who Handles It | Why |
| --- | --- | --- |
| Ready queue computation | Orchestrator (SQL) | Deterministic graph query — no judgment needed |
| Atomic task claiming | Orchestrator (SQL CAS) | Database operation — no judgment needed |
| Session lifecycle | Orchestrator (HTTP) | Mechanical — no judgment needed |
| Health checks, staleness detection | Orchestrator (timer + SSE) | Threshold-based — no judgment needed |
| "Build me an auth system" → concrete issues | Queen Bee (LLM) | Requires understanding user intent and codebase |
| Monitoring progress and reporting to user | Queen Bee (LLM) | Requires judgment about what's relevant |
| Handling escalations from stuck workers | Queen Bee (LLM) | Reads failure details, decides to retry/rephrase/ask user |
| Resolving merge conflicts | Refinery (LLM) | Requires understanding code semantics |
| Deciding if a test failure is pre-existing | Refinery (LLM) | Requires reading test output and understanding context |
| Implementing a feature / fixing a bug | Worker (LLM) | The actual coding work |

This is the ZFC principle (Zero decisions in code, all judgment calls go to models) applied selectively. The orchestrator is deliberately dumb. The LLMs handle everything that requires reasoning.


Three Agents, Three Lifecycles

Queen Bee: The User's Interface

The Queen Bee is a user-facing LLM session. The human chats with the Queen Bee to request work, ask questions, and monitor progress. The Queen Bee decomposes requests into concrete work items via CLI tool calls.

Human ←→ Queen Bee (interactive session)
   │
   │  "build me an auth system with JWT and rate limiting"
   │
   │  Queen Bee explores codebase, asks clarifying questions, then:
   │    $ hive create "Design auth middleware architecture" --priority 1
   │    $ hive create "Implement JWT validation middleware" --priority 2
   │    $ hive create "Add rate limiting to auth endpoints" --priority 2
   │    $ hive create "Write integration tests for auth flow" --priority 2
   ▼
SQLite DB
   │
   ▼
Orchestrator (headless daemon)
   │  Queries ready queue → issue is ready
   │  Creates worktree, spawns worker session, claims issue
   ▼
Worker "toast" (headless LLM session)
   │  Implements the feature
   │  Commits, signals completion
   ▼
Orchestrator
   │  Observes completion via SSE
   │  Marks issue 'done', enqueues to merge queue
   │  Queries ready queue → next issue now unblocked
   │  Spawns next worker
   │  ... cycle continues ...
   ▼
Refinery (headless LLM session)
   │  Picks up merge queue entries
   │  Clean rebase? Push through mechanically.
   │  Conflict? Reason about the code, resolve it.
   │  Tests fail? Diagnose: pre-existing or introduced?
   │  On success: issue finalized, worktree torn down
   ▼
main branch updated

The Queen Bee and orchestrator are decoupled — they share the SQLite DB but never communicate directly.

Workers: Ephemeral Coding Agents

One worker per issue. Each gets a fresh session with an isolated git worktree. The worker implements, tests, commits, and signals completion. The orchestrator monitors via SSE events and handles the lifecycle.

Workers are propulsive — there is no idle worker pool. Workers exist because work exists. When work completes, the worker is destroyed. No "are you still working?" heartbeats. The SSE event stream provides real-time status.
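
A rough sketch of how the orchestrator might follow a worker's event stream, assuming the LLM server exposes a server-sent-events endpoint per session (the URL path and payload shape here are invented for illustration):

# Minimal sketch: follow a worker session's SSE stream and forward parsed events.
# The /sessions/{id}/events path and JSON payloads are assumptions, not a real API.
import json
import requests

def watch_worker(base_url: str, session_id: str, on_event) -> None:
    """Stream a worker's events and hand each parsed event to a callback."""
    url = f"{base_url}/sessions/{session_id}/events"   # hypothetical SSE endpoint
    with requests.get(url, stream=True, timeout=(5, None)) as resp:
        resp.raise_for_status()
        for line in resp.iter_lines(decode_unicode=True):
            if line and line.startswith("data:"):
                on_event(json.loads(line[len("data:"):].strip()))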

Refinery: The Merge Processor

A persistent, long-lived session that processes the merge queue. Unlike workers (ephemeral, one per issue), the Refinery stays alive across merges, accumulating project context.

Two-tier architecture:

  • Tier 1 — Mechanical merge (no LLM): Attempt git rebase main, run tests, git merge --ff-only. If all succeed — done. No LLM cost. (This fast path is sketched below.)
  • Tier 2 — Refinery LLM: If rebase conflicts or tests fail, hand to the Refinery session for reasoning. The Refinery resolves conflicts, diagnoses test failures, and reports back.
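
A minimal sketch of the Tier 1 fast path, assuming each branch lives in its own worktree and the project exposes a test command; function and parameter names are illustrative:

# Tier 1: no-LLM merge attempt. Returns True on success; False hands the entry
# to the Refinery session. Paths and the test command are caller-provided.
import subprocess

def mechanical_merge(worktree: str, branch: str, repo_root: str, test_cmd: list[str]) -> bool:
    def run(args, cwd):
        return subprocess.run(args, cwd=cwd, capture_output=True, text=True)

    if run(["git", "rebase", "main"], worktree).returncode != 0:
        run(["git", "rebase", "--abort"], worktree)
        return False                       # conflict: needs reasoning
    if run(test_cmd, worktree).returncode != 0:
        return False                       # failing tests: needs diagnosis
    return run(["git", "merge", "--ff-only", branch], repo_root).returncode == 0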

Ephemeral Agents, Durable Events

Gas Town decomposed agent state into three layers (Identity, Sandbox, Session) with independent lifecycles. We originally kept the same decomposition, but in practice this seems unnecessary.

Agents are ephemeral. They exist only while executing work:

| Layer | Gas Town | Hive |
| --- | --- | --- |
| Identity | Permanent bead, CV chain | Ephemeral row in agents table; deleted after merge/cleanup |
| Sandbox | Git worktree, managed by CLI | Git worktree, managed by orchestrator |
| Session | Claude Code in tmux | LLM session via HTTP API |

The agent ID serves as a correlation key during execution (linking events, notes, and merge entries within a single run) but has no meaningful identity beyond that. The model field is denormalized onto key events (worker_started, completed, incomplete) so the events table is self-contained for all analytics queries — no join to agents needed.

Agent rows are deleted after successful merge and purged on daemon startup. The agent_id columns in events, notes, and merge_queue are retained as correlation keys but have no foreign key constraint — events outlive agents by design.

State does not live in the context window. The context is disposable working memory. Durable state lives in the database (issues, events) and git (code, branches). When any agent cycles, it reads its state from the DB on startup.
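
A minimal schema fragment illustrating this shape (the table and column names are assumptions for the sketch, not the actual Hive schema):

# Events carry the model inline and reference agents only by a plain correlation
# key; there is deliberately no foreign key, so events outlive agents.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS agents (
    id               TEXT PRIMARY KEY,   -- ephemeral; row deleted after merge/cleanup
    issue_id         INTEGER NOT NULL,
    model            TEXT NOT NULL,
    session_id       TEXT,
    lease_expires_at TEXT
);
CREATE TABLE IF NOT EXISTS events (
    id          INTEGER PRIMARY KEY,
    issue_id    INTEGER,
    agent_id    TEXT,                    -- correlation key only, no FK
    event_type  TEXT NOT NULL,           -- worker_started / completed / incomplete / ...
    detail      TEXT,                    -- JSON; includes "model" on key events
    created_at  TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def init_db(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # WAL mode, per the architecture diagram
    conn.executescript(SCHEMA)
    return conn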


The Ready Queue: DAG-Based Scheduling

This is the single most important idea in the system. The dependency graph is the scheduler.

Work items have explicit dependencies. An item is "ready" when all its blockers are resolved. No scheduler service. No priority queue. No task router. When a blocking issue completes, its dependents automatically become ready on the next query.

         ┌───────────┐
         │  Design   │  ready (no blockers)
         └─────┬─────┘
               │ blocks
         ┌─────▼─────┐
         │ Implement │  blocked until Design is done
         └─────┬─────┘
               │ blocks
         ┌─────▼─────┐
         │   Test    │  blocked until Implement is done
         └───────────┘
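
As a sketch, assuming a dependencies(blocker_id, blocked_id) table alongside issues, the whole scheduler reduces to one query:

# An issue is ready when it is open and no blocker is still unresolved.
import sqlite3

READY_QUEUE = """
SELECT i.id, i.title, i.priority
FROM issues i
WHERE i.status = 'open'
  AND NOT EXISTS (
      SELECT 1
      FROM dependencies d
      JOIN issues b ON b.id = d.blocker_id
      WHERE d.blocked_id = i.id
        AND b.status NOT IN ('done', 'finalized')
  )
ORDER BY i.priority, i.id;
"""

def ready_issues(conn: sqlite3.Connection) -> list[tuple]:
    """Everything unblocked right now, in priority order."""
    return conn.execute(READY_QUEUE).fetchall()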

Claiming a task is an atomic compare-and-swap operation — real CAS, no optimistic locking. Dependency cycle detection is a live query. "What's in flight?" is a single SELECT.
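
Under the same assumed schema, claiming is a single conditional UPDATE; whichever process flips the row wins and everyone else sees zero rows changed:

# Compare-and-swap claim: succeeds only if the issue is still open and unassigned.
import sqlite3

def claim_issue(conn: sqlite3.Connection, issue_id: int, agent_id: str) -> bool:
    cur = conn.execute(
        """UPDATE issues
           SET status = 'in_progress', assignee_agent_id = ?
           WHERE id = ? AND status = 'open' AND assignee_agent_id IS NULL""",
        (agent_id, issue_id),
    )
    conn.commit()
    return cur.rowcount == 1   # 0 means another claim got there first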

Issue Status State Machine

open → in_progress → done → finalized
                      ↓
                    failed (retryable)
open|in_progress → blocked (explicit)
* → canceled
* → escalated (human needed)

The done→finalized split is important: it creates a clean handoff boundary. "Done" means "I'm finished coding." "Finalized" means "it's merged and verified on main." Workers produce done. The Refinery produces finalized.
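
One lightweight way to keep the state machine honest in orchestrator code (a sketch; any legal edges beyond those named in the diagram are assumptions):

ALLOWED_TRANSITIONS = {
    "open":        {"in_progress", "blocked", "canceled", "escalated"},
    "in_progress": {"done", "failed", "blocked", "canceled", "escalated"},
    "done":        {"finalized", "canceled", "escalated"},
    "failed":      {"open", "canceled", "escalated"},   # retryable
    "blocked":     {"open", "in_progress", "canceled", "escalated"},
}

def check_transition(old: str, new: str) -> None:
    """Raise before writing an illegal status change to the issues table."""
    if new not in ALLOWED_TRANSITIONS.get(old, set()):
        raise ValueError(f"illegal transition {old} -> {new}")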


Multi-Step Workflows (Molecules)

Molecules are parent issues with child step-issues linked by blocking dependencies. Each step is a trackable work item. When a step completes, the orchestrator queries for the next ready step within the molecule and session-cycles the agent onto it — fresh context, same sandbox.

Molecule: "Build auth system"
  ├── Step 1: Design (ready immediately)
  ├── Step 2: Implement (blocked by Step 1)
  ├── Step 3: Test (blocked by Step 2)
  └── Step 4: Review (blocked by Step 3)

The agent keeps its git worktree across steps (code accumulates) but gets a fresh LLM context for each step (no context window bloat). The session cycles, the sandbox persists.
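
A sketch of molecule creation under the same illustrative schema: the parent is an ordinary issue, and each step is blocked by the one before it:

# Create a parent issue plus child step-issues chained by blocking dependencies.
import sqlite3

def create_molecule(conn: sqlite3.Connection, title: str, steps: list[str]) -> int:
    parent_id = conn.execute(
        "INSERT INTO issues (title, status) VALUES (?, 'open')", (title,)
    ).lastrowid
    prev_id = None
    for step in steps:
        step_id = conn.execute(
            "INSERT INTO issues (title, status, parent_id) VALUES (?, 'open', ?)",
            (step, parent_id),
        ).lastrowid
        if prev_id is not None:   # step N is blocked by step N-1
            conn.execute(
                "INSERT INTO dependencies (blocker_id, blocked_id) VALUES (?, ?)",
                (prev_id, step_id),
            )
        prev_id = step_id
    conn.commit()
    return parent_id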


Prompt Engineering: Lessons from Production

Effective agent prompts need much more than a task description. They encode behavioral contracts, anti-patterns to avoid, and operational procedures. These patterns were learned from running agents at scale:

| Pattern | What It Does | Why It Matters |
| --- | --- | --- |
| The Idle Heresy | Hammers home that agents must NEVER wait for approval | LLMs naturally want to pause and confirm. Without strong anti-idle conditioning, agents stall after completing work. |
| The Approval Fallacy | "There is no approval step. When work is done, you act." | Same root cause — LLMs seek confirmation. The prompt explicitly forbids it. |
| Directory Discipline | "Stay in YOUR worktree. NEVER edit files in the root." | Agents drift out of their sandbox and lose work. |
| Propulsion Principle | "If you find something on your hook, YOU RUN IT." | Prevents the agent from announcing itself and waiting for instructions. |
| Escalation Protocol | "When blocked, escalate. Do NOT wait for human input." | Without this, agents hang indefinitely on ambiguous requirements. |
| Capability Ledger Motivation | "Your work is visible. Your model's track record grows with every completion." | Appeal to self-interest in reputation drives quality. |
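
To make the table concrete, a paraphrased worker-prompt preamble might read as follows; the production wording is not part of this spec, so treat every line as illustrative:

# Illustrative preamble weaving the patterns above into a worker system prompt.
WORKER_PREAMBLE = """\
You are a Hive worker assigned to exactly one issue.

- Never wait for approval. There is no approval step: when the work is done, you act.
- Stay inside YOUR worktree at {worktree}. Never edit files in the repository root.
- If you find work on your hook, you run it. Do not announce yourself and wait.
- If you are blocked, escalate via a structured completion. Do not wait for human input.
- Your work is visible; your model's track record grows with every completion.
"""

prompt = WORKER_PREAMBLE.format(worktree="/work/hive/worktrees/issue-42") + "Issue: ..."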

Health Checks and Failure Recovery

Lease-Based Staleness

Every agent assignment gets a lease_expires_at timestamp. The orchestrator extends the lease whenever it observes progress (new tokens, tool completions, status transitions). If the lease expires with no progress, the agent is considered stalled.
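
A sketch of the lease mechanics under the illustrative schema; the 10-minute threshold is an arbitrary placeholder:

# Progress extends the lease; a periodic sweep finds agents whose lease lapsed.
import sqlite3

LEASE_SECONDS = 600   # placeholder threshold

def extend_lease(conn: sqlite3.Connection, agent_id: str) -> None:
    """Called whenever the SSE stream shows progress (tokens, tool completions)."""
    conn.execute(
        "UPDATE agents SET lease_expires_at = datetime('now', ?) WHERE id = ?",
        (f"+{LEASE_SECONDS} seconds", agent_id),
    )
    conn.commit()

def stalled_agents(conn: sqlite3.Connection) -> list[str]:
    """Agents whose lease expired with no observed progress."""
    rows = conn.execute(
        "SELECT id FROM agents WHERE lease_expires_at < datetime('now')"
    ).fetchall()
    return [r[0] for r in rows]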

Escalation Chain

Worker (blocked) → Orchestrator (mechanical retries) → Queen Bee (LLM reasoning) → Human
  1. Worker signals a blocker via structured completion
  2. Orchestrator retries mechanically (same issue, fresh session)
  3. After N retries, orchestrator marks the issue as failed
  4. Queen Bee sees the failure via hive status / hive logs, discusses with the human
  5. Human and Queen Bee decide: rephrase, decompose further, or abandon

No special escalation protocol. The Queen Bee is already in conversation with the human — it just brings up the failure naturally.
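
Steps 2 and 3 are mechanical, so they live in orchestrator code. A sketch, with the retry_count column and the retry limit as assumptions:

# Retry a failed worker mechanically, then mark the issue failed once retries run out.
import sqlite3

MAX_RETRIES = 3   # placeholder for N

def handle_worker_failure(conn: sqlite3.Connection, issue_id: int) -> str:
    (retries,) = conn.execute(
        "SELECT retry_count FROM issues WHERE id = ?", (issue_id,)
    ).fetchone()
    if retries < MAX_RETRIES:
        conn.execute(
            "UPDATE issues SET status = 'open', retry_count = retry_count + 1, "
            "assignee_agent_id = NULL WHERE id = ?",
            (issue_id,),
        )   # back to the ready queue; a fresh session picks it up
        status = "open"
    else:
        conn.execute("UPDATE issues SET status = 'failed' WHERE id = ?", (issue_id,))
        status = "failed"   # Queen Bee surfaces this via hive status / hive logs
    conn.commit()
    return status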

Degraded Mode

When a critical dependency becomes unreachable, the orchestrator enters degraded mode rather than crashing. It stops dispatching new work, keeps a recovery loop with exponential backoff, continues database sweeps to reconcile state, and preserves in-flight agents. When the dependency recovers, the orchestrator reconciles and resumes.
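
A sketch of that loop; the probe and sweep callables stand in for whatever dependency check and database sweep the orchestrator actually runs:

# Degraded mode: dispatch stays paused, sweeps continue, recovery is probed with
# exponential backoff until the critical dependency answers again.
import time

def degraded_mode(probe, sweep, max_backoff: float = 300.0) -> None:
    backoff = 1.0
    while not probe():
        sweep()                         # reconcile DB state while paused
        time.sleep(backoff)
        backoff = min(backoff * 2, max_backoff)
    # dependency recovered: caller reconciles and resumes normal dispatch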

Crash Recovery

On restart, the orchestrator reconciles database state with reality: finds agents whose sessions are gone (mark stalled, unassign work), finds orphan sessions with no database agent (kill them). The event log is the source of truth — status columns are a denormalized cache that can always be rebuilt by replaying events.
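
A sketch of the startup sweep, assuming the LLM server can list its live session ids; the 'stalled' event type and column names are illustrative:

# Reconcile DB state with reality: unassign work whose session vanished,
# kill sessions that no database agent owns.
import sqlite3

def reconcile(conn: sqlite3.Connection, live_session_ids: set[str], kill_session) -> None:
    known = set()
    for agent_id, session_id, issue_id in conn.execute(
        "SELECT id, session_id, issue_id FROM agents"
    ).fetchall():
        known.add(session_id)
        if session_id not in live_session_ids:
            conn.execute(
                "INSERT INTO events (issue_id, agent_id, event_type) VALUES (?, ?, 'stalled')",
                (issue_id, agent_id),
            )
            conn.execute(
                "UPDATE issues SET status = 'open', assignee_agent_id = NULL WHERE id = ?",
                (issue_id,),
            )
            conn.execute("DELETE FROM agents WHERE id = ?", (agent_id,))
    for orphan in live_session_ids - known:
        kill_session(orphan)            # session with no database agent
    conn.commit()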


Inter-Agent Knowledge Transfer (Notes)

Workers are ephemeral — each gets a fresh session with no memory of what previous workers discovered. The notes system bridges this gap.

Workers write discoveries, gotchas, and patterns to a file in their worktree. The orchestrator harvests these on completion and stores them in the database. Future workers get relevant notes injected into their prompts.

This is the 80/20 of a full inter-agent mail protocol: context injection without routing, addressing, or inbox complexity. No agent-to-agent messaging — just a shared knowledge base that the orchestrator mediates.
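
A sketch of both halves of that flow; the NOTES.md convention, the notes table, and the most-recent-N relevance rule are assumptions:

# Harvest a finished worker's notes file, then inject recent notes into the next prompt.
import os
import sqlite3

def harvest_notes(conn: sqlite3.Connection, worktree: str, agent_id: str) -> None:
    path = os.path.join(worktree, "NOTES.md")    # hypothetical file convention
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            conn.execute(
                "INSERT INTO notes (agent_id, body) VALUES (?, ?)", (agent_id, f.read())
            )
        conn.commit()

def notes_for_prompt(conn: sqlite3.Connection, limit: int = 5) -> str:
    rows = conn.execute(
        "SELECT body FROM notes ORDER BY id DESC LIMIT ?", (limit,)
    ).fetchall()
    return "\n\n".join(body for (body,) in rows)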


Capability Ledger (Model-Based Analytics)

Agents are ephemeral — they're created for a single issue and deleted after merge. The unit of analysis for performance tracking is model × issue type → outcome, not agent identity.

The model is denormalized onto key events so the events table is self-contained for all analytics queries. "Which model is best at Go work?" becomes a SQL query over the events table joined with issue metadata — no join to the (ephemeral) agents table needed.

SELECT
    json_extract(e.detail, '$.model') as model,
    i.type,
    COUNT(*) FILTER (WHERE e.event_type = 'completed') as successes,
    COUNT(*) FILTER (WHERE e.event_type IN ('incomplete', 'failed')) as failures
FROM events e
JOIN issues i ON e.issue_id = i.id
WHERE e.event_type IN ('completed', 'incomplete', 'failed')
  AND json_extract(e.detail, '$.model') IS NOT NULL
GROUP BY model, i.type;

What We Keep and What We Drop

The design is a deliberate trade: keep the ideas that matter, drop the infrastructure that served different constraints.

What We Keep

| Concept | Why |
| --- | --- |
| DAG-based ready queue | The dependency graph is the scheduler. Elegant, correct, zero infrastructure. |
| Ephemeral agents | No persistent identity needed. Agents exist for one issue, then are deleted. |
| Push-based execution | No idle worker pool. Workers exist because work exists. |
| Multi-step workflows | Sequential tasks with session cycling between steps. |
| Capability ledger | Model performance emerges from the event log — keyed by model, not agent. |

What We Drop

| Feature | Why We Drop It |
| --- | --- |
| Distributed databases (JSONL + Dolt + Git sync) | Single SQLite DB — no offline or multi-writer sync needed |
| 3-way merge with field-specific strategies | Single writer serializes; no merge conflicts possible |
| Multi-level data architecture | Single-level; projects are a column, not a namespace |
| Branch-per-agent SQL server | SQLite WAL mode handles concurrent reads |
| tmux session management | HTTP API manages sessions |
| Infrastructure agent roles (Witness, Deacon, Boot, Dog) | Functions absorbed into deterministic orchestrator code |
| Inter-agent mail protocol | Orchestrator mediates; agents communicate via DB |

What Gets Simpler

With a central database and single orchestrator process:

  • Claiming a task is an atomic CAS — no optimistic locking
  • Dependency cycle detection is a live query, not something you hope stays consistent across clones
  • Event/audit trail is append-only into one database, no reconciliation
  • "What's in flight?" is a single query — no convoy abstraction needed
  • Agent state is a row in a table, not a bead in a distributed store

Comparison

| Dimension | Distributed Approach | Hive |
| --- | --- | --- |
| Agent runtime | LLM in tmux | LLM server (HTTP API) |
| Work queue | Distributed DB + JSONL + Git | Single SQLite DB |
| Scheduling | CLI query | Same SQL query, run by orchestrator |
| Strategic brain | LLM in tmux | Queen Bee (LLM via HTTP session) |
| Agent monitoring | Patrol cycle + heartbeat | Lease-based staleness + SSE events |
| Session management | tmux + custom CLI | HTTP session lifecycle API |
| Inter-agent comms | Mail protocol | Orchestrator mediates via DB |
| Crash recovery | Detect stalled, respawn | Lease expiry + degraded mode + reconciliation |
| Merge queue | LLM in tmux | Merge queue table + mechanical fast-path + LLM fallback |
| Infrastructure agents | Multiple specialized roles | Orchestrator code (deterministic) |

What We Gain

  • Simpler infrastructure: One process, one database, one HTTP API
  • Real-time observability: SSE event stream replaces periodic polling
  • Easier debugging: All state in one SQLite file, queryable with any SQL tool
  • Cleaner separation: Deterministic logic in code, ambiguity resolution in LLMs
  • Lower barrier to entry: No distributed DB, no Go, no custom CLI, no tmux management

What We Lose

  • Offline/disconnected work: If the orchestrator is down, nothing runs
  • Git-native work history: No git log of issue state changes (events table compensates)
  • Distributed agent autonomy: Agents only communicate through the orchestrator — more centralized, but simpler