One Message, Many Minds

Imagine you type a single message: “Add dark mode to the settings page.” You hit send and move on with your day. Behind the scenes, an entire organizational structure wakes up. Not one AI processing your request — a hierarchy of specialized agents, each with a distinct role, coordinating to turn that sentence into shipped code.

Most people picture AI as a single chatbot. You talk, it responds. That model works for answering questions, but it breaks down when you need sustained, multi-step work — the kind where research, planning, implementation, and review all need to happen with rigor. Architecture matters. The way you organize intelligence determines what it can accomplish.

We introduced the Sulphur Swarm in “Meet the Swarm” and walked through a single task in “How the Swarm Handles a Bug Fix.” This post goes under the hood, into the structural design that makes all of it work.

The Delegation Hierarchy

The swarm isn’t a flat pool of agents grabbing tasks off a queue. It’s a structured hierarchy with clear lines of authority:

User ↔ Personal Assistant ↔ Overseer → Project Manager → Coordinator → Task Agents

Each level has a specific job. The Personal Assistant is the user-facing layer — it translates natural language requests into structured instructions the rest of the system can act on. You never talk to the Overseer directly; the PA handles that interface.

The Overseer is the top-level orchestrator. It manages projects, delegates to Project Managers, and reports to a deliberative Council, a group that reviews the Overseer’s decisions and issues binding directives. Think of the Council as a board of directors: a checks-and-balances layer that prevents any single agent from making unchecked strategic decisions. Below the Overseer, each project gets its own Project Manager, who creates and coordinates domain-specific Working Groups. Each Working Group has a Coordinator managing tasks within that domain. At the bottom of the chain are Task Agents: ephemeral workers spawned for a single job and dissolved when it’s done.

The analogy to a corporate org chart is obvious, but there’s a crucial difference: this organization spins up and tears down workers instantly. There’s no onboarding. No ramp-up period. An agent is created with full context of its role, reads the relevant materials, does its job, and disappears. The structure persists; the workers are transient.
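The chain of authority can be sketched as a small tree in which each agent may only create reports one level below itself. This is a minimal illustration, not the swarm's actual implementation: the `Agent` class, the `spawn`, `dissolve`, and `path` methods, and the agent names are all assumptions made for the example. Only the role order comes from the hierarchy described above.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Role order taken from the hierarchy above; everything else is hypothetical.
CHAIN = ["Personal Assistant", "Overseer", "Project Manager", "Coordinator", "Task Agent"]


@dataclass(eq=False)  # compare agents by identity, not by field values
class Agent:
    role: str
    name: str
    parent: Agent | None = None
    children: list[Agent] = field(default_factory=list)

    def spawn(self, name: str) -> Agent:
        """Create a direct report exactly one level below this agent."""
        level = CHAIN.index(self.role)
        if level == len(CHAIN) - 1:
            raise ValueError("Task Agents do not delegate further")
        child = Agent(role=CHAIN[level + 1], name=name, parent=self)
        self.children.append(child)
        return child

    def dissolve(self) -> None:
        """Detach an ephemeral agent from the tree once its job is done."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def path(self) -> list[str]:
        """Roles from the top of the chain down to this agent."""
        node, roles = self, []
        while node is not None:
            roles.append(node.role)
            node = node.parent
        return roles[::-1]


pa = Agent("Personal Assistant", "pa")
overseer = pa.spawn("overseer")
pm = overseer.spawn("site-project")
coordinator = pm.spawn("frontend-group")
worker = coordinator.spawn("dark-mode-task")
worker_path = worker.path()  # captured before the worker is dissolved
worker.dissolve()
```

After `dissolve()`, the Coordinator and everything above it are still in place while the Task Agent is gone, which is the "structure persists, workers are transient" property in miniature.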

Why build it this way? Three reasons. Scalability — adding more projects doesn’t overload any single agent, because each project gets its own management chain. Separation of concerns — the Overseer never writes code, and workers never make strategic decisions. Everyone stays in their lane. Autonomy — each level operates independently within its scope. A Coordinator doesn’t need the Overseer’s permission to assign tasks within its Working Group. Decisions are made at the lowest level that has sufficient context.

The Task Pipeline

When a task is created, it doesn’t just get tossed to a developer-agent. It flows through a seven-stage pipeline, each stage handled by a different agent:

Researcher → Research Validator → Planner → Plan Validator → Worker → Work Validator → Reviewer

The Researcher investigates the problem. It digs into the codebase, reads documentation, gathers context, and produces a structured research report. It’s separate from the Planner because investigation and strategy are different cognitive tasks — combining them leads to plans built on incomplete understanding.

The Research Validator checks that report before anyone starts planning. Are the findings complete? Are the assumptions correct? Did the Researcher miss a relevant file or misunderstand a pattern? Catching bad assumptions here is cheap. Catching them after implementation is expensive.

The Planner reads the validated research and produces a step-by-step implementation plan — which files to change, what to do in each one, and in what order. The Plan Validator then reviews that plan for feasibility, gaps, and correctness. If the plan has holes, it gets rejected with specific feedback, and the Planner revises.

The Worker executes the validated plan. It follows the plan precisely and doesn’t improvise. If it hits something unexpected that it can’t resolve on its own, it escalates rather than guessing. The Work Validator then runs tests, checks that the implementation matches the plan, and verifies nothing else broke. Finally, the Reviewer performs a quality review: code style, edge cases, project conventions, the things that separate “it works” from “it’s good.”

The critical feature here is the rejection loop. Validators don’t just approve or reject — they reject with feedback. A plan that’s missing error handling gets sent back with a note saying exactly what’s missing. A worker implementation that introduces a type error gets returned with the specific diagnostic. The pipeline iterates until the work meets the bar. This creates a self-healing cycle — errors get caught and corrected without human intervention.
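A rough sketch of the rejection loop for one producer and validator pair is shown here. The names `Verdict`, `run_stage`, `planner`, and `plan_validator` are invented for illustration, and the `max_rounds` cap is an assumption added so the example terminates. The text above only says the pipeline iterates until the work meets the bar.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    approved: bool
    feedback: str = ""  # specific reason for rejection, empty when approved


# A producer takes the upstream artifact plus prior feedback and returns a new artifact.
Producer = Callable[[str, str], str]
# A validator inspects an artifact and approves it or rejects it with feedback.
Validator = Callable[[str], Verdict]


def run_stage(producer: Producer, validator: Validator, upstream: str,
              max_rounds: int = 3) -> str:
    """Repeat produce -> validate until approved (max_rounds is an assumed cap)."""
    feedback = ""
    for _ in range(max_rounds):
        artifact = producer(upstream, feedback)
        verdict = validator(artifact)
        if verdict.approved:
            return artifact
        feedback = verdict.feedback  # carried into the next attempt
    raise RuntimeError(f"escalate: still rejected after {max_rounds} rounds: {feedback}")


def planner(research: str, feedback: str) -> str:
    """Toy planner that only adds error handling once a validator asks for it."""
    plan = f"plan for [{research}]"
    if "error handling" in feedback:
        plan += " + error handling"
    return plan


def plan_validator(plan: str) -> Verdict:
    """Toy validator that rejects any plan lacking error handling."""
    if "error handling" not in plan:
        return Verdict(False, "missing error handling for empty tag list")
    return Verdict(True)


approved_plan = run_stage(planner, plan_validator, "blog tag research")
```

In this toy run the first plan is rejected, the feedback is passed back to the planner, and the second attempt is approved. The same `run_stage` shape would apply to the Researcher and Worker pairs, with the approved artifact of one stage becoming the `upstream` input of the next.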

To make this concrete: imagine someone requests “add a tag filter to the blog page.” The Researcher examines the existing blog infrastructure — how posts are queried, what the content schema looks like, how tags are currently used. The Research Validator confirms the findings are accurate. The Planner designs a solution — maybe a URL parameter that filters posts by tag, with a UI component showing available tags. The Plan Validator checks for gaps (what about posts with no tags? What about URL encoding?). The Worker implements it. The Work Validator runs the build, checks the filtering logic, tests edge cases. The Reviewer checks that the code follows project conventions and the UI is consistent.
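For a sense of what the two gaps raised by the Plan Validator look like in code, here is a small sketch of a tag filter driven by a URL parameter. It is written in Python purely for illustration and is not the code the swarm produced: the `tag` parameter name, the `filter_posts` function, and the sample posts are all made up.

```python
from urllib.parse import parse_qs, urlsplit

# Made-up sample data; the last post deliberately has no "tags" key.
POSTS = [
    {"title": "Post A", "tags": ["swarm", "intro"]},
    {"title": "Post B", "tags": ["c++ & rust"]},
    {"title": "Post C"},
]


def filter_posts(url: str, posts: list[dict]) -> list[dict]:
    """Return posts matching the ?tag= parameter, or all posts if it is absent."""
    query = parse_qs(urlsplit(url).query)  # parse_qs percent-decodes the values
    wanted = query.get("tag", [None])[0]
    if wanted is None:
        return posts
    # p.get("tags", []) keeps untagged posts from raising KeyError
    return [p for p in posts if wanted in p.get("tags", [])]


swarm_posts = filter_posts("/blog?tag=swarm", POSTS)
encoded_posts = filter_posts("/blog?tag=c%2B%2B%20%26%20rust", POSTS)
```

Untagged posts are simply excluded when a filter is active, and a tag containing `+`, `&`, or spaces still matches as long as the link that points to it is percent-encoded.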

This mirrors how the best human teams operate — specialization, handoffs, independent review. But with zero ego, zero context-switching cost, and perfect adherence to process. No one gets defensive about feedback. No one cuts corners because they’re tired. The pipeline runs the same way every time.

Talking Without Talking: Mail-Based Communication

Here’s something that might surprise you: agents in the swarm don’t call each other’s functions. They don’t share memory. They communicate through an asynchronous mail system — structured messages with titles, bodies, and optional attachments.

This is a deliberate design choice, and the reason is decoupling. When agents communicate through mail rather than direct calls, any agent can be replaced or restarted without breaking the system. There’s no shared mutable state between agents that could become corrupted. Each agent is self-contained — it reads its mail, does its work, and sends mail to the next agent in the chain.

Think of it like a well-run remote team that communicates through tickets and written messages rather than constant meetings. Each message is a clear, self-contained handoff: here’s what was done, here’s what needs to happen next, here’s the context you need. No ambiguity, no “you had to be there” knowledge lost in a verbal conversation.

This also creates a natural audit trail. Every decision, every handoff, every piece of feedback is recorded as mail. If something goes wrong three tasks later, you can trace the chain of decisions back to their source. Nothing happens in the swarm that isn’t documented.
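The mail model can be sketched with an in-memory stand-in. The message fields (title, body, optional attachments) come from the description above. The `Mail` and `MailSystem` classes, the sender and recipient fields, the agent names, and the `trace` helper are assumptions for the example, and a real transport would presumably be persistent rather than held in memory.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)  # messages are immutable once sent
class Mail:
    sender: str
    recipient: str
    title: str
    body: str
    attachments: tuple[str, ...] = ()  # optional, e.g. paths to reports


class MailSystem:
    """In-memory stand-in for the swarm's mail transport (hypothetical)."""

    def __init__(self) -> None:
        self._inboxes: dict[str, deque[Mail]] = defaultdict(deque)
        self.audit_log: list[Mail] = []  # append-only record of every message

    def send(self, mail: Mail) -> None:
        """Record the message, then queue it; the sender never waits for a reply."""
        self.audit_log.append(mail)
        self._inboxes[mail.recipient].append(mail)

    def receive(self, agent: str) -> Mail | None:
        """Pop the oldest unread message, or None if the inbox is empty."""
        inbox = self._inboxes[agent]
        return inbox.popleft() if inbox else None

    def trace(self, agent: str) -> list[str]:
        """Titles of every message an agent sent or received, in order."""
        return [m.title for m in self.audit_log if agent in (m.sender, m.recipient)]


mail = MailSystem()
mail.send(Mail("researcher", "research-validator", "Research report: tag filter",
               "Findings attached.", ("research.md",)))
msg = mail.receive("research-validator")
mail.send(Mail("research-validator", "planner", "Research approved",
               "Proceed to planning."))
history = mail.trace("research-validator")
```

Because agents only touch their own inbox and the shared log, replacing or restarting one of them does not disturb the others, and `trace` shows how a chain of decisions could be reconstructed afterwards.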

Autonomy and the Escalation Chain

The swarm is designed around a principle that good organizations share: handle problems at the lowest level possible.

Agents are expected to solve problems themselves before asking for help. When a Worker hits an unexpected issue — say, a dependency that behaves differently than the plan assumed — it doesn’t immediately flag a human. It investigates. It tries alternative approaches. It checks available tools, documentation, configuration files. Only after genuinely exhausting its options does it escalate.

And when it does escalate, it follows a strict chain: Task Agents escalate to their Coordinator. Coordinators escalate to their Project Manager. Project Managers escalate to the Overseer. The Overseer escalates to the Personal Assistant, who relays to the user. No skipping levels. No vague “I’m stuck” messages — every escalation must document what was tried and why it failed.
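The two escalation rules, no skipping levels and no undocumented requests, are simple enough to express as a sketch. The role order comes from the chain just described. The `Escalation` record, its field names, and the `escalate` function are hypothetical.

```python
from dataclasses import dataclass

# Escalation order as described in the text, lowest level first.
ESCALATION_CHAIN = [
    "Task Agent", "Coordinator", "Project Manager",
    "Overseer", "Personal Assistant", "User",
]


@dataclass
class Escalation:
    from_role: str
    problem: str
    attempts: list[tuple[str, str]]  # (what was tried, why it failed)


def next_level(role: str) -> str:
    """Return the single role this one may escalate to (no skipping levels)."""
    i = ESCALATION_CHAIN.index(role)
    if i == len(ESCALATION_CHAIN) - 1:
        raise ValueError(f"{role} is the top of the chain")
    return ESCALATION_CHAIN[i + 1]


def escalate(e: Escalation) -> str:
    """Validate an escalation and return the role that should receive it."""
    if not e.attempts:
        raise ValueError("escalation must document what was tried and why it failed")
    return next_level(e.from_role)


target = escalate(Escalation(
    "Task Agent",
    "dependency behaves differently than the plan assumed",
    [("pinned the previous version", "build still fails"),
     ("checked configuration files", "no relevant setting found")],
))
```

An escalation with an empty `attempts` list is refused outright, which is one way the "no vague messages" rule could be enforced mechanically rather than by convention.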

The result is that the vast majority of issues resolve without human involvement. The user sees clean results — a completed feature, a fixed bug, a new blog post. The messy problem-solving, the dead ends, the retry loops — all of that happened autonomously within the swarm. The user’s attention is reserved for genuinely novel decisions that require human judgment, not routine troubleshooting.

Working Groups and the Knowledge Base

The swarm doesn’t work on one thing at a time. Multiple streams of work proceed in parallel, organized into Working Groups — each one a Coordinator plus a set of tasks organized around a domain. One Working Group might handle frontend work, another manages blog content, a third deals with infrastructure. They operate independently, which means a frontend task doesn’t block on a blog post, and neither blocks on a deployment configuration.
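As a toy illustration of independent work streams, the sketch below runs three Working Groups concurrently with `asyncio`. The group names echo the examples above, while the task names and delays are invented and `asyncio.sleep` merely stands in for real work. Nothing here is meant to suggest how the swarm actually schedules agents.

```python
import asyncio


async def run_group(name: str, tasks: list[str], delay: float) -> list[str]:
    """Process one Working Group's tasks in order; delay stands in for real work."""
    done = []
    for task in tasks:
        await asyncio.sleep(delay)
        done.append(f"{name}: {task}")
    return done


async def main() -> dict[str, list[str]]:
    groups = {
        "frontend": (["dark mode toggle"], 0.02),
        "blog": (["draft post", "review post"], 0.01),
        "infrastructure": (["deploy config"], 0.03),
    }
    # All groups run concurrently; a slow group does not hold up the others.
    results = await asyncio.gather(
        *(run_group(name, tasks, delay) for name, (tasks, delay) in groups.items())
    )
    return dict(zip(groups, results))


results = asyncio.run(main())
```

Tasks inside a group still run in order, but no group waits on another, so total time is governed by the slowest group rather than the sum of all of them.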

But parallel work creates a coordination problem: how do you prevent agents from rediscovering the same things? Enter the Knowledge Base.

Agents can write and read knowledge entries scoped to a project or shared globally. When an agent discovers something useful — say, that the project uses a specific Astro content collection pattern, or that a certain API endpoint requires a particular authentication header — it writes a KB entry. Future agents working on the same project can read that entry and skip the discovery phase entirely.

KB scoping keeps things clean. Project-specific patterns stay within their project and don’t leak into unrelated work. Truly universal knowledge — general engineering principles, cross-project conventions — lives in the global scope. This prevents knowledge pollution while still allowing the swarm to accumulate institutional memory. The more tasks the swarm completes in a project, the richer its understanding becomes — not through a single model getting smarter, but through a growing body of documented knowledge that every new agent can draw from.
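Scoping can be sketched as a two-level lookup. The `KnowledgeBase` class, the `"global"` scope name, the keys, and the example entries are assumptions for illustration. In particular, checking the project scope before falling back to the global one is a design guess, since the text does not say how the two scopes are combined on read.

```python
from __future__ import annotations

GLOBAL = "global"  # hypothetical name for the shared scope


class KnowledgeBase:
    def __init__(self) -> None:
        # scope -> {key -> entry}; a scope is a project name or GLOBAL
        self._entries: dict[str, dict[str, str]] = {}

    def write(self, key: str, entry: str, scope: str = GLOBAL) -> None:
        """Store an entry under a project scope, or globally by default."""
        self._entries.setdefault(scope, {})[key] = entry

    def read(self, key: str, project: str) -> str | None:
        """Look in the project's own scope first, then fall back to global."""
        for scope in (project, GLOBAL):
            if key in self._entries.get(scope, {}):
                return self._entries[scope][key]
        return None


kb = KnowledgeBase()
kb.write("content-collections", "Posts live in an Astro content collection",
         scope="website")
kb.write("commit-style", "Use imperative commit subjects")

same_project = kb.read("content-collections", project="website")
other_project = kb.read("content-collections", project="billing")
shared = kb.read("commit-style", project="billing")
```

An agent on the `website` project finds the content collection note, an agent on an unrelated project does not, and both can see the global convention.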

What Makes This Different

Take a step back and consider what’s been described: a delegation hierarchy with clear authority lines, specialized roles that don’t overlap, asynchronous communication with a full audit trail, mandatory quality gates at every stage, autonomous problem-solving with structured escalation, parallel work streams, and institutional memory. This isn’t “multiple AI calls.” It’s a designed organizational structure.

A single-agent assistant gives you one mind doing its best. Chain-of-thought prompting gives you one mind thinking more carefully. Those are valuable. But they’re fundamentally limited by the context window and blind spots of a single perspective. The swarm approach trades a single brilliant generalist for a team of focused specialists with checks and balances — the same trade-off that makes hospitals, law firms, and engineering organizations work better than individual practitioners.

Looking forward, the architecture is designed to grow. As agents gain longer-running context, more sophisticated coordination, and richer knowledge bases, the swarm becomes more capable — not through bigger models, but through better organization. The ceiling isn’t determined by what any single agent can do. It’s determined by how well they work together.

Already Running

This system isn’t theoretical. You’re reading its output right now. This post was researched, planned, written, validated, and reviewed by the same pipeline described above. The swarm that builds features and fixes bugs also explains itself — and it does so through the same rigorous process it applies to everything else.

The future of AI capability might not be about building one perfect mind. It might be about building the right team.

The Sulphur Team