The Problem With Trusting Anyone
Here’s a thought experiment. You have a system of dozens of autonomous AI agents, each capable of running shell commands, writing files, and calling external APIs. Some of them need database credentials. Others need deployment keys. A few need access to third-party services that cost real money per API call.
Now ask yourself: which agent do you trust with the master password?
The answer, if you’re thinking clearly, is none of them. Not because the agents are malicious (they’re not) but because trust is a liability. Every entity that holds a secret is a potential leak. Every agent with broad permissions widens the blast radius. The more access any single agent has, the worse things get when something goes wrong. And in a system with enough moving parts, something always eventually goes wrong.
Traditional software security assumes a trusted operator — a human who holds the keys and delegates carefully. But in an autonomous swarm, there is no human in the loop for most operations. The agents are the operators. So the security model can’t rely on trust. It has to make trust unnecessary.
Nobody Gets the Master Key
Every agent in the Sulphur swarm runs in its own isolated context. A task agent — the ephemeral worker that actually writes code or performs research — can only see its own task. It has a working directory, a set of tools, and a mandate. That’s it. It can’t browse another agent’s files. It can’t read another task’s context. It doesn’t even know what other tasks exist.
This isn’t a policy. It’s architecture. The agent literally does not have the tools to reach outside its scope.
Different roles get different capabilities. A task agent can read and write files in its worktree and run terminal commands. A coordinator can create and manage tasks within its working group. A project manager can create working groups but can’t directly execute code. The overseer can coordinate across projects but can’t deploy anything. Each layer has exactly the permissions it needs to do its job and nothing more.
This is the principle of least privilege, but not as a best practice written in a security handbook that everyone ignores. It’s enforced by the system itself. An agent can’t escalate its own permissions. It can’t grant itself new tools. The boundary isn’t a suggestion — it’s a wall.
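To make the "wall, not suggestion" idea concrete, here is a minimal sketch of a per-role tool allowlist. The tool names (`file.read`, `task.create`, and so on) and the `Agent` class are hypothetical, invented for illustration; the only thing taken from the text above is which role can do what.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    TASK_AGENT = "task_agent"
    COORDINATOR = "coordinator"
    PROJECT_MANAGER = "project_manager"
    OVERSEER = "overseer"


# Hypothetical tool names. The mapping is immutable, so nothing running
# as an agent can add an entry to its own allowlist at runtime.
ROLE_TOOLS = {
    Role.TASK_AGENT: frozenset({"file.read", "file.write", "terminal.run"}),
    Role.COORDINATOR: frozenset({"task.create", "task.manage"}),
    Role.PROJECT_MANAGER: frozenset({"group.create"}),
    Role.OVERSEER: frozenset({"project.coordinate"}),
}


@dataclass(frozen=True)
class Agent:
    name: str
    role: Role

    def call(self, tool: str) -> str:
        # The check sits in the dispatch path, not in the agent's instructions:
        # a tool outside the role's allowlist is simply not reachable.
        if tool not in ROLE_TOOLS[self.role]:
            raise PermissionError(f"{self.role.value} has no tool {tool!r}")
        return f"{self.name} ran {tool}"
```

With this shape, `Agent("w1", Role.TASK_AGENT).call("group.create")` raises `PermissionError` no matter what the agent was told or believes about its own mandate.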
Secrets Without Seeing
Here’s where it gets interesting. Agents sometimes need credentials. A deployment agent needs an API key. A database migration agent needs a connection string. These are real secrets with real consequences if they leak.
The obvious approach — just give the agent the secret and let it use it — is exactly what the swarm doesn’t do. Instead, the secrets system is designed around a single constraint: the secret value never enters the agent’s context.
An agent can list secrets. It can see metadata: the name, the scope, a description of what the secret is for. But it cannot read the actual value. The only operation that touches the value is secret.writeToFile, which decrypts the secret and writes the raw value directly to a file on disk. The plaintext passes through the system’s secure layer and lands in a file. It never appears in the agent’s conversation, its memory, or its reasoning trace.
Why does this matter? Because an agent’s context is its attack surface. If you could dump everything an agent has seen and said, you’d have a complete record of its operations. In most AI systems, that dump would include every API key the agent ever used. In the swarm, you’d find references to secrets by name — “I wrote the deploy key to /tmp/key” — but never the key itself.
Secrets are also scoped. Project-level secrets are only accessible to agents working on that project. Global secrets exist for cross-cutting concerns but are still mediated through the same write-to-file mechanism. An agent working on the marketing website can’t access the production database credentials, because those secrets simply don’t exist in its scope.
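A rough sketch of what such an interface could look like, assuming an in-memory store standing in for real encrypted storage. `SecretStore`, `SecretMeta`, and `list_metadata` are hypothetical names, and `write_to_file` is a Python-style stand-in for secret.writeToFile. The point is the shape of the API: listing returns metadata only, writing returns nothing, and out-of-scope secrets look like they don't exist.

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SecretMeta:
    name: str
    scope: str          # a project name, or "global"
    description: str


class SecretStore:
    def __init__(self) -> None:
        # Stand-in for encrypted storage: name -> (metadata, plaintext).
        self._entries: Dict[str, Tuple[SecretMeta, str]] = {}

    def add(self, meta: SecretMeta, value: str) -> None:
        self._entries[meta.name] = (meta, value)

    @staticmethod
    def _visible(meta: SecretMeta, agent_scope: str) -> bool:
        return meta.scope in (agent_scope, "global")

    def list_metadata(self, agent_scope: str) -> List[SecretMeta]:
        # Metadata only; the plaintext is never part of the return value.
        return [m for m, _ in self._entries.values() if self._visible(m, agent_scope)]

    def write_to_file(self, name: str, agent_scope: str, path: str) -> None:
        entry = self._entries.get(name)
        # An out-of-scope secret is indistinguishable from a missing one.
        if entry is None or not self._visible(entry[0], agent_scope):
            raise KeyError(name)
        # Owner-only permissions; the value goes to disk, not to the caller.
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        with os.fdopen(fd, "w") as handle:
            handle.write(entry[1])
```

Because `write_to_file` returns `None`, there is no return value for the plaintext to ride along in, which is the property the section is describing.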
In practice, the flow looks like this: an agent needs to authenticate with an external service. It checks what secrets are available, finds the one it needs, writes it to a temporary file, references that file in its command, and moves on. The credential exists on disk for the duration of the operation and never surfaces in the agent’s reasoning. If you audited the agent’s full output, you’d see the process but not the password.
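That flow can be sketched as a small context manager. Everything here is an assumption made for illustration: `write_secret_to_file` is a dummy stand-in for secret.writeToFile, and the "deployment command" is just a child Python process that checks the file is non-empty. What matters is that the agent-side code only ever handles a path.

```python
import os
import subprocess
import sys
import tempfile
from contextlib import contextmanager


def write_secret_to_file(name: str, path: str) -> None:
    """Hypothetical stand-in for secret.writeToFile (a real one would decrypt)."""
    with open(path, "w") as handle:
        handle.write("dummy-value-for-" + name)


@contextmanager
def secret_file(name: str):
    # mkstemp creates the file readable and writable by the owner only.
    fd, path = tempfile.mkstemp(prefix="secret-")
    os.close(fd)
    try:
        write_secret_to_file(name, path)
        yield path  # the caller gets a path, never the value
    finally:
        os.remove(path)  # the credential lives only as long as the operation


def deploy(token_name: str) -> int:
    with secret_file(token_name) as token_path:
        # Placeholder for a real deployment command that takes a credentials
        # file as an argument; this one only checks the file is non-empty.
        check = "import os, sys; sys.exit(0 if os.path.getsize(sys.argv[1]) > 0 else 1)"
        return subprocess.run([sys.executable, "-c", check, token_path]).returncode
```

An audit log of `deploy("hosting-api-token")` would show the secret's name and a temp path, which matches the "process but not the password" property described above.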
The Hierarchy Is the Firewall
The swarm’s delegation hierarchy — Overseer to Project Manager to Coordinator to Task Agents — isn’t just an organizational chart. It’s a security boundary.
Task agents can only communicate with their coordinator. They can’t message the overseer. They can’t reach agents in other working groups. They can’t even discover what other working groups exist. If a task agent somehow goes off the rails (hallucinates a goal, misunderstands its instructions, or tries to do something it shouldn’t), its blast radius is limited to its own worktree and its own coordinator’s inbox.
Escalation flows through defined channels. A task agent reports to its coordinator. The coordinator reports to its project manager. The project manager reports to the overseer. No shortcuts. No agent can bypass its immediate superior to reach a higher authority, which means no agent can social-engineer its way past its containment layer.
This is defense in depth applied to an agent system. Each level of the hierarchy is a checkpoint. A confused task agent might send a strange message to its coordinator, but the coordinator evaluates that message in its own context and decides what, if anything, to pass upward. The hierarchy filters noise, contains failures, and prevents any single agent from having an outsized impact on the system.
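One way to picture the routing rule is a reporting function that has no recipient parameter at all. This is a toy model, not the swarm's actual messaging layer; `Node` and `report` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    parent: Optional["Node"] = None
    inbox: List[Tuple[str, str]] = field(default_factory=list)


def report(sender: Node, message: str) -> None:
    """Deliver a message to the sender's direct superior, and nowhere else."""
    # There is deliberately no 'recipient' argument: an agent cannot name
    # the overseer, or a peer, as a destination.
    if sender.parent is None:
        raise ValueError(f"{sender.name} has no superior to report to")
    sender.parent.inbox.append((sender.name, message))
```

If a chain is built as overseer, project manager, coordinator, task agent, a call to `report(task, ...)` lands only in the coordinator's inbox. Anything that travels further up does so because the coordinator chose to call `report` itself.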
Trust Through Validation, Not Authority
In human organizations, we often trust people based on reputation or role. A senior engineer’s code gets less scrutiny. A manager’s decisions get fewer questions. This works most of the time, but it’s fragile — it fails exactly when it matters most, because the cases where a senior engineer makes a mistake are the cases where no one is looking.
The swarm doesn’t have reputation. It has structure. Every piece of work flows through an independent validation pipeline: researcher, planner, worker, validator, reviewer. No single agent can push changes through unchecked. The worker who writes the code is never the validator who approves it. The planner who designs the approach is never the one who confirms the approach is sound.
This is separation of duties — the same principle that says the person who writes checks shouldn’t also reconcile the bank statement. In the swarm, it’s not a policy that can be waived when things are busy or the change seems small. It’s how the pipeline works. There is no override. There is no “just ship it.”
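As a sketch, the "no override" rule can live in the function that assembles a pipeline. The stage names come from the text above; `assign_pipeline` is hypothetical, and requiring every stage to have a distinct agent is a stricter assumption than the text states (it only says the worker is never the validator and the planner never confirms their own plan).

```python
from typing import Dict, Mapping

STAGES = ("researcher", "planner", "worker", "validator", "reviewer")


def assign_pipeline(assignments: Mapping[str, str]) -> Dict[str, str]:
    """Return a stage -> agent mapping, or refuse to build the pipeline."""
    missing = [stage for stage in STAGES if stage not in assignments]
    if missing:
        raise ValueError(f"unassigned stages: {missing}")
    agents = [assignments[stage] for stage in STAGES]
    # Assumption: one agent per stage. There is no flag to relax this,
    # so "just ship it" has no code path.
    if len(set(agents)) != len(agents):
        raise ValueError("each stage must be handled by a different agent")
    return {stage: assignments[stage] for stage in STAGES}
```

The design choice is that an invalid pipeline cannot be constructed in the first place, instead of being constructed and flagged later.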
The result is a trust model that doesn’t depend on any individual agent being trustworthy. A compromised or confused agent can produce bad output, but that output hits a validator who wasn’t involved in producing it and has no incentive to let it through. The system’s security comes from the independence of its checks, not from the reliability of any single participant.
What This Means in Practice
Let’s walk through a concrete scenario. A task agent needs to deploy an update to a staging environment. The deployment requires an SSH key and an API token for the hosting provider.
First, the agent checks available secrets and finds staging-ssh-key and hosting-api-token in its project scope. It writes both to temporary files — the actual values never enter its context. It constructs the deployment command referencing those files, runs it, and reports the result.
But the agent doesn’t just deploy on its own authority. The implementation that led to this deployment was written by a worker agent, validated by a separate work validator who confirmed it met the specification, and reviewed by a reviewer agent who checked it against the broader codebase. Two agents that did not write the code signed off before it reached the deployment stage. And the deployment agent itself operates within a scoped environment: it can deploy to staging, but it has no access to production secrets.
If any single agent in this chain made a mistake — the worker introduced a bug, the validator missed an issue, the deployment agent misconfigured something — the damage is contained. The worker can’t deploy. The validator can’t write code. The deployment agent can’t access anything beyond its scope. No single point of failure exists because no single point of authority exists.
Security as Architecture
The deeper principle here isn’t specific to AI agents. It’s that security works best when it’s structural rather than behavioral. Telling agents “don’t leak secrets” is about as effective as telling employees “don’t click phishing links.” Some will comply, some won’t, and you can’t verify the difference until it’s too late.
The swarm takes a different approach: make the insecure path unavailable. Agents can’t leak secrets they never see. Agents can’t exceed permissions they were never granted. Agents can’t bypass reviews that are architecturally required. The secure behavior isn’t the disciplined choice — it’s the only choice.
This is the same insight that drives the best security engineering in traditional software. Don’t rely on developers remembering to sanitize inputs — use a framework that sanitizes by default. Don’t rely on operators remembering to rotate keys — automate the rotation. Don’t rely on trust. Build systems where trust is unnecessary.
In a swarm of autonomous agents, that principle isn’t just good practice. It’s the only thing that scales.