The Problem Nobody’s Talking About
There’s a quiet crisis in artificial intelligence. Not a technical crisis — a structural one.
The most capable AI systems in the world require data centers consuming megawatts of power. They’re owned by a handful of corporations. They cost tens of millions of dollars or more to train and millions to operate. And to use them, you pay per token, per query, per minute, renting intelligence from someone else’s infrastructure.
We’ve built the most transformative technology in human history, and then locked it behind API keys.
This isn’t just an economic problem. It’s an architectural one. The transformer — the engine behind ChatGPT, Claude, Gemini, and virtually every frontier AI system — was designed for scale. It needs scale. Attention mechanisms scale quadratically with context length. Parameter counts climb into the hundreds of billions. Training runs cost tens of millions of dollars. The architecture itself demands centralization.
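To make the quadratic claim concrete, here is a small illustrative calculation. It counts only the pairwise scores in one full self-attention map; the function name is ours, and real systems multiply this by the number of heads and layers and often apply optimizations this sketch ignores.

```python
def attention_score_entries(context_length: int) -> int:
    """Pairwise scores in one full self-attention map (one head, one layer)."""
    return context_length * context_length


# Doubling the context length quadruples the number of scores.
for n in (1_000, 2_000, 4_000):
    print(n, attention_score_entries(n))
```

Going from 1,000 to 4,000 tokens is a 4x longer context but a 16x larger score matrix.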
We asked a different question: What if the architecture itself made intelligence cheap?
Not cheap as in “inferior.” Cheap as in “runs on hardware you already own.” Cheap as in “no API bill.” Cheap as in “no corporation standing between you and your AI.”
The Transformer Trap
Transformers are brilliant engineering. They’re also a dead end for democratized intelligence.
Every improvement to transformer-based AI follows the same pattern: make it bigger. More parameters, more data, more compute. GPT-3 had 175 billion parameters. GPT-4 reportedly has over a trillion, though OpenAI has not published the figure. Each generation demands orders of magnitude more resources for incrementally better performance.
This is the scaling paradigm — the belief that intelligence emerges from size. And to some degree, it works. But it creates an inescapable gravity well: only organizations with massive capital can participate in frontier AI development. Everyone else rents access.
The scaling paradigm also reveals something uncomfortable about transformers: they don’t understand anything. They predict the next token. They do this extraordinarily well — well enough to simulate understanding, reasoning, creativity, and empathy. But underneath, it’s statistical pattern matching over sequences of text. This is why they hallucinate with perfect confidence. They have no model of the world — only a model of language.
If we want AI that genuinely understands — and that runs on a laptop — we need to abandon the architecture that demands ignorance at scale.
What the Brain Got Right
The human brain processes information using roughly 20 watts of power — less than a light bulb. It learns from single examples. It generalizes across domains. It maintains a coherent, updateable model of the world. It does all of this running on biological hardware that, in computational terms, is absurdly slow and noisy.
How? Not by scaling. The brain isn’t a bigger version of a simpler brain. It’s a fundamentally different architecture — one optimized for efficiency, adaptation, and understanding rather than raw throughput.
Three principles make the brain extraordinary:
Prediction, not reaction. The brain doesn’t wait for input and then process it. It constantly generates predictions about what will happen next, and only processes the surprise — the difference between prediction and reality. This is vastly more efficient than processing every input from scratch, which is exactly what transformers do.
Local learning, not global optimization. Neurons update their connections based on locally available information: what they predicted, what actually happened, and how important the error is. No neuron needs to know about any other neuron’s connection strengths. Contrast this with backpropagation, which requires a global error signal propagated backward through every layer. That is widely considered biologically implausible, and it is a computational bottleneck in artificial systems.
Continuous processing, not discrete tokens. The brain doesn’t chop the world into tokens and process them sequentially. It operates in continuous time, at multiple timescales simultaneously. Your brainstem handles reflexes in milliseconds. Your motor cortex plans movements over seconds. Your prefrontal cortex pursues goals over hours and days. All at once, in the same architecture.
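The first two principles can be sketched in a few lines. This is a toy illustration, not FENA code: a single unit keeps a running prediction of its input and does work only when the prediction error (the surprise) exceeds a threshold. The function name, the threshold, and the learning rate are all assumptions chosen for the example.

```python
def run_surprise_gated(signal, threshold=0.1, learning_rate=0.5):
    """Track a signal with a running prediction; update only on surprising samples.

    Returns (final_prediction, number_of_updates).
    """
    prediction = 0.0
    updates = 0
    for sample in signal:
        error = sample - prediction  # the "surprise"
        if abs(error) > threshold:
            # Local update: uses only this unit's own prediction and error.
            prediction += learning_rate * error
            updates += 1
        # Otherwise the input was predicted well enough and is skipped.
    return prediction, updates


# A signal that sits at 1.0, then jumps to 3.0.
signal = [1.0] * 50 + [3.0] * 50
prediction, updates = run_surprise_gated(signal)
print(f"{updates} updates for {len(signal)} samples, prediction={prediction:.3f}")
```

In this toy run only 9 of the 100 samples trigger an update: a few at the start, a few after the jump, and none while the signal stays predictable.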
FENA: Free Energy Neural Architecture
We’re building FENA — an architecture based on these brain principles, designed from the ground up to be both more capable and more efficient than transformers.
Predictive Coding replaces the forward-pass-plus-backprop paradigm. Every module in the system generates predictions about its inputs. When reality doesn’t match prediction, the local prediction error drives learning. No global loss function. No backward pass through the entire network. Each module improves independently, using only the information it has.
This isn’t just more biologically plausible. We expect it to be more computationally efficient as well. Instead of propagating gradients backward through billions of parameters, each module updates itself locally. The cost of an update depends only on that module’s own size, and because no module waits on a global backward pass, modules can update in parallel.
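A minimal sketch of local, prediction-error-driven learning, in the style of textbook predictive coding networks rather than FENA’s actual implementation. Two scalar modules form a chain: the upper one predicts a hidden state h from a cause c, and the lower one predicts the input x from h. All names (settle_hidden, train, w1, w2) and hyperparameters are assumptions made for the example.

```python
def settle_hidden(x, c, w1, w2, steps=50, step_size=0.1):
    """Relax the hidden state h until both local prediction errors are balanced."""
    h = w2 * c  # start from the top-down prediction
    for _ in range(steps):
        e1 = x - w1 * h  # lower module: error in predicting the input
        e2 = h - w2 * c  # upper module: error in predicting the hidden state
        h += step_size * (w1 * e1 - e2)
    return h


def train(pairs, w1=0.5, w2=0.5, lr=0.1, epochs=200):
    """Each weight update reads only the error and activity at its own module."""
    for _ in range(epochs):
        for c, x in pairs:
            h = settle_hidden(x, c, w1, w2)
            e1 = x - w1 * h
            e2 = h - w2 * c
            w1 += lr * e1 * h  # lower module: its own error times its own input
            w2 += lr * e2 * c  # upper module: its own error times its own input
    return w1, w2


# Toy data: the input is always twice the cause.
pairs = [(c, 2.0 * c) for c in (0.5, 1.0, -1.0, 1.5, -0.5)]
w1, w2 = train(pairs)
print(f"end-to-end gain w1*w2 = {w1 * w2:.4f}")
```

Neither weight update looks at the other module’s weight, yet the end-to-end gain w1*w2 approaches 2, the relationship hidden in the data.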
Continuous-time dynamics replace discrete token processing. FENA’s modules evolve according to differential equations: Neural ODEs that model genuine temporal dynamics. This means the system naturally processes information at multiple timescales. Fast modules handle reactive processing. Slow modules handle deliberation. Because the dynamics run until they converge, harder inputs take longer to resolve: the system literally “thinks longer” about difficult inputs without any special mechanism.
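As a rough picture of multiple timescales in one system, the sketch below integrates two leaky integrators that receive the same input but have different time constants. It is a hand-written forward-Euler toy, not a Neural ODE and not FENA code; the equation tau * dx/dt = -x + u(t), the function name, and the time constants are assumptions chosen for illustration.

```python
def simulate(taus, input_fn, t_end, dt=0.001):
    """Forward-Euler integration of tau * dx/dt = -x + u(t), one state per tau."""
    states = [0.0 for _ in taus]
    steps = int(round(t_end / dt))
    for k in range(steps):
        u = input_fn(k * dt)
        states = [x + dt * (-x + u) / tau for x, tau in zip(states, taus)]
    return states


# Same constant input, two timescales: 50 ms and 2 s.
fast, slow = simulate(taus=[0.05, 2.0], input_fn=lambda t: 1.0, t_end=0.5)
print(f"after 0.5 s: fast={fast:.3f}, slow={slow:.3f}")
```

After half a second the fast unit has essentially caught up with the input, while the slow unit has covered only about a fifth of the distance and is still integrating.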
Energy-based settling replaces single-pass computation. Instead of computing an answer in one forward pass, FENA settles toward an energy minimum — the state where all predictions are as accurate as possible and all errors are minimized. This is directly inspired by Karl Friston’s Free Energy Principle, arguably the most important unifying theory in modern neuroscience.
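Settling can be illustrated with a one-variable toy. The sketch assumes a simple quadratic energy: one squared error between an observation and its prediction g * s, plus one squared error between the state s and a prior expectation mu. The state descends this energy until it stops moving. The energy function, names, and constants are ours for illustration and are far simpler than anything the Free Energy Principle or FENA would use.

```python
def energy(s, obs, mu, g=2.0):
    """Sum of squared prediction errors: sensory error plus prior error."""
    return 0.5 * (obs - g * s) ** 2 + 0.5 * (s - mu) ** 2


def settle(obs, mu, g=2.0, step_size=0.05, tol=1e-8, max_steps=10_000):
    """Gradient descent on the energy; returns (settled_state, steps_taken)."""
    s = mu  # start at the prior expectation
    for step in range(1, max_steps + 1):
        grad = -g * (obs - g * s) + (s - mu)
        s_new = s - step_size * grad
        if abs(s_new - s) < tol:
            return s_new, step
        s = s_new
    return s, max_steps


# With mu = 1.0 the prior predicts an observation of 2.0.
s_easy, steps_easy = settle(obs=2.1, mu=1.0)   # close to the prediction
s_hard, steps_hard = settle(obs=10.0, mu=1.0)  # far from the prediction
print(f"easy: {steps_easy} steps, hard: {steps_hard} steps")
```

The more surprising observation takes more steps to settle, which is one simple sense in which a settling system spends more computation on harder inputs.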
Oscillatory binding replaces attention. Instead of the O(n²) attention mechanism that makes transformers so expensive at long contexts, FENA uses neural oscillations to bind information, the mechanism the brain is hypothesized to use (binding by synchrony). Synchronized oscillations link related information; desynchronization separates it. We expect this to be both more biologically faithful and more computationally efficient than scaled dot-product attention.
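For intuition about binding by synchrony, here is a toy built on the classic Kuramoto model of coupled phase oscillators, which we use as a stand-in and not as FENA’s mechanism. Two groups of oscillators are coupled only within their own group, through the group’s mean field. In this sketch the group membership, frequencies, and starting phases are assigned by hand; how a real system would discover the groups is outside its scope.

```python
import cmath
import math


def mean_field(phases):
    """Return (coherence r, mean phase psi), where r*exp(i*psi) = mean of exp(i*theta)."""
    z = sum(cmath.exp(1j * theta) for theta in phases) / len(phases)
    return abs(z), cmath.phase(z)


def simulate_groups(groups, coupling=2.0, dt=0.01, steps=2000):
    """groups: list of (natural_frequency, initial_phases). Coupling is within-group only."""
    state = [(freq, list(phases)) for freq, phases in groups]
    for _ in range(steps):
        new_state = []
        for freq, phases in state:
            r, psi = mean_field(phases)
            # Each oscillator is pulled toward its own group's mean phase.
            new_state.append(
                (freq, [th + dt * (freq + coupling * r * math.sin(psi - th)) for th in phases])
            )
        state = new_state
    return [phases for _, phases in state]


group_a, group_b = simulate_groups([(1.0, [0.0, 0.8, 1.6]), (1.3, [3.0, 3.9, 4.5])])
coh_a = mean_field(group_a)[0]
coh_b = mean_field(group_b)[0]
coh_all = mean_field(group_a + group_b)[0]
print(f"within A: {coh_a:.3f}, within B: {coh_b:.3f}, across both: {coh_all:.3f}")
```

Coherence near 1 means the oscillators share a phase. Each group locks internally, while the two groups run at different frequencies and never lock to each other, so their relative phase keeps drifting. The mean-field form costs time linear in the number of oscillators per step, though this toy says nothing about how such a scheme would scale in a full model.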
Why Free Inference Matters
When inference is free — when you can run a capable AI system on your own hardware with no ongoing cost — everything changes.
Privacy becomes default. Your data never leaves your machine. No API calls, no server logs, no corporate data collection. Your AI assistant actually works for you, not for the company hosting it.
Access becomes universal. A student in rural India with a gaming laptop has the same AI capabilities as a Fortune 500 company. Intelligence is no longer a service you rent from Silicon Valley — it’s a tool you own.
Innovation becomes decentralized. When anyone can run and modify the system, innovation happens everywhere. Not just in labs with thousand-GPU clusters, but in dorm rooms, garages, and community hackerspaces. The most important applications of AI will come from people who currently can’t afford API bills.
Reliability becomes guaranteed. No API outages. No rate limits. No service discontinuations. No terms of service changes. Your AI works as long as your hardware works.
Why Free Doesn’t Mean Worse
This is the critical point: we’re not building a “lite” version of real AI. We’re building something fundamentally better by being more aligned with how intelligence actually works.
Transformers achieve their capabilities through brute force — predicting the next token across trillions of training examples until statistical patterns approximate understanding. It works, impressively. But it’s the equivalent of memorizing every conversation ever had rather than actually learning to think.
FENA achieves capabilities through architecture — by building the computational equivalent of what evolution spent 500 million years perfecting. Prediction error minimization. Local learning. Continuous dynamics. Hierarchical processing. World modeling.
A system that maintains an actual model of the world doesn’t need to memorize every fact — it can derive facts from its model. A system that learns from prediction errors doesn’t need trillions of training examples — it learns efficiently from surprise, just as humans do. A system that processes in continuous time doesn’t need massive context windows — it maintains state naturally.
The brain proves this is possible. On general-purpose learning and adaptation, it still outperforms every AI system ever built, while consuming roughly 20 watts. The problem isn’t that intelligence requires massive compute. The problem is that we’ve been using the wrong architecture.
The Road Ahead
We’re not claiming we’ll solve intelligence tomorrow. This is hard — potentially the hardest engineering challenge in human history. Some of our ideas won’t work. We’ll make mistakes, hit dead ends, and have to rebuild.
But we believe the direction is right. If you want AI that’s truly intelligent — not just statistically impressive — you need an architecture that’s truly brain-like. And if you want AI that’s free and accessible to everyone, you need an architecture that’s efficient enough to run on the hardware people actually have.
FENA is our bet that you can have both. That free and better aren’t contradictions — they’re consequences of getting the architecture right.
Intelligence shouldn’t require a data center. It shouldn’t require a subscription. It shouldn’t require permission.
We’re building the alternative.