The Problem with Current AI
Every major AI system today — ChatGPT, Claude, Gemini — is built on the same fundamental architecture: the transformer. And while transformers are remarkably capable, they have a dirty secret: they have almost nothing in common with how biological intelligence actually works.
Transformers have fundamental limitations:
- They process information in one direction (forward)
- They learn through backpropagation (a mathematically elegant but biologically implausible algorithm)
- They treat everything as token prediction (reducing all intelligence to “guess the next word”)
- They require enormous compute (billions of parameters, megawatts of power)
The human brain, by contrast, runs on roughly 20 watts, about the power of a dim light bulb, and does things current AI systems still struggle with: understanding causality, learning from single examples, generalizing across domains, and maintaining a coherent model of the world.
We asked ourselves: what if we stopped trying to scale our way to intelligence and instead built something that actually works like a brain?
Enter FENA: Free Energy Neural Architecture
FENA is our answer. It’s a radically different approach to building intelligent systems, grounded in the best theories neuroscience has to offer about how the brain actually computes.
The name comes from Karl Friston’s Free Energy Principle, arguably the most important unifying theory in neuroscience. The core idea: the brain is a prediction machine. Everything it does (perceiving, thinking, acting, learning) is in service of one goal: minimizing surprise. More precisely, it minimizes “free energy,” a tractable upper bound on surprise that, in predictive coding models, reduces to precision-weighted prediction error.
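For readers who want the formula, here is the standard variational definition. The symbols are the usual textbook ones and are assumptions for illustration, not FENA-specific notation: x is the sensory input, z the hidden causes, q(z) the system’s current belief about those causes, and p(x, z) its generative model.

```latex
F[q] \;=\; \mathbb{E}_{q(z)}\big[\ln q(z) - \ln p(x, z)\big]
     \;=\; D_{\mathrm{KL}}\big[\,q(z)\,\|\,p(z \mid x)\,\big] \;-\; \ln p(x)
     \;\ge\; -\ln p(x)
```

Because the KL term is never negative, F is an upper bound on surprise, -ln p(x). Pushing F down therefore both improves the belief q(z) and reduces surprise.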
FENA takes this principle and builds a complete computational architecture around it.
1. Predictive Coding Hierarchy
In the brain, higher cortical areas constantly predict what lower areas will report. When the prediction is wrong, the error signal travels upward and the higher area updates its model. This happens at every level simultaneously — from raw pixels to abstract concepts.
FENA implements this directly. Instead of a single forward pass through a neural network, information flows in both directions: predictions flow downward, errors flow upward. Representations aren’t computed — they’re discovered through iterative settling, like a ball rolling to the bottom of a valley.
This means FENA can “think harder” about difficult problems. Simple inputs settle quickly. Complex or ambiguous inputs trigger more iterations, more error correction, more refinement — just like the brain spends more time processing surprising or confusing inputs.
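As a rough illustration of bidirectional settling, here is a toy three-level linear predictive coding chain. It is a minimal sketch under stated assumptions, not FENA’s actual implementation: the function name `settle_hierarchy`, the weight matrices, the step size, and the iteration count are all hypothetical choices made for the example.

```python
import numpy as np

def settle_hierarchy(x, W1, W2, step=0.1, n_iters=200):
    """Settle a toy linear predictive coding chain z2 -> z1 -> x.

    Predictions flow down (W2 @ z2 predicts z1, W1 @ z1 predicts x);
    prediction errors flow up and nudge each level's state.
    """
    z1 = np.zeros(W1.shape[1])
    z2 = np.zeros(W2.shape[1])
    energies = []
    for _ in range(n_iters):
        e0 = x - W1 @ z1      # error at the input level
        e1 = z1 - W2 @ z2     # error at the middle level
        energies.append(0.5 * (e0 @ e0 + e1 @ e1))
        # Each level reduces the error it causes below and the error it receives from above.
        z1 = z1 + step * (W1.T @ e0 - e1)
        z2 = z2 + step * (W2.T @ e1)
    return z1, z2, energies

# Hypothetical weights, chosen so that x can be explained perfectly.
W1 = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W2 = np.array([[1.0], [1.0]])
x = np.array([1.0, 1.0, 2.0])

z1, z2, energies = settle_hierarchy(x, W1, W2)
print(energies[0], energies[-1], z2)
```

The total prediction error falls on every iteration until the states stop moving. No level computes its representation in one shot; it is found by relaxation.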
2. Continuous-Time Dynamics
Traditional neural networks process information in discrete steps — one layer at a time, one token at a time. But the brain doesn’t have “layers” or “steps.” It operates in continuous time, with billions of neurons constantly integrating inputs and firing at their own rhythms.
FENA uses Neural Ordinary Differential Equations (Neural ODEs) to model continuous-time dynamics. Each node in the network evolves according to a differential equation, loosely analogous to the membrane dynamics of a real neuron. This gives us something a standard transformer does not offer: natural multi-timescale processing.
Some nodes have fast time constants (milliseconds) — they handle reactive, sensory processing. Others are medium (seconds) — they handle motor planning and working memory. The slowest nodes (minutes to hours) handle deliberative reasoning and long-term goals.
This isn’t an architectural hack — it emerges naturally from the continuous-time formulation. The brain does the same thing: brainstem neurons fire fast for reflexes, cortical neurons fire slower for thinking, and the prefrontal cortex sustains activity over minutes for planning.
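To see what different time constants do, consider three leaky-integrator nodes driven by the same step input. This is a sketch under simplifying assumptions: a fixed-step Euler integrator stands in for a real ODE solver, the dynamics are a hand-written leak rather than a learned function, and the time constants are illustrative values set by hand.

```python
import numpy as np

def simulate(taus, u, dt=1e-3, t_end=0.1):
    """Euler-integrate dx/dt = (-x + u(t)) / tau for each node."""
    taus = np.asarray(taus, dtype=float)
    x = np.zeros_like(taus)
    n_steps = int(round(t_end / dt))
    for k in range(n_steps):
        x = x + dt * (-x + u(k * dt)) / taus
    return x

# Illustrative time constants in seconds: fast, medium, slow.
taus = [0.01, 1.0, 60.0]
state = simulate(taus, u=lambda t: 1.0)   # constant input switched on at t = 0
print(state)
```

After 100 ms of simulated time, the fast node has essentially caught up with the input, the medium node has moved about a tenth of the way, and the slow node has barely started. The same signal is represented at three timescales at once.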
3. Energy-Based Settling
When FENA processes input, it doesn’t compute an answer in one shot. Instead, the entire network settles toward an energy minimum — a state where all the predictions are as accurate as possible and all the errors are minimized.
This is fundamentally different from how mainstream AI systems work. A transformer maps input to output through a fixed computation graph. FENA finds its answer by relaxing into equilibrium, like a physical system finding its lowest-energy state.
The beauty of this approach: the system naturally allocates more computation to harder problems. Easy inputs → fast settling. Hard inputs → more iterations, more energy to minimize, more computation. This is called “adaptive compute” and it emerges for free from the architecture.
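The adaptive-compute effect can be shown with a toy quadratic energy. The sketch below is an assumption-laden stand-in, not FENA’s energy function: `settle`, the matrices `easy` and `hard`, the step size, and the tolerance are hypothetical. A sharply curved energy plays the role of an unambiguous input; a nearly flat one plays the role of an ambiguous input.

```python
import numpy as np

def settle(A, b, step=0.5, tol=1e-6, max_iters=100_000):
    """Descend the energy E(s) = 0.5 * s @ A @ s - b @ s until the gradient vanishes.

    Returns the settled state and the number of iterations used.
    """
    s = np.zeros_like(b, dtype=float)
    for it in range(1, max_iters + 1):
        grad = A @ s - b
        if np.linalg.norm(grad) < tol:   # settled: nothing left to correct
            return s, it
        s = s - step * grad
    return s, max_iters

b = np.array([1.0, 1.0])
easy = np.diag([1.0, 1.0])     # sharply curved energy: clear-cut input
hard = np.diag([1.0, 0.05])    # nearly flat along one axis: ambiguous input

s_easy, n_easy = settle(easy, b)
s_hard, n_hard = settle(hard, b)
print(n_easy, n_hard)
```

The stopping rule is the same in both cases, yet the flat landscape needs many more iterations. Nothing in the code schedules extra compute for the hard case; it falls out of settling to a tolerance.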
4. Oscillatory Binding
One of the deepest unsolved problems in neuroscience is the “binding problem”: how does the brain combine separate features (color, shape, motion, location) into unified percepts (a red ball rolling left)?
A leading hypothesis: neural oscillations. When neurons representing different features oscillate in synchrony (particularly in the gamma frequency band, 30-100 Hz), they’re “bound” together into a single percept. When they desynchronize, the binding dissolves.
FENA implements oscillatory binding computationally. Different nodes oscillate at different frequencies, and information is bound together through phase synchronization. This replaces the attention mechanism in transformers with something far more biologically grounded — and potentially more powerful, since it naturally supports hierarchical binding through cross-frequency coupling (theta-gamma coupling, as observed in hippocampal memory circuits).
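Binding by phase synchronization can be sketched with the Kuramoto model of coupled oscillators. The Kuramoto model is a standard textbook choice swapped in here for illustration; the text above does not say FENA uses it. The frequencies, coupling strengths, and initial phases are likewise hypothetical.

```python
import numpy as np

def simulate_kuramoto(omega, coupling, theta0, dt=1e-3, t_end=1.0):
    """Euler-integrate d(theta_i)/dt = omega_i + sum_j K_ij * sin(theta_j - theta_i)."""
    theta = np.array(theta0, dtype=float)
    n_steps = int(round(t_end / dt))
    history = np.empty((n_steps, theta.size))
    for k in range(n_steps):
        diff = theta[None, :] - theta[:, None]   # diff[i, j] = theta_j - theta_i
        theta = theta + dt * (omega + (coupling * np.sin(diff)).sum(axis=1))
        history[k] = theta
    return history

def phase_locking(history, i, j):
    """Phase-locking value between nodes i and j: near 1 = locked, near 0 = unrelated."""
    return np.abs(np.exp(1j * (history[:, i] - history[:, j])).mean())

# Two groups of three nodes: group A at 40 Hz, group B at 50 Hz (both in the gamma band).
omega = 2 * np.pi * np.array([40.0, 40.0, 40.0, 50.0, 50.0, 50.0])
coupling = np.zeros((6, 6))
coupling[:3, :3] = 10.0    # nodes inside group A pull on each other
coupling[3:, 3:] = 10.0    # nodes inside group B pull on each other
theta0 = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]

history = simulate_kuramoto(omega, coupling, theta0)
late = history[500:]       # discard the initial transient
within = phase_locking(late, 0, 2)
between = phase_locking(late, 0, 3)
print(within, between)
```

Nodes inside a group lock their phases and behave as one unit, while nodes in different groups drift past each other. In this picture, "bound" simply means "phase-locked".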
5. Local Learning — No Backpropagation
Perhaps the most radical departure from standard AI: FENA doesn’t use backpropagation.
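Backprop requires a global error signal that propagates backward through every layer of the network. This creates the “weight transport” problem: to compute gradients, the backward pass must reuse the exact forward weights (transposed) of every downstream layer, so each feedback pathway would need precise knowledge of the corresponding forward synapses. No known biological neural circuit does this. The brain simply doesn’t appear to have the wiring for it.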
Instead, FENA uses local learning rules inspired by real synaptic plasticity:
- Hebbian learning — “neurons that fire together wire together” — connections strengthen between co-active nodes
- STDP (Spike-Timing-Dependent Plasticity) — the precise timing of activity determines whether connections strengthen or weaken
- Prediction error modulation — local learning rates scale with prediction errors, so the system learns more from surprising events
Each node updates its own weights based only on locally available information. No global loss function. No gradient computation. No backward pass. Just local prediction errors driving local weight updates.
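A minimal sketch of such a local rule: a single node learns to predict its input using only its own prediction error and the activity arriving at its synapses. The function `train_local`, the target weights `W_true`, and the learning rate are hypothetical, and this single-layer delta rule is a simplification rather than FENA’s full learning scheme.

```python
import numpy as np

def train_local(W_true, n_steps=2000, lr=0.05, seed=0):
    """Learn prediction weights from local signals only.

    Each update multiplies two quantities available at the synapse:
    the local prediction error and the presynaptic activity.
    """
    rng = np.random.default_rng(seed)
    W = np.zeros_like(W_true)
    for _ in range(n_steps):
        z = rng.normal(size=W_true.shape[1])   # presynaptic activity
        x = W_true @ z                         # signal the node tries to predict
        err = x - W @ z                        # local prediction error
        W += lr * np.outer(err, z)             # error times input: no backward pass
    return W

# Hypothetical weights that generate the data; the learner never sees them directly.
W_true = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W_learned = train_local(W_true)
print(np.abs(W_learned - W_true).max())
```

The update is larger when the error is larger, so surprising samples change the weights more than well-predicted ones.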
Recent theoretical work (Millidge et al., 2021) has shown that predictive coding with local learning rules can approximate backpropagation under certain conditions, so in principle we are not sacrificing learning power. We aim to reach the same end result through biologically plausible means.
Why This Matters
It Runs on Consumer Hardware
FENA is designed to run on a single GPU with 5-8GB of VRAM, which fits comfortably on a standard gaming card like an RTX 3080 (10GB). We believe that if you need a data center to run your AI, you don’t understand intelligence well enough. The brain runs on 20 watts. Our target is a single consumer GPU.
It’s Self-Supervised
FENA learns entirely from raw experience: no labeled data, no human feedback, no curated datasets. The learning signal comes from prediction errors: the system tries to predict its inputs, and when it’s wrong, it learns. This is closer to how infants are thought to learn: less from labels and rewards, and more from the surprise of the world not matching their expectations.
It’s Modular Like the Brain
FENA isn’t one monolithic network. It’s a collection of specialized modules — perception, world modeling, reasoning, memory, language, action — that communicate through a shared workspace. Each module can be developed, tested, and improved independently. If one module breaks, the others still function.
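The shared-workspace idea can be sketched in a few lines. Everything here is hypothetical: the `Workspace` class, the module names, and the lambda modules are placeholders for illustration, not FENA’s actual interfaces.

```python
from typing import Callable, Dict

class Workspace:
    """Shared blackboard that independent modules read from and write to."""

    def __init__(self) -> None:
        self.state: Dict[str, object] = {}
        self.modules: Dict[str, Callable[[dict], object]] = {}
        self.failed: Dict[str, str] = {}

    def register(self, name: str, fn: Callable[[dict], object]) -> None:
        self.modules[name] = fn

    def step(self) -> None:
        snapshot = dict(self.state)          # every module sees the same broadcast
        for name, fn in self.modules.items():
            try:
                self.state[name] = fn(snapshot)
            except Exception as exc:         # a broken module must not stop the others
                self.failed[name] = repr(exc)

ws = Workspace()
ws.register("perception", lambda s: {"object": "red ball"})
ws.register("memory", lambda s: 1 / 0)       # deliberately broken module
ws.register(
    "reasoning",
    lambda s: f"saw {s['perception']['object']}" if "perception" in s else None,
)

ws.step()   # perception publishes its output
ws.step()   # reasoning picks it up from the workspace
print(ws.state["reasoning"], list(ws.failed))
```

Modules never call each other directly; they only read and write the workspace. That is what lets one module fail, or be swapped out, without disturbing the rest.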
It Bridges AI and Neuroscience
Most AI research and most neuroscience research happen in separate silos. FENA sits at the intersection: every architectural choice is grounded in neuroscience theory and implementable as efficient computation. We’re not just inspired by the brain — we’re testing neuroscience theories by building them.
The Road Ahead
FENA is ambitious. We’re replacing the foundation that every modern AI system is built on — transformers, attention, backpropagation — with something fundamentally different. There will be challenges. Some ideas won’t work as expected. But we believe the path to true intelligence doesn’t lie in making transformers bigger — it lies in making AI more like the remarkable prediction machine evolution spent 500 million years perfecting.
The brain got intelligence right. It’s time we started listening.
— The Sulphur Team