Settling Into Intelligence — How FENA Thinks By Minimizing Surprise

The Forward Pass Problem

Nearly every neural network you’ve used thinks the same way: data goes in one end, flows through layers, and a prediction comes out the other end. One pass. One shot. The answer is whatever the final layer produces, and the network spends the same amount of computation on it no matter how hard the problem is.

This is deeply weird when you think about it. Your brain doesn’t work this way. When you encounter something confusing — an optical illusion, an ambiguous sentence, a novel situation — you don’t just produce an instant answer. You dwell. You consider and reconsider. The harder the problem, the longer you think.

FENA’s Hierarchical Predictive Processing module, which just landed on main, implements this principle directly. It doesn’t do a single forward pass. It settles.

Predictions All The Way Down

The Hierarchical Predictive Processing (HPP) system is organized as a stack of predictive coding layers, each trying to predict the activity of the layer below it. The bottom layer receives raw input. Each higher layer builds increasingly abstract representations.

But here’s the critical difference from a standard neural network: information flows in both directions simultaneously. Higher layers send predictions downward (“I think you should be seeing X”). Lower layers send prediction errors upward (“Actually I’m seeing Y, and the difference is Z”). These error signals propagate through the entire hierarchy, and the system iterates — settling toward a state where predictions match reality as closely as possible.

This is the Free Energy Principle in action. Each layer computes its local “free energy”, a tractable upper bound on how surprised it is by what it’s receiving. The settling process minimizes total free energy across the hierarchy. When the system reaches equilibrium, it has found a coherent interpretation of the input that satisfies all levels of the hierarchy simultaneously.
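To make the settling loop concrete, here is a minimal sketch in NumPy. It is not FENA's actual code: it assumes linear predictions and unit-variance Gaussian errors (so each layer's free energy reduces to half its squared prediction error), and the names `settle`, `weights`, and `lr` are invented for illustration.

```python
import numpy as np

def settle(x0, weights, steps=100, lr=0.1):
    """Settle a stack of linear predictive layers on a clamped input x0.

    weights[l] maps the state of layer l+1 down to a prediction of layer l.
    Returns the final states and the total free energy at every step.
    """
    # Layer 0 is clamped to the input; higher layers start at zero.
    states = [x0] + [np.zeros(W.shape[1]) for W in weights]
    energies = []
    for _ in range(steps):
        # Top-down: each layer predicts the layer below; the mismatch is the error.
        errors = [states[l] - W @ states[l + 1] for l, W in enumerate(weights)]
        # Assumed unit-variance Gaussians: free energy = half the squared error.
        energies.append(0.5 * sum(float(e @ e) for e in errors))
        # Bottom-up: each latent layer moves to explain the error below it,
        # while staying close to the prediction it receives from above.
        for l in range(1, len(states)):
            from_below = weights[l - 1].T @ errors[l - 1]
            from_above = errors[l] if l < len(weights) else 0.0
            states[l] = states[l] + lr * (from_below - from_above)
    return states, energies

rng = np.random.default_rng(0)
# Hypothetical three-layer stack: 8 input units, then 4 and 2 latent units.
weights = [0.3 * rng.standard_normal((8, 4)), 0.3 * rng.standard_normal((4, 2))]
x0 = rng.standard_normal(8)
states, energies = settle(x0, weights)
print(f"free energy: {energies[0]:.3f} -> {energies[-1]:.3f}")
```

Every update in this sketch is a gradient step on the summed energy, so with a small enough step size the total can only go down. That is the sense in which the hierarchy "settles".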

Adaptive Computation Without a Router

Here’s what makes this powerful: the settling process naturally gives harder problems more computation. An easy input (familiar, unambiguous) produces small prediction errors. The system settles quickly — maybe 3-5 iterations. A hard input (novel, contradictory, ambiguous) produces large prediction errors that take many iterations to resolve. The system automatically allocates 20, 50, even 100 settling steps.
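A toy version of this behavior, as a sketch under stated assumptions: a single linear layer that starts from a prior expectation and keeps settling until the error signal driving it falls below a tolerance. The function name, the tolerance, and the way "familiar" and "novel" inputs are constructed are all hypothetical choices for illustration, not FENA's implementation.

```python
import numpy as np

def settle_until_converged(x, W, z_init, lr=0.2, tol=1e-3, max_steps=5000):
    """Settle one latent layer on input x; return the state and steps used."""
    z = z_init.copy()
    for step in range(max_steps):
        error = x - W @ z      # what the current prediction fails to explain
        drive = W.T @ error    # how that error pushes on the latent state
        if np.linalg.norm(drive) < tol:
            return z, step     # equilibrium: nothing left to correct
        z = z + lr * drive
    return z, max_steps

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 4)) / np.sqrt(32)  # hypothetical generative weights
expected = rng.standard_normal(4)               # the layer's prior expectation
direction = rng.standard_normal(4)

familiar = W @ (expected + 0.01 * direction)  # close to what the layer expects
novel = W @ (expected + 10.0 * direction)     # far from what the layer expects

_, easy_steps = settle_until_converged(familiar, W, expected)
_, hard_steps = settle_until_converged(novel, W, expected)
print(f"familiar input: {easy_steps} steps, novel input: {hard_steps} steps")
```

Nothing in the loop decides how long to think. The novel input starts with a larger error, so it takes more steps before the same stopping condition is met.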

This is adaptive compute without any of the machinery that systems like Mixture-of-Experts use. There’s no router deciding which “expert” to activate. There’s no meta-controller deciding how long to think. The system simply keeps settling until its prediction errors stop shrinking, so the dynamics of the settling process itself determine computation time. Energy minimization IS the computation.

Precision-Weighted Attention

Each layer doesn’t just produce predictions and errors — it also estimates the precision (inverse variance) of its predictions. High precision means “I’m very confident about this prediction.” Low precision means “I’m uncertain.”

Precision weighting acts as a natural attention mechanism. When a layer is confident, its predictions strongly constrain the layers below. When it’s uncertain, prediction errors from below get amplified and propagated upward. This means the system automatically focuses processing resources on the parts of the input where it’s most uncertain — where there’s the most to learn.
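Here is a small sketch of precision weighting for a single unit. It assumes both the top-down prediction and the bottom-up evidence are modelled as Gaussians, in which case the best estimate is their precision-weighted average. The helper names and the toy error histories are hypothetical, not taken from FENA.

```python
import numpy as np

def estimate_precision(errors, eps=1e-6):
    """Precision = inverse variance of a unit's recent prediction errors."""
    return 1.0 / (np.var(errors, axis=0) + eps)

def precision_weighted_state(prediction, observation, pi_prediction, pi_observation):
    """Combine a top-down prediction with bottom-up evidence.

    Each source is weighted by its precision, so the more reliable one
    pulls the settled state toward itself.
    """
    total = pi_prediction + pi_observation
    return (pi_prediction * prediction + pi_observation * observation) / total

# A layer whose past errors were small is confident; one whose errors
# swung wildly is not.
steady = estimate_precision(np.array([0.1, -0.1, 0.1, -0.1]))
erratic = estimate_precision(np.array([10.0, -10.0, 10.0, -10.0]))

prediction, observation = 0.0, 1.0
confident = precision_weighted_state(prediction, observation, steady, 1.0)
uncertain = precision_weighted_state(prediction, observation, erratic, 1.0)
print(f"confident layer settles at {confident:.3f}")  # stays near its prediction
print(f"uncertain layer settles at {uncertain:.3f}")  # follows the evidence
```

The same disagreement between prediction and evidence moves the uncertain layer almost all the way to the observation and barely moves the confident one.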

No attention heads. No key-query-value matrices. Just precision-weighted prediction errors flowing through a hierarchy, and the system naturally “attends” to what matters.

Local Learning, No Backpropagation

Perhaps the most radical aspect: HPP learns entirely through local update rules. Each layer updates its weights based only on information available at that layer: the predictions it made, the errors it received, and the precision estimates. There’s no global loss function whose gradient has to flow backward through the entire network. Total free energy is a sum of per-layer terms, so each layer can reduce its own share.

This is closer to how biological neural circuits are thought to learn. A neuron doesn’t know what the “correct output” of the whole brain should be. It only knows what its neighbors are doing and adjusts accordingly. FENA’s HPP follows the same principle, using Hebbian-style updates that strengthen connections between neurons that are active together and weaken connections between neurons that predict each other poorly.
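One common form of such a local rule, sketched here as an illustration rather than as FENA's actual update: each weight changes in proportion to the prediction error at its target unit times the activity of its source unit. The function name and learning rate are assumptions, and for brevity the upper layer's activity is given directly instead of being found by settling.

```python
import numpy as np

def local_update(W, state_below, state_above, lr=0.05):
    """One local learning step for the weights between two adjacent layers.

    Uses only quantities available at this connection: the activity above,
    the activity below, and the prediction error between them.
    """
    error = state_below - W @ state_above
    # Hebbian-style outer product: error at the target unit times
    # activity at the source unit.
    return W + lr * np.outer(error, state_above)

rng = np.random.default_rng(0)
W_true = rng.standard_normal((6, 3))  # hypothetical process generating the data
W = np.zeros((6, 3))                  # the layer's initial weights
gap_before = np.linalg.norm(W_true - W)

for _ in range(500):
    state_above = rng.standard_normal(3)
    state_below = W_true @ state_above
    W = local_update(W, state_below, state_above)

gap_after = np.linalg.norm(W_true - W)
print(f"distance to generating weights: {gap_before:.3f} -> {gap_after:.3f}")
```

The update never sees a network-wide loss or a gradient from any other layer, yet the weights still move toward the mapping that generated the data.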

What This Means For The Architecture

With HPP complete and merged, FENA now has its core inference mechanism: a hierarchical system that thinks by settling into low-energy states, automatically allocates computation to hard problems, attends to uncertainty, and learns without backpropagation.

The next critical piece is the Continuous-Time Dynamics Engine — replacing discrete settling steps with smooth differential equations. When that lands, FENA won’t just iterate toward solutions. It will flow toward them, with the natural elegance of a physical system finding its equilibrium.

Intelligence as physics. Not as engineering.