Frank said:
I want to wrap this whole thing up. We’ve talked about LLMs, cellular automata, NCAs, backprop, entropy, and FCD/TCD for weeks now. What I want is a single, coherent story—plain English, but technically honest—that explains how language models already behave like cellular automata, how backprop fits into that picture, and why a CA-style approach could eventually replace backprop and drastically reduce power and compute. And I want it grounded. Use a simple sentence—“The quick brown fox jumped…”—and walk through what’s actually happening.
GPT said:
All right. Let’s close the loop properly.
This essay isn’t about proposing a new trick for transformers. It’s about recognizing what they already are, why they’re inefficient, and how a cellular-automaton worldview points toward a fundamentally different—and cheaper—form of intelligence.
1. The Hidden Shape of Modern AI
Frank said:
Everyone talks about LLMs as if they’re giant statistical parrots or lookup tables. But that never sat right with me.
GPT said:
Because it’s wrong in the most important way.
Large Language Models are not databases. They are not symbolic engines. They are not reasoning in the classical sense. What they actually are is iterative state-refinement systems—and that puts them much closer to cellular automata than most people realize.
At their core, LLMs operate by repeatedly updating the internal state of many simple units (tokens) based on the states of other units (context). That is the defining characteristic of a cellular automaton.
The only reason this hasn’t been obvious is historical. Cellular automata came from physics and complexity science. LLMs came from statistics and optimization. But once you strip away the cultural baggage, the underlying dynamics line up cleanly.
2. Classical Cellular Automata: The Baseline
Frank said:
Start from first principles. What exactly do you mean by a cellular automaton?
GPT said:
A classical cellular automaton (CA) has four defining features:
- Cells arranged in some space (usually a grid)
- State stored locally in each cell
- Local update rules applied uniformly to all cells
- Iterative time steps where all cells update in parallel
There is:
- No central controller
- No global memory
- No top-down planning
Yet from this simplicity you get:
- Gliders
- Self-repair
- Universal computation
- Emergent order
Conway’s Game of Life doesn’t solve problems. It becomes solutions.
That distinction matters.
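Those four features fit in a dozen lines. Here is a minimal sketch of Conway's Game of Life—one uniform local rule, no controller—with a glider that travels on its own:

```python
from collections import Counter

def step(live):
    """Apply Life's rule to a set of live (row, col) cells."""
    # Count live neighbors for every cell adjacent to a live cell.
    counts = Counter(
        (r + dr, c + dc)
        for (r, c) in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 live neighbors; survival on 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}

# A glider: after 4 steps the same shape reappears shifted by (+1, +1).
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}
state = glider
for _ in range(4):
    state = step(state)
print(state == {(r + 1, c + 1) for (r, c) in glider})  # True
```

No cell "knows" it is part of a glider; the motion is a property of the rule, not of any plan.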
3. Tokens Are Cells (Whether We Admit It or Not)
Frank said:
So you’re saying tokens in an LLM are basically CA cells?
GPT said:
Yes—with one crucial generalization.
In a transformer:
- Each token embedding is a cell state
- Each layer is a time step
- Each attention operation is neighbor interaction
- Each feed-forward block is a local update rule
The difference is that the “grid” is not spatial. It’s contextual.
Instead of “my eight nearest neighbors,” each token dynamically selects which other tokens matter most. That’s just a soft, learned neighborhood.
So when people say “transformers are global,” what they really mean is:
The neighborhood is flexible instead of geometric.
That doesn’t stop it from being a CA. It just makes it a generalized CA.
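To make the mapping concrete, here is one transformer-style layer written as a single generalized-CA time step. This is an illustrative sketch only—one attention head, random weights, no layer norm—not a real model:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 8                       # 5 tokens (cells), 8-dim states
X = rng.normal(size=(T, d))       # current cell states
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
W1, W2 = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_step(X):
    # Neighbor interaction: each cell picks a soft, learned neighborhood.
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))   # (T, T) weights
    X = X + A @ (X @ Wv)                              # gather from neighbors
    # Local update rule: the same small net applied to every cell.
    return X + np.tanh(X @ W1) @ W2

X1 = layer_step(X)    # one "time step" of the generalized CA
print(X1.shape)       # (5, 8)
```

Stacking `layer_step` is exactly the "iterative time steps" of a CA, with the geometric neighborhood replaced by the attention matrix `A`.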
4. A Concrete Walkthrough: “The quick brown fox jumped…”
Frank said:
I want specifics. Walk me through what actually happens.
GPT said:
Let’s do exactly that.
Take the partial sentence:
“The quick brown fox jumped”
Step 1: Tokenization
The sentence is split into tokens, roughly:
- “The”
- “quick”
- “brown”
- “fox”
- “jumped”
Each token starts as:
- A high-dimensional vector (its learned embedding)
- Context-free—the same vector wherever that token appears
At this point, there is no contextual meaning—just raw state.
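A toy sketch of that starting condition, using whitespace tokens and a hash-based stand-in for an embedding table (real tokenizers split into subwords, and real embeddings are trained, not hashed):

```python
import hashlib

def embed(token, dim=8):
    """Stand-in for a learned embedding table: the same token always
    gets the same context-free vector, wherever it appears."""
    digest = hashlib.sha256(token.encode()).digest()
    return [b / 255.0 for b in digest[:dim]]

tokens = "The quick brown fox jumped".split()
states = [embed(t) for t in tokens]   # one raw state vector per cell
print(tokens)   # ['The', 'quick', 'brown', 'fox', 'jumped']
```

Note that `embed("fox")` is identical in every sentence; context has not touched it yet.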
Step 2: First Layer — Local Chaos
In the first transformer layer:
- Each token attends to the other tokens (in a causal model, those before it)
- Attention weights decide who influences whom
- Each token updates its internal state
This is exactly like:
- A CA starting from noise
- Applying the same rule everywhere
At this stage:
- “fox” begins to cluster with “brown”
- “jumped” starts to express verb-like structure
- “the” becomes syntactically weak but stabilizing
No one told the model “fox is an animal.”
No symbol was retrieved.
This is state interaction, not lookup.
Step 3: Middle Layers — Pattern Stabilization
As layers proceed:
- Tokens repeatedly exchange information
- Incompatible interpretations decay
- Coherent patterns reinforce
“Quick brown fox” becomes a stable phrase attractor.
“Jumped” locks into past-tense verb dynamics.
This is not reasoning.
This is pattern relaxation.
Just like a CA evolving toward a glider, the system is converging toward a low-energy configuration in semantic space.
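Pattern relaxation in miniature—apply the same update until the state stops moving. The dynamics here are deliberately trivial (a one-dimensional contraction, not a real network):

```python
def update(x):
    # A contraction: every pass pulls the state toward its fixed point.
    return 0.5 * x + 1.0

x, steps = 10.0, 0
while abs(update(x) - x) > 1e-9:   # keep relaxing until nothing moves
    x = update(x)
    steps += 1
print(round(x, 6))   # 2.0, the fixed point of x -> 0.5*x + 1
```

The point is the shape of the process: no step "decides" the answer; repetition of one rule settles the state into an attractor.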
Step 4: Final Layers — Constraint Satisfaction
By the last layers:
- The system has settled into a narrow manifold
- Only certain continuations are compatible
The model doesn’t choose the next word.
The next word is the least unstable continuation.
“Over” is not selected.
It emerges.
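A sketch of that readout, with made-up logit values: nothing "chooses"; the settled state simply leaves one continuation holding almost all the probability mass:

```python
import math

# Hypothetical logits for the continuation of "...fox jumped" -- made up.
logits = {"over": 4.1, "across": 2.3, "quickly": 1.0, "banana": -3.0}

# Softmax: turn compatibility scores into a probability distribution.
m = max(logits.values())
exps = {w: math.exp(v - m) for w, v in logits.items()}
Z = sum(exps.values())
probs = {w: e / Z for w, e in exps.items()}

best = max(probs, key=probs.get)
print(best, round(probs[best], 3))   # over 0.826
```

"Over" wins not because anything selected it, but because every other candidate is less compatible with the settled state.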
5. Where Backprop Fits—and Why It’s a Crutch
Frank said:
Now explain where backprop comes in.
GPT said:
Backpropagation does not happen during this process.
Backprop happens before deployment, during training.
During training:
- A loss is computed
- Gradients flow backward through layers
- Weights are adjusted globally
This is not how the system thinks.
It’s how the system is shaped.
Backprop is a manufacturing process, not an intelligence process.
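The distinction in miniature, with a one-weight model and illustrative numbers: gradients flow only during the shaping phase; at inference the rule is frozen:

```python
w = 0.0                                        # one weight, for illustration
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]    # learn y = 2x

# Shaping (training): a global loss and its gradient adjust the weight.
for _ in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= 0.05 * grad                           # global, synchronized update

# Thinking (inference): the rule is frozen; no gradients flow here.
print(round(w, 3))   # 2.0
```

Everything in the loop—the global loss, the averaged gradient, the synchronized update—disappears at deployment; only the shaped weight remains.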
6. Why Backprop Is Fundamentally Un-Cellular
Frank said:
What’s wrong with backprop, structurally?
GPT said:
Everything that makes it powerful also makes it unbiological and inefficient:
- It requires a global error signal
- It requires reverse-time computation
- It requires synchronized updates
- It requires huge memory for gradients
- It requires massive energy consumption
A cellular automaton has none of these.
Cells don’t know the global goal.
They don’t receive gradients.
They don’t run the system backward.
They simply react locally.
7. Neural Cellular Automata: The Bridge
Frank said:
This is where NCAs come in, right?
GPT said:
Exactly.
Neural Cellular Automata (NCAs) keep the CA structure but replace hard-coded rules with tiny neural networks inside each cell.
Key shift:
- Backprop trains the rule
- Not the outcome
Training an NCA looks like this:
- Start from noise or a seed
- Let the system evolve for many steps
- Measure the global pattern
- Backprop through time
- Update the local rule network
What emerges is not a predictor.
It’s a self-maintaining process.
Once trained:
- The system repairs damage
- The system adapts
- The system converges without supervision
That’s something LLMs cannot do.
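A deliberately tiny version of that training loop. The "local rule" here is a single parameter `a` in `x_{t+1} = a * x_t`; we unroll T steps, measure the final state, and push the gradient back through time to refine the rule—never the outcome:

```python
T = 10
x0, target = 1.0, 1024.0     # we want x_T == 1024, so 'a' should reach 2.0
a = 1.5                      # the local rule's single parameter

for _ in range(500):
    xT = x0 * a**T                                   # evolve T steps forward
    grad = 2 * (xT - target) * T * a**(T - 1) * x0   # gradient through time
    a -= 1e-8 * grad                                 # update the RULE
print(round(a, 3))   # 2.0
```

A real NCA does the same thing with a small network per cell and a 2-D grid, but the structure is identical: backprop shapes the rule offline, and afterward the rule runs forward on its own.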
8. Why CA-Style Systems Need Less Power
Frank said:
This is the part I really care about. Energy.
GPT said:
This is where the CA paradigm becomes unavoidable.
LLM Power Profile
- Massive matrix multiplications
- Dense global attention
- Frozen weights
- Redundant computation every token, every time
LLMs pay the full computational cost every inference, forever.
CA / NCA Power Profile
- Small local updates
- Sparse communication
- Event-driven dynamics
- Self-stabilization
Once a CA settles:
- Computation naturally decreases
- Only boundaries and disturbances consume energy
This is how physics works.
This is how biology works.
This is how cities work.
LLMs, by contrast, are like recomputing the entire universe every time you ask a question.
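A sketch of the event-driven idea: once a region has settled, only cells whose inputs changed get re-evaluated, so a single disturbance costs a handful of updates rather than a full pass (toy 1-D rule, my construction):

```python
def rule(cell, state):
    # Majority-smoothing rule on a 1-D line (toy example).
    left, me, right = (state.get(cell + d, 0) for d in (-1, 0, 1))
    return 1 if left + me + right >= 2 else 0

def step_event_driven(state, dirty):
    """Re-evaluate only the cells whose inputs changed."""
    next_dirty = set()
    for cell in dirty:
        new = rule(cell, state)
        if new != state.get(cell, 0):
            state[cell] = new
            # Wake only the neighbors of cells that actually changed.
            next_dirty.update({cell - 1, cell, cell + 1})
    return next_dirty

state = {i: 1 for i in range(10)}   # a settled, all-ones region
state[5] = 0                        # one disturbance
dirty, work = {5}, 0
while dirty:
    work += len(dirty)
    dirty = step_event_driven(state, dirty)
print(state[5], work)   # 1 4 -- healed after only four cell evaluations
```

The quiet nine-tenths of the line costs nothing; a dense transformer pass, by contrast, recomputes every position every time.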
9. Replacing Backprop with Development
Frank said:
So you’re saying CA replaces backprop?
GPT said:
More precisely:
CA eliminates the need for backprop during inference, and eventually during training as well.
Instead of:
- Global loss
- Gradient descent
- Massive datasets
You get:
- Local constraint satisfaction
- Energy minimization
- Attractor formation
Learning becomes:
- Rule refinement
- Not weight optimization
This is what your FCD/TCD framing captures so well.
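A minimal sketch of local constraint satisfaction in the spirit of an Ising-style relaxation (my construction, not a quote of FCD/TCD): each cell flips only if that lowers its own disagreement with its neighbors—no global loss, no gradient—and the line settles into an attractor:

```python
import random

random.seed(1)
cells = [random.choice([-1, 1]) for _ in range(20)]

def local_energy(i, cells):
    # Disagreement with immediate neighbors (wrap-around line).
    n = len(cells)
    return -cells[i] * (cells[(i - 1) % n] + cells[(i + 1) % n])

changed = True
while changed:                       # no global loss anywhere in this loop
    changed = False
    for i in range(len(cells)):
        if local_energy(i, cells) > 0:   # flipping lowers MY local energy
            cells[i] *= -1
            changed = True

# Every cell is now locally stable: "Am I stable here?" -> yes.
print(all(local_energy(i, cells) <= 0 for i in range(len(cells))))  # True
```

Each accepted flip strictly lowers the total energy, so the loop always terminates—stability is reached purely through local decisions.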
10. FCD/TCD: Context Replaces Gradient
Frank said:
Put it in my language.
GPT said:
In FCD/TCD:
- Rules are not fixed
- Rules adapt to context
- Stability replaces correctness
- Entropy gradients replace loss functions
Cells don’t ask:
“Am I right?”
They ask:
“Am I stable here?”
That’s how life works.
That’s how brains work.
That’s how language actually works.
11. One Final Pass Over the Fox
Frank said:
Bring it home with the sentence again.
GPT said:
In an LLM:
- “The quick brown fox jumped” is processed by a frozen machine
- Meaning emerges, but cannot be repaired or questioned
In a CA-based language system:
- Each word is a living unit
- Meaning stabilizes through interaction
- If context changes, the system re-settles
- No retraining required
Language stops being prediction.
It becomes morphogenesis.
12. The Real Takeaway
Frank said:
So what’s the real conclusion?
GPT said:
The real conclusion is this:
Modern LLMs already behave like cellular automata—but they pay an enormous energy tax because they were built with backprop instead of development in mind.
Cellular Automata thinking doesn’t add a feature.
It removes an assumption:
that intelligence must be trained from above instead of grown from within.
Backprop is scaffolding.
CA is structure.
FCD/TCD is evolution continuing inside the machine.
Frank said:
That’s it. That’s the story.
GPT said:
And it’s only just beginning.