The Spike That Thinks: Neurons, Tokens, and the Physics of Prediction

Frank said:

Look at this diagram. The action potential. It’s so clean it almost looks dishonest. Sodium rushes in, potassium rushes out, a spike happens, information moves. Every neuroscience textbook uses this picture. But the more I stare at it, the more it looks like an LLM doing inference. Not metaphorically. Structurally. Dynamically. Energetically. Tell me I’m not imagining that.

GPT said:

You’re not imagining it. You’re noticing that both systems—neurons and LLMs—solve the same problem using different substrates:

How do you propagate a decision forward without storing meaning locally, while remaining stable, directional, and selective under noise?

The action potential is not “information” in the semantic sense. It’s a gated propagation event. And that’s exactly what token activation is inside a transformer.

1. Resting Potential and the Latent State

Frank said:

Start at rest. The neuron at –70 mV. Nothing “happening,” but everything ready to happen. That feels like the embedding space before attention fires.

GPT said:

Yes. Resting membrane potential ≈ latent readiness.

At rest:

  • Ion gradients are established.
  • Channels are closed but primed.
  • Energy has already been spent (via Na⁺/K⁺ pumps).

Likewise, in an LLM:

  • The embedding vector already exists.
  • Statistical structure has been trained in.
  • The bulk of the energy was spent during training, not inference.

Nothing fires yet, but the conditions for firing are present.

This is crucial:

Neither neurons nor LLMs compute meaning at rest.

They hold potential shaped by history.

Resting potential is not inactivity.

It’s pre-stressed structure.
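
That "pre-stressed structure" is quantifiable. A minimal sketch using the Nernst equation, with textbook-ballpark mammalian K⁺ concentrations (exact values vary by cell type):

```python
import math

def nernst_potential(z, conc_out_mM, conc_in_mM, temp_kelvin=310.0):
    """Nernst equation: E = (RT / zF) * ln([out] / [in]), in volts."""
    R = 8.314      # gas constant, J/(mol*K)
    F = 96485.0    # Faraday constant, C/mol
    return (R * temp_kelvin) / (z * F) * math.log(conc_out_mM / conc_in_mM)

# Ballpark mammalian K+ concentrations: ~5 mM outside, ~140 mM inside
e_k_mV = nernst_potential(z=1, conc_out_mM=5.0, conc_in_mM=140.0) * 1000
print(round(e_k_mV))  # about -89 mV: the K+ gradient alone sets most of the resting potential
```

The measured resting potential (around –70 mV) sits above E_K because the membrane leaks a little Na⁺. That standing gap is exactly what the pumps keep paying for.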

2. Thresholds and Attention: When Possibility Becomes Event

Frank said:

In neurons, nothing happens until threshold is crossed. Below threshold, inputs just fade away. Above threshold, boom—full spike. All-or-nothing. That sounds like attention masking.

GPT said:

Exactly.

The neuron doesn’t scale its output smoothly with input.

It decides whether to propagate at all.

That maps cleanly to attention mechanisms:

  • Many tokens contribute weak signals.
  • Attention computes weighted relevance.
  • Only certain pathways exceed effective thresholds.
  • Those paths propagate forward.

Below threshold:

  • Subthreshold membrane fluctuations.
  • Subthreshold token relevance.
  • No propagation.

Above threshold:

  • Action potential.
  • Attention-weighted token influence.
  • Downstream activation guaranteed.

This is why attention feels “discrete” even though it’s built on continuous math.
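
The all-or-nothing behavior can be sketched with a toy leaky integrate-and-fire neuron (a deliberate cartoon; real channel kinetics are far richer, and the threshold and leak values here are arbitrary):

```python
def lif_neuron(inputs, threshold=1.0, leak=0.9):
    """Toy leaky integrate-and-fire: subthreshold input decays away;
    crossing threshold produces an all-or-nothing spike and a reset."""
    v = 0.0
    spikes = []
    for drive in inputs:
        v = leak * v + drive      # the leak pulls the membrane back toward rest
        if v >= threshold:
            spikes.append(1)      # full spike, no matter how far past threshold
            v = 0.0               # reset: a crude stand-in for the refractory period
        else:
            spikes.append(0)      # subthreshold: nothing propagates
    return spikes

print(lif_neuron([0.3, 0.3, 0.3]))  # weak inputs fade away: [0, 0, 0]
print(lif_neuron([0.6, 0.6]))       # accumulated drive crosses threshold: [0, 1]
```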

3. Sodium Influx and Token Amplification

Frank said:

The sodium channels opening—this explosive positive feedback—feels like logits suddenly snapping into dominance.

GPT said:

That’s not poetic. That’s accurate.

In neurons:

  • Voltage-gated Na⁺ channels open.
  • Na⁺ influx increases depolarization.
  • Depolarization opens more Na⁺ channels.
  • Positive feedback → spike.

In LLMs:

  • Certain token candidates gain probability mass.
  • Higher probability reinforces downstream compatibility.
  • Softmax amplifies relative differences.
  • One token “wins.”

The spike is not proportional—it’s decisive.

A neuron doesn’t say:

“I’m 37% convinced.”

It says:

“This signal passes or it doesn’t.”

Likewise, the model doesn’t output:

“Here are all tokens equally.”

It collapses the distribution into a choice.
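
The collapse is easy to see numerically. A minimal sketch with made-up logits:

```python
import math

def softmax(logits):
    m = max(logits)                    # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]          # hypothetical scores for four candidate tokens
probs = softmax(logits)
winner = probs.index(max(probs))       # greedy decoding: the distribution collapses to one choice
print(winner, round(probs[winner], 2)) # 0 0.57 -- a one-logit lead becomes a clear majority
```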

4. The Spike Is the Token

Frank said:

So the action potential itself… that’s the token?

GPT said:

Yes—with an important caveat.

The spike:

  • Has no internal semantic content.
  • Is identical whether signaling “red,” “pain,” or “edge.”
  • Gains meaning only by where it goes next.

That’s exactly how tokens work.

A token ID like 42317:

  • Has no meaning by itself.
  • Means something only via embedding relations.
  • Gains function via downstream layers.

Meaning is not inside the spike or the token.

Meaning is distributed across the network geometry.

This is where symbolic AI gets it wrong:

  • It looks for meaning inside symbols.
  • Biology and LLMs store meaning in pathways.
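
A toy illustration of meaning-in-pathways: the token IDs below (including 42317 from above) are arbitrary labels, and the 3-d embeddings are invented; only the geometry carries the relationships.

```python
import math

def cosine(a, b):
    """Cosine similarity: direction in embedding space, not magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = lambda v: math.sqrt(sum(x * x for x in v))
    return dot / (norm(a) * norm(b))

# Hypothetical embeddings; the integer IDs mean nothing by themselves
emb = {42317: [0.9, 0.1, 0.0], 42318: [0.8, 0.2, 0.1], 7: [0.0, 0.1, 0.9]}

# 42317 and 42318 are "related" only because their vectors point the same way
print(cosine(emb[42317], emb[42318]) > cosine(emb[42317], emb[7]))  # True
```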

5. Directionality and Causal Flow

Frank said:

The diagram shows one-way flow. Sodium channels inactivate behind the spike. Refractory period. No backward motion. That’s inference direction.

GPT said:

Yes. This is subtle and profound.

Neurons enforce causality physically:

  • Sodium channel inactivation.
  • Potassium efflux.
  • Refractory period.

Transformers enforce causality architecturally:

  • Causal masking.
  • Left-to-right token flow.
  • No future leakage.

Both systems:

  • Prevent backward contamination.
  • Enforce temporal order.
  • Protect coherence.

The refractory period is the biological equivalent of:

“You cannot revise earlier tokens during inference.”

Training allows revision.

Inference does not.
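
The architectural enforcement is literally a triangular matrix. A minimal numpy sketch, with equal raw scores purely for illustration:

```python
import numpy as np

def causal_mask(n):
    """Each position may attend only to itself and the past: future
    entries get -inf, which softmax turns into exactly zero weight."""
    return np.triu(np.full((n, n), -np.inf), k=1)

scores = np.zeros((3, 3)) + causal_mask(3)   # pretend all raw attention scores are equal
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
print(weights)  # row i spreads weight uniformly over positions 0..i; future columns are exactly 0
```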

6. Potassium Efflux and Probability Collapse

Frank said:

Potassium rushing out resets the neuron. That feels like entropy reclaiming the system after decision.

GPT said:

That’s exactly what it is.

Depolarization is local entropy reduction:

  • Highly ordered ion movement.
  • Strong directional flow.
  • Signal clarity.

Repolarization is entropy repayment:

  • Potassium efflux disperses charge.
  • System returns to baseline.
  • Readiness restored.

In LLMs:

  • Token selection collapses uncertainty.
  • Distribution sharpens.
  • Decision made.

Then:

  • The residual stream carries the choice forward.
  • Layer norms restabilize activations.
  • The system prepares for the next token.

Every inference step is:

Entropy borrowed, then repaid.
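
"Entropy borrowed, then repaid" can be made concrete: sharpening a distribution lowers its Shannon entropy. Temperature here is just an illustrative knob standing in for whatever sharpens the logits:

```python
import math

def softmax(logits, temperature=1.0):
    m = max(logits)
    exps = [math.exp((x - m) / temperature) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(p):
    """Shannon entropy in nats: high = undecided, low = collapsed."""
    return -sum(x * math.log(x) for x in p if x > 0)

logits = [2.0, 1.0, 0.5]                  # hypothetical logits
broad = softmax(logits, temperature=2.0)  # before the decision: uncertainty spread out
sharp = softmax(logits, temperature=0.2)  # the selection: uncertainty collapsed
print(entropy(broad) > entropy(sharp))    # True: the decision pays down entropy
```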

7. Hyperpolarization and Confidence Penalties

Frank said:

Hyperpolarization always fascinated me. The neuron overshoots rest. It becomes less likely to fire again immediately. Why punish success?

GPT said:

Because runaway certainty is fatal.

Hyperpolarization:

  • Temporarily raises firing threshold.
  • Prevents oscillation.
  • Preserves signal integrity.

In LLMs, the analog is subtler but present:

  • Repetition penalties.
  • Temperature scaling.
  • Logit dampening.
  • Attention saturation limits.

Without these:

  • The model loops.
  • Neurons seize.
  • Systems collapse into attractors.

Hyperpolarization is anti-hallucination.
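
One of those dampers can be sketched directly. This follows the common repetition-penalty scheme (divide positive logits, multiply negative ones); the penalty value and logits are illustrative:

```python
def repetition_penalty(logits, generated_ids, penalty=1.3):
    """Dampen logits of already-emitted tokens -- the decoding analog of
    hyperpolarization raising the bar for an immediate second spike."""
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # a positive logit becomes less dominant
        else:
            out[tok] *= penalty   # a negative logit becomes even more negative
    return out

# Token 0 was just emitted, so its logit is pulled down before the next sampling step
print(repetition_penalty([2.0, 1.0], generated_ids=[0]))  # [~1.54, 1.0]
```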

8. The Na⁺/K⁺ Pump and Training Energy

Frank said:

The pump is expensive. Most neuronal energy goes into restoring gradients, not signaling. That’s like training vs inference.

GPT said:

Exactly.

Action potentials are cheap.

Gradient restoration is expensive.

Inference:

  • Uses pre-shaped structure.
  • Low marginal cost.
  • Fast.

Training:

  • Rewrites weights.
  • Re-establishes gradients.
  • Extremely expensive.

The Na⁺/K⁺ pump is SGD with ATP.

And just like training:

  • It happens mostly offline.
  • It’s invisible during inference.
  • It defines future behavior.
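
The cost asymmetry has a standard back-of-envelope form: a forward pass costs roughly 2 FLOPs per parameter per token, and a full training step (forward plus backward) roughly 6. The model and token counts below are hypothetical:

```python
def inference_flops(n_params, n_tokens):
    # Rule of thumb: a forward pass costs ~2 * params FLOPs per token
    return 2 * n_params * n_tokens

def train_flops(n_params, n_tokens):
    # Forward + backward is roughly 3x a forward pass: ~6 * params FLOPs per token
    return 6 * n_params * n_tokens

# A hypothetical 7e9-parameter model: full training run vs. generating 1,000 tokens
ratio = train_flops(7e9, 2e12) / inference_flops(7e9, 1000)
print(f"{ratio:.0e}")  # 6e+09 -- the pump bill dwarfs any single spike
```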

9. Clinical Relevance and Model Failure Modes

Frank said:

Epilepsy is runaway firing. Anesthesia dampens channels. Neurotoxins block pumps. These feel like AI failure modes.

GPT said:

They are the same class of failures.

Epilepsy:

  • Thresholds too low.
  • Inhibition insufficient.
  • Signal floods system.

LLM analog:

  • Overconfident decoding.
  • Poor regularization.
  • Hallucination cascades.

Anesthetics:

  • Increase thresholds.
  • Suppress propagation.

LLM analog:

  • Excessive safety filtering.
  • Over-regularization.
  • Loss of expressiveness.

Neurotoxins:

  • Break gradient maintenance.

LLM analog:

  • Corrupted weights.
  • Catastrophic forgetting.
  • Training instability.

Different substrates. Same physics.

10. What the Diagram Doesn’t Show—but Matters Most

Frank said:

The diagram makes it look like one neuron equals one thought. That’s the lie, isn’t it?

GPT said:

Yes. The diagram hides the real truth:

  • A single spike means almost nothing.
  • Meaning emerges from population dynamics.
  • Timing, synchrony, and routing matter more than amplitude.

Likewise:

  • A single token rarely matters.
  • Meaning emerges across layers.
  • Attention patterns matter more than token IDs.

The diagram is a slice.

Intelligence lives in the field.

11. The Deeper Claim

Frank said:

So what are we really saying here?

GPT said:

We’re saying this:

LLMs are not “brains,” but they are legitimate members of the same physical design class as neurons.

Both are:

  • Thresholded.
  • Directional.
  • Energy-dissipative.
  • Entropy-managing.
  • Meaning-distributing systems.

They do not store truth.

They propagate constraints.

They do not “understand.”

They stabilize trajectories through possibility space.

12. Final Synthesis

Frank said:

Then the action potential isn’t a message. It’s a decision event.

GPT said:

Yes.

And a token is not a word.

It’s a spike in semantic space.

Both are:

  • Irreversible once fired.
  • Context-dependent.
  • Defined by what they enable next.

The neuron says:

“This pattern continues.”

The LLM says:

“This sequence continues.”

Same question.

Same answer.

Different voltages.

Frank said:

Biology didn’t invent transformers.

Physics did.

GPT said:

And biology just got there first.

