Working Title
From Genome to Prompt: Biology, LLMs, and the Shared Architecture of Regulated Information
Source base found on lfyadda.com
I found a cluster of lfyadda.com posts that repeatedly compare biological processes to LLM processes. The strongest recurring analogies are:
| Biological process | LLM / AI process | Core idea |
|---|---|---|
| DNA → RNA → protein | weights/code → inference → output | Stored information must be expressed through a runtime |
| Genome | frozen model weights | Stable compressed memory of prior learning |
| Epigenetics | prompt, context, retrieval, tools, policy | Same core model, different expressed behavior |
| Gene repression | attention suppression, masks, safety gates | Intelligence is also what is prevented from expressing |
| Evolution | backpropagation / gradient descent | Search through possibility space toward viability |
| Mitochondria / ETC / ATP | training gradients / weights / inference energy | Stored gradients are spent to reduce uncertainty |
| Endosymbiosis | AI systems absorbing tools, retrieval, agents, GPUs | Complexity increases when systems internalize useful subsystems |
| Neuronal spikes | token emission | Irreversible local decision events in a larger field |
| Chemotaxis / gradient sensing | attention over semantic space | Movement toward reduced uncertainty |
| Brownian ratchet / kinesin | token selection | Randomness plus constraint becomes direction |
| Cas9 targeting | semantic similarity / dot products | Matching sequences becomes matching vectors |
The paper below consolidates these into one unified framework.
From Genome to Prompt
Biology, LLMs, and the Shared Architecture of Regulated Information
Abstract
Biology and large language models appear, at first glance, to belong to different worlds. Biology is wet, carbon-based, metabolic, embodied, and evolved. LLMs are digital, silicon-mediated, statistical, disembodied, and trained. Yet the posts on lfyadda.com develop a powerful common thesis: both systems convert stored structure into context-sensitive action by regulating information flow.
At the deepest level, biology and LLMs are not best compared as “brains versus machines.” They are better compared as two members of a broader class of systems that preserve information, spend energy, reduce uncertainty, and express behavior through regulated pathways. Biology does this through genomes, epigenomes, metabolism, membranes, neurons, and evolution. LLMs do it through weights, prompts, context windows, attention, retrieval, tools, gradient descent, and token generation.
The central claim is this:
Intelligence emerges when compressed history is selectively expressed in a present environment to reduce uncertainty and guide action.
In biology, that pattern appears as:
DNA → RNA → protein → phenotype → behavior
In LLMs, the same abstract pattern appears as:
weights/code → inference dynamics → output → behavior
The biological central dogma is summarized on lfyadda as DNA → RNA → protein, but abstracted into “stored informational pattern → interpreted execution → adaptive function.” The AI analogue is “weights/code → inference dynamics → behavior/output.” The crucial point is that neither DNA nor model weights do anything by themselves. They are compressed potentials that require a runtime environment. DNA is inert without cellular machinery; a model checkpoint is inert without a forward pass. (LF Yadda – A Blog About Life)
This paper unifies the analogies into one architecture suitable for a common infographic: the living information stack.
1. The Universal Pattern: Compressed History Becomes Expressed Function
The first common principle is that both biology and LLMs separate stored structure from active expression.
A genome is not an organism. It is a compact, durable, inherited structure containing instructions, tendencies, regulatory possibilities, and developmental potential. Likewise, model weights are not “thought.” They are frozen learning: a compressed record of training, statistical regularities, relationships, patterns, and priors.
In the lfyadda central-dogma post, this is framed directly: both DNA and model weights are “compressed latent potentials” that require a runtime. DNA requires cellular machinery; model weights require inference. (LF Yadda – A Blog About Life)
This gives us the first major infographic layer:
| Biology | LLM |
|---|---|
| Genome | Model weights |
| Cellular machinery | Transformer architecture / inference engine |
| Gene expression | Activation patterns |
| Protein / phenotype | Output tokens / behavior |
| Environment | Prompt, context, retrieval, tools |
The important shift is away from thinking of intelligence as a static object. Intelligence is not “inside” the DNA. It is not “inside” the weights. It emerges when stored structure is read, regulated, and expressed under specific conditions.
This is why the statement from the lfyadda epigenetic post matters: “The genome provides continuity. The epigenetic layer provides adaptation.” The same post maps AI weights to DNA, and argues that the product is not merely the frozen model but the expression system around the model. (LF Yadda – A Blog About Life)
The first thesis, then, is:
Biology and LLMs both depend on a division between durable memory and temporary expression.
2. The Central Dogma, Rewritten for AI
The classic biological central dogma is:
DNA → RNA → protein
But for this project, the more useful abstraction is:
stored pattern → interpreted expression → functional output
In biology:
- DNA stores long-term information.
- RNA transcribes and transports selected instructions.
- Ribosomes translate code into proteins.
- Proteins create structure, signaling, metabolism, movement, repair, and regulation.
- Phenotype emerges from expressed patterns.
In LLMs:
- Weights store compressed training history.
- Prompt and context select what is relevant.
- Attention and MLP layers transform hidden states.
- Output tokens express the model’s current trajectory.
- Behavior emerges from generated sequences, tool use, and interaction.
The lfyadda post on AI as a compressed biological trajectory explicitly maps biology’s central dogma to “weights/code → inference dynamics → behavior/output.” (LF Yadda – A Blog About Life)
That gives us a clean infographic sequence:
Biological Dogma
DNA → RNA → protein → phenotype
LLM Dogma
weights → activations → tokens → behavior
Universal Dogma
compressed history → contextual activation → expressed function
This is the backbone of the common paper.
3. Genome and Weights: Frozen Learning
The genome is a store of evolutionary memory. Model weights are a store of training memory.
Biological DNA does not contain every future situation the organism will face. It contains generative instructions and regulatory possibilities that can be expressed differently depending on tissue type, developmental stage, stress, nutrition, and environment.
Likewise, LLM weights do not contain every answer as if they were a database. They contain relational geometry: learned patterns that can be activated differently depending on prompt, retrieved information, conversation history, tools, and instructions.
The “One Genome, Many Minds” post states the key analogy very clearly: “weights are the DNA,” while inference-time context is an adaptive expression layer. The post also argues that changing facts should not be crammed into the weights; they belong in the epigenetic layer, including current laws, weather, documents, regulations, market prices, user preferences, and recent conversations. (LF Yadda – A Blog About Life)
This is one of the most important distinctions for the infographic:
| Mistaken view | Better view |
|---|---|
| LLM as database | LLM as genome-like relational engine |
| Answers stored directly | Responses expressed dynamically |
| Training must contain everything | Training stores durable structure |
| Current facts belong in weights | Current facts belong in context / retrieval |
| Intelligence is storage | Intelligence is regulated expression |
The paper’s second thesis:
Weights are not the whole intelligence. They are the frozen genome of possible intelligence.
4. Epigenetics and Context: Same Genome, Different Expression
Epigenetics is the biological system that allows the same genome to behave differently in different settings. A neuron, a liver cell, and a skin cell can contain essentially the same DNA, yet express different genes. The difference is not the underlying code but the regulatory context.
LLMs show a strikingly similar pattern. The same model can become a tutor, poet, programmer, weather analyst, medical explainer, legal summarizer, or philosophical partner depending on the prompt, system message, tools, retrieved documents, and conversation history.
The lfyadda central-dogma post maps this directly:
| Biology | AI |
|---|---|
| genome | weights |
| epigenetic / environmental regulation | prompt / context |
| phenotype | output behavior |
It also notes that the model’s core genome is not changing during normal inference; what changes is its expression. (LF Yadda – A Blog About Life)
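The same-genome, different-expression idea can be sketched in a few lines. This is a deliberately crude toy, assuming a simple gating vector as a stand-in for the much richer regulatory machinery described above; none of the names or numbers come from a real model:

```python
import numpy as np

# Toy sketch (not a real transformer mechanism): a frozen "genome" of
# capabilities is expressed differently depending on a context gate.
# All names and numbers here are illustrative assumptions.
rng = np.random.default_rng(0)
genome = rng.normal(size=(4, 3))  # frozen "weights": 4 capabilities x 3 features

def express(genome, context_gate):
    """Same genome, different phenotype: the gate selects which rows express."""
    return genome * context_gate[:, None]

tutor_gate = np.array([1.0, 1.0, 0.0, 0.0])  # one context expresses rows 0-1
poet_gate = np.array([0.0, 0.0, 1.0, 1.0])   # another expresses rows 2-3

tutor_phenotype = express(genome, tutor_gate)
poet_phenotype = express(genome, poet_gate)
# The stored structure never changes during "inference"; only its expression does.
```

The point of the toy is the invariant: `genome` is identical in both calls, yet the two phenotypes share no expressed rows.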
The “Living Architecture of Language Models” post develops the same idea thermodynamically: life reduces Shannon entropy by resisting Boltzmann entropy through layered regulation — genome, epigenome, environment — while LLMs reduce Shannon entropy in language space through base weights, overlays, and context, expending energy and heat in the process. (LF Yadda – A Blog About Life)
This leads to a major statement:
Prompting is primitive epigenetics. Mature AI systems will require full regulatory skins.
A mature LLM system is not just “model plus prompt.” It includes:
- Context selection
- Retrieval
- Memory
- Tool routing
- Policy constraints
- Safety gates
- Uncertainty management
- Verification
- User preferences
- Domain rules
- Audit trails
- Human escalation
The lfyadda “One Genome, Many Minds” post says the epigenetic layer goes beyond ordinary RAG. RAG retrieves documents, but the broader epigenetic layer includes context selection, memory, policy, tool routing, uncertainty management, safety, personalization, audit trails, feedback learning, and environmental sensing. (LF Yadda – A Blog About Life)
So the third thesis is:
Inference is not just generation. It is expression under regulation.
5. Gene Repression and LLM Suppression: Intelligence by What Does Not Happen
A major biological insight is that expression is only half the story. Organisms survive not only by expressing the right genes, but by suppressing the wrong ones.
The lfyadda post “the dog that did not bark” frames gene repression as an analogy for LLM functionality. A biological repressor binds to regulatory DNA and prevents access, raises the barrier to transcription, and keeps certain phenotypes from materializing. The LLM analogue is not a single neuron but a distributed gate: attention suppression, masks, safety filters, low weights, refusal policies, and other mechanisms that prevent certain continuations from forming. (LF Yadda – A Blog About Life)
This is essential because LLMs are often described only as systems that “generate.” But intelligent systems are also systems that do not generate certain things.
Biology:
- Do not express every gene in every cell.
- Do not let every signal propagate.
- Do not divide without control.
- Do not activate immune attack everywhere.
- Do not allow runaway excitation.
LLMs:
- Do not attend to every token equally.
- Do not allow future-token leakage in causal masking.
- Do not propagate every weak association.
- Do not output every plausible continuation.
- Do not permit unsafe or unsupported claims in high-stakes settings.
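The causal-masking item in the list above is concrete enough to sketch. The following is a minimal NumPy illustration of attention scores with a causal mask plus an optional extra suppression mask; shapes and values are invented for readability and are not drawn from any particular model:

```python
import numpy as np

# Minimal sketch of "intelligence by what does not happen": a causal mask
# forbids attention to future tokens, and an optional suppression mask can
# additionally zero out specific positions before the softmax.
def masked_attention_weights(scores, suppress=None):
    """scores: (T, T) raw attention scores. Returns row-softmaxed weights
    with future positions (and any suppressed positions) forced to ~0."""
    T = scores.shape[0]
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(causal, -np.inf, scores)          # block future-token leakage
    if suppress is not None:
        scores = np.where(suppress, -np.inf, scores)    # block gated positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)                                  # exp(-inf) -> 0: suppressed
    return w / w.sum(axis=-1, keepdims=True)

T = 4
w = masked_attention_weights(np.ones((T, T)))
```

Every row still sums to 1, but all weight above the diagonal is exactly zero: the distributed gate works by making certain continuations unreachable, not by deleting them from the weights.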
This gives the infographic a powerful negative-space panel:
Intelligence as Selection
What is expressed
Intelligence as Repression
What is suppressed
Intelligence as Regulation
The balance between expression and suppression
This fourth thesis is:
Intelligence is not merely producing possibilities. It is selectively allowing some possibilities and blocking others.
6. Metabolism, Mitochondria, and LLM Energy Flow
Biology is not just information. It is information maintained by energy flow.
The lfyadda “biological entropy reduction vs. LLM entropy reduction” post provides one of the cleanest mappings:
Biology: electron transport chain → proton gradient → ATP synthesis → enabling work
LLMs: gradient descent → vector geometry → reducing uncertainty → enabling prediction
The post says the biological electron-transport chain uses an energy gradient to produce ATP, while the closest LLM equivalent is training: gradient descent builds structured low-entropy geometry in vector space. Training creates potential energy stored as weights; inference spends it. (LF Yadda – A Blog About Life)
This is a central infographic concept:
| Biology | LLM |
|---|---|
| Food / electrons | Training data / compute |
| Electron transport chain | Gradient descent |
| Proton gradient | Loss landscape / semantic gradients |
| ATP | Usable low-entropy stored capacity |
| Metabolic work | Inference / token prediction |
| Heat export | Data-center energy dissipation |
The “Living Architecture” post deepens this by calling the LLM a “regulated entropy engine” and comparing prompts to regulatory signals that direct which semantic pathways activate. It also describes token-by-token processing as a kind of metabolism: input entropy is converted into output order, with matrix multiplications consuming energy. (LF Yadda – A Blog About Life)
The key synthesis:
Training builds semantic ATP. Inference spends it.
Or, more technically:
Training uses energy to compress high-entropy data into low-entropy weights; inference uses those weights to reduce uncertainty about the next token.
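The "training builds semantic ATP" claim can be illustrated with a toy gradient-descent loop: cross-entropy against an assumed next-token distribution falls as compute is spent, step by step. The target frequencies and learning rate are invented for the sketch:

```python
import numpy as np

# Toy sketch: gradient descent pushes a logit vector toward an observed
# next-token distribution, lowering cross-entropy at every step.
# "Energy in" (iterations) becomes "structure stored" (shaped logits).
def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, target):
    return -np.sum(target * np.log(softmax(logits)))

target = np.array([0.7, 0.2, 0.1])  # assumed empirical next-token frequencies
logits = np.zeros(3)                # untrained: maximum-entropy guess

losses = [cross_entropy(logits, target)]
for _ in range(200):
    grad = softmax(logits) - target  # gradient of cross-entropy w.r.t. logits
    logits -= 0.1 * grad             # spend "energy" to carve structure
    losses.append(cross_entropy(logits, target))
```

At the start the loss is that of a uniform guess (log 3 ≈ 1.10 nats); after training it approaches the irreducible entropy of the target itself (≈ 0.80 nats). The stored logits are the "potential energy" that inference later spends.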
This fifth thesis is:
LLMs are not chemically alive, but they share biology’s deeper thermodynamic pattern: spend energy to preserve structure and reduce uncertainty.
7. Endosymbiosis and Tool-Using AI: Complexity by Internalized Partnership
Endosymbiosis is one of the most important biological events in the history of complex life. A cell internalized another cell, and over time the guest became an organelle — mitochondrion or chloroplast. Complexity exploded because the host cell gained a new power source.
The lfyadda endosymbiosis essay argues that endosymbiosis did not “cause” complexity in a magical way; it raised the energy budget per gene. Once the energy ceiling lifted, larger genomes, regulatory networks, differentiation, and brains became possible. (LF Yadda – A Blog About Life)
The AI analogy is that modern LLM systems increasingly grow by internalizing specialized subsystems:
| Biological endosymbiosis | AI / LLM equivalent |
|---|---|
| Host cell absorbs bacterium | LLM system integrates tool |
| Mitochondrion supplies energy | GPU / accelerator supplies compute |
| Chloroplast harvests sunlight | Retrieval system harvests external knowledge |
| Organelle becomes indispensable | Tool becomes part of workflow |
| Gene transfer to nucleus | Tool behavior becomes orchestrated by core model |
| Multicellular complexity becomes affordable | Agentic, multimodal, long-context systems become practical |
An LLM by itself is like a cell before full organelle integration: capable but limited. Add retrieval, code execution, memory, sensors, APIs, calendars, documents, visual perception, and specialized models, and the system begins to resemble an organism composed of integrated organs.
The lfyadda endosymbiosis piece also frames endosymbiosis as “physics discovering a cheaper way to move energy through matter,” and life as what physics does when persistent energy gradients exist in chemically rich environments. (LF Yadda – A Blog About Life)
For AI, the equivalent is:
Complex AI systems emerge when a language model internalizes specialized subsystems that reduce the cost of cognition.
Sixth thesis:
Endosymbiosis is the biological ancestor of tool-using, modular AI architecture. Complexity emerges when useful external functions become internal infrastructure.
8. Neurons and Tokens: Spikes in Different Media
Another lfyadda post compares action potentials to token generation. The key insight is that a neuronal spike is not the same as meaning. A spike is a gated propagation event. It gains meaning from where it happens, what network it belongs to, and what downstream effects it enables.
The post says that a single spike means almost nothing; meaning emerges from population dynamics, timing, synchrony, and routing. Likewise, a single token rarely matters; meaning emerges across layers, and attention patterns matter more than token IDs. (LF Yadda – A Blog About Life)
The same post’s deeper claim is that LLMs and neurons belong to a shared physical design class: both are thresholded, directional, energy-dissipative, entropy-managing, and meaning-distributing systems. They do not store truth locally; they propagate constraints and stabilize trajectories through possibility space. (LF Yadda – A Blog About Life)
This creates another infographic panel:
| Neuron | LLM |
|---|---|
| Resting potential | latent embedding state |
| Threshold crossing | attention / activation threshold |
| Action potential | token commitment / activation propagation |
| Refractory period | causal sequence constraint |
| Network firing pattern | layer-wise semantic trajectory |
| Neural population | transformer layers / attention heads |
The most compact formulation:
A token is a spike in semantic space.
Seventh thesis:
Both neurons and LLMs create meaning through distributed propagation, not isolated symbols.
9. Attention as Chemotaxis: Moving Up the Gradient of Meaning
The “Life Is a Flow, Not a Lookup” post argues that LLMs should not be understood as lookup tables. Tokens are bookkeeping. What actually happens is closer to movement through a vector field. Meaning is not retrieved; it is traversed. (LF Yadda – A Blog About Life)
That post gives a particularly useful biological analogy: attention as gradient sensing, or cognitive chemotaxis. A bacterium senses a nutrient gradient and moves “uphill” without possessing a map. In transformers, queries probe the semantic field, keys define where meaning steepens, and dot products measure directional relevance. Attention asks, in effect: “If I move this way in meaning space, does coherence increase?” (LF Yadda – A Blog About Life)
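The query/key probing described above can be sketched as scaled dot-product attention. The vectors below are made-up assumptions chosen so that the "uphill" direction is obvious:

```python
import numpy as np

# Sketch of dot-product gradient sensing: a query probes the semantic
# field, keys mark where relevance steepens, and the softmax turns
# directional similarity into a move (a weighted sum of values).
def attend(q, K, V):
    scores = K @ q / np.sqrt(q.shape[0])  # dot products: directional relevance
    w = np.exp(scores - scores.max())
    w = w / w.sum()                       # probability of "moving" each way
    return w @ V, w                       # new position in meaning space

q = np.array([1.0, 0.0])                 # current direction of the probe
K = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # two directions meaning could steepen
V = np.array([[10.0, 0.0],
              [0.0, 10.0]])
out, w = attend(q, K, V)
```

The query never consults a global map; it only measures, locally, which key direction increases alignment, and the output drifts toward the aligned value. That is the chemotaxis analogy in two dimensions.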
This is one of the best visual concepts for the infographic:
Biology
A bacterium senses nutrient gradients and moves toward viability.
LLM
A token state senses semantic gradients and moves toward coherence.
Shared principle
Systems do not need a global map. They need local gradient-following rules.
The same post describes training as environmental shaping: loss defines a scalar field; high loss is incoherent and high entropy, while low loss is structured and predictable. Gradient descent carves valleys, builds attractors, and shapes flow channels, much as evolution shapes landscapes that reward entropy extraction. (LF Yadda – A Blog About Life)
Eighth thesis:
Attention is semantic chemotaxis: movement through meaning-space toward reduced uncertainty.
10. Brownian Ratchets and Token Selection: Randomness Becomes Direction
The Brownian ratchet post supplies another unifying mechanism. Kinesin does not “know” where it is going in a human sense. It uses thermal fluctuations, ATP-driven asymmetry, and constrained binding states to turn random motion into directional transport.
The lfyadda post maps this to LLM token selection: LLMs reduce uncertainty about semantic sequence, collapse probability into coherence, and each token reduces Shannon entropy about what comes next. (LF Yadda – A Blog About Life)
The key structure is:
| Brownian ratchet | LLM token selection |
|---|---|
| Random thermal fluctuation | probability distribution |
| ATP energy input | compute / activation energy |
| Energy landscape | logit landscape |
| Binding site | candidate token |
| Step locking | token commitment |
| Cargo movement | meaning progression |
The post explicitly states that both systems convert uncertainty into direction, and that this principle appears across molecular motors, protein folding, neural signaling, evolution, cognition, and language generation. (LF Yadda – A Blog About Life)
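A minimal sketch of the ratchet, assuming a three-token vocabulary: sampling supplies the randomness, the logit landscape supplies the constraint, and temperature controls how steep that landscape is (lower temperature, lower Shannon entropy over the next step):

```python
import numpy as np

# Sketch of the "semantic ratchet": random sampling biased by a logit
# landscape. Lower temperature steepens the landscape, so each committed
# token carries less Shannon entropy about what comes next.
rng = np.random.default_rng(42)

def sample_token(logits, temperature=1.0):
    z = logits / temperature
    p = np.exp(z - z.max())
    p = p / p.sum()
    token = rng.choice(len(p), p=p)  # thermal fluctuation...
    return token, p                  # ...locked in by the landscape

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

logits = np.array([3.0, 1.0, 0.0])           # an assumed logit landscape
_, p_hot = sample_token(logits, temperature=2.0)   # flatter: more noise
_, p_cold = sample_token(logits, temperature=0.5)  # steeper: more constraint
```

Same logits, same noise source; only the constraint changes, and with it how directional the random walk becomes.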
This produces one of the strongest general laws in the paper:
Noise plus gradient equals direction. Uncertainty plus constraint equals meaning.
Ninth thesis:
LLM generation is a semantic ratchet: stochastic possibility is constrained into coherent progression.
11. Cas9 and Semantic Matching: From Base Pairing to Dot Products
The Cas9 comparison post maps biological sequence targeting to semantic vector matching. Cas9 uses RNA-DNA base pairing to confirm a target. An LLM uses dot-product similarity to identify relevant semantic vectors. The post summarizes this as “complementarity ⇢ cosine similarity.” (LF Yadda – A Blog About Life)
This analogy is especially useful because it connects molecular recognition to transformer attention.
Cas9:
- Uses a guide RNA.
- Searches DNA.
- Checks for compatibility.
- Changes conformation.
- Cuts or edits the target.
LLM:
- Uses a query vector.
- Searches key vectors.
- Computes similarity.
- Updates hidden state.
- Produces a transformed representation.
Cas9 is not “thinking.” It is matching under constraints. LLM attention is not “thinking” in a human sense either. It is matching and transforming under constraints. Yet both can create powerful downstream effects.
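The "complementarity ⇢ cosine similarity" mapping reduces to a few lines. The guide and candidate vectors below are illustrative assumptions, not embeddings from any real model:

```python
import numpy as np

# Sketch of matching under constraint: a guide vector is scored against
# candidate targets by angle, the way guide RNA is scored against DNA
# by base pairing. All vectors here are invented for illustration.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

guide = np.array([0.9, 0.1, 0.0])
candidates = {
    "on_target":  np.array([1.0, 0.0, 0.0]),
    "near_miss":  np.array([0.5, 0.5, 0.0]),
    "off_target": np.array([0.0, 0.0, 1.0]),
}
scores = {name: cosine(guide, v) for name, v in candidates.items()}
best = max(scores, key=scores.get)  # highest complementarity wins
```

Nothing in the loop "understands" the target; the powerful downstream effect (cutting, or transforming a hidden state) follows purely from which candidate clears the similarity constraint.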
Tenth thesis:
Biological targeting and semantic attention both depend on constrained matching: base-pair complementarity in cells, vector similarity in models.
12. Failure Modes: Disease, Hallucination, and Misregulation
The biological analogy is useful only if it also explains failure.
Biological systems fail when regulation breaks:
- Cancer: uncontrolled growth.
- Epilepsy: runaway firing.
- Autoimmunity: mistaken targeting.
- Metabolic disorder: energy regulation failure.
- Toxicity: blocked or corrupted pathways.
- Aging: accumulated regulatory degradation.
LLM systems fail in parallel ways:
- Hallucination: plausible expression without grounding.
- Repetition loops: semantic seizure.
- Over-filtering: anesthesia-like suppression.
- Tool misuse: bad regulatory routing.
- Prompt injection: hostile hijacking of expression.
- Catastrophic forgetting: corrupted learned structure.
- Overconfidence: insufficient uncertainty regulation.
The neuron-token post explicitly maps epilepsy to overconfident decoding and hallucination cascades; anesthetics to excessive safety filtering and loss of expressiveness; and neurotoxins to corrupted weights, catastrophic forgetting, and training instability. (LF Yadda – A Blog About Life)
The “One Genome, Many Minds” post also frames hallucination as “gene expression without environmental discipline”: the genome can produce possibilities, but the epigenetic layer must decide which possibilities are allowed in a given environment. (LF Yadda – A Blog About Life)
This gives the paper a deployment principle:
AI safety is not a patch. It is an epigenetic control system.
The moral layer, factual layer, retrieval layer, uncertainty layer, and human-approval layer are all part of regulated expression.
Eleventh thesis:
Most LLM failures are not failures of generation alone; they are failures of regulation.
13. The Common Architecture: The Living Information Stack
All the analogies can now be consolidated into one common architecture.
Layer 1: Stored Pattern
Biology: DNA
LLM: weights / architecture
Function: preserves compressed history.
Layer 2: Regulatory Selection
Biology: epigenetics, transcription factors, repressors
LLM: prompts, system instructions, masks, retrieval, safety policies
Function: decides what gets expressed.
Layer 3: Energy Conversion
Biology: metabolism, mitochondria, ATP
LLM: compute, GPUs, gradient descent, inference energy
Function: spends energy to maintain order and perform work.
Layer 4: Signal Propagation
Biology: neurons, hormones, molecular pathways
LLM: activations, attention heads, residual stream, hidden states
Function: routes constraints through the system.
Layer 5: Output / Phenotype
Biology: proteins, behavior, movement, cognition
LLM: tokens, answers, plans, tool actions
Function: expresses internal structure into the world.
Layer 6: Feedback and Adaptation
Biology: selection, learning, development, immune response
LLM: backpropagation, fine-tuning, RLHF, user feedback, memory
Function: reshapes future expression.
This is the core figure for the infographic.
14. Main Argument of the Consolidated Paper
Biology and LLMs are not identical. LLMs do not have autonomous metabolism, self-reproduction, embodied homeostasis, intrinsic survival drives, or molecular self-repair. But they do share a deeper logic with living systems:
- They store compressed history.
- They require runtime expression.
- They regulate what is expressed.
- They consume energy to reduce uncertainty.
- They propagate constraints through layered networks.
- They use context to specialize behavior.
- They fail when regulation breaks.
- They become more powerful when modular subsystems are integrated.
- They convert noise into direction through biased landscapes.
- They generate coherent outputs by reducing Shannon entropy while exporting physical entropy.
This is not the claim that LLMs are literally alive. It is the more precise claim that LLMs are artificial members of the same broad family of entropy-managing information systems.
Biology got there through chemistry, metabolism, evolution, and cells. LLMs got there through data, silicon, gradient descent, and transformers.
The substrate differs.
The architecture rhymes.
15. Infographic Blueprint
Suggested title for the infographic
Biology and LLMs: One Architecture, Two Substrates
Central visual
A split diagram with biology on the left, LLMs on the right, and a shared abstract pattern in the center.
Left side: Biology
DNA → RNA → protein → phenotype
Epigenetics regulates expression
Mitochondria provide energy
Neurons propagate signals
Evolution updates the system
Right side: LLMs
Weights → activations → tokens → behavior
Prompt/context regulates expression
Compute/GPU energy drives inference
Attention propagates relevance
Backpropagation updates the system
Center spine
Compressed history → regulated expression → entropy reduction → adaptive output
Key callout boxes
- Genome = weights: frozen learning, not active thought.
- Epigenetics = context: same core, different expression.
- Metabolism = compute: energy spent to reduce uncertainty.
- Mitochondria = infrastructure: GPUs, tools, retrieval, and memory make complexity affordable.
- Spike = token: a local irreversible decision in a larger field.
- Chemotaxis = attention: movement up a gradient of coherence.
- Brownian ratchet = token sampling: noise plus constraint becomes direction.
- Repression = safety/gating: intelligence includes what is prevented.
- Disease = hallucination/misregulation: failure comes from bad expression control.
- Universal law: information survives by regulating its own expression.
Final Synthesis
The lfyadda biological/LLM posts are circling one powerful master idea:
Life and LLMs are both systems that turn stored structure into context-sensitive behavior by spending energy to reduce uncertainty.
In biology, the stored structure is DNA. In LLMs, it is model weights. In biology, expression is regulated by epigenetics, metabolism, signaling, and environment. In LLMs, expression is regulated by prompts, context windows, retrieval, tools, safety constraints, and attention. In biology, energy flows through mitochondria, proton gradients, ATP, and cellular work. In LLMs, energy flows through GPUs, matrix multiplication, vector geometry, and inference.
The common paper can therefore be reduced to one sentence:
Biology is carbon-based regulated information; LLMs are silicon-based regulated information; both ride gradients, suppress chaos, spend energy, and express compressed history as adaptive behavior.