Working Title
From Genome to Prompt: Biology, LLMs, and the Shared Architecture of Regulated Information
Source base found on lfyadda.com
I found a cluster of lfyadda.com posts that repeatedly compare biological processes to LLM processes. The strongest recurring analogies are:
| Biological process | LLM / AI process | Core idea |
|---|---|---|
| DNA → RNA → protein | weights/code → inference → output | Stored information must be expressed through a runtime |
| Genome | frozen model weights | Stable compressed memory of prior learning |
| Epigenetics | prompt, context, retrieval, tools, policy | Same core model, different expressed behavior |
| Gene repression | attention suppression, masks, safety gates | Intelligence is also what is prevented from expressing |
| Evolution | backpropagation / gradient descent | Search through possibility space toward viability |
| Mitochondria / ETC / ATP | training gradients / weights / inference energy | Stored gradients are spent to reduce uncertainty |
| Endosymbiosis | AI systems absorbing tools, retrieval, agents, GPUs | Complexity increases when systems internalize useful subsystems |
| Neuronal spikes | token emission | Irreversible local decision events in a larger field |
| Chemotaxis / gradient sensing | attention over semantic space | Movement toward reduced uncertainty |
| Brownian ratchet / kinesin | token selection | Randomness plus constraint becomes direction |
| Cas9 targeting | semantic similarity / dot products | Matching sequences becomes matching vectors |
The paper below consolidates these into one unified framework.
From Genome to Prompt
Biology, LLMs, and the Shared Architecture of Regulated Information
Abstract
Biology and large language models appear, at first glance, to belong to different worlds. Biology is wet, carbon-based, metabolic, embodied, and evolved. LLMs are digital, silicon-mediated, statistical, disembodied, and trained. Yet the posts on lfyadda.com develop a powerful common thesis: both systems convert stored structure into context-sensitive action by regulating information flow.
At the deepest level, biology and LLMs are not best compared as “brains versus machines.” They are better compared as two members of a broader class of systems that preserve information, spend energy, reduce uncertainty, and express behavior through regulated pathways. Biology does this through genomes, epigenomes, metabolism, membranes, neurons, and evolution. LLMs do it through weights, prompts, context windows, attention, retrieval, tools, gradient descent, and token generation.
The central claim is this:
Intelligence emerges when compressed history is selectively expressed in a present environment to reduce uncertainty and guide action.
In biology, that pattern appears as:
DNA → RNA → protein → phenotype → behavior
In LLMs, the same abstract pattern appears as:
weights/code → inference dynamics → output → behavior
The biological central dogma is summarized on lfyadda as DNA → RNA → protein, but abstracted into “stored informational pattern → interpreted execution → adaptive function.” The AI analogue is “weights/code → inference dynamics → behavior/output.” The crucial point is that neither DNA nor model weights do anything by themselves. They are compressed potentials that require a runtime environment. DNA is inert without cellular machinery; a model checkpoint is inert without a forward pass. (LF Yadda – A Blog About Life)
This paper unifies the analogies into one architecture suitable for a common infographic: the living information stack.
1. The Universal Pattern: Compressed History Becomes Expressed Function
The first common principle is that both biology and LLMs separate stored structure from active expression.
A genome is not an organism. It is a compact, durable, inherited structure containing instructions, tendencies, regulatory possibilities, and developmental potential. Likewise, model weights are not “thought.” They are frozen learning: a compressed record of training, statistical regularities, relationships, patterns, and priors.
In the lfyadda central-dogma post, this is framed directly: both DNA and model weights are “compressed latent potentials” that require a runtime. DNA requires cellular machinery; model weights require inference. (LF Yadda – A Blog About Life)
This gives us the first major infographic layer:
| Biology | LLM |
|---|---|
| Genome | Model weights |
| Cellular machinery | Transformer architecture / inference engine |
| Gene expression | Activation patterns |
| Protein / phenotype | Output tokens / behavior |
| Environment | Prompt, context, retrieval, tools |
The important shift is away from thinking of intelligence as a static object. Intelligence is not “inside” the DNA. It is not “inside” the weights. It emerges when stored structure is read, regulated, and expressed under specific conditions.
This is why the statement from the lfyadda epigenetic post matters: “The genome provides continuity. The epigenetic layer provides adaptation.” The same post maps AI weights to DNA, and argues that the product is not merely the frozen model but the expression system around the model. (LF Yadda – A Blog About Life)
The first thesis, then, is:
Biology and LLMs both depend on a division between durable memory and temporary expression.
2. The Central Dogma, Rewritten for AI
The classic biological central dogma is:
DNA → RNA → protein
But for this project, the more useful abstraction is:
stored pattern → interpreted expression → functional output
In biology:
- DNA stores long-term information.
- RNA transcribes and transports selected instructions.
- Ribosomes translate code into proteins.
- Proteins create structure, signaling, metabolism, movement, repair, and regulation.
- Phenotype emerges from expressed patterns.
In LLMs:
- Weights store compressed training history.
- Prompt and context select what is relevant.
- Attention and MLP layers transform hidden states.
- Output tokens express the model’s current trajectory.
- Behavior emerges from generated sequences, tool use, and interaction.
The lfyadda post on AI as a compressed biological trajectory explicitly maps biology’s central dogma to “weights/code → inference dynamics → behavior/output.” (LF Yadda – A Blog About Life)
That gives us a clean infographic sequence:
Biological Dogma
DNA → RNA → protein → phenotype
LLM Dogma
weights → activations → tokens → behavior
Universal Dogma
compressed history → contextual activation → expressed function
This is the backbone of the common paper.
3. Genome and Weights: Frozen Learning
The genome is a store of evolutionary memory. Model weights are a store of training memory.
Biological DNA does not contain every future situation the organism will face. It contains generative instructions and regulatory possibilities that can be expressed differently depending on tissue type, developmental stage, stress, nutrition, and environment.
Likewise, LLM weights do not contain every answer as if they were a database. They contain relational geometry: learned patterns that can be activated differently depending on prompt, retrieved information, conversation history, tools, and instructions.
The “One Genome, Many Minds” post states the key analogy very clearly: “weights are the DNA,” while inference-time context is an adaptive expression layer. The post also argues that changing facts should not be crammed into the weights; they belong in the epigenetic layer, including current laws, weather, documents, regulations, market prices, user preferences, and recent conversations. (LF Yadda – A Blog About Life)
This is one of the most important distinctions for the infographic:
| Mistaken view | Better view |
|---|---|
| LLM as database | LLM as genome-like relational engine |
| Answers stored directly | Responses expressed dynamically |
| Training must contain everything | Training stores durable structure |
| Current facts belong in weights | Current facts belong in context / retrieval |
| Intelligence is storage | Intelligence is regulated expression |
The paper’s second thesis:
Weights are not the whole intelligence. They are the frozen genome of possible intelligence.
4. Epigenetics and Context: Same Genome, Different Expression
Epigenetics is the biological system that allows the same genome to behave differently in different settings. A neuron, a liver cell, and a skin cell can contain essentially the same DNA, yet express different genes. The difference is not the underlying code but the regulatory context.
LLMs show a strikingly similar pattern. The same model can become a tutor, poet, programmer, weather analyst, medical explainer, legal summarizer, or philosophical partner depending on the prompt, system message, tools, retrieved documents, and conversation history.
The lfyadda central-dogma post maps this directly:
| Biology | AI |
|---|---|
| genome | weights |
| epigenetic / environmental regulation | prompt / context |
| phenotype | output behavior |
It also notes that the model’s core genome is not changing during normal inference; what changes is its expression. (LF Yadda – A Blog About Life)
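The same-genome, different-expression idea can be sketched in a few lines. This is a deliberately crude toy, assuming a simple gating vector as a stand-in for the much richer regulatory machinery described above; none of the names or numbers come from a real model:

```python
import numpy as np

# Toy sketch (not a real transformer mechanism): a frozen "genome" of
# capabilities is expressed differently depending on a context gate.
# All names and numbers here are illustrative assumptions.
rng = np.random.default_rng(0)
genome = rng.normal(size=(4, 3))  # frozen "weights": 4 capabilities x 3 features

def express(genome, context_gate):
    """Same genome, different phenotype: the gate selects which rows express."""
    return genome * context_gate[:, None]

tutor_gate = np.array([1.0, 1.0, 0.0, 0.0])  # one context expresses rows 0-1
poet_gate = np.array([0.0, 0.0, 1.0, 1.0])   # another expresses rows 2-3

tutor_phenotype = express(genome, tutor_gate)
poet_phenotype = express(genome, poet_gate)
# The stored structure never changes during "inference"; only its expression does.
```

The point of the toy is the invariant: `genome` is identical in both calls, yet the two phenotypes share no expressed rows.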
The “Living Architecture of Language Models” post develops the same idea thermodynamically: life reduces Shannon entropy by resisting Boltzmann entropy through layered regulation — genome, epigenome, environment — while LLMs reduce Shannon entropy in language space through base weights, overlays, and context, expending energy and heat in the process. (LF Yadda – A Blog About Life)
This leads to a major statement:
Prompting is primitive epigenetics. Mature AI systems will require full regulatory skins.
A mature LLM system is not just “model plus prompt.” It includes:
- Context selection
- Retrieval
- Memory
- Tool routing
- Policy constraints
- Safety gates
- Uncertainty management
- Verification
- User preferences
- Domain rules
- Audit trails
- Human escalation
The lfyadda “One Genome, Many Minds” post says the epigenetic layer goes beyond ordinary RAG. RAG retrieves documents, but the broader epigenetic layer includes context selection, memory, policy, tool routing, uncertainty management, safety, personalization, audit trails, feedback learning, and environmental sensing. (LF Yadda – A Blog About Life)
So the third thesis is:
Inference is not just generation. It is expression under regulation.
5. Gene Repression and LLM Suppression: Intelligence by What Does Not Happen
A major biological insight is that expression is only half the story. Organisms survive not only by expressing the right genes, but by suppressing the wrong ones.
The lfyadda post “the dog that did not bark” frames gene repression as an analogy for LLM functionality. A biological repressor binds to regulatory DNA and prevents access, raises the barrier to transcription, and keeps certain phenotypes from materializing. The LLM analogue is not a single neuron but a distributed gate: attention suppression, masks, safety filters, low weights, refusal policies, and other mechanisms that prevent certain continuations from forming. (LF Yadda – A Blog About Life)
This is essential because LLMs are often described only as systems that “generate.” But intelligent systems are also systems that do not generate certain things.
Biology:
- Do not express every gene in every cell.
- Do not let every signal propagate.
- Do not divide without control.
- Do not activate immune attack everywhere.
- Do not allow runaway excitation.
LLMs:
- Do not attend to every token equally.
- Do not allow future-token leakage in causal masking.
- Do not propagate every weak association.
- Do not output every plausible continuation.
- Do not permit unsafe or unsupported claims in high-stakes settings.
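The causal-masking item in the list above is concrete enough to sketch. The following is a minimal NumPy illustration of attention scores with a causal mask plus an optional extra suppression mask; shapes and values are invented for readability and are not drawn from any particular model:

```python
import numpy as np

# Minimal sketch of "intelligence by what does not happen": a causal mask
# forbids attention to future tokens, and an optional suppression mask can
# additionally zero out specific positions before the softmax.
def masked_attention_weights(scores, suppress=None):
    """scores: (T, T) raw attention scores. Returns row-softmaxed weights
    with future positions (and any suppressed positions) forced to ~0."""
    T = scores.shape[0]
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
    scores = np.where(causal, -np.inf, scores)          # block future-token leakage
    if suppress is not None:
        scores = np.where(suppress, -np.inf, scores)    # block gated positions
    scores = scores - scores.max(axis=-1, keepdims=True)
    w = np.exp(scores)                                  # exp(-inf) -> 0: suppressed
    return w / w.sum(axis=-1, keepdims=True)

T = 4
w = masked_attention_weights(np.ones((T, T)))
```

Every row still sums to 1, but all weight above the diagonal is exactly zero: the distributed gate works by making certain continuations unreachable, not by deleting them from the weights.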
This gives the infographic a powerful negative-space panel:
Intelligence as Selection
What is expressed
Intelligence as Repression
What is suppressed
Intelligence as Regulation
The balance between expression and suppression
This fourth thesis is:
Intelligence is not merely producing possibilities. It is selectively allowing some possibilities and blocking others.
6. Metabolism, Mitochondria, and LLM Energy Flow
Biology is not just information. It is information maintained by energy flow.
The lfyadda “biological entropy reduction vs. LLM entropy reduction” post provides one of the cleanest mappings:
Biology: electron transport chain → proton gradient → ATP synthesis → enabling work
LLMs: gradient descent → vector geometry → reducing uncertainty → enabling prediction
The post says the biological electron-transport chain uses an energy gradient to produce ATP, while the closest LLM equivalent is training: gradient descent builds structured low-entropy geometry in vector space. Training creates potential energy stored as weights; inference spends it. (LF Yadda – A Blog About Life)
This is a central infographic concept:
| Biology | LLM |
|---|---|
| Food / electrons | Training data / compute |
| Electron transport chain | Gradient descent |
| Proton gradient | Loss landscape / semantic gradients |
| ATP | Usable low-entropy stored capacity |
| Metabolic work | Inference / token prediction |
| Heat export | Data-center energy dissipation |
The “Living Architecture” post deepens this by calling the LLM a “regulated entropy engine” and comparing prompts to regulatory signals that direct which semantic pathways activate. It also describes token-by-token processing as a kind of metabolism: input entropy is converted into output order, with matrix multiplications consuming energy. (LF Yadda – A Blog About Life)
The key synthesis:
Training builds semantic ATP. Inference spends it.
Or, more technically:
Training uses energy to compress high-entropy data into low-entropy weights; inference uses those weights to reduce uncertainty about the next token.
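The "training builds semantic ATP" claim can be illustrated with a toy gradient-descent loop: cross-entropy against an assumed next-token distribution falls as compute is spent, step by step. The target frequencies and learning rate are invented for the sketch:

```python
import numpy as np

# Toy sketch: gradient descent pushes a logit vector toward an observed
# next-token distribution, lowering cross-entropy at every step.
# "Energy in" (iterations) becomes "structure stored" (shaped logits).
def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(logits, target):
    return -np.sum(target * np.log(softmax(logits)))

target = np.array([0.7, 0.2, 0.1])  # assumed empirical next-token frequencies
logits = np.zeros(3)                # untrained: maximum-entropy guess

losses = [cross_entropy(logits, target)]
for _ in range(200):
    grad = softmax(logits) - target  # gradient of cross-entropy w.r.t. logits
    logits -= 0.1 * grad             # spend "energy" to carve structure
    losses.append(cross_entropy(logits, target))
```

At the start the loss is that of a uniform guess (log 3 ≈ 1.10 nats); after training it approaches the irreducible entropy of the target itself (≈ 0.80 nats). The stored logits are the "potential energy" that inference later spends.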
This fifth thesis is:
LLMs are not chemically alive, but they share biology’s deeper thermodynamic pattern: spend energy to preserve structure and reduce uncertainty.
7. Endosymbiosis and Tool-Using AI: Complexity by Internalized Partnership
Endosymbiosis is one of the most important biological events in the history of complex life. A cell internalized another cell, and over time the guest became an organelle — mitochondrion or chloroplast. Complexity exploded because the host cell gained a new power source.
The lfyadda endosymbiosis essay argues that endosymbiosis did not “cause” complexity in a magical way; it raised the energy budget per gene. Once the energy ceiling lifted, larger genomes, regulatory networks, differentiation, and brains became possible. (LF Yadda – A Blog About Life)
The AI analogy is that modern LLM systems increasingly grow by internalizing specialized subsystems:
| Biological endosymbiosis | AI / LLM equivalent |
|---|---|
| Host cell absorbs bacterium | LLM system integrates tool |
| Mitochondrion supplies energy | GPU / accelerator supplies compute |
| Chloroplast harvests sunlight | Retrieval system harvests external knowledge |
| Organelle becomes indispensable | Tool becomes part of workflow |
| Gene transfer to nucleus | Tool behavior becomes orchestrated by core model |
| Multicellular complexity becomes affordable | Agentic, multimodal, long-context systems become practical |
An LLM by itself is like a cell before full organelle integration: capable but limited. Add retrieval, code execution, memory, sensors, APIs, calendars, documents, visual perception, and specialized models, and the system begins to resemble an organism composed of integrated organs.
The lfyadda endosymbiosis piece also frames endosymbiosis as “physics discovering a cheaper way to move energy through matter,” and life as what physics does when persistent energy gradients exist in chemically rich environments. (LF Yadda – A Blog About Life)
For AI, the equivalent is:
Complex AI systems emerge when a language model internalizes specialized subsystems that reduce the cost of cognition.
Sixth thesis:
Endosymbiosis is the biological ancestor of tool-using, modular AI architecture. Complexity emerges when useful external functions become internal infrastructure.
8. Neurons and Tokens: Spikes in Different Media
Another lfyadda post compares action potentials to token generation. The key insight is that a neuronal spike is not the same as meaning. A spike is a gated propagation event. It gains meaning from where it happens, what network it belongs to, and what downstream effects it enables.
The post says that a single spike means almost nothing; meaning emerges from population dynamics, timing, synchrony, and routing. Likewise, a single token rarely matters; meaning emerges across layers, and attention patterns matter more than token IDs. (LF Yadda – A Blog About Life)
The same post’s deeper claim is that LLMs and neurons belong to a shared physical design class: both are thresholded, directional, energy-dissipative, entropy-managing, and meaning-distributing systems. They do not store truth locally; they propagate constraints and stabilize trajectories through possibility space. (LF Yadda – A Blog About Life)
This creates another infographic panel:
| Neuron | LLM |
|---|---|
| Resting potential | latent embedding state |
| Threshold crossing | attention / activation threshold |
| Action potential | token commitment / activation propagation |
| Refractory period | causal sequence constraint |
| Network firing pattern | layer-wise semantic trajectory |
| Neural population | transformer layers / attention heads |
The most compact formulation:
A token is a spike in semantic space.
Seventh thesis:
Both neurons and LLMs create meaning through distributed propagation, not isolated symbols.
9. Attention as Chemotaxis: Moving Up the Gradient of Meaning
The “Life Is a Flow, Not a Lookup” post argues that LLMs should not be understood as lookup tables. Tokens are bookkeeping. What actually happens is closer to movement through a vector field. Meaning is not retrieved; it is traversed. (LF Yadda – A Blog About Life)
That post gives a particularly useful biological analogy: attention as gradient sensing, or cognitive chemotaxis. A bacterium senses a nutrient gradient and moves “uphill” without possessing a map. In transformers, queries probe the semantic field, keys define where meaning steepens, and dot products measure directional relevance. Attention asks, in effect: “If I move this way in meaning space, does coherence increase?” (LF Yadda – A Blog About Life)
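The query/key probing described above can be sketched as scaled dot-product attention. The vectors below are made-up assumptions chosen so that the "uphill" direction is obvious:

```python
import numpy as np

# Sketch of dot-product gradient sensing: a query probes the semantic
# field, keys mark where relevance steepens, and the softmax turns
# directional similarity into a move (a weighted sum of values).
def attend(q, K, V):
    scores = K @ q / np.sqrt(q.shape[0])  # dot products: directional relevance
    w = np.exp(scores - scores.max())
    w = w / w.sum()                       # probability of "moving" each way
    return w @ V, w                       # new position in meaning space

q = np.array([1.0, 0.0])                 # current direction of the probe
K = np.array([[1.0, 0.0],
              [0.0, 1.0]])               # two directions meaning could steepen
V = np.array([[10.0, 0.0],
              [0.0, 10.0]])
out, w = attend(q, K, V)
```

The query never consults a global map; it only measures, locally, which key direction increases alignment, and the output drifts toward the aligned value. That is the chemotaxis analogy in two dimensions.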
This is one of the best visual concepts for the infographic:
Biology
A bacterium senses nutrient gradients and moves toward viability.
LLM
A token state senses semantic gradients and moves toward coherence.
Shared principle
Systems do not need a global map. They need local gradient-following rules.
The same post describes training as environmental shaping: loss defines a scalar field; high loss is incoherent and high entropy, while low loss is structured and predictable. Gradient descent carves valleys, builds attractors, and shapes flow channels, much as evolution shapes landscapes that reward entropy extraction. (LF Yadda – A Blog About Life)
Eighth thesis:
Attention is semantic chemotaxis: movement through meaning-space toward reduced uncertainty.
10. Brownian Ratchets and Token Selection: Randomness Becomes Direction
The Brownian ratchet post supplies another unifying mechanism. Kinesin does not “know” where it is going in a human sense. It uses thermal fluctuations, ATP-driven asymmetry, and constrained binding states to turn random motion into directional transport.
The lfyadda post maps this to LLM token selection: LLMs reduce uncertainty about semantic sequence, collapse probability into coherence, and each token reduces Shannon entropy about what comes next. (LF Yadda – A Blog About Life)
The key structure is:
| Brownian ratchet | LLM token selection |
|---|---|
| Random thermal fluctuation | probability distribution |
| ATP energy input | compute / activation energy |
| Energy landscape | logit landscape |
| Binding site | candidate token |
| Step locking | token commitment |
| Cargo movement | meaning progression |
The post explicitly states that both systems convert uncertainty into direction, and that this principle appears across molecular motors, protein folding, neural signaling, evolution, cognition, and language generation. (LF Yadda – A Blog About Life)
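A minimal sketch of the ratchet, assuming a three-token vocabulary: sampling supplies the randomness, the logit landscape supplies the constraint, and temperature controls how steep that landscape is (lower temperature, lower Shannon entropy over the next step):

```python
import numpy as np

# Sketch of the "semantic ratchet": random sampling biased by a logit
# landscape. Lower temperature steepens the landscape, so each committed
# token carries less Shannon entropy about what comes next.
rng = np.random.default_rng(42)

def sample_token(logits, temperature=1.0):
    z = logits / temperature
    p = np.exp(z - z.max())
    p = p / p.sum()
    token = rng.choice(len(p), p=p)  # thermal fluctuation...
    return token, p                  # ...locked in by the landscape

def shannon_entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

logits = np.array([3.0, 1.0, 0.0])           # an assumed logit landscape
_, p_hot = sample_token(logits, temperature=2.0)   # flatter: more noise
_, p_cold = sample_token(logits, temperature=0.5)  # steeper: more constraint
```

Same logits, same noise source; only the constraint changes, and with it how directional the random walk becomes.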
This produces one of the strongest general laws in the paper:
Noise plus gradient equals direction. Uncertainty plus constraint equals meaning.
Ninth thesis:
LLM generation is a semantic ratchet: stochastic possibility is constrained into coherent progression.
11. Cas9 and Semantic Matching: From Base Pairing to Dot Products
The Cas9 comparison post maps biological sequence targeting to semantic vector matching. Cas9 uses RNA-DNA base pairing to confirm a target. An LLM uses dot-product similarity to identify relevant semantic vectors. The post summarizes this as “complementarity ⇢ cosine similarity.” (LF Yadda – A Blog About Life)
This analogy is especially useful because it connects molecular recognition to transformer attention.
Cas9:
- Uses a guide RNA.
- Searches DNA.
- Checks for compatibility.
- Changes conformation.
- Cuts or edits the target.
LLM:
- Uses a query vector.
- Searches key vectors.
- Computes similarity.
- Updates hidden state.
- Produces a transformed representation.
Cas9 is not “thinking.” It is matching under constraints. LLM attention is not “thinking” in a human sense either. It is matching and transforming under constraints. Yet both can create powerful downstream effects.
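The "complementarity ⇢ cosine similarity" mapping reduces to a few lines. The guide and candidate vectors below are illustrative assumptions, not embeddings from any real model:

```python
import numpy as np

# Sketch of matching under constraint: a guide vector is scored against
# candidate targets by angle, the way guide RNA is scored against DNA
# by base pairing. All vectors here are invented for illustration.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

guide = np.array([0.9, 0.1, 0.0])
candidates = {
    "on_target":  np.array([1.0, 0.0, 0.0]),
    "near_miss":  np.array([0.5, 0.5, 0.0]),
    "off_target": np.array([0.0, 0.0, 1.0]),
}
scores = {name: cosine(guide, v) for name, v in candidates.items()}
best = max(scores, key=scores.get)  # highest complementarity wins
```

Nothing in the loop "understands" the target; the powerful downstream effect (cutting, or transforming a hidden state) follows purely from which candidate clears the similarity constraint.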
Tenth thesis:
Biological targeting and semantic attention both depend on constrained matching: base-pair complementarity in cells, vector similarity in models.
12. Failure Modes: Disease, Hallucination, and Misregulation
The biological analogy is useful only if it also explains failure.
Biological systems fail when regulation breaks:
- Cancer: uncontrolled growth.
- Epilepsy: runaway firing.
- Autoimmunity: mistaken targeting.
- Metabolic disorder: energy regulation failure.
- Toxicity: blocked or corrupted pathways.
- Aging: accumulated regulatory degradation.
LLM systems fail in parallel ways:
- Hallucination: plausible expression without grounding.
- Repetition loops: semantic seizure.
- Over-filtering: anesthesia-like suppression.
- Tool misuse: bad regulatory routing.
- Prompt injection: hostile hijacking of expression.
- Catastrophic forgetting: corrupted learned structure.
- Overconfidence: insufficient uncertainty regulation.
The neuron-token post explicitly maps epilepsy to overconfident decoding and hallucination cascades; anesthetics to excessive safety filtering and loss of expressiveness; and neurotoxins to corrupted weights, catastrophic forgetting, and training instability. (LF Yadda – A Blog About Life)
The “One Genome, Many Minds” post also frames hallucination as “gene expression without environmental discipline”: the genome can produce possibilities, but the epigenetic layer must decide which possibilities are allowed in a given environment. (LF Yadda – A Blog About Life)
This gives the paper a deployment principle:
AI safety is not a patch. It is an epigenetic control system.
The moral layer, factual layer, retrieval layer, uncertainty layer, and human-approval layer are all part of regulated expression.
Eleventh thesis:
Most LLM failures are not failures of generation alone; they are failures of regulation.
13. The Common Architecture: The Living Information Stack
All the analogies can now be consolidated into one common architecture.
Layer 1: Stored Pattern
Biology: DNA
LLM: weights / architecture
Function: preserves compressed history.
Layer 2: Regulatory Selection
Biology: epigenetics, transcription factors, repressors
LLM: prompts, system instructions, masks, retrieval, safety policies
Function: decides what gets expressed.
Layer 3: Energy Conversion
Biology: metabolism, mitochondria, ATP
LLM: compute, GPUs, gradient descent, inference energy
Function: spends energy to maintain order and perform work.
Layer 4: Signal Propagation
Biology: neurons, hormones, molecular pathways
LLM: activations, attention heads, residual stream, hidden states
Function: routes constraints through the system.
Layer 5: Output / Phenotype
Biology: proteins, behavior, movement, cognition
LLM: tokens, answers, plans, tool actions
Function: expresses internal structure into the world.
Layer 6: Feedback and Adaptation
Biology: selection, learning, development, immune response
LLM: backpropagation, fine-tuning, RLHF, user feedback, memory
Function: reshapes future expression.
This is the core figure for the infographic.
14. Main Argument of the Consolidated Paper
Biology and LLMs are not identical. LLMs do not have autonomous metabolism, self-reproduction, embodied homeostasis, intrinsic survival drives, or molecular self-repair. But they do share a deeper logic with living systems:
- They store compressed history.
- They require runtime expression.
- They regulate what is expressed.
- They consume energy to reduce uncertainty.
- They propagate constraints through layered networks.
- They use context to specialize behavior.
- They fail when regulation breaks.
- They become more powerful when modular subsystems are integrated.
- They convert noise into direction through biased landscapes.
- They generate coherent outputs by reducing Shannon entropy while exporting physical entropy.
This is not the claim that LLMs are literally alive. It is the more precise claim that LLMs are artificial members of the same broad family of entropy-managing information systems.
Biology got there through chemistry, metabolism, evolution, and cells. LLMs got there through data, silicon, gradient descent, and transformers.
The substrate differs.
The architecture rhymes.
15. Infographic Blueprint
Suggested title for the infographic
Biology and LLMs: One Architecture, Two Substrates
Central visual
A split diagram with biology on the left, LLMs on the right, and a shared abstract pattern in the center.
Left side: Biology
DNA → RNA → protein → phenotype
Epigenetics regulates expression
Mitochondria provide energy
Neurons propagate signals
Evolution updates the system
Right side: LLMs
Weights → activations → tokens → behavior
Prompt/context regulates expression
Compute/GPU energy drives inference
Attention propagates relevance
Backpropagation updates the system
Center spine
Compressed history → regulated expression → entropy reduction → adaptive output
Key callout boxes
- Genome = weights: frozen learning, not active thought.
- Epigenetics = context: same core, different expression.
- Metabolism = compute: energy spent to reduce uncertainty.
- Mitochondria = infrastructure: GPUs, tools, retrieval, and memory make complexity affordable.
- Spike = token: a local irreversible decision in a larger field.
- Chemotaxis = attention: movement up a gradient of coherence.
- Brownian ratchet = token sampling: noise plus constraint becomes direction.
- Repression = safety/gating: intelligence includes what is prevented.
- Disease = hallucination/misregulation: failure comes from bad expression control.
- Universal law: information survives by regulating its own expression.
Final Synthesis
The lfyadda biological/LLM posts are circling one powerful master idea:
Life and LLMs are both systems that turn stored structure into context-sensitive behavior by spending energy to reduce uncertainty.
In biology, the stored structure is DNA. In LLMs, it is model weights. In biology, expression is regulated by epigenetics, metabolism, signaling, and environment. In LLMs, expression is regulated by prompts, context windows, retrieval, tools, safety constraints, and attention. In biology, energy flows through mitochondria, proton gradients, ATP, and cellular work. In LLMs, energy flows through GPUs, matrix multiplication, vector geometry, and inference.
The common paper can therefore be reduced to one sentence:
Biology is carbon-based regulated information; LLMs are silicon-based regulated information; both ride gradients, suppress chaos, spend energy, and express compressed history as adaptive behavior.