Inside an LLM: From Prompt to Prediction


Course Description

This course explains what happens inside a large language model after a user enters a prompt. Students move step by step through the inference pipeline: from raw text to tokens, embeddings, positional structure, self-attention, multilayer perceptrons, residual accumulation, final hidden states, logits, probabilities, and next-token generation. The course treats the LLM not as a magical chatbot but as a structured computational system that transforms language into geometry, compares activations to learned weight patterns, and recursively produces text one token at a time. This framing directly follows the conceptual structure of the source post. (LF Yadda – A Blog About Life)


Core Learning Goals

By the end of the course, students should be able to:

  1. Explain the difference between raw text, tokens, token IDs, embeddings, and hidden states.
  2. Describe why position must be added to token embeddings.
  3. Explain self-attention in plain English and in simplified mathematical form.
  4. Distinguish between attention as relevance finding and the MLP as feature transformation.
  5. Explain the role of residual connections in preserving and accumulating meaning.
  6. Describe how final hidden states become logits, probabilities, and selected output tokens.
  7. Explain the difference between comparisons against stored learned weights and live prompt-to-prompt comparisons inside self-attention.
  8. Build intuitive mental models for how an LLM “thinks” without anthropomorphizing it too much.

Big Course Theme

A good unifying sentence for the whole class is this:

An LLM does not retrieve a finished answer from memory; it repeatedly transforms a live context into a probability field over possible next tokens. This is exactly the logic of the article’s later blocks on final hidden state, logits, probabilities, and token selection. (LF Yadda – A Blog About Life)


Unit Structure

Unit 1 — Entering the Machine

Covers Blocks 1–4.

Essential Question

How does ordinary human text become machine-processable structure?

Key Ideas

  • Raw text is not yet machine meaning.
  • Tokenization is segmentation, not semantic understanding.
  • Token IDs are discrete addresses.
  • Embeddings convert symbolic IDs into vectors.
  • Positional information turns isolated token vectors into sequence-aware hidden states. (LF Yadda – A Blog About Life)
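For classes using the optional Python notebooks, the key ideas above can be sketched as a toy tokenizer. The vocabulary and the whitespace-plus-period splitting rule are invented for illustration; real tokenizers use learned subword schemes such as BPE.

```python
# Toy tokenization and ID lookup. The vocabulary is invented for
# demonstration; real tokenizers use learned subword segmentation.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, ".": 5}

def tokenize(text):
    """Split on whitespace and peel off trailing periods: a crude
    stand-in for real subword segmentation, not semantic analysis."""
    tokens = []
    for word in text.lower().split():
        if word.endswith("."):
            tokens.extend([word[:-1], "."])
        else:
            tokens.append(word)
    return tokens

tokens = tokenize("The cat sat on the mat.")
ids = [vocab[t] for t in tokens]    # token IDs are discrete addresses
print(tokens)  # ['the', 'cat', 'sat', 'on', 'the', 'mat', '.']
print(ids)     # [0, 1, 2, 3, 0, 4, 5]
```

Note that both occurrences of "the" map to the same ID, which is exactly why position must be added later: IDs alone carry no ordering.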

Sample Lesson Activities

  • Break a sentence into possible token fragments by hand.
  • Compare “dog bites man” vs “man bites dog” to show why order matters.
  • Use colored index cards to represent tokens, IDs, embeddings, and positional additions.

Sample Homework

Take three short sentences and describe:

  • what the raw text is,
  • what tokenization would do,
  • why token IDs alone are not enough,
  • why positional information must be added.

Unit 2 — The Transformer as a Meaning Engine

Covers Blocks 5–11:

  • Transformer Block
  • Query/Key/Value Projections
  • Self-Attention Score Computation
  • Softmax Attention Weights
  • Value Aggregation
  • Attention Output Projection
  • Residual Add (LF Yadda – A Blog About Life)

Essential Question

How does the model decide what parts of the prompt matter to what other parts?

Key Ideas

  • Transformer layers are repeated refinement modules.
  • Q/K/V are learned projections of the same hidden state into different roles.
  • Self-attention computes relevance between token positions.
  • Softmax turns raw relevance into normalized influence.
  • Value aggregation gathers context.
  • Residual addition preserves earlier representation while adding new context. (LF Yadda – A Blog About Life)
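The key ideas above can be simulated in a few lines of plain Python. This is a minimal single-head attention sketch with invented 2-d hidden states and identity-like projection weights, chosen only for readability:

```python
import math

def softmax(xs):
    """Turn raw relevance scores into normalized influence weights."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Three 2-d hidden states (one per token position) and made-up 2x2
# projection weights for Query, Key, and Value.
H  = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
Wq = [[1.0, 0.0], [0.0, 1.0]]   # identity weights, for readability
Wk = [[1.0, 0.0], [0.0, 1.0]]
Wv = [[0.5, 0.0], [0.0, 0.5]]

Q = [matvec(Wq, h) for h in H]
K = [matvec(Wk, h) for h in H]
V = [matvec(Wv, h) for h in H]

d = len(H[0])
outputs = []
for q in Q:
    scores = [dot(q, k) / math.sqrt(d) for k in K]   # relevance between positions
    weights = softmax(scores)                        # normalized influence
    out = [sum(w * v[i] for w, v in zip(weights, V)) for i in range(d)]
    outputs.append(out)                              # context gathered per position
```

In a real model each `out` would then pass through an output projection and be residual-added to the original hidden state rather than replacing it.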

Sample Lesson Activities

  • Role-play attention:
    • one student is Query,
    • several are Keys,
    • several carry Values,
    • the class computes “who matters most.”
  • Have students assign mock attention scores to words in a sentence.
  • Draw the residual stream as a river receiving tributaries.

Sample Homework

Write a one-page explanation of attention without using the words “magic,” “understanding,” or “memory retrieval.”


Unit 3 — The MLP and Semantic Circuitry

Covers Blocks 12–15.

Essential Question

If attention gathers context, what does the MLP do with it?

Key Ideas

  • The MLP is not mainly about token-to-token comparison.
  • It compares live hidden states against learned neuron directions.
  • Activation functions decide which feature responses become strong or weak.
  • The MLP output is added back into the residual stream.
  • Repeated layers progressively refine context into deeper semantic structure. (LF Yadda – A Blog About Life)
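A minimal MLP-sublayer sketch makes these ideas concrete. All weights below are invented for illustration; the rows of the up-projection play the role of learned neuron directions, and ReLU stands in for the activation function:

```python
# MLP sublayer sketch: compare the live hidden state against learned
# neuron directions, gate with an activation, project back, residual-add.
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

hidden = [0.5, -0.2, 0.8]        # live 3-d hidden state (mock values)

W_up = [                          # 4 neuron "directions" (rows, invented)
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
]
W_down = [                        # project 4 activations back to 3-d
    [0.5, 0.0, 0.0, 0.1],
    [0.0, 0.5, 0.0, 0.1],
    [0.0, 0.0, 0.5, 0.1],
]

pre_acts = matvec(W_up, hidden)              # dot products vs. directions
acts = [max(0.0, a) for a in pre_acts]       # ReLU: weak responses go to 0
mlp_out = matvec(W_down, acts)
new_hidden = [h + m for h, m in zip(hidden, mlp_out)]  # residual add
```

Notice that the second neuron's negative response is gated to zero: the activation function decides which feature detectors fire, and the residual add preserves the original state underneath the new features.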

Sample Lesson Activities

  • Treat neurons as “feature detectors.”
  • Give students hidden-state cards with features like “plural,” “question,” “animal,” “past tense,” and simulate which detectors fire.
  • Build a “semantic circuitry board” metaphor.

Sample Homework

Describe the difference between:

  • attention asking “where should I look?”
  • MLP asking “what internal features should now activate?”

Unit 4 — From Internal State to Output

Covers Blocks 16–19.

Essential Question

How does the model move from hidden computation to visible text?

Key Ideas

  • The final hidden state is a context-rich summary at the last position.
  • Unembedding compares that state against output token directions to produce logits.
  • Softmax converts logits into probabilities.
  • Decoding chooses one token.
  • The selected token becomes both output and new input for the next cycle. (LF Yadda – A Blog About Life)
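The final stage can be sketched with a tiny mock vocabulary. The logit values are invented; the point is the three distinct objects, raw scores, normalized probabilities, and one chosen token:

```python
import math

# Mock final step: logits over a tiny vocabulary, converted to
# probabilities, then greedy decoding. All values are invented.
vocab = ["cat", "mat", "sat", "the", "."]
logits = [2.0, 0.5, 0.1, 1.0, -1.0]          # raw, unnormalized scores

m = max(logits)
exps = [math.exp(l - m) for l in logits]      # subtract max for stability
probs = [e / sum(exps) for e in exps]         # softmax: now sums to 1

next_token = vocab[probs.index(max(probs))]   # greedy: pick the argmax
print(next_token)  # "cat" — the highest logit wins under greedy decoding
```

A sampling decoder would instead draw from `probs`, which is why the same prompt can yield different continuations; either way, the chosen token is appended to the context and the whole pipeline runs again.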

Sample Lesson Activities

  • Give students a small fake vocabulary and a made-up logit table.
  • Let them compute rough probabilities and simulate greedy vs sampling output.
  • Show how one selected token changes the next step.

Sample Homework

Explain why logits are not yet probabilities, and why probabilities are not yet the final output.


Unit 5 — The Master Distinction

Covers the ending conceptual synthesis.

Essential Question

What kinds of “comparison” actually happen inside an LLM?

Key Ideas

The source post’s most important conceptual distinction is that there is not just one kind of comparison inside the model. Sometimes live hidden states are projected against learned parameter directions. Other times prompt-derived vectors compare directly with other prompt-derived vectors in self-attention. Understanding that distinction makes the whole pipeline much clearer. (LF Yadda – A Blog About Life)
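The two kinds of comparison can be shown side by side in code. Both are dot products, but the operands differ in kind; all vectors below are invented mock values:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Comparison type 1: a live hidden state against a FROZEN learned
# direction (as in MLP neurons or unembedding rows). The weight row
# is fixed at inference time; only the hidden state varies per prompt.
learned_direction = [0.2, -0.5, 0.9]    # fixed parameter (mock values)
hidden_state = [1.0, 0.0, 0.5]          # changes with every prompt
score_vs_weights = dot(hidden_state, learned_direction)

# Comparison type 2: a prompt-derived vector against another
# prompt-derived vector (as in self-attention). BOTH operands
# depend on the live context.
query_vec = [0.4, 0.4, 0.1]             # derived from this prompt
key_vec = [0.8, 0.1, 0.1]               # also derived from this prompt
score_vs_prompt = dot(query_vec, key_vec)
```

The arithmetic is identical; the conceptual difference is whether one side of the comparison is frozen training-time structure or live context.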

Sample Lesson Activities

  • Sort operations into two bins:
    • against stored learned structure
    • against live prompt-derived context
  • Debate which is more important for “intelligence”: stored geometry or dynamic routing.

Final Writing Prompt

“Why an LLM is neither a database nor a simple Markov chain.”


Sample Weekly Schedule

Week 1

Introduction to LLM inference
Read Blocks 1–4
Class discussion: “Where does meaning begin?”

Week 2

Tokenization and embeddings
Lab: build a toy tokenizer
Mini-quiz

Week 3

Positional information and hidden states
Diagramming workshop

Week 4

Transformer blocks and Q/K/V
Attention simulation lab

Week 5

Self-attention, softmax, and value aggregation
Short written reflection

Week 6

Residual streams and cumulative meaning
Midterm concept map

Week 7

MLP and feature detectors
Neuron activation exercise

Week 8

Layer repetition and progressive contextualization
Compare early-layer vs late-layer roles

Week 9

Final hidden states and unembedding
Vocabulary competition exercise

Week 10

Probabilities and decoding
Sampling vs greedy lab

Week 11

Two kinds of comparison
Synthesis seminar

Week 12

Student presentations and final project


Sample Course Materials

What follows is sample course material itself, not just the outline.


Sample Lecture 1 Handout

Handout Title

What Happens the Moment You Enter a Prompt?

Opening Idea

When a person reads a sentence, it already feels meaningful. When a language model receives that same sentence, it begins with raw symbols, not meaning. The first phase is not understanding. It is conversion. The post explicitly frames the user prompt as the boundary between the outside symbolic world and the inside computational world. (LF Yadda – A Blog About Life)

Vocabulary

  • Prompt: the text supplied by the user
  • Tokenization: breaking text into model-recognizable pieces
  • Token ID: the numeric ID assigned to each token
  • Embedding: a learned vector retrieved for a token ID
  • Position encoding: additional information that tells the model where each token sits in the sequence (LF Yadda – A Blog About Life)

Mini Example

Sentence:
“The cat sat on the mat.”

Possible simplified flow:

  1. Raw text arrives.
  2. Tokenizer splits it into pieces.
  3. Each piece gets a token ID.
  4. Each token ID retrieves an embedding vector.
  5. Position information is added.
  6. The sequence is now ready to enter transformer layers. (LF Yadda – A Blog About Life)
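The six steps above can be sketched for this exact sentence. The 3-d embedding vectors and the positional rule are invented for illustration; real models use high-dimensional learned tables:

```python
# Illustrative flow for "The cat sat on the mat." with made-up 3-d vectors.
vocab = {"the": 0, "cat": 1, "sat": 2, "on": 3, "mat": 4, ".": 5}
ids = [0, 1, 2, 3, 0, 4, 5]          # steps 1-3: text -> tokens -> IDs

embedding_table = {                   # step 4: learned vectors (mock values)
    0: [0.1, 0.0, 0.2], 1: [0.9, 0.3, 0.1], 2: [0.2, 0.8, 0.4],
    3: [0.0, 0.5, 0.7], 4: [0.6, 0.1, 0.9], 5: [0.3, 0.3, 0.3],
}

def positional_vector(pos, dim=3):
    """Mock positional signal: a small, position-dependent offset."""
    return [0.01 * pos * (d + 1) for d in range(dim)]

hidden_states = []                    # step 5: add position to each embedding
for pos, tok_id in enumerate(ids):
    emb = embedding_table[tok_id]
    offset = positional_vector(pos)
    hidden_states.append([e + o for e, o in zip(emb, offset)])
# Step 6: hidden_states is now a sequence ready for transformer layers.
```

Both occurrences of "the" (positions 0 and 4) start from the same embedding but end up different, because position was added; that is precisely what makes the vectors sequence-aware.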

Key Takeaway

The model does not begin with semantic understanding. It begins with structured conversion into vectors. (LF Yadda – A Blog About Life)


Sample Lecture 2 Mini-Script

Topic

Attention: How tokens decide which other tokens matter

Imagine every token in a sentence asking a question: “Who in the sentence matters to me right now?” That is what queries and keys help the model do. The hidden state at a token position is projected into three views: Query, Key, and Value. The query is what the token is looking for, the key is how another token advertises its relevance, and the value is the content it can contribute if chosen. The post explains that the Q/K/V stage prepares the ingredients for attention, and the self-attention stage then performs prompt-to-prompt comparison by dot-producting queries with keys. (LF Yadda – A Blog About Life)

The raw attention scores are then normalized by softmax, which turns relevance evidence into influence weights. Those weights are used to blend value vectors into a new contextual representation. That result is projected back into the residual stream and added to the earlier state rather than replacing it outright. (LF Yadda – A Blog About Life)

So attention is not “the model looking up the answer.” Attention is the model reorganizing the prompt internally according to relevance. (LF Yadda – A Blog About Life)
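The "three views" idea from the script can be shown directly: one hidden state, three learned projections. The weight matrices below are invented; the point is that Query, Key, and Value all come from the same vector:

```python
# One hidden state, three role-specific projections. Weights are invented.
def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

hidden = [1.0, 0.5]                    # hidden state at one token position

W_q = [[0.0, 1.0], [1.0, 0.0]]         # "what am I looking for?"
W_k = [[1.0, 1.0], [0.0, 1.0]]         # "how do I advertise my relevance?"
W_v = [[0.5, 0.0], [0.0, 0.5]]         # "what content can I contribute?"

q = matvec(W_q, hidden)
k = matvec(W_k, hidden)
v = matvec(W_v, hidden)
print(q, k, v)  # three different views of the same underlying state
```

Queries from one position are then dot-producted with keys from every position, which is the prompt-to-prompt comparison the script describes.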


Sample Student Worksheet

Worksheet: Mapping the Inference Pipeline

Part A — Put These in Order

Number the following from earliest to latest:

  • logits
  • tokenization
  • token selection
  • embedding lookup
  • residual add
  • next-token probabilities
  • self-attention score computation
  • positional information added

Part B — Short Answer

  1. Why is tokenization not the same thing as semantic understanding?
  2. Why does the model need positional information?
  3. What is the difference between a query and a value?
  4. Why are residual connections important?
  5. Why is the final hidden state not the same as the token embedding?

Part C — Reflection

Complete this sentence:

“The model’s output is not retrieved from storage all at once; it is generated by…”

Expected answer direction:
“…repeatedly turning a contextual hidden state into logits, probabilities, and then a selected next token.” (LF Yadda – A Blog About Life)


Sample Quiz

Quiz 1: Inside an LLM

Multiple Choice

1. Tokenization is primarily:
A. a semantic search
B. a deterministic segmentation and ID assignment step
C. a probability calculation
D. an attention score
Answer: B (LF Yadda – A Blog About Life)

2. Embedding lookup does what?
A. Selects the most similar sentence from training data
B. Converts raw logits into probabilities
C. retrieves a learned vector for each token ID
D. chooses the next token
Answer: C (LF Yadda – A Blog About Life)

3. Self-attention score computation compares:
A. hidden states only to MLP neurons
B. queries to keys across token positions
C. logits to probabilities
D. residual streams to token IDs
Answer: B (LF Yadda – A Blog About Life)

4. Residual addition is important because it:
A. deletes prior information
B. replaces old meaning entirely
C. lets the model preserve prior state while adding new context
D. converts text into token IDs
Answer: C (LF Yadda – A Blog About Life)

5. Logits are:
A. already normalized probabilities
B. raw scores over vocabulary items
C. token IDs
D. positional vectors
Answer: B (LF Yadda – A Blog About Life)

Short Response

In 3–5 sentences, explain the difference between logits and next-token probabilities. (LF Yadda – A Blog About Life)


Sample In-Class Lab

Lab Title

Be the Transformer

Goal

Students physically simulate one attention pass.

Materials

  • index cards
  • markers
  • board space

Setup

Use the sentence:
“The animal didn’t cross the street because it was tired.”

Assign students roles:

  • Token cards: The / animal / didn’t / cross / the / street / because / it / was / tired
  • One student plays the current token: “it”
  • Other students hold simplified key/value labels such as:
    • animal → possible antecedent
    • street → another noun
    • tired → descriptive clue

Procedure

  1. “It” forms a Query: “I need my referent.”
  2. Candidate nouns present Keys.
  3. Students assign rough relevance scores.
  4. Softmax is approximated by converting those scores into rough weights.
  5. Values are blended.
  6. The class decides whether “it” most likely refers to “animal” or “street.”

Learning Point

This demonstrates how attention can help resolve relationships by comparing current-token needs to earlier-token relevance signals. That directly reflects the article’s explanation of self-attention, softmax weighting, and value aggregation. (LF Yadda – A Blog About Life)
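For a follow-up notebook exercise, the lab can be replayed numerically. The relevance scores below are the kind of rough numbers a class might assign; they are invented, not model outputs:

```python
import math

# Numeric version of the lab: "it" scores each candidate, softmax turns
# scores into weights, and values are blended. All numbers are invented.
candidates = ["animal", "street", "tired"]
scores = [3.0, 0.5, 1.5]              # class-assigned relevance for "it"

m = max(scores)
exps = [math.exp(s - m) for s in scores]
weights = [e / sum(exps) for e in exps]   # softmax: rough scores -> weights

values = {"animal": [1.0, 0.0], "street": [0.0, 1.0], "tired": [0.5, 0.5]}
blended = [sum(w * values[c][i] for w, c in zip(weights, candidates))
           for i in range(2)]             # blend value vectors by weight

referent = candidates[weights.index(max(weights))]
print(referent)   # "animal" dominates the blend
```

Students can change the scores and watch the blended vector shift, which makes the softmax-then-aggregate step tangible.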


Sample Discussion Questions

  1. At what exact point does “meaning” begin inside the model?
  2. Is an embedding “meaning,” or only the beginning of machine-usable meaning?
  3. Why is attention not enough by itself?
  4. Why is the MLP often underappreciated in popular explanations?
  5. Why does repeated layering matter?
  6. Does the model ever “know” a whole sentence in advance, or only produce one token at a time? (LF Yadda – A Blog About Life)

Sample Midterm Assignment

Prompt

Write a 1200–1800 word essay explaining the full LLM inference pipeline to an intelligent nontechnical reader. Your essay must include:

  • tokenization
  • embeddings
  • positional information
  • self-attention
  • MLP
  • residual connections
  • final hidden state
  • logits
  • probabilities
  • token selection

You must also explain the difference between comparisons against stored learned structure and comparisons among live prompt-derived representations.

Grading Criteria

  • accuracy
  • clarity
  • organization
  • ability to use analogy without losing correctness
  • proper distinction of pipeline stages

Sample Final Project Options

Option 1

Create a visual wall chart of the 19-block pipeline.

Option 2

Write a teacher’s guide titled:
“How to Explain an LLM Without Saying It Just Predicts the Next Word.”

Option 3

Build a toy spreadsheet model showing:

  • tokens
  • token IDs
  • mock embeddings
  • mock attention scores
  • weighted values
  • mock logits

Option 4

Produce a “Frank-said / GPT-said” classroom dialogue version of the course as an educational script.


Sample Teacher Notes

A strong teaching move is to repeat this contrast all semester:

  • symbolic stage: raw text, token pieces, token IDs
  • geometric stage: embeddings, hidden states, projections, attention space
  • decision stage: logits, probabilities, token selection

That three-part framing is faithful to the post’s overall structure from prompt entry, through layered semantic processing, to output competition and decoding. (LF Yadda – A Blog About Life)

Another strong teaching move is to keep returning to the article’s final distinction: not all “comparison” inside an LLM is the same. Some comparison is against frozen learned structure. Some is live token-to-token interaction within the prompt. That distinction is probably the single best conceptual spine for the whole course. (LF Yadda – A Blog About Life)


Suggested Required Materials

  • your original LFYadda post as the anchor text
  • whiteboard or slide deck
  • printed worksheets
  • colored token cards
  • optional simple Python notebooks for vector demos

Recommended Assessments

A balanced version could be:

  • 15% quizzes
  • 20% worksheets and labs
  • 20% discussion participation
  • 20% midterm essay
  • 25% final project

One-Page Course Summary for Students

This course teaches what happens inside a large language model after a prompt is entered.
You will learn how text is tokenized, turned into vectors, combined with positional information, processed through transformer layers, routed through self-attention, transformed by feedforward circuitry, compressed into a final hidden state, compared against vocabulary output directions, converted into probabilities, and finally decoded into a new token. The central lesson is that an LLM is a structured inference machine built from both frozen learned geometry and live context-sensitive interaction. (LF Yadda – A Blog About Life)


