The Basic Problem
Imagine you collect a lot of information about people.
For example, you measure:
- weight
- height
- blood pressure
- cholesterol
- sleep hours
- daily steps
- calories
- stress levels
- lung capacity
- grip strength
Now imagine doing this for thousands of people.
You might end up with 50 different measurements per person. Your spreadsheet becomes huge and complicated.
At that point a simple question becomes difficult:
Which people are doing well and which are struggling?
Your brain cannot easily understand 50 different numbers at once.
So we need a way to simplify the data without losing the important patterns.
That is what Principal Component Analysis (PCA) does.
The Big Idea of PCA
PCA tries to answer this question:
Are many of these measurements actually telling us the same thing?
Often they are.
For example:
- People with high body weight often have higher cholesterol.
- People who exercise more tend to have lower resting heart rates.
- People who sleep well often have lower stress.
So many variables are correlated.
Instead of 50 independent measurements, there may only be a few underlying patterns.
PCA tries to find those patterns.
For example, it might discover:
- Pattern 1: overall metabolic health
- Pattern 2: fitness vs strength
- Pattern 3: stress and recovery
So instead of describing a person with 50 numbers, you might only need 3 numbers.
You have simplified the data while keeping most of its meaning.
Step 1: Center the Data
The first thing PCA does is subtract each variable's average from that variable's values.
Why?
Because PCA looks for variation — how things differ.
If we leave the average in place, PCA may mistakenly focus on where the data sits rather than how it spreads.
So the data is shifted so the average is zero.
This lets PCA see the true shape of the data.
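As a minimal sketch, here is what centering looks like in Python with NumPy. The tiny table of measurements (weight, sleep hours, stress score) is invented purely for illustration:

```python
import numpy as np

# Invented measurements for five people: weight (kg), sleep (hours), stress (0-10).
X = np.array([
    [82.0, 6.5, 7.0],
    [95.0, 5.0, 8.0],
    [61.0, 8.0, 3.0],
    [70.0, 7.5, 4.0],
    [88.0, 6.0, 6.0],
])

# Subtract each column's average so every variable is centered at zero.
X_centered = X - X.mean(axis=0)
print(X_centered.mean(axis=0))  # ~[0, 0, 0], up to floating-point error
```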
Step 2: Put Everything on the Same Scale
Different measurements use different units.
For example:
- weight might range from 60 to 120 kg
- a stress score might range from 0 to 10
Without correction, weight would dominate simply because its numbers are larger.
So the variables are standardized before PCA is run.
This means each variable is rescaled so that they all vary on a similar scale.
Now PCA treats them fairly.
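Continuing the sketch, standardizing takes one more line in NumPy (the numbers are again invented):

```python
import numpy as np

X = np.array([
    [82.0, 6.5, 7.0],
    [95.0, 5.0, 8.0],
    [61.0, 8.0, 3.0],
    [70.0, 7.5, 4.0],
    [88.0, 6.0, 6.0],
])

# Center, then divide each column by its standard deviation.
# Every variable now has mean 0 and standard deviation 1, so a variable
# cannot dominate just because its units produce bigger numbers.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.std(axis=0))  # ~[1, 1, 1]
```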
Step 3: Measure How Variables Move Together
Next PCA calculates something called covariance.
This simply measures how two variables change together.
For example:
If people who weigh more also tend to have higher cholesterol, those variables have positive covariance.
If two variables have nothing to do with each other, their covariance is near zero.
PCA calculates this relationship for every pair of variables.
All those relationships are stored in something called the covariance matrix.
Think of it as a big table showing how every variable relates to every other one.
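In the running NumPy sketch, that whole table comes from a single call:

```python
import numpy as np

X = np.array([
    [82.0, 6.5, 7.0],
    [95.0, 5.0, 8.0],
    [61.0, 8.0, 3.0],
    [70.0, 7.5, 4.0],
    [88.0, 6.0, 6.0],
])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# The covariance matrix: entry [i, j] measures how variables i and j
# move together. rowvar=False tells NumPy that columns are the variables.
C = np.cov(X_std, rowvar=False)
print(C)  # a 3x3 table: every variable against every other variable
```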
Step 4: Find the Main Directions of Variation
Now PCA asks a geometric question:
In which direction does the data spread the most?
Imagine your data as a cloud of dots in space.
If you look at the cloud from the right angle, you might see that it stretches mostly along one direction.
That direction captures the most important variation.
PCA finds that direction.
This direction is called the first principal component (PC1).
Then PCA finds the second-best direction, which must be perpendicular to the first.
That is PC2.
Then it finds PC3, and so on.
Each of these directions explains a smaller amount of variation.
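One standard way to find these directions is an eigendecomposition of the covariance matrix; the eigenvectors are the principal components and the eigenvalues measure how much variation each one captures. A small NumPy sketch, still using the invented health numbers:

```python
import numpy as np

X = np.array([
    [82.0, 6.5, 7.0],
    [95.0, 5.0, 8.0],
    [61.0, 8.0, 3.0],
    [70.0, 7.5, 4.0],
    [88.0, 6.0, 6.0],
])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_std, rowvar=False)

# eigh returns eigenvalues in ascending order, so flip to put PC1 first.
eigenvalues, eigenvectors = np.linalg.eigh(C)
eigenvalues = eigenvalues[::-1]
eigenvectors = eigenvectors[:, ::-1]
print(eigenvalues)  # largest first: PC1 captures the most variation
```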
Step 5: Rotate the Data
Once PCA finds these new directions, it rotates the coordinate system.
Instead of measuring people using the original variables, we measure them using the principal components.
So each person now has scores like:
- PC1 score
- PC2 score
- PC3 score
These scores describe where the person sits along the main patterns in the data.
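In the running NumPy sketch, the rotation is just a matrix multiplication of the standardized data by the principal directions:

```python
import numpy as np

X = np.array([
    [82.0, 6.5, 7.0],
    [95.0, 5.0, 8.0],
    [61.0, 8.0, 3.0],
    [70.0, 7.5, 4.0],
    [88.0, 6.0, 6.0],
])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(X_std, rowvar=False))
eigenvectors = eigenvectors[:, ::-1]  # PC1 first

# Project each person onto the new directions.
# Each row of scores holds one person's PC1, PC2, PC3 values.
scores = X_std @ eigenvectors
print(scores[0])  # the first person, described by the main patterns
```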
Step 6: Decide How Many Components to Keep
Usually the first few components explain most of the variation.
For example:
| Components kept | Cumulative variance explained |
|---|---|
| 1 | 76% |
| 2 | 89% |
| 3 | 95% |
| 4 | 98% |
This means:
- One component already explains most of the variation.
- Three components explain almost everything.
So instead of 50 variables, we might keep just three components.
That dramatically simplifies the dataset.
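A short NumPy sketch makes this concrete. The data below is synthetic: six measurements deliberately built from only two underlying patterns plus a little noise, so the first two components should explain nearly all the variation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: 200 people, 6 measurements that are all mixtures
# of just 2 underlying "health patterns", plus a little noise.
patterns = rng.normal(size=(200, 2))
X = patterns @ rng.normal(size=(2, 6)) + 0.3 * rng.normal(size=(200, 6))

X_std = (X - X.mean(axis=0)) / X.std(axis=0)
eigenvalues = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))[::-1]

# Cumulative share of variation explained as components are added.
explained = np.cumsum(eigenvalues) / eigenvalues.sum()
print(explained.round(2))  # the first two entries should already be near 1.0
```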
What PCA Is Really Doing
PCA is basically doing three things:
- Finding patterns in how variables move together
- Combining related variables into new variables
- Reducing the number of dimensions needed to describe the data
It compresses the information without losing much meaning.
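For completeness, the whole pipeline can be run in a few lines with scikit-learn, assuming it is installed; the data here is the same kind of synthetic mixture as in the sketch above:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Six correlated columns built from two underlying patterns, plus noise.
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 6))
X += 0.3 * rng.normal(size=(200, 6))

# Standardize, then keep the top 3 components.
pca = PCA(n_components=3)
scores = pca.fit_transform(StandardScaler().fit_transform(X))

print(scores.shape)                    # (200, 3): 6 numbers per person became 3
print(pca.explained_variance_ratio_)   # variation kept by each component
```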
The Limitation of PCA
PCA has an important limitation.
It finds the directions with the most variation, but those directions are not always the most useful.
For example:
Suppose the biggest difference in a hospital dataset is which hospital patients came from.
PCA might focus on that pattern.
But maybe the real question we care about is who will become sick.
PCA does not know that.
It only looks for variance.
So sometimes PCA highlights patterns that are interesting but not useful.
Another Important Concern
PCA also inherits any bias in the data.
If a dataset reflects biased human decisions — such as hiring patterns or medical access — PCA will compress and reorganize those patterns.
But it will not remove the bias.
In fact, because PCA mixes many variables together, it can make the bias harder to see.
So analysts must use PCA carefully, especially in sensitive areas like:
- hiring
- policing
- healthcare
- finance
The Big Picture
PCA solves a very common problem.
Modern data is often:
- extremely large
- highly correlated
- too complex to visualize
PCA gives us a way to simplify that complexity.
It rotates the data into a new perspective where the most important patterns become easier to see.
It doesn’t change the data itself.
It simply helps us look at it from the angle where the story becomes clearer.
One Final Connection
Roughly speaking, PCA finds the lowest-entropy description of a dataset: the fewest directions needed to describe most of the information.
That idea is remarkably close to how embeddings and latent spaces work in large language models, which also compress many variables into a small number of meaningful directions.