PCA in Plain English

The Basic Problem

Imagine you collect a lot of information about people.

For example, you measure:

  • weight
  • height
  • blood pressure
  • cholesterol
  • sleep hours
  • daily steps
  • calories
  • stress levels
  • lung capacity
  • grip strength

Now imagine doing this for thousands of people.

You might end up with 50 different measurements per person. Your spreadsheet becomes huge and complicated.

At that point a simple question becomes difficult:

Which people are doing well and which are struggling?

Your brain cannot easily understand 50 different numbers at once.

So we need a way to simplify the data without losing the important patterns.

That is what Principal Component Analysis (PCA) does.


The Big Idea of PCA

PCA tries to answer this question:

Are many of these measurements actually telling us the same thing?

Often they are.

For example:

  • People with high body weight often have higher cholesterol.
  • People who exercise more often have lower resting heart rate.
  • People who sleep well often have lower stress.

So many variables are correlated.

Instead of 50 independent measurements, there may only be a few underlying patterns.

PCA tries to find those patterns.

For example, it might discover:

  • Pattern 1: overall metabolic health
  • Pattern 2: fitness vs strength
  • Pattern 3: stress and recovery

So instead of describing a person with 50 numbers, you might only need 3 numbers.

You have simplified the data while keeping most of its meaning.


Step 1: Center the Data

The first thing PCA does is subtract the average value from every variable.

Why?

Because PCA looks for variation — how things differ.

If we leave the average in place, PCA may mistakenly focus on where the data sits rather than how it spreads.

So the data is shifted so the average is zero.

This lets PCA see the true shape of the data.
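In NumPy, centering is one line. The numbers below are made up purely for illustration:

```python
import numpy as np

# Made-up measurements for 5 people: weight (kg) and stress score (0-10).
X = np.array([
    [70.0, 3.0],
    [85.0, 6.0],
    [60.0, 2.0],
    [95.0, 8.0],
    [75.0, 5.0],
])

# Subtract each column's mean so every variable is centered at zero.
X_centered = X - X.mean(axis=0)

print(X_centered.mean(axis=0))  # both column means are now ~0
```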


Step 2: Put Everything on the Same Scale

Different measurements use different units.

For example:

  • weight might range from 60–120 kg
  • stress score might range 0–10

Without correction, weight would dominate simply because its numbers are larger.

So before PCA is applied, the variables are usually standardized.

This means each variable is rescaled so that they all have mean zero and a similar spread.

Now PCA treats them fairly.
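Continuing the sketch, standardization divides each centered column by its standard deviation (the values are again invented for illustration):

```python
import numpy as np

# Two variables on very different scales: weight (kg) and stress (0-10).
X = np.array([
    [70.0, 3.0],
    [85.0, 6.0],
    [60.0, 2.0],
    [95.0, 8.0],
    [75.0, 5.0],
])

# Center, then divide by each column's standard deviation.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Every variable now has mean 0 and standard deviation 1,
# so no variable dominates just because its units produce big numbers.
```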


Step 3: Measure How Variables Move Together

Next PCA calculates something called covariance.

This simply measures how two variables change together.

For example:

If people who weigh more also tend to have higher cholesterol, those variables have positive covariance.

If two variables have nothing to do with each other, their covariance is near zero.

PCA calculates this relationship for every pair of variables.

All those relationships are stored in something called the covariance matrix.

Think of it as a big table showing how every variable relates to every other one.
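Here is a small NumPy sketch on synthetic data: cholesterol is built to track weight, while daily steps are independent of both, so the covariance matrix should show exactly that.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

weight = rng.normal(80, 10, n)                    # kg
cholesterol = 2.0 * weight + rng.normal(0, 5, n)  # made to follow weight
steps = rng.normal(8000, 2000, n)                 # unrelated to both

# Standardize so units don't matter, then compute the covariance matrix.
X = np.column_stack([weight, cholesterol, steps])
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
C = np.cov(X_std, rowvar=False)

# C[0, 1] (weight vs cholesterol) is strongly positive;
# C[0, 2] and C[1, 2] (steps vs the others) are near zero.
```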


Step 4: Find the Main Directions of Variation

Now PCA asks a geometric question:

In which direction does the data spread the most?

Imagine your data as a cloud of dots in space.

If you look at the cloud from the right angle, you might see that it stretches mostly along one direction.

That direction captures the most important variation.

PCA finds that direction.

This direction is called the first principal component (PC1).

Then PCA finds the second best direction that is perpendicular to the first.

That is PC2.

Then it finds PC3, and so on.

Each of these directions explains a smaller amount of variation.
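Mathematically, these directions are the eigenvectors of the covariance matrix, and the spread along each direction is the matching eigenvalue. A minimal sketch with an elongated 2-D cloud:

```python
import numpy as np

rng = np.random.default_rng(1)

# A cloud of 500 dots stretched along the x = y diagonal.
t = rng.normal(0, 3, 500)        # big spread along the diagonal
noise = rng.normal(0, 0.5, 500)  # small spread across it
X = np.column_stack([t + noise, t - noise])

C = np.cov(X, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(C)  # sorted ascending

pc1 = eigenvectors[:, -1]  # direction with the largest eigenvalue
# pc1 points along the diagonal (roughly [0.71, 0.71], up to sign),
# and its eigenvalue is far larger than the other one.
```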


Step 5: Rotate the Data

Once PCA finds these new directions, it rotates the coordinate system.

Instead of measuring people using the original variables, we measure them using the principal components.

So each person now has scores like:

  • PC1 score
  • PC2 score
  • PC3 score

These scores describe where the person sits along the main patterns in the data.
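Computing those scores is just a matrix multiplication: project the centered data onto the component directions. A sketch, again on a synthetic 2-D cloud:

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic 2-D cloud stretched along one direction.
t = rng.normal(0, 3, 300)
X = np.column_stack([t + rng.normal(0, 0.5, 300),
                     t + rng.normal(0, 0.5, 300)])

X_centered = X - X.mean(axis=0)
_, vecs = np.linalg.eigh(np.cov(X_centered, rowvar=False))
components = vecs[:, ::-1]  # reorder so PC1 comes first

# Each row of `scores` describes one point as (PC1 score, PC2 score).
scores = X_centered @ components

# By construction, PC1 scores vary far more than PC2 scores.
```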


Step 6: Decide How Many Components to Keep

Usually the first few components explain most of the variation.

For example:

  Components kept   Cumulative variance explained
  1                 76%
  2                 89%
  3                 95%
  4                 98%

This means:

  • One component already explains most of the variation.
  • Three components explain almost everything.

So instead of 50 variables, we might keep just three components.

That dramatically simplifies the dataset.
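The percentages in a table like the one above come from the eigenvalues: each component's share of the total variance, accumulated. A sketch with eigenvalues invented to reproduce those numbers:

```python
import numpy as np

# Hypothetical eigenvalues, largest first, chosen to match the table.
eigenvalues = np.array([38.0, 6.5, 3.0, 1.5, 1.0])

share = eigenvalues / eigenvalues.sum()  # each component's fraction
cumulative = np.cumsum(share)

print(np.round(cumulative, 2))  # [0.76 0.89 0.95 0.98 1.  ]
```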


What PCA Is Really Doing

PCA is basically doing three things:

  1. Finding patterns in how variables move together
  2. Combining related variables into new variables
  3. Reducing the number of dimensions needed to describe the data

It compresses the information without losing much meaning.
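In practice, all of these steps take only a few lines with scikit-learn. A sketch on synthetic data, where 6 correlated measurements are secretly driven by 2 hidden factors (the factor structure and mixing weights are invented for the example):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)

# 200 people, 6 measurements driven by 2 hidden factors plus small noise.
factors = rng.normal(size=(200, 2))
mixing = np.array([[1.0, 0.8, 0.0, 0.5, -0.7, 0.3],
                   [0.0, 0.4, 1.0, -0.6, 0.2, 0.9]])
X = factors @ mixing + rng.normal(scale=0.3, size=(200, 6))

# Standardize, then keep only the first 2 principal components.
X_std = StandardScaler().fit_transform(X)
pca = PCA(n_components=2).fit(X_std)
scores = pca.transform(X_std)

# 6 columns compressed to 2, keeping most of the variation.
print(scores.shape, pca.explained_variance_ratio_.sum())
```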


The Limitation of PCA

PCA has an important limitation.

It finds the directions with the most variation, but those directions are not always the most useful.

For example:

Suppose the biggest difference in a hospital dataset is which hospital patients came from.

PCA might focus on that pattern.

But maybe the real question we care about is who will become sick.

PCA does not know that.

It only looks for variance.

So sometimes PCA highlights patterns that are interesting but not useful.


Another Important Concern

PCA also inherits any bias in the data.

If a dataset reflects biased human decisions — such as hiring patterns or medical access — PCA will compress and reorganize those patterns.

But it will not remove the bias.

In fact, because PCA mixes many variables together, it can make the bias harder to see.

So analysts must use PCA carefully, especially in sensitive areas like:

  • hiring
  • policing
  • healthcare
  • finance

The Big Picture

PCA solves a very common problem.

Modern data is often:

  • extremely large
  • highly correlated
  • too complex to visualize

PCA gives us a way to simplify that complexity.

It rotates the data into a new perspective where the most important patterns become easier to see.

It doesn’t change the information in the data.

It simply helps us look at it from the angle where the story becomes clearer.


A closing connection: PCA can be seen as searching for a compact, low-entropy description of a dataset.

In other words:

PCA discovers the fewest directions needed to describe most of the information.

That is very close in spirit to how embeddings and latent spaces work in LLMs.

