Foundations · F5 of 6 · ~30 minutes · F3 (Reading Science) recommended

Data & Biology

Hook

In 1998, a single small study with 12 children was published in The Lancet. It claimed a possible link between the MMR vaccine and autism. The data was thin and the methods were flawed. The Lancet retracted the paper in 2010, and the lead author was struck off the UK medical register for serious professional misconduct.

But the damage had already been done. Vaccination rates dropped across the UK and US. Measles — a disease once on the verge of elimination — returned. Children died.

One bad study. Twelve subjects. A misinterpreted data point. The lesson isn't that statistics are dangerous. It's that the ability to read data critically is one of the most consequential skills a person can have. This module gives you the basics.

---

Reading Graphs Without Getting Fooled

Graphs are how science communicates. They can also be misleading on purpose or by accident. The four you'll encounter most often:

Bar charts — Compare values across categories. Watch out for: truncated y-axes that make small differences look dramatic. A bar chart that starts at 90% instead of 0% can make a 2-point gap look like a chasm.

Line graphs — Show change over time. Watch out for: cherry-picked time ranges. A graph showing only the last six months can hide a long-term trend going the other direction.

Scatter plots — Each dot is one data point with two values. Used to show relationships between two variables. Watch out for: outliers driving the apparent trend, and cluttered plots that hide the actual pattern.

Logarithmic plots — Each major tick on a base-10 log axis represents a 10x change, not a constant amount. Used when values span huge ranges (gene expression, pandemic case counts, earthquake energies). Watch out for: confusing a log-scale chart with a linear one. They can look similar but tell completely different stories.
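To see why a log axis changes the story, here is a minimal Python sketch. The case counts are invented for illustration (a hypothetical outbreak that doubles every week) and do not come from real data.

```python
import math

# Hypothetical weekly case counts that double every week (illustrative only).
cases = [100 * 2 ** week for week in range(6)]  # 100, 200, 400, ...

# Week-to-week steps as a linear axis would draw them: each step is bigger than the last.
linear_steps = [b - a for a, b in zip(cases, cases[1:])]

# The same steps on a base-10 log axis: every step is the same size.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cases, cases[1:])]

print(linear_steps)
print([round(step, 3) for step in log_steps])
```

On a linear axis this data looks like a curve bending sharply upward. On a log axis it is a straight line, because every doubling adds the same amount (log10 of 2, about 0.301). Same data, very different picture.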

The first thing to do with any graph is read the axes carefully. What's being measured? In what units? Starting from where? Many "shocking" visualizations fall apart the moment you check the axis.
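The truncated-axis trick is just arithmetic, and a short Python sketch makes it concrete. The two percentages and the helper name `drawn_height_ratio` are hypothetical, chosen only to match the 2-point gap described above.

```python
def drawn_height_ratio(a, b, baseline=0.0):
    """Ratio of the drawn bar heights for values a and b when the y-axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

group_a, group_b = 94.0, 96.0  # hypothetical percentages, 2 points apart

honest = drawn_height_ratio(group_a, group_b, baseline=0.0)
truncated = drawn_height_ratio(group_a, group_b, baseline=90.0)

print(f"axis from 0:  bar B is {honest:.2f}x as tall as bar A")
print(f"axis from 90: bar B is {truncated:.2f}x as tall as bar A")
```

With the axis starting at 0, bar B is about 2% taller than bar A. With the axis starting at 90, the same bar is drawn 50% taller. Nothing about the data changed, only where the axis begins.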

---

Mean, Median, and the Things They Hide

When you have a set of numbers, you need a way to summarize them. Three common measures of center, plus one measure of spread:

Mean — The average. Add all values, divide by the count. Easy to compute, but heavily affected by outliers. If you and nine friends average your incomes, and Jeff Bezos walks into the room, your "average" income is now in the hundreds of millions.

Median — The middle value when the data is sorted. Half the data is above, half below. Much more robust to outliers. Bezos walking in barely moves the median.

Mode — The most common value. Useful for categorical data ("what's the most common blood type in this population?").

Standard deviation — A measure of how spread out the data is. A small standard deviation means values cluster tightly around the mean. A large one means they're scattered widely. Two datasets can have identical means and wildly different standard deviations — and tell completely different stories.
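The billionaire-in-the-room example is easy to check with Python's standard `statistics` module. All incomes below are made up, and the 2-billion figure is an arbitrary stand-in for an extreme outlier, not anyone's real income.

```python
from statistics import mean, median

# Hypothetical annual incomes for ten friends, in dollars (made-up numbers).
friends = [42_000, 48_000, 51_000, 55_000, 58_000,
           61_000, 64_000, 70_000, 75_000, 90_000]

# One extreme value joins the group (an arbitrary stand-in for a billionaire).
with_outlier = friends + [2_000_000_000]

print("before:", mean(friends), median(friends))
print("after: ", mean(with_outlier), median(with_outlier))
```

The mean jumps from 61,400 to well over 100 million, a number that describes nobody in the room. The median shifts by only 1,500, from 59,500 to 61,000.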

The honest summary of a dataset usually requires both a measure of center (mean or median) and a measure of spread (standard deviation, range, or interquartile range). If a study reports the mean without the spread, ask why.
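Here is a small sketch of why the spread matters, using two invented sets of exam scores. It uses `pstdev`, the population standard deviation, which treats the five scores as the whole group; `stdev` (the sample version) would give slightly larger values.

```python
from statistics import mean, pstdev

# Two made-up sets of exam scores with the same mean.
tight = [68, 69, 70, 71, 72]
wide = [40, 55, 70, 85, 100]

for name, scores in [("tight", tight), ("wide", wide)]:
    print(name,
          "mean:", mean(scores),
          "std dev:", round(pstdev(scores), 2),
          "range:", max(scores) - min(scores))
```

Both groups average 70, but one class is full of students who scored almost the same and the other runs from failing to perfect. Reporting only the mean would hide that entirely.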

---

The P-Value: What It Actually Means

You'll see "p < 0.05" cited in nearly every quantitative biology paper. Most people who cite it don't actually know what it means. Here's the real definition:

> A p-value is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true.

Let's unpack that. The null hypothesis is the boring default — "there's no real effect here." When you run an experiment, you're asking whether your data is interesting enough to reject that default.

A p-value of 0.05 means: if there were truly no effect, we'd see data at least this extreme by random chance only 5% of the time. A p-value below 0.05 is the conventional threshold for "statistically significant."
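The definition can be computed directly for a simple case. The sketch below assumes a hypothetical coin-flip experiment where the null hypothesis is "the coin is fair," and it counts only results at least as extreme in one direction (a one-sided test). The function name `one_sided_p_value` is illustrative.

```python
from math import comb

def one_sided_p_value(heads, flips):
    """P(at least `heads` heads in `flips` tosses), assuming the coin is fair."""
    extreme_outcomes = sum(comb(flips, k) for k in range(heads, flips + 1))
    return extreme_outcomes / 2 ** flips

# Hypothetical experiment: 60 heads in 100 flips.
p = one_sided_p_value(60, 100)
print(f"p = {p:.4f}")
```

The result is a bit under 0.03: a fair coin would produce 60 or more heads in fewer than 3 out of every 100 such experiments. Notice what the calculation never touches: the probability that the coin is actually biased. It only describes how surprising the data would be if the null hypothesis were true.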

What a p-value is not:

❌ The probability that your hypothesis is true
❌ The probability that the result was due to chance
❌ A measure of how big or important the effect is
❌ Proof of anything

A study with p = 0.04 and a tiny effect size on 10,000 subjects might be statistically significant but practically meaningless. A study with p = 0.10 and a huge effect on 50 subjects might be telling you something important that simply needs more data.
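A rough sketch of how sample size and effect size trade off. It makes a strong simplifying assumption: a z-test comparing two equal-sized groups whose standard deviation is known and equal to 1, so effects are measured in standard-deviation units. Real studies would usually use a t-test or something more specialized, and all the numbers here are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p(effect_in_sd, n_per_group):
    """Two-sided p-value for a difference between two group means.

    Simplifying assumption: a z-test with equal group sizes and a known
    standard deviation of 1, so the effect is expressed in SD units.
    """
    z = effect_in_sd / sqrt(2 / n_per_group)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Tiny effect, 10,000 subjects in total (hypothetical numbers).
p_big_study = two_sided_p(effect_in_sd=0.04, n_per_group=5_000)

# Effect more than ten times larger, 50 subjects in total (hypothetical numbers).
p_small_study = two_sided_p(effect_in_sd=0.47, n_per_group=25)

print(f"tiny effect, n=10,000: p = {p_big_study:.3f}")
print(f"big effect,  n=50:     p = {p_small_study:.3f}")
```

Under these assumptions the tiny effect clears the 0.05 bar and the much larger one does not. The p-value is mixing together two things, how big the effect is and how much data you have, which is why it can't stand in for effect size.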

P-hacking is the practice of running many statistical tests, finding one that comes out below 0.05, and reporting only that one. If you run 20 independent tests on random noise, you should expect about one to look "significant" at p < 0.05 by pure chance, and there is roughly a 64% chance that at least one does. This is a major source of bad science, and it's why pre-registration of hypotheses (declaring what you'll test before you look at the data) is becoming standard.
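The arithmetic behind the 20-tests example can be sketched in a few lines of Python. The sketch assumes the tests are independent and that every null hypothesis is true; under those conditions (and a continuous test statistic) each p-value behaves like a uniform random number between 0 and 1. The function names are illustrative.

```python
import random

def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent null tests has p < alpha."""
    return 1 - (1 - alpha) ** n_tests

def simulate(n_tests=20, alpha=0.05, trials=20_000, seed=0):
    """Estimate the same probability by simulation.

    Under a true null hypothesis a p-value is uniformly distributed on [0, 1],
    so each test is modeled as one uniform random draw.
    """
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(n_tests))
        for _ in range(trials)
    )
    return hits / trials

exact = chance_of_false_positive(20)
estimate = simulate()
print(f"exact: {exact:.3f}, simulated: {estimate:.3f}")
```

The exact answer is 1 − 0.95^20, about 0.64. A researcher who quietly runs 20 tests on pure noise has better-than-even odds of finding something "publishable."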

---

Correlation vs. Causation

This is the single most important distinction in all of data analysis. Get this one wrong and you'll be misled for the rest of your life.

Correlation means two variables move together. When one goes up, the other tends to go up (positive correlation) or down (negative correlation).

Causation means one variable actually causes the change in the other.

These are not the same thing.

Classic examples of correlation without causation:

Ice cream sales and drowning deaths are correlated. (Both rise in summer. Ice cream doesn't cause drowning.)
Countries with more Nobel laureates eat more chocolate. (Both are correlated with wealth and education.)
The number of films Nicolas Cage appeared in correlates strongly with US drownings in swimming pools from 1999 to 2009. (Pure coincidence. The internet calls these "spurious correlations.")
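The ice cream example can be simulated. In the sketch below, every number is invented: daily temperature drives both ice cream sales and drownings, and by construction neither one affects the other. The helper `pearson_r` computes the standard Pearson correlation coefficient by hand so the example needs nothing beyond the standard library.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

rng = random.Random(1)

# Made-up model: temperature drives BOTH quantities; neither affects the other.
temps = [rng.uniform(0, 30) for _ in range(5_000)]
ice_cream = [10 * t + rng.gauss(0, 30) for t in temps]
drownings = [0.2 * t + rng.gauss(0, 1.5) for t in temps]

r_all = pearson_r(ice_cream, drownings)

# Hold the confounder roughly constant: keep only days between 20 and 22 degrees.
band = [(i, d) for t, i, d in zip(temps, ice_cream, drownings) if 20 <= t <= 22]
r_band = pearson_r([i for i, _ in band], [d for _, d in band])

print(f"all days: r = {r_all:.2f}")
print(f"similar-temperature days only: r = {r_band:.2f}")
```

Across all days the two series are strongly correlated. Compare only days with similar temperatures and the correlation mostly vanishes, which is the signature of a hidden third variable doing the work.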

To establish causation, you generally need one of:

A randomized controlled trial — randomly assign subjects to groups, manipulate the variable, see what happens
A natural experiment — a real-world situation that randomly varies the variable of interest
Multiple lines of converging evidence — many studies of different types pointing the same direction
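Why does random assignment help so much? A minimal sketch, using an invented "baseline health score" as the hidden trait. When people choose their own group, the groups can differ before the experiment even starts. When a shuffle chooses, hidden traits tend to balance out. The cutoff of 55 and the function name `randomize` are hypothetical.

```python
import random
from statistics import mean

def randomize(subjects, rng):
    """Randomly split subjects into two equal-sized groups."""
    shuffled = subjects[:]
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)

# Hypothetical hidden trait: a baseline health score for 2,000 people.
baseline = [rng.gauss(50, 10) for _ in range(2_000)]

# Self-selection: suppose only the healthier people opt in to the treatment.
opted_in = [b for b in baseline if b > 55]
opted_out = [b for b in baseline if b <= 55]
selection_gap = mean(opted_in) - mean(opted_out)

# Random assignment: a shuffle decides who gets the treatment.
treatment, control = randomize(baseline, rng)
balance_gap = abs(mean(treatment) - mean(control))

print(f"self-selected groups differ by {selection_gap:.1f} points at baseline")
print(f"randomized groups differ by {balance_gap:.1f} points at baseline")
```

In the self-selected comparison, the treatment group would look healthier even if the treatment did nothing at all. After randomization the groups start out nearly identical, so a difference that appears later is much easier to attribute to the treatment itself.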

A single correlational study, no matter how strong the correlation, cannot prove causation. It can only suggest the question is worth investigating further.

When you see a headline claiming X "causes" Y based on a single observational study, your first question should be: did they actually establish causation, or just correlation?

---

Wait, Actually...

The "p < 0.05" threshold is arbitrary.

It was popularized in the 1920s by statistician Ronald Fisher, who suggested it as a rough rule of thumb, not a law of nature. He explicitly wrote that the right threshold depends on the context and the cost of being wrong. Decades later, much of the scientific community had calcified around 0.05 as if it were sacred.

The result is a system where a study with p = 0.049 can get published while one with p = 0.051 gets rejected, even though the underlying evidence is nearly identical. This has driven enormous distortions in science: researchers chase the threshold, reviewers use it as a substitute for thinking, and many journals are reluctant to publish "null results" where p > 0.05.
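One way to see how close p = 0.049 and p = 0.051 really are is to convert each back into the test statistic behind it. This sketch assumes a two-sided test with a normally distributed test statistic (a z-score); the function name `z_for_two_sided_p` is illustrative.

```python
from statistics import NormalDist

def z_for_two_sided_p(p):
    """z-score whose two-sided p-value is p, assuming a normal test statistic."""
    return NormalDist().inv_cdf(1 - p / 2)

z_published = z_for_two_sided_p(0.049)
z_rejected = z_for_two_sided_p(0.051)

print(f"p = 0.049 -> z = {z_published:.3f}")
print(f"p = 0.051 -> z = {z_rejected:.3f}")
```

Both z-scores land just either side of 1.96 and differ by less than 0.02 standard errors. The line between "significant" and "not significant" falls between two results that are, for practical purposes, the same.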

Some scientists have proposed dropping 0.05 entirely, or replacing it with a stricter 0.005, or abandoning p-values in favor of effect sizes and confidence intervals. The conversation is ongoing. You'll graduate into a scientific community that's actively rewriting these norms.

---

Check Your Understanding

What is the most important reason the median can be more informative than the mean?

The median is easier to calculate
The median is less affected by outliers
The median is more accurate
The median works only on integers

A p-value of 0.03 means:

There is a 3% chance the hypothesis is true
The effect size is 3%
If there were no real effect, we'd see data this extreme by chance about 3% of the time
The experiment is 97% accurate

Which of the following is the strongest evidence for causation?

Two variables show a strong correlation
Many people believe the relationship is causal
A randomized controlled trial shows the relationship
The relationship has been observed for many years

What is p-hacking?

Stealing statistical software
Running many tests and selectively reporting the ones that hit p < 0.05
Computing a p-value with too few subjects
Using machine learning to find significant results

---

Try This

Pick any health or science claim you've encountered recently — a supplement that supposedly helps memory, a food said to cause cancer, a habit linked to longer life.

Find one study cited for that claim. Then answer:

Is the study correlational or experimental?
What was the sample size?
What was the effect size? (Not the p-value — the actual magnitude of the change.)
Has it been replicated?

This four-question filter screens out a large share of the nonsense health claims you'll encounter in your lifetime. Apply it to one claim per week and skepticism will start to become a reflex within months.

---

Where this takes you

🧬 Genomics Track — G7 and G8 use statistical genetics extensively
🌊 Marine Biology Track — Population dynamics, climate data, and conservation modeling are statistics-heavy
🔬 Biotech Track — Clinical trial design is the highest-stakes application of these concepts
🏛️ Biotech Policy Track — Public health policy lives and dies on data interpretation

Up next: [F6 — How Research Happens →]