Now that you've learned some of the basics of statistics, you need
to know what kinds of errors to watch for. Entire books have been
written on this subject, so this list is not even close to complete.
But if you learn to spot the errors listed here, you'll have more expertise
than most of the journalists and "experts" who report on studies.
Correlation Does Not Equal Causation
The most important lesson in statistics is widely ignored, especially
by the mass media: Correlation between two things does not prove
that one causes the other. For instance:
Virtually all heroin addicts drank milk regularly.
Therefore, drinking milk leads to heroin addiction.
That is, of course, complete nonsense, because most milk drinkers do
not become heroin addicts. But replace "drank milk" with "smoked pot"
and you have a statement that is just as ridiculous (for the same reason),
but widely believed by many anti-marijuana crusaders.
Early epidemiological studies of breast cancer indicated that multiple
pregnancies had a protective effect. Women with many children were less
likely to get breast cancer than women with only one or two kids, and
women with no children had the highest risk. Additional studies confirmed
this. Then someone noted that women with larger families usually start
having kids at a younger age than women with smaller families, and researched
that connection. They learned that the protective effect was tied not
to family size but to the woman's age at the time of her first pregnancy.
Pregnancy before age 20 reduces the risk of breast cancer, regardless
of the number of pregnancies that follow.
The early studies were carefully done, peer reviewed, widely accepted,
and wrong. The correlation did not prove causation. An overlooked confounder
caused the effect.
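To see how a confounder can manufacture a correlation, here is a minimal simulation; every number in it is invented purely for illustration. Risk depends only on age at first birth, family size merely correlates with that age, yet a naive comparison makes family size look protective:

```python
import random

random.seed(1)

# All numbers here are invented for illustration. Risk depends ONLY on
# age at first birth; family size correlates with it but has no effect.
women = []
for _ in range(100_000):
    early_first_birth = random.random() < 0.5
    # Early starters tend to have larger families (the confounder link).
    kids = random.choice([3, 4, 5]) if early_first_birth else random.choice([1, 2])
    risk = 0.02 if early_first_birth else 0.05  # only age matters
    women.append((kids, early_first_birth, random.random() < risk))

def cancer_rate(group):
    return sum(sick for _, _, sick in group) / len(group)

large_families = [w for w in women if w[0] >= 3]
small_families = [w for w in women if w[0] <= 2]
print(f"rate with 3+ kids:  {cancer_rate(large_families):.3f}")
print(f"rate with 1-2 kids: {cancer_rate(small_families):.3f}")
# Family size looks protective -- but stratify by age at first birth
# and the "effect" disappears, because age was driving both numbers.
```

The comparison by family size shows a real, reproducible difference in rates, and it is still completely wrong about the cause.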
If you can only remember one fact from these pages on epidemiology,
remember this one, as it is the most important:
Fact: Correlation Does Not Prove Causation.
Because of this, epidemiology never proves (or disproves) anything.
At best it serves as a guide for the people in the labs who do the real
research, helping them decide which approach is most likely to generate
results. But it does not, and cannot, prove a cause-and-effect
relationship. It cannot prove anything.
Roll a standard, six-sided die. Let's say it comes up three. You may
now publish a study declaring that a die, when rolled, will display
a three one hundred percent of the time. Your study will be precise,
accurate, truthful, and useless, because your sample size is far too small.
Will rolling it six times result in an accurate study? Not unless you
roll each number exactly once, which is highly unlikely. How about ten
rolls, or a hundred, or a thousand? The more rolls you record, the closer
your results will come to the true odds.
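A quick simulation makes the point (a sketch using only Python's standard library): roll a fair die a handful of times and the observed frequencies are wildly off; roll it many thousands of times and they settle near the true 1-in-6 odds.

```python
import random
from collections import Counter

random.seed(0)

def roll_study(n_rolls):
    """Roll a fair die n_rolls times; return each face's observed frequency."""
    counts = Counter(random.randint(1, 6) for _ in range(n_rolls))
    return {face: counts[face] / n_rolls for face in range(1, 7)}

for n in (1, 10, 100, 100_000):
    freqs = roll_study(n)
    worst = max(abs(f - 1 / 6) for f in freqs.values())
    print(f"{n:>7} rolls: largest deviation from 1/6 = {worst:.3f}")
# One roll "proves" a face comes up 100% of the time; it takes
# thousands of rolls before every frequency sits close to 1/6.
```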
And now, to make it even more complex, let's add a single, simple confounder.
You've learned the die is slightly weighted to make one number come
up 5% more often than it would on a fair die, but you don't know which
number is weighted. How many tests will it take to discover which number
the die favors?
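One way to get a feel for the answer is to simulate it (the 5% bias, the loaded face, and the trial counts below are all just illustrative): load one face slightly, then check how often the most frequent face in a sample of n rolls is actually the loaded one.

```python
import random
from collections import Counter

random.seed(42)

FACES = list(range(1, 7))
LOADED_FACE = 4                    # hidden from the "researcher"
LOADED_P = (1 / 6) * 1.05          # 5% more likely than on a fair die
OTHER_P = (1 - LOADED_P) / 5
WEIGHTS = [LOADED_P if f == LOADED_FACE else OTHER_P for f in FACES]

def guess_loaded_face(n_rolls):
    """Roll the weighted die n_rolls times, guess the most frequent face."""
    rolls = random.choices(FACES, WEIGHTS, k=n_rolls)
    return Counter(rolls).most_common(1)[0][0]

results = {}
for n in (100, 10_000, 200_000):
    hits = sum(guess_loaded_face(n) == LOADED_FACE for _ in range(20))
    results[n] = hits
    print(f"{n:>7} rolls per test: loaded face found in {hits}/20 tests")
# A 5% bias is invisible in a hundred rolls; it takes hundreds of
# thousands before the loaded face reliably tops the count.
```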
That's just looking at six very obvious, very concrete possibilities
and one simple confounder. Imagine how much more difficult it is to
come up with an accurate analysis when there are many more possible
outcomes and dozens of confounders.
Beware of small sample sizes. Anything involving fewer than several
hundred people should be considered highly suspect. Several thousand is better.
However, a large sample size is no guarantee of accuracy. A poorly
done study using huge samples can be as worthless as one with a tiny sample.
Retrospective studies ask respondents to remember things from years
ago, then treat those recollections as if they were valid data. Quick,
how many hamburgers did you eat during August of last year? If that
question pops up in the middle of a survey you're likely to guess, and
guess wrong. Your wrong guess will be pooled with many other wrong guesses
to create a study that is pure fiction. If you have an illness
that's related to eating beef, you're likely to "remember" higher consumption
than people without the illness. This is referred to as "recall
bias," and it can skew a study in either direction. Recall bias makes
most retrospective studies worthless.
Some people will intentionally lie on a survey. For instance, some
smokers will tell interviewers they don't smoke, especially if they
only smoke a few cigarettes a day or if the interviewer exhibits a disapproving
attitude. Have you ever filled out one of those survey forms that asks
for a salary range, and checked off the box just above the real
amount you take home? Me too. This is referred to as "misclassification bias."
When studying something as rare as lung cancer, even a few errors can
skew the study significantly. There are mathematical methods to estimate
(guess) the number of these errors, but the only way to accurately account
for them is to verify the data.
Unless the data is verified, information gathered from surveys should
be considered highly suspect.
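Here's a toy example of how that kind of misclassification can distort a result; every number below is invented. The true disease risk is identical in both groups, but if some healthy smokers deny smoking while the sick smokers answer truthfully, the survey shows an elevated risk that doesn't exist:

```python
# Every number below is invented, purely for illustration.
smokers, nonsmokers = 1000, 9000
sick_smokers, sick_nonsmokers = 10, 90   # identical 1% risk in each group

# Suppose 20% of *healthy* smokers deny smoking on the survey, while
# the sick smokers (already diagnosed) all answer truthfully.
deniers = int(0.20 * (smokers - sick_smokers))   # 198 healthy deniers

obs_smokers = smokers - deniers                  # 802 admitted smokers
obs_nonsmokers = nonsmokers + deniers            # 9198 apparent nonsmokers
# The sick counts don't move: all 10 sick smokers still say they smoke.
true_rr = (sick_smokers / smokers) / (sick_nonsmokers / nonsmokers)
observed_rr = (sick_smokers / obs_smokers) / (sick_nonsmokers / obs_nonsmokers)
print(f"true relative risk:     {true_rr:.2f}")      # 1.00 -- no effect
print(f"observed relative risk: {observed_rr:.2f}")  # ~1.27 -- a phantom risk
```

A couple of hundred misplaced answers out of ten thousand, and a nonexistent hazard appears in the data.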
It is possible to change the outcome of a survey by changing the wording
of the questions, or even the order the questions are asked. For this
reason, demand to see the actual questions used when someone uses a
survey to prove anything.
Dose / Response Relationship
This one should be obvious, but you'll catch a lot of bogus studies ignoring it.
Fact: The risk should increase
with increased exposure.
If it doesn't, it's unlikely that the cause and the effect are related.
This is covered in detail on the agendas page.
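The check itself is easy to sketch (the exposure bands and rates below are hypothetical): if the cause is real, the risk should climb with each step up in exposure.

```python
# Hypothetical exposure bands and disease rates, for illustration only.
bands = ["none", "light", "moderate", "heavy"]
risk = [0.010, 0.015, 0.024, 0.041]

# A real cause should show risk climbing with each step up in exposure.
monotonic = all(lower < higher for lower, higher in zip(risk, risk[1:]))
print(f"risk rises with every exposure band: {monotonic}")  # True
```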
Does the connection make sense? The idea of smokers being more susceptible
to diseases of the mouth, throat and lungs sounds plausible, but how
would smoking cause colon cancer? What biological mechanism could make that happen?
Pay close attention to studies that compare the effects on "passive
smokers" to the effects of primary smokers. Quite often you'll find
that illnesses are just as common (or almost as common) in the non-smokers
as in the smokers. Considering that even in the smokiest environment
non-smokers only inhale a fraction of a cigarette a day, such claims
are not biologically plausible. Or to put it in everyday language, they
just don't make any sense.
Studies that show significant increases in risks are more likely to
be published than studies that don't. The Journal of the American Medical
Association did a study on this, specifically on secondhand smoke (SHS) studies. They determined
that "A greater proportion of studies with statistically significant
results are published than those with nonsignificant results,"
and "There is a publication delay for passive smoking studies with
nonsignificant results compared with those with significant results."
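Publication bias is easy to demonstrate with a simulation (a sketch with made-up study parameters): run many studies of an exposure that has no real effect, "publish" only the ones that happen to show a large relative risk, and the published literature averages a strong effect that isn't there.

```python
import random
import statistics

random.seed(7)

# Simulate 500 small studies of an exposure with NO real effect:
# 200 exposed and 200 unexposed subjects, 10% baseline risk in both.
def run_study(n=200, p=0.10):
    exposed_cases = sum(random.random() < p for _ in range(n))
    control_cases = sum(random.random() < p for _ in range(n))
    return exposed_cases / max(control_cases, 1)  # crude relative risk

all_rrs = [run_study() for _ in range(500)]
# "Publish" only the studies that happened to find a big relative risk.
published = [rr for rr in all_rrs if rr >= 1.5]

print(f"studies run: {len(all_rrs)}, 'published': {len(published)}")
print(f"mean RR, all studies: {statistics.mean(all_rrs):.2f}")
print(f"mean RR, published:   {statistics.mean(published):.2f}")
# The exposure does nothing, but the published literature says otherwise.
```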
There are many other biases and errors that can affect the accuracy
of studies, but now you're familiar with some of the most common ones.
Whenever you suspect the results of a study (and you should always suspect
the results of any study) ask the following questions:
What kind of study was it? If it was a meta-analysis, check out
the agenda of the researcher or the funding organization. You probably
won't need to ask any more questions.
What is the RR (relative risk)? (If it's less than 2.0, you don't need to ask
any more questions.)
How was the data gathered? How was it verified? If it's a retrospective
study, and the data wasn't validated independently of the survey,
you can discard it.
If a survey was used, what were the actual questions on the survey?
How big was the sample size? How was the sample selected?
Is there a dose/response relationship?
What confounders were considered and adjusted for? (If you get
this far without discrediting the study, this is one of the most
important questions.)
Who conducted this study? What is their personal history conducting
other studies? What personal interest might they have in the outcome?
Who funded the study? What is their history regarding other studies,
and what is their interest in the outcome?
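For reference when applying the RR question above, relative risk is just a ratio of two rates. A minimal sketch, using hypothetical counts:

```python
# Relative risk (RR) from a 2x2 exposure/outcome table.
# The counts are hypothetical, for illustration only.
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

# 30 cases among 1,000 exposed vs 10 cases among 1,000 unexposed:
rr = relative_risk(30, 1000, 10, 1000)
print(f"RR = {rr:.1f}")  # RR = 3.0 -- triple the risk, well past the 2.0 bar
```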
Armed with this knowledge, you are now qualified to determine the merit
of just about any epidemiological study. You will need to find the original
study, of course, not just the news story on it. Fortunately, many of
them can be found on the Internet with a bit of digging.
Most studies start with an abstract, a condensed version of the study.
Sometimes the abstract is all you need to spot a bogus study.
If the abstract passes the test, read the actual study. At first glance
they may seem nearly impossible to decipher, but it gets easier after
you've stumbled through a few. You may want to bookmark this page as a reference.