"Smoking is one of the leading causes of statistics." - Fletcher Knebel

Statistics 102

Now that you've learned some of the basics of statistics, you need to know what kinds of errors to watch for. Entire books have been written on this subject, so this list is not even close to complete. But if you learn to spot the errors listed here, you'll have more expertise than most of the journalists and "experts" who report on health issues.

Correlation Does Not Equal Causation

The most important lesson in statistics is widely ignored, especially by the mass media: Correlation between two things does not prove that one causes the other. For instance:

Virtually all heroin addicts drank milk regularly as children.
Therefore, drinking milk leads to heroin addiction.

That is, of course, complete nonsense, because most milk drinkers do not become heroin addicts. But replace "drank milk" with "smoked pot" and you have a statement that is just as ridiculous (for the same reason) but widely believed by many anti-marijuana crusaders.

Early epidemiological studies of breast cancer indicated that multiple pregnancies had a protective effect. Women with many children were less likely to get breast cancer than women with only one or two kids, and women with no children had the highest risk. Additional studies confirmed this. Then someone noted that women with larger families usually start having kids at a younger age than women with smaller families, and researched that connection. They learned that the preventive effect was tied not to family size but to the woman's age at the time of her first pregnancy. Pregnancy before age 20 reduces the risk of breast cancer, regardless of the number of pregnancies that follow.

The early studies were carefully done, peer reviewed, widely accepted, and wrong. The correlation did not prove causation. An overlooked confounder caused the effect.

If you can only remember one fact from these pages on epidemiology, remember this one, as it is the most important:

Fact: Correlation Does Not Prove Causation

Because of this, epidemiology never proves (or disproves) anything. At best it serves as a guide for the people in the labs who do the real research, helping them decide which approach is most likely to generate results. But it does not, and cannot, prove a cause and effect relationship. It cannot prove anything.
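The family-size example above can be put in concrete terms with a short simulation. All the numbers here are made up for illustration: the outcome depends only on a hidden confounder (age at first pregnancy), yet family size still appears "protective" when you tabulate the results, exactly the trap the early studies fell into.

```python
import random

random.seed(0)

# Hypothetical rates, chosen only to illustrate confounding.
# Disease risk depends ONLY on the confounder (early first pregnancy),
# never on family size, yet family size will correlate with the outcome.
n = 100_000
large_family_cases = large_family_total = 0
small_family_cases = small_family_total = 0

for _ in range(n):
    early_first_pregnancy = random.random() < 0.5
    # The confounder drives family size: early starters tend to have more kids.
    large_family = random.random() < (0.7 if early_first_pregnancy else 0.3)
    # The confounder also drives risk; family size plays no causal role.
    sick = random.random() < (0.02 if early_first_pregnancy else 0.05)
    if large_family:
        large_family_total += 1
        large_family_cases += sick
    else:
        small_family_total += 1
        small_family_cases += sick

large_rate = large_family_cases / large_family_total
small_rate = small_family_cases / small_family_total
print(f"large families: {large_rate:.2%}, small families: {small_rate:.2%}")
# Large families show a lower disease rate even though family size
# has zero causal effect in this model -- correlation without causation.
```

A study that tabulated only family size against disease would "confirm" a protective effect that does not exist; adjusting for age at first pregnancy makes it vanish.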

Sample Size

Roll a standard, six-sided die. Let's say it comes up three. You may now publish a study stating that a die, when rolled, will display a three one hundred percent of the time. Your study will be precise, accurate, truthful, and useless, because your sample size is far too small.

Will rolling it six times result in an accurate study? Not unless you roll each number once, which is highly unlikely. How about ten rolls, or twenty?

And now, to make it even more complex, let's add a single, simple confounder. You've learned the die is slightly weighted to make one number come up 5% more often than it would on a fair die, but you don't know which number is weighted. How many rolls will it take to discover which number the die favors?
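To get a feel for the answer, here's a quick simulation sketch. The choice of which face is weighted is an assumption for the demo (in a real experiment you wouldn't know it), and "5% more often" is taken to mean the loaded face has probability (1/6) × 1.05 = 0.175:

```python
import random
from collections import Counter

random.seed(1)

WEIGHTED_FACE = 4  # hypothetical; unknown in a real test
FACES = [1, 2, 3, 4, 5, 6]
# Loaded face: (1/6) * 1.05 = 0.175; the other five faces split the rest.
WEIGHTS = [0.175 if f == WEIGHTED_FACE else (1 - 0.175) / 5 for f in FACES]

def guess_favored_face(rolls: int) -> int:
    """Roll the loaded die `rolls` times and return the most frequent face."""
    counts = Counter(random.choices(FACES, weights=WEIGHTS, k=rolls))
    return counts.most_common(1)[0][0]

def detection_rate(rolls: int, trials: int = 100) -> float:
    """Fraction of repeated experiments that correctly identify the loaded face."""
    hits = sum(guess_favored_face(rolls) == WEIGHTED_FACE for _ in range(trials))
    return hits / trials

for n in (100, 1_000, 10_000, 50_000):
    print(f"{n:>6} rolls: favored face found {detection_rate(n):.0%} of the time")
```

With a few hundred rolls the guess is little better than chance; reliably spotting a 5% bias takes tens of thousands of rolls. And that's the simplest possible confounder on the simplest possible experiment.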

That's just looking at six very obvious, very concrete possibilities and one simple confounder. Imagine how much more difficult it is to come up with an accurate analysis when there are many more possible outcomes and dozens of confounders.

Beware of small sample sizes. Anything involving fewer than several hundred people should be considered highly suspect. Several thousand is desirable.

However, a large sample size is not a guarantee of accuracy. A poorly done study using huge samples can be as worthless as a study with a tiny sample.
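One way to see why a few hundred people is the bare minimum: the textbook 95% margin of error for a surveyed proportion shrinks only with the square root of the sample size. A quick sketch, using the standard worst-case formula (proportion near 50%):

```python
import math

def margin_of_error(n: int, p: float = 0.5) -> float:
    """Approximate 95% margin of error for a proportion p measured on n people.

    Standard normal-approximation formula: 1.96 * sqrt(p * (1 - p) / n).
    p = 0.5 is the worst case, giving the widest margin.
    """
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (50, 100, 500, 1_000, 5_000):
    print(f"n = {n:>5}: about \u00b1{margin_of_error(n):.1%}")
```

Fifty people leaves you with roughly ±14 points of pure sampling noise; even a thousand still leaves about ±3. And note that this formula only covers random sampling error: a biased sample stays biased no matter how large it gets, which is the point of the paragraph above.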

Survey Errors

Retrospective studies ask respondents to remember things from years ago, then treat those recollections as if they were valid data. Quick, how many hamburgers did you eat during August of last year? If that question pops up in the middle of a survey you're likely to guess, and guess wrong. Your wrong guess will be pooled with many other wrong guesses to create a study that is pure fiction. If you have an illness that's related to eating beef, you're likely to "remember" higher consumption than people without the illness. This is referred to as "recall bias," and it can skew a study in either direction. Recall bias makes most retrospective studies worthless.

Some people will intentionally lie on a survey. For instance, some smokers will tell interviewers they don't smoke, especially if they only smoke a few cigarettes a day or if the interviewer exhibits a disapproving attitude. Have you ever filled out one of those survey forms that asks for a salary range, and checked off the box just above the real amount you take home? Me too. This is referred to as "misclassification bias."

When studying something as rare as lung cancer, even a few errors can skew the study significantly. There are mathematical methods to estimate (guess) the number of these errors, but the only way to accurately account for them is to verify the data.

Unless the data is verified, information gathered from surveys should be considered highly suspect.

It is possible to change the outcome of a survey by changing the wording of the questions, or even the order in which the questions are asked. For this reason, demand to see the actual questions whenever someone uses a survey to prove anything.

Dose / Response Relationship

This one should be obvious, but you'll catch a lot of bogus studies with it.

Fact: The risk should increase with increased exposure.

If it doesn't, it's unlikely that the cause and the effect are related.

Researcher Bias

This is covered in detail on the agendas page.

Biological Plausibility

Does the connection make sense? The idea of smokers being more susceptible to diseases of the mouth, throat and lungs sounds plausible, but how would smoking cause colon cancer? What biological mechanism could make this happen?

Pay close attention to studies that compare the effects on "passive smokers" to the effects on primary smokers. Quite often you'll find that illnesses are just as common (or almost as common) in the non-smokers as in the smokers. Considering that even in the smokiest environment non-smokers inhale only the equivalent of a fraction of a cigarette a day, such claims are not biologically plausible. Or to put it in everyday language, they just don't make any sense.

Publication Bias

Studies that show significant increases in risk are more likely to be published than studies that don't. The Journal of the American Medical Association did a study on this, specifically on SHS studies. They determined that "A greater proportion of studies with statistically significant results are published than those with nonsignificant results," and "There is a publication delay for passive smoking studies with nonsignificant results compared with those with significant results."


There are many other biases and errors that can affect the accuracy of studies, but now you're familiar with some of the most common ones. Whenever you suspect the results of a study (and you should always suspect the results of any study) ask the following questions:

What kind of study was it? If it was a meta-analysis, check out the agenda of the researcher or the funding organization. You probably won't need to ask any more questions.

What is the RR? (If it's less than 2.0, you don't need to ask any more questions.)

How was the data gathered? How was it verified? If it's a retrospective study, and the data wasn't validated independently of the survey, you can discard it.

If a survey was used, what were the actual questions on the survey?

How big was the sample size? How was the sample selected?

Is there a dose/response relationship?

What confounders were considered and adjusted for? (If you get this far without discrediting the study, this is one of the most important questions.)

Who conducted this study? What is their personal history conducting other studies? What personal interest might they have in the outcome?

Who funded the study? What is their history regarding other studies, and what is their interest in the outcome?
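The RR question above can be made concrete. Relative risk is simply the disease rate in the exposed group divided by the rate in the unexposed group; the 2.0 cutoff is the rule of thumb used above, and the numbers below are made up purely for illustration:

```python
# Hypothetical 2x2 table (invented numbers for illustration):
#                 disease    no disease    total
# exposed            30          970       1,000
# unexposed          20          980       1,000

def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Relative risk: disease rate among the exposed divided by the rate
    among the unexposed. RR = 1.0 means no difference between groups."""
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed

rr = relative_risk(30, 1_000, 20, 1_000)
print(f"RR = {rr:.2f}")  # 0.030 / 0.020 = 1.50
```

An RR of 1.50 sounds dramatic when a headline renders it as "a 50% increase in risk," but it falls below the 2.0 threshold, where recall bias, misclassification, and unadjusted confounders can easily account for the whole effect.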

Armed with this knowledge, you are now qualified to determine the merit of just about any epidemiological study. You will need to find the original study, of course, not just the news story on it. Fortunately, many of them can be found on the Internet with a bit of digging.

Most studies start with an abstract, a condensed version of the study. Sometimes the abstract is all you need to spot a bogus study.

If the abstract passes the test, read the actual study. At first glance they may seem nearly impossible to decipher, but it gets easier after you've stumbled through a few. You may want to bookmark this page as a reference.

More Information

A good explanation of statistics terms.

A few other questions to ask.


© 2000 - 2012 Dave Hitt

Permission is granted to use this information, in whole or in part, however you like.
Attribution and Links are appreciated but not required.


Like this? Find more at DaveHitt.Com