"There are three kinds of lies: lies, damned lies, and statistics"
               - Disraeli (Often mistakenly attributed to Mark Twain)

Statistics 101

If you think Statistics is a complex, difficult to understand subject, you're right, but this page will help remove a lot of the mystery. If you think Statistics can be twisted and manipulated to produce just about any desired result, you're right again. But once you know how the numbers are twisted, it is usually easy to spot the dishonesty.

Note: Epidemiology and Statistics are two different things, but the words are often used interchangeably. Epidemiology is the study of illnesses, and usually uses statistics, but there are other methods available.

Since almost all studies on health and medicine use epidemiology to reach their conclusions, understanding how it works is the only way to sort out the facts from the deceptions and frauds. Once you learn to pick apart these studies, you'll be able to approach the media with a very different attitude. When some talking head on TV tells you that some study proves that coffee is bad for you, and a week later another head tells you it's good for you, you'll know how to find out which one is reporting the facts. In many cases, you'll find both of them are wrong.

Types of Studies

Fact: Cohort Studies follow a group of people with different exposures to a substance over a period of time. Tracking people before any health effects occur reduces the impact of bias and increases the accuracy of the study, and allows testing for a variety of illnesses. It is the most expensive, time consuming and difficult type of study to conduct.

Cohort Studies are useful for common illnesses, but are too expensive and impractical for studying rare diseases.

Fact: Case Control studies examine two groups of people, those who already have an illness, and a control group. The control group may contain a random sampling of the population, or a sample specifically selected because they don't have the illness being studied.

Case Control studies are more likely to be biased because they start by selecting people who are already sick. For instance, if you wanted to find out if coffee caused stomach cancer, a case control study would start out with a sample of people who already had stomach cancer, leaving out the coffee drinkers who remained healthy. Case Control studies are much less expensive and time consuming, requiring much smaller sample sizes and eliminating the need to track people over long periods of time. They are often the only practical way to study uncommon illnesses.

Fact: Meta Studies (more accurately referred to as Meta Analysis) are analyses of existing studies. The researcher gathers data from other studies, picks the appropriate ones, pools the results and extracts his data.

It is extremely difficult to do this with any degree of accuracy, and extremely easy to twist the results to a predetermined outcome. Simply leaving out one or two studies can skew the data dramatically in one direction or the other. Be highly suspicious of any meta analysis. Carefully check for any researcher bias. If you automatically reject any meta study conducted or financed by someone with a strong agenda, you will almost always be right.
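To see how much the choice of studies matters, here is a minimal sketch. It assumes a simple fixed-effect, inverse-variance pooling of log relative risks, which is one common textbook method and not necessarily what any particular meta analysis uses. The five studies and the function name pooled_rr are invented for illustration.

```python
import math

def pooled_rr(studies):
    """Fixed-effect (inverse-variance) pooled RR.

    Each study is a tuple (rr, ci_low, ci_high) with a 95% CI.
    The standard error of log(RR) is recovered from the CI width.
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rr, low, high in studies:
        se = (math.log(high) - math.log(low)) / (2 * 1.96)
        weight = 1.0 / se ** 2          # more precise studies count more
        weighted_sum += weight * math.log(rr)
        total_weight += weight
    return math.exp(weighted_sum / total_weight)

# Hypothetical studies, made up for this example.
studies = [
    (0.80, 0.50, 1.28),
    (0.90, 0.60, 1.35),
    (1.10, 0.70, 1.73),
    (1.90, 1.20, 3.01),
    (2.10, 1.30, 3.39),
]

all_five = pooled_rr(studies)
without_two = pooled_rr(studies[:3])    # quietly drop the last two studies

print(f"All five studies: pooled RR {all_five:.2f}")
print(f"Two left out:     pooled RR {without_two:.2f}")
```

With these made-up numbers, leaving out two studies moves the pooled RR from above 1.0 to below 1.0, turning an apparent increase in risk into an apparent protective effect.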

There are other types of studies, but these are the most common.

Relative Risk

Fact: The goal of an epidemiological study is to determine Relative Risk (RR).

Relative risk is determined by first establishing a baseline, an accounting of how common a disease (or condition) is in the general population. This general rate is given a Relative Risk of 1.0, meaning no increase or decrease in risk. An increase in risk would result in a number larger than 1.0. A decrease in risk would result in a lower number, and indicates a protective effect.

For instance, if a researcher wants to find out how coffee drinking affects foot fungus, he first has to find out how common foot fungus is in the general population. In this fictional example, let's say he determines that 20 out of every 1,000 people have foot fungus. That's the baseline, an RR of 1.0. If he discovers that 30 out of 1,000 coffee drinkers have foot fungus, he's discovered a fifty percent increase, which would be expressed as an RR of 1.50.

If he were to find the rate was 40 out of 1,000, it would give him an RR of 2.0.

He might find foot fungus was less common among coffee drinkers. A rate of 15 out of 1,000 would be expressed as a RR of 0.75, indicating that drinking coffee has a protective effect against foot fungus.

The media usually reports RRs as percentages. An RR of 1.40 is usually reported as a 40% increase, while an RR of 0.90 is reported as a 10% decrease. (In theory, at least. In practice, RRs below 1.0 are seldom reported.)
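The arithmetic behind the foot fungus example can be sketched in a few lines. The function names relative_risk and as_percent_change are made up for this illustration; the numbers are the fictional ones used above.

```python
def relative_risk(exposed_cases, exposed_total, baseline_cases, baseline_total):
    """Rate among the exposed group divided by the baseline rate."""
    exposed_rate = exposed_cases / exposed_total
    baseline_rate = baseline_cases / baseline_total
    return exposed_rate / baseline_rate

def as_percent_change(rr):
    """Convert an RR to the percentage the media would report."""
    return (rr - 1.0) * 100.0

# Baseline: 20 out of every 1,000 people have foot fungus.
for coffee_drinker_cases in (30, 40, 15):
    rr = relative_risk(coffee_drinker_cases, 1000, 20, 1000)
    print(f"{coffee_drinker_cases} per 1,000 -> RR {rr:.2f} "
          f"({as_percent_change(rr):+.0f}%)")
```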

Note: Some studies calculate an Odds Ratio (OR) instead of an RR. The formulas for determining the two numbers are different, but when studying rare diseases the results are approximately the same. When studying more common diseases ORs tend to overstate the RR.
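A rough sketch of the difference, using invented counts: the same 1.5 RR is paired with a rare condition (30 vs. 20 cases per 1,000) and a common one (300 vs. 200 per 1,000). The function names are hypothetical.

```python
def risk_ratio(exposed_cases, exposed_total, control_cases, control_total):
    """RR: proportion who are sick, exposed vs. unexposed."""
    return (exposed_cases / exposed_total) / (control_cases / control_total)

def odds_ratio(exposed_cases, exposed_total, control_cases, control_total):
    """OR: odds of being sick (sick / not sick), exposed vs. unexposed."""
    exposed_odds = exposed_cases / (exposed_total - exposed_cases)
    control_odds = control_cases / (control_total - control_cases)
    return exposed_odds / control_odds

# Hypothetical counts per 1,000 people in each group.
for label, exposed, control in (("rare", 30, 20), ("common", 300, 200)):
    rr = risk_ratio(exposed, 1000, control, 1000)
    oratio = odds_ratio(exposed, 1000, control, 1000)
    print(f"{label:>6}: RR {rr:.2f}, OR {oratio:.2f}")
```

For the rare condition the two numbers nearly match. For the common one the OR comes out noticeably higher than the RR, even though the underlying rates have the same ratio.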

Fact: As a rule of thumb, an RR of at least 2.0 is necessary to indicate a cause and effect relationship, and an RR of 3.0 is preferred.

"As a general rule of thumb, we are looking for a relative risk of 3 or more before accepting a paper for publication." - Marcia Angell, editor of the New England Journal of Medicine

"My basic rule is if the relative risk isn't at least 3 or 4, forget it." - Robert Temple, director of drug evaluation at the Food and Drug Administration.

"Relative risks of less than 2 are considered small and are usually difficult to interpret. Such increases may be due to chance, statistical bias, or the effect of confounding factors that are sometimes not evident." - The National Cancer Institute

"An association is generally considered weak if the odds ratio [relative risk] is under 3.0 and particularly when it is under 2.0, as is the case in the relationship of ETS and lung cancer." - Dr. Kabat, IAQC epidemiologist

This requirement is ignored in almost all studies of ETS (environmental tobacco smoke).

While it's important to know the RR, it's also very important to find the actual numbers. When dealing with the mass media, beware of the phrase "times more likely."

For instance, a news story may announce "Banana eaters are four times more likely to get athlete's foot!" You find the study, read the abstract and find the RR is, indeed, 4.0. But further digging may reveal that the risk went from 1.5 in 10,000 to 6 in 10,000. Technically, the risk is four times greater, but would you worry about a jump from 0.015% to 0.06%?
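The banana example can be worked through to show the relative and absolute numbers side by side. This is only an illustration; the function name risk_summary is made up.

```python
def risk_summary(baseline_cases, exposed_cases, per):
    """Compare relative and absolute change for rates given as 'cases per N'."""
    baseline = baseline_cases / per
    exposed = exposed_cases / per
    return {
        "relative_risk": exposed / baseline,
        "baseline_percent": baseline * 100,
        "exposed_percent": exposed * 100,
        "extra_cases_per_100k": (exposed - baseline) * 100_000,
    }

summary = risk_summary(baseline_cases=1.5, exposed_cases=6, per=10_000)
print(f"Relative risk:  {summary['relative_risk']:.1f}")
print(f"Actual risk:    {summary['baseline_percent']:.3f}% -> "
      f"{summary['exposed_percent']:.3f}%")
print(f"Extra cases per 100,000 people: {summary['extra_cases_per_100k']:.0f}")
```

"Four times more likely" and "45 extra cases per 100,000 people" describe the same finding, but they leave very different impressions.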

Confidence Intervals

Fact: The Confidence Interval (CI) is used to determine the precision of the RR. It is expressed as a range of values that would be considered valid, for instance 0.90 to 1.43.

The narrower the CI, the more precise the result. The CI can be narrowed in many ways, including using more accurate data and a larger sample size.
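Here is a sketch of how sample size alone narrows the interval. It assumes the standard log-based (Katz) approximation for the CI of a risk ratio, which is one common textbook method; real studies may use others. The counts reuse the fictional foot fungus rates, scaled up, and the function name is hypothetical.

```python
import math

def rr_confidence_interval(exposed_cases, exposed_total,
                           control_cases, control_total, z=1.96):
    """RR with an approximate CI (z=1.96 gives a 95% interval)."""
    rr = (exposed_cases / exposed_total) / (control_cases / control_total)
    # Standard error of log(RR) under the log-based approximation.
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / control_cases - 1 / control_total)
    low = math.exp(math.log(rr) - z * se)
    high = math.exp(math.log(rr) + z * se)
    return rr, low, high

# Same rates (30 vs. 20 per 1,000), but ever larger groups.
for scale in (1, 10, 100):
    rr, low, high = rr_confidence_interval(30 * scale, 1000 * scale,
                                           20 * scale, 1000 * scale)
    print(f"{1000 * scale:>7,} per group: RR {rr:.2f}, "
          f"95% CI {low:.2f} to {high:.2f}")
```

The RR stays at 1.50 throughout. Only the range around it shrinks as the groups get larger.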

Fact: Confidence intervals are usually calculated to a 95% confidence level. This means that if the same study were repeated many times, about 95% of the intervals calculated this way would contain the true value, and about 5% would miss it.

This is one reason epidemiology is considered a crude science. (Imagine if your brakes failed 5% of the time.) The EPA, in their infamous 1993 study of secondhand smoke (SHS), used a 90% CI, doubling the accepted chance of error from 5% to 10%, to achieve their desired results.

The RR could be any number within the CI. For instance, an RR of 1.15 with a CI of 0.95 to 1.43 could just as well be a finding of 1.25, a 25% increase, or 0.96, a 4% decrease, or 1.0, no correlation at all. Pay close attention to any study where the CI includes 1.0. When the CI includes 1.0, the RR is not statistically significant. When the lower bound of the CI is near 1.0 the RR is barely statistically significant.
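The effect of dropping from a 95% to a 90% confidence level can be sketched with a small example. The counts below are invented, not taken from the EPA study or any other real one, and the interval uses the standard log-based approximation for a risk ratio, which is only one of several methods. The function names are hypothetical.

```python
import math
from statistics import NormalDist

def rr_interval(exposed_cases, exposed_total,
                control_cases, control_total, level=0.95):
    """RR with an approximate two-sided CI at the given confidence level."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 for 95%, 1.645 for 90%
    rr = (exposed_cases / exposed_total) / (control_cases / control_total)
    se = math.sqrt(1 / exposed_cases - 1 / exposed_total
                   + 1 / control_cases - 1 / control_total)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)

def includes_one(low, high):
    """True when the CI contains 1.0, i.e. not statistically significant."""
    return low <= 1.0 <= high

# Hypothetical study: 36 cases per 1,000 exposed vs. 22 per 1,000 unexposed.
for level in (0.95, 0.90):
    rr, low, high = rr_interval(36, 1000, 22, 1000, level)
    verdict = "not significant" if includes_one(low, high) else "significant"
    print(f"{level:.0%} CI: RR {rr:.2f}, {low:.2f} to {high:.2f} -> {verdict}")
```

With these numbers the 95% interval includes 1.0 and the 90% interval does not. The data are identical; only the confidence level changed.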

Confounders

On average, women live longer than men. Any study on longevity has to account for this fact. A factor like this is called a confounder, which is easy to remember because it can confound the results of a study. Some studies use the term "confounding variable." Any study of longevity (usually referred to as a study of mortality) which doesn't take this confounder into account will be very inaccurate. For instance, when studying the longevity of smokers, it's important to adjust for the gender difference, and adjust for the percentage of men and women in the study.
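A sketch of what "adjusting" means, using invented numbers. In this made-up study smoking has no effect at all within either gender, but the smokers are mostly men, and men die at a higher rate. The adjustment shown is the Mantel-Haenszel method, one common way to combine results across groups. All counts and function names are hypothetical.

```python
def crude_rr(strata):
    """RR computed with everyone lumped together, ignoring gender."""
    exposed_cases = sum(s["exposed_cases"] for s in strata)
    exposed_total = sum(s["exposed_total"] for s in strata)
    control_cases = sum(s["control_cases"] for s in strata)
    control_total = sum(s["control_total"] for s in strata)
    return (exposed_cases / exposed_total) / (control_cases / control_total)

def mantel_haenszel_rr(strata):
    """RR adjusted for the stratifying variable (here, gender)."""
    numerator = 0.0
    denominator = 0.0
    for s in strata:
        n = s["exposed_total"] + s["control_total"]
        numerator += s["exposed_cases"] * s["control_total"] / n
        denominator += s["control_cases"] * s["exposed_total"] / n
    return numerator / denominator

# Hypothetical deaths during the study: 20% of men, 10% of women,
# whether they smoke or not.
strata = [
    {"name": "men", "exposed_cases": 160, "exposed_total": 800,
     "control_cases": 40, "control_total": 200},
    {"name": "women", "exposed_cases": 20, "exposed_total": 200,
     "control_cases": 80, "control_total": 800},
]

print(f"Crude RR (gender ignored): {crude_rr(strata):.2f}")
print(f"Adjusted RR (by gender):   {mantel_haenszel_rr(strata):.2f}")
```

Ignoring gender produces an RR of 1.50 for smoking. Adjusting for it gives 1.00, which is the truth in this made-up data set.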

Sound complicated? It gets worse. Poor people die sooner than rich people. Black people die sooner than white people, even when adjusting for the income confounder. People in some countries live longer than people in others. So if an impoverished black male smoker in Uruguay dies before reaching the median age, is it because of his income, race, gender, smoking, or nationality?

Fact: When studying the effects of tobacco exposure, either to the smoker or to those around him, confounders include age, allergies, nationality, race, medications, compliance with medications, education, gas heating and cooking, gender, socioeconomic status, exposure to other chemicals, occupation, use of alcohol, use of marijuana, consumption of saturated fat and other dietary considerations, family history of cancer and domestic radon exposure, to name a few.

Fact: When studying the effects of SHS on children, confounders include most of the above, plus breast feeding, crowding, day care and school attendance, maternal age, maternal symptoms of depression, parental allergies, parental respiratory symptoms and prematurity.

A study that does not account for all of these factors is likely to be very inaccurate, and is probably worthless.


© 2000 - 2012 Dave Hitt

Permission is granted to use this information, in whole or in part, however you like.
Attribution and Links are appreciated but not required.
