16 Review
Political scientists can use quantitative research to learn about the political world: they can record data, analyze those data, and make inferences based on those data. The hypothesis that a statistical test assesses evidence against, typically a claim of no effect or no difference, is called the null hypothesis.
Sometimes political scientists analyze data from a randomized experiment, in which participants are randomly assigned to groups, so that the groups are expected to be similar to each other. The groups are then treated differently. If, after the difference in treatment, the groups differ in some outcome, then that difference in outcome is due to some combination of…
- the effect of the difference in treatment
- random assignment error, in which chance differences between the groups make it look like the difference in treatment had an effect (see the simulation sketch after this list)
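To make random assignment error concrete, here is a minimal simulation sketch in Python. The outcome values and group sizes are hypothetical, and the treatment is assumed to have no effect at all; even so, the two group means will usually differ somewhat by chance.

```python
import random
import statistics

random.seed(1)

# Hypothetical outcomes for 40 participants; the treatment has no effect,
# so every participant keeps the same outcome regardless of group.
outcomes = [random.gauss(50, 10) for _ in range(40)]

# Randomly assign half to "treatment" and half to "control".
random.shuffle(outcomes)
treatment, control = outcomes[:20], outcomes[20:]

# Even with zero treatment effect, the group means differ by chance:
# this gap is pure random assignment error.
gap = statistics.mean(treatment) - statistics.mean(control)
print(f"difference in means with no true effect: {gap:.2f}")
```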
But political scientists can use a p-value to assess the probability that, if the difference in treatment had no effect, the observed difference in outcome or a larger difference in outcome would have occurred due to random assignment error. These p-values range from…
- 1, for no evidence against the null hypothesis…
- down to values just above 0…
- with lower p-values indicating more evidence against the null hypothesis
Political scientists commonly infer that the difference in treatment in a randomized experiment caused any observed difference in outcomes if the p-value is 0.05 or lower for a single statistical test. A p-value this low means that a difference at least as large as the one observed would have been unlikely to arise from random assignment error alone, so we can be reasonably confident that the difference in treatment had an effect.
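One common way to compute such a p-value is a permutation test: reshuffle the group labels many times and count how often random assignment error alone produces a difference at least as large as the observed one. A minimal sketch, using hypothetical outcome data:

```python
import random
import statistics

random.seed(2)

# Hypothetical observed outcomes from a small randomized experiment.
treatment = [54.1, 57.3, 49.8, 60.2, 55.5, 58.9, 52.4, 61.0]
control = [48.7, 51.2, 47.9, 53.3, 50.1, 49.5, 52.8, 46.4]

observed = statistics.mean(treatment) - statistics.mean(control)

# Permutation test: if the treatment had no effect, the group labels are
# arbitrary, so reshuffle them and see how often chance alone produces a
# difference at least as large as the observed one.
pooled = treatment + control
n_treat = len(treatment)
reps = 10_000
at_least_as_large = 0
for _ in range(reps):
    random.shuffle(pooled)
    diff = statistics.mean(pooled[:n_treat]) - statistics.mean(pooled[n_treat:])
    if abs(diff) >= abs(observed):
        at_least_as_large += 1

p_value = at_least_as_large / reps
print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
# By the common convention, p <= 0.05 would count as evidence that the
# difference in treatment, not random assignment error, caused the gap.
```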
But sometimes political scientists analyze data that are not from a randomized experiment. A common type of non-experimental research is an observational study with statistical control. For example, a political scientist might test the null hypothesis that the mean length of criminal sentences given to men equals the mean length of criminal sentences given to women. In this case, a p-value under 0.05 would be sufficient to conclude that these means differ, but it would not be sufficient to conclude that gender bias caused these means to differ. For example, if men receive longer criminal sentences than women receive on average, that might be due to unfair bias against men, but it might also be due to men committing more serious crimes than women commit.
For such an analysis, political scientists can use control variables to address alternate explanations. For example, we can compare the mean sentence length for men convicted of murder to the mean sentence length for women convicted of murder, then compare the mean sentence length for men convicted of robbery to the mean sentence length for women convicted of robbery, and so forth for all crimes. If the mean sentence length for men is longer than the mean sentence length for women for each type of crime, then we have addressed type of crime as an alternate explanation for the overall difference in mean sentence length between men and women.
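A minimal sketch of this stratified comparison, using hypothetical sentencing records (the crime types, sentence lengths, and group sizes are invented for illustration):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (crime type, sex, sentence length in months).
records = [
    ("murder", "M", 300), ("murder", "M", 280), ("murder", "F", 260),
    ("murder", "F", 250), ("robbery", "M", 60), ("robbery", "M", 72),
    ("robbery", "F", 48), ("robbery", "F", 54),
]

# Statistical control by stratification: compare mean sentence lengths
# for men and women separately within each type of crime.
by_crime = defaultdict(lambda: defaultdict(list))
for crime, sex, months in records:
    by_crime[crime][sex].append(months)

for crime, groups in by_crime.items():
    gap = mean(groups["M"]) - mean(groups["F"])
    print(f"{crime}: men - women = {gap:.1f} months")
```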
In an observational study with statistical control, researchers must rule out alternate explanations one by one before making an inference about what causes what. Thus, a p-value of 0.05 or less can properly be interpreted as evidence of causation only after…
- all plausible alternate explanations are controlled for, or
- uncontrolled-for explanations are not sufficient to plausibly explain any residual difference
In our example, we might have addressed the type of crime as an alternate explanation, but – before inferring unfair gender bias – we should address other alternate explanations, such as past criminal history.
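Extending the sketch above, addressing a second alternate explanation such as past criminal history amounts to stratifying on both control variables at once; again, the records are hypothetical:

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (crime type, prior convictions, sex, sentence in months).
records = [
    ("robbery", 0, "M", 58), ("robbery", 0, "F", 50),
    ("robbery", 2, "M", 80), ("robbery", 2, "F", 70),
    ("murder", 0, "M", 280), ("murder", 0, "F", 255),
]

# Stratify on both crime type and prior record before comparing the sexes.
strata = defaultdict(lambda: defaultdict(list))
for crime, priors, sex, months in records:
    strata[(crime, priors)][sex].append(months)

for (crime, priors), groups in sorted(strata.items()):
    if groups["M"] and groups["F"]:
        gap = mean(groups["M"]) - mean(groups["F"])
        print(f"{crime}, {priors} priors: men - women = {gap:.1f} months")
```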
Political scientists have other tools to infer causality, such as discontinuity designs, in which we compare groups that are as similar to each other as possible, with only one major difference. For example, suppose that we are interested in how, if at all, the letter grade a student earns in a course affects that student's evaluation of the course. Instead of comparing, say, the mean evaluation from all students who earned an A to the mean evaluation from all students who earned a B, we could compare the mean evaluation from students who earned 90% (and thus barely earned an A) to the mean evaluation from students who earned 89% (and thus got a B, barely missing an A). The students in the 89/90 comparison will be much more similar to each other than the students in the A/B comparison.
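A minimal sketch of that discontinuity comparison, assuming hypothetical pairs of final scores and course evaluations and a 90% cutoff for an A:

```python
from statistics import mean

# Hypothetical (final score, course evaluation on a 1-5 scale) pairs.
students = [(89, 3.9), (89, 4.1), (89, 3.8), (90, 4.4), (90, 4.2), (90, 4.5),
            (72, 3.0), (95, 4.8), (83, 3.6)]  # last three are far from the cutoff

# Discontinuity comparison: restrict to students just on either side of
# the A cutoff, who should be similar in everything except letter grade.
barely_a = [ev for score, ev in students if score == 90]
barely_b = [ev for score, ev in students if score == 89]

print(f"mean evaluation, barely A (90%): {mean(barely_a):.2f}")
print(f"mean evaluation, barely B (89%): {mean(barely_b):.2f}")
print(f"estimated effect of the A: {mean(barely_a) - mean(barely_b):.2f}")
```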
Political scientists can also use a meta-analysis, which combines results from studies that have been conducted to test a given hypothesis. The Law of Large Numbers indicates that the larger a representative sample is, the closer the sample mean is expected to be to the true mean of the population. Thus, because a meta-analysis pools a larger sample, its estimate should be expected to be closer to the true value than the estimate from any single study it includes, provided that the included studies are representative of all studies that have been conducted on the topic. In a meta-analysis, larger studies should on average receive more weight in the calculation of the overall estimate, because larger studies tend to provide more evidence than smaller studies provide.
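A minimal sketch of such a weighted combination, assuming hypothetical per-study effect estimates and sample sizes; weighting by sample size is one simple way to give larger studies more influence (inverse-variance weighting is a common alternative):

```python
# Hypothetical studies: (estimated effect, sample size).
studies = [(0.30, 50), (0.18, 200), (0.22, 120), (0.35, 40), (0.20, 600)]

# Weight each study's estimate by its sample size, so larger studies,
# whose estimates the Law of Large Numbers says should lie closer to the
# true value, count for more in the combined estimate.
total_n = sum(n for _, n in studies)
combined = sum(effect * n for effect, n in studies) / total_n

unweighted = sum(effect for effect, _ in studies) / len(studies)
print(f"unweighted mean of estimates: {unweighted:.3f}")
print(f"sample-size-weighted estimate: {combined:.3f}")
```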
Quantitative reasoning isn't easy, and it's made more difficult by the fact that researchers can make errors and can try to mislead you. Therefore, it's good to be skeptical of research. But it's also good not to be unduly skeptical. For example, instead of merely disbelieving a research result because there might be alternate explanations, see whether you can think of a plausible alternate explanation, and consider whether addressing that explanation would change the key inference of the research.