I had a recent Twitter exchange about a Monkey Cage post:

@LJZigerell @monkeycageblog Thanks. I was familiar with your disagreement with Ahlquist et al. but unlike you find the paper and broader scholarly consensus compelling.

— Henry Farrell (@henryfarrell) May 11, 2017

Below, I use statistical power calculations to explain why the Ahlquist et al. paper, or at least the list experiment analysis cited in the Monkey Cage post, is not compelling.

---

Discussing the paper (published version here), Henry Farrell wrote:

So in short, this research provides exactly as much evidence supporting the claim that millions of people are being kidnapped by space aliens to conduct personally invasive experiments on, as it does to support Trump's claim that millions of people are engaging in voter fraud.

However, a survey with a sample size of three would also be unable to differentiate the percentage of U.S. residents who commit vote fraud from the percentage of U.S. residents abducted by aliens. For a study that produces a null result, it is necessary to assess the study's ability to detect effects of particular sizes, to get a sense of how informative that null result is.

The Ahlquist et al. paper has a footnote [31] that can be used to estimate the statistical power for their list experiments: more than 260,000 total participants would be needed for a list experiment to have 80% power to detect a 1 percentage point difference between treatment and control groups, using an alpha of 0.05. The power calculator here indicates that the corresponding estimated standard deviation is at least 0.91 [see note 1 below].
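The footnote's figure can be reproduced with the standard two-sample sample-size formula under a normal approximation, plugging in the 0.91 standard deviation and the 1-percentage-point difference discussed above. This is my own sketch of the arithmetic, not the authors' code:

```python
from statistics import NormalDist

# Standard two-sample sample-size formula (normal approximation):
#   n per group = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
nd = NormalDist()
alpha, target_power = 0.05, 0.80
sigma, delta = 0.91, 0.01      # values from the footnote discussion above

z_alpha = nd.inv_cdf(1 - alpha / 2)   # about 1.96
z_power = nd.inv_cdf(target_power)    # about 0.84

n_per_group = 2 * (z_alpha + z_power) ** 2 * sigma ** 2 / delta ** 2
print(round(n_per_group))   # about 130,000 per group, 260,000 total
```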

So let's assume that list experiment participants are truthful and that we combine the 1,000 participants from the first Ahlquist et al. list experiment with the 3,000 participants from the second Ahlquist et al. list experiment, so that we'd have 2,000 participants in the control sample and 2,000 participants in the treatment sample. Statistical power calculations using an alpha of 0.05 and a standard deviation of 0.91 indicate that there is:

- a 5% chance of detecting a 1% rate of vote fraud.
- an 18% chance of detecting a 3% rate of vote fraud.
- a 41% chance of detecting a 5% rate of vote fraud.
- a 79% chance of detecting an 8% rate of vote fraud.
- a 94% chance of detecting a 10% rate of vote fraud.
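The figures above can be checked with a short two-sample power function; this assumes the same normal-approximation formula as the calculator mentioned earlier (the helper name `power` is mine), with 2,000 participants per arm, sigma of 0.91, and alpha of 0.05:

```python
from statistics import NormalDist

def power(delta, sigma=0.91, n_per_group=2_000, alpha=0.05):
    """Approximate power of a two-sided two-sample z-test to detect a
    true difference in means of `delta` (normal approximation; the
    negligible opposite-tail rejection probability is ignored)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)        # about 1.96
    se = sigma * (2 / n_per_group) ** 0.5     # std. error of the difference
    return nd.cdf(delta / se - z_crit)

for rate in (0.01, 0.03, 0.05, 0.08, 0.10):
    print(f"{rate:.0%} fraud rate: power = {power(rate):.0%}")
```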

---

Let's return to the claim that millions of U.S. residents committed vote fraud and use 5 million for the number of adult U.S. residents who committed vote fraud in the 2016 election, eliding the difference between illegal votes and illegal voters. There are roughly 234 million adult U.S. residents (reference), so 5 million vote fraudsters would be 2.1% of the adult population, and a 4,000-participant list experiment would have about an 11 percent chance of detecting that 2.1% rate of vote fraud.
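The 2.1% rate and the roughly 11% chance of detection follow from the same normal-approximation power formula; a sketch of that arithmetic, with the population figures taken from the paragraph above:

```python
from statistics import NormalDist

adults = 234e6       # rough count of adult U.S. residents (from the text)
fraudsters = 5e6     # the "millions of fraudulent voters" claim
rate = fraudsters / adults
print(f"{rate:.1%}")            # about 2.1% of the adult population

# Power of the pooled 2,000-per-arm design at that fraud rate
nd = NormalDist()
se = 0.91 * (2 / 2_000) ** 0.5              # std. error of the difference
power_at_rate = nd.cdf(rate / se - nd.inv_cdf(0.975))
print(f"{power_at_rate:.0%}")   # about 11%, i.e. roughly 1 detection in 9 tries
```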

Therefore, if 5 million adult U.S. residents really did commit vote fraud, a list experiment with the sample size of the pooled Ahlquist et al. 2014 list experiments would produce a statistically significant detection of vote fraud only about 1 of every 9 times the list experiment was conducted. The fact that Ahlquist et al. 2014 didn't detect voter impersonation at a statistically significant level doesn't appear to compel any particular belief about whether the rate of voter impersonation in the United States is large enough to influence the outcome of presidential elections.

---

NOTES

1. Enter 0.00 for mu1, 0.01 for mu2, 0.91 for sigma, and 0.05 for alpha, with a sample size of 130,000 for each of the two samples; then hit Calculate. The reported power will be 0.80.

2. I previously discussed the Ahlquist et al. list experiments here and here. The second link indicates that an Ahlquist et al. 2014 list experiment did detect evidence of attempted vote buying.