In a prior post, I criticized the questionnaire for the ANES 2020 Time Series Study, so I want to use this post to praise the questionnaire for the ANES 2022 Pilot Study, plus add some other comments.

---

1. The pilot questionnaire has items that ask participants to rate men and women on 0-to-100 feeling thermometers, which will permit assessment of the association between negative attitudes about women and negative attitudes about men, presuming that some of the planned 1,500 respondents express such negative attitudes.

2. The pilot questionnaire has items whose response options permit underestimation of the frequency of certain types of vote fraud, with a "Never" option for items about how often, in the respondent's state, [1] a voter casts more than one ballot and [2] votes are cast on behalf of dead people. Such fraud has happened at least once recently in Arizona (see also https://www.heritage.org/voterfraud), and I suspect that the belief that it never happens is currently a misperception that is more common on the political left.

But it doesn't seem like a good idea to phrase the vote fraud items in terms of the respondent's state, because coding a response as a misperception then requires checking evidence in all 50 states. And I don't think there is an obvious threshold for overestimating how often, say, a voter casts more than one ballot: "Rarely" seems like an appropriate response for Arizona residents, but is "Occasionally" incorrect?

3. The pilot questionnaire has an item about the genuineness of emails on Hunter Biden's laptop in which Hunter Biden "contacted representatives of foreign governments about business deals". So I guess that can be a misinformation item that liberals are more likely to be misinformed about.

4. The pilot questionnaire has items about whether being White/Black/Hispanic/Asian "comes with advantages, disadvantages, or doesn't it matter". Based on the follow-up item, these items might not permit respondents to select both "advantages" and "disadvantages"; if so, it might be better to differentiate respondents who think that, for instance, being White has only advantages from respondents who think that being White has, on net, more advantages than disadvantages.

5. The pilot questionnaire permits respondents to report the belief that Black and Hispanic Americans have lower socioeconomic status than White Americans because of biological differences, but respondents can't report the belief that particular less positive outcomes for White Americans relative to another group are due to biological differences (e.g., the average math performance of White American K-12 students relative to that of Asian American K-12 students).

---

Overall, the 2022 pilot seems like an improvement. For one thing, the pilot questionnaire, as is common for the ANES, has feeling thermometers about Whites, Blacks, Hispanics, and Asians, so it's possible to construct a measure of negative attitudes about each included racial/ethnic group. And the feeling thermometers for men and women permit construction of a measure of negative attitudes about men and about women. For another thing, respondents can report misperceptions that are presumably more common among persons on the political left. That's more than a lot of similar surveys permit.


I posted earlier about Filindra et al 2022 "Beyond Performance: Racial Prejudice and Whites' Mistrust of Government". This post discusses part of the code for Filindra et al 2022.

---

Tables in Filindra et al 2022 have a pair of variables called "conservatism (ideology)" and "conservatism not known" and a pair of variables called "income" and "income not known". For an example of what the "not known" variables are for, if a respondent in the 2016 data did not provide a substantive response to the ideology item, Filindra et al 2022 coded that respondent as 1 in the dichotomous 0-or-1 "conservatism not known" variable and imputed a value of zero for the seven-level "conservatism (ideology)" variable, with zero indicating "extremely liberal".

I don't recall seeing that method before, so I figured I would post about it. I reproduced the Filindra et al. 2022 Table 1 results for the 2016 data and then changed the imputed value for "conservatism (ideology)" from 0 (extremely liberal) to 1 (extremely conservative). That changed the coefficient and t-statistic for the "conservatism not known" predictor but not the coefficient or t-statistic for the "conservatism (ideology)" predictor or for any other predictor (log of the Stata output).
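Here is a minimal sketch in Stata of what that coding looks like, using hypothetical variable names rather than the actual Filindra et al 2022 variables:

* Respondents without a substantive ideology response get a 1 on the
* "not known" indicator and an arbitrary imputed value on the 0-to-1
* conservatism scale (0 = "extremely liberal")
gen conservatism_nk = missing(ideology)
gen conservatism = ideology
replace conservatism = 0 if conservatism_nk == 1
reg trust conservatism conservatism_nk
* Re-running the model after instead imputing 1 ("extremely conservative"),
* as in "replace conservatism = 1 if conservatism_nk == 1", should change
* only the "conservatism not known" coefficient and t-statistic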

---

I think that it might have been from Schaffner et al 2018 that I picked up the use of categories as a way to not lose observations from an analysis merely because the observation has a missing value for a predictor. For example, if a respondent doesn't indicate their income, then income can be coded as a series of categories with non-response as a category (such as income $20,000 or lower; income $20,001 to $40,000; ...; income $200,001 and higher; and income missing). Thus, in a regression with this categorical predictor for income, observations are not lost merely because of not having a substantive value for income. Another nice feature of this categorical approach is permitting nonuniform associations, in which, for example, the association of income might level off at higher categories.
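Here is a minimal sketch in Stata of that categorical approach, with hypothetical variable names and cutpoints:

* Code income into categories, with nonresponse as its own category, so
* that observations missing income are not dropped from the regression
gen income_cat = 1 if income <= 20000
replace income_cat = 2 if income > 20000 & income <= 40000
* ...additional categories up to $200,000...
replace income_cat = 11 if income > 200000 & !missing(income)
replace income_cat = 12 if missing(income)
* Factor-variable notation estimates a separate coefficient for each
* category, which also permits nonuniform associations across categories
reg outcome i.income_cat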

But dealing with missing values on a control by using categorical predictors can produce long regression output, with, for example, fifteen categories of income, eight categories of ideology, ten categories of age, etc. The Filindra et al 2022 method seems like a reasonable shortcut, as long as it's understood that results for the "not known" predictors depend on the choice of imputed value. But these "not known" predictors aren't common in the research that I read, so maybe there is another flaw in that method that I'm not aware of.

---

NOTE

1. I needed to edit line 1977 in the Filindra et al 2022 code to:

recode V162345 V162346 V162347 V162348 V162349 V162350 V162351 V162352 (-9/-5=.)


This year, I have discussed several errors or flaws in recent journal articles (e.g., 1, 2, 3, 4). For some new examples, I think that Figure 2 of Cargile 2021 reported estimates for the feminine factor instead of, as labeled, the masculine factor, and Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful":

Finally, the percent Black population in the state was also associated with a statistically significant decline in responsiveness. However, it is worth noting that this decline was not substantively meaningful, given that the odds ratio associated with this variable was very small (.01).

I'll discuss more errors or flaws in the notes below, with more blog posts planned.

---

Given that peer review and/or the editing process will miss errors that readers can catch, it seems like it would be a good idea for journal editors to get more feedback before an article is published.

For example, the Journal of Politics has been posting "Just Accepted" manuscripts before the final formatted version of the manuscript is published, which I think permits the journal to correct errors that readers catch in the posted manuscripts.

The Journal of Politics recently posted the manuscript for Baum et al. "Sensitive Questions, Spillover Effects, and Asking About Citizenship on the U.S. Census". I think that some of the results reported in the text do not match the corresponding results reported in Table 1. For example, the text (numbered p. 4) indicates that:

Consistent with expectations, we also find this effect was more pronounced for Hispanics, who skipped 4.21 points more of the questions after the Citizenship Treatment was introduced (t-statistic = 3.494, p-value is less than 0.001).

However, from what I can tell, the corresponding Table 1 result indicates a 4.49 difference, with a t-statistic of 3.674.

---

Another potential flaw in the above statement is that, from what I can tell, the t-statistic for the "more pronounced for Hispanics" claim is based on a test of whether the estimate among Hispanics differs from zero. However, the t-statistic for the "more pronounced for Hispanics" claim should instead be from a test of whether the estimate among Hispanics differs from the estimate among non-Hispanics or whatever comparison category the "more pronounced" refers to.
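To be concrete, below is a minimal sketch in Stata of the test that I think the "more pronounced" claim calls for, with hypothetical variable names (this is not the Baum et al. code):

* The interaction coefficient tests whether the citizenship-treatment
* effect on skipped questions differs between Hispanic and non-Hispanic
* respondents, rather than whether the effect among Hispanics differs
* from zero
reg skipped i.treatment##i.hispanic
* The t-statistic on 1.treatment#1.hispanic is the relevant test for the
* "more pronounced for Hispanics" claim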

---

So, to the extent that these aforementioned issues are errors or flaws, maybe these can be addressed before the Journal of Politics publishes the final formatted version of the Baum et al. manuscript.

---

NOTES

1. I think that this is an error, from Lucas and Silber Mohamed 2021, with emphasis added:

Moreover, while racial sympathy may lead to some respondents viewing non-white candidates more favorably, Chudy finds no relationship between racial sympathy and gender sympathy, nor between racial sympathy and attitudes about gendered policies.

That seemed a bit unlikely to me when I read it, and, sure enough, Chudy 2020 footnote 20 indicates that:

The raw correlation of the gender sympathy index and racial sympathy index was .3 for the entire sample (n = 1,000) and .28 for whites alone (n = 751).

2. Several errors in Jardina and Stephens-Dougan 2021, quoted verbatim below. Footnote 25:

The Stereotype items were note included on the 2020 ANES Time Series study.

...and the Section 4 heading:

Are Muslim's part of a "band of others?"

... and the Table 2 note:

2016 ANES Time Serie Study

Moreover, the note for Jardina and Stephens-Dougan 2021 Figure 1 describes the data source as: "ANES Cumulative File (face-to-face respondents only) & 2012 ANES Times Series (all modes)". But, based on the text and the other figure notes, I think that this might refer to 2020 instead of 2012.

These things happen, but I think that it's worth noting, at least as evidence against the idea that peer reviews shouldn't note grammar-type errors.

3. I discussed conditional-acceptance comments in my PS symposium entry "Left Unchecked".


I received a few questions and comments about my use of 83.4% confidence intervals on the plot in my prior post, so I thought I would post an explanation that I can refer to later.

---

Often, political scientists use a p-value of p=0.05 as a threshold for sufficient evidence of an association, such that only p-values under p=0.05 indicate sufficient evidence. Plotting 95% confidence intervals can help readers assess whether the evidence indicates that a given estimate differs from a given value.

For example, in unweighted data from the ANES 2020 Time Series Study, the 95% confidence interval for Black respondents' mean rating about Whites is [63.0, 67.0]. The number 62 falls outside the 95% confidence interval, so that indicates that there is sufficient evidence at p=0.05 that Black respondents' mean rating about Whites is not 62. However, the number 64 falls inside the 95% confidence interval, so that indicates that there is not sufficient evidence at p=0.05 that the mean rating about Whites among Black respondents is not 64.
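For reference, here is a minimal Stata sketch of that sort of calculation, with hypothetical names for the thermometer variable and the Black respondent indicator:

* 95% confidence interval for the mean 0-to-100 rating about Whites
* among Black respondents, in unweighted data
mean therm_whites if black == 1, level(95)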

---

But suppose that we wanted to assess whether two estimates differ *from each other*. Below is a plot of 95% confidence intervals for Black respondents' mean rating about Whites and about Asians, in unweighted data. For a test of the null hypothesis that the two mean ratings are equal, the p-value is p=0.04, indicating sufficient evidence of a difference. However, the 95% confidence intervals overlap quite a bit.

The 95% confidence intervals in this case don't do a good job of permitting readers to assess differences between estimates at the p=0.05 level.

But below is a plot that instead uses 83.4% confidence intervals. The ends of the 83.4% confidence intervals come close to each other but do not overlap. If using confidence interval touching as an approximation to p=0.05 evidence of a difference, that closeness without overlapping is what we would expect from a p-value of p=0.04.
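For readers wondering where the 83.4% level comes from: for two independent estimates with similar standard errors, the intervals just touch when the estimates differ by about twice the interval half-width z*SE, while the p=0.05 criterion for the difference is about 1.96*sqrt(2)*SE, so the required z is 1.96/sqrt(2). A quick check in Stata:

di 1.96/sqrt(2)                 // about 1.386
di 2*normal(1.96/sqrt(2)) - 1   // confidence level of about 0.834
* Estimation commands can request this level directly, e.g.
* (hypothetical variable names):
mean therm_whites therm_asians if black == 1, level(83.4)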

Based on whether 83.4% confidence intervals overlap, readers can often get a good sense of whether estimates differ at the p=0.05 level. So my current practice is to plot 95% confidence intervals when the comparison of interest is of an estimate to a given number and to plot 83.4% confidence intervals when the comparison of interest is of one estimate to another estimate.

---

NOTES

1. Data source: American National Election Studies. 2021. ANES 2020 Time Series Study Preliminary Release: Combined Pre-Election and Post-Election Data [dataset and documentation]. March 24, 2021 version. www.electionstudies.org.

2. R code for the plots.


In "Gendered Nationalism and the 2016 US Presidential Election: How Party, Class, and Beliefs about Masculinity Shaped Voting Behavior" (Politics & Gender 2019), Melissa Deckman and Erin Cassese reported a Table 2 model that had a sample size of 750 and a predictor for college degree that had a logit coefficient of -0.57 and a standard error of 0.28, so the associated t-statistic is -0.57/28, or about -2.0, which produces a p-value of about 0.05.

The college degree coefficient fell to -0.27 when a "gendered nationalism" predictor was added to the model, and Deckman and Cassese 2019 indicated (pp. 17-18) that:

A post hoc Wald test comparing the size of the coefficients between the two models suggests that the coefficient for college was significantly reduced by the inclusion of the mediator [F(1,678) = 7.25; p < .0072]...

From what I can tell, this means that there is stronger evidence for the -0.57 coefficient differing from the -0.27 coefficient (p<0.0072) than for the -0.57 coefficient differing from zero (p≈0.05).

This type of odd result has been noticed before.

---

For more explanation, below are commands that can be pasted into Stata to produce a similar result:

* Simulate an outcome Y and two predictors that each partly reflect Y
clear all
set seed 123
set obs 500
gen Y = runiform(0,10)
gen X1 = 0.01*(Y + runiform(0,10)^2)
gen X2 = 0.01*(Y + 2*runiform(0,10))
reg Y X1
* Declare a constant probability weight and use svy estimation
egen weight = fill(1 1 1 1 1)
svyset [pw=weight]
* Estimate and store a model with X1 alone and a model with X1 and X2
svy: reg Y X1
estimates store X1alone
svy: reg Y X1 X2
estimates store X1paired
* Combine the stored estimates so the X1 coefficients can be compared
suest X1alone X1paired
* Test the X1alone coefficient against zero, against the fixed value of
* the X1paired coefficient (0.4910762), and against the X1paired
* coefficient itself
lincom _b[X1alone:X1] - 0
di _b[X1paired:X1]
lincom _b[X1alone:X1] - 0.4910762
lincom _b[X1alone:X1] - _b[X1paired:X1]

The X1 coefficient is 0.8481948 in the "reg Y X1" model and is 0.4910762 in the "reg Y X1 X2" model. Results for the "lincom _b[X1alone:X1] - _b[X1paired:X1]" command indicate that the p-value is 0.040 for the test that the 0.8481948 coefficient differs from the 0.4910762 coefficient. But results for the "lincom _b[X1alone:X1] - 0.4910762" command indicate that the p-value is 0.383 for the test that the 0.8481948 coefficient differs from the number 0.4910762.

So, from what I can tell, there is stronger evidence that the 0.8481948 X1 coefficient differs from an imprecisely estimated coefficient whose point estimate is 0.4910762 than that it differs from the fixed number 0.4910762.

---

As indicated in the link above, this odd result appears attributable to the variance sum law:

Variance(X-Y) = Variance(X) + Variance(Y) - 2*Covariance(X,Y)

For the test of whether the 0.8481948 X1 coefficient differs from the 0.4910762 X1 coefficient, the formula is:

Variance(X-Y) = Variance(X) + Variance(Y) - 2*Covariance(X,Y)

But for the test of whether a coefficient differs from a fixed value, such as the test of whether the -0.57 coefficient differs from zero or the test of whether the 0.8481948 coefficient differs from the number 0.4910762, the formula reduces to:

Variance(X-Y) = Variance(X) + 0 - 0

For the simulated data, subtracting 2*Covariance(X,Y) reduces Variance(X-Y) more than adding the Variance(Y) increases Variance(X-Y), which explains how the p-value can be lower for comparing the two coefficients to each other than for comparing one coefficient to the value of the other coefficient.

See the code below:

* Combine the stored estimates and list their variance-covariance matrix
suest X1alone X1paired
matrix list e(V)
* z-statistic treating 0.4910762 as a fixed number: uses only the
* variance of the X1alone coefficient
di (.8481948-.4910762)/sqrt(.16695974)
* z-statistic for the difference between the two estimated coefficients:
* uses the full variance sum law, including the covariance term
di (.8481948-.4910762)/sqrt(.16695974+.14457114-2*.14071065)
* Wald test that the two X1 coefficients are equal
test _b[X1alone:X1] = _b[X1paired:X1]

Stata output here.


I had a recent Twitter exchange about a Monkey Cage post.

Below, I use statistical power calculations to explain why the Ahlquist et al. paper, or at least the list experiment analysis cited in the Monkey Cage post, is not compelling.

---

Discussing the paper (published version here), Henry Farrell wrote:

So in short, this research provides exactly as much evidence supporting the claim that millions of people are being kidnapped by space aliens to conduct personally invasive experiments on, as it does to support Trump's claim that millions of people are engaging in voter fraud.

However, a survey with a sample size of three would also not be able to differentiate the percentage of U.S. residents who commit vote fraud from the percentage of U.S. residents abducted by aliens. For studies that produce a null result, it is necessary to assess the ability of the study to detect an effect of a particular size, to get a sense of how informative that null result is.

The Ahlquist et al. paper has a footnote [31] that can be used to estimate the statistical power for their list experiments: more than 260,000 total participants would be needed for a list experiment to have 80% power to detect a 1 percentage point difference between treatment and control groups, using an alpha of 0.05. The power calculator here indicates that the corresponding estimated standard deviation is at least 0.91 [see note 1 below].

So let's assume that list experiment participants are truthful and that we combine the 1,000 participants from the first Ahlquist et al. list experiment with the 3,000 participants from the second Ahlquist et al. list experiment, so that we'd have 2,000 participants in the control sample and 2,000 participants in the treatment sample. Statistical power calculations using an alpha of 0.05 and a standard deviation of 0.91 indicate that there is:

  • a 5 percent chance of detecting a 1% rate of vote fraud.
  • an 18 percent chance of detecting a 3% rate of vote fraud.
  • a 41 percent chance of detecting a 5% rate of vote fraud.
  • a 79 percent chance of detecting an 8% rate of vote fraud.
  • a 94 percent chance of detecting a 10% rate of vote fraud.
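For reference, these power values can be approximated with Stata's power twomeans command, treating the list experiment as a two-sample comparison of means with a common standard deviation of 0.91 and 2,000 participants per group (my approximation, not the online calculator linked above):

* Approximate power to detect vote fraud rates of 1%, 3%, 5%, 8%, and 10%
power twomeans 0 (0.01 0.03 0.05 0.08 0.10), sd(0.91) alpha(0.05) n1(2000) n2(2000)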

---

Let's return to the claim that millions of U.S. residents committed vote fraud and use 5 million for the number of adult U.S. residents who committed vote fraud in the 2016 election, eliding the difference between illegal votes and illegal voters. There are roughly 234 million adult U.S. residents (reference), so 5 million vote fraudsters would be 2.1% of the adult population, and a 4,000-participant list experiment would have about an 11 percent chance of detecting that 2.1% rate of vote fraud.
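The corresponding arithmetic and power approximation in Stata, under the same assumptions as above:

di 5/234   // about 0.021, or 2.1% of adult U.S. residents
* Approximate power to detect a 2.1% rate with 2,000 participants per group
power twomeans 0 0.021, sd(0.91) n1(2000) n2(2000)   // power of about 0.11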

Therefore, if 5 million adult U.S. residents really did commit vote fraud, a list experiment with the sample size of the pooled Ahlquist et al. 2014 list experiments would produce a statistically-significant detection of vote fraud about 1 of every 9 times the list experiment was conducted. The fact that Ahlquist et al. 2014 didn't detect voter impersonation at a statistically-significant level doesn't appear to compel any particular belief about whether the rate of voter impersonation in the United States is large enough to influence the outcome of presidential elections.

---

NOTES

1. Enter 0.00 for mu1, 0.01 for mu2, 0.91 for sigma, 0.05 for alpha, and a 130,000 sample size for each sample; then hit Calculate. The power will be 0.80.
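For a Stata approximation of those calculator entries, again treating the design as a two-sample comparison of means:

power twomeans 0 0.01, sd(0.91) alpha(0.05) n1(130000) n2(130000)
* Reported power should be about 0.80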

2. I previously discussed the Ahlquist et al. list experiments here and here. The second link indicates that an Ahlquist et al. 2014 list experiment did detect evidence of attempted vote buying.
