The American Political Science Review recently published a letter: Stephens-Dougan 2022 "White Americans' reactions to racial disparities in COVID-19".
Figure 1 of the Stephens-Dougan 2022 APSR letter reports results for four outcomes among racially prejudiced Whites, with the 84% confidence interval in the control overlapping with the 84% confidence interval in the treatment for only one of the four reported outcomes (zooming in on Figure 1, the confidence intervals for the parks outcome don't seem to overlap, and the code returns 0.1795327 for the upper bound for the control and 0.18800818 for the lower bound for the treatment). And results for the most obviously overlapping 84% confidence intervals seem to be interpreted as sufficient evidence of an effect, with all four reported outcomes discussed in the passage below:
When racially prejudiced white Americans were exposed to the racial disparities information, there was an increase in the predicted probability of indicating that they were less supportive of wearing face masks, more likely to feel their individual rights were being threatened, more likely to support visiting parks without any restrictions, and less likely to think African Americans adhere to social distancing guidelines.
---
There are at least three things to keep track of: [1] the APSR letter, [2] the survey questionnaire, located at the OSF site for the Time-sharing Experiments for the Social Sciences project; and [3] the pre-analysis plan, located at the OSF and in the appendix of the APSR article. I'll use the PDF of the pre-analysis plan. The TESS site also has the proposal for the survey experiment, but I won't discuss that in this post.
---
The pre-analysis plan does not mention all potential outcome variables that are in the questionnaire, but the pre-analysis plan section labeled "Hypotheses" includes the passage below:
Specifically, I hypothesize that White Americans with anti-Black attitudes and those White Americans who attribute racial disparities in health to individual behavior (as opposed to structural factors), will be more likely to disagree with the following statements:
The United States should take measures aimed at slowing the spread of the coronavirus while more widespread testing becomes available, even if that means many businesses will have to stay closed.
It is important that people stay home rather than participating in protests and rallies to pressure their governors to reopen their states.
I also hypothesize that White Americans with anti-Black attitudes and who attribute racial health disparities to individual behavior will be more likely to agree with the following statements:
State and local directives that ask people to "shelter in place" or to be "safer at home" are a threat to individual rights and freedom.
The United States will take too long in loosening restrictions and the economic impact will be worse with more jobs being lost
The four outcomes mentioned in the passage above correspond to items Q15, Q18, Q16, and Q21 in the survey questionnaire, but, of these four outcomes, the APSR letter reported on only Q16.
The outcome variables in the APSR letter are described as: "Wearing facemasks is not important", "Individual rights and freedom threatened", "Visit parks without any restrictions", and "Black people rarely follow social distancing guidelines". These outcome variables correspond to survey questionnaire items Q20, Q16, Q23A, and Q22A.
---
The pre-analysis plan PDF mentions moderators, with three moderators about racial dispositions: racial resentment, negative stereotype endorsement, and attributions for health disparities. The plan indicates that:
For racial predispositions, we will use two or three bins, depending on their distributions. For ideology and party, we will use three bins. We will include each bin as a dummy variable, omitting one category as a baseline.
The APSR letter reported on only one racial predispositions moderator: negative stereotype endorsement.
---
I'll post a link in the notes below to some of my analyses about the "Specifically, I hypothesize" outcomes, but I don't want to focus on the results, because I wanted this post to focus on deviations from the pre-analysis plan, because -- regardless of whether the estimates from the analyses in the APSR letter are similar to the estimates from the planned analyses in the pre-analysis plan -- I think that it's bad that readers can't trust the APSR to ensure that a pre-analysis plan is followed or at least to provide an explanation about why a pre-analysis plan was not followed, especially given that this APSR letter described itself as reporting on "a preregistered survey experiment" and included the pre-analysis plan in the appendix.
---
NOTES
1. The Stephens-Dougan 2022 APSR letter suggests that the negative stereotype endorsement variable was coded dichotomously ("a variable indicating whether the respondent either endorsed the stereotype that African Americans are less hardworking than whites or the stereotype that African Americans are less intelligent than whites"), but the code and the appendix of the APSR letter indicate that the negative stereotype endorsement variable was measured so that the highest level is for respondents who reported a negative relative stereotype about Blacks for both stereotypes. From Table A7:
(unintelligentstereotype 2 + lazystereotype2 )/2
In the data after running the code for the APSR letter, the negative stereotype endorsement variable is a three-level variable coded 0 for respondents who did not report a negative relative stereotype about Blacks for either stereotype, 0.5 for respondents who reported a negative stereotype about Blacks for one stereotype, and 1 for respondents who reported a negative relative stereotype about Blacks for both stereotypes.
2. The APSR letter indicated that:
The likelihood of racially prejudiced respondents in the control condition agreeing that shelter-in-place orders threatened their individual rights and freedom was 27%, compared with a likelihood of 55% in the treatment condition (p < 0.05 for a one-tailed test).
My analysis using survey weights got 44% and 29% among participants who reported a negative relative stereotype about Blacks for at least one of the two stereotype items, and my analysis got 55% and 26% among participants who reported negative relative stereotypes about Blacks for both stereotype items, with a trivial overlap in 84% confidence intervals.
But the 55% and 26% in a weighted analysis were 43% and 37% in an unweighted analysis with a large overlap in 84% confidence intervals, suggesting that at least some of the results in the APSR letter might be limited to the weighted analysis. I ran the code for the APSR letter removing the weights from the glm command and got the revised Figure 1 plot below. The error bars in the APSR letter are described as 84% confidence intervals.
I think that it's fine to favor the weighted analysis, but I'd prefer that publications indicate when results from an experiment are not robust to the application or non-application of weights. Relevant publication.
3. Given the results in my notes [1] and [2], maybe the APSR letter's Figure 1 estimates are for only respondents who reported negative relative stereotype about Blacks for both stereotypes. If so, the APSR letter's suggestion that this population is the 26% that reported anti-Black stereotypes for either stereotype might be misleading, if the Figure 1 analyses were estimated for only the 10% that reported negative relative stereotype about Blacks for both stereotypes.
For what it's worth, the R code for the APSR letter has code that doesn't use the 0.5 level of the negative stereotype endorsement variable, such as:
# Below are code for predicted probabilities using logit model
# Predicted probability "individualrights_dichotomous"
# Treatment group, negstereotype_endorsement = 1
p1.1 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 1 + coef(glm1)[4] * 1)
It's possible to see what happens to the Figure 1 results when the negative stereotype endorsement variable is coded 1 for respondents who endorsed at least one of the stereotypes. Run this at the end of the Stata code for the APSR letter:
replace negstereotype_endorsement = ceil((unintelligentstereotype2 + lazystereotype2)/2)
Then run the R code for the APSR letter. Below is the plot I got for a revised Figure 1, with weights applied and the sample limited to respondents who endorsed at least one of the stereotypes:
Estimates in the figure above were close to estimates in my analysis using these Stata commands after running the Stata code from the APSR letter. Stata output.
4. Data, Stata code, and Stata output for my analysis about the "Specifically, I hypothesize" passage of the Stephens-Dougan pre-analysis plan.
My analysis in the Stata output had seven outcomes: the four outcomes mentioned in the "Specifically, I hypothesize" part of the pre-analysis plan as initially measured (corresponding to questionnaire items Q15, Q18, Q16, and Q21), with no dichotomization of five-point response scales for Q15, Q18, and Q16; two of these outcomes (Q15 and Q16) dichotomized as mentioned in the pre-analysis plan (e.g., "more likely to disagree" was split into disagree / not disagree categories, with the not disagree category including respondent skips); and one outcome (Q18) dichotomized so that one category has "Not Very Important" and "Not At All Important" and the other category has the other responses and skips, given that the pre-analysis plan had this outcome dichotomized as disagree but response options in the survey were not on an agree-to-disagree scale. Q21 was measured as a dichotomous variable.
The analysis was limited to presumed racially prejudiced Whites, because I think that that's what the pre-analysis plan hypotheses quoted above focused on. Moreover, that analysis seems more important than a mere difference between groups of Whites.
Note that, for at least some results, a p<0.05 treatment effect might be in the unintuitive direction, so be careful before interpreting a p<0.05 result as evidence for the hypotheses.
My analyses aren't the only analyses that can be conducted, and it might be a good idea to combine results across outcomes mentioned in the pre-analysis plan or across all outcomes in the questionnaire, given that the questionnaire had at least 12 items that could serve as outcome variables.
For what it's worth, I wouldn't be surprised if a lot of people who respond to survey items in an unfavorable way about Blacks backlashed against a message about how Blacks were more likely than Whites to die from covid-19.
5. The pre-analysis plan included a footnote that:
Given the results from my pilot data, it is also my expectation that partisanship will moderate the effect of the treatment or that the treatment effects will be concentrated among Republican respondents.
Moreover, the pre-analysis plan indicated that:
The condition and treatment will be blocked by party identification so that there are roughly equal numbers of Republicans and Democrats in each condition.
But the lone mention of "Repub-" in the APSR letter is:
The sample was 39% self-identified Democrats (including leaners) and 46% self-identified Republicans (including leaners).
6. Link to tweets about the APSR letter.