Social Science & Medicine published Skinner-Dorkenoo et al 2022 "Highlighting COVID-19 racial disparities can reduce support for safety precautions among White U.S. residents", with data for Study 1 fielded in September 2020. Stephens-Dougan conducted a similar Time-sharing Experiments for the Social Sciences study, "Backlash effect? White Americans' response to the coronavirus pandemic", fielded starting in late April 2020 according to the TESS page for the study.

You can check tweets about Skinner-Dorkenoo et al 2022 and what some tweeters said about White people. But you can't tell from the Skinner-Dorkenoo et al 2022 publication or the Stephens-Dougan 2022 APSR article whether any detected effect is distinctive to White people.

Limiting samples to Whites doesn't seem to be a good idea if the purpose is to understand racial bias. But it might be naive to think that all social science research is designed to understand.

---

There might be circumstances in which it's justified to limit a study of racial bias to White participants, but I don't think such circumstances include:

* The Kirgios et al 2022 audit study that experimentally manipulated the race and gender of an email requester, but for which "Participants were 2,476 White male city councillors serving in cities across the United States". In late April, I tweeted a question to the first author of Kirgios et al 2022 about why the city councilor sample was limited to White men, but I haven't yet gotten a reply.

* Studies that collect sufficient data on non-White participants but do not report results from these data in the eventual publications (examples here and here).

* Proposals for federally funded experiments that request that the sample be limited to White participants, such as in the Stephens-Dougan 2020 proposal: "I want to test whether White Americans may be more resistant to efforts to curb the virus and more supportive of protests to reopen states when the crisis is framed as disproportionately harming African Americans".

---

One benefit of not limiting the subject pool by race is to limit unfair criticism of entire racial groups. For example, according to the analysis below from Bracic et al 2022, White nationalism among non-Whites was at least as influential as White nationalism among Whites in predicting support for a family separation policy, net of controls:

So, to the extent that White nationalism is responsible for support for the family separation policy, that applies to White respondents and to non-White respondents.

Of course, Bracic et al. 2022 doesn't report how the association for White nationalism compares to the association for, say, Black nationalism or Hispanic nationalism or how the association for the gendered nationalist belief that "the nation has gotten too soft and feminine" compares to the association for the gendered nationalist belief that, say, "the nation is too rough and masculine".

---

And consider this suggestion from Rice et al 2022 to use racial resentment items to screen Whites for jury service:

At the practical level, our research raises important empirical and normative questions related to the use of racial resentment items during jury selection in criminal trials. If racial resentment affects jurors' votes and reasoning, should racial resentment items be used to screen white potential jurors?

Given evidence suggesting that Black juror bias is on average at least as large as White juror bias, I don't perceive a good justification for limiting this suggestion to White potential jurors, although I think that the Rice et al decision not to report results for Black mock jurors makes it easier to limit the suggestion in that way.

---

NOTES

1. I caught two flaws in Skinner-Dorkenoo et al 2022, which I discussed on Twitter: [1] For the three empathy items, more than 700 respondents selected "somewhat agree" and more than 500 selected "strongly agree", but no respondent selected "agree", suggesting that the data were miscoded. [2] The below-0.05 p-value for the empathy inference appears to be due to the analysis controlling for a post-treatment measure; see the second model referred to by the lead author in the Twitter thread. I didn't conduct a full check of the Skinner-Dorkenoo et al 2022 analysis. Stata code and output for my analyses of Skinner-Dorkenoo et al 2022, with data here. Note the end of the output, indicating that the post-treatment control was affected by the treatment.

2. I have a prior post about the Stephens-Dougan TESS survey experiment reported on in the APSR, which had substantial deviations from the pre-analysis plan. On May 31, I contacted the APSR about that and about the error discussed in the post. I received an update in September, but the Stephens-Dougan 2022 APSR article hasn't been corrected as of Oct 2.


PS: Political Science & Politics published Dietrich and Hayes 2022 "Race and Symbolic Politics in the US Congress" as part of a "Research on Race and Ethnicity in Legislative Studies" section with guest editors Tiffany D. Barnes and Christopher J. Clark.

---

1.

Dietrich and Hayes 2022 reported on an experiment in which a representative was randomized to be White or Black, the representative's speech was randomized to be about civil rights or renewable energy, and the representative's speech was randomized to include or not include symbolic references to the Civil Rights Movement. Dietrich and Hayes 2022 noted (p. 283) that:

When those same symbols were used outside of the domain of civil rights, however, white representatives received a significant punishment. That is, Black respondents were significantly more negative in their evaluations of white representatives who (mis-)used civil rights symbolism to advance renewable energy than in any other experimental condition.

The only numeric results that Dietrich and Hayes 2022 reported for this in the main text are in Figure 1, for an approval rating outcome. But the data file seems to have at least four potential outcomes: the symbolic_approval outcome (strongly disapprove to strongly approve) and the next three listed variables: symbolic_vote (extremely likely to extremely unlikely), symbolic_care (none to a lot), and symbolic_thermometer (0 to 100). The supplemental files have a figure that reports results for a dv_therm variable, but that figure doesn't report results for the renewable energy condition separately for symbolic and non-symbolic speeches.

---

2.

Another result reported in Dietrich and Hayes 2022 involved Civil Rights Movement symbolism in U.S. House of Representatives floor speeches that mentioned civil rights:

In addition to influencing African Americans' evaluation of representatives, our research shows that symbolic references to the civil rights struggle are linked to Black voter turnout. Using an analysis of validated voter turnout from the 2006–2018 Cooperative Election Study, our analyses suggest that increases in the number of symbolic speeches given by a member of Congress during a given session are associated with an increase in Black turnout in the subsequent congressional election. Our model predicts that increasing from the minimum of symbolic speeches in the previous Congress to the maximum in the current Congress is associated with a 65.67-percentage-point increase in Black voter turnout compared to the previous year.

This estimated 66 percentage point increase is at the congressional district level. Dietrich and Hayes 2022 calculated the estimate using a linear regression of the change in a district's Black turnout on the change in its symbolism percentage, generating a prediction for a district with a lagged symbolism percentage of 0 and a current symbolism percentage of 1. From their code:

mod1<-lm(I(black_turnout-lag_black_turnout)~I(symbolic_percent-lag_symbolic_percent),data=cces)

print(round(predict(mod1,data.frame(symbolic_percent=1,lag_symbolic_percent=0))*100,2))

The estimated change in Black turnout was 85 percentage points when I modified the code to have a lagged symbolism percentage of 1 and a symbolism percentage of 0.
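For reference, here is a minimal sketch of that modification, reusing the mod1 object from the code above (this is my alteration, not the authors' code):

# predicted change in Black turnout when the symbolism percentage falls
# from 1 in the prior Congress to 0 in the current Congress
print(round(predict(mod1, data.frame(symbolic_percent = 0, lag_symbolic_percent = 1)) * 100, 2))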

---

These estimated changes in Black turnout of 66 and 85 percentage points seemed implausible as causal estimates, and I'm not even sure that these are correct correlational estimates, based on the data in the "cces_turnout_results.csv" dataset in the hayes_dietrich_replication.zip file.

For one thing, the dataset lists symbolic_percent values for Alabama's fourth congressional district by row as 0.017857143, 0.047619048, 0.047619048, 0.013157895, 0.013157895, 0.004608295, 0.004608295, 0.00990099, 0.00990099, 1, 1, 1, and 1. For speeches that mentioned civil rights, that's a relatively large jump in the percentage of such speeches that used Civil Rights Movement symbolism, from several values under 5% all the way to 100%. And this large jump to 100% is not limited to this congressional district: the mean symbolic_percent values across the full dataset were 0.14 (109th Congress), 0.02 (110th), 0.02 (111th), 0.03 (112th), 0.09 (113th), 1 (114th), and 1 (115th).

Moreover, the repetition of symbolic_percent values within a congressional district is consistent across the data that I checked. So, for the above district, 0.017857143 is for the 109th Congress, the first 0.047619048 is for one year of the 110th Congress, the second 0.047619048 is for the other year of the 110th Congress, the two 0.013157895 values are for the two years of the 111th Congress, and so forth. From what I can tell, each dataset case is for a given district-year, but symbolic_percent is calculated only once per two-year Congress, so that a large percentage of the "I(symbolic_percent-lag_symbolic_percent)" predictor values are zero because of a research design decision to calculate the percentage of symbolic speeches per Congress and not per year; these zeros might not be true zeros in which the percentage of symbolic speeches was actually the same in the given year and the lagged year.
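As a rough check on this point, here is a minimal R sketch, assuming the cces data frame is loaded from the "cces_turnout_results.csv" file in the replication materials:

# share of district-year cases in which the change in the key predictor is exactly zero
cces <- read.csv("cces_turnout_results.csv")
diff_symbolic <- cces$symbolic_percent - cces$lag_symbolic_percent
mean(diff_symbolic == 0, na.rm = TRUE)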

---

For another thing, the "inline_calculations.R" file in the Dietrich and Hayes 2022 replication materials indicates that the Black turnout values were based on CCES surveys and that survey sample sizes might be very low for some congressional districts. The file describes a bootstrapping process that was used to produce the Black turnout values, which were then standardized to range from 0 to 1, but, from the description, I'm not sure how that standardization process works.

For instance, if in one year the CCES had 2 Black participants for a certain congressional district and neither voted (0% turnout), and the next year was a presidential election year in which the CCES had 3 Black participants in that district and all three voted (100% turnout), I'm not sure what the bootstrapping process would do to adjust these congressional district Black turnout estimates to be closer to their true values, which presumably are between 0% and 100%. For what it's worth, of the 4,373 rows in the dataset, black_turnout is NA in 545 rows (12%), is 0 in 281 rows (6%), and is 1 in 1,764 rows (40%).
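A minimal sketch of that tabulation, again assuming the cces data frame loaded from the replication file:

# counts of missing, 0%, and 100% values of the black_turnout measure
sum(is.na(cces$black_turnout))              # 545 of 4,373 rows
sum(cces$black_turnout == 0, na.rm = TRUE)  # 281 rows
sum(cces$black_turnout == 1, na.rm = TRUE)  # 1,764 rows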

So I'm not sure how the described bootstrapping process adequately addresses the concern that the range of Black turnout values for a congressional district in the dataset is more extreme than the range of true Black turnout values for the congressional district. Maybe the standardization process addresses this in a way that I don't understand, so that 0 and 1 for black_turnout don't represent 0% turnout and 100% turnout, but, if that's the case, then I'm not sure how it would be justified for Dietrich and Hayes 2022 to interpret the aforementioned output of 65.67 as a 65.67 percentage-point increase.

---

NOTES

1. Dietrich and Hayes 2022 indicated that, in the survey experiment, participants were asked "to evaluate a representative on the basis of his or her floor speech", and Dietrich and Hayes 2022 indicated that the experimental manipulation for the representative's race involved "accompanying images of either a white or a Black representative". But the use of "his or her" makes me curious if the representative's gender was also experimentally manipulated.

2. Dietrich and Hayes 2022 Figure 1 reports [approval of the representative in the condition involving Civil Rights Movement symbolism in a speech about civil rights] in the same panel as [approval of the representative in the condition involving Civil Rights symbolism in a speech about renewable energy]. However, for assessing a penalty for use of Civil Rights Movement symbolism in the renewable energy speech, I think that it is more appropriate to compare [approval of the representative in the condition in which the renewable energy speech used Civil Rights Movement symbolism] to [approval of the representative in the condition in which the renewable energy speech did not use Civil Rights Movement symbolism].

If there is a penalty for using Civil Rights Movement symbolism in the speech about renewable energy, that penalty can be compared to the difference in approval between using and not using Civil Rights Movement symbolism in the speech about civil rights, to see whether the penalty in the renewable energy speech condition reflects a generalized penalty for the use of Civil Rights Movement symbolism.

3. On June 27, I emailed Dr. Dietrich and Dr. Hayes a draft of this blog post with an indication that "I thought that, as a courtesy, I would send the draft to you, if you would like to indicate anything in the draft that is unfair or incorrect". I have not yet received a reply, although it's possible that I used incorrect email addresses or my email went to a spam box.


I'll hopefully at some point write a summary that refers to a lot of my "comments" posts. But I have at least a few more to release before then, so here goes...

---

Politics, Groups, and Identities recently published Peay and McNair II 2022 "Concurrent pressures of mass protests: The dual influences of #BlackLivesMatter on state-level policing reform adoption". Peay and McNair II 2022 reported regressions that predicted a count of the number of police reform policies enacted by a state from August 2014 through 2020, using a key predictor of the number of Black Lives Matter protests in a state in the year after the killing of Michael Brown in August 2014.

An obvious concern is that the number of protests in a state might merely capture the population size of the state. That's a concern because it's plausible that higher-population states have legislatures that are more active than lower-population states, so that we would expect higher-population states to tend to enact more policies of all types, not merely more police reform policies. But the Peay and McNair II 2022 analysis does not control for the population size of the state.

I checked the correlation between [1] the number of Black Lives Matter protests in a state in the year after the killing of Michael Brown in August 2014 (data from Trump et al. 2018) and [2] the first list of the number of bills enacted by a state that I happened upon, which was the number of bills a state enacted from 2006 to 2009 relating to childhood obesity. The R-squared was 0.22 for a bivariate OLS regression using the state-level count of BLM protests to predict the state-level count of childhood obesity bills enacted. In comparison, Peay and McNair II 2022 Table 2 indicated that the R-squared was 0.19 in a bivariate OLS regression that used the state-level count of BLM protests to predict the state-level count of police reform policies enacted. So the concern about population size seems at least plausible.
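A minimal sketch of that comparison, assuming a state-level data frame named states with hypothetical columns blm_protests (count of BLM protests in the year after August 2014) and obesity_bills (count of childhood obesity bills enacted from 2006 to 2009):

# bivariate OLS: does the state-level BLM protest count "predict" a policy count
# that has nothing to do with policing?
fit <- lm(obesity_bills ~ blm_protests, data = states)
summary(fit)$r.squared  # 0.22 in my analysis, versus 0.19 for police reform policies in Table 2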

---

This is a separate concern, but Figure 6 of Peay and McNair II 2022 reports predicted probabilities, with the number of protests on the x-axis. My analysis indicated that the number of protests ranged from 0 to 87, with only three states having more than 40 protests: New York at 67, Missouri at 74, and California at 87. Yet the 95% confidence interval in Figure 6 is, at its widest, only about 1 percentage point (at 87 protests), which is a pretty precise estimate given data for only 50 states and only one state past 74.

Maybe the tight 95% confidence interval is a function of the network analysis for Figure 6, if the analysis, say, treats each potential connection between California and the other 49 states as 49 independent observations. Table 2 of Peay and McNair II 2022 doesn't have a sample size for this analysis, but reports 50 as the number of observations for the other analyses in that table.

---

NOTES

1. Data for my analysis.

2. No reply yet from the authors on Twitter.


Homicide Studies recently published Schildkraut and Turanovic 2022 "A New Wave of Mass Shootings? Exploring the Potential Impact of COVID-19". From the abstract:

Results show that total, private, and public mass shootings increased following the declaration of COVID-19 as a national emergency in March of 2020.

I was curious how Schildkraut and Turanovic 2022 addressed the possible confound of the 25 May 2020 killing of George Floyd.

---

Below is my plot of data used in Schildkraut and Turanovic 2022, for total mass shootings:

My read of the plot is that, until after the killing of George Floyd, there is insufficient evidence that mass shootings were higher in 2020 than in 2019.
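For readers who want to reproduce a similar plot from the data linked in the notes below, here is a minimal base-R sketch (the file name and column names are assumptions about how the weekly data are arranged):

# weekly counts of total mass shootings, 2019 versus 2020, with reference lines
# at the covid declaration week and the week of the Floyd killing
shootings <- read.csv("weekly_mass_shootings.csv")  # hypothetical file name
with(subset(shootings, year == 2019),
     plot(week, total, type = "l", col = "gray60", ylim = range(shootings$total, na.rm = TRUE),
          xlab = "Week of year", ylab = "Total mass shootings"))
with(subset(shootings, year == 2020), lines(week, total, col = "black"))
abline(v = 11, lty = 2)  # Week 11: the 13 March 2020 covid declaration
abline(v = 22, lty = 3)  # Week 22: the 25 May 2020 killing of George Floyd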

Table 1 of Schildkraut and Turanovic 2022 reports an interrupted time series analysis that does not address the killing of George Floyd, with a key estimate of 0.409 and a standard error of 0.072. Schildkraut and Turanovic 2022 reports a separate analysis about George Floyd...

However, since George Floyd's murder occurred after the onset of the COVID-19 declaration, we conducted ITSA using only the post-COVID time period (n = 53 weeks) and used the week of May 25, 2020 as the point of interruption in each time series. These results indicated that George Floyd's murder had no impact on changes in overall mass shootings (b = 0.354, 95% CI [−0.074, 0.781], p = .105) or private mass shootings (b = 0.125, 95% CI [−0.419, 0.669], p = .652), but that Floyd's murder was linked to increases in public mass shootings (b = 0.772, 95% CI [0.062, 1.483], p = .033).

...but Schildkraut and Turanovic 2022 does not report any attempt to assess whether there is sufficient evidence to attribute the increase in mass shootings to covid once the 0.354 estimate for Floyd is addressed. The lack of statistical significance for the 0.354 Floyd estimate can't be used to conclude "no impact", especially given that the analysis for the covid declaration had data for 52 weeks pre-declaration and 53 weeks post-declaration, whereas the analysis for Floyd had data for only 11 weeks pre-Floyd and 42 weeks post-Floyd.
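One way to assess this would be a single segmented regression with interruption terms for both the covid declaration and the Floyd killing. Below is a rough sketch of that idea; it is not the model Schildkraut and Turanovic 2022 estimated, and the file name, column names, and week indexing are assumptions:

# weekly series ordered from the first week of 2019 through the last post-covid week
shootings <- read.csv("weekly_mass_shootings.csv")  # hypothetical file name
shootings <- shootings[order(shootings$year, shootings$week), ]
shootings$time <- seq_len(nrow(shootings))
covid_week <- 53  # assumption: 52 pre-declaration weeks, per the 52/53 split described above
floyd_week <- 64  # assumption: 11 weeks after the declaration week
shootings$covid       <- as.numeric(shootings$time >= covid_week)
shootings$covid_trend <- pmax(shootings$time - covid_week + 1, 0)
shootings$floyd       <- as.numeric(shootings$time >= floyd_week)
shootings$floyd_trend <- pmax(shootings$time - floyd_week + 1, 0)
summary(lm(total ~ time + covid + covid_trend + floyd + floyd_trend, data = shootings))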

Schildkraut and Turanovic 2022 also disaggregated mass shootings into public mass shootings and private mass shootings. Corresponding plots by me are below. It doesn't look like the red line for the covid declaration is the break point for the increase in 2020 relative to 2019.

Astral Codex Ten discussed methods used to try to disentangle the effect of covid from the effect of Floyd, such as using prior protests and other countries as points of reference.

---

NOTES

1. In the Schildkraut and Turanovic 2022 data, some dates appeared in different weeks, such as 2019 Week 11 running from March 11 to March 17, but 2020 Week 11 running from March 9 to March 15.

2. The 13 March 2020 covid declaration occurred in the middle of Week 11, but the Floyd killing occurred at the start of Week 22, which ran from 25 May 2020 to 31 May 2020.

3. Data. R code for the "total" plot above.


PS: Political Science & Politics recently published Hartnett and Haver 2022 "Unconditional support for Trump's resistance prior to Election Day".

Hartnett and Haver 2022 reported on an experiment conducted in October 2020 in which likely Trump voters were asked to consider the hypothetical of a Biden win in the Electoral College and in the popular vote, with Biden's popular vote margin randomly assigned to be from 1 percentage point through 15 percentage points. These likely Trump voters were then asked whether the Trump campaign should resist or concede.

Data were collected before the election, but Hartnett and Haver 2022 did not report anything about a corresponding experiment involving likely Biden voters. Hartnett and Haver 2022 discussed a Reuters/Ipsos poll that "found that 41% of likely Trump voters would not accept a Biden victory and 16% of all likely Trump voters 'would engage in street protests or even violence' (Kahn 2020)". The Kahn 2020 source indicates that the corresponding percentages for Biden voters for a Trump victory were 43% and 22%, so it didn't seem like there was a good reason to not include a parallel experiment for Biden voters, especially because data on only Trump voters wouldn't permit valid inferences about the characteristics on which Trump voters were distinctive.

---

But text for a somewhat corresponding experiment involving likely Biden voters is hidden in the Hartnett and Haver 2022 codebook under white boxes or something like that. The text of the hidden items can be highlighted, copied, and pasted from the bottom of pages 19 and 20 of the codebook PDF (or more hidden text can be copied using Ctrl+A, then Ctrl+C, and then pasted with Ctrl+V).

The hidden codebook text indicates that the hartnett_haver block of the survey had a "bidenlose" item that asked likely Biden voters whether, if Biden wins the popular vote by the randomized percentage points and Trump wins the electoral college, the Biden campaign should "Resist the results of the election in any way possible" or "Concede defeat".

There might be an innocent explanation for Hartnett and Haver 2022 not reporting the results for those items, but that innocent explanation hasn't been shared with me yet on Twitter. Maybe Hartnett and Haver 2022 have a manuscript in progress about the "bidenlose" item.

---

NOTES

1. Hartnett and Haver 2022 seems to be the survey that Emily Badger at the New York Times referred to as "another recent survey experiment conducted by Brian Schaffner, Alexandra Haver and Brendan Hartnett at Tufts". The copied-and-pasted codebook text indicates that this was for the "2020 Tufts Class Survey".

2. On page 18 of the Hartnett and Haver 2022 codebook, above the hidden item about socialism, part of the text of the "certain advantages" item is missing, which seems to be a should-be-obvious indication that text has been covered.

3. The codebook seems to be missing pages of the full survey: in the copied-and-pasted text, page numbers jump from "Page 21 of 43" to "Page 24 of 43" to "Page 31 of 43" to "Page 33 of 43". Presumably at least some missing items were for other members of the Tufts class, although I'm not sure what happened to page 32, which seems to be part of the hartnett_haver block that started on page 31 and ended on page 33.

4. The dataset for Hartnett and Haver 2022 includes a popular vote percentage point win from 1 percentage point through 15 percentage points assigned to likely Biden voters, but the dataset has no data on a resist-or-concede outcome or on a follow-up open-ended item.


Suppose that Bob at time 1 believes that Jewish people are better than every other group, but Bob at time 2 changes his belief to be that Jewish people are no better or worse than every other group, and Bob at time 3 changes his belief to be that Jewish people are worse than every other group.

Suppose also that these changes in Bob's belief about Jewish people have a causal effect on his vote choices. Bob at time 1 will vote 100% of the time for a Jewish candidate running against a non-Jewish candidate, no matter the relative qualifications of the candidates. At time 2, a candidate's Jewish identity is irrelevant to Bob's vote choice, so that, if given a choice between a Jewish candidate and an all-else-equal non-Jewish candidate, Bob will flip a coin and vote for the Jewish candidate only 50% of the time. Bob at time 3 will vote 0% of the time for a Jewish candidate running against a non-Jewish candidate, no matter the relative qualifications of the candidates.

Based on this setup, what is your estimate of the influence of antisemitism on Bob's voting decisions?

---

I think that the effect of antisemitism is properly understood as the effect of negative attitudes about Jewish people, so that the effect can be estimated in the above setup as the difference between Bob's voting decisions at time 2, when Bob is indifferent to a candidate's Jewish identity, and Bob's voting decisions at time 3, when Bob has negative attitudes about Jewish people. Thus, the effect of antisemitism on Bob's voting decisions is a 50 percentage point decrease, from 50% to 0%.

For the first decrease, from 100% to 50%, neither of the beliefs involved (that Jewish people are better than every other group, and that Jewish people are no better or worse than every other group) is antisemitic, so none of this decrease should be attributed to antisemitism. Generally, I think that this means that respondents who have positive attitudes about a group should not be used to estimate the effect of negative attitudes about that group.
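To make the decomposition explicit, here is a minimal sketch of the arithmetic in the Bob example:

# probability that Bob votes for the Jewish candidate at each time
p_time1 <- 1.0  # believes Jewish people are better than every other group
p_time2 <- 0.5  # indifferent to a candidate's Jewish identity
p_time3 <- 0.0  # believes Jewish people are worse than every other group
# effect of antisemitism: indifference versus negative attitudes
p_time3 - p_time2  # -0.5, a 50 percentage point decrease
# the time 1 to time 2 change reflects the loss of a pro-Jewish preference,
# so it should not be attributed to antisemitism
p_time2 - p_time1  # -0.5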

---

So let's discuss the Race and Social Problems article: Sharrow et al 2021 "What's in a Name? Symbolic Racism, Public Opinion, and the Controversy over the NFL's Washington Football Team Name". The key predictor was a measure of resentment against Native Americans, built from responses to the statements below, measured on a 5-point scale from "strongly agree" to "strongly disagree":

Most Native Americans work hard to make a living just like everyone else.

Most Native Americans take unfair advantage of privileges given to them by the government.

My analysis indicates that 39% of the 1,500 participants (N=582) provided consistently positive responses about Native Americans on both items, agreeing or strongly agreeing with the first statement and disagreeing or strongly disagreeing with the second statement. I don't see why these 582 respondents should be included in an analysis that attempts to estimate the effect of negative attitudes about Native Americans, if these participants do not fall along the indifferent-to-negative-attitudes continuum about Native Americans.

So let's check what happens after removing these respondents from the analysis.

---

I first conducted an unweighted OLS regression using the full sample and controls to predict the summary Team Name Index outcome, which measured support for the Washington football team's name on a 0-to-1 scale. For this regression (N=1,024), the measure of resentment against Native Americans ranged from 0 for respondents who selected the most positive responses to both resentment items to 1 for respondents who selected the most negative responses to both resentment items. In this regression, the coefficient was 0.26 (t=6.31) for resentment against Native Americans.

I then removed respondents who provided positive responses about Native Americans for both resentment items. For this next unweighted OLS regression (N=572), the measure of resentment against Native Americans still had a value of 1 for respondents who provided the most negative responses to both resentment items; however, 0 was for participants who were neutral on one resentment item but provided the most positive response on the other resentment item, such as strongly agreeing that "Most Native Americans work hard to make a living just like everyone else" but neither agreeing nor disagreeing that "Most Native Americans take unfair advantage of privileges given to them by the government". In this regression, the coefficient was 0.12 (t=2.23) for resentment against Native Americans.

The drop is similar when the regressions include only the measure of resentment against Native Americans and no other predictors: the coefficient is 0.44 for the full sample, but is 0.22 after dropping respondents who provided positive responses about Native Americans for both resentment items.
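My analysis was in Stata (data, code, and output are linked in the notes), but a minimal R sketch of the restriction is below; the data frame name and item codings are assumptions (here 1 = strongly agree through 5 = strongly disagree), and the rescaling of the resentment measure within the restricted sample is omitted:

# sharrow: data frame with the two resentment items, the 0-to-1 resentment measure,
# and the 0-to-1 Team Name Index outcome (hypothetical variable names)
full_fit <- lm(teamname_index ~ resentment, data = sharrow)
# drop respondents who were consistently positive on both items:
# agree/strongly agree that most Native Americans work hard to make a living, and
# disagree/strongly disagree that most Native Americans take unfair advantage
keep <- !(sharrow$workhard <= 2 & sharrow$unfairadvantage >= 4)
restricted_fit <- lm(teamname_index ~ resentment, data = sharrow[keep, ])
coef(full_fit)["resentment"]        # 0.44 in my full-sample bivariate analysis
coef(restricted_fit)["resentment"]  # about 0.22 after dropping the both-positive respondents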

---

So I think that Sharrow et al 2021 might report substantial overestimates of the effect of resentment of Native Americans, because the estimates in Sharrow et al 2021 about the effect of negative attitudes about Native Americans included the effect of positive attitudes about Native Americans.

---

NOTES

1. About 20% of the Sharrow et al 2021 sample reported a negative attitude on at least one of the two measures of resentment against Native Americans. About 6% of the sample reported a negative attitude on both measures of resentment against Native Americans.

2. Sharrow et al 2021 indicated that "Our conclusions illustrate that symbolic racism toward Native Americans is central to interpreting the public's resistance toward changing the name, in sharp contrast to Snyder's claim that the name is about 'respect.'" (p. 111).

For what it's worth, the Sharrow et al 2021 data indicate that a nontrivial percentage of respondents with positive views of Native Americans somewhat or strongly disagreed with the claim that the Washington football team name is offensive (in an item that reported the name of the team at the time): 47% of respondents who provided positive responses about Native Americans for both resentment items, 47% of respondents who rated Native Americans at 100 on a 0-to-100 feeling thermometer, 40% of respondents who provided positive responses about Native Americans for both resentment items and rated Native Americans at 100 on a 0-to-100 feeling thermometer, and 32% of respondents who provided the most positive responses about Native Americans for both resentment items and rated Native Americans at 100 on a 0-to-100 feeling thermometer (although this 32% was only 22% in unweighted analyses).

3. Sharrow et al 2021 indicated a module sample of 1,500, but the sample size fell to 1,024 in model 3 of Table 1. My analysis indicates that this is largely due to missing values on the outcome variable (N=1,362), the NFL sophistication index (N=1,364), and the measure of resentment of Native Americans (N=1,329).

4. Data for my analysis. Stata code and output.

5. Social Science Quarterly recently published Levin et al 2022 "Validating and testing a measure of anti-semitism on support for QAnon and vote intention for Trump in 2020", which also estimates the effect of negative attitudes about a target group without excluding participants who favor the target group.


The American Political Science Review recently published a letter: Stephens-Dougan 2022 "White Americans' reactions to racial disparities in COVID-19".

Figure 1 of the Stephens-Dougan 2022 APSR letter reports results for four outcomes among racially prejudiced Whites. For only one of the four reported outcomes does the 84% confidence interval in the control overlap with the 84% confidence interval in the treatment (zooming in on Figure 1, the confidence intervals for the parks outcome don't seem to overlap, and the code returns 0.1795327 for the upper bound for the control and 0.18800818 for the lower bound for the treatment). And even the outcome with the most obviously overlapping 84% confidence intervals seems to be interpreted as sufficient evidence of an effect, given that all four reported outcomes are discussed in the passage below:

When racially prejudiced white Americans were exposed to the racial disparities information, there was an increase in the predicted probability of indicating that they were less supportive of wearing face masks, more likely to feel their individual rights were being threatened, more likely to support visiting parks without any restrictions, and less likely to think African Americans adhere to social distancing guidelines.

---

There are at least three things to keep track of: [1] the APSR letter; [2] the survey questionnaire, located at the OSF site for the Time-sharing Experiments for the Social Sciences project; and [3] the pre-analysis plan, located at the OSF and in the appendix of the APSR article. I'll use the PDF of the pre-analysis plan. The TESS site also has the proposal for the survey experiment, but I won't discuss that in this post.

---

The pre-analysis plan does not mention all potential outcome variables that are in the questionnaire, but the pre-analysis plan section labeled "Hypotheses" includes the passage below:

Specifically, I hypothesize that White Americans with anti-Black attitudes and those White Americans who attribute racial disparities in health to individual behavior (as opposed to structural factors), will be more likely to disagree with the following statements:

The United States should take measures aimed at slowing the spread of the coronavirus while more widespread testing becomes available, even if that means many businesses will have to stay closed.

It is important that people stay home rather than participating in protests and rallies to pressure their governors to reopen their states.

I also hypothesize that White Americans with anti-Black attitudes and who attribute racial health disparities to individual behavior will be more likely to agree with the following statements:

State and local directives that ask people to "shelter in place" or to be "safer at home" are a threat to individual rights and freedom.

The United States will take too long in loosening restrictions and the economic impact will be worse with more jobs being lost

The four outcomes mentioned in the passage above correspond to items Q15, Q18, Q16, and Q21 in the survey questionnaire, but, of these four outcomes, the APSR letter reported on only Q16.

The outcome variables in the APSR letter are described as: "Wearing facemasks is not important", "Individual rights and freedom threatened", "Visit parks without any restrictions", and "Black people rarely follow social distancing guidelines". These outcome variables correspond to survey questionnaire items Q20, Q16, Q23A, and Q22A.

---

The pre-analysis plan PDF mentions moderators, with three moderators about racial predispositions: racial resentment, negative stereotype endorsement, and attributions for health disparities. The plan indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions. For ideology and party, we will use three bins. We will include each bin as a dummy variable, omitting one category as a baseline.

The APSR letter reported on only one racial predispositions moderator: negative stereotype endorsement.

---

I'll post a link in the notes below to some of my analyses of the "Specifically, I hypothesize" outcomes, but I don't want to focus on the results. This post is instead about deviations from the pre-analysis plan. Regardless of whether the estimates from the analyses in the APSR letter are similar to the estimates from the planned analyses, I think that it's bad that readers can't trust the APSR to ensure that a pre-analysis plan is followed, or at least to provide an explanation about why a pre-analysis plan was not followed, especially given that this APSR letter described itself as reporting on "a preregistered survey experiment" and included the pre-analysis plan in the appendix.

---

NOTES

1. The Stephens-Dougan 2022 APSR letter suggests that the negative stereotype endorsement variable was coded dichotomously ("a variable indicating whether the respondent either endorsed the stereotype that African Americans are less hardworking than whites or the stereotype that African Americans are less intelligent than whites"), but the code and the appendix of the APSR letter indicate that the negative stereotype endorsement variable was measured so that the highest level is for respondents who reported a negative relative stereotype about Blacks for both stereotypes. From Table A7:

(unintelligentstereotype2 + lazystereotype2)/2

In the data after running the code for the APSR letter, the negative stereotype endorsement variable is a three-level variable coded 0 for respondents who did not report a negative relative stereotype about Blacks for either stereotype, 0.5 for respondents who reported a negative stereotype about Blacks for one stereotype, and 1 for respondents who reported a negative relative stereotype about Blacks for both stereotypes.
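A minimal sketch of that coding, using the variable names from the replication code (the data frame name dat is a placeholder):

# unintelligentstereotype2 and lazystereotype2 are 0/1 indicators of reporting a
# negative relative stereotype about Blacks on each item
dat$negstereotype_endorsement <- (dat$unintelligentstereotype2 + dat$lazystereotype2) / 2
table(dat$negstereotype_endorsement)  # three levels: 0, 0.5, and 1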

2. The APSR letter indicated that:

The likelihood of racially prejudiced respondents in the control condition agreeing that shelter-in-place orders threatened their individual rights and freedom was 27%, compared with a likelihood of 55% in the treatment condition (p < 0.05 for a one-tailed test).

My analysis using survey weights got 44% and 29% among participants who reported a negative relative stereotype about Blacks for at least one of the two stereotype items, and my analysis got 55% and 26% among participants who reported negative relative stereotypes about Blacks for both stereotype items, with a trivial overlap in 84% confidence intervals.

But the 55% and 26% in a weighted analysis were 43% and 37% in an unweighted analysis with a large overlap in 84% confidence intervals, suggesting that at least some of the results in the APSR letter might be limited to the weighted analysis. I ran the code for the APSR letter removing the weights from the glm command and got the revised Figure 1 plot below. The error bars in the APSR letter are described as 84% confidence intervals.
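For what it's worth, the kind of comparison involved can be sketched as below; the model specification is a placeholder rather than the letter's actual code, and the treatment, weight, and dat names are assumptions:

# weighted versus unweighted logit for one dichotomous outcome
fit_weighted   <- glm(individualrights_dichotomous ~ treatment, family = binomial,
                      data = dat, weights = weight)
fit_unweighted <- glm(individualrights_dichotomous ~ treatment, family = binomial, data = dat)
# 84% confidence intervals, as in Figure 1 of the letter; non-overlap of two 84% intervals
# roughly corresponds to a p < .05 difference
confint.default(fit_weighted, level = 0.84)
confint.default(fit_unweighted, level = 0.84)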

I think that it's fine to favor the weighted analysis, but I'd prefer that publications indicate when results from an experiment are not robust to the application or non-application of weights. Relevant publication.

Given the results in my notes [1] and [2], maybe the APSR letter's Figure 1 estimates are for only respondents who reported a negative relative stereotype about Blacks for both stereotypes. If so, the APSR letter's suggestion that this population is the 26% that reported anti-Black stereotypes for either stereotype might be misleading, if the Figure 1 analyses were estimated for only the 10% that reported a negative relative stereotype about Blacks for both stereotypes.

For what it's worth, the R code for the APSR letter has code that doesn't use the 0.5 level of the negative stereotype endorsement variable, such as:

# Below are code for predicted probabilities using logit model

# Predicted probability "individualrights_dichotomous"

# Treatment group, negstereotype_endorsement = 1

p1.1 <- invlogit(coef(glm1)[1] + coef(glm1)[2] * 1 + coef(glm1)[3] * 1 + coef(glm1)[4] * 1)

It's possible to see what happens to the Figure 1 results when the negative stereotype endorsement variable is coded 1 for respondents who endorsed at least one of the stereotypes. Run this at the end of the Stata code for the APSR letter:

replace negstereotype_endorsement = ceil((unintelligentstereotype2 + lazystereotype2)/2)

Then run the R code for the APSR letter. Below is the plot I got for a revised Figure 1, with weights applied and the sample limited to respondents who endorsed at least one of the stereotypes:

Estimates in the figure above were close to estimates in my analysis using these Stata commands after running the Stata code from the APSR letter. Stata output.

4. Data, Stata code, and Stata output for my analysis about the "Specifically, I hypothesize" passage of the Stephens-Dougan pre-analysis plan.

My analysis in the Stata output had seven outcomes:

* the four outcomes mentioned in the "Specifically, I hypothesize" part of the pre-analysis plan as initially measured (corresponding to questionnaire items Q15, Q18, Q16, and Q21), with no dichotomization of the five-point response scales for Q15, Q18, and Q16;

* two of these outcomes (Q15 and Q16) dichotomized as mentioned in the pre-analysis plan (e.g., "more likely to disagree" was split into disagree / not disagree categories, with the not disagree category including respondent skips); and

* one outcome (Q18) dichotomized so that one category has "Not Very Important" and "Not At All Important" and the other category has the other responses and skips, given that the pre-analysis plan had this outcome dichotomized as disagree but the response options in the survey were not on an agree-to-disagree scale.

Q21 was measured as a dichotomous variable.

The analysis was limited to presumed racially prejudiced Whites, because I think that that's what the pre-analysis plan hypotheses quoted above focused on. Moreover, that analysis seems more important than a mere difference between groups of Whites.

Note that, for at least some results, a p<0.05 treatment effect might be in the unintuitive direction, so be careful before interpreting a p<0.05 result as evidence for the hypotheses.

My analyses aren't the only analyses that can be conducted, and it might be a good idea to combine results across outcomes mentioned in the pre-analysis plan or across all outcomes in the questionnaire, given that the questionnaire had at least 12 items that could serve as outcome variables.

For what it's worth, I wouldn't be surprised if a lot of people who respond to survey items in an unfavorable way about Blacks backlashed against a message about how Blacks were more likely than Whites to die from covid-19.

5. The pre-analysis plan included a footnote that:

Given the results from my pilot data, it is also my expectation that partisanship will moderate the effect of the treatment or that the treatment effects will be concentrated among Republican respondents.

Moreover, the pre-analysis plan indicated that:

The condition and treatment will be blocked by party identification so that there are roughly equal numbers of Republicans and Democrats in each condition.

But the lone mention of "Repub-" in the APSR letter is:

The sample was 39% self-identified Democrats (including leaners) and 46% self-identified Republicans (including leaners).

6. Link to tweets about the APSR letter.
