Broockman 2013 "Black politicians are more intrinsically motivated to advance Blacks' interests: A field experiment manipulating political incentives" reported results from an experiment in which U.S. state legislators were sent an email from "Tyrone Washington", which is a name that suggests that the email sender is Black. The experimental manipulation was that "Tyrone" indicated that the city that he lived in was a city in the legislator's district or was a well-known city far from the legislator's district.

Based on Table 2 column 2, response percentages were:

  • 56.1% from in-district non-Black legislators
  • 46.4% from in-district Black legislators (= 0.561 - 0.097)
  • 28.6% from out-of-district non-Black legislators (= 0.561 - 0.275)
  • 41.4% from out-of-district Black legislators (= 0.561 - 0.275 + 0.128)

---

Broockman 2013 lacked another emailer to serve as comparison for response rates to Tyrone, such as an emailer with a stereotypical White name. Broockman 2013 discusses this:

One challenge in designing the experiment was that there were so few black legislators in the United States (as of November 2010) that a set of white letter placebo conditions could not be implemented due to a lack of adequate sample size.

So all emails in the Broockman 2013 experiment were signed "Tyrone Washington".

---

But here is how Broockman 2013 was cited by Rhinehar 2020 in American Politics Research:

A majority of this work has explored legislator responsiveness by varying the race or ethnicity of the email sender (Broockman, 2013;...

---

Costa 2017 in the Journal of Experimental Political Science:

As for variables that do have a statistically significant effect, minority constituents are almost 10 percentage points less likely to receive a response than non-minority constituents (p < 0.05). This is consistent with many individual studies that have shown requests from racial and ethnic minorities are given less attention overall, and particularly when the recipient official does not share their race (Broockman, 2013;...

But Broockman 2013 didn't vary the race of the requester, so I'm not sure of the basis for the suggestion that Broockman 2013 provided evidence that requests from racial and ethnic minorities are given less attention overall.

---

Mendez and Grose 2018 in Legislative Studies Quarterly:

Others argue or show, through experimental audit studies, that political elites have biases toward minority constituents when engaging in nonpolicy representation (e.g.,Broockman 2013...

I'm not sure how Broockman 2013 permits an inference of political elite bias toward minority constituents, when the only constituent was Tyrone.

---

Lajevardi 2018 in Politics, Groups, and Identities:

Audit studies have previously found that public officials are racially biased in whether and how they respond to constituent communications (e.g., Butler and Broockman 2011; Butler, Karpowitz, and Pope 2012; Broockman 2013;...

---

Dinesen et al 2021 in the American Political Science Review:

In the absence of any extrinsic motivations, legislators still favor in-group constituents (Broockman 2013), thereby indicating a role for intrinsic motivations in unequal responsiveness.

Again, Tyrone was the only constituent in Broockman 2013.

---

Hemker and Rink 2017 in the American Journal of Political Science:

White officials in both the United States and South Africa are more likely to respond to requests from putative whites, whereas black politicians favor putative blacks (Broockman 2013, ...

---

McClendon 2016 in the Journal of Experimental Political Science:

Politicians may seek to favor members of their own group and to discriminate against members of out-groups (Broockman, 2013...

---

Gell-Redman et al 2018 in American Politics Research:

Studies that explore other means of citizen and legislator interaction have found more consistent evidence of bias against minority constituents. Notably, Broockman (2013) finds that white legislators are significantly less likely to respond to black constituents when the political benefits of doing so were diminished.

But the only constituent was Tyrone, so you can't properly infer bias against Tyrone or minority constituents more generally, because the experiment didn't indicate whether the out-of-district drop-off for Tyrone differed from the out-of-district drop-off for a putative non-Black emailer.

---

Broockman 2014 in the American Journal of Political Science:

Outright racial favoritism among politicians themselves is no doubt real (e.g., Broockman 2013b;...

But who was Tyrone favored more than or less than?

---

Driscoll et al 2018 in the American Journal of Political Science:

Broockman (2013) finds that African American state legislators expend more effort to improve the welfare of black voters than white state legislators, irrespective of whether said voters reside in their districts.

Even ignoring the added description of the emailer as a "voter", response rates to Tyrone were not "irrespective" of district residence. Broockman 2013 even plotted data for the matched case analysis, in which the bar for in-district Black legislators was not longer than the bar for in-district non-Black legislators:

---

Shoub et al 2020 in the Journal of Race, Ethnicity, and Politics:

Black politicians are more likely to listen and respond to black constituents (Broockman 2013),...

The prior context in Shoub et al 2020 suggests that the "more likely" comparison is to non-Black politicians, but this description loses the complication in which Black legislators were not more likely than non-Black legislators to respond to in-district Tyrone, which is especially important if we reasonably assume that in-district Tyrone was perceived to be a constituent and out-of-district Tyrone wasn't. Same problem with Christiani et al 2021 in Politics, Groups, and Identities:

Black politicians are more likely to listen and respond to black constituents than white politicians (Broockman
2013)...

The similar phrasing for the above two passages might be due to the publications having the same group of authors: Shoub Epp Baumgartner Christiani Roach, and Christiani Shoub Baumgartner Epp Roach.

---

Gleason and Stout 2014 in the Journal of Black Studies:

Recent experimental studies conducted by Butler and Broockman (2011) and Broockman (2013) confirm these findings. These studies show that Black elected officials are more likely to help co-racial constituents in and outside of their districts gain access to the ballot more than White elected officials.

This passage, from what I can tell, describes both citations incorrectly: in Broockman 2013, Tyrone was asking for help getting unemployment benefits, and I'm not sure what the basis is for the "in...their districts" claim: in-district response rates were 56.1% from non-Black legislators and 46.4% from Black legislators. The Butler and Broockman 2011 appendix reports results such as DeShawn receiving responses from 41.9%, 22.4%, and 44.0% of Black Democrat legislators when DeShawn respectively asked about a primary, a Republican primary, and a Democratic primary and, respectively, from 54.3%, 56.1%, and 62.1% of White Democrat legislators.

But checking citations to Butler and Broockman 2011 would be another post.

---

NOTES

1. The above isn't a systematic analysis of citations of Broockman 2013, so no strong inferences should be made about the percentage of times Broockman 2013 was cited incorrectly, other than maybe too often, especially in these journals.

2. I think that, for the Broockman 2013 experiment, a different email could have been sent from a putative White person, without sample size concerns. Imagine that "Billy Bob" emailed each legislator asking for help with, say, welfare benefits. If, like with Tyrone, Black legislator response rates were similar for in-district Billy Bob and for out-of-district Billy Bob, that would provide a strong signal to not attribute the similar rates to an intrinsic motivation to advance Blacks' interests. But if the out-of-district drop off in Black legislator response rates was much larger for Billy Bob than for Tyrone, that would provide a strong signal to attribute the similar Black legislator response rates for in-district Tyrone and out-of-district Tyrone to an intrinsic motivation to advance Blacks' interests.

3. I think that the error bars in Figure 1 above might be 50% confidence intervals, given that the error bars seems to match the Stata command "reg code_some treat_out treatXblack leg_black [iweight=cem_weights], level(50)" that I ran on the Broockman 2013 data after line 17 in the Stata do file.

4. I shared this post with David Broockman, who provided the following comments:

Hi LJ,

I think you're right that some of these citations are describing my paper incorrectly and probably meant to cite my 2011 paper with Butler. (FWIW, in that study, we find legislators of all races seem to just discriminate in favor of their race, across both parties, so some of the citations don't really capture that either....)

The experiment would definitely be better with a white control, there was just a bias-variance trade-off here -- adding a putative race of constituent factor in the experiment would mean less bias but more variance. I did the power calculations and didn't think the experiment would be well-powered enough if I made the cells that small and were looking for a triple interaction between legislator race X letter writer putative race X in vs. out of district. In the paper I discuss a few alternative explanations that the lack of a white letter introduces and do some tests for them (see the 3 or 4 paragraphs starting with "One challenge..."). Essentially, I didn't see any reason why we should expect black legislators to just be generically less sensitive to whether a person is in their district, especially given in our previous paper we found they reacted pretty strongly to the race of the email sender (so it's not like the black legislators who do respond to emails just don't read emails carefully). Still, I definitely still agree with what I wrote then that this is a weakness of the study. It would be nice for someone to replicate this study, and I like the idea you have in footnote 2 for doing this. Someone should do that study!

Tagged with: , ,

Research involves a lot of decisions, which in turn provides a lot of opportunities for research to be incorrect or substandard, such as mistakes in recoding a variable, not using the proper statistical method, or not knowing unintuitive elements of statistical software such as how Stata treats missing values in logical expressions.

Peer and editorial review provides opportunities to catch flaws in research, but some journals that publish political science don't seem to be consistently doing a good enough job at this. Below, I'll provide a few examples that I happened upon recently and then discuss potential ways to help address this.

---

Feinberg et al 2022

PS: Political Science & Politics published Feinberg et al 2022 "The Trump Effect: How 2016 campaign rallies explain spikes in hate", which claims that:

Specifically, we established that the words of Donald Trump, as measured by the occurrence and location of his campaign rallies, significantly increased the level of hateful actions directed toward marginalized groups in the counties where his rallies were held.

After Feinberg et al published a similar claim in the Monkey Cage in 2019, I asked the lead author about the results when the predictor of hosting a Trump rally is replaced with a predictor of hosting a Hillary Clinton rally.

I didn't get a response from Ayal Feinberg, but Lilley and Wheaton 2019 reported that the point estimate for the effect on the count of hate-motivated events is larger for hosting a Hillary Clinton rally than for hosting a Donald Trump rally. Remarkably, the Feinberg et al 2022 PS article does not address the Lilley and Wheaton 2019 claim about Clinton rallies, even though the supplemental file for the Feinberg et al 2022 PS article discusses a different criticism from Lilley and Wheaton 2019.

The Clinton rally counterfactual is an obvious way to assess the claim that something about Trump increased hate events. Even if the reviewers and editors for PS didn't think to ask about the Clinton rally counterfactual, that counterfactual analysis appears in the Reason magazine criticism that Feinberg et al 2022 discusses in its supplemental files, so the analysis was presumably available to the reviewers and editors.

Will May has published a PubPeer comment discussing other flaws of the Feinberg et al 2022 PS article.

---

Christley 2021

The impossible "p < .000" appears eight times in Christley 2021 "Traditional gender attitudes, nativism, and support for the Radical Right", published in Politics & Gender.

Moreover, Christley 2021 indicates that (emphasis added):

It is also worth mentioning that in these data, respondent sex does not moderate the relationship between gender attitudes and radical right support. In the full model (Appendix B, Table B1), respondent sex is correlated with a higher likelihood of supporting the radical right. However, this finding disappears when respondent sex is interacted with the gender attitudes scale (Table B2). Although the average marginal effect of gender attitudes on support is 1.4 percentage points higher for men (7.3) than it is for women (5.9), there is no significant difference between the two (Figure 5).

Table B2 of Christley 2021 has 0.64 and 0.250 for the logit coefficient and standard error for the "Male*Gender Scale" interaction term, with no statistical significance asterisks; the 0.64 is the only table estimate without results reported to three decimal places, so it's not clear to me from the table if the asterisks are missing or is the estimate should be, say, 0.064 instead of 0.64. The sample size for the Table B2 regression is 19,587, so a statistically significant 1.4-percentage-point difference isn't obviously out of the question, from what I can tell.

---

Hua and Jamieson 2022

Politics, Groups, and Identities published Hua and Jamieson 2022 "Whose lives matter? Race, public opinion, and military conflict".

Participants were assigned to a control condition with no treatment, to a placebo condition with an article about baseball gloves, or to an article about a U.S. service member being killed in combat. The experimental manipulation was the name of the service member, intended to signal race: Connor Miller, Tyrone Washington, Javier Juarez, Duc Nguyen, and Misbah Ul-Haq.

Inferences from Hua and Jamieson 2022 include:

When faced with a decision about whether to escalate a conflict that would potentially risk even more US casualties, our findings suggest that participants are more supportive of escalation when the casualties are of Pakistani and African American soldiers than they are when the deaths are soldiers from other racial–ethnic groups.

But, from what I can tell, this inference of participants being "more supportive" depending on the race of the casualties is based on differences in statistical significance when each racial condition is compared to the control condition. Figure 5 indicates a large enough overlap between confidence intervals for the racial conditions for this escalation outcome to prevent a confident claim of "more supportive" when comparing racial conditions to each other.

Figure 5 seems to plot estimates from the first column in Table C.7. The largest racial gap in estimates is between the Duc Nguyen condition (0.196 estimate and 0.133 standard error) and the Tyrone Washington condition (0.348 estimate and 0.137 standard error). So this difference in means is 0.152, and I don't think that there is sufficient evidence to infer that these estimates differ from each other. 83.4% confidence intervals would be about [0.01, 0.38] and [0.15, 0.54].

---

Walker et al 2022

PS: Political Science & Politics published Walker et al 2022 "Choosing reviewers: Predictors of undergraduate manuscript evaluations", which, for the regression predicting reviewer ratings of manuscript originality, interpreted a statistically significant -0.288 OLS coefficient for "White" as indicating that "nonwhite reviewers gave significantly higher originality ratings than white reviewers". But the table note indicates that the "originality" outcome variable is coded 1 for yes, 2 for maybe, and 3 for no, so that the "higher" originality ratings actually indicate lower ratings of originality.

Moreover, Walker et al 2022 claims that:

There is no empirical linkage between reviewers' year in school and major and their assessment of originality.

But Table 2 indicates p<0.01 evidence that reviewer major associates with assessments of originality.

And the "a", "b", and "c" notes for Table 2 are incorrectly matched to the descriptions; for example, the "b" note about the coding of the originality outcome is attached to the other outcome.

The "higher originality ratings" error has been corrected, but not the other errors. I mentioned only the "higher" error in this tweet, so maybe that explains that. It'll be interesting to see if PS issues anything like a corrigendum about "Trump rally / hate" Feinberg et al 2022, given that the flaw in Feinberg et al 2022 seems a lot more important.

---

Fattore et al 2022

Social Science Quarterly published Fattore et al 2022 "'Post-election stress disorder?' Examining the increased stress of sexual harassment survivors after the 2016 election". For a sample of women participants, the analysis uses reported experience being sexually harassed to predict a dichotomous measure of stress due to the 2016 election, net of controls.

Fattore et al 2022 Table 1 reports the standard deviation for a presumably multilevel categorical race variable that ranges from 0 to 4 and for a presumably multilevel categorical marital status variable that ranges from 0 to 2. Fattore et al 2022 elsewhere indicates that the race variable was coded 0 for white and 1 for minority, but indicates that the marital status variable is coded 0 for single, 1 for married/coupled, and 2 for separated/divorced/widowed, so I'm not sure how to interpret regression results for the marital status predictor.

And Fattore et al 2022 has this passage:

With 95 percent confidence, the sample mean for women who experienced sexual harassment is between 0.554 and 0.559, based on 228 observations. Since the dependent variable is dichotomous, the probability of a survivor experiencing increased stress symptoms in the post-election period is almost certain.

I'm not sure how to interpret that passage: Is the 95% confidence interval that thin (0.554, 0.559) based on 228 observations? Is the mean estimate of about 0.554 to 0.559 being interpreted as almost certain? Here is the paragraph that that passage is from.

---

Hansen and Dolan 2022

Political Behavior published Hansen and Dolan 2022 "Cross‑pressures on political attitudes: Gender, party, and the #MeToo movement in the United States".

Table 1 of Hansen and Dolan 2022 reported results from a regression limited to 694 Republican respondents in a 2018 ANES survey, which indicated that the predicted feeling thermometer rating about the #MeToo movement was 5.44 units higher among women than among men, net of controls, with a corresponding standard error of 2.31 and a statistical significance asterisk. However, Hansen and Dolan 2022 interpreted this to not provide sufficient evidence of a gender gap:

In 2018, we see evidence that women Democrats are more supportive of #MeToo than their male co-partisans. However, there was no significant gender gap among Republicans, which could signal that both women and men Republican identifiers were moved to stand with their party on this issue in the aftermath of the Kavanaugh hearings.

Hansen and Dolan 2022 indicated that this inference of no significant gender gap is because, in Figure 1, the relevant 95% confidence interval for Republican men overlapped with the corresponding 95% confidence interval for Republican women.

Footnote 9 of Hansen and Dolan 2022 noted that assessing statistical significance using overlap of 95% confidence intervals is a "more rigorous standard" than using a p-value threshold of p=0.05 in a regression model. But Footnote 9 also claimed that "Research suggests that using non-overlapping 95% confidence intervals is equivalent to using a p < .06 standard in the regression model (Schenker & Gentleman, 2001)", and I don't think that this "p < .06" claim is correct or at least not misleading.

My Stata analysis of the data for Hansen and Dolan 2022 indicated that the p-value for the gender gap among Republicans on this item is p=0.019, which is about what would be expected given data in Table 1 of a t-statistic of 5.44/2.31 and more than 600 degrees of freedom. From what I can tell, the key evidence from Schenker and Gentleman 2001 is Figure 3, which indicates that the probability of a Type 1 error using the overlap method is about equivalent to p=0.06 only when the ratio of the two standard errors is about 20 or higher.

This discrepancy in inferences might have been avoided if 83.4% confidence intervals were more commonly taught and recommended by editors and reviewers, for visualizations in which the key comparison is between two estimates.

---

Footnote 10 of Hansen and Dolan 2022 states:

While Fig. 1 appears to show that Republicans have become more positive towards #MeToo in 2020 when compared to 2018, the confidence bounds overlap when comparing the 2 years.

I'm not sure what that refers to. Figure 1 of Hansen and Dolan 2022 reports estimates for Republican men in 2018, Republican women in 2018, Republican men in 2020, and Republican women in 2020, with point estimates increasing in that order. Neither 95% confidence interval for Republicans in 2020 overlaps with either 95% confidence interval for Republicans in 2018.

---

Other potential errors in Hansen and Dolan 2022:

[1] The code for the 2020 analysis uses V200010a, which is a weight variable for the pre-election survey, even though the key outcome variable (V202183) was on the post-election survey.

[2] Appendix B Table 3 indicates that 47.3% of the 2018 sample was Republican and 35.3% was Democrat, but the sample sizes for the 2018 analysis in Table 1 are 694 for the Republican only analysis and 1001 for the Democrat only analysis.

[3] Hansen and Dolan 2022 refers multiple times to predictions of feeling thermometer ratings as predicted probabilities, and notes for Tables 1 and 2 indicate that the statistical significance asterisk is for "statistical significance at p > 0.05".

---

Conclusion

I sometimes make mistakes, such as misspelling an author's name in a prior post. In 2017, I preregistered an analysis that used overlap of 95% confidence intervals to assess evidence for the difference between estimates, instead of a preferable direct test for a difference. So some of the flaws discussed above are understandable. But I'm not sure why all of these flaws got past review at respectable journals.

Some of the flaws discussed above are, I think, substantial, such as the political bias in Feinberg et al 2022 not reporting a parallel analysis for Hillary Clinton rallies, especially with the Trump rally result being prominent enough to get a fact check from PolitiFact in 2019. Some of the flaws discussed above are trivial, such as "p < .000". But even trivial flaws might justifiably be interpreted as reflecting a review process that is less rigorous than it should be.

---

I think that peer review is valuable at least for its potential to correct errors in analyses and to get researchers to report results that they otherwise wouldn't report, such as a robustness check suggested by a reviewer that undercuts the manuscript's claims. But peer review as currently practiced doesn't seem to do that well enough.

Part of the problem might be that peer review at a lot of political science journals combines [1] assessment of the contribution of the manuscript and [2] assessment of the quality of the analyses, often for manuscripts that are likely to be rejected. Some journals might benefit from having a (or having another) "final boss" who carefully reads conditionally accepted manuscripts only for assessment [2], to catch minor "p < .000" types of flaws, to catch more important "no Clinton rally analysis" types of flaws, and to suggest robustness checks and additional analyses.

But even better might be opening peer review to volunteers, who collectively could plausibly do a better job than a final boss could do alone. I discussed the peer review volunteer idea in this symposium entry. The idea isn't original to me; for example, Meta-Psychology offers open peer review. The modal number of peer review volunteers for a publication might be zero, but there is a good chance that I would have raised the "no Clinton rally analysis" criticism had PS posted a conditionally accepted version of Feinberg et al 2022.

---

Another potentially good idea would be for journals or an organization such as APSA to post at least a small set of generally useful advice, such as reporting results for a test for differences between estimates if the manuscript suggests a difference between estimates. More specific advice could be posted by topic, such as, for count analyses, advice about predicting counts in which the opportunity varies by observation: Lilley and Wheaton 2019 discussed this page, but I think that this page has an explanation that is easier to understand.

---

NOTES

1. It might be debatable whether this is a flaw per se, but Long 2022 "White identity, Donald Trump, and the mobilization of extremism" reported correlational results from a survey experiment but, from what I can tell, didn't indicate whether any outcomes differed by treatment.

2. Data for Hansen and Dolan 2022. Stata code for my analysis:

desc V200010a V202183

svyset [pw=weight]

svy: reg metoo education age Gender race income ideology2 interest media if partyid2=="Republican"

svy: mean metoo if partyid2=="Republican" & women==1

3. The journal Psychological Science is now publishing peer reviews. Peer reviews are also available for the journal Meta-Psychology.

4. Regarding the prior post about Lacina 2022 "Nearly all NFL head coaches are White. What are the odds?", Bethany Lacina discussed that with me on Twitter. I have published an update at that post.

5. I emailed or tweeted to at least some authors of the aforementioned publications discussing the planned comments or indicating at least some of the criticism. I received some feedback from one of the authors, but the author didn't indicate that I had permission to acknowledge the author.

Tagged with: , , , , ,

1.

In 2003, Melissa V. Harris-Lacewell wrote that (p. 222):

The defining works of White racial attitudes fail to grapple with the complexities of African American political thought and life. In these studies, Black people are a static object about which White people form opinions.

Researchers still sometimes make it difficult to analyze data from Black participants or don't report interesting data on Black participants. Helping to address this, Darren W. Davis and David C. Wilson have a new book Racial Resentment in the Political Mind (RRPM), with an entire chapter on African Americans' resentment toward Whites.

RRPM is a contribution to research on Black political attitudes, and its discussion of measurement of Whites' resentment toward Blacks is nice, especially for people who don't realize that standard measures of "racial resentment" aren't good measures of resentment. But let me discuss some elements of the book that I consider flawed.

---

2.

RRPM draws, at a high level, a parallel between Whites' resentment toward Blacks and Blacks' resentment toward Whites (p. 242):

In essence, the same model of a just world and appraisal of deservingness that guides Whites' racial resentment also guides African Americans' racial resentment.

That seems reasonable, to have the same model for resentment toward Whites and resentment toward Blacks. But RRPM proposes different items for a battery of resentment toward Blacks and for a battery of resentment toward Whites, and I think that different batteries for each type of resentment will undercut comparison of the size of the effects of these two different resentments, because one battery might capture true resentment better than another battery.

Thus, especially for general surveys such as the ANES that presumably can't or won't devote space to batteries measuring resentments tailored to each racial group, it might be better to measure resentment toward various groups with generalizable items such as agreement/disagreement to statements such as "Whites have gotten more than they deserve" and "Blacks have gotten more than they deserve", which hopefully would produce more valid comparisons of the estimated effect of resentments toward different groups, compared to comparison of batteries of different items.

---

3.

RRPM suggests that all resentment batteries not be given to all respondents (p. 241):

A clear outcome of this chapter is that African Americans should not be presented the same classic racial resentment survey items that Whites would answer (and perhaps vice versa)...

And from page 30:

African Americans and Whites have different reasons to be resentful toward each other, and each group requires a unique set of measurement items to capture resentment.

But not giving participants items measuring resentment of their own racial group doesn't seem like a good idea, because a White participant could think that Whites have received more than they deserve on average, and a Black participant could think that Blacks have received more than they deserve on average, so that omitting White resentment of Whites and similar measures could plausibly bias estimates of the effect of resentment, if resentment of one's own racial group influences a participant's attitudes about political phenomena.

---

RRPM discusses asking Blacks to respond to racial resentment items toward Blacks: "No groups other than African Americans seem to be asked questions about self-hate" (p. 249). RRPM elsewhere qualifies this with "rarely": "That is, asking African Americans to answer questions about disaffection toward their own group is a task rarely asked of other groups"  (p. 215).

The ANES 2016 pilot study did ask White participants about White guilt (e.g., "How guilty do you feel about the privileges and benefits you receive as a white American?") without asking any other racial groups about parallel guilt. Moreover, the CCES had (in 2016 and 2018 at least) an agree/disagree item asked of Whites and others that "White people in the U.S. have certain advantages because of the color of their skin", with no equivalent item about color-of-skin advantages for people who are not White.

But even if Black participants disproportionately receive resentment items directed at Blacks, the better way to address this inequality and to understand racial attitudes is to add resentment items directed at other groups.

---

4.

RRPM seems to suggest an asymmetry in that only Whites' resentment is normatively bad (p. 25):

In the end, African Americans' quest for civil rights and social justice is resented by Whites, and Whites' maintenance of their group dominance is resented by African Americans.

Davis and Wilson discussed RRPM in a video on the UC Public Policy Channel, with Davis suggesting that "a broader swath of citizens need to be held accountable for what they believe" (at 6:10) and that "...the important conversation we need to have is not about racists. Okay. We need to understand how ordinary American citizens approach race, approach values that place them in the same bucket as racists. They're not racists, but they support the same thing that racists support" (at 53:37).

But, from what I can tell, the ordinary American citizens in the same bucket as racists don't seem to be, say, people who support hiring preferences for Blacks for normatively good reasons and just happen to have the same policy preferences as people who support hiring preferences for Blacks because of racism against Whites. Instead, my sense is that the racism in question is limited to racism that causes racial inequality: David C. Wilson at 3:24 in the UC video:

And so, even if one is not racist, they can still exacerbate racial injustice and racial inequality by focusing on their values rather than the actual problem and any solutions that might be at bay to try and solve them.

---

Another apparent asymmetry is that RRPM mentions legitimizing racial myths throughout the book (vii, 3, 8, 21, 23, 28, 35, 47, 48, 50, 126, 129, 130, 190, 243, 244, 247, 261, 337, and 342), but legitimizing racial myths are not mentioned in the chapter on African Americans' resentment toward Whites (pp. 214-242). RRPM page 8 figure 1.1 is model of resentment that has an arrow from legitimizing racial myths to resentment, but RRPM doesn't indicate what, if any, legitimizing racial myths inform resentment toward Whites.

Legitimizing myths are conceptualized on page 8 as follows:

Appraisals of deservingness are shaped by legitimizing racial myths, which are widely shared beliefs and stereotypes about African Americans and other minorities that justify their mistreatment and low status. Legitimizing myths are any coherent set of socially accepted attitudes, beliefs, values, and opinions that provide moral and intellectual legitimacy to the unequal distribution of social value (Sidanius, Devereux, and Pratto 1992).

But I don't see why legitimizing myths couldn't add legitimacy to unequal *treatment*. Presumably resentment flows from beliefs about the causes of inequality, so Whites as a/the main/the only cause of Black/White inequality could serve as a belief that legitimizes resentment toward Whites and, consequently, discrimination against Whites.

---

5.

The 1991 National Race and Politics Survey had a survey experiment, asking for agreement/disagreement to the item:

In the past, the Irish, the Italians, the Jews and many other minorities overcame prejudice and worked their
way up.

Version 1: Blacks...
Version 2: New immigrants from Europe...

...should do the same without any special favors?

This experiment reflects the fact that responses to items measuring general phenomena applied to a group might be influenced by the general phenomena and/or the group.

Remarkably, the RRPM measurement of racial schadenfreude (Chapter 7) does not address this ambiguity, with items measuring participant feelings about only President Obama, such as the schadenfreude felt by "Barack Obama's being identified as one of the worst presidents in history". At least RRPM realizes this (p. 206):

Without a more elaborate research design, we cannot really determine whether the schadenfreude experienced by Republicans is due to his race or to some other issue.

---

6.

For an analysis of racial resentment in the political mind, RRPM remarkably doesn't substantively consider Asians, even if only as a target of resentment to help test alternate explanations about the cause of resentment, given that, like Whites, Asians on average have relatively positive outcomes in income and related measures, but do not seem to be blamed for U.S. racial inequality as much as Whites are.

---

NOTES

1. From RRPM (p. 241):

When items designed on one race are automatically applied to another race under the assumption of equal meaning, it creates measurement invariance.

Maybe the intended meaning is something such as "When items designed on one race are automatically applied to another race, it assumes measurement invariance".

2. RRPM Figure 2.1 (p. 68) reports how resentment correlates with feeling thermometer ratings about Blacks and with feeling thermometer ratings about Whites, but not with the more intuitive measure of the *difference* in feeling thermometer ratings about Blacks and about Whites.

Tagged with: , , , ,

I posted earlier about Jardina and Piston 2021 "The Effects of Dehumanizing Attitudes about Black People on Whites' Voting Decisions".

Jardina and Piston 2021 limited the analysis to White respondents, even though the Qualtrics_BJPS dataset at the Dataverse page for Jardina and Piston 2021 contained observations for non-White respondents. The Qualtrics_BJPS dataset had variables such as aofmanpic_1 and aofmanpic_6, and I didn't know which of these variables corresponded to which target groups.

My post indicated a plan to follow up if I got sufficient data to analyze responses from non-White participants. Replication code has now been posted at version 2 of the Dataverse page for Jardina and Piston 2021, so this is that planned post.

---

Version 2 of the Jardina and Piston 2021 Dataverse page has a Qualtrics dataset (Qualtrics_2016_BJPS_raw) that differs from the version 1 Qualtrics dataset (Qualtrics_BJPS): for example, the version 2 Qualtrics dataset doesn't contain data for non-White respondents, doesn't contain respondent ID variables V1 and uid, and doesn't contain variables such as aofmanpic_2.

I ran the Jardina and Piston 2021 "aofman" replication code on the Qualtrics_BJPS dataset to get a variable named "aofmanwb". In the version 2 dataset, this produced the output for the Trump analysis in Table 1 of Jardina and Piston 2021, so this aofmanwb variable is the "Ascent of man" dehumanization measure, coded so that rating Blacks as equally evolved as Whites is 0.5, rating Whites as more evolved than Blacks runs from just above 0.5 to 1, and rating Blacks more evolved than Whites runs from just under 0.5 down to zero.

The version 2 replication code for Jardina and Piston 2021 suggests that aofmanpic_1 is for rating how evolved Blacks are and aofmanpic_4 is for rating how evolved Whites are. So unless these variable names were changed between versions of the dataset, the version 2 replication code should produce the "Ascent of man" dehumanization measure when applied to the version 1 dataset, which is still available at the Jardina and Piston 2021 Dataverse page.

To check, I ran commands such as "reg aofmanwb ib4.ideology if race==1 & latino==2" in both datasets, and got similar but not exact results, with the difference presumably due to the differences between datasets discussed in the notes below.

---

The version 1 Qualtrics dataset didn't contain a variable that I thought was a weight variable, so my analyses below are unweighted.

In the version 1 dataset, the medians of aofmanwb were 0.50 among non-Latino Whites in the sample (N=450), 0.50 among non-Latino Blacks in the sample (N=98), and 0.50 among respondents coded Asian, Native American, or Other (N=125). Respective means were 0.53, 0.48, and 0.51.

Figure 1 of Jardina and Piston 2021 mentions the use of sliders to select responses to the items about how evolved target groups are, and I think that some unequal ratings might be due to respondent imprecision instead of an intent to dehumanize, such as if a respondent intended to select 85 for each group in a pair, but moved the slider to 85 for one group and 84 for the other group, and then figured that this was close enough. So I'll report percentages below with a strict dehumanization definition of anything differing from 0.5 on the 0-to-1 scale as dehumanization, but I'll also report percentages with a tolerance for potential unintentional dehumanization.

---

For the strict coding of dehumanization, I recoded aofmanwb into a variable that had levels for [1] rating Blacks as more evolved than Whites, [2] equal ratings of how evolved Blacks and Whites are, and [3] rating Whites as more evolved than Blacks.

In the version 1 dataset, 13% of non-Latino Whites in the sample rated Blacks more evolved than Whites, with an 83.4% confidence interval of [11%, 16%], and 39% rated Whites more evolved than Blacks [36%, 43%]. 42% of non-Latino Blacks in the sample rated Blacks more evolved than Whites [35%, 49%], and 23% rated Whites more evolved than Blacks [18%, 30%]. 19% of respondents not coded Black or White in the sample rated Blacks more evolved than Whites [15%, 25%], and 38% rated Whites more evolved than Blacks [32%, 45%].

---

For the non-strict coding of dehumanization, I recoded aofmanwb into a variable that had levels that included [1] rating Blacks at least 3 units more evolved than Whites on a 0-to-100 scale, and [5] rating Whites at least 3 units more evolved than Blacks on a 0-to-100 scale.

In the version 1 dataset, 8% of non-Latino Whites in the sample rated Blacks more evolved than Whites [7%, 10%], and 30% rated Whites more evolved than Blacks [27%, 34%]. 34% of non-Latino Blacks in the sample rated Blacks more evolved than Whites [27%, 41%], and 21% rated Whites more evolved than Blacks [16%, 28%]. 13% of respondents not coded Black or White in the sample rated Blacks more evolved than Whites [9%, 18%], and 31% rated Whites more evolved than Blacks [26%, 37%].

---

NOTES

1. Variable labels in the Qualtrics dataset ("male" coded 0 for "Male" and 1 for "Female") and associated replication commands suggest that Jardina and Piston 2021 might have reported results for a "Female" variable coded 1 for male and 0 for female, which would explain why Table 1 Model 1 of Jardina and Piston 2021 indicates that females were predicted to have higher ratings about Trump net of controls at p<0.01 compared to males, even though the statistically significant coefficients for "Female" in the analyses from other datasets in Jardina and Piston 2021 are negative when predicting positive outcomes for Trump.

The "Female" variable in Jardina and Piston 2021 Table 1 Model 1 is right above the statistically significant coefficient and standard error for age, of "0.00" and "0.00". The table note indicates that "All variables are transformed onto a 0 to 1 scale.", but that isn't correct for the age predictor, which ranges from 19 to 86.

2. I produced a plot like Jardina and Piston 2021 Figure 3, but with a range from most dehumanization of Whites relative to Blacks to most dehumanization of Blacks relative to Whites. The 95% confidence interval for Trump ratings at most dehumanization of Whites relative to Blacks did not overlap with the 95% confidence interval for Trump ratings at no / equal dehumanization of Whites and Blacks. But, as indicated in my later analyses, that might merely be due to the Jardina and Piston 2021 use of aofmanwb as a continuous predictor: the aforementioned inference wasn't supported using 83.4% confidence intervals when the aofmanwb predictor was trichotomized as described above.

3. Regarding differences between Qualtrics datasets posted to the Jardina and Piston 2021 Dataverse page, the Stata command "tab race latino, mi" returns 980 respondents who selected "White" for the race item and "No" for the Latino item in the version 1 Qualtrics dataset, but returns 992 respondents who selected "White" for the race item and "No" for the Latino item in the version 2 Qualtrics dataset.

Both version 1 and version 2 of the Qualtrics datasets contain exactly one observation with a 1949 birth year and a state of Missouri. In both datasets, this observation has codes that indicate a White non-Latino neither-liberal-nor-conservative male Democrat with some college but no degree who has an income of $35,000 to $39,999. That observation has values of 100 for aofmanvinc_1 and 100 for aofmanvinc_4 in the version 2 Qualtrics dataset, but, in the version 1 Qualtrics dataset, that observation has no numeric values for aofmanvinc_1, aofmanvinc_4, or any other variable starting with "aofman".

I haven't yet received an explanation about this from Jardina and/or Piston.

4. Below is a description of more checking about whether aofmanwb is correctly interpreted above, given that the Dataverse page for Jardina and Piston 2021 doesn't have a codebook.

I dropped all cases in the original dataset not coded race==1 and latino==2. Case 7 in the version 2 dataset is from New York, born in 1979, has an aofmanpic_1 of 84 , and an aofmanpic_4 of 92; this matches Case 7 in the version 1 dataset when dropping aforementioned cases. Case 21 in the version 1 dataset is from South Carolina, born in 1966, has an aofmanvinc_1 of 79, and an aofmanvinc_4 of 75; this matches Case 21 in the version 2 dataset when dropping aforementioned cases. Case 951 in the version 1 dataset is from Georgia, born in 1992, has an aofmannopi_1 of 77, and an aofmannopi_4 of 65; this matches case *964* in the version 2 dataset when dropping aforementioned cases.

5. From what I can tell, for anyone interested in analyzing the data, thermind_2 in the version 2 dataset is the feeling thermometer about Donald Trump, and thermind_4 is the feeling thermometer about Barack Obama.

6. Stata code and output from my analysis.

Tagged with: ,

The Monkey Cage recently published "Nearly all NFL head coaches are White. What are the odds?" [archived], by Bethany Lacina.

Lacina reported analyses that compared observed racial percentages of NFL head coaches to benchmark percentages that are presumably intended to represent what racial percentages of NFL head coaches would occur absent racial bias. For example, Lacina compared the percentage of Whites among NFL head coaches hired since February 2021 (8 of 10, or 80%) to the percentage of Whites among the set of NFL offensive coordinators, defensive coordinators, and recently fired head coaches (which was between 70% and 80% White).

Lacina indicated that:

If the hiring process did not favor White candidates, the chances of hiring eight White people from that pool is only about one in four — or plus-322 in sportsbook terms.

I think that Lacina might have reported the probability that *exactly* eight of the ten recent NFL coach hires were White. But for assessing unfair bias favoring White candidates, it makes more sense to report the probability that *at least* eight of the ten recent NFL coach hires were White: that probability is 38% using a 70% White pool and is 67% using an 80% White pool. See Notes 1 through 3 below.

---

Lacina also conducted an analysis for the one Black NFL head coach among the 14 NFL head coaches in 2021 to 2022 who were young enough to have played in the NCAA between 1999 and 2007, given that demographic data from her source were available starting in 1999. Benchmark percentages were 30% Black from NCAA football players and 44% Black from NCAA Division I football players.

The correctness of Lacina's calculations for this analysis doesn't seem to matter, because the benchmark does not seem to be a reasonable representation of how NFL head coaches are selected. For example, quarterback is the most important player position, and quarterbacks presumably need to know football strategy relatively well compared to players at most or all other positions, so I think that the per capita probability of a college quarterback becoming an NFL head coach is likely nontrivially higher than the per capita probability of players from other positions becoming an NFL head coach; however, Lacina's benchmark doesn't adjust for player position.

---

None of the above analysis should be interpreted to suggest that selection of NFL head coaches has been free from racial bias. But I think that it's reasonable to suggest that the Lacina analysis isn't very informative either way.

---

NOTES

1. Below is R code for a simulation that returns a probability of about 24%, for the probability that *exactly* eight of ten candidates are White, drawn without replacement from a candidate pool of 32 offensive coordinators and 32 defensive coordinators that is overall 70% White:

SET  <- c(rep_len(1,45),rep_len(0,19))
LIST <- c()
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))
}
table(LIST)
length(LIST[LIST==8])/length(LIST)

The probability is about 32% if the pool of 64 is 80% White. Adding in a few recently fired head coaches doesn't change the percentage much.

2. In reality, 8 White candidates were hired for the 10 NFL head coaching positions. So how do we assess the extent to which this observed result suggests unfair bias in favor of White candidates? Let's first get results from the simulation...

For my 100,000-run simulation using the above code and a random seed of 123, the simulation produced exactly zero White head coaches zero times, exactly 1 White head coach 5 times, exactly 2 White head coaches 52 times, exactly 3 White head coaches 461 times, exactly 4 White head coaches 2654 times, exactly 5 White head coaches 9255 times, exactly 6 White head coaches 20987 times, exactly 7 White head coaches 29307 times, exactly 8 White head coaches 24246 times, exactly 9 White head coaches 10978 times, and exactly 10 White head coaches 2055 times.

The simulation indicated that, if candidates were randomly drawn from a 70% White pool, exactly 8 of 10 coaches would be White about 24% of the time (24,246/100,000). This 8-of-10 result represents a selection of candidates from the pool that is perfectly fair with no evidence of bias for *or against* White candidates.

The 8-of-10 result would be the proper focus if our interest were bias for *or against* White candidates. But the Lacina post didn't seem concerned about evidence of bias against White candidates, so the 9 White of 10 simulation result and the 10 White of 10 simulation result should be added to the totals to get 37%: the 9 of 10 and 10 of 10 represent simulated outcomes in which White candidates were underrepresented in reality relative to that outcome from the simulation. So the 8 of 10 represents no bias and the 9 of 10 and the 10 of 10 represent bias against Whites, so that everything else represents bias favoring Whites.

3. Below is R code for a simulation that returns a probability of about 37%, for the probability that *at least* eight of ten candidates are White, drawn with replacement from a candidate pool of 32 offensive coordinators and 32 defensive coordinators that is overall 70% White:

SET <- c(rep_len(1,45),rep_len(0,19))
LIST <- c()
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))
}
table(LIST)
length(LIST[LIST>=8])/length(LIST)

---

UPDATE

I corrected some misspellings of "Lacinda" to "Lacina" in the post.

---

UPDATE 2 (March 18, 2022)

Bethany Lacina discussed with me her calculation. She indicated that she did calculate at least eight of ten, but she used a joint probability method that I don't think is correct because random error would bias the inference toward unfair selection of coaches by race. Given the extra information that Bethany provided, here is a revised calculation that produces a probability of about 60%:

# In 2021: 2 non-Whites hired of 6 hires.
# In 2022: 0 non-Whites hired of 4 hires (up to the point of the calculation).
# The simulation below is for the probability that at least 8 of the 10 hires are White.

SET.2021 <- c(rep_len(0,12),rep_len(1,53)) ## 1=White candidate
SET.2022 <- c(rep_len(0,20),rep_len(1,51)) ## 1=White candidate
LIST <- c()

for (i in 1:100000){
DRAW.2021 <- sum(sample(SET.2021,6,replace=F)) 
DRAW.2022 <- sum(sample(SET.2022,4,replace=F)) 
LIST[i] <- DRAW.2021 + DRAW.2022
}

table(LIST)
length(LIST[LIST>=8])/length(LIST)
Tagged with: , ,

Political Behavior recently published Filindra et al 2022 "Beyond Performance: Racial Prejudice and Whites' Mistrust of Government". Hypothesis 1 is the expectation that "...racial prejudice (anti-Black stereotypes) is a negative and significant predictor of trust in government".

Filindra et al 2022 limits the analysis to White respondents and measures anti-Black stereotypes by combining responses to available items in which respondents rate Blacks on seven-point scales, ranging from hardworking to lazy, and/or from peaceful to violent, and/or from intelligent to unintelligent. The data include items about how respondents rate Whites on these scales, but Filindra et al 2022 didn't use these responses to measure anti-Black stereotyping.

But information about how respondents rate Whites is useful for measuring anti-Black stereotyping. For example, a respondent who rates all racial groups at the midpoint of a stereotype scale hasn't indicated an anti-Black stereotype; this respondent's rating about Blacks doesn't differ from the respondent's rating about other racial groups, and it's not clear to me why rating Blacks equal to all other racial groups would be a moderate amount of "prejudice" in this case.

But this respondent who rated all racial groups equally on the stereotype scales nonetheless falls halfway along the Filindra et al 2022 measure of "negative Black stereotypes", in the same location as a respondent who rated Blacks at the midpoint of the scale and rated all other racial groups at the most positive end of the scale.

---

I think that this flawed measurement means that more analyses need to be conducted to know whether the key Filindra et al 2022 finding is merely due to the flawed measure of racial prejudice. Moreover, I think that more analyses need to be conducted to know whether Filindra et al 2022 overlooked evidence of the effect of prejudice against other racial groups.

Filindra et al 2022 didn't indicate whether their results held when using a measure of anti-Black stereotypes that placed respondents who rated all racial groups equally into a different category than respondents who rated Blacks less positively than all other racial groups and a different category than respondents who rated Blacks more positively than all other racial groups. Filindra et al 2022 didn't even report results when their measure of anti-White stereotypes was included in the regressions estimating the effect of anti-Black stereotypes.

A better review process might have produced a Filindra et al 2022 that resolved questions such as: Is the key Filindra et al 2022 finding merely because respondents who don't trust the government rate *all* groups relatively low on stereotype scales? Is the key finding because anti-Black stereotypes and anti-White stereotypes and anti-Hispanic stereotypes and anti-Asian stereotypes *each* reduce trust in government? Or are anti-Black stereotypes the *only* racial stereotypes that reduce trust in government?

Even if anti-Black stereotypes among Whites is the most important combination of racial prejudice and respondent demographics, other combinations of racial stereotype and respondent demographics are important enough to report on and can help readers better understand racial attitudes and their consequences.

---

NOTES

1. Filindra et al 2022 did note that:

Finally, another important consideration is the possibility that other outgroup attitudes or outgroup-related policy preferences may also have an effect on public trust.

That's sort of close to addressing some of the alternate explanations that I suggested, but the Filindra et al 2022 measure for this is a measure about immigration *policy* and not, say, the measures of stereotypes about Hispanics and about Asians that are included in the data.

2. Filindra et al 2022 suggested that:

Future research should focus on the role of attitudes towards immigrants and other racial groups—such as Latinos— and ethnocentrism more broadly in shaping white attitudes toward government.

But it's not clear to me why such analyses aren't included in Filindra et al 2022.

Maybe the expectation is that another publication should report results that include the measures of anti-Hispanic stereotypes and anti-Asian stereotypes in the ANES data. And another publication should report results that include the measures of anti-White stereotypes in the ANES data. And another publication should report results that include or focus on respondents in the ANES data who aren't White. But including all this in Filindra et al 2022 or its supplemental information would be more efficient and could produce a better understanding of political attitudes.

3. Filindra et al 2022 indicated that:

All variables in the models are rescaled on 0–1 scales consistent with the nature of the original variable. This allows us to conceptualize the coefficients as maximum effects and consequently compare the size of coefficients across models.

Scaling all predictors to range from 0 to 1 means that comparison of coefficients likely produces better inferences than if the predictors were on different scales, but differences in 0-to-1 coefficients can also be due to differences in the quality of the measurement of the underlying concept, as discussed in this prior post.

4. Filindra et al 2022 justified not using a differenced stereotype measure, citing evidence such as (from footnote 2):

Factor analysis of the Black and white stereotype items in the ANES confirms that they do not fall on a single dimension.

The reported factor analysis was on ANES 2020 data and included a measure of "lazy" stereotypes about Blacks, a measure of "violent" stereotypes about Blacks, a feeling thermometer about Blacks, a measure of "lazy" stereotypes about Whites, a measure of "violent" stereotypes about Whites, and a feeling thermometer about Whites.[*] But a "differenced" stereotype measure shouldn't be constructed by combining measures like that, as if the measure of "lazy" stereotypes about Blacks is independent of the measure of "lazy" stereotypes about Whites.

A "differenced" stereotype measure could be constructed by, for example, subtracting the "lazy" rating about Whites from the "lazy" rating about Blacks, subtracting the "violent" rating about Whites from the "violent" rating about Blacks, and then summing these two differences. That measure could help address the alternate explanation that the estimated effect for rating Blacks low is because respondents who rate Blacks low also rate all other groups low. That measure could also help address the concern that using only a measure of stereotypes about Blacks underestimates the effect of these stereotypes.

Another potential coding is a categorical measure, coded 1 for rating Blacks lower than Whites on all stereotype measures, 2 for rating Blacks equal to Whites on all stereotype measures, coded 3 for rating Blacks higher than Whites on all stereotype measures, and coded 4 for a residual category. The effect of anti-Black stereotypes could be estimated as the difference net of controls between category 1 and category 2.

Filindra et al 2022 provided justifications other than the factor analysis for not using a differenced stereotype measure, but, even if you agree that stereotype scale ratings about Blacks should not be combined with stereotype scale ratings about Whites, the Filindra et al 2022 arguments don't preclude including their measure of anti-White prejudice as a separate predictor in the analyses.

[*] I'm not sure why the feeling thermometer responses were included in a factor analysis intended to justify not combining stereotype scale responses.

5. I think that labels for the panels of Filindra et al 2022 Figure 1 and the corresponding discussion in the text are backwards: the label for each plot in Figure 1a appears to be "Negative Black Stereotypes", but the Figure 1a label is "Public Trust"; the label for each plot in Figure 1b appears to be "Level of Trust in Govt", but the Figure 1b label is "Anti-Black stereotypes".

My histogram of the Filindra et al 2022 measure of anti-Black stereotypes for the ANES 2020 Time Series Study looks like their 2020 plot in Figure 1a.

6. I'm not sure what the second sentence is supposed to mean, from this part of the Filindra et al 2022 conclusion:

Our results suggest that white Americans' beliefs about the trustworthiness of the federal government have become linked with their racial attitudes. The study shows that even when racial policy preferences are weakly linked to trust in government racial prejudice does not. Analyses of eight surveys...

7. Data source for my analysis: American National Election Studies. 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. July 19, 2021 version. www.electionstudies.org.

Tagged with: , , , , ,

The recent Rhodes et al 2022 Monkey Cage post indicated that:

...as [Martin Luther] King [Jr.] would have predicted, those who deny the existence of racial inequality are also those who are most willing to reject the legitimacy of a democratic election and condone serious violations of democratic norms.

Regarding this inference about the legitimacy of a democratic election, Rhodes et al 2022 reported results for an item that measured perceptions about the legitimacy of Joe Biden's election as president in 2020. But a potential confound is that reported perceptions of the legitimacy of the U.S. presidential election in 2020 are due to who won that election and are not about elections per se. One way to address this confound is to use a measure of reported perceptions of the legitimacy of the U.S. presidential election *in 2016*, which Donald Trump won.

I checked data from the Democracy Fund Voter Study Group VOTER survey for responses to the items below, which can help address this confound:

[from 2016 and 2020] Over the past few years, Blacks have gotten less than they deserve.

[from 2016] How confident are you that the votes in the 2016 election across the country were accurately counted?

[from 2020] How confident are you that votes across the United States were counted as voters intended in the elections this November?

Results are below:

The dark columns are for respondents who strongly disagreed that Blacks have gotten less than they deserve, so that these respondents can plausibly be described as denying the existence of unfair racial inequality. The light columns are for respondents who strongly agreed that Blacks have gotten less than they deserve, so that these respondents can plausibly be described as most strongly asserting the existence of unfair racial inequality.

Comparison of the 2020 column for "strongly disagree" to the 2020 column for "strongly agree" suggests that, as expected based on Rhodes et al 2022, skepticism about votes in 2020 being counted accurately was more common among respondents who most strongly denied the existence of unfair racial inequality than among respondents who most strongly asserted the existence of unfair racial inequality.

But comparison of the 2016 column for "strongly disagree" to the 2016 column for "strongly agree" suggests that the general phrasing of "those who deny the existence of racial inequality are also those who are most willing to reject the legitimacy of a democratic election" does not hold for every election, such as the presidential election immediately prior to the election that was the focus of the relevant item in Rhodes et al 2022.

---

NOTE

1. Data source. Stata do file. Stata output. Code for the R plot.

Tagged with: , , ,