The Monkey Cage recently published "Nearly all NFL head coaches are White. What are the odds?" [archived], by Bethany Lacina.

Lacina reported analyses that compared observed racial percentages of NFL head coaches to benchmark percentages that are presumably intended to represent what racial percentages of NFL head coaches would occur absent racial bias. For example, Lacina compared the percentage of Whites among NFL head coaches hired since February 2021 (8 of 10, or 80%) to the percentage of Whites among the set of NFL offensive coordinators, defensive coordinators, and recently fired head coaches (which was between 70% and 80% White).

Lacina indicated that:

If the hiring process did not favor White candidates, the chances of hiring eight White people from that pool is only about one in four — or plus-322 in sportsbook terms.

I think that Lacina might have reported the probability that *exactly* eight of the ten recent NFL coach hires were White. But for assessing unfair bias favoring White candidates, it makes more sense to report the probability that *at least* eight of the ten recent NFL coach hires were White: that probability is 38% using a 70% White pool and is 67% using an 80% White pool. See Notes 1 through 3 below.

---

Lacina also conducted an analysis of the one Black NFL head coach among the 14 NFL head coaches in 2021 and 2022 who were young enough to have played NCAA football between 1999 and 2007; the demographic data from her source were available starting in 1999. Benchmark percentages were 30% Black among NCAA football players and 44% Black among NCAA Division I football players.

The correctness of Lacina's calculations for this analysis doesn't seem to matter much, because the benchmark doesn't seem to be a reasonable representation of how NFL head coaches are selected. For example, quarterback is the most important player position, and quarterbacks presumably need to know football strategy relatively well compared to players at most or all other positions, so the per capita probability of a college quarterback becoming an NFL head coach is likely nontrivially higher than the corresponding probability for players at other positions; Lacina's benchmark doesn't adjust for player position.

---

None of the above analysis should be interpreted to suggest that selection of NFL head coaches has been free from racial bias. But I think that it's reasonable to suggest that the Lacina analysis isn't very informative either way.

---

NOTES

1. Below is R code for a simulation that returns a probability of about 24% for *exactly* eight of ten candidates being White, drawn without replacement from a 64-candidate pool of 32 offensive coordinators and 32 defensive coordinators that is about 70% White (45 of 64):

set.seed(123)  ## seed used for the results reported in Note 2

SET  <- c(rep_len(1,45),rep_len(0,19))  ## 1 = White candidate: 45 White of 64
LIST <- numeric(100000)                 ## White hires in each simulated cycle
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))  ## draw 10 hires without replacement
}
table(LIST)                             ## distribution of White hires
length(LIST[LIST==8])/length(LIST)      ## probability of exactly 8 of 10 White

The probability is about 32% if the pool of 64 is 80% White. Adding in a few recently fired head coaches doesn't change the percentage much.

2. In reality, 8 White candidates were hired for the 10 NFL head coaching positions. So how do we assess the extent to which this observed result suggests unfair bias in favor of White candidates? Let's first get results from the simulation...

For my 100,000-run simulation using the above code and a random seed of 123, the distribution of White head coaches across the simulated hiring cycles was:

| White head coaches (of 10) | Simulations (of 100,000) |
| --- | --- |
| 0 | 0 |
| 1 | 5 |
| 2 | 52 |
| 3 | 461 |
| 4 | 2,654 |
| 5 | 9,255 |
| 6 | 20,987 |
| 7 | 29,307 |
| 8 | 24,246 |
| 9 | 10,978 |
| 10 | 2,055 |

The simulation indicated that, if candidates were randomly drawn from a 70% White pool, exactly 8 of 10 coaches would be White about 24% of the time (24,246/100,000). This 8-of-10 result represents a selection of candidates from the pool that is perfectly fair with no evidence of bias for *or against* White candidates.

The 8-of-10 result would be the proper focus if our interest were bias for *or against* White candidates. But the Lacina post was concerned with bias favoring White candidates, so the 9-of-10 and 10-of-10 simulation results should be added to the 8-of-10 results, for a total of about 37% (37,279/100,000). The logic: relative to a simulated fair outcome of 9 or 10 White hires, the real-world 8 of 10 underrepresents White candidates; relative to a simulated outcome of exactly 8, the real world shows no bias; and only relative to simulated outcomes below 8 of 10 does the real-world hiring overrepresent White candidates.
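For a check free of simulation error, R's built-in hypergeometric function can compute this probability exactly (a one-line check I've added, not part of Lacina's analysis):

phyper(7, m=45, n=19, k=10, lower.tail=FALSE)  ## P(at least 8 of 10 White) = about 0.376

The exact value of about 37.6% is consistent with the simulated 37% in Note 3 below and the 38% reported above.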

3. Below is R code for a simulation that returns a probability of about 37% for *at least* eight of ten candidates being White, drawn without replacement from the same 64-candidate pool of 32 offensive coordinators and 32 defensive coordinators that is about 70% White:

set.seed(123)  ## same seed as in Note 1

SET  <- c(rep_len(1,45),rep_len(0,19))  ## 1 = White candidate: 45 White of 64
LIST <- numeric(100000)
for (i in 1:100000){
   LIST[i] <- sum(sample(SET,10,replace=F))  ## draw 10 hires without replacement
}
table(LIST)
length(LIST[LIST>=8])/length(LIST)      ## probability of at least 8 of 10 White

---

UPDATE

I corrected some misspellings of "Lacinda" to "Lacina" in the post.

---

UPDATE 2 (March 18, 2022)

Bethany Lacina discussed her calculation with me. She indicated that she did calculate the probability of at least eight White hires of ten, but she used a joint-probability method that I don't think is correct, because random error would bias the inference toward a finding of unfair selection of coaches by race. Given the extra information that Bethany provided, here is a revised calculation that produces a probability of about 60%:

# In 2021: 2 non-Whites hired of 6 hires.
# In 2022: 0 non-Whites hired of 4 hires (up to the point of the calculation).
# The simulation below is for the probability that at least 8 of the 10 hires are White.

SET.2021 <- c(rep_len(0,12),rep_len(1,53)) ## 1 = White candidate: 53 White of 65
SET.2022 <- c(rep_len(0,20),rep_len(1,51)) ## 1 = White candidate: 51 White of 71
LIST <- numeric(100000)

for (i in 1:100000){
   DRAW.2021 <- sum(sample(SET.2021,6,replace=F))  ## White hires among the 6 in 2021
   DRAW.2022 <- sum(sample(SET.2022,4,replace=F))  ## White hires among the 4 in 2022
   LIST[i]   <- DRAW.2021 + DRAW.2022              ## total White hires of 10
}

table(LIST)
length(LIST[LIST>=8])/length(LIST)  ## probability of at least 8 of 10 White
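This simulated probability can also be checked exactly (a sketch I've added using R's dhyper(), not Bethany's method): combine the two years' hypergeometric distributions of White hires and sum the joint probabilities of eight or more.

p.2021 <- dhyper(0:6, m=53, n=12, k=6)  ## P(0 to 6 White hires in 2021)
p.2022 <- dhyper(0:4, m=51, n=20, k=4)  ## P(0 to 4 White hires in 2022)
joint  <- outer(p.2021, p.2022)         ## joint probabilities: independent pools
total  <- outer(0:6, 0:4, "+")          ## total White hires across both years
sum(joint[total >= 8])                  ## P(at least 8 of 10 White) = about 0.60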
---

Political Behavior recently published Filindra et al 2022 "Beyond Performance: Racial Prejudice and Whites' Mistrust of Government". Hypothesis 1 is the expectation that "...racial prejudice (anti-Black stereotypes) is a negative and significant predictor of trust in government".

Filindra et al 2022 limits the analysis to White respondents and measures anti-Black stereotypes by combining responses to the available items in which respondents rate Blacks on seven-point scales: from hardworking to lazy, from peaceful to violent, and from intelligent to unintelligent, depending on which items a given survey included. The data include items about how respondents rate Whites on these same scales, but Filindra et al 2022 didn't use those responses to measure anti-Black stereotyping.

But information about how respondents rate Whites is useful for measuring anti-Black stereotyping. For example, a respondent who rates all racial groups at the midpoint of a stereotype scale hasn't indicated an anti-Black stereotype; this respondent's rating about Blacks doesn't differ from the respondent's rating about other racial groups, and it's not clear to me why rating Blacks equal to all other racial groups would be a moderate amount of "prejudice" in this case.

But this respondent who rated all racial groups equally on the stereotype scales nonetheless falls halfway along the Filindra et al 2022 measure of "negative Black stereotypes", in the same location as a respondent who rated Blacks at the midpoint of the scale and rated all other racial groups at the most positive end of the scale.
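To illustrate with hypothetical numbers (my sketch, not the Filindra et al 2022 code), assume the stereotype items are coded 1 for the most positive rating through 7 for the most negative and are rescaled to 0-1 and averaged. A measure built only from ratings about Blacks then scores these two very different respondents identically, because both rate Blacks 4 on each item and the ratings about other groups never enter the measure:

blacks.only <- function(ratings.about.blacks){
   mean((ratings.about.blacks - 1)/6)  ## ratings about other groups never enter
}
blacks.only(c(4,4))  ## rates every group at the midpoint: 0.5
blacks.only(c(4,4))  ## rates Blacks at the midpoint, every other group most positively: also 0.5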

---

I think that this flawed measurement means that more analyses need to be conducted to know whether the key Filindra et al 2022 finding is merely an artifact of the measure of racial prejudice. Moreover, I think that more analyses need to be conducted to know whether Filindra et al 2022 overlooked evidence of an effect of prejudice against other racial groups.

Filindra et al 2022 didn't indicate whether their results held when using a measure of anti-Black stereotypes that placed respondents who rated all racial groups equally into a different category than respondents who rated Blacks less positively than all other racial groups and a different category than respondents who rated Blacks more positively than all other racial groups. Filindra et al 2022 didn't even report results when their measure of anti-White stereotypes was included in the regressions estimating the effect of anti-Black stereotypes.

A better review process might have produced a Filindra et al 2022 that resolved questions such as: Is the key Filindra et al 2022 finding merely because respondents who don't trust the government rate *all* groups relatively low on stereotype scales? Is the key finding because anti-Black stereotypes and anti-White stereotypes and anti-Hispanic stereotypes and anti-Asian stereotypes *each* reduce trust in government? Or are anti-Black stereotypes the *only* racial stereotypes that reduce trust in government?

Even if anti-Black stereotypes among Whites are the most important combination of racial stereotype and respondent demographics, other combinations are important enough to report on and can help readers better understand racial attitudes and their consequences.

---

NOTES

1. Filindra et al 2022 did note that:

Finally, another important consideration is the possibility that other outgroup attitudes or outgroup-related policy preferences may also have an effect on public trust.

That's sort of close to addressing some of the alternate explanations that I suggested, but the Filindra et al 2022 measure for this is a measure about immigration *policy* and not, say, the measures of stereotypes about Hispanics and about Asians that are included in the data.

2. Filindra et al 2022 suggested that:

Future research should focus on the role of attitudes towards immigrants and other racial groups—such as Latinos—and ethnocentrism more broadly in shaping white attitudes toward government.

But it's not clear to me why such analyses aren't included in Filindra et al 2022.

Maybe the expectation is that another publication should report results that include the measures of anti-Hispanic stereotypes and anti-Asian stereotypes in the ANES data. And another publication should report results that include the measures of anti-White stereotypes in the ANES data. And another publication should report results that include or focus on respondents in the ANES data who aren't White. But including all this in Filindra et al 2022 or its supplemental information would be more efficient and could produce a better understanding of political attitudes.

3. Filindra et al 2022 indicated that:

All variables in the models are rescaled on 0–1 scales consistent with the nature of the original variable. This allows us to conceptualize the coefficients as maximum effects and consequently compare the size of coefficients across models.

Scaling all predictors to range from 0 to 1 means that comparison of coefficients likely produces better inferences than if the predictors were on different scales, but differences in 0-to-1 coefficients can also be due to differences in the quality of the measurement of the underlying concept, as discussed in this prior post.
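As a minimal sketch of that kind of rescaling (my illustration, not the authors' code):

rescale01 <- function(x){
   (x - min(x, na.rm=TRUE))/(max(x, na.rm=TRUE) - min(x, na.rm=TRUE))
}
## After rescaling, a coefficient estimates the outcome change from moving a
## predictor from its minimum (0) to its maximum (1): a "maximum effect".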

4. Filindra et al 2022 justified not using a differenced stereotype measure, citing evidence such as (from footnote 2):

Factor analysis of the Black and white stereotype items in the ANES confirms that they do not fall on a single dimension.

The reported factor analysis was on ANES 2020 data and included a measure of "lazy" stereotypes about Blacks, a measure of "violent" stereotypes about Blacks, a feeling thermometer about Blacks, a measure of "lazy" stereotypes about Whites, a measure of "violent" stereotypes about Whites, and a feeling thermometer about Whites.[*] But a "differenced" stereotype measure shouldn't be constructed by combining measures like that, as if the measure of "lazy" stereotypes about Blacks is independent of the measure of "lazy" stereotypes about Whites.

A "differenced" stereotype measure could be constructed by, for example, subtracting the "lazy" rating about Whites from the "lazy" rating about Blacks, subtracting the "violent" rating about Whites from the "violent" rating about Blacks, and then summing these two differences. That measure could help address the alternate explanation that the estimated effect for rating Blacks low is because respondents who rate Blacks low also rate all other groups low. That measure could also help address the concern that using only a measure of stereotypes about Blacks underestimates the effect of these stereotypes.

Another potential coding is a categorical measure: coded 1 for rating Blacks lower than Whites on all stereotype measures, coded 2 for rating Blacks equal to Whites on all stereotype measures, coded 3 for rating Blacks higher than Whites on all stereotype measures, and coded 4 for a residual category. The effect of anti-Black stereotypes could then be estimated as the difference, net of controls, between category 1 and category 2.
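Here is a sketch of that coding (my illustration; inputs are vectors of ratings about Blacks and about Whites on the same items, with higher values indicating more negative ratings):

stereotype.category <- function(black, white){
   if (all(black > white))  return(1)  ## Blacks rated more negatively on all items
   if (all(black == white)) return(2)  ## Blacks rated equal to Whites on all items
   if (all(black < white))  return(3)  ## Blacks rated more positively on all items
   4                                   ## residual category: mixed ratings
}
stereotype.category(c(5,6), c(3,3))  ## category 1
stereotype.category(c(4,4), c(4,4))  ## category 2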

Filindra et al 2022 provided justifications other than the factor analysis for not using a differenced stereotype measure, but, even if you agree that stereotype scale ratings about Blacks should not be combined with stereotype scale ratings about Whites, the Filindra et al 2022 arguments don't preclude including their measure of anti-White prejudice as a separate predictor in the analyses.

[*] I'm not sure why the feeling thermometer responses were included in a factor analysis intended to justify not combining stereotype scale responses.

5. I think that the labels for the panels of Filindra et al 2022 Figure 1 and the corresponding discussion in the text are backwards: each plot in Figure 1a appears to depict "Negative Black Stereotypes", but Figure 1a is labeled "Public Trust"; each plot in Figure 1b appears to depict "Level of Trust in Govt", but Figure 1b is labeled "Anti-Black stereotypes".

My histogram of the Filindra et al 2022 measure of anti-Black stereotypes for the ANES 2020 Time Series Study looks like their 2020 plot in Figure 1a.

6. I'm not sure what the second sentence is supposed to mean, from this part of the Filindra et al 2022 conclusion:

Our results suggest that white Americans' beliefs about the trustworthiness of the federal government have become linked with their racial attitudes. The study shows that even when racial policy preferences are weakly linked to trust in government racial prejudice does not. Analyses of eight surveys...

7. Data source for my analysis: American National Election Studies. 2021. ANES 2020 Time Series Study Full Release [dataset and documentation]. July 19, 2021 version. www.electionstudies.org.
