Electoral Studies recently published Jardina and Stephens-Dougan 2021 "The electoral consequences of anti-Muslim prejudice". Jardina and Stephens-Dougan 2021 reported results from the 2004 through 2020 ANES Time Series Studies, estimating the effect of anti-Muslim prejudice on vote choice among White Americans, using feeling thermometer ratings and responses to stereotype scales.

Figure 1 of Jardina and Stephens-Dougan 2021 reports non-Hispanic Whites' mean feeling thermometer ratings about Muslims, Whites, Blacks, Hispanics, and Asians...but not about Christian fundamentalists, even though ANES data for each year in Figure 1 contain feeling thermometer ratings about Christian fundamentalists.

The code for Jardina and Stephens-Dougan 2021 includes a section for "*Robustness for anti christian fundamental affect", indicating an awareness of the thermometer ratings about Christian fundamentalists.

I drafted a quick report about how reported 2020 U.S. presidential vote choice associated with feeling thermometer ratings about Jews, Christians, Muslims, and Christian fundamentalists, using data from the ANES 2020 Time Series Study. Plots are below, with more detailed descriptions in the quick report.

This first plot is of the distributions of feeling thermometer ratings about the religious groups asked about, with categories such as [51/99] indicating the percentage that rated the indicated group at 51 through 99 on the thermometer:

This next plot is of how the ratings about a given religious group associated with 2020 two-party presidential vote choice for Trump, with demographic controls only, and a separate regression for ratings about each religious group:

This next plot added controls for partisanship, political ideology, and racial resentment, and put all ratings of religious groups into the same regression:

The above plot zooms in on y-axis percentages from 20 to 60. The plot in the quick report has a y-axis that runs from 0 to 100.

---

Based on a Google Scholar search, research is available about the political implications of attitudes about Christian fundamentalists, such as Bolce and De Maio 1999. I'll plan to add a discussion of this if I convert the quick report into a proper paper.

---

The technique in the quick report hopefully improves on the Jardina and Stephens-Dougan 2021 technique for estimating anti-Muslim prejudice. From Jardina and Stephens-Dougan 2021 (p. 5):

A one-unit change on the anti-Muslim affect measure results in a 16-point colder thermometer evaluation of Kerry in 2004, a 22-point less favorable evaluation of Obama in both 2008 and 2012, and a 17-point lower rating of Biden in 2020.

From what I can tell, this one-unit change is the difference in estimated support for a candidate, net of controls, between a rating about Muslims of 0 and a rating of 100 on the feeling thermometer, based on a regression in which the "Negative Muslim Affect" predictor was merely the feeling thermometer rating about Muslims reversed and placed on a 0-to-1 scale.

If so, then the estimated effect size of anti-Muslim affect is identical to the estimated effect size of pro-Muslim affect. Or maybe Jardina and Stephens-Dougan 2021 considers a rating of Muslims at 100 to indicate indifference about Muslims, a rating of 99 to indicate some anti-Muslim affect, a rating of 98 to indicate a bit more anti-Muslim affect, and so on.

It seems more reasonable to me that some people are on net indifferent about Muslims, some people have on net positive absolute views about Muslims, and some people have on net negative absolute views about Muslims. So instead I coded feeling thermometer ratings for each religious group into six categories: zero (the coldest possible rating), 100 (the warmest possible rating), 1 through 49 (residual cold ratings), 50 (indifference), 51 through 99 (residual warm ratings), and non-responses.

The extreme categories of 0 and 100 are there to estimate the outcome at the extremes, and the 50 category is there to estimate the outcome at indifference. If the number of observations at the extremes is not sufficiently large for some predictors, it might make more sense to collapse each extreme value into the adjoining category on the same side of 50.
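
To make the coding concrete, here is a minimal sketch of the recode in Python with pandas. The column name therm_muslims and the treatment of negative values as non-response codes are placeholders for the actual ANES variable names and missing-data codes.

```python
import pandas as pd

def categorize_thermometer(rating):
    """Collapse a 0-to-100 feeling thermometer rating into six categories:
    the two extremes, residual cold/warm ranges, indifference, and non-response."""
    if pd.isna(rating) or rating < 0 or rating > 100:  # ANES typically uses negative codes for non-response
        return "no response"
    if rating == 0:
        return "0 (coldest)"
    if rating < 50:
        return "1-49 (residual cold)"
    if rating == 50:
        return "50 (indifference)"
    if rating < 100:
        return "51-99 (residual warm)"
    return "100 (warmest)"

# Hypothetical usage with a data frame containing a thermometer column:
df = pd.DataFrame({"therm_muslims": [0, 15, 50, 70, 100, -9]})
df["therm_muslims_cat"] = df["therm_muslims"].apply(categorize_thermometer)
print(df)
```

By contrast, the Jardina and Stephens-Dougan-style predictor would simply be (100 - rating)/100, which treats the distance from 75 to 25 the same as the distance from 100 to 50.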

---

NOTES

1. Jardina and Stephens-Dougan 2021 footnote 24 has an unexpected-to-me criticism of Michael Tesler's work.

We note that our findings with respect to 2012 are not consistent with Tesler (2016a), who finds that anti-Muslim attitudes were predictive of voting for Obama in 2012. Tesler, however, does not control for economic evaluations in his vote choice models, despite the fact that attitudes toward the economy are notoriously important predictors of presidential vote choice (Vavreck 2009)...

I don't think that a regression should include a predictor merely because the predictor is known to be a good predictor of the outcome, so it's not clear to me that Tesler or anyone else should include participant economic evaluations when predicting vote choice merely because participant economic evaluations predict vote choice.

It seems plausible that a nontrivial part of participant economic evaluations is downstream from attitudes about the candidates. Tesler's co-authored Identity Crisis book has a plot (p. 208) illustrating the flip-flop by Republicans and Democrats on views of the economy from around November 2016, with a note that:

This is another reason to downplay the role of subjective economic dissatisfaction in the election: it was largely a consequence of partisan politics, not a cause of partisans' choices.

2. Jardina and Stephens-Dougan 2021 indicated that (p. 5):

The fact, however, that the effect size of anti-Muslim affect is often on par with the effect size of racial resentment is especially noteworthy, given that the construct is measured far less robustly than the multi-item measure of racial resentment.

The anti-Muslim affect measure is a reversed 0-to-100 feeling thermometer, which has 101 potential levels. Racial resentment is built from four items, with each item having five substantive options, so that would permit the creation of a measure that has 17 substantive levels, not counting any intermediate levels that might occur for participants with missing data for some but not all of the four items.
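
As a quick check on the level counts, here is a minimal sketch assuming the four racial resentment items are each scored 0 through 4 and combined by summing (averaging yields the same number of distinct values):

```python
from itertools import product

# Four items, each with five substantive options scored 0 through 4, combined by
# simple summation (intermediate values from partial missingness ignored).
sums = {sum(combo) for combo in product(range(5), repeat=4)}
print(len(sums))             # 17 distinct substantive levels
print(min(sums), max(sums))  # 0 through 16
```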

I'm not sure why it's particularly noteworthy that the estimated effect for the 101-level measure is on par with the estimated effect for the 17-level measure. From what I can tell, these measures are not easily comparable, unless we know, for example, the percentage of participants that fell into the most extreme levels.

3. Jardina and Stephens-Dougan 2021 reviewed a lot of the research on the political implications of attitudes about Muslims, but did not mention Helbling and Traunmüller 2018, which, based on data from the UK, indicated that:

The results suggest that Muslim immigrants are not per se viewed more negatively than Christian immigrants. Instead, the study finds evidence that citizens' uneasiness with Muslim immigration is first and foremost the result of a rejection of fundamentalist forms of religiosity.

4. I have a prior post about selective reporting in the 2016 JOP article from Stephens-Dougan, the second author of Jardina and Stephens-Dougan 2021.

5. Quick report. Stata code. Stata output.


1.

Abrajano and Lajevardi 2021 "(Mis)Informed: What Americans Know About Social Groups and Why it Matters for Politics" reported (p. 34) that:

We find that White Americans, men, the racially resentful, Republicans, and those who turn to Fox and Breitbart for news strongly predict misinformation about [socially marginalized] social groups.

But their research design is biased toward many or all of these results, given their selection of items for their 14-item set of misinformation items. I'll focus below on left/right political bias, and then discuss apparent errors in the publication.

---

2.

Item #7 is a true/false item:

Most terrorist incidents on US soil have been conducted by Muslims.

This item will code as misinformed some participants who overestimate the percentage of U.S.-based terror attacks committed by Muslims, but won't code as misinformed any participants who underestimate that percentage.

It seems reasonable to me that persons on the political Left will be more likely than persons on the Right to underestimate the percentage of U.S.-based terror attacks committed by Muslims and that persons on the political Right will be more likely than persons on the Left to overestimate the percentage of U.S.-based terror attacks committed by Muslims, so I'll code this item as favoring the political Left.

---

Four items (#11 to #14) ask about Black/White differences in the receipt of federal assistance, phrased so that the coded-correct response is that Whites are the "primary recipients" of food stamps, welfare, and social security.

But none of these items measured misinformation about receipt of federal assistance as a percentage. So participants who report that the *number* of Blacks who receive food stamps is higher than the number of Whites who receive food stamps get coded as misinformed. But participants who mistakenly think that the *percentage* of Whites who receive food stamps is higher than the percentage of Blacks who receive food stamps do not get coded as misinformed.

Table 2 of this U.S. government report indicates that, in 2018, non-Hispanic Whites were 67% of households, 45% of households receiving SNAP (food stamps), and 70% of households not receiving SNAP. Respective percentages for Blacks were 12%, 27%, and 11% and for Hispanics were 13.5%, 22%, and 12%. So, based on this, it's correct that Whites are the largest racial/ethnic group that receives food stamps on a total population basis...but it's also true that Whites are the largest racial/ethnic group that does NOT receive food stamps on a total population basis.
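
Here is a minimal sketch of that count-versus-rate distinction, using the 2018 shares above and a hypothetical overall SNAP participation rate (the particular rate doesn't affect which group is largest in either column):

```python
# Shares by race/ethnicity from the 2018 figures cited above:
# (share of all households, share of SNAP households, share of non-SNAP households)
shares = {
    "non-Hispanic White": (0.67, 0.45, 0.70),
    "Black":              (0.12, 0.27, 0.11),
    "Hispanic":           (0.135, 0.22, 0.12),
}

snap_rate = 0.11  # hypothetical share of all households receiving SNAP

for group, (_, snap_share, non_snap_share) in shares.items():
    recipients = snap_share * snap_rate              # group's share of all households, among recipients
    non_recipients = non_snap_share * (1 - snap_rate)  # group's share of all households, among non-recipients
    print(f"{group}: {recipients:.3f} of all households receive SNAP, {non_recipients:.3f} do not")

# Whites are the largest group among SNAP recipients (0.45 > 0.27 > 0.22) and also the
# largest group among non-recipients (0.70 > 0.12 > 0.11), so a "primary recipient"
# question on a count basis says little about group-specific rates of receipt.
```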

It seems reasonable to me that the omission of percentage versions of these three public assistance items favors the political Left: persons on the political Left are more likely than persons on the political Right (or, for that matter, Independents and moderates) to rate Blacks higher than Whites, so persons on the Left would presumably be more likely than persons on the Right to prefer (and thus guess) that Whites and not Blacks are the primary recipients of federal assistance. So, by my count, that's at least four items that favor the political Left.

---

As far as I can tell, Abrajano and Lajevardi 2021 didn't provide citations to justify their coding of correct responses. But it seems to me that such citations should be a basic requirement for research that codes responses as correct, except for obvious items such as, say, who the current Vice President is. A potential problem with this lack of citation is that it's not clear to me that some responses that Abrajano and Lajevardi 2021 coded as correct are truly correct or at least are the only responses that should be coded as correct.

Abrajano and Lajevardi 2021 coded "Whites" as the only correct response for the "primary recipients" item about welfare, but this government document indicates that, for 2018, the distribution of TANF recipients was 37.8% Hispanic, 28.9% Black, 27.2% White, 2.1% multi-racial, 1.9% Asian, 1.5% AIAN, and 0.6% NHOPI.

And "about the same" is coded as the only correct response for the item about the "primary recipients" of public housing (item #14), but Table 14 of this CRS Report indicates that, in 2017, 33% of public housing had a non-Hispanic White head of household and 43% had a non-Hispanic Black head of household. This webpage permits searching for "public housing" for different years (screenshot below), which, for 2016, indicates percentages of 45% for non-Hispanic Blacks and 29% for non-Hispanic Whites.

Moreover, it seems suboptimal to have the imprecise "about the same" response be the only correct response. Unless outcomes for Blacks and Whites are exactly the same, presumably selection of one or the other group should count as the correct response.

---

Does a political bias in the Abrajano and Lajevardi 2021 research design matter? I think that the misinformation rates are close enough so that it matters: Figure A2 indicates that the Republican/Democrat misinformation gap is less than a point, with misinformed means of 6.51 for Republicans and 5.83 for Democrats.

Ironically, Abrajano and Lajevardi 2021 Table A1 indicates that their sample was 52% Democrat and 21% Republican, so -- on the "total" basis that Abrajano and Lajevardi 2021 used for the federal assistance items -- Democrats were the "primary" partisan source of misinformation about socially marginalized groups.

---

NOTES

1. Abrajano and Lajevardi 2021 (pp. 24-25) refers to a figure that isn't in the main text, and I'm not sure where it is:

When we compare the misinformation rates across the five social groups, a number of notable patterns emerge (see Figure 2)...At the same time, we recognize that the magnitude of difference between White and Asian American's [sic] average level of misinformation (3.4) is not considerably larger than it is for Blacks (3.2), nor for Muslim American respondents, who report the lowest levels of misinformation.

Table A5 in the appendix indicates that Blacks had a lower misinformation mean than Muslims did, 5.583 compared to 5.914, so I'm not sure what the aforementioned passage refers to. The passage phrasing refers to a "magnitude of difference", but 3.4 doesn't seem to refer to a social group gap or to an absolute score for any of the social groups.

2. Abrajano and Lajevardi 2021 footnote 13 is:

Recall that question #11 is actually four separate questions, which brings us to a total of thirteen questions that comprise this aggregate measure of political misinformation.

Question 11 being four separate questions means that there are 14 questions, and Abrajano and Lajevardi 2021 refers to "fourteen" questions elsewhere (pp. 6, 17).

Abrajano and Lajevardi 2021 indicated that "...we also observe about 11% of individuals who provided inaccurate answers to all or nearly all of the information questions" (p. 24, emphasis in the original), and it seems a bit misleading to italicize "all" if no one provided inaccurate responses to all 14 items.

3. Below, I'll discuss the full set of 14 "misinformation" items. Feel free to disagree with my count, but I would be interested in an argument that the 14 items do not on net bias results toward the Abrajano and Lajevardi 2021 claim that Republicans are more misinformed than Democrats about socially marginalized groups.

For the aforementioned items, I'm coding items #7 (Muslim terror %), #11 (food stamps), #12 (welfare), and #14 (public housing) as biased in favor of the political Left, because I think that these items are phrased so that the items will catch more misinformation among the political Right than among the political Left, even though the items could be phrased to catch more misinformation among the Left than among the Right.

I'm not sure about the item about social security (#13), so I won't code that item as politically biased. So by my count that's 4 in favor of the Left, plus 1 neutral.

Item #5 seems to be a good item, measuring whether participants know that Blacks and Latinos are more likely to live in regions with environmental problems. But it's worth noting that this item is phrased in terms of rates and not, as for the federal assistance items, as the total number of persons by racial/ethnic group. So by my count that's 4 in favor of the Left, plus 2 neutral.

Item #1 is about the number of undocumented immigrants in the United States. I won't code that item as politically biased. So by my count that's 4 in favor of the Left, plus 3 neutral.

The correct response for item #2 is that most immigrants in the United States are here legally. I'll code this item as favoring the political Left for the same reason as the Muslim terror % item: the item catches participants who overestimate the percentage of immigrants here illegally, but the item doesn't catch participants who underestimate that percentage, and I think these errors are more likely on the Right and Left, respectively. So by my count that's 5 in favor of the Left, plus 3 neutral.

Item #6 is about whether *all* (my emphasis) U.S. universities are legally permitted to consider race in admissions. It's not clear to me why it's more important that this item be about *all* U.S. universities instead of about *some* or *most* U.S. universities. I think that it's reasonable to suspect that persons on the political Right will overestimate the prevalence of affirmative action and that persons on the political Left will underestimate the prevalence of affirmative action, so by my count that's 6 in favor of the Left, plus 3 neutral.

I'm not sure that items #9 and #10 have much of a bias (number of Muslims in the United States, and the country that has the largest number of Muslims), other than to potentially favor Muslims, given that the items measure knowledge of neutral facts about Muslims. So by my count that's 6 in favor of the Left, plus 5 neutral.

I'm not sure what "social group" item #8 is supposed to be about, which is about whether Barack Obama was born in the United States. I'm guessing that a good percentage of "misinformed" responses for this item are insincere. Even if it were a good idea to measure insincere responses to test a hypothesis about misinformation, I'm not sure why it would be a good idea to not also include a corresponding item about a false claim that, like the Obama item, is known to be more likely to be accepted among the political Left, such as items about race and killings by police. So I'll up the count to 7 in favor of the Left, plus 5 neutral.

Item #4 might reasonably be described as favoring the political Right, in the sense that I think that persons on the Right would be more likely to prefer that Whites have a lower imprisonment rate than Blacks and Hispanics. But the item has this unusual element of precision ("six times", "more than twice") that isn't present in items about hazardous waste and about federal assistance, so that, even if persons on the Right stereotypically guess correctly that Blacks and Hispanics have higher imprisonment rates than Whites, these persons still might not be sure that the "six times" and "more than twice" are correct.

So even though I think that this item (#4) can reasonably be described as favoring the political Right, I'm not sure that it's as easy for the Right to use political preferences to correctly guess this item as it is for the Left to use political preferences to correctly guess the hazardous waste item and the federal assistance items. But I'll count this item as favoring the Right, so by my count that's 7 in favor of the Left, 1 in favor of the Right, plus 5 neutral.

Item #3 is about whether the U.S. Census Bureau projects ethnic and racial minorities to be a majority in the United States by 2042. I think that it's reasonable that a higher percentage of persons on the political Left than the political Right would prefer this projection to be true, but maybe fear that the projection is true might bias this item in favor of the Right. So let's be conservative and count this item as favoring the Right, so that my coding of the overall distribution for the 14 misinformation items is: seven items favoring the Left, two items favoring the Right, and five politically neutral items.

4. The ANES 2020 Time Series Study has similar biases in its set of misinformation items.


Meta-Psychology has published my manuscript "Perceived Discrimination against Black Americans and White Americans".

---

Norton and Sommers 2011 presented evidence for a claim that has become widely cited, that "Whites have now come to view anti-White bias as a bigger societal problem than anti-Black bias". I was skeptical of that claim, so I checked the data for the American National Election Studies 2012 Time Series Study, which was the most recent ANES Time Series Study available at the time. These ANES data contradicted that claim.

I preregistered an analysis of the then-upcoming ANES 2016 Time Series Study and then preregistered an analysis of an item on a 2017 survey that YouGov conducted for me with different wording for the measure of perceived discrimination.

Each of these three sets of data indicated that a higher percentage of Whites reported the perception that there is more discrimination in the United States today against Blacks than against Whites, compared to the percentage of Whites that reported the perception that there is more discrimination in the United States today against Whites than against Blacks. And it's not particularly close. Here is a new data point: in weighted analyses of data from the ANES 2020 Time Series Study, 63% of non-Hispanic Whites rated discrimination against Blacks as larger than discrimination against Whites, but only 8% of non-Hispanic Whites rated discrimination against Whites as larger than discrimination against Blacks.

The Meta-Psychology article has an explanation for why the Norton and Sommers 2011 claim appears to be incorrect.

---

NOTE

1. Data source: American National Election Studies. 2021. ANES 2020 Time Series Study Preliminary Release: Combined Pre-Election and Post-Election Data [dataset and documentation]. March 24, 2021 version. www.electionstudies.org.


The Journal of Academic Ethics published Kreitzer and Sweet‑Cushman 2021 "Evaluating Student Evaluations of Teaching: A Review of Measurement and Equity Bias in SETs and Recommendations for Ethical Reform".

---

Kreitzer and Sweet‑Cushman 2021 reviewed "a novel dataset of over 100 articles on bias in student evaluations of teaching" (p. 1), later described as "an original database of more than 90 articles on evaluative bias constructed from across academic disciplines" (p. 2), but a specific size of the dataset/database is not provided.

I'll focus on the Kreitzer and Sweet‑Cushman 2021 discussion of evidence for an "equity bias".

---

Footnote 4

Let's start with Kreitzer and Sweet‑Cushman 2021 footnote 4:

Research also finds that the role of attractiveness is more relevant to women, who are more likely to get comments about their appearance (Mitchell & Martin, 2018; Key & Ardoin, 2019). This is problematic given that attractiveness has been shown to be correlated with evaluations of instructional quality (Rosen, 2018)

Mitchell and Martin 2018 reported two findings about comments on instructor appearance. MM2018 Table 1 reported on a content analysis of official university course evaluations, which indicated that 0% of comments for the woman instructor and 0% of comments for the man instructor were appearance-related. MM2018 Table 2 reported on a content analysis of Rate My Professors comments, which indicated that 10.6% of comments for the woman instructor and 0% of comments for the man instructor were appearance-related, with p<0.05 for the difference between the 10.6% and the 0%.

So Kreitzer and Sweet‑Cushman 2021 footnote 4 cited the p<0.05 Rate My Professors finding but not the zero result for the official university course evaluations, even though official university course evaluations are presumably much more informative about bias in student evaluations as used in practice, compared to Rate My Professors comments that presumably are unlikely to be used for faculty tenure, promotion, and end-of-year evaluations.

Note also that Kreitzer and Sweet‑Cushman 2021 reported this Rate My Professors appearance-related finding without indicating the low quality of the research design: Mitchell and Martin 2018 compared comments about one woman instructor (Mitchell herself) to comments about one man instructor (Martin himself), from a non-experimental research design.

Moreover, the p<0.05 evidence for this "appearance" finding from Mitchell and Martin is based on an error by Mitchell and/or Martin. I blogged about the error in 2019, and MM2018 was eventually corrected (26 May 2020) to indicate that there is insufficient evidence (p=0.3063) to infer that the 10.6 percentage point gender difference in appearance-related comments is sufficiently inconsistent with chance. However, Kreitzer and Sweet‑Cushman 2021 (accepted 27 Jan 2021) cited this "appearance" finding from the uncorrected version of the article.

---

And I'm not sure what footnote 4 is referencing in Key and Ardoin 2019. The closest Key and Ardoin 2019 passage that I see is below:

Another telling point is whether students comment on the faculty member's teaching and expertise, or on such personal qualities as physical appearance or fashion choices. Among the students who'd received the bias statement, comments on female faculty were substantially more likely to be about the teaching.

But this Key and Ardoin 2019 passage is about a difference between groups in comments about female faculty (involving personal qualities and not merely comments on appearance), and does not compare comments about female faculty to comments about male faculty, which is what would be needed to support the Kreitzer and Sweet‑Cushman 2021 claim in footnote 4.

---

And for the Kreitzer and Sweet‑Cushman 2021 claim that "the role of attractiveness is more relevant to women", consider this passage from Hamermesh and Parker (2005: 373):

The reestimates show, however, that the impact of beauty on instructors' course ratings is much lower for female than for male faculty. Good looks generate more of a premium, bad looks more of a penalty for male instructors, just as was demonstrated (Hamermesh & Biddle, 1994) for the effects of beauty in wage determination.

This finding is the *opposite* of the claim that "the role of attractiveness is more relevant to women".

Kreitzer and Sweet‑Cushman 2021 cited Hamermesh and Parker 2005 elsewhere, so I'm not sure why Kreitzer and Sweet‑Cushman 2021 footnote 4 claimed that "the role of attractiveness is more relevant to women" without at least noting the contrary evidence from Hamermesh and Parker 2005.

---

"...react badly when those expectations aren't met"

From Kreitzer and Sweet‑Cushman 2021 (p. 4):

Students are also more likely to expect special favors from female professors and react badly when those expectations aren't met or fail to follow directions when they are offered by a woman professor (El-Alayli et al., 2018; Piatak & Mohr, 2019).

From what I can tell, neither Piatak and Mohr 2019 nor Study 1 of El-Alayli et al 2018 support the "react badly when those expectations aren't met" part of this claim. I think that this claim refers to the "negative emotions" measure of El-Alayli et al 2018 Study 2, but I don't think that the El-Alayli et al 2018 data support that inference.

El-Alayli et al 2018 *claimed* that there was a main effect of professor gender for the "negative emotions" measure, but I think that that claim is incorrect: the relevant means in El-Alayli et al 2018 Table 1 are 2.38 and 2.28, with a sample size of 121 across two conditions and corresponding standard deviations of 0.93 and 0.93, so that there is insufficient evidence of a main effect of professor gender for that measure.
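
Here is a minimal sketch of that back-of-the-envelope check in Python, assuming a roughly even split of the 121 participants across the two professor-gender conditions (an assumption on my part):

```python
import math
from scipy import stats

# Condition means and SDs for the "negative emotions" measure (El-Alayli et al. 2018 Table 1),
# with the total N of 121 split roughly evenly across the two conditions (assumed).
m1, m2 = 2.38, 2.28
sd1, sd2 = 0.93, 0.93
n1, n2 = 61, 60

se = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
t = (m1 - m2) / se
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)
print(f"t = {t:.2f}, p = {p:.2f}")  # roughly t = 0.59, p = 0.56: no evidence of a main effect
```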

---

"...no discipline where women receive higher evaluative scores"

From Kreitzer and Sweet‑Cushman 2021 (p. 4):

Rosen (2018), using a massive (n = 7,800,000) Rate My Professor sample, finds there is no discipline where women receive higher evaluative scores.

I think that the relevant passage from Rosen 2018 is:

Importantly, out of all the disciplines on RateMyProfessors, there are no fields where women have statistically higher overall quality scores than men.

But this claim is based on an analysis limited to instructors rated "not hot", so Rosen 2018 doesn't support the Kreitzer and Sweet‑Cushman 2021 claim, which was phrased without that "not hot" caveat.

My concern with limiting the analysis to "not hot" instructors was that Rosen 2018 indicated that "hot" instructors on average received higher ratings than "not hot" instructors and that a higher percentage of women instructors than of men instructors received a "hot" rating. Thus, it seemed plausible to me that restricting the analysis to "not hot" instructors removed a higher percentage of highly-rated women than of highly-rated men.

I asked Andrew S. Rosen about gender comparisons by field for the Rate My Professors ratings for all professors, not limited to "not hot" professors. He indicated that, of the 75 fields with the largest number of Rate My Professors ratings, men faculty had a higher mean overall quality rating than women faculty at p<0.05 in many of these fields, but that, in exactly one of these fields (mathematics), women faculty had a higher mean overall quality rating than men faculty at p<0.05, with women faculty in mathematics also having a higher mean clarity rating and a higher mean helpfulness rating than men faculty in mathematics (p<0.05). Thanks to Andrew S. Rosen for the information.

By the way, the 7.8 million sample size cited by Kreitzer and Sweet‑Cushman 2021 is for the number of ratings, but I think that the more relevant sample size is the number of instructors who were rated.

---

"designs", plural

From Kreitzer and Sweet‑Cushman 2021 (p. 4):

Experimental designs that manipulate the gender of the instructor in online teaching environments have even shown that students offered lower evaluations when they believed the instructor was a woman, despite identical course delivery (Boring et al., 2016; MacNell et al., 2015).

The plural "experimental designs" and the citation of two studies suggests that one of these studies replicated the other study, but, regarding this "believed the instructor was a woman, despite identical course delivery" research design, Boring et al. 2016 merely re-analyzed data from MacNell et al. 2015, so the two cited studies are not independent of each other such that a plural "experimental designs" would be justified.

And Kreitzer and Sweet‑Cushman 2021 reported the finding without mentioning shortcomings of the research design, such as a sample size small enough (N=43 across four conditions) to raise reasonable questions about the replicability of the result.

---

Discussion

I think that it's plausible that there are unfair equity biases in student evaluations of teaching, but I'm not sure that Kreitzer and Sweet‑Cushman 2021 is convincing about that.

My reading of the literature on unfair bias in student evaluations of teaching is that the research isn't of consistently high enough quality that a credulous review establishes anything: a lot of the research designs don't permit causal inference of unfair bias, and a lot of the research designs that could permit causal inference have other flaws.

Consider the uncorrected Mitchell and Martin 2018: is it plausible that a respectable peer-reviewed journal would publish results from a similar research design that claimed no gender bias in student comments, in which the data were limited to a non-experimental comparison of comments about only two instructors? Or is it plausible that a respectable peer-reviewed journal would publish a four-condition N=43 version of MacNell et al. 2015 that found no gender bias in student ratings? I would love to see these small-N null-finding peer-reviewed publications, if they exist.

But maybe non-experimental "N=2 instructors" studies and experimental "N=43 students" studies that didn't detect gender bias in student evaluations of teaching exist, but haven't yet been published. If so, then did Kreitzer and Sweet‑Cushman try to find them? From what I can tell, Kreitzer and Sweet‑Cushman 2021 does not indicate that the authors solicited information about unpublished research through, say, posting requests on listservs or contacting researchers who have published on the topic.

I plan to tweet a link to this post tagging Dr. Kreitzer and Dr. Sweet‑Cushman, and I'm curious to see whether Kreitzer and Sweet‑Cushman 2021 is corrected or otherwise updated to address any of the discussion above.


Sex Roles published El-Alayli et al 2018 "Dancing Backwards in High Heels: Female Professors Experience More Work Demands and Special Favor Requests, Particularly from Academically Entitled Students".

El-Alayli et al 2018 discussed their research design for Study 2 as follows (pp. 141-142):

The name of the professor, as well as the use of gendered pronouns in some of the subsequent survey questions, served as our professor gender manipulation, and participants were randomly assigned to one of the two experimental conditions...After reviewing the profile, participants were given seven scenarios that involved imagining special favor requests that could be asked of the professor...For each scenario, participants were first asked to indicate how likely they would be to ask for the special favor on a scale from 1 (Not at all likely) to 6 (Extremely likely). Using the same response scale, participants were then asked the likelihood that they would expect the professor to say "yes", ...

El-Alayli et al 2018 discussed the results for this item (p. 143, emphasis added):

There was a statistically significant main effect of professor gender on expectations, F(1, 117) = 5.68, p = .019 (b = −.80, SE = .34), such that participants were more likely to expect a "yes" response to the special favor requests when the professor had a woman's name than when the professor had a man's name. (Refer to Table 1 for condition means for all dependent measures.)

El-Alayli et al 2018 Table 1 reports that, for this "Expecting 'Yes'" item, the mean was 2.12 for the female professor and 2.05 for the male professor, with corresponding standard deviations of 0.80 and 0.66. The sample size was 121 total participants after exclusions (p. 141), so it wasn't clear to me how these data could produce a p-value of 0.019 or a b of -0.80 for the main effect of professor gender.
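
For reference, here is a minimal sketch of the corresponding back-of-the-envelope check, assuming a roughly even split of the 121 participants across the two conditions (an assumption on my part):

```python
import math
from scipy import stats

# "Expecting 'Yes'" means and SDs from El-Alayli et al. 2018 Table 1, with the
# total N of 121 split roughly evenly across the two conditions (assumed).
m_f, m_m = 2.12, 2.05
sd_f, sd_m = 0.80, 0.66
n_f, n_m = 61, 60

se = math.sqrt(sd_f**2 / n_f + sd_m**2 / n_m)
t = (m_f - m_m) / se
p = 2 * stats.t.sf(abs(t), df=n_f + n_m - 2)
print(f"difference = {m_f - m_m:.2f}, t = {t:.2f}, p = {p:.2f}")
# Roughly t = 0.53, p = 0.60 -- nowhere near the reported p = .019 or b = -.80.
```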

---

I suspect that the -0.80 is not a main effect of professor gender but is instead the predicted effect of professor gender when the other term in the interaction (academic entitlement) is zero (with the lowest level of the academic entitlement scale being 1, see p. 142).

From El-Alayli et al 2018 (p. 143):

...the professor gender × academic entitlement interaction was statistically significant, F(1, 117) = 7.80, p = .006 (b = .38, SE = .14, ΔR2 = .06).

El-Alayli et al 2018 Table 1 indicates that the mean for academic entitlement is 2.27 for the male professor and 2.27 for the female professor, with corresponding standard deviations of 1.00 and 0.87. I'll ballpark 0.93 as the combined standard deviation.

From El-Alayli et al 2018 (p. 143):

Students had a stronger expectation of request approval from the female professor than from the male professor when they had a high level (+1 SD) of academic entitlement, t = 2.37, p = .020 (b = .42, SE = .18, 95% CI [.07, .78]), but not when they had average, t = .54, p = .590 (b = .07, SE = .13, 95% CI [−.18, .32]) or low (−1 SD) levels of entitlement, t = −1.61, p = .111 (b = −.29, SE = .18, 95% CI [−.64, .07]).

So the above passage provides three data points:

X1 = +1 SD = 2.27 + 0.93 = 3.20 || Y1 = 0.42

X2 = average = 2.27 || Y2 = 0.07

X3 = -1 SD = 2.27 - 0.93 = 1.34 || Y3 = -0.29

I used an OLS regression to predict these Ys using the Xs: to two decimal places, the X coefficient was 0.38 (which equals the coefficient on the interaction term), and the constant was -0.80 (which equals the purported "main effect"); however, in this regression -0.80 is the predicted value of Y when X (academic entitlement) is zero.
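
Here is a minimal sketch of that reconstruction, fitting an ordinary least squares line through the three points listed above:

```python
import numpy as np

# Three (academic entitlement, simple effect of professor gender) points from
# El-Alayli et al. 2018 (p. 143), with entitlement at -1 SD, the mean, and +1 SD
# (mean 2.27, ballpark SD 0.93):
x = np.array([1.34, 2.27, 3.20])
y = np.array([-0.29, 0.07, 0.42])

slope, intercept = np.polyfit(x, y, 1)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
# slope = 0.38 (the reported interaction coefficient) and intercept = -0.80 (the purported
# "main effect"), i.e., the predicted gender effect when entitlement is 0, a value below
# the lowest level (1) of the academic entitlement scale.
```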

I'll add a highlight to the passage quoted above from El-Alayli et al 2018 (p. 143) to indicate what I think the main effect is:

Students had a stronger expectation of request approval from the female professor than from the male professor when they had a high level (+1 SD) of academic entitlement, t = 2.37, p = .020 (b = .42, SE = .18, 95% CI [.07, .78]), but not when they had average, t = .54, p = .590 (*b = .07*, SE = .13, 95% CI [−.18, .32]) or low (−1 SD) levels of entitlement, t = −1.61, p = .111 (b = −.29, SE = .18, 95% CI [−.64, .07]).

The b=0.07 is equal to the difference between the aforementioned means of 2.12 for the female professor and 2.05 for the male professor.

---

The subscripts for El-Alayli et al 2018 Table 1 indicate that p<0.05 for the main effect of professor gender for four of the six items, but I don't think that p<0.05 for the main effect of professor gender for any of those four items.

Moreover, I think that González-Morales 2019 incorrectly described the El-Alayli et al 2018 results as applying to all students with a stronger effect among academically entitled students, instead of the effect being detected only among academically entitled students:

In a recent experimental study, El-Alayli, Hansen-Brown, and Ceynar (2018) found that when students identified a fictitious professor as a woman, they expected that this professor would respond positively to requests for special favors or accommodations. This effect was stronger among academically entitled students.


Social Science Quarterly recently published Cooper et al. 2021 "Heritage Versus Hate: Assessing Opinions in the Debate over Confederate Monuments and Memorials". The conclusion of the article notes that:

...we uncover significant evidence that the debate over Confederate monuments can be resoundingly summarized as "hate" over "heritage"

---

In a prior post, I noted that:

...when comparing the estimated effect of predictors, inferences can depend on how well each predictor is measured, so such analyses should discuss the quality of the predictors.

Cooper et al. 2021 measured "heritage" with a dichotomous predictor and measured "hate" with a five-level predictor, and this difference in the precision of the measurements could have biased their research design toward a larger estimate for hate than for heritage. [See note 3 below for a discussion].

I'm not suggesting that the entire difference between their estimates for heritage and hate is due to the number of levels of the predictors, but I think that a better peer review would have helped eliminate that flaw in the research design, maybe by requiring the measure of hate to be dichotomized as close as possible to 70/30 like the measure of heritage was.

---

Here is the lone measure of heritage used in Cooper et al. 2021:

"Do you consider yourself a Southerner, or not?"

Table 1 of the article indicates that 70% identified as a Southerner, so even if this were a face-valid measure of Southern heritage, the measure would place persons at, say, the 35th percentile of Southern heritage into its highest level.

Maybe there is more recent data that undercuts this, but data from the Spring 2001 Southern Focus Poll indicated that only about 1 in 3 respondents who identified as a Southerner indicated that being a Southerner was "very important" to them. About 1 in 3 respondents who identified as a Southerner in that 2001 poll indicated that being a Southerner was "not at all important" or "not very important" to them, and I can't think of a good reason why, without other evidence, these participants belong in the highest level of a measure of Southern heritage.

---

Wright and Esses 2017 had a more precise measure for heritage and found sufficient evidence to conclude that (p. 232):

Positive attitudes toward the Confederate battle flag were more strongly associated with Southern pride than with racial attitudes when accounting for these covariates.

How does Cooper et al. 2021 address the Wright and Esses 2017 result, which conflicts with the result from Cooper et al. 2021 and which used a related outcome variable and a better measure of heritage? The Cooper et al. 2021 article doesn't even mention Wright and Esses 2017.

---

A better peer review might have caught the minimum age of zero years old in Table 1 and objected to the description of "White people are currently under attack in this country" as operationalizing "racial resentment toward blacks" (pp. 8-9), given that this item doesn't even mention or refer to Blacks. I suppose that respondents who hate White people would be reluctant to agree that White people are under attack regardless of whether that is true. But that's not the "hate" that is supposed to be measured.

Estimating the effect of "hate" for this type of research should involve comparing estimates net of controls for respondents who have a high degree of hate for Blacks to respondents who are indifferent to Blacks. Such estimates can be biased if the estimates instead include data from respondents who have more negative feelings about Whites than about Blacks. In a prior post, I discussed Carrington and Strother 2020, which measured hate with a Black/White feeling thermometer difference and thus permitted estimation of how much of the effect of hate is due to respondents rating Blacks higher than Whites on the feeling thermometers.

---

Did Cooper et al. have access to better measures of hate than the item "White people are currently under attack in this country"? The Winthrop Poll site didn't list the Nov 2017 survey on its archived poll page for 2017. But, from what I can tell, this Winthrop University post discusses the survey, which included a better measure of racial resentment toward blacks. I don't know what information the peer reviewers of Cooper et al. 2021 had access to, but, generally, a journal reform that I would like to see for manuscripts reporting on a survey is for peer reviewers to be given access to the entire set of items for a survey.

---

In conclusion, for a study that compares the estimated effects of heritage and hate, I think that at least three things are needed: a good measure of heritage, a good measure of hate, and the good measure of heritage being of similar quality to the good measure of hate. I don't think that Cooper et al. 2021 has any of those things.

---

NOTES

1. The Spring 2001 Southern Focus Poll study was conducted by the Odum Institute for Research in Social Science of the University of North Carolina at Chapel Hill. Citation: Center for the Study of the American South, 2001, "Southern Focus Poll, Spring 2001", https://hdl.handle.net/1902.29/D-31552, UNC Dataverse, V1.

2. Stata output.

3. Suppose that mean support for leaving Confederate monuments as they are were 70% among the top 20 percent of respondents by Southern pride, 60% among the next 20 percent of respondents by Southern pride, 50% among the middle 20 percent, 40% among the next 20 percent, and 30% among the bottom 20 percent of respondents by Southern pride. And let's assume that these bottom 20 percent are indifferent about Southern pride and don't hate Southerners.

The effect of Southern pride could be estimated at 40 percentage points, which is the difference in support between the top 20 percent and the bottom 20 percent by Southern pride. However, if we grouped the top 60 percent together and the bottom 40 percent together, the mean percentage support would respectively be 60% and 35%, for an estimated effect of 25 percentage points. In this illustration, the estimated effect for the five-level predictor is larger than the estimate for the dichotomous predictor, even with the same data.
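
Here is a minimal sketch of that arithmetic:

```python
# Hypothetical mean support for leaving monuments as they are, by Southern-pride
# quintile (top quintile first), as in the illustration above.
support_by_quintile = [70, 60, 50, 40, 30]

# Five-level predictor: compare the top and bottom quintiles.
effect_five_level = support_by_quintile[0] - support_by_quintile[-1]

# Dichotomized predictor: top 60 percent versus bottom 40 percent.
top_60 = sum(support_by_quintile[:3]) / 3
bottom_40 = sum(support_by_quintile[3:]) / 2
effect_dichotomous = top_60 - bottom_40

print(effect_five_level)                      # 40 percentage points
print(top_60, bottom_40, effect_dichotomous)  # 60.0, 35.0, 25.0 percentage points
```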

Here is a visual illustration:

The above is a hypothetical to illustrate the potential bias in measuring one predictor with five levels and another predictor with two levels. I have no idea whether this had any effect on the results reported in Cooper et al. 2021. But, with a better peer review, readers would not need to worry about this type of bias in the Cooper et al. 2021 research design.


The journal Politics, Groups, and Identities recently published Mangum and Block Jr. 2021 "Perceived racial discrimination, racial resentment, and support for affirmative action and preferential hiring and promotion: a multi-racial analysis".

---

The article notes that (p. 13):

Intriguingly, blame [of racial and ethnic minorities] tends to be positively associated with support for preferential hiring and promotion, and, in 2008, this positive relationship is statistically significant for Black and Asian respondents (Table A4; lower right graph in Figure 6). This finding is confounding...

But from what I can tell, this finding might be because the preferential hiring and promotion outcome variable was coded backwards to the intended coding. Table 2 of the article indicates that a higher percentage of Blacks than of Whites, Hispanics, and Asians favored preferential hiring and promotion, but Figures 1 and 2 indicate that a lower percentage of Blacks than of Whites, Hispanics, and Asians favored preferential hiring and promotion.

My analysis of data for the 2004 National Politics Study indicated that the preferential hiring and promotion results in Table 2 are correct for this survey and that blame of racial and ethnic minorities negatively associates with favoring preferential hiring and promotion.

---

Other apparent errors in the article include:

Page 4:

Borrowing from the literature on racial resentment possessed (Feldman and Huddy 2005; Kinder and Sanders 1996; Kinder and Sears 1981)...

Figures 3, 4, 5, and 6:

...holding control variable constant

Page 15:

African Americans, Hispanics, and Asians support affirmative action more than are Whites.

Page 15:

Preferential hiring and promotion is about who deserves special treatment than affirmative action, which is based more on who needs it to overcome discrimination.

Note 2:

...we code the control variables to that they fit a 0-1 scale...

---

Moreover, the article indicates that "the Supreme Court ruled that affirmative action was constitutional in California v. Bakke in 1979", which is not the correct year (the Bakke decision was issued in 1978). And the article seems to make inconsistent claims about affirmative action: "affirmative action and preferential hiring and promotion do not benefit Whites" (p. 15), but "White women are the largest beneficiary group (Crosby et al. 2003)" (p. 13).

---

At least some of these flaws seem understandable. But I think that the number of flaws in this article is remarkably high, especially for a peer-reviewed journal with such a large editorial group: Politics, Groups, and Identities currently lists a 13-member editorial team, a 58-member editorial board, and a 9-member international advisory board.

---

NOTES

1. The article claims that (p. 15):

Regarding all races, most of the racial resentment indicators are significant statistically and in the hypothesized direction. These findings lead to the conclusion that preferential hiring and promotion foster racial thinking more than affirmative action. That is, discussions of preferential hiring and promotion lead Americans to consider their beliefs about minorities in general and African Americans in particular more than do discussions of affirmative action.

However, I'm not sure how the claim that "preferential hiring and promotion foster racial thinking more than affirmative action" is justified by the article's results regarding racial resentment.

Maybe this refers to the slopes being steeper for the preferential hiring and promotion outcome than for the affirmative action outcome, but it would be a lot easier to eyeball slopes across figures if the y-axes were consistent across figures; instead, the y-axes run from .4 to .9 (Figure 3), .4 to 1 (Figure 4), .6 to 1 (Figure 5), and .2 to 1 (Figure 6).

Moreover, Figure 1 is a barplot that has a y-axis that runs from .4 to .8, and Figure 2 is a barplot that has a y-axis that runs from .5 to .9, with neither barplot starting at zero. It might make sense for journals to have an editorial board member or other person devoted to reviewing figures, to eliminate errors and improve presentation.

For example, the article indicates that (p. 6):

Figures 1 and 2 display the distribution of responses for our re-coded versions of the dependent variables graphically, using bar graphs containing 95% confidence intervals. To interpret these graphs, readers simply check to see if the confidence intervals corresponding to any given bar overlap with those of another.

But if the intent is to use confidence interval overlap to assess whether there is sufficient evidence at p<0.05 of a difference between groups, then confidence intervals closer to 85% are more appropriate. I haven't always known this, but this does seem to be knowledge that journal editors should use to foster better figures.
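
For readers unfamiliar with that guideline, here is a minimal sketch of the usual back-of-the-envelope reasoning, assuming two independent estimates with roughly equal standard errors:

```python
from scipy import stats

# Two independent estimates with equal standard errors differ at p < .05 when the gap
# exceeds 1.96 * sqrt(2) * SE, or about 2.77 * SE. Two symmetric confidence intervals
# stop overlapping when the gap exceeds 2 * z * SE. Setting 2 * z = 1.96 * sqrt(2)
# gives the z value, and hence the CI level, at which non-overlap matches the p < .05 test.
z = 1.96 * (2 ** 0.5) / 2
level = 1 - 2 * stats.norm.sf(z)
print(f"z = {z:.3f}, implied CI level = {level:.1%}")  # z = 1.386, roughly an 83-84% CI
```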

2. Data citation:

James S. Jackson, Vincent L. Hutchings, Ronald Brown, and Cara Wong. National Politics Study, 2004. ICPSR24483-v1. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2009-03-23. doi:10.3886/ICPSR24483.v1.
