October 2022

I reached ten new publications to comment on that I didn't think were each worth a separate blog post, so here goes:

---

The Twitter account for the journal Politics, Groups, and Identities retweeted R.G. Cravens linking to two of his articles in Politics, Groups, and Identities. I blogged about one of these articles, discussing, among other things, the article's erroneous interpretation of interaction terms. The other article that R.G. Cravens linked to in that tweet ("The view from the top: Social acceptance and ideological conservatism among sexual minorities") also misinterpreted an interaction term:

However, the coefficient estimate for the interaction term between racial minority identity and racial identity group consciousness (β = −.312, p = .000), showing the effect of racial identity group consciousness only among racial minority respondents, indicates a negative relationship between racial minority group consciousness and conservatism at the 99% confidence level.

The corresponding Table 1 coefficient for RI Consciousness is 0.117, indicating the estimated effect of racial identity consciousness when the "Racial minority" variable is set to zero. The -0.312 interaction term indicates how much the estimated effect of racial identity consciousness *differs* between non-racial minorities and racial minorities, so that the estimated effect of racial identity consciousness among racial minorities is 0.117 plus -0.312, which is -0.195.
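The arithmetic can be sketched in a few lines of Python. This is only an illustration using the two coefficients quoted above; the function name `conditional_effect` is my own label and not something from the article.

```python
def conditional_effect(main_coef, interaction_coef, moderator):
    """Estimated effect of a predictor at a given value of a 0/1 moderator."""
    return main_coef + interaction_coef * moderator

# Coefficients as quoted from Table 1 of the article
ri_consciousness = 0.117  # estimated effect when "Racial minority" = 0
interaction = -0.312      # how much that effect differs for racial minorities

effect_non_minority = conditional_effect(ri_consciousness, interaction, 0)
effect_minority = conditional_effect(ri_consciousness, interaction, 1)

print(f"Non-racial minorities: {effect_non_minority:+.3f}")
print(f"Racial minorities:     {effect_minority:+.3f}")
```

Whether the summed estimate among racial minorities differs from zero can't be read off the p-value for either coefficient; that requires the standard error of the sum, which depends on the covariance between the two estimates.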

Two articles by one author in the same journal within three years, and each article misinterpreted an interaction term.

---

PS: Political Science & Politics published another article about student evaluations of teaching: Foster 2022 "Instructor name preference and student evaluations of instruction". The key finding seems plausible, that "SEIs were higher for instructors who preferred going by their first name...than for instructors who preferred going by 'Dr. Moore'" (p. 4).

But there are a few shortcomings in the reporting on the experiment in Study 2, which manipulated the race of an instructor, the gender of the instructor, and the instructor's stated preference for using his/her first name versus using his/her title and last name:

* Hypothesis 5 is about conservative Republicans:

Moderated mediation: We predict that female instructors who express a preference for going by "Dr. Moore" will have lower teacher ratings through decreased perceived immediacy, but only for students who identify as conservative and Republican.

But, as far as I can tell, the article doesn't report any data about Hypothesis 5.

* Table 2 indicates a positive p<0.05 correlation between the race of the instructor and SEIs (student evaluations of instruction) and a positive p<0.05 correlation between the race of the instructor and course evaluations. But, as far as I can tell, the article doesn't report how the race variable was coded, so it's not clear whether the White instructors or the Black instructors had the higher SEIs and course evaluations.

* The abstract indicates that:

Study 2 found the highest SEIs for Black male instructors when instructors asked students to call them by their first name, but there was a decrease in SEI scores if they went by their professional title.

But, as far as I can tell, the article doesn't report sufficient evidence about whether the estimated influence of the name preference among the Black male instructor targets differed from the estimated influence of the name preference among any of the comparison instructors. The p-value being under p=0.05 for the Black male instructor targets and not being under p=0.05 for the other instructor targets isn't enough evidence to infer at p<0.05 that participants treated the Black male instructor targets differently than participants treated the comparison instructor targets, so that the article doesn't report sufficient evidence to permit an inference of racial discrimination.

---

The Journal of Race, Ethnicity, and Politics published Enders and Thornton 2022 "Biased interviewer assessments of respondent knowledge based on perceptions of skin tone", which used data from the 2012 American National Election Studies Time Series Study. That analysis seems really similar to the analysis in Hannon 2015 "White colorism". From Hannon 2015:

Using data from the 2012 American National Election Study, an example is presented on white interviewers' perceptions of minority respondent skin tone and intelligence (N = 223). Results from ordinal logistic regression analyses indicate that African American and Latino respondents with the lightest skin are several times more likely to be seen by whites as intelligent compared with those with the darkest skin.

Enders and Thornton 2022 cited three publications that had Hannon as a co-author: Hannon L and Deffna R 2014, Hannon L and DeFina R 2016, and Hannon L, Defina R and Bruch S 2013. But no citation to Hannon 2015.

And if it seems coincidental that Hannon would have "Deffna R", "DeFina R", and "Defina R" as co-authors, these all seem to be Robert DeFina.

---

From the abstract of Evans et al 2022 "Regions of discrimination: felony records, race, and expressed college admissions policies", in the Journal of Crime and Justice, about an audit experiment:

Findings indicate that admissions departments are more likely to tell an interested applicant with a stereotypical Black name and a non-violent felony record that their criminal histories will be considered in the application process compared to another prospective applicant with a stereotypical White name and non-violent felony record.

But coded responses in Table 3 indicate that 41% of responses to "Tyrone" indicated that a felony conviction is not considered and that 41% of responses to "Christopher" indicated that a felony conviction is not considered.

Table 4 indicates that, in an analysis with statistical control, "Tyrone" was more likely than "Christopher" to be told that a felony conviction is considered in a holistic review than to be told that a felony conviction is not considered, but that Table 4 analysis split the coding into three categories, of "Not considered", "Considered in a holistic review", and "Considered in a discretionary review", so that analysis doesn't seem to support the general abstract claim about "considered in the application process".

And I'm not sure that there is an important distinction between holistic and discretionary. Here is an example that Evans et al 2022 indicates is "discretionary":

Thank you for your interest and for reaching out. ____ is a residential college and students are required to live on campus all four years. We must ensure the safety of all of our constituents so we do take into account criminal history when reviewing for admission (coded as discretionary)

So it's not like "discretionary" means only that the criminal history might be taken into account.

---

I wasn't the only person to notice this next one (see tweets from Tom Pepinsky and Brendan Nyhan), but Politics & Gender recently published Forman-Rabinovici and Mandel 2022 "The prevalence and implications of gender blindness in quantitative political science research", which indicated that:

Our findings show that gender-sensitive analysis yields more accurate and useful results. In two out of the three articles we tested, gender-sensitive analysis indeed led to different outcomes that changed the ramifications for theory building as a result.

But the inferential technique in the analysis reflected a common error.

For the first of the three aforementioned articles (Gandhi and Ong 2019), Table 1a of Forman-Rabinovici and Mandel 2022 reported results with a key coefficient that was -.308 across the sample, was -.294 (p=.003) among men in the sample, and was -.334 (p=.154) among women in the sample. These estimates are from a linear probability model predicting a dichotomous "Support PH" outcome, so the point estimates correspond to decreases of 29 percentage points among men and 33 percentage points among women.

The estimate was more extreme among women than among men, but the estimate was less precise among women than among men, at least partly because the sample size among men (N=1902) was about three times the sample size among women (N=652).

Figure 1 of Forman-Rabinovici and Mandel 2022 described these results as:

Male voters leave PH coalition

Female voters continue to vote for PH coalition

But, in my analysis of the data, the ends of the 95% confidence interval for the estimate among women indicated an 82 percentage point decrease and a 15 percentage point increase [-0.82, +0.15], so that's not nearly enough evidence to infer a lack of an effect among women.
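A rough check of whether the estimate among women differs from the estimate among men can be sketched from the reported coefficients and p-values alone. The sketch assumes a normal approximation and treats the two subsamples as independent; both are my simplifying assumptions, so the backed-out standard errors and interval won't exactly match the interval from my analysis of the data.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def se_from_p(coef, p_two_tailed):
    """Approximate standard error implied by a coefficient and its two-tailed p-value."""
    z = STD_NORMAL.inv_cdf(1 - p_two_tailed / 2)
    return abs(coef) / z

def two_tailed_p(z):
    """Two-tailed p-value for a z statistic under the standard normal."""
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

# Coefficients and p-values as quoted from Table 1a
b_men, p_men = -0.294, 0.003
b_women, p_women = -0.334, 0.154

se_men = se_from_p(b_men, p_men)
se_women = se_from_p(b_women, p_women)

# Approximate 95% confidence interval for the estimate among women
ci_women = (b_women - 1.96 * se_women, b_women + 1.96 * se_women)

# Test of the difference between the two subgroup estimates
diff = b_women - b_men
se_diff = sqrt(se_men**2 + se_women**2)
p_diff = two_tailed_p(diff / se_diff)

print(f"Approx. 95% CI among women: [{ci_women[0]:.2f}, {ci_women[1]:.2f}]")
print(f"Women minus men: {diff:+.3f}, p = {p_diff:.2f}")
```

Under these assumptions, the interval among women spans zero and the p-value for the difference between men and women is far above 0.05, so the subgroup estimates by themselves don't support a claim that the effect differs by gender.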

---

Politics & Gender published another article that has at least a misleading interpretation of interaction terms: Kreutzer 2022 "Women's support shaken: A study of women's political trust after natural disasters".

Table 1 reports results for three multilevel mixed-effects linear regressions, with coefficients on a "Number of Disasters Present" predictor of 0.017, 0.009, and 0.022. The models have a predictor for "Female" and an interaction of "Female" and "Number of Disasters Present" with interaction coefficients of –0.001, –0.002, and –0.001. So the combination of coefficients indicates that the associations of "Number of Disasters Present" and the "trust" outcomes are positive among women, but not as positive as the associations are among men.
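Here is a minimal sketch of that combination of coefficients, assuming that "Female" is coded 1 for women and 0 for men (the variable names are mine):

```python
# Coefficients as quoted from Table 1, one per model
disasters = [0.017, 0.009, 0.022]              # "Number of Disasters Present"
female_x_disasters = [-0.001, -0.002, -0.001]  # interaction with "Female"

# Among men (Female = 0), the slope is the main coefficient;
# among women (Female = 1), it is the main coefficient plus the interaction.
slopes_men = disasters
slopes_women = [round(b + bx, 3) for b, bx in zip(disasters, female_x_disasters)]

for model, (men, women) in enumerate(zip(slopes_men, slopes_women), start=1):
    print(f"Model {model}: men {men:+.3f}, women {women:+.3f}")
```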

Kreutzer 2022 discusses this correctly in some places, such as indicating that the interaction term "allows a comparison of how disasters influence women's political trust compared with men's trust" (p. 15). But in other places the interpretation is, I think, incorrect or at least misleading, such as in the abstract (emphasis added):

I investigate women's trust in government institutions when natural disasters have recently occurred and argue that because of their unique experiences and typical government responses, women's political trust will decline when there is a natural disaster more than men's. I find that when there is a high number of disasters and when a larger percentage of the population is affected by disasters, women's political trust decreases significantly, especially institutional trust.

Or from page 23:

I have demonstrated that natural disasters create unique and vulnerable situations for women that cause their trust in government to decline.

And discussing Figure 5, referring to a different set of three regressions (reference to footnote 12 omitted):

The figure shows a small decline in women's trust (overall, institutional, organizational) as the percentage of the population affected by disasters in the country increases. The effect is significantly different from 0, but the percentage affected seems not to make a difference.

That seems to say that the percentage of the population affected has an effect that is simultaneously not zero and does not seem to make a difference. I think Figure 5 marginal effects plots indicate that women have lower trust than men (which is why each point estimate line falls in the negative range), but that this gender difference in trust does not vary much by the percentage of the population affected (which is why each point estimate line is pretty much flat).

---

The "Women's Political Empowerment Index" coefficient and standard error are –0.017 and 0.108 in Model 4, so maybe the ** indicating a two-tailed p<0.01 is an error.
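A quick check of that, as a sketch that assumes a normal approximation for the test statistic (the model's actual reference distribution might differ a bit):

```python
from statistics import NormalDist

# Coefficient and standard error as reported for Model 4
coef, se = -0.017, 0.108

z = coef / se
p = 2 * (1 - NormalDist().cdf(abs(z)))       # two-tailed p-value
critical_z = NormalDist().inv_cdf(1 - 0.01 / 2)  # |z| needed for two-tailed p<0.01

print(f"z = {z:.2f}, two-tailed p = {p:.2f}")
print(f"|z| needed for p<0.01: {critical_z:.2f}")
```

The coefficient is a small fraction of its standard error, nowhere near the size needed for two asterisks.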

Tweet to the author (Oct 3). No reply yet.

---

7, 8.

Let's return to Politics, Groups, and Identities, for Ditonto 2019 "Direct and indirect effects of prejudice: sexism, information, and voting behavior in political campaigns". From the abstract:

I also find that subjects high in sexism search for less information about women candidates...

At least in the reported analyses, the comparison for "less" is to participants low in sexism instead of to male candidates. So we get this result discussing Table 2 (pp. 598-599):

Those who scored lowest in sexism are predicted to look at approximately 13 unique information boxes for the female candidate, while those who scored highest are predicted to access about 10 items, or almost 1/3 less.

It should be obvious to peer reviewers and any editors that a comparison to the male candidates in the experiment would be a more useful comparison for assessing the effect of sexism, because, for all we know, respondents high in sexism might search for less information than respondents low in sexism search for, no matter the gender of the candidate.

Ditonto has another 2019 article in a different journal (Political Psychology) based on the same experiment: "The mediating role of information search in the relationship between prejudice and voting behavior". From that abstract:

I also find that subjects high in prejudice search for less information about minority candidates...

But, again, Table 2 in that article merely indicates that symbolic racism negatively associates with information search for a minority candidate, with no information provided about information search for a non-minority candidate.

---

And I think that the Ditonto 2019 abstracts include claims that aren't supported by results reported in the article. The PGI abstract claims that "I find that subjects with higher scores on items measuring modern sexism...rate female candidates more negatively than their male counterparts", and the PP abstract claims that "I find that subjects higher in symbolic racism...rate minority candidates more negatively than their white counterparts".

By the way, claims about respondents high in sexism or racism should be assessed using data only from respondents high in sexism or racism, because the association of a sexism or racism measure with an outcome might be completely due to respondents low in sexism or racism.
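To illustrate that point, here is a sketch with made-up data (not data from either Ditonto 2019 article) in which the candidate rating gap is driven entirely by respondents low in sexism. The variable names and numbers are hypothetical.

```python
# Hypothetical data, invented for illustration only.
# gap = rating of the female candidate minus rating of the male candidate.
sexism = [i / 10 for i in range(11)]             # 0.0 (lowest) to 1.0 (highest)
gap = [max(0.0, 2.0 - 4.0 * s) for s in sexism]  # pro-female gap only at low sexism

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sum((a - mean_x) ** 2 for a in x)
    return numerator / denominator

# Association across the full sample
slope_all = ols_slope(sexism, gap)

# Mean gap using only respondents at or above the midpoint of the sexism measure
high_gaps = [g for s, g in zip(sexism, gap) if s >= 0.5]
mean_gap_high = sum(high_gaps) / len(high_gaps)

print(f"Slope of gap on sexism, full sample: {slope_all:.2f}")
print(f"Mean gap among respondents high in sexism: {mean_gap_high:.2f}")
```

In these invented data, sexism negatively associates with the relative rating of the female candidate, but respondents high in sexism rate the female candidate and the male candidate the same; the association is due to respondents low in sexism favoring the female candidate.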

Tweet to the author (Oct 9). No reply yet.

---

Below is a passage from "Lower test scores from wildfire smoke exposure", by Jeff Wen and Marshall Burke, published in 2022 in Nature Sustainability:

When we consider the cumulative losses over all study years and across subgroups (Fig. 4b), we estimate the net present value of lost future income to be roughly $544 million (95% CI: −$999 million to −$100 million) from smoke PM2.5 exposure in 2016 for districts with low economic disadvantage and low proportion of non-White students. For districts with high economic disadvantage and high proportion of non-White students, we estimate cumulative impacts to be $1.4 billion (95% CI: −$2.3 billion to −$477 million) from cumulative smoke PM2.5 exposure in 2016. Thus, of the roughly $1.7 billion in total costs during the smokiest year in our sample, 82% of the costs we estimate were borne by economically disadvantaged communities of colour.

So, in 2016, the lost future income was about $0.5 billion for low economic disadvantage / low non-White districts and $1.4 billion for high economic disadvantage / high non-White districts; that gets us to $1.9 billion, without even including the costs from low/high districts and high/low districts. But total costs were cited as roughly $1.7 billion.

From what I can tell from Figure 4b, the percentage of total costs attributed to economically disadvantaged communities of color (the high / high category) is 59%. It's not a large inferential difference from 82%, in that both estimates are a majority, but it's another example of an error that could have been caught by careful reading.
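The inconsistency in the quoted figures can be checked with simple arithmetic. The sketch below uses only the numbers in the passage; the variable names are mine.

```python
# Figures as quoted in the passage, in billions of dollars
low_low = 0.544      # low economic disadvantage / low non-White districts
high_high = 1.4      # high economic disadvantage / high non-White districts
stated_total = 1.7   # total costs cited for 2016

# Two of the four subgroups already exceed the stated total
two_group_sum = low_low + high_high

# Share of the stated total represented by the high / high figure
share_of_stated_total = high_high / stated_total

print(f"Sum of two subgroups: ${two_group_sum:.2f} billion")
print(f"High/high as a share of stated total: {share_of_stated_total:.0%}")
```

Dividing $1.4 billion by $1.7 billion gives about 82%, so that might be where the 82% came from, but that is only my guess.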

Tweet to the authors about this (Oct 17). No reply yet.

---

10.

Political Research Quarterly published "Opening the Attitudinal Black Box: Three Dimensions of Latin American Elites' Attitudes about Gender Equality", by Amy Alexander, Asbel Bohigues, and Jennifer M. Piscopo.

I was curious about the study's measurement of attitudes about gender equality, and, not unexpectedly, the measurement was not good, using items such as "In general, men make better political leaders than women", in which respondents can agree that men make better political leaders, can disagree that men make better political leaders, and can be neutral about the claim that men make better political leaders...but respondents cannot report the belief that, in general, women make better political leaders than men do.

I checked the data, in case almost no respondent disagreed with the statement that "In general, men make better political leaders than women", in which case presumably no respondent would think that women make better political leaders than men do. But disagreement with the statement was pretty high, with 69% strongly disagreeing, another 15% disagreeing, and another 11% selecting neither agree nor disagree.

I tweeted a question about this to some of the authors (Oct 21). No reply yet.

Social Science & Medicine published Skinner-Dorkenoo et al 2022 "Highlighting COVID-19 racial disparities can reduce support for safety precautions among White U.S. residents", with data for Study 1 fielded in September 2020. Stephens-Dougan had a similar Time-sharing Experiments for the Social Sciences study "Backlash effect? White Americans' response to the coronavirus pandemic", fielded starting in late April 2020 according to the TESS page for the study.

You can check tweets about Skinner-Dorkenoo et al 2022 and what some tweeters said about White people. But you can't tell from the Skinner-Dorkenoo et al 2022 publication or the Stephens-Dougan 2022 APSR article whether any detected effect is distinctive to White people.

Limiting samples to Whites doesn't seem to be a good idea if the purpose is to understand racial bias. But it might be naive to think that all social science research is designed to understand.

---

There might be circumstances in which it's justified to limit a study of racial bias to White participants, but I don't think such circumstances include:

* The Kirgios et al 2022 audit study that experimentally manipulated the race and gender of an email requester, but for which "Participants were 2,476 White male city councillors serving in cities across the United States". In late April, I tweeted a question to the first author of Kirgios et al 2022 about why the city councilor sample was limited to White men, but I haven't yet gotten a reply.

* Studies that collect sufficient data on non-White participants but do not report results from these data in the eventual publications (examples here and here).

* Proposals for federally funded experiments that request that the sample be limited to White participants, such as in the Stephens-Dougan 2020 proposal: "I want to test whether White Americans may be more resistant to efforts to curb the virus and more supportive of protests to reopen states when the crisis is framed as disproportionately harming African Americans".

---

One benefit of not limiting the subject pool by race is to limit unfair criticism of entire racial groups. For example, according to an analysis from Bracic et al 2022, White nationalism among non-Whites was at least as influential as White nationalism among Whites in predicting support for a family separation policy net of controls.

So, to the extent that White nationalism is responsible for support for the family separation policy, that applies to White respondents and to non-White respondents.

Of course, Bracic et al 2022 doesn't report how the association for White nationalism compares to the association for, say, Black nationalism or Hispanic nationalism or how the association for the gendered nationalist belief that "the nation has gotten too soft and feminine" compares to the association for the gendered nationalist belief that, say, "the nation is too rough and masculine".

---

And consider this suggestion from Rice et al 2022 to use racial resentment items to screen Whites for jury service:

At the practical level, our research raises important empirical and normative questions related to the use of racial resentment items during jury selection in criminal trials. If racial resentment affects jurors' votes and reasoning, should racial resentment items be used to screen white potential jurors?

Given evidence suggesting that Black juror bias is on average at least as large as White juror bias, I don't perceive a good justification to limit this suggestion to White potential jurors, although I think that the Rice et al decision to not report results for Black mock jurors makes it easier to limit this suggestion to White potential jurors.

---

NOTES

1. I caught two flaws in Skinner-Dorkenoo et al 2022, which I discussed on Twitter: [1] For the three empathy items, more than 700 respondents selected "somewhat agree", more than 500 selected "strongly agree", but no respondent selected "agree", suggesting that the data were miscoded. [2] The p-value under p=0.05 for the empathy inference appears to be because the analysis controlled for a post-treatment measure; see the second model referred to by the lead author in the Twitter thread. I didn't conduct a full check of the Skinner-Dorkenoo et al 2022 analysis. Stata code and output for my analyses of Skinner-Dorkenoo et al 2022, with data here. Note the end of the output, indicating that the post-treatment control was affected by the treatment.

2. I have a prior post about the Stephens-Dougan TESS survey experiment reported on in the APSR that had substantial deviations from the pre-analysis plan. On May 31, I contacted the APSR about that and the error discussed at the post. I received an update in September, but the Stephens-Dougan 2022 APSR article hasn't been corrected as of Oct 2.
