Racial resentment and symbolic racism are terms for a set of measures used in racial attitudes research, including statements such as "Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors". This item and at least some of the other racial resentment items confound racism and nonracial ideology: in this "special favors" item, an individualist who believes that everyone should work their way up without special favors would select a response on the same side of the scale as an antiBlack racist who believes that only Blacks should work their way up without special favors.

Feldman and Huddy (2005) concluded that "racial resentment is an inadequate measure of prejudice because it confounds prejudice and political ideology" (p. 181), which is consistent with factor analysis of racial resentment items (Sears and Henry 2003: 271). Some research has addressed this confounding with what Feldman and Huddy (2005: 171) call the multivariate approach, in which the analysis includes statistical control for related ideological values. The logic of this multivariate approach is that racial resentment confounds ideology and antiBlack animus so that controlling for ideology should permit the residual association of racial resentment to be interpreted as the association due to antiBlack animus.

The analysis below approaches from the opposite direction: racial resentment confounds ideology and antiBlack animus so that controlling for antiBlack animus should permit the residual association of racial resentment to be interpreted as the association due to ideology. Moreover, if controls for ideology and for antiBlack animus are both included, then the association of racial resentment with an outcome variable should be zero. But this is not even close to being true, as illustrated below in a figure that reports the association of racial resentment with racial or possibly racialized outcome variables, using different sets of statistical control.

In each panel above, the top estimate indicates the association of racial resentment with the outcome variable controlling for only demographics. The second and third estimates respectively indicate the association of racial resentment with outcome variables after controls for demographics and racial attitudes and after controls for demographics and ideology. The fourth and fifth estimates respectively indicate the association of racial resentment with outcome variables after controls for demographics, ideology, and racial attitudes and after controls for demographics, ideology, and racial animus. The key comparison is between the third estimate and the fourth and fifth estimates: the measures of racial attitudes and racial animus had relatively little impact on the racial resentment estimate once the controls for ideology were included in the analysis. For example, in the top left panel, the coefficient for racial resentment was 0.51 controlling for demographics and ideology, 0.48 controlling for demographics, ideology, and racial attitudes, and 0.52 controlling for demographics, ideology, and racial animus. In a common racial resentment association analysis, the 0.51 coefficient controlling for demographics and ideology would be assigned to antiBlack animus, but the addition of seven racial attitudes controls accounted for only 0.03 of the 0.51 coefficient, and the inclusion of six antiBlack animus controls did not even reduce the 0.51 coefficient. (See the Notes below for more description of the measures.)

A reasonable critique of the above analysis is that racial resentment taps a form of antiBlack racism that is not captured or is not well captured in the included measures of racial attitudes and racial animus. But, from what I can tell, that is an equally valid criticism of analyses that control for ideology: the nonracial ideology captured in racial resentment measures is not captured or not well captured in the included measures of ideology.
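The confounding logic described above can be illustrated with a small simulation (a sketch in Python rather than the statistical software used for the analysis; all variable names and effect sizes below are hypothetical, not estimates from the ANES data). If an outcome depends only on ideology and animus, and a resentment measure is merely a noisy blend of the two, then the coefficient on the resentment measure should fall to about zero once both confounded components are controlled:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
ideology = rng.normal(size=n)
animus = rng.normal(size=n)
# Hypothetical measure: resentment as a noisy blend of ideology and animus
resentment = 0.5 * ideology + 0.5 * animus + rng.normal(scale=0.5, size=n)
# Hypothetical outcome driven only by ideology and animus
outcome = ideology + animus + rng.normal(size=n)

def coef_on_resentment(controls):
    """OLS coefficient on resentment, with the given controls included."""
    X = np.column_stack([np.ones(n), resentment] + controls)
    beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
    return beta[1]

print(round(coef_on_resentment([]), 2))                  # large raw association
print(round(coef_on_resentment([ideology]), 2))          # still substantial
print(round(coef_on_resentment([ideology, animus]), 2))  # approximately zero
```

The contrast with the figure's estimates is the point: in the real data, the racial resentment coefficient did not approach zero when both sets of controls were included.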

NOTES

1. The sample for the analysis was the 3,261 non-Hispanic Whites who completed the pre- and post-election surveys face-to-face or online, conducted between 8 September 2012 and 24 January 2013, and who were not listwise deleted from a model due to missing data for a variable. Each variable in the analysis was coded to range from 0 to 1. Linear regressions without weights were used to predict values of the outcome variables.

The racial resentment measure summed responses to the four ANES 2012 racial resentment items. Models included demographic controls for participant sex, marital status, age, education level, and household family income. Ideological controls were self-reported partisanship, self-reported ideology, an item about guaranteed jobs, an index of attitudes about the role of government, a moral traditionalism index, an authoritarianism index, and an egalitarianism index.

One set of models included seven controls for racial attitudes: a feeling thermometer difference of ratings of Whites and ratings of Blacks, a rating difference for Blacks and for Whites in general on a laziness stereotype scale, a rating difference for Whites and for Blacks in general on an intelligence stereotype scale, an item rating admiration of Blacks, an item rating sympathy for Blacks, an item measuring the perceived political influence of Blacks relative to Whites, and a difference in ratings of the level of discrimination in the United States today against Whites and against Blacks. Another set of models included six dichotomous controls that attempted to isolate antiBlack animus: a more than 20-point feeling thermometer rating difference in which Whites were rated higher than Blacks and with Whites rated at or above 50 and Blacks rated below 50, a rating of Blacks as lazier in general than Whites, a rating of Whites as more intelligent in general than Blacks, an indication of never feeling sympathy for Blacks, an indication that Blacks have too much influence in American politics but Whites don't, and an indication that there is no discrimination against Blacks in the United States today but that there is discrimination against Whites in the United States today.

2. Code for the analysis is here.

3. Results for the 2016 ANES are below:

4. Code for the 2016 ANES analysis is here.

5. Citations:

American National Election Studies (ANES). 2016. ANES 2012 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-05-17. https://doi.org/10.3886/ICPSR35157.v1.

American National Election Studies, University of Michigan, and Stanford University. 2017. ANES 2016 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2017-09-19. https://doi.org/10.3886/ICPSR36824.v2.


I drafted a manuscript entitled "Six Things Peer Reviewers Can Do To Improve Political Science". It was rejected once in peer review, so I'll post at least some of the ideas to my blog. This first blog post is about comments on the Valentino et al. 2018 "Mobilizing Sexism" Public Opinion Quarterly article. I sent this draft of the manuscript to Valentino et al. on June 11, 2018, limited to the introduction and parts that focus on Valentino et al. 2018; the authors emailed me back comments on June 12, 2018, which Dr. Valentino asked me to post and that I will post after my discussion.

1. Unreported tests for claims about group differences

Valentino et al. (2018) report four hypotheses, the second of which is:

Second, compared to recent elections, the impact of sexism should be larger in 2016 because an outwardly feminist, female candidate was running against a male who had espoused disdain for women and the feminist project (pp. 219-220).

Here is the discussion of their Study 2 results in relation to that expectation:

The pattern of results is consistent with expectations, as displayed in table 2. Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05) (p. 225).

However, as Gelman and Stern (2006) observed, "comparisons of the sort, 'X is statistically significant but Y is not,' can be misleading" (p. 331). In Table 2 of Valentino et al. 2018, the sexism predictor in the 2016 model had a logit coefficient of 1.69 and a standard error of 0.81, and the p-value under .05 for this sexism predictor provides information only about whether the 2016 sexism coefficient differs from zero; this p-value does not indicate whether, at p < .05, the 2016 sexism coefficient differs from the imprecisely estimated sexism coefficients of 0.23, 0.94, and 0.34 for 2012, 2008, and 2004, respectively. That difference in coefficients between sexism in 2016 and sexism in the other years is what would be needed to test the second hypothesis about the impact of sexism being larger in 2016.
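Gelman and Stern's point can be made concrete with a z-test for the difference between two coefficients (a sketch in Python). The 2016 coefficient and standard error are from Valentino et al. 2018 Table 2; the 2012 standard error is a placeholder assumption for illustration, because the comparison requires the standard errors of both coefficients:

```python
import math

b_2016, se_2016 = 1.69, 0.81   # from Valentino et al. 2018 Table 2
b_2012, se_2012 = 0.23, 0.85   # se_2012 is a hypothetical placeholder value
# z-statistic for the difference between two independent coefficient estimates
z = (b_2016 - b_2012) / math.sqrt(se_2016**2 + se_2012**2)
print(round(z, 2))  # 1.24: well under 1.96, so not significant at p < .05
```

Even with a placeholder standard error similar in size to the 2016 standard error, the 2016-versus-2012 difference would not reach conventional significance.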

2. No summary statistics reported for a regression-based inference about groups

Valentino et al. 2018 Table 2 indicates that, compared to lower levels of participant modern sexism, higher levels of participant modern sexism associate with a greater probability of a participant's reported vote for Donald Trump in 2016. But the article does not report the absolute mean levels of modern sexism among Trump voters or Clinton voters. These absolute mean levels are in the figure below, limited to participants in face-to-face interviews (per Valentino et al. 2018 footnote 8):

Results in the above image indicate that the mean response across Trump voters represented beliefs:

  • that the news media should pay the same amount of attention to discrimination against women that they have been paying lately;
  • that, when women complain about discrimination, they cause more problems than they solve less than half the time;
  • and that, when women demand equality these days, less than half of the time they are actually seeking special favors.

These don't appear to be obviously sexist beliefs, in the sense that I am not aware of evidence that the beliefs incorrectly or unfairly disadvantage or disparage women or men, but comments are open below if you know of evidence or have an argument that the mean Trump voter response is sexist for any of these three items. Moreover, it's not clear to me that sexism can be inferred based on measures about only one sex: if, for instance, a participant believes that, when women complain about discrimination, they cause more problems than they solve, and the participant also believes that, when men complain about discrimination, they cause more problems than they solve, then it does not seem reasonable to code that person as a sexist without more information.

---

Response from Valentino et al.

Here is the response that I received from Valentino et al.

1) Your first concern was that we did not discuss one of the conditions in our MTurk study, focusing on disgust. The TESS reference is indeed the same study. However, we did not report results from the disgust condition because we did not theorize about disgust in this paper. Our theory focuses on the differential effects of fear vs. anger. We are in fact quite transparent throughout, indicating where predicted effects are non-significant. We also include a lengthy appendix with several robustness checks, etc. 

2) We never claim all Trump voters are sexist. We do claim that in 2016 gender attitudes are a powerful force, and more conservative scores on these measures significantly increase the likelihood of voting for Trump. The evidence from our work and several other studies supports this simple claim handsomely. Here is a sample of other work that replicates the basic finding in regarding the power of sexism in the 2016 election. Many of these studies use ANES data, as we do, but there are also several independent replications using different datasets. You might want to reference them in your paper. 

Blair, K. L. (2017). Did Secretary Clinton lose to a ‘basket of deplorables’? An examination of Islamophobia, homophobia, sexism and conservative ideology in the 2016 US presidential election. Psychology & Sexuality, 8(4), 334-355.

Bock, J., Byrd-Craven, J., & Burkley, M. (2017). The role of sexism in voting in the 2016 presidential election. Personality and Individual Differences, 119, 189-193.

Bracic, A., Israel-Trummel, M., & Shortle, A. F. (2018). Is sexism for white people? Gender stereotypes, race, and the 2016 presidential election. Political Behavior, 1-27.

Cassese, E. C., & Barnes, T. D. (2018). Reconciling Sexism and Women's Support for Republican Candidates: A Look at Gender, Class, and Whiteness in the 2012 and 2016 Presidential Races. Political Behavior, 1-24.

Cassese, E., & Holman, M. R. Playing the woman card: Ambivalent sexism in the 2016 US presidential race. Political Psychology.

Frasure-Yokley, L. (2018). Choosing the Velvet Glove: Women Voters, Ambivalent Sexism, and Vote Choice in 2016. Journal of Race, Ethnicity and Politics, 3(1), 3-25.

Ratliff, K. A., Redford, L., Conway, J., & Smith, C. T. (2017). Engendering support: Hostile sexism predicts voting for Donald Trump over Hillary Clinton in the 2016 US presidential election. Group Processes & Intergroup Relations, 1368430217741203.

Schaffner, B. F., MacWilliams, M., & Nteta, T. (2018). Understanding white polarization in the 2016 vote for president: The sobering role of racism and sexism. Political Science Quarterly, 133(1), 9-34.

3) We do not statistically compare the coefficients across years, but neither do we claim to do so. We claim the following:

"Controlling for the same set of predispositions and demographic variables as in the June 2016 online study, sexism was significantly associated with voting for the Republican candidate only in 2016 (b = 1.69, p < .05). ...In conclusion, evidence from two nationally representative surveys demonstrates sexism to be powerfully associated with the vote in the 2016 election, for the first time in at least several elections, above and beyond the impact of other typically influential political predispositions and demographic characteristics."

Therefore, we predict (and show) sexism was a strong predictor in 2016 but not in other years. Our test is also quite conservative, since we include in these models all manner of predispositions that are known to be correlated with sexism. In Table 2, the confidence interval around our 2016 estimate for sexism in these most conservative models contains the estimate for 2008 in that analysis, and is borderline for 2004 and 2012, where the impact of sexism was very close to zero. However, the bivariate logit relationships between sexism and Trump voting are much more distinct, with 2016 demonstrating a significantly larger effect than the other years. These results are easy to produce with ANES data.

---

Regarding the response from Valentino et al.:

1. My concern is that the decision about what to focus on in a paper is influenced by the results of the study. If a study has a disgust condition, then a description of the results of that disgust condition should be reported when results of that study are reported; otherwise, selective reporting of conditions could bias the literature.

2. I'm not sure that anything in their point 2 addresses anything in my manuscript.

3. I realize that Valentino et al. 2018 did not report or claim to report results for a statistical test comparing the sexism coefficient in 2016 to sexism coefficients in prior years. But that reflects my criticism: that, for the hypothesis that "compared to recent elections, the impact of sexism should be larger in 2016…" (Valentino et al. 2018: 219-220), the article should have reported a statistical test to assess the evidence that the sexism coefficient in 2016 was different from the sexism coefficients in prior recent elections.

---

NOTE

Code for the figure.


The 2018 CCES (Cooperative Congressional Election Study, Schaffner et al. 2019) has two items to measure respondent sexism and, in the same grid, two items to measure respondent racism, with responses measured on a five-point scale from strongly agree to strongly disagree:

  • White people in the U.S. have certain advantages because of the color of their skin.
  • Racial problems in the U.S. are rare, isolated situations.
  • When women lose to men in a fair competition, they typically complain about being discriminated against.
  • Feminists are making entirely reasonable demands of men.

The figure below reports the predicted probability of selecting the more liberal policy preference (support or oppose) on the CCES's four environmental policy items, weighted, limited to White respondents, and controlling for respondents' reported sex, age, education, partisan identification, ideological identification, and family income. Blue columns indicate predicted probabilities when controls are set to their means and respondent sexism and racism are set to their minimum values, and black columns indicate predicted probabilities when controls are set to their means and respondent sexism and racism are set to their maximum values.

[Figure: predicted probabilities for the four environmental policy items, at minimum versus maximum sexism and racism]

Below are results replacing the two-item racism measure with the traditional four-item racial resentment measure:

[Figure: corresponding results using the four-item racial resentment measure]

One possibility is that these strong associations are flukes; but similar patterns appear for the racism items on the 2016 CCES (the 2016 CCES did not have sexism items).

If the strong associations above are not flukes, then I think three possibilities remain: [1] sexism and racism combine to be a powerful *cause* of environmental policy preferences among Whites, [2] this type of associational research design with these items cannot be used to infer causality generally speaking, and [3] this type of associational research design with these items cannot be used to infer causality about environmental policy preferences but could be used to infer causality about other outcome variables, such as approval of the way that Donald Trump is handling his job as president.

If you believe [1], please post in a comment below a theory about how sexism and racism cause substantial changes in these environmental policy preferences. If you believe [3], please post in a comment an explanation why this type of associational research design with these items can be used to make causal inferences for only certain outcome variables and, if possible, a way to determine for which outcome variables a causal inference could be made. If I have omitted a possibility, please also post a comment with that omitted possibility.

NOTES

Stata code.


Gronke et al. (2018) reported in Table 6 that "Gender Bias in Student Evaluations" (Mitchell and Martin 2018, hereafter MM) was, as of 25 July 2018, the PS: Political Science & Politics article with the highest Altmetric score, described as "a measure of attention an article receives" (p. 906, emphasis removed).

The MM research design compared student evaluations of and comments on Mitchell (a woman) to student evaluations of and comments on Martin (a man) in official university course evaluations and on the Rate My Professors website. MM reported evidence that "the language students use in evaluations regarding male professors is significantly different than language used in evaluating female professors" and that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648).

I think that there are errors in the MM article that warrant a correction. I mention or at least allude to some or all of these things in a forthcoming symposium piece in PS: Political Science & Politics, but I elaborate below. Comments are open if you see an error in my analyses or inferences.

---

1.

MM Table 1 reports on comparisons of official university course evaluations for Mitchell and for Martin. The table indicates that the sample size was 68, and the file that Dr. Mitchell sent me upon my request has 23 of these comments for Martin and 45 of these comments for Mitchell. Table 1's "Personality" row indicates 4.3% for Martin and 15.6% for Mitchell, which correspond to 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell. The table has three asterisks to indicate a p-value less than 0.01 for the comparison of the 4.3% and the 15.6%, but it is not clear how such a low p-value was derived.

I conducted a simulation in R to estimate, given 8 personality-related comments across 68 comments, how often random distribution of these 8 personality-related comments would result in Martin's 23 comments having 1 or fewer personality-related comments. For the simulation, for 10 million trials, I started with eight 1s and sixty 0s, drew 23 of these 68 numbers to represent comments on Martin, and calculated the difference between the proportion of 1s for Martin and the proportion of 1s in the residual numbers (representing comments on Mitchell):

# Permutation simulation: randomly allocate the 8 personality-related
# comments among all 68 comments and record the Martin-minus-Mitchell
# difference in proportions in each of 10 million trials
sims <- rep_len(NA, 10000000)
comments <- c(rep_len(1, 8), rep_len(0, 60))  # 1 = personality-related comment
for (i in 1:10000000){
   martin <- sample(comments, 23, replace=FALSE)  # 23 comments drawn for Martin
   sims[i] <- sum(martin)/23 - (8 - sum(martin))/45
}
stack(table(sims))  # tabulate the simulated differences in proportions

Here are results from the simulation:

   values                 ind
1  290952  -0.177777777777778
2 1412204   -0.11207729468599
3 2788608 -0.0463768115942029
4 2927564  0.0193236714975845
5 1782937   0.085024154589372
6  646247   0.150724637681159
7  135850   0.216425120772947
8   14975   0.282125603864734
9     663   0.347826086956522

The -0.1778 in line 1 represents 0 personality-related comments of 23 comments for Martin and 8 personality-related comments of 45 comments for Mitchell (0% to 17.78%), which occurred 290,952 times in the 10 million simulations (2.9 percent of the time). The -0.1121 in line 2 represents 1 personality-related comment of 23 comments for Martin and 7 personality-related comments of 45 comments for Mitchell (4.3% to 15.6%), which occurred 1,412,204 times in the 10 million simulations (14.1 percent of the time). So the simulation indicated that Martin receiving only 1 or fewer of the 8 personality-related comments would be expected to occur about 17 percent of the time if the 8 personality-related comments were distributed randomly. But recall that the MM Table 1 asterisks for this comparison indicate a p-value less than 0.01.
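The simulated 17 percent can be checked exactly: under random allocation, the number of personality-related comments among Martin's 23 follows a hypergeometric distribution. A sketch in Python using only the standard library:

```python
from math import comb

def hypergeom_cdf(k, total, successes, draws):
    """P(X <= k) for a hypergeometric draw without replacement."""
    return sum(comb(successes, i) * comb(total - successes, draws - i)
               for i in range(k + 1)) / comb(total, draws)

# 8 personality-related comments among 68 comments; 23 comments are Martin's
print(round(hypergeom_cdf(1, 68, 8, 23), 2))  # 0.17, matching the simulation
```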

MM Table 2 reports on comparisons of Rate My Professors comments for Mitchell and for Martin, with a reported sample size of N=54, which is split into sample sizes of 9 for Martin and 45 for Mitchell in the file that Dr. Mitchell sent me upon my request; the nine comments for Martin are still available at the Rate My Professors website. I conducted another simulation in R for the incompetency-related comments, in which corresponding proportions were 0 of 9 for Martin and 3 of 45 for Mitchell (0% to 6.67%).

# Same permutation simulation for the Rate My Professors comments:
# 3 incompetency-related comments among 54, with 9 comments drawn for Martin
sims <- rep_len(NA, 10000000)
comments <- c(rep_len(1, 3), rep_len(0, 51))  # 1 = incompetency-related comment
for (i in 1:10000000){
   martin <- sample(comments, 9, replace=FALSE)
   sims[i] <- sum(martin)/9 - (3 - sum(martin))/45
}
stack(table(sims))

Here are results from the simulation:

   values                 ind
1 5716882 -0.0666666666666667
2 3595302  0.0666666666666667
3  653505                 0.2
4   34311   0.333333333333333

The -0.0667 in line 1 represents 0 incompetency-related comments of 9 comments for Martin and 3 incompetency-related comments of 45 comments for Mitchell (0% to 6.67%), which occurred 5,716,882 times in 10 million simulations (57 percent of the time). So the simulation indicated that Martin's 9 comments having zero of the 3 incompetency-related comments would be expected to occur about 57 percent of the time if the 3 incompetency-related comments were distributed randomly. The MM Table 2 asterisk for this comparison indicates a p-value less than 0.1.
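The 57 percent figure also has an exact counterpart: the probability that none of the 3 incompetency-related comments lands among Martin's 9 comments under random allocation. A quick check in Python:

```python
from math import comb

# P(X = 0): none of the 3 incompetency-related comments among 54 total
# comments falls in Martin's 9 comments, under random allocation
p0 = comb(3, 0) * comb(51, 9) / comb(54, 9)
print(round(p0, 2))  # 0.57, matching the simulation
```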

I have concerns about other p-value asterisks in MM Table 1 and MM Table 2, but I will not report simulations for those comparisons here.

---

2.

MM Table 4 inferential statistics appear to be unadjusted for the lack of independence of some observations. Click here, then Search by Course > Spring 2015 > College of Arts and Sciences > Political Science > POLS 2302 (or click here). Each "Total Summary" row at the bottom has 218 evaluations; for example, the first item of "Overall the instructor(s) was (were) effective" has 43 strongly agrees, 55 agrees, 75 neutrals, 24 disagrees, and 21 strongly disagrees, which suggests that 218 students completed these evaluations. But the total Ns reported in MM Table 4 are greater than 218. For example, the "Course" line in MM Table 4 has an N of 357 for Martin and an N of 1,169 for Mitchell, which is a total N of 1,526. That 1,526 is exactly seven times 218, and the MM appendix indicates that the student evaluations had 7 "Course" items.

Using this code, I reproduced MM Table 4 t-scores closely or exactly by treating each observation as independent and conducting a t-test assuming equal variances, suggesting that MM Table 4 inferential statistics were not adjusted for the lack of independence of some observations. However, for the purpose of calculating inferential statistics, multiple ratings from the same student cannot be treated as if these were independent ratings.
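To see why the lack of independence matters, consider the extreme case in which a student's seven "Course" ratings are perfectly correlated: the effective sample size is then the number of students, not the number of ratings, and the naive standard error is too small by a factor of the square root of seven. A sketch in Python (the rating standard deviation is an arbitrary placeholder; the point is the ratio):

```python
import math

sd = 1.0            # arbitrary placeholder rating standard deviation
n_students = 218    # students completing the evaluations
n_items = 7         # "Course" items per student
# Naive SE treats all 218 * 7 = 1,526 ratings as independent observations
se_naive = sd / math.sqrt(n_students * n_items)
# Worst case (perfect within-student correlation): only 218 independent units
se_clustered = sd / math.sqrt(n_students)
print(round(se_clustered / se_naive, 2))  # sqrt(7), about 2.65
```

Real within-student correlations fall between the two extremes, but any positive correlation means the naive t-scores are too large.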

The aforementioned code reports p-values for individual-item comparisons of evaluations for Mitchell and for Martin, which avoids the problem of a lack of independence for some student responses. But I'm not sure that much should be made of any differences detected or not detected between evaluations for Mitchell and evaluations for Martin, given the lack of randomization of students to instructors or any evidence that the students in Mitchell's sections were sufficiently similar before the course to the students in Martin's sections, and given the possibility that students in these sections might have already had courses or interactions with Mitchell and/or Martin and that the evaluations reflected these prior experiences.

---

3.

Corrected inferential statistics for MM Table 1 and MM Table 2 would ideally reflect consideration of whether non-integer counts of comments should be used, as MM appears to have done. Multiplying proportions in MM Table 1 and MM Table 2 by sample sizes from the MM data produces some non-integer counts of comments. For example, the 15.2% for Martin in the MM Table 1 "Referred to as 'Teacher'" row corresponds to 3.5 of 23 comments, and the 20.9% for Mitchell in the MM Table 2 "Personality" row corresponds to 9.4 of 45 comments. Based on the data that Dr. Mitchell sent me, it seems that a comment might have been discounted by the number of sentences in the comment; for example, four of the official university course evaluations comments for Martin contain the word "Teacher", but the percentage for Martin is not 4 of 23 comments (17.4%) but is instead 3.5 of 23 comments (15.2%), presumably because one of the "teacher" comments had two sentences, only one of which referred to Martin as a teacher; the other three comments that referred to Martin as a teacher did not have multiple sentences.

Corrected inferential statistics for MM Table 1 and MM Table 2 for the frequency of references to the instructors as a professor should reflect consideration of the instructors' titles and job titles. For instance, for MM Table 1, the course numbers in the MM data match course listings for the five courses that Mitchell or Martin taught face-to-face at Texas Tech University in Fall 2015 or Spring 2015 (see here):

Mitchell
POLS 3312 Game Theory [Fall 2015]
POLS 3361 International Politics: Honors [Spring 2015]
POLS 3366 International Political Economy [Spring 2015]

Martin
POLS 3371 Comparative Politics [Fall 2015]
POLS 3373 Governments of Western Europe [Spring 2015]

Online CVs indicated that Mitchell's CV listed her Texas Tech title in 2015 as Instructor and that Martin's CV listed his Texas Tech title in 2015 as Visiting Professor.

A correction could also discuss the fact that, while Mitchell is referred to as "Dr." 19 times across all MM Table 1 and MM Table 2 comments, none of these comments refer to Martin as "Dr.". Martin's CV indicated that he earned his Ph.D. in 2014, so I do not see how non-reporting of references to Mitchell and Martin as "Dr." in the official student evaluations in MM Table 1 can be attributed to some comments being made before Martin received his Ph.D. Rate My Professors comments for Martin date to November 2014; however, even if the non-reporting of references to Mitchell and Martin as "Dr." in MM Table 2 can be attributed to some comments being made before Martin received his Ph.D., any use of "Professor" for Martin must be discounted because students presumably had more titles with which to refer to Mitchell (e.g., "Dr.", "Professor") than to refer to Martin (e.g., "Professor").

---

Other notes:

---

4.

PS: Political Science & Politics should require authors to upload data and code so that readers can more clearly assess what the authors did.

---

5.

MM Table 4 data appear to have large percentages of enrolled students who did not evaluate Mitchell or Martin. Texas Tech data for Spring 2015 courses here indicate that enrollment for Mitchell's four sections of the course used in the study was 247 (section D6), 247 (section D7), 243 (section D8), and 243 (section D9), and that enrollment for Martin's two sections of the course was 242 (section D10) and 199 (section D11). Mitchell's evaluations had ratings for 167 of the 980 students in her courses, for a 17.0 percent response rate, and Martin's evaluations had ratings for 51 of his 441 students, for an 11.6 percent response rate. It's possible that Mitchell's nearly 50 percent higher response rate did not affect differences in mean ratings between the instructors, but the difference in response rates would have been relevant information for the article to include.
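The response rates above follow from the enrollment figures and the Table 4 Ns; the per-student counts divide the Table 4 "Course" Ns by the seven "Course" items:

```python
# Enrollment figures from the Texas Tech Spring 2015 data
mitchell_enrolled = 247 + 247 + 243 + 243   # sections D6-D9
martin_enrolled = 242 + 199                 # sections D10-D11
# Students with ratings: MM Table 4 "Course" Ns divided by the 7 items
mitchell_students = 1169 // 7               # 167 students
martin_students = 357 // 7                  # 51 students
print(round(100 * mitchell_students / mitchell_enrolled, 1))  # 17.0
print(round(100 * martin_students / martin_enrolled, 1))      # 11.6
```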

---

6.

MM state (p. 652, emphasis in the original):

"To reiterate, of the 23 questions asked, there were none in which a female instructor received a higher rating."

My calculations indicate that Mitchell received a higher rating than Martin did on 3 of the 23 MM Table 4 items: items 17, 21, and 23. Moreover, MM Table 4 indicates that the mean for Mitchell was higher than the mean for Martin across the three Technology items. I think that the "there were none" statement is intended to indicate that Mitchell did not receive a higher rating than Martin did on any of the items for which the corresponding p-value was sufficiently low, but, if that's the case, then that should be stated clearly because the statement can otherwise be misleading.

But I'm curious how MM could have reported a difference in favor of Mitchell if MM were reporting results using one-tailed statistical tests to detect a difference in favor of Martin, as I read the MM Table 4 Technology line to indicate, with a t-score of 1.93 and a p-value of 0.027.

---

7.

MM reports that the study indicated that "a male instructor administering an identical online course as a female instructor receives higher ordinal scores in teaching evaluations, even when questions are not instructor-specific" (p. 648). But that was not always true: as indicated above, MM Table 4 even indicates that the mean for Mitchell was higher than the mean for Martin across the three not-instructor-specific Technology items.

---

8.

The MM appendix (p. 4) indicated that:

Students had a tendency to enroll in the sections with the lowest number initially (merely because those sections appeared first in the registration list). This means that section 1 tended to fill up earlier than section 3 or 4. It may also be likely that students who enroll in courses early are systematically different than those who enroll later in the registration period; for example, they may be seniors, athletes, or simply motivated students. For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10.

The last line should indicate that data were from sections 6 to 11. See the sample sizes for Martin in the Texas Tech website data: item 1 for section D10 has student evaluation sample sizes of 6, 12, 10, 1, and 3, for a total of 32; adding the item 1 sample sizes from section D11 (7, 5, 6, 1, and 0) raises the total to 51; and multiplying 51 by 7 produces 357, which is the sample size for Martin in the "Course" section of MM Table 4.
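The section-total arithmetic can be verified with a short script (a sketch in Python; the per-section sample sizes are the item-1 figures from the Texas Tech website data described above):

```python
# Item-1 student evaluation sample sizes for Martin from the Texas Tech
# website data: five class sections within D10, then five within D11.
d10 = [6, 12, 10, 1, 3]
d11 = [7, 5, 6, 1, 0]

students = sum(d10) + sum(d11)  # 32 + 19 = 51 students

print(sum(d10))       # 32
print(students)       # 51
print(students * 7)   # 357: the Martin sample size in the "Course" section of MM Table 4
```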

---

9.

I think that Blåsjö (2018) interpreted the statement that "For this reason, we examined sections in the mid- to high- numerical order: sections 6, 7, 8, 9, and 10" as if Mitchell and Martin collected data for other sections but did not analyze these data. Blåsjö: "Actually the researchers threw away at least half of the actual data". I think that that is a misreading of the (perhaps unclear) statement quoted above from the MM appendix. From what I can tell based on the data at the Texas Tech site, data were collected for only sections 6 to 11.

---

NOTE:

Thanks to representatives from the Texas Tech IRB and the Illinois State University IRB, respectively, for providing and forwarding the link to the Texas Tech student evaluations.

---

According to the 20 Dec 2018 Samuel Perry and Andrew Whitehead Huffington Post article "What 'Make America Great Again' And 'Merry Christmas' Have In Common":

Christian theology, identity or faithfulness have nothing to do with an insistence on saying "Merry Christmas." To be more precise, when we analyzed public polling data, we found that there was no correlation between being an evangelical Christian, believing in the biblical Nativity story, attending church, or participating in charitable giving and rejecting "Season's Greetings" for "Merry Christmas." [emphasis added]

The referenced data are from a December 2013 Public Religion Research Institute (PRRI) survey. Item Q5 is the "Merry Christmas" item:

Do you think stores and businesses should greet their customers with 'Happy Holidays' or 'Seasons Greetings' instead of 'Merry Christmas' out of respect for people of different faiths, or not? (Q5)

Item Q6 is the biblical Nativity belief item:

Do you believe the story of Christmas -- that is, the Virgin birth, the angelic proclamation to the Shepherds, the Star of Bethlehem, and the Wise Men from the East -- is historically accurate, or is it a theological story to affirm faith in Jesus? (Q6)

Here is the crosstab for the "Merry Christmas" item and the Nativity item:

[Image: PRRI-1]

Contra the article, these variables are correlated: ignoring the don't knows and refusals, 57 percent of participants who believe that the gospel Nativity story is historically accurate preferred the "Merry Christmas" response ("No, should not"), but only 41 percent of participants who believe that the gospel Nativity story is a theological story preferred the "Merry Christmas" response.
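As a sketch of the comparison, the two percentages are simply the within-row shares of the crosstab; the cell counts below are hypothetical and were chosen only to reproduce the reported 57 percent and 41 percent:

```python
# Hypothetical crosstab cell counts (illustration only; the real counts are
# in the PRRI crosstab). Keys: preference for the "Merry Christmas" response.
accurate = {"merry": 285, "not_merry": 215}      # Nativity "historically accurate"
theological = {"merry": 205, "not_merry": 295}   # Nativity "theological story"

def pct_merry(group):
    """Percent within the group preferring the 'Merry Christmas' response."""
    return 100 * group["merry"] / (group["merry"] + group["not_merry"])

print(pct_merry(accurate))     # 57.0
print(pct_merry(theological))  # 41.0
```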

Here is a logit regression using the gospel Nativity responses (gospel) to predict the Merry Christmas responses (merry), removing from the analysis the participants who were coded as don't know or refusal for at least one of the items:

[Image: PRRI-2]

The p-value for the logit regression is also p<0.001 in weighted analyses.
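For a logit model with a single binary predictor, the coefficient is the log of the odds ratio implied by the two crosstab percentages. As a rough check on the regression (a sketch that uses the rounded 57 percent and 41 percent, so it will not match the fitted coefficient exactly):

```python
import math

# Probability of preferring "Merry Christmas" in each Nativity-belief group,
# using the rounded percentages reported above.
p_accurate = 0.57
p_theological = 0.41

# Odds ratio and the implied logit coefficient (log odds ratio).
odds_ratio = (p_accurate / (1 - p_accurate)) / (p_theological / (1 - p_theological))
coef = math.log(odds_ratio)

print(round(odds_ratio, 2))  # 1.91
print(round(coef, 2))        # 0.65
```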

The gospel predictor still has a p-value under p=0.05 when including the demographic controls below in unweighted analyses and in weighted analyses:

[Image: PRRI-3]

The gospel predictor still has a p-value under p=0.05 when including the demographic controls and controls for GOP partisanship and self-reported ideology in unweighted analyses:

[Image: PRRI-4]

There are specifications in which the p-value for the gospel predictor is above p=0.05, such as a weighted analysis that includes the above controls for demographics, partisanship, and ideology. But the gospel predictor's lack of robustness across every possible specification, especially specifications that control for factors such as GOP partisanship and charitable giving that are plausibly influenced by religious belief, is not the impression that I received from "...we found that there was no correlation between...believing in the biblical Nativity story...and rejecting 'Season's Greetings' for 'Merry Christmas'".

---

Here is another passage from the article:

What does this tell us? Ultimately, drawing lines in the sand over whether people say "Merry Christmas" over "Happy Holidays" has virtually nothing to do with Christian faithfulness or orthodoxy.  It has everything to do with the cultural and political insecurity white conservatives feel.

I didn't see anything in the reported analysis that permits the inference that "It has everything to do with the cultural and political insecurity white conservatives feel". The fact that Whites and conservatives are more likely than non-Whites and non-conservatives to prefer "Merry Christmas" does not establish that this preference is due to "the cultural and political insecurity white conservatives feel", any more than a non-White or non-conservative preference for "Happy Holidays" and "Seasons Greetings" can, without additional information, be attributed to cultural and political insecurity that non-White non-conservatives feel.

---

NOTES:

1. Code here. Data here. Data acknowledgment: PRRI Religion & Politics Tracking Poll, December 2013; Principal Investigators Robert P. Jones and Daniel Cox; Data were downloaded from the Association of Religion Data Archives, www.TheARDA.com [http://www.thearda.com/Archive/Files/Descriptions/PRRIRP1213.asp].

2. I had a Twitter discussion of the article and the data with co-author Samuel Perry, which can be accessed here.

---

The Kearns et al. study "Why Do Some Terrorist Attacks Receive More Media Attention Than Others?" has been published in Justice Quarterly; the abstract indicates that "Controlling for target type, fatalities, and being arrested, attacks by Muslim perpetrators received, on average, 357% more coverage than other attacks". A prior Kearns et al. analysis was reported in a 2017 Monkey Cage post and in a paper posted at SSRN with a "last edited" date of 3/5/17; that prior analysis was limited to "media coverage for terrorist attacks in the United States between 2011 and 2015" (p. 7 of the paper).
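For context on the "357% more coverage" figure: in a count model such as a negative binomial regression, a percent difference like this is typically computed as 100 * (exp(b) - 1), where b is the coefficient. Assuming that is how the figure was derived (an assumption, since the abstract does not say), the conversion looks like this:

```python
import math

# Convert a reported "percent more coverage" figure back to the implied
# incidence-rate ratio (IRR) and negative binomial coefficient, assuming
# the figure was computed as 100 * (exp(b) - 1).
pct_more = 357.0
irr = 1 + pct_more / 100  # implied IRR
coef = math.log(irr)      # implied coefficient b

print(round(irr, 2))   # 4.57
print(round(coef, 2))  # 1.52
```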

The data for the Kearns et al. study published in Justice Quarterly have been expanded to cover terrorist attacks from 2006 to 2015 (instead of 2011 to 2015), and the study now reports a model with a predictor for "Perpetrator and group unknown", with a p-value under 0.05 for the Muslim perpetrator predictor. Footnote 9 of Kearns et al. 2019 discusses the selection of 2006 as the starting point:

Starting in 2006, an increasing percentage of Americans used the Internet as their main source of news [URL provided, but omitted in this quote]. Since the news sources used for this study include both print and online newspaper articles, we started our analysis in 2006. In years prior to 2006, we may see fewer articles overall since print was more common and is subject to space constraints (p. 8).

That reason to start the analysis in 2006 does not explain why the analysis in the Monkey Cage post and the 3/5/17 paper started in 2011, given that the news sources in these earlier reports of the study also included both print and online articles.

In this 3/28/17 post, I reported that the Muslim perpetrator predictor had a 0.622 p-value in my analysis predicting the number of articles of media coverage using the Kearns et al. 2011-2015 outcome variable coding, controlling for the number of persons killed in the attack and for whether the perpetrator was unknown.

Using the 2006-2015 dataset and code that Dr. Kearns sent me upon request, I ran my three-predictor model, limiting the analysis to events from 2011 to 2015:

[Image: Kearns1]

The above p-value for the Muslim perpetrator predictor differs from my 0.622 p-value in the prior post, although the inferences are the same. There might be multiple reasons for the difference; one is that the 3/5/17 Kearns et al. paper reports a different number of articles for some events: for example, the Robert Dear event was coded as 204 articles in the paper and as 178 articles in the 2019 article, and the number of articles for the Syed Rizwan Farook / Tashfeen Malik event dropped from 179 to 152.

---

The inference about the Muslim perpetrator predictor is more convincing using the 2006-2015 data from Kearns et al. 2019 than from the 2011-2015 data: the 2006-2015 data produce a 2.82 Muslim perpetrator predictor t-score using my three-predictor model above and a 4.20 t-score with a three-predictor model replacing the number killed in the event with a predictor for whether someone was killed in the event.

For what it's worth, along with more news coverage for events with Muslim perpetrators, the Kearns et al. data indicate that, compared to other events with a known perpetrator, events with Muslim perpetrators also have higher mean numbers of deaths, higher mean logged numbers of wounded, and (though only at p=0.0766) a higher likelihood of at least one death:

[Images: Kearns2, Kearns3, Kearns4]

---

NOTES

1. I could not find the 3/5/17 Kearns et al. paper online as of this writing, but I have a PDF copy from SSRN (SSRN-id2928138.pdf), which is the paper that the above post references.

2. Stata code for my analyses:

* Flag events listed with an "Unknown" perpetrator in the Kearns et al. list
gen PerpUnknown=0
replace PerpUnknown=1 if eventid==200601170007
replace PerpUnknown=1 if eventid==200606300004
replace PerpUnknown=1 if eventid==200607120007
replace PerpUnknown=1 if eventid==200705090002
replace PerpUnknown=1 if eventid==200706240004
replace PerpUnknown=1 if eventid==200710200003
replace PerpUnknown=1 if eventid==200710260003
replace PerpUnknown=1 if eventid==200802170007
replace PerpUnknown=1 if eventid==200803020012
replace PerpUnknown=1 if eventid==200803060004
replace PerpUnknown=1 if eventid==200804070005
replace PerpUnknown=1 if eventid==200804220011
replace PerpUnknown=1 if eventid==200806140008
replace PerpUnknown=1 if eventid==200807250030
replace PerpUnknown=1 if eventid==200903070010
replace PerpUnknown=1 if eventid==200909040003
replace PerpUnknown=1 if eventid==201007270013
replace PerpUnknown=1 if eventid==201011160004
replace PerpUnknown=1 if eventid==201101060018
replace PerpUnknown=1 if eventid==201102220009
replace PerpUnknown=1 if eventid==201104230010
replace PerpUnknown=1 if eventid==201105060004
replace PerpUnknown=1 if eventid==201109260012
replace PerpUnknown=1 if eventid==201110120003
replace PerpUnknown=1 if eventid==201205200024
replace PerpUnknown=1 if eventid==201205230034
replace PerpUnknown=1 if eventid==201208120012
replace PerpUnknown=1 if eventid==201301170006
replace PerpUnknown=1 if eventid==201302260036
replace PerpUnknown=1 if eventid==201304160051
replace PerpUnknown=1 if eventid==201304170041
replace PerpUnknown=1 if eventid==201304180010
replace PerpUnknown=1 if eventid==201307250065
replace PerpUnknown=1 if eventid==201308220053
replace PerpUnknown=1 if eventid==201403180089
replace PerpUnknown=1 if eventid==201403250090
replace PerpUnknown=1 if eventid==201406110089
replace PerpUnknown=1 if eventid==201410030065
replace PerpUnknown=1 if eventid==201410240071
replace PerpUnknown=1 if eventid==201411040087
replace PerpUnknown=1 if eventid==201502170127
replace PerpUnknown=1 if eventid==201502230104
replace PerpUnknown=1 if eventid==201503100045
replace PerpUnknown=1 if eventid==201506220069
replace PerpUnknown=1 if eventid==201506230056
replace PerpUnknown=1 if eventid==201506240051
replace PerpUnknown=1 if eventid==201506260046
replace PerpUnknown=1 if eventid==201507150077
replace PerpUnknown=1 if eventid==201507190097
replace PerpUnknown=1 if eventid==201508010105
replace PerpUnknown=1 if eventid==201508020114
replace PerpUnknown=1 if eventid==201508190040
replace PerpUnknown=1 if eventid==201509040048
replace PerpUnknown=1 if eventid==201509300082
replace PerpUnknown=1 if eventid==201512260016
* Check the PerpUnknown coding against related variables
tab PerpUnknown, mi
tab PerpUnknown PerpMuslim, mi
tab PerpUnknown PerpNonMuslim, mi
tab PerpUnknown PerpGroupUnknown, mi

* Negative binomial models of article counts: 2011-2015 events only, then all events
nbreg TOTALARTICLES PerpMuslim numkilled PerpUnknown if eventid>=201101060018
nbreg TOTALARTICLES PerpMuslim numkilled PerpUnknown
* Indicator coded 1 for events in which no one was killed
gen kill0=0
replace kill0=1 if numkilled==0
tab numkilled kill0

* Model replacing the number killed with the no-deaths indicator
nbreg TOTALARTICLES PerpMuslim kill0     PerpUnknown
* Compare mean deaths, mean logged wounded, and the proportion of no-death
* events by Muslim perpetrator status, excluding and then including
* unknown-perpetrator events
ttest numkilled if PerpUnknown==0, by(PerpMuslim)
ttest numkilled                  , by(PerpMuslim)
ttest logwound  if PerpUnknown==0, by(PerpMuslim)
ttest logwound                   , by(PerpMuslim)
prtest kill0    if PerpUnknown==0, by(PerpMuslim)
prtest kill0                     , by(PerpMuslim)

3. Kearns et al. 2019 used a different "unknown" perpetrator measure than I did. My PerpUnknown predictor (in the above analysis and in the prior post) is a dichotomous variable coded 1 for any perpetrator listed as "Unknown" in the Kearns et al. list. Kearns et al. 2019 has a dichotomous PerpGroupUnknown variable that differentiates between cases in which the group of the perpetrator was known (such as the case with ID 200807250030 in the Global Terrorism Database, in which the perpetrators were identified as Neo-Nazis) and cases in which the group of the perpetrator was unknown (such as the case with ID 200806140008 in the Global Terrorism Database, in which no perpetrator group was identified). Kearns et al. 2019 footnote 17 indicates that "Even when the individual perpetrator is unknown, we often know the group responsible so 'perpetrator unknown' is not a theoretically sound category on its own, though we account for these incidents in robustness checks". However, I'm not sure why "perpetrator unknown" is not a theoretically sound category for use as a control when predicting media coverage: if a perpetrator's name is not known, then there might be fewer news articles, because there can be no follow-up articles that delve into the perpetrator's background in the way that would be possible if the name were known.
