1.

Researchers reporting results from an experiment often report estimates of the treatment effect at particular levels of a predictor. For example, a panel of Figure 2 in Barnes et al. 2018 plotted, over a range of hostile sexism, the estimated difference in the probability of reporting being very unlikely to vote for a female representative target involved in a sex scandal relative to the probability of reporting being very unlikely to vote for a male representative target involved in a sex scandal. For another example, Chudy 2020 plotted, over a range of racial sympathy, estimated punishments for a Black culprit target and a White culprit target. Both of these plots report estimates derived from a regression. However, as indicated in Hainmueller et al. 2020, regression can nontrivially misestimate a treatment effect at particular levels of a predictor.

This post presents another example of this phenomenon, based on data from the experiment in Costa et al. 2020 "How partisanship and sexism influence voters' reactions to political #MeToo scandals" (link to a correction to Costa et al. 2020).

---

2.

The Costa et al. 2020 experiment had a control condition, two treatment conditions, and multiple outcome variables, but my illustration will focus on only two conditions and only one outcome variable. Participants were asked to respond to four items measuring participant sexism and to rate a target male senator on a 0-to-10 scale. Participants who were then randomized to the "sexual assault" condition were provided a news story indicating that the senator had been accused of groping two women without consent. Participants who were instead randomized to the control condition were provided a news story about the senator visiting a county fair. The outcome variable of interest for this illustration is the percent change in the favorability of the senator, from the pretest to the posttest.

Estimates in the left panel of Figure 1 are based on a linear regression predicting the outcome variable of interest, using predictors of a pretest measure of participant sexism from 0 for low sexism to 16 for high sexism, a dichotomous variable coded 1 for participants in the sexual assault condition and 0 for participants in the control, and an interaction of these predictors. The panel plots the point estimates and 95% confidence intervals for the estimated difference in the outcome variable, between the control condition and the sexual assault condition, at each observed level of the participant sexism index.
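A minimal sketch in R of how regression-based estimates of this type can be computed, using simulated data and not the Costa et al. 2020 data; the variable names `sexism`, `treat`, and `y` and all simulated values are hypothetical. Under this specification, the estimated treatment effect at sexism level x is the treatment coefficient plus x times the interaction coefficient, and its standard error follows from the coefficient covariance matrix:

```r
# Simulated stand-in data; NOT the Costa et al. 2020 data.
set.seed(1)
n <- 2000
sexism <- sample(0:16, n, replace = TRUE)   # hypothetical 0-to-16 index
treat  <- rbinom(n, 1, 0.5)                 # 1 = treatment, 0 = control
y <- 5 + treat * (-50 + 3 * sexism) + rnorm(n, sd = 20)

# Linear regression with the two predictors and their interaction
fit <- lm(y ~ sexism * treat)
b <- coef(fit)
V <- vcov(fit)

# Estimated treatment effect at each level x: b[treat] + x * b[sexism:treat]
x   <- 0:16
est <- unname(b["treat"]) + x * unname(b["sexism:treat"])

# Variance of a linear combination of the two coefficients
se <- sqrt(V["treat", "treat"] +
           x^2 * V["sexism:treat", "sexism:treat"] +
           2 * x * V["treat", "sexism:treat"])

crit <- qt(0.975, df = fit$df.residual)
effects <- data.frame(sexism = x, estimate = est,
                      lower = est - crit * se, upper = est + crit * se)
print(round(effects, 1))
```

Every row of `effects` draws on all cases in the data, which is why an estimate at one level of the index can be driven by participants at other levels.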

The leftmost point indicates that the "least sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable that was about 52 units less than the "least sexist" participants in the control condition; the "least sexist" participants in the control were estimated to have increased their rating of the senator by 4.6 percent, and the "least sexist" participants in the sexual assault condition were estimated to have reduced their rating of the senator by 47.6 percent.

The rightmost point of the plot indicates that the "most sexist" participants in the sexual assault condition were estimated to have a value of the outcome variable that differed by about 0 units from that of the "most sexist" participants in the control condition; the "most sexist" participants in the control were estimated to have increased their rating of the senator by 1.7 percent, and the "most sexist" participants in the sexual assault condition were estimated to have increased their rating of the senator by 2.1 percent. Based on this rightmost point, a reader could conclude about the sexual assault allegations, as Costa et al. 2020 suggested, that:

...the most sexist subjects react about the same way to sexual assault and sexist jokes allegations as they do to the control news story about the legislator attending a county fair.

However, the numbers at the inside bottom of the Figure 1 panels indicate the sample size at that level of the sexism index, across the control condition and the sexual assault condition. These numbers indicate that the regression-based estimate for the "most sexist" participants was nontrivially based on the behavior of other participants.

Estimates in the right panel of Figure 1 are instead based on t-tests conducted for participants at only the indicated level of the sexism index. As in the left panel, the estimate for the "least sexist" participants falls between -50 and -60, and, for the next few higher observed values of the sexism index, estimates tend to rise and/or tend to get closer to zero. But the tendency does not persist above the midpoint of the sexism index. Moreover, the point estimates in the right panel for the three highest values of the sexism index do not fall within the corresponding 95% confidence intervals in the left panel.

The p-value fell below p=0.05 for the 28 participants at 15 or 16 on the sexism index, with a point estimate of -22. The sample size was 1,888 across these two conditions, so participants at 15 or 16 on the sexism index represent the top 1.5% of participants on the sexism index across these two conditions. Therefore, the sexual assault treatment appears to have had an effect on these "very sexist" participants.
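The contrast between the two panels can be sketched in R with simulated data. This is only an illustration under stated assumptions and not a reanalysis of Costa et al. 2020: the effect function, the sample sizes per level, the noise level, and all variable names are hypothetical. The simulated treatment effect levels off at -10 above the midpoint of a 0-to-16 index, and there are fewer cases at high levels of the index:

```r
# Simulated data; NOT the Costa et al. 2020 data.
set.seed(2020)
true_effect <- function(x) ifelse(x <= 8, -50 + 5 * x, -10)

levels_x  <- 0:16
n_per_arm <- pmax(20, 300 - 18 * levels_x)  # fewer cases at high levels

dat <- do.call(rbind, lapply(seq_along(levels_x), function(i) {
  x <- levels_x[i]
  n <- n_per_arm[i]
  treat <- rep(c(0, 1), each = n)
  data.frame(x = x, treat = treat,
             y = treat * true_effect(x) + rnorm(2 * n, sd = 5))
}))

# Approach 1: linear regression with an interaction, evaluated at x = 16
fit <- lm(y ~ x * treat, data = dat)
reg_at_16 <- unname(coef(fit)["treat"] + 16 * coef(fit)["x:treat"])

# Approach 2: t-test using only the cases at x = 16
top <- dat[dat$x == 16, ]
tt <- t.test(top$y[top$treat == 1], top$y[top$treat == 0])
local_at_16 <- unname(tt$estimate[1] - tt$estimate[2])

cat("True effect at 16:        ", true_effect(16), "\n")
cat("Regression estimate at 16:", round(reg_at_16, 1), "\n")
cat("Local estimate at 16:     ", round(local_at_16, 1), "\n")
```

In this setup the linear fit is dominated by the many cases at low levels of the index, so the regression estimate at 16 is pulled away from the simulated effect, and the t-test limited to cases at 16 is noisier but is not influenced by cases at other levels.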

---

3.

Regression can reveal patterns in data. For example, linear regression estimates correctly indicated that, in the Costa et al. 2020 experiment, the effect of the sexual assault treatment relative to the control was closer to zero for participants at higher levels of a sexism index than for participants at lower levels of the sexism index. However, as indicated in the illustration above, regression can produce misestimates of an effect at particular levels of a predictor. Therefore, inferences about an estimated effect at a particular level of a predictor should be based only on cases at or around that level of the predictor and should not be influenced by other cases.

---

NOTES

1. Costa et al. 2020 data.

2. Stata code for the analysis.

3. R code for the plot. CSV file for the R plot.

4. The interflex R package (Hainmueller et al. 2020) produced the plot below, using six bins. The leveling off at higher values of the sexism index also appears in this interflex plot:

R code to add to the corrected Costa et al. 2020 code:

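library(interflex)
# Rescale the pretest sexism measure to run from 0 to 16 (this assumes that pre_sexism runs from 1 to 5)
dat$sexism16 <- (dat$pre_sexism-1)*4
summary(dat$sexism16)
# Binning estimates with six bins, with the control condition as the base category
p1 <- inter.binning(data=dat, Y="perchange_vote", D="condition2", X="sexism16", nbins=6, base="Control")
plot(p1)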


In a Monkey Cage post and Chapter 6 of their Ignored Racism book, Mark D. Ramirez and David A.M. Peterson reported on a conjoint experiment, in which White adult U.S. citizens were given a profile of two target persons and were asked "Which of these citizens do you prefer to keep registered to vote?". The experiment manipulated profile target characteristics such as race, gender, and criminal status.

Latina/o racism-ethnicism (LRE) was measured with responses to four "modern racism"-type items, such as "Many other ethnic groups have successfully integrated into American culture. Latinos and Hispanics should do the same without any special favors".

Results in Figure 6.7 indicated that high LRE participants favored White targets over Hispanic targets. But Figure 6.7 results also indicated that low LRE participants favored Hispanic targets over White targets. This experiment thus provided further evidence that a nontrivial percentage of participants at low levels of modern racism / modern sexism items have racial bias and/or gender bias. Here is a prior post on a study indicating that persons at low levels of hostile sexism discriminated against men.


In "Gendered Nationalism and the 2016 US Presidential Election: How Party, Class, and Beliefs about Masculinity Shaped Voting Behavior" (Politics & Gender 2019), Melissa Deckman and Erin Cassese reported a Table 2 model that had a sample size of 750 and a predictor for college degree that had a logit coefficient of -0.57 and a standard error of 0.28, so the associated t-statistic is -0.57/0.28, or about -2.0, which produces a p-value of about 0.05.
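The arithmetic for that test statistic can be sketched in R. The coefficient and standard error are the values reported above; the use of a normal reference distribution for the two-sided p-value is my assumption, so the exact p-value from the authors' software might differ slightly:

```r
# Values reported above for the college degree predictor
coef_college <- -0.57
se_college   <- 0.28

# Test statistic: coefficient divided by its standard error
z <- coef_college / se_college

# Two-sided p-value under a normal approximation (an assumption)
p <- 2 * pnorm(-abs(z))

cat("z =", round(z, 2), " p =", round(p, 3), "\n")
```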

The college degree coefficient fell to -0.27 when a "gendered nationalism" predictor was added to the model, and Deckman and Cassese 2019 indicated (pp. 17-18) that:

A post hoc Wald test comparing the size of the coefficients between the two models suggests that the coefficient for college was significantly reduced by the inclusion of the mediator [F(1,678) = 7.25; p < .0072]...

From what I can tell, this means that there is stronger evidence for the -0.57 coefficient differing from the -0.27 coefficient (p<0.0072) than for the -0.57 coefficient differing from zero (p≈0.05).

This type of odd result has been noticed before.

---

For more explanation, below are commands that can be posted into Stata to produce a similar result:

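clear all
set seed 123
set obs 500
* Simulate an outcome and two predictors that are each built partly from the outcome
gen Y = runiform(0,10)
gen X1 = 0.01*(Y + runiform(0,10)^2)
gen X2 = 0.01*(Y + 2*runiform(0,10))
reg Y X1
* Constant weights of 1, to set up a survey design for the svy prefix
egen weight = fill(1 1 1 1 1)
svyset [pw=weight]
svy: reg Y X1
estimates store X1alone
svy: reg Y X1 X2
estimates store X1paired
* Combine the two models so that coefficients can be compared across models
suest X1alone X1paired
* Test of the X1alone coefficient against zero
lincom _b[X1alone:X1] - 0
di _b[X1paired:X1]
* Test of the X1alone coefficient against the fixed number 0.4910762
lincom _b[X1alone:X1] - 0.4910762
* Test of the X1alone coefficient against the estimated X1paired coefficient
lincom _b[X1alone:X1] - _b[X1paired:X1]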

The X1 coefficient is 0.8481948 in the "reg Y X1" model and is 0.4910762 in the "reg Y X1 X2" model. Results for the "lincom _b[X1alone:X1] - _b[X1paired:X1]" command indicate that the p-value is 0.040 for the test that the 0.8481948 coefficient differs from the 0.4910762 coefficient. But results for the "lincom _b[X1alone:X1] - 0.4910762" command indicate that the p-value is 0.383 for the test that the 0.8481948 coefficient differs from the number 0.4910762.

So, from what I can tell, there is stronger evidence that the 0.8481948 X1 coefficient differs from an imprecisely estimated coefficient that has the value of 0.4910762 than from the value of 0.4910762.

---

As indicated in the link above, this odd result appears attributable to the variance sum law:

Variance(X-Y) = Variance(X) + Variance(Y) - 2*Covariance(X,Y)

For the test of whether the 0.8481948 X1 coefficient differs from the 0.4910762 X1 coefficient, the formula is:

Variance(X-Y) = Variance(X) + Variance(Y) - 2*Covariance(X,Y)

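But for the test of whether a coefficient differs from a fixed number, such as the test of whether the 0.8481948 coefficient differs from the number 0.4910762 or the test of whether the -0.57 coefficient differs from zero, the fixed number has no variance and no covariance with the coefficient, so the formula reduces to: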

Variance(X-Y) = Variance(X) + 0 - 0

For the simulated data, subtracting 2*Covariance(X,Y) reduces Variance(X-Y) more than adding the Variance(Y) increases Variance(X-Y), which explains how the p-value can be lower for comparing the two coefficients to each other than for comparing one coefficient to the value of the other coefficient.

See the code below:

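suest X1alone X1paired
* Variances and covariance of the two X1 coefficients
matrix list e(V)
* Test statistic against the fixed number: only Variance(X) is in the denominator
di (.8481948-.4910762)/sqrt(.16695974)
* Test statistic against the estimated coefficient: variance sum law in the denominator
di (.8481948-.4910762)/sqrt(.16695974+.14457114-2*.14071065)
test _b[X1alone:X1] = _b[X1paired:X1]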

Stata output here.
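The same arithmetic can be sketched in R, which makes it easier to see each piece of the variance sum law. The coefficients, variances, and covariance are the values from the Stata commands above; the p-values use a normal approximation, which is my assumption, so they might differ slightly in later decimals from the Stata output:

```r
# Values taken from the Stata commands above
b_alone    <- 0.8481948    # X1 coefficient in the model without X2
b_paired   <- 0.4910762    # X1 coefficient in the model with X2
var_alone  <- 0.16695974   # Variance(X)
var_paired <- 0.14457114   # Variance(Y)
cov_ab     <- 0.14071065   # Covariance(X,Y)

gap <- b_alone - b_paired

# Test against the fixed number 0.4910762: only Variance(X) matters
z_const <- gap / sqrt(var_alone)

# Test against the estimated coefficient: variance sum law
var_diff <- var_alone + var_paired - 2 * cov_ab
z_pair   <- gap / sqrt(var_diff)

# Two-sided p-values under a normal approximation (an assumption)
p_const <- 2 * pnorm(-abs(z_const))
p_pair  <- 2 * pnorm(-abs(z_pair))

cat("Variance of the difference:", var_diff, "\n")
cat("p, fixed number:           ", round(p_const, 3), "\n")
cat("p, estimated coefficient:  ", round(p_pair, 3), "\n")
```

Because twice the covariance (about 0.281) is larger than Variance(Y) (about 0.145), the variance of the difference is smaller than Variance(X) alone, and the test statistic is therefore larger for the comparison with the estimated coefficient.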


The 2018 CCES (Cooperative Congressional Election Survey) included an item asking for attitudes about the statement: "White people in the U.S. have certain advantages because of the color of their skin". Schaffner 2020 ("The Heightened Importance of Racism and Sexism in the 2018 U.S. Midterm Elections") used this item in a "denial of racism" measure, which Schaffner 2020 in the title and elsewhere reduced to "racism". The included items permitted Schaffner 2020 to note that higher values of the "denial of racism" measure associate with voting for Republican candidates for president and the House (e.g., in Figure 2).

The 2018 CCES did not include a parallel item about whether White people in the United States have certain disadvantages, but the 2016 American National Election Studies Time Series Study has a set of items that permits comparison of denial of discrimination against certain groups. Here are results for the racial groups asked about, from the web sample with weights applied. Non-responses are included in the percentages, and error bars indicate ends of 95% confidence intervals:

Here are the above data, disaggregated by racial groups:

Here are data for Whites, with point estimates indicating responses by partisanship:

So these data indicate that a higher percentage of White Republicans than of White Democrats deny that there is discrimination against Blacks, Hispanics, and Asians. But these data also indicate that a higher percentage of White Democrats than of White Republicans deny that there is discrimination against Whites.

Let's check the ANES 2016 Time Series Study data to see how well each "denial of discrimination" measure predicts two-party vote choice in the 2016 U.S. presidential election, using the full sample (not only Whites):

So these data indicate that denial of discrimination against Whites was at least as good a predictor of 2016 U.S. presidential election two-party vote choice as denial of discrimination against Blacks was, and was a better predictor than denial of discrimination against Hispanics and denial of discrimination against Asians.

For results below, the sample is limited to Whites:

---

I think that results are more informative when denial of discrimination is measured for more than one racial group, especially given evidence that Republicans and Democrats advantage different racial groups. I think it's worth considering why the persons who decide which items to include on the CCES didn't include a parallel item about whether White people in the United States have certain disadvantages.

---

NOTES

1. The other 2018 CCES item that Schaffner 2020 used for the "denial of racism" measure is: "Racial problems in the U.S. are rare, isolated situations". DeSante and Smith 2017 referred to this item as a measure of "acknowledgment of institutional racism", but this item does not refer to institutions and uses "racial problems" instead of "racism". These seem like suboptimal choices for trying to measure "acknowledgment of institutional racism".

2. ANES 2016 citations:

The American National Election Studies (ANES). 2016. ANES 2012 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-05-17. https://doi.org/10.3886/ICPSR35157.v1.

ANES. 2017. "User's Guide and Codebook for the ANES 2016 Time Series Study". Ann Arbor, MI, and Palo Alto, CA: The University of Michigan and Stanford University.

3. CCES 2018 citation:

Stephen Ansolabehere, Brian F. Schaffner, and Sam Luks. Cooperative Congressional Election Study, 2018: Common Content. [Computer File] Release 2: August 28, 2019. Cambridge, MA: Harvard University [producer] http://cces.gov.harvard.edu.

4. Code for the denial of discrimination analyses.


Let's define "isolated negative feeling" as rating one target group under 50 but rating all other included target groups at 50 or above, on 0-to-100 feeling thermometers. Target groups in the plot below were Whites, Blacks, Hispanics, and Asians. Data are from the web sample of the 2016 American National Election Studies Time Series Study, with weights applied and limited to participants who provided a numeric rating for all four target groups. Error bars indicate ends of 95% confidence intervals:
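The definition can be sketched in R as follows. The data frame, its column names, and the ratings are hypothetical and are not the ANES data; the sketch also assumes that cases with a missing rating have already been dropped, and it omits the survey weights:

```r
# Hypothetical thermometer ratings; column names are illustrative only
ratings <- data.frame(
  white    = c(70, 40, 50, 30, 85),
  black    = c(60, 55, 50, 20, 45),
  hispanic = c(65, 50, 49, 60, 45),
  asian    = c(80, 90, 50, 60, 90)
)

# TRUE if the target group is rated under 50 and
# every other included group is rated at 50 or above
isolated_negative <- function(df, target) {
  others <- setdiff(names(df), target)
  unname(df[[target]] < 50 & rowSums(df[others] < 50) == 0)
}

# One column of indicators per target group
flags <- sapply(names(ratings), function(g) isolated_negative(ratings, g))
print(flags)
```

In these made-up ratings, the fourth and fifth respondents rate two groups under 50, so neither respondent counts as having an isolated negative feeling about any group.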

---

NOTES

1. ANES 2016 citations:

The American National Election Studies (ANES). 2016. ANES 2012 Time Series Study. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2016-05-17. https://doi.org/10.3886/ICPSR35157.v1.

ANES. 2017. "User's Guide and Codebook for the ANES 2016 Time Series Study". Ann Arbor, MI, and Palo Alto, CA: The University of Michigan and Stanford University.

2. Code.


PS Political Science & Politics recently published Liu et al. 2020 "The Gender Citation Gap in Undergraduate Student Research: Evidence from the Political Science Classroom". The authors use their study to discuss methods to address gender bias in citations among students:

To the extent that women, in fact, are underrepresented in undergraduate student research, the question becomes: What do we, as a discipline, do about this?...

However, Liu et al. 2020 do not establish that women authors were unfairly underrepresented in student research, because Liu et al. 2020 did not compare citation patterns to a benchmark of the percentage of women that should be cited in the absence of gender bias.

PS Political Science & Politics has a relevant article for benchmarking: Teele and Thelen 2017, in which Table 1 reports the percentage of authors who are women for research articles published from 2000 to 2015 in ten top political science journals. Based on that table, about 26.3% of authors were women.

The Liu et al. 2020 student sample had 75 male students and 65 female students, with male students citing 21.2% women authors and female students citing 33.1% women authors, so the percentage of women cited by the students overall was about 26.7% when weighted by student gender, which is remarkably close to the 26.3% benchmark.
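The weighted percentage can be checked with a short R sketch. The counts and percentages are the values stated above; weighting by the number of students treats each student as contributing equally, which would differ from a citation-level percentage if male students and female students cited different numbers of works:

```r
# Values stated above for the Liu et al. 2020 student sample
n_male     <- 75
n_female   <- 65
pct_male   <- 21.2   # percent women authors cited by male students
pct_female <- 33.1   # percent women authors cited by female students

# Percentage of women cited overall, weighted by the number of students
overall <- weighted.mean(c(pct_male, pct_female), w = c(n_male, n_female))

# Benchmark from Teele and Thelen 2017, as described above
benchmark <- 26.3

cat("Overall:", round(overall, 1), " Benchmark:", benchmark, "\n")
```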

There might be sufficient evidence to claim that the 95% confidence interval for male students does not contain the proper benchmark, and the same might be true for female students, but the 26.3% benchmark from Teele and Thelen 2017 might not be the correct benchmark: for example, maybe students wrote more on topics for which women have published relatively more, or maybe students drew from publications from before 2000 (during which women were a smaller percentage of political scientists than from 2000 to 2015). But the correct benchmark for inferring that women authors were unfairly underrepresented should have been addressed before PS published the final paragraph of Liu et al. 2020, with recommendations about how to address women's under-representation in undergraduate student research.


Back in 2016, SocImages tweeted a link to a post entitled "Trump Supporters Substantially More Racist Than Other Republicans". The "more racist" label refers to Trump supporters being more likely than Cruz supporters and Kasich supporters to indicate on stereotype scales that Blacks "in general" are less intelligent, more lazy, more rude, more violent, and more criminal than Whites "in general". I had a brief Twitter discussion with Philip Cohen and offered to move the discussion to a blog post. Moreover, I collected some relevant data, which are reported on in a new publication in Political Studies Review.

---

In 2017, Turkheimer, Harden, and Nisbett in Vox estimated the Black/White IQ gap to be closer to 10 points than to 15 points. Ten points would be a relatively large gap, about 2/3 of a standard deviation. Suppose that a person reads this Vox article and reads the IQ literature and, as a result, comes to believe that IQ is a valid enough measure of intelligence for it to be likely that the Black/White IQ gap reflects a true difference in mean intelligence. This person later responds to a survey, rating Whites in general one unit higher on a stereotype scale for intelligence than the person rates Blacks in general. My question, for anyone who thinks that such stereotype scale responses can be used as a measure of anti-Black animus, is:

Why is it racist for this person to rate Whites in general one unit higher than Blacks in general on a stereotype scale for intelligence?

I am especially interested in a response that is general enough to indicate whether it would be sexist against men to rate men in general higher than women in general on a stereotype scale for criminality.
