The plot below is based on data from the ANES 2022 Pilot Study, plotting the percentage of particular populations that rated the in-general intelligence of Whites higher than the in-general intelligence of Blacks (black dots) and the percentage of these populations that rated the in-general intelligence of Asians higher than the in-general intelligence of Whites (white dots). For the item wording, see the notes below or page 44 of the questionnaire.

My understanding is that, based on a straightforward / naïve interpretation of educational data such as NAEP scores as good-enough measures of intelligence [*], there isn't much reason to be in the white-dot group but not the black-dot group, or vice versa. But, nonetheless, there is a gap between the dots in the overall population and in certain populations.

In the plot above, estimated percentages are similar among very conservative Whites and among U.S. residents who attributed to biological differences at least some of the Black-American/Hispanic-American-vs-White-American difference in outcomes in things such as jobs and income. But similar percentages can mask inconsistencies.

For example, among U.S. residents who attributed to biological differences at least some of the Black-American/Hispanic-American-vs-White-American difference in outcomes in things such as jobs and income, about 37% rated Asians' intelligence higher than Whites' intelligence, about 34% rated Whites' intelligence higher than Blacks' intelligence, but only about 14% fell into both of these groups, as illustrated in the second panel below:

The plot below presents corresponding comparisons: the estimated percentage of each population that rated the in-general intelligence of Whites higher than the in-general intelligence of Blacks (black dots) and the percentage that rated the in-general intelligence of Asians higher than the in-general intelligence of Blacks (white dots).

---

[*] I can imagine reasons to not be in one or both dots, such as perceptions about the influence of past or present racial discrimination, the relative size of the gaps, flaws in the use of educational data as measures of intelligence, and imperfections in the wording of the ANES item. But I nonetheless thought that it would be interesting to check respondent ratings about racial group intelligence.

---

NOTES

1. Relevant item wording from the ANES 2022 Pilot Study:

Next, we're going to show you a seven-point scale on which the characteristics of the people in a group can be rated. In the first statement a score of '1' means that you think almost all of the people in that group tend to be 'intelligent.' A score of '7' means that you think most people in the group are 'unintelligent.' A score of '4' means that you think that most people in the group are not closer to one end or the other, and of course, you may choose any number in between. Where would you rate each group in general on this scale?

2. The ANES 2022 Pilot Study had a parallel item about Hispanic-Americans that I didn't analyze, to avoid complicating the presentation.

3. In the full sample, weighted, 13% rated in-general Black intelligence higher than in-general White intelligence (compared to 25% the other way), 8% rated in-general Black intelligence higher than in-general Asian intelligence (compared to 38% the other way), and 10% rated in-general White intelligence higher than in-general Asian intelligence (compared to 35% the other way). Respective equal ratings of in-general intelligence were 62% White/Black, 54% Asian/Black, and 55% Asian/White.

Respondents were coded into a separate category if the respondent didn't provide a rating of intelligence for at least one of the racial groups in a comparison, but almost all respondents provided a rating of intelligence for each racial group.
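For readers who want to reproduce comparisons like those in note 3, below is a minimal R sketch (not the original Stata analysis) of how the pairwise rating comparisons could be coded. The data file name and the variable names (intel_white, intel_black, weight) are hypothetical.

```r
library(dplyr)

# anes <- read_csv("anes_pilot_2022.csv")  # hypothetical file name

compare_ratings <- function(a, b) {
  # Lower scores mean "more intelligent" on the 1-to-7 scale,
  # so a < b means group A was rated higher than group B.
  case_when(
    is.na(a) | is.na(b) ~ "no rating for at least one group",
    a < b ~ "A rated higher",
    a > b ~ "B rated higher",
    TRUE  ~ "equal ratings"
  )
}

# Weighted percentages for the White/Black comparison:
# anes %>%
#   mutate(white_vs_black = compare_ratings(intel_white, intel_black)) %>%
#   count(white_vs_black, wt = weight) %>%
#   mutate(pct = 100 * n / sum(n))
```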

4. Plots created with R packages: tidyverse, waffle, and patchwork.

5. Data for the ANES 2022 Pilot Study. Stata code and output for my analysis.

6. Below is an earlier draft of the first plot, which I didn't like as much because I thought that it was too wide and less visually attractive:

7. The shading in the plot below is intended to emphasize the size of the gaps between the estimates within a population, with red indicating reversal of the typical pattern:

8. Plot replacing the legend with direct labels:

9. Bonus plot, while I'm working on visualizations: this plot compares ratings of men and women on 0-to-100 feeling thermometers, with a confidence interval for each category, as if each category were plotted as its own percentage:


The Monkey Cage tweeted a link to a post (Gift et al 2022), claiming that "Just seeing a Fox News logo prompts racial bias, new research suggests".

This new research is Bell et al 2022, which reported on an experiment that manipulated the logo on a news story provided to participants (no logo, CNN, and Fox News) and manipulated the name of the U.S. Army Ranger in the news story who was accused of killing a wounded Taliban detainee, with the name signaling race (e.g., no name, Tyrone Washington, Mustafa Husain, Santiago Gonzalez, Todd Becker).

The Appendix to Bell et al 2022 reports some results for all respondents, but Bell et al 2022 indicates (footnotes and citations omitted):

Research on racial attitudes in America largely theorizes about the proclivities and nuances of racial animus harbored by Whites, so we follow conventions in the literature by restricting our analysis to 1149 White respondents.

Prior relevant post.

---

1.

From the Gift et al 2022 Monkey Cage post (emphasis added):

The result wasn't what we necessarily expected. We didn't anticipate that the Fox News logo might negatively affect attitudes toward the Black service member any more than soldiers of other races. So what could explain this outcome?

The regression results reported in Bell et al 2022 have the "no name" condition as the omitted category, so the 0.180 coefficient and 0.0705 standard error for the [Black X Fox News] interaction term for the "convicted" outcome indicates the effect of the Fox News logo in the Black Ranger condition relative to the effect of the Fox News logo in the no-name condition.

But, for assessing anti-Black bias among White participants, it seems preferable to compare the effect of the Fox News logo in the Black Ranger condition to the effect of the Fox News logo in the White Ranger condition. Otherwise, the Black name / no-name comparison might conflate the effect of a Black name for the Ranger with the general effect of naming the Ranger. Moreover, a Black name / White name comparison would better fit the claim about "any more than soldiers of other races".

---

The coefficient and standard error are 0.0917 and 0.0701 for the [White X Fox News] interaction term for the "convicted" outcome, and I don't think that there is sufficient evidence that the 0.180 [Black X Fox News] coefficient differs from the 0.0917 [White X Fox News] coefficient, given that the difference in coefficients for the interaction terms is only 0.09 and the standard errors are about 0.07 for each interaction term.
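As a rough illustration of that point, here is a back-of-the-envelope test of the difference between the two interaction coefficients. The covariance between the two estimates isn't reported, so this treats them as independent, which is only an approximation.

```r
# Approximate z-test for the difference between the two interaction terms,
# assuming (for lack of a reported covariance) that the estimates are independent.
b_black <- 0.180;  se_black <- 0.0705   # [Black X Fox News]
b_white <- 0.0917; se_white <- 0.0701   # [White X Fox News]

z <- (b_black - b_white) / sqrt(se_black^2 + se_white^2)
p <- 2 * pnorm(-abs(z))
round(c(z = z, p = p), 2)  # z is about 0.89, p is about 0.37
```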

A similar concern applies to the "justified" outcome, which had respective coefficients (and standard errors) of −0.142 (0.0693) for [Black X Fox News] and −0.0841 (0.0692) for [White X Fox News]. I didn't see the replication materials for Bell et al 2022 in the journal's Dataverse, or I might have tried to get the p-values.

---

2.

From the Gift et al 2022 Monkey Cage post:

Of course one study is hardly definitive. Our analysis points to the need for more research into how Fox News and other media may or may not prime racial attitudes across a range of political and social issues.

Yes, one study is not definitive, so it might have been a good idea for the Gift et al 2022 Monkey Cage post to have mentioned the replication attempt *published in Bell et al 2022* in which the [Black X Fox News] interaction term did not replicate in statistical significance or even in the direction of the coefficients, with a −0.00371 coefficient for the "convicted" outcome and a 0.0199 coefficient for the "justified" outcome.

I can't see a good reason for the Gift et al 2022 Monkey Cage post to not report results for the preregistered replication attempt, or for the Monkey Cage editors either to have not known about the replication attempt or to have permitted publishing the post without mentioning the lack of replication for the [Black X Fox News] interaction term.

The preregistration suggests that the replication attempt was due to the journal (Research & Politics), so it seems that we can thank a peer reviewer or editor for the replication attempt.

---

3.

Below is the first sentence from the preregistration's statement of the main question for Study 2:

White Americans who see a story about a non-white soldier will be more likely to say the soldier should be punished for their alleged crime than either an unnamed soldier or a white soldier.

Bell et al 2022 Appendix Table A2 indicates that means for the "convicted" outcome in Study 2 were, from high to low and by condition:

No logo news source
0.725 White name
0.697 Latin name
0.692 MEast name
0.680 No name 
0.655 Black name

CNN logo
0.705 No name 
0.698 Latin name
0.695 Black name
0.694 White name
0.688 MEast name

Fox News logo
0.730 No name 
0.703 White name
0.702 Black name
0.695 MEast name
0.688 Latin name

So, in the Fox News condition from this *preregistered* experiment, the highest point estimate for a named Ranger was for the White Ranger, for the "convicted" outcome, which seems like a better measure of punishment than the "justified" outcome.

The gap between the highest mean "convicted" outcome for a named Ranger (0.703) and the lowest mean "convicted" outcome for a named Ranger (0.688) was 0.015 units on a 0-to-1 scale. That seems small enough to be consistent with random assignment error and to be inconsistent with the title of the Monkey Cage post of "Just seeing a Fox News logo prompts racial bias, new research suggests".

---

NOTES

1. Tweet question to authors of Bell et al 2022.

2. The constant in the Bell et al 2022 OLS regressions represents the no-name Ranger in the no-logo news story.

In Study 1, this constant indicates that the Ranger in the no-name no-logo condition was rated on a 0-to-1 scale as 0.627 for the "convicted" outcome and as 0.389 for the "justified" outcome. This balance makes sense: on net, participants in the no-name no-logo condition agreed that the Ranger should be convicted and disagreed that the Ranger's actions were justified. Appendix Table A1 indicates that the mean "convicted" rating was above 0.50 and the mean "justified" rating was below 0.50 for each of the 15 conditions for Study 1.

But the constants in Study 2 were 0.680 for the "convicted" outcome and 0.711 for the "justified" outcome, which means that, on net, participants in the no-name no-logo condition agreed that the Ranger should be convicted and agreed that the Ranger's actions were justified. Appendix Table A2 indicates that the mean for both outcomes was above 0.50 for each of the 15 conditions for Study 2.

3. I think that Bell et al 2022 Appendix A1 might report results for all respondents: the sample size in A1 is N=1554, but in the main text Table 2 sample sizes are N=1149 for the convicted outcome and 1140 for the justified outcome. Moreover, I think that the main text Figure 2 might plot these A1 results (presumably for all respondents) and not the Table 2 results that were limited to White respondents.

For example, A1 has the mean "convicted" rating as 0.630 for no-name no-logo, 0.590 for no-name CNN logo, and 0.636 for no-name Fox logo, which matches the CNN dip in the leftmost panel of Figure 2 and Fox News being a bit above the no-logo estimate in that panel. But the "convicted" constant in Table 1 is 0.630 (for the no-name no-logo condition), with a −0.0303 coefficient for CNN and a −0.0577 coefficient for Fox News, implying no-name means of about 0.600 for CNN and 0.572 for Fox News, so based on this I think that the no-name Fox News mean should be lower than the no-name CNN mean.

The bumps in Figure 2 better match with Appendix Table A5 estimates, which are for all respondents.

4. This Bell et al 2022 passage about Study 2 seems misleading or at least easy to misinterpret (emphasis in the original, footnote omitted):

If the soldier was White and the media source was unnamed, respondents judged him to be significantly less justified in his actions, but when the same information was presented under the Fox News logo, respondents found him to be significantly more justified in his actions.

As indicated in the coefficients and Figure 3, the "more justified" isn't more justified relative to the no-name no-logo condition, but more justified relative to the bias against the White Ranger relative to the no-name Ranger in the no-logo condition. Relevant coefficients are −0.131 for "White", which indicates the reduction in the "justified" rating between the no-name no-logo condition and the White-name no-logo condition, and 0.169 for "White X Fox News", which indicates the White-name Fox-News advantage relative to the no-name Fox-News effect.

So the Fox News bias favoring the White Ranger in the Study 2 "justified" outcome only a little more than offset the bias against the White Ranger in the no-logo condition, with a net bias that I suspect might be small enough to be consistent with random assignment error.


The American Political Science Review published Bonilla and Tillery Jr 2020 "Which identity frames boost support for and mobilization in the #BlackLivesMatter movement? An experimental test".

---

The Bonilla and Tillery Jr 2020 Discussion and Conclusion indicates that:

Further studies should also focus on determining why African American women are mobilizing more than men in response to every frame that we exposed them to in our survey experiment.

But I don't think that the Bonilla and Tillery Jr 2020 data support the claim that every frame caused more mobilization among African American women than among African American men.

Bonilla and Tillery Jr 2020 has figures that measure support for Black Lives Matter and figures with outcomes about writing to Nancy Pelosi, but Bonilla and Tillery Jr 2020 also combines support and mobilization with the phrasing of "mobilizing positive attitudes" (p. 959), so I wanted to check what outcome the above passage was referring to. The response that I received suggested that the outcome was writing to Nancy Pelosi. But I don't see any basis for a claim about gender differences for each frame for that outcome, in Bonilla and Tillery Jr 2020 Figure 4B, the text of Bonilla and Tillery Jr 2020, or my analysis.

---

Another passage from Bonilla and Tillery Jr 2020 (p. 958):

For those not identifying as LGBTQ+, we saw a stronger negative effect in asking for support as a result of the Feminist treatment than LGBTQ+ treatment (βFeminist = -0.08, p = 0.08; βLGBTQ+ = -0.04, p = 0.34).

The p-value was p=0.45 for my test of the null hypothesis that the -0.08 coefficient for the feminist treatment equals the -0.04 coefficient for the LGBTQ+ treatment. There is thus not sufficient evidence in these data that these coefficients differ from each other, so it's not a good idea to claim that one treatment had a stronger effect than the other.

---

The lead author of Bonilla and Tillery Jr 2020 presented these data to Harvard's Women and Public Policy Program, noting at about 31:40 the evidence that the nationalist frame had a significant effect among women on the "mention police" outcome and noting at about 32:48 that "Black men in general...were much less likely than Black women to talk about the police in general". But my analysis indicated that p=0.35 for a test of the null hypothesis that the effect of the nationalist frame does not differ by gender for the "mention police" outcome.

---

Similar problem in this passage with the suggestion about results being consistent with "a differential response to the Black feminist treatment by gender" (p. 954, footnote omitted and emphasis added):

For female respondents, we see nonsignificant (positive) effects of the Black nationalist (β = 0.03, p = 0.39) and Black LGBTQ+ treatments (β = 0.03, p = 0.30), and nonsignificant (negative) effects of the Black feminist treatment (β = -0.02, p = 0.46). In contrast, we found that male respondents were much more affected by the intersectional treatments...but both the Black feminist and Black LGBTQ+ treatments decreased Black male approval of BLM (βFeminist = -0.06., p = 0.07; βLGBTQ+ = -0.09, p = 0.008).

---

NOTES

1. Bonilla and Tillery Jr 2020 had a preregistration. Here is hypothesis 3 from the preregistration...

H3: LGBTQ and Intersectional frames of the BLM movement will have no effect (or a demobilizing effect) on the perceived effectiveness of BLM African American subjects.

...and from the article (emphasis added)...

H3: Black LGBTQ+ frames of the BLM movement will have a positive effect on Black LGBTQ+ members, but they will have no effect or a demobilizing effect on Black subjects who do not identify as LGBTQ+.

I don't think that this deviation was super important, but the difference makes me wonder whether the APSR peer reviewers and/or editors bothered to check the preregistration against the article. Even if this check was made, it would be nice if the journal signaled to readers that this check was made.

2. Bonilla and Tillery Jr 2020 thanks the prior APSR editors:

Finally, we thank the three anonymous reviewers and the previous APSR editors, Professor Thomas Koenig and Professor Ken Benoit, for pushing us to significantly improve this paper through the review process.

Something else nice would be for journal articles to indicate the editorial teams responsible for the decision to publish the article and responsible for checking the manuscript for errors and flaws.

3. I was curious to see what subsequent research has discussed about Bonilla and Tillery Jr 2020. Let's start with Anoll et al 2022 from Perspectives on Politics:

This could be of great political consequence considering the importance of parents as socializing agents (Jennings and Niemi 1974; Niemi and Jennings 1991; Oxley 2017) and the necessity of building multiracial support for BLM (Bonilla and Tillery 2020; Corral 2020; Holt and Sweitzer 2020; Merseth 2018).

I'm not sure what about Bonilla and Tillery Jr 2020 supports that citation about "the necessity of building multiracial support for BLM". Let's try Hswen et al 2021 from JAMA Network Open, which cited Bonilla and Tillery Jr 2020 as footnote 14:

Although often used following fatal encounters with law enforcement, #BlackLivesMatter also became an important tool to raise awareness around health inequities in Black communities, such as HIV, adequate access to analgesia, and cancer screening.13,14

I'm not sure what about Bonilla and Tillery Jr 2020 supports that citation about BLM being "an important tool to raise awareness around health inequities in Black communities".

From Jasny and Fisher 2022 from Social Networks (sic for "complimentary"):

Research has also shown that when multiple issues or beliefs are seen as complimentary, a process called "frame alignment," the connection can boost support for social movements and motivate participation (Bonilla and Tillery, 2020, Heaney, 2021; for an overview of "frame alignment," see Snow et al., 1986).

I'm not sure what in Bonilla and Tillery Jr 2020 the support-boosting and/or participation-motivating frame alignment refers to. At least Heaney 2022 from Perspectives on Politics is on target with what Bonilla and Tillery Jr 2020 is about:

If BLM is able to convey an intersectional message effectively to its supporters, then this idea is likely to be widely discussed in movement circles and internalized by participants (Bonilla and Tillery 2020).

But Heaney 2022 doesn't tell readers what information Bonilla and Tillery Jr 2020 provided about intersectional messages. Next are the "charity" citations from Boudreau et al 2022 from Political Research Quarterly, in which Bonilla and Tillery Jr 2020 is absolutely unnecessary to support the claims that the citation is used for:

From the protests against police brutality in the 1960s and 70s to the emergence of the Black Lives Matter movement in recent years, there is a long history of police-inspired political mobilization (Laniyonu 2019; Bonilla and Tillery 2020)...

The symbolic appeal of a movement that served as a focal point and mobilizer of Americans' outrage was manifested in the Black Lives Matter signs posted in windows and scrawled on sidewalks and buildings across the country (Bonilla and Tillery 2020).

I'm not even sure that Bonilla and Tillery 2020 is a good citation for those passages.

And from Krewson et al 2022 from Political Research Quarterly (footnote text omitted):

In May of 2021, we obtained a sample of 2170 high quality respondents from Qualtrics, a widely respected survey firm (Bonilla and Tiller 2020; Friedman 2019; Kane et al. 2021).6

Ah, yes, Bonilla and Tiller [sic] 2020, which, what, provided evidence that Qualtrics is widely respected? That Qualtrics provides high quality respondents? Bonilla and Tillery Jr 2020 used Qualtrics, I guess. The omitted footnote text didn't seem relevant and seems to be incorrect, based on comparing the footnotes to the working paper and on the content of the footnotes, with, for example, footnote 6 being about the ACBC design but the main text mention of the ACBC design linking to footnote 7.

Here is a prior post about mis-citations. Caveats from that post apply to the above discussion of citations to Bonilla and Tillery 2020: the discussion is not systematic or representative, which prevents any inference stronger than that Bonilla and Tillery 2020 is miscited more often than it should be.


Political Research Quarterly published Garcia and Sadhwani 2022 "¿Quien importa? State legislators and their responsiveness to undocumented immigrants", about an experiment in which state legislators were sent messages, purportedly from a Latinx person such as Juana Martinez or from an Eastern European person such as Anastasia Popov, with message senders describing themselves as "residents", "citizens", or "undocumented immigrants".

I'm not sure of the extent to which response rates to the purported undocumented immigrants were due to state legislative offices suspecting that this was yet another audit study. Or maybe it's common for state legislators to receive messages from senders who invoke their undocumented status, as in this experiment ("As undocumented immigrants in your area we are hoping you can help us").

But that's not what this post is about.

---

1.

Garcia and Sadhwani 2022 Table 1 Model 2 reports estimates from a logit regression predicting whether a response was received from the state legislator, with predictors such as legislative professionalism. The coefficient was positive for legislative professionalism, indicating that, on average and other model variables held constant, legislators from states with higher levels of legislative professionalism were more likely to respond, compared to legislators from states with lower levels of legislative professionalism.

Another Model 2 predictor was "state", which had a coefficient of 0.007, a standard error of 0.002, and three statistical significance asterisks indicating that, on average and other model variables held constant -- what? -- legislators from states with more "state-ness" were more likely to respond? I'm pretty sure that this "state" predictor was coded with states later in the alphabet such as Wyoming assigned a higher number than states earlier in the alphabet such as Alabama. I don't think that makes any sense as a predictor of response rates, but the predictor was statistically significant, so that's interesting.

The "state" variable was presumably meant to be included as a categorical predictor, based on the Garcia and Sadhwani 2022 text (emphasis added):

For example, we include the Squire index for legislative professionalism (Squire 2007), the chamber in which the legislator serves, and a fixed effects variable for states.

I think this is something that a peer reviewer or editor should catch, especially because Garcia and Sadhwani 2022 doesn't report that many results in tables or figures.
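To make the distinction concrete, below is a minimal R sketch (the variable and data frame names are hypothetical) of the difference between entering "state" as a numeric code and entering it as fixed effects. The numeric version estimates one slope for the alphabetical state code; wrapping it in factor() estimates a separate intercept per state, which is presumably what "a fixed effects variable for states" was meant to do.

```r
# Numeric state code (what the 0.007 coefficient appears to reflect):
# m_numeric <- glm(response ~ professionalism + chamber + state,
#                  family = binomial, data = legislators)

# State fixed effects (one indicator per state):
# m_factor  <- glm(response ~ professionalism + chamber + factor(state),
#                  family = binomial, data = legislators)
```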

---

2.

Garcia and Sadhwani 2022 Table 1 Model 2 omits the sender category of undocumented Latinx, so that results for the five included sender categories can be interpreted relative to the omitted sender category of undocumented Latinx. So far so good.

But then Garcia and Sadhwani 2022 interprets the other predictors as applying to only the omitted sender category of undocumented Latinx, such as (sic for "respond do a request"):

To further examine the potential impact of sentiments toward immigrants and immigration at the state level, we included a variable ("2012 Romney states") to examine if legislators in states that went to Romney in the 2012 presidential election were less likely to respond do a request from an undocumented immigrant. We found no such relationship in the data.

This apparent misinterpretation appears in the abstract (emphasis added):

We found that legislators respond less to undocumented constituents regardless of their ethnicity and are more responsive to both the Latinx and Eastern European-origin citizen treatments, with Republicans being more biased in their responsiveness to undocumented residents.

I'm interpreting that emphasized part to mean that the Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents was larger than the non-Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents. And I don't think that's correct based on the data for Garcia and Sadhwani 2022.

My analysis used an OLS regression to predict whether a legislator responded, with a single predictor "undocCITIZ" coded 1 for undocumented senders and 0 for citizen senders. Coefficients were -0.07 among Republican legislators and -0.11 among non-Republican legislators, so the undocumented/citizen gap was not larger among Republican legislators than among non-Republican legislators. Percentage responses are in the table below, and a sketch of the subgroup regression follows the table:

SENDER         GOP NON-GOP 
Citizen EEurop 21  23
Citizen Latina 26  29
Control EEurop 25  33
Control Latina 18  20
Undocum EEurop 18  12
Undocum Latina 15  17
OVERALL        20  22
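Below is a minimal R sketch of the subgroup regressions described above (my analysis was run elsewhere; the data frame and variable names other than undocCITIZ are hypothetical).

```r
# Restrict to undocumented and citizen senders, split by legislator party:
# gop    <- subset(legislators, party == "Republican" & sender %in% c("undocumented", "citizen"))
# nongop <- subset(legislators, party != "Republican" & sender %in% c("undocumented", "citizen"))

# OLS of response (0/1) on the undocumented-vs-citizen indicator:
# lm(responded ~ undocCITIZ, data = gop)     # coefficient about -0.07
# lm(responded ~ undocCITIZ, data = nongop)  # coefficient about -0.11
```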

---

NOTE

1. No response yet to my Nov 17 tweet to a co-author of Garcia and Sadhwani 2022.


Political Research Quarterly published Huber and Gunderson 2022 "Putting a fresh face forward: Does the gender of a police chief affect public perceptions?". Huber and Gunderson 2022 reports on a survey experiment in which, for one of the manipulations, a police chief was described as female (Christine Carlson or Jada Washington) or male (Ethan Carlson or Kareem Washington).

---

Huber and Gunderson 2022 has a section called "Heterogeneous Responses to Treatment" that reports on results that divided the sample into "high sexism" respondents and "low sexism" respondents. For example, the mean overall support for the female police chief was 3.49 among "low sexism" respondents and was 3.41 among "high sexism" respondents, with p=0.05 for the difference. Huber and Gunderson 2022 (p. 8) claims that [sic on the absence of a "to"]:

These results indicate that respondents' sexism significantly moderates their support for a female police chief and supports role congruity theory, as individuals that are more sexist should react more negatively [sic] violations of gender roles.

But, for all we know from the results reported in Huber and Gunderson 2022, "high sexism" respondents might merely rate police chiefs lower relative to how "low sexism" respondents rate police chiefs, regardless of the gender of the police chief.

Instead of the method in Huber and Gunderson 2022, a better method to test whether "individuals that are more sexist...react more negatively [to] violations of gender roles" is to estimate the effect of the male/female treatment on ratings about the police chief among "high sexism" respondents. And, to test whether "respondents' sexism significantly moderates their support for a female police chief", we can compare the results of that test to results from a corresponding test among "low sexism" respondents.

---

Using the data and code for Huber and Gunderson 2022, I ran the code up to the section for Table 4, which is the table about sexism. I then ran my modified version of the Huber and Gunderson 2022 code for Table 4, first among respondents Huber and Gunderson 2022 labeled "high sexism" (a score above 0.35 on the measure of sexism) and then among respondents labeled "low sexism" (a score below 0.35 on the measure of sexism).

Results are below, indicating a lack of p<0.05 evidence for a male/female treatment effect among these "high sexism" respondents, along with a p<0.05 pro-female bias among the "low sexism" respondents on all but one of the Table 4 items.

HIGH SEXISM RESPONDENTS------------------
                     Female Male
                     Chief  Chief
Domestic Violence    3.23   3.16  p=0.16
Sexual Assault       3.20   3.16  p=0.45
Violent Crime Rate   3.20   3.23  p=0.45
Corruption           3.21   3.18  p=0.40
Police Brutality     3.17   3.17  p=0.94
Community Leaders    3.33   3.31  p=0.49
Police Chief Support 3.41   3.39  p=0.52

LOW SEXISM RESPONDENTS------------------
                     Female Male
                     Chief  Chief
Domestic Violence    3.40   3.21  p<0.01
Sexual Assault       3.44   3.22  p<0.01
Violent Crime Rate   3.40   3.33  p=0.10
Corruption           3.21   3.07  p=0.01
Police Brutality     3.24   3.11  p=0.01
Community Leaders    3.40   3.32  p=0.02
Police Chief Support 3.49   3.37  p<0.01
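For readers who want the shape of the comparison behind the tables above, here is a minimal R sketch: estimate the male/female treatment effect within each sexism subgroup, rather than comparing the female-chief mean across subgroups. The variable names are hypothetical and the t-tests stand in for the authors' actual estimation code.

```r
# Split the sample at the 0.35 sexism threshold used in Huber and Gunderson 2022:
# high <- subset(survey, sexism_score > 0.35)
# low  <- subset(survey, sexism_score < 0.35)

# Within-subgroup treatment effect of chief gender on overall chief support:
# t.test(chief_support ~ female_chief, data = high)
# t.test(chief_support ~ female_chief, data = low)
```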

---

I'm sure that there is more of interest here, such as calculating p-values for the difference between the treatment effect among "low sexism" respondents and the treatment effect among "high sexism" respondents, and assessing whether there is stronger evidence of a treatment effect among "high sexism" respondents higher up the sexism scale than the 0.35 threshold used in Huber and Gunderson 2022.

But I at least wanted to document another example of a pro-female bias among "low sexism" respondents.


The Journal of Politics recently published Butler et al 2022 "Constituents ask female legislators to do more".

---

1. PREREGISTRATION

The relevant preregistration plan for Butler et al 2022 has an outcome that the main article does not mention, for the "Lower Approval for Women" hypothesis. Believe it or not, the Butler et al 2022 analysis didn’t find sufficient evidence in its "Lower Approval for Women" tests. So instead of reporting that in the JOP article or its abstract or its title, Butler et al mentioned the insufficient evidence in appendix C of the online supplement to Butler et al 2022.

---

2. POSSIBLE ERROR FOR THE APPROVAL HYPOTHESIS

The Butler et al 2022 online appendix indicates that the dependent variable for Table C2 is a four-point scale that was predicted using ordered probit. Table C2 reports results for four cut points, even though a four-point dependent variable should have only three cut points. The dependent variable was drawn from a 5-point scale in which the fifth point was "Not sure", so I think that someone forgot to recode the "Not sure" responses to missing.

Butler et al 2022 online appendix C indicates that:

Constituents chose among 5 response options for the question: Strongly approve, Somewhat approve, Somewhat disapprove, Strongly disapprove, Not sure.

So I think that the "Not sure" responses were coded as if being not sure was super strongly disapprove.

---

3. PREREGISTRATION + RESEARCH METHOD

The image below has a tabulation of the dependent variable for the preregistered hypothesis of Butler et al 2022 that is reported in the main text, the abstract, and the title:

That's a very large percentage of zeros.

The Butler et al 2022 experiment involved male legislators and female legislators sending letters to constituents asking the constituents to complete an online survey, and, in that online survey, the legislator asked "What policy issues do you think I should work on during the current session?".

Here is a relevant passage from the Butler et al 2022 preregistration reported in the online appendix, with my emphasis added and [sic] for "...condition the code...":

Coding the Dependent Variable. This would be an open-ended question where voters could list multiple issues. We will have RAs who are blind to the hypothesis and treatment condition the code the number of issues given in the open response. We will use that number as the dependent variable. We will then an OLS regression where the DV is the number of issues and the IV is the gender treatment.

That passage seems to indicate that the dependent variable was preregistered to be a measure about what constituents provided in the open response. From what I can tell based on the original coding of the "NumberIssues" dependent variable, the RAs coded 14 zeros based on what respondents provided in the open response, out of a total of 1,203 observations. I ran the analysis on only these 1,203 observations, and the coefficient for the gender of the legislator (fem_treatment) was p=0.29 without controls and p=0.29 with controls.

But Butler et al 2022 coded the dependent variable to be zero for the 29,386 people who didn't respond to the survey at all or at least didn't respond in the open response. Converting these 29,386 observations to zero policy issues asked about produces corresponding p-values of p=0.06 and p=0.09. But it seems potentially misleading to focus on a dependent variable that conflates [1] the number of issues that a constituent asked about and [2] the probability that the constituent responded to the survey.
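Below is a minimal R sketch of the two codings of the dependent variable discussed above, using the article's variable names NumberIssues and fem_treatment; the data frame name is hypothetical, and I'm assuming non-respondents appear as missing values before being recoded.

```r
# [1] Only constituents with an RA-coded open response (1,203 observations):
# lm(NumberIssues ~ fem_treatment, data = subset(d, !is.na(NumberIssues)))

# [2] Non-respondents recoded to zero issues (1,203 + 29,386 observations),
#     which conflates the number of issues raised with the probability of
#     responding to the survey at all:
# d$NumberIssues0 <- ifelse(is.na(d$NumberIssues), 0, d$NumberIssues)
# lm(NumberIssues0 ~ fem_treatment, data = d)
```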

Table D2 of Butler et al 2022 indicates that constituents were more likely to respond to the female legislators' request to respond to the online survey (p<0.05). Butler et al 2022 indicates that "Women are thus contacted more often but do not receive more requests per contact" (p. 2281). But it doesn't seem correct to describe a higher chance of responding to a female legislator's request to complete a survey as contacting female legislators more, especially if the suggestion is that the experimental results about contact initiated by the legislator apply to contact that is not initiated by the legislator.

If anything, constituents being more likely to respond to female legislator requests than male legislator requests seems like a constituent bias in favor of female legislators.

---

NOTE

1. To date, no responses to tweets about the potential error or the research method.


1.

In May, I published a blog post about deviations from the pre-analysis plan for the Stephens-Dougan 2022 APSR letter, and I tweeted a link to the blog post that tagged @LaFleurPhD and asked her directly about the deviations from the pre-analysis plan. I don't recall receiving a response from Stephens-Dougan, and, a few days later, on May 31, I emailed the APSR about my post, listing three concerns:

* The Stephens-Dougan 2022 description of racially prejudiced Whites not matching how the code for Stephens-Dougan 2022 calculated estimates for racially prejudiced Whites.

* The substantial deviations from the pre-analysis plan.

* Figure 1 of the APSR letter reporting weighted estimates, but the evidence being much weaker in unweighted analyses.

Six months later (December 5), the APSR has published a correction to Stephens-Dougan 2022. The correction addresses each of my three concerns, but not perfectly, which I'll discuss below, along with other discussion about Stephens-Dougan 2022 and its correction. I'll refer to the original APSR letter as "Stephens-Dougan 2022" and the correction as "the correction".

---

2.

The pre-analysis plan associated with Stephens-Dougan 2022 listed four outcomes at the top of its page 4, but only one of these outcomes (referred to as "Individual rights and freedom threatened") was reported on in Stephens-Dougan 2022. However, Table 1 of Stephens-Dougan 2022 reported results for three outcomes that were not mentioned in the pre-analysis plan.

The t-statistics for the key interaction term for the three outcomes included in Table 1 of Stephens-Dougan 2022 but not mentioned in pre-analysis plan were 2.6, 2.0, and 2.1, all of which indicate sufficient evidence. The t-statistics for the key interaction term mentioned in pre-analysis plan but omitted from Stephens-Dougan 2022 were 0.6, 0.6, and 0.6, none of which indicate sufficient evidence.

I calculated the t-statistics of 2.6, 2.0, and 2.1 from Table 1 of Stephens-Dougan 2022, by dividing a coefficient by its standard error. I wasn't able to use the correction to calculate the t-statistics of 0.6, 0.6, and 0.6, because the relevant data for these three omitted pre-analysis plan outcomes are not in the correction but instead are in Table A12 of a "replication-final.pdf" file hosted at the Dataverse.

That's part of what I meant about an imperfect correction: a reader cannot use information published in the APSR itself to calculate the evidence provided by the outcomes that were planned to be reported on in the pre-analysis plan, or, for that matter, to see how there is substantially less evidence in the unweighted analysis. Instead, a reader needs to go to the Dataverse and dig through table after table of results.

The correction refers to deviations from the pre-analysis plan, but doesn't indicate the particular deviations and doesn't indicate what happens when these deviations are not made.  The "Supplementary Materials Correction-Final.docx" file at the Dataverse for Stephens-Dougan 2022 has a discussion of deviations from the pre-analysis plan, but, as far as I can tell, the discussion does not provide a reason why the results should not be reported for the three omitted outcomes, which were labeled in Table A12 as "Slow the Spread", "Stay Home", and "Too Long to Loosen Restrictions".

It seems to me to be a bad policy to permit researchers to deviate from a pre-analysis plan without justification and to merely report results from a planned analysis on, say, page 46 of a 68-page file on the Dataverse. But a bigger problem might be that, as far as I can tell, many journals don't even attempt to prevent misleading selective reporting for survey research for which there is no pre-analysis plan. Journals could require researchers reporting on surveys to submit or link to the full questionnaire for the surveys or at least to declare that the main text reports on results for all plausible measured outcomes and moderators.

---

3.

Next, let me discuss a method used in Stephens-Dougan 2022 and the correction, which I think is a bad method.

The code for Stephens-Dougan 2022 used measures of stereotypes about Whites and Blacks on the traits of hard working and intelligent, to create a variable called "negstereotype_endorsement". The code divided respondents into three categories, coded 0 for respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for respondents who endorsed both negative stereotypes about Blacks relative to Whites. For both Stephens-Dougan 2022 and the correction, Figure 3 reported for each reported outcome an estimate of how much the average treatment effect among prejudiced Whites (defined as those coded 1) differed from the average treatment effect among unprejudiced Whites (defined as those coded 0).

The most straightforward way to estimate this difference in treatment effects is to [1] calculate the treatment effect for prejudiced Whites coded 1, [2] calculate the treatment effect for unprejudiced Whites coded 0, and [3] calculate the difference between these treatment effects. The code for Stephens-Dougan 2022 instead estimated this difference using a logit regression that had three predictors: the treatment, the 0/0.5/1 measure of prejudice, and an interaction of the prior two predictors. But, by this method, the estimated difference in treatment effect between the 1 respondents and the 0 respondents depends on the 0.5 respondents. I can't think of a valid reason why responses from the 0.5 respondents should influence an estimated difference between the 0 respondents and the 1 respondents.
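Below is a minimal sketch of the contrast (in R, with hypothetical treatment and outcome variable names; the original analysis was in Stata). The first block is the straightforward version, in which the 0.5 respondents play no role; the second is the regression version, in which the 0.5 respondents contribute to the fitted interaction slope and so can shift the estimated 0-vs-1 difference.

```r
# Straightforward difference in treatment effects, using only the 0 and 1 respondents:
# eff_prejudiced   <- with(subset(d, negstereotype_endorsement == 1),
#                          mean(outcome[treated == 1]) - mean(outcome[treated == 0]))
# eff_unprejudiced <- with(subset(d, negstereotype_endorsement == 0),
#                          mean(outcome[treated == 1]) - mean(outcome[treated == 0]))
# eff_prejudiced - eff_unprejudiced

# Regression version with the 0/0.5/1 measure entered as a numeric moderator,
# so the 0.5 respondents influence the interaction estimate:
# glm(outcome ~ treated * negstereotype_endorsement, family = binomial, data = d)
```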

See my Stata output file for more on that. The influence of the 0.5 respondents might not be major in most or all cases, but an APSR reader won't know, based on Stephens-Dougan 2022 or its correction, the extent to which the 0.5 respondents influenced the estimates for the comparison of the 0 respondents to the 1 respondents.

Now about those 0.5 respondents…

---

4.

Remember that the Stephens-Dougan 2022 "negative stereotype endorsement" variable has three levels: 0 for the 74% of respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for the 16% of respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for the 10% of respondents who endorsed both negative stereotypes about Blacks relative to Whites.

The correction indicates that "I discovered an error in the description of the variable, negative stereotype endorsement" and that "there was no error in the code used to create the variable". So was the intent for Stephens-Dougan 2022 to measure racial prejudice so that only the 1 respondents are considered prejudiced? Or was the intent to consider the 0.5 respondents and the 1 respondents to be prejudiced?

The pre-analysis plan seems to indicate a different method for measuring the moderator of negative stereotype endorsement:

The difference between the rating of Blacks and Whites is taken on both dimensions (intelligence and hard work) and then averaged.

But the pre-analysis plan also indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions.

So, even ignoring the plan to average the stereotype ratings, the pre-analysis plan is inconclusive about whether the intent was to use two or three bins. Let's try this passage from Stephens-Dougan 2022:

A nontrivial fraction of the nationally representative sample—26%—endorsed either the stereotype that African Americans are less hardworking than whites or that African Americans are less intelligent than whites.

So that puts the 16% of respondents at the 0.5 level of negative stereotype endorsement into the same bin as the 10% at the 1 level of negative stereotype endorsement. Stephens-Dougan 2022 doesn't report the percentage that endorsed both negative stereotypes about Blacks. Reporting the percentage of 26% is what would be expected if the intent was to place into one bin any respondent who endorsed at least one of the negative stereotypes about Blacks, so I'm a bit skeptical of the claim in the correction that the description is in error and the code was correct. Maybe I'm missing something, but I don't see how someone who intends to have three bins reports the 26% and does not report the 10%.

For another thing, Stephens-Dougan 2022 has only three figures: Figure 1 reports results for racially prejudiced Whites, Figure 2 reports results for non-racially prejudiced Whites, and Figure 3 reports on the difference between racially prejudiced Whites and non-racially prejudiced Whites. Did Stephens-Dougan 2022 intend to not report results for the group of respondents who endorsed exactly one of the negative stereotypes about Blacks? Did Stephens-Dougan 2022 intend to suggest that respondents who rate Blacks as lazier in general than Whites aren't racially prejudiced as long as they rate Blacks equal to or higher than Whites in general on intelligence?

---

5.

Stephens-Dougan 2022 and the correction depict 84% confidence intervals in all figures. Stephens-Dougan 2022 indicated (footnote omitted) that:

For ease of interpretation, I plotted the predicted probability of agreeing with each pandemic measure in Figure 1, with 84% confidence intervals, the graphical equivalent to p < 0.05.

The 84% confidence interval is good for assessing a p=0.05 difference between two estimates, but not for assessing at p=0.05 whether a single estimate differs from a particular number such as zero. So 84% confidence intervals make sense for Figures 1 and 2, in which the key comparisons are of the control estimate to the treatment estimate. But 84% confidence intervals don't make as much sense for Figure 3, which plots only one estimate per outcome and for which the key assessment is whether the estimate differs from zero (Figure 3 in Stephens-Dougan 2022) or from 1 (the correction).
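For readers unfamiliar with the convention, here is the rough reasoning behind the "84%" figure, assuming two independent estimates with roughly equal standard errors: their intervals just touch when the estimates are 2·z·SE apart, and a p=0.05 test of the difference needs a gap of about 1.96·√2·SE, so z = 1.96/√2.

```r
# Coverage of the interval whose non-overlap corresponds to p = 0.05
# for the difference between two independent, equal-SE estimates:
z <- 1.96 / sqrt(2)
coverage <- 2 * pnorm(z) - 1
round(coverage, 3)  # about 0.834, i.e., roughly an 84% interval

# For a single estimate compared against a fixed value such as zero,
# the usual 95% interval (z = 1.96) is the one that matches p = 0.05.
```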

---

6.

I didn't immediately realize why, in Figure 3 in Stephens-Dougan 2022, two of the four 84% confidence intervals cross zero, but, in Figure 3 in the correction, none of the four do. Then I realized that the estimates plotted in Figure 3 of the correction (but not in Figure 3 of Stephens-Dougan 2022) are odds ratios.

The y-axis for odds ratios in Figure 3 of the correction ranges from 0 to 30-something, using a linear scale. The odds ratio that indicates no effect is 1, and an odds ratio can't be negative, so that is why none of the four estimates cross zero in the corrected Figure 3.

It seems like a good idea for a plot of odds ratios to have a guideline for 1, so that readers can assess whether an odds ratio indicating no effect is a plausible value. And a log scale seems like a good idea for odds ratios, too. Relevant prior post that mentions that Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful".
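A minimal ggplot2 sketch of that suggestion, with made-up example values, a dashed guideline at 1, and a log-scaled axis:

```r
library(ggplot2)

d <- data.frame(outcome = c("A", "B", "C", "D"),
                or = c(4.2, 1.8, 0.7, 12.5),   # made-up odds ratios
                lo = c(1.1, 0.6, 0.2, 2.3),    # made-up lower CI bounds
                hi = c(16.0, 5.4, 2.4, 68.0))  # made-up upper CI bounds

ggplot(d, aes(x = outcome, y = or, ymin = lo, ymax = hi)) +
  geom_pointrange() +
  geom_hline(yintercept = 1, linetype = "dashed") +  # no-effect guideline
  scale_y_log10()
```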

None of the 84% confidence intervals for Figure 3 capture an odds ratio that crosses 1, but an 84% confidence interval for Figure A3 in "Supplementary Materials Correction-Final.docx" does.

---

7.

Often, when I alert an author or journal to an error in a publication, the subsequent correction doesn't credit me for my work. Sometimes the correction even suggests that the authors themselves caught the error, like the correction to Stephens-Dougan 2022 seems to do:

After reviewing my code, I discovered an error in the description of the variable, negative stereotype endorsement.

I guess it's possible that Stephens-Dougan "discovered" the error. For instance, maybe after she submitted page proofs, for some reason she decided to review her code, and just happened to catch the error that she had missed before, and it's a big coincidence that this was the same error that I blogged about and alerted the APSR to.

And maybe Stephens-Dougan also discovered that her APSR letter misleadingly deviated from the relevant pre-analysis plan, so that I don't deserve credit for alerting the APSR to that.
