Political Research Quarterly published Garcia and Sadhwani 2022 "¿Quien importa? State legislators and their responsiveness to undocumented immigrants", about an experiment in which state legislators were sent messages, purportedly from a Latinx person such as Juana Martinez or from an Eastern European person such as Anastasia Popov, with message senders describing themselves as "residents", "citizens", or "undocumented immigrants".

I'm not sure of the extent to which response rates to the purported undocumented immigrants were due to state legislative offices suspecting that this was yet another audit study. Or maybe it's common for state legislators to receive messages from senders who invoke their undocumented status, as in this experiment ("As undocumented immigrants in your area we are hoping you can help us").

But that's not what this post is about.

---

1.

Garcia and Sadhwani 2022 Table 1 Model 2 reports estimates from a logit regression predicting whether a response was received from the state legislator, with predictors such as legislative professionalism. The coefficient was positive for legislative professionalism, indicating that, on average and other model variables held constant, legislators from states with higher levels of legislative professionalism were more likely to respond, compared to legislators from states with lower levels of legislative professionalism.

Another Model 2 predictor was "state", which had a coefficient of 0.007, a standard error of 0.002, and three statistical significance asterisks indicating that, on average and other model variables held constant -- what? -- legislators from states with more "state-ness" were more likely to respond? I'm pretty sure that this "state" predictor was coded with states later in the alphabet such as Wyoming assigned a higher number than states earlier in the alphabet such as Alabama. I don't think that makes any sense as a predictor of response rates, but the predictor was statistically significant, so that's interesting.

The "state" variable was presumably meant to be included as a categorical predictor, based on the Garcia and Sadhwani 2022 text (emphasis added):

For example, we include the Squire index for legislative professionalism (Squire 2007), the chamber in which the legislator serves, and *a fixed effects variable for states*.

I think this is something that a peer reviewer or editor should catch, especially because Garcia and Sadhwani 2022 doesn't report that many results in tables or figures.
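To illustrate the difference, below is a minimal R sketch with simulated data and hypothetical variable names (not the names in the replication files): entering the state code as a number estimates a single slope across an alphabetical state ID, while wrapping the code in factor() estimates the state fixed effects that the quoted text describes.

```r
# Minimal sketch with simulated data; variable names are hypothetical.
set.seed(1)
d <- data.frame(
  state_id  = sample(1:50, 1000, replace = TRUE),   # 1 = Alabama, ..., 50 = Wyoming
  responded = rbinom(1000, 1, 0.2)
)

# What a 0.007 coefficient on "state" implies: one slope over an alphabetical state code.
m_numeric <- glm(responded ~ state_id, family = binomial, data = d)

# What "a fixed effects variable for states" implies: one dummy per state.
m_factor <- glm(responded ~ factor(state_id), family = binomial, data = d)

summary(m_numeric)$coefficients["state_id", ]
length(coef(m_factor))   # 50 coefficients: intercept plus 49 state dummies
```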

---

2.

Garcia and Sadhwani 2022 Table 1 Model 2 omits the sender category of undocumented Latinx, so that results for the five included sender categories can be interpreted relative to the omitted sender category of undocumented Latinx. So far so good.

But then Garcia and Sadhwani 2022 interprets the other predictors as applying to only the omitted sender category of undocumented Latinx, such as (sic for "respond do a request"):

To further examine the potential impact of sentiments toward immigrants and immigration at the state level, we included a variable ("2012 Romney states") to examine if legislators in states that went to Romney in the 2012 presidential election were less likely to respond do a request from an undocumented immigrant. We found no such relationship in the data.

This apparent misinterpretation appears in the abstract (emphasis added):

We found that legislators respond less to undocumented constituents regardless of their ethnicity and are more responsive to both the Latinx and Eastern European-origin citizen treatments, *with Republicans being more biased in their responsiveness to undocumented residents*.

I'm interpreting that emphasized part to mean that the Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents was larger than the non-Republican legislator gap in responsiveness to undocumented constituents compared to citizen constituents. And I don't think that's correct based on the data for Garcia and Sadhwani 2022.

My analysis used an OLS regression to predict whether a legislator responded, with a single predictor, "undocCITIZ", coded 1 for undocumented senders and 0 for citizen senders, estimated separately among Republican and non-Republican legislators. The coefficients were -0.07 among Republican legislators and -0.11 among non-Republican legislators, so the undocumented/citizen gap was not larger among Republican legislators than among non-Republican legislators. Response percentages are in the table below:

SENDER         GOP NON-GOP 
Citizen EEurop 21  23
Citizen Latina 26  29
Control EEurop 25  33
Control Latina 18  20
Undocum EEurop 18  12
Undocum Latina 15  17
OVERALL        20  22
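For what it's worth, below is a minimal R sketch of that check, with simulated data and hypothetical variable names (the replication data presumably differ): regress the response indicator on the undocumented-versus-citizen indicator separately by party, and note that the interaction term in a pooled OLS model equals the difference between the two gaps, which is the quantity the "more biased" claim is about.

```r
# Simulated stand-in data; variable names are hypothetical.
set.seed(2)
d <- data.frame(
  responded  = rbinom(2000, 1, 0.2),
  undocCITIZ = rbinom(2000, 1, 0.5),   # 1 = undocumented sender, 0 = citizen sender
  republican = rbinom(2000, 1, 0.5)
)

# Undocumented/citizen gap within each party group.
coef(lm(responded ~ undocCITIZ, data = subset(d, republican == 1)))["undocCITIZ"]
coef(lm(responded ~ undocCITIZ, data = subset(d, republican == 0)))["undocCITIZ"]

# Pooled model: the undocCITIZ:republican interaction is the difference between the two gaps.
summary(lm(responded ~ undocCITIZ * republican, data = d))$coefficients
```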

---

NOTE

1. No response yet to my Nov 17 tweet to a co-author of Garcia and Sadhwani 2022.


Political Research Quarterly published Huber and Gunderson 2022 "Putting a fresh face forward: Does the gender of a police chief affect public perceptions?". Huber and Gunderson 2022 reports on a survey experiment in which, for one of the manipulations, a police chief was described as female (Christine Carlson or Jada Washington) or male (Ethan Carlson or Kareem Washington).

---

Huber and Gunderson 2022 has a section called "Heterogeneous Responses to Treatment" that reports results from analyses that divided the sample into "high sexism" respondents and "low sexism" respondents. For example, the mean overall support for the female police chief was 3.49 among "low sexism" respondents and 3.41 among "high sexism" respondents, with p=0.05 for the difference. Huber and Gunderson 2022 (p. 8) claims that [sic on the absence of a "to"]:

These results indicate that respondents' sexism significantly moderates their support for a female police chief and supports role congruity theory, as individuals that are more sexist should react more negatively [sic] violations of gender roles.

But, for all we know from the results reported in Huber and Gunderson 2022, "high sexism" respondents might merely rate police chiefs lower relative to how "low sexism" respondents rate police chiefs, regardless of the gender of the police chief.

Instead of the method in Huber and Gunderson 2022, a better method to test whether "individuals that are more sexist...react more negatively [to] violations of gender roles" is to estimate the effect of the male/female treatment on ratings of the police chief among "high sexism" respondents. And, to test whether "respondents' sexism significantly moderates their support for a female police chief", we can compare the results of that test to the results from a corresponding test among "low sexism" respondents.
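Below is a minimal R sketch of that approach, with simulated data and hypothetical variable names: estimate the female/male treatment effect within each sexism group, and then test the moderation directly with a treatment-by-sexism interaction.

```r
# Simulated stand-in data; variable names are hypothetical.
set.seed(3)
d <- data.frame(
  support      = round(runif(1500, 1, 5), 2),   # support for the police chief
  female_chief = rbinom(1500, 1, 0.5),          # 1 = female chief vignette
  high_sexism  = rbinom(1500, 1, 0.4)           # 1 = sexism score above the threshold
)

# Treatment effect among "high sexism" respondents (the test of role congruity).
summary(lm(support ~ female_chief, data = subset(d, high_sexism == 1)))$coefficients

# Treatment effect among "low sexism" respondents, for comparison.
summary(lm(support ~ female_chief, data = subset(d, high_sexism == 0)))$coefficients

# Moderation: the interaction term is the difference between the two treatment effects.
summary(lm(support ~ female_chief * high_sexism, data = d))$coefficients
```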

---

Using the data and code for Huber and Gunderson 2022, I ran the code up to the section for Table 4, which is the table about sexism. I then ran my modified version of the Huber and Gunderson 2022 code for Table 4, first among respondents Huber and Gunderson 2022 labeled "high sexism" (a score above 0.35 on the sexism measure) and then among respondents labeled "low sexism" (a score below 0.35).

Results are below, indicating a lack of p<0.05 evidence for a male/female treatment effect among these "high sexism" respondents, along with a p<0.05 pro-female bias among the "low sexism" respondents on all but one of the Table 4 items.

HIGH SEXISM RESPONDENTS------------------
                     Female Male
                     Chief  Chief
Domestic Violence    3.23   3.16  p=0.16
Sexual Assault       3.20   3.16  p=0.45
Violent Crime Rate   3.20   3.23  p=0.45
Corruption           3.21   3.18  p=0.40
Police Brutality     3.17   3.17  p=0.94
Community Leaders    3.33   3.31  p=0.49
Police Chief Support 3.41   3.39  p=0.52

LOW SEXISM RESPONDENTS------------------
                     Female Male
                     Chief  Chief
Domestic Violence    3.40   3.21  p<0.01
Sexual Assault       3.44   3.22  p<0.01
Violent Crime Rate   3.40   3.33  p=0.10
Corruption           3.21   3.07  p=0.01
Police Brutality     3.24   3.11  p=0.01
Community Leaders    3.40   3.32  p=0.02
Police Chief Support 3.49   3.37  p<0.01

---

There is surely more of interest here, such as calculating p-values for the difference between the treatment effect among "low sexism" respondents and the treatment effect among "high sexism" respondents, and assessing whether there is stronger evidence of a treatment effect among respondents further up the sexism scale than the 0.35 threshold used in Huber and Gunderson 2022.

But I at least wanted to document another example of a pro-female bias among "low sexism" respondents.


1.

In May, I published a blog post about deviations from the pre-analysis plan for the Stephens-Dougan 2022 APSR letter, and I tweeted a link to the blog post that tagged @LaFleurPhD and asked her directly about the deviations from the pre-analysis plan. I don't recall receiving a response from Stephens-Dougan, and, a few days later, on May 31, I emailed the APSR about my post, listing three concerns:

* The Stephens-Dougan 2022 description of racially prejudiced Whites not matching how the code for Stephens-Dougan 2022 calculated estimates for racially prejudiced Whites.

* The substantial deviations from the pre-analysis plan.

* Figure 1 of the APSR letter reporting weighted estimates, but the evidence being much weaker in unweighted analyses.

Six months later (December 5), the APSR published a correction to Stephens-Dougan 2022. The correction addresses each of my three concerns, though not perfectly, as I'll discuss below, along with some other comments about Stephens-Dougan 2022 and its correction. I'll refer to the original APSR letter as "Stephens-Dougan 2022" and the correction as "the correction".

---

2.

The pre-analysis plan associated with Stephens-Dougan 2022 listed four outcomes at the top of its page 4, but only one of these outcomes (referred to as "Individual rights and freedom threatened") was reported on in Stephens-Dougan 2022. However, Table 1 of Stephens-Dougan 2022 reported results for three outcomes that were not mentioned in the pre-analysis plan.

The t-statistics for the key interaction term for the three outcomes included in Table 1 of Stephens-Dougan 2022 but not mentioned in the pre-analysis plan were 2.6, 2.0, and 2.1, all of which indicate sufficient evidence. The t-statistics for the key interaction term for the three outcomes mentioned in the pre-analysis plan but omitted from Stephens-Dougan 2022 were 0.6, 0.6, and 0.6, none of which indicate sufficient evidence.

I calculated the t-statistics of 2.6, 2.0, and 2.1 from Table 1 of Stephens-Dougan 2022, by dividing a coefficient by its standard error. I wasn't able to use the correction to calculate the t-statistics of 0.6, 0.6, and 0.6, because the relevant data for these three omitted pre-analysis plan outcomes are not in the correction but instead are in Table A12 of a "replication-final.pdf" file hosted at the Dataverse.

That's part of what I meant about an imperfect correction: a reader cannot use information published in the APSR itself to calculate the evidence provided by the outcomes that were planned to be reported on in the pre-analysis plan, or, for that matter, to see how there is substantially less evidence in the unweighted analysis. Instead, a reader needs to go to the Dataverse and dig through table after table of results.

The correction refers to deviations from the pre-analysis plan, but doesn't indicate the particular deviations and doesn't indicate what happens when these deviations are not made.  The "Supplementary Materials Correction-Final.docx" file at the Dataverse for Stephens-Dougan 2022 has a discussion of deviations from the pre-analysis plan, but, as far as I can tell, the discussion does not provide a reason why the results should not be reported for the three omitted outcomes, which were labeled in Table A12 as "Slow the Spread", "Stay Home", and "Too Long to Loosen Restrictions".

It seems to me to be a bad policy to permit researchers to deviate from a pre-analysis plan without justification and to merely report results from a planned analysis on, say, page 46 of a 68-page file on the Dataverse. But a bigger problem might be that, as far as I can tell, many journals don't even attempt to prevent misleading selective reporting for survey research for which there is no pre-analysis plan. Journals could require researchers reporting on surveys to submit or link to the full questionnaire for the surveys or at least to declare that the main text reports on results for all plausible measured outcomes and moderators.

---

3.

Next, let me discuss a method used in Stephens-Dougan 2022 and the correction, which I think is a bad method.

The code for Stephens-Dougan 2022 used measures of stereotypes about Whites and Blacks on the traits of hard working and intelligent, to create a variable called "negstereotype_endorsement". The code divided respondents into three categories, coded 0 for respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for respondents who endorsed both negative stereotypes about Blacks relative to Whites. For both Stephens-Dougan 2022 and the correction, Figure 3 reported for each reported outcome an estimate of how much the average treatment effect among prejudiced Whites (defined as those coded 1) differed from the average treatment effect among unprejudiced Whites (defined as those coded 0).

The most straightforward way to estimate this difference in treatment effects is to [1] calculate the treatment effect for prejudiced Whites coded 1, [2] calculate the treatment effect for unprejudiced Whites coded 0, and [3] calculate the difference between these treatment effects. The code for Stephens-Dougan 2022 instead estimated this difference using a logit regression that had three predictors: the treatment, the 0/0.5/1 measure of prejudice, and an interaction of the prior two predictors. But, by this method, the estimated difference in treatment effect between the 1 respondents and the 0 respondents depends on the 0.5 respondents. I can't think of a valid reason why responses from the 0.5 respondents should influence an estimated difference between the 0 respondents and the 1 respondents.

See my Stata output file for more on that. The influence of the 0.5 respondents might not be major in most or all cases, but an APSR reader won't know, based on Stephens-Dougan 2022 or its correction, the extent to which the 0.5 respondents influenced the estimates for the comparison of the 0 respondents to the 1 respondents.
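Here is a minimal R sketch of the concern, with simulated data and hypothetical variable names: the straightforward comparison uses only the 0 and 1 respondents, while the single-model approach keeps the 0/0.5/1 moderator as a linear term, so the 0.5 respondents help determine the interaction coefficient.

```r
# Simulated stand-in data; variable names are hypothetical.
set.seed(4)
n <- 1500
d <- data.frame(
  outcome   = rbinom(n, 1, 0.5),
  treatment = rbinom(n, 1, 0.5),
  prejudice = sample(c(0, 0.5, 1), n, replace = TRUE, prob = c(.74, .16, .10))
)

# Straightforward comparison: drop the 0.5 respondents and interact treatment
# with a 0/1 moderator.
d01 <- subset(d, prejudice %in% c(0, 1))
fit_drop <- glm(outcome ~ treatment * prejudice, family = binomial, data = d01)

# Approach in the article's code: keep all respondents and treat 0/0.5/1 as a
# linear moderator, so the 0.5 respondents influence the interaction estimate.
fit_all <- glm(outcome ~ treatment * prejudice, family = binomial, data = d)

summary(fit_drop)$coefficients["treatment:prejudice", ]
summary(fit_all)$coefficients["treatment:prejudice", ]
```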

Now about those 0.5 respondents…

---

4.

Remember that the Stephens-Dougan 2022 "negative stereotype endorsement" variable has three levels: 0 for the 74% of respondents who did not endorse a negative stereotype about Blacks relative to Whites, 0.5 for the 16% of respondents who endorsed exactly one of the two negative stereotypes about Blacks relative to Whites, and 1 for the 10% of respondents who endorsed both negative stereotypes about Blacks relative to Whites.

The correction indicates that "I discovered an error in the description of the variable, negative stereotype endorsement" and that "there was no error in the code used to create the variable". So was the intent for Stephens-Dougan 2022 to measure racial prejudice so that only the 1 respondents are considered prejudiced? Or was the intent to consider the 0.5 respondents and the 1 respondents to be prejudiced?

The pre-analysis plan seems to indicate a different method for measuring the moderator of negative stereotype endorsement:

The difference between the rating of Blacks and Whites is taken on both dimensions (intelligence and hard work) and then averaged.

But the pre-analysis plan also indicates that:

For racial predispositions, we will use two or three bins, depending on their distributions.

So, even ignoring the plan to average the stereotype ratings, the pre-analysis plan is inconclusive about whether the intent was to use two or three bins. Let's try this passage from Stephens-Dougan 2022:

A nontrivial fraction of the nationally representative sample—26%—endorsed either the stereotype that African Americans are less hardworking than whites or that African Americans are less intelligent than whites.

So that puts the 16% of respondents at the 0.5 level of negative stereotype endorsement into the same bin as the 10% at the 1 level of negative stereotype endorsement. Stephens-Dougan 2022 doesn't report the percentage that endorsed both negative stereotypes about Blacks. Reporting the percentage of 26% is what would be expected if the intent was to place into one bin any respondent who endorsed at least one of the negative stereotypes about Blacks, so I'm a bit skeptical of the claim in the correction that the description is in error and the code was correct. Maybe I'm missing something, but I don't see how someone who intends to have three bins reports the 26% and does not report the 10%.

For another thing, Stephens-Dougan 2022 has only three figures: Figure 1 reports results for racially prejudiced Whites, Figure 2 reports results for non-racially prejudiced Whites, and Figure 3 reports on the difference between racially prejudiced Whites and non-racially prejudiced Whites. Did Stephens-Dougan 2022 intend to not report results for the group of respondents who endorsed exactly one of the negative stereotypes about Blacks? Did Stephens-Dougan 2022 intend to suggest that respondents who rate Blacks as lazier in general than Whites aren't racially prejudiced as long as they rate Blacks equal to or higher than Whites in general on intelligence?

---

5.

Stephens-Dougan 2022 and the correction depict 84% confidence intervals in all figures. Stephens-Dougan 2022 indicated (footnote omitted) that:

For ease of interpretation, I plotted the predicted probability of agreeing with each pandemic measure in Figure 1, with 84% confidence intervals, the graphical equivalent to p < 0.05.

The 84% confidence interval is good for assessing a p=0.05 difference between estimates, but not for assessing at p=0.05 whether an estimate differs from a particular number such as zero. So 84% confidence intervals make sense for Figures 1 and 2, in which the key comparisons are of the control estimate to the treatment estimate. But 84% confidence intervals don't make as much sense for Figure 3, which plots only one estimate per outcome and for which the key assessment is whether the estimate differs from zero (Figure 3 in Stephens-Dougan 2022) or from 1 (the correction).
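For reference, below is a short R sketch of the arithmetic behind the 84% convention, assuming two independent estimates with roughly equal standard errors: two 84% intervals stop overlapping at about the same gap at which the difference between the estimates reaches p=0.05, but a single 84% interval is too narrow for testing an estimate against a fixed value at p=0.05.

```r
# Two independent estimates with equal standard error se.
se  <- 1
z84 <- qnorm(1 - 0.16 / 2)   # about 1.40, the half-width multiplier for an 84% CI
z95 <- qnorm(1 - 0.05 / 2)   # about 1.96

# Gap at which two 84% CIs just stop overlapping, versus the gap needed for
# p = 0.05 on the difference (the SE of the difference is se * sqrt(2)).
2 * z84 * se         # about 2.81
z95 * sqrt(2) * se   # about 2.77, nearly the same, hence the 84% convention

# But for testing a single estimate against a fixed value at p = 0.05,
# the 84% half-width falls short of the 1.96 standard errors needed.
z84 * se             # about 1.40
```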

---

6.

I didn’t immediately realize why, in Figure 3 in Stephens-Dougan 2022, two of the four estimates cross zero, but in Figure 3 in the correction, none of the four estimates cross zero. Then I realized that the estimates plotted in Figure 3 of the correction (but not Figure 3 in Stephens-Dougan 2022) are odds ratios.

The y-axis for odds ratios for Figure 3 of the correction ranges from 0 to 30-something, using a linear scale. The odds ratio that indicates no effect is 1, and an odds ratio can't be negative, so that is why none of the four estimates cross zero in the corrected Figure 3.

It seems like a good idea for a plot of odds ratios to have a guideline at 1, so that readers can assess whether the no-effect odds ratio of 1 is a plausible value. And a log scale seems like a good idea for odds ratios, too. A relevant prior post mentions that Fenton and Stephens-Dougan 2021 described a "very small" 0.01 odds ratio as "not substantively meaningful".
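Below is a minimal ggplot2 sketch of that suggestion, with made-up odds ratios and interval endpoints used purely for illustration: a dashed guideline at 1 and a log-scaled axis.

```r
library(ggplot2)

# Made-up odds ratios and interval endpoints, purely for illustration.
d <- data.frame(
  outcome = c("Outcome A", "Outcome B", "Outcome C", "Outcome D"),
  or      = c(0.5, 1.2, 3.0, 12.0),
  lo      = c(0.2, 0.8, 1.4, 2.5),
  hi      = c(1.3, 1.9, 6.5, 30.0)
)

ggplot(d, aes(x = outcome, y = or, ymin = lo, ymax = hi)) +
  geom_pointrange() +
  geom_hline(yintercept = 1, linetype = "dashed") +   # guideline for "no effect"
  scale_y_log10() +                                   # log scale for odds ratios
  labs(x = NULL, y = "Odds ratio (log scale)")
```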

None of the 84% confidence intervals in Figure 3 include the no-effect odds ratio of 1, but an 84% confidence interval in Figure A3 of "Supplementary Materials Correction-Final.docx" does.

---

7.

Often, when I alert an author or journal to an error in a publication, the subsequent correction doesn't credit me for my work. Sometimes the correction even suggests that the authors themselves caught the error, like the correction to Stephens-Dougan 2022 seems to do:

After reviewing my code, I discovered an error in the description of the variable, negative stereotype endorsement.

I guess it's possible that Stephens-Dougan "discovered" the error. For instance, maybe after she submitted page proofs, for some reason she decided to review her code, and just happened to catch the error that she had missed before, and it's a big coincidence that this was the same error that I blogged about and alerted the APSR to.

And maybe Stephens-Dougan also discovered that her APSR letter misleadingly deviated from the relevant pre-analysis plan, so that I don't deserve credit for alerting the APSR to that.


I've accumulated ten new publications to comment on that I don't think are worth a separate blog post, so here goes:

---

1.

The Twitter account for the journal Politics, Groups, and Identities retweeted R.G. Cravens linking to two of his articles in Politics, Groups, and Identities. I blogged about one of these articles, discussing, among other things, the article's erroneous interpretation of interaction terms. The other article that R.G. Cravens linked to in that tweet ("The view from the top: Social acceptance and ideological conservatism among sexual minorities") also misinterpreted an interaction term:

However, the coefficient estimate for the interaction term between racial minority identity and racial identity group consciousness (β = −.312, p = .000), showing the effect of racial identity group consciousness only among racial minority respondents, indicates a negative relationship between racial minority group consciousness and conservatism at the 99% confidence level.

The corresponding Table 1 coefficient for RI Consciousness is 0.117, indicating the estimated effect of racial identity consciousness when the "Racial minority" variable is set to zero. The -0.312 interaction term indicates how much the estimated effect of racial identity consciousness *differs* between non-racial minorities and racial minorities, so that the estimated effect of racial identity consciousness among racial minorities is 0.117 plus -0.312, which is -0.195.
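For readers who want to check this sort of thing, below is a minimal R sketch with simulated data and hypothetical variable names: the conditional effect among racial minorities is the sum of the main-effect and interaction coefficients, and its standard error comes from the corresponding variances and covariance.

```r
# Simulated stand-in data; variable names are hypothetical.
set.seed(6)
d <- data.frame(
  conservatism    = rnorm(1000),
  consciousness   = runif(1000),
  racial_minority = rbinom(1000, 1, 0.3)
)

fit <- lm(conservatism ~ consciousness * racial_minority, data = d)
b <- coef(fit)
V <- vcov(fit)

# Effect of consciousness among racial minorities = main effect + interaction.
est <- unname(b["consciousness"] + b["consciousness:racial_minority"])

# Standard error of that sum: sqrt(var1 + var2 + 2 * cov).
se <- sqrt(V["consciousness", "consciousness"] +
           V["consciousness:racial_minority", "consciousness:racial_minority"] +
           2 * V["consciousness", "consciousness:racial_minority"])

c(estimate = est, se = se, z = est / se)
```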

Two articles by one author in the same journal within three years, and each article misinterpreted an interaction term.

---

2.

PS: Political Science & Politics published another article about student evaluations of teaching: Foster 2022 "Instructor name preference and student evaluations of instruction". The key finding seems plausible, that "SEIs were higher for instructors who preferred going by their first name...than for instructors who preferred going by 'Dr. Moore'" (p. 4).

But there are a few shortcomings in the reporting on the experiment in Study 2, which manipulated the race of the instructor, the gender of the instructor, and the instructor's stated preference for being addressed by first name versus by title and last name:

* Hypothesis 5 is about conservative Republicans:

Moderated mediation: We predict that female instructors who express a preference for going by "Dr. Moore" will have lower teacher ratings through decreased perceived immediacy, but only for students who identify as conservative and Republican.

But, as far as I can tell, the article doesn't report any data about Hypothesis 5.

* Table 2 indicates a positive p<0.05 correlation between the race of the instructor and SEIs (student evaluations of instruction) and a positive p<0.05 correlation between the race of the instructor and course evaluations. But, as far as I can tell, the article doesn't report how the race variable was coded, so it's not clear whether the White instructors or the Black instructors had the higher SEIs and course evaluations.

* The abstract indicates that:

Study 2 found the highest SEIs for Black male instructors when instructors asked students to call them by their first name, but there was a decrease in SEI scores if they went by their professional title.

But, as far as I can tell, the article doesn't report a test of whether the estimated effect of the name preference among the Black male instructor targets differed from the estimated effect of the name preference among any of the comparison instructors. A p-value under 0.05 for the Black male instructor targets alongside p-values above 0.05 for the other instructor targets isn't enough evidence to infer at p<0.05 that participants treated the Black male instructor targets differently than they treated the comparison instructor targets, so the article doesn't report sufficient evidence to permit an inference of racial discrimination.

---

---

5.

I wasn't the only person to notice this next one (see tweets from Tom Pepinsky and Brendan Nyhan), but Politics & Gender recently published Forman-Rabinovici and Mandel 2022 "The prevalence and implications of gender blindness in quantitative political science research", which indicated that:

Our findings show that gender-sensitive analysis yields more accurate and useful results. In two out of the three articles we tested, gender-sensitive analysis indeed led to different outcomes that changed the ramifications for theory building as a result.

But the inferential technique in the analysis reflected a common error.

For the first of the three aforementioned articles (Gandhi and Ong 2019), Table 1a of Forman-Rabinovici and Mandel 2022 reported results with a key coefficient that was -.308 across the sample, -.294 (p=.003) among men in the sample, and -.334 (p=.154) among women in the sample. These estimates are from a linear probability model predicting a dichotomous "Support PH" outcome, so the point estimates were decreases of about 29 percentage points among men and 33 percentage points among women.

The estimate was more extreme among women than among men, but the estimate was less precise among women than among men, at least partly because the sample size among men (N=1902) was about three times the sample size among women (N=652).

Figure 1 of Forman-Rabinovici and Mandel 2022 described these results as:

Male voters leave PH coalition

Female voters continue to vote for PH coalition

But, in my analysis of the data, the ends of the 95% confidence interval for the estimate among women indicated an 82 percentage point decrease and a 15 percentage point increase [-0.82, +0.15], so that's not nearly enough evidence to infer a lack of an effect among women.

---

6.

Politics & Gender published another article that has at least a misleading interpretation of interaction terms: Kreutzer 2022 "Women's support shaken: A study of women's political trust after natural disasters".

Table 1 reports results for three multilevel mixed-effects linear regressions, with coefficients on a "Number of Disasters Present" predictor of 0.017, 0.009, and 0.022. The models have a predictor for "Female" and an interaction of "Female" and "Number of Disasters Present" with interaction coefficients of –0.001, –0.002, and –0.001. So the combination of coefficients indicates that the associations of "Number of Disasters Present" and the "trust" outcomes are positive among women, but not as positive as the associations are among men.

Kreutzer 2022 discusses this correctly in some places, such as indicating that the interaction term "allows a comparison of how disasters influence women's political trust compared with men's trust" (p. 15). But in other places the interpretation is, I think, incorrect or at least misleading, such as in the abstract (emphasis added):

I investigate women's trust in government institutions when natural disasters have recently occurred and argue that because of their unique experiences and typical government responses, women's political trust will decline when there is a natural disaster more than men's. I find that when there is a high number of disasters and when a larger percentage of the population is affected by disasters, women's political trust decreases significantly, especially institutional trust.

Or from page 23:

I have demonstrated that natural disasters create unique and vulnerable situations for women that cause their trust in government to decline.

And discussing Figure 5, referring to a different set of three regressions (reference to footnote 12 omitted):

The figure shows a small decline in women's trust (overall, institutional, organizational) as the percentage of the population affected by disasters in the country increases. The effect is significantly different from 0, but the percentage affected seems not to make a difference.

That seems to say that the percentage of the population affected has an effect that is simultaneously not zero and does not seem to make a difference. I think the Figure 5 marginal effects plots indicate that women have lower trust than men (which is why each point estimate line falls in the negative range), but that this gender difference in trust does not vary much with the percentage of the population affected (which is why each point estimate line is pretty much flat).

---

The "Women's Political Empowerment Index" coefficient and standard error are –0.017 and 0.108 in Model 4, so maybe the ** indicating a two-tailed p<0.01 is an error.

Tweet to the author (Oct 3). No reply yet.

---

7, 8.

Let's return to Politics, Groups, and Identities, for Ditonto 2019 "Direct and indirect effects of prejudice: sexism, information, and voting behavior in political campaigns". From the abstract:

I also find that subjects high in sexism search for less information about women candidates...

At least in the reported analyses, the comparison for "less" is to participants low in sexism instead of to male candidates. So we get this result discussing Table 2 (pp. 598-599):

Those who scored lowest in sexism are predicted to look at approximately 13 unique information boxes for the female candidate, while those who scored highest are predicted to access about 10 items, or almost 1/3 less.

It should be obvious to peer reviewers and any editors that a comparison to the male candidates in the experiment would be a more useful comparison for assessing the effect of sexism, because, for all we know, respondents high in sexism might search for less information than respondents low in sexism do, no matter the gender of the candidate.

Ditonto has another 2019 article in a different journal (Political Psychology) based on the same experiment: "The mediating role of information search in the relationship between prejudice and voting behavior". From that abstract:

I also find that subjects high in prejudice search for less information about minority candidates...

But, again, Table 2 in that article merely indicates that symbolic racism negatively associates with information search for a minority candidate, with no information provided about information search for a non-minority candidate.

---

And I think that the Ditonto 2019 abstracts include claims that aren't supported by results reported in the articles. The PGI abstract claims that "I find that subjects with higher scores on items measuring modern sexism...rate female candidates more negatively than their male counterparts", and the PP abstract claims that "I find that subjects higher in symbolic racism...rate minority candidates more negatively than their white counterparts".

By the way, claims about respondents high in sexism or racism should be assessed using data only from respondents high in sexism or racism, because the association of a sexism or racism measure with an outcome might be completely due to respondents low in sexism or racism.

Tweet to the author (Oct 9). No reply yet.

---

9.

Below is a passage from "Lower test scores from wildfire smoke exposure", by Jeff Wen and Marshall Burke, published in 2022 in Nature Sustainability:

When we consider the cumulative losses over all study years and across subgroups (Fig. 4b), we estimate the net present value of lost future income to be roughly $544 million (95% CI: −$999 million to −$100 million) from smoke PM2.5 exposure in 2016 for districts with low economic disadvantage and low proportion of non-White students. For districts with high economic disadvantage and high proportion of non-White students, we estimate cumulative impacts to be $1.4 billion (95% CI: −$2.3 billion to −$477 million) from cumulative smoke PM2.5 exposure in 2016. Thus, of the roughly $1.7 billion in total costs during the smokiest year in our sample, 82% of the costs we estimate were borne by economically disadvantaged communities of colour.

So, in 2016, the lost future income was about $0.5 billion for low economic disadvantage / low non-White districts and $1.4 billion for high economic disadvantage / high non-White districts; that gets us to $1.9 billion, without even including the costs from low/high districts and high/low districts. But total costs were cited as roughly $1.7 billion.

From what I can tell from Figure 4b, the percentage of total costs attributed to economically disadvantaged communities of color (the high / high category) is 59%. It's not a large inferential difference from 82%, in that both estimates are a majority, but it's another example of an error that could have been caught by careful reading.
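The arithmetic check, using only the numbers in the quoted passage, is sketched below in R; the final division is included only because it happens to reproduce the 82% figure, which may be how that percentage was computed.

```r
# Check of the quoted dollar figures (values taken from the passage above, in $ billions).
low_low   <- 0.544   # low disadvantage / low non-White districts
high_high <- 1.4     # high disadvantage / high non-White districts

low_low + high_high   # 1.944, already above the cited $1.7 billion total,
                      # before adding the low/high and high/low categories
high_high / 1.7       # about 0.82, which happens to reproduce the 82% in the passage
```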

Tweet to the authors about this (Oct 17). No reply yet.

---

10.

Political Research Quarterly published "Opening the Attitudinal Black Box: Three Dimensions of Latin American Elites' Attitudes about Gender Equality", by Amy Alexander, Asbel Bohigues, and Jennifer M. Piscopo.

I was curious about the study's measurement of attitudes about gender equality, and, not unexpectedly, the measurement was not good, using items such as "In general, men make better political leaders than women", in which respondents can agree that men make better political leaders, can disagree that men make better political leaders, and can be neutral about the claim that men make better political leaders...but respondents cannot report the belief that, in general, women make better political leaders than men do.

I checked the data, in case almost no respondent disagreed with the statement that "In general, men make better political leaders than women", in which case presumably no respondent would think that women make better political leaders than men do. But disagreement with the statement was pretty high, with 69% strongly disagreeing, another 15% disagreeing, and another 11% selecting neither agree nor disagree.

I tweeted a question about this to some of the authors (Oct 21). No reply yet.


I'll hopefully at some point write a summary that refers to a lot of my "comments" posts. But I have at least a few more to release before then, so here goes...

---

Politics, Groups, and Identities recently published Peay and McNair II 2022 "Concurrent pressures of mass protests: The dual influences of #BlackLivesMatter on state-level policing reform adoption". Peay and McNair II 2022 reported regressions that predicted a count of the number of police reform policies enacted by a state from August 2014 through 2020, using a key predictor of the number of Black Lives Matter protests in a state in the year after the killing of Michael Brown in August 2014.

An obvious concern is that the number of protests in a state is capturing the population size of the state. That's a concern because it's plausible that higher-population states have legislatures that are more active than those of lower-population states, so that we would expect these high-population states to tend to enact more policies in general, and not merely to enact more police reform policies. But the Peay and McNair II 2022 analysis does not control for the population size of the state.

I checked the correlation between [1] the number of Black Lives Matter protests in a state in the year after the killing of Michael Brown in August 2014 (data from Trump et al. 2018) and [2] the first list of the number of bills enacted by a state that I happened upon, which was the number of bills a state enacted from 2006 to 2009 relating to childhood obesity. The R-squared was 0.22 for a bivariate OLS regression using the state-level count of BLM protests to predict the state-level count of childhood obesity bills enacted. In comparison, Peay and McNair II 2022 Table 2 indicated that the R-squared was 0.19 in a bivariate OLS regression that used the state-level count of BLM protests to predict the state-level count of police reform policies enacted. So the concern about population size seems at least plausible.
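Below is a minimal R sketch of that kind of placebo check, with simulated state-level counts standing in for the real protest and bill data: regress an unrelated bill count on the BLM protest count and compare R-squared values.

```r
# Simulated state-level stand-in counts; variable names are hypothetical.
set.seed(7)
states <- data.frame(
  blm_protests   = rpois(50, 10),   # protests in the year after August 2014
  policing_bills = rpois(50, 5),    # police reform policies enacted
  obesity_bills  = rpois(50, 5)     # unrelated placebo outcome
)

# R-squared from the bivariate specification in the article.
summary(lm(policing_bills ~ blm_protests, data = states))$r.squared

# R-squared for a placebo outcome that BLM protests presumably did not cause.
summary(lm(obesity_bills ~ blm_protests, data = states))$r.squared
```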

---

This is a separate concern, but Figure 6 of Peay and McNair II 2022 reports predicted probabilities, with an x-axis of the number of protests. My analysis indicated that the number of protests ranged from 0 to 87, with only three states having more than 40 protests: New York at 67, Missouri at 74, and California at 87. Yet the widest the 95% confidence interval gets in Figure 6 is about 1 percentage point, at 87, which is a pretty precise estimate given data for only 50 states and only one state past 74.

Maybe the tight 95% confidence interval is a function of the network analysis for Figure 6, if the analysis, say, treats each potential connection between California and the other 49 states as 49 independent observations. Table 2 of Peay and McNair II 2022 doesn't have a sample size for this analysis, but reports 50 as the number of observations for the other analyses in that table.

---

NOTES

1. Data for my analysis.

2. No reply yet from the authors on Twitter.


Homicide Studies recently published Schildkraut and Turanovic 2022 "A New Wave of Mass Shootings? Exploring the Potential Impact of COVID-19". From the abstract:

Results show that total, private, and public mass shootings increased following the declaration of COVID-19 as a national emergency in March of 2020.

I was curious how Schildkraut and Turanovic 2022 addressed the possible confound of the 25 May 2020 killing of George Floyd.

---

Below is my plot of data used in Schildkraut and Turanovic 2022, for total mass shootings:

My read of the plot is that, until after the killing of George Floyd, there is insufficient evidence that mass shootings were higher in 2020 than in 2019.

Table 1 of Schildkraut and Turanovic 2022 reports an interrupted time series analysis that does not address the killing of George Floyd, with a key estimate of 0.409 and a standard error of 0.072. Schildkraut and Turanovic 2022 reports a separate analysis about George Floyd...

However, since George Floyd's murder occurred after the onset of the COVID-19 declaration, we conducted ITSA using only the post-COVID time period (n = 53 weeks) and used the week of May 25, 2020 as the point of interruption in each time series. These results indicated that George Floyd's murder had no impact on changes in overall mass shootings (b = 0.354, 95% CI [−0.074, 0.781], p = .105) or private mass shootings (b = 0.125, 95% CI [−0.419, 0.669], p = .652), but that Floyd's murder was linked to increases in public mass shootings (b = 0.772, 95% CI [0.062, 1.483], p = .033).

...but Schildkraut and Turanovic 2022 does not report any attempt to assess whether there is sufficient evidence to attribute the increase in mass shootings to covid once the 0.354 estimate for Floyd is addressed. The lack of statistical significance for the 0.354 Floyd estimate can't be used to conclude "no impact", especially given that the analysis for the covid declaration had data for 52 weeks pre-declaration and 53 weeks post-declaration, but the analysis for Floyd had data for only 11 weeks pre-Floyd and 42 weeks post-Floyd.
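One way to address both events in a single model is sketched below in R, with simulated weekly counts and hypothetical variable names (this is not the article's code): a segmented regression with level-shift and slope-change terms for both the covid declaration and the week of the Floyd killing, so that the covid estimates are net of the Floyd break.

```r
# Simulated weekly mass-shooting counts; the data and variable names are hypothetical.
set.seed(8)
d <- data.frame(
  week  = 1:105,          # 52 weeks before the covid declaration + 53 weeks after
  count = rpois(105, 8)
)
covid_week <- 53          # week of the March 2020 declaration
floyd_week <- 64          # 11 weeks later, matching the article's post-period split

d$post_covid       <- as.integer(d$week >= covid_week)
d$post_floyd       <- as.integer(d$week >= floyd_week)
d$time_since_covid <- pmax(0, d$week - covid_week)
d$time_since_floyd <- pmax(0, d$week - floyd_week)

# Segmented regression with level and slope changes at both interruptions,
# so the covid terms are estimated net of the Floyd break.
fit <- lm(count ~ week + post_covid + time_since_covid + post_floyd + time_since_floyd,
          data = d)
summary(fit)$coefficients
```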

Schildkraut and Turanovic 2022 also disaggregated mass shootings into public mass shootings and private mass shootings. Corresponding plots by me are below. It doesn't look like the red line for the covid declaration is the break point for the increase in 2020 relative to 2019.

Astral Codex Ten discussed methods used to try to disentangle the effect of covid from the effect of Floyd, such as using prior protests and other countries as points of reference.

---

NOTES

1. In the Schildkraut and Turanovic 2022 data, some dates appeared in different weeks, such as 2019 Week 11 running from March 11 to March 17, but 2020 Week 11 running from March 9 to March 15.

2. The 13 March 2020 covid declaration occurred in the middle of Week 11, but the Floyd killing occurred at the start of Week 22, which ran from 25 May 2020 to 31 May 2020.

3. Data. R code for the "total" plot above.


PS: Political Science & Politics recently published Hartnett and Haver 2022 "Unconditional support for Trump's resistance prior to Election Day".

Hartnett and Haver 2022 reported on an experiment conducted in October 2020 in which likely Trump voters were asked to consider the hypothetical of a Biden win in both the Electoral College and the popular vote, with the Biden popular vote margin randomly assigned to be from 1 percentage point through 15 percentage points. These likely Trump voters were then asked whether the Trump campaign should resist or concede.

Data were collected before the election, but Hartnett and Haver 2022 did not report anything about a corresponding experiment involving likely Biden voters. Hartnett and Haver 2022 discussed a Reuters/Ipsos poll that "found that 41% of likely Trump voters would not accept a Biden victory and 16% of all likely Trump voters 'would engage in street protests or even violence' (Kahn 2020)". The Kahn 2020 source indicates that the corresponding percentages for Biden voters for a Trump victory were 43% and 22%, so it didn't seem like there was a good reason to not include a parallel experiment for Biden voters, especially because data on only Trump voters wouldn't permit valid inferences about the characteristics on which Trump voters were distinctive.

---

But text for a somewhat corresponding experiment involving likely Biden voters is hidden in the Hartnett and Haver 2022 codebook under white boxes or something like that. The text of the hidden items can be highlighted, copied, and pasted from the bottom of pages 19 and 20 of the codebook PDF (or more hidden text can be copied, using ctrl+A, then ctrl-C, and then pasted with ctrl-V).

The hidden codebook text indicates that the hartnett_haver block of the survey had a "bidenlose" item that asked likely Biden voters whether, if Biden wins the popular vote by the randomized percentage points and Trump wins the electoral college, the Biden campaign should "Resist the results of the election in any way possible" or "Concede defeat".

There might be an innocent explanation for Hartnett and Haver 2022 not reporting the results for those items, but that innocent explanation hasn't been shared with me yet on Twitter. Maybe Hartnett and Haver 2022 have a manuscript in progress about the "bidenlose" item.

---

NOTES

1. Hartnett and Haver 2022 seems to be the survey that Emily Badger at the New York Times referred to as "another recent survey experiment conducted by Brian Schaffner, Alexandra Haver and Brendan Hartnett at Tufts". The copied-and-pasted codebook text indicates that this was for the "2020 Tufts Class Survey".

2. On page 18 of the Hartnett and Haver 2022 codebook, above the hidden item about socialism, part of the text of the "certain advantages" item is missing, which seems to be a should-be-obvious indication that text has been covered.

3. The codebook seems to be missing pages of the full survey: in the copied-and-pasted text, page numbers jump from "Page 21 of 43" to "Page 24 of 43" to "Page 31 of 43" to "Page 33 of 43". Presumably at least some missing items were for other members of the Tufts class, although I'm not sure what happened to page 32, which seems to be part of the hartnett_haver block that started on page 31 and ended on page 33.

4. The dataset for Hartnett and Haver 2022 includes a popular vote percentage point win from 1 percentage point through 15 percentage points assigned to likely Biden voters, but the dataset has no data on a resist-or-concede outcome or on a follow-up open-ended item.
