June 2016

Selective reporting in "Priming Racial Resentment without Stereotypic Cues"

By L.J Zigerell Posted on June 29, 2016 Posted in Race

In a survey experiment reported in LaFleur Stephens-Dougan's 2016 Journal of Politics article, "Priming Racial Resentment without Stereotypic Cues", respondents were shown a campaign mailer for a white candidate named Greg Davis, with experimental manipulations of the candidate's party (Democrat or Republican) and the photos on the candidate's mailers (five photos of whites, five photos of blacks, or a mixture of photos of whites and blacks).

One key finding described in the abstract is that "white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner".

The JOP article describes the analysis in Chapter 5 of Stephens' dissertation, which reported on a July 2011 YouGov/Polimetrix survey. Dissertation page 173 indicated that the survey had 13 experimental conditions, but the JOP article reports only six conditions, omitting the control condition and the six conditions in which candidate Greg Davis was black. Stephens-Dougan might plan to report on these omitted conditions in a subsequent publication, so I'll concentrate on the outcome variables.

---

Many potential outcome variables were not reported on in the JOP article, based on a comparison of the survey questionnaire in Appendix D of the dissertation to the outcome variables mentioned in the main text or the appendix of the article. The box below describes each post-treatment item on the survey questionnaire except for the Q7 manipulation check: items marked with [*] were reported on in the article, and unmarked items were not. See pages 224 to 230 of the dissertation for the exact item wording.

Q8. Feeling thermometer for the candidate.

Q9 [*]. Likelihood of voting for Greg Davis.

Q10. How well a series of terms describe Greg Davis:

Intelligent

Inexperienced

Trustworthy

Hardworking

Fair [*]

Competent

Q11. Perception of Greg Davis' political ideology.

Q12. Whether Democrats, Republicans, or neither party would be better at:

Helping Senior Citizens

Improving Health Care

Improving the Economy

Reducing Crime

Reforming Public Education

Q13. Like, dislike, or neither for the Democratic Party.

Q14. Like, dislike, or neither for the Republican Party.

Q15. Perception of how well Greg Davis would handle:

Helping Senior Citizens

Improving Health Care

Improving the Economy

Reducing Crime [*]

Reforming Public Education

Q16 [*]. Whether Greg Davis' policies will favor Whites over Blacks, Blacks over Whites, or neither.

Q17. Perception of which groups Greg Davis will help if elected:

Teachers

Latinos

Corporate Executives

Farmers

Senior Citizens

African Americans

Homeowners

Students

Small Business Owners

Whites

Q18 [*]. Perception of Greg Davis' position on affirmative action for Blacks in the workplace.

Q19. Perception of Greg Davis' position on the level of federal spending on Social Security.

Q20. Job approval, disapproval, or neither for Barack Obama as president.

The box above shows that many potential outcome variables were not reported on in the JOP article. For example, Q10 asked respondents how well the terms "intelligent", "inexperienced", "trustworthy", "hardworking", "fair", and "competent" describe the candidate, but readers are told about results only for "fair". Similarly, readers are told results for Q16 about the candidate's perceived preference for policies that help whites over blacks or vice versa, but readers are not told results for "Whites" and "African Americans" on Q17 about the groups that the candidate is expected to help.
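To put a number on "many", the box can be tallied question by question. This is a minimal Python sketch under one assumed counting convention: each sub-item of Q10, Q12, Q15, and Q17 is treated as a separate potential outcome variable, and an item counts as reported only if it carries a [*] in the box. The convention is an illustration, not one stated in the article or the dissertation.

```python
# (number of items, number marked [*] as reported) for each question in the box;
# Q7, the manipulation check, is excluded as in the box.
ITEMS = {
    "Q8": (1, 0),    # feeling thermometer
    "Q9": (1, 1),    # likelihood of voting for Greg Davis [*]
    "Q10": (6, 1),   # six traits; only "Fair" reported
    "Q11": (1, 0),   # perceived ideology
    "Q12": (5, 0),   # which party is better at five policy areas
    "Q13": (1, 0),   # like/dislike Democratic Party
    "Q14": (1, 0),   # like/dislike Republican Party
    "Q15": (5, 1),   # Greg Davis on five policy areas; only "Reducing Crime" reported
    "Q16": (1, 1),   # favor Whites over Blacks or vice versa [*]
    "Q17": (10, 0),  # ten groups Greg Davis would help
    "Q18": (1, 1),   # affirmative action position [*]
    "Q19": (1, 0),   # Social Security spending position
    "Q20": (1, 0),   # Obama job approval
}


def tally(items):
    """Return (total items, reported items) across all questions."""
    total = sum(n for n, _ in items.values())
    reported = sum(r for _, r in items.values())
    return total, reported


total, reported = tally(ITEMS)
print(f"{reported} of {total} items reported; {total - reported} not reported")
# prints: 5 of 35 items reported; 30 not reported
```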

Perhaps the estimates and inferences are identical for the omitted and included items, but prior analyses [e.g., here and here] suggest that omitted items often produce different estimates and sometimes produce different inferences than included items.

Data for most of the omitted potential outcome variables are not in the article's dataset at the JOP Dataverse, but the dataset does contain a "thermgregdavis" variable ranging from 1 to 100, which is presumably the Q8 feeling thermometer. I used the model in line 14 of Stephens-Dougan's code, with two changes: I replaced the reported Q9 outcome variable (likelihood of voting for Greg Davis) with "thermgregdavis", and I changed the estimation technique from ordered logit to linear regression. The p-value for the difference between the all-white photo condition and the mixed photo condition was p=0.693, and the p-value for the difference between the all-white photo condition and the all-black photo condition was p=0.264.

---

This sort of selective reporting is not uncommon in social science [see here, here, and here], but I'm skeptical that researchers with the flexibility to report the results they want based on post-hoc research design choices will produce replicable estimates and unbiased inferences, especially in the politically charged racial discrimination subfield. I am also skeptical that selective reporting across publications will balance out in a field in which a supermajority of researchers fall on one side of the political spectrum.

So how can such selective reporting be prevented? Researchers can preregister their research designs. Journals can preaccept articles based on a preregistered research design. For non-preregistered studies, journals can require as a condition of publication the declaration of omitted studies, experiments, experimental conditions, and outcome variables. Peer reviewers can ask for these declarations, too.

---

It's also worth comparing the hypotheses as expressed in the dissertation to the hypotheses as expressed in the JOP article. First, the hypotheses from dissertation Chapter 5, on page 153:

H1: Democratic candidates are penalized for an association with African Americans.

H2: Republican candidates are rewarded for an association with African Americans.

H3: The racial composition of an advertisement influences voters' perceptions of the candidates' policy preferences.

Now, the JOP hypotheses:

H1. White Democratic candidates associated with blacks will lose vote support and will be perceived as more likely to favor blacks over whites and more likely to support affirmative action relative to white Democratic candidates associated with images of only whites.

H2. Counterstereotypic images of African Americans paired with a white Democratic candidate will prime racial attitudes on candidate evaluations that are implicitly racial relative to a comparable white Democratic candidate associated with all whites.

H3. Counterstereotypic images of African Americans paired with a white Republican candidate will be inconsequential such that they will not be associated with a main effect or a racial priming effect.

So hypotheses became more specific for Democratic candidates and switched from Republicans being rewarded to Republicans not experiencing a consequential effect. My sense is that hypothesis modification is not uncommon in social science, but the reason for the survey items asking about personal characteristics of the candidate (e.g., trustworthy, competent) is clearer in light of the dissertation's hypotheses about candidates being penalized or rewarded for an association with African Americans. After all, the feeling thermometer and the other Q10 characteristic items can be used to assess a penalty or reward for candidates.

---

In terms of the substance of the penalty, the abstract of the JOP article notes: "I empirically demonstrate that white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner."

My analysis of the data indicated that, based on a model with no controls and no cases dropped, and comparing the all-white photo condition to the all-black photo condition, there is evidence of this penalty in the Q9 vote item at p=0.074. However, evidence for this penalty is weak in the feeling thermometer (p=0.248) and in the "fair" item (p=0.483), and I saw no evidence in the article or dissertation that the penalty can be detected in the items omitted from the dataset.

Moreover, much of the estimated penalty might reflect only the race of persons in the photos providing a signal about candidate Greg Davis' ideology. Compared to respondents in the all-white photo condition, respondents in the mixed photo condition and the all-black photo condition rated Greg Davis as more liberal (p-values of 0.014 and 0.004, respectively). When the measure of Greg Davis' perceived ideology is included in the model, the p-value for the penalty in the Q9 vote item rises from p=0.074 to p=0.710, and the corresponding p-values for models predicting a penalty in the thermometer and the "fair" item range from p=0.600 to p=0.964.
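The logic of adding perceived ideology to the model can be illustrated with simulated data. The Python sketch below is hypothetical throughout: it assumes a photo treatment that shifts perceived candidate liberalism and a vote outcome that depends only on perceived ideology, so the photos have no direct effect on vote. The sample size, effect sizes, and seed are made up and are not estimates from the Stephens-Dougan data; the sketch shows what the pattern would look like if that mechanism held, not that it holds in the actual data.

```python
import random


def ols(y, predictors):
    """Ordinary least squares with an intercept, solved via the normal equations.

    predictors is a list of equal-length columns; returns
    [intercept, b_1, ..., b_k] in the order the columns were given.
    """
    X = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(X[0])
    # Augmented normal-equation matrix [X'X | X'y].
    aug = [
        [sum(row[a] * row[b] for row in X) for b in range(k)]
        + [sum(row[a] * yi for row, yi in zip(X, y))]
        for a in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[a][k] / aug[a][a] for a in range(k)]


def simulate(n=4000, seed=2016):
    """Hypothetical data: photos -> perceived ideology -> vote, with no direct path."""
    rng = random.Random(seed)
    treat, ideology, vote = [], [], []
    for _ in range(n):
        t = 1.0 if rng.random() < 0.5 else 0.0  # random assignment to a photo condition
        m = 0.8 * t + rng.gauss(0.0, 1.0)  # perceived liberalism, shifted by the photos
        v = -0.5 * m + rng.gauss(0.0, 1.0)  # vote likelihood depends only on perceived ideology
        treat.append(t)
        ideology.append(m)
        vote.append(v)
    return treat, ideology, vote


treat, ideology, vote = simulate()
unadjusted = ols(vote, [treat])[1]
adjusted = ols(vote, [treat, ideology])[1]
print(f"photo coefficient, no ideology control: {unadjusted:.3f}")
print(f"photo coefficient, controlling for perceived ideology: {adjusted:.3f}")
```

In this simulation the unadjusted photo coefficient lands near the built-in total effect of -0.4 (0.8 times -0.5), and the adjusted coefficient lands near zero, because the only route from photos to vote runs through perceived ideology.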

---

NOTES:

1. H/T to Brendan Nyhan for the pointer to the JOP article.

2. The JOP article emphasizes the counterstereotypical nature of the mailer photos of blacks, but the experiment did not vary the photos of blacks, so the experiment provides no evidence about the influence of the counterstereotypical nature of the photos.

3. The JOP article reports four manipulation checks (footnote 6), but the dissertation reports five manipulation checks (footnote 65, p. 156). The omitted manipulation check concerned whether the candidate tried to appeal to racial feelings. The dataset for the article at the JOP Dataverse has a "manipchk_racialfeelings" variable that is presumably this omitted manipulation check.

4. The abstract reports that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support." However, Table 2 of the article and my analysis indicate that no photo condition comparison produced a statistically significant main effect for the "fair" item, and only the all-white vs. mixed photo comparison produced a statistically significant main effect for perceptions of the likelihood of reducing crime. Even that one main effect reached statistical significance only under the article's generous convention of assigning a statistical significance asterisk to a one-tailed p-value less than 0.10 (the two-tailed p-value was p=0.142).

Table 4 of the article indicated a statistically significant interaction between photo conditions and racial resentment when predicting the "fair" item and perceptions of the likelihood of reducing crime, so I think that this interaction is what is referred to in the abstract statement that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support."

5. The 0.142 p-value referred to in the previous item inflates to p=0.340 when the controls are removed from the model. There are valid reasons for including demographic controls in a regression predicting results from a survey experiment, but the particular set of controls should be preregistered to prevent researchers from estimating models without controls and with different combinations of controls and then selecting a model or models to report based on the corresponding p-value or effect size.

6. Stata code for the new analyses:

* Photo condition main effects, no controls (linear regression, survey weights)
reg votegregdavis i.whitedem_treatments [pweight = weight]
reg thermgregdavis i.whitedem_treatments [pweight = weight]
reg fair_gregdavis i.whitedem_treatments [pweight = weight]
* Perceived ideology of Greg Davis by photo condition
reg ideo_gregdavis i.whitedem_treatments [pweight = weight]
* Penalty models with perceived ideology added as a predictor
reg votegregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
reg thermgregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
reg fair_gregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight]
* Ordered logit with demographic controls: "fair" item, then reducing-crime item
ologit fair_gregdavis i.whitedem_treatments gender educ income pid7 south [pweight = weight]
ologit gregdavis_redcrim i.whitedem_treatments gender educ income pid7 south [pweight = weight]
* Reducing-crime item with the controls removed (see note 5)
ologit gregdavis_redcrim i.whitedem_treatments [pweight = weight]

7. I emailed Dr. Stephens-Dougan, asking whether there was a reason for the exclusion of items and about access to a full dataset. I received a response and invited her to comment on this post.
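Notes 4 and 5 turn on a simple conversion: for a symmetric test statistic, when the estimate falls in the predicted direction, the one-tailed p-value is half the two-tailed p-value. A minimal Python sketch, assuming (as the arithmetic in note 4 implies) that p=0.142 and p=0.340 are two-tailed values:

```python
def one_tailed(p_two_tailed, in_predicted_direction=True):
    """Convert a two-tailed p-value from a symmetric test statistic to one-tailed."""
    if not 0.0 <= p_two_tailed <= 1.0:
        raise ValueError("p-value must be between 0 and 1")
    half = p_two_tailed / 2
    return half if in_predicted_direction else 1 - half


p_with_controls = one_tailed(0.142)
p_without_controls = one_tailed(0.340)
print(f"with controls: one-tailed p = {p_with_controls:.3f}")  # 0.071
print("asterisk at one-tailed p < 0.10:", p_with_controls < 0.10)  # True
print("significant at two-tailed p < 0.05:", 0.142 < 0.05)  # False
print(f"without controls: one-tailed p = {p_without_controls:.3f}")  # 0.170
print("asterisk at one-tailed p < 0.10:", p_without_controls < 0.10)  # False
```

Under these assumptions, the reducing-crime main effect earns an asterisk only with the controls included and only under the one-tailed p<0.10 convention.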

Tagged with: methods, race, selective reporting, you're doing it wrong

This female researcher should be cited more often

By L.J Zigerell Posted on June 6, 2016 Posted in Methods

My article reanalyzing data on a gender gap in citations to international relations articles indicated that the gender gap is largely confined to elite articles, defined as articles in the right tail of citation counts or articles in the top three political science journals. That article concerned an aggregate gender gap in citations, but this post is about a particular woman who has been under-cited in the social science literature.

It is not uncommon to read a list experiment study that suggests or states that the list experiment originated in the research described in the Kuklinski, Cobb, and Gilens 1997 article, "Racial Attitudes and the New South." For example, from Heerwig and McCabe 2009 (p. 678):

Pioneered by Kuklinski, Cobb, and Gilens (1997) to measure social desirability bias in reporting racial attitudes in the "New South," the list experiment is an increasingly popular methodological tool for measuring social desirability bias in self-reported attitudes and behaviors.

Kuklinski et al. described a list experiment that was placed on the 1991 National Race and Politics Survey. Kuklinski and colleagues appeared to propose the list experiment as a new measure (p. 327):

We offer as our version of an unobtrusive measure the list experiment. Imagine a representative sample of a general population divided randomly in two. One half are presented with a list of three items and asked to say how many of these items make them angry — not which specific items make them angry, just how many. The other half receive the same list plus an additional item about race and are also asked to indicate the number of items that make them angry.

The initial draft of my list experiment article reflected the belief that the list experiment originated with Kuklinski et al., but I then learned [*] of Judith Droitcour Miller's 1984 dissertation, which contained this passage:

The new item-count/paired lists technique is designed to avoid the pitfalls encountered by previous indirect estimation methods. Briefly, respondents are shown a list of four or five behavior categories (the specific number is arbitrary) and are then asked to report how many of these behaviors they have engaged in — not which categories apply to them. Nothing else is required of respondents or interviewers. Unbiased estimation is possible because two slightly different list forms (paired lists) are administered to two separate subsamples of respondents, which have been randomly selected in advance by the investigator. The two list forms differ only in that the deviant behavior item is included on one list, but omitted from the other. Once the alternate forms have been administered to the two randomly equivalent subsamples, an estimate of deviant behavior prevalence can be derived from the difference between the average list scores.

The above passage was drawn from pages 3 and 4 of Judith Droitcour Miller's 1984 dissertation at the George Washington University, "A New Survey Technique for Studying Deviant Behavior." [Here is another description of the method, in a passage from the 2004 edition of the 1991 book, Measurement Errors in Surveys (p. 88)]

It's possible that James Kuklinski independently invented the list experiment, but descriptions of the list experiment's origin should nonetheless cite Judith Droitcour Miller's 1984 dissertation as a prior — if not the first [**] — example of the procedure known as the list experiment.
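The estimator in both the Miller passage and the Kuklinski et al. description is a difference in mean list counts: the mean count in the longer-list group minus the mean count in the shorter-list group estimates the share of respondents for whom the added item applies. A minimal Python sketch with made-up counts (not data from either study), assuming, as the design requires, that adding the item does not change how respondents answer the other items and that respondents report their counts truthfully:

```python
from math import sqrt
from statistics import mean, variance


def list_experiment_estimate(short_list_counts, long_list_counts):
    """Difference-in-means estimate of the prevalence of the added (sensitive) item.

    short_list_counts: counts from the subsample shown the list without the item.
    long_list_counts: counts from the subsample shown the same list plus the item.
    Returns (estimate, standard_error) using the unpooled two-sample formula.
    """
    estimate = mean(long_list_counts) - mean(short_list_counts)
    se = sqrt(variance(long_list_counts) / len(long_list_counts)
              + variance(short_list_counts) / len(short_list_counts))
    return estimate, se


# Hypothetical responses: a three-item list and the same list plus a fourth item.
short_list = [1, 2, 2, 1, 3, 2, 1, 2, 2, 1]
long_list = [2, 2, 3, 1, 3, 2, 2, 3, 2, 1]
est, se = list_experiment_estimate(short_list, long_list)
print(f"estimated prevalence: {est:.2f} (SE {se:.2f})")
```

No individual's answer to the added item is revealed; only the group means differ, which is the source of the procedure's unobtrusiveness.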

---

[*] I think it was the Adam Glynn manuscript described below through which I learned of Miller's dissertation.

[**] An Adam Glynn manuscript discussed the list experiment and item count method as special cases of aggregated response techniques. Glynn referenced a 1979 Raghavarao and Federer article, and that article referenced a 1974 Smith et al. manuscript that used a similar block total response procedure. The non-randomized version of the procedure split seven questions into groups of three, as illustrated in one of the questionnaires below. The procedure's unobtrusiveness derived from a researcher's inability in most cases to determine which responses a respondent had selected: for example, Yes-No-Yes produces the same total as No-No-No (5 in each case).

The questionnaire for the randomized version of the block total response procedure listed all seven questions; the respondent then drew a number and gave a total response for only those three questions that were associated with the number that was drawn: for example, if the respondent drew a 4, then the respondent gave a total for their responses to questions 4, 5, and 7. This procedure is similar to the list experiment, but the list experiment is simpler and more efficient.

Tagged with: inequality, list experiment, methods, sex, you're doing it wrong

The omitted "White" or "European American" experiment

By L.J Zigerell Posted on June 3, 2016 Posted in Methods

Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

Like I wrote before, if you're thinking that it would be interesting to see whether results hold in a nationally representative sample with a large sample size, well, that was tried, with a survey experiment as part of Time-sharing Experiments for the Social Sciences (TESS). Here are the results:

I'm mentioning these results again because in October 2014 the journal that published Rios Morrison and Chung 2011 desk rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about results for the low-powered test on convenience samples for the "European American" versus "White" self-identification hypothesis, but you won't be able to read in the JESP about results when that hypothesis was tested with a higher-powered test on a nationally representative sample with data collected by a disinterested third party.
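The "low-powered" point can be made concrete with a normal-approximation power calculation for a two-group comparison of means. This Python sketch is illustrative only: it assumes that study 2's 111 participants were split roughly evenly (about 55 per condition), the standardized effect sizes d are hypothetical inputs rather than estimates from either study, and 500 per group is a hypothetical comparison sample, not the TESS sample size.

```python
from statistics import NormalDist


def power_two_groups(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    d: hypothetical standardized effect size (difference in means / SD).
    Uses the normal approximation and ignores the negligible opposite tail.
    """
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    noncentrality = d * (n_per_group / 2) ** 0.5
    return NormalDist().cdf(noncentrality - z_crit)


for n in (55, 500):  # about study 2's group size vs. a hypothetical larger sample
    for d in (0.2, 0.3, 0.5):
        print(f"n per group = {n}, d = {d}: power = {power_two_groups(d, n):.2f}")
```

With about 55 participants per condition, a small-to-moderate effect would be missed more often than detected, so a single statistically significant result from such a design says less than a test on a much larger sample.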

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here. Links to data and code.

Tagged with: file drawer problem, methods, pottery barn rule, reproductions, selective reporting, TESS, you're doing it wrong

Selective reporting of outcome variables in "The Public's Anger"

By L.J Zigerell Posted on June 2, 2016 Posted in Methods

In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

Item W2_2 is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported dichotomous results for the other outcome variable (W2_2), even though the W2_2a and W2_2b follow-ups indicate that the apparent intention was a four-category outcome variable from oppose strongly to favor strongly.
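Combining W2_2 with its follow-up items into the presumably intended four-category outcome is a short recode. A minimal Python sketch; the response labels and the 1 to 4 coding (higher values meaning more opposition, matching the direction of the article's dichotomous variable) are hypothetical, not the study's actual variable codes:

```python
from typing import Optional

# Hypothetical labels: W2_2 answer plus the strength answer from W2_2a or W2_2b.
FOUR_POINT = {
    ("favor", "very strongly"): 1,
    ("favor", "not so strongly"): 2,
    ("oppose", "not so strongly"): 3,
    ("oppose", "very strongly"): 4,
}


def health_care_four_point(w2_2: Optional[str], strength: Optional[str]) -> Optional[int]:
    """1 = favor very strongly ... 4 = oppose very strongly; None if missing."""
    return FOUR_POINT.get((w2_2, strength))


def health_care_dichotomous(w2_2: Optional[str]) -> Optional[int]:
    """The article's coding: 1 = oppose, 0 = favor; None if missing."""
    return {"favor": 0, "oppose": 1}.get(w2_2)


# The dichotomous coding collapses strength of opinion; the four-point coding keeps it.
for answer in [("favor", "very strongly"), ("favor", "not so strongly"),
               ("oppose", "not so strongly"), ("oppose", "very strongly")]:
    print(answer, health_care_dichotomous(answer[0]), health_care_four_point(*answer))
```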

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

The takeaway is that, with regard to opposition to health care reform, the effect of the fear condition on symbolic racism differed at a statistically significant level from the effect of the baseline relaxed condition on symbolic racism; however, contra Banks 2014, the effect of anger on symbolic racism did not differ at a statistically significant level from the effect of the relaxed condition on symbolic racism. The anger condition had a positive effect on symbolic racism, but it was not a unique influence.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014, the fear interaction term and the fear condition term are statistically significant at p<0.05 for predicting the two previously unreported non-dichotomous outcome variables and for predicting a scale combining these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 for predicting two of these three outcome variables, with p-values of roughly 0.10 for the remaining "Obama handling" outcome variable. The revised manuscript describing these results is here.

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks and Nicholas Valentino published a similar article in the AJPS; data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.

Tagged with: methods, pottery barn rule, race, reproductions, selective reporting, symbolic racism, TESS, you're doing it wrong
