I happened across the Saucier et al. 2005 meta-analysis "Differences in Helping Whites and Blacks: A Meta-Analysis" (ungated), and I decided to plot each study's effect size against its standard error in a funnel plot to assess the possibility of publication bias. The funnel plot is below.

[Figure: funnel plot for Saucier et al. 2005]

Funnel plot asymmetry was not detected in Begg's test (p=0.486) but was detected in the higher-powered Egger's test (p=0.009).
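For reference, here is one way such a funnel plot and the two asymmetry tests can be produced in Stata's meta suite (Stata 16 or later); this is a minimal sketch assuming variables d and se hold each study's effect size and estimated standard error:

  * minimal sketch: declare the data, then plot and test for small-study effects
  meta set d se        // declare effect sizes and standard errors
  meta funnelplot      // funnel plot of effect size against standard error
  meta bias, begg      // Begg's rank-correlation test
  meta bias, egger     // Egger's regression-based test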

---

NOTE:

1. Saucier et al. 2005 reported sample sizes but not standard errors for each study's effect size, so I estimated the standard errors with formula 7.30 of Hunter and Schmidt (2004: 286).
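For reference, and assuming on my part that formula 7.30 is the familiar Hunter-Schmidt large-sample approximation for the sampling error variance of a d value (check the original for the exact form), the estimate from a study's total sample size $N$ and observed effect size $d$ is:

$$\operatorname{Var}(e) \approx \frac{N-1}{N-3}\cdot\frac{4}{N}\left(1+\frac{d^{2}}{8}\right), \qquad SE = \sqrt{\operatorname{Var}(e)}.$$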

2. Code here.


I previously discussed Filindra and Kaplan 2016 in terms of the current state of political science research transparency, but this post will discuss the article more substantively.

Let's start with a re-quote regarding the purpose and research design of the Filindra and Kaplan 2016 experiment:

To determine whether racial prejudice depresses white support for gun control, we designed a priming experiment which exposed respondents to pictures of blacks and whites drawn from the IAT. Results show that exposure to the prime suppressed support for gun control compared to the control, conditional upon a respondent's level of racial resentment (p. 255).

Under the guise of a cognitive test, we exposed 600 survey participants who self-identified as white to three pictures of the faces of black individuals and another three of white individuals (p. 261).

For predicting the experiment's two gun-related outcome variable scales, Table 1 indicates in separate models that the treatment alone, the treatment together with a measure of symbolic racism, and the interaction of the treatment and symbolic racism each reach statistical significance at least at the p<0.10 level in two-tailed tests.

But the outcome variable scales are built from a subset of measured gun-related items. Filindra and Kaplan 2016 reported an exploratory factor analysis used to select items for outcome variable scales: 7 of 13 policy items about guns and 8 of 9 belief items about guns were selected for inclusion in the scales. The dataset for the article uploaded to the Dataverse did not contain data for the omitted policy and belief items, so I requested these data from Dr. Filindra. I did not receive access to these data.

It's reasonable to use factor analysis to decide which items to include in a scale, but this permits researcher flexibility about whether to perform the factor analysis in the first place and, if so, whether to place all items into a single factor analysis or, as in Filindra and Kaplan 2016, to separate the items into groups and conduct a factor analysis for each group.

---

But the main problem with the experiment is not the flexibility in building the outcome variable scales. The main problem is that the research design does not permit an inference of racial prejudice.

The Filindra and Kaplan 2016 experimental design of a control and a single treatment of the black/white photo combination permits at most only the inference of a "causal relationship between racial considerations and gun policy preferences among whites" (p. 263, emphasis added). However, Filindra and Kaplan 2016 also discussed the experiment as if the treatment had been only photos of blacks (p. 263):

Our priming experiment shows that mere short exposure to pictures of blacks can drive opposition to gun control.

The Filindra and Kaplan experimental design does not permit assigning the measured effect to the photos of blacks isolated from the photos of whites, so I'm not sure why peer reviewers would have permitted that claim, which appeared in exactly the same form on page 9 of Filindra and Kaplan's 2015 MPSA paper.

---

Filindra and Kaplan 2016 supplement the experiment with a correlational study using symbolic racism to predict the ANES gun control item. But, as other researchers and I have noted, there is an inferential problem using symbolic racism in correlational studies, because symbolic racism conflates racial prejudice and nonracial attitudes; for example, knowing only that a person believes that blacks should not receive special favors cannot tell us whether that person's belief is motivated by antiblack bias, nonracial opposition to special favors, or some combination of the two.

My article here provides a sense of how strong a residual post-statistical-control correlation between symbolic racism and an outcome variable must be before one can confidently claim that the correlation is tapping antiblack bias. To illustrate this, I used linear regression on the 2012 ANES Time Series Study data, weighted and limited to white respondents, to predict responses to the gun control item. Responses were coded on a standardized scale so that the lowest value is the response that the federal government should make it more difficult to buy a gun, the middle value is that the rules should be kept the same, and the highest value is that the federal government should make it easier to buy a gun.
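Here is a minimal sketch of that setup in Stata; the variable names are hypothetical placeholders, not actual 2012 ANES variable names:

  * minimal sketch, hypothetical variable names
  egen z_racism = std(symbolic_racism)    // standardize the symbolic racism scale
  egen z_gun    = std(gun_control)        // standardize the gun control item
  reg z_gun z_racism i.sex i.marital i.agegrp i.educ i.income ///
      [pweight = weight] if white == 1    // plus the remaining controls listed in the note below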

The standardized symbolic racism scale produced a 0.068 (p=0.012) residual correlation with the standardized gun control item, in a model including the full set of statistical controls described in the note below. That was about the same residual correlation as for predicting a standardized scale measuring conservative attitudes toward women (0.108, p<0.001), a standardized abortion scale (-0.087, p<0.001), and a standardized item about whether people should be permitted to place Social Security payroll taxes into personal accounts (0.070, p=0.007).

So, based on these data alone, racial prejudice as measured with symbolic racism has about as much "effect" on attitudes about gun control as it does on attitudes about women, abortion, and private accounts for Social Security. I think it's unlikely that bias against blacks causes conservative attitudes toward women, so I don't think that the 2012 ANES data can resolve whether or the extent to which bias against blacks causes support for gun control.

I would bet that there is some connection between antiblack prejudice and gun control, but I wouldn't argue that Filindra and Kaplan 2016 provide convincing evidence of this. Of course, it looks like a version of the Filindra and Kaplan 2016 paper won a national award, so what do I know?

---

NOTES:

1. Code for my analysis reported above is here.

2. The full set of statistical controls includes: respondent sex, marital status, age group, education level, household income, employment status, Republican Party membership, Democratic Party membership, self-reported political ideology, and items measuring attitudes about guaranteed jobs, limited government, moral traditionalism, authoritarianism, and egalitarianism.

3. Filindra and Kaplan 2016 Table 2 reports a larger effect size for symbolic racism in the 2004 and 2008 ANES data than in the 2012 ANES data, with respective values for the maximum change in probability of support of -0.23, -0.25, and -0.16. The mean of the 2004 and 2008 estimate is 50% larger than the 2012 estimate, so increasing the 2012 residual correlation of 0.068 by 50% produces 0.102, which is still about the same residual correlation as for conservative attitudes about women. Based on Table 6 of my article, I would not be comfortable alleging an effect for racial bias with anything under a 0.15 residual correlation with a full set of statistical control.


In a survey experiment reported in LaFleur Stephens-Dougan's 2016 Journal of Politics article, "Priming Racial Resentment without Stereotypic Cues", respondents were shown a campaign mailer for a white candidate named Greg Davis, with experimental manipulations of the candidate's party (Democrat or Republican) and the photos on the candidate's mailers (five photos of whites, five photos of blacks, or a mixture of photos of whites and blacks).

One key finding described in the abstract is that "white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner".

The JOP article describes the analysis in Chapter 5 of Stephens' dissertation, which reported on a July 2011 YouGov/Polimetrix survey. Dissertation page 173 indicated that the survey had 13 experimental conditions, but the JOP article reports only six conditions, omitting the control condition and the six conditions in which candidate Greg Davis was black. Stephens-Dougan might plan to report on these omitted conditions in a subsequent publication, so I'll concentrate on the outcome variables.

---

Many potential outcome variables were not reported on in the JOP article, based on a comparison of the survey questionnaire in Appendix D of the dissertation to the outcome variables mentioned in the main text or the appendix of the article. The box below describes each post-treatment item on the survey questionnaire except for the Q7 manipulation check: a [*] marks items reported on in the article, and items without a [*] were not reported on in the article. See pages 224 to 230 of the dissertation for the exact item wording.

Q8. Feeling thermometer for the candidate.

Q9 [*]. Likelihood of voting for Greg Davis.

Q10. How well a series of terms describe Greg Davis:

  • Intelligent
  • Inexperienced
  • Trustworthy
  • Hardworking
  • Fair [*]
  • Competent

Q11. Perception of Greg Davis' political ideology.

Q12. Whether Democrats, Republicans, or neither party would be better at:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime
  • Reforming Public Education

Q13. Like, dislike, or neither for the Democratic Party.

Q14. Like, dislike, or neither for the Republican Party.

Q15. Perception of how well Greg Davis would handle:

  • Helping Senior Citizens
  • Improving Health Care
  • Improving the Economy
  • Reducing Crime [*]
  • Reforming Public Education

Q16 [*]. Whether Greg Davis' policies will favor Whites over Blacks, Blacks over Whites, or neither.

Q17. Perception of which groups Greg Davis will help if elected:

  • Teachers
  • Latinos
  • Corporate Executives
  • Farmers
  • Senior Citizens
  • African Americans
  • Homeowners
  • Students
  • Small Business Owners
  • Whites

Q18 [*]. Perception of Greg Davis' position on affirmative action for Blacks in the workplace.

Q19. Perception of Greg Davis' position on the level of federal spending on Social Security.

Q20. Job approval, disapproval, or neither for Barack Obama as president.

The unmarked items above show that many potential outcome variables were not reported on in the JOP article. For example, Q10 asked respondents how well the terms "intelligent", "inexperienced", "trustworthy", "hardworking", "fair", and "competent" describe the candidate, but readers are told about results only for "fair"; readers are told results for Q16 about the candidate's perceived preference for policies that help whites over blacks or vice versa, but readers are not told results for "Whites" and "Blacks" on Q17 about the groups that the candidate is expected to help.

Perhaps the estimates and inferences are identical for the omitted and included items, but prior analyses [e.g., here and here] suggest that omitted items often produce different estimates and sometimes produce different inferences than included items.

Data for most omitted potential outcome variables are not in the article's dataset at the JOP Dataverse, but the dataset did contain a "thermgregdavis" variable that ranged from 1 to 100, presumably the feeling thermometer in Q8. I used the model in line 14 of Stephens-Dougan's code but substituted "thermgregdavis" for the reported Q9 outcome variable (likelihood of voting for Greg Davis) and changed the estimation technique from ordered logit to linear regression: the p-value for the difference between the all-white photo condition and the mixed photo condition was p=0.693, and the p-value for the difference between the all-white photo condition and the all-black photo condition was p=0.264.

---

This sort of selective reporting is not uncommon in social science [see here, here, and here], but I'm skeptical that researchers with the flexibility to report whichever results they want, based on post-hoc research design choices, will produce replicable estimates and unbiased inferences, especially in the politically charged racial discrimination subfield. I am also skeptical that selective reporting across publications will balance out in a field in which a supermajority of researchers fall on one side of the political spectrum.

So how can such selective reporting be prevented? Researchers can preregister their research designs. Journals can preaccept articles based on a preregistered research design. For non-preregistered studies, journals can require as a condition of publication the declaration of omitted studies, experiments, experimental conditions, and outcome variables. Peer reviewers can ask for these declarations, too.

---

It's also worth comparing the hypotheses as expressed in the dissertation to the hypotheses as expressed in the JOP article. First, the hypotheses from dissertation Chapter 5, on page 153:

H1: Democratic candidates are penalized for an association with African Americans.

H2: Republican candidates are rewarded for an association with African Americans.

H3: The racial composition of an advertisement influences voters' perceptions of the candidates' policy preferences.

Now, the JOP hypotheses:

H1. White Democratic candidates associated with blacks will lose vote support and will be perceived as more likely to favor blacks over whites and more likely to support affirmative action relative to white Democratic candidates associated with images of only whites.

H2. Counterstereotypic images of African Americans paired with a white Democratic candidate will prime racial attitudes on candidate evaluations that are implicitly racial relative to a comparable white Democratic candidate associated with all whites.

H3. Counterstereotypic images of African Americans paired with a white Republican candidate will be inconsequential such that they will not be associated with a main effect or a racial priming effect.

So hypotheses became more specific for Democratic candidates and switched from Republicans being rewarded to Republicans not experiencing a consequential effect. My sense is that hypothesis modification is not uncommon in social science, but the reason for the survey items asking about personal characteristics of the candidate (e.g., trustworthy, competent) is clearer in light of the dissertation's hypotheses about candidates being penalized or rewarded for an association with African Americans. After all, the feeling thermometer and the other Q10 characteristic items can be used to assess a penalty or reward for candidates.

---

In terms of the substance of the penalty, the abstract of the JOP article notes: "I empirically demonstrate that white Democratic candidates are penalized for associating with blacks, even if blacks are portrayed in a positive manner."

My analysis of the data indicated that, based on a model with no controls and no cases dropped, and comparing the all-white photo condition to the all-black photo condition, there is evidence of this penalty in the Q9 vote item at p=0.074. However, evidence for this penalty is weak in the feeling thermometer (p=0.248) and in the "fair" item (p=0.483), and I saw no evidence in the article or dissertation that the penalty can be detected in the items omitted from the dataset.

Moreover, much of the estimated penalty might reflect only the race of persons in the photos providing a signal about candidate Greg Davis' ideology. Compared to respondents in the all-white photo condition, respondents in the mixed photo condition and the all-black photo condition rated Greg Davis as more liberal (p-values of 0.014 and 0.004), and the p=0.074 penalty in the Q9 vote item inflates to p=0.710 when including the measure of Greg Davis' perceived ideology, with corresponding p-values ranging from p=0.600 to p=0.964 for models predicting a penalty in the thermometer and the "fair" item.

---

NOTES:

1. H/T to Brendan Nyhan for the pointer to the JOP article.

2. The JOP article emphasizes the counterstereotypical nature of the mailer photos of blacks, but the experiment did not vary the photos of blacks, so the experiment provides no evidence about the influence of the counterstereotypical nature of the photos.

3. The JOP article reports four manipulation checks (footnote 6), but the dissertation reports five manipulation checks (footnote 65, p. 156). The omitted manipulation check concerned whether the candidate tried to appeal to racial feelings. The dataset for the article at the JOP Dataverse has a "manipchk_racialfeelings" variable that is presumably this omitted manipulation check.

4. The abstract reports that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support." However, Table 2 of the article and my analysis indicate that no photo condition comparison produced a statistically significant main effect for the "fair" item, and that only the all-white vs. mixed photo comparison produced a statistically significant main effect for perceptions of the likelihood of reducing crime, with this one main effect reaching statistical significance only under the article's generous convention of marking statistical significance at a one-tailed p-value less than 0.10 (the p-value was p=0.142).

Table 4 of the article indicated a statistically significant interaction between photo conditions and racial resentment when predicting the "fair" item and perceptions of the likelihood of reducing crime, so I think that this interaction is what the abstract refers to in the statement that "Racial resentment was primed such that white Democratic candidates associated with blacks were perceived as less fair, less likely to reduce crime, and less likely to receive vote support."

5. The 0.142 p-value referred to in the previous item inflates to p=0.340 when the controls are removed from the model. There are valid reasons for including demographic controls in a regression predicting results from a survey experiment, but the particular set of controls should be preregistered to prevent researchers from estimating models without controls and with different combinations of controls and then selecting a model or models to report based on the corresponding p-value or effect size.

6. Code for the new analyses:

  • reg votegregdavis i.whitedem_treatments [pweight = weight] // Q9 vote item, photo conditions only
  • reg thermgregdavis i.whitedem_treatments [pweight = weight] // Q8 feeling thermometer, photo conditions only
  • reg fair_gregdavis i.whitedem_treatments [pweight = weight] // "fair" item, photo conditions only
  • reg ideo_gregdavis i.whitedem_treatments [pweight = weight] // Q11 perceived ideology by photo condition
  • reg votegregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight] // vote item, controlling for perceived ideology
  • reg thermgregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight] // thermometer, controlling for perceived ideology
  • reg fair_gregdavis i.whitedem_treatments ideo_gregdavis [pweight = weight] // "fair" item, controlling for perceived ideology
  • ologit fair_gregdavis i.whitedem_treatments gender educ income pid7 south [pweight = weight] // "fair" item with demographic controls
  • ologit gregdavis_redcrim i.whitedem_treatments gender educ income pid7 south [pweight = weight] // reduce-crime item with demographic controls
  • ologit gregdavis_redcrim i.whitedem_treatments [pweight = weight] // reduce-crime item, no controls

7. I emailed Dr. Stephens-Dougan, asking whether there was a reason for the exclusion of items and about access to a full dataset. I received a response and invited her to comment on this post.


Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

Like I wrote before, if you're thinking that it would be interesting to see whether results hold in a nationally representative sample with a large sample size, well, that was tried, with a survey experiment conducted as part of Time-sharing Experiments for the Social Sciences (TESS). Here are the results:

[Figure: reanalysis of the "European American" versus "White" self-identification experiment with the TESS data]

I'm mentioning these results again because in October 2014 the journal that published Rios Morrison and Chung 2011 desk rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about results for the low-powered test on convenience samples for the "European American" versus "White" self-identification hypothesis, but you won't be able to read in the JESP about results when that hypothesis was tested with a higher-powered test on a nationally-representative sample with data collected by a disinterested third party.

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here, with links to data and code.


In the Political Behavior article, "The Public's Anger: White Racial Attitudes and Opinions Toward Health Care Reform", Antoine J. Banks presented evidence that "anger uniquely pushes racial conservatives to be more opposing of health care reform while it triggers more support among racial liberals" (p. 493). Here is how the outcome variable was measured in the article's reported analysis (p. 511):

Health Care Reform is a dummy variable recoded 0-1 with 1 equals opposition to reform. The specific item is "As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill". The response options were yes = I favor the health care bill or no = I oppose the health care bill.

However, the questionnaire for the study indicates that there were multiple items used to measure opinions of health care reform:

W2_1. Do you approve or disapprove of the way Barack Obama is handling Health Care? Please indicate whether you approve strongly, approve somewhat, neither approve nor disapprove, disapprove somewhat, or disapprove strongly.

W2_2 [*]. As of right now, do you favor or oppose Barack Obama and the Democrats' Health Care reform bill?

[if "favor" on W2_2] W2_2a. Do you favor Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

[if "oppose" on W2_2] W2_2b. Do you oppose Barack Obama and the Democrats' Health Care reform bill very strongly, or not so strongly?

The item marked [*] above (W2_2) is the only item reported on as an outcome variable in the article. The reported analysis omitted results for one outcome variable (W2_1) and reported dichotomous results for the other outcome variable (W2_2), for which the apparent intention was a four-point outcome variable ranging from oppose strongly to favor strongly.
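A minimal sketch of how the presumed intended four-point measure could be built in Stata, with hypothetical variable names and codings:

  * minimal sketch, hypothetical variable names and codings
  gen hcr4 = .
  replace hcr4 = 1 if w2_2 == 0 & w2_2b == 1    // oppose, very strongly
  replace hcr4 = 2 if w2_2 == 0 & w2_2b == 0    // oppose, not so strongly
  replace hcr4 = 3 if w2_2 == 1 & w2_2a == 0    // favor, not so strongly
  replace hcr4 = 4 if w2_2 == 1 & w2_2a == 1    // favor, very strongly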

---

Here is the manuscript that I submitted to Political Behavior in March 2015 describing the results using the presumed intended outcome variables and a straightforward research design (e.g., no political discussion control, no exclusion of cases, cases from all conditions analyzed at the same time). Here's the main part of the main figure:

[Figure: reproduction of the Banks 2014 analysis with the presumed intended outcome variables]

The takeaway is that, with regard to opposition to health care reform, the effect of symbolic racism in the fear condition differed at a statistically significant level from the effect of symbolic racism in the baseline relaxed condition; however, contra Banks 2014, the effect of symbolic racism in the anger condition did not differ at a statistically significant level from its effect in the relaxed condition. Symbolic racism had a positive effect in the anger condition, but anger was not a unique influence.

The submission to Political Behavior was rejected after peer review. Comments suggested analyzing the presumed intended outcome variables while using the research design choices in Banks 2014. Using the model in Table 2 column 1 of Banks 2014, the fear interaction term and the fear condition term are statistically significant at p<0.05 for predicting the two previously-unreported non-dichotomous outcome variables and for predicting the scale of these two variables; the anger interaction term and the anger condition term are statistically significant at p<0.05 for predicting two of these three outcome variables, with p-values for the remaining "Obama handling" outcome variable at roughly 0.10. The revised manuscript describing these results is here.
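For readers who want the structure of those models, here is a minimal sketch of the condition-by-symbolic-racism setup in Stata, with hypothetical variable names (condition 0 is the relaxed baseline):

  * minimal sketch, hypothetical variable names
  ologit hcr4 i.condition##c.symbolic_racism [pweight = weight]
  // the i.condition terms are the emotion-condition main effects, and the
  // i.condition#c.symbolic_racism terms are the condition-specific interactions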

---

Data are here, and code for the initial submission is here.

---

Antoine Banks has published several studies on anger and racial politics (here, for example) that should be considered when making inferences about the substance of the effect of anger on racial attitudes. Banks had a similar article published in the AJPS, with Nicholas Valentino. Data for that article are here. I did not see any problems with that analysis, but I didn't look very hard, because the posted data were not the raw data: the posted data that I checked omitted, for example, the variables used to construct the outcome variable.


The Public Opinion Quarterly article "Bias in the Flesh" provided evidence of "an evaluative penalty for darker skin" (quote from the abstract). Study 2 of the article was an MTurk survey. Some respondents were shown an image of Barack Obama with darkened skin, and some respondents were shown an image of Barack Obama with lightened skin. Both sets of respondents received the text: "We are interested in how people evaluate images of political figures. Consider the following image:"

Immediately following the image and text, respondents received 14 items that could be used to assess this evaluative penalty for darker skin; these items are listed in the boxes below. The first 11 items could be used to measure whether, compared to respondents in one of the conditions, respondents in the other condition completed more word fragments with words associated with negative stereotypes, such as LAZY or CRIME.

Please complete the following word fragments. Make sure to type out the entire word.
1. L A _ _ [*]
2. C R _ _ _
3. _ _ O R [*]
4. R _ _
5. W E L _ _ _ _
6. _ _ C E
7. D _ _ _ Y [*]
8. B R _ _ _ _ _
9. _ _ A C K
10. M I _ _ _ _ _ _
11. D R _ _
How competent is Barrack Obama?
1. Very competent
2. Somewhat competent
3. Neither competent nor incompetent
4. Somewhat incompetent
5. Very incompetent
How trustworthy is Barrack Obama?
1. Very trustworthy
2. Somewhat trustworthy
3. Neither trustworthy nor untrustworthy
4. Somewhat untrustworthy
5. Very untrustworthy
On a scale from 0 (coldest) to 100 (warmest) how do you feel about Barack Obama?

The three items marked [*] above (items 1, 3, and 7) are the only items for which results were reported on in the article and in the corresponding Monkey Cage post. In other words, the researchers selected 3 of 14 items to assess the evaluative penalty for darker skin. [Update: Footnote 16 in the article reported results for the combination of lazy, black, poor, welfare, crime, and dirty (p=0.078).]

If I'm using the correct formula, there are 16,369 different combinations of 14 items that could have been reported, not counting the null set and not counting reporting on only one item. Hopefully, I don't need a formula or calculation to convince you that there is a pretty good chance that random assignment variation alone would produce an associated two-tailed p-value less than 0.05 in at least one of those 16,369 combinations. The fact that the study reported one of these combinations doesn't provide much information about the evaluative penalty for darker skin.
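The arithmetic: with 14 items, the number of combinations of two or more items is

$$2^{14}-\binom{14}{0}-\binom{14}{1}=16384-1-14=16369.$$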

The really discomforting part of this selective reporting is how transparently it was done: the main text of the article noted that only 3 of 14 puzzle-type items were selected, and the supplemental file included the items about Obama's competency, Obama's trustworthiness, and the Obama feeling thermometer. There was nothing hidden about this selective reporting, from what I can tell.

---

Notes:

1. For what it's worth, the survey had an item asking whether Obama's race is white, black, or mixed. But that doesn't seem to be useful for measuring an evaluative penalty for darker skin, so I didn't count it.

2. It's possible that the number of permutations that the peer reviewers would have permitted is less than 16,369. But that's an open question, given that the peer reviewers permitted 3 of 14 potential outcome variables to be reported [Update: ...in the main text of the article].

3. The data are not publicly available to analyze, so maybe the selective reporting in this instance didn't matter. I put in a request last week for the data, so hopefully we'll find out.

---

UPDATE (Jan 12, 2016)

1. I changed the title of the post from "Researchers select 1 of 16,369 combinations to report" to "Researchers select 2 of 16,369 combinations to report", because I overlooked footnote 16 in the article. Thanks to Solomon Messing for the pointer.

2. Omar Wasow noted that two of the items had a misspelling of Barack Obama's first name. Those misspellings appear in the questionnaire in the supplemental file for the article.

---

UPDATE (Jan 13, 2016)

1. Solomon Messing noted that data for the article are now available at the Dataverse. I followed the posted R code as best I could to reproduce the analysis in Stata, and I came close to the results reported in the article. I got the same percentages for the three word puzzles as the percentages that appear in the article: 33% for the lightened photo and 45% for the darkened photo, with a small difference in t-scores (t=2.74 versus t=2.64). Estimates and t-scores were also close for the reported result in footnote 16: estimates of 0.98 and 1.11 for me, and estimates of 0.97 and 1.11 in the article, with respective t-scores of 1.79 and 1.77. Compared to the 630 unexcluded respondents for the article, I had 5 extra respondents after exclusions (635 total).

The table below reports results from t-tests that I conducted. The Stata code is available here.

[Table: t-tests comparing the darkened and lightened photo conditions for individual stereotype items and item combinations]

Let me note a few things from the table:

First, I reproduced the finding that, when the word puzzles were limited to the combination of lazy, dirty, and poor, unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

However, if I combine the word puzzles for race, minority, and rap, the finding is that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition: the opposite inference. Same thing when I combine race, minority, rap, and welfare. And same thing when I combine race, minority, rap, welfare, and crime.

Sure, as a group, these five stereotypes -- race, minority, rap, welfare, and crime -- don't have the highest face validity of the 11 stereotypes for being the most negative stereotypes, but there doesn't appear to be anyone in political science enforcing a rule that researchers must report all potential or intended outcome variables.
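For concreteness, a minimal sketch of one of these combination tests in Stata, with hypothetical variable names for the 0/1 stereotype-congruent completion indicators and the photo-condition indicator:

  * minimal sketch, hypothetical variable names
  gen combo = race + minority + rap    // count of stereotype-congruent completions
  ttest combo, by(darkened)            // compare darkened and lightened photo conditions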

2. Estimates for 5 of the 11 stereotype items fell to the negative side of zero, indicating that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition. And estimates for 6 of the 11 stereotype items fell to the positive side of zero, indicating that unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

A 5-to-6 split like that is what we'd expect if there were truly no effect, so -- in that sense -- this experiment doesn't provide much evidence for the relative effect of the darkened photo. That isn't a statement that the true relative effect of the darkened photo is exactly zero, but it is a statement about the evidence that this experiment has provided.
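For intuition: if the darkened photo truly had no relative effect and each item's estimate were equally likely to fall on either side of zero, a 5-to-6 split would be the most likely outcome:

$$P(\text{5 or 6 of 11 on one side}) = \frac{\binom{11}{5}+\binom{11}{6}}{2^{11}} = \frac{462+462}{2048} \approx 0.45.$$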

For what it's worth, the effect size is 0.118 and the p-value is 0.060 for the combination of word puzzles that I think has the most face validity for being the most negative stereotypes (lazy, poor, welfare, crime, drug, and dirty); the effect size is -0.032 and the p-value is 0.560 for the combination of word puzzles that I think have the least face validity for being the most negative stereotypes (race, black, brother, minority, and rap). So I'm not going to make any bets that the true effect is zero or that the lightened photo fosters relatively more activation of negative stereotypes.

3. Results for the competence, trustworthiness, and feeling thermometer items are pretty much what would be expected if the photo manipulation had no true effect on these items, with respective p-values of 0.904, 0.962, and 0.737. Solomon Messing noted that there is no expectation from the literature of an effect for these items, but now that I think of it, I'm not sure why a darkened photo of Obama should be expected to [1] make people more likely to call to mind negative racial stereotypes such as lazy and dirty but [2] have no effect on perceptions of Obama. In any event, I think that readers should have been told about the results for the competence, trustworthiness, and feeling thermometer items.

4. The report on these data suggested that the true effect is that the darkened photo increased stereotype activation. But I could have used the same data to argue for the inference that the darkened photo had no effect at all or at best only a negligible effect on stereotype activation and on attitudes toward Obama, had I reported the combination of all 11 word puzzles, plus the competence, trustworthiness, and feeling thermometer items. Moreover, had I selectively reported results and failed to inform peer reviewers of all the items, it might even have been possible to have published an argument that the true effect was that the lightened photo caused an increase in stereotype activation. I don't know why I should trust non-preregistered research if researchers have that much influence over inferences.

5. Feel free to check my code for errors or to report better ways to analyze the data.


The Monkey Cage published a post, "Racial prejudice is driving opposition to paying college athletes. Here's the evidence." I tweeted about this post in several threads, but I'm posting the information here for possible future reference and for anyone who reads the blog.

Here's the key figure from the post. The left side of the figure indicates that white respondents expressed more opposition to paying college athletes after exposure to a picture of black athletes than in a control condition with no picture.

After reading the post, I noted two oddities about the figure. First, based on the logic of an experiment -- change one thing only, to assess the effect of that thing -- the proper comparison for assessing racial bias among white respondents would have been the effect of a photo of black athletes against the effect of a photo of white athletes; that comparison would have removed the alternate explanations that respondents expressed more opposition because a photo was shown, or because a photo of athletes was shown, and not necessarily because a photo of *black* athletes was shown. Second, the data were from the CCES, which typically has team samples of 1,000 respondents; these samples are presumably intended to be representative of the national population, so there should be more than 411 whites in a 1,000-respondent sample.

Putting two and two together suggested that there was an unreported condition in which respondents were shown a photo of white athletes. I emailed the three authors of the blog post, and to their credit I received substantive replies to my questions about the experiment. Based on the team's responses, the experiment did have a condition in which respondents were shown a photo of white athletes, and opposition to paying college athletes in this "white athletes" photo condition did not differ at p<0.05 (two-tailed test) from opposition to paying college athletes in the "black athletes" photo condition.
