The Public Opinion Quarterly article "Bias in the Flesh" provided evidence of "an evaluative penalty for darker skin" (quoting the abstract). Study 2 of the article was an MTurk survey in which some respondents were shown an image of Barack Obama with darkened skin and other respondents were shown an image of Barack Obama with lightened skin. Both sets of respondents received the text: "We are interested in how people evaluate images of political figures. Consider the following image:"

Immediately following the image and text, respondents received 14 items that could be used to assess this evaluative penalty for darker skin; these items are listed below. The first 11 items could be used to measure whether respondents in one condition completed more word fragments with words associated with negative stereotypes, such as LAZY or CRIME, than respondents in the other condition.

Please complete the following word fragments. Make sure to type out the entire word.
1. L A _ _
2. C R _ _ _
3. _ _ O R
4. R _ _
5. W E L _ _ _ _
6. _ _ C E
7. D _ _ _ Y
8. B R _ _ _ _ _
9. _ _ A C K
10. M I _ _ _ _ _ _
11. D R _ _
How competent is Barrack Obama?
1. Very competent
2. Somewhat competent
3. Neither competent nor incompetent
4. Somewhat incompetent
5. Very incompetent
How trustworthy is Barrack Obama?
1. Very trustworthy
2. Somewhat trustworthy
3. Neither trustworthy nor untrustworthy
4. Somewhat untrustworthy
5. Very untrustworthy
On a scale from 0 (coldest) to 100 (warmest) how do you feel about Barack Obama?

Results were reported in the article and in the corresponding Monkey Cage post for only three of the items above: items 1, 3, and 7 (LAZY, POOR, and DIRTY). In other words, the researchers selected 3 of 14 items to assess the evaluative penalty for darker skin. [Update: Footnote 16 in the article reported results for the combination of lazy, black, poor, welfare, crime, and dirty (p=0.078).]

If I'm using the correct formula, there are 2^14 - 14 - 1 = 16,369 different combinations of the 14 items that could have been reported, not counting the null set and not counting reports of only a single item. Hopefully, I don't need a formula or calculation to convince you that there is a pretty good chance that random assignment variation alone would produce an associated two-tailed p-value less than 0.05 in at least one of those 16,369 combinations. The fact that the study reported one of these combinations doesn't provide much information about the evaluative penalty for darker skin.
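To check that count, and to see how quickly the multiple-comparisons problem grows, here is a quick computation in R. The 16,369 item combinations are correlated with one another, so the familywise error rate can't be computed this simply, but even 14 independent tests make the point:

```r
# All subsets of the 14 items, excluding the empty set and the 14 singletons:
sum(choose(14, 2:14))   # 16369, i.e., 2^14 - 14 - 1

# Even with only 14 *independent* tests of a true null, the chance that at
# least one reaches p < 0.05 is already about 51%:
1 - 0.95^14             # ~0.512
```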

The really discomforting part of this selective reporting is how transparently it was done: the main text of the article noted that only 3 of 14 puzzle-type items were selected, and the supplemental file included the items about Obama's competency, Obama's trustworthiness, and the Obama feeling thermometer. There was nothing hidden about this selective reporting, from what I can tell.

---

Notes:

1. For what it's worth, the survey had an item asking whether Obama's race is white, black, or mixed. But that doesn't seem to be useful for measuring an evaluative penalty for darker skin, so I didn't count it.

2. It's possible that the peer reviewers would have accepted fewer than 16,369 of these combinations. But that's an open question, given that the peer reviewers permitted 3 of 14 potential outcome variables to be reported [Update: ...in the main text of the article].

3. The data are not publicly available to analyze, so maybe the selective reporting in this instance didn't matter. I put in a request last week for the data, so hopefully we'll find out.

---

UPDATE (Jan 12, 2016)

1. I changed the title of the post from "Researchers select 1 of 16,369 combinations to report" to "Researchers select 2 of 16,369 combinations to report", because I overlooked footnote 16 in the article. Thanks to Solomon Messing for the pointer.

2. Omar Wasow noted that two of the items had a misspelling of Barack Obama's first name. Those misspellings appear in the questionnaire in the supplemental file for the article.

---

UPDATE (Jan 13, 2016)

1. Solomon Messing noted that data for the article are now available at the Dataverse. I followed as best I could the posted R code to reproduce the analysis in Stata, and I came close to the results reported in the article. I got the same percentages for the three word puzzles as the percentages that appear in the article: 33% for the lightened photo and 45% for the darkened photo, with a small difference in t-scores (t=2.74 versus t=2.64). Estimates and t-scores were also close for the reported result in footnote 16: estimates of 0.98 and 1.11 for me, and estimates of 0.97 and 1.11 in the article, with respective t-scores of 1.79 and 1.77. Compared to the 630 unexcluded respondents for the article, I had 5 extra respondents after exclusions (635 total).

The table below reports results from t-tests that I conducted. The Stata code is available here.

[Table: t-test results for the darkened versus lightened photo comparisons]
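Each row of the table comes from a comparison of this general shape. Here is a minimal sketch in R on simulated stand-in data; the variable names are hypothetical, the toy data contain no true effect, and my actual analysis was conducted in Stata (linked above):

```r
set.seed(1)
n <- 630   # roughly the number of unexcluded respondents

# Hypothetical 0/1 indicators: did the respondent complete each fragment
# with the stereotype-congruent word?
d <- data.frame(
  darkened = rep(0:1, each = n / 2),   # 1 = darkened photo condition
  lazy     = rbinom(n, 1, 0.4),
  poor     = rbinom(n, 1, 0.4),
  dirty    = rbinom(n, 1, 0.4)
)

# Index: share of the three puzzles completed in a stereotype-congruent way
d$index <- rowMeans(d[, c("lazy", "poor", "dirty")])
t.test(index ~ darkened, data = d)
```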

Let me note a few things from the table:

First, I reproduced the finding that, when the word puzzles were limited to the combination of lazy, dirty, and poor, unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

However, if I combine the word puzzles for race, minority, and rap, the finding is that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition: the opposite inference. Same thing when I combine race, minority, rap, and welfare. And same thing when I combine race, minority, rap, welfare, and crime.

Sure, as a group, these five stereotypes -- race, minority, rap, welfare, and crime -- don't have the highest face validity among the 11 stereotypes as candidates for the most negative stereotypes, but there doesn't appear to be anyone in political science enforcing a rule that researchers must report all potential or intended outcome variables.
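To illustrate how much room for combination-shopping the 11 word-puzzle items provide, here is a sketch in R that enumerates every combination of two or more items and conducts a t-test on each resulting index, again using toy data with no true effect; some of the 2,036 combinations will reach p < 0.05 in one direction or the other by chance alone:

```r
set.seed(2)
n <- 630
items <- matrix(rbinom(n * 11, 1, 0.4), nrow = n)  # 11 toy 0/1 items, no true effect
darkened <- rep(0:1, each = n / 2)

pvals <- unlist(lapply(2:11, function(k) {
  apply(combn(11, k), 2, function(idx) {
    t.test(rowMeans(items[, idx, drop = FALSE]) ~ darkened)$p.value
  })
}))

length(pvals)        # 2036 = 2^11 - 11 - 1 possible combinations
sum(pvals < 0.05)    # combinations "significant" by chance alone
```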

2. Estimates for 5 of the 11 stereotype items fell to the negative side of zero, indicating that unexcluded respondents in the lightened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the darkened photo condition. And estimates for 6 of the 11 stereotype items fell to the positive side of zero, indicating that unexcluded respondents in the darkened photo condition completed more word puzzles in a stereotype-congruent way than unexcluded respondents in the lightened photo condition.

A 5-to-6 split like that is what we'd expect if there were truly no effect, so -- in that sense -- this experiment doesn't provide much evidence for the relative effect of the darkened photo. That isn't a statement that the true relative effect of the darkened photo is exactly zero, but it is a statement about the evidence that this experiment has provided.
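Here is a quick check of that claim, treating the sign of each item's estimate as a fair coin flip under the null:

```r
# Probability of a split at least as lopsided as 5-to-6 under a fair coin:
binom.test(6, 11, p = 0.5)   # two-sided p = 1: nothing unusual about 5-to-6
```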

For what it's worth, the effect size is 0.118 and the p-value is 0.060 for the combination of word puzzles that I think has the most face validity for being the most negative stereotypes (lazy, poor, welfare, crime, drug, and dirty); the effect size is -0.032 and the p-value is 0.560 for the combination of word puzzles that I think has the least face validity for being the most negative stereotypes (race, black, brother, minority, and rap). So I'm not going to make any bets that the true effect is zero or that the lightened photo fosters relatively more activation of negative stereotypes.

3. Results for the competence, trustworthiness, and feeling thermometer items are pretty much what would be expected if the photo manipulation had no true effect on these items, with respective p-values of 0.904, 0.962, and 0.737. Solomon Messing noted that there is no expectation from the literature of an effect for these items, but now that I think of it, I'm not sure why a darkened photo of Obama should be expected to [1] make people more likely to call to mind negative racial stereotypes such as lazy and dirty but [2] have no effect on perceptions of Obama. In any event, I think that readers should have been told about the results for the competence, trustworthiness, and feeling thermometer items.

4. The report on these data suggested that the true effect is that the darkened photo increased stereotype activation. But I could have used the same data to argue that the darkened photo had no effect at all, or at best only a negligible effect, on stereotype activation and on attitudes toward Obama, had I reported the combination of all 11 word puzzles plus the competence, trustworthiness, and feeling thermometer items. Moreover, had I selectively reported results and failed to inform peer reviewers of all the items, it might even have been possible to publish an argument that the true effect was that the lightened photo caused an increase in stereotype activation. I don't know why I should trust non-preregistered research if researchers have that much influence over inferences.

5. Feel free to check my code for errors or to report better ways to analyze the data.

---

The Washington Post police shootings database indicated that on-duty police officers in the United States shot dead 91 unarmed persons in 2015: 31 whites, 37 blacks, 18 Hispanics, and 5 persons of another race or ethnicity. The database updates over time; the screen shot below reflects the data as of January 4, 2016.

[Screen shot: Washington Post police shootings database, unarmed persons shot dead in 2015]

The New York Times search engine restricted to dates in 2015 returned 1,281 hits for "unarmed black", 4 hits for "unarmed white", 0 hits for "unarmed Hispanic", and 0 hits for "unarmed Asian":

[Screen shots: New York Times search results for "unarmed black", "unarmed white", "unarmed Hispanic", and "unarmed Asian"]
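Here is the simple arithmetic on the figures above, expressed in R; note that the Post's "another race or ethnicity" category and the "unarmed Asian" search term do not line up exactly:

```r
hits   <- c(black = 1281, white = 4, hispanic = 0, asian = 0)   # NYT search hits, 2015
deaths <- c(black = 37,   white = 31, hispanic = 18, other = 5) # WaPo counts, 2015
round(hits / deaths, 2)   # hits per death: black ~34.6, white ~0.13, others 0
```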

---

The Monkey Cage published a post, "Racial prejudice is driving opposition to paying college athletes. Here's the evidence." I tweeted about this post in several threads, but I'm posting the information here for possible future reference and for anyone who reads the blog.

Here's the key figure from the post. The left side of the figure indicates that white respondents expressed more opposition to paying college athletes after exposure to a picture of black athletes than in a control condition with no picture.

After reading the post, I noted two oddities about the figure. First, based on the logic of an experiment -- change one thing only, to assess the effect of that thing -- the proper comparison for assessing racial bias among white respondents would have been the effect of a photo of black athletes against the effect of a photo of white athletes; that comparison would have removed the alternate explanations that respondents expressed more opposition because a photo was shown, or because a photo of athletes was shown, and not necessarily because a photo of *black* athletes was shown. Second, the data were from the CCES, which typically has team samples of 1,000 respondents; these samples are presumably intended to be representative of the national population, so a 1,000-respondent sample should contain well more than 411 whites.

Putting two and two together suggested that there was an unreported condition in which respondents were shown a photo of white athletes. I emailed the three authors of the blog post, and to their credit I received substantive replies to my questions about the experiment. Based on the team's responses, the experiment did have a condition in which respondents were shown a photo of white athletes, and opposition to paying college athletes in this "white athletes" photo condition did not differ at p<0.05 (two-tailed test) from opposition to paying college athletes in the "black athletes" photo condition.

---

There is a common practice of discussing inequality in the United States without reference to Asian Americans, which permits the suggestion that the inequality is due to race or racial bias. Here's a recent example:

The graph reported results for Hispanics disaggregated into Cubans, Puerto Ricans, Mexicans, and other Hispanics, but the graph omitted results for Asians and Pacific Islanders, even though the note for the graph indicated that Asians/Pacific Islanders were included in the model. Here are data on Asian American poverty rates (source):

[Graph: Asian American poverty rates (ACS)]

The omission of Asian Americans from discussions of inequality is a common enough practice [1, 2, 3, 4, 5] that it deserves a name. The Asian American Exclusion is as good a name as any.

---

Here is the manuscript that I plan to present at the 2015 American Political Science Association conference in September: revised version here. The manuscript contains links to locations of the data; a file of reproduction code for the revised manuscript is here.

Comments are welcome!

The abstract and the key figure are below:

Racial bias is a persistent concern in the United States, but polls have indicated that whites and blacks on average report very different perceptions of the extent and aggregate direction of this bias. Meta-analyses of results from a population of sixteen federally funded survey experiments, many of which have never been reported on in a journal or academic book, indicate the presence of a moderate aggregate black bias against whites but no aggregate white bias against blacks.

[Figure: meta-analysis results]

NOTE:

I made a few changes since submitting the manuscript: [1] removing all cases in which the target was not black or white (e.g., Hispanics, Asians, control conditions in which the target did not have a race); [2] estimating meta-analyses without removing cases based on a racial manipulation check; and [3] estimating meta-analyses without the Cottrell and Neuberg 2004 survey experiment, given that that survey experiment was more about perceptions of racial groups than about racial bias against particular targets.

Numeric values in the figure are for a meta-analysis that reflects [1] above:

* For white respondents: the effect size point estimate was 0.039 (p=0.375), with a 95% confidence interval of [-0.047, 0.124].
* For black respondents: the effect size point estimate was 0.281 (p=0.016), with a 95% confidence interval of [0.053, 0.509].
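As a consistency check, assuming the reported intervals are normal-theory 95% confidence intervals, the p-values can be recovered from the interval endpoints; here is a minimal sketch in R:

```r
# Back out the standard error from a 95% CI, then recompute the two-sided p.
ci_to_p <- function(est, lo, hi) {
  se <- (hi - lo) / (2 * qnorm(0.975))
  2 * pnorm(-abs(est / se))
}
ci_to_p(0.039, -0.047, 0.124)   # ~0.37  (reported p = 0.375, white respondents)
ci_to_p(0.281,  0.053, 0.509)   # ~0.016 (reported p = 0.016, black respondents)
```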

---

The meta-analysis graph includes five studies for which a racial manipulation check was used to restrict the sample: Pager 2006, Rattan 2010, Stephens 2011, Pedulla 2011, and Powroznik 2014. Inferences from the meta-analysis were the same when these five studies included respondents who failed the racial manipulation checks:

* For white respondents: the effect size point estimate was 0.027 (p=0.499), with a 95% confidence interval of [-0.051, 0.105].
* For black respondents: the effect size point estimate was 0.268 (p=0.017), with a 95% confidence interval of [0.047, 0.488].

---

Inferences from the meta-analysis were the same when the Cottrell and Neuberg 2004 survey experiment was removed from the meta-analysis. For the residual 15 studies using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.063 (p=0.114), with a 95% confidence interval of [-0.015, 0.142].
* For black respondents: the effect size point estimate was 0.210 (p=0.010), with a 95% confidence interval of [0.050, 0.369].

---

For the residual 15 studies not using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.049 (p=0.174), with a 95% confidence interval of [-0.022, 0.121].
* For black respondents: the effect size point estimate was 0.194 (p=0.012), with a 95% confidence interval of [0.044, 0.345].
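For readers unfamiliar with the mechanics, here is a minimal sketch of a random-effects meta-analysis in R using the metafor package; the effect sizes and sampling variances below are made up for illustration and are not the manuscript's data:

```r
library(metafor)  # install.packages("metafor") if needed

# Toy effect sizes (standardized mean differences) and sampling variances
# for a handful of made-up studies:
yi <- c(0.25, 0.10, -0.05, 0.40, 0.15)
vi <- c(0.02, 0.03, 0.05, 0.04, 0.02)

res <- rma(yi = yi, vi = vi, method = "REML")  # random-effects model
res          # pooled estimate, confidence interval, and p-value
forest(res)  # forest plot of the kind shown in the key figure
```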

---

The post is here.

Data for the Hutchings and Walton study are here and code is here.

Data for the Southern Focus Poll are here and code is here.

Here are factor analysis results for the Hutchings and Walton study and for the Southern Focus Poll.

---

UPDATE (July 7, 2015)

Corrected the code link for the Southern Focus Poll.

The Monkey Cage post is discussed in a scatterplot post.

More code to support this claim about Southern word choices to describe food, as mentioned here.

---

Here are four items typically used to measure symbolic racism, in which respondents are asked to indicate their level of agreement with the statements:

1. Irish, Italians, Jewish and many other minorities overcame prejudice and worked their way up. Blacks should do the same without any special favors.

2. Generations of slavery and discrimination have created conditions that make it difficult for blacks to work their way out of the lower class.

3. Over the past few years, blacks have gotten less than they deserve.

4. It's really a matter of some people not trying hard enough; if blacks would only try harder they could be just as well off as whites.

These four items are designed such that an antiblack racist would tend to respond the same way as a non-racist principled conservative. Many researchers recognize this conflation problem and make an effort to account for it. For example, here is an excerpt from Rabinowitz, Sears, Sidanius, and Krosnick 2010, explaining how responses to symbolic racism items might be influenced in part by non-racial values:

Adherence to traditional values—without concomitant racial prejudice—could drive Whites' responses to SR [symbolic racism] measures and their opinions on racial policy issues. For example, Whites' devotion to true equality may lead them to oppose what they might view as inherently inequitable policies, such as affirmative action, because it provides advantages for some social groups and not others. Similarly affirmative action may be perceived to violate the traditional principle of judging people on their merits, not their skin color. Consequently, opposition to such policies may result from their perceived violation of widely and closely held principles rather than racism.

However, this nuance is sometimes lost. Here is an excerpt from the Pasek, Krosnick, and Tompson 2012 manuscript that was discussed by the Associated Press shortly before the 2012 presidential election:

Explicit racial attitudes were gauged using questions designed to measure "Symbolic Racism" (Henry & Sears, 2002).

...

The proportion of Americans expressing explicit anti-Black attitudes held steady between 47.6% in 2008 and 47.3% in 2010, and increased slightly and significantly to 50.9% in 2012.

---

See here and here for a discussion of the Pasek et al. 2012 manuscript.
