This post reports on publication bias analyses for the Tara L. Mitchell et al. 2005 meta-analysis: "Racial Bias in Mock Juror Decision-Making: A Meta-Analytic Review of Defendant Treatment" [gated, ungated]. The appendices for the article contained a list of sample sizes and effect sizes, but the list did not match the reported results in at least one case. Dr. Mitchell emailed me a file of the correct data (here).

VERDICTS

Here is the funnel plot for the Mitchell et al. 2005 meta-analysis of verdicts:

mitchell-et-al-2005-verdicts-funnel-plot

Egger's test did not indicate, at the conventional level of statistical significance, funnel plot asymmetry in any of the four funnel plots: p=0.80 (white participants, published studies), p=0.82 (white participants, all studies), p=0.10 (black participants, published studies), and p=0.63 (black participants, all studies).
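Egger's test is a regression-based check for funnel plot asymmetry: regress the standardized effect (effect/SE) on precision (1/SE) and test whether the intercept differs from zero. The analyses reported here used R; below is a minimal Python sketch of the equivalent computation, not the code behind these results:

```python
import numpy as np
from scipy import stats

def eggers_test(effect, se):
    """Egger's regression test for funnel-plot asymmetry: regress the
    standardized effect (effect/se) on precision (1/se) and test whether
    the intercept differs from zero."""
    effect, se = np.asarray(effect, float), np.asarray(se, float)
    z, x = effect / se, 1.0 / se
    n = len(z)
    xbar, zbar = x.mean(), z.mean()
    sxx = ((x - xbar) ** 2).sum()
    slope = ((x - xbar) * (z - zbar)).sum() / sxx
    intercept = zbar - slope * xbar
    resid = z - (intercept + slope * x)
    sigma2 = (resid ** 2).sum() / (n - 2)        # residual variance
    se_intercept = np.sqrt(sigma2 * (1.0 / n + xbar ** 2 / sxx))
    t = intercept / se_intercept
    p = 2 * stats.t.sf(abs(t), df=n - 2)         # two-sided p-value
    return intercept, p
```

A large intercept with a small p-value suggests asymmetry; the p-values reported above were all at or above p=0.10, consistent with no detected asymmetry.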

Trim-and-fill with the L0 estimator imputed missing studies, for all four funnel plots, on the side of the plot indicating same-race favoritism:

mitchell-et-al-2005-verdicts-tf-l0

Trim-and-fill with the R0 estimator imputed missing studies only for the funnel plot for published studies with black participants:

mitchell-et-al-2005-verdicts-tf-r0

---
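For reference, the L0 and R0 estimators behind trim-and-fill estimate how many studies are "missing" from one side of the funnel, based on ranks of the deviations from a pooled estimate. Below is a simplified one-pass Python sketch; the actual Duval-and-Tweedie procedure iterates trimming and re-estimating the pooled mean, so this is illustrative only:

```python
import numpy as np

def k0_estimators(y, center=None):
    """Simplified one-pass L0 and R0 estimators (after Duval & Tweedie)
    of the number of studies missing from one side of a funnel plot,
    given effect sizes y centered at a pooled estimate."""
    y = np.asarray(y, float)
    n = len(y)
    if center is None:
        center = y.mean()               # crude stand-in for a pooled mean
    d = y - center
    ranks = np.argsort(np.argsort(np.abs(d))) + 1   # ranks of |deviations|
    t_n = float(ranks[d > 0].sum())                 # rank sum, right side
    l0 = (4.0 * t_n - n * (n + 1)) / (2.0 * n - 1)
    # R0: length of the run of most-extreme deviations that all sit on
    # the right side of the pooled estimate, minus 1
    gamma = 0
    for i in np.argsort(-np.abs(d)):    # most extreme first
        if d[i] > 0:
            gamma += 1
        else:
            break
    return max(int(round(l0)), 0), max(gamma - 1, 0)
```

A symmetric funnel yields estimates near zero for both estimators; a funnel whose most extreme effects all sit on one side yields positive estimates, and the imputed "missing" studies are mirrored onto the other side.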

SENTENCES

Here is the funnel plot for the Mitchell et al. 2005 meta-analysis of sentences:

mitchell-et-al-2005-sentences-funnel-plot

Egger's test did not indicate, at the conventional level of statistical significance, funnel plot asymmetry in any of the four funnel plots: p=0.14 (white participants, published studies), p=0.41 (white participants, all studies), p=0.50 (black participants, published studies), and p=0.53 (black participants, all studies).

Trim-and-fill with the L0 estimator imputed missing studies for the funnel plots with white participants, on the side of the plot indicating same-race favoritism:

mitchell-et-al-2005-sentences-tf-l0

Trim-and-fill with the R0 estimator did not impute any missing studies:

mitchell-et-al-2005-sentences-tf-r0

---

I also attempted to retrieve and plot data for the Ojmarrh Mitchell 2005 meta-analysis ("A Meta-Analysis of Race and Sentencing Research: Explaining the Inconsistencies"), but the data were reportedly lost in a computer crash.

---

NOTES:

1. Data and code for the Mitchell et al. 2005 analyses are here: data file for verdicts, data file for sentences, R code for verdicts, and R code for sentences.


Here's part of the abstract from Rios Morrison and Chung 2011, published in the Journal of Experimental Social Psychology:

In both studies, nonminority participants were randomly assigned to mark their race/ethnicity as either "White" or "European American" on a demographic survey, before answering questions about their interethnic attitudes. Results demonstrated that nonminorities primed to think of themselves as White (versus European American) were subsequently less supportive of multiculturalism and more racially prejudiced, due to decreases in identification with ethnic minorities.

So asking white respondents to select their race/ethnicity as "European American" instead of "White" influenced whites' attitudes toward and about ethnic minorities. The final sample for study 1 was a convenience sample of 77 self-identified whites and 52 non-whites, and the final sample for study 2 was 111 white undergraduates.

As I wrote before, if you're thinking that it would be interesting to see whether these results hold in a large, nationally representative sample, well, that was tried, with a survey experiment conducted as part of the Time Sharing Experiments for the Social Sciences. Here are the results:

mc2011reanalysis

I'm mentioning these results again because, in October 2014, the journal that published Rios Morrison and Chung 2011 desk-rejected the manuscript that I submitted describing these results. So you can read in the Journal of Experimental Social Psychology about the low-powered test of the "European American" versus "White" self-identification hypothesis on convenience samples, but you won't be able to read in JESP about the higher-powered test of that hypothesis on a nationally representative sample, with data collected by a disinterested third party.

I submitted a revision of the manuscript to Social Psychological and Personality Science, which extended a revise-and-resubmit offer conditional on inclusion of a replication of the TESS experiment. I planned to conduct an experiment with an MTurk sample, but I eventually declined the revise-and-resubmit opportunity for various reasons.

The most recent version of the manuscript is here. Links to data and code.


Timofey Pnin linked to an Alice Eagly article that mentioned these two meta-analyses:

  • van Dijk et al. 2012 "Defying Conventional Wisdom: A Meta-Analytical Examination of the Differences between Demographic and Job-Related Diversity Relationships with Performance"
  • Post and Byron 2015 "Women on Boards and Firm Financial Performance: A Meta-Analysis"

I wanted to check for funnel plot asymmetry in the set of studies in these meta-analyses, so I emailed coauthors of the articles. Hans van Dijk and Kris Byron were kind enough to send data.

The funnel plot for the 612 effect sizes in the van Dijk et al. 2012 meta-analysis is below. The second funnel plot below is a close-up of the bottom of the full funnel plot, limited to studies with fewer than 600 teams. The funnel plot is remarkably symmetric.

FP1

FP2

The funnel plots below are for the Post and Byron 2015 meta-analysis, with the full set of studies in the top funnel plot and, below the full funnel plot, a close-up of the studies with a standard error less than 0.4. The funnel plot is reasonably symmetric.

FP3

FP4

UPDATE (Apr 13, 2016):

More funnel plots from van Dijk et al. 2012.

Sample restricted to age diversity (DIV TYPE=1):

vDe - Age Diversity (1)

Sample restricted to race and ethnic diversity (DIV TYPE=2):

vDe - Race Ethnic Diversity (2)

Sample restricted to sex diversity (DIV TYPE=5):

vDe - Sex Diversity (5)

Sample restricted to education diversity (DIV TYPE=6):

vDe - Education Diversity (6)


Here is the manuscript that I plan to present at the 2015 American Political Science Association conference in September; a revised version is here. The manuscript contains links to the locations of the data, and a file of reproduction code for the revised manuscript is here.

Comments are welcome!

Abstract and the key figure are below:

Racial bias is a persistent concern in the United States, but polls have indicated that whites and blacks on average report very different perceptions of the extent and aggregate direction of this bias. Meta-analyses of results from a population of sixteen federally-funded survey experiments, many of which have never been reported on in a journal or academic book, indicate the presence of a moderate aggregate black bias against whites but no aggregate white bias against blacks.

Metan w mc

NOTE:

I made a few changes since submitting the manuscript: [1] removing all cases in which the target was not black or white (e.g., Hispanics, Asians, control conditions in which the target did not have a race); [2] estimating meta-analyses without removing cases based on a racial manipulation check; and [3] estimating meta-analyses without the Cottrell and Neuberg 2004 survey experiment, given that that survey experiment concerned perceptions of racial groups rather than racial bias against particular targets.

Numeric values in the figure are for a meta-analysis that reflects [1] above:

* For white respondents: the effect size point estimate was 0.039 (p=0.375), with a 95% confidence interval of [-0.047, 0.124].
* For black respondents: the effect size point estimate was 0.281 (p=0.016), with a 95% confidence interval of [0.053, 0.509].
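The pooled estimates above come from the manuscript's own meta-analysis. For illustration only, here is a generic inverse-variance random-effects pooling sketch in Python, using the DerSimonian-Laird between-study variance estimator; this is not the code behind the figure:

```python
import numpy as np

def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects pooled estimate with a 95% CI.
    y: per-study effect sizes; v: per-study sampling variances."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1.0 / v                                   # fixed-effect weights
    mu_fe = (w * y).sum() / w.sum()
    q = (w * (y - mu_fe) ** 2).sum()              # Cochran's Q
    df = len(y) - 1
    c = w.sum() - (w ** 2).sum() / w.sum()
    tau2 = max((q - df) / c, 0.0)                 # between-study variance
    w_re = 1.0 / (v + tau2)                       # random-effects weights
    mu = (w_re * y).sum() / w_re.sum()
    se = np.sqrt(1.0 / w_re.sum())
    return mu, (mu - 1.96 * se, mu + 1.96 * se)
```

Given per-study effect sizes and sampling variances, the function returns the pooled estimate and a 95 percent confidence interval; an interval that excludes zero corresponds to p<0.05, as with the black-respondent estimates above.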

---

The meta-analysis graph includes five studies for which a racial manipulation check was used to restrict the sample: Pager 2006, Rattan 2010, Stephens 2011, Pedulla 2011, and Powroznik 2014. Inferences from the meta-analysis were the same when these five studies included respondents who failed the racial manipulation checks:

* For white respondents: the effect size point estimate was 0.027 (p=0.499), with a 95% confidence interval of [-0.051, 0.105].
* For black respondents: the effect size point estimate was 0.268 (p=0.017), with a 95% confidence interval of [0.047, 0.488].

---

Inferences from the meta-analysis were the same when the Cottrell and Neuberg 2004 survey experiment was removed. For the remaining 15 studies, using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.063 (p=0.114), with a 95% confidence interval of [-0.015, 0.142].
* For black respondents: the effect size point estimate was 0.210 (p=0.010), with a 95% confidence interval of [0.050, 0.369].

---

For the remaining 15 studies, not using the racial manipulation check restriction:

* For white respondents: the effect size point estimate was 0.049 (p=0.174), with a 95% confidence interval of [-0.022, 0.121].
* For black respondents: the effect size point estimate was 0.194 (p=0.012), with a 95% confidence interval of [0.044, 0.345].


Here's the abstract of a PLoS One article, "Racial Bias in Perceptions of Others' Pain":

The present work provides evidence that people assume a priori that Blacks feel less pain than do Whites. It also demonstrates that this bias is rooted in perceptions of status and the privilege (or hardship) status confers, not race per se. Archival data from the National Football League injury reports reveal that, relative to injured White players, injured Black players are deemed more likely to play in a subsequent game, possibly because people assume they feel less pain. Experiments 1–4 show that White and Black Americans–including registered nurses and nursing students–assume that Black people feel less pain than do White people. Finally, Experiments 5 and 6 provide evidence that this bias is rooted in perceptions of status, not race per se. Taken together, these data have important implications for understanding race-related biases and healthcare disparities.

Here are descriptions of the samples for each experiment, after exclusions of respondents who did not meet criteria for inclusion:

  • Experiment 1: 240 whites from the University of Virginia psychology pool or MTurk
  • Experiment 2: 35 blacks from the University of Virginia psychology pool or MTurk
  • Experiment 3: 43 registered nurses or nursing students
  • Experiment 4: 60 persons from MTurk
  • Experiment 5: 104 persons from MTurk
  • Experiment 6: 245 persons from MTurk

Not the most representative samples, of course. If you're thinking that it would be interesting to see whether these results hold in a large, nationally representative sample, well, that was tried, with a survey experiment conducted as part of the Time Sharing Experiments for the Social Sciences. Here's the description of the results listed on the TESS site for the study:

Analyses yielded mixed evidence. Planned comparison were often marginal or non-significant. As predicted, White participants made (marginally) lower pain ratings for Black vs. White targets, but only when self-ratings came before target ratings. When target ratings came before self-ratings, White participants made (marginally) lower pain ratings for White vs. Black targets. Follow-up analyses suggest that White participants may have been reactant. White participants reported that they were most similar to the Black target and least similar to the White target, contrary to prediction and previous work both in our lab and others' lab. Moreover, White participants reported that Blacks were most privileged and White participants least privileged, again contrary to prediction and previous work both in our lab and others' lab.

The results of this TESS study do not invalidate the results of the six experiments and one archival study reported in the PLoS One article, but the non-reporting of the TESS study does raise questions about whether there were other unreported experiments and archival studies.

The TESS study had an unusually large and diverse sample: 586 non-Hispanic whites, 526 non-Hispanic blacks, 520 non-Hispanic Asians, and 528 Hispanics. It's too bad that these data were placed into a file drawer.


Andrew Gelman linked to a story (see also here) about a Science article by Annie Franco, Neil Malhotra, and Gabor Simonovits on the file drawer problem in the Time Sharing Experiments for the Social Sciences. TESS fields social science survey experiments, and sometimes the results of these experiments are not published.

I have been writing up some of these unpublished results but haven't submitted anything yet. Neil Malhotra was kind enough to indicate that I'm not stepping on their toes, so I'll post what I have so far for comment. From what I have been able to determine, none of the studies discussed below has been published, but let me know if I am incorrect about that. I'll try to post a more detailed write-up of these results soon; in the meantime, feel free to contact me for details on the analyses.

I've been concentrating on bias studies, because I figure that it's important to know if there is little-to-no evidence of bias in a large-scale nationally-representative sample; not that such a study proves that there's no bias, but reporting these studies helps to provide a better estimate for the magnitude of bias. It's also important to report evidence of bias in unexpected directions.

 

TESS 241

TESS study 241, based on a proposal from Stephen W. Benard, tested for race and sex bias in worker productivity ratings. Respondents received a vignette about the work behavior of a lawyer whose name was manipulated in the experimental conditions to signal the lawyer's sex and race: Kareem (black male), Brad (white male), Tamika (black female), and Kristen (white female). Respondents were asked how productive the lawyer was, how valuable the lawyer was, how hardworking the lawyer was, how competent the lawyer was, whether the lawyer deserved a raise, how respected the lawyer was, how honorable the lawyer was, how prestigious the lawyer was, how capable the lawyer was, how intelligent the lawyer was, and how knowledgeable the lawyer was.

Substantive responses to these eleven items were used to create a rating scale, with items standardized before summing and cases retained if there were substantive responses for at least three items; this scale had a Cronbach's alpha of 0.92. The scale was standardized so that its mean and standard deviation were respectively 0 and 1; higher values on the scale indicate more favorable evaluations.
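This scale-construction recipe recurs in the studies below: standardize the items, combine the available responses for cases with enough substantive answers, re-standardize, and report Cronbach's alpha. A minimal Python sketch follows; the function names are mine, not from the original analysis code, and the sketch averages the standardized items (summing yields the same standardized scale when every case answers every item):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_cases, k_items) array of complete item scores."""
    items = np.asarray(items, float)
    k = items.shape[1]
    item_var = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of total scores
    return k / (k - 1) * (1.0 - item_var / total_var)

def build_scale(items, min_items=3):
    """Standardize each item, average the available (non-missing) responses
    for cases with at least `min_items` substantive answers, then
    re-standardize the scale to mean 0 and SD 1."""
    items = np.asarray(items, float)
    z = (items - np.nanmean(items, axis=0)) / np.nanstd(items, axis=0, ddof=1)
    enough = (~np.isnan(z)).sum(axis=1) >= min_items
    scale = np.where(enough, np.nanmean(z, axis=1), np.nan)
    scale = (scale - np.nanmean(scale)) / np.nanstd(scale, ddof=1)
    return scale
```

Cases with too few substantive responses are set to missing rather than scored on partial information, mirroring the retention rule described above.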

Here is a chart of the main results, with experimental targets on the left side:

benard

The figure indicates point estimates and 95% confidence intervals for the mean level of evaluations in experimental conditions for all respondents and disaggregated groups; data were not weighted because the dataset did not contain a post-stratification weight variable.

The bias in this study is against Brad relative to Kareem, Kristen, and Tamika.

 

TESS 392

TESS study 392, based on a proposal from Lisa Rashotte and Murray Webster, tested for bias based on sex and age. Respondents were randomly assigned to receive a picture and text description of one of four target persons: Diane Williams, a 21-year-old woman; David Williams, a 21-year-old man; Diane Williams, a 45-year-old woman; and David Williams, a 45-year-old man. Respondents were asked to rate the target person on nine traits, drawn from Webster and Driskell (1983): intelligence, ability in situations in general, ability in things that the respondent thinks counts, capability at most tasks, reading ability, abstract abilities, high school grade point average, how well the person probably did on the Federal Aviation Administration exam for a private pilot license, and physical attractiveness. For the tenth item, respondents were shown their ratings for the previous nine items and given an opportunity to change their ratings.

The physical attractiveness item was used as a control variable in the analysis. Substantive responses to the other eight items were used to create a rating scale, with items standardized before summing and cases retained if the case had substantive responses for at least five items; this scale had a Cronbach's alpha of 0.91. The scale was standardized so that its mean and standard deviation were respectively 0 and 1; higher values on the scale indicate more favorable evaluations.

Here is a chart of the main results, with experimental targets on the left side:

rashotte

The figure indicates point estimates and 95% confidence intervals for the mean level of evaluations in experimental conditions for all respondents and disaggregated groups; data were weighted. Among women, the bias is in favor of older persons; among men, the bias is in favor of the older woman. Here's a table of 95% confidence intervals for mean rating differences for each comparison:

rashottetable

 

TESS 012

TESS study 012, based on a proposal from Emily Shafer, tested for bias for or against married women based on the women's choice of last name after marriage. The study's six conditions manipulated the married woman's last-name choice and the type of commitment that caused the woman to increase the burden on others: conditions 1 and 4, 2 and 5, and 3 and 6 respectively reflected the woman keeping her last name, hyphenating her last name, or adopting her husband's last name; the vignette for conditions 1, 2, and 3 indicated that the woman's co-workers were burdened because of the woman's marital commitment, and the vignette for conditions 4, 5, and 6 indicated that the woman's husband was burdened because of the woman's work commitment.

Substantive responses to items 1, 2, 5A, and 6A were used to create an "employee evaluation" scale, with items standardized before summing and cases retained if there were substantive responses for at least three items; this scale had a Cronbach's alpha of 0.73. Substantive responses to items 3, 4, 5B, and 6B were used to create a "wife evaluation" scale, with items standardized before summing and cases retained if there were substantive responses for at least three items; this scale had a Cronbach's alpha of 0.74. Both scales were standardized so that their mean and standard deviation were respectively 0 and 1 and then reversed so that higher scores indicated a more positive evaluation.

Results are presented for the entire sample, for men, for women, for persons who indicated that they were currently married or once married and used traditional last name patterns (traditional respondents), and for persons who indicated that they were currently married or once married but did not use traditional last name patterns (non-traditional respondents); name patterns were considered traditional for female respondents who changed their last name to their spouse's last name (with no last name change by the spouse), and male respondents whose spouse changed their last name (with no respondent last name change).

Here is a chart of the main results, with experimental conditions on the left side:

shafer

The figure displays point estimates and 95% confidence intervals for weighted mean ratings for each condition, adjusted for physical attractiveness. Not much bias detected here, except for men's wife evaluations when the target woman kept her last name.

 

TESS 714

TESS study 714, based on a proposal from Kimberly Rios Morrison, tested whether asking whites to report their race as white had a different effect on multiculturalism attitudes and prejudice than asking whites to report their ethnicity as European American. See here for published research on this topic.

Respondents were randomly assigned to one of three groups: respondents in the European American prime group were asked to identify their race/ethnicity as European American, American Indian or Alaska Native, Asian American or Pacific Islander, Black or African American, Hispanic/Latino, or Other; respondents in the White prime group were asked to identify their race/ethnicity from the same list but with European American replaced with White; and respondents in the control group were not asked to identify their race/ethnicity.

Respondents were shown 15 items regarding ethnic minorities, divided into four sections that we'll call support for multiculturalism, support for pro-ethnic policies, resentment of ethnic minorities, and closeness to whites. Scales were made for items from the first three sections. To create a "closeness to whites" scale, responses to the item on closeness to ethnic minorities were subtracted from responses to the item on closeness to nonminorities; this difference score was then standardized.
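That difference-score construction can be sketched as follows, assuming higher item values mean greater closeness; the function name is mine, not from the original code:

```python
import numpy as np

def closeness_to_whites(close_nonminorities, close_minorities):
    """Difference score: closeness to nonminorities minus closeness to
    ethnic minorities, standardized to mean 0 and SD 1."""
    d = (np.asarray(close_nonminorities, float)
         - np.asarray(close_minorities, float))
    return (d - d.mean()) / d.std(ddof=1)
```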

Here is a chart of the main results, with experimental conditions on the left side:

rios morrison

The figure displays weighted point estimates and 95% confidence intervals. The prime did not have much influence, except for the bottom right graph.

---

There's a LOT of interesting things in the TESS archives. Comparing reported results to my own analyses of the data (not for the above studies, but for other studies) has illustrated the inferential variation that researcher degrees of freedom can foster.

One way to assess claims of liberal bias in social science is to comb through data such as the TESS archives, which let us see what a sample of researchers are interested in and what they place into their file drawer. A null result in the file drawer is ambiguous, because we cannot tell whether the study was shelved due to the null result itself or due to the political valence of that result; a statistically significant result in the file drawer is much less ambiguous.

---

UPDATE (Sept 6, 2014):

Gábor Simonovits, one of the co-authors of the Science article, quickly and kindly sent me a Stata file of their dataset; those data, along with personal communication with Stephen W. Benard, indicate that results from none of the four studies reported in this post have been published.
