Below is a discussion of small study effects in the data for the 2017 PNAS article, "Meta-analysis of field experiments shows no change in racial discrimination in hiring over time", by Lincoln Quillian, Devah Pager, Ole Hexel, and Arnfinn Midtbøen. The first part is the initial analysis that I sent to Dr. Quillian. The Quillian et al. team replied here, also available via this link a level up. I responded to this reply below my initial analysis and will notify Dr. Quillian of the reply. Please note that Quillian et al. 2017 mentions publication bias analyses on page 5 of its main text and in Section 5 of the supporting information appendix.

---

Initial analysis

A conclusion of the Quillian et al. 2017 PNAS article is that levels of discrimination against Black job applicants in the United States have not changed much, or at all, over the past 25 years. That conclusion is based on a meta-analysis of 1989-2015 field experiments assessing discrimination against Black or Hispanic job applicants relative to White applicants. The credibility of this conclusion depends, at a minimum, on the meta-analysis including the population of relevant field experiments or a representative set of relevant field experiments. However, the graph below for the set of Black/White discrimination field experiments is consistent with what would be expected if the meta-analysis did not have a complete set of studies.

[Comment Q2017 Figure 1: funnel plots]

The graphs plot a measure of the precision of each study against the corresponding effect size estimate, from the dmap_update_1024recoded_3.dta dataset available here. For a population of studies or for a representative set of studies, the pattern of points is expected to approximate a symmetric pyramid peaking at zero on the y-axis. The logic of this expectation is that, if there were a single true underlying effect, the size of that effect would equal the estimated effect size from a perfectly precise study, which would have a standard error of zero. The average effect size for less-than-perfectly-precise studies should also approximate the true effect size, but any given less-than-perfectly-precise study would not necessarily produce an estimate of the true effect size; instead, its estimate would be expected to fall to one side or the other of the true effect size, with estimates from lower-precision studies falling farther on average from the true effect size than estimates from higher-precision studies, thus creating the expected symmetric pyramid shape.
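For readers who want to draw this kind of plot themselves, below is a minimal R sketch using the metafor package. The column names yi (effect size estimate) and sei (standard error) are placeholders rather than the actual variable names in dmap_update_1024recoded_3.dta, so treat this as an illustration of the technique rather than a reproduction of my graphs.

    # Minimal funnel plot sketch (placeholder variable names yi and sei)
    library(haven)    # to read the Stata .dta file
    library(metafor)  # meta-analytic models and funnel plots

    dat <- read_dta("dmap_update_1024recoded_3.dta")

    # Fit a random-effects model, then plot the standard error (y-axis, zero at
    # the top) against the effect size estimate (x-axis); a complete or
    # representative set of studies should scatter roughly symmetrically
    # around the pooled estimate.
    res <- rma(yi = yi, sei = sei, data = dat)
    funnel(res)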

Egger's test assesses asymmetry in the shape of a pattern of points. The p-value of 0.003 for the Black/White set of studies indicates sufficient evidence to conclude with reasonable certainty that the pattern of points for the 1989-2015 set of Black/White discrimination field experiments is asymmetric. This particular pattern of asymmetry could have been caused by the higher-precision studies having tested for discrimination in situations with lower levels of anti-Black discrimination than the situations tested by the lower-precision studies. But this pattern could also have been produced by suppression of low-precision studies that had null results or results that indicated discrimination favoring Blacks relative to Whites.
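Continuing the sketch above, Egger's test can be run in R with metafor's regtest(), again with placeholder variable names; model = "lm" gives the classical weighted-regression version of the Egger test.

    # Egger's regression test for funnel plot asymmetry (placeholder variable names)
    library(haven)
    library(metafor)

    dat <- read_dta("dmap_update_1024recoded_3.dta")
    res <- rma(yi = yi, sei = sei, data = dat)

    # A small p-value indicates that smaller (less precise) studies report
    # systematically different effect sizes than larger studies, i.e.,
    # small study effects.
    regtest(res, model = "lm", predictor = "sei")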

Any inference from analyses of the set of 1989-2015 Black/White discrimination field experiments should thus consider the possibility that the set is incomplete and that any such incompleteness might bias inferences. For example, assessing patterns over time without any adjustment for possible missing studies requires an assumption that the inclusion of any missing studies would not alter the particular inference being made. That might be a reasonable assumption, but it should be identified as an assumption of any such inference.

The graphs below attempt to assess this assumption by plotting estimates for the 10 earliest 1989-2015 Black/White field experiments and the 10 latest 1989-2015 Black/White field experiments, excluding the study for which the dataset did not indicate the year of the fieldwork. Both graphs are at least suggestive of the same type of small study effects.

[Comment Q2017 Figure 2: funnel plots for the 10 earliest and 10 latest Black/White field experiments]

Statistical methods have been developed to estimate the true effect size in meta-analyses after accounting for the possibility that the meta-analysis does not include the population of relevant studies or at least a representative set of relevant studies. For example, the top 10 percent by precision method, the trim-and-fill method with a linear estimator, and the PET-PEESE method cut the estimate of discrimination across the Black/White discrimination field experiments from 36 percent fewer callbacks or interviews to 25 percent, 21 percent, and 20 percent, respectively. These estimates, though, depend heavily on a lack of publication bias in highly-precise studies, which adds another assumption to these analyses and underscores the importance of preregistering studies.
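None of these adjustments is specified above in enough detail to reproduce from this post alone, but the sketch below indicates roughly how each could be run in R with metafor. Variable names are again placeholders, and this generic code is not the exact specification behind the 25, 21, and 20 percent figures.

    # Generic sketches of the three adjustment methods (placeholder variable names)
    library(haven)
    library(metafor)

    dat <- read_dta("dmap_update_1024recoded_3.dta")
    res <- rma(yi = yi, sei = sei, data = dat)

    # Trim-and-fill with the linear (L0) estimator: imputes presumed-missing
    # studies on the sparse side of the funnel and re-estimates the pooled effect.
    trimfill(res, estimator = "L0")

    # PET: meta-regression of the effect size on its standard error; the intercept
    # estimates the effect for a hypothetical study with a standard error of zero.
    rma(yi = yi, sei = sei, mods = ~ sei, data = dat)

    # PEESE: same idea with the sampling variance as the moderator; PET-PEESE
    # conventionally reports the PEESE intercept when the PET intercept is
    # statistically distinguishable from zero.
    rma(yi = yi, sei = sei, mods = ~ I(sei^2), data = dat)

    # Top 10 percent by precision: pooled estimate from only the most precise studies.
    rma(yi = yi, sei = sei, data = dat[dat$sei <= quantile(dat$sei, 0.10), ])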

Social science should inform public beliefs and public policy, but the ability of social scientists to not report data that have been collected and analyzed cannot help but undercut this important role for social science. Social scientists should consider preregistering their plans to conduct studies and their planned research designs for analyzing data, to restrict their ability to suppress undesired results and to thus add credibility to their research and to social science in general.

---

Reply from the Quillian et al. team

Here

---

My response to the Quillian et al. reply

[1] The second section heading in the Quillian et al. reply correctly states that "Tests based on funnel plot asymmetry often generate false positives as indicators of publication bias". The Quillian et al. reply reported the funnel plot to the left below and an Egger's test p-value of 0.647 for the set of 13 Black/White discrimination resume audit (correspondence) field experiments, which provide little-to-no evidence of small study effects or publication bias. However, the funnel plot for the remaining set of 8 Black/White discrimination field experiments, the in-person audits, has an asymmetric shape and an Egger's test p-value of 0.043, indicative of small study effects.

[Comment Q2017 Figure 3]

The Quillian et al. reply indicated that "Using only resume audits to analyze change over time gives no trend (the linear slope is -.002, almost perfectly flat, shown in figure 3 in our original paper, and the weighted-average discrimination ratio is 1.32, only slightly below the ratio of all studies of 1.36)". For me at least, the lack of a temporal pattern in the resume audit (correspondence) field experiments is more convincing after seeing the funnel plot than it was before, although the inference is now limited to racial discrimination between 2001 and 2015 because the dataset contains no correspondence field experiments conducted between 1989 and 2000. The top graph below illustrates this nearly-flat -0.002 slope for correspondence audit field experiments. Presuming no publication bias or presuming a constant effect of publication bias, it is reasonable to infer that there was no decrease in the level of White-over-Black favoring in correspondence audit field experiments between 2001 and 2015.

[Comment Q2017 Figure 4]

But presuming no publication bias or presuming a constant effect of publication bias, the slope for in-person audits in the bottom graph above indicates a potentially alarming increase in discrimination favoring Whites over Blacks from the early 1990s to the post-2000 years, with a slope of 0.03 and a corresponding p-value of 0.08. But maybe there is a good reason to exclude the three field experiments from 1990 and 1991, given the decade gap between the latest of these three field experiments and the set of post-2000 field experiments. If so, the slope of the line for the Black/White discrimination correspondence studies and in-person audit studies pooled together from 2001 to 2015 is -0.02, with a p-value of 0.059, as depicted below.
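For reference, one way slopes like these could be estimated is a precision-weighted meta-regression of the effect size on the year of fieldwork, run separately by study method. The sketch below uses placeholder variable names (yi, sei, year, in_person) and is not necessarily the specification behind the numbers above.

    # Sketch: effect size regressed on year of fieldwork, by study method
    library(haven)
    library(metafor)

    dat <- read_dta("dmap_update_1024recoded_3.dta")

    corr_2001_2015 <- subset(dat, in_person == 0 & year >= 2001)  # correspondence audits
    in_person_all  <- subset(dat, in_person == 1)                 # in-person audits

    rma(yi = yi, sei = sei, mods = ~ year, data = corr_2001_2015)
    rma(yi = yi, sei = sei, mods = ~ year, data = in_person_all)
    rma(yi = yi, sei = sei, mods = ~ year, data = subset(dat, year >= 2001))  # pooled, 2001-2015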

[2] I don't object to the use of the publication bias test reported in Quillian et al. 2017. My main objections are to the non-reporting of a funnel plot and to basing the inference that "publication or write-up bias is unlikely to have produced inflated discrimination estimates" (p. 6 of the supporting information appendix) on a null result from a regression with 21 points and five independent variables. Trim-and-fill lowered the meta-analysis estimate from 0.274 to 0.263 for the 1989-2015 Black/White discrimination correspondence audits, but lowered the 1989-2015 Black/White discrimination in-person audit meta-analysis estimate from 0.421 to 0.158. The trim-and-fill decrease for the pooled set of 1989-2015 Black/White discrimination field experiments is from 0.307 to 0.192.

Funnel plots and corresponding tests of funnel plot asymmetry indicate at most the presence of small study effects, which could be caused by phenomena other than publication bias. The Quillian et al. reply notes that "we find evidence that the difference between in person versus resume audit may create false positives for this test" (p. 4). This information and the reprinted funnel plots below are useful because they suggest multiple reasons to not pool results from in-person audits and correspondence audits for Black/White discrimination, such as [i] the possibility of publication bias in the in-person audit set of studies or [ii] possible differences in mean effect sizes for in-person audits compared to correspondence audits.

[Comment Q2017 Figure 3]

Maybe the best way to report these results is a flat line for correspondence audits indicating no change between 2001 and 2015 (N=13), a downward-sloping-but-not-statistically-significant line for in-person audits between 2001 and 2015 (N=5), and an upward-sloping-but-not-statistically-significant line for in-person audits between 1989 and 2015 (N=8).

[3] This section discusses the publication bias test used by Quillian et al. 2017. I'll use "available" to describe field experiments retrieved in the search for published and unpublished field experiments.

The Quillian et al. reply (pp. 1-2) describes the logic of the publication bias test that they used as:

If publication bias is a serious issue, then studies that focus on factors other than race/ethnic discrimination should show lower discrimination than studies focused primarily on race/ethnicity, because for the latter studies (but not the former) publication should be difficult for studies that do not find significant evidence of racial discrimination.

The expectation, as I understand it, is that discrimination field experiments with race as the primary focus will have a range of estimates, some of which are statistically significant and some of which are not. If there is publication bias such that race-as-the-primary-focus field experiments that do not find discrimination against Blacks are less likely to be available than race-as-the-primary-focus field experiments that do find discrimination against Blacks, then the estimate of discrimination against Blacks in the available race-as-the-primary-focus field experiments should be artificially inflated above the true value of racial discrimination. The publication bias test compares this presumed inflated effect size to the effect size from field experiments in which race was not the primary focus, which presumably is closer to the true value of racial discrimination because non-availability of non-race-as-the-primary-focus field experiments is not driven primarily by the p-value and direction for racial discrimination but instead by the p-value and direction for the other type of discrimination. The publication bias test is thus whether the effect size for the available non-race-focused discrimination field experiments is smaller than the effect size for the available race-focused discrimination field experiments.

The effect size for racial discrimination from field experiments in which race was not the primary focus might still be inflated in the presence of publication bias because [non-race-as-the-primary-focus field experiments that don't find discrimination in the primary focus but do find discrimination in the race manipulation] are plausibly more likely to be available than [non-race-as-the-primary-focus field experiments that don't find discrimination in the primary focus or in the race manipulation].

But let's stipulate that the racial discrimination effect size from non-race-as-the-primary-focus field experiments should be smaller than the racial discrimination effect size from race-as-the-primary-focus field experiments. If so, how large must this expected difference be such that the observed null result (0.051 coefficient, 0.112 standard error) in the N=21 five-independent-variable regression in Table S7 of Quillian et al. 2017 should be interpreted as evidence of the absence of nontrivial levels of publication bias?

For what it's worth, the publication bias test in the regression below reflects the test used in Quillian et al. 2017, but with a different model and with removal of the three field experiments from 1990 and 1991, such that the sample is the set of Black/White discrimination field experiments from 2001 to 2015. The control for the study method indicates that in-person audits have an estimated 0.40 larger effect size than correspondence audits. The 95 percent confidence interval for the race_not_focus predictor ranges from -0.21 to 0.18. Is that range inconsistent with the expected value based on this test if there were nontrivial amounts of publication bias?
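The regression output itself appears in the figure below; as a sketch of how such a specification might look in R, the model is a meta-regression of the effect size on an indicator for race not being the study's primary focus, plus a study method control, restricted to the 2001-2015 Black/White field experiments. The variable names (yi, sei, year, race_not_focus, in_person) are placeholders for whatever the corresponding variables are called in the posted dataset.

    # Sketch of the meta-regression described above (placeholder variable names)
    library(haven)
    library(metafor)

    dat <- read_dta("dmap_update_1024recoded_3.dta")
    bw_2001_2015 <- subset(dat, year >= 2001 & year <= 2015)

    rma(yi = yi, sei = sei,
        mods = ~ race_not_focus + in_person,  # in_person: 1 = in-person audit, 0 = correspondence
        data = bw_2001_2015)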

[Comment Q2017 Figure 6]

---

Data available at the webpage for Quillian et al. 2017 [here]

My R code [here]

My Stata code [here]

---

Continuing from a Twitter thread that currently ended here...

Hi Jenn,

I don't think that it's disingenuous to compare two passages that assess discrimination in decision-making based on models of decision-making that lack measures of relevant non-discriminatory factors that could influence decisions. At that level of abstraction, the two passages are directly comparable.

My perception is that:

The evidence of discrimination against Asian Americans in the cited study about college admissions is stronger than the evidence of discrimination against Asian Americans in the cited study about earnings; therefore, not accepting the evidence of discrimination in the college admissions study as evidence of true discrimination suggests that the evidence of discrimination in the earnings study should also not be accepted as evidence of true discrimination.

I perceive the evidence of discrimination in the college admissions study to be stronger because [1] net of included controls, the college admissions gap appears to be larger than the earnings gap, [2] the college admissions study appears to have fewer and less important inferential issues involving samples and included controls [*], and [3] compared to decision-making about which applicants are admitted to a college, decision-making about how much a worker should be paid presumably involves more important information about relevant non-discriminatory factors that has not been included in the statistical controls of the studies.

Moreover, including evidence from outside these studies, legal cases involving racial discrimination in college admissions have often involved decision-making that explicitly includes race as a factor. My presumption is that a larger percentage of recent college admissions decisions have been made in which race is an explicit factor in admissions compared to the percentage of recent earnings decisions that have been made in which race is an explicit factor in worker remuneration.

For what it's worth, I think that a residual net racial discrimination is likely across a large number of important decisions made in the absence of perfect information, such as decisions involving college admissions and earnings, and I think that it is reasonable to accept evidence of discrimination against Asian Americans based on the studies cited in both passages.

---

[*] Support for [2] above:

[2a] The study that reported an 8% earnings gap was limited to data for men age 25 to 64 with a college degree who were participating in the labor market. Estimates for comparing earnings of White men to earnings of Asian men should be expected to be skewed to the extent that White men and Asian men with the same earnings potential have a different probability of being a college graduate or have a different probability of being in the labor market.

[2b] I don't think that naively controlling for cost of living is correct because higher costs of living partly reflect job perks that should not be completely controlled for. If, after adjusting for cost of living, a person who works in San Francisco has the same equivalent earnings as a person who works in an uncomfortably-humid rural lower-cost-of-living area with few amenities, the person who works in San Francisco is nonetheless better off in terms of climate and access to amenities.

---

I'm not sure that selectivity in immigration is relevant. The earnings models control for factors such as highest degree, field of study for the highest degree, and Carnegie classification of the school for the highest degree. It's possible that, net of these controls, Asian American men workers have higher earnings potential than White American men workers, but I'm not aware of evidence for this.

---

The Monkey Cage published a post by Dawn Langan Teele and Kathleen Thelen: "Some of the top political science journals are biased against women. Here's the evidence." The evidence presented for the claim of bias appears to be that women represent a larger percentage of the political science discipline than of authors in top political science journals. But that doesn't mean that the journals are biased against women, and the available data that I am aware of also doesn't indicate that the journals are biased against women:

1. Discussing data from World Politics (1999-2004), International Organization (2002), and Comparative Political Studies and International Studies Quarterly (three undisclosed years), Breuning and Sanders 2007 reported that "women fare comparatively well and appear in each journal at somewhat higher rates than their proportion among submitting authors" (p. 350).

2. Data for the American Journal of Political Science reported by Rick Wilson here indicated that 32% of submissions from 2010 to 2013 had at least one female author and 35% of accepted articles had at least one female author.

3. Based on data from 1983 to 2008 in the Journal of Peace Research, Østby et al. 2013 reported that: "If anything, female authors are more likely to be selected for publication [in JPR]".

4. Data below from Ishiyama 2017 for the American Political Science Review from 2012 to 2016 indicate that women served as first author for 27% of submitted manuscripts and 25% of accepted manuscripts.

[APSR data from Ishiyama 2017]

---

In this naive analysis, the data across the four points above do not indicate that these journals or their peer reviewers are biased against women. Of course, causal identification of bias would require a more representative sample than the largely volunteered data above and would require, for claims of bias among peer reviewers, statistical control for the quality of submissions and, for claims of bias at the editor level, statistical control for peer reviewer recommendations; analyses would get even more complicated when accounting for the possibility that editor bias can influence peer reviewer selection, which can make the review process easier or more difficult than would occur with unbiased assignment to peer reviewers.

Please let me know if you are aware of any other relevant data for political science journals.

---

NOTE

1 The authors of the Monkey Cage post have an article that cites Breuning and Sanders 2007 and Østby et al. 2013, but these data were not mentioned in the Monkey Cage post.

---

I recently blogged about the Betus, Lemieux, and Kearns Monkey Cage post (based on this Kearns et al. working paper) that claimed that "U.S. media outlets disproportionately emphasize the smaller number of terrorist attacks by Muslims".

I asked Kearns and Lemieux to share their data (I could not find an email for Betus). My request was denied until the paper was published. I tweeted a few questions to the coauthors about their data, but these tweets have not yet received a reply. Later, I realized that it would be possible to recreate or at least approximate their dataset because Kearns et al. included their outcome variable coding in the appendix of their working paper. I built a dataset based on [A] their outcome variable, [B] the Global Terrorism Database that they used, and [C] my coding of whether a given perpetrator was Muslim.

My analysis indicated that these data do not appear to support the claim of disproportionate media coverage of terror attacks by Muslims. In models with no control variables, terror attacks by Muslim perpetrators were estimated to receive 5.0 times as much media coverage as other terror attacks (p=0.008), but, controlling for the number of fatalities, this effect size drops to 1.53 times as much media coverage (p=0.480), which further drops to 1.30 times as much media coverage (p=0.622) after adding a control for attacks by unknown perpetrators, so that terror attacks by Muslim perpetrators are compared to terror attacks by known perpetrators who are not Muslim. See the Stata output below, in which "noa" is the number of articles and coefficients represent incident rate ratios:

[kearns et al 1: Stata output]

My code contains descriptions of corrections and coding decisions that I made. Data from the Global Terrorism Database are not permitted to be posted online without permission, so the code is the only information about the dataset that I am posting for now. However, the code describes how you can build your own dataset with Stata.
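For anyone who prefers R to Stata, models along the lines of the output above could be approximated as sketched below. This assumes a negative binomial specification for the article counts and uses placeholder names (kearns, muslim, fatalities, unknown_perp) for my reconstructed dataset and predictors, so it is an illustration rather than a reproduction of the exact commands.

    # Sketch of count models for the number of articles (noa), assuming a
    # negative binomial specification; predictor names are placeholders.
    library(MASS)

    m1 <- glm.nb(noa ~ muslim, data = kearns)
    m2 <- glm.nb(noa ~ muslim + fatalities, data = kearns)
    m3 <- glm.nb(noa ~ muslim + fatalities + unknown_perp, data = kearns)

    exp(coef(m3))  # exponentiated coefficients are incident rate ratios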

Below is the message that I sent to Kearns and Lemieux on March 17. Question 2 refers to the possibility that the Kearns et al. outcome variable includes news articles published before the identities of the Boston Marathon bombers were known; that lack of knowledge of who the perpetrators were makes it difficult to attribute that early media coverage to the Muslim identity of the perpetrators. Question 3 refers to the fact that the coefficient on the Muslim perpetrator predictor is larger when the number of fatalities coded for that attack is smaller; the Global Terrorism Database lists four rows of data for the Tsarnaev case, the first of which has only one fatality, so I wanted to check that there is no error about this in the Kearns et al. data.

Hi Erin,

I created a dataset from the Global Terrorism Database and the data in the appendix of your SSRN paper. I messaged the Monkey Cage about writing a response to your post, and I received the suggestion to communicate with you about the planned response post.

For now, I have three requests:

  1. Can you report the number of articles in your dataset for Bobby Joe Rogers [id 201201010020] and Ray Lazier Lengend? The appendix of your paper has perpetrator Ray Lazier Lengend associated with the id for Bobby Joe Rogers.
  2. Can you report the earliest published date and the latest published date among the 474 articles in your dataset for the Tsarnaev case?
  3. Can you report the number killed in your dataset for the Tsarnaev case?

I have attached a do file that can be used to construct my dataset and run my analyses in Stata. Let me know if you have any questions, see any errors, or have any suggestions.

Thanks,

L.J

I have not yet received a reply to this message.

I pitched a response post to the Monkey Cage regarding my analysis, but the pitch was not accepted, at least while the Kearns et al. paper is unpublished.

---

NOTES:

[1] Data from the Global Terrorism Database have this citation: National Consortium for the Study of Terrorism and Responses to Terrorism (START). (2016). Global Terrorism Database [Data file]. Retrieved from https://www.start.umd.edu/gtd.

[2] The method for eliminating news articles in the Kearns et al. working paper included this choice:

"We removed the following types of articles most frequently: lists of every attack of a given type, political or policy-focused articles where the attack or perpetrators were an anecdote to a larger debate, such as abortion or gun control, and discussion of vigils held in other locations."

It is worth assessing the degree to which this choice disproportionately reduces the count of articles for the Dylann Roof terror attack, which served as a background for many news articles about the display of the Confederate flag. It's not entirely clear why these types of articles should not be considered when assessing whether terror attacks by Muslims receive disproportionate media coverage.

[3] Controlling for attacks by unknown perpetrators, controlling for fatalities, and removing the Tsarnaev case drops the point estimate for the incident rate ratio to 0.89 (p=0.823).

---

According to its website, Visions in Methodology "is designed to address the broad goal of supporting women who study political methodology" and "serves to connect women in a field where they are under-represented." The Call for Proposals for the 2017 VIM conference indicates that submissions were restricted to women:

We invite submissions from female graduate students and faculty that address questions of measurement, causal inference, the application of advanced statistical methods to substantive research questions, as well as the use of experimental approaches (including incentivized experiments)...Please consider applying, or send this along to women you believe may benefit from participating in VIM!

Here is the program for the 2016 VIM conference, which lists activities restricted to women, lists conference participants (which appear to be only women), and has a photo that appears to be from the conference (which appears to have only women in the photo).

The 2017 VIM conference webpage indicates that the conference is sponsored by several sources such as the National Science Foundation and the Stony Brook University Graduate School. But page 118 of the NSF's Proposal & Award Policies & Procedures Guide (PAPPG) of January 2017 states:

Subject to certain exceptions regarding admission policies at certain religious and military organizations, Title IX of the Education Amendments of 1972 (20 USC §§ 1681-1686) prohibits the exclusion of persons on the basis of sex from any education program or activity receiving Federal financial assistance.  All NSF grantees must comply with Title IX.

The VIM conference appears to be an education program or activity receiving Federal financial assistance and, as such, submissions and conference participation should not be restricted by sex.

---

NOTES:

1. This Title IX Legal Manual discusses what constitutes an education program or activity:

While Title IX's antidiscrimination protections, unlike Title VI's, are limited in coverage to "education" programs or activities, the determination as to what constitutes an "education program" must be made as broadly as possible in order to effectuate the purposes of both Title IX and the CRRA. Both of these statutes were designed to eradicate sex-based discrimination in education programs operated by recipients of federal financial assistance, and all determinations as to the scope of coverage under these statutes must be made in a manner consistent with this important congressional mandate.

2. I think that the relevant NSF award is SES 1324159, which states that part of the project will "continue a series of small meetings for women methodologists that deliberately mix senior leaders in the subfield with young, emerging scholars who can benefit substantially from such close personal interaction." This page indicates that the 2014 VIM conference received support from NSF grant SES 1120976.

---

UPDATE [June 20, 2019]

I learned from a National Science Foundation representative of a statute (42 U.S. Code § 1885a) that permits the National Science Foundation to fund women-only activities listed in the statute. However, the Visions in Methodology conference has been funded by host organizations such as Stony Brook University, and I have not yet uncovered any reason why host institutions covered by Title IX would not be in violation of Title IX when funding single-sex educational opportunities.

---

The Monkey Cage published a post that claimed that "U.S. media outlets disproportionately emphasize the smaller number of terrorist attacks by Muslims". Such an inference depends on the control variables making all else equal, but the working paper on which the inference was based had few controls and few alternate specifications. The models controlled for fatalities, but the Global Terrorism Database that the paper used also lists the number of persons injured, and a measure of total casualties might be a better control than fatalities alone. For example, the Boston Marathon bombing is listed as having 1 fatality and 132 injured, but the models in the working paper would estimate the media coverage to be the same as if the bombing had had 1 fatality and 0 injured.

Moreover, as noted in the comments to the post, the Boston Marathon bombing is an outlier in terms of the outcome variable (20 percent of articles were devoted to that single event). But the working paper reported no model that omitted this outlier from the analysis, so it is not clear to what extent the estimates and inferences reflect a "Muslim perpetrator" effect or a "Boston Marathon bombing" effect. And, as also noted in the comments, proper controls would reflect the difference in expected media coverage for terrorist attacks in which the perpetrator was killed at the scene versus terrorist attacks in which there was a manhunt for the perpetrator.

Finally, from what I can tell based on the post and the working paper, the number of articles for the Boston Marathon bombing might include articles published before it was known or credibly suspected that the perpetrators were Muslim. If so, then the article count for the Boston Marathon bombing might be inflated because media coverage of the bombing before the religion of the perpetrators was known or credibly suspected cannot be attributed to the religion of the perpetrators.

My request for the data and code used for the post was declined, but hopefully I'll remember to check for the data and code after the working paper is published. In the meantime, I asked the authors on Twitter about inclusion of articles before the suspects were known and about results when the Boston Marathon bombing is excluded from the analysis.

---

My article reanalyzing data on a gender gap in citations to international relations articles indicated that the gender gap is largely confined to elite articles, defined as articles in the right tail of citation counts or articles in the top three political science journals. That article concerned an aggregate gender gap in citations, but this post is about a particular woman who has been under-cited in the social science literature.

It is not uncommon to read a list experiment study that suggests or states that the list experiment originated in the research described in the Kuklinski, Cobb, and Gilens 1997 article, "Racial Attitudes and the New South." For example, from Heerwig and McCabe 2009 (p. 678):

Pioneered by Kuklinski, Cobb, and Gilens (1997) to measure social desirability bias in reporting racial attitudes in the "New South," the list experiment is an increasingly popular methodological tool for measuring social desirability bias in self-reported attitudes and behaviors.

Kuklinski et al. described a list experiment that was placed on the 1991 National Race and Politics Survey. Kuklinski and colleagues appeared to propose the list experiment as a new measure (p. 327):

We offer as our version of an unobtrusive measure the list experiment. Imagine a representative sample of a general population divided randomly in two. One half are presented with a list of three items and asked to say how many of these items make them angry — not which specific items make them angry, just how many. The other half receive the same list plus an additional item about race and are also asked to indicate the number of items that make them angry. [screen shot]

The initial draft of my list experiment article reflected the belief that the list experiment originated with Kuklinski et al., but I then learned [*] of Judith Droitcour Miller's 1984 dissertation, which contained this passage:

The new item-count/paired lists technique is designed to avoid the pitfalls encountered by previous indirect estimation methods. Briefly, respondents are shown a list of four or five behavior categories (the specific number is arbitrary) and are then asked to report how many of these behaviors they have engaged in — not which categories apply to them. Nothing else is required of respondents or interviewers. Unbiased estimation is possible because two slightly different list forms (paired lists) are administered to two separate subsamples of respondents, which have been randomly selected in advance by the investigator. The two list forms differ only in that the deviant behavior item is included on one list, but omitted from the other. Once the alternate forms have been administered to the two randomly equivalent subsamples, an estimate of deviant behavior prevalence can be derived from the difference between the average list scores. [screen shot]

The above passage was drawn from pages 3 and 4 of Judith Droitcour Miller's 1984 dissertation at the George Washington University, "A New Survey Technique for Studying Deviant Behavior." [Here is another description of the method, in a passage from the 2004 edition of the 1991 book, Measurement Errors in Surveys (p. 88)]

It's possible that James Kuklinski independently invented the list experiment, but descriptions of the list experiment's origin should nonetheless cite Judith Droitcour Miller's 1984 dissertation as a prior — if not the first [**] — example of the procedure known as the list experiment.
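To make the shared logic of the two procedures concrete, here is a toy R simulation of the difference-in-means estimator that both Miller's item-count technique and the Kuklinski et al. list experiment rely on; the numbers are invented for illustration and do not come from any cited study.

    # Toy simulation of the item-count / list experiment estimator
    set.seed(1)
    n <- 1000
    treat <- rbinom(n, 1, 0.5)                 # half receive the list with the sensitive item
    nonsensitive <- rbinom(n, 3, 0.4)          # count of 3 non-sensitive items endorsed
    sensitive <- rbinom(n, 1, 0.2)             # true (unobserved) sensitive behavior
    count <- nonsensitive + treat * sensitive  # respondents report only the total count

    # The difference in mean counts estimates the prevalence of the sensitive item (~0.20)
    mean(count[treat == 1]) - mean(count[treat == 0])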

---

[*] I think it was the Adam Glynn manuscript described below through which I learned of Miller's dissertation.

[**] An Adam Glynn manuscript discussed the list experiment and item count method as special cases of aggregated response techniques. Glynn referenced a 1979 Raghavarao and Federer article, and that article referenced a 1974 Smith et al. manuscript that used a similar block total response procedure. The non-randomized version of the procedure split seven questions into groups of three, as illustrated in one of the questionnaires below. The procedure's unobtrusiveness derived from a researcher's inability in most cases to determine which responses a respondent had selected: for example, Yes-No-Yes produces the same total as No-No-No (5 in each case).

[block total response questionnaire]

The questionnaire for the randomized version of the block total response procedure listed all seven questions; the respondent then drew a number and gave a total response for only those three questions that were associated with the number that was drawn: for example, if the respondent drew a 4, then the respondent gave a total for their responses to questions 4, 5, and 7. This procedure is similar to the list experiment, but the list experiment is simpler and more efficient.
