This R lesson covers plotting confidence intervals around point estimates. See here for other lessons.

---

Here are the first three lines of code:

pe <- c(2.48, 1.56, 2.96)
y.axis <- c(1:3)
plot(pe, y.axis, type="p", axes=T, pch=19, xlim=c(1,4), ylim=c(1,3))

The first line places 2.48, 1.56, and 2.96 into a vector called "pe" for point estimates; you can call the vector anything you want, as long as you use the same name consistently in the commands that follow.

The second line sends the integers from 1 to 3 into the vector "y.axis"; instead of y.axis <- c(1:3), you could have written y.axis <- c(1,2,3) to do the same thing.

The third line plots a graph with pe on the x-axis and y.axis on the y-axis; type="p" tells R to plot points, axes=T tells R to draw axes, pch=19 indicates what type of points to draw, xlim=c(1,4) indicates that the x-axis extends from 1 to 4, and ylim=c(1,3) indicates that the y-axis extends from 1 to 3.

Here's the graph so far:

[Graph ci1]

---

Let's make the points a bit larger by adding cex=1.2 to the end of the plot command.

Let's also add a title, using a new line of code: title(main="Negative Stereotype Disagreement > 3").
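With those two changes, the code so far should look something like this:

pe <- c(2.48, 1.56, 2.96)
y.axis <- c(1:3)
plot(pe, y.axis, type="p", axes=T, pch=19, xlim=c(1,4), ylim=c(1,3), cex=1.2)
title(main="Negative Stereotype Disagreement > 3")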

[Graph ci2]

---

Let's add the 95% confidence interval lines.

lower <- c(2.26, 1.17, 2.64)
upper <- c(2.70, 1.94, 3.28)
segments(lower, y.axis, upper, y.axis, lwd=1.3)

The first line indicates the lower ends of the confidence intervals; the second line indicates the upper ends of the confidence intervals; and the segments command draws line segments from the coordinate (lower, y.axis) to the coordinate (upper, y.axis), with lwd=1.3 indicating that the line should be slightly thicker than the default.

Here's what we have so far:

[Graph ci3]

---

Let's replace the x-axis and y-axis. First, change axes=T to axes=F in the plot command; then add the code axis(1, at=seq(1,4,by=1)) to tell R to draw an axis at the bottom from 1 to 4 with tick marks every 1 unit. Here's what we get:

[Graph ci4]

Let's get rid of the "pe" and "y.axis" labels. Add to the plot command: xlab="", ylab="". Here's the graph now:

[Graph ci5]

---

Let's work on the y-axis now:

names <- c("Baseline", "Black\nFamily", "Affirmative\nAction")
axis(2, at=y.axis, label=names)

The first line sends three phrases to the vector "names"; the \n in the phrases tells R to place "Family" and "Action" on a new line. The second line draws the y-axis (side 2) with tick marks at the y.axis positions, labeled with the names. Here's the result:

[Graph ci6]

Let's make the y-axis labels perpendicular to the y-axis by adding las=2 to the axis(2, ...) line. [las=0 would keep the labels parallel to the axis.]

[Graph ci7]

Now we need to add a little more space to the left of the graph to see the y-axis labels. Add par(mar=c(4, 6, 2, 0)) above the plot command to set the bottom, left, top, and right margins to 4, 6, 2, and 0, respectively.
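Pulling the steps together, the script so far should look something like this (reconstructed from the commands above):

pe <- c(2.48, 1.56, 2.96)
y.axis <- c(1:3)
lower <- c(2.26, 1.17, 2.64)
upper <- c(2.70, 1.94, 3.28)
names <- c("Baseline", "Black\nFamily", "Affirmative\nAction")

par(mar=c(4, 6, 2, 0))
plot(pe, y.axis, type="p", axes=F, pch=19, xlim=c(1,4), ylim=c(1,3), cex=1.2, xlab="", ylab="")
title(main="Negative Stereotype Disagreement > 3")
segments(lower, y.axis, upper, y.axis, lwd=1.3)
axis(1, at=seq(1,4,by=1))
axis(2, at=y.axis, label=names, las=2)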

[Graph ci8]

---

Let's say that I decided that I prefer to have Baseline at the top of the graph and Affirmative Action at the bottom. I could use the rev() function to reverse the order of the values in the plot, segments, and axis commands, as sketched below.
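One way to apply rev() is to reverse the data vectors and the labels while leaving the y positions alone (reversing y.axis instead would work just as well):

plot(rev(pe), y.axis, type="p", axes=F, pch=19, xlim=c(1,4), ylim=c(1,3), cex=1.2, xlab="", ylab="")
segments(rev(lower), y.axis, rev(upper), y.axis, lwd=1.3)
axis(2, at=y.axis, label=rev(names), las=2)

Here's the resulting graph: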

[Graph ci9]

---

Here is the whole code for the above graph. By the way, the graph above can be found in my article on social desirability in the list experiment, "You Wouldn't Like Me When I'm Angry."


This R lesson covers the plot command. See here for other lessons.

---

The start of this code is a bit complex. It's from R Commander, which is a way to use R through a graphical interface without having to write code.

library(foreign)

The library function loads the foreign package, which is used to import data from SPSS, Stata, and other statistical software.

DWHouse <- read.dta("C:/house_polarization46_113v9.dta", convert.dates=TRUE, convert.factors=TRUE, missing.type=TRUE, convert.underscore=TRUE, warn.missing.labels=TRUE)

The above command reads data from Stata (.dta extension) and places the data into DWHouse. The house_polarization46_113v9.dta dataset is from Voteview polarization data, located here. [The v9 on the end of the dataset indicates that I saved the dataset as Stata version 9.]

---

Here's the plot command:

plot(repmean1~year, type="p", xlim=c(1900,2012), ylim=c(-1,1), xlab="Year", ylab="Liberal - Conservative", pch=19, col="red", main="House", data=DWHouse)

Here is what the arguments mean: the tilde in repmean1~year plots repmean1 as a function of year, type="p" indicates to plot points, xlim=c(1900,2012) indicates the limits for the x-axis, ylim=c(-1,1) indicates the limits for the y-axis, xlab="Year" and ylab="Liberal - Conservative" respectively indicate labels for the x-axis and y-axis, pch=19 indicates to use plotting character 19 [see here for a list of pchs], col="red" indicates the color for the points [see here for a list of colors], main="House" indicates the main title, and data=DWHouse indicates the data to plot.

Here's what the graph looks like so far:

[Graph plotgop]

---

The repmean1 plotted above is the Republican Party mean for the first-dimension DW-Nominate scores among members of the House of Representatives. Let's add the Democrats. Instead of adding a new plot command, we just add points:

points(demmean1~year, type="p", pch=19, col="blue", data=DWHouse)

Now let's add some labels:

text(1960,0.4,labels="GOP mean", col="red")
text(1960,-0.4,labels="Dem mean", col="blue")

The first command adds text at the coordinate x=1960 and y=0.4; the text itself is "GOP mean," and the color of the text is red. The second command does the same for the "Dem mean" label in blue. I picked x=1960 and y=0.4 through trial and error to see where the text would look the nicest.

Here's the graph now:

[Graph plotgopdem]

---

Notice that the x-axis is labeled in increments of 20 years (1900, 1920, 1940, ...). This can be changed as follows. First, add axes=F to the plot command to shut off the axes (you could also write axes=FALSE); then add these axis lines below the plot command:

axis(1, at=seq(1900, 2020, 10))
axis(2, at=seq(-1, 1, 0.5))

The above lines tell R to plot axes at the indicated intervals. In the first line, the 1 tells R to draw the axis below the graph [1=below, 2=left, 3=above, and 4=right], and seq(1900, 2020, 10) tells R to run the axis from 1900 to 2020 with tick marks every 10 years. Here's the resulting graph:

[Graph plotgopdem20]

---

Notice that the x-axis and y-axis do not touch in the graph above. There are also a few extra points plotted that I did not intend to plot: I meant to start the graph at 1900 so that the first point would be 1901 (DW-Nominate scores are provided in the dataset every two years, starting with 1879). To get the x-axis and y-axis to touch, add xaxs="i", yaxs="i" to the plot command; this also cuts off the points before 1900. Let's also add box() to get a box around the graph, like we had in the first two graphs above.
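Putting these last changes into the code, the plotting commands now look something like this:

plot(repmean1~year, type="p", axes=F, xaxs="i", yaxs="i", xlim=c(1900,2012), ylim=c(-1,1), xlab="Year", ylab="Liberal - Conservative", pch=19, col="red", main="House", data=DWHouse)
points(demmean1~year, type="p", pch=19, col="blue", data=DWHouse)
text(1960,0.4,labels="GOP mean", col="red")
text(1960,-0.4,labels="Dem mean", col="blue")
axis(1, at=seq(1900, 2020, 10))
axis(2, at=seq(-1, 1, 0.5))
box()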

[Graph plotgopdem20i]

---

Here is the whole code for the plot above.


The first graph in this series is a barplot. This post will show how to add error bars to a barplot.

Here's the data that we want to plot, from a t-test conducted in Stata:

[Stata t-test output]

---

Here's the first part of the code:

library(Hmisc)

The code above loads the Hmisc package, which has the errbar function that we will use.

means <- c(2.96, 3.59)

The code above places 2.96 and 3.59 into the vector "means".

bp = barplot(means, ylim=c(0,6), names.arg=c("Black", "White"), ylab="Support for life in prison without parole", xlab="Race of the convicted teen", width=c(0.2,0.2), xlim=c(0,1), space=c(1,1), las=1, main="Black Non-Hispanic Respondents")

The code above is similar to the barplot code that we used before, but notice that this time the result of barplot is assigned to bp; barplot returns the x-coordinates of the bar midpoints, which we will use below to position the error bars. The remaining arguments are: means indicates what data to plot, ylim=c(0,6) indicates that the limits of the y-axis are 0 and 6, names.arg=c("Black", "White") indicates the names for the bars, ylab="Support for life in prison without parole" indicates the label for the y-axis, xlab="Race of the convicted teen" indicates the label for the x-axis, width=c(0.2,0.2) indicates the width of the bars, xlim=c(0,1) indicates that the limits of the x-axis are 0 and 1, space=c(1,1) indicates the spacing between bars, las=1 makes the axis tick labels horizontal, and main="Black Non-Hispanic Respondents" indicates the main title for the graph.

Here's the graph so far:

[Graph 95a]

---

Here's how to add the error bars:

se <- c(0.2346, 0.2022)
lower = means-1.96*se
upper = means+1.96*se
errbar(bp, means, upper, lower, add=T)

The first line sends the values for the standard errors into the vector "se". The second and third lines are used to calculate the ends of the error bars. The fourth line tells R to plot error bars; the add=T option tells R to keep the existing graph; without add=T, the graph will show only the error bars.
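As a quick check on that arithmetic, the computed endpoints come out to roughly:

means - 1.96*se   # lower ends: about 2.50 and 3.19
means + 1.96*se   # upper ends: about 3.42 and 3.99

These are close to, but not exactly, the limits reported in the Stata output (used in the shortened code below), presumably because of rounding and because Stata's t-test uses a t critical value rather than 1.96.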

Finally, add the code box(bty="L") so that there is a line along the bottom of the graph. The bty="L" tells R to draw the box in the shape of the letter L. Other options include "C", "O", "7", and "U".

Here is the graph now:

[Graph 95b]

---

It's not necessary to use the 1.96 multiplier for the error bars. The following code instead plugs in the lower and upper limits directly from the Stata output.

library(Hmisc)

means <- c(2.96, 3.59)

bp = barplot(means, ylim=c(0,6), names.arg = c("Black", "White"), ylab="Support for life in prison without parole", xlab="Race of the convicted teen", xpd=T, width=c(0.2,0.2), xlim=c(0,1), space=c(1,1), main="Black Non-Hispanic Respondents")

se <- c(0.2346, 0.2022)
lower = c(2.48, 3.19)
upper = c(3.42, 4.00)
errbar(bp, means, upper, lower, add=T)

box(bty="O")

---

Here's what the graph looks like for the above shortened code, with bty="O":

[Graph 95c]

---

Data from this post were drawn from here, with the article here. Click here for the graph code.


If I remember correctly, my first introduction to R came when fellow Pitt graduate student Hirokazu Kikuchi requested that R be installed on the polisci lab computers. I looked into R and found this webpage based on a 2007 Perspectives on Politics article by Jonathan Kastellec and Eduardo Leoni. That link is a good place to start, but in this post I'll introduce a few lines of code to illustrate how nice and easy R can be. (Not that R is always easy.)

I'll indicate lines of R code in bold.

---

less5 <- c(40.91, 7.67, 7.11, 6.19, 15.65, 6.4, 4.57, 4.43, 2.42, 4.66)

The above command assigns the ten numbers (from 40.91 to 4.66) to a vector called "less5." c() is a concatenation function. The following command does the same thing:

 c(40.91, 7.67, 7.11, 6.19, 15.65, 6.4, 4.57, 4.43, 2.42, 4.66) -> less5

---

barplot (less5, main="Countries with a mean < 5", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

The barplot function tells R to plot a bar chart. These are the arguments: less5 indicates the vector to plot, main="Countries with a mean < 5" indicates the main plot title, ylab="Percent" indicates the label for the y-axis, ylim=c(0, 40) indicates that the y-axis should run from 0 to 40, and names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10") indicates the set of names that should be placed below the set of bars.

Here's what this graph looks like, based on the two lines of code:

[Graph barplot1]

---

Let's plot three graphs together. Here's the code for graphs 2 and 3:

from56 <- c(18.35, 4.41, 5.68, 4.61, 22.63, 9.31, 7.63, 8.65, 4.99, 13.75)

barplot (from56, main="Countries with a mean > 5 and < 6", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

more6 <- c(7.99, 2.26, 3.37, 3.62, 17.29, 9.46, 8.95, 12.83 ,8.93, 25.3)

barplot (more6, main="Countries with a mean > 6", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

Let's put the par function at the top of the code to tell R how to plot these three graphs:

par(mfrow=c(1, 3))

The above line of code tells R to arrange the plots in 1 row and 3 columns.
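Assembled in order, the full script should look something like this (using the vectors and barplot calls from above):

par(mfrow=c(1, 3))
barplot (less5, main="Countries with a mean < 5", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
barplot (from56, main="Countries with a mean > 5 and < 6", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))
barplot (more6, main="Countries with a mean > 6", ylab="Percent", ylim=c(0, 40), names=c("1", "2", "3", "4", "5", "6", "7", "8", "9", "10"))

Here's the output: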

[Graph barplot3]

This is the output for par(mfrow=c(3, 1)):

[Graph barplot3v]

---

That's it for this post: here is a text file of the code. By the way, the graph above can be found in my article on midpoint misperceptions.


I have been trying to reproduce several studies and have noticed that the reporting of results from these studies often presents a much stronger impression of results than I get from an investigation of the data itself. I plan to report some of these reproduction attempts, so I have been reading literature on researcher degrees of freedom and the file drawer problem. Below I'll post and comment on some interesting passages that I have happened upon.

---

To put it another way: without modern statistics, we find it unlikely that people would take seriously a claim about the general population of women, based on two survey questions asked to 100 volunteers on the internet and 24 college students. But with the p-value, a result can be declared significant and deemed worth publishing in a leading journal in psychology. (Gelman and Loken, 2013, 14-15, emphasis in the original)

I wonder how many people in the general population take seriously general claims based on only small mTurk and college student samples, provided that these people are informed that these general claims are based only on small unrepresentative samples; I suspect that some of the "taking seriously" that leads to publication in leading psychology journals reflects professional courtesy among peer researchers whose work is also largely based on small unrepresentative samples.

---

Maybe it's because I haven't done much work with small unrepresentative samples, but I feel cheated when investing time in an article framed in general language that has conclusions based on small unrepresentative samples. Here's an article that I recently happened upon: "White Americans' opposition to affirmative action: Group interest and the harm to beneficiaries objection." The abstract:

We focused on a powerful objection to affirmative action – that affirmative action harms its intended beneficiaries by undermining their self-esteem. We tested whether White Americans would raise the harm to beneficiaries objection particularly when it is in their group interest. When led to believe that affirmative action harmed Whites, participants endorsed the harm to beneficiaries objection more than when led to believe that affirmative action did not harm Whites. Endorsement of a merit-based objection to affirmative action did not differ as a function of the policy’s impact on Whites. White Americans used a concern for the intended beneficiaries of affirmative action in a way that seems to further the interest of their own group.

So who were these white Americans?

Sixty White American students (37% female, mean age = 19.6) at the University of Kansas participated in exchange for partial course credit. One participant did not complete the dependent measure, leaving 59 participants in the final sample. (p. 898)

I won't argue that this sort of research should not be done, but I'd like to see this sort of exploratory research replicated with a more representative sample. One of the four co-authors listed her institutional affiliation at California State University San Bernardino, and two other co-authors listed their institutional affiliation at Tulane University, so I would have liked to have seen a second study among a different sample of students. At the very least, I'd like to see a description of the restricted nature of the sample in the abstract to let me and other readers make a more informed judgment about the value of investing time in the article.

---

The Gelman and Loken (2013) passage cited above reminded me of a recent controversy regarding a replication attempt of Schnall et al. (2008). I read about the controversy in a Nicole Janz post at Political Science Replication. The result of the replication (a perceived failure to replicate) was not shocking because Schnall et al. (2008) had reported only two experiments based on data from 40 and 43 University of Plymouth undergraduates.

---

Schnall in a post on the replication attempt:

My graduate students are worried about publishing their work out of fear that data detectives might come after them and try to find something wrong in their work. Doing research now involves anticipating a potential ethics or even criminal investigation.

I like the term "data detectives" a bit better than "replication police" (h/t Nicole Janz), so I think that I might adopt the label "data detective" for myself.

I can sympathize with the graduate students' fear, since someone might likewise target my work and try to find an error in it, but that's a necessary occupational hazard for a scientist.

The best way to protect research from data detectives is to produce research that is reproducible and perceived to be replicable; one of the worst ways is to publish low-powered studies in a high-profile journal, because the high profile draws attention and the low power increases suspicion that the finding was due to the non-reporting of failed experiments.

---

From McBee and Matthews (2014):

Researchers who try to serve the interests of science are going to find themselves out-competed by those who elect to “play the game,” because the ethical researcher will conduct a number of studies that will prove unpublishable because they lack statistically significant findings, whereas the careerist will find ways to achieve significance far more frequently. (p. 77)

This reflects part of the benefit produced by data detectives and the replication police: a more even playing field for researchers reluctant to take advantage of researcher degrees of freedom.

---

This Francis (2012) article is an example of a data detective targeting an article to detect non-reporting of experiments. Balcetis and Dunning (2010) reported five experiments rejecting the null hypothesis; the experiments had Ns, effect sizes, and powers as listed below in a table drawn from Francis (2012) p. 176.

[Table of Ns, effect sizes, and powers from Francis (2012), p. 176]

Francis summed the powers to get 3.11, which indicates the number of times that we should expect the null hypothesis to be rejected given the observed effect sizes and powers of the 5 experiments; Francis multiplied the powers to get 0.076, which indicates the probability that the null hypothesis will be rejected in all 5 experiments.

---

Here is Francis again detecting more improbable results. And again. Here's a back-and-forth between Simonsohn and Francis on Francis' publication bias studies.

---

Here's the Galak and Meyvis (2012) reply to another study in which Francis claimed to have detected non-reporting of experiments in Galak and Meyvis (2011). Galak and Meyvis admit to the non-reporting:

We reported eight successful demonstrations of this phenomenon in our paper, but we also conducted five additional studies whose results either did not reach conventional levels of significance or did reach significance but ended up being rhetorically redundant. (p. 595)

...but argue that it's not a problem because they weren't interested in effect sizes:

However, as is the case for many papers in experimental psychology, the goal was never to assess the exact size of the effect, but rather to test between competing theoretical predictions. (p. 595)

Even if it is true that the authors were unconcerned with effect size, I do not understand how that justifies not reporting results that fail to reach conventional levels of statistical significance.

So what about readers who *are* interested in effect sizes? Galak and Meyvis write:

If a researcher is interested in estimating the size of an effect reported in a published paper, we recommend asking the authors for their file drawer and conducting a meta-analysis. (p. 595-596)

That's an interesting solution: if you are reading an article and wonder about the effect size, put down the article, email the researchers, hope that the researchers respond, hope that the researchers send the data, and then -- if you receive the data -- conduct your own meta-analysis.


I discussed here some weird things that SPSS does with regard to weighting. Here's another weird thing, this time in Stata:

[Stata output for Q1 and Q1trunc]

The variable Q1 has a minimum of 0 and a maximum of 99,999. For this particular survey question, 99,999 is not a believable response; so, instead of letting 99,999 and other unbelievable responses influence the results, I truncated Q1 at 100, so that all responses above 100 equaled 100. There are other ways of handling unbelievable responses, but this can work as a first pass to assess whether the unbelievable responses influenced results.

The command replace Q1trunc = 100 if Q1 > 100 tells Stata to replace all responses over 100 with a response of 100; but notice that this replacement increased the number of observations from 2008 to 2065; that's because Stata treated the 57 missing values as positive infinity and replaced these 57 missing values with 100.

Here's a line from Stata's help missing documentation:

all nonmissing numbers < . < .a < .b < ... < .z

Stata has a reason for treating missing values as positive infinity, as explained here. But -- unless users are told of this -- it is not obvious that Stata treats missing values as positive infinity, so this appears to be a source of potential error for code with a > sign and missing values.

Here's how to recode the command so that missing values remain missing: replace Q1trunc = 100 if Q1 > 100 & Q1 < .


This post presents selected excerpts from Jesper W. Schneider's 2014 Scientometrics article, "Null hypothesis significance tests. A mix-up of two different theories: the basis for widespread confusion and numerous misinterpretations" [ungated version here]. For the following excerpts, most citations have been removed, and page-number references are not included because my copy of the article lacked page numbers.

The first excerpt notes that the common procedure followed in most social science research is a mishmash of two separate procedures:

What is generally misunderstood is that what today is known, taught and practiced as NHST [null hypothesis significance testing] is actually an anonymous hybrid or mix-up of two divergent classical statistical theories, R. A. Fisher’s 'significance test' and Neyman's and Pearson's 'hypothesis test'. Even though NHST is presented somewhat differently in statistical textbooks, most of them do present p values, null hypotheses (H0), alternative hypotheses (HA), Type I (α) and II (β) error rates as well as statistical power, as if these concepts belong to one coherent theory of statistical inference, but this is not the case. Only null hypotheses and p values are present in Fisher's model. In Neyman–Pearson's model, p values are absent, but contrary to Fisher, two hypotheses are present, as well as Type I and II error rates and statistical power.

The next two excerpts contrast the two procedures:

In Fisher's view, the p value is an epistemic measure of evidence from a single experiment and not a long-run error probability, and he also stressed that 'significance' depends strongly on the context of the experiment and whether prior knowledge about the phenomenon under study is available. To Fisher, a 'significant' result provides evidence against H0, whereas a non-significant result simply suspends judgment—nothing can be said about H0.

They [Neyman and Pearson] specifically rejected Fisher’s quasi-Bayesian interpretation of the 'evidential' p value, stressing that if we want to use only objective probability, we cannot infer from a single experiment anything about the truth of a hypothesis.

The next excerpt reports evidence that p-values overstate the evidence against the null hypothesis. I have retained the reference citations here:

Using both likelihood and Bayesian methods, more recent research have demonstrated that p values overstate the evidence against H0, especially in the interval between significance levels 0.01 and 0.05, and therefore can be highly misleading measures of evidence (e.g., Berger and Sellke 1987; Berger and Berry 1988; Goodman 1999a; Sellke et al. 2001; Hubbard and Lindsay 2008; Wetzels et al. 2011). What these studies show is that p values and true evidential measures only converge at very low p values. Goodman (1999a, p. 1008) suggests that only p values less than 0.001 represent strong to very strong evidence against H0.

This next excerpt emphasizes the difference between p and alpha:

Hubbard (2004) has referred to p < α as an 'alphabet soup', that blurs the distinctions between evidence (p) and error (α), but the distinction is crucial as it reveals the basic differences underlying Fisher’s ideas on 'significance testing' and 'inductive inference', and Neyman–Pearson views on 'hypothesis testing' and 'inductive behavior'.

The next excerpt contains a caution against use of p-values in observational research:

In reality therefore, inferences from observational studies are very often based on single non-replicable results which at the same time no doubt also contain other biases besides potential sampling bias. In this respect, frequentist analyses of observational data seems to depend on unlikely assumptions that too often turn out to be so wrong as to deliver unreliable inferences, and hairsplitting interpretations of p values becomes even more problematic.

The next excerpt cautions against incorrect interpretation of p-values:

Many regard p values as a statement about the probability of a null hypothesis being true or conversely, 1 − p as the probability of the alternative hypothesis being true. But a p value cannot be a statement about the probability of the truth or falsity of any hypothesis because the calculation of p is based on the assumption that the null hypothesis is true in the population.

The final excerpt is a hopeful note that the importance attached to p-values will wane:

Once researchers recognize that most of their research questions are really ones of parameter estimation, the appeal of NHST will wane. It is argued that researchers will find it much more important to report estimates of effect sizes with CIs [confidence intervals] and to discuss in greater detail the sampling process and perhaps even other possible biases such as measurement errors.

The Schneider article is worthwhile for background and information on p-values. I'd also recommend this article on p-value misconceptions.
