My previous posts discussed the p-values that the base module of SPSS reports for statistical significance tests on weighted data; those p-values are not correct for probability-weighted analyses. Jon Peck informed me of SPSS Complex Samples, which can produce correct p-values for statistical significance tests in probability-weighted analyses. Complex Samples does not have the most intuitive setup, so this post describes the procedure for analyzing data with probability weights in SPSS Statistics 21.

[Image: SPSS0]

[Image: SPSS1]

The dataset that I was working with had probability weights but no clustering or stratification, so the Stratify By and Clusters boxes remain empty in the image below.

[Image: SPSS4]
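
For comparison, the analogous weights-only design can be declared in Stata with svyset. The sketch below is minimal and assumes hypothetical variable names: weight for the probability weight and outcome for an analysis variable.

  * Declare a weights-only survey design: no strata, no clusters.
  svyset [pweight=weight]

  * Subsequent svy-prefixed commands then use this design, for example:
  svy: mean outcome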

The next dialog box has options for Simple Systematic and Simple Sequential. Either method will work if Proportions are set to 1 in the subsequent dialog box.

[Image: SPSS3]

[Image: SPSS4]

[Images: SPSS5 through SPSS9]

I conducted an independent samples t-test, so I selected the General Linear Model command below.

[Image: SPSS10]

[Image: SPSS11]

Click the Statistics button in the image above and then check the t-test box in the image below to tell SPSS to conduct a t-test.

[Image: SPSS12]

[Image: SPSS13]

Hit OK to get the output.

[Image: rattan2012outputSPSS]

The SPSS output above has the same p-value as the probability-weighted Stata output below.

[Image: rattan2012outputStata]
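
The post does not show the Stata commands, but a minimal sketch that would produce a probability-weighted comparison like the one above follows, assuming hypothetical names y for the outcome, group for the two-group indicator, and weight for the probability weight. Stata's ttest command does not accept probability weights, so the comparison is run as a regression of the outcome on the group indicator.

  * Declare the weights-only design and run the weighted comparison.
  svyset [pweight=weight]
  svy: regress y i.group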

---

My previous post discussed p-values in SPSS and Stata for probability-weighted data. This post provides more information on weighting in the base module of SPSS. Data in this post are from Craig and Richeson (2014), downloaded from the TESS archives; SPSS commands are from personal communication with Maureen Craig, who kindly and quickly shared her replication code.

Figure 2 in Craig and Richeson's 2014 Personality and Social Psychology Bulletin article depicts point estimates and standard errors for racial feeling thermometer ratings made by white non-Hispanic respondents. The article text confirms what the figure shows: whites in the racial shift condition (who were exposed to a news article titled "In a Generation, Racial Minorities May Be the U.S. Majority") rated Blacks/African Americans, Latinos/Hispanics, and Asian-Americans significantly lower on the feeling thermometers than did whites in the control condition (who were exposed to a news article titled "U.S. Census Bureau Reports Residents Now Move at a Higher Rate").

[Image: CraigRicheson2014PSPB]

Craig and Richeson generated a weight variable that retained the original post-stratification weights for non-Hispanic white respondents but changed the weight to 0.001 for respondents who were not non-Hispanic white. Figure 2 results were drawn from the SPSS UNIANOVA command, which "provides regression analysis and analysis of variance for one dependent variable by one or more factors and/or variables," according to the SPSS web entry for the UNIANOVA command.
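
A minimal sketch of that weight construction in Stata, assuming a post-stratification weight variable named weight and an indicator named whiteNH that equals 1 for white non-Hispanic respondents (both hypothetical names):

  * Retain the original weight for white non-Hispanic respondents;
  * set the weight to 0.001 for everyone else.
  gen weightCR = weight
  replace weightCR = 0.001 if whiteNH != 1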

The SPSS output below represents a weighted analysis in the base SPSS module for the command UNIANOVA therm_bl BY dummyCond WITH cPPAGE cPPEDUCAT cPPGENDER, in which:

  1. therm_bl is the numeric rating of blacks on a 0-to-100 feeling thermometer scale;
  2. dummyCond is a dummy variable indicating whether the respondent received the control news article or the treatment news article;
  3. cPPAGE is respondent age;
  4. cPPEDUCAT is respondent education on a four-level scale; and
  5. cPPGENDER is respondent sex.

The 0.027 Sig. value for dummyCond indicates that the mean thermometer rating made by white non-Hispanics in the control condition differed from the mean thermometer rating made by white non-Hispanics in the treatment condition at the 0.027 level of statistical significance.

[Image: CR2014PSPB]

The image below presents results for the same analysis conducted with probability weights in Stata, in which weightCR is a weight variable mimicking the post-stratification weight created by Craig and Richeson. The corresponding p-value is 0.182, not 0.027: the Stata p-value reflects a probability-weighted analysis, while the SPSS p-value reflects a frequency-weighted analysis.

[Image: CR2014bl0]
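
The post does not show the exact Stata command; a sketch that would run this probability-weighted analysis, using the variable names given above, is:

  * Probability-weighted regression of the black feeling thermometer
  * on condition, age, education, and sex.
  regress therm_bl i.dummyCond cPPAGE cPPEDUCAT cPPGENDER [pw=weightCR]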

So why did SPSS return a p-value of 0.027 for dummyCond?

The image below is drawn from online documentation for the SPSS weight command. The second bullet point indicates that SPSS often rounds fractional weights to the nearest integer. The third bullet point indicates that SPSS statistical procedures ignore cases with a weight of zero, so cases with fractional weights that round to zero will be ignored. The first bullet point indicates that SPSS arithmetically replicates a case according to the weight variable: for instance, SPSS treats a case with a weight of 3 as if that case were 3 independent and identical cases.

[Image: weightsSPSS]
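
A toy Stata demonstration of this rounding-and-replication logic, using made-up data:

  * Three cases with fractional weights 0.4, 1.6, and 3.0.
  clear
  input y w
  1 0.4
  2 1.6
  3 3.0
  end
  gen wround = round(w)   // rounds to 0, 2, and 3
  expand wround           // zero-weight case kept once; others replicated
  * The dataset now has 1 + 2 + 3 = 6 cases; mimicking SPSS also
  * requires dropping the zero-weight case:
  drop if wround == 0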

Let's see if this is what SPSS did. The command gen weightCRround = round(weightCR) in the Stata output below generates a variable with the values of weightCR rounded to the nearest integer. When the Stata regression used the frequency weight option with this rounded weight variable, Stata reported p-values identical to the SPSS p-values.

[Image: CR2014bl2]
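
A sketch of that frequency-weighted run, combining the command quoted above with a hypothetical regression call (frequency weights must be integers in Stata):

  * Round the probability weights to integers, as SPSS does.
  gen weightCRround = round(weightCR)
  regress therm_bl i.dummyCond cPPAGE cPPEDUCAT cPPGENDER [fw=weightCRround]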

The Stata output below illustrates what happened in the above frequency-weighted analysis. The expand weightCRround command added n-1 copies of each dataset case, in which n is the value of the weightCRround variable: for example, each case with a weightCRround value of 3 now appears three times in the dataset. Stata retained one instance of each case with a weightCRround value of zero, but SPSS ignores cases with a weight of zero in weighted analyses; therefore, to mimic SPSS, the regression excluded cases with a zero value for weightCRround.

Stata p-values from an unweighted regression on this adjusted dataset were identical to SPSS p-values reported using the Craig and Richeson commands.

[Image: CR2014bl3]
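
A sketch of the expansion approach just described:

  * Replace each case with n copies, where n is weightCRround;
  * expand retains zero-weight cases once without copying them.
  expand weightCRround
  * SPSS ignores zero-weight cases, so drop them to mimic SPSS.
  drop if weightCRround == 0
  * Unweighted regression on the adjusted dataset.
  regress therm_bl i.dummyCond cPPAGE cPPEDUCAT cPPGENDER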

So how much did SPSS alter the dataset? The output below is for the original dataset: the racial shift and control conditions respectively had 233 and 222 white non-Hispanic respondents with full data on therm_bl, cPPAGE, cPPEDUCAT, and cPPGENDER; the difference in mean therm_bl ratings across conditions was 3.13 units.

[Image: CR2014bl4before]

The output below is for the dataset after executing the round and expand commands: the racial shift and control conditions respectively had 189 and 192 white non-Hispanic respondents with a non-zero weight and full data on therm_bl, cPPAGE, cPPEDUCAT, and cPPGENDER; the difference in mean therm_bl ratings across conditions was 4.67, a 49 percent increase over the original difference of 3.13 units.

[Image: CR2014bl4after]
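
A sketch of the descriptive checks reported above, restricting to cases with full data on the model variables (rowmiss is an egen function that counts missing values across the listed variables):

  * Count cases and compare mean ratings by condition.
  egen nmiss = rowmiss(therm_bl cPPAGE cPPEDUCAT cPPGENDER)
  tabulate dummyCond if nmiss == 0
  mean therm_bl if nmiss == 0, over(dummyCond)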

---

Certain weighted procedures in the SPSS base module report p-values identical to Stata p-values when weights are rounded, cases are expanded by those weights, and cases with a zero weight are ignored. Other weighted procedures in the SPSS base module report p-values identical to Stata p-values when the importance weight option is selected, or when the analytic weight option is selected and the weights sum to 1.

(Stata's analytic weight option treats each weight as an indication of the number of observations represented in a particular case; for instance, an analytic weight of 4 indicates that the values for the corresponding case reflect the mean values for four observations; see here.)

Test analyses that I conducted produced the following relationships between SPSS output and Stata output; a sketch of the corresponding Stata weight options appears after the lists below.

SPSS weighted base module procedures that reported p-values identical to Stata p-values when weights were rounded, cases were expanded by those weights, and cases with a zero weight were ignored:

  1. UNIANOVA with weights indicated in the WEIGHT BY command

SPSS weighted base module procedures that reported p-values identical to Stata p-values when the importance weight option was selected, or when the analytic weight option was selected and the weights summed to 1:

  1. Independent samples t-test
  2. Linear regression with weights indicated in the WEIGHT BY command
  3. Linear regression with weights indicated in the REGWT subcommand in the regression menu (weighted least squares analysis)
  4. UNIANOVA with weights indicated in the REGWT subcommand in the regression menu (weighted least squares analysis)
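
For reference, a sketch of the Stata weight options involved in these comparisons, with hypothetical variables y, x, and wt:

  regress y x [pw=wt]   // probability weights: robust standard errors
  regress y x [aw=wt]   // analytic weights: rescaled to sum to the number of cases
  regress y x [fw=wt]   // frequency weights: integer counts of duplicated cases
  regress y x [iw=wt]   // importance weights: used as given, without rescaling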

---

SPSS has a procedure that correctly calculates p-values with survey weights, as Jon Peck noted in a comment to the previous post. The next post will describe that procedure.

---

UPDATE (June 20, 2015)

Craig and Richeson have issued a corrigendum to the "On the Precipice of a 'Majority-Minority' America" article that had used incorrect survey weights.

---

Here are t-scores and p-values from a set of t-tests that I recently conducted in SPSS and in Stata:

Group 1 unweighted
t = 1.082 in SPSS (p = 0.280)
t = 1.082 in Stata (p = 0.280)

Group 2 unweighted
t = 1.266 in SPSS (p = 0.206)
t = 1.266 in Stata (p = 0.206)

Group 1 weighted
t = 1.79 in SPSS (p = 0.075)
t = 1.45 in Stata (p = 0.146)

Group 2 weighted
t = 2.15 in SPSS (p = 0.032)
t = 1.71 in Stata (p = 0.088)

There was no difference between unweighted SPSS p-values and unweighted Stata p-values, but the weighted SPSS p-values fell under conventional levels of statistical significance (0.10 for Group 1 and 0.05 for Group 2) that the probability-weighted Stata p-values did not.
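
The unweighted results can be reproduced in Stata with the standard t-test command; a sketch with hypothetical names y and group:

  * Unweighted two-group t-test; SPSS and Stata agree here.
  ttest y, by(group)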

John Hendrickx noted some problems with weights in SPSS:

One of the things you can do with Stata that you can't do with SPSS is estimate models for complex surveys. Most SPSS procedures will allow weights, but although these will produce correct estimates, the standard errors will be too small (aweights or iweights versus pweights). SPSS cannot take clustering into account at all.

Re-analysis of the Group 1 weighted and Group 2 weighted data indicated that Stata t-scores were the same as the weighted SPSS t-scores when Stata used the analytic weight option [aw=weight] or the importance weight option [iw=weight].
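
A sketch of those weighted re-analyses in Stata, again with hypothetical names; ttest does not accept weights, so each comparison is run as a regression:

  regress y i.group [aw=weight]   // matches the weighted SPSS t-scores
  regress y i.group [iw=weight]   // also matches the weighted SPSS t-scores
  regress y i.group [pw=weight]   // probability weights: the smaller t-scores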

---

SPSS has another issue with weights, indicated on the IBM help site:

If the weighted number of cases exceeds the sample size, tests of significance are inflated; if it is smaller, they are deflated.

This means that, for significance testing, SPSS treats the sample size as the sum of the weights and not as the number of observations: if there are 1,000 observations and the mean weight is 2, SPSS will conduct significance tests as if there were 2,000 observations. Stata with the probability weight option treats the sample size as the number of observations no matter the sum of the weights.

I multiplied the weight variable by 10 in the dataset that I have been working with. With this inflated weight variable, Stata t-scores did not change under the analytic weight option, but Stata t-scores did inflate under the importance weight option.
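
A sketch of that check, with hypothetical names as before:

  * Multiply the weights by 10 and re-run the weighted comparisons.
  gen weight10 = 10 * weight
  regress y i.group [aw=weight10]   // t unchanged: aweights are renormalized
  regress y i.group [iw=weight10]   // t inflates: iweights are not rescaled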

---

UPDATE (April 21, 2014)

Jon Peck noted in the comments that SPSS has a Complex Samples procedure. SPSS p-values from the Complex Samples procedure matched Stata p-values using probability weights:

[Image: SPSS output]

[Image: Stata output]

The Complex Samples procedure appears to require a plan file. I tried several permutations for the plan, and the procedure worked correctly with this setup:

[Image: SPSS-CS]

---

UPDATE (May 30, 2015)

More here and here.

 

---

John Sides at the Monkey Cage discusses an article on public broadcasting and political knowledge. The cross-sectional survey data analyzed in the article cannot resolve the question of causal direction, as Sides notes:

Obviously, there are challenges of sorting out correlation and causation here. Do people who consume public broadcasting become more knowledgeable? Or are knowledgeable people just more likely to consume public broadcasting? Via statistical modeling, Soroka and colleagues go some distance in isolating the possible effects of public broadcasting—though they are clear that their modeling is no panacea.
Nevertheless, the results are interesting. In most countries, people who consume more public broadcasting know more about current events than people who consume less of it. But these same differences emerge to a lesser extent among those who consume more or less commercial broadcasting. This suggests that public broadcasting helps citizens learn. Here's a graph:
[Image: soroka]

But the article should not be interpreted as providing evidence that "public broadcasting helps citizens learn."

Cross-sectional survey data cannot resolve the question of causal direction, but theory can: if we observe a correlation between, say, race and voting for a particular political party, we can rule out the possibility that voting for a particular political party is causing race.

Notice that in the United Kingdom, consumption of commercial broadcasting news correlates with substantially lower political knowledge: therefore, if the figure is interpreted as evidence that broadcasting causes knowledge, then the UK results must be interpreted as evidence that commercial broadcasting news in the UK causes people to have less political knowledge. I think that we can safely rule out that possibility.

The results presented in the figure more likely reflect self-selection: persons in the UK with more political knowledge choose to watch public broadcasting news, and persons in the UK with less political knowledge choose to watch commercial broadcasting news. That does not mean that public broadcasting has zero effect on political knowledge, but it does mean that the evidence presented in the figure does not provide enough information to assess the magnitude of the effect.

---

This post at Active Learning in Political Science describes a classroom discussion of inequality that followed an unequal distribution of chocolate to students, meant to reflect unequal GDPs among countries:

The students then led a discussion about how the students felt, whether the wealthy students were obligated to give up some of their chocolate, and how they would convince the wealthy students to do so. Violence entered the conversation (jokingly) at one point. Eventually the discussion turned to the real-world implications, and the chocolate was widely shared.

Using a prop like chocolate has advantages, such as raising student interest and making the discussion memorable, which likely fosters learning. But the simulation itself clouded or removed many of the features of inequality necessary for a quality discussion of global inequality and aid:

  1. A discussion of inequality among students in the same room diverts attention from impediments to sharing that real countries face: it is nearly costless to pass chocolate to the person next to you, but there is a substantial cost to packaging and shipping goods across the world.
  2. Presumably none of the students had the negative features of a regime like North Korea that would raise questions about whether direct aid might be more harmful than beneficial.
  3. The method of production of the chocolate in the simulation bears no relationship to the method of production for GDP, chocolate, or any good in the real world: countries do not "receive" goods or wealth independent of mechanisms related to the country's natural resources, education or skill level of the population, political choices, history, etc.
  4. The parameters of the simulation ensured that the total amount of chocolate was static, so that the production of more chocolate was not an option for the students.

The problem with simulations such as this is that the focus is placed on the simulated instead of the real.

---

From a New York Times article by Harvey Araton:

On a scale of 1 to 10, Andy Pettitte’s level of certitude seemed to be a 5. Halfway convinced he couldn’t grind out another year with the Yankees in New York, he opted for an unforced retirement in Houston to watch his children play sports and begin to figure out what to do with the rest of his life.

Perhaps the use of 1-to-10 scales should be retired as well, because of the common misconception that 5 is halfway between 1 and 10; the true midpoint of a 1-to-10 scale is (1 + 10)/2 = 5.5. If you don't believe me, take a look:

This misconception is not restricted to sportswriters, as I reported in this article describing a review of thousands of interviews that the World Values Survey conducted around the world.

Among the data reported, respondents were asked whether they think that divorce can never be justified (1), can always be justified (10), or something in between. Seventeen percent of the 61,070 respondents for whom a response was available selected 5 on the scale, but only eight percent selected 6 on the scale. The figure below shows that 5 was more popular than 6 even in countries whose populations leaned toward the 10 end of the scale.

It seems, then, that 5 serves as the "psychological mid-point" (see Rose, Munro, & Mishler 2004) of the 1-to-10 scale, which means that some respondents signal their neutrality by selecting a value to the left of the scale's true midpoint. This is not good.

Source: Harvey Araton. 2011. Saying It's Time, but Sounding Less Certain. NY Times.
