Stata treats missing values as positive infinity

I discussed here some weird things that SPSS does with regard to weighting. Here's another weird thing, this time in Stata:

[Screenshot: StataQ1trunc]

The variable Q1 has a minimum of 0 and a maximum of 99,999. For this particular survey question, 99,999 is not a believable response; so, instead of letting 99,999 and other unbelievable responses influence the results, I truncated Q1 at 100, so that all responses above 100 equaled 100. There are other ways of handling unbelievable responses, but this can work as a first pass to assess whether the unbelievable responses influenced results.

The command replace Q1trunc = 100 if Q1 > 100 tells Stata to replace all responses over 100 with a response of 100. But notice that this replacement increased the number of nonmissing observations from 2008 to 2065. That's because Stata treated the 57 missing values as positive infinity, so the condition Q1 > 100 was true for them, and Stata replaced these 57 missing values with 100.

Here's a line from Stata's help missing documentation:

all nonmissing numbers < . < .a < .b < ... < .z
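
That ordering can be checked directly with display, which prints 1 for a true expression and 0 for a false one. This is only a quick sketch; the value 99999 is chosen to echo the Q1 maximum, and nothing here comes from the survey data itself:

```stata
* Each of these expressions is true, so display prints 1 for each
display 99999 < .
display . < .a
display .a < .z
* This is the comparison that bites: a missing value counts as greater than 100
display . > 100
```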

Stata has a reason for treating missing values as positive infinity, as explained here. But -- unless users are told of this -- it is not obvious that Stata treats missing values as positive infinity, so this appears to be a source of potential error for code with a > sign and missing values.

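Here's how to rewrite the command so that missing values remain missing:

replace Q1trunc = 100 if Q1 > 100 & Q1 < .

The condition Q1 < . is true only for nonmissing values, so writing if Q1 > 100 & !missing(Q1) is equivalent.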
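
To see the difference on a small scale, here is a minimal sketch using five made-up values for Q1, one of them missing. The data and the variable name Q1safe are assumptions for illustration only; they are not from the survey above.

```stata
* Toy data (hypothetical): four nonmissing responses and one missing response
clear
input Q1
0
50
150
99999
.
end

* Unguarded version: the missing row also satisfies Q1 > 100
generate Q1trunc = Q1
replace Q1trunc = 100 if Q1 > 100
* All five rows are now nonmissing in Q1trunc
count if !missing(Q1trunc)

* Guarded version: only nonmissing values above 100 are changed
generate Q1safe = Q1
replace Q1safe = 100 if Q1 > 100 & Q1 < .
* The missing row is still missing in Q1safe
count if missing(Q1safe)
```

In this sketch the first replace changes three rows (150, 99999, and the missing value), while the guarded replace changes only two.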
