by Leah Christian, Michael Dimock and Scott Keeter
The following commentary is based on a presentation at the Annual Meeting of the American Association for Public Opinion Research, Hollywood, Florida, May 14-17, 2009.
As the number of adults reachable only by cell phone continues to grow, more telephone surveys are including cell phone samples to ensure that their results are representative of the U.S. population. One issue of particular concern in surveys that include cell phones is the accuracy of geographic information derived from cell phone numbers; this information, which accompanies the sample, is used by many surveyors for geographic sampling and analysis. Being able to identify the location of respondents with precision is important for accurately sampling people in particular areas and for analyzing local and regional differences in respondents’ attitudes and behaviors. The problem of inaccurate geographic information is exacerbated by the fact that the wireless-only are more geographically mobile than those with landline phones.
There are several differences in the geographic information provided with landline and cell phone numbers. Because cell phones are not wired to a particular location, the number associated with that phone does not have the same meaning geographically that it has for landline phones. The area code and exchange associated with a landline telephone number allow it to be located fairly precisely. However, there is no requirement that wireless phone numbers be associated with a particular address or even with a particular geographic area. Even if someone chooses a number “near” the area where they live or work, the service areas for wireless phones can be larger than those for landline phones, so accurately locating someone is much more difficult. And, of course, people may obtain a telephone in one location and move to an entirely different location while retaining the phone number.
To assess the accuracy of the geographic information provided with wireless telephone samples, we test whether the sample information matches geographic data derived from respondents’ self-reported zip code at the regional, state and county levels and compare the results to those for the landline sample frame. The data for this analysis come from six general population surveys conducted in the fall of 2008. The combined dataset includes 10,430 landline respondents and 3,460 cell respondents, including 1,160 cell-only respondents. This design allows us to test the validity of geographic information from landline and cell phone samples simultaneously and to analyze the accuracy of the information for different types of cell respondents (cell only, cell mostly, and others who use their cell phones less frequently). We also use data from a survey focused on geographic mobility to evaluate whether types of phone-use groups differ in their patterns of mobility.
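The comparison described above can be illustrated with a short sketch. This is not the code used for the analysis; the field names, the `match_rates` helper and the four toy records are assumptions made purely for illustration. Counties are represented by FIPS codes so that identically named counties in different states are not treated as a match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Respondent:
    # Geography supplied with the telephone sample (hypothetical field names).
    sample_region: str
    sample_state: str
    sample_county: str
    # Geography derived from the self-reported zip code;
    # None when the zip code is missing or could not be matched.
    zip_region: Optional[str]
    zip_state: Optional[str]
    zip_county: Optional[str]


def match_rates(respondents, level):
    """Percent matched, not matched and missing at one level
    ("region", "state" or "county")."""
    if not respondents:
        raise ValueError("no respondents to compare")
    counts = {"match": 0, "no_match": 0, "missing": 0}
    for r in respondents:
        sample_value = getattr(r, f"sample_{level}")
        zip_value = getattr(r, f"zip_{level}")
        if zip_value is None:
            counts["missing"] += 1
        elif zip_value == sample_value:
            counts["match"] += 1
        else:
            counts["no_match"] += 1
    total = len(respondents)
    return {key: 100.0 * n / total for key, n in counts.items()}


# Toy records, invented for the example (not survey data).
toy_sample = [
    Respondent("South", "VA", "51059", "South", "VA", "51059"),  # all levels match
    Respondent("South", "VA", "51059", "South", "VA", "51013"),  # county differs
    Respondent("South", "VA", "51059", "South", "MD", "24031"),  # state and county differ
    Respondent("Northeast", "NY", "36047", None, None, None),    # zip code missing
]

for level in ("region", "state", "county"):
    print(level, match_rates(toy_sample, level))
```

Keeping missing zip codes as their own category, rather than counting them as mismatches, mirrors the way the results below report matched, not matched and missing cases separately.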
Geographic Information Less Accurate in Cell Sample
The geographic information derived from cell phone numbers is subject to a great deal of error, and the size of the error increases as the geographic unit of analysis gets smaller. For geographic analysis at the regional level (e.g., Northeast, South, Midwest, West), the geographic data associated with the cell sample are fairly accurate; the sample and zip code-derived region match for 94% of cell phone respondents while only 4% do not match (in 1% of the cases, the zip code was missing or could not be matched).
Error rates increase when moving to the state level. State codes associated with the original sample differ from the state of the respondent-provided zip code for 9% of the respondents. Looked at another way, this means that in a sample of cell phone numbers for a particular state, on average about 9% of potential respondents who live in that state will not be included in the sampling frame, and the amount of error can vary from state to state.
The amount of error is larger at the county level of analysis (see note 1). The sample and zip code-derived county do not match for nearly four-in-ten cell respondents (39%).
In the landline sample, the size of the error is considerably smaller. At the regional and state level, the sample and zip code information differ for 1% or less of the landline respondents. This increases to 7% at the county level. In 2% of the cases, the respondent did not provide a zip code or it could not be matched.
Is Geographic Information Less Accurate for Particular Groups?
While geographic assignment from cell phone numbers is inherently difficult, error rates are even higher when looking at respondents who are “cell only,” meaning they have no landline telephone in their home. Within this group (roughly 20% of the adult population, according to the latest National Health Interview Survey estimates), 6% are not matched at the regional level, 12% at the state level and 43% at the county level. The percent not matched is lower, though still substantial, for adults with both landline and cell phones. In the landline sample, the size of the error does not vary substantially across phone use groups and is much smaller than in the cell sample.
Overall, there are only slight differences among various demographic groups in the accuracy of the geographic information provided with the cell sample. A slightly greater share of men than women live in a different location than the sample information derived from their cell phone number would suggest. More young people are not matched than those who are age 50 and older. More whites than blacks live in a zip code that is different from the geographic information provided with the sample. Also, more college graduates than those with a high school education or less live in an area different from their cell phone number.
There are also differences by region. The size of the error is smallest in the West, where counties and states are larger and the population is more dispersed; only 7% do not match at the state level and 27% at the county level in this region. However, in the Northeast, where densities are greater and states and counties are often smaller, as many as 12% do not live in the state the sample would suggest and nearly half (45%) do not live in the county associated with their cell phone number.
Mobility and Phone Usage Patterns
Underlying some of the inaccuracies in geographic information for the cell sample are variations in mobility patterns among different phone use groups. More people who have only a landline phone have lived their entire life in the same community (43%) than is the case for Americans with both a landline and cell or only a cell phone. More cell-only have moved within the same state (31%) than those in other groups, whereas more dual users with both a landline and cell phone (both wireless mostly and those who use their cell phones less often) have moved across state lines.
The landline-only have also lived in their communities longer than people with a cell phone; they have lived in their community for more than 20 years on average, nearly twice as long as the cell-mostly (11 years) and cell-only groups (9 years). Significantly more cell-only (33%) and cell-mostly duals (22%) have lived in their current location for less than 5 years compared with only 14% of the other dual users and 10% of the landline-only. On the other end of the spectrum, significantly more of the landline-only and duals who use their cell phones less often have lived in their communities for 20 or more years.
Although there are no differences among the dual users in past moving behavior, significantly more people who use their cell phones for nearly all of their calls say they are very likely to move in the next five years (27%) than do those who use their cell phones less frequently (15%). Nearly a third (32%) of the cell-only are very likely to move in the near future. In contrast, 44% of the landline-only and 37% of the more landline-dominant duals say they are not at all likely to move in the next five years.
These differences may also reflect demographic differences in mobility patterns (see “Who Moves? Who Stays Put? Where’s Home?” for a more detailed report of the results from this survey).
Implications for Sampling and Data Analysis
The mobile nature of wireless phones creates a significant problem for geographic sampling, particularly as the size of the area being surveyed gets smaller. Because numbers are not associated with physical addresses, there can be a large amount of error in the geographic information associated with cell phone numbers. Respondents who do not live in the area can be identified and screened out of the survey, but this addresses only part of the problem: people who live in the area yet have cell phone numbers from a different area will not be covered by the sampling frame. The size of the error is relatively small at the regional and state level but compromises the ability to accurately sample smaller geographic areas.
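The two kinds of error described above (people who are drawn but must be screened out, and residents whose numbers place them outside the frame) can be separated with a simple tally. The sketch below is only an illustration; the `coverage_summary` function and the toy records are hypothetical and are not taken from the surveys analyzed here.

```python
def coverage_summary(records, target):
    """Tally coverage outcomes for one target area.

    records: iterable of (sample_area, actual_area) pairs, where sample_area
    comes from the phone number and actual_area from the respondent.
    """
    covered = screened_out = missed = 0
    for sample_area, actual_area in records:
        in_frame = sample_area == target
        lives_in_target = actual_area == target
        if in_frame and lives_in_target:
            covered += 1        # drawn and eligible
        elif in_frame:
            screened_out += 1   # drawn, but lives elsewhere
        elif lives_in_target:
            missed += 1         # resident whose number points elsewhere
    residents = covered + missed
    drawn = covered + screened_out
    return {
        "covered": covered,
        "screened_out": screened_out,
        "missed": missed,
        # Share of the area's residents that the frame cannot reach.
        "undercoverage_pct": 100.0 * missed / residents if residents else 0.0,
        # Share of drawn numbers that screening would have to remove.
        "screen_out_pct": 100.0 * screened_out / drawn if drawn else 0.0,
    }


# Toy records, invented for the example: (state from number, state of residence).
toy_records = (
    [("VA", "VA")] * 9    # numbers and residents both in the target state
    + [("VA", "MD")]      # target-state number, lives elsewhere
    + [("MD", "VA")]      # lives in the target state, number from elsewhere
    + [("MD", "MD")] * 3  # unrelated to the target state
)

print(coverage_summary(toy_records, "VA"))
```

Screening can drive the first kind of error to zero at some cost in interviewing effort, but no amount of screening recovers the residents counted under `missed`, which is why undercoverage is the more serious concern for small areas.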
For sampling within cities, counties or other narrower geographic areas, surveyors should consider alternatives to random digit dialing (RDD). Address-based sampling allows surveyors to locate respondents with a great deal of accuracy and can provide even more possibilities for sampling small geographic areas than telephone surveys can. Several studies (see note 2) have documented how address-based sampling can cover the wireless-only population and provide results at least comparable to those gained from RDD telephone surveys.
Because there can be a significant amount of error associated with the geographic information provided with the sample, surveyors wishing to conduct geographic data analysis should explore alternatives. Information can be collected from respondents to replace or at least supplement that provided with the sample. It is important to collect information at the appropriate level of precision for the desired analysis. Various types of external sources can be used to supplement information provided by respondents and reduce the number of questions needed. The Pew Research Center is now using respondents’ self-reported zip code and matching it to an external source to derive appropriate geographic information so that our analyses can be more accurate. Because there is still some error in the geographic information for landline respondents, we are using this approach for both landline and cell respondents in all of our dual frame surveys.
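As a rough illustration of matching a self-reported zip code to an external source, the sketch below looks up a cleaned zip code in a small crosswalk table. It is not the Pew Research Center's procedure; the `CROSSWALK` structure, its three rows and the helper names are assumptions for the example, and the sketch assumes each zip code has been assigned to a single county even though zip codes can span county boundaries.

```python
import re

# Illustrative crosswalk: zip code -> county FIPS code, state and Census region.
# A real crosswalk would come from an external source; this layout and these
# few rows are assumptions made for the example.
CROSSWALK = {
    "10001": {"county_fips": "36061", "state": "NY", "region": "Northeast"},
    "60601": {"county_fips": "17031", "state": "IL", "region": "Midwest"},
    "90210": {"county_fips": "06037", "state": "CA", "region": "West"},
}


def normalize_zip(raw):
    """Return a five-digit zip code string, or None if the answer is unusable."""
    if raw is None:
        return None
    text = str(raw).strip()
    # Accept one to five digits, optionally followed by a ZIP+4 suffix.
    found = re.fullmatch(r"(\d{1,5})(?:-\d{4})?", text)
    if found is None:
        return None
    # Assumption: short values lost leading zeros when stored as numbers.
    return found.group(1).zfill(5)


def derive_geography(raw_zip, crosswalk=CROSSWALK):
    """Look up region, state and county for a self-reported zip code.

    Returns None when the zip code is missing or cannot be matched.
    """
    zip5 = normalize_zip(raw_zip)
    if zip5 is None:
        return None
    return crosswalk.get(zip5)


print(derive_geography(" 90210 "))
print(derive_geography("10001-1234"))
print(derive_geography("not sure"))
```

Returning `None` for unusable or unmatched answers keeps those cases distinct from genuine mismatches, so they can be reported separately or filled in from the sample information.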
1. For the purposes of this analysis, we assume that respondents are able to provide an accurate zip code, and that when data do not match, it represents errors in the information derived from the telephone number. Postal service zip codes do not cross state lines, and therefore should provide an accurate assessment of the respondent’s location. Neither zip codes nor telephone prefixes align perfectly with county boundaries, however, so there is a certain amount of error in accurately placing a respondent in a county using either approach. For the purposes of this analysis, we assume that the zip code provides the more accurate assignment, given that zip codes are typically smaller geographic areas than telephone prefix areas, especially when it comes to wireless telephone numbers.
2. Link et al. 2008. “A Comparison of Address-Based Sampling (ABS) Versus Random Digit Dialing (RDD) for General Population Surveys.” Public Opinion Quarterly 72:6-27; Fleeman and Wasikowski. 2009. “Performance Rates of CPO Subsequent Survey Households Identified Via Address Frames”; Link et al. 2009. “Building a New Foundation: Transitioning to Address Based Sampling after Nearly 30 Years of RDD”; and Sherr et al. 2009. “Comparing Random Digit Dial and United States Postal Service Address-Based Sample Designs for a General Population Survey: The 2008 Massachusetts Health Insurance Survey,” presented at the Annual Meeting of the American Association for Public Opinion Research.