About the Google Play Store Data Collection
Findings about Google Play app permissions in this report are based on an analysis of data about 1,041,336 apps collected from the Google Play Store between June 2014 and September 2014. The data collection, or "scraping" (copying the contents of a web page), was carried out with a custom extension for the Google Chrome web browser created by Pew Research Center developers.
When run, the extension would open the Google Play Store website and go to the webpage for an app, as designated by the unique app ID number that each app in the store receives. It would then copy the content of that app's page, store that information in a SQL database, and move on to the next app, continuing until no more app IDs were available. The extension engaged in data collection from June 18, 2014, to September 8, 2014.
The initial list of app IDs to search was collected from Androidpit.com, a site that mirrored the Google Play Store (it no longer does so as of this publication). As individual apps were scraped, the extension would collect the app IDs of the "related" apps listed on each app's page. These "related" app IDs were then added to the list of apps to scrape, in a sequential process that continued until no new app IDs remained. If an app was removed from the Google Play Store during this process, its page would return a 404 error and its ID would be removed from the database.
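The extension's code is not published with the report; the sketch below illustrates the crawl logic described above in Python rather than as a browser extension. The Play Store URL pattern, the database schema and the related-ID parser here are assumptions for illustration only, not Pew's actual code.

```python
# A minimal sketch of the crawl logic described above (not the original
# Chrome extension). URL pattern, table layout and parser are assumptions.
import re
import sqlite3
from collections import deque

import requests

PLAY_URL = "https://play.google.com/store/apps/details?id={app_id}"  # assumed pattern


def extract_related_ids(html):
    # Hypothetical parser: pull app IDs out of links to other app detail pages.
    return set(re.findall(r"details\?id=([\w.]+)", html))


def crawl(seed_ids, db_path="play_store.db"):
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS apps (app_id TEXT PRIMARY KEY, html TEXT)")

    queue = deque(seed_ids)   # initial ID list (e.g., gathered from the mirror site)
    seen = set(seed_ids)

    while queue:              # continue until no new app IDs remain
        app_id = queue.popleft()
        resp = requests.get(PLAY_URL.format(app_id=app_id))

        if resp.status_code == 404:
            # App removed from the store during the crawl: drop its ID.
            conn.execute("DELETE FROM apps WHERE app_id = ?", (app_id,))
            conn.commit()
            continue

        # Store the page content for later parsing (e.g., of permissions data).
        conn.execute("INSERT OR REPLACE INTO apps (app_id, html) VALUES (?, ?)",
                     (app_id, resp.text))
        conn.commit()

        # Queue any not-yet-seen "related" app IDs found on this page.
        for related_id in extract_related_ids(resp.text):
            if related_id not in seen:
                seen.add(related_id)
                queue.append(related_id)

    conn.close()
```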
About the Survey Findings
The Pew Research Center survey findings reported here come from two surveys conducted in 2015. The overall smartphone ownership figure is based on telephone interviews conducted June 10, 2015, through July 12, 2015, among a national sample of 2,001 adults, 18 years of age or older, living in all 50 U.S. states and the District of Columbia. A total of 701 respondents were interviewed on a landline telephone, and 1,300 were interviewed on a cellphone, including 709 who had no landline telephone. The survey was conducted by interviewers at Princeton Data Source under the direction of Princeton Survey Research Associates International. A combination of landline and cellphone random digit dial samples was used; both samples were provided by Survey Sampling International. Interviews were conducted in English and Spanish. Respondents in the landline sample were selected by randomly asking for the youngest adult male or female who was at home. Interviews in the cellphone sample were conducted with the person who answered the phone, if that person was 18 years of age or older. For detailed information about our survey methodology, visit: https://legacy.pewresearch.org/methodology/u-s-survey-research/
The combined landline and cellphone samples are weighted using an iterative technique that matches gender, age, education, race, Hispanic origin and nativity, and region to parameters from the 2013 Census Bureau’s American Community Survey and population density to parameters from the Decennial Census. The sample also is weighted to match current patterns of telephone status (landline only, cellphone only or both landline and cellphone), based on extrapolations from the 2014 National Health Interview Survey. The weighting procedure also accounts for the fact that respondents with both landline and cellphones have a greater probability of being included in the combined sample and adjusts for household size among respondents with a landline phone. The margins of error reported and statistical tests of significance are adjusted to account for the survey’s design effect, a measure of how much efficiency is lost from the weighting procedures.
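The report describes this weighting only as an "iterative technique" that matches the sample to population parameters; it does not name the algorithm. One common such technique is raking (iterative proportional fitting), sketched below on two made-up dimensions with illustrative target shares, not the parameters Pew actually used.

```python
# Minimal raking (iterative proportional fitting) sketch. Dimensions and
# target distributions are illustrative only.
from collections import defaultdict


def rake(respondents, targets, n_iter=50):
    """respondents: list of dicts with demographic keys and a 'weight' entry.
    targets: {dimension: {category: population share}}."""
    for _ in range(n_iter):
        for dim, shares in targets.items():
            # Current weighted total of each category on this dimension.
            totals = defaultdict(float)
            for r in respondents:
                totals[r[dim]] += r["weight"]
            grand = sum(totals.values())
            # Scale weights so weighted shares match the population shares.
            for r in respondents:
                current_share = totals[r[dim]] / grand
                r["weight"] *= shares[r[dim]] / current_share
    return respondents


# Toy example with fabricated data: three respondents, equal starting weights.
sample = [
    {"sex": "F", "region": "South", "weight": 1.0},
    {"sex": "M", "region": "South", "weight": 1.0},
    {"sex": "M", "region": "North", "weight": 1.0},
]
targets = {
    "sex": {"F": 0.52, "M": 0.48},
    "region": {"North": 0.40, "South": 0.60},
}
rake(sample, targets)
print([round(r["weight"], 2) for r in sample])  # approaches [1.56, 0.24, 1.2]
```

Each pass scales respondents' weights so that the weighted share of each category matches its population share, cycling through the dimensions until the margins stabilize.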
Findings on app usage and attitudes in this report are based on a Pew Research Center survey conducted between Jan. 27, 2015, and Feb. 16, 2015, among a sample of 461 adults ages 18 or older. The survey was conducted by the GfK Group using KnowledgePanel, its nationally representative online research panel. GfK selected a representative sample of 1,537 English-speaking panelists to invite to join the subpanel and take the first survey in January 2014. Of the 935 panelists who responded to the invitation (a 60.8% response rate), 607 agreed to join the subpanel and subsequently completed the first survey (64.9% of those who responded); those results were reported in November 2014. This group agreed to take four online surveys about "current issues, some of which relate to technology" over the course of a year and possibly participate in one or more 45- to 60-minute online focus group chat sessions.
KnowledgePanel members are recruited through probability sampling methods and include both those with internet access and those without. KnowledgePanel provides internet access for those who do not have it and, if needed, a device to access the internet when they join the panel. A combination of random digit dialing (RDD) and address-based sampling (ABS) methodologies has been used to recruit panel members (in 2009 KnowledgePanel switched its sampling methodology for recruiting panel members from RDD to ABS). The panel comprises households with landlines and cellular phones, including those with only cellphones and those without a phone. Both the RDD and ABS samples were provided by Marketing Systems Group (MSG).
KnowledgePanel continually recruits new panel members throughout the year to offset attrition as people leave the panel. Respondents were selected randomly from eligible adult household members of the panel. All sampled members received an initial email on Aug. 5, 2014, notifying them of the survey; the email included a link to the survey questionnaire. One standard follow-up reminder was sent three days later to those who had not yet responded.
The final sample for this survey was weighted using an iterative technique that matches gender, age, education, race, Hispanic origin, household income, metropolitan area (or not), and region to parameters from the March 2013 Census Bureau's Current Population Survey (CPS). In addition, the sample is weighted to match current patterns of internet access from the October 2012 CPS. This weight is multiplied by an initial base or sampling weight that corrects for differences in the probability of selection of various segments of the sample, and by a panel weight that adjusts for any biases due to nonresponse and noncoverage at the panel recruitment stage (using all of the parameters mentioned above, as well as home ownership status).
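As a rough illustration of how these weight components combine into a respondent's final weight (the component values below are placeholders, not GfK's actual weights):

```python
# Sketch of the weight composition described above; values are placeholders.
def combined_weight(base_weight, panel_weight, poststrat_factor):
    """base_weight: corrects for unequal probabilities of selection;
    panel_weight: adjusts for nonresponse/noncoverage at panel recruitment;
    poststrat_factor: the iterative (raking) adjustment to CPS parameters."""
    return base_weight * panel_weight * poststrat_factor


print(round(combined_weight(0.9, 1.2, 1.05), 3))  # 1.134
```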
Sampling errors and statistical tests of significance take into account the effect of weighting at each of these stages. Sampling error for the total sample of 461 respondents is plus or minus 5.6 percentage points at the 95% level of confidence.
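The report does not publish the design effect itself; the sketch below shows one standard way a weighting-adjusted margin of error like the one above can be computed, using Kish's approximate design effect and made-up weights.

```python
# Illustrative calculation of a weighting-adjusted margin of error; the
# weights below are made up and do not reproduce the survey's +/-5.6 points.
import math


def kish_design_effect(weights):
    # Kish's approximation: deff = n * sum(w^2) / (sum(w))^2
    n = len(weights)
    return n * sum(w * w for w in weights) / sum(weights) ** 2


def margin_of_error(weights, p=0.5, z=1.96):
    # Simple-random-sampling margin of error, inflated by sqrt(deff).
    n = len(weights)
    deff = kish_design_effect(weights)
    return z * math.sqrt(deff * p * (1 - p) / n)


weights = [0.6, 1.0, 1.4, 0.8, 1.2] * 100   # 500 illustrative final weights
print(round(100 * margin_of_error(weights), 1), "percentage points")  # about 4.6
```

The more spread out the final weights, the larger the design effect and the wider the reported margin of error.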
Data on smartphone ownership is taken from a nationally representative telephone survey conducted June 10, 2015, through July 12, 2015, among 2,001 American adults age 18 and older. Data on app downloading and app usage is based on an online survey conducted Jan. 27, 2015, through Feb. 16, 2015, among 461 Americans age 18 and older. The following table shows the unweighted sample sizes and the error attributable to sampling that would be expected at the 95% level of confidence for different groups in the survey:
Sample sizes and sampling errors for other subgroups are available upon request. The margins of error reported and statistical tests of significance are adjusted to account for the survey’s design effect, a measure of how much efficiency is lost from the weighting procedures.
In addition to sampling error, one should bear in mind that question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of opinion polls.