Survey methods
The survey data used in this report are based on telephone interviews conducted Feb. 29 to May 8, 2016, among a national sample of 3,769 adults, 18 years of age or older, living in all 50 U.S. states and the District of Columbia (977 respondents were interviewed on a landline telephone, and 2,792 were interviewed on a cellphone, including 1,676 who had no landline telephone). The survey was conducted by interviewers at Princeton Data Source under the direction of Princeton Survey Research Associates International (PSRAI). Interviews were conducted in English and Spanish. For detailed information about our survey methodology, see https://legacy.pewresearch.org/methodology/u-s-survey-research/
Four separate samples were used for data collection to obtain a representative sample that included an oversample of black and Hispanic respondents. The first sample was a disproportionately stratified random-digit-dial (RDD) landline sample drawn using standard list-assisted methods. A total of 822 interviews were completed using this RDD landline sample. The second sample was a disproportionately stratified RDD cell sample designed to oversample blacks and Hispanics. A total of 2,440 interviews were completed using this RDD cell sample. Respondents in the landline sample were selected by randomly asking for the youngest adult male or female who is now at home. Interviews in the cell sample were conducted with the person who answered the phone, if that person was an adult 18 years of age or older.
The landline and cell callback samples were drawn from recent Pew Research Center surveys conducted by PSRAI and included people who identified themselves as black at the time of the initial interview. All surveys used to produce the callback samples employed RDD sampling methodologies.
The weighting was accomplished in multiple stages to account for the disproportionately stratified samples, the overlapping landline and cell sample frames, household composition, the oversampling of blacks through callback interviews, and differential non-response associated with sample demographics.
The first stage of weighting corrected for different probabilities of selection associated with the number of adults in each household and each respondent’s telephone usage patterns. This weighting also adjusted for the overlapping landline and cell sample frames and the relative sizes of each frame and each sample. Since we employed a disproportionately stratified sample design, the first-stage weight was computed separately for each stratum in each sample frame. The callback sample segments were assigned a first-stage weight equal to their first-stage weight from their original interview. After the first-stage weighting, an adjustment was made to account for the oversampling of blacks through the landline and cellphone callback interviews.
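As a rough illustration of how such a first-stage (base) weight could be computed, the sketch below weights landline respondents up by household size and halves the weight of dual landline-and-cell users so the overlapping frames are not double-counted. The column names and the simplified dual-frame adjustment are assumptions for illustration, not PSRAI's actual procedure.

```python
import pandas as pd

# Illustrative respondent records; column names are hypothetical.
df = pd.DataFrame({
    "frame": ["landline", "cell", "cell"],          # sample frame the case came from
    "n_adults": [2, 1, 1],                          # adults in the household (landline selection)
    "phone_status": ["dual", "cell_only", "dual"],  # respondent's telephone usage
})

def first_stage_weight(row):
    # Landline cases: one adult is selected per household, so the chance of
    # selection falls as household size rises; weight up by the number of adults.
    w = float(row["n_adults"]) if row["frame"] == "landline" else 1.0
    # Dual users can be reached through either frame; a common simplification
    # is to halve their weight so the overlapping frames are not double-counted.
    if row["phone_status"] == "dual":
        w *= 0.5
    return w

df["base_weight"] = df.apply(first_stage_weight, axis=1)
print(df)
```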
The next step in weighting was demographic raking. The data were first divided into three groups – black, Hispanic and white/other. Each group was raked separately to population parameters for sex by age, sex by education, age by education and census region. The white/other group was also raked on a two-category race variable – white vs. not white. The Hispanic group was also raked on nativity – U.S. born vs. foreign born. The combined dataset was then raked to parameters for race/ethnicity, population density and household telephone usage. The telephone usage parameter was derived from an analysis of the most recently available National Health Interview Survey data. The population density parameter was derived from Census 2010 data at the county level. All other weighting parameters were derived from an analysis of the 2014 American Community Survey 1-year PUMS file.
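Raking (iterative proportional fitting) repeatedly adjusts the weights so that the weighted sample matches each set of population margins in turn. The sketch below is a generic implementation under assumed variable names and target shares; it is not the Center's production weighting code.

```python
import pandas as pd

def rake(df, weight_col, targets, n_iter=50):
    """Iterative proportional fitting: adjust weights until the weighted margins
    match each target distribution (category shares summing to 1)."""
    w = df[weight_col].astype(float).copy()
    for _ in range(n_iter):
        for var, target in targets.items():
            totals = w.groupby(df[var]).sum()
            current = totals / totals.sum()
            w = w * df[var].map(lambda c: target[c] / current[c])
    return w

# Hypothetical example: rake a small file on sex and census region.
df = pd.DataFrame({
    "sex": ["M", "F", "F", "M"],
    "region": ["South", "West", "South", "Northeast"],
    "base_weight": [1.0, 0.5, 2.0, 1.0],
})
targets = {
    "sex": {"M": 0.48, "F": 0.52},
    "region": {"Northeast": 0.20, "South": 0.50, "West": 0.30},
}
df["rake_weight"] = rake(df, "base_weight", targets)
```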
The margins of error reported and statistical tests of significance are adjusted to account for the survey’s design effect, a measure of how much efficiency is lost from the weighting procedures.
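A common approximation for this adjustment uses Kish's design effect for unequal weights; whether that exact formula was applied here is an assumption, but it illustrates the calculation:

```python
import numpy as np

def adjusted_moe(weights, p=0.5, z=1.96):
    """95% margin of error inflated by the Kish design effect implied by
    the variability of the survey weights."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    deff = n * np.sum(w ** 2) / np.sum(w) ** 2   # Kish approximate design effect
    srs_moe = z * np.sqrt(p * (1 - p) / n)       # simple-random-sample margin of error
    return srs_moe * np.sqrt(deff)

# For n = 3,769 with no weight variation the margin of error is about
# +/- 1.6 percentage points; more variable weights push the adjusted figure higher.
```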
The following table shows the unweighted sample sizes and the error attributable to sampling that would be expected at the 95% level of confidence for different groups in the survey:
Sample sizes and sampling errors for other subgroups are available upon request.
In addition to sampling error, one should bear in mind that question wording and practical difficulties in conducting surveys can introduce error or bias into the findings of opinion polls.
Pew Research Center undertakes all polling activity, including calls to mobile telephone numbers, in compliance with the Telephone Consumer Protection Act and other applicable laws.
Content analysis
The analysis of the conversations on Twitter regarding race was conducted using the Center’s content analysis rules in combination with computer coding software developed by Crimson Hexagon (CH).
Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. The Center draws its analysis sample from all public Twitter posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training.
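Crimson Hexagon's classifier is proprietary, but the train-then-classify workflow described above resembles standard supervised text classification. A minimal sketch using scikit-learn (purely illustrative; not the tool or the algorithm the Center used):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hand-labeled training posts (hypothetical examples and labels).
train_texts = [
    "tweet about a police shooting ...",
    "tweet about a presidential candidate ...",
    "tweet about something unrelated ...",
]
train_labels = ["police/judicial", "2016 campaign", "off topic"]

# Learn word patterns from the labeled examples ...
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

# ... then apply those patterns to the rest of the collected posts.
unlabeled_posts = ["another collected tweet ...", "yet another collected tweet ..."]
predicted = model.predict(unlabeled_posts)
```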
While automated sentiment analysis is not perfect, the Center has conducted numerous tests and determined that Crimson Hexagon’s method of analysis is among the most accurate tools available. Multiple tests suggest that results from human coders and Crimson Hexagon are generally in agreement between 75% and 83% of the time. Additional tests for this project showed agreement of more than 85% for these specific analyses.
For this report, researchers created four separate queries (also known as monitors). The first was on all Twitter conversation pertaining to race. The next three were focused on the use of the hashtags #BlackLivesMatter, #AllLivesMatter and #BlueLivesMatter. For all monitors, only English-language tweets were included. The unit of analysis for all monitors was the tweet or post.
Twitter conversation about race
The time period examined for the race-focused conversation on Twitter was Jan. 1, 2015, to March 31, 2016. There were two major steps to this query.
The first step involved coming up with a long list of terms that could be used as a Boolean search query to identify all tweets that could be mentioning race. (A tweet was considered relevant if it included an explicit reference to the concept of race in general, blacks or whites.) To do this, researchers created a list of 50 words, abbreviations and phrases that are likely to appear in tweets about race. This list consisted of words such as “white” and “race” and “racism,” along with abbreviations such as “blk,” hashtags such as “blktwitter” and a number of commonly used slang terms. Researchers created the list through a lengthy process of testing after an examination of thousands of relevant tweets. Researchers also referred to word clouds created by Crimson Hexagon’s tool to make sure there were no common terms missing from the query.
In addition, researchers created a list of terms to be excluded from the query that were commonly used phrases that were not about race. These included terms such as “White Sox,” “race day” and “Chris Brown.”
The goal of creating these lists for the search was to collect all possible tweets that could be discussing race – even the ones that included these terms but were not about race. CH’s algorithm has the ability to determine the relevance of posts to the subject matter being investigated, and researchers would use the software to filter out posts that were captured in the query but not relevant to the project.
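Outside the Crimson Hexagon platform, the include/exclude logic of such a Boolean query could be approximated along the following lines. The term lists shown are a small illustrative subset, not the full 50-term query:

```python
import re

# Small illustrative subsets of the actual include/exclude term lists.
include_terms = ["race", "racism", "white", "blk", "blktwitter"]
exclude_terms = ["white sox", "race day", "chris brown"]

def matches_query(tweet):
    """Keep a tweet if it contains any include term, unless it also contains
    one of the excluded non-race phrases (mirroring Boolean NOT terms)."""
    text = tweet.lower()
    if any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in exclude_terms):
        return False
    return any(re.search(r"\b" + re.escape(term) + r"\b", text) for term in include_terms)

print(matches_query("Structural racism is still with us"))   # True
print(matches_query("White Sox win on race day"))            # False
```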
The second step in creating this monitor was to train the CH algorithm to identify race-related tweets and to categorize them according to their subject matter. To do this, researchers created a list of six thematic categories plus one for posts that were “off topic” or irrelevant to this project. The categories were as follows:
- Related to current events – focused on 2016 campaign
- Related to current events – focused on police or judicial system
- Related to current events – focused on celebrities or entertainment
- Related to current events – focused on things other than above categories
- About discrimination, but not tied to specific current events
- Related to race, but not explicitly about racial discrimination
- Off topic or irrelevant
In accordance with CH’s best practices, researchers categorized more than 20 sample posts for each of the above categories. As an additional step to ensure validity, two researchers were involved. The first researcher categorized posts and a second researcher reviewed the training categorization to see if they would have coded any posts differently. In the rare cases where the two researchers disagreed about the categorization of a specific post, the post was removed from the training.
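A minimal sketch of that double-coding check, assuming the two researchers' labels are stored side by side (the post IDs and labels are hypothetical):

```python
# Labels assigned independently by the two researchers (hypothetical data).
coder_1 = {"post_01": "2016 campaign", "post_02": "police/judicial", "post_03": "off topic"}
coder_2 = {"post_01": "2016 campaign", "post_02": "celebrities",     "post_03": "off topic"}

# Keep only posts on which both coders agree; disagreements are dropped from
# the training set rather than adjudicated.
training_set = {post: label for post, label in coder_1.items() if coder_2.get(post) == label}

print(training_set)                                    # post_02 is removed
print(f"Agreement: {len(training_set) / len(coder_1):.0%}")
```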
After the training and double checking was completed, the CH algorithm applied that categorization to the millions of other tweets to determine how many posts were about race and how many of the tweets fit into each category.
In total, the query returned 1.1 billion tweets. The monitor determined that 995 million of those tweets were relevant to this project, while the remaining 137 million tweets were considered “off topic” and excluded from the analysis.
Use of the hashtags #BlackLivesMatter, #AllLivesMatter and #BlueLivesMatter on Twitter
The queries for the hashtag-focused CH monitors were much simpler to create since their search terms were more straightforward. All tweets that included the hashtags #BlackLivesMatter, #AllLivesMatter or #BlueLivesMatter were included in the analysis, so there was no need for an “off topic” category or a long keyword search.
Two different time periods were examined.
The first timeframe was July 12, 2013, to March 31, 2016, and only examined tweets with #BlackLivesMatter or #AllLivesMatter. In total, there were 11,781,721 tweets included in the #BlackLivesMatter monitor and 1,551,193 tweets included in the #AllLivesMatter monitor.
The specific Boolean keyword search for the #BlackLivesMatter monitor was:
#blacklivesmatter OR blacklivesmatter
The specific Boolean keyword search for the #AllLivesMatter monitor was:
#alllivesmatter OR alllivesmatter
The seven categories for the #BlackLivesMatter monitor were as follows:
- Support of #BlackLivesMatter movement – general support
- Support of #BlackLivesMatter movement – focused on specific incidents of alleged police misconduct
- Neutral references to #BlackLivesMatter hashtag or movement
- Criticism of #BlackLivesMatter hashtag or movement
- General race issues not clearly tied to #BlackLivesMatter hashtag or movement
- 2016 campaign
- Miscellaneous
The eight categories for the #AllLivesMatter monitor were as follows:
- Support of #AllLivesMatter movement – general support
- Support of #AllLivesMatter movement – focused on support for police or firefighters
- Opposition to #BlackLivesMatter movement
- Neutral references to #AllLivesMatter hashtag or movement
- Criticism of #AllLivesMatter hashtag or movement
- Pro-life/anti-abortion
- Animal rights
- Miscellaneous
The second time period examined was July 5 to July 13, 2016. The same search terms for #AllLivesMatter and #BlackLivesMatter were used as above. An additional monitor was created for #BlueLivesMatter following the same set of rules. (#BlackLivesMatter appeared approximately 800,000 times in the period from April 1 to July 4, 2016, but those tweets were not included in the following analysis.)
During this time, there were 4,945,229 tweets with the hashtag #BlackLivesMatter, 633,106 tweets with #AllLivesMatter and 415,329 tweets with #BlueLivesMatter.
For all three of these monitors, the main categories were as follows:
- Support for the given hashtag
- Opposition to the given hashtag
- Neutral references to the given hashtag
For the hashtag #BlueLivesMatter, an additional subcategory was created underneath the “support” category to measure the number of tweets that directly criticized President Obama. For the hashtag #BlackLivesMatter, an additional subcategory was created underneath the “opposition” category to measure the number of tweets that directly connected the Black Lives Matter movement with violence.
As with the monitor on all racially focused tweets, two researchers were involved in the training process for these monitors. One researcher categorized the posts, while a second reviewed the training. In the rare cases where the two researchers disagreed about the categorization of a specific post, the post was removed from the training.