This study included all online comments submitted to the Federal Communications Commission (FCC) regarding the proposal called Restoring Internet Freedom (Docket FCC-17-108). All of the data downloaded and analyzed were originally submitted using the FCC’s Electronic Comment Filing System (ECFS). All data and comments used in this report are stored on the FCC’s site and are freely available to the public. When users submitted a comment, the FCC notified them that all information they submitted, including names and addresses, would be publicly available on the web.
The FCC opened the docket for public comment on April 27, 2017. The comment period was initially scheduled to end on Aug. 16, 2017, but was extended to Aug. 30, 2017. Only comments submitted during that official period (April 27-Aug. 30) were included in this study.
The FCC also allows for submissions by phone or letter, but those comments are not publicly accessible and were excluded from this report.
Pew Research Center downloaded all online comments from the FCC’s public Application Programming Interface (API) using a Python script. The FCC assigned a unique ID number to each comment, although researchers discovered that some ID numbers appeared multiple times in the dataset. The Center removed all comments with duplicate ID numbers. In total, the Center collected and analyzed 21,706,195 comments.
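The ID-based deduplication step can be sketched as follows. This is a minimal illustration, not the Center's script: the API download itself is not shown, and the record layout (a dict with a hypothetical `"id"` key) is an assumption. Because the text does not say whether one copy of each repeated ID was kept, the sketch supports both readings through a hypothetical `keep_first` flag.

```python
from collections import Counter
from typing import Dict, Iterable, List


def drop_duplicate_ids(
    records: Iterable[Dict], id_key: str = "id", keep_first: bool = False
) -> List[Dict]:
    """Remove comments whose ID appears more than once.

    id_key is a hypothetical field name. With keep_first=False (default),
    every record sharing a repeated ID is dropped; with keep_first=True,
    only the first occurrence of each ID is kept.
    """
    records = list(records)
    if keep_first:
        seen = set()
        kept = []
        for record in records:
            if record[id_key] not in seen:
                seen.add(record[id_key])
                kept.append(record)
        return kept
    # Count how often each ID occurs, then keep only IDs seen exactly once.
    counts = Counter(record[id_key] for record in records)
    return [record for record in records if counts[record[id_key]] == 1]
```

For example, records with IDs 1, 2, 2, 3 reduce to IDs 1 and 3 under the default, or to 1, 2 and 3 with `keep_first=True`.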
The comments were collected and analyzed before the FCC released downloadable versions of the comments as zip files in early November 2017.
In addition to filling out the FCC’s form, commenters had the option to attach files, such as text documents. To maintain consistency and limit the size of the dataset, the Center removed all attachments. The Center also removed addresses.
The following data were collected and analyzed for each comment:
- Full text of the comment
- Confirmation number
- Author name
- Email address
- Email confirmation
- Date submitted
- Date received
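A minimal sketch of the resulting record, assuming each raw comment has already been parsed into a dict keyed by the field names below. These field names, and the choice to store dates as strings, are hypothetical; the source names only the seven fields listed above. Anything else in the raw record, such as attachments or addresses, is simply not copied.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class Comment:
    # Hypothetical field names for the seven fields analyzed per comment.
    text: Optional[str]
    confirmation_number: Optional[str]
    author_name: Optional[str]
    email: Optional[str]
    email_confirmation: Any  # type not stated in the source
    date_submitted: Optional[str]  # assumed to be date strings
    date_received: Optional[str]


def to_comment(raw: Mapping[str, Any]) -> Comment:
    """Keep only the analyzed fields; other keys (e.g. attachments, addresses) are dropped."""
    return Comment(**{f.name: raw.get(f.name) for f in fields(Comment)})
```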
Text matching
Many comments included text that was nearly identical to that of other posts but differed in small ways. For example, many comments reproduced the exact text of a form or sample letter but were signed with a different name. Some comments were identical except for an additional space or apostrophe, while others used the same sentences in a slightly different order. For this study, Pew Research Center decided that comments of this nature were similar enough to be considered “matching” or “non-unique.”
To systematically identify these matching or non-unique posts, the Center used a measure known as “cosine similarity” to compare the text of all comments in the dataset. This technique represents each of two comments as a vector based on the characters it uses and measures how closely the two vectors align, producing a score from 0 (no characters in common) to 1 (the same characters in the same proportions). Comments were considered non-unique if their cosine similarity was .95 or above. The .95 threshold is a conservative benchmark that ensures only comments nearly identical in content are counted as matching.
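A minimal sketch of a character-based cosine similarity with the .95 cutoff. The representation is an assumption: the source says only that characters are compared, so this sketch uses counts of overlapping character trigrams (`n=3`, a hypothetical choice). Its scores will not necessarily reproduce the figures in the examples below.

```python
import math
from collections import Counter

THRESHOLD = 0.95  # comments at or above this score count as matching


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams (n=3 is an assumed setting)."""
    if len(text) < n:
        return Counter([text]) if text else Counter()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: str, b: str, n: int = 3) -> float:
    """Cosine of the angle between two n-gram count vectors, in [0, 1]."""
    va, vb = char_ngrams(a, n), char_ngrams(b, n)
    dot = sum(count * vb[gram] for gram, count in va.items())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_matching(a: str, b: str, threshold: float = THRESHOLD) -> bool:
    return cosine_similarity(a, b) >= threshold
```

Because the counts are never negative, the score cannot fall below 0, which is why the measure sits on a 0-1 scale here.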
In addition to using cosine similarity to identify matching posts, the Center manually grouped the 100 most-submitted comments. In this process, researchers grouped together comments whose only differences involved line spacing, line breaks, word capitalization or the name used as the sign-off.
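The Center did this grouping by hand, but its rules can be approximated in code. The sketch below is a hypothetical automated stand-in, not the Center's procedure: it collapses all whitespace, lowercases the text, and strips the commenter's own name when it appears as the final words, assuming the author name field is available for each comment.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_for_grouping(text: str, author_name: str = "") -> str:
    """Hypothetical approximation of the manual grouping rules."""
    # Collapse line breaks and runs of spaces, then ignore capitalization.
    normalized = " ".join(text.split()).lower()
    name = " ".join(author_name.split()).lower()
    # Drop a trailing sign-off that matches the author's name as whole words.
    if name and normalized.endswith(name):
        head = normalized[: -len(name)]
        if not head or not head[-1].isalnum():
            normalized = head.rstrip(" ,-")
    return normalized


def group_comments(comments: Iterable[Tuple[int, str, str]]) -> Dict[str, List[int]]:
    """Group (comment_id, text, author_name) tuples by normalized text."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for comment_id, text, author in comments:
        groups[normalize_for_grouping(text, author)].append(comment_id)
    return dict(groups)
```

For instance, "Keep Title II.\n\nJane Doe" signed by Jane Doe and "keep title II. John Smith" signed by John Smith both normalize to "keep title ii." and land in one group.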
The following three examples illustrate the matching process used in this analysis:
Example 1:
Comments A and B below have a cosine similarity of 0.98 and accordingly are considered matching. In this case, the only differences between the two are the capitalization of the word “internet” and the order of the sentences.
Comment A:
The unprecedented regulatory power the Obama Administration imposed on the internet is smothering innovation, damaging the American economy and obstructing job creation. I urge the Federal Communications Commission to end the bureaucratic regulatory overreach of the internet known as Title II and restore the bipartisan light-touch regulatory consensus that enabled the Internet to flourish for more than 20 years. The plan currently under consideration at the FCC to repeal Obama’s Title II power grab is a positive step forward and will help to promote a truly free and open internet for everyone.
Comment B:
The plan currently under consideration at the FCC to repeal Obama’s Title II power grab is a positive step forward and will help to promote a truly free and open internet for everyone. The unprecedented regulatory power the Obama Administration imposed on the internet is smothering innovation, damaging the American economy and obstructing job creation. I urge the Federal Communications Commission to end the bureaucratic regulatory overreach of the internet known as Title II and restore the bipartisan light-touch regulatory consensus that enabled the internet to flourish for more than 20 years.
Example 2:
Comments C and D have a cosine similarity of 0.95, which just meets the Center’s threshold to be consolidated. The only difference is the additional opening sentence in Comment C.
Comment C:
Don’t kill net neutrality. We deserve a free and open Internet with strong Title II rules. This will ensure that the flow of data is determined by the interests of Internet users
Comment D:
We deserve a free and open Internet with strong Title II rules. This will ensure that the flow of data is determined by the interests of Internet users
Example 3:
Comments E and F begin the same way, but the additional sentence in Comment E causes the pair to fall short of the 0.95 threshold used in this report. Accordingly, these two comments are not considered matching.
Comment E:
The FCC’s Open Internet Rules (net neutrality rules) are extremely important to me. I urge you to protect them.
Comment F:
The FCC’s Open Internet Rules (net neutrality rules) are extremely important to me.
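Applied across a whole dataset, the pairwise threshold test still has to be turned into groups of matching comments. One possible approach, sketched below under stated assumptions, joins any two comments at or above the threshold with a union-find structure, so matches chain transitively; the source does not say how the Center formed groups. The similarity function is passed in as a parameter, and comparing every pair is only practical for small samples, not 21.7 million comments.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def group_by_similarity(
    texts: Sequence[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.95,
) -> List[List[int]]:
    """Cluster comment indices whose pairwise similarity meets the threshold."""
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        # Follow parent links to the group root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All-pairs comparison: O(n^2), fine for a sketch, not for millions of comments.
    for i, j in combinations(range(len(texts)), 2):
        if similarity(texts[i], texts[j]) >= threshold:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    groups: Dict[int, List[int]] = {}
    for i in range(len(texts)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())
```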