Methodology

This analysis of the Twitter discussions surrounding the 2014 European Union (EU) elections employed media research methods that combined Pew Research’s content analysis rules with computer coding software developed by Crimson Hexagon (CH). This report is based on an examination of more than 1.2 million tweets identified as being about the EU elections during the period May 1–14, 2014. The primary searches were conducted in three languages: English, French and German.

Crimson Hexagon is a software platform that identifies statistical patterns in words used in online texts. Researchers enter key terms using Boolean search logic so the software can identify relevant material to analyze. Pew Research draws its analysis sample from all public Twitter posts. Then a researcher trains the software to classify documents using examples from those collected posts. Finally, the software classifies the rest of the online content according to the patterns derived during the training. While automated sentiment analysis is not perfect, the Center has conducted numerous tests and determined that Crimson Hexagon’s approach is among the most accurate automated analysis tools available. Multiple tests suggest that human coders and Crimson Hexagon’s results are in agreement between 75% and 83% of the time.
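Crimson Hexagon’s exact algorithm is proprietary, but the general train-then-classify workflow described above can be sketched with a generic supervised text classifier. The snippet below is illustrative only: the example tweets, labels and naive Bayes model are stand-ins, not the software Pew Research used.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hand-labeled training examples (hypothetical; the real monitors used 250+ tweets).
train_texts = [
    "Great debate performance, a strong case for Europe",   # positive
    "The European Parliament votes on the budget today",    # neutral
    "Another bureaucratic failure from Brussels",            # negative
    "eu te amo, bom dia!",                                   # off topic
]
train_labels = ["positive", "neutral", "negative", "off topic"]

# Learn word-frequency patterns from the labeled examples, then
# apply those patterns to the rest of the collected tweets.
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_labels)

unlabeled = ["Voters deserve better than this EU establishment"]
print(classifier.predict(unlabeled))
```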

This analysis contains two parts. The first is an analysis of the sentiment or tone of the conversation on Twitter. The second is an analysis of the most discussed topics surrounding the European elections.

All tweets analyzed in this report were collected from 12 am ET, May 1, 2014, to 12 am ET, May 14, 2014.

Each Boolean search used keywords in three languages – English, French and German – with the exception of the search tracking sentiment toward the candidates for the European Commission, which was conducted in English only.

The Boolean searches used for each monitor included a variety of terms relevant to the subject being examined. For example, the search used to identify tweets about the European elections was: (“European Union” OR EU OR “European Elections” OR “European Parliament Elections” OR “European Politics” OR Eurosceptics OR Europhile OR Anti-EU OR Pro-EU OR EUDebate2014 OR EUDebate OR “European Parties” OR Europarty OR (Euro AND Currency) OR (Euro AND Europe) OR EP2014 OR EP14 OR EE2014 OR EUelections2014 OR EU_Commission OR “European Commission” OR EC OR “European Parliament” OR Europarl_EN OR (Europe AND Vote)) AND NOT “eu te amo.”
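To make the search logic concrete, the sketch below approximates a slice of that Boolean expression as a simple keyword filter. The function name and the handful of terms included are simplifications for illustration; the actual monitor applied the full query above inside Crimson Hexagon.

```python
def matches_eu_query(text: str) -> bool:
    """Approximate part of the EU-elections Boolean search (illustrative only)."""
    t = text.lower()
    include_phrases = [
        "european union", "european elections", "european parliament",
        "ep2014", "euelections2014", "european commission",
    ]
    include_pairs = [("euro", "currency"), ("europe", "vote")]
    exclude_phrases = ["eu te amo"]

    if any(p in t for p in exclude_phrases):          # AND NOT clause
        return False
    if any(p in t for p in include_phrases):          # OR clauses
        return True
    return any(a in t and b in t for a, b in include_pairs)  # (x AND y) clauses

print(matches_eu_query("The European Parliament elections are next week"))  # True
print(matches_eu_query("eu te amo tanto"))                                  # False
```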

In the case of the candidates for the European Commission, the Boolean search used to identify the tweets included varying versions of each candidate’s name. For example: SkaKeller OR FranziskaKeller OR “Ska Keller” OR “Franziska Keller.”
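A name-variant filter along these lines captures the same idea; the helper below is hypothetical and uses simple substring matching rather than Crimson Hexagon’s query engine.

```python
def mentions_candidate(text: str, variants: list[str]) -> bool:
    """Return True if the tweet contains any spelling variant of the candidate's name."""
    t = text.lower()
    return any(v.lower() in t for v in variants)

keller_variants = ["SkaKeller", "FranziskaKeller", "Ska Keller", "Franziska Keller"]
print(mentions_candidate("Watching Ska Keller in the EU debate tonight", keller_variants))  # True
```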

Researchers classified more than 250 tweets in order to “train” these specific Crimson Hexagon monitors. All tweets were put into one of four categories: positive, neutral, negative or off topic. Depending on the search, a tweet was considered positive if it clearly praised a candidate and negative if it was clearly critical; the same standard applied to sentiment toward the EU.

CH monitors examine the entire Twitter discussion in the aggregate. To do that, the algorithm breaks all relevant texts into subsections. Rather than treating each story, paragraph, sentence or word as the unit of analysis, CH treats the “assertion” as the unit of measurement, and the computer algorithm divides posts into assertions accordingly. Consequently, the results are not expressed as a percent of the newshole or a percent of stories. Instead, they are the percent of assertions out of the entire body of stories identified by the original Boolean search terms. We refer to the entire collection of assertions as the “conversation.”
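As a hypothetical illustration of how such results are reported, the counts below are invented; the point is only that each category’s share is computed over all assertions in the conversation rather than over stories.

```python
from collections import Counter

# Hypothetical category labels for assertions in the "conversation"
# (illustrative counts only, not the report's data).
assertions = ["positive"] * 30 + ["neutral"] * 45 + ["negative"] * 25

counts = Counter(assertions)
total = sum(counts.values())
for category, n in counts.items():
    print(f"{category}: {100 * n / total:.0f}% of assertions")
```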