This analysis is based on data from a nationally representative survey of 4,573 adults, conducted online Aug. 8-21 and Sept. 14-28, 2017, using Pew Research Center’s American Trends Panel. Respondents were randomly assigned to answer one of four open-ended questions:
- Thinking about men in our society these days… What traits or characteristics do you think people in our society value most in men? (n=1,133)
- Thinking about men in our society these days… What traits or characteristics do you think people in our society believe men should NOT have? (n=1,162)
- Thinking about women in our society these days… What traits or characteristics do you think people in our society value most in women? (n=1,142)
- Thinking about women in our society these days… What traits or characteristics do you think people in our society believe women should NOT have? (n=1,136)
Respondents could list up to three traits or characteristics, which generated a total of 14,143 words. We then used Levenshtein distance and cosine similarity measures to identify and group words that started with the same letters but were not identical, such as “attractive” and “attractiveness,” or “honest” and “honesty.” Every word was also filtered through a linguistic database called WordNet to find known variations (called “synsets”) such as “honor” and “honorable.” These tools helped us compile a list of 3,685 pairs of words, which we then reviewed individually to determine whether each pair should be collapsed into a single word.
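To illustrate the word-matching step, here is a minimal sketch of how candidate pairs could be flagged using Levenshtein distance alone. It is not the code used for this analysis: the function names, the shared-prefix length (`prefix_len`) and the distance cutoff (`max_distance`) are assumptions chosen for the example, and the cosine similarity and WordNet steps are left out.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def candidate_pairs(words, prefix_len=4, max_distance=4):
    """Flag non-identical words that share a prefix and are a few edits apart.

    prefix_len and max_distance are hypothetical thresholds, not values
    reported in the methodology.
    """
    pairs = []
    for w1, w2 in combinations(sorted(set(words)), 2):
        same_start = w1[:prefix_len] == w2[:prefix_len]
        if same_start and levenshtein(w1, w2) <= max_distance:
            pairs.append((w1, w2))
    return pairs


words = ["attractive", "attractiveness", "honest", "honesty", "strong"]
print(candidate_pairs(words))
```

A list produced this way would only contain candidates; as described above, each pair still needs individual review before it is collapsed.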
In addition, we identified 40 common multiword phrases and included them alongside the individual words that compose them – but only “multitasking” was used frequently enough to make it into our analysis. We also developed a list of 32 negation indicators – words, phrases or prefixes that invert the meaning of the word they precede, like “not,” “lack of,” “un-” and “dis-.” We used these patterns to identify and remove negated forms of words from our analysis. There were a handful of exceptions to these rules – such as “understanding” – that were preserved as-is. Finally, we removed a number of “stopwords”: a standard set of English words that hold little meaning on their own – words like “the,” “at,” and “is” – as well as a set of additional terms that were either reiterations of words found in the prompt (“men,” “women,” “trait”) or words that didn’t represent meaningful traits (“overly,” “personally”).
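The negation and stopword rules can be sketched roughly as follows. The lists here are small illustrative subsets built from the examples quoted above, not the full set of 32 negation indicators or the full stopword list, and the function name `clean_response` is hypothetical. A plain prefix rule like this one would also drop words such as “unique” or “disciplined,” which is why an exception list is needed.

```python
import re

# Illustrative subsets only; the full lists are not reproduced here.
NEGATION_WORDS = {"not"}
NEGATION_PHRASES = ["lack of"]
NEGATION_PREFIXES = ("un", "dis")
PREFIX_EXCEPTIONS = {"understanding"}
STOPWORDS = {"the", "at", "is", "men", "women", "trait", "overly", "personally"}


def clean_response(text):
    """Return the words in a response after dropping negated forms and stopwords."""
    text = text.lower()
    # Rewrite multiword negations as a single negation word.
    for phrase in NEGATION_PHRASES:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", "not", text)
    tokens = re.findall(r"[a-z]+", text)

    kept = []
    skip_next = False
    for token in tokens:
        if skip_next:
            # This word directly follows a negation word, so it is negated.
            skip_next = False
            continue
        if token in NEGATION_WORDS:
            skip_next = True
            continue
        if token.startswith(NEGATION_PREFIXES) and token not in PREFIX_EXCEPTIONS:
            continue  # negated by prefix, e.g. "unkind" or "dishonest"
        if token in STOPWORDS:
            continue
        kept.append(token)
    return kept


print(clean_response("unkind, honest"))
print(clean_response("lack of ambition"))
print(clean_response("understanding"))
```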
We ended up with 1,586 unique words, which we classified as “positive” when used as an answer to the questions about traits and characteristics that people in our society value most in men and in women, and “negative” when used as an answer to the questions about traits and characteristics people in our society believe men and women should not have. We compared the number of times each word was used in a positive and negative way and plotted them along a continuum, from words that were used in a negative way 100% of the time to words that were used in a positive way 100% of the time, with the midpoint representing words that were used equally in both positive and negative contexts. We then filtered the words down to those that met any of the following criteria:
- Words that were used overwhelmingly to describe a single gender (95% of the time, or more); each word must have been used at least 10 times for that gender.
- Words that were used in a positive and negative context at roughly equal rates for a specific gender (between 40% and 60% positive); each word must have been used at least 10 times for the relevant gender.
- Words with large differences in positive/negative use by gender (10 percentage points or more); each word must have been used at least 10 times to describe each gender.
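The three criteria above can be expressed as a short filter. This is a sketch under stated assumptions, not the analysis code: the data layout (one `(word, gender, valence)` tuple per use), the function names and the label strings are all hypothetical, the first criterion is read as a gender's share of all uses of the word, and the example counts are invented placeholders rather than survey results.

```python
from collections import Counter


def summarize(uses):
    """Count uses per word and gender, and compute the percent positive."""
    counts = Counter(uses)
    summary = {}
    for word in {w for w, _, _ in counts}:
        row = {}
        for gender in ("men", "women"):
            pos = counts[(word, gender, "positive")]
            neg = counts[(word, gender, "negative")]
            total = pos + neg
            pct = 100 * pos / total if total else None
            row[gender] = {"n": total, "pct_positive": pct}
        summary[word] = row
    return summary


def classify(row, min_uses=10):
    """Return the labels of every criterion that a word meets."""
    labels = []
    n_total = row["men"]["n"] + row["women"]["n"]
    for gender in ("men", "women"):
        n = row[gender]["n"]
        # Criterion 1: at least 95% of the word's uses are for one gender.
        if n >= min_uses and n / n_total >= 0.95:
            labels.append(f"mostly_{gender}")
    for gender in ("men", "women"):
        # Criterion 2: between 40% and 60% positive for one gender.
        if row[gender]["n"] >= min_uses and 40 <= row[gender]["pct_positive"] <= 60:
            labels.append(f"mixed_{gender}")
    # Criterion 3: a gap of 10 or more percentage points between genders.
    if row["men"]["n"] >= min_uses and row["women"]["n"] >= min_uses:
        gap = abs(row["men"]["pct_positive"] - row["women"]["pct_positive"])
        if gap >= 10:
            labels.append("gender_gap")
    return labels


# Invented counts for three placeholder words.
uses = (
    [("word_a", "men", "positive")] * 20
    + [("word_b", "men", "positive")] * 6
    + [("word_b", "men", "negative")] * 6
    + [("word_b", "women", "positive")] * 3
    + [("word_c", "men", "positive")] * 9
    + [("word_c", "men", "negative")] * 3
    + [("word_c", "women", "positive")] * 3
    + [("word_c", "women", "negative")] * 9
)
summary = summarize(uses)
for word in sorted(summary):
    print(word, classify(summary[word]))
```

Because a word is kept if it meets any of the criteria, `classify` returns a list of labels rather than a single category.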
Finally, we used a series of logistic regressions to confirm that the differences observed were statistically significant for each of these sets of words. These regressions modeled the likelihood that a respondent would use each of these words depending on the type of prompt they received, represented by two independent variables: a flag indicating whether the prompt was about men or women, and another indicating whether the prompt asked for positive or negative traits. For the first set of words, we found that each word was significantly more likely to be used for one gender over the other (p ≤ 0.05). For the second set of words, we found that each word was not significantly more likely to be used in a positive or negative context. And for the third set of words, we included an interaction between the two types of prompts as a third independent variable. Some of the positive/negative gender differences in this set turned out not to be statistically significant, so we excluded those words.
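One way to write these models down, assuming a standard binary logit, is shown below. The notation is ours and is only a sketch: $Y_{iw}$ equals 1 if respondent $i$ used word $w$, $G_i$ flags whether the respondent's prompt was about women rather than men, and $V_i$ flags whether it asked for valued rather than unwanted traits. The exact specification, coding of the flags and any survey weighting are not described in this section.

```latex
% Main-effects model (first and second sets of words)
\operatorname{logit}\Pr(Y_{iw} = 1) = \beta_0 + \beta_1 G_i + \beta_2 V_i

% Model with interaction (third set of words)
\operatorname{logit}\Pr(Y_{iw} = 1) = \beta_0 + \beta_1 G_i + \beta_2 V_i + \beta_3 (G_i \times V_i)
```

Under this reading, the first set is tested through $\beta_1$, the second through $\beta_2$, and the third through the interaction term $\beta_3$, which captures whether the positive/negative balance of a word differs by the gender named in the prompt.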
We were left with 10 words with significant positive/negative gender differences, four words with roughly equal positive/negative usage for one gender, 10 words that were used almost exclusively to describe women, and 14 words that were used almost exclusively to describe men. We filtered this last group down to the top 10 by frequency for space considerations. In addition to showing plots for each of these sets, we also introduce our analysis with a handpicked selection of words designed to illustrate a variety of patterns, including words from each of these sets and a couple of the most-used words overall (“honest” and “lazy”).
For more, see the complete data essay.