Introduction

It is a common assumption in social science that, as Erikson and Tedin1 put it, “Conservatives consider people inherently unequal and worthy of unequal rewards,” whereas “liberals are egalitarian” (p. 69). Generations of philosophers, social theorists, and political scientists have argued that a fundamental, if not the fundamental, difference between ideologues of the left and right concerns egalitarianism: liberal-leftists prioritize social, economic, and political forms of equality, whereas conservative-rightists accept existing forms of hierarchy and inequality as legitimate and necessary, and perhaps even desirable (e.g.,2,3,4,5,6,7,8). A stronger commitment to equality and tolerance on the part of liberals would help to explain the evidence, accumulated over several decades, that on both implicit and explicit measures political liberals express less hostility than conservatives toward a wide range of social groups that are frequent targets of prejudice in society5,8,9,10,11,12,13,14,15,16,17,18,19,20.

Recently, however, the longstanding idea that liberals are more egalitarian, more tolerant, and less prejudiced than conservatives has come under attack. It has been argued that liberal-leftists are every bit as authoritarian, intolerant, and hostile toward dissimilar others as are conservative-rightists21,22,23,24,25,26. The overarching claim is that leftists and rightists are equally biased, but they are just biased against different groups27. There is also an untested assumption in the literature on “worldview conflict” and prejudice27 that conservatives are biased against Black people and women not because of race or gender, but merely because they assume that Black people and women are liberal. Thus, whereas rightists are said to express prejudice against groups that are presumed to be left-leaning (such as Black people, atheists, and women), leftists are said to express prejudice against groups that are presumably right-leaning (such as businesspeople, Christians, and men).

The vast majority of evidence put forward on behalf of the ideological symmetry perspective is based on self-reported attitudes, such as feeling thermometer ratings of how “cold” or “warm” people feel toward specific target groups. A typical, albeit unsurprising, finding is that social conservatives feel more warmth toward groups perceived as socially conservative (vs. liberal), whereas social liberals feel more warmth toward groups perceived as socially liberal28. However, we think that there are several major problems with investigating ideological symmetries and asymmetries in prejudice this way29.

To begin with, most of the research purporting to document ideological symmetries in prejudice merely shows that liberals and conservatives sometimes express lukewarm attitudes toward specific groups. This body of work relies upon what we consider to be a watered-down definition of prejudice as any “negative evaluation... on the basis of group membership,” which “does not depend on whether such a prejudice can be justified according to some moral code” (p. 359)30. This conceptualization departs radically from “classic” definitions of prejudice in social psychology, such as Gordon Allport’s31 treatment of prejudice as “thinking ill of others without sufficient warrant” (p. 6), that is, “an antipathy based upon a faulty and inflexible generalization... directed toward a group as a whole, or toward an individual because he is a member of that group” (p. 9). Textbook definitions likewise emphasize “a hostile or negative attitude toward a distinguishable group based on generalizations derived from faulty or incomplete information” (p. 231)32, and “an unjustifiable (and usually negative) attitude toward a group and its members [involving] stereotyped beliefs, negative feelings, and a predisposition to discriminatory action” (p. G–10)33. When social scientists seek to understand and ameliorate prejudice, we expect that they are not concerned merely with the expression of lukewarm attitudes but with the kind of intense, unwarranted negative affect that motivates hostility, hatred, intimidation, and discrimination (e.g.,34).

To overcome limitations of previous research on the subject, and to investigate the hypothesis that liberal commitments to equality and democratic tolerance would contribute to an ideological asymmetry in expressions of hostility, intimidation, and prejudice, we conducted a large-scale investigation of naturally occurring social media behavior. Specifically, we harvested a large corpus of Twitter messages based on keywords that included social groups that, according to previous research, are common targets of liberal prejudice (e.g., Catholics, Whites, wealthy people, and conservatives) and conservative prejudice (e.g., Blacks, illegal immigrants, and liberals). In addition, we implemented a Bayesian Spatial Following model to estimate the ideological positions of Twitter users in our sample, so that we could compare the online behavior of left- and right-leaning social media users. Finally, we used a combination of manual and automatic text-coding methods to investigate ideological asymmetries in the use of language containing (1) threat and intimidation, (2) obscenity and vulgarity, (3) name-calling and humiliation, (4) hatred and racial, ethnic, or religious slurs, (5) stereotypic generalizations, and (6) negative prejudicial language. We hypothesized that: (HI) tweets mentioning liberal- or left-leaning target groups will contain more expressions of online prejudice than tweets mentioning conservative- or right-leaning target groups; and (HII) tweets sent by conservative- and right-leaning users will contain more expressions of online prejudice than tweets sent by liberal- and left-leaning users.

Method

Data collection and inclusion criteria

We used a supervised machine-learning approach to analyze naturally occurring language in a very large number of social media posts sent by liberal-leftists and conservative-rightists in reference to groups that have been identified as likely targets of liberal and conservative bias. The population of interest was the set of messages circulated in the U.S. Twittersphere. Between March and May 2016, we harvested 733,907 Twitter messages that included one or more of the 96 keywords listed in Table 1, including progressives, rightists, Christians, civil rights activists, Caucasians, Black people, destitute, and rich people. The selection of target groups was based on previous research by Chambers et al.23 and Brandt et al.22, which sought to specify frequent targets of “liberal prejudice” and “conservative prejudice.” For each of the target groups, we included synonyms, all of which were either hashtags or keywords used on Twitter during the period of data collection. All search terms were manually inspected prior to data collection; terms that the computer scientists implementing the queries deemed too common on Twitter were excluded from the collection. To filter out tweets containing pornographic content and tweets written in languages other than English, we included pornography and non-English as screening categories in the human coding and machine-learning phases. We then excluded tweets whose machine-learning-estimated probability of containing pornographic content exceeded 0.50, as well as those whose estimated probability of being non-English exceeded 0.50. This left us with a total sample of 670,973 tweets that were eligible for further analysis.

Table 1 Keywords used to harvest tweets for the data collection.
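To make the screening step concrete, the following is a minimal sketch, assuming the harvested tweets are stored in a table with hypothetical columns p_porn and p_non_english holding the classifier-estimated probabilities for the two screening categories; the file and column names are ours, not part of the original pipeline.

```python
import pandas as pd

# Hypothetical file of harvested tweets with one row per tweet and
# classifier-estimated probabilities for the two screening categories.
tweets = pd.read_csv("harvested_tweets.csv")

# Keep tweets whose estimated probability of being pornographic or
# non-English does not exceed 0.50 (the exclusion threshold described above).
eligible = tweets[(tweets["p_porn"] <= 0.50) & (tweets["p_non_english"] <= 0.50)]

print(f"Harvested: {len(tweets):,}; eligible for analysis: {len(eligible):,}")
```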

Ideological estimation

We used Barberá’s method of estimating left–right (or liberal–conservative) ideological positions of Twitter users36. This method, which has been validated in a number of ways, employs a Bayesian Spatial Following model that treats ideology as a latent variable estimated on the basis of following networks, that is, the set of liberal and conservative political accounts (of well-known journalists, politicians, and other political actors) that the individual follows. We were able to calculate point estimates for a total of 325,717 Twitter users. Scores ranged from −2.611 (very liberal) to 4.668 (very conservative), with a mean of 0.369 (SD = 1.724); the mean was close to the neutral point of zero, indicating that, on average, users in our sample were relatively moderate (neither strongly liberal nor strongly conservative). Using this method, 176,948 Twitter users in our sample were classified as liberal-leaning (that is, with scores below zero), and 148,769 were classified as conservative-leaning (with scores above zero).
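For reference, the core of the spatial following model can be written as follows. This is a sketch of the likelihood in the spirit of Barberá’s method36, where θi denotes the latent ideology of user i, φj the ideological position of political account j, αj and βi capture the account’s popularity and the user’s overall level of political interest, and γ is a normalizing parameter.

```latex
% Probability that Twitter user i follows political account j,
% modeled as a decreasing function of their squared ideological distance
% (sketch; notation approximates Barberá's spatial following model).
P(y_{ij} = 1) = \operatorname{logit}^{-1}\!\left( \alpha_j + \beta_i - \gamma \, (\theta_i - \varphi_j)^2 \right)
```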

Human coding phase

To train the automatic machine-learning algorithm to classify tweets, it was necessary to first have a subset of them manually coded. Before coding the tweets that were used for the machine-learning phase, all raters participated in a two-hour training session and were taught to follow the same standardized protocol (see Human Coding Manual in Supplementary Material). In the pilot coding phase, seven trained research assistants coded a batch of 1000 tweets (500 tweets per rater) to assess the appropriateness of the coding instructions. We then used their feedback to make clarifications, minor revisions, and edits to the coding manual. In the next phase, 11 trained undergraduate and graduate psychology students coded an additional set of 6000 tweets. The final sample of manually coded tweets therefore consisted of N = 7000 unique tweets, with each tweet coded by at least three independent raters.

Coding categories

To establish our coding scheme, we conducted an extensive literature search on studies of online incivility and the linguistic expression of prejudice. Incivility in online discourse is operationally defined in terms of the use of disrespectful language37,38. Disrespectful language can be broken down further into the use of obscene language and name-calling or attempts to humiliate the target of the disrespectful language. In the context of intergroup relations, incivility may also include the use of aggressive, threatening, or intimidating language. Because a main goal of our research program was to investigate ideological symmetries and asymmetries in prejudice, we estimated the prevalence of negative prejudicial language, which is underpinned by stereotypical categorical generalizations expressed in a way that renders them largely immune to counterevidence11,17,31,34,35. Thus, we sought to analyze prejudicial language directed at specific target groups that are typically perceived to be left- and right-leaning, respectively. Because our dataset was harvested before Twitter expanded its policies against hate speech and hateful conduct in late 2019, we were able to investigate hatred directed at various target groups.

Therefore, research assistants coded the tweets on all of the following dimensions: (1) Threat/intimidation: language conveying a threat to use physical violence or intimidation directed at an individual or group; (2) Obscenity: an offensive word or phrase that would be considered inappropriate in professional settings; (3) Hatred: a communication that carries no meaning other than the expression of hatred for some social group; (4) Name-calling/humiliation: language directed at an individual or group that is demeaning, insulting, mocking, or intended to create embarrassment; (5) Stereotypic generalization: false or misleading generalizations about groups expressed in a manner that renders them largely immune to counterevidence; and (6) Negative prejudice: an antipathy based on group-based generalizations, that is, an unfavorable feeling “toward a person or thing, prior to, or not based on, actual experience” (p. 6)31.

Inter-rater reliability coefficients for each of these categories are provided in the Online Supplement (Tables S.1–S.8). We used a majority voting method, so that if two or more of the three human coders agreed that a given tweet contained hatred, obscenity, prejudice, and so on, it was classified as belonging to the positive class. Coding frequencies estimated for the training data set are summarized in Table S.9 of the Supplement for each of the six theoretical categories (plus the two screening categories).
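As an illustration, here is a minimal sketch of the majority-voting rule for one category, assuming each tweet carries three binary codes (1 = category present, 0 = absent) from its three raters; the function and variable names are ours.

```python
def majority_vote(codes):
    """Return 1 (positive class) if at least two of the three raters
    marked the category as present for this tweet, else 0."""
    return int(sum(codes) >= 2)

# Example: three raters' binary codes for "hatred" on a single tweet.
print(majority_vote([1, 1, 0]))  # -> 1: classified as containing hatred
print(majority_vote([0, 1, 0]))  # -> 0: classified as not containing hatred
```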

Machine-learning phase

Training, validation, and test sets for the machine-learning phase were based on the 7000 human-coded tweets. We reserved 20% (1400) of the tweets to use as a test set to evaluate final model performance. Of the remaining 5600 tweets, approximately 20% (1100) were used for validation, leaving 4500 tweets with which to train the models. We used several different text classification strategies, including “bag of words” models such as the Support Vector Machine (SVM), neural networks such as Long Short-Term Memory (LSTM), and transfer learning techniques such as Universal Language Model Fine-Tuning (ULMFiT) and Bidirectional Encoder Representations from Transformers (BERT). We applied each of these strategies to classify the tweets along the six coding dimensions. For the sake of brevity, we report results from the best performing model, namely BERT. Detailed information about all machine-learning methods and results is provided in the Online Supplement, along with a comparative analysis of the four machine-learning models employed.
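The splitting procedure can be sketched as follows, assuming the 7000 human-coded tweets are held in a pandas DataFrame with one binary label column per coding category; the file name, column names, and random seed are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# 7000 human-coded tweets, one row per tweet, with binary labels per category.
coded = pd.read_csv("human_coded_tweets.csv")

# Hold out 20% (1400 tweets) as the final test set for evaluating model performance.
train_val, test = train_test_split(coded, test_size=0.20, random_state=42)

# Reserve a further 1100 tweets of the remainder for validation,
# leaving roughly 4500 tweets for training.
train, val = train_test_split(train_val, test_size=1100, random_state=42)

print(len(train), len(val), len(test))  # -> 4500 1100 1400
```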

Bidirectional encoder representations from transformers

BERT is a state-of-the-art language representation model that is pre-trained on large unlabeled text corpora and can subsequently be fine-tuned, with a single additional output layer, for downstream classification tasks such as ours.
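To give a sense of what fine-tuning looks like in practice, here is a minimal sketch of a binary classifier for a single category (here, hatred) using the Hugging Face transformers library, continuing from the splitting sketch above; the model variant, hyperparameters, and column names are our own illustrative choices, not the exact configuration used in the study.

```python
import torch
from torch.utils.data import Dataset
from transformers import (BertForSequenceClassification, BertTokenizerFast,
                          Trainer, TrainingArguments)

class TweetDataset(Dataset):
    """Wraps tokenized tweets and binary labels (0/1) for one coding category."""
    def __init__(self, texts, labels, tokenizer, max_len=64):
        self.enc = tokenizer(texts, truncation=True, padding=True, max_length=max_len)
        self.labels = labels

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

# "train" and "val" are the DataFrames from the splitting sketch above;
# "text" and "hatred" are hypothetical column names.
tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
train_ds = TweetDataset(train["text"].tolist(), train["hatred"].tolist(), tokenizer)
val_ds = TweetDataset(val["text"].tolist(), val["hatred"].tolist(), tokenizer)

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
args = TrainingArguments(output_dir="bert_hatred", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds, eval_dataset=val_ds).train()
```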

Table 4 Bivariate correlations between target ideology (groups that were perceived as more conservative/rightist) and the expression of linguistic bias overall (N = 670,973 tweets).

Communicator effects

Next, we investigated the effects of user ideology on linguistic bias. This analysis was based on the subset of messages (n = 325,717) sent by users who could be classified as liberal or conservative. As shown in Table 5, conservative Twitter users were more likely than liberal Twitter users to communicate negative prejudice (r = 0.210), name-calling (r = 0.146), stereotypes (r = 0.110), and threatening language (r = 0.092), all ps < 0.001. Conservatives were slightly more likely to use hateful language (r = 0.011), whereas liberals were slightly more likely to use obscenity (r = −0.010); both of these effects were quite small but, because of the very large sample size, still significant at p < 0.001.

Table 5 Correlations between user ideology (Twitter users who were classified as more conservative/rightist) and the expression of linguistic bias, both overall and against specific target groups.

Communicator effects analyzed separately for liberal versus conservative target groups

Next, we inspected correlations between user ideology and linguistic bias directed at groups that were generally perceived to be liberal or left-leaning vs. conservative or right-leaning, respectively (see Table 5). For the subsample of tweets that mentioned liberal-leftist groups (n = 229,788), which comprised 70.5% of the tweets sent by users who could be ideologically classified, users who were classified as more conservative were more likely to express negative prejudice (r = 0.247), to engage in name-calling (r = 0.191), and to include threats (r = 0.123), stereotypes (r = 0.116), and hatred (r = 0.021), all ps < 0.001. There was no effect of user ideology on the use of obscenity (r = 0.003, p = 0.119).

For the much smaller subsample of tweets that mentioned conservative-rightist groups (n = 95,929), more liberal users were slightly more likely to express obscenity (r = −0.047) and hatred (r = −0.026), both ps < 0.001. However, for the remaining categories, conservative Twitter users were actually more likely than liberal Twitter users to express linguistic bias. That is, even when writing about groups that are generally considered to be right-leaning, conservatives were more likely to communicate negative prejudice (r = 0.118), stereotypes (r = 0.096), name-calling (r = 0.025), and threatening language (r = 0.021), all ps < 0.001.

Sensitivity analyses

We conducted additional sensitivity analyses to determine whether the results and their interpretation were affected by analytic decisions. Specifically, we re-coded the continuous estimates for linguistic bias into binary, categorical variables (< 50% probability = does not contain biased language, ≥ 50% probability = contains biased language) and conducted regression analyses. Results were very similar to those described above.
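A minimal sketch of this re-coding and regression step for one category follows, assuming a DataFrame with the continuous classifier probability (p_hatred) and a target-ideology score; the file and column names are hypothetical, and logistic regression via statsmodels is our illustrative choice for a binary outcome.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file of classified tweets with continuous bias probabilities
# and a target-group ideology score (higher = more conservative/rightist target).
tweets = pd.read_csv("classified_tweets.csv")

# Re-code the continuous probability into a binary indicator:
# 1 = contains biased language (p >= 0.50), 0 = does not (p < 0.50).
tweets["hatred_binary"] = (tweets["p_hatred"] >= 0.50).astype(int)

# Regress the binary indicator on target-group ideology.
X = sm.add_constant(tweets["target_ideology"])
result = sm.Logit(tweets["hatred_binary"], X).fit()
print(result.summary())  # reports b, SE(b), and the associated Wald (z) statistics
```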

Tweets that mentioned liberal-leaning groups were more likely to contain hatred (b = −0.45, SE(b) = 0.008, Wald = 2833.26), threats (b = −0.26, SE(b) = 0.006, Wald = 1697.99), obscenity (b = −0.23, SE(b) = 0.006, Wald = 1710.62), name-calling (b = −0.36, SE(b) = 0.004, Wald = 6783.19), stereotypes (b = −0.22, SE(b) = 0.004, Wald = 2480.92), and negative prejudice (b = −0.272, SE(b) = 0.003, Wald = 6386.04), all ps < 0.001. (Because target-group ideology was scored so that higher values indicate more conservative/rightist targets, negative coefficients reflect greater linguistic bias in tweets about liberal-leaning groups.)

We also compared the frequencies (percentages) of messages about various target groups that contained each type of linguistic bias. Tweets about left-leaning (vs. right-leaning) groups were again more likely to contain hatred (3.5% vs. 0.5%, χ² = 6882.25), threats (4.1% vs. 2.9%, χ² = 749.48), obscenity (5.8% vs. 2.8%, χ² = 3521.28), name-calling (10.4% vs. 4.8%, χ² = 7342.45), stereotypes (7.9% vs. 5.7%, χ² = 1254.16), and negative prejudice (15.4% vs. 10.1%, χ² = 4205.00), all ps < 0.001.
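These frequency comparisons amount to 2 × 2 chi-square tests of target-group ideology (left- vs. right-leaning) against the presence of each type of linguistic bias. A minimal sketch using scipy is shown below; the counts are placeholders for illustration, not the study's actual frequencies.

```python
from scipy.stats import chi2_contingency

# 2 x 2 contingency table: rows = target-group ideology (left-, right-leaning),
# columns = whether the tweet contained hatred (yes, no).
# The counts below are placeholders, not the study's actual frequencies.
table = [[8000, 220000],   # tweets about left-leaning groups
         [500,  95000]]    # tweets about right-leaning groups

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, df = {dof}, p = {p:.4g}")
```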

Finally, we examined whether user ideology was related to the likelihood that messages contained linguistic bias. Tweets sent by more conservative users had a higher probability of containing hateful language (b = 0.049, SE(b) = 0.008, Wald = 35.76), threats (b = 0.225, SE(b) = 0.005, Wald = 2055.62), name-calling (b = 0.210, SE(b) = 0.003, Wald = 3766.95), stereotypes (b = 0.134, SE(b) = 0.003, Wald = 1475.32), and negative prejudice (b = 0.26, SE(b) = 0.003, Wald = 9125.68), all ps < 0.001. There was no statistically significant effect of user ideology on the use of obscene language (b = −0.007, SE(b) = 0.006, Wald = 1.81, p = 0.179).

Ideology of the coders

Because we were concerned that the political orientations of the raters could bias their coding, we asked the research assistants to answer three questions about their general political orientation (“Please indicate on the scale below how liberal or conservative [in terms of your general outlook] you are”), social attitudes (“How liberal or conservative do you tend to be when it comes to social policy?”), and economic attitudes (“How liberal or conservative do you tend to be when it comes to economic policy?”). Responses could range from 1 (very liberal) to 7 (very conservative). The 8 (of 11) raters who answered these questions were liberal-leaning on average, M = 2.46 (SD = 1.05).

We examined point-biserial correlations between coders’ ideology scores and their ratings of each linguistic category under study for every batch of tweets. We found that rater ideology was unrelated to the criterion linguistic category used to train the machine-learning algorithm, i.e., hateful language (r = 0.009, p = 0.139). Rater ideology was also unrelated to the detection of threatening language in the training tweets (r = 0.011, p = 0.079). At the same time, the more conservative our raters were, the more likely they were to detect obscenity (r = 0.022, p < 0.001), whereas the more liberal our raters were, the more likely they were to detect name-calling (r = −0.028, p < 0.001), stereotypes (r = −0.136, p < 0.001), and negative prejudice (r = −0.111, p < 0.001). Thus, coder ideology was inconsistently related to the use of the various coding categories. Most importantly, rater ideology was unrelated to ratings of hatred, which served as the base linguistic model for training the other categories. It is also worth highlighting that the classification and labeling process for machine-learning training relied on majority voting, so that at least two annotators must have agreed that a tweet contained hatred, obscenity, and so on, before it was labeled as belonging to the positive class.
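For completeness, the coder-level analysis can be sketched as follows, assuming a long-format table with one row per rater-tweet-category combination containing the rater's ideology score and the binary code they assigned; the file and column names are hypothetical.

```python
import pandas as pd
from scipy.stats import pointbiserialr

# Hypothetical long-format file: one row per (rater, tweet, category),
# with the rater's ideology score (1-7) and the binary code they assigned (0/1).
ratings = pd.read_csv("rater_codes.csv")

for category in ["hatred", "threat", "obscenity", "name_calling",
                 "stereotype", "negative_prejudice"]:
    sub = ratings[ratings["category"] == category]
    r, p = pointbiserialr(sub["code"], sub["ideology"])
    print(f"{category}: r = {r:.3f}, p = {p:.4g}")
```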