1 Introduction

For all we know, the human capacity for language is second to that of no other species in the animal kingdom. It is regarded as essential for the development of human society and its cultural achievements. In addition, humans regard language as something extremely personal. In expressing thoughts, desires, intentions, or beliefs with words, humans experience themselves as individuals. This is one of the reasons why freedom of expression is considered a human right in many jurisdictions, even where it may not be granted in practice. Hence, it is only natural that efforts to moderate linguistic expression in the digital realm are an important topic in Digital Humanism. Another reason why content moderation deserves a prominent place in Digital Humanism is that speech has traditionally been the medium through which politics happens. From public debate in the ancient Greek polis to modern-day speeches in mass media, leaders lead through language and are challenged in debates. Language is thus the medium that facilitates power, and it can be the medium by which power is taken away, as in the democratic vote of the people.

The intention to regulate (or “moderate”) what is published is neither new nor exclusive to digital media. What should be published has probably been a central societal, political, religious, and ethical concern for as long as writing has existed, and certainly since the invention of the printing press. It has been a subject of censorship and is still regulated by law and ethical norms, including for traditional mass media (newspapers, books, TV, etc.) in modern liberal democracies. For example, many countries have laws prohibiting the publication of terrorist content and have rules for the publication of certain types of material, such as age limits for pornographic content. Often, there are self-governing bodies that regulate what can be published, for example, in mass media or in advertising. Traditionally, such limitations on publication were implemented through reviewers, editors, censors, or courts, that is, by humans. They could limit publication, remove content, or restrict the audiences of certain publications. In principle, these instruments are still applicable in the digital world. However, an important new quality of content moderation emerges when the decision to regulate content (including who can see it) is taken with the help of algorithms.

2 What Is Algorithmic Content Moderation?

Content moderation is a fairly recent field of digital technologies. Even though language technologies relevant for content moderation today have been developed for decades, researchers have paid much closer attention to the area since the rise of large online social media platforms. In many cases, content moderation is a response to the fact that discourse on such platforms has proven problematic. Online social platforms facilitate the distribution of untrue information (fake news), the creation of environments where people are only exposed to opinions reconfirming their interests and beliefs (filter bubbles), verbal abuse, and many other troublesome phenomena. While these phenomena are by no means exclusive to digital media or social networks, they may be exacerbated in large communities of speakers with no personal interaction other than the messages that they exchange online. Up until a few years ago, relatively few scientific papers dealt with the topic. The number of scientific publications has been increasing since 2015 (Fanta, 2017), primarily focusing on technical approaches, ethical challenges, and the perception of automatically generated content by journalists and the public. Communication science has also dealt theoretically and empirically with automated content, and in particular automated journalism, for several years.

Content moderation addresses a topic that not only concerns individuals and their linguistic online expressions; it deals with communication and how humans establish social relations. It addresses how we interact with each other and how we make sense of the world. In the following definition of content moderation, we refer to Roberts (2017):

Content moderation is the organized practice of screening user-generated content (UGC) posted to Internet sites, social media, and other online outlets, in order to determine the appropriateness of the content for a given site, locality, or jurisdiction. The process can result in UGC being removed by a moderator, acting as an agent of the platform or site in question. … The style of moderation can vary from site to site, and from platform to platform, as rules around what UGC is allowed are often set at a site or platform level and reflect that platform’s brand and reputation … The firms who own social media sites and platforms that solicit UGC employ content moderation as a means to protect the firm from liability, negative publicity, and to curate and control user experience. (Roberts, 2017, p. 1)

Content moderation is not only an issue for content providers such as online newspapers but also relevant for social media platforms such as Twitter or Facebook. It is relevant for text-based online systems as well as networks that focus on other types of content such as images, videos, and even music. In this chapter, we focus on text-based systems.

There are two main reasons to screen user-generated content (UGC).

Reason 1: Depending on national legal regulations, media providers are liable for the content published via their sites. This is particularly the case for online newspapers, which are subject to national regulations as to what content is permitted. For Austria, this is regulated in media law.Footnote 1 The situation is less clear for providers of social media platforms such as Facebook, Twitter, or TikTok, which are a comparatively new worldwide phenomenon. Thus, the formulation of international requirements of conduct and legal standards is necessary, and respective initiatives are underway. An example is the European Digital Services Act (DSA), an endeavor of the European Union to regulate online services including social media.Footnote 2 The DSA entered into force in November 2022 and is planned to be fully applicable by February 2024. Due to national legal regulations, it has been vital for providers of online newspapers since their inception to filter out UGC that conflicts with the law. This activity is called pre-moderation and is done before posts go online. Pre-moderation is typically done automatically because of the sheer quantity of incoming posts and the speed at which they need to be processed in order to guarantee real-time communication.

Reason 2: Individual content providers have different editorial concepts, and depending on these, they have differing demands on how people are expected to communicate with each other. This is typically communicated via the terms of use and specific rules of netiquette. Fora of online newspapers with a claim to quality are typically moderated by human moderators. An example of an online newspaper with a strong moderation policy is derStandard,Footnote 3 which has a team of human forum moderators whose main task is to support a positive discussion climate in the newspaper’s online fora.Footnote 4 This kind of moderation is called post-moderation, as the moderation activities relate to posts that are already online. Apart from community activities where users can flag inappropriate posts, natural language processing (NLP) plays an important role in supporting human moderators in finding posts that might be of interest to a larger group of readers of a forum than just the few who participate in a certain thread. Automatic systems can also help identify fora, or phases in a discussion, that become increasingly emotional or discriminatory. Examples are given in the following section.

There are more reasons for algorithmic content moderation, for example, the identification of protected intellectual property (see chapter by Menids in this volume).

3 Technical Approaches to Content Moderation

Technical approaches to content moderation are based on classifiers. Text classification is a core method in NLP, employing different methods of machine learning. Classifiers are used to categorize data into distinct groups or classes. Roughly, they are mathematical models that use statistical analysis and optimization to identify patterns in the data. To train a classifier, a certain amount of labeled data is required, representing in-class and out-of-class examples. A number of classifiers exist, including logistic regression, Naïve Bayes, decision tree, support vector machine (SVM), k-nearest neighbors (KNN), and artificial neural network (ANN) (see, for instance, Kotsiantis et al. (2007) for a review of classification techniques and Li et al. (2022) specifically for text classification).
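
To make this concrete, the following minimal sketch trains a classical text classifier with scikit-learn. It is an illustration only: the toy posts, labels, and the choice of TF-IDF features with logistic regression are assumptions for demonstration purposes, not the setup of any production moderation system.

```python
# Minimal sketch of a text classifier for moderation support (illustrative only).
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Invented toy training data: 1 = should be flagged for review, 0 = unproblematic.
posts = [
    "Thanks for the thoughtful article, I learned a lot.",
    "You are an idiot and should shut up.",
    "Interesting point, but the numbers seem off.",
    "People like you should be banned from this country.",
]
labels = [0, 1, 0, 1]

# TF-IDF features plus logistic regression: one classical baseline among the
# classifier families mentioned above (SVMs, Naive Bayes, decision trees, ...).
model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(posts, labels)

# The trained model assigns a class (and a probability) to new, unseen posts.
print(model.predict(["What a stupid take, typical for you people."]))
print(model.predict_proba(["Great discussion in this thread!"]))
```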

In the case of fora in online newspapers, text-based classifiers are employed, which assign predefined category labels to individual posts. In practice, a number of different classifiers are in use, depending on the moderation tasks at hand. In pre-moderation, UGC is classified into content that can or cannot be posted on the media site, according to whether it adheres to or infringes the requirements of the respective (national) media law or violates the medium’s defined online etiquette or community policy (Reich, 2011; Singer, 2011). In post-moderation, classifiers support the forum moderators in identifying postings of interest. What is of interest is defined by the individual media companies and may differ across editorial sections and even individual articles. All in all, moderation is an important success factor for online discussion culture (Ziegele & Jost, 2016).
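
How such classifier outputs might feed into pre- and post-moderation can be illustrated with a small routing function. The label names, probability thresholds, and actions below are purely hypothetical and do not reflect the policy of any particular media company.

```python
# Hypothetical routing of classifier scores into pre- and post-moderation actions.
from dataclasses import dataclass

@dataclass
class ModerationDecision:
    action: str               # "block", "hold_for_review", or "publish"
    flag_for_moderator: bool  # surface the post to human moderators

def route_post(p_illegal: float, p_netiquette_violation: float,
               p_of_interest: float) -> ModerationDecision:
    # Pre-moderation: keep content off the site that must not go online at all.
    if p_illegal > 0.9:
        return ModerationDecision("block", flag_for_moderator=True)
    if p_illegal > 0.5 or p_netiquette_violation > 0.8:
        return ModerationDecision("hold_for_review", flag_for_moderator=True)
    # Post-moderation support: publish, but point moderators to interesting posts.
    return ModerationDecision("publish", flag_for_moderator=p_of_interest > 0.7)

print(route_post(p_illegal=0.05, p_netiquette_violation=0.1, p_of_interest=0.85))
```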

The classifier technologies in use differ widely depending on when they were developed. Earlier classifiers often use decision trees and support vector machines (SVMs); more recent ones are based on neural networks (deep learning). As technology has advanced, deep learning-based approaches typically lead to better results than classical machine learning-based approaches such as decision trees or SVMs. For illustration, examples from the Austrian online newspaper derStandard.at are given in the following.
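
For the neural variant, a classifier is typically obtained by fine-tuning a pretrained language model. The sketch below uses the Hugging Face transformers and datasets libraries; the base model, the two toy posts, and all hyperparameters are placeholders, and the snippet does not reproduce the systems described next.

```python
# Sketch of fine-tuning a transformer-based post classifier (placeholder data/model).
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)
from datasets import Dataset

model_name = "distilbert-base-multilingual-cased"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Invented toy examples: 0 = unproblematic, 1 = needs moderator attention.
data = Dataset.from_dict({
    "text": ["Sachliche Frage zum Artikel.", "Halt den Mund, du Troll!"],
    "label": [0, 1],
})
data = data.map(
    lambda batch: tokenizer(batch["text"], truncation=True,
                            padding="max_length", max_length=64),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="moderation-clf", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=data,
)
trainer.train()  # in practice: thousands of annotated posts and a held-out test set
```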

In pre-moderation, derStandard.at has been using a decision tree-based system (Foromat, developed by OFAIFootnote 5) since 2005. While before the introduction of the system human editors had to manually inspect the incoming posts and decide which ones could go online, Foromat significantly reduced the number of posts that needed manual inspection. With the volume of postings drastically increasing with the shift from print to online, manual pre-moderation became simply impossible and thus was left to the system. Accordingly, community measures such as the possibility for users to flag content as inappropriate and post-moderation became more important as a means to filter out inappropriate postings after their publication.

Post-moderation is key for encouraging an agreeable discussion climate in fora. Starting in 2016, the De-Escalation Bot was developed by OFAI together with derStandard (Schabus et al., 2017; Schabus & Skowron, 2018).Footnote 6 This is another kind of classification task where the classes were designed to prevent escalation in fora and to identify valuable contributions to discussions. The posts classified in this way are then sifted by the moderators, and those which the moderators consider of general interest are ranked at the top of a forum to be easily accessed by all users of the forum. According to derStandard, this has noticeably improved the quality of the discourse.Footnote 7 A more recent collaboration between OFAI and derStandard led to a classifier that helps moderators identify misogynist posts in order to counteract online discrimination against women (Petrak & Krenn, 2022). This is an important precondition for fostering female contributions to forum discussions. While the proportion of individuals who identify themselves as men or women among the online readers of derStandard is relatively balanced at 55–45%, there is a clear imbalance when it comes to active contributions, i.e., only 20% of posters identify themselves as female (surveyed on the basis of the indication of “salutation” in new registrations).

The limitations in the area of classifier-supported moderation lie primarily in the necessary provision of correspondingly large training data annotated by domain experts (typically moderators). So far, this is usually done once, when the classifier is developed. Over time, however, the wording of posts may change as users counteract moderation strategies; what is considered relevant and desirable content is also likely to change, as are the marginalized user groups in question and the measures required to encourage their contributions. Therefore, mechanisms need to be integrated into moderation interfaces through which moderators can easily collect new training data during their daily work, and classifiers capable of online learning need to be developed. This, however, is still a question of basic research [for some further reading on online learning, see Cano and Krawczyk (2022) and Mundt et al. (2023)]. Likewise, depending on the information to be identified, the available training data, and the machine learning architecture used, the accuracy rates can vary significantly. In all cases, however, the moderation quality achieved in the end strongly depends on the human experts, the forum moderators. The advantage of the classifiers is that they direct moderators to potentially relevant posts, whereas the final moderation decision lies with the moderator.
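
A minimal sketch of what such online learning could look like is given below, assuming scikit-learn's partial_fit interface; the stream of moderator feedback is simulated with invented examples.

```python
# Sketch of incrementally updating a moderation classifier from moderator feedback.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18)  # stateless, so no refitting needed
clf = SGDClassifier(loss="log_loss")              # logistic regression trained online
classes = [0, 1]                                  # 0 = keep, 1 = needs intervention

def update_from_moderator(texts, labels):
    """Fold freshly labeled posts from the moderation interface into the model."""
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=classes)

# Simulated feedback collected during daily moderation work:
update_from_moderator(["New coded insult pattern ...", "Perfectly fine comment."], [1, 0])
update_from_moderator(["Another wave of the same slur variant."], [1])

print(clf.predict(vectorizer.transform(["Yet another variant of that insult."])))
```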

Apart from encouraging an agreeable discussion climate, the detection of tendentious and fake news is another important aspect of content moderation. This is particularly relevant in social media, which significantly differ from fora in online newspapers. Whereas in online newspapers a forum is related to an individual article or blog entry written by a journalist and redacted according to the editorial policy of the respective newspaper, UGC on social media platforms is far less controlled. Accordingly, social media platforms offer a high degree of freedom of expression while being open to all kinds of propaganda and misinformation. This became particularly obvious with the US presidential election in 2016. Since then, the identification of fake and tendentious news has become a very active field of research (see, for instance, the SemEval2019Footnote 8 task on hyperpartisan news detection (Kiesel et al., 2019), where 42 NLP systems from all over the world designed to identify extreme right- or left-wing news competed against each other). The comparison of the systems showed that no single method had a clear advantage over others; successful approaches included both word embeddings and handcrafted features. SemEval2023 subtask 3 addresses the identification of persuasion techniques; here, especially the identification of manipulative wording and attacks on reputation is of interest for tendentious news detection.Footnote 9 (See also Zhou and Zafarani (2020) for a discussion of theoretical concepts and approaches to fake news detection.)

Automated fact checking is another area of NLP that addresses the task of assessing whether claims made in written or spoken language are true or false. This requires NLP technology to detect a claim in a text, to retrieve evidence for or against the claim, to predict a verdict on whether the claim is true or false, and to generate a justification for the verdict (cf. Guo et al., 2022). Apart from NLP-based research on automated fact checking, there is a broad range of journalistic fact-checking initiatives and sites, such as PolitiFact, a fact-checking site for American politics; EUfactcheck, an initiative of the European Journalism Training Association; the European Fact Checking Standards Project, where European organizations involved in fact checking cooperate to develop a code of integrity for independent European fact checking; or Poynter, an international fact-checking network,Footnote 10 to mention only some existing fact-checking initiatives.
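
The fact-checking pipeline outlined by Guo et al. (2022) can be summarized structurally as in the following sketch. All component functions are hypothetical placeholders; a real system would plug in trained models and an evidence index.

```python
# Structural sketch of an automated fact-checking pipeline (placeholder components).
from typing import List, Tuple

def detect_claims(text: str) -> List[str]:
    ...  # claim detection, e.g., a sentence-level "check-worthiness" classifier

def retrieve_evidence(claim: str) -> List[str]:
    ...  # retrieval over a document collection or fact-check database

def predict_verdict(claim: str, evidence: List[str]) -> str:
    ...  # entailment-style classifier: "supported", "refuted", "not enough info"

def generate_justification(claim: str, evidence: List[str], verdict: str) -> str:
    ...  # text generation conditioned on the retrieved evidence

def fact_check(post: str) -> List[Tuple[str, str, str]]:
    results = []
    for claim in detect_claims(post):
        evidence = retrieve_evidence(claim)
        verdict = predict_verdict(claim, evidence)
        results.append((claim, verdict,
                        generate_justification(claim, evidence, verdict)))
    return results
```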

Disinformation detection is a moving target because new topics and concerted propaganda are constantly evolving, and fact checking must adapt accordingly. Moreover, technological progress in deep learning has produced ever more capable tools for automated disinformation production, enabling the creation of deepfakes, i.e., the use of deep learning for the automated generation of fake content. Examples are the recent large generative language models (such as OpenAI’s GPT family, Google’s PaLM, or Meta’s LLaMA) that are able to flexibly generate text based on prompts, the advances in neural visual content generation (e.g., Google’s Imagen or OpenAI’s DALL-E), where pictures are generated on the basis of textual input, as well as the possibilities to generate realistic-looking yet fake audio and video content. (See Zhang et al., 2023, for a survey on generative AI.) Note that the developments in generative AI are fast, and models that are up to date at the time of writing this contribution may soon be outdated.

4 Societal Challenges

Freedom of expression. As already mentioned, the identification of illegal content is an important reason for content moderation in general and algorithmic moderation in particular. Often, such content will be removed after detection, and it may trigger further legal proceedings. For example, it is illegal in Austria to publicly deny the Holocaust. In addition, there is content that conflicts with the terms of use or the community guidelines of a social network. For example, some networks exclude nudity or have strict policies regarding false and misleading information or “fake news.”

Often, however, content moderation is publicly debated in the context of unwanted (or harmful) content. Such content is much more difficult to define. Note that in many countries, it is not in principle illegal to lie or to use abusive language. Still, such content is often considered societally unwanted, for example, because it affects certain parts of society more strongly than others, may lead to the spread of dangerous information, or may be used to exert unwanted (e.g., political) influence. A major problem with the notion of harmful content, however, is that it lacks precision and definition. It is often not very clear who is harmed and in whose interest it is to identify, mark, or remove such content. This is one of the reasons why calls to remove harmful content raise concerns and accusations of censorship. In addition, even productive debates may benefit from some degree of strong language and, according to the European Court of Human Rights, may require information that offends, shocks, or disturbs (ECHR, 2022). This makes it even less clear what precisely should be considered harmful in online debates (Prem, 2022).

The extent to which algorithmic content moderation actually interferes with the principle of freedom of expression is difficult to analyze in practice. This requires a detailed topical analysis of the practices of deletion, for which data are often lacking. It also requires a differentiated analysis of the degree to which banned content (or banned users) can turn to other media to express their thoughts. The core challenge is to strike a balance between protecting users and legal obligations on the one hand and safeguarding freedom of expression on the other (Cowls et al., 2020).

Challenges of meaning and languages. A central challenge in content moderation, and indeed in most aspects of language technology, is the identification of what users actually mean. Human utterances are easily misunderstood in everyday life, but the notion of meaning is also an elusive philosophical concept that has been debated for centuries. Features of natural languages such as humor, irony, mockery, and many more are notoriously hard to detect (Wallace, 2015), not only by machines but to some extent also by humans. Many algorithms for content moderation are context-blind and have great difficulties detecting nuance. However, such nuance is often important in debate. A particular difficulty in algorithmic moderation is to follow discussions over an extended stretch of online debate. In fact, still today, many algorithms operate only on single posts, while the intended meaning of a post may require taking into account longer stretches of dialogue. Estimates suggest an accuracy of only 70% to 80% for commercial tools (Duarte et al., 2017). For example, in 2020, Facebook removed posts with the hashtag #EndSARS in Nigeria. The hashtag was intended to draw attention to police attacks against protesters, but the moderation algorithms mistook it for misinformation about COVID-19 (Tomiwa, 2020).
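
The context problem can be made concrete with a small sketch: the same reply is scored once in isolation and once together with the preceding thread. The score_toxicity function stands in for any post-level classifier and is a hypothetical placeholder.

```python
# Sketch: classifying a reply with and without the preceding thread as context.
from typing import Callable, List

def with_context(thread: List[str], reply: str, max_previous: int = 3) -> str:
    """Concatenate the last few posts of a thread with the reply to be classified."""
    context = " [SEP] ".join(thread[-max_previous:])
    return f"{context} [SEP] {reply}"

def classify_reply(score_toxicity: Callable[[str], float],
                   thread: List[str], reply: str) -> dict:
    return {
        "post_only": score_toxicity(reply),
        "with_context": score_toxicity(with_context(thread, reply)),
    }

# "That's exactly what they deserve" may be harmless banter or a veiled threat,
# depending on what was said before; a post-only classifier cannot tell the difference.
```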

Another huge challenge for the practice of algorithmic moderation is the fact that many technologies only work well for a few common languages, first and foremost English. Rarer languages in particular often lack datasets suitable for training NLP models. This raises another ethical and perhaps legal challenge regarding the fairness of different standards in content moderation for different languages. In some regions, such as the European Union, there are a variety of different languages that should be treated equally. There are 24 official languages of the EU, and some of these have a relatively small community of speakers (e.g., Maltese), and significantly fewer texts are available. There are also fewer datasets available for researchers and for AI development. Moreover, the computational and economic power required to train large language models, as well as the access to large datasets on which current NLP systems are based, lies in the hands of private companies such as OpenAI, Meta, or Google.

Political challenge: power of control, silencing, selection, and redress. Online content moderation raises important issues of power and power relations. Online moderation can have a decisive influence on which topics in a debate disappear from public discourse. It can silence specific groups, and it has the power to control whose online contributions are shown to whom. Hence, the delegation of content moderation to algorithms turns content providers into subjects of algorithmic decision-making.

This leads to other important problems, namely, what happens when content is wrongfully reported, deleted, or in any other way restricted. Firstly, there is the problem of informing authors that their contributions were subject to moderating interventions. Such information can be given, especially where content is considered illegal, but in practice, this is not always the case. In particular, the decision to limit the visibility of a post rather than display it prominently is hardly ever made accessible to the content contributor. Secondly, the question arises of how to contest the decision that content is considered inappropriate or illegal. Today, the focus of public regulation is more on the deletion of illegal content than on redress procedures and re-instantiation. Note that regulators do not always prescribe deletion directly but are introducing strict liability regimes that provide strong incentives for platforms to perform algorithmic content moderation even before content is reported by users or third parties (Cowls et al., 2020). This poses a significant threat to freedom of expression, as it can mean that certain views are systematically suppressed with little or no chance of forcing online platforms to publish content. It also leads to the question of who should be the regulatory body deciding upon complaints and implementing the re-instantiation of wrongfully deleted content. Wrongful deletion is a significant problem. For example, in Q2/2020, more than 1.1 million videos were removed from YouTube (Cowls et al., 2020). Such excessive deletion may be a consequence of regulation. Lawmakers tend to require that illegal content be removed within very short time frames. While regulation usually does not prescribe the use of algorithms directly, in practice algorithms are the only way in which social networks or publishers can fulfill their legal obligations.

Since many newer techniques are based on statistical machine learning, there is an additional danger of bias and discrimination. Algorithmic content moderation systems can replicate and amplify existing biases and discrimination in society. For example, if a system is trained on biased datasets or programmed with biased algorithms, it may disproportionately flag and remove content from marginalized communities (Haimson et al., 2021). In addition, there is a lack of transparency. Content moderation algorithms are often proprietary, meaning that the public does not have access to the underlying code or the criteria used to determine what content is flagged or removed. This lack of transparency can make it difficult for users to understand why their content was removed (Suzor et al., 2019), and it makes it hard for researchers to study the impact of these algorithms on society.
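
One simple way to probe such bias is to compare error rates of a moderation classifier across user groups, as in the following sketch; the data, predictions, and group labels are invented for illustration.

```python
# Sketch of a minimal fairness audit: per-group false-positive rates of a classifier.
import numpy as np

def false_positive_rate(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    negatives = (y_true == 0)  # posts that should not have been flagged
    return float((y_pred[negatives] == 1).mean()) if negatives.any() else float("nan")

def audit_by_group(y_true, y_pred, groups):
    return {g: false_positive_rate(
                [t for t, gg in zip(y_true, groups) if gg == g],
                [p for p, gg in zip(y_pred, groups) if gg == g])
            for g in set(groups)}

# Invented example: benign posts (true label 0) from group "B" are wrongly flagged
# (prediction 1) far more often than those from group "A", a sign of disparate treatment.
print(audit_by_group(y_true=[0, 0, 0, 0, 1, 1],
                     y_pred=[0, 0, 1, 1, 1, 1],
                     groups=["A", "A", "B", "B", "A", "B"]))
```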

There are various ways to address issues of transparency and bias ranging from stricter regulation to increasing accountability or democratic approaches. Vaccaro et al. (2021) propose adding representation (i.e., moving toward participatory and more democratic moderation), improving communication (i.e., better explaining the reasons for moderating interventions), and designing with compassion (i.e., emphasizing empathy and emotional intelligence in moderation decisions). Other measures that have been proposed (Cowls et al., 2020) include ombudspersons, enforceable statutory regimes, and improved legal protection.

5 Conclusions

Content moderation is not a new phenomenon. It has existed at least since written forms of expression developed and was traditionally performed by humans in charge. However, algorithmic content moderation is a new phenomenon tied to the explosion of digital content in online media and social networks. Today’s content moderation is a combination of human moderators and algorithmic support, however with a strong trend toward more automation in reaction to growing amounts of content and increasing regulation. There is a range of reasons for content moderation. These include legal aspects (e.g., duties of platform owners to remove illegal content) and practical aspects such as filtering content for relevance and aiming to show users the most relevant content. Social reasons for content moderation may include the enforcement of user guidelines or the identification of content that is considered inappropriate or harmful.

Moderation takes many different forms. Content may be signaled or highlighted to inform users about issues with the content; it can be practically hidden from users by downranking, thus decreasing the likelihood that the content will be viewed, or it can be entirely deleted. Content can also be delayed in publication and reported to boards or authorities. There are great differences in the extent to which users are informed about their content being flagged, and there is only limited access to the current practices of many of today’s social networks. Algorithmic content moderation is often a response to legal requirements. Lawmakers may not prescribe algorithmic deletion directly but prescribe short time limits or severe fines, so that algorithms are the only practically viable approach given the large volumes of data and the high frequency of user interaction.

Challenges arise from the difficulty of interpreting human language automatically, especially regarding context and nuances of expression. In addition, the language technologies and training data available for rarer languages are less advanced than those for English. Potential pitfalls include limitations of freedom of expression, bias and unfair treatment, and political influence exerted through the silencing of dissent or tendentious moderation. It is possible to perform algorithmic moderation with societally beneficial intent. This includes the abovementioned example of de-escalation, encouraging factual and constructive online debates (Kolhatkar & Taboada, 2017; Park et al., 2016), and pursuing other moderation objectives, such as making the voices of systematically underrepresented groups better heard.

The advent of generative artificial intelligence and large language models is generally considered a game changer in text-based AI and is expected to trigger a plethora of new applications and systems. Most of the challenges described in this chapter also apply to AI text generators, as they pose similar questions of classifying text as harmful or illegal, of the use and abuse of generated text for political propaganda, of facilitating the creation of text for children, and many more. Similarly, issues of discrimination, bias, and transparency also apply to large language models and require further research, social debate, and political agreement and intervention.

Discussion Questions for Students and Their Teachers

  1. What are the differences between illegal and harmful content? Discuss how these issues were dealt with in traditional, non-digital publishing. As an example, consider advertising in traditional mass media.

  2. What constitutes a good online debate? Should all online discussion only be factual and all commentary be friendly and respectful, or is it sometimes necessary to simplify and use strong words?

  3. What can be done to make a debate constructive? Consider techniques used in real-world discussions, and then think about which mechanisms can be transferred to the virtual world.

  4. What are the main democratic threats emerging from algorithmic content moderation? Consider who has the power of running social network infrastructure and who is in charge of shaping online discourse. What can be done to balance the power to foster democratic principles?

Learning Resources for Students

  1. For a survey and a taxonomy of different approaches to text classification, see Li et al. (2022). It also includes benchmarks and a comparison of different approaches.

  2. N. Persily and J.A. Tucker’s (2020) book “Social Media and Democracy” provides a comprehensive account of social media, content moderation, and the challenges for democracy.

  3. There is an online summary video available for the ACM opinion piece referenced above (Prem, 2022): https://youtu.be/SjAH2HYKEhM. It illustrates the problem of illegal and harmful content and questions regarding freedom of expression.

  4. Á. Díaz and L. Hecht-Felella (2021) provide a detailed discussion of many issues listed here and recommendations from a more legal perspective. The paper includes a critical perspective on content moderation regarding the representation of viewpoints from minorities.

  5. The so-called Teachable Machine (https://teachablemachine.withgoogle.com/) is a web-based tool for creating your own machine learning classification models. Its graphical user interface (GUI) allows you to train classifiers without prior programming knowledge and without special expertise in machine learning. A short overview is given in “Teachable Machine: Approachable Web-Based Tool for Exploring Machine Learning Classification” by Carney et al. (2020).