Abstract
The intersection of social media and politics is yet another realm in which Computational Social Science has a paramount role to play. In this review, I examine the questions that computational social scientists are attempting to answer – as well as the tools and methods they are developing to do so – in three areas where the rise of social media has led to concerns about the quality of democracy in the digital information era: online hate, misinformation, and foreign influence campaigns. I begin, however, by considering a precursor of these topics – and also a potential way for social media to positively impact the quality of democracy – by exploring attempts to measure public opinion online using Computational Social Science methods. In all four areas, computational social scientists have made great strides in providing information to policy makers and the public regarding the evolution of these very complex phenomena, but in all cases they could do more to inform public policy if they had better access to the necessary data; I return to this point in the conclusion of the review.
1 Introduction
The advent of the digital information age – and, in particular, the stratospheric rise in popularity of social media platforms such as Facebook, Instagram, Twitter, YouTube, and TikTok – has led to unprecedented opportunities for people to share information and content with one another in a much less mediated fashion than was ever possible previously. These opportunities, however, have been accompanied by a myriad of new concerns and challenges at both the individual and societal levels, including threats to systems of democratic governance (Tucker et al., 2017). Chief among these are the rise of hateful and abusive forms of communication on these platforms, the seemingly unchecked spread of mis- and disinformation,Footnote 1 and the ability of malicious political actors, including, and perhaps most notably, foreign adversaries, to launch coordinated influence attacks in an attempt to hijack public opinion.
Concurrently, the rise of computing power and the astonishing developments in the fields of information storage and retrieval, text-as-data, and machine learning have given rise to a whole new set of tools – collectively known as Computational Social Science – that have allowed scholars to study the digital trace data left behind by the new online activity of the digital information era in previously unimaginable ways. These Computational Social Science tools enable scholars not only to characterize and describe the newly emerging phenomena of the digital information era but also, in the case of the more malicious of these new phenomena, to test ways to mitigate their prevalence and impact. Accordingly, this chapter of the handbook summarizes what we have learned about the potential for Computational Social Science tools to be used to address the three threats identified above: hate speech, mis-/disinformation, and foreign coordinated influence campaigns. As these topics are set against the backdrop of influencing public opinion, I begin with an overview of how Computational Social Science techniques can be harnessed to measure public opinion. Finally, the chapter concludes with a discussion of the paramount importance for any of these efforts of ensuring that independent researchers – that is, researchers not employed by the platforms themselves – have access to the data necessary to continue and build upon the research described in the chapter, as well as to inform, and ultimately facilitate, public regulatory policy.
All of these areas – using Computational Social Science to measure public opinion, and to detect, respond to, and possibly even remove hate speech, misinformation, and foreign influence campaigns – have important public policy implications. Using social media to measure public opinion offers the possibility for policy makers to have additional tools at their disposal for gauging the opinions regarding, and salience of, issues among the general public, ideally helping to make governments more responsive to the public. Hate speech and misinformation together form the crux of the debate over “content moderation” on platforms, and Computational Social Science can provide the tools necessary to implement policy makers’ proscriptions for addressing these potential harms but also, equally importantly, for understanding the actual nature of the problems that they are trying to address. Finally, foreign coordinated influence campaigns, regardless of the extent to which they actually influence politics in other countries, can rightly be conceived of as national security threats when foreign powers attempt to undermine the quality and functioning of democratic institutions. Here again, Computational Social Science has an important role to play in identifying such campaigns but also in terms of attempting to measure the goals, strategies, reach, and ultimate impact of such campaigns.Footnote 2
In the review that follows, I focus almost exclusively on publications and papers from the last 3–4 years. To be clear, this research all builds on very important prior work that will not be covered in the review.Footnote 3 In addition, in the time it has taken to bring this piece to publication, there have undoubtedly been many new and important contributions to the field that will not be addressed here. But hopefully the review is able to provide readers with a fairly up-to-date sense of the promises of – and challenges facing – new approaches from Computational Social Science to the study of democracy and its challenges.
2 Computational Social Science and Measuring Public Opinion
One of the great lures of social media was that it would lead to new ways to analyse and measure public opinion (Barberá & Steinert-Threlkeld, 2020; Klašnja et al., 2017). Traditional survey-based methods of measuring public opinion of course have all sorts of important advantages, to say nothing of a 70-year pedigree of developing appropriate methods around sampling and estimation. There are, however, drawbacks too: surveys are expensive; there are limits to how many anyone can run; they are dependent on appropriate sampling frames; they rely on an “artificial” environment for measuring opinion and are correspondingly subject to social desirability bias; and, perhaps most importantly, they can only measure opinions for the questions pollsters decide to ask. Social media, on the other hand, holds open the promise of inexpensive, real-time, finely grained time-series measurement of people’s opinions in a non-artificial environment where there is no sense of being observed for a study or needing to respond to a pollster (Beauchamp, 2017). Moreover, analysis can be retrospective, going back in time to study the evolution of opinion on a topic about which one might not previously have thought to ask questions in public opinion surveys.Footnote 4
The field has not, however, developed in a way that uses social media to mimic the traditional public opinion polling approach of an omnibus survey that presents attitudes among the public across a large number of topics on a regular basis. Instead, we have seen two types of Computational Social Science studies take centre stage: studies that examine attitudes over time related to one particular issue or topic and studies that attempt to use social media data to assess the popularity of political parties and politicians, often in an attempt to predict election outcomes.Footnote 5
The issue-based studies generally involve a corpus of social media posts (usually tweets) being collected around a series of keywords related to the issue in question and then sentiment analysis (usually positive or negative sentiment towards the issue) being measured over a period of time. Studies of this nature have examined attitudes towards topics such as Brexit (Georgiadou et al., 2020), immigration (Freire-Vidal & Graells-Garrido, 2019), refugees (Barisione et al., 2019), austerity (Barisione & Ceron, 2017), COVID-19 (Dai et al., 2021; Gilardi et al., 2021; Lu et al., 2021), the police (Oh et al., 2021), gay rights (Adams-Cohen, 2020), and climate change (Chen et al., 2021b). Studies of political parties and candidates follow similar patterns, although they sometimes use engagement metrics such as “likes” to measure popularity instead of sentiment analysis. Recent examples include studies that have been conducted in countries including Finland (Vepsäläinen et al., 2017), Spain (Bansal & Srivastava, 2019; Grimaldi et al., 2020), and Greece (Tsakalidis et al., 2018).Footnote 6
Of course, studying public opinion using Computational Social Science methods and social media data is not without its challenges. First and foremost is the question of representativeness: whose opinions are being measured when we analyse social media data? There are two layers of concern here: whether the people whose posts are being analysed are representative of the overall users of the platform and whether the overall users of the platform are representative of the population of interest (Klašnja et al., 2017). If the goal is simply to ascertain the opinions of those using the platform, then the latter question is less problematic. Of course, the “people” part of the question can also be problematic, as social media accounts can be “bots”, accounts that are automated to produce content based on algorithms as opposed to having a one-to-one relationship to a human being, although this varies by platform (Grimaldi et al., 2020; Sanovich et al., 2018; Yang et al., 2020). Another problem for representativeness can arise when significant portions of the population lack internet access, or when people are afraid to voice their opinions online due to fear of state repression (Isani, 2021).
Even if the question of representativeness can be solved and/or an appropriate population of interest identified, the original question of how to extract opinions out of unstructured text data still remains. Here, however, we have seen great strides by computational social scientists in developing innovative methods. Loosely speaking, we can identify two basic approaches. The first set of methods is characterized by identifying, a priori, text that is positively or negatively associated with a certain topic and then simply tracking the prevalence (e.g. counts, ratios) of these words over time (Barisione et al., 2019; Georgiadou et al., 2020; Gilardi et al., 2022). For example, in Siegel and Tucker (2018), we took advantage of the fact that when discussing ISIS in Arabic, the term “Islamic State” suggests support for the organization, while the derogatory term “Daesh” is used by those opposed to ISIS. Slight variations on this approach can involve including emojis as well as words (Bansal & Srivastava, 2019) or focusing on likes instead of text (Vepsäläinen et al., 2017).
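The logic of this dictionary-based approach can be sketched in a few lines of Python. The keyword sets and function below are hypothetical stand-ins inspired by the Islamic State/Daesh example, not code from any of the cited studies:

```python
# Illustrative pro/anti keyword lists; real studies use curated lexicons.
SUPPORT_TERMS = {"islamic state"}
OPPOSE_TERMS = {"daesh"}

def daily_stance_ratio(posts):
    """posts: list of (date, text) pairs.
    Returns {date: share of matched terms that are supportive}."""
    counts = {}
    for date, text in posts:
        text = text.lower()
        pro = sum(text.count(t) for t in SUPPORT_TERMS)
        anti = sum(text.count(t) for t in OPPOSE_TERMS)
        day = counts.setdefault(date, [0, 0])
        day[0] += pro
        day[1] += anti
    # Keep only days on which at least one term was observed.
    return {d: pro / (pro + anti)
            for d, (pro, anti) in counts.items() if pro + anti > 0}
```

Plotting the resulting ratio over time yields exactly the kind of issue-level time series the studies above report, with all the usual caveats about keyword coverage and sarcasm.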
The more popular approach, however, is to rely on one of the many different machine learning approaches to try to classify sentiment. These approaches include nonnegative matrix factorization (Freire-Vidal & Graells-Garrido, 2019), deep learning (Dai et al., 2021), convolutional and recurrent neural nets (Wood-Doughty et al., 2018), and pre-trained language transformer models (Lu et al., 2021; Terechshenko et al., 2020); many papers also compare a number of different supervised machine learning models and select the one that performs best (Adams-Cohen, 2020; Grimaldi et al., 2020; Tsakalidis et al., 2018). While less common, some studies use unsupervised approaches to stance detection, relying on networks and activity to cluster accounts (Darwish et al., 2019). Closely related to these latter approaches are network-based models that are not focused on positive or negative sentiment towards a particular topic, but rather attempt to place different users along a latent dimension of opinion, such as partisanship (Barberá, 2015; Barberá et al., 2015) or attitudes towards climate change (Chen et al., 2021b).
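The compare-several-models-and-keep-the-best workflow used in several of these papers can be sketched as follows, assuming scikit-learn is available; the toy texts, labels, and candidate models are purely illustrative, not drawn from any cited study:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus; real studies use thousands of hand-coded posts.
texts = ["love this policy", "great decision", "terrible idea", "awful outcome",
         "really love it", "so great", "truly terrible", "just awful"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]  # 1 = positive sentiment, 0 = negative

candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "naive_bayes": MultinomialNB(),
}

# Cross-validate each text-classification pipeline and keep the best performer.
scores = {
    name: cross_val_score(make_pipeline(TfidfVectorizer(), clf),
                          texts, labels, cv=2).mean()
    for name, clf in candidates.items()
}
best_model = max(scores, key=scores.get)
```

In practice the candidate set would include the richer model families mentioned above (neural nets, transformer models), but the selection logic is the same.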
With this basic background on the ways in which Computational Social Science can be utilized to measure public opinion using social media data, in the remainder of this chapter, I examine the potential of Computational Social Science to address three pernicious forms of online behaviour that have been identified as threats to the quality of democracy: hate speech, misinformation, and foreign influence campaigns.
3 Computational Social Science and Hate Speech
The rise of Web 2.0 brought with it the promise of a more interactive internet, where ordinary users could contribute content in near real time (Ackland, 2013). Social media in many ways represented the apex of this trend, with the most dominant tech companies becoming those that did not actually produce content, but instead provided platforms on which everyone could create content. While removing the gatekeepers from the content production process has many attractive features from the perspective of democratic participation and accountability, it also has its downsides – perhaps none more obvious than the loss of gatekeepers who could play a role in policing online hate. As that became increasingly clear, a wave of scholarship developed utilizing Computational Social Science tools to attempt to characterize the extent of the problem, measure its impact, and assess the effectiveness of various countermeasures (Siegel, 2020).
Attempts to measure the prevalence and diffusion of hate speech have been at the forefront of this work, including studies that take place on single platforms (Gallacher & Bright, 2021; He et al., 2021; Mathew et al., 2018) and those on multiple platforms (Gallacher, 2021; Velásquez et al., 2021), with the latter including studies of what happens to users’ hate speech on one platform when they are banned from another one (Ali et al., 2021; Mitts, 2021). Other studies have focused on more specific topics, such as the amount of hate speech produced by bots as opposed to humans (Albadi et al., 2019), whether there are serial producers of hate in Italy (Cinelli et al., 2021), or hate speech targeted at elected officials and politicians (Greenwood et al., 2019; Rheault et al., 2019; Theocharis et al., 2020).
A second line of research has involved attempting to ascertain both the causes and effects of hate speech and in particular the relationship between offline violence, including hate crimes, and online hate speech. For example, a number of papers have examined the rise in online anti-Muslim hate speech on Twitter and Reddit following terrorist attacks in Paris (Fischer-Preßler et al., 2019; Olteanu et al., 2018) and Berlin (Kaakinen et al., 2018). Conversely, other studies have examined the relationship between hate speech on social media and hate crimes (Müller & Schwarz, 2021; Williams et al., 2020). Other work examines the relationship between political developments and the rise of hate speech, such as the arrival of a boat of refugees in Spain (Arcila-Calderón et al., 2021). Closely related are studies, primarily of an experimental nature, that attempt to measure the impact of being exposed to incivility (Kosmidis & Theocharis, 2020) or hate speech on outcomes such as prejudice (Soral et al., 2018) or fear (Oksanen et al., 2020).
A third line of research has focused on attempts to not just detect but also to counter hate speech online. The main approach here has been field experiments, where researchers detect users of hate speech on Twitter, use “sock puppet” accounts to deliver some sort of message designed to reduce the use of hate speech using an experimental research design, and then monitor users’ future behaviour. Stimuli tested have involved varying the popularity, race, and partisanship of the account delivering the message (Munger, 2017, 2021), embedding the exhortation in religious (Islamic) references (Siegel & Badaan, 2020), and threats of suspension from the platform (Yildirim et al., 2021). Researchers have also employed survey experiments to measure the impact of counter-hate speech (Sim et al., 2020) as well as observational studies, such as Garland et al. (2022)’s study of 180,000 conversations on German political Twitter.
Computational Social Science sits squarely at the root of all of this research, as any study that involves detecting hate speech at scale needs to rely on automated methods.Footnote 7 There are essentially two different research strategies employed by researchers. The first is to utilize dictionary methods – identifying hateful words that are either available in existing databases or identified by the researchers conducting the study and then collecting posts that contain those particular terms (Arcila-Calderón et al., 2021; Greenwood et al., 2019; Mathew et al., 2018; Mitts, 2021; Olteanu et al., 2018).
The second option is to rely on supervised machine learning. As with the study of opinions and sentiment generally, we can see a wide range of supervised ML methods employed, including pre-trained language models based on the BERT architecture (Cinelli et al., 2021; Gallacher, 2021; Gallacher & Bright, 2021; He et al., 2021), SVM models (Rheault et al., 2019; Williams et al., 2019), random forest (Albadi et al., 2019), doc2vec (Garland et al., 2022), and logistic regression with L1 regularization (Theocharis et al., 2020). Siegel et al. (2021) combine dictionary methods with supervised machine learning to screen out false positives from the dictionary methods using a naive Bayes classifier and, signaling a potential warning for the dictionary methods, find that large numbers (in many cases approximately half) of the tweets identified by the dictionary methods are removed by the supervised machine learning approach as false positives.
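A minimal sketch of this two-stage design – a recall-oriented dictionary pass followed by a classifier that screens out false positives – might look like the following. The tiny naive Bayes implementation, the placeholder lexicon term, and the toy training data are all illustrative assumptions, not drawn from Siegel et al. (2021):

```python
import math
from collections import Counter

class TinyNB:
    """Minimal multinomial naive Bayes with Laplace smoothing."""
    def fit(self, docs, labels):
        self.classes = set(labels)
        self.prior = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        self.totals = Counter()
        self.vocab = set()
        for doc, y in zip(docs, labels):
            for w in doc.lower().split():
                self.word_counts[y][w] += 1
                self.totals[y] += 1
                self.vocab.add(w)
        return self

    def predict(self, doc):
        best, best_lp = None, -math.inf
        n, v = sum(self.prior.values()), len(self.vocab)
        for c in self.classes:
            lp = math.log(self.prior[c] / n)
            for w in doc.lower().split():
                lp += math.log((self.word_counts[c][w] + 1) / (self.totals[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

def dictionary_then_classifier(posts, lexicon, clf):
    """Stage 1: keep posts containing any lexicon term (high recall).
    Stage 2: keep only those the classifier labels as hateful (precision)."""
    candidates = [p for p in posts if lexicon & set(p.lower().split())]
    return [p for p in candidates if clf.predict(p) == "hate"]
```

The point of the second stage is visible even at this toy scale: a post that merely mentions a lexicon term in a benign context survives the dictionary pass but is discarded by the classifier.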
Unsupervised machine learning is less prevalent in this research – other than for identifying subtopics in a general area in which to look for the relative prevalence of hate speech (e.g. Arcila-Calderón et al. 2021 (refugees), Velásquez et al. 2021 (COVID-19), Fischer-Preßler et al. 2019 (terrorist attacks)) – although Rasmussen et al. (2021) propose what they call a “super-unsupervised” method for hate speech detection that relies on word embeddings and does not require human-coded training data.
One important development of note is that in recent years it has become more and more common to find studies of hate speech involving languages other than English, including Spanish (Arcila-Calderón et al., 2021), Italian (Cinelli et al., 2021), German (Garland et al., 2022), and Arabic (Albadi et al., 2019; Siegel & Badaan, 2020). Other important Computational Social Science innovations in the field include matching accounts across multiple platforms to observe how the same people behave on multiple platforms, including how content moderation actions on one platform can impact hate speech on another (Mitts, 2021), and network analyses of the spread of hateful content (Velásquez et al., 2021). Finally, it is important to remember that any form of identification of hate speech that relies on humans to classify speech as hateful or not is subject to whatever biases underlie human coding (Ross et al., 2017), which includes all supervised machine learning methods. One warning here can be found in Davidson et al. (2019), who demonstrate that a number of hate speech classifiers are more likely to classify tweets written in what the authors call “African-American English” as hate speech than tweets written in standard English.
4 Computational Social Science and Misinformation
In the past 6 years or so, we have witnessed a very significant increase in research related to misinformation online.Footnote 8 One can conceive of this field as attempting to answer six closely related questions, roughly in temporal order:
1. Who produces misinformation?
2. Who is exposed to misinformation?
3. Conditional on exposure, who believes misinformation?
4. Conditional on belief, is it possible to correct misinformation?
5. Conditional on exposure, who shares misinformation?
6. Through production and sharing, how much misinformation exists online/on platforms?
Computational Social Science can be used to shed light on any of these questions but is particularly important for questions 2, 5, and 6: who is exposed, who shares, and how much misinformation exists online?Footnote 9
To answer these questions, Computational Social Science is employed in one of two ways: to trace the spread of misinformation or to identify misinformation. The former is generally an easier task than the latter, and studies that employ Computational Social Science in this way generally follow a common pattern. First, a set of domains or news articles is identified as false. In the case of news articles, researchers generally turn to fact checking organizations such as Snopes or PolitiFact for lists of articles that have previously been identified as false (Allcott et al., 2019; Allcott & Gentzkow, 2017; Shao et al., 2018). Two points are worth noting here. First, this means that such studies are limited to countries in which fact checking organizations exist. Second, such studies are also limited to articles that fact checking organizations have chosen to check (which might be subject to their own organizational biases).Footnote 10 For news domains, researchers generally rely either on an outside organization that rates the quality of news domains, such as NewsGuard (Aslett et al., 2022), or on lists of suspect news sites published by journalists or other scholars (Grinberg et al., 2019; Guess et al., 2019). Scholars have also found other creative ways to find sources of suspect information, such as public pages on Facebook associated with conspiracy theories (Del Vicario et al., 2016) or videos that were removed from YouTube (Knuutila et al., 2020). Once the list of suspect domains or articles is identified, the Computational Social Science component of researching the spread comes from interacting with and/or scraping online information to track where these links are found. This can be as simple as querying an API, and as complicated as developing methods to track the spread of information.Footnote 11
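The domain-matching step of this pipeline is computationally straightforward; a sketch in Python, with an invented placeholder domain list standing in for ratings from organizations such as NewsGuard, might look like:

```python
from urllib.parse import urlparse

# Placeholder list; real studies use ratings from organizations such as
# NewsGuard or journalist-compiled lists of suspect sites.
LOW_QUALITY_DOMAINS = {"fakenews-example.com", "hoax-site-example.net"}

def extract_domain(url):
    """Normalize a URL to its bare domain (dropping a leading 'www.')."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc

def share_of_suspect_links(urls):
    """Fraction of shared links that point at flagged domains."""
    if not urls:
        return 0.0
    hits = sum(extract_domain(u) in LOW_QUALITY_DOMAINS for u in urls)
    return hits / len(urls)
```

Applied to the URLs extracted from a user's timeline, this kind of function yields the exposure and sharing rates that studies of this type report.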
The second – and primary – use of Computational Social Science techniques in the study of misinformation is the arguably more difficult task of using Computational Social Science to identify content as misinformation. As might be expected, using dictionary methods to do so is much more difficult than for tasks such as identifying hate speech or finding posts about a particular topic or issue. Accordingly, when we do see dictionary methods in the study of misinformation, they are generally employed in order to identify posts about a specific topic (e.g. Facebook ads related to a Spanish general election in Cano-Orón et al., 2021) that are then coded by hand; Gorwa (2017) and Oehmichen et al. (2019) follow similar procedures of hand labelling small numbers of posts/accounts as examples of misinformation in Poland and the United States, respectively.
Although still a very challenging computational task, recent research has begun to attempt to use machine learning to build supervised classifiers to identify misinformation on Twitter using SVMs (Bojjireddy et al., 2021), BERT embeddings (Micallef et al., 2020), and ensemble methods (Al-Rakhami & Al-Amri, 2020). Jagtap et al. (2021) comparatively test a variety of different supervised classifiers to identify misinformation in YouTube comments. Jachim et al. (2021) have built a tool based on unsupervised machine learning called “Troll Hunter” that, while not identifying misinformation per se, can be used to surface narratives across multiple posts online that might form the basis of a disinformation campaign. Karduni et al. (2019) also incorporate images into their classifier.
Closely related, other studies have sought to harness network analysis to identify misinformation online. For example, working with leaked documents that identify actors paid by the South Korean government, Keller et al. (2020) show how retweet and co-tweet networks can be used to identify possible purveyors of misinformation. Zhu et al. (2020) utilize a “heuristic greedy algorithm” to attempt to identify nodes in networks that, if removed, would greatly reduce the spread of misinformation. Sharma et al. (2021) train a network-based model on data from the Russian Internet Research Agency (IRA) troll datasets released by Twitter and use it to identify coordinated groups spreading anti-vaccination and anti-masks conspiracies.
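The intuition behind retweet and co-tweet networks can be sketched without any specialized network library: accounts that repeatedly share the same link at nearly the same time accumulate edge weight and can be flagged for closer inspection. The time window, threshold, and account names below are illustrative assumptions, not parameters from Keller et al. (2020):

```python
from collections import defaultdict
from itertools import combinations

def cotweet_edges(posts, window_seconds=60):
    """posts: list of (account, url, unix_timestamp).
    Returns {frozenset({a, b}): count} of near-simultaneous same-URL shares."""
    by_url = defaultdict(list)
    for account, url, ts in posts:
        by_url[url].append((account, ts))
    edges = defaultdict(int)
    for shares in by_url.values():
        for (a1, t1), (a2, t2) in combinations(shares, 2):
            if a1 != a2 and abs(t1 - t2) <= window_seconds:
                edges[frozenset((a1, a2))] += 1
    return dict(edges)

def flag_coordinated(edges, min_cotweets=2):
    """Accounts appearing in at least one sufficiently heavy edge."""
    return {acct for pair, n in edges.items()
            if n >= min_cotweets for acct in pair}
```

An ordinary user who shares a popular link hours later accrues no edge weight, while a pair of accounts posting the same links seconds apart quickly crosses the threshold.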
A different use of machine learning to identify misinformation – in this case, false news articles – can be found in Godel et al. (2021). Here we assess the possibility of crowdsourcing the fact checking of news articles by testing a wide range of rules for how decisions could be made by crowds. Compared with intuitively simple rules such as “take the mode of the crowd”, we find that machine learning methods that draw upon a richer set of features – in particular when analysed using convolutional neural nets – far outperform simple aggregation rules in matching the judgment of the crowd to the assessment of a set of professional fact checkers.
Given the scale at which misinformation spreads, it is clear that any content moderation policy related to misinformation will need to rely on machine learning to at least some extent. From this vantage point, the progress the field has made in recent years must be seen as encouraging; still, important challenges remain. First, the necessary data to train models is not always available, either because platforms do not make it available to researchers due to privacy or commercial concerns or because it has, ironically, been deleted as part of the process of content moderation.Footnote 12 In some cases, platforms have released data of deleted accounts for scholarly research, but even here the method by which these accounts were identified generally remains a black box. Second, for any supervised learning method, the question of the robustness of a classifier designed to identify misinformation in one context to detect it in another context (different language, different country, different context even in the same country and language) remains paramount. While this is a problem for measuring sentiment on policy issues or hate speech as well, we have reason to suspect that the contextual nature of misinformation might make this even more challenging and suggests the potential value of unsupervised and/or network-based models. Third, so many of the methods to date rely on training classifiers based on news that has existed in the information ecosystem for extended periods of time, while the challenge for content moderation is to be able to identify misinformation in near real time before it spreads widely (Godel et al., 2021). Finally, false positives can have negative consequences as well, if the reaction to identifying misinformation is to suppress its spread. 
While reducing the spread of misinformation receives the most attention, it is important to remember that reducing true news in circulation is also costly, so future studies should try to explicitly address this trade-off, perhaps by attempting to assess the impact of methods of identifying misinformation for the overall makeup of the information ecosystem.
5 Computational Social Science and Coordinated Foreign Influence Operations
A third area in which Computational Social Science plays an important role in protecting democratic integrity is in the study of foreign influence operations. Here, I define foreign influence operations as coordinated attempts online by one state to influence the attitudes and behaviours of citizens of another state.Footnote 13 While foreign propaganda efforts of course precede the advent of the modern digital information age, the cost of mounting coordinated foreign influence operations has significantly dropped in the digital information era, especially due to the rise of social media platforms.Footnote 14
Research on coordinated foreign influence operations (hereafter CFIOs) can loosely be described as falling into one of two categories: attempts to describe what actually happened as part of previously identified CFIOs and attempts to develop methods to identify new CFIOs. Notably, the scholarly literature on the former is much larger (although one would guess that research on the latter is being conducted by social media platforms). Crucially, though, almost all of this literature is dependent on having a list of identified accounts and/or posts that are part of CFIOs – by definition if the goal is to describe what happened in a CFIO and for use as training data if the goal is to develop methods to identify CFIOs. Accordingly, the primary sources of data for the studies described in the remainder of this section are collections of posts from (or lists of accounts involved with) CFIOs released by social media platforms. After having turned over lists of CFIO accounts to the US government as part of congressional testimony, Twitter emerged as a leader in this regard; however, other platforms, including Reddit and Facebook, have made CFIO data available for external research as well.Footnote 15
By far the most studied subject of CFIOs is the activities of the Russian IRA in the United States (Bail et al., 2020; Bastos & Farkas, 2019) and in particular in the period of time surrounding the 2016 US presidential election (Arif et al., 2018; Boyd et al., 2018; DiResta et al., 2022; Golovchenko et al., 2020; Kim et al., 2018; Linvill & Warren, 2020; Lukito, 2020; Yin et al., 2018; Zannettou et al., 2020).
Studies of CFIOs in other countries include Russian influence attempts in Germany (Dawson & Innes, 2019), across 12 European countries (Innes et al., 2021), and in Syria (Metzger & Siegel, 2019), Libya, Sudan, Madagascar, the Central African Republic, and Mozambique (Grossman et al., 2019, 2020); Chinese influence attempts in the United Kingdom (Schliebs et al., 2021), Hong Kong and Taiwan (Wallis et al., 2020), and the US (Molter & DiResta, 2020); and Iranian influence attempts in the Middle East (Elswah et al., 2019).
The methods employed in these studies vary, but many involve a role for Computational Social Science. In Yin et al. (2018) and Golovchenko et al. (2020), we extract hyperlinks shared by Russian IRA trolls using a custom-built Computational Social Science tool; in the latter study, we also utilize methods described earlier in this review in the measuring public opinion section to automate the estimation of the ideological placement of the shared links. Zannettou et al. (2020) extract and analyse the images shared by Russian IRA accounts. Innes et al. (2021), Dawson and Innes (2019), and Arif et al. (2018) all rely on various forms of network analysis to track the spread of IRA content in Germany, Europe, and the United States, respectively. Two studies of Chinese influence operations use sentiment analysis – again, in a manner similar to the one described earlier in the measuring public opinion section – to measure whether influence operations are relying on positive or negative messages (Molter & DiResta, 2020; Wallis et al., 2020). In a similar vein, Boyd et al. (2018) use NLP tools to chart the stylistic evolution of Russian IRA posts over time. DiResta et al. (2022) and Metzger and Siegel (2019) use structural topic models to dig deeper into the topics discussed by Russian influence operations in the United States and tweets by Russian state media about Syria, respectively. Lukito (2020) employs a similar method to the one discussed earlier in the measuring public opinion section regarding whether elites or masses drive the discussion of political topics to argue that the Russian IRA was trying out topics on Reddit before purchasing ads on those subjects on Facebook. Other papers combine digital trace data from social media platforms such as Facebook ads (Kim et al., 2018) or exposure to IRA tweets (Bail et al., 2020; Eady et al., 2022) with survey data.
A number of studies rely on qualitative analyses based on human annotation of CFIO account activity (e.g. Innes et al. (2021) include a case study of Russian influence in Estonia to supplement a network-based study of Russian influence in 12 European countries; see also Bastos and Farkas, 2019; Dawson and Innes, 2019; DiResta et al., 2022; and Linvill and Warren, 2020), but even in these cases, Computational Social Science plays a role in allowing scholars to extract the relevant posts and accounts for analysis.
Much scarcer, though, are studies of the actual effects of exposure to CFIOs, a direction in which the literature should expand in the future. Two exceptions are Bail et al. (2020) and Eady et al. (2022), both of which combine panel survey data with data on exposure to tweets by Russian trolls that took place between waves of the panel survey.
A second strand of the Computational Social Science literature involves trying to use machine learning to identify CFIOs.Footnote 16 One approach has been to use the releases of posts from CFIOs by social media platforms as training data for supervised models to identify new CFIOs (or at least new CFIOs that are unknown to the models); both Alizadeh et al. (2020) and Marcellino et al. (2020) report promising findings using this approach. Innes et al. (2021) filter on keywords and then attempt to identify influence campaigns through network analysis; this approach has the advantage of not needing to use training data, although the ultimate findings will of course be a function of the original keyword search. Schliebs et al. (2021) use NLP techniques to look for common phrases or patterns across the posts from Chinese diplomats, thus suggesting evidence of a coordinated campaign. This method also does not require training data, but, unlike either of the previous approaches, does require identifying the potential actors involved in the CFIO as a precursor to the analysis.
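The supervised approach described above can be illustrated with a minimal sketch: treat posts released by the platforms as positive training examples and ordinary posts as negatives, then score new posts against the learned word distributions. This is an illustration only, with toy data and a simple stdlib Naive Bayes-style scorer; it is not the pipeline of Alizadeh et al. (2020) or Marcellino et al. (2020), which rely on much richer content-based features.

```python
import math
from collections import Counter

def train(positive, negative):
    """Build per-class word counts from tokenized training posts."""
    pos_counts = Counter(w for post in positive for w in post.lower().split())
    neg_counts = Counter(w for post in negative for w in post.lower().split())
    return pos_counts, neg_counts

def score(post, pos_counts, neg_counts):
    """Log-odds that a post resembles the CFIO class (Laplace smoothing)."""
    pos_total = sum(pos_counts.values())
    neg_total = sum(neg_counts.values())
    vocab = len(set(pos_counts) | set(neg_counts))
    log_odds = 0.0
    for w in post.lower().split():
        p = (pos_counts[w] + 1) / (pos_total + vocab)
        q = (neg_counts[w] + 1) / (neg_total + vocab)
        log_odds += math.log(p / q)
    return log_odds

# Toy data standing in for platform-released troll posts vs. ordinary posts.
trolls = ["crooked elites rigging the election", "share before they delete this"]
ordinary = ["great game last night", "new recipe for banana bread"]
pos_c, neg_c = train(trolls, ordinary)
print(score("elites rigging everything", pos_c, neg_c) > 0)
```

A positive log-odds score flags a post as resembling the released CFIO material; in practice, the quality of such a classifier hinges entirely on how representative the platform-released training data are of future campaigns.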
Taken together, these studies make clear that a great deal has been learned in a short period of time about how CFIOs operate in the modern digital era. That being said, a strikingly large proportion of recent research has focused on the activities of Russian CFIOs around the 2016 US elections; future research should continue to look at influence operations run by other countries with other targets.Footnote 17 There is also clearly a lot more work to be done in terms of understanding the impact of CFIOs, as well as in developing methods for identifying these campaigns. This latter point reflects a fundamental reality of the field, which is that its development has occurred largely because the platforms chose (or were compelled) to release data, and it is to this topic that I turn in some brief concluding remarks in the following section.
6 The Importance of External Data Access
Online hate, disinformation, and online coordinated influence operations all pose potential threats to the quality of democracy, to say nothing of the threats to people whose personal lives may be impacted by being attacked online or being exposed to dangerous misinformation. Computational Social Science – and in particular tools that facilitate working with large collections of (digital trace) data and innovations in machine learning – has important roles to play in helping society understand the nature of these threats, as well as potential mitigation strategies. Indeed, social scientists are getting better and better at incorporating the newest developments in machine learning (e.g. neural networks, pre-trained transformer models) into their research. Many of the results laid out in the previous sections are incredibly impressive, representing research that we could not even have conceived of doing a decade ago.
That being said, the field as a whole remains dependent on the availability of data. And here, social scientists find themselves in a different position than in years past. Previously, most quantitative social research was conducted either with administrative data (e.g. election results, unemployment data, test scores) or with data – usually survey or experimental – that we could collect ourselves. As Nathaniel Persily and I have noted in much greater detail elsewhere (Persily & Tucker, 2020a, b, 2021), we now find ourselves in a world where the data that we need to do our research on the kinds of topics surveyed in this handbook chapter are “owned” by a handful of very large private companies. Thus, the key to advancing our knowledge of all of the topics discussed in this review, as well as the continued development of related methods and tools, is a legal and regulatory framework that ensures that outside researchers who are not employees of the platforms, and who are committed to sharing the results of their research with both the mass public and policy makers, are able to continue to access the data necessary for this research.Footnote 18
Let me give just two examples. First, almost none of the work surveyed in the previous section on CFIOs would have been possible had Twitter not decided to release its collections of tweets produced by CFIOs after they were taken off the platform. Yes, it is fantastic that Twitter did (and has continued to) release these data, but we as a society do not want to be at the mercy of decisions by platforms to release data for matters as crucial as understanding whether foreign countries are interfering in democratic processes. And just because Twitter has chosen to do this in the past, it does not mean that it will continue to do so in the future. Second, even with all the data that Twitter releases publicly through its researcher API, external researchers still do not have access to impressions data (e.g. how many times tweets were seen and by whom). While some have come up with creative ways to estimate impressions, any research built around impressions currently involves unnecessary noise in its estimates; a decision by Twitter tomorrow could change this reality. For all of the topics in this review – hate speech, misinformation, foreign influence campaigns – impressions are crucially important pieces of the puzzle that we are currently missing.
As of the final editing of this essay, though, important steps are being taken on both sides of the Atlantic to try to address this question of data access for external academic researchers. In the United States, a number of bills have recently been introduced in the national legislature that include components aimed at making social media data available to external researchers for public-facing analysis.Footnote 19 While such bills are still a long way from being made into law, the fact that multiple lawmakers are taking the matter seriously is a positive step forward. Perhaps more importantly in terms of immediate impact, the European Union’s Digital Services Act (DSA) has provisions allowing data access to “vetted researchers” of key platforms, in order for researchers to evaluate how platforms work and how online risk evolves and to support transparency, accountability, and compliance with the new laws and regulations.Footnote 20
Computational Social Science has a huge role to play in helping us understand some of the most important challenges faced by democratic societies today. The scholarship that is being produced is incredibly inspiring, and the methodological leaps that are occurring in such short periods of time were perhaps previously unimaginable. But at the end of the day, the ultimate quality of the work we are able to do will depend on the data to which we have access. Thus data access needs to be a fundamental part of any forward-facing research plan for improving what Computational Social Science can teach us about threats to democracy.
Notes
- 1.
- 2.
- 3.
- 4.
More generally, if we can extract public opinion data from existing stores of social media data, we can retrospectively examine public opinion on any topic, which is of course impossible in traditional studies of public opinion via survey questionnaires, which are by definition limited to the questions asked in the past. Of course, social media data vary in the extent to which past data are available for retrospective analysis, but platforms where most posts are public (e.g. Twitter, Reddit) offer important opportunities in this regard.
- 5.
The one exception has been a few studies that attempt to use the discussion of issues as a way of teasing out who is leading the public conversation on important policy issues, elites or the mass public (Barberá et al., 2019; Gilardi et al., 2021, 2022). These studies, however, tend to measure attention to multiple topics and issues, but not opinions in regard to these issues.
- 6.
The Greek case involved a referendum, as opposed to a parliamentary election. For a meta-review of 74 related studies, see Skoric et al. (2020).
- 7.
Some studies of hate speech do avoid the need to identify hate speech at scale by using surveys and survey experiments (Kaakinen et al., 2018; Kunst et al., 2021; Oksanen et al., 2020; Sim et al., 2020; Soral et al., 2018) or by creating one’s own platform in which to observe participant behavior (Álvarez-Benjumea & Winter, 2018, 2020).
- 8.
- 9.
The questions of who believes misinformation and how to correct misinformation are of course crucially important but are generally addressed using survey methodology (Aslett et al., 2022). For a review of the literature on correcting misinformation, see Wittenberg and Berinsky (2020); for more recent research on the value of “accuracy nudges” and games designed to inoculate users against believing false news, see Pennycook et al. (2021) and Maertens et al. (2021), respectively.
- 10.
For an exception to this approach, however, see Godel et al. (2021), which relies on an automated method to select popular articles from five news streams (three of which are low-quality news streams) in real time and then send those articles to professional fact checkers for evaluation as part of the research pipeline.
- 11.
See, for example, https://informationtracer.com/, which is presented in Z. Chen et al. (2021).
- 12.
- 13.
- 14.
In a way, coordinated foreign influence operations that rely on disguised social media accounts – that is, accounts pretending to be actors that they are not – could be considered another form of misinformation, with the identity of the online actors here being the misinformation. It is important to note, though, that coordinated foreign influence operations are not used solely to spread misinformation. Foreign influence operations can, and do, rely on true information in addition to misinformation; indeed, Yin et al. (2018) found that Russian foreign influence accounts on Twitter were actually much more likely to share links to legitimate news sources – and in particular to local news sources – than they were to low-quality news sources.
- 15.
Two other potential sources of data include leaked data and data from actors that researchers can identify – or at least speculate – as being involved in foreign influence activities, such as Chinese ambassadors (Schliebs et al., 2021), the Facebook pages of Chinese state media (Molter & DiResta, 2020), or the Twitter accounts of Russian state media actors (Metzger & Siegel, 2019). While there have been a series of very interesting papers published based on leaked data to identify coordinated domestic propaganda efforts (Keller et al., 2020; King et al., 2017; Sobolev, 2019), I am not aware of any CFIO studies at this time based on leaked data.
- 16.
Note that there is also a much larger literature on detecting automated social media accounts or bots (Ferrara et al., 2016; Stukal et al., 2017), which is beyond the scope of this review. Bots come up frequently in discussions of CFIOs, as bots can be a useful vehicle for such campaigns. Suffice it to say, Computational Social Science methods play a very important role in the detection of bots.
- 17.
This picture looks a lot more troubling if one takes out the numerous excellent reports produced by the Stanford Internet Observatory on CFIOs targeting Africa and the Middle East.
- 18.
Of course, issues surrounding data access raise very important questions in terms of obligations to users of social media to both protect their privacy and to make sure their voices are heard. The myriad of trade-offs in this regard are far beyond the purview of this chapter, but I invite interested readers to see the discussion of trade-offs between data privacy and data access for public-facing research to inform public policy in Persily and Tucker (2020b, pp. 321–324), the chapter by Taylor (2023) in the present handbook, as well as the proposal for a “Researcher Code of Conduct” – as laid out in Article 40 of the General Data Protection Regulation (GDPR) – by the European Digital Media Observatory multi-stakeholder Working Group on Platform-to-Researcher Data Access: https://edmo.eu/wp-content/uploads/2022/02/Report-of-the-European-Digital-Media-Observatorys-Working-Group-on-Platform-to-Researcher-Data-Access-2022.pdf.
- 19.
See, for example, https://www.coons.senate.gov/news/press-releases/coons-portman-klobuchar-announce-legislation-to-ensure-transparency-at-social-media-platforms; https://www.bennet.senate.gov/public/index.cfm/2022/5/bennet-introduces-landmark-legislation-to-establish-federal-commission-to-oversee-digital-platforms; and https://trahan.house.gov/news/documentsingle.aspx?DocumentID=2112.
- 20.
The Act defines “vetted researchers” as individuals “with an affiliation with an academic institution, independence from commercial interests, proven subject or methodological expertise, and the ability to comply with data security and confidentiality requirements” (Nonnecke & Carlton, 2022). The Act requires platforms to make three categories of data available via online databases or APIs: data needed to assess systemic risks (dissemination of illegal content, impacts on fundamental rights, coordinated manipulation of the platform’s services), “data on the accuracy, functioning, and testing of algorithmic systems for content moderation, recommender systems or advertising systems, and data on processes and outputs of content moderation or internal complaint-handling systems” (Nonnecke & Carlton, 2022). Moreover, VLOPs (very large online platforms) are required, by Article 63, to create a public digital ad repository with information on ad content, those behind the ads, whether an ad was targeted, the parameters used for targeting, and the number of recipients (Nonnecke & Carlton, 2022). Member states will be required to designate independent “Digital Service Coordinators”, who will supervise compliance with the new rules on their territory (https://ec.europa.eu/commission/presscorner/detail/en/QANDA_20_2348). The EU Parliament, Council, and Commission reached a compromise regarding the text of the DSA on April 23, 2022. The final text is expected to be confirmed soon; once formally approved, it will apply after 15 months or from January 1, 2024 (https://ec.europa.eu/commission/presscorner/detail/en/QANDA_20_2348). See as well the discussion of data altruism, and the possibility of donating data for research, in https://www.consilium.europa.eu/en/press/press-releases/2022/05/16/le-conseil-approuve-l-acte-sur-la-gouvernance-des-donnees/.
References
Ackland, R. (2013). Web social science: Concepts, data and tools for social scientists in the digital age. Sage. https://doi.org/10.4135/9781446270011
Adams-Cohen, N. J. (2020). Policy change and public opinion: Measuring shifting political sentiment with social media data. American Politics Research, 48(5), 612–621. https://doi.org/10.1177/1532673X20920263
Albadi, N., Kurdi, M., & Mishra, S. (2019). Hateful people or hateful bots? Detection and characterization of bots spreading religious hatred in Arabic social media. Retrieved from https://doi.org/10.48550/ARXIV.1908.00153
Ali, S., Saeed, M. H., Aldreabi, E., Blackburn, J., De Cristofaro, E., Zannettou, S., & Stringhini, G. (2021). Understanding the effect of deplatforming on social networks. In 13th ACM web science conference 2021 (pp. 187–195). Retrieved from https://doi.org/10.1145/3447535.3462637
Alizadeh, M., Shapiro, J. N., Buntain, C., & Tucker, J. A. (2020). Content-based features predict social media influence operations. Science Advances, 6(30), eabb5824. https://doi.org/10.1126/sciadv.abb5824
Allcott, H., & Gentzkow, M. (2017). Social media and fake news in the 2016 election. Journal of Economic Perspectives, 31(2), 211–236. https://doi.org/10.1257/jep.31.2.211
Allcott, H., Gentzkow, M., & Yu, C. (2019). Trends in the diffusion of misinformation on social media. Research & Politics, 6(2), 205316801984855. https://doi.org/10.1177/2053168019848554
Al-Rakhami, M. S., & Al-Amri, A. M. (2020). Lies kill, facts save: Detecting COVID-19 misinformation in twitter. IEEE Access, 8, 155961–155970. https://doi.org/10.1109/ACCESS.2020.3019600
Álvarez-Benjumea, A., & Winter, F. (2018). Normative change and culture of hate: An experiment in online environments. European Sociological Review, 34(3), 223–237. https://doi.org/10.1093/esr/jcy005
Álvarez-Benjumea, A., & Winter, F. (2020). The breakdown of antiracist norms: A natural experiment on hate speech after terrorist attacks. Proceedings of the National Academy of Sciences, 117(37), 22800–22804. https://doi.org/10.1073/pnas.2007977117
Arcila-Calderón, C., Blanco-Herrero, D., Frías-Vázquez, M., & Seoane-Pérez, F. (2021). Refugees welcome? Online hate speech and sentiments in twitter in Spain during the reception of the boat Aquarius. Sustainability, 13(5), 2728. https://doi.org/10.3390/su13052728
Arif, A., Stewart, L. G., & Starbird, K. (2018). Acting the part: Examining information operations within #BlackLivesMatter discourse. Proceedings of the ACM on Human-Computer Interaction, 2, 1–27. https://doi.org/10.1145/3274289
Aslett, K., Guess, A. M., Bonneau, R., Nagler, J., & Tucker, J. A. (2022). News credibility labels have limited average effects on news diet quality and fail to reduce misperceptions. Science Advances, 8(18), eabl3844. https://doi.org/10.1126/sciadv.abl3844
Bail, C. A., Guay, B., Maloney, E., Combs, A., Hillygus, D. S., Merhout, F., Freelon, D., & Volfovsky, A. (2020). Assessing the Russian Internet Research Agency’s impact on the political attitudes and behaviors of American Twitter users in late 2017. Proceedings of the National Academy of Sciences, 117(1), 243–250. https://doi.org/10.1073/pnas.1906420116
Bansal, B., & Srivastava, S. (2019). Lexicon-based Twitter sentiment analysis for vote share prediction using emoji and N-gram features. International Journal of Web Based Communities, 15(1), 85. https://doi.org/10.1504/IJWBC.2019.098693
Barberá, P. (2015). Birds of the same feather tweet together: Bayesian ideal point estimation using Twitter data. Political Analysis, 23(1), 76–91. https://doi.org/10.1093/pan/mpu011
Barberá, P., & Steinert-Threlkeld, Z. C. (2020). How to use social media data for political science research. In I. L. Curini & R. Franzese (Eds.), The Sage handbook of research methods in political science and international relations (pp. 404–423). SAGE Publications Ltd. https://doi.org/10.4135/9781526486387.n26
Barberá, P., Jost, J. T., Nagler, J., Tucker, J. A., & Bonneau, R. (2015). Tweeting from left to right: Is online political communication more than an echo chamber? Psychological Science, 26(10), 1531–1542. https://doi.org/10.1177/0956797615594620
Barberá, P., Casas, A., Nagler, J., Egan, P. J., Bonneau, R., Jost, J. T., & Tucker, J. A. (2019). Who leads? Who follows? Measuring issue attention and agenda setting by legislators and the mass public using social media data. American Political Science Review, 113(4), 883–901. https://doi.org/10.1017/S0003055419000352
Barisione, M., & Ceron, A. (2017). A digital movement of opinion? Contesting austerity through social media. In M. Barisione & A. Michailidou (Eds.), Social media and European politics (pp. 77–104). Palgrave Macmillan UK. https://doi.org/10.1057/978-1-137-59890-5_4
Barisione, M., Michailidou, A., & Airoldi, M. (2019). Understanding a digital movement of opinion: The case of #RefugeesWelcome. Information, Communication & Society, 22(8), 1145–1164. https://doi.org/10.1080/1369118X.2017.1410204
Bastos, M., & Farkas, J. (2019). “Donald Trump is my president!”: The internet research agency propaganda machine. Social Media + Society, 5(3), 205630511986546. https://doi.org/10.1177/2056305119865466
Beauchamp, N. (2017). Predicting and interpolating state-level polls using Twitter textual data. American Journal of Political Science, 61(2), 490–503.
Bertoni, E., Fontana, M., Gabrielli, L., Signorelli, S., & Vespe, M. (Eds). (2022). Mapping the demand side of computational social science for policy. EUR 31017 EN, Luxembourg, Publication Office of the European Union. ISBN 978-92-76-49358-7, https://doi.org/10.2760/901622
Bojjireddy, S., Chun, S. A., & Geller, J. (2021). Machine learning approach to detect fake news, misinformation in COVID-19 pandemic. In DG.O2021: The 22nd Annual International Conference on Digital Government Research (pp. 575–578). https://doi.org/10.1145/3463677.3463762
Born, K., & Edgington, N. (2017). Analysis of philanthropic opportunities to mitigate the disinformation/propaganda problem.
Boyd, R. L., Spangher, A., Fourney, A., Nushi, B., Ranade, G., Pennebaker, J., & Horvitz, E. (2018). Characterizing the Internet research Agency’s social media operations during the 2016 U.S. presidential election using linguistic analyses [preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/ajh2q
Bradshaw, S., Bailey, H., & Howard, P. (2021). Industrialized disinformation: 2020 global inventory of organized social media manipulation. Computational Propaganda Research Project.
Cano-Orón, L., Calvo, D., López García, G., & Baviera, T. (2021). Disinformation in Facebook ads in the 2019 Spanish General Election Campaigns. Media and Communication, 9(1), 217–228. https://doi.org/10.17645/mac.v9i1.3335
Chen, Z., Aslett, K., Reynolds, J., Freire, J., Nagler, J., Tucker, J. A., & Bonneau, R. (2021a). An automatic framework to continuously monitor multi-platform information spread.
Chen, T. H. Y., Salloum, A., Gronow, A., Ylä-Anttila, T., & Kivelä, M. (2021b). Polarization of climate politics results from partisan sorting: Evidence from Finnish Twittersphere. Global Environmental Change, 71, 102348. https://doi.org/10.1016/j.gloenvcha.2021.102348
Cinelli, M., Pelicon, A., Mozetič, I., Quattrociocchi, W., Novak, P. K., & Zollo, F. (2021). Online hate: Behavioural dynamics and relationship with misinformation. https://doi.org/10.48550/ARXIV.2105.14005
Dai, Y., Li, Y., Cheng, C.-Y., Zhao, H., & Meng, T. (2021). Government-led or public-led? Chinese policy agenda setting during the COVID-19 pandemic. Journal of Comparative Policy Analysis: Research and Practice, 23(2), 157–175. https://doi.org/10.1080/13876988.2021.1878887
Darwish, K., Stefanov, P., Aupetit, M., & Nakov, P. (2019). Unsupervised user stance detection on Twitter. Retrieved from https://doi.org/10.48550/ARXIV.1904.02000.
Davidson, T., Bhattacharya, D., & Weber, I. (2019). Racial bias in hate speech and abusive language detection datasets. Proceedings of the Third Workshop on Abusive Language Online, 2019, 25–35. https://doi.org/10.18653/v1/W19-3504
Dawson, A., & Innes, M. (2019). How Russia’s internet research agency built its disinformation campaign. The Political Quarterly, 90(2), 245–256. https://doi.org/10.1111/1467-923X.12690
Del Vicario, M., Bessi, A., Zollo, F., Petroni, F., Scala, A., Caldarelli, G., Stanley, H. E., & Quattrociocchi, W. (2016). The spreading of misinformation online. Proceedings of the National Academy of Sciences, 113(3), 554–559. https://doi.org/10.1073/pnas.1517441113
DiResta, R., Grossman, S., & Siegel, A. (2022). In-house vs. outsourced trolls: How digital mercenaries shape state influence strategies. Political Communication, 39(2), 222–253. https://doi.org/10.1080/10584609.2021.1994065
Eady, G., Paskhalis, T., Zilinsky, J., Stukal, D., Bonneau, R., Nagler, J., & Tucker, J. A. (2022). Exposure to the Russian foreign influence campaign on Twitter in the 2016 US election and its relationship to political attitudes and voting behavior.
Elswah, M., Howard, P., & Narayanan, V. (2019). Iranian digital interference in the Arab World. Data memo. Project on Computational Propaganda.
Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots. Communications of the ACM, 59(7), 96–104. https://doi.org/10.1145/2818717
Fischer-Preßler, D., Schwemmer, C., & Fischbach, K. (2019). Collective sense-making in times of crisis: Connecting terror management theory with Twitter user reactions to the Berlin terrorist attack. Computers in Human Behavior, 100, 138–151. https://doi.org/10.1016/j.chb.2019.05.012
Freire-Vidal, Y., & Graells-Garrido, E. (2019). Characterization of local attitudes toward immigration using social media. Retrieved from https://doi.org/10.48550/ARXIV.1903.05072
Gallacher, J. D. (2021). Leveraging cross-platform data to improve automated hate speech detection. Retrieved from https://doi.org/10.48550/ARXIV.2102.04895
Gallacher, J. D., & Bright, J. (2021). Hate contagion: Measuring the spread and trajectory of hate on social media [preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/b9qhd
Garland, J., Ghazi-Zahedi, K., Young, J.-G., Hébert-Dufresne, L., & Galesic, M. (2022). Impact and dynamics of hate and counter speech online. EPJ Data Science, 11(1), 3. https://doi.org/10.1140/epjds/s13688-021-00314-6
Georgiadou, E., Angelopoulos, S., & Drake, H. (2020). Big data analytics and international negotiations: Sentiment analysis of Brexit negotiating outcomes. International Journal of Information Management, 51, 102048. https://doi.org/10.1016/j.ijinfomgt.2019.102048
Gilardi, F., Gessler, T., Kubli, M., & Müller, S. (2021). Social media and policy responses to the COVID-19 pandemic in Switzerland. Swiss Political Science Review, 27(2), 243–256. https://doi.org/10.1111/spsr.12458
Gilardi, F., Gessler, T., Kubli, M., & Müller, S. (2022). Social media and political agenda setting. Political Communication, 39(1), 39–60. https://doi.org/10.1080/10584609.2021.1910390
Godel, W., Sanderson, Z., Aslett, K., Nagler, J., Bonneau, R., Persily, N., & Tucker, J. A. (2021). Moderating with the mob: Evaluating the efficacy of real-time crowdsourced fact-checking. Journal of Online Trust and Safety, 1(1). https://doi.org/10.54501/jots.v1i1.15
Golovchenko, Y., Buntain, C., Eady, G., Brown, M. A., & Tucker, J. A. (2020). Cross-platform state propaganda: Russian trolls on twitter and YouTube during the 2016 U.S. Presidential Election. The International Journal of Press/Politics, 25(3), 357–389. https://doi.org/10.1177/1940161220912682
Gorwa, R. (2017). Computational propaganda in Poland: False amplifiers and the digital public sphere (2017.4; Computational Propaganda Research Project). University of Oxford.
Greenwood, M. A., Bakir, M. E., Gorrell, G., Song, X., Roberts, I., & Bontcheva, K. (2019). Online abuse of UK MPs from 2015 to 2019: Working paper. Retrieved from https://doi.org/10.48550/ARXIV.1904.11230
Grimaldi, D., Cely, J. D., & Arboleda, H. (2020). Inferring the votes in a new political landscape: The case of the 2019 Spanish Presidential elections. Journal of Big Data, 7(1), 58. https://doi.org/10.1186/s40537-020-00334-5
Grinberg, N., Joseph, K., Friedland, L., Swire-Thompson, B., & Lazer, D. (2019). Fake news on Twitter during the 2016 U.S. Presidential election. Science, 363(6425), 374–378. https://doi.org/10.1126/science.aau2706
Grossman, S., Bush, D., & DiResta, R. (2019). Evidence of Russia-linked influence operations in Africa. Technical Report Stanford Internet Observatory.
Grossman, S., Ramali, K., DiResta, R., Beissner, L., Bradshaw, S., Healzer, W., & Hubert, I. (2020). Stoking conflict by keystroke: An operation run by IRA-linked individuals targeting Libya, Sudan, and Syria [Technical report]. Stanford Internet Observatory.
Guess, A. M., & Lyons, B. A. (2020). Misinformation, disinformation, and online propaganda. In J. A. Tucker & N. Persily (Eds.), Social media and democracy: The state of the field, prospects for reform (pp. 10–33). Cambridge University Press. Retrieved from https://www.cambridge.org/core/books/social-media-and-democracy/misinformation-disinformation-and-online-propaganda/D14406A631AA181839ED896916598500
Guess, A. M., Nagler, J., & Tucker, J. A. (2019). Less than you think: Prevalence and predictors of fake news dissemination on Facebook. Science Advances, 5(1), eaau4586. https://doi.org/10.1126/sciadv.aau4586
He, B., Ziems, C., Soni, S., Ramakrishnan, N., Yang, D., & Kumar, S. (2021). Racism is a virus: Anti-Asian hate and counterspeech in social media during the COVID-19 crisis. ArXiv:2005.12423 [Physics]. Retrieved from http://arxiv.org/abs/2005.12423
Innes, M., Innes, H., Roberts, C., Harmston, D., & Grinnell, D. (2021). The normalisation and domestication of digital disinformation: On the alignment and consequences of far-right and Russian state (dis)information operations and campaigns in Europe. Journal of Cyber Policy, 6(1), 31–49. https://doi.org/10.1080/23738871.2021.1937252
Isani, M. A. (2021). Methodological problems of using Arabic-language Twitter as a gauge for Arab attitudes toward politics and society. Contemporary Review of the Middle East, 8(1), 22–35. https://doi.org/10.1177/2347798920976283
Jachim, P., Sharevski, F., & Pieroni, E. (2021). TrollHunter2020: Real-time detection of trolling narratives on Twitter during the 2020 U.S. elections. In Proceedings of the 2021 ACM Workshop on Security and Privacy Analytics (pp. 55–65). https://doi.org/10.1145/3445970.3451158.
Jagtap, R., Kumar, A., Goel, R., Sharma, S., Sharma, R., & George, C. P. (2021). Misinformation detection on YouTube using video captions. ArXiv:2107.00941 [Cs]. Retrieved from http://arxiv.org/abs/2107.00941
Kaakinen, M., Oksanen, A., & Räsänen, P. (2018). Did the risk of exposure to online hate increase after the November 2015 Paris attacks? A group relations approach. Computers in Human Behavior, 78, 90–97. https://doi.org/10.1016/j.chb.2017.09.022
Karduni, A., Cho, I., Wesslen, R., Santhanam, S., Volkova, S., Arendt, D. L., Shaikh, S., & Dou, W. (2019). Vulnerable to misinformation?: Verifi! In Proceedings of the 24th International Conference on Intelligent User Interfaces (pp. 312–323). https://doi.org/10.1145/3301275.3302320.
Keller, F. B., Schoch, D., Stier, S., & Yang, J. (2020). Political astroturfing on twitter: How to coordinate a disinformation campaign. Political Communication, 37(2), 256–280. https://doi.org/10.1080/10584609.2019.1661888
Kim, Y. M., Hsu, J., Neiman, D., Kou, C., Bankston, L., Kim, S. Y., Heinrich, R., Baragwanath, R., & Raskutti, G. (2018). The stealth media? Groups and targets behind divisive issue campaigns on Facebook. Political Communication, 35(4), 515–541. https://doi.org/10.1080/10584609.2018.1476425
King, G., Pan, J., & Roberts, M. E. (2017). How the Chinese government fabricates social media posts for strategic distraction, not engaged argument. American Political Science Review, 111(3), 484–501. https://doi.org/10.1017/S0003055417000144
Klašnja, M., Barberá, P., Beauchamp, N., Nagler, J., & Tucker, J. A. (2017). Measuring public opinion with social media data (Vol. 1). Oxford University Press. https://doi.org/10.1093/oxfordhb/9780190213299.013.3
Knuutila, A., Herasimenka, A., Au, H., Bright, J., Nielsen, R., & Howard, P. N. (2020). COVID-related misinformation on YouTube: The spread of misinformation videos on social media and the effectiveness of platform policies. COMPROP Data Memo, 6.
Kosmidis, S., & Theocharis, Y. (2020). Can social media incivility induce enthusiasm? Public Opinion Quarterly, 84(S1), 284–308. https://doi.org/10.1093/poq/nfaa014
Kunst, M., Porten-Cheé, P., Emmer, M., & Eilders, C. (2021). Do “good citizens” fight hate speech online? Effects of solidarity citizenship norms on user responses to hate comments. Journal of Information Technology & Politics, 18(3), 258–273. https://doi.org/10.1080/19331681.2020.1871149
Linvill, D. L., & Warren, P. L. (2020). Troll factories: Manufacturing specialized disinformation on Twitter. Political Communication, 37(4), 447–467. https://doi.org/10.1080/10584609.2020.1718257
Lu, Y., Pan, J., & Xu, Y. (2021). Public sentiment on Chinese social media during the emergence of COVID-19. Journal of Quantitative Description: Digital Media, 1. https://doi.org/10.51685/jqd.2021.013
Lukito, J. (2020). Coordinating a multi-platform disinformation campaign: Internet Research Agency activity on three U.S. social media platforms, 2015 to 2017. Political Communication, 37(2), 238–255. https://doi.org/10.1080/10584609.2019.1661889
Maertens, R., Roozenbeek, J., Basol, M., & van der Linden, S. (2021). Long-term effectiveness of inoculation against misinformation: Three longitudinal experiments. Journal of Experimental Psychology: Applied, 27(1), 1–16. https://doi.org/10.1037/xap0000315
Marcellino, W., Johnson, C., Posard, M., & Helmus, T. (2020). Foreign interference in the 2020 election: Tools for detecting online election interference. RAND Corporation. https://doi.org/10.7249/RRA704-2
Martin, D. A., & Shapiro, J. N. (2019). Trends in online foreign influence efforts. Princeton University.
Mathew, B., Dutt, R., Goyal, P., & Mukherjee, A. (2018). Spread of hate speech in online social media. arXiv. https://doi.org/10.48550/arXiv.1812.01693
Metzger, M. M., & Siegel, A. A. (2019). When state-sponsored media goes viral: Russia’s use of RT to shape global discourse on Syria. Working Paper.
Micallef, N., He, B., Kumar, S., Ahamad, M., & Memon, N. (2020). The role of the crowd in countering misinformation: A case study of the COVID-19 infodemic. arXiv. https://doi.org/10.48550/arXiv.2011.05773
Mitts, T. (2021). Banned: How deplatforming extremists mobilizes hate in the dark corners of the Internet.
Molter, V., & DiResta, R. (2020). Pandemics & propaganda: How Chinese state media creates and propagates CCP coronavirus narratives. Harvard Kennedy School Misinformation Review. https://doi.org/10.37016/mr-2020-025
Müller, K., & Schwarz, C. (2021). Fanning the flames of hate: Social media and hate crime. Journal of the European Economic Association, 19(4), 2131–2167. https://doi.org/10.1093/jeea/jvaa045
Munger, K. (2017). Tweetment effects on the tweeted: Experimentally reducing racist harassment. Political Behavior, 39(3), 629–649. https://doi.org/10.1007/s11109-016-9373-5
Munger, K. (2021). Don’t @ Me: Experimentally reducing partisan incivility on Twitter. Journal of Experimental Political Science, 8(2), 102–116. https://doi.org/10.1017/XPS.2020.14
Nonnecke, B., & Carlton, C. (2022). EU and US legislation seek to open up digital platform data. Science, 375(6581), 610–612. https://doi.org/10.1126/science.abl8537
O’Connor, S., Hanson, F., Currey, E., & Beattie, T. (2020). Cyber-enabled foreign interference in elections and referendums. Australian Strategic Policy Institute Canberra.
Oehmichen, A., Hua, K., Amador Diaz Lopez, J., Molina-Solana, M., Gomez-Romero, J., & Guo, Y. (2019). Not all lies are equal. A study into the engineering of political misinformation in the 2016 US Presidential Election. IEEE Access, 7, 126305–126314. https://doi.org/10.1109/ACCESS.2019.2938389
Oh, G., Zhang, Y., & Greenleaf, R. G. (2021). Measuring geographic sentiment toward police using social media data. American Journal of Criminal Justice. https://doi.org/10.1007/s12103-021-09614-z
Oksanen, A., Kaakinen, M., Minkkinen, J., Räsänen, P., Enjolras, B., & Steen-Johnsen, K. (2020). Perceived societal fear and cyberhate after the November 2015 Paris terrorist attacks. Terrorism and Political Violence, 32(5), 1047–1066. https://doi.org/10.1080/09546553.2018.1442329
Olteanu, A., Castillo, C., Boy, J., & Varshney, K. R. (2018). The effect of extremist violence on hateful speech online. arXiv. https://doi.org/10.48550/arXiv.1804.05704
Pennycook, G., Epstein, Z., Mosleh, M., Arechar, A. A., Eckles, D., & Rand, D. G. (2021). Shifting attention to accuracy can reduce misinformation online. Nature, 592(7855), 590–595. https://doi.org/10.1038/s41586-021-03344-2
Persily, N., & Tucker, J. A. (2020a). Conclusion: The challenges and opportunities for social media research. In J. A. Tucker & N. Persily (Eds.), Social media and democracy: The state of the field, prospects for reform (pp. 313–331). Cambridge University Press. Retrieved from https://www.cambridge.org/core/books/social-media-and-democracy/conclusion-the-challenges-and-opportunities-for-social-media-research/232F88C00A1694FA25110A318E9CF300
Persily, N., & Tucker, J. A. (Eds.). (2020b). Social media and democracy: The state of the field, prospects for reform (1st ed.). Cambridge University Press. https://doi.org/10.1017/9781108890960
Persily, N., & Tucker, J. A. (2021). How to fix social media? Start with independent research. (Brookings Series on The Economics and Regulation of Artificial Intelligence and Emerging Technologies). Brookings Institution. Retrieved from https://www.brookings.edu/research/how-to-fix-social-media-start-with-independent-research/
Rasmussen, S. H. R., Bor, A., Osmundsen, M., & Petersen, M. B. (2021). Super-unsupervised text classification for labeling online political hate [Preprint]. PsyArXiv. https://doi.org/10.31234/osf.io/8m5dc
Rheault, L., Rayment, E., & Musulan, A. (2019). Politicians in the line of fire: Incivility and the treatment of women on social media. Research & Politics, 6(1), 205316801881622. https://doi.org/10.1177/2053168018816228
Ross, B., Rist, M., Carbonell, G., Cabrera, B., Kurowsky, N., & Wojatzki, M. (2017). Measuring the reliability of hate speech annotations: The case of the European refugee crisis. arXiv. https://doi.org/10.48550/arXiv.1701.08118
Sanovich, S., Stukal, D., & Tucker, J. A. (2018). Turning the virtual tables: Government strategies for addressing online opposition with an application to Russia. Comparative Politics, 50(3), 435–482. https://doi.org/10.5129/001041518822704890
Schliebs, M., Bailey, H., Bright, J., & Howard, P. N. (2021). China’s public diplomacy operations: Understanding engagement and inauthentic amplifications of PRC diplomats on Facebook and Twitter.
Shao, C., Ciampaglia, G. L., Varol, O., Yang, K.-C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature Communications, 9(1), 4787. https://doi.org/10.1038/s41467-018-06930-7
Sharma, K., Zhang, Y., Ferrara, E., & Liu, Y. (2021). Identifying coordinated accounts on social media through hidden influence and group behaviours. arXiv:2008.11308 [cs]. Retrieved from http://arxiv.org/abs/2008.11308
Siegel, A. A. (2020). Online Hate Speech. In J. A. Tucker & N. Persily (Eds.), Social media and democracy: The state of the field, prospects for reform (pp. 56–88). Cambridge University Press. Retrieved from https://www.cambridge.org/core/books/social-media-and-democracy/online-hate-speech/28D1CF2E6D81712A6F1409ED32808BF1
Siegel, A. A., & Badaan, V. (2020). #No2Sectarianism: Experimental approaches to reducing sectarian hate speech online. American Political Science Review, 114(3), 837–855. https://doi.org/10.1017/S0003055420000283
Siegel, A. A., & Tucker, J. A. (2018). The Islamic State’s information warfare: Measuring the success of ISIS’s online strategy. Journal of Language and Politics, 17(2), 258–280. https://doi.org/10.1075/jlp.17005.sie
Siegel, A. A., Nikitin, E., Barberá, P., Sterling, J., Pullen, B., Bonneau, R., Nagler, J., & Tucker, J. A. (2021). Trumping hate on Twitter? Online hate speech in the 2016 U.S. election campaign and its aftermath. Quarterly Journal of Political Science, 16(1), 71–104. https://doi.org/10.1561/100.00019045
Sim, J., Kim, J. Y., & Cho, D. (2020). Countering sexist hate speech on YouTube: The role of popularity and gender. Bright Internet Global Summit. Retrieved from http://brightinternet.org/wp-content/uploads/2020/11/Countering-Sexist-Hate-Speech-on-YouTube-The-Role-of-Popularity-and-Gender.pdf
Skoric, M. M., Liu, J., & Jaidka, K. (2020). Electoral and public opinion forecasts with social media data: A meta-analysis. Information, 11(4), 187. https://doi.org/10.3390/info11040187
Sobolev, A. (2019). How pro-government “trolls” influence online conversations in Russia.
Soral, W., Bilewicz, M., & Winiewski, M. (2018). Exposure to hate speech increases prejudice through desensitization. Aggressive Behavior, 44(2), 136–146. https://doi.org/10.1002/ab.21737
Stukal, D., Sanovich, S., Bonneau, R., & Tucker, J. A. (2017). Detecting bots on Russian political Twitter. Big Data, 5(4), 310–324. https://doi.org/10.1089/big.2017.0038
Taylor, L. (2023). Data justice, computational social science and policy. In Handbook of computational social science for policy. Springer.
Terechshenko, Z., Linder, F., Padmakumar, V., Liu, F., Nagler, J., Tucker, J. A., & Bonneau, R. (2020). A comparison of methods in political science text classification: Transfer learning language models for politics. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3724644
Theocharis, Y., Barberá, P., Fazekas, Z., & Popa, S. A. (2020). The dynamics of political incivility on Twitter. SAGE Open, 10(2), 215824402091944. https://doi.org/10.1177/2158244020919447
Tsakalidis, A., Aletras, N., Cristea, A. I., & Liakata, M. (2018). Nowcasting the stance of social media users in a sudden vote: The case of the Greek referendum. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management (pp. 367–376). https://doi.org/10.1145/3269206.3271783
Tucker, J. A., Theocharis, Y., Roberts, M. E., & Barberá, P. (2017). From liberation to turmoil: Social media and democracy. Journal of Democracy, 28(4), 46–59. https://doi.org/10.1353/jod.2017.0064
Tucker, J. A., Guess, A., Barbera, P., Vaccari, C., Siegel, A., Sanovich, S., Stukal, D., & Nyhan, B. (2018). Social media, political polarization, and political disinformation: A review of the scientific literature. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.3144139
Van Bavel, J. J., Harris, E. A., Pärnamets, P., Rathje, S., Doell, K. C., & Tucker, J. A. (2021). Political psychology in the digital (mis)information age: A model of news belief and sharing. Social Issues and Policy Review, 15(1), 84–113. https://doi.org/10.1111/sipr.12077
Velásquez, N., Leahy, R., Restrepo, N. J., Lupu, Y., Sear, R., Gabriel, N., Jha, O. K., Goldberg, B., & Johnson, N. F. (2021). Online hate network spreads malicious COVID-19 content outside the control of individual social media platforms. Scientific Reports, 11(1), 11549. https://doi.org/10.1038/s41598-021-89467-y
Vepsäläinen, T., Li, H., & Suomi, R. (2017). Facebook likes and public opinion: Predicting the 2015 Finnish parliamentary elections. Government Information Quarterly, 34(3), 524–532. https://doi.org/10.1016/j.giq.2017.05.004
Wallis, J., Uren, T., Thomas, E., Zhang, A., Hoffman, S., Li, L., Pascoe, A., & Cave, D. (2020). Retweeting through the great firewall.
Williams, M. L., Burnap, P., Javed, A., Liu, H., & Ozalp, S. (2020). Hate in the machine: Anti-Black and Anti-Muslim social media posts as predictors of offline racially and religiously aggravated crime. The British Journal of Criminology, 60(1), 93–117. https://doi.org/10.1093/bjc/azz049
Wittenberg, C., & Berinsky, A. J. (2020). Misinformation and its correction. In J. A. Tucker & N. Persily (Eds.), Social media and democracy: The state of the field, prospects for reform (pp. 163–198). Cambridge University Press. Retrieved from https://www.cambridge.org/core/books/social-media-and-democracy/misinformation-and-its-correction/61FA7FD743784A723BA234533012E810
Wood-Doughty, Z., Andrews, N., Marvin, R., & Dredze, M. (2018). Predicting Twitter user demographics from names alone. Proceedings of the Second Workshop on Computational Modeling of People’s Opinions, Personality, and Emotions in Social Media, 2018, 105–111. https://doi.org/10.18653/v1/W18-1114
Yang, K.-C., Hui, P.-M., & Menczer, F. (2020). How Twitter data sampling biases U.S. voter behavior characterizations. arXiv:2006.01447 [cs]. Retrieved from http://arxiv.org/abs/2006.01447
Yildirim, M. M., Nagler, J., Bonneau, R., & Tucker, J. A. (2021). Short of suspension: How suspension warnings can reduce hate speech on Twitter. Perspectives on Politics, 1–13. https://doi.org/10.1017/S1537592721002589
Yin, L., Roscher, F., Bonneau, R., Nagler, J., & Tucker, J. A. (2018). Your friendly neighborhood troll: The Internet Research Agency’s use of local and fake news in the 2016 US presidential campaign. SMaPP Data Report, Social Media and Political Participation Lab, New York University.
Zannettou, S., Caulfield, T., Bradlyn, B., De Cristofaro, E., Stringhini, G., & Blackburn, J. (2020). Characterizing the use of images in state-sponsored information warfare operations by Russian trolls on Twitter. Proceedings of the International AAAI Conference on Web and Social Media, 14, 774–785.
Zhu, J., Ni, P., & Wang, G. (2020). Activity minimization of misinformation influence in online social networks. IEEE Transactions on Computational Social Systems, 7(4), 897–906. https://doi.org/10.1109/TCSS.2020.2997188
Acknowledgements
I am extremely grateful to Sophie Xiangqian Yi and Trellace Lawrimore for their incredible research assistance in helping to locate almost all of the literature cited in this review, as well as for providing excellent summaries of what they had found. I would also like to thank Roxanne Rahnama for last-minute research assistance with the current status of EU efforts regarding data access (which included writing most of the text of footnote 20); Rebekah Tromble, Brandon Silverman, and Nate Persily provided helpful suggestions on this topic as well. Finally, I thank Matteo Fontana for his very helpful feedback on the first draft of this chapter, as well as the rest of the CSS4P team (Eleonora Bertoni, Lorenzo Gabrielli, Serena Signorelli, and Michele Vespe) for inviting me to contribute the chapter, for their patience with my schedule, and for their helpful comments and suggestions along the way.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
Cite this chapter
Tucker, J.A. (2023). Computational Social Science for Policy and Quality of Democracy: Public Opinion, Hate Speech, Misinformation, and Foreign Influence Campaigns. In: Bertoni, E., Fontana, M., Gabrielli, L., Signorelli, S., Vespe, M. (eds) Handbook of Computational Social Science for Policy. Springer, Cham. https://doi.org/10.1007/978-3-031-16624-2_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16623-5
Online ISBN: 978-3-031-16624-2