Exploring Political Bias in Social Media Misinformation Moderation

Overall Summary

Overview

This research paper examines accusations of political bias in technology companies' moderation of misinformation, arguing that behavioral differences in misinformation sharing between political groups can produce uneven enforcement outcomes even under unbiased policies. The study analyzes data from Twitter, Facebook, and surveys spanning multiple years and countries, finding that conservative users consistently share more low-quality news as judged both by experts and by politically balanced groups of laypeople. This behavioral difference offers a potential explanation for the disproportionate suspension of right-leaning users, indicating that unequal outcomes do not necessarily reflect platform bias.

Significant Elements

Table 1

Description: Table 1 categorizes 60 news domains into 'Mainstream', 'Hyper-partisan', and 'Fake News', providing corresponding quality scores from fact-checkers and politically balanced layperson ratings.

Relevance: This table is crucial for understanding how news quality is measured and categorized in the study, forming the basis for analyzing the relationship between political affiliation and shared news quality.

Figure 1

Description: Figure 1 illustrates the relationship between political leanings and the quality of shared news, using density plots and charts to demonstrate the association of conservatism with low-quality news sharing.

Relevance: This figure visually supports the study's core finding, showing that differences in user behavior, rather than platform bias, may contribute to unequal enforcement outcomes.

Conclusion

The study provides important insights into the debate about political bias in social media misinformation moderation. By demonstrating that conservative users share more low-quality news across various platforms, it suggests that behavioral differences may explain uneven enforcement outcomes, rather than platform bias. This has significant implications for how social media companies are perceived and how they develop and implement moderation policies. Future research could further explore the causal mechanisms behind these behavioral differences, examine the role of political elites in misinformation dissemination, and test the robustness of findings across additional platforms and contexts.

Section Analysis

Abstract

Overview

This research paper investigates political bias accusations against technology companies moderating misinformation. It argues that differing rates of misinformation sharing between political groups can lead to uneven enforcement outcomes, even with unbiased policies. Using data from Twitter, Facebook, and surveys across multiple years and countries, the study finds a consistent pattern of conservative users sharing more low-quality news, as judged by both experts and politically balanced layperson groups. This difference in behavior, the research suggests, can explain the disproportionate suspension of right-leaning users, highlighting that unequal outcomes do not necessarily indicate platform bias.

Introduction

Overview

Social media companies face pressure to combat misinformation, leading to policies like post removals and user suspensions. However, these policies have sparked accusations of political bias. This paper argues that differing misinformation sharing behaviors among political groups can lead to unequal enforcement outcomes even with unbiased policies. The introduction highlights public concern about misinformation and the resulting actions taken by social media companies, emphasizing the controversy surrounding perceived political bias in these actions.

Results

Overview

This section presents the results of the study, focusing on Twitter suspensions after the 2020 US election. It shows that while pro-Trump users were more likely to be suspended, they also shared significantly more low-quality news, even when judged by politically balanced groups. This pattern of conservatives sharing more low-quality news is found across multiple datasets and platforms, suggesting that unequal suspension rates may stem from differences in online behavior rather than platform bias. The section also explores how simulating politically neutral suspension policies based on low-quality news sharing or bot likelihood still results in disproportionate impacts on certain political groups.
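
To make the simulation logic concrete, the following is a minimal Python sketch of the kind of per-link policy described for Fig. 3b; the function and the share counts are illustrative assumptions, not the authors' code or data.

  import numpy as np

  def expected_suspension_rate(low_quality_shares, harshness):
      # Each low-quality link independently triggers suspension with
      # probability `harshness`; return the group's expected suspension rate.
      shares = np.asarray(low_quality_shares)
      return float(np.mean(1.0 - (1.0 - harshness) ** shares))

  # Hypothetical per-user counts of low-quality links shared.
  group_a = np.array([0, 1, 0, 2, 0])   # shares little low-quality news
  group_b = np.array([3, 0, 5, 2, 4])   # shares more low-quality news
  for h in (0.01, 0.05):
      print(h, expected_suspension_rate(group_a, h),
            expected_suspension_rate(group_b, h))

In this toy model the heavier-sharing group is suspended at a higher rate at every harshness level, even though the rule itself never references politics.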

Non-Text Elements

Table 1

Table 1 categorizes 60 news domains into 'Mainstream', 'Hyper-partisan', and 'Fake News', providing corresponding quality scores from professional fact-checkers and politically balanced layperson ratings. The fact-checker rating is based on assessments from 8 professional fact-checkers, while the politically balanced layperson rating is an average of trustworthiness scores from Democrats and Republicans (from a sample of 970 laypeople). Higher scores indicate higher quality (trustworthiness).
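
One plausible reading of how a "politically balanced" rating is formed, sketched in Python with made-up numbers (the study pooled ratings from 970 laypeople across 60 domains):

  import pandas as pd

  # Hypothetical trust ratings for a single domain.
  ratings = pd.DataFrame({
      "domain": ["example-news.com"] * 4,
      "party": ["Democrat", "Democrat", "Republican", "Republican"],
      "trust": [0.8, 0.6, 0.4, 0.2],
  })

  # Average within each party first, then across parties, so each side
  # contributes equally regardless of how many raters it has.
  party_means = ratings.groupby(["domain", "party"])["trust"].mean()
  balanced = party_means.groupby("domain").mean()
  print(balanced)   # example-news.com -> 0.5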

First Mention

Text: "by fact-checkers and journalists; see Table 1 for a list of the domains used and ref. 38 for details) from 8 professional fact-checkers38"

Context: of 60 news domains (the 20 highest volume sites within each category of mainstream, hyper-partisan and fake news, as determined

Relevance: This table is crucial for understanding how news quality is measured and categorized in the study. It provides the foundation for analyzing the relationship between political affiliation and the quality of news shared.

Critique
Visual Aspects
  • Consider using a color gradient for the rating scales to visually highlight the range of quality.
  • Provide a brief explanation of how the scores are calculated within the table itself, rather than solely in the caption.
  • Use more descriptive column headers. Instead of 'Fact-checker rating' and 'Politically balanced layperson rating', use 'Fact-Checker Trustworthiness Score' and 'Politically Balanced Layperson Trustworthiness Score'.
Analytical Aspects
  • Include the sample sizes for each rating type (fact-checker and layperson) directly in the table or column headers.
  • Consider adding a column indicating the overall quality classification (e.g., 'Low', 'Medium', 'High') based on the ratings.
  • Provide more information on the methodology used to select the 60 news domains.
Figure 1

Figure 1 illustrates how social media users' political leanings relate to the quality of news they share. Subfigures (a) and (b) use density plots to compare the distribution of low-quality news sharing scores for users associated with Biden and Trump hashtags, based on fact-checker and layperson ratings, respectively. Subfigure (c) is a table showing the top 5 most shared news domains by each group. Subfigure (d) presents a bar chart showing the correlation between conservatism and low-quality news sharing across seven different datasets.
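
The comparison in subfigures (a) and (b) can be reproduced in miniature with synthetic data; the group means below are assumptions for illustration, not the paper's values:

  import numpy as np
  from scipy.stats import gaussian_kde, ttest_ind

  rng = np.random.default_rng(0)
  biden = rng.normal(0.2, 1.0, 5000)   # synthetic low-quality-sharing scores
  trump = rng.normal(0.9, 1.0, 4000)

  # z-score against the pooled sample so both groups share one scale.
  pooled = np.concatenate([biden, trump])
  z = lambda x: (x - pooled.mean()) / pooled.std()

  t, p = ttest_ind(z(trump), z(biden))
  print(f"t = {t:.1f}, p = {p:.2g}")

  # Density estimates like those plotted in Fig. 1a,b.
  grid = np.linspace(-4, 4, 200)
  density_biden = gaussian_kde(z(biden))(grid)
  density_trump = gaussian_kde(z(trump))(grid)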

First Mention

Text: "t(8,943)= 1.2 × 102, P<0.0001; Fig. 1a)."

Context: than people who used Biden hashtags (t-test,

Relevance: This figure visually demonstrates the core finding of the study: a strong association between political leaning and the quality of news shared on social media. It supports the argument that differences in behavior, rather than platform bias, may contribute to unequal enforcement outcomes.

Critique
Visual Aspects
  • Place the figure closer to its first mention in the text to improve readability and flow.
  • Use consistent color schemes across all subfigures for better visual cohesion.
  • In subfigure (d), label the bars directly with the correlation coefficients to avoid relying solely on the y-axis.
Analytical Aspects
  • In subfigure (d), provide more details about the datasets used, including platform, sample size, and time period, either in the caption or as annotations on the bars.
  • Explain the choice of using density plots in (a) and (b) and justify the standardization (z-scoring) of the scores.
  • Clarify the meaning of 'Low-quality news sharing' and how it's calculated in each dataset in subfigure (d).
Figure 3

This figure illustrates how politically neutral enforcement policies related to low-quality news sharing and bot activity can lead to disproportionate suspension rates for Republicans. It contains three subplots. Subplot (a) shows the predictive accuracy (AUC) of various factors, including political orientation and low-quality news sharing, in predicting Twitter suspensions. Subplots (b) and (c) use simulations to show the expected suspension rates for Democrats and Republicans under different policy 'harshness' levels. Subplot (b) focuses on low-quality news sharing, where harshness is the probability of suspension per low-quality link shared. Subplot (c) focuses on bot activity, where harshness is the minimum probability of being human required to avoid suspension.
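
Subplot (a)'s AUC comparison reduces to calls like the following; the simulated relationships are assumptions for illustration, not the paper's results:

  import numpy as np
  from sklearn.metrics import roc_auc_score

  rng = np.random.default_rng(1)
  suspended = rng.integers(0, 2, 1000)                  # toy outcome
  low_quality = suspended + rng.normal(0, 1.5, 1000)    # informative feature
  orientation = rng.normal(0, 1, 1000)                  # uninformative here

  print(roc_auc_score(suspended, low_quality))   # well above 0.5
  print(roc_auc_score(suspended, orientation))   # near chance (0.5)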

First Mention

Text: "Fig. 3 | Suspending users for sharing links to low-quality news sites or for having a high bot score would disproportionately affect Republicans."

Context: This figure is referenced in the context of discussing how unbiased policies can still lead to unequal suspension rates due to differences in behavior between political groups.

Relevance: This figure is crucial for understanding the central argument of the paper. It visually demonstrates that even with politically neutral enforcement policies, differences in behavior, such as sharing low-quality news or bot activity, can lead to unequal outcomes, with Republicans being disproportionately affected.

Critique
Visual Aspects
  • The y-axis labels in subplots (b) and (c) could be more descriptive. Instead of 'Expected fraction of suspended users,' use 'Expected Probability of Suspension'.
  • The caption could be placed closer to the figure for better readability.
  • Consider using different colors or line styles in subplots (b) and (c) to improve the contrast between the lines for Democrats and Republicans.
Analytical Aspects
  • In subplot (a), clarify what the different measures of political orientation and low-quality news sharing represent. Provide a brief explanation of each measure in the caption or figure legend.
  • Explain the simulation methodology used in subplots (b) and (c) more clearly. For example, specify how 'low-quality' news sites were defined and how bot scores were calculated.
  • Discuss the limitations of the simulation, such as the use of pre-election tweets to predict suspensions and the potential for omitted variables.
Extended Data Figure 2

This figure displays the distribution of politically balanced layperson ratings of news domains, plotted against the ratings provided by professional fact-checkers. It's a scatter plot where each point represents a news domain. The x-axis shows the fact-checker rating, and the y-axis shows the average rating from politically balanced layperson groups (average of Democrat and Republican ratings). Orange diamonds highlight the news domains classified as 'low-quality' in the study's simulations.
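
A short sketch of the agreement check and threshold-based flagging the figure depicts, with synthetic ratings and an illustrative cutoff (the paper's actual criterion is discussed in the critique below):

  import numpy as np
  from scipy.stats import pearsonr

  rng = np.random.default_rng(2)
  fact_checker = rng.uniform(0, 1, 60)                       # 60 domains
  layperson = 0.8 * fact_checker + rng.normal(0, 0.08, 60)   # noisy agreement

  r, p = pearsonr(fact_checker, layperson)
  print(f"expert-layperson agreement: r = {r:.2f} (p = {p:.2g})")

  # Flag domains below a hypothetical trust threshold, analogous to the
  # orange diamonds marking 'low-quality' domains in the figure.
  low_quality = layperson < 0.4
  print(low_quality.sum(), "domains flagged as low quality")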

First Mention

Text: "Extended Data Fig. 2 | Distribution of politically balanced layperson ratings of news domains."

Context: This figure is mentioned in the context of explaining how 'low-quality' news sources were defined for the simulations of politically neutral suspension policies.

Relevance: This figure is important because it shows how the 'low-quality' news sources used in the simulations were determined. It helps address potential concerns about bias in the selection of low-quality sources by showing the agreement between layperson and expert ratings.

Critique
Visual Aspects
  • Add a diagonal line representing perfect agreement between fact-checker and layperson ratings to visually highlight the level of concordance.
  • Label the axes more clearly with 'Fact-Checker Trust Rating' and 'Politically Balanced Layperson Trust Rating'.
  • Increase the size of the data points, especially the orange diamonds, to improve visibility.
Analytical Aspects
  • Explain the criteria used to classify news domains as 'low-quality' (e.g., a specific threshold on the layperson rating).
  • Provide more details about the layperson rating process, such as the sample size and demographics of the participants.
  • Discuss any limitations of using layperson ratings to assess news quality, such as potential susceptibility to biases or lack of expertise.
Extended Data Figure 4

This figure uses density plots to show the distribution of bot scores, generated by Bot Sentinel, for Twitter users who primarily shared either Biden or Trump hashtags during the 2020 election. A higher bot score indicates a higher likelihood of being a bot. The x-axis represents the bot score (from 0 to 1), and the y-axis represents the relative frequency of users with that score. The distribution for Trump hashtag users (red) is shifted noticeably to the right compared to the distribution for Biden hashtag users (blue). This shift suggests that users who shared Trump hashtags were, on average, rated as more likely to be bots.
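
A minimal sketch of comparing two bot-score distributions like those plotted here; the Beta-distributed scores are synthetic stand-ins for Bot Sentinel output:

  import numpy as np
  from scipy.stats import mannwhitneyu

  rng = np.random.default_rng(3)
  biden_scores = rng.beta(2, 5, 4000)   # mass toward low bot scores
  trump_scores = rng.beta(4, 3, 4000)   # distribution shifted right

  u, p = mannwhitneyu(trump_scores, biden_scores, alternative="greater")
  print(f"median (Biden) = {np.median(biden_scores):.2f}, "
        f"median (Trump) = {np.median(trump_scores):.2f}, p = {p:.2g}")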

First Mention

Text: "Indeed, as with sharing links to low-quality news sites, users on the political right had significantly higher estimated likelihoods of being a bot (0.70 < r < 0.76 depending on political orientation measure, P < 0.0001 for all; Extended Data Fig. 4), and simulating suspension on the basis of likelihood of being a bot leads to much higher suspension rates for Republican accounts than Democrat accounts (Fig. 3c; see the Methods and Supplementary Information section 2 for details)."

Context: The authors are discussing how conservative users not only shared more low-quality news but also had higher bot scores, which could contribute to the higher suspension rates.

Relevance: This figure is relevant because it provides evidence for another behavioral difference between the two political groups, beyond just news sharing quality. The higher bot scores among Trump supporters could contribute to their higher suspension rates, even under a politically neutral anti-bot policy.

Critique
Visual Aspects
  • The color scheme clearly distinguishes between the two groups.
  • The axes are clearly labeled, making the plot easy to interpret.
  • The use of density plots effectively visualizes the distributions of bot scores.
Analytical Aspects
  • The figure effectively demonstrates the difference in bot scores between the two groups.
  • The caption could be improved by explicitly stating the statistical significance of the difference in bot scores.
  • The figure could be further enhanced by adding a measure of central tendency, such as the median bot score, for each group.
Extended Data Table 1

This table presents the results of four different regression models (Probit, Probit Ridge, Logit, and Logit Ridge) predicting Twitter account suspension during the 2020 election study. The independent variables include political orientation, low-quality news sharing, bot scores, toxic language use, number of followers, number of friends, and other control variables. Each cell in the table shows the estimated regression coefficient and its standard error in parentheses. Asterisks indicate statistical significance (*p<0.05, **p<0.01, ***p<0.001). The table shows that low-quality news sharing and bot scores are significant predictors of suspension across all models, while political orientation is not a significant predictor in the non-ridge regression models.
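
For readers unfamiliar with these model families, the sketch below fits a probit and an L2-penalized (ridge-style) logit on simulated data; the variable names and effect sizes are illustrative assumptions, not the authors' specification:

  import numpy as np
  import statsmodels.api as sm
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(4)
  n = 2000
  orientation = rng.normal(size=n)
  low_quality = rng.normal(size=n)
  bot_score = rng.normal(size=n)
  # In this toy setup, suspension is driven by behavior, not orientation.
  latent = 0.8 * low_quality + 0.6 * bot_score + rng.normal(size=n)
  suspended = (latent > 1).astype(int)

  X = np.column_stack([orientation, low_quality, bot_score])
  probit = sm.Probit(suspended, sm.add_constant(X)).fit(disp=0)
  print(probit.params)   # orientation coefficient stays near zero

  ridge_logit = LogisticRegression(penalty="l2").fit(X, suspended)
  print(ridge_logit.coef_)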

First Mention

Text: "We then use probit regression to predict whether the user was suspended as of the end of July 2021, with P values Holm–Bonferroni corrected to adjust for multiple comparisons (see Supplementary Information section 1 for a full list of control variables and Extended Data Table 1 for regression models)."

Context: The authors are explaining their statistical approach to analyze the factors contributing to Twitter suspensions, using regression models to predict suspension based on various user characteristics and behaviors. They refer to Extended Data Table 1 for the full regression results.

Relevance: This table is crucial because it directly addresses the question of whether political orientation or other factors, like low-quality news sharing and bot activity, better explain Twitter suspensions. The regression results help disentangle the effects of these correlated variables.
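
The Holm-Bonferroni adjustment mentioned in the quoted passage is available off the shelf; a sketch with hypothetical raw p-values:

  from statsmodels.stats.multitest import multipletests

  raw_p = [0.001, 0.012, 0.049, 0.20]   # hypothetical per-predictor p-values
  reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
  print(list(zip(raw_p, p_adj.round(3), reject)))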

Critique
Visual Aspects
  • The table is well-organized and easy to read, with clear column headers and row labels.
  • The use of parentheses for standard errors and asterisks for significance levels is standard practice and aids interpretation.
  • The table could be improved by adding a column indicating the number of observations used in each model.
Analytical Aspects
  • The table effectively presents the regression results, allowing readers to assess the statistical significance of each predictor.
  • The caption could be more informative by briefly explaining the different regression models used (Probit, Ridge, Logit).
  • The table would benefit from including measures of model fit, such as R-squared or pseudo-R-squared, to help readers evaluate the overall performance of the models.
Figure 2

This figure shows that conservatives shared more misinformation than liberals on Twitter. It uses violin plots to display the distribution of false news URLs shared by each group, based on both fact-checker ratings (a) and politically balanced crowd ratings (b). The y-axis represents the log10(count of primary posts containing the URL + 1), allowing for visualization of the distribution across a wide range of share counts. Panels (c) and (d) show the correlation between conservatism and the fraction of shared COVID-19 claims rated as false by fact-checkers (c) or inaccurate by layperson crowds (d) across 16 countries. The overall effect is calculated using random effects meta-analysis, and error bars indicate 95% confidence intervals.
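
The random-effects pooling behind the overall estimate can be illustrated with a standard DerSimonian-Laird estimator; the per-country effects and variances below are invented for the example:

  import numpy as np

  def dersimonian_laird(effects, variances):
      # Random-effects pooled estimate with DL between-study variance.
      effects, variances = np.asarray(effects), np.asarray(variances)
      w = 1.0 / variances                        # fixed-effect weights
      mu_fe = np.sum(w * effects) / np.sum(w)
      q = np.sum(w * (effects - mu_fe) ** 2)     # Cochran's Q
      c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
      tau2 = max(0.0, (q - (len(effects) - 1)) / c)
      w_re = 1.0 / (variances + tau2)
      mu_re = np.sum(w_re * effects) / np.sum(w_re)
      se = np.sqrt(1.0 / np.sum(w_re))
      return mu_re, (mu_re - 1.96 * se, mu_re + 1.96 * se)

  # Hypothetical per-country correlations and sampling variances.
  print(dersimonian_laird([0.08, 0.04, 0.06, 0.02],
                          [0.001, 0.002, 0.0015, 0.001]))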

First Mention

Text: "Conservatives shared more false claims than liberals"

Context: This is the title of Figure 2, which explores the relationship between political leaning and sharing misinformation on Twitter and in a cross-national survey about COVID-19.

Relevance: This figure directly supports the paper's central argument by demonstrating the behavioral asymmetry in misinformation sharing between conservatives and liberals. This difference in behavior is crucial for understanding how politically neutral enforcement policies can still lead to disparate outcomes.

Critique
Visual Aspects
  • The violin plots could be improved by adding a box plot inside to clearly show the median and quartiles of the distribution.
  • The country labels in the tables (c) and (d) are small and difficult to read. Increasing the font size or using abbreviations would improve readability.
  • The use of different visual representations (violin plots and tables) within the same figure can be slightly disorienting. Consider presenting the correlation data in (c) and (d) as scatter plots or bar charts for better visual consistency.
Analytical Aspects
  • The figure caption could be more explicit about the statistical tests used to compare the distributions in (a) and (b). For example, specifying the use of a Mann-Whitney U test or similar would be helpful.
  • The tables in (c) and (d) present correlation coefficients, but the caption doesn't explain how these correlations were calculated (e.g., Pearson, Spearman).
  • Providing the exact p-values for the correlations in the tables (c) and (d) would strengthen the analysis.
Numeric Data
  • Correlation between conservatism and fraction of shared news that is false (fact-checker ratings): 0.06
  • Correlation between conservatism and fraction of shared news that is false (layperson ratings): 0.05
Extended Data Figure 1

This figure compares the distribution of low-quality news site sharing scores between Twitter users who used Trump hashtags and those who used Biden hashtags. It uses four separate density plots, each representing a different news quality rating set: Lasser et al. (2022), Media Bias/Fact Check, Ad Fontes Media, and Republican-Only Layperson ratings. The x-axes show standardized (z-scored) low-quality news sharing scores, where higher scores indicate lower-quality sharing. The y-axes represent the relative frequency of each score within each group. The consistent rightward shift of the Trump hashtag user distribution in all four plots indicates that these users shared lower-quality news regardless of the rating system used.
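
One way to quantify the rightward shift visible in these plots is a standardized mean difference (Cohen's d, as the critique below also suggests); the z-scores here are synthetic:

  import numpy as np

  def cohens_d(a, b):
      # Standardized mean difference using the pooled standard deviation.
      a, b = np.asarray(a), np.asarray(b)
      pooled_var = ((len(a) - 1) * a.var(ddof=1) +
                    (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
      return (a.mean() - b.mean()) / np.sqrt(pooled_var)

  rng = np.random.default_rng(5)
  trump_z = rng.normal(0.4, 1.0, 3000)    # synthetic z-scored sharing scores
  biden_z = rng.normal(-0.3, 1.0, 3000)
  print(f"Cohen's d = {cohens_d(trump_z, biden_z):.2f}")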

First Mention

Text: "Extended Data Fig. 1"

Context: Mentioned on page 2 in the context of comparing the average quality of domains shared by people who used Trump hashtags versus Biden hashtags, using various quality rating sources.

Relevance: This figure strengthens the main finding by showing that the observed difference in low-quality news sharing between Trump and Biden supporters is robust across multiple independent news quality rating systems. This robustness helps rule out the possibility that the results are driven by biases in any single rating system.

Critique
Visual Aspects
  • The overlapping density plots can make it difficult to discern the magnitude of the difference between the two groups. Consider using semi-transparent colors or offsetting the plots slightly to improve visibility.
  • The x-axis labels could be more descriptive. Instead of just 'Lasser et al. 2022 Rating', specify the type of rating (e.g., 'Lasser et al. Accuracy Rating').
  • Adding a small table summarizing the key statistics (e.g., means, standard deviations, effect sizes) for each rating system would enhance the figure's informativeness.
Analytical Aspects
  • While the figure shows a clear visual difference, it would be beneficial to quantify the difference between the groups for each rating system. Report effect sizes (e.g., Cohen's d) to provide a measure of the magnitude of the difference.
  • The caption mentions z-scoring, but it would be helpful to clarify whether the scores were standardized within each rating system or across all rating systems.
  • Consider performing statistical tests (e.g., t-tests) to formally compare the distributions for each rating system and report the p-values.

Methods

Overview

This section details the methodology used in the 2020 election Twitter suspension study. Researchers collected tweets from users who used either #Trump2020 or #VoteBidenHarris2020, focusing on those who shared links to news domains. They used various methods to assess news quality, including ratings from professional fact-checkers and politically balanced groups of laypeople. User political orientation was determined through hashtag usage, followed accounts, and shared news sites. Finally, the researchers simulated politically neutral suspension policies based on low-quality news sharing and bot likelihood to assess potential disparate impacts.
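
One plausible reading of the hashtag-based classification step, sketched below; the tag sets and tie handling are assumptions, not the authors' exact rule:

  # Label a user by whichever campaign hashtag dominates their tweets.
  TRUMP_TAGS = {"#trump2020"}
  BIDEN_TAGS = {"#votebidenharris2020"}

  def classify_user(hashtags):
      trump = sum(tag.lower() in TRUMP_TAGS for tag in hashtags)
      biden = sum(tag.lower() in BIDEN_TAGS for tag in hashtags)
      if trump > biden:
          return "Trump"
      if biden > trump:
          return "Biden"
      return "Unclassified"

  print(classify_user(["#Trump2020", "#MAGA", "#Trump2020"]))  # -> Trump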

Non-Text Elements

Extended Data Figure 3

This figure investigates whether there are specific topics for which liberals share more misinformation than conservatives on Twitter. It presents two forest plots. The top plot uses fact-checker ratings of falsity, while the bottom uses ratings from politically balanced crowds. Each plot shows the coefficient from a linear regression predicting the log10 (number of misinformation shares + 1) for each topic (US Politics, Social Issues, COVID-19, Business/Economy, Foreign Affairs, Crime/Justice). A positive coefficient indicates that conservatives shared more, while a negative coefficient would indicate that liberals shared more. The plots also include an overall estimate across all topics and the weight (%) each topic contributes to the overall analysis. The error bars represent 95% confidence intervals. Importantly, no topic's confidence interval lies entirely below zero, and the overall estimates are positive, indicating no evidence of any topic for which liberals share more misinformation.
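
Each per-topic coefficient comes from a regression of roughly this form; the simulated data and effect size below are illustrative only:

  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(6)
  n = 500
  conservatism = rng.normal(size=n)
  # Toy outcome: conservatives share slightly more misinformation.
  shares = rng.poisson(np.exp(0.3 * conservatism))
  log_shares = np.log10(shares + 1)

  model = sm.OLS(log_shares, sm.add_constant(conservatism)).fit()
  coef, (lo, hi) = model.params[1], model.conf_int()[1]
  print(f"coefficient = {coef:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")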

First Mention

Text: "For methodological details, see the Methods; for further analyses, see Supplementary Information section 3.6 and Extended Data Fig. 3."

Context: This is mentioned in the context of analyzing sharing of URLs deemed inaccurate by fact-checkers or politically balanced layperson ratings, estimating user ideology based on followed accounts, and finding conservatives shared more inaccurate URLs.

Relevance: This figure is relevant as it addresses a potential counter-argument: are there any topics where liberals share more misinformation? The findings reinforce the overall trend of greater misinformation sharing by conservatives, showing this pattern holds across various topics and isn't reversed for any specific subject.

Critique
Visual Aspects
  • The figure clearly presents the data using forest plots, which are appropriate for displaying effect sizes and confidence intervals from multiple regressions.
  • The color scheme and labeling are clear and easy to understand.
  • The inclusion of weights for each topic is helpful for understanding their contribution to the overall analysis.
Analytical Aspects
  • The figure effectively uses meta-analysis to combine the results across different topics, providing a more robust overall estimate.
  • The caption clearly explains the meaning of the coefficients and confidence intervals.
  • The analysis could be strengthened by providing the exact p-values for each topic and the overall estimate.

Extended Data Figures

Overview

This section contains supplementary figures that provide additional context and support for the findings discussed in the main text of the research paper. These figures offer further details and robustness checks related to the relationship between political affiliation, news quality, and Twitter suspensions.

Extended Data Table

Overview

This supplementary table provides further details on the regression analyses predicting Twitter account suspensions. It expands on the findings presented in the main text, showing the results of different regression models (Probit, Probit Ridge, Logit, and Logit Ridge) and including a wider range of control variables.

Reporting Summary

Overview

This reporting summary outlines the statistical methods, software, data availability, and ethical considerations of the study. It confirms adherence to Nature Portfolio's reporting standards for reproducibility and transparency. The summary details the software used for data collection and analysis (Python, R, and STATA), affirms that the data necessary for reproducing the results is available online, and addresses the study's focus on publicly available social media data, which did not require ethical approval beyond MIT's observational study protocol.
