Exploring Political Bias in Social Media Misinformation Moderation

Overall Summary

Overview

This research paper examines accusations of political bias in technology companies' moderation of misinformation, arguing that behavioral differences in misinformation sharing between political groups can produce uneven enforcement outcomes even under unbiased policies. The study analyzes data from Twitter, Facebook, and surveys spanning multiple years and countries, finding that conservative users consistently share more low-quality news as judged both by experts and by politically balanced groups of laypeople. This behavioral difference offers a potential explanation for the disproportionate suspension of right-leaning users, indicating that unequal outcomes do not necessarily reflect platform bias.

Significant Elements

Table 1

Description: Table 1 categorizes 60 news domains into 'Mainstream', 'Hyper-partisan', and 'Fake News', providing corresponding quality scores from fact-checkers and politically balanced layperson ratings.

Relevance: This table is crucial for understanding how news quality is measured and categorized in the study, forming the basis for analyzing the relationship between political affiliation and shared news quality.

Figure 1

Description: Figure 1 illustrates the relationship between political leanings and the quality of shared news, using density plots and charts to demonstrate the association of conservatism with low-quality news sharing.

Relevance: This figure visually supports the study's core finding, showing that differences in user behavior, rather than platform bias, may contribute to unequal enforcement outcomes.

Conclusion

The study provides important insights into the debate about political bias in social media misinformation moderation. By demonstrating that conservative users share more low-quality news across various platforms, it suggests that behavioral differences may explain uneven enforcement outcomes, rather than platform bias. This has significant implications for how social media companies are perceived and how they develop and implement moderation policies. Future research could further explore the causal mechanisms behind these behavioral differences, examine the role of political elites in misinformation dissemination, and test the robustness of findings across additional platforms and contexts.

Section Analysis

Abstract

Overview

This research paper investigates political bias accusations against technology companies moderating misinformation. It argues that differing rates of misinformation sharing between political groups can lead to uneven enforcement outcomes, even with unbiased policies. Using data from Twitter, Facebook, and surveys across multiple years and countries, the study finds a consistent pattern of conservative users sharing more low-quality news, as judged by both experts and politically balanced layperson groups. This difference in behavior, the research suggests, can explain the disproportionate suspension of right-leaning users, highlighting that unequal outcomes do not necessarily indicate platform bias.

Introduction

Overview

Social media companies face pressure to combat misinformation, leading to policies like post removals and user suspensions. However, these policies have sparked accusations of political bias. This paper argues that differing misinformation sharing behaviors among political groups can lead to unequal enforcement outcomes even with unbiased policies. The introduction highlights public concern about misinformation and the resulting actions taken by social media companies, emphasizing the controversy surrounding perceived political bias in these actions.

Results

Overview

This section presents the results of the study, focusing on Twitter suspensions after the 2020 US election. It shows that while pro-Trump users were more likely to be suspended, they also shared significantly more low-quality news, even when judged by politically balanced groups. This pattern of conservatives sharing more low-quality news is found across multiple datasets and platforms, suggesting that unequal suspension rates may stem from differences in online behavior rather than platform bias. The section also explores how simulating politically neutral suspension policies based on low-quality news sharing or bot likelihood still results in disproportionate impacts on certain political groups.
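
To make the simulation logic concrete, the following is a minimal Python sketch of the kind of per-link policy described for Fig. 3b; the function and the share counts are illustrative assumptions, not the authors' code or data.

  import numpy as np

  def expected_suspension_rate(low_quality_shares, harshness):
      # Each low-quality link independently triggers suspension with
      # probability `harshness`; return the group's expected suspension rate.
      shares = np.asarray(low_quality_shares)
      return float(np.mean(1.0 - (1.0 - harshness) ** shares))

  # Hypothetical per-user counts of low-quality links shared.
  group_a = np.array([0, 1, 0, 2, 0])   # shares little low-quality news
  group_b = np.array([3, 0, 5, 2, 4])   # shares more low-quality news
  for h in (0.01, 0.05):
      print(h, expected_suspension_rate(group_a, h),
            expected_suspension_rate(group_b, h))

In this toy model the heavier-sharing group is suspended at a higher rate at every harshness level, even though the rule itself never references politics.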

Non-Text Elements

Table 1

Table 1 categorizes 60 news domains into 'Mainstream', 'Hyper-partisan', and 'Fake News', providing corresponding quality scores from professional fact-checkers and politically balanced layperson ratings. The fact-checker rating is based on assessments from 8 professional fact-checkers, while the politically balanced layperson rating is an average of trustworthiness scores from Democrats and Republicans (from a sample of 970 laypeople). Higher scores indicate higher quality (trustworthiness).
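
One plausible reading of how a "politically balanced" rating is formed, sketched in Python with made-up numbers (the study pooled ratings from 970 laypeople across 60 domains):

  import pandas as pd

  # Hypothetical trust ratings for a single domain.
  ratings = pd.DataFrame({
      "domain": ["example-news.com"] * 4,
      "party": ["Democrat", "Democrat", "Republican", "Republican"],
      "trust": [0.8, 0.6, 0.4, 0.2],
  })

  # Average within each party first, then across parties, so each side
  # contributes equally regardless of how many raters it has.
  party_means = ratings.groupby(["domain", "party"])["trust"].mean()
  balanced = party_means.groupby("domain").mean()
  print(balanced)   # example-news.com -> 0.5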

First Mention

Text: "by fact-checkers and journalists; see Table 1 for a list of the domains used and ref. 38 for details) from 8 professional fact-checkers38"

Context: of 60 news domains (the 20 highest volume sites within each category of mainstream, hyper-partisan and fake news, as determined

Relevance: This table is crucial for understanding how news quality is measured and categorized in the study. It provides the foundation for analyzing the relationship between political affiliation and the quality of news shared.

Critique
Visual Aspects
  • Consider using a color gradient for the rating scales to visually highlight the range of quality.
  • Provide a brief explanation of how the scores are calculated within the table itself, rather than solely in the caption.
  • Use more descriptive column headers. Instead of 'Fact-checker rating' and 'Politically balanced layperson rating', use 'Fact-Checker Trustworthiness Score' and 'Politically Balanced Layperson Trustworthiness Score'.
Analytical Aspects
  • Include the sample sizes for each rating type (fact-checker and layperson) directly in the table or column headers.
  • Consider adding a column indicating the overall quality classification (e.g., 'Low', 'Medium', 'High') based on the ratings.
  • Provide more information on the methodology used to select the 60 news domains.
Figure 1

Figure 1 illustrates how social media users' political leanings relate to the quality of news they share. Subfigures (a) and (b) use density plots to compare the distribution of low-quality news sharing scores for users associated with Biden and Trump hashtags, based on fact-checker and layperson ratings, respectively. Subfigure (c) is a table showing the top 5 most shared news domains by each group. Subfigure (d) presents a bar chart showing the correlation between conservatism and low-quality news sharing across seven different datasets.
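
The comparison in subfigures (a) and (b) can be reproduced in miniature with synthetic data; the group means below are assumptions for illustration, not the paper's values:

  import numpy as np
  from scipy.stats import gaussian_kde, ttest_ind

  rng = np.random.default_rng(0)
  biden = rng.normal(0.2, 1.0, 5000)   # synthetic low-quality-sharing scores
  trump = rng.normal(0.9, 1.0, 4000)

  # z-score against the pooled sample so both groups share one scale.
  pooled = np.concatenate([biden, trump])
  z = lambda x: (x - pooled.mean()) / pooled.std()

  t, p = ttest_ind(z(trump), z(biden))
  print(f"t = {t:.1f}, p = {p:.2g}")

  # Density estimates like those plotted in Fig. 1a,b.
  grid = np.linspace(-4, 4, 200)
  density_biden = gaussian_kde(z(biden))(grid)
  density_trump = gaussian_kde(z(trump))(grid)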

First Mention

Text: "t(8,943)= 1.2 × 102, P<0.0001; Fig. 1a)."

Context: than people who used Biden hashtags (t-test,

Relevance: This figure visually demonstrates the core finding of the study: a strong association between political leaning and the quality of news shared on social media. It supports the argument that differences in behavior, rather than platform bias, may contribute to unequal enforcement outcomes.

Critique
Visual Aspects
  • Place the figure closer to its first mention in the text to improve readability and flow.
  • Use consistent color schemes across all subfigures for better visual cohesion.
  • In subfigure (d), label the bars directly with the correlation coefficients to avoid relying solely on the y-axis.
Analytical Aspects
  • In subfigure (d), provide more details about the datasets used, including platform, sample size, and time period, either in the caption or as annotations on the bars.
  • Explain the choice of using density plots in (a) and (b) and justify the standardization (z-scoring) of the scores.
  • Clarify the meaning of 'Low-quality news sharing' and how it's calculated in each dataset in subfigure (d).
Figure 3

This figure illustrates how politically neutral enforcement policies related to low-quality news sharing and bot activity can lead to disproportionate suspension rates for Republicans. It contains three subplots. Subplot (a) shows the predictive accuracy (AUC) of various factors, including political orientation and low-quality news sharing, in predicting Twitter suspensions. Subplots (b) and (c) use simulations to show the expected suspension rates for Democrats and Republicans under different policy 'harshness' levels. Subplot (b) focuses on low-quality news sharing, where harshness is the probability of suspension per low-quality link shared. Subplot (c) focuses on bot activity, where harshness is the minimum probability of being human required to avoid suspension.
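
Subplot (a)'s AUC comparison reduces to calls like the following; the simulated relationships are assumptions for illustration, not the paper's results:

  import numpy as np
  from sklearn.metrics import roc_auc_score

  rng = np.random.default_rng(1)
  suspended = rng.integers(0, 2, 1000)                  # toy outcome
  low_quality = suspended + rng.normal(0, 1.5, 1000)    # informative feature
  orientation = rng.normal(0, 1, 1000)                  # uninformative here

  print(roc_auc_score(suspended, low_quality))   # well above 0.5
  print(roc_auc_score(suspended, orientation))   # near chance (0.5)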

First Mention

Text: "Fig. 3 | Suspending users for sharing links to low-quality news sites or for having a high bot score would disproportionately affect Republicans."

Context: This figure is referenced in the context of discussing how unbiased policies can still lead to unequal suspension rates due to differences in behavior between political groups.

Relevance: This figure is crucial for understanding the central argument of the paper. It visually demonstrates that even with politically neutral enforcement policies, differences in behavior, such as sharing low-quality news or bot activity, can lead to unequal outcomes, with Republicans being disproportionately affected.

Critique
Visual Aspects
  • The y-axis labels in subplots (b) and (c) could be more descriptive. Instead of 'Expected fraction of suspended users,' use 'Expected Probability of Suspension'.
  • The caption could be placed closer to the figure for better readability.
  • Consider using different colors or line styles in subplots (b) and (c) to improve the contrast between the lines for Democrats and Republicans.
Analytical Aspects
  • In subplot (a), clarify what the different measures of political orientation and low-quality news sharing represent. Provide a brief explanation of each measure in the caption or figure legend.
  • Explain the simulation methodology used in subplots (b) and (c) more clearly. For example, specify how 'low-quality' news sites were defined and how bot scores were calculated.
  • Discuss the limitations of the simulation, such as the use of pre-election tweets to predict suspensions and the potential for omitted variables.
Extended Data Figure 2

This figure displays the distribution of politically balanced layperson ratings of news domains, plotted against the ratings provided by professional fact-checkers. It's a scatter plot where each point represents a news domain. The x-axis shows the fact-checker rating, and the y-axis shows the average rating from politically balanced layperson groups (average of Democrat and Republican ratings). Orange diamonds highlight the news domains classified as 'low-quality' in the study's simulations.
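
A short sketch of the agreement check and threshold-based flagging the figure depicts, with synthetic ratings and an illustrative cutoff (the paper's actual criterion is discussed in the critique below):

  import numpy as np
  from scipy.stats import pearsonr

  rng = np.random.default_rng(2)
  fact_checker = rng.uniform(0, 1, 60)                       # 60 domains
  layperson = 0.8 * fact_checker + rng.normal(0, 0.08, 60)   # noisy agreement

  r, p = pearsonr(fact_checker, layperson)
  print(f"expert-layperson agreement: r = {r:.2f} (p = {p:.2g})")

  # Flag domains below a hypothetical trust threshold, analogous to the
  # orange diamonds marking 'low-quality' domains in the figure.
  low_quality = layperson < 0.4
  print(low_quality.sum(), "domains flagged as low quality")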

First Mention

Text: "Extended Data Fig. 2 | Distribution of politically balanced layperson ratings of news domains."

Context: This figure is mentioned in the context of explaining how 'low-quality' news sources were defined for the simulations of politically neutral suspension policies.

Relevance: This figure is important because it shows how the 'low-quality' news sources used in the simulations were determined. It helps address potential concerns about bias in the selection of low-quality sources by showing the agreement between layperson and expert ratings.

Critique
Visual Aspects
  • Add a diagonal line representing perfect agreement between fact-checker and layperson ratings to visually highlight the level of concordance.
  • Label the axes more clearly with 'Fact-Checker Trust Rating' and 'Politically Balanced Layperson Trust Rating'.
  • Increase the size of the data points, especially the orange diamonds, to improve visibility.
Analytical Aspects
  • Explain the criteria used to classify news domains as 'low-quality' (e.g., a specific threshold on the layperson rating).
  • Provide more details about the layperson rating process, such as the sample size and demographics of the participants.
  • Discuss any limitations of using layperson ratings to assess news quality, such as potential susceptibility to biases or lack of expertise.
Extended Data Figure 4

This figure uses density plots to show the distribution of bot scores, generated by Bot Sentinel, for Twitter users who primarily shared either Biden or Trump hashtags during the 2020 election. A higher bot score indicates a higher likelihood of being a bot. The x-axis represents the bot score (from 0 to 1), and the y-axis represents the relative frequency of users with that score. The distribution for Trump hashtag users (red) is shifted noticeably to the right compared to the distribution for Biden hashtag users (blue). This shift suggests that users who shared Trump hashtags were, on average, rated as more likely to be bots.
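
A minimal sketch of comparing two bot-score distributions like those plotted here; the Beta-distributed scores are synthetic stand-ins for Bot Sentinel output:

  import numpy as np
  from scipy.stats import mannwhitneyu

  rng = np.random.default_rng(3)
  biden_scores = rng.beta(2, 5, 4000)   # mass toward low bot scores
  trump_scores = rng.beta(4, 3, 4000)   # distribution shifted right

  u, p = mannwhitneyu(trump_scores, biden_scores, alternative="greater")
  print(f"median (Biden) = {np.median(biden_scores):.2f}, "
        f"median (Trump) = {np.median(trump_scores):.2f}, p = {p:.2g}")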

First Mention

Text: "Indeed, as with sharing links to low-quality news sites, users on the political right had significantly higher estimated likelihoods of being a bot (0.70 < r < 0.76 depending on political orientation measure, P < 0.0001 for all; Extended Data Fig. 4), and simulating suspension on the basis of likelihood of being a bot leads to much higher suspension rates for Republican accounts than Democrat accounts (Fig. 3c; see the Methods and Supplementary Information section 2 for details)."

Context: The authors are discussing how conservative users not only shared more low-quality news but also had higher bot scores, which could contribute to the higher suspension rates.

Relevance: This figure is relevant because it provides evidence for another behavioral difference between the two political groups, beyond just news sharing quality. The higher bot scores among Trump supporters could contribute to their higher suspension rates, even under a politically neutral anti-bot policy.

Critique
Visual Aspects
  • The color scheme clearly distinguishes between the two groups.
  • The axes are clearly labeled, making the plot easy to interpret.
  • The use of density plots effectively visualizes the distributions of bot scores.
Analytical Aspects
  • The figure effectively demonstrates the difference in bot scores between the two groups.
  • The caption could be improved by explicitly stating the statistical significance of the difference in bot scores.
  • The figure could be further enhanced by adding a measure of central tendency, such as the median bot score, for each group.
Extended Data Table 1

This table presents the results of four different regression models (Probit, Probit Ridge, Logit, and Logit Ridge) predicting Twitter account suspension during the 2020 election study. The independent variables include political orientation, low-quality news sharing, bot scores, toxic language use, number of followers, number of friends, and other control variables. Each cell in the table shows the estimated regression coefficient and its standard error in parentheses. Asterisks indicate statistical significance (*p<0.05, **p<0.01, ***p<0.001). The table shows that low-quality news sharing and bot scores are significant predictors of suspension across all models, while political orientation is not a significant predictor in the non-ridge regression models.
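
For readers unfamiliar with these model families, the sketch below fits a probit and an L2-penalized (ridge-style) logit on simulated data; the variable names and effect sizes are illustrative assumptions, not the authors' specification:

  import numpy as np
  import statsmodels.api as sm
  from sklearn.linear_model import LogisticRegression

  rng = np.random.default_rng(4)
  n = 2000
  orientation = rng.normal(size=n)
  low_quality = rng.normal(size=n)
  bot_score = rng.normal(size=n)
  # In this toy setup, suspension is driven by behavior, not orientation.
  latent = 0.8 * low_quality + 0.6 * bot_score + rng.normal(size=n)
  suspended = (latent > 1).astype(int)

  X = np.column_stack([orientation, low_quality, bot_score])
  probit = sm.Probit(suspended, sm.add_constant(X)).fit(disp=0)
  print(probit.params)   # orientation coefficient stays near zero

  ridge_logit = LogisticRegression(penalty="l2").fit(X, suspended)
  print(ridge_logit.coef_)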

First Mention

Text: "We then use probit regression to predict whether the user was suspended as of the end of July 2021, with P values Holm–Bonferroni corrected to adjust for multiple comparisons (see Supplementary Information section 1 for a full list of control variables and Extended Data Table 1 for regression models)."

Context: The authors are explaining their statistical approach to analyze the factors contributing to Twitter suspensions, using regression models to predict suspension based on various user characteristics and behaviors. They refer to Extended Data Table 1 for the full regression results.

Relevance: This table is crucial because it directly addresses the question of whether political orientation or other factors, like low-quality news sharing and bot activity, better explain Twitter suspensions. The regression results help disentangle the effects of these correlated variables.
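
The Holm-Bonferroni adjustment mentioned in the quoted passage is available off the shelf; a sketch with hypothetical raw p-values:

  from statsmodels.stats.multitest import multipletests

  raw_p = [0.001, 0.012, 0.049, 0.20]   # hypothetical per-predictor p-values
  reject, p_adj, _, _ = multipletests(raw_p, alpha=0.05, method="holm")
  print(list(zip(raw_p, p_adj.round(3), reject)))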

Critique
Visual Aspects
  • The table is well-organized and easy to read, with clear column headers and row labels.
  • The use of parentheses for standard errors and asterisks for significance levels is standard practice and aids interpretation.
  • The table could be improved by adding a column indicating the number of observations used in each model.
Analytical Aspects
  • The table effectively presents the regression results, allowing readers to assess the statistical significance of each predictor.
  • The caption could be more informative by briefly explaining the different regression models used (Probit, Ridge, Logit).
  • The table would benefit from including measures of model fit, such as R-squared or pseudo-R-squared, to help readers evaluate the overall performance of the models.
Figure 2

This figure shows that conservatives shared more misinformation than liberals on Twitter. It uses violin plots to display the distribution of false news URLs shared by each group, based on both fact-checker ratings (a) and politically balanced crowd ratings (b). The y-axis represents the log10(count of primary posts containing the URL + 1), allowing for visualization of the distribution across a wide range of share counts. Panels (c) and (d) show the correlation between conservatism and the fraction of shared COVID-19 claims rated as false by fact-checkers (c) or inaccurate by layperson crowds (d) across 16 countries. The overall effect is calculated using random effects meta-analysis, and error bars indicate 95% confidence intervals.
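
The random-effects pooling behind the overall estimate can be illustrated with a standard DerSimonian-Laird estimator; the per-country effects and variances below are invented for the example:

  import numpy as np

  def dersimonian_laird(effects, variances):
      # Random-effects pooled estimate with DL between-study variance.
      effects, variances = np.asarray(effects), np.asarray(variances)
      w = 1.0 / variances                        # fixed-effect weights
      mu_fe = np.sum(w * effects) / np.sum(w)
      q = np.sum(w * (effects - mu_fe) ** 2)     # Cochran's Q
      c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
      tau2 = max(0.0, (q - (len(effects) - 1)) / c)
      w_re = 1.0 / (variances + tau2)
      mu_re = np.sum(w_re * effects) / np.sum(w_re)
      se = np.sqrt(1.0 / np.sum(w_re))
      return mu_re, (mu_re - 1.96 * se, mu_re + 1.96 * se)

  # Hypothetical per-country correlations and sampling variances.
  print(dersimonian_laird([0.08, 0.04, 0.06, 0.02],
                          [0.001, 0.002, 0.0015, 0.001]))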

First Mention

Text: "Conservatives shared more false claims than liberals"

Context: This is the title of Figure 2, which explores the relationship between political leaning and sharing misinformation on Twitter and in a cross-national survey about COVID-19.

Relevance: This figure directly supports the paper's central argument by demonstrating the behavioral asymmetry in misinformation sharing between conservatives and liberals. This difference in behavior is crucial for understanding how politically neutral enforcement policies can still lead to disparate outcomes.

Critique
Visual Aspects
  • The violin plots could be improved by adding a box plot inside to clearly show the median and quartiles of the distribution.
  • The country labels in the tables (c) and (d) are small and difficult to read. Increasing the font size or using abbreviations would improve readability.
  • The use of different visual representations (violin plots and tables) within the same figure can be slightly disorienting. Consider presenting the correlation data in (c) and (d) as scatter plots or bar charts for better visual consistency.
Analytical Aspects
  • The figure caption could be more explicit about the statistical tests used to compare the distributions in (a) and (b). For example, specifying the use of a Mann-Whitney U test or similar would be helpful.
  • The tables in (c) and (d) present correlation coefficients, but the caption doesn't explain how these correlations were calculated (e.g., Pearson, Spearman).
  • Providing the exact p-values for the correlations in the tables (c) and (d) would strengthen the analysis.
Numeric Data
  • Correlation between conservatism and fraction of shared news that is false (fact-checker ratings): 0.06
  • Correlation between conservatism and fraction of shared news that is false (layperson ratings): 0.05
Extended Data Figure 1

This figure compares the distribution of low-quality news site sharing scores between Twitter users who used Trump hashtags and those who used Biden hashtags. It uses four separate density plots, each representing a different news quality rating set: Lasser et al. (2022), Media Bias/Fact Check, Ad Fontes Media, and Republican-Only Layperson ratings. The x-axes show standardized (z-scored) low-quality news sharing scores, where higher scores indicate lower-quality sharing. The y-axes represent the relative frequency of each score within each group. The consistent rightward shift of the Trump hashtag user distribution in all four plots indicates that these users shared lower-quality news regardless of the rating system used.
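
One way to quantify the rightward shift visible in these plots is a standardized mean difference (Cohen's d, as the critique below also suggests); the z-scores here are synthetic:

  import numpy as np

  def cohens_d(a, b):
      # Standardized mean difference using the pooled standard deviation.
      a, b = np.asarray(a), np.asarray(b)
      pooled_var = ((len(a) - 1) * a.var(ddof=1) +
                    (len(b) - 1) * b.var(ddof=1)) / (len(a) + len(b) - 2)
      return (a.mean() - b.mean()) / np.sqrt(pooled_var)

  rng = np.random.default_rng(5)
  trump_z = rng.normal(0.4, 1.0, 3000)    # synthetic z-scored sharing scores
  biden_z = rng.normal(-0.3, 1.0, 3000)
  print(f"Cohen's d = {cohens_d(trump_z, biden_z):.2f}")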

First Mention

Text: "Extended Data Fig. 1"

Context: Mentioned on page 2 in the context of comparing the average quality of domains shared by people who used Trump hashtags versus Biden hashtags, using various quality rating sources.

Relevance: This figure strengthens the main finding by showing that the observed difference in low-quality news sharing between Trump and Biden supporters is robust across multiple independent news quality rating systems. This robustness helps rule out the possibility that the results are driven by biases in any single rating system.

Critique
Visual Aspects
  • The overlapping density plots can make it difficult to discern the magnitude of the difference between the two groups. Consider using semi-transparent colors or offsetting the plots slightly to improve visibility.
  • The x-axis labels could be more descriptive. Instead of just 'Lasser et al. 2022 Rating', specify the type of rating (e.g., 'Lasser et al. Accuracy Rating').
  • Adding a small table summarizing the key statistics (e.g., means, standard deviations, effect sizes) for each rating system would enhance the figure's informativeness.
Analytical Aspects
  • While the figure shows a clear visual difference, it would be beneficial to quantify the difference between the groups for each rating system. Report effect sizes (e.g., Cohen's d) to provide a measure of the magnitude of the difference.
  • The caption mentions z-scoring, but it would be helpful to clarify whether the scores were standardized within each rating system or across all rating systems.
  • Consider performing statistical tests (e.g., t-tests) to formally compare the distributions for each rating system and report the p-values.

Methods

Overview

This section details the methodology used in the 2020 election Twitter suspension study. Researchers collected tweets from users who used either #Trump2020 or #VoteBidenHarris2020, focusing on those who shared links to news domains. They used various methods to assess news quality, including ratings from professional fact-checkers and politically balanced groups of laypeople. User political orientation was determined through hashtag usage, followed accounts, and shared news sites. Finally, the researchers simulated politically neutral suspension policies based on low-quality news sharing and bot likelihood to assess potential disparate impacts.
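
One plausible reading of the hashtag-based classification step, sketched below; the tag sets and tie handling are assumptions, not the authors' exact rule:

  # Label a user by whichever campaign hashtag dominates their tweets.
  TRUMP_TAGS = {"#trump2020"}
  BIDEN_TAGS = {"#votebidenharris2020"}

  def classify_user(hashtags):
      trump = sum(tag.lower() in TRUMP_TAGS for tag in hashtags)
      biden = sum(tag.lower() in BIDEN_TAGS for tag in hashtags)
      if trump > biden:
          return "Trump"
      if biden > trump:
          return "Biden"
      return "Unclassified"

  print(classify_user(["#Trump2020", "#MAGA", "#Trump2020"]))  # -> Trump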

Non-Text Elements

Extended Data Figure 3

This figure investigates whether there are specific topics for which liberals share more misinformation than conservatives on Twitter. It presents two forest plots. The top plot uses fact-checker ratings of falsity, while the bottom uses ratings from politically balanced crowds. Each plot shows the coefficient from a linear regression predicting the log10 (number of misinformation shares + 1) for each topic (US Politics, Social Issues, COVID-19, Business/Economy, Foreign Affairs, Crime/Justice). A positive coefficient indicates that conservatives shared more, while a negative coefficient would indicate that liberals shared more. The plots also include an overall estimate across all topics and the weight (%) each topic contributes to the overall analysis. The error bars represent 95% confidence intervals. Importantly, no topic's confidence interval lies entirely below zero, and the overall estimates are positive, indicating no evidence of any topic for which liberals share more misinformation.
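
Each per-topic coefficient comes from a regression of roughly this form; the simulated data and effect size below are illustrative only:

  import numpy as np
  import statsmodels.api as sm

  rng = np.random.default_rng(6)
  n = 500
  conservatism = rng.normal(size=n)
  # Toy outcome: conservatives share slightly more misinformation.
  shares = rng.poisson(np.exp(0.3 * conservatism))
  log_shares = np.log10(shares + 1)

  model = sm.OLS(log_shares, sm.add_constant(conservatism)).fit()
  coef, (lo, hi) = model.params[1], model.conf_int()[1]
  print(f"coefficient = {coef:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")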

First Mention

Text: "For methodological details, see the Methods; for further analyses, see Supplementary Information section 3.6 and Extended Data Fig. 3."

Context: This is mentioned in the context of analyzing sharing of URLs deemed inaccurate by fact-checkers or politically balanced layperson ratings, estimating user ideology based on followed accounts, and finding conservatives shared more inaccurate URLs.

Relevance: This figure is relevant as it addresses a potential counter-argument: are there any topics where liberals share more misinformation? The findings reinforce the overall trend of greater misinformation sharing by conservatives, showing this pattern holds across various topics and isn't reversed for any specific subject.

Critique
Visual Aspects
  • The figure clearly presents the data using forest plots, which are appropriate for displaying effect sizes and confidence intervals from multiple regressions.
  • The color scheme and labeling are clear and easy to understand.
  • The inclusion of weights for each topic is helpful for understanding their contribution to the overall analysis.
Analytical Aspects
  • The figure effectively uses meta-analysis to combine the results across different topics, providing a more robust overall estimate.
  • The caption clearly explains the meaning of the coefficients and confidence intervals.
  • The analysis could be strengthened by providing the exact p-values for each topic and the overall estimate.

Extended Data Figures

Overview

This section contains supplementary figures that provide additional context and support for the findings discussed in the main text of the research paper. These figures offer further details and robustness checks related to the relationship between political affiliation, news quality, and Twitter suspensions.

Extended Data Table

Overview

This supplementary table provides further details on the regression analyses predicting Twitter account suspensions. It expands on the findings presented in the main text, showing the results of different regression models (Probit, Probit Ridge, Logit, and Logit Ridge) and including a wider range of control variables.

Reporting Summary

Overview

This reporting summary outlines the statistical methods, software, data availability, and ethical considerations of the study. It confirms adherence to Nature Portfolio's reporting standards for reproducibility and transparency. The summary details the software used for data collection and analysis (Python, R, and STATA), affirms that the data necessary for reproducing the results is available online, and addresses the study's focus on publicly available social media data, which did not require ethical approval beyond MIT's observational study protocol.
