Phenotypic divergence between individuals with self-reported autistic traits and clinically ascertained autism

Section Analysis

Abstract

Key Aspects

Research Question and Study Design: The study investigates the validity of using online self-report measures for autism spectrum disorder (ASD) research. It compares a clinically ascertained ASD group, recruited in person and evaluated by clinicians, with two groups recruited online through Prolific: one with high self-reported autistic traits and one with low self-reported autistic traits. The aim is to determine if online samples based on self-report are comparable to clinically diagnosed samples.
Key Findings: Trait Differences: The study found that despite similar levels of self-reported autistic traits, the online high-trait group reported significantly higher levels of social anxiety and avoidant symptoms compared to the in-person ASD group. Furthermore, within the in-person ASD group, there was no correlation between self-reported and clinician-rated autistic traits. This suggests that self-report and clinical assessments may capture different aspects of the ASD phenotype.
Key Findings: Behavioral Differences: The study also examined social behavior using two decision-making tasks. The in-person ASD group showed differences in social tendencies, being less perceptive of social influence opportunities and less affiliative towards virtual characters compared to the online groups. These behavioral differences further highlight the divergence between the clinically ascertained sample and the online sample defined by self-reported traits.
Conclusion and Implications: The main conclusion of the study is that there is a need to differentiate between clinically ascertained and trait-defined samples in autism research. The findings suggest that relying solely on self-reported traits in online studies may not accurately identify individuals with ASD and may lead to misleading results, particularly in the context of social interaction research. This has significant implications for the design and interpretation of online studies in the field of autism.

Strengths

Clear Research Question
The abstract clearly states the research question, comparing in-person recruited, clinically-assessed individuals with autism to online-recruited individuals with high and low autistic traits.

"Here we compared 56 adults with ASD recruited in person and evaluated by clinicians to matched samples of adults recruited through an online platform (Prolific; 56 with high autistic traits and 56 with low autistic traits) and evaluated via self-reported surveys." (Page 1)
Concise Summary of Findings
The abstract concisely summarizes the key findings, highlighting the differences between the groups in social anxiety, avoidant symptoms, and social tendencies during decision-making tasks.

"Despite having comparable self-reported autistic traits, the online high-trait group reported significantly more social anxiety and avoidant symptoms than in-person ASD participants...The groups also differed in their social tendencies during two decision-making tasks; the in-person ASD group was less perceptive of opportunities for social influence and acted less affiliative toward virtual characters." (Page 1)
Clear Conclusion and Implication
The abstract succinctly states the main conclusion and its implication for autism research, emphasizing the need for differentiation between clinically ascertained and trait-defined samples.

"These findings highlight the need for a differentiation between clinically ascertained and trait-defined samples in autism research." (Page 1)

Suggestions for Improvement

Specify Self-Report Measures
High impact. This would enhance the reader's understanding of the study's scope and the specific context of the findings. It is appropriate for the abstract as it provides a concise overview of the methods used.

"Here we compared 56 adults with ASD recruited in person and evaluated by clinicians to matched samples of adults recruited through an online platform (Prolific; 56 with high autistic traits and 56 with low autistic traits) and evaluated via self-reported surveys." (Page 1)

Implementation: Include a brief phrase indicating the types of self-report surveys used (e.g., '...evaluated via self-reported surveys on autistic traits, social anxiety, and avoidant personality.').
Include Finding on Self- vs. Clinician-Rated Traits
Medium impact. This would make the abstract a more complete overview of the study's findings and improve reader understanding. It is appropriate for the abstract because that is where readers look for a quick summary of the findings.

"Within the in-person sample, there was no relationship between self-rated and clinician-rated autistic traits, suggesting they may capture different aspects of ASD." (Page 1)

Implementation: Add a sentence briefly summarizing the finding regarding the lack of relationship between self-rated and clinician-rated autistic traits within the in-person sample. For example: 'Notably, within the clinically-assessed ASD group, self-reported and clinician-rated autistic traits were not correlated.'

Introduction

Key Aspects

Online Data Collection in Research: The introduction establishes the growing trend of using online platforms like Amazon Mechanical Turk and Prolific for data collection in human-participant research. It highlights the benefits, such as rapid recruitment and large sample sizes, but also raises concerns about data quality and validity, particularly in studies of psychiatric and neurodevelopmental disorders.
Limitations of Self-Report in ASD Research: The introduction focuses on the limitations of relying solely on self-report surveys for identifying individuals with autism spectrum disorder (ASD) in online research. It points out that many psychiatric conditions, including ASD, are characterized by altered insight and metacognitive awareness, and that many psychiatric surveys lack diagnostic specificity in the general population.
Self-Report vs. External Assessment in ASD: The introduction discusses the potential discrepancies between self-reported and externally assessed traits in individuals with ASD, relating these discrepancies to core socioemotional symptoms, including differences in insight and theory of mind (ToM). It suggests that self-reports may not provide a complete picture of an individual's behaviors and social functioning as perceived by others.
Differential Meaning of Self-Reported Traits: The introduction introduces the concept that self-reported traits may have different meanings in individuals with and without a clinical ASD diagnosis. It suggests that high self-reported autistic traits in individuals without an ASD diagnosis may reflect social anxiety rather than core ASD features, while individuals with ASD may underreport their social difficulties due to impaired insight.
Study Objective and Focus: The introduction clearly states the study's objective: to systematically examine trait- and sociocognitive-level similarities and differences between adults with high autistic traits recruited online via self-report and adults with ASD defined via in-person clinical characterization. It also highlights the focus on social behavior, given that social differences are a core feature of ASD.
Study Hypothesis: The introduction presents the study's hypothesis: that individuals recruited online who self-report high autistic traits would show distinct social interaction tendencies compared to a clinically defined in-person ASD sample. This hypothesis underscores the potential limitations of using self-reported traits alone to identify adults with ASD in online studies.

Strengths

Clear Motivation
The introduction clearly establishes the motivation for the study by highlighting the increasing use of online platforms for data collection in human-participant research and the associated concerns about data quality and validity, particularly in the context of neuropsychiatric disorders.

"Online platforms such as Amazon Mechanical Turk and Prolific have become increasingly popular for data collection in human-participant research1,2. Through such platforms, researchers can rapidly collect data from hundreds or thousands of participants, allowing for better-powered and more-diverse samples than traditional laboratory data collection. Despite the many benefits of online research, there are also considerable concerns about the quality and validity of such data3." (Page 1)
Problem Statement: Self-Report Limitations
The introduction effectively introduces the core problem of relying on self-report measures for autism spectrum disorder (ASD) research in online settings, where clinical characterizations are typically absent. It points out the potential limitations of self-report due to altered insight and diagnostic specificity issues.

"For studies on psychiatric and neurodevelopmental disorders, current online research also relies primarily on self-report surveys to capture traits and diagnoses. Many of these conditions are characterized by altered insight and/or metacognitive awareness6–8, and many psychiatric surveys lack diagnostic specificity in the general population9,10. Therefore, self-report alone may not be the most accurate way to identify individuals with certain diagnoses or assess observable behavior in certain domains." (Page 1)
Relevance to ASD
The introduction logically connects the general problem of online research validity to the specific context of ASD, highlighting the known discrepancies between self- and caregiver-reported traits and the potential impact of altered theory of mind (ToM) on self-report accuracy in individuals with ASD.

"In autistic individuals, discrepancies between self-reported and externally assessed traits may relate to core socioemotional symptoms, including differences in insight and theory of mind (ToM)." (Page 2)
Clear Objective
The introduction clearly states the study's objective, which is to systematically examine the differences between adults with high autistic traits recruited online via self-report and adults with ASD defined via in-person clinical characterization.

"In this study, we sought to systematically examine trait- and sociocognitive-level similarities and differences between adults with high autistic traits recruited online via self-report and adults with ASD defined via in-person clinical characterization." (Page 2)
Overview of Approach and Hypothesis
The introduction provides a concise overview of the study's approach, mentioning the use of dynamic social interaction tasks to compare the online and in-person samples. It also states the hypothesis.

"As social differences are a core feature of ASD, we chose to compare the online and in-person samples on their behavior during dynamic social interaction tasks... We hypothesized that individuals recruited online who self-reported high autistic traits would show distinct social interaction tendencies compared with a clinically defined in-person ASD sample." (Page 2)

Suggestions for Improvement

Include ASD Prevalence and Importance
Medium impact. This would strengthen the rationale for the study by providing context. This belongs in the introduction to set the stage for the research.

"Autism spectrum disorder (ASD) research is one example for which the clinical validity of such an approach remains elusive." (Page 1)

Implementation: Include a sentence or two summarizing the prevalence of ASD and the importance of accurate diagnosis and research in this area. For example: 'Given the increasing prevalence of ASD and the significant impact it has on individuals' lives, rigorous research is crucial for developing effective interventions and supports. However, the challenges of traditional, in-person research have led to a growing reliance on online data collection methods.'
Elaborate on Social Interaction Tasks
Low impact. This would help readers understand the specific aspects of social interaction being investigated. This belongs in the introduction to provide a more complete picture of the research focus.

"As social differences are a core feature of ASD, we chose to compare the online and in-person samples on their behavior during dynamic social interaction tasks." (Page 2)

Implementation: Expand slightly on the description of the dynamic social interaction tasks, providing a brief, non-technical explanation of what they involve. For example: 'These tasks involved simulated social scenarios where participants made decisions that influenced their relationships with virtual characters, allowing us to assess their social navigation and influence strategies.'
Improve Transition to Study Objective
High impact. This would improve the flow of the introduction by connecting the limitations of self-report discussed earlier to the study objective. It is important to establish this connection early in the paper.

"In this study, we sought to systematically examine trait- and sociocognitive-level similarities and differences between adults with high autistic traits recruited online via self-report and adults with ASD defined via in-person clinical characterization." (Page 2)

Implementation: Add a transition sentence before the objective statement. For example: 'To address the limitations of online self-report in ASD research, this study directly compares...'

Results

Key Aspects

Group Definitions and Matching: The study defined three groups: a clinically ascertained ASD group (recruited in-person), a high-trait group (recruited online with high self-reported autistic traits), and a low-trait group (recruited online with low self-reported autistic traits). The groups were matched on age and sex/gender.
Differences in Self-Reported Traits: As expected, the three groups differed significantly in self-reported autistic traits as measured by the BAPQ. The high-trait and ASD groups had comparably high BAPQ scores, while the low-trait group had significantly lower scores. However, the high-trait group reported significantly higher levels of social anxiety and avoidant personality disorder (AVPD) symptoms compared to both the ASD and low-trait groups.
Discrepancy Between Self-Report and Clinician Ratings: Within the in-person ASD group, there was no significant relationship between self-reported autistic traits (BAPQ) and clinician-rated autistic traits (ADOS). This finding suggests that self-report and clinician-administered assessments may capture different aspects of the ASD phenotype.
Social Controllability Task Results: In the social controllability task, the ASD group rejected a lower percentage of high offers in the controllable condition compared to the online groups. Furthermore, the ASD group perceived less control in the controllable condition and more control in the uncontrollable condition compared to the online groups. This suggests a reduced ability to exert and perceive social influence in the ASD group.
Social Navigation Task Results: In the social navigation task, the ASD and high-trait groups reported reduced liking of the virtual characters compared to the low-trait group. However, the ASD group acted significantly less affiliative towards the characters than both online groups, indicating a difference in social behavior despite similar subjective ratings.
Relationship Between Traits and Behavior: The relationship between self-reported autistic traits and affiliative behavior in the social navigation task differed by group. Only the ASD group showed a negative correlation between self-reported traits and affiliation tendency, suggesting that higher self-reported autistic traits were associated with less affiliative behavior in the clinically ascertained group.

Strengths

Clear Group Definitions and Demographic Information
The Results section clearly presents the demographic and trait characteristics of the three groups (ASD, high-trait, and low-trait), demonstrating the expected differences in self-reported autistic traits (BAPQ scores) based on the group definitions. It also effectively uses Table 1 to summarize demographic information.

"Participants were enrolled from an online pool consisting of ‘unselected’ adults from the community (via Prolific, n = 502) or had diagnoses of ASD confirmed and enrolled for participation in person (at the Seaver Autism Center in New York City; see Methods for details; n = 56). The online sample was further subdivided into ‘high-trait’ (n = 168) and ‘low-trait’ (n = 121) groups based on their total scores on the BAPQ17. From within each of these groups, 56 age- and sex/gender-matched participants were selected to match the in-person ASD sample. This resulted in three groups with 56 participants each: high-trait, low-trait and ASD. See Table 1 for demographic characteristics of each group." (Page 2)
Comprehensive Assessment of Social Traits
The section effectively presents the findings on social anxiety and avoidant personality disorder (AVPD) symptoms, highlighting the differences between the groups and demonstrating that the high-trait group reported higher levels of these symptoms compared to both the low-trait and ASD groups.

"To evaluate the specificity of self-reported traits to ASD versus other disorders characterized by social differences, we also evaluated social anxiety and avoidant personality disorder (AVPD) symptoms...The groups differed in their social anxiety symptoms (F(2,163) = 59.80, P = 3.33 × 10–20, ηpartial2 = 0.42; Fig. 1b), such that the high-trait group had higher scores (indicating more symptoms) than both the low-trait group...and the ASD group...and the ASD group had higher scores than the low-trait group...Finally, the groups differed in their AVPD traits (F(2,163) = 107.84, P = 1.46 × 10–30, ηpartial2 = 0.57; Fig. 1c)." (Page 2)
Key Finding: Discrepancy Between Self-Report and Clinician Ratings
The section clearly reports the lack of a significant relationship between self-reported ASD traits (BAPQ) and clinician-rated traits (ADOS) within the in-person ASD group, highlighting a key finding that challenges the assumption of agreement between these measures.

"Surprisingly, there was no significant relationship between self-reported ASD traits measured by BAPQ and those rated by clinicians using ADOS (b = 0.025, s.e.m. = 0.02, t(51) = 1.16, P = 0.251, 95% CI [0.0, 1.0], ηpartial2 = 0.01; Fig. 1d)." (Page 2)
Clear Presentation of Social Controllability Results
The section effectively presents the results of the social controllability task, showing that the ASD group rejected a smaller percentage of high offers in the controllable condition and perceived less control, indicating a reduced ability to exert social influence.

"Together, these results suggest that high-trait online participants behaved more similarly to the low-trait online group than to the clinical ASD group during controllable social interactions, whereas the clinical ASD group demonstrated distinctly reduced ability to exert control." (Page 4)
Clear Presentation of Social Navigation Results
The section clearly presents the findings of the social navigation task, showing that the ASD group acted less affiliative with the characters, despite similar subjective ratings of character liking, highlighting a difference in social behavior.

"Compared with the low-trait group, both the high-trait group...and the ASD group...self-reported reduced liking of characters. The high-trait and ASD groups did not differ in their character liking...suggesting comparable subjective experiences...the ASD group acted significantly less affiliative with the characters than both the high-trait group...and the low-trait group...indicating unique behavioral tendencies in the clinically defined sample." (Page 6)
Thorough Statistical Reporting
The results are presented with appropriate statistical details, including F-statistics, p-values, effect sizes, and confidence intervals, allowing for a thorough evaluation of the findings.

"As anticipated owing to how the groups were defined, the three groups differed in their self-reported autistic traits, as measured by BAPQ scores (F(2,163) = 232.86, P = 1.66 × 10–48, ηpartial2 = 0.74; Fig. 1a)." (Page 2)
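
One way a reader can use the reported statistics for evaluation is to check that each effect size is consistent with its F-statistic. The sketch below applies the standard identity for partial eta squared, ηpartial2 = (F × df_effect) / (F × df_effect + df_error), to the three omnibus tests quoted in this section. The function name `partial_eta_squared` is an illustrative choice, and the identity is the textbook ANOVA form; the paper fits mixed-effects models, so this is only a consistency check and not necessarily how the authors computed the values.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# (label, F, df_effect, df_error, reported effect size) as quoted from the paper
reported_tests = [
    ("BAPQ", 232.86, 2, 163, 0.74),
    ("Social anxiety", 59.80, 2, 163, 0.42),
    ("AVPD", 107.84, 2, 163, 0.57),
]

for label, f_stat, df_effect, df_error, stated in reported_tests:
    computed = partial_eta_squared(f_stat, df_effect, df_error)
    print(f"{label}: computed {computed:.2f}, reported {stated:.2f}")
```

For these three tests the recomputed values round to the reported 0.74, 0.42 and 0.57.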

Suggestions for Improvement

Add Subheadings for Clarity
High impact. This would improve the clarity and organization of the Results section. Currently, the section jumps between different types of results (trait comparisons, social behavior tasks) without clear subheadings to guide the reader. This makes it harder to follow the flow of the findings and understand the overall structure of the results.

"Results" (Page 2)

Implementation: Introduce clear subheadings within the Results section to organize the findings. For example: 'Autistic and Other Social Traits', 'Social Controllability Task', 'Social Navigation Task', 'Relationship Between Self-Reported Traits and Behavior'.
Interpret Findings More Explicitly
Medium impact. While the Results section reports the statistical findings, it could benefit from a more explicit interpretation of these findings in the context of the study's hypotheses and research questions. This would help the reader understand the significance of the results and how they relate to the broader aims of the study.

"As anticipated owing to how the groups were defined, the three groups differed in their self-reported autistic traits, as measured by BAPQ scores (F(2,163) = 232.86, P = 1.66 × 10–48, ηpartial2 = 0.74; Fig. 1a)." (Page 2)

Implementation: After presenting the statistical results for each analysis, add a sentence or two briefly interpreting the findings in plain language and relating them back to the study's hypotheses. For example, after reporting the BAPQ differences, state: 'This confirms our initial expectation that the groups would differ in self-reported autistic traits based on their recruitment criteria.'
Provide Brief Task Descriptions
Low impact. This would provide additional context for interpreting the results. The current presentation assumes reader familiarity with the tasks.

"Social controllability. Social controllability, or one’s ability to influence other people, is crucial for achieving optimal behavior during" (Page 3)

Implementation: Briefly describe the core concepts being measured by each task. For example, before presenting the social controllability results, add a sentence like: 'The social controllability task measures participants' ability to influence others' offers in a monetary exchange game, reflecting their capacity to exert social influence.'
Clarify Measures Used
Low impact. This would help readers better understand the specific measures used. It is appropriate for the Results section to briefly introduce the measures before presenting the findings.

"As anticipated owing to how the groups were defined, the three groups differed in their self-reported autistic traits, as measured by BAPQ scores" (Page 2)

Implementation: When first mentioning each measure (BAPQ, ADOS, etc.), briefly state what it measures. For example: '...self-reported autistic traits, as measured by the Broad Autism Phenotype Questionnaire (BAPQ), a self-report instrument assessing autistic traits...' and '...clinician-rated autistic traits measured via the Autism Diagnostic Observation Schedule (ADOS-2), a standardized diagnostic assessment...'
Add Visual Representations of Group Differences
Medium impact. This would help readers visualize the magnitude of the differences between groups. While the statistical values are reported, adding a visual representation would make the results more accessible.

"Fig. 1 | Trait comparisons." (Page 4)

Implementation: Consider adding bar graphs or box plots to visually represent the group differences in key measures (e.g., BAPQ scores, social anxiety scores, rejection rates, affiliation tendencies). Ensure the figures are clearly labeled and referenced in the text.

Non-Text Elements

Table 1 | Group demographic information

Figure/Table Image (Page 3)

First Reference in Text

See Table 1 for demographic characteristics of each group.

Description

Overview: Table 1 presents the demographic information for three groups of participants: a 'Clinical ASD' group, a 'High-trait' group, and a 'Low-trait' group. These groups are compared across several demographic variables to assess the differences in their group compositions.
Sample Sizes: The table includes the sample size ('n') for each group, which is 56 for each of the three groups (Clinical ASD (n=56), High-trait (n=56), Low-trait (n=56)).
Age Statistics: Age is reported as the mean (average) along with the standard deviation (a measure of the spread or variability) for each group. For example, the Clinical ASD group has a mean age of 28.07 years with a standard deviation of 8.53 years. The other groups similarly report their mean age and standard deviation.
Gender Breakdown: Gender is reported as the percentage of women, men, non-binary individuals, and those who did not report their gender in each group. For instance, in the Clinical ASD group, 30.3% are women, 51.8% are men, and 14.3% are non-binary.
Sex Breakdown: Sex, reported as the percentage of female and male participants, is also included. For example, in the Clinical ASD group, 51.8% are female and 48.2% are male.
Ethnicity: Ethnicity is described using percentages for different ethnic groups such as American Indian or Alaska Native, Asian, Black, White, and Other. For example, in the Clinical ASD group, 0% are American Indian or Alaska Native, 10.7% are Asian, 19.7% are Black, 57.1% are White, and 12.5% are Other.
IQ and Cognitive Ability: The table also includes the mean and standard deviation for IQ (Intelligence Quotient) for the Clinical ASD group (Mean: 112.38, Standard Deviation: 16.26) and a measure of Cognitive ability for the High-trait and Low-trait groups (Mean: 7.40, Standard Deviation: 3.42 and Mean: 7.02, Standard Deviation: 3.53, respectively). IQ scores are designed to represent a person's reasoning and problem-solving abilities, with the average score set at 100 and the standard deviation at 15.
Employment Status: Employment status is reported as the percentage of employed and unemployed individuals, with some participants not reporting their employment status. For example, in the Clinical ASD group, 48.2% are employed and 51.8% are unemployed.
Household Income: Household income is categorized into income brackets such as 10-50k, 50-100k, and >100k, with percentages reported for each group. For example, in the Clinical ASD group, 42.9% have a household income between 10-50k.
Education Level: Education level is reported as the percentage of individuals with different levels of education such as graduate school, college, some college, high school, some high school, and no high school. For example, in the Clinical ASD group, 17.9% have a graduate school education.
Statistical Significance: The table includes a column labeled 'Group difference,' indicating whether there were statistically significant differences between the groups for each demographic variable. The statistical significance is determined using Kruskal-Wallis tests (for continuous variables) and chi-squared tests (for categorical variables). A p-value is reported for each test; it gives the probability of observing group differences at least as large as those in the data if the groups did not truly differ, and a p-value less than 0.05 is commonly considered statistically significant.
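
To make the chi-squared test for categorical variables more concrete, here is a minimal sketch of how the test statistic is built from a groups-by-categories table of counts. The function name `chi_squared_statistic` and every count in the example table are hypothetical and are not taken from Table 1; the sketch shows only the statistic and its degrees of freedom, not the p-value.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if group and category were independent
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts: rows are three groups, columns are two categories
hypothetical_counts = [[30, 26], [36, 20], [38, 18]]
statistic, dof = chi_squared_statistic(hypothetical_counts)
print(f"chi-squared = {statistic:.2f}, degrees of freedom = {dof}")
```

In practice a statistics library would also supply the p-value; for example, SciPy provides `scipy.stats.chi2_contingency` for this purpose.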

Scientific Validity

Essential Demographic Information: The table provides essential demographic information, which is crucial for understanding the composition of each study group and assessing the generalizability of the findings. The range of demographic variables is comprehensive and relevant.
Appropriate Statistical Tests: The use of appropriate statistical tests (Kruskal-Wallis and Chi-squared) for different types of variables is methodologically sound. However, the specific post-hoc tests used to determine pairwise group differences following a significant omnibus test are not explicitly stated, which should be clarified.
Lack of Effect Sizes: The inclusion of p-values allows the reader to quickly identify statistically significant group differences. However, reporting effect sizes alongside p-values would provide a more complete picture of the magnitude of these differences.
Inconsistent Cognitive Measures: While the table provides IQ scores for the Clinical ASD group and cognitive ability scores for the other groups, these measures are not directly comparable. Ideally, the same cognitive assessment should be used across all groups to allow for more meaningful comparisons. If this was not possible, justification for the use of different measures should be provided.

Communication

Clear Reference: The reference to Table 1 in the Results section is clear and appropriately placed, guiding the reader to the relevant information about the participant groups.
Potential for Overwhelm: The table provides a comprehensive overview of the demographic characteristics, but the sheer volume of information might overwhelm some readers. Strategic use of bolding or shading could highlight key differences between groups.
Lack of Visual Emphasis on Significance: While the table includes statistical test results (p-values), more visual cues could enhance the immediate understanding of significant group differences. For example, using asterisks to denote statistical significance directly in the table would be beneficial.

Fig. 1 | Trait comparisons. a, The ASD (n = 56 participants) and high-trait...

Full Caption

Fig. 1 | Trait comparisons. a, The ASD (n = 56 participants) and high-trait (HT; n = 56 participants) groups had comparable levels of self-reported autistic traits (measured via BAPQ; two-sided pairwise comparisons using estimated marginal means, with confidence intervals and P values adjusted for multiple comparisons using the Tukey method: t(111) = -0.28, P=0.957, estimated difference = -0.026, 95% CI [-0.25, 0.19], Cohen's d = -0.05; mean ASD: 3.82, mean HT: 3.85, mean low-trait (LT): 2.11).

Figure/Table Image (Page 4)

First Reference in Text

As anticipated owing to how the groups were defined, the three groups differed in their self-reported autistic traits, as measured by BAPQ scores (F(2,163) = 232.86, P = 1.66 × 10–48, ηpartial2 = 0.74; Fig. 1a).

Description

Overall Focus: The caption refers to 'Fig. 1a', which is a component of a larger figure (Fig. 1) that presents trait comparisons across different groups. This specific component focuses on comparing the levels of self-reported autistic traits between an Autism Spectrum Disorder (ASD) group, a High-Trait (HT) group, and a Low-Trait (LT) group.
Measurement Tool: The caption specifies that self-reported autistic traits were measured using the Broad Autism Phenotype Questionnaire (BAPQ). The BAPQ is a questionnaire designed to quantify autistic-like traits in individuals, even if they don't have a formal ASD diagnosis.
Sample Sizes: The number of participants in the ASD and High-Trait groups is explicitly stated as n = 56 for each group, indicating that there were 56 individuals in each of these groups.
Statistical Analysis: The statistical analysis used to compare the groups is described as 'two-sided pairwise comparisons using estimated marginal means, with confidence intervals and P values adjusted for multiple comparisons using the Tukey method.' This means that the researchers compared each pair of groups (ASD vs. HT, ASD vs. LT, HT vs. LT) to see if their means were different, and they used a method (Tukey's) to correct for the fact that they were doing multiple comparisons, which can increase the chance of finding a significant difference just by chance. The 'estimated marginal means' are the average scores for each group, adjusted for any other variables in the model.
Statistical Results: The comparison between the ASD and HT groups is reported as t(111) = -0.28, P=0.957, estimated difference = -0.026, 95% CI [-0.25, 0.19], Cohen's d = -0.05. Here, 't(111) = -0.28' is the t-statistic, the difference between the two group means relative to its standard error, with 111 degrees of freedom (related to the sample size). 'P=0.957' is the Tukey-adjusted p-value: if there were truly no difference between the groups, a difference at least this large would be expected about 95.7% of the time, so the data provide no evidence of a difference. 'Estimated difference = -0.026' is the estimated difference between the two group means. '95% CI [-0.25, 0.19]' is the 95% confidence interval, a range of plausible values for the true difference between the group means; because it includes zero, it is consistent with no difference. 'Cohen's d = -0.05' is a measure of effect size that expresses the difference between the two group means in standard deviation units.
Mean BAPQ Scores: The mean BAPQ scores for each group are also provided: mean ASD: 3.82, mean HT: 3.85, mean low-trait (LT): 2.11. These scores represent the average level of self-reported autistic traits in each group, according to the BAPQ.
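As a rough consistency check on the statistics quoted above, a two-group t-statistic can be converted to an approximate Cohen's d with d ≈ t × sqrt(1/n1 + 1/n2). The sketch below assumes an independent-samples comparison; the reported values come from estimated marginal means in a mixed model, so agreement is only approximate. The function name is hypothetical.

```python
import math


def cohens_d_from_t(t_stat, n1, n2):
    """Approximate Cohen's d from an independent-samples t-statistic.

    Assumption: two independent groups of sizes n1 and n2. This is an
    illustration, not the paper's actual computation.
    """
    return t_stat * math.sqrt(1.0 / n1 + 1.0 / n2)


# ASD vs. HT comparison on the BAPQ: t = -0.28 with 56 participants per group.
d = cohens_d_from_t(-0.28, 56, 56)
print(round(d, 2))  # -0.05, matching the reported Cohen's d
```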

Scientific Validity

Statistical Support: The claim that the ASD and HT groups had comparable levels of autistic traits is supported by the statistical analysis provided, which shows a non-significant difference (P = 0.957).
Appropriate Post-Hoc Testing: The use of appropriate post-hoc tests (Tukey method) for multiple comparisons strengthens the validity of the conclusion, as it controls for the increased risk of Type I error.
Effect Size Provided: Providing the effect size (Cohen's d = -0.05) is valuable, as it quantifies the magnitude of the non-significant difference. A negligible effect size (well below the conventional 0.2 benchmark for a small effect) further supports the conclusion that the groups are similar in their self-reported autistic traits.

Communication

Concise Summary: The caption concisely summarizes the key finding that the ASD and high-trait groups showed comparable levels of self-reported autistic traits. The inclusion of group abbreviations (ASD, HT, LT) aids in quick comprehension.
Potential Overload of Statistical Detail: The caption includes a substantial amount of statistical detail. While comprehensive, this might overwhelm some readers. Moving some of the detailed statistical information (e.g., degrees of freedom) to the main text or a footnote could improve readability.
Effective Cross-Referencing: The reference to 'Fig. 1a' in the Results section is appropriate and helps the reader quickly locate the relevant visual representation of the data.

Fig. 1 | Trait comparisons. b, c, Investigation into traits of other disorders...

Full Caption

Fig. 1 | Trait comparisons. b, c, Investigation into traits of other disorders characterized by social impairment revealed that, compared with both other groups (n = 56 participants each), the high-trait group (n = 56 participants) self-reported a higher level of social anxiety (two-sided mixed-effects model with random intercept for matched pair ID: F(2,163) = 59.80, P = 3.33 × 10⁻²⁰, ηpartial² = 0.42; mean ASD: 35.39, mean HT: 46.43, mean LT: 19.21 (b)) and avoidant personality disorder (AVPD) symptoms (two-sided mixed-effects model with random intercept for

Figure/Table Image (Page 4)

First Reference in Text

The groups differed in their social anxiety symptoms (F(2,163) = 59.80, P = 3.33 × 10⁻²⁰, ηpartial² = 0.42; Fig. 1b), such that the high-trait group had higher scores (indicating more symptoms) than both the low-trait group (t(110) = 10.87, P = 5.72 × 10⁻¹⁴, estimated difference = 27.3, 95% CI [21.4, 33.3], Cohen's d = 2.06) and the ASD

Description

Scientific Validity

Communication

Fig. 1 | Trait comparisons. matched pair ID: F(2,163) = 107.84, P = 1.46 × 10⁻³⁰,...

Full Caption

Fig. 1 | Trait comparisons. matched pair ID: F(2,163) = 107.84, P = 1.46 × 10⁻³⁰, ηpartial² = 0.57; mean ASD: 20.09, mean HT: 23.80, mean LT: 11.36 (c)).

Figure/Table Image (Page 4)

First Reference in Text

The pairwise group differences for AVPD traits follow the same pattern as social anxiety: the high-trait group had higher scores (indicating more symptoms) than both the low-trait group (t(110) = 14.58, P = 2.27 × 10⁻¹⁴, estimated difference = 12.50, 95% CI [10.42, 14.58], Cohen's d = 2.70) and the ASD group (t(111) = -4.18,

Description

Overall Focus: The caption describes part of Figure 1, specifically component '(c)', which presents results related to Avoidant Personality Disorder (AVPD) symptoms across three groups: ASD, High-Trait (HT), and Low-Trait (LT). The sample size for each group is implicitly stated as n = 56, as it is mentioned in prior captions for Figure 1.
Statistical Analysis: A mixed-effects model with a random intercept for matched pair ID was used for the analysis. A mixed-effects model is a statistical technique that allows for the analysis of data with both fixed effects (effects that are of direct interest) and random effects (effects that account for variability in the data). In this case, the random intercept for matched pair ID controls for the fact that some participants were matched, meaning their data points might be more similar to each other than to data points from other participants.
Statistical Results: The results of the mixed-effects model are presented as F(2,163) = 107.84, P = 1.46 × 10⁻³⁰, and ηpartial² = 0.57. 'F(2,163) = 107.84' refers to the F-statistic, a measure of the variance between group means relative to the variance within groups, with 2 and 163 degrees of freedom. 'P = 1.46 × 10⁻³⁰' is the p-value, indicating extremely strong statistical significance. 'ηpartial² = 0.57' is partial eta-squared, a measure of effect size, indicating the proportion of variance in AVPD symptoms that is explained by group membership, after controlling for other variables.
Mean AVPD Scores: The mean AVPD scores for each group are provided: mean ASD: 20.09, mean HT: 23.80, mean LT: 11.36. These scores represent the average level of self-reported AVPD symptoms in each group.
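The reported effect sizes can be cross-checked against the F-statistics with the standard conversion: partial eta-squared = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to the two omnibus tests quoted for Fig. 1b and 1c. The function name is hypothetical, and for a mixed-effects model the conversion is an approximation rather than necessarily the authors' own computation.

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta-squared recovered from an F-statistic and its degrees of freedom."""
    explained = f_stat * df_effect
    return explained / (explained + df_error)


# AVPD symptoms (Fig. 1c): F(2,163) = 107.84
print(round(partial_eta_squared(107.84, 2, 163), 2))  # 0.57

# Social anxiety (Fig. 1b): F(2,163) = 59.80
print(round(partial_eta_squared(59.80, 2, 163), 2))  # 0.42
```

Both values agree with the effect sizes reported in the captions.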

Scientific Validity

Appropriate Statistical Model: The use of a mixed-effects model with a random intercept for matched pair ID is appropriate, given the study design with matched participants. This approach accounts for the non-independence of observations within matched pairs.
Strong Statistical Evidence: The reported F-statistic, p-value, and partial eta-squared provide strong evidence for a significant group difference in AVPD symptoms. The effect size (partial eta-squared = 0.57) indicates a large effect.
Incomplete Pairwise Comparison Details: The reference text correctly points out that the pairwise group differences follow a similar pattern to social anxiety. However, the statistical details of these pairwise comparisons are not fully provided in the caption, requiring the reader to consult the main text.

Communication

Clear Summary: The caption effectively summarizes that the figure component (c) displays the results of a mixed-effects model analysis for Avoidant Personality Disorder (AVPD) traits, including the F-statistic, p-value, and partial eta-squared. The group means are clearly presented, which allows for a quick comparison of AVPD symptoms across the groups.
Assumed Knowledge: The use of abbreviations is consistent with prior captions, which aids in reader comprehension. However, the caption assumes that the reader is familiar with the meaning and interpretation of mixed-effects models and associated statistics.
Missing Key Finding: The caption could be improved by briefly stating the key finding (e.g., 'High-trait group reported highest AVPD symptoms') to provide more context.

Fig. 1 | Trait comparisons. d, In the in-person ASD group (n = 56...

Full Caption

Fig. 1 | Trait comparisons. d, In the in-person ASD group (n = 56 participants), there was no relationship between clinician-rated autistic traits measured via ADOS (mean = 13.85) and self-reported autistic traits measured via BAPQ (two-sided general linear model: b = 0.025, s.e.m. = 0.02, t(51) = 1.16, P = 0.251, 95% CI [-0.018, 0.067], ηpartial² = 0.01).

Figure/Table Image (Page 4)

First Reference in Text

In addition to the self-report measures, in-person participants completed the Autism Diagnostic Observation Schedule (ADOS-2; module 4)28, considered the 'gold standard' clinical assessment measure for ASD.

Description

Overall Focus: The caption describes part of Figure 1, specifically component '(d)', which focuses on the relationship between two different measures of autistic traits within the in-person Autism Spectrum Disorder (ASD) group. The number of participants in this group is stated as n = 56.
Clinician-Rated Measure: The caption specifies that clinician-rated autistic traits were measured using the Autism Diagnostic Observation Schedule (ADOS). The ADOS is a semi-structured, standardized assessment used to diagnose autism, administered by trained clinicians.
Self-Report Measure: The caption also specifies that self-reported autistic traits were measured using the Broad Autism Phenotype Questionnaire (BAPQ). The BAPQ is a self-report questionnaire that assesses autistic-like traits in individuals.
Statistical Analysis: The statistical analysis used to assess the relationship between the ADOS and BAPQ scores is described as a 'two-sided general linear model'. A general linear model is a statistical technique used to model the relationship between a dependent variable and one or more independent variables (here, the ADOS and BAPQ scores; the caption does not state which of the two served as the outcome). The 'two-sided' aspect indicates that the researchers were testing for both positive and negative relationships.
Statistical Results: The results of the general linear model are presented as follows: b = 0.025, s.e.m. = 0.02, t(51) = 1.16, P = 0.251, 95% CI [-0.018, 0.067], ηpartial² = 0.01. Here, 'b = 0.025' refers to the unstandardized regression coefficient, representing the change in the outcome score for each one-unit increase in the predictor score. 's.e.m. = 0.02' is reported under the abbreviation for the standard error of the mean; in this regression context it is the standard error of the coefficient b, a measure of the precision of the estimate. 't(51) = 1.16' refers to the t-statistic, used to test the null hypothesis that the regression coefficient is zero, with 51 degrees of freedom. 'P = 0.251' is the p-value, indicating the probability of observing the data (or more extreme data) if there is truly no relationship between the ADOS and BAPQ scores. '95% CI [-0.018, 0.067]' is the 95% confidence interval for the regression coefficient. 'ηpartial² = 0.01' is partial eta-squared, a measure of effect size, indicating the proportion of variance in the outcome that is explained by the predictor, controlling for other variables.
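The reported coefficient, t-statistic, and confidence interval can be checked against one another. The sketch below recovers the standard error as b / t and rebuilds the interval as b ± t_crit × SE. The critical value of about 2.008 for 51 degrees of freedom is taken from standard t tables and is an assumption of this example, as are the function and constant names.

```python
# Approximate two-sided 95% critical value of Student's t with 51 degrees
# of freedom (assumed here from standard tables, not taken from the paper).
T_CRIT_DF51 = 2.008


def ci_from_b_and_t(b, t_stat, t_crit):
    """Rebuild a confidence interval from a coefficient and its t-statistic."""
    standard_error = b / t_stat  # because t = b / SE
    margin = t_crit * standard_error
    return b - margin, b + margin


# Fig. 1d: b = 0.025, t(51) = 1.16
low, high = ci_from_b_and_t(0.025, 1.16, T_CRIT_DF51)
print(round(low, 3), round(high, 3))  # about -0.018 and 0.068
```

The result is close to the reported interval of [-0.018, 0.067]; the small gap in the upper bound is what rounding of b and t would produce.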

Scientific Validity

Appropriate Statistical Model: The use of a general linear model is appropriate for assessing the relationship between two continuous variables (ADOS and BAPQ scores).
Sufficient Statistical Information: The reported statistics (b, s.e.m., t, P, CI, ηpartial²) provide sufficient information to evaluate the strength and direction of the relationship between the ADOS and BAPQ scores.
Support for Conclusion: The non-significant p-value (P = 0.251) supports the conclusion that there is no statistically significant relationship between clinician-rated and self-reported autistic traits in this sample. The small effect size (ηpartial² = 0.01) further supports this conclusion.
Unexplained Degrees of Freedom: The test has 51 degrees of freedom, although the group is reported as 56 participants. Residual degrees of freedom equal the number of observations minus the number of estimated parameters, so a simple regression on all 56 participants would give 54. A value of 51 is consistent with covariates in the model, missing data for some participants, or both. The caption does not say which, and clarifying this would be helpful.

Communication

Clear Statement of Key Finding: The caption clearly states the key finding: there was no significant relationship between clinician-rated (ADOS) and self-reported (BAPQ) autistic traits within the in-person ASD group. The inclusion of the mean ADOS score provides context for the overall level of autistic traits in this sample.
Comprehensive Statistical Detail: The caption provides a comprehensive overview of the statistical analysis, including the model used (general linear model) and relevant statistics (b, s.e.m., t, P, CI, Npartial²). However, the sheer volume of statistical information might be overwhelming for some readers.
Assumed Familiarity with Abbreviations: The use of abbreviations (ADOS, BAPQ, s.e.m., CI) is consistent and aids in conciseness, but assumes familiarity with these abbreviations. Briefly defining these abbreviations upon first use in the figure caption could improve accessibility.

Fig. 1 | Trait comparisons. e,f, Broken down by subscales, there was no...

Full Caption

Fig. 1 | Trait comparisons. e,f, Broken down by subscales, there was no agreement in the restricted and repetitive behavior domain (RRB; general linear model: b = 0.12, s.e.m. = 0.06, t(51) = 1.95, P = 0.057, 95% CI [0.0, 1.0], ηpartial² = 0.05 (e)) or the social domain (general linear model: b = 0.05, s.e.m. = 0.04, t(51) = 1.17, P = 0.249, 95% CI [0.0, 1.0], ηpartial² = 0.03 (f)).

Figure/Table Image (Page 4)

First Reference in Text

Broken down by subdomain, there was also no relationship between self- and clinician-rated traits in the restricted and repetitive behavior domain (b = 0.12, s.e.m. = 0.06, t(51) = 1.95, P = 0.057, 95% CI [0.0, 1.0], ηpartial² = 0.05; Fig. 1e) or the social domain (b = 0.05, s.e.m. = 0.04, t(51) = 1.17, P = 0.249, 95% CI

Description

Overall Focus: The caption describes parts of Figure 1, specifically components '(e)' and '(f)', which present results related to the relationship between clinician-rated and self-reported autistic traits, broken down into subscales: Restricted and Repetitive Behavior (RRB) and the Social Domain. The number of participants (n=56) is consistent with previous parts of Figure 1.
Statistical Analysis: The caption indicates that a general linear model was used to assess the relationship between clinician-rated and self-reported traits within each subdomain. As mentioned before, a general linear model is a statistical technique used to model the relationship between a dependent variable and one or more independent variables.
RRB Results: For the Restricted and Repetitive Behavior (RRB) domain, the results of the general linear model are presented as: b = 0.12, s.e.m. = 0.06, t(51) = 1.95, P = 0.057, 95% CI [0.0, 1.0], ηpartial² = 0.05. As before, 'b' is the unstandardized regression coefficient, 's.e.m.' is the standard error of that coefficient, 't(51)' is the t-statistic with 51 degrees of freedom, 'P' is the p-value, '95% CI' is the 95% confidence interval, and 'ηpartial²' is partial eta-squared.
Social Domain Results: For the Social Domain, the results of the general linear model are presented as: b = 0.05, s.e.m. = 0.04, t(51) = 1.17, P = 0.249, 95% CI [0.0, 1.0], ηpartial² = 0.03. The statistics are interpreted as described above.
Non-Significant Results: The p-values for both the RRB domain (P = 0.057) and the Social Domain (P = 0.249) are greater than 0.05, indicating that there is no statistically significant relationship between clinician-rated and self-reported traits in either domain.

Scientific Validity

Appropriate Statistical Model: The use of a general linear model is appropriate for assessing the relationship between two continuous variables within each subdomain.
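Sufficient Statistical Information: The reported statistics provide sufficient information to evaluate the strength and direction of the relationship between the ADOS and BAPQ subscales. One caveat: the identical 95% CI [0.0, 1.0] reported for both subdomains is much wider than the coefficients and standard errors imply (b plus or minus about 2 s.e.m. gives roughly [0.00, 0.24] for RRB and [-0.03, 0.13] for the social domain), so the interval appears to be coarsely rounded or misreported.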
Support for Conclusion: The non-significant p-values for both subdomains support the conclusion that there is no statistically significant relationship between clinician-rated and self-reported autistic traits at the subdomain level.
Near-Significant Trend: The p-value for the RRB domain (P = 0.057) is close to the significance threshold of 0.05. While not statistically significant, it is worth noting that this trend might warrant further investigation with a larger sample size.
Multiple Comparisons: The subdomain analyses use the same participants as the overall analysis in Figure 1d, so three related tests are run on one sample, which raises the familywise Type I error rate. The caption does not state whether any correction for multiple comparisons was applied to these P values. Because both subdomain results are already non-significant, a correction would not change the conclusion, but stating the approach would aid transparency.

Communication

Clear Reinforcement of Key Finding: The caption clearly states that, even when breaking down autistic traits into subscales (restricted and repetitive behavior, and social domain), there was still no significant agreement between clinician-rated and self-reported measures. This reinforces the overall finding of a discrepancy between these two types of assessments.
Need for RRB Definition: The inclusion of the 'RRB' abbreviation is helpful, but defining it earlier in the figure caption or in the main text would improve clarity for readers unfamiliar with this specific terminology.
High Statistical Density: The caption is very dense with statistical information. While comprehensive, it might be overwhelming for some readers. Consider moving some of the detailed statistical information (e.g., s.e.m., CI) to a footnote or the main text.

Fig. 2 | Social controllability. a, As shown in the representative task screen,...

Full Caption

Fig. 2 | Social controllability. a, As shown in the representative task screen, the social control task involved participants accepting or rejecting splits of $20 proposed by members of two virtual teams. b, Participants played the game with two different teams sequentially, the order of which was counterbalanced.

Figure/Table Image (Page 5)

First Reference in Text

To measure social controllability, we used a monetary exchange task 26,27,34,35 modified from the ultimatum game, in which participants decide whether to accept or reject proposed splits of US$20 offered by players from two independent teams (Fig. 2a; see Methods for details).

Description

Overall Focus: The caption describes Figure 2, which focuses on 'Social controllability.' This refers to the ability of an individual to influence the outcomes of social interactions.
Representative Task Screen: Figure 2a shows a representative task screen. This means that the figure includes a visual example of what the participants saw when they were performing the task. The task involved participants making decisions about how to split $20.
Virtual Teams: Participants interacted with two 'virtual teams.' This means that the participants weren't interacting with real people but instead were told they were interacting with members of two different teams. The offers for splitting the $20 came from these virtual teams.
Counterbalancing: Figure 2b indicates that participants played the game with two different teams, one after the other. The order in which they played with the teams was 'counterbalanced'. Counterbalancing is a technique used to control for order effects. In this context, it means that some participants played with Team A first and then Team B, while others played with Team B first and then Team A.
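Counterbalancing of the kind described above can be sketched as an alternating assignment of the two possible team orders. This is a minimal illustration, not the study's actual procedure: the function name, the team labels, and the use of sequential participant indices are all assumptions, and a real study would typically randomize which participants receive which order.

```python
from collections import Counter


def counterbalanced_orders(participant_ids, conditions=("Team A", "Team B")):
    """Alternate the two possible orders so each is used equally often."""
    first, second = conditions
    orders = {}
    for index, participant in enumerate(participant_ids):
        # Even positions get A then B, odd positions get B then A.
        if index % 2 == 0:
            orders[participant] = (first, second)
        else:
            orders[participant] = (second, first)
    return orders


# Hypothetical group of 56 participants, identified by index.
orders = counterbalanced_orders(range(56))
print(Counter(orders.values()))
```

With an even number of participants, each order is assigned to exactly half of the group, so any order effect is spread equally across the two teams.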

Scientific Validity

Established Paradigm: The caption provides a basic description of the task. The reference text elaborates that the task was modified from the ultimatum game, a well-established paradigm in behavioral economics. This provides a basis for understanding the task's validity.
Sound Methodology: The caption mentions that the order of teams was counterbalanced. Counterbalancing is a crucial aspect of the experimental design, as it controls for potential order effects that could confound the results.
Missing Manipulation Details: The caption lacks information about the specific manipulations used to assess social controllability. It would be useful to briefly mention the key manipulation that allowed participants to exert influence over one of the teams (e.g., by rejecting offers).

Communication

Clear Overview: The caption provides a clear, high-level overview of the social controllability task. It effectively introduces the main elements of the task, including the monetary exchange and the presence of two virtual teams.
Lack of Specificity (a): The mention of a 'representative task screen' in (a) is helpful, as it indicates the figure includes a visual depiction of the task interface. However, it does not specify what key information the representative task screen shows.
Importance of Counterbalancing: Stating that the order of teams was counterbalanced is important for understanding the experimental design. This strengthens the claim that there is no systematic bias based on team order.

Fig. 2 | Social controllability. c, All groups (n = 56 participants each)...

Full Caption

Fig. 2 | Social controllability. c, All groups (n = 56 participants each) showed comparable overall rejection rates for both conditions (two-sided mixed-effects model with random intercept for matched pair ID: F(2,281) = 0.77, P = 0.46, ηpartial² = 0.006; mean ASD controllable: 52.4%, mean HT controllable: 55.5%, mean LT controllable: 54.7%, mean ASD uncontrollable: 49.6%, mean HT uncontrollable: 51.9%, mean LT uncontrollable: 48.2%).

Figure/Table Image (Page 5)

First Reference in Text

We found that the three groups showed similar overall rejection rates during the task (F(2,281) = 0.77, P = 0.46, ηpartial² = 0.006; Fig. 2c).

Description

Overall Focus: The caption refers to Figure 2c, which presents the overall rejection rates for three groups of participants (ASD, High-Trait (HT), and Low-Trait (LT)) in two conditions: a 'controllable' condition and an 'uncontrollable' condition. The sample size for each group is stated as n = 56.
Rejection Rate: The rejection rate refers to the percentage of times participants rejected the proposed split of $20 in the monetary exchange task. A higher rejection rate suggests that participants were less willing to accept unfair offers.
Controllable vs. Uncontrollable Conditions: The two conditions, 'controllable' and 'uncontrollable,' refer to whether or not the participants could influence future offers by rejecting current offers. In the controllable condition, rejecting an offer could lead to better offers in the future. In the uncontrollable condition, offers were random.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercept for matched pair ID'. As mentioned before, a mixed-effects model is a statistical technique that allows for the analysis of data with both fixed effects and random effects, and the random intercept accounts for the non-independence of observations within matched pairs.
Statistical Results: The results of the mixed-effects model are presented as: F(2,281) = 0.77, P = 0.46, ηpartial² = 0.006. 'F(2,281) = 0.77' refers to the F-statistic, 'P = 0.46' is the p-value, and 'ηpartial² = 0.006' is partial eta-squared. The p-value is above the significance threshold of 0.05, indicating that there is no statistically significant difference in overall rejection rates among the three groups.
Mean Rejection Rates: The mean rejection rates for each group and condition are provided (e.g., mean ASD controllable: 52.4%). These values represent the average percentage of times participants in each group rejected offers in each condition.

Scientific Validity

Appropriate Statistical Model: The use of a mixed-effects model with a random intercept for matched pair ID is appropriate, given the study design.
Support for Null Finding: The reported statistics provide sufficient information to evaluate the null finding. The non-significant p-value and small effect size support the conclusion that there were no significant differences in overall rejection rates.
Sample Size Considerations: The sample size of 56 participants per group is adequate for detecting moderate to large effect sizes. However, the small effect size observed (ηpartial² = 0.006) suggests that a much larger sample size would be needed to detect any real differences in overall rejection rates, if they exist.

Communication

Clear Statement of Null Finding: The caption clearly states that overall rejection rates were comparable across the three groups in both conditions. This directly addresses a key aspect of the social controllability task.
Helpful Descriptive Statistics: The inclusion of the means for each group and condition provides valuable descriptive information, allowing readers to compare the rejection rates across groups.
Potential for Statistical Overload: The caption provides a comprehensive statistical summary, including the test used, F-statistic, p-value, and effect size. However, the sheer volume of statistical information might overwhelm some readers.

Fig. 2 | Social controllability. d, When rejection rate is broken down by offer...

Full Caption

Fig. 2 | Social controllability. d, When rejection rate is broken down by offer size, we see that the ASD group (n = 56 participants) rejected a lower percentage of high offers than the two online groups (n = 56 participants each) during the controllable condition (two-sided mixed-effects model with random intercept for matched pair ID, P values false discovery rate (FDR)-corrected for multiple

Figure/Table Image (Page 5)

First Reference in Text

Breaking rejection rate down by offer size, we found that, while the groups showed similar rejection rates for low offers (F(2,74) = 0.20, P = 0.82, ηpartial² = 0.005) and medium offers (F(2,162) = 1.67, P = 0.29, ηpartial² = 0.02), the ASD group rejected a smaller percentage of high offers (F(2,122) = 6.12, P = 0.009, ηpartial² = 0.09; Fig. 2d) compared with both low-trait

Description

Overall Focus: The caption describes part of Figure 2, specifically component '(d)', which presents results related to how often participants rejected offers of different sizes (high, medium, low) in the 'controllable' condition of the social controllability task. The focus is on comparing the rejection rates of the ASD group to the online groups.
Controllable Condition: The caption specifies that the analysis focuses on the 'controllable condition'. As a reminder, in the controllable condition, participants could influence future offers by rejecting current offers.
High Offers: The caption highlights that the ASD group rejected a lower percentage of 'high offers' compared to the online groups. The offer sizes were categorized as low, medium, and high. Although the exact monetary values defining each category aren't provided in this caption, it's implied that 'high offers' represent the most advantageous offers for the participant.
Sample Size: The sample size is stated as n = 56 for each group (ASD and the two online groups).
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercept for matched pair ID'. As before, a mixed-effects model is used to account for the non-independence of observations within matched pairs.
FDR Correction: The caption mentions that the p-values are 'false discovery rate (FDR)-corrected for multiple' (the caption is cut off at this point). FDR correction adjusts p-values when many statistical tests are performed, controlling the expected proportion of false positives among the results declared significant.
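The caption does not say which FDR procedure was applied. As an illustration only, the sketch below implements the Benjamini-Hochberg step-up adjustment, which is the most common choice; treating it as the method used here is an assumption. The function name and the example p-values are hypothetical and are not taken from the paper.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical raw p-values for three tests (for example, low, medium, high offers).
raw_p_values = [0.01, 0.04, 0.03]
print(benjamini_hochberg(raw_p_values))
```

Each raw p-value is scaled by the number of tests divided by its rank, and the running minimum keeps a smaller raw p-value from receiving a larger adjusted value than a bigger one.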

Scientific Validity

Appropriate Statistical Model and Correction: The use of a mixed-effects model is appropriate, given the study design. The mention of FDR correction strengthens the validity of the finding.
Incomplete Statistical Information: The caption is cut off, so the full statistical results are not presented. This limits the ability to fully evaluate the strength of the evidence.
Missing Post-Hoc Test Details: The reference text provides additional statistical details. It indicates that there is a statistically significant difference in rejection rates of high offers (F(2,122) = 6.12, P = 0.009, npartial² = 0.09). However, it is not clear from the caption or reference text which specific post-hoc tests were used to determine that the ASD group rejected fewer high offers than *both* online groups.
Adequate Sample Size: The sample size is adequate for detecting moderate to large effect sizes. By Cohen's conventional benchmarks for eta-squared (0.01 small, 0.06 medium, 0.14 large), the reported partial eta-squared of 0.09 corresponds to a medium effect.

Communication

Clear Key Finding: The caption clearly states the key finding: the ASD group rejected high offers less often than the online groups in the controllable condition. This highlights a difference in decision-making based on offer size.
Mention of FDR Correction: The caption mentions that p-values are FDR-corrected, which strengthens the validity of the finding. However, the specific FDR correction method is not provided in the caption.
Incomplete Caption: The caption is incomplete, as it is cut off mid-sentence. This makes it difficult to fully understand the context and scope of the finding.

Fig. 2 | Social controllability. e, Unlike the online groups (n = 56...

Full Caption

Fig. 2 | Social controllability. e, Unlike the online groups (n = 56 participants each), the ASD group (n = 56 participants) did not detect a difference in controllability between the conditions (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,269) = 18.52, P = 2.91 × 10⁻⁸, ηpartial² = 0.12; mean ASD controllable: 45.47, mean HT controllable: 67.45, mean LT controllable: 61.07, mean ASD uncontrollable: 41.79, mean HT uncontrollable: 19.66, mean LT uncontrollable: 24.70).

Figure/Table Image (Page 5)

First Reference in Text

Indeed, we detected a significant group-by-condition interaction on perceived control ratings (F(2,269) = 18.52, P = 2.91 × 10⁻⁸, ηpartial² = 0.12; Fig. 2e).

Description

Overall Focus: The caption refers to Figure 2e, which presents results related to perceived controllability, i.e., how much control participants felt they had in the task. The key finding is that the ASD group's perception differed from that of the online groups.
Lack of Perceived Difference: The caption highlights that the ASD group 'did not detect a difference in controllability between the conditions.' This means that, unlike the online groups, the ASD participants did not perceive a difference in their ability to influence the offers they received, regardless of whether the condition was controllable or uncontrollable.
Sample Size: The sample size is stated as n = 56 for each group.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercepts for matched pair IDs'.
Statistical Results: The results of the mixed-effects model are presented as: F(2,269) = 18.52, P = 2.91 × 10⁻⁸, ηpartial² = 0.12. As before, 'F' is the F-statistic, 'P' is the p-value, and 'ηpartial²' is partial eta-squared. The extremely small p-value indicates a statistically significant group-by-condition interaction: the effect of condition on perceived controllability differs between the groups.
Mean Scores: The mean perceived controllability scores for each group and condition are provided (e.g., mean ASD controllable: 45.47).
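The group means in the caption make the pattern behind the interaction easy to see. The sketch below computes, for each group, the gap in mean perceived controllability between the controllable and uncontrollable conditions. It is descriptive arithmetic on the reported means only, not a reproduction of the mixed-effects model, and the variable and function names are hypothetical.

```python
# Mean perceived controllability ratings as reported in the Fig. 2e caption.
means = {
    "ASD": {"controllable": 45.47, "uncontrollable": 41.79},
    "HT": {"controllable": 67.45, "uncontrollable": 19.66},
    "LT": {"controllable": 61.07, "uncontrollable": 24.70},
}


def condition_gap(group_means):
    """Difference between controllable and uncontrollable mean ratings."""
    gap = group_means["controllable"] - group_means["uncontrollable"]
    return round(gap, 2)


gaps = {group: condition_gap(values) for group, values in means.items()}
print(gaps)  # {'ASD': 3.68, 'HT': 47.79, 'LT': 36.37}
```

The gap is under 4 points in the ASD group but roughly 36 to 48 points in the two online groups, which is the difference in condition effects that an interaction term captures.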

Scientific Validity

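Standard Statistics: The reported statistics from the mixed-effects model (F-statistic, p-value, partial eta-squared) are standard and sufficient to evaluate the omnibus interaction. The caption does not, however, report the within-group comparisons between conditions that underlie the claim about the ASD group.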
Missing Post-Hoc Test Details: The reference text includes additional statistical details, but the specific post-hoc tests used to determine the significant difference between the ASD and low-trait groups are not explicitly stated. This omission limits a comprehensive assessment of the analysis's validity.
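Sample Size and Power: With 56 participants per group and a medium-to-large interaction effect (partial eta-squared = 0.12), the sample appears adequate for detecting this effect, although no formal power analysis is reported in the caption.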

Communication

Clear Statement of Key Finding: The caption clearly states that, unlike the online groups, the ASD group did not perceive a difference in controllability between the two conditions. This is a crucial finding, highlighting a potential difference in the perception of social control.
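Helpful Descriptive Statistics: The inclusion of the means for each group and condition allows for a quick comparison of perceived controllability across groups and conditions. The contrast is particularly noticeable: ratings in the HT and LT groups drop sharply from the controllable to the uncontrollable condition (67.45 to 19.66 and 61.07 to 24.70), whereas ASD ratings barely change (45.47 to 41.79).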
Potential Lack of Context: The caption provides a comprehensive statistical summary. However, some readers may not be familiar with the concept of 'perceived controllability' and might benefit from a brief explanation of what this refers to in the context of the task.

Fig. 3 | Social navigation. a, The social navigation task involved participants...

Full Caption

Fig. 3 | Social navigation. a, The social navigation task involved participants interacting with different characters with the goal of finding a job and a home. At each interaction, participants could choose between two options that affected either the affiliation or power dynamics of the relationship.

Figure/Table Image (Page 6)

First Reference in Text

To evaluate participants' social feelings and actions during dynamic interactions, we utilized the social navigation task36.

Description

Overall Focus: The caption describes Figure 3, which focuses on a 'social navigation task.' This task is designed to assess how people navigate social situations and relationships.
Virtual Characters: The task involved participants interacting with different characters. These characters are not real people but are instead virtual characters presented to the participants.
Task Goal: The participants' goal in the task was to find a job and a home. This provides a specific context for the social interactions.
Affiliation and Power Dynamics: At each interaction, participants had to choose between two options. These options were designed to affect either the affiliation or power dynamics of the relationship with the character. 'Affiliation' refers to how close or friendly the relationship is, while 'power dynamics' refers to the balance of control between the participant and the character.

Scientific Validity

Theoretically Sound: The task has a clear goal and a structured interaction, which are important for experimental validity. The use of affiliation and power dynamics as key dimensions is theoretically grounded in social psychology.
Lack of Task Details: The caption and reference text provide a general description of the task but lack specific details about the task's design, scoring, and validation. More information about the task's psychometric properties and sensitivity would be beneficial.
Use of Pre-Existing Task: The use of a pre-existing task (as indicated by the citation in the reference text) strengthens the validity of the study, as it suggests that the task has been previously validated and used in research.

Communication

Clear and Concise Overview: The caption provides a clear and concise overview of the social navigation task. It effectively conveys the task's goal (finding a job and home) and the nature of the interactions (affecting affiliation or power).
Lack of Specificity: The caption effectively introduces the key concepts of 'affiliation' and 'power dynamics,' which are central to understanding the task's design. However, it doesn't elaborate on how these dynamics were manipulated or measured.
Appropriate Introduction: The reference text appropriately introduces the social navigation task as a method for evaluating social feelings and actions during dynamic interactions.

Fig. 3 | Social navigation. b, Compared with the low-trait group, the...

Full Caption

Fig. 3 | Social navigation. b, Compared with the low-trait group, the high-trait and ASD groups (n = 56 participants each) both reported a reduced liking of the characters in the social navigation task (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,111) = 8.11, P = 0.0005, ηpartial² = 0.13; mean ASD: 51.09, mean HT: 51.98, mean LT: 59.10).

Figure/Table Image (Page 6)

First Reference in Text

We began by investigating participants' subjective feelings toward characters in the task and found that the three groups differed in their ratings of character likability (F(2,111) = 8.11, P = 0.0005, ηpartial² = 0.13; Fig. 3b).

Description

Overall Focus: The caption describes part of Figure 3, specifically component '(b)', which presents results related to how much the participants liked the virtual characters in the social navigation task. The focus is on comparing the liking ratings of the ASD and High-Trait groups to the Low-Trait group.
Reduced Liking: The caption indicates that the high-trait and ASD groups reported a 'reduced liking' of the characters compared to the low-trait group. This suggests that participants in these groups generally felt less positively toward the virtual characters they interacted with in the task.
Sample Size: The sample size is stated as n = 56 for each group.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercepts for matched pair IDs'. As before, a mixed-effects model is used to account for the non-independence of observations within matched pairs.
Statistical Results: The results of the mixed-effects model are presented as: F(2,111) = 8.11, P = 0.0005, ηpartial² = 0.13. 'F(2,111) = 8.11' refers to the F-statistic, 'P = 0.0005' is the p-value, and 'ηpartial² = 0.13' is partial eta-squared. The small P-value indicates a statistically significant difference in character liking between the groups.
Mean Liking Scores: The mean liking scores for each group are provided: mean ASD: 51.09, mean HT: 51.98, mean LT: 59.10. These values represent the average liking ratings for each group.
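
As a rough consistency check on the statistics described above, partial eta-squared can be recovered from an F-statistic and its degrees of freedom using the standard conversion ηpartial² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that conversion to the values reported in the Fig. 3 captions. The function name and dictionary labels are illustrative, and for mixed-effects models the conversion is only an approximation, so this is not necessarily how the authors computed their effect sizes.

```python
def partial_eta_squared(f_stat: float, df_effect: int, df_error: int) -> float:
    """Approximate partial eta-squared from an F statistic and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


# (F, df_effect, df_error) as reported in the Fig. 3 captions.
reported = {
    "3b liking": (8.11, 2, 111),
    "3c affiliation": (17.21, 2, 111),
    "3d power": (1.89, 2, 163),
    "3e liking x trait": (1.76, 2, 155),
    "3f affiliation x trait": (3.42, 2, 160),
}

for panel, (f_stat, df_effect, df_error) in reported.items():
    # Round to two decimals to compare against the captions.
    print(panel, round(partial_eta_squared(f_stat, df_effect, df_error), 2))
```

Under this approximation, each recovered value rounds to the effect size printed in the corresponding caption (0.13, 0.24, 0.02, 0.02, 0.04), which suggests the reported F-statistics and effect sizes are internally consistent.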

Scientific Validity

Appropriate Statistical Model: The use of a mixed-effects model is appropriate, given the study design.
Significant Finding: The reported statistics provide sufficient information to evaluate the finding. The significant p-value and moderate effect size suggest a meaningful difference in character liking between the groups.
Need for Post-Hoc Details: It is important to note that post-hoc analyses are needed to determine which specific groups differed significantly from each other. The caption only states that the high-trait and ASD groups both reported reduced liking *compared to the low-trait group*. It is possible that the high-trait and ASD groups did not differ from each other, which should be explicitly stated.

Communication

Clear Summary of Main Finding: The caption clearly states that both the high-trait and ASD groups reported less liking of the characters compared to the low-trait group. This provides a concise summary of the main finding related to character likability.
Lack of Scale Information: The caption provides the means for each group, which is helpful for comparing the average liking scores. However, it does not provide information about the scale used to measure liking, making it difficult to interpret the magnitude of the differences.
Assumed Familiarity with Abbreviations: The use of abbreviations (ASD, HT, LT) is consistent and aids in conciseness. However, the caption assumes familiarity with these abbreviations.

Fig. 3 | Social navigation. c, Despite having comparable feelings toward...

Full Caption

Fig. 3 | Social navigation. c, Despite having comparable feelings toward characters, the ASD group (n = 56 participants) acted less affiliative than the high-trait group (n = 56 participants; two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,111) = 17.21, P = 3.098 × 10⁻⁷, ηpartial² = 0.24; mean ASD: 0.16, mean HT: 0.30, mean LT: 0.46).

Figure/Table Image (Page 6)

First Reference in Text

A significant three-group difference in affiliation tendency (F(2,111) = 17.21, P = 3.10 × 10⁻⁷, ηpartial² = 0.24; Fig. 3c) revealed that the ASD group acted significantly less affiliative with the characters than both the high-trait group (t(111) = -2.63, P = 0.026, estimated difference = -0.13, 95% CI [-0.25, -0.01], Cohen's d = -0.50) and the low-

Description

Overall Focus: The caption describes part of Figure 3, specifically component '(c)', which presents results related to 'affiliative behavior' in the social navigation task. 'Affiliative behavior' refers to actions that promote social connection and closeness with the virtual characters.
Dissociation of Feelings and Behavior: The caption highlights that, 'Despite having comparable feelings toward characters,' the ASD group acted less affiliative than the high-trait group. This suggests that the ASD group did not translate their feelings of liking into actions that would build stronger relationships with the virtual characters.
Sample Size: The sample size is stated as n = 56 for both the ASD and High-Trait groups.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercepts for matched pair IDs'.
Statistical Results: The results of the mixed-effects model are presented as: F(2,111) = 17.21, P = 3.098 × 10⁻⁷, ηpartial² = 0.24. As before, 'F' is the F-statistic, 'P' is the p-value, and 'ηpartial²' is partial eta-squared. The small P-value indicates a statistically significant difference in affiliative behavior between the groups.
Mean Affiliative Behavior Scores: The mean affiliative behavior scores for each group are provided: mean ASD: 0.16, mean HT: 0.30, mean LT: 0.46.

Scientific Validity

Communication

Effective Highlighting of Dissociation: The caption effectively highlights the contrast between comparable feelings (liking) and differing behaviors (affiliative actions). This emphasizes the dissociation between subjective feelings and objective actions in the ASD group.
Omission of ASD vs. Low-Trait Comparison: The caption focuses on the difference between the ASD and high-trait groups, but the reference text indicates that the ASD group also acted less affiliative than the low-trait group. This omission limits the caption's completeness.
Need for Scale Context: The inclusion of the means for each group helps quantify the differences in affiliative behavior. However, without knowing the scale's range or meaning, the absolute values are difficult to interpret.
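
The pairwise ASD versus high-trait comparison quoted in the reference text (t(111) = -2.63, Cohen's d = -0.50) can be sanity-checked with a common rule of thumb, d ≈ 2t / √df. This shortcut assumes two independent groups of equal size, which is an assumption here: the authors' matched-pair mixed-effects model may derive d differently, and the function name below is illustrative.

```python
import math


def cohens_d_from_t(t_stat: float, df: int) -> float:
    """Approximate Cohen's d from a t statistic, assuming two equal-sized independent groups."""
    return 2.0 * t_stat / math.sqrt(df)


# t statistic and degrees of freedom quoted in the reference text for ASD vs. high-trait.
d = cohens_d_from_t(-2.63, 111)
print(round(d, 2))
```

The approximation lands on roughly -0.50, matching the reported effect size, so the quoted t-statistic and Cohen's d appear mutually consistent.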

Fig. 3 | Social navigation. d, The groups (n = 56 participants each) did not...

Full Caption

Fig. 3 | Social navigation. d, The groups (n = 56 participants each) did not differ in their power tendencies (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,163) = 1.89, P = 0.15, ηpartial² = 0.02; mean ASD: 0.13, mean HT: 0.19, mean LT: 0.09).

Figure/Table Image (Page 6)

First Reference in Text

The groups did not differ in their power tendencies (F(2,163) = 1.89, P = 0.15, ηpartial² = 0.02; Fig. 3d).

Description

Overall Focus: The caption describes part of Figure 3, specifically component '(d)', which presents results related to 'power tendencies' in the social navigation task. 'Power tendencies' refers to the degree to which participants tried to exert control or influence over the virtual characters.
No Group Differences: The caption highlights that the groups 'did not differ in their power tendencies.' This suggests that there were no significant differences between the ASD, High-Trait, and Low-Trait groups in how much they tried to control the interactions with the virtual characters.
Sample Size: The sample size is stated as n = 56 for each group.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercepts for matched pair IDs'.
Statistical Results: The results of the mixed-effects model are presented as: F(2,163) = 1.89, P = 0.15, ηpartial² = 0.02. As before, 'F' is the F-statistic, 'P' is the p-value, and 'ηpartial²' is partial eta-squared. The large P-value indicates that there was no statistically significant difference in power tendencies between the groups.

Scientific Validity

Appropriate Statistical Model: The use of a mixed-effects model is appropriate, given the study design.
Support for Null Finding: The reported statistics provide sufficient information to evaluate the null finding. The non-significant p-value supports the conclusion that there were no significant differences in power tendencies.
Small Effect Size: The small effect size (ηpartial² = 0.02) indicates that any real differences in power tendencies are likely minimal, further supporting the null finding.

Communication

Clear Communication of Null Finding: The caption clearly states that the groups did not differ in their power tendencies. This directly communicates the null finding, which is important for understanding the overall results of the social navigation task.
Helpful Descriptive Statistics: The inclusion of the means for each group provides valuable descriptive information, even though the overall difference was not statistically significant. This allows readers to assess the relative power tendencies of each group.
Lack of Interpretation of Effect Size: The caption provides a comprehensive statistical summary. The interpretation of the results could be enhanced by mentioning the small effect size, which indicates that any real differences in power tendencies are likely minimal.

Fig. 3 | Social navigation. e, No group-by-trait interaction on character...

Full Caption

Fig. 3 | Social navigation. e, No group-by-trait interaction on character liking was detected (two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,155) = 1.76, P = 0.18, ηpartial² = 0.02).

Figure/Table Image (Page 6)

First Reference in Text

There was no group-by-trait interaction on character liking (F(2,155) = 1.76, P = 0.18, ηpartial² = 0.02; Fig. 3e).

Description

Likely Description: The caption is truncated, but it seems to be describing a statistical test related to a group-by-trait interaction on character liking, which is a measure of how much the groups liked the virtual characters.
Statistical Analysis: The statistical test is identified as a 'two-sided mixed-effects model with random intercepts for matched pair IDs'. As previously, this indicates the model being used accounts for non-independence of observations.

Scientific Validity

Communication

Incomplete Caption: The caption is truncated which makes it difficult to determine its communication effectiveness.
Clear Reference Text: The reference text is clear and appears to support the caption's purpose, although this cannot be fully confirmed without the complete caption.

Fig. 3 | Social navigation. f, However, the relationship between affiliative...

Full Caption

Fig. 3 | Social navigation. f, However, the relationship between affiliative behavior and self-reported traits differed by group (n = 56 participants each; two-sided mixed-effects models with random intercepts for matched pair IDs: F(2,160) = 3.42, P = 0.035, ηpartial² = 0.04).

Figure/Table Image (Page 6)

First Reference in Text

Finally, there was a significant group-by-trait interaction on affiliation tendency (F(2,160) = 3.42, P = 0.035, ηpartial² = 0.04; Fig. 3f).

Description

Overall Focus: The caption refers to Figure 3f, which shows that the relationship between 'affiliative behavior' (actions promoting social connection) and 'self-reported traits' (personal characteristics reported by the participants themselves) is not the same across the three groups.
Self-Reported Traits: The caption does not specify which self-reported traits are being considered here, but it refers back to earlier descriptions of the measures used in the study.
Sample Size: The sample size is stated as n = 56 for each group.
Statistical Analysis: The statistical analysis used is described as a 'two-sided mixed-effects model with random intercepts for matched pair IDs.'
Statistical Results: The results of the mixed-effects model are presented as: F(2,160) = 3.42, P = 0.035, ηpartial² = 0.04. The P-value falls below the conventional significance level of 0.05, though only by a modest margin, which is important to note.

Scientific Validity

Small Effect Size and Missing Post-Hoc Analysis Information: While a mixed-effects model is used, the relatively small effect size (partial eta-squared = 0.04) suggests that the interaction explains only a small portion of the variance in affiliative behavior. The statistical significance (p = 0.035) is marginal, and the caption does not describe any of the post hoc tests.
Appropriate Statistical Model but Weak Conclusion: The statistical analysis used is appropriate, given the study design. However, the marginal statistical significance and the lack of post-hoc analyses make it difficult to draw strong conclusions about the nature of the group differences.

Communication

Limited Informativeness: The caption clearly states that the relationship between affiliative behavior and self-reported traits differed by group, which is a crucial finding. However, it does not specify which groups differed or the nature of the differences. This lack of detail limits the caption's informativeness.
Missing Post-Hoc Analysis Information: The inclusion of the F-statistic, p-value, and effect size provides essential statistical information. However, the caption lacks information about which specific statistical comparisons revealed the group differences.

Discussion

Key Aspects

Discrepancy Between Self-Report and Clinician Assessment: The study found a discrepancy between self-reported and clinician-rated autistic traits in the clinically ascertained ASD group. This reinforces previous research indicating that self-report and observational measures may capture different aspects of the ASD phenotype, and highlights the limitations of relying solely on self-report for diagnosis or assessment.
Differences in Social Behavior: The clinically ascertained ASD group, but not the online high-trait group, showed impairments in recognizing opportunities for social influence and reduced affiliation in interactions with virtual characters. This suggests that online samples defined by high self-reported autistic traits may be phenotypically distinct from clinically ascertained ASD samples, particularly in terms of social behavior.
Value of Self-Report: Self-report questionnaires are valuable tools for understanding the subjective experiences, internal distress, and needs of individuals with ASD. They play a crucial role in incorporating the perspectives of autistic individuals into research and challenging potentially inaccurate assumptions about their behaviors.
Altered Introspection and Social Self-Awareness: The observed discrepancies between self- and clinician-rated traits may be related to altered introspection and reduced social self-awareness, which are common in ASD and other conditions like depression and schizophrenia. Individuals with ASD may have difficulty accurately assessing their own social behavior and how it is perceived by others.
Theory of Mind and Social Controllability: The reduced ability to exert social influence and perceive controllability in the ASD group may be linked to impairments in theory of mind (ToM), which is the ability to understand others' mental states and intentions. This may lead to difficulties in distinguishing between random and goal-directed behavior in social interactions.
Importance of Measuring Behavior: The reduced affiliation observed in the ASD group during the social navigation task highlights the importance of measuring behavior in addition to subjective reports. While the ASD and high-trait groups reported similar levels of liking for the virtual characters, their actual behavior differed, suggesting that self-reported feelings may not always translate into comparable social actions.
Social Anxiety and Avoidance in the High-Trait Group: The online high-trait group reported higher levels of social anxiety and avoidant personality disorder (AVPD) symptoms than the in-person ASD group. This suggests that high self-reported autistic traits in the general population, without a clinical ASD diagnosis, may reflect generalized social avoidance and self-consciousness more than autism-specific social difficulties.
Implications for Intervention: The findings have implications for intervention and support, suggesting that interventions should be tailored to individual needs rather than solely on diagnostic labels or self-reported traits. Individuals with high self-reported traits may benefit from interventions targeting self-confidence and anxiety, while those with observable social difficulties may benefit from skills-focused training.
Limitations and Future Directions: The study acknowledges limitations, including the reliance on a single self-report measure (BAPQ), the lack of a direct measure of insight, and the high range of IQ scores in the in-person sample, which may limit the generalizability of the findings. It also suggests future research to explore the use of additional trait measures and self-reported diagnoses in online studies.
Online Research in Psychiatry: The study highlights the need to use online approaches in psychiatry in tandem with, rather than as a replacement for, lab-based research. It emphasizes the importance of avoiding over-generalization of findings based on self-reported traits and suggests alternative strategies for large-scale studies, such as cross-site collaborations and utilizing existing resources.

Strengths

Clear Summary of Main Findings
The Discussion clearly summarizes the main findings of the study, highlighting the lack of agreement between self-rated and clinician-assessed autistic traits and the differences in social behavior between the clinically ascertained ASD group and the online high-trait group.

"Here we sought to investigate the phenotypic similarities and differences between online participants with high self-reported autistic traits and those with an ASD diagnosis confirmed in person via clinician evaluation. We identified a lack of agreement between self-rated and clinician-assessed trait measures, highlighting the need for separate interpretations of each." (Page 7)
Connection to Broader Research Context
The section effectively connects the findings back to the broader issue of using online self-report measures in autism research, cautioning against over-reliance on self-report for identifying diagnostic groups and extrapolating about ASD as a whole.

"These results provide a caution for future online research: when attempting to identify and draw overarching conclusions about certain diagnostic groups, self-reported symptom surveys alone may not be sufficient...Rather than dismiss the importance of self-views, the results provide a caution for the use of self-report alone for defining or extrapolating about a diagnostic group as a whole." (Page 7)
Recognition of the Value of Self-Report
The Discussion acknowledges the importance of self-report questionnaires for understanding subjective experiences and internal distress in individuals with ASD, emphasizing their role in shaping the narrative and challenging assumptions.

"Despite the lack of identified measurement agreement in this study, we do not believe that these results suggest that self-report questionnaires are invalid for ASD research. On the contrary, they are important tools for understanding the subjective experiences, levels of internal distress or wellbeing and needs of people with ASD." (Page 7)
Plausible Explanation for Discrepancies
The section provides a plausible explanation for the observed discrepancies between self- and clinician-rated traits, linking them to altered introspection and reduced social self-awareness, which are known characteristics of ASD and other conditions.

"The discrepancy we detected between self-reported BAPQ and clinician-rated ADOS scores in the in-person ASD group is consistent with previous reports using different measures...Discrepancies between self- and observer-rated traits are not uncommon among individuals with altered introspection...Reduced social self-awareness has been widely reported in ASD40,41 and likely contributes to discrepancies between self- and clinician report." (Page 7)
Interpretation of Social Behavior Differences
The Discussion offers potential explanations for the observed differences in social behavior on the tasks, relating them to theory of mind (ToM) impairments, reduced understanding of social intentions, and impaired affordance perception in the ASD group.

"Such results may stem from reductions in ToM-related understanding of others’ motivations in the clinical ASD group but not the high-trait group...In ASD, impaired ability to predict offers and understand players’ intentions may lead to a lack of distinction between random and non-random (goal-directed) behavior...It is also possible that the reduced perception of controllability seen in ASD is caused by impaired affordance perception..." (Page 7)
Discussion of Subjective Beliefs vs. Behavior
The section discusses the implications of the findings for understanding the relationship between subjective beliefs and social behavior, highlighting the importance of measuring behavior for a comprehensive understanding of trait presentation.

"In the social navigation task, although both the high-trait and ASD groups reported liking the characters less than the low-trait group, only the ASD group was less affiliative with characters during their interactions than other groups. Such results highlight the importance of measuring behavior for achieving a comprehensive understanding of trait presentation." (Page 7)
Consideration of Social Anxiety and AVPD
The Discussion acknowledges the potential role of social anxiety and avoidant personality disorder (AVPD) in the online high-trait group, suggesting that self-reported autistic traits in the general population may reflect generalized social avoidance rather than autism-specific difficulties.

"In our study, the online group with high autistic traits also self-reported heightened levels of social anxiety and AVPD symptoms compared with the in-person ASD group...Thus, it is possible that the online group represents individuals with both subclinical and clinical levels of socially avoidant and/or anxious traits without co-occurring ASD. Self-reported autistic traits in the general population, absent ASD diagnoses, may be reflective of generalized social avoidance and self-consciousness regarding social skills rather than autism-specific social difficulties." (Page 7)
Implications for Intervention and Support
The section discusses the implications of the findings for intervention and support, emphasizing the need to tailor interventions based on individual needs rather than solely on diagnostic labels or self-reported traits.

"An important implication of the distinction between diagnosis and traits is that we must be cautious not to extrapolate about the needs of one group on the basis of the findings from research conducted in the other...This distinction is important because, without it, there is a potential risk of harm (or at least reduced access to benefits) to individuals with ASD who require more behavioral support and access to accommodations..." (Page 8)
Acknowledgment of Limitations
The Discussion appropriately acknowledges the limitations of the study, including the reliance on a single self-report measure (BAPQ), the lack of a direct measure of insight, and the high range of IQ scores in the in-person sample.

"This study should be interpreted with the following limitations in mind. First, we relied on a single self-reported autism trait measure— the BAPQ...Second, as our study does not specifically measure insight, we cannot determine whether the discrepancy between self- and clinician-rated symptoms is directly related to insight differences in ASD...Third, the range of IQ scores in our in-person sample is high, suggesting that our sample may not be representative of the spectrum as a whole." (Page 8)
Suggestions for Future Research
The section concludes by suggesting future research directions, including investigating the use of additional trait measures and self-reported diagnoses in online studies, exploring platform differences in social profiles, and conducting large-scale replications of lab-based studies.

"Given our preliminary findings in the few individuals who self-report an autism diagnosis in our online sample (Supplementary Table 4), future work should investigate whether the use of additional trait measures and/or selection based on self-reported diagnoses in online studies would identify a group that shows behavior more closely aligned with the ASD phenotype...Future online studies recruiting individuals with self-reported diagnosis should also investigate potential platform differences in social profiles." (Page 8)

Suggestions for Improvement

Add Subheadings for Improved Organization
Medium impact. This would improve the flow and coherence of the Discussion section. The current structure jumps between different topics (e.g., interpretation of task results, implications for self-report, limitations) without clear transitions or subheadings. This makes it harder for the reader to follow the main arguments and understand the overall narrative of the discussion.

"Discussion" (Page 7)

Implementation: Introduce subheadings within the Discussion section to organize the different topics being addressed. For example: 'Discrepancies Between Self-Report and Clinician Assessment', 'Interpretation of Social Behavior Findings', 'Implications for Online Autism Research', 'Limitations and Future Directions'.
Balance Discussion of Online Research Limitations and Benefits
Medium impact. This would provide a more balanced and nuanced perspective on the findings. The current Discussion focuses primarily on the limitations of self-report, but it could also more explicitly acknowledge the potential benefits of online research and how it can complement lab-based studies.

"As online research continues to proliferate, we must consider the limitations of online approaches when determining which scientific questions they are best suited to answer...Online research is a powerful tool that will continue to help answer important questions in human-participant research." (Page 8)

Implementation: Include a paragraph or section discussing the potential advantages of online research, such as increased sample size, diversity, and accessibility. Emphasize that the study's findings do not invalidate online research but rather highlight the need for careful consideration of its limitations and the importance of combining it with other approaches.
Connect Findings to Specific ASD Theories and Models
Low impact. This would strengthen the connection between the study's findings and the broader literature. The current Discussion mentions some relevant studies, but it could more explicitly relate the findings to specific theories or models of ASD, such as the theory of mind deficit, the social motivation theory, or the enhanced perceptual functioning account.

"Such results may stem from reductions in ToM-related understanding of others’ motivations in the clinical ASD group but not the high-trait group." (Page 7)

Implementation: Incorporate more specific references to relevant theories and models of ASD when discussing the findings. For example, when discussing the social controllability results, explicitly link them to the theory of mind deficit and how it might explain the observed differences in social influence. When discussing the social navigation results, relate them to the social motivation theory and how it might explain the reduced affiliation in the ASD group.
Include Potential Clinical Implications
Low impact. This would provide a more complete picture of the potential implications of the findings. The current Discussion focuses primarily on research implications, but it could also briefly touch upon potential clinical implications, such as the need for clinicians to consider both self-report and observational measures when assessing individuals with ASD.

"It may be the case that self-reported traits lack diagnostic specificity, especially at subclinical thresholds, whereas clinicians are better able to assess symptoms rising to clinical relevance and to assign them the most parsimonious diagnoses through comprehensive analysis of both observed external behaviors and reported experiences." (Page 7)

Implementation: Add a brief paragraph discussing potential clinical implications of the findings. For example: 'These findings suggest that clinicians should consider both self-report and observational measures when assessing individuals for ASD, as relying solely on self-report may not capture the full range of social and behavioral difficulties. Furthermore, clinicians should be aware that high self-reported autistic traits in individuals without an ASD diagnosis may be indicative of other conditions, such as social anxiety or AVPD, and should tailor their assessments and interventions accordingly.'

Methods

Key Aspects

Participant Recruitment: The study recruited participants both online and in-person. Online participants were recruited through Prolific, with eligibility criteria including age (18-64), US residency, and a >90% approval rating. In-person participants were recruited through local advertisements and screened for ASD at the Seaver Autism Center, with eligibility criteria including age (18-50), meeting ASD criteria, and an IQ > 70.
Clinical Assessment of In-Person Participants: In-person participants underwent a comprehensive clinical assessment for ASD, including the Autism Diagnostic Observation Schedule, Second Edition (ADOS-2), developmental and clinical history, self- and informant-report of symptoms and adaptive behavior, cognitive functioning assessment, and clinical judgment based on DSM-5 criteria. This rigorous assessment was used to confirm ASD diagnoses in the in-person group.
Self-Report Measures: All participants completed several self-report measures. The Broad Autism Phenotype Questionnaire (BAPQ) was used to assess autistic traits. The Liebowitz Social Anxiety Scale (avoidance questions) and the Avoidant Personality Disorder Impairment Scale were used to assess other psychiatric traits. Participants also self-reported any prior clinical diagnoses.
Cognitive Ability Assessment: Cognitive ability was assessed using different measures for in-person and online participants. In-person participants completed either the Wechsler Abbreviated Scale of Intelligence 2nd edition or the Wechsler Intelligence Scale for Adults 4th edition. Online participants completed the 16-item International Cognitive Ability Resource test.
Grouping and Matching Procedure: The online sample was divided into high-trait and low-trait groups based on BAPQ scores. Age- and sex/gender-matched participants were then selected from these groups to match the in-person ASD group, resulting in three groups of 56 participants each. A matching function in R was used to find the nearest-neighbor match for each in-person ASD participant.
Social Controllability Task: The social controllability task was used to investigate how individuals exert control over others to maximize rewards. Participants were paired with virtual players from two teams and had to accept or reject proposed splits of $20. One team was 'controllable,' meaning that participants' choices influenced future offers, while the other team was 'uncontrollable,' with offers randomly sampled from a predetermined distribution.
Social Navigation Task: The social navigation task was a narrative-based game in which participants interacted with virtual characters. Participants made choices between two options that affected either the affiliation or power dynamics of the relationship. Behind the scenes, these choices moved the characters' positions within a two-dimensional social space framed by axes of power and affiliation.
Statistical Analysis: Statistical analyses included general linear models to evaluate relationships between continuous variables and mixed-effects regression models to evaluate group differences and group-by-trait interactions. Post hoc pairwise comparisons were conducted using the emmeans package in R, with appropriate adjustments for multiple comparisons.

Strengths

Detailed Participant Recruitment Description
The Methods section clearly describes the participant recruitment process for both online and in-person samples, providing detailed eligibility criteria and the platforms used (Prolific for online, local advertisements and listservs for in-person).

"Online participants were enrolled in the study as part of a larger online project examining social cognition and mental health. Par- ticipants were recruited from Prolific (www.prolific.co)...with the eligibility criteria of (1) aged between 18 and 64, (2) currently living in the USA and (3) >90% approval rating in Prolific...Over the course of approximately 3 years, 259 individuals were screened in person for inclusion/exclusion by clinical staff at the Seaver Autism Center for Research and Treatment...Participants were recruited through announcements posted on physical flyers around New York City and email listservs with the eligibility criteria of (1) age between 18 and 50, (2) meet criteria for ASD and (3) IQ > 70." (Page 8)
Comprehensive Clinical Assessment Procedures
The section provides a comprehensive description of the clinical assessment procedures used for the in-person ASD group, including the use of the ADOS-2, developmental and clinical history, self- and informant-report, cognitive functioning assessment, and clinical judgment based on DSM-5 criteria.

"Participants were screened for ASD by licensed, research-reliable clinicians using the ADOS-2, developmental and clinical history, self- and informant (for example, parent or roommate) report of symptoms and adap- tive behavior, cognitive functioning and clinical judgment regarding whether the individual meets the criteria for ASD in the Diagnostic and Statistical Manual of Mental Disorders 5th edition60." (Page 8)
Clear Description of Measures
The section clearly outlines the measures used to assess autistic traits (BAPQ), other psychiatric traits (Liebowitz Social Anxiety Scale, Avoidant Personality Disorder Impairment Scale), self-reported clinical diagnoses, and cognitive ability (Wechsler scales for in-person, ICAR for online). The rationale for selecting the BAPQ is also provided.

"To assess levels of autistic traits in the sample, all participants com- pleted the BAPQ. The BAPQ was selected due to its high sensitivity, specificity and test-retest reliability17,49,50...All participants completed additional questionnaires to investigate traits of other psychiatric diagnoses, including Liebow- itz Social Anxiety Scale54 (avoidance questions) and the Avoidant Personality Disorder Impairment Scale55...To assess cognitive ability, all in-person participants completed either the Wechsler Abbreviated Scale of Intelligence 2nd edition57 or the Wechsler Intelligence Scale for Adults 4th edition58...All online participants completed the 16-item Inter- national Cognitive Ability Resource test59." (Page 9)
Clear Grouping Strategy and Matching Procedure
The section describes the grouping strategy, explaining how the online sample was subdivided into high-trait and low-trait groups based on BAPQ scores and how age- and sex/gender-matched participants were selected to match the in-person ASD group. The matching function is also mentioned, with a reference to the code availability.

"The full online sample (n = 502) was subdivided into those who scored above the cut-off score on the BAPQ (high-trait, n = 168) and those who scored in the bottom 25% on the BAPQ (low-trait, n = 121). To mini- mize potential differences between in-person and online samples, we selected age- and sex-/gender-matched participants from within both high- and low-trait online groups to match the in-person ASD group. To do so, we developed a matching function in R (see data and code availability for a link to the function)..." (Page 9)
Detailed Description of Experimental Paradigms
The section provides detailed descriptions of the two experimental paradigms: the social controllability task and the social navigation task. The descriptions include the task goals, procedures, conditions, and how participant choices were measured and analyzed. The use of virtual partners and the manipulation of controllability in the social controllability task are clearly explained.

"Social controllability task. The social controllability task26,27,34 inves- tigates how individuals exert control over others to maximize rewards. Participants were paired with virtual players from two 30-person teams...In each trial, the virtual partner proposed a way to split $20...and the participant had to decide whether to accept or reject the offer...Each team represented a different condition: controllable or uncontrollable...In the controllable condition, participants could either increase the value of the next offer by rejecting the current offer or decrease the value of the next offer by accepting the current offer...Social navigation task. The social navigation task36 is a narrative-based game in which participants interact with a variety of virtual characters...The task consisted of narrative trials...and decision trials, in which the participant had to choose between two ways of interacting with a given character...participants’ choices during the decision trials moved the characters’ positions within a two-dimensional social space framed by axes of power and affiliation." (Page 9)
Appropriate Statistical Analyses
The section clearly specifies the statistical analyses used, including general linear models for continuous variables and mixed-effects regression models for group comparisons and interactions. The use of appropriate statistical packages (lme4, emmeans) and methods for handling multiple comparisons (Tukey's adjustment, FDR correction) is also mentioned.

"For all analyses evaluating the relationships between continuous vari- ables (task or trait), we utilized general linear models. For all analyses evaluating group differences in traits and task performance, or group- by-trait interaction effects on continuous variables, mixed-effects regression models containing a random intercept for each matched pair were conducted using the lme4 package in R...Post hoc pairwise comparisons parsing the direc- tion of group effects were conducted using the emmeans package in R...Significant trait interactions were followed up by two-tailed Pearson correlations to parse the direction of effects in each group, with P values FDR-corrected for multiple comparisons." (Page 10)

Suggestions for Improvement

Provide More Detail on Social Navigation Task Trials
Medium impact. This would improve the reproducibility of the study and allow other researchers to fully understand and replicate the experimental procedures. The Methods section is the appropriate place for this level of detail, as it is where readers expect to find comprehensive information about the study's design and procedures.

"The task consisted of narrative trials, which contained images of characters and narrative-progressing text, and decision trials, in which the participant had to choose between two ways of interacting with a given character." (Page 9)

Implementation: Include more specific details about the social navigation task, such as examples of the narrative trials and decision trials. Provide examples of the pro-affiliative and anti-affiliative options, as well as the pro-power and anti-power options. This could be included in the main text or as supplementary material.
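
For readers unfamiliar with the paradigm, the way decision trials move a character through the two-dimensional social space can be sketched as follows. This is a hypothetical illustration: the unit step size, the origin as starting point, and the names `Position` and `apply_choice` are assumptions, since the quoted passage says only that choices move characters along axes of power and affiliation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Position:
    affiliation: int = 0  # assumed to start at the origin
    power: int = 0


def apply_choice(pos, axis, direction):
    """Move a character one unit along one axis after a decision trial.

    `axis` is "affiliation" or "power"; `direction` is +1 or -1.
    The unit step is an assumption made for illustration.
    """
    if axis not in ("affiliation", "power"):
        raise ValueError(f"unknown axis: {axis}")
    if direction not in (-1, 1):
        raise ValueError("direction must be +1 or -1")
    return replace(pos, **{axis: getattr(pos, axis) + direction})


# Hypothetical sequence of three decision trials with one character.
choices = [("affiliation", 1), ("power", -1), ("affiliation", 1)]
position = Position()
for axis, direction in choices:
    position = apply_choice(position, axis, direction)
```

Published examples of the actual decision options, as suggested above, would let readers map each choice to an axis and a direction without relying on a sketch like this one.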
Justify Choice of Experimental Paradigms
Low impact. This would provide additional context for the study and help readers understand the rationale behind the chosen tasks. While the tasks are described, a brief explanation of *why* these specific tasks were chosen to assess social behavior in ASD would be beneficial. The Methods section is appropriate for this because it sets the stage for the experimental design.

"Experimental paradigms" (Page 9)

Implementation: Add a sentence or two at the beginning of the 'Experimental paradigms' subsection explaining why the social controllability and social navigation tasks were selected. For example: 'To assess social behavior in a controlled and quantifiable manner, we utilized two established experimental paradigms: the social controllability task, which measures participants' ability to exert and perceive social influence, and the social navigation task, which assesses social decision-making in a dynamic, narrative-based context.'
Specify Clinician Qualifications and Training
Low impact. This would enhance the clarity and transparency of the Methods section. While the section mentions that clinicians were 'licensed' and 'research-reliable,' it does not specify their professional backgrounds or the specific training they received for administering the ADOS-2 and making clinical judgments about ASD diagnoses. The Methods section is the correct location for this information, as it is essential for evaluating the rigor of the clinical assessment procedures.

"Participants were screened for ASD by licensed, research-reliable clinicians using the ADOS-2..." (Page 8)

Implementation: Specify the professional backgrounds of the clinicians who screened participants for ASD (e.g., clinical psychologists, psychiatrists) and the specific training they received in administering the ADOS-2 and making clinical judgments about ASD diagnoses (e.g., ADOS-2 research reliability training).
Add Subheadings for Improved Organization
High impact. This would improve the clarity and flow of the Methods section. Currently, the section jumps between different topics (participants, measures, grouping, experimental paradigms, statistics) without clear subheadings to guide the reader. This makes it harder to follow the different aspects of the methodology.

"Methods" (Page 8)

Implementation: Introduce clear subheadings within the Methods section to organize the different components. For example: 'Participants', 'Measures', 'Grouping Procedure', 'Social Controllability Task', 'Social Navigation Task', 'Statistical Analysis'.
Address Ethical Considerations of ASD Diagnosis
Medium impact. This would provide a more complete picture of the study's ethical considerations. While the section mentions IRB approval and informed consent, it does not explicitly address the ethical considerations related to diagnosing participants with ASD during the study, particularly those who were not previously diagnosed. The Methods section is the appropriate place for this information, as it pertains to the ethical conduct of the research.

"The study was approved by the institutional review board at the Icahn School of Medicine at Mount Sinai, and all participants provided informed consent before participation." (Page 8)

Implementation: Add a sentence or two addressing the ethical considerations related to diagnosing participants with ASD during the study. For example: 'Participants who received a first-time ASD diagnosis through this study were provided with information about the diagnosis and resources for support and further evaluation. The potential benefits and risks of receiving a diagnosis were discussed with participants as part of the informed consent process.'
Justify Software Versions
Low impact. This would provide additional context for interpreting the results. The current section mentions the versions of the software used but does not explain why these specific versions were used.

"The task was coded in Psychopy (in-person study, version 2021.1.4) and JavaScript (online study, version ES2019)." (Page 9)

Implementation: Add a brief justification for the versions of PsychoPy and JavaScript used. For example, you could say that these were the most up-to-date stable versions at the time of the study, or that they offered specific features necessary for the experimental design.
