This study presents a large-scale empirical analysis of AI usage across economic tasks, using over four million conversations from Claude.ai mapped to the O*NET database. The analysis reveals that AI usage is primarily concentrated in software development and writing tasks, accounting for nearly half of the observed usage. However, approximately 36% of occupations show AI usage in at least a quarter of their associated tasks, indicating a broader diffusion. The study distinguishes between AI usage for augmentation (57%) and automation (43%), finding a slightly higher prevalence of augmentation. AI usage peaks in occupations with wages in the upper quartile and those requiring considerable preparation (e.g., a bachelor's degree). The study acknowledges limitations, including the data being from a single platform and potential biases in the methodology.
The study provides a novel and valuable contribution to understanding AI usage in the economy by leveraging a large dataset of Claude.ai conversations and mapping them to the O*NET database. The framework allows for granular, task-level analysis and dynamic tracking of AI adoption. However, the study's conclusions are primarily correlational, not causal. The analysis demonstrates associations between AI usage and various factors (occupation, wage, skills), but it cannot definitively determine cause-and-effect relationships. For instance, while AI usage is higher in certain occupations, it's unclear if AI *causes* changes in those occupations or if pre-existing characteristics of those occupations lead to greater AI adoption.
The practical utility of the findings is significant, offering a framework for monitoring AI's evolving role in the economy. The task-level analysis provides valuable insights for businesses, policymakers, and workers seeking to understand and adapt to the changing landscape of work. The findings regarding augmentation versus automation are particularly relevant, suggesting that AI is currently used more as a collaborative tool than a replacement for human labor. However, the study's focus on a single platform (Claude.ai) limits the generalizability of the results to other AI systems and user populations.
The study provides clear guidance for future research, emphasizing the need for longitudinal studies, investigation of causal relationships, and expansion to other AI platforms. It acknowledges key uncertainties, such as the long-term economic impacts of AI adoption and the potential for bias in the data and classification methods. The authors appropriately caution against over-interpreting the findings and highlight the need for ongoing monitoring and analysis.
Critical unanswered questions remain, particularly regarding the causal mechanisms driving AI adoption and its impact on employment and wages. While the study identifies correlations, it cannot determine whether AI usage *causes* changes in occupational structure or productivity. The limitations of the data source (a single AI platform) and the potential for bias in the model-driven classification fundamentally affect the interpretation of the results. While the study provides a valuable snapshot of AI usage, it's crucial to acknowledge that the findings may not be representative of the broader AI landscape or the overall workforce. Further research is needed to address these limitations and to explore the long-term consequences of AI adoption.
The abstract clearly states the research gap: the lack of systematic empirical evidence on AI's actual use in different tasks, despite widespread speculation about its impact.
It concisely summarizes the novel framework and methodology used, highlighting the use of a privacy-preserving system and the O*NET database.
The abstract presents the main findings, including the concentration of AI usage in software development and writing, broader usage across the economy, and the balance between augmentation and automation.
It acknowledges the limitations of the study, providing a balanced perspective.
The abstract concludes by highlighting the significance and potential impact of the research.
This medium-impact improvement enhances the abstract's clarity and impact by making the core contribution more prominent. The abstract currently introduces the 'novel framework' but doesn't immediately and explicitly state *what* that framework enables. This is addressed later, but front-loading the 'what' improves reader comprehension. This belongs in the abstract as it frames the entire study.
Implementation: Add a phrase after introducing the framework that succinctly states its primary capability. For example: '...a novel framework for measuring AI usage patterns across the economy, *allowing for the first large-scale, task-level analysis of AI adoption*. We leverage...'
This high-impact improvement would strengthen the abstract by providing quantitative context to the claims. Adding specific numbers (where available) makes the findings more concrete and impactful. This is crucial for an abstract, which serves as a concise summary of the research.
Implementation: Include specific numbers or ranges where possible. Examples: - Instead of 'over four million conversations', state '4.x million conversations'. - Add a statistic about the number of tasks or occupations analyzed, if feasible within the word limit.
This low-impact improvement would slightly improve the abstract's completeness. While the abstract mentions 'augmentation' and 'automation', it doesn't define them. Although these are common terms, providing brief parenthetical definitions enhances clarity, especially for readers less familiar with the terminology. The abstract is the appropriate location for these concise definitions.
Implementation: Add brief parenthetical definitions after the first use of 'augmentation' and 'automation'. For example: '...augmentation (e.g., learning or iterating on an output) while 43% suggests automation (e.g., fulfilling a request with minimal human involvement).'
The introduction clearly establishes the research gap: the lack of systematic empirical evidence on how AI systems are being integrated into the economy, despite rapid advancements and their potential impact on labor markets.
The introduction effectively introduces the novel framework for measuring AI usage across different tasks in the economy, highlighting the use of privacy-preserving analysis of conversations on Claude.ai and mapping them to the O*NET database.
The introduction concisely presents the five key contributions of the research, providing a clear overview of the study's scope and findings.
Figure 1 provides a visual representation of the framework, effectively illustrating how conversations are mapped to tasks and occupations, and how this approach allows for tracking AI's role in the economy.
This high-impact improvement would significantly strengthen the introduction by providing a more compelling motivation for the research. While the introduction mentions the lack of empirical evidence, it doesn't fully articulate *why* this evidence is crucial for stakeholders like policymakers, businesses, and workers. The introduction is the correct place for this because it sets the stage for the entire paper.
Implementation: Add a sentence or two explicitly stating the importance of understanding AI usage patterns. For example: 'This understanding is critical for policymakers to develop effective labor market strategies, for businesses to make informed investment decisions, and for workers to adapt to the changing demands of the job market.'
This medium-impact improvement would enhance the introduction's clarity and flow by providing a more structured overview of the key contributions. While the contributions are listed, they could be presented in a more cohesive and impactful way. The introduction is the appropriate place for this overview.
Implementation: Instead of just listing the five contributions, briefly introduce them with a sentence like: 'This framework allows us to: (1) Provide the first large-scale...' and then list the contributions with slightly more detail, perhaps combining some related points.
This low-impact improvement would enhance the introduction's completeness by briefly mentioning the limitations of the study. While the abstract acknowledges limitations, doing so in the introduction as well provides a more balanced perspective from the outset. This sets appropriate expectations for the reader.
Implementation: Add a sentence at the end of the introduction acknowledging the limitations. For example: 'While this study provides valuable insights, it is important to note that our data is limited to a single platform and faces certain methodological constraints, which are discussed in detail later in the paper.'
Figure 1: Measuring AI use across the economy. We introduce a framework to measure the amount of AI usage for tasks across the economy. We map conversations from Claude.ai to occupational categories in the U.S. Department of Labor's O*NET Database to surface current usage patterns. Our approach provides an automated, granular, and empirically grounded methodology for tracking AI's evolving role in the economy. (Note: figure contains illustrative conversation examples only.)
Figure 2: Hierarchical breakdown of top six occupational categories by the amount of AI usage in their associated tasks. Each occupational category contains the individual O*NET occupations and tasks with the highest levels of appearance in Claude.ai interactions.
Figure 3: Comparison of occupational representation in Claude.ai usage data and the U.S. economy. Results show the most usage in tasks associated with software development, technical writing, and analytical work, with notably lower usage in tasks associated with occupations requiring physical manipulation or extensive specialized training. U.S. representation is computed as the fraction of workers in each high-level category according to the U.S. Bureau of Labor Statistics [U.S. Bureau of Labor Statistics, 2024].
Figure 7: Distribution of automative behaviors (43%), where users delegate tasks to AI, and augmentative behaviors (57%), where users actively collaborate with AI. Patterns are categorized into five modes of engagement; automative modes include Directive and Feedback Loop, while augmentative modes comprise Task Iteration, Learning, and Validation.
Table 2: Analysis of AI usage across occupational barriers to entry, from Job Zone 1 (minimal preparation required) to Job Zone 5 (extensive preparation required). Relative usage rates are shown against the baseline occupational distribution in the labor market. Usage peaks in Job Zone 4 (which requires considerable preparation, such as a bachelor's degree), with lower usage in zones requiring minimal or extensive preparation.
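The relative usage rates in Table 2 (and the baseline comparison in Figure 3) amount to dividing each category's share of Claude.ai usage by its share of employment. A minimal sketch of that ratio, using invented counts rather than the paper's data:

```python
# Sketch: relative usage rate = (share of Claude.ai usage) / (share of U.S. employment).
# All numbers below are illustrative placeholders, not values from the paper.

usage_counts = {        # conversations mapped to each job zone (hypothetical)
    "Job Zone 1": 1_200,
    "Job Zone 2": 8_500,
    "Job Zone 3": 21_000,
    "Job Zone 4": 52_000,
    "Job Zone 5": 17_300,
}
employment_counts = {   # workers per job zone, e.g. from BLS data (hypothetical)
    "Job Zone 1": 10_000_000,
    "Job Zone 2": 40_000_000,
    "Job Zone 3": 30_000_000,
    "Job Zone 4": 25_000_000,
    "Job Zone 5": 15_000_000,
}

total_usage = sum(usage_counts.values())
total_employment = sum(employment_counts.values())

for zone in usage_counts:
    usage_share = usage_counts[zone] / total_usage
    employment_share = employment_counts[zone] / total_employment
    relative_rate = usage_share / employment_share  # >1 means over-represented in AI usage
    print(f"{zone}: usage share {usage_share:.1%}, "
          f"employment share {employment_share:.1%}, relative rate {relative_rate:.2f}")
```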
The section clearly describes the use of Clio, a privacy-preserving analysis tool, to classify conversations across occupational tasks, skills, and interaction patterns. This tool is central to the methodology and its use is well-justified.
The methodology includes a hierarchical task-level analysis, mapping conversations to the O*NET database. The creation of a hierarchical tree of tasks is a novel approach to handling the large number of unique task statements in O*NET.
The section clearly outlines the data collection period (December 2024 and January 2025) and the data source (one million Claude.ai Free and Pro conversations). This provides transparency about the data used.
The methodology addresses the potential for multiple valid task mappings for a single conversation and notes that qualitatively similar results were observed when mapping a conversation to multiple tasks.
The section effectively uses figures (2, 3, 4, 5, 6, and 7) to visually represent the data and findings, making the information more accessible and easier to understand.
The methodology includes an analysis of occupational skills exhibited in the conversations, using Clio to identify the skills present in Claude's responses. This provides insights into the types of skills AI is being used to demonstrate.
The section analyzes AI usage by wage and barrier to entry, using O*NET data to explore these correlations. This provides a valuable socioeconomic perspective on AI adoption.
The methodology distinguishes between automative and augmentative behaviors, classifying conversations into five collaboration patterns. This provides a nuanced understanding of how AI is being used in different work contexts.
This high-impact improvement would significantly increase the reproducibility and transparency of the study. While the section mentions using Clio and creating a hierarchical tree of tasks, it lacks sufficient detail on the specific algorithms, parameters, and decision rules used in the classification process. Providing these details is crucial for a Methods section, as it allows other researchers to understand, replicate, and build upon the work. The appendices are referenced, but the core methodology should be clear within this section.
Implementation: Include a more detailed description of the hierarchical tree creation process, including: - The specific algorithm used for creating the hierarchy (e.g., clustering algorithm, specific linkage criteria). - The parameters used in the algorithm (e.g., number of clusters at each level, distance metric). - The decision rules for assigning conversations to nodes in the tree (e.g., threshold for similarity score). - How the hierarchy was validated (if applicable).
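To make this suggestion concrete, the sketch below illustrates one plausible way such a hierarchy could be constructed, using TF-IDF vectors and two levels of k-means clustering. This is purely an assumption for illustration: the paper does not disclose its actual algorithm, and the task statements, cluster counts, and vectorization choices here are placeholders.

```python
# Illustrative sketch of a two-level task hierarchy via TF-IDF + k-means.
# The vectorizer, cluster counts, and example statements are assumptions,
# not the paper's actual method or data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

task_statements = [
    "Write, update, and maintain computer programs",
    "Debug and correct errors in software applications",
    "Develop marketing copy for product launches",
    "Edit manuscripts for grammar, clarity, and style",
    "Prepare financial statements and budget reports",
    "Analyze sales data to identify market trends",
]

vectors = TfidfVectorizer().fit_transform(task_statements)

# Top level: a small number of broad groupings.
top = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)

# Middle level: sub-cluster within each top-level group.
hierarchy = {}
for label in set(top):
    idx = [i for i, l in enumerate(top) if l == label]
    n_sub = min(2, len(idx))  # cluster count per group is an arbitrary choice here
    sub = KMeans(n_clusters=n_sub, n_init=10, random_state=0).fit_predict(vectors[idx])
    hierarchy[label] = {s: [task_statements[idx[i]] for i, l in enumerate(sub) if l == s]
                        for s in set(sub)}

print(hierarchy)
```

A real implementation would more likely use semantic embeddings and validate the resulting clusters against O*NET's own occupational groupings, which is exactly the kind of detail the Methods section should spell out.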
This high-impact improvement would enhance the validity and reliability of the study. While the section mentions analyzing conversations, it does not explicitly address the potential for bias in the dataset. Since the data comes from Claude.ai users, it may not be representative of the broader population or workforce. Acknowledging and addressing this potential bias is essential for the Methods section, as it directly impacts the generalizability of the findings.
Implementation: Include a subsection discussing potential biases in the dataset, including: - Acknowledging that the data is from a single platform (Claude.ai) and may not represent all AI users. - Discussing the potential demographics or characteristics of Claude.ai users that might differ from the general population. - Explaining any steps taken to mitigate or account for these biases (if any). - Suggesting future research to address these limitations.
This medium-impact improvement would increase the clarity and rigor of the methodology. While the section mentions human validation in Appendix C, the core details of this validation should be summarized within the Methods section itself. This is important for readers to understand the quality of the classification and the extent to which the automated methods align with human judgment.
Implementation: Include a brief summary of the human validation process, including: - The number of conversations or tasks validated by humans. - The expertise or qualifications of the human validators. - The instructions or guidelines provided to the human validators. - The level of agreement between the automated classification and human judgment (e.g., inter-rater reliability).
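As one way to report the agreement statistic suggested above, Cohen's kappa could be computed over paired model and human labels. A minimal sketch, with invented labels that are not from the paper:

```python
# Sketch: agreement between model-assigned and human-assigned task categories,
# quantified with Cohen's kappa. Labels below are invented for illustration only.
from sklearn.metrics import cohen_kappa_score

model_labels = ["software", "writing", "writing", "analysis", "software", "analysis"]
human_labels = ["software", "writing", "analysis", "analysis", "software", "analysis"]

kappa = cohen_kappa_score(model_labels, human_labels)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance-level
```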
This medium-impact improvement would strengthen the methodological rigor. The section mentions using Clio to classify conversations into collaboration patterns, but it does not provide sufficient detail on how these classifications were made. Providing more information about the criteria, rules, or prompts used for this classification would enhance the transparency and reproducibility of the study.
Implementation: Include a more detailed description of the classification process for collaboration patterns, including: - The specific criteria or rules used to distinguish between the five collaboration patterns. - Examples of conversations that would fall into each category. - The prompt or instructions given to Clio for this classification (or a summary if the full prompt is lengthy).
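To illustrate the kind of detail requested above, a hypothetical classification prompt template is sketched below. The category groupings follow Table 1 and Figure 7, but the descriptions are paraphrased assumptions and the wording is not the study's actual Clio prompt:

```python
# Hypothetical prompt template for classifying a conversation into one of the
# five collaboration patterns. Category descriptions paraphrase Table 1; the
# prompt wording is illustrative, not the study's actual prompt.

COLLABORATION_PATTERNS = {
    "Directive":      "automative: the user delegates the task and accepts the output largely as-is",
    "Feedback Loop":  "automative: the user delegates the task and relays external feedback such as error messages",
    "Task Iteration": "augmentative: the user and model refine the output together over multiple turns",
    "Learning":       "augmentative: the user asks for explanations to build their own understanding",
    "Validation":     "augmentative: the user asks the model to check or critique work they produced",
}

def build_classification_prompt(conversation_summary: str) -> str:
    """Assemble a single-label classification prompt for a conversation summary."""
    options = "\n".join(f"- {name}: {desc}" for name, desc in COLLABORATION_PATTERNS.items())
    return (
        "Classify the following conversation into exactly one collaboration pattern.\n"
        f"Patterns:\n{options}\n\n"
        f"Conversation summary: {conversation_summary}\n"
        "Answer with the pattern name only."
    )

print(build_classification_prompt(
    "User pastes a stack trace, asks for a fix, then reports the new error output."))
```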
This low-impact improvement would improve the clarity and completeness of the methodology. The section mentions excluding activity from business customers, but the rationale for this exclusion is not fully explained. Providing a brief explanation would help readers understand the scope and limitations of the data.
Implementation: Add a sentence or two explaining *why* business customers were excluded. For example, this might be due to different usage patterns, contractual agreements, or privacy considerations. A concise justification strengthens the methodological choices.
Figure 4: Depth of AI usage across occupations. Cumulative distribution showing what fraction of occupations (y-axis) have at least a given fraction of their tasks with AI usage (x-axis). Task usage is defined as occurrence across five or more unique user accounts and fifteen or more conversations. Key points on the curve highlight that while many occupations see some AI usage (~36% show usage in at least 25% of their tasks), few exhibit widespread usage across their tasks (only ~4% show usage in 75% or more), suggesting AI integration remains selective rather than comprehensive within most occupations.
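The cumulative curve in Figure 4 can be reproduced from per-occupation task usage counts. A minimal sketch, assuming hypothetical per-task account and conversation counts and applying the thresholds stated in the caption:

```python
# Sketch: fraction of occupations with at least x% of their tasks showing AI usage.
# A task counts as "used" only if it appears across >= 5 unique accounts and
# >= 15 conversations (per the Figure 4 caption). Data below is invented.
import numpy as np

# occupation -> list of (unique_accounts, conversations) per task (hypothetical)
occupations = {
    "Software Developers": [(120, 900), (80, 400), (3, 10), (40, 200)],
    "Technical Writers":   [(30, 90), (2, 4), (25, 60), (1, 2)],
    "Landscaping Workers": [(1, 1), (0, 0), (2, 3)],
}

def fraction_of_tasks_used(tasks, min_accounts=5, min_conversations=15):
    used = [a >= min_accounts and c >= min_conversations for a, c in tasks]
    return sum(used) / len(tasks)

fractions = np.array([fraction_of_tasks_used(t) for t in occupations.values()])

for threshold in (0.25, 0.50, 0.75):
    share = (fractions >= threshold).mean()
    print(f"{share:.0%} of occupations have AI usage in >= {threshold:.0%} of their tasks")
```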
Figure 5: Distribution of occupational skills exhibited by Claude in conversations. Skills like critical thinking, writing, and programming have high presence in AI conversations, while manual skills like equipment maintenance and installation are uncommon.
Figure 6: Occupational usage of Claude.ai by annual wage. The analysis reveals notable outliers among mid-to-high wage professions, particularly Computer Programmers and Software Developers. Both the lowest and highest wage percentiles show substantially lower usage rates. Overall, usage peaks in occupations within the upper wage quartile, as measured against U.S. median wages [US Census Bureau, 2022].
Table 1: Taxonomy of Human-AI Collaboration Patterns. We classify conversations into five distinct patterns across two broad categories based on how people integrate AI into their workflow.
Figure 8: Comparative analysis of task usage patterns between Claude Sonnet 3.5 (New) and Claude Opus models, showing differential usage patterns. Sonnet 3.5 (New) sees more usage for coding and technical tasks, while Opus is used more for creative writing and educational content development.
Figure 9: Example subsection of the generated O*NET task hierarchy. Our hierarchy contains three levels: 12 top-level tasks, 474 middle-level tasks, and 19,530 base-level (O*NET) tasks.
The discussion effectively summarizes the main findings of the study, highlighting the large-scale empirical analysis of AI usage across economic tasks and acknowledging key limitations.
The discussion presents a comprehensive list of limitations, including data sample representativeness, reliability of model-driven classification, varying complexity of user queries, limitations of the O*NET database, and lack of full context into user workflows. This provides a balanced perspective on the study's findings.
The discussion compares the empirical findings to previous predictive studies, highlighting both validations and challenges to existing predictions about AI's impact on work. This contextualizes the study's contributions within the broader literature.
The discussion emphasizes the importance of dynamic tracking of AI usage and task-level measurement, highlighting the advantages of the proposed framework for monitoring AI's integration into the workforce over time.
The discussion explores the distinction between augmentation and automation, noting the implications of this difference for workers and productivity. This adds a nuanced perspective on how AI systems are being used.
The discussion connects the findings to broader economic impacts, acknowledging the challenges of inferring long-term consequences from early usage trends but highlighting potential implications for productivity, displacement, and economic opportunities.
The discussion section effectively connects the current findings with those of previous sections. Specifically, it links back to the usage patterns by model type (Section 3.5), providing context for Figure 8.
This high-impact improvement would strengthen the discussion by providing a more structured and cohesive presentation of the implications and future work. Currently, these points are presented in a somewhat fragmented way. Grouping related implications and explicitly labeling subsections would improve readability and impact. The Discussion section is the appropriate place for this, as it synthesizes the findings and looks ahead.
Implementation: Restructure Section 4.2 into clearly labeled subsections, grouping related implications. Possible subsections could include: - **Implications for Predictive Modeling:** Discussing the comparison to previous studies. - **A Framework for Dynamic Monitoring:** Focusing on the dynamic tracking aspect. - **Task-Level vs. Job-Level Impacts:** Elaborating on the task-level measurement findings. - **Augmentation, Automation, and Productivity:** Expanding on the augmentation vs. automation discussion. - **Future Research Directions:** Outlining specific areas for future work.
This medium-impact improvement would enhance the discussion by providing more concrete suggestions for future research. While the section mentions 'areas for future research,' it lacks specific, actionable research questions or directions. The Discussion section is the ideal place to lay out a roadmap for future investigations.
Implementation: Expand the 'future work' section with specific research questions or directions. Examples: - 'Future research should investigate the causal relationship between AI usage and productivity changes in specific sectors.' - 'Longitudinal studies are needed to track the evolution of job roles and the emergence of new tasks driven by AI.' - 'Further research should explore the impact of AI on wage inequality and skill demands.' - 'Comparative studies across different AI platforms and user populations are needed to assess the generalizability of these findings.'
This medium-impact improvement would strengthen the discussion by providing a more nuanced discussion of the limitations related to model-driven classification. While the section acknowledges potential inconsistencies, it could elaborate on the specific types of errors or biases that might arise and how they could affect the results. The Discussion section is where limitations are critically assessed.
Implementation: Expand the discussion of 'Reliability of model-driven classification' to include: - Specific examples of how the model's understanding of tasks might differ from O*NET. - The potential for systematic biases in the classification (e.g., over- or under-representation of certain types of tasks). - A more explicit discussion of how the human validation (mentioned briefly) addresses these concerns. - Suggestions for improving the classification accuracy in future work (e.g., using a more diverse training dataset, incorporating human feedback).
This low-impact improvement would add a valuable perspective to the discussion by considering the ethical implications of widespread AI adoption. While the section focuses on economic impacts, briefly mentioning ethical considerations would provide a more complete picture. This is appropriate for the Discussion section, as it broadens the implications beyond the immediate findings.
Implementation: Add a brief paragraph discussing potential ethical implications, such as: - The potential for bias and discrimination in AI-driven hiring or task allocation. - The impact on worker autonomy and job satisfaction. - The need for transparency and accountability in AI systems used in the workplace. - The broader societal implications of widespread automation.
Figure 10: We observe minimal difference in our measurements of AI use across occupational categories when measuring by number of conversations versus by number of accounts.
Figure 14: Comparison of the prevalence of occupational categories determined via direct assignment versus clusters at various aggregation levels.
Figure 15: Prevalence of occupational categories when determined via direct assignment compared to clusters at various aggregation levels.
Figure 16: Comparison of occupational category distributions between direct assignment and cluster-based approaches at various aggregation levels, evaluated using different correlation metrics (Pearson, Kendall, Spearman) and Mean Square Error (MSE). These metrics provide complementary views of how well the cluster-based categorization aligns with analysis on conversations directly.
Figure 17: Comparison of the prevalence of occupations determined via direct assignment versus clusters at various aggregation levels.
Figure 18: Prevalence of top occupations when determined via direct assignment compared to clusters at various aggregation levels.
Figure 19: Comparison of occupation distributions between direct assignment and cluster-based approaches at various aggregation levels, evaluated using different correlation metrics (Pearson, Kendall, Spearman) and Mean Square Error (MSE). These metrics provide complementary views of how well the cluster-based categorization aligns with analysis on conversations directly.
Figure 20: Comparison of median salary by task usage between direct assignment and clusters of size ~1,500. In both cases, task usage is highest for occupations with a median salary between $50,000 and $125,000.
Figure 21: Number of occupations recovered at each aggregation level compared to direct assignment.
Figure 22: Comparison of the prevalence of tasks determined via direct assignment versus clusters at various aggregation levels.
Figure 23: Prevalence of top tasks when determined via direct assignment compared to clusters at various aggregation levels.
Figure 24: Comparison of task distributions between direct assignment and cluster-based approaches at various aggregation levels, evaluated using different correlation metrics (Pearson, Kendall, Spearman) and Mean Square Error (MSE). These metrics provide complementary views of how well the cluster-based categorization aligns with analysis on conversations directly.
Figure 25: Number of occupations recovered at each aggregation level compared to direct assignment.
Figure 26: Number of tasks assigned to top occupations at various aggregation levels. As expected, higher aggregation levels have fewer average tasks per occupation.
Figure 27: Mean number of tasks assigned to each occupation at various aggregation levels. Direct assignment assigns an average of 4.8 tasks per occupation.
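The alignment metrics named in Figures 16, 19, and 24 (Pearson, Kendall, Spearman, and MSE) can be computed with standard SciPy and NumPy calls. A minimal sketch with illustrative prevalence vectors rather than the paper's data:

```python
# Sketch: comparing a direct-assignment prevalence distribution against a
# cluster-based one, using the metrics named in Figures 16, 19, and 24.
# The prevalence values below are invented placeholders.
import numpy as np
from scipy.stats import pearsonr, kendalltau, spearmanr

direct  = np.array([0.37, 0.10, 0.09, 0.08, 0.05, 0.31])  # prevalence per category
cluster = np.array([0.35, 0.12, 0.08, 0.09, 0.06, 0.30])

print("Pearson: ", pearsonr(direct, cluster)[0])
print("Kendall: ", kendalltau(direct, cluster)[0])
print("Spearman:", spearmanr(direct, cluster)[0])
print("MSE:     ", float(np.mean((direct - cluster) ** 2)))
```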