Data Formulator 2: Iterative Creation of Data Visualizations, with AI Transforming Data Along the Way

Chenglong Wang, Bongshin Lee, Steven Drucker, Dan Marshall, Jianfeng Gao
arXiv:2408.16119v2
Microsoft Research

Overall Summary

Study Background and Main Findings

This paper introduces Data Formulator 2 (Df2), an AI-powered visualization system designed to address the challenges of iterative authoring in exploratory data analysis. Existing AI tools often require users to provide complete, text-only descriptions of visualizations upfront, which is impractical when analytical goals evolve during exploration. Df2 tackles this limitation by blending a graphical user interface (GUI) with natural language (NL) input, allowing users to specify chart designs precisely while delegating data transformation tasks to the AI. The system also introduces "data threads," a mechanism for tracking the history of data transformations and visualizations, enabling users to easily revisit, revise, and branch from previous steps.

The core of Df2's methodology involves decoupling chart specification from data transformation. Users define their visualization intent through a combination of GUI interactions (e.g., drag-and-drop field mapping) and concise NL instructions. The system then generates a Vega-Lite specification (a high-level grammar for interactive graphics) and prompts a large language model (LLM) to produce Python code for the necessary data transformations. Df2 executes this code, handles potential errors, and instantiates the Vega-Lite specification with the transformed data to generate the visualization. Data threads provide a visual representation of the user's interaction history, facilitating navigation and reuse of previous results.
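The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Df2's actual implementation: `fake_llm_generate_code` is a hypothetical stand-in for the LLM call, the retry loop and all function names are invented for the example, and only the overall shape (a Vega-Lite skeleton built from the encodings, generated Python executed on the data, the spec instantiated with the result) comes from the description.

```python
import json

def build_spec(mark, encodings):
    """Build a Vega-Lite spec skeleton from GUI encodings (no data attached yet)."""
    return {
        "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
        "mark": mark,
        "encoding": {ch: {"field": f, "type": t} for ch, (f, t) in encodings.items()},
    }

def fake_llm_generate_code(instruction, error=None):
    """Hypothetical stand-in for the LLM: returns source code defining transform(rows).

    The first attempt contains a deliberate bug (wrong key case) so that the
    repair path below is exercised; a real model call would go here instead.
    """
    if error is None:
        return "def transform(rows):\n    return [r for r in rows if r['year'] >= 2010]\n"
    return "def transform(rows):\n    return [r for r in rows if r['Year'] >= 2010]\n"

def run_generated(code, rows):
    """Execute generated code in a fresh namespace and call its transform()."""
    namespace = {}
    exec(code, namespace)  # assumption: a real system would sandbox this step
    return namespace["transform"](rows)

def derive_chart(rows, mark, encodings, instruction, max_attempts=2):
    """Generate code, run it, retry once with the error message, then fill the spec."""
    spec = build_spec(mark, encodings)
    error = None
    for _ in range(max_attempts):
        code = fake_llm_generate_code(instruction, error)
        try:
            data = run_generated(code, rows)
        except Exception as exc:  # feed the failure back into the next attempt
            error = repr(exc)
            continue
        spec["data"] = {"values": data}
        return spec, code
    raise RuntimeError(f"transformation failed: {error}")

if __name__ == "__main__":
    demo_rows = [{"Year": 2005, "Value": 1.0}, {"Year": 2015, "Value": 2.0}]
    demo_spec, _ = derive_chart(
        demo_rows, "line",
        {"x": ("Year", "ordinal"), "y": ("Value", "quantitative")},
        "keep 2010 onwards",
    )
    print(json.dumps(demo_spec, indent=2))
```

Keeping the spec skeleton separate from the transformation step mirrors the decoupling the paragraph describes: the chart design is fixed by the user, and only the data preparation is delegated.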

A user study with eight participants of varying expertise in data analysis and programming showed that Df2 supports iterative visualization authoring. Participants successfully completed a series of tasks that involved creating 16 visualizations requiring diverse data transformations. The study revealed distinct iteration styles: some users preferred broader exploration with multiple branches (wide trees), while others favored deeper, more linear progressions (deep trees). Participants also employed various prompting techniques, ranging from imperative commands to questions and chat-style interactions. The study highlighted the importance of Df2's transparency features, such as code explanations and data provenance tracking, in building user trust and facilitating verification of AI-generated outputs.

The discussion explores future directions for Df2, including integration with visualization recommendation systems and the development of agent-based systems for coordinating data transformation and chart editing. The authors acknowledge the limitations of the user study, particularly its focus on reproduction tasks and the lab setting, and propose future research involving open-ended exploration and longitudinal studies to investigate long-term user behavior and learning effects.

Research Impact and Future Directions

Data Formulator 2 (Df2) offers a well-motivated approach to iterative visualization authoring that addresses a key limitation of existing AI-powered tools: the need to describe a complete visualization upfront in text alone. Its blend of GUI and NL input, together with data threads, lets users manage multi-step data transformations and explore alternative chart designs. The user study, while limited to eight participants, provides qualitative evidence for Df2's usability. The system's transparency features, including code explanations and data provenance tracking, foster user trust and facilitate verification of AI-generated outputs.

However, the study's reliance on reproduction tasks and the lab setting constrains the generalizability of findings to real-world, open-ended exploration scenarios. Future research addressing these limitations, along with the proposed enhancements for recommendation systems and agent-based chart editing, will be crucial for realizing Df2's full potential. The core strength of this work lies in its robust, user-centered design and its potential to democratize access to sophisticated data visualization techniques by lowering the barrier to entry for users with varying levels of programming expertise. The integration of AI capabilities within an intuitive interface offers a promising pathway for more efficient, insightful, and accessible data exploration.

Critical Analysis and Recommendations

Clear Problem Articulation (written-content)
The abstract effectively establishes the context by highlighting the limitations of current AI visualization tools in handling iterative authoring, a crucial aspect of exploratory data analysis. This clear problem articulation sets the stage for the paper's contribution.
Section: Abstract

Concise Solution Introduction (written-content)
The abstract concisely introduces Df2 and its core functionalities, including the hybrid GUI/NL input and support for iteration history. This allows readers to quickly grasp the system's key innovations.
Section: Abstract

Precise Identification of Tool Limitations (written-content)
The Introduction effectively transitions from the general problem of iterative visualization to the specific limitations of existing tools, providing a strong rationale for Df2's design.
Section: Introduction

Clear Workflow Illustration (graphical-figure)
Figure 1 effectively communicates the core workflow of Df2, illustrating how users can iteratively refine visualizations using the data threads and chart builder. The visual representation clarifies the system's key functionalities.
Section: Introduction

Clear Articulation of Dual Design Strategy (written-content)
The Method section clearly articulates the dual design strategy of decoupling chart specification from data transformation and using data threads. This provides a solid foundation for understanding the system's architecture and functionality.
Section: Method

Sophisticated AI Prompting and Error Handling (written-content)
The Method section thoroughly describes the sophisticated AI prompting and error handling mechanisms, demonstrating a robust approach to AI integration and enhancing the system's reliability.
Section: Method

In-depth Analysis of Emergent Iteration and Prompting Styles (written-content)
The Results section provides a detailed analysis of emergent iteration and prompting styles, offering valuable insights into how users adapt to and utilize the novel features of Df2. This qualitative data enriches the understanding of user behavior.
Section: Results

Visualization of Workflow Variability (graphical-figure)
Figure 12 effectively visualizes the variability in user workflows, supporting the qualitative analysis of iteration styles and providing compelling evidence for the system's flexibility.
Section: Results

Visionary Integration with Recommendation Systems (written-content)
The Discussion section proposes a visionary integration with recommendation systems, leveraging Df2's strengths to address limitations of existing tools and potentially broaden the scope of data exploration.
Section: Discussion and Future Work

Transparent Acknowledgment of Study Limitations (written-content)
The Discussion acknowledges the limitations of the user study, particularly the use of reproduction tasks and the lab setting, and proposes specific future studies to address these gaps and enhance the generalizability of findings. This strengthens the paper's methodological rigor.
Section: Discussion and Future Work

Quantify Observed Differences in Iteration Styles (written-content)
The Results section lacks quantitative analysis of the observed differences in iteration styles. Quantifying these differences (e.g., average branch depth, frequency of backtracking) would strengthen the claims about distinct user approaches. This would provide more objective support for the qualitative observations.
Section: Results

Elaborate on Bias Mitigation in AI-Enhanced Recommendations (written-content)
The Discussion section could be strengthened by elaborating on strategies to mitigate potential biases in AI-enhanced recommendations. Addressing this concern is crucial for responsible development and deployment of such features. The lack of discussion on bias mitigation weakens the proposal for integrating recommendation systems.
Section: Discussion and Future Work
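
The quantification suggested above for iteration styles could be computed directly from each participant's iteration tree. The sketch below is one possible way to do it, not a method from the paper: the child-to-parent map is an assumed log format, the metric names are invented, and counting branch points is only a rough proxy for how often a user backtracked to an earlier step.

```python
from collections import defaultdict

def thread_metrics(parent_of):
    """Summarize an iteration tree given a child -> parent map (the root maps to None)."""
    children = defaultdict(list)
    for node, parent in parent_of.items():
        if parent is not None:
            children[parent].append(node)

    def depth(node):
        # Number of derivation steps between this node and the root.
        steps = 0
        while parent_of[node] is not None:
            node = parent_of[node]
            steps += 1
        return steps

    leaves = [n for n in parent_of if not children[n]]
    return {
        "max_depth": max(depth(n) for n in parent_of),
        "mean_leaf_depth": sum(depth(n) for n in leaves) / len(leaves),
        "branches": len(leaves),
        # Nodes with more than one child: the user returned here and forked.
        "branch_points": sum(1 for kids in children.values() if len(kids) > 1),
    }

# Hypothetical sessions: one "wide" tree and one "deep" tree over five nodes each.
wide = {"root": None, "a": "root", "b": "root", "c": "root", "d": "a"}
deep = {"root": None, "a": "root", "b": "a", "c": "b", "d": "c"}
```

With the same number of nodes, the wide session yields more branches and a smaller maximum depth than the deep one, which is the kind of contrast the qualitative description of wide and deep trees implies.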

Section Analysis

Introduction

Non-Text Elements

Figure 1: With Data Formulator 2, analysts can iterate on a previous design by...
Full Caption

Figure 1: With Data Formulator 2, analysts can iterate on a previous design by (1) selecting a chart from data threads and (2) providing combined natural language and graphical user interface inputs in the chart builder to specify the new design. The AI model generates code to transform the data and update the chart. Data threads are updated with new charts for future use.

Figure/Table Image (Page 1)
First Reference in Text
For example, when exploring renewable energy trends, an analyst may find that similar trends across countries make a simple line chart (Figure 1) too dense for detailed comparisons.
Description
  • Overall Workflow Demonstration: Figure 1 illustrates the user interface and workflow of Data Formulator 2, a system for iterative data visualization. It shows how a user can refine a chart by selecting a previous version from "Data Threads" and then providing instructions in a "Chart Builder" using both graphical inputs and natural language.
  • Data Threads Panel: The "Data Threads" panel on the left displays a history of chart versions, akin to saved states in a design process. For instance, "thread-1" shows an initial chart from "energy.csv" with axes like "Year", "Electricity", and "Entity". "thread-2" shows an evolution from "energy.csv" to a chart labeled "table-42", which plots "Renewable Per..." (Renewable Percentage) by "Entity" over "Year" and is then transformed into "table-86" by a textual command.
  • Chart Builder Panel and User Input: The central "Chart Builder" panel shows an active chart modification step. An initial line chart (from "table-42") displays "Renewable Percentage" for multiple entities over years. The user inputs a natural language command: "Show only top 5 CO2 emission countries' trends." This panel also shows GUI elements for selecting chart type ("Line Chart") and mapping data fields to visual properties (e.g., x-axis: Year, y-axis: Renewable Per..., color: Entity).
  • Data Table Preview: A data table preview within the "Chart Builder" provides a snapshot of the underlying data being visualized. It includes columns like "Year", "Entity", and "Renewable Percentage", with example rows such as "2000, China, 16.639" and "2020, Japan, 21.325".
  • Resulting Refined Chart (New Thread): The outcome of the user's interaction is displayed on the right as a "new thread". This is a refined line chart showing "Renewable Percentage" (y-axis, values from approximately 0 to 40) versus "Year" (x-axis, from 2000 to 2015) for a filtered set of entities: China, Germany, India, Japan, and United States. This visualization directly reflects the application of the user's natural language instruction.
Scientific Validity
  • ✅ Illustrates core system functionality: The figure effectively demonstrates the core functionality of Data Formulator 2—iterative visualization refinement through combined GUI and natural language inputs—which aligns with the paper's claimed contributions.
  • ✅ Realistic use-case scenario: The chosen example, filtering a dense line chart of renewable energy trends to show specific countries, represents a common and realistic task in exploratory data analysis, thereby underscoring the system's practical relevance.
  • 💡 AI mechanism not visually detailed: The caption states that an AI model generates code to transform data. While the figure shows the input (NL query) and output (refined chart), the AI's role and the nature of the code generation are not visually detailed within the figure itself. This is acceptable for a high-level overview but might leave a reader wondering about the underlying AI mechanism's complexity or verifiability from this figure alone.
  • 💡 Ambiguity in data source for filtering criteria: The instruction "Show only top 5 CO2 emission countries' trends" implies filtering based on CO2 emissions. However, the visible input chart and data table in the "Chart Builder" focus on "Renewable Percentage." The figure does not explicitly show how CO2 emission data is accessed or linked for this filtering operation. It's assumed the AI handles this, but clarity on data provenance for the filtering criteria could be improved. For example, is CO2 data part of the 'energy.csv' or 'table-42' dataset?
  • 💡 Minor inconsistency in year ranges: There's a minor inconsistency: the "new thread" chart displays data up to the year 2015, whereas the data table preview in the "Chart Builder" includes entries for the year 2020 (e.g., "2020, Japan, 21.325"). While not critical, aligning the year ranges or explaining the discrepancy would enhance consistency.
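
The data-provenance concern raised above can be made concrete with a small sketch of the kind of transformation the instruction "Show only top 5 CO2 emission countries' trends" implies. Everything here is an assumption for illustration: the function names are invented, the ranking rule (total emissions summed over all years, rather than, say, the latest year) is only one possible reading of "top 5", and the example assumes the CO2 column is available in a source table separate from the charted one.

```python
from collections import defaultdict

def top_n_entities(rows, n, metric="CO2 emissions (kt)"):
    """Rank entities by the summed metric, skipping missing (None) values."""
    totals = defaultdict(float)
    for row in rows:
        if row.get(metric) is not None:
            totals[row["Entity"]] += row[metric]
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked[:n])

def filter_to_top_emitters(chart_rows, source_rows, n=5):
    """Keep chart rows only for entities that rank in the top n of the source table.

    chart_rows holds the plotted fields (e.g. Renewable Percentage);
    source_rows must still carry the CO2 column used as the ranking criterion.
    """
    keep = top_n_entities(source_rows, n)
    return [row for row in chart_rows if row["Entity"] in keep]
```

The second argument is the point of the sketch: the charted table alone does not contain the ranking criterion, so the transformation has to reach back to a table that does.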
Communication
  • ✅ Clear process illustration: The figure effectively uses a left-to-right visual flow (Data Threads -> Chart Builder -> new thread) to illustrate the iterative chart creation process, making the workflow easy to follow.
  • ✅ Effective communication of multi-modal input: The direct annotation of the natural language query ("Show only top 5 CO2 emission countries' trends") on the UI mockup clearly communicates the multi-modal input capability of the system.
  • ✅ Highlights system benefit: The visual contrast between the implied complexity of an initial chart and the filtered, cleaner "new thread" chart successfully highlights a key benefit of the system in reducing information overload.
  • ✅ Clear and aligned caption: The caption accurately and concisely describes the overall process depicted, aligning well with the visual elements shown in the figure.
  • 💡 Low resolution of text in Data Thread previews: The text within the small chart previews in the "Data Threads" panel (e.g., for table-49, table-42, table-86) is of low resolution and difficult to read. While these act as thumbnails, improving legibility could enhance the understanding of the iteration history. Suggestion: Increase the font size or simplify these preview elements if detailed content is not critical, or use higher-resolution inserts.
  • 💡 Non-semantic labels for tables: The labels like "table-42", "table-49", etc., are likely internal system identifiers and lack semantic meaning for the reader, potentially adding slight clutter without clear informational value in this context. Suggestion: Consider de-emphasizing them or using more descriptive placeholders if these are not crucial for understanding the figure's message.
  • 💡 High information density: The figure is information-dense, particularly the "Chart Builder" section. While this showcases various system features, it might initially overwhelm a viewer trying to grasp the core iterative step. Suggestion: Ensure visual hierarchy clearly guides the viewer's attention to the most critical elements for the illustrated workflow, perhaps with more prominent callouts or by slightly graying out less relevant UI components for this specific example.
Figure 2: An analyst explores electricity from different energy sources,...
Full Caption

Figure 2: An analyst explores electricity from different energy sources, renewable percentage trends, and country rankings by renewable percentages using a dataset on CO2 and electricity for 20 countries (2000-2020, table 1). The analyst creates five data versions in three branches to support different chart designs. DF2 allows users to manage iteration directions and create rich visualizations using a blended UI and natural language inputs.

Figure/Table Image (Page 3)
First Reference in Text
The initial dataset, shown in Figure 2-1, includes each country's energy produced from three sources (fossil fuel, renewables, and nuclear) each year and annual CO2 emission value (the CO2 emission data only ranges from 2000 to 2019).
Description
  • Panel Content and Identification: Panel 1 of Figure 2, identified as "Figure 2-1" in the reference text, displays a snippet of the initial tabular dataset. This dataset is central to the subsequent analyses and visualizations depicted in the overall figure, concerning energy production and CO2 emissions for various countries.
  • Data Columns and Units: The table in Panel 1 includes several data columns: "Year" (showing years like 2000 and 2020), "Entity" (representing the country, e.g., Australia, Brazil, China, United Kingdom, United States), "CO2 emissions (kt)" where 'kt' signifies kilotonnes, a unit of mass (e.g., Australia in 2000 reported 339450 kt of CO2 emissions), "Electricity from fossil fuels (TWh)", "Electricity from nuclear (TWh)", and "Electricity from renewables (TWh)". 'TWh' stands for Terawatt-hours, a common unit for large-scale energy measurement. For example, in 2000, Australia generated 181.05 TWh from fossil fuels, 0 TWh from nuclear, and 17.11 TWh from renewables.
  • CO2 Data Range and Null Values: An important detail visible in Panel 1 is that the "CO2 emissions (kt)" column shows "null" values for the United Kingdom and United States for the year 2020. This is consistent with the reference text, which clarifies that the CO2 emission data in this dataset only covers the period from 2000 to 2019.
  • Dataset Scope Illustration: The main caption for Figure 2 states that the complete dataset encompasses 20 countries over the years 2000-2020. Therefore, Panel 1 presents only an illustrative subset of this larger dataset, showcasing its structure and the types of data it contains.
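
As a worked example of the kind of derived field this table supports, the sketch below computes a renewable share from the Australia 2000 row quoted above and shows one way to handle the null CO2 values. The formula (renewables divided by the sum of the three sources) is an assumption about how "Renewable Percentage" is defined, and the function names are invented for illustration.

```python
FOSSIL = "Electricity from fossil fuels (TWh)"
NUCLEAR = "Electricity from nuclear (TWh)"
RENEWABLES = "Electricity from renewables (TWh)"
CO2 = "CO2 emissions (kt)"

def renewable_percentage(row):
    """Share of electricity from renewables, in percent (assumed definition)."""
    total = row[FOSSIL] + row[NUCLEAR] + row[RENEWABLES]
    return None if total == 0 else 100.0 * row[RENEWABLES] / total

def rows_with_co2(rows):
    """Drop rows whose CO2 value is missing, e.g. the 2020 entries."""
    return [row for row in rows if row.get(CO2) is not None]

# Values for this row are the ones quoted in the panel description.
australia_2000 = {
    "Year": 2000, "Entity": "Australia", CO2: 339450,
    FOSSIL: 181.05, NUCLEAR: 0, RENEWABLES: 17.11,
}

# The panel shows a null CO2 value for this row; other columns are omitted here.
uk_2020 = {"Year": 2020, "Entity": "United Kingdom", CO2: None}

share = renewable_percentage(australia_2000)
```

Under this definition the Australia 2000 row gives a share of about 8.6%. Rows with null CO2 can still contribute to electricity-based fields, so they only need to be excluded from CO2-based aggregates.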
Scientific Validity
  • ✅ Foundation for Subsequent Analysis: This table (Panel 1) appropriately presents a sample of the raw input data. This serves as the necessary foundation for the data transformations and visualizations illustrated in the subsequent panels of Figure 2, thereby promoting transparency in the depicted analytical workflow.
  • ✅ Comprehensive Variables for Energy Analysis: The dataset snippet includes key variables relevant to energy and environmental analysis: CO2 emissions and electricity generation categorized by source (fossil fuels, nuclear, renewables). This allows for a comprehensive basis for exploring trends in energy consumption and sustainability.
  • 💡 Handling and Clarification of Missing CO2 Data: The representation of "null" values for CO2 emissions in 2020 is consistent with the accompanying reference text, which states that CO2 data is available only up to 2019. This accurate representation of data limitations is good. However, if this table were to be presented as a standalone element, a direct footnote explaining these nulls would be crucial for immediate clarity and to prevent users from misinterpreting them as simple data omissions or errors.
  • 💡 Data Source and Representativeness Context: Panel 1 displays data for only a few countries, while the main figure caption specifies the dataset covers "20 countries." While this is acceptable for an illustrative snippet within a larger workflow diagram, a complete assessment of the dataset's scientific validity would necessitate more details, such as the specific source of the data (e.g., International Energy Agency, World Bank), the criteria for the selection of these 20 countries, and any preprocessing steps. This information is typically provided in a methods section rather than a figure caption.
  • 💡 Clarity and Consistency of "Entity" Variable: The term "Entity" is used for countries in the examples shown. For robust analysis, it would be important to confirm that all entries under "Entity" across the full dataset consistently refer to national entities and do not include regional aggregates or other types of organizations, which could affect data comparability.
Communication
  • ✅ Clear Tabular Structure: Panel 1 uses a standard and easily understandable tabular format for presenting the data. Column headers are descriptive and clearly labeled.
  • ✅ Units Indicated: The units for measurements (kt for CO2 emissions, TWh for electricity) are explicitly provided in the column headers, which is a key aspect of good data presentation.
  • ✅ Role as Input Clearly Indicated: Its position at the start of the workflow diagram in Figure 2, with arrows leading to transformation steps like "pivot" and "calculate", effectively communicates its role as the initial input data for the subsequent analyses shown in other panels.
  • 💡 Font Legibility: While the text within this table panel is generally legible, it is quite small. Given the density of the overall Figure 2, any improvement in font size or contrast for this panel could enhance readability. Suggestion: If space allows, slightly increase font size for table contents or ensure high-resolution rendering.
  • 💡 Context within Overall Figure Density: This panel is one of many in a complex figure. While its individual clarity is good, its effectiveness is tied to how well it integrates into the narrative of the entire Figure 2. The transformation labels originating from this panel are helpful in this regard. Suggestion: Ensure that the visual flow from this input table to the first derived charts is unmistakably clear, perhaps with slightly bolder connecting arrows or a subtle background grouping for this initial stage.
Figure 3: DF2 overview. Users create visualizations by providing fields...
Full Caption

Figure 3: DF2 overview. Users create visualizations by providing fields (drag-and-drop or type) and NL instructions to the Chart Builder, delegating data transformation to AI. Data View shows derived data. Users navigate data history and select contexts for the next iteration using Data Threads (the thread in use is displayed as local data threads). They refine or create new charts by providing instructions in Chart Builder. The main panel provides pop-up windows to inspect code, explanations, and chat history.

Figure/Table Image (Page 4)
First Reference in Text
As Figure 4-2 shows, Megan first drags and drops existing fields Year and Entity to the x-axis and color, respectively.
Description
  • Overall UI Overview: Figure 3 presents a screenshot of the Data Formulator 2 (DF2) user interface, illustrating its main components and their arrangement. The interface is designed to help users iteratively create data visualizations.
  • 1. Chart Builder: Component 1, labeled "Chart Builder for specifying chart with visual encodings and NL instructions," is located in the central-right part of the UI. It shows a configuration area where users can define a chart (e.g., a "Custom Line" chart) by assigning data fields like "Year" to the x-axis and "Renewable Per..." (Renewable Percentage) to the y-axis. It also includes a text input field for Natural Language (NL) instructions, here showing "include global median as an entity". A list of available "Data Fields" (e.g., "Global Median?", "Rank", "Renewable Percentage") is visible to the right of the chart builder.
  • 2. Local Data Thread: Component 2, "Local Data Thread visualizes the current data thread and supports quick backtracking," is situated directly above the Chart Builder. It displays a sequence of chart states (e.g., from "energy.csv" to "table-42" to "table-77" to "table-18") within the currently active iteration path. This component allows users to see the immediate history and potentially revert or branch from recent steps.
  • 3. Data Threads: Component 3, "Data Threads for navigating and selecting contexts to guide AI in the next iteration," is shown in the top-left panel. It displays multiple independent iteration histories (e.g., "thread-1", "thread-2", "thread-3"), each representing a distinct line of exploration. Users can select a chart from these threads to use as a starting point for new visualizations.
  • 4. Data View: Component 4, "Data View for inspecting original and derived data," is located at the bottom of the UI. It shows a tabular representation of the data associated with a selected chart (here, "table-18"). Columns include "Entity", "Global Median?", "Renewable Percentage", and "Year", with example data like "China, No, 16.639126586, 2000". This allows users to inspect the data values underlying their visualizations.
  • Additional UI Features: The figure also indicates that pop-up windows are available for inspecting code, explanations, and chat history, although these pop-ups are not actively displayed in this particular screenshot. The main central panel shows a line chart visualizing "Renewable Percentage" over "Year" for different entities, including a "Global Median".
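
The relationship between the global Data Threads and the Local Data Thread described above can be sketched as a tree in which each derived table points back to its parent. This is an illustrative data structure, not Df2's internal representation: the class and method names are hypothetical, and the instructions marked as such are placeholders, since the figures do not show what produced those tables.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class ThreadNode:
    """One table in the iteration history, linked to the table it was derived from."""
    table: str
    instruction: Optional[str] = None
    parent: Optional["ThreadNode"] = field(default=None, repr=False)
    children: List["ThreadNode"] = field(default_factory=list, repr=False)

    def derive(self, table, instruction):
        """Branch from this node: record a new table created by an instruction."""
        child = ThreadNode(table, instruction, parent=self)
        self.children.append(child)
        return child

    def local_thread(self):
        """Path from the root to this node, i.e. the thread currently in use."""
        path, node = [], self
        while node is not None:
            path.append(node.table)
            node = node.parent
        return path[::-1]

# Table names follow the figures; placeholder instructions are hypothetical.
root = ThreadNode("energy.csv")
t42 = root.derive("table-42", "placeholder: derive renewable percentage")
t77 = t42.derive("table-77", "placeholder: intermediate step")
t18 = t77.derive("table-18", "include global median as an entity")
t86 = t42.derive("table-86", "Show only top 5 CO2 emission countries' trends")
```

Because every node keeps its parent, backtracking amounts to selecting an earlier node and calling `derive` on it again, which creates a new branch without discarding the old one.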
Scientific Validity
  • ✅ Coherent UI representation: The figure provides a plausible and coherent visual representation of a user interface designed for iterative data visualization, aligning with the system's description in the caption and the broader context of the paper.
  • ✅ Demonstrates key system concepts: The layout effectively demonstrates the key concepts of DF2, such as the separation of global "Data Threads" for managing different exploration paths, a "Local Data Thread" for the current iterative sequence, a "Chart Builder" for multi-modal input (GUI and NL), and a "Data View" for inspection. This supports the paper's claims about the system's architecture and user interaction model.
  • ✅ Illustrates multi-modal input: The figure illustrates the multi-modal input capability through the Chart Builder, which shows both GUI elements for field mapping (e.g., drag-and-drop implied for axes) and a text box for NL instructions. This is a central aspect of the DF2 system.
  • ✅ Consistent with AI-delegated tasks: The caption mentions that the AI is delegated data transformation tasks, and the main panel can show pop-ups for code and explanations. While the pop-ups are not shown, the overall UI structure is consistent with a system where AI plays a significant role in processing user requests and generating outputs (charts and derived data).
  • 💡 Static representation of an interactive system: The figure is a static representation. The dynamic aspects of interaction, such as the actual drag-and-drop mechanism, the process of typing NL queries, or the appearance of pop-up windows, are implied rather than explicitly shown. This is a common limitation of static figures for interactive systems but does not detract from its validity as an overview.
  • 💡 Mismatched reference text: The reference text provided ("As Figure 4-2 shows...") does not pertain to Figure 3. This figure's validity should be assessed based on its own content and caption, and its consistency with the overall paper narrative about DF2.
Communication
  • ✅ Effective use of callouts: The use of numbered callouts (1, 2, 3, 4) effectively highlights the key components of the DF2 interface, guiding the viewer's attention to the most important functional areas described in the caption.
  • ✅ Comprehensive system overview: The figure successfully provides a comprehensive visual overview of the DF2 system, illustrating the spatial arrangement and interplay of its main functional panels, which aids in understanding the user workflow.
  • ✅ Clear and aligned caption: The caption is clear and directly corresponds to the visual elements and callouts in the figure, making it relatively self-contained for understanding the system's basic operation.
  • 💡 Legibility of small text elements: The text within some UI elements, particularly the smaller chart previews in the "Data Threads" panel (top left) and some labels in the "Data Fields" list (top right), is quite small and difficult to read. Suggestion: For a figure intended as an overview, consider using slightly larger fonts in the mock-up, or if these are direct screenshots, ensure high resolution and potentially use magnified insets for critical small text areas.
  • 💡 High information density: The figure is information-dense, presenting many UI elements simultaneously. While this shows the system's richness, it could be slightly overwhelming for a first-time viewer. Suggestion: Ensure a clear visual hierarchy, perhaps by using subtle color differences or line weights to differentiate primary interaction areas from secondary ones, or by focusing callouts on a more streamlined workflow if the goal is to illustrate a specific interaction path.
  • 💡 Subtle workflow indicators: The arrows indicating workflow (e.g., from "Data Threads" to "Chart Builder") are somewhat subtle. Suggestion: Make these workflow indicators more prominent to better emphasize the iterative process described.
Figure 4: Experiences with DF2: (1) creating the basic renewable energy chart...
Full Caption

Figure 4: Experiences with DF2: (1) creating the basic renewable energy chart using drag-and-drop to encode fields; (2 and 3) creating charts requiring new fields by providing field names and optional natural language instructions to derive new data.

Figure/Table Image (Page 5)
First Reference in Text
As Figure 4-2 shows, Megan first drags and drops existing fields Year and Entity to the x-axis and color, respectively.
Description
  • Basic Chart Creation: Panel 1 of Figure 4 demonstrates the initial step of creating a basic chart in DF2. It shows the Chart Builder interface where a user is constructing a line chart. The data source is "energy.csv".
  • Field Encoding: The user has encoded the chart by assigning data fields to visual channels: "Year" is mapped to the x-axis, "Electricity from r..." (presumably 'Electricity from renewables (TWh)') is mapped to the y-axis, and "Entity" (representing countries or regions) is mapped to the color channel. This implies a drag-and-drop interaction, as described in the caption.
  • Available Data Fields: A "Data Fields" list is visible on the right, showing available fields like "CO2 emissions (kt)", "Electricity from fossil fuels (...", "Electricity from nuclear (T...", and "Electricity from renewables". This is where the user would select fields to encode.
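
The distinction the caption draws between encoding existing fields (panel 1) and naming fields that do not yet exist (panels 2 and 3) can be illustrated with a small check. This is a sketch under assumptions: `split_fields` is a hypothetical helper, and the rule that any encoded field absent from the table triggers a data derivation is inferred from the caption rather than taken from the system's code.

```python
def split_fields(encodings, table_columns):
    """Separate encoded fields that exist in the table from ones that must be derived."""
    existing = {ch: f for ch, f in encodings.items() if f in table_columns}
    to_derive = {ch: f for ch, f in encodings.items() if f not in table_columns}
    return existing, to_derive

# Column names as listed for the initial energy.csv table.
columns = {
    "Year", "Entity", "CO2 emissions (kt)",
    "Electricity from fossil fuels (TWh)",
    "Electricity from nuclear (TWh)",
    "Electricity from renewables (TWh)",
}

# Panel 1: every encoded field already exists, so no transformation is needed.
basic = {"x": "Year", "y": "Electricity from renewables (TWh)", "color": "Entity"}

# A later design names a field that is not in the table yet.
derived = {"x": "Year", "y": "Renewable Percentage", "color": "Entity"}
```

For the basic chart the second returned mapping is empty, while for the later design it contains the y channel, which is the case where an instruction to the AI would be needed.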
Scientific Validity
  • ✅ Represents fundamental chart creation: This panel accurately depicts a fundamental and common method for chart creation in many visualization tools—mapping existing data fields to visual encodings. It serves as a valid baseline interaction.
  • ✅ Plausible UI for basic tasks: The UI shown is plausible for a system aiming to simplify visualization authoring, providing direct manipulation for basic tasks.
  • ✅ Supports caption claim (part 1): The panel clearly supports the first part of the caption: "(1) creating the basic renewable energy chart using drag-and-drop to encode fields."
  • 💡 Mismatch with provided reference text for Figure 4-2: The reference text cites "Figure 4-2", which is the next panel rather than this one. Panel 4-1 therefore illustrates a different step from the one the reference text attributes to Figure 4-2.
Communication
  • ✅ Clear illustration of drag-and-drop: This panel clearly illustrates the drag-and-drop functionality for basic chart creation by showing fields being assigned to visual channels (x-axis, y-axis, color). The UI elements are distinct and the action is intuitive.
  • ✅ Standard terminology: The use of standard chart builder terminology (e.g., "Line Chart", "x-axis", "y-axis", "color") makes the interface understandable.
  • ✅ Clear Data Fields list: The visual representation of the "Data Fields" list on the right, from which fields are presumably dragged, is clear and aids in understanding the source of the encoded fields.
  • 💡 Truncated field names: Some field names (e.g., "Electricity from r...") are truncated due to space constraints. The meaning can usually be inferred, but full field names would remove any ambiguity. Suggestion: Widen the field display areas in the figure if possible, and use tooltips in the actual system (tooltips cannot be shown in a static figure).
Figure 5: Iteration with DF2: (1) provide an instruction to filter the...
Full Caption

Figure 5: Iteration with DF2: (1) provide an instruction to filter the renewable energy percentage chart by top CO2 countries, (2) update the chart with Global Median? and instruct DF2 to add the global median alongside the top 5 CO2 countries' trends, and (3) move Global Median? from column to opacity to update the chart design without deriving new data.

Figure/Table Image (Page 5)
First Reference in Text
On top of that, Megan provides a new instruction below the local data thread, "show only top 5 CO2 emission countries' trends," and clicks the “derive” button (Figure 5-1).
Description
  • Initial Chart Context: Panel 1 of Figure 5 (referenced as Figure 5-1 in the text) shows an intermediate step in the DF2 workflow. It displays the Chart Builder interface, which currently shows a line chart derived from "table-42". This chart plots "Renewable Per..." (Renewable Percentage) on the y-axis against "Year" on the x-axis, with different lines colored by "Entity" (countries).
  • Natural Language Instruction: The key action illustrated is the user providing a natural language (NL) instruction. In the text input field below the chart, the user has typed: "Show only top 5 CO2 emission countries' trends." This instruction is intended to filter the currently displayed renewable energy percentage chart based on a different criterion (CO2 emissions).
  • Action Button: A button labeled "formulate data" is visible below the NL instruction input field, indicating the action the user would take to submit this instruction to the DF2 system for processing.
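The data transformation this instruction asks for can be sketched in plain Python. This is an illustrative sketch only, not the code DF2's model generates: it assumes each row carries both the `CO2 emissions (kt)` and `Renewable Percentage` columns, and it ranks countries by total emissions across all years, which is one possible reading of "top 5" (the instruction does not say whether totals, averages, or the latest year should be used). The function names and the toy rows are hypothetical.

```python
from collections import defaultdict


def top_entities(rows, metric, n=5):
    """Return the n entities with the largest summed value of `metric`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["Entity"]] += row[metric]
    ranked = sorted(totals, key=totals.get, reverse=True)
    return set(ranked[:n])


def keep_top_emitters(rows, n=5):
    """Keep only the rows of the n highest-emitting entities (assumed: by total)."""
    keep = top_entities(rows, "CO2 emissions (kt)", n)
    return [row for row in rows if row["Entity"] in keep]


# Toy rows with made-up values, for illustration only.
rows = [
    {"Year": 2000, "Entity": "A", "CO2 emissions (kt)": 10.0, "Renewable Percentage": 5.0},
    {"Year": 2001, "Entity": "A", "CO2 emissions (kt)": 12.0, "Renewable Percentage": 6.0},
    {"Year": 2000, "Entity": "B", "CO2 emissions (kt)": 50.0, "Renewable Percentage": 20.0},
    {"Year": 2000, "Entity": "C", "CO2 emissions (kt)": 30.0, "Renewable Percentage": 40.0},
]
filtered = keep_top_emitters(rows, n=2)
```

The plotted field (renewable percentage) is untouched; only the set of entities changes, which is why the chart encodings can stay as they are while the underlying table is replaced.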
Scientific Validity
  • ✅ Represents NL-driven filtering input: This panel accurately depicts the input stage for a natural language-driven data transformation and filtering task, which is a core capability of the DF2 system as described.
  • ✅ Realistic and non-trivial task: The scenario—filtering a chart based on criteria not directly plotted (CO2 emissions influencing a renewable energy chart)—is a realistic and non-trivial task in exploratory data analysis, showcasing the potential utility of the AI-driven transformation.
  • ✅ Supports caption claim (part 1): The figure panel directly supports the first part of the main Figure 5 caption: "(1) provide an instruction to filter the renewable energy percentage chart by top CO2 countries".
  • 💡 Validity depends on subsequent AI processing: The panel clearly shows the user's intent expressed through natural language. The scientific validity of the outcome of this instruction would depend on the AI's ability to correctly interpret the query, access relevant CO2 data (which is not explicitly shown as part of "table-42"), perform the ranking and filtering, and then apply this to the renewable percentage data. This panel only shows the input step.
  • 💡 Minor discrepancy in button label (text vs. figure): The reference text mentions a "derive" button, while the figure shows a "formulate data" button. This is a minor discrepancy but worth noting for consistency between text and visual. Assuming "formulate data" is the intended label, it reasonably conveys the action.
Communication
  • ✅ Clear focus on NL instruction: The panel clearly highlights the natural language instruction input field, making the user's action (providing a textual command) the central focus, which aligns with the caption's description of this step.
  • ✅ Context of existing chart is clear: The context of the existing chart (Renewable Energy Percentage) is visible, providing a clear before-state for the intended filtering operation.
  • ✅ Clear visual cue for step 1: The numbering "1" effectively links this panel to the first step described in the main caption for Figure 5.
  • 💡 Legibility of chart preview details: The chart preview within this panel is relatively small, and the details (e.g., specific country names in the legend, axis labels) are difficult to discern. Suggestion: While this is an intermediate step, ensuring slightly better legibility or using a simplified iconic representation of the chart could improve clarity without losing context.
  • 💡 Button label clarity: The reference text calls the button "derive", while the figure labels it "formulate data". "Derive" on its own is slightly ambiguous; "formulate data" is reasonably clear, and alternatives such as "Apply filter" or "Update chart" would be more direct. This is a minor point about UI terminology. Suggestion: Use one label consistently in the text and the figure, and ensure it clearly reflects the action being performed.

Method

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Figure 6: DF2's workflow: (1) DF2 generates a Vega-Lite spec skeleton based on...
Full Caption

Figure 6: DF2's workflow: (1) DF2 generates a Vega-Lite spec skeleton based on user specifications and chart type. (2) If new fields (e.g., Rank) are required, DF2 prompts its AI model to generate data transformation code. (3) The Vega-Lite skeleton is then instantiated with the new data to produce the desired chart.

Figure/Table Image (Page 7)
First Reference in Text
Below shows the LLM's refined goal for the task in Figure 6, and the generated code is shown in Figure 6-2.
Description
  • User Specification Input: Panel 1 of Figure 6 illustrates the initial phase of DF2's workflow. It shows a user interacting with the Chart Builder interface. The user has selected 'Line Chart' as the chart type from the 'table-42' data source. They have mapped 'Year' to the x-axis, a new field 'Rank' to the y-axis, and 'Entity' to the color encoding. Additionally, a natural language (NL) instruction, "rank by renewable percentage," has been provided.
  • Vega-Lite Skeleton Generation: As a result of these user specifications, DF2 generates an initial Vega-Lite JSON skeleton. Vega-Lite is a high-level grammar for interactive graphics, allowing concise descriptions of visualizations. The generated skeleton shown is: `{"mark": "line", "encoding": {"x": {"field": "Year", "type": "temporal"}, "y": {"field": "Rank", "type": "?"}, "color": {"field": "Entity", "type": "nominal"}}}`. Notably, the 'type' for the 'Rank' field is marked with a '?', indicating it's yet to be determined, as 'Rank' is a new field to be derived.
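The skeleton-generation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not DF2's implementation: `build_skeleton` and `known_types` are hypothetical names, and the `"?"` marker simply mirrors the placeholder shown in the figure.

```python
import json

PLACEHOLDER = "?"  # marker for an encoding type that is not yet known


def build_skeleton(mark, encodings, known_types):
    """Build a Vega-Lite-style skeleton.

    encodings:   channel name -> field name chosen by the user
    known_types: field name -> encoding type, for fields already in the table
    Fields missing from known_types (new fields to derive) get a placeholder.
    """
    encoding = {}
    for channel, field_name in encodings.items():
        encoding[channel] = {
            "field": field_name,
            "type": known_types.get(field_name, PLACEHOLDER),
        }
    return {"mark": mark, "encoding": encoding}


skeleton = build_skeleton(
    "line",
    {"x": "Year", "y": "Rank", "color": "Entity"},
    {"Year": "temporal", "Entity": "nominal"},  # Rank is not in the table yet
)
print(json.dumps(skeleton, indent=2))
```

Only `Rank` ends up with the placeholder, because it is the only field absent from the current table; that is the signal that a data transformation is needed before the chart can be rendered.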
Scientific Validity
  • ✅ Plausible first step in AI-assisted workflow: This panel accurately represents the first step in a plausible workflow for AI-assisted visualization: translating user intent (expressed via GUI and NL) into a structured chart specification (Vega-Lite).
  • ✅ Correct handling of new field specification: The generation of a Vega-Lite skeleton with placeholders (like the '?' for the type of 'Rank') correctly reflects a scenario where new data fields need to be derived before the chart can be fully specified, aligning with the overall problem DF2 aims to solve.
  • ✅ Demonstrates multi-modal input: The combination of GUI inputs for visual encoding and NL input for data derivation intent is a key aspect of DF2's proposed multi-modal interaction, and this panel effectively demonstrates that combination.
Communication
  • ✅ Clear depiction of user input and skeleton generation: This part of the figure clearly shows the user's input (chart type, field mappings like 'Year' to x-axis, 'Rank' to y-axis, and 'Entity' to color, plus an NL instruction 'rank by renewable percentage') and the corresponding initial Vega-Lite JSON skeleton. The visual connection between user input and the skeleton is evident.
  • ✅ Readable Vega-Lite skeleton: The Vega-Lite JSON is presented in a readable format, making it easy to understand the structure of the chart specification being generated. The placeholder '?' for the type of 'Rank' clearly indicates an unresolved part of the specification.
  • ✅ Intuitive UI representation: The UI elements for user specification (e.g., x-axis, y-axis, color dropdowns) are standard and intuitively represent how a user would define a chart.
  • ✅ Clear workflow indication: The connection from the user input panel to the Vega-Lite skeleton is indicated by an arrow, which helps in following the workflow step.
Figure 7: DF2 converts user encodings into a Vega-Lite specification, which is...
Full Caption

Figure 7: DF2 converts user encodings into a Vega-Lite specification, which is combined with AI-transformed data to visualize country ranks in 2000 and 2020.

Figure/Table Image (Page 7)
First Reference in Text
For example, as shown in Figure 7, when the user drags Year → x, Entity → color and types Rank in y, the line chart template mentioned above is instantiated with provided fields: if the field is available in the current data table, both field name and encoding type are instantiated (e.g., Year with the temporal type), otherwise the encoding type is left as a "<placeholder>" to be instantiated later when data transformation completes.
Description
  • Overall Process Illustration: Figure 7 illustrates the process by which Data Formulator 2 (DF2) translates user-defined chart encodings and natural language instructions into a visual chart, involving the generation of a Vega-Lite specification and AI-driven data transformation.
  • User Input and Initial Transformation: The left part of the figure shows the user's input. The user has selected a "Ranged Dot Plot" chart type. The data source is initially "table-17", which is transformed into "table-56" based on an instruction "Rank countries by renewable percentage". For the chart encodings, "Entity" (country names) is mapped to the x-axis, "Rank" is mapped to the y-axis, and a field representing "2000 or 2020" is mapped to the color. An additional natural language instruction "show only 2000 and 2020" is provided in the chart builder.
  • Vega-Lite Specification Generation: The middle part displays the generated Vega-Lite JSON specification. Vega-Lite is a high-level language for creating interactive data visualizations. This specification defines a layered chart, consisting of a line layer and a point layer. Key parts of the encoding include: `"x": {"field": "Entity", "type": "nominal"}` and `"y": {"field": "Rank", "type": "<placeholder>"}`. The `"<placeholder>"` for the type of "Rank" and for the type of the color field `"2000 or 2020"` signifies that these are new or transformed fields whose data types will be determined after the AI processes the data.
  • Resulting Visualization: The right part of the figure shows the final rendered chart. It is a dot plot where the x-axis lists countries (e.g., Australia, Brazil, Canada, China). The y-axis represents "Rank", ranging from 0 to 15. Each country has two points, one for the year 2000 (e.g., blue color) and one for 2020 (e.g., orange color), showing its rank in those respective years. For example, Australia has a rank of approximately 11 in 2000 and a rank around 12-13 in 2020.
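How a placeholder might be filled once the transformed data is available can be sketched as below. The type-inference rule (numeric values become `quantitative`, everything else `nominal`) is an assumption made for this illustration; the source does not state the rule DF2 uses. The layered spec is simplified, the color field name `Year` stands in for the field shown as "2000 or 2020", and the rows are made-up values.

```python
import copy

PLACEHOLDER = "<placeholder>"


def infer_type(values):
    """Assumed rule: all-numeric columns are quantitative, the rest nominal."""
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    return "quantitative" if numeric else "nominal"


def instantiate(spec, rows):
    """Return a copy of spec with placeholder types resolved and data attached."""
    spec = copy.deepcopy(spec)

    def resolve(node):
        if isinstance(node, dict):
            if node.get("type") == PLACEHOLDER and "field" in node:
                node["type"] = infer_type([r[node["field"]] for r in rows])
            for child in node.values():
                resolve(child)
        elif isinstance(node, list):
            for child in node:
                resolve(child)

    resolve(spec)  # walk the spec first, so the data rows are never traversed
    spec["data"] = {"values": rows}
    return spec


skeleton = {
    "encoding": {
        "x": {"field": "Entity", "type": "nominal"},
        "y": {"field": "Rank", "type": PLACEHOLDER},
        "color": {"field": "Year", "type": PLACEHOLDER},
    },
    "layer": [{"mark": "line"}, {"mark": "point"}],
}
rows = [
    {"Entity": "A", "Year": "2000", "Rank": 2},
    {"Entity": "A", "Year": "2020", "Rank": 1},
]
chart = instantiate(skeleton, rows)
```

Keeping the skeleton immutable and returning a resolved copy means the same skeleton can be reused if the transformation is rerun and produces a different table.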
Scientific Validity
  • ✅ Accurately demonstrates core DF2 workflow: The figure accurately demonstrates the claimed workflow of DF2: user specification leading to a Vega-Lite skeleton, which is then populated by AI-transformed data to produce the final chart. This aligns with the system's described methodology.
  • ✅ Sound use of Vega-Lite and placeholders: The use of Vega-Lite as an intermediate representation is a sound choice, as it's a well-established grammar for visualization. The concept of using placeholders for types of derived fields is a logical approach in a system where data transformation is dynamic.
  • ✅ Realistic and relevant example task: The example task—visualizing changes in country rankings between two specific years—is a realistic analytical task that would typically require data transformation (ranking, filtering by year), making it a suitable showcase for DF2's capabilities.
  • 💡 AI transformation details are implicit: The figure implies that the AI transformation (e.g., calculating "Rank", filtering for "2000 or 2020") happens between the Vega-Lite skeleton generation and the final chart rendering. The details of this AI transformation are not shown in this figure but are central to the system's functionality. The validity of the final chart rests on the AI correctly performing these transformations.
  • 💡 Mismatch between figure content and specific example in reference text: The reference text describes a line chart example with fields Year, Entity, and Rank. Figure 7, however, shows a "Ranged Dot Plot" with Entity, Rank, and "2000 or 2020" (derived from Year). While both illustrate the concept of placeholders, the specific chart type and field mappings in Figure 7 differ from the example detailed in the reference text. This is not a flaw in the figure itself but a slight mismatch in the provided textual example for this specific figure.
Communication
  • ✅ Clear workflow visualization: The figure effectively uses a three-part layout (User Input -> Vega-Lite Spec -> Resulting Chart) to illustrate the transformation process, making the workflow clear and easy to follow.
  • ✅ Readable Vega-Lite specification with placeholders: The Vega-Lite JSON specification is presented in a structured and readable manner, with placeholders like `"<placeholder>"` clearly indicating where information from AI-transformed data will be inserted. This effectively communicates the concept of a template being filled.
  • ✅ Clear and relevant resulting chart: The resulting chart (a ranged dot plot) is a clear visual output that directly corresponds to the user's specifications (ranking for years 2000 and 2020). The use of color to distinguish between the two years is effective.
  • ✅ Accurate and informative caption: The caption accurately summarizes the process depicted in the figure, enhancing its self-containedness.
  • 💡 Potential ambiguity in NL instruction application: The user input panel on the left shows data source "table-17" being transformed to "table-56" with the instruction "Rank countries by renewable percentage". However, the NL instruction box below the chart encodings shows "show only 2000 and 2020". It's slightly unclear if both instructions are applied sequentially or if the latter refines the former. Suggestion: Clarify if the NL instruction in the encoding panel is a separate, subsequent step or part of the same transformation that also produces "table-56".
  • 💡 Readability of x-axis labels in the resulting chart: The entities (countries) on the x-axis of the resulting chart are numerous and their labels are vertically oriented, making them somewhat hard to read quickly. Suggestion: For illustrative purposes, consider showing a subset of countries or using a horizontal bar chart if space allows for better label readability, though the current plot does effectively show the 'ranged' aspect.
Figure 8: Data threads and local data threads (right). Users can select...
Full Caption

Figure 8: Data threads and local data threads (right). Users can select previous data or charts to create new branches, and the AI reuses code for new transformations based on user instructions. The local data thread offers shortcuts to (1) rerun the previous instruction, (2) issue a follow-up instruction, or (3) expand the previous card to revise and rerun the instruction.

Figure/Table Image (Page 8)
First Reference in Text
As shown in Figure 8, the code and the conversation history are attached to each data node.
Description
  • Overall Figure Composition: Figure 8 displays two key components of the DF2 interface related to managing iteration history: the global "Data threads" view on the left, and the "local data threads" view on the right.
  • Global Data Threads View (Left Panel): The left panel, labeled "Data threads," illustrates a broader, branching history of data analysis. It shows nodes representing different data states (e.g., "global-energy", "table-17", "table-56", "table-74") and visualizations (small chart previews). Arrows connect these nodes, indicating transformations. Some transformations are associated with Python code snippets, for example, the transformation leading to "table-17" shows code calculating "Total_Electricity (TWh)" and "Renewable Percentage". This panel demonstrates how users can create new branches (e.g., from "global-energy" leading to two distinct paths, one via "table-74" and another via "table-17"). An option to "create a new chart" is shown originating from "table-17".
  • Local Data Threads View (Right Panel): The right panel, labeled "local data threads," provides a focused view of the current active iteration path. It shows a linear sequence: "global-energy" to "table-17" (displaying "Renewable Per..." by "Entity") to "table-56" (displaying "Rank" by "Year" for different entities, with the instruction "Rank countries by renewable percentage"). This local view offers shortcuts related to the last operation (leading to "table-56"): an icon suggesting a rerun/refresh, a text input field for a follow-up instruction (pre-filled with "Rank country by CO2 emission instead"), and an icon suggesting expansion or editing of the previous instruction.
  • AI Code Reuse and History Attachment: The caption and reference text indicate that AI reuses or modifies code for new transformations, and that code and conversation history are attached to data nodes. The figure visually supports the code attachment by showing different Python snippets for different transformations in the global view.
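The branching history described above maps naturally onto a tree in which each derived table remembers its parent, the instruction that produced it, and the generated code. The sketch below is a hypothetical data structure for illustration, not DF2's internal representation; the class and method names are invented, and the code strings and the instructions for "table-17" and "table-74" are placeholders because the figure does not make them legible.

```python
class DataNode:
    """One table in a data thread, with the instruction and code that made it."""

    def __init__(self, name, instruction=None, code=None, parent=None):
        self.name = name
        self.instruction = instruction  # NL instruction, None for the source table
        self.code = code                # transformation code attached to the node
        self.parent = parent
        self.children = []

    def derive(self, name, instruction, code):
        """Create a new branch from this node."""
        child = DataNode(name, instruction, code, parent=self)
        self.children.append(child)
        return child

    def local_thread(self):
        """Names on the path from the root to this node (the linear local view)."""
        path, node = [], self
        while node is not None:
            path.append(node.name)
            node = node.parent
        return list(reversed(path))


root = DataNode("global-energy")
t17 = root.derive("table-17", "(instruction not legible in the figure)", "...")
t56 = t17.derive("table-56", "Rank countries by renewable percentage", "...")
t74 = root.derive("table-74", "(instruction not legible in the figure)", "...")

print(t56.local_thread())
```

Under this reading, the global view is the whole tree rooted at `global-energy`, while the local data thread is simply the root-to-node path of whichever table is currently selected.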
Scientific Validity
  • ✅ Illustrates data provenance and iterative refinement: The figure effectively illustrates the concept of data provenance and iterative refinement in a visual analytics system. The branching data threads and local iteration shortcuts are plausible and well-established mechanisms for supporting exploratory data analysis.
  • ✅ Demonstrates AI-assisted code transformation concept: The depiction of AI reusing/modifying code (implied by different Python snippets for different transformations originating from similar base data) is consistent with the paper's claims about AI-assisted data transformation. This demonstrates a core aspect of the system's intelligence.
  • ✅ Sound interaction model for local iteration: The local data thread with shortcuts for rerunning, follow-up instructions, and editing previous steps provides a sound interaction model for efficient iteration and error correction, which is crucial for practical usability.
  • 💡 Conversation history attachment not explicitly visualized: The reference text states that "code and the conversation history are attached to each data node." The figure visually shows code snippets associated with transformations (edges) or resulting data states (nodes). However, the "conversation history" (e.g., the sequence of NL prompts or AI responses beyond the code itself) is not explicitly visualized as attached to each node in this figure. This is a minor omission if the primary intent is to show code reuse.
  • ✅ Plausible AI code generation examples: The figure implies that the AI generates Python code for data transformation. The specific Python functions shown (e.g., calculating percentages, ranking) are realistic examples of data wrangling tasks. The scientific validity of this aspect hinges on the AI's actual capability to generate correct and efficient code for a wide range of user requests, which is not something this figure alone can prove but it appropriately illustrates the system's designed behavior.
Communication
  • ✅ Effective side-by-side comparison: The side-by-side presentation of the global "Data threads" (left) and the "local data threads" (right) effectively contrasts the broader exploration history with the focused current iteration path.
  • ✅ Clear depiction of branching history: The visual representation of branching in the global "Data threads" clearly communicates the non-linear nature of data exploration and the system's support for it.
  • ✅ Clear illustration of local iteration shortcuts: The local data thread on the right, with its focused view and clearly indicated shortcut options (implied rerun, text input for follow-up, expand icon), successfully illustrates the mechanisms for quick iteration and revision.
  • ✅ Caption aligns with visual elements: The caption aligns well with the visual elements, particularly in describing the functionalities of the local data thread and the concept of AI reusing code for new transformations (implied by different code snippets shown in the global view).
  • 💡 Legibility of code snippets: The Python code snippets shown within the global "Data threads" view are very small and their content is largely illegible. While they serve to indicate the presence of code, their specific details cannot be discerned. Suggestion: For an overview figure, this is acceptable, but if the specific nature of code reuse or modification was critical to convey, consider using callouts with larger, more readable code excerpts.
  • 💡 Clarity of icons without caption: The icons for "rerun" (a refresh-like symbol) and "expand" (a square with an arrow) in the local data thread panel are somewhat generic. Their specific functions are primarily understood from the caption's enumeration. Suggestion: In an actual UI, tooltips would be essential. For the figure, ensuring the caption's numbered points clearly map to these visual elements is important, which is currently done.
  • 💡 Minor ambiguity in caption phrasing: The phrase "local data threads (right)" in the caption is a bit ambiguous. It means the local data thread component is shown on the right side of the figure. Suggestion: Phrasing like "local data threads (shown on the right)" could offer slightly more clarity.
Figure 9: DF2 provides explanations of the code generated by AI to assist users...
Full Caption

Figure 9: DF2 provides explanations of the code generated by AI to assist users understand the data transformation. This example is the explanation of the code behind table-56 in Figure 8.

Figure/Table Image (Page 9)
First Reference in Text
Figure 9 shows the explanation for the code behind table-56 in Figure 8.
Description
  • Purpose and Context of Explanation: Figure 9 displays a pop-up window from the DF2 interface titled "Data transformation explanation (global-energy → table-56)". This window provides a step-by-step natural language explanation of the data transformation process that the AI performed to convert an initial dataset (referred to as "global-energy") into a derived data table ("table-56"). This specific transformation is related to the example shown in Figure 8, where "table-56" represents data with country rankings by renewable energy percentage.
  • Step-by-Step Transformation Logic: The explanation consists of four numbered steps. (1) Calculate `Total electricity (TWh)` by summing `Electricity from fossil fuels (TWh)`, `Electricity from nuclear (TWh)`, and `Electricity from renewables (TWh)`; TWh stands for terawatt-hour, a unit of energy. (2) Determine `Renewable_percentage` by dividing `Electricity from renewables (TWh)` by `Total electricity (TWh)`; the step does not state the multiplication by 100 that a percentage implies. (3) Rank each `Entity` (a country or region) by `Renewable_percentage` within each `Year`, with the highest percentage receiving rank 1. (4) Select and return the columns `Year`, `Entity`, and `Rank`, so the final table ("table-56") contains only these three fields for each record.
  • Disclaimer and Additional Information: At the bottom of the explanation window, there is a disclaimer: "AI generated results can be inaccurate, inspect it!" This serves as a caution to the user. Below this, "data: table-56" indicates the specific data table this explanation pertains to, and there are icons suggesting further actions or information (e.g., an information icon, a thumbs up/down for feedback).
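The four explained steps can be reproduced in a few lines of plain Python, which is one way a user could check the explanation against the data. This is an independent sketch, not the AI-generated code behind table-56: it assumes the total is never zero, applies the multiplication by 100 (which does not affect the ranking), and breaks ties by input order because the explanation does not say how ties are handled. The toy rows are made-up values.

```python
def rank_by_renewable_percentage(rows):
    """Steps 1-4: total electricity, renewable percentage, per-year rank, select."""
    enriched = []
    for r in rows:
        # Step 1: total electricity from the three sources.
        total = (
            r["Electricity from fossil fuels (TWh)"]
            + r["Electricity from nuclear (TWh)"]
            + r["Electricity from renewables (TWh)"]
        )
        # Step 2: renewable share as a percentage (assumes total != 0).
        pct = r["Electricity from renewables (TWh)"] / total * 100
        enriched.append({"Year": r["Year"], "Entity": r["Entity"], "pct": pct})

    # Step 3: rank entities within each year, highest percentage first.
    by_year = {}
    for r in enriched:
        by_year.setdefault(r["Year"], []).append(r)

    result = []
    for year in sorted(by_year):
        ordered = sorted(by_year[year], key=lambda r: r["pct"], reverse=True)
        for rank, r in enumerate(ordered, start=1):
            # Step 4: keep only Year, Entity, and Rank.
            result.append({"Year": year, "Entity": r["Entity"], "Rank": rank})
    return result


rows = [
    {"Year": 2000, "Entity": "A", "Electricity from fossil fuels (TWh)": 80.0,
     "Electricity from nuclear (TWh)": 0.0, "Electricity from renewables (TWh)": 20.0},
    {"Year": 2000, "Entity": "B", "Electricity from fossil fuels (TWh)": 50.0,
     "Electricity from nuclear (TWh)": 0.0, "Electricity from renewables (TWh)": 50.0},
]
table_56 = rank_by_renewable_percentage(rows)
```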
Scientific Validity
  • ✅ Enhances transparency of AI actions: The provision of a natural language explanation for AI-generated code is a valuable feature for transparency and user understanding. It allows users, especially those less familiar with the specific programming language (e.g., Python, as seen in Figure 8), to comprehend the logic behind the data transformations.
  • ✅ Logical and correct transformation steps: The steps outlined in the explanation (calculating total electricity, then renewable percentage, then ranking, and finally selecting columns) represent a logical and correct sequence of operations to achieve the goal of ranking entities by renewable energy percentage. This aligns with standard data analysis practices.
  • ✅ Consistent with preceding figures and claims: The explanation directly corresponds to the transformation required to produce "table-56" as conceptualized in Figure 8 (ranking countries by renewable percentage). This consistency supports the claim that DF2 can explain its operations.
  • ✅ Acknowledges AI limitations: The inclusion of the disclaimer about potential inaccuracies in AI-generated results is a methodologically sound approach, acknowledging the current limitations of AI and promoting critical user engagement.
  • 💡 Accuracy contingent on AI's explanation generation capability: While the explanation is clear, its accuracy is contingent on the AI's ability to correctly generate both the code and this corresponding explanation. The figure itself shows the explanation, not the process of generating it. The scientific validity of the explanation feature relies on the robustness of the AI model used to produce such explanations consistently and accurately across various transformations.
  • 💡 Minor imprecision in percentage calculation description: Step 2, "Determine the Renewable_percentage by dividing Electricity from renewables (TWh) by Total electricity (TWh)," correctly describes the calculation of a ratio. For it to be a percentage, multiplication by 100 is implied but not explicitly stated in this step. The Python code in Figure 8 for table-56 does include `* 100`. Ensuring the explanation is fully explicit (e.g., "...and multiplying by 100 to express as a percentage") would improve precision, though it's a minor point as the term 'percentage' itself implies this.
Communication
  • ✅ Clear, sequential explanation: The use of a numbered list for the explanation steps makes the transformation process easy to follow and understand sequentially.
  • ✅ Highlighting of key terms: Key terms within the explanation, such as `Total electricity (TWh)`, `Renewable_percentage`, `Year`, `Entity`, and `Rank`, are highlighted (e.g., bolded or distinctly formatted), which improves readability and helps the user quickly identify important variables or concepts.
  • ✅ Accessible language: The explanation is concise and uses relatively plain language, making it accessible even to users who may not be Python programming experts but understand data manipulation concepts.
  • ✅ Prominent disclaimer: The disclaimer "AI generated results can be inaccurate, inspect it!" is prominently displayed, which is good practice for systems involving AI-generated content, encouraging user verification.
  • ✅ Clear title indicating scope: The title "Data transformation explanation (global-energy → table-56)" clearly indicates the scope of the explanation, linking it to specific data states shown in other figures (like Figure 8).
  • ✅ Clean visual design: The visual design is clean and uncluttered, resembling a typical pop-up or modal window, which is appropriate for displaying such information without overwhelming the main interface.

Results

Key Aspects

Strengths

Suggestions for Improvement

Non-Text Elements

Figure 10: Participants' self-reported roles, expertise in chart creation, data...
Full Caption

Figure 10: Participants' self-reported roles, expertise in chart creation, data transformation, programming, and AI assistants (1=novice, 4=expert), task completion time, and hints needed during study tasks.

Figure/Table Image (Page 9)
First Reference in Text
Participants self-rated their skills (Figure 10) on a scale of 1 to 4 ("Novice," "Intermediate," "Proficient," and "Expert") in: (1) chart creation - experience with chart authoring tools or libraries, (2) data transformation - experience with data transformation tools and library expertise, (3) programming, and (4) AI assistants - experience with large language models (e.g., ChatGPT [1]) and prompting.
Description
  • Table Overview: Figure 10 is a table summarizing data for eight study participants, labeled P1 through P8. It includes their self-reported professional roles, expertise levels in four areas, task completion times for two datasets, and the number of hints they required.
  • Participant Roles: Participants' roles include "Developer" (P1, P4, P5, P8), "Data Scientist" (P2, P6, P7), and "Data Architect" (P3).
  • Self-Reported Expertise: Expertise levels are self-reported on a scale of 1 (novice) to 4 (expert) in four categories: "Chart" (chart creation), "Data" (data transformation), "Coding" (programming), and "AI" (AI assistants). For example, participant P1 rated themselves as 3 for Chart, 4 for Data, 4 for Coding, and 2 for AI. Participant P4 reported relatively low expertise overall (1 for Chart, 2 for Data, 3 for Coding, 2 for AI), although not the lowest rating in every category, while P2 and P3 reported higher expertise with identical ratings (3, 4, 4, 4 each).
  • Task Completion Times: Task completion times are provided for two datasets, "Dataset 1" and "Dataset 2", measured in seconds (s). For Dataset 1, times ranged from 715s (P3) to 1638s (P7). For Dataset 2, times ranged from 1148s (P6) to 2937s (P5).
  • Hints Needed: The final column, "Hints," shows the number of hints each participant needed during the study tasks. This ranged from 0 hints (P2, P3) to 3 hints (P6).
  • Unusual Markings: There are some unusual markings: P5 has an exclamation mark next to their ID, and P6 has a "-1" next to their ID. P5's AI expertise is bolded as '1', and P6's Data and Coding expertise are bolded as '2'. The meaning of these markings is not explicitly defined in the provided caption or reference text.
Scientific Validity
  • ✅ Summarizes relevant participant characteristics: The table effectively summarizes key characteristics of the study participants, which is important for understanding the context of the user study results and assessing potential influences of background on performance.
  • 💡 Self-reported expertise has inherent limitations: Presenting self-reported expertise is a common practice in HCI studies. However, self-reported data can be subject to biases (e.g., Dunning-Kruger effect, modesty bias). Acknowledging this limitation, or supplementing with objective measures if possible, would strengthen the study, though for descriptive purposes of the sample, it is acceptable.
  • 💡 Small sample size: The sample size (N=8) is relatively small, which is common for qualitative or in-depth usability studies. However, this limits the generalizability of quantitative findings (like average completion times or correlation between expertise and performance). The table itself is descriptive and doesn't make inferential claims, so this is acceptable for its purpose.
  • ✅ Includes objective performance metrics: The inclusion of task completion times and hints needed provides objective performance metrics that can be qualitatively related to the self-reported expertise and roles, offering richer insights into user experiences.
  • 💡 Unexplained symbols and formatting reduce clarity and interpretability: The unexplained symbols (P5!, P6-1) and bolded numbers are problematic as they introduce ambiguity. If these denote specific participant characteristics or events (e.g., P5 encountered a specific technical issue, P6 was a pilot participant with a slightly different setup), this context is missing and crucial for correct interpretation.
  • ✅ Relevant expertise categories: The categories for expertise (Chart, Data, Coding, AI) are relevant to a study on an AI-powered data visualization tool. This allows for exploring how different skill sets might interact with the system.
Communication
  • ✅ Effective use of tabular format: The tabular format is highly effective for presenting a summary of diverse participant characteristics and performance metrics in a structured and comparable way.
  • ✅ Clear column headers: Column headers are clear and concise (e.g., "ID", "Role", "Chart", "Data", "Coding", "AI", "Dataset 1", "Dataset 2", "Hints"), making it easy to understand what each column represents.
  • ✅ Informative caption with scale explanation: The caption provides essential context, including the meaning of the expertise scale (1=novice, 4=expert), which is crucial for interpreting the data.
  • 💡 Unexplained formatting/symbols: The bolding of participant P5's expertise in "AI" (value 1) and P6's expertise in "Data" (value 2) and "Coding" (value 2) seems to indicate outliers or specific points of interest, but this is not explained in the caption or reference text. If these are intended highlights, their significance should be clarified. Suggestion: Explain any specific formatting (such as the bolding, the exclamation mark next to P5, and the '-1' next to P6) in the caption or a footnote.
  • 💡 Interpretability of time data: The task completion times are presented in seconds (e.g., "1047s"). While precise, converting these to minutes and seconds (e.g., 17m 27s) might be more immediately interpretable for some readers. Suggestion: Consider adding a secondary representation or a note about typical time ranges in minutes.
  • ✅ Good information density: The table is compact and presents a good amount of information without appearing overly cluttered.
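The minutes-and-seconds conversion suggested above is simple arithmetic. The following is a minimal sketch, applied to the completion times quoted in this section. The helper name `format_duration` is illustrative and does not come from the paper.

```python
def format_duration(total_seconds: int) -> str:
    """Render a duration given in whole seconds as 'Xm YYs'."""
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}m {seconds:02d}s"


# Completion times quoted in the description of Figure 10, in seconds.
reported_times = [715, 1047, 1148, 1638, 2937]

for t in reported_times:
    print(f"{t}s = {format_duration(t)}")
```

For instance, 1047s is rendered as 17m 27s, matching the example given in the suggestion.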
Figure 11: The dataset and tasks in our user study. (1) Dataset 1:...
Full Caption

Figure 11: The dataset and tasks in our user study. (1) Dataset 1: Understanding top earning majors and the relation between salary and women percentage. (2) Dataset 2: Exploring movie genres with best return-on-investment values (profit vs. profit ratio) and top movies. The branching directions are added for illustration; participants developed their own iteration strategies. We refer to these target charts as C1-7 for the college dataset and M1-9 for the movies dataset.

Figure/Table Image (Page 11)
First Reference in Text
Figure 11-1 shows the first data exploration session: given a dataset on college majors and income data (173 rows × 7 columns), participants were asked to create seven visualizations: two basic charts and five requiring data transformation.
Description
  • Task Overview (Dataset 1): Panel 1 of Figure 11 (referred to as Figure 11-1 in the text) outlines the first user study task, which involves analyzing a dataset on college majors and income. The stated goal is "Understanding top earning majors and the relation between salary and women percentage."
  • Initial Data Table (Dataset 1): An initial data table snippet is shown at the top left. It includes columns like "Code", "Major", "Men", "Women", "Major Category", "Employed", and "Median Salary". For example, for Major "ACCOUNTING" (Code 6201), there are 94519 Men, 104114 Women, it falls under "Business" Major Category, has 165527 Employed, and a Median Salary of 45000.
  • Sequence of Target Charts and Transformations (C1-C7): The panel then illustrates a sequence of seven target chart visualizations (C1 through C7) that participants were asked to create. This sequence involves several data transformations and analytical steps:
      - C1 & C2: Labeled as "Basic charts". C1 appears to be a scatter plot of Median Salary vs. women percentage. C2 is also a scatter plot, likely related to Major Category and Median Salary.
      - C3: From C2, a transformation "top 20 earning majors" leads to C3 (a bar chart of majors by median salary).
      - C4: C3 is then transformed by "color by Major Category" to produce C4 (a similar bar chart, but with bars colored by major category).
      - C5: Another branch from C1 involves "calculate women percentage and salary" leading to C5 (a scatter plot of median salary vs. women percentage, possibly with different aggregations or filtering).
      - C6: From C5, a transformation "color by Major Category" leads to C6 (similar to C5 but colored by major category).
      - C7: Finally, from C6, a transformation "show top 4 the rest as 'others'" leads to C7 (a scatter plot where majors are grouped, with top categories highlighted and others grouped as 'others').
  • Iterative Nature of Task: The caption notes that "The branching directions are added for illustration; participants developed their own iteration strategies." This implies the depicted flow is a target outcome, but users might have taken different paths to achieve these visualizations.
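To make the transformations named in the description concrete, here is a minimal sketch in plain Python of three of them: deriving a women percentage, keeping the top N majors by salary, and collapsing all but the most common categories into "Others". It is not the code that Df2 generates. Only the ACCOUNTING row comes from the figure description; the other rows, the function names, and the choice to rank categories by number of majors are assumptions made for illustration.

```python
from collections import Counter

# Only the ACCOUNTING row is taken from the Figure 11 description.
# All other rows are invented placeholders for illustration.
majors = [
    {"Major": "ACCOUNTING", "Major Category": "Business",
     "Men": 94519, "Women": 104114, "Median Salary": 45000},
    {"Major": "PLACEHOLDER FINANCE", "Major Category": "Business",
     "Men": 600, "Women": 400, "Median Salary": 47000},
    {"Major": "PLACEHOLDER MECHANICS", "Major Category": "Engineering",
     "Men": 900, "Women": 100, "Median Salary": 60000},
    {"Major": "PLACEHOLDER CIRCUITS", "Major Category": "Engineering",
     "Men": 800, "Women": 200, "Median Salary": 62000},
    {"Major": "PLACEHOLDER PAINTING", "Major Category": "Arts",
     "Men": 300, "Women": 700, "Median Salary": 30000},
]


def add_women_percentage(rows):
    """Derive a 'Women Percentage' field from the Men and Women counts."""
    result = []
    for row in rows:
        total = row["Men"] + row["Women"]
        pct = 100.0 * row["Women"] / total if total else None
        result.append({**row, "Women Percentage": pct})
    return result


def top_n_by_salary(rows, n):
    """Keep the n rows with the highest median salary."""
    return sorted(rows, key=lambda r: r["Median Salary"], reverse=True)[:n]


def collapse_categories(rows, keep):
    """Keep the `keep` most common categories; label the rest 'Others'.

    Ranking categories by number of majors is an assumption; the figure
    label does not say how the top categories are chosen.
    """
    counts = Counter(r["Major Category"] for r in rows)
    top = {category for category, _ in counts.most_common(keep)}
    return [
        {**r, "Category Group": r["Major Category"]
         if r["Major Category"] in top else "Others"}
        for r in rows
    ]


with_pct = add_women_percentage(majors)
top_three = top_n_by_salary(majors, 3)
grouped = collapse_categories(with_pct, keep=2)
```

The study task uses the top 20 majors and the top 4 categories; the smaller values here only keep the toy data readable.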
Scientific Validity
  • ✅ Realistic multi-step exploration task: This panel effectively outlines a realistic multi-step data exploration task. The transformations described (filtering, aggregation, categorization) are common in data analysis and suitable for evaluating a data visualization tool.
  • ✅ Tests iterative refinement and derived data handling: The progression from basic charts to more complex ones requiring several transformations (e.g., C7 requiring grouping into 'others') provides a good test of the system's capabilities for iterative refinement and handling derived data.
  • ✅ Relevant and understandable dataset: The dataset itself (college majors, employment, salary, gender distribution) is relevant and commonly used for socio-economic analyses, making the task relatable and understandable.
  • ✅ Consistent with dataset description in text: The reference text mentions the dataset has 173 rows x 7 columns. The snippet shows 7 columns, which is consistent. The number of rows in the snippet is small, which is typical for an illustration.
  • ✅ Clarifies illustrative nature of branching: The caption clarifies that the branching is illustrative and participants developed their own strategies. This is an important methodological point, as it suggests the study was not strictly about reproducing a fixed sequence but about achieving analytical goals, allowing for user variability.
  • 💡 Validity contingent on system capabilities: The scientific validity of the tasks hinges on whether these charts (C1-C7) can indeed be produced by the Df2 system and whether the transformations are within its capabilities. The figure itself presents these as targets.
Communication
  • ✅ Clear illustration of data and task flow: The panel effectively uses a combination of a data table snippet and a flowchart-like progression of chart thumbnails (C1-C7) to illustrate the data exploration task. This provides both context (initial data) and the sequence of analytical steps.
  • ✅ Clear transformation labels: The labels for transformations (e.g., "top 20 earning majors", "calculate women percentage and salary", "color by Major Category", "show top 4 the rest as 'others'") are concise and clearly describe the operation performed at each step.
  • ✅ Informative chart thumbnails: The chart thumbnails (C1-C7), while small, give a reasonable impression of the type of visualization created at each stage, aiding in understanding the analytical progression.
  • ✅ Clear task goal in caption: The caption part "(1) Dataset 1: Understanding top earning majors and the relation between salary and women percentage" clearly defines the goal of this analytical task.
  • 💡 Legibility of data table snippet: The data table snippet is quite small, and the text within it (column headers and data values) is difficult to read without zooming. Suggestion: Consider making the table snippet slightly larger or using a higher resolution image if the specific data values are important for the reader to see.
  • 💡 Information density: The arrows indicating the flow of transformations are clear, but the overall panel is information-dense. Suggestion: Ensure sufficient white space or visual grouping to prevent a cluttered appearance, especially given the multiple small chart elements.
Figure 12: Participants' workflow for study tasks in Figure 11 (C1-7 for...
Full Caption

Figure 12: Participants' workflow for study tasks in Figure 11 (C1-7 for college, M1-9 for movie). Each node represents a data table version, with blue for initial datasets, yellow for data tables instantiating (one or multiple) target visualizations in Figure 11 (number i in the node indicate the i-th target visualizations for the given dataset), and gray for others. Self-loop arrows indicate prompt revisions and data table updates ('×2' indicates two revisions).

Figure/Table Image (Page 12)
First Reference in Text
Figure 12 illustrates their organization of data threads in their workspaces upon completing the study tasks.
Description
  • Overall Structure: Small Multiples of Workflows: Figure 12 presents a series of small multiple diagrams, illustrating the individual workflows of eight participants (P1 through P8) for two separate study tasks: one involving a 'college' dataset (aiming to produce target charts C1-C7 from Figure 11) and another involving a 'movie' dataset (aiming for charts M1-M9 from Figure 11).
  • Node Representation and Color Coding: Each individual diagram is a node-link graph representing a participant's data exploration path, referred to as their 'data threads' organization. Nodes symbolize different versions of a data table. The color of a node indicates its role: blue nodes are initial datasets; yellow nodes represent data tables that were used to create one or more of the target visualizations (the specific target chart number, e.g., '1,2' or '7', is often shown inside the yellow node); and gray nodes are other intermediate data table versions created during the exploration.
  • Edge Representation and Iteration: Arrows between nodes indicate a transformation step, showing the progression from one data table version to another. Self-loop arrows (an arrow starting and ending on the same node) signify prompt revisions and data table updates. A notation like '×2' next to a self-loop indicates that two such revisions occurred on that data table version.
  • Variability in Workflow Complexity and Strategy: The diagrams visually demonstrate the diversity in participants' approaches. Some workflows are relatively linear with few branches (e.g., P2's college task), while others are highly branched with many intermediate steps (e.g., P5's movie task, which also includes a 'reset' point). The number of nodes and the complexity of the graph structure vary significantly across participants and tasks. For example, P1's college task workflow is very compact with few nodes, while P8's movie task workflow is more extensive.
  • Illustration of Iteration Styles: The figure aims to show how participants organized their data threads, illustrating different iteration styles such as creating wider versus deeper tree structures, or backtracking versus follow-up actions, as discussed in the paper's results section.
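The distinction between "wide" and "deep" trees can be made measurable. The sketch below assumes a workflow is stored as a mapping from each derived table to its parent table, then reports the depth, the largest number of branches from a single table, and the number of leaf tables. The two example workflows, the table names, and the function `thread_shape` are hypothetical. They are not taken from Figure 12, and self-loop revisions are not modeled.

```python
from collections import defaultdict


def thread_shape(parent_of):
    """Return (depth, max_branching, leaf_count) for a data-thread tree.

    parent_of maps each derived table to the table it was created from.
    The initial dataset appears only as a parent, never as a key.
    Depth is counted in transformation steps from the initial dataset.
    """
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    roots = [node for node in children if node not in parent_of]

    def depth(node):
        kids = children.get(node, [])
        return 0 if not kids else 1 + max(depth(k) for k in kids)

    max_depth = max(depth(root) for root in roots)
    max_branching = max(len(kids) for kids in children.values())
    leaf_count = sum(1 for node in parent_of if node not in children)
    return max_depth, max_branching, leaf_count


# Hypothetical "wide" workflow: several branches start at the initial dataset.
wide = {"t1": "college", "t2": "college", "t3": "college", "t4": "t1"}

# Hypothetical "deep" workflow: each table is derived from the previous one.
deep = {"t1": "college", "t2": "t1", "t3": "t2", "t4": "t3"}

print(thread_shape(wide))
print(thread_shape(deep))
```

Under these definitions a wide workflow shows high branching and low depth, while a deep workflow shows the reverse, which gives one way to compare participants beyond visual inspection.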
Scientific Validity
  • ✅ Appropriate visualization of qualitative workflow data: The figure provides a valuable visualization of qualitative data regarding user behavior and interaction strategies. Representing workflows as graphs is an appropriate method for capturing the iterative and often non-linear nature of data exploration.
  • ✅ Supports systematic comparison of iteration styles: The consistent coding scheme (colors, arrows) applied across all participants allows for systematic comparison of their approaches, supporting the paper's analysis of different iteration styles.
  • ✅ Supports claims made in reference text: The figure directly supports the reference text's claim that it illustrates the organization of data threads. It visually substantiates discussions about how participants managed their analytical history.
  • 💡 Granular detail requires careful interpretation for strategic patterns: The level of detail (individual data table versions as nodes) provides a granular view of the process. However, deriving higher-level strategic patterns solely from these visual graphs might require careful interpretation and could be complemented by other qualitative data (e.g., think-aloud protocols, interview excerpts).
  • ✅ Supports claim of diverse iteration strategies: The claim that "participants developed their own iteration strategies" (from Figure 11 caption, contextually relevant here) is well supported by the visual diversity in Figure 12's workflows.
  • 💡 'Reset' event lacks causal explanation in the figure itself: The 'reset' label for P5's movie task is an interesting data point. Its scientific interpretation would depend on understanding why the reset occurred (e.g., system error, user confusion, deliberate change in strategy). The figure shows the event but not the cause.
Communication
  • ✅ Consistent visual language for comparison: The use of a consistent visual language (nodes for data versions, arrows for transformations, color coding for node type) across all participant workflows allows for effective comparison of iteration strategies.
  • ✅ Effective color coding with clear legend: The color coding (blue for initial, yellow for target, gray for intermediate) is clearly explained in the caption and effectively distinguishes different types of data table versions within the workflows.
  • ✅ Comprehensive and clear caption: The caption is comprehensive and provides all necessary information to interpret the diagrams, including the meaning of nodes, colors, arrows, and special notations like '×2'.
  • ✅ Effectively visualizes variability in workflows: The figure successfully visualizes the non-linear and varied nature of participants' data exploration paths, highlighting individual differences in problem-solving approaches.
  • 💡 High information density: The figure is quite dense, with 16 small multiples (8 participants × 2 tasks). While this allows for a holistic view, individual workflow details can be hard to discern without zooming. Suggestion: If specific workflow patterns are being discussed in the text, consider using callouts or slightly larger versions of representative examples, or ensure the figure is rendered at a very high resolution in the final publication.
  • 💡 Legibility of numbers in yellow nodes: The numbers within the yellow nodes (indicating target visualizations C1-7 or M1-9) are small and can be difficult to read. Suggestion: Increase the font size for these numbers if possible, or ensure they are very crisp.
  • 💡 Undefined 'reset' label: The meaning of the 'reset' label on P5's movie task workflow is not explicitly defined in the caption, though it can be inferred as restarting a branch. Suggestion: Briefly define any unique labels like 'reset' in the caption or text.

Discussion and Future Work

Key Aspects

Strengths

Suggestions for Improvement