This paper introduces Data Formulator 2 (Df2), an AI-powered visualization system designed to address the challenges of iterative authoring in exploratory data analysis. Existing AI tools often require users to provide complete, text-only descriptions of visualizations upfront, which is impractical when analytical goals evolve during exploration. Df2 tackles this limitation by blending a graphical user interface (GUI) with natural language (NL) input, allowing users to specify chart designs precisely while delegating data transformation tasks to the AI. The system also introduces "data threads," a mechanism for tracking the history of data transformations and visualizations, enabling users to easily revisit, revise, and branch from previous steps.
The core of Df2's methodology involves decoupling chart specification from data transformation. Users define their visualization intent through a combination of GUI interactions (e.g., drag-and-drop field mapping) and concise NL instructions. The system then generates a Vega-Lite specification (a high-level grammar for interactive graphics) and prompts a large language model (LLM) to produce Python code for the necessary data transformations. Df2 executes this code, handles potential errors, and instantiates the Vega-Lite specification with the transformed data to generate the visualization. Data threads provide a visual representation of the user's interaction history, facilitating navigation and reuse of previous results.
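To make the described decoupling concrete, the pipeline can be sketched roughly as below: a Vega-Lite skeleton is built from GUI encodings, the model supplies Python transformation code, and the executed result fills the spec's data slot. This is an illustrative reconstruction under stated assumptions, not Df2's actual implementation; `fake_llm_transform` stands in for the real model call, and all function names are hypothetical.

```python
import pandas as pd

def build_vegalite_skeleton(chart_type, encodings):
    """Build a Vega-Lite spec skeleton from GUI-specified encodings;
    the data slot stays empty because transformation is delegated to the AI."""
    return {
        "mark": chart_type,
        "encoding": {channel: {"field": field}
                     for channel, field in encodings.items()},
        "data": {"values": []},
    }

def fake_llm_transform(instruction, df):
    """Stand-in for the LLM call that would return transformation code;
    here one pandas transformation is hard-coded for illustration."""
    return "result = df.groupby('country', as_index=False)['renewables'].mean()"

def run_pipeline(df, chart_type, encodings, instruction):
    spec = build_vegalite_skeleton(chart_type, encodings)
    code = fake_llm_transform(instruction, df)
    scope = {"df": df}
    exec(code, scope)                        # execute the AI-generated code
    spec["data"]["values"] = scope["result"].to_dict("records")
    return spec
```

In Df2 itself the transformation code comes from the model, and a failure at the `exec` step would trigger the error-correction mechanism the paper describes rather than surfacing directly to the user.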
A user study with eight participants of varying expertise in data analysis and programming demonstrated Df2's effectiveness in supporting iterative visualization authoring. Participants successfully completed a series of tasks involving the creation of 16 visualizations requiring diverse data transformations. The study revealed distinct iteration styles: some users preferred broader exploration with multiple branches (wide trees), while others favored deeper, more linear progressions (deep trees). Participants also employed various prompting techniques, ranging from imperative commands to questions and chat-style interactions. The study highlighted the importance of Df2's transparency features, such as code explanations and data provenance tracking, in building user trust and facilitating verification of AI-generated outputs.
The discussion explores future directions for Df2, including integration with visualization recommendation systems and the development of agent-based systems for coordinating data transformation and chart editing. The authors acknowledge the limitations of the user study, particularly its focus on reproduction tasks and the lab setting, and propose future research involving open-ended exploration and longitudinal studies to investigate long-term user behavior and learning effects.
Data Formulator 2 (Df2) presents a compelling approach to iterative visualization authoring, effectively addressing the limitations of existing AI-powered tools. The system's innovative blend of GUI and NL input, coupled with its sophisticated data threading mechanism, empowers users to navigate complex data transformations and explore diverse visualization strategies with remarkable efficiency. The user study, while limited in sample size, provides strong qualitative evidence for Df2's usability and its potential to transform data analysis workflows. The system's transparency features, including code explanations and data provenance tracking, foster user trust and facilitate verification of AI-generated outputs.
However, the study's reliance on reproduction tasks and the lab setting constrains the generalizability of findings to real-world, open-ended exploration scenarios. Future research addressing these limitations, along with the proposed enhancements for recommendation systems and agent-based chart editing, will be crucial for realizing Df2's full potential. The core strength of this work lies in its robust, user-centered design and its potential to democratize access to sophisticated data visualization techniques by lowering the barrier to entry for users with varying levels of programming expertise. The integration of AI capabilities within an intuitive interface offers a promising pathway for more efficient, insightful, and accessible data exploration.
The abstract effectively establishes the existing gap in AI-powered visualization tools, specifically their inadequacy for iterative authoring, which is a common practice in exploratory data analysis.
The abstract clearly introduces Data Formulator 2 (Df2) as the proposed solution and immediately states its primary design goal: to overcome the limitations of existing systems in iterative authoring.
The abstract successfully communicates the core mechanisms of Df2 that address the identified problem, such as the blend of GUI and NL inputs, AI-driven data transformation, and support for navigating iteration history.
The inclusion of a user study with a specific number of participants lends credibility to the system's claims and indicates that the findings are backed by empirical evidence.
The abstract mentions that Df2 helped participants complete "challenging data exploration sessions." While conciseness is key in an abstract, a brief descriptor of these challenges (e.g., involving complex data transformations or multi-step analyses) would clarify the context of the user study's findings, a crucial part of summarizing the paper's contribution, without significantly increasing length. This is a low-impact change.
Implementation: Consider revising the last sentence to incorporate a brief qualifier for the challenges. For example: "A user study with eight participants demonstrated that Df2 allowed participants to develop their own iteration styles to complete challenging data exploration sessions, such as those involving evolving analytical goals and multi-step data transformations."
The Introduction clearly articulates the core problem: the mismatch between the iterative nature of data exploration and the capabilities of existing AI-powered visualization tools. It effectively sets the stage by detailing why current solutions fall short.
The paper doesn't just state a general problem but pinpoints specific deficiencies in current tools, namely the issues with text-only prompts (lack of precision, difficulty in describing complex designs) and the lack of support for iterative behaviors like branching and backtracking.
The Introduction provides a strong logical bridge from the identified problems to the proposed key insights of Df2. The multi-modal chart builder and data threads are presented as direct responses to the limitations discussed.
The section concludes with a clear, bulleted list of the paper's main contributions, which helps the reader understand the scope and impact of the work upfront.
The Introduction mentions that the user study 'discovered data analysts’ different iteration styles.' While the full details are rightly reserved for later sections, providing a very brief, high-level characterization of these styles (e.g., 'ranging from cautious, linear refinements to more exploratory, branched investigations') within the introduction could further pique reader interest and make this specific contribution more tangible from the outset. This is a medium-impact suggestion that could enhance the foreshadowing of key findings, fitting well within the summary of contributions.
Implementation: Consider expanding the sentence slightly, for example: 'We conducted a user study that discovered data analysts’ different iteration styles (e.g., varying in their approach to branching and refinement) and rich experiences using our new interaction approaches...'
Figure 1: With Data Formulator 2, analysts can iterate on a previous design by (1) selecting a chart from data threads and (2) providing combined natural language and graphical user interface inputs in the chart builder to specify the new design. The AI model generates code to transform the data and update the chart. Data threads are updated with new charts for future use.
Figure 2: An analyst explores electricity from different energy sources, renewable percentage trends, and country rankings by renewable percentages using a dataset on CO2 and electricity for 20 countries (2000-2020, table 1). The analyst creates five data versions in three branches to support different chart designs. DF2 allows users to manage iteration directions and create rich visualizations using a blended UI and natural language inputs.
Figure 3: DF2 overview. Users create visualizations by providing fields (drag-and-drop or type) and NL instructions to the Chart Builder, delegating data transformation to AI. Data View shows derived data. Users navigate data history and select contexts for the next iteration using data threads (the thread in use is displayed as local data threads). They refine or create new charts by providing instructions in the Chart Builder. The main panel provides pop-up windows to inspect code, explanations, and chat history.
Figure 4: Experiences with DF2: (1) creating the basic renewable energy chart using drag-and-drop to encode fields; (2 and 3) creating charts requiring new fields by providing field names and optional natural language instructions to derive new data.
Figure 5: Iteration with DF2: (1) provide an instruction to filter the renewable energy percentage chart by top CO2 countries, (2) update the chart with Global Median? and instruct DF2 to add the global median alongside the top 5 CO2 countries' trends, and (3) move Global Median? from column to opacity to update the chart design without deriving new data.
The section clearly lays out the foundational design choices of Df2—decoupling chart specification from data transformation and using data threads for iteration—providing a strong conceptual framework for the subsequent detailed descriptions. This upfront clarity helps the reader understand the core architectural decisions.
The description of how users compose charts using a blend of GUI (shelf-configuration) and NL inputs is thorough and effectively justifies the benefits of this approach, such as saving users effort in writing verbose prompts for complex designs.
The method details a sophisticated, multi-segment prompting strategy for the LLM, including a "goal refinement" step and the inclusion of dialog history. The automated error correction mechanism, where Df2 queries the LLM with error messages, demonstrates a robust approach to AI integration.
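The automated error-correction mechanism described here can be sketched as a simple retry loop: execute the generated code, and on failure feed the error message back to the model. A minimal sketch with hypothetical function names; Df2's actual prompt segments and retry policy may differ.

```python
def generate_with_repair(prompt, ask_llm, execute, max_retries=3):
    """Ask the model for code; if execution fails, send the error
    message back and retry, mirroring the described repair loop."""
    code = ask_llm(prompt)
    for _ in range(max_retries):
        try:
            return execute(code)
        except Exception as err:
            code = ask_llm(
                f"{prompt}\n\nThe previous code failed with:\n{err}\nPlease fix it."
            )
    raise RuntimeError("could not produce runnable transformation code")
```

The loop caps the number of model round-trips, which matters because each repair attempt adds latency and cost to the interaction.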
The distinction between global and local data threads, along with their specific roles in navigation, context awareness, and facilitating quick revisions, is well-explicated. This highlights a nuanced understanding of user needs during iterative analysis.
The system provides multiple avenues for users to inspect AI-generated results (data, code, explanations, chat history) and allows direct manipulation of chart styles without AI intervention. This empowers users and builds trust.
The "goal refinement" step, where the LLM elaborates the user's intent into a JSON object before code generation, is an important design feature for improving transformation accuracy. While its rationale is clearly stated, the Method section could enhance reader understanding by briefly clarifying how, or if, this refined goal is exposed to the user. Knowing whether users can inspect or influence this AI-interpreted goal before final code generation is pertinent to understanding the system's transparency and the user's agency in the AI-assisted workflow. This is a medium-impact suggestion as it relates to the interpretability of the AI's intermediate reasoning and user oversight.
Implementation: Add a sentence after describing the "goal refinement" step (page 6) to specify if the refined JSON is visible to the user (e.g., in logs, or as a pre-confirmation step) or if it's a purely internal process. For instance: "This refined JSON goal is logged as part of the interaction history, accessible via the 'view chat history' pop-up, allowing users to retrospectively understand the LLM's interpretation, though it is not presented for pre-confirmation in the current design."
The paper comprehensively describes error handling for the AI-generated Python data transformation code. However, after successful code execution, the process involves instantiating the Vega-Lite script with the new data, including inferring semantic types. It would strengthen the Method section to briefly address how Df2 handles potential errors that might arise specifically during this Vega-Lite instantiation or subsequent rendering phase (e.g., type mismatches not caught by Python, Vega-Lite spec errors, or rendering engine issues with the transformed data). This is a low-to-medium impact suggestion that would provide a more complete picture of the system's robustness.
Implementation: Following the description of Vega-Lite script instantiation (page 7), add a sentence clarifying the handling of errors at this stage. For example: "If errors arise during the Vega-Lite instantiation or rendering (e.g., due to incompatible data types with the chart template or malformed Vega-Lite specifications), Df2 currently surfaces these errors to the user, prompting a revision of either the chart design or the transformation logic. Future work could explore AI-assisted diagnostics for such visualization-specific errors."
Figure 6: DF2's workflow: (1) DF2 generates a Vega-Lite spec skeleton based on user specifications and chart type. (2) If new fields (e.g., Rank) are required, DF2 prompts its AI model to generate data transformation code. (3) The Vega-Lite skeleton is then instantiated with the new data to produce the desired chart.
Figure 7: DF2 converts user encodings into a Vega-Lite specification, which is combined with AI-transformed data to visualize country ranks in 2000 and 2020.
Figure 8: Data threads and local data threads (right). Users can select previous data or charts to create new branches, and the AI reuses code for new transformations based on user instructions. The local data thread offers shortcuts to (1) rerun the previous instruction, (2) issue a follow-up instruction, or (3) expand the previous card to revise and rerun the instruction.
Figure 9: DF2 provides explanations of the AI-generated code to help users understand the data transformation. This example explains the code behind table-56 in Figure 8.
The Results section clearly presents quantitative data on task completion rates and times, providing a solid baseline for Df2's performance in the hands of users. This is complemented by a well-categorized breakdown of hints requested, offering insights into areas where users faced challenges.
The integration of direct participant quotes throughout the section is highly effective. These quotes vividly illustrate user experiences, particularly when comparing Df2 to other tools and explaining their interaction strategies, adding depth and authenticity to the findings.
The paper provides a nuanced and detailed characterization of the diverse iteration styles (wide vs. deep trees; backtracking/revise vs. follow-up) and prompting techniques users developed. This qualitative analysis, supported by specific examples and user rationale, reveals valuable insights into how users adapt to and utilize novel AI-powered systems.
The section thoroughly explores participants' verification strategies, highlighting how different backgrounds influenced their methods for assessing AI-generated outputs (e.g., relying on code explanations, the code itself, or data tables). This sheds light on trust formation and the importance of transparency features.
The inclusion of 'Additional Feedback' detailing specific user suggestions for Df2 improvements (e.g., interface affordances, AI disambiguation) demonstrates a commitment to user-centered design and provides a clear pathway for future system refinements.
Medium impact. While the qualitative descriptions of iteration styles are rich and well-supported by quotes, augmenting this with quantitative data would provide more objective evidence for the observed behavioral clusters (e.g., 'wide vs. deep' and 'backtrack vs. follow-up'). This analysis directly pertains to user study data and belongs in the Results section. It would enhance the rigor of these findings, allowing for more direct comparisons between styles and potentially revealing correlations with user backgrounds or task outcomes. Such data could further substantiate the claims about distinct user approaches.
Implementation: Analyze the recorded user study data, particularly the interaction logs and workflow structures exemplified in Figure 12. Extract metrics such as: average number of branches created per participant, average depth of data threads, frequency of 'revise' actions (self-loops) versus 'follow-up' actions (new nodes) for each participant or user group. Present these metrics concisely, perhaps in a small table or integrated into the textual discussion of these iteration styles, to complement the qualitative observations.
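As one possible realization of this analysis, the suggested metrics could be computed from logged workflow trees roughly as follows. The edge-list input format is an assumption made for illustration, meant to mirror the trees in Figure 12 rather than Df2's actual log schema.

```python
from collections import defaultdict

def thread_metrics(edges, revisions):
    """Summarize an iteration tree: edges are (parent, child) pairs between
    data-table versions; revisions maps a node to its count of in-place
    prompt revisions (the self-loops in Figure 12)."""
    children = defaultdict(list)
    nodes = set()
    for parent, child in edges:
        children[parent].append(child)
        nodes.update((parent, child))
    roots = nodes - {child for _, child in edges}

    def depth(node):
        kids = children[node]
        return 1 if not kids else 1 + max(depth(k) for k in kids)

    return {
        "branches": sum(max(0, len(k) - 1) for k in children.values()),
        "max_depth": max(depth(r) for r in roots),
        "revise": sum(revisions.values()),       # self-loop actions
        "follow_up": len(edges),                 # new-node actions
    }
```

Per-participant values of these four numbers would directly quantify the wide-vs-deep and revise-vs-follow-up distinctions the section describes qualitatively.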
Medium-to-high impact. The Results section effectively details the diverse prompting styles adopted by participants. An analysis exploring potential correlations between these styles (e.g., imperative commands, questions, verbosity, direct data manipulation) and task-related outcomes—such as efficiency (time to completion, number of interactions), error rates (frequency of AI misinterpretations), or the need for hints—would offer significant insights. This analysis of user-generated data is appropriate for the Results section and could inform the development of prompting guidelines or adaptive AI feedback mechanisms within Df2, thereby enhancing usability and user success.
Implementation: Systematically categorize the prompts used by participants based on the styles already identified (e.g., imperative, question-based, verbose, concise, column-focused). For each participant or prompt style category, analyze corresponding task segments for metrics like time taken per sub-task, number of AI interactions required to achieve a correct visualization, and instances where hints were requested or significant corrections were needed. Report any observed correlations or notable patterns, or the lack thereof, to provide a richer understanding of prompt effectiveness.
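A rough sketch of how this categorization and outcome analysis might proceed, using a crude heuristic classifier as a placeholder for the manual coding of prompts that such a study would actually require. The log format and the thresholds are assumptions for illustration only.

```python
def classify_prompt(text):
    """Heuristic stand-in for manual coding of the prompt styles the study
    identifies (question-based, verbose, imperative)."""
    if text.rstrip().endswith("?"):
        return "question"
    if len(text.split()) > 15:
        return "verbose"
    return "imperative"

def outcomes_by_style(log):
    """Average interactions-to-correct-chart per prompt style, given a log
    of (prompt_text, n_interactions) pairs (hypothetical format)."""
    buckets = {}
    for prompt, n_interactions in log:
        buckets.setdefault(classify_prompt(prompt), []).append(n_interactions)
    return {style: sum(ns) / len(ns) for style, ns in buckets.items()}
```

Comparing these per-style averages (and similar aggregates for hint requests or correction counts) would surface the correlations, or their absence, that the suggestion calls for.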
Figure 10: Participants' self-reported roles, expertise in chart creation, data transformation, programming, and AI assistants (1=novice, 4=expert), task completion time, and hints needed during study tasks.
Figure 11: The dataset and tasks in our user study. (1) Dataset 1: Understanding top earning majors and the relation between salary and women percentage. (2) Dataset 2: Exploring movie genres with best return-on-investment values (profit vs. profit ratio) and top movies. The branching directions are added for illustration; participants developed their own iteration strategies. We refer to these target charts as C1-7 for the college dataset and M1-9 for the movies dataset.
Figure 12: Participants' workflows for the study tasks in Figure 11 (C1-7 for college, M1-9 for movies). Each node represents a data table version, with blue for initial datasets, yellow for data tables instantiating (one or multiple) target visualizations in Figure 11 (the number i in a node indicates the i-th target visualization for the given dataset), and gray for others. Self-loop arrows indicate prompt revisions and data table updates ('×2' indicates two revisions).
The discussion effectively articulates a forward-looking vision for Df2 by proposing integration with recommendation systems, leveraging Df2's unique strengths like dynamic data transformation and data threads to overcome limitations of existing recommenders.
The paper demonstrates a pragmatic understanding of current AI capabilities by proposing a balanced approach to chart editing—maintaining precise GUI control for stylistic refinements while exploring future AI-driven agent systems for more complex, unified interactions.
The discussion on proactive AI clarification highlights a user-centric approach to future development, aiming to reduce user effort in verification and build trust by making the AI a more intelligent conversational partner.
The authors exhibit strong methodological awareness by openly discussing the user study's limitations, such as the nature of the tasks and the lab environment, and by proposing specific future studies (open exploration, longitudinal) to address these gaps.
Medium-to-high impact. The discussion correctly identifies the increased risk of bias and irrelevant suggestions when Df2's data transformation capabilities expand the recommendation space. While acknowledging this is good, elaborating on potential research avenues or specific strategies to mitigate these issues would significantly strengthen this future work direction. This is pertinent to the Discussion section as it addresses a critical challenge for the proposed enhancements to recommendation capabilities, directly impacting the responsible development and deployment of such AI features.
Implementation: After the sentence 'Therefore, as part of future work, it would be valuable to explore ways to support visual recommendation in a larger exploration space, especially managing and communicating exploration paths to the user to prevent unintentional bias towards an undesired direction,' consider adding: 'This could involve research into developing algorithmic safeguards for recommendation diversity, incorporating interactive user feedback mechanisms to refine suggestion quality and flag potential biases, or designing transparent interfaces that clearly articulate the provenance and rationale behind AI-generated recommendations and alternative exploration paths.'
Medium impact. The proposal for an agent-based system to coordinate data transformation and chart editing is innovative. However, the discussion could benefit from briefly considering how user control and oversight would be maintained within such a system, particularly given that AI agents might require multiple interactions. Addressing user agency in complex AI interactions is crucial for system usability and trust, making it a relevant point for the Discussion section's future work considerations.
Implementation: Following the sentence 'The key challenge is managing response time and maintaining reliability, as AI agents often require multiple interactions to reach consensus,' add a sentence such as: 'Furthermore, future investigations should explore mechanisms for effective user intervention and preference articulation within these agent-based dialogues, ensuring users can guide, correct, or override agent decisions, particularly when consensus is slow to achieve or diverges from the user’s evolving intent.'