Spider2.0-GUI: Can Multimodal Agents Achieve Expert Proficiency in Data Science and Engineering?
Abstract
The field of data science and engineering is crucial for harnessing large-scale data to assist both individuals and enterprises in analytical processing and automated orchestration. Despite this significance, large language model~(LLM)-based data agents remain underexplored, particularly with respect to professional data engineering tools such as {\tt dbt}, {\tt Airflow}, and {\tt Airbyte}, which are complex to use and involve intensive GUI operations. To bridge this gap, we introduce Spider2.0-GUI, the first benchmark focusing on enterprise data engineering software across the full data pipeline. It encapsulates $486$ tasks involving $20$ professional applications, spanning data warehousing, ingestion, transformation, analysis, visualization, and orchestration. Each task is paired with both abstract and verbose instructions to accommodate different levels of user expertise. We also build a comprehensive document warehouse of $11,231$ documents for Spider2.0-GUI to support retrieval-augmented agent frameworks. The benchmark is further equipped with a real-time, executable Ubuntu desktop environment that interacts with the real-world internet, providing a realistic and dynamic testing ground. Preliminary results with state-of-the-art vision language models~(VLMs) indicate that even the most advanced model achieves only an $11\%$ success rate~(SR) with abstract instructions and a $21\%$ SR with verbose instructions~(i.e., step-by-step tutorials). This benchmark not only probes the competencies of data agents, but also paves the way for future advances in automating real-world data science and engineering tasks.