TUMIX: Augmenting LLM Reasoning with a Dynamic Tool-Use Mixture

Chuchu Fan

Tomas Pfister

Jiefeng Chen

Na Li

Chi Wang

Ji Yin

Yongchao Chen

Rui Meng

Jinsung Yoon

2025


Abstract

Integrating tools such as Code Interpreter and Search has significantly improved the reasoning of Large Language Models (LLMs), as shown by leading models such as OpenAI's ChatGPT Agent, Google's Gemini-Pro, and XAI's Grok4. However, the research community still lacks practical guidance on how to get the most out of these tools. The main challenge is finding an effective way to combine the benefits of textual reasoning, coding, and searching across diverse questions. To address this, we propose an ensemble-based framework that runs multiple agents in parallel, each exploring different answer paths with distinct tool-use strategies. Agents iteratively share and refine their answers by considering the original question and previous responses. Our proposed method, Tool-Use Mixture (TUMIX), achieves significant gains over other representative tool-augmented test-time scaling methods such as Self-MoA, Symbolic-MoE, DEI, SciMaster, and GSA. At near-equal inference cost, TUMIX delivers an average +3.55% accuracy improvement over the best baseline on Gemini-2.5-Pro and Gemini-2.5-Flash across key reasoning benchmarks (HLE, GPQA, AIME 24&25), where coding and search can effectively support reasoning when applied properly. We find that agent diversity and quality are crucial, and that both can be further improved by querying LLMs to automatically optimize agent designs. To reduce costs, TUMIX halts refinement once sufficient confidence is reached, preserving nearly the same performance at just 49% of the inference cost. With further scaling, TUMIX can achieve even higher performance, though at substantially greater cost.
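The refinement loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. The agent signature, the toy agents, the function names, and the agreement-based stopping rule are all assumptions made for illustration. The abstract says only that refinement halts "once sufficient confidence is reached", so the majority-share threshold here is a stand-in for whatever confidence signal the method actually uses. Agents are called sequentially for simplicity, whereas the abstract describes them running in parallel.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical agent signature: (question, previous round's answers) -> answer.
Agent = Callable[[str, Sequence[str]], str]


def majority(answers: Sequence[str]) -> Tuple[str, float]:
    """Return the most common answer and the fraction of agents that gave it."""
    answer, count = Counter(answers).most_common(1)[0]
    return answer, count / len(answers)


def tool_use_mixture(
    question: str,
    agents: Sequence[Agent],
    max_rounds: int = 5,
    min_rounds: int = 2,
    agree_threshold: float = 0.8,
) -> Tuple[str, int]:
    """Run every agent each round, share answers, and stop early on agreement.

    The stopping rule (majority share >= agree_threshold after min_rounds)
    is an assumed proxy for "sufficient confidence".
    """
    previous: List[str] = []
    best, rounds_run = "", 0
    for round_idx in range(1, max_rounds + 1):
        # Each agent sees the original question plus last round's answers.
        current = [agent(question, previous) for agent in agents]
        best, share = majority(current)
        previous, rounds_run = current, round_idx
        if round_idx >= min_rounds and share >= agree_threshold:
            break  # halt refinement to save inference cost
    return best, rounds_run


# Toy stand-ins for agents with distinct tool-use strategies (no real LLM calls).
def text_agent(question: str, previous: Sequence[str]) -> str:
    # Guesses wrong at first, then adopts the majority of shared answers.
    return majority(previous)[0] if previous else "41"


def code_agent(question: str, previous: Sequence[str]) -> str:
    return "42"  # pretend a code interpreter computed this


def search_agent(question: str, previous: Sequence[str]) -> str:
    return "42"  # pretend a search result supported this


if __name__ == "__main__":
    answer, rounds = tool_use_mixture(
        "toy question", [text_agent, code_agent, search_agent]
    )
    print(answer, rounds)
```

In this toy run the text-only agent disagrees in the first round, adopts the shared majority in the second, and the loop stops there instead of using all five rounds, which is the cost-saving behavior the abstract attributes to early halting.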
