Blackboard Multi-Agent Systems for Information Discovery in Data Science

Hamed Zamani
Mihir Parmar
ALIREZA SALEMI
2025

Abstract

The proliferation of Large Language Models (LLMs) has opened new opportunities in data science, yet their practical deployment is often constrained by the challenge of discovering relevant data within large and heterogeneous data lakes. Existing approaches, including single-agent and master–slave multi-agent systems, struggle with scalability, information heterogeneity, and robustness to irrelevant files. To address these limitations, we propose a novel multi-agent communication paradigm inspired by the blackboard architecture in traditional AI and software design. In this framework, a central agent posts information requests to a shared blackboard, and autonomous subordinate agents---each responsible for a partition of the data lake---volunteer to respond based on their capabilities. This distributed design improves scalability and flexibility by eliminating the need for a central coordinator to have prior knowledge of agent expertise. We evaluate the approach on three benchmarks that require explicit data discovery: KramaBench and modified versions of DS-Bench and DA-Code to incorporate data discovery. Experimental results demonstrate that the blackboard architecture substantially outperforms baselines, including RAG and the master–slave paradigm, achieving 13% to 57% relative improvement in end-to-end task success and up to a 9% relative gain in F1 score for data discovery across both proprietary and open-source LLMs. These findings establish the blackboard paradigm as a scalable and generalizable communication framework for multi-agent data science systems.