AndroidWorld: An Open World for Autonomous Agents
Abstract
Autonomous computer control agents that execute human tasks by controlling user interfaces (UIs) are emerging. Such agents would be valuable for humans, and progress in the field will be driven by realistic and reproducible benchmarks. We present AndroidWorld, a fully functioning Android environment that provides reward signals for 114 programmatic tasks across 20 apps. Instead of using a static test set, the tasks in AndroidWorld are parameterized, allowing for unlimited variation in language and task parameters. Reward signals are derived from Android system state, making them highly durable and extensible across different applications. To demonstrate AndroidWorld's extensibility, we integrate the popular MiniWoB++ benchmark into it. To evaluate AndroidWorld, we introduce a new multimodal autonomous agent for Android, M3A. Our agent achieves a 27% success rate, leaving ample room for future work. Furthermore, we adapt a popular desktop web agent for Android, which we find to be less effective on mobile, suggesting that future research is needed to build universal, cross-domain agents. Finally, we assess robustness by testing M3A against a suite of real-world variations on a representative subset of tasks. AndroidWorld and the experiments in this paper are available at https://github.com/google-research/android-world.
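To make the idea of parameterized tasks with state-based rewards concrete, the following is a minimal sketch and not AndroidWorld's actual API. The names `ContactTask`, `sample_task`, and `is_successful`, as well as the dictionary standing in for Android system state, are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass
class ContactTask:
    """Hypothetical parameterized task: add a contact with a given name and number."""

    name: str
    phone: str

    @property
    def goal(self) -> str:
        # The natural-language instruction is generated from the task parameters.
        return f"Create a new contact for {self.name} with the number {self.phone}."

    def is_successful(self, device_state: dict) -> float:
        # The reward is computed from (mock) system state, not from the UI actions taken.
        contacts = device_state.get("contacts", {})
        return 1.0 if contacts.get(self.name) == self.phone else 0.0


def sample_task(seed: int) -> ContactTask:
    """Draw task parameters from a seeded generator so that runs are reproducible."""
    rng = random.Random(seed)
    name = rng.choice(["Ada Park", "Luis Ortega", "Mei Tanaka"])
    phone = "555-" + "".join(rng.choice("0123456789") for _ in range(4))
    return ContactTask(name=name, phone=phone)
```

Under these assumptions, each seed yields a different but reproducible task instance, and success depends only on the final state, so any sequence of actions that creates the correct contact receives the same reward.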