PlanGEN: A Framework Utilizing Inference-Time Algorithms with LLM Agents for Planning and Reasoning

Hootan Nakhost
Mihir Parmar
Swaroop Mishra
Chitta Baral
Jindong Gu
2025

Abstract

Scaling inference-time computation in Large Language Models (LLMs) dramatically improves their capabilities for solving complex problems. While test-time scaling has shown promise in many tasks such as code generation and mathematical reasoning, integration of inference-time algorithms into multi-agent frameworks for planning and reasoning remains under-explored. To this end, we explore popular inference-time algorithms—Best of N, Tree of Thought (ToT), and REward BAlanced SEarch (REBASE)—with proposed feedback-driven refinement. Our feedback-driven refinement employs specialized agents: a constraint agent to enforce task instance-specific constraints, and a verifier agent to evaluate plan quality. Furthermore, we hypothesize that test-time scaling can be proportional to instance-level complexity. Thus, we propose an additional selection agent to dynamically optimize algorithm choice. We evaluate our proposed approaches on four different benchmarks, i.e., NATURAL PLAN, GPQA, OlympiadBench, and DocFinQA. Experimental results show that our methods outperform strong baselines, achieving state-of-the-art results in NATURAL PLAN, OlympiadBench , and DocFinQA. Our key findings demonstrate that constraint-guided iterative refinement and algorithm selection improves both planning and downstream reasoning in LLMs