Causal language modeling can elicit search and reasoning capabilities on synthetic tasks

Kulin Shah
Nishanth Dikkala
2024

Abstract

Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent of the search and reasoning abilities of LLMs remains a topic of ongoing debate. In this work, we study whether causal language modeling with Transformers can learn a complex task such as solving Sudoku puzzles. To solve a Sudoku puzzle, the model must first search over all empty cells of the puzzle to decide which cell to fill, and then apply an appropriate strategy to fill the chosen cell. Sometimes, applying a strategy only narrows the set of possible values for a cell rather than determining its exact value. In such cases, multiple strategies are applied one after the other to fill a single cell. We observe that Transformer models can indeed learn to solve Sudokus when trained on a logical sequence of steps taken by a solver. We find that training on this logical sequence of steps is necessary; without it, Transformers fail to learn Sudoku. In addition, we study the internal representations of the trained Transformer and find that, through linear probing, we can decode high-level strategy information from them, pointing to the presence of a strong reasoning engine implicit in the Transformer weights.
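
To make the task structure concrete, the following is a minimal, self-contained sketch of the kind of step-by-step solving procedure the abstract describes: search over all empty cells, thin a cell's candidate values with a strategy, and fill the cell only once its value is forced. It is an illustration of the task, not the authors' training pipeline; the particular strategy used here (peer elimination yielding a "naked single") is an assumption for the example.

```python
def peers(r, c):
    """All cells sharing a row, column, or 3x3 box with (r, c)."""
    box_r, box_c = 3 * (r // 3), 3 * (c // 3)
    ps = {(r, j) for j in range(9)} | {(i, c) for i in range(9)}
    ps |= {(box_r + i, box_c + j) for i in range(3) for j in range(3)}
    ps.discard((r, c))
    return ps

def solve_step(board):
    """One solver step on a 9x9 board (0 = empty cell).

    Searches every empty cell, applies a candidate-thinning strategy, and
    fills a cell only when a single candidate remains. Returns the filled
    (cell, value) pair, or None if this strategy alone decides no cell.
    """
    for r in range(9):
        for c in range(9):
            if board[r][c] != 0:
                continue
            # Strategy: eliminate values already placed in peer cells;
            # this thins the candidate set rather than fixing a value outright.
            cands = set(range(1, 10)) - {board[i][j] for i, j in peers(r, c)}
            if len(cands) == 1:          # the cell's value is now forced
                board[r][c] = cands.pop()
                return (r, c), board[r][c]
    return None
```

Repeatedly calling `solve_step` (possibly alternating several such strategies) produces an ordered sequence of (cell, value) decisions, which is the style of logical step sequence the abstract refers to as training data for the Transformer.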