Bi-level Hierarchical Neural Contextual Bandits for Online Recommendation

Yunzhe Qi

Yao Zhou

Yikun Ban

Allan Stewart

Chuanwei Ruan

Jiachuan He

Shishir Kumar Prasad

Haixun Wang

Jingrui He

Transactions on Machine Learning Research (2026)


Abstract

Contextual bandit algorithms aim to identify the optimal choice among a set of candidate arms based on the arms' contextual information. Neural contextual bandit algorithms have generally outperformed traditional linear and kernel-based methods. Nevertheless, neural methods are not inherently suited to handling a large number of candidate arms, because neural exploration carries a high computational cost.
Motivated by the widespread availability of arm category information (e.g., movie genres, retailer types), we formulate contextual bandits as a bi-level recommendation problem based on this category information, and propose a novel neural bandit framework, named H2N-Bandit, which uses a bi-level hierarchical neural structure to mitigate the substantial computational cost of conventional neural bandit methods.
To demonstrate its effectiveness, we provide a regret bound for H2N-Bandit under the over-parameterized neural bandit setting. Furthermore, to illustrate its efficiency, we conduct extensive experiments on multiple real-world public data sets with various specifications, showing that H2N-Bandit can significantly reduce the computational cost relative to existing non-linear methods while achieving performance better than or comparable to state-of-the-art baselines.
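
The bi-level idea in the abstract (choose an arm category first, then an arm within it) can be illustrated with a toy sketch. This is not the H2N-Bandit algorithm: linear dot-product scorers stand in for the neural networks, and plain epsilon-greedy stands in for neural exploration. The function `bi_level_select`, its parameters, and the example weights are all hypothetical names and values introduced only for illustration.

```python
import random


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def bi_level_select(context, category_weights, arm_weights,
                    arms_by_category, epsilon=0.0, rng=None):
    """Pick a category, then an arm inside that category.

    Assumption: linear scorers and epsilon-greedy exploration are
    stand-ins chosen for brevity; they are not the paper's method.
    """
    rng = rng or random.Random(0)
    categories = list(arms_by_category)

    # Level 1: score every category (a handful of items, not every arm).
    if rng.random() < epsilon:
        category = rng.choice(categories)
    else:
        category = max(categories,
                       key=lambda c: dot(category_weights[c], context))

    # Level 2: score only the arms that belong to the chosen category.
    arms = arms_by_category[category]
    if rng.random() < epsilon:
        arm = rng.choice(arms)
    else:
        arm = max(arms, key=lambda a: dot(arm_weights[a], context))
    return category, arm


# Hypothetical example data: two categories, five arms, 2-d context.
context = [1.0, 0.5]
arms_by_category = {"comedy": ["c1", "c2"], "drama": ["d1", "d2", "d3"]}
category_weights = {"comedy": [0.2, 0.1], "drama": [0.5, 0.4]}
arm_weights = {
    "c1": [0.9, 0.9], "c2": [0.1, 0.1],
    "d1": [0.1, 0.2], "d2": [0.6, 0.2], "d3": [0.3, 0.3],
}

choice = bi_level_select(context, category_weights, arm_weights,
                         arms_by_category)
print(choice)
```

With `epsilon=0.0` the selection is purely greedy at both levels. Only the arms in the chosen category are scored at the second level, which is where a saving would come from when the number of arms is large relative to the number of categories.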
