Rajat Sen
I am a Research Scientist at Google. My areas of interest are bandit algorithms, black box optimization and time-series forecasting.
I obtained my PhD. at UT Austin, where I was lucky to be advised by Dr. Sanjay Shakkottai.
Research Areas
Authored Publications
Sort By
Preview abstract
Motivated by recent advances in large language models for NLP, we design a time-series foundation model for forecasting whose out-of-the-box zero-shot performance on a variety of datasets, matches the accuracy of state-of-the-art supervised forecasting models for each individual dataset. Our model is based on pretraining a patched-decoder style attention model on a large time series dataset, and can work well across different forecasting history lengths, prediction lengths and temporal granularities.
View details
Preview abstract
We provide an intuitive new algorithm for black-box stochastic optimization of unimodal functions, a function class that we observe empirically can capture hyperparameter-tuning loss surfaces. Our method's convergence guarantee automatically adapts to Lipschitz constants and other problem difficulty parameters, recovering and extending prior results. We complement our theoretical development with experimentally validation on hyperparameter tuning tasks.
View details
Preview abstract
Hierarchical forecasting is a key problem in many practical multivariate forecasting applications - the goal is to obtain coherent predictions for a large number of correlated time series that are arranged in a pre-specified tree hierarchy. In this paper, we present a novel probabilistic top-down approach to hierarchical forecasting that uses a novel attention-based RNN model to learn the distribution of the proportions according to which each parent prediction is split among its children nodes at any point in time. It also relies on an independent probabilistic forecasting model for the (univariate) root time series that can be generated using a sequence-to-sequence model, or even from traditional autoregressive-style univariate forecasting models , and The resulting forecasts are computed in a top-down fashion and are naturally coherent, and also support probabilistic predictions over all time series in the hierarchy. We provide theoretical justification for the superiority of our top-down approach compared to traditional bottom-up hierarchical modeling. Finally, we experiment on several public datasets and demonstrate significantly improved probabilistic forecasts, compared to state-of-the-art probabilistic hierarchical models.
View details
Efficient List-Decodable Regression using Batches
Abhimanyu Das
Ayush Jain
Weihao Kong
Efficient List-Decodable Regression using Batches, ICML (2023)
Preview abstract
We begin the study of list-decodable linear regression using batches.
In this setting only an $\alpha \in (0,1]$ fraction of the batches are genuine, each providing a batch of $\ge n$ i.i.d. samples from a common unknown distribution. The remaining batches may contain arbitrary or even adversarial samples.
We derive a polynomial time algorithm that for any $n\ge \tilde \Omega(1/\alpha)$ returns a list of size $\mathcal O(1/\alpha)$ such that one of the item in the list is close to the true regression parameter. The algorithm requires only $\tilde\cO(d)$ genuine batches, and works under fairly general assumptions on the distribution.
The results demonstrate the utility of batch structure, which allows for the first polynomial time algorithm for list-decodable regression, which may be impossible for non-batch setting, as suggested by a recent SQ lower bound~\cite{diakonikolas2021statistical} for the non-batch setting.
View details
Long Horizon Forecasting with TiDE: Time-series Dense Encoder
Abhimanyu Das
Andrew Leach
Rose Yu
Weihao Kong
Transactions on Machine Learning Research (2023)
Preview abstract
We propose a simple MLP based encoder decoder architecture for long term time-series forecasting that can handle non-linear dependencies and dynamic covariates. Our method can achieve better results in several long term forecasting benchmarks while being 5-10x faster in terms of training and inference compared to the best transformer based baselines. We also show theoretically and empirically that linear models can be near optimal when the ground truth is generated from an LDS when compared to RNN's and transformers.
View details
Preview abstract
We propose a practical maximum-likelihood-estimation framework for regression as an alternative to the typical approach of Empirical Risk Minimization (ERM) over a specific loss metric. Our approach is better suited to capture inductive biases in datasets, and can output post-hoc estimators at inference time that can optimize different types of loss metrics. We present theoretical evidence (in the fixed design setting) to demonstrate that our approach is always competitive with using ERM over the loss metric, and in many practical scenarios can be much superior to ERM. For time series forecasting, we propose an end-to-end MLE based training and inference approach that can flexibly capture various inductive biases, and optimize prediction accuracy for a variety of typical loss metrics, without having to choose a specific loss metric at training time. We demonstrate empirically that our method instantiated with a well-designed general purpose likelihood can obtain superior performance over ERM for a variety of time-series forecasting and regression datasets with different inductive biases and data distributions.
View details
On Learning Mixtures of Linear Regressions in a Non-Realizable Setting
Arya Mazumdar
Avishek Ghosh
Soumyabrata Pal
On Learning Mixture of Linear Regressions in the Non-Realizable Setting (2022)
Preview abstract
While mixture of linear regression is a well-studied topic, prior works in the literature usually focus on the {\em realizable} setting, i.e., when data is generated by a mixed linear (noisy) model. In this paper we show that a version of the popular AM algorithm finds out the best fit lines in a dataset even when a realizable model is not assumed, under some regularity conditions on the dataset and the initial point. In general, finding the best fit lines in the dataset involved solving a nonconvex optimization problem. We further provide an algorithm that runs in polynomial time in the number of datapoints, and recovers a good approximation of the best fit lines. In addition, we show that, mixture of linear regression can be used in a {\em list} prediction framework, with small prediction error via solving the aforementioned optimization problem.
View details
Preview abstract
We study the problem of learning generalized linear models under adversarial corruptions.
We analyze a classical heuristic called the \textit{iterative trimmed maximum likelihood estimator} which is known to be effective against \textit{label corruptions} in practice. Under label corruptions, we prove that this simple estimator achieves minimax near-optimal risk on a wide range of generalized linear models, including Gaussian regression, Poisson regression and Binomial regression. Finally, we extend the estimator to the much more challenging setting of \textit{label and covariate corruptions} and demonstrate its robustness and optimality in that setting as well.
View details
Top-k eXtreme Contextual Bandits with Arm Hierarchy
Alexander Rakhlin
Lexing Ying
Rahul Kidambi
Daniel Hill
Dean Foster
Inderjit Dhillon
Top-k eXtreme Contextual Bandits with Arm Hierarchy, International Conference on Machine Learning (2021) (to appear)
Preview abstract
Motivated by modern applications, such as online advertisement and recommender systems, we study the top-$k$ \xtm contextual bandits problem, where the total number of arms can be enormous, and the learner is allowed to select $k$ arms and observe all or some of the rewards for the chosen arms. We first propose an algorithm for the non-\xtm realizable setting, utilizing the Inverse Gap Weighting strategy for selecting multiple arms. We show that our algorithm has a regret guarantee of $O(k\sqrt{(A-k+1)T \log (|\cF|T)})$, where $A$ is the total number of arms and $\cF$ is the class containing the regression function, while only requiring $\tilde{O}(A)$ computation per time step. In the $\xtm$ setting, where the total number of arms can be in the millions, we propose a practically-motivated arm hierarchy model that induces a certain structure in mean rewards to ensure statistical and computational efficiency. The hierarchical structure allows for an exponential reduction in the number of relevant arms for each context, thus resulting in a regret guarantee of $O(k\sqrt{(\log A-k+1)T \log (|\cF|T)})$. Finally, we implement our algorithm using a hierarchical linear function class and show superior performance with respect to well-known benchmarks on simulated bandit feedback experiments using \xtm multi-label classification datasets. On a dataset with three million arms, our reduction scheme has an average inference time of only 7.9 milliseconds, which is a 100x improvement.
View details