Minmin Chen
Authored Publications
Surrogate for Long-Term User Experience in Recommender Systems
Can Xu
Lisa Mijung Chung
Mohit Sharma
Qian Sun
Sriraj Badam
Yuyan Wang
KDD 2022 (2022)
Over the years we have seen recommender systems shift focus from optimizing short-term engagement toward improving long-term user experience on the platforms. While defining good long-term user experience is still an active research area, we focus here on one specific aspect of improved long-term user experience: users revisiting the platform. These long-term outcomes, however, are much harder to optimize because the events are sparsely observed and the signal-to-noise ratio between these long-term outcomes and a single recommendation is low (a weak connection). To address these challenges, we propose to establish the association between these long-term outcomes and a set of more immediate-term user behavior signals that can serve as surrogates for optimization.
To this end, we conduct a large-scale study of user behavior logs on one of the largest industrial recommendation platforms serving billions of users. We study a broad set of sequential user behavior patterns and standardize a procedure to pinpoint the subset that has strong predictive power for the change in users' long-term visiting frequency. Specifically, these patterns are predictive of users' increased visits to the platform in 5 months among users who start with the same visiting frequency. We validate the identified subset of user behaviors by incorporating them as reward surrogates for long-term user experience in a reinforcement learning (RL) based recommender. Results from multiple live experiments on the industrial recommendation platform demonstrate the effectiveness of the proposed set of surrogates in improving long-term user experience.
Learning to Augment for Casual User Recommendation
Elaine Le
Jianling Wang
Yuyan Wang
The ACM Web Conference 2022 (2022)
Users who come to recommendation platforms are heterogeneous in activity levels. There usually exists a group of core users who visit the platform regularly and consume a large body of contents upon each visit, while others are casual users who tend to visit the platform occasionally and consume less each time.
As a result, consumption activities from core users often dominate the training data used for learning. As core users can exhibit different activity patterns from casual users, recommender systems trained on historical user activity data usually achieve much worse performance on casual users than core users.
To bridge the gap, we propose a model-agnostic framework L2Aug to improve recommendations for casual users through data augmentation, without sacrificing core user experience. L2Aug is powered by a data augmentor that learns to generate augmented interaction sequences, in order to fine-tune and optimize the performance of the recommendation system for casual users. On four real-world public datasets, the proposed L2Aug outperforms other treatment methods and achieves the best sequential recommendation performance for both casual and core users. We also test L2Aug in an online simulation environment with real-time feedback to further validate its efficacy, and showcase its flexibility in supporting different augmentation actions.
Towards Content Provider-Aware Recommendation Systems: A Simulation Study on Interplays among User and Provider Utilities
Ruohan Zhan
Elaine Le
Martin Mladenov
Alex Beutel
Ljubljana, Slovenia, pp. 3872-3883
Most existing recommender systems primarily focus on the users (content consumers), matching users with the most relevant content, with the goal of maximizing user satisfaction on the platform. However, given that content providers are playing an increasingly critical role through content creation, largely determining the content pool available for recommendation, a natural question arises: Can we design recommenders that take into account the utilities of both users and content providers? By doing so, we hope to sustain the flourishing of more content providers and a diverse content pool for long-term user satisfaction. Understanding the full impact of recommendations on both user and content provider groups is challenging. This paper aims to serve as a research investigation into one approach toward building a content provider-aware recommender, and evaluating its impact under a simulated setup.
To characterize the users-recommender-providers interdependence, we complement user modeling by formalizing provider dynamics as a parallel Markov Decision Process of partially observable states transited by recommender actions and user feedback. We then build a REINFORCE recommender agent, coined EcoAgent, to optimize a joint objective of user utility and the counterfactual utility lift of the content provider associated with the chosen content, which we show to be equivalent to maximizing overall user utility and utilities of all content providers on the platform. To evaluate our approach, we also introduce a simulation environment capturing the key interactions among users, providers, and the recommender. We offer a number of simulated experiments that shed light on both the benefits and the limitations of our approach. These results serve to understand how and when a content provider-aware recommender agent is of benefit in building multi-stakeholder recommender systems.
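The joint objective described above can be illustrated with a minimal sketch. It assumes the reward for one recommendation is the user utility plus a weighted provider "utility lift", where the lift is the provider's utility if the item is recommended minus a counterfactual utility if it is not. The function name, argument names, and the weight `lam` are hypothetical and not taken from the paper.

```python
def eco_reward(user_utility, provider_utility_if_recommended,
               provider_utility_if_not_recommended, lam=1.0):
    """Hypothetical joint reward: user utility plus weighted provider utility lift."""
    # Counterfactual lift: how much better off the provider is because
    # this recommendation was made, compared with it not being made.
    lift = provider_utility_if_recommended - provider_utility_if_not_recommended
    return user_utility + lam * lift


if __name__ == "__main__":
    # A recommendation worth 1.0 to the user that lifts the provider by 0.5.
    print(eco_reward(1.0, 3.0, 2.5, lam=2.0))
```

Setting `lam=0.0` in this sketch recovers a purely user-focused reward, which is one way to compare the two regimes in a simulation.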
Values of Exploration in Recommender Systems
Can Xu
Elaine Le
Mohit Sharma
Su-Lin Wu
Yuyan Wang
RecSys (2021)
Reinforcement Learning (RL) has been sought after to bring next-generation recommender systems to improve user experience on recommendation platforms. While the exploration-exploitation tradeoff is the foundation of RL research, the value of exploration in RL-based recommender systems is less well understood. Exploration, commonly seen as a tool to reduce model uncertainty in regions with sparse user interaction/feedback, is believed to cost user experience in the short term while the indirect benefit of better model quality arrives at a later time. We, on the other hand, argue that recommender systems have inherent needs for exploration and that exploration can improve user experience even in the more imminent term.
We focus on understanding the role of exploration in changing different facets of recommendation quality that more directly impact user experience. To do that, we introduce a series of methods inspired by exploration research to increase exploration in an RL-based recommender system, and study their effect on the end recommendation quality, more specifically, accuracy, diversity, novelty and serendipity.
We propose a set of metrics to measure RL-based recommender systems in these four aspects and evaluate the impact of exploration-inducing methods against these metrics. In addition to the offline measurements, we conduct live experiments on an industrial recommendation platform serving billions of users to showcase the benefit of exploration. Moreover, we use user conversion as an indicator of the holistic long-term user experience and study the values of exploration in helping platforms convert users. Connecting the offline analyses and live experiments, we start building the connections between these four facets of recommendation quality and long-term user experience, and identify serendipity as a desirable recommendation quality that changes user states and improves long-term user experience.
Reward Shaping for User Satisfaction in a REINFORCE Recommender
Can Xu
Sriraj Badam
Trevor Potter
Daniel Li
Hao Wan
Elaine Le
Chris Berg
Eric Bencomo Dixon
(2021)
How might we design Reinforcement Learning (RL)-based recommenders that encourage aligning user trajectories with the underlying user satisfaction? Three research questions are key: (1) measuring user satisfaction, (2) combatting sparsity of satisfaction signals, and (3) adapting the training of the recommender agent to maximize satisfaction. For measurement, it has been found that surveys explicitly asking users to rate their experience with consumed items can provide valuable orthogonal information to the engagement/interaction data, acting as a proxy to the underlying user satisfaction. For sparsity, i.e., only being able to observe how satisfied users are with a tiny fraction of user-item interactions, imputation models can be useful in predicting satisfaction level for all items users have consumed. For learning satisfying recommender policies, we postulate that reward shaping in RL recommender agents is powerful for driving satisfying user experiences. Putting everything together, we propose to jointly learn a policy network and a satisfaction imputation network: the role of the imputation network is to learn which actions are satisfying to the user, while the policy network, built on top of REINFORCE, decides which items to recommend, with the reward utilizing the imputed satisfaction. We use both offline analysis and live experiments in an industrial large-scale recommendation platform to demonstrate the promise of our approach for satisfying user experiences.
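As a rough illustration of reward shaping with imputed satisfaction, the sketch below blends an engagement reward with an imputed satisfaction score and plugs the result into a plain REINFORCE loss. The linear blend, the `weight` parameter, and all function names are assumptions made for illustration; the abstract only states that the reward utilizes the imputed satisfaction.

```python
import numpy as np


def shaped_reward(engagement, imputed_satisfaction, weight=0.5):
    """Hypothetical linear blend of engagement and imputed satisfaction."""
    return (1.0 - weight) * engagement + weight * imputed_satisfaction


def discounted_returns(rewards, gamma=0.9):
    """Cumulative discounted return at each step of one trajectory."""
    returns = np.zeros(len(rewards), dtype=float)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def reinforce_loss(log_probs, returns):
    """REINFORCE surrogate loss: negative return-weighted log-likelihood."""
    return -np.sum(log_probs * returns)


if __name__ == "__main__":
    engagement = np.array([1.0, 0.0, 1.0])        # e.g. click indicators
    satisfaction = np.array([0.0, 1.0, 1.0])      # output of an imputation model
    rewards = shaped_reward(engagement, satisfaction, weight=0.5)
    returns = discounted_returns(rewards, gamma=0.5)
    log_probs = np.log(np.array([0.2, 0.5, 0.1]))  # policy log-probabilities
    print(rewards, returns, reinforce_loss(log_probs, returns))
```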
Deconfounding User Satisfaction Estimation from Response Rate Bias
Madeleine Traverse
Trevor Potter
Emma Marriott
Daniel Li
Chris Haulk
Proceedings of the 14th ACM Conference on Recommender Systems (2020)
Improving user satisfaction is at the forefront of industrial recommender systems. While significant progress in recommender systems has relied on utilizing logged implicit data of user-item interactions (i.e., clicks, dwell/watch time, and other user engagement signals), there has been a recent surge of interest in measuring and modeling user satisfaction, as provided by orthogonal data sources. Such data sources typically originate from responses to user satisfaction surveys, which explicitly ask users to rate their experience with the system and/or specific items they have consumed in the recent past. This data can be valuable for measuring and modeling the degree to which a user has had a satisfactory experience with the recommender, since what users do (engagement) does not always align with what users say they want (satisfaction as measured by surveys).
We focus on a large-scale industrial system trained on user survey responses to predict user satisfaction. The predictions of the satisfaction model for each user-item pair, combined with the predictions of the other models (e.g., engagement-focused ones), are fed into the ranking component of a real-world recommender system in deciding items to present to the user. It is therefore imperative that the satisfaction model does an equally good job of imputing user satisfaction across slices of users and items, as it would directly impact which items a user is exposed to. However, the data used for training satisfaction models is biased in a specific way: users are more likely to respond to a survey when they would report being more satisfied. When the satisfaction survey responses in slices of data with high response rate follow a different distribution than those with low response rate, response rate becomes a confounding factor for user satisfaction estimation.
We find a positive correlation between response rate and ratings in a large-scale survey dataset collected in our case study. To address this inherent response rate bias in the satisfaction data, we propose an inverse propensity weighting approach within a multi-task learning framework. We extend a simple feed-forward neural network architecture predicting user satisfaction to a shared-bottom multi-task learning architecture with two tasks: the user satisfaction estimation task, and the response rate estimation task. We concurrently train these two tasks, and use the inverse of the predictions of the response rate task as loss weights for the satisfaction task to address the response rate bias. We showcase that by doing this, (i) we can accurately model whether a user will respond to a survey, (ii) we improve the user satisfaction estimation error for the data slices with lower propensity to respond while not hurting that of the slices with higher propensity to respond, and (iii) we demonstrate in live A/B experiments that applying the resulting satisfaction predictions from this approach to rank recommendations translates to higher user satisfaction.
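The weighting step described above, using the inverse of the predicted response rate as the loss weight for the satisfaction task, can be sketched as follows. The squared-error loss, the clipping floor `clip_min`, and the normalization by the sum of weights are assumptions added to keep the example stable; they are not stated in the abstract.

```python
import numpy as np


def ipw_satisfaction_loss(sat_pred, sat_label, response_prob, clip_min=0.05):
    """Inverse-propensity-weighted squared error on survey responders.

    response_prob is the response-rate task's prediction for each example.
    Clipping (an assumption here) keeps tiny propensities from dominating.
    """
    weights = 1.0 / np.clip(response_prob, clip_min, 1.0)
    per_example = (sat_pred - sat_label) ** 2
    return np.sum(weights * per_example) / np.sum(weights)


if __name__ == "__main__":
    sat_pred = np.array([1.0, 0.0])
    sat_label = np.array([0.0, 0.0])
    # The second example comes from a slice that rarely responds,
    # so it receives twice the weight of the first.
    response_prob = np.array([0.5, 0.25])
    print(ipw_satisfaction_loss(sat_pred, sat_label, response_prob))
```

Examples from slices with a low propensity to respond are up-weighted, which is the intended counter to the response rate bias.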
Towards Neural Mixture Recommender for Long Range Dependent User Sequences
Francois Belletti
Sagar Jain
Alex Beutel
Can Xu
Paul Covington
WWW (2019)
Understanding temporal dynamics has proved to be highly valuable for accurate recommendation. Sequential recommenders have been successful in modeling the dynamics of users and items over time. However, while different model architectures excel at capturing various temporal ranges or dynamics, distinct application contexts require adapting to diverse behaviors.
In this paper we examine how to build a model that can make use of different temporal ranges and dynamics depending on the request context. We begin with the analysis of an anonymized YouTube dataset comprising millions of user sequences. We quantify the degree of long-range dependence in these sequences and demonstrate that both short-term and long-term dependent behavioral patterns co-exist. We then propose a neural Multi-temporal-range Mixture Model (M3) as a tailored solution to deal with both short-term and long-term dependencies. Our approach employs a mixture of models, each with a different temporal range. These models are combined by a learned gating mechanism capable of exerting different model combinations given different contextual information. In empirical evaluations on a public dataset and our own anonymized YouTube dataset, M3 consistently outperforms state-of-the-art sequential recommendation methods.
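To make the idea of combining models through a context-dependent gate concrete, here is a minimal sketch. It assumes a softmax gate computed from a linear map of the context vector; the abstract only says the gate is learned, so the gate form, the shapes, and all names below are illustrative assumptions.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - np.max(z)
    e = np.exp(z)
    return e / e.sum()


def mixture_output(expert_outputs, context, gate_weights):
    """Combine per-range model outputs with a context-dependent gate.

    expert_outputs: (num_experts, dim), one row per temporal-range model.
    context:        (ctx_dim,) request context features.
    gate_weights:   (num_experts, ctx_dim) gate parameters (hypothetical).
    """
    gate = softmax(gate_weights @ context)
    return gate @ expert_outputs, gate


if __name__ == "__main__":
    experts = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    context = np.array([1.0, -1.0])
    gate_weights = np.zeros((3, 2))  # an untrained gate mixes experts uniformly
    combined, gate = mixture_output(experts, context, gate_weights)
    print(gate, combined)
```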
Preview abstract
Characterizing temporal dependence patterns is a critical step in understanding the statistical properties of sequential data. Long Range Dependence (LRD), referring to long-range correlations decaying as a power law rather than exponentially w.r.t. distance, demands a different set of tools for modeling the underlying dynamics of the sequential data. While it has been widely conjectured that LRD is present in language modeling and sequential recommendation, the amount of LRD in the corresponding sequential datasets has not yet been quantified in a scalable and model-independent manner. We propose a principled estimation procedure of LRD in sequential datasets based on established LRD theory for real-valued time series and apply it to sequences of symbols with million-item-scale dictionaries. In our measurements, the procedure reliably estimates the LRD in the behavior of users as they write Wikipedia articles and as they interact with YouTube. We further show that measuring LRD better informs modeling decisions, in particular for RNNs, whose ability to capture LRD is still an active area of research. The quantitative measure of LRD informs new Evolutive Recurrent Neural Networks (EvolutiveRNNs) designs, leading to state-of-the-art results on language understanding and sequential recommendation tasks at a fraction of the computational cost.
Preview abstract
Recurrent neural networks have gained widespread use in modeling sequential data. Learning long-term dependencies using these models remains difficult though, due to exploding or vanishing gradients. In this paper, we draw connections between recurrent networks and ordinary differential equations. A special form of recurrent networks called the AntisymmetricRNN is proposed under this theoretical framework, which is able to capture long-term dependencies thanks to the stability property of its underlying differential equation. Existing approaches to improving RNN trainability often incur significant computation overhead. In comparison, AntisymmetricRNN achieves the same goal by design. We showcase the advantage of this new architecture through extensive simulations and experiments. AntisymmetricRNN exhibits much more predictable dynamics. It outperforms regular LSTM models on tasks requiring long-term memory and matches the performance on tasks where short-term dependencies dominate despite being much simpler.
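The abstract does not spell out the update rule. The sketch below assumes a forward-Euler discretization of an ODE whose recurrent matrix is made antisymmetric as `W - W.T`, with a small diffusion term `gamma` subtracted on the diagonal and a step size `eps`. The parameter names and default values are illustrative assumptions.

```python
import numpy as np


def antisymmetric_rnn_step(h, x, W, V, b, eps=0.1, gamma=0.01):
    """One recurrent step with an antisymmetric hidden-to-hidden matrix.

    W - W.T is antisymmetric, so its eigenvalues are purely imaginary;
    subtracting gamma * I shifts them slightly into the stable half-plane.
    """
    A = W - W.T - gamma * np.eye(W.shape[0])
    return h + eps * np.tanh(A @ h + V @ x + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    hidden, inputs = 4, 3
    W = rng.normal(size=(hidden, hidden))
    V = rng.normal(size=(hidden, inputs))
    b = np.zeros(hidden)
    h = np.zeros(hidden)
    for x in rng.normal(size=(5, inputs)):  # a short input sequence
        h = antisymmetric_rnn_step(h, x, W, V, b)
    print(h)
```

Note that only `W` is constrained in this sketch; the input matrix `V` and bias `b` are ordinary free parameters.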
Top-K Off-Policy Correction for a REINFORCE Recommender System
Alex Beutel
Paul Covington
Sagar Jain
Francois Belletti
ACM International Conference on Web Search and Data Mining (WSDM) (2019)
Industrial recommender systems deal with extremely large action spaces: many millions of items to recommend. Moreover, they need to serve billions of users, who are unique at any point in time, making a complex user state space. Luckily, huge quantities of logged implicit feedback (e.g., user clicks, dwell time) are available for learning. Learning from the logged feedback is, however, subject to biases caused by only observing feedback on recommendations selected by the previous versions of the recommender. In this work, we present a general recipe for addressing such biases in a production top-K recommender system at YouTube, built with a policy-gradient-based algorithm, i.e., REINFORCE [48]. The contributions of the paper are: (1) scaling REINFORCE to a production recommender system with an action space on the order of millions; (2) applying off-policy correction to address data biases in learning from logged feedback collected from multiple behavior policies; (3) proposing a novel top-K off-policy correction to account for our policy recommending multiple items at a time; (4) showcasing the value of exploration. We demonstrate the efficacy of our approaches through a series of simulations and multiple live experiments on YouTube.
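The abstract names the two corrections without giving formulas. The sketch below assumes the per-example gradient weight is an importance ratio between the learned policy `pi` and the logging (behavior) policy `beta`, multiplied by a top-K factor `K * (1 - pi)^(K - 1)` that arises when K items are sampled independently from `pi`. The cap on the importance ratio is an added variance-control assumption, and all names are illustrative.

```python
import numpy as np


def topk_multiplier(pi_prob, k):
    """Top-K factor K * (1 - pi)^(K - 1); equals 1 when K = 1."""
    return k * (1.0 - pi_prob) ** (k - 1)


def corrected_weight(pi_prob, beta_prob, k, cap=10.0):
    """Capped importance ratio times the top-K factor (cap is an assumption)."""
    importance = np.minimum(pi_prob / beta_prob, cap)
    return importance * topk_multiplier(pi_prob, k)


if __name__ == "__main__":
    pi = np.array([0.001, 0.5, 0.99])    # learned policy probabilities
    beta = np.array([0.01, 0.25, 0.9])   # estimated behavior policy probabilities
    print(topk_multiplier(pi, k=16))
    print(corrected_weight(pi, beta, k=16))
```

In this form, items the policy already ranks with high probability get a multiplier near zero for large K, while rarely chosen items get a multiplier near K.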