Bhargav Kanagal
Research Areas
      Authored Publications
    
  
  
  
    
    
  
      
        
        
    
    
        
          
            
              ShopTalk: A System for Conversational Faceted Search
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
    
    
    
    
    
                      
                        D. Sivakumar
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ebenezer Omotola Anjorin
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Gurmeet Singh Manku
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ilya Eckstein
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        James Patrick Lee-Thorp
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jim Rosswog
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jingchen Feng
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Joshua Ainslie
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Larry Adams
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Michael Anthony Pohl
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Sudeep Gandhe
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zach Pearson
                      
                    
                  
              
            
          
          
          
          
            SIGIR eCom '22 (2022)
          
          
        
        
        
          
          
          
              We present ShopTalk, a multi-turn conversational faceted search system for Shopping that is designed to handle large and complex schemas that are beyond the scope of state-of-the-art slot-filling systems. ShopTalk decouples dialog management from fulfillment, thereby allowing the dialog understanding system to be domain-agnostic and not tied to the particular Shopping application. The dialog understanding system consists of a deep-learned Contextual Language Understanding module, which interprets user utterances, and a primarily rules-based Dialog-State Tracker (DST), which updates the dialog state and formulates search requests intended for the fulfillment engine. The interface between the two modules consists of a minimal set of domain-agnostic "intent operators," which instruct the DST on how to update the dialog state. ShopTalk was deployed in 2020 on the Google Assistant for Shopping searches.
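The split between language understanding and a rules-based tracker can be illustrated with a small sketch. The abstract does not list the actual intent operators, so the operator names (`SET`, `ADD`, `REMOVE`, `CLEAR`), the `IntentOp` and `DialogState` types, and the request format below are all hypothetical; this is only one way such an interface might look, not ShopTalk's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class IntentOp:
    # Hypothetical operator format; the real operator set is not given in the abstract.
    op: str          # one of "SET", "ADD", "REMOVE", "CLEAR"
    facet: str       # e.g. "brand", "color"
    value: str = ""  # ignored for "CLEAR"


@dataclass
class DialogState:
    facets: Dict[str, Set[str]] = field(default_factory=dict)

    def apply(self, op: IntentOp) -> None:
        """Rules-based update: each operator edits one facet of the state."""
        if op.op == "SET":
            self.facets[op.facet] = {op.value}
        elif op.op == "ADD":
            self.facets.setdefault(op.facet, set()).add(op.value)
        elif op.op == "REMOVE":
            self.facets.get(op.facet, set()).discard(op.value)
            if not self.facets.get(op.facet):
                self.facets.pop(op.facet, None)  # drop facets left empty
        elif op.op == "CLEAR":
            self.facets.pop(op.facet, None)
        else:
            raise ValueError(f"unknown operator: {op.op}")

    def to_search_request(self, query: str) -> dict:
        """Formulate a request for a (hypothetical) fulfillment engine."""
        filters = {k: sorted(v) for k, v in sorted(self.facets.items())}
        return {"query": query, "filters": filters}


def track(query: str, turns: List[List[IntentOp]]) -> dict:
    """Apply one list of operators per user utterance, then build the request."""
    state = DialogState()
    for ops in turns:
        for op in ops:
            state.apply(op)
    return state.to_search_request(query)
```

Because the tracker only sees operators and facet names, nothing in it is specific to Shopping, which is the kind of domain-agnostic interface the abstract describes.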
              
  
          
        
      
    
        
          
            
              MAVE: A Product Dataset for Multi-source Attribute Value Extraction
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Li Yang
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        Qifan Wang
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zac Yu
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Anand Kulkarni
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Bin Shu
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jon Elsas
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
            WSDM 2022 (2022)
          
          
        
        
        
          
          
          
              Attribute value extraction refers to the task of identifying values of an attribute of interest from product information. Product attribute values are essential in many e-commerce scenarios, such as customer service robots, product ranking, retrieval and recommendations. In the real world, however, the attribute values of a product are usually incomplete and vary over time, which greatly hinders practical applications.
In this paper, we introduce MAVE, a new dataset to better facilitate research on product attribute value extraction. MAVE is composed of a curated set of 2.2 million products from Amazon pages, with 3 million attribute-value annotations across 1257 unique categories. MAVE has four main and unique advantages: First, MAVE is the largest product attribute value extraction dataset by number of attribute-value examples, by a factor of 8x. Second, MAVE includes multi-source representations from the product, which capture the full product information with high attribute coverage. Third, MAVE represents a more diverse set of attributes and values relative to what previous datasets cover.
Lastly, MAVE provides a very challenging zero-shot test set, as we empirically illustrate in the experiments. We further propose a novel approach that effectively extracts the attribute value from the multi-source product information. We conduct extensive experiments with several baselines and show that MAVE is challenging for the latest attribute value extraction models, especially on zero-shot attribute extraction.
              
  
          
        
      
    
        
          
            
              DOCENT: Learning Self-Supervised Entity Representations from Large Document Collections
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Yury Zemlyanskiy
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        Sudeep Gandhe
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ruining He
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Anirudh Ravula
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Juro Gottweis
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Fei Sha
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Ilya Eckstein
                      
                    
                  
              
            
          
          
          
          
            Proceedings of EACL (2021) (to appear)
          
          
        
        
        
          
          
          
              This paper explores learning rich self-supervised entity representations from large amounts of associated text. Once pre-trained, these models become applicable to multiple entity-centric tasks such as search ranked retrieval, knowledge base completion, question answering and more. Unlike other methods that harvest self-supervision signals based merely on a local context within a sentence, we radically expand the notion of context to include any available text related to an entity. With the breadth and depth of textual content available on the web, this approach enables a new class of powerful, high-capacity representations that can ultimately "remember" any useful information about an entity, without the need for human annotations.
We present several training strategies that jointly learn to predict words and entities; we compare these strategies experimentally on downstream tasks in the TV-Movies domain, such as MovieLens tag prediction from user reviews and natural language movie search. As evidenced by results, our models outperform competitive baselines, sometimes with little or no fine-tuning, and are also able to scale to very large corpora.
Finally, we make our datasets and pre-trained models publicly available (to be released after the review period). This includes Reviews2Movielens, mapping the 1B word corpus of Amazon movie reviews to MovieLens tags, as well as Reddit Movie Suggestions containing natural language queries and corresponding community recommendations.
              
  
          
        
      
    
        
        
          
          
          
              The Transformer is the backbone of modern NLP models. In this paper, we propose RealFormer, a simple and generic technique to create Residual Attention Layer Transformer networks that significantly outperform the canonical Transformer and its variants of different sizes on a wide spectrum of tasks and benchmarks, including Masked Language Modeling, GLUE, SQuAD, Neural Machine Translation, WikiHop, HotpotQA, Natural Questions, and OpenKP. Qualitatively, RealFormer stabilizes training and leads to models with sparser attention. Code and pre-trained checkpoints will be open-sourced.
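The abstract does not spell out the mechanism, so the following is a minimal sketch under one assumption: that "residual attention" means each layer adds the previous layer's pre-softmax attention scores to its own before normalizing. The function names and the single-head, pure-Python formulation are illustrative choices, not the released code.

```python
import math
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def residual_attention(q: Matrix, k: Matrix, v: Matrix,
                       prev_scores: Optional[Matrix] = None
                       ) -> Tuple[Matrix, Matrix]:
    """Single-head attention that adds the previous layer's raw scores.

    Returns (output, scores) so the caller can pass `scores` to the next layer.
    """
    d = len(q[0])
    # Scaled dot-product scores, one row per query position.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
              for qi in q]
    if prev_scores is not None:
        # Assumed residual connection on the pre-softmax scores.
        scores = [[s + p for s, p in zip(srow, prow)]
                  for srow, prow in zip(scores, prev_scores)]
    weights = [softmax(row) for row in scores]
    out = [[sum(w * vj[c] for w, vj in zip(wrow, v)) for c in range(len(v[0]))]
           for wrow in weights]
    return out, scores
```

Stacking layers then amounts to threading `scores` from one call into the next as `prev_scores`; the first layer passes `None` and behaves like ordinary attention.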
              
  
          
        
      
    
        
          
            
              Learning to Extract Attribute Value from Product via Question Answering: A Multi-task Approach
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Qifan Wang
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        Li Yang
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        D. Sivakumar
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Bin Shu
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Zac Yu
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jon Elsas
                      
                    
                  
              
            
          
          
          
          
            SIGKDD 2020 (2020)
          
          
        
        
        
          
          
          
              Attribute value extraction refers to the task of identifying values of an attribute of interest from product information.
It is an important research topic which has been widely studied in e-Commerce and relation learning.
There are two main limitations in existing attribute value extraction methods: scalability and generalizability.
Most existing methods treat each attribute independently and build separate models for each of them, which are not suitable for large scale attribute systems in real-world applications. 
Moreover, very limited research has focused on generalizing extraction to new attributes.
In this work, we propose a novel approach for Attribute Value Extraction via Question Answering (AVEQA) using a multi-task framework. 
In particular, we build a question answering model which treats each attribute as a question and identifies the answer span corresponding to the attribute value in the product context.
A unique BERT contextual encoder is adopted and shared across all attributes to encode both the context and the question, which makes the model scalable. 
A distilled masked language model with knowledge distillation loss is introduced to improve the model generalization ability. In addition, we employ a no-answer classifier to explicitly handle the cases where there are no values for a given attribute in the product context.
The question answering, distilled masked language model and no-answer classification components are then combined into a unified multi-task framework.
We conduct extensive experiments on a public dataset. The results demonstrate that the proposed approach outperforms several state-of-the-art methods by a large margin.
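To make the question-answering framing concrete, here is a minimal sketch of the decoding step only: given per-token start and end scores for an attribute "question" and a no-answer probability, pick the best span or return nothing. The function names, the `max_len` cap, and the 0.5 threshold are assumptions for illustration; the encoder and the multi-task training described in the abstract are not modeled.

```python
from typing import List, Optional, Tuple


def best_span(start_logits: List[float], end_logits: List[float],
              max_len: int = 8) -> Tuple[int, int, float]:
    """Return (start, end, score) of the highest-scoring span with end >= start."""
    best = (0, 0, float("-inf"))
    for s, s_logit in enumerate(start_logits):
        for e in range(s, min(s + max_len, len(end_logits))):
            score = s_logit + end_logits[e]
            if score > best[2]:
                best = (s, e, score)
    return best


def extract_value(tokens: List[str], start_logits: List[float],
                  end_logits: List[float], no_answer_prob: float,
                  threshold: float = 0.5) -> Optional[str]:
    """Decode an attribute value, or None when the no-answer classifier fires."""
    if no_answer_prob >= threshold:
        return None  # the product context holds no value for this attribute
    s, e, _ = best_span(start_logits, end_logits)
    return " ".join(tokens[s:e + 1])
```

Since the attribute enters only as the question, the same decoder serves every attribute, which is what lets one shared model scale to large attribute systems.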
              
  
          
        
      
    
        
          
            
              Constructing a Comprehensive Events Database from the Web
            
          
        
        
          
            
              
                
                  
                    
    
    
    
        
         
          
  
        
    
  
                      
                        Qifan Wang
                      
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Vijay Garg
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        D. Sivakumar
                      
                    
                  
              
            
          
          
          
          
            The 28th ACM International Conference on Information and Knowledge Management (CIKM 2019) (2019)
          
          
        
        
          
            
              Recommendations for all: solving thousands of recommendation problems a day
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
    
    
    
    
    
            Proceedings of the 34th IEEE International Conference on Data Engineering (ICDE) (2018) (to appear)
          
          
        
        
        
          
          
          
              Recommendations are known to be an important part of several online experiences. Outside of media recommendation (music, movies, etc), online retailers have made use of product recommendations to help users make purchases. Product recommendation tends to be really hard because of the twin problems of sparsity and cold-start. Building a recommendation system that performs well in this setting is hard and is generally considered to need some expert tuning. However, all online retailers need to solve this problem well to provide good recommendations.
In this paper, we tackle this problem and describe an industrial-scale system called Sigmund where we solve tens of thousands of instances of the recommendation problem as a service for various online retailers. Sigmund was deployed to production in early 2014 and has been serving thousands of retailers. We describe several design decisions that we made in building Sigmund. We also share some of the lessons we learned from this experience, both from a machine learning perspective and a systems perspective. We hope that these lessons are useful for building future machine-learning services.
              
  
          
        
      
    
        
          
            
              A Generic Coordinate Descent Framework for Learning from Implicit Feedback
            
          
        
        
          
            
              
                
                  
                    
    
    
    
    
    
                      
                        Immanuel Bayer
                      
                    
                
              
            
              
                
                  
                    
                    
                      
                        Xiangnan He
                      
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
                  
              
            
          
          
          
          
            Proceedings of the 26th International Conference on World Wide Web (2017), pp. 1341-1350
          
          
        
        
        
          
          
          
              In recent years, interest in recommender research has shifted from explicit feedback towards implicit feedback data. A diversity of complex models has been proposed for a wide variety of applications. Despite this, learning from implicit feedback is still computationally challenging. So far, most work relies on stochastic gradient descent (SGD) solvers which are easy to derive, but in practice challenging to apply, especially for tasks with many items. For the simple matrix factorization model, an efficient coordinate descent (CD) solver has been previously proposed. However, efficient CD approaches have not been derived for more complex models.
In this paper, we provide a new framework for deriving efficient CD algorithms for complex recommender models. We identify and introduce the property of k-separable models. We show that k-separability is a sufficient property to allow efficient optimization of implicit recommender problems with CD. We illustrate this framework on a variety of state-of-the-art models including factorization machines and Tucker decomposition. To summarize, our work provides the theory and building blocks to derive efficient implicit CD algorithms for complex recommender models.
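One way to see why a separable structure helps with implicit feedback: if the prediction is a sum of k products of a user-side term and an item-side term (as in plain matrix factorization), the squared-prediction sum over all user-item pairs can be computed from two small k-by-k Gram matrices instead of looping over every pair. The sketch below checks this identity numerically; it illustrates the idea only, under the assumption that this is the structure the framework exploits, and it is not the paper's algorithm.

```python
from typing import List

Matrix = List[List[float]]


def naive_sq_sum(W: Matrix, H: Matrix) -> float:
    """Sum of squared predictions over all (user, item) pairs: O(|U| * |I| * k)."""
    return sum(sum(wf * hf for wf, hf in zip(w, h)) ** 2 for w in W for h in H)


def gram(M: Matrix) -> Matrix:
    """k x k Gram matrix M^T M for a matrix with one row per user or item."""
    k = len(M[0])
    return [[sum(row[f] * row[g] for row in M) for g in range(k)]
            for f in range(k)]


def gram_sq_sum(W: Matrix, H: Matrix) -> float:
    """Same quantity via Gram matrices: O((|U| + |I|) * k^2)."""
    GW, GH = gram(W), gram(H)
    k = len(GW)
    return sum(GW[f][g] * GH[f][g] for f in range(k) for g in range(k))
```

For example, with `W = [[1, 2], [3, 4]]` and `H = [[1, 0], [2, 1], [0, 3]]`, both functions return 306, but only the naive one touches every user-item pair.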
              
  
          
        
      
    
        
          
            
              Latent Factor Models with Additive Hierarchically-smoothed User Preferences
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
    
    
    
    
    
                      
                        Sandeep Pandey
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Vanja Josifovski
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Lluis Garcia-Pueyo
                      
                    
                  
              
            
          
          
          
          
            Proceedings of The 6th ACM International Conference on Web Search and Data Mining (WSDM) (2013)
          
          
        
        
        
          
          
          
              Items in recommender systems are usually associated with annotated attributes, such as brand and price for products or agency for news articles. These attributes are highly informative and must be exploited for accurate recommendation. While learning a user preference model over these attributes can result in an interpretable recommender system and can handle the cold start problem, it suffers from two major drawbacks: data sparsity and the inability to model random effects. On the other hand, latent-factor collaborative filtering models have shown great promise in recommender systems; however, their performance on rare items is poor. In this paper we propose a novel model, LFUM, which provides the advantages of both of the above models. We learn user preferences (over the attributes) using a personalized Bayesian hierarchical model that uses a combination (additive model) of a globally learned preference model along with user-specific preferences. To combat data sparsity, we smooth these preferences over the item taxonomy using an efficient forward-filtering and backward-smoothing algorithm. Our inference algorithms can handle both discrete attributes (e.g., item brands) and continuous attributes (e.g., prices). We combine the user preferences with the latent-factor models and train the resulting collaborative filtering system end-to-end using the successful BPR ranking algorithm. In our experimental analysis, we show that our proposed model outperforms several commonly used baselines and we carry out an ablation study showing the benefits of each component of our model.
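The combination of attribute preferences with latent factors, trained with a pairwise ranking loss, can be sketched as follows. The additive score and the dictionary of attribute preferences are simplifying assumptions for illustration; the Bayesian hierarchical model and the taxonomy smoothing from the abstract are not modeled. The loss is the standard BPR form, the negative log-sigmoid of the score difference between a preferred and a non-preferred item.

```python
import math
from typing import Dict, List


def score(user_factors: List[float], item_factors: List[float],
          user_attr_prefs: Dict[str, float], item_attrs: List[str]) -> float:
    """Additive score: latent-factor dot product plus attribute preferences."""
    latent = sum(p * q for p, q in zip(user_factors, item_factors))
    attr = sum(user_attr_prefs.get(a, 0.0) for a in item_attrs)
    return latent + attr


def bpr_loss(pos_score: float, neg_score: float) -> float:
    """BPR pairwise loss: -log(sigmoid(pos_score - neg_score)), computed stably."""
    diff = pos_score - neg_score
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))
```

The loss shrinks as the preferred item's score pulls ahead of the other item's, so gradient steps on it push observed items above unobserved ones in the ranking.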
              
  
          
        
      
    
        
          
            
              Focused Matrix Factorization for Audience Selection in Display Advertising
            
          
        
        
          
            
              
                
                  
                    
                
              
            
              
                
                  
                    
                    
                  
              
            
              
                
                  
                    
                    
    
    
    
    
    
                      
                        Sandeep Pandey
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Vanja Josifovski
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Lluis Garcia-Pueyo
                      
                    
                  
              
            
              
                
                  
                    
                    
                      
                        Jeff Yuan
                      
                    
                  
              
            
          
          
          
          
            Proceedings of the 29th International Conference on Data Engineering (ICDE) (2013)
          
          
        
        
        
          
          
          
              Audience selection is a key problem in display advertising systems, in which we need to select a list of users who are interested in (i.e., most likely to buy from) an advertising campaign. The users' past feedback on this campaign can be leveraged to construct such a list using collaborative filtering techniques such as matrix factorization. However, the user-campaign interaction is typically extremely sparse, hence conventional matrix factorization does not perform well. Moreover, simply combining the users' feedback from all campaigns does not address this, since it dilutes the focus on the target campaign under consideration. To resolve these issues, we propose a novel focused matrix factorization model (FMF) which learns users' preferences towards the specific campaign products, while also exploiting the information about related products. We exploit the product taxonomy to discover related campaigns, and design models to discriminate between the users' interest towards campaign products and non-campaign products. We develop a parallel multi-core implementation of the FMF model and evaluate its performance over a real-world advertising dataset spanning more than a million products. Our experiments demonstrate the benefits of using our models over existing approaches.
              
  