Analyzing User Perspectives on Mobile App Privacy at Scale

International Conference on Software Engineering (ICSE) (2022)

Abstract

In this paper, we present a methodology to analyze users' concerns and perspectives about privacy at scale. We leverage NLP techniques to process millions of mobile app reviews and extract privacy concerns. Our methodology comprises three steps: a binary classifier distinguishes privacy-related reviews from all other reviews, clustering groups reviews that discuss similar privacy concerns, and summarization metrics select representative reviews that summarize each cluster. We apply our methods to 287M reviews for about 2M apps across the 29 categories in Google Play to identify the top privacy pain points in mobile apps. We identified approximately 440K privacy-related reviews. We find that privacy-related reviews occur in all 29 categories, with some issues arising across numerous app categories and other issues surfacing only in a small set of app categories. We show empirical evidence that confirms dominant privacy themes: concerns about apps requesting unnecessary permissions, collection of personal information, frustration with privacy controls, tracking, and the selling of personal data. To our knowledge, this is the first large-scale analysis to confirm these findings based on hundreds of thousands of user inputs. We also observe some unexpected findings, such as users warning each other not to install an app due to privacy issues, users uninstalling apps for privacy reasons, and positive reviews that reward developers for privacy-friendly apps. Finally, we discuss the implications of our method and findings for developers and app stores.
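
The three-stage pipeline described in the abstract (classify, cluster, summarize) can be illustrated with a minimal sketch. This is not the authors' implementation: the keyword filter is a hypothetical stand-in for a trained binary classifier, the greedy bag-of-words clustering and the centroid-similarity selection are assumed simplifications, and all names (`PRIVACY_TERMS`, `is_privacy_related`, `cluster`, `representative`, `summarize`) and the threshold value are illustrative only.

```python
import math
import re
from collections import Counter

# Hypothetical keyword list standing in for a trained binary classifier.
PRIVACY_TERMS = {"privacy", "permission", "permissions", "tracking",
                 "data", "location", "contacts"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def is_privacy_related(review):
    """Stage 1 (assumed stand-in): flag reviews containing a privacy term."""
    return any(token in PRIVACY_TERMS for token in tokenize(review))


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(reviews, threshold=0.3):
    """Stage 2 (assumed simplification): greedy single-pass clustering.

    Each review joins the most similar existing cluster if the similarity
    to that cluster's summed word counts reaches the threshold; otherwise
    it starts a new cluster.
    """
    clusters = []
    for review in reviews:
        vec = Counter(tokenize(review))
        best = max(clusters, key=lambda c: cosine(vec, c["centroid"]),
                   default=None)
        if best is not None and cosine(vec, best["centroid"]) >= threshold:
            best["members"].append(review)
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": Counter(vec), "members": [review]})
    return clusters


def representative(c):
    """Stage 3 (assumed simplification): pick the member closest to the centroid."""
    return max(c["members"],
               key=lambda r: cosine(Counter(tokenize(r)), c["centroid"]))


def summarize(reviews, threshold=0.3):
    """Run all three stages; return (representative review, cluster size) pairs."""
    privacy_reviews = [r for r in reviews if is_privacy_related(r)]
    return [(representative(c), len(c["members"]))
            for c in cluster(privacy_reviews, threshold)]
```

Calling `summarize` on a list of review strings returns one representative review per cluster together with the cluster size. At the scale reported in the abstract, each stage would need a learned model and a scalable clustering method rather than these toy substitutes.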