Research areas

My research interests span social computing and computational social science, drawing on quantitative data analysis methods such as social network analysis and natural language processing. Below is a summary of the three main themes of my research.

Identifying factors driving longevity in online services (CSCW’16, WWW’17)

Having a substantial number of loyal users is key to the success of online social platforms. When building a social platform that users enjoy over a long period, it is crucial to identify predictive signals of long-term engagement, also known as user longevity. We analyze and identify key factors driving user longevity across different kinds of social services, such as online multiplayer games (WWW’17) and fitness apps (CSCW’16). Across different types of platforms, we found that early connections with peers are strongly associated with a user's future engagement. This effect held even in a cross-platform setting, where user identifiers can be matched across two services: analyzing fitness-app activity logs shared on Twitter, we discovered that connecting with Twitter peers who have health-related interests is positively associated with longevity in the fitness app. I believe that leveraging such signals at an early stage of user engagement can help retain users over the long term, and hence help online services succeed.
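As a minimal illustration of the kind of association described above, the sketch below compares retention rates between users who did and did not connect with peers early on. The field names `early_peers` (connections made in some initial window) and `retained` (still active at a later checkpoint), the helper name, and the toy data are all hypothetical. This is a simple descriptive comparison, not the models used in the CSCW’16 or WWW’17 studies, and it shows association rather than causation.

```python
def retention_rate(users, has_early_peers):
    """Fraction of users in one group who are still active at the later checkpoint.

    Users are split by whether they made at least one early peer connection
    (hypothetical field 'early_peers'); 'retained' is a hypothetical boolean.
    """
    group = [u for u in users if (u["early_peers"] > 0) == has_early_peers]
    if not group:
        raise ValueError("no users in this group")
    # True counts as 1 and False as 0, so the sum is the number retained.
    return sum(u["retained"] for u in group) / len(group)


# Toy data for illustration only; not results from any study.
users = [
    {"early_peers": 3, "retained": True},
    {"early_peers": 1, "retained": True},
    {"early_peers": 2, "retained": False},
    {"early_peers": 0, "retained": False},
    {"early_peers": 0, "retained": True},
    {"early_peers": 0, "retained": False},
    {"early_peers": 0, "retained": False},
]

with_peers = retention_rate(users, True)
without_peers = retention_rate(users, False)
print(f"retained with early peers: {with_peers:.2f}, without: {without_peers:.2f}")
# prints: retained with early peers: 0.67, without: 0.25
```

In practice, a gap like this would be tested with a model that controls for confounders (for example, users who are more active overall may both connect earlier and stay longer).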

Discovering rating bias in online customer service (CIKM’15, WWW’18)

Customer ratings are a valuable source for understanding customer satisfaction and are critical for designing better customer experiences and recommendations. However, previous studies suggest that customer ratings are systematically biased by the presence of others (e.g., social influence bias). Moreover, most customers do not respond to rating surveys, so the ratings that are collected may be biased and not representative of all customers. To understand overall satisfaction, this research investigates how likely non-responding customers were to have had satisfactory experiences compared with respondents. To infer customer satisfaction in such unlabeled sessions, we proposed several models, including a random forest classifier that learns from varying sentiment (CIKM’15) and recurrent neural networks (WWW’18) that learn continuous representations of unstructured text conversations. Using chat logs from Samsung’s customer service department, we made the novel finding that while the labeled sessions, contributed by a small fraction of customers, received overwhelmingly positive ratings, the majority of unlabeled sessions would have received lower ratings. This research not only helps detect dissatisfied customers on live chat services, but also makes theoretical contributions toward quantifying the level of bias in online customer ratings.
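To make the non-response logic concrete, here is a minimal sketch, assuming each chat session comes with per-message sentiment scores in [-1, 1] from some upstream sentiment tool (not shown). A classifier is fit on rated (labeled) sessions and then applied to unrated ones, so that the predicted satisfaction share among non-respondents can be compared with the observed share among respondents. For brevity it swaps in a nearest-centroid classifier over three hypothetical "varying sentiment" features (first score, last score, and their change) in place of the random forest or recurrent networks described above; all function names and data are illustrative.

```python
from statistics import mean


def sentiment_features(scores):
    """Summarize per-message sentiment scores (assumed in [-1, 1]) as a
    hypothetical 'varying sentiment' feature vector: (first, last, change)."""
    return (scores[0], scores[-1], scores[-1] - scores[0])


def fit_centroids(sessions, labels):
    """Nearest-centroid 'training': average feature vector per rating label.
    A simple stand-in for the random forest mentioned in the text."""
    groups = {}
    for scores, label in zip(sessions, labels):
        groups.setdefault(label, []).append(sentiment_features(scores))
    return {label: tuple(mean(col) for col in zip(*feats))
            for label, feats in groups.items()}


def predict(centroids, scores):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    x = sentiment_features(scores)
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))


def satisfied_share(labels):
    """Fraction of sessions labeled satisfied (1) rather than dissatisfied (0)."""
    return sum(labels) / len(labels)


# Toy data for illustration only; 1 = satisfied, 0 = dissatisfied.
labeled = [[0.2, 0.5, 0.8], [0.0, 0.4, 0.9], [0.1, 0.3, 0.6], [-0.2, -0.4, -0.7]]
labels = [1, 1, 1, 0]
unlabeled = [[0.1, -0.3, -0.6], [0.0, -0.2, -0.5], [0.3, 0.5, 0.7]]

centroids = fit_centroids(labeled, labels)
predicted = [predict(centroids, s) for s in unlabeled]
print("respondents:", satisfied_share(labels))     # prints: respondents: 0.75
print("non-respondents (predicted):", predicted)   # prints: non-respondents (predicted): [0, 0, 1]
```

With real data, the classifier's accuracy on held-out labeled sessions should be checked before its predictions on unlabeled sessions are used to estimate the size of the non-response bias.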

Clickbait detection using deep neural networks (ongoing)

Fake news is a significant threat to our democracy and society, and tools for detecting such misinformation can reduce this risk by helping readers choose which articles to read. I am developing algorithms and interactive tools to identify clickbait articles, one form of misinformation.