Title: Toward the Twuring Test: Conversation Modeling using Twitter
Speaker: Colin Cherry, Institute for Information Technology, NRC
Date:
Time: 3:30 pm
Place: Thorvaldson 159
Abstract:
The growing popularity of social media has had an interesting side effect for language researchers: services such as Twitter have resulted in people having instant-messenger-style conversations using a public medium. This creates a unique opportunity to collect, study and model large-scale conversation data. We present a method for mining conversations from Twitter's public feed. The resulting conversation corpus has more than 1.3 million conversations, providing a rich resource for the study of both Twitter and Internet chat. We present several methods that attempt to model the flow of conversation by discovering latent classes over Tweets. We show that a repurposed content model can discover meaningful dialogue acts, such as "question" and "comment", which indicate not only the role a Tweet plays in its conversation, but also the sorts of Tweets that are likely to follow. We also present a data-driven Twitter-bot built using this resource. By interpreting Twitter posts as input to be translated into an appropriate response, we are able to repurpose algorithms from Statistical Machine Translation (SMT) to create a system able to respond to Tweets. We compare approaches based on SMT and Information Retrieval in a human evaluation.
Biography:
Colin Cherry is a Research Officer at the National Research Council Canada. He received his doctorate from the University of Alberta, where he studied under Dekang Lin. Before coming to the NRC, he worked as a Researcher in Microsoft Research's natural language processing group. He is interested in predicting structured outputs, with application to parsing, information extraction and machine translation.