Title: Three Kinds of Web Data That Can Help Computers Make Better Sense of Human Language
Speaker: Shane Bergsma
Date:
Time: 3:30 pm
Place: Thorvaldson 105
Abstract:
The field of natural language processing (NLP) aims to develop computer systems capable of understanding and responding to human language. To process language robustly, NLP systems need a huge repository of real-world knowledge. In this talk, I describe how I mine useful knowledge from three kinds of unstructured web data: (1) raw English text extracted from across the entire web, (2) bilingual text (i.e., sentences paired with their translations), and (3) visual data found in web images. I describe new and effective ways to apply these knowledge sources to difficult NLP problems. This work can improve a range of important applications, including automated question-answering systems (like IBM's Watson), Internet search engines, and tools for automatic machine translation.
Biography:
Shane Bergsma is an NSERC postdoctoral fellow in the Department of Computer Science at Johns Hopkins University. He received his Ph.D. from the University of Alberta under the supervision of Dr. Randy Goebel and Dr. Dekang Lin. Shane's research in natural language processing (NLP) has been published in the proceedings of IJCAI, ACL, EMNLP, and NAACL. His algorithms for discourse processing are described in the main textbook for NLP and covered in introductory NLP courses at Cambridge, Stanford, Penn, and NYU. His work received the Best Paper award at the 18th Conference of the Canadian Society for Computational Studies of Intelligence.