Seminar | Mathematics and Computer Science

From Paraphrase Modeling to Controlled Generation

LANS Informal Seminar

Abstract: A key challenge in natural language understanding is recognizing when two sentences have the same meaning. I will discuss our work on this problem over the past few years, including the exploration of compositional functional architectures, learning criteria, and naturally occurring sources of training data. The result is a single sentence embedding model that outperforms all systems from the 2012-2016 SemEval semantic textual similarity competitions without training on any of the annotated data from those tasks. As a by-product, we developed a large dataset of automatically generated paraphrase pairs by using parallel text and neural machine translation. We have since used the dataset, which we call ParaNMT-50M, to impart a notion of meaning equivalence to controlled text generation tasks, including syntactically controlled paraphrasing and textual style transfer. 

Bio: Kevin Gimpel is an assistant professor at the Toyota Technological Institute at Chicago, a philanthropically endowed academic computer science institute on the campus of the University of Chicago. He received his Ph.D. from the Language Technologies Institute at Carnegie Mellon University in 2012. His research focuses on natural language processing and machine learning. Recent interests include paraphrase recognition, narrative modeling, commonsense knowledge representation, and structured prediction in the era of deep learning.

This seminar will be streamed.