
Language Technologies Ph.D. Thesis Defense

Speaker
CHUNTING ZHOU
Ph.D. Student
Language Technologies Institute
Carnegie Mellon University

When
-

Where
In Person and Virtual - ET

Description
The advancement of neural network models has led to state-of-the-art performance on a wide range of NLP tasks, e.g., machine translation and question answering. Despite these remarkable gains, NLP systems deployed in the wild are brittle because they inevitably experience a shift in the data distribution: the training data distribution differs from the one encountered at test time. For example, a multilingual machine translation model is expected to perform uniformly well across a set of language pairs, while the training resources can be extremely imbalanced across those pairs. Such distribution shift is also ubiquitous in the modern pretrain-then-fine-tune paradigm, where models pre-trained on a large text corpus are fine-tuned on various downstream tasks. Real-world applications demand adaptation methods that keep a pre-trained model robust to various types of distribution shift in a dynamically changing test environment. The goal of this thesis is to identify the problems that arise when models are evaluated under distribution shift, and to mitigate these discrepancies by developing distributionally robust methods and efficient transfer learning methods. This thesis consists of three parts.

In the first part, we focus on the hallucination problem in conditional sequence generation, where a model can generate outputs that are fluent but not faithful to the input text, particularly when tested on out-of-domain data. We identify and quantify the unfaithful tokens in machine outputs and leverage the resulting tools to improve training in a low-resource setting by reducing hallucinated content in the noisy training data.

The second part presents our distributionally robust methods for subpopulation shift, where the training data is a mixture of different subpopulations, e.g., different languages or demographic groups, and the test distribution is a subpopulation of it. Models trained on a dataset with an imbalanced distribution of subpopulations can perform poorly on data from minority subpopulations. To mitigate this, we develop group-level distributionally robust methods that perform well over a set of potential test distributions.

The last part of this thesis focuses on robust transfer learning of large-scale pre-trained language models. As the size of pre-trained language models (PLMs) grows every year, effectively adapting these models to downstream tasks becomes increasingly important, since models can catastrophically forget their previously acquired knowledge during transfer learning. Parameter-efficient fine-tuning provides an effective and robust solution by tuning only a small number of additional parameters. To this end, we propose a unified framework that connects parameter-efficient transfer learning methods and instantiate a new state-of-the-art method. Furthermore, we develop a method based on parameter-efficient tuning to improve the performance of a large-scale zero-shot transfer learner.

Thesis Committee:
Graham Neubig (Chair)
Shinji Watanabe
Zico Kolter
Luke Zettlemoyer (University of Washington / Meta AI)

Additional Information

In Person and Zoom Participation. See announcement.