Language Technologies Ph.D. Thesis Proposal
Speaker
HAI PHAM
Ph.D. Student
Language Technologies Institute
Carnegie Mellon University
When
-
Where
Virtual Presentation - ET
Description
With the deluge of big data, it is increasingly challenging for institutions and enterprises that handle such volumes of data to extract useful information for informed decisions. Despite the rapid development of machine learning and artificial intelligence (AI), particularly deep learning, the question of how to make efficient use of such enormous data remains open. As is well known in the AI community, annotating or labeling data for fully or semi-supervised learning is often prohibitively expensive, and in many cases is infeasible given the fast-growing pace of the field. As a result, more and more resources have been invested in self-supervised and unsupervised learning methods, paving the way for a future of AI in which trained agents can make use of heterogeneous data from various sources of information. Despite the optimistic prospects and many advances in such methods, however, there is still no systematic methodology for properly learning representations of different kinds of data, or for interpreting the resulting embedded or intermediate representations. This dissertation aims to help answer part of those questions in the context of natural language and multimodal data, which have complicated distributions. In practice, this type of data can appear in many forms, such as text, images, audio, video, or any combination of them. For self-supervised or unsupervised learning, modeling such complicated distributions is therefore difficult in terms of both methods and computation, and it has not been well explored despite recent advances in those areas.
In this proposal, we present our work on representation learning with different types of data, such as multimodal data and noisy scanned handwriting images. The nature of the data and the differing objectives of the associated tasks guide the choice of suitable representation methods. In addition, to deal with big data, we present approximation techniques that estimate essential components and speed up the learning process, including practical matrix trace approximation with a parallel non-adaptive method and spectrum approximation in Gaussian process training. For future work, we propose methods for effective self-supervised learning on multimodal data for document intelligence applications, and new approaches for incorporating language knowledge into learning disentangled representations of data for controllable text generation.
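For readers unfamiliar with matrix trace approximation, the idea can be illustrated with Hutchinson's estimator, a standard textbook technique that is not necessarily the method developed in this proposal. The minimal sketch below assumes the matrix is only accessible through matrix-vector products. The names `hutchinson_trace` and `make_matvec` are hypothetical and chosen for illustration. All random query vectors are drawn before any product is computed, which is what makes the scheme non-adaptive and lets the products run in parallel.

```python
import random


def hutchinson_trace(matvec, n, num_queries=200, seed=0):
    """Estimate trace(A) using only matrix-vector products with A.

    Illustrative sketch of Hutchinson's estimator, not the proposal's method.
    """
    rng = random.Random(seed)
    # Non-adaptive: draw every random sign vector up front, so no query
    # depends on the result of an earlier one and the products below
    # could be evaluated in parallel.
    queries = [
        [rng.choice((-1.0, 1.0)) for _ in range(n)]
        for _ in range(num_queries)
    ]
    total = 0.0
    for z in queries:
        az = matvec(z)
        # z^T A z has expectation trace(A) when z has random +/-1 entries.
        total += sum(zi * ai for zi, ai in zip(z, az))
    return total / num_queries


def make_matvec(a):
    """Wrap a dense list-of-lists matrix as a matrix-vector product."""
    return lambda v: [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


# Small symmetric example matrix (made up for illustration).
A = [
    [4.0, 1.0, 0.0],
    [1.0, 3.0, 0.5],
    [0.0, 0.5, 2.0],
]
exact = sum(A[i][i] for i in range(len(A)))
estimate = hutchinson_trace(make_matvec(A), n=len(A), num_queries=2000)
```

With more queries the estimate concentrates around the exact trace. For a diagonal matrix every query returns the trace exactly, since each squared sign equals one.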
Thesis Committee:
Barnabás Póczos (Co-chair)
David P. Woodruff (Co-chair)
Lori Levin
Zoltán Szabó (London School of Economics and Political Science)
Additional Information
Zoom Participation. See announcement.