
Master of Science in Robotics Thesis Talk

Speaker
AKSHAY DHARMAVARAM
Master's Student
Robotics Institute
Carnegie Mellon University

When
-

Where
In Person

Description

Generative models have been shown to be adept at mimicking the behavior of an unknown distribution solely from bootstrapped data. However, deep learning models tend to overfit in either the minimization or the maximization stage of the two-player min-max game, resulting in unstable training dynamics. We explore self-supervision as a way to incorporate domain knowledge, stabilize the training dynamics, and avoid overfitting. In this work, we investigate the use of self-supervision to stabilize the training of generative models in both single-step and multi-step domains.
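
As a rough illustration of the general idea (not the thesis's exact formulation), a common way to add self-supervision to an adversarial setup is to give the discriminator an auxiliary self-supervised head, e.g. predicting which rotation was applied to a real image, so that its features are regularized beyond the real/fake decision. The minimal PyTorch sketch below assumes hypothetical module and loss names:

# Minimal sketch: GAN discriminator with a self-supervised rotation-prediction
# head. Names (SSDiscriminator, discriminator_loss) are illustrative only.
import torch
import torch.nn as nn

class SSDiscriminator(nn.Module):
    def __init__(self, in_ch=3, feat=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat, feat * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.adv_head = nn.Linear(feat * 2, 1)   # real/fake logit
        self.rot_head = nn.Linear(feat * 2, 4)   # 0/90/180/270 degree classes

    def forward(self, x):
        h = self.backbone(x)
        return self.adv_head(h), self.rot_head(h)

def discriminator_loss(disc, real, fake, ss_weight=1.0):
    bce, ce = nn.BCEWithLogitsLoss(), nn.CrossEntropyLoss()
    adv_real, _ = disc(real)
    adv_fake, _ = disc(fake.detach())
    adv_loss = bce(adv_real, torch.ones_like(adv_real)) + \
               bce(adv_fake, torch.zeros_like(adv_fake))
    # Self-supervised task: predict which of four rotations was applied.
    k = torch.randint(0, 4, (real.size(0),), device=real.device)
    rotated = torch.stack([torch.rot90(img, int(r), dims=(1, 2))
                           for img, r in zip(real, k)])
    _, rot_logits = disc(rotated)
    ss_loss = ce(rot_logits, k)
    return adv_loss + ss_weight * ss_loss

The auxiliary term gives the discriminator a learning signal that does not depend on the generator, which is one way domain knowledge can temper overfitting in the min-max game.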

Recently, there has been increased interest in rewriting single-step generative models to output realistic images that are semantically similar to a handful of fixed, user-defined sketches. However, replicating images with complex poses or distinctive, minimalist art styles (e.g., "the Picasso horse") has proven difficult. To address these failure cases, we propose a method that builds upon the GANSketching architecture by introducing a translation model that shifts the distribution of fake sketches toward that of the user sketches while retaining the essence of the originally generated image. This formulation prevents the discriminator from overfitting, reducing its discriminability and improving gradient propagation. We also illustrate how the choice of translation direction affects both the number of training steps required and the overall performance of the generator.
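
A minimal sketch of how such a translation step could sit in the sketch-discriminator loss, assuming a GANSketching-style setup where a fixed photo-to-sketch network maps generated images to sketches; the translator module and function names below are hypothetical, not the thesis implementation:

# Minimal sketch: a residual translator T shifts fake sketches toward the
# user-sketch style before the sketch discriminator scores them.
import torch
import torch.nn as nn

class SketchTranslator(nn.Module):
    """Lightweight residual translator: fake sketch -> user-sketch style."""
    def __init__(self, ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(ch, 32, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, 1, 1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, 1, 1),
        )

    def forward(self, s):
        # Residual output keeps the essence of the original sketch.
        return torch.tanh(s + self.net(s))

def sketch_disc_loss(d_sketch, translate, photo_to_sketch, fake_imgs, user_sketches):
    bce = nn.BCEWithLogitsLoss()
    # Shift fake sketches toward the user-sketch distribution.
    fake_sketches = translate(photo_to_sketch(fake_imgs))
    real_logits = d_sketch(user_sketches)
    fake_logits = d_sketch(fake_sketches.detach())
    return bce(real_logits, torch.ones_like(real_logits)) + \
           bce(fake_logits, torch.zeros_like(fake_logits))

Because the translated fake sketches lie closer to the user-sketch distribution, the discriminator's task becomes harder, which is the mechanism behind the reduced discriminability and better gradients described above.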

The current landscape of multi-step apprenticeship learning is dominated by Adversarial Imitation Learning (AIL) methods. In this work, we investigate the issues faced by these algorithms and introduce a novel self-supervised loss that encourages the discriminator to approximate a richer reward function. We also employ our method to train a graph-based multi-agent actor-critic architecture that learns a centralized policy conditioned on a learned latent interaction graph. We show that our method outperforms prior state-of-the-art methods in both single-agent and multi-agent domains. Furthermore, we prove that our new regularizer belongs to the family of AIL methods by establishing a theoretical connection to cost-regularized apprenticeship learning. Finally, we leverage the self-supervised formulation to demonstrate novel reward-shaping capabilities and to introduce a teacher-forcing-based curriculum that improves sample efficiency by progressively increasing the length of the generated trajectory.
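
To make the AIL setting concrete, the sketch below shows a GAIL-style discriminator whose features also drive a self-supervised inverse-dynamics head (predicting the action from consecutive states), one plausible way to push the discriminator toward a richer reward signal. This is an assumption-laden illustration, not the thesis's loss; all names are hypothetical:

# Minimal sketch: AIL discriminator with an auxiliary self-supervised
# inverse-dynamics objective on expert transitions.
import torch
import torch.nn as nn

class AILDiscriminator(nn.Module):
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.adv_head = nn.Linear(hidden, 1)        # expert vs. policy logit
        self.inv_head = nn.Sequential(              # self-supervised head
            nn.Linear(2 * state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def reward(self, s, a):
        # Standard AIL-style reward for the policy: -log(1 - D(s, a)).
        logit = self.adv_head(self.encoder(torch.cat([s, a], dim=-1)))
        return -torch.log(1.0 - torch.sigmoid(logit) + 1e-8)

def disc_update(disc, expert, policy, ss_weight=0.5):
    """expert/policy are dicts of tensors with keys 's', 'a', 's_next'."""
    bce, mse = nn.BCEWithLogitsLoss(), nn.MSELoss()
    exp_logit = disc.adv_head(disc.encoder(torch.cat([expert["s"], expert["a"]], -1)))
    pol_logit = disc.adv_head(disc.encoder(torch.cat([policy["s"], policy["a"]], -1)))
    adv_loss = bce(exp_logit, torch.ones_like(exp_logit)) + \
               bce(pol_logit, torch.zeros_like(pol_logit))
    # Self-supervised inverse-dynamics loss on expert transitions.
    pred_a = disc.inv_head(torch.cat([expert["s"], expert["s_next"]], -1))
    ss_loss = mse(pred_a, expert["a"])
    return adv_loss + ss_weight * ss_loss

In a setup like this, the auxiliary head forces the discriminator's features to encode transition structure rather than a bare real/fake boundary, which is the kind of richer reward function the abstract refers to.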

Thesis Committee:
Prof. Katia Sycara (Advisor)
Prof. Jean Oh
Tabitha Edith Lee