
Robotics Thesis Defense

Speaker
ADAM HARLEY
Ph.D. Student
Robotics Institute
Carnegie Mellon University

When
-

Where
In Person and Virtual - ET

Description

Most computer vision models in deployment today are not continually learning. Instead, they are in a “test” mode, where they will behave the same way perpetually, until they are replaced by newer models. This is a problem, because it means the models may perform poorly as soon as their “test” environment diverges from their “training” environment. As we work towards building models that can be useful in increasingly complex tasks and environments, we need to provide machines with the ability to learn and improve on their own. In this thesis, we investigate methods for computer vision architectures to self-improve on unlabelled data, by exploiting rich regularities of the natural world itself. As a starting point, we embrace the fact that the world is 3D, and design neural architectures that map RGB-D observations into 3D feature maps. This representation allows us to generate self-supervision objectives using other regularities: we know that two objects cannot be in the same location at once, and that multiple views can be related with geometry. We use these facts to train viewpoint-invariant 3D features without supervision, and obtain improvements in object detection and tracking. We also show that the same architecture, with minor modifications, can produce state-of-the-art results as a perception system for autonomous vehicles, where the goal is to estimate a “bird's eye view” semantic map from multiple sensors. We then shift focus to extracting information from dynamic scenes. We demonstrate that useful object representations can be captured entirely unsupervised, by matching appearance cues with simple heuristics such as independent motion and connectedness. Finally, we propose a way to improve motion estimation itself, by revisiting the classic concept of “particle videos”. Using learned temporal priors and within-inference optimization, we can track points across occlusions, and outperform flow-based and feature-matching methods on fine-grained multi-frame correspondence tasks.

Thesis Committee
Katerina Fragkiadaki (Chair)
Deva Ramanan
Christopher G. Atkeson
Andrew Zisserman (University of Oxford)

Additional Information

In Person and Zoom Participation. See announcement.