Robotics Thesis Defense

Speaker
XIAOFANG WANG
Ph.D. Student
Robotics Institute
Carnegie Mellon University

When
April 21, 2022, 2:00pm

Where
In Person and Virtual - ET

Description

Neural architecture search (NAS) has recently been proposed to automate the design of network architectures. Instead of relying on manual design, NAS automatically searches for the optimal architecture in a data-driven way. Despite its impressive progress, NAS is still far from being a common paradigm for architecture design in practice. This thesis aims to develop principled NAS methods that automate the design of neural networks and reduce human effort in architecture tuning as much as possible. To achieve this goal, we focus on developing better search algorithms and search spaces, both of which are important for the performance of NAS.
For search algorithms, we first present an efficient NAS framework based on Bayesian optimization (BO). Specifically, we propose a method to learn an embedding space over the domain of network architectures, which makes it possible to define a kernel function for the architecture domain, a necessary component for applying BO to NAS. Then, we propose a neighborhood-aware NAS formulation to encourage the selection of architectures with strong generalization capability. The proposed formulation is general enough to be applied to various search algorithms, such as random search, reinforcement learning, and differentiable NAS methods.
For search spaces, we first propose a search space for spatiotemporal attention cells that use attention operations as the primary building block. The attention cells found in our search space not only outperform manually designed ones but also generalize well across different modalities, backbones, and datasets. Then, we show that committee-based models (ensembles or cascades) are an overlooked design space for efficient models. We find that simply building committees from existing, independently pre-trained models can match or exceed the accuracy of state-of-the-art models while being drastically more efficient.
Finally, we point out the importance of controlling for cost when comparing different LiDAR-based 3D object detectors. We show that SECOND, a simple baseline generally believed to have been significantly surpassed, can almost match the performance of the state-of-the-art method on the Waymo Open Dataset when the two are compared under the same latency.
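
As a rough illustration of the cascade idea mentioned in the description, the sketch below runs models from cheapest to most expensive and stops as soon as one is confident enough. It is a minimal sketch under stated assumptions, not the method presented in the thesis: the confidence-threshold exit rule, the function name cascade_predict, and the toy models small_model and large_model are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: a "model" is any function mapping an input to class probabilities.
Model = Callable[[float], Sequence[float]]


def cascade_predict(models: List[Model], x: float, threshold: float) -> Tuple[int, int]:
    """Return (predicted class, number of models evaluated).

    Models are tried in order (cheapest first). The cascade exits early once
    the top class probability reaches `threshold`; the last model always answers.
    """
    if not models:
        raise ValueError("need at least one model")
    label, used = -1, 0
    for used, model in enumerate(models, start=1):
        probs = model(x)
        label = max(range(len(probs)), key=probs.__getitem__)
        if probs[label] >= threshold:
            break  # confident enough, skip the more expensive models
    return label, used


def small_model(x: float) -> Sequence[float]:
    # Hypothetical cheap model: confident only far from the boundary at 0.
    p = 0.95 if abs(x) > 1.0 else 0.6
    return [p, 1 - p] if x < 0 else [1 - p, p]


def large_model(x: float) -> Sequence[float]:
    # Hypothetical expensive model: always confident.
    p = 0.99
    return [p, 1 - p] if x < 0 else [1 - p, p]


if __name__ == "__main__":
    for x in (2.0, 0.5, -0.5):
        label, used = cascade_predict([small_model, large_model], x, threshold=0.9)
        print(f"x={x}: class {label}, models evaluated: {used}")
```

In this toy setup, easy inputs are answered by the small model alone, so the average cost per input falls below that of always running the large model. That is the kind of efficiency gain the description attributes to committees of existing pre-trained models.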

Thesis Committee
Kris M. Kitani (Chair)
Deva Ramanan
Jeff Schneider
Michael S. Ryoo (Stony Brook University & Google)

Additional Information

In Person and Zoom Participation.