Vision and Autonomous Systems Seminar
Speaker
QITAO ZHAO, VIMAL MOLLYN, HYUNSUNG CHO
When
-
Where
Newell-Simon 3305
Description
Qitao Zhao, Master's Student, MS in Computer Vision, Robotics Institute
— Sparse-view Pose Estimation and Reconstruction via Analysis by Generative Synthesis

This talk will present our approach for reconstructing objects from sparse-view images captured in unconstrained environments. In the absence of ground-truth camera poses, we will demonstrate how to utilize estimates from off-the-shelf systems and address two key challenges: refining noisy camera poses in sparse views and effectively handling outlier poses.

Qitao Zhao is a second-year Master's student in Computer Vision at CMU's Robotics Institute, advised by Prof. Shubham Tulsiani. His research focuses on camera pose estimation and 3D reconstruction in the wild. He holds a Bachelor's degree from Shandong University in China and was a visiting student at the University of Central Florida, where he worked with Prof. Chen Chen.
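For intuition only, here is a minimal sketch of the "refine, then reject outliers" idea described above: sample perturbations of a noisy initial camera pose, score each candidate against the observed view (standing in for a comparison against generatively synthesized views), keep the best, and flag views whose best score remains poor. This is not the talk's actual method; every function, threshold, and variable name here is hypothetical.

```python
# Illustrative sketch (not the authors' implementation) of pose refinement
# and outlier rejection for sparse-view reconstruction.
import numpy as np

rng = np.random.default_rng(0)

def score_pose(pose, observed_view):
    # Placeholder: in practice this would synthesize the scene from `pose`
    # and measure agreement with the observed image. Here the "view" is a
    # toy stand-in that directly encodes the true pose.
    return -np.sum((pose - observed_view) ** 2)

def refine_pose(init_pose, observed_view, n_candidates=256, noise=0.1):
    """Sample perturbations around the initial pose and keep the best-scoring one."""
    candidates = init_pose + noise * rng.standard_normal((n_candidates, init_pose.size))
    scores = np.array([score_pose(c, observed_view) for c in candidates])
    return candidates[np.argmax(scores)], scores.max()

# Toy example: 3 views with noisy initial poses (poses abstracted as 6-D vectors).
true_poses = rng.standard_normal((3, 6))
noisy_poses = true_poses + 0.2 * rng.standard_normal((3, 6))
noisy_poses[2] += 3.0  # one grossly wrong (outlier) initial estimate

for i, (init, view) in enumerate(zip(noisy_poses, true_poses)):
    refined, score = refine_pose(init, view)
    is_outlier = score < -1.0  # a persistently low best score suggests an outlier pose
    print(f"view {i}: refined error {np.linalg.norm(refined - view):.3f}, outlier={is_outlier}")
```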
Vimal Mollyn, Ph.D. Student, Human-Computer Interaction Institute
— EgoTouch: On-Body Touch Input Using AR/VR Headset Cameras

In augmented and virtual reality (AR/VR) experiences, a user's arms and hands can provide a convenient and tactile surface for touch input. Prior work has shown on-body input to have significant speed, accuracy, and ergonomic benefits over the in-air interfaces that are common today. In this work, we demonstrate high-accuracy, bare-hand (i.e., no special instrumentation of the user) skin input using just an RGB camera, like those already integrated into all modern XR headsets. Our results show this approach can be accurate and robust across diverse lighting conditions, skin tones, and body motion (e.g., input while walking). Finally, our pipeline also provides rich input metadata including touch force, finger identification, angle of attack, and rotation. We believe these are the requisite technical ingredients to more fully unlock on-skin interfaces that have been well motivated in the HCI literature but have lacked robust and practical methods.

Vimal Mollyn is a Ph.D. student in the Future Interfaces Group at Carnegie Mellon University, advised by Chris Harrison. He is interested in creating new ways for people to interact with the world, drawing on his background in sensing and machine learning. He graduated with a Bachelor's and Master's from IIT Madras, where he majored in Engineering Design and Data Science.
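To make the "rich input metadata" concrete, the sketch below shows one hypothetical way a downstream application could consume touch events carrying the fields the abstract lists (touch force, finger identity, angle of attack, rotation). This is not the EgoTouch API; the class and field names are assumptions for illustration.

```python
# Illustrative sketch only: a hypothetical event structure for rich on-skin
# touch metadata, as a consumer-side view of an RGB-camera touch pipeline.
from dataclasses import dataclass
from enum import Enum

class Finger(Enum):
    THUMB = "thumb"
    INDEX = "index"
    MIDDLE = "middle"
    RING = "ring"
    PINKY = "pinky"

@dataclass
class SkinTouchEvent:
    x: float                # touch location on the skin surface (normalized)
    y: float
    force: float            # estimated touch force (arbitrary units)
    finger: Finger          # which finger made contact
    angle_of_attack: float  # finger pitch relative to the skin, degrees
    rotation: float         # finger yaw/rotation, degrees

def on_touch(event: SkinTouchEvent) -> None:
    """Example consumer: route a touch event to an application callback."""
    print(f"{event.finger.value} touch at ({event.x:.2f}, {event.y:.2f}), "
          f"force={event.force:.2f}, angle of attack={event.angle_of_attack:.0f} deg")

# A downstream app might receive events like this from the headset pipeline:
on_touch(SkinTouchEvent(x=0.4, y=0.7, force=0.8, finger=Finger.INDEX,
                        angle_of_attack=35.0, rotation=10.0))
```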
Hyunsung Cho, Ph.D. Student, Human-Computer Interaction Institute
— Auptimize: Optimal Placement of Spatial Audio Cues for Extended Reality

Spatial audio in Extended Reality (XR) gives users better awareness of where virtual elements are placed and efficiently guides them to events such as notifications, system alerts from different windows, or approaching avatars. Humans, however, are inaccurate at localizing sound cues, especially with multiple sources, due to limitations of human auditory perception such as angular discrimination error and front-back confusion. This decreases the efficiency of XR interfaces because users misidentify which XR element a sound is coming from. To address this, we propose Auptimize, a novel computational approach for placing XR sound sources that mitigates such localization errors by exploiting the ventriloquist effect. Auptimize disentangles the sound source locations from the visual elements and relocates the sound sources to optimal positions for unambiguous identification of sound cues, avoiding errors due to inter-source proximity and front-back confusion. Our evaluation shows that Auptimize decreases spatial audio-based source identification errors compared to playing sound cues at the paired visual-sound locations. We demonstrate the applicability of Auptimize in diverse spatial audio-based interactive XR scenarios.

Hyunsung Cho is a fourth-year Ph.D. student in the Human-Computer Interaction Institute (HCII) at Carnegie Mellon University, advised by Prof. David Lindlbauer. Her research focuses on designing, implementing, and evaluating context-aware Extended Reality (XR) interfaces and multimodal interaction techniques in XR to enable seamless, unobtrusive human-computer interactions. Her work combines computational modeling of human perception and behavior, user-centered design, and intelligent systems to create adaptive interfaces for diverse user contexts. Her research has received Best Paper Awards and Methods Recognition at ACM CSCW and ACM ISS. She holds an M.S. and B.S. in Computer Science from KAIST. She has previously worked as a Research Scientist Intern at Meta's Reality Labs Research and Nokia Bell Labs' Pervasive Systems research group.
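As a rough illustration of the placement idea (not the Auptimize implementation), the toy script below picks an azimuth offset for each visual element's sound cue within a small "ventriloquist" tolerance, penalizing sound sources that end up angularly close to each other or near each other's front-back mirror. All numbers, penalty terms, and names are assumptions for illustration.

```python
# Illustrative sketch: brute-force choice of sound-cue azimuths near their
# visual elements so the cues are easier to tell apart.
from itertools import product

VISUAL_AZIMUTHS = [10.0, 25.0, 160.0]  # degrees; where the XR elements appear
TOLERANCE = 20.0                        # max audio-visual displacement exploited
CANDIDATE_OFFSETS = [o for o in (-20.0, -10.0, 0.0, 10.0, 20.0) if abs(o) <= TOLERANCE]

def angular_dist(a: float, b: float) -> float:
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def cost(azimuths) -> float:
    c = 0.0
    for i in range(len(azimuths)):
        for j in range(i + 1, len(azimuths)):
            # Penalize sources that are angularly close (inter-source proximity)...
            c += max(0.0, 30.0 - angular_dist(azimuths[i], azimuths[j]))
            # ...and sources near each other's front-back mirror (180 - azimuth).
            c += max(0.0, 30.0 - angular_dist(azimuths[i], 180.0 - azimuths[j]))
    return c

best = min(product(CANDIDATE_OFFSETS, repeat=len(VISUAL_AZIMUTHS)),
           key=lambda offs: cost([v + o for v, o in zip(VISUAL_AZIMUTHS, offs)]))
placed = [v + o for v, o in zip(VISUAL_AZIMUTHS, best)]
print("visual azimuths:", VISUAL_AZIMUTHS)
print("sound azimuths: ", placed)
```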
The VASC Seminar is sponsored in part by Meta Reality Labs Pittsburgh