
PhD Thesis Defense: Toby Li, "A Multi-Modal Intelligent Agent that Learns from Demonstrations and Natural Language Instructions"

Speaker
Toby Li

When
-

Where
Virtual (Zoom link in the home department announcement)

Description

Thesis Committee:
Brad A. Myers, CMU (Chair)
Tom M. Mitchell, CMU
Jeffrey P. Bigham, CMU
John Zimmerman, CMU
Philip J. Guo, UC San Diego
 
 
Abstract:
Intelligent agents that can perform tasks on behalf of users have become increasingly popular with the growing ubiquity of “smart” devices such as phones, wearables, and smart home devices. They allow users to automate common tasks and to perform tasks in contexts where the direct manipulation of traditional graphical user interfaces (GUIs) is infeasible or inconvenient. However, the capabilities of such agents are limited by their available skills (i.e., the procedural knowledge of how to do something) and conceptual knowledge (i.e., what a concept means). Most current agents (e.g., Siri, Google Assistant, Alexa) either have fixed sets of capabilities or offer extension mechanisms that only skilled third-party developers can use. As a result, they fall short in supporting “long-tail” tasks and suffer from a lack of customizability and flexibility.
 
To address this problem, my collaborators and I designed SUGILITE, a new intelligent agent that allows end users to teach it new tasks and concepts in a natural way. SUGILITE uses a multimodal approach that combines programming by demonstration (PBD) and learning from natural language instructions to support end-user development for intelligent agents. In lab usability evaluations, users with little or no programming expertise were able to teach the SUGILITE prototype common smartphone tasks such as ordering coffee, booking restaurants, and checking sports scores, as well as the appropriate conditionals for triggering these tasks in user-defined situations, using user-taught concepts to determine those conditions. My dissertation presents a new human-AI interaction paradigm for interactive task learning, where existing third-party app GUIs serve as a medium for users to communicate their intents to an AI agent, in addition to being the interface for interacting with the underlying computing services.
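
As a rough illustration of this paradigm, the sketch below shows one way an agent could combine a spoken utterance with a demonstrated GUI action: when the label of the tapped element appears in the utterance, the recorded step is generalized into a parameter whose possible values come from the surrounding menu. All names and data structures here are invented for illustration; this is a minimal sketch of the idea, not SUGILITE's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class GuiElement:
    text: str                 # visible label of the element the user tapped
    sibling_texts: list[str]  # labels of the alternatives in the same menu

@dataclass
class ScriptAction:
    operation: str                          # e.g., "click"
    target_text: str                        # literal label or a parameter slot
    parameter: str | None = None            # parameter name, if generalized
    possible_values: list[str] | None = None

def generalize(utterance: str, clicked: GuiElement) -> ScriptAction:
    """If the tapped element's label appears in the user's utterance,
    replace the literal target with a parameter whose value set is
    inferred from the sibling menu items on the app GUI."""
    if clicked.text.lower() in utterance.lower():
        return ScriptAction(
            operation="click",
            target_text="{drink}",          # hypothetical parameter name
            parameter="drink",
            possible_values=clicked.sibling_texts,
        )
    # No alignment found: keep the action as a literal recording.
    return ScriptAction(operation="click", target_text=clicked.text)

demo_click = GuiElement(text="Cappuccino",
                        sibling_texts=["Latte", "Cappuccino", "Espresso"])
print(generalize("order a cappuccino", demo_click))
```

Running the example yields a parameterized "click {drink}" step whose possible values (Latte, Cappuccino, Espresso) were read off the app's own menu, so the taught script can later be invoked with a different drink.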
 
Through the development of the integrated SUGILITE system over the past five years, this dissertation makes seven main technical contributions:
(i) a new approach that allows the agent to generalize learned task procedures by inferring task parameters and their possible values from verbal instructions and mobile app GUIs;
(ii) a new method that addresses the data description problem in PBD by letting users verbally explain ambiguous or vague demonstrated actions;
(iii) a new multimodal interface that enables users to teach the agent the conceptual knowledge used in conditionals;
(iv) a new mechanism that extends mobile-app-based PBD to smart home and Internet of Things (IoT) automation;
(v) a new multimodal interface that helps users discover, identify the causes of, and recover from conversational breakdowns, using existing mobile app GUIs for grounding;
(vi) a new privacy-preserving approach that identifies and obfuscates potentially personal information in GUI-based PBD scripts based on the uniqueness of information entries with respect to the corresponding app GUI context (a minimal sketch of this uniqueness intuition follows the list); and
(vii) a new self-supervised technique for generating semantic representations of GUI screens and components as embedding vectors without requiring manual annotation.
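
To make the uniqueness intuition behind contribution (vi) concrete, here is a minimal, hypothetical sketch: a value entered into a GUI field is treated as potentially personal if it occurs rarely for that field across a corpus of scripts from the same app GUI context, and is then replaced with an opaque placeholder. The function name, data shapes, and threshold are assumptions made for illustration, not the dissertation's actual algorithm.

```python
from collections import Counter

def obfuscate_script(entries, field_value_counts, min_count=5):
    """entries: list of (field_id, value) pairs from a recorded PBD script.
    field_value_counts: Counter over (field_id, value) pairs observed across
    a corpus of scripts for the same app GUI context (hypothetical input).
    Values seen fewer than min_count times are deemed likely personal."""
    sanitized = []
    for field_id, value in entries:
        if field_value_counts[(field_id, value)] < min_count:
            # Rare for this field in this GUI context -> obfuscate it.
            sanitized.append((field_id, "<REDACTED>"))
        else:
            # Common across users -> likely a generic GUI option; keep it.
            sanitized.append((field_id, value))
    return sanitized

corpus_counts = Counter({
    ("delivery_address", "5000 Forbes Ave"): 1,   # unique -> likely personal
    ("drink_size", "Grande"): 120,                # common -> generic option
})
script = [("delivery_address", "5000 Forbes Ave"), ("drink_size", "Grande")]
print(obfuscate_script(script, corpus_counts))
# [('delivery_address', '<REDACTED>'), ('drink_size', 'Grande')]
```

The design choice this sketch captures is that uniqueness is judged relative to the field and its app GUI context, so a street address typed into a delivery form is flagged while a menu option chosen by many users is preserved.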

 

Host
Queenie Kravitz