
HCII PhD Thesis Proposal: Toby Li, "A Multi-Modal Intelligent Agent that Learns from Demonstrations and Natural Language Instructions"

Speaker
Toby Li

When
-

Where
Gates Hillman Center, room 4405

Description

Thesis Committee:
Brad A. Myers (HCII), Chair
Jeffrey P. Bigham (HCII)
John Zimmerman (HCII) 
Tom M. Mitchell (MLD)
Philip J. Guo (UC San Diego)
 
Abstract:
Intelligent agents that can perform tasks on behalf of users have become increasingly popular with the growing ubiquity of “smart” devices such as phones, wearables, and smart home devices. They allow users to automate common tasks and to perform tasks in contexts where the direct manipulation of traditional graphical user interfaces (GUIs) is infeasible or inconvenient. However, the capabilities of such agents are limited by their available skills (i.e., the procedural knowledge of how to do something) and conceptual knowledge (i.e., what a concept means). Most current agents (e.g., Siri, Google Assistant, Alexa) either have fixed sets of capabilities or provide mechanisms that allow only skilled third-party developers to extend their capabilities. As a result, they fall short in supporting “long-tail” tasks and suffer from a lack of customizability and flexibility.
 
To address this problem, my collaborators and I have designed SUGILITE, a new intelligent agent that allows end users to teach new tasks and concepts. SUGILITE uses a multi-modal approach that combines programming by demonstration (PBD) and learning from natural language instructions to support end-user development for intelligent agents. Preliminary lab usability evaluations showed that the SUGILITE prototype allowed users with little or no programming expertise to successfully teach the agent common smartphone tasks such as ordering coffee, booking restaurants, and checking sports scores, as well as the appropriate conditionals for triggering these actions and the relevant concepts for determining these conditions. The users also considered the multi-modal interaction style in SUGILITE natural and easy to use.
 
Over the course of this research, we have also developed three extensions to the original SUGILITE system: APPINITE, EPIDOSITE, and PUMICE. Through them we present (i) a new approach that allows the agent to generalize from learned task procedures by inferring task parameters and their associated possible values from users' verbal instructions and mobile app GUIs, (ii) a new method to address the data description problem in PBD by allowing users to verbally explain ambiguous or vague demonstrated actions, (iii) a new multi-modal interface that enables users to teach the agent conceptual knowledge used in conditionals, and (iv) a new mechanism to extend mobile-app-based PBD to smart home and Internet of Things (IoT) automation.
 
To complete the dissertation, I will explore several practical issues pertinent to the wide adoption of SUGILITE’s approach for real-life usage. First, we will make the system more robust by enhancing its natural language understanding capability and designing a more effective technique for end users to handle the various types of errors that they may encounter when using PBD agents. Second, we plan to investigate privacy issues in sharing PBD scripts and develop a new privacy-preserving mechanism for sharing the scripts. Lastly, building on these two proposed efforts, we will deploy SUGILITE with a group of actual users and study how they use it in real-life scenarios through an in-situ field study.
 