Interactive Data Science
Course Information
Course Number
HCI Undergraduate: n/a
HCI Graduate: 05-839
Course Description
The goal of this course is to provide you with the tools to understand data and build data-driven interactive systems. You will learn to tell a story with the data and explore opportunities enabled by interactive data analysis through a combination of lectures, readings of current literature, and practical skills development. Over the course of the semester, you will learn about data science and the entire data pipeline, from collecting and analyzing to interacting with data. We will also cover human-centered aspects of data science and how HCI methods can enhance the interpretation of data. This course requires comfort with programming, as required projects make use of Python and Git. A series of homework assignments helps to lay the groundwork for a final, larger group project.
Course Goals
The learning goals of the course are as follows:
- To be able to analyze a dataset, evaluate potential insights, and identify specific questions.
- To introduce the value of data visualization and its principles for designing effective interactive visualizations (e.g., human perception, color theory, storytelling techniques).
- To have a working ability to obtain, analyze, manipulate, transform, and distribute data.
- To introduce common problems with data, such as structural problems, outliers, incomplete data, and dirty data.
- To introduce basic concepts in data interpretation, including feature generation, statistical analysis, and classification (e.g., assumptions of data, data quality, missing data, outliers).
- To introduce basic concepts in data collection, including data formats, parsing, and sources of data (data structure and storage).
- To understand and implement basic A/B experiments and to understand experimental reliability and validity.
- To introduce human-centered data science topics, including ethics, fairness, and interpretability.
- To provide practical applied examples of the data pipeline through an examination of current literature.
- To provide hands-on experience with creating data-driven applications and to produce a portfolio of such applications.
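As an illustration of the A/B experiment goal above, the following is a minimal sketch of one common way to compare two variants: a two-proportion z-test on conversion counts. It is not taken from the course materials. The function name, the parameter names, and the counts are hypothetical, and the test assumes independent samples large enough for the normal approximation, with a pooled conversion rate strictly between 0 and 1.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates.

    Hypothetical helper for illustration; assumes large, independent samples
    and a pooled rate strictly between 0 and 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    # Standard error of the difference in proportions under the null.
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Made-up counts, for illustration only: variant A vs. variant B.
z, p = two_proportion_z_test(conv_a=120, n_a=1000, conv_b=150, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value only suggests the difference is unlikely under the null hypothesis. Whether the result is reliable and valid still depends on how users were assigned to variants and how the outcome was measured.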
Concepts
- Structured vs unstructured data
- Dealing with heterogeneous data
- Sampling and bias in data collection
- Data transformation and analysis
- Data visualization
- Current research in information-driven interfaces
Skills
- Getting Web data
- Dealing with APIs
- Common data formats
- Data parsing
- Common problems with data
- Tools for analyzing data
- Tools for visualizing data
Some of the specific skills that will be covered in projects include:
- Display data from an API in a data-driven application you create
- Create interactive visualizations of data
- Answer a series of intriguing questions from both the data and the corresponding visualization
Prerequisites
The class will involve programming and debugging. If you find programming or debugging extremely difficult, this course may not be for you, as you will have to master several very different programming libraries and concepts in very short order (projects make use of Python-based data science frameworks, including Pandas, Vega-Altair, and Streamlit).
Semester Offered and Units
Semester: Fall, Spring
Graduate: 12 units