HCII Seminar Series - Ding Wang

Speaker
Ding Wang
HCI Researcher at Technology and Society Collective (TaSC), Responsible AI, Google Research

When
-

Where
Newell-Simon Hall 1305

Video
Video link

Description

"The Work of Data Annotation and Annotator Diversity"

Diversity in datasets is a key component of building responsible AI/ML. Despite this recognition, we know little about the diversity among the annotators involved in data production. Additionally, despite being an indispensable part of AI, data annotation work is often cast as simple, standardized and even low-skilled work. In this talk, I present a series of studies aimed at unpacking the data annotation process, with an emphasis on the data workers who lift the weight of data production. This includes interview studies to uncover both the data annotators’ perspectives on their work and the data requestors’ approaches to the diversity and subjectivity the workers bring; an ethnographic investigation in data centers to study the work practices around data annotation; and a mixed-methods study to explore the impact of worker demographic diversity on the data they annotate. While practitioners described nuanced understandings of annotator diversity, they rarely designed dataset production to account for diversity in the annotation process. This calls for more attention to a pervasive logic of representationalist thinking and counting that is intricately woven into the day-to-day work practices of annotation. In examining the structures in which annotation is done and diversity is seen, this talk aims to recover annotation and diversity from their reductive framing and to seek alternative approaches to knowing and doing annotation.

Speaker's Bio

Ding Wang is a senior HCI researcher in the Responsible AI and Human Centered Technology group at Google AI. Her research focuses on the norms, processes and production of data (e.g., the collection, annotation and documentation of data) and on the responsible data practices that are essential to ML and AI systems.