
HCII Ph.D. Proposal: Steven Moore


Description

Creating and Evaluating Pedagogically Valid Assessments at Scale

 

 

Time & Location

Tuesday, April 30th, 2024 - 2 pm ET

Newell-Simon Hall (NSH) 4305 

 

Zoom Link

Meeting ID: 915 6147 1758

Passcode: 582492

 

Committee

John Stamper (chair), Carnegie Mellon University

Ken Koedinger, Carnegie Mellon University

Sherry Tongshuang Wu, Carnegie Mellon University

Christopher Brooks, University of Michigan 

 

Abstract

Multiple-choice questions (MCQs) are the predominant form of assessment in educational environments, known for their efficiency and scalability. Traditionally, these questions are crafted by instructors, a process that, despite their expertise, often results in inconsistencies and errors. In response to these limitations and the need for scalability, learnersourcing, which involves students in the question creation process, has been leveraged. Although this approach capitalizes on students' diverse perspectives, it also leads to significant variability in the quality of the questions produced. Additionally, while recent advances in artificial intelligence have enabled more scalable and automated methods for generating MCQs, the questions these methods produce still suffer from many of the same shortcomings as those written by humans. Current evaluation methods for MCQs rely predominantly on human judgment, which introduces subjectivity and lacks scalability. Automated evaluation methods provide scalability, but they fall short of adequately assessing the educational value of questions, focusing instead on surface-level features that do not align with expert evaluation.

To improve the evaluation of MCQs across all creation methods, I propose the Scalable Automated Question Usability Evaluation Toolkit (SAQUET). This toolkit provides a domain-agnostic approach to evaluating the quality of educational MCQs, with a focus on their pedagogical implications. Using natural language processing techniques, SAQUET applies 19 criteria from the Item-Writing Flaws (IWF) rubric, classifies questions according to Bloom's Revised Taxonomy, and suggests a set of hypothesized skills that each question might require. This multifaceted approach allows for a more nuanced evaluation of MCQs, providing immediate and actionable feedback to instructors, students, and other educational stakeholders engaged in question creation.
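The abstract does not give implementation details, but the kind of multifaceted pipeline it describes can be pictured roughly as in the sketch below. Everything in it is an assumption for illustration: the MCQ fields, the two toy flaw checks, and the keyword-based Bloom's level heuristic are stand-ins for demonstration only, not SAQUET's actual criteria, classifiers, or skill-tagging approach.

```python
from dataclasses import dataclass

@dataclass
class MCQ:
    stem: str
    options: list
    answer: str

def check_item_writing_flaws(q: MCQ) -> list:
    """Toy stand-ins for two of the rubric's 19 criteria (illustrative only)."""
    flaws = []
    # Flag "all of the above" distractors, a classic item-writing flaw.
    if any("all of the above" in opt.lower() for opt in q.options):
        flaws.append("uses 'all of the above'")
    # Flag negatively worded stems, another commonly cited flaw.
    if " not " in f" {q.stem.lower()} ":
        flaws.append("negatively worded stem")
    return flaws

def classify_bloom_level(q: MCQ) -> str:
    """Keyword heuristic standing in for a trained Bloom's taxonomy classifier."""
    verbs = {"define": "Remember", "explain": "Understand",
             "apply": "Apply", "compare": "Analyze", "design": "Create"}
    for verb, level in verbs.items():
        if verb in q.stem.lower():
            return level
    return "Unclassified"

def evaluate(q: MCQ) -> dict:
    """Combine the rubric checks and the taxonomy classification into one report."""
    return {"iwf_flaws": check_item_writing_flaws(q),
            "bloom_level": classify_bloom_level(q)}

if __name__ == "__main__":
    q = MCQ(stem="Which option does NOT explain photosynthesis?",
            options=["Light reactions", "Calvin cycle", "All of the above"],
            answer="All of the above")
    print(evaluate(q))
```

In practice, each rubric criterion and the taxonomy classification would draw on NLP models rather than keyword rules; the sketch only shows how per-criterion checks and a level classifier might be combined into a single feedback report for a question author.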

In my research thus far, I have demonstrated that students are capable of generating high-quality assessments when given minimal scaffolding and technological support. I have investigated the potential of involving students and crowdworkers in generating and evaluating the skills required to solve problems. I have shown how crowdworkers can leverage the IWF rubric to evaluate questions on par with human experts. Finally, a preliminary study indicates that automated application of the IWF rubric to evaluate question quality yields results similar to those of human evaluations. In my proposed work, I plan to extend the capabilities of SAQUET, testing its effectiveness across various educational domains, integrating skill tagging, and refining the Bloom's Revised Taxonomy classifications within the toolkit. I will also conduct case studies with educational practitioners to examine more deeply the challenges and opportunities in the MCQ evaluation process, aiming to enhance the reliability and pedagogical effectiveness of these assessments.

 

Proposal Document

https://docs.google.com/document/d/1Jlu2CuQEGtrGlbMbIK8sr3zzXykjnhbzDLwgWVCm_W8/edit?usp=sharing