HCII Ph.D. Thesis Proposal: Napol Rachatasumrit
Meaningful Models: Unlocking Insights Through Model Interpretations in Educational Data Mining
Napol Rachatasumrit
HCII PhD Thesis Proposal
Time & Location:
Monday, July 22nd, 2024, 1:00 p.m. ET
Room: Newell-Simon Hall (NSH) 3305
Zoom Meeting ID: 552 785 1174
Committee:
Kenneth Koedinger (Co-chair), HCII, CMU
Paulo Carvalho (Co-chair), HCII, CMU
Kenneth Holstein, HCII, CMU
Adam Sales, Mathematical Sciences, WPI
Abstract:
The conventional wisdom in Educational Data Mining (EDM) holds that a superior model is one that fits the data better. This perspective, however, overlooks a critical point: models that prioritize prediction accuracy often fail to provide scientifically or practically meaningful interpretations and explanations. Interpretations and explanations are crucial for scientific insight and for practical applications, especially from a human-computer interaction perspective. For example, Deep Knowledge Tracing (DKT) has demonstrated superior predictive power for student performance, but its parameters are not associated with any latent constructs, so it has yielded no scientific insights or practical applications. In contrast, the Additive Factor Model (AFM) often underperforms DKT in prediction accuracy, but its parameter estimates have meaningful interpretations (e.g., the slope captures the learning rate of each knowledge component) that lead to new scientific insights (e.g., improved cognitive model discovery) and to useful practical applications (e.g., intelligent tutoring system redesign). In this thesis, I argue that what we need are interpretations and explanations themselves, not interpretable or explainable models that are never actually interpreted or explained, especially in the context of EDM. I aim to develop inherently interpretable, or "meaningful," models that go beyond post-hoc explanations of black-box models. Specifically, the variables and parameters of these meaningful models are associated with meaningful latent constructs.
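As a concrete reference point, AFM's standard logistic form ties each parameter to a latent construct. In the notation below, p_ij is the probability that student i answers item j correctly, theta_i is the student's proficiency, q_jk indicates whether item j exercises knowledge component (KC) k, beta_k is the KC's easiness, gamma_k its learning rate, and T_ik the student's prior practice opportunities on that KC:

\[
\log \frac{p_{ij}}{1 - p_{ij}} = \theta_i + \sum_{k} q_{jk}\left(\beta_k + \gamma_k T_{ik}\right)
\]

The slope gamma_k is the learning rate mentioned above: each additional practice opportunity on KC k raises the log-odds of a correct response by gamma_k.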
I support this argument with several scenarios in which existing mechanisms or models are insufficient to produce meaningful interpretations, and I suggest strategies to investigate and fix them. For example, Performance Factor Analysis (PFA) has been shown to outperform AFM in prediction accuracy, but we demonstrated that PFA's parameters are confounded, resulting in ambiguous interpretations. We then proposed an improved model that not only de-confounds the parameters but also yields meaningful interpretations, leading to new insights on real-student datasets.
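For context, PFA's standard form replaces AFM's single opportunity count with separate counts of prior successes s_ik and failures f_ik, each with its own weight (notation as above):

\[
\log \frac{p_{ij}}{1 - p_{ij}} = \sum_{k} q_{jk}\left(\beta_k + \gamma_k s_{ik} + \rho_k f_{ik}\right)
\]

Because success and failure counts reflect both how much a student has practiced and how well that student was already performing, gamma_k and rho_k mix learning-from-practice with prior ability; this is one way to see the ambiguity in interpretation referred to above.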
In my thesis, leveraging my experience from these past projects, I propose generalized strategies for developing meaningful models and apply them to build a model that captures the spacing effect. Additionally, I will develop a recommender system that suggests an optimal study schedule based on the newly developed model, to demonstrate the practical advantage of meaningful models over black-box models. I will conduct in-vivo studies with middle school students in the biology domain to evaluate the effectiveness of the system.
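To illustrate how a meaningful model's interpretable parameters could drive such a recommender, here is a minimal sketch in Python. The per-KC strength and forgetting-rate parameters, the exponential forgetting curve, and all names are illustrative assumptions, not the proposed system:

import math
from dataclasses import dataclass

@dataclass
class KCState:
    """Hypothetical per-knowledge-component state exposed by a meaningful model."""
    name: str
    strength: float           # predicted recall right after last practice (0..1]
    decay: float              # interpretable forgetting rate per day (> 0)
    last_practice_day: float  # day index of the most recent practice

def predicted_recall(kc: KCState, day: float) -> float:
    """Exponential forgetting curve: recall decays with time since last practice."""
    return kc.strength * math.exp(-kc.decay * (day - kc.last_practice_day))

def due_day(kc: KCState, threshold: float = 0.8) -> float:
    """Day on which predicted recall first drops to the review threshold."""
    if kc.strength <= threshold:
        return kc.last_practice_day  # already at or below threshold
    return kc.last_practice_day + math.log(kc.strength / threshold) / kc.decay

def recommend_next_review(kcs: list[KCState], threshold: float = 0.8) -> KCState:
    """Suggest reviewing the KC that is due soonest."""
    return min(kcs, key=lambda kc: due_day(kc, threshold))

kcs = [
    KCState("photosynthesis", strength=0.95, decay=0.10, last_practice_day=0.0),
    KCState("cell-division",  strength=0.90, decay=0.25, last_practice_day=2.0),
]
kc = recommend_next_review(kcs)
print(f"Review '{kc.name}' around day {due_day(kc):.1f}")

Because each parameter has a direct interpretation (e.g., decay as a per-KC forgetting rate), the resulting recommendation can itself be explained to teachers and students, which is precisely the practical advantage a black-box predictor lacks.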
Proposal Document: https://