Utility of model criticism indices for Bayesian inference networks in cognitive assessment

David Michael Williamson, Fordham University


This study describes the context of cognitive assessment and the influence of model-based measurement, artificial intelligence, and technological developments on the feasibility of cognitive assessment utilizing complex constructed-response tasks. While cognitive assessment has benefited from a wealth of knowledge from cognitive research, the modeling of cognitive variables for assessment has largely depended on the skill of the modeler, with little assistance from empirical methodologies. This study investigates an empirical methodology for model criticism of student models utilizing Bayesian Inference Networks with latent variables. Three indices (Weaver's Surprise Index, Good's Logarithmic Score, and the Ranked Probability Score) were examined for their ability to detect errors of inclusion and exclusion of nodes, strong directed edges, weak directed edges, and node states, as well as prior probability misspecification, in the latent structure of a hypothetical Bayesian Inference Network. Using Monte Carlo simulations of candidate data, this investigation was conducted under a number of sample sizes (N of 50, 100, 250, 500, and 1000) to determine the efficiency of each index for each type of model error. Simulated candidate data were compared to critical values for each index, which were determined through a bootstrapping procedure using Monte Carlo data generated to be consistent with the model under examination (a null hypothesis data set). The results suggest that both Weaver's Surprise Index and the Ranked Probability Score are capable of detecting the exclusion of nodes and strong edges at small sample sizes and both the inclusion and exclusion of nodes and strong edges at large sample sizes. The results also suggest that the Ranked Probability Score has an advantage of efficiency over Weaver's Surprise Index in the detection of inclusion of nodes and strong edges at small sample sizes.
A further advantage of the Ranked Probability Score is suggested by its ability to detect node state exclusion and prior probability error at large sample sizes. The results suggest that Weaver's Surprise Index may possess an advantage in the ability to detect weak edge exclusion errors at large sample sizes. The sole utility of Good's Logarithmic Score suggested by this study is for the detection of errors of exclusion and inclusion of node states at both small and large sample sizes, an error undetected by the other indices. Replications utilizing different Bayesian Inference Network model structures and latent structure error locations are necessary before these results can be relied upon for common practice. Implications for applications to model criticism and development of student models for cognitive assessment are discussed.
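
As a rough illustration of the general procedure described above (score the data under a model, then compare against a critical value obtained from data simulated to be consistent with that model), the following Python sketch may help. It is not the dissertation's implementation. It assumes a single observed categorical variable with ordered states rather than a full Bayesian Inference Network with latent variables, it uses one common form of the Ranked Probability Score and a negatively oriented logarithmic score (sign and normalization conventions vary across the literature), and every function name and parameter is hypothetical.

```python
import math
import random

def ranked_probability_score(probs, observed):
    """One common RPS form: sum of squared differences between the
    cumulative predicted distribution and the cumulative observed
    indicator, over ordered states. Lower is better."""
    cum_p = 0.0
    score = 0.0
    for k, p in enumerate(probs):
        cum_p += p
        cum_o = 1.0 if k >= observed else 0.0
        score += (cum_p - cum_o) ** 2
    return score

def log_score(probs, observed):
    """Negative log of the probability assigned to the observed state.
    (Assumed orientation; lower is better.)"""
    return -math.log(probs[observed])

def null_critical_value(probs, n, score_fn, reps=1000, quantile=0.95, seed=0):
    """Simulate `reps` data sets of size `n` from the model itself
    (the null hypothesis), compute the mean score of each, and return
    the requested quantile of those means as a critical value."""
    rng = random.Random(seed)
    states = range(len(probs))
    means = []
    for _ in range(reps):
        sample = rng.choices(states, weights=probs, k=n)
        means.append(sum(score_fn(probs, s) for s in sample) / n)
    means.sort()
    return means[min(reps - 1, int(quantile * reps))]
```

Under these assumptions, a mean score on candidate data that exceeds `null_critical_value(...)` for the same sample size would flag the model as suspect; a network-level analysis would apply the same logic to predictive distributions computed from the network.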

Subject Area

Statistics|Psychology, Psychometrics

Recommended Citation

Williamson, David Michael, "Utility of model criticism indices for Bayesian inference networks in cognitive assessment" (2000). ETD Collection for Fordham University. AAI9964578.