Investigating test fairness of GRE scores for veterinary student selection

Trina Caprice Reuben, Fordham University


This study explored methods for investigating test fairness, using Graduate Record Examination (GRE) scores to predict first-year veterinary school grades. The GRE General Test is widely used by veterinary schools to make selection decisions. Because women comprised 70% of enrolled first-year veterinary students, it was expected that women selected using the same criteria as men would obtain comparable grades. Hierarchical linear models (HLM) were compared with general linear models (GLM) to evaluate the differential validity and differential prediction observed between men and women. It was expected that HLM, compared with GLM, would change the magnitude and interpretation of differential validity and differential prediction. It was also expected that school-level variables would improve model fit and reduce mean residual differences between men and women. For comparison, HLM and GLM results were set against Ordinary Least Squares (OLS) regression results for each school. Perception survey items about school satisfaction and workload were also included in the regression models. In total, a sample of 1,058 first-year veterinary students across 14 schools was used for the evaluation. The total analysis sample was split into calibration and validation sub-samples. Mean square residuals (MSR) and mean residuals (MR) were compared across 12 models and 3 methods. Results obtained with the calibration sample indicated that when GRE scores and undergraduate grade point average (UGPA) were used in combination in the prediction models, the school performance of men, as measured by first-year grade point average, was under-predicted and that of women was over-predicted under OLS, GLM, and HLM. In contrast, the validation sample showed mixed results, with some models showing under-prediction for women and over-prediction for men. An impact analysis was completed using the MRs obtained on the validation sample.
When additional school-level variables and perception survey item composite scores were included in the prediction model, mean residual gender differences decreased for both the HLM and GLM methods. Model fit and regression coefficients for the GLM and HLM methods were very similar across all models.
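
The differential prediction check described in the abstract can be illustrated with a minimal sketch: fit a common regression on a calibration sample, apply it to a validation sample, and compare mean residuals (MR) and mean square residuals (MSR) by gender. Everything below is an assumption for illustration only and is not taken from the dissertation: the data are synthetic, the function names are hypothetical, residuals are defined as observed minus predicted (so a positive MR indicates under-prediction), and MSR is taken as the plain mean of squared residuals. Only the OLS case is shown; the GLM and HLM models are not reproduced.

```python
import numpy as np


def fit_ols(X, y):
    """Return least-squares coefficients, with an intercept prepended to X."""
    design = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


def predict(beta, X):
    """Apply coefficients from fit_ols to a new predictor matrix."""
    design = np.column_stack([np.ones(len(X)), X])
    return design @ beta


def residual_summary(y, y_hat, group):
    """Per-group mean residual (MR) and mean square residual (MSR).

    Assumed convention: residual = observed - predicted, so MR > 0 means
    the group's grades were under-predicted and MR < 0 means over-predicted.
    """
    resid = np.asarray(y, dtype=float) - np.asarray(y_hat, dtype=float)
    group = np.asarray(group)
    summary = {}
    for g in np.unique(group):
        r = resid[group == g]
        summary[str(g)] = {"MR": float(r.mean()), "MSR": float((r ** 2).mean())}
    return summary


# Synthetic stand-in data (hypothetical; not the study's sample).
rng = np.random.default_rng(0)
n = 400
gender = rng.choice(np.array(["F", "M"]), size=n, p=[0.7, 0.3])
X = rng.normal(size=(n, 2))  # stand-ins for standardized GRE and UGPA
fygpa = 3.0 + 0.2 * X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.4, size=n)

# Calibrate on the first half, then evaluate residuals on the second half.
half = n // 2
beta = fit_ols(X[:half], fygpa[:half])
validation = residual_summary(fygpa[half:], predict(beta, X[half:]), gender[half:])
print(validation)
```

Under these assumptions, comparing the "M" and "F" entries of `validation` gives the mean residual gender difference for one model; repeating the comparison across models and estimation methods would mirror the kind of comparison the abstract reports.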

Subject Area

Education, Tests and Measurements|Psychology, Psychometrics

Recommended Citation

Reuben, Trina Caprice, "Investigating test fairness of GRE scores for veterinary student selection" (2003). ETD Collection for Fordham University. AAI3083156.