A COMPARATIVE INVESTIGATION INTO THE IDENTIFICATION OF ETHNIC BIAS IN ITEMS ASSESSING CURRENT EDUCATIONAL STATUS
Seven procedures for identifying statistical item bias were compared: four difficulty procedures (p-values, arcsin and delta transformations of p-values, and Angoff's delta-distance procedure); two discrimination procedures (biserial and point-biserial correlations of items with total test); and Scheuneman's chi-square technique. Fourteen item pools administered in the item analysis of the Metropolitan Readiness Tests (1976 edition) were used. Randomly selected samples of black and white beginning first graders were compared, as were other black and white samples first matched score for score on an external, correlated variable (1965 MRT total score), and two random samples of white pupils.

Statistical analyses employed rank-order correlations between pairs of procedures for each pair of samples and for each item pool. All four difficulty procedures were very highly correlated with each other, though Angoff delta-distance values correlated least highly with the other three. Correlations between the discrimination indices were extremely high, but virtually all correlations between a difficulty procedure and a discrimination procedure were nonsignificant, even at the .05 level. The chi-square procedure correlated moderately with all difficulty procedures. Correlations between the chi-square procedure and the two discrimination indices were very modest, though higher than those between discrimination and difficulty procedures. "Matching" the samples improved correlations in all cases.

An arbitrary cutoff of one and one-half standard deviations from the mean of the distribution of signed item indices for the item pool was used to identify extreme items. Items so identified by a procedure for both matched and random samples of two ethnic groups (but not identified as extreme in the two random white samples) were examined visually for clues as to causes of the bias. Items ranked first, second, last, or next-to-last in the distribution of signed indices were similarly examined.

Items identified as extreme in their bias against either blacks or whites would not have been identified by reviewers as biased. In general, their face validity was excellent, though a few were poor items by traditional editorial standards. No real generalizations about content of biased items could be made, though more of them involved verbal concepts than required auditory discrimination. For many items found "biased against whites," the random white and black samples performed about equally, and in the matched samples performance was either equal or blacks outperformed whites. Some of the "bias" identified appeared to be a function of test format, rather than inherent in the individual item.

Correlations among procedures are necessary evidence of convergent validity for the construct of statistical item bias, but not sufficient for decisions about which procedure to use, since different procedures identify different items as extreme. Using criteria of time- and cost-effectiveness and explainability to the lay public, as well as psychometric respectability, the Angoff delta-distance procedure and the Scheuneman chi-square technique are recommended. However, none of the procedures used was fully able to eliminate differential ability as a confounding variable in identification of bias. Matching of samples before analysis for bias seems advisable.
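
As an illustration only, the delta transformation and the one-and-one-half standard deviation cutoff described in the abstract might be sketched as follows. This is not the dissertation's own computation. The function names, the use of a simple between-group delta difference as the signed index (rather than Angoff's delta-distance from the major axis of the delta plot), the choice of the population standard deviation, and the p-values themselves are all assumptions made for the example.

```python
from statistics import NormalDist, mean, pstdev


def delta(p):
    """Delta transformation of a p-value: 13 - 4z, so harder items score higher."""
    # z is the standard normal deviate below which a proportion p falls.
    return 13.0 - 4.0 * NormalDist().inv_cdf(p)


def signed_indices(p_group_a, p_group_b):
    """Signed index per item: delta for group A minus delta for group B (assumed form)."""
    return [delta(a) - delta(b) for a, b in zip(p_group_a, p_group_b)]


def flag_extreme(indices, k=1.5):
    """Return positions of items lying more than k SDs from the pool mean."""
    m = mean(indices)
    s = pstdev(indices)  # population SD; the source does not say which was used
    return [i for i, x in enumerate(indices) if abs(x - m) > k * s]


# Hypothetical proportions correct for one eight-item pool (not data from the study).
p_a = [0.80, 0.70, 0.60, 0.75, 0.65, 0.90, 0.55, 0.85]
p_b = [0.78, 0.69, 0.61, 0.74, 0.66, 0.60, 0.54, 0.84]

signed = signed_indices(p_a, p_b)
flagged = flag_extreme(signed)
print(flagged)
```

Because the cutoff is relative to each pool's own distribution, an item is flagged only when its group difference stands apart from the differences shown by the other items in the same pool, not merely because one group scored lower overall.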
Educational tests & measurements
BURRILL, LOIS ELIZABETH, "A COMPARATIVE INVESTIGATION INTO THE IDENTIFICATION OF ETHNIC BIAS IN ITEMS ASSESSING CURRENT EDUCATIONAL STATUS" (1981). ETD Collection for Fordham University. AAI8119762.