Paper: Pitfalls of Measuring Comprehension with EGRA
Abstract
This paper examines fundamental validity threats in the measurement of reading comprehension within the Early Grade Reading Assessment (EGRA). Although EGRA comprehension outcomes are widely used to identify proficient readers, analyse fluency–comprehension relationships, establish fluency benchmarks, and inform instructional guidance, the technical assumptions underpinning these measures remain largely unexamined. Drawing on published evidence and new data from early grade reading evaluations in South Africa and Nepal, we show that the standard one-minute time limit used in oral reading fluency (ORF) tasks introduces substantial bias into comprehension scores. While extending reading time does not meaningfully alter fluency measures, it significantly increases the number of comprehension questions learners can attempt and shifts the distribution of comprehension performance—particularly for slower readers who may understand the text but cannot read fast enough to reach key items. We demonstrate that both common scoring approaches—percent correct of attempted items and percent correct of total items—embed untenable assumptions that respectively inflate and underestimate comprehension. The one-minute time limit also induces mechanical correlations between ORF and comprehension, artificially strengthening the apparent fluency–comprehension relationship. Finally, analysis of PIRLS and EGRA items reveals wide variation in item difficulty, undermining the reliability of aggregate comprehension scores and the benchmarks derived from them. Together, these findings call for caution in interpreting existing EGRA-based evidence, highlight the need to re-examine benchmark estimates and cross-language comparisons, and underscore the importance of extending reading time and analysing item-level difficulty to improve the validity of comprehension measurement.
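The opposing biases of the two scoring rules discussed above can be made concrete with a small sketch. The scenario below is illustrative only, not drawn from the paper's data: a slow reader runs out of time after reaching 2 of 5 comprehension questions and answers both correctly.

```python
# Illustrative sketch (hypothetical numbers): how the two common EGRA
# comprehension scoring rules diverge for a slow reader whose one-minute
# time limit cuts off most of the questions.

def pct_of_attempted(correct: int, attempted: int) -> float:
    """Percent correct of items attempted.

    Implicitly treats unattempted items as if the reader would answer
    them at the same rate, which tends to inflate comprehension scores
    for slow readers."""
    return 100.0 * correct / attempted if attempted else 0.0

def pct_of_total(correct: int, total: int) -> float:
    """Percent correct of all items.

    Implicitly treats every unattempted item as answered wrong, which
    tends to underestimate comprehension for slow readers who understood
    what little they had time to read."""
    return 100.0 * correct / total

# Hypothetical slow reader: reaches 2 of 5 questions, both correct.
correct, attempted, total = 2, 2, 5
print(pct_of_attempted(correct, attempted))  # 100.0 -> likely inflated
print(pct_of_total(correct, total))          # 40.0  -> likely deflated
```

The same learner is scored anywhere between 40% and 100% depending solely on the denominator chosen, which is the core of the argument that both rules embed untenable assumptions.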