This post originally appeared in an EdSurge guide, Measuring Efficacy in Ed Tech. Similar content, from a perspective about sharing accountability that teachers alone have shouldered, is in this prior post.
Curriculum-wide programs purchased by districts need to show that they work. Even products aimed mainly at efficiency or access should at minimum show that they can maintain status quo results. Rigorous evaluations have been complex, expensive and time-consuming at the student-level. However, given a digital math or reading program that has reached a scale of 30 or more sites statewide, there is a straightforward yet rigorous evaluation method using public, grade-average proficiencies, which can be applied post-adoption. The method enables not only districts, but also publishers to hold their programs accountable for results, in any year and for any state.
Three ingredients come together to enable this cost-effective evaluation method: annual school grade-average proficiencies in math and reading for each grade posted by each state, a program adopted across all classrooms in each using grade at each school, and digital records of grade average program usage. In my experience, school cohorts of 30 or more sites using a program across a state can be statistically evaluated. Once methods and state posted data are in place, the marginal cost and time per state-level evaluation can be as little as a few man-weeks.
A recently published WestEd study of MIND Research Institute’s ST Math, a supplemental digital math curriculum using visualization (disclosure: I am Chief Strategist for MIND Research) validates and exemplifies this method of evaluating grade-average changes longitudinally, aggregating program usage across 58 districts and 212 schools. In alignment with this methodological validation, in 2014 MIND began evaluating all new implementations of its elementary grade ST Math program in any state with 20 or more implementing grades (from grades 3, 4, and 5).
Clearly, evaluations of every program, every year have not been the prior market norm: it wasn’t possible before annual assessment and school proficiency posting requirements, and wasn’t possible before digital program usage measurements. Moreover, the education market has greatly discounted the possibility that curriculum makes all that much difference to outcomes, to the extent of not even trying to uniformly record what programs are being used by what schools. (Choosing Blindly: Instructional Materials, Teacher Effectiveness, and the Common Core by Matthew Chingos and Russ Whitehurst crisply and logically highlights this “scandalous lack of information” on usage and evaluation of instructional materials, as well as pointing out the high value of improving knowledge in this area.)
But publishers themselves are now in a position, in many cases, to aggregate their own digital program usage records from schools across districts, and generate timely, rigorous, standardized evaluations of their own products, using any state’s posted grade-level assessment data. It may be too early or too risky for many publishers. Currently, even just one rigorous, student-level study can serve as sufficient proof for a product. It’s an unnecessary risk for publishers to seek more universal, annual product accountability. It would be as surprising as if, were the anonymized data available, a fitness company started evaluating and publishing its overall average annual fitness impact on club member cohorts, by usage. By observation of the health club market, this level of accountability is neither a market requirement, nor even dreamed of. No reason for those providers to take on extra accountability.
But while we may accept that member-paid health clubs are not accountable for average health improvements, we need not accept that digital content’s contribution to learning outcomes in public schools goes unaccounted for. And universal content evaluation, enabled for digital programs, can launch a continuous improvement cycle, both for content publishers and for supporting teachers.
Once rigorous program evaluations start becoming commonplace, there will be many findings which lack statistical significance, and even some outright failures. Good to know. We will find that some local district implementation choices, as evidenced by digital usage patterns, turn out to be make-or-break for any given program’s success. Where and when robust teacher and student success is found, and as confidence is built, programs and implementation expertise can also start to be baked into sustained district pedagogical strategies and professional development.