Practitioners in education should be dissatisfied with the simplistic “checkbox” approach to evaluating program effectiveness. Merely determining whether something “works” fails to capture the nuances of how a program is applied, how it is used, and what outcomes result. It is time to leave the old checkbox mentality behind and adopt a more comprehensive and useful perspective.

In any case, every serious program can eventually be shown to “work” when used in its own specific way, on its own specific application and students, to achieve its own specific outcomes. At least once.

The focus must shift from predominantly assessing the rigor of individual studies to considering a wide range of credible evidence that demonstrates a program’s effectiveness in real-world classroom settings. As someone who has evaluated and published findings on MIND Education’s programs for over 20 years, I can distill the essential factors that should matter to practitioners. These factors are captured in the PRIME acronym: Patterns & Repeatability, Implementation, Mix, and Equity.

Patterns & Repeatability

Patterns & Repeatability are fundamental to establishing the reliability of any study finding. Conducting many comparable studies every year, standardized in their approach, makes it possible to identify patterns in usage and impact and to see whether they repeat over time. These patterns give practitioners crucial insights into variations in usage, such as amount, rate, regularity, and interactions with different types of learners. They also shed light on outcome patterns, including effects at various grade levels, on different assessments, and for students at different performance levels. Annual studies are vital because much can change over time: program revisions, evolving standards or assessments, and external changes in the education ecosystem.


Implementation

Implementation goes beyond the simple assignment of licenses as the “treatment” condition, and involves evaluating the impact of actual program usage at various levels. A single study, at some unspecified usage level, finding that assignment of program licenses “works” is insufficient. No program can overcome implementation that is haphazard or that does not meet the program’s empirically determined minimum requirements. Evaluating and reporting how outcome impact varies with the amount of program usage is essential. Real-world educational implementations vary across schools, grades, classrooms, and student populations, so it is crucial to explore implementation-related metrics such as completion rates, regularity of use, variations in usage, and the interaction between usage and student types.
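To make the usage point concrete, here is a minimal Python sketch of the kind of breakdown a practitioner might ask for: average outcome gain by usage band. The field names (`completion_pct`, `gain`), the 50% minimum requirement, and the sample numbers are all hypothetical, chosen only for illustration; they are not MIND's data or thresholds.

```python
from statistics import mean

def usage_band(completion_pct, minimum=50.0):
    """Label a cohort by program completion.

    `minimum` stands in for a program's empirically determined
    minimum requirement; the value 50.0 is a made-up placeholder.
    """
    if completion_pct < minimum / 2:
        return "low"
    if completion_pct < minimum:
        return "partial"
    return "met_minimum"

def mean_gain_by_band(cohorts, minimum=50.0):
    """Average outcome gain for the cohorts in each usage band."""
    bands = {}
    for cohort in cohorts:
        band = usage_band(cohort["completion_pct"], minimum)
        bands.setdefault(band, []).append(cohort["gain"])
    return {band: mean(gains) for band, gains in bands.items()}

# Hypothetical school-grade cohorts (illustrative numbers only).
cohorts = [
    {"completion_pct": 10.0, "gain": 0.0},
    {"completion_pct": 20.0, "gain": 2.0},
    {"completion_pct": 40.0, "gain": 3.0},
    {"completion_pct": 60.0, "gain": 5.0},
    {"completion_pct": 80.0, "gain": 7.0},
]

result = mean_gain_by_band(cohorts)
for band, gain in result.items():
    print(band, gain)
```

A report in this shape shows whether outcomes differ between cohorts that met the minimum and those that did not, which a single “licenses assigned” result cannot show. A descriptive breakdown like this is not by itself causal evidence, since higher-usage cohorts may differ in other ways.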


Mix

The concept of Mix emphasizes the need for a high volume of studies that cover a diverse range of educational settings, applications, and use cases. This goes beyond repeatability and focuses on the validity and relevance of study findings to practitioners’ specific contexts. To compile the necessary mix in such a varied ecosystem, it is useful to study everyone, everywhere, and to provide standardized reports, including substudies, so that practitioners can judge how relevant the effectiveness findings are to them. Quasi-experimental methods also allow “non-experimental” users to be included as an important part of the mix, since real-world use may be quite different from experimental pilots.


Equity

Equity recognizes that not all students resemble some “average” student, and that different student subgroups may have distinct needs and experiences. It is crucial not to settle for a checkbox approach that reports only the overall “main effect” on the average student. Evaluating the program’s impact on each key student subgroup is necessary to understand its effectiveness. The impact doesn’t need to be identical among subgroups, but it is essential to be aware of any disparities in order to address equity concerns.
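A small Python sketch can show why the overall “main effect” is not enough. It computes the program-versus-comparison difference in mean score change, both overall and within each subgroup. The subgroup labels, field names, and numbers are hypothetical, and a real analysis would also report sample sizes and uncertainty.

```python
from statistics import mean

def effect(rows):
    """Difference in mean score change: program minus comparison.

    Assumes both arms are non-empty; statistics.mean raises
    StatisticsError on an empty list.
    """
    program = [r["score_change"] for r in rows if r["arm"] == "program"]
    comparison = [r["score_change"] for r in rows if r["arm"] == "comparison"]
    return mean(program) - mean(comparison)

def effects_by_subgroup(rows):
    """Apply `effect` separately within each subgroup."""
    groups = {}
    for r in rows:
        groups.setdefault(r["subgroup"], []).append(r)
    return {name: effect(members) for name, members in groups.items()}

# Hypothetical records (illustrative numbers only).
rows = [
    {"subgroup": "subgroup_a", "arm": "program", "score_change": 4.0},
    {"subgroup": "subgroup_a", "arm": "program", "score_change": 6.0},
    {"subgroup": "subgroup_a", "arm": "comparison", "score_change": 2.0},
    {"subgroup": "subgroup_a", "arm": "comparison", "score_change": 2.0},
    {"subgroup": "subgroup_b", "arm": "program", "score_change": 3.0},
    {"subgroup": "subgroup_b", "arm": "program", "score_change": 3.0},
    {"subgroup": "subgroup_b", "arm": "comparison", "score_change": 2.0},
    {"subgroup": "subgroup_b", "arm": "comparison", "score_change": 2.0},
]

overall = effect(rows)
by_subgroup = effects_by_subgroup(rows)
print("overall:", overall)
print("by subgroup:", by_subgroup)
```

In this made-up data the overall effect of 2.0 conceals a 3.0 effect for one subgroup and a 1.0 effect for the other, which is exactly the kind of disparity a main-effect-only report would miss.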

To achieve a high volume of studies, the most feasible rigorous approach is quasi-experimental group-comparison studies conducted at the school grade-level-cluster unit of analysis. This method, allowed and prescribed by the What Works Clearinghouse (WWC), also meets ESSA Tier II evidence standards. By evaluating entire school-grade cohorts using this method, universal state-level data availability at grades 3-8 for math and reading can be leveraged. This provides a substantial longitudinal “control pool” for rigorous comparisons to any set of user schools, starting at any time, including baseline matching based on school-level demographics. Combining outcomes from various state assessments is possible, increasing the study sample size and allowing for national analyses including even states with low numbers of schools.
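Two of the ideas in this paragraph, combining outcomes across different state assessments and baseline matching on school-level demographics, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the WestEd method: it uses within-state z-scores as one common way to put different assessments on a shared scale, and simple nearest-neighbor matching on a single hypothetical covariate (`pct_low_income`). All school names and numbers are invented.

```python
from statistics import mean, pstdev

def standardize_within_state(schools):
    """Add a z-score computed against each school's own state.

    Assumes every state has at least two distinct scores; otherwise
    the standard deviation is zero and the division fails.
    """
    by_state = {}
    for s in schools:
        by_state.setdefault(s["state"], []).append(s["score"])
    out = []
    for s in schools:
        scores = by_state[s["state"]]
        z = (s["score"] - mean(scores)) / pstdev(scores)
        out.append({**s, "z": z})
    return out

def nearest_comparison(program_school, pool, covariate="pct_low_income"):
    """Pick the pool school closest on one baseline covariate.

    Matching with replacement on a single covariate is a
    simplification made for this sketch.
    """
    return min(pool, key=lambda c: abs(c[covariate] - program_school[covariate]))

# Hypothetical school-grade records from two states with different scales.
schools = [
    {"name": "X1", "state": "X", "score": 300.0, "pct_low_income": 80.0, "uses_program": True},
    {"name": "X2", "state": "X", "score": 320.0, "pct_low_income": 78.0, "uses_program": False},
    {"name": "X3", "state": "X", "score": 340.0, "pct_low_income": 30.0, "uses_program": False},
    {"name": "Y1", "state": "Y", "score": 40.0, "pct_low_income": 35.0, "uses_program": True},
    {"name": "Y2", "state": "Y", "score": 50.0, "pct_low_income": 55.0, "uses_program": False},
    {"name": "Y3", "state": "Y", "score": 60.0, "pct_low_income": 33.0, "uses_program": False},
]

standardized = standardize_within_state(schools)
z_by_name = {s["name"]: s["z"] for s in standardized}

users = [s for s in standardized if s["uses_program"]]
pool = [s for s in standardized if not s["uses_program"]]
matches = {u["name"]: nearest_comparison(u, pool)["name"] for u in users}
print(matches)
```

After standardizing, a score of 300 on state X's scale and 40 on state Y's scale sit at the same relative position, so the two states can be pooled. A real study would match on several baseline characteristics, including prior achievement, and check baseline equivalence before comparing outcomes.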

MIND Research Institute has collaborated with WestEd to validate these rigorous quasi-experimental (QE) methods, which have been implemented every year since 2008 to evaluate every new school cohort using the ST Math program. SRI has verified that this WestEd QE study meets What Works Clearinghouse (WWC) quality evidence standards.

In summary, by focusing on the key elements of PRIME, administrators can make well-informed decisions regarding the adoption and implementation of educational technology programs. There is no need to become an expert in navigating academic research papers. Instead, practitioners should use the PRIME framework as a guide to request specific types of information from program providers or researchers. This knowledge empowers administrators to assess the program’s requirements and potential risks, enabling them to determine how well it aligns with their specific needs and the likelihood of achieving real-world use and results.

This shift from a simplistic checkbox approach to a comprehensive suite of evaluation information will greatly strengthen program evaluation and selection. By embracing PRIME, educators can move beyond the limitations of a narrow “works or not” mindset and gain deeper insights into program effectiveness, implementation strategies, and equity considerations.