# Number Rockets

## Blueprints Program Rating: Promising

A small group tutoring mathematics competency program for at-risk first grade students that includes computation, concepts, applications, and word problems.

## Program Developer/Owner

- Lynn S. Fuchs
- Vanderbilt University
- Department of Special Education
- 228 Peabody
- Vanderbilt University
- Nashville, TN 37203
- United States
- Lynn.Fuchs@vanderbilt.edu
- vkc.mc.vanderbilt.edu/numberrockets/

## Program Outcomes

- Academic Performance

## Program Specifics

- Academic Services
- Mentoring - Tutoring
- School - Individual Strategies
- School
- Selective Prevention (Elevated Risk)
- Late Childhood (5-11) - K/Elementary
- Male and Female
- All Race/Ethnicity
- School
- School: Poor academic performance*
- School: Instructional Practice
- Skill Oriented
- Significant improvement in mathematics performance for at-risk (AR) tutored students when compared to AR non-tutored controls in computation, calculation, concepts/applications, and story problems
- Significantly higher improvement in calculation and concepts/applications for AR tutored students than students identified as not at risk for mathematics difficulty
- A 40% reduction in prevalence of math disability by the end of first grade for AR tutored students as compared to AR non-tutored students
- Significant improvement in mathematics performance for at-risk (AR) tutored students when compared to AR non-tutored controls in simple arithmetic, complex calculations, number knowledge, and word problems
- A narrowing of the achievement gap in simple arithmetic skills between AR tutored students in the speeded practice condition and low-risk controls
- Improvement in use of retrieval as a strategy by AR tutored students, relative to AR non-tutored controls
- Reduction in counting errors for AR students in the speeded practice condition, relative to AR non-tutored controls
- Students receiving Number Rockets tutoring scored significantly higher (4.28 points) than control group students in mathematics proficiency as assessed by the Test of Early Mathematics Ability - Third Edition (TEMA-3)

## Program Type

## Program Setting

## Continuum of Intervention

## Program Goals

A small group tutoring mathematics competency program for at-risk first grade students that includes computation, concepts, applications, and word problems.

## Target Population

## Population Demographics

Number Rockets targets first grade students who are at risk for math difficulty. Participating students were screened and identified as at risk. Evaluations of the program have studied populations of children located in metropolitan and urban areas that are ethnically diverse.

## Age

## Gender

## Race/Ethnicity

## Race/Ethnicity/Gender Details

Evaluations did not parse out results by gender or race/ethnicity. However, the student populations evaluated were from ethnically diverse backgrounds. Two of the three studies had populations that were largely minority (African American and Hispanic), and the majority of the main study's population was White (55%), followed by African American (34%).

## Risk/Protective Factor Domain

## Risk and Protective Factors

*Risk/Protective Factor was significantly impacted by the program.

See also: Number Rockets Logic Model (PDF)

## Brief Description of the Program

Number Rockets is a tutoring intervention for first-grade students identified as at risk for mathematics difficulty. The program is based on the concrete-representational-abstract model, which relies on concrete objects to promote conceptual learning. Tutors deliver the program to small groups of 2-3 students three times per week during the school day in 40-minute sessions, with 30 minutes of scripted mathematics instruction and activities, followed by 10 minutes of practice to build fluency. The 63 program lessons cover 17 topics, and topics include worksheets and manipulatives (e.g., Base-10 blocks for place value instruction). Behavior is monitored, and students earn rewards for demonstrating on-task behavior.

## Description of Program

Number Rockets is a tutoring intervention for first-grade students identified as at risk for mathematics difficulty. The program is based on the concrete-representational-abstract model, which relies on concrete objects to promote conceptual learning. Tutors deliver the program to small groups of students during the school day in 40-minute sessions, with 30 minutes of scripted mathematics instruction and activity followed by 10 minutes of practice to build fluency. The 63 lessons cover 17 topics, and lessons include worksheets and manipulatives (e.g., Base-10 blocks for place value instruction). Topics span: identifying and writing numbers; understanding less than, greater than, and equal; sequencing numbers, skip counting; place value; identifying operations; writing number sentences; addition and subtraction facts; 2-digit addition and subtraction; and missing addends. For many topics, if all students in the group demonstrate mastery the group may skip the remaining lessons for that topic and advance to the next topic.

During the final 10 minutes of each tutoring session, students complete drill and practice activities to help develop automatic retrieval of math facts, and students are taught efficient counting strategies as backups to automatic retrieval. Student behavior is monitored throughout each session. At varied intervals, students are awarded points for on-task behavior, and after a pre-determined number of points have been earned, students may trade points for prizes.

## Theoretical Orientation

## Brief Evaluation Methodology

All evaluation studies employed a randomized controlled trial design. Students in participating schools were screened to determine at-risk status. Students identified as at risk (AR) for mathematics difficulty were randomly assigned to tutoring or control conditions. AR students in the efficacy trials were compared to the control group and to students identified as not at risk (NAR) for mathematics difficulty. In the effectiveness study, the AR intervention group was compared to AR controls. Participating students were tested at baseline and again post-intervention. The original study (Fuchs et al., 2005) included 10 schools, Study 2 (Fuchs et al., 2013) included 40 schools across four cohorts, and Study 3 (Rolfhus et al., 2012) included 76 schools in four states. The two efficacy studies included a large number of mathematics measures assessing computation, calculation, concepts, and applications. The primary outcome measure in the effectiveness study (Study 3) was overall math proficiency, as measured by the Test of Early Mathematics Ability (TEMA-3).

## Outcomes (Brief, over all studies)

Findings from the three evaluation studies show support for the Number Rockets program in elevating the math proficiency levels of students identified as at risk (AR) for math difficulties. When compared to groups of AR control students, AR students who received Number Rockets tutoring posted significant gains in a range of mathematics skills, including calculation, concepts/applications, word problems, computation, number knowledge, simple arithmetic, complex calculations, and overall math proficiency. In addition, in the original study (Fuchs et al., 2005), improvement of AR tutored students on calculation and concepts/applications exceeded improvement of students identified as not at risk for mathematics difficulty.

## Outcomes

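Results of Number Rockets efficacy study 1 (Fuchs et al., 2005) showed:

- Significant improvement in mathematics performance for at-risk (AR) tutored students when compared to AR non-tutored controls in computation, calculation, concepts/applications, and story problems
- Significantly higher improvement in calculation and concepts/applications for AR tutored students than students identified as not at risk for mathematics difficulty
- A 40% reduction in prevalence of math disability by the end of first grade for AR tutored students as compared to AR non-tutored students

Results of Number Rockets efficacy study 2 (Fuchs et al., 2013) showed:

- Significant improvement in mathematics performance for at-risk (AR) tutored students when compared to AR non-tutored controls in simple arithmetic, complex calculations, number knowledge, and word problems
- A narrowing of the achievement gap in simple arithmetic skills between AR tutored students in the speeded practice condition and low-risk controls
- Improvement in use of retrieval as a strategy by AR tutored students, relative to AR non-tutored controls
- Reduction in counting errors for AR students in the speeded practice condition, relative to AR non-tutored controls

Results of the effectiveness study (Rolfhus et al., 2012) show that:

- Students receiving Number Rockets tutoring scored significantly higher (4.28 points) than control group students in mathematics proficiency as assessed by the Test of Early Mathematics Ability - Third Edition (TEMA-3)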

## Effect Size

Effect sizes were calculated for each of the three studies. In the main study (Fuchs et al., 2005), multiple effect sizes were calculated. The tutoring program resulted in moderate to large effect sizes for the at-risk (AR) tutored students in comparison to the AR non-tutored controls on several measures: computation ES = .40; calculation ES = .57; story problems ES = .70; concepts/applications ES = .67. In addition, for calculation (ES = .61) and concepts/applications (ES = .45), tutored AR students' improvement exceeded the improvement of students identified as not at risk.

In study 2 (Fuchs et al., 2013), effect sizes varied by intervention condition (speeded vs. non-speeded practice) and by comparison group (non-tutored control group and low-risk group). Effect sizes for both intervention groups when compared to the non-tutored group were stronger on simple arithmetic and complex calculation (ES ranged from .38 to .87) than on number knowledge and word problems (ES ranged from .19 to .29). The speeded practice intervention group exceeded gains demonstrated by the low-risk group in one outcome measure: simple arithmetic (ES = .39). With regard to strategic behavior when solving arithmetic problems, effect sizes ranged from small (counting ES = .17 for nonspeeded practice and .28 for speeded practice) to medium (retrieval ES = .42-.52) when compared to the non-tutored control group.

In the effectiveness study (Rolfhus et al., 2012), the effect size on math proficiency gains was slightly lower (.34) than effect sizes reported in the main study (range .40 - .70). This was expected, given the lower level of implementation fidelity typically achieved in a real-world setting. In Study 3, average implementation fidelity across districts and lessons was 85%, whereas in the efficacy studies implementation fidelity was 95% (Study 1) and 96% (Study 2).

These differences indicate that the Number Rockets tutoring program does improve math competency in first-grade students at risk for math difficulty.
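As an illustration of what the effect sizes above express, the following is a minimal sketch of a standardized mean difference (Cohen's d with a pooled standard deviation). It is an assumption that this form resembles the statistic used; the studies may have applied a different formula or a small-sample correction. The function name and the scores are hypothetical and are not study data.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treated, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treated), len(control)
    s1, s2 = stdev(treated), stdev(control)
    pooled_sd = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    return (mean(treated) - mean(control)) / pooled_sd


# Hypothetical improvement scores (posttest minus pretest); NOT study data.
tutored_gains = [2, 4, 6]
control_gains = [1, 3, 5]

d = cohens_d(tutored_gains, control_gains)
print(f"d = {d:.2f}")  # d = 0.50
```

Read this way, an ES of .57 on calculation means the tutored group's average improvement sat a little over half a standard deviation above that of the non-tutored controls.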

## Generalizability

The generalizability of this program is limited to first-grade children who are at risk for math difficulty. In all of the evaluations, students were screened to identify their at-risk status. Cut points were defined through the study depending on multiple factors, and the determination of the eligibility for intervention may vary by site. Across studies, sites were heterogeneous and ethnically diverse. The original study (Fuchs et al., 2005) included 10 schools, Study 2 (Fuchs et al., 2013) included 40 schools across four cohorts, and Study 3 (Rolfhus et al., 2012) included 76 schools in four states. However, in all three studies schools and districts were targeted and agreed to participate in the studies and therefore may not be representative of the larger population.

## Limitations

In the main study (Fuchs et al., 2005), results may be limited because the pretreatment scores of the at-risk (AR) population were in the low normal range on a pretreatment standardized instrument. Such measures provide an inadequate floor on student performance to identify AR samples in early grades. However, local normative information placed the study's AR students substantially below normal, so this normative information may be considered. Second, the tutoring intervention was an addition to the regular class math instruction. This was necessary to generalize findings to a response-to-intervention model, where tutoring is added to regular class instruction so that students receive two tiers of instruction. Therefore, one cannot draw the conclusion that tutoring can replace regular class instruction.

None of the studies reported follow-up results, so maintenance of program effects is unknown.

## Endorsements

## References

Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D., & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty. *Journal of Educational Psychology, 97,* 493-513.

Fuchs, L. S., Geary, D. C., Compton, D. L., Fuchs, D., Schatschneider, C., Hamlett, C. L., ... Changas, P. (2013). Effects of first-grade number knowledge tutoring with contrasting forms of practice. *Journal of Educational Psychology, 105,* 58-77.

Gersten, R., Rolfhus, E., Clarke, B., Decker, L. E., Wilkins, C., & Dimino, J. (2015). Intervention for first graders with limited number knowledge: Large-scale replication of a randomized controlled trial. *American Educational Research Journal, 52*(3), 516-546.

Rolfhus, E., Gersten, R., Clarke, B., Decker, L. E., Wilkins, C., & Dimino, J. (2012). An evaluation of Number Rockets: a Tier-2 intervention for grade 1 students at risk for difficulties in mathematics. (NCEE 2012-4007). Washington, DC: National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Final report.

## Program Information Contact

Vanderbilt University

Attn: Lynn Davies / NUMBER ROCKETS

PMB #228

110 Magnolia Circle, Suite MRL 418

Nashville, TN 37203-5721

Phone: (615) 343-4782

Email: lynn.a.davies@vanderbilt.edu

Website: vkc.mc.vanderbilt.edu/numberrockets/

## Blueprints Certified Studies

**Study 1**

Fuchs, L. S., Compton, D. L., Fuchs, D., Paulsen, K., Bryant, J. D., & Hamlett, C. L. (2005). The prevention, identification, and cognitive determinants of math difficulty. *Journal of Educational Psychology, 97,* 493-513.

**Study 2**

Fuchs, L. S., Geary, D. C., Compton, D. L., Fuchs, D., Schatschneider, C., Hamlett, C. L., ... Changas, P. (2013). Effects of first-grade number knowledge tutoring with contrasting forms of practice. *Journal of Educational Psychology, 105,* 58-77.

## Study 1

**Fuchs et al., 2005**

This study examined the effects of a preventive tutoring program on math outcomes when added to the regular curriculum. Additionally, researchers sought to identify the prevalence and severity of math disabilities, both with and without the preventive tutoring, as a way to help define the disability on a variety of math outcomes, as well as determine the cognitive abilities associated with the development of competence in math.

**Evaluation Methodology**

**Design**: Ten schools in a metropolitan school district in the southeastern US participated in this study. Six of the schools were Title I funded. All first grade teachers in these schools and 667 students for whom parental consent was obtained were included. Teachers completed a questionnaire on whole class math instruction with the following results: teachers spent an average of 233 minutes per week on math instruction; lessons consisted of 17% review, 27% instructing on new content, 25% on guiding practice, and 24% on independent practice. There were also about 10 minutes of daily homework assigned. While primary guidance on math skill instruction came from using the district's curriculum standards, other sources of guidance included observations of student performance, classroom assessments, and the district basal. Whole-class instruction was the main method used, followed by small-group instruction, individual instruction, peer tutoring, and cooperative group work.

Students were identified as at-risk (AR) or not-at-risk (NAR) by testing all participating students in a whole-class format on the Curriculum-Based Measurement (CBM) Computation, Addition Fact Fluency, Subtraction Fact Fluency, and CBM Concept/Applications. Testing identified the 308 lowest-scoring students for individual testing, with an additional 11 nominated by teachers. Individual testing on these 319 children identified 139 as AR. Those students were then randomly assigned to control or tutoring intervention conditions, with blocking by classroom to ensure comparable distribution of AR students in each condition within classrooms. This created 4 pools of students: 1) 69 AR control students, 2) 70 AR tutored students, 3) 180 students who were tested individually and then designated NAR, and 4) 348 NAR students who were only group-tested. Final group numbers (due to attrition) were, respectively: 63, 64, 145, and 292 (groups 3 & 4 were combined into one NAR group).
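The assignment step described above, random assignment with blocking by classroom, can be sketched as follows. This is an illustrative sketch only: the function name, the roster, and the tie-breaking rule for odd-sized classrooms are assumptions, not the procedure reported by the researchers.

```python
import random
from collections import defaultdict


def assign_within_classrooms(students, seed=0):
    """Randomly split AR students into two conditions within each classroom.

    students: list of (student_id, classroom_id) tuples.
    Returns a dict mapping student_id to "tutoring" or "control".
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for student_id, classroom_id in students:
        by_class[classroom_id].append(student_id)

    assignment = {}
    for classroom_id in sorted(by_class):
        ids = by_class[classroom_id]
        rng.shuffle(ids)
        # Alternate conditions down the shuffled list. A random starting
        # condition decides which arm gets the extra student when the
        # classroom has an odd number of AR students (an assumed rule).
        start = rng.randrange(2)
        for i, student_id in enumerate(ids):
            assignment[student_id] = ("tutoring", "control")[(i + start) % 2]
    return assignment


# Hypothetical roster of AR students; NOT study data.
roster = [(f"s{i}", f"room{i % 3}") for i in range(12)]
groups = assign_within_classrooms(roster, seed=42)
print(sum(1 for c in groups.values() if c == "tutoring"), "tutored of", len(groups))
```

Blocking in this way keeps the two conditions within one student of each other inside every classroom, so classroom-level differences in regular instruction are spread evenly across conditions.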

For the intervention, trained tutors worked with students in groups of two or three students (11 groups of two students, 16 groups of three students). Tutors attended a 2-day training workshop and then received weekly support throughout implementation. Lessons were taught using a scripted manual to promote fidelity. Tutoring sessions were also audiotaped. Each tutoring session lasted 30 minutes, followed by 10 minutes of student use of software called Math Flash, designed to improve automatic retrieval of math facts. There was a sequence of 17 scripted topics completed over 48 sessions (average number of sessions completed by students was 43.8), following the concrete-representational-abstract model. Each session included worksheet and manipulative activities. Each topic took approximately three sessions to complete, and mastery of each topic was assessed daily. Students also completed a cumulative review worksheet of each topic before moving on to the next topic. Student behavior was monitored during each session. Students were awarded points by the tutors for on-task behavior and were able to trade points for prizes. The Math Flash software was designed to provide students with repeated opportunities to hold associations between problem stems and their answers in working memory so the facts could be committed to long-term memory. The program targeted fact families (addition and subtraction facts with answers to 9) that increased in difficulty with mastery. Students were awarded points that translated to "prizes" in a "treasure box" on screen. While efforts were made to minimize the amount of regular math instruction time missed by participating in the tutoring intervention, students missed an average of 10.56 minutes of regular class math instruction.

**Sample Characteristics**: School characteristics included: teachers with an average of 17.8 years teaching experience and classrooms with an average of 18.32 students. All but one teacher were female and the majority of teachers (34 out of 41) were White (7 were African American). Student characteristics included: generally equal distribution of gender per group; majority of students were White, followed closely by African American, with small percentages of Hispanic and Other. NAR students performed higher on academic and intelligence measures, as expected, with AR students' scores across conditions being comparable.

*Measures*: Seven math measures were used; five were administered to intact classes and two were administered individually. The whole-class measures included the Curriculum-Based Measurement (CBM) Computation, Addition Fact Fluency, Subtraction Fact Fluency, First-Grade Concepts/Applications, and Story Problems. Scores indicated the number of correct answers on each measure. The two individual measures included the Woodcock-Johnson (WJ) III Applied Problems and WJ III Calculation, which also yielded scores of correct answers. Measures were collected at pre- and post-test. In addition, the CBM Computation was administered weekly to whole classes. Reading skill was measured using the Woodcock Reading Mastery Test-Revised. The battery of cognitive assessments conducted in the individual assessments to determine the at-risk students (including a teacher rating of attention) is not presented here but is described in the article; it included tests of intelligence, language ability, nonverbal problem solving, phonological processing, processing speed, concept formation, and working memory. Teachers also completed ratings of attention for each AR student. The cognitive variables examined to determine predictors of early math development included attention, language, nonverbal problem solving, phonological processing, processing speed, executive function, and working memory. Analyses were conducted on five of the seven outcome measures (WJ III Applied Problems and Subtraction Fact Fluency were not included because other measures in related domains provided better distributions).

*Analysis*: One-way ANOVAs were conducted on the pretest, posttest, and improvement scores of the seven math outcome measures, using condition as the factor. Effect sizes were also calculated. Analyses were not intent-to-treat. Multiple regressions were used to determine cognitive variables associated with early math development.

**Outcomes**

**Implementation Fidelity**: All tutoring sessions were audiotaped, and program topics 4 and 16 for each tutor were coded and checked for fidelity using checklists corresponding to the lessons. A second coder rechecked fidelity on a random 25% of the tapes, and agreement between coders was 88%. Program implementation fidelity across tutors was 96% for the first topic and 94% for the second topic checked.

**Baseline Equivalence and Differential Attrition**: There were baseline differences between the not at risk (NAR) and at risk (AR) groups on outcome measures. These differences were expected. There were no significant differences on outcome measures between the two AR conditions. There were a couple of significant differences between groups on the demographic variables. The NAR group had a higher proportion of White students compared to both AR groups, and the AR groups had a larger percentage of students receiving subsidized lunch, compared to students in the NAR group. There was no mention of analysis conducted to determine differences due to attrition between students who left the study (because they moved to a different school) and those who remained.

**Posttest**: Significant improvements were seen differentially across four of the seven math outcome measures. The greatest effect of the tutoring intervention was seen on the Woodcock Johnson III Calculation and First-Grade Concept/Application measures, where the improvements in AR intervention students exceeded even the students who were not at risk. On story problems, intervention students scored significantly better than the AR control students, although both AR groups were outscored by the NAR students. On the CBM Computation measure, AR students who participated in the intervention saw significant improvements in their scores compared to the AR control students, and their scores were comparable to students in the NAR condition. The status of the conditions on the remaining three measures did not change, with the NAR students performing higher than both AR groups, which scored comparably to each other.

Additionally, based on scores for the Woodcock-Johnson Calculations Test, the researchers identified a 40% reduction in prevalence of math disability at the end of first grade as a result of the math tutoring program (1.77% among non-tutored students vs. 1.06% among tutored students). Among a set of cognitive variables thought to be associated with early math development, the strongest predictor was attention, or distractibility, as rated by teachers. In terms of basic fact fluency, specifically, the unique predictors were attention and phonological processing. For end-of-year computation skill (WJ III Calculation and CBM Computation), attention was the strongest predictor for both, while working memory also predicted CBM Computation. Cognitive variables that predicted math measures in which students worked conceptually with numbers included attention, working memory, and nonverbal problem solving.
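The 40% figure above is a relative reduction, and it can be checked from the two prevalence rates reported. The short sketch below does the arithmetic; the function name is hypothetical, and the only inputs are the two percentages given in the text.

```python
def relative_reduction(control_rate, treated_rate):
    """Share of the control-group rate that is removed in the treated group."""
    return (control_rate - treated_rate) / control_rate


non_tutored_prevalence = 1.77  # percent, as reported above
tutored_prevalence = 1.06      # percent, as reported above

reduction = relative_reduction(non_tutored_prevalence, tutored_prevalence)
print(f"{reduction:.1%}")  # 40.1%
```

The absolute difference is much smaller (0.71 percentage points), which is why it helps to state whether a reduction is relative or absolute.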

## Study 2

**Fuchs et al., 2013**

This evaluation assessed the effectiveness of the program using two different forms of practice compared to each other and a control condition. One form of practice (nonspeeded) emphasized the reinforcement of relations and principles that serve as the basis of reasoning strategies that support fact retrieval. The other form (speeded) promoted quick response and use of efficient counting procedures to generate correct responses and thereby form long-term representations to support retrieval. The evaluation also looked at mediators of effects and predictors of outcomes.

**Evaluation Methodology**

**Design**: Participants were recruited from a metropolitan school district in the Southeast across four cohorts in four consecutive school years. Forty schools with 233 classes of first graders were included, and students were screened to identify subsets of at-risk students. Of the 4,141 eligible students, 2,806 were screened. Students were excluded if they did not speak English (n=359) or if their standard intelligence scores fell below 80 (n=59). From the remaining pool, 973 students were randomly sampled (648 at-risk and 325 low-risk), stratified by risk status and classroom. The at-risk students were randomly assigned at the individual level while stratified by class to the three conditions. During the year, there was a loss of 57 at-risk students (8.9%) and 25 low-risk students (7.7%) who moved to schools outside of the district.

Participants were randomly assigned to one of three conditions: a *speeded practice* condition (n=195), a *non-speeded practice* condition (n=190), and a control condition (n=206). A second control group, consisting of a sample of low-risk, non-tutored classmates, was also included (n=300). The program was referred to as Galaxy Math, which is related to Number Rockets, and used a space theme. In the non-speeded practice condition, students were encouraged to use a variety of number principle strategies including, but not limited to, relying on number lists, arithmetic principles (cardinality, commutativity principle, and subtraction as the inverse of addition), and efficient counting procedures. In the speeded condition, flash cards were used and efficient counting strategies were encouraged. In this condition, there was also an emphasis on executing the counting strategies quickly, whereas in the non-speeded condition, the focus was on executing the strategies thoughtfully to emphasize number knowledge. For both intervention groups, tutoring occurred for 16 weeks, 3 times per week, in 30-minute sessions. In both conditions, 25 minutes of each session consisted of number knowledge instruction, with the last 5 minutes consisting of practice, but differing by condition as described above. Tutors monitored behavior and rewarded students for displaying on-task behavior with stickers and prizes.

There were 79 tutors who worked with between four and seven students. Tutors attended a 2-day training workshop and then received weekly support throughout implementation. Lessons were taught using a scripted manual to promote fidelity. Tutoring sessions were also audiotaped.

*Sample Characteristics*: The students across the three at-risk conditions were predominantly African American (67%-73%) and most participated in the subsidized lunch program (80%-87%). Twelve to seventeen percent were identified as having a disability. Scores across the at-risk sample of students on the screening measures (Wide Range Achievement Test-3 (WRAT) and Wechsler Abbreviated Intelligence Scale (WASI-IQ)) were comparable. The students in the low-risk condition differed in ethnicity, where there were more White students than African American, and a lower proportion of students were on the subsidized lunch program (59%). Only 2% of low-risk students had an identified disability, and on screening scores, low-risk children exceeded those in each of the at-risk conditions.

**Measures**: Measures were collected pre-intervention and post-intervention. On measures of domain-general cognitive resources, the Woodcock Diagnostic Reading Battery-Listening Comprehension was used to measure the respondent's ability to understand sentences or passages; the Wechsler Abbreviated Intelligence Scale Matrix Reasoning scale was used to measure nonverbal reasoning with pattern completion, classification, analogy, and serial reasoning tasks; the Working Memory Test Battery for Children was used to assess the central executive (listening recall, counting recall, and backward digit recall), the phonological loop (digit recall, word list recall, and nonword list recall), and the visuospatial sketchpad (block recall, mazes memory); the Woodcock-Johnson Reading Battery-III Visual Matching was used to measure processing speed; and the Strengths and Weaknesses of ADHD-Symptoms and Normal-Behavior Scale was used to measure attentive behavior. Reliability on all measures was .80 or higher. On the measure of strategic arithmetic behavior, the Addition Strategy Assessment was used to measure response times on simple addition problems, as well as classify the method by which the student solved the problem: counting fingers, verbal counting, retrieval, and decomposition. On measures of mathematics performance, the First-Grade Mathematics Assessment Battery, Arithmetic Combinations was used to measure simple arithmetic problems, while the Double-Digit Addition and Subtraction subsets were used to measure more complex calculations; the Number Sets Test was used to measure number knowledge (understanding of number, numeral mapping, and partitioning sets); Word Problems was used to measure responses using simple arithmetic on word problems. Reliability on these measures was .86 or higher.

*Analysis*: Analysis was intent-to-treat, but the students from the original sample who lacked data because they moved to schools outside of the school district were dropped from the final sample. A two-level residualized change approach was used to analyze effects of study condition: students were nested in classrooms, and pretest performance was used as a covariate along with the main effect of study condition to predict posttest performance. Effect sizes were also calculated.
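One way to write out the two-level residualized change model described above is sketched below. The notation is an assumption made for illustration (a random intercept for classrooms and dummy-coded study conditions); the exact specification in the article may differ.

```latex
Y_{ij} = \gamma_{00} + \gamma_{10}\,\mathrm{Pre}_{ij}
       + \sum_{k=1}^{K} \gamma_{2k}\,C_{kij} + u_{0j} + e_{ij},
\qquad u_{0j} \sim N(0, \tau^{2}), \quad e_{ij} \sim N(0, \sigma^{2})
```

Here $Y_{ij}$ is the posttest score of student $i$ in classroom $j$, $\mathrm{Pre}_{ij}$ is the pretest covariate, the $C_{kij}$ are indicators for the study conditions relative to a reference group, $u_{0j}$ is the classroom-level random effect, and $e_{ij}$ is the student-level residual. Under this sketch, each $\gamma_{2k}$ is the condition effect on posttest performance after adjusting for pretest.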

**Outcomes**

**Implementation Fidelity**: Research staff listened to a randomly selected lesson for each tutor in each condition on a weekly basis, and provided corrective feedback as needed. Additionally, about 16% of the tapes were sampled and coded for fidelity, with a 96% coder agreement. The mean percentage of points addressed was 97.6 in speeded practice and 97.7 in nonspeeded practice.

**Baseline Equivalence and Differential Attrition**: MCAR testing indicated that missing data occurred at random, so students who left the district prior to the end of the study were dropped without introducing significant bias. For the remaining data, there were no missing data. As noted above, students in the low-risk control condition were different than the at-risk children with regard to ethnicity and participation in the subsidized lunch program, and they exceeded the at-risk children on screening scores (reading and math ability, and intelligence), but there were no significant differences among the at-risk study groups.

**Posttest**: There were significant improvements among the treatment groups across all four mathematics outcomes. Both tutoring conditions exceeded students in the at-risk control condition on simple arithmetic. Students in the speeded tutoring group outperformed the students in the nonspeeded tutoring group, and they also narrowed the achievement gap with respect to low-risk classmates. Similar results were seen with complex calculations; the only difference being the speeded group did not narrow the achievement gap with the low-risk group. However, the achievement gap widened between the low-risk group and the nonspeeded tutoring and the at-risk control groups. Children in both tutoring groups exceeded the at-risk control group on gains in number knowledge and word problems, and results between the tutoring groups were comparable. However, on these two measures, the students across all three at-risk conditions lost ground with respect to the low-risk students.

On measures of children's strategic behavior when solving simple arithmetic problems, there were significant program effects on counting errors and retrieval. For counting errors, the speeded condition saw greater improvements over the at-risk no-tutoring group than the nonspeeded condition, although gains in both tutoring groups were comparable. However, these gains did not outpace the low-risk control group's gains. Gains between the two tutoring conditions with respect to retrieval were comparable, although only the gains among the speeded practice group kept pace with those of the low-risk control group, whereas the other two conditions lost ground.

Mediation analysis showed an effect of retrieval in mediating the effects of tutoring on arithmetic for both the nonspeeded and speeded practice groups. Evaluation of pretest cognitive resources in predicting improvements in simple arithmetic showed an interaction between nonverbal reasoning and condition. Variables shown to be predictors of posttest improvements were attentive behavior and central executive working memory. Furthermore, these predictors were more significant among the students in the nonspeeded practice condition.

## Study 3

**Rolfhus et al., 2012**

**Gersten et al., 2015**

This study was a large-scale effectiveness trial of Number Rockets implemented in 76 schools in four school districts across four states. Random assignment was used. Local tutors with a range of experience working with at-risk youth implemented the program. Additionally, each of the four districts used a different core math curriculum. Outcomes for intervention students were compared to a control condition. In addition to group differences on outcomes, three exploratory research questions were examined: whether the effects of tutoring differed by baseline math proficiency; whether program participation, which resulted in intervention students missing regular instruction time, had a differential effect on word reading skill, compared to control students; and whether dosage had an impact on program effects.

**Evaluation Methodology**

*Design*: Participants were first-grade students at risk for math difficulties from 76 schools in four urban school districts across four of the five Regional Educational Laboratory Southwest states. The school was the unit of assignment. Schools were matched within each district on a composite score calculated from mean school achievement scores and the percentage of students receiving free or reduced-price lunch. From each pair, one school was assigned to the intervention condition while the other was assigned to the control condition. Eligible students with parental consent were screened on six measures to determine at-risk status: solving computation problems, concept/application problems, brief story problems, number sense, comparative judgments of numerical magnitude, and working memory. Students were deemed at-risk if their composite score was below the sample's 35th percentile. A total of 994 students (615 intervention, 379 control) met this criterion and participated in the study. Intervention students were assigned to tutoring groups of two to three students, and tutoring groups met three or more times per week for approximately 17 weeks for a total of 45 lessons. Program implementation ran from December 2008 to May 2009. All students received regular core mathematics instruction. Eighty-six tutors provided program instruction. All were recruited locally, were primarily retired or substitute teachers, and had at least a bachelor's degree. Tutors received a one-day training followed by two 2-hour follow-up trainings. Trained district coaches were also available throughout implementation for supervision and technical assistance. Each tutor had an average of 2.8 tutoring groups.
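The two selection steps described in the design (matched-pair assignment of schools, then screening students against the sample's 35th percentile) can be illustrated with a minimal Python sketch. This is not the study's procedure: the function names, the rule of pairing adjacent schools after sorting by composite score, and the use of an inclusive percentile method are all assumptions made for illustration.

```python
import random
import statistics


def assign_matched_pairs(schools, seed=0):
    """Pair schools with adjacent composite scores, then randomize within each pair.

    `schools` is a list of (name, composite_score) tuples for ONE district.
    Pairing by sorted adjacency is an assumption; the source only says schools
    were matched on a composite score. An unpaired leftover school is skipped.
    """
    rng = random.Random(seed)
    ranked = sorted(schools, key=lambda s: s[1])
    assignment = {}
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i][0], ranked[i + 1][0]]
        rng.shuffle(pair)  # coin flip within the matched pair
        assignment[pair[0]] = "intervention"
        assignment[pair[1]] = "control"
    return assignment


def at_risk(composites, percentile=35):
    """Return IDs of students whose composite falls below the sample percentile.

    `composites` maps student ID -> screener composite score.
    """
    cuts = statistics.quantiles(list(composites.values()), n=100, method="inclusive")
    cutoff = cuts[percentile - 1]  # cuts[0] is the 1st percentile
    return {sid for sid, score in composites.items() if score < cutoff}


# Hypothetical district with four schools and made-up composite scores.
district = [("A", 0.2), ("B", 0.9), ("C", 0.25), ("D", 0.85)]
print(assign_matched_pairs(district))
```

Randomizing within matched pairs keeps the two arms balanced on the matching composite at the school level, which is why baseline equivalence is then checked on student-level characteristics.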

*Sample Characteristics*: Study participants were predominantly minority (44% Black, 46% Hispanic). About half (49%) of the students were female and 35% qualified for free/reduced-price lunch. About 8% of students had an IEP.

*Measures*: The Test of Early Mathematics Ability-Third Edition (TEMA-3) was used to measure student proficiency in math, the primary outcome measure. Math skills tested include numbering, number comparisons, concepts, numeral literacy, number facts, and calculation. Reliability of this measure is alpha = .95. The Woodcock Johnson-III Letter/Word subtest was used to measure reading fluency. Reliability of this measure is alpha = .98. Measures were collected at post-intervention.

*Analysis*: Analysis was intent-to-treat using multiple imputation. Hierarchical linear modeling was used to determine outcome differences.

**Outcomes**

*Implementation Fidelity*: Fidelity of implementation was measured in three ways. First, lesson fidelity checklists were used: tutors audiotaped each session, and four sessions were randomly selected for evaluation by coding the checklists. Across all districts throughout implementation, average lesson fidelity was 85%. Second, tutors completed instructional logs reporting the lessons completed and the length of each lesson. Across study districts, an average of 48.1 lessons was completed, with 32.4% of groups completing all 17 program topics. Finally, tutors collected classroom instruction checklists, on which teachers recorded the instructional activities that intervention students would miss. These checklists were used to monitor whether intervention students missed regular math instructional time, which the implementation guidelines said they should not. Checklists indicated that 11.4% of the classroom activities missed by students for program implementation were whole-class math instruction.

**Baseline Equivalence and Differential Attrition**: There was a significant difference in the percentage of consent forms returned from the intervention schools compared to the control schools. There were significant differences between the intervention and control conditions with regard to race/ethnicity and grade 1 enrollment, which was higher in intervention schools. For students who were identified as at-risk, there were no statistically significant differences between the intervention and control groups on either demographic variables or mean screener composite scores. Posttest data were available for 90% (n=555) of the intervention group and 86% (n=326) of the control group, and there were no significant baseline differences between groups for whom posttest data were available.
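The retention figures can be checked against the sample sizes reported in the design (615 intervention and 379 control students). The short Python sketch below only redoes that arithmetic; the variable names are illustrative, and the differential and overall attrition values are derived here rather than reported by the study.

```python
# Counts as reported in this study summary.
randomized = {"intervention": 615, "control": 379}
posttest = {"intervention": 555, "control": 326}

# Share of each group with posttest data, and the complementary attrition rate.
retention = {group: posttest[group] / randomized[group] for group in randomized}
attrition = {group: 1 - rate for group, rate in retention.items()}

# Differential attrition: gap between the two groups' attrition rates.
differential = abs(attrition["intervention"] - attrition["control"])

# Overall attrition across both groups combined.
overall = 1 - sum(posttest.values()) / sum(randomized.values())

for group in randomized:
    print(f"{group}: retention {retention[group]:.1%}, attrition {attrition[group]:.1%}")
print(f"differential attrition: {differential:.1%}")
print(f"overall attrition: {overall:.1%}")
```

The computed retention rates round to the 90% and 86% stated above.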

**Posttest**: There was a statistically significant difference on the main outcome variable, TEMA-3 math proficiency scores: students in the intervention group scored an average of 4.28 points higher (ES=.34) than their counterparts in the control group. Exploratory analyses indicated that 1) there were no differential impacts of the intervention on students based on baseline math proficiency; 2) there were no differential impacts of the intervention on reading proficiency (as measured by the Woodcock Johnson-III Letter/Word subtest) due to students missing classroom reading instruction to participate in the intervention; and 3) there were no differential impacts of the intervention based on dosage, in that math proficiency scores were not significantly higher for students who received more sessions of Number Rockets than for students who received fewer sessions.
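To put the reported effect size in context, the sketch below back-calculates the standard deviation implied by a 4.28-point difference and ES=.34. It assumes the effect size is a standardized mean difference (raw difference divided by a standard deviation); the summary does not say how the effect size was computed, so both the helper `implied_sd` and the resulting value are illustrative only.

```python
def implied_sd(mean_difference: float, effect_size: float) -> float:
    """Standard deviation implied by ES = mean_difference / SD.

    Assumes a standardized-mean-difference effect size, which is an
    assumption here, not something stated in the study summary.
    """
    if effect_size == 0:
        raise ValueError("effect_size must be non-zero")
    return mean_difference / effect_size


sd = implied_sd(4.28, 0.34)
print(f"Implied TEMA-3 standard deviation: {sd:.1f} points")
```

Under that assumption, the 4.28-point gain corresponds to about a third of a standard deviation on a scale whose spread is roughly 12.6 points.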

**Limitations**: Several study limitations were noted. First, researchers were unable to determine whether program outcomes were due specifically to the Number Rockets program or simply to the additional math instruction time the at-risk students received. Second, parental consent was required for study participation, and return rates differed between intervention and control sites. While there were no significant differences between groups on observed demographic characteristics or screener composite scores, there is no way to know whether the differential consent rate influenced baseline equivalence on unobserved characteristics. Additionally, since both parents and school districts volunteered to participate in the study, generalization of results to the larger population should be viewed with caution. Another limitation was the exclusion of Spanish-speaking students because the materials were not available in Spanish. These students represented between 1% and 29% of students across all grades. Further, maintenance of effects is unknown, as long-term follow-up results were not examined.