Ability Grouping, Tracking, and How Schools Work

Originally published on the Brookings Institution's Brown Center Chalkboard, April 3, 2013

The 2013 Brown Center Report on American Education was released two weeks ago. One of the studies is on ability grouping. A key finding is that elementary teachers are using ability grouping again. Ability grouping is the practice of dividing classes into small instructional groups, especially for teaching reading. According to data collected by the National Assessment of Educational Progress (NAEP), the frequency of ability grouping’s use in fourth grade reading instruction rose to about two and a half times its earlier level, from 28 percent in 1998 to 71 percent in 2009.

This year marks the 30th anniversary of the publication of How Schools Work by Rebecca Barr and Robert Dreeben, a book in which ability grouping plays an important role. I became aware of the book at the University of Chicago in 1988 as a Ph.D. student. Robert Dreeben was my program advisor and dissertation chair.

Ability grouping is one method by which educators differentiate instruction. The term “differentiation” refers to the many ways that schools try to tailor different learning experiences to children’s varying levels of performance. In the 1980s, I earned a master’s degree in special education and taught both learning handicapped and gifted students. Differentiation was in my blood when I arrived at Chicago.

Differentiation was also under fire. Ability grouping and tracking were becoming taboo. The popular research at that time, which was predominantly qualitative and impressionistic, condemned tracking and ability grouping for harming black, Hispanic, and economically disadvantaged students. This literature often depicted teachers as stupid or evil: stupid by robotically following tradition and unwittingly imposing harmful practices on students; evil by harboring race- or class-based prejudices that manifested in low expectations for many kids.

That is what made How Schools Work so refreshing. The book honors teachers in a profound way, not in a “you are all saints and we love you” way, but in a manner much more meaningful—by studying teachers’ work. Barr and Dreeben followed a group of Chicago first grade teachers as they taught reading. A wealth of data was collected so that hypotheses could be tested empirically. In How Schools Work, readers discover that first grade reading groups operate within a grand organizational scheme: groups nested in classrooms, classrooms housed within schools, schools situated within a big urban district. Seemingly routine tasks of teaching are transformed into thoughtful, important activities. Teachers do not appear to be stupid or evil. They appear to be professionals engaged in purposeful activities.

In 1988, “The Formation and Instruction of Ability Groups” was published in the American Journal of Education. Adam Gamoran, a Chicago graduate student at the time, worked on the project that produced this paper. Dreeben and Barr describe as “technological” the ways in which teachers form groups and then instruct them; not technological in the sense of using computers or electronic media but in the sense of applying craft knowledge in the pursuit of an occupational end, in this case, the goal of organizing a classroom full of first graders so that they can be taught how to read.

The notion that teaching is primarily intuitive (“teachers are born, not made”) was directly refuted. When they teach reading, teachers must juggle four inputs, each with its own constraints: student aptitude, the difficulty of reading materials, time devoted to instruction, and coverage of curriculum. The combination of these four inputs must be expertly managed to optimize learning. Sure, sometimes teachers have to fly by the seat of their pants while teaching, but most of the time they employ craft knowledge to attain just the right mix. Kids do in fact learn how to read, and first grade, more than any other grade, is where that wonderful accomplishment can be observed while it happens.

Teachers aren’t perfect. They can make mistakes. They can form groups that are too large, too small, or too unwieldy in composition; move groups too fast or too slow; teach from a curriculum that is too demanding or too easy; or fail to provide enough time for instruction. They can also be unfair – even bigoted – but that’s not the norm.

It is heartening to note that as the use of ability grouping is increasing, a new generation of researchers is bringing sophisticated statistical techniques (and open minds) to bear on questions involving both ability grouping and tracking. Tracking, the middle and high school practice of grouping students into separate classes as opposed to grouping students within a class, has always drawn the most scholarly attention. And the most opprobrium.

In a recent NBER working paper, Courtney A. Collins and Li Gan classify Dallas schools as sorted or non-sorted based on the heterogeneity of classes in math or reading achievement. The study also considers heterogeneity in the dispersion of students identified as gifted and talented, limited English speaking, or special education. Sorting is found to produce significantly positive effects in both reading and math — and for both high and low achievers. The researchers conclude:

This study has valuable policy implications because unlike many school policy variables, the composition of classes can often be changed with little need for increased funds. A school with a fixed number of classrooms and teachers can increase efficiency by rearranging students in the most effective way possible. This study suggests that creating classes with lower levels of dispersion of score or ability level may improve the achievement outcomes for students across the score distribution (Collins and Gan, 2013, page 20).

The study joins a long line of research dating back to at least the 1920s. The overriding concerns have been to determine whether tracking and ability grouping are good or bad (whether they produce positive effects) and whether they are equitable (even if some students benefit, is it at the expense of others). The evidence on these questions is mixed. To adequately summarize the literature would require a series of posts, and I will return to this topic in the future. The main point I would like to make in concluding this post pertains to the renewed popularity of tracking and ability grouping, not to whether either practice is warranted by research.

In the late 1980s and into the 1990s, powerful groups condemned ability grouping and tracking, among them the National Governors Association, the NAACP Legal Defense Fund, and the Children’s Defense Fund. The use of ability grouping dropped significantly in the 1990s. Tracking in middle schools declined in all subjects but math. According to the NAEP data reported in the Brown Center Report, ability grouping has made a strong comeback in the past decade. The resurgence of ability grouping accentuates the need for new research questions. If educators are going to use ability grouping again, how should they employ this tool so as to maximize potential benefits and minimize potential harms? How large should groups be? How many groups should a teacher create, and how much time should be spent with each one? Do low achieving groups require more direct instruction than high achieving groups? How often should students be assessed and regrouped? Are different curricula more effective with different groups? Notice the thrust of these inquiries. Such questions are directed towards producing new knowledge on the craft of teaching and guiding teachers in improving their practice, not towards the policy question of whether to group or not to group.

A fine example of this kind of study is provided by Carol McDonald Connor and colleagues at Florida State University. The researchers conducted a randomized field trial of software that organizes first grade reading instruction. The algorithm employed by the software considers each child’s entering skill level and progress made during the school year to recommend several dimensions of instruction, including assignment to small, homogeneous ability groups, the amount of time spent on code- versus meaning-focused literacy, and teacher/child-managed versus child-managed delivery. The targets for these recommendations are dynamic; that is, they change in response to periodic assessment of children’s progress. Children in the experimental classrooms gained about two months in reading achievement over those in the control group.

I hope the new generation of researchers will take up more questions like those in the FSU study. The debate over tracking and ability grouping has gone on for nearly a century. Research has not answered the key questions in dispute, at least not to the protagonists’ satisfaction. It’s time for some different questions. How should researchers proceed? A good place to start is reading How Schools Work. It’s just as fresh and illuminating today as when it was published thirty years ago.