Luther Spoehr: Review of William J. Reese's “Testing Wars in the Public Schools: A Forgotten History” (Harvard University Press, 2013)

Luther Spoehr, an HNN book editor and senior lecturer at Brown University, teaches the history of American school reform.

When the examiners for the Boston School Committee visited the city’s public schools at the end of the 1845 school year, they brought along a surprise. Instead of the standard public questioning, recitation, and exhibitions, students were to take written examinations. This development, says William Reese, a distinguished historian of education at the University of Wisconsin-Madison, was as revolutionary as it was unexpected. Linked as it was to other reforms advocated by Horace Mann, Secretary of the Massachusetts Board of Education, and his allies in Boston, it was instrumental in establishing the template for public schooling across the country that still exists today.

Mann was, in Reese’s words, “a pivotal if shadowy figure” in this particular revolution, and Reese spends more time on Mann’s Boston allies, including Mayor Josiah Quincy, Jr. (who appointed the committees), Samuel Gridley Howe, and other Whigs. Despite the schools’ reputation as “the pride and glory of the city,” and despite not really commanding a majority on the School Committee itself, they wanted to improve what they saw as shoddy practice and dubious governance in the schools. They sought a superintendent responsible for all public schools; an accountable principal for each school (rather than two “masters” per school); the abolition of corporal punishment; and more professional teachers, preferably women, trained in “normal schools.” “Reformers,” says Reese, “believed that competitive testing, enlightened pedagogy, and humane forms of discipline were fully compatible and mutually reinforcing.”

In the decades before 1845, closing exercises had offered more ritual than rigor. Students undergoing public examinations were given questions in advance and knew which ones they would have to answer. Recitations and exhibitions were calculated to please parents and the public, but were not expected to assess all students’ competence. The new written exams and the analysis of them that followed changed all that. Samuel Gridley Howe’s report on reading was “based on the test scores of 530 pupils, all in the top division of the first classes.” It showed how students did on each question and provided a “cumulative ranking of the schools.” (The best results came from a Roxbury school for girls.) Showing a taste for the emerging field of statistics, the committees announced that, in all, 31,159 answers were graded. “Despite grading leniently, the examiners still identified 2,801 grammatical and 3,733 spelling errors and 35,947 mistakes in punctuation. The cumulative average score was 30 percent.”

Publication of the committees’ reports caused an uproar, made louder by the fact that the schools had been given no advance notice (try THAT today!) and by newspapers’ love of controversy. Incumbent masters roused themselves to protect their positions; the public was not easily persuaded that the schools were inadequate and predictably resisted the possibility that it would be necessary to spend more tax money on them. The reformers insisted that their findings were valid, based on the most scientific approaches available to them, especially the nascent field of statistics, and even phrenology (late in the 19th century, after the dust had settled, Mann was still being celebrated for his “magnificent forehead”).

Viewed from today’s perspective, the tests and subsequent analysis seem rudimentary indeed. Reese notes that while some questions “sought to show which classes best learned to think, analyze, comprehend and understand,” most were “short answer and rote,” based on textbooks that had often been in use for many years. Overall, they ranged “from basic to trivial to enigmatic” and rewarded memorization more than anything else. Students were asked to “Name the principal lakes of North America” (84 percent got that one right), but only seven percent could identify “the rivers, gulfs, oceans, seas and straits, through which a vessel must pass in going from Pittsburgh in Pennsylvania, to Vienna in Austria.” My personal favorite: “On which bank of the Ohio is Cincinnati, on the right or left?” Like Reese, I wonder how many students thought to themselves, “It depends on where you are standing.”

The reformers’ statistics were no more refined. “A product of their time, examiners could know nothing about appropriate procedures for sampling or how to strive for test validity and reliability.” Although they identified schools as either single-sex or coeducational, “the examiners seemed not to notice that girls often outperformed the boys,” nor did they take into account the possible significance of age differences among the test-takers.

The ongoing debate over the meaning and importance of these “high-stakes” tests has a distinctly modern ring. Dissenters charged that they would narrow the curriculum and actually make teachers rely even more on memorization and rote learning. Reese agrees: “In the 1840s, although Boston’s reformers had championed written tests to expose overreliance on rote teaching and textbooks, the new exam system only aggravated the problem.”

In the second half of the 19th century, “testing spread from city to city, starting on the East Coast, then traveling near and far.” Unlike today, when federal and state requirements, including those based in No Child Left Behind, drive the testing bus, this expansion was primarily a matter of system-by-system emulation. Technological change helped make it possible: “In the 1850s and 1860s, superintendents often pressured committeemen to furnish schools with more blackboards, slates, pencils and paper. Pens became more common later in the century.” And driving it all was an emerging cultural constant, “a rising national faith in mathematical precision [which] undergirded everything from standard gauge railroad tracks to properly graded bushels of wheat to factory-made consumer goods.” That faith, of course, is with us yet (even though it no longer extends to phrenology).

Interestingly, however, using a single written examination to determine whether a student passed or failed a grade proved too controversial even for Boston. But even as that practice faded, written exams replaced other forms of evaluation in virtually every other venue, from individual courses to high school entrance. Bostonians complained in the 1880s about “incessant testing”; elsewhere it was lamented as an “annual torture.” In 1888, a Syracuse high school teacher described what was by then a familiar “examination room” scene: “Youthful faces are screwed into all sorts of hard knots; hair is made to stand on end, presumably for a free passage of ideas; heads are held together as if to prevent them from bursting….[Some students] are looking furtively around as if to discover whether the coast is clear for examining certain formulae inscribed in microscopic characters on cuffs, fingernails, and pinafores. Some are eating pencil tops.” Sound familiar?

Certainly many of the arguments for and against testing have a contemporary ring. It’s not hard to hear the voice of Ted Sizer in Francis W. Parker’s objections to standardized testing (and Sizer, of course, named a school after Parker). But one of the strengths of Reese’s admirably researched and clearly written (if sometimes repetitive) book is his ability both to capture those apparently timeless arguments and to place the earlier controversies in proper historical context. As in the famous Sherlock Holmes mystery, the dogs that don’t bark are significant: the federal and state governments are largely absent as players here; reform wasn’t driven by an obsession with foreign competition, nor by modern understandings in psychology and cognitive science; the bureaucratic systems being put into place were so embryonic even at the end of the 19th century that their shape in the early 21st century could hardly have been anticipated.

Mark Twain is reputed to have said, “History does not repeat itself, but it does rhyme.” As Reese rightly concludes, “Knowledge about the fascinating and long-forgotten history of testing seems indispensable to anyone who seeks humane, informed, and sensible policies to improve the lives of students and teachers.” His engaging book is valuable both as history and as cautionary tale.