With Its Eugenics Roots, Does Standardized Testing Have Any Place in Education Today?

by Deborah Blum

Deborah Blum is the director of the Knight Science Journalism Fellowship Program at MIT and the publisher of Undark.

Back in the year 2000, sitting in his small home office in California’s Mill Valley, surrounded by stacks of spreadsheets, Jay Rosner hit one of those dizzying moments of dismay. An attorney and the executive director of The Princeton Review Foundation, the philanthropic arm of the private test-preparation and tutoring company, The Princeton Review, Rosner was scheduled to give testimony in a highly charged affirmative action lawsuit against the University of Michigan. He knew the case, Grutter v. Bollinger, was eventually headed to the U.S. Supreme Court, but as he reviewed the paperwork, he discovered a daunting gap in his argument.

Rosner had been asked to explore potential racial and cultural biases baked into standardized testing. He believed such biases, which critics had been surfacing for years prior, were real, but in that moment, he felt himself coming up short. “I suddenly realized that I would be deposed on this issue,” he recalled, “and I had no data to support my hypothesis, only deductive reasoning.”

The punch of that realization still resonates. Rosner is the kind of guy who really likes data to stand behind his points, and he recalls an anxiety-infused hunt for some solid facts. Rosner was testifying about an entrance exam for law school, the LSAT, for which he could find no particulars. But he knew that a colleague had data on how students of different racial backgrounds answered specific questions on another powerful standardized test: the SAT, long used to help decide undergraduate admission to colleges. The data came from SATs given in New York state. He decided he could use that information to make a case by analogy. The two scholars agreed to crunch some numbers.

Based on the history of test results, he knew that White students would overall have higher scores than Black students. Still, Rosner expected Black students to perform better on some questions. To his shock, he found no trace of such balance. The results were “incredibly uniform,” he said, skewing almost entirely in favor of White students. “Every single question except one in the New York state data on four SATs favored Whites over Blacks,” Rosner recalled.

There was something going on here, he thought: not with the students, but with the test.

Troubled and curious, Rosner then acquired SAT test data not just for New York, but for the entire United States, from two tests — one conducted in 1998 and another in 2000. The new data sets had information that could help him decipher how questions were chosen for use in the tests.

In making that inquiry, Rosner knew that all of the questions that contributed to a student’s final score had passed the SAT’s “pre-testing process,” meaning they had appeared in experimental sections of previous exams where they did not count. Instead, pre-test questions serve as trial runs: new questions that the makers of the SAT are considering adding to the official test in future updates, depending on data gathered from real-world exams. (Pre-test questions are routinely inserted into SATs, and students do not know which questions are being pre-tested.) Using racial and gender data gathered from those real-world exams, Rosner then sought to infer whether there was an internal preference for pre-tested questions on which one racial group outperformed another.

Of 276 math and verbal questions that passed pre-testing and ended up in the official tests, Rosner found that White students outperformed Black students on every one. That outcome struck him as statistically impossible — unless the pre-test questions that White students excelled at were disproportionately making it into the final tests. While he had only limited data on pre-test questions themselves, it seemed obvious to Rosner that a selection bias was at work, with pre-test questions that Black students excelled at — which he called “Black questions” — being left on the cutting room floor. “It appears that none ever make it onto a scored section of the SAT,” Rosner wrote in a 2012 book chapter on the topic. “Black students may encounter Black questions, but only on unscored sections of the SAT.”

The reason, Rosner suggests, isn’t to intentionally give one group an advantage — though that’s the outcome just the same. “Each individual SAT question ETS chooses is required to parallel the outcomes of the test overall,” he continued in the book. “So, if high-scoring test-takers — who are more likely to be White (and male, and wealthy) — tend to answer the question correctly in pre-testing, it’s a worthy SAT question; if not, it’s thrown out. Race and ethnicity are not considered explicitly, but racially disparate scores drive question selection, which in turn reproduces racially disparate test results in an internally reinforcing cycle.”

Even today, Rosner describes his reaction to the disparity in a single word: “stunned.”

Read entire article at Undark