Wrong "Correct" Answers: The Scourge of the NAEP





Paul Burke is a statistician and President of NumbersInstitute.com. He has measured textbooks' effectiveness, published statistical posters on history, and taught in West Africa.



The National Assessment of Educational Progress (NAEP) has three big flaws in measuring history knowledge.  First, students get neither a grade nor any other result from the test, so their effort is entirely voluntary.  Second, NAEP seeks proficiency on so many topics that it encourages shallowness, not depth.  Third, NAEP marks many correct answers wrong.

For example, a NAEP question on Lincoln showed fourth graders a picture and asked:

Identify the President in the picture above and give two reasons why he is important in American history.

The name of the President is:

Two reasons he is important in American history (other than being President) are:

The scoring guide says:

Credited responses could include:
He stopped slavery/freed the slaves.
He led the Union to defeat the Confederate States.
He gave the Gettysburg Address.

94% identified Lincoln in the picture.  That's a start.

NAEP said only 9% of students gave two correct reasons why Lincoln was important.  We may agree that history knowledge in the middle of fourth grade is incomplete, but 9% seems extreme and suggests faulty scoring.  Fortunately, the NAEP website gives examples of answers it marked right and wrong.

NAEP marked wrong "I think he tryed [sic] to help slaves escape."  Many would say that was indeed one purpose of the Emancipation Proclamation:  to tell slaves in the Confederacy (he did not initially free Northern slaves) that they would be free if they could reach Union lines.  They no longer had to reach Canada.

NAEP also marked wrong "He fought in some wars," because the answer does not mention the Civil War.  Lincoln's service in the Black Hawk War was considered important at the time, and as commander in chief he arguably fought in the Civil War as much as some desk-bound generals and admirals.  So the answer seems right, and about as specific as you'd expect in fourth grade.

NAEP also marked wrong "He helped the Congress grow enormously."  While Congress did not grow in size, the war, tariffs, the Homestead Act, the National Banking Act, the Pacific Railway Act and the Department of Agriculture did add enormously to federal and congressional power.  If a student said Lincoln helped the presidency grow enormously, that should certainly be marked right, and Congress grew right along with the presidency.  The student may have meant the federal government in general, not just Congress, but either way, this is a pretty good answer for a fourth grader on a voluntary test.  If some fourth graders know that federal power grew during the war, I despair that NAEP marks them wrong.  NAEP cannot carp about precise wording when it gives full credit for "He stopped slavery," which is only partly true.

I have to assume NAEP chose to display these specific wrong answers because it considered them among the clearest examples of wrong answers.  If so, thousands of other "wrong" answers must have been at least as right.

The "Credited responses" above also omit the traditional idea that Lincoln caused the Civil War by refusing to recognize secession, as well as the facts that he was assassinated, suspended habeas corpus, grew up poor, and kept his pledge to serve only one term in Congress.  Were these marked wrong too?

NAEP asked high school seniors a question on Korea:

During the Korean War, United Nations forces made up largely of troops from the United States and South Korea fought against troops from North Korea and

A.   the Soviet Union
B.   Japan
C.   China
D.   Vietnam

The correct answer is C.

Various sources say Soviet aviators fought in the war, shooting down American aircraft and killing American pilots.  The word "troops" includes any individual in any branch of the military.  I do not know how many students are aware of the Soviet aviators, but they know the Soviets of that era would not have sat idly by.  Modern students also know that in a big war, people come from many countries to participate.

While China is not the only right answer, I wish more students had picked it.  Undoubtedly the Korean War gets little classroom time: it was long ago, and we have fought many wars since.  Our history with China deserves more emphasis, from the clipper ship trade to the present.

NAEP asked twelfth graders a series of three questions about a quotation from 1954.  On the first question, 82% correctly named Brown v. Board of Education.  Only 2% were marked correct on the third question:

Based on the quotation and your knowledge of history, describe the conditions that this 1954 decision was designed to correct.  Be as specific as possible in your answer.

Students were marked wrong for these answers:  "The Brown girl had to walk past the white school every day to get to her 'equal' school.  Her father took the issue to court - Separate but equal is not really equal."  "It was designed to correct segregation in schools."  And "racism in our public schools."  Most people outside NAEP would say these three answers do indeed describe the conditions in 1954.

But NAEP cavils:  "response mentions that having separate schools for African American and White students is supposed to be equal, but that it is not really equal.  However, it does not say that the Brown case was meant to desegregate schools specifically, making this a partial explanation and not a complete explanation.  The second response says that the Brown case was 'designed to correct segregation in schools,' but it does not directly explain what that correction would be.  Their reference to schools and segregation indicate a partial understanding of the question."

In other words, NAEP wants students not only to identify the conditions but also to explain how the case would fix them.  First, the question asked only about conditions, not fixes.  Second, most people understand that when separate schools are found to be wrong, desegregation is the obvious fix.  Third, the 1954 decision did not include any fix; that came in the 1955 decision.  Students do not need to know this, but experts at NAEP should not confuse the two years.  In a sense, NAEP rejects correct answers about segregation and demands wrong answers about fixes.

NAEP, the state education agencies that review its questions, and the newspapers that blindly quote its results all show their own lack of historical knowledge.  In an editorial, The Washington Post lamented the 2% figure on student knowledge of conditions in 1954, and it put the Korea question on its website for public use.  The New York Times put the Korea and Lincoln questions in its opening paragraph.  Do the Post and Times have no military or education experts to check facts?

NAEP, the agencies, and the newspapers all have an incentive to do their best, since their credibility is at stake; yet they fail.  Their reports also ignore Wineburg's point that questions everyone can answer are edited out of the test to save time, so no one reports what everyone does know.  Researchers in other fields, such as health, labor and housing, leave some of those questions in for completeness.

For students, these tests do not count toward a grade, or even a star, so while students have to show up, they have no incentive to give their best answers, or indeed to make any effort whatsoever.  The tests are given in late winter, often a time of low energy and motivation in school anyway, which makes them one more intrusion in the most dismal part of the school year.  It would not be surprising for students to put little energy into their answers, or even to give wrong answers as a protest.

Imagine the improvement if a sample of 200 students received $10 for each right answer.  On a 30-item test the extra cost would be at most $60,000, and we would know a lot more about what students can do.  Clearly, someone other than NAEP would need to mark the results: perhaps a parent group, which would want to find real issues, not justify a federal role by pulling defeat from the jaws of victory.
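The cost estimate above is easy to check.  Here is a minimal sketch in Python, assuming only the figures in the proposal (200 students, 30 items, $10 per right answer); the function name `incentive_cost` and its `right_fraction` parameter are hypothetical, added to show that the full $60,000 is reached only if every student answers every item correctly.

```python
def incentive_cost(students: int, items: int, pay_per_right: float,
                   right_fraction: float) -> float:
    """Total payout when each student answers `right_fraction` of the items correctly.

    Hypothetical helper for illustration only; the inputs come from the
    proposal in the text, not from any NAEP budget.
    """
    if not 0.0 <= right_fraction <= 1.0:
        raise ValueError("right_fraction must be between 0 and 1")
    return students * items * right_fraction * pay_per_right


# Ceiling: all 200 students get all 30 items right, at $10 per right answer.
ceiling = incentive_cost(200, 30, 10, 1.0)
# Illustrative case, assuming (hypothetically) that half the answers are right.
half_right = incentive_cost(200, 30, 10, 0.5)

print(ceiling)     # 60000.0
print(half_right)  # 30000.0
```

Even that ceiling is under a tenth of one percent of a $130 million annual budget.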

Another approach is to release results for sample questions from the ACT and SAT, where students applying to college already try their best.  Those results cover only college-bound students, but they are real data on an important population, though the test makers may not welcome such scrutiny of their questions.

It is not sensible to rely on, or spend $130 million per year on, a test where students have no stake in doing well.

The Lincoln, Korea and 1954 questions are just three from a vast field.  NAEP tries to measure whether everyone knows a smattering of everything.  We need more measurement of depth.

Many years ago I taught in Ghana, where exams let students choose a few questions, for example answering 5 of 16 essay questions on West African history.  This approach let teachers and students specialize instead of having shallow knowledge of everything.

Colleges could do cluster analysis of tests like the ACT and SAT to see how many students have specialized in certain topics.  If some students throughout the country have deep knowledge of our history with China, and others specialize in segregation, then public discussion of these issues will be informed.  ACT and SAT could even report scores for depth.  Measuring depth will encourage schools to provide it.

And finally, lest historians think NAEP's trouble with right and wrong answers is unique to subjective fields like history, rest assured that math has the same trouble.  NAEP asks students to explain in words why one of three figures has the longest perimeter, and rejects "Because the sides of P are longer than those of N and Q."  Over 20 years ago, NAEP was ignoring negative square roots, misunderstanding interest on loans, and extending a graph beyond the range of its data.
