The Dangerous Delusion of the Big Data Utopia

by Jill Lepore

Jill Lepore, a staff writer at The New Yorker, is a professor of history at Harvard. Her books include These Truths: A History of the United States and If Then: How the Simulmatics Corporation Invented the Future.

One unlikely day during the empty-belly years of the Great Depression, an advertisement appeared in the smeared, smashed-ant font of the New York Times’ classifieds:

WANTED. Five hundred college graduates, male, to perform secretarial work of a pleasing nature. Salary adequate to their position. Five-year contract.

Thousands of desperate, out-of-work bachelors of arts applied; five hundred were hired (“they were mainly plodders, good men, but not brilliant”). They went to work for a mysterious Elon Musk-like millionaire who was devising “a new plan of universal knowledge.” In a remote manor in Pennsylvania, each man read three hundred books a year, after which the books were burned to heat the manor. At the end of five years, the men, having collectively read three-quarters of a million books, were each to receive fifty thousand dollars. But when, one by one, they went to an office in New York City to pick up their paychecks, they would encounter a surgeon ready to remove their brains, stick them in glass jars, and ship them to that spooky manor in Pennsylvania. There, in what had once been the library, the millionaire mad scientist had worked out a plan to wire the jars together and connect the jumble of wires to an electrical apparatus, a radio, and a typewriter. This contraption was called the Cerebral Library.

“Now, suppose I want to know all there is to know about toadstools?” he said, demonstrating his invention. “I spell out the word on this little typewriter in the middle of the table,” and then, abracadabra, the radio croaks out “a thousand word synopsis of the knowledge of the world on toadstools.”

Happily, if I want to learn about mushrooms I don’t have to decapitate five hundred recent college graduates, although, to be fair, neither did that mad millionaire, whose experiment exists only in the pages of the May, 1931, issue of the science-fiction magazine Amazing Stories. Instead, all I’ve got to do is command OpenAI’s ChatGPT, “Write a thousand word synopsis of the knowledge of the world on toadstools.” Abracadabra. Toadstools, also known as mushrooms, are a diverse group of fungi that are found in many different environments around the world, the machine begins, spitting out a brisk little essay in a tidy, pixelated computer-screen font, although I like to imagine that synopsis being rasped out of a big wooden-boxed nineteen-thirties radio in the staticky baritone of a young Orson the-Shadow-knows Welles. While some species are edible and have been used by humans for various purposes, it is important to be cautious and properly identify any toadstools before consuming them due to the risk of poisoning, he’d finish up. Then you’d hear a woman shrieking, the sound of someone choking and falling to the ground, and an orchestral stab. Dah-dee-dum-dum-DUM!

If, nearly a century ago, the cost of pouring the sum total of human knowledge into glass jars was cutting off in their prime hundreds of quite unfortunate if exceptionally well-read young men, what’s the price to humanity of uploading everything anyone has ever known onto a worldwide network of tens of millions or billions of machines and training them to learn from it to produce new knowledge? This cost is much harder to calculate, as are the staggering benefits. Even measuring the size of the stored data is chancy. No one really knows how big the Internet is, but some people say it’s more than a “zettabyte,” which, in case this means anything to you, is a trillion gigabytes or one sextillion bytes. That is a lot of brains in jars.

Forget the zettabyten Internet for a minute. Set aside the glowering glass jars. Instead, imagine that all the world’s knowledge is stored, and organized, in a single vertical Steelcase filing cabinet. Maybe it’s lima-bean green. It’s got four drawers. Each drawer has one of those little paper-card labels, snug in a metal frame, just above the drawer pull. The drawers are labelled, from top to bottom, “Mysteries,” “Facts,” “Numbers,” and “Data.” Mysteries are things only God knows, like what happens when you’re dead. That’s why they’re in the top drawer, closest to Heaven. A long time ago, this drawer used to be crammed full of folders with names like “Why Stars Exist” and “When Life Begins,” but a few centuries ago, during the scientific revolution, a lot of those folders were moved into the next drawer down, “Facts,” which contains files about things humans can prove by way of observation, detection, and experiment. “Numbers,” second from the bottom, holds censuses, polls, tallies, national averages—the measurement of anything that can be counted, ever since the rise of statistics, around the end of the eighteenth century. Near the floor, the drawer marked “Data” holds knowledge that humans can’t know directly but that must be extracted by a computer, or even by an artificial intelligence. It used to be empty, but it started filling up about a century ago, and now it’s so jammed full it’s hard to open.

From the outside, these four drawers look alike, but, inside, they follow different logics. The point of collecting mysteries is salvation; you learn about them by way of revelation; they’re associated with mystification and theocracy; and the discipline people use to study them is theology. The point of collecting facts is to find the truth; you learn about them by way of discernment; they’re associated with secularization and liberalism; and the disciplines you use to study them are law, the humanities, and the natural sciences. The point of collecting numbers in the form of statistics—etymologically, numbers gathered by the state—is the power of public governance; you learn about them by measurement; historically, they’re associated with the rise of the administrative state; and the disciplines you use to study them are the social sciences. The point of feeding data into computers is prediction, which is accomplished by way of pattern detection. The age of data is associated with late capitalism, authoritarianism, techno-utopianism, and a discipline known as data science, which has lately been the top of the top hat, the spit shine on the buckled shoe, the whir of the whizziest Tesla.

Read entire article at The New Yorker