Thursday, November 23, 2006


Dead Plagiarists Society
Will Google Book Search uncover long-buried literary crimes?

By Paul Collins
Posted Tuesday, Nov. 21, 2006, at 12:22 PM ET

Amir Aczel knew just whom to blame. "It seems," the science author complained last month in an irate letter to the Washington Post, "that [Charles] Seife has submitted every sentence in my book to a Google search." Days earlier in a Post book review, Seife exposed what appeared to be embarrassing plagiarisms in Aczel's new book, The Artist and the Mathematician. But if Seife's discovery that Aczel lifted text from the Guggenheim Museum's Web site was instructive, so was the assumption behind Aczel's response. For any plagiarist living in an age of search engines, waving a loaded book in front of reviewers has become the literary equivalent of suicide by cop.

As it turns out, even authors not living in this online age are in trouble. My fellow literary sleuth Alex MacBride recently revealed to me that he'd uncovered an old crime in a new way. MacBride, a linguist employed by Google, idly ran a phrase from England Howlett's 1899 essay Sacrificial Foundations through Google Book Search, his employer's massive digitization of millions of volumes from university libraries. The search had nothing to do with his job—like the rest of us, sometimes Alex just kills time by plugging stuff into Google—and rather than go to the trouble of digging out Howlett's book by name, he'd decided to call it up with a phrase. To his surprise, he got more back than just Howlett: The search also revealed a suspiciously similar passage in Sabine Baring-Gould's 1892 book Strange Survivals. A lot of suspiciously similar passages.

Perhaps it's not too shocking that a small-time amateur like Howlett swiped from Baring-Gould, a frenetically prolific folklore scholar who published hundreds of books and articles. But, the search results revealed, this was not quite the end of the story. "Charmingly," MacBride e-mails, "Baring-Gould seems to have had sticky fingers himself." The wronged author, you see, had in turn used the unattributed quotation from a still earlier work: Benjamin Thorpe's 1851 study Northern Mythology.

We're talking about forgotten writers here: I don't think there will be too many England Howlett fan clubs grappling with disillusionment today. But MacBride's discovery is the first rumble in what may become a literary earthquake. Given the popularity of plagiarism-seeking software services for academics, it may be only a matter of time before some enterprising scholar yokes Google Book Search and plagiarism-detection software together into a massive literary dragnet, scooping out hundreds of years' worth of plagiarists—giants and forgotten hacks alike—who have all escaped detection until now.
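For the curious, the heart of such a dragnet is simple enough to sketch. What follows is only an illustration under loud assumptions: the function names `shingles` and `shared_runs` are made up for this example, the two sample passages are invented rather than quoted from any real book, and a working service would need a full-text index in place of two short strings. It simply lists every run of six consecutive words that two texts have in common.

```python
import re

def shingles(text, n=6):
    """Return the set of all runs of n consecutive words in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def shared_runs(a, b, n=6):
    """Return the n-word runs that appear in both texts, in sorted order."""
    return sorted(shingles(a, n) & shingles(b, n))

# Invented sample passages (hypothetical, not taken from any real book).
earlier = ("The old mason swore that the wall would never stand "
           "until a living thing was buried beneath it.")
later = ("It was said that the wall would never stand "
         "until a living thing was laid under the stone.")

for run in shared_runs(earlier, later):
    print(run)
```

The window of six words is an arbitrary choice for the sketch: shorter windows catch everyday phrases that everyone uses, while longer ones miss a borrowing that has been lightly reworded.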

But wait, you might ask, don't people accidentally repeat each other's sentences all the time? It seems to me that this should not be unusual. Yet try plugging that last sentence word by word into Google Book Search, and watch what happens.

It: Rejected—too many hits to count
It seems: 11,160,000 matches
It seems to: 3,050,000
It seems to me: 1,580,000
It seems to me that: 844,000
It seems to me that this: 29,700
It seems to me that this should: 237
It seems to me that this should not: 20
It seems to me that this should not be: 9
It seems to me that this should not be unusual: 0

It seems to me that this should not be unusual is itself ... unusual.

Google Book Search contains hundreds of millions of printed pages, and yet after just a few words, the likelihood of the sentence's replication scales down dramatically. And even before our sentence implodes into utter improbability, there's another telling phenomenon at work. The nine books that contain the penultimate It seems to me that this should not be are from a grab bag of subjects: a 2001 study of Freud, an 1874 collection of Methodist camp sermons, minutes from a 1973 hearing of the Senate subcommittee on transportation. So, if replicating the same sentence alone is suspicious behavior, then to also replicate it on the same subject warrants dialing 911.
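The word-by-word experiment above can be imitated on a small scale. This is a rough sketch, not a description of how Google Book Search counts matches: the helper names `tokens` and `prefix_counts` are hypothetical, and the four-line "library" is invented for the example. For each successively longer opening of a sentence, it counts how many texts contain that exact run of words.

```python
import re

def tokens(text):
    """Split text into lowercase words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def prefix_counts(sentence, corpus):
    """For each growing prefix of sentence, count the texts that contain it."""
    words = tokens(sentence)
    # Pad with spaces so a prefix only matches on whole-word boundaries.
    padded = [" " + " ".join(tokens(doc)) + " " for doc in corpus]
    results = []
    for k in range(1, len(words) + 1):
        prefix = " ".join(words[:k])
        hits = sum(1 for doc in padded if " " + prefix + " " in doc)
        results.append((prefix, hits))
    return results

# Invented miniature "library" (an assumption made for illustration only).
corpus = [
    "It seems to me that this should not be allowed.",
    "It seems to me that the weather is turning.",
    "It seems unlikely.",
    "Nothing to report.",
]

for prefix, hits in prefix_counts("It seems to me that this should not be unusual", corpus):
    print(f"{prefix}: {hits}")
```

Because every text that contains a longer prefix must also contain the shorter one, the counts can only hold steady or fall as words are added, which is the same narrowing seen in the search results.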

Conveniently enough, a few literary greats have already had their mug shots taken. It's long been known that Poe plagiarized his first book, a hack project titled The Conchologist's First Book, and that Herman Melville swiped many technical passages of Moby Dick whole from maritime authors like Henry Cheever. Even more inventively, Laurence Sterne's immortal diatribe against plagiarism in Tristram Shandy was itself ... plagiarized from Robert Burton's Anatomy of Melancholy. There has always been a dizzying array of ways that authors can rip each other off, even in reverse: Literary critic Terry Eagleton has written entertainingly of "anti-plagiarism," a 19th-century literary wheeze favored by Irish critics, who pounced on poets or novelists for plagiarizing or surreptitiously translating some little-known domestic or foreign work and presenting it under their name. The trick was that the "original" work presented by the prosecuting critic was itself a forgery, written after a new work's publication to frame an enemy.

The most intriguing result of a digital dragnet would be if any deeply idiosyncratic last-person-you'd-guess authors get fingered—Emily Dickinson, anyone? Ben Franklin, perhaps? I'd bet that in the next decade at least one major literary work gets busted. Such thefts don't necessarily end a literary reputation: After all, what Melville did with ordinary maritime literature amounted to an act of lead-to-gold alchemy. But it's invigorating to think that some forgotten authors, long buried and with the dirt tamped down over them by their ruthless rivals, will now get their due. Plagiarism, it seems, will out.

Paul Collins teaches nonfiction at Portland State University. His latest book is The Trouble With Tom: The Strange Afterlife and Times of Thomas Paine.