Monday, June 11, 2007

The New York Times

A Dog or a Cat? New Tests to Fool Automated Spammers

On the Internet, nobody knows you’re a human — until you fill out a captcha.

Captchas are the puzzles on many Web sites that present a string of distorted letters and numbers. These are supposed to be easy for people to read and retype, but hard for computer software to figure out.

Most major Internet companies use captchas to keep the automated programs of spammers from infiltrating their sites.

There is only one problem. As online mischief makers design better ways to circumvent or defeat captchas, Web companies are responding by making the puzzles more challenging to solve — even for people.

They are twisting the letters, distorting the backgrounds, adding a confusing kaleidoscope of colors and generally making the puzzles difficult for humans to read.

“They are creating tests that a reasonably healthy adult can’t pass,” said Gordon Weakliem, a programmer and blogger from Denver, who says he failed to correctly discern the captcha code several times last week on the sign-up page for the Windows Live service of Microsoft.

With captchas getting easier for computers and more difficult for real people, several Internet companies, including Microsoft and eBay, are working on replacements.

“You can make a captcha absolutely undefeatable by computers, but at some point, you are turning this from a human reading test into an intelligence test and an acuity test,” said Michael Barrett, the chief information security officer at PayPal, a division of eBay. “We are clearly at the point where captchas have hit diminishing returns.”

If that is true, at least captchas had a good run. Though several researchers devised similar tests early in the decade, credit for inventing the technology usually goes to Carnegie Mellon University, which was asked by Yahoo in 2000 to create a method to prevent rogue programs from invading its chat rooms and e-mail service.

University researchers devised a collection of cognitive puzzles that they knew modern computers could not solve. They called their approach the Completely Automated Public Turing Test to Tell Computers and Humans Apart, or captcha for short. The reference was to the computer scientist Alan Turing, who did research into ways to tell man from machine in the 1950s.

Captchas quickly became popular online and soon expanded into new dimensions. When advocates for the visually impaired complained that some people could not read the puzzles, many sites added audio versions, where a computer voice recites a string of letters and numbers, often over noises in the background.

The emergence of the technology started a wave of research into ways to make computers smart enough to crack the puzzles.

Yet some of that activity can be ethically murky. Aleksey Kolupaev, 25, works for an Internet company in Kiev, Ukraine, and in his spare time, with his friend Juriy Ogijenko, he develops and sells software that can thwart captchas by analyzing the images and separating the letters and numbers from the background noise. They charge $100 to $5,000 a project, depending on the complexity of the puzzle.

Mr. Kolupaev said he had worked both for legitimate companies that want to test their own security and for spammers who seek to infiltrate Web sites.

“Nothing is unbreakable, and each system has its own weakness,” he said. “If you create a program that only recognizes one picture from a hundred, it’s not a problem. You just hit the site 100 times, and you break through.”
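Mr. Kolupaev's arithmetic can be made concrete. The sketch below assumes that every attempt is independent and succeeds with the same fixed probability, which is an illustrative assumption and not something the article establishes; the function names are hypothetical. Under that assumption, 100 attempts at a 1 percent success rate break through about 63 percent of the time, not every time, though on average one attempt in 100 still succeeds.

```python
def chance_of_break(success_rate: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds.

    Assumes each try succeeds with the same probability `success_rate`
    (an illustrative assumption, not a measured property of any captcha).
    """
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Complement of "every single attempt fails".
    return 1.0 - (1.0 - success_rate) ** attempts


def expected_attempts(success_rate: float) -> float:
    """Average number of tries until the first success (geometric distribution)."""
    if not 0.0 < success_rate <= 1.0:
        raise ValueError("success_rate must be in (0, 1]")
    return 1.0 / success_rate


if __name__ == "__main__":
    for n in (100, 500):
        print(f"{n} attempts at 1%: {chance_of_break(0.01, n):.3f}")
    print(f"expected attempts at 1%: {expected_attempts(0.01):.0f}")
```

The point of the quotation survives the correction: because an automated attempt costs almost nothing, even a very low recognition rate is enough for a spammer.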

On his Web site, ocr-research.org.ua, Mr. Kolupaev boasts of cracking the captchas of companies like MySpace and PayPal; the site also ranks the effectiveness of each captcha. He says he believes that his work makes the Internet more secure because companies tend to improve the captchas that he critiques.

Internet companies have responded to these challenges by making their captchas more complex. On YouTube, for example, the letters and numbers in the captcha float on an uneven grid of colors. On the technology news site Slashdot, random squiggly lines slice through the letters and numbers, as if a child had scrawled with a pen on each puzzle.

All these tricks are attempts to disguise the boundaries of the characters, so that software cannot identify the numbers and letters.

But often these measures prove too tough for humans to decipher as well. On Ticketmaster’s site, the characters appear over a grid of diagonal lines that are so thick that they often obscure the puzzle. Jacob Hanson, the chief technology officer of HireVue, an online employment agency in Salt Lake City, estimated that he failed to solve the Ticketmaster captcha about one time in four.

“I can only imagine someone like my mom trying to go through it,” he said.

As a result, the hunt is on for puzzles that are friendlier to humans and more difficult for computers. Many researchers are focusing on expanding the test beyond the constrained realm of 26 letters and 10 digits.

Microsoft researchers have developed an alternative captcha that asks Internet users to view nine images of household pets and then select just the cats or the dogs.

“For software, this is wildly hard,” said John Douceur, a Microsoft researcher. “Computers are tripped up by all the photos at different angles, with variable lighting conditions and backgrounds and the animals in different positions.”

The project, called Asirra (for Animal Species Image Recognition for Restricting Access), uses photographs of animals from Petfinder.com, a site that finds homes for homeless pets and has more than two million images in its database.
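A rough calculation shows why a nine-image test is hard to pass by guessing. The sketch below assumes that a program must label all nine images correctly and that it labels each image independently with the same accuracy; both are assumptions made for illustration, since the article does not describe how Asirra scores a response. The function name is hypothetical.

```python
def pass_probability(per_image_accuracy: float, images: int = 9) -> float:
    """Chance of labeling every image correctly in one challenge.

    Assumes each image is classified independently with the same accuracy
    and that a single mistake fails the whole challenge (both assumptions).
    """
    if not 0.0 <= per_image_accuracy <= 1.0:
        raise ValueError("per_image_accuracy must be between 0 and 1")
    if images < 0:
        raise ValueError("images must be non-negative")
    return per_image_accuracy ** images


if __name__ == "__main__":
    # Coin-flip guessing between "cat" and "dog" on nine images.
    print(f"random guessing: {pass_probability(0.5):.5f}")  # 1 in 512
    # A hypothetical classifier that is right 60% of the time per image.
    print(f"60% classifier:  {pass_probability(0.6):.5f}")
```

Under these assumptions, blind guessing passes once in 512 tries, and even a classifier that is right 60 percent of the time per image passes only about 1 percent of the time.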

Other companies prefer to keep their next-generation captcha research quiet. Mr. Barrett of PayPal will say only that the new breed of captchas might resemble simple image identification puzzles, like asking users to view pictures of a head of lettuce, a tree and a whale — and pick out the vegetable.

“Captchas have gotten as good as they are going to get, and it is likely they are going to be slowly supplanted with a different technology that achieves the same thing,” he said.

He added: “No single defensive technology is forever. If they were, we would all be living in fortified castles with moats.”

Not everyone feels that the traditional captcha is finished. Luis von Ahn, a professor at Carnegie Mellon and a member of the team that invented captchas, recently unveiled an effort to give them new usefulness.

His reCaptcha project (recaptcha.net) seeks to block spam while helping to digitize old books and make them available in Web search engines.

When character recognition software fails to decipher a word scanned in a book — when the page is yellowed or the letters are smudged, for example — Mr. von Ahn’s project makes it part of a captcha. After the mystery word has been verified by several people, it is fed back into the digital copy of the book.
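The verification step can be pictured as a simple vote. The sketch below is one possible way to decide that "several people" agree on a word; it is not the reCaptcha project's actual method, which the article does not describe. The function name and the `min_votes` and `min_share` thresholds are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def agreed_transcription(
    answers: Iterable[str],
    min_votes: int = 3,       # hypothetical threshold for "several people"
    min_share: float = 0.6,   # hypothetical share of answers that must agree
) -> Optional[str]:
    """Return the word most people typed, or None if agreement is too weak."""
    # Normalize so that "Morning " and "morning" count as the same answer.
    normalized = [a.strip().lower() for a in answers if a.strip()]
    if not normalized:
        return None
    word, votes = Counter(normalized).most_common(1)[0]
    if votes >= min_votes and votes / len(normalized) >= min_share:
        return word
    return None


if __name__ == "__main__":
    print(agreed_transcription(["Morning", "morning ", "morning", "mourning"]))
    print(agreed_transcription(["morning", "mourning"]))
```

Requiring both a minimum number of votes and a minimum share keeps one careless or malicious answer from being written back into the book.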

“I heard that 60 million captchas are solved every day around the world, which first made me quite happy for myself but then quite sad,” he said. “It takes about 10 seconds to solve a captcha, so that means humanity is wasting thousands of hours solving them. I wanted to do something good for humanity in that time.”
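Taken at face value, the two figures in the quotation imply a total well above "thousands of hours." The short calculation below uses only the numbers Mr. von Ahn cites, 60 million captchas a day at about 10 seconds each; the function name is hypothetical.

```python
CAPTCHAS_PER_DAY = 60_000_000   # figure quoted in the article
SECONDS_PER_CAPTCHA = 10        # figure quoted in the article
SECONDS_PER_HOUR = 3600


def hours_spent_per_day(captchas_per_day: int, seconds_each: float) -> float:
    """Total human hours spent solving captchas in one day."""
    return captchas_per_day * seconds_each / SECONDS_PER_HOUR


if __name__ == "__main__":
    hours = hours_spent_per_day(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
    print(f"{hours:,.0f} hours per day")
```

That works out to roughly 167,000 hours of human effort each day, if both quoted figures are accurate.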
