Using distorted words to build digital libraries

REALLY fascinating story in today’s Wall Street Journal about inventor Luis von Ahn and the use of his Captcha (“Completely Automated Public Turing test to tell Computers and Humans Apart”) to help get old books and newspapers online faster and cheaper.


Web-Security Inventor Charts a Squigglier Course

Digitizing Books Is Tied to Revamp Of Captcha System

August 13, 2008; Page B5


The primary inventor of a Web security technique is putting the system to work in another security scheme dubbed ReCaptcha. This time he wants users to assist with what he thinks is an important public service: helping get old books and newspapers online as part of digitized libraries.

From the story:

When the ReCaptcha project is fully up and running, this month or in early September, Mr. von Ahn expects it to process about 160 books a day being scanned by the Internet Archive, a San Francisco nonprofit. The Internet Archive has paid employees scanning 1,000 books a day at 70 public and university libraries, mostly in the U.S., from the Library of Congress to the Allen County Public Library, in Fort Wayne, Ind.

. . .

Most of the books can be digitized using typical optical character recognition software. Those that prove troublesome are to be handled by ReCaptcha.

“It’s a really mind-blowing application,” says Internet Archive founder Brewster Kahle.
