Trying to Save the Web’s Shortcuts

From the “Technology Journal” of today’s Wall Street Journal (Wednesday, November 25, 2009, p. B5):

Trying to Save the Web’s Shortcuts

Project Seeks to Preserve Links Behind Fledgling Services That Shrink Internet Addresses

By Andrew LaVallee

The Internet Archive and more than 20 start-ups are banding together to preserve the historical records of the abbreviated Web addresses that are passed around on services such as Twitter.

. . .

The Wayback Machine and More From Brewster Kahle

Really nice two-page spread on Brewster Kahle, “The internet’s librarian,” in this week’s issue of The Economist.

The Economist

March 7th – 13th 2009

Technology Quarterly insert

Brain scan

The internet’s librarian

Brewster Kahle wants to create a free, online collection of human knowledge.  It sounds impossibly idealistic — but he is making progress

It is easy to dismiss Mr. Kahle as an idealist, but he has an impressive record of getting things done.

I have used the Wayback Machine, part of the Internet Archive, to find documents that were no longer available online. And apparently I’m not the only one:

The most famous part of the archive is the Wayback Machine (its name inspired by the WABAC machine in the 50-year-old television cartoon featuring Rocky and Bullwinkle). This online attic of digital memorabilia stores copies of internet sites . . . Paul Courant, the dean of libraries at the University of Michigan, equates what the archive does for the internet with what the British Museum did for the British empire. . . . The Wayback Machine “gives us access to what people were producing at different points in time,” he says.  Evidently this is of more than just academic interest: the site gets 500 page requests per second.

The article also discusses Mr. Kahle’s wider goal:

to build the world’s largest digital library.  He has recruited 135 libraries worldwide to openlibrary.org, the aim of which is to create a catalogue of every book ever published, with links to its full text where available. . . .

The article notes that “this activist for online privacy is also a staunch supporter of openness” and details efforts and litigation Mr. Kahle has been involved with.

Carl Malamud’s campaign and his many Stanford Law School friends

From Washington Internet Daily, “Agencies,” March 02, 2009 Monday, Vol. 10 No. 39:

. . . Carl Malamud, pushing state legislatures to renounce any claimed copyright interests in legal codes and make them freely available as searchable databases (WID June 20 p7), has support from big names in free-culture and open-government circles. They include [SLS professor] Larry Lessig, founder of Creative Commons, tech publisher Tim O’Reilly, Internet Archive founder Brewster Kahle, Electronic Frontier Foundation lawyer [SLS alumnus and lecturer] Fred von Lohmann, Columbia University law professor Tim Wu and University of California at Berkeley law professor Pamela Samuelson. Malamud’s model, described on his campaign site at YesWeScan.org, is Augustus Giegengack. The printer campaigned his way to becoming U.S. Public Printer by getting endorsement letters from Rotary Clubs and hand-delivering them to the Franklin Roosevelt White House. Malamud said the GPO should lead the effort to make all U.S. primary legal materials available online, create more materials for the public domain that can be remixed by users, “reboot” the .gov domain by “installing a cloud” and upgrading its video capabilities, and work more closely with libraries.

Carl is our hero.  And we (as in librarians) are his.  Carl has been a guest speaker at our Advanced Legal Research class and has made many comments about the role of law librarians in liberating legal information, and he spoke at last summer’s AALL meeting in Portland too.

Using distorted words to build digital libraries

REALLY fascinating story in today’s Wall Street Journal about inventor Luis von Ahn and the use of his Captcha (“Completely Automated Public Turing test to tell Computers and Humans Apart”) to help get old books and newspapers online faster and cheaper.

Web-Security Inventor Charts a Squigglier Course

Digitizing Books Is Tied to Revamp of Captcha System

By ETHAN SMITH
August 13, 2008; Page B5

The primary inventor of a Web security technique is putting the system to work in another security scheme dubbed ReCaptcha.  This time he wants users to assist with what he thinks is an important public service: helping get old books and newspapers online as part of digitized libraries.

From the story:

When the ReCaptcha project is fully up and running, this month or in early September, Mr. von Ahn expects it to process about 160 books a day being scanned by the Internet Archive, a San Francisco nonprofit. The Internet Archive has paid employees scanning 1,000 books a day at 70 public and university libraries, mostly in the U.S., from the Library of Congress to the Allen County Public Library, in Fort Wayne, Ind.

. . .

Most of the books can be digitized using typical optical character recognition software. Those that prove troublesome are to be handled by ReCaptcha.

“It’s a really mind-blowing application,” says Internet Archive founder Brewster Kahle.