World eBook Fair

The World eBook Fair runs July 4-August 4, 2011.

The fair’s aim is to provide free public access for a month to some 6.5 million eBooks.

Project Gutenberg and the Internet Archive are both contributing organizations.

Each will be presenting a number of items in other media during 2011 — such as music, movies, artwork, and dance choreography.

The available collections include reference books and scientific items, as well as approximately 50,000 music entries (on top of 12,000 that debuted last year).

All are welcome to join the World Public Library for an annual membership fee of US$8.95.

Members can download from a selection of about 2 million PDF eBooks.

Hat tip to ResourceShelf.com.

See also: World eBook Fair – 6.5 million ebooks available through August 4th

Cross-posted on Law Library Blog.

Group of Libraries Launch eBook Lending Program

The Internet Archive and a group of 150 libraries have recently announced development of a collection of over 80,000 eBooks (of mostly 20th century titles) that will extend traditional library lending.

Please see:

In-Library eBook Lending Program Launched

Hat tip to ResourceShelf.com.

Cross-posted on Law Library Blog.

More than One Document a Minute

The headline from the Internet Archive posting reads: “Millions of documents from over 350k federal court cases now freely available.”

The millions of documents are all from PACER by way of the RECAP plugin.

As the posting states:

“RECAP is a Firefox Internet browser extension that allows users of the PACER to get free copies of documents they would normally pay for when the Archive has a copy, and if it is not available to then automatically donate the documents after they purchase them from PACER for future users. Therefore the repository on the Internet Archive grows as people use the PACER system with this plug-in. We are currently getting more than one document a minute and some large holdings are being uploaded. We hope that the government will eventually put all of these documents in an open archive, but until then this repository will grow with use.”

Wow.  Growing faster than one document a minute!  (Right now: stop what you are doing and check to see if you have the RECAP plugin installed on your machine — every little bit helps.)
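For readers curious about the mechanics, the behavior the posting describes (use the Archive's copy when one exists, otherwise buy from PACER and donate the result) can be sketched as a simple check-then-donate lookup. This is only an illustration under stated assumptions: `SharedArchive`, `fetch_document`, and the `purchase` callback are hypothetical names, not RECAP's actual code, and the repository is modeled as an in-memory dictionary.

```python
class SharedArchive:
    """Hypothetical stand-in for the public repository (in memory only)."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return the stored copy, or None if nobody has donated it yet.
        return self._docs.get(doc_id)

    def donate(self, doc_id, content):
        # Keep the first donated copy; later donations do not overwrite it.
        self._docs.setdefault(doc_id, content)


def fetch_document(doc_id, archive, purchase):
    """Return (content, paid): free copy if archived, else buy and donate."""
    cached = archive.get(doc_id)
    if cached is not None:
        return cached, False          # free copy, no purchase needed
    content = purchase(doc_id)        # assumed callback that buys the document
    archive.donate(doc_id, content)   # share it for future users
    return content, True
```

Under this model, only the first person to request a given document pays for it; everyone after that gets the donated copy, which is why the repository grows with use.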

To visit this collection and search the content, go to www.archive.org/details/usfederalcourts.  There you will be able to browse by date (the other browsing features aren’t operational).  You can also do an Advanced Search on the Internet Archive and keyword search through all the available materials by limiting to the Collection Type = usfederalcourts.   VERY COOL.
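If you would rather script the keyword search than use the Advanced Search form, a search URL can be built by hand. The sketch below is a minimal example, assuming the Internet Archive's `advancedsearch.php` endpoint with its `q`, `fl[]`, `rows`, and `output` parameters and a `collection:` field in the query; `build_search_url` and the sample keywords are hypothetical. It only constructs the URL and does not contact the server.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint for the Internet Archive's advanced search.
ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(keywords, collection="usfederalcourts", rows=10):
    """Build a keyword search URL limited to one collection."""
    query = f"collection:{collection} AND ({keywords})"
    params = [
        ("q", query),
        ("fl[]", "identifier"),   # fields to return for each hit
        ("fl[]", "title"),
        ("rows", str(rows)),      # number of results per page
        ("output", "json"),
    ]
    return f"{ADVANCED_SEARCH}?{urlencode(params)}"


url = build_search_url("patent infringement")
print(url)

# Decode the query string again to confirm what will be sent.
decoded = parse_qs(urlsplit(url).query)
print(decoded["q"][0])
```

Pasting the printed URL into a browser should return the matching items, provided the endpoint and field names behave as assumed here.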

And, might I add: FREE!

I checked with the good folks who created RECAP at Princeton University’s Center for Information Technology Policy, and they said that, for now, the PACER dockets in the RECAP/Internet Archive collection (specifically: just the high-level case metadata) are indexed and can be searched by the likes of Google, but the underlying dockets, documents and briefs are still hidden from the search robots because of privacy concerns.

Trying to Save the Web’s Shortcuts

From the “Technology Journal” of today’s Wall Street Journal (Wednesday, November 25, 2009, p. B5):

Trying to Save the Web’s Shortcuts

Project Seeks to Preserve Links Behind Fledgling Services That Shrink Internet Addresses

By Andrew LaVallee

The Internet Archive and more than 20 start-ups are banding together to preserve the historical records of the abbreviated Web addresses that are passed around on services such as Twitter.

. . .

Best Evidence and the Wayback Machine: A Workable Authentication Standard for Archived Internet Evidence

“Note, Best Evidence and the Wayback Machine: A Workable Authentication Standard for Archived Internet Evidence”

Fordham Law Review, Forthcoming

DEBORAH R. ELTGROTH, Fordham University – Fordham Law Review

This Note addresses the use of archived Internet content obtained via the Wayback Machine, a service provided by the Internet Archive that accesses the largest online digital collection of archived Web pages in the world. Given the dynamic nature of the World Wide Web, Internet content is constantly changed, amended, and removed. As a result, interim versions of Web pages have limited life spans. The Internet Archive indexes and stores Web pages to allow researchers to access discarded or since-altered versions. In the legal profession, archived Web pages have become an increasingly helpful form of proof. Intellectual property enforcers have recognized the value of the Internet Archive as a tool for tracking down infringers, but evidence from the Internet Archive has rarely been admitted at trial. This Note surveys the handful of judicial opinions and orders that comment on the admission of Internet Archive evidence and explores the conflict underlying these approaches. As an alternative to the courses they have taken, this Note urges courts to treat the introduction of archived Web pages as implicating a best evidence issue in addition to an authentication question. Under this approach, courts would decide using evidence sufficient to the purpose, but not necessarily admissible at trial, whether the archived page qualifies as a ‘duplicate’ of a page that once appeared on the Web. Beyond that, courts would apply authentication standards already developed to decide whether a reasonable jury could find, based only on admissible evidence, whether proffered evidence accurately represents the page stored on the Internet Archive server and, if necessary, whether the original page accurately represented material placed on the originating site by the site’s owner or operator. With this additional step, reliable evidence from the Wayback Machine can become as easily admitted as any other Internet-derived proof.

Source:  LSN Intellectual Property Law Vol. 2 No. 109,  09/30/2009

Using distorted words to build digital libraries

REALLY fascinating story in today’s Wall Street Journal about inventor Luis von Ahn and the use of his Captcha (“Completely Automated Public Turing test to tell Computers and Humans Apart”) to help get old books and newspapers online faster and cheaper.

Web-Security Inventor Charts a Squigglier Course

Digitizing Books Is Tied to Revamp of Captcha System

By ETHAN SMITH
August 13, 2008; Page B5

The primary inventor of a Web security technique is putting the system to work in another security scheme dubbed ReCaptcha. This time he wants users to assist with what he thinks is an important public service: helping get old books and newspapers online as part of digitized libraries.

From the story:

When the ReCaptcha project is fully up and running, this month or in early September, Mr. von Ahn expects it to process about 160 books a day being scanned by the Internet Archive, a San Francisco nonprofit. The Internet Archive has paid employees scanning 1,000 books a day at 70 public and university libraries, mostly in the U.S., from the Library of Congress to the Allen County Public Library, in Fort Wayne, Ind.

. . .

Most of the books can be digitized using typical optical character recognition software. Those that prove troublesome are to be handled by ReCaptcha.

“It’s a really mind-blowing application,” says Internet Archive founder Brewster Kahle.