Sunday, February 21, 2010

How Google Books Does It

[x YouTube/RisTech Channel]
Robotic Book Scanning

Among all of the gazillions of books that Google Books has digitized, this blogger has a favorite. Use the searchable table of contents and click on "A Lone Star Renaissance" and see what you get. Wow, what a rush! If this is (fair & balanced) égoïsme, so be it.

[x The Texas Observer]
Scan And Deliver
By Michael May

When Corpus Christi author Peggy Cleaves heard several years ago that Google planned to scan every book ever published for its own use, she remembers becoming hysterical and paranoid. “You grow up thinking, ‘This is my book. How can Google just take the rights?’ It was hard to deal with.”

Cleaves has a lot at stake. She has written 60 novels, the majority for Harlequin, under the name Ann Major, mostly in a genre she defines as “South Texas ranch romance.” Ten titles are out of print—which means Google could end up with almost exclusive control over those books. From the beginning, Google gave publishers and authors the right to opt out of the project. For Cleaves, doing so could mean losing millions of readers who might come across her books while searching keywords like “tan, muscled rancher” or “thirty days of passion.” Cleaves wasn’t sure whether to opt in or out. “I called my publisher,” she says. “And she had no advice at all. It was scary.”

Judging by their actions, many publishers were equally conflicted. Many cut deals with Google to advertise and sell their books on the Web. Some of the same publishers — along with authors and their representatives, like the Authors Guild — filed a class-action lawsuit against the company for copyright infringement in 2005.

On February 18, the sides are expected to settle the case.

The proposed settlement will let Google mine most of the books ever written in English for its search engine. Authors like Cleaves could see their out-of-print books get a second life. But the question remains: Is giving a private corporation, no matter how hard it tries not to “be evil,” control over tens of millions of books a good idea?

Google’s mission is to “organize the world’s information and make it universally accessible and useful.” It’s hard to imagine how to meet that goal without liberating books from their dusty lairs. Let’s face it: After more than a decade of use by millions of contributors, the best the Web’s given us is Wikipedia, a godsend for resolving drunken dinner-party debates, but something quite short of a real library. The reality is that it can be very hard to get accurate, comprehensive information on the Internet. And that fact hurts Google every time someone does a search and comes away frustrated. The company needed better text. It was inevitable: Google had to gobble some books.

So Google went to large university libraries (places with 5 million to 7 million books each) and made them an offer they couldn’t refuse. The University of Texas at Austin was among the first to sign on. The school had been trying to digitize the books in the Benson Latin American Collection for years before Google came calling. The collection, with almost a million books, is the largest of its kind in the world. Dennis Dylan, UT’s associate director of research services, says the university has long seen the value of having that content online. Dylan says UT libraries had $100,000 a year for scanning books.

“We tried to find someone to scan 3,000 books in a year,” he says. “We didn’t get any bids. It’s really time-consuming and expensive. Someone has to scan every page and then go back to make sure there aren’t any errors. Then all the files need to be stored and serviced. It would have taken us a thousand years to do it ourselves.” Google was willing to spend hundreds of millions scanning books. It developed its own top-secret scanning technology that made the process significantly more efficient, and sent teams of scanners to some of the largest libraries in the country. Google scanned the entire Benson collection at UT in three years. For free.

And the company gave the university copies of all the files it scanned from the collection. Now you can read The Life and Times of Pancho Villa from your desktop.

Google scanned three broad categories of books. Books published in the United States before 1923 are in the public domain, which means anyone can scan them, sell them, or do whatever else they want. (If you want to read Moby-Dick on your computer screen, it’s waiting for you.) Then there are books that are still in print, which Google covered by making individual deals with publishers and authors.

The majority of books ever written are out of print, and in many cases it’s difficult to know who exactly is the copyright holder. Google allows Web users to see only “snippets” of these “orphan books” related to keyword searches. Since the public can’t download the entire book, Google argued the snippets are allowed under “fair use,” the legal doctrine that allows limited amounts of copyrighted material to be used without permission.

Google was entering uncharted territory here, and it wasn’t long before a group of authors and publishers sued the tech giant. “We argued that this isn’t fair use,” says Authors Guild Executive Director Paul Aiken. “If you have scanned the full book, then it doesn’t matter if you are only displaying snippets at a time; you’ve taken the book.” Aiken adds that Google is using the material for commercial purposes, which can weigh against a fair use claim. The plaintiffs had a strong argument, and it’s possible, even likely, that they could have wrested the books back from Google. “But if Google won,” Aiken says, “it could have been catastrophic. It could have jump-started book piracy.”

Instead, they settled. Google agreed to allow authors and publishers the right to remove their works from Google Books. For those that opt in, Google will share with authors and publishers the revenues it makes through ads, selling books and providing a subscription service to public libraries.

The authors will keep 63 percent of the money, and Google will take the rest. The settlement will also create an independent Books Rights Registry to distribute revenues to authors and publishers, as well as hold the money made on orphan works until it finds the copyright holder. In return, Google will have the right to use the many millions of pages it has scanned in its search engine, and will have permission to display more than just snippets.

The arrangement could work out well for some authors. Peggy Cleaves decided to let Google corral her long-lost romances. “Before, my earlier books were in a dead zone,” she says. “Google saw a way to seize wealth, but they also made it more valuable with their technology.”

Another way to look at the settlement is that Google has been given permission to privatize the next generation’s library. “No one is disputing the value of having a digital online library,” says Peter Brantley, director of the books server project at the Internet Archive, a nonprofit that scans open-source books and provides them to the public. “But do we want to let a private company run this for its own ends and purposes?”

The Internet Archive is part of the Open Book Alliance, a group of nonprofits and companies like Microsoft Corporation that aim to challenge the settlement in court. They say the settlement gives Google exclusive rights to the millions of orphan works it has scanned. (The Books Rights Registry will have the ability to sell some digital rights to other companies, as long as the copyright holder consents.) “The settlement essentially grants an exemption from copyright to one commercial advertising company,” Brantley says. He adds that this is not just a question of principle: the public could end up with limited access to our literary heritage if Google makes it too expensive for local libraries to subscribe to its service. Many publishers are negotiating their own deals with Google, which means the public’s access to their books will be piecemeal.

The Open Book Alliance is pushing for Congress to create a national digital library, which could buy Google’s scans and share them freely. This would cost the government hundreds of millions of dollars, which Brantley points out is a small fraction of what was spent on the economic stimulus. Still, it’s hard to imagine the two parties coming together in the midst of a recession to invest in a public commons for books, no matter how important it might be. So the public could be left praying that Google can be benevolent and make money at the same time. Let’s hope it works out better than it did with the banks. Ω

[Observer Culture Editor Michael May has worked for Austin NPR affiliate KUT-FM and the national radio show "Weekend America." He’s also freelanced for "This American Life," "Studio 360," "Marketplace," and others. For his radio work, May has won a Third Coast Audio Festival Gold Award and a National Headliners Grand Award.]

Copyright © 2010 The Texas Observer

Copyright © 2010 Sapper's (Fair & Balanced) Rants & Raves