Digital Libraries

Google Book Search: The Good, the Bad, & the Ugly

1/1/2008

When a Text Isn't Text

THE PAGES OF TEXT SHOWN through Book Search are actually images, not text. Although Google's digitizing process converts each scanned page into text, the publicly offered results are less stellar than those made possible by the better-known optical character recognition (OCR) applications such as Abbyy FineReader, which is used by compression software provider LuraTech as part of its PDF conversion solution.

Frequently, an out-of-copyright book in Google will include a "View plain text" function, but the user will be shown a page displaying only "No text" at the top, meaning that Google was unable to convert that particular page into plain text. And if a user's keyword search turns up such a page, Book Search still succeeds in locating and highlighting the search terms, even if it can't seem to display the page in plain-text form. It's almost as if two separate optical character recognition systems are in play: one for the search engine, and another for converting scanned pages into plain text. This inconsistency may not trouble most readers, but those who are print-disabled and rely on a screen reader or text-to-speech conversion say otherwise.

Susan Gerhart holds a doctorate in computer science and has worked in research and management in software engineering and technology transfer at Duke University (NC), NASA, the National Science Foundation, USC's Information Sciences Institute, and Embry-Riddle Aeronautical University (FL). Gerhart is also legally blind. As she points out in her blog, As Your World Changes, her experiments with Book Search turned up this anomaly when she turned images off in her browser. "I got a snippet of page text, a big empty block of missing image, and various book metadata, including where to buy or borrow," she says. When she tried turning images on, "Ouch, was it bright," she recalls.

She writes: "There's nothing in, around, or any way out of the image into screen readable mode. The image might as well have been a lake, a building, or porn for all the information I could glean from it. I wondered why the omnipotent Google toolbar, gathering data about my searches, and offering me various extra search information, could not also be the reader." Gerhart is doubtless not alone in her frustration.

Linda Becker, the VP of sales and marketing for Kirtas, doesn't believe that Google has somehow created a faster digitization process. "I do know what they're doing, and I can't comment on it," she says. "But what I can say is this: They're not scanning faster, they're not digitizing faster, and they don't have the quality controls that the user deserves."

She may be right: In an ongoing online debate about whether Google is using robotic machinery or human beings to flip the pages, bloggers have poked fun at the search giant's quality control methods (or lack of them) by posting screenshots that reveal hands, fingers, and arms in Book Search results. Becker suggests that those screenshots may not be anomalies. "If you go into Google [Book Search] and look at any book, you'll be able to see by the number of body parts and fingerprints that [the pages] are being turned manually."


