Google Books: a cataloging disaster

The subject of this post is:
650 b0 Metadata|xErrors of usage

Evidently, the metadata cataloguing of Google Books is a train wreck, particularly regarding dates of publication and subjects:

Do a search on “internet” in books written before 1950 and Google Scholar turns up 527 hits.

But whether it gets the BISAC categories right or wrong, the question is why Google decided to use those headings in the first place. (Clancy denies that they were asked to do so by the publishers, though this might have to do with their own ambitions to compete with Amazon.) The BISAC scheme is well suited to organizing the shelves of a modern 35,000 foot chain bookstore or a small public library where ordinary consumers or patrons are browsing for books on the shelves. But it’s not particularly helpful if you’re flying blind in a library with several million titles, including scholarly works, foreign works, and vast quantities of books from earlier periods. For example, the BISAC “Juvenile Nonfiction” subject heading has almost 300 subheadings, including separate categories for books about “New Baby,” “Skateboarding,” and “Deer, Moose, and Caribou.” By contrast, the “Poetry” subject heading has just 20 subdivisions in all. That means that Bambi and Bullwinkle get a full shelf to themselves, while Schiller, Leopardi, and Verlaine have to scrunch together in the lone subheading reserved for “Poetry/Continental European.” In short, Google has taken the great research collections of the English-speaking world and returned them in the form of a suburban mall bookstore.

To be fair, Jon Orwant replies in the comments to explain Google’s procedures, and you feel some sympathy. They’re getting in a lot of metadata from disparate sources, and let’s face it, there’s a lot of sucky cataloging out there. Geoff Nunberg was not entirely convinced by the arguments, though:

I simply assumed that this mistake must have been the work of a program, rather than a human — I mean, could someone really misread that ad as providing a publication date? The answer, according to Jon, is, well, actually, somebody did. Which only goes to show that the Turing test can work both ways: do something dumb enough, and it’s hard to tell you from a machine.
