Friday, August 6, 2010

Google counts 129,864,880 books in the world

Determining the total number of books in the world is a monumental task, but the number is finite and therefore measurable. Google Books recently concluded a project that did just that. The number they arrived at is 129,864,880, as of August 1, 2010.


Google put up a fascinating and insightful post on how they found the answer. The first task was to define what counts as a book, and so what to include in the count and what to leave out. As the post explains:

One definition of a book we find helpful inside Google when handling book metadata is a “tome,” an idealized bound volume. A tome can have millions of copies (e.g. a particular edition of “Angels and Demons” by Dan Brown) or can exist in just one or two copies (such as an obscure master’s thesis languishing in a university library). This is a convenient definition to work with, but it has drawbacks. For example, we count hardcover and paperback books produced from the same text twice, but treat several pamphlets bound together by a library as a single book.

Systems such as ISBNs, SBNs, Library of Congress Control Numbers, and OCLC accession numbers exist for cataloguing books, but they aren't reliable for counting. The ISBN, for example, has been around only since the mid-1960s and remains mostly a Western phenomenon. So most books printed earlier, and those not intended for commercial distribution or printed in other regions of the world, have never been assigned an ISBN.

The other reason we can’t rely on ISBNs alone is that ever since they became an accepted standard, they have been used in non-standard ways. They have sometimes been assigned to multiple books: we’ve seen anywhere from two to 1,500 books assigned the same ISBN. They are also often assigned to things other than books. Even though they are intended to represent “books and book-like products,” unique ISBNs have been assigned to anything from CDs to bookmarks to t-shirts.

So Google chose to deal with the problem in their own way. They collected metadata from more than 150 different sources, including libraries, WorldCat, national union catalogs and commercial providers. They then weeded out as many duplicates as possible and removed non-book materials such as microform, film, maps and even about a thousand t-shirts. They also excluded serials, which are often catalogued as a single work.
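To make the steps above concrete, here is a toy sketch of the same idea in Python: merge records from several sources, collapse duplicates, then drop non-book items and serials. It is only an illustration. The `Record` fields, the `tome_key` matching rule and the sample data are all assumptions made for this example, and Google's real pipeline is certainly far more sophisticated than exact matching on a few fields.

```python
from dataclasses import dataclass

# Hypothetical record shape; the real metadata fields are not described in this post.
@dataclass(frozen=True)
class Record:
    title: str
    author: str
    year: int
    binding: str  # e.g. "hardcover", "paperback", "n/a"
    kind: str     # e.g. "book", "serial", "microform", "film", "map", "t-shirt"
    source: str   # which catalog the record came from

# Item types that are not books (assumed labels, chosen for this example).
NON_BOOK_KINDS = {"microform", "film", "map", "t-shirt"}

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(text.lower().split())

def tome_key(record: Record) -> tuple:
    """Assumed identity of a 'tome': same text, same edition year, same binding."""
    return (normalize(record.title), normalize(record.author),
            record.year, record.binding)

def count_books(records: list) -> int:
    # Step 1: weed out duplicates that different sources report for the same tome.
    unique = {}
    for record in records:
        unique.setdefault(tome_key(record), record)
    # Step 2: drop non-book materials. Step 3: drop serials.
    kept = [r for r in unique.values()
            if r.kind not in NON_BOOK_KINDS and r.kind != "serial"]
    return len(kept)

sample = [
    Record("Angels and Demons", "Dan Brown", 2000, "hardcover", "book", "library-a"),
    Record("angels  and demons", "dan brown", 2000, "hardcover", "book", "worldcat"),
    Record("Angels and Demons", "Dan Brown", 2000, "paperback", "book", "library-a"),
    Record("An Obscure Thesis", "A. Student", 1987, "hardcover", "book", "library-b"),
    Record("Souvenir Shirt", "Unknown", 2005, "n/a", "t-shirt", "vendor"),
    Record("Some Journal", "Various", 1990, "paperback", "serial", "library-b"),
]

total = count_books(sample)
print(total)  # 3
```

The two hardcover records collapse into one, while the paperback is still counted separately, which mirrors the "tome" definition quoted earlier. The t-shirt and the serial are dropped, leaving three books.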

Finally, the number that stood before them was 129,864,880. How many have you read?

