Once upon a somewhere: Georeferencing books using toponyms identified in online book reviews

Alexander Mackie www.mappit.net

Stories are at the heart of this project. Stories of adventures and afternoon tea, dastardly deeds and love affairs, stories of the everyday and the fantastical that bring the past to life and shape the future. Location can help readers discover these stories and better understand both the stories and the places they inhabit.

Recently there have been attempts to geolocate stories and books, mostly works of fiction, so that readers can use a map to discover stories about, or set in, particular places. A good example is Placing Literature:

http://www.placingliterature.com/map

Placing Literature is a crowd-sourced database of mapped books.

There is an argument that when it comes to works of fiction:

"...setting them in real-world locations gives a sense of realism to the novels and helps make that connection between a piece of art and the physical world" (Williams, 2013).

In addition to this humanistic argument, georeferenced books also have commercial applications in promoting book sales. If a book retailer can offer locally relevant books in its recommendations, or allow readers to search for books about their holiday destination, it may be able to improve sales.

Some features of books make georeferencing them challenging. Unlike a business such as a supermarket, which has physical premises at a discrete location, a book might be about multiple locations or fictional locations, or have no location at all. A location might be as specific as a house on a street, or only vaguely defined as somewhere within a particular country or galaxy.
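
One way to picture this challenge is as a data model. The sketch below is a minimal illustration assuming a hypothetical scheme: the `Precision` levels and the `BookLocation` and `Book` classes are invented names, not part of any existing book-mapping system. A book holds zero, one or many locations, each tagged with how precisely it is known, and fictional places carry no coordinates, so they cannot be drawn on a map.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple

# Hypothetical data model, for illustration only.


class Precision(Enum):
    """How precisely a book's location is known, most specific first."""
    ADDRESS = 1      # e.g. a particular house on a street
    SETTLEMENT = 2   # a town or city
    REGION = 3       # a county, state or province
    COUNTRY = 4
    FICTIONAL = 5    # an invented place with no real-world coordinates


@dataclass
class BookLocation:
    name: str
    precision: Precision
    # (latitude, longitude) in decimal degrees; None for fictional places
    coords: Optional[Tuple[float, float]] = None


@dataclass
class Book:
    title: str
    # Zero, one or many locations; an empty list means "no location at all"
    locations: List[BookLocation] = field(default_factory=list)

    def mappable_locations(self) -> List[BookLocation]:
        """Return only the locations that can be drawn on a map."""
        return [loc for loc in self.locations if loc.coords is not None]


book = Book(
    "A Hypothetical Novel",
    [
        BookLocation("London", Precision.SETTLEMENT, (51.5074, -0.1278)),
        BookLocation("Imaginary Town", Precision.FICTIONAL),
    ],
)
print([loc.name for loc in book.mappable_locations()])  # ['London']
```

Keeping precision as an explicit field lets a map decide how to draw each location, for example as a pin for an address but as a shaded area for a country.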

The main drawback of existing solutions is the sparsity of mapped books: a map simply isn't very engaging when there are hardly any books in the database. This dissertation examined automated ways of generating this data, specifically using the Unlock Text geotagger to identify place names in online book reviews, an approach that has the potential to solve the problem of data sparsity. Reviews for 72,000 books, comprising 80 million words, were scraped and processed.
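
The core idea can be sketched in a few lines: find place names in each review, then count mentions per book. This sketch does not use Unlock Text, whose interface is not described here. Instead it swaps in a naive, hypothetical gazetteer lookup (`GAZETTEER`, `find_toponyms` and `book_toponym_counts` are illustrative names). A real geotagger would also handle ambiguous names and resolve each one to the right place.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Toy gazetteer: place name -> (latitude, longitude). A hypothetical stand-in
# for the much larger gazetteer a real geotagger would consult.
GAZETTEER: Dict[str, Tuple[float, float]] = {
    "Riga": (56.9496, 24.1052),
    "Tallinn": (59.4370, 24.7536),
    "Paris": (48.8566, 2.3522),
}


def find_toponyms(text: str, gazetteer: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return every gazetteer name found in text as a whole word (case-sensitive)."""
    found: List[str] = []
    for name in gazetteer:
        pattern = r"\b" + re.escape(name) + r"\b"
        found.extend([name] * len(re.findall(pattern, text)))
    return found


def book_toponym_counts(
    reviews: Iterable[str], gazetteer: Dict[str, Tuple[float, float]]
) -> Counter:
    """Count toponym mentions across all reviews of a single book."""
    counts: Counter = Counter()
    for review in reviews:
        counts.update(find_toponyms(review, gazetteer))
    return counts


reviews = [
    "A vivid history of Riga and Tallinn.",
    "Riga comes alive here; a welcome change from yet another book about Paris.",
]
counts = book_toponym_counts(reviews, GAZETTEER)
print(counts.most_common())  # [('Riga', 2), ('Tallinn', 1), ('Paris', 1)]
```

In the example, the second review mentions Paris only by way of contrast, yet it is still counted against the book. Simple mention counting cannot tell a place the book is about from a place a reviewer happens to name, which is one plausible source of wrong links.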

Figure: Example of data for one book, "The Northern Crusades" by Eric Christiansen.

A detailed evaluation of the accuracy of the data was carried out. On average, approximately 60% of the books linked to a given toponym using this technique were correct. An error rate of around 40% would be unacceptable in a book-searching application.
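
The "on average" figure can be read as a macro-averaged precision: for each toponym, the share of books linked to it that were judged correct, averaged over toponyms. The sketch below assumes that reading, since the exact evaluation procedure is not given here. The function names and toy judgements are hypothetical and do not reproduce the dissertation's data. It also shows the pooled (micro-averaged) alternative, which weights toponyms by how many books they link to and so can give a different number.

```python
from typing import Dict, List

# Hypothetical toy judgements: for each toponym, whether each book linked to it
# was judged correct (True) or incorrect (False). Not the dissertation's data.
judgements: Dict[str, List[bool]] = {
    "Riga": [True, True, False, True],
    "Paris": [True, False],
}


def toponym_precision(labels: List[bool]) -> float:
    """Share of the books linked to one toponym that were judged correct."""
    if not labels:
        raise ValueError("no books linked to this toponym")
    return sum(labels) / len(labels)


def macro_precision(results: Dict[str, List[bool]]) -> float:
    """Average the per-toponym precision, giving every toponym equal weight."""
    scores = [toponym_precision(labels) for labels in results.values()]
    return sum(scores) / len(scores)


def micro_precision(results: Dict[str, List[bool]]) -> float:
    """Pool every book-toponym link, so busier toponyms count for more."""
    pooled = [label for labels in results.values() for label in labels]
    return toponym_precision(pooled)


print(macro_precision(judgements))  # 0.625
print(micro_precision(judgements))  # about 0.667
```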

Results:

  • There is an unfilled niche for location-based searching of books, and this niche has both humanistic and commercial applications.

  • Book reviews likely do mention places relevant to the book frequently enough to produce useful location data.

  • The accuracy rate in identifying these place names is likely too low to make this data useful.

  • Georesolution of catalogue data is a more promising avenue of research. This approach has since produced the Global Book Map, the world's largest geographical book database. The Global Book Map data also support programmatic location-based searching of books via an API.
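
The Global Book Map API itself is not described here, so the following is only a hypothetical sketch of what location-based searching of georeferenced books involves. It computes the great-circle (haversine) distance from a search point to each book's coordinates and returns the books within a radius, nearest first. `GeoBook` and `books_near` are illustrative names, and the titles are placeholders.

```python
import math
from typing import Iterable, List, NamedTuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation


class GeoBook(NamedTuple):
    title: str
    lat: float  # decimal degrees
    lon: float


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points on a spherical Earth, in km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def books_near(
    books: Iterable[GeoBook], lat: float, lon: float, radius_km: float
) -> List[GeoBook]:
    """Return books within radius_km of (lat, lon), nearest first (linear scan)."""
    hits = []
    for book in books:
        distance = haversine_km(lat, lon, book.lat, book.lon)
        if distance <= radius_km:
            hits.append((distance, book))
    hits.sort(key=lambda pair: pair[0])
    return [book for _, book in hits]


catalogue = [
    GeoBook("Placeholder title set in London", 51.5074, -0.1278),
    GeoBook("Placeholder title set in Glasgow", 55.8642, -4.2518),
    GeoBook("Placeholder title set in Edinburgh", 55.9533, -3.1883),
]
for book in books_near(catalogue, 55.9533, -3.1883, radius_km=100):
    print(book.title)
```

A linear scan is fine for a handful of records. A database of the size described would more plausibly use a spatial index, such as an R-tree or geohash prefixes, to avoid measuring the distance to every book.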

Further reading:

Grover, C., Tobin, R., Byrne, K., Woollard, M., Reid, J., Dunn, S. & Ball, J. (2010) Use of the Edinburgh geoparser for georeferencing digitized historical collections. Philosophical Transactions of the Royal Society A: Mathematical, Physical & Engineering Sciences, 368, 3875-3889.