Does NoSQL have a place in GIS? - An open-source spatial database performance comparison with proven RDBMS

 

Christopher J McCarthy

 

Abstract

 

With the relational database model now more than 40 years old and the use of ‘big data’ continuously increasing, NoSQL systems are marketed as a more efficient means of handling large quantities of usually unstructured data. NoSQL systems may offer advantages over relational databases, but they generally give up relational robustness to achieve them.

This project contributes to the GIS field by comparing open-source RDBMS and NoSQL systems for storing and querying spatial data, with the overall goal of determining whether NoSQL systems (specifically MongoDB) have a place within the GI world. Working with the open-source spatial dataset OpenStreetMap, a scalable approach is taken, working from global down to local extents. This approach aims to provide insight into how the performance of each system varies with data size.

The research highlights how the performance of each system is limited by its functionality. MongoDB’s spatial capabilities are lacking in comparison to PostGIS, the spatial extension for PostgreSQL. The outcome is that MongoDB cannot currently support the spatial needs of a specialist GIS operative; however, if basic spatial functionality is all that is needed, MongoDB delivers high performance on large datasets. PostGIS has a complex, highly specialist range of spatial functionality, making it the best performing spatial system; however, it slows down as dataset size increases.

The choice of system depends on the application, but at present this NoSQL system is spatially outclassed and therefore not yet suited to the specialist GIS industry.

 

Spatial Benchmark Queries

 

Table 1 Benchmark Queries

Query   Query Description
1       Insert
2       Update
3       Delete
4       Closest Point and Distance
5       Buffer
6       Distance Buffer
7       Bounding Box
8       Bounding Box Render to KML
9       Line Intersects Polygon
10      Area of Polygon
11      Length of Line in Polygon
12      Length of Lines
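To make two of the benchmark queries concrete, the following is a minimal sketch of how a Distance Buffer and a Bounding Box query might be expressed in each system. It is not the query text used in the study, which is not given here. The field name `geometry`, the table `osm_points`, the column `geom` and the use of SRID 4326 are all assumptions for illustration; the MongoDB operators (`$near`, `$geoIntersects`) and PostGIS functions (`ST_DWithin`, `ST_MakeEnvelope`) are standard ones.

```python
"""Sketch: Distance Buffer and Bounding Box queries for MongoDB and PostGIS.

Nothing here connects to a database; the functions only build the query
objects. All table, column and field names are hypothetical.
"""


def mongo_distance_buffer(lon, lat, radius_m, field="geometry"):
    """MongoDB filter for documents within radius_m metres of a point.

    Assumes a 2dsphere index on `field` and GeoJSON geometries.
    """
    return {
        field: {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [lon, lat]},
                "$maxDistance": radius_m,
            }
        }
    }


def postgis_distance_buffer(lon, lat, radius_m, table="osm_points", column="geom"):
    """Parameterised SQL for rows within radius_m metres of a point.

    The geography cast makes ST_DWithin measure in metres. Table and column
    names are interpolated directly, so they must be trusted identifiers.
    """
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE ST_DWithin({column}::geography, "
        "ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography, %s)"
    )
    return sql, (lon, lat, radius_m)


def mongo_bounding_box(xmin, ymin, xmax, ymax, field="geometry"):
    """MongoDB filter for geometries intersecting a box given as a GeoJSON polygon."""
    ring = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax], [xmin, ymin]]
    return {
        field: {
            "$geoIntersects": {
                "$geometry": {"type": "Polygon", "coordinates": [ring]}
            }
        }
    }


def postgis_bounding_box(xmin, ymin, xmax, ymax, table="osm_points", column="geom"):
    """Parameterised SQL using the && operator (bounding boxes overlap)."""
    sql = (
        f"SELECT * FROM {table} "
        f"WHERE {column} && ST_MakeEnvelope(%s, %s, %s, %s, 4326)"
    )
    return sql, (xmin, ymin, xmax, ymax)
```

The two bounding box forms are close but not identical: `&&` compares bounding boxes only, while `$geoIntersects` tests the geometries themselves against a polygon whose edges MongoDB treats as geodesics.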

 

Results

 

Import times differed noticeably between the two systems. The difference stems from the PostGIS import tool creating spatial indexes during import, whereas the MongoDB system required them to be created manually.

 

Figure 1 Import Times
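As an illustration of the manual step on the MongoDB side, the sketch below shows what creating a spatial index might look like, next to the equivalent PostGIS statement. It assumes a pymongo-style collection object with a `create_index` method, a GeoJSON field called `geometry`, and a hypothetical table `osm_points` with a geometry column `geom`; none of these names come from the study. A small stand-in class is included so the sketch can be run without a database server.

```python
def ensure_mongo_spatial_index(collection, field="geometry"):
    """Create a 2dsphere index on `field`.

    `collection` is expected to behave like a pymongo Collection, whose
    create_index method accepts a list of (field, index_type) pairs.
    """
    return collection.create_index([(field, "2dsphere")])


def postgis_spatial_index_sql(table="osm_points", column="geom"):
    """Return the SQL for a GiST spatial index (names are hypothetical)."""
    return f"CREATE INDEX {table}_{column}_gist ON {table} USING GIST ({column});"


class RecordingCollection:
    """Offline stand-in for a collection; it only records the index requests."""

    def __init__(self):
        self.requests = []

    def create_index(self, keys):
        self.requests.append(keys)
        # Return a name of the form "<field>_<type>" for each requested index.
        return "_".join(f"{name}_{kind}" for name, kind in keys)


demo = RecordingCollection()
index_name = ensure_mongo_spatial_index(demo)
print(index_name)
print(postgis_spatial_index_sql())
```

With a real deployment, `demo` would be replaced by a collection obtained from a database connection; the index build then runs on the server and its cost would need to be added to the import time for a like-for-like comparison.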

 

 

MongoDB prevailed as the quicker system for CRUD (Create, Read, Update and Delete) system operations.

 

Table 2 System Operations

                 Insert               Delete               Update
Dataset          PostGIS   MongoDB    PostGIS   MongoDB    PostGIS   MongoDB
                 (s)       (s)        (s)       (s)        (s)       (s)
Edinburgh        0.043     0.014      1.314     0.241      1.365     0.352
Scotland         0.049     0.011      5.582     0.253      3.479     0.429
British Isles    0.139     0.024      12.494    0.748      18.289    0.940
Europe           0.349     0.121      134.392   1.581      213.891   1.431
Planet           0.682     0.124      381.482   2.488      492.482   35.392
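The size of MongoDB's advantage in Table 2 is easier to see as a ratio. The short sketch below, which uses only the timings reported in the table, divides each PostGIS time by the corresponding MongoDB time; the variable and function names are illustrative only.

```python
# Table 2 timings in seconds: operation -> dataset -> (PostGIS, MongoDB)
TABLE_2 = {
    "insert": {
        "Edinburgh": (0.043, 0.014),
        "Scotland": (0.049, 0.011),
        "British Isles": (0.139, 0.024),
        "Europe": (0.349, 0.121),
        "Planet": (0.682, 0.124),
    },
    "delete": {
        "Edinburgh": (1.314, 0.241),
        "Scotland": (5.582, 0.253),
        "British Isles": (12.494, 0.748),
        "Europe": (134.392, 1.581),
        "Planet": (381.482, 2.488),
    },
    "update": {
        "Edinburgh": (1.365, 0.352),
        "Scotland": (3.479, 0.429),
        "British Isles": (18.289, 0.940),
        "Europe": (213.891, 1.431),
        "Planet": (492.482, 35.392),
    },
}


def speedups(table):
    """Return PostGIS time divided by MongoDB time for every cell."""
    return {
        op: {dataset: postgis / mongo for dataset, (postgis, mongo) in rows.items()}
        for op, rows in table.items()
    }


for op, rows in speedups(TABLE_2).items():
    for dataset, ratio in rows.items():
        print(f"{op:<8}{dataset:<15}{ratio:8.1f}x")
```

Every ratio is above 1, ranging from roughly 3x for inserts to around 150x for deletes on the Planet dataset.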

 

 

MongoDB continued to outperform the more complex spatial system in the following Find Nearest Point and Distance Buffer queries, generally running quicker at all dataset scales, with the exception of the Distance Buffer on the smaller datasets. This began to highlight the scalability benefits of MongoDB: less performance degradation was observed as dataset scale increased.

 

 

Table 3 Spatial Queries

                 Find Nearest Point   Distance Buffer
Dataset          PostGIS   MongoDB    PostGIS   MongoDB
                 (s)       (s)        (s)       (s)
Edinburgh        0.184     0.134      0.011     0.159
Scotland         0.21      0.135      0.015     0.341
British Isles    0.29      0.137      0.034     1.893
Europe           8.34      0.259      13.15     3.682
Planet           59.783    0.361      53.614    7.928
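The scalability claim can be checked directly against Table 3. The sketch below, using only the reported timings, computes how much each system slows down between the smallest and largest dataset and finds the first dataset at which MongoDB becomes the faster system. The helper names are illustrative only.

```python
DATASETS = ["Edinburgh", "Scotland", "British Isles", "Europe", "Planet"]

# Table 3 timings in seconds, ordered as in DATASETS.
TABLE_3 = {
    "Find Nearest Point": {
        "PostGIS": [0.184, 0.21, 0.29, 8.34, 59.783],
        "MongoDB": [0.134, 0.135, 0.137, 0.259, 0.361],
    },
    "Distance Buffer": {
        "PostGIS": [0.011, 0.015, 0.034, 13.15, 53.614],
        "MongoDB": [0.159, 0.341, 1.893, 3.682, 7.928],
    },
}


def growth_factor(times):
    """Ratio of the largest-dataset time to the smallest-dataset time."""
    return times[-1] / times[0]


def first_mongo_win(query):
    """First dataset (smallest to largest) where MongoDB is faster, or None."""
    rows = TABLE_3[query]
    for dataset, postgis, mongo in zip(DATASETS, rows["PostGIS"], rows["MongoDB"]):
        if mongo < postgis:
            return dataset
    return None


for query, rows in TABLE_3.items():
    for system, times in rows.items():
        print(f"{query:<20}{system:<9}slowdown {growth_factor(times):8.1f}x")
    print(f"{query:<20}MongoDB faster from: {first_mongo_win(query)}")
```

For both queries the PostGIS slowdown from Edinburgh to Planet is far larger than MongoDB's, and for the Distance Buffer MongoDB only overtakes PostGIS once the data reaches the Europe scale.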



MongoDB outperformed PostGIS in many performance tests, but these covered rather basic spatial functionality. It became clear that PostGIS was a far more specialist spatial system, offering far more complex spatial functionality; many of its operations could not be matched by MongoDB. In many of the other tests PostGIS outperformed MongoDB, showing that the more advanced spatial system had advantages in these more complex tests. Increasing dataset size still affected PostGIS, however, raising its operation times far more dramatically than MongoDB’s.

 

Table 4 Further Spatial Queries

                 Point in Polygon     Line Intersect       BBox Geometries      BBox KML Render
Dataset          PostGIS   MongoDB    PostGIS   MongoDB    PostGIS   MongoDB    PostGIS   MongoDB
                 (s)       (s)        (s)       (s)        (s)       (s)        (s)       (s)
                                                                                          JSON
Edinburgh        0.215     2.49       1.225     3.549      0.023     0.948      0.054     10.494
Scotland         0.21      2.569      1.732     7.982      0.065     1.349      0.092     13.444
British Isles    0.448     4.429      2.745     17.138     0.327     1.928      0.668     14.591
Europe           16.221    18.562     8.056     29.389     6.813     7.859      9.841     34.948
Planet           45.897    76.284     98        92.298     29.332    34.384     12.948    54.582

 

Conclusions

 

  • Spatial functionality and application will undoubtedly guide database choice

 

  • MongoDB provides high potential for basic spatial functionality needs with high performance

 

  • PostGIS suffers performance degradation on ‘big data’ but provides highly specialist spatial functionality

 

  • MongoDB lends itself to basic geospatial web-based applications where the user base could increase exceptionally without performance degradation

 

 

Key References

 

de Hass, W., Quak, W. & Vermaji, M., 2008. A spatial DBMS buyers guide, s.l.: Delft University of Technology Section GIS Technology.

Goodchild, M. F., 1992. Geographical information science. Geographical Information Systems, 6(1), pp. 31-45.

McCarthy, C., 2014. Does NoSQL have a place in GIS? - An open-source spatial database performance comparison with proven RDBMS.

MongoDB, 2014. MongoDB Manual. [Online]
Available at: http://docs.mongodb.org/manual/faq/storage/#why-are-the-files-in-my-data-directory-larger-than-the-data-in-my-database

PostGIS, 2014. Chapter 4. Using PostGIS. [Online]
Available at: http://postgis.refractions.net/documentation/manual-1.3SVN/ch04.html
[Accessed 12 05 2014].

Simion, B., Ilha, D. N., Brown, A. D. & Johnson, R., 2013. The Price of Generality in Spatial Indexing, Toronto: Department of Computer Science, University of Toronto.

Stonebraker, M. & Cetintemel, U., 2005. One Size Fits All: An Idea whose Time has Come and Gone. ICDE '05: Proceedings of the 21st International Conference on Data Engineering, pp. 2-11.

Stonebraker, M., Frew, J., Gardels, K. & Meredith, J., 1993. The SEQUOIA 2000 storage benchmark. SIGMOD '93: Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, pp. 2-12.

Ray, S., Simion, B. & Brown, A. D., 2011. Jackpine: A Benchmark to Evaluate Spatial Database Performance. Data Engineering (ICDE), Volume 27, pp. 1139-1150.

Vyas, R. K., Paliwal, M. & Pal, B. L., 2011. Conceptual Review on Relational and Spatial Database Query Processing and Benchmarking. International Journal of Advanced Research in Computer Science, 2(5), pp. 578-580.