
The almost ubiquitous connectivity brought by modern mobile technologies and social networking services helps people around the globe generate very large amounts of information every day. Statistics on social networking sites show that, on average, 70 billion pieces of content are shared on Facebook each month, 190 million tweets are posted on Twitter per day, and 3000 photos are uploaded to Flickr per minute. Some of these datasets contain geographical information that could be used for spatial or temporal analysis. This project introduces a tool that allows spatio-temporal exploration of photos downloaded from the photo-sharing site Flickr.

The tool connects to the Flickr public REST API to download the photos related to a specific tag requested by the user. The downloaded dataset is pre-processed to remove multiple photos taken by the same user at the same location on the same day. Cluster analysis is then performed on the reduced dataset, once for each time unit in a specified study area, using the density-based DBSCAN algorithm. As the last step in the process, the tool performs quadrat count analysis, giving the user a statistical means to assess the validity of the cluster analysis.
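The pre-processing step can be illustrated with a minimal Python sketch. It is not the tool's own code: the field names (`owner`, `lat`, `lon`, `taken`) are hypothetical, and treating two photos as being at the "same location" when their coordinates agree to four decimal places is an assumption made only for this example.

```python
from datetime import datetime

def deduplicate(photos, precision=4):
    """Keep the first photo seen for each (owner, rounded location, calendar day)."""
    seen = set()
    kept = []
    for photo in photos:
        key = (
            photo["owner"],
            round(photo["lat"], precision),   # assumed definition of "same location"
            round(photo["lon"], precision),
            photo["taken"].date(),            # same calendar day
        )
        if key not in seen:
            seen.add(key)
            kept.append(photo)
    return kept

# Hypothetical records: two photos by user "a" at the same spot on the same day,
# one by "a" on the following day, and one by a different user.
photos = [
    {"owner": "a", "lat": 36.6002, "lon": -121.8947, "taken": datetime(2013, 7, 1, 9, 0)},
    {"owner": "a", "lat": 36.6002, "lon": -121.8947, "taken": datetime(2013, 7, 1, 15, 30)},
    {"owner": "a", "lat": 36.6002, "lon": -121.8947, "taken": datetime(2013, 7, 2, 10, 0)},
    {"owner": "b", "lat": 36.6002, "lon": -121.8947, "taken": datetime(2013, 7, 1, 9, 5)},
]
reduced = deduplicate(photos)
print(len(reduced))
```

Only the second photo is dropped here, so a single prolific user at one spot contributes at most one point per day to the cluster analysis.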

Server-side storage and processing are performed with PostgreSQL’s PostGIS extension, which enables spatial queries on geo-referenced data. Temporal querying is handled with the timestamp and time interval data types and functions of PostgreSQL and PHP. The tool implements the DBSCAN clustering algorithm to find density-based spatial relationships. The frontend of the application uses JavaScript, HTML5 and CSS3 to produce a simple and dynamic interface. OpenLayers 3 delivers the interactive web mapping capabilities, and the WMS and WFS services used by the tool are provided by a GeoServer instance running on the backend.
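To make the clustering step concrete, the following is a compact Python sketch of the DBSCAN algorithm. It illustrates the general technique rather than the tool's server-side implementation: it assumes planar coordinates with Euclidean distance, uses a brute-force neighbour search, and the parameter names `eps` and `min_pts` are chosen for this example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Return one label per point: 0, 1, ... for clusters and -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE              # may later be claimed as a border point
            continue
        cluster += 1                       # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(neighbours)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster        # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

In this toy input the first four points form one cluster, the next three form a second, and the isolated last point is labelled as noise.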

The tool has two main interfaces which are accessible to the user: the control panel and the viewer.

On the control panel the user can interact with the main entities that define how the photos are explored: tags and study areas. The panel is divided into four sections, two for working with tags and two for working with study areas. Study areas can either be drawn on the viewer or copied from another study area.

Tag section of the control panel
Study area section of the control panel

The viewer allows the user to explore the spatial and temporal distribution of the tags for which photos have been downloaded. Temporal exploration of the study area is achieved with the time slider at the top of the map. The time slider contains several time slots, each representing a consecutive time unit; together they cover the entire timeframe.
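The way consecutive time units can tile a timeframe is sketched below in Python. The function name `time_slots` and the choice to clip the last slot to the end of the timeframe are assumptions for illustration; how the tool itself derives its time units is not described here.

```python
from datetime import datetime, timedelta

def time_slots(start, end, unit):
    """Split [start, end) into consecutive slots of length `unit`.

    The final slot is clipped so that the slots cover the timeframe exactly.
    """
    if unit <= timedelta(0):
        raise ValueError("unit must be a positive duration")
    slots = []
    slot_start = start
    while slot_start < end:
        slot_end = min(slot_start + unit, end)
        slots.append((slot_start, slot_end))
        slot_start = slot_end
    return slots

# Hypothetical timeframe of 59 days split into 30-day time units.
slots = time_slots(datetime(2013, 1, 1), datetime(2013, 3, 1), timedelta(days=30))
for slot_start, slot_end in slots:
    print(slot_start.date(), "to", slot_end.date())
```

Each slot ends exactly where the next begins, so every photo timestamp in the timeframe falls into exactly one time unit.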

Each time unit uses different elements to represent an aspect of the cluster distribution. A bar chart depicts the relative size of the clusters present in the study area, and the number of clusters in each time unit is listed below its bar. Colour depicts the presence of overlapping clusters in consecutive time units, with different colour ramps to differentiate clusters in the past from clusters in the future. For cluster validation the tool uses quadrat count analysis with a hexagonal grid (beehive).
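Counting points per hexagonal quadrat can be sketched in Python as follows. This is an illustrative approach, not the tool's method: it assumes planar (projected) coordinates, pointy-top hexagons indexed by axial coordinates `(q, r)` with the grid origin at `(0, 0)`, and a hypothetical `size` parameter equal to the hexagon's circumradius.

```python
import math
from collections import Counter

def cube_round(q, r):
    """Round fractional axial coordinates to the nearest hexagon index."""
    s = -q - r
    rq, rr, rs = round(q), round(r), round(s)
    dq, dr, ds = abs(rq - q), abs(rr - r), abs(rs - s)
    # Recompute the component with the largest rounding error so q + r + s == 0.
    if dq > dr and dq > ds:
        rq = -rr - rs
    elif dr > ds:
        rr = -rq - rs
    return rq, rr

def hex_cell(x, y, size):
    """Axial (q, r) index of the pointy-top hexagon containing the point (x, y)."""
    q = (math.sqrt(3) / 3 * x - y / 3) / size
    r = (2 / 3 * y) / size
    return cube_round(q, r)

def quadrat_counts(points, size):
    """Number of points falling in each non-empty hexagonal quadrat."""
    return Counter(hex_cell(x, y, size) for x, y in points)

points = [(0.1, 0.1), (-0.1, -0.2), (1.7, 0.1), (0.8, 1.4)]
counts = quadrat_counts(points, size=1.0)
print(counts)
```

The counter only records hexagons that contain at least one point; for a quadrat count analysis, the empty hexagons inside the study area would also need to be included with a count of zero.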

The viewer showing clusters in a study area in North America
Example of a hexagonal grid for a study area in North America
Example of quadrat count analysis for a particular time unit

The tool was evaluated with a case study in which locational data from Flickr photos tagged with "humpback whale" were used to identify the most popular whale watching spots in four study areas in the North Pacific Ocean.

Four study areas were defined based on two of the four stocks recognised under the US Marine Mammal Protection Act (MMPA):

Geographic extent of the California / Oregon / Washington stock (summer) study area (west coast of the United States of America)
Geographic extent of the California / Oregon / Washington stock (winter) study area (west coast of Mexico)
Geographic extent of the Central North Pacific Stock (summer) study area (coast of Alaska)
Geographic extent of the Central North Pacific Stock (winter) study area (Hawaii)

Using cluster analysis within the tool, the most popular locations for whale watching were found to be:

| Study area | Most popular spot | Time units | Clusters |
| --- | --- | --- | --- |
| California/Oregon/Washington Stock (summer) | Monterey Bay | 8 out of 9 | 8 out of 9 |
| California/Oregon/Washington Stock (winter) | De Banderas Bay | 3 out of 4 | 3 out of 4 |
| Central North Pacific Stock (summer) | Chichagof Island and the City of Juneau | 10 out of 15 | 10 out of 19 |
| Central North Pacific Stock (winter) | Maui, Molokai, Lanai, and Kahoolawe Islands | 11 out of 12 | 11 out of 13 |

Photos concentrated on Monterey Bay, California
Photos concentrated in De Banderas Bay, Mexico
Photos concentrated off the City of Juneau (right) and south of Kenai Peninsula (left), Alaska, time slider highlighting other clusters for the area around the City of Juneau
Photos concentrated around the Islands of Maui, Molokai, Lanai, and Kahoolawe, Hawaii

The variance-to-mean ratio (VMR) and chi-squared (χ²) values indicate that the distribution of photos tagged "humpback whale" in the study areas is not consistent with complete spatial randomness.
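For readers unfamiliar with these statistics, the Python sketch below computes them from a list of quadrat counts using the standard index-of-dispersion formulation, in which the chi-squared statistic equals (n − 1) × VMR for n quadrats. Whether the tool computes them in exactly this way is an assumption, and the counts shown are invented for illustration, not results from the case study.

```python
from statistics import mean, variance

def quadrat_statistics(counts):
    """Return (VMR, chi-squared) for per-quadrat point counts.

    Under complete spatial randomness the counts are roughly Poisson
    distributed, so the variance-to-mean ratio is expected to be close to 1.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("at least one quadrat must contain a point")
    vmr = variance(counts) / m           # sample variance (n - 1 denominator)
    chi2 = (len(counts) - 1) * vmr       # compared against chi-squared with n - 1 d.f.
    return vmr, chi2

# Invented counts: most quadrats are empty and one holds most of the points.
counts = [0, 0, 0, 0, 0, 0, 1, 1, 2, 16]
vmr, chi2 = quadrat_statistics(counts)
print(round(vmr, 2), round(chi2, 2))
```

A VMR well above 1 suggests clustering, a value near 1 suggests randomness, and a value well below 1 suggests a regular pattern; the chi-squared value is then compared against a critical value for n − 1 degrees of freedom.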

The project showed that temporal and spatial exploration and querying of Flickr photos can be achieved with an automated tool. It also showed that a web application that downloads and stores the photo metadata from Flickr is preferable to one that downloads the metadata on the fly, as the latter was deemed too slow to provide an appropriate user experience.

Future work could explore how the outputs of the cluster analysis differ when other algorithms, such as k-means or hierarchical clustering, are used. The data sources for the tool could also be expanded to include other social networking services, to complement or compare analyses across social networks. Finally, automatic methods to adjust the location of a photo could be added, so that the calculations use the location of the object depicted in the photo instead of the location where the photo was taken; this was not explored in this project.
