An introduction to Geoslurp

Geoslurp is a pure-Python module for downloading, updating, and querying datasets that are managed in a spatially aware database (PostgreSQL with PostGIS).

The idea behind Geoslurp is to centrally manage datasets, which has the following advantages:

  • It provides, and encourages the use of, a central point of access to various datasets

  • Established database functionality can be exploited: (spatial) queries of the datasets and joins (queries that combine several datasets) become possible

  • Sharing is encouraged: users share data by default, so other users can use the registered data and copies of large datasets are avoided

  • It allows for consistent versioning of datasets

  • The use of an ‘off-the-shelf’ database provides a standard and mature interface reachable from various programming languages.

  • Large data files don’t necessarily need to be in the database. The meta-information required for the queries is stored together with a link to where the data can be found (e.g. a local file path or an online URL). This allows relatively lightweight databases to be set up.

  • It provides ways to standardize the downloading, updating, and registering of datasets
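
The idea of keeping only meta-information and a link in the database, while the data files stay elsewhere, can be illustrated with a small sketch. This is not Geoslurp's actual schema or API: the `DatasetEntry` class, the `find_entries` helper, and all file paths and URLs below are hypothetical, and a plain Python list stands in for the database table. In a real PostGIS setup the bounding box would be a geometry column queried with a spatial function such as `ST_Intersects`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetEntry:
    """Meta-information for one data file; the file itself is not stored here."""
    name: str
    uri: str     # hypothetical local file path or online URL of the data
    bbox: tuple  # (min_lon, min_lat, max_lon, max_lat) in degrees


def bbox_intersects(a, b):
    """Return True if two (min_lon, min_lat, max_lon, max_lat) boxes overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def find_entries(catalogue, query_bbox):
    """Return the entries whose bounding box overlaps query_bbox."""
    return [e for e in catalogue if bbox_intersects(e.bbox, query_bbox)]


# A stand-in for a database table: only metadata and links, no bulk data.
catalogue = [
    DatasetEntry("north_sea_tiles", "/data/north_sea.nc",
                 (-4.0, 51.0, 9.0, 61.0)),
    DatasetEntry("global_grid", "https://example.org/global_grid.nc",
                 (-180.0, -90.0, 180.0, 90.0)),
    DatasetEntry("greenland_mask", "/data/greenland.nc",
                 (-75.0, 59.0, -10.0, 84.0)),
]

# "Spatial query": which registered files cover a small box off the Dutch coast?
hits = find_entries(catalogue, (3.0, 52.0, 5.0, 54.0))
for entry in hits:
    print(entry.name, "->", entry.uri)
```

The query returns only the links to the matching files, so a user opens just the data that is actually needed and the catalogue itself stays small.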

Requirements

To use Geoslurp, one must install the Geoslurp Python package and have a running PostgreSQL database with PostGIS. Geoslurp is not yet on PyPI, but this is foreseen once a reasonably decent first version has been developed.

What the future holds

At the time of writing, the Geoslurp tools are still under development. Looking ahead, however, one may already speculate about possible features:

  • Manage a Geoslurp instance through a web front end. Currently, the main way to interact with Geoslurp is the geoslurper command-line script, which may be too cumbersome to learn for occasional users.

  • Hide the database access behind a REST API. This may be more secure once public instances are set up, and it would facilitate interaction with other web services.

  • Encourage reproducibility by registering scientific processing pipelines.