An introduction to Geoslurp
Geoslurp is a pure-python module which allows downloading/updating/querying of different datasets in a spatially aware database (PostgreSQL with PostGIS).
The idea behind Geoslurp is to centrally manage datasets, which has the following advantages:
It provides and encourages a central point to access to various datasets
Established database functionatility can be exploited by allowing (spatial) queries of the datasets, joins (querying a combination of datasets)
Sharing is encouraged, users share data by default, making it possible for other users to use the registered data and avoiding copies of large datasets
It allows for consistent versioning of datasets
The use of an ‘off-the-shelf’ database provides a standard and mature interface reachable from various programming languages.
Large datafiles don’t necessarily need to be in the database. The metainformation, required for the queries, will be stored together with a link to where the data can be found (e.g. a local file path or an online url). This allows relatively light-weight databases to be set up.
Provide ways to standardize downloading/updating/registering of datasets
Requirements
To use Geoslurp one must essentially install the Geoslurp Python package, and have a running PostgreSQL database with PostGIS. Geoslurp is currently not yet in PyPI, but this is foreseen in the future once a relatively decent first version is developed.
What the future holds
At the time of writing, the geoslurp tools are still under development. But taking a look in the future, one may already philosophize about possible features:
Manage a geoslurp instance by means of a website-frontend. Currently, the main way to interact with geoslurp is to use the geoslurper command line script. This may be too cumbersome to learn for occasional users.
Hide the database access behind a REST-API. This may be more secure when public instances will be set up and will facilitate the interaction with other webservices.
Encourage reproducibility by registering scientific processing pipelines.