geoslurp.dataset package

Submodules

geoslurp.dataset.CSVBase module

class geoslurp.dataset.CSVBase.CSVBase(dbconn)

Bases: DataSet

Base class which downloads and reads in a CSV table and registers it in a database table

csvfile = None
hskip = 0
lookup = None
register()

Update/populate a database table from a CSV file. This function reads all rows from an open CSV file. The first line is expected to hold the column names, which are mapped to types in the lookup string dictionary.

separator = ','
table = None
geoslurp.dataset.CSVBase.columnsFromCSV(line, lookup, sep=',')

Reads column descriptors from comma-separated values and creates a list of SQLAlchemy columns

geoslurp.dataset.CSVBase.valuesFromCSV(line, names, sep=',')
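The two helpers above turn a header line and the following data lines into column descriptors and row values. Below is a minimal stdlib sketch of that idea, not the geoslurp implementation. The function names are hypothetical, and the column types are plain type-name strings standing in for the SQLAlchemy column objects that columnsFromCSV actually creates.

```python
def columns_from_csv_sketch(line, lookup, sep=","):
    """Map each header name to its type via the lookup dictionary.

    Returns (name, type) pairs; geoslurp builds SQLAlchemy columns instead.
    """
    names = [name.strip() for name in line.rstrip("\n").split(sep)]
    # Raises KeyError for header names missing from lookup (assumed behaviour)
    return [(name, lookup[name]) for name in names]


def values_from_csv_sketch(line, names, sep=","):
    """Pair the fields of one data line with the column names."""
    fields = [field.strip() for field in line.rstrip("\n").split(sep)]
    return dict(zip(names, fields))


lookup = {"station": "String", "lat": "Float", "lon": "Float"}
cols = columns_from_csv_sketch("station,lat,lon\n", lookup)
names = [name for name, _ in cols]
row = values_from_csv_sketch("HELG,54.18,7.89\n", names)
print(cols)  # [('station', 'String'), ('lat', 'Float'), ('lon', 'Float')]
print(row)   # {'station': 'HELG', 'lat': '54.18', 'lon': '7.89'}
```

Values stay strings in this sketch; converting them according to the looked-up type would be the next step before inserting rows.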

geoslurp.dataset.OGRBase module

class geoslurp.dataset.OGRBase.OGRBase(dbconn)

Bases: DataSet

Base class which downloads a single OGR layer (e.g. a shapefile) and registers it as a PostGIS table

columnsFromOgrFeat(feat)

Returns a list of columns from an osgeo feature

encoding = 'iso-8859-1'
gtype = None
ignoreFields = None
layerregex = None
ogrfile = None
register()

Update/populate a database table (creates one if it doesn’t exist). This function reads a shapefile and puts it in a single table.

Parameters:
ogrfile: GDAL dataset (e.g. shapefile)
forceGType (optional): a geometry type to be used as the “geom” column

Returns: nothing (but sets the internal SQLAlchemy table)

spatindex = True
swapxy = False
table = None
targetprj = None
targetsrid = 4326
valuesFromOgrFeat(feat, transform=None)

Returns a dictionary with loaded values from a feature
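The attributes ignoreFields and swapxy, together with the optional transform argument of valuesFromOgrFeat, suggest how one feature becomes one row. The sketch below illustrates that flow with plain Python dictionaries and coordinate tuples in place of osgeo feature objects. The function name, the feature layout, the exact meaning of swapxy and the order of swapping versus transforming are all assumptions for illustration.

```python
def values_from_feature_sketch(fields, coords, ignore_fields=None,
                               swapxy=False, transform=None):
    """Build a row dictionary from one feature (hypothetical stand-in).

    fields: attribute name -> value; coords: list of (x, y) tuples.
    """
    ignore = set(ignore_fields or [])
    # Drop attributes listed in ignore_fields (cf. OGRBase.ignoreFields)
    row = {k: v for k, v in fields.items() if k not in ignore}
    if swapxy:
        # Assumed meaning: the source stores coordinates as (y, x)
        coords = [(y, x) for x, y in coords]
    if transform is not None:
        # transform: any callable mapping (x, y) -> (x', y')
        coords = [transform(x, y) for x, y in coords]
    row["geom"] = coords
    return row


row = values_from_feature_sketch(
    {"name": "Rhine", "objectid": 17},
    [(51.8, 6.1), (52.0, 5.9)],
    ignore_fields=["objectid"],
    swapxy=True,
)
print(row)  # {'name': 'Rhine', 'geom': [(6.1, 51.8), (5.9, 52.0)]}
```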

geoslurp.dataset.RasterBase module

class geoslurp.dataset.RasterBase.RasterBase(dbcon)

Bases: DataSet

Base class to load raster (tiles) into the postgis database

auxcolumns = None
bandname = None
columns()
outofdb = False
overviews = None
preview = {}
rastExtract(uri)

How things are extracted from the raster file (this may be overloaded in derived classes for more granular access)

rastFromGDAL(uri)
rastFromRio(uri)
rastregex = '.*'
register()

Checks the directory for updated raster files and updates them in the database
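register() scans srcdir for raster files whose names match rastregex and updates those that changed. A minimal stdlib sketch of such a scan follows, assuming the regex is applied to file names with re.search and that modification times are compared with a per-file record of the last registration. The dictionary last_registered and the function name are hypothetical; geoslurp keeps this bookkeeping in the database.

```python
import os
import re
import tempfile
from datetime import datetime, timedelta


def updated_rasters_sketch(srcdir, rastregex=".*", last_registered=None):
    """Return paths under srcdir matching rastregex that are new or modified."""
    last_registered = last_registered or {}
    pattern = re.compile(rastregex)
    found = []
    for root, _dirs, files in os.walk(srcdir):
        for fname in sorted(files):
            if not pattern.search(fname):
                continue
            path = os.path.join(root, fname)
            mtime = datetime.fromtimestamp(os.path.getmtime(path))
            # New files, or files changed since their last registration
            if path not in last_registered or mtime > last_registered[path]:
                found.append(path)
    return found


with tempfile.TemporaryDirectory() as tmp:
    for name in ("tile_a.nc", "tile_b.nc", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    first_pass = updated_rasters_sketch(tmp, r"\.nc$")
    # Pretend tile_a.nc was registered after its last modification
    done = {os.path.join(tmp, "tile_a.nc"): datetime.now() + timedelta(hours=1)}
    second_pass = updated_rasters_sketch(tmp, r"\.nc$", done)
    first_names = [os.path.basename(p) for p in first_pass]
    second_names = [os.path.basename(p) for p in second_pass]
print(first_names)   # ['tile_a.nc', 'tile_b.nc']
print(second_names)  # ['tile_b.nc']
```

Using re.search rather than re.match lets an anchored pattern such as '\\.nc$' (the MotuGridsBase default) match at the end of the name.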

regularblocking = False
srcdir = None
srid = 4326
swapxy = False
tiles = None

geoslurp.dataset.cdsbase module

class geoslurp.dataset.cdsbase.CDSBase(dbconn)

Bases: DataSet

Provides a base class from which subclasses can inherit to download CDS-hosted data

addRequest(name, requestdict, priority=0)
cdsalias = None
columns = [Column('id', Integer(), table=None, primary_key=True, nullable=False), Column('name', String(), table=None), Column('lastupdate', DateTime(), table=None), Column('tstart', DateTime(), table=None), Column('tend', DateTime(), table=None), Column('uri', String(), table=None), Column('data', JSON(), table=None), Column('geom', Geography(geometry_type='POLYGON', srid=4326, dimension=2, from_text='ST_GeogFromText', name='geography'), table=None)]
description = 'CDS subset downloaded from cds.climate.copernicus.eu'
dformat = 'unarchived'
getDefaultDict(geomshape=None)
metaExtractor(uri)

Implement this function in a derived class

oformat = 'netcdf'
productType = None
pull(maxreq=10)

Pulls the necessary data from the online resource
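CDSBase collects requests through addRequest(name, requestdict, priority=0) and pull(maxreq=10) then works through them. One way to model such a queue is a heap that hands out at most maxreq requests per call. The sketch below assumes that a higher priority value is served first and that the queue lives in memory; neither is confirmed by this reference, and the class name and request keys are illustrative only.

```python
import heapq
import itertools


class RequestQueueSketch:
    """Hypothetical priority queue of named CDS request dictionaries."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps insertion order

    def add_request(self, name, requestdict, priority=0):
        # heapq is a min-heap, so negate priority to serve high values first
        heapq.heappush(self._heap, (-priority, next(self._order), name, requestdict))

    def next_batch(self, maxreq=10):
        """Pop at most maxreq requests, highest priority first."""
        batch = []
        while self._heap and len(batch) < maxreq:
            _prio, _n, name, req = heapq.heappop(self._heap)
            batch.append((name, req))
        return batch


queue = RequestQueueSketch()
queue.add_request("t2m_2000", {"variable": "2m_temperature", "year": "2000"})
queue.add_request("sp_2000", {"variable": "surface_pressure", "year": "2000"}, priority=5)
queue.add_request("t2m_2001", {"variable": "2m_temperature", "year": "2001"})
first_batch = [name for name, _ in queue.next_batch(maxreq=2)]
second_batch = [name for name, _ in queue.next_batch(maxreq=2)]
print(first_batch)   # ['sp_2000', 't2m_2000']
print(second_batch)  # ['t2m_2001']
```

The insertion counter in each heap entry guarantees that two requests with equal priority never fall through to comparing their dictionaries.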

register()

Register the downloaded dataset in the database

reqdicts = {}
res = 0.0
resource = None
resumejobs = False
schema = 'cds'
variables = []

geoslurp.dataset.dataSetBase module

class geoslurp.dataset.dataSetBase.DataSet(dbcon)

Bases: ABC

Abstract base class which holds a dataset (corresponding to a database table)

addEntry(metadict)
bulkInsert(dictlist)

Insert a list of dicts in bulk mode

cacheDir(subdirs=None)

Returns the cache directory of this schema and dataset

commitCounter = 0
commitperN = 500
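The counters commitCounter and commitperN indicate that entries are committed in batches rather than one at a time. A minimal sketch of that bookkeeping follows; the class is hypothetical and a plain callback stands in for the database session commit.

```python
class BatchCommitSketch:
    """Count added entries and commit every commitperN of them (hypothetical)."""

    commitperN = 500

    def __init__(self, commit):
        self.commit = commit  # callable, e.g. a wrapper around session.commit()
        self.commitCounter = 0
        self.pending = []

    def add_entry(self, metadict):
        self.pending.append(metadict)
        self.commitCounter += 1
        if self.commitCounter >= self.commitperN:
            self.flush()

    def flush(self):
        """Commit whatever is pending and reset the counter."""
        if self.pending:
            self.commit(list(self.pending))  # pass a copy before clearing
        self.pending.clear()
        self.commitCounter = 0


batches = []
sink = BatchCommitSketch(batches.append)
sink.commitperN = 3
for i in range(7):
    sink.add_entry({"id": i})
sink.flush()  # commit the remainder at the end
print([len(b) for b in batches])  # [3, 3, 1]
```

A final flush is needed because the last partial batch never reaches the threshold on its own.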
createTable(cols=None, session=None)

Dynamically creates a table (when it does not exist) from a list of columns

dataDir(subdirs=None)

Returns the specialized data directory of this schema and dataset. The directory will be created if it does not exist.

db = None
dropTable()
entryNeedsUpdate(likestr, lastmod, col=None)

Query for a column entry in the table based on a LIKE string and delete the entry when it is older than lastmod

export(outputfile)

Export the table to a different format

halt()

Can be overridden to properly clean up an aborted operation

isExpired()

Checks whether the table data is expired relative to the update frequency (updatefreq)
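isExpired() compares the age of the table data with updatefreq. The sketch below assumes that updatefreq is a datetime.timedelta and that the time of the last update is known (for instance from the inventory maintained by updateInvent). Both are assumptions, as is treating a missing updatefreq as "never expires"; the function name is hypothetical.

```python
from datetime import datetime, timedelta


def is_expired_sketch(lastupdate, updatefreq, now=None):
    """Return True when lastupdate is older than updatefreq (hypothetical)."""
    if updatefreq is None:
        return False  # assumption: no update frequency means never expired
    now = now or datetime.now()
    return now - lastupdate > updatefreq


now = datetime(2024, 3, 1)
print(is_expired_sketch(datetime(2024, 2, 1), timedelta(days=7), now))   # True
print(is_expired_sketch(datetime(2024, 2, 28), timedelta(days=7), now))  # False
```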

migrate(version)

Properly migrate a table between software versions (note this function is supposed to be overridden in a derived class)

abstract pull()

Pulls the necessary data from the online resource

purgecache(filter='*')

Deletes the cache directory of the dataset, optionally applying a directory/filename filter

purgedata(filter='*')

Deletes the data directory of the dataset, optionally applying a directory/filename filter

purgeentry()

Delete dataset entry in the database

abstract register()

Register the downloaded dataset in the database
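Because DataSet derives from ABC and marks pull() and register() as abstract, a concrete dataset must implement both before it can be instantiated. The stand-in base class below is hypothetical (it has no database connection or other DataSet members) and reproduces only that contract.

```python
from abc import ABC, abstractmethod


class DataSetSketch(ABC):
    """Minimal stand-in for the abstract DataSet contract."""

    schema = "public"

    @abstractmethod
    def pull(self):
        """Download the data from the online resource."""

    @abstractmethod
    def register(self):
        """Register the downloaded data in the database."""


class HalfDone(DataSetSketch):
    def pull(self):
        return "pulled"


class Complete(HalfDone):
    def register(self):
        return "registered"


try:
    HalfDone()
    incomplete_ok = True
except TypeError:
    incomplete_ok = False  # register() is still abstract
ds = Complete()
print(incomplete_ok, ds.pull(), ds.register())  # False pulled registered
```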

retainnewUris(urilist)

Filters the given URIs, keeping those whose table entries are too old or which are not present in the database
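uriNeedsUpdate and retainnewUris decide which remote files still have to be (re)registered. A stdlib sketch of the filtering step follows. It assumes a hypothetical inventory mapping each registered URI to its last-update time, and candidate URIs paired with their remote modification time. geoslurp queries the database table instead and, per the descriptions above, also deletes outdated entries, which this sketch does not do.

```python
from datetime import datetime


def retain_new_uris_sketch(candidates, inventory):
    """Keep (uri, lastmod) pairs that are unregistered or newer than the table entry."""
    keep = []
    for uri, lastmod in candidates:
        registered = inventory.get(uri)
        if registered is None or registered < lastmod:
            keep.append((uri, lastmod))
    return keep


inventory = {"a.nc": datetime(2024, 1, 10), "b.nc": datetime(2024, 1, 1)}
candidates = [
    ("a.nc", datetime(2024, 1, 5)),  # registered after its last modification
    ("b.nc", datetime(2024, 1, 3)),  # modified after registration
    ("c.nc", datetime(2024, 1, 2)),  # not registered yet
]
kept = retain_new_uris_sketch(candidates, inventory)
print([uri for uri, _ in kept])  # ['b.nc', 'c.nc']
```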

schema = 'public'
setCacheDir(cdir)
setDataDir(ddir)
classmethod stname()
stripuri = False
table = None
classmethod tname()
truncateTable()

Truncate all entries in a table

updateInvent(updateTime=True)
updatefreq = None
upsertEntry(metadict, index_elements)
uriNeedsUpdate(urilikestr, lastmod)

Query for a URI in the table based on a LIKE string and delete the entry when it is older than lastmod

version = (0, 0, 0)
geoslurp.dataset.dataSetBase.rmfilterdir(ddir, filter='*')

Remove directories and files based on a certain regex filter
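The default filter='*' reads like a shell-style glob pattern. The sketch below assumes that reading (using glob.glob) and removes matching files and directories directly inside ddir; it is a hypothetical stand-in, not the geoslurp code. The parameter keeps the documented name filter even though it shadows the builtin.

```python
import glob
import os
import shutil
import tempfile


def rm_filter_dir_sketch(ddir, filter="*"):
    """Delete files and directories in ddir whose names match the glob filter."""
    removed = []
    for path in sorted(glob.glob(os.path.join(ddir, filter))):
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # remove a directory tree
        else:
            os.remove(path)  # remove a file or symlink
        removed.append(os.path.basename(path))
    return removed


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "2019"))
    for name in ("a.nc", "b.csv"):
        open(os.path.join(tmp, name), "w").close()
    removed = rm_filter_dir_sketch(tmp, "*.nc")
    left = sorted(os.listdir(tmp))
print(removed)  # ['a.nc']
print(left)     # ['2019', 'b.csv']
```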

geoslurp.dataset.datasetgeneric module

class geoslurp.dataset.datasetgeneric.DataSetGeneric(dbcon)

Bases: DataSet

pull()

Pulls the necessary data from the online resource

register()

Register the downloaded dataset in the database

scheme = 'anyscheme'

geoslurp.dataset.era5 module

class geoslurp.dataset.era5.ERA5Base(dbconn)

Bases: CDSBase

Provides a Base class from which subclasses can inherit to download a subset of the data per area

appendRequest(name, geomshape)

Builds a dictionary for the cdsapi.

Parameters:
geomshape (shapely geometry): geometry which will be used to compute the bounding box to download data for
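appendRequest derives the download region from the bounding box of geomshape. The CDS API expects its 'area' entry ordered as [North, West, South, East], whereas shapely's geometry.bounds returns (minx, miny, maxx, maxy). The sketch below shows that reordering and takes a bounds tuple so it runs without shapely. The helper name is hypothetical, its defaults mirror the ERA5Base attributes listed here, and the remaining key names follow common cdsapi request dictionaries but are assumptions in this context.

```python
def era5_request_sketch(bounds, variables, product_type="monthly_averaged_reanalysis",
                        yrstart=2000, yrend=2000, time="00:00"):
    """Build a cdsapi-style request dict from a (minx, miny, maxx, maxy) box."""
    west, south, east, north = bounds  # order of shapely's geometry.bounds
    return {
        "product_type": product_type,
        "variable": list(variables),
        "year": [str(y) for y in range(yrstart, yrend + 1)],
        "month": [f"{m:02d}" for m in range(1, 13)],
        "time": time,
        "format": "netcdf",
        # CDS expects the area as North, West, South, East
        "area": [north, west, south, east],
    }


req = era5_request_sketch((4.0, 50.5, 7.5, 53.7), ["temperature"], yrstart=2000, yrend=2001)
print(req["area"])  # [53.7, 4.0, 50.5, 7.5]
print(req["year"])  # ['2000', '2001']
```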

metaExtractor(ncuri)

Implement this function in a derived class

productType = 'monthly_averaged_reanalysis'
resource = 'reanalysis-era5-pressure-levels-monthly-means'
scheme = 'atmo'
time = '00:00'
variables = []
yrend = 2000
yrstart = 2000

geoslurp.dataset.motuGridsBase module

class geoslurp.dataset.motuGridsBase.MotuGridsBase(dbconn)

Bases: RasterBase

Downloads and registers subsets of gridded data with the motu client

authalias = None
auxcolumns = [Column('lastupdate', TIMESTAMP(), table=None), Column('time', ARRAY(TIMESTAMP()), table=None)]
motuproduct = None
moturoot = None
motuservice = None
outofdb = False
pull(name=None, wsne=None, tstart=None, tend=None)

Pulls a subset of a gridded dataset as netCDF from a motu-enabled server. This routine calls the internal routines of the motuclient Python client.

Parameters:
name: name of the output dataset (the file will be named ‘name.nc’)
wsne: bounding box of the section of interest as [West, South, North, East]
tstart: start date (as yyyy-mm-dd) for the extraction
tend: end date (as yyyy-mm-dd) for the extraction
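The wsne argument of pull is ordered [West, South, North, East], which is easy to confuse with the (minx, miny, maxx, maxy) convention. The sketch below unpacks it into explicit longitude/latitude limits, checks the yyyy-mm-dd dates and derives the output name 'name.nc'. The dictionary keys are modelled on motuclient command-line options such as --longitude-min, --latitude-max and --date-min, but the helper itself is hypothetical and does not call motuclient.

```python
from datetime import date


def motu_subset_sketch(name, wsne, tstart, tend):
    """Translate pull() arguments into explicit subset limits (hypothetical)."""
    west, south, north, east = wsne  # note: North comes before East
    if south > north:
        raise ValueError("south must not exceed north")
    # Validate the yyyy-mm-dd strings; fromisoformat raises ValueError if malformed
    if date.fromisoformat(tstart) > date.fromisoformat(tend):
        raise ValueError("tstart must not be after tend")
    return {
        "out_name": f"{name}.nc",
        "longitude_min": west,
        "longitude_max": east,
        "latitude_min": south,
        "latitude_max": north,
        "date_min": tstart,
        "date_max": tend,
    }


opts = motu_subset_sketch("northsea", [-4.0, 51.0, 61.0, 9.0], "2010-01-01", "2010-12-31")
print(opts["out_name"], opts["longitude_max"], opts["latitude_max"])  # northsea.nc 9.0 61.0
```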

rastExtract(uri)

Extracts the raster and other meta information from the downloaded files

rastregex = '\\.nc$'
updated = []
variables = None

geoslurp.dataset.pandasbase module

class geoslurp.dataset.pandasbase.PandasBase(dbconn)

Bases: DataSet

Base class which reads in a pandas-compatible table (CSV, Excel, or an in-memory dataframe are currently supported) and registers it in a database table

columnsFromDataframe(df)

Returns a list of columns from a dataframe

encoding = None
ftype = 'csv'
geoinfo = (4326, 'geom', 'GEOMETRY', 2, 'rast')
inbulk = False
modify_df(df)

A derived type can overload this to make modifications to the dataframe before registering it in the database

outdbArchiveName()
pdfile = None
pull()

Overload when needed

register(df=None)

Update/populate a database table from a pandas-compatible file

registerInDatabase(df)
setGeoInfo(df)

Tries to extract the SRID and geometry type from a GeoPandas GeoDataFrame

skipfooter = 0
xrappend_dim = None
class geoslurp.dataset.pandasbase.geoinfo(srid, geoname, geomtype, dims, rastname)

Bases: tuple

dims

Alias for field number 3

geomtype

Alias for field number 2

geoname

Alias for field number 1

rastname

Alias for field number 4

srid

Alias for field number 0
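geoinfo is a named tuple whose fields map to the indices listed above, and the PandasBase.geoinfo attribute holds the default value (4326, 'geom', 'GEOMETRY', 2, 'rast'). The reconstruction below uses collections.namedtuple with the documented field order to show how that default lines up with the field names; it is written for illustration, not copied from the package.

```python
from collections import namedtuple

# Field order as documented: srid=0, geoname=1, geomtype=2, dims=3, rastname=4
geoinfo = namedtuple("geoinfo", ["srid", "geoname", "geomtype", "dims", "rastname"])

default = geoinfo(*(4326, "geom", "GEOMETRY", 2, "rast"))  # PandasBase default
print(default.srid, default.geomtype)  # 4326 GEOMETRY
# Named tuples are immutable; _replace returns a modified copy
polygons = default._replace(geomtype="POLYGON", srid=3035)
print(polygons)  # geoinfo(srid=3035, geoname='geom', geomtype='POLYGON', dims=2, rastname='rast')
```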

geoslurp.dataset.xarraybase module

class geoslurp.dataset.xarraybase.XarrayBase(dbconn)

Bases: DataSet

Base class which allows writing an xarray DataArray/Dataset into a database table

columnsFromXar(ds)

Returns a groupby object and a list of columns from an xarray object (grouped by)

groupby = None
inbulk = False
outdbArchiveName()
outofdb = False
pull()

Overload when needed

register(ds=None)

Update/populate a database table from an xarray-compatible file or from a dataset directly

registerInDatabase(ds)
timename = 'time'
writeoutofdb = True
xarfile = None