geoslurp.dataset package
Submodules
geoslurp.dataset.CSVBase module
- class geoslurp.dataset.CSVBase.CSVBase(dbconn)
Bases: DataSet
Base class which downloads and reads in a CSV table and registers it in a db table
- csvfile = None
- hskip = 0
- lookup = None
- register()
Update/populate a database table from a CSV file. This function reads all rows from an open CSV file. The first line is expected to hold the column names, which are mapped to types in the lookup string dictionary.
- separator = ','
- table = None
- geoslurp.dataset.CSVBase.columnsFromCSV(line, lookup, sep=',')
Reads column descriptors from a line of comma-separated values and creates a list of sqlalchemy columns
- geoslurp.dataset.CSVBase.valuesFromCSV(line, names, sep=',')
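The header-to-type mapping described for register() and columnsFromCSV can be illustrated with a minimal, standalone sketch. It does not use geoslurp or sqlalchemy: the snake_case helpers, the LOOKUP dictionary and the use of plain Python types in place of sqlalchemy column types are all hypothetical stand-ins, and treating hskip as the number of lines to skip before the header is an assumption.

```python
import io

# Hypothetical lookup: column name -> Python type
# (the documented class maps names to sqlalchemy types instead).
LOOKUP = {"id": int, "name": str, "height": float}


def columns_from_csv(line, lookup, sep=","):
    """Return (name, type) pairs for a header line; unknown names fall back to str."""
    names = [name.strip() for name in line.strip().split(sep)]
    return [(name, lookup.get(name, str)) for name in names]


def values_from_csv(line, columns, sep=","):
    """Convert one data line into a dict keyed by column name."""
    fields = [field.strip() for field in line.strip().split(sep)]
    return {name: typ(field) for (name, typ), field in zip(columns, fields)}


def read_table(fileobj, lookup, sep=",", hskip=0):
    """Skip hskip lines (assumed meaning), read the header, then convert all rows."""
    for _ in range(hskip):
        next(fileobj)
    columns = columns_from_csv(next(fileobj), lookup, sep)
    rows = [values_from_csv(line, columns, sep) for line in fileobj if line.strip()]
    return columns, rows


columns, rows = read_table(io.StringIO("id,name,height\n1,tower,12.5\n"), LOOKUP)
```

In the documented class the resulting rows would be written to the database table; here they are simply returned as a list of dicts.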
geoslurp.dataset.OGRBase module
- class geoslurp.dataset.OGRBase.OGRBase(dbconn)
Bases: DataSet
Base class which downloads a single OGR layer (e.g. shapefile) and registers it as a postgis table
- columnsFromOgrFeat(feat)
Returns a list of columns from an osgeo feature
- encoding = 'iso-8859-1'
- gtype = None
- ignoreFields = None
- layerregex = None
- ogrfile = None
- register()
Update/populate a database table (creates one if it doesn’t exist). This function reads a shapefile and puts it in a single table.
:param ogrfile: gdal dataset (e.g. shapefile)
:param forceGType: (optional) a geometry type to be used as the “geom” column
:returns: nothing (but sets the internal sqlalchemy table)
- spatindex = True
- swapxy = False
- table = None
- targetprj = None
- targetsrid = 4326
- valuesFromOgrFeat(feat, transform=None)
Returns a dictionary with loaded values from a feature
geoslurp.dataset.RasterBase module
- class geoslurp.dataset.RasterBase.RasterBase(dbcon)
Bases: DataSet
Base class to load raster (tiles) into the postgis database
- auxcolumns = None
- bandname = None
- columns()
- outofdb = False
- overviews = None
- preview = {}
- rastExtract(uri)
How things are extracted from the raster file (this may be overloaded in derived classes for more granular access)
- rastFromGDAL(uri)
- rastFromRio(uri)
- rastregex = '.*'
- register()
Checks the directory for updated raster files and updates them in the database
- regularblocking = False
- srcdir = None
- srid = 4326
- swapxy = False
- tiles = None
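As a rough illustration of how a rastregex pattern combined with an update check could select the files that register() needs to process, consider the sketch below. It is not the geoslurp implementation: files_to_update is a hypothetical helper, the two dictionaries stand in for a directory listing and the database table, and matching with re.search (rather than re.match) is an assumption.

```python
import re
from datetime import datetime


def files_to_update(found, registered, rastregex=".*"):
    """Select files matching rastregex that are new or newer than their registered entry.

    found:      {filename: modification time} (stand-in for a directory scan)
    registered: {filename: last update time}  (stand-in for the database table)
    """
    pattern = re.compile(rastregex)
    return sorted(
        name
        for name, mtime in found.items()
        if pattern.search(name)
        and (name not in registered or registered[name] < mtime)
    )


found = {
    "a.nc": datetime(2021, 1, 1),
    "b.nc": datetime(2021, 1, 1),
    "c.tif": datetime(2021, 1, 1),
    "d.nc": datetime(2019, 1, 1),
}
registered = {"a.nc": datetime(2020, 1, 1), "d.nc": datetime(2020, 1, 1)}
selected = files_to_update(found, registered, r"\.nc$")
```

With the default rastregex of '.*' every file name matches, so only the update check filters the list.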
geoslurp.dataset.cdsbase module
- class geoslurp.dataset.cdsbase.CDSBase(dbconn)
Bases: DataSet
Provides a Base class from which subclasses can inherit to download CDS hosted data
- addRequest(name, requestdict, priority=0)
- cdsalias = None
- columns = [Column('id', Integer(), table=None, primary_key=True, nullable=False), Column('name', String(), table=None), Column('lastupdate', DateTime(), table=None), Column('tstart', DateTime(), table=None), Column('tend', DateTime(), table=None), Column('uri', String(), table=None), Column('data', JSON(), table=None), Column('geom', Geography(geometry_type='POLYGON', srid=4326, dimension=2, from_text='ST_GeogFromText', name='geography'), table=None)]
- description = 'CDS subset downloaded from cds.climate.copernicus.eu'
- dformat = 'unarchived'
- getDefaultDict(geomshape=None)
- metaExtractor(uri)
Implement this function in a derived class
- oformat = 'netcdf'
- productType = None
- pull(maxreq=10)
Pulls the necessary data from the online resource
- register()
Register the downloaded dataset in the database
- reqdicts = {}
- res = 0.0
- resource = None
- resumejobs = False
- schema = 'cds'
- variables = []
geoslurp.dataset.dataSetBase module
- class geoslurp.dataset.dataSetBase.DataSet(dbcon)
Bases: ABC
Abstract Base class which holds a dataset (corresponding to a database table)
- addEntry(metadict)
- bulkInsert(dictlist)
Insert a list of dicts in bulk mode
- cacheDir(subdirs=None)
returns the cache directory of this schema and dataset
- commitCounter = 0
- commitperN = 500
- createTable(cols=None, session=None)
Dynamically creates a table (when it does not exist) from a list of columns
- dataDir(subdirs=None)
Returns the specialized data directory of this schema and dataset. The directory will be created if it does not exist.
- db = None
- dropTable()
- entryNeedsUpdate(likestr, lastmod, col=None)
Queries for a column in the table based on a like string and deletes the entry when it is older than lastmod
- export(outputfile)
Export the table to a different format
- halt()
can be overridden to properly clean up an aborted operation
- isExpired()
Checks whether the table data is expired relative to the update frequency (updatefreq)
- migrate(version)
Properly migrate a table between software versions (note this function is supposed to be overridden in a derived class)
- abstract pull()
Pulls the necessary data from the online resource
- purgecache(filter='*')
Deletes the cache directory of the dataset, optionally applying a directory/filename filter
- purgedata(filter='*')
Deletes the data directory of the dataset, optionally applying a directory/filename filter
- purgeentry()
Delete dataset entry in the database
- abstract register()
Register the downloaded dataset in the database
- retainnewUris(urilist)
Filters those uris which have table entries which are too old or are not present in the database
- schema = 'public'
- setCacheDir(cdir)
- setDataDir(ddir)
- classmethod stname()
- stripuri = False
- table = None
- classmethod tname()
- truncateTable()
Truncate all entries in a table
- updateInvent(updateTime=True)
- updatefreq = None
- upsertEntry(metadict, index_elements)
- uriNeedsUpdate(urilikestr, lastmod)
Queries for a URI in the table based on a like string and deletes the entry when it is older than lastmod
- version = (0, 0, 0)
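The behaviour described for uriNeedsUpdate (look up an entry with a like string and delete it when it is older than lastmod) can be sketched with the standard library's sqlite3 module. This is an illustration only: the entries table, its columns, the ISO-formatted lastupdate strings and the True/False return value are all assumptions, and the documented class works against its own database connection rather than SQLite.

```python
import sqlite3
from datetime import datetime


def uri_needs_update(conn, urilikestr, lastmod):
    """Return True when the URI has no entry or only a stale one (which is deleted)."""
    row = conn.execute(
        "SELECT id, lastupdate FROM entries WHERE uri LIKE ?",
        (f"%{urilikestr}%",),
    ).fetchone()
    if row is None:
        return True  # nothing registered yet
    entry_id, lastupdate = row
    if datetime.fromisoformat(lastupdate) < lastmod:
        # The existing entry is older than the source: remove it so it can be re-registered.
        conn.execute("DELETE FROM entries WHERE id = ?", (entry_id,))
        return True
    return False


# Hypothetical in-memory table standing in for a dataset table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (id INTEGER PRIMARY KEY, uri TEXT, lastupdate TEXT)")
conn.execute(
    "INSERT INTO entries (uri, lastupdate) VALUES (?, ?)",
    ("/data/a.nc", "2020-01-01T00:00:00"),
)
stale = uri_needs_update(conn, "a.nc", datetime(2021, 1, 1))
```

A caller would typically re-register the file only when the function reports that an update is needed.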
- geoslurp.dataset.dataSetBase.rmfilterdir(ddir, filter='*')
Remove directories and files based on a certain regex filter
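Because pull() and register() are abstract, a concrete dataset has to implement both before it can be instantiated. The sketch below shows that pattern with a standalone stand-in class; DataSetSketch and MyDataset are hypothetical names, and the real DataSet class provides many more members than shown here.

```python
from abc import ABC, abstractmethod


class DataSetSketch(ABC):
    """Stand-in for the documented DataSet base class (not the real implementation)."""

    schema = "public"
    version = (0, 0, 0)

    def __init__(self, dbconn):
        self.db = dbconn

    @abstractmethod
    def pull(self):
        """Pull the necessary data from the online resource."""

    @abstractmethod
    def register(self):
        """Register the downloaded dataset in the database."""


class MyDataset(DataSetSketch):
    """Hypothetical derived dataset which overrides both abstract methods."""

    schema = "demo"

    def pull(self):
        # A real dataset would download files here; this only records a fake URI.
        self.downloaded = ["file1.txt"]

    def register(self):
        # A real dataset would insert rows into its table; this returns them instead.
        return [{"uri": uri} for uri in self.downloaded]


ds = MyDataset(dbconn=None)
ds.pull()
entries = ds.register()
```

Attempting to instantiate the base class directly raises a TypeError, which is how Python's abc module enforces that derived classes supply pull() and register().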
geoslurp.dataset.datasetgeneric module
geoslurp.dataset.era5 module
- class geoslurp.dataset.era5.ERA5Base(dbconn)
Bases: CDSBase
Provides a Base class from which subclasses can inherit to download a subset of the data per area
- appendRequest(name, geomshape)
Builds a dictionary for the cdsapi.
:param geomshape: (shapely geometry) geometry which will be used to compute the bounding box to download data for
- metaExtractor(ncuri)
Implement this function in a derived class
- productType = 'monthly_averaged_reanalysis'
- resource = 'reanalysis-era5-pressure-levels-monthly-means'
- scheme = 'atmo'
- time = '00:00'
- variables = []
- yrend = 2000
- yrstart = 2000
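To make the idea behind appendRequest more concrete, the following sketch builds a request dictionary from a bounding box. It is not the geoslurp code: build_request and bbox_to_area are hypothetical helpers, the dictionary keys are assumptions about what a CDS request may look like, and the bounds tuple is assumed to have the (minx, miny, maxx, maxy) layout that a shapely geometry's bounds attribute returns. The area is ordered North, West, South, East.

```python
def bbox_to_area(bounds):
    """Convert (minx, miny, maxx, maxy) into [North, West, South, East]."""
    west, south, east, north = bounds
    return [north, west, south, east]


def build_request(bounds, variables, yrstart=2000, yrend=2000, time="00:00",
                  product_type="monthly_averaged_reanalysis"):
    """Assemble a request dictionary (keys are assumptions, see lead-in)."""
    return {
        "product_type": product_type,
        "variable": list(variables),
        "year": [str(year) for year in range(yrstart, yrend + 1)],
        "month": [f"{month:02d}" for month in range(1, 13)],
        "time": time,
        "area": bbox_to_area(bounds),
        "format": "netcdf",
    }


# Hypothetical bounding box: 10W to 20E, 30N to 60N.
request = build_request((-10.0, 30.0, 20.0, 60.0), ["temperature"], 2000, 2001)
```

The defaults mirror the class attributes listed above (productType, time, yrstart, yrend), so a derived class would mainly need to supply the variables and a geometry.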
geoslurp.dataset.motuGridsBase module
- class geoslurp.dataset.motuGridsBase.MotuGridsBase(dbconn)
Bases: RasterBase
Downloads and registers subsets of gridded data with the motu client
- authalias = None
- auxcolumns = [Column('lastupdate', TIMESTAMP(), table=None), Column('time', ARRAY(TIMESTAMP()), table=None)]
- motuproduct = None
- moturoot = None
- motuservice = None
- outofdb = False
- pull(name=None, wsne=None, tstart=None, tend=None)
Pulls a subset of a gridded dataset as netcdf from a motu-enabled server. This routine calls the internal routines of the motuclient python client.
:param name: Name of the output dataset (file will be named ‘name.nc’)
:param wsne: bounding box of the section of interest as [West,South,North,East]
:param tstart: start date (as yyyy-mm-dd) for the extraction
:param tend: end date (as yyyy-mm-dd) for the extraction
- rastExtract(uri)
Extracts raster and other meta info from the downloaded files
- rastregex = '\\.nc$'
- updated = []
- variables = None
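The arguments of pull() can be checked and unpacked before a request is made. The sketch below shows one possible way to do so; subset_params and the keys of the returned dictionary are hypothetical and do not correspond to motuclient option names. Only the [West,South,North,East] ordering, the yyyy-mm-dd date format and the 'name.nc' output file name are taken from the description above.

```python
from datetime import date


def subset_params(name, wsne, tstart, tend):
    """Validate pull() style arguments and return them as a plain dictionary."""
    west, south, north, east = wsne  # order as documented: [West, South, North, East]
    if south > north:
        raise ValueError("south must not exceed north")
    start = date.fromisoformat(tstart)  # expects yyyy-mm-dd
    end = date.fromisoformat(tend)
    if end < start:
        raise ValueError("tend must not be earlier than tstart")
    return {
        "output": f"{name}.nc",
        "lon_min": west,
        "lon_max": east,
        "lat_min": south,
        "lat_max": north,
        "date_min": start.isoformat(),
        "date_max": end.isoformat(),
    }


# Hypothetical subset covering the North Sea for January 2020.
params = subset_params("north_sea", [-5, 50, 60, 10], "2020-01-01", "2020-01-31")
```

Longitudes are not compared here because a bounding box that crosses the date line may legitimately have west greater than east.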
geoslurp.dataset.pandasbase module
- class geoslurp.dataset.pandasbase.PandasBase(dbconn)
Bases: DataSet
Base class which reads in a pandas compatible table (CSV, Excel, or in-memory dataframe are currently supported) and registers it in a db table
- columnsFromDataframe(df)
Returns a list of columns from a dataframe
- encoding = None
- ftype = 'csv'
- geoinfo = (4326, 'geom', 'GEOMETRY', 2, 'rast')
- inbulk = False
- modify_df(df)
A derived type can overload this to make modifications to the dataframe before registering it in the database
- outdbArchiveName()
- pdfile = None
- pull()
overload when needed
- register(df=None)
Update/populate a database table from a pandas compatible file
- registerInDatabase(df)
- setGeoInfo(df)
Try to extract srid, geometry type from a geopandas geodataframe
- xrappend_dim = None
geoslurp.dataset.xarraybase module
- class geoslurp.dataset.xarraybase.XarrayBase(dbconn)
Bases: DataSet
Base class which allows writing an xarray dataarray/dataset into a db table
- columnsFromXar(ds)
Returns a groupby object and a list of columns from an xarray object (grouped by)
- groupby = None
- inbulk = False
- outdbArchiveName()
- outofdb = False
- pull()
overload when needed
- register(ds=None)
Update/populate a database table from an xarray compatible file or from a dataset directly
- registerInDatabase(ds)
- timename = 'time'
- writeoutofdb = True
- xarfile = None