geoslurp.datapull package
Submodules
geoslurp.datapull.cds module
- class geoslurp.datapull.cds.Cds(resource, jobqueue={}, auth=None)
Bases: object
- clearRequests(removestates=['downloaded', 'unavailable', 'failed'])
Clears requests whose state is listed in removestates and updates the job queue
- downloadQueue(sleep=30)
- loadRequests()
Load previous requests from job queue
- queueRequest(fout, requestDict)
geoslurp.datapull.crawler module
- class geoslurp.datapull.crawler.CrawlerBase(url)
Bases: ABC
- parallelDownload(outdir, check=False, maxconn=8, gzip=False, continueonError=False)
Download uris in parallel
:param outdir: directory to download to
:param check: only download when newer or non-existent (defaults to False)
:param maxconn: number of parallel downloads to execute
:param continueonError (bool): keep going when a download fails
- rooturl = None
- abstract uris()
Generator which returns URIs to the requested datasets
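The pattern described above (an abstract uris() generator plus a download loop limited to maxconn workers) can be sketched with the standard library alone. This is an illustration only, not geoslurp's implementation: SketchUri, SketchCrawlerBase, ListCrawler and parallel_download are hypothetical names, and the download step just builds a target path instead of fetching anything.

```python
from abc import ABC, abstractmethod
from concurrent.futures import ThreadPoolExecutor


class SketchUri:
    """Hypothetical stand-in for a URI object with a download() method."""

    def __init__(self, url):
        self.url = url

    def download(self, outdir, check=False):
        # A real implementation would fetch self.url; here we only build the target path.
        return f"{outdir}/{self.url.rsplit('/', 1)[-1]}"


class SketchCrawlerBase(ABC):
    """Hypothetical base class: subclasses only have to implement uris()."""

    def __init__(self, url):
        self.rooturl = url

    @abstractmethod
    def uris(self):
        """Yield URI objects for the requested datasets."""

    def parallel_download(self, outdir, check=False, maxconn=8):
        # Run at most maxconn downloads at the same time; results keep the input order.
        with ThreadPoolExecutor(max_workers=maxconn) as pool:
            return list(pool.map(lambda u: u.download(outdir, check=check), self.uris()))


class ListCrawler(SketchCrawlerBase):
    """Hypothetical subclass that serves a fixed list of file names."""

    def __init__(self, url, names):
        super().__init__(url)
        self.names = names

    def uris(self):
        for name in self.names:
            yield SketchUri(f"{self.rooturl}/{name}")


crawler = ListCrawler("https://example.org/data", ["a.nc", "b.nc"])
downloaded = crawler.parallel_download("/tmp/out", maxconn=2)
print(downloaded)
```

Because uris() is abstract, the base class itself cannot be instantiated; each protocol-specific crawler supplies only its own listing logic.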
geoslurp.datapull.ftp module
- class geoslurp.datapull.ftp.Crawler(url, pattern='.*', followpattern='.*', auth=None)
Bases: CrawlerBase
Crawler for ftp directories
- ls(subdirs='')
List directories and files (generator)
- uris(check=False, subdirs='')
Generate a list of files in a directory and return a list of URIs
geoslurp.datapull.geodesyunr module
- class geoslurp.datapull.geodesyunr.Crawler(catalogfile)
Bases: CrawlerBase
Crawl the gps tenv3 data on geodesy.unr.edu
- uris(refresh=True)
List uris of available gps final data in tenv3 format
geoslurp.datapull.github module
- class geoslurp.datapull.github.Crawler(reponame, commitsha=None, filter=<geoslurp.datapull.github.GithubFilter object>, followfilt=<geoslurp.datapull.github.GithubFilter object>, oauthtoken=None)
Bases: CrawlerBase
Crawls a github repository fixed to a certain commit
- getSubTree(url)
- treeitems(rootelem=None, depth=10, dirpath=None)
Generator which recursively lists all elements in a git tree
- uris(depth=10)
Construct Uris from tree nodes
- class geoslurp.datapull.github.GithubFilter(regexdict={'type': 'blob'})
Bases: object
Filter used for testing a certain dict element
- isValid(elem)
Returns True if all of the regex criteria match the elem
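A minimal sketch of a filter of this kind, assuming that each key of regexdict names an entry of the tested dict and that its value is a regular expression. RegexDictFilter and is_valid are hypothetical names, and the use of re.search (rather than a full match) and the treatment of missing keys are assumptions, not documented geoslurp behaviour.

```python
import re


class RegexDictFilter:
    """Hypothetical filter: every regex must match the same-named entry of a dict."""

    def __init__(self, regexdict=None):
        if regexdict is None:
            regexdict = {"type": "blob"}
        self._compiled = {key: re.compile(rx) for key, rx in regexdict.items()}

    def is_valid(self, elem):
        # Assumption: a missing key counts as a failed criterion.
        return all(
            key in elem and rx.search(str(elem[key])) is not None
            for key, rx in self._compiled.items()
        )


nc_files = RegexDictFilter({"type": "blob", "path": r"\.nc$"})
print(nc_files.is_valid({"type": "blob", "path": "data/grid.nc"}))  # True
print(nc_files.is_valid({"type": "tree", "path": "data"}))          # False
```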
- geoslurp.datapull.github.cachedGithubCatalogue(reponame, cachedir='.', commitsha=None, gfilter=<geoslurp.datapull.github.GithubFilter object>, gfollowfilter=<geoslurp.datapull.github.GithubFilter object>, depth=2, ghtoken=None)
Caches the result of a github catalogue query for later reuse
geoslurp.datapull.http module
geoslurp.datapull.motu module
- class geoslurp.datapull.motu.MotuOpts(moturoot, service, product, auth, btdbox, fout, cache, variables=None)
Bases: object
A class which mimics the options from argparse as used by the motuclient command line program
- auth_mode = 'cas'
- block_size = 12001
- btdbox = <geoslurp.tools.Bounds.BtdBox object>
- cache = '.'
- console_mode = False
- date_max = '9999-12-31 23:59:59'
- date_min = '1-01-01 00:00:00'
- depth_max = None
- depth_min = None
- describe = False
- extraction_geographic = True
- extraction_vertical = False
- fullname()
- latitude_max = None
- latitude_min = None
- longitude_max = None
- longitude_min = None
- motu = None
- out_dir = '.'
- out_name = 'dataset.nc'
- outputWritten = 'netcdf'
- product_id = None
- proxy_server = None
- pwd = None
- service_id = None
- size = False
- socket_timeout = 515
- sync = False
- syncbtdbox(bbox=None)
Sets the internal btdbox and synchronizes the corresponding motu variables
- syncfilename(fout)
- user = None
- user_agent = 'motu-api-client'
- variable = None
- class geoslurp.datapull.motu.MotuRecursive(mopts, keepfiles=False)
Bases: object
Class which recursively downloads netcdf files within the 1GB limit using motu and patches them together
- download()
Download file
- keepfiles = False
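The idea of staying within a size limit by recursive subdivision can be sketched as follows. This is an assumption-laden illustration: Box, plan_requests and estimate are hypothetical, the size estimate is a made-up function, and whether geoslurp splits in space, in time, or both is not stated in this reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical bounding box in degrees."""
    west: float
    east: float
    south: float
    north: float

    def split(self):
        # Halve the box along its longer side.
        if (self.east - self.west) >= (self.north - self.south):
            mid = (self.west + self.east) / 2
            return (Box(self.west, mid, self.south, self.north),
                    Box(mid, self.east, self.south, self.north))
        mid = (self.south + self.north) / 2
        return (Box(self.west, self.east, self.south, mid),
                Box(self.west, self.east, mid, self.north))


def plan_requests(box, size_kb, max_kb):
    """Return sub-boxes whose estimated request size is at most max_kb."""
    if size_kb(box) <= max_kb:
        return [box]
    first, second = box.split()
    return plan_requests(first, size_kb, max_kb) + plan_requests(second, size_kb, max_kb)


def estimate(box):
    # Made-up size model: 1 kB per square degree.
    return (box.east - box.west) * (box.north - box.south)


tiles = plan_requests(Box(0, 40, 0, 20), estimate, max_kb=250)
print(len(tiles))  # 4
```

Each resulting tile would be requested separately and the pieces merged afterwards; the merge step is omitted here. The recursion only terminates if the size estimate shrinks as the box shrinks.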
- class geoslurp.datapull.motu.Uri(Mopts)
Bases: UriBase
- download(direc, check=False, gzip=False, outfile=None)
Download file into directory and possibly check the modification time
:param check: check whether the file needs updating
:param gzip: additionally gzips the file (adds .gz to file name)
:param continueonError (bool): don't raise an exception when a download error occurs
- info = False
- kbsize = 0
- maxbtdbox = <geoslurp.tools.Bounds.BtdBox object>
- maxkbsize = 0
- requestInfo()
Request info (modification time, size, datacoverage) on this specific query from the server
- updateModTime()
Requests data description from the motu service
- updateSize()
Request information about the size of the query
geoslurp.datapull.rsync module
- class geoslurp.datapull.rsync.Crawler(url, auth)
Bases: CrawlerBase
Crawler wrapper around the rsync program; calls the Linux rsync utility
- ls()
List remote content (using a dry run)
- parallelDownload(outdir, check=False, includes=None, dryrun=False)
Download uris in parallel
:param outdir: directory to download to
:param check: only download when newer or non-existent (defaults to False)
:param maxconn: number of parallel downloads to execute
:param continueOnError (bool): keep going when a download fails
- startrsync(cmd)
Starts rsync and returns the list of files as a generator
- uris()
Generator which returns URIs to the requested datasets
geoslurp.datapull.sftp module
geoslurp.datapull.thredds module
- class geoslurp.datapull.thredds.Crawler(catalogurl, filter=<geoslurp.datapull.thredds.ThreddsFilter object>, followfilter=<geoslurp.datapull.thredds.ThreddsFilter object>, auth=None)
Bases: CrawlerBase
A class to work with an Opendap server
- static getCatalog(url, auth=None)
Retrieve a catalogue
- static getServices(catalog, rooturl, depth=2)
Retrieves the root url for serving files over http from a catalogue
- setResumePoint(filter, followfilt=None)
Sets the filters after which the normal filters will be applied.
- unsetResumePoint()
Unset resume point
- uris(depth=10)
Generates a list of thredds URIs (makes use of xmlitems())
- xmlitems(xmlcatalog=None, url=None, depth=10)
Generator which returns xml nodes which obey a certain filter. Nodes which obey the followFilter will be recursively searched.
- class geoslurp.datapull.thredds.ThreddsFilter(xmltyp='*', attr=None, regex=None)
Bases: object
Helper class to aid traversing opendap xml elements
- AND(xmltyp, attr=None, regex=None)
Provides a method for chaining AND filters
- OR(xmltyp, attr=None, regex=None)
Provides a method for chaining OR filters
- isCatalog()
Check if the filter type is a catalogRef
- isValid(xmlelem)
Filter xmlelem on attributes
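How chained OR/AND filters on XML elements might behave can be sketched with xml.etree.ElementTree. ChainFilter below is a hypothetical class, not geoslurp's ThreddsFilter: the left-to-right evaluation without operator precedence, the namespace stripping, and the use of re.search are all assumptions made for the illustration.

```python
import re
import xml.etree.ElementTree as ET


class ChainFilter:
    """Hypothetical chainable filter for XML elements."""

    def __init__(self, xmltyp="*", attr=None, regex=None):
        # Each entry is (operator, tag name, attribute name, regex); the first has no operator.
        self._tests = [(None, xmltyp, attr, regex)]

    def OR(self, xmltyp, attr=None, regex=None):
        self._tests.append(("or", xmltyp, attr, regex))
        return self  # returning self allows chaining

    def AND(self, xmltyp, attr=None, regex=None):
        self._tests.append(("and", xmltyp, attr, regex))
        return self

    @staticmethod
    def _matches(elem, xmltyp, attr, regex):
        tag = elem.tag.rsplit("}", 1)[-1]  # drop a leading {namespace}
        if xmltyp != "*" and tag != xmltyp:
            return False
        if attr is None:
            return True
        value = elem.get(attr)
        if value is None:
            return False
        return regex is None or re.search(regex, value) is not None

    def is_valid(self, elem):
        # Evaluate strictly left to right, without operator precedence.
        result = False
        for op, xmltyp, attr, regex in self._tests:
            hit = self._matches(elem, xmltyp, attr, regex)
            if op is None:
                result = hit
            elif op == "or":
                result = result or hit
            else:
                result = result and hit
        return result


catalog = ET.fromstring(
    '<catalog xmlns="urn:example">'
    '<dataset name="sst_2020.nc"/>'
    '<dataset name="readme.txt"/>'
    '<catalogRef name="archive"/>'
    '</catalog>'
)
flt = ChainFilter("dataset", attr="name", regex=r"\.nc$").OR("catalogRef")
selected = [e.get("name") for e in catalog if flt.is_valid(e)]
print(selected)  # ['sst_2020.nc', 'archive']
```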
- class geoslurp.datapull.thredds.Uri(dataxml, services, auth=None)
Bases: UriBase
Thredds URI class
- opendap = None
- suburl = None
- geoslurp.datapull.thredds.getAttrib(xml, regex)
Search in xml attributes based on a regex
- geoslurp.datapull.thredds.getDate(xml)
Extracts the date from a dataset element
- geoslurp.datapull.thredds.getTagEnding(xml)
Strip the leading junk ({…}) from a tag
- geoslurp.datapull.thredds.gethref(input)
Small function to extract an href link from a dictionary
geoslurp.datapull.uri module
- class geoslurp.datapull.uri.UriBase(url, lastmod=None, auth=None, subdirs='', headers=None, cookiefile=None, checkssl=True)
Bases: object
Base class to store uri resource
- auth = None
- buffer()
Download file into a buffer (default uses curl)
- download(direc, check=False, gzip=False, gunzip=False, outfile=None, continueonError=False, restdict=None)
Download file into directory and possibly check the modification time
:param check: check whether the file needs updating
:param gzip: additionally gzips the file (adds .gz to file name)
:param continueonError (bool): don't raise an exception when a download error occurs
- headers = None
- lastmod = None
- subdirs = ''
- updateModTime()
Tries to retrieve the last modification time of a file. Note: this is often not supported by the server.
- url = None
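The check option above ("check whether the file needs updating") amounts to comparing a remote modification time with the local file. A minimal sketch of such a test follows; needs_download is a hypothetical helper, and downloading when the remote time is unknown is an assumption rather than documented behaviour.

```python
import os
import tempfile
from datetime import datetime


def needs_download(target, remote_lastmod):
    """Return True when target is missing or older than the remote copy."""
    if not os.path.exists(target):
        return True
    if remote_lastmod is None:
        # Assumption: without a remote timestamp, download to be safe.
        return True
    local_lastmod = datetime.fromtimestamp(os.path.getmtime(target))
    return remote_lastmod > local_lastmod


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "grid.nc")
    missing = needs_download(target, datetime(2020, 1, 1))

    # Create the file and pretend it was last modified in 2015.
    open(target, "w").close()
    stamp = datetime(2015, 1, 1).timestamp()
    os.utime(target, (stamp, stamp))

    outdated = needs_download(target, datetime(2020, 1, 1))
    up_to_date = not needs_download(target, datetime(2010, 1, 1))

print(missing, outdated, up_to_date)  # True True True
```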
- class geoslurp.datapull.uri.UriFile(url, lastmod=None)
Bases: UriBase
- buffer()
Download file into a buffer (default uses curl)
- updateModTime()
Tries to retrieve the last modification time of a file. Note: this is often not supported by the server.
- geoslurp.datapull.uri.curlDownload(url, fileorfid, mtime=None, gzip=False, gunzip=False, auth=None, restdict=None, headers=None, customRequest=None, upfid=None, cookiefile=None, checkssl=True)
Download the content of a url to an open file or buffer using pycurl
:param url: url to download from
:param fileorfid: filename or open file or buffer
:param mtime: explicitly set the modification time to this (useful when modification times are not supported by the server)
:param gzip: additionally gzip the file on disk (note: this routine does not append .gz to the file name)
:param gunzip: automatically gunzip the downloaded file
:param auth: supply authentication data (user and passw)
:param restdict: a set of (REST) API name-value pairs to be added to the url (provide as a dict)
:param headers (array of header values): additionally set header elements
:param customRequest: set a custom request (e.g. for WEBDAV servers)
:return: modification time of remote file
- geoslurp.datapull.uri.findFiles(dir, pattern, since=None)
Generator to recursively search a directory
- geoslurp.datapull.uri.setFtime(file, modTime=None)
Change the modification and access time of a file
- geoslurp.datapull.uri.timeFromStamp(stamp)
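Helpers of the kind listed above (a recursive file search and a routine that sets file times) can be approximated with os.walk and os.utime. The sketch below is not geoslurp's code: find_files and set_ftime are hypothetical names, and treating pattern as a regular expression on the file name and since as a datetime lower bound on the modification time are assumptions.

```python
import os
import re
import tempfile
from datetime import datetime


def find_files(root, pattern, since=None):
    """Yield paths below root whose file name matches the regex pattern."""
    rx = re.compile(pattern)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not rx.search(name):
                continue
            path = os.path.join(dirpath, name)
            # Skip files last modified before 'since' (when given).
            if since is not None and datetime.fromtimestamp(os.path.getmtime(path)) < since:
                continue
            yield path


def set_ftime(path, mod_time=None):
    """Set access and modification time; None means the current time."""
    if mod_time is None:
        os.utime(path, None)
    else:
        stamp = mod_time.timestamp()
        os.utime(path, (stamp, stamp))


with tempfile.TemporaryDirectory() as tmp:
    old = os.path.join(tmp, "old.nc")
    new = os.path.join(tmp, "sub", "new.nc")
    os.makedirs(os.path.dirname(new))
    for path in (old, new):
        open(path, "w").close()
    set_ftime(old, datetime(2000, 1, 1))
    set_ftime(new, datetime(2020, 1, 1))
    recent = [os.path.basename(p) for p in find_files(tmp, r"\.nc$", since=datetime(2010, 1, 1))]
    everything = sorted(os.path.basename(p) for p in find_files(tmp, r"\.nc$"))

print(recent)  # ['new.nc']
```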
geoslurp.datapull.webdav module
- class geoslurp.datapull.webdav.Crawler(rooturl, pattern, auth, depth=1)
Bases: CrawlerBase
Webdav Crawler (list content of a directory)
- find(urlin, depth)
List files in a webdav directory and recursively do this for directories until the depth is exhausted
- pattern = None
- uris()
Generator which returns URIs to the requested datasets