Core API
fetchez.core
This module is the core of the Fetchez library. It handles the initialization of fetchers, connection pooling, threading, and the base FetchModule class.
- copyright: 2010-2026 Regents of the University of Colorado
- license: MIT, see LICENSE for more details.
- fetchez.core.fetches_callback(r)
Default callback for fetch processes. The argument r is a list: [url, local-fn, data-type, fetch-status-or-error-code].
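A custom callback can unpack the same four-element list. The sketch below is an assumption about usage rather than Fetchez's own implementation: `describe_result` is a hypothetical name, the sample values are illustrative, and whether Fetchez uses a callback's return value is not stated here.

```python
def describe_result(r):
    """Hypothetical callback: summarize one fetch result.

    Assumes r is [url, local-fn, data-type, fetch-status-or-error-code],
    as described for fetches_callback.
    """
    url, local_fn, data_type, status = r
    return f"{data_type}: {url} -> {local_fn} (status: {status})"


# Illustrative values only; real entries come from the fetch process.
summary = describe_result(["https://example.com/a.tif", "a.tif", "raster", 0])
print(summary)
```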
- fetchez.core.get_userpass(authenticator_url)
Retrieve username and password from netrc for a given URL.
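A netrc lookup of this kind can be sketched with Python's standard `netrc` module. This is a minimal illustration, not the library's code: `lookup_userpass` and its `netrc_path` argument are hypothetical, and the credentials are dummy values written to a temporary file.

```python
import netrc
import os
import tempfile
from urllib.parse import urlparse


def lookup_userpass(authenticator_url, netrc_path=None):
    """Return (username, password) for the URL's host, or (None, None)."""
    host = urlparse(authenticator_url).hostname
    try:
        # netrc_path=None makes the stdlib parser read ~/.netrc.
        auth = netrc.netrc(netrc_path).authenticators(host)
    except (FileNotFoundError, netrc.NetrcParseError):
        return None, None
    if auth is None:
        return None, None
    login, _account, password = auth
    return login, password


# Demonstrate with a throwaway netrc file rather than the real ~/.netrc.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "netrc")
    with open(path, "w") as f:
        f.write("machine urs.earthdata.nasa.gov login alice password s3cret\n")
    found = lookup_userpass("https://urs.earthdata.nasa.gov", path)
    missing = lookup_userpass("https://example.com", path)
```

Returning `(None, None)` for an unknown host leaves the caller free to fall back to prompting, which is the behavior described for `get_credentials`.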
- fetchez.core.get_credentials(url, authenticator_url='https://urs.earthdata.nasa.gov')
Get user credentials from .netrc, or prompt for input if none are found. Used for NASA Earthdata and similar services.
- class fetchez.core.iso_xml(url=None, xml=None, timeout=20, read_timeout=60)
Bases: object
Helper class for parsing ISO 19115 XML metadata.
- class fetchez.core.HttpFile(url, session=None, callback=None)
Bases: IOBase
A file-like object backed by an HTTP URL.
Translates read() calls into HTTP Range requests to fetch only needed bytes.
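The idea of turning `read()` calls into byte-range requests can be sketched without any network access. This is not the `HttpFile` implementation: `RangeReader` and `fetch_range` are hypothetical names, and the injected function stands in for an HTTP request carrying a `Range: bytes=start-end` header.

```python
import io


class RangeReader(io.IOBase):
    """Hypothetical file-like object that reads via byte-range requests."""

    def __init__(self, size, fetch_range):
        # fetch_range(start, end) must return bytes start..end inclusive,
        # mirroring an HTTP "Range: bytes=start-end" request.
        self._size = size
        self._fetch_range = fetch_range
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=0):
        # 0 = start of stream, 1 = current position, 2 = end of stream.
        base = {0: 0, 1: self._pos, 2: self._size}[whence]
        self._pos = max(0, base + offset)
        return self._pos

    def read(self, size=-1):
        if size is None or size < 0:
            size = self._size - self._pos
        end = min(self._pos + size, self._size) - 1
        if end < self._pos:
            return b""  # nothing left to read
        data = self._fetch_range(self._pos, end)
        self._pos += len(data)
        return data


payload = bytes(range(100))
requested = []


def fake_fetch(start, end):
    # Record the Range header value a real client would send.
    requested.append(f"bytes={start}-{end}")
    return payload[start:end + 1]


reader = RangeReader(len(payload), fake_fetch)
reader.seek(10)
chunk = reader.read(5)
```

Only the five requested bytes are fetched here, which is the point of the technique: a consumer that seeks around a large remote file never has to download all of it.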
- seek(offset, whence=0)
Change the stream position to the given byte offset.
- offset
The stream position, relative to ‘whence’.
- whence
The relative position to seek from.
The offset is interpreted relative to the position indicated by whence. Values for whence are:
os.SEEK_SET or 0 – start of stream (the default); offset should be zero or positive
os.SEEK_CUR or 1 – current stream position; offset may be negative
os.SEEK_END or 2 – end of stream; offset is usually negative
Return the new absolute position.
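The three whence values behave the same way on any seekable Python stream, so they can be illustrated with the standard library's `io.BytesIO` in place of an `HttpFile` (an assumption made here to avoid needing a live URL).

```python
import io
import os

stream = io.BytesIO(b"0123456789")

from_start = stream.seek(3, os.SEEK_SET)     # absolute position 3
from_current = stream.seek(-2, os.SEEK_CUR)  # two bytes back from 3
from_end = stream.seek(-4, os.SEEK_END)      # four bytes before the end
tail = stream.read()                         # everything after that point
```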
- class fetchez.core.Fetch(url, callback=<function fetches_callback>, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:146.0) Gecko/20100101 Firefox/146.0'}, verify=True, allow_redirects=True)
Bases: object
Fetch class for retrieving FTP/HTTP data files.
- Parameters:
- __init__(url, callback=<function fetches_callback>, headers={'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:146.0) Gecko/20100101 Firefox/146.0'}, verify=True, allow_redirects=True)
- fetch_req(method='GET', params=None, data=None, json=None, tries=5, timeout=30, read_timeout=120)
Fetch src_url and return the requests object (iterative retry).
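An iterative retry loop of the kind the `tries` parameter suggests might look like the sketch below. It is an illustration only: `fetch_with_retries`, `send`, and `delay` are hypothetical, and how `fetch_req` actually handles errors, delays, and its final return value is not documented here.

```python
import time


def fetch_with_retries(send, tries=5, delay=0.0):
    """Call send() up to `tries` times; return its result, or None on failure."""
    for attempt in range(1, tries + 1):
        try:
            return send()
        except ConnectionError:
            # Wait before the next attempt, but not after the last one.
            if attempt < tries:
                time.sleep(delay)
    return None


calls = {"count": 0}


def flaky_send():
    # Stand-in for a network request: fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "response"


result = fetch_with_retries(flaky_send, tries=5)
```

A loop is used rather than recursion so that the number of attempts is bounded by `tries` and the call stack does not grow with each failure.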
- fetchez.core.run_fetchez(modules, threads=3, global_hooks=None)
Run Fetchez in parallel with hooks.
mod.hooks: Run ONLY on entries belonging to ‘mod’.
global_hooks: Run on ALL entries combined.
- Parameters:
- modules (List[FetchModule])
- threads (int, default: 3)
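The split between per-module hooks and global hooks can be pictured with the sketch below. Everything in it is an assumption for illustration: modules are modeled as plain dictionaries and hooks as functions that take and return a list of entries, whereas real `FetchModule` objects and Fetchez hook interfaces may differ.

```python
from concurrent.futures import ThreadPoolExecutor


def run_with_hooks(modules, threads=3, global_hooks=None):
    """Hypothetical pipeline: per-module hooks first, then global hooks."""

    def process(mod):
        # Module hooks see ONLY the entries that belong to this module.
        entries = list(mod["entries"])
        for hook in mod.get("hooks", []):
            entries = hook(entries)
        return entries

    # Process modules in parallel; map() keeps results in input order.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        per_module = list(pool.map(process, modules))

    # Global hooks see ALL entries combined.
    combined = [entry for entries in per_module for entry in entries]
    for hook in global_hooks or []:
        combined = hook(combined)
    return combined


def upper(entries):
    return [e.upper() for e in entries]


def reverse(entries):
    return list(reversed(entries))


demo_modules = [
    {"entries": ["a", "b"], "hooks": [upper]},
    {"entries": ["c"]},
]
demo_result = run_with_hooks(demo_modules, threads=2, global_hooks=[reverse])
```

In this toy run the `upper` hook touches only the first module's entries, while `reverse` reorders the combined list, which mirrors the distinction drawn between `mod.hooks` and `global_hooks`.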
- class fetchez.core.FetchModule(src_region=None, callback=<function fetches_callback>, hook=None, outdir=None, name='fetches', min_year=None, max_year=None, weight=1.0, uncertainty=0.0, params={}, **kwargs)
Bases: object
Base class for all fetch modules.
- __init__(src_region=None, callback=<function fetches_callback>, hook=None, outdir=None, name='fetches', min_year=None, max_year=None, weight=1.0, uncertainty=0.0, params={}, **kwargs)
- property hooks
Combine internal and external hooks in the correct execution order.
- class fetchez.core.HttpDataset(url=None, **kwargs)
Bases: FetchModule
Fetch an HTTP file directly.