argopy.stores.httpstore_erddap_auth.open_mfjson

httpstore_erddap_auth.open_mfjson(urls, max_workers: int = 6, method: str = 'thread', progress: bool | str = False, preprocess=None, preprocess_opts={}, open_json_opts={}, url_follow=False, errors: str = 'ignore', *args, **kwargs)
Download and process a collection of JSON documents from urls.

This is a version of the httpstore.open_json method that is able to handle a list of urls, sequentially or in parallel. This method uses a concurrent.futures.ThreadPoolExecutor by default for parallelization. See the method parameter below for more options.
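For instance, a minimal usage sketch (the store construction and URLs below are hypothetical; an authenticated ERDDAP store may require credentials or other options):

>>> from argopy.stores import httpstore_erddap_auth
>>> fs = httpstore_erddap_auth()  # hypothetical: options depend on the ERDDAP server setup
>>> urls = ["https://erddap.example.org/doc1.json",  # hypothetical URLs
...         "https://erddap.example.org/doc2.json"]
>>> results = fs.open_mfjson(urls, progress=True)  # one processed document per url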
Parameters:

max_workers (int) – Maximum number of threads or processes.
method (str, default: 'thread') – Define the parallelization method (see the usage sketch after the parameter list):

- 'thread' (default): based on concurrent.futures.ThreadPoolExecutor with a pool of at most max_workers threads
- 'process': based on concurrent.futures.ProcessPoolExecutor with a pool of at most max_workers processes
- distributed.client.Client: use a Dask client
- 'sequential' / 'seq': open data sequentially in a simple loop, no parallelization applied
progress (bool, default: False) – Display a progress bar if possible.
preprocess (collections.abc.Callable, optional) – If provided, call this function on each dataset prior to concatenation.

preprocess_opts (dict, optional) – Options passed to the preprocess callable, if any.

url_follow (bool, default: False) – Pass the URL to the preprocess method as the url argument.
errors (str, default: 'ignore') – Define how to handle errors raised during data URIs fetching:

- 'ignore' (default): Do not stop processing, simply issue a debug message in the logging console
- 'raise': Raise any error encountered
- 'silent': Do not stop processing and do not issue any log message
Return type:

list
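As a sketch of the different parallelization and error-handling options (reusing the hypothetical fs store and urls list from above; passing a Dask Client instance as method is an assumption based on the option list):

>>> results = fs.open_mfjson(urls, method='process', max_workers=4, errors='silent')
>>> results = fs.open_mfjson(urls, method='sequential', errors='raise')
>>> from distributed import Client
>>> results = fs.open_mfjson(urls, method=Client())  # assumption: the Client instance itself is passed as method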
Notes
For the distributed.client.Client and concurrent.futures.ProcessPoolExecutor methods to work appropriately, the pre-processing collections.abc.Callable must be serializable. This can be checked with:

>>> from distributed.protocol import serialize
>>> from distributed.protocol.serialize import ToPickle
>>> serialize(ToPickle(preprocess_function))
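Concretely, a preprocess callable intended for the process or Dask methods is best defined at module top level so that it serializes cleanly. The function below is a hypothetical example (the exact structure handed to preprocess depends on the ERDDAP response):

>>> def keep_rows(js):  # hypothetical preprocess, assuming js is the decoded JSON document
...     return js.get('table', {}).get('rows', [])
>>> serialize(ToPickle(keep_rows))  # raises if keep_rows cannot be serialized
>>> results = fs.open_mfjson(urls, preprocess=keep_rows, method='process')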