argopy.stores.httpstore_erddap_auth.open_mfjson

httpstore_erddap_auth.open_mfjson(urls, max_workers: int = 6, method: str = 'thread', progress: bool | str = False, preprocess=None, preprocess_opts={}, open_json_opts={}, url_follow=False, errors: str = 'ignore', *args, **kwargs)

Download and process a collection of JSON documents from a list of URLs

This is a version of the httpstore.open_json method that can handle a list of URLs, fetched sequentially or in parallel.

This method uses a concurrent.futures.ThreadPoolExecutor by default for parallelization. See method parameters below for more options.

Parameters:
  • urls (list(str)) – List of URLs pointing to JSON documents to fetch.

  • max_workers (int) – Maximum number of threads or processes.

  • method (str, default: thread) –

    Define the parallelization method:
    • thread: use a concurrent.futures.ThreadPoolExecutor

    • process: use a concurrent.futures.ProcessPoolExecutor

  • progress (bool, default: False) – Display a progress bar if possible

  • preprocess (collections.abc.Callable (optional)) – If provided, call this function on each JSON document before it is added to the result list

  • preprocess_opts (dict (optional)) – Options passed to the preprocess collections.abc.Callable, if any.

  • url_follow (bool, default: False) – Pass each URL to the preprocess method as the url keyword argument.

  • errors (str, default: ignore) –

    Define how to handle errors raised during data URIs fetching:
    • ignore (default): Do not stop processing, simply issue a debug message in logging console

    • raise: Raise any error encountered

    • silent: Do not stop processing and do not issue log message

Return type:

list
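A minimal sketch of the call pattern follows. The ERDDAP payload structure and the field names used in the preprocess callable are assumptions for illustration only, not the actual argopy response format:

```python
# Hypothetical preprocess callable: reduce each fetched JSON document
# to its row count. The {"table": {"rows": [...]}} layout below is an
# assumed ERDDAP-like structure, used here only for illustration.
def count_rows(data):
    return len(data.get("table", {}).get("rows", []))

# With an authenticated store instance, the call would look like
# (not executed here, requires credentials and network access):
# store = httpstore_erddap_auth(...)
# results = store.open_mfjson(urls, preprocess=count_rows, errors="raise")

# Local demonstration of the preprocess step on a sample document:
sample = {"table": {"rows": [[1, 2], [3, 4], [5, 6]]}}
print(count_rows(sample))  # 3
```

Because errors='ignore' is the default, a failing URL is silently skipped and logged; pass errors='raise' while developing a preprocess callable so problems surface immediately.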

Notes

For the distributed.client.Client and concurrent.futures.ProcessPoolExecutor to work appropriately, the pre-processing collections.abc.Callable must be serializable. This can be checked with:

>>> from distributed.protocol import serialize
>>> from distributed.protocol.serialize import ToPickle
>>> serialize(ToPickle(preprocess_function))
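When distributed is not installed, a plain pickle round-trip is a quick local proxy for the check above (assumption: the callable must at least survive standard pickling to be shipped to worker processes). Module-level functions pass; lambdas and locally defined closures typically do not:

```python
import pickle

# Module-level functions pickle by reference, so they can be
# dispatched to ProcessPoolExecutor workers.
def my_preprocess(data, factor=1):
    return len(data) * factor

# Round-trip check: serialize, then restore and call the function.
restored = pickle.loads(pickle.dumps(my_preprocess))
print(restored({"a": 1, "b": 2}, factor=2))  # 4

# Lambdas fail the same check, which is why they should be avoided
# as preprocess callables with process-based parallelization:
try:
    pickle.dumps(lambda data: len(data))
    serializable = True
except (pickle.PicklingError, AttributeError):
    serializable = False
print(serializable)  # False
```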