argopy.stores.httpstore_erddap_auth.open_mfdataset

httpstore_erddap_auth.open_mfdataset(urls, max_workers: int = 6, method: str = 'thread', progress: bool | str = False, concat: bool = True, concat_dim: str = 'row', concat_method: Literal['drop', 'fill'] = 'drop', preprocess: Callable | None = None, preprocess_opts: dict = {}, open_dataset_opts: dict = {}, errors: Literal['ignore', 'raise', 'silent'] = 'ignore', compute_details: bool = False, *args, **kwargs) → Dataset | List[Dataset]

Download and process multiple URLs into a single xarray.Dataset or a collection of xarray.Dataset

This is a version of the httpstore.open_dataset method able to handle a list of URLs, sequentially or in parallel.

This method uses a concurrent.futures.ThreadPoolExecutor by default for parallelization. See the method parameter below for more options.
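For instance, a minimal sketch of a typical call, assuming fs is an already-instantiated httpstore_erddap_auth store and using hypothetical erddap file URLs:

>>> urls = [
...     "https://erddap.example.org/erddap/tabledap/data1.nc",  # hypothetical URL
...     "https://erddap.example.org/erddap/tabledap/data2.nc",  # hypothetical URL
... ]
>>> ds = fs.open_mfdataset(urls, max_workers=6, progress=True)
>>> # With concat=True (the default), `ds` is a single xarray.Dataset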

Parameters:
  • urls (list(str)) – List of URLs/paths to open

  • max_workers (int, default: 6) – Maximum number of threads or processes

  • method (str, default: thread) –

    Define the parallelization method:

    • thread: use a concurrent.futures.ThreadPoolExecutor

    • process: use a concurrent.futures.ProcessPoolExecutor (requires a serializable preprocess, see Notes)

    • a distributed.client.Client object can also be passed to distribute the work on a Dask cluster

  • progress (bool or str, default: False) – Display a progress bar

  • concat (bool, default: True) – Whether to concatenate results into a single xarray.Dataset; if False, the method returns a list of xarray.Dataset

  • concat_dim (str, default: row) – Name of the dimension to use to concatenate all datasets (passed to xarray.concat())

  • preprocess (collections.abc.Callable (optional)) – If provided, call this function on each dataset prior to concatenation (see the sketch after this parameter list)

  • preprocess_opts (dict (optional)) – Options passed to the preprocess collections.abc.Callable, if any.

  • errors (str, default: ignore) –

    Define how to handle errors raised while fetching data URIs:
    • ignore (default): Do not stop processing; simply log a debug message to the console

    • raise: Raise any error encountered

    • silent: Do not stop processing and do not log any message
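As an illustration of the preprocess and preprocess_opts pair, here is a hedged sketch; it assumes fs is an instantiated store, that preprocess_opts is passed to the callable as keyword arguments, and that each dataset carries a TEMP variable (a hypothetical name used only for this example):

>>> def keep_var(ds, varname="TEMP"):
...     # Reduce each dataset to a single variable prior to concatenation
...     return ds[[varname]]
>>> ds = fs.open_mfdataset(urls, preprocess=keep_var,
...                        preprocess_opts={"varname": "TEMP"})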

Return type:

xarray.Dataset or list of xarray.Dataset

Notes

For the distributed.client.Client and concurrent.futures.ProcessPoolExecutor methods to work appropriately, the pre-processing collections.abc.Callable must be serializable. This can be checked with:

>>> from distributed.protocol import serialize
>>> from distributed.protocol.serialize import ToPickle
>>> serialize(ToPickle(preprocess_function))
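Continuing the session above, a sketch with a trivial preprocess function; this check matters whenever method='process' is used or a Dask client is passed:

>>> def squeeze_rows(ds):
...     # A simple, easily picklable preprocess step
...     return ds.squeeze()
>>> header, frames = serialize(ToPickle(squeeze_rows))  # raises if not serializable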