argopy.stores.httpstore_erddap_auth.open_mfdataset
- httpstore_erddap_auth.open_mfdataset(urls, max_workers: int = 6, method: str = 'thread', progress: bool | str = False, concat: bool = True, concat_dim: str = 'row', concat_method: Literal['drop', 'fill'] = 'drop', preprocess: Callable | None = None, preprocess_opts: dict = {}, open_dataset_opts: dict = {}, errors: Literal['ignore', 'raise', 'silent'] = 'ignore', compute_details: bool = False, *args, **kwargs) → Dataset | List[Dataset]
Download and process multiple urls as a single or a collection of xarray.Dataset.

This is a version of the httpstore.open_dataset method that is able to handle a list of urls, sequentially or in parallel. This method uses a concurrent.futures.ThreadPoolExecutor by default for parallelization. See the method parameter below for more options. A minimal usage sketch is given after the return type below.

- Parameters:
max_workers (int, default: 6) – Maximum number of threads or processes
method (str, default: 'thread') – Define the parallelization method:

  - 'thread' (default): based on concurrent.futures.ThreadPoolExecutor with a pool of at most max_workers threads
  - 'process': based on concurrent.futures.ProcessPoolExecutor with a pool of at most max_workers processes
  - distributed.client.Client: use a Dask client
  - 'sequential' / 'seq': open data sequentially in a simple loop, no parallelization applied
  - 'erddap': provides a detailed progress bar for erddap URLs, otherwise based on a concurrent.futures.ThreadPoolExecutor with a pool of at most max_workers threads
progress (bool, default: False) – Display a progress bar
concat (bool, default: True) – Concatenate results in a single xarray.Dataset or not (in this case, the method returns a list of xarray.Dataset)

concat_dim (str, default: 'row') – Name of the dimension to use to concatenate all datasets (passed to xarray.concat())

preprocess (collections.abc.Callable, optional) – If provided, call this function on each dataset prior to concatenation

preprocess_opts (dict, optional) – Options passed to the preprocess collections.abc.Callable, if any

errors (str, default: 'ignore') – Define how to handle errors raised during data URIs fetching:

  - 'ignore' (default): Do not stop processing, simply issue a debug message in the logging console
  - 'raise': Raise any error encountered
  - 'silent': Do not stop processing and do not issue any log message
- Return type:
xarray.Dataset or a list of xarray.Dataset
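Examples

The following is a minimal, illustrative sketch of a typical call, referenced from the description above. The store constructor arguments, URLs and pre-processing function are assumptions for illustration, not part of this documentation; preprocess_opts is assumed to be unpacked as keyword arguments of the preprocess callable.

    from argopy.stores import httpstore_erddap_auth

    # Assumption: credentials are resolved by the store itself
    # (constructor arguments omitted for brevity).
    store = httpstore_erddap_auth()

    # Illustrative erddap tabledap URLs (truncated):
    urls = [
        "https://erddap.ifremer.fr/erddap/tabledap/ArgoFloats.nc?...",
        "https://erddap.ifremer.fr/erddap/tabledap/ArgoFloats.nc?...",
    ]

    def keep_params(ds, params=("TEMP", "PSAL")):
        """Hypothetical pre-processing: subset variables before concatenation."""
        return ds[[v for v in params if v in ds.data_vars]]

    ds = store.open_mfdataset(
        urls,
        method="thread",                        # default backend, see options above
        max_workers=6,
        progress=True,
        preprocess=keep_params,
        preprocess_opts={"params": ("TEMP",)},  # forwarded to keep_params
        concat=True,
        concat_dim="row",                       # erddap tabledap row dimension
        errors="raise",                         # fail fast instead of logging
    )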
Notes
For the distributed.client.Client and concurrent.futures.ProcessPoolExecutor to work appropriately, the pre-processing collections.abc.Callable must be serializable. This can be checked with:

>>> from distributed.protocol import serialize
>>> from distributed.protocol.serialize import ToPickle
>>> serialize(ToPickle(preprocess_function))
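As a quick complement, the standard pickle module can serve as a proxy for this check, since concurrent.futures.ProcessPoolExecutor relies on pickling: module-level functions pass, while lambdas and locally defined closures do not. The function below is illustrative:

    import pickle

    def squeeze_preprocess(ds):
        """A module-level pre-processing function: picklable."""
        return ds.squeeze()

    pickle.dumps(squeeze_preprocess)  # succeeds: resolvable by qualified name

    try:
        pickle.dumps(lambda ds: ds)   # lambdas cannot be pickled by name
    except (pickle.PicklingError, AttributeError):
        print("use a module-level function with method='process'")

In practice, this means the preprocess argument should be a plain function importable from a module, rather than a lambda defined inline.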