argopy.stores.httpstore.open_mfdataset
httpstore.open_mfdataset(urls, max_workers: int = 6, method: str = 'thread', progress: bool | str = False, concat: bool = True, concat_dim: str = 'row', concat_method: Literal['drop', 'fill'] = 'drop', preprocess: Callable | None = None, preprocess_opts: dict = {}, open_dataset_opts: dict = {}, errors: Literal['ignore', 'raise', 'silent'] = 'ignore', compute_details: bool = False, *args, **kwargs) → Dataset | List[Dataset]
Download and process multiple URLs into a single xarray.Dataset or a list of xarray.Dataset.

This is a version of the httpstore.open_dataset method that can handle a list of URLs, either sequentially or in parallel.

By default, this method parallelizes work with a concurrent.futures.ThreadPoolExecutor. See the method parameter below for other options.

- Parameters:
max_workers (int, default: 6) – Maximum number of threads or processes
method (str, default: thread) – Define the parallelization method:
  - thread (default): based on concurrent.futures.ThreadPoolExecutor with a pool of at most max_workers threads
  - process: based on concurrent.futures.ProcessPoolExecutor with a pool of at most max_workers processes
  - distributed.client.Client: use a Dask client
  - sequential/seq: open data sequentially in a simple loop, with no parallelization
  - erddap: provides a detailed progress bar for erddap URLs, otherwise based on a concurrent.futures.ThreadPoolExecutor with a pool of at most max_workers threads
progress (bool, default: False) – Display a progress bar
concat (bool, default: True) – If True, concatenate results into a single xarray.Dataset; if False, return a list of xarray.Dataset
concat_dim (str, default: row) – Name of the dimension along which all datasets are concatenated (passed to xarray.concat())
preprocess (collections.abc.Callable, optional) – If provided, this function is called on each dataset prior to concatenation
preprocess_opts (dict, optional) – Options passed to the preprocess callable, if any
errors (str, default: ignore) – Define how to handle errors raised while fetching data URIs:
  - ignore (default): do not stop processing, simply issue a debug message in the logging console
  - raise: raise any error encountered
  - silent: do not stop processing and do not issue any log message
- Return type:
xarray.Dataset or list of xarray.Dataset
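To illustrate how the thread, process and sequential values of the method parameter differ, here is a minimal standard-library sketch. run_all is a hypothetical helper, not argopy's implementation, and the Dask client and erddap options are left out. With process, both the function and its inputs must be picklable, because they are sent to worker processes.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def run_all(func: Callable[[T], R], items: Iterable[T],
            method: str = "thread", max_workers: int = 6) -> List[R]:
    """Hypothetical helper: apply func to every item using the given method.

    Results are returned in the same order as the inputs.
    """
    items = list(items)
    if method in ("sequential", "seq"):
        # Plain loop, no parallelization
        return [func(item) for item in items]
    if method == "thread":
        executor_cls = ThreadPoolExecutor
    elif method == "process":
        # func and items must be picklable to reach worker processes
        executor_cls = ProcessPoolExecutor
    else:
        raise ValueError(f"Unsupported method: {method!r}")
    with executor_cls(max_workers=max_workers) as executor:
        # Executor.map yields results in input order
        return list(executor.map(func, items))
```

In this sketch, switching between a thread pool and a process pool changes only the executor class. The sequential branch is useful for debugging because any exception surfaces directly from the loop.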
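The preprocess, preprocess_opts, concat and concat_dim parameters can be pictured with plain xarray. This sketch assumes that preprocess_opts is unpacked as keyword arguments to the preprocess callable. The keep_variables function, the TEMP/PSAL variables and the toy datasets are all hypothetical stand-ins for data opened from each URL.

```python
import xarray as xr


def keep_variables(ds: xr.Dataset, variables=("TEMP",)) -> xr.Dataset:
    """Hypothetical preprocess function: keep only the requested variables."""
    return ds[list(variables)]


# Toy datasets standing in for the ones opened from each URL
datasets = [
    xr.Dataset({"TEMP": ("row", [10.0, 11.0]), "PSAL": ("row", [35.0, 35.1])}),
    xr.Dataset({"TEMP": ("row", [12.0]), "PSAL": ("row", [34.9])}),
]

# Assumption: preprocess_opts is passed to the callable as keyword arguments
preprocess_opts = {"variables": ("TEMP",)}
processed = [keep_variables(ds, **preprocess_opts) for ds in datasets]

# concat=True with concat_dim="row": stack all datasets along "row"
merged = xr.concat(processed, dim="row")
print(merged)
```

With concat=False, the equivalent result would simply be the processed list, one dataset per URL.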
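The three errors policies can be sketched as a loop over URLs. fetch_all and its fetch argument are hypothetical. The sketch also assumes that failed URLs are simply dropped in ignore and silent modes; this docstring does not say how argopy itself represents them.

```python
import logging
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")

log = logging.getLogger("fetch_all_sketch")


def fetch_all(fetch: Callable[[str], T], urls: Iterable[str],
              errors: str = "ignore") -> List[T]:
    """Hypothetical helper mimicking the 'ignore', 'raise' and 'silent' policies.

    Failed URLs are dropped from the result (an assumption of this sketch).
    """
    if errors not in ("ignore", "raise", "silent"):
        raise ValueError(f"errors must be 'ignore', 'raise' or 'silent', got {errors!r}")
    results: List[T] = []
    for url in urls:
        try:
            results.append(fetch(url))
        except Exception as exc:
            if errors == "raise":
                # Propagate the original exception unchanged
                raise
            if errors == "ignore":
                # Keep going, but leave a trace at DEBUG level
                log.debug("Failed to fetch %s: %s", url, exc)
            # 'silent': keep going without any log message
    return results
```

Because the message is emitted at DEBUG level, ignore and silent look identical unless logging is configured to show debug output, for example with logging.basicConfig(level=logging.DEBUG).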
Notes
For the distributed.client.Client and concurrent.futures.ProcessPoolExecutor methods to work appropriately, the pre-processing callable must be serializable. This can be checked with:

>>> from distributed.protocol import serialize
>>> from distributed.protocol.serialize import ToPickle
>>> serialize(ToPickle(preprocess_function))
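The check above requires Dask's distributed package. For the concurrent.futures.ProcessPoolExecutor case, which sends callables to worker processes with the standard pickle module, a rough stdlib-only check might look like the following sketch. is_picklable is a hypothetical helper, not part of argopy. Dask may fall back to cloudpickle, which accepts some objects (such as lambdas) that plain pickle rejects, so this check is stricter than what a Dask client needs.

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if obj survives a round trip through the standard pickle module."""
    try:
        pickle.loads(pickle.dumps(obj))
    except (pickle.PicklingError, AttributeError, TypeError):
        # Lambdas, nested functions and objects holding open resources end up here
        return False
    return True
```

In practice, a preprocess function defined at the top level of an importable module passes this check, while a lambda or a function defined inside another function does not.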