argopy.stores.ArgoIndex

class ArgoIndex(**kwargs)

Argo GDAC index store

If Pyarrow is available, this class will use pyarrow.Table as internal storage format; otherwise, a pandas.DataFrame will be used.
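The fallback described above can be pictured with a minimal sketch. This is not argopy's actual code: `pick_backend` is a hypothetical helper, and the only assumption is that "available" means the `pyarrow` package can be found by the import system.

```python
import importlib.util


def pick_backend(find_spec=importlib.util.find_spec):
    """Return "pyarrow" when that package is importable, else "pandas".

    Hypothetical helper for illustration; `find_spec` is injectable so the
    two branches can be exercised without installing or removing pyarrow.
    """
    return "pyarrow" if find_spec("pyarrow") is not None else "pandas"


# The result depends on the local environment
print(pick_backend())
```

On a real store instance, the `backend` attribute listed further down reports which of the two storage formats is in use.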

Shortcuts for host argument:

http or https for https://data-argo.ifremer.fr
us-http or us-https for https://usgodae.org/pub/outgoing/argo
ftp for ftp://ftp.ifremer.fr/ifremer/argo
s3 or aws for s3://argo-gdac-sandbox/pub/idx

Shortcuts for index_file argument:

core for the ar_index_global_prof.txt index file
bgc-b for the argo_bio-profile_index.txt index file
bgc-s for the argo_synthetic-profile_index.txt index file
aux for the etc/argo-index/argo_aux-profile_index.txt index file
meta for the ar_index_global_meta.txt index file
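The two shortcut lists above amount to simple lookup tables. The sketch below only restates them as dictionaries; `expand_shortcuts` is a hypothetical helper, not part of argopy, and values that are not shortcuts are assumed to pass through unchanged.

```python
# Shortcut tables copied from the lists above
HOST_SHORTCUTS = {
    "http": "https://data-argo.ifremer.fr",
    "https": "https://data-argo.ifremer.fr",
    "us-http": "https://usgodae.org/pub/outgoing/argo",
    "us-https": "https://usgodae.org/pub/outgoing/argo",
    "ftp": "ftp://ftp.ifremer.fr/ifremer/argo",
    "s3": "s3://argo-gdac-sandbox/pub/idx",
    "aws": "s3://argo-gdac-sandbox/pub/idx",
}
INDEX_FILE_SHORTCUTS = {
    "core": "ar_index_global_prof.txt",
    "bgc-b": "argo_bio-profile_index.txt",
    "bgc-s": "argo_synthetic-profile_index.txt",
    "aux": "etc/argo-index/argo_aux-profile_index.txt",
    "meta": "ar_index_global_meta.txt",
}


def expand_shortcuts(host, index_file):
    """Return (host, index_file) with any shortcut replaced by its full value.

    Hypothetical helper: anything that is not a known shortcut (a full URL,
    a local path, an exact file name) is returned as given.
    """
    return (
        HOST_SHORTCUTS.get(host, host),
        INDEX_FILE_SHORTCUTS.get(index_file, index_file),
    )


print(expand_shortcuts("us-https", "bgc-s"))
print(expand_shortcuts(".", "dummy_index.txt"))
```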

Examples

An index store is instantiated with a host (any access path: local, http or ftp) and an index file

>>> idx = ArgoIndex()
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr")  # Default host
>>> idx = ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo", index_file="ar_index_global_prof.txt")  # Default index
>>> idx = ArgoIndex(index_file="bgc-s")  # Use keywords instead of exact file names
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr", index_file="bgc-b", cache=True)  # Use cache for better performance
>>> idx = ArgoIndex(host=".", index_file="dummy_index.txt", convention="core")  # Load your own index

Full index methods and properties

>>> idx.load()
>>> idx.load(nrows=12)  # Only load the first N rows of the index
>>> idx.to_dataframe(index=True)  # Convert index to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(index=True, nrows=2)  # Only returns the first nrows of the index
>>> idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
>>> idx.index  # internal storage structure of the full index (:class:`pyarrow.Table` or :class:`pandas.DataFrame`)
>>> idx.shape  # shape of the full index array
>>> idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'

Search methods

>>> idx.query.wmo(1901393)
>>> idx.query.cyc(1)
>>> idx.query.wmo_cyc(1901393, [1,12])

>>> idx.query.lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.query.lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.query.date([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.query.lon_lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.query.box([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition

>>> idx.query.params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Takes a list of strings; only for BGC index files
>>> idx.query.parameter_data_mode({'BBP700': 'D', 'DOXY': ['A', 'D']})  # Takes a dict

>>> idx.query.profiler_type(845)
>>> idx.query.profiler_label('NINJA')
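To make the index BOX definition used above concrete, here is a minimal in-memory sketch of what a box query selects. It is not argopy's implementation. It assumes the BOX order is [lon_min, lon_max, lat_min, lat_max, date_min, date_max] and that bounds are inclusive; the records and their field names are made up for illustration.

```python
from datetime import date


def in_box(record, box):
    """Return True when a record falls inside the space/time BOX.

    Assumed BOX order: [lon_min, lon_max, lat_min, lat_max, date_min, date_max].
    Bounds are treated as inclusive, which is an assumption of this sketch.
    """
    lon_min, lon_max, lat_min, lat_max, date_min, date_max = box
    return (
        lon_min <= record["longitude"] <= lon_max
        and lat_min <= record["latitude"] <= lat_max
        and date.fromisoformat(date_min) <= record["date"] <= date.fromisoformat(date_max)
    )


BOX = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]

# Hypothetical index rows, reduced to the columns the box test needs
records = [
    {"wmo": 1, "longitude": -57.5, "latitude": 42.0, "date": date(2007, 8, 15)},
    {"wmo": 2, "longitude": -57.5, "latitude": 50.0, "date": date(2007, 8, 15)},  # latitude outside
    {"wmo": 3, "longitude": -57.5, "latitude": 42.0, "date": date(2008, 1, 1)},  # date outside
]

matches = [r["wmo"] for r in records if in_box(r, BOX)]
print(matches)  # [1]
```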

Composing search methods

>>> idx.query.compose({'box': BOX, 'wmo': WMOs})
>>> idx.query.compose({'box': BOX, 'params': 'DOXY'})
>>> idx.query.compose({'box': BOX, 'params': (['DOXY', 'DOXY2'], {'logical': 'and'})})
>>> idx.query.compose({'params': 'DOXY', 'profiler_label': 'ARVOR'})
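One way to read the composed queries above is as an intersection: a record matches only if it satisfies every named criterion. The sketch below illustrates that reading with plain predicates. It is an assumption about the semantics, not argopy's code, and unlike `idx.query.compose` it takes callables rather than search values. The records and the `compose` helper are hypothetical.

```python
def compose(criteria):
    """Combine named predicates into one that requires all of them to hold."""

    def combined(record):
        return all(test(record) for test in criteria.values())

    return combined


# Hypothetical index rows, reduced to the columns used by the criteria
records = [
    {"wmo": 101, "params": {"TEMP", "PSAL", "DOXY"}, "profiler_label": "ARVOR float"},
    {"wmo": 102, "params": {"TEMP", "PSAL"}, "profiler_label": "ARVOR float"},
    {"wmo": 103, "params": {"TEMP", "PSAL", "DOXY"}, "profiler_label": "NINJA float"},
]

query = compose(
    {
        "params": lambda r: "DOXY" in r["params"],
        "profiler_label": lambda r: "ARVOR" in r["profiler_label"],
    }
)

hits = [r["wmo"] for r in records if query(r)]
print(hits)  # [101]
```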

Search result properties and methods

>>> idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
>>> idx.search  # Internal table with search results
>>> idx.uri  # List of absolute paths to files from the search results table column 'file'

>>> idx.run()  # Run the search and save results in cache if necessary
>>> idx.to_dataframe()  # Convert search results to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(nrows=2)  # Only returns the first nrows of the search results
>>> idx.to_indexfile("search_index.txt")  # Export search results to Argo standard index file

List of file properties

>>> idx.read_wmo()
>>> idx.read_dac_wmo()
>>> idx.read_params()
>>> idx.read_domain()
>>> idx.read_files()

>>> idx.records_per_wmo()
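The kind of summary these readers return can be sketched from the 'file' column alone. The example below is illustrative, not argopy's implementation. It assumes index file paths follow the GDAC layout `<dac>/<wmo>/profiles/<name>.nc`; the sample paths and the `dac_wmo` helper are made up.

```python
from collections import Counter


def dac_wmo(path):
    """Split a GDAC-style index path into its (DAC, WMO) pair."""
    dac, wmo = path.split("/")[:2]
    return dac, int(wmo)


# Made-up sample of the 'file' column
files = [
    "aoml/1901393/profiles/R1901393_001.nc",
    "aoml/1901393/profiles/R1901393_002.nc",
    "coriolis/6902746/profiles/R6902746_001.nc",
]

# Unique (DAC, WMO) pairs, in the spirit of read_dac_wmo()
unique_pairs = sorted(set(dac_wmo(f) for f in files))

# Number of records per WMO, in the spirit of records_per_wmo()
records_per_wmo = Counter(dac_wmo(f)[1] for f in files)

print(unique_pairs)
print(dict(records_per_wmo))
```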

Misc

>>> idx.convention  # What is the expected index format (core vs BGC profile index)
>>> idx.cname
>>> idx.domain # the default read_domain() output, as a property
>>> idx.copy(deep=False)

Iterate on argopy.ArgoFloat instances

>>> for a_float in idx.iterfloats():
...     print(a_float.WMO)
...     ds = a_float.open_dataset('prof')

__init__(**kwargs)

Create an Argo index store

Parameters:

host (str, optional, default=OPTIONS["gdac"]) –
Local or remote (http, ftp or s3) path to a dac folder (compliant with GDAC structure).

This parameter takes values like:
- http or https for https://data-argo.ifremer.fr
- us-http or us-https for https://usgodae.org/pub/outgoing/argo
- ftp for ftp://ftp.ifremer.fr/ifremer/argo
- s3 or aws for s3://argo-gdac-sandbox/pub/idx
- a local absolute path
index_file (str, default: ar_index_global_prof.txt) –
Name of the csv-like text file with the index.

This parameter takes values like:
- core or ar_index_global_prof.txt
- bgc-b or argo_bio-profile_index.txt
- bgc-s or argo_synthetic-profile_index.txt
- aux or etc/argo-index/argo_aux-profile_index.txt
- meta or ar_index_global_meta.txt
- a local absolute path toward a file following an Argo index convention. When using a local file, you need to set the convention followed by the file.
convention (str, default: None) –
Set the expected format convention of the index file.

This is useful when trying to load an index file with a custom name. If set to None, we’ll try to infer the convention from the index_file value.

This parameter takes values like:
- core or ar_index_global_prof
- bgc-b or argo_bio-profile_index
- bgc-s or argo_synthetic-profile_index
- aux or argo_aux-profile_index
- meta or ar_index_global_meta
cache (bool, default: False) – Whether to use the cache.
cachedir (str, default: OPTIONS['cachedir']) – Folder in which to store cached files.
timeout (int, default: OPTIONS['api_timeout']) – Timeout, in seconds, for connecting to a remote host (ftp or http).
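The interplay between index_file and convention described above can be sketched as follows. This is one plausible reading, not argopy's actual inference logic: `infer_convention` is a hypothetical helper that uses the file name without folder or extension when convention is None, and rejects names that are not in the list of conventions given above.

```python
import posixpath

# Short and long convention names, copied from the parameter description
CONVENTION_SHORTCUTS = {
    "core": "ar_index_global_prof",
    "bgc-b": "argo_bio-profile_index",
    "bgc-s": "argo_synthetic-profile_index",
    "aux": "argo_aux-profile_index",
    "meta": "ar_index_global_meta",
}


def infer_convention(index_file, convention=None):
    """Return the long convention name for an index file.

    Hypothetical helper: an explicit `convention` wins; otherwise the name is
    taken from `index_file` stripped of its folder and extension.
    """
    if convention is not None:
        name = convention
    else:
        name = posixpath.splitext(posixpath.basename(index_file))[0]
    name = CONVENTION_SHORTCUTS.get(name, name)
    if name not in CONVENTION_SHORTCUTS.values():
        raise ValueError(f"Cannot infer a convention from {index_file!r}; set 'convention'")
    return name


print(infer_convention("etc/argo-index/argo_aux-profile_index.txt"))
print(infer_convention("dummy_index.txt", convention="core"))
```

Under this reading, a custom file name such as dummy_index.txt cannot be resolved on its own, which is why the convention has to be given explicitly for local files.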

Methods

`__init__`(**kwargs)	Create an Argo index store
`cachepath`(path)	Return path to a cached file
`clear_cache`()	Clear cache registry and files associated with this store instance.
`copy`([deep])	Returns a copy of this `ArgoIndex` instance
`iterfloats`([index, chunksize])	Iterate over unique Argo floats in the full index or search results
`load`([nrows, force])	Load an Argo-index file content
`read_dac_wmo`([index])	Return a tuple of unique [DAC, WMO] pairs from the index or search results
`read_domain`([index])	Read the space/time domain of the index
`read_files`([index])	Return file paths listed in index or search results
`read_params`([index])	Return list of unique PARAMETERs in index or search results
`read_wmo`([index])	Return list of unique WMOs from the index or search results
`records_per_wmo`([index])	Return the number of records per unique WMO in search results
`run`([nrows])	Filter index with search criteria
`search_cyc`(CYCs[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.cyc()`
`search_lat_lon`(BOX[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.lon_lat()`
`search_lat_lon_tim`(BOX[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.box()`
`search_parameter_data_mode`(PARAMs[, ...])	Deprecated: this method is replaced by `ArgoIndex().query.parameter_data_mode()`
`search_params`(PARAMs[, logical, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.params()`
`search_profiler_label`(profiler_label[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.profiler_label()`
`search_profiler_type`(profiler_type[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.profiler_type()`
`search_tim`(BOX[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.date()`
`search_wmo`(WMOs[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.wmo()`
`search_wmo_cyc`(WMOs, CYCs[, nrows])	Deprecated: this method is replaced by `ArgoIndex().query.wmo_cyc()`
`to_dataframe`([nrows, index, completed])	Return index or search results as `pandas.DataFrame`
`to_indexfile`(file)	Save search results on file, following the Argo standard index formats

Attributes

`N_FILES`	Number of rows in search result or index if search not triggered
`N_MATCH`	Number of rows in search result
`N_RECORDS`	Number of rows in the full index
`backend`	Name of store backend (pandas or pyarrow)
`cname`	Search query as a pretty formatted string
`convention`	Convention of the index (standard csv file name)
`convention_columns`	CSV file column names for the index convention
`convention_supported`	List of supported conventions
`convention_title`	Long name for the index convention
`domain`	Space/time domain of the index
`ext`	Storage file extension
`files`	File paths listed in search results
`files_full_index`	File paths listed in the index
`index_path`	Absolute path to the index file
`search_path`	Path to search result uri
`search_type`	Dictionary with search meta-data
`sha_df`	Returns a unique SHA for a cname/dataframe
`sha_h5`	Returns a unique SHA for a cname/hdf5
`sha_pq`	Returns a unique SHA for a cname/parquet
`shape`	Shape of the index array
`uri`	File paths listed in search results
`uri_full_index`	File paths listed in the index
