Argo Index store

If you are familiar with Argo index CSV files, you may want to work directly with the Argo index store, ArgoIndex.

If Pyarrow is installed, this store uses pyarrow.Table as the internal storage format for the index; otherwise it falls back on pandas.DataFrame. Loading the full Argo profile index takes about 2 to 3 seconds with Pyarrow, and up to 6 to 7 seconds with pandas.
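
The fallback described above can be sketched with the standard library alone. This is an illustration of the "prefer pyarrow, otherwise pandas" idea, not argopy's actual implementation; the function name `pick_index_backend` and its `is_available` parameter are hypothetical.

```python
from importlib.util import find_spec
from typing import Callable


def _installed(module_name: str) -> bool:
    """Return True if the module can be found without importing it."""
    return find_spec(module_name) is not None


def pick_index_backend(is_available: Callable[[str], bool] = _installed) -> str:
    """Hypothetical helper: choose the table backend for an index store."""
    # Prefer pyarrow when it is installed (the faster option in the text above).
    if is_available("pyarrow"):
        return "pyarrow"
    # Otherwise fall back on pandas.
    if is_available("pandas"):
        return "pandas"
    raise ImportError("neither pyarrow nor pandas is installed")
```

Passing `is_available` as a parameter is only a convenience for trying out both branches without installing or removing packages.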

All index store methods and properties are documented in the ArgoIndex API page.

Supported index files

The table below summarizes the argopy support status of all Argo index files:

Table 4: argopy GDAC index file support status

                   Index file                                 Supported
-----------------  -----------------------------------------  ---------
Profile            ar_index_global_prof.txt                   ✅
Synthetic-Profile  argo_synthetic-profile_index.txt           ✅
Bio-Profile        argo_bio-profile_index.txt                 ✅
Metadata           ar_index_global_meta.txt                   ✅
Auxiliary          etc/argo-index/argo_aux-profile_index.txt  ✅
Trajectory         ar_index_global_traj.txt                   ❌
Bio-Trajectory     argo_bio-traj_index.txt                    ❌
Technical          ar_index_global_tech.txt                   ❌
Greylist           ar_greylist.txt                            ❌

Support for additional index files can be added on demand. Raise an issue if you'd like to access other index files.

Usage

You create an index store with default or custom options:

In [1]: from argopy import ArgoIndex

In [2]: idx = ArgoIndex()

# or:
# ArgoIndex(index_file="argo_bio-profile_index.txt")
# ArgoIndex(index_file="bgc-s")  # can use a keyword instead of the file name: core, bgc-b, bgc-s
# ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo")
# ArgoIndex(host="https://data-argo.ifremer.fr", index_file="core")
# ArgoIndex(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt", cache=True)

Note that you can use GDAC host shortcut names:

  • https://data-argo.ifremer.fr, shortcut with http or https

  • https://usgodae.org/pub/outgoing/argo, shortcut with us-http or us-https

  • ftp://ftp.ifremer.fr/ifremer/argo, shortcut with ftp

  • s3://argo-gdac-sandbox/pub/idx, shortcut with s3 or aws
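
The shortcut list above amounts to a small lookup table. The sketch below only restates that mapping in code; the names `GDAC_SHORTCUTS` and `resolve_host` are hypothetical and are not part of the argopy API.

```python
# Shortcut name -> GDAC host, copied from the list above.
GDAC_SHORTCUTS = {
    "http": "https://data-argo.ifremer.fr",
    "https": "https://data-argo.ifremer.fr",
    "us-http": "https://usgodae.org/pub/outgoing/argo",
    "us-https": "https://usgodae.org/pub/outgoing/argo",
    "ftp": "ftp://ftp.ifremer.fr/ifremer/argo",
    "s3": "s3://argo-gdac-sandbox/pub/idx",
    "aws": "s3://argo-gdac-sandbox/pub/idx",
}


def resolve_host(host: str) -> str:
    """Hypothetical helper: expand a shortcut, or return the host unchanged."""
    # Anything that is not a known shortcut is assumed to be a full URL already.
    return GDAC_SHORTCUTS.get(host.lower(), host)
```

With the store itself, a shortcut is passed through the same host argument as a full URL, for instance ArgoIndex(host="ftp").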

You can then trigger loading of the index content:

In [3]: idx.load()  # Load the full index in memory
Out[3]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: ar_index_global_prof.txt.gz
Convention: ar_index_global_prof (Profile directory file of the Argo GDAC)
In memory: True (3128974 records)
Searched: False

Here is the list of methods and properties of the full index:

idx.load(nrows=12)  # Only load the first nrows rows of the index
idx.N_RECORDS  # Number of records in the full index (length of its 1st dimension)
idx.to_dataframe(index=True)  # Convert the full index to a user-friendly pandas.DataFrame
idx.to_dataframe(index=True, nrows=2)  # Only return the first nrows rows of the index
idx.index  # Internal storage structure of the full index (pyarrow.Table or pandas.DataFrame)
idx.uri_full_index  # List of absolute paths to files, from the 'file' column of the full index table

There are several methods to search the index, for instance:

In [4]: idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])
Out[4]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: ar_index_global_prof.txt.gz
Convention: ar_index_global_prof (Profile directory file of the Argo GDAC)
In memory: True (3128974 records)
Searched: True (12 matches, 0.0004%)

Here is the list of all methods to search the index:

idx.search_wmo(1901393)
idx.search_cyc(1)
idx.search_wmo_cyc(1901393, [1,12])
idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition, only time is used
idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition, only lat/lon is used
idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
idx.search_params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Only for BGC profile index
idx.search_parameter_data_mode({'BBP700': 'D'})  # Only for BGC profile index
idx.search_profiler_type(845)
idx.search_profiler_label('NINJA')
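
The three box-based searches above take the same six-element list and differ only in which part of it they use. The sketch below illustrates that idea on plain dictionaries. It assumes the box order is [lon_min, lon_max, lat_min, lat_max, date_min, date_max] and that bounds are inclusive; the function `in_box`, its flags, and the record layout are hypothetical and do not reflect argopy internals.

```python
from datetime import date


def in_box(record: dict, box: list, use_space: bool = True, use_time: bool = True) -> bool:
    """Hypothetical helper: test one index record against a box definition."""
    lon_min, lon_max, lat_min, lat_max, date_min, date_max = box
    if use_space:
        # Spatial part of the box (the only part a lat/lon search would use).
        if not (lon_min <= record["longitude"] <= lon_max):
            return False
        if not (lat_min <= record["latitude"] <= lat_max):
            return False
    if use_time:
        # Temporal part of the box (the only part a time search would use).
        start = date.fromisoformat(date_min)
        end = date.fromisoformat(date_max)
        if not (start <= record["date"] <= end):
            return False
    return True


# Made-up record, for illustration only.
box = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]
record = {"longitude": -57.5, "latitude": 42.0, "date": date(2008, 1, 1)}

in_box(record, box)                   # space and time: False, the date is outside
in_box(record, box, use_time=False)   # space only: True
in_box(record, box, use_space=False)  # time only: False
```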

And finally the list of methods and properties for search results:

idx.N_MATCH  # Number of records in the search results (length of their 1st dimension)
idx.to_dataframe()  # Convert the search results to a user-friendly pandas.DataFrame
idx.to_dataframe(nrows=2)  # Only return the first nrows rows of the search results
idx.to_indexfile("search_index.txt")  # Export the search results to an Argo standard index file
idx.search  # Internal table with the search results
idx.uri  # List of absolute paths to files, from the 'file' column of the search results table

Usage with BGC index

The argopy index store supports the Bio, Synthetic and Auxiliary Profile directory files:

In [5]: idx = ArgoIndex(index_file="argo_bio-profile_index.txt").load()

# idx = ArgoIndex(index_file="argo_synthetic-profile_index.txt").load()
In [6]: idx
Out[6]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: False

Hint

To load a BGC-Argo profile index, you can use the bgc-b or bgc-s keyword in place of the argo_bio-profile_index.txt or argo_synthetic-profile_index.txt file name, respectively.

All methods presented above are valid with a BGC index, but a BGC index store comes with additional search possibilities for parameters and parameter data modes.

Two index variables are only available with BGC-Argo index files: PARAMETERS and PARAMETER_DATA_MODE. For these, the store provides the ArgoIndex.search_params() and ArgoIndex.search_parameter_data_mode() methods. They let you search for (i) profiles with one or more specific parameters and (ii) profiles with parameters in one or more specific data modes.

Syntax for ArgoIndex.search_params()

In [7]: from argopy import ArgoIndex

In [8]: idx = ArgoIndex(index_file='bgc-s').load()

In [9]: idx
Out[9]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: False

You can search for one parameter:

In [10]: idx.search_params('DOXY')
Out[10]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (327927 matches, 95.9404%)

Or you can search for several parameters:

In [11]: idx.search_params(['DOXY', 'CDOM'])
Out[11]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (58077 matches, 16.9914%)

Note that a search on several parameters returns only the profiles that have all of them. To search for profiles with any of the parameters, use:

In [12]: idx.search_params(['DOXY', 'CDOM'], logical='or')
Out[12]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (340388 matches, 99.5860%)
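
The difference between the default "all parameters" search and logical='or' can be shown on a single index row. This sketch assumes the PARAMETERS entry of a row is a space-separated list of parameter names and compares whole names only; the function `has_params` is hypothetical and is not how argopy necessarily implements the search.

```python
def has_params(parameters: str, wanted, logical: str = "and") -> bool:
    """Hypothetical helper: does one index row list the wanted parameters?"""
    if isinstance(wanted, str):
        wanted = [wanted]  # accept a single name as well as a list
    # Assumption: names are separated by whitespace in the PARAMETERS field.
    available = set(parameters.split())
    hits = [name in available for name in wanted]
    if logical == "and":
        return all(hits)  # every wanted parameter must be present
    if logical == "or":
        return any(hits)  # one wanted parameter is enough
    raise ValueError("logical must be 'and' or 'or'")


# Made-up PARAMETERS value, for illustration only.
row = "PRES TEMP PSAL DOXY"
has_params(row, "DOXY")                          # True
has_params(row, ["DOXY", "CDOM"])                # False, CDOM is missing
has_params(row, ["DOXY", "CDOM"], logical="or")  # True, DOXY is present
```

This is consistent with the match counts above: the 'or' search can only return as many or more profiles than the default search.
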

Syntax for ArgoIndex.search_parameter_data_mode()

In [13]: from argopy import ArgoIndex

In [14]: idx = ArgoIndex(index_file='bgc-b').load()

In [15]: idx
Out[15]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: False

You can search one mode for a single parameter:

In [16]: idx.search_parameter_data_mode({'BBP700': 'D'})
Out[16]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (17529 matches, 5.0933%)

You can search several modes for a single parameter:

In [17]: idx.search_parameter_data_mode({'DOXY': ['R', 'A']})
Out[17]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (114010 matches, 33.1274%)

You can search for several parameters, each with its own mode, and require all conditions to hold:

In [18]: idx.search_parameter_data_mode({'BBP700': 'D', 'DOXY': 'D'}, logical='and')
Out[18]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (11292 matches, 3.2811%)

And mix all of these as you wish:

In [19]: idx.search_parameter_data_mode({'BBP700': ['R', 'A'], 'DOXY': 'D'}, logical='or')
Out[19]: 
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (262885 matches, 76.3854%)
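
The query dictionaries used above can be read as "parameter: accepted mode(s)", combined with 'and' or 'or'. The sketch below illustrates that reading on one index row. It assumes that PARAMETERS is a space-separated list of names and that PARAMETER_DATA_MODE holds one character per parameter in the same order; the function `matches_data_mode` is hypothetical and is not argopy's implementation.

```python
def matches_data_mode(parameters: str, data_modes: str, query: dict, logical: str = "and") -> bool:
    """Hypothetical helper: test one index row against a data mode query."""
    if logical not in ("and", "or"):
        raise ValueError("logical must be 'and' or 'or'")
    # Assumption: the n-th character of data_modes is the mode of the n-th parameter.
    mode_of = dict(zip(parameters.split(), data_modes))
    hits = []
    for name, modes in query.items():
        accepted = [modes] if isinstance(modes, str) else list(modes)
        # A parameter missing from the row never matches.
        hits.append(mode_of.get(name) in accepted)
    return all(hits) if logical == "and" else any(hits)


# Made-up row: PRES in 'R', DOXY in 'D', BBP700 in 'A'.
names, modes = "PRES DOXY BBP700", "RDA"
matches_data_mode(names, modes, {"BBP700": "D"})                              # False
matches_data_mode(names, modes, {"DOXY": ["R", "A"]})                         # False
matches_data_mode(names, modes, {"BBP700": "D", "DOXY": "D"}, logical="and")  # False
matches_data_mode(names, modes, {"BBP700": "D", "DOXY": "D"}, logical="or")   # True
```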