Argo Index store#
If you are familiar with Argo index csv files, you may want to use the Argo index store ArgoIndex directly.
If Pyarrow is installed, this store relies on pyarrow.Table as the internal storage format for the index; otherwise it falls back on pandas.DataFrame. Loading the full Argo profile index takes about 2 to 3 seconds with Pyarrow, and up to 6 to 7 seconds with Pandas.
All index store methods and properties are documented in the ArgoIndex API page.
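The backend choice described above amounts to an optional-dependency check. The snippet below is a minimal sketch of that pattern using only the standard library. `select_index_backend` is a hypothetical helper name, not part of argopy, and the actual argopy logic may differ.

```python
from importlib.util import find_spec


def select_index_backend() -> str:
    """Return "pyarrow" when it is importable, else fall back to "pandas".

    Hypothetical helper for illustration only; not part of the argopy API.
    """
    # find_spec() returns None when a top-level package cannot be found,
    # and it does not import the package itself.
    return "pyarrow" if find_spec("pyarrow") is not None else "pandas"


backend = select_index_backend()
print(f"Index would be stored with: {backend}")
```

Checking availability without importing keeps the test cheap; the heavier import only happens once a backend has been chosen.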
Index file supported#
The table below summarizes the argopy support status of all Argo index files:
Index type | Index file | Supported |
---|---|---|
Profile | ar_index_global_prof.txt | β |
Synthetic-Profile | argo_synthetic-profile_index.txt | β |
Bio-Profile | argo_bio-profile_index.txt | β |
Metadata | ar_index_global_meta.txt | β |
Auxiliary | etc/argo-index/argo_aux-profile_index.txt | β |
Trajectory | ar_index_global_traj.txt | β |
Bio-Trajectory | argo_bio-traj_index.txt | β |
Technical | ar_index_global_tech.txt | β |
Greylist | ar_greylist.txt | β |
Support for additional index files can be added on demand. Raise an issue if you'd like to access other index files.
Usage#
You create an index store with default or custom options:
In [1]: from argopy import ArgoIndex
In [2]: idx = ArgoIndex()
# or:
# ArgoIndex(index_file="argo_bio-profile_index.txt")
# ArgoIndex(index_file="bgc-s") # a keyword can be used instead of the file name: core, bgc-b, bgc-s
# ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo")
# ArgoIndex(host="https://data-argo.ifremer.fr", index_file="core")
# ArgoIndex(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt", cache=True)
Note that you can use GDAC host shortcut names:
https://data-argo.ifremer.fr, shortcut with http or https
https://usgodae.org/pub/outgoing/argo, shortcut with us-http or us-https
ftp://ftp.ifremer.fr/ifremer/argo, shortcut with ftp
s3://argo-gdac-sandbox/pub/idx, shortcut with s3 or aws
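The shortcut names listed above can be thought of as a simple lookup table. The sketch below restates that list as a dictionary; `GDAC_SHORTCUTS` and `resolve_host` are hypothetical names used for illustration, and argopy may resolve shortcuts differently internally.

```python
# Shortcut names and hosts copied from the list above.
# The dictionary and the helper below are illustrative, not argopy API.
GDAC_SHORTCUTS = {
    "http": "https://data-argo.ifremer.fr",
    "https": "https://data-argo.ifremer.fr",
    "us-http": "https://usgodae.org/pub/outgoing/argo",
    "us-https": "https://usgodae.org/pub/outgoing/argo",
    "ftp": "ftp://ftp.ifremer.fr/ifremer/argo",
    "s3": "s3://argo-gdac-sandbox/pub/idx",
    "aws": "s3://argo-gdac-sandbox/pub/idx",
}


def resolve_host(host: str) -> str:
    """Expand a shortcut name to its full URL; return anything else unchanged."""
    return GDAC_SHORTCUTS.get(host, host)


print(resolve_host("us-https"))
print(resolve_host("ftp://ftp.ifremer.fr/ifremer/argo"))
```

With this reading, `ArgoIndex(host="https")` and `ArgoIndex(host="https://data-argo.ifremer.fr")` would point at the same GDAC server.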
You can then trigger loading of the index content:
In [3]: idx.load() # Load the full index in memory
Out[3]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: ar_index_global_prof.txt.gz
Convention: ar_index_global_prof (Profile directory file of the Argo GDAC)
In memory: True (3128974 records)
Searched: False
Here is the list of methods and properties of the full index:
idx.load(nrows=12) # Only load the first N rows of the index
idx.N_RECORDS # Shortcut for length of 1st dimension of the index array
idx.to_dataframe(index=True) # Convert the full index to a user-friendly pandas.DataFrame
idx.to_dataframe(index=True, nrows=2) # Only return the first nrows of the index
idx.index # Internal storage structure of the full index (pyarrow.Table or pandas.DataFrame)
idx.uri_full_index # List of absolute paths to files from the full index table column 'file'
There are several methods to search the index, for instance:
In [4]: idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])
Out[4]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: ar_index_global_prof.txt.gz
Convention: ar_index_global_prof (Profile directory file of the Argo GDAC)
In memory: True (3128974 records)
Searched: True (12 matches, 0.0004%)
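To make the box argument more concrete, here is a minimal pure-Python sketch of a space and time box filter. It assumes the box is ordered as [lon_min, lon_max, lat_min, lat_max, date_min, date_max] and that bounds are inclusive; both points are assumptions for illustration. The records and the `in_box` helper are hypothetical and do not reflect how argopy implements the search.

```python
from datetime import datetime

# Hypothetical index records: (file, latitude, longitude, date)
records = [
    ("a.nc", 42.0, -57.5, datetime(2007, 8, 15)),  # inside the box
    ("b.nc", 42.0, -30.0, datetime(2007, 8, 15)),  # longitude outside
    ("c.nc", 42.0, -57.5, datetime(2008, 1, 1)),   # date outside
]


def in_box(lat, lon, date, box):
    """Return True if a record falls inside the box (inclusive bounds assumed)."""
    # Assumed ordering: lon_min, lon_max, lat_min, lat_max, date_min, date_max
    lon_min, lon_max, lat_min, lat_max, t_min, t_max = box
    t_min = datetime.fromisoformat(t_min)
    t_max = datetime.fromisoformat(t_max)
    return (
        lon_min <= lon <= lon_max
        and lat_min <= lat <= lat_max
        and t_min <= date <= t_max
    )


box = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]
matches = [name for name, lat, lon, date in records if in_box(lat, lon, date, box)]
print(matches)
```

Dropping the time test or the position test from `in_box` gives the behaviour one would expect from a position-only or time-only search on the same box definition.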
Here is the list of all methods to search the index:
idx.search_wmo(1901393)
idx.search_cyc(1)
idx.search_wmo_cyc(1901393, [1,12])
idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01']) # Takes an index BOX definition, only time is used
idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01']) # Takes an index BOX definition, only lat/lon is used
idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01']) # Takes an index BOX definition
idx.search_params(['C1PHASE_DOXY', 'DOWNWELLING_PAR']) # Only for BGC profile index
idx.search_parameter_data_mode({'BBP700': 'D'}) # Only for BGC profile index
idx.search_profiler_type(845)
idx.search_profiler_label('NINJA')
And finally the list of methods and properties for search results:
idx.N_MATCH # Shortcut for length of 1st dimension of the search results array
idx.to_dataframe() # Convert search results to a user-friendly pandas.DataFrame
idx.to_dataframe(nrows=2) # Only return the first nrows of the search results
idx.to_indexfile("search_index.txt") # Export search results to an Argo standard index file
idx.search # Internal table with search results
idx.uri # List of absolute paths to files from the search results table column 'file'
Usage with bgc index#
The argopy index store supports the Bio, Synthetic and Auxiliary Profile directory files:
In [5]: idx = ArgoIndex(index_file="argo_bio-profile_index.txt").load()
# idx = ArgoIndex(index_file="argo_synthetic-profile_index.txt").load()
In [6]: idx
Out[6]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: False
Hint
To load a BGC-Argo profile index, you can use the bgc-b or bgc-s keyword to load the argo_bio-profile_index.txt or argo_synthetic-profile_index.txt index file, respectively.
All methods presented above are valid with a BGC index, but a BGC index store comes with additional search possibilities for parameters and parameter data modes.
Two specific index variables are only available with BGC-Argo index files: PARAMETERS and PARAMETER_DATA_MODE. We thus implemented the ArgoIndex.search_params() and ArgoIndex.search_parameter_data_mode() methods. These methods let you search for (i) profiles with one or more specific parameters and (ii) profiles with parameters in one or more specific data modes.
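To see how these two variables relate, the sketch below decodes one index row. It assumes that PARAMETERS is a space-separated list of parameter names and that PARAMETER_DATA_MODE holds one character per parameter, in the same order (R for real time, A for adjusted, D for delayed mode). The row values and the `decode_data_modes` helper are hypothetical.

```python
def decode_data_modes(parameters: str, parameter_data_mode: str) -> dict:
    """Pair each parameter name with its data mode character.

    Assumes one mode character per space-separated parameter name.
    Hypothetical helper, not part of the argopy API.
    """
    names = parameters.split()
    if len(names) != len(parameter_data_mode):
        raise ValueError("expected one data mode character per parameter")
    return dict(zip(names, parameter_data_mode))


# Hypothetical content of the two columns for a single profile
row_modes = decode_data_modes("PRES TEMP PSAL DOXY", "RRRD")
print(row_modes)
```

Under this reading, a parameter search only needs the first column, while a data mode search needs both columns aligned position by position.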
Syntax for ArgoIndex.search_params()
In [7]: from argopy import ArgoIndex
In [8]: idx = ArgoIndex(index_file='bgc-s').load()
In [9]: idx
Out[9]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: False
You can search for one parameter:
In [10]: idx.search_params('DOXY')
Out[10]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (327927 matches, 95.9404%)
Or you can search for several parameters:
In [11]: idx.search_params(['DOXY', 'CDOM'])
Out[11]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (58077 matches, 16.9914%)
Note that a multi-parameter search returns only the profiles that contain all of the listed parameters. To search for profiles with any of the parameters, use:
In [12]: idx.search_params(['DOXY', 'CDOM'], logical='or')
Out[12]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_synthetic-profile_index.txt.gz
Convention: argo_synthetic-profile_index (Synthetic-Profile directory file of the Argo GDAC)
In memory: True (341803 records)
Searched: True (340388 matches, 99.5860%)
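The difference between the two logical modes can be sketched with set operations. The example below is illustrative only: it assumes the PARAMETERS column is a space-separated string, and `has_params` is a hypothetical helper rather than the argopy implementation.

```python
def has_params(row_parameters: str, wanted, logical: str = "and") -> bool:
    """Check whether a row lists all ("and") or any ("or") of the wanted parameters.

    Hypothetical helper, not part of the argopy API.
    """
    available = set(row_parameters.split())
    wanted = {wanted} if isinstance(wanted, str) else set(wanted)
    if logical == "and":
        return wanted <= available  # every wanted parameter is present
    if logical == "or":
        return not wanted.isdisjoint(available)  # at least one is present
    raise ValueError("logical must be 'and' or 'or'")


# Hypothetical PARAMETERS values for two profiles
with_both = "PRES TEMP PSAL DOXY CDOM"
doxy_only = "PRES TEMP PSAL DOXY"

print(has_params(with_both, ["DOXY", "CDOM"]))
print(has_params(doxy_only, ["DOXY", "CDOM"]))
print(has_params(doxy_only, ["DOXY", "CDOM"], logical="or"))
```

This is consistent with the match counts above: the "or" search can only return at least as many profiles as the "and" search.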
Syntax for ArgoIndex.search_parameter_data_mode()
In [13]: from argopy import ArgoIndex
In [14]: idx = ArgoIndex(index_file='bgc-b').load()
In [15]: idx
Out[15]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: False
You can search one mode for a single parameter:
In [16]: idx.search_parameter_data_mode({'BBP700': 'D'})
Out[16]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (17529 matches, 5.0933%)
You can search several modes for a single parameter:
In [17]: idx.search_parameter_data_mode({'DOXY': ['R', 'A']})
Out[17]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (114010 matches, 33.1274%)
You can search for several parameters, each with its own data mode:
In [18]: idx.search_parameter_data_mode({'BBP700': 'D', 'DOXY': 'D'}, logical='and')
Out[18]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (11292 matches, 3.2811%)
And mix all of these as you wish:
In [19]: idx.search_parameter_data_mode({'BBP700': ['R', 'A'], 'DOXY': 'D'}, logical='or')
Out[19]:
<argoindex.pandas>
Host: https://data-argo.ifremer.fr
Index: argo_bio-profile_index.txt.gz
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
In memory: True (344156 records)
Searched: True (262885 matches, 76.3854%)
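The query dictionaries used above can be read as one condition per parameter, combined with "and" or "or". The sketch below illustrates that reading on a single hypothetical row. It assumes that PARAMETERS is space-separated, that PARAMETER_DATA_MODE has one character per parameter, and that "and" is the default; `matches_data_modes` is a hypothetical helper, not the argopy implementation.

```python
def matches_data_modes(parameters, parameter_data_mode, query, logical="and"):
    """Evaluate a {parameter: mode or list of modes} query against one row.

    Hypothetical helper, not part of the argopy API.
    """
    # Align parameter names with their data mode characters
    modes = dict(zip(parameters.split(), parameter_data_mode))
    checks = []
    for name, wanted in query.items():
        allowed = {wanted} if isinstance(wanted, str) else set(wanted)
        # A parameter absent from the row never satisfies its condition
        checks.append(modes.get(name) in allowed)
    return all(checks) if logical == "and" else any(checks)


# Hypothetical row: BBP700 is in adjusted mode, DOXY in delayed mode
params, data_modes = "PRES BBP700 DOXY", "RAD"

print(matches_data_modes(params, data_modes, {"BBP700": "D"}))
print(matches_data_modes(params, data_modes, {"BBP700": ["R", "A"], "DOXY": "D"}))
print(matches_data_modes(params, data_modes, {"BBP700": "D", "DOXY": "D"}, logical="or"))
```

A list of modes for one parameter acts as an "any of these modes" condition, while the `logical` argument controls how the per-parameter conditions are combined.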