Argo meta-data#

Index of profiles#

Since the Argo measurements dataset is quite complex, it comes with a collection of index files, that is, lookup tables holding metadata. These index files help you determine what to expect before retrieving the full set of measurements.
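
To make the idea of an index file concrete, here is a minimal sketch that reads a few lines laid out like a GDAC profile index using only the Python standard library. The column layout is assumed from the GDAC profile index file, the two data rows are made up, and the helper name `read_index` is hypothetical; this is not how argopy itself reads the file.

```python
import csv
import io

# Made-up sample laid out like a GDAC profile index file (assumed layout):
# comment lines start with '#', then a CSV header, then one row per profile.
SAMPLE = """\
# Title : Profile directory file (illustrative sample, not real data)
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
dac/0000001/profiles/R0000001_001.nc,20070801120000,42.5,-57.0,A,846,XX,20200101000000
dac/0000001/profiles/R0000001_002.nc,20070811120000,42.9,-56.2,A,846,XX,20200101000000
"""


def read_index(text):
    """Return the index rows as a list of dicts, skipping '#' comment lines."""
    lines = [ln for ln in io.StringIO(text) if not ln.startswith("#")]
    return list(csv.DictReader(lines))


rows = read_index(SAMPLE)
print(len(rows), "profiles; first file:", rows[0]["file"])
```

Each row describes one profile file, so a quick scan of the index tells you how many profiles to expect and where they are before any measurement is downloaded.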

argopy provides two methods to work with Argo index files: one is high-level and works like the data fetcher, the other is low-level and works like a "store".

Fetcher: High-level Argo index access#

argopy has a specific fetcher for index files:

In [1]: from argopy import IndexFetcher as ArgoIndexFetcher

You can use the Index fetcher with the region or float access points, similarly to data fetching:

In [2]: idx = ArgoIndexFetcher(src='gdac').float(2901623).load()

In [3]: idx.index
                                       file  ...                             profiler
0    nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
1   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
2    nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
3    nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
4    nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
..                                      ...  ...                                  ...
93   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
94   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
95   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
96   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor
97   nmdis/2901623/profiles/  ...  Provor, Seabird conductivity sensor

[98 rows x 11 columns]

Alternatively, you can use argopy.IndexFetcher.to_dataframe():

In [4]: idx = ArgoIndexFetcher(src='gdac').float(2901623)

In [5]: df = idx.to_dataframe()

The difference is that with the load method, data are stored in memory and not fetched on every call to the index attribute.
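
The load-versus-fetch distinction can be illustrated with a small, self-contained sketch. The class `CachedIndex` and the function `fake_fetch` are hypothetical stand-ins written for this example; they only mimic the behaviour described above and say nothing about how argopy implements it internally.

```python
class CachedIndex:
    """Hypothetical stand-in for a fetcher, used only to illustrate caching."""

    def __init__(self, fetch):
        self._fetch = fetch  # callable performing the (expensive) retrieval
        self._cache = None   # filled by load()

    def to_dataframe(self):
        # Always triggers a new retrieval
        return self._fetch()

    def load(self):
        # Retrieve once and keep the result in memory
        if self._cache is None:
            self._cache = self._fetch()
        return self

    @property
    def index(self):
        # Served from memory once the data have been loaded
        return self.load()._cache


calls = []


def fake_fetch():
    calls.append(1)  # count how many retrievals actually happen
    return ["row-1", "row-2"]


idx = CachedIndex(fake_fetch).load()  # one retrieval
first, second = idx.index, idx.index  # no new retrieval
calls_after_index = len(calls)
idx.to_dataframe()                    # new retrieval
idx.to_dataframe()                    # and another one
print(calls_after_index, len(calls))
```

In this sketch, repeated reads of `index` reuse the object kept in memory, while each `to_dataframe()` call pays the retrieval cost again.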

The index fetcher has pretty much the same methods as the data fetchers. You can check them all here: argopy.fetchers.ArgoIndexFetcher.

Store: Low-level Argo Index access#

The IndexFetcher shown above is a user-friendly layer on top of our internal Argo index file store. But if you are familiar with Argo index files and/or care about performance, you may be interested in using the Argo index store ArgoIndex directly.

If Pyarrow is installed, this store relies on pyarrow.Table as the internal storage format for the index; otherwise it falls back on pandas.DataFrame. Loading the full Argo profile index takes about 2 to 3 seconds with Pyarrow, while it can take up to 6 to 7 seconds with Pandas.
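
A common way to implement this kind of optional-dependency fallback is to probe for the preferred package first. The sketch below uses only the standard library; the function name `pick_backend` is hypothetical and the logic is an assumption about the general pattern, not a copy of argopy's code.

```python
import importlib.util


def pick_backend():
    """Return the name of the first available table backend, or None."""
    for name in ("pyarrow", "pandas"):  # preferred backend first
        if importlib.util.find_spec(name) is not None:
            return name
    return None


backend = pick_backend()
print("index storage backend:", backend)
```

`importlib.util.find_spec` checks whether a top-level package can be imported without actually importing it, which keeps the probe cheap.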

All index store methods and properties are fully documented in ArgoIndex.


You create an index store with default or custom options:

In [6]: from argopy import ArgoIndex

In [7]: idx = ArgoIndex()

# or:
# ArgoIndex(index_file="argo_bio-profile_index.txt")
# ArgoIndex(host="")
# ArgoIndex(host="", index_file="ar_index_global_prof.txt")
# ArgoIndex(host="", index_file="ar_index_global_prof.txt", cache=True)

You can then trigger loading of the index content:

In [8]: idx.load()  # Load the full index in memory
Here is the list of methods and properties of the full index:

idx.load(nrows=12)  # Only load the first N rows of the index
idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
idx.to_dataframe(index=True)  # Convert index to a user-friendly pandas.DataFrame
idx.to_dataframe(index=True, nrows=2)  # Only returns the first nrows of the index
idx.index  # Internal storage structure of the full index (pyarrow.Table or pandas.DataFrame)
idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'

There are several methods to search the index, for instance:

In [9]: idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])
Here is the list of all methods to search the index:

idx.search_wmo_cyc(1901393, [1,12])
idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition, only time is used
idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition, only lat/lon is used
idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
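
To clarify how one index BOX definition can drive the three box searches, here is a minimal sketch of the filtering logic. It assumes the box is ordered as `[lon_min, lon_max, lat_min, lat_max, date_min, date_max]` and that bounds are inclusive; the function `in_box`, its `use_space`/`use_time` switches and the three rows are made up for illustration and are not part of argopy.

```python
from datetime import datetime


def in_box(row, box, use_space=True, use_time=True):
    """Check one index row against [lon_min, lon_max, lat_min, lat_max, t_min, t_max]."""
    lon_min, lon_max, lat_min, lat_max, t_min, t_max = box
    ok = True
    if use_space:
        ok = ok and lon_min <= row["longitude"] <= lon_max
        ok = ok and lat_min <= row["latitude"] <= lat_max
    if use_time:
        t0, t1 = datetime.fromisoformat(t_min), datetime.fromisoformat(t_max)
        ok = ok and t0 <= row["date"] <= t1
    return ok


BOX = [-60, -55, 40., 45., '2007-08-01', '2007-09-01']

# Made-up rows for illustration
rows = [
    {"longitude": -57.0, "latitude": 42.5, "date": datetime(2007, 8, 15)},  # inside in space and time
    {"longitude": -57.0, "latitude": 42.5, "date": datetime(2008, 1, 1)},   # inside in space only
    {"longitude": -30.0, "latitude": 42.5, "date": datetime(2007, 8, 15)},  # inside in time only
]

n_space_time = sum(in_box(r, BOX) for r in rows)           # like a lat/lon/time search
n_space = sum(in_box(r, BOX, use_time=False) for r in rows)  # like a lat/lon search
n_time = sum(in_box(r, BOX, use_space=False) for r in rows)  # like a time search
print(n_space_time, n_space, n_time)
```

Passing the same six-element box to every search keeps the call signatures uniform; each search simply ignores the part of the box it does not need.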

And finally the list of methods and properties for search results:

idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
idx.to_dataframe()  # Convert search results to a user-friendly pandas.DataFrame
idx.to_dataframe(nrows=2)  # Only returns the first nrows of the search results
idx.to_indexfile("search_index.txt")  # Export search results to Argo standard index file
idx.search  # Internal table with search results
idx.uri  # List of absolute paths to files from the search results table column 'file'


The argopy index store supports the Bio and Synthetic Profile directory files:

In [10]: idx = ArgoIndex(index_file="argo_bio-profile_index.txt").load()

# idx = ArgoIndex(index_file="argo_synthetic-profile_index.txt").load()
In [11]: idx
Index: argo_bio-profile_index.txt
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
Loaded: True (288978 records)
Searched: False

This BGC index store comes with an additional search possibility for parameters:

In [12]: idx.search_params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])
Index: argo_bio-profile_index.txt
Convention: argo_bio-profile_index (Bio-Profile directory file of the Argo GDAC)
Loaded: True (288978 records)
Searched: True (38271 matches, 13.2436%)
In [13]: idx.to_dataframe()
                                          file  ... profiler
0       bodc/3901496/profiles/  ...  Unknown
1       bodc/3901496/profiles/  ...  Unknown
2       bodc/3901496/profiles/  ...  Unknown
3       bodc/3901496/profiles/  ...  Unknown
4       bodc/3901496/profiles/  ...  Unknown
...                                        ...  ...      ...
38266  csiro/7900947/profiles/  ...  Unknown
38267  csiro/7900947/profiles/  ...  Unknown
38268  csiro/7900947/profiles/  ...  Unknown
38269  csiro/7900947/profiles/  ...  Unknown
38270  csiro/7900947/profiles/  ...  Unknown

[38271 rows x 13 columns]
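
As a rough illustration of what a parameter search involves, the sketch below matches a list of wanted parameter names against a space-separated parameter string per profile. The column name `parameters`, its space-separated layout, the helper `has_params` and the sample rows are all assumptions made for this example. Whether argopy combines the requested parameters with "and" or "or" is not stated here, so the sketch exposes both.

```python
def has_params(row_parameters, wanted, logical="and"):
    """Test a space-separated parameter string against a list of wanted names."""
    available = set(row_parameters.split())
    hits = [p in available for p in wanted]
    return all(hits) if logical == "and" else any(hits)


# Made-up rows for illustration
rows = [
    {"file": "profile-a", "parameters": "PRES TEMP_DOXY C1PHASE_DOXY DOXY"},
    {"file": "profile-b", "parameters": "PRES C1PHASE_DOXY DOXY DOWNWELLING_PAR"},
    {"file": "profile-c", "parameters": "PRES CHLA BBP700"},
]

wanted = ['C1PHASE_DOXY', 'DOWNWELLING_PAR']
match_all = [r["file"] for r in rows if has_params(r["parameters"], wanted, "and")]
match_any = [r["file"] for r in rows if has_params(r["parameters"], wanted, "or")]
print(match_all, match_any)
```

Splitting the string into a set before testing avoids false positives from substring matches, for example `DOXY` inside `C1PHASE_DOXY`.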

Index file supported#

The table below summarizes the argopy support status of all Argo index files:

Table: argopy GDAC index file support status (first column: Index file)

Index file support can be added on demand: raise an issue if you'd like to access other index files.

Reference tables#

The Argo netCDF format is strict and based on a collection of fully documented, standardized variables. All reference tables can be found in the Argo user manual.

However, machine-to-machine access to these tables is often required. This is possible thanks to the work of the Argo Vocabulary Task Team (AVTT), the team responsible for the NVS collections under the governance of the Argo Data Management Team.


The GitHub organization hosting the AVTT is the 'NERC Vocabulary Server (NVS)', aka 'nvs-vocabs'. It holds a list of NVS collection-specific GitHub repositories. Each Argo GitHub repository is named after its corresponding collection ID (e.g. R01, RR2, R03, etc.). The current list is given here.

Issues related to vocabularies managed by the Argo Data Management Team are handled on this repository.

argopy provides the utility class ArgoNVSReferenceTables to easily fetch and get access to all Argo reference tables. If you already know the name of the reference table you want to retrieve, you can simply get it like this:

In [14]: from argopy import ArgoNVSReferenceTables

In [15]: NVS = ArgoNVSReferenceTables()

In [16]: NVS.tbl('R01')
  altLabel  ...                                                 id
0    BPROF  ...
1    BTRAJ  ...
2     META  ...
3    MPROF  ...
4    MTRAJ  ...
5     PROF  ...
6    SPROF  ...
7     TECH  ...
8     TRAJ  ...

[9 rows x 5 columns]

The reference table is returned as a pandas.DataFrame. If you want the exact name of this table:

In [17]: NVS.tbl_name('R01')
 'Terms describing the type of data contained in an Argo netCDF file. Argo netCDF variable DATA_TYPE is populated by R01 prefLabel.',

If you're looking for the ID to use for a specific reference table, you can check it in the list of all available tables given by the ArgoNVSReferenceTables.all_tbl_name property. It returns a dictionary with table IDs as keys and the table name, definition and NVS link as values. Use the ArgoNVSReferenceTables.all_tbl property to retrieve all tables.

In [18]: NVS.all_tbl_name
               'Terms describing the type of data contained in an Argo netCDF file. Argo netCDF variable DATA_TYPE is populated by R01 prefLabel.',
               'Terms describing individual measured phenomena, used to mark up sets of data in Argo netCDF arrays. Argo netCDF variables PARAMETER and TRAJECTORY_PARAMETERS are populated by R03 altLabel; R03 altLabel is also used to name netCDF profile files parameter variables <PARAMETER>.',
               'Codes for data centres and institutions handling or managing Argo data. Argo netCDF variable DATA_CENTRE is populated by R04 altLabel.',
               'Accuracy in latitude and longitude measurements received from the positioning system, grouped by location accuracy classes.',
               'Processing stage of the data based on the concatenation of processing level and class indicators. Argo netCDF variable DATA_STATE_INDICATOR is populated by R06 altLabel.',
               'Coded history information for each action performed on each profile by a data centre. Argo netCDF variable HISTORY_ACTION is populated by R07 altLabel.',
               "Subset of instrument type codes from the World Meteorological Organization (WMO) Common Code Table C-3 (CCT C-3) 1770, named 'Instrument make and type for water temperature profile measurement with fall rate equation coefficients' and available here: Argo netCDF variable WMO_INST_TYPE is populated by R08 altLabel.",
               'List of float location measuring systems. Argo netCDF variable POSITIONING_SYSTEM is populated by R09 altLabel.',
               'List of telecommunication systems. Argo netCDF variable TRANS_SYSTEM is populated by R10 altLabel.',
               'List of real-time quality-control tests and corresponding binary identifiers, used as reference to populate the Argo netCDF HISTORY_QCTEST variable.',
               'Data processing step codes for history record. Argo netCDF variable TRANS_SYSTEM is populated by R12 altLabel.',
               'Ocean area codes assigned to each profile in the Metadata directory (index) file of the Argo Global Assembly Centre.',
               'Measurement code IDs used in Argo Trajectory netCDF files. Argo netCDF variable MEASUREMENT_CODE is populated by R15 altLabel.',
               'Profile sampling schemes and sampling methods. Argo netCDF variable VERTICAL_SAMPLING_SCHEME is populated by R16 altLabel.',
               'Flag scale for values in all Argo netCDF cycle timing variables. Argo netCDF cycle timing variables JULD_<RTV>_STATUS are populated by R19 altLabel.',
               'Codes to indicate the best estimate of whether the float touched the ground during a specific cycle. Argo netCDF variable GROUNDED in the Trajectory file is populated by R20 altLabel.',
               'Argo status flag on the Representative Park Pressure (RPP). Argo netCDF variable REPRESENTATIVE_PARK_PRESSURE_STATUS in the Trajectory file is populated by R21 altLabel.',
               'List of platform family/category of Argo floats. Argo netCDF variable PLATFORM_FAMILY is populated by R22 altLabel.',
               'List of Argo float types. Argo netCDF variable PLATFORM_TYPE is populated by R23 altLabel.',
               'List of Argo float manufacturers. Argo netCDF variable PLATFORM_MAKER is populated by R24 altLabel.',
               'Terms describing sensor types mounted on Argo floats. Argo netCDF variable SENSOR is populated by R25 altLabel.',
               'Terms describing developers and manufacturers of sensors mounted on Argo floats. Argo netCDF variable SENSOR_MAKER is populated by R26 altLabel.',
               'Terms listing models of sensors mounted on Argo floats. Note: avoid using the manufacturer name and sensor firmware version in new entries when possible. Argo netCDF variable SENSOR_MODEL is populated by R27 altLabel.',
               "Quality flag scale for delayed-mode measurements. Argo netCDF variables <PARAMETER>_ADJUSTED_QC in 'D' mode are populated by RD2 altLabel.",
               "Categories of trajectory measurement codes listed in NVS collection 'R15'",
               'Quality control flag scale for whole profiles. Argo netCDF variables PROFILE_<PARAMETER>_QC are populated by RP2 altLabel.',
               "Quality flag scale for real-time measurements. Argo netCDF variables <PARAMETER>_QC in 'R' mode and <PARAMETER>_ADJUSTED_QC in 'A' mode are populated by RR2 altLabel.",
               "Timing variables representing stages of an Argo float profiling cycle, most of which are associated with a trajectory measurement code ID listed in NVS collection 'R15'. Argo netCDF cycle timing variable names JULD_<RTV>_STATUS are constructed by RTV altLabel.",

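When the table ID is unknown, a keyword lookup over such a dictionary is often enough. The sketch below is a minimal illustration: the three definitions are copied from the listing above, but the dictionary layout is simplified to table ID and definition only, and the helper `find_tables` is hypothetical rather than an argopy function.

```python
# Three definitions copied from the listing above; the layout is simplified
# (table ID -> definition only) for illustration.
TABLES = {
    "R01": "Terms describing the type of data contained in an Argo netCDF file. "
           "Argo netCDF variable DATA_TYPE is populated by R01 prefLabel.",
    "R09": "List of float location measuring systems. "
           "Argo netCDF variable POSITIONING_SYSTEM is populated by R09 altLabel.",
    "R10": "List of telecommunication systems. "
           "Argo netCDF variable TRANS_SYSTEM is populated by R10 altLabel.",
}


def find_tables(tables, keyword):
    """Return the sorted IDs of tables whose definition contains the keyword."""
    kw = keyword.lower()
    return sorted(tid for tid, text in tables.items() if kw in text.lower())


positioning = find_tables(TABLES, "positioning")
systems = find_tables(TABLES, "systems")
print(positioning, systems)
```

The ID found this way can then be used to request the corresponding table.
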
Deployment Plan#

It may be useful to retrieve metadata from Argo deployments. argopy can use the OceanOPS API for metadata access to retrieve this information. The returned deployment plan is a list of all Argo floats ever deployed, together with their deployment location, date, WMO, program, country, float model and current status.

To fetch the Argo deployment plan, argopy provides a dedicated utility class, OceanOPSDeployments, that can be used like this:

In [19]: from argopy import OceanOPSDeployments

In [20]: deployment = OceanOPSDeployments()

In [21]: df = deployment.to_dataframe()

In [22]: df
                   date    lat     lon  ...      program country    model
0   2023-07-25 00:00:00  72.30 -134.00  ...  Argo CANADA  CANADA    ARVOR
1   2023-07-26 11:06:49  40.10   11.20  ...   Argo ITALY   ITALY    ARVOR
2   2023-07-28 00:00:00  73.00 -150.00  ...  Argo CANADA  CANADA    ARVOR
3   2023-07-30 00:00:00  43.42    7.89  ...     Coriolis  FRANCE    ARVOR
4   2023-07-30 00:00:00  40.00    6.99  ...     Coriolis  FRANCE    ARVOR
..                  ...    ...     ...  ...          ...     ...      ...
427 2024-12-31 13:49:07  47.80   -3.30  ...     Coriolis  FRANCE  ARVOR_D
428 2024-12-31 13:49:07  47.80   -3.30  ...     Coriolis  FRANCE  ARVOR_D
429 2024-12-31 13:49:07  47.80   -3.30  ...     Coriolis  FRANCE  ARVOR_D
430 2024-12-31 13:49:07  47.80   -3.30  ...     Coriolis  FRANCE  ARVOR_D
431 2024-12-31 13:49:07  47.80   -3.30  ...     Coriolis  FRANCE  ARVOR_D

[432 rows x 9 columns]

OceanOPSDeployments can also take an index box definition as argument in order to restrict the deployment plan selection to a specific region or period:

deployment = OceanOPSDeployments([-90, 0, 0, 90])
# deployment = OceanOPSDeployments([-20, 0, 42, 51, '2020-01', '2021-01'])
# deployment = OceanOPSDeployments([-180, 180, -90, 90, '2020-01', None])

Note that if the starting date is not provided, it will be set automatically to the current date.
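
The default-date rule can be sketched as a small normalization step. This is only an illustration of the behaviour described above: the function `normalize_box` is hypothetical, and treating a `None` end date as "no upper limit" is an assumption based on the last example box.

```python
from datetime import date


def normalize_box(box, today=None):
    """Return [lon_min, lon_max, lat_min, lat_max, start, end] with defaults filled in."""
    today = today or date.today()
    if len(box) == 4:
        # No dates given: start today, no end date
        return list(box) + [today.isoformat(), None]
    if len(box) == 6:
        start = box[4] if box[4] is not None else today.isoformat()
        return list(box[:4]) + [start, box[5]]
    raise ValueError("box must have 4 or 6 elements")


fixed_day = date(2024, 1, 2)  # fixed date so the example is reproducible
space_only = normalize_box([-90, 0, 0, 90], today=fixed_day)
with_dates = normalize_box([-20, 0, 42, 51, '2020-01', '2021-01'], today=fixed_day)
print(space_only)
print(with_dates)
```

With a space-only box, the selection therefore covers deployments planned from today onward.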

Last, OceanOPSDeployments comes with a plotting method:

fig, ax = deployment.plot_status()


The list of possible deployment status names/codes is given by:

Starting status for some platforms, when only a few metadata are available, such as a rough deployment location and date. The platform may be deployed.

Automatically set when a ship is attached to the deployment information. The platform is ready to be deployed and deployment is planned.

Starting status for most of the networks, when deployment planning is not done. The deployment is certain, and a notification has been sent via the OceanOPS system.

Automatically set when the platform is emitting a pulse and observations are distributed within a certain time interval.

The platform has not emitted a pulse for a certain time.

The platform has not emitted a pulse for a long time; it is considered dead.

ADMT Documentation#

More than 20 PDF manuals have been produced by the Argo Data Management Team. Using the ArgoDocs class, it's easy to navigate this great database.

If you don't know where to start, you can simply list all available documents:

In [23]: from argopy import ArgoDocs

In [24]: ArgoDocs().list
             category  ...     id
0   Argo data formats  ...  29825
1     Quality control  ...  33951
2     Quality control  ...  46542
3     Quality control  ...  40879
4     Quality control  ...  35385
5     Quality control  ...  84370
6     Quality control  ...  62466
7           Cookbooks  ...  41151
8           Cookbooks  ...  29824
9           Cookbooks  ...  78994
10          Cookbooks  ...  39795
11          Cookbooks  ...  39459
12          Cookbooks  ...  39468
13          Cookbooks  ...  47998
14          Cookbooks  ...  54541
15          Cookbooks  ...  46121
16          Cookbooks  ...  51541
17          Cookbooks  ...  57195
18          Cookbooks  ...  46120
19          Cookbooks  ...  52154
20          Cookbooks  ...  55637
21          Cookbooks  ...  46202

[22 rows x 4 columns]

Or search for a word in the title and/or abstract:

In [25]: results = ArgoDocs().search("oxygen")

In [26]: for docid in results:
   ....:     print("\n", ArgoDocs(docid))

Then, using the Argo DOI number of a document, you can easily retrieve it:

In [27]: ArgoDocs(35385)
and open it in your browser:

# ArgoDocs(35385).show()
# ArgoDocs(35385).open_pdf(page=12)