Data sources#
Let’s start with the standard import:
In [1]: import argopy
In [2]: from argopy import DataFetcher as ArgoDataFetcher
Available data sources#
argopy can get access to Argo data from the following sources:
- the Ifremer erddap server.
The erddap server database is updated daily and doesn’t require you to download any more data than you need. You can select this data source with the keyword erddap and the methods described below. The Ifremer erddap dataset is based on the mono-profile files of the GDAC. Since this is the most efficient way to fetch Argo data, it’s the default data source in argopy.
- an Argo GDAC server or any other GDAC-compliant folders.
You can fetch data from any of the 3 official GDAC online servers: the Ifremer https and ftp servers and the US ftp server. This data source can also point to your own local copy of the GDAC ftp content. You can select this data source with the keyword gdac and the methods described below.
- the Argovis server.
The Argovis server database is updated daily and only provides access to curated Argo data (QC=1 only). You can select this data source with the keyword argovis and the methods described below.
Selecting a source#
You have several ways to specify which data source you want to use:
using argopy global options:
In [3]: argopy.set_options(src='erddap')
Out[3]: <argopy.options.set_options at 0x7f077cbe5130>
in a temporary context:
In [4]: with argopy.set_options(src='erddap'):
...: loader = ArgoDataFetcher().profile(6902746, 34)
...:
with an argument in the data fetcher:
In [5]: loader = ArgoDataFetcher(src='erddap').profile(6902746, 34)
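The three forms above differ only in scope: global, temporary, or per call. As a rough illustration of how one options object can act both as a global setter and as a temporary context, here is a minimal pure-Python sketch. The names OPTIONS, set_options and resolve_src are hypothetical stand-ins written for this example and are not argopy's implementation; the precedence shown (an explicit argument wins over the current option) is an assumption consistent with the usage above.

```python
# Toy model of scoped options. NOT argopy's code: all names are illustrative.
OPTIONS = {"src": "erddap"}  # erddap is the default source according to this page


class set_options:
    """Apply options immediately; restore the previous values on context exit."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"unknown option(s): {sorted(unknown)}")
        # Remember the values being replaced so that __exit__ can restore them.
        self._old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        OPTIONS.update(self._old)
        return False  # never swallow exceptions


def resolve_src(src=None):
    """An explicit argument wins; otherwise use the current option."""
    return src if src is not None else OPTIONS["src"]


set_options(src="gdac")               # global: stays in effect afterwards
with set_options(src="argovis"):      # temporary: only inside the block
    inside = resolve_src()
after = resolve_src()                 # back to the global value
explicit = resolve_src(src="erddap")  # per-call argument
print(inside, after, explicit)
```

In this sketch the temporary value disappears as soon as the with block ends, which is why a fetcher created inside the block is the only one affected by it.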
Comparing data sources#
Features#
Each of the available data sources has its own features and capabilities. Here is a summary:
Data source: | erddap | gdac | argovis |
---|---|---|---|
Access Points | | | |
region | X | X | X |
float | X | X | X |
profile | X | X | X |
User mode | | | |
standard | X | X | X |
expert | X | X | |
Dataset | | | |
core (T/S) | X | X | X |
BGC | | | |
Reference data for DMQC | X | | |
Parallel method | | | |
multi-threading | X | X | X |
multi-processes | | | |
Dask client | | | |
Offline mode | | X * | |

* Only when used with a local copy of the GDAC folder.
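When a script has to pick a source programmatically, the summary above can be encoded as plain data. The sketch below transcribes a subset of the rows into a dictionary and looks up which sources support a given set of features. The FEATURES mapping, its labels and the sources_supporting helper are hypothetical names made up for this example; they are not part of argopy.

```python
# Subset of the capability summary above, transcribed by hand.
# Hypothetical structure for illustration; argopy exposes no such object here.
FEATURES = {
    "erddap": {"region", "float", "profile", "standard", "expert",
               "core", "reference data for DMQC", "multi-threading"},
    "gdac": {"region", "float", "profile", "standard", "expert",
             "core", "multi-threading"},
    "argovis": {"region", "float", "profile", "standard",
                "core", "multi-threading"},
}


def sources_supporting(*features):
    """Return the sorted names of the sources offering all requested features."""
    wanted = set(features)
    return sorted(src for src, feats in FEATURES.items() if wanted <= feats)


expert_sources = sources_supporting("expert")
dmqc_sources = sources_supporting("reference data for DMQC")
print(expert_sources)
print(dmqc_sources)
```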
Fetched data and variables#
Let’s retrieve the data of one float from a local sample of the GDAC ftp (a sample GDAC ftp is downloaded automatically by the argopy.tutorial.open_dataset() method):
# Download ftp sample and get the ftp local path:
In [6]: ftproot = argopy.tutorial.open_dataset('gdac')[0]
# then fetch data:
In [7]: with argopy.set_options(src='gdac', ftp=ftproot):
...: ds = ArgoDataFetcher().float(1900857).load().data
...: print(ds)
...:
<xarray.Dataset>
Dimensions: (N_POINTS: 20966)
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 ... 20962 20963 20964 20965
TIME (N_POINTS) datetime64[ns] 2008-02-25T04:03:00 ... ...
LATITUDE (N_POINTS) float64 -39.93 -39.93 ... -44.16 -44.16
LONGITUDE (N_POINTS) float64 10.81 10.81 10.81 ... 92.65 92.65
Data variables: (12/13)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2
CYCLE_NUMBER (N_POINTS) int32 0 0 0 0 0 0 ... 192 192 192 192 192
DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 1900857 1900857 ... 1900857 1900857
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
... ...
PRES_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
PSAL (N_POINTS) float64 34.68 34.68 34.69 ... 34.71 34.72
PSAL_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TEMP (N_POINTS) float64 16.14 16.14 16.03 ... 2.422 2.413
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: /home/docs/.argopy_tutorial_data/ftp
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: phy;WMO1900857
Fetched_uri: /home/docs/.argopy_tutorial_data/ftp/dac/coriolis/1...
history: Variables filtered according to DATA_MODE; Variable...
Let’s now retrieve the latest data for this float from the erddap and argovis sources:
In [8]: with argopy.set_options(src='erddap'):
...: ds = ArgoDataFetcher().float(1900857).load().data
...: print(ds)
...:
<xarray.Dataset>
Dimensions: (N_POINTS: 20966)
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 ... 20962 20963 20964 20965
LATITUDE (N_POINTS) float64 -39.93 -39.93 ... -44.16 -44.16
LONGITUDE (N_POINTS) float64 10.81 10.81 10.81 ... 92.65 92.65
TIME (N_POINTS) datetime64[ns] 2008-02-25T04:03:00 ... ...
Data variables: (12/13)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2
CYCLE_NUMBER (N_POINTS) int32 0 0 0 0 0 0 ... 192 192 192 192 192
DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 1900857 1900857 ... 1900857 1900857
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
... ...
PRES_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
PSAL (N_POINTS) float64 34.68 34.68 34.69 ... 34.71 34.72
PSAL_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TEMP (N_POINTS) float64 16.14 16.14 16.03 ... 2.422 2.413
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: phy;WMO1900857
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
In [9]: with argopy.set_options(src='argovis'):
...: ds = ArgoDataFetcher().float(1900857).load().data
...: print(ds)
...:
<xarray.Dataset>
Dimensions: (N_POINTS: 21029)
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 4 ... 21025 21026 21027 21028
TIME (N_POINTS) datetime64[ns] 2008-02-28T01:23:00 ... 2013-0...
LATITUDE (N_POINTS) float64 -40.02 -40.02 -40.02 ... -44.16 -44.16
LONGITUDE (N_POINTS) float64 10.54 10.54 10.54 ... 92.65 92.65 92.65
Data variables:
CYCLE_NUMBER (N_POINTS) int32 0 0 0 0 0 0 0 ... 192 192 192 192 192 192
DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D' 'D'
DIRECTION (N_POINTS) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 1900857 1900857 ... 1900857 1900857
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
PRES (N_POINTS) int64 16 26 37 45 55 ... 1913 1938 1964 1987
PSAL (N_POINTS) float64 34.74 34.73 34.67 ... 34.71 34.71 34.72
TEMP (N_POINTS) float64 16.69 16.59 15.92 ... 2.431 2.422 2.413
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://argovisbeta02.colorado.edu
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: phy;WMO1900857
Fetched_uri: ['https://argovisbeta02.colorado.edu/catalog/platfo...
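We can see some minor differences between the gdac/erddap responses and the argovis response: the number of points differs (20966 vs 21029), and the argovis dataset has no CONFIG_MISSION_NUMBER, PRES_QC, PSAL_QC or TEMP_QC variables.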
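Such comparisons are easy to automate with set operations on the variable names. The sketch below uses the names copied by hand from the printed outputs above; compare_variables is a hypothetical helper written for this example. Note that the gdac/erddap printout hides one variable ("12/13"), so only the "missing from argovis" side of the comparison is meaningful with these hand-copied lists.

```python
def compare_variables(vars_a, vars_b):
    """Return (names only in a, names only in b), each sorted alphabetically."""
    a, b = set(vars_a), set(vars_b)
    return sorted(a - b), sorted(b - a)


# Variable names as printed above. The gdac/erddap output elides one of its
# 13 variables, so this list is knowingly incomplete.
gdac_vars = [
    "CONFIG_MISSION_NUMBER", "CYCLE_NUMBER", "DATA_MODE", "DIRECTION",
    "PLATFORM_NUMBER", "POSITION_QC", "PRES_QC", "PSAL", "PSAL_QC",
    "TEMP", "TEMP_QC", "TIME_QC",
]
argovis_vars = [
    "CYCLE_NUMBER", "DATA_MODE", "DIRECTION", "PLATFORM_NUMBER",
    "POSITION_QC", "PRES", "PSAL", "TEMP", "TIME_QC",
]

missing_from_argovis, _ = compare_variables(gdac_vars, argovis_vars)
print(missing_from_argovis)
```

With real datasets, the same helper can be called as compare_variables(ds_a.data_vars, ds_b.data_vars), since iterating over the data_vars of an xarray Dataset yields the variable names.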
Status of sources#
With remote, online data sources, it may happen that the data server is experiencing downtime. With local data sources, the availability of the path is checked when it is set. But it may happen that the path points to a disk that gets unmounted or unplugged after the option is set.
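For a local source, a quick sanity check just before fetching can catch the "disk unplugged after the option was set" case. The sketch below is a hypothetical helper, not the check argopy performs: it assumes that a usable local GDAC copy is a directory containing a "dac" sub-folder, as the paths shown elsewhere on this page suggest.

```python
import tempfile
from pathlib import Path


def local_gdac_available(root):
    """True if `root` is a directory that contains a 'dac' sub-directory.

    Assumption for this sketch: the local copy mirrors the GDAC layout,
    with a top-level 'dac' folder.
    """
    root = Path(root)
    return root.is_dir() and (root / "dac").is_dir()


# Demonstration with a temporary folder standing in for the local copy.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dac").mkdir()
    ok_while_present = local_gdac_available(tmp)

# The temporary folder is deleted here, which mimics an unmounted disk.
ok_after_removal = local_gdac_available(tmp)
print(ok_while_present, ok_after_removal)
```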
If you’re running your analysis in a Jupyter notebook, you can use the argopy.status() method to insert a data status monitor in a cell output. All available data sources will be monitored continuously.
argopy.status()

If one of the data sources becomes unavailable, the status bar changes to show it.

Note that the argopy.status() method has a refresh option that lets you specify the refresh rate of the monitoring, in seconds.
Finally, you can check the argopy status webpage, which monitors all the resources the software depends on.
Setting-up your own local copy of the GDAC ftp#
To fetch data with the gdac data source from a local copy of the GDAC ftp server, you must specify the path to that copy with the ftp option.
This is not an issue for expert users, but standard users may wonder how to set this up. The primary distribution point for Argo data, the only one with full support from data centers and with nearly 100% availability, is the GDAC ftp. Two mirror servers are available:
France Coriolis: ftp://ftp.ifremer.fr/ifremer/argo
US GODAE: ftp://usgodae.org/pub/outgoing/argo
If you want to get your own copy of the ftp server content, you have two options, detailed below.
Copy with DOI reference#
If you need an Argo database referenced with a DOI, one that you could use to make your analysis reproducible, then we recommend visiting https://doi.org/10.17882/42182. There, you will find links to monthly snapshots of the Argo database, and each snapshot has its own DOI.
For instance, https://doi.org/10.17882/42182#92121 points to the snapshot archived on February 10th 2022. Simply download the tar archive file (about 44 GB) and uncompress it locally.
You’re done!
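If you prefer to script the "uncompress it locally" step, the Python standard library is enough. The sketch below is only an illustration: extract_snapshot is a hypothetical helper, and the tiny archive built in the demonstration stands in for the real snapshot so that the example runs anywhere. For the real file, pass the path of the archive you downloaded.

```python
import tarfile
import tempfile
from pathlib import Path


def extract_snapshot(archive, dest):
    """Extract a tar archive (compressed or not) into `dest`; return `dest`."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive) as tar:  # compression is detected automatically
        if hasattr(tarfile, "data_filter"):
            # Safer extraction on Python versions that support filters.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest


# Demonstration with a tiny stand-in archive (NOT the real snapshot).
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    sample = tmp / "sample.txt"
    sample.write_text("argo")
    archive = tmp / "snapshot.tar.gz"  # hypothetical file name
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(sample, arcname="dac/sample.txt")
    out = extract_snapshot(archive, tmp / "gdac")
    extracted_text = (out / "dac" / "sample.txt").read_text()
print(extracted_text)
```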
Synchronized copy#
If you need a local Argo database always up to date with the GDAC server, Ifremer provides a nice rsync service. The rsync server “vdmzrs.ifremer.fr” provides a synchronization service between the “dac” directory of the GDAC and a user mirror. The “dac” index files are also available from “argo-index”.
From the user side, the rsync service:
Downloads the new files
Downloads the updated files
Removes the files that have been removed from the GDAC
Compresses/uncompresses the files during the transfer
Preserves the files creation/update dates
Lists all the files that have been transferred (easy to use for a user side post-processing)
To synchronize the whole dac directory of the Argo GDAC:
rsync -avzh --delete vdmzrs.ifremer.fr::argo/ /home/mydirectory/...
To synchronize the index:
rsync -avzh --delete vdmzrs.ifremer.fr::argo-index/ /home/mydirectory/...
Note
The first synchronisation of the whole dac directory of the Argo GDAC (365 GB) can take quite a long time (several hours).
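The two rsync commands above can also be assembled from Python, for example to schedule the synchronisation from a script. The sketch below only builds and prints the command; build_rsync_command and the destination path are hypothetical, chosen for this example. It adds rsync's --dry-run flag so that a first test transfers nothing.

```python
import shlex

RSYNC_SERVER = "vdmzrs.ifremer.fr"  # rsync server named in the text above


def build_rsync_command(module, destination, dry_run=False):
    """Build the rsync argument list for one module ('argo' or 'argo-index')."""
    cmd = ["rsync", "-avzh", "--delete"]
    if dry_run:
        cmd.append("--dry-run")  # list what would change without transferring
    cmd += [f"{RSYNC_SERVER}::{module}/", destination]
    return cmd


# "/path/to/local/mirror/" is a placeholder: replace it with your own folder.
dac_cmd = build_rsync_command("argo", "/path/to/local/mirror/", dry_run=True)
print(shlex.join(dac_cmd))

# To actually run it (requires rsync and network access):
#   import subprocess
#   subprocess.run(dac_cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting issues if the destination path contains spaces.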