Data sources

Let’s start with the standard imports:

In [1]: import argopy

In [2]: from argopy import DataFetcher as ArgoDataFetcher

Selecting a source

argopy can access Argo data from several sources:

  1. the Ifremer erddap server.

    The erddap server database is updated daily and doesn’t require you to download any more data than you need.
    You can select this data source with the keyword erddap and methods described below. The Ifremer erddap dataset is based on mono-profile files of the GDAC.
  2. your local collection of Argo files, organised as in the GDAC ftp.

    This is how you would use argopy with your data, as long as they are formatted and organised the Argo way.
    You can select this data source with the keyword localftp and methods described below.
  3. the Argovis server.

    The Argovis server database is updated daily and provides access to curated Argo data (QC=1 only). You can select this data source with the keyword argovis and methods described below.

You have several ways to specify which data source you want to use:

  • using argopy global options:

In [3]: argopy.set_options(src='erddap')
Out[3]: <argopy.options.set_options at 0x7fe1f1531310>
  • in a temporary context:

In [4]: with argopy.set_options(src='erddap'):
   ...:     loader = ArgoDataFetcher().profile(6902746, 34)
   ...: 
  • with an argument in the data fetcher:

In [5]: loader = ArgoDataFetcher(src='erddap').profile(6902746, 34)
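Because the source is just a keyword, it is easy to try several sources in turn. The following is a minimal sketch, not part of argopy: `first_working_source` is a hypothetical helper, and the `fetch` callable is supplied by the caller so that the logic can be read (and tested) without any network access.

```python
from typing import Callable, Iterable, Tuple, TypeVar

T = TypeVar("T")

# The three source keywords named in this section.
KNOWN_SOURCES = ("erddap", "localftp", "argovis")


def first_working_source(
    fetch: Callable[[str], T],
    sources: Iterable[str] = ("erddap", "argovis"),
) -> Tuple[str, T]:
    """Return (source, result) for the first source whose fetch succeeds."""
    errors = {}
    for src in sources:
        if src not in KNOWN_SOURCES:
            raise ValueError(f"unknown data source: {src!r}")
        try:
            return src, fetch(src)
        except Exception as exc:  # deliberately broad: this is only a sketch
            errors[src] = exc
    raise RuntimeError(f"no data source succeeded: {errors}")


# With argopy installed, `fetch` could be written as follows (not run here):
#   from argopy import DataFetcher as ArgoDataFetcher
#   fetch = lambda src: ArgoDataFetcher(src=src).profile(6902746, 34).to_xarray()
```

The helper uses the "argument in the data fetcher" form shown above, since it keeps the chosen source local to each attempt and leaves the global options untouched.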

Setting a local copy of the GDAC ftp

Fetching data with the localftp data source requires you to specify the path to your local copy of the GDAC ftp server with the local_ftp option.

This is not an issue for expert users, but standard users may wonder how to set this up. The primary distribution point for Argo data, the only one with full support from data centers and nearly 100% availability, is the GDAC ftp. Two mirror servers are available.

If you want to get your own copy of the ftp server content, Ifremer provides a nice rsync service. The rsync server “vdmzrs.ifremer.fr” provides a synchronization service between the “dac” directory of the GDAC and a user mirror. The “dac” index files are also available from “argo-index”.

From the user side, the rsync service:

  • Downloads the new files

  • Downloads the updated files

  • Removes the files that have been removed from the GDAC

  • Compresses/uncompresses the files during the transfer

  • Preserves the files creation/update dates

  • Lists all the files that have been transferred (easy to use for a user side post-processing)

To synchronize the whole dac directory of the Argo GDAC:

rsync -avzh --delete vdmzrs.ifremer.fr::argo/ /home/mydirectory/...

To synchronize the index:

rsync -avzh --delete vdmzrs.ifremer.fr::argo-index/ /home/mydirectory/...

Note

The first synchronisation of the whole dac directory of the Argo GDAC (365Gb) can take quite a long time (several hours).
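If you prefer to drive the synchronisation from Python, the two commands above can be assembled programmatically. This is a small sketch under stated assumptions: `build_rsync_command` is a hypothetical helper, the destination path is a placeholder you must replace with your own mirror directory, and the command is only built and printed, not executed.

```python
import shlex
from typing import List

RSYNC_HOST = "vdmzrs.ifremer.fr"  # rsync server named above


def build_rsync_command(module: str, destination: str) -> List[str]:
    """Build the rsync argument list for one module: "argo" or "argo-index"."""
    if module not in ("argo", "argo-index"):
        raise ValueError(f"unexpected rsync module: {module!r}")
    # Same flags as the commands above: archive, verbose, compress,
    # human-readable sizes, and delete files removed from the GDAC.
    return ["rsync", "-avzh", "--delete", f"{RSYNC_HOST}::{module}/", destination]


# Placeholder destination (an assumption, not a real path).
cmd = build_rsync_command("argo", "/path/to/local/mirror")
print(shlex.join(cmd))

# To actually run the transfer (this can take hours for the whole dac directory):
#   import subprocess
#   subprocess.run(cmd, check=True)
```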

Comparing data sources

Features

Each of the available data sources has its own features and capabilities. Here is a summary:

Data source:               erddap   localftp   argovis

Access Points
  region                   X        X          X
  float                    X        X          X
  profile                  X        X          X
User mode
  standard                 X        X          X
  expert                   X        X
Dataset
  core (T/S)               X        X          X
  BGC
  Reference data for DMQC  X
Parallel method
  multi-threading          X        X          X
  multi-processes                   X
  Dask client
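
The summary above can also be read as a lookup table. The sketch below encodes it as a plain dictionary and lists the sources that satisfy a set of requirements. It is illustrative only and does not query argopy. For the partially filled rows, the column placement is an assumption: expert for erddap and localftp, reference data for erddap, multi-processes for localftp.

```python
SOURCES = ("erddap", "localftp", "argovis")
ALL = set(SOURCES)

# Feature -> set of sources marked "X" in the summary.
# Placement of the partially filled rows is an assumption (see lead-in).
FEATURES = {
    "region": ALL,
    "float": ALL,
    "profile": ALL,
    "standard": ALL,
    "expert": {"erddap", "localftp"},
    "core (T/S)": ALL,
    "BGC": set(),
    "Reference data for DMQC": {"erddap"},
    "multi-threading": ALL,
    "multi-processes": {"localftp"},
    "Dask client": set(),
}


def sources_supporting(*features: str) -> list:
    """Return the sources (in table order) that support every given feature."""
    candidates = set(SOURCES)
    for feature in features:
        candidates &= FEATURES[feature]
    return [src for src in SOURCES if src in candidates]


print(sources_supporting("expert", "multi-threading"))
print(sources_supporting("multi-processes"))
```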

Fetched data and variables

You may wonder whether the fetched data differ from one data source to another.
This depends on when each data source, including your local copy, was last updated.

Let’s retrieve the data of one float from a local sample of the GDAC ftp (a sample GDAC ftp is downloaded automatically with the method argopy.tutorial.open_dataset()):

# Download ftp sample and get the ftp local path:
In [6]: ftproot = argopy.tutorial.open_dataset('localftp')[0]

# then fetch data:
In [7]: with argopy.set_options(src='localftp', local_ftp=ftproot):
   ...:     ds = ArgoDataFetcher().float(1900857).to_xarray()
   ...:     print(ds)
   ...: 
<xarray.Dataset>
Dimensions:                (N_POINTS: 20966)
Coordinates:
  * N_POINTS               (N_POINTS) int64 0 1 2 3 ... 20962 20963 20964 20965
    TIME                   (N_POINTS) datetime64[ns] 2008-02-25T04:03:00 ... ...
    LATITUDE               (N_POINTS) float64 -39.93 -39.93 ... -44.16 -44.16
    LONGITUDE              (N_POINTS) float64 10.81 10.81 10.81 ... 92.65 92.65
Data variables: (12/13)
    CONFIG_MISSION_NUMBER  (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2
    CYCLE_NUMBER           (N_POINTS) int64 0 0 0 0 0 0 ... 192 192 192 192 192
    DATA_MODE              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_POINTS) int64 1900857 1900857 ... 1900857 1900857
    POSITION_QC            (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ...                     ...
    PRES_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PSAL                   (N_POINTS) float64 34.68 34.68 34.69 ... 34.71 34.72
    PSAL_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP                   (N_POINTS) float64 16.14 16.14 16.03 ... 2.422 2.413
    TEMP_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         /home/docs/.argopy_tutorial_data/ftp
    Fetched_by:           docs
    Fetched_date:         2021/11/02
    Fetched_constraints:  phy;WMO1900857
    Fetched_uri:          /home/docs/.argopy_tutorial_data/ftp/dac/coriolis/1...
    history:              Variables filtered according to DATA_MODE; Variable...

Let’s now retrieve the latest data for this float from the erddap and argovis sources:

In [8]: with argopy.set_options(src='erddap'):
   ...:     ds = ArgoDataFetcher().float(1900857).to_xarray()
   ...:     print(ds)
   ...: 
<xarray.Dataset>
Dimensions:                (N_POINTS: 20966)
Coordinates:
  * N_POINTS               (N_POINTS) int64 0 1 2 3 ... 20962 20963 20964 20965
    LATITUDE               (N_POINTS) float64 -39.93 -39.93 ... -44.16 -44.16
    LONGITUDE              (N_POINTS) float64 10.81 10.81 10.81 ... 92.65 92.65
    TIME                   (N_POINTS) datetime64[ns] 2008-02-25T04:03:00 ... ...
Data variables: (12/13)
    CONFIG_MISSION_NUMBER  (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 2 2 2 2 2 2 2 2
    CYCLE_NUMBER           (N_POINTS) int64 0 0 0 0 0 0 ... 192 192 192 192 192
    DATA_MODE              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_POINTS) int64 1900857 1900857 ... 1900857 1900857
    POSITION_QC            (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ...                     ...
    PRES_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PSAL                   (N_POINTS) float64 34.68 34.68 34.69 ... 34.71 34.72
    PSAL_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP                   (N_POINTS) float64 16.14 16.14 16.03 ... 2.422 2.413
    TEMP_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC                (N_POINTS) int64 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://www.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2021/11/02
    Fetched_constraints:  phy;WMO1900857
    Fetched_uri:          ['https://www.ifremer.fr/erddap/tabledap/ArgoFloats...
    history:              Variables filtered according to DATA_MODE; Variable...
In [9]: with argopy.set_options(src='argovis'):
   ...:     ds = ArgoDataFetcher().float(1900857).to_xarray()
   ...:     print(ds)
   ...: 
<xarray.Dataset>
Dimensions:          (N_POINTS: 21029)
Coordinates:
  * N_POINTS         (N_POINTS) int64 0 1 2 3 4 ... 21025 21026 21027 21028
    TIME             (N_POINTS) object '2008-02-28T01:23:00.000Z' ... '2013-0...
    LATITUDE         (N_POINTS) float64 -40.02 -40.02 -40.02 ... -44.16 -44.16
    LONGITUDE        (N_POINTS) float64 10.54 10.54 10.54 ... 92.65 92.65 92.65
Data variables:
    CYCLE_NUMBER     (N_POINTS) int64 0 0 0 0 0 0 0 ... 192 192 192 192 192 192
    DATA_MODE        (N_POINTS) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) int64 1900857 1900857 ... 1900857 1900857
    POSITION_QC      (N_POINTS) int64 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
    PRES             (N_POINTS) int64 16 26 37 45 55 ... 1913 1938 1964 1987
    PSAL             (N_POINTS) float64 34.74 34.73 34.67 ... 34.71 34.71 34.72
    TEMP             (N_POINTS) float64 16.69 16.59 15.92 ... 2.431 2.422 2.413
    TIME_QC          (N_POINTS) int64 1 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://argovis.colorado.edu
    Fetched_by:           docs
    Fetched_date:         2021/11/02
    Fetched_constraints:  phy;WMO1900857
    Fetched_uri:          ['https://argovis.colorado.edu/catalog/platforms/19...

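We can see some minor differences between the localftp/erddap responses and the argovis one. The latter does not include the descending part of the first profile (its DIRECTION values are all 'A'), and it carries fewer variables: CONFIG_MISSION_NUMBER and the PRES_QC, PSAL_QC and TEMP_QC flags are absent. The number of points also differs: 21029 for argovis against 20966 for localftp and erddap in the outputs above.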
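
Such a comparison can be done with plain set operations. The sketch below uses the variable names and point counts copied from the printed outputs above, so it runs without fetching anything. The localftp output hides one of its 13 variables behind "...", so only the 12 visible names are listed. With real datasets, `set(ds.data_vars)` would give the same kind of set.

```python
# Variable names copied from the outputs printed above.
localftp_vars = {  # 12 visible names out of 13 (one is elided as "...")
    "CONFIG_MISSION_NUMBER", "CYCLE_NUMBER", "DATA_MODE", "DIRECTION",
    "PLATFORM_NUMBER", "POSITION_QC", "PRES_QC", "PSAL", "PSAL_QC",
    "TEMP", "TEMP_QC", "TIME_QC",
}
argovis_vars = {
    "CYCLE_NUMBER", "DATA_MODE", "DIRECTION", "PLATFORM_NUMBER",
    "POSITION_QC", "PRES", "PSAL", "TEMP", "TIME_QC",
}

# Variables returned by localftp (and erddap) but not by argovis.
missing_in_argovis = sorted(localftp_vars - argovis_vars)
print(missing_in_argovis)

# Difference in the number of points (N_POINTS) between the two responses.
n_points = {"localftp": 20966, "erddap": 20966, "argovis": 21029}
print(n_points["argovis"] - n_points["localftp"])
```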

Status of sources

With remote, online data sources, the data server may experience downtime. With local data sources, the availability of the path is checked when it is set, but the path may point to a disk that gets unmounted or unplugged after the option was set.
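
For the local case, a quick check just before fetching can catch a path that has disappeared since it was set. This is a minimal sketch, not argopy's own validation: `local_ftp_is_available` is a hypothetical helper, and testing for a `dac` sub-directory is an assumption based on the GDAC layout described earlier.

```python
import tempfile
from pathlib import Path


def local_ftp_is_available(local_ftp) -> bool:
    """Return True if the path exists and contains a "dac" directory."""
    root = Path(local_ftp)
    return root.is_dir() and (root / "dac").is_dir()


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    before = local_ftp_is_available(tmp)  # no "dac" folder yet
    (Path(tmp) / "dac").mkdir()
    after = local_ftp_is_available(tmp)   # "dac" folder now present
gone = local_ftp_is_available(tmp)        # directory removed on exit

print(before, after, gone)
```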

If you’re running your analysis in a Jupyter notebook, you can use the argopy.status() method to insert a data status monitor in a cell output. All available data sources will be monitored continuously.

argopy.status()
[Figure: status monitor]

If one of the data sources becomes unavailable, you will see the status bar change to something like:

[Figure: status monitor with an unavailable data source]

Note that the argopy.status() method has a refresh option that lets you specify the monitoring refresh rate in seconds.

Finally, you can check the argopy status webpage, which monitors all the resources important to the software.