Manipulating data#

Once you fetched data, argopy comes with a handy xarray.Dataset accessor argo to perform specific manipulation of the data. This means that if your dataset is named ds, then you can use ds.argo to access more argopy functions. The full list is available in the API documentation page Dataset.argo (xarray accessor).

Let’s start with standard import:

In [1]: from argopy import DataFetcher as ArgoDataFetcher

Transformation#

Points vs profiles#

By default, fetched data are returned as a 1D array collection of measurements:

In [2]: argo_loader = ArgoDataFetcher().region([-75,-55,30.,40.,0,100., '2011-01-01', '2011-01-15'])

In [3]: ds_points = argo_loader.to_xarray()

In [4]: ds_points
Out[4]: 
<xarray.Dataset>
Dimensions:                (N_POINTS: 524)
Coordinates:
  * N_POINTS               (N_POINTS) int64 0 1 2 3 4 5 ... 519 520 521 522 523
    LATITUDE               (N_POINTS) float64 37.28 37.28 37.28 ... 33.07 33.07
    LONGITUDE              (N_POINTS) float64 -66.77 -66.77 ... -64.59 -64.59
    TIME                   (N_POINTS) datetime64[ns] 2011-01-02T11:14:06 ... ...
Data variables: (12/13)
    CONFIG_MISSION_NUMBER  (N_POINTS) int32 1 1 1 1 1 1 1 ... 13 13 13 13 13 13
    CYCLE_NUMBER           (N_POINTS) int32 150 150 150 150 150 ... 13 13 13 13
    DATA_MODE              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_POINTS) <U1 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_POINTS) int32 4900803 4900803 ... 5903377 5903377
    POSITION_QC            (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ...                     ...
    PRES_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PSAL                   (N_POINTS) float64 36.67 36.67 36.67 ... 36.67 36.67
    PSAL_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP                   (N_POINTS) float64 19.46 19.47 19.47 ... 19.2 19.2
    TEMP_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with Dataset.argo.point2profile():

In [5]: ds_profiles = ds_points.argo.point2profile()

In [6]: ds_profiles
Out[6]: 
<xarray.Dataset>
Dimensions:                (N_PROF: 18, N_LEVELS: 50)
Coordinates:
  * N_PROF                 (N_PROF) int64 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
  * N_LEVELS               (N_LEVELS) int64 0 1 2 3 4 5 6 ... 44 45 46 47 48 49
    LATITUDE               (N_PROF) float64 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE              (N_PROF) float64 -66.77 -71.17 ... -72.75 -64.59
    TIME                   (N_PROF) datetime64[ns] 2011-01-02T11:14:06 ... 20...
Data variables: (12/13)
    CONFIG_MISSION_NUMBER  (N_PROF) int32 1 1 11 1 1 1 0 1 1 1 1 1 1 2 1 1 1 13
    CYCLE_NUMBER           (N_PROF) int32 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE              (N_PROF) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_PROF) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_PROF) int32 4900803 4901218 ... 4901218 5903377
    POSITION_QC            (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    ...                     ...
    PRES_QC                (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    PSAL                   (N_PROF, N_LEVELS) float64 36.67 36.67 ... 36.67 nan
    PSAL_QC                (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TEMP                   (N_PROF, N_LEVELS) float64 19.46 19.47 ... 19.2 nan
    TEMP_QC                (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TIME_QC                (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

You can simply reverse this transformation with the Dataset.argo.profile2point():

In [7]: ds = ds_profiles.argo.profile2point()

In [8]: ds
Out[8]: 
<xarray.Dataset>
Dimensions:                (N_POINTS: 524)
Coordinates:
    LONGITUDE              (N_POINTS) float64 -66.77 -66.77 ... -64.59 -64.59
    TIME                   (N_POINTS) datetime64[ns] 2011-01-02T11:14:06 ... ...
    LATITUDE               (N_POINTS) float64 37.28 37.28 37.28 ... 33.07 33.07
  * N_POINTS               (N_POINTS) int64 0 1 2 3 4 5 ... 519 520 521 522 523
Data variables: (12/13)
    CONFIG_MISSION_NUMBER  (N_POINTS) int32 1 1 1 1 1 1 1 ... 13 13 13 13 13 13
    CYCLE_NUMBER           (N_POINTS) int32 150 150 150 150 150 ... 13 13 13 13
    DATA_MODE              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_POINTS) <U1 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_POINTS) int32 4900803 4900803 ... 5903377 5903377
    POSITION_QC            (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ...                     ...
    PRES_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PSAL                   (N_POINTS) float64 36.67 36.67 36.67 ... 36.67 36.67
    PSAL_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP                   (N_POINTS) float64 19.46 19.47 19.47 ... 19.2 19.2
    TEMP_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

Pressure levels: Interpolation#

Once your dataset is a collection of vertical profiles, you can interpolate variables on standard pressure levels using Dataset.argo.interp_std_levels() with your levels as input:

In [9]: ds_interp = ds_profiles.argo.interp_std_levels([0,10,20,30,40,50])

In [10]: ds_interp
Out[10]: 
<xarray.Dataset>
Dimensions:                (N_PROF: 18, PRES_INTERPOLATED: 6)
Coordinates:
  * N_PROF                 (N_PROF) int64 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
    LATITUDE               (N_PROF) float64 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE              (N_PROF) float64 -66.77 -71.17 ... -72.75 -64.59
    TIME                   (N_PROF) datetime64[ns] 2011-01-02T11:14:06 ... 20...
  * PRES_INTERPOLATED      (PRES_INTERPOLATED) int64 0 10 20 30 40 50
Data variables:
    CONFIG_MISSION_NUMBER  (N_PROF) int32 1 1 11 1 1 1 0 1 1 1 1 1 1 2 1 1 1 13
    CYCLE_NUMBER           (N_PROF) int32 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE              (N_PROF) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION              (N_PROF) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_PROF) int32 4900803 4901218 ... 4901218 5903377
    PRES                   (N_PROF, PRES_INTERPOLATED) float64 5.0 10.0 ... 50.0
    PSAL                   (N_PROF, PRES_INTERPOLATED) float64 36.67 ... 36.68
    TEMP                   (N_PROF, PRES_INTERPOLATED) float64 19.46 ... 19.24
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...
Note on the linear interpolation process :
  • Only profiles that have a maximum pressure higher than the highest standard level are selected for interpolation.

  • Remaining profiles must have at least five data points to allow interpolation.

  • For each profile, shallowest data point is repeated to the surface to allow a 0 standard level while avoiding extrapolation.

Pressure levels: Group-by bins#

If you prefer to avoid interpolation, you can opt for a pressure bins grouping reduction using Dataset.argo.groupby_pressure_bins(). This method can be used to subsample and align an irregular dataset (pressure not being similar in all profiles) on a set of pressure bins. The output dataset could then be used to perform statistics along the N_PROF dimension because N_LEVELS will corresponds to similar pressure bins.

To illustrate this method, let’s start by fetching some data from a low vertical resolution float:

In [11]: loader = ArgoDataFetcher(src='erddap', mode='expert').float(2901623)  # Low res float

In [12]: ds = loader.load().data

Let’s now sub-sample these measurements along 250db bins, selecting values from the deepest pressure levels for each bins:

In [13]: bins = np.arange(0.0, np.max(ds["PRES"]), 250.0)

In [14]: ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep')

In [15]: ds_binned
Out[15]: 
<xarray.Dataset>
Dimensions:                   (N_POINTS: 659)
Coordinates:
    LONGITUDE                 (N_POINTS) float64 92.28 92.28 ... 94.77 94.77
    TIME                      (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ....
    LATITUDE                  (N_POINTS) float64 0.012 0.012 ... 3.388 3.388
    STD_PRES_BINS             (N_POINTS) float64 0.0 250.0 500.0 ... 750.0 1e+03
  * N_POINTS                  (N_POINTS) int64 0 1 2 3 4 ... 654 655 656 657 658
Data variables: (12/23)
    CONFIG_MISSION_NUMBER     (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    CYCLE_NUMBER              (N_POINTS) int32 0 0 0 0 0 0 ... 95 96 96 96 96 96
    DATA_MODE                 (N_POINTS) <U1 'R' 'R' 'R' 'R' ... 'R' 'R' 'R' 'R'
    DIRECTION                 (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER           (N_POINTS) int32 2901623 2901623 ... 2901623
    POSITION_QC               (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    ...                        ...
    TEMP_ADJUSTED             (N_POINTS) float32 nan nan nan nan ... nan nan nan
    TEMP_ADJUSTED_ERROR       (N_POINTS) float32 nan nan nan nan ... nan nan nan
    TEMP_ADJUSTED_QC          (N_POINTS) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
    TEMP_QC                   (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    TIME_QC                   (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
    VERTICAL_SAMPLING_SCHEME  (N_POINTS) <U29 'Primary sampling: discrete []'...
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  phy;WMO2901623
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Transformed with point2profile; Sub-sampled and re-...

See the new STD_PRES_BINS variable that hold the pressure bins definition.

The figure below shows the sub-sampling effect:

import matplotlib as mpl
import matplotlib.pyplot as plt
import cmocean

fig, ax = plt.subplots(figsize=(18,6))
ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline)
plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+')
plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k')
plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r')
plt.title(ds.attrs['Fetched_constraints'])
plt.gca().invert_yaxis()
_images/groupby_pressure_bins_select_deep.png

The bin limits are shown with horizontal red lines, the original data are in the background colored scatter and the group-by pressure bins values are highlighted in red marks

The select option can take many different values, see the full documentation of Dataset.argo.groupby_pressure_bins() , for all the details. Let’s show here results from the random sampling:

ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random')
_images/groupby_pressure_bins_select_random.png

Filters#

If you fetched data with the expert mode, you may want to use filters to help you curate the data.

  • QC flag filter: Dataset.argo.filter_qc(). This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset.

  • Data mode filter: Dataset.argo.filter_data_mode(). This method allows you to filter variables according to their data mode. This filter modifies the <PARAM> and <PARAM_QC> variables of the dataset.

  • OWC variables filter: Dataset.argo.filter_scalib_pres(). This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset.

Complementary data#

TEOS-10 variables#

You can compute additional ocean variables from TEOS-10. The default list of variables is: ‘SA’, ‘CT’, ‘SIG0’, ‘N2’, ‘PV’, ‘PTEMP’ (‘SOUND_SPEED’, ‘CNDC’ are optional). Simply raise an issue to add a new one.

This can be done using the Dataset.argo.teos10() method and indicating the list of variables you want to compute:

In [16]: ds = ArgoDataFetcher().float(2901623).to_xarray()

In [17]: ds.argo.teos10(['SA', 'CT', 'PV'])
Out[17]: 
<xarray.Dataset>
Dimensions:                (N_POINTS: 8339)
Coordinates:
  * N_POINTS               (N_POINTS) int64 0 1 2 3 4 ... 8335 8336 8337 8338
    LATITUDE               (N_POINTS) float64 0.012 0.012 0.012 ... 3.388 3.388
    LONGITUDE              (N_POINTS) float64 92.28 92.28 92.28 ... 94.77 94.77
    TIME                   (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ... ...
Data variables: (12/16)
    CONFIG_MISSION_NUMBER  (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    CYCLE_NUMBER           (N_POINTS) int32 0 0 0 0 0 0 0 ... 96 96 96 96 96 96
    DATA_MODE              (N_POINTS) <U1 'R' 'R' 'R' 'R' ... 'R' 'R' 'R' 'R'
    DIRECTION              (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER        (N_POINTS) int32 2901623 2901623 ... 2901623 2901623
    POSITION_QC            (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    ...                     ...
    TEMP                   (N_POINTS) float64 30.16 30.17 30.17 ... 6.189 6.071
    TEMP_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC                (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    SA                     (N_POINTS) float64 34.44 34.44 34.44 ... 35.09 35.08
    CT                     (N_POINTS) float64 30.2 30.2 30.2 ... 6.078 5.959
    PV                     (N_POINTS) float64 nan -1.775e-15 ... 1.573e-12 nan
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2022/11/17
    Fetched_constraints:  phy;WMO2901623
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...
In [18]: ds['SA']
Out[18]: 
<xarray.DataArray 'SA' (N_POINTS: 8339)>
array([34.43589995, 34.43691368, 34.43693526, ..., 35.0919598 ,
       35.09211518, 35.08221619])
Coordinates:
  * N_POINTS   (N_POINTS) int64 0 1 2 3 4 5 6 ... 8333 8334 8335 8336 8337 8338
    LATITUDE   (N_POINTS) float64 0.012 0.012 0.012 0.012 ... 3.388 3.388 3.388
    LONGITUDE  (N_POINTS) float64 92.28 92.28 92.28 92.28 ... 94.77 94.77 94.77
    TIME       (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ... 2013-01-01T0...
Attributes:
    long_name:      Absolute Salinity
    standard_name:  sea_water_absolute_salinity
    unit:           g/kg

Data models#

By default argopy works with xarray.Dataset for Argo data fetcher, and with pandas.DataFrame for Argo index fetcher.

For your own analysis, you may prefer to switch from one to the other. This is all built in argopy, with the argopy.DataFetcher.to_dataframe() and argopy.IndexFetcher.to_xarray() methods.

In [19]: ArgoDataFetcher().profile(6902746, 34).to_dataframe()
Out[19]: 
          CONFIG_MISSION_NUMBER  CYCLE_NUMBER  ... LONGITUDE                TIME
N_POINTS                                       ...                              
0                             2            34  ...   -58.119 2017-12-20 06:58:00
1                             2            34  ...   -58.119 2017-12-20 06:58:00
2                             2            34  ...   -58.119 2017-12-20 06:58:00
3                             2            34  ...   -58.119 2017-12-20 06:58:00
4                             2            34  ...   -58.119 2017-12-20 06:58:00
...                         ...           ...  ...       ...                 ...
104                           2            34  ...   -58.119 2017-12-20 06:58:00
105                           2            34  ...   -58.119 2017-12-20 06:58:00
106                           2            34  ...   -58.119 2017-12-20 06:58:00
107                           2            34  ...   -58.119 2017-12-20 06:58:00
108                           2            34  ...   -58.119 2017-12-20 06:58:00

[109 rows x 16 columns]

Saving data#

Once you have your Argo data as xarray.Dataset, simply use the awesome possibilities of xarray like xarray.Dataset.to_netcdf() or xarray.Dataset.to_zarr().