Manipulating data#

Once you have fetched data, argopy provides an xarray.Dataset accessor named argo for Argo-specific data manipulation. If your dataset is named ds, use ds.argo to access these argopy functions. The full list is available in the API documentation page Dataset.argo (xarray accessor).
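
For readers unfamiliar with xarray accessors, the mechanism can be sketched as follows. This is only an illustration of how xarray exposes a namespace on a dataset: the accessor name demo, the class DemoAccessor and its n_points method are hypothetical and are not part of argopy.

```python
import numpy as np
import xarray as xr


@xr.register_dataset_accessor("demo")  # hypothetical name, not part of argopy
class DemoAccessor:
    def __init__(self, xarray_obj):
        # xarray passes the dataset the accessor is attached to
        self._obj = xarray_obj

    def n_points(self):
        """Number of measurements along the N_POINTS dimension."""
        return self._obj.sizes["N_POINTS"]


# Any dataset now exposes the accessor as an attribute
ds_demo = xr.Dataset({"TEMP": ("N_POINTS", np.array([19.46, 19.47, 19.47]))})
n = ds_demo.demo.n_points()
print(n)
```

The argo accessor follows the same pattern: ds.argo is a namespace attached to every xarray.Dataset once argopy has been imported.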

Let’s start with the standard import:

In [1]: from argopy import DataFetcher

Transformation#

Points vs profiles#

By default, fetched data are returned as a 1D array collection of measurements:

In [2]: f = DataFetcher().region([-75,-55,30.,40.,0,100., '2011-01-01', '2011-01-15'])

In [3]: ds_points = f.data

In [4]: ds_points
Out[4]: 
<xarray.Dataset> Size: 63kB
Dimensions:          (N_POINTS: 524)
Coordinates:
  * N_POINTS         (N_POINTS) int64 4kB 0 1 2 3 4 5 ... 519 520 521 522 523
    LATITUDE         (N_POINTS) float64 4kB 37.28 37.28 37.28 ... 33.07 33.07
    LONGITUDE        (N_POINTS) float64 4kB -66.77 -66.77 ... -64.59 -64.59
    TIME             (N_POINTS) datetime64[ns] 4kB 2011-01-02T11:14:06 ... 20...
Data variables: (12/15)
    CYCLE_NUMBER     (N_POINTS) int64 4kB 150 150 150 150 150 ... 13 13 13 13 13
    DATA_MODE        (N_POINTS) <U1 2kB 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 2kB 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) int64 4kB 4900803 4900803 ... 5903377 5903377
    POSITION_QC      (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PRES             (N_POINTS) float32 2kB 5.0 10.0 15.0 ... 95.97 97.97 99.97
    ...               ...
    PSAL_ERROR       (N_POINTS) float32 2kB 0.01 0.01 0.01 ... 0.01 0.01 0.01
    PSAL_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP             (N_POINTS) float32 2kB 19.46 19.47 19.47 ... 19.2 19.2 19.2
    TEMP_ERROR       (N_POINTS) float32 2kB 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with Dataset.argo.point2profile():

In [5]: ds_profiles = ds_points.argo.point2profile()

In [6]: ds_profiles
Out[6]: 
<xarray.Dataset> Size: 17kB
Dimensions:          (N_PROF: 18, N_LEVELS: 50)
Coordinates:
  * N_PROF           (N_PROF) int64 144B 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
  * N_LEVELS         (N_LEVELS) int64 400B 0 1 2 3 4 5 6 ... 44 45 46 47 48 49
    LATITUDE         (N_PROF) float64 144B 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE        (N_PROF) float64 144B -66.77 -71.17 ... -72.75 -64.59
    TIME             (N_PROF) datetime64[ns] 144B 2011-01-02T11:14:06 ... 201...
Data variables: (12/15)
    CYCLE_NUMBER     (N_PROF) int64 144B 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE        (N_PROF) <U1 72B 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_PROF) <U1 72B 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_PROF) int64 144B 4900803 4901218 ... 4901218 5903377
    POSITION_QC      (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    PRES             (N_PROF, N_LEVELS) float32 4kB 5.0 10.0 15.0 ... 99.97 nan
    ...               ...
    PSAL_ERROR       (N_PROF, N_LEVELS) float32 4kB 0.01 0.01 0.01 ... 0.01 nan
    PSAL_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TEMP             (N_PROF, N_LEVELS) float32 4kB 19.46 19.47 ... 19.2 nan
    TEMP_ERROR       (N_PROF) float32 72B 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TIME_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

You can reverse this transformation with Dataset.argo.profile2point():

In [7]: ds = ds_profiles.argo.profile2point()

In [8]: ds
Out[8]: 
<xarray.Dataset> Size: 63kB
Dimensions:          (N_POINTS: 524)
Coordinates:
    LATITUDE         (N_POINTS) float64 4kB 37.28 37.28 37.28 ... 33.07 33.07
    LONGITUDE        (N_POINTS) float64 4kB -66.77 -66.77 ... -64.59 -64.59
    TIME             (N_POINTS) datetime64[ns] 4kB 2011-01-02T11:14:06 ... 20...
  * N_POINTS         (N_POINTS) int64 4kB 0 1 2 3 4 5 ... 519 520 521 522 523
Data variables: (12/15)
    CYCLE_NUMBER     (N_POINTS) int64 4kB 150 150 150 150 150 ... 13 13 13 13 13
    DATA_MODE        (N_POINTS) <U1 2kB 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 2kB 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) int64 4kB 4900803 4900803 ... 5903377 5903377
    POSITION_QC      (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PRES             (N_POINTS) float32 2kB 5.0 10.0 15.0 ... 95.97 97.97 99.97
    ...               ...
    PSAL_ERROR       (N_POINTS) float32 2kB 0.01 0.01 0.01 ... 0.01 0.01 0.01
    PSAL_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP             (N_POINTS) float32 2kB 19.46 19.47 19.47 ... 19.2 19.2 19.2
    TEMP_ERROR       (N_POINTS) float32 2kB 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...
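
To make the two layouts concrete, here is a minimal NumPy sketch of the reshaping idea on synthetic values. It is not argopy's implementation: the helpers points_to_profiles and profiles_to_points are hypothetical, and the sketch assumes that a profile is identified by a (platform, cycle) pair and that shorter profiles are padded with NaN, as the trailing nan values in the ds_profiles output suggest.

```python
import numpy as np


def points_to_profiles(platform, cycle, values):
    """Reshape flat measurements into a NaN-padded (N_PROF, N_LEVELS) array.

    Hypothetical helper: a profile is taken to be all points sharing the
    same (platform, cycle) pair, kept in their original order.
    """
    keys = np.stack([platform, cycle], axis=1)
    # One row per unique profile, plus the profile index of every point
    profiles, prof_index = np.unique(keys, axis=0, return_inverse=True)
    prof_index = prof_index.ravel()
    n_levels = np.bincount(prof_index).max()
    out = np.full((len(profiles), n_levels), np.nan)
    for i in range(len(profiles)):
        v = values[prof_index == i]
        out[i, : len(v)] = v
    return profiles, out


def profiles_to_points(array_2d):
    """Flatten back to 1D, dropping the NaN padding."""
    flat = array_2d.ravel()
    return flat[~np.isnan(flat)]


# Synthetic points: 3 measurements from one profile, 2 from another
platform = np.array([4900803, 4900803, 4900803, 5903377, 5903377])
cycle = np.array([150, 150, 150, 13, 13])
temp = np.array([19.46, 19.47, 19.47, 19.2, 19.2])

profiles, temp_2d = points_to_profiles(platform, cycle, temp)
temp_1d = profiles_to_points(temp_2d)
print(temp_2d.shape)
```

The padding is what makes the 2D layout larger than the number of actual measurements, and dropping it is what makes the round trip back to points lossless.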

Pressure levels: Interpolation#

Once your dataset is a collection of vertical profiles, you can interpolate variables on standard pressure levels using Dataset.argo.interp_std_levels() with your levels as input:

In [9]: ds_interp = ds_profiles.argo.interp_std_levels([0,10,20,30,40,50])

In [10]: ds_interp
Out[10]: 
<xarray.Dataset> Size: 2kB
Dimensions:            (N_PROF: 18, PRES_INTERPOLATED: 6)
Coordinates:
    LATITUDE           (N_PROF) float64 144B 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE          (N_PROF) float64 144B -66.77 -71.17 ... -72.75 -64.59
    TIME               (N_PROF) datetime64[ns] 144B 2011-01-02T11:14:06 ... 2...
  * PRES_INTERPOLATED  (PRES_INTERPOLATED) int64 48B 0 10 20 30 40 50
  * N_PROF             (N_PROF) int64 144B 7 13 15 0 6 2 9 ... 10 17 3 8 14 16
Data variables:
    CYCLE_NUMBER       (N_PROF) int64 144B 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE          (N_PROF) <U1 72B 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION          (N_PROF) <U1 72B 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER    (N_PROF) int64 144B 4900803 4901218 ... 4901218 5903377
    PRES               (N_PROF, PRES_INTERPOLATED) float32 432B 5.0 ... 50.0
    PSAL               (N_PROF, PRES_INTERPOLATED) float32 432B 36.67 ... 36.68
    TEMP               (N_PROF, PRES_INTERPOLATED) float32 432B 19.46 ... 19.24
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

Notes on the linear interpolation process:
  • Only profiles whose maximum pressure is greater than the deepest (highest-pressure) standard level are selected for interpolation.

  • The remaining profiles must have at least five data points to allow interpolation.

  • For each profile, the shallowest data point is repeated up to the surface, which allows a standard level at 0 while avoiding extrapolation.
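
The three rules above can be sketched for a single profile with NumPy. This is an illustration under stated assumptions, not the code behind Dataset.argo.interp_std_levels(): the function interp_profile and its min_points argument are hypothetical, pressure is assumed to be sorted from shallow to deep, and the "greater than" comparison is taken literally from the first rule.

```python
import numpy as np


def interp_profile(pres, values, std_levels, min_points=5):
    """Linearly interpolate one profile onto standard pressure levels.

    Returns None when the profile is rejected by one of the rules.
    Assumes `pres` is sorted in increasing order.
    """
    pres = np.asarray(pres, dtype=float)
    values = np.asarray(values, dtype=float)
    std_levels = np.asarray(std_levels, dtype=float)
    # Rule 1: the profile must reach deeper than the deepest standard level
    if pres.max() <= std_levels.max():
        return None
    # Rule 2: at least `min_points` measurements are required
    if pres.size < min_points:
        return None
    # Rule 3: repeat the shallowest value at the surface (0 db), so that
    # a 0 standard level is filled without extrapolating a gradient
    if pres[0] > 0:
        pres = np.concatenate(([0.0], pres))
        values = np.concatenate(([values[0]], values))
    return np.interp(std_levels, pres, values)


std_levels = [0, 10, 20, 30, 40, 50]
pres = [5, 10, 15, 20, 30, 40, 50, 60]
temp = [19.46, 19.47, 19.47, 19.40, 19.35, 19.30, 19.24, 19.20]

temp_std = interp_profile(pres, temp, std_levels)
too_shallow = interp_profile([5, 10, 20, 30, 40], [1, 2, 3, 4, 5], std_levels)
too_short = interp_profile([5, 30, 60], [1, 2, 3], std_levels)
print(temp_std)
```

Here too_shallow and too_short are both None: the first profile stops at 40 db, and the second has only three points.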

Pressure levels: Group-by bins#

If you prefer to avoid interpolation, you can instead reduce the data by grouping measurements into pressure bins with Dataset.argo.groupby_pressure_bins(). This method can be used to subsample and align an irregular dataset (one where pressure levels differ between profiles) on a set of pressure bins. The output dataset can then be used to compute statistics along the N_PROF dimension, because N_LEVELS will correspond to similar pressure bins.

To illustrate this method, let’s start by fetching some data from a low vertical resolution float:

In [11]: f = DataFetcher(src='erddap', mode='expert').float(2901623)  # Low res float

In [12]: ds = f.data

Let’s now sub-sample these measurements along 250 db bins, selecting the value from the deepest pressure level in each bin (np is NumPy, imported with import numpy as np):

In [13]: bins = np.arange(0.0, np.max(ds["PRES"]), 250.0)

In [14]: ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep')

In [15]: ds_binned
Out[15]: 
<xarray.Dataset> Size: 190kB
Dimensions:                   (N_POINTS: 659)
Coordinates:
    LATITUDE                  (N_POINTS) float64 5kB 0.012 0.012 ... 3.388 3.388
    LONGITUDE                 (N_POINTS) float64 5kB 92.28 92.28 ... 94.77 94.77
    TIME                      (N_POINTS) datetime64[ns] 5kB 2010-05-14T03:35:...
    STD_PRES_BINS             (N_POINTS) float64 5kB 0.0 250.0 ... 750.0 1e+03
  * N_POINTS                  (N_POINTS) int64 5kB 0 1 2 3 4 ... 655 656 657 658
Data variables: (12/23)
    CONFIG_MISSION_NUMBER     (N_POINTS) int64 5kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    CYCLE_NUMBER              (N_POINTS) int64 5kB 0 0 0 0 0 ... 96 96 96 96 96
    DATA_MODE                 (N_POINTS) <U1 3kB 'D' 'D' 'D' 'D' ... 'D' 'D' 'D'
    DIRECTION                 (N_POINTS) <U1 3kB 'D' 'D' 'D' 'D' ... 'A' 'A' 'A'
    PLATFORM_NUMBER           (N_POINTS) int64 5kB 2901623 2901623 ... 2901623
    POSITION_QC               (N_POINTS) int64 5kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    ...                        ...
    TEMP_ADJUSTED             (N_POINTS) float32 3kB 13.17 10.08 ... 6.551 6.071
    TEMP_ADJUSTED_ERROR       (N_POINTS) float32 3kB 0.0 0.0 0.0 ... 0.0 0.0 0.0
    TEMP_ADJUSTED_QC          (N_POINTS) int64 5kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    TEMP_QC                   (N_POINTS) int64 5kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    TIME_QC                   (N_POINTS) int64 5kB 1 1 1 1 1 1 1 ... 1 1 1 1 1 1
    VERTICAL_SAMPLING_SCHEME  (N_POINTS) <U29 76kB 'Primary sampling: discret...
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  WMO2901623
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Transformed with point2profile; Sub-sampled and re-...

Note the new STD_PRES_BINS variable, which holds the pressure bin definitions.
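
The "deepest value per bin" selection can be sketched for a single profile with NumPy. This is not argopy's implementation: the helper select_deepest_per_bin is hypothetical, and the sketch assumes that bin i covers pressures from bins[i] up to (but excluding) bins[i+1], that the last bin is open-ended, and that each kept value is labelled with the lower edge of its bin.

```python
import numpy as np


def select_deepest_per_bin(pres, values, bins):
    """Keep, for one profile, the deepest measurement found in each bin.

    Empty bins are skipped. Returns (bin lower edges, pressures, values).
    """
    pres = np.asarray(pres, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = np.asarray(bins, dtype=float)
    # 0-based index of the bin each measurement falls into
    bin_index = np.digitize(pres, bins) - 1
    kept_bins, kept_pres, kept_values = [], [], []
    for i in np.unique(bin_index):
        if i < 0:
            continue  # shallower than the first bin edge
        in_bin = np.flatnonzero(bin_index == i)
        deepest = in_bin[np.argmax(pres[in_bin])]
        kept_bins.append(bins[i])
        kept_pres.append(pres[deepest])
        kept_values.append(values[deepest])
    return np.array(kept_bins), np.array(kept_pres), np.array(kept_values)


# Synthetic low-resolution profile
pres = np.array([17, 120, 240, 260, 480, 510, 740, 990, 1010, 1137], dtype=float)
psal = np.array([34.4, 34.6, 34.9, 35.0, 35.0, 35.1, 35.1, 35.1, 35.1, 35.0])
bin_edges = np.arange(0.0, pres.max(), 250.0)

std_bins, pres_kept, psal_kept = select_deepest_per_bin(pres, psal, bin_edges)
print(std_bins, pres_kept)
```

Each retained measurement keeps its original pressure, so no value is interpolated: the profile is only thinned to one point per bin.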

The figure below shows the sub-sampling effect:

import matplotlib.pyplot as plt
import cmocean

fig, ax = plt.subplots(figsize=(18,6))
# Original measurements, colored by salinity
ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline)
# Measurements retained after the group-by, as red crosses
plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+')
# Requested bin edges (black) and bin values present in the result (red)
plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k')
plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r')
plt.title(ds.attrs['Fetched_constraints'])
# Pressure increases downward
plt.gca().invert_yaxis()

[Figure: groupby_pressure_bins_select_deep.png]

The bin limits are shown with horizontal red lines, the original data appear as the colored scatter in the background, and the values selected by the pressure-bin grouping are highlighted with red markers.

The select option accepts several other values; see the full documentation of Dataset.argo.groupby_pressure_bins() for all the details. Here are the results with random sampling:

ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random')

[Figure: groupby_pressure_bins_select_random.png]

Filters#

If you fetched data in expert mode, you may want to use filters to help you curate the data.

  • QC flag filter: Dataset.argo.filter_qc(). This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset.

  • Data mode filter: Dataset.argo.filter_data_mode(). This method allows you to filter variables according to their data mode. This filter modifies the <PARAM> and <PARAM_QC> variables of the dataset.

  • OWC variables filter: Dataset.argo.filter_scalib_pres(). This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset.
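
The idea behind a QC flag filter can be sketched with a boolean mask in NumPy. This is an illustration only, not the code of Dataset.argo.filter_qc(): the helper qc_mask, its keep default, and the rule that every QC field must pass are assumptions made for the example. In the Argo convention, flag 1 means good data and flag 2 means probably good data.

```python
import numpy as np


def qc_mask(qc_fields, keep=(1, 2)):
    """True where every QC field of a measurement carries an accepted flag.

    `qc_fields` maps a QC variable name to its array of integer flags.
    """
    n_points = len(next(iter(qc_fields.values())))
    mask = np.ones(n_points, dtype=bool)
    for flags in qc_fields.values():
        mask &= np.isin(flags, keep)
    return mask


# Synthetic measurements with their QC flags
temp = np.array([19.46, 19.47, 25.00, 19.40, 19.35])
qc = {
    "TEMP_QC": np.array([1, 1, 4, 2, 1]),
    "PSAL_QC": np.array([1, 3, 1, 1, 1]),
}

mask = qc_mask(qc)
temp_good = temp[mask]
print(temp_good)
```

Applying the same mask to every variable keeps the dataset aligned, which is consistent with the QC filter modifying all variables of the dataset.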

Complementary data#

TEOS-10 variables#

You can compute additional ocean variables using TEOS-10 (the Thermodynamic Equation of Seawater 2010). The default list of variables is: ‘SA’, ‘CT’, ‘SIG0’, ‘N2’, ‘PV’, ‘PTEMP’ (‘SOUND_SPEED’, ‘CNDC’ are optional). Raise an issue if you need another variable added.

This can be done using the Dataset.argo.teos10() method and indicating the list of variables you want to compute:

In [16]: ds = DataFetcher().float(2901623).to_xarray()

In [17]: ds.argo.teos10(['SA', 'CT', 'PV'])
Out[17]: 
<xarray.Dataset> Size: 1MB
Dimensions:          (N_POINTS: 8341)
Coordinates:
  * N_POINTS         (N_POINTS) int64 67kB 0 1 2 3 4 ... 8337 8338 8339 8340
    LATITUDE         (N_POINTS) float64 67kB 0.012 0.012 0.012 ... 3.388 3.388
    LONGITUDE        (N_POINTS) float64 67kB 92.28 92.28 92.28 ... 94.77 94.77
    TIME             (N_POINTS) datetime64[ns] 67kB 2010-05-14T03:35:00 ... 2...
Data variables: (12/18)
    CYCLE_NUMBER     (N_POINTS) int64 67kB 0 0 0 0 0 0 0 ... 96 96 96 96 96 96
    DATA_MODE        (N_POINTS) <U1 33kB 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 33kB 'D' 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) int64 67kB 2901623 2901623 ... 2901623 2901623
    POSITION_QC      (N_POINTS) int64 67kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PRES             (N_POINTS) float32 33kB 17.0 25.0 ... 1.112e+03 1.137e+03
    ...               ...
    TEMP_ERROR       (N_POINTS) float32 33kB 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0
    TEMP_QC          (N_POINTS) int64 67kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC          (N_POINTS) int64 67kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    SA               (N_POINTS) float64 67kB 34.44 34.44 34.44 ... 35.09 35.08
    CT               (N_POINTS) float64 67kB 30.2 30.2 30.2 ... 6.078 5.959
    PV               (N_POINTS) float64 67kB nan -1.78e-15 ... 1.573e-12 nan
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         https://erddap.ifremer.fr/erddap
    Fetched_by:           docs
    Fetched_date:         2024/09/23
    Fetched_constraints:  WMO2901623
    Fetched_uri:          ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
    history:              Variables filtered according to DATA_MODE; Variable...

In [18]: ds['SA']
Out[18]: 
<xarray.DataArray 'SA' (N_POINTS: 8341)> Size: 67kB
array([34.43600343, 34.43701333, 34.43703491, ..., 35.09205948,
       35.09221486, 35.08231586])
Coordinates:
  * N_POINTS   (N_POINTS) int64 67kB 0 1 2 3 4 5 ... 8336 8337 8338 8339 8340
    LATITUDE   (N_POINTS) float64 67kB 0.012 0.012 0.012 ... 3.388 3.388 3.388
    LONGITUDE  (N_POINTS) float64 67kB 92.28 92.28 92.28 ... 94.77 94.77 94.77
    TIME       (N_POINTS) datetime64[ns] 67kB 2010-05-14T03:35:00 ... 2013-01...
Attributes:
    long_name:      Absolute Salinity
    standard_name:  sea_water_absolute_salinity
    unit:           g/kg

Data models#

By default, argopy works with xarray.Dataset for the Argo data fetcher and with pandas.DataFrame for the Argo index fetcher.

For your own analysis, you may prefer to switch from one to the other. Both conversions are built into argopy, through the argopy.DataFetcher.to_dataframe() and argopy.IndexFetcher.to_xarray() methods.

In [19]: DataFetcher().profile(6902746, 34).to_dataframe()
Out[19]: 
          CYCLE_NUMBER DATA_MODE  ... LONGITUDE                TIME
N_POINTS                          ...                              
0                   34         D  ...   -58.119 2017-12-20 06:58:00
1                   34         D  ...   -58.119 2017-12-20 06:58:00
2                   34         D  ...   -58.119 2017-12-20 06:58:00
3                   34         D  ...   -58.119 2017-12-20 06:58:00
4                   34         D  ...   -58.119 2017-12-20 06:58:00
...                ...       ...  ...       ...                 ...
104                 34         D  ...   -58.119 2017-12-20 06:58:00
105                 34         D  ...   -58.119 2017-12-20 06:58:00
106                 34         D  ...   -58.119 2017-12-20 06:58:00
107                 34         D  ...   -58.119 2017-12-20 06:58:00
108                 34         D  ...   -58.119 2017-12-20 06:58:00

[109 rows x 18 columns]
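
If you already hold a dataset or a dataframe, xarray and pandas provide the same kind of conversion directly. The sketch below uses a small synthetic dataset rather than fetched data. Whether the argopy fetcher methods rely on these exact calls internally is an assumption that is not verified here.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for a collection of points
ds_small = xr.Dataset(
    {
        "PRES": ("N_POINTS", np.array([5.0, 10.0, 15.0])),
        "TEMP": ("N_POINTS", np.array([19.46, 19.47, 19.47])),
    },
    coords={"N_POINTS": np.arange(3)},
)

# xarray.Dataset -> pandas.DataFrame, indexed by N_POINTS
df_small = ds_small.to_dataframe()

# pandas.DataFrame -> xarray.Dataset, the index becomes a dimension again
ds_back = df_small.to_xarray()
print(df_small)
```

The dimension name survives the round trip as the dataframe index name, which is why N_POINTS appears as the index in the output above.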

Saving data#

Once you have your Argo data as an xarray.Dataset, you can save it with xarray's own writers, such as xarray.Dataset.to_netcdf() or xarray.Dataset.to_zarr().