Manipulating data#

Once you have fetched Argo data, argopy provides a handy xarray.Dataset accessor, argo, to perform specific manipulations of the data. This means that if your dataset is named ds, you can use ds.argo to access more argopy functions. The full list is available in the API documentation page Dataset.argo (xarray accessor).

In this section, we present how argopy can help in manipulating Argo measurements and parameters.

Points vs profiles#

By default, fetched data are returned as a 1D array collection of measurements:

In [1]: from argopy import DataFetcher

In [2]: f = DataFetcher().region([-75,-55,30.,40.,0,100., '2011-01-01', '2011-01-15'])

In [3]: ds_points = f.data

In [4]: ds_points
Out[4]: 
<xarray.Dataset> Size: 63kB
Dimensions:          (N_POINTS: 524)
Coordinates:
    LATITUDE         (N_POINTS) float64 4kB 37.28 37.28 37.28 ... 33.07 33.07
    LONGITUDE        (N_POINTS) float64 4kB -66.77 -66.77 ... -64.59 -64.59
    TIME             (N_POINTS) datetime64[ns] 4kB 2011-01-02T11:14:06 ... 20...
  * N_POINTS         (N_POINTS) int64 4kB 0 1 2 3 4 5 ... 519 520 521 522 523
Data variables: (12/15)
    CYCLE_NUMBER     (N_POINTS) int64 4kB 150 150 150 150 150 ... 13 13 13 13 13
    DATA_MODE        (N_POINTS) <U1 2kB 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 2kB 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) int64 4kB 4900803 4900803 ... 5903377 5903377
    POSITION_QC      (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    PRES             (N_POINTS) float32 2kB 5.0 10.0 15.0 ... 95.97 97.97 99.97
    ...               ...
    PSAL_ERROR       (N_POINTS) float32 2kB 0.01 0.01 0.01 ... 0.01 0.01 0.01
    PSAL_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TEMP             (N_POINTS) float32 2kB 19.46 19.47 19.47 ... 19.2 19.2 19.2
    TEMP_ERROR       (N_POINTS) float32 2kB 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
    TIME_QC          (N_POINTS) int64 4kB 1 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         erddap.ifremer.fr
    Fetched_by:           docs
    Fetched_date:         2026/06/25
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          https://erddap.ifremer.fr/erddap/tabledap/ArgoFloat...
    Processing_history:   [PRES,TEMP,PSAL] real-time and adjusted/delayed var...

If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with Dataset.argo.point2profile():

In [5]: ds_profiles = ds_points.argo.point2profile()

In [6]: ds_profiles
Out[6]: 
<xarray.Dataset> Size: 17kB
Dimensions:          (N_PROF: 18, N_LEVELS: 50)
Coordinates:
  * N_PROF           (N_PROF) int64 144B 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
  * N_LEVELS         (N_LEVELS) int64 400B 0 1 2 3 4 5 6 ... 44 45 46 47 48 49
    LATITUDE         (N_PROF) float64 144B 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE        (N_PROF) float64 144B -66.77 -71.17 ... -72.75 -64.59
    TIME             (N_PROF) datetime64[ns] 144B 2011-01-02T11:14:06 ... 201...
Data variables: (12/15)
    CYCLE_NUMBER     (N_PROF) int64 144B 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE        (N_PROF) <U1 72B 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_PROF) <U1 72B 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_PROF) int64 144B 4900803 4901218 ... 4901218 5903377
    POSITION_QC      (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    PRES             (N_PROF, N_LEVELS) float32 4kB 5.0 10.0 15.0 ... 99.97 nan
    ...               ...
    PSAL_ERROR       (N_PROF, N_LEVELS) float32 4kB 0.01 0.01 0.01 ... 0.01 nan
    PSAL_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TEMP             (N_PROF, N_LEVELS) float32 4kB 19.46 19.47 ... 19.2 nan
    TEMP_ERROR       (N_PROF) float32 72B 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
    TIME_QC          (N_PROF) int64 144B 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         erddap.ifremer.fr
    Fetched_by:           docs
    Fetched_date:         2026/06/25
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          https://erddap.ifremer.fr/erddap/tabledap/ArgoFloat...
    Processing_history:   [PRES,TEMP,PSAL] real-time and adjusted/delayed var...

You can reverse this transformation with Dataset.argo.profile2point():

In [7]: ds = ds_profiles.argo.profile2point()

In [8]: ds
Out[8]: 
<xarray.Dataset> Size: 63kB
Dimensions:          (N_POINTS: 524)
Coordinates:
    LATITUDE         (N_POINTS) float64 4kB 37.28 37.28 37.28 ... 33.07 33.07
    LONGITUDE        (N_POINTS) float64 4kB -66.77 -66.77 ... -64.59 -64.59
    TIME             (N_POINTS) datetime64[ns] 4kB 2011-01-02T11:14:06 ... 20...
  * N_POINTS         (N_POINTS) int64 4kB 0 1 2 3 4 5 ... 519 520 521 522 523
Data variables: (12/15)
    CYCLE_NUMBER     (N_POINTS) float64 4kB 150.0 150.0 150.0 ... 13.0 13.0 13.0
    DATA_MODE        (N_POINTS) <U1 2kB 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION        (N_POINTS) <U1 2kB 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER  (N_POINTS) float64 4kB 4.901e+06 4.901e+06 ... 5.903e+06
    POSITION_QC      (N_POINTS) float64 4kB 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    PRES             (N_POINTS) float32 2kB 5.0 10.0 15.0 ... 95.97 97.97 99.97
    ...               ...
    PSAL_ERROR       (N_POINTS) float32 2kB 0.01 0.01 0.01 ... 0.01 0.01 0.01
    PSAL_QC          (N_POINTS) float64 4kB 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TEMP             (N_POINTS) float32 2kB 19.46 19.47 19.47 ... 19.2 19.2 19.2
    TEMP_ERROR       (N_POINTS) float32 2kB 0.002 0.002 0.002 ... 0.002 0.002
    TEMP_QC          (N_POINTS) float64 4kB 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
    TIME_QC          (N_POINTS) float64 4kB 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         erddap.ifremer.fr
    Fetched_by:           docs
    Fetched_date:         2026/06/25
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          https://erddap.ifremer.fr/erddap/tabledap/ArgoFloat...
    Processing_history:   [PRES,TEMP,PSAL] real-time and adjusted/delayed var...
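
The two layouts can be pictured with plain NumPy arrays. The sketch below is illustrative only and is not how argopy implements point2profile() or profile2point(): it assumes a profile is identified by its (PLATFORM_NUMBER, CYCLE_NUMBER) pair, and the helper name points_to_profiles is hypothetical. Shorter profiles are padded with NaN, which matches the trailing nan values visible in the 2D PRES and TEMP arrays above.

```python
import numpy as np


def points_to_profiles(platform, cycle, values):
    """Reshape a 1D array of points into a 2D (N_PROF, N_LEVELS) array.

    Hypothetical helper: a profile is assumed to be identified by its
    (platform, cycle) pair. Shorter profiles are padded with NaN.
    """
    keys = list(zip(platform.tolist(), cycle.tolist()))
    # Unique profile keys, in order of first appearance
    profiles = list(dict.fromkeys(keys))
    # Positions of the points that belong to each profile
    index = {key: [] for key in profiles}
    for i, key in enumerate(keys):
        index[key].append(i)
    n_levels = max(len(ix) for ix in index.values())
    out = np.full((len(profiles), n_levels), np.nan)
    for p, key in enumerate(profiles):
        ix = index[key]
        out[p, : len(ix)] = values[ix]
    return profiles, out


# Made-up sample: two profiles with 3 and 2 points
platform = np.array([4900803, 4900803, 4900803, 5903377, 5903377])
cycle = np.array([150, 150, 150, 13, 13])
pres = np.array([5.0, 10.0, 15.0, 5.0, 10.0])

profiles, pres_2d = points_to_profiles(platform, cycle, pres)

# Going back to points amounts to dropping the NaN padding
pres_1d = pres_2d[~np.isnan(pres_2d)]
```

Here pres_2d has shape (2, 3), with a NaN in the last level of the second profile, and pres_1d holds the same five values as the original pres array.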

Pressure levels: Interpolation#

Once your dataset is a collection of vertical profiles, you can interpolate variables on standard pressure levels using Dataset.argo.interp_std_levels() with your levels as input:

In [9]: ds_interp = ds_profiles.argo.interp_std_levels([0,10,20,30,40,50])

In [10]: ds_interp
Out[10]: 
<xarray.Dataset> Size: 2kB
Dimensions:            (N_PROF: 18, PRES_INTERPOLATED: 6)
Coordinates:
    LATITUDE           (N_PROF) float64 144B 37.28 33.98 32.88 ... 34.39 33.07
    LONGITUDE          (N_PROF) float64 144B -66.77 -71.17 ... -72.75 -64.59
    TIME               (N_PROF) datetime64[ns] 144B 2011-01-02T11:14:06 ... 2...
  * PRES_INTERPOLATED  (PRES_INTERPOLATED) int64 48B 0 10 20 30 40 50
  * N_PROF             (N_PROF) int64 144B 7 13 15 0 6 2 9 ... 10 17 3 8 14 16
Data variables:
    CYCLE_NUMBER       (N_PROF) int64 144B 150 3 11 100 180 ... 62 148 151 4 13
    DATA_MODE          (N_PROF) <U1 72B 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
    DIRECTION          (N_PROF) <U1 72B 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
    PLATFORM_NUMBER    (N_PROF) int64 144B 4900803 4901218 ... 4901218 5903377
    PRES               (N_PROF, PRES_INTERPOLATED) float32 432B 5.0 ... 50.0
    PSAL               (N_PROF, PRES_INTERPOLATED) float32 432B 36.67 ... 36.68
    TEMP               (N_PROF, PRES_INTERPOLATED) float32 432B 19.46 ... 19.24
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         erddap.ifremer.fr
    Fetched_by:           docs
    Fetched_date:         2026/06/25
    Fetched_constraints:  [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
    Fetched_uri:          https://erddap.ifremer.fr/erddap/tabledap/ArgoFloat...
    Processing_history:   [PRES,TEMP,PSAL] real-time and adjusted/delayed var...

Note on the linear interpolation process:
  • Only profiles whose maximum pressure is higher than the highest (i.e. deepest) standard level are selected for interpolation.

  • Remaining profiles must have at least five data points to allow interpolation.

  • For each profile, the shallowest data point is repeated to the surface to allow a 0 standard level while avoiding extrapolation.
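
The three rules above can be sketched for a single profile with np.interp. This is a minimal illustration under stated assumptions, not argopy's implementation: the function name interp_profile and its min_points argument are hypothetical, pressure is assumed to be sorted in increasing order, and the depth test uses a strict comparison, following the wording of the note.

```python
import numpy as np


def interp_profile(pres, temp, std_levels, min_points=5):
    """Linearly interpolate one profile on standard pressure levels.

    Hypothetical helper. Returns None when the profile is rejected.
    Assumes `pres` is sorted in increasing order (NaN padding allowed).
    """
    std_levels = np.asarray(std_levels, dtype=float)
    valid = ~np.isnan(pres) & ~np.isnan(temp)
    p, t = pres[valid], temp[valid]

    # Rule 1: the profile must go deeper than the deepest standard level
    if p.size == 0 or p.max() <= std_levels.max():
        return None
    # Rule 2: at least `min_points` data points are required
    if p.size < min_points:
        return None
    # Rule 3: repeat the shallowest point at the surface so that a
    # 0 standard level does not require extrapolation
    if p[0] > 0:
        p = np.insert(p, 0, 0.0)
        t = np.insert(t, 0, t[0])
    return np.interp(std_levels, p, t)


# Made-up sample profile, padded with NaN as in a 2D collection
pres = np.array([5.0, 10.0, 20.0, 40.0, 60.0, np.nan])
temp = np.array([19.4, 19.4, 19.2, 19.0, 18.0, np.nan])
temp_std = interp_profile(pres, temp, [0, 10, 20, 30, 40, 50])

# A profile that stops at 40 db is rejected for a 50 db standard level
shallow = interp_profile(pres[:4], temp[:4], [0, 10, 20, 30, 40, 50])
```

In this example the value at the 0 level equals the shallowest measurement (19.4), and shallow is None because that profile does not reach below the deepest standard level.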

Pressure levels: Group-by bins#

If you prefer to avoid interpolation, you can opt for a group-by reduction over pressure bins using Dataset.argo.groupby_pressure_bins(). This method can be used to subsample and align an irregular dataset (where pressure levels differ between profiles) on a set of pressure bins. The output dataset can then be used to compute statistics along the N_PROF dimension, because N_LEVELS will correspond to similar pressure bins.

To illustrate this method, let’s start by fetching some data from a low vertical resolution float:

In [11]: f = DataFetcher(src='erddap', mode='expert').float(2901623)  # Low res float

In [12]: ds = f.data

Let’s now sub-sample these measurements along 250 db bins, selecting for each bin the value from the deepest pressure level (np is NumPy, imported with import numpy as np):

In [13]: bins = np.arange(0.0, np.max(ds["PRES"]), 250.0)

In [14]: ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep')

In [15]: ds_binned
Out[15]: 
<xarray.Dataset> Size: 190kB
Dimensions:                   (N_POINTS: 659)
Coordinates:
    LATITUDE                  (N_POINTS) float64 5kB 0.012 0.012 ... 3.388 3.388
    LONGITUDE                 (N_POINTS) float64 5kB 92.28 92.28 ... 94.77 94.77
    TIME                      (N_POINTS) datetime64[ns] 5kB 2010-05-14T03:35:...
    STD_PRES_BINS             (N_POINTS) float64 5kB 0.0 250.0 ... 750.0 1e+03
  * N_POINTS                  (N_POINTS) int64 5kB 0 1 2 3 4 ... 655 656 657 658
Data variables: (12/23)
    CONFIG_MISSION_NUMBER     (N_POINTS) float64 5kB 1.0 1.0 1.0 ... 1.0 1.0 1.0
    CYCLE_NUMBER              (N_POINTS) float64 5kB 0.0 0.0 0.0 ... 96.0 96.0
    DATA_MODE                 (N_POINTS) <U1 3kB 'D' 'D' 'D' 'D' ... 'D' 'D' 'D'
    DIRECTION                 (N_POINTS) <U1 3kB 'D' 'D' 'D' 'D' ... 'A' 'A' 'A'
    PLATFORM_NUMBER           (N_POINTS) float64 5kB 2.902e+06 ... 2.902e+06
    POSITION_QC               (N_POINTS) float64 5kB 1.0 1.0 1.0 ... 1.0 1.0 1.0
    ...                        ...
    TEMP_ADJUSTED             (N_POINTS) float32 3kB 13.17 10.08 ... 6.551 6.071
    TEMP_ADJUSTED_ERROR       (N_POINTS) float32 3kB 0.0 0.0 0.0 ... 0.0 0.0 0.0
    TEMP_ADJUSTED_QC          (N_POINTS) float64 5kB 1.0 1.0 1.0 ... 1.0 1.0 1.0
    TEMP_QC                   (N_POINTS) float64 5kB 1.0 1.0 1.0 ... 1.0 1.0 1.0
    TIME_QC                   (N_POINTS) float64 5kB 1.0 1.0 1.0 ... 1.0 1.0 1.0
    VERTICAL_SAMPLING_SCHEME  (N_POINTS) <U29 76kB 'Primary sampling: discret...
Attributes:
    DATA_ID:              ARGO
    DOI:                  http://doi.org/10.17882/42182
    Fetched_from:         erddap.ifremer.fr
    Fetched_by:           docs
    Fetched_date:         2026/06/25
    Fetched_constraints:  WMO2901623
    Fetched_uri:          https://erddap.ifremer.fr/erddap/tabledap/ArgoFloat...
    Processing_history:   Transformed with 'point2profile'; Sub-sampled and r...

Note the new STD_PRES_BINS coordinate, which holds the pressure bin definition.
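
To make the select='deep' reduction concrete, here is a minimal NumPy sketch for a single profile. It is not argopy's implementation: the helper name deepest_per_bin is hypothetical, and the bin convention (each bin starts at its lower edge, the last bin is open-ended) is an assumption made for the example.

```python
import numpy as np


def deepest_per_bin(pres, values, bins):
    """Keep, for each pressure bin, the measurement with the largest pressure.

    Hypothetical helper. Bin i is assumed to cover bins[i] <= pres < bins[i+1],
    and the last bin is open-ended.
    """
    pres = np.asarray(pres, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = np.asarray(bins, dtype=float)

    # Ignore missing pressures
    valid = ~np.isnan(pres)
    pres, values = pres[valid], values[valid]

    # Index of the bin each measurement falls in (-1: above the first edge)
    idx = np.digitize(pres, bins) - 1

    std_bins, out_pres, out_values = [], [], []
    for i in np.unique(idx[idx >= 0]):
        members = np.flatnonzero(idx == i)
        deepest = members[np.argmax(pres[members])]
        std_bins.append(bins[i])
        out_pres.append(pres[deepest])
        out_values.append(values[deepest])
    return np.array(std_bins), np.array(out_pres), np.array(out_values)


# Made-up sample profile and 250 db bins
pres = np.array([5.0, 100.0, 240.0, 260.0, 480.0, 510.0])
psal = np.array([35.0, 35.1, 35.2, 35.3, 35.4, 35.5])
bins = np.arange(0.0, 600.0, 250.0)  # lower edges: 0, 250, 500

std_bins, binned_pres, binned_psal = deepest_per_bin(pres, psal, bins)
```

Each bin ends up with exactly one measurement, so profiles sampled at different pressures become comparable bin by bin.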

The figure below shows the sub-sampling effect:

import matplotlib.pyplot as plt
import cmocean

fig, ax = plt.subplots(figsize=(18,6))
ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline)
plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+')
plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k')
plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r')
plt.title(ds.attrs['Fetched_constraints'])
plt.gca().invert_yaxis()
[Figure: groupby_pressure_bins_select_deep.png]

The bin limits are shown with horizontal red lines, the original data are shown as the colored background scatter, and the group-by pressure bin values are highlighted with red marks.

The select option can take many different values; see the full documentation of Dataset.argo.groupby_pressure_bins() for all the details. Here are the results with random sampling:

ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random')
[Figure: groupby_pressure_bins_select_random.png]

Filters#

If you fetched data in expert mode, you may want to use filters to help you curate the data.

  • QC flag filter: Dataset.argo.filter_qc(). This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset.

  • Data mode filter: Dataset.argo.datamode.filter(). This method allows you to filter variables according to their data mode.

  • OWC variables filter: Dataset.argo.filter_scalib_pres(). This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset.
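
As a rough picture of what a QC flag filter does, the sketch below builds a boolean mask that keeps a measurement only when all of its QC flags are in an accepted list. It is an illustration with plain NumPy rather than the filter_qc() implementation; the helper name qc_mask and the sample values are hypothetical, and the accepted flags 1 and 2 (good and probably good data in the Argo convention) are just one possible choice.

```python
import numpy as np


def qc_mask(qc_flags, accepted=(1, 2)):
    """Return a boolean mask that is True where every QC flag is accepted.

    Hypothetical helper. `qc_flags` maps a QC variable name to a 1D array
    of integer flags, all arrays having the same length.
    """
    mask = None
    for flags in qc_flags.values():
        ok = np.isin(flags, accepted)
        mask = ok if mask is None else mask & ok
    return mask


# Made-up sample: four measurements with their QC flags
qc = {
    "PRES_QC": np.array([1, 1, 1, 4]),
    "TEMP_QC": np.array([1, 2, 3, 1]),
    "PSAL_QC": np.array([1, 1, 1, 1]),
}
temp = np.array([19.4, 19.3, 19.2, 19.1])

mask = qc_mask(qc)
temp_good = temp[mask]  # same mask would be applied to every variable
```

Applying one mask to all variables keeps the dataset aligned, which mirrors the statement above that the QC filter modifies all variables of the dataset.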