Manipulating data#
Once you fetched data, argopy comes with a handy xarray.Dataset
accessor argo
to perform specific manipulation of the data. This means that if your dataset is named ds, then you can use ds.argo to access more argopy functions. The full list is available in the API documentation page Dataset.argo (xarray accessor).
Let’s start with standard import:
In [1]: from argopy import DataFetcher as ArgoDataFetcher
Transformation#
Points vs profiles#
By default, fetched data are returned as a 1D array collection of measurements:
In [2]: argo_loader = ArgoDataFetcher().region([-75,-55,30.,40.,0,100., '2011-01-01', '2011-01-15'])
In [3]: ds_points = argo_loader.to_xarray()
In [4]: ds_points
Out[4]:
<xarray.Dataset>
Dimensions: (N_POINTS: 524)
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 4 5 ... 519 520 521 522 523
LATITUDE (N_POINTS) float64 37.28 37.28 37.28 ... 33.07 33.07
LONGITUDE (N_POINTS) float64 -66.77 -66.77 ... -64.59 -64.59
TIME (N_POINTS) datetime64[ns] 2011-01-02T11:14:06 ... ...
Data variables: (12/13)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 ... 13 13 13 13 13 13
CYCLE_NUMBER (N_POINTS) int32 150 150 150 150 150 ... 13 13 13 13
DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_POINTS) <U1 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 4900803 4900803 ... 5903377 5903377
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
... ...
PRES_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
PSAL (N_POINTS) float64 36.67 36.67 36.67 ... 36.67 36.67
PSAL_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TEMP (N_POINTS) float64 19.46 19.47 19.47 ... 19.2 19.2
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with Dataset.argo.point2profile()
:
In [5]: ds_profiles = ds_points.argo.point2profile()
In [6]: ds_profiles
Out[6]:
<xarray.Dataset>
Dimensions: (N_PROF: 18, N_LEVELS: 50)
Coordinates:
* N_PROF (N_PROF) int64 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
* N_LEVELS (N_LEVELS) int64 0 1 2 3 4 5 6 ... 44 45 46 47 48 49
LATITUDE (N_PROF) float64 37.28 33.98 32.88 ... 34.39 33.07
LONGITUDE (N_PROF) float64 -66.77 -71.17 ... -72.75 -64.59
TIME (N_PROF) datetime64[ns] 2011-01-02T11:14:06 ... 20...
Data variables: (12/13)
CONFIG_MISSION_NUMBER (N_PROF) int32 1 1 11 1 1 1 0 1 1 1 1 1 1 2 1 1 1 13
CYCLE_NUMBER (N_PROF) int32 150 3 11 100 180 ... 62 148 151 4 13
DATA_MODE (N_PROF) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_PROF) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_PROF) int32 4900803 4901218 ... 4901218 5903377
POSITION_QC (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
... ...
PRES_QC (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
PSAL (N_PROF, N_LEVELS) float64 36.67 36.67 ... 36.67 nan
PSAL_QC (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
TEMP (N_PROF, N_LEVELS) float64 19.46 19.47 ... 19.2 nan
TEMP_QC (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
TIME_QC (N_PROF) int32 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
You can simply reverse this transformation with the Dataset.argo.profile2point()
:
In [7]: ds = ds_profiles.argo.profile2point()
In [8]: ds
Out[8]:
<xarray.Dataset>
Dimensions: (N_POINTS: 524)
Coordinates:
LONGITUDE (N_POINTS) float64 -66.77 -66.77 ... -64.59 -64.59
TIME (N_POINTS) datetime64[ns] 2011-01-02T11:14:06 ... ...
LATITUDE (N_POINTS) float64 37.28 37.28 37.28 ... 33.07 33.07
* N_POINTS (N_POINTS) int64 0 1 2 3 4 5 ... 519 520 521 522 523
Data variables: (12/13)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 ... 13 13 13 13 13 13
CYCLE_NUMBER (N_POINTS) int32 150 150 150 150 150 ... 13 13 13 13
DATA_MODE (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_POINTS) <U1 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 4900803 4900803 ... 5903377 5903377
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
... ...
PRES_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
PSAL (N_POINTS) float64 36.67 36.67 36.67 ... 36.67 36.67
PSAL_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TEMP (N_POINTS) float64 19.46 19.47 19.47 ... 19.2 19.2
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
Pressure levels: Interpolation#
Once your dataset is a collection of vertical profiles, you can interpolate variables on standard pressure levels using Dataset.argo.interp_std_levels()
with your levels as input:
In [9]: ds_interp = ds_profiles.argo.interp_std_levels([0,10,20,30,40,50])
In [10]: ds_interp
Out[10]:
<xarray.Dataset>
Dimensions: (N_PROF: 18, PRES_INTERPOLATED: 6)
Coordinates:
* N_PROF (N_PROF) int64 7 13 15 0 6 2 9 ... 12 10 17 3 8 14 16
LATITUDE (N_PROF) float64 37.28 33.98 32.88 ... 34.39 33.07
LONGITUDE (N_PROF) float64 -66.77 -71.17 ... -72.75 -64.59
TIME (N_PROF) datetime64[ns] 2011-01-02T11:14:06 ... 20...
* PRES_INTERPOLATED (PRES_INTERPOLATED) int64 0 10 20 30 40 50
Data variables:
CONFIG_MISSION_NUMBER (N_PROF) int32 1 1 11 1 1 1 0 1 1 1 1 1 1 2 1 1 1 13
CYCLE_NUMBER (N_PROF) int32 150 3 11 100 180 ... 62 148 151 4 13
DATA_MODE (N_PROF) <U1 'D' 'D' 'D' 'D' 'D' ... 'D' 'D' 'D' 'D'
DIRECTION (N_PROF) <U1 'A' 'A' 'A' 'A' 'A' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_PROF) int32 4900803 4901218 ... 4901218 5903377
PRES (N_PROF, PRES_INTERPOLATED) float64 5.0 10.0 ... 50.0
PSAL (N_PROF, PRES_INTERPOLATED) float64 36.67 ... 36.68
TEMP (N_PROF, PRES_INTERPOLATED) float64 19.46 ... 19.24
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: [x=-75.00/-55.00; y=30.00/40.00; z=0.0/100.0; t=201...
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
- Note on the linear interpolation process :
Only profiles that have a maximum pressure higher than the highest standard level are selected for interpolation.
Remaining profiles must have at least five data points to allow interpolation.
For each profile, shallowest data point is repeated to the surface to allow a 0 standard level while avoiding extrapolation.
Pressure levels: Group-by bins#
If you prefer to avoid interpolation, you can opt for a pressure bins grouping reduction using Dataset.argo.groupby_pressure_bins()
. This method can be used to subsample and align an irregular dataset (pressure not being similar in all profiles) on a set of pressure bins. The output dataset could then be used to perform statistics along the N_PROF dimension because N_LEVELS will corresponds to similar pressure bins.
To illustrate this method, let’s start by fetching some data from a low vertical resolution float:
In [11]: loader = ArgoDataFetcher(src='erddap', mode='expert').float(2901623) # Low res float
In [12]: ds = loader.load().data
Let’s now sub-sample these measurements along 250db bins, selecting values from the deepest pressure levels for each bins:
In [13]: bins = np.arange(0.0, np.max(ds["PRES"]), 250.0)
In [14]: ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep')
In [15]: ds_binned
Out[15]:
<xarray.Dataset>
Dimensions: (N_POINTS: 659)
Coordinates:
LONGITUDE (N_POINTS) float64 92.28 92.28 ... 94.77 94.77
TIME (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ....
LATITUDE (N_POINTS) float64 0.012 0.012 ... 3.388 3.388
STD_PRES_BINS (N_POINTS) float64 0.0 250.0 500.0 ... 750.0 1e+03
* N_POINTS (N_POINTS) int64 0 1 2 3 4 ... 654 655 656 657 658
Data variables: (12/23)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
CYCLE_NUMBER (N_POINTS) int32 0 0 0 0 0 0 ... 95 96 96 96 96 96
DATA_MODE (N_POINTS) <U1 'R' 'R' 'R' 'R' ... 'R' 'R' 'R' 'R'
DIRECTION (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 2901623 2901623 ... 2901623
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
... ...
TEMP_ADJUSTED (N_POINTS) float32 nan nan nan nan ... nan nan nan
TEMP_ADJUSTED_ERROR (N_POINTS) float32 nan nan nan nan ... nan nan nan
TEMP_ADJUSTED_QC (N_POINTS) int32 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 0 0
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1
VERTICAL_SAMPLING_SCHEME (N_POINTS) <U29 'Primary sampling: discrete []'...
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: phy;WMO2901623
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Transformed with point2profile; Sub-sampled and re-...
See the new STD_PRES_BINS
variable that hold the pressure bins definition.
The figure below shows the sub-sampling effect:
import matplotlib as mpl
import matplotlib.pyplot as plt
import cmocean
fig, ax = plt.subplots(figsize=(18,6))
ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline)
plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+')
plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k')
plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r')
plt.title(ds.attrs['Fetched_constraints'])
plt.gca().invert_yaxis()

The bin limits are shown with horizontal red lines, the original data are in the background colored scatter and the group-by pressure bins values are highlighted in red marks
The select
option can take many different values, see the full documentation of Dataset.argo.groupby_pressure_bins()
, for all the details. Let’s show here results from the random
sampling:
ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random')

Filters#
If you fetched data with the expert
mode, you may want to use filters to help you curate the data.
QC flag filter:
Dataset.argo.filter_qc()
. This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset.Data mode filter:
Dataset.argo.filter_data_mode()
. This method allows you to filter variables according to their data mode. This filter modifies the <PARAM> and <PARAM_QC> variables of the dataset.OWC variables filter:
Dataset.argo.filter_scalib_pres()
. This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset.
Complementary data#
TEOS-10 variables#
You can compute additional ocean variables from TEOS-10. The default list of variables is: ‘SA’, ‘CT’, ‘SIG0’, ‘N2’, ‘PV’, ‘PTEMP’ (‘SOUND_SPEED’, ‘CNDC’ are optional). Simply raise an issue to add a new one.
This can be done using the Dataset.argo.teos10()
method and indicating the list of variables you want to compute:
In [16]: ds = ArgoDataFetcher().float(2901623).to_xarray()
In [17]: ds.argo.teos10(['SA', 'CT', 'PV'])
Out[17]:
<xarray.Dataset>
Dimensions: (N_POINTS: 8339)
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 4 ... 8335 8336 8337 8338
LATITUDE (N_POINTS) float64 0.012 0.012 0.012 ... 3.388 3.388
LONGITUDE (N_POINTS) float64 92.28 92.28 92.28 ... 94.77 94.77
TIME (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ... ...
Data variables: (12/16)
CONFIG_MISSION_NUMBER (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
CYCLE_NUMBER (N_POINTS) int32 0 0 0 0 0 0 0 ... 96 96 96 96 96 96
DATA_MODE (N_POINTS) <U1 'R' 'R' 'R' 'R' ... 'R' 'R' 'R' 'R'
DIRECTION (N_POINTS) <U1 'D' 'D' 'D' 'D' ... 'A' 'A' 'A' 'A'
PLATFORM_NUMBER (N_POINTS) int32 2901623 2901623 ... 2901623 2901623
POSITION_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
... ...
TEMP (N_POINTS) float64 30.16 30.17 30.17 ... 6.189 6.071
TEMP_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
TIME_QC (N_POINTS) int32 1 1 1 1 1 1 1 1 ... 1 1 1 1 1 1 1 1
SA (N_POINTS) float64 34.44 34.44 34.44 ... 35.09 35.08
CT (N_POINTS) float64 30.2 30.2 30.2 ... 6.078 5.959
PV (N_POINTS) float64 nan -1.775e-15 ... 1.573e-12 nan
Attributes:
DATA_ID: ARGO
DOI: http://doi.org/10.17882/42182
Fetched_from: https://erddap.ifremer.fr/erddap
Fetched_by: docs
Fetched_date: 2023/03/28
Fetched_constraints: phy;WMO2901623
Fetched_uri: ['https://erddap.ifremer.fr/erddap/tabledap/ArgoFlo...
history: Variables filtered according to DATA_MODE; Variable...
In [18]: ds['SA']
Out[18]:
<xarray.DataArray 'SA' (N_POINTS: 8339)>
array([34.43589995, 34.43691368, 34.43693526, ..., 35.0919598 ,
35.09211518, 35.08221619])
Coordinates:
* N_POINTS (N_POINTS) int64 0 1 2 3 4 5 6 ... 8333 8334 8335 8336 8337 8338
LATITUDE (N_POINTS) float64 0.012 0.012 0.012 0.012 ... 3.388 3.388 3.388
LONGITUDE (N_POINTS) float64 92.28 92.28 92.28 92.28 ... 94.77 94.77 94.77
TIME (N_POINTS) datetime64[ns] 2010-05-14T03:35:00 ... 2013-01-01T0...
Attributes:
long_name: Absolute Salinity
standard_name: sea_water_absolute_salinity
unit: g/kg
Data models#
By default argopy works with xarray.Dataset
for Argo data fetcher, and with pandas.DataFrame
for Argo index fetcher.
For your own analysis, you may prefer to switch from one to the other. This is all built in argopy, with the argopy.DataFetcher.to_dataframe()
and argopy.IndexFetcher.to_xarray()
methods.
In [19]: ArgoDataFetcher().profile(6902746, 34).to_dataframe()
Out[19]:
CONFIG_MISSION_NUMBER CYCLE_NUMBER ... LONGITUDE TIME
N_POINTS ...
0 2 34 ... -58.119 2017-12-20 06:58:00
1 2 34 ... -58.119 2017-12-20 06:58:00
2 2 34 ... -58.119 2017-12-20 06:58:00
3 2 34 ... -58.119 2017-12-20 06:58:00
4 2 34 ... -58.119 2017-12-20 06:58:00
... ... ... ... ... ...
104 2 34 ... -58.119 2017-12-20 06:58:00
105 2 34 ... -58.119 2017-12-20 06:58:00
106 2 34 ... -58.119 2017-12-20 06:58:00
107 2 34 ... -58.119 2017-12-20 06:58:00
108 2 34 ... -58.119 2017-12-20 06:58:00
[109 rows x 16 columns]
Saving data#
Once you have your Argo data as xarray.Dataset
, simply use the awesome possibilities of xarray like xarray.Dataset.to_netcdf()
or xarray.Dataset.to_zarr()
.