Manipulating data
=================
.. contents::
:local:
.. currentmodule:: xarray
Once you have fetched data, **argopy** comes with a handy :class:`xarray.Dataset` accessor ``argo`` to perform specific manipulations of the data. This means that if your dataset is named ``ds``, you can use ``ds.argo`` to access more **argopy** functions. The full list is available in the API documentation page :ref:`Dataset.argo (xarray accessor)`.
Let's start with standard import:
.. ipython:: python
:okwarning:
from argopy import DataFetcher as ArgoDataFetcher
Transformation
--------------
Points vs profiles
^^^^^^^^^^^^^^^^^^
By default, fetched data are returned as a 1D array collection of measurements:
.. ipython:: python
:okwarning:
argo_loader = ArgoDataFetcher().region([-75,-55,30.,40.,0,100., '2011-01-01', '2011-01-15'])
ds_points = argo_loader.to_xarray()
ds_points
If you prefer to work with a 2D array collection of vertical profiles, simply transform the dataset with :meth:`Dataset.argo.point2profile`:
.. ipython:: python
:okwarning:
ds_profiles = ds_points.argo.point2profile()
ds_profiles
You can simply reverse this transformation with :meth:`Dataset.argo.profile2point`:
.. ipython:: python
:okwarning:
ds = ds_profiles.argo.profile2point()
ds
Pressure levels: Interpolation
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once your dataset is a collection of vertical **profiles**, you can interpolate variables on standard pressure levels using :meth:`Dataset.argo.interp_std_levels` with your levels as input:
.. ipython:: python
:okwarning:
ds_interp = ds_profiles.argo.interp_std_levels([0,10,20,30,40,50])
ds_interp
Notes on the linear interpolation process:

- Only profiles whose maximum pressure is higher than the deepest standard level are selected for interpolation.
- Remaining profiles must have at least five data points to allow interpolation.
- For each profile, the shallowest data point is repeated up to the surface to allow a 0 db standard level while avoiding extrapolation.
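For reference, these selection and surface-padding rules can be sketched on a single synthetic profile with plain NumPy. This is a simplified stand-in for :meth:`Dataset.argo.interp_std_levels`, not its actual implementation, and the profile values are made up:

```python
import numpy as np

# Hypothetical single profile: pressures (db) and temperatures
pres = np.array([5.0, 20.0, 35.0, 60.0, 90.0])
temp = np.array([18.2, 18.0, 17.1, 15.4, 13.8])
std_levels = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])

# Rule 1: the profile must reach deeper than the deepest standard level
assert pres.max() >= std_levels.max()
# Rule 2: at least five data points are required
assert pres.size >= 5
# Rule 3: repeat the shallowest point up to the surface so that the
# 0 db level can be filled without extrapolating
pres_ext = np.concatenate(([0.0], pres))
temp_ext = np.concatenate(([temp[0]], temp))
temp_interp = np.interp(std_levels, pres_ext, temp_ext)
```

Here ``temp_interp[0]`` simply repeats the shallowest measurement, while deeper levels are linearly interpolated between bracketing data points.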
Pressure levels: Group-by bins
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
If you prefer to avoid interpolation, you can opt for a pressure-bin grouping reduction using :meth:`Dataset.argo.groupby_pressure_bins`. This method can be used to subsample and align an irregular dataset (where pressure levels differ between profiles) on a set of pressure bins. The output dataset can then be used to perform statistics along the ``N_PROF`` dimension, because ``N_LEVELS`` will correspond to similar pressure bins.
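Conceptually, the reduction can be sketched with plain NumPy on a single made-up profile: assign each measurement to a pressure bin, then keep one value per bin. The snippet below mimics the ``select='deep'`` behaviour in a simplified way and is not the actual argopy implementation:

```python
import numpy as np

# One hypothetical profile: irregular pressures (db) and salinities
pres = np.array([12.0, 110.0, 240.0, 260.0, 470.0, 510.0])
psal = np.array([35.1, 35.3, 35.2, 35.0, 34.9, 34.8])
bins = np.array([0.0, 250.0, 500.0])  # 250 db bins

# Assign each measurement to a bin, then keep the deepest one per bin
idx = np.digitize(pres, bins) - 1
psal_binned = {}
for i in np.unique(idx):
    in_bin = np.flatnonzero(idx == i)
    deepest = in_bin[np.argmax(pres[in_bin])]
    psal_binned[bins[i]] = psal[deepest]
```

Each bin is thus represented by the salinity of its deepest measurement, which aligns irregular profiles on a common vertical grid without interpolating.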
To illustrate this method, let's start by fetching some data from a low vertical resolution float:
.. ipython:: python
:okwarning:
loader = ArgoDataFetcher(src='erddap', mode='expert').float(2901623) # Low res float
ds = loader.load().data
Let's now sub-sample these measurements along 250 db bins, selecting values from the **deepest** pressure level in each bin:
.. ipython:: python
:okwarning:
import numpy as np
bins = np.arange(0.0, np.max(ds["PRES"]), 250.0)
ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='deep')
ds_binned
Note the new ``STD_PRES_BINS`` variable, which holds the pressure bins definition.
The figure below shows the sub-sampling effect:
.. code-block:: python
import matplotlib as mpl
import matplotlib.pyplot as plt
import cmocean
fig, ax = plt.subplots(figsize=(18,6))
ds.plot.scatter(x='CYCLE_NUMBER', y='PRES', hue='PSAL', ax=ax, cmap=cmocean.cm.haline)
plt.plot(ds_binned['CYCLE_NUMBER'], ds_binned['PRES'], 'r+')
plt.hlines(bins, ds['CYCLE_NUMBER'].min(), ds['CYCLE_NUMBER'].max(), color='k')
plt.hlines(ds_binned['STD_PRES_BINS'], ds_binned['CYCLE_NUMBER'].min(), ds_binned['CYCLE_NUMBER'].max(), color='r')
plt.title(ds.attrs['Fetched_constraints'])
plt.gca().invert_yaxis()
.. image:: _static/groupby_pressure_bins_select_deep.png
The bin limits are shown as horizontal lines, the original data appear as the colored scatter in the background, and the group-by pressure bin values are highlighted with red markers.
The ``select`` option can take many different values; see the full documentation of :meth:`Dataset.argo.groupby_pressure_bins` for all the details. Let's show here results from the ``random`` sampling:
.. code-block:: python
ds_binned = ds.argo.groupby_pressure_bins(bins=bins, select='random')
.. image:: _static/groupby_pressure_bins_select_random.png
Filters
^^^^^^^
If you fetched data with the ``expert`` mode, you may want to use *filters* to help you curate the data.
- **QC flag filter**: :meth:`Dataset.argo.filter_qc`. This method allows you to filter measurements according to QC flag values. This filter modifies all variables of the dataset.
- **Data mode filter**: :meth:`Dataset.argo.filter_data_mode`. This method allows you to filter variables according to their data mode. This filter modifies the parameter and associated QC variables of the dataset.
- **OWC variables filter**: :meth:`Dataset.argo.filter_scalib_pres`. This method allows you to filter variables according to OWC salinity calibration software requirements. This filter modifies pressure, temperature and salinity related variables of the dataset.
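The idea behind the QC flag filter can be sketched with plain NumPy: keep only measurements whose flag belongs to an accepted list. This is a conceptual stand-in for :meth:`Dataset.argo.filter_qc` (which operates on all dataset variables), with made-up values:

```python
import numpy as np

# Hypothetical measurements with Argo-like QC flags
# (1: good, 2: probably good, 3: probably bad, 4: bad)
temp = np.array([17.8, 17.5, 99.9, 16.9])
temp_qc = np.array([1, 2, 4, 1])

# Keep only measurements flagged good or probably good
keep = np.isin(temp_qc, [1, 2])
temp_filtered = temp[keep]
```

The obviously erroneous 99.9 value, flagged 4 (bad), is dropped while all good measurements are retained.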
Complementary data
------------------
TEOS-10 variables
^^^^^^^^^^^^^^^^^
You can compute additional ocean variables from `TEOS-10 <https://www.teos-10.org/>`_. The default list of variables is: 'SA', 'CT', 'SIG0', 'N2', 'PV', 'PTEMP' ('SOUND_SPEED' and 'CNDC' are optional). `Simply raise an issue to add a new one <https://github.com/euroargodev/argopy/issues>`_.
This can be done using the :meth:`Dataset.argo.teos10` method and indicating the list of variables you want to compute:
.. ipython:: python
:okwarning:
ds = ArgoDataFetcher().float(2901623).to_xarray()
ds.argo.teos10(['SA', 'CT', 'PV'])
.. ipython:: python
:okwarning:
ds['SA']
Data models
-----------
By default **argopy** works with :class:`xarray.Dataset` for Argo data fetcher, and with :class:`pandas.DataFrame` for Argo index fetcher.
For your own analysis, you may prefer to switch from one to the other. This is all built into **argopy**, with the :meth:`argopy.DataFetcher.to_dataframe` and :meth:`argopy.IndexFetcher.to_xarray` methods.
.. ipython:: python
:okwarning:
ArgoDataFetcher().profile(6902746, 34).to_dataframe()
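These conversions build on xarray's native interoperability with pandas. A minimal standalone illustration, independent of argopy and using made-up values:

```python
import numpy as np
import xarray as xr

# A tiny hypothetical dataset of measurements along one dimension
ds = xr.Dataset(
    {"TEMP": ("N_POINTS", np.array([17.8, 17.5, 16.9]))},
    coords={"N_POINTS": [0, 1, 2]},
)
df = ds.to_dataframe()    # xarray Dataset -> pandas DataFrame
ds_back = df.to_xarray()  # pandas DataFrame -> xarray Dataset
```

The round trip preserves both the dimension (as the DataFrame index) and the variable values.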
Saving data
===========
Once you have your Argo data as an :class:`xarray.Dataset`, simply use the awesome possibilities of `xarray <https://docs.xarray.dev>`_, like :meth:`xarray.Dataset.to_netcdf` or :meth:`xarray.Dataset.to_zarr`.