User mode (πŸ„ 🏊 🚣)#

Problem

For users who are not Argo experts, simply getting access to Argo measurements can be quite complicated. Indeed, the Argo dataset is very complex, with thousands of different variables, dozens of reference tables and a user manual more than 100 pages long.

This is mainly due to:

  • Argo measurements coming from many different models of floats or sensors,

  • quality control of in situ measurements from autonomous platforms being truly a matter for ocean and data experts,

  • the Argo data management workflow being distributed between more than 10 Data Assembly Centers all around the world,

  • Argo autonomous profiling floats, despite a rather simple operating principle, being complex robots that require a lot of data to be monitored and logged.

Solution

In order to ease Argo data analysis for the vast majority of users, argopy implements different levels of verbosity and data processing to hide, or simply remove, variables that are only meaningful to experts.

User mode details#

argopy provides 3 user modes:

  • πŸ„ expert mode return all the Argo data, without any postprocessing,

  • 🏊 standard mode simplifies the dataset, removes most of its jargon and returns a priori good data,

  • 🚣 research mode reduces the dataset to its core, preserving only data of the highest quality for research studies, including studies sensitive to small pressure and salinity biases (e.g. calculations of global ocean heat content or mixed layer depth).

In standard and research modes, fetched data are automatically filtered to account for their quality (using the quality control flags) and their level of processing by the data centers (for each parameter, the data mode indicates whether or not a human expert has carefully reviewed the data). Both modes return a post-processed subset of the full Argo dataset.

Hence, the main difference between the standard and research modes is in the level of data quality assurance. The standard mode returns only good or probably good data; this includes real-time data that have been validated automatically but not yet by a human expert. The research mode is the safest choice, with data of the highest quality, carefully checked in delayed mode by a human expert of the Argo Data Management Team.
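
For intuition, the QC-flag filtering applied in standard mode resembles the following xarray sketch. This is a simplified illustration only, not argopy's actual implementation, and it assumes the QC flags have already been cast to integers:

import xarray as xr

def filter_qc(ds: xr.Dataset, param: str = "PSAL", flags=(1, 2)) -> xr.Dataset:
    # Keep only points flagged good (1) or probably good (2), which is
    # what the standard mode retains for each parameter.
    return ds.where(ds[param + "_QC"].isin(list(flags)), drop=True)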

Table of argopy user mode data processing details#

|                                           | expert 🏄          | standard 🏊                                                                                             | research 🚣              |
|-------------------------------------------|--------------------|---------------------------------------------------------------------------------------------------------|--------------------------|
| Level of quality (QC flags) retained      | all                | good or probably good (QC=[1,2])                                                                          | good (QC=1)              |
| Level of assessment (Data mode) retained  | all: [R,D,A] modes | all: [R,D,A] modes, but PARAM_ADJUSTED and PARAM are merged in a single variable according to the mode    | best only (D mode only)  |
| Pressure error                            | any                | any                                                                                                       | smaller than 20 db       |
| Variables returned                        | all                | all without jargon (DATA_MODE and QC_FLAG are retained)                                                   | comprehensive minimum    |
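
To make the research-mode pressure criterion concrete, here is a sketch of what such a filter could look like in xarray. This is an illustration only, not argopy's internal code, and it assumes the dataset carries the PRES_ADJUSTED_ERROR variable:

import xarray as xr

def filter_pressure_error(ds: xr.Dataset, max_error: float = 20.0) -> xr.Dataset:
    # Drop points whose adjusted pressure error exceeds 20 db, the
    # research-mode threshold from the table above.
    return ds.where(ds["PRES_ADJUSTED_ERROR"] < max_error, drop=True)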

How to select a user mode?#

Let’s import the argopy data fetcher:

In [1]: import argopy

In [2]: from argopy import DataFetcher as ArgoDataFetcher

By default, all argopy data fetchers are set to work in the standard user mode.

If you want to change the user mode, or simply make it explicit in your code, you can use one of the following three methods:

  • the argopy global option setter:

In [3]: argopy.set_options(mode='standard')
Out[3]: <argopy.options.set_options at 0x7f27a43f6c40>
  • a temporary context:

In [4]: with argopy.set_options(mode='expert'):
   ...:     ArgoDataFetcher().profile(6902746, 34)
   ...: 
  • or the fetcher option:

In [5]: ArgoDataFetcher(mode='research').profile(6902746, 34)
Out[5]: 
<datafetcher.erddap>
Name: Ifremer erddap Argo data fetcher for profiles
API: https://erddap.ifremer.fr/erddap
Domain: phy;WMO6902746_CYC34
Performances: cache=False, parallel=False
User mode: research
Dataset: phy

Example of differences in user modes#

To highlight the differences in data returned for each user mode, let’s compare data fetched for one profile.

You will note that the standard and research modes return fewer variables, letting you focus on your analysis. In expert mode, all Argo variables are available for you to work with.

In [6]: argopy.set_options(ftp='https://data-argo.ifremer.fr')
Out[6]: <argopy.options.set_options at 0x7f27a68ac1c0>
In [7]: with argopy.set_options(mode='expert'):
   ...:     ds = ArgoDataFetcher(src='gdac').profile(6902755, 12).to_xarray()
   ...:     print(ds.data_vars)
   ...: 
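To carry out the comparison for the other modes, the same pattern can be repeated; for instance, a loop such as this one (assuming network access to the GDAC server) would print the number of variables returned in each mode:

for mode in ['expert', 'standard', 'research']:
    with argopy.set_options(mode=mode):
        ds = ArgoDataFetcher(src='gdac').profile(6902755, 12).to_xarray()
        print("%s mode: %i variables" % (mode, len(ds.data_vars)))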

Note

A note for expert users looking at standard and research mode results: there are no PARAM_ADJUSTED variables, because they have been renamed PARAM wherever the DATA_MODE variable indicates adjusted ('A') or delayed ('D') mode.
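
For illustration, the renaming mentioned in this note amounts to something like the following xarray sketch. This is a simplified picture, not argopy's internal code, and it assumes a per-profile DATA_MODE variable holding 'R', 'A' or 'D':

import xarray as xr

def merge_adjusted(ds: xr.Dataset, param: str = "PSAL") -> xr.Dataset:
    # Where the data mode is 'A' (adjusted) or 'D' (delayed), take the
    # adjusted values; elsewhere, keep the real-time values.
    use_adjusted = (ds["DATA_MODE"] == "A") | (ds["DATA_MODE"] == "D")
    ds[param] = xr.where(use_adjusted, ds[param + "_ADJUSTED"], ds[param])
    return ds.drop_vars(param + "_ADJUSTED")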