Argo meta-data#
Index of profiles#
Since the Argo measurements dataset is quite complex, it comes with a collection of index files, i.e. lookup tables with metadata. These index files help you determine what to expect before retrieving the full set of measurements. argopy has a specific fetcher for index files:
In [1]: from argopy import IndexFetcher as ArgoIndexFetcher
You can use the Index fetcher with the region or float access points, similarly to data fetching:
In [2]: idx = ArgoIndexFetcher(src='gdac').float(2901623).load()
In [3]: idx.index
Out[3]:
file ... profiler
0 nmdis/2901623/profiles/R2901623_000.nc ... Provor, Seabird conductivity sensor
1 nmdis/2901623/profiles/R2901623_000D.nc ... Provor, Seabird conductivity sensor
2 nmdis/2901623/profiles/R2901623_001.nc ... Provor, Seabird conductivity sensor
3 nmdis/2901623/profiles/R2901623_002.nc ... Provor, Seabird conductivity sensor
4 nmdis/2901623/profiles/R2901623_003.nc ... Provor, Seabird conductivity sensor
.. ... ... ...
93 nmdis/2901623/profiles/R2901623_092.nc ... Provor, Seabird conductivity sensor
94 nmdis/2901623/profiles/R2901623_093.nc ... Provor, Seabird conductivity sensor
95 nmdis/2901623/profiles/R2901623_094.nc ... Provor, Seabird conductivity sensor
96 nmdis/2901623/profiles/R2901623_095.nc ... Provor, Seabird conductivity sensor
97 nmdis/2901623/profiles/R2901623_096.nc ... Provor, Seabird conductivity sensor
[98 rows x 11 columns]
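A region access point works the same way; here is a minimal sketch, where the space/time box values are arbitrary and follow the [lon_min, lon_max, lat_min, lat_max, date_min, date_max] convention:

from argopy import IndexFetcher as ArgoIndexFetcher

# Sketch: fetch the profile index over a space/time box
# [lon_min, lon_max, lat_min, lat_max, date_min, date_max]
idx = ArgoIndexFetcher(src='gdac').region([-75, -45, 20, 30, '2011-01', '2011-06']).load()
print(idx.index.shape)  # one row per indexed profile file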
Alternatively, you can use argopy.IndexFetcher.to_dataframe():
In [4]: idx = ArgoIndexFetcher(src='gdac').float(2901623)
In [5]: df = idx.to_dataframe()
The difference is that with the load method, the index is stored in memory and is not fetched again on every access to the index attribute.
The index fetcher has pretty much the same methods as the data fetchers. You can check them all here: argopy.fetchers.ArgoIndexFetcher.
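For instance, an index loaded this way can also be exported to xarray or plotted; a minimal sketch:

idx = ArgoIndexFetcher(src='gdac').float(2901623).load()
ds = idx.to_xarray()              # index content as an xarray.Dataset
fig, ax = idx.plot('trajectory')  # quick-look map of the float trajectory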
Reference tables#
The Argo netCDF format is strict and based on a collection of variables that are fully documented and governed by conventions. All reference tables can be found in the Argo user manual.
However, machine-to-machine access to these tables is often required. This is possible thanks to the work of the Argo Vocabulary Task Team (AVTT), the team responsible for the NVS collections under the Argo Data Management Team governance.
Note
The GitHub organization hosting the AVTT is the ‘NERC Vocabulary Server (NVS)’, aka ‘nvs-vocabs’. It holds a list of NVS collection-specific GitHub repositories. Each Argo GitHub repository is named after its corresponding collection ID (e.g. R01, RR2, R03, etc.). The current list is given here.
The management of issues related to vocabularies managed by the Argo Data Management Team is done on this repository.
argopy provides the utility class ArgoNVSReferenceTables to easily fetch and access all Argo reference tables. If you already know the name of the reference table you want, you can simply retrieve it like this:
In [6]: from argopy import ArgoNVSReferenceTables
In [7]: NVS = ArgoNVSReferenceTables()
In [8]: NVS.tbl('R01')
Out[8]:
altLabel ... id
0 BPROF ... http://vocab.nerc.ac.uk/collection/R01/current...
1 BTRAJ ... http://vocab.nerc.ac.uk/collection/R01/current...
2 META ... http://vocab.nerc.ac.uk/collection/R01/current...
3 MPROF ... http://vocab.nerc.ac.uk/collection/R01/current...
4 MTRAJ ... http://vocab.nerc.ac.uk/collection/R01/current...
5 PROF ... http://vocab.nerc.ac.uk/collection/R01/current...
6 SPROF ... http://vocab.nerc.ac.uk/collection/R01/current...
7 TECH ... http://vocab.nerc.ac.uk/collection/R01/current...
8 TRAJ ... http://vocab.nerc.ac.uk/collection/R01/current...
[9 rows x 5 columns]
The reference table is returned as a pandas.DataFrame. If you want the exact name of this table:
In [9]: NVS.tbl_name('R01')
Out[9]:
('DATA_TYPE',
'Terms describing the type of data contained in an Argo netCDF file.',
'http://vocab.nerc.ac.uk/collection/R01/current/')
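Each table being a plain pandas.DataFrame, standard pandas selections apply; for instance, to look up the entry registered under the short label ‘SPROF’ (the ‘altLabel’ column name is taken from the output above):

tbl = NVS.tbl('R01')
print(tbl[tbl['altLabel'] == 'SPROF'])  # row describing the 'SPROF' data type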
On the other hand, if you want to retrieve all reference tables, you can do so with the ArgoNVSReferenceTables.all_tbl() method. It returns a dictionary with table short names as keys and pandas.DataFrame as values.
In [10]: all_tables = NVS.all_tbl()
In [11]: all_tables.keys()
Out[11]: odict_keys(['ARGO_WMO_INST_TYPE', 'CYCLE_TIMING_VARIABLE', 'DATA_CENTRE_CODES', 'DATA_STATE_INDICATOR', 'DATA_TYPE', 'DM_QC_FLAG', 'GROUNDED', 'HISTORY_ACTION', 'HISTORY_STEP', 'MEASUREMENT_CODE_CATEGORY', 'MEASUREMENT_CODE_ID', 'OCEAN_CODE', 'PARAMETER', 'PLATFORM_FAMILY', 'PLATFORM_MAKER', 'PLATFORM_TYPE', 'POSITIONING_SYSTEM', 'POSITION_ACCURACY', 'PROF_QC_FLAG', 'REPRESENTATIVE_PARK_PRESSURE_STATUS', 'RTQC_TESTID', 'RT_QC_FLAG', 'SENSOR', 'SENSOR_MAKER', 'SENSOR_MODEL', 'STATUS', 'TRANS_SYSTEM', 'VERTICAL_SAMPLING_SCHEME'])
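This dictionary makes it easy to scan every table at once; a minimal sketch reporting the size of each one:

# Report the number of entries in each reference table
for name, tbl in all_tables.items():
    print(f"{name:40s} {len(tbl):3d} entries")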
Deployment Plan#
It may be useful to retrieve metadata about Argo deployments. argopy can query the OceanOPS API to access this information. The returned deployment plan is a list of all Argo floats ever deployed, together with their deployment location, date, WMO number, program, country, float model and current status.
To fetch the Argo deployment plan, argopy provides a dedicated utility class, OceanOPSDeployments, that can be used like this:
In [12]: from argopy import OceanOPSDeployments
In [13]: deployment = OceanOPSDeployments()
In [14]: df = deployment.to_dataframe()
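The resulting pandas.DataFrame holds one row per deployment, with the location, date, WMO number, program, country, float model and status of each float.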
OceanOPSDeployments can also take an index box definition as argument, in order to restrict the deployment plan selection to a specific region or period:
deployment = OceanOPSDeployments([-90, 0, 0, 90])
# deployment = OceanOPSDeployments([-20, 0, 42, 51, '2020-01', '2021-01'])
# deployment = OceanOPSDeployments([-180, 180, -90, 90, '2020-01', None])
Note that if the starting date is not provided, it will be set automatically to the current date.
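Combined with to_dataframe(), a box selection makes it easy to summarize the plan; a minimal sketch, where the ‘status’ column name is an assumption based on the fields listed above:

from argopy import OceanOPSDeployments

# Sketch: Argo deployments in the North Atlantic since 2020
deployment = OceanOPSDeployments([-90, 0, 0, 90, '2020-01', None])
df = deployment.to_dataframe()

# Count deployments per status ('status' column name is assumed)
print(df.groupby('status').size())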
Finally, OceanOPSDeployments comes with a plotting method:
fig, ax = deployment.plot_status()

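Since plot_status returns standard matplotlib figure and axes handles, the map can be customized further or saved with fig.savefig().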
Note
The list of possible deployment status names and codes is given by:
OceanOPSDeployments().status_code
Status | Id | Description
---|---|---
PROBABLE | 0 | Starting status for some platforms, when only a few metadata are available, like a rough deployment location and date. The platform may be deployed.
CONFIRMED | 1 | Automatically set when a ship is attached to the deployment information. The platform is ready to be deployed, deployment is planned.
REGISTERED | 2 | Starting status for most of the networks, when deployment planning is not done. The deployment is certain, and a notification has been sent via the OceanOPS system.
OPERATIONAL | 6 | Automatically set when the platform is emitting a pulse and observations are distributed within a certain time interval.
INACTIVE | 4 | The platform has not emitted a pulse for a certain time.
CLOSED | 5 | The platform has not emitted a pulse for a long time and is considered dead.