argopy.ArgoIndex

class ArgoIndex(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', convention: str | None = None, cache: bool = False, cachedir: str = '', timeout: int = 0)

Argo GDAC index store

If Pyarrow is available, this class will use pyarrow.Table as internal storage format; otherwise, a pandas.DataFrame will be used.
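
The fallback described above can be pictured with a minimal sketch. It assumes the choice depends only on whether the pyarrow package can be found; `pick_backend` is a hypothetical helper written for illustration and is not part of argopy.

```python
from importlib.util import find_spec


def pick_backend() -> str:
    """Return 'pyarrow' when that package is installed, otherwise 'pandas'.

    Hypothetical helper: it only mirrors the documented fallback and does
    not reproduce how argopy selects its storage format internally.
    """
    # find_spec returns None when the top-level package cannot be located
    return "pyarrow" if find_spec("pyarrow") is not None else "pandas"


backend = pick_backend()
print(backend)
```

On a real store instance, the `backend` attribute listed under Attributes reports the name of the backend in use.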

You can use the exact index file names or keywords:

  • core for the ar_index_global_prof.txt index file,

  • bgc-b for the argo_bio-profile_index.txt index file,

  • bgc-s for the argo_synthetic-profile_index.txt index file.

Examples

An index store is instantiated with a host (any access path, local, http or ftp) and an index file:

>>> idx = ArgoIndex()
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr")  # Default host
>>> idx = ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo", index_file="ar_index_global_prof.txt")  # Default index
>>> idx = ArgoIndex(index_file="bgc-s")  # Use keywords instead of exact file names
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr", index_file="bgc-b", cache=True)  # Use cache for performance
>>> idx = ArgoIndex(host=".", index_file="dummy_index.txt", convention="core")  # Load your own index

Full index methods and properties:

>>> idx.load()
>>> idx.load(nrows=12)  # Only load the first N rows of the index
>>> idx.to_dataframe(index=True)  # Convert index to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(index=True, nrows=2)  # Return only the first nrows rows of the index
>>> idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
>>> idx.index  # internal storage structure of the full index (:class:`pyarrow.Table` or :class:`pandas.DataFrame`)
>>> idx.shape  # shape of the full index array
>>> idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'

Search methods:

>>> idx.search_wmo(1901393)
>>> idx.search_cyc(1)
>>> idx.search_wmo_cyc(1901393, [1,12])
>>> idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Takes an index BOX definition
>>> idx.search_params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Takes a list of strings, only for BGC indexes
>>> idx.search_parameter_data_mode({'BBP700': 'D', 'DOXY': ['A', 'D']})  # Takes a dict
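
To make the BOX searches above more concrete, here is a standard-library sketch of the kind of filtering a latitude/longitude/time search performs. It is not argopy's implementation. It assumes that the BOX is ordered as [lon_min, lon_max, lat_min, lat_max, date_min, date_max], that the core index has the columns shown in the sample, and that dates are stored as YYYYMMDDHHMMSS. The sample rows are made up, and the half-open time range is a choice of this sketch.

```python
import csv
from datetime import datetime

# Made-up sample that imitates a core profile index (assumed column layout)
INDEX_TEXT = """\
# Title : made-up sample, not real data
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/1900001/profiles/D1900001_001.nc,20070815120000,42.5,-57.0,A,846,AO,20200101000000
aoml/1900001/profiles/D1900001_002.nc,20071015120000,42.6,-57.1,A,846,AO,20200101000000
coriolis/6900002/profiles/R6900002_010.nc,20070820060000,10.0,-30.0,A,846,IF,20200101000000
"""


def read_rows(text):
    """Parse csv-like index text into dicts, skipping '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(lines))


def in_box(row, box):
    """True when a row lies inside the longitude, latitude and time bounds."""
    lon_min, lon_max, lat_min, lat_max, t_min, t_max = box
    when = datetime.strptime(row["date"], "%Y%m%d%H%M%S")
    return (
        lon_min <= float(row["longitude"]) <= lon_max
        and lat_min <= float(row["latitude"]) <= lat_max
        # Half-open time range: an assumption of this sketch
        and datetime.fromisoformat(t_min) <= when < datetime.fromisoformat(t_max)
    )


BOX = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]
matches = [row["file"] for row in read_rows(INDEX_TEXT) if in_box(row, BOX)]
print(matches)
```

Only the first sample row satisfies all three constraints: the second falls outside the time range and the third outside the longitude and latitude bounds.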

Search result properties and methods:

>>> idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
>>> idx.search  # Internal table with search results
>>> idx.uri  # List of absolute paths to files from the search results table column 'file'
>>> idx.run()  # Run the search and save results in cache if necessary
>>> idx.to_dataframe()  # Convert search results to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(nrows=2)  # Return only the first nrows rows of the search results
>>> idx.to_indexfile("search_index.txt")  # Export search results to Argo standard index file

Misc:

>>> idx.convention  # What is the expected index format (core vs BGC profile index)
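>>> idx.cname  # Search constraint(s) as a pretty formatted string
>>> idx.read_wmo()  # List of unique WMOs in search results
>>> idx.read_params()  # List of unique PARAMETERs in index or search results
>>> idx.records_per_wmo()  # Number of records per unique WMO in search results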
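
As a rough illustration of what the WMO helpers above summarise, the sketch below counts records per float from a list of index 'file' paths. It assumes the GDAC layout dac/wmo/profiles/file.nc, so the WMO is the second path component. The paths are made up, and `wmo_of` is a hypothetical helper, not an argopy function.

```python
from collections import Counter

# Made-up 'file' column values following the assumed dac/wmo/profiles layout
files = [
    "aoml/1900001/profiles/D1900001_001.nc",
    "aoml/1900001/profiles/D1900001_002.nc",
    "coriolis/6900002/profiles/R6900002_010.nc",
]


def wmo_of(path: str) -> int:
    """Extract the float WMO number from a GDAC-style relative path."""
    return int(path.split("/")[1])


counts = Counter(wmo_of(p) for p in files)  # records per unique WMO
unique_wmos = sorted(counts)                # unique WMOs

print(unique_wmos)
print(dict(counts))
```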

__init__(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', convention: str | None = None, cache: bool = False, cachedir: str = '', timeout: int = 0) -> object

Create an Argo index file store

Parameters:
  • host (str, default: https://data-argo.ifremer.fr) – Local or remote (ftp or http) path to a dac folder (GDAC structure compliant). This takes values like: ftp://ftp.ifremer.fr/ifremer/argo, ftp://usgodae.org/pub/outgoing/argo or a local absolute path.

  • index_file (str, default: ar_index_global_prof.txt) –

    Name of the csv-like text file with the index.

    Possible values are the standard file names: ar_index_global_prof.txt, argo_bio-profile_index.txt or argo_synthetic-profile_index.txt.

    You can also use the following shortcuts: core, bgc-b, bgc-s, respectively.

  • convention (str, default: None) –

    Set the expected format convention of the index file. This is useful when loading an index file with a custom name. If set to None, the convention is inferred from the index_file value.

    Possible values: ar_index_global_prof, argo_bio-profile_index, or argo_synthetic-profile_index.

    You can also use the keywords core, bgc-b or bgc-s, respectively.

  • cache (bool, default: False) – Whether to use the cache.

  • cachedir (str, default: OPTIONS['cachedir']) – Folder in which to store cached files.

  • timeout (int, default: OPTIONS['api_timeout']) – Timeout, in seconds, for connecting to a remote host (ftp or http).
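
The interplay between index_file and convention can be sketched as follows. This is an illustration under stated assumptions, not argopy's code: `resolve` is a hypothetical helper, the inference simply drops the file extension, and raising `ValueError` for an unrecognised convention is this sketch's choice.

```python
import os

# Keyword shortcuts and the convention each one stands for (from the lists above)
KEYWORD_TO_CONVENTION = {
    "core": "ar_index_global_prof",
    "bgc-b": "argo_bio-profile_index",
    "bgc-s": "argo_synthetic-profile_index",
}


def resolve(index_file, convention=None):
    """Return (file_name, convention) for the given arguments.

    Hypothetical helper: keywords are expanded to standard file names and,
    when no convention is given, it is inferred from the file name.
    """
    if index_file in KEYWORD_TO_CONVENTION:
        index_file = KEYWORD_TO_CONVENTION[index_file] + ".txt"
    if convention is None:
        # Infer the convention by dropping the file extension
        convention = os.path.splitext(os.path.basename(index_file))[0]
    else:
        # Accept either a keyword or a full convention name
        convention = KEYWORD_TO_CONVENTION.get(convention, convention)
    if convention not in KEYWORD_TO_CONVENTION.values():
        raise ValueError(f"Cannot determine a supported convention for {index_file!r}")
    return index_file, convention


print(resolve("bgc-s"))
print(resolve("dummy_index.txt", convention="core"))
```

In this sketch, a custom file name such as dummy_index.txt without an explicit convention raises an error, which is why the earlier example passes convention="core" when loading a custom index.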

Methods

__init__([host, index_file, convention, ...])

Create an Argo index file store

cachepath(path)

Return path to a cached file

clear_cache()

Clear cache registry and files associated with this store instance.

load([nrows, force])

Load an Argo-index file content

read_params([index])

Return list of unique PARAMETERs in index or search results

read_wmo([index])

Return list of unique WMOs in search results

records_per_wmo([index])

Return the number of records per unique WMO in search results

run([nrows])

Filter index with search criteria

search_cyc(CYCs[, nrows])

Search index for cycle numbers

search_lat_lon(BOX[, nrows])

Search index for a rectangular latitude/longitude domain

search_lat_lon_tim(BOX[, nrows])

Search index for a rectangular latitude/longitude domain and time range

search_parameter_data_mode(PARAMs[, ...])

Search index for profiles with a parameter in a specific data mode

search_params(PARAMs[, logical, nrows])

Search index for one or a list of parameters

search_tim(BOX[, nrows])

Search index for a time range

search_wmo(WMOs[, nrows])

Search index for floats defined by their WMO

search_wmo_cyc(WMOs, CYCs[, nrows])

Search index for floats defined by their WMO and specific cycle numbers

to_dataframe([nrows, index, completed])

Return index or search results as pandas.DataFrame

to_indexfile(outputfile)

Save search results to a file, following the Argo standard index formats

Attributes

N_FILES

Number of rows in the search results, or in the full index if no search was triggered

N_MATCH

Number of rows in search result

N_RECORDS

Number of rows in the full index

backend

Name of store backend

cname

Return the search constraint(s) as a pretty formatted string

convention

Convention of the index (standard csv file name)

convention_supported

List of supported conventions

convention_title

Long name for the index convention

ext

Storage file extension

search_path

Path to search result uri

search_type

Dictionary with search meta-data

sha_df

Returns a unique SHA for a cname/dataframe

sha_h5

Returns a unique SHA for a cname/hdf5

sha_pq

Returns a unique SHA for a cname/parquet

shape

Shape of the index array

uri

List of URI from search results

uri_full_index

List of URI from index