argopy.stores.argo_index_pa.indexstore_pyarrow

class indexstore_pyarrow(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', cache: bool = False, cachedir: str = '', timeout: int = 0)

Argo GDAC index store using pyarrow.Table as internal storage format.

With this store, the index and the search results are saved in the cache as pyarrow/parquet files.

Examples

An index store is instantiated with the access path (host) and the index file:

>>> idx = indexstore()
>>> idx = indexstore(host="ftp://ftp.ifremer.fr/ifremer/argo")
>>> idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt")
>>> idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt", cache=True)

Index methods and properties:

>>> idx.load()
>>> idx.load(nrows=12)  # Only load the first N rows of the index
>>> idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
>>> idx.index  # internal storage structure of the full index (:class:`pyarrow.Table` or :class:`pandas.DataFrame`)
>>> idx.shape  # shape of the full index array
>>> idx.uri_full_index  # List of absolute paths to files, built from the 'file' column of the full index table
>>> idx.to_dataframe(index=True)  # Convert index to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(index=True, nrows=2)  # Return only the first nrows rows of the index
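
The effect of ``nrows`` can be pictured with a standard-library sketch. This is not argopy's implementation (the store relies on pyarrow); the helper ``load_index`` and the sample rows are hypothetical, and the column layout is assumed to follow the Argo profile index file format:

```python
import csv
import io
from itertools import islice

# Hand-written sample in the layout of an Argo profile index file
# (illustrative rows only, not real data).
SAMPLE_INDEX = """\
# Title : Profile directory file (illustrative sample)
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/1900001/profiles/D1900001_001.nc,20070815120000,42.5,-57.0,A,846,AO,20180101000000
aoml/1900001/profiles/D1900001_002.nc,20071001120000,43.0,-56.5,A,846,AO,20180101000000
coriolis/6900002/profiles/R6900002_001.nc,20070820060000,10.0,-30.0,A,846,IF,20180101000000
"""


def load_index(handle, nrows=None):
    """Read index rows lazily, skipping '#' comment lines and blank lines.

    With nrows=None every row is returned; otherwise reading stops after
    the first nrows data rows.
    """
    data_lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    reader = csv.DictReader(data_lines)
    return list(islice(reader, nrows))


first_two = load_index(io.StringIO(SAMPLE_INDEX), nrows=2)
everything = load_index(io.StringIO(SAMPLE_INDEX))
print(len(first_two), len(everything))
```

Because the rows are consumed lazily, limiting ``nrows`` avoids parsing the rest of the file, which is the practical benefit of loading only the head of a large index.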

Search methods and properties:

>>> idx.search_wmo(1901393)
>>> idx.search_cyc(1)
>>> idx.search_wmo_cyc(1901393, [1,12])
>>> idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
>>> idx.search  # Internal table with search results
>>> idx.uri  # List of absolute paths to files, built from the 'file' column of the search results table
>>> idx.run()  # Run the search and save results in cache if necessary
>>> idx.to_dataframe()  # Convert search results to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(nrows=2)  # Return only the first nrows rows of the search results
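
To make the BOX definition concrete, here is a minimal standard-library sketch of a combined latitude/longitude/time filter. It assumes the BOX elements are ordered ``[lon_min, lon_max, lat_min, lat_max, date_min, date_max]`` and that bounds are inclusive; both are assumptions for illustration. The helpers ``read_index`` and ``filter_box`` and the sample rows are hypothetical and do not reflect how the store queries its pyarrow table:

```python
import csv
import io
from datetime import datetime

# Hand-written sample in the layout of an Argo profile index file
# (illustrative rows only, not real data).
SAMPLE_INDEX = """\
# Title : Profile directory file (illustrative sample)
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/1900001/profiles/D1900001_001.nc,20070815120000,42.5,-57.0,A,846,AO,20180101000000
aoml/1900001/profiles/D1900001_002.nc,20071001120000,43.0,-56.5,A,846,AO,20180101000000
coriolis/6900002/profiles/R6900002_001.nc,20070820060000,10.0,-30.0,A,846,IF,20180101000000
"""


def read_index(text):
    """Parse index text into a list of dict rows, skipping '#' comment lines."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines))))


def filter_box(rows, box):
    """Keep rows inside [lon_min, lon_max, lat_min, lat_max, date_min, date_max].

    All bounds are treated as inclusive (an assumption of this sketch).
    """
    lon_min, lon_max, lat_min, lat_max, t_min, t_max = box
    t_min = datetime.strptime(t_min, "%Y-%m-%d")
    t_max = datetime.strptime(t_max, "%Y-%m-%d")
    matches = []
    for row in rows:
        when = datetime.strptime(row["date"], "%Y%m%d%H%M%S")
        in_lon = lon_min <= float(row["longitude"]) <= lon_max
        in_lat = lat_min <= float(row["latitude"]) <= lat_max
        if in_lon and in_lat and t_min <= when <= t_max:
            matches.append(row)
    return matches


rows = read_index(SAMPLE_INDEX)
hits = filter_box(rows, [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"])
print(len(rows), len(hits))
```

In this sample only the first row satisfies all three constraints: the second falls outside the time range and the third outside the longitude range.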

Misc:

>>> idx.cname
>>> idx.read_wmo
>>> idx.records_per_wmo
__init__(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', cache: bool = False, cachedir: str = '', timeout: int = 0)

Create an Argo index file store

Parameters
  • host (str, default: https://data-argo.ifremer.fr) – Local or remote FTP/HTTP path to a dac folder (GDAC structure compliant). Accepted values include ftp://ftp.ifremer.fr/ifremer/argo, ftp://usgodae.org/pub/outgoing/argo, or a local absolute path.

  • index_file (str, default: ar_index_global_prof.txt) – Name of the csv-like text file with the index

  • cache (bool, default: False) – Use cache or not.

  • cachedir (str, default: OPTIONS['cachedir']) – Folder where cached files are stored

Methods

__init__([host, index_file, cache, ...])

Create an Argo index file store

cachepath(path)

Return path to a cached file

clear_cache()

Clear cache registry and files associated with this store instance.

load([nrows, force])

Load an Argo-index file content

read_wmo([index])

Return list of unique WMOs in search results

records_per_wmo([index])

Return the number of records per unique WMOs in search results

run([nrows])

Filter index with search criteria

search_cyc(CYCs[, nrows])

Search index for cycle numbers

search_lat_lon(BOX[, nrows])

Search index for a rectangular latitude/longitude domain

search_lat_lon_tim(BOX[, nrows])

Search index for a rectangular latitude/longitude domain and time range

search_tim(BOX[, nrows])

Search index for a time range

search_wmo(WMOs[, nrows])

Search index for floats defined by their WMO

search_wmo_cyc(WMOs, CYCs[, nrows])

Search index for floats defined by their WMO and specific cycle numbers

to_dataframe([nrows, index])

Return index or search results as pandas.DataFrame
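
The WMO and cycle searches, as well as ``read_wmo`` and ``records_per_wmo``, operate on float and cycle numbers. As a rough illustration of the idea, the sketch below recovers both numbers from entries of the 'file' column and filters on them. The path pattern ``<dac>/<wmo>/profiles/<prefix><wmo>_<cycle>[D].nc``, the helper names and the sample paths are assumptions made for this example, not the store's actual code; unlike the store's method, this helper only accepts lists:

```python
import re
from collections import Counter

# Assumed layout of a 'file' entry: <dac>/<wmo>/profiles/<prefix><wmo>_<cycle>[D].nc
PROFILE_PATH = re.compile(
    r"^[^/]+/(?P<wmo>\d+)/profiles/[A-Z]+(?P=wmo)_(?P<cyc>\d+)D?\.nc$"
)


def wmo_and_cycle(path):
    """Return (wmo, cycle) as integers for one 'file' column entry."""
    match = PROFILE_PATH.match(path)
    if match is None:
        raise ValueError(f"Unexpected profile path: {path!r}")
    return int(match.group("wmo")), int(match.group("cyc"))


def search_wmo_cyc(paths, wmos, cycs):
    """Keep paths whose WMO is in wmos and whose cycle number is in cycs."""
    wmos, cycs = set(wmos), set(cycs)
    selected = []
    for path in paths:
        wmo, cyc = wmo_and_cycle(path)
        if wmo in wmos and cyc in cycs:
            selected.append(path)
    return selected


def records_per_wmo(paths):
    """Count the number of records for each unique WMO."""
    return dict(Counter(wmo_and_cycle(p)[0] for p in paths))


# Illustrative paths only.
paths = [
    "aoml/1901393/profiles/R1901393_001.nc",
    "aoml/1901393/profiles/D1901393_012.nc",
    "coriolis/6902746/profiles/R6902746_001D.nc",
]
selected = search_wmo_cyc(paths, [1901393], [1, 12])
counts = records_per_wmo(paths)
print(selected)
print(counts)
```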

Attributes

N_FILES

Number of rows in the search results, or in the full index if no search has been triggered

N_MATCH

Number of rows in search result

N_RECORDS

Number of rows in the full index

backend

Name of store backend

cname

Return the search constraint(s) as a pretty formatted string

ext

Storage file extension

search_path

Path to search result uri

search_type

Dictionary with search meta-data

sha_df

Returns a unique SHA for a cname/dataframe

sha_h5

Returns a unique SHA for a cname/hdf5

sha_pq

Returns a unique SHA for a cname/parquet

shape

Shape of the index array

uri

List of URI from search results

uri_full_index

List of URI from index
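
As an illustration of how ``uri`` and ``uri_full_index`` relate to the 'file' column, the sketch below joins a host with index entries. It assumes a GDAC layout in which profile files live under a ``dac`` sub-folder of the host and that a forward slash is a valid separator; the helper ``to_uri`` is hypothetical and is not the store's implementation:

```python
def to_uri(host, file_entry, dac_folder="dac"):
    """Join the host, the assumed 'dac' sub-folder and a 'file' column entry."""
    parts = [host.rstrip("/"), dac_folder, file_entry.lstrip("/")]
    return "/".join(parts)


# Illustrative 'file' column entries.
files = [
    "aoml/1901393/profiles/R1901393_001.nc",
    "aoml/1901393/profiles/D1901393_012.nc",
]
uris = [to_uri("https://data-argo.ifremer.fr", f) for f in files]
print(uris[0])
```

The same helper works for an FTP host or a POSIX-style local path; a local path on a platform with a different separator would need ``os.path`` or ``pathlib`` instead.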