argopy.stores.argo_index_pa.indexstore_pyarrow
- class indexstore_pyarrow(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', cache: bool = False, cachedir: str = '', timeout: int = 0)
Argo GDAC index store using pyarrow.Table as internal storage format.
With this store, index and search results are saved as pyarrow/parquet files in cache.
Examples
An index store is instantiated with the access path (host) and the index file:
>>> idx = indexstore()
>>> idx = indexstore(host="ftp://ftp.ifremer.fr/ifremer/argo")
>>> idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt")
>>> idx = indexstore(host="https://data-argo.ifremer.fr", index_file="ar_index_global_prof.txt", cache=True)
Index methods and properties:
>>> idx.load()
>>> idx.load(nrows=12)  # Only load the first N rows of the index
>>> idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
>>> idx.index  # Internal storage structure of the full index (:class:`pyarrow.Table` or :class:`pandas.DataFrame`)
>>> idx.shape  # Shape of the full index array
>>> idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'
>>> idx.to_dataframe(index=True)  # Convert index to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(index=True, nrows=2)  # Only return the first nrows of the index
Search methods and properties:
>>> idx.search_wmo(1901393)
>>> idx.search_cyc(1)
>>> idx.search_wmo_cyc(1901393, [1,12])
>>> idx.search_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.search_lat_lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.search_lat_lon_tim([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
>>> idx.search  # Internal table with search results
>>> idx.uri  # List of absolute paths to files from the search results table column 'file'
>>> idx.run()  # Run the search and save results in cache if necessary
>>> idx.to_dataframe()  # Convert search results to user-friendly :class:`pandas.DataFrame`
>>> idx.to_dataframe(nrows=2)  # Only return the first nrows of the search results
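To make the BOX argument of the search methods concrete, here is a minimal pure-Python sketch of a combined longitude/latitude/time filter. It assumes the BOX is ordered as [lon_min, lon_max, lat_min, lat_max, date_min, date_max] and that bounds are inclusive; neither is stated on this page. The `ROWS` records and the `in_box` helper are hypothetical and do not use the store's pyarrow-based implementation.

```python
from datetime import datetime

# Hypothetical in-memory index rows; the real store keeps these in a pyarrow.Table.
ROWS = [
    {"file": "aoml/13857/profiles/R13857_001.nc",
     "longitude": -57.5, "latitude": 42.0, "date": datetime(2007, 8, 15)},
    {"file": "coriolis/6900123/profiles/D6900123_010.nc",
     "longitude": -30.0, "latitude": 42.0, "date": datetime(2007, 8, 15)},
    {"file": "aoml/13857/profiles/R13857_002.nc",
     "longitude": -57.5, "latitude": 42.0, "date": datetime(2008, 1, 1)},
]


def in_box(row, box):
    """Return True if a row lies inside the box (bounds assumed inclusive)."""
    # Assumed ordering: [lon_min, lon_max, lat_min, lat_max, date_min, date_max]
    lon_min, lon_max, lat_min, lat_max, t_min, t_max = box
    t_min = datetime.fromisoformat(t_min)
    t_max = datetime.fromisoformat(t_max)
    return (
        lon_min <= row["longitude"] <= lon_max
        and lat_min <= row["latitude"] <= lat_max
        and t_min <= row["date"] <= t_max
    )


BOX = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]
matches = [row["file"] for row in ROWS if in_box(row, BOX)]
print(matches)
```

Only the first row satisfies all three constraints: the second is outside the longitude range and the third is outside the time range.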
Misc:
>>> idx.cname
>>> idx.read_wmo
>>> idx.records_per_wmo
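As a rough illustration of what read_wmo and records_per_wmo report, the sketch below counts records per float from values of the 'file' column. It assumes paths follow a `<dac>/<wmo>/profiles/<name>.nc` layout; the sample paths and the `wmo_of` helper are hypothetical, and this is not the store's own implementation.

```python
from collections import Counter

# Hypothetical values of the index 'file' column.
files = [
    "aoml/13857/profiles/R13857_001.nc",
    "aoml/13857/profiles/R13857_002.nc",
    "coriolis/6900123/profiles/D6900123_010.nc",
]


def wmo_of(path):
    """Extract the WMO number, assumed to be the second path component."""
    return int(path.split("/")[1])


counts = Counter(wmo_of(f) for f in files)  # number of records per WMO
unique_wmos = sorted(counts)                # list of unique WMOs
print(unique_wmos, dict(counts))
```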
- __init__(host: str = 'https://data-argo.ifremer.fr', index_file: str = 'ar_index_global_prof.txt', cache: bool = False, cachedir: str = '', timeout: int = 0)
Create an Argo index file store
- Parameters:
host (str, default: https://data-argo.ifremer.fr) – Host is a local or remote ftp/http path to a dac folder (GDAC structure compliant). This takes values like: ftp://ftp.ifremer.fr/ifremer/argo, ftp://usgodae.org/pub/outgoing/argo or a local absolute path.
index_file (str, default: ar_index_global_prof.txt) – Name of the csv-like text file with the index.
cache (bool, default: False) – Use cache or not.
cachedir (str, default: OPTIONS['cachedir']) – Folder where to store cached files.
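For readers unfamiliar with the "csv-like text file" that index_file names, the following sketch parses a small excerpt with the standard library, including an `nrows` limit similar in spirit to `load(nrows=...)`. The sample content is a hypothetical excerpt with made-up values, the column names are an assumption about the profile index layout, and `read_index` is an illustrative helper rather than how the store reads the file.

```python
import csv
import io
from itertools import islice

# Hypothetical excerpt: '#' comment lines, then a CSV header and data rows.
SAMPLE = """\
# Title : Profile directory file of the Argo Global Data Assembly Center
# Description : The directory file describes all individual profile files
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
aoml/13857/profiles/R13857_001.nc,19970729200300,0.267,-16.032,A,845,AO,20181011180520
aoml/13857/profiles/R13857_002.nc,19970809192112,0.072,-17.659,A,845,AO,20181011180521
aoml/13857/profiles/R13857_003.nc,19970820184545,0.543,-19.622,A,845,AO,20181011180521
"""


def read_index(text, nrows=None):
    """Parse index text into a list of dicts, skipping '#' comment lines."""
    lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    reader = csv.DictReader(lines)
    # islice with nrows=None yields every row; otherwise only the first nrows.
    return list(islice(reader, nrows))


rows = read_index(SAMPLE, nrows=2)
print(len(rows), rows[0]["file"])
```

All values are returned as strings here; a real store would convert dates and coordinates to typed columns.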
Methods
__init__([host, index_file, cache, ...]) – Create an Argo index file store
cachepath(path) – Return path to a cached file
clear_cache() – Clear cache registry and files associated with this store instance.
load([nrows, force]) – Load an Argo-index file content
read_wmo([index]) – Return list of unique WMOs in search results
records_per_wmo([index]) – Return the number of records per unique WMOs in search results
run([nrows]) – Filter index with search criteria
search_cyc(CYCs[, nrows]) – Search index for cycle numbers
search_lat_lon(BOX[, nrows]) – Search index for a rectangular latitude/longitude domain
search_lat_lon_tim(BOX[, nrows]) – Search index for a rectangular latitude/longitude domain and time range
search_tim(BOX[, nrows]) – Search index for a time range
search_wmo(WMOs[, nrows]) – Search index for floats defined by their WMO
search_wmo_cyc(WMOs, CYCs[, nrows]) – Search index for floats defined by their WMO and specific cycle numbers
to_dataframe([nrows, index]) – Return index or search results as pandas.DataFrame
Attributes
N_FILES – Number of rows in search result or index if search not triggered
N_MATCH – Number of rows in search result
N_RECORDS – Number of rows in the full index
backend – Name of store backend
cname – Return the search constraint(s) as a pretty formatted string
ext – Storage file extension
search_path – Path to search result uri
search_type – Dictionary with search meta-data
sha_df – Returns a unique SHA for a cname/dataframe
sha_h5 – Returns a unique SHA for a cname/hdf5
sha_pq – Returns a unique SHA for a cname/parquet
shape – Shape of the index array
uri – List of URI from search results
uri_full_index – List of URI from index
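The sha_df, sha_h5 and sha_pq attributes are described as unique SHAs for a cname/format pair. A minimal sketch of that idea, assuming a SHA-256 digest over a `"<cname>/<format>"` string, is shown below. The hash function, the input layout, the `cache_key` helper and the sample constraint string are all assumptions made for illustration; the page does not specify how the store computes these values.

```python
import hashlib


def cache_key(cname, fmt):
    """Hash a 'cname/format' string into a hex digest usable as a file name."""
    return hashlib.sha256(f"{cname}/{fmt}".encode("utf-8")).hexdigest()


cname = "WMO1901393"  # hypothetical search constraint string
key_pq = cache_key(cname, "parquet")
key_df = cache_key(cname, "dataframe")
print(key_pq != key_df)
```

Hashing the constraint string together with the storage format gives each search a stable, distinct cache entry per format, so the same search always maps to the same file.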