argopy.stores.ArgoIndex
- class ArgoIndex(**kwargs)
Argo GDAC index store
If Pyarrow is available, this class will use pyarrow.Table as internal storage format; otherwise, a pandas.DataFrame will be used.
Shortcuts for the host argument:
- http or https for https://data-argo.ifremer.fr
- us-http or us-https for https://usgodae.org/pub/outgoing/argo
- ftp for ftp://ftp.ifremer.fr/ifremer/argo
- s3 or aws for s3://argo-gdac-sandbox/pub/idx
Shortcuts for the index_file argument:
- core for the ar_index_global_prof.txt index file,
- bgc-b for the argo_bio-profile_index.txt index file,
- bgc-s for the argo_synthetic-profile_index.txt index file,
- aux for the etc/argo-index/argo_aux-profile_index.txt index file,
- meta for the ar_index_global_meta.txt index file.
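The shortcut lists above amount to two lookup tables. The following is a minimal sketch of that mapping, not argopy's own code: the helper names resolve_host() and resolve_index_file() are hypothetical, and the pass-through behaviour for values that are not shortcuts is an assumption.

```python
# Shortcut tables copied from the lists above.
# resolve_host() and resolve_index_file() are hypothetical helpers,
# written for illustration only; they are not part of argopy.
HOST_SHORTCUTS = {
    "http": "https://data-argo.ifremer.fr",
    "https": "https://data-argo.ifremer.fr",
    "us-http": "https://usgodae.org/pub/outgoing/argo",
    "us-https": "https://usgodae.org/pub/outgoing/argo",
    "ftp": "ftp://ftp.ifremer.fr/ifremer/argo",
    "s3": "s3://argo-gdac-sandbox/pub/idx",
    "aws": "s3://argo-gdac-sandbox/pub/idx",
}

INDEX_FILE_SHORTCUTS = {
    "core": "ar_index_global_prof.txt",
    "bgc-b": "argo_bio-profile_index.txt",
    "bgc-s": "argo_synthetic-profile_index.txt",
    "aux": "etc/argo-index/argo_aux-profile_index.txt",
    "meta": "ar_index_global_meta.txt",
}


def resolve_host(host: str) -> str:
    """Expand a host shortcut; any other value is returned unchanged (assumption)."""
    return HOST_SHORTCUTS.get(host, host)


def resolve_index_file(index_file: str) -> str:
    """Expand an index_file shortcut; any other value is returned unchanged (assumption)."""
    return INDEX_FILE_SHORTCUTS.get(index_file, index_file)


print(resolve_host("us-https"))
print(resolve_index_file("bgc-s"))
```

With such a mapping, ArgoIndex(host="ftp", index_file="core") would point at the same index file as the fully spelled-out form.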
Examples
An index store is instantiated with a host (any access path, local, http or ftp) and an index file
>>> idx = ArgoIndex()
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr")  # Default host
>>> idx = ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo", index_file="ar_index_global_prof.txt")  # Default index
>>> idx = ArgoIndex(index_file="bgc-s")  # Use keywords instead of exact file names
>>> idx = ArgoIndex(host="https://data-argo.ifremer.fr", index_file="bgc-b", cache=True)  # Use cache for performance
>>> idx = ArgoIndex(host=".", index_file="dummy_index.txt", convention="core")  # Load your own index
Full index methods and properties
>>> idx.load()
>>> idx.load(nrows=12)  # Only load the first N rows of the index
>>> idx.to_dataframe(index=True)  # Convert index to user-friendly pandas.DataFrame
>>> idx.to_dataframe(index=True, nrows=2)  # Only return the first nrows of the index
>>> idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
>>> idx.index  # Internal storage structure of the full index (pyarrow.Table or pandas.DataFrame)
>>> idx.shape  # Shape of the full index array
>>> idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'
Search methods
>>> idx.query.wmo(1901393)
>>> idx.query.cyc(1)
>>> idx.query.wmo_cyc(1901393, [1,12])
>>> idx.query.lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.query.lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.query.date([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.query.lon_lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.query.box([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
>>> idx.query.params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Take a list of strings, only for BGC index !
>>> idx.query.parameter_data_mode({'BBP700': 'D', 'DOXY': ['A', 'D']})  # Take a dict.
>>> idx.query.profiler_type(845)
>>> idx.query.profiler_label('NINJA')
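To make the BOX definition used by the space/time searches more concrete, here is a small pure-Python sketch of what such a filter means. It assumes the six values are ordered as [lon_min, lon_max, lat_min, lat_max, date_min, date_max] and that all bounds are inclusive; the helper in_box() and the sample records are made up for illustration and do not reflect how argopy implements its searches.

```python
from datetime import date

# Assumed layout: [lon_min, lon_max, lat_min, lat_max, date_min, date_max]
BOX = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]


def in_box(lon, lat, day, box):
    """Return True if a profile position and date fall inside the box.

    Hypothetical helper; bounds are treated as inclusive (assumption).
    """
    lon_min, lon_max, lat_min, lat_max, day_min, day_max = box
    return (
        lon_min <= lon <= lon_max
        and lat_min <= lat <= lat_max
        and date.fromisoformat(day_min) <= day <= date.fromisoformat(day_max)
    )


# Made-up index records, for illustration only
records = [
    {"file": "a.nc", "longitude": -57.5, "latitude": 42.0, "date": date(2007, 8, 15)},
    {"file": "b.nc", "longitude": -57.5, "latitude": 50.0, "date": date(2007, 8, 15)},
    {"file": "c.nc", "longitude": -57.5, "latitude": 42.0, "date": date(2008, 1, 1)},
]

matches = [
    r["file"]
    for r in records
    if in_box(r["longitude"], r["latitude"], r["date"], BOX)
]
print(matches)
```

Only the first record satisfies all three constraints: the second lies outside the latitude range and the third outside the date range.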
Composing search methods
>>> idx.query.compose({'box': BOX, 'wmo': WMOs})
>>> idx.query.compose({'box': BOX, 'params': 'DOXY'})
>>> idx.query.compose({'box': BOX, 'params': (['DOXY', 'DOXY2'], {'logical': 'and'})})
>>> idx.query.compose({'params': 'DOXY', 'profiler_label': 'ARVOR'})
Search result properties and methods
>>> idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
>>> idx.search  # Internal table with search results
>>> idx.uri  # List of absolute paths to files from the search results table column 'file'
>>> idx.run()  # Run the search and save results in cache if necessary
>>> idx.to_dataframe()  # Convert search results to user-friendly pandas.DataFrame
>>> idx.to_dataframe(nrows=2)  # Only return the first nrows of the search results
>>> idx.to_indexfile("search_index.txt")  # Export search results to Argo standard index file
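The search calls and result properties listed above can be combined into a small helper. The sketch below is illustrative only: search_wmo_summary() is a hypothetical function name, it relies solely on the query.wmo(), N_RECORDS, N_MATCH and to_dataframe() members shown in the examples, and it assumes that calling idx.query.wmo() is enough to populate the search results.

```python
def search_wmo_summary(idx, wmo, nrows=5):
    """Search an already loaded index store for a WMO and summarise the result.

    Hypothetical helper. `idx` is expected to behave like an ArgoIndex
    instance; only members shown in the examples above are used.
    """
    idx.query.wmo(wmo)  # assumption: this call alone populates the search results
    return {
        "n_records": idx.N_RECORDS,               # rows in the full index
        "n_match": idx.N_MATCH,                   # rows in the search results
        "preview": idx.to_dataframe(nrows=nrows)  # first rows of the results
    }


# Possible usage (requires argopy and access to the index file):
#   from argopy.stores import ArgoIndex
#   idx = ArgoIndex(index_file="core")
#   idx.load()
#   summary = search_wmo_summary(idx, 1901393)
```

Keeping the store as a function argument means the helper works the same way whichever host, index file or storage backend the store was created with.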
List of file properties
>>> idx.read_wmo()
>>> idx.read_dac_wmo()
>>> idx.read_params()
>>> idx.read_domain()
>>> idx.read_files()
>>> idx.records_per_wmo()
Misc
>>> idx.convention  # What is the expected index format (core vs BGC profile index)
>>> idx.cname
>>> idx.domain  # The default read_domain() output, as a property
>>> idx.copy(deep=False)
Iterate on argopy.ArgoFloat instances
>>> for a_float in idx.iterfloats():
...     print(a_float.WMO)
...     ds = a_float.open_dataset('prof')
- __init__(**kwargs)
Create an Argo index store
- Parameters:
host (str, optional, default=OPTIONS["gdac"]) –
Local or remote (http, ftp or s3) path to a dac folder (compliant with GDAC structure).
This parameter takes values like:
- http or https for https://data-argo.ifremer.fr
- us-http or us-https for https://usgodae.org/pub/outgoing/argo
- ftp for ftp://ftp.ifremer.fr/ifremer/argo
- s3 or aws for s3://argo-gdac-sandbox/pub/idx
- a local absolute path
index_file (str, default: ar_index_global_prof.txt) –
Name of the csv-like text file with the index.
This parameter takes values like:
- core or ar_index_global_prof.txt
- bgc-b or argo_bio-profile_index.txt
- bgc-s or argo_synthetic-profile_index.txt
- aux or etc/argo-index/argo_aux-profile_index.txt
- meta or ar_index_global_meta.txt
- a local absolute path to a file following an Argo index convention. When using a local file, you need to set the convention followed by the file.
convention (str, default: None) –
Set the expected format convention of the index file.
This is useful when trying to load an index file with a custom name. If set to None, we’ll try to infer the convention from the index_file value.
This parameter takes values like:
- core or ar_index_global_prof
- bgc-b or argo_bio-profile_index
- bgc-s or argo_synthetic-profile_index
- aux or argo_aux-profile_index
- meta or ar_index_global_meta
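One plausible way to picture the inference that happens when convention is None is to compare the index file name, stripped of its folder and extension, with the supported convention names. The sketch below is an assumption about that logic, not argopy's actual implementation; infer_convention() is a hypothetical helper.

```python
from pathlib import PurePosixPath

# Shortcut -> convention name, as listed above
CONVENTION_SHORTCUTS = {
    "core": "ar_index_global_prof",
    "bgc-b": "argo_bio-profile_index",
    "bgc-s": "argo_synthetic-profile_index",
    "aux": "argo_aux-profile_index",
    "meta": "ar_index_global_meta",
}
SUPPORTED_CONVENTIONS = set(CONVENTION_SHORTCUTS.values())


def infer_convention(index_file, convention=None):
    """Guess the index convention (hypothetical helper, illustration only)."""
    if convention is not None:
        # An explicit value wins; shortcuts are expanded to the full name
        return CONVENTION_SHORTCUTS.get(convention, convention)
    if index_file in CONVENTION_SHORTCUTS:
        return CONVENTION_SHORTCUTS[index_file]
    # Drop the folder and the ".txt" extension, then compare
    stem = PurePosixPath(index_file).stem
    if stem in SUPPORTED_CONVENTIONS:
        return stem
    raise ValueError(
        f"Cannot infer a convention from {index_file!r}; set 'convention' explicitly"
    )


print(infer_convention("etc/argo-index/argo_aux-profile_index.txt"))
print(infer_convention("dummy_index.txt", convention="core"))
```

Under this reading, a custom file name such as dummy_index.txt cannot be matched to any convention, which is why the convention must be set explicitly when loading your own index.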
cache (bool, default: False) – Use cache or not.
cachedir (str, default: OPTIONS['cachedir']) – Folder where to store cached files.
timeout (int, default: OPTIONS['api_timeout']) – Time out in seconds to connect to a remote host (ftp or http).
Methods
__init__(**kwargs) – Create an Argo index store
cachepath(path) – Return path to a cached file
clear_cache() – Clear cache registry and files associated with this store instance.
copy([deep]) – Returns a copy of this ArgoIndex instance
iterfloats([index, chunksize]) – Iterate over unique Argo floats in the full index or search results
load([nrows, force]) – Load an Argo-index file content
read_dac_wmo([index]) – Return a tuple of unique [DAC, WMO] pairs from the index or search results
read_domain([index]) – Read the space/time domain of the index
read_files([index]) – Return file paths listed in index or search results
read_params([index]) – Return list of unique PARAMETERs in index or search results
read_wmo([index]) – Return list of unique WMOs from the index or search results
records_per_wmo([index]) – Return the number of records per unique WMO in search results
run([nrows]) – Filter index with search criteria
search_cyc(CYCs[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.cyc()
search_lat_lon(BOX[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.lon_lat()
search_lat_lon_tim(BOX[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.box()
search_parameter_data_mode(PARAMs[, ...]) – Deprecated: this method is replaced by ArgoIndex().query.parameter_data_mode()
search_params(PARAMs[, logical, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.params()
search_profiler_label(profiler_label[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.profiler_label()
search_profiler_type(profiler_type[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.profiler_type()
search_tim(BOX[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.date()
search_wmo(WMOs[, nrows])
search_wmo_cyc(WMOs, CYCs[, nrows]) – Deprecated: this method is replaced by ArgoIndex().query.wmo_cyc()
to_dataframe([nrows, index, completed]) – Return index or search results as pandas.DataFrame
to_indexfile(file) – Save search results to file, following the Argo standard index formats
Attributes
N_FILES – Number of rows in search result or index if search not triggered
N_MATCH – Number of rows in search result
N_RECORDS – Number of rows in the full index
backend – Name of store backend (pandas or pyarrow)
cname – Search query as a pretty formatted string
convention – Convention of the index (standard csv file name)
convention_columns – CSV file column names for the index convention
convention_supported – List of supported conventions
convention_title – Long name for the index convention
domain – Space/time domain of the index
ext – Storage file extension
files – File paths listed in search results
files_full_index – File paths listed in the index
index_path – Absolute path to the index file
search_path – Path to search result uri
search_type – Dictionary with search meta-data
sha_df – Returns a unique SHA for a cname/dataframe
sha_h5 – Returns a unique SHA for a cname/hdf5
sha_pq – Returns a unique SHA for a cname/parquet
shape – Shape of the index array
uri – File paths listed in search results
uri_full_index – File paths listed in the index