argopy.stores.ArgoIndex

class ArgoIndex(**kwargs)

Argo GDAC index store

If pyarrow is available, this class uses pyarrow.Table as its internal storage format; otherwise, it falls back to pandas.DataFrame.

Shortcuts for host argument:

  • http or https for https://data-argo.ifremer.fr

  • us-http or us-https for https://usgodae.org/pub/outgoing/argo

  • ftp for ftp://ftp.ifremer.fr/ifremer/argo

  • s3 or aws for s3://argo-gdac-sandbox/pub/idx

Shortcuts for index_file argument:

  • core for the ar_index_global_prof.txt index file,

  • bgc-b for the argo_bio-profile_index.txt index file,

  • bgc-s for the argo_synthetic-profile_index.txt index file,

  • aux for the etc/argo-index/argo_aux-profile_index.txt index file,

  • meta for the ar_index_global_meta.txt index file.
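
The two shortcut lists above amount to a lookup from a keyword to a full URL or file name. As a rough illustration only, the mapping can be sketched as below. This is not argopy's internal code: `HOST_SHORTCUTS`, `INDEX_FILE_SHORTCUTS` and `expand_shortcut` are hypothetical names, and passing unknown values through unchanged is an assumption of this sketch.

```python
# Hypothetical sketch of the shortcut tables documented above.
# Not argopy's implementation; all names here are illustrative.
HOST_SHORTCUTS = {
    "http": "https://data-argo.ifremer.fr",
    "https": "https://data-argo.ifremer.fr",
    "us-http": "https://usgodae.org/pub/outgoing/argo",
    "us-https": "https://usgodae.org/pub/outgoing/argo",
    "ftp": "ftp://ftp.ifremer.fr/ifremer/argo",
    "s3": "s3://argo-gdac-sandbox/pub/idx",
    "aws": "s3://argo-gdac-sandbox/pub/idx",
}

INDEX_FILE_SHORTCUTS = {
    "core": "ar_index_global_prof.txt",
    "bgc-b": "argo_bio-profile_index.txt",
    "bgc-s": "argo_synthetic-profile_index.txt",
    "aux": "etc/argo-index/argo_aux-profile_index.txt",
    "meta": "ar_index_global_meta.txt",
}


def expand_shortcut(value, table):
    """Return the full value for a known shortcut, else the value unchanged."""
    return table.get(value, value)


print(expand_shortcut("us-http", HOST_SHORTCUTS))
print(expand_shortcut("bgc-s", INDEX_FILE_SHORTCUTS))
print(expand_shortcut("/data/argo", HOST_SHORTCUTS))  # explicit paths pass through
```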

Examples

An index store is instantiated with a host (any access path: local, http, ftp or s3) and an index file
from argopy.stores import ArgoIndex

idx = ArgoIndex()
idx = ArgoIndex(host="https://data-argo.ifremer.fr")  # Default host
idx = ArgoIndex(host="ftp://ftp.ifremer.fr/ifremer/argo", index_file="ar_index_global_prof.txt")  # Default index
idx = ArgoIndex(index_file="bgc-s")  # Use keywords instead of exact file names
idx = ArgoIndex(host="https://data-argo.ifremer.fr", index_file="bgc-b", cache=True)  # Use cache for performance
idx = ArgoIndex(host=".", index_file="dummy_index.txt", convention="core")  # Load your own index
Full index methods and properties
idx.load()
idx.load(nrows=12)  # Only load the first N rows of the index
idx.to_dataframe(index=True)  # Convert index to a user-friendly pandas.DataFrame
idx.to_dataframe(index=True, nrows=2)  # Only return the first nrows of the index
idx.N_RECORDS  # Shortcut for length of 1st dimension of the index array
idx.index  # Internal storage structure of the full index (pyarrow.Table or pandas.DataFrame)
idx.shape  # shape of the full index array
idx.uri_full_index  # List of absolute paths to files from the full index table column 'file'
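
To make the "full index" concrete: an Argo profile index is a comma-separated text file whose leading lines start with `#`, followed by a column header row and one row per profile. The sketch below parses a tiny in-memory sample with the standard library only. The sample rows are made up for illustration, the column names are assumed to match the core profile index, and `parse_index` is a hypothetical helper; argopy itself loads the file into a pyarrow.Table or pandas.DataFrame rather than a list of dicts.

```python
import csv
import io

# Made-up sample mimicking the layout of a core profile index file:
# '#'-prefixed comment lines, one column header row, then one row per profile.
SAMPLE = """\
# Title : illustrative sample, not real data
file,date,latitude,longitude,ocean,profiler_type,institution,date_update
dac_a/1900001/profiles/R1900001_001.nc,20070801120000,42.5,-57.0,A,845,IF,20200101000000
dac_a/1900001/profiles/R1900001_002.nc,20070811120000,42.7,-56.8,A,845,IF,20200101000000
dac_b/1900002/profiles/D1900002_001.nc,20070815060000,-10.0,80.0,I,856,JA,20200101000000
"""


def parse_index(text, nrows=None):
    """Parse index text into a list of dicts, keeping at most `nrows` records."""
    # Skip the '#' comment lines so the first remaining line is the column header
    data_lines = (line for line in io.StringIO(text) if not line.startswith("#"))
    records = []
    for row in csv.DictReader(data_lines):
        if nrows is not None and len(records) >= nrows:
            break
        records.append(row)
    return records


full = parse_index(SAMPLE)
print(len(full))                                  # record count, analogous to N_RECORDS
print(parse_index(SAMPLE, nrows=2)[-1]["file"])   # last of the first two records
```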
Search methods
idx.query.wmo(1901393)
idx.query.wmo([6902915, 1901393])
idx.query.cyc(1)
idx.query.cyc([1, 12])
idx.query.wmo_cyc(1901393, [1,12])

idx.query.lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
idx.query.lon([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
idx.query.date([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
idx.query.lon_lat([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition
idx.query.box([-60, -55, 40., 45., '2007-08-01', '2007-09-01'])  # Take an index BOX definition

idx.query.params(['C1PHASE_DOXY', 'DOWNWELLING_PAR'])  # Take a list of strings; only for BGC indexes
idx.query.parameter_data_mode({'BBP700': 'D', 'DOXY': ['A', 'D']})  # Take a dict

idx.query.profiler_type(845)
idx.query.profiler_type([845, 856])
idx.query.profiler_label('NINJA')
idx.query.profiler_label(['NINJA', 'SOLO-D deep'])

idx.query.institution_code('IF')
idx.query.institution_code(['IF', 'JA'])
idx.query.institution_name('Canada')
idx.query.institution_name(['Canada', 'INCOIS'])
idx.query.dac('coriolis')
idx.query.dac(['meds', 'aoml'])
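
The BOX argument used by the box-style queries above is a flat six-element list. Reading the example values, the layout appears to be `[lon_min, lon_max, lat_min, lat_max, date_min, date_max]`; treat that ordering, and the inclusive bounds used here, as assumptions to check against the argopy documentation. The sketch below (hypothetical `in_box` helper, made-up records, standard library only) shows how such a definition selects records:

```python
from datetime import datetime

# Assumed layout: [lon_min, lon_max, lat_min, lat_max, date_min, date_max]
BOX = [-60, -55, 40.0, 45.0, "2007-08-01", "2007-09-01"]


def in_box(lon, lat, when, box):
    """Return True if a (lon, lat, datetime) point falls inside the box."""
    lon_min, lon_max, lat_min, lat_max, date_min, date_max = box
    t0 = datetime.strptime(date_min, "%Y-%m-%d")
    t1 = datetime.strptime(date_max, "%Y-%m-%d")
    return (
        lon_min <= lon <= lon_max
        and lat_min <= lat <= lat_max
        and t0 <= when <= t1
    )


# Made-up profile positions and dates, for illustration only
profiles = [
    (-57.0, 42.5, datetime(2007, 8, 15)),  # inside the box
    (-57.0, 42.5, datetime(2008, 1, 1)),   # right place, wrong time
    (10.0, 42.5, datetime(2007, 8, 15)),   # right time, wrong longitude
]
matches = [p for p in profiles if in_box(*p, BOX)]
print(len(matches))
```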
Composing search methods
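BOX = [-60, -55, 40., 45., '2007-08-01', '2007-09-01']  # Index BOX definition, as in the search examples
WMOs = [6902915, 1901393]  # List of float WMO numbers
idx.query.compose({'box': BOX, 'wmo': WMOs})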
idx.query.compose({'box': BOX, 'params': 'DOXY'})
idx.query.compose({'box': BOX, 'params': (['DOXY', 'DOXY2'], {'logical': 'and'})})
idx.query.compose({'params': 'DOXY', 'profiler_label': 'ARVOR'})
Search result properties and methods
idx.N_MATCH  # Shortcut for length of 1st dimension of the search results array
idx.search  # Internal table with search results
idx.uri  # List of absolute paths to files from the search results table column 'file'

idx.run()  # Run the search and save results in cache if necessary
idx.to_dataframe()  # Convert search results to a user-friendly pandas.DataFrame
idx.to_dataframe(nrows=2)  # Only return the first nrows of the search results
idx.to_indexfile("search_index.txt")  # Export search results to Argo standard index file
List of file properties
idx.read_wmo()
idx.read_dac_wmo()
idx.read_params()
idx.read_domain()
idx.read_files()

idx.records_per_wmo()
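
Several of these readers summarise the index 'file' column. Assuming paths follow the layout `<dac>/<wmo>/profiles/<name>.nc` (an assumption about the data, with made-up paths below), unique DAC/WMO pairs and per-float record counts can be sketched with the standard library. The helper name `dac_wmo` is hypothetical, and this is not how argopy computes these values internally.

```python
from collections import Counter

# Made-up entries of the index 'file' column.
# Assumed layout: dac/wmo/profiles/name.nc
files = [
    "dac_a/1900001/profiles/R1900001_001.nc",
    "dac_a/1900001/profiles/R1900001_002.nc",
    "dac_b/1900002/profiles/D1900002_001.nc",
]


def dac_wmo(path):
    """Split a file path into its (DAC, WMO) pair."""
    dac, wmo = path.split("/")[:2]
    return dac, int(wmo)


# Unique (DAC, WMO) pairs, in the spirit of read_dac_wmo()
pairs = sorted({dac_wmo(f) for f in files})

# Number of records per WMO, in the spirit of records_per_wmo()
counts = Counter(wmo for _, wmo in map(dac_wmo, files))

print(pairs)
print(dict(counts))
```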
Misc
idx.convention  # What is the expected index format (core vs BGC profile index)
idx.cname
idx.domain  # The default read_domain() output, as a property
idx.copy(deep=False)
Iterate over argopy.ArgoFloat instances
for a_float in idx.iterfloats():
    print(a_float.WMO)
    ds = a_float.open_dataset('prof')
Trajectory map
idx = idx.query.wmo(6903091)
idx.plot.trajectory()
Trajectory map with custom arguments
idx = ArgoIndex(index_file='bgc-s')
idx.query.params('CHLA')

idx.plot.trajectory(set_global=1,
                    add_legend=0,
                    traj=0,
                    cbar=False,
                    markersize=12,
                    markeredgesize=0.1,
                    dpi=120,
                    figsize=(20, 20))
Bar plot
idx.plot.bar(by='dac', index=1)
idx.plot.bar(by='profiler')
__init__(**kwargs)

Create an Argo index store

Parameters:
  • host (str, optional, default=OPTIONS["gdac"]) –

    Local or remote (http, ftp or s3) path to a dac folder (compliant with GDAC structure).

    This parameter takes values like:

    • http or https for https://data-argo.ifremer.fr

    • us-http or us-https for https://usgodae.org/pub/outgoing/argo

    • ftp for ftp://ftp.ifremer.fr/ifremer/argo

    • s3 or aws for s3://argo-gdac-sandbox/pub/idx

    • a local absolute path

  • index_file (str, default: ar_index_global_prof.txt) –

    Name of the csv-like text file with the index.

    This parameter takes values like:

    • core or ar_index_global_prof.txt

    • bgc-b or argo_bio-profile_index.txt

    • bgc-s or argo_synthetic-profile_index.txt

    • aux or etc/argo-index/argo_aux-profile_index.txt

    • meta or ar_index_global_meta.txt

    • a local absolute path to a file following an Argo index convention. When using a local file, you must set the convention followed by the file.

  • convention (str, default: None) –

    Set the expected format convention of the index file.

    This is useful when loading an index file with a custom name. If set to None, the convention is inferred from the index_file value.

    This parameter takes values like:

    • core or ar_index_global_prof

    • bgc-b or argo_bio-profile_index

    • bgc-s or argo_synthetic-profile_index

    • aux or argo_aux-profile_index

    • meta or ar_index_global_meta

  • cache (bool, default: False) – Whether to use the cache.

  • cachedir (str, default: OPTIONS['cachedir']) – Folder in which to store cached files.

  • timeout (int, default: OPTIONS['api_timeout']) – Timeout, in seconds, for connecting to a remote host (ftp or http).
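
When convention is left to None, it is inferred from index_file. One plausible way to do this, sketched here with hypothetical names and not taken from argopy's source, is to strip the directory and extension from the file name and compare the result with the convention names listed above. It also shows why a custom file name such as dummy_index.txt needs an explicit convention.

```python
import os

# Convention names as listed in the parameter description: shortcut -> full name
CONVENTIONS = {
    "core": "ar_index_global_prof",
    "bgc-b": "argo_bio-profile_index",
    "bgc-s": "argo_synthetic-profile_index",
    "aux": "argo_aux-profile_index",
    "meta": "ar_index_global_meta",
}


def infer_convention(index_file, convention=None):
    """Return a full convention name, from an explicit value or the file name."""
    if convention is not None:
        # Accept either a shortcut ("core") or a full convention name
        name = CONVENTIONS.get(convention, convention)
    else:
        # "etc/argo-index/argo_aux-profile_index.txt" -> "argo_aux-profile_index"
        name = os.path.splitext(os.path.basename(index_file))[0]
    if name not in CONVENTIONS.values():
        raise ValueError(f"Cannot determine index convention for {index_file!r}")
    return name


print(infer_convention("etc/argo-index/argo_aux-profile_index.txt"))
print(infer_convention("dummy_index.txt", convention="core"))
```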

Methods

__init__(**kwargs)

Create an Argo index store

cachepath(path)

Return path to a cached file

clear_cache()

Clear cache registry and files associated with this store instance.

copy([deep])

Returns a copy of this ArgoIndex instance

iterfloats([index, chunksize])

Iterate over unique Argo floats in the full index or search results

load([nrows, force])

Load an Argo-index file content

read_dac_wmo([index])

Return a tuple of unique [DAC, WMO] pairs from the index or search results

read_domain([index])

Read the space/time domain of the index

read_files([index, multi])

Return file paths listed in index or search results

read_params([index])

Return list of unique PARAMETERs in index or search results

read_wmo([index])

Return list of unique WMOs from the index or search results

records_per_wmo([index])

Return the number of records per unique WMO in search results

records_per_wmo_legacy([index])

Return the number of records per unique WMO in search results

run([nrows])

Filter index with search criteria

to_dataframe([nrows, index, completed])

Return index or search results as a pandas.DataFrame

to_indexfile(file)

Save search results to a file, following the Argo standard index formats

Attributes

N_FILES

Number of rows in the search results, or in the full index if no search was triggered

N_MATCH

Number of rows in the search results

N_RECORDS

Number of rows in the full index

backend

Name of store backend (pandas or pyarrow)

cname

Search query as a pretty formatted string

convention

Convention of the index (standard csv file name)

convention_columns

CSV file column names for the index convention

convention_supported

List of supported conventions

convention_title

Long name for the index convention

domain

Space/time domain of the index

ext

Storage file extension

files

File paths listed in search results

files_full_index

File paths listed in the index

index_path

Absolute path to the index file

search_path

Path to search result uri

search_type

Dictionary with search meta-data

sha_df

Returns a unique SHA for a cname/dataframe

sha_h5

Returns a unique SHA for a cname/hdf5

sha_pq

Returns a unique SHA for a cname/parquet

shape

Shape of the index array

uri

File paths listed in search results

uri_full_index

File paths listed in the index