Caching

Let’s start with the standard imports:

In [1]: import argopy

In [2]: from argopy import DataFetcher

Caching data

If you want to avoid retrieving the same data several times during a working session, or if you fetched a large amount of data, you may want to temporarily save data in a cache file.

You can cache fetched data with the fetchers' cache option.

Argopy cached data are persistent: they are stored locally in files and survive from one session to the next, so they are still available when you run your script again in a new session. Cached data expire after one day, since this is the update frequency of most data sources. This ensures you always get the latest version of Argo data.
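To illustrate what a one-day expiration means for a file-based cache, here is a minimal sketch that treats a cached file as stale once its modification time is more than 24 hours old. The `is_expired` helper and the use of file modification times are assumptions made for illustration; this is not how argopy itself is implemented.

```python
import os
import tempfile
import time

# Hypothetical illustration, NOT argopy's code: an entry is considered stale
# once it is older than one day, mirroring the expiration described above.
ONE_DAY = 24 * 60 * 60  # seconds


def is_expired(path, max_age=ONE_DAY, now=None):
    """Return True if the cached file at `path` is older than `max_age` seconds."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) > max_age


# Usage: write a cache file, then check it "now" and "two days later".
with tempfile.TemporaryDirectory() as cachedir:
    entry = os.path.join(cachedir, "some_entry")
    with open(entry, "w") as f:
        f.write("cached payload")
    fresh = is_expired(entry)                                 # just written
    stale = is_expired(entry, now=time.time() + 2 * ONE_DAY)  # simulated future
    print(fresh, stale)  # False True
```

Because the entries live on disk rather than in memory, they outlive the Python process; only their age decides whether they are reused or fetched again.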

All data and metadata (index) fetchers support caching.

The argopy default cache folder is under your home directory at ~/.cache/argopy.

You can, however, choose another path in several ways:

with argopy global options:

argopy.set_options(cachedir='mycache_folder')

in a temporary context:

with argopy.set_options(cachedir='mycache_folder'):
    f = DataFetcher(cache=True)

when instantiating the data fetcher:

f = DataFetcher(cache=True, cachedir='mycache_folder')

Warning

You must set the cache option to True. Specifying only cachedir won’t trigger caching!
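The global call and the temporary context differ only in whether the previous value is restored. A common Python pattern for such an options helper applies the new values immediately in the constructor (so a plain call is a persistent, global change) and restores the old values in `__exit__` (so a `with` block is temporary). The sketch below shows that pattern with a hypothetical `OPTIONS` dict and `set_options` class; it is an illustration, not argopy's actual implementation.

```python
import os

# Hypothetical stand-ins for illustration only, not argopy's internals.
OPTIONS = {"cachedir": os.path.expanduser(os.path.join("~", ".cache", "argopy"))}


class set_options:
    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"Unknown option(s): {sorted(unknown)}")
        # Remember previous values so that a `with` block can restore them.
        self._old = {k: OPTIONS[k] for k in kwargs}
        # Apply immediately: a plain call therefore changes options globally.
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Restore the previous values when leaving the `with` block.
        OPTIONS.update(self._old)


default = OPTIONS["cachedir"]

with set_options(cachedir="mycache_folder"):
    inside = OPTIONS["cachedir"]      # "mycache_folder" inside the block
after_context = OPTIONS["cachedir"]   # back to the default after the block

set_options(cachedir="mycache_folder")  # plain call: stays in effect
after_call = OPTIONS["cachedir"]        # "mycache_folder"
```

Note that neither form turns caching on by itself: in the examples above, caching is enabled only by passing cache=True to the fetcher.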

Clearing the cache

If you want to manually clear your cache folder, and/or make sure your data are newly fetched, you can do it at the fetcher level with the clear_cache method.

Start by fetching data and storing them in the cache:

In [3]: argopy.set_options(cachedir='mycache_folder')
Out[3]: <argopy.options.set_options at 0x75bc1f254d50>

In [4]: fetcher1 = DataFetcher(cache=True).profile(6902746, 34).load()

Fetched data are in the local cache folder:

In [5]: import os

In [6]: os.listdir('mycache_folder')
Out[6]: ['a01ebcc88c67402bbed8ca2ca2a00984bf22eae43e284c46323b1784109b10a5', 'cache']

where we see a hash entry for the newly fetched data and the cache registry file, named cache.
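The 64-character entry names in the listing are consistent with SHA-256 hexadecimal digests. The sketch below assumes that scheme: each fetched resource is stored under the SHA-256 digest of its URL, and a JSON registry file named cache maps URLs to entry names. The `cache_store` helper, the JSON format and the example.org URLs are hypothetical placeholders, not argopy's actual storage format or data endpoints.

```python
import hashlib
import json
import os
import tempfile

# Hypothetical illustration: hash-named cache entries plus a registry file.


def cache_store(cachedir, url, payload):
    """Write `payload` under the SHA-256 digest of `url` and record it in the registry."""
    os.makedirs(cachedir, exist_ok=True)
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()  # 64 hex characters
    with open(os.path.join(cachedir, key), "wb") as f:
        f.write(payload)
    registry_path = os.path.join(cachedir, "cache")
    registry = {}
    if os.path.exists(registry_path):
        with open(registry_path) as f:
            registry = json.load(f)
    registry[url] = key
    with open(registry_path, "w") as f:
        json.dump(registry, f)
    return key


with tempfile.TemporaryDirectory() as tmp:
    cachedir = os.path.join(tmp, "mycache_folder")
    # Placeholder URLs standing in for two different profile requests.
    k1 = cache_store(cachedir, "https://example.org/data?wmo=6902746&cyc=34", b"...")
    k2 = cache_store(cachedir, "https://example.org/data?wmo=1901393&cyc=1", b"...")
    listing = sorted(os.listdir(cachedir))
    print(listing)  # two 64-character hash names plus 'cache'
```

Hashing the request URL gives each distinct request its own file, which is why fetching a second profile adds a new entry instead of overwriting the first.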

We can then fetch something else using the same cache folder:

In [7]: fetcher2 = DataFetcher(cache=True).profile(1901393, 1).load()

All fetched data are cached:

In [8]: os.listdir('mycache_folder')
Out[8]: 
['099b4463a18c2620676831f02511ad3d8604fc8ebd8337536f8fba57ce8dbe24',
 'a01ebcc88c67402bbed8ca2ca2a00984bf22eae43e284c46323b1784109b10a5',
 'cache']

Note the new hash file holding the fetcher2 data.

Importantly, we can safely clear the cached fetcher1 data without removing the fetcher2 data:

In [9]: fetcher1.clear_cache()

In [10]: os.listdir('mycache_folder')
Out[10]: ['099b4463a18c2620676831f02511ad3d8604fc8ebd8337536f8fba57ce8dbe24', 'cache']

By clearing the cache at the fetcher level, you make sure that only the data fetched with that fetcher are removed, while other cached data (fetched with other fetchers, for instance) stay in place.
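One way to obtain this fetcher-scoped behavior is for each fetcher to remember which cache entries it wrote, and to delete only those entries (and their registry records) when its cache is cleared. The `ToyFetcher` class and the "aaa"/"bbb" entry names below are hypothetical and only illustrate the idea; they are not argopy's DataFetcher.

```python
import json
import os
import tempfile

# Hypothetical illustration: per-fetcher bookkeeping of cache entries.


class ToyFetcher:
    def __init__(self, cachedir):
        self.cachedir = cachedir
        self._keys = set()  # cache entries created by this fetcher only

    def _registry_path(self):
        return os.path.join(self.cachedir, "cache")

    def _load_registry(self):
        path = self._registry_path()
        if not os.path.exists(path):
            return {}
        with open(path) as f:
            return json.load(f)

    def _save_registry(self, registry):
        with open(self._registry_path(), "w") as f:
            json.dump(registry, f)

    def store(self, key, payload):
        os.makedirs(self.cachedir, exist_ok=True)
        with open(os.path.join(self.cachedir, key), "wb") as f:
            f.write(payload)
        registry = self._load_registry()
        registry[key] = True
        self._save_registry(registry)
        self._keys.add(key)

    def clear_cache(self):
        # Remove only this fetcher's files and registry records.
        registry = self._load_registry()
        for key in self._keys:
            path = os.path.join(self.cachedir, key)
            if os.path.exists(path):
                os.remove(path)
            registry.pop(key, None)
        self._save_registry(registry)
        self._keys.clear()


with tempfile.TemporaryDirectory() as cachedir:
    fetcher1, fetcher2 = ToyFetcher(cachedir), ToyFetcher(cachedir)
    fetcher1.store("aaa", b"profile 6902746/34")
    fetcher2.store("bbb", b"profile 1901393/1")
    fetcher1.clear_cache()
    remaining = sorted(os.listdir(cachedir))  # ['bbb', 'cache']
```

The shared registry file survives the clear, because other fetchers may still have entries recorded in it.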

If you want to clear the entire cache folder, whatever the fetcher used, do it at the package level with:

In [11]: argopy.clear_cache()

In [12]: os.listdir('mycache_folder')
Out[12]: []
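The empty listing shows that the folder itself still exists after a package-level clear; only its contents are gone. A minimal sketch of such a wipe, removing every file and subfolder whatever fetcher created it while keeping the cache folder, could look like the following. The `clear_all` helper is hypothetical and is not argopy's `clear_cache` implementation.

```python
import os
import shutil
import tempfile

# Hypothetical illustration: empty the cache folder but keep the folder itself.


def clear_all(cachedir):
    for name in os.listdir(cachedir):
        path = os.path.join(cachedir, name)
        if os.path.isdir(path) and not os.path.islink(path):
            shutil.rmtree(path)  # remove subfolders recursively
        else:
            os.remove(path)      # remove files and symlinks


with tempfile.TemporaryDirectory() as cachedir:
    for name in ("entry_a", "entry_b", "cache"):
        with open(os.path.join(cachedir, name), "w") as f:
            f.write("x")
    os.mkdir(os.path.join(cachedir, "subdir"))
    clear_all(cachedir)
    after = os.listdir(cachedir)                   # []
    folder_still_exists = os.path.isdir(cachedir)  # True
```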
