Caching
Let’s start with the standard imports:
In [1]: import argopy
In [2]: from argopy import DataFetcher
Caching data
If you want to avoid retrieving the same data several times during a working session, or if you fetched a large amount of data, you may want to temporarily save data in a cache file.
You can cache fetched data with the fetcher option cache.
Argopy cached data are persistent: they are stored locally in files and survive the end of your script, so they remain available in a new session. Cached data expire after one day, since this is the update frequency of most data sources. This ensures you always have the latest version of Argo data.
All data and meta-data (index) fetchers have a caching system.
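To make the idea of a persistent cache with a one-day expiry concrete, here is a minimal conceptual sketch using only the standard library. It is not argopy's implementation: the function names, the `download` callable and the choice of SHA-256 file names are assumptions made for illustration.

```python
import hashlib
import time
from pathlib import Path

ONE_DAY = 24 * 60 * 60  # expiry in seconds, matching the one-day expiration described above


def cache_path(cachedir: Path, uri: str) -> Path:
    """Map a request URI to a file named after its SHA-256 hash (hypothetical naming scheme)."""
    return cachedir / hashlib.sha256(uri.encode("utf-8")).hexdigest()


def fetch_with_cache(uri, download, cachedir, expiry=ONE_DAY, now=None):
    """Return the payload for `uri`, reusing a cached copy younger than `expiry` seconds.

    `download` is a hypothetical callable taking a URI and returning bytes.
    `now` overrides the current time, which makes expiry easy to exercise.
    """
    cachedir = Path(cachedir)
    cachedir.mkdir(parents=True, exist_ok=True)
    path = cache_path(cachedir, uri)
    now = time.time() if now is None else now
    if path.exists() and now - path.stat().st_mtime < expiry:
        return path.read_bytes()  # cache hit: nothing is downloaded
    data = download(uri)  # cache miss or expired entry: fetch again
    path.write_bytes(data)  # overwrite the file, which also refreshes its age
    return data
```

Because the cache lives in files rather than in memory, a second Python session pointing at the same folder would reuse the entry until it expires.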
The argopy default cache folder is under your home directory at ~/.cache/argopy.
But you can specify the path you want to use in several ways:
with argopy global options:
argopy.set_options(cachedir='mycache_folder')
in a temporary context:
with argopy.set_options(cachedir='mycache_folder'):
    f = DataFetcher(cache=True)
when instantiating the data fetcher:
f = DataFetcher(cache=True, cachedir='mycache_folder')
Warning
You really need to set the cache option to True. Specifying only the cachedir won’t trigger caching!
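The three ways of setting the cache folder, and the warning above, can be illustrated with a small stand-alone sketch. It mimics the common pattern of an options class that works both as a plain call and as a context manager. The `OPTIONS` dictionary, this `set_options` class and `ToyFetcher` are hypothetical stand-ins written for this example, not argopy's own code.

```python
from pathlib import Path

# Hypothetical global option store; the default mirrors the folder named above.
OPTIONS = {"cachedir": str(Path.home() / ".cache" / "argopy")}


class set_options:
    """Set options globally, or temporarily when used in a `with` block."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(OPTIONS)
        if unknown:
            raise ValueError(f"Unknown option(s): {sorted(unknown)}")
        # Remember previous values so __exit__ can restore them.
        self._old = {key: OPTIONS[key] for key in kwargs}
        OPTIONS.update(kwargs)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        OPTIONS.update(self._old)


class ToyFetcher:
    """Hypothetical fetcher: `cachedir` only matters when `cache=True`."""

    def __init__(self, cache=False, cachedir=None):
        self.cache = cache
        # An explicit argument wins over the global option.
        self.cachedir = cachedir if cachedir is not None else OPTIONS["cachedir"]

    @property
    def effective_cachedir(self):
        """Folder actually used for caching, or None when caching is off."""
        return self.cachedir if self.cache else None
```

In this sketch, `ToyFetcher(cachedir='mycache_folder').effective_cachedir` is `None`, which is the situation the warning describes: a folder was given, but caching was never switched on.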
Clearing the cache
If you want to manually clear your cache folder, and/or make sure your data are newly fetched, you can do it at the fetcher level with the clear_cache method.
Start by fetching data and storing them in the cache:
In [3]: argopy.set_options(cachedir='mycache_folder')
Out[3]: <argopy.options.set_options at 0x7b097925f4f0>
In [4]: fetcher1 = DataFetcher(cache=True).profile(6902746, 34).load()
Fetched data are in the local cache folder:
In [5]: import os
In [6]: os.listdir('mycache_folder')
Out[6]: ['cache', 'a01ebcc88c67402bbed8ca2ca2a00984bf22eae43e284c46323b1784109b10a5']
where we see hash entries for the newly fetched data and the cache registry file cache.
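Beyond listing file names, it can be handy to see how large and how old each cache entry is, for instance to check which entries are close to the one-day expiry. The helper below is a small standard-library sketch; `describe_cache` is a hypothetical name and is not part of argopy.

```python
import time
from pathlib import Path


def describe_cache(cachedir, now=None):
    """Return (name, size_in_bytes, age_in_seconds) for each file in `cachedir`, sorted by name."""
    now = time.time() if now is None else now
    entries = []
    for path in sorted(Path(cachedir).iterdir()):
        if path.is_file():  # skip sub-folders
            stat = path.stat()
            entries.append((path.name, stat.st_size, now - stat.st_mtime))
    return entries
```

Calling `describe_cache('mycache_folder')` on the folder used above would return one tuple for the registry file and one per hash entry.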
We can then fetch something else using the same cache folder:
In [7]: fetcher2 = DataFetcher(cache=True).profile(1901393, 1).load()
All fetched data are cached:
In [8]: os.listdir('mycache_folder')
Out[8]:
['cache',
'a01ebcc88c67402bbed8ca2ca2a00984bf22eae43e284c46323b1784109b10a5',
'099b4463a18c2620676831f02511ad3d8604fc8ebd8337536f8fba57ce8dbe24']
Note the new hash file from fetcher2 data.
Importantly, we can safely clear the cached data of fetcher1 without removing the data of fetcher2:
In [9]: fetcher1.clear_cache()
In [10]: os.listdir('mycache_folder')
Out[10]: ['cache', '099b4463a18c2620676831f02511ad3d8604fc8ebd8337536f8fba57ce8dbe24']
By using the fetcher-level clear_cache method, you make sure that only data fetched with that fetcher are removed, while data fetched otherwise (with other fetchers, for instance) stay in place.
If you want to clear the entire cache folder, whatever the fetcher used, do it at the package level with:
In [11]: argopy.clear_cache()
In [12]: os.listdir('mycache_folder')
Out[12]: []
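The difference between the two levels of clearing can be summarised with a toy model: each fetcher keeps track of the cache files it wrote and removes only those, while a folder-wide function removes everything. The names `ToyCachedFetcher`, `store` and `clear_all` are hypothetical and chosen for illustration; argopy's internal bookkeeping may differ.

```python
import hashlib
from pathlib import Path


class ToyCachedFetcher:
    """Hypothetical fetcher that remembers which cache files it created."""

    def __init__(self, cachedir):
        self.cachedir = Path(cachedir)
        self.cachedir.mkdir(parents=True, exist_ok=True)
        self._owned = set()  # names of cache files written by this instance

    def store(self, uri, data):
        """Write `data` under the SHA-256 hash of `uri` and record ownership."""
        name = hashlib.sha256(uri.encode("utf-8")).hexdigest()
        (self.cachedir / name).write_bytes(data)
        self._owned.add(name)
        return name

    def clear_cache(self):
        """Remove only the files this fetcher wrote."""
        for name in self._owned:
            (self.cachedir / name).unlink(missing_ok=True)  # requires Python 3.8+
        self._owned.clear()


def clear_all(cachedir):
    """Remove every file in the cache folder, whichever fetcher wrote it."""
    for path in Path(cachedir).iterdir():
        if path.is_file():
            path.unlink()
```

With two such fetchers sharing one folder, calling `clear_cache()` on the first leaves the second fetcher's file in place, and `clear_all` then empties the folder, mirroring the session shown above.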