Contributing to argopy#

First off, thanks for taking the time to contribute!

Note

Large parts of this document came from the Xarray and Pandas contributing guides.

If you seek support for your argopy usage or if you don’t want to read this whole thing and just have a question: visit our Discussion forum.

Where to start?#

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

If you are brand new to argopy or open source development, we recommend going through the GitHub “issues” tab to find issues that interest you. There are a number of issues listed under Documentation and Good first issue where you could start out. Once you’ve found an interesting issue, you can return here to get your development environment set up.

Please don’t file an issue to ask a question; instead, visit our Discussion forum.

Bug reports and enhancement requests#

Bug reports are an important part of making argopy more stable. Having a complete bug report will allow others to reproduce the bug and provide insight into fixing it. See this Stack Overflow article for tips on writing a good bug report.

Trying the bug-producing code out on the master branch is often a worthwhile exercise to confirm the bug still exists. It is also worth searching existing bug reports and pull requests to see if the issue has already been reported and/or fixed.

Bug reports must:

  1. Include a short, self-contained Python snippet reproducing the problem. You can format the code nicely by using GitHub Flavored Markdown:

    ```python
    >>> import argopy as ar
    >>> ds = ar.DataFetcher(src='erddap').float(5903248).to_xarray()
    ...
    ```
    
  2. Include the full version string of argopy and its dependencies. You can use the built-in function:

    >>> import argopy
    >>> argopy.show_versions()
    
  3. Explain why the current behavior is wrong/not desired and what you expect instead.

The issue will then show up to the argopy community and be open to comments/ideas from others.

Click here to open an issue with the specific bug reporting template

Contributing to the documentation#

If you’re not the developer type, contributing to the documentation is still of huge value. You don’t even have to be an expert on argopy to do so! In fact, there are sections of the docs that are worse off after being written by experts. If something in the docs doesn’t make sense to you, updating the relevant section after you figure it out is a great way to ensure it will help the next person.

About the argopy documentation#

The documentation is written in reStructuredText, which is almost like writing in plain English, and built using Sphinx. The Sphinx Documentation has an excellent introduction to reST. Review the Sphinx docs to perform more complex changes to the documentation as well.

Some other important things to know about the docs:

  • The argopy documentation consists of two parts: the docstrings in the code itself and the docs in this folder argopy/docs/.

    The docstrings are meant to provide a clear explanation of the usage of the individual functions, while the documentation in this folder consists of tutorial-like overviews per topic together with some other information (what’s new, installation, etc).

  • The docstrings follow the Numpy Docstring Standard, which is used widely in the Scientific Python community. This standard specifies the format of the different sections of the docstring. See this document for a detailed explanation, or look at some of the existing functions to extend it in a similar manner.

  • The tutorials make use of the ipython directive sphinx extension. This directive lets you put code in the documentation which will be run during the doc build. For example:

    .. ipython:: python
    
        x = 2
        x ** 3
    

    will be rendered as:

    In [1]: x = 2
    
    In [2]: x ** 3
    Out[2]: 8
    

    Almost all code examples in the docs are run (and the output saved) during the doc build. This approach means that code examples will always be up to date, but it does make the doc building a bit more complex.

  • Our API documentation in docs/api.rst houses the auto-generated documentation from the docstrings. For classes, there are a few subtleties around controlling which methods and attributes have pages auto-generated.

    Every method should be included in a toctree in api.rst, else Sphinx will emit a warning.
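To illustrate the Numpy Docstring Standard mentioned in the list above, here is a minimal sketch of a documented function. The function wrap_longitude is hypothetical and is not part of argopy; it only shows the layout of the Parameters, Returns and Examples sections.

```python
def wrap_longitude(lon):
    """Wrap a longitude into the [-180, 180) range.

    Hypothetical helper used only to illustrate the docstring layout.

    Parameters
    ----------
    lon : float
        Longitude in degrees east, in any range.

    Returns
    -------
    float
        The same longitude expressed in the [-180, 180) range.

    Examples
    --------
    >>> wrap_longitude(190.0)
    -170.0
    """
    # Shift by 180, wrap with modulo 360, then shift back
    return (lon + 180.0) % 360.0 - 180.0
```

Each section title is underlined with dashes of the same length, and each parameter is written as "name : type" followed by an indented description.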

How to build the argopy documentation#

Requirements#

Make sure to follow the instructions on creating a development environment below and use the specific environment argopy-docs:

$ ./ci/envs_manager -i argopy-docs
$ conda activate argopy-docs
$ pip install -e .
$ pip install -r docs/requirements.txt

Building the documentation#

Navigate to your local argopy/docs/ directory in the console and run:

make html

Then you can find the HTML output in the folder argopy/docs/_build/html/.

The first time you build the docs, it will take quite a while because it has to run all the code examples and build all the generated docstring pages. In subsequent invocations, Sphinx will try to only build the pages that have been modified.

If you want to do a full clean build, do:

make clean
make html

Working with the code#

Development workflow#

Anyone interested in helping to develop argopy needs to create their own fork of our git repository. (Follow the github forking instructions. You will need a github account.)

Clone your fork on your local machine.

$ git clone git@github.com:USERNAME/argopy

(In the above, replace USERNAME with your github user name.)

Then set your fork to track the upstream argopy repo.

$ cd argopy
$ git remote add upstream https://github.com/euroargodev/argopy.git

You will want to periodically sync your master branch with the upstream master.

$ git fetch upstream
$ git rebase upstream/master

Never make any commits on your local master branch. Instead open a feature branch for every new development task.

$ git checkout -b cool_new_feature

(Replace cool_new_feature with an appropriate description of your feature.) At this point you work on your new feature, using git add to add your changes. When your feature is complete and well tested, commit your changes

$ git commit -m 'did a bunch of great work'

and push your branch to github.

$ git push origin cool_new_feature

At this point, you go find your fork on github.com and create a pull request. Clearly describe what you have done in the comments. If your pull request fixes an issue or adds a useful new feature, the team will gladly merge it.

After your pull request is merged, you can switch back to the master branch, rebase, and delete your feature branch. You will find your new feature incorporated into argopy.

$ git checkout master
$ git fetch upstream
$ git rebase upstream/master
$ git branch -d cool_new_feature

Virtual environment#

We created a short command line script to help manage argopy virtual environments. It’s available in the “ci” folder of the repository.

$ ./ci/envs_manager -h

Manage argopy related Conda environments

Syntax: manage_ci_envs [-hl] [-d] [-rik]
options:
h     Print this Help
l     List all available environments
d     Dry run, just list what the script would do

r     Remove an environment
i     Install an environment (start by removing it if it's already installed)
k     Install an environment as a Jupyter kernel

$ ./ci/envs_manager -l

Available environments:
     argopy-docs-rtd
     argopy-py311-all-free
     argopy-py311-core-free
     argopy-py311-all-pinned
     argopy-py311-core-pinned
     argopy-py312-all-free
     argopy-py312-core-free
     argopy-py312-all-pinned
     argopy-py312-core-pinned

Some legacy environment files could persist in the ./ci/requirements folder.

Then, you can simply install the default dev environment like this:

$ ./ci/envs_manager -i argopy-py311-all-pinned
$ conda activate argopy-py311-all-pinned
$ pip install -e .
$ python -c 'import argopy; argopy.show_versions()'

Code standards#

Writing good code is not just about what you write. It is also about how you write it. During Continuous Integration testing, several tools will be run to check your code for stylistic errors. Generating any warnings will cause the test to fail. Thus, good style is a requirement for submitting code to argopy.

Code Formatting#

argopy uses several tools to ensure a consistent code format throughout the project:

  • Flake8 for general code quality

Install it with pip:

pip install flake8

and then run from the root of the argopy repository:

flake8

to qualify your code.

Contributing to the code base#

Data fetchers#

Introduction#

If you want to add your own data fetcher for a new service, keep in mind that:

  • Data fetchers are responsible for:

    • loading all available data from a given source and providing at least a to_xarray() method

    • making data compliant to Argo standards (data type, variable name, attributes, etc.)

  • Data fetchers must:

    • inherit from the argopy.data_fetchers.proto.ArgoDataFetcherProto

    • provide parameters:

      • access_points, e.g.: ['wmo', 'box']

      • exit_formats, e.g.: ['xarray']

      • dataset_ids, e.g.: ['phy', 'ref', 'bgc']

    • provide the methods that the facade API (argopy.fetchers.ArgoDataFetcher) uses to transform or filter data according to the user level or requests. These must include:

      • transform_data_mode()

      • filter_qc()

      • filter_variables()

      • filter_researchmode()

It is the responsibility of the facade API (argopy.fetchers.ArgoDataFetcher) to run transformers and filters according to user level or requests, not the data fetcher.
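The split of responsibilities described above can be sketched as follows. This is a toy illustration under stated assumptions, not argopy code: ToyFetcher and ToyFacade are hypothetical classes, the method signatures are invented for the example, and a list of dicts stands in for the xarray dataset a real fetcher would return.

```python
class ToyFetcher:
    """Hypothetical stand-in for a data fetcher (not an argopy class)."""

    def to_xarray(self):
        # A real fetcher returns an xarray.Dataset; a list of dicts keeps
        # this sketch free of dependencies.
        return [
            {"PRES": 5.0, "TEMP": 12.1, "TEMP_QC": 1},
            {"PRES": 10.0, "TEMP": 99.9, "TEMP_QC": 4},
        ]

    def filter_qc(self, data, keep=(1, 2)):
        # Keep rows whose QC flag is in `keep` (hypothetical signature).
        return [row for row in data if row["TEMP_QC"] in keep]

    def filter_variables(self, data, keep=("PRES", "TEMP")):
        # Keep only the listed variables (hypothetical signature).
        return [{k: v for k, v in row.items() if k in keep} for row in data]


class ToyFacade:
    """Hypothetical stand-in for the facade: it decides what to apply."""

    def __init__(self, fetcher, mode="standard"):
        self.fetcher = fetcher
        self.mode = mode

    def to_xarray(self):
        data = self.fetcher.to_xarray()  # the fetcher loads everything
        if self.mode == "standard":  # the facade runs the filters
            data = self.fetcher.filter_qc(data)
            data = self.fetcher.filter_variables(data)
        return data


expert = ToyFacade(ToyFetcher(), mode="expert").to_xarray()
standard = ToyFacade(ToyFetcher(), mode="standard").to_xarray()
```

The fetcher only exposes the filters; it never calls them from its own to_xarray(). Which filters run is decided by the facade from the user mode.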

Detailed guideline#

A new data fetcher must comply with:

Inheritance#

Inherit from the argopy.data_fetchers.proto.ArgoDataFetcherProto. This enforces minimal internal design compliance.

Auto-discovery of fetcher properties#

The new fetcher must come with the access_points, exit_formats and dataset_ids properties at the top of the file, e.g.:

access_points = ['wmo', 'box']
exit_formats = ['xarray']
dataset_ids = ['phy', 'bgc']  # First is default
api_server = "https://argovis-api.colorado.edu"
api_server_check = "https://argovis-api.colorado.edu/ping"

Values depend on what the new access point can return and what you want to implement. A good start is with the wmo access point and the phy dataset ID. The xarray data format is the minimum required. These variables are used by the facade to auto-discover the fetcher capabilities. The dataset_ids property is used to determine which variables can be retrieved.
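As an illustration of how module-level properties can be read by another piece of code, here is a minimal sketch. It is not the facade's actual implementation: describe_fetcher is a hypothetical helper, and a SimpleNamespace stands in for the new fetcher module.

```python
from types import SimpleNamespace

REQUIRED = ("access_points", "exit_formats", "dataset_ids")


def describe_fetcher(module):
    """Read the declared capabilities of a fetcher module (hypothetical helper)."""
    missing = [name for name in REQUIRED if not hasattr(module, name)]
    if missing:
        raise AttributeError(f"missing fetcher properties: {missing}")
    if "xarray" not in module.exit_formats:
        raise ValueError("'xarray' is the minimum required exit format")
    return {
        "access_points": list(module.access_points),
        "exit_formats": list(module.exit_formats),
        "dataset_ids": list(module.dataset_ids),
        # The first dataset ID is the default one
        "default_dataset": module.dataset_ids[0],
    }


# Stand-in for a fetcher module declaring its properties at the top of the file
new_fetcher = SimpleNamespace(
    access_points=["wmo", "box"],
    exit_formats=["xarray"],
    dataset_ids=["phy", "bgc"],
)
info = describe_fetcher(new_fetcher)
```

A check of this kind can be handy in a unit test for a new fetcher, to catch a missing or misspelled property early.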

Auto-discovery of fetcher access points#

The new fetcher must come at least with a Fetch_box or Fetch_wmo class, basically one for each of the access_points listed as properties. More generally we may have a main class that provides the key functionality to retrieve data from the source, and then classes for each of the access_points of your fetcher. This pattern could look like this:

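from argopy.data_fetchers.proto import ArgoDataFetcherProto

class NewDataFetcher(ArgoDataFetcherProto): ...  # shared logic to reach the data source
class Fetch_wmo(NewDataFetcher): ...  # 'wmo' access point
class Fetch_box(NewDataFetcher): ...  # 'box' access point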

It could also be like:

class Fetch_wmo(ArgoDataFetcherProto): ...
class Fetch_box(ArgoDataFetcherProto): ...

Note that the class names Fetch_wmo and Fetch_box must not change: the facade also uses them to auto-discover the fetcher capabilities.
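To show why the class names matter, here is a minimal sketch of a name-based lookup. It assumes a "Fetch_" + access point naming rule, as suggested by the text; find_access_point_classes is a hypothetical helper, not the facade's actual code, and the two empty classes are placeholders.

```python
from types import SimpleNamespace


class Fetch_wmo:
    """Placeholder for the 'wmo' access point class."""


class Fetch_box:
    """Placeholder for the 'box' access point class."""


def find_access_point_classes(module):
    """Map each declared access point to its Fetch_<name> class (hypothetical)."""
    found = {}
    for access_point in module.access_points:
        class_name = f"Fetch_{access_point}"
        cls = getattr(module, class_name, None)
        if cls is None:
            raise AttributeError(f"{class_name} is missing from the fetcher module")
        found[access_point] = cls
    return found


# Stand-in for a fetcher module
module = SimpleNamespace(
    access_points=["wmo", "box"], Fetch_wmo=Fetch_wmo, Fetch_box=Fetch_box
)
classes = find_access_point_classes(module)
```

With a lookup like this, renaming Fetch_box to anything else would make the 'box' access point undiscoverable.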

Fetch_wmo is used to retrieve platform data and, optionally, profile data. Its __init__() method must take WMO and CYC as its first and second arguments. WMO is always passed; CYC is optional. The facade passes them to implement the fetcher.float and fetcher.profile methods. When a float is requested, the facade does not pass CYC. Finally, WMO and CYC are each either a single integer or a list of integers, so Fetch_wmo must be able to retrieve more than one float/platform.
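One possible way to accept either a single integer or a list of integers is to normalise both arguments to lists in __init__(). The sketch below is an assumption about how this could be done: to_int_list is a hypothetical helper, and this toy Fetch_wmo does not inherit from ArgoDataFetcherProto so that it stays self-contained.

```python
def to_int_list(value):
    """Return None, or a list of integers, from an int or a list (hypothetical)."""
    if value is None:
        return None
    if isinstance(value, (list, tuple)):
        return [int(v) for v in value]
    return [int(value)]


class Fetch_wmo:
    """Toy version of the 'wmo' access point, for illustration only."""

    def __init__(self, WMO, CYC=None):
        self.WMO = to_int_list(WMO)  # always passed by the facade
        self.CYC = to_int_list(CYC)  # None when a whole float is requested


one_float = Fetch_wmo(5903248)
one_profile = Fetch_wmo(5903248, 34)
many_floats = Fetch_wmo([6902746, 6902747])
```

After normalisation, the rest of the class can always loop over self.WMO, whether the user asked for one float or several.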

Fetch_box is used to retrieve a rectangular domain in space and time. Its __init__() method must take BOX as its first argument; the facade passes it as a list: [lon_min: float, lon_max: float, lat_min: float, lat_max: float, pres_min: float, pres_max: float, date_min: str, date_max: str]. The two bounding dates (date_min and date_max) should be optional: if they are not specified, the user is requesting the entire time series.
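The optional dates mean a BOX has either 6 or 8 elements. A minimal sketch of unpacking it could look like this; parse_box and the returned dictionary layout are hypothetical and only illustrate the two accepted lengths.

```python
def parse_box(box):
    """Unpack a 6- or 8-element BOX into named bounds (hypothetical helper)."""
    box = list(box)
    if len(box) not in (6, 8):
        raise ValueError("BOX must have 6 elements, or 8 with the two dates")
    bounds = [float(v) for v in box[:6]]
    # Dates are optional: None means the entire time series is requested
    date_min, date_max = (box[6], box[7]) if len(box) == 8 else (None, None)
    names = ["lon_min", "lon_max", "lat_min", "lat_max", "pres_min", "pres_max"]
    parsed = dict(zip(names, bounds))
    parsed["date_min"] = date_min
    parsed["date_max"] = date_max
    return parsed


without_dates = parse_box([-75, -45, 20, 30, 0, 100])
with_dates = parse_box([-75, -45, 20, 30, 0, 100, "2011-01", "2011-06"])
```

The numeric values and dates above are arbitrary examples.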

Internal File systems#

All HTTP requests must go through the internal httpstore, an internal wrapper around fsspec that makes it easy to manage request caching. You can simply use it this way for JSON requests:

import json
from argopy.stores import httpstore

with httpstore(timeout=120).open("https://argovis.colorado.edu/catalog/profiles/5904797_12") as of:
    profile = json.load(of)

Output data format#

Last but not least, about the output data. In argopy, we want to provide data for both expert and standard users. This is explained and illustrated in the documentation here. For a new data fetcher, this means that the data content should be curated and free of any internal or jargon variables that are not part of the Argo ADMT vocabulary. For instance, variables like bgcMeasKeys or geoLocation are not allowed. This ensures that, whatever the data source set by users, the output xarray or dataframe will be formatted the same way and contain the same variables. It also ensures that other argopy features, like plotting or xarray data manipulation, can be used on the new fetcher output.
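The curation step can be sketched as a simple filter on variable names. This is only an illustration: drop_non_argo_variables is a hypothetical helper, a plain dict stands in for the dataset, and ALLOWED is a tiny illustrative subset of Argo parameter names, not the full ADMT vocabulary.

```python
# Tiny illustrative subset, NOT the full Argo ADMT vocabulary
ALLOWED = {"PRES", "TEMP", "PSAL"}


def drop_non_argo_variables(variables, allowed=ALLOWED):
    """Keep only variables whose name is in the allowed vocabulary (hypothetical)."""
    return {name: values for name, values in variables.items() if name in allowed}


# Stand-in for raw data returned by a remote service
raw = {
    "PRES": [5.0, 10.0],
    "TEMP": [12.1, 11.8],
    "bgcMeasKeys": ["doxy"],  # service-specific, not Argo vocabulary
    "geoLocation": {"type": "Point"},  # service-specific, not Argo vocabulary
}
clean = drop_non_argo_variables(raw)
```

In a real fetcher, the same idea would apply to the variables of the xarray dataset before it is returned.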

Policy regarding AI powered contributions#

AI Usage Policy#

The argopy development team has set some rules for generative AI usage. These rules are not definitive and may be updated in the future.

  • All generative AI usage in any form must be disclosed. If AI was used to generate a significant portion of your contribution (beyond simple autocomplete), we ask that you disclose it within the code or PR description. You must state the tool you used along with a sentence saying that the work was AI-assisted.

  • Pull requests and issues created by AI are forbidden. If AI isn’t disclosed but a maintainer suspects its use within a PR or issue creation, the PR or issue will be closed. AI assistance can be used for discussions inside PRs or issues, but must always be supervised and reviewed by a human eye before submission.

  • Media. Text and code are the only acceptable AI-generated content, per the other rules in this policy.

These rules apply to all contributors, including maintainers.

Human Accountability#

The human contributor is 100% responsible for the contribution.

If you submit a Pull Request that includes AI-generated code, documentation, or comments:

  • You must fully understand the code you submit, and the context in which it is included within the overall Argopy project.

  • You must be able to explain the “why” behind the implementation during the review process.

  • You are responsible for the long-term maintenance of that code.

The people#

Please keep in mind that Argopy is developed and maintained by human beings.

It is a fundamental aspect of the project for us that discussions (in issues or PRs), even if they happen on GitHub, take place between humans. Knowing this, it will be considered impolite to approach the Argopy dev community through an AI agent.

Ethics, Ecology & Expertise#

For more than 20 years, the Argo program has been a key element in the observation and understanding of climate change and its impact on the ocean. Argo is instrumental in monitoring how human-driven climate change negatively impacts the ocean state and marine life (check the IPCC special report on ocean or that brief overview also from the IPCC). So, limiting its environmental footprint is a primary and inherent concern for Argo.

We like to think that Argopy is part of the Argo ecosystem.

As sustainable software developers, the Argopy team aims to better understand the environmental impact of generative AI, following our preliminary work in monitoring the carbon footprint of maintaining and developing Argopy. But very little information is available on the environmental impact of generative AI, and “current GAI models are deployed mainly in carbon-intensive regions” (Ding et al, 2025).

With that, there are two things to keep in mind when contributing to the development of Argopy:

  • Ethics & Ecology: Since the impact of generative AI on CO2 emissions (and on the use of limited Earth resources) is unknown but presumably very large, it is important to us that we adopt a conservative and limited approach regarding GAI usage in contributing to Argopy in order to limit its environmental footprint.

  • Expertise: Argopy, as part of a public research infrastructure, has a greater duty to develop and preserve technical and scientific expertise for the public good than to participate in a blind race for productivity gains (in the social acceleration sense, an argument echoing the slow science approach at work within Argopy). It is thus also important to us that we adopt a conservative and limited approach regarding GAI usage in contributing to Argopy, to preserve a good level of human expertise regarding the produced code and our associated ability to disseminate our expertise to the end-users.