
Contributing

Development Setup

To contribute to this work, you must obtain access to the NASA MAAP, where the Algorithm Development Environment (ADE) resides, and thus where algorithms can be registered and launched.

You'll need to have the following installed in your development environment (wherever you plan to conduct development, within the ADE or not):

  • git (already installed in the ADE): On macOS, using Homebrew is highly recommended: brew install git. Otherwise, see https://git-scm.com/downloads.
  • pixi: See Pixi Installation (Hint: if you want to use the curl command but don't have curl installed, and you do have wget installed, you can replace curl -fsSL with wget -qO- [note the capital letter O and the trailing dash])

To prepare for contributing, do the following in your development environment (see example commands below):

  1. Clone this GitHub repository.
  2. Change directory to the cloned working directory.
  3. Install (and run) Git pre-commit hooks.
  4. If desired, activate the default Pixi environment.

For example, after installing git (if necessary) and pixi:

git clone git@github.com:MAAP-Project/gedi-subsetter.git
cd gedi-subsetter
pixi run lint  # install and run pre-commit hooks
pixi shell     # if desired, activate the default Pixi environment

Testing

Successfully running linting and testing locally should ensure that the GitHub Actions workflow triggered by your PR will succeed.

Testing CMR Queries

We leverage the vcrpy library to record responses to HTTP/S requests. When running existing tests, these recordings (cassettes in vcrpy parlance) are replayed so that repeated test executions do not make repeated requests. Therefore, if you are not adding or modifying such tests, there is no need to have a network connection, nor any need to run the tests within the ADE.

However, since we currently use the maap-py library for CMR queries, adding new tests that make CMR queries (or modifying existing ones) will not only require a network connection in order to record live responses, but will also require running the new or modified tests within the ADE, so that the necessary authentication is in place when the responses are recorded. Otherwise, the CMR queries will either fail or produce incorrect responses.
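To make the record/replay idea concrete, here is a toy, standard-library-only sketch of what a cassette does. It is not vcrpy's API (vcrpy transparently patches HTTP client libraries, so tests do not call anything like this directly); the `ToyCassette` class and its `fetch` method are hypothetical names used only for illustration. The first request for a URL is performed live and saved to a JSON file; later requests, even from a fresh instance, are replayed from that file without touching the network.

```python
import json
from pathlib import Path
from typing import Callable, Dict


class ToyCassette:
    """Hypothetical record/replay store illustrating the cassette idea.

    This is NOT vcrpy's API; it only mimics the concept.
    """

    def __init__(self, path: Path) -> None:
        self.path = path
        # Load previously recorded responses, if the cassette file exists.
        self.recordings: Dict[str, str] = (
            json.loads(path.read_text()) if path.exists() else {}
        )

    def fetch(self, url: str, live_fetch: Callable[[str], str]) -> str:
        if url in self.recordings:
            # Replay: no network access needed.
            return self.recordings[url]
        # Record: perform the live request once and persist the response.
        response = live_fetch(url)
        self.recordings[url] = response
        self.path.write_text(json.dumps(self.recordings, indent=2))
        return response
```

This also shows why CMR recordings must be made inside the ADE: whatever the live request returns, including an authentication failure, is exactly what gets saved and replayed on every later run.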

Linting and Running Unit Tests

Linting runs a number of checks on various files, such as making sure your code adheres to coding conventions. To lint the files in the repo, perform static type checks, and run the unit tests, run the following:

pixi run lint  # check formatting and code conventions
pixi run mypy  # perform static type checks
pixi run test  # run unit tests

If you have activated the default Pixi shell via pixi shell, you can either use the commands above, or you can use these instead:

pre-commit  # check formatting and code conventions
mypy        # perform static type checks
pytest      # run unit tests

If you see any errors, address them and repeat the process until there are no errors.
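If you prefer a single command that runs all three checks and reports every failure at once (rather than stopping at the first), a small Python helper along these lines would work. It is a hypothetical sketch, not part of the repo; it assumes `pixi` is on your PATH and that the `lint`, `mypy`, and `test` tasks exist as shown above.

```python
import subprocess
from typing import List, Sequence

# Hypothetical list of checks, mirroring the pixi commands above.
CHECKS = [
    ["pixi", "run", "lint"],
    ["pixi", "run", "mypy"],
    ["pixi", "run", "test"],
]


def run_checks(checks: Sequence[Sequence[str]] = CHECKS) -> List[str]:
    """Run every check, even after a failure, and return the failed commands."""
    failed: List[str] = []
    for cmd in checks:
        try:
            returncode = subprocess.run(list(cmd)).returncode
        except FileNotFoundError:
            # The executable (e.g. pixi) is not installed or not on PATH.
            returncode = 127
        if returncode != 0:
            failed.append(" ".join(cmd))
    return failed
```

Calling `run_checks()` returns an empty list when everything passes; otherwise it lists the commands to fix and rerun.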

Locally Running GitHub Actions Workflows

Optionally, you may wish to locally test that the build for your future PR will succeed. To do so, you can use act to locally run GitHub Actions workflows. After installing act, run the following command from the root of the repo:

act

NOTE: act uses a Docker container, so this will NOT work within the ADE. You must use act in an environment where Docker is installed.

The command above will initially take several minutes, but subsequent runs should execute more quickly because only the first run must pull the act Docker image.

Locally Testing CWL

In the MAAP Hub, job execution uses CWL. You can test this locally, to some extent, by using cwltool to run the GEDI Subsetter, as follows:

pixi run cwltool subset.cwl ARG ...

where ARG ... are arguments required by the Subsetter. To see the available arguments, run the command above without any arguments.

For example:

pixi run cwltool subset.cwl --aoi input/GAB-ADM0.geojson --doi L4A --columns agbd --limit 5

Note

For the --aoi option, you may run the following commands to obtain the .geojson file shown in the command above, if you don't already have a .geojson file handy for testing (git is already configured to ignore the input directory):

mkdir input
wget -q -O input/GAB-ADM0.geojson https://github.com/wmgeolab/geoBoundaries/raw/9f8c9e0f3aa13c5d07efaf10a829e3be024973fa/releaseData/gbOpen/GAB/ADM0/geoBoundaries-GAB-ADM0.geojson
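Because `wget -q` prints nothing and `-O` can leave an empty file behind if the download fails, you may want to sanity-check the file before passing it to --aoi. The sketch below is a hypothetical helper (`check_aoi` is not part of the Subsetter), and its requirement that every feature have a Polygon or MultiPolygon geometry is an assumption about what makes a sensible AOI, not a documented rule of the Subsetter.

```python
import json
from pathlib import Path


def check_aoi(path: Path) -> int:
    """Return the number of features in a GeoJSON AOI file.

    Raises ValueError (json.JSONDecodeError is a subclass) if the file is
    empty, is not JSON, or is not a FeatureCollection of polygonal features.
    """
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict) or data.get("type") != "FeatureCollection":
        raise ValueError(f"{path}: expected a GeoJSON FeatureCollection")
    features = data.get("features") or []
    if not features:
        raise ValueError(f"{path}: FeatureCollection contains no features")
    for i, feature in enumerate(features):
        geometry_type = (feature.get("geometry") or {}).get("type")
        # Assumption: an AOI should be made of polygonal geometries.
        if geometry_type not in ("Polygon", "MultiPolygon"):
            raise ValueError(f"{path}: feature {i} has geometry type {geometry_type!r}")
    return len(features)
```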

This will run the Subsetter within a Docker container (which is automatically built as part of the pixi run cwltool command above), but it will fail with "HTTPError: 401 Client Error" because MAAP_PGT is not set. This error indicates that the Subsetter attempted to fetch temporary AWS credentials via the MAAP API, which requires a valid value for the MAAP_PGT environment variable.

However, if you obtain a MAAP_PGT value from your MAAP Profile page, export it as an environment variable, and add --preserve-environment MAAP_PGT immediately after cwltool in the pixi command above, the Subsetter should be able to fetch temporary credentials. It should then fail with "PermissionError: Forbidden", indicating that it attempted to read from S3 (using the temporary credentials it obtained), which is forbidden when running on a machine that is not within the AWS cloud in the us-west-2 region:

export MAAP_PGT="<token obtained from your MAAP Profile page>"
pixi run cwltool --preserve-environment MAAP_PGT \
   subset.cwl --aoi input/GAB-ADM0.geojson --doi L4A --columns agbd --limit 5
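If you switch between runs with and without a token, you can build the command so that the flag is added only when MAAP_PGT is actually set. The following is a hypothetical Python sketch (`build_cwltool_command` is an illustrative name, not part of the repo). It reflects the behavior described above: cwltool does not pass host environment variables such as MAAP_PGT through to the tool unless you name them with --preserve-environment.

```python
import os
from typing import List, Mapping, Sequence


def build_cwltool_command(
    subsetter_args: Sequence[str], env: Mapping[str, str] = os.environ
) -> List[str]:
    """Build the pixi/cwltool command, forwarding MAAP_PGT only if it is set."""
    cmd = ["pixi", "run", "cwltool"]
    if env.get("MAAP_PGT"):
        # The option must come right after cwltool, before the CWL file.
        cmd += ["--preserve-environment", "MAAP_PGT"]
    return cmd + ["subset.cwl", *subsetter_args]
```

For example, `build_cwltool_command(["--aoi", "input/GAB-ADM0.geojson", "--doi", "L4A", "--columns", "agbd", "--limit", "5"])` yields the same argument list as the shell command above when MAAP_PGT is exported.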

Warning

When running cwltool via the command given above, you might encounter the following error:

Error response from daemon: pull access denied for gedi-subset, repository
does not exist or may require 'docker login': denied: requested access to the
resource is denied

You should be able to ignore this error. Rerunning the cwltool command should succeed (up to the point that the Subsetter itself is expected to fail locally, as noted above).

Submitting a Pull Request

To work on a feature or bug fix, you'll generally want to follow these steps:

  1. Check out and pull the latest changes from the main branch.
  2. Create a new branch from the main branch.
  3. Add your desired code and/or configuration changes.
  4. Add appropriate entries to the Changelog, according to the Keep a Changelog convention. See existing sections in the Changelog for guidance on structure and format. In general, you should add entries under the Unreleased section at the top. A release manager will relabel the Unreleased section with the appropriate release number upon the next release.
  5. Register a version of the algorithm (see next section).
  6. Test your registered version, and repeat as necessary.
  7. Once you're satisfied with your changes, delete your registered version.
  8. Submit a PR to the GitHub repository, targeting the main branch.
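Before opening the PR, it can help to confirm that your Changelog changes actually landed under the Unreleased heading. The sketch below is hypothetical and assumes the Changelog uses Keep a Changelog-style level-2 headings such as `## [Unreleased]`, with `-` or `*` bullet entries beneath it (possibly grouped under `###` subheadings like `### Added`); adjust it to match the actual file.

```python
from typing import List


def unreleased_entries(changelog: str) -> List[str]:
    """Return the bullet entries under the '## [Unreleased]' heading."""
    entries: List[str] = []
    in_unreleased = False
    for line in changelog.splitlines():
        stripped = line.strip()
        if stripped.startswith("## "):
            # Any level-2 heading starts (or ends) the Unreleased section;
            # '### ' subheadings do not match this prefix.
            in_unreleased = stripped.lower() == "## [unreleased]"
        elif in_unreleased and stripped.startswith(("- ", "* ")):
            entries.append(stripped[2:])
    return entries
```

An empty result means no entries were found under Unreleased, which suggests step 4 still needs doing.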

Registering a Version of the Algorithm

To register a new version of the algorithm, you must do so within the ADE in order to obtain automatic authorization. If you have not been using the ADE for development but want to register the algorithm, you must first, within the ADE, clone the repository (or pull the latest code) and check out the branch from which you want to register the algorithm.

Then (again, within the ADE), simply run the following to register the algorithm configured in algorithm_config.yaml:

bin/algo/register

When on the main branch (typically only after creating a release of the algorithm, as described in the next section), and the current commit (HEAD) is tagged, the script will check whether or not the value of algorithm_version in the YAML file matches the value of the git tag. If so, the YAML file will be registered as-is. If not, the script will report the version-tag mismatch. A match is expected when registering from the main branch, as that's where tagging/releasing should take place.

However, you will likely want to register a version of the algorithm from another branch to test your changes before opening a Pull Request. When registering from any branch other than main, the script ignores the value of algorithm_version in the YAML file and instead uses the name of the current branch as the algorithm version during registration (the YAML file is not modified).
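The version-selection rules above can be summarized in a few lines of Python. This is a hypothetical sketch of the described behavior, not the contents of bin/algo/register. The inputs could come from commands such as `git rev-parse --abbrev-ref HEAD` (branch) and `git describe --tags --exact-match HEAD` (tag on HEAD, if any). The description above does not say what happens on an untagged commit on main, so the sketch simply falls back to the branch name there (an assumption).

```python
from typing import Optional


def resolve_algorithm_version(
    branch: str, head_tag: Optional[str], yaml_version: str
) -> str:
    """Return the version to register, per the rules described above.

    branch: current git branch name
    head_tag: tag on HEAD, or None if HEAD is untagged
    yaml_version: algorithm_version from algorithm_config.yaml
    """
    if branch == "main" and head_tag is not None:
        # Releases: the YAML version must match the release tag exactly.
        if yaml_version != head_tag:
            raise ValueError(
                f"algorithm_version {yaml_version!r} does not match tag {head_tag!r}"
            )
        return yaml_version
    # Other branches (and, as an assumption, untagged commits on main):
    # use the branch name; the YAML file itself is left untouched.
    return branch
```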

Upon successful registration, you should see output similar to the following (abridged):

{
  "code": 200,
  "message": {
    "id": "...",
    ...,
    "title": "Registering algorithm: gedi-subset",
    "message": "Registering algorithm: gedi-subset",
    ...,
    "status": "created",
    ...,
    "job_web_url": "https://repo.maap-project.org/root/register-job-hysds-v4/-/jobs/*****",
    "job_log_url": "https://repo.maap-project.org/root/register-job-hysds-v4/-/jobs/*****/raw"
  }
}

This indicates that the registration request succeeded (code 200) and that the image for the algorithm is being built. To see the progress of the build, open the "job_web_url" value from your output in a browser. Note that a successful response (as shown above) simply indicates that registration was successfully initiated, meaning that an image build was successfully triggered. The image build itself may still fail, so it is important to confirm that the build succeeds. If it does, the new version of the algorithm should be visible in the Algorithm list on the form shown in the ADE after choosing the Jobs > Submit Jobs menu item.
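If you capture the registration output (for example, by saving it to a file), a small Python sketch like the following can pull out the build URL to follow. The function name is hypothetical, and it assumes the response shape shown above; remember that code 200 only means the build was triggered, not that it succeeded.

```python
import json


def registration_build_url(response_text: str) -> str:
    """Return the job_web_url from a registration response, or raise on failure."""
    response = json.loads(response_text)
    if response.get("code") != 200:
        raise RuntimeError(f"registration failed: {response.get('message')!r}")
    # Code 200 only means the image build was triggered; open this URL
    # to confirm that the build itself succeeds.
    return response["message"]["job_web_url"]
```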

If the registration output shows an error, or registration succeeds but the image build fails, analyze the error message from the failed registration or build. If the output does not provide the information you need to correct the problem, reach out to the platform team for assistance.

Once the registration build succeeds, you may submit jobs against the algorithm.

Once you're satisfied that your unreleased version of the algorithm works properly, delete it as follows:

bin/algo/delete

Then create a Pull Request against the main branch. If you need to make adjustments to your branch, you can rerun registration to replace your unreleased version of the algorithm as often as necessary until you're satisfied.

Creating a Release

After one or more Pull Requests have landed on the main branch to constitute a new release:

  1. Check out and pull the latest changes on the main branch.
  2. Create a new branch named release-<VERSION>, where <VERSION> is an appropriate version number for the changes being made, according to Semantic Versioning.
  3. In algorithm_config.yaml and algorithm_config_cwl.yaml, change the value of algorithm_version to the same value as <VERSION> from the previous step.
  4. In the Changelog, immediately below the Unreleased heading add a new heading (at the same level) using the format [<VERSION>] (<YYYY-MM-DD>) (including the square brackets and parentheses), where <VERSION> is as above, and <YYYY-MM-DD> is the expected release date (which might not be the actual release date, depending on the PR approval process).
  5. Also in the Changelog, at the bottom, add a link to the new release to the existing list of release links (in order), and update the Unreleased link accordingly as well.
  6. Commit the changes, and open a Pull Request to main.
  7. Once the PR is approved and merged, go to https://github.com/MAAP-Project/gedi-subsetter/releases/new
  8. Click the Choose a tag dropdown.
  9. In the input box that appears, enter the same value as the value of <VERSION> from previous steps, and click the Create a new tag label that appears immediately below the input box.
  10. In the Release title input, also enter the same value as the value of <VERSION> from the previous steps.
  11. In the description text box, copy and paste the content of only the new version section you added earlier to the Changelog, excluding the new version heading (since it would be redundant with the release title).
  12. Click the Publish release button.
  13. Check out and pull the main branch in order to fetch the new tag created by the release process.
  14. Register the new release of the algorithm as described in the previous section.
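Several of the steps above derive strings from the same <VERSION>. A hypothetical Python sketch for computing them consistently follows. It assumes plain MAJOR.MINOR.PATCH versions (Semantic Versioning also allows pre-release and build-metadata suffixes, which this sketch rejects for brevity) and assumes the Changelog's version headings are level-2 (`##`), like the Unreleased heading.

```python
import re
from datetime import date
from typing import Dict

# MAJOR.MINOR.PATCH with no leading zeros, per Semantic Versioning 2.0.0
# (pre-release and build-metadata suffixes intentionally not supported here).
SEMVER_CORE = re.compile(r"(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def bump(version: str, part: str) -> str:
    """Return the next version after `version` for a major/minor/patch release."""
    match = SEMVER_CORE.fullmatch(version)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    major, minor, patch = (int(g) for g in match.groups())
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', not {part!r}")


def release_strings(version: str, release_date: date) -> Dict[str, str]:
    """Strings used in the release steps for a given <VERSION>."""
    if SEMVER_CORE.fullmatch(version) is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return {
        "branch": f"release-{version}",  # step 2
        "algorithm_version": version,  # step 3 (both YAML files)
        "changelog_heading": f"## [{version}] ({release_date.isoformat()})",  # step 4
        "tag_and_title": version,  # steps 9 and 10
    }
```

For example, `bump("0.6.2", "minor")` returns "0.7.0", so the release branch would be release-0.7.0.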