- Development Setup
- Testing
- Submitting a Pull Request
- Registering a Version of the Algorithm
- Creating a Release
To contribute to this work, you must obtain access to the NASA MAAP, where the Algorithm Development Environment (ADE) resides, and thus where algorithms can be registered and launched.
You'll need to have the following installed in your development environment (wherever you plan to conduct development, within the ADE or not):
- git (already installed in the ADE): On macOS, using Homebrew is highly recommended: brew install git. Otherwise, see https://git-scm.com/downloads.
- pixi: See Pixi Installation. (Hint: if you want to use the curl command, but have wget installed rather than curl, you can replace curl -fsSL with wget -qO- [note the capital letter O and the trailing dash].)
To prepare for contributing, do the following in your development environment (see example commands below):
- Clone this GitHub repository.
- Change directory to the cloned working directory.
- Install (and run) Git pre-commit hooks.
- If desired, activate the default Pixi environment.
For example, after installing git (if necessary) and pixi:
git clone git@github.com:MAAP-Project/gedi-subsetter.git
cd gedi-subsetter
pixi run lint # install and run pre-commit hooks
pixi shell # if desired, activate the default Pixi environment
Successfully running linting and testing locally should ensure that the GitHub Actions workflow triggered by your PR will succeed.
We leverage the vcrpy library to record responses to HTTP/S requests. When
running existing tests, these recordings (cassettes in vcrpy parlance) are
replayed so that repeated test executions do not make repeated requests.
Therefore, if you are not adding or modifying such tests, there is no need to
have a network connection, nor any need to run the tests within the ADE.
However, since we currently use the maap-py library for CMR queries, adding
new tests that make CMR queries (or modifying existing ones) requires a network
connection in order to record live responses. It also requires that you record
those responses by running the new/modified tests within the ADE, so that the
necessary authorization is available. Otherwise, the CMR queries will either
fail or produce incorrect responses.
Linting runs a number of checks on various files, such as making sure your code adheres to coding conventions. To lint the files in the repo, perform static type checks, and run unit tests, run the following:
pixi run lint # check formatting and code conventions
pixi run mypy # perform static type checks
pixi run test # run unit tests
If you have activated the default Pixi shell via pixi shell, you can either
use the commands above, or you can use these instead:
pre-commit # check formatting and code conventions
mypy # perform static type checks
pytest # run unit tests
If you see any errors, address them and repeat the process until there are no errors.
Optionally, you may wish to locally test that the build for your future PR will
succeed. To do so, you can use act to locally run
GitHub Actions workflows. After installing act, run the following command
from the root of the repo:
act
NOTE: act uses a Docker container, so this will NOT work within the ADE.
You must use act in an environment where Docker is installed.
The command above will initially take several minutes, but subsequent runs
should execute more quickly because only the first run must pull the act
Docker image.
In the MAAP Hub, job execution uses CWL. This can be tested locally, to some
extent, by using cwltool to run the GEDI Subsetter locally, as follows:
pixi run cwltool subset.cwl ARG ...
where ARG ... are arguments required by the Subsetter. To see the available
arguments, run the command above without any arguments.
For example:
pixi run cwltool subset.cwl --aoi input/GAB-ADM0.geojson --doi L4A --columns agbd --limit 5
Note
For the --aoi option, you may run the following commands to obtain the
.geojson file shown in the command above, if you don't already have a
.geojson file handy for testing (git is already configured to ignore the
input directory):
mkdir input
wget -q -O input/GAB-ADM0.geojson https://github.com/wmgeolab/geoBoundaries/raw/9f8c9e0f3aa13c5d07efaf10a829e3be024973fa/releaseData/gbOpen/GAB/ADM0/geoBoundaries-GAB-ADM0.geojson
This will run the Subsetter within a Docker container (which is automatically
built as part of the pixi run command above), but it will fail with an
"HTTPError: 401 Client Error" because MAAP_PGT is not set. This indicates
that it is attempting to fetch temporary AWS credentials via the MAAP API, but
fails without having a valid value for the MAAP_PGT environment variable.
However, if you obtain a MAAP_PGT value from your MAAP Profile page, export it
as an environment variable, then add --preserve-environment MAAP_PGT
immediately following cwltool in the pixi command above, then the Subsetter
should be able to fetch temporary credentials. It should then fail with a
PermissionError: Forbidden indicating that it attempted to read from S3
(using the temporary credentials it obtained), which is forbidden when running
on a machine that is not within the AWS cloud in the us-west-2 region:
export MAAP_PGT="<token obtained from your MAAP Profile page>"
pixi run cwltool --preserve-environment MAAP_PGT \
subset.cwl --aoi input/GAB-ADM0.geojson --doi L4A --columns agbd --limit 5
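If you script local runs, the placement of the --preserve-environment flag is easy to get wrong. The following is a minimal Python sketch, not part of this repository, that assembles the same command shown above. The function name build_cwltool_command is hypothetical, and the sketch assumes only what the text states: the flag goes immediately after cwltool, and it is only useful when MAAP_PGT is set.

```python
import os
import shlex


def build_cwltool_command(subsetter_args, environ=None):
    """Return the argument list for running subset.cwl through cwltool.

    Hypothetical helper (not part of the repository). When MAAP_PGT is set,
    the flag that passes it through is placed immediately after `cwltool`.
    """
    environ = os.environ if environ is None else environ
    command = ["pixi", "run", "cwltool"]
    if environ.get("MAAP_PGT"):
        command += ["--preserve-environment", "MAAP_PGT"]
    return command + ["subset.cwl", *subsetter_args]


if __name__ == "__main__":
    args = [
        "--aoi", "input/GAB-ADM0.geojson",
        "--doi", "L4A",
        "--columns", "agbd",
        "--limit", "5",
    ]
    # Print the command rather than running it, so it can be inspected first.
    print(shlex.join(build_cwltool_command(args)))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere, including environments without Docker.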
Warning
When running cwltool via the command given above, you might encounter the
following error:
Error response from daemon: pull access denied for gedi-subset, repository
does not exist or may require 'docker login': denied: requested access to the
resource is denied
You should be able to ignore this error. Rerunning the cwltool command
should succeed (up to the point that the Subsetter itself is expected to fail
locally, as noted above).
To work on a feature or bug fix, you'll generally want to follow these steps:
- Check out and pull the latest changes from the main branch.
- Create a new branch from the main branch.
- Add your desired code and/or configuration changes.
- Add appropriate entries to the Changelog, according to the Keep a Changelog convention. See existing sections in the Changelog for guidance on structure and format. In general, you should add entries under the Unreleased section at the top. A release manager will relabel the Unreleased section to the appropriate release number upon the next release.
- Register a version of the algorithm (see next section).
- Test your registered version, and repeat as necessary.
- Once you're satisfied with your changes, delete your registered version.
- Submit a PR to the GitHub repository, targeting the main branch.
To register a new version of the algorithm, you must do so within the ADE, in order to obtain automatic authorization. If you have not been using the ADE for development, but want to register the algorithm, within the ADE you must clone and/or pull the latest code from the branch from which you want to register the algorithm.
Then (again, within the ADE), simply run the following to register the algorithm
configured in algorithm_config.yaml:
bin/algo/register
When on the main branch (typically only after creating a release of the
algorithm, as described in the next section), and the current commit (HEAD) is
tagged, the script will check whether or not the value of algorithm_version in
the YAML file matches the value of the git tag. If so, the YAML file will be
registered as-is. If not, the script will report the version-tag mismatch. A
match is expected when registering from the main branch, as that's where
tagging/releasing should take place.
However, you will likely want to register a version of the algorithm from
another branch when testing your changes on the branch, before opening a Pull
Request. In this case, when registering from another branch, the script ignores
the value of algorithm_version in the YAML file, and the script will instead
use the name of the current branch as the algorithm version during registration
(the YAML file is not modified).
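The version-selection rules described above can be summarized in a few lines of Python. This is only an illustrative sketch of the documented behavior, not the actual bin/algo/register script. The function name resolve_algorithm_version is hypothetical, and the handling of an untagged commit on main is an assumption, since the text does not cover that case.

```python
def resolve_algorithm_version(branch, head_tag, yaml_version):
    """Pick the version to register, following the rules described above.

    branch:       name of the current git branch
    head_tag:     tag on the current commit (HEAD), or None if untagged
    yaml_version: value of algorithm_version in algorithm_config.yaml
    """
    if branch != "main":
        # Off main, the YAML value is ignored and the branch name is used.
        return branch
    if head_tag is None:
        # ASSUMPTION: the text does not say what happens for an untagged
        # HEAD on main, so this sketch refuses to guess.
        raise ValueError("HEAD on main is not tagged")
    if head_tag != yaml_version:
        raise ValueError(
            f"version-tag mismatch: algorithm_version={yaml_version!r}, "
            f"tag={head_tag!r}"
        )
    # On main with a matching tag, the YAML file is registered as-is.
    return yaml_version
```

For example, registering from a branch named my-feature yields the version my-feature regardless of the YAML value, while registering from main requires the tag and algorithm_version to agree.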
Upon successful registration, you should see output similar to the following (abridged):
{
"code": 200,
"message": {
"id": "...",
...,
"title": "Registering algorithm: gedi-subset",
"message": "Registering algorithm: gedi-subset",
...,
"status": "created",
...,
"job_web_url": "https://repo.maap-project.org/root/register-job-hysds-v4/-/jobs/*****",
"job_log_url": "https://repo.maap-project.org/root/register-job-hysds-v4/-/jobs/*****/raw"
}
}
This indicates that the registration succeeded (code 200), and that the image
for the algorithm is being built. To see the progress of the build, open a
browser to the "job_web_url" value shown in your output. Note that although
you may see a "success" response (as shown above), that simply indicates that
registration was successfully initiated, meaning that an image build was
successfully triggered. The image build process may fail, so it is important to
make sure the build succeeds. If it does, then the new version of the algorithm
should be visible in the Algorithm list on the form shown in the ADE after
choosing the Jobs > Submit Jobs menu item.
If the registration output shows an error, or registration succeeds but the image build fails, analyze the error message from the failed registration or failed build. If the output does not provide the information you need to correct the problem, reach out to the platform team for assistance.
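Because a code of 200 only means that the image build was triggered, it can help to pull the build URL out of the response and check it explicitly. The sketch below assumes the registration output is valid JSON with the shape shown in the abridged example (a top-level "code" and a "message" object containing "job_web_url"). The function name summarize_registration is hypothetical, and the shape of error responses is not documented here, so the sketch treats anything unexpected as "no URL available".

```python
import json


def summarize_registration(response_text):
    """Return (initiated, job_web_url) from the registration output.

    `initiated` is True only when the response code is 200. That means the
    image build was triggered, NOT that the build succeeded.
    """
    response = json.loads(response_text)
    initiated = response.get("code") == 200
    message = response.get("message")
    # ASSUMPTION: on errors, "message" may not be an object at all.
    url = message.get("job_web_url") if isinstance(message, dict) else None
    return initiated, url


if __name__ == "__main__":
    sample = json.dumps(
        {
            "code": 200,
            "message": {
                "status": "created",
                "job_web_url": "https://repo.maap-project.org/root/register-job-hysds-v4/-/jobs/*****",
            },
        }
    )
    initiated, url = summarize_registration(sample)
    if initiated and url:
        print(f"Build triggered; check its progress at: {url}")
    else:
        print("Registration was not initiated; inspect the full output.")
```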
Once the registration build succeeds, you may submit jobs against the algorithm.
For unreleased versions, once you're satisfied that your unreleased version of the algorithm works properly, you should delete it as follows:
bin/algo/delete
Then create a Pull Request against the main branch. If you need to make
adjustments to your branch, you can rerun registration to replace your
unreleased version of the algorithm as often as necessary until you're
satisfied.
After one or more Pull Requests have landed on the main branch to constitute
a new release:
- Check out the latest changes to the main branch.
- Create a new branch named release-<VERSION>, where <VERSION> is an appropriate version number for the changes being made, according to Semantic Versioning.
- In algorithm_config.yaml and algorithm_config_cwl.yaml, change the value of algorithm_version to the same value as <VERSION> from the previous step.
- In the Changelog, immediately below the Unreleased heading, add a new heading (at the same level) using the format [<VERSION>] (<YYYY-MM-DD>) (including the square brackets and parentheses), where <VERSION> is as above, and <YYYY-MM-DD> is the expected release date (which might not be the actual release date, depending on the PR approval process).
- Also in the Changelog, at the bottom, add a link to the new release to the existing list of release links (in order), and update the Unreleased link accordingly as well.
- Commit the changes, and open a Pull Request to main.
- Once the PR is approved and merged, go to https://github.com/MAAP-Project/gedi-subsetter/releases/new
- Click the Choose a tag dropdown.
- In the input box that appears, enter the same value as the value of <VERSION> from previous steps, and click the Create a new tag label that appears immediately below the input box.
- In the Release title input, also enter the same value as the value of <VERSION> in the previous step.
- In the description text box, copy and paste the content of only the new version section you added earlier to the Changelog, excluding the new version heading (since it would be redundant with the release title).
- Click the Publish release button.
- Check out and pull the main branch in order to pull down the new tag created by the release process.
- Register the new release of the algorithm as described in the previous section.
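Several of the steps above reuse the same <VERSION> value in slightly different forms (branch name, Changelog heading, tag, and release title). The following Python sketch derives those strings from one input so they stay consistent. It is not part of the repository, the function name release_names is hypothetical, and the version check is a deliberate simplification that accepts only MAJOR.MINOR.PATCH, whereas full Semantic Versioning also allows pre-release and build metadata suffixes.

```python
import re
from datetime import date

# Simplified check for MAJOR.MINOR.PATCH without leading zeros. This is an
# assumption of the sketch; pre-release/build suffixes are rejected.
_CORE_SEMVER = re.compile(r"^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)$")


def release_names(version, release_date):
    """Return the strings used across the release steps for one version."""
    if not _CORE_SEMVER.match(version):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return {
        "branch": f"release-{version}",
        # Heading text only; the heading level must match "Unreleased".
        "changelog_heading": f"[{version}] ({release_date.isoformat()})",
        "algorithm_version": version,
        "tag": version,
        "release_title": version,
    }


if __name__ == "__main__":
    # The version and date here are illustrative values only.
    for key, value in release_names("1.2.3", date(2024, 1, 15)).items():
        print(f"{key}: {value}")
```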