This bash script replaces the use of Sling for CSV export.
Why?
1. Sling introduces upstream security concerns that we must constantly
respond to.
2. The Sling Python library does not import well in our multi-platform
(Mac, Linux, Windows) Docker stack.
3. The process is *slow*: it takes many hours.
This runs in 15m locally; ymmv in the cloud.
It takes a list of endpoints from an API and runs the following query
against each:
local query="\COPY "
query+="( SELECT * FROM ${API_VERSION}.${endpoint} "
query+=" WHERE report_id in ( "
query+=" SELECT report_id from ${API_VERSION}.general "
query+=" WHERE fac_accepted_date >= '${start_date}' "
query+=" AND fac_accepted_date <= '${end_date}' "
query+=")) "
query+="TO '${ROOT}/${endpoint}.csv' "
query+="WITH (FORMAT CSV, HEADER, DELIMITER ',');"
This creates a CSV file in the filesystem that we then use the aws-cli
to ship over to S3. By putting the file in the correct place, the static
site can pick up the CSVs.
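The export-then-upload flow described above can be sketched roughly as follows. This is a minimal illustration and not the script from this PR: `DATABASE_URL`, `S3_BUCKET`, the default values for `API_VERSION` and `ROOT`, and the function names are all assumptions made for the example.

```bash
#!/usr/bin/env bash
set -eu

# Assumed defaults, for illustration only.
API_VERSION="${API_VERSION:-api_v1_1_0}"
ROOT="${ROOT:-/tmp/csv-export}"

# Build the \COPY statement for one endpoint and an inclusive date range.
build_copy_query() {
  local endpoint="$1" start_date="$2" end_date="$3"
  local query="\\COPY "
  query="${query}( SELECT * FROM ${API_VERSION}.${endpoint} "
  query="${query}WHERE report_id in ( "
  query="${query}SELECT report_id from ${API_VERSION}.general "
  query="${query}WHERE fac_accepted_date >= '${start_date}' "
  query="${query}AND fac_accepted_date <= '${end_date}' "
  query="${query})) "
  query="${query}TO '${ROOT}/${endpoint}.csv' "
  query="${query}WITH (FORMAT CSV, HEADER, DELIMITER ',');"
  printf '%s\n' "$query"
}

# Export one endpoint to a local CSV with psql, then copy it to S3.
# DATABASE_URL and S3_BUCKET are hypothetical variable names.
export_and_upload() {
  local endpoint="$1" start_date="$2" end_date="$3"
  mkdir -p "$ROOT"
  psql "$DATABASE_URL" -v ON_ERROR_STOP=1 \
    -c "$(build_copy_query "$endpoint" "$start_date" "$end_date")"
  aws s3 cp "${ROOT}/${endpoint}.csv" "s3://${S3_BUCKET}/${endpoint}.csv"
}
```

A caller would loop over the endpoint list and invoke `export_and_upload "$endpoint" "$start_date" "$end_date"` for each one. Because `\COPY` runs client-side in `psql`, the CSV lands on the machine running the script, which is what lets the aws-cli pick it up afterwards.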
Initial checks suggest this is exporting data in a manner consistent
with the previous tooling.
TODO:
- [ ] Test in `staging`
- [ ] Remove the sling tooling
- [ ] Modify the GH action to use this
Terraform plan for meta
No changes. Your infrastructure matches the configuration.
✅ Plan applied in Deploy to Development and Meta Environments #1159
Terraform plan for dev
Plan: 1 to add, 0 to change, 1 to destroy.
Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
-/+ destroy and then create replacement
Terraform will perform the following actions:
# module.dev.module.cors.null_resource.cors_header must be replaced
-/+ resource "null_resource" "cors_header" {
!~ id = "******************" -> (known after apply)
!~ triggers = { # forces replacement
!~ "always_run" = "2026-04-09T22:27:29Z" -> (known after apply)
}
}
Plan: 1 to add, 0 to change, 1 to destroy.
❌ Plan not applied in Deploy to Development and Meta Environments #1159 (Plan has changed)
asteel-gsa
left a comment
Approved.
Tested this on a call in both preview and staging. ~15 min run time compared to the current CSV export (7.5h).
Before this ships, if it is not too difficult, it would be good to include a dump of This would be valuable for upstream consumers of the data. We have some Federal partners who are pulling the CSVs as an import, and having to compute the It will require an update to the public pages (so we link to the combined CSV), but it should be a straightforward add.
jperson1
left a comment
Tested manually once more in Preview, and moving forward with confidence in the scheduled command in the lower environments.
Addresses #5227
In this PR
Testing
This can be tested locally. It was tested w/ @asteel-gsa and @jperson1 as reviewers in
`preview` and `staging`, and we see full E2E for dump->S3, and links route correctly from static.
Local testing:
- `cd backend`
- `make docker-first-run && docker compose up`
- `docker compose exec web /bin/bash`
- `ENV=LOCAL ./util/csv-export-to-s3/csv-export-to-s3.bash`
PR Checklist: Submitter
- Merge `main` into your branch shortly before creating the PR. (You should also be merging `main` into your branch regularly during development.)
- Run `git status | grep migrations`. If there are any results, you probably need to add them to the branch for the PR. Your PR should have only one new migration file for each of the component apps, except in rare circumstances; you may need to delete some and re-run `python manage.py makemigrations` to reduce the number to one. (Also, unless in exceptional circumstances, your PR should not delete any migration files.)
PR Checklist: Reviewer
- `make docker-clean; make docker-first-run && docker compose up`; then run `docker compose exec web /bin/bash -c "python manage.py test"`
The larger the PR, the stricter we should be about these points.
Pre Merge Checklist: Merger
- `-/+ resource "null_resource" "cors_header"` should be destroying and recreating itself and `~ resource "cloudfoundry_app" "clamav_api"` might be updating its `sha256` for the `fac-file-scanner` and `fac-av-${ENV}` by default.
- `main`.