Broke infra/dcp terraform script up into smaller modules#36
dwnoble merged 5 commits into datacommonsorg:main
Conversation
Code Review
This pull request refactors the Data Commons Platform infrastructure into a modular architecture, separating the DCP and Custom Data Commons (CDC) stacks and introducing a central orchestration module. Feedback focuses on improving infrastructure stability and security, specifically recommending against the use of 'force_destroy' on data buckets, the ':latest' container tag, and authoritative IAM bindings. Additionally, the use of 'timestamp()' in environment variables was identified as a cause for unnecessary resource redeployments.
```hcl
resource "google_storage_bucket" "data_bucket" {
  name          = local.gcs_data_bucket_name
  location      = var.gcs_data_bucket_location
  force_destroy = true
```
Setting force_destroy = true allows Terraform to delete this bucket even when it still contains objects, which risks irreversible data loss on a data bucket. It is safer to leave force_destroy disabled for buckets that hold data.
```hcl
template {
  containers {
    image = "gcr.io/datcom-ci/datacommons-ingestion-helper:latest"
```
Using the :latest tag for container images can lead to non-deterministic deployments and makes it difficult to track which version of the code is currently running. It is recommended to use a specific version tag or an image digest for production-ready infrastructure.
```hcl
image = "gcr.io/datcom-ci/datacommons-ingestion-helper:stable"
```
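Pinning by image digest, as the comment above also suggests, goes one step further than a version tag because a digest is immutable. A sketch (the sha256 value below is a placeholder, not a real digest):

```hcl
containers {
  # A digest reference guarantees the exact image that was tested is
  # the one deployed; unlike a tag, it cannot be repointed later.
  # "<digest>" is a placeholder for an actual sha256 hex string.
  image = "gcr.io/datcom-ci/datacommons-ingestion-helper@sha256:<digest>"
}
```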
```hcl
resource "google_cloud_run_service_iam_binding" "public_invoker" {
  count    = var.make_service_public ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  service  = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  members  = ["allUsers"]
}
```
Using google_cloud_run_service_iam_binding is authoritative and will overwrite any existing IAM policy on the service. It is generally safer to use google_cloud_run_v2_service_iam_member (non-authoritative) to manage access. Additionally, per repository rules, public access to Cloud Run services must be controlled by the make_dc_web_service_public boolean variable.
```hcl
resource "google_cloud_run_v2_service_iam_member" "public_invoker" {
  count    = var.make_dc_web_service_public ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  name     = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}
```
References
- Public access to Cloud Run services should be controlled by a boolean variable (e.g., make_dc_web_service_public) that defaults to false to avoid unintentional public exposure.
```hcl
},
{
  name  = "FORCE_RESTART"
  value = "${timestamp()}"
```
Using timestamp() in an environment variable causes the resource to be updated on every terraform apply, even if no other changes are present. This leads to unnecessary redeployments of the Cloud Run services. Consider using a build ID or a specific version variable to trigger redeployments only when the image actually changes.
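One way to redeploy only when the image actually changes is to key the environment variable to an image tag or digest input rather than the current time. A sketch, assuming a hypothetical var.image_tag input (not part of the current module interface):

```hcl
# Hypothetical input: bumped by CI whenever a new image is published.
variable "image_tag" {
  description = "Image tag or digest; changing it triggers a redeploy."
  type        = string
}

# The env value now changes only when the image reference changes, so
# a plain `terraform apply` with no other changes is a no-op.
env {
  name  = "FORCE_RESTART"
  value = var.image_tag
}
```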
gmechali
left a comment
Left a couple of comments - thanks for the modularization, this is such an improvement!
```
@@ -0,0 +1,283 @@
data "google_compute_network" "default" {
```
I'm trying to wrap my head around what this stack module represents, and I think what I'm seeing is that the stack is the control for how we recommend setting things up, based on the CDC or DCP setup.
I'm trying to think through whether there's a cleaner way to put the modules together, but I don't have an answer. Would love to chat a bit more about this specific part!
The intent here is to tie everything from the other modules together while keeping the top-level main.tf as small as possible.
My end goal is to enable customers to deploy the entire stack by downloading a minimal main.tf file that references these remote modules.
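That minimal customer-facing main.tf could look roughly like the sketch below. The module source path and input names are illustrative assumptions, not a published interface:

```hcl
module "dcp_stack" {
  # Hypothetical remote source; the real path depends on where the
  # modules end up being published.
  source = "github.com/datacommonsorg/<repo>//infra/dcp/modules/stack"

  # Illustrative inputs only.
  project_id                 = "my-gcp-project"
  make_dc_web_service_public = false
}
```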
Refactored the infra/dcp terraform folders by flattening the directory structure and breaking down the cdc and dcp modules into a set of smaller sub-modules.

Changes

infra/dcp/modules/:
- stack: Orchestrates sub-modules based on feature toggles (modules/stack).
- cdc_data_ingestion_job: Ingestion Cloud Run v2 Job.
- cdc_iam: IAM and Secret Manager config for CDC.
- cdc_mysql: Cloud SQL MySQL instance and databases.
- cdc_network: VPC and serverless access connectors.
- cdc_redis: Memorystore Redis instance.
- cdc_services: CDC Cloud Run v2 web services (modules/cdc_services).
- cdc_storage: GCS buckets for I/O.
- dcp_dataflow_ingestion: Dataflow and helper service for DCP.
- dcp_ingestion_workflow: Cloud Workflows for orchestration.
- dcp_service: DCP Cloud Run service.
- dcp_storage: GCS bucket for DCP pipeline.
- spanner: Shared Cloud Spanner instance and databases.

Tested in the datcom-website-dev project under the danmod namespace
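The feature-toggle orchestration in the stack module can be sketched with conditional module instantiation (variable and module names here are illustrative, not the actual interface):

```hcl
# Hypothetical toggle; one such flag per optional sub-stack.
variable "enable_cdc" {
  type    = bool
  default = false
}

# Terraform 0.13+ supports count on module blocks, so an optional
# sub-stack is instantiated only when its toggle is on.
module "cdc_redis" {
  source = "./cdc_redis"
  count  = var.enable_cdc ? 1 : 0
}
```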