Broke infra/dcp terraform script up into smaller modules #36

Merged
dwnoble merged 5 commits into datacommonsorg:main from dwnoble:terraform-modularize
Apr 30, 2026

Conversation

Contributor

@dwnoble dwnoble commented Apr 27, 2026

Refactored the infra/dcp terraform folders by flattening the directory structure and breaking down the cdc and dcp modules into a set of smaller sub-modules.

Changes

  • Flattening: Moved all sub-modules directly under infra/dcp/modules/.
  • stack: Orchestrates sub-modules based on feature toggles (modules/stack); see the sketch after this list.
  • cdc_data_ingestion_job: Ingestion Cloud Run v2 Job.
  • cdc_iam: IAM and Secret Manager config for CDC.
  • cdc_mysql: Cloud SQL MySQL instance and databases.
  • cdc_network: VPC and serverless access connectors.
  • cdc_redis: Memorystore Redis instance.
  • cdc_services: CDC Cloud Run v2 web services (modules/cdc_services).
  • cdc_storage: GCS buckets for I/O.
  • dcp_dataflow_ingestion: Dataflow and helper service for DCP.
  • dcp_ingestion_workflow: Cloud Workflows for orchestration.
  • dcp_service: DCP Cloud Run service.
  • dcp_storage: GCS bucket for DCP pipeline.
  • spanner: Shared Cloud Spanner instance and databases.
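
As a rough illustration of the feature-toggle pattern in modules/stack, a sub-module can be gated with count on the module block. This is a hedged sketch: enable_cdc is a hypothetical variable name, not necessarily the actual toggle this PR defines.

variable "enable_cdc" {
  description = "Deploy the Custom Data Commons (CDC) sub-modules."
  type        = bool
  default     = false
}

module "cdc_redis" {
  source = "../cdc_redis"
  count  = var.enable_cdc ? 1 : 0
  # Pass through whatever inputs the real cdc_redis module requires.
}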

Tested in the datcom-website-dev project under the danmod namespace.

@dwnoble dwnoble requested a review from gmechali April 27, 2026 04:47
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the Data Commons Platform infrastructure into a modular architecture, separating the DCP and Custom Data Commons (CDC) stacks and introducing a central orchestration module. Feedback focuses on improving infrastructure stability and security, specifically recommending against the use of 'force_destroy' on data buckets, the ':latest' container tag, and authoritative IAM bindings. Additionally, the use of 'timestamp()' in environment variables was identified as a cause for unnecessary resource redeployments.

Comment thread infra/dcp/modules/cdc_storage/main.tf Outdated
resource "google_storage_bucket" "data_bucket" {
name = local.gcs_data_bucket_name
location = var.gcs_data_bucket_location
force_destroy = true
Contributor

security-high

Setting force_destroy = true on a data storage bucket is risky as it allows Terraform to delete the bucket even if it contains data. It is safer to set this to false or tie it to a deletion protection toggle to prevent accidental data loss.

  force_destroy = false
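
A minimal sketch of the toggle approach, assuming a hypothetical enable_deletion_protection variable (not an existing input in this repo):

variable "enable_deletion_protection" {
  description = "When true, Terraform refuses to destroy a non-empty data bucket."
  type        = bool
  default     = true
}

resource "google_storage_bucket" "data_bucket" {
  name     = local.gcs_data_bucket_name
  location = var.gcs_data_bucket_location
  # Deletion is forced only when protection is explicitly disabled.
  force_destroy = !var.enable_deletion_protection
}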


template {
  containers {
    image = "gcr.io/datcom-ci/datacommons-ingestion-helper:latest"
Contributor

high

Using the :latest tag for container images can lead to non-deterministic deployments and makes it difficult to track which version of the code is currently running. It is recommended to use a specific version tag or an image digest for production-ready infrastructure.

      image = "gcr.io/datcom-ci/datacommons-ingestion-helper:stable"
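
Alternatively, the image reference can be made an input variable pinned by digest; ingestion_helper_image is a hypothetical variable name and the digest shown is a placeholder:

variable "ingestion_helper_image" {
  description = "Ingestion helper image, pinned by a version tag or digest."
  type        = string
  # Placeholder digest; substitute the digest of a tested build.
  default     = "gcr.io/datcom-ci/datacommons-ingestion-helper@sha256:<digest-of-tested-build>"
}

# Then, in the container block:
#   image = var.ingestion_helper_image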

Comment on lines +67 to 73
resource "google_cloud_run_service_iam_binding" "public_invoker" {
count = var.make_service_public ? 1 : 0
location = google_cloud_run_v2_service.dcp_service.location
service = google_cloud_run_v2_service.dcp_service.name
role = "roles/run.invoker"
members = ["allUsers"]
}
Contributor

security-medium

Using google_cloud_run_service_iam_binding is authoritative for the given role and will overwrite any existing members of roles/run.invoker on the service. It is generally safer to use google_cloud_run_v2_service_iam_member (non-authoritative) to manage access. Additionally, per repository rules, public access to Cloud Run services must be controlled by the make_dc_web_service_public boolean variable.

resource "google_cloud_run_v2_service_iam_member" "public_invoker" {
  count    = var.make_dc_web_service_public ? 1 : 0
  location = google_cloud_run_v2_service.dcp_service.location
  name     = google_cloud_run_v2_service.dcp_service.name
  role     = "roles/run.invoker"
  member   = "allUsers"
}
References
  1. Public access to Cloud Run services should be controlled by a boolean variable (e.g., make_dc_web_service_public) that defaults to false to avoid unintentional public exposure.

},
{
  name  = "FORCE_RESTART"
  value = "${timestamp()}"
Contributor

medium

Using timestamp() in an environment variable causes the resource to be updated on every terraform apply, even if no other changes are present. This leads to unnecessary redeployments of the Cloud Run services. Consider using a build ID or a specific version variable to trigger redeployments only when the image actually changes.
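
A hedged sketch of that alternative, assuming a hypothetical image_version input that is bumped per release or build:

variable "image_version" {
  description = "Release identifier; changing it triggers a redeploy."
  type        = string
}

# env entry, replacing the timestamp()-based value:
{
  name  = "FORCE_RESTART"
  value = var.image_version
}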

Contributor

@gmechali gmechali left a comment

Left a couple comments - thanks for the modularization, this is such an improvement!

Comment thread infra/dcp/modules/cdc_redis/main.tf Outdated
Comment thread infra/dcp/modules/dcp_ingestion_dataflow/main.tf
Comment thread infra/dcp/modules/dcp_dataflow_ingestion/main.tf Outdated
Comment thread infra/dcp/modules/dcp_dataflow_ingestion/main.tf Outdated
Comment thread infra/dcp/modules/dcp_ingestion_workflow/outputs.tf
Comment thread infra/dcp/modules/dcp_service/main.tf
Comment thread infra/dcp/modules/dcp_storage/main.tf Outdated
@@ -0,0 +1,283 @@
data "google_compute_network" "default" {
Contributor

I'm trying to wrap my head around what this stack module represents. I think what I'm seeing is that the stack is the control for how we recommend setting things up, depending on whether it's a CDC or DCP setup.

I'm trying to think through whether there's a cleaner way to put the modules together, but I don't have an answer. Would love to chat a bit more about this specific part!

Contributor Author

The intent here is to tie everything from the other modules together while keeping the top-level main.tf as small as possible.

My end goal is to enable customers to deploy the entire stack by downloading a minimal main.tf file that references these remote modules.
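
For example, a hypothetical customer-facing main.tf might look like the following; the module source path and inputs are illustrative, not a published interface:

module "dc_stack" {
  # Illustrative remote source; the real registry or Git path may differ.
  source = "github.com/datacommonsorg/<repo>//infra/dcp/modules/stack"

  project_id = "my-gcp-project"
  enable_cdc = true
}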

@dwnoble dwnoble requested a review from gmechali April 30, 2026 19:03
Contributor

@gmechali gmechali left a comment

Thanks Dan!

@dwnoble dwnoble added this pull request to the merge queue Apr 30, 2026
Merged via the queue into datacommonsorg:main with commit 7465bf5 Apr 30, 2026
3 checks passed