3 changes: 2 additions & 1 deletion infra/dcp/main.tf
@@ -2,7 +2,7 @@ terraform {
required_providers {
google = {
source = "hashicorp/google"
version = ">= 5.0"
version = ">= 5.11.0"
}
null = {
source = "hashicorp/null"
@@ -137,6 +137,7 @@ module "cdc" {
use_spanner = var.enable_dcp
spanner_instance_id = var.enable_dcp ? module.dcp[0].spanner_instance_id : ""
spanner_database_id = var.enable_dcp ? module.dcp[0].spanner_database_id : ""
workflow_name = var.enable_dcp && var.dcp_deploy_data_ingestion_workflow ? module.dcp[0].ingestion_orchestrator_name : ""
deletion_protection = var.deletion_protection

depends_on = [google_project_service.apis]
20 changes: 20 additions & 0 deletions infra/dcp/modules/cdc/main.tf
@@ -183,6 +183,26 @@ resource "google_cloud_run_v2_job" "dc_data_job" {
name = "INPUT_DIR"
value = "gs://${google_storage_bucket.data_bucket.name}/${var.gcs_data_bucket_input_folder}"
}
env {
name = "WORKFLOW_NAME"
value = var.workflow_name
}
env {
name = "PROJECT_ID"
value = var.project_id
}
env {
name = "WORKFLOW_LOCATION"
value = var.region
}
env {
name = "TEMP_LOCATION"
value = "gs://${google_storage_bucket.data_bucket.name}/temp"
}
env {
name = "REGION"
value = var.region
}
}
vpc_access {
connector = google_vpc_access_connector.connector.id
17 changes: 16 additions & 1 deletion infra/dcp/modules/cdc/service_account.tf
@@ -14,7 +14,8 @@ resource "google_project_iam_member" "datacommons_service_account_roles" {
"roles/vpcaccess.user",
"roles/iam.serviceAccountUser",
"roles/secretmanager.secretAccessor",
"roles/spanner.databaseUser"
"roles/spanner.databaseUser",
"roles/workflows.invoker"
Comment on lines +17 to +18
Contributor
security-high

The logic for conditionally granting roles/spanner.databaseUser appears to be incorrect. The role is currently included in the static list and then concatenated again if var.use_spanner is false, meaning it is granted regardless of the setting. If Spanner access should be conditional, the role should be removed from the static list and the ternary logic should be corrected to var.use_spanner ? ["roles/spanner.databaseUser"] : [].

]), var.use_spanner ? [] : ["roles/spanner.databaseUser"])
project = var.project_id
member = "serviceAccount:${google_service_account.datacommons_service_account.email}"
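
A sketch of the fix the comment proposes, with the role removed from the static list and added back only when Spanner is enabled (the `for_each`/`toset` wiring is an assumption, since the full resource body is outside this hunk):

```hcl
resource "google_project_iam_member" "datacommons_service_account_roles" {
  # Static roles plus a conditional grant: spanner.databaseUser is
  # attached only when var.use_spanner is true.
  for_each = toset(concat([
    "roles/vpcaccess.user",
    "roles/iam.serviceAccountUser",
    "roles/secretmanager.secretAccessor",
  ], var.use_spanner ? ["roles/spanner.databaseUser"] : []))

  role    = each.value
  project = var.project_id
  member  = "serviceAccount:${google_service_account.datacommons_service_account.email}"
}
```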
@@ -46,3 +47,17 @@ resource "google_secret_manager_secret_version" "maps_api_key_version" {
secret = google_secret_manager_secret.maps_api_key[0].id
secret_data = local.maps_api_key
}

resource "google_storage_bucket_iam_member" "cdc_data_bucket_access" {
bucket = google_storage_bucket.data_bucket.name
role = "roles/storage.objectAdmin"
Contributor
medium

Using roles/storage.objectAdmin grants broad permissions, including the ability to manage object ACLs and IAM policies. Following the principle of least privilege, roles/storage.objectUser is generally preferred for ingestion workflows as it provides full control over objects (create, read, update, delete) without administrative overhead on the bucket or object permissions.

  role   = "roles/storage.objectUser"

member = "serviceAccount:${google_service_account.datacommons_service_account.email}"
}
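
Written out as a full resource, the least-privilege variant suggested above could look like this sketch (names taken from the diff):

```hcl
resource "google_storage_bucket_iam_member" "cdc_data_bucket_access" {
  bucket = google_storage_bucket.data_bucket.name
  # objectUser allows create/read/update/delete on objects without
  # granting administration of object ACLs or bucket IAM policy.
  role   = "roles/storage.objectUser"
  member = "serviceAccount:${google_service_account.datacommons_service_account.email}"
}
```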

resource "google_storage_bucket_iam_member" "dataflow_bucket_access" {
bucket = google_storage_bucket.data_bucket.name
role = "roles/storage.objectAdmin"
member = "serviceAccount:${local.name_prefix}dcp-ingestion-sa@${var.project_id}.iam.gserviceaccount.com"
Contributor
medium

Constructing the service account email as a string literal creates a tight coupling between the cdc and dcp modules. This approach is brittle and will break if the naming convention or resource ID in the dcp module changes. It is better to define an input variable for this service account email in the cdc module and pass the value from the dcp module's output in the root main.tf.

}
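
One way to apply the decoupling the comment suggests (the variable name `dcp_ingestion_sa_email` and the `ingestion_sa_email` output are hypothetical):

```hcl
# In infra/dcp/modules/cdc/variables.tf (hypothetical variable name):
variable "dcp_ingestion_sa_email" {
  description = "Email of the DCP ingestion service account"
  type        = string
  default     = ""
}

# The binding then consumes the variable instead of a hand-built string:
resource "google_storage_bucket_iam_member" "dataflow_bucket_access" {
  bucket = google_storage_bucket.data_bucket.name
  role   = "roles/storage.objectAdmin"
  member = "serviceAccount:${var.dcp_ingestion_sa_email}"
}
```

The root main.tf would then pass the value through from the dcp module, e.g. `dcp_ingestion_sa_email = var.enable_dcp ? module.dcp[0].ingestion_sa_email : ""`, so a rename in the dcp module surfaces as a plan-time error rather than a silently broken IAM binding.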


2 changes: 1 addition & 1 deletion infra/dcp/modules/dcp/ingestion_helper.tf
@@ -25,7 +25,7 @@ resource "google_cloud_run_v2_service" "ingestion_helper" {
}
env {
name = "SPANNER_INSTANCE_ID"
value = var.spanner_instance_id
value = var.create_spanner_instance ? google_spanner_instance.main[0].name : var.spanner_instance_id
}
env {
name = "SPANNER_DATABASE_ID"
6 changes: 6 additions & 0 deletions infra/dcp/modules/dcp/outputs.tf
@@ -22,3 +22,9 @@ output "data_ingestion_bucket_url" {
description = "GCS path to the dynamically provisioned bucket for customer custom MCF datasets"
value = var.deploy_data_ingestion_workflow && var.create_ingestion_bucket ? google_storage_bucket.data_ingestion_bucket[0].url : null
}

output "ingestion_orchestrator_name" {
description = "Short name of the Cloud Workflows ingestion orchestrator"
value = var.deploy_data_ingestion_workflow ? google_workflows_workflow.ingestion_orchestrator[0].name : null
}

1 change: 1 addition & 0 deletions infra/dcp/modules/dcp/spanner.tf
@@ -5,6 +5,7 @@ resource "google_spanner_instance" "main" {
display_name = var.create_spanner_instance ? (var.spanner_instance_id != "" ? "${local.name_prefix}${var.spanner_instance_id}" : "${local.name_prefix}dcp-instance") : var.spanner_instance_id
processing_units = var.spanner_processing_units
force_destroy = !var.deletion_protection
edition = "ENTERPRISE"

}

2 changes: 1 addition & 1 deletion infra/dcp/variables.tf
@@ -93,7 +93,7 @@ variable "dcp_spanner_database_id" {
variable "dcp_spanner_processing_units" {
description = "Spanner units for DCP"
type = number
default = 100
default = 1000
Contributor
medium

Increasing the default dcp_spanner_processing_units from 100 to 1000 (equivalent to one full Spanner node) significantly increases the baseline cost of every deployment. If this capacity is only required for specific high-load ingestion scenarios, consider keeping a lower default and overriding it via environment-specific variables where needed.

}
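
An environment-specific override along the lines the comment suggests could look like this sketch (the tfvars file name is illustrative):

```hcl
# environments/prod.tfvars (illustrative file name)
# Only the high-load environment pays for a full Spanner node;
# all other environments fall back to the lower default in variables.tf.
dcp_spanner_processing_units = 1000
```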

variable "dcp_service_cpu" {