diff --git a/toc/rfc/rfc-draft-buildpacks-s3-bucket-namespacing.md b/toc/rfc/rfc-draft-buildpacks-s3-bucket-namespacing.md
new file mode 100644
index 000000000..2b8727182
--- /dev/null
+++ b/toc/rfc/rfc-draft-buildpacks-s3-bucket-namespacing.md
@@ -0,0 +1,273 @@
# Meta
[meta]: #meta
- Name: Buildpacks S3 Bucket Namespacing Strategy
- Start Date: 2026-01-09
- Author(s): @ramonskie
- Status: Draft
- RFC Pull Request: (fill in with PR link after you submit it)


## Summary

Address the pollution and lack of proper namespacing in the `buildpacks.cloudfoundry.org` S3 bucket by implementing a structured namespacing strategy for BOSH release blobs. This RFC proposes two options: (1) implement proper namespacing within the existing shared bucket using BOSH's `folder_name` configuration, or (2) migrate to dedicated per-buildpack S3 buckets.

## Problem

The `buildpacks.cloudfoundry.org` S3 bucket currently suffers from significant organizational issues that impact maintainability, auditing, and operational efficiency:

### Current State

- **Total Files:** 34,543 files, 2.32 TB
- **UUID Files in Root:** 4,351 files (12.6% of total) with UUID naming (e.g., `29a57fe6-b667-4261-6725-124846b7bb47`)
- **No Human-Readable Names:** Blob files are stored with UUID-only names in the S3 bucket root
- **Poor Discoverability:** Impossible to identify file contents without cross-referencing BOSH release `config/blobs.yml` files
- **Orphaned Blobs:** 770 identified orphaned blobs (~91% from July 2024 CDN migration) that are not tracked in any current git repository

### Root Cause

BOSH CLI's blob storage mechanism uses content-addressable storage with UUIDs as S3 object keys:

```yaml
# config/final.yml in buildpack BOSH releases
blobstore:
  provider: s3
  options:
    bucket_name: buildpacks.cloudfoundry.org
```

When `bosh upload-blobs` runs, BOSH:
1. Generates a UUID for each blob as the S3 object key
2. Uploads the blob to S3 using only the UUID as the filename
3. Tracks the UUID-to-filename mapping in `config/blobs.yml`:

```yaml
# config/blobs.yml
java-buildpack/java-buildpack-v4.77.0.zip:
  size: 253254
  object_id: 29a57fe6-b667-4261-6725-124846b7bb47
  sha: abc123...
```

**Result:** Human-readable names exist only in git repositories; S3 contains only UUIDs.

### Impact

1. **Operational Difficulty:** Bucket browsing requires downloading and inspecting files or cross-referencing multiple git repositories (see the sketch at the end of this section)
2. **Orphan Detection:** No automated way to identify unused blobs when `blobs.yml` diverges from S3
3. **Audit Challenges:** Cannot identify file types, owners, or purposes without external mappings
4. **Cost Inefficiency:** Potentially storing obsolete blobs (estimated 30-40% orphan rate in some analyses)
5. **Multi-Team Collision:** Multiple buildpack teams share the same flat namespace, increasing collision risk and confusion

### Examples of Affected Repositories

The investigation identified that UUID blobs are created by buildpack BOSH release repositories:

**Buildpack BOSH Releases (13 repositories):**
- ruby-buildpack-release
- java-buildpack-release
- python-buildpack-release
- nodejs-buildpack-release
- go-buildpack-release
- php-buildpack-release
- dotnet-core-buildpack-release
- staticfile-buildpack-release
- binary-buildpack-release
- nginx-buildpack-release
- r-buildpack-release
- hwc-buildpack-release
- java-offline-buildpack-release

**Note:** This RFC focuses on buildpack BOSH releases only. Infrastructure BOSH releases (diego, capi, routing, garden-runc) that may also use this bucket are out of scope for this proposal.

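
To make the discoverability and cross-referencing burden concrete, the sketch below shows what is currently required to answer "which release owns this UUID?". It is illustrative only: the repository list is a subset of the 13 releases above, local checkouts are assumed, and the grep pattern relies on the `config/blobs.yml` layout shown under Root Cause.

```bash
# Illustrative only: resolving one UUID object key back to a human-readable
# blob name requires searching config/blobs.yml in every release checkout.
UUID="29a57fe6-b667-4261-6725-124846b7bb47"

# Subset of the 13 buildpack BOSH release repositories, assumed cloned locally
for repo in ruby-buildpack-release java-buildpack-release nodejs-buildpack-release; do
  if grep -q "object_id: ${UUID}" "${repo}/config/blobs.yml" 2>/dev/null; then
    echo "UUID ${UUID} is tracked by ${repo}:"
    # Print the surrounding entry, including the human-readable blob name
    grep -B 2 "object_id: ${UUID}" "${repo}/config/blobs.yml"
  fi
done
```

Nothing in the bucket itself answers the question; every lookup depends on having every release repository checked out and up to date.
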
## Proposal

Implement proper namespacing for BOSH blobs in the buildpacks S3 bucket to improve organization, discoverability, and maintainability. Two options are proposed:

### Option 1: Folder-Based Namespacing

Implement BOSH's built-in folder namespacing feature by adding `folder_name` configuration to each buildpack BOSH release.

#### Implementation

**Step 1: Update BOSH Release Configuration**

Modify `config/final.yml` in each buildpack BOSH release:

```yaml
# config/final.yml
---
blobstore:
  provider: s3
  options:
    bucket_name: buildpacks.cloudfoundry.org
    folder_name: ruby-buildpack # Add this line with buildpack-specific name
```

**Naming Convention for `folder_name`:**
- Use the BOSH release repository name without `-release` suffix
- Examples: `ruby-buildpack`, `java-buildpack`, `nodejs-buildpack`

**Step 2: Migrate Existing Blobs**

For each buildpack BOSH release:

1. Clone the release repository and extract current blob UUIDs from `config/blobs.yml`
2. Copy each blob to its new namespaced location (a batch version of this copy is sketched after the Pros and Cons below):
   ```bash
   # Example for ruby-buildpack
   aws s3 cp \
     s3://buildpacks.cloudfoundry.org/29a57fe6-b667-4261-6725-124846b7bb47 \
     s3://buildpacks.cloudfoundry.org/ruby-buildpack/29a57fe6-b667-4261-6725-124846b7bb47
   ```
3. Keep original files in root temporarily for rollback capability
4. After successful verification (30-day grace period), archive or delete root-level blobs

**Step 3: Move Orphaned Blobs**

Blobs that remain in the bucket root after Step 2 (i.e., blobs not referenced by any release's `config/blobs.yml`) would be moved to a folder named `orphaned` with a 3-month retention period. If no consumer reports a problem within those 3 months, it is safe to assume the blobs are unused and delete them. The retention could be enforced with an S3 lifecycle rule (see the sketch in the Cost Analysis section).

**Step 4: Update CI/CD Pipelines**

Update `buildpacks-ci` task scripts:
- Modify `tasks/cf-release/create-buildpack-dev-release/run` to use updated `config/final.yml`
- Ensure `bosh upload-blobs` respects new folder configuration
- Update any direct S3 access scripts to use new paths

**Step 5: Verification & Rollback**

- Test blob access with updated configuration in staging/dev environments
- Monitor BOSH release builds for 30 days
- Keep original root-level blobs for rollback during grace period
- After successful verification, move root-level blobs to archive or delete

#### Pros

- ✅ **Native BOSH Support:** Uses built-in BOSH functionality, no custom tooling required
- ✅ **Minimal Infrastructure Change:** Same bucket, same permissions, same CDN
- ✅ **Clear Ownership:** Each folder represents one buildpack team's blobs
- ✅ **Backward Compatible:** Existing root-level blobs remain accessible during migration
- ✅ **Cost Effective:** No new infrastructure costs
- ✅ **Preserves BOSH Design:** Maintains content-addressable storage benefits (deduplication, immutability)
- ✅ **Easy Browsing:** S3 console shows organized folder structure
- ✅ **Orphan Detection:** Easier to identify unused blobs per buildpack

#### Cons

- ❌ **Shared Bucket Limitations:** Still requires coordination between teams for bucket policies
- ❌ **Breaking Change Potential:** May impact consumers if not carefully coordinated
- ❌ **Migration Effort:** Requires updating 13+ buildpack releases
- ❌ **Blob Duplication:** Existing blobs must be copied (not moved) during grace period, temporarily doubling storage
- ❌ **Multi-Repo Coordination:** Changes must be synchronized across multiple repositories
- ❌ **No Per-Team Access Control:** Cannot implement IAM policies for individual buildpack teams within shared bucket

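
The per-blob copy from Step 2 can be scripted per release. Below is a minimal sketch, not a tested implementation: it assumes a local checkout of one release repository, AWS credentials with write access to the bucket, and a `FOLDER` value that matches the `folder_name` added in Step 1.

```bash
#!/usr/bin/env bash
# Hypothetical Step 2 helper: copy every blob tracked in this release's
# config/blobs.yml from the bucket root into its namespaced folder.
# Run from the root of a buildpack BOSH release checkout.
set -euo pipefail

BUCKET="buildpacks.cloudfoundry.org"
FOLDER="ruby-buildpack"   # must match folder_name in config/final.yml

# config/blobs.yml contains one "object_id: <uuid>" line per tracked blob
grep 'object_id:' config/blobs.yml | awk '{print $2}' | while read -r uuid; do
  echo "Copying ${uuid} -> ${FOLDER}/${uuid}"
  # Server-side copy; root-level originals stay in place for the grace period
  aws s3 cp "s3://${BUCKET}/${uuid}" "s3://${BUCKET}/${FOLDER}/${uuid}"
done
```

Once every release has been migrated this way, any root-level UUID not referenced by any `config/blobs.yml` is an orphan candidate for Step 3.
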
#### Additional Consideration: Release Candidates Bucket

The `buildpack-release-candidates/` directory (1,099 files) could also be separated into its own dedicated bucket. This directory contains pre-release buildpack versions organized by buildpack type:

```
buildpack-release-candidates/
├── apt/
├── binary/
├── dotnet-core/
├── go/
├── java/
├── nodejs/
├── php/
├── python/
├── ruby/
└── staticfile/
```

**Creating a separate bucket for release candidates would provide:**
- ✅ **Clear Lifecycle Separation:** Development artifacts isolated from production blobs
- ✅ **Independent Retention Policies:** Apply aggressive cleanup rules (e.g., 180-day expiration) without affecting production
- ✅ **Reduced Production Bucket Clutter:** Keep production bucket focused on finalized BOSH blobs
- ✅ **Simplified Access Control:** CI/CD systems can have different permissions for release candidates vs. production blobs

**Example bucket structure:**
```
buildpacks.cloudfoundry.org             # Production BOSH blobs only
buildpacks-candidates.cloudfoundry.org  # Pre-release buildpack packages
```

**Note:** This separation is **compatible with both Option 1 and Option 2**:
- **With Option 1:** Release candidates bucket + shared production bucket with folder namespacing
- **With Option 2:** Release candidates bucket + per-buildpack production buckets

The release candidates bucket would **not** require BOSH configuration changes, as these files are uploaded directly by CI/CD pipelines (`buildpacks-ci`), not via `bosh upload-blobs`.

## Comparison

| Aspect | Option 1: Folder Namespacing | Option 2: Dedicated Buckets |
|--------|------------------------------|----------------------------|
| **Infrastructure Changes** | Minimal (config only) | Major (13 buckets + DNS) |
| **Migration Complexity** | Medium | High |
| **Operational Overhead** | Low | High |
| **Team Isolation** | Logical (folders) | Physical (buckets) |
| **Access Control Granularity** | Bucket-level | Bucket-level per team |
| **Cost Impact** | ~$5-10/month temporary (migration grace period) | ~+$20/month ongoing (request & monitoring overhead) |
| **Rollback Ease** | Easy (keep old files) | Difficult (multiple resources) |
| **BOSH Native Support** | ✅ Yes (`folder_name`) | ✅ Yes (`bucket_name`) |
| **Orphan Detection** | Easier (per folder) | Easiest (per bucket) |
| **Breaking Changes Risk** | Low | Medium-High |

## Cost Analysis

### Current State (Shared Bucket)
- **Storage:** ~2.32 TB = ~$53/month (at $0.023/GB S3 Standard)
- **Data Transfer OUT:** ~500 GB/month = ~$45/month (at $0.09/GB)
- **S3 Requests:** ~$0.50/month
- **Total:** ~$98.50/month

### Option 1: Folder Namespacing

**Migration Period (30 days):**
- Temporary blob duplication during grace period
- Additional storage: ~$5-10/month for 30 days
- **Migration cost:** ~$5-10 (one-time)

**Steady State:**
- Same storage structure (folders within 1 bucket)
- Same request costs
- Cleanup of orphaned blobs (see the lifecycle sketch below): **-$6/month savings**
- **Total: ~$92/month** (~6% reduction)

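
The orphan cleanup assumed in the steady-state figures above could be enforced automatically rather than manually. Below is a minimal sketch of the 3-month retention from Step 3, assuming AWS CLI access to the bucket; the rule ID and file path are examples, not established names.

```bash
# Hypothetical lifecycle rule for Step 3: expire objects under the
# "orphaned/" prefix after roughly 3 months (90 days).
cat > /tmp/orphaned-lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "expire-orphaned-blobs",
      "Filter": { "Prefix": "orphaned/" },
      "Status": "Enabled",
      "Expiration": { "Days": 90 }
    }
  ]
}
EOF

# Caution: this call replaces the bucket's entire lifecycle configuration,
# so any existing rules must be merged into the same JSON document.
aws s3api put-bucket-lifecycle-configuration \
  --bucket buildpacks.cloudfoundry.org \
  --lifecycle-configuration file:///tmp/orphaned-lifecycle.json
```

A similar rule (e.g., the 180-day expiration mentioned above) could cover a dedicated release-candidates bucket if that separation is adopted.
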
### Option 2: Dedicated Buckets Per Buildpack

**Infrastructure:**
- 13 buildpack buckets
- 1 optional release candidates bucket
- **Total: 14 S3 buckets**

**Monthly Costs:**
- **Storage:** ~$53/month (same, just distributed)
- **Data Transfer OUT:** ~$45/month (same)
- **Additional S3 Request Overhead:** 14 buckets × $0.50 = **+$7/month**
- **CloudWatch Monitoring:** 14 buckets × $0.30 = **+$4.20/month**
- **Additional DNS/Certificate Management:** ~$1/month
- **Total: ~$110/month** (~12% increase)

**Cost Comparison:**
- **Option 1:** $92/month (after cleanup)
- **Option 2:** $110/month
- **Difference:** +$18/month (~+$216/year) for Option 2

## Additional Information

- **S3 Bucket Investigation Document:** [`buildpacks-ci/S3_BUCKET_INVESTIGATION.md`](https://github.com/cloudfoundry/buildpacks-ci/blob/cf-release/S3_BUCKET_INVESTIGATION.md)
- **UUID Mapper Tool:** [`tools/uuid-mapper`](https://github.com/cloudfoundry/buildpacks-ci/tree/cf-release/tools/uuid-mapper) in the `buildpacks-ci` repository; maps orphaned UUIDs to BOSH release repositories
- **July 2024 Bucket Migration Context:** [buildpacks-ci commit](https://github.com/cloudfoundry/buildpacks-ci/commit/XXXXXXX) - "Switch to using buildpacks.cloudfoundry.org bucket" for CFF CDN takeover
- **BOSH `folder_name` Documentation:** [BOSH Blobstore Docs](https://bosh.io/docs/release-blobs/)
- **Related RFC:** [RFC-0011: Move Buildpack Dependencies Repository to CFF](rfc-0011-move-buildpacks-dependencies-to-cff.md)

## Open Questions

1. **Who owns migration execution?** Should this be Application Runtime Interfaces WG or joint ownership with other teams?
2. **Budget approval:** Does the CFF budget accommodate temporary storage cost increase during migration grace period (~$5-10/month for 30 days)?
3. **Access control requirements:** Are there any per-team IAM access control requirements that would necessitate Option 2?
4. **Release candidates bucket separation:** Should the `buildpack-release-candidates/` directory (1,099 files) be moved to a dedicated `buildpacks-candidates.cloudfoundry.org` bucket to enable independent lifecycle management and cleanup policies?