toc/rfc/rfc-draft-buildpacks-s3-bucket-namespacing.md
# Meta
[meta]: #meta
- Name: Buildpacks S3 Bucket Namespacing Strategy
- Start Date: 2026-01-09
- Author(s): @ramonskie
- Status: Draft
- RFC Pull Request: (fill in with PR link after you submit it)


## Summary

Address the pollution and lack of proper namespacing in the `buildpacks.cloudfoundry.org` S3 bucket by implementing a structured namespacing strategy for BOSH release blobs. This RFC proposes two options: (1) implement proper namespacing within the existing shared bucket using BOSH's `folder_name` configuration, or (2) migrate to dedicated per-buildpack S3 buckets.

## Problem

The `buildpacks.cloudfoundry.org` S3 bucket currently suffers from significant organizational issues that impact maintainability, auditing, and operational efficiency:

### Current State

- **Total Files:** 34,543 files, 2.32 TB
- **UUID Files in Root:** 4,351 files (12.6% of total) with UUID naming (e.g., `29a57fe6-b667-4261-6725-124846b7bb47`)
- **No Human-Readable Names:** Blob files are stored with UUID-only names in the S3 bucket root
- **Poor Discoverability:** Impossible to identify file contents without cross-referencing BOSH release `config/blobs.yml` files
- **Orphaned Blobs:** 770 identified orphaned blobs (~91% from July 2024 CDN migration) that are not tracked in any current git repository

### Root Cause

The BOSH CLI's blob storage mechanism uses randomly generated UUIDs as S3 object keys:

```yaml
# config/final.yml in buildpack BOSH releases
blobstore:
  provider: s3
  options:
    bucket_name: buildpacks.cloudfoundry.org
```

When `bosh upload-blobs` runs, BOSH:
1. Generates a UUID for each blob as the S3 object key
2. Uploads the blob to S3 using only the UUID as the filename
3. Tracks the UUID-to-filename mapping in `config/blobs.yml`:

```yaml
# config/blobs.yml
java-buildpack/java-buildpack-v4.77.0.zip:
  size: 253254
  object_id: 29a57fe6-b667-4261-6725-124846b7bb47
  sha: abc123...
```

**Result:** Human-readable names exist only in git repositories; S3 contains only UUIDs.
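To illustrate the cross-referencing burden: the only way to identify a root-level UUID object today is to search every release repository's `config/blobs.yml`. A minimal sketch of that reverse lookup, assuming the buildpack release repositories are cloned locally (the `~/workspace` path is illustrative):

```bash
# Hypothetical reverse lookup: which release (and blobs.yml entry) owns a given UUID?
# Assumes the buildpack *-release repositories are cloned under ~/workspace.
UUID="29a57fe6-b667-4261-6725-124846b7bb47"
grep -l "object_id: ${UUID}" ~/workspace/*-release/config/blobs.yml
```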

### Impact

1. **Operational Difficulty:** Bucket browsing requires downloading and inspecting files or cross-referencing multiple git repositories
2. **Orphan Detection:** No automated way to identify unused blobs when `blobs.yml` diverges from S3
3. **Audit Challenges:** Cannot identify file types, owners, or purposes without external mappings
4. **Cost Inefficiency:** Potentially storing obsolete blobs (estimated 30-40% orphan rate in some analyses)
5. **Multi-Team Collision:** Multiple buildpack teams share the same flat namespace, increasing collision risk and confusion
> **Reviewer (Member):** The technical separation makes sense, but I don't think there is a multi-team collision. All buildpacks are handled by the same team / WG area "Buildpacks and Stacks": https://github.com/cloudfoundry/community/blob/main/toc/working-groups/app-runtime-interfaces.md?plain=1#L75
>
> **Author (@ramonskie):** Only creating a BOSH release from an existing tag would be an issue, and even that remains possible with a minimal change: when checking out a tag, the user only needs to add `folder_name: xxx` to `config/final.yml`.
>
> **Reviewer (Member):** Just adding more context: the final release tarballs are available on bosh.io (e.g. https://bosh.io/releases/github.com/cloudfoundry/java-buildpack-release?all=1), uploaded by this task, which is why they are not impacted. Building from source for an older tag would require the workaround @ramonskie mentioned above.

### Examples of Affected Repositories

The investigation identified that the UUID blobs are created by the following buildpack BOSH release repositories:

**Buildpack BOSH Releases (13 repositories):**
- ruby-buildpack-release
- java-buildpack-release
- python-buildpack-release
- nodejs-buildpack-release
- go-buildpack-release
- php-buildpack-release
- dotnet-core-buildpack-release
- staticfile-buildpack-release
- binary-buildpack-release
- nginx-buildpack-release
- r-buildpack-release
- hwc-buildpack-release
- java-offline-buildpack-release

**Note:** This RFC focuses on buildpack BOSH releases only. Infrastructure BOSH releases (diego, capi, routing, garden-runc) that may also use this bucket are out of scope for this proposal.

## Proposal

Implement proper namespacing for BOSH blobs in the buildpacks S3 bucket to improve organization, discoverability, and maintainability. Two options are proposed:

### Option 1: Folder-Based Namespacing

Implement BOSH's built-in folder namespacing feature by adding `folder_name` configuration to each buildpack BOSH release.

#### Implementation

**Step 1: Update BOSH Release Configuration**

Modify `config/final.yml` in each buildpack BOSH release:

```yaml
# config/final.yml
---
blobstore:
  provider: s3
  options:
    bucket_name: buildpacks.cloudfoundry.org
    folder_name: ruby-buildpack  # Add this line with a buildpack-specific name
```

**Naming Convention for `folder_name`:**
- Use the BOSH release repository name without `-release` suffix
- Examples: `ruby-buildpack`, `java-buildpack`, `nodejs-buildpack`
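To apply this change consistently across all 13 release repositories, the edit could be scripted rather than made by hand. A minimal sketch, assuming local checkouts and the Go-based `yq` v4 (the `~/workspace` path is illustrative):

```bash
# For each locally cloned buildpack BOSH release, derive the folder name from the
# repository name (strip the -release suffix) and write it into config/final.yml.
for repo in ~/workspace/*-buildpack-release; do
  name="$(basename "${repo}")"
  folder="${name%-release}"   # e.g. ruby-buildpack
  yq -i ".blobstore.options.folder_name = \"${folder}\"" "${repo}/config/final.yml"
done
```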

**Step 2: Migrate Existing Blobs**

For each buildpack BOSH release (a batch sketch follows this list):

1. Clone the release repository and extract current blob UUIDs from `config/blobs.yml`
2. Copy each blob to its new namespaced location:
   ```bash
   # Example for ruby-buildpack
   aws s3 cp \
     s3://buildpacks.cloudfoundry.org/29a57fe6-b667-4261-6725-124846b7bb47 \
     s3://buildpacks.cloudfoundry.org/ruby-buildpack/29a57fe6-b667-4261-6725-124846b7bb47
   ```
3. Keep original files in root temporarily for rollback capability
4. After successful verification (30-day grace period), archive or delete root-level blobs
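Steps 1 and 2 of this list could be scripted per release instead of copying blobs one by one. A minimal sketch, assuming the AWS CLI, `yq` v4, and a local checkout of the release (the bucket and folder values are illustrative and follow the naming convention above):

```bash
# Copy every blob tracked in config/blobs.yml into its namespaced location.
# Run from the root of a buildpack BOSH release checkout.
BUCKET="buildpacks.cloudfoundry.org"
FOLDER="ruby-buildpack"
for object_id in $(yq '.[].object_id' config/blobs.yml); do
  [ "${object_id}" = "null" ] && continue   # skip blobs that have not been uploaded yet
  aws s3 cp "s3://${BUCKET}/${object_id}" "s3://${BUCKET}/${FOLDER}/${object_id}"
done
```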
> **Reviewer (Member), on the buildpack release list above:** I would have two questions:
>
> 1. Are DNS updates needed for Option 1?
> 2. How will we make sure that the migrated blobs are used and not the ones in the root folder? We could use bucket policies, ACLs, etc., but we need to make sure that no root blobs are used.
>
> **Author (@ramonskie):**
>
> - I do not think DNS updates are necessary.
> - We would change the BOSH release's `config/final.yml`.

**Step 3: Move Orphaned Blobs**

The orphaned blobs that remain would be moved to a folder named `orphaned`, with a 3-month retention period. If no one reports a problem within those 3 months, it is safe to assume these blobs can be deleted.
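The 3-month retention could be enforced with an S3 lifecycle rule on the `orphaned/` prefix rather than a manual cleanup. A minimal sketch using the AWS CLI (the rule ID and the 90-day expiration are illustrative):

```bash
# Expire everything under the orphaned/ prefix roughly 3 months after it lands there.
aws s3api put-bucket-lifecycle-configuration \
  --bucket buildpacks.cloudfoundry.org \
  --lifecycle-configuration '{
    "Rules": [
      {
        "ID": "expire-orphaned-blobs",
        "Filter": { "Prefix": "orphaned/" },
        "Status": "Enabled",
        "Expiration": { "Days": 90 }
      }
    ]
  }'
```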

**Step 4: Update CI/CD Pipelines**

Update `buildpacks-ci` task scripts:
- Modify `tasks/cf-release/create-buildpack-dev-release/run` to use the updated `config/final.yml`
- Ensure `bosh upload-blobs` respects the new folder configuration (see the guard sketch below)
- Update any direct S3 access scripts to use new paths
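One way to make the pipeline fail fast if a release has not been updated yet is a small guard before the upload step. A minimal sketch, assuming `yq` v4 is available in the task image (this check is illustrative, not an existing `buildpacks-ci` script):

```bash
# Refuse to upload blobs unless config/final.yml declares a folder_name,
# so nothing new lands in the bucket root by accident.
if ! yq -e '.blobstore.options.folder_name' config/final.yml > /dev/null; then
  echo "config/final.yml has no blobstore.options.folder_name; refusing to upload blobs" >&2
  exit 1
fi
bosh upload-blobs
```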

**Step 5: Verification & Rollback**

- Test blob access with the updated configuration in staging/dev environments (see the verification sketch below)
- Monitor BOSH release builds for 30 days
- Keep original root-level blobs for rollback during grace period
- After successful verification, move root-level blobs to archive or delete
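Verification could include checking, per release, that every `object_id` in `config/blobs.yml` is reachable at its namespaced key. A minimal sketch, assuming the AWS CLI and `yq` v4 (the bucket and folder values are illustrative):

```bash
# Verify every blob tracked in config/blobs.yml exists under the new prefix.
BUCKET="buildpacks.cloudfoundry.org"
FOLDER="ruby-buildpack"
missing=0
for object_id in $(yq '.[].object_id' config/blobs.yml); do
  [ "${object_id}" = "null" ] && continue
  if ! aws s3api head-object --bucket "${BUCKET}" --key "${FOLDER}/${object_id}" > /dev/null 2>&1; then
    echo "missing: ${FOLDER}/${object_id}" >&2
    missing=1
  fi
done
exit "${missing}"
```

Alternatively, running `bosh sync-blobs` in a clean checkout of the updated release exercises the same paths through the BOSH CLI itself.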

#### Pros

- ✅ **Native BOSH Support:** Uses built-in BOSH functionality, no custom tooling required
- ✅ **Minimal Infrastructure Change:** Same bucket, same permissions, same CDN
- ✅ **Clear Ownership:** Each folder represents one buildpack team's blobs
- ✅ **Backward Compatible:** Existing root-level blobs remain accessible during migration
- ✅ **Cost Effective:** No new infrastructure costs
- ✅ **Preserves BOSH Design:** Keeps BOSH's native blobstore workflow and immutable, opaque object keys
- ✅ **Easy Browsing:** S3 console shows organized folder structure
- ✅ **Orphan Detection:** Easier to identify unused blobs per buildpack

#### Cons

- ❌ **Shared Bucket Limitations:** Still requires coordination between teams for bucket policies
- ❌ **Breaking Change Potential:** May impact consumers if not carefully coordinated
- ❌ **Migration Effort:** Requires updating 13+ buildpack releases
- ❌ **Blob Duplication:** Existing blobs must be copied (not moved) during grace period, temporarily doubling storage
- ❌ **Multi-Repo Coordination:** Changes must be synchronized across multiple repositories
- ❌ **No Per-Team Access Control:** Cannot implement IAM policies for individual buildpack teams within shared bucket


#### Additional Consideration: Release Candidates Bucket

The `buildpack-release-candidates/` directory (1,099 files) could also be separated into its own dedicated bucket under either option. This directory contains pre-release buildpack versions organized by buildpack type:

```
buildpack-release-candidates/
├── apt/
├── binary/
├── dotnet-core/
├── go/
├── java/
├── nodejs/
├── php/
├── python/
├── ruby/
└── staticfile/
```

**Creating a separate bucket for release candidates would provide:**
- ✅ **Clear Lifecycle Separation:** Development artifacts isolated from production blobs
- ✅ **Independent Retention Policies:** Apply aggressive cleanup rules (e.g., 180-day expiration) without affecting production
- ✅ **Reduced Production Bucket Clutter:** Keep production bucket focused on finalized BOSH blobs
- ✅ **Simplified Access Control:** CI/CD systems can have different permissions for release candidates vs. production blobs

**Example bucket structure:**
```
buildpacks.cloudfoundry.org # Production BOSH blobs only
buildpacks-candidates.cloudfoundry.org # Pre-release buildpack packages
```

**Note:** This separation is **compatible with both Option 1 and Option 2**:
- **With Option 1:** Release candidates bucket + shared production bucket with folder namespacing
- **With Option 2:** Release candidates bucket + per-buildpack production buckets

The release candidates bucket would **not** require BOSH configuration changes, as these files are uploaded directly by CI/CD pipelines (`buildpacks-ci`), not via `bosh upload-blobs`.

## Comparison

| Aspect | Option 1: Folder Namespacing | Option 2: Dedicated Buckets |
|--------|------------------------------|----------------------------|
| **Infrastructure Changes** | Minimal (config only) | Major (13 buckets + DNS) |
| **Migration Complexity** | Medium | High |
| **Operational Overhead** | Low | High |
| **Team Isolation** | Logical (folders) | Physical (buckets) |
| **Access Control Granularity** | Bucket-level | Bucket-level per team |
| **Cost Impact** | ~$5-10/month temporary (migration grace period) | +~$20/month ongoing (request & monitoring overhead) |
| **Rollback Ease** | Easy (keep old files) | Difficult (multiple resources) |
| **BOSH Native Support** | ✅ Yes (`folder_name`) | ✅ Yes (`bucket_name`) |
| **Orphan Detection** | Easier (per folder) | Easiest (per bucket) |
| **Breaking Changes Risk** | Low | Medium-High |

## Cost Analysis

### Current State (Shared Bucket)
- **Storage:** ~2.32 TB = ~$53/month (at $0.023/GB S3 Standard)
- **Data Transfer OUT:** ~500 GB/month = ~$45/month (at $0.09/GB)
- **S3 Requests:** ~$0.50/month
- **Total:** ~$98.50/month

### Option 1: Folder Namespacing

**Migration Period (30 days):**
- Temporary blob duplication during grace period
- Additional storage: ~$5-10/month for 30 days
- **Migration cost:** ~$5-10 (one-time)

**Steady State:**
- Same storage structure (folders within 1 bucket)
- Same request costs
- Cleanup of orphaned blobs: **-$6/month savings**
- **Total: ~$92/month** (6% reduction)

### Option 2: Dedicated Buckets Per Buildpack

**Infrastructure:**
- 13 buildpack buckets
- 1 optional release candidates bucket
- **Total: 14 S3 buckets**

**Monthly Costs:**
- **Storage:** ~$53/month (same, just distributed)
- **Data Transfer OUT:** ~$45/month (same)
- **Additional S3 Request Overhead:** 14 buckets × $0.50 = **+$7/month**
- **CloudWatch Monitoring:** 14 buckets × $0.30 = **+$4.20/month**
- **Additional DNS/Certificate Management:** ~$1/month
- **Total: ~$110/month** (~12% increase)

**Cost Comparison:**
- **Option 1:** $92/month (after cleanup)
- **Option 2:** $110/month
- **Difference:** +$18/month (~+$216/year) for Option 2

## Additional Information

- **S3 Bucket Investigation Document:** [`buildpacks-ci/S3_BUCKET_INVESTIGATION.md`](https://github.com/cloudfoundry/buildpacks-ci/blob/cf-release/S3_BUCKET_INVESTIGATION.md)
- **UUID Mapper Tool:** Located in `https://github.com/cloudfoundry/buildpacks-ci/tree/cf-release/tools/uuid-mapper` repository, maps orphaned UUIDs to BOSH release repositories
- **July 2024 Bucket Migration Context:** [buildpacks-ci commit](https://github.com/cloudfoundry/buildpacks-ci/commit/XXXXXXX) - "Switch to using buildpacks.cloudfoundry.org bucket" for CFF CDN takeover
- **BOSH `folder_name` Documentation:** [BOSH Blobstore Docs](https://bosh.io/docs/release-blobs/)
- **Related RFC:** [RFC-0011: Move Buildpack Dependencies Repository to CFF](rfc-0011-move-buildpacks-dependencies-to-cff.md)

## Open Questions

1. **Who owns migration execution?** Should this be Application Runtime Interfaces WG or joint ownership with other teams?
2. **Budget approval:** Does the CFF budget accommodate temporary storage cost increase during migration grace period (~$5-10/month for 30 days)?
3. **Access control requirements:** Are there any per-team IAM access control requirements that would necessitate Option 2?
4. **Release candidates bucket separation:** Should the `buildpack-release-candidates/` directory (1,099 files) be moved to a dedicated `buildpacks-candidates.cloudfoundry.org` bucket to enable independent lifecycle management and cleanup policies?