buildpack s3 cleanup #1403
Open: ramonskie wants to merge 4 commits into cloudfoundry:main from ramonskie:main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,273 @@ | ||
| # Meta | ||
| [meta]: #meta | ||
| - Name: Buildpacks S3 Bucket Namespacing Strategy | ||
| - Start Date: 2026-01-09 | ||
| - Author(s): @ramonskie | ||
| - Status: Draft | ||
| - RFC Pull Request: (fill in with PR link after you submit it) | ||
|
|
||
|
|
||
| ## Summary | ||
|
|
||
| Address the pollution and lack of proper namespacing in the `buildpacks.cloudfoundry.org` S3 bucket by implementing a structured namespacing strategy for BOSH release blobs. This RFC proposes two options: (1) implement proper namespacing within the existing shared bucket using BOSH's `folder_name` configuration, or (2) migrate to dedicated per-buildpack S3 buckets. | ||
|
|
||
| ## Problem | ||
|
|
||
| The `buildpacks.cloudfoundry.org` S3 bucket currently suffers from significant organizational issues that impact maintainability, auditing, and operational efficiency: | ||
|
|
||
| ### Current State | ||
|
|
||
| - **Total Files:** 34,543 files, 2.32 TB | ||
| - **UUID Files in Root:** 4,351 files (12.6% of total) with UUID naming (e.g., `29a57fe6-b667-4261-6725-124846b7bb47`) | ||
| - **No Human-Readable Names:** Blob files are stored with UUID-only names in the S3 bucket root | ||
| - **Poor Discoverability:** Impossible to identify file contents without cross-referencing BOSH release `config/blobs.yml` files | ||
| - **Orphaned Blobs:** 770 identified orphaned blobs (~91% from July 2024 CDN migration) that are not tracked in any current git repository | ||
|
|
||
| ### Root Cause | ||
|
|
||
| BOSH CLI's blob storage mechanism stores each blob under a generated UUID, which becomes its S3 object key: | ||
|
|
||
| ```yaml | ||
| # config/final.yml in buildpack BOSH releases | ||
| blobstore: | ||
|   provider: s3 | ||
|   options: | ||
|     bucket_name: buildpacks.cloudfoundry.org | ||
| ``` | ||
|
|
||
| When `bosh upload-blobs` runs, BOSH: | ||
| 1. Generates a UUID for each blob as the S3 object key | ||
| 2. Uploads the blob to S3 using only the UUID as the filename | ||
| 3. Tracks the UUID-to-filename mapping in `config/blobs.yml`: | ||
|
|
||
| ```yaml | ||
| # config/blobs.yml | ||
| java-buildpack/java-buildpack-v4.77.0.zip: | ||
|   size: 253254 | ||
|   object_id: 29a57fe6-b667-4261-6725-124846b7bb47 | ||
|   sha: abc123... | ||
| ``` | ||
|
|
||
| **Result:** Human-readable names exist only in git repositories; S3 contains only UUIDs. | ||
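To make the UUID-to-filename mapping concrete, the following is a minimal sketch that reads a `config/blobs.yml` and prints each object ID next to its human-readable path. The function name `list_blob_ids` is hypothetical, and the `awk` filter assumes the simple layout shown above (an unindented path key followed by indented `size`, `object_id`, and `sha` keys); it is not a general YAML parser.

```bash
# list_blob_ids: read a BOSH config/blobs.yml from stdin and print
# "<object_id> <blob path>" for every entry.
# Assumption: path keys start in column 1 and end with ":", and the
# object_id line is indented beneath them.
list_blob_ids() {
  awk '
    /^[^ #].*:[ ]*$/  { path = $0; sub(/:[ ]*$/, "", path); next }
    /^[ ]+object_id:/ { print $2, path }
  '
}

# Example input mirroring the blobs.yml excerpt above.
printf '%s\n' \
  'java-buildpack/java-buildpack-v4.77.0.zip:' \
  '  size: 253254' \
  '  object_id: 29a57fe6-b667-4261-6725-124846b7bb47' \
  '  sha: abc123...' | list_blob_ids
```

Run against a real checkout, the input would be `config/blobs.yml` itself, for example `list_blob_ids < config/blobs.yml`.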
|
|
||
| ### Impact | ||
|
|
||
| 1. **Operational Difficulty:** Bucket browsing requires downloading and inspecting files or cross-referencing multiple git repositories | ||
| 2. **Orphan Detection:** No automated way to identify unused blobs when `blobs.yml` diverges from S3 | ||
| 3. **Audit Challenges:** Cannot identify file types, owners, or purposes without external mappings | ||
| 4. **Cost Inefficiency:** Potentially storing obsolete blobs (estimated 30-40% orphan rate in some analyses) | ||
| 5. **Multi-Team Collision:** Multiple buildpack teams share the same flat namespace, increasing collision risk and confusion | ||
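Orphan detection is essentially a set difference between the UUID-shaped keys in the bucket and the object IDs tracked in git. The sketch below illustrates that idea only; it is not how the UUID mapper tool works. The function name `find_orphans` and both input files are assumptions: one file with one S3 object key per line, and one with every `object_id` collected from the releases' `config/blobs.yml` files.

```bash
# find_orphans: print UUID-named S3 keys that no blobs.yml references.
#   $1: file with one S3 object key per line (hypothetical listing)
#   $2: file with one tracked object_id per line
find_orphans() {
  s3_keys=$1
  tracked=$2
  tmp_keys=$(mktemp)
  tmp_tracked=$(mktemp)
  # Keep only keys shaped like BOSH UUIDs (8-4-4-4-12 hex digits).
  grep -E '^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}$' "$s3_keys" \
    | LC_ALL=C sort -u > "$tmp_keys"
  LC_ALL=C sort -u "$tracked" > "$tmp_tracked"
  # comm -23 prints lines that appear only in the first (bucket) list.
  LC_ALL=C comm -23 "$tmp_keys" "$tmp_tracked"
  rm -f "$tmp_keys" "$tmp_tracked"
}
```

A call such as `find_orphans s3_keys.txt tracked_ids.txt` would then list candidate orphans for manual review before anything is moved or deleted.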
|
|
||
| ### Examples of Affected Repositories | ||
|
|
||
| The investigation identified that UUID blobs are created by buildpack BOSH release repositories: | ||
|
|
||
| **Buildpack BOSH Releases (13 repositories):** | ||
| - ruby-buildpack-release | ||
| - java-buildpack-release | ||
| - python-buildpack-release | ||
| - nodejs-buildpack-release | ||
| - go-buildpack-release | ||
| - php-buildpack-release | ||
| - dotnet-core-buildpack-release | ||
| - staticfile-buildpack-release | ||
| - binary-buildpack-release | ||
| - nginx-buildpack-release | ||
| - r-buildpack-release | ||
| - hwc-buildpack-release | ||
| - java-offline-buildpack-release | ||
|
|
||
| **Note:** This RFC focuses on buildpack BOSH releases only. Infrastructure BOSH releases (diego, capi, routing, garden-runc) that may also use this bucket are out of scope for this proposal. | ||
|
|
||
| ## Proposal | ||
|
|
||
| Implement proper namespacing for BOSH blobs in the buildpacks S3 bucket to improve organization, discoverability, and maintainability. Two options are proposed: | ||
|
|
||
| ### Option 1: Folder-Based Namespacing | ||
|
|
||
| Implement BOSH's built-in folder namespacing feature by adding `folder_name` configuration to each buildpack BOSH release. | ||
|
|
||
| #### Implementation | ||
|
|
||
| **Step 1: Update BOSH Release Configuration** | ||
|
|
||
| Modify `config/final.yml` in each buildpack BOSH release: | ||
|
|
||
| ```yaml | ||
| # config/final.yml | ||
| --- | ||
| blobstore: | ||
|   provider: s3 | ||
|   options: | ||
|     bucket_name: buildpacks.cloudfoundry.org | ||
|     folder_name: ruby-buildpack # Add this line with buildpack-specific name | ||
| ``` | ||
|
|
||
| **Naming Convention for `folder_name`:** | ||
| - Use the BOSH release repository name without the `-release` suffix | ||
| - Examples: `ruby-buildpack`, `java-buildpack`, `nodejs-buildpack` | ||
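The naming convention above amounts to stripping a fixed suffix from the repository name. A minimal sketch, where `folder_name_for` is a hypothetical helper and not part of BOSH or `buildpacks-ci`:

```bash
# folder_name_for: derive the proposed folder_name from a release repo name
# by removing a trailing "-release" (names without the suffix pass through).
folder_name_for() {
  printf '%s\n' "${1%-release}"
}

for repo in ruby-buildpack-release java-buildpack-release nodejs-buildpack-release; do
  printf '%s -> %s\n' "$repo" "$(folder_name_for "$repo")"
done
```

Deriving the name mechanically keeps the 13 releases consistent and avoids hand-typed folder names drifting apart.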
|
|
||
| **Step 2: Migrate Existing Blobs** | ||
|
|
||
| For each buildpack BOSH release: | ||
|
|
||
| 1. Clone the release repository and extract current blob UUIDs from `config/blobs.yml` | ||
| 2. Copy each blob to its new namespaced location: | ||
| ```bash | ||
| # Example for ruby-buildpack | ||
| aws s3 cp \ | ||
| s3://buildpacks.cloudfoundry.org/29a57fe6-b667-4261-6725-124846b7bb47 \ | ||
| s3://buildpacks.cloudfoundry.org/ruby-buildpack/29a57fe6-b667-4261-6725-124846b7bb47 | ||
| ``` | ||
| 3. Keep original files in root temporarily for rollback capability | ||
| 4. After successful verification (30-day grace period), archive or delete root-level blobs | ||
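The copy step can be scripted per release. The sketch below only prints the `aws s3 cp` commands so they can be reviewed before anything is executed; the function name `plan_blob_copies` is hypothetical, and it assumes the object IDs arrive one per line on stdin.

```bash
# plan_blob_copies: print one `aws s3 cp` command per object ID read from
# stdin, copying from the bucket root into the namespaced folder.
# Dry run only: nothing is copied by this function.
#   $1: bucket name
#   $2: folder_name for the release
plan_blob_copies() {
  bucket=$1
  folder=$2
  while IFS= read -r object_id; do
    [ -n "$object_id" ] || continue   # skip blank lines
    printf 'aws s3 cp s3://%s/%s s3://%s/%s/%s\n' \
      "$bucket" "$object_id" "$bucket" "$folder" "$object_id"
  done
}

# Example with the UUID used above.
echo 29a57fe6-b667-4261-6725-124846b7bb47 \
  | plan_blob_copies buildpacks.cloudfoundry.org ruby-buildpack
```

In a release checkout, the IDs could be fed in with something like `awk '/object_id:/ {print $2}' config/blobs.yml`, assuming the file uses unquoted `object_id` values. Because the originals stay in the bucket root, the plan can be re-run safely during the grace period.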
|
Comment on lines +115 to +123
Member: I would have two questions:
|
||
|
|
||
| **Step 3: Move Orphaned Blobs** | ||
|
|
||
| The remaining orphaned blobs would be moved to a folder named `orphaned`, with a retention period of 3 months. | ||
|
|
||
| If no one objects within those 3 months, it is safe to assume that these blobs can be deleted. | ||
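One way to enforce the retention period is an S3 lifecycle rule scoped to the `orphaned/` prefix. The sketch below prints such a rule as JSON; the function name `orphan_lifecycle_json`, the rule ID, and the reading of "3 months" as 90 days are assumptions, not part of the RFC.

```bash
# orphan_lifecycle_json: print an S3 lifecycle configuration that expires
# objects under a prefix after a number of days.
#   $1: key prefix (e.g. "orphaned/")
#   $2: days until expiration (90 is an assumed reading of "3 months")
orphan_lifecycle_json() {
  prefix=$1
  days=$2
  cat <<EOF
{
  "Rules": [
    {
      "ID": "expire-orphaned-blobs",
      "Filter": { "Prefix": "${prefix}" },
      "Status": "Enabled",
      "Expiration": { "Days": ${days} }
    }
  ]
}
EOF
}

orphan_lifecycle_json "orphaned/" 90
```

The output could be saved to a file and applied with `aws s3api put-bucket-lifecycle-configuration --bucket buildpacks.cloudfoundry.org --lifecycle-configuration file://lifecycle.json`. That call replaces the bucket's entire lifecycle configuration, so any existing rules would need to be merged into the same document first. Expiration is counted from each object's creation date, which for a moved blob is the date it was copied into `orphaned/`.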
|
|
||
| **Step 4: Update CI/CD Pipelines** | ||
|
|
||
| Update `buildpacks-ci` task scripts: | ||
| - Modify `tasks/cf-release/create-buildpack-dev-release/run` to use updated `config/final.yml` | ||
| - Ensure `bosh upload-blobs` respects new folder configuration | ||
| - Update any direct S3 access scripts to use new paths | ||
|
|
||
| **Step 5: Verification & Rollback** | ||
|
|
||
| - Test blob access with updated configuration in staging/dev environments | ||
| - Monitor BOSH release builds for 30 days | ||
| - Keep original root-level blobs for rollback during grace period | ||
| - After successful verification, move root-level blobs to archive or delete | ||
|
|
||
| #### Pros | ||
|
|
||
| - ✅ **Native BOSH Support:** Uses built-in BOSH functionality, no custom tooling required | ||
| - ✅ **Minimal Infrastructure Change:** Same bucket, same permissions, same CDN | ||
| - ✅ **Clear Ownership:** Each folder represents one buildpack team's blobs | ||
| - ✅ **Backward Compatible:** Existing root-level blobs remain accessible during migration | ||
| - ✅ **Cost Effective:** No new infrastructure costs | ||
| - ✅ **Preserves BOSH Design:** Keeps BOSH's UUID-keyed blob storage and `config/blobs.yml` tracking unchanged | ||
| - ✅ **Easy Browsing:** S3 console shows organized folder structure | ||
| - ✅ **Orphan Detection:** Easier to identify unused blobs per buildpack | ||
|
|
||
| #### Cons | ||
|
|
||
| - ❌ **Shared Bucket Limitations:** Still requires coordination between teams for bucket policies | ||
| - ❌ **Breaking Change Potential:** May impact consumers if not carefully coordinated | ||
| - ❌ **Migration Effort:** Requires updating 13+ buildpack releases | ||
| - ❌ **Blob Duplication:** Existing blobs must be copied (not moved) during grace period, temporarily doubling storage | ||
| - ❌ **Multi-Repo Coordination:** Changes must be synchronized across multiple repositories | ||
| - ❌ **No Per-Team Access Control:** Cannot implement IAM policies for individual buildpack teams within shared bucket | ||
|
|
||
|
|
||
| #### Additional Consideration: Release Candidates Bucket | ||
|
|
||
| The `buildpack-release-candidates/` directory (1,099 files) could also be separated into its own dedicated bucket under Option 2. This directory contains pre-release buildpack versions organized by buildpack type: | ||
|
|
||
| ``` | ||
| buildpack-release-candidates/ | ||
| ├── apt/ | ||
| ├── binary/ | ||
| ├── dotnet-core/ | ||
| ├── go/ | ||
| ├── java/ | ||
| ├── nodejs/ | ||
| ├── php/ | ||
| ├── python/ | ||
| ├── ruby/ | ||
| └── staticfile/ | ||
| ``` | ||
|
|
||
| **Creating a separate bucket for release candidates would provide:** | ||
| - ✅ **Clear Lifecycle Separation:** Development artifacts isolated from production blobs | ||
| - ✅ **Independent Retention Policies:** Apply aggressive cleanup rules (e.g., 180-day expiration) without affecting production | ||
| - ✅ **Reduced Production Bucket Clutter:** Keep production bucket focused on finalized BOSH blobs | ||
| - ✅ **Simplified Access Control:** CI/CD systems can have different permissions for release candidates vs. production blobs | ||
|
|
||
| **Example bucket structure:** | ||
| ``` | ||
| buildpacks.cloudfoundry.org # Production BOSH blobs only | ||
| buildpacks-candidates.cloudfoundry.org # Pre-release buildpack packages | ||
| ``` | ||
|
|
||
| **Note:** This separation is **compatible with both Option 1 and Option 2**: | ||
| - **With Option 1:** Release candidates bucket + shared production bucket with folder namespacing | ||
| - **With Option 2:** Release candidates bucket + per-buildpack production buckets | ||
|
|
||
| The release candidates bucket would **not** require BOSH configuration changes, as these files are uploaded directly by CI/CD pipelines (`buildpacks-ci`), not via `bosh upload-blobs`. | ||
|
|
||
| ## Comparison | ||
|
|
||
| | Aspect | Option 1: Folder Namespacing | Option 2: Dedicated Buckets | | ||
| |--------|------------------------------|----------------------------| | ||
| | **Infrastructure Changes** | Minimal (config only) | Major (13 buckets + DNS) | | ||
| | **Migration Complexity** | Medium | High | | ||
| | **Operational Overhead** | Low | High | | ||
| | **Team Isolation** | Logical (folders) | Physical (buckets) | | ||
| | **Access Control Granularity** | Bucket-level | Bucket-level per team | | ||
| | **Cost Impact** | ~$5-10/month temporary (migration grace period) | +~$12/month ongoing vs. current (request, monitoring & DNS overhead) | | ||
| | **Rollback Ease** | Easy (keep old files) | Difficult (multiple resources) | | ||
| | **BOSH Native Support** | ✅ Yes (`folder_name`) | ✅ Yes (`bucket_name`) | | ||
| | **Orphan Detection** | Easier (per folder) | Easiest (per bucket) | | ||
| | **Breaking Changes Risk** | Low | Medium-High | | ||
|
|
||
| ## Cost Analysis | ||
|
|
||
| ### Current State (Shared Bucket) | ||
| - **Storage:** ~2.32 TB = ~$53/month (at $0.023/GB S3 Standard) | ||
| - **Data Transfer OUT:** ~500 GB/month = ~$45/month (at $0.09/GB) | ||
| - **S3 Requests:** ~$0.50/month | ||
| - **Total:** ~$98.50/month | ||
|
|
||
| ### Option 1: Folder Namespacing | ||
|
|
||
| **Migration Period (30 days):** | ||
| - Temporary blob duplication during grace period | ||
| - Additional storage: ~$5-10/month for 30 days | ||
| - **Migration cost:** ~$5-10 (one-time) | ||
|
|
||
| **Steady State:** | ||
| - Same storage structure (folders within 1 bucket) | ||
| - Same request costs | ||
| - Cleanup of orphaned blobs: **-$6/month savings** | ||
| - **Total: ~$92/month** (6% reduction) | ||
|
|
||
| ### Option 2: Dedicated Buckets Per Buildpack | ||
|
|
||
| **Infrastructure:** | ||
| - 13 buildpack buckets | ||
| - 1 optional release candidates bucket | ||
| - **Total: 14 S3 buckets** | ||
|
|
||
| **Monthly Costs:** | ||
| - **Storage:** ~$53/month (same, just distributed) | ||
| - **Data Transfer OUT:** ~$45/month (same) | ||
| - **Additional S3 Request Overhead:** 14 buckets × $0.50 = **+$7/month** | ||
| - **CloudWatch Monitoring:** 14 buckets × $0.30 = **+$4.20/month** | ||
| - **Additional DNS/Certificate Management:** ~$1/month | ||
| - **Total: ~$110/month** (~12% increase) | ||
|
|
||
| **Cost Comparison:** | ||
| - **Option 1:** $92/month (after cleanup) | ||
| - **Option 2:** $110/month | ||
| - **Difference:** +$18/month (~+$216/year) for Option 2 | ||
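The totals above can be reproduced with a few lines of arithmetic. This sketch uses the RFC's own figures, including its rounded $53 storage estimate; the function name `cost_summary` is hypothetical.

```bash
# cost_summary: recompute the monthly cost figures from the RFC's inputs.
cost_summary() {
  awk 'BEGIN {
    storage  = 53             # ~2.32 TB at $0.023/GB, rounded as in the RFC
    transfer = 500 * 0.09     # ~500 GB/month out at $0.09/GB
    requests = 0.50
    current  = storage + transfer + requests
    option1  = current - 6    # after orphan cleanup savings
    option2  = storage + transfer + 14 * 0.50 + 14 * 0.30 + 1
    printf "current %.2f\n", current
    printf "option1 %.2f\n", option1
    printf "option2 %.2f\n", option2
    printf "difference %.2f\n", option2 - option1
  }'
}

cost_summary
```

The unrounded results are $98.50, $92.50, and $110.20 per month, a difference of $17.70; the RFC's $18/month and $216/year come from subtracting the rounded totals ($110 and $92).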
|
|
||
| ## Additional Information | ||
|
|
||
| - **S3 Bucket Investigation Document:** [`buildpacks-ci/S3_BUCKET_INVESTIGATION.md`](https://github.com/cloudfoundry/buildpacks-ci/blob/cf-release/S3_BUCKET_INVESTIGATION.md) | ||
| - **UUID Mapper Tool:** [`buildpacks-ci/tools/uuid-mapper`](https://github.com/cloudfoundry/buildpacks-ci/tree/cf-release/tools/uuid-mapper), which maps orphaned UUIDs to BOSH release repositories | ||
| - **July 2024 Bucket Migration Context:** [buildpacks-ci commit](https://github.com/cloudfoundry/buildpacks-ci/commit/XXXXXXX) - "Switch to using buildpacks.cloudfoundry.org bucket" for CFF CDN takeover | ||
| - **BOSH `folder_name` Documentation:** [BOSH Blobstore Docs](https://bosh.io/docs/release-blobs/) | ||
| - **Related RFC:** [RFC-0011: Move Buildpack Dependencies Repository to CFF](rfc-0011-move-buildpacks-dependencies-to-cff.md) | ||
|
|
||
| ## Open Questions | ||
|
|
||
| 1. **Who owns migration execution?** Should this be owned by the Application Runtime Interfaces WG, or jointly with other teams? | ||
| 2. **Budget approval:** Does the CFF budget accommodate a temporary storage cost increase during the migration grace period (~$5-10/month for 30 days)? | ||
| 3. **Access control requirements:** Are there any per-team IAM access control requirements that would necessitate Option 2? | ||
| 4. **Release candidates bucket separation:** Should the `buildpack-release-candidates/` directory (1,099 files) be moved to a dedicated `buildpacks-candidates.cloudfoundry.org` bucket to enable independent lifecycle management and cleanup policies? | ||
The technical separation makes sense. But I don't think that there is a multi-team collision. All buildpacks are handled by the same team / WG area "Buildpacks and Stacks": https://github.com/cloudfoundry/community/blob/main/toc/working-groups/app-runtime-interfaces.md?plain=1#L75
Only creating a BOSH release from a tag will cause an issue, but that is still possible with a minimal change: after checking out a tag, the user only needs to add `folder_name: xxx` to `config/final.yml`.
Just adding more context: the final release tarballs are available on bosh.io (e.g. https://bosh.io/releases/github.com/cloudfoundry/java-buildpack-release?all=1), uploaded by this task, which is why they are not impacted. Building from source for an older tag will require the workaround @ramonskie mentioned above.