Auto-detect parallelism for pg_dump and pg_restore from RDS instance type #341

Open
NikolayS wants to merge 2 commits into master from claude/find-rds-refresh-component-zCRUu

@NikolayS
Contributor

Summary

Add automatic parallelism detection for pg_dump and pg_restore operations based on vCPU counts. The dump parallelism is determined by querying the RDS clone instance type's vCPU count via EC2 API, while restore parallelism is based on the local machine's CPU count.

Key Changes

  • New parallelism module (parallelism.go): Implements vCPU detection logic

    • ResolveParallelism(): Main entry point that determines optimal parallelism levels
    • lookupInstanceVCPUs(): Queries EC2 API for RDS instance type vCPU information
    • rdsClassToEC2Type(): Converts RDS instance class format (e.g., "db.m5.xlarge") to EC2 type format ("m5.xlarge")
    • resolveLocalVCPUs(): Returns the local machine's CPU count
    • Includes EC2 API client initialization with support for custom endpoints
  • Comprehensive test coverage (parallelism_test.go): Tests all parallelism resolution functions with mocked EC2 API

    • Tests RDS class to EC2 type conversion with valid and invalid inputs
    • Tests vCPU lookup with various scenarios (valid instances, missing instances, missing vCPU info, API failures)
    • Tests local vCPU resolution
  • Integration with refresh workflow (refresher.go):

    • Step 2 now resolves parallelism levels before creating the RDS clone
    • Gracefully handles parallelism detection failures with fallback to defaults
    • Passes resolved parallelism to DBLab config update
  • DBLab config updates (dblab.go):

    • Extended SourceConfigUpdate struct with DumpParallelJobs and RestoreParallelJobs fields
    • Updated UpdateSourceConfig() to conditionally include parallelism settings (only when > 0)
  • Test coverage (dblab_test.go):

    • Added tests for successful config updates with parallelism settings
    • Added test verifying parallelism fields are omitted when zero
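
The "only when > 0" rule for the DBLab config update can be sketched as follows. This is a minimal illustration, not the code in dblab.go: the field names DumpParallelJobs and RestoreParallelJobs come from the description above, while the helper buildProjection and the map keys are hypothetical and chosen only for the example.

```go
package main

import "fmt"

// SourceConfigUpdate mirrors only the two fields named in this PR;
// any other fields of the real struct are left out of this sketch.
type SourceConfigUpdate struct {
	DumpParallelJobs    int
	RestoreParallelJobs int
}

// buildProjection is a hypothetical helper. It adds a parallelism key only
// when the value is positive, so a zero value leaves the existing DBLab
// setting untouched. The key names are assumptions.
func buildProjection(u SourceConfigUpdate) map[string]int {
	p := map[string]int{}
	if u.DumpParallelJobs > 0 {
		p["dumpParallelJobs"] = u.DumpParallelJobs
	}
	if u.RestoreParallelJobs > 0 {
		p["restoreParallelJobs"] = u.RestoreParallelJobs
	}
	return p
}

func main() {
	// Only the dump setting is present; the restore key is omitted.
	fmt.Println(buildProjection(SourceConfigUpdate{DumpParallelJobs: 4}))
	// Nothing was resolved, so the projection is empty.
	fmt.Println(len(buildProjection(SourceConfigUpdate{})))
}
```

Omitting the key, rather than sending 0, is what lets a failed detection fall through to whatever DBLab already has configured.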

Implementation Details

  • Detected parallelism is clamped to a minimum of 1 job, so a successfully resolved value is never zero or negative
  • RDS instance class validation ensures proper "db." prefix before EC2 API queries
  • Graceful degradation: if parallelism detection fails, the refresh continues with default values (0, which preserves existing DBLab settings)
  • EC2 API client supports custom endpoints for testing and non-standard AWS deployments
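
A rough sketch of how the prefix validation, the minimum of 1, and the fallback to 0 might fit together. rdsClassToEC2Type is named in this PR, but its body here is a guess; atLeastOne, resolveDumpJobs, and the injected lookup function are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// rdsClassToEC2Type strips the "db." prefix and rejects classes without it.
// The real function's exact validation rules are not shown in the PR text.
func rdsClassToEC2Type(class string) (string, error) {
	if !strings.HasPrefix(class, "db.") || len(class) == len("db.") {
		return "", fmt.Errorf("invalid RDS instance class %q", class)
	}
	return strings.TrimPrefix(class, "db."), nil
}

// atLeastOne (hypothetical) clamps a detected job count to a minimum of 1.
func atLeastOne(n int) int {
	if n < 1 {
		return 1
	}
	return n
}

// resolveDumpJobs (hypothetical) returns 0 on any failure, which the caller
// treats as "keep the existing DBLab setting".
func resolveDumpJobs(class string, lookup func(string) (int, error)) int {
	ec2Type, err := rdsClassToEC2Type(class)
	if err != nil {
		return 0
	}
	n, err := lookup(ec2Type)
	if err != nil {
		return 0
	}
	return atLeastOne(n)
}

func main() {
	fixed := func(string) (int, error) { return 4, nil }
	failing := func(string) (int, error) { return 0, errors.New("lookup unavailable") }

	fmt.Println(resolveDumpJobs("db.m5.xlarge", fixed))   // 4
	fmt.Println(resolveDumpJobs("db.m5.xlarge", failing)) // 0
	// Restore parallelism follows the local CPU count.
	fmt.Println(atLeastOne(runtime.NumCPU()))
}
```

Passing the lookup as a function keeps the sketch testable without AWS credentials; the PR itself mocks the EC2 API for the same reason.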

https://claude.ai/code/session_01AhnBVCBWjk24T7BBQtmkbq

claude added 2 commits April 10, 2026 13:54
Resolve optimal -j values automatically: use EC2 DescribeInstanceTypes
to determine vCPU count of the RDS clone (for pg_dump parallelism) and
runtime.NumCPU for the local machine (for pg_restore parallelism).
Pass both values through the existing ConfigProjection when updating
DBLab config during refresh.

https://claude.ai/code/session_01AhnBVCBWjk24T7BBQtmkbq
Drop the aws-sdk-go-v2/service/ec2 dependency entirely. Instead of
calling DescribeInstanceTypes (which required ec2:DescribeInstanceTypes
IAM permission and added ~5s IMDS timeout in tests), parse vCPU count
from the RDS instance class size suffix using a static map of standard
AWS size-to-vCPU mappings. Unlisted NUMxlarge sizes are handled via
multiplier parsing.

https://claude.ai/code/session_01AhnBVCBWjk24T7BBQtmkbq
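
The static-map approach from the second commit could look something like the sketch below. The function name vcpusFromClass and the map contents are assumptions: only "large" and "xlarge" are listed, using the 2 and 4 vCPU values common to current general-purpose families, and NUMxlarge is treated as NUM × 4. Sizes below "large" vary by family and are deliberately left out; the map in the actual commit may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// sizeVCPUs is an illustrative subset of size-to-vCPU mappings.
var sizeVCPUs = map[string]int{
	"large":  2,
	"xlarge": 4,
}

// vcpusFromClass (hypothetical name) derives a vCPU count from the size
// suffix of an RDS class such as "db.m5.4xlarge".
func vcpusFromClass(class string) (int, error) {
	parts := strings.Split(class, ".")
	if len(parts) != 3 || parts[0] != "db" {
		return 0, fmt.Errorf("invalid RDS instance class %q", class)
	}
	size := parts[2]
	if n, ok := sizeVCPUs[size]; ok {
		return n, nil
	}
	// Unlisted NUMxlarge sizes: parse the multiplier and scale "xlarge".
	if strings.HasSuffix(size, "xlarge") {
		mult, err := strconv.Atoi(strings.TrimSuffix(size, "xlarge"))
		if err == nil && mult > 0 {
			return mult * sizeVCPUs["xlarge"], nil
		}
	}
	return 0, fmt.Errorf("unknown instance size %q", size)
}

func main() {
	for _, c := range []string{"db.m5.large", "db.r6g.12xlarge", "db.m5.huge"} {
		n, err := vcpusFromClass(c)
		fmt.Println(c, n, err)
	}
}
```

The trade-off is that a static table needs no IAM permission or network call, but it can drift from AWS's real values for unusual families, so an unknown size returns an error rather than a guess.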
