The development storage system provides developers with a fast, reliable way to work with real production data while maintaining a clean, reproducible environment. This architecture eliminates common development issues like database permission conflicts, stale test data, and slow container rebuilds by combining ephemeral storage with intelligent production data caching.
Traditional containerized development workflows face several challenges:
- Permission conflicts: Bind-mounted database directories create UID/GID mismatches between container users and host filesystem permissions
- Stale development data: Developers create test data manually, which diverges from production and becomes outdated
- Slow container rebuilds: Changing version labels or application code rebuilds the entire container stack, including infrastructure layers
- Data persistence complexity: Developers need different storage modes for development (throwaway) vs production (persistent)
1. Ephemeral Storage with tmpfs Development mode uses in-memory tmpfs storage for the PostgreSQL database. This provides:
- Lightning-fast startup: No disk I/O bottlenecks
- Clean slate every restart: Data is automatically discarded when container stops
- Zero permission issues: tmpfs is owned by the container, no bind mount conflicts
- Perfect for testing: Each test run starts with known-good production data
2. Automatic Production Data Seeding On first startup, the system automatically pulls a complete database dump from production via SSH:
- One-time setup: Developers don't need to manually create test data
- Real production data: Test with actual destinations, news, events, and user data
- Cached locally: 384MB seed data stored at
~/.rotv/seed-data.sqlfor fast reuse - Automatic import: Seed data is imported before the application starts, preventing schema conflicts
3. Smart Freshness Checking The system monitors the age of cached seed data:
- 7-day threshold: Warns developers if cached data is older than 7 days
- Non-blocking: Still starts with old data, just reminds you to refresh
- Manual refresh: Run
./run.sh seedanytime to pull latest production data
4. Optimized Container Builds The Containerfile is structured to maximize Docker layer caching:
- Infrastructure layer: PostgreSQL, Node.js, Playwright (rarely changes)
- Application layer: App code, frontend builds (changes frequently)
- Version labels: Moved to top of app stage so version bumps only rebuild app layer
- Playwright caching: Chromium binary installed early in infrastructure layer
5. Dual Storage Modes The same container supports two deployment modes:
- Development (
PERSISTENT_DATA=false): Ephemeral tmpfs + production seed data - Production (
PERSISTENT_DATA=true): Persistent bind mount to~/.rotv/pgdata
6. Secure Secrets Management
Environment variables are loaded from .env files that are gitignored:
- Local secrets:
backend/.envcontains OAuth tokens, session keys, admin email - Git-safe:
.gitignorepattern excludes all.envfiles from version control - Template provided:
.env.exampledocuments required variables for new developers
┌──────────────────────────────────────────────────────────────┐
│ 1. Developer runs: ./run.sh start │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 2. Check for seed data at ~/.rotv/seed-data.sql │
│ ├─ Missing? → Auto-pull from production via SSH │
│ └─ Exists? → Check age (warn if > 7 days old) │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 3. Start container with tmpfs storage │
│ --tmpfs /data/pgdata:rw,size=2G,mode=0700 │
│ -v ~/.rotv/seed-data.sql:/tmp/seed-data.sql:ro │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 4. entrypoint.sh runs: │
│ ├─ Fix tmpfs permissions (chown postgres:postgres) │
│ ├─ Initialize PostgreSQL database │
│ ├─ Start PostgreSQL server │
│ ├─ Import /tmp/seed-data.sql (if exists) │
│ └─ Start Node.js application │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ 5. Application ready at http://localhost:8080 │
│ ✓ 371 POIs, 647 news items, 680 events │
│ ✓ Real production data for testing │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│ Developer runs: ./run.sh seed │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ Load production settings from backend/.env: │
│ ├─ PRODUCTION_HOST=sven.dc3.crunchtools.com │
│ ├─ PRODUCTION_PORT=22422 │
│ └─ PRODUCTION_CONTAINER=rootsofthevalley.org │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ SSH to production and run pg_dump: │
│ ssh -p 22422 root@sven.dc3.crunchtools.com \ │
│ "podman exec rootsofthevalley.org \ │
│ pg_dump -U rotv --clean --if-exists \ │
│ --no-owner --no-acl rotv" \ │
│ > ~/.rotv/seed-data.sql │
└──────────────────────────────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ ✓ Production data cached (384M) │
│ Next ./run.sh start will import this data │
└──────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ FROM ubi10 AS infrastructure │
│ ├─ Install Node.js, npm (rarely changes) │
│ ├─ Install Playwright + Chromium (rarely changes) │
│ ├─ Install PostgreSQL 17 (rarely changes) │
│ ├─ Create postgres user (UID 70) │
│ └─ Set environment variables │
│ │
│ Result: Cached infrastructure layer (~1.2GB) │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ FROM infrastructure AS application │
│ ├─ LABEL version="1.10.0" (bump to force rebuild) │
│ ├─ npm install frontend dependencies │
│ ├─ Build frontend with Vite │
│ ├─ npm install backend dependencies │
│ ├─ Copy backend code │
│ ├─ Move frontend build to public/ │
│ └─ Copy entrypoint.sh │
│ │
│ Result: Lightweight app layer (~200MB) │
└─────────────────────────────────────────────────────────────┘
Version bump (1.10.0 → 1.11.0):
✓ Infrastructure layer: CACHED (no rebuild)
✓ Application layer: REBUILD (fast, only ~2 minutes)
-
tmpfs: In-memory filesystem for ephemeral database storage
- 2GB size limit (configurable)
- 0700 permissions for security
- Automatic cleanup on container stop
-
Bind mounts: Persistent storage for production deployments
- Host directory:
~/.rotv/pgdata(default) - SELinux context:
:Zflag for proper labeling - Ownership: Fixed with
podman unshare chown
- Host directory:
-
PostgreSQL 17: Production-grade relational database
- Unix socket directory:
/tmp/pgsocket(custom for permissions) - Trust authentication: Local connections don't require password
- postgres user: UID 70 (standard PostgreSQL user)
- Unix socket directory:
-
pg_dump: PostgreSQL backup utility
--clean --if-exists: Drop objects before creating (idempotent)--no-owner --no-acl: Ignore ownership (rotv vs postgres mismatch)- Output format: Plain SQL text (human-readable, compressible)
-
Podman: Rootless container runtime
- Privileged mode: Required for PostgreSQL to run as non-root user
- User namespaces:
podman unsharefor UID/GID mapping - Network: Port 8080 exposed for web access
-
Multi-stage builds: Separate infrastructure and application layers
- Stage 1:
infrastructure- rarely changes, heavily cached - Stage 2:
application- changes frequently, fast rebuilds
- Stage 1:
-
runuser: Execute commands as different user (preserves environment)
- Replaces
su -cwhich doesn't preserve PGDATA, PGRUNDIR variables - Syntax:
runuser -u postgres -- command args
- Replaces
-
SSH: Secure remote command execution
- Custom port:
-p 22422for production server - Remote execution:
ssh host "command"runs command and returns output
- Custom port:
Environment variable loading (lines 16-21):
# Load environment variables from .env file if it exists
if [ -f ".env" ]; then
export $(grep -v '^#' .env | xargs)
elif [ -f "backend/.env" ]; then
export $(grep -v '^#' backend/.env | xargs)
fiLoads secrets from .env files on the host machine before starting container. This keeps secrets out of container images.
Ephemeral storage configuration (lines 58-61):
if [ "$USE_PERSISTENT" = "true" ]; then
STORAGE_MOUNT="-v $DATA_DIR:/data/pgdata:Z"
else
STORAGE_MOUNT="--tmpfs /data/pgdata:rw,size=2G,mode=0700"
fiDevelopment mode uses tmpfs (in-memory), production uses bind mount.
Automatic seed data pull (lines 67-90):
if [ ! -f "$SEED_DATA_FILE" ]; then
echo "⚠ No seed data found at $SEED_DATA_FILE"
echo "Automatically pulling production data..."
ssh -p "$PRODUCTION_PORT" root@"$PRODUCTION_HOST" \
"podman exec $PRODUCTION_CONTAINER pg_dump -U rotv --clean --if-exists --no-owner --no-acl rotv" \
> "$SEED_DATA_FILE"
if [ $? -eq 0 ]; then
SEED_SIZE=$(du -h "$SEED_DATA_FILE" | cut -f1)
echo "✓ Production data downloaded ($SEED_SIZE)"
else
echo "❌ Failed to pull production data"
echo "Cannot start in development mode without seed data"
exit 1
fi
fiFirst ./run.sh start automatically pulls production data. Exits with error if pull fails.
Freshness check (lines 92-99):
SEED_AGE_DAYS=$(( ($(date +%s) - $(date -r "$SEED_DATA_FILE" +%s)) / 86400 ))
if [ $SEED_AGE_DAYS -gt 7 ]; then
echo "⚠ Seed data is $SEED_AGE_DAYS days old"
echo "Consider running './run.sh seed' to refresh production data"
fiWarns developers if cached data is stale, but doesn't block startup.
Seed data mounting (lines 102-103):
echo "Mounting seed data for import..."
SEED_MOUNT="-v $SEED_DATA_FILE:/tmp/seed-data.sql:ro"Mounts cached seed file as read-only volume. Container imports from /tmp/seed-data.sql.
Permission fixing (lines 17-22):
PGDATA_OWNER=$(stat -c '%u' "$PGDATA" 2>/dev/null || echo "unknown")
if [ "$PGDATA_OWNER" != "70" ]; then
echo "Fixing data directory permissions..."
chown -R postgres:postgres "$PGDATA"
chmod 700 "$PGDATA"
fiBoth tmpfs and bind mounts need ownership fixed since container runs as root. UID 70 is the standard PostgreSQL user.
PostgreSQL initialization (lines 28-58):
if [ ! -f "$PGDATA/PG_VERSION" ]; then
echo "Initializing PostgreSQL database..."
runuser -u postgres -- initdb -D "$PGDATA" -U postgres
# Configure PostgreSQL for local connections
cat >> "$PGDATA/pg_hba.conf" << 'EOF'
host all all 127.0.0.1/32 trust
host all all ::1/128 trust
local all all trust
EOF
# Configure custom socket directory
cat >> "$PGDATA/postgresql.conf" << EOF
listen_addresses = 'localhost'
unix_socket_directories = '$PGRUNDIR'
EOF
# Create databases
runuser -u postgres -- pg_ctl -D "$PGDATA" -l /tmp/pg_init.log start -o "-k $PGRUNDIR"
psql -h "$PGRUNDIR" -U postgres -d postgres -c "CREATE DATABASE rotv;"
psql -h "$PGRUNDIR" -U postgres -d postgres -c "CREATE DATABASE rotv_test;"
runuser -u postgres -- pg_ctl -D "$PGDATA" stop
fiUses runuser instead of su -c to preserve environment variables (PGDATA, PGRUNDIR).
Production data import (lines 77-82):
if [ -f /tmp/seed-data.sql ]; then
echo "Importing seed data..."
psql -h "$PGRUNDIR" -U postgres -d rotv -f /tmp/seed-data.sql 2>&1 | grep -c "^COPY" | xargs echo "Imported rows from tables:"
echo "✓ Seed data imported"
fiImport happens before exec node server.js to prevent schema conflicts with application's initDatabase() function.
Infrastructure layer caching (lines 1-16):
FROM registry.access.redhat.com/ubi10/ubi:latest AS infrastructure
# Install Node.js (rarely changes)
RUN dnf install -y nodejs npm && dnf clean all
# Install Playwright BEFORE PostgreSQL (cached early)
RUN npx playwright install chromium
# Install PostgreSQL (rarely changes)
RUN dnf install -y https://download.postgresql.org/pub/repos/yum/reporpms/EL-10-x86_64/pgdg-redhat-repo-latest.noarch.rpm && \
dnf install -y postgresql17-server postgresql17 && dnf clean all
# Create postgres user (UID 70 is standard)
RUN useradd -u 70 -m -s /bin/bash postgres || truePlaywright installation moved before PostgreSQL so it's cached early. Rarely needs rebuilding.
Application layer with version labels (lines 18-22):
FROM infrastructure AS application
# Labels at top of app stage - bump version to force app rebuild
LABEL maintainer="fatherlinux"
LABEL version="1.10.0"Bumping version label only rebuilds application layer (~2 minutes), not infrastructure (~15 minutes).
Production seed configuration (lines 1-4):
# Production seed data source (for ./run.sh seed)
PRODUCTION_HOST=sven.dc3.crunchtools.com
PRODUCTION_PORT=22422
PRODUCTION_CONTAINER=rootsofthevalley.orgLocal development database (lines 6-11):
# Database
PGHOST=localhost
PGPORT=5432
PGDATABASE=rotv
PGUSER=rotv
PGPASSWORD=rotvOAuth and session secrets (lines 13-27):
# Session
SESSION_SECRET=change-this-to-a-secure-random-string-at-least-32-chars
# OAuth - Google
GOOGLE_CLIENT_ID=693091366844-...apps.googleusercontent.com
GOOGLE_CLIENT_SECRET=GOCSPX-...
GOOGLE_CALLBACK_URL=http://localhost:8080/auth/google/callback
# Admin
ADMIN_EMAIL=scott.mccarty@gmail.comGit safety: This file is excluded by .gitignore pattern .env (line 3), preventing accidental commits.
# Remove seed data
rm ~/.rotv/seed-data.sql
# Start container (should auto-pull)
./run.sh start
# Expected output:
# ⚠ No seed data found at /home/fatherlinux/.rotv/seed-data.sql
# Automatically pulling production data...
# Running pg_dump on production container: rootsofthevalley.org
# ✓ Production data downloaded (384M)
# Mounting seed data for import...# Age seed data artificially
touch -d "10 days ago" ~/.rotv/seed-data.sql
# Start container
./run.sh start
# Expected output:
# ⚠ Seed data is 10 days old
# Consider running './run.sh seed' to refresh production data
# Mounting seed data for import...# Wait for container to start
sleep 10
# Check database
podman exec rotv psql -U postgres -d rotv -c "SELECT COUNT(*) FROM pois;"
# Expected: 371
# Check API
curl -s http://localhost:8080/api/destinations | jq -r 'length'
# Expected: 178# Add test data
podman exec rotv psql -U postgres -d rotv -c "INSERT INTO pois (name, poi_type, location) VALUES ('Test POI', 'point', ST_MakePoint(-81.5, 41.2));"
# Verify test data exists
podman exec rotv psql -U postgres -d rotv -c "SELECT COUNT(*) FROM pois WHERE name = 'Test POI';"
# Expected: 1
# Restart container
./run.sh stop
./run.sh start
sleep 10
# Verify test data is gone (clean production data restored)
podman exec rotv psql -U postgres -d rotv -c "SELECT COUNT(*) FROM pois WHERE name = 'Test POI';"
# Expected: 0# Time full rebuild
time ./run.sh build
# Bump version label in Containerfile
# Change: LABEL version="1.10.0"
# To: LABEL version="1.11.0"
# Time incremental rebuild (should be much faster)
time ./run.sh build
# Expected:
# - Infrastructure layer: Using cache (0 seconds)
# - Application layer: Rebuild (~2 minutes)- Fast startup: Container starts in ~10 seconds with 384MB of production data
- Real data: Test with actual destinations, news, events instead of synthetic data
- Clean slate: Every restart = fresh production data (no stale test data accumulation)
- No permission issues: tmpfs eliminates bind mount UID/GID conflicts
- Fast rebuilds: Version bumps only rebuild app layer (~2 minutes vs ~15 minutes)
- Easy onboarding: New developers run
./run.sh startand get production data automatically - Consistent environments: All developers work with identical production data snapshots
- Secure secrets:
.envfiles are gitignored, OAuth tokens stay local - Production safety: Read-only production access (pg_dump via SSH, no write access)
- Dual mode support: Same container works for development (ephemeral) and production (persistent)
- Simple deployment:
PERSISTENT_DATA=true ./run.sh starton production server - No data loss risk: tmpfs is only used in development mode
- Clear separation: Development seed data cached separately from production storage
Instead of downloading the full 384MB database dump, implement incremental updates:
- Track last sync timestamp
- Pull only changed records since last sync
- Use PostgreSQL logical replication or CDC (Change Data Capture)
- Benefit: Faster refresh (seconds instead of minutes)
Allow developers to maintain multiple named snapshots:
./run.sh seed staging # Pull from staging environment
./run.sh seed prod-2026-01 # Named production snapshot
./run.sh start prod-2026-01 # Start with specific snapshotBenefit: Test migrations, compare data across environments
Add cron job or systemd timer to auto-refresh seed data:
# Refresh nightly at 2am
0 2 * * * /path/to/run.sh seedBenefit: Developers always have fresh data without manual intervention
Compress cached seed data to reduce disk usage:
gzip ~/.rotv/seed-data.sql → ~/.rotv/seed-data.sql.gzBenefit: 384MB → ~40MB (10x compression typical for SQL dumps)
Share Playwright's Chromium binary across containers:
# Volume mount shared browser cache
-v ~/.cache/ms-playwright:/root/.cache/ms-playwright:roBenefit: Eliminate 300MB Chromium download per container rebuild
Add validation to run.sh startup:
# Fail fast if required secrets are missing
if [ -z "$GOOGLE_CLIENT_ID" ]; then
echo "❌ GOOGLE_CLIENT_ID not set in .env file"
exit 1
fiBenefit: Clear error messages instead of cryptic OAuth failures
Compare seed data schema against application's migration scripts:
./run.sh validate-schema # Warn if production schema differs from migrationsBenefit: Catch production schema drift before deploying migrations
Strip sensitive data from seed data before importing:
- Hash email addresses
- Randomize user passwords
- Remove session tokens Benefit: Comply with GDPR, allow seed data sharing with contractors
Document Version: 1.0 Last Updated: 2026-01-21 Author: Claude Code (with Scott McCarty)