Skip to content

Conversation

@Mohiiit
Copy link
Member

@Mohiiit Mohiiit commented Nov 29, 2025

Problem

Currently, Madara has no migration strategy - if the database schema changes, users must delete and resync from scratch. This is problematic for nodes with large databases.

Solution

Implement a migration framework inspired by Pathfinder that:

  1. Detects version mismatch on startup by comparing .db-version file with binary's expected version
  2. Runs migrations sequentially from current → target version
  3. Creates backup before migrating (RocksDB checkpoint)
  4. Supports crash recovery via state checkpointing - interrupted migrations resume on restart
  5. Reports progress for long-running migrations

Behavior

Database State Action
Fresh Create at current version
Same version Open normally
Older (migratable) Run migrations
Too old Error with resync instruction
Newer than binary Error with upgrade instruction

Adding Future Migrations

When a schema change is needed:

  1. Create a migration file
  2. Register it in the migration registry
  3. Bump current_version in .db-versions.yml

Documentation included in MIGRATION.md.

This commit adds the foundational structure for database migrations:

- migration/mod.rs: Main migration runner with status checking,
  execution, backup creation, and crash recovery
- migration/error.rs: Comprehensive error types for migration failures
- migration/context.rs: Migration context with progress reporting
- migration/registry.rs: Migration registry for version-to-function mapping
- migration/revisions/mod.rs: Placeholder for individual revision files

The migration system supports:
- Version detection and comparison
- Sequential migration execution
- Pre-migration backups via RocksDB checkpoints
- Crash recovery via state checkpointing
- Progress reporting for long-running migrations
- Lock files to prevent concurrent migrations

Tests: 13 unit tests covering all core functionality
This commit integrates the migration system with the database opening logic:

Changes:
- Update build.rs to also export DB_BASE_VERSION for minimum migration version
- Add base_version field to .db-versions.yml
- Integrate MigrationRunner into MadaraBackend::open_rocksdb()
- Add initialize_fresh_database() method for new databases
- Add run_migrations_with_storage() for running with RocksDBStorage
- Add inner_db() accessor to RocksDBStorage for migration access
- Remove old db_version.rs module (replaced by migration system)
- Use MultiThreaded DB type in migration module for compatibility

The flow is now:
1. Check migration status (file-based check)
2. For fresh DB: write version file
3. For migration needed: open DB, run migrations, close
4. Open DB normally with RocksDBStorage wrapper

Tests: All 36 tests pass
Add comprehensive integration tests for database migration:

- test_fresh_database_creates_version_file: Verify fresh DB creates version file
- test_fresh_database_can_be_reopened: Test database can be reopened
- test_same_version_opens_without_migration: No migration for same version
- test_newer_database_version_fails: Detect database newer than binary
- test_too_old_database_version_fails: Detect database too old to migrate
- test_migration_state_file_operations: Test state file persistence
- test_version_file_read_write: Test version file operations
- test_migration_lock_file_creation: Test lock file mechanism
- test_backup_directory_creation: Verify backup path
- test_full_database_lifecycle: Full open/reopen/verify cycle
- test_database_with_data_survives_reopen: Data persistence test
- test_invalid_version_file_format: Error handling for bad version
- test_version_file_with_whitespace: Handle whitespace in version file

Tests: 49 tests now pass (14 new migration tests)
Documentation updates:

1. .db-versions.yml:
   - Added comprehensive header documentation
   - Explained purpose of current_version and base_version
   - Added description field to version entries
   - Added instructions for adding new versions

2. MIGRATION.md:
   - Complete guide for the migration system
   - Architecture overview with ASCII diagram
   - Step-by-step guide for adding new migrations
   - MigrationContext API documentation
   - Error handling guide
   - Recovery procedures for failed/interrupted migrations
   - Testing best practices
   - Troubleshooting section
@Mohiiit Mohiiit self-assigned this Nov 29, 2025
@Mohiiit Mohiiit added the madara label Nov 29, 2025
Mohiiit and others added 19 commits November 29, 2025 17:33
- Remove indeterminate() from MigrationProgress (always know steps)
- Improve NoMigrationPath error message to clarify it's a registry bug
- Merge resume_migration into execute_migrations (single unified flow)
- Trim redundant doc comments and simplify code structure
- Add comment explaining RocksDB checkpoint is lightweight (hard links)
- Export DB_VERSION_FILE, DB_MIGRATION_LOCK, DB_MIGRATION_STATE constants
- Make read_version_file/write_version_file public for testing
- Refactor tests: use rstest for parameterized tests, remove redundant ones
- Use strong assertions with exact version checks
- Add task-test-migration.yml: reusable workflow for migration testing
  - Downloads base DB fixture from GitHub releases
  - Starts madara (migration runs automatically)
  - Verifies .db-version matches expected version
  - Runs migration validation tests
  - Graceful handling when fixture doesn't exist

- Add pull-request-migration.yml: trigger on migration file changes
  - Triggers on changes to migration/revisions/** or .db-versions.yml
  - Calls the task workflow with version parameters
- Add migration.rs with test structure:
  - test_migration_validation_rpc_healthy: verify RPC responds
  - test_migration_validation_blocks_accessible: verify block data
  - test_migration_validation_state_update_intact: verify state roots

- Tests are #[ignore] by default, run with:
  cargo test -p mc-e2e-tests migration_validation -- --ignored

- Tests connect to external madara (configurable via MIGRATION_TEST_RPC_URL)

- Add v9_snip34 module placeholder for SNIP-34 specific tests

- Add url dependency to mc-e2e-tests
- Triggers on push to main when .db-versions.yml changes
- Also supports manual dispatch with force_update option
- Workflow steps:
  1. Detect if base_version actually changed
  2. Build madara from current main
  3. Sync configurable number of blocks (default 50)
  4. Package DB as tarball
  5. Upload to GitHub releases as db-fixtures-v{VERSION}

- Release includes usage instructions
- Used by task-test-migration.yml to download fixtures
- If base DB fixture doesn't exist in releases, create one on-the-fly
- Useful for PRs that change both base_version and add migration code
- Added blocks_to_sync input parameter (default 50)
- Increased timeout to 60 min to accommodate DB creation
- Added db_source output to track if DB was downloaded or created
- Improved logging and summary output
pull-request-migration.yml:
- Split into two dependent jobs: create-base-db → migration-test
- Job 1 downloads fixture or creates DB, uploads as artifact
- Job 2 downloads artifact, runs migration, runs tests
- Clear separation of concerns

task-test-migration.yml:
- Simplified to match same two-job pattern
- Reusable for manual dispatch or other workflows
- Cleaner, more focused implementation
- Remove task-update-base-db.yml (no automation)
- Simplify task-test-migration.yml (just download, no fallback)
- Simplify pull-request-migration.yml (just calls task)
- Add scripts/create-base-db.sh for manual DB creation and upload

Manual workflow:
  1. Run: ./scripts/create-base-db.sh 8 50
  2. Script builds madara, syncs blocks, uploads to releases
  3. Migration tests download fixture from releases
Remove path filter - migration tests now run on all PRs, not just
those that modify migration files. This ensures migrations don't
break from any code changes.
- Remove standalone pull-request-migration.yml
- Add test-migration job to pull-request-main-ci.yml
  - Runs after test-madara completes
- Update task-test-migration.yml to read versions from .db-versions.yml
  - No more hardcoded version parameters needed
- Download pre-built madara binary instead of rebuilding
- Run test-migration after build-madara (parallel with test-madara)
- Add madara-binary-hash input parameter
- Faster CI execution
- Add task-do-nothing-test-migration.yml for merge queue
- Add test-migration to pull-request-merge.yml
- Fix markdown lint errors in MIGRATION.md (line length, code block language)
- Store DB fixtures as Docker images at ghcr.io/madara-alliance/db-fixtures:v{VERSION}
- Consistent with existing artifact storage pattern
- No release tags cluttering the releases page
- Add debug_assert validation for MigrationProgress (current_step <= total_steps)
- Handle target_version mismatch when resuming interrupted migrations
Treat locks with unreadable metadata as stale instead of ignoring
- Use stronger memory ordering for cross-thread visibility
- Explicitly handle system time errors in lock age check
- Use subshell for cd to preserve working directory
- Add binary existence check before running
- Add comment explaining || true usage
- Clarify current_step range in MigrationProgress docs
@Mohiiit Mohiiit marked this pull request as ready for review December 1, 2025 11:16
These documentation files were accidentally included. Removing from git tracking.
Similar to clippy, madara needs separate formatting since it's not in the toplevel workspace.
Mohiiit and others added 9 commits December 1, 2025 16:58
Use docker buildx to build for both linux/amd64 (CI) and linux/arm64 (Mac)
- Switch from sepolia to mainnet for consistency
- Fix --no-l1-sync to --l1-sync-disabled in script
- Add --sync-stop-at 50 in workflow to prevent unnecessary syncing
- Use .github/actions/setup-rust which installs LLVM 19
- Use blacksmith runner for consistency
- Add LDFLAGS for Cairo Native compatibility
- Add skip_migration_backup to MadaraBackendConfig
- Add --skip-migration-backup CLI flag (default: backup enabled)
- Add cleanup_backup() after successful migration
- Backup only persists on failure (for recovery)
Copy link
Contributor

@prkpndy prkpndy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add a list of supported DB versions?

Err(MigrationError::DatabaseTooOld { current_version, base_version })
}
MigrationStatus::MigrationRequired { current_version, target_version, .. } => {
self.execute_migrations(db, current_version, target_version)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we add a verification step as well after migration is complete?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there would be tests post migration which would validate the migration, that would right? or are you asking something else?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking that after the db is migrated, we can verify that it has been migrated correctly before deleting the backup. is this already present?

@Mohiiit Mohiiit marked this pull request as draft December 5, 2025 13:03
@Mohiiit Mohiiit marked this pull request as ready for review December 5, 2025 13:10
@Mohiiit
Copy link
Member Author

Mohiiit commented Dec 6, 2025

do not update the branch just yet 🙏 should be updated post #875 merger

Comment on lines +103 to +113
- name: Rust setup
uses: ./.github/actions/setup-rust
with:
rust-version: ${{ env.BUILD_RUST_VERSION }}
cache-key: madara-${{ runner.os }}

- name: Run migration tests
working-directory: madara
run: cargo test -p mc-e2e-tests migration_validation -- --ignored --nocapture
env:
MIGRATION_TEST_RPC_URL: "http://localhost:9944"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering, we could potentially remove the burden of doing a rust setup here just to run the migration tests by utilising the Base vs Current version detection of job Read versions from .db-versions.yml
and using that as a trigger in the test-madara CI that will dynamically add this crate to the testing scope only when a version difference is detected, leading to saving time consumed by Rust Setup ~1.5 min per run

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wdyt?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we will run the CI for each PR, because there could be cases where type changes could lead to madara not supporting certain lower db versions

.find(|line| line.starts_with("current_version:"))
.ok_or(BuildError::Parse(Cow::Borrowed("Could not find current_version")))?
.find(|line| line.starts_with(&prefix))
.ok_or_else(|| BuildError::Parse(Cow::Owned(format!("Could not find {}", key))))?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, what is the necessity to use Owned here ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BuildError::Parse takes Cow<'static, str> - this pattern allows both static strings (Cow::Borrowed("static error")) without allocation AND dynamic strings (Cow::Owned(format!(...))) when we need to include runtime values like key. Since we're using format!() to include the key name in the error message, we need Owned. Could simplify to just String but Cow is more flexible if we ever add static error messages.

Comment on lines +42 to +43
DatabaseTooOld { current_version: u32, base_version: u32 },
DatabaseNewer { db_version: u32, binary_version: u32 },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can these two DatabaseTooOld & DatabaseNewer be adopted from MigrationError ?
seems like we are re-writing these two

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good observation! These are intentionally separate:

  • MigrationStatus (enum) describes states - it's informational, returned by check_status() to tell you what state the DB is in
  • MigrationError (error enum) describes failures - actionable errors when operations fail

The flow is: check_status() returns a status → run_migrations() converts certain statuses to errors. Keeping them separate maintains the distinction between "what state is the DB in?" vs "what went wrong?". Could reference errors from status but the explicit separation makes the intent clearer IMO.

Comment on lines +47 to +51
if ! docker pull "${IMAGE}"; then
echo "❌ Base DB fixture not found!"
echo "Please run: ./scripts/create-base-db.sh ${VERSION}"
exit 1
fi
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can automate this as well using the strategy I've mentioned for Read versions from .db-versions.yml

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, in the file .db_version.yml if the db version is updated then we run this before proceeding.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like, having to ensure that we just need to write the code and CI will handle everything else will be superb

@@ -0,0 +1,268 @@
# Database Migration System
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this file not be in root/madara-root, unsure of the convention here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current location keeps documentation close to the migration code it documents - this is a common pattern (co-locating module-specific docs with the module). Moving to root would increase visibility but separate docs from implementation. Happy to move if there's a preference for root-level docs, but the co-location with code is intentional.

@Mohiiit Mohiiit requested review from heemankv and prkpndy December 9, 2025 08:37
Copy link
Contributor

@prkpndy prkpndy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

asked for a clarification. looks good otherwise. gg!

@heemankv heemankv added this pull request to the merge queue Dec 9, 2025
Merged via the queue into main with commit 3f5ee40 Dec 9, 2025
30 checks passed
@github-project-automation github-project-automation bot moved this to Done in Madara Dec 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants