@hannesill (Contributor) commented Nov 26, 2025

Summary

Generalizes the M3 architecture to support multiple tabular PhysioNet datasets beyond just MIMIC-IV. This allows researchers to switch between different clinical databases (like eICU) seamlessly within the same session.

Key Changes

  • Refactor: Decoupled core logic from MIMIC-IV specifics.
  • Registry Pattern: Introduced DatasetRegistry and DatasetDefinition to handle dataset metadata.
  • Configurable Datasets: New datasets can now be added via JSON configuration files in m3_data/datasets/ (no code changes required for new data).
  • Dynamic Switching: Modified the m3 use <dataset> command to switch the active dataset instantly, without restarting the MCP client.
  • Documentation: Updated the README with instructions for managing multiple datasets and adding custom ones, and restructured it so it is easier to follow.
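For reviewers who want a mental model of the registry and JSON-based configuration described above, here is a minimal sketch. Only the class names DatasetRegistry and DatasetDefinition come from this PR; the JSON fields (name, description, requires_authentication, tags) and the method names (register, load_directory, get, use) are assumptions made for illustration and may not match src/m3/datasets.py.

```python
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class DatasetDefinition:
    """Metadata for one dataset. Field names are illustrative assumptions."""
    name: str
    description: str = ""
    requires_authentication: bool = False
    tags: tuple = ()


class DatasetRegistry:
    """Keeps known dataset definitions and tracks which one is active."""

    def __init__(self):
        self._datasets = {}
        self._active = None

    def register(self, definition):
        # Names are stored lowercase so lookups are case-insensitive.
        self._datasets[definition.name.lower()] = definition

    def load_directory(self, directory):
        """Register every *.json definition found in `directory`."""
        loaded = []
        for path in sorted(Path(directory).glob("*.json")):
            raw = json.loads(path.read_text(encoding="utf-8"))
            definition = DatasetDefinition(
                name=raw["name"],
                description=raw.get("description", ""),
                requires_authentication=raw.get("requires_authentication", False),
                tags=tuple(raw.get("tags", ())),
            )
            self.register(definition)
            loaded.append(definition.name)
        return loaded

    def get(self, name):
        try:
            return self._datasets[name.lower()]
        except KeyError:
            known = ", ".join(sorted(self._datasets)) or "none"
            raise KeyError(f"Unknown dataset {name!r}; known: {known}") from None

    def use(self, name):
        """Switch the active dataset; raises KeyError for unknown names."""
        definition = self.get(name)
        self._active = definition.name.lower()
        return definition

    @property
    def active(self):
        return self._datasets.get(self._active) if self._active else None
```

With this shape, adding a dataset means dropping a JSON file into the configuration directory and calling load_directory again; switching is a dictionary lookup, which is why no client restart would be needed.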

Testing

  • Verified standard MIMIC-IV demo flow works as before.
  • Verified eICU support by adding eicu.json and running initialization.
  • Added a new test case for dataset switching.
  • All tests pass.

Notes
The tools get_icu_stays, get_lab_results, and get_race_distribution are MIMIC-IV-specific. In the future, we could either generalize them as far as possible or register dataset-specific tools.
For now this is fine because the three core tools are dataset-agnostic.
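One possible shape for the "register dataset-specific tools" option is sketched below. It is not part of this PR: the tool decorator, the _TOOLS table, available_tools, and call_tool are hypothetical helpers, the tool bodies are placeholders, and the dataset name mimic-iv-demo is an assumption (mimic-iv-full and eicu appear in the commit notes).

```python
# Maps tool name -> (function, supported datasets). None means "any dataset".
_TOOLS = {}


def tool(datasets=None):
    """Hypothetical decorator that records which datasets a tool supports."""
    def decorator(func):
        supported = frozenset(datasets) if datasets is not None else None
        _TOOLS[func.__name__] = (func, supported)
        return func
    return decorator


def available_tools(active_dataset):
    """Names of the tools usable with the currently active dataset."""
    return sorted(
        name
        for name, (_, supported) in _TOOLS.items()
        if supported is None or active_dataset in supported
    )


def call_tool(name, active_dataset, *args, **kwargs):
    """Run a tool, refusing if it does not support the active dataset."""
    func, supported = _TOOLS[name]
    if supported is not None and active_dataset not in supported:
        raise ValueError(
            f"Tool {name!r} is not available for dataset {active_dataset!r}"
        )
    return func(*args, **kwargs)


@tool()  # dataset-agnostic
def get_table_info(table_name):
    return f"columns of {table_name}"  # placeholder body


@tool(datasets={"mimic-iv-demo", "mimic-iv-full"})  # MIMIC-IV only
def get_icu_stays(limit=10):
    return f"first {limit} ICU stays"  # placeholder body
```

Under this scheme the server would expose only available_tools(active_dataset), so switching to eicu would hide get_icu_stays instead of letting it fail on a missing table.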

- Implement dynamic dataset registry supporting MIMIC-IV Demo and Full in src/m3/datasets.py
- Add BigQuery backend support to MCP server for cloud-based data access in src/m3/mcp_server.py
- Update CLI to handle credentialed dataset initialization and active dataset switching in src/m3/cli.py
- Enhance MCP server tools to be dataset-aware and provide better error guidance
- Improve security with SQL injection prevention and safer query execution
- Refactor configuration management in src/m3/config.py for better runtime dataset detection
- Update CI workflows for uv integration
- Add comprehensive tests for new backends and tools
- Implement 'm3 use' CLI command to switch between active datasets (e.g., mimic-iv-full, eicu).
- Update MCP server to dynamically resolve database paths and BigQuery configurations based on the active dataset.
- Enhance 'm3 init' to handle datasets requiring authentication by providing 'wget' instructions for PhysioNet.
- Update 'get_database_schema', 'get_table_info', and convenience tools to be dataset-aware.
- Add 'tests/test_dynamic_switching.py' to verify dataset switching logic.
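To illustrate what "SQL injection prevention and safer query execution" can look like, here is a deliberately conservative read-only check. It is a sketch under assumptions, not the validation implemented in src/m3/mcp_server.py: the function name is_safe_query, the keyword list, and the rejection messages are all hypothetical.

```python
import re

# Keywords that modify data or database state; matched as whole words.
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|ATTACH|DETACH|PRAGMA|TRUNCATE)\b",
    re.IGNORECASE,
)


def is_safe_query(sql):
    """Return (ok, reason) for a single read-only SELECT/WITH statement.

    Conservative by design: it may reject harmless queries, for example
    ones that contain a forbidden keyword inside a string literal.
    """
    stripped = sql.strip().rstrip(";").strip()
    if not stripped:
        return False, "empty query"
    if ";" in stripped:
        return False, "multiple statements are not allowed"
    if "--" in stripped or "/*" in stripped:
        return False, "SQL comments are not allowed"
    first_word = stripped.split(None, 1)[0].upper()
    if first_word not in {"SELECT", "WITH"}:
        return False, "only SELECT and WITH queries are allowed"
    if _FORBIDDEN.search(stripped):
        return False, "query contains a write or administrative keyword"
    return True, "ok"
```

A check like this works best as one layer among several; opening the database in read-only mode and passing user-supplied values as bound parameters remain the stronger guarantees.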
@simonprovost (Collaborator) commented

"The tools get_icu_stays, get_lab_results, and get_race_distribution are MIMIC-IV specific. In the future, we could try to make them as generalizable as possible or register dataset-specific tools."

That's where MCPStack could really be useful: it would avoid having to write overly generalized functions, which over time would stop making sense because they would become either too complex or not specific enough.

@hannesill hannesill closed this Nov 28, 2025
@hannesill hannesill deleted the physionet_support branch November 28, 2025 17:58