forked from rafiattrach/m3

πŸ₯πŸ€– Query PhysioNet medical data (MIMIC-IV, eICU, etc.) using natural language through Model Context Protocol (MCP). Transform healthcare research with AI-powered database interactions - supports both local and BigQuery datasets.


M4 (Beta): Many Medical Datasets ↔ MCP ↔ Models 🏥🤖


Query tabular PhysioNet medical data using natural language through MCP clients


Transform medical data analysis with AI! Ask questions about MIMIC-IV and other PhysioNet datasets in plain English and get instant insights. Choose between local data (DuckDB, free) and cloud data (BigQuery).

🫶 M4 is built upon M3.
Please acknowledge the original authors and cite their work.

💡 How It Works

M4 acts as a bridge between your AI client (like Claude Desktop, Cursor, or LibreChat) and your medical data.

  1. You ask a question in your chat interface: "How many patients in the ICU have high blood pressure?"
  2. The LLM in your client translates the question into a SQL query and sends it to M4.
  3. M4 validates the query and runs it on your local or cloud data.
  4. The LLM explains the results to you in plain English.

No SQL knowledge required.
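MCP clients and servers exchange JSON-RPC 2.0 messages, and a tool invocation uses the `tools/call` method. A minimal sketch of the request an AI client might send to M4 when the LLM decides to run SQL is shown below. The argument name `sql_query` and the table name `icustays` are hypothetical placeholders; the real tool schema is whatever M4 advertises in its `tools/list` response, and in practice your client builds this message for you.

```python
import json


def build_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Hypothetical argument name and SQL, for illustration only.
request = build_tool_call(
    1,
    "execute_mimic_query",
    {"sql_query": "SELECT COUNT(*) FROM icustays"},
)
print(request)
```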

Features

  • 🔍 Natural Language Queries: Ask questions about your medical data in plain English
  • 🏠 Modular Datasets: Support for any tabular PhysioNet dataset (MIMIC-IV, etc.)
  • 📂 Local DuckDB + Parquet: Fast local queries using Parquet files with DuckDB views
  • ☁️ BigQuery Support: Access the full MIMIC-IV dataset on Google Cloud
  • 🔒 Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
  • 🛡️ SQL Injection Protection: Read-only queries with comprehensive validation
  • 🧩 Extensible Architecture: Easily add new custom datasets via configuration or CLI

🚀 Quick Start

Prerequisites

You need an MCP-compatible client to use M4. Popular options include Claude Desktop, Cursor, and LibreChat.

1. Install uv (Required)

We use uvx to run the MCP server efficiently.

macOS and Linux:

curl -LsSf https://astral.sh/uv/install.sh | sh

Windows (PowerShell):

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Choose Your Data Source

Select Option A (Local) or Option B (Cloud).

Option A: Local Dataset (Free & Fast)

Best for development, testing, and offline use.

  1. Create project directory:

    mkdir m4 && cd m4
  2. Initialize Dataset:

    We will use MIMIC-IV as an example.

    For Demo (Auto-download ~16MB):

    uv init && uv add m4-mcp
    uv run m4 init mimic-iv-demo

    For Full Data (Requires Manual Download): Download CSVs from PhysioNet first and place them in m4_data/raw_files.

    uv init && uv add m4-mcp
    uv run m4 init mimic-iv-full

    This can take 5-15 minutes depending on your machine.

  3. Configure Your Client:

    For Claude Desktop (Shortcut):

    uv run m4 config claude --quick

    For Other Clients (Cursor, LibreChat, etc.):

    uv run m4 config --quick

    This generates the configuration JSON you need to paste into your client's settings.

Option B: BigQuery (Full Cloud Dataset)

Best for researchers with Google Cloud access.

  1. Authenticate with Google:

    gcloud auth application-default login
  2. Configure Client:

    uv run m4 config --backend bigquery --project_id BIGQUERY_PROJECT_ID

    This also generates the configuration JSON you need to paste into your client's settings.
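The exact JSON that m4 config emits is not reproduced here. As a rough sketch, assuming the server is started with `python -m m4.mcp_server` (as in the Local Development config) and reads the `M4_BACKEND` and `M4_PROJECT_ID` environment variables (as in the Docker example), a BigQuery entry could be assembled like this. The helper `build_mcp_config` and the project ID `my-gcp-project` are hypothetical.

```python
import json
from typing import Optional


def build_mcp_config(backend: str, project_id: Optional[str] = None,
                     python: str = "python") -> dict:
    """Build a hypothetical `mcpServers` entry for the M4 server."""
    if backend not in {"duckdb", "bigquery"}:
        raise ValueError(f"unknown backend: {backend}")
    env = {"M4_BACKEND": backend}
    if backend == "bigquery":
        # BigQuery queries are billed to a GCP project, so one must be given.
        if not project_id:
            raise ValueError("the bigquery backend needs a project id")
        env["M4_PROJECT_ID"] = project_id
    return {
        "mcpServers": {
            "m4": {
                "command": python,
                "args": ["-m", "m4.mcp_server"],
                "env": env,
            }
        }
    }


print(json.dumps(build_mcp_config("bigquery", "my-gcp-project"), indent=2))
```

Prefer the output of m4 config itself; this sketch only shows the shape such an entry is likely to have.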

3. Start Asking Questions!

Restart your MCP client and try:

  • "What tools do you have for MIMIC-IV data?"
  • "Show me patient demographics from the ICU"
  • "What is the race distribution in admissions?"

🔄 Managing Datasets

Switch between available datasets instantly (if you installed M4 with uv, prefix each command with uv run):

# Switch to full dataset
m4 use mimic-iv-full

# Switch back to demo
m4 use mimic-iv-demo

# Check status
m4 status

Backend Comparison

| Feature     | DuckDB (Demo) | DuckDB (Full)    | BigQuery (Full)          |
|-------------|---------------|------------------|--------------------------|
| Cost        | Free          | Free             | BigQuery usage fees      |
| Setup       | Zero config   | Manual Download  | GCP credentials required |
| Credentials | Not required  | PhysioNet        | PhysioNet                |
| Data Size   | 100 patients  | 365k patients    | 365k patients            |
| Speed       | Fast (local)  | Fast (local)     | Network latency          |
| Use Case    | Learning      | Research (local) | Research, production     |

➕ Adding Custom Datasets

M4 is designed to be modular. You can easily add support for any tabular dataset on PhysioNet. Let's take eICU as an example:

JSON Definition Method

  1. Create a definition file: m4_data/datasets/eicu.json

    {
      "name": "eicu",
      "description": "eICU Collaborative Research Database",
      "file_listing_url": "https://physionet.org/files/eicu-crd/2.0/",
      "subdirectories_to_scan": [],
      "primary_verification_table": "eicu_crd_patient",
      "tags": ["clinical", "eicu"],
      "requires_authentication": true,
      "bigquery_project_id": "physionet-data",
      "bigquery_dataset_ids": ["eicu_crd"]
    }
  2. Initialize it:

    m4 init eicu --src /path/to/raw/csvs

    M4 will convert CSVs to Parquet and create DuckDB views automatically.
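The conversion internals are not shown in this README, so the sketch below only illustrates the idea under stated assumptions. A hypothetical `load_definition` checks a few fields from the eicu.json example (which fields are truly mandatory is an assumption), and a hypothetical `view_statements` generates one DuckDB `CREATE OR REPLACE VIEW ... AS SELECT * FROM read_parquet(...)` statement per Parquet file. The `<prefix>_<file stem>` view naming is a guess modeled on `eicu_crd_patient`.

```python
import json
from pathlib import Path

# Assumption: these fields from the eicu.json example are treated as mandatory.
REQUIRED_FIELDS = {"name", "description", "file_listing_url", "primary_verification_table"}


def load_definition(path: Path) -> dict:
    """Load a dataset definition file and fail fast if required fields are missing."""
    definition = json.loads(path.read_text(encoding="utf-8"))
    missing = REQUIRED_FIELDS - set(definition)
    if missing:
        raise ValueError(f"{path.name} is missing fields: {sorted(missing)}")
    return definition


def view_statements(parquet_dir: Path, prefix: str) -> list[str]:
    """Build one DuckDB CREATE VIEW statement per Parquet file under parquet_dir."""
    statements = []
    for parquet_file in sorted(parquet_dir.rglob("*.parquet")):
        # Hypothetical naming scheme: <prefix>_<file stem>, e.g. eicu_crd_patient.
        view_name = f"{prefix}_{parquet_file.stem}"
        statements.append(
            f"CREATE OR REPLACE VIEW {view_name} AS "
            f"SELECT * FROM read_parquet('{parquet_file.as_posix()}')"
        )
    return statements
```

Views keep the data in Parquet on disk and let DuckDB scan only the columns a query touches, which is one plausible reason to prefer them over loading every table into the database file.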


Alternative Installation Methods

Already have Docker or prefer pip?

🐳 Docker

DuckDB (Local):

git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:lite --target lite .
docker run -d --name m4-server m4:lite tail -f /dev/null

BigQuery:

git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:bigquery --target bigquery .
docker run -d --name m4-server \
  -e M4_BACKEND=bigquery \
  -e M4_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m4:bigquery tail -f /dev/null

MCP config (same for both):

{
  "mcpServers": {
    "m4": {
      "command": "docker",
      "args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
    }
  }
}

pip Install

pip install m4-mcp
m4 config --quick

Local Development

For contributors:

  1. Clone & Install (using uv):

    git clone https://github.com/hannesill/m4.git
    cd m4
    uv venv
    uv sync
  2. MCP Config:

    {
      "mcpServers": {
        "m4": {
          "command": "/absolute/path/to/m4/.venv/bin/python",
          "args": ["-m", "m4.mcp_server"],
          "cwd": "/absolute/path/to/m4",
          "env": { "M4_BACKEND": "duckdb" }
        }
      }
    }

🔧 Advanced Configuration

Interactive Config Generator:

m4 config

OAuth2 Authentication: For secure production deployments:

m4 config claude --enable-oauth2 \
  --oauth2-issuer https://your-auth-provider.com \
  --oauth2-audience m4-api

See docs/OAUTH2_AUTHENTICATION.md for details.
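The issuer and audience flags correspond to the standard `iss` and `aud` JWT claims (RFC 7519). How M4 actually validates tokens is covered in docs/OAUTH2_AUTHENTICATION.md; the following is only a sketch of the claim checks those two settings imply. It deliberately skips signature verification, which any real deployment must perform against the identity provider's published keys, so it must not be used as-is. `check_claims` is a hypothetical name.

```python
import base64
import json
import time
from typing import Optional


def decode_segment(segment: str) -> dict:
    """Decode one base64url JWT segment; JWTs omit '=' padding, so restore it."""
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def check_claims(token: str, issuer: str, audience: str,
                 now: Optional[float] = None) -> dict:
    """Check the iss, aud and exp claims of a JWT.

    WARNING: this does NOT verify the token signature.
    """
    _header, payload, _signature = token.split(".")
    claims = decode_segment(payload)
    now = time.time() if now is None else now
    if claims.get("iss") != issuer:
        raise PermissionError("unexpected issuer")
    aud = claims.get("aud")
    # RFC 7519 allows "aud" to be a single string or a list of strings.
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise PermissionError("token was not issued for this audience")
    if "exp" in claims and claims["exp"] <= now:
        raise PermissionError("token has expired")
    return claims
```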


🛠️ Available MCP Tools

  • get_database_schema: List all available tables
  • get_table_info: Get column info and sample data
  • execute_mimic_query: Execute SQL SELECT queries
  • get_icu_stays: ICU stay info & length of stay
  • get_lab_results: Laboratory test results
  • get_race_distribution: Patient race statistics
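As an illustration of the kind of SQL a tool like get_race_distribution might run, here is a sketch against an in-memory SQLite table. SQLite stands in for DuckDB or BigQuery only so the example runs with the Python standard library. The table name `admissions` and its `race` column follow the MIMIC-IV hosp schema, but the query is not taken from M4's source, and the rows are synthetic placeholders, not MIMIC data.

```python
import sqlite3

RACE_DISTRIBUTION_SQL = """
SELECT race, COUNT(*) AS n_admissions
FROM admissions
GROUP BY race
ORDER BY n_admissions DESC, race
"""


def race_distribution(conn: sqlite3.Connection) -> list[tuple[str, int]]:
    """Return (race, admission count) pairs, most frequent first."""
    return conn.execute(RACE_DISTRIBUTION_SQL).fetchall()


# Synthetic placeholder rows only; these are not MIMIC-IV records.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (hadm_id INTEGER PRIMARY KEY, race TEXT)")
conn.executemany(
    "INSERT INTO admissions (hadm_id, race) VALUES (?, ?)",
    [(1, "WHITE"), (2, "WHITE"), (3, "BLACK/AFRICAN AMERICAN"), (4, "ASIAN")],
)
print(race_distribution(conn))
```

This counts admissions, so a patient admitted several times is counted several times; counting patients instead would need `COUNT(DISTINCT subject_id)`.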

Example Prompts

Demographics:

  • What is the race distribution in MIMIC-IV admissions?
  • Show me patient demographics for ICU stays

Clinical Data:

  • Find lab results for patient X
  • What lab tests are most commonly ordered?

Exploration:

  • What tables are available in the database?

Troubleshooting

  • "Parquet not found": Rerun m4 init <dataset_name>.
  • MCP client not starting: Check logs (Claude Desktop: Help → View Logs).
  • BigQuery Access Denied: Run gcloud auth application-default login and verify project ID.

Contributing & Citation

For Developers

We welcome contributions!

  1. Setup: Follow the "Local Development" steps above.
  2. Test: Run uv run pre-commit run --all-files to ensure everything is working and linted.
  3. Submit: Open a Pull Request with your changes.

Citation:

M4 is built upon M3. Please cite the original work:

@article{attrach2025conversational,
  title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
  author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2507.01053},
  year={2025}
}

Related Projects

M4 is a fork of M3 that adds features which cannot be included in the M3 repository, because M3 needs to stay in sync with the M3 paper.

M3 has been forked and adapted by the community, and over time we would love to see these projects adapted to M4. Here are the projects:

  • MCPStack-MIMIC - Integrates M3 with other MCP servers (Jupyter, sklearn, etc.)

Built with ❤️ for the medical AI community

Need help? Open an issue on GitHub or check our troubleshooting guide above.
