M4 (Beta): Many Medical Datasets ↔ MCP ↔ Models 🏥🤖

Query tabular PhysioNet medical data using natural language through MCP clients

Transform medical data analysis with AI! Ask questions about MIMIC-IV and other PhysioNet datasets in plain English and get answers without writing SQL. Choose between a local dataset (free) and the full dataset in the cloud (BigQuery).

🫶 M4 is built upon M3.
Please acknowledge the original authors and cite their work.

💡 How It Works

M4 acts as a bridge between your AI Client (like Claude Desktop, Cursor, or LibreChat) and your medical data.

1. You ask a question in your chat interface: "How many patients in the ICU have high blood pressure?"
2. M4 securely translates this into a database query.
3. M4 runs the query on your local or cloud data.
4. The LLM explains the results to you in plain English.

No SQL knowledge required.
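
The flow above can be pictured with a minimal Python sketch. It is not M4's implementation: the standard-library `sqlite3` module stands in for DuckDB or BigQuery, the `icustays` table and its rows are made up for illustration, and the SQL string is hard-coded where an LLM would normally produce it.

```python
import sqlite3

# Stand-in for the local database; M4 itself uses DuckDB or BigQuery.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE icustays (subject_id INTEGER, stay_id INTEGER, los REAL)")
conn.executemany(
    "INSERT INTO icustays VALUES (?, ?, ?)",
    [(1, 10, 2.5), (1, 11, 0.8), (2, 12, 4.1)],  # made-up rows
)

question = "How many patients have at least one ICU stay?"

# Translation: in the real flow the LLM writes this query; here it is hard-coded.
sql = "SELECT COUNT(DISTINCT subject_id) FROM icustays"

# Execution: run the query and unpack the single result value.
(patient_count,) = conn.execute(sql).fetchone()

# Explanation: the LLM would phrase the answer; a template stands in for it.
answer = f"{patient_count} patients have at least one ICU stay."
print(answer)
```

The point of the sketch is the division of labour: the model handles language on both ends, while the database only ever sees a plain query.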

Features

🔍 Natural Language Queries: Ask questions about your medical data in plain English
🏠 Modular Datasets: Support for any tabular PhysioNet dataset (MIMIC-IV, etc.)
📂 Local DuckDB + Parquet: Fast local queries using Parquet files with DuckDB views
☁️ BigQuery Support: Access full MIMIC-IV dataset on Google Cloud
🔒 Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
🛡️ SQL Injection Protection: Read-only queries with comprehensive validation
🧩 Extensible Architecture: Easily add new custom datasets via configuration or CLI
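
To give a feel for what "read-only queries with validation" can mean, here is a minimal sketch of a conservative SQL check. It is an assumption-laden illustration, not M4's actual validator: the function name `is_read_only_select` and the keyword deny-list are hypothetical.

```python
import re

# Keywords that modify data or schema; a hypothetical deny-list for illustration.
FORBIDDEN = {"insert", "update", "delete", "drop", "alter", "create",
             "attach", "copy", "pragma", "truncate", "grant"}


def is_read_only_select(sql: str) -> bool:
    """Return True if `sql` looks like a single read-only SELECT statement."""
    # Strip SQL comments so keywords cannot hide inside them.
    stripped = re.sub(r"--[^\n]*|/\*.*?\*/", " ", sql, flags=re.DOTALL).strip()
    # Allow one trailing semicolon, but reject stacked statements.
    body = stripped[:-1] if stripped.endswith(";") else stripped
    if not body or ";" in body:
        return False
    # Split into lowercase word tokens; the first must start a query.
    tokens = re.findall(r"[a-z_]+", body.lower())
    if not tokens or tokens[0] not in {"select", "with"}:
        return False
    # Reject any statement that mentions a modifying keyword anywhere.
    return not FORBIDDEN.intersection(tokens)


print(is_read_only_select("SELECT * FROM patients;"))
print(is_read_only_select("SELECT 1; DROP TABLE patients"))
```

The check deliberately errs on the side of rejection: a harmless query whose string literal contains a semicolon or a word such as `delete` is refused too. A production validator would more likely parse the SQL rather than match keywords.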

🚀 Quick Start

Prerequisites

You need an MCP-compatible client to use M4, such as Claude Desktop, Cursor, or LibreChat.

1. Install `uv` (Required)

We use uvx to run the MCP server efficiently.

macOS and Linux:

```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Windows (PowerShell):

```
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```

2. Choose Your Data Source

Select Option A (Local) or Option B (Cloud).

Option A: Local Dataset (Free & Fast)

Best for development, testing, and offline use.

Create project directory:
```
mkdir m4 && cd m4
```
Initialize Dataset:

We will use MIMIC-IV as an example.

For Demo (Auto-download ~16MB):
```
uv init && uv add m4-mcp
uv run m4 init mimic-iv-demo
```
For Full Data (Requires Manual Download): Download CSVs from PhysioNet first and place them in m4_data/raw_files.
```
uv init && uv add m4-mcp
uv run m4 init mimic-iv-full
```
This can take 5-15 minutes depending on your machine.
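
Before starting the full initialization, it can help to confirm that the raw files are where the step above says to put them. The following is a small pre-flight sketch, not part of M4: the helper name `find_raw_csvs` is hypothetical, and whether M4 expects plain `.csv`, gzip-compressed `.csv.gz`, or a particular subfolder layout is an assumption you should verify against your download.

```python
from pathlib import Path


def find_raw_csvs(raw_dir: str = "m4_data/raw_files") -> list[Path]:
    """Return CSV files (plain or gzip-compressed) found under `raw_dir`."""
    root = Path(raw_dir)
    if not root.is_dir():
        return []
    # PhysioNet distributes MIMIC-IV tables as .csv.gz; plain .csv is accepted too.
    return sorted(p for p in root.rglob("*") if p.name.endswith((".csv", ".csv.gz")))


if __name__ == "__main__":
    files = find_raw_csvs()
    if files:
        print(f"Found {len(files)} CSV file(s), e.g. {files[0]}")
    else:
        print("No CSV files found in m4_data/raw_files; download them from PhysioNet first.")
```

Run it from the project directory; an empty result usually means the download landed in a different folder.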
Configure Your Client:

For Claude Desktop (Shortcut):
```
uv run m4 config claude --quick
```
For Other Clients (Cursor, LibreChat, etc.):
```
uv run m4 config --quick
```
This generates the configuration JSON you need to paste into your client's settings.

Option B: BigQuery (Full Cloud Dataset)

Best for researchers with Google Cloud access.

Authenticate with Google:
```
gcloud auth application-default login
```
Configure Client:
```
uv run m4 config --backend bigquery --project_id BIGQUERY_PROJECT_ID
```
This also generates the configuration JSON you need to paste into your client's settings.

3. Start Asking Questions!

Restart your MCP client and try:

"What tools do you have for MIMIC-IV data?"
"Show me patient demographics from the ICU"
"What is the race distribution in admissions?"

🔄 Managing Datasets

Switch between available datasets instantly:

```
# Switch to full dataset
m4 use mimic-iv-full

# Switch back to demo
m4 use mimic-iv-demo

# Check status
m4 status
```

Backend Comparison

| Feature | DuckDB (Demo) | DuckDB (Full) | BigQuery (Full) |
|---|---|---|---|
| Cost | Free | Free | BigQuery usage fees |
| Setup | Zero config | Manual download | GCP credentials required |
| Credentials | Not required | PhysioNet | PhysioNet |
| Data Size | 100 patients | 365k patients | 365k patients |
| Speed | Fast (local) | Fast (local) | Network latency |
| Use Case | Learning | Research (local) | Research, production |

➕ Adding Custom Datasets

M4 is designed to be modular. You can add support for any tabular dataset on PhysioNet easily. Let's take eICU as an example:

JSON Definition Method

Create a definition file: m4_data/datasets/eicu.json

```
{
  "name": "eicu",
  "description": "eICU Collaborative Research Database",
  "file_listing_url": "https://physionet.org/files/eicu-crd/2.0/",
  "subdirectories_to_scan": [],
  "primary_verification_table": "eicu_crd_patient",
  "tags": ["clinical", "eicu"],
  "requires_authentication": true,
  "bigquery_project_id": "physionet-data",
  "bigquery_dataset_ids": ["eicu_crd"]
}
```

Initialize it:
```
m4 init eicu --src /path/to/raw/csvs
```
M4 will convert CSVs to Parquet and create DuckDB views automatically.
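
A typo in the definition file is easy to make, so a quick sanity check before running `m4 init` can save a round trip. The sketch below is not part of M4. It only checks for the keys that appear in the eICU example above, and treating all of them as required is an assumption; the helper name `check_definition` is hypothetical.

```python
import json

# Keys and value types taken from the eicu.json example; whether M4 treats
# every one of them as required is an assumption.
EXPECTED_KEYS = {
    "name": str,
    "description": str,
    "file_listing_url": str,
    "subdirectories_to_scan": list,
    "primary_verification_table": str,
    "tags": list,
    "requires_authentication": bool,
    "bigquery_project_id": str,
    "bigquery_dataset_ids": list,
}


def check_definition(text: str) -> list[str]:
    """Return a list of problems found in a dataset definition (empty if none)."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for key, expected_type in EXPECTED_KEYS.items():
        if key not in data:
            problems.append(f"missing key: {key}")
        elif not isinstance(data[key], expected_type):
            problems.append(f"{key} should be of type {expected_type.__name__}")
    return problems
```

For example, `check_definition(open("m4_data/datasets/eicu.json").read())` returns an empty list when the file matches the shape of the example.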

Alternative Installation Methods

Already have Docker or prefer pip?

🐳 Docker

DuckDB (Local):

```
git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:lite --target lite .
docker run -d --name m4-server m4:lite tail -f /dev/null
```

BigQuery:

```
git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:bigquery --target bigquery .
docker run -d --name m4-server \
  -e M4_BACKEND=bigquery \
  -e M4_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m4:bigquery tail -f /dev/null
```

MCP config (same for both):

```
{
  "mcpServers": {
    "m4": {
      "command": "docker",
      "args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
    }
  }
}
```

pip Install

```
pip install m4-mcp
m4 config --quick
```

Local Development

For contributors:

Clone & Install (using uv):

```
git clone https://github.com/hannesill/m4.git
cd m4
uv venv
uv sync
```

MCP Config:

```
{
  "mcpServers": {
    "m4": {
      "command": "/absolute/path/to/m4/.venv/bin/python",
      "args": ["-m", "m4.mcp_server"],
      "cwd": "/absolute/path/to/m4",
      "env": { "M4_BACKEND": "duckdb" }
    }
  }
}
```
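
The absolute paths in the configuration above are easy to get wrong by hand. As a convenience, the following sketch fills them in from a checkout directory. It mirrors the JSON shown above and nothing more; the helper name `build_mcp_config` is hypothetical, and the `.venv/bin/python` location assumes macOS or Linux (on Windows the interpreter typically lives under `.venv\Scripts`).

```python
import json
from pathlib import Path


def build_mcp_config(repo_dir: str, backend: str = "duckdb") -> dict:
    """Build the `mcpServers` entry shown above for a local checkout."""
    repo = Path(repo_dir).resolve()  # turn a relative path into an absolute one
    return {
        "mcpServers": {
            "m4": {
                "command": str(repo / ".venv" / "bin" / "python"),
                "args": ["-m", "m4.mcp_server"],
                "cwd": str(repo),
                "env": {"M4_BACKEND": backend},
            }
        }
    }


if __name__ == "__main__":
    # Run from the repository root and paste the output into your client's settings.
    print(json.dumps(build_mcp_config("."), indent=2))
```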

🔧 Advanced Configuration

Interactive Config Generator:

```
m4 config
```

OAuth2 Authentication: For secure production deployments:

```
m4 config claude --enable-oauth2 \
  --oauth2-issuer https://your-auth-provider.com \
  --oauth2-audience m4-api
```

See docs/OAUTH2_AUTHENTICATION.md for details.

🛠️ Available MCP Tools

get_database_schema: List all available tables
get_table_info: Get column info and sample data
execute_mimic_query: Execute SQL SELECT queries
get_icu_stays: ICU stay info & length of stay
get_lab_results: Laboratory test results
get_race_distribution: Patient race statistics

Example Prompts

Demographics:

What is the race distribution in MIMIC-IV admissions?
Show me patient demographics for ICU stays

Clinical Data:

Find lab results for patient X
What lab tests are most commonly ordered?

Exploration:

What tables are available in the database?

Troubleshooting

"Parquet not found": Rerun m4 init <dataset_name>.
MCP client not starting: Check logs (Claude Desktop: Help → View Logs).
BigQuery Access Denied: Run gcloud auth application-default login and verify project ID.

Contributing & Citation

For Developers

We welcome contributions!

Setup: Follow the "Local Development" steps above.
Test: Run `uv run pre-commit run --all-files` to check that everything passes and is linted.
Submit: Open a Pull Request with your changes.

Citation:

M4 is built upon M3. Please cite the original work:

```
@article{attrach2025conversational,
  title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
  author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
  journal={arXiv preprint arXiv:2507.01053},
  year={2025}
}
```

Related Projects

M4 is a fork of M3. It adds features that cannot be included in the M3 repository, which needs to stay in sync with the M3 paper.

M3 has been forked and adapted by the community, and we would love to see these projects adapted to M4 over time:

MCPStack-MIMIC - Integrates M3 with other MCP servers (Jupyter, sklearn, etc.)

Built with ❤️ for the medical AI community

Need help? Open an issue on GitHub or check our troubleshooting guide above.
