Query tabular PhysioNet medical data using natural language through MCP clients
Transform medical data analysis with AI! Ask questions about MIMIC-IV and other PhysioNet datasets in plain English and get instant insights. Choose between local data (free) and the full cloud dataset (BigQuery).
M4 is built upon M3.
Please acknowledge the original authors and cite their work.
M4 acts as a bridge between your AI Client (like Claude Desktop, Cursor, or LibreChat) and your medical data.
- You ask a question in your chat interface: "How many patients in the ICU have high blood pressure?"
- M4 securely translates this into a database query.
- M4 runs the query on your local or cloud data.
- The LLM explains the results to you in plain English.
No SQL knowledge required.
- Natural Language Queries: Ask questions about your medical data in plain English
- Modular Datasets: Support for any tabular PhysioNet dataset (MIMIC-IV, etc.)
- Local DuckDB + Parquet: Fast local queries using Parquet files with DuckDB views
- BigQuery Support: Access full MIMIC-IV dataset on Google Cloud
- Enterprise Security: OAuth2 authentication with JWT tokens and rate limiting
- SQL Injection Protection: Read-only queries with comprehensive validation
- Extensible Architecture: Easily add new custom datasets via configuration or CLI
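To make the read-only guarantee in the list above more concrete, here is a minimal sketch of what such a check could look like. It is not M4's actual validator: the function name `is_read_only_query` and the keyword list are assumptions made for illustration, and a production validator would rely on a real SQL parser rather than keyword matching.

```python
import re

# Hypothetical keyword list for illustration; M4's real validation rules may differ.
FORBIDDEN = {"insert", "update", "delete", "drop", "alter", "create",
             "attach", "copy", "pragma", "truncate", "grant"}


def is_read_only_query(sql: str) -> bool:
    """Return True only for a single SELECT/WITH statement with no write keywords."""
    stripped = sql.strip().rstrip(";").strip()
    if not stripped or ";" in stripped:
        # Empty input, or a remaining semicolon (conservatively treated as
        # several chained statements, even if it sits inside a string literal).
        return False
    # Drop quoted string literals so their contents cannot trigger keyword matches.
    without_literals = re.sub(r"'(?:[^']|'')*'", "''", stripped)
    words = re.findall(r"[a-z_]+", without_literals.lower())
    if not words or words[0] not in {"select", "with"}:
        return False
    return FORBIDDEN.isdisjoint(words)


print(is_read_only_query("SELECT race, COUNT(*) FROM admissions GROUP BY race;"))
print(is_read_only_query("SELECT 1; DELETE FROM admissions"))
```

The check is deliberately conservative: it rejects anything it cannot classify as a single read-only statement, which is usually the safer default for clinical data.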
You need an MCP-compatible Client to use M4. Popular options include Claude Desktop, Cursor, and LibreChat.
We use uvx to run the MCP server efficiently.
macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
Windows (PowerShell):
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
Select Option A (Local) or Option B (Cloud).
Best for development, testing, and offline use.
- Create project directory:
mkdir m4 && cd m4
- Initialize Dataset:
We will use MIMIC-IV as an example.
For Demo (Auto-download ~16MB):
uv init && uv add m4-mcp
uv run m4 init mimic-iv-demo
For Full Data (Requires Manual Download): Download CSVs from PhysioNet first and place them in m4_data/raw_files.
uv init && uv add m4-mcp
uv run m4 init mimic-iv-full
This can take 5-15 minutes depending on your machine.
- Configure Your Client:
For Claude Desktop (Shortcut):
uv run m4 config claude --quick
For Other Clients (Cursor, LibreChat, etc.):
uv run m4 config --quick
This generates the configuration JSON you need to paste into your client's settings.
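If your client already has other MCP servers configured, pasting by hand can overwrite them. The sketch below shows one way to merge an entry into an existing settings dictionary without touching the others. The helper name `merge_mcp_server` is hypothetical, the `other` server is made up, and the `m4` entry is a stand-in copied from the Docker configuration shown later in this README; in practice you would use whatever JSON `m4 config` printed for you.

```python
import json


def merge_mcp_server(client_config: dict, name: str, entry: dict) -> dict:
    """Return a copy of client_config with entry registered under mcpServers[name]."""
    merged = dict(client_config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Existing client settings with one unrelated server (made-up example).
existing = {"mcpServers": {"other": {"command": "other-server", "args": []}}}

# Stand-in entry; replace it with the JSON generated by `m4 config`.
m4_entry = {
    "command": "docker",
    "args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"],
}

updated = merge_mcp_server(existing, "m4", m4_entry)
print(json.dumps(updated, indent=2))
```

The function copies the dictionaries it modifies, so the original settings object is left unchanged if you want to compare before and after.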
Best for researchers with Google Cloud access.
- Authenticate with Google:
gcloud auth application-default login
- Configure Client:
uv run m4 config --backend bigquery --project_id BIGQUERY_PROJECT_ID
This also generates the configuration JSON you need to paste into your client's settings.
Restart your MCP client and try:
- "What tools do you have for MIMIC-IV data?"
- "Show me patient demographics from the ICU"
- "What is the race distribution in admissions?"
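For a sense of what happens behind a question like the last one, the sketch below runs one plausible read-only query against a toy table. Everything here is illustrative: in-memory SQLite stands in for the real DuckDB or BigQuery backend, the rows are made up, and the table name `admissions` is an assumption about how the data is exposed. The SQL that the LLM actually generates may differ.

```python
import sqlite3

# In-memory SQLite stands in for the real backend; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE admissions (hadm_id INTEGER, race TEXT)")
conn.executemany(
    "INSERT INTO admissions VALUES (?, ?)",
    [(1, "WHITE"), (2, "ASIAN"), (3, "WHITE"), (4, "UNKNOWN")],
)

# One plausible read-only query for "What is the race distribution in admissions?"
sql = """
    SELECT race, COUNT(*) AS n
    FROM admissions
    GROUP BY race
    ORDER BY n DESC, race
"""
rows = conn.execute(sql).fetchall()
print(rows)  # [('WHITE', 2), ('ASIAN', 1), ('UNKNOWN', 1)]
```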
Switch between available datasets instantly:
# Switch to full dataset
m4 use mimic-iv-full
# Switch back to demo
m4 use mimic-iv-demo
# Check status
m4 status
| Feature | DuckDB (Demo) | DuckDB (Full) | BigQuery (Full) |
|---|---|---|---|
| Cost | Free | Free | BigQuery usage fees |
| Setup | Zero config | Manual Download | GCP credentials required |
| Credentials | Not required | PhysioNet | PhysioNet |
| Data Size | 100 patients | 365k patients | 365k patients |
| Speed | Fast (local) | Fast (local) | Network latency |
| Use Case | Learning | Research (local) | Research, production |
M4 is designed to be modular. You can add support for any tabular dataset on PhysioNet easily. Let's take eICU as an example:
- Create a definition file: m4_data/datasets/eicu.json
{
  "name": "eicu",
  "description": "eICU Collaborative Research Database",
  "file_listing_url": "https://physionet.org/files/eicu-crd/2.0/",
  "subdirectories_to_scan": [],
  "primary_verification_table": "eicu_crd_patient",
  "tags": ["clinical", "eicu"],
  "requires_authentication": true,
  "bigquery_project_id": "physionet-data",
  "bigquery_dataset_ids": ["eicu_crd"]
}
- Initialize it:
m4 init eicu --src /path/to/raw/csvs
M4 will convert CSVs to Parquet and create DuckDB views automatically.
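As a rough picture of the "DuckDB views over Parquet" idea, the sketch below builds one `CREATE VIEW` statement per Parquet file using DuckDB's `read_parquet` table function. It only generates SQL strings and does not reproduce M4's internals: the helper name `view_statements`, the file layout, and the naming of views after file stems are all assumptions for illustration.

```python
from pathlib import PurePosixPath


def view_statements(parquet_paths: list[str]) -> list[str]:
    """Build one CREATE VIEW statement per Parquet file, named after the file stem."""
    statements = []
    for path in sorted(parquet_paths):
        view_name = PurePosixPath(path).stem.replace("-", "_")
        escaped = path.replace("'", "''")  # escape quotes inside the SQL literal
        statements.append(
            f'CREATE OR REPLACE VIEW "{view_name}" AS '
            f"SELECT * FROM read_parquet('{escaped}')"
        )
    return statements


# Hypothetical file layout, for illustration only.
paths = ["m4_data/parquet/eicu/patient.parquet", "m4_data/parquet/eicu/lab.parquet"]
for stmt in view_statements(paths):
    print(stmt)
```

Because a view stores only the query, not the data, the Parquet files remain the single source of truth and no rows are copied into the database file.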
Already have Docker or prefer pip?
DuckDB (Local):
git clone https://github.com/hannesill/m4.git && cd m4
docker build -t m4:lite --target lite .
docker run -d --name m4-server m4:lite tail -f /dev/null
BigQuery:
git clone https://github.com/rafiattrach/m4.git && cd m4
docker build -t m4:bigquery --target bigquery .
docker run -d --name m4-server \
  -e M4_BACKEND=bigquery \
  -e M4_PROJECT_ID=your-project-id \
  -v $HOME/.config/gcloud:/root/.config/gcloud:ro \
  m4:bigquery tail -f /dev/null
MCP config (same for both):
{
"mcpServers": {
"m4": {
"command": "docker",
"args": ["exec", "-i", "m4-server", "python", "-m", "m4.mcp_server"]
}
}
}
pip install m4-mcp
m4 config --quick
For contributors:
- Clone & Install (using uv):
git clone https://github.com/rafiattrach/m4.git
cd m4
uv venv
uv sync
- MCP Config:
{
  "mcpServers": {
    "m4": {
      "command": "/absolute/path/to/m4/.venv/bin/python",
      "args": ["-m", "m4.mcp_server"],
      "cwd": "/absolute/path/to/m4",
      "env": { "M4_BACKEND": "duckdb" }
    }
  }
}
Interactive Config Generator:
m4 config
OAuth2 Authentication: For secure production deployments:
m4 config claude --enable-oauth2 \
  --oauth2-issuer https://your-auth-provider.com \
  --oauth2-audience m4-api
See docs/OAUTH2_AUTHENTICATION.md for details.
- get_database_schema: List all available tables
- get_table_info: Get column info and sample data
- execute_mimic_query: Execute SQL SELECT queries
- get_icu_stays: ICU stay info & length of stay
- get_lab_results: Laboratory test results
- get_race_distribution: Patient race statistics
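You normally never call these tools yourself; the MCP client does it on the LLM's behalf. For the curious, the sketch below builds the kind of JSON-RPC 2.0 `tools/call` message that the Model Context Protocol uses for tool invocations. The helper name `tool_call_request` is hypothetical, and the assumption that `get_database_schema` takes no arguments is ours, not something this README states.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC request ids only need to be unique per session


def tool_call_request(tool_name: str, arguments: Optional[dict] = None) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments or {}},
    }
    return json.dumps(message)


# Assumes the schema tool needs no arguments.
request = tool_call_request("get_database_schema")
print(request)
```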
Demographics:
- What is the race distribution in MIMIC-IV admissions?
- Show me patient demographics for ICU stays
Clinical Data:
- Find lab results for patient X
- What lab tests are most commonly ordered?
Exploration:
- What tables are available in the database?
- "Parquet not found": Rerun m4 init <dataset_name>.
- MCP client not starting: Check logs (Claude Desktop: Help → View Logs).
- BigQuery Access Denied: Run gcloud auth application-default login and verify project ID.
We welcome contributions!
- Setup: Follow the "Local Development" steps above.
- Test: Run uv run pre-commit run --all-files to ensure everything is working and linted.
- Submit: Open a Pull Request with your changes.
Citation:
M4 is built upon M3. Please cite the original work:
@article{attrach2025conversational,
title={Conversational LLMs Simplify Secure Clinical Data Access, Understanding, and Analysis},
author={Attrach, Rafi Al and Moreira, Pedro and Fani, Rajna and Umeton, Renato and Celi, Leo Anthony},
journal={arXiv preprint arXiv:2507.01053},
year={2025}
}
M4 is a fork of M3. It adds features that cannot be included in the M3 repository, because that repository needs to stay in sync with the M3 paper.
M3 has been forked and adapted by the community. Over time, we would love to see these projects adapted to M4 as well:
- MCPStack-MIMIC - Integrates M3 with other MCP servers (Jupyter, sklearn, etc.)
Built with ❤️ for the medical AI community
Need help? Open an issue on GitHub or check our troubleshooting guide above.
