Canopy is an open-source platform for building FAIR-aligned scientific data hubs that support study discovery, data sharing, and metadata validation. It is derived from the NIH RADx Data Hub, a cloud-based platform originally developed for the NIH Rapid Acceleration of Diagnostics (RADx) program during the COVID-19 pandemic. RADx Data Hub is available at https://radxdatahub.nih.gov/ and on GitHub. Canopy redesigns the core components of the RADx Data Hub by removing domain-specific assumptions while retaining essential functionality.
Deploying Canopy to AWS
Start here → 📘 Deployment Guide
Exploring the codebase?
Start here → 🗂️ Main Repository — links to every service, tool, and guide
Canopy runs on AWS as a microservices platform:
- 7 Spring Boot microservices on ECS Fargate, behind an Application Load Balancer
- Next.js / React frontend with server-side rendering
- PostgreSQL (RDS) for relational data persistence
- OpenSearch for full-text and faceted search
- AWS Lambda for asynchronous email processing and search reindexing
- S3 for dataset file storage
- Keycloak for authentication and authorization
- CloudFormation (IaC) for repeatable, auditable AWS deployments
| Repository | Description |
|---|---|
| datahub-service-entity | Direct retrieval of database entities |
| datahub-service-search | Search across studies and variables |
| datahub-service-user | User info, profiles, and support requests |
| datahub-service-submission | Data and study ingestion workflows |
| datahub-service-report | Metrics dashboard and reporting |
| datahub-service-download | Controlled dataset file downloads |
| datahub-service-email | Lambda-based email notifications via AWS SES |
| datahub-lib-keycloak-auth | Shared Keycloak authentication library |
| datahub-project | Maven parent POM for all Java services |
| Repository | Description |
|---|---|
| datahub-ui-main | Next.js / React web application |
| Repository | Description |
|---|---|
| datahub-cloud-replication | AWS CloudFormation templates |
| datahub-development | PostgreSQL schema scripts, seed data, OpenSearch Lambda, Keycloak Docker Compose |
| datahub-docs | Deployment guide, limitations, and operator documentation |
| datahub-deployment-scripts | Automation scripts supporting deployment and operations |
| Repository | Description |
|---|---|
| datahub-cli | CLI for local development and server management |
| datahub-utility-scripts | Automation helpers and publication utilities |