prishagarg/Health_Atlas

🩺 Health Atlas: Autonomous Provider Data Validation Service

An intelligent, full-stack AI system that autonomously verifies, corrects, and enriches healthcare provider data from diverse sources.

Features • Tech Stack • Getting Started • Architecture • API Documentation


🎯 The Problem

Healthcare organizations struggle with one of the industry's most persistent challenges: inaccurate and outdated provider data. Manual validation is time-consuming, error-prone, and doesn't scale. Incorrect provider information leads to:

  • ❌ Patient care disruptions
  • ❌ Revenue loss from denied claims
  • ❌ Regulatory compliance issues
  • ❌ Poor member experience

💡 The Solution

Health Atlas leverages a multi-agent AI system that autonomously validates healthcare provider data at scale, transforming weeks of manual work into minutes of intelligent automation.


✨ Key Features

🚀 Real-Time Bulk Validation

  • Upload CSV files containing provider data and watch the system validate each record in parallel
  • Stream results back to the UI in real-time with live progress tracking
  • Process hundreds of records simultaneously using async architecture
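
The parallel, streamed validation described above can be sketched with plain asyncio. This is a minimal illustration, not the project's actual code: `validate_record`, `validate_all`, and the `max_workers` parameter are hypothetical names, and the sleep stands in for the real network-bound checks.

```python
import asyncio
from typing import AsyncIterator


async def validate_record(record: dict) -> dict:
    """Hypothetical stand-in for the real multi-agent validation call."""
    await asyncio.sleep(0.01)  # simulates network-bound work (NPI lookup, geocoding)
    return {"npi": record["npi"], "status": "validated"}


async def validate_all(records: list, max_workers: int = 5) -> AsyncIterator[dict]:
    """Yield each result as soon as it finishes, with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(record: dict) -> dict:
        async with semaphore:
            return await validate_record(record)

    tasks = [asyncio.create_task(bounded(r)) for r in records]
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def main() -> list:
    records = [{"npi": str(1000000000 + i)} for i in range(20)]
    return [result async for result in validate_all(records)]


results = asyncio.run(main())
print(f"{len(results)} records validated")
```

Results arrive in completion order rather than input order, which is what lets a UI show progress while slower records are still running. Lowering `max_workers` trades speed for a lower chance of hitting API rate limits.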

👁️ Vision Language Model (VLM) Ready

  • Architected specifically for VLM integration to extract structured data from unstructured documents
  • Handle scanned PDFs, image-based documents, and handwritten forms
  • Process documents that traditional text parsers cannot read
  • Ready to integrate: Gemini Vision API, GPT-4 Vision, or a vision-capable Claude 3 model

🎯 Intelligent Prioritization System

  • Priority Score Algorithm: Combines data accuracy (Confidence Score) with business impact (Member Impact)
  • Automatically flag high-risk records for manual review
  • Focus your team's efforts on the most critical data quality issues first
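
This README does not show the exact Priority Score formula, only that it combines the Confidence Score with Member Impact. The sketch below is one plausible reading, under the assumption that both inputs are normalised to the range 0 to 1 and that low confidence should raise priority. The `(1 - confidence) * member_impact` formula and the `REVIEW_THRESHOLD` value are illustrative assumptions, not the project's implementation.

```python
def calculate_priority_score(confidence: float, member_impact: float) -> float:
    """Return a 0-1 review priority: low confidence and high impact rank first.

    Assumption: both inputs are already normalised to [0, 1].
    """
    if not (0.0 <= confidence <= 1.0 and 0.0 <= member_impact <= 1.0):
        raise ValueError("confidence and member_impact must be in [0, 1]")
    return (1.0 - confidence) * member_impact


REVIEW_THRESHOLD = 0.3  # hypothetical cut-off for manual review

records = [
    {"npi": "A", "confidence": 0.95, "member_impact": 0.9},
    {"npi": "B", "confidence": 0.40, "member_impact": 0.8},
    {"npi": "C", "confidence": 0.50, "member_impact": 0.1},
]
for record in records:
    record["priority"] = calculate_priority_score(
        record["confidence"], record["member_impact"]
    )

# Highest priority first, so reviewers see the riskiest records at the top.
review_queue = sorted(records, key=lambda r: r["priority"], reverse=True)
flagged = [r["npi"] for r in review_queue if r["priority"] >= REVIEW_THRESHOLD]
print([r["npi"] for r in review_queue], flagged)
```

Record B (low confidence, high impact) lands first in the queue and is the only one flagged, while record A stays low despite its high impact because its data is already trusted.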

📊 Actionable Reporting & Dashboards

  • Run Summary Dashboard: At-a-glance metrics for every validation job
    • Total records processed
    • Auto-validated vs. flagged records
    • Breakdown of common error types
    • Confidence score distribution
  • Professional PDF Reports: Export clean, shareable reports for stakeholders
  • Email Generation: Auto-generate follow-up emails for flagged providers
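
The Run Summary metrics listed above can be derived from the per-record results with a small aggregation step. The following is a sketch only: the result fields (`confidence`, `errors`), the 0.7 flagging threshold, and the five confidence buckets are assumptions made for illustration.

```python
from collections import Counter


def summarize_run(results: list, flag_threshold: float = 0.7) -> dict:
    """Aggregate per-record results into dashboard-style metrics."""
    flagged = [r for r in results if r["confidence"] < flag_threshold]
    error_counts = Counter(err for r in results for err in r.get("errors", []))
    # Five equal-width buckets; a confidence of exactly 1.0 goes in the top bucket.
    buckets = Counter(min(int(r["confidence"] * 5), 4) for r in results)
    distribution = {
        f"{b * 0.2:.1f}-{(b + 1) * 0.2:.1f}": count
        for b, count in sorted(buckets.items())
    }
    return {
        "total": len(results),
        "auto_validated": len(results) - len(flagged),
        "flagged": len(flagged),
        "error_counts": dict(error_counts),
        "confidence_distribution": distribution,
    }


sample_results = [
    {"npi": "1", "confidence": 0.92, "errors": []},
    {"npi": "2", "confidence": 0.55, "errors": ["address_mismatch"]},
    {"npi": "3", "confidence": 0.35, "errors": ["address_mismatch", "phone_missing"]},
    {"npi": "4", "confidence": 0.88, "errors": []},
]
summary = summarize_run(sample_results)
print(summary)
```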

🤖 Multi-Agent AI Engine

A deterministic AI pipeline where specialized agents collaborate:

| Agent | Role | Capabilities |
|---|---|---|
| 🧠 Data Validation Agent | Baseline Verification | Cross-checks provider info against the official NPI registry, validates physical addresses, verifies credentials |
| 🌐 Information Enrichment Agent | Data Enhancement | Web scraping for missing data, contact information discovery, specialty validation |
| 🔍 Quality Assurance Agent | Integrity Checks | Flags inconsistencies, detects mock/fake licenses, calculates reliability scores |
| 🗂️ Directory Management Agent | Data Synthesis | Standardizes formats, resolves conflicts, generates final validated profiles |
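
To illustrate what a fixed-order pipeline of specialized agents looks like, here is a minimal sketch in plain Python. The project itself uses LangGraph for orchestration. The function bodies below are deliberately trivial placeholders, and every field name (`npi_format_ok`, `reliability`, and so on) is a hypothetical choice for the example.

```python
from functools import reduce
from typing import Callable

Profile = dict
Agent = Callable[[Profile], Profile]


def data_validation_agent(profile: Profile) -> Profile:
    # Placeholder check; the real agent cross-checks the NPI registry.
    npi = profile.get("npi", "")
    return {**profile, "npi_format_ok": npi.isdigit() and len(npi) == 10}


def information_enrichment_agent(profile: Profile) -> Profile:
    # Placeholder; the real agent looks up missing contact data on the web.
    return {**profile, "phone": profile.get("phone") or "unknown"}


def quality_assurance_agent(profile: Profile) -> Profile:
    # Reliability here is simply the share of passed checks.
    checks = [profile["npi_format_ok"], profile["phone"] != "unknown"]
    return {**profile, "reliability": sum(checks) / len(checks)}


def directory_management_agent(profile: Profile) -> Profile:
    # Standardize formats for the final profile.
    return {
        **profile,
        "name": profile["name"].strip().title(),
        "state": profile["state"].upper(),
    }


PIPELINE: list = [
    data_validation_agent,
    information_enrichment_agent,
    quality_assurance_agent,
    directory_management_agent,
]


def run_pipeline(profile: Profile) -> Profile:
    """Pass the profile through every agent in the same order on every run."""
    return reduce(lambda acc, agent: agent(acc), PIPELINE, profile)


final = run_pipeline({"name": "  jane doe ", "npi": "1234567893", "state": "ca", "phone": ""})
print(final)
```

Each agent returns a new dictionary instead of mutating its input, which keeps the intermediate states easy to log and compare.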

🚀 Tech Stack

| Category | Technologies |
|---|---|
| AI Backend | Python 3.10+, FastAPI, LangGraph, Groq API |
| Frontend | React 18, Vite, Tailwind CSS, jsPDF, React Query |
| AI/ML | LangChain, LangGraph, Vision API Integration Layer |
| Data Processing | Pandas, AsyncIO, PyPDF2 |
| Web Automation | Selenium WebDriver |
| APIs & Services | Geoapify (Geocoding), NPI Registry API |
| Development Tools | Faker (test data generation), ESLint, Prettier |

🧩 Getting Started

Prerequisites

Before you begin, ensure you have the following installed: Python 3.10+ (backend), Node.js with npm (frontend), and Git.

⚙️ 1. Clone the Repository

git clone https://github.com/Rupali_2507/Health_Atlas
cd Health_Atlas

🖥️ 2. Backend Setup

# Navigate to backend directory
cd backend

# Create and activate virtual environment
python -m venv .venv

# On Windows
.\.venv\Scripts\activate

# On macOS/Linux
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Configure Environment Variables:

Create a .env file in the backend directory:

# Required API Keys
GROQ_API_KEY="your-groq-api-key-here"
GEOAPIFY_API_KEY="your-geoapify-api-key-here"

# Optional: VLM Integration (Uncomment when ready)
# GOOGLE_API_KEY="your-google-api-key"
# OPENAI_API_KEY="your-openai-api-key"

# Server Configuration
HOST="127.0.0.1"
PORT=8000
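
As an illustration of how the backend might read these values at startup, here is a minimal sketch using only the standard library. It assumes the variables are already present in the process environment (for example, loaded from `.env` by a tool such as python-dotenv, which this README does not confirm). `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    groq_api_key: str
    geoapify_api_key: str
    host: str = "127.0.0.1"
    port: int = 8000


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast if a required key is missing; fall back to defaults otherwise."""
    required = ("GROQ_API_KEY", "GEOAPIFY_API_KEY")
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        groq_api_key=env["GROQ_API_KEY"],
        geoapify_api_key=env["GEOAPIFY_API_KEY"],
        host=env.get("HOST", "127.0.0.1"),
        port=int(env.get("PORT", "8000")),
    )


# Demo with an explicit mapping so the example runs without real keys.
settings = load_settings({"GROQ_API_KEY": "demo", "GEOAPIFY_API_KEY": "demo", "PORT": "9000"})
print(settings.host, settings.port)
```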

Start the Backend Server:

uvicorn main:app --reload

✅ Backend running at: http://127.0.0.1:8000
📚 API Documentation: http://127.0.0.1:8000/docs

💻 3. Frontend Setup

Open a new terminal window:

# Navigate to frontend directory
cd frontend

# Install dependencies
npm install

# Start development server
npm run dev

✅ Frontend running at: http://localhost:5173

🎉 4. Access the Application

Open your browser and navigate to the frontend address: http://localhost:5173


🔬 Backend Deep Dive: Dual-Flow AI Architecture

Health Atlas has two processing flows: a CSV validation pipeline, which is the core of the system, and a document-processing flow designed for VLM integration.

Flow 1: AI Validation Pipeline (Core)

CSV Upload → Parallel Processing → Multi-Agent Analysis → Real-Time Streaming → Summary Report

Key Components:

  1. High-Throughput Async Processing

    • FastAPI backend uses asyncio for concurrent record processing
    • Configurable batch sizes for optimal performance
    • Handles large datasets (10,000+ records) efficiently
  2. Live Streaming Architecture

    • Server-Sent Events (SSE) push results to frontend
    • Real-time progress tracking and log visualization
    • No polling required - true push-based updates
  3. Comprehensive Analysis Pipeline

    • NPI registry cross-validation
    • Address geocoding and verification
    • Website scraping for data enrichment
    • Confidence scoring and flagging logic
  4. Actionable Outputs

    • Downloadable PDF summary reports
    • Prioritized review queue
    • Auto-generated follow-up emails for flagged providers
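
The Server-Sent Events format mentioned above is simple enough to sketch without any framework: each event is a block of `field: value` lines terminated by a blank line. The helper names `format_sse` and `stream_results` and the event names `record` and `done` are assumptions for illustration, not the project's actual identifiers.

```python
import json
from typing import Iterable, Iterator, Optional


def format_sse(data: dict, event: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event; a blank line marks the end of the event."""
    lines = []
    if event:
        lines.append(f"event: {event}")
    # json.dumps emits a single line by default, so one data field is enough.
    lines.append(f"data: {json.dumps(data)}")
    return "\n".join(lines) + "\n\n"


def stream_results(results: Iterable[dict]) -> Iterator[str]:
    """Emit one event per validated record, then a final summary event."""
    total = 0
    for total, result in enumerate(results, start=1):
        yield format_sse({"processed": total, "result": result}, event="record")
    yield format_sse({"processed": total}, event="done")


for chunk in stream_results([{"npi": "1"}, {"npi": "2"}]):
    print(chunk, end="")
```

In a FastAPI endpoint, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser would consume it with `EventSource`.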

Flow 2: VLM Document Processing (Future-Ready)

PDF Upload → VLM Analysis → Structured Extraction → Data Validation → Profile Creation

Currently Implemented:

  • PDF text extraction using PyPDF2
  • Structured data parsing
  • Ready-to-integrate VLM API layer

VLM Integration (Ready to Enable):

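# Example: Gemini Vision integration (sketch; requires the google-generativeai package)
import json
import os

import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
# The model name is an assumption; substitute any vision-capable Gemini model.
model = genai.GenerativeModel("gemini-1.5-flash")

def parse_structured_response(text: str) -> dict:
    """Illustrative stand-in for the project's parser: expects a JSON object in the reply."""
    start, end = text.find("{"), text.rfind("}")
    return json.loads(text[start:end + 1])

def analyze_provider_document_vlm(file_path: str) -> dict:
    """
    Extract structured provider data from any document type using VLM.
    Handles: scanned PDFs, images, handwritten forms, etc.
    """
    file = genai.upload_file(path=file_path)

    prompt = """
    Extract the following provider information and return it as a JSON object:
    - Full Name
    - NPI Number
    - Specialties
    - Address (Street, City, State, ZIP)
    - Phone and Fax
    - License Numbers
    - Accepting New Patients status
    """

    response = model.generate_content([file, prompt])
    return parse_structured_response(response.text)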

🧰 The Agent's Toolkit

Health Atlas agents are powered by specialized tools that handle distinct validation tasks.

| Function | Description | Technology |
|---|---|---|
| search_npi_registry() | 🔎 Connects to official NPI database for baseline verification | NPI Registry API |
| parse_provider_pdf() | 📄 Extracts text from provider documents with broad PDF compatibility | PyPDF2 |
| parse_provider_pdf_vlm() | 👁️ VLM-powered extraction from scanned/image-based documents | Gemini Vision API |
| scrape_provider_website() | 🌐 Dynamically scrapes provider websites for enrichment | Selenium WebDriver |
| validate_address() | 🗺️ Confirms address accuracy with geographic confidence scoring | Geoapify API |
| calculate_priority_score() | 📊 Computes priority based on confidence × member impact | Custom Algorithm |
| generate_follow_up_email() | ✉️ Creates professional email templates for flagged records | LangChain + Groq |
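
As an example of what a tool like `search_npi_registry()` might do before making a network call, the sketch below validates the NPI check digit locally and then builds the lookup URL. The check uses the Luhn algorithm over the NPI prefixed with `80840`, which is how NPI check digits are defined. The endpoint and the `version` and `number` query parameters reflect the public NPPES registry API as the editor understands it and should be confirmed against its documentation. `is_valid_npi` and `build_npi_lookup_url` are hypothetical helper names.

```python
from urllib.parse import urlencode

NPI_REGISTRY_URL = "https://npiregistry.cms.hhs.gov/api/"  # assumed endpoint


def is_valid_npi(npi: str) -> bool:
    """Luhn check over the 10-digit NPI prefixed with 80840."""
    if len(npi) != 10 or not npi.isdigit():
        return False
    digits = [int(d) for d in "80840" + npi]
    total = 0
    # Walk right to left; double every second digit, starting left of the check digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def build_npi_lookup_url(npi: str) -> str:
    """Return the registry query URL, rejecting malformed NPIs without a network call."""
    if not is_valid_npi(npi):
        raise ValueError(f"NPI {npi!r} fails the check-digit test")
    return f"{NPI_REGISTRY_URL}?{urlencode({'version': '2.1', 'number': npi})}"


print(is_valid_npi("1234567893"))
print(is_valid_npi("1234567890"))
print(build_npi_lookup_url("1234567893"))
```

Rejecting malformed numbers locally saves an API round trip per bad record, which matters when a rate-limited free tier is the bottleneck.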

📊 System Architecture

┌─────────────────┐
│   React UI      │  ← User uploads CSV
│   (Frontend)    │  ← Real-time results streaming
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│  FastAPI Server │  ← Async job orchestration
│   (Backend)     │  ← Multi-agent coordination
└────────┬────────┘
         │
    ┌────┴────┐
    ▼         ▼
┌──────────┐ ┌──────────┐
│ LangGraph│ │  Tools   │
│  Agents  │ │  Layer   │
└──────────┘ └──────────┘
    │            │
    └─────┬──────┘
          ▼
    ┌──────────────┐
    │ External APIs│
    │ - NPI Reg    │
    │ - Geoapify   │
    │ - Web Scraper│
    └──────────────┘

📈 Performance & Scalability

  • ⚡ Processing Speed: 100+ records/minute with parallel execution (the single-worker demo run in the Final KPI Summary measured 517 providers/hour)
  • 📦 Batch Processing: Configurable batch sizes for memory optimization
  • 🔄 Async Architecture: Non-blocking I/O for maximum throughput
  • 📊 Scalability: Horizontal scaling ready with minimal configuration

🛣️ Roadmap

Phase 1: Core Validation (✅ Complete)

  • Multi-agent AI pipeline
  • NPI registry integration
  • Address validation
  • Real-time streaming UI
  • PDF reporting

Phase 2: VLM Integration (🚧 In Progress)

  • Gemini Vision API integration
  • Scanned document processing
  • Handwriting recognition
  • Image-based PDF parsing

Phase 3: Advanced Features (📋 Planned)

  • Historical data tracking
  • Automated re-validation scheduling
  • Machine learning-based anomaly detection
  • Multi-tenant architecture
  • API rate limiting and caching
  • Advanced analytics dashboard

Phase 4: Enterprise Ready (🔮 Future)

  • SSO/SAML authentication
  • Role-based access control
  • Audit logging
  • SOC 2 compliance
  • HIPAA compliance features
  • Microservices architecture

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


Final KPI Summary

The system successfully processed all valid records and produced the following final summary:

Analysis of Results vs. Goals

| KPI | Goal | Result | Status |
|---|---|---|---|
| Validation Accuracy | 80%+ | 88.89% | ✅ GOAL ACHIEVED |
| Processing Speed | < 300 sec | ~732 sec | PARTIALLY ACHIEVED* |
| Processing Throughput | 500+/hr | 517 providers/hr | ✅ GOAL ACHIEVED |

*Note on Processing Speed: The 5-minute (300-second) target was missed as a deliberate engineering trade-off for the demo. To guarantee a stable run without hitting API rate limits on the free tier, the number of parallel workers was set to 1. The measured throughput of 517 providers/hour with a single worker suggests the architecture could meet the speed target with more parallel workers and a production-level API key, although that configuration was not measured in this run.
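
The reported figures can be cross-checked with a little arithmetic. The sketch below derives the approximate record count and per-record latency from the two reported numbers, then estimates how many workers the 300-second target would need. That estimate assumes throughput scales linearly with workers and that rate limits do not bind, neither of which is verified here.

```python
import math

elapsed_sec = 732          # reported run time
throughput_per_hr = 517    # reported throughput with one worker
target_sec = 300           # the 5-minute goal

# Records processed, implied by the two reported figures.
records = throughput_per_hr * elapsed_sec / 3600
# Average wall-clock time per record with a single worker.
sec_per_record = 3600 / throughput_per_hr
# Workers needed to hit the target under the linear-scaling assumption.
workers_needed = math.ceil(elapsed_sec / target_sec)

print(f"~{records:.0f} records, ~{sec_per_record:.2f} s per record")
print(f"workers needed for {target_sec} s target: {workers_needed}")
```

Under these assumptions the run covered roughly 105 records at about 7 seconds each, and three parallel workers would bring the same workload under the 5-minute target.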

🙏 Acknowledgments

  • LangChain & LangGraph for the agent orchestration framework
  • Groq for high-speed LLM inference
  • FastAPI for the excellent async web framework
  • React Community for the robust frontend ecosystem

🧭 Vision

Health Atlas represents a step toward self-healing data ecosystems: systems that not only detect but autonomously repair data drift in critical infrastructures like healthcare.

This foundation can scale toward enterprise-grade deployments where data reliability becomes an autonomous service, reducing operational overhead and improving patient outcomes across the healthcare industry.

