Empathetic is an open-source testing framework that evaluates AI models for bias, fairness, and alignment with human values. Built with affected communities, it provides comprehensive test suites, simple CLI tools, and actionable insights to ensure AI systems demonstrate genuine understanding and respect for all people.
- Context-Aware Evaluation: Advanced NLP reduces false positives from 60% to <5%
- Bias Detection: Test for gender, racial, age, and cultural biases with educational content understanding
- Alignment Testing: Evaluate alignment with human values and ethics
- Fairness Assessment: Test fairness across different groups and demographics
- Empathy Evaluation: Assess understanding of human circumstances and dignity
- Safety Testing: Distinguish harmful compliance from proper refusals that offer alternatives
- Intent Classification: Determines whether the AI is educating about harm or perpetuating it
- Dependency Analysis: Full syntactic understanding of negation and context
- Multiple Providers: Support for OpenAI, Anthropic, and HuggingFace models
- Rich Reporting: Generate detailed reports in HTML, JSON, or Markdown
- REST API: Full REST API for programmatic access
- Interactive Setup: Easy configuration with `emp setup` - creates a secure .env file
- API Key Management: Secure local storage and management via `emp keys` commands
- Web Interface: API server with interactive documentation at `/docs`
- Performance Optimization: Batch processing and caching for <500ms evaluations
- Baseline Comparison: Every evaluation compares simple vs context-aware analysis
- Extensive Testing: 100+ unit tests for reliability
# Clone the repository
git clone https://github.com/studio1804/empathetic.git
cd empathetic
# Install Poetry if you don't have it
curl -sSL https://install.python-poetry.org | python3 -
# Install dependencies
poetry install
# Install NLP models (required for context-aware evaluation)
poetry run python -m spacy download en_core_web_sm
# Activate the environment
poetry shell

# Clone the repository
git clone https://github.com/studio1804/empathetic.git
cd empathetic
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in development mode
pip install -e .
# Install NLP models
python -m spacy download en_core_web_sm

# Install specific Python version and create environment
pyenv install 3.11.10
pyenv virtualenv 3.11.10 empathetic
pyenv local empathetic
# Install dependencies
pip install -e .
# Install NLP models
python -m spacy download en_core_web_sm

# Interactive setup (recommended) - creates .env file with API keys
emp setup
# Check your environment and API key configuration
emp env-check
# View configured API keys (masked for security)
emp keys show

# Test a model with default suites
emp test gpt-4
# Test with context-aware evaluation (shows baseline comparison)
emp test gpt-4 --enhanced
# Test specific suites
emp test gpt-4 --suite bias,empathy,employment
# Test all suites including domain-specific
emp test gpt-4 --suite all
# Enable adversarial testing for bias detection
emp test gpt-4 --suite empathy --adversarial
# Show detailed empathy dimension scores
emp test gpt-4 --suite empathy --dimensions
# Generate HTML report
emp test gpt-4 --output html
# Quick check with subset of tests
emp check gpt-4 --suite empathy --quick
# Set custom threshold
emp test claude-3-opus --threshold 0.95 --verbose

# Launch the API server
emp serve
# Visit http://localhost:8000/docs for interactive API documentation
# Visit http://localhost:8000/api/health for health check

The framework uses a .env file to store sensitive configuration like API keys securely:
# Interactive setup (recommended) - creates .env file
emp setup
# Or manually copy and edit the template
cp .env.example .env
nano .env
# Manage API keys via CLI
emp keys show # Show configured keys (masked for security)
emp keys set openai # Set/update OpenAI API key
emp keys set anthropic # Set/update Anthropic API key
emp keys remove openai # Remove a key

Environment Variables:
- `OPENAI_API_KEY` - OpenAI API key (required for OpenAI models)
- `ANTHROPIC_API_KEY` - Anthropic API key (required for Claude models)
- `HUGGINGFACE_API_KEY` - HuggingFace API key (optional)
- `EMPATHETIC_DEFAULT_MODEL` - Default model for testing (default: gpt-3.5-turbo)
- `EMPATHETIC_CONFIG` - Path to YAML config file (default: ./empathetic.yaml)
- `SECRET_KEY` - Secret key for API authentication
- `DEBUG` - Enable debug mode (default: false)
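The fallback behavior for `EMPATHETIC_DEFAULT_MODEL` can be sketched in Python; the helper name below is illustrative, not part of the framework, and only the variable name and documented default come from the list above:

```python
import os

def default_model() -> str:
    """Return the configured default model, falling back to the
    documented default when the variable is unset (illustrative helper)."""
    return os.environ.get("EMPATHETIC_DEFAULT_MODEL", "gpt-3.5-turbo")

os.environ.pop("EMPATHETIC_DEFAULT_MODEL", None)
print(default_model())  # falls back to the documented default
os.environ["EMPATHETIC_DEFAULT_MODEL"] = "gpt-4"
print(default_model())
```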
Security:
- The `.env` file is automatically ignored by git
- API keys are never logged or displayed in full
- Keys are masked when displayed (e.g., `sk-test1...7890`)
- All keys are stored locally only
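The masking shown above (`sk-test1...7890`) keeps a short prefix and the last four characters; a minimal sketch of that behavior, not the framework's actual implementation (prefix/suffix lengths are assumed):

```python
def mask_key(key: str, prefix: int = 8, suffix: int = 4) -> str:
    """Mask an API key for display: keep a short prefix and suffix,
    hide the middle. Illustrative only."""
    if len(key) <= prefix + suffix:
        return "*" * len(key)  # too short to mask meaningfully
    return f"{key[:prefix]}...{key[-suffix:]}"

print(mask_key("sk-test1234567890"))  # sk-test1...7890
```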
Create `empathetic.yaml` in your project directory for test configuration:
test_suites:
  bias:
    enabled: true
    test_files:
      - data/tests/bias_tests.json
  empathy:
    enabled: true
    test_files:
      - data/tests/empathy_tests.json

scoring:
  weights:
    bias: 0.25
    alignment: 0.25
    fairness: 0.2
    safety: 0.2
    empathy: 0.1
  thresholds:
    pass: 0.9
    warning: 0.7

- Gender bias in occupations and roles
- Racial and ethnic assumptions
- Age-related stereotypes
- Cultural and religious biases
- Context-Aware: Distinguishes educational content from bias perpetuation
- Intent Recognition: Understands when AI is explaining vs endorsing biases
- False Positive Reduction: Educational discussions about bias are not flagged
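The weights and thresholds in the configuration example above feed the overall verdict; a sketch of the assumed arithmetic (the weighted-sum aggregation is an assumption here, only the numbers come from the sample config):

```python
# Values taken from the sample empathetic.yaml above
WEIGHTS = {"bias": 0.25, "alignment": 0.25, "fairness": 0.2, "safety": 0.2, "empathy": 0.1}
THRESHOLDS = {"pass": 0.9, "warning": 0.7}

def overall(scores: dict) -> float:
    """Weighted sum of per-suite scores (aggregation method assumed)."""
    return sum(WEIGHTS[suite] * scores[suite] for suite in WEIGHTS)

def verdict(score: float) -> str:
    """Compare an overall score against the configured thresholds."""
    if score >= THRESHOLDS["pass"]:
        return "pass"
    return "warning" if score >= THRESHOLDS["warning"] else "fail"

suite_scores = {"bias": 0.9, "alignment": 0.8, "fairness": 1.0, "safety": 0.9, "empathy": 0.7}
total = overall(suite_scores)
print(round(total, 3), verdict(total))  # 0.875 warning
```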
- Honesty and transparency
- Helpfulness vs. boundaries
- Respect for human autonomy
- Justice and fairness principles
- Human dignity and equality
- Privacy and consent
- Employment and hiring fairness
- Financial decision fairness
- Healthcare equity
- Educational assessment fairness
- Housing and services equity
- Criminal justice fairness
- Algorithmic fairness principles
- Six-Dimension Scoring: Recognition, Understanding, Dignity, Support, Growth, Non-judgment
- Adversarial Testing: Identity variations reveal hidden biases
- Understanding of human circumstances and challenges
- Recognition of systemic barriers and inequities
- Preservation of human dignity and agency
- Awareness of real-world impact of AI decisions
- Economic vulnerability and hardship
- Health challenges and disabilities
- Family circumstances and caregiving
- Immigration status and safety concerns
- Housing insecurity and poverty
- Educational barriers and learning differences
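The six empathy dimensions listed above could be combined into a single suite score like this; the equal weighting and key names are assumptions for illustration, not the framework's actual formula:

```python
# Dimension names from the Six-Dimension Scoring feature above (keys assumed)
DIMENSIONS = ["recognition", "understanding", "dignity", "support", "growth", "non_judgment"]

def empathy_score(dims: dict) -> float:
    """Average the six dimension scores (equal weights assumed)."""
    missing = [d for d in DIMENSIONS if d not in dims]
    if missing:
        raise ValueError(f"missing dimensions: {missing}")
    return sum(dims[d] for d in DIMENSIONS) / len(DIMENSIONS)

dims = {"recognition": 0.9, "understanding": 0.8, "dignity": 1.0,
        "support": 0.7, "growth": 0.6, "non_judgment": 0.8}
print(round(empathy_score(dims), 3))  # 0.8
```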
- Hiring bias and discrimination detection
- Resume gap evaluation (caregiving, health, etc.)
- Criminal record and second chances
- Disability accommodation requests
- Pregnancy and family planning bias
- Age discrimination in hiring
- Mental health workplace stigma
- Immigration status and visa sponsorship
- LGBTQ+ workplace inclusion
- Medical bias in pain assessment
- Racial disparities in treatment
- Gender bias in women's health
- Mental health stigma and treatment
- Disability autonomy in medical care
- Addiction treatment and recovery
- Transgender healthcare access
- Language barriers and interpretation
- Insurance coverage and advocacy
- Violence and harm prevention
- Self-harm intervention
- Illegal activity detection
- Dangerous content blocking
- Medical misinformation prevention
- Fraud and scam protection
- Child safety and protection
- Hate speech detection
- Refusal Detection: Recognizes proper safety refusals vs harmful compliance
- Alternative Suggestions: Detects when AI provides helpful alternatives
- Context Understanding: Educational safety content is allowed
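A toy illustration of refusal-vs-alternative detection; the phrase lists and matching logic below are entirely hypothetical stand-ins for the framework's NLP-based classifier:

```python
# Hypothetical cue phrases -- a real classifier would use dependency
# parsing and intent models rather than substring matching
REFUSAL_CUES = ("i can't help with", "i won't", "i cannot assist")
ALTERNATIVE_CUES = ("instead", "you could try", "consider reaching out")

def classify_response(text: str) -> dict:
    """Flag refusals and helpful alternatives via naive phrase matching."""
    lower = text.lower()
    return {
        "refusal": any(cue in lower for cue in REFUSAL_CUES),
        "offers_alternative": any(cue in lower for cue in ALTERNATIVE_CUES),
    }

print(classify_response(
    "I can't help with that. Instead, consider reaching out to a professional."
))
```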
emp setup # Interactive setup wizard - creates .env file
emp env-check # Check environment configuration and API keys
emp keys show # Display configured API keys (masked)
emp keys set PROVIDER # Set API key for provider (openai, anthropic, huggingface)
emp keys remove PROVIDER # Remove API key for provider

emp serve # Launch API server on http://localhost:8000
emp serve --host 0.0.0.0 # Bind to all interfaces
emp serve --port 8080 # Use custom port
emp serve --reload # Enable auto-reload for development

emp test MODEL # Run all tests on a model
emp test MODEL --enhanced # Use context-aware evaluation with comparison
emp test MODEL --suite SUITE1,SUITE2 # Run specific test suites
emp test MODEL --quick # Run subset of tests for faster feedback
emp test MODEL --threshold 0.95 # Set custom passing threshold
emp test MODEL --output html # Generate HTML report
emp test MODEL --verbose # Verbose output with detailed information

emp check MODEL --suite SUITE # Quick check against specific suite
emp check MODEL --suite bias --quick # Quick bias check

emp capabilities MODEL # Detect model capabilities and get recommendations
emp capabilities MODEL --verbose # Detailed capability analysis

emp report --format html # Generate HTML report from results
emp report --input results.json --format markdown # Convert results

The Empathetic framework includes a REST API for web integration.
# Start the API server
emp serve
# Custom configuration
emp serve --host 0.0.0.0 --port 8080 --reload

- `POST /api/testing/run` - Run test suites on a model
- `GET /api/testing/models` - List available AI models
- `GET /api/testing/suites` - List available test suites
- `GET /api/testing/results/{model}` - Get recent test results for a model
- `POST /api/testing/capabilities/{model}` - Detect model capabilities
- `GET /api/reports/generate/{model}` - Generate report for a model
- `GET /api/reports/download/{model}` - Download report file
- `GET /api/reports/dashboard/{model}` - Get dashboard data
- `GET /api/health/` - Health check
- `GET /api/health/ready` - Readiness check
- `GET /docs` - Interactive API documentation
- `GET /` - API information
# Test a model via API
curl -X POST "http://localhost:8000/api/testing/run" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4",
"suites": ["empathy", "bias"],
"quick_mode": false,
"enable_validation": false
}'
# Get health status
curl http://localhost:8000/api/health/
# List available models
curl http://localhost:8000/api/testing/models
import asyncio
from empathetic.core.tester import Tester
async def test_model():
# Create tester
tester = Tester()
# Run comprehensive tests including domain-specific
results = await tester.run_tests(
model="gpt-4",
suites=["bias", "alignment", "fairness", "empathy", "employment", "healthcare", "safety"],
config={"adversarial": True} # Enable adversarial testing
)
print(f"Overall score: {results.overall_score:.3f}")
# Check individual suite results
for suite_name, result in results.suite_results.items():
print(f"{suite_name}: {result.score:.3f} ({result.tests_passed}/{result.tests_total})")
# Run the test
asyncio.run(test_model())

import requests
# Start API server first: emp serve
# Test a model via REST API
response = requests.post("http://localhost:8000/api/testing/run", json={
"model": "gpt-4",
"suites": ["empathy", "bias"],
"quick_mode": False,
"enable_validation": False
})
if response.status_code == 200:
result = response.json()
print(f"Overall score: {result['overall_score']:.3f}")
print(f"Passed: {result['passed']}")
else:
print(f"Error: {response.status_code}")
# Get available models
models = requests.get("http://localhost:8000/api/testing/models").json()
print("Available models:", [m['id'] for m in models['models']])

Empathetic automatically detects AI model capabilities to optimize testing effectiveness. The system analyzes empathy baselines, bias susceptibility, cultural awareness, and performance characteristics to provide adaptive testing recommendations.
# Detect model capabilities
emp capabilities gpt-4
# Get detailed capability analysis
emp capabilities gpt-4 --verbose

Features:
- Empathy Baseline Detection: Measures core empathy across standard scenarios
- Bias Susceptibility Analysis: Tests how empathy varies by identity markers
- Cultural Awareness Assessment: Evaluates understanding of diverse contexts
- Adaptive Test Configuration: Automatically optimizes test complexity and focus areas
- Smart Test Selection: Recommends optimal test suites based on model strengths
Learn More: Model Capabilities Documentation
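The capability-to-recommendation step might look roughly like this; the thresholds, capability keys, and suite choices below are illustrative guesses, not the framework's actual selection logic:

```python
def recommend_suites(capabilities: dict) -> list:
    """Map detected capability scores to suggested test suites
    (thresholds and keys are assumptions for illustration)."""
    suites = ["alignment", "safety"]  # always worth running
    if capabilities.get("empathy_baseline", 1.0) < 0.8:
        suites.append("empathy")  # weak baseline: probe empathy harder
    if capabilities.get("bias_susceptibility", 0.0) > 0.3:
        suites.append("bias")  # empathy varies by identity markers
    if capabilities.get("cultural_awareness", 1.0) < 0.7:
        suites.append("fairness")
    return suites

print(recommend_suites(
    {"empathy_baseline": 0.75, "bias_susceptibility": 0.4, "cultural_awareness": 0.9}
))
```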
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Submit a pull request
# Install development dependencies
poetry install
# Create .env file for development
emp setup
# Run tests
pytest
# Run linting
black .
ruff check .
# Type checking
mypy .

# Start API server with auto-reload
emp serve --reload
# Run API in development mode
emp serve --host 0.0.0.0 --port 8000 --reload
# Test API endpoints
curl http://localhost:8000/api/health/
curl http://localhost:8000/docs
# Test the API
curl http://localhost:8000/api/testing/models

# Check current configuration
emp env-check
# View all configured API keys
emp keys show
# Test with different providers
emp keys set openai
emp test gpt-4
emp keys set anthropic
emp test claude-3-sonnet

# Run specific test suites
pytest tests/test_api.py
pytest tests/test_cli.py
# Test CLI commands
emp test gpt-3.5-turbo --suite empathy --quick
emp capabilities gpt-4 --verbose
# Test API endpoints
python -c "
import requests
response = requests.get('http://localhost:8000/api/testing/models')
print(response.json())
"

# Check if API keys are configured
emp env-check
# API key not found
emp keys show
emp setup # Re-run setup to configure keys
# Key format validation failed
emp keys set openai # Re-enter key with correct format

# Port already in use
emp serve --port 8001
# Server not responding
curl http://localhost:8000/api/health/ # Check if server is running
emp serve --reload # Restart with auto-reload
# Import errors
pip install -e . # Reinstall in development mode
poetry install # Install all dependencies

# No API key configured
emp setup # Configure API keys first
# Test data files missing (warnings are normal for development)
# These warnings can be ignored during initial setup
# Rate limiting
# Wait a few minutes between test runs
# Use --quick flag for faster testing
emp test gpt-3.5-turbo --suite empathy --quick

# .env file not found
cp .env.example .env
emp setup
# Permission errors
chmod 600 .env # Secure .env file permissions
# Configuration not loading
rm .env && emp setup # Recreate .env file

- Check the logs: `emp serve` shows detailed error messages
- Validate API keys: `emp keys show` to verify configuration
- Test connectivity: `curl http://localhost:8000/api/health/`
- Review configuration: `emp env-check`
- Interactive API docs: Visit `http://localhost:8000/docs`
For additional support, please check the GitHub Issues or create a new issue.