From the New York State Senate
Dual BSD/GPL License. See the NYSenate licensing page http://www.nysenate.gov/Open-Source-Software-Licenses.
OpenLegislation is a comprehensive platform for accessing and analyzing legislative data from multiple sources. Originally developed by the New York State Senate for NY State legislative data, it has evolved into a unified platform that aggregates legislative information from the federal government and all 50 states.
Core Mission: Democratize access to legislative information through advanced technology, AI-powered analysis, and developer-friendly APIs.
Data Sources:
- NY State LBDC: Real-time NY State legislative data (original source)
- Congress.gov: Official U.S. Congress legislative information
- GovInfo.gov: Bulk federal legislative data and documents
- OpenStates: Legislative data for all 50 states in a unified format
Key Capabilities:
- Multi-source data aggregation and harmonization
- Real-time data processing and updates
- AI-powered semantic search and content analysis
- Comprehensive analytics and trend identification
- Developer-friendly APIs and SDKs
- Advanced research and comparison tools
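As a rough illustration of multi-source aggregation and harmonization, the sketch below maps raw records from two of the listed sources onto one shared record type. The `UnifiedBill` class, the `harmonize` function, and every raw field name (`type`, `number`, `state`, `bill_id`) are hypothetical and chosen for the example; they are not taken from the OpenLegislation code base or from the upstream APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UnifiedBill:
    """Hypothetical shared record type; not the project's actual model."""
    source: str        # upstream feed the record came from
    jurisdiction: str  # "us" for federal, otherwise a lowercase state code
    identifier: str    # bill number as printed by the source
    title: str


def from_congress_gov(raw: dict) -> UnifiedBill:
    # Assumed raw shape: {"type": "HR", "number": 1, "title": "..."}
    return UnifiedBill(
        source="congress.gov",
        jurisdiction="us",
        identifier=f"{raw['type']} {raw['number']}",
        title=raw["title"].strip(),
    )


def from_openstates(raw: dict) -> UnifiedBill:
    # Assumed raw shape: {"state": "NY", "bill_id": "S 100", "title": "..."}
    return UnifiedBill(
        source="openstates",
        jurisdiction=raw["state"].lower(),
        identifier=raw["bill_id"],
        title=raw["title"].strip(),
    )


# One normalizer per source keeps source-specific quirks out of shared code.
NORMALIZERS = {
    "congress.gov": from_congress_gov,
    "openstates": from_openstates,
}


def harmonize(source: str, raw: dict) -> UnifiedBill:
    """Convert a raw source record into the shared record type."""
    try:
        normalizer = NORMALIZERS[source]
    except KeyError:
        raise ValueError(f"no normalizer registered for source {source!r}") from None
    return normalizer(raw)
```

Registering one function per source means a new feed can be added without touching the code that consumes `UnifiedBill` records.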
Updates to legislative data are processed in real time from multiple sources and redistributed through unified APIs for integration with web applications. The platform is developed and run using open-source technologies and frameworks, including:
- Java 17
- Spring 5 Framework
- PostgreSQL
- Elasticsearch 8
- React
- Tomcat 9
This repository includes comprehensive automated PR management:
- Auto-merges safe Dependabot updates
- Provides automated code review feedback
- Automatically labels and categorizes PRs
- Generates weekly PR dashboards
- Manages stale PRs
Learn More | Setup Guide
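This README does not define what makes a Dependabot update "safe". One common convention, assumed here purely for illustration, is to treat patch and minor version bumps as safe and to leave major bumps for human review. The function names below are hypothetical and do not describe the repository's actual workflow.

```python
def parse_version(text: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH' (optionally prefixed with 'v') into integers."""
    if text.startswith("v"):
        text = text[1:]
    parts = text.split(".")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {text!r}")
    return tuple(int(p) for p in parts)


def is_safe_update(old: str, new: str) -> bool:
    """Assumed policy: a real upgrade that keeps the major version unchanged."""
    old_v, new_v = parse_version(old), parse_version(new)
    return new_v > old_v and new_v[0] == old_v[0]
```

Under this assumed policy, `is_safe_update("1.2.3", "1.3.0")` is accepted while `is_safe_update("1.2.3", "2.0.0")` is held back for manual review.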
New! Deploy your own AI-powered code review webhook server:
- Uses OpenRouter AI agents (Claude, GPT-4, etc.) for intelligent code review
- Provides detailed analysis: security, bugs, style, performance
- Auto-merge capability based on AI review scores
- Designed for homelab deployment with Docker
Webhook Server Guide | Setup Instructions
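The auto-merge decision can be pictured as a simple gate on the review score. The sketch below assumes the 1-10 quality score mentioned in the webhook-server/ description further down; the `ReviewResult` fields, the default threshold of 8, and the rule that any security finding blocks a merge are illustrative assumptions rather than the server's actual behavior.

```python
from dataclasses import dataclass


@dataclass
class ReviewResult:
    """Hypothetical summary of one AI review."""
    score: int              # overall quality score on a 1-10 scale
    security_findings: int  # number of security issues the review flagged


def should_auto_merge(review: ReviewResult, *, enabled: bool = False,
                      min_score: int = 8) -> bool:
    """Return True only if auto-merge is enabled and the review clears the bar."""
    if not 1 <= review.score <= 10:
        raise ValueError(f"score out of range 1-10: {review.score}")
    if not enabled:
        # Auto-merge is optional, so the default is to leave the PR alone.
        return False
    return review.score >= min_score and review.security_findings == 0
```

Keeping `enabled` off by default mirrors the "optional" wording: a deployment has to opt in before any score can trigger a merge.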
- Kevin Caseiras
- Ken Zalewski
- Anthony Calabrese
- Jacob Keegan
- Nathan Freitas
- Jared Williams
- Graylin Kim
- Ash Islam
- Sam Stouffer
This repository is organized into several key directories, each serving a specific purpose in the OpenLegislation ecosystem:
- project_summary.md - High-level project overview and capabilities
- knowledge_base.md - Essential notes and crucial information for understanding the repository
- README.md - Main project documentation
- docs/ - Comprehensive documentation including setup guides, API references, and development docs
- demos/ - Demo scripts and sample outputs
- logs/ - Application and ingestion log files
- scripts/ - Setup and utility scripts
- src/ - Java source code for the OpenLegislation application
  - main/ - Main application code including API controllers, data processors, and business logic
  - test/ - Unit and integration tests
  - db/ - Database migration scripts and SQL files
  - pipeline/ - Data processing pipeline components
  - vector/ - Vector database and semantic search components
- pom.xml - Maven build configuration for the Java application (Java 17, Spring 5, PostgreSQL, Elasticsearch 8)
- frontend/ - Next.js-based web interface for data ingestion management
  - Parameter-based filtering for downloading datasets
  - Real-time monitoring of ingestion progress
  - Data viewer for browsing ingested data
  - AI-enhanced processing capabilities
- tools/ - Python utilities and scripts for data ingestion and analysis
  - ingest_*.py - Scripts for pulling legislative data from Congress.gov, GovInfo, and other sources
  - install_*.sh - Infrastructure provisioning scripts (Elasticsearch, PostgreSQL, Tomcat, etc.)
  - research/ - Reproducible analysis pipelines for legislative research
    - Bill text analysis (TF-IDF, topic modeling, sentiment analysis)
    - Social media research and engagement tracking
    - Member activity summaries and statistics
  - See tools/README.md for detailed documentation
- bin/ - Operational scripts for running the application
  - run.sh - Application startup script
  - cron.sh - Scheduled task management
  - elasticsearch.sh - Elasticsearch management utilities
  - website_cron_*.sh - Website synchronization scripts
  - xferdata.sh - Data transfer utilities
- infra/ - Infrastructure as Code (IaC) configurations
  - terraform/ - Terraform configurations for cloud infrastructure
  - pulumi/ - Pulumi configurations for infrastructure management
  - scripts/ - Infrastructure management scripts
- ansible/ - Ansible playbooks for configuration management
  - Automated deployment configurations
  - GitLab integration setup
  - Server provisioning playbooks
- webhook-server/ - AI-powered PR review and auto-merge webhook server
  - OpenRouter AI integration (Claude, GPT-4, etc.)
  - Automated code review with security, bug, and style analysis
  - Quality scoring system (1-10) for PRs
  - Optional auto-merge based on thresholds
  - Designed for self-hosted deployment
  - See webhook-server/README.md for setup
- .github/ - GitHub Actions workflows and automation
  - Auto-merge for safe Dependabot updates
  - Automated code review feedback
  - PR labeling and categorization
  - Weekly PR dashboards
  - Stale PR management
- docs/ - Comprehensive project documentation
  - backend/ - Backend development guides
  - api/ - API documentation and reference
  - external_docs/ - Third-party integration documentation
  - Federal data integration guides (Congress.gov, GovInfo)
  - Database schema documentation
  - Deployment and setup guides
  - Automation and ingestion guides (moved from root)
  - See docs/pr-automation-README.md for PR automation details
- jmeter/ - JMeter load testing configurations
  - API load test scripts
  - Performance benchmarking tools
- models/ - Python data models for legislative entities
  - Bill, agenda, calendar, committee models
  - Member and person data structures
  - Spotcheck and quality assurance models
- .env.example - Environment variable template
- README_DEV.md - Local development quickstart guide
- requirements.txt - Python dependencies
- setup_user.sh - User environment setup script
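The research pipelines under tools/ list TF-IDF among their bill text analysis methods. For readers unfamiliar with the technique, here is a minimal pure-Python sketch of the textbook formulation (term frequency times the log of inverse document frequency). It is not the code in tools/research/, which may use a library or a different weighting variant.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each tokenized document.

    docs is a list of token lists. The weight of a term in a document is
    (count in document / document length) * log(N / documents containing term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy example with two pre-tokenized "bills".
bills = [["tax", "bill", "tax"], ["bill", "school"]]
weights = tf_idf(bills)
```

In this toy corpus the word "bill" appears in every document, so its weight is zero, while "tax" and "school" receive positive weights because each is specific to one document.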
OpenLegislation has evolved to address comprehensive legislative data needs:
- Comprehensive Data Coverage - Provide free, open access to NY State, federal, and all 50 state legislative information through unified APIs
- Multi-Source Integration - Aggregate and harmonize data from NY State LBDC, Congress.gov, GovInfo.gov, and OpenStates into a single platform
- Real-time Processing - Parse and redistribute legislative updates in real-time from all sources with <15 minute latency
- AI-Powered Analysis - Incorporate semantic search, ML-powered insights, and automated content analysis for legislative intelligence
- Developer-Friendly Platform - Offer well-documented REST APIs, SDKs, and tools for easy integration with web applications and research projects
- Advanced Research Tools - Support policy research through comparative analysis, trend identification, and predictive analytics across jurisdictions
- Open Source Collaboration - Foster transparency and community contributions through dual BSD/GPL licensing and active community engagement
- Modern Infrastructure - Leverage cloud-native technologies (Java 17, Spring 5, PostgreSQL, Elasticsearch, pgvector) for scalability and reliability
- Cross-Jurisdiction Analytics - Enable comparative analysis between federal and state legislation, tracking policy diffusion and influence
- Enterprise-Grade Quality - Maintain 99.9% uptime, >99% data accuracy, and comprehensive security compliance
- Comprehensive Coverage: Federal + all 50 states legislative data
- Real-time Synchronization: Updates within 15 minutes from all sources
- Data Harmonization: Unified data model across different legislative structures
- Intelligent Deduplication: Advanced entity resolution and duplicate detection
- Semantic Search: Natural language queries with vector embeddings
- Content Analysis: Automated bill classification, sentiment analysis, and summarization
- Predictive Analytics: Bill passage probability and trend identification
- Comparative Analysis: Cross-jurisdiction legislation comparison and influence tracking
- Comprehensive APIs: RESTful APIs with OpenAPI documentation
- SDKs & Tools: Client libraries for popular programming languages
- Sandbox Environment: Safe testing environment for development
- Community Support: Developer forums, documentation, and tutorials
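The semantic search feature above is described as natural language queries over vector embeddings. The core ranking step in such a system is typically a similarity comparison between a query vector and stored document vectors. The sketch below uses cosine similarity on tiny hand-written vectors; the function names and the vectors are illustrative assumptions, and a production setup would more likely delegate this to the database (pgvector is named elsewhere in this README) than compute it in Python.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_documents(query_vec, doc_vecs, top_k=3):
    """Return up to top_k (doc_id, similarity) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical two-dimensional embeddings; real embeddings have hundreds of dimensions.
documents = {"bill-a": [1.0, 0.0], "bill-b": [0.0, 1.0], "bill-c": [1.0, 1.0]}
results = rank_documents([1.0, 0.0], documents)
```

Here `results` orders the documents by how closely their direction matches the query vector, which is what lets a query match bills that share meaning rather than exact keywords.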
- Enhanced Tasks Management: Comprehensive project roadmap with microgoals and success criteria
- Software Requirements Specification: Detailed technical requirements and validation criteria
- Data Source Integration: Technical specifications for all data sources
- OpenDiscourse Integration Plan: 8-week phased integration strategy
- Automation Guide: Complete CI/CD and automation documentation
- API Reference: Comprehensive API documentation and examples
Legacy Documentation (archived for reference):
- Comparative Analysis: Track policy diffusion across states and federal levels
- Trend Identification: Identify emerging legislative trends and patterns
- Impact Assessment: Measure effectiveness and outcomes of legislation
- Custom Reports: Generate tailored analytics and visualizations
- API Response Time: <200ms (95th percentile)
- System Availability: >99.9% uptime
- Data Freshness: <15 minute update latency
- Search Accuracy: >90% relevance score
- Throughput: 10,000+ concurrent users
- Coverage: 100% federal and state legislative data
- Accuracy: >99% data accuracy across all sources
- Completeness: >98% field completeness
- Consistency: Unified data model across all sources
- Developer Adoption: >100 external API users
- Research Usage: >50 academic citations annually
- User Satisfaction: >4.5/5 satisfaction score
- Community Growth: >30% year-over-year contributor growth
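Several of the targets above are stated as percentiles or thresholds, for example an API response time under 200 ms at the 95th percentile. As a sketch of how such a target could be checked against measured samples, the code below uses the nearest-rank percentile definition. The function names are hypothetical, and the README does not say which percentile method or monitoring tool the project actually uses.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_latency_target(samples_ms, target_ms=200.0):
    """Check response-time samples against a 95th-percentile target."""
    return percentile(samples_ms, 95) < target_ms
```

With 20 samples, the 95th percentile under this definition is the 19th smallest value, so a single slow outlier does not fail the target but two do.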
