A powerful and free OCR (Optical Character Recognition) solution for the Go community that bridges the gap between Go's efficiency and Python's rich AI ecosystem.
The OCR market is flooded with expensive, proprietary solutions that often come with limitations:
- Costly licensing fees for commercial OCR APIs
- Limited language support in many solutions
- Vendor lock-in with cloud-based services
- Complex integration requiring specialized knowledge
- Poor accuracy on varied document types
- No local processing options for sensitive documents
This project addresses these pain points with a completely free, locally run OCR solution that builds on established Python AI libraries while keeping Go's performance and simplicity.
Python has already solved many complex problems in the AI and machine learning space with mature, battle-tested libraries. Rather than reinventing the wheel in Go, this project creates a bridge that allows Go developers to harness these powerful Python capabilities:
- Tesseract OCR - A widely used open-source OCR engine, originally developed at HP and later sponsored by Google
- Pillow (the maintained fork of PIL, the Python Imaging Library) - Robust image processing
- PyPDF2 & pdfplumber - Comprehensive PDF text extraction
- Extensive language support - Over 100 languages supported by Tesseract
- PDF Text Extraction: Extract text from PDF documents using multiple extraction methods
- Image OCR: Convert images to text with high accuracy
- Multilingual Support: Supports Portuguese, English, and 100+ other languages
- Caching System: Built-in memory cache to avoid reprocessing the same files
- Fallback Mechanisms: Multiple extraction methods ensure maximum compatibility
- Thread-Safe: Concurrent processing with mutex protection
- Error Handling: Comprehensive error reporting and recovery
- Free & Open Source: No licensing fees or API limits
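The caching and thread-safety features listed above can be pictured with a small sketch. This is not the project's actual cache: `Result`, `ResultCache`, and `GetOrCompute` are hypothetical names, and keying the cache by file path is an assumption made for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// Result is a hypothetical stand-in for an extraction result.
type Result struct {
	Text  string
	Pages int
}

// ResultCache is an in-memory cache keyed by file path (an assumption),
// guarded by a read/write mutex so it can be shared across goroutines.
type ResultCache struct {
	mu      sync.RWMutex
	entries map[string]Result
}

// NewResultCache returns an empty cache.
func NewResultCache() *ResultCache {
	return &ResultCache{entries: make(map[string]Result)}
}

// Get returns the cached result for path, if present.
func (c *ResultCache) Get(path string) (Result, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	r, ok := c.entries[path]
	return r, ok
}

// Put stores a result for path.
func (c *ResultCache) Put(path string, r Result) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.entries[path] = r
}

// GetOrCompute returns the cached result, or runs extract and caches a
// successful result. Two goroutines that miss at the same time may both
// run extract; the map itself stays consistent.
func (c *ResultCache) GetOrCompute(path string, extract func(string) (Result, error)) (Result, error) {
	if r, ok := c.Get(path); ok {
		return r, nil
	}
	r, err := extract(path)
	if err != nil {
		return Result{}, err
	}
	c.Put(path, r)
	return r, nil
}

func main() {
	cache := NewResultCache()
	calls := 0
	extract := func(path string) (Result, error) {
		calls++ // counts how often the expensive step actually runs
		return Result{Text: "hello", Pages: 1}, nil
	}
	cache.GetOrCompute("document.pdf", extract)
	cache.GetOrCompute("document.pdf", extract)
	fmt.Println("extract calls:", calls) // prints: extract calls: 1
}
```

A path-keyed cache will return stale text if a file changes on disk; keying by a content hash would avoid that at the cost of reading the file first.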
The project automatically handles Python dependency installation, but you can install them manually:
pip install PyPDF2 pdfplumber pytesseract Pillow
To build the Docker image for a specific target platform:
docker build --build-arg TARGETOS=darwin --build-arg TARGETARCH=arm64 -t ocr-processor:latest . # macOS, arm64
docker build --build-arg TARGETOS=windows --build-arg TARGETARCH=amd64 -t ocr-processor:latest . # Windows, amd64
docker build --build-arg TARGETOS=linux --build-arg TARGETARCH=amd64 -t ocr-processor:latest . # Linux, amd64
docker build --build-arg TARGETOS=linux --build-arg TARGETARCH=arm64 -t ocr-processor:latest . # Linux, arm64
To install Tesseract on Windows:
- Download the installer from GitHub Tesseract releases
- Run the installer and follow the setup wizard
- Add Tesseract to your system PATH:
  - Default installation path: C:\Program Files\Tesseract-OCR
  - Add this path to your Windows PATH environment variable
- Restart your command prompt
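Once Tesseract is installed on any platform, a Go program can confirm that the binary is reachable before attempting OCR. The sketch below is only an illustration: `findOnPath` is a hypothetical helper, not part of this project.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// findOnPath reports where a binary lives on PATH, or an error if it is missing.
func findOnPath(name string) (string, error) {
	path, err := exec.LookPath(name)
	if err != nil {
		return "", fmt.Errorf("%s not found on PATH: %w", name, err)
	}
	return path, nil
}

func main() {
	path, err := findOnPath("tesseract")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("tesseract found at", path)

	// `tesseract --version` prints the engine version. Some builds write it
	// to stderr, so stdout and stderr are captured together.
	out, err := exec.Command(path, "--version").CombinedOutput()
	if err != nil {
		fmt.Println("could not run tesseract:", err)
		return
	}
	// Show only the first line of the version banner.
	fmt.Println(strings.SplitN(string(out), "\n", 2)[0])
}
```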
Using Homebrew:
brew install tesseract
Using MacPorts:
sudo port install tesseract
Using apt (Debian/Ubuntu):
sudo apt-get update
sudo apt-get install tesseract-ocr
sudo apt-get install tesseract-ocr-por # For Portuguese language support
Using yum (RHEL/CentOS):
sudo yum install tesseract tesseract-langpack-por
or for newer versions:
sudo dnf install tesseract tesseract-langpack-por
The Python executor is the main orchestrator. It manages Python script execution and handles:
- Python environment detection
- Dependency management
- Script execution
- Result parsing
- Caching management
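The detection, execution, and result-parsing responsibilities above might look roughly like the following. This is a sketch under assumptions, not the project's executor: `findPython`, `runScript`, `parseResult`, and `scriptResult` are hypothetical names, and it assumes the Python script takes the file path as its only argument and prints one JSON object to stdout.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"os/exec"
)

// scriptResult is the JSON shape the script is assumed to print.
type scriptResult struct {
	Success  bool   `json:"success"`
	Text     string `json:"text"`
	Error    string `json:"error"`
	Pages    int    `json:"pages"`
	Filename string `json:"filename"`
}

// findPython returns the first Python interpreter found on PATH.
func findPython() (string, error) {
	for _, name := range []string{"python3", "python"} {
		if path, err := exec.LookPath(name); err == nil {
			return path, nil
		}
	}
	return "", errors.New("no Python interpreter found on PATH")
}

// parseResult decodes the script's stdout into a scriptResult.
func parseResult(stdout []byte) (scriptResult, error) {
	var r scriptResult
	if err := json.Unmarshal(stdout, &r); err != nil {
		return scriptResult{}, fmt.Errorf("parse script output: %w", err)
	}
	return r, nil
}

// runScript executes the script with the file path as its only argument
// (an assumed calling convention) and parses what it prints.
func runScript(script, file string) (scriptResult, error) {
	python, err := findPython()
	if err != nil {
		return scriptResult{}, err
	}
	out, err := exec.Command(python, script, file).Output()
	if err != nil {
		return scriptResult{}, fmt.Errorf("run %s: %w", script, err)
	}
	return parseResult(out)
}

func main() {
	// Parsing is shown on a canned payload so the example runs without Python.
	r, err := parseResult([]byte(`{"success":true,"text":"hello","pages":1,"filename":"a.pdf"}`))
	fmt.Println(r.Success, r.Pages, err) // prints: true 1 <nil>
}
```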
PDF text extraction:
- Primary Method: pdfplumber (more accurate)
- Fallback Method: PyPDF2 (broader compatibility)
- Automatic Selection: Chooses the best method based on document type
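The primary/fallback arrangement can be sketched as an ordered list of extraction methods tried in turn. The code below illustrates the pattern only: `extractor` and `extractWithFallback` are hypothetical, and the two methods are stubs rather than real calls into pdfplumber or PyPDF2.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// extractor pairs a method name with a function that returns a PDF's text.
type extractor struct {
	name string
	run  func(path string) (string, error)
}

// extractWithFallback tries each method in order and returns the first
// non-empty text together with the name of the method that produced it.
func extractWithFallback(path string, methods []extractor) (text, used string, err error) {
	var failures []string
	for _, m := range methods {
		t, e := m.run(path)
		if e == nil && strings.TrimSpace(t) != "" {
			return t, m.name, nil
		}
		if e == nil {
			e = errors.New("no text extracted")
		}
		failures = append(failures, m.name+": "+e.Error())
	}
	return "", "", errors.New("all extraction methods failed: " + strings.Join(failures, "; "))
}

func main() {
	// Stubs standing in for the real extraction calls.
	methods := []extractor{
		{name: "pdfplumber", run: func(string) (string, error) { return "", errors.New("simulated failure") }},
		{name: "PyPDF2", run: func(string) (string, error) { return "page text", nil }},
	}
	text, used, err := extractWithFallback("document.pdf", methods)
	fmt.Println(text, used, err) // prints: page text PyPDF2 <nil>
}
```

Treating empty output as a failure matters for scanned PDFs, where a text extractor can succeed yet return nothing.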
Image OCR:
- Language Detection: Attempts Portuguese first, falls back to English
- Image Preprocessing: Automatic RGB conversion
- Format Support: PNG, JPG, JPEG, TIFF, BMP, GIF
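A format check for the supported image types could be as simple as an extension lookup. This is a minimal sketch: `isSupportedImage` is a hypothetical helper, and it inspects only the file extension, not the file contents.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// supportedImageExts lists the extensions for the formats named above.
var supportedImageExts = map[string]bool{
	".png":  true,
	".jpg":  true,
	".jpeg": true,
	".tiff": true,
	".bmp":  true,
	".gif":  true,
}

// isSupportedImage reports whether the path has a supported image
// extension, ignoring case.
func isSupportedImage(path string) bool {
	return supportedImageExts[strings.ToLower(filepath.Ext(path))]
}

func main() {
	fmt.Println(isSupportedImage("scanned_document.PNG")) // prints: true
	fmt.Println(isSupportedImage("notes.docx"))           // prints: false
}
```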
AI prompt integration:
- LangChain Integration: Structured prompt templates for AI processing
- Flexible Configuration: Customizable prompt parameters
- JSON Output: Structured response format
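To show what a structured prompt template with customizable parameters might look like, the sketch below uses Go's standard `text/template` package as a stand-in for LangChain Go, so that it runs without external dependencies. `promptParams`, `promptTmpl`, and `buildPrompt` are hypothetical, and the prompt wording is invented for illustration.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptParams holds the customizable parts of the prompt (hypothetical).
type promptParams struct {
	Language string   // language the model should answer in
	Fields   []string // JSON keys the model should return
	Text     string   // text produced by OCR or PDF extraction
}

// promptTmpl asks for a JSON-only answer so the response can be parsed.
const promptTmpl = `Extract the requested fields from the OCR text below and answer in {{.Language}}.
Return only a JSON object with these keys: {{join .Fields ", "}}.

OCR text:
{{.Text}}`

// buildPrompt renders the template with the given parameters.
func buildPrompt(p promptParams) (string, error) {
	funcs := template.FuncMap{"join": strings.Join}
	t, err := template.New("ocr").Funcs(funcs).Parse(promptTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, p); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	prompt, err := buildPrompt(promptParams{
		Language: "English",
		Fields:   []string{"title", "date", "total"},
		Text:     "INVOICE 2024-01-15 Total: 42.00",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(prompt)
}
```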
type PDFTextResult struct {
	Success  bool   `json:"success"`
	Text     string `json:"text"`
	Error    string `json:"error"`
	Pages    int    `json:"pages"`
	Filename string `json:"filename"`
}

type ImageTextResult struct {
	Success  bool   `json:"success"`
	Text     string `json:"text"`
	Error    string `json:"error"`
	Pages    int    `json:"pages"`
	Filename string `json:"filename"`
}

package main
import (
	"fmt"
	"log"

	"your-project/model"
)

func main() {
	// Initialize the Python executor
	executor := model.NewPythonExecutor()

	// Check and install PDF dependencies
	if err := executor.CheckPythonDependenciesForPDF(); err != nil {
		log.Fatal(err)
	}

	// Extract text from PDF
	result, err := executor.ExtractPDFText("document.pdf")
	if err != nil {
		log.Fatal(err)
	}
	if result.Success {
		fmt.Printf("Extracted %d characters from %d pages\n",
			len(result.Text), result.Pages)
		fmt.Println(result.Text)
	} else {
		fmt.Printf("Error: %s\n", result.Error)
	}

	// Extract text from image
	imageResult, err := executor.ExtractImageText("scanned_document.png")
	if err != nil {
		log.Fatal(err)
	}
	if imageResult.Success {
		fmt.Printf("OCR Result: %s\n", imageResult.Text)
	}

	// Use with AI prompts
	promptInstance := model.NewPromptOCRInstance()
	prompt := promptInstance.GetPrompt(result.Text)
	// Process with your preferred LLM...
	_ = prompt // keeps the example compiling until the prompt is passed to an LLM client
}

This project represents a commitment to advancing AI capabilities within the Go ecosystem. By providing free access to powerful OCR functionality, we aim to:
- Lower barriers to entry for developers interested in document processing
- Accelerate innovation in Go-based AI applications
- Foster collaboration between Go and Python communities
- Enable experimentation without cost constraints
- Support education and research initiatives
This project stands on the shoulders of giants. We extend our gratitude to the following projects and their maintainers:
- Tesseract OCR - Open-source OCR engine, originally developed at HP and later sponsored by Google
- pytesseract - Python wrapper for Tesseract
- PyPDF2 - Pure Python PDF library
- pdfplumber - Detailed PDF text extraction
- Pillow (PIL) - Python Imaging Library
- LangChain Go - Go implementation of LangChain for LLM integration
The system provides comprehensive error handling:
- Dependency Check: Automatic verification and installation of Python packages
- File Validation: Format and existence verification
- Graceful Fallbacks: Multiple extraction methods for maximum compatibility
- Detailed Logging: Clear error messages and debugging information
- Support for more document formats (DOCX, RTF, etc.)
- Batch processing capabilities
- Configuration file support
- Docker containerization
- REST API wrapper
- Performance benchmarking tools
This project is open-source and free to use. Please refer to the LICENSE file for details.
We welcome contributions from the community! Whether it's bug reports, feature requests, or code contributions, every effort helps make this tool better for everyone.
Built with ❤️ for the Go community. Empowering developers to build amazing AI applications without breaking the bank.