This repository demonstrates how to evaluate language model translations using BLEU and ROUGE metrics. It provides a practical Python implementation that uses the H2OGPTE client for translation and the nltk and rouge-score libraries for evaluation. The project highlights the advantages, limitations, and interpretation of these metrics when assessing the quality of machine-generated translations.
- Translate text from English to a target language using the H2OGPTE API.
- Calculate BLEU and ROUGE scores for translation evaluation (see the scoring sketch after this list).
- Understand the strengths and limitations of BLEU and ROUGE metrics.
- Example dataset for testing and evaluation.
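
A minimal sketch of how the scoring could be wired together with nltk and rouge-score; the `evaluate_translation` helper name and the rounding shown here are illustrative, not necessarily the repository's exact code:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction
from rouge_score import rouge_scorer

def evaluate_translation(reference: str, candidate: str) -> dict:
    """Score a candidate translation against a single reference."""
    # BLEU on whitespace tokens, with smoothing so very short
    # sentences do not collapse to zero.
    bleu = sentence_bleu(
        [reference.split()],
        candidate.split(),
        smoothing_function=SmoothingFunction().method1,
    )
    # ROUGE-1, ROUGE-2, and ROUGE-L F1 scores on the raw strings.
    scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=False)
    rouge = scorer.score(reference, candidate)
    return {
        "BLEU": round(bleu, 3),
        "ROUGE-1": round(rouge["rouge1"].fmeasure, 2),
        "ROUGE-2": round(rouge["rouge2"].fmeasure, 2),
        "ROUGE-L": round(rouge["rougeL"].fmeasure, 2),
    }
```

Note that `sentence_bleu` expects a list of tokenized references plus a tokenized candidate, while `RougeScorer.score` works directly on the untokenized strings.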
Requirements:

- Python 3.8+
- Required Python packages:

  ```bash
  pip install h2ogpte nltk rouge-score
  ```

- NLTK resources:

  ```python
  import nltk
  nltk.download('punkt')
  ```
Installation and usage:

- Clone the repository:

  ```bash
  git clone https://github.com/your-repo/evaluate-language-model-translations.git
  cd evaluate-language-model-translations
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Set up the H2OGPTE API keys in a `config.py` file:

  ```python
  H2O_GPT_E_API_KEY = "your_api_key_here"
  REMOTE_ADDRESS = "your_h2ogpte_server_address"
  ```

- Run the script to translate and evaluate (a sketch of the translation call is shown after these steps):

  ```bash
  python evaluate_translations.py
  ```
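
Inside `evaluate_translations.py`, the translation step might look roughly like the sketch below. This assumes the h2ogpte chat-session API and the `config.py` values from the setup step; the `translate` helper, the default target language, and the prompt wording are illustrative rather than the repository's exact code:

```python
from h2ogpte import H2OGPTE

from config import H2O_GPT_E_API_KEY, REMOTE_ADDRESS

# Connect to the H2OGPTE server defined in config.py.
client = H2OGPTE(address=REMOTE_ADDRESS, api_key=H2O_GPT_E_API_KEY)

def translate(text: str, target_language: str = "Spanish") -> str:
    """Ask the model for a translation and return the reply text."""
    chat_session_id = client.create_chat_session_on_default_collection()
    with client.connect(chat_session_id) as session:
        reply = session.query(
            f"Translate the following text into {target_language}. "
            f"Return only the translation:\n{text}",
            timeout=60,
        )
    return reply.content.strip()
```

The candidate string returned here is what gets scored against the reference translation with the metrics sketched earlier.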
Running the script on the example dataset prints the source sentence, the reference translation, the model's candidate, and the metric scores:

```
Source: How are you?
Reference: ¿Cómo estás?
Candidate: ¿Cómo estás?
Metrics: {'BLEU': 0.221, 'ROUGE-1': 1.0, 'ROUGE-2': 1.0, 'ROUGE-L': 1.0}

Source: What is your name?
Reference: ¿Cómo te llamas?
Candidate: ¿Cuál es tu nombre?
Metrics: {'BLEU': 0.0, 'ROUGE-1': 0.67, 'ROUGE-2': 0.50, 'ROUGE-L': 0.60}
```
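
Note that the first example scores only 0.221 on BLEU even though the candidate matches the reference exactly. BLEU is computed from up to 4-gram precision, and a sentence with only two or three tokens has few or no higher-order n-grams, so the smoothed score stays well below 1.0 while the ROUGE scores report a perfect overlap. A small illustration follows; the exact value depends on the tokenization and smoothing method the script uses, so it may not reproduce 0.221:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "¿Cómo estás?".split()   # only two whitespace tokens
candidate = "¿Cómo estás?".split()

# An exact match, yet BLEU stays well below 1.0 because there are
# no 3-grams or 4-grams to count; smoothing keeps the score defined.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```

This sensitivity to sentence length is one reason BLEU is usually reported at the corpus level rather than sentence by sentence.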
Feel free to contribute by submitting issues or pull requests to improve the repository.
This project is licensed under the MIT License.