Currently each batch of ~200 subtitle lines is translated independently. The LLM has no knowledge of what was said in the preceding or following batches.
This causes:
- Inconsistent character names — a name translated one way in batch 1 may differ in batch 2
- Lost context — references to earlier dialogue are missed
- Terminology drift — technical terms or show-specific vocabulary varies between batches
Possible approaches
- Sliding context window — include last N lines from previous batch as read-only context
- Auto-glossary — after batch 1, extract character names and key terms, pass as glossary to subsequent batches
- Two-pass translation — first pass extracts glossary, second pass translates with it
- Larger batches — increase from 200 to 500+ lines (risk: context overflow, slower)
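
The first two approaches could be combined roughly as follows. This is only a sketch under assumptions: `splitIntoBatches` and `buildContextPreamble` are hypothetical helpers that do not exist in `extension/providers.js`, the context size of 10 lines is an arbitrary default, and the glossary is assumed to be a plain object mapping a source term to its agreed translation (how it gets extracted is left open).

```javascript
// Hypothetical helpers for illustration; not part of the current code base.

// Splits all subtitle lines into batches and attaches the last `contextSize`
// lines of the preceding batch to each one as read-only context.
function splitIntoBatches(lines, batchSize = 200, contextSize = 10) {
  const batches = [];
  for (let start = 0; start < lines.length; start += batchSize) {
    batches.push({
      // Tail of the previous batch (empty for the first batch).
      context: lines.slice(Math.max(0, start - contextSize), start),
      // Lines that actually need translating in this request.
      lines: lines.slice(start, start + batchSize),
    });
  }
  return batches;
}

// Renders the extra prompt sections for one batch.
// `glossary` is assumed to be an object: { sourceTerm: agreedTranslation }.
function buildContextPreamble(batch, glossary = {}) {
  const parts = [];
  const terms = Object.entries(glossary);
  if (terms.length > 0) {
    parts.push(
      "Glossary (use these translations consistently):\n" +
        terms.map(([source, target]) => `${source} => ${target}`).join("\n")
    );
  }
  if (batch.context.length > 0) {
    parts.push(
      "Previous lines (context only, do NOT translate or return them):\n" +
        batch.context.join("\n")
    );
  }
  // Empty string when there is neither a glossary nor prior context.
  return parts.join("\n\n");
}
```

The preamble would be prepended to whatever prompt is already built for the batch. Because the first batch has no context and an empty glossary yields no section, the output for that case is an empty string and the existing prompt stays unchanged.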
Trade-offs
More context generally means better quality, but also higher cost and latency, because every request carries more input tokens. The current approach is fast and cheap (~$0.01/episode on Gemini Flash Lite). Any solution should be opt-in or model-aware.
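
One way the "opt-in or model-aware" requirement might look in code is sketched below. Everything here is an assumption for illustration: the setting names (`contextMode`, `contextLines`, `tokensPerLine`), the `maxInputTokens` field, and the default of 20 tokens per line are hypothetical and not taken from the extension.

```javascript
// Hypothetical resolver; setting and field names are illustrative only.
// userSettings.contextMode: "window" enables extra context, anything else disables it.
// model.maxInputTokens: input budget of the selected model, supplied by the caller.
// batchTokenEstimate: rough token count of the batch prompt without extra context.
function resolveContextLines(userSettings, model, batchTokenEstimate) {
  // Opt-in: without an explicit setting, behave exactly as today.
  if (userSettings.contextMode !== "window") return 0;

  const requested = userSettings.contextLines ?? 10;
  const tokensPerLine = userSettings.tokensPerLine ?? 20; // assumed rough average

  // Model-aware: never request more context than the spare input budget allows.
  const spare = model.maxInputTokens - batchTokenEstimate;
  if (spare <= 0) return 0;
  return Math.min(requested, Math.floor(spare / tokensPerLine));
}
```

With this shape, users who change nothing keep the current cost and speed, while a model with a small input budget silently gets fewer context lines instead of an overflowing prompt.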
See extension/providers.js → buildJsonTranslationPrompt() for the current prompt structure.