Improve translation quality with cross-batch context #4

@aveleazer

Currently, each batch of ~200 subtitle lines is translated independently. The LLM has no knowledge of what was said in the preceding or following batches.

This causes:

  • Inconsistent character names — a name translated one way in batch 1 may differ in batch 2
  • Lost context — references to earlier dialogue are missed
  • Terminology drift — technical terms or show-specific vocabulary varies between batches

Possible approaches

  1. Sliding context window — include last N lines from previous batch as read-only context
  2. Auto-glossary — after batch 1, extract character names and key terms, pass as glossary to subsequent batches
  3. Two-pass translation — first pass extracts glossary, second pass translates with it
  4. Larger batches — increase from 200 to 500+ lines (risk: context overflow, slower)
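
A minimal sketch of how approaches 1 and 2 might fit together. All function names here (`splitIntoBatches`, `tailContext`, `mergeGlossary`, `buildContextSection`) are hypothetical and do not exist in the extension. The sketch assumes subtitle lines are plain strings and that glossary terms come back from the model as a simple source-to-target object.

```javascript
// Hypothetical helpers for illustration; not part of the extension's code.

// Split subtitle lines into fixed-size batches (~200 lines per batch today).
function splitIntoBatches(lines, batchSize = 200) {
  const batches = [];
  for (let i = 0; i < lines.length; i += batchSize) {
    batches.push(lines.slice(i, i + batchSize));
  }
  return batches;
}

// Approach 1: take the last N lines of the previous batch as read-only context.
function tailContext(previousBatch, contextLines = 10) {
  // Guard against slice(-0), which would return the whole array.
  if (!previousBatch || contextLines <= 0) return [];
  return previousBatch.slice(-contextLines);
}

// Approach 2: merge newly extracted terms into a running glossary.
// The first translation seen for a term wins, so a later batch cannot rename a character.
function mergeGlossary(glossary, newTerms) {
  const merged = new Map(glossary);
  for (const [source, target] of Object.entries(newTerms)) {
    if (!merged.has(source)) merged.set(source, target);
  }
  return merged;
}

// Build the extra prompt section that would precede the lines to translate.
// Returns an empty string when there is nothing to add.
function buildContextSection(contextLines, glossary) {
  const parts = [];
  if (glossary.size > 0) {
    const entries = [...glossary].map(([s, t]) => `${s} => ${t}`);
    parts.push("Glossary (use these translations consistently):\n" + entries.join("\n"));
  }
  if (contextLines.length > 0) {
    parts.push("Previous lines (context only, do NOT translate):\n" + contextLines.join("\n"));
  }
  return parts.join("\n\n");
}
```

With helpers like these, the translation loop would carry two pieces of state from one batch to the next: the previous batch (for `tailContext`) and the glossary map. Approach 3 would reuse the same glossary, but fill it in a separate first pass before any translation happens.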

Trade-offs

More context means better quality, but also higher cost and latency. The current approach is fast and cheap (~$0.01/episode on Gemini Flash Lite). Any solution should be opt-in or model-aware.
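
One way "opt-in or model-aware" could look, as a rough sketch. The settings shape (`crossBatchContext`, `contextLines`), the `contextWindowTokens` field, and the function names are all assumptions for illustration. The 4-characters-per-token estimate is only a crude heuristic, since real tokenizers differ by model and language.

```javascript
// Hypothetical settings; the extension may store options differently.
const DEFAULT_SETTINGS = { crossBatchContext: false, contextLines: 10 };

// Crude heuristic (assumption): about 4 characters per token.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Pick context lines from the end of the previous batch.
// Returns [] unless the user opted in, and never more than fits the model's budget.
function chooseContextLines(settings, model, batchText, previousBatch) {
  const opts = { ...DEFAULT_SETTINGS, ...settings };
  if (!opts.crossBatchContext || previousBatch.length === 0) return [];

  // Reserve room for the batch itself plus a translation of similar size.
  let budget = model.contextWindowTokens - 2 * estimateTokens(batchText);

  const picked = [];
  for (let i = previousBatch.length - 1; i >= 0 && picked.length < opts.contextLines; i--) {
    const cost = estimateTokens(previousBatch[i]);
    if (cost > budget) break; // stop once the next line no longer fits
    budget -= cost;
    picked.unshift(previousBatch[i]); // keep original line order
  }
  return picked;
}
```

Because the feature is off by default, users who do not opt in would keep the current cost and latency. A model with a small context window would simply receive fewer context lines, or none.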

See buildJsonTranslationPrompt() in extension/providers.js for the current prompt structure.

Labels: enhancement
