aaronsb · Apr 3, 2026
diff --git a/.gitignore b/.gitignore
@@ -77,6 +77,7 @@
 !hooks/ways/**/
 !hooks/ways/**/*.md
 !hooks/ways/**/*.sh
+!hooks/ways/**/*.locales.jsonl
 !hooks/ways/**/*.yaml.template
 !hooks/ways/**/provenance.yaml
 !hooks/ways/**/adr-tool

diff --git a/docs/architecture/system/ADR-107-way-match-corpus-batch-mode-and-locale-support.md b/docs/architecture/system/ADR-107-way-match-corpus-batch-mode-and-locale-support.md
@@ -107,18 +107,52 @@ Way body content (the guidance injected into agent context) is NOT translated. R
 - The guidance is for the agent's reasoning, not displayed to the user
 - Cross-language injection is well-understood: English instructions → non-English output
 
-The ADR-107 Draft's Tier 1/Tier 2 file model (`{name}-{lang}.md` with frontmatter-only stubs) is **deferred**. It solved a real problem (matching vocabulary in the user's language) but the embedding engine solves it better — cross-language semantic matching without per-language vocabulary files. If BM25 is the only engine and a non-Romance language is needed, the tiered file model can be revisited.
+### Native language stubs (shipped)
 
-### Embedding model upgrade path
+The original ADR-107 Draft proposed a tiered file model (`{name}-{lang}.md`). This was initially deferred in favor of cross-language embedding. However, evaluation data showed that native-language stubs dramatically outperform cross-language matching:
 
-The current `all-MiniLM-L6-v2` (21MB, English, 98% accuracy) serves the English-only use case well. For multilingual matching:
+| Language | EN model × EN desc | Multi model × cross-lang | Multi model × native stub |
+|----------|-------------------:|------------------------:|-------------------------:|
+| ja       | -0.03              | 0.69                    | **0.93**                 |
+| ar       | 0.04               | 0.40                    | **0.96**                 |
+| de       | 0.08               | 0.62                    | **0.82**                 |
+| es       | 0.44               | 0.79                    | **0.84**                 |
 
-| Model | Size | Languages | Notes |
-|-------|------|-----------|-------|
-| all-MiniLM-L6-v2 | 21MB | English | Current, shipping |
-| paraphrase-multilingual-MiniLM-L12-v2 | ~120MB | 52 | Same architecture, multilingual training data |
+Native stubs are now the primary multilingual matching strategy. Each stub provides a `description` and `vocabulary` in the target language, scored by the multilingual embedding model.
 
-The upgrade is a model swap — same GGUF format, same `way-embed` binary, same embedding dimensions. `make setup` downloads the appropriate model based on configured language. If `output_language` is `en` or unset, the smaller English model is used. If non-English, the multilingual model is downloaded.
+### Packed locale storage (.locales.jsonl)
+
+Stubs are stored as **packed JSONL**, one file per way, co-located with the way it belongs to:
+
+```
+ea/briefing/
+  briefing.md              # the way (English)
+  briefing.locales.jsonl   # all language stubs
+```
+
+```jsonl
+{"lang":"ja","description":"朝のブリーフィング、昨夜の要約","vocabulary":"朝礼 ブリーフィング 要約 優先事項"}
+{"lang":"de","description":"Morgendliches Briefing, Tagesübersicht","vocabulary":"Morgenbriefing Tagesübersicht Zusammenfassung"}
+```
+
+Design constraints:
+- **`embed_threshold` is optional** in packed format: when an entry omits it, the corpus generator falls back to `0.25`.
+- **No `embed_model`** in packed format — always `"multilingual"` for locale stubs.
+- **Override mechanism**: if `briefing.ja.md` exists as a real file on disk, it supersedes the `ja` entry in `briefing.locales.jsonl`. This allows graduating any stub to a full native-language way with body content.
+- **Co-location over aggregation**: one `.locales.jsonl` per way (not per language, not one global file). Deleting a way means deleting its directory, so its translations go with it.
+
+This replaces the individual `{name}.{lang}.md` stub files (which would grow to 4,000+ files at full language coverage). The packed format keeps the training corpus version-controlled, diffable, and lintable while eliminating file sprawl.
+
+### Dual embedding model (shipped)
+
+Both models ship simultaneously. `make setup` downloads both:
+
+| Model | Size | Languages | Use case |
+|-------|------|-----------|----------|
+| all-MiniLM-L6-v2 | 21MB | English | Precise EN matching (default) |
+| paraphrase-multilingual-MiniLM-L12-v2 | 127MB | 52 | Native-language stub matching |
+
+`ways corpus` splits entries by `embed_model` field into two corpora (`ways-corpus-en.jsonl`, `ways-corpus-multi.jsonl`). The scanner queries both and merges results. Each way's English entry is scored by the EN model; each locale stub is scored by the multilingual model.
 
 `languages.json` defines the supported language set for the multilingual model. Adding a language means verifying it's in the model's training data and adding the entry — no code changes.
 
@@ -161,7 +195,8 @@ This makes model selection empirical: run the tests against candidate models, pi
 ### Neutral
 
 - Way content stays English — no translation infrastructure needed
-- The tiered file model from the original Draft is deferred, not rejected — it becomes relevant if someone needs BM25-only matching in non-Romance languages
+- Packed `.locales.jsonl` replaces per-language stub files — same data, fewer files
+- Override mechanism (`{name}.{lang}.md` supersedes JSONL entry) allows gradual migration from stubs to full native-language ways
 - `ways.json` `output_language: "en"` is the default — zero behavior change for existing users
 
 ## References

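Note on the dual-corpus routing described in the ADR hunk above: the following is a minimal Python sketch of splitting entries by `embed_model` and merging the matches from both models. It is not the `ways` implementation. The function names, the `(way_id, score, threshold)` tuple shape, the fallback to `"en"` when `embed_model` is absent, and the "keep the best score per way" merge rule are all assumptions made for illustration.

```python
from collections import defaultdict

# File names taken from the ADR text; the mapping itself is illustrative.
CORPUS_FILES = {
    "en": "ways-corpus-en.jsonl",
    "multilingual": "ways-corpus-multi.jsonl",
}


def split_by_model(entries):
    """Group corpus entries by their embed_model field.

    Assumption: entries without the field belong to the English corpus.
    """
    corpora = defaultdict(list)
    for entry in entries:
        corpora[entry.get("embed_model", "en")].append(entry)
    return dict(corpora)


def merge_matches(*scored_lists):
    """Merge matches from several models into one ranked list.

    Each item is (way_id, score, threshold). An entry only counts if it
    clears its own threshold; a way matched by both models keeps its
    highest score (hypothetical merge rule).
    """
    best = {}
    for scored in scored_lists:
        for way_id, score, threshold in scored:
            if score >= threshold and score > best.get(way_id, float("-inf")):
                best[way_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Scores from two different embedding models are not strictly comparable, so a real merge step may need per-model thresholds (as sketched here) rather than a single global cutoff.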
diff --git a/docs/architecture/system/multilingual-model-evaluation.md b/docs/architecture/system/multilingual-model-evaluation.md
@@ -64,6 +64,6 @@ The multilingual model enables three matching strategies:
 
 1. **English ways + English model** — current production. High precision for English prompts.
 2. **English ways + multilingual model (cross-language)** — user types in any language, matches against English descriptions. Works but scores 30-50% lower.
-3. **Native-language stubs + multilingual model (same-language)** — frontmatter-only `.ja.md` stubs with native descriptions. Consistently scores 0.80+ across tested languages.
+3. **Native-language stubs + multilingual model (same-language)** — locale entries in `.locales.jsonl` with native descriptions. Consistently scores 0.80+ across tested languages.
 
 **Recommendation:** Ship both models. English ways use the English model (precise, 21MB). Multilingual stubs use the multilingual model (broad, 127MB). Per-way `embed_model` frontmatter field controls routing. This gives per-language threshold tuning without compromising English accuracy.
diff --git a/docs/hooks-and-ways/languages.md b/docs/hooks-and-ways/languages.md
@@ -49,37 +49,42 @@ embed_threshold: 0.35
 
 Ways with `embed_model: multilingual` are scored by the multilingual model against a separate corpus.
 
-## Creating language stubs
+## Locale stubs — packed format
 
-A language stub is a frontmatter-only `.{lang}.md` file that provides native-language matching vocabulary for an existing way. The way body stays English — only the matching changes.
+Locale stubs provide native-language matching vocabulary for existing ways. They're stored as **packed JSONL**, one file per way, co-located with the way they belong to:
 
 ```
 hooks/ways/softwaredev/code/security/
-  security.md           # English way — full body + frontmatter
-  security.ja.md        # Japanese stub — frontmatter only, no body
-  security.ko.md        # Korean stub — frontmatter only, no body
+  security.md                # English way — full body + frontmatter
+  security.locales.jsonl     # all language stubs (one line per language)
 ```
 
-Example stub (`security.ja.md`):
+Each line in the `.locales.jsonl` is a self-contained locale entry:
 
-```yaml
----
-description: セキュリティ脆弱性スキャンと監査
-vocabulary: セキュリティ 脆弱性 CVE 監査 認証 暗号化
-embed_model: multilingual
-embed_threshold: 0.25
----
+```jsonl
+{"lang":"ja","description":"セキュリティ脆弱性スキャンと監査","vocabulary":"セキュリティ 脆弱性 CVE 監査","embed_threshold":0.74}
+{"lang":"de","description":"Sicherheitsüberblick, sichere Programmierstandards","vocabulary":"Sicherheit Schwachstelle schützen OWASP","embed_threshold":0.79}
+{"lang":"es","description":"Seguridad general, codificación segura","vocabulary":"seguridad vulnerable defensa OWASP","embed_threshold":0.78}
+{"lang":"ar","description":"نظرة عامة على الأمان والبرمجة الآمنة","vocabulary":"أمان برمجة آمنة حماية ثغرات","embed_threshold":0.84}
 ```
 
 When a Japanese user types a prompt, the scanner:
-1. Matches `security.ja.md`'s frontmatter using the multilingual model
+1. Scores the Japanese stub's description using the multilingual model
 2. Injects `security.md`'s English body (the guidance text)
 
-The agent reads the English guidance and responds in the configured output language.
+### Format rules
+
+- **`embed_threshold`** is optional — omit it and the corpus generator defaults to 0.25. Use `ways tune --apply` to compute optimal values automatically.
+- **`embed_model`** is implicit — always `multilingual` for locale stubs (not stored in the file).
+- **No body content** — just the JSONL line. If someone writes a full native-language way, they create `security.ja.md` as a regular file, which overrides the packed entry.
+
+### Override mechanism
+
+If `security.ja.md` exists as a real file alongside `security.locales.jsonl`, the `.md` file wins for Japanese. This lets authors graduate a stub into a full native-language way with body content, without touching the packed file.
 
 ### Why same-language stubs matter
 
-Cross-language matching (Japanese prompt → English description) scores ~0.69. Same-language matching (Japanese prompt → Japanese description) scores ~0.93. The stub's native-language description dramatically improves matching precision.
+Cross-language matching (Japanese prompt → English description) scores ~0.69. Same-language matching (Japanese prompt → Japanese description) scores ~0.93. The native stub dramatically improves matching precision.
 
 | Scenario | Cosine similarity |
 |----------|----------------:|
@@ -89,6 +94,65 @@ Cross-language matching (Japanese prompt → English description) scores ~0.69.
 
 See `docs/architecture/system/multilingual-model-evaluation.md` for full test results.
 
+## Tuning and auditing
+
+### Auto-tuning thresholds
+
+`ways tune` computes the optimal `embed_threshold` for each locale entry by scoring it against the full corpus and finding the discrimination boundary:
+
+```bash
+# Preview what would change (dry run)
+ways tune
+
+# Tune a specific way
+ways tune --way security
+
+# Apply tuned thresholds to .locales.jsonl files
+ways tune --apply
+
+# Regenerate corpus with tuned values
+ways corpus
+```
+
+The tuner runs in parallel on all cores minus 4 and takes about 13 seconds for 328 entries on a 32-core machine.
+
+### Discrimination audit
+
+`ways tune --audit` flags entries where the description doesn't clearly separate this way from others — no threshold can fix an ambiguous description:
+
+```bash
+# Flag entries with discrimination gap < 0.15
+ways tune --audit
+
+# Adjust the gap threshold
+ways tune --audit --audit-threshold 0.20
+```
+
+The audit shows **confusers** — which ways the ambiguous entry is being confused with:
+
+```
+softwaredev/docs/mermaid
+  ar — gap 0.07  (self 1.00, noise 0.93)  confused with: softwaredev/visualization/diagrams (0.93)
+```
+
+This tells the author: "your Arabic mermaid description looks too similar to the diagrams way — revise the vocabulary to distinguish them."
+
+### Full authoring cycle
+
+```
+write stubs → compile → tune → audit → revise → repeat
+```
+
+1. Write/generate locale entries in `.locales.jsonl`
+2. `ways corpus` — compile into embeddings
+3. `ways tune --apply` — auto-set thresholds
+4. `ways tune --audit` — flag ambiguous descriptions
+5. Revise flagged descriptions, go to step 2
+
+Two dimensions to optimize:
+- **Discrimination** (gap): how clearly the description separates this way from the others. This is a property of description quality.
+- **Sensitivity** (threshold): how much signal is required before the way fires. Auto-tuned from discrimination data.
+
 ## Supported languages
 
 Languages are defined in `tools/ways-cli/languages.json`. Each entry specifies:

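Note on the discrimination audit documented in the hunk above: the sample output (`gap 0.07`, `self 1.00`, `noise 0.93`) is consistent with the gap being the entry's self score minus its highest score against any other way. The Python sketch below works through that arithmetic under this assumption. `audit_entry` and its input shape are hypothetical; how `ways tune --audit` actually computes the scores is not shown in this change.

```python
def audit_entry(way_id, scores, gap_threshold=0.15):
    """Compute the discrimination gap for one locale entry.

    scores maps way id -> cosine similarity between this entry's
    description and that way (the entry's own way included).
    Assumption: "noise" is the strongest score against any other way,
    and that way is reported as the confuser.
    """
    others = {way: s for way, s in scores.items() if way != way_id}
    if not others:
        raise ValueError("need at least one other way to measure noise")
    self_score = scores[way_id]
    confuser, noise = max(others.items(), key=lambda item: item[1])
    gap = self_score - noise
    return {
        "self": self_score,
        "noise": noise,
        "gap": round(gap, 2),
        "confuser": confuser,
        "flagged": gap < gap_threshold,  # default 0.15 matches the documented cutoff
    }
```

With the mermaid example from the docs (self 1.00, diagrams 0.93), this reports a gap of 0.07 and flags the entry, with `softwaredev/visualization/diagrams` as the confuser.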
diff --git a/hooks/ways/ea/briefing/briefing.locales.jsonl b/hooks/ways/ea/briefing/briefing.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"إعداد الإحاطة الصباحية وملخص الأحداث الليلية","vocabulary":"إحاطة صباحية ملخص ليلي تقرير يومي مستجدات","embed_threshold":0.56}
+{"lang":"de","description":"Morgendliches Briefing, was ist über Nacht passiert, Tagesübersicht über alle Posteingänge und Kalender","vocabulary":"Morgenbriefing Tagesübersicht aufholen was habe ich verpasst Zusammenfassung Prioritäten Posteingang Kalender Überblick","embed_threshold":0.69}
+{"lang":"es","description":"Resumen matutino, ponerse al día con lo que pasó durante la noche","vocabulary":"ponerse al día resumen matutino inicio del día agenda prioridades briefing","embed_threshold":0.62}
+{"lang":"ja","description":"朝のブリーフィング、昨夜の出来事の要約、一日の予定確認","vocabulary":"朝礼 ブリーフィング 要約 まとめ 予定 優先事項 今日のタスク","embed_threshold":0.7}
diff --git a/hooks/ways/ea/calendar/calendar.locales.jsonl b/hooks/ways/ea/calendar/calendar.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"جدولة الاجتماعات والتحقق من التوفر في التقويم","vocabulary":"تقويم اجتماع موعد جدولة توفر حجز","embed_threshold":0.69}
+{"lang":"de","description":"Termine planen, Verfügbarkeit prüfen, Zeitblöcke im Kalender reservieren, Besprechungen erstellen, freie Zeitfenster finden","vocabulary":"Termin Kalender Verfügbarkeit Zeitblock Besprechung Einladung Erinnerung verschieben freier Slot buchen Zeitzone Terminplanung","embed_threshold":0.74}
+{"lang":"es","description":"Agendar reuniones, consultar disponibilidad, bloquear tiempo, eventos del calendario","vocabulary":"agendar calendario disponibilidad bloquear tiempo evento reunión invitación horario","embed_threshold":0.68}
+{"lang":"ja","description":"会議のスケジュール調整、空き時間の確認、カレンダー管理","vocabulary":"スケジュール カレンダー 予定 会議 空き時間 予約 招待 日程調整","embed_threshold":0.76}
diff --git a/hooks/ways/ea/comms/comms.locales.jsonl b/hooks/ways/ea/comms/comms.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"إدارة محادثات الفريق ومنصات التراسل","vocabulary":"محادثة فريق رسائل تواصل منصة تراسل","embed_threshold":0.64}
+{"lang":"de","description":"Team-Chat und Messaging-Plattformen, Nachrichten lesen und mit Freigabe senden, Kommunikationskanäle","vocabulary":"Teams Chat Nachricht Slack Kanal ungelesen Konversation Direktnachricht Gruppenchat senden antworten Benachrichtigung Erwähnung","embed_threshold":0.66}
+{"lang":"es","description":"Chat de equipo y plataformas de mensajería, envío de mensajes","vocabulary":"teams chat mensaje slack canal no leído conversación respuesta notificación","embed_threshold":0.62}
+{"lang":"ja","description":"チームチャットやメッセージングの管理、メッセージ送信","vocabulary":"チャット メッセージ 通知 チャンネル 返信 未読 会話 連絡","embed_threshold":0.61}
diff --git a/hooks/ways/ea/comms/recap/recap.locales.jsonl b/hooks/ways/ea/comms/recap/recap.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"ملخصات الاجتماعات والنصوص المفرغة وبنود العمل","vocabulary":"ملخص اجتماع تفريغ بنود عمل محضر نقاط رئيسية","embed_threshold":0.72}
+{"lang":"de","description":"Besprechungszusammenfassungen, Transkripte, KI-generierte Meeting-Protokolle, Aktionspunkte aus Meetings","vocabulary":"Zusammenfassung Transkript Protokoll Besprechungsnotizen Aufzeichnung Aktionspunkte besprochen Nachbereitung Teilnehmer Rückblick","embed_threshold":0.73}
+{"lang":"es","description":"Resúmenes de reuniones, transcripciones, acciones pendientes de reuniones","vocabulary":"resumen transcripción acta reunión notas grabación acciones pendientes","embed_threshold":0.66}
+{"lang":"ja","description":"会議の振り返り、議事録、アクションアイテムの整理","vocabulary":"議事録 振り返り 要約 アクションアイテム 録音 文字起こし 会議メモ","embed_threshold":0.64}
diff --git a/hooks/ways/ea/ea.locales.jsonl b/hooks/ways/ea/ea.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"المساعد التنفيذي — إدارة البريد والتقويم والمهام","vocabulary":"مساعد تنفيذي بريد إلكتروني تقويم مهام إدارة","embed_threshold":0.65}
+{"lang":"de","description":"Persönliche Assistenz für E-Mail, Posteingang, Kalender, Aufgaben und Kommunikation über mehrere Konten hinweg","vocabulary":"Assistenz Triage Briefing aufholen Posteingang Tagesablauf Terminplan Agenda Konten Arbeitsbereich verwalten helfen","embed_threshold":0.69}
+{"lang":"es","description":"Asistente ejecutivo para correo, bandeja de entrada, calendario, tareas y comunicaciones","vocabulary":"asistente ejecutivo triaje briefing bandeja de entrada agenda calendario","embed_threshold":0.68}
+{"lang":"ja","description":"メール・カレンダー・タスク・コミュニケーションを統括するエグゼクティブアシスタント","vocabulary":"エグゼクティブアシスタント 秘書 受信トレイ トリアージ 日程 アジェンダ","embed_threshold":0.69}
diff --git a/hooks/ways/ea/email/drafting/drafting.locales.jsonl b/hooks/ways/ea/email/drafting/drafting.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"صياغة الردود على رسائل البريد الإلكتروني","vocabulary":"صياغة رد بريد إلكتروني كتابة رسالة مسودة","embed_threshold":0.67}
+{"lang":"de","description":"E-Mail-Entwürfe schreiben, Schreibstil kalibrieren, Antworten mit korrektem Threading erstellen","vocabulary":"Entwurf Antwort verfassen E-Mail schreiben Nachricht Tonfall Stil Thread Anhang formulieren","embed_threshold":0.76}
+{"lang":"es","description":"Redactar respuestas de correo, estilo de escritura, borradores de email","vocabulary":"borrador respuesta redactar correo escribir mensaje tono estilo hilo","embed_threshold":0.71}
+{"lang":"ja","description":"メールの返信作成、文体調整、下書き","vocabulary":"メール下書き 返信 作成 文体 トーン スレッド 文章","embed_threshold":0.71}
diff --git a/hooks/ways/ea/email/email.locales.jsonl b/hooks/ways/ea/email/email.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"فرز البريد الإلكتروني ومسح صندوق الوارد","vocabulary":"بريد إلكتروني فرز صندوق وارد تصنيف أولويات","embed_threshold":0.67}
+{"lang":"de","description":"E-Mail-Posteingang sichten, ungelesene Nachrichten scannen, Threads klassifizieren und filtern, was braucht eine Antwort","vocabulary":"Triage Posteingang ungelesen E-Mail scannen Nachrichten filtern Priorität Handlungsbedarf prüfen dringend Antwort Thread sichten","embed_threshold":0.76}
+{"lang":"es","description":"Triaje de correo, revisar bandeja de entrada, clasificar y filtrar hilos","vocabulary":"triaje bandeja entrada no leído correo revisar filtrar prioridad urgente responder","embed_threshold":0.7}
+{"lang":"ja","description":"メールのトリアージ、受信トレイの整理、スレッドの分類とフィルタリング","vocabulary":"トリアージ 受信トレイ 未読 メール 分類 フィルター 優先度 緊急","embed_threshold":0.69}
diff --git a/hooks/ways/ea/intelligence/intelligence.locales.jsonl b/hooks/ways/ea/intelligence/intelligence.locales.jsonl
@@ -0,0 +1,4 @@
+{"lang":"ar","description":"التحضير للاجتماعات والمراجعة الأسبوعية وبناء السياق","vocabulary":"تحضير اجتماع مراجعة أسبوعية سياق استخبارات معلومات","embed_threshold":0.7}
+{"lang":"de","description":"Besprechungsvorbereitung, Wochenrückblick, E-Mail-Kalender-Aufgaben-Chat querverweisen um Kontext zu einer Person oder einem Thema aufzubauen","vocabulary":"Besprechungsvorbereitung Wochenrückblick Querverweis Recherche Synthese Kontext Teilnehmer Hintergrund vorbereiten","embed_threshold":0.73}
+{"lang":"es","description":"Prepararse para reuniones, revisión semanal, cruzar referencias y contexto","vocabulary":"preparación reunión revisión semanal cruzar referencias inteligencia contexto","embed_threshold":0.71}
+{"lang":"ja","description":"会議の事前準備、週次レビュー、コンテキストの横断的整理","vocabulary":"会議準備 週次レビュー 情報収集 インテリジェンス コンテキスト 分析","embed_threshold":0.72}
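
Note on reading the packed files added above: a minimal Python sketch of a loader that follows the format rules stated in the docs in this change (optional `embed_threshold` defaulting to 0.25, implicit `multilingual` model, and a real `{name}.{lang}.md` file superseding the packed entry). The function name, the returned dictionary shape, and the `source` and `path` keys are assumptions for illustration; parsing the frontmatter of an overriding `.md` file is left out.

```python
import json
from pathlib import Path

DEFAULT_THRESHOLD = 0.25  # documented fallback when embed_threshold is omitted
REQUIRED_KEYS = {"lang", "description", "vocabulary"}


def load_locale_stubs(way_md: Path) -> dict:
    """Return {lang: entry} for one way, e.g. briefing.md -> briefing.locales.jsonl."""
    name = way_md.stem
    packed = way_md.with_name(f"{name}.locales.jsonl")
    stubs = {}

    if packed.exists():
        lines = packed.read_text(encoding="utf-8").splitlines()
        for line_no, line in enumerate(lines, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            missing = REQUIRED_KEYS - entry.keys()
            if missing:
                raise ValueError(f"{packed}:{line_no}: missing {sorted(missing)}")
            entry.setdefault("embed_threshold", DEFAULT_THRESHOLD)
            entry["embed_model"] = "multilingual"  # implicit for locale stubs
            entry["source"] = "packed"
            stubs[entry["lang"]] = entry

    # Override: a real {name}.{lang}.md next to the way wins for that language.
    for md in way_md.parent.glob(f"{name}.*.md"):
        lang = md.name[len(name) + 1 : -len(".md")]
        stubs[lang] = {"lang": lang, "source": "file", "path": str(md)}

    return stubs
```

Calling `load_locale_stubs(Path("hooks/ways/ea/briefing/briefing.md"))` on this tree would be expected to return four packed entries (`ar`, `de`, `es`, `ja`), since no overriding `.md` files are added in this change.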