Merged
1 change: 1 addition & 0 deletions .gitignore
@@ -77,6 +77,7 @@
!hooks/ways/**/
!hooks/ways/**/*.md
!hooks/ways/**/*.sh
!hooks/ways/**/*.locales.jsonl
!hooks/ways/**/*.yaml.template
!hooks/ways/**/provenance.yaml
!hooks/ways/**/adr-tool
@@ -107,18 +107,52 @@ Way body content (the guidance injected into agent context) is NOT translated. R
- The guidance is for the agent's reasoning, not displayed to the user
- Cross-language injection is well-understood: English instructions → non-English output

The ADR-107 Draft's Tier 1/Tier 2 file model (`{name}-{lang}.md` with frontmatter-only stubs) is **deferred**. It solved a real problem (matching vocabulary in the user's language) but the embedding engine solves it better — cross-language semantic matching without per-language vocabulary files. If BM25 is the only engine and a non-Romance language is needed, the tiered file model can be revisited.
### Native language stubs (shipped)

### Embedding model upgrade path
The original ADR-107 Draft proposed a tiered file model (`{name}-{lang}.md`). This was initially deferred in favor of cross-language embedding. However, evaluation data showed that native-language stubs dramatically outperform cross-language matching:

The current `all-MiniLM-L6-v2` (21MB, English, 98% accuracy) serves the English-only use case well. For multilingual matching:
| Language | EN model × EN desc | Multi model × cross-lang | Multi model × native stub |
|----------|-------------------:|------------------------:|-------------------------:|
| ja | -0.03 | 0.69 | **0.93** |
| ar | 0.04 | 0.40 | **0.96** |
| de | 0.08 | 0.62 | **0.82** |
| es | 0.44 | 0.79 | **0.84** |

| Model | Size | Languages | Notes |
|-------|------|-----------|-------|
| all-MiniLM-L6-v2 | 21MB | English | Current, shipping |
| paraphrase-multilingual-MiniLM-L12-v2 | ~120MB | 52 | Same architecture, multilingual training data |
Native stubs are now the primary multilingual matching strategy. Each stub provides a `description` and `vocabulary` in the target language, scored by the multilingual embedding model.

The upgrade is a model swap — same GGUF format, same `way-embed` binary, same embedding dimensions. `make setup` downloads the appropriate model based on configured language. If `output_language` is `en` or unset, the smaller English model is used. If non-English, the multilingual model is downloaded.
### Packed locale storage (.locales.jsonl)

Stubs are stored as **packed JSONL**, one file per way, co-located with the way it belongs to:

```
ea/briefing/
briefing.md # the way (English)
briefing.locales.jsonl # all language stubs
```

```jsonl
{"lang":"ja","description":"朝のブリーフィング、昨夜の要約","vocabulary":"朝礼 ブリーフィング 要約 優先事項"}
{"lang":"de","description":"Morgendliches Briefing, Tagesübersicht","vocabulary":"Morgenbriefing Tagesübersicht Zusammenfassung"}
```

Design constraints:
- **`embed_threshold` is optional** in packed format: entries may carry a tuned value, and the corpus generator falls back to `0.25` when it is omitted.
- **No `embed_model`** in packed format — always `"multilingual"` for locale stubs.
- **Override mechanism**: if `briefing.ja.md` exists as a real file on disk, it supersedes the `ja` entry in `briefing.locales.jsonl`. This allows graduating any stub to a full native-language way with body content.
- **Co-location over aggregation**: one `.locales.jsonl` per way (not per language, not one global file). Deleting a way means deleting its directory, and its translations go with it.
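
A minimal sketch of how a reader of the packed format might fill in the implicit fields, assuming the `0.25` default and the always-`multilingual` model described above. `parse_locales`, `DEFAULT_THRESHOLD`, and `LOCALE_MODEL` are hypothetical names for illustration, not part of the `ways` CLI:

```python
import json

DEFAULT_THRESHOLD = 0.25       # default named in the constraints above
LOCALE_MODEL = "multilingual"  # locale stubs always use the multilingual model


def parse_locales(text):
    """Parse packed JSONL into {lang: entry}, filling in implicit fields."""
    stubs = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between entries
        entry = json.loads(line)
        entry.setdefault("embed_threshold", DEFAULT_THRESHOLD)
        entry["embed_model"] = LOCALE_MODEL
        stubs[entry["lang"]] = entry
    return stubs


packed = """
{"lang":"ja","description":"朝のブリーフィング、昨夜の要約","vocabulary":"朝礼 ブリーフィング 要約 優先事項"}
{"lang":"de","description":"Morgendliches Briefing, Tagesübersicht","vocabulary":"Morgenbriefing Tagesübersicht Zusammenfassung"}
"""
stubs = parse_locales(packed)
print(sorted(stubs))                   # ['de', 'ja']
print(stubs["ja"]["embed_threshold"])  # 0.25
```

Keying the result by `lang` keeps one entry per language per way, which matches the one-line-per-language layout of the file.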

This replaces the individual `{name}.{lang}.md` stub files (which would grow to 4,000+ files at full language coverage). The packed format keeps the training corpus version-controlled, diffable, and lintable while eliminating file sprawl.

### Dual embedding model (shipped)

Both models ship simultaneously. `make setup` downloads both:

| Model | Size | Languages | Use case |
|-------|------|-----------|----------|
| all-MiniLM-L6-v2 | 21MB | English | Precise EN matching (default) |
| paraphrase-multilingual-MiniLM-L12-v2 | 127MB | 52 | Native-language stub matching |

`ways corpus` splits entries by `embed_model` field into two corpora (`ways-corpus-en.jsonl`, `ways-corpus-multi.jsonl`). The scanner queries both and merges results. Each way's English entry is scored by the EN model; each locale stub is scored by the multilingual model.
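
The split-and-merge flow can be sketched roughly as follows. This is an illustration under stated assumptions, not the scanner's actual code: the function names are hypothetical, the `"en"` value for English entries and the fallback for a missing `embed_model` field are guesses, and "keep the best score per way" is only one plausible merge policy.

```python
from collections import defaultdict

# Corpus file names as given in the text above; the "en" key is an assumption.
CORPUS_FILES = {"en": "ways-corpus-en.jsonl", "multilingual": "ways-corpus-multi.jsonl"}


def split_by_model(entries):
    """Group corpus entries by embed_model (entries without the field go to 'en')."""
    corpora = defaultdict(list)
    for entry in entries:
        corpora[entry.get("embed_model", "en")].append(entry)
    return dict(corpora)


def merge_results(*result_lists):
    """Merge (way_id, score) hits from several corpora, keeping the best score per way."""
    best = {}
    for hits in result_lists:
        for way_id, score in hits:
            if score > best.get(way_id, float("-inf")):
                best[way_id] = score
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


entries = [
    {"id": "ea/briefing", "embed_model": "en"},
    {"id": "ea/briefing", "lang": "ja", "embed_model": "multilingual"},
]
corpora = split_by_model(entries)

# Scores below are made up for illustration.
merged = merge_results(
    [("ea/briefing", 0.41)],
    [("ea/briefing", 0.93), ("ea/calendar", 0.30)],
)
print(merged)  # [('ea/briefing', 0.93), ('ea/calendar', 0.3)]
```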

`languages.json` defines the supported language set for the multilingual model. Adding a language means verifying it's in the model's training data and adding the entry — no code changes.

@@ -161,7 +195,8 @@ This makes model selection empirical: run the tests against candidate models, pi
### Neutral

- Way content stays English — no translation infrastructure needed
- The tiered file model from the original Draft is deferred, not rejected — it becomes relevant if someone needs BM25-only matching in non-Romance languages
- Packed `.locales.jsonl` replaces per-language stub files — same data, fewer files
- Override mechanism (`{name}.{lang}.md` supersedes JSONL entry) allows gradual migration from stubs to full native-language ways
- `ways.json` `output_language: "en"` is the default — zero behavior change for existing users

## References
2 changes: 1 addition & 1 deletion docs/architecture/system/multilingual-model-evaluation.md
@@ -64,6 +64,6 @@ The multilingual model enables three matching strategies:

1. **English ways + English model** — current production. High precision for English prompts.
2. **English ways + multilingual model (cross-language)** — user types in any language, matches against English descriptions. Works but scores 30-50% lower.
3. **Native-language stubs + multilingual model (same-language)** — frontmatter-only `.ja.md` stubs with native descriptions. Consistently scores 0.80+ across tested languages.
3. **Native-language stubs + multilingual model (same-language)** — locale entries in `.locales.jsonl` with native descriptions. Consistently scores 0.80+ across tested languages.

**Recommendation:** Ship both models. English ways use the English model (precise, 21MB). Multilingual stubs use the multilingual model (broad, 127MB). Per-way `embed_model` frontmatter field controls routing. This gives per-language threshold tuning without compromising English accuracy.
96 changes: 80 additions & 16 deletions docs/hooks-and-ways/languages.md
@@ -49,37 +49,42 @@ embed_threshold: 0.35

Ways with `embed_model: multilingual` are scored by the multilingual model against a separate corpus.

## Creating language stubs
## Locale stubs — packed format

A language stub is a frontmatter-only `.{lang}.md` file that provides native-language matching vocabulary for an existing way. The way body stays English — only the matching changes.
Locale stubs provide native-language matching vocabulary for existing ways. They're stored as **packed JSONL**, one file per way, co-located with the way they belong to:

```
hooks/ways/softwaredev/code/security/
security.md # English way — full body + frontmatter
security.ja.md # Japanese stub — frontmatter only, no body
security.ko.md # Korean stub — frontmatter only, no body
security.md # English way — full body + frontmatter
security.locales.jsonl # all language stubs (one line per language)
```

Example stub (`security.ja.md`):
Each line in the `.locales.jsonl` is a self-contained locale entry:

```yaml
---
description: セキュリティ脆弱性スキャンと監査
vocabulary: セキュリティ 脆弱性 CVE 監査 認証 暗号化
embed_model: multilingual
embed_threshold: 0.25
---
```jsonl
{"lang":"ja","description":"セキュリティ脆弱性スキャンと監査","vocabulary":"セキュリティ 脆弱性 CVE 監査","embed_threshold":0.74}
{"lang":"de","description":"Sicherheitsüberblick, sichere Programmierstandards","vocabulary":"Sicherheit Schwachstelle schützen OWASP","embed_threshold":0.79}
{"lang":"es","description":"Seguridad general, codificación segura","vocabulary":"seguridad vulnerable defensa OWASP","embed_threshold":0.78}
{"lang":"ar","description":"نظرة عامة على الأمان والبرمجة الآمنة","vocabulary":"أمان برمجة آمنة حماية ثغرات","embed_threshold":0.84}
```

When a Japanese user types a prompt, the scanner:
1. Matches `security.ja.md`'s frontmatter using the multilingual model
1. Scores the Japanese stub's description using the multilingual model
2. Injects `security.md`'s English body (the guidance text)
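
The two steps above can be sketched as a small selection function. This is an assumption-laden illustration: `bodies_to_inject` is a hypothetical name, the scores are example values, and the `>=` comparison against `embed_threshold` is a guess at how firing is decided.

```python
def bodies_to_inject(hits):
    """Return the English way files to inject for the stubs that fired.

    `hits` is a list of (way_dir, name, lang, score, embed_threshold) tuples.
    """
    selected = []
    for way_dir, name, lang, score, threshold in hits:
        if score < threshold:
            continue  # stub did not fire
        body = f"{way_dir}/{name}.md"  # English body, whatever the stub language
        if body not in selected:
            selected.append(body)
    return selected


hits = [
    # Scores are illustrative, not measured values.
    ("hooks/ways/softwaredev/code/security", "security", "ja", 0.93, 0.74),
    ("hooks/ways/ea/calendar", "calendar", "ja", 0.31, 0.76),
]
print(bodies_to_inject(hits))
# ['hooks/ways/softwaredev/code/security/security.md']
```

The stub language only affects matching. The file that gets injected is always the English `{name}.md`.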

The agent reads the English guidance and responds in the configured output language.
### Format rules

- **`embed_threshold`** is optional — omit it and the corpus generator defaults to 0.25. Use `ways tune --apply` to compute optimal values automatically.
- **`embed_model`** is implicit — always `multilingual` for locale stubs (not stored in the file).
- **No body content** — just the JSONL line. If someone writes a full native-language way, they create `security.ja.md` as a regular file, which overrides the packed entry.

### Override mechanism

If `security.ja.md` exists as a real file alongside `security.locales.jsonl`, the `.md` file wins for Japanese. This lets authors graduate a stub into a full native-language way with body content, without touching the packed file.
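
A minimal sketch of the override lookup, assuming the file naming shown above (`{name}.{lang}.md` next to `{name}.locales.jsonl`). `resolve_locale_source` is a hypothetical helper, not a function from the `ways` CLI:

```python
import tempfile
from pathlib import Path


def resolve_locale_source(way_dir, name, lang):
    """Return the file that supplies the stub for `lang`: the .md override or the packed file."""
    override = Path(way_dir) / f"{name}.{lang}.md"
    if override.is_file():
        return override
    return Path(way_dir) / f"{name}.locales.jsonl"


with tempfile.TemporaryDirectory() as tmp:
    way_dir = Path(tmp)
    (way_dir / "security.locales.jsonl").write_text("", encoding="utf-8")
    (way_dir / "security.ja.md").write_text("---\n---\n", encoding="utf-8")
    ja_source = resolve_locale_source(way_dir, "security", "ja").name
    de_source = resolve_locale_source(way_dir, "security", "de").name

print(ja_source)  # security.ja.md
print(de_source)  # security.locales.jsonl
```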

### Why same-language stubs matter

Cross-language matching (Japanese prompt → English description) scores ~0.69. Same-language matching (Japanese prompt → Japanese description) scores ~0.93. The stub's native-language description dramatically improves matching precision.
Cross-language matching (Japanese prompt → English description) scores ~0.69. Same-language matching (Japanese prompt → Japanese description) scores ~0.93. The native stub dramatically improves matching precision.

| Scenario | Cosine similarity |
|----------|----------------:|
@@ -89,6 +94,65 @@ Cross-language matching (Japanese prompt → English description) scores ~0.69.

See `docs/architecture/system/multilingual-model-evaluation.md` for full test results.
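
For readers unfamiliar with the metric, cosine similarity compares the direction of two embedding vectors and ranges from -1 to 1. The sketch below uses toy two-dimensional vectors instead of real model output:

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors, chosen for this sketch
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))   # 1.0  (same direction)
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))   # 0.0  (unrelated)
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))  # -1.0 (opposite)
```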

## Tuning and auditing

### Auto-tuning thresholds

`ways tune` computes the optimal `embed_threshold` for each locale entry by scoring it against the full corpus and finding the discrimination boundary:

```bash
# Preview what would change (dry run)
ways tune

# Tune a specific way
ways tune --way security

# Apply tuned thresholds to .locales.jsonl files
ways tune --apply

# Regenerate corpus with tuned values
ways corpus
```

The tuner runs in parallel on all cores minus 4. Tuning 328 entries takes about 13 seconds on a 32-core machine.

### Discrimination audit

`ways tune --audit` flags entries where the description doesn't clearly separate this way from others — no threshold can fix an ambiguous description:

```bash
# Flag entries with discrimination gap < 0.15
ways tune --audit

# Adjust the gap threshold
ways tune --audit --audit-threshold 0.20
```

The audit shows **confusers** — which ways the ambiguous entry is being confused with:

```
softwaredev/docs/mermaid
ar — gap 0.07 (self 1.00, noise 0.93) confused with: softwaredev/visualization/diagrams (0.93)
```

This tells the author: "your Arabic mermaid description looks too similar to the diagrams way — revise the vocabulary to distinguish them."
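
The numbers in that report line are consistent with gap = self − noise, where noise is the highest score against any other way. A minimal sketch under that reading follows. `discrimination_gap` is a hypothetical name, and how the tool actually obtains the per-way scores is not shown here:

```python
def discrimination_gap(own_way, scores):
    """Return (gap, self_score, noise, confuser) for one locale entry.

    `scores` maps way id -> similarity of this entry to that way.
    """
    self_score = scores[own_way]
    others = {way: s for way, s in scores.items() if way != own_way}
    confuser = max(others, key=others.get)  # closest competing way
    noise = others[confuser]
    return round(self_score - noise, 2), self_score, noise, confuser


scores = {
    "softwaredev/docs/mermaid": 1.00,
    "softwaredev/visualization/diagrams": 0.93,
    "softwaredev/code/security": 0.21,  # made-up value for illustration
}
gap, self_score, noise, confuser = discrimination_gap("softwaredev/docs/mermaid", scores)
flagged = gap < 0.15  # 0.15 is the audit gap named in the comment above

print(gap, confuser, flagged)
# 0.07 softwaredev/visualization/diagrams True
```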

### Full authoring cycle

```
write stubs → compile → tune → audit → revise → repeat
```

1. Write/generate locale entries in `.locales.jsonl`
2. `ways corpus` — compile into embeddings
3. `ways tune --apply` — auto-set thresholds
4. `ways tune --audit` — flag ambiguous descriptions
5. Revise flagged descriptions, go to step 2

Two dimensions to optimize:
- **Discrimination** (gap): how clearly the description identifies this way versus others. It is a property of description quality.
- **Sensitivity** (threshold): how much signal is required before the way fires. It is auto-tuned from discrimination data.

## Supported languages

Languages are defined in `tools/ways-cli/languages.json`. Each entry specifies:
4 changes: 4 additions & 0 deletions hooks/ways/ea/briefing/briefing.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"إعداد الإحاطة الصباحية وملخص الأحداث الليلية","vocabulary":"إحاطة صباحية ملخص ليلي تقرير يومي مستجدات","embed_threshold":0.56}
{"lang":"de","description":"Morgendliches Briefing, was ist über Nacht passiert, Tagesübersicht über alle Posteingänge und Kalender","vocabulary":"Morgenbriefing Tagesübersicht aufholen was habe ich verpasst Zusammenfassung Prioritäten Posteingang Kalender Überblick","embed_threshold":0.69}
{"lang":"es","description":"Resumen matutino, ponerse al día con lo que pasó durante la noche","vocabulary":"ponerse al día resumen matutino inicio del día agenda prioridades briefing","embed_threshold":0.62}
{"lang":"ja","description":"朝のブリーフィング、昨夜の出来事の要約、一日の予定確認","vocabulary":"朝礼 ブリーフィング 要約 まとめ 予定 優先事項 今日のタスク","embed_threshold":0.7}
4 changes: 4 additions & 0 deletions hooks/ways/ea/calendar/calendar.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"جدولة الاجتماعات والتحقق من التوفر في التقويم","vocabulary":"تقويم اجتماع موعد جدولة توفر حجز","embed_threshold":0.69}
{"lang":"de","description":"Termine planen, Verfügbarkeit prüfen, Zeitblöcke im Kalender reservieren, Besprechungen erstellen, freie Zeitfenster finden","vocabulary":"Termin Kalender Verfügbarkeit Zeitblock Besprechung Einladung Erinnerung verschieben freier Slot buchen Zeitzone Terminplanung","embed_threshold":0.74}
{"lang":"es","description":"Agendar reuniones, consultar disponibilidad, bloquear tiempo, eventos del calendario","vocabulary":"agendar calendario disponibilidad bloquear tiempo evento reunión invitación horario","embed_threshold":0.68}
{"lang":"ja","description":"会議のスケジュール調整、空き時間の確認、カレンダー管理","vocabulary":"スケジュール カレンダー 予定 会議 空き時間 予約 招待 日程調整","embed_threshold":0.76}
4 changes: 4 additions & 0 deletions hooks/ways/ea/comms/comms.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"إدارة محادثات الفريق ومنصات التراسل","vocabulary":"محادثة فريق رسائل تواصل منصة تراسل","embed_threshold":0.64}
{"lang":"de","description":"Team-Chat und Messaging-Plattformen, Nachrichten lesen und mit Freigabe senden, Kommunikationskanäle","vocabulary":"Teams Chat Nachricht Slack Kanal ungelesen Konversation Direktnachricht Gruppenchat senden antworten Benachrichtigung Erwähnung","embed_threshold":0.66}
{"lang":"es","description":"Chat de equipo y plataformas de mensajería, envío de mensajes","vocabulary":"teams chat mensaje slack canal no leído conversación respuesta notificación","embed_threshold":0.62}
{"lang":"ja","description":"チームチャットやメッセージングの管理、メッセージ送信","vocabulary":"チャット メッセージ 通知 チャンネル 返信 未読 会話 連絡","embed_threshold":0.61}
4 changes: 4 additions & 0 deletions hooks/ways/ea/comms/recap/recap.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"ملخصات الاجتماعات والنصوص المفرغة وبنود العمل","vocabulary":"ملخص اجتماع تفريغ بنود عمل محضر نقاط رئيسية","embed_threshold":0.72}
{"lang":"de","description":"Besprechungszusammenfassungen, Transkripte, KI-generierte Meeting-Protokolle, Aktionspunkte aus Meetings","vocabulary":"Zusammenfassung Transkript Protokoll Besprechungsnotizen Aufzeichnung Aktionspunkte besprochen Nachbereitung Teilnehmer Rückblick","embed_threshold":0.73}
{"lang":"es","description":"Resúmenes de reuniones, transcripciones, acciones pendientes de reuniones","vocabulary":"resumen transcripción acta reunión notas grabación acciones pendientes","embed_threshold":0.66}
{"lang":"ja","description":"会議の振り返り、議事録、アクションアイテムの整理","vocabulary":"議事録 振り返り 要約 アクションアイテム 録音 文字起こし 会議メモ","embed_threshold":0.64}
4 changes: 4 additions & 0 deletions hooks/ways/ea/ea.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"المساعد التنفيذي — إدارة البريد والتقويم والمهام","vocabulary":"مساعد تنفيذي بريد إلكتروني تقويم مهام إدارة","embed_threshold":0.65}
{"lang":"de","description":"Persönliche Assistenz für E-Mail, Posteingang, Kalender, Aufgaben und Kommunikation über mehrere Konten hinweg","vocabulary":"Assistenz Triage Briefing aufholen Posteingang Tagesablauf Terminplan Agenda Konten Arbeitsbereich verwalten helfen","embed_threshold":0.69}
{"lang":"es","description":"Asistente ejecutivo para correo, bandeja de entrada, calendario, tareas y comunicaciones","vocabulary":"asistente ejecutivo triaje briefing bandeja de entrada agenda calendario","embed_threshold":0.68}
{"lang":"ja","description":"メール・カレンダー・タスク・コミュニケーションを統括するエグゼクティブアシスタント","vocabulary":"エグゼクティブアシスタント 秘書 受信トレイ トリアージ 日程 アジェンダ","embed_threshold":0.69}
4 changes: 4 additions & 0 deletions hooks/ways/ea/email/drafting/drafting.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"صياغة الردود على رسائل البريد الإلكتروني","vocabulary":"صياغة رد بريد إلكتروني كتابة رسالة مسودة","embed_threshold":0.67}
{"lang":"de","description":"E-Mail-Entwürfe schreiben, Schreibstil kalibrieren, Antworten mit korrektem Threading erstellen","vocabulary":"Entwurf Antwort verfassen E-Mail schreiben Nachricht Tonfall Stil Thread Anhang formulieren","embed_threshold":0.76}
{"lang":"es","description":"Redactar respuestas de correo, estilo de escritura, borradores de email","vocabulary":"borrador respuesta redactar correo escribir mensaje tono estilo hilo","embed_threshold":0.71}
{"lang":"ja","description":"メールの返信作成、文体調整、下書き","vocabulary":"メール下書き 返信 作成 文体 トーン スレッド 文章","embed_threshold":0.71}
4 changes: 4 additions & 0 deletions hooks/ways/ea/email/email.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"فرز البريد الإلكتروني ومسح صندوق الوارد","vocabulary":"بريد إلكتروني فرز صندوق وارد تصنيف أولويات","embed_threshold":0.67}
{"lang":"de","description":"E-Mail-Posteingang sichten, ungelesene Nachrichten scannen, Threads klassifizieren und filtern, was braucht eine Antwort","vocabulary":"Triage Posteingang ungelesen E-Mail scannen Nachrichten filtern Priorität Handlungsbedarf prüfen dringend Antwort Thread sichten","embed_threshold":0.76}
{"lang":"es","description":"Triaje de correo, revisar bandeja de entrada, clasificar y filtrar hilos","vocabulary":"triaje bandeja entrada no leído correo revisar filtrar prioridad urgente responder","embed_threshold":0.7}
{"lang":"ja","description":"メールのトリアージ、受信トレイの整理、スレッドの分類とフィルタリング","vocabulary":"トリアージ 受信トレイ 未読 メール 分類 フィルター 優先度 緊急","embed_threshold":0.69}
4 changes: 4 additions & 0 deletions hooks/ways/ea/intelligence/intelligence.locales.jsonl
@@ -0,0 +1,4 @@
{"lang":"ar","description":"التحضير للاجتماعات والمراجعة الأسبوعية وبناء السياق","vocabulary":"تحضير اجتماع مراجعة أسبوعية سياق استخبارات معلومات","embed_threshold":0.7}
{"lang":"de","description":"Besprechungsvorbereitung, Wochenrückblick, E-Mail-Kalender-Aufgaben-Chat querverweisen um Kontext zu einer Person oder einem Thema aufzubauen","vocabulary":"Besprechungsvorbereitung Wochenrückblick Querverweis Recherche Synthese Kontext Teilnehmer Hintergrund vorbereiten","embed_threshold":0.73}
{"lang":"es","description":"Prepararse para reuniones, revisión semanal, cruzar referencias y contexto","vocabulary":"preparación reunión revisión semanal cruzar referencias inteligencia contexto","embed_threshold":0.71}
{"lang":"ja","description":"会議の事前準備、週次レビュー、コンテキストの横断的整理","vocabulary":"会議準備 週次レビュー 情報収集 インテリジェンス コンテキスト 分析","embed_threshold":0.72}