Skip to content

Comments

fix: translate Swedish content in non-Swedish articles, use content-based titles#451

Closed
Copilot wants to merge 10 commits intomainfrom
copilot/analyze-improve-agentic-workflows
Closed

fix: translate Swedish content in non-Swedish articles, use content-based titles#451
Copilot wants to merge 10 commits intomainfrom
copilot/analyze-improve-agentic-workflows

Conversation

Copy link
Contributor

Copilot AI commented Feb 23, 2026

  • Fix translatePhrase prefix matching to require word boundaries (prevents "质询ssvar" hybrids)
  • Add multi-segment translation with possessive 's' stripping for "Xutskottets sammanträde"
  • Add compound dictionary entries (interpellationssvar, sammanträde, återrapportering)
  • Simplify Swedish detection to use X.titel presence instead of fragile regex
  • Fix 91 old 2026-02-22 articles with broken hybrid translations
  • Align JSDoc/comments with updated span-matching behavior
  • Add 6 new tests (compound word boundary, committee meeting, bare span translation)
  • All 1784 tests pass, build succeeds

🔒 GitHub Advanced Security automatically protects Copilot coding agent pull requests. You can protect all pull requests by enabling Advanced Security for your repositories. Learn more about Advanced Security.

Copilot AI changed the title [WIP] Analyze and improve all agentic workflows for article generation Analyze agentic workflows and create targeted issues for article generation improvements Feb 23, 2026
Copilot AI requested a review from pethers February 23, 2026 07:31
@pethers pethers requested a review from Copilot February 23, 2026 10:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot wasn't able to review any files in this pull request.

@github-actions github-actions bot added the size-xs Extra small change (< 10 lines) label Feb 23, 2026
@github-actions
Copy link
Contributor

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

@github-actions
Copy link
Contributor

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

@pethers
Copy link
Member

pethers commented Feb 24, 2026

Now all PR:s for issues above have been merged. Analyse all issues and current state and complete any remaining tasks by create github issues and assign to copilot . Analyse differeans areas for copilot to improve

@pethers
Copy link
Member

pethers commented Feb 24, 2026

@copilot
Now all PR:s for issues above have been merged. Analyse all issues and current state and complete any remaining tasks by create github issues and assign to copilot . Analyse differeans areas for copilot to improve

Copilot AI and others added 3 commits February 24, 2026 04:02
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…based titles

- Enhance translateSwedishContent to translate bare <span lang="sv"> spans
  for non-Swedish articles (not just data-translate spans)
- Use generateContentTitle() in generate-news-enhanced.ts for all 3 article
  types instead of hardcoded static titles ("Battle Lines This Week")
- Fix renderMotionEntry and other renderers to always wrap Swedish titles in
  svSpan() so they get translated by translateSwedishContent post-processor
- Batch-fix 91 old 2026-02-22 articles removing data-translate markers and
  translating Swedish text

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title Analyze agentic workflows and create targeted issues for article generation improvements fix: translate Swedish content in non-Swedish articles, use content-based titles Feb 24, 2026
@pethers pethers requested a review from Copilot February 24, 2026 04:19
@github-actions github-actions bot added html-css HTML/CSS changes translation Translation updates rtl RTL language support (Arabic, Hebrew) deployment Deployment configuration labels Feb 24, 2026
@github-actions github-actions bot added testing Test coverage refactor Code refactoring news News articles and content generation size-xl Extra large change (> 1000 lines) labels Feb 24, 2026
@github-actions
Copy link
Contributor

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 97 changed files in this pull request and generated 3 comments.

Comment on lines +733 to +737
// For non-Swedish articles: translate ALL <span lang="sv"> spans in one pass
return html.replace(ANY_SV_SPAN_REGEX, (_match: string, attrs: string, inner: string): string => {
// Remove data-translate marker if present, keep lang="sv" for accessibility
const cleanedAttrs = attrs.replace(/\s*data-translate=(?:"true"|'true')/, '').trim();
const translatedInner = translatePhrase(inner, targetLang);
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new ANY_SV_SPAN_REGEX at line 718 correctly matches all <span lang="sv"> elements, but the translatePhrase function it calls (line 737) uses longest-prefix matching that produces broken hybrid translations. When dictionary contains "Interpellations" but the span contains "Interpellationssvar", prefix matching returns translation + "svar", creating text like "质询ssvar" (Chinese + Swedish suffix). All Swedish compound words in spans need complete dictionary entries.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 528faf1. translatePhrase now has a word boundary check that prevents prefix matching from splitting compound Swedish words. The character after a prefix match must be a word boundary ([\s.,;:!?\-–—/()0-9]) or end of string — letters like "ssvar" in "Interpellationssvar" no longer trigger a prefix match on "interpellation".

Additionally added complete dictionary entries for "interpellationssvar", "sammanträde", and "återrapportering" so these compound words get exact-match translations.

Comment on lines 2251 to 2252
const looksSwedish = /[åäöÅÄÖ]/.test(titleText) || /^med anledning av /i.test(titleText);
const titleHtml = (motion.titel || looksSwedish)
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Swedish detection regex /[åäöÅÄÖ]/ and /^med anledning av /i is too simplistic and will miss many Swedish titles. It will fail to detect Swedish text that doesn't contain these specific characters or this specific phrase. A more robust approach would check if motion.titel is populated (indicating Swedish API source) or use language detection. This could lead to Swedish text being rendered without the svSpan() wrapper, preventing translation.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 528faf1. Replaced the fragile regex /[åäöÅÄÖ]/ with a simple X.titel presence check. Since titel is only populated by the Swedish Riksdag API, its presence reliably indicates Swedish content — no character-level heuristics needed. Applied to all 6 rendering locations (events, documents, reports, propositions, motions, generic).


<div class="document-entry">
<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDC120260324ip/" target="_blank" rel="noopener noreferrer"><span data-translate="true" lang="sv">Interpellationssvar</span></a></h4>
<h4><a href="https://riksdagen.se/sv/dokument-och-lagar/dokument/HDC120260324ip/" target="_blank" rel="noopener noreferrer"><span lang="sv">质询ssvar</span></a></h4>
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The translated text contains partial translations with Swedish suffixes still attached. For example, "质询ssvar" (Chinese + "ssvar"), "استجوابssvar" (Arabic + "ssvar"), "interpellasjonssvar" appears correct. The translation system is splitting compound Swedish words but only translating part of them, leaving Swedish morphemes like "ssvar", "s sammanträde" appended to translated committee names. This creates broken hybrid text that is neither Swedish nor the target language.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 528faf1 + 6696585. All 91 old 2026-02-22 articles have been repaired:

  • "质询ssvar" → "质询答复" (ZH), "ردود الاستجواب" (AR), etc.
  • "Committee on Civil Affairss meeting" → "Committee on Civil Affairs meeting"
  • "国防委员会s 会议" → "国防委员会 会议"

The root cause (prefix matching splitting compound words) is fixed in translatePhrase so new articles won't have this issue.

@pethers
Copy link
Member

pethers commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

Copilot AI and others added 2 commits February 24, 2026 04:35
…ee meetings

- Add word boundary check in translatePhrase prefix matching to prevent
  splitting compound words (e.g., "Interpellationssvar" no longer becomes
  "质询ssvar")
- Add multi-segment translation with Swedish possessive 's' stripping for
  phrases like "Arbetsmarknadsutskottets sammanträde"
- Add dictionary entries for "interpellationssvar", "sammanträde",
  "återrapportering"
- Simplify Swedish detection to use X.titel presence (API source indicator)
  instead of fragile regex character checks
- Fix 91 old articles with broken hybrid translations
- Add 2 new tests for compound word protection

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

All 3 review comments addressed in commits 528faf1 and 6696585:

  1. Word boundary check in translatePhrase prevents compound word splitting
  2. Swedish detection uses X.titel presence instead of regex
  3. All 91 old articles repaired with correct translations

@github-actions
Copy link
Contributor

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 97 changed files in this pull request and generated 1 comment.

Comment on lines +774 to +779
/**
* Matches any `<span …lang="sv"…>…</span>` span — both bare and with data-translate.
* Used for non-Swedish articles to translate ALL Swedish text in a single pass.
*/
const ANY_SV_SPAN_REGEX =
/<span\s+((?=[^>]*lang="sv")[^>]*)>([\s\S]*?)<\/span>/g;
Copy link

Copilot AI Feb 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new ANY_SV_SPAN_REGEX/logic translates all <span …lang="sv"…> spans for non-Swedish targets, but the surrounding documentation in this file still frames the post-processing step as only handling data-translate="true" markers. Please align the nearby JSDoc/comments with the updated behavior to avoid future regressions based on outdated assumptions.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in d071edc. The translateSwedishContent JSDoc now documents the dual-path behavior (Swedish → strip markers only; non-Swedish → translate ALL <span lang="sv"> spans). The TRANSLATABLE_SV_SPAN_REGEX JSDoc clarifies it's only used for Swedish articles, with a cross-reference to ANY_SV_SPAN_REGEX for non-Swedish.

@pethers
Copy link
Member

pethers commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

…hing behavior

Update function and regex documentation to reflect that non-Swedish
articles now process ALL <span lang="sv"> spans (not just data-translate
ones). Clarify that TRANSLATABLE_SV_SPAN_REGEX is only used for Swedish
articles, while ANY_SV_SPAN_REGEX handles non-Swedish translation.

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copy link
Contributor Author

Copilot AI commented Feb 24, 2026

@copilot apply changes based on the comments in this thread

Done in d071edc — aligned all JSDoc/comments in translation-dictionary.ts with the updated span-matching behavior.

@github-actions
Copy link
Contributor

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 50 out of 97 changed files in this pull request and generated no new comments.

@pethers pethers closed this Feb 24, 2026
@pethers pethers deleted the copilot/analyze-improve-agentic-workflows branch February 24, 2026 05:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

deployment Deployment configuration html-css HTML/CSS changes news News articles and content generation refactor Code refactoring rtl RTL language support (Arabic, Hebrew) size-xl Extra large change (> 1000 lines) size-xs Extra small change (< 10 lines) testing Test coverage translation Translation updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants