Group duplicate motions by parent proposition; group propositions by committee #463

Open
Copilot wants to merge 7 commits into main from copilot/fix-duplicate-motions-entry

Conversation


Copilot AI commented Feb 23, 2026

Multiple motions referencing the same government proposition rendered identical <h3> titles, causing 2–3× visual repetition per article. Propositions had no structural grouping by referred committee.

Motions (generateMotionsContent)

  • New exported groupMotionsByProposition(motions): extracts prop. YYYY/YY:NNN via strict regex /med anledning av prop\.\s+(\d{4}\/\d{2}:\d+)/i, returns { grouped: Map<propId, RawDocument[]>, independent: RawDocument[] }
  • Grouped motions render under a single <h3> per proposition; individual motions use <h4> showing their unique document ID
  • Independent motions get their own <h2> section only when grouped motions are also present
  • Section labels use L(lang, 'responsesToProp') / L(lang, 'independentMotions') — no hardcoded 14-branch ternaries
// Before: same title repeated N times
<h3>med anledning av prop. 2025/26:118 Tillståndsprövning...</h3>  ← HD023912
<h3>med anledning av prop. 2025/26:118 Tillståndsprövning...</h3>  ← HD023908

// After: one heading, distinct sub-entries
<h2>Responses to Government Propositions</h2>
  <h3>Prop. 2025/26:118: Tillståndsprövning...</h3>
    <h4>HD023912 …</h4>
    <h4>HD023908 …</h4>
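
The grouping step described above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the `RawDocument` shape is trimmed to two fields, the third sample document and its ID are made up, and only the strict regex and the `{ grouped, independent }` return shape are taken from the description.

```typescript
// Trimmed-down document shape for this sketch; the real RawDocument has more fields.
interface RawDocument {
  dok_id?: string;
  titel?: string;
}

// Strict proposition-ID pattern quoted in the PR description.
const PROP_REFERENCE_REGEX = /med anledning av prop\.\s+(\d{4}\/\d{2}:\d+)/i;

interface GroupedMotions {
  grouped: Map<string, RawDocument[]>;
  independent: RawDocument[];
}

function groupMotionsByProposition(motions: RawDocument[]): GroupedMotions {
  const grouped = new Map<string, RawDocument[]>();
  const independent: RawDocument[] = [];
  for (const motion of motions) {
    const match = PROP_REFERENCE_REGEX.exec(motion.titel ?? '');
    if (!match) {
      // No proposition reference in the title: treat as an independent motion.
      independent.push(motion);
      continue;
    }
    const propId = match[1];
    const bucket = grouped.get(propId) ?? [];
    bucket.push(motion);
    grouped.set(propId, bucket);
  }
  return { grouped, independent };
}

const sampleMotions: RawDocument[] = [
  { dok_id: 'HD023912', titel: 'med anledning av prop. 2025/26:118 Tillståndsprövning enligt förnybartdirektivet' },
  { dok_id: 'HD023908', titel: 'med anledning av prop. 2025/26:118 Tillståndsprövning enligt förnybartdirektivet' },
  { dok_id: 'EXAMPLE01', titel: 'Example independent motion' }, // made-up entry
];

const motionGroups = groupMotionsByProposition(sampleMotions);
console.log(motionGroups.grouped.get('2025/26:118')?.map(m => m.dok_id)); // [ 'HD023912', 'HD023908' ]
console.log(motionGroups.independent.length); // 1
```

Every input motion lands in exactly one bucket, so the group sizes plus the independent count always add up to the input length. This is the total-count invariant the unit tests check.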

Propositions (generatePropositionsContent)

  • New exported groupPropositionsByCommittee(propositions): keys on organ ?? committee ?? '', returns Map<committeeCode, RawDocument[]>
  • Multiple committees → <h3> per committee, <h4> per proposition title
  • Single committee → <h3> per proposition with inline "Referred to" line preserved (no redundant heading)
  • Reuses byCommitteeGroup.size for policyImplicationsContext domain count
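
A minimal sketch of the committee grouping, assuming a trimmed `RawDocument` with optional `organ` and `committee` fields. Only the `organ ?? committee ?? ''` key and the `Map` return type come from the description; the sample documents, their IDs, and the `MJU` code are illustrative.

```typescript
// Trimmed-down document shape for this sketch.
interface RawDocument {
  dok_id?: string;
  titel?: string;
  organ?: string;
  committee?: string;
}

function groupPropositionsByCommittee(propositions: RawDocument[]): Map<string, RawDocument[]> {
  const byCommittee = new Map<string, RawDocument[]>();
  for (const prop of propositions) {
    // '' is the key for propositions with no committee information.
    const key = prop.organ ?? prop.committee ?? '';
    const bucket = byCommittee.get(key) ?? [];
    bucket.push(prop);
    byCommittee.set(key, bucket);
  }
  return byCommittee;
}

// Made-up sample data for illustration only.
const sampleProps: RawDocument[] = [
  { dok_id: 'P1', titel: 'Example A', organ: 'FiU' },
  { dok_id: 'P2', titel: 'Example B', organ: 'FiU' },
  { dok_id: 'P3', titel: 'Example C', committee: 'MJU' },
  { dok_id: 'P4', titel: 'Example D' },
];

const byCommitteeGroup = groupPropositionsByCommittee(sampleProps);
// More than one key means the renderer would emit a heading per committee.
const isMultiCommittee = byCommitteeGroup.size > 1;
console.log(byCommitteeGroup.size, isMultiCommittee); // 3 true
```

Because `??` only falls through on `null` or `undefined`, an `organ` that is present but empty stays under the `''` key instead of falling back to `committee`.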

Labels

  • responsesToProp and independentMotions added to ContentLabelSet and all 14 language objects in CONTENT_LABELS
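
The label lookup could look roughly like this. It is a hypothetical sketch: the real `ContentLabelSet` has more keys and `CONTENT_LABELS` covers 14 languages. The Swedish strings are placeholders rather than the repository's wording, and the English fallback is an assumption.

```typescript
// Only the two keys added by this PR; the real interface has many more.
interface ContentLabelSet {
  responsesToProp: string;
  independentMotions: string;
}

const EN_LABELS: ContentLabelSet = {
  responsesToProp: 'Responses to Government Propositions',
  independentMotions: 'Independent Motions',
};

// Two languages shown for brevity; the Swedish strings are placeholders.
const CONTENT_LABELS: Record<string, ContentLabelSet> = {
  en: EN_LABELS,
  sv: {
    responsesToProp: 'Svar på regeringens propositioner',
    independentMotions: 'Fristående motioner',
  },
};

// Look up a label; falling back to English is an assumption of this sketch.
function L(lang: string, key: keyof ContentLabelSet): string {
  const labels = CONTENT_LABELS[lang] ?? EN_LABELS;
  return labels[key];
}

console.log(L('en', 'responsesToProp')); // Responses to Government Propositions
```

Typing the key as `keyof ContentLabelSet` makes a misspelled label name a compile-time error, which a per-language ternary chain would not catch.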

Tests

  • Unit tests for groupMotionsByProposition (strict-ID regex, empty input, title fallback, total-count invariant)
  • Unit tests for groupPropositionsByCommittee (organ/committee fallback, empty-string key, total-count invariant)
  • 28 label-presence tests (2 keys × 14 languages)
  • Rendering tests: single-committee output uses h3 per proposition; multi-committee output uses h3 per committee and h4 per proposition
Original prompt

This section details the original issue you should resolve

<issue_title>Group duplicate motions by parent proposition to eliminate repetitive entries</issue_title>
<issue_description>## 📋 Issue Type
Bug Fix / Content Quality

🎯 Objective

Fix duplicate document entries appearing in generated articles. Currently, the same motion/proposition can appear multiple times in a single article because multiple motions reference the same government proposition, and the deduplication logic doesn't account for this.

📊 Current State

In news/2026-02-22-opposition-motions-en.html:

  • Motion title "med anledning av prop. 2025/26:118 Tillståndsprövning enligt förnybartdirektivet" appears twice (HD023912 and HD023908)
  • Motion title "med anledning av prop. 2025/26:108 Reformering av avfallslagstiftningen" appears three times (HD023909, HD023907, HD023906)
  • Motion title "med anledning av prop. 2025/26:106 Identitetskrav vid lagfart" appears twice (HD023910, HD023905)

This makes articles look repetitive and inflated. Each motion should still be listed but grouped under its parent proposition, with each individual motion clearly distinguished.

🚀 Desired State

  1. Group motions by parent proposition — Motions responding to the same prop should be grouped together
  2. Show parent proposition as heading — "Responses to Prop. 2025/26:108: Waste Legislation Reform"
  3. List individual motions under group — Each with unique document ID, author, and specific demands
  4. Avoid visual repetition — Don't repeat the same proposition title 3 times

🔧 Implementation Approach

Target file: scripts/data-transformers.ts (generateMotionsContent and related functions)

  1. In generateMotionsContent(), group motions by their parent proposition reference (extract from title pattern "med anledning av prop. YYYY/YY:NNN")
  2. Create a groupMotionsByProposition(motions) helper that returns Map<string, Motion[]>
  3. Render grouped motions with the proposition as a section header
  4. Each individual motion within the group shows its unique ID, author, and specific proposal
  5. For motions not responding to a proposition, list them in a separate "Independent Motions" section

✅ Acceptance Criteria

  • Motions responding to same proposition are grouped together
  • No visual duplication of the same proposition title
  • Each individual motion still listed with unique document ID
  • Independent motions (not responding to a proposition) handled separately
  • Unit tests verify grouping logic
  • Apply same grouping for propositions (by committee reference)

📚 References

  • scripts/data-transformers.ts ~line 1953 (generateMotionsContent)
  • Motion title pattern: "med anledning av prop. YYYY/YY:NNN ..."
  • Example: HD023912 and HD023908 both reference prop. 2025/26:118

🤖 Recommended Agent

Copilot coding agent — add grouping logic in data-transformers.ts</issue_description>

<agent_instructions>Focus ONLY on the generateMotionsContent() function in scripts/data-transformers.ts (~line 1953). Add a helper function groupMotionsByProposition(motions) that extracts the parent proposition reference from each motion title (pattern: 'med anledning av prop. YYYY/YY:NNN') and groups them into a Map. Modify the HTML generation to render grouped motions with proposition headings. Do NOT modify mcp-client.ts, generate-news-enhanced.ts, title generation code, or any workflow files. Update tests in tests/data-transformers.test.ts only.</agent_instructions>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 2 commits February 23, 2026 08:05
- Add groupMotionsByProposition() exported helper that groups motions by
  their parent proposition reference (pattern: med anledning av prop. YYYY/YY:NNN)
- Modify generateMotionsContent() to render proposition-grouped sections:
  - Proposition groups get h3 heading (In response to: Prop. YYYY/YY:NNN ...)
  - Individual motions within groups get h4 headings showing their document ID
  - Independent motions use h3 with their title (unchanged behavior)
  - Independent Motions section header added when mixed with grouped motions
- Add responsesToProp and independentMotions labels to all 14 languages
- Add responsesToProp and independentMotions to ContentLabelSet type
- Add 40 new unit tests for groupMotionsByProposition and grouped rendering

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
…cing

- Extract PROP_REFERENCE_REGEX and PROP_FULL_REF_REGEX as named constants
  to avoid duplication between groupMotionsByProposition() and the rendering loop
- Replace non-null assertion (!) with ?? [] and optional chaining for safer access

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix duplicate motions by parent proposition" to "Group duplicate motions by parent proposition to eliminate repetitive entries" Feb 23, 2026
Copilot AI requested a review from pethers February 23, 2026 08:10
…ndering

- Add groupPropositionsByCommittee() exported helper: groups propositions
  by organ/committee field into Map<string, RawDocument[]>; key '' for uncategorised
- Modify generatePropositionsContent() to render committee-grouped sections:
  - Multiple committees → h3 section per committee, h4 per proposition
  - Single committee (no heading shown) → h3 per proposition + 'Referred to' line preserved
  - Reuse byCommitteeGroup.size for domain count in Policy Implications (removes separate counter)
- Add 9 unit tests for groupPropositionsByCommittee and committee-grouped rendering
- 468 total tests passing; CodeQL: 0 alerts

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@pethers pethers marked this pull request as ready for review February 23, 2026 09:30
Copilot AI changed the title from "Group duplicate motions by parent proposition to eliminate repetitive entries" to "Group duplicate motions by parent proposition; group propositions by committee" Feb 23, 2026
@pethers pethers requested a review from Copilot February 23, 2026 10:47
@github-actions github-actions bot added the labels testing (Test coverage), refactor (Code refactoring), and size-l (Large change, 250-1000 lines) Feb 23, 2026
@github-actions

🔍 Lighthouse Performance Audit

Category | Score | Status
Performance | 85/100 | 🟡
Accessibility | 95/100 | 🟢
Best Practices | 90/100 | 🟢
SEO | 95/100 | 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json


Copilot AI left a comment


Pull request overview

This pull request introduces motion and proposition grouping logic to eliminate visual repetition in generated articles. Motions referencing the same government proposition are now grouped under a single heading, and propositions are grouped by their referred committee.

Changes:

  • Adds groupMotionsByProposition() helper that extracts proposition references using regex patterns and returns a Map grouping motions by proposition ID
  • Adds groupPropositionsByCommittee() helper that groups propositions by their organ/committee field
  • Refactors motion rendering to use h3 for group headings and h4 for individual motion entries within groups
  • Refactors proposition rendering to show committee section headings when multiple committees are present
  • Adds responsesToProp and independentMotions labels to all 14 language localization files

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File | Description
scripts/data-transformers.ts | Core changes: adds two new exported grouping functions, two private regex constants, refactors generateMotionsContent and generatePropositionsContent to render grouped entries with conditional heading levels (h3 vs h4)
scripts/types/content.ts | Extends ContentLabelSet interface with two new string properties for localized labels
tests/data-transformers.test.ts | Comprehensive test coverage: 49 new tests covering grouping logic, label presence across 14 languages, and rendering behavior for grouped vs ungrouped scenarios

Comment on lines 1355 to 1356
/** Regex to capture the full proposition reference text (including description) */
const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i;

Copilot AI Feb 23, 2026


The PROP_FULL_REF_REGEX pattern uses a greedy (?:\s+.+)? suffix which will capture everything until the end of the string, potentially including unwanted trailing content. This could lead to excessively long proposition headings that include motion-specific text rather than just the proposition reference.

Consider using a non-greedy match or limiting the capture to a reasonable length. For example, the pattern could be:

  • /med anledning av (prop\.\s*\S+(?:\s+[^<]+?)?(?=\s*$|<))/i to stop at HTML tags
  • Or simply capturing up to a certain number of words after the proposition ID

This would ensure that only the proposition title is included in the group heading, not the entire motion title.

Suggested change
- /** Regex to capture the full proposition reference text (including description) */
- const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i;
+ /** Regex to capture the full proposition reference text (including description, but not trailing HTML or motion text) */
+ const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+[^<]+?)?(?=\s*$|<))/i;
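
To make the difference concrete, here is a small comparison of the two patterns quoted in this comment. The constant names `GREEDY_FULL_REF` and `BOUNDED_FULL_REF` and the sample string with trailing markup are invented for illustration.

```typescript
// Original pattern from the diff: `.+` runs to the end of the line.
const GREEDY_FULL_REF = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i;
// Suggested pattern: lazy match that stops before '<' or at end of string.
const BOUNDED_FULL_REF = /med anledning av (prop\.\s*\S+(?:\s+[^<]+?)?(?=\s*$|<))/i;

// Made-up title with trailing markup, to show where each capture ends.
const titleWithMarkup = 'med anledning av prop. 2025/26:118 Tillståndsprövning</h3><p>övrig text';

const greedyCapture = GREEDY_FULL_REF.exec(titleWithMarkup)?.[1];
const boundedCapture = BOUNDED_FULL_REF.exec(titleWithMarkup)?.[1];

console.log(greedyCapture);  // prop. 2025/26:118 Tillståndsprövning</h3><p>övrig text
console.log(boundedCapture); // prop. 2025/26:118 Tillståndsprövning
```

The bounded pattern only stops at a `<` or at the end of the string. For a plain-text title with no markup it still captures everything after the proposition ID, minus trailing whitespace.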

Copilot uses AI. Check for mistakes.
Comment on lines 1353 to 1356
const PROP_REFERENCE_REGEX = /med anledning av prop\.\s+(\S+)/i;

/** Regex to capture the full proposition reference text (including description) */
const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i;

Copilot AI Feb 23, 2026


The regex pattern PROP_REFERENCE_REGEX uses \S+ to capture the proposition ID, which will match any sequence of non-whitespace characters. This is too permissive and could capture additional text beyond the proposition ID (e.g., "2025/26:118Tillståndsprövning" if there's no space).

Consider using a more specific pattern that matches the expected format: /med anledning av prop(?:ositionen?)?\.\s+(\d{4}\/\d{2}:\d+)/i

This would ensure only valid proposition IDs like "2025/26:118" are captured, matching the format documented in the repository memories.

Suggested change
- const PROP_REFERENCE_REGEX = /med anledning av prop\.\s+(\S+)/i;
- /** Regex to capture the full proposition reference text (including description) */
- const PROP_FULL_REF_REGEX = /med anledning av (prop\.\s*\S+(?:\s+.+)?)/i;
+ const PROP_REFERENCE_REGEX = /med anledning av prop(?:ositionen?)?\.\s+(\d{4}\/\d{2}:\d+)/i;
+ /** Regex to capture the full proposition reference text (including description) */
+ const PROP_FULL_REF_REGEX = /med anledning av (prop(?:ositionen?)?\.\s*\d{4}\/\d{2}:\d+(?:\s+.+)?)/i;
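
The effect of tightening `\S+` to the ID format can be checked with a short sketch. The constant names and the two sample titles are invented; the second pattern is the strict form that the PR description reports as final.

```typescript
// Original pattern: any run of non-whitespace counts as the ID.
const PERMISSIVE_ID = /med anledning av prop\.\s+(\S+)/i;
// Strict pattern: four digits, slash, two digits, colon, digits.
const STRICT_ID = /med anledning av prop\.\s+(\d{4}\/\d{2}:\d+)/i;

// Made-up inputs: an ID fused to the following word, and a non-ID token.
const fusedTitle = 'med anledning av prop. 2025/26:118Tillståndsprövning';
const malformedTitle = 'med anledning av prop. okänd';

const permissiveFused = PERMISSIVE_ID.exec(fusedTitle)?.[1];
const strictFused = STRICT_ID.exec(fusedTitle)?.[1];
const strictMalformed = STRICT_ID.exec(malformedTitle);

console.log(permissiveFused); // 2025/26:118Tillståndsprövning
console.log(strictFused);     // 2025/26:118
console.log(strictMalformed); // null
```

With the strict pattern a malformed reference produces no match at all, so such a motion would fall into the independent list rather than form a group under a bogus key.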

@pethers

pethers commented Feb 23, 2026

@copilot apply changes based on the comments in this thread

- PROP_REFERENCE_REGEX: replace \S+ with \d{4}\/\d{2}:\d+ to only match
  valid proposition ID format (e.g. 2025/26:118), rejecting malformed input
- PROP_FULL_REF_REGEX: replace greedy (?:\s+.+)? with non-greedy [^<]+?
  anchored to end-of-string/HTML-tag boundary to prevent capturing excess
  trailing content when motion titles contain additional motion-specific text

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>

Copilot AI commented Feb 23, 2026

@copilot apply changes based on the comments in this thread

Done in 94b137e. Both patterns tightened:

  • PROP_REFERENCE_REGEX: \S+ → \d{4}\/\d{2}:\d+ (only matches valid YYYY/YY:NNN format)
  • PROP_FULL_REF_REGEX: greedy (?:\s+.+)? → non-greedy (?:\s+[^<]+?)?(?=\s*$|<) (stops at HTML tags or end of string, no excess trailing text)


Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated no new comments.

- Take origin/main as base for data-transformers.ts, types/content.ts and
  tests/data-transformers.test.ts (which had landed PRs #453, #457, etc.)
- Apply our PR changes on top:
  * extractPropRef: tighten regex to \d{4}\/\d{2}:\d+ (strict ID format)
  * groupMotionsByProposition: add export keyword; keep {grouped,independent}
    return type from main; replace inline lang-switches with L() labels
  * generateMotionsContent: use L(lang,'responsesToProp') and
    L(lang,'independentMotions') instead of 14-branch ternary chains
  * groupPropositionsByCommittee: new exported helper (Map<committeeKey,[...]>)
  * generatePropositionsContent: committee-grouped rendering (h3 per committee,
    h4 per prop when multi-committee; h3 per prop + Referred-to line otherwise)
    reuses byCommitteeGroup.size for policyImplicationsContext domainCount
  * ContentLabelSet: add responsesToProp and independentMotions fields
  * CONTENT_LABELS: add both new fields to all 14 language objects
- Tests: add unit tests for exported helpers ({grouped,independent} API),
  label presence tests (28 tests for 2 keys × 14 langs), proposition grouping
  rendering tests; all 1674 tests passing across 39 test files

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
@pethers pethers requested a review from Copilot February 23, 2026 14:44
@github-actions github-actions bot added the labels deployment (Deployment configuration) and size-xl (Extra large change, > 1000 lines) Feb 23, 2026

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 4 comments.

Comments suppressed due to low confidence (1)

scripts/data-transformers.ts:2155

  • In grouped motion rendering, each motion entry still uses <h3> with the full motion title. For motions responding to the same proposition this reintroduces the duplicated heading problem under the group and also creates an invalid heading hierarchy (h3 group header followed by multiple h3 siblings). Consider rendering per-motion entries as h4 and using a unique identifier (e.g. dok_id + author/party, or the title with the proposition prefix stripped) to avoid repeating the same text.
    return `
    <div class="motion-entry">
      <h3>${titleHtml}</h3>
      <p><strong>${L(lang, 'filedBy')}:</strong> ${authorLine}</p>
      <p>${summaryHtml}</p>

Comment on lines +2173 to +2174

content += ` <h3>${escapeHtml(`Prop. ${propRef}: ${propTitle}`)}</h3>\n`;

Copilot AI Feb 23, 2026


The proposition group header (Prop. ${propRef}: ${propTitle}) is emitted as plain escaped text. If propTitle originates from Swedish API fields (titel), it should be wrapped/marked consistently with the rest of the Swedish-source content so the translation step can detect and translate it in non-Swedish articles.

Suggested change
- content += ` <h3>${escapeHtml(`Prop. ${propRef}: ${propTitle}`)}</h3>\n`;
+ const safePropRef = escapeHtml(String(propRef));
+ const safePropTitle = escapeHtml(String(propTitle));
+ // Mark Swedish-source proposition title segment so translation can detect it
+ content += ` <h3>Prop. ${safePropRef}: <span data-source-lang="sv">${safePropTitle}</span></h3>\n`;

Comment on lines +618 to 626
  it('should wrap Swedish titel in lang="sv" span for non-Swedish committee reports', () => {
  const content = generateArticleContent(
  { reports: [{ titel: 'Bättre förutsättningar', url: '#', organ: 'FiU' }] } as MockArticlePayload,
  'committee-reports',
  'en'
  ) as string;
- expect(content).toContain('data-translate="true"');
+ // Non-Swedish: lang="sv" for accessibility but no data-translate marker
+ expect(content).not.toContain('data-translate="true"');
  expect(content).toContain('lang="sv"');

Copilot AI Feb 23, 2026


These tests now assert that non-Swedish article generation output contains no data-translate="true" markers even when Swedish API fields (titel/Swedish summaries) are present. That contradicts the documented translation flow, where generation must mark Swedish text for later LLM translation and validate-news verifies markers are removed after translation. Consider keeping marker expectations for non-Swedish generation output here, or splitting tests into “pre-translation” vs “post-translation” validation stages.

Comment on lines +2876 to +2912
function extractPolicyDomains(documents: RawDocument[]): string[] {
const counts: Record<string, number> = {};
for (const doc of documents) {
const text = `${doc.titel ?? ''} ${doc.rubrik ?? ''} ${doc.summary ?? ''} ${doc.notis ?? ''}`.toLowerCase();
for (const [domain, keywords] of Object.entries(POLICY_DOMAIN_KEYWORDS)) {
if (keywords.some(kw => text.includes(kw))) {
counts[domain] = (counts[domain] ?? 0) + 1;
}
}
}
return Object.entries(counts)
.sort((a, b) => b[1] - a[1])
.slice(0, 3)
.map(([domain]) => domain);
}

/**
* Generate a content-aware article title derived from the policy domains found in documents.
*
* Extracts the top 2 policy themes from the provided documents and returns a
* language-specific title that reflects the actual content. Returns `null` when
* fewer than 2 distinct domains can be detected so callers can fall back to
* their static titles.
*
* @param documents - Source documents to analyse for policy themes
* @param lang - Target language code (e.g. `'en'`, `'sv'`, `'de'`)
* @param articleType - Article type: `'motions'` | `'propositions'` | `'committee-reports'`
* @returns Content-based `{ title, subtitle }`, or `null` when analysis is insufficient
*/
export function generateContentTitle(
documents: RawDocument[],
lang: Language | string,
articleType: 'motions' | 'propositions' | 'committee-reports'
): { title: string; subtitle: string } | null {
const domains = extractPolicyDomains(documents);
if (domains.length < 2) return null;


Copilot AI Feb 23, 2026


This PR introduces generateContentTitle plus large keyword/translation/template tables. This appears unrelated to the stated PR goal (grouping motions/propositions) and increases maintenance surface. Consider moving this feature (and its tests) to a separate PR, or updating the PR description to explicitly include and justify this change.

Comment on lines +218 to +236
* For Swedish articles (`lang === 'sv'`) the span carries both the
* `lang="sv"` accessibility attribute AND `data-translate="true"` so
* quality-validation tooling can verify that Swedish articles contain the
* original text.
*
* For **all other** languages the span carries only `lang="sv"` (screen
* readers still know the text is Swedish) but the `data-translate` marker is
* intentionally omitted — it signals "this text should be translated" but no
* client-side translation mechanism exists, so the marker only causes false
* validation failures in non-Swedish articles.
*
* @param escapedText - Already HTML-escaped text content
* @param lang - Target article language (e.g. `'sv'`, `'en'`)
*/
function svSpan(escapedText: string, lang: Language | string): string {
if (lang === 'sv') {
return `<span data-translate="true" lang="sv">${escapedText}</span>`;
}
return `<span lang="sv">${escapedText}</span>`;

Copilot AI Feb 23, 2026


svSpan currently omits data-translate="true" for non-Swedish articles. This breaks the repo’s documented translation workflow and validators that rely on these markers to find Swedish API text that must be translated and removed before publishing (see .github/workflows/news-article-generator.md and scripts/validate-news-translations.ts). Consider emitting data-translate="true" lang="sv" for Swedish-source text when lang !== 'sv', and omitting the marker for Swedish (sv) output.

Suggested change
- * For Swedish articles (`lang === 'sv'`) the span carries both the
- * `lang="sv"` accessibility attribute AND `data-translate="true"` so
- * quality-validation tooling can verify that Swedish articles contain the
- * original text.
- *
- * For **all other** languages the span carries only `lang="sv"` (screen
- * readers still know the text is Swedish) but the `data-translate` marker is
- * intentionally omitted it signals "this text should be translated" but no
- * client-side translation mechanism exists, so the marker only causes false
- * validation failures in non-Swedish articles.
- *
- * @param escapedText - Already HTML-escaped text content
- * @param lang - Target article language (e.g. `'sv'`, `'en'`)
- */
- function svSpan(escapedText: string, lang: Language | string): string {
- if (lang === 'sv') {
- return `<span data-translate="true" lang="sv">${escapedText}</span>`;
- }
- return `<span lang="sv">${escapedText}</span>`;
+ * For Swedish articles (`lang === 'sv'`) the span carries only the
+ * `lang="sv"` accessibility attribute. No `data-translate` marker is used in
+ * Swedish output, because the text is already in the target language.
+ *
+ * For **all other** languages the span carries both `lang="sv"` and
+ * `data-translate="true"`. This marks embedded Swedish-source text in
+ * non-Swedish articles so that the translation workflow and validators (see
+ * `.github/workflows/news-article-generator.md` and
+ * `scripts/validate-news-translations.ts`) can detect, translate, and remove
+ * the original Swedish before publishing.
+ *
+ * @param escapedText - Already HTML-escaped text content
+ * @param lang - Target article language (e.g. `'sv'`, `'en'`)
+ */
+ function svSpan(escapedText: string, lang: Language | string): string {
+ if (lang === 'sv') {
+ return `<span lang="sv">${escapedText}</span>`;
+ }
+ return `<span data-translate="true" lang="sv">${escapedText}</span>`;


Labels

deployment (Deployment configuration), refactor (Code refactoring), size-l (Large change, 250-1000 lines), size-xl (Extra large change, > 1000 lines), testing (Test coverage)


Development

Successfully merging this pull request may close these issues.

Group duplicate motions by parent proposition to eliminate repetitive entries

2 participants