feat: add Evening Analysis Content Validator for intelligence assessment quality #407

Merged
pethers merged 43 commits into main from stricttyping
Feb 21, 2026
pethers commented Feb 21, 2026

  • Implemented a new script validate-evening-analysis.ts to validate evening analysis articles against defined editorial standards, ensuring comprehensive political intelligence assessments.
  • The validator checks for structural integrity, analytical depth, historical context, party perspectives, international comparisons, and source validation.
  • Introduced a quality scoring system to evaluate articles based on multiple metrics.
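
As a rough illustration of how a validator like this might combine individual checks into one quality score, here is a minimal TypeScript sketch. The check names, regular expressions, weights, and the `validateEveningAnalysis` function are assumptions made for this example; they are not taken from `validate-evening-analysis.ts`.

```typescript
// Hypothetical sketch: checks, patterns, and weights are illustrative only.
interface QualityCheck {
  name: string;
  weight: number; // points awarded when the check passes
  passes: (html: string) => boolean;
}

interface ValidationResult {
  score: number; // 0 to 100
  failed: string[]; // names of the checks that did not pass
}

// Swedish parliamentary parties, used to detect multi-party coverage.
const PARTY_NAMES = [
  'Socialdemokraterna', 'Moderaterna', 'Sverigedemokraterna', 'Centerpartiet',
  'Vänsterpartiet', 'Kristdemokraterna', 'Liberalerna', 'Miljöpartiet',
];

const countMatches = (html: string, pattern: RegExp): number =>
  (html.match(pattern) ?? []).length;

const CHECKS: QualityCheck[] = [
  // Structure: exactly one main heading and at least two sections.
  { name: 'structure', weight: 30,
    passes: (h) => countMatches(h, /<h1[\s>]/gi) === 1 && countMatches(h, /<h2[\s>]/gi) >= 2 },
  // Historical context: at least one four-digit year is mentioned.
  { name: 'historical-context', weight: 20, passes: (h) => /\b(19|20)\d{2}\b/.test(h) },
  // Party perspectives: at least two distinct parties are named.
  { name: 'party-perspectives', weight: 25,
    passes: (h) => PARTY_NAMES.filter((p) => h.includes(p)).length >= 2 },
  // Sources: at least one hyperlink is present.
  { name: 'sources', weight: 25, passes: (h) => /<a\s[^>]*href=/i.test(h) },
];

function validateEveningAnalysis(html: string): ValidationResult {
  const total = CHECKS.reduce((sum, c) => sum + c.weight, 0);
  const failed = CHECKS.filter((c) => !c.passes(html));
  const lost = failed.reduce((sum, c) => sum + c.weight, 0);
  return {
    score: Math.round((100 * (total - lost)) / total),
    failed: failed.map((c) => c.name),
  };
}
```

Keeping each check as a named, weighted predicate means a failing article can report which criteria it missed rather than only a number.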

feat: create Workflow State Coordinator for multi-workflow synchronization

  • Added workflow-state-coordinator.ts to manage coordination between multiple news generation workflows, preventing duplicate articles and optimizing resource usage.
  • Implemented a deduplication framework based on similarity analysis and time-window filtering.
  • Integrated MCP query caching to reduce redundant API calls and improve performance.
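
The deduplication and caching ideas above can be sketched as follows. This is a minimal illustration, not the code in `workflow-state-coordinator.ts`: the Jaccard token similarity, the 0.7 threshold, the 6-hour window, and all names (`isDuplicate`, `TtlCache`, `ArticleRecord`) are assumptions chosen for the example.

```typescript
// Hypothetical sketch: names, threshold, and window size are assumptions.
interface ArticleRecord {
  title: string;
  publishedAt: number; // epoch milliseconds
}

// Lowercase the text and keep the distinct words longer than two characters.
function tokenize(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/[^a-z0-9åäö]+/).filter((t) => t.length > 2));
}

// Jaccard similarity: shared tokens divided by the size of the union.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1; // two empty titles count as identical
  let shared = 0;
  a.forEach((t) => { if (b.has(t)) shared += 1; });
  return shared / (a.size + b.size - shared);
}

// A candidate is a duplicate if a recent article inside the time window
// has a title at least `threshold` similar to it.
function isDuplicate(
  title: string,
  recent: ArticleRecord[],
  now: number,
  windowMs = 6 * 60 * 60 * 1000,
  threshold = 0.7,
): boolean {
  const candidate = tokenize(title);
  return recent
    .filter((r) => now - r.publishedAt <= windowMs)
    .some((r) => jaccard(candidate, tokenize(r.title)) >= threshold);
}

// Small time-to-live cache, e.g. for query results keyed by a request string.
class TtlCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();

  set(key: string, value: V, ttlMs: number, now: number): void {
    this.entries.set(key, { value, expiresAt: now + ttlMs });
  }

  get(key: string, now: number): V | undefined {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it so the caller re-queries
      return undefined;
    }
    return entry.value;
  }
}
```

Passing `now` explicitly instead of calling `Date.now()` inside the helpers keeps both the time-window filter and cache expiry deterministic under test.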

chore: add TypeScript configuration for scripts

  • Created tsconfig.scripts.json to define TypeScript compiler options for scripts, ensuring strict type checking and proper module resolution.
  • Configured output directory and included/excluded paths for better project organization.

github-actions bot added the dependencies, refactor, and size-xl labels Feb 21, 2026
@github-actions

🏷️ Automatic Labeling Summary

This PR has been automatically labeled based on the files changed and PR metadata.

Applied Labels: dependencies,refactor,size-xl

Label Categories

  • 🗳️ Content: news, dashboard, visualization, intelligence
  • 💻 Technology: html-css, javascript, workflow, security
  • 📊 Data: cia-data, riksdag-data, data-pipeline, schema
  • 🌍 I18n: i18n, translation, rtl
  • 🔒 ISMS: isms, iso-27001, nist-csf, cis-controls
  • 🏗️ Infrastructure: ci-cd, deployment, performance, monitoring
  • 🔄 Quality: testing, accessibility, documentation, refactor
  • 🤖 AI: agent, skill, agentic-workflow

For more information, see .github/labeler.yml.

@github-actions

🔍 Lighthouse Performance Audit

Category Score Status
Performance 85/100 🟡
Accessibility 95/100 🟢
Best Practices 90/100 🟢
SEO 95/100 🟢

📥 Download full Lighthouse report

Budget Compliance: Performance budgets enforced via budget.json

…ty variants, and setup configuration

- Implement tests for editorial pillars including language detection and localized headings.
- Create tests for HTML utility functions, specifically for escaping HTML characters.
- Add unit tests for MCP client functions to ensure correct API interaction.
- Develop comprehensive tests for party variants and extraction of party mentions from HTML.
- Establish a global setup file for Vitest to mock dependencies and provide test utilities.
github-actions bot added the testing label Feb 21, 2026

const h3Matches: string[] = [];
let h3Match: RegExpExecArray | null;
while ((h3Match = h3Pattern.exec(content)) !== null) {
  const cleanText = h3Match[1]!.replace(/<[^>]+>/g, '').trim();

Check failure

Code scanning / CodeQL

Incomplete multi-character sanitization (High)

This string may still contain <script, which may cause an HTML element injection vulnerability.

Copilot Autofix


In general, the problem is that the code attempts to sanitize HTML by stripping tags with a regex that can miss partial or malformed tags; CodeQL is warning that <script (and similar sequences) might remain and later be interpreted as HTML. The safest pattern is: (1) remove tags only to the extent needed, and (2) HTML‑escape the resulting string whenever it may end up in an HTML context, so any < or > that remain are treated as literal characters, not markup.

For this specific code, the best minimal fix is to post‑process the cleaned h3 text with a small HTML‑escape function before storing it in h3Matches. That way, even if the initial regex fails to remove something like <script, the output will contain &lt;script and cannot become an actual script tag when rendered. We can implement a local helper escapeHtml(text: string): string near extractTerms, using straightforward replacements for &, <, >, ", and '. Then, in the while loop around line 75–78, change the code so that cleanText is run through escapeHtml before being pushed to h3Matches. This keeps behavior effectively the same from a user perspective (they still see the textual content of headings), but closes the injection risk that CodeQL identified. No new external dependencies are required; the helper can be implemented directly in scripts/extract-vocabulary.ts.

Concretely:

  • Add an escapeHtml helper function above extractTerms.
  • Update the cleanText handling inside the h3 extraction loop:
    • Keep the existing tag‑stripping regex.
    • After trimming, pass the string to escapeHtml.
    • Push the escaped text into h3Matches instead of the raw string.
Suggested changeset 1
scripts/extract-vocabulary.ts

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/scripts/extract-vocabulary.ts b/scripts/extract-vocabulary.ts
--- a/scripts/extract-vocabulary.ts
+++ b/scripts/extract-vocabulary.ts
@@ -63,6 +63,18 @@
 // ---------------------------------------------------------------------------
 
 /**
+ * Basic HTML entity escaping to ensure extracted text cannot form HTML tags.
+ */
+function escapeHtml(text: string): string {
+  return text
+    .replace(/&/g, '&amp;')
+    .replace(/</g, '&lt;')
+    .replace(/>/g, '&gt;')
+    .replace(/"/g, '&quot;')
+    .replace(/'/g, '&#39;');
+}
+
+/**
  * Extract political terms from HTML content using structure-based approach.
  */
 function extractTerms(content: string, _lang: Language): ExtractedTerms {
@@ -74,7 +86,9 @@
   let h3Match: RegExpExecArray | null;
   while ((h3Match = h3Pattern.exec(content)) !== null) {
     const cleanText = h3Match[1]!.replace(/<[^>]+>/g, '').trim();
-    if (cleanText) h3Matches.push(cleanText);
+    if (cleanText) {
+      h3Matches.push(escapeHtml(cleanText));
+    }
   }
   terms.titles = h3Matches.slice(0, 10);
 
EOF
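
To make the flagged failure mode concrete, the following self-contained sketch shows an input where tag stripping alone leaves `<script` behind, and how escaping afterwards neutralizes it. `stripTags` is a hypothetical name for the inline regex used in the loop; `escapeHtml` mirrors the helper in the patch above. The sample heading is invented for illustration.

```typescript
// Same single-pass regex as in the h3 extraction loop: removes complete tags only.
function stripTags(text: string): string {
  return text.replace(/<[^>]+>/g, '').trim();
}

// Mirrors the helper from the Autofix patch: escape the five HTML-significant characters.
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Invented sample: the trailing tag is never closed, so the regex cannot match it.
const heading = 'Budget <em>2026</em> <script src=x';

const stripped = stripTags(heading);
// stripped === 'Budget 2026 <script src=x'  (the unterminated tag survives)

const safe = escapeHtml(stripped);
// safe === 'Budget 2026 &lt;script src=x'   (now inert when placed in HTML)
```

Note that `&` is replaced first; doing it later would double-escape the entities produced by the other replacements.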
Copilot is powered by AI and may make mistakes. Always verify output.
github-actions bot added the javascript label Feb 21, 2026

…ate coordination

- Implemented a test suite for evening analysis articles covering structure validation, analytical depth, historical context detection, and more.
- Created unit tests for Workflow State Coordinator, including state management, MCP query caching, recent article tracking, duplicate detection, and similarity calculations.
- Ensured proper handling of metadata directory, cache expiration, and workflow execution recording.

…e-news-translations scripts

- Implement comprehensive tests for load-cia-stats.js to verify script configuration, data extraction, output generation, caching strategy, error handling, CSV parsing, data validation, integration with update scripts, and GitHub Actions integration.
- Create unit tests for news-realtime-monitor.js focusing on multi-language synchronization, quality framework, and workflow coordination, including real-world integration tests.
- Develop tests for validate-news-translations.js to ensure validation of translated news articles, including language code detection, untranslated marker detection, multiple file validation, recursive directory scanning, exit codes, and support for CJK and RTL languages.

- Implement tests for Motions article generation module.
- Implement tests for Propositions article generation module.
- Implement tests for Week-Ahead article generation module.
- Implement tests for Sitemap generation, ensuring coverage of all content types and languages.

github-actions bot added the cia-data, intelligence, html-css, riksdag-data, data-pipeline, i18n, and news labels Feb 21, 2026

Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 356 changed files in this pull request and generated 6 comments.


- name: Build with Vite
  run: npm run build

Copilot AI Feb 21, 2026

The new build step with Vite is added to generate the dist/ directory, but there's no validation that the build succeeded or that required files were generated. Add a verification step after the build to confirm critical files exist before deployment.

Suggested change
- name: Verify build artifacts
  run: |
    echo "Verifying Vite build output..."
    if [ ! -d "dist" ]; then
      echo "❌ Build verification failed: 'dist' directory is missing."
      exit 1
    fi
    if [ ! -f "dist/index.html" ]; then
      echo "❌ Build verification failed: 'dist/index.html' is missing."
      exit 1
    fi
    echo "✅ Build artifacts verified: 'dist' directory and 'dist/index.html' exist."
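
If the same check were wanted as a script alongside the project's other TypeScript tooling, it might look like the sketch below. This is an assumption-laden alternative to the shell step, not something proposed in the review: `findMissingArtifacts` is a hypothetical helper, and the list of required files is whatever the caller passes in.

```typescript
import { existsSync } from 'node:fs';
import { join } from 'node:path';

// Hypothetical helper: returns the required build outputs that are missing.
// `exists` is injectable so the check can be exercised without touching disk.
function findMissingArtifacts(
  distDir: string,
  requiredFiles: readonly string[],
  exists: (path: string) => boolean = existsSync,
): string[] {
  // If the output directory itself is absent, report that and stop.
  if (!exists(distDir)) return [distDir];
  return requiredFiles
    .map((file) => join(distDir, file))
    .filter((path) => !exists(path));
}
```

A CI wrapper could call `findMissingArtifacts('dist', ['index.html'])`, print each missing path, and exit with a non-zero status when the returned list is not empty.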

Copilot uses AI. Check for mistakes.
  run: npm ci

- name: TypeScript type-check (browser)
  run: npx tsc --project tsconfig.browser.json --noEmit

Copilot AI Feb 21, 2026

The workflow is renamed to "TypeScript & JavaScript Testing" but the job name remains "test" which doesn't reflect the dual nature. Consider renaming the job to something like "typescript-javascript-test" for clarity.

<html lang="sv">
<head>
<!-- Title tag (50-60 characters) -->
<title>Politiker Risk analys - Riksdagsmonitor</title>

Copilot AI Feb 21, 2026

Corrected spelling of 'Risk analys' to 'Riskanalys' (compound noun in Swedish).

Comment on lines +58 to +60
SELECT
p.party,
COUNT(DISTINCT vr.ballot_id) as total_votes,

Copilot AI Feb 21, 2026

This SQL example includes a QUALIFY clause (line 76) which is specific to certain database engines (e.g., Snowflake, BigQuery). Consider adding a comment indicating database compatibility or providing an alternative using standard SQL window functions with WHERE for broader portability.

# \/ \/ \___/|_| |_|\_\|_| |_|\___/ \_/\_/ |___/
#
-# This file was automatically generated by gh-aw (v0.47.5). DO NOT EDIT.
+# This file was automatically generated by gh-aw (v0.48.2). DO NOT EDIT.

Copilot AI Feb 21, 2026

The auto-generated warning states v0.48.2, and the metadata comment on line 26 also references v0.48.2. Ensure consistency is maintained when these files are regenerated, as version mismatches could indicate compilation issues.


**Token Handling Rules:**
```
✅ DO: Use environment variable references (${VAR_NAME})
```

Copilot AI Feb 21, 2026

The example uses shell-style variable syntax ${VAR_NAME}, but depending on the configuration parser, this may need to be $VAR_NAME or a different format. Add a note clarifying which configuration format this applies to (e.g., GitHub Actions, MCP config files).

Suggested change
-✅ DO: Use environment variable references (${VAR_NAME})
+✅ DO: Use environment variable references (${VAR_NAME}) in MCP JSON config files, and adjust syntax to match your configuration format (e.g., $VAR_NAME in shell scripts or ${{ secrets.VAR_NAME }} in GitHub Actions)
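
To illustrate what a `${VAR_NAME}` reference means in practice, the sketch below expands such placeholders from an environment map. `expandEnvRefs` is a hypothetical helper, not part of this PR, and whether a given MCP configuration loader performs this substitution itself is an assumption to verify against that tool's documentation.

```typescript
// Hypothetical helper: replaces ${NAME} placeholders with values from `env`.
// Only the braced form is recognized; a bare $NAME is left untouched.
function expandEnvRefs(
  template: string,
  env: Record<string, string | undefined>,
): string {
  return template.replace(
    /\$\{([A-Za-z_][A-Za-z0-9_]*)\}/g,
    (_match: string, name: string): string => {
      const value = env[name];
      // Fail loudly rather than silently writing an empty token into the config.
      if (value === undefined) {
        throw new Error(`Missing environment variable: ${name}`);
      }
      return value;
    },
  );
}
```

In a Node.js script this could be called as `expandEnvRefs(rawConfigText, process.env)`, which keeps the secret out of the committed file and resolves it only at load time.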

pethers commented Feb 21, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI commented Feb 21, 2026

@pethers I've opened a new pull request, #417, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 21, 2026 21:00
…fix, SQL note, config clarification

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
fix: build verification, SQL compatibility note, spelling, and MCP config syntax clarification
@pethers pethers requested a review from Copilot February 21, 2026 21:03

Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 356 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (1)

.github/workflows/jsdoc-validation.yml:1

  • Variable names like TOTAL_JS/FILES_WITH_JSDOC are now misleading since they include both .ts and .js and are counting documentation comments rather than strictly JSDoc. Consider renaming to something clearer (e.g., TOTAL_SOURCE_FILES and FILES_WITH_DOC_COMMENTS) to avoid confusion when maintaining the workflow.
name: TypeDoc Documentation Validation

-# Count JavaScript files
-TOTAL_JS=$(find js dashboard scripts -name "*.js" ! -name "*.min.js" ! -path "*/lib/*" | wc -l)
+# Count source files (JS + TS) across all source directories
+TOTAL_JS=$(find src/browser scripts -name "*.js" -o -name "*.ts" | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l)

Copilot AI Feb 21, 2026

The find ... -name "*.js" -o -name "*.ts" expression relies on implicit operator precedence. With only the implicit -print the count is unaffected, but -o binds more loosely than the implied -a, so any predicate or action added later (for example -type f or -exec) would apply to the "*.ts" branch only. Wrap the name predicates in parentheses, similar to how it’s done a few lines later for FILES_WITH_JSDOC, so the total-file count stays accurate.

Suggested change
-TOTAL_JS=$(find src/browser scripts -name "*.js" -o -name "*.ts" | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l)
+TOTAL_JS=$(find src/browser scripts \( -name "*.js" -o -name "*.ts" \) | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l)

Comment on lines +95 to +96
- name: Build with Vite
  run: npm run build

Copilot AI Feb 21, 2026

This workflow builds with Vite (producing dist/), but the deploy step still syncs $SRC to S3. That likely deploys source files instead of the built assets, and also makes the build step effectively unused. If the intent is to deploy the built site, adjust the sync source to dist/ (and ensure any needed static assets are present there).

Comment on lines 163 to +171
 aws s3 sync "$SRC" "$BUCKET" \
   --exclude '.git/*' --exclude 'screenshots/*' \
   --exclude '.github/*' --exclude 'docs/*' --exclude 'schemas/*' \
   --exclude 'scripts/*' --exclude '.devcontainer/*' --exclude 'quicksight/*' \
-  --exclude '*.md' --exclude 'package*.json' --exclude '.gitignore'
+  --exclude 'src/*' --exclude 'tests/*' --exclude 'cypress/*' \
+  --exclude 'node_modules/*' --exclude 'builds/*' \
+  --exclude '*.md' --exclude 'package*.json' --exclude '.gitignore' \
+  --exclude 'tsconfig*.json' --exclude '*.config.js' --exclude 'jsdoc.json' \
+  --exclude 'typedoc.json' --exclude 'eslint.config.js'

Copilot AI Feb 21, 2026

This workflow builds with Vite (producing dist/), but the deploy step still syncs $SRC to S3. That likely deploys source files instead of the built assets, and also makes the build step effectively unused. If the intent is to deploy the built site, adjust the sync source to dist/ (and ensure any needed static assets are present there).

pethers commented Feb 21, 2026

@copilot open a new pull request to apply changes based on the comments in this thread

Copilot AI commented Feb 21, 2026

@pethers I've opened a new pull request, #418, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 21, 2026 21:15
…dist/ in deploy-s3.yml

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
fix: correct find precedence in jsdoc-validation and deploy from dist/ in deploy-s3
@pethers pethers requested a review from Copilot February 21, 2026 21:19

Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 356 changed files in this pull request and generated 2 comments.

Comment on lines 122 to +123
BUCKET="s3://${{ env.S3_BUCKET_NAME }}"
-SRC="."
+SRC="dist"

Copilot AI Feb 21, 2026

Switching the deployment source from . to dist will drop any site content that is not emitted into the Vite build output (common examples in static sites: news/, dashboard/, docs/api/, sitemap.xml, robots.txt, and other non-bundled HTML files). If those pages/assets are meant to be published, ensure they are copied into dist/ as part of the build (e.g., via Vite publicDir, a post-build copy step, or a build plugin), or keep deploying from the repo root with explicit excludes.
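
One way to realize the post-build copy step mentioned above is sketched here. It is a hypothetical script, not part of this PR: the `STATIC_PATHS` list simply reuses some of the examples named in the comment, and the real set of paths to publish is an assumption that would need checking against the repository.

```typescript
import { cpSync, existsSync } from 'node:fs';
import { join } from 'node:path';

// Example paths taken from the review comment; the actual list is an assumption.
const STATIC_PATHS = ['news', 'dashboard', 'sitemap.xml', 'robots.txt'];

interface CopyOp {
  from: string;
  to: string;
}

// Pure planning step: list the paths that exist in the repo root but were not
// emitted into the build output. `exists` is injectable for testing.
function planCopies(
  paths: readonly string[],
  root = '.',
  dist = 'dist',
  exists: (path: string) => boolean = existsSync,
): CopyOp[] {
  return paths
    .filter((p) => exists(join(root, p)) && !exists(join(dist, p)))
    .map((p) => ({ from: join(root, p), to: join(dist, p) }));
}

// Effectful step: copy each planned file or directory into the build output.
function applyCopies(plan: readonly CopyOp[]): void {
  for (const { from, to } of plan) {
    cpSync(from, to, { recursive: true });
  }
}
```

Running `applyCopies(planCopies(STATIC_PATHS))` after `npm run build` would then populate `dist/` before the S3 sync. Separating the plan from the copy makes it easy to log or review what would be published before anything is written.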

Comment on lines 148 to 149
# Count source files (JS + TS) across all source directories
TOTAL_JS=$(find src/browser scripts \( -name "*.js" -o -name "*.ts" \) | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l)

Copilot AI Feb 21, 2026

The variable name TOTAL_JS is no longer accurate since it counts both .js and .ts files. Rename it to something like TOTAL_SOURCE_FILES (and similarly rename FILES_WITH_JSDOC if it’s intended to mean 'files with doc comments' rather than strictly JSDoc) to avoid confusion when interpreting coverage output.

Member Author

@copilot open a new pull request to apply changes based on this feedback

Copilot AI commented Feb 21, 2026

@pethers I've opened a new pull request, #419, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits February 21, 2026 21:31
…C_COMMENTS

Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Rename misleading coverage variables in jsdoc-validation workflow
@pethers pethers requested a review from Copilot February 21, 2026 21:43

Copilot AI left a comment

Pull request overview

Copilot reviewed 46 out of 356 changed files in this pull request and generated no new comments.


@pethers pethers merged commit f792047 into main Feb 21, 2026
29 of 30 checks passed
@pethers pethers deleted the stricttyping branch February 21, 2026 21:49

Labels

  • accessibility: WCAG 2.1 AA compliance
  • agentic-workflow: Agentic workflow changes
  • ci-cd: CI/CD pipeline changes
  • cia-data: CIA platform data integration
  • dashboard: Interactive dashboards (Chart.js/D3.js)
  • data-pipeline: ETL and data processing
  • dependencies: Dependency updates
  • deployment: Deployment configuration
  • documentation: Documentation updates
  • html-css: HTML/CSS changes
  • i18n: Internationalization/localization
  • intelligence: Political intelligence analysis
  • javascript: JavaScript code changes
  • news: News articles and content generation
  • refactor: Code refactoring
  • riksdag-data: Riksdag-Regering MCP data
  • rtl: RTL language support (Arabic, Hebrew)
  • security: Security improvements
  • size-xl: Extra large change (> 1000 lines)
  • skill: Skill configuration
  • testing: Test coverage
  • translation: Translation updates
  • workflow: GitHub Actions workflows

2 participants