feat: add Evening Analysis Content Validator for intelligence assessment quality#407
feat: add Evening Analysis Content Validator for intelligence assessment quality#407
Conversation
…ent quality - Implemented a new script `validate-evening-analysis.ts` to validate evening analysis articles against defined editorial standards, ensuring comprehensive political intelligence assessments. - The validator checks for structural integrity, analytical depth, historical context, party perspectives, international comparisons, and source validation. - Introduced a quality scoring system to evaluate articles based on multiple metrics. feat: create Workflow State Coordinator for multi-workflow synchronization - Added `workflow-state-coordinator.ts` to manage coordination between multiple news generation workflows, preventing duplicate articles and optimizing resource usage. - Implemented a deduplication framework based on similarity analysis and time-window filtering. - Integrated MCP query caching to reduce redundant API calls and improve performance. chore: add TypeScript configuration for scripts - Created `tsconfig.scripts.json` to define TypeScript compiler options for scripts, ensuring strict type checking and proper module resolution. - Configured output directory and included/excluded paths for better project organization.
🏷️ Automatic Labeling SummaryThis PR has been automatically labeled based on the files changed and PR metadata. Applied Labels: dependencies,refactor,size-xl Label Categories
For more information, see |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…ty variants, and setup configuration - Implement tests for editorial pillars including language detection and localized headings. - Create tests for HTML utility functions, specifically for escaping HTML characters. - Add unit tests for MCP client functions to ensure correct API interaction. - Develop comprehensive tests for party variants and extraction of party mentions from HTML. - Establish a global setup file for Vitest to mock dependencies and provide test utilities.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
| const h3Matches: string[] = []; | ||
| let h3Match: RegExpExecArray | null; | ||
| while ((h3Match = h3Pattern.exec(content)) !== null) { | ||
| const cleanText = h3Match[1]!.replace(/<[^>]+>/g, '').trim(); |
Check failure
Code scanning / CodeQL
Incomplete multi-character sanitization High
Show autofix suggestion
Hide autofix suggestion
Copilot Autofix
AI about 8 hours ago
In general, the problem is that the code attempts to sanitize HTML by stripping tags with a regex that can miss partial or malformed tags; CodeQL is warning that <script (and similar sequences) might remain and later be interpreted as HTML. The safest pattern is: (1) remove tags only to the extent needed, and (2) HTML‑escape the resulting string whenever it may end up in an HTML context, so any < or > that remain are treated as literal characters, not markup.
For this specific code, the best minimal fix is to post‑process the cleaned h3 text with a small HTML‑escape function before storing it in h3Matches. That way, even if the initial regex fails to remove something like <script, the output will contain <script and cannot become an actual script tag when rendered. We can implement a local helper escapeHtml(text: string): string near extractTerms, using straightforward replacements for &, <, >, ", and '. Then, in the while loop around line 75–78, change the code so that cleanText is run through escapeHtml before being pushed to h3Matches. This keeps behavior effectively the same from a user perspective (they still see the textual content of headings), but closes the injection risk that CodeQL identified. No new external dependencies are required; the helper can be implemented directly in scripts/extract-vocabulary.ts.
Concretely:
- Add a
escapeHtmlhelper function aboveextractTerms. - Update the
cleanTexthandling inside theh3extraction loop:- Keep the existing tag‑stripping regex.
- After trimming, pass the string to
escapeHtml. - Push the escaped text into
h3Matchesinstead of the raw string.
| @@ -63,6 +63,18 @@ | ||
| // --------------------------------------------------------------------------- | ||
|
|
||
| /** | ||
| * Basic HTML entity escaping to ensure extracted text cannot form HTML tags. | ||
| */ | ||
| function escapeHtml(text: string): string { | ||
| return text | ||
| .replace(/&/g, '&') | ||
| .replace(/</g, '<') | ||
| .replace(/>/g, '>') | ||
| .replace(/"/g, '"') | ||
| .replace(/'/g, '''); | ||
| } | ||
|
|
||
| /** | ||
| * Extract political terms from HTML content using structure-based approach. | ||
| */ | ||
| function extractTerms(content: string, _lang: Language): ExtractedTerms { | ||
| @@ -74,7 +86,9 @@ | ||
| let h3Match: RegExpExecArray | null; | ||
| while ((h3Match = h3Pattern.exec(content)) !== null) { | ||
| const cleanText = h3Match[1]!.replace(/<[^>]+>/g, '').trim(); | ||
| if (cleanText) h3Matches.push(cleanText); | ||
| if (cleanText) { | ||
| h3Matches.push(escapeHtml(cleanText)); | ||
| } | ||
| } | ||
| terms.titles = h3Matches.slice(0, 10); | ||
|
|
…d add unit tests for MCP client
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…ate coordination - Implemented a test suite for evening analysis articles covering structure validation, analytical depth, historical context detection, and more. - Created unit tests for Workflow State Coordinator, including state management, MCP query caching, recent article tracking, duplicate detection, and similarity calculations. - Ensured proper handling of metadata directory, cache expiration, and workflow execution recording.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…e-news-translations scripts - Implement comprehensive tests for load-cia-stats.js to verify script configuration, data extraction, output generation, caching strategy, error handling, CSV parsing, data validation, integration with update scripts, and GitHub Actions integration. - Create unit tests for news-realtime-monitor.js focusing on multi-language synchronization, quality framework, and workflow coordination, including real-world integration tests. - Develop tests for validate-news-translations.js to ensure validation of translated news articles, including language code detection, untranslated marker detection, multiple file validation, recursive directory scanning, exit codes, and support for CJK and RTL languages.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
- Implement tests for Motions article generation module. - Implement tests for Propositions article generation module. - Implement tests for Week-Ahead article generation module. - Implement tests for Sitemap generation, ensuring coverage of all content types and languages.
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
|
|
||
| - name: Build with Vite | ||
| run: npm run build | ||
|
|
There was a problem hiding this comment.
The new build step with Vite is added to generate the dist/ directory, but there's no validation that the build succeeded or that required files were generated. Add a verification step after the build to confirm critical files exist before deployment.
| - name: Verify build artifacts | |
| run: | | |
| echo "Verifying Vite build output..." | |
| if [ ! -d "dist" ]; then | |
| echo "❌ Build verification failed: 'dist' directory is missing." | |
| exit 1 | |
| fi | |
| if [ ! -f "dist/index.html" ]; then | |
| echo "❌ Build verification failed: 'dist/index.html' is missing." | |
| exit 1 | |
| fi | |
| echo "✅ Build artifacts verified: 'dist' directory and 'dist/index.html' exist." |
| run: npm ci | ||
|
|
||
| - name: TypeScript type-check (browser) | ||
| run: npx tsc --project tsconfig.browser.json --noEmit |
There was a problem hiding this comment.
The workflow is renamed to "TypeScript & JavaScript Testing" but the job name remains "test" which doesn't reflect the dual nature. Consider renaming the job to something like "typescript-javascript-test" for clarity.
| <html lang="sv"> | ||
| <head> | ||
| <!-- Title tag (50-60 characters) --> | ||
| <title>Politiker Risk analys - Riksdagsmonitor</title> |
There was a problem hiding this comment.
Corrected spelling of 'Risk analys' to 'Riskanalys' (compound noun in Swedish).
| SELECT | ||
| p.party, | ||
| COUNT(DISTINCT vr.ballot_id) as total_votes, |
There was a problem hiding this comment.
This SQL example includes a QUALIFY clause (line 76) which is specific to certain database engines (e.g., Snowflake, BigQuery). Consider adding a comment indicating database compatibility or providing an alternative using standard SQL window functions with WHERE for broader portability.
| # \/ \/ \___/|_| |_|\_\|_| |_|\___/ \_/\_/ |___/ | ||
| # | ||
| # This file was automatically generated by gh-aw (v0.47.5). DO NOT EDIT. | ||
| # This file was automatically generated by gh-aw (v0.48.2). DO NOT EDIT. |
There was a problem hiding this comment.
The auto-generated warning states v0.48.2, but the metadata comment on line 26 also references v0.48.2. Ensure consistency is maintained when these files are regenerated, as version mismatches could indicate compilation issues.
|
|
||
| **Token Handling Rules:** | ||
| ``` | ||
| ✅ DO: Use environment variable references (${VAR_NAME}) |
There was a problem hiding this comment.
The example uses shell-style variable syntax ${VAR_NAME}, but depending on the configuration parser, this may need to be $VAR_NAME or a different format. Add a note clarifying which configuration format this applies to (e.g., GitHub Actions, MCP config files).
| ✅ DO: Use environment variable references (${VAR_NAME}) | |
| ✅ DO: Use environment variable references (${VAR_NAME}) in MCP JSON config files, and adjust syntax to match your configuration format (e.g., $VAR_NAME in shell scripts or ${{ secrets.VAR_NAME }} in GitHub Actions) |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
…fix, SQL note, config clarification Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
fix: build verification, SQL compatibility note, spelling, and MCP config syntax clarification
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 46 out of 356 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (1)
.github/workflows/jsdoc-validation.yml:1
- Variable names like
TOTAL_JS/FILES_WITH_JSDOCare now misleading since they include both.tsand.jsand are counting documentation comments rather than strictly JSDoc. Consider renaming to something clearer (e.g.,TOTAL_SOURCE_FILESandFILES_WITH_DOC_COMMENTS) to avoid confusion when maintaining the workflow.
name: TypeDoc Documentation Validation
| # Count JavaScript files | ||
| TOTAL_JS=$(find js dashboard scripts -name "*.js" ! -name "*.min.js" ! -path "*/lib/*" | wc -l) | ||
| # Count source files (JS + TS) across all source directories | ||
| TOTAL_JS=$(find src/browser scripts -name "*.js" -o -name "*.ts" | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l) |
There was a problem hiding this comment.
The find ... -name "*.js" -o -name "*.ts" expression likely has incorrect precedence (the -o can cause matches that ignore the intended directory/name constraints). Wrap the name predicates in parentheses, similar to how it’s done a few lines later for FILES_WITH_JSDOC, so the total-file count is accurate.
| TOTAL_JS=$(find src/browser scripts -name "*.js" -o -name "*.ts" | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l) | |
| TOTAL_JS=$(find src/browser scripts \( -name "*.js" -o -name "*.ts" \) | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l) |
| - name: Build with Vite | ||
| run: npm run build |
There was a problem hiding this comment.
This workflow builds with Vite (producing dist/), but the deploy step still syncs $SRC to S3. That likely deploys source files instead of the built assets, and also makes the build step effectively unused. If the intent is to deploy the built site, adjust the sync source to dist/ (and ensure any needed static assets are present there).
| aws s3 sync "$SRC" "$BUCKET" \ | ||
| --exclude '.git/*' --exclude 'screenshots/*' \ | ||
| --exclude '.github/*' --exclude 'docs/*' --exclude 'schemas/*' \ | ||
| --exclude 'scripts/*' --exclude '.devcontainer/*' --exclude 'quicksight/*' \ | ||
| --exclude '*.md' --exclude 'package*.json' --exclude '.gitignore' | ||
| --exclude 'src/*' --exclude 'tests/*' --exclude 'cypress/*' \ | ||
| --exclude 'node_modules/*' --exclude 'builds/*' \ | ||
| --exclude '*.md' --exclude 'package*.json' --exclude '.gitignore' \ | ||
| --exclude 'tsconfig*.json' --exclude '*.config.js' --exclude 'jsdoc.json' \ | ||
| --exclude 'typedoc.json' --exclude 'eslint.config.js' |
There was a problem hiding this comment.
This workflow builds with Vite (producing dist/), but the deploy step still syncs $SRC to S3. That likely deploys source files instead of the built assets, and also makes the build step effectively unused. If the intent is to deploy the built site, adjust the sync source to dist/ (and ensure any needed static assets are present there).
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
|
@copilot open a new pull request to apply changes based on the comments in this thread |
…dist/ in deploy-s3.yml Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
fix: correct find precedence in jsdoc-validation and deploy from dist/ in deploy-s3
| BUCKET="s3://${{ env.S3_BUCKET_NAME }}" | ||
| SRC="." | ||
| SRC="dist" |
There was a problem hiding this comment.
Switching the deployment source from . to dist will drop any site content that is not emitted into the Vite build output (common examples in static sites: news/, dashboard/, docs/api/, sitemap.xml, robots.txt, and other non-bundled HTML files). If those pages/assets are meant to be published, ensure they are copied into dist/ as part of the build (e.g., via Vite publicDir, a post-build copy step, or a build plugin), or keep deploying from the repo root with explicit excludes.
| # Count source files (JS + TS) across all source directories | ||
| TOTAL_JS=$(find src/browser scripts \( -name "*.js" -o -name "*.ts" \) | grep -v node_modules | grep -v lib/ | grep -v ".min." | grep -v ".d.ts" | wc -l) |
There was a problem hiding this comment.
The variable name TOTAL_JS is no longer accurate since it counts both .js and .ts files. Rename it to something like TOTAL_SOURCE_FILES (and similarly rename FILES_WITH_JSDOC if it’s intended to mean 'files with doc comments' rather than strictly JSDoc) to avoid confusion when interpreting coverage output.
There was a problem hiding this comment.
@copilot open a new pull request to apply changes based on this feedback
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
…C_COMMENTS Co-authored-by: pethers <1726836+pethers@users.noreply.github.com>
Rename misleading coverage variables in jsdoc-validation workflow
🔍 Lighthouse Performance Audit
📥 Download full Lighthouse report Budget Compliance: Performance budgets enforced via |
validate-evening-analysis.tsto validate evening analysis articles against defined editorial standards, ensuring comprehensive political intelligence assessments.feat: create Workflow State Coordinator for multi-workflow synchronization
workflow-state-coordinator.tsto manage coordination between multiple news generation workflows, preventing duplicate articles and optimizing resource usage.chore: add TypeScript configuration for scripts
tsconfig.scripts.jsonto define TypeScript compiler options for scripts, ensuring strict type checking and proper module resolution.