[security] fix: guard rss transcript fetches #239
Codex review: needs maintainer review before merge. Reviewed June 5, 2026, 8:55 PM ET / 00:55 UTC.
Summary
Reproducibility: yes, from source inspection: current main fetches feed-controlled RSS transcript URLs directly with automatic redirects, and the PR's tests and runtime proof exercise the blocked loopback/private and allowed public paths. I did not run the exploit locally because this review is read-only.
Review metrics: 2 noteworthy metrics.
Merge readiness
Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.
Rank-up moves:
Risk before merge
Maintainer options:
Next step before merge
Security
Review details
Best possible solution: Land the scoped RSS transcript guard once maintainers accept the fail-closed compatibility impact and either accept the duplicated guard as a short-term boundary or request a shared helper first.
Do we have a high-confidence way to reproduce the issue? Yes, from source inspection: current main fetches feed-controlled RSS transcript URLs directly with automatic redirects, and the PR's tests and runtime proof exercise the blocked loopback/private and allowed public paths. I did not run the exploit locally because this review is read-only.
Is this the best way to solve the issue? Yes, the PR is a narrow and maintainable security hardening path for the core RSS transcript boundary, but maintainers should explicitly accept the private-network compatibility break and duplicated guard logic before merge.
AGENTS.md: found and applied where relevant.
Codex review notes: model gpt-5.5, reasoning high; reviewed against f40e26b1330e.
Label changes:
Label justifications:
Evidence reviewed
What I checked:
Likely related people:
What the crustacean ranks mean
Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.
How this review workflow works
@clawsweeper re-review I updated the PR body with follow-up proof for the P1 items on head
Focused local validation rerun after rehydrating this PR:
pnpm exec tsx /tmp/summarize-rss-transcript-runtime-proof.ts
pnpm -s test tests/security.rss-transcript-ssrf.test.ts
pnpm -s test tests/transcript.podcast-provider.podcast-transcript.test.ts
git diff --check
All of the above completed successfully locally. The local machine is still Node v22.22.2 while the package declares Node
🦞🧹 I asked ClawSweeper to review this item again. Re-review progress:
Summary
This PR hardens RSS podcast transcript fetching so feed-controlled <podcast:transcript url="..."> values cannot be used to fetch loopback, link-local, private-network, or redirected local targets from the host running Summarize.
The patch keeps the existing podcast transcript feature, but moves the transcript URL across an explicit network safety boundary before fetching:
undici fetch resolution to the already validated DNS answers to reduce DNS rebinding exposure
Security issues covered
<podcast:transcript url="..."> values in feed XML.
Before this PR
fetch with automatic redirect following.
After this PR
http: or https:.
undici dispatcher pinned to the validated address set for hostname targets.
Why this matters
Podcast/RSS feed XML can be controlled by the feed publisher. If a user or daemon summarizes an attacker-controlled feed, the feed can direct the host to fetch transcript content from services that are not reachable from the public internet, such as loopback admin interfaces, RFC1918 services, or link-local metadata endpoints.
That creates an SSRF boundary crossing from untrusted feed metadata into host-side network access. Depending on deployment, this can expose internal service responses or induce requests to internal infrastructure.
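To make the target classes above concrete, here is a minimal TypeScript sketch of an IPv4 classifier for the loopback, RFC1918, and link-local ranges. The helper names (`ipv4ToInt`, `isBlockedIpv4`) and the exact range list are illustrative assumptions, not the PR's implementation; a real guard would also need IPv6 and hostname handling.

```typescript
// Illustrative sketch only: names and range list are assumptions, not the PR's code.
// Each entry is [network base, prefix length].
const BLOCKED_V4_RANGES: Array<[string, number]> = [
  ["0.0.0.0", 8],      // "this network"
  ["10.0.0.0", 8],     // RFC1918 private
  ["127.0.0.0", 8],    // loopback
  ["169.254.0.0", 16], // link-local, including cloud metadata endpoints
  ["172.16.0.0", 12],  // RFC1918 private
  ["192.168.0.0", 16], // RFC1918 private
];

// Parse a dotted-quad IPv4 string into an unsigned 32-bit number, or null if malformed.
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split(".");
  if (parts.length !== 4) return null;
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part)) return null;
    const octet = Number(part);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

// True when the address falls in a blocked range. Unparseable input is
// treated as blocked so the guard fails closed.
function isBlockedIpv4(ip: string): boolean {
  const value = ipv4ToInt(ip);
  if (value === null) return true;
  return BLOCKED_V4_RANGES.some(([base, bits]) => {
    const baseValue = ipv4ToInt(base);
    if (baseValue === null) return true;
    const blockSize = 2 ** (32 - bits);
    return Math.floor(value / blockSize) === Math.floor(baseValue / blockSize);
  });
}
```

With this sketch, `isBlockedIpv4("169.254.169.254")` and `isBlockedIpv4("127.0.0.1")` return `true`, while a public address such as `93.184.216.34` returns `false`.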
Runtime behavior proof
A real feed/runtime proof was run against the patched RSS transcript path (tryFetchTranscriptFromFeedXml) with a synthetic RSS feed containing <podcast:transcript> entries. The proof starts a loopback HTTP service for the blocked case and verifies the service is not contacted, then verifies a public transcript URL still succeeds.
Command run:
pnpm exec tsx /tmp/summarize-rss-transcript-runtime-proof.ts
Redacted output:
This gives after-fix behavior proof for both sides of the boundary: local/private transcript URLs are rejected before request dispatch (local_service_hits: 0), while a normal public HTTP(S) transcript URL still returns transcript text.
Guard boundary note
This PR keeps the new RSS transcript guard in packages/core/src/content/transcript/providers/podcast/rss-transcript.ts separate from the existing daemon URL fetch guard in src/daemon/url-fetch-guard.ts. That separation is intentional for this patch because the trust boundary here is the core podcast transcript provider: untrusted feed metadata crosses directly into core transcript fetching and must be protected even when the daemon layer is not involved.
The daemon guard remains the boundary for daemon-owned URL fetches. Maintainers may still choose to extract a shared helper later to reduce long-term drift, but this PR does not assume the daemon guard automatically protects core transcript fetches.
Attack flow
<podcast:transcript> entry whose url points to a local/private target, or to a public URL that redirects to one.
Affected code
packages/core/src/content/transcript/providers/podcast/rss-transcript.ts
<podcast:transcript> extraction and transcript fetch path.
Root cause
CVSS assessment
AV:N/AC:L/PR:N/UI:R/S:U/C:H/I:H/A:N
The exact score depends on whether maintainers treat arbitrary podcast/RSS feeds as untrusted remote input in the affected deployment. The key security property is that untrusted feed metadata could cause host-side requests to internal/local network destinations.
Safe reproduction steps
A bounded regression test is included instead of probing real local services:
<podcast:transcript url="http://127.0.0.1:8080/admin/transcript.txt" />.
tryFetchTranscriptFromFeedXml with a mocked fetch implementation.
null, the mocked fetch implementation is not called, and the rejection note mentions a blocked local network target.
Additional tests cover DNS answers that resolve to private/link-local addresses, redirect revalidation, and pinned dispatcher use for public hostname targets.
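The shape of such a regression check can be sketched as follows. This is a hypothetical stand-in, not the PR's test or provider: `fetchFirstSafeTranscript`, `extractTranscriptUrls`, and `isLocalHost` are invented for illustration, the real `tryFetchTranscriptFromFeedXml` signature is not shown here, and the host check covers only literal hosts (no DNS).

```typescript
// Hypothetical stand-in for the guarded provider; all names here are assumptions.
type FetchText = (url: string) => Promise<string>;
type GuardResult = { text: string | null; notes: string[] };

// Pull url="..." attributes out of <podcast:transcript> tags (a regex is enough for a sketch).
function extractTranscriptUrls(feedXml: string): string[] {
  const urls: string[] = [];
  const tag = /<podcast:transcript\b[^>]*\burl="([^"]*)"[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = tag.exec(feedXml)) !== null) {
    urls.push(match[1]);
  }
  return urls;
}

// Literal-host check only: localhost, ::1, and loopback/RFC1918/link-local IPv4.
function isLocalHost(hostname: string): boolean {
  if (hostname === "localhost" || hostname === "[::1]") return true;
  const quad = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(hostname);
  if (!quad) return false;
  const a = Number(quad[1]);
  const b = Number(quad[2]);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 169 && b === 254) ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168)
  );
}

// Return the first transcript whose URL passes the guard; never call fetchText for blocked URLs.
async function fetchFirstSafeTranscript(feedXml: string, fetchText: FetchText): Promise<GuardResult> {
  const notes: string[] = [];
  for (const raw of extractTranscriptUrls(feedXml)) {
    let url: URL;
    try {
      url = new URL(raw);
    } catch {
      notes.push("skipped unparseable transcript url");
      continue;
    }
    if (url.protocol !== "http:" && url.protocol !== "https:") {
      notes.push("blocked non-HTTP(S) transcript url");
      continue;
    }
    if (isLocalHost(url.hostname)) {
      notes.push(`blocked local network target: ${url.hostname}`);
      continue;
    }
    return { text: await fetchText(url.href), notes };
  }
  return { text: null, notes };
}
```

A test built on this would pass a counting mock as `fetchText`, feed in the loopback URL, and assert that the result text is `null`, the mock's call count is zero, and the first note mentions a blocked local network target.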
Expected vulnerable behavior
On vulnerable code, the loopback/private transcript URL path would be passed to fetch, and automatic redirects could move an initially public transcript URL to a local/private destination without a second validation step.
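The missing second validation step can be illustrated with a manual redirect loop that re-checks every hop before dispatch. This is a sketch under stated assumptions: `fetchWithRedirectRevalidation`, `FetchLike`, and `TargetCheck` are hypothetical names, the fetch implementation and target check are injected, and the hop limit of 5 is arbitrary rather than taken from the PR.

```typescript
// Sketch only: the names below are hypothetical, not the PR's API.
type MinimalResponse = {
  status: number;
  headers: { get(name: string): string | null };
  text(): Promise<string>;
};
type FetchLike = (url: string, init: { redirect: "manual" }) => Promise<MinimalResponse>;
type TargetCheck = (url: URL) => Promise<boolean>; // resolves true when the target is allowed

async function fetchWithRedirectRevalidation(
  startUrl: string,
  fetchImpl: FetchLike,
  isAllowedTarget: TargetCheck,
  maxRedirects = 5,
): Promise<string | null> {
  let current: URL;
  try {
    current = new URL(startUrl);
  } catch {
    return null;
  }
  for (let hop = 0; hop <= maxRedirects; hop += 1) {
    // Validate every hop, not just the first URL, before any request is sent.
    if (current.protocol !== "http:" && current.protocol !== "https:") return null;
    if (!(await isAllowedTarget(current))) return null;

    // redirect: "manual" keeps the fetch layer from following Location on its own.
    const response = await fetchImpl(current.href, { redirect: "manual" });
    if (response.status >= 300 && response.status < 400) {
      const location = response.headers.get("location");
      if (!location) return null;
      try {
        current = new URL(location, current); // resolve relative redirects
      } catch {
        return null;
      }
      continue;
    }
    return response.status >= 200 && response.status < 300 ? await response.text() : null;
  }
  return null; // too many redirects
}
```

The design choice here is that a redirect is treated as a new untrusted URL: it goes through the same scheme and target checks as the original, so a public URL that answers with a `Location` pointing at loopback is rejected before the second request is made.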
Changes in this PR
undici dispatcher for native fetch so resolved hostnames use the validated address set.
Files changed
packages/core/src/content/transcript/providers/podcast/rss-transcript.ts
tests/security.rss-transcript-ssrf.test.ts
packages/core/package.json
undici for pinned dispatcher support.
pnpm-lock.yaml
Maintainer impact
Fix rationale
The safe boundary is at the point where feed-controlled metadata becomes a host-side network request. Checking only the original URL is not enough because DNS resolution and redirects can change the eventual target. This PR validates the original target, validates DNS answers, pins the native fetch to those answers where possible, and repeats validation across redirects.
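The "validate DNS answers, then pin" step described above might look roughly like the following. This is a minimal sketch with an injected resolver and address check: `resolveValidatedAddresses` and `makePinnedLookup` are hypothetical names, and wiring the pinned lookup into an `undici` dispatcher is deliberately left out because the PR's exact wiring is not shown here.

```typescript
// Sketch only: Resolver, AddressCheck, and the helper names are assumptions.
type Resolver = (hostname: string) => Promise<string[]>;
type AddressCheck = (address: string) => boolean;

// Resolve once and reject the whole hostname if any answer is blocked (fail closed).
// The validated set is returned so the later connection can be pinned to it.
async function resolveValidatedAddresses(
  hostname: string,
  resolve: Resolver,
  isBlockedAddress: AddressCheck,
): Promise<string[] | null> {
  const answers = await resolve(hostname);
  if (answers.length === 0) return null;
  if (answers.some(isBlockedAddress)) return null;
  return [...answers];
}

// Build a lookup that ignores live DNS and only hands back validated addresses,
// so a later (rebinding) DNS answer cannot change the connection target.
function makePinnedLookup(validated: readonly string[]): (hostname: string) => string {
  if (validated.length === 0) throw new Error("no validated addresses to pin");
  let next = 0;
  return () => {
    const address = validated[next % validated.length];
    next += 1;
    return address;
  };
}
```

Rejecting the hostname when even one answer is private is stricter than filtering out the bad answers, but it avoids connecting to a name that is actively mixing public and internal records, which is the pattern a rebinding attempt tends to produce.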
Type of change
Test plan
Local validation completed:
pnpm install
pnpm -s test tests/security.rss-transcript-ssrf.test.ts
pnpm -s test tests/transcript.podcast-provider.podcast-transcript.test.ts
git diff --check
pnpm -s format:check
pnpm -s lint
pnpm -s typecheck
pnpm -s build
Local note: install/build contexts emitted the existing engine warning because this machine is using Node v22.22.2 while the package declares Node >=24. The commands above completed successfully on this local environment.
Disclosure notes
This PR is intentionally limited to RSS podcast transcript URL fetching. It does not claim to audit every outbound network request in the repository. The patch focuses on the feed-controlled transcript URL path where untrusted RSS metadata can become host-side network access.