Merged
1 change: 1 addition & 0 deletions changelog.d/19231.bugfix
@@ -0,0 +1 @@
Fix a bug where Mastodon posts (and possibly other embeds) have the wrong description for URL previews.
14 changes: 10 additions & 4 deletions synapse/media/url_previewer.py
@@ -328,10 +328,16 @@ async def _do_preview(self, url: str, user: UserID, ts: int) -> bytes:
# response failed or is incomplete.
og_from_html = parse_html_to_open_graph(tree)

-# Compile the Open Graph response by using the scraped
-# information from the HTML and overlaying any information
-# from the oEmbed response.
-og = {**og_from_html, **og_from_oembed}
+# Compile an Open Graph response by combining the oEmbed response
+# and the information from the HTML, with information in the HTML
+# preferred.
+#
+# The ordering here is intentional: certain websites (especially
+# SPA JavaScript-based ones) including Mastodon and YouTube provide
+# almost complete OpenGraph descriptions but only stubs for oEmbed,
+# with further oEmbed information being populated with JavaScript,
+# that Synapse won't execute.
+og = og_from_oembed | og_from_html
Contributor

Maybe at this point it would be good to add a comment explaining why this order matters. Otherwise I'm half worried someone will come along and flip it back the other way in a couple of months, when another site turns out to work better the other way around.

Contributor Author

Done. It now says why the order matters, naming not only the type of site but also the two examples shown in the PR.

Just a note on the comment: you have it the wrong way around. This change actually prefers the OpenGraph info from the HTML over the info from oEmbed, and that's why it helps. SPA sites like Mastodon provide only stub information in their oEmbed response, and it's the OpenGraph data in the HTML that has the useful information.


await self._precache_image_url(user, media_info, og)
else:
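
To make the precedence discussed in the review thread concrete, here is a minimal sketch of the two merge orders. The dictionaries below are hypothetical sample data, not real responses from Mastodon or YouTube, and the variable names `old` and `new` are introduced only for illustration. In both forms the right-hand mapping wins on key collisions; the `|` operator on dicts requires Python 3.9 or later.

```python
# Hypothetical sample data (assumption): a stub oEmbed result and a
# fuller Open Graph result scraped from the HTML.
og_from_oembed = {
    "og:title": "Example post",
    "og:description": "stub",
}
og_from_html = {
    "og:description": "The full text of the post.",
    "og:image": "https://example.com/image.png",
}

# Previous ordering: oEmbed values overwrite HTML values on key collisions.
old = {**og_from_html, **og_from_oembed}

# New ordering: HTML values overwrite oEmbed values on key collisions.
new = og_from_oembed | og_from_html

print(old["og:description"])
print(new["og:description"])
```

Both results contain the same set of keys; only the value chosen for a key present in both sources differs, which is why the ordering alone changes the preview description.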