Metadata not in head but in the body #37

@ThePavolC

Hi,

I am having an issue with getting the metadata using opengraph_py3, urllib and bs4.

In the parser method you are only checking the <head>, but it looks like <meta> tags are sometimes in the <body>. Any ideas how I can fix this? Is it due to the User-Agent?
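One way to test the User-Agent hypothesis is to send the request with an explicit header and compare the returned HTML. The sketch below only builds the request; `build_request` and the `BROWSER_UA` value are hypothetical names chosen for illustration, and whether the server actually varies its markup by agent is an assumption to be checked, not something confirmed here.

```python
import urllib.request


def build_request(url, user_agent):
    """Return a Request that sends the given User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


# Hypothetical browser-like value; substitute whichever agent you want to compare.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"

req = build_request("https://youtu.be/DQwU_kU4pUg", BROWSER_UA)

# The network call is left commented out so the sketch runs offline:
# html = urllib.request.urlopen(req).read()
```

Fetching the page once with this request and once with the default agent, then running the same `<head>` lookup on both responses, would show whether the agent makes a difference.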

  • urllib3 1.23
  • opengraph-py3 0.71
  • beautifulsoup4 4.6.0

import re
import urllib.request  # a bare `import urllib` does not reliably load the `request` submodule

import opengraph_py3 as opengraph
from bs4 import BeautifulSoup

# Note: FancyURLopener has been deprecated since Python 3.3
raw = urllib.request.FancyURLopener().open("https://youtu.be/DQwU_kU4pUg")
html = raw.read()
soap = BeautifulSoup(html, 'html.parser')

# This is the same code as in `parser`: only <head> is searched
soap.html.head.findAll(property=re.compile(r'^og'))
# []

# The same search on <body> finds the tags
soap.html.body.findAll(property=re.compile(r'^og'))
# [<meta content="YouTube" property="og:site_na....]
