
Check why certain pages were filtered out by the blacklist #24

@jogli5er

Description


The latest counts show that we filter out a large share of the content, either because it has a "wrong" mimetype (we should not even download those pages; see #23) or because the parser finds something that resembles base64. We have to go through a sample of those pages to check whether the parser works correctly or whether there is a bug.
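To triage a sample of filtered pages, it can help to record *why* each page was rejected instead of only counting rejections. The sketch below is hypothetical: `filter_reason`, `ALLOWED_MIMETYPES`, and the 100-character threshold in `BASE64_RUN` are assumptions for illustration, not the crawler's actual parser, whose base64 rule this issue does not describe.

```python
import re
from typing import Optional

# ASSUMPTION: illustrative allow-list, not the project's real mimetype list.
ALLOWED_MIMETYPES = {"text/html", "text/plain"}

# ASSUMPTION: a run of 100+ base64-alphabet characters (optionally padded
# with up to two '=') counts as "resembling base64". The real rule may differ.
BASE64_RUN = re.compile(r"[A-Za-z0-9+/]{100,}={0,2}")


def filter_reason(mimetype: str, text: str) -> Optional[str]:
    """Return why a page would be filtered out, or None if it would be kept."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    base_type = mimetype.split(";", 1)[0].strip().lower()
    if base_type not in ALLOWED_MIMETYPES:
        return f"mimetype: {base_type}"
    match = BASE64_RUN.search(text)
    if match:
        # Include the offset and a short excerpt so a human can judge
        # whether the matched run really is base64.
        return f"base64-like run at offset {match.start()}: {match.group()[:40]}..."
    return None


if __name__ == "__main__":
    print(filter_reason("image/png", ""))  # mimetype: image/png
    print(filter_reason("text/html; charset=utf-8", "<p>Hello world</p>"))  # None
```

Because the pattern only checks the character alphabet, a 128-character hex SHA-512 digest or any other long alphanumeric identifier would also be flagged; excerpts like these are what to look for when deciding whether the parser has a bug.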

Metadata


Labels

bug: Something isn't working

