Support BOM in xml feeds #540

Merged
probberechts merged 6 commits into PySport:master from Alek050:bug/support_BOM_in_xml_feeds
Mar 10, 2026

@Alek050
Contributor

@Alek050 Alek050 commented Feb 24, 2026

Suggested fix for BOMs in XML feeds.

closes #539

@probberechts
Contributor

Thanks for looking into this! Would it be possible to use the utf-8-sig encoding instead? If that works, I’d prefer it over the current workaround, which feels a bit hacky.

@Alek050
Contributor Author

Alek050 commented Feb 24, 2026

@probberechts I tried that first, but it raised the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 0: unexpected end of data
decoding with 'utf-8-sig' codec failed

@probberechts
Contributor

That's odd. It seems to work fine for me. Here is a minimal example:

import io
bom = b"\xef\xbb\xbf"
xml_content = bom + b'<?xml version="1.0"?><root/>'
feed = io.BytesIO(xml_content)
first_char = feed.read(4).decode("utf-8-sig")
assert first_char == '<'

Am I missing something?

@Alek050
Contributor Author

Alek050 commented Feb 24, 2026

Hahaha no, my bad.

I stupidly tested feed.read(1).decode("utf-8-sig") instead of feed.read(4).decode("utf-8-sig"). I changed the code.
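
For readers following along, here is a minimal sketch of why the one-byte read fails while the four-byte read works. It reuses the feed from the example above; `first_text` is a hypothetical helper introduced only for illustration and is not taken from the PR.

```python
import io

BOM = b"\xef\xbb\xbf"
xml_content = BOM + b'<?xml version="1.0"?><root/>'


def first_text(feed: io.BytesIO, n: int) -> str:
    """Hypothetical helper: decode the first n bytes of feed as utf-8-sig."""
    feed.seek(0)
    return feed.read(n).decode("utf-8-sig")


feed = io.BytesIO(xml_content)

# One byte is only the first third of the UTF-8 BOM. The codec cannot
# recognise a partial BOM, so it treats 0xef as a truncated character.
try:
    first_text(feed, 1)
    one_byte_error = None
except UnicodeDecodeError as exc:
    one_byte_error = exc

# Four bytes cover the complete 3-byte BOM plus the first XML character,
# so the BOM is stripped and only "<" remains.
four_bytes = first_text(feed, 4)
```

Note that on a feed without a BOM, the same four-byte read would decode to four characters rather than one, so a caller would probably want to inspect only the first character of the result.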

The only assumption is that the 4th byte is now the first byte of the XML string. This holds for UTF-8, where the BOM is 3 bytes long; for UTF-16 and UTF-32, however, the BOM is 2 and 4 bytes long, respectively. But I guess we can keep it like this, because .decode("utf-8-sig") will not work on the other encodings anyway (?).
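
The BOM lengths mentioned above can be checked against the constants in the standard-library `codecs` module. The sketch below is not part of this PR: `KNOWN_BOMS` and `detect_bom` are hypothetical names, and it only illustrates what an encoding-aware check might look like if other encodings ever needed support.

```python
import codecs
from typing import Optional, Tuple

# Standard-library BOM constants paired with a codec that consumes the BOM.
# UTF-32 comes first because the UTF-32 LE BOM (ff fe 00 00) starts with
# the UTF-16 LE BOM (ff fe).
KNOWN_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def detect_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Hypothetical helper: return (codec name, BOM length), or (None, 0)."""
    for bom, codec in KNOWN_BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    return None, 0


xml = '<?xml version="1.0"?><root/>'
utf8_feed = codecs.BOM_UTF8 + xml.encode("utf-8")
utf16_feed = codecs.BOM_UTF16_LE + xml.encode("utf-16-le")
utf32_feed = codecs.BOM_UTF32_LE + xml.encode("utf-32-le")

for feed in (utf8_feed, utf16_feed, utf32_feed):
    codec, bom_length = detect_bom(feed)
    # Each of these codecs strips its own BOM while decoding.
    assert feed.decode(codec) == xml

# utf-8-sig cannot decode the UTF-16 feed: 0xff is not a valid UTF-8 start byte.
try:
    utf16_feed.decode("utf-8-sig")
    utf16_as_utf8_sig_failed = False
except UnicodeDecodeError:
    utf16_as_utf8_sig_failed = True
```

This is consistent with the guess in the comment: decoding with utf-8-sig raises on UTF-16 input, so the four-byte read only has to be correct for UTF-8 feeds.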

@probberechts probberechts added the bug ("Something isn't working") label Mar 10, 2026
@probberechts probberechts changed the title Bug/support BOM in xml feeds Support BOM in xml feeds Mar 10, 2026
@probberechts probberechts modified the milestone: 3.19.0 Mar 10, 2026
@probberechts probberechts merged commit ea54640 into PySport:master Mar 10, 2026
20 checks passed

Development

Successfully merging this pull request may close these issues.

Support Byte Order Marks in XML feeds

2 participants