Conversation
| # log stderr | ||
| stderr = proc.streams.stderr # bytes in py3 | ||
| stderr = stderr.decode() if isinstance(stderr, bytes) else stderr | ||
| emit("stderr for '{}' @ {}\n{}\n".format(ua.name, ua.srcaddr, stderr)) |
There was a problem hiding this comment.
It feels wrong.
stderr was also bytes in py2 (non-unicode-str; there is no future import unicode_literals). But once you decode it, it would be unicode, and conflict with the emit(bytestring.format())
$ cat 1.py
from subprocess import Popen, PIPE
p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE)
out, err = p.communicate()
print(repr(err))
stderr = err
print(repr(stderr.decode()))
$ python 1.py
'\xc3\xa1'
Traceback (most recent call last):
File "1.py", line 6, in <module>
print(repr(stderr.decode()))
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
$ python3 1.py
b'\xc3\xa1'
'á'
But this should work:
stderr = stderr.decode() if not isinstance(stderr, str) else stderr
^-- py2: str==bytes=>noop, py3: str!=bytes=>decode
And the bytestring.format() doesn't choke either.
Howver. Always decoding a bytestring as utf-8 is not so safe. Better do decode with 'replace':
>>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x81 in position 1: invalid start byte
>>> bytes([0x41, 0x81, 0xc3, 0xa1]).decode('utf-8', 'replace')
'A�á'
There was a problem hiding this comment.
This is because the default encoding changed from ascii to utf-8 with python3.
To go back to your code snippet above, this works on both py2 and py3:
from subprocess import Popen, PIPE
p = Popen(r"/usr/bin/printf '\xc3\xa1' >&2", shell=True, stderr=PIPE, stdout=PIPE)
out, err = p.communicate()
print(repr(err))
stderr = err
print(repr(stderr.decode('utf-8', 'replace')))When run:
$ python2 ./foo.py
'\xc3\xa1'
u'\xe1'
$ python3 ./foo.py
b'\xc3\xa1'
'á'
So you don't even need the isinstance check.
There was a problem hiding this comment.
Errata: its utf-8 since python3.5.
And probably more future proof too since python 3.7 is going to lean on utf-8 more:
I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system 😉
There was a problem hiding this comment.
So you don't even need the isinstance check.
Well, we still do, because:
print('{}'.format(stderr.decode('utf-8', 'replace')))
Traceback (most recent call last):
File "1.py", line 6, in <module>
print('{}'.format(stderr.decode('utf-8', 'replace')))
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe1' in position 0: ordinal not in range(128)
The emit() gets a bytestring in py2, which doesn't do unicode.
I mean, its my humble opinion that if your system isn't set up to support utf8 these days, you're running a misconfigured system
Absolutely, but we don't want to error out because an application sends invalid utf-8.
There was a problem hiding this comment.
That's because you can't embed a unicode string into a ascii format string (well, implicit encoding happens). Adding a u to the format string also makes it work on both:
print(u'{}'.format(stderr.decode('utf-8', 'replace')))There was a problem hiding this comment.
My point exactly :)
Although I did not dive into emit() to see if it takes an unicodestring in py2.
There was a problem hiding this comment.
Yeah, so I think we're on the same page.
Just personally I think I'd rather see the underlying format string adjusted than a if isinstance(...) expression.
On py3 we get
bytesback from the stderr output. Decode to a nativestrbefore logging to keep a readable output.