This repository was archived by the owner on Dec 31, 2025. It is now read-only.

fix: parsing of multiline MIME encoded headers #718

Open
sjinks wants to merge 2 commits into alephdata:main from sjinks:fix-multiline-headers

sjinks commented May 6, 2025

The email parser incorrectly parses multiline MIME-encoded headers. For example, given this header:

From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=
 =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=
 =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=
 =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=
 =?UTF-8?Q?=D0=B2?= <redacted@example.com>

It is parsed as

Отдел по ра боте с прохо ждением зак онопроекто в <redacted@example.com>

instead of

Отдел по работе с прохождением законопроектов <redacted@example.com>

That is, the folded lines of the header are joined with a space, and only then is the result decoded. The expected behavior is to concatenate the encoded words and discard the continuation whitespace between them (RFC 2047 says that whitespace separating adjacent encoded-words is ignored when displaying). This is what Thunderbird and other mail clients do:

[Screenshot: Screenshot_20250506_123048]

This behavior can be verified with, for example, https://dogmamix.com/MimeHeadersDecoder/

As a side effect, the current behavior creates many unnecessary aliases of the same name, like this:

[Screenshot: Screenshot_20250506_123325]

Unfortunately, the issue is in Python's internals, in the header parsing used by email.policy.default. The compat32 policy, however, parses the headers correctly: make_header(decode_header(value)) produces the expected result.
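
The compat32-style decoding can be checked in isolation. The snippet below is a minimal sketch that feeds the From value from this description through decode_header and make_header. The variable names are illustrative only.

```python
from email.header import decode_header, make_header

# The folded header value from the example above (without the "From: " prefix).
raw = (
    "=?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5_=D1=81_=D0=BF=D1=80=D0=BE=D1=85=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B6=D0=B4=D0=B5=D0=BD=D0=B8=D0=B5=D0=BC_=D0=B7=D0=B0=D0=BA?=\n"
    " =?UTF-8?Q?=D0=BE=D0=BD=D0=BE=D0=BF=D1=80=D0=BE=D0=B5=D0=BA=D1=82=D0=BE?=\n"
    " =?UTF-8?Q?=D0=B2?= <redacted@example.com>"
)

# decode_header drops the whitespace between adjacent encoded words and
# merges their bytes before decoding; make_header turns the parts into a Header.
decoded = str(make_header(decode_header(raw)))
print(decoded)
# Отдел по работе с прохождением законопроектов <redacted@example.com>
```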

This PR attempts to fix the described issue by using a custom policy derived from email.policy.default that implements header_fetch_parse the way email.policy.compat32 does, while maintaining compatibility with email.policy.default.
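
As a rough illustration of the idea (not the code in this PR), a policy along these lines could look like the sketch below. The class name CompatHeaderPolicy and the shortened sample message are hypothetical. The sketch assumes it is acceptable to decode the encoded words first and then hand the decoded text to the default policy's parsing, which may misparse display names that decode to special characters such as commas or angle brackets.

```python
from email import message_from_string
from email.errors import HeaderParseError
from email.header import decode_header, make_header
from email.policy import EmailPolicy


class CompatHeaderPolicy(EmailPolicy):
    """Hypothetical policy: decode encoded words the compat32 way on fetch."""

    def header_fetch_parse(self, name, value):
        # Already-parsed header objects and values without encoded words
        # go through the default implementation unchanged.
        if hasattr(value, "name") or "=?" not in value:
            return super().header_fetch_parse(name, value)
        try:
            decoded = str(make_header(decode_header(value)))
        except (HeaderParseError, UnicodeError, LookupError):
            # Malformed encoded words or unknown charsets: fall back.
            return super().header_fetch_parse(name, value)
        # Let the default policy build its usual header object
        # from the already-decoded text.
        return super().header_fetch_parse(name, decoded)


# Shortened variant of the header from the description (assumption).
source = (
    "From: =?UTF-8?Q?=D0=9E=D1=82=D0=B4=D0=B5=D0=BB_=D0=BF=D0=BE_=D1=80=D0=B0?=\n"
    " =?UTF-8?Q?=D0=B1=D0=BE=D1=82=D0=B5?= <redacted@example.com>\n"
    "\n"
    "body\n"
)
msg = message_from_string(source, policy=CompatHeaderPolicy())
sender = msg["From"]
print(sender)
```

Because the result still comes from the default header factory, callers that rely on header objects (for example, the addresses attribute of address headers) should keep working, but that is worth verifying against real data.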

However, I am not a Python developer; there could be a cleaner (or better) way to do this. But it works :-)
