Decoding mail headers in Python

A memo on decoding email headers, using the sender (From) address as an example.

from email.header import decode_header

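# message is assumed to be an already-parsed email message object

decoded = decode_header(message['from'])

# decode only the first (bytes, charset) pair that was returned

outstr = decoded[0][0].decode(decoded[0][1])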

# sample: decoded[0][0]

# b'LinkedIn\xe3\x82\xb3\xe3\x83\xb3\xe3\x82\xbf\xe3\x82\xaf\xe3\x83\x88'

# sample: decoded[0][1]

# utf-8

# sample: outstr

# LinkedInコンタクト
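The snippet above only decodes the first element that decode_header returns. As a rough sketch of handling every part, including parts where the charset is None, something like the following could work. The helper name decode_header_value and the sample address are made up for illustration, and falling back to ASCII for parts without a charset is my own assumption.

```python
import base64
from email.header import decode_header


def decode_header_value(raw):
    """Decode every part of a header and join them (hypothetical helper)."""
    pieces = []
    for value, charset in decode_header(raw):
        if isinstance(value, bytes):
            # charset is None for parts that were not encoded; assume ASCII there
            pieces.append(value.decode(charset or "ascii", errors="replace"))
        else:
            # a header without any encoded words comes back as a plain str
            pieces.append(value)
    return "".join(pieces)


# Build an encoded-word for the sample name (the address is invented)
encoded = base64.b64encode("LinkedInコンタクト".encode("utf-8")).decode("ascii")
sample = f"=?UTF-8?B?{encoded}?= <contact@example.com>"

result = decode_header_value(sample)
```

Here result should hold the display name followed by the address part. Note that an unknown charset name would still raise LookupError, so this does not remove the need for exception handling. The standard library also offers email.header.make_header for joining decoded parts.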

Here are two ways to print the result.

The second one, which uses sys.stdout.encoding, looks like the more standard approach, but it raised exceptions during debugging, so for now I am using the first one, which specifies UTF-8 explicitly, to see how things go.

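import sys

# 1) round trip through UTF-8 (the one used for now)

print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))

# 2) round trip through the console encoding

print(outstr.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))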
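To see what the two print variants do, here is a small sketch of the same encode/decode round trip wrapped in a function. The name to_printable is hypothetical. The UTF-8 fallback for the case where sys.stdout reports no encoding (for example when stdout has been replaced by another object) is my assumption, and I have not confirmed that this was the cause of the exceptions mentioned above.

```python
import sys


def to_printable(text, encoding=None):
    """Round-trip text through an encoding, replacing what cannot be encoded."""
    # Assumption: fall back to UTF-8 if stdout reports no encoding
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(target, errors="replace").decode(target)


# With an ASCII target, each unencodable character becomes '?'
ascii_view = to_printable("LinkedInコンタクト", "ascii")

# With a UTF-8 target, the text comes back unchanged
utf8_view = to_printable("LinkedInコンタクト", "utf-8")

print(ascii_view)
```

With errors='replace', the round trip trades unencodable characters for '?' instead of raising UnicodeEncodeError, which is why the result is safe to hand to print for that target encoding.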

Here is a summary as a function.

The argument i is an optional index number used only for display.

Exception handling is split between the decoding step and the output step because decoding raises exceptions fairly often. If both were in a single try block, a decoding failure would stop processing before the output stage was reached.

def print_decoded_item(i, instr):
    # fall back to the raw header text if decoding fails
    outstr = instr
    try:
        decoded = decode_header(instr)
        outstr = decoded[0][0].decode(decoded[0][1])
    except Exception:
        pass

    # output is guarded separately so a decoding failure never skips it
    try:
        print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))
    except Exception:
        pass
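To check the fallback behaviour without printing, the decoding half of the function can be pulled out into something that returns a string. This is only a sketch: decode_or_raw is a hypothetical name, and the header using the charset x-unknown-charset is made up to force a decoding error.

```python
import base64
from email.header import decode_header


def decode_or_raw(instr):
    """Return the decoded first part of a header, or the raw text on failure."""
    try:
        value, charset = decode_header(instr)[0]
        return value.decode(charset)
    except Exception:
        # e.g. an unknown charset, or a plain str part that has no .decode()
        return instr


encoded = base64.b64encode("LinkedInコンタクト".encode("utf-8")).decode("ascii")

good = decode_or_raw(f"=?UTF-8?B?{encoded}?=")        # decodes normally
bad = decode_or_raw("=?x-unknown-charset?B?QUJD?=")   # unknown charset, raw text kept
plain = decode_or_raw("plain@example.com")            # no encoded word, raw text kept
```

In the last two cases the exception is swallowed and the original header text is returned, so the output step still has something to print.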

My Tech Life

Memo by a Japanese Software Developer in his late 50s.
