My Tech Life

Memo by a Japanese Software Developer in his late 50s.

Decoding mail headers in Python

A memo on decoding email headers in Python, using the sender (From) address as an example.

 

from email.header import decode_header

 

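# message is assumed to be an already parsed email message object.
# decode_header returns a list of (fragment, charset) pairs.
decoded = decode_header(message['from'])

# Decode only the first fragment; this assumes it is bytes with a known charset.
outstr = decoded[0][0].decode(decoded[0][1])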

 

# sample: decoded[0][0]

# b'LinkedIn\xe3\x82\xb3\xe3\x83\xb3\xe3\x82\xbf\xe3\x82\xaf\xe3\x83\x88'

# sample: decoded[0][1]

# utf-8

# sample: outstr

# LinkedInコンタクト
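The snippet above decodes only the first fragment, and it assumes that fragment is bytes with a known charset. decode_header can also return a str fragment with charset None for unencoded text, and a From header usually has a second fragment holding the address. Below is a minimal sketch that joins all fragments. The function name decode_all_fragments and the fallback to ASCII are my own assumptions, not part of the email package.

```python
from email.header import decode_header


def decode_all_fragments(raw_value):
    """Join every fragment returned by decode_header into one str.

    Hypothetical helper for illustration; falls back to ASCII with
    replacement characters when the charset is missing or unknown.
    """
    parts = []
    for fragment, charset in decode_header(raw_value):
        if isinstance(fragment, bytes):
            try:
                # charset is None for unencoded parts of a mixed header
                parts.append(fragment.decode(charset or "ascii", errors="replace"))
            except LookupError:
                # The header named a charset Python does not know
                parts.append(fragment.decode("ascii", errors="replace"))
        else:
            # A header with no encoded words comes back as str already
            parts.append(fragment)
    return "".join(parts)
```

Calling decode_all_fragments(message['from']) should then give the display name followed by the address, rather than the display name alone.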

 

Here are two ways to print the result.

The second one, which uses sys.stdout.encoding, seems the more conventional choice, but it raised exceptions during debugging. For now I am using the first one, which specifies UTF-8 explicitly, and watching how it behaves.

 

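# Variant 1: force UTF-8 (i is a display-only index)
print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))

# Variant 2: use the console encoding (needs "import sys")
print(outstr.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))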
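I have not pinned down why the sys.stdout.encoding version failed. One possible cause, and this is only a guess, is that sys.stdout.encoding can be None when stdout has been replaced or redirected, which makes encode() raise a TypeError. A small sketch that falls back to UTF-8 in that case is shown here; to_printable is a hypothetical helper name.

```python
import sys


def to_printable(text, encoding=None):
    """Return text with unencodable characters replaced by '?'.

    Hypothetical helper: uses the given encoding, else the console
    encoding, else UTF-8 when the console encoding is unavailable.
    """
    target = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    # Round-trip through the target encoding so print() cannot fail on it
    return text.encode(target, errors="replace").decode(target)
```

With this, print(i, to_printable(outstr)) covers both variants, and passing an explicit encoding such as 'utf-8' reproduces the first one.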

 

Here is the combined function.

The argument i is an index number used only for display.

Decoding and output have separate exception handlers because decoding fails fairly often. With a single handler, a decoding error would skip the output step entirely.

 

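def print_decoded_item(i, instr):
    # Fall back to the raw header value if decoding fails
    outstr = instr
    try:
        decoded = decode_header(instr)
        # Only the first (fragment, charset) pair is decoded
        outstr = decoded[0][0].decode(decoded[0][1])
    except Exception:
        # Unencoded headers (str fragment, charset None) also land here
        pass

    try:
        # Replace anything that cannot be encoded as UTF-8 before printing
        print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))
    except Exception:
        pass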
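As an alternative to decoding fragments by hand, the standard library also has email.header.make_header, which rebuilds a Header object from the decode_header result. The sketch below applies it to several headers and keeps the same "fall back to the raw value on error" idea as the function above. The name decode_headers and the choice to return a dict are my own assumptions for illustration.

```python
from email.header import decode_header, make_header


def decode_headers(message, names):
    """Return {header name: decoded str, or None if the header is absent}.

    Hypothetical helper: message is assumed to be an object from the
    email package that supports message[name] lookups.
    """
    result = {}
    for name in names:
        raw_value = message[name]
        if raw_value is None:
            # Header not present in this message
            result[name] = None
            continue
        try:
            # make_header joins all fragments, str() yields the unicode text
            result[name] = str(make_header(decode_header(raw_value)))
        except Exception:
            # Keep going with the raw value, as print_decoded_item does
            result[name] = str(raw_value)
    return result
```

For example, decode_headers(message, ['From', 'Subject']) returns both values in one call, and a header that fails to decode does not stop the others from being processed.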