Decoding Email Headers

A quick memo on how to decode email headers in Python.

The sender (From) header is used as the example.

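from email.header import decode_header

# message: an already parsed email message (e.g. an email.message.Message)
decoded = decode_header(message['from'])

# Decode the first part's bytes using the charset reported for it
outstr = decoded[0][0].decode(decoded[0][1])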

# e.g. decoded[0][0]

# b'LinkedIn\xe3\x82\xb3\xe3\x83\xb3\xe3\x82\xbf\xe3\x82\xaf\xe3\x83\x88'

# e.g. decoded[0][1]

# utf-8

# e.g. outstr

# LinkedInコンタクト
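To make the shape of the return value concrete, here is a minimal sketch that runs without a real mailbox. The sample header is built with `email.header.Header` purely for illustration, and the address `alice@example.com` is a made-up value. It also shows that a plain ASCII header comes back as a `str` with a charset of `None`, which the one-line decode above does not handle.

```python
from email.header import Header, decode_header

# Build an RFC 2047 encoded-word so the sample does not depend on a real message.
encoded = Header("LinkedInコンタクト", "utf-8").encode()

# Encoded header: a list of (bytes, charset) tuples.
encoded_parts = decode_header(encoded)
print(encoded_parts)

# Plain ASCII header: the value comes back as str and the charset is None,
# so calling .decode(None) on it would raise an exception.
plain_parts = decode_header("Alice <alice@example.com>")
print(plain_parts)
```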

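For output, the second of the two print statements below is probably the more general one, but it raised an exception while I was debugging, so for now I am sticking with the first one, which hard-codes UTF-8.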

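# Option 1: hard-code UTF-8 (i is an optional display index)
print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))

# Option 2: use the console's own encoding (requires "import sys")
print(outstr.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))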
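The encode/decode round trip with `errors='replace'` swaps every character the target encoding cannot represent for `?`, so the later `print` has nothing left to choke on. Below is a small sketch of that idea. `to_printable` is a hypothetical helper name, and the fallback to UTF-8 when the stream reports no encoding is my own assumption, not something the original snippet does.

```python
import sys


def to_printable(text, encoding=None):
    """Return text with characters the target encoding cannot represent replaced by '?'."""
    # Assumption: fall back to UTF-8 when no encoding is given or reported.
    enc = encoding or getattr(sys.stdout, "encoding", None) or "utf-8"
    return text.encode(enc, errors="replace").decode(enc)


# Forcing ASCII shows the replacement: each Japanese character becomes '?'.
print(to_printable("LinkedInコンタクト", "ascii"))
```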

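Putting it all together, it looks like this.

The argument i is only an index number for display, so it is optional.

The decoding step and the output step have separate exception handlers because, in practice, the decoding step raises exceptions quite often, and if both shared one handler, execution would never reach the output.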

def print_decoded_item(i, instr):
    # Fall back to the raw header value if decoding fails.
    outstr = instr
    try:
        decoded = decode_header(instr)
        outstr = decoded[0][0].decode(decoded[0][1])
    except Exception:
        pass

    try:
        print(i, outstr.encode('utf-8', errors='replace').decode('utf-8'))
    except Exception:
        pass
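One likely cause of the frequent decode exceptions (a guess, not something verified against the original mailbox) is that `decode_header` returns a `str` with a charset of `None` for unencoded headers, and several parts for mixed ones, while the function above only handles a single encoded part. The sketch below is one way to cover those cases. `decode_header_value` and its `fallback` parameter are hypothetical names, and the sample address is made up.

```python
from email.header import Header, decode_header


def decode_header_value(value, fallback="utf-8"):
    """Decode every part of a header value into one str (hypothetical helper)."""
    pieces = []
    for data, charset in decode_header(value):
        if isinstance(data, bytes):
            # charset is None for unencoded bytes; an unknown charset raises LookupError.
            try:
                pieces.append(data.decode(charset or fallback, errors="replace"))
            except LookupError:
                pieces.append(data.decode(fallback, errors="replace"))
        else:
            # Headers without any encoded-word come back as str already.
            pieces.append(data)
    return "".join(pieces)


# Plain header: returned unchanged.
print(decode_header_value("Alice <alice@example.com>"))

# Mixed header: an encoded display name followed by a plain address.
mixed = Header("LinkedInコンタクト", "utf-8").encode() + " <alice@example.com>"
mixed_result = decode_header_value(mixed)
# ascii() keeps the demo safe on consoles that cannot display Japanese.
print(ascii(mixed_result))
```

With this in place, the decode step no longer depends on a bare `except` to survive ordinary ASCII headers, so only the output step would still need its own handler.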

My Tech Life

Memo by a Japanese Software Developer in his late 50s.
