My Tech Life

Memo by a Japanese Software Developer in his late 50s.

Reading an .mbox file in Python

Since I tend to forget things,

I've decided to start a blog to keep memos for the future.

 

Over the past decade or so,

my Gmail account has accumulated a lot of emails,

mostly advertisements and newsletters.

 

With such a large collection,

I thought it might be interesting to look for trends over the years

using Python.
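
As a rough illustration of the kind of trend analysis I have in mind, here is a minimal sketch that counts messages per year from their Date headers, using Python's standard mailbox module. It assumes that skipping messages with a missing or unparsable Date header is acceptable; count_by_year is a hypothetical helper name of my own.

```python
import mailbox
from collections import Counter
from email.utils import parsedate_to_datetime


def count_by_year(path):
    """Hypothetical helper: return a Counter mapping year -> number of messages."""
    counts = Counter()
    box = mailbox.mbox(path)
    try:
        for message in box:
            date_header = message['date']
            if date_header is None:
                continue  # no Date header at all
            try:
                counts[parsedate_to_datetime(str(date_header)).year] += 1
            except (TypeError, ValueError):
                continue  # malformed Date header; skip this message
    finally:
        box.close()
    return counts
```

For example, sorted(count_by_year('example.mbox').items()) would give (year, count) pairs in chronological order.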

 

Gmail lets you export all of your mail in bulk in .mbox format (through Google Takeout).

My export came to about 3.6 GB.
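
An .mbox file is plain text in which each message begins with a separator line starting with "From " (body lines that would start the same way are conventionally escaped as ">From "). Assuming the Gmail export follows that convention, a quick sanity check on a large export is to count the separator lines without parsing anything. This is a minimal sketch; count_messages is a hypothetical helper name.

```python
def count_messages(path):
    """Hypothetical helper: count mbox separator lines (lines starting with b'From ')."""
    count = 0
    # Binary mode: mail may use any encoding, and only the first five bytes
    # of each line are inspected. Reading line by line keeps memory use small.
    with open(path, 'rb') as f:
        for line in f:
            if line.startswith(b'From '):
                count += 1
    return count
```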

 

For a start, I asked ChatGPT for a typical sample.

It seems to read through the entire file first,

so there's a delay right after running it.
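
To see where that delay comes from, here is a small sketch that times the first access on its own. As far as I can tell from the standard library source, mailbox.mbox builds an index of where every message starts by reading through the whole file the first time it is accessed (for example by keys() or the first loop iteration), so this step grows with the file size. time_index_build is a hypothetical helper name.

```python
import mailbox
import time


def time_index_build(path):
    """Hypothetical helper: return (message count, seconds spent on the first access)."""
    box = mailbox.mbox(path)
    try:
        start = time.perf_counter()
        keys = box.keys()  # first access: scans the file to locate every message
        elapsed = time.perf_counter() - start
    finally:
        box.close()
    return len(keys), elapsed
```

The measured time will of course depend on the machine and the disk.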

 


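import mailbox

# Create an mbox object by specifying the path to the .mbox file
mbox = mailbox.mbox('example.mbox')

# Iterate over the messages in the mbox
for message in mbox:
    print("Subject:", message['subject'])
    print("From:", message['from'])
    print("To:", message['to'])
    print("Date:", message['date'])
    # get_payload() returns a list of parts for multipart mail and the raw
    # (possibly base64-encoded) text otherwise, so walk the parts and decode
    # only the text/plain ones
    for part in message.walk():
        if part.get_content_type() == "text/plain":
            body = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                print("Body:", body.decode(charset, errors="replace"))
            except LookupError:  # unknown charset name in the header
                print("Body:", body.decode("utf-8", errors="replace"))
    print()

# Close the mbox file
mbox.close()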
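
In the sample, message['subject'] and message['from'] are printed as they are, and in my mailbox many of them (Japanese ones in particular) are MIME-encoded, such as =?UTF-8?B?...?=. One way to make them readable is the standard library's email.header.decode_header and make_header. The sketch below wraps them in a hypothetical helper, decode_mime_header, which falls back to the raw text when a header cannot be decoded.

```python
from email.errors import HeaderParseError
from email.header import decode_header, make_header


def decode_mime_header(value):
    """Hypothetical helper: turn a possibly MIME-encoded header into text ('' if missing)."""
    if value is None:
        return ""
    try:
        return str(make_header(decode_header(value)))
    except (HeaderParseError, LookupError, UnicodeDecodeError):
        # Malformed encoded word or unknown charset: keep the raw header text
        return str(value)
```

Inside the loop, this would be used as print("Subject:", decode_mime_header(message['subject'])).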

 

My idea was to open the .mbox file as an ordinary file and read it one email at a time.

I did some research, but that approach seems quite cumbersome.
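
For reference, a simplified version of that idea might look like the following. It is a minimal sketch under these assumptions: every line beginning with b'From ' starts a new message, escaped ">From " body lines are left unchanged, and a single message fits in memory (only one is held at a time). iter_mbox is a hypothetical name; the actual parsing is done by the standard library's email.message_from_bytes with policy.default, whose headers come back already decoded.

```python
import email
from email import policy


def iter_mbox(path):
    """Hypothetical helper: yield one EmailMessage at a time, reading line by line."""
    current = None  # lines of the message being collected; None before the first separator

    def parse(lines):
        return email.message_from_bytes(b''.join(lines), policy=policy.default)

    with open(path, 'rb') as f:
        for line in f:
            if line.startswith(b'From '):
                if current:
                    yield parse(current)
                current = []  # start a new message; the separator line is dropped
            elif current is not None:
                current.append(line)
    if current:
        yield parse(current)
```

Unlike mailbox.mbox, this starts yielding messages immediately, but it cannot jump to a message by index.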

 

So I decided to cut out a smaller .mbox file of a few hundred megabytes

and build a prototype with that first.
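
One way to cut out such a file without parsing anything is to copy the export byte for byte and stop at the first message boundary after a size limit. This is a minimal sketch, assuming the same "From " separator convention as above; make_sample_mbox and the file names in the usage note are hypothetical.

```python
def make_sample_mbox(src, dst, max_bytes):
    """Hypothetical helper: copy whole messages from src to dst, about max_bytes in total.

    Stops only at a message boundary (a line starting with b'From '), so dst is
    still a valid mbox file; it can exceed max_bytes by at most one message.
    Returns the number of bytes written.
    """
    written = 0
    with open(src, 'rb') as fin, open(dst, 'wb') as fout:
        for line in fin:
            if line.startswith(b'From ') and written >= max_bytes:
                break  # size limit reached at the start of the next message
            fout.write(line)
            written += len(line)
    return written
```

For example, make_sample_mbox('gmail_export.mbox', 'sample.mbox', 300 * 1024 * 1024) would produce a file of roughly 300 MB.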