This repository was archived by the owner on May 25, 2022. It is now read-only.

Commit f9b6c6e

Merge pull request #382 from tripleee/store-emails-update
Store emails: update code and README
2 parents f5b5ba2 + 6866c61 commit f9b6c6e

3 files changed: +100 -55 lines changed

+63 -31
@@ -1,31 +1,63 @@
-# Store mails in your inbox in csv format
-<!--Remove the below lines and add yours -->
-1)This script takes your email and password as input.
-
-2)Return a csv containing following attributes:
-
--Date
-
--From(Sender)
-
--Subject
-
--Mail Text
-
-
-## Prerequisites
-<!--Remove the below lines and add yours -->
-You only need Python to run this script. You can visit [here](https://www.python.org/downloads/) to download Python.
-
-
-## How to run the script
-<!--Remove the below lines and add yours -->
-Running the script is really simple! Just open a terminal in the folder where your script is located and run the following command :
-
-`pip install -r requirements.txt`
-`python store_emails.py`
-
-
-## *Author Name*
-<!--Remove the below lines and add yours -->
-gpriya32(Priyanka)
+# Store emails in CSV
+
+This project contains a simple script to extract email messages
+from an IMAP server.
+
+The messages are written to a simple four-column CSV file.
+
+
+## Dependencies
+
+This depends on the BeautifulSoup library and `lxml`
+for extracting text from HTML messages.
+
+
+## Running the script
+
+You will need to have a file `credentials.txt`
+with your IMAP server account name and password on separate lines.
+
+Gmail - and many other IMAP providers -
+requires you to create a separate "application password"
+to allow this code to run, so probably do that first.
+Then put that password in `credentials.txt`.
+
+Then simply run
+
+```
+python store_emails.py
+```
+
+This generates `mails.csv` in the current directory.
+
+The generated CSV file contains the following fields for each message:
+
+* Date
+* From (Sender)
+* Subject
+* Message text
+
+
+## Development ideas
+
+This hardcodes the IMAP server for Gmail.com and the `"INBOX"` folder.
+Perhaps this should be configured outside of the code
+for easier customization.
+
+This brutally marks all messages as read.
+Perhaps make it `PEEK` so as to not change the message flags.
+
+This will read everything in the `INBOX` folder.
+It could be useful to make it remember which messages it has already seen,
+and update a CSV file only with information from messages which have
+arrived since the previous poll.
+
+It might be useful to be able to specify which messages to fetch,
+instead of have it fetch everything every time.
+
+The exception handling is not a good example of how to do this properly.
+
+
+## Author Name
+
+Aditya Jetely (@AdityaJ7)
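
The `PEEK` idea from the "Development ideas" section of the new README could look roughly like the sketch below. It is not part of this commit. It assumes `mail` is an already authenticated `imaplib.IMAP4_SSL` connection on which `select()` has been called; the helper name `fetch_without_marking_read` is hypothetical.

```python
import email
from email import policy


def fetch_without_marking_read(mail, sequence_number):
    """Fetch one message without setting its Seen flag.

    `mail` is assumed to be a logged-in imaplib.IMAP4_SSL object with a
    mailbox already selected. This helper is a hypothetical illustration.
    """
    # BODY.PEEK[] returns the full message, like RFC822 does, but the
    # server does not implicitly mark the message as read.
    status, data = mail.fetch(str(sequence_number), "(BODY.PEEK[])")
    if status != "OK":
        return None
    for response in data:
        # The raw message bytes are the second item of a tuple; other
        # items in the response list (such as a closing b")") are skipped.
        if isinstance(response, tuple):
            return email.message_from_bytes(response[1], policy=policy.default)
    return None
```

Another option, under the same assumptions, is to open the mailbox with `mail.select("INBOX", readonly=True)`, which keeps the whole session from changing message flags.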
@@ -1 +1,2 @@
 beautifulsoup4
+lxml

projects/Store_emails_in_csv/store_emails.py

+36 -24
@@ -1,15 +1,20 @@
 #!/usr/bin/env python
 
-import imaplib
+import csv
 import email
 from email import policy
-import csv
-import ssl
+import imaplib
+import logging
 import os
+import ssl
+
 from bs4 import BeautifulSoup
 
-credential_path = os.getcwd() + "/credentials.txt"
-csv_path = os.getcwd() + "/mails.csv"
+
+credential_path = "credentials.txt"
+csv_path = "mails.csv"
+
+logger = logging.getLogger('imap_poller')
 
 host = "imap.gmail.com"
 port = 993
@@ -36,7 +41,7 @@ def get_text(email_body):
     return soup.get_text(separator="\n", strip=True)
 
 
-def write_to_csv(mail, writer):
+def write_to_csv(mail, writer, N, total_no_of_mails):
 
     for i in range(total_no_of_mails, total_no_of_mails - N, -1):
         res, data = mail.fetch(str(i), "(RFC822)")
@@ -60,12 +65,11 @@ def write_to_csv(mail, writer):
                         content_disposition = str(part.get("Content-Disposition"))
                         try:
                             # get the email email_body
-                            email_body = part.get_payload(decode=True).decode(
-                                "utf-8"
-                            )
-                            email_text = get_text(email_body)
-                        except Exception:
-                            pass
+                            email_body = part.get_payload(decode=True)
+                            if email_body:
+                                email_text = get_text(email_body.decode('utf-8'))
+                        except Exception as exc:
+                            logger.warning('Caught exception: %r', exc)
                         if (
                             content_type == "text/plain"
                             and "attachment" not in content_disposition
@@ -80,18 +84,22 @@ def write_to_csv(mail, writer):
                     # extract content type of email
                     content_type = msg.get_content_type()
                     # get the email email_body
-                    email_body = msg.get_payload(decode=True).decode("utf-8")
-                    email_text = get_text(email_body)
-
-                # Write data in the csv file
-                row = [email_date, email_from, email_subject, email_text]
-                writer.writerow(row)
-
-
-if __name__ == "__main__":
+                    email_body = msg.get_payload(decode=True)
+                    if email_body:
+                        email_text = get_text(email_body.decode('utf-8'))
+
+                if email_text is not None:
+                    # Write data in the csv file
+                    row = [email_date, email_from, email_subject, email_text]
+                    writer.writerow(row)
+                else:
+                    logger.warning('%s:%i: No message extracted', "INBOX", i)
 
+def main():
     mail, messages = connect_to_mailbox()
 
+    logging.basicConfig(level=logging.WARNING)
+
     total_no_of_mails = int(messages[0])
     # no. of latest mails to fetch
     # set it equal to total_no_of_emails to fetch all mail in the inbox
@@ -101,6 +109,10 @@ def write_to_csv(mail, writer):
         writer = csv.writer(fw)
         writer.writerow(["Date", "From", "Subject", "Text mail"])
         try:
-            write_to_csv(mail, writer)
-        except Exception as e:
-            print(e)
+            write_to_csv(mail, writer, N, total_no_of_mails)
+        except Exception as exc:
+            logger.warning('Caught exception: %r', exc)
+
+
+if __name__ == "__main__":
+    main()
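
The patched code still decodes every payload as UTF-8 and leaves anything else to the `except` branch. As a possible follow-up, not part of this commit, the text could be pulled out through the `EmailMessage` API, which uses each part's declared charset. This is only a sketch: the helper name `extract_text` is hypothetical, and it assumes the message was parsed with `policy=policy.default`, which the `from email import policy` import suggests but the hunks shown here do not confirm.

```python
import email
from email import policy


def extract_text(msg):
    """Return the preferred text body of an EmailMessage, or None."""
    # get_body() searches multipart messages and skips attachments;
    # here it prefers text/plain and falls back to text/html.
    part = msg.get_body(preferencelist=("plain", "html"))
    if part is None:
        return None
    # get_content() undoes the transfer encoding and decodes the bytes
    # with the charset declared in the part's Content-Type header.
    return part.get_content()


# A small ISO-8859-1 message that a hardcoded UTF-8 decode would reject.
raw = (
    b"From: sender@example.com\r\n"
    b"Subject: Greeting\r\n"
    b"Content-Type: text/plain; charset=iso-8859-1\r\n"
    b"Content-Transfer-Encoding: quoted-printable\r\n"
    b"\r\n"
    b"Caf=E9\r\n"
)
message = email.message_from_bytes(raw, policy=policy.default)
text = extract_text(message)
```

An HTML body returned this way would still need to go through `get_text()` before being written to the CSV file.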
