Thursday, February 27, 2014

Reading email messages in Perl using Mail::POP3Client.

I had to write a program to read emails through POP3. It seemed relatively easy to do using Perl's Mail::POP3Client library. There are enough examples on the Internet to get a basic program that retrieves emails written very quickly. However, I was surprised that I couldn’t find a more detailed program. For example, I soon learned (the hard way) about message encodings (Base64 and Quoted-Printable). I found references to the MIME::QuotedPrint and MIME::Base64 libraries for decoding the email bodies. It was easy enough to check the header for encoding entries:

# Process header line by line
for (split /^/, $header) {
  if ( /^From:\s+<(.*?)>/i ) { $from = $1 };
  if ( /^Subject:\s+(.*)/i ) { $subject = $1 };
}
# Decode body if encoded
if ( $header =~ /Content-Transfer-Encoding:\s+base64/is ) {
  $body = decode_base64($body);
}
elsif ( $header =~ /Content-Transfer-Encoding:\s+quoted-printable/is ) {
  $body = decode_qp($body);
}
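
For anyone who wants to try that manual approach without a mailbox, here is a minimal, self-contained sketch. The decode_body helper and the sample message are made up for illustration, and it only covers single-part messages (a multipart email carries a separate encoding header for each part).

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper (not part of any library): decode a body according
# to the Content-Transfer-Encoding named in the raw header block.
sub decode_body {
    my ($header, $body) = @_;
    if ( $header =~ /^Content-Transfer-Encoding:\s*base64/im ) {
        return decode_base64($body);
    }
    elsif ( $header =~ /^Content-Transfer-Encoding:\s*quoted-printable/im ) {
        return decode_qp($body);
    }
    return $body;    # 7bit, 8bit, or no encoding header: leave untouched
}

# Made-up sample message with a Base64 body
my $header = "From: <alice\@example.com>\n"
           . "Subject: Hello\n"
           . "Content-Transfer-Encoding: base64\n";
my $body   = "SGVsbG8sIHdvcmxkIQ==\n";

print decode_body($header, $body), "\n";
```

Note that the result is still a string of bytes. Turning those bytes into characters is a separate step that depends on the charset.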

But I was surprised this was a “manual” process left up to the user. I then discovered some characters in the body were wrong. The emails are UTF-8 encoded and seemed to be readable 99.9% of the time, but every so often, I’d see a character get messed up. Things like a single right quote wouldn’t convert properly. I searched again and found Encoding::FixLatin, which did the trick. So yes, I got it all working, but it seemed very inelegant.
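
One plausible cause of a garbled right quote (I can't say for certain it was the cause here) is that the same character arrives as different bytes depending on the sender's charset. The sketch below is a simplified stand-in for the idea, not how Encoding::FixLatin works internally: the bytes_to_text helper is hypothetical, it tries strict UTF-8 first, and it assumes Windows-1252 when that fails. It only handles a string that is entirely one encoding or the other.

```perl
use strict;
use warnings;
use Encode qw(decode FB_CROAK LEAVE_SRC);

# Hypothetical helper: turn raw body bytes into a Perl character string.
# Assumption: anything that is not valid UTF-8 is Windows-1252.
sub bytes_to_text {
    my ($bytes) = @_;
    # FB_CROAK makes decode die on malformed UTF-8; LEAVE_SRC keeps $bytes intact
    my $text = eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC) };
    return $text if defined $text;
    return decode('cp1252', $bytes);
}

binmode(STDOUT, ':encoding(UTF-8)');

my $utf8_bytes   = "It\xE2\x80\x99s fine";   # right single quote as UTF-8
my $cp1252_bytes = "It\x92s fine";           # same character in Windows-1252

# Both inputs decode to the same character, U+2019
print bytes_to_text($utf8_bytes), "\n";
print bytes_to_text($cp1252_bytes), "\n";
```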

I happened to stumble on a bit of code while looking for more information on Perl and utf8 that used Email::MIME to process emails. Most examples using Email::MIME that I’ve seen are for encoding (i.e., sending) emails. I took an educated guess at using Email::MIME as a decoder and it worked a treat. Here’s an example program to read emails using what I’ve found.

use Modern::Perl;
use Mail::POP3Client;
use Email::MIME;
use IO::Socket::SSL;
use Encode;
# Treat this source file as UTF-8 and set output to UTF-8
use utf8;
binmode(STDOUT, ":utf8");
# Retrieve emails through POP3
my $pop_user = '********';
my $pop_pass = '********';
my $pop_host = 'mail.********.com';
# Connect to POP3 server
# Manually create SSL connection since we can't
#  set SSL_verify_mode in Mail::POP3Client
my $socket = IO::Socket::SSL->new( PeerAddr        => $pop_host,
                                   PeerPort        => 995,
                                   SSL_verify_mode => SSL_VERIFY_NONE,
                                   Proto           => 'tcp') || die "No socket!";
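my $pop = Mail::POP3Client->new();
$pop->User($pop_user);
$pop->Pass($pop_pass);
# Hand over the SSL socket created above, then log in
$pop->Socket($socket);
$pop->Connect()
  or die "Unable to connect to POP3 server: ".$pop->Message()."\n";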
# Count number of items in mailbox        
my $mailcount = $pop->Count();
# Process each email individually
for (my $i = 1; $i <= $mailcount ; $i++) {
  my $header = $pop->Head($i); # Gets the email header
  #my $uni = $pop->Uidl($i); # Gets the unique id
  #my $body = $pop->Body($i); # Gets the email body
  my $mail = $pop->HeadAndBody($i);
  my $parsed = Email::MIME->new($mail);
  my $from = encode('utf8', $parsed->header('From'));
  my $subject = encode('utf8', $parsed->header('Subject'));
  my $body = $parsed->body_str;
  say "$header";
  say "$body\n";
} # END for loop
# Close POP connection
$pop->Close();
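
To try just the Email::MIME decoding step without a POP3 account, a sketch like the following may help. The raw message is made up for illustration: a Quoted-Printable body declared as UTF-8, plus a MIME-encoded Subject. It uses the same header and body_str calls as the program above.

```perl
use strict;
use warnings;
use Email::MIME;

binmode(STDOUT, ':encoding(UTF-8)');

# Made-up raw message standing in for what HeadAndBody would return
my $raw = join "\n",
    'From: Alice <alice@example.com>',
    'Subject: =?UTF-8?Q?Caf=C3=A9_menu?=',
    'MIME-Version: 1.0',
    'Content-Type: text/plain; charset=UTF-8',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    'It=E2=80=99s ready.',
    '';

my $parsed  = Email::MIME->new($raw);

# header() decodes the MIME-encoded words in the Subject
my $subject = $parsed->header('Subject');

# body_str() undoes the transfer encoding and the charset in one step
my $body    = $parsed->body_str;

print "Subject: $subject\n";
print "Body: $body";
```

This only covers a single text part. For a multipart message, my understanding is that you would look at the individual parts (for example through subparts) and call body_str on the text ones.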
