Parsing POP email messages

Posted by Jack Altiere on March 29th, 2010

I was recently tasked with coming up with a way to parse a POP email account and automatically save off each message and any attachments that came with it. While it didn’t sound like too bad of a task, I know from my limited experience that parsing MIME content can be a pain in the rear. I thought it would be interesting to share what I came up with in a sample application.

First, I knew that I had no desire to reinvent the wheel and write my own MIME parser if I could find one that works. After some searching and some trial and error, I ended up finding a .NET library that seems to work great from here. The full version of this library isn’t free, but the trial version is fully functional; it just rewrites the email subjects once in a while to tell you to buy the full version. (As a side note, I will probably end up buying this product, since it worked exactly like I wanted it to.)

Since I felt like showing this would go better if I had a sample app, I decided to write a little one-page website that takes emails from an approved list of addresses and posts any photo attachments in a little photo album. I decided I wanted a SQL Server backend, and that I would just use LINQ to SQL for data access. The plan is to take the body of the email, make it the image caption, and attach one image per email. It would also be nice to resize the image if it’s too big.

First, the backend.  For the sake of simplicity I just created 2 tables…one to handle user validation and one to store the photos.  After creating the LINQ to SQL classes for these two tables, my .dbml design view looked like this:

dblayout

After that, I wanted to set up my mail parsing library.  Like I mentioned above, I just wrapped a 3rd party utility, so my code for this isn’t too complex.

public class MailParser
{
    private readonly string _popServer;
    private readonly string _username;
    private readonly string _password;
    private readonly IMessageProcessor _messageProcessor;

    public MailParser(IMessageProcessor messageProcessor)
    {
        _popServer = ConfigurationParser.GetAppSettingString("MailServer");
        _username = ConfigurationParser.GetAppSettingString("MailUserName");
        _password = ConfigurationParser.GetAppSettingString("MailPassword");
        _messageProcessor = messageProcessor;
    }

    public void ProcessMessages()
    {
        var messages = new List<IMail>();

        using (var pop3 = new Pop3())
        {
            pop3.Connect(_popServer);
            pop3.Login(_username, _password);

            foreach (var uid in pop3.GetAll())
            {
                // Add the email message to the list to be processed.
                var message = new MailBuilder().CreateFromEml(pop3.GetMessageByUID(uid));
                messages.Add(message);

                // Delete the message from the server.
                pop3.DeleteMessageByUID(uid);
            }
            pop3.Close(true);

            // We have snagged all of the messages from the server, now process them.
            foreach (var message in messages)
            {
                _messageProcessor.ProcessMessage(message);
            }
        }
    }

}

public interface IMessageProcessor
{
    void ProcessMessage(IMail message);
}

public class MessageProcessor : IMessageProcessor
{
    private readonly PhotoDBDataContext _db = new PhotoDBDataContext();

    public void ProcessMessage(IMail message)
    {
        var fromAddresses = message.From;
        var body = message.HtmlData ?? message.TextData;

        if (fromAddresses.Count == 0)
            return;

        // I only want to process emails from a list of valid email addresses.
        var address = (fromAddresses[0].Address.IsNullOrEmpty()) ? string.Empty : fromAddresses[0].Address;
        var user = ValidateUser(address);
        if (user == null)
            return;

        // If there are no attachments ignore the message.
        if (message.Attachments.Count == 0)
            return;

        // Process the attachments. (only supporting 1 attachment per email)
        var counter = 0;
        foreach (var att in message.Attachments)
        {
            if (counter > 0)
                break;

            // Make sure that the attachment is a valid image.
            if (!att.ContentType.MimeType.ToString().ToLower().Equals("image"))
                return;

            var photo = new Photo
                            {
                                Caption = body.Text,
                                DateUploaded = DateTime.Now,
                                UserID = user.UserID,
                                ImageData = att.Data,
                                ContentType = att.ContentType.ToString(),
                                FileName = att.FileName
                            };

            _db.Photos.InsertOnSubmit(photo);
            _db.SubmitChanges();

            counter ++;
        }
    }

    private User ValidateUser(string email)
    {
        var query = from u in _db.Users
                    where u.EmailAddress.Equals(email)
                    select u;

        return query.FirstOrDefault();
    }

}

The code should be fairly straightforward. I created a message processing interface because it is conceivable that I would want to reuse this and process messages differently. You may notice the ConfigurationParser references; that is just a class I wrote to wrap access to configuration variables, allowing me to load them strongly typed. All I am doing is looping over the messages on the POP server, loading them into a list to process, and deleting the original message from the server. I’ll throw out the standard disclaimer here and say that this is an example application! I would definitely consider persisting messages into some sort of a queue to process before deleting them from the mail server if this were a production application.
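The ConfigurationParser class itself isn't shown in this post, so here is a minimal sketch of what such a wrapper might look like. Everything in it is an assumption for illustration: the class name ConfigurationParserSketch, the extra settings parameter, and the int and bool helpers are hypothetical and are not the original implementation. It reads from any NameValueCollection, which is the type of ConfigurationManager.AppSettings, so it can run without a config file.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

// Hypothetical stand-in for the ConfigurationParser helper mentioned above.
// A real application could pass ConfigurationManager.AppSettings as "settings".
public static class ConfigurationParserSketch
{
    // Returns the setting as a string, or throws if the key is missing.
    public static string GetAppSettingString(NameValueCollection settings, string key)
    {
        var value = settings[key];
        if (value == null)
            throw new InvalidOperationException("Missing app setting: " + key);
        return value;
    }

    // Returns the setting parsed as an int, or throws if it is missing or malformed.
    public static int GetAppSettingInt(NameValueCollection settings, string key)
    {
        var raw = GetAppSettingString(settings, key);
        int parsed;
        if (!int.TryParse(raw, NumberStyles.Integer, CultureInfo.InvariantCulture, out parsed))
            throw new FormatException("App setting '" + key + "' is not an integer: " + raw);
        return parsed;
    }

    // Returns the setting parsed as a bool, falling back to a default
    // when the key is absent or the value cannot be parsed.
    public static bool GetAppSettingBool(NameValueCollection settings, string key, bool defaultValue)
    {
        var raw = settings[key];
        bool parsed;
        return (raw != null && bool.TryParse(raw, out parsed)) ? parsed : defaultValue;
    }
}
```

Failing loudly on a missing key keeps a typo in the config file from turning into a null server name at connect time.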

Using LINQ to SQL, it’s a snap to validate the user and make sure we’re only posting pictures from trusted email addresses. (I know, we would have to think about how to protect this against spoofing if this were a real app.) Notice how I’m also using the MIME type to make sure that we only accept image attachments. Lastly, I’m only processing the first attachment so that we accept just one image per email; I did this because I’m using the message body as the caption of the photo.
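To make the MIME type check concrete without depending on the mail library, here is a small sketch that works on a raw Content-Type string such as the one stored in the Photo.ContentType column. The class and method names are made up for this example, and it is only one way to do the check, not the library's API.

```csharp
using System;

// Hypothetical helper: decides whether a Content-Type value names an image.
public static class AttachmentFilterSketch
{
    // Returns true for values such as "image/jpeg" or "IMAGE/PNG; name=photo.png".
    public static bool IsImageContentType(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType))
            return false;

        // Drop any parameters after the first ';' (name=, charset=, ...).
        var mediaType = contentType.Split(';')[0].Trim();

        // A media type needs a non-empty type and subtype around the '/'.
        var slash = mediaType.IndexOf('/');
        if (slash <= 0 || slash == mediaType.Length - 1)
            return false;

        // MIME type names are case-insensitive.
        var topLevel = mediaType.Substring(0, slash);
        return topLevel.Equals("image", StringComparison.OrdinalIgnoreCase);
    }
}
```

Keep in mind that the Content-Type is supplied by the sender, so a production app would probably also want to confirm that the bytes actually decode as an image.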

Now that images are being scraped from the email account and stored in the database, I thought I’d close the loop on this sample application and put a quick web page up to display the album.   I created a new ASP.NET MVC 2 web application, and then went through the same steps as above to create my LINQ to SQL classes.  The first thing I wanted to do was create a strongly typed view.  The way I do this is to right-click in the HomeController.cs file and select Add View.   You will get a dialog like this one:

strongtypedview

All I did was name the view, select my LINQ to SQL generated Photo class, and choose List as the View content.  This basically says that the view will have access to a collection of photos.  It’s trivial to pull the list of photos out of the database with LINQ:

public ActionResult Index()
{
    var db = new PhotoDBDataContext();
    var photos = (from p in db.Photos
                  select p).ToList();

    ViewData.Model = photos;
    return View();
}

The next thing I wanted to do was display an image straight from an MVC controller. I’m not going to post all the code I used to accomplish this, but the short version is that I modified a guide I found here. One requirement that I still hadn’t addressed is resizing the images. Since I took the images straight from the POP account and put them in the database, I have no idea how big they really are. Rather than go way off track and figure out how to do this myself, I used a method I read about here to resize my images from a byte array if needed. I also wanted to come up with a clever way to thumbnail images on the fly, so I changed the default route in the global.asax file to look like this:

public static void RegisterRoutes(RouteCollection routes)
{
    routes.IgnoreRoute("{resource}.axd/{*pathInfo}");

    routes.MapRoute(
        "Default", // Route name
        "{controller}/{action}/{photoID}/{thumbnail}", // URL with parameters
        new { controller = "Home", action = "Index", photoID = UrlParameter.Optional, thumbnail = false } // Parameter defaults
    );

}


What this allows me to do is use a URL like mysite.com/Home/Images/1 to show the full image and mysite.com/Home/Images/1/true if I want to show the thumbnail of the same image.  My controller method for the image creation looks like this:

public ActionResult Images(int photoID, bool thumbnail)
{
    var db = new PhotoDBDataContext();
    var photo = (from p in db.Photos
                 where p.PhotoID.Equals(photoID)
                 select p).SingleOrDefault();

    if (photo == null)
        throw new Exception("Photo not loaded");

    var bytes = (thumbnail)
                    ? ResizeFromStream(100, new MemoryStream(photo.ImageData.ToArray()), photo.FileName)
                    : photo.ImageData.ToArray();

    var image = bytes;
    var contentType = photo.ContentType;
    return this.Image(image, contentType);
}
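The ResizeFromStream method isn't listed here, but the arithmetic behind a thumbnail is easy to sketch. The example below assumes that the 100 passed to ResizeFromStream is the maximum side length in pixels, which is a guess on my part; FitWithin and ThumbnailMathSketch are hypothetical names. It only computes the target dimensions and leaves the actual drawing to whatever imaging code you use.

```csharp
using System;

// Hypothetical helper: computes thumbnail dimensions, not the pixels themselves.
public static class ThumbnailMathSketch
{
    // Scales (width, height) so the longest side is at most maxSide pixels,
    // preserving the aspect ratio. Images that already fit are left unchanged.
    public static void FitWithin(int width, int height, int maxSide,
                                 out int newWidth, out int newHeight)
    {
        if (width <= 0 || height <= 0 || maxSide <= 0)
            throw new ArgumentException("All dimensions must be positive.");

        var longest = Math.Max(width, height);
        if (longest <= maxSide)
        {
            // No upscaling: small images keep their original size.
            newWidth = width;
            newHeight = height;
            return;
        }

        var scale = (double)maxSide / longest;

        // Round to the nearest pixel, but never collapse a side to zero.
        newWidth = Math.Max(1, (int)Math.Round(width * scale));
        newHeight = Math.Max(1, (int)Math.Round(height * scale));
    }
}
```

With these rules a 1600 by 1200 photo comes out at 100 by 75, which fits inside the 101 pixel tiles used by the album's CSS.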

The view ended up being trivial:

<%@ Page Title="" Language="C#" MasterPageFile="~/Views/Shared/Site.Master" Inherits="System.Web.Mvc.ViewPage<IEnumerable<BlogSamples.DisplayPhotos.Photo>>" %>

<asp:Content ID="Content1" ContentPlaceHolderID="TitleContent" runat="server">
    Photo Album - Jaltiere.Com
</asp:Content>

<asp:Content ID="Content2" ContentPlaceHolderID="MainContent" runat="server">

    <div id="container">
        <ul>

        <% foreach (var item in Model) { %>

            <li>
            <a href="Home/Images/<%= Html.Encode(item.PhotoID) %>" target="_blank" style="background-image: url('Home/Images/<%= Html.Encode(item.PhotoID) %>/true')">
            <span><%= Html.Encode(item.Caption) %><br /><i>uploaded <%= Html.Encode(item.DateUploaded.ToShortDateString()) %></i></span></a>
            </li>

        <% } %>

        </ul>
    </div>

</asp:Content>

And with a little CSS magic…

body {
    text-align:center;
    font-family: tahoma, arial, sans-serif;
    font-size: 8pt;
}

#container {
    position:relative;
    width:770px;
    height:396px;
    margin:20px auto 0 auto;
    border:1px solid #aaa;
} 

ul {
    padding:0;
    margin:0;
    list-style-type:none;
}

li {
  display: inline;
  float: left;
  width: 101px;
  height: 101px;
  margin: 4px;
}

li a {
  display: block;
  width: 101px;
  height: 101px;
  background-position: center;
  background-repeat: no-repeat;
  text-decoration: none;
}

li { height: 115px; }
li a span {
  font-size: 9px;
  position: relative;
  top: 103px;
  color: #666;
  display: block;
  text-align: center;
}
li a:hover span { color: red; }

You get a functional photo album:

gallery

This article kind of took off on me and turned out to be more MVC than POP mail parsing, but it was a fun little exercise.  Things to take away:

  1. I recommend the mail parsing library I used; I found it to be intuitive and easy to use.
  2. LINQ to SQL makes data access easy. (coming from a guy with a big stored procedure background)
  3. MVC is much more fun to work with than Webforms in my opinion.
  4. I freely admit that I am definitely not a designer, so my web layout is not very good.


The Open Data Protocol

Posted by Jack Altiere on March 25th, 2010

I recently went to the MIX developer’s conference in Las Vegas, and one of the topics that I put on my list of things to get more familiar with is the Open Data Protocol, or OData for short. To put it simply, OData is an HTTP-based method for sharing data. I highly recommend checking out the OData site to get more familiar with the protocol from a technical standpoint. To sum it up, I’ll just say that it’s built on conventions used in the Atom Publishing Protocol (AtomPub).

There are several public OData feeds already available, but the one I used to get my feet wet was the Netflix OData Catalog API. This was part of the day 2 keynote at the MIX conference, and it seemed like a good feed to learn the ropes with. There are a lot of different ways you can query the feed (you can even use a browser), but I felt the best way to get started was to use LINQPad. I recommend installing the .NET Framework 4.0 RC and the corresponding version of LINQPad; it worked better with the Netflix OData feed when I was learning it.

LINQPad supports WCF Data Services out of the box, so you will literally be querying the Netflix feed in minutes. All you have to do is start the program and click the Add Connection link on the top left. You will be presented with a dialog box like this one:

addconnection

Select WCF Data Services from the list and hit the next button.  You will then be presented with this dialog:

connectionstring

All you have to do here is type in the URL of the feed, which in this case is http://odata.netflix.com/Catalog. After you hit the OK button you are ready to start writing queries; it really is that easy. Once you have your feed set up, LINQPad figures out what information the feed is providing and outlines it for you on the left side.

feeddef

At this point you can start writing queries in LINQ.   Here are a few sample queries that I came up with:

What if I wanted to get a list of all of the awards that Al Pacino has won?

from p in People.Expand("Awards")
where p.Name.Equals("Al Pacino")
select p

If you run that query in LINQPad, you get the following results:

query1

I used the Expand method to expand the Awards collection off of the Netflix-defined People section. This is a method available off of the DataServiceQuery<T> class, which is how the results come back. This result isn’t quite as useful as I’d hoped, since I still need to expand the CatalogTitle to see which movie is associated with each award. This can be accomplished by chaining another Expand call. The only thing to notice is the ‘/’ rather than the ‘.’ that you might expect when accessing the collection:

from p in People.Expand("Awards").Expand("Awards/Title")
where p.Name.Equals("Al Pacino")
select p

This returns the CatalogTitle information as well, so we can see that in 1993, Al Pacino won the Academy Award for Best Actor for his role in Scent of a Woman.

query2

How about something a little more complex? I want to see all ‘R’ rated movies released in 2009 that are available on Blu-Ray with a Netflix rating of 4 or higher. This is trivial with a simple LINQ expression:

from t in Titles
where t.Type.Equals("Movie") && t.AverageRating >= 4
&& t.BluRay.Available == true && t.ReleaseYear == 2009
&& t.Rating.Equals("R")
orderby t.AverageRating descending
select new
{
    t.Id,
    t.RegularTitle,
    t.Synopsis,
    t.ReleaseYear,
    t.Rating
}

In the previous example, I also used an anonymous type to only show the data I was interested in, and I sorted the results based on their rating from highest to lowest.  This query produced the following result:

query3

It might be a little tough to see, but the query returned 4 movies that fit the criteria. Just to prove you can do the same thing straight from the browser, the same query can be run using this link. The only thing this link doesn’t do that my query above does is restrict the amount of data being brought back. To get a better idea of how to navigate an OData feed using URLs, check this article out.
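For anyone curious what such a URL is made of, here is a rough sketch that assembles one from the standard OData system query options $filter, $orderby and $select. This is not the exact link referenced above; the ODataUrlSketch class and BuildQueryUrl method are hypothetical, and the filter text is my own translation of the LINQ query, using the property names from that query.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that assembles an OData query URL by hand.
public static class ODataUrlSketch
{
    // Appends whichever of $filter, $orderby and $select are supplied
    // to the entity set URL. Null or empty options are skipped.
    public static string BuildQueryUrl(string serviceRoot, string entitySet,
                                       string filter, string orderBy, string select)
    {
        var options = new List<string>();
        if (!string.IsNullOrEmpty(filter))
            options.Add("$filter=" + Uri.EscapeDataString(filter));
        if (!string.IsNullOrEmpty(orderBy))
            options.Add("$orderby=" + Uri.EscapeDataString(orderBy));
        if (!string.IsNullOrEmpty(select))
            options.Add("$select=" + Uri.EscapeDataString(select));

        var url = serviceRoot.TrimEnd('/') + "/" + entitySet;
        return options.Count == 0 ? url : url + "?" + string.Join("&", options);
    }

    // The LINQ query above, hand-translated to OData filter syntax.
    // No $select is passed, so full entries come back, as with the browser link.
    public static string HighlyRatedBluRayMovies2009()
    {
        return BuildQueryUrl(
            "http://odata.netflix.com/Catalog", "Titles",
            "Type eq 'Movie' and AverageRating ge 4 and BluRay/Available eq true " +
            "and ReleaseYear eq 2009 and Rating eq 'R'",
            "AverageRating desc",
            null);
    }
}
```

Note the same ‘/’ convention as in the Expand call: BluRay/Available in the filter corresponds to t.BluRay.Available in LINQ.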

Visual Studio 2010 provides OData support straight out of the box as well.  I would also suggest adding the Open Data Protocol Visualizer plugin, which allows you to see a read-only graphical representation of the types and relationships provided in a WCF Data Service.  I created a Service Reference to the Netflix OData feed to show you a sample of what this plugin can output.

odatavisualizer

I really like the direction that Microsoft is going with OData.  This protocol will become more and more useful as more providers start creating feeds.  This should allow developers to create mashups fairly easily and consume data that already exists in the public domain.  Microsoft is really pushing the adoption of this protocol…I found this image over at Douglas Purdy’s blog which sums up OData support pretty well I think.

odata

I want to show one more way that OData feeds can be consumed. I downloaded the Office 2010 Beta and then installed PowerPivot to check out the native OData integration that is available. I was able to quickly set up a connection to the Netflix feed, as shown here:

powerpivot

To add the feed, click the PowerPivot Window in the top left, then choose the From Data Feeds button from the pop-up that you get. When you choose “From Data Feeds” you will get a drop down; select “From Other Feeds” and you will be presented with the Data Feed dialog above. All you have to do is enter the same URL as before and PowerPivot will discover the available elements from the feed as shown here:

powerpivot2

You can import any of the sections from this screen, but keep in mind that if you try to import all of the catalog titles it is going to take a while!   Just to show that Excel can grab this data, I entered the URL from our browser above rather than the base of the feed to import the same 4 records as in our previous example.

excel

I really like the potential for this data protocol, and I’m really hoping it picks up steam. I hope this quick intro at least convinces a few people to check out OData. I should mention that producing OData is just as simple as consuming it. SharePoint 2010 supports this out of the box, and you can see a great example of it in the screencast available here.


Copyright © 2007 Jack Altiere. All rights reserved.