Thursday, June 01, 2006

E-Mail: What's in a Folder?

My previous post turned out to be very popular: Thanks to a combined digg-and-reddit effect, plus getting quoted in various places, more than 15,000 people have read it so far. I hope some of the new readers will stick around for more.

There was plenty of feedback and controversy regarding my request for making web apps work offline. While I have neatly organized all the suggestions, I haven't yet been able to look at many of them thoroughly. Therefore, today's topic is something completely different: e-mail foldering.

When Gmail came out, one of the things I cried loudest for was folders. These days, I still use Mozilla Thunderbird for my main e-mail, and have a wonderful system of folders and search folders. Coincidentally, my Master's thesis is about better organizing users' e-mail, a topic I have written about before.

Almost anyone who uses a desktop mail client organizes mail into folders. I think it's about time to ask: What are folders all about?

Folders group related items. What does "related" mean? What do e-mails in a folder have in common? My grand unified theory of mail folders is as follows. Folders are used to group mails by:
  1. groups of senders or
  2. topic.
This is pretty obvious: Users are likely to have a "family" folder or a folder for the heavy-volume mailing list they were too lazy to unsubscribe from. Similarly, they may have a folder named "Fluffy Tiger", where they collect e-mails on that super-secret project they're working on.

How do we free the user from this burden and classify all his e-mails automatically?

For folders with e-mails from specific senders, things are pretty easy. For those about a certain topic, this is wicked hard. How do you recognize that certain e-mails belong to the same topic? They're not all from the same people. They may be about the same concept but use completely different words. In addition, folders may be extremely small and not offer very much data for automatic classification.
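To show why the sender case is the easy one, here is a minimal sketch of sender-based foldering as a plain lookup table. The addresses, the folder names, and the `folder_for_sender` helper are all made up for illustration; a real mail client would keep such rules in its filter settings.

```python
from email.utils import parseaddr

# Hypothetical rules: sender address -> folder name.
SENDER_FOLDERS = {
    "mom@example.com": "family",
    "dad@example.com": "family",
    "list@example.org": "mailing-list",
}


def folder_for_sender(from_header, default="Inbox"):
    """Return the folder for a From: header, or the default for unknown senders."""
    # parseaddr splits 'Mom <mom@example.com>' into ('Mom', 'mom@example.com').
    _, address = parseaddr(from_header)
    return SENDER_FOLDERS.get(address.lower(), default)


print(folder_for_sender("Mom <Mom@Example.com>"))
print(folder_for_sender("stranger@example.net"))
```

Nothing comparable exists for topics: there is no single header field you could look up to find out that a message is about "Fluffy Tiger".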

For these reasons, automatic foldering has accuracy rates of anywhere from 60% to 80%, depending on the classification algorithm and the corpus used for testing. In other words, 20-40% of e-mails get misclassified.


Somewhere on my desk, there is a growing stack of papers. One of these describes SwiftFile (formerly known as MailCat), a system designed at IBM Research in 1999 for Lotus Notes. It shows a method for compensating for the low accuracy of automatic classifiers: Instead of instantly filing an e-mail away, SwiftFile shows buttons for the three folders the message is most likely to land in. The likelihood that at least one of the three buttons will be the correct folder is 80-90%. For a large majority of e-mails, instead of dragging them to the right folder, all you'll need to do is click a button.
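The "suggest three folders instead of one" idea can be sketched in a few lines. This is not SwiftFile's actual algorithm, just a toy stand-in that I am assuming for illustration: each folder is represented by the word counts of the mails already filed in it, and a new mail is scored against every folder by cosine similarity. The `FolderSuggester` class and all folder names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class FolderSuggester:
    def __init__(self):
        self.folder_words = {}  # folder name -> Counter of word counts

    def learn(self, folder, text):
        """Record that a mail with this text was filed into this folder."""
        self.folder_words.setdefault(folder, Counter()).update(tokenize(text))

    def suggest(self, text, k=3):
        """Return the k best-matching folders, best first."""
        words = Counter(tokenize(text))
        scored = [(cosine(words, counts), folder)
                  for folder, counts in self.folder_words.items()]
        # Highest score first; ties are broken alphabetically by folder name.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [folder for _, folder in scored[:k]]


suggester = FolderSuggester()
suggester.learn("family", "dinner with mom and dad on sunday")
suggester.learn("fluffy-tiger", "project deadline prototype review meeting")
suggester.learn("mailing-list", "unsubscribe digest patch release announce")
suggester.learn("bills", "invoice payment due account")

# The first entry is the best guess; the other two are the fallback buttons.
print(suggester.suggest("prototype review moved to monday, new deadline"))
```

The point of returning a short ranked list is that the classifier only has to get the right folder somewhere among its top three guesses, which is a much easier target than getting it in first place.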

So today's question is: Hot or not? Would you like SwiftFile-like buttons in your mail client? Write a comment and let me know.

--

Thanks to Fabian Siegel and Bálint Miklós for reviewing a draft of this.

2 comments:

dan said...

I can see how this would be useful ... but the extra bar with the buttons could be unnerving after a while. Add a "Keep in Inbox" button.

Michael Greenberg said...

Yes -- that would be clever and handy. My current solution to the "topic problem" is to have multiple topic-oriented e-mail addresses. I've seen people doing quite advanced things with procmail to make this automatic, but I'm not sure if it's worth it.

Then again, it's only two clicks to label something in GMail.