Tuesday, June 27, 2006

Nestoria Launches

A couple of months ago, I talked about Opencage, a stealth startup in London. Ed Freyfogle, who was my mentor when I interned at Yahoo Germany in 2001 (aaah, the good old days), assembled a team to build a real estate search company.

After what must have been long nights of hard work, Ed and the team have launched Nestoria. Far from the austerity of Craigslist, this site focuses on the user experience: Strong search functionality, maps, and plenty of extra info (pictures of the neighborhood, nearby tube lines, schools, pubs, etc.).

My favorite visual feature is the display of nearby neighborhoods on the map. One click takes you to the hipper, less-expensive neighborhood next door.

Nestoria is very open about sharing data with others. In fact, their approach to building the website reminds me of the "Native to a Web of Data" presentation by Tom Coates of Yahoo at this year's Future of Web Apps summit. They have hackable URLs, an RSS feed for every page, and will even offer direct access to their database to external developers.

They are also helping the open source community by sponsoring – among others – the Mapstraction project, which is trying to pull together all the different mapping APIs (Google, Yahoo, Microsoft) into a common abstraction.

Full disclosure: I had taken on a small consulting role for a part of the system design for Nestoria, but do not own stock in the company.

Saturday, June 24, 2006

Some Random Observations

Some random observations from the last few days:
  1. The Swiss are Going Crazy
  2. Recruiting Like it's 1999
  3. The Shrunken Textbook

The Swiss are Going Crazy (over Soccer)

The Swiss are calm but patriotic people. Since the soccer World Cup started a few weeks ago, there have been Swiss flags everywhere. I bet the current Swiss flag density beats the average US flag density in the Midwest – quite an achievement. All around town, people are wearing red-white Switzerland T-shirts, which convinced even me to wear mine (once).

After the 2:0 win vs. Korea last night, craziness set in. All around Zurich, people were honking horns and screaming "Hopp Schwiiiiz!" (loosely: "Go Switzerland!"). Due to the huge crowds downtown, the super-punctual tram and bus system came to a complete halt. But these people were genuinely having fun.

At Bellevue, some decided to climb on one of the tram stops. Sounds unspectacular (especially considering the hooliganism in other countries), but such behavior is practically unheard of here. Here's a video clip showing the action – sorry for the poor quality.

So much for the cliché of the cool, composed Swiss. When it comes to soccer, they go crazy!

Side note: I've made a bet with some students that if the Swiss win the World Cup, I'll have to add an appendix to my Master's thesis that formally explains the rules of the game. I'm not worried.

Recruiting Like it's 1999

In 1999, I remember walking by a huge poster with a recruiting ad for software engineers at a train station in Germany. Back then I thought "Wow, what a great waste of money!" – seriously, how many of the people who walk by that ad are going to be in the target demographic? I can understand such ads at the Kendall Square T Stop, but not there.

After having almost forgotten this incident, I had a déjà vu last week when I saw a BSI recruiting ad at Zurich's main station.

Do ads like this always coincide with a certain point of a 7-year business cycle? That would give the current boom about 2-3 more years. Or is this all just a ploy to get me to blog about a recruiting ad? Who knows.

The Shrunken Textbook

For my Master's, I've been dealing a lot with natural language processing. Two friends from Google independently recommended reading Foundations of Statistical Natural Language Processing by Manning and Schütze. I went to the ETH library and checked it out. It turned out to be an absolutely fantastic book, so I decided to shell out cold, hard cash and get it from Amazon for $77 plus shipping.

Boy, was I surprised when the book arrived! It was much slimmer than the version I had checked out from the library. But even though 1.6 cm had been shaved off the book, none of the pages were missing. Between the fifth and sixth printing, MIT Press decided to switch to (much) thinner paper.

Thursday, June 01, 2006

E-Mail: What's in a Folder?

My previous post turned out to be very popular: Thanks to a combined digg-and-reddit effect, plus getting quoted in various places, 15000+ people have read it so far. I hope some of the new readers will stick around for more.

There was plenty of feedback and controversy regarding my request for making web apps work offline. While I have neatly organized all the suggestions, I haven't been able to look at many of them thoroughly. Therefore, today's topic is something completely different: e-mail foldering.

When Gmail came out, one of the things I cried loudest for was folders. These days, I still use Mozilla Thunderbird for my main e-mail, and have a wonderful system of folders and search folders. Coincidentally, my Master's thesis is about better organizing users' e-mail, a topic I have written about before.

Almost anyone who uses a desktop mail client organizes mail into folders. I think it's about time to ask: What are folders all about?

Folders group related items. What does "related" mean? What do e-mails in a folder have in common? My grand unified theory of mail folders is as follows. Folders are used to group mails:
  1. by groups of senders or
  2. by topic.
This is pretty obvious: Users are likely to have a "family" folder or a folder for the heavy-volume mailing list they were too lazy to unsubscribe from. Similarly, they may have a folder named "Fluffy Tiger", where they collect e-mails on that super-secret project they're working on.

How do we free the user from this burden and classify all his e-mails automatically?

For folders with e-mails from specific senders, things are pretty easy. For those about a certain topic, this is wicked hard. How do you recognize that certain e-mails belong to the same topic? They're not all from the same people. They may be about the same concept but use completely different words. In addition, folders may be extremely small and not offer very much data for automatic classification.

For these reasons, automatic foldering has accuracy rates of anywhere from 60% to 80%, depending on the classification algorithm and the corpus used for testing. In other words, 20-40% of e-mails get misclassified.

Somewhere on my desk, there is a growing stack of papers. One of these describes SwiftFile (formerly known as MailCat), a system designed at IBM Research in 1999 for Lotus Notes. It shows a method for compensating for the low accuracy of automatic classifiers: Instead of instantly filing an e-mail away, SwiftFile shows buttons with the three most likely folders it may land in. The likelihood that at least one of the three buttons will be the correct folder is 80-90%. For a large majority of e-mails, instead of dragging them to the right folder, all you'll need to do is click a button.
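To make the three-button idea concrete, here is a minimal sketch in Python. It is not SwiftFile's actual algorithm: it assumes a plain bag-of-words profile per folder and ranks folders by cosine similarity. The function names (build_profiles, suggest_folders, top_k_hit_rate) and the sample folders are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_profiles(folders):
    """Turn {folder name: [message texts]} into one word-count profile per folder."""
    return {
        name: Counter(tok for msg in msgs for tok in tokenize(msg))
        for name, msgs in folders.items()
    }


def cosine(a, b):
    """Cosine similarity between two word-count Counters (0.0 if either is empty)."""
    dot = sum(count * b[tok] for tok, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def suggest_folders(message, profiles, k=3):
    """Return the k folder names most similar to the message: the 'buttons'."""
    words = Counter(tokenize(message))
    ranked = sorted(profiles, key=lambda name: cosine(words, profiles[name]), reverse=True)
    return ranked[:k]


def top_k_hit_rate(labeled, profiles, k=3):
    """Fraction of (message, correct folder) pairs whose folder is among the k suggestions."""
    if not labeled:
        return 0.0
    hits = sum(1 for msg, folder in labeled if folder in suggest_folders(msg, profiles, k))
    return hits / len(labeled)


# Hypothetical sample data, only for illustration.
sample_folders = {
    "family": ["dinner with mom and dad on sunday"],
    "fluffy-tiger": ["fluffy tiger project launch schedule"],
    "mailing-list": ["weekly newsletter digest unsubscribe"],
    "travel": ["flight and hotel booking confirmation"],
}
sample_profiles = build_profiles(sample_folders)
buttons = suggest_folders("tiger project schedule update", sample_profiles)
```

Comparing top_k_hit_rate with k=1 against k=3 on your own mail shows how much the extra two buttons help: a classifier can often be wrong about its first guess and still have the right folder among its top three.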

So today's question is: Hot or not? Would you like SwiftFile-like buttons in your mail client? Write a comment and let me know.


Thanks to Fabian Siegel and Bálint Miklós for reviewing a draft of this.