In 1998 I worked for a company that had a contract to process bulk email. The emails tried to get leads to sign up for a trial membership, then reminded them when the trial was about to expire, and the like. This raised some interesting questions about how to send bulk email and get the server to perform. The “best” answer to the problem was quite interesting: qmail on IRIX.
It turns out that most mail systems end up storing information in files on the file system. As an example, sendmail, generally regarded as the standard Mail Transfer Agent (MTA), stores three files on disk for each message in its queue. If you are sending a lot of email, that queue gets very big very fast. Most filesystems at the time used a linear scan to find directory entries, which means the cost of a lookup grows linearly with the size of the directory: with n messages in the queue there are about 3n entries to scan, so a lookup is O(n). IRIX’s filesystem, XFS (which is also available on Linux), uses a B+tree to store directory entries, so a lookup takes O(log n), which is much more efficient.
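To get a feel for the difference, here is a minimal Python sketch that counts how many entries each approach has to look at. It is only an illustration: the file name prefixes (`df`, `qf`, `xf`) and the helper names are made up for the example, and a binary search over a sorted list stands in for the B+tree, since both find an entry in a logarithmic number of steps.

```python
def queue_entries(messages):
    """Build a fake queue directory: three files per message.

    The prefixes are hypothetical names chosen for this sketch.
    """
    return [f"{prefix}{i:07d}"
            for i in range(messages)
            for prefix in ("df", "qf", "xf")]


def linear_lookup(entries, name):
    """Scan an unsorted directory list; return (found, entries examined)."""
    for examined, entry in enumerate(entries, start=1):
        if entry == name:
            return True, examined
    return False, len(entries)


def sorted_lookup(sorted_entries, name):
    """Binary search over sorted names; return (found, probes made).

    This is a stand-in for a tree-structured directory, not a real B+tree.
    """
    lo, hi, probes = 0, len(sorted_entries), 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if sorted_entries[mid] < name:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_entries) and sorted_entries[lo] == name
    return found, probes


if __name__ == "__main__":
    entries = queue_entries(10_000)      # 30,000 directory entries
    target = entries[-1]                 # worst case for the linear scan
    print(linear_lookup(entries, target))
    print(sorted_lookup(sorted(entries), target))
```

With 10,000 queued messages the linear scan has to walk all 30,000 entries to reach the last one, while the sorted lookup needs no more than 15 probes.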
When I came to reinstall my own mail server a few years later, my principal issue was not with mail spools, but with IMAP message stores. Until then I had been using sendmail and the University of Washington IMAP server, which used mail spool files. I had an inbox containing about 2000 messages at the time, and operations on that box were becoming prohibitively slow. I installed the Cyrus IMAP server to replace the UW IMAP server and found that my IMAP performance improved dramatically. I was able to maintain an inbox with around 5000 messages in it before things became problematic again. This was a significant improvement, but I was now running into problems caused by the linear scan on the directory table in the ext2 filesystem.
The Cyrus IMAP server is an interesting beast because it uses hash structures for indexes (excellent) and individual files for message content. This makes it much easier for the server to perform operations on the mailbox. Deleting a message is a good example: with a spool file, everything that follows the deleted message has to be rewritten to close the gap, which is a very expensive operation. For the Cyrus server, this is implemented as a simple file delete.
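The two kinds of delete can be sketched in a few lines of Python. This is a simplified illustration, not how UW IMAP or Cyrus actually implement it: the spool format here is just messages concatenated together, each starting with a line beginning `From `, and all function and file names are invented for the example.

```python
import os
import tempfile


def delete_from_spool(path, index):
    """Remove message `index` from a simplified spool file.

    Returns the number of bytes that had to be rewritten.
    """
    with open(path, "rb") as f:
        data = f.read()
    # Find the byte offset at which each message starts.
    starts = [0] if data.startswith(b"From ") else []
    pos = data.find(b"\nFrom ")
    while pos != -1:
        starts.append(pos + 1)
        pos = data.find(b"\nFrom ", pos + 1)
    starts.append(len(data))
    begin, end = starts[index], starts[index + 1]
    tail = data[end:]
    # Everything after the deleted message is written again, shifted down.
    with open(path, "r+b") as f:
        f.seek(begin)
        f.write(tail)
        f.truncate()
    return len(tail)


def delete_from_message_dir(directory, name):
    """One file per message: deleting a message is a single unlink."""
    os.remove(os.path.join(directory, name))


def demo(count=1000):
    """Delete the first of `count` messages from each kind of store."""
    messages = [f"From user{i}@example.com\nbody {i}\n".encode()
                for i in range(count)]
    with tempfile.TemporaryDirectory() as tmp:
        spool = os.path.join(tmp, "spool")
        with open(spool, "wb") as f:
            f.write(b"".join(messages))
        store = os.path.join(tmp, "store")
        os.mkdir(store)
        for i, message in enumerate(messages):
            with open(os.path.join(store, f"{i}."), "wb") as f:
                f.write(message)
        rewritten = delete_from_spool(spool, 0)
        delete_from_message_dir(store, "0.")
        return rewritten, len(os.listdir(store))
```

Calling `demo()` returns the number of spool bytes that had to be rewritten to drop one message, alongside the number of files left in the per-message store. The spool cost grows with the size of the mailbox; the per-message delete does not, as long as the filesystem can find the directory entry quickly.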
Another IMAP server that can work in a similar manner to Cyrus (and may be easier to administer) is Dovecot. Dovecot allows its users to use either a Maildir (one file per message) structure, or a spool file structure. I haven’t actually worked with Dovecot so I don’t have too much to say about it other than to give it a mention. :)
When I tackled the next reinstall, I installed ReiserFS. ReiserFS has some features for handling lots of very small files which make it very space efficient for that kind of workload. It also uses B+tree directory structures and doesn’t have a fixed inode count, so you can’t run out of files before you run out of disk space. I used Postfix instead of sendmail, which seems to be regarded as having better security than sendmail or qmail (there is contentious debate on this topic). I also continued my use of the Cyrus IMAP server. Cyrus’s one file per message store is particularly potent when coupled with a filesystem like Reiser. I can pick any message from the 11,000 messages in my inbox and view it close to instantaneously. Deletions and moves are similarly quick. I also have the whole system authenticating against an LDAP directory, which makes the setup very manageable, although I haven’t taken any steps to automate management since I don’t have to do a lot of it. I’m confident this is now a very robust and scalable mail server, and it definitely outperforms any Exchange server I’ve had the pleasure of using.
NB: Wikipedia hosts a Comparison of filesystems, which is also worth a read.