Hi All ;)
Is there an option to compact large mbox files from the shell? I did not find anything on Google. I have some very large, constantly updated mbox files and would like to know whether any tool can make them smaller. AFAIK mutt performs such an operation when, for example, an email is deleted, but I am curious whether there are other options.
BR, Rafal.
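One possible reading of "compacting" is dropping messages that a client has flagged as deleted but never purged. The following is a minimal sketch of that idea using Python's standard `mailbox` module, callable from a shell script. The function name `purge_deleted` is made up for illustration, and it assumes the deleted state is recorded as the `D` flag in the `X-Status` header, which not every mail client does.

```python
import mailbox


def purge_deleted(mbox_path):
    """Remove messages flagged 'D' (deleted) and rewrite the mbox file.

    Hypothetical helper: returns the number of messages removed.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        # Advisory lock, so a cooperating delivery agent does not
        # append to the file while it is being rewritten.
        box.lock()
        removed = 0
        # keys() returns a list, so removing entries inside the loop is safe.
        for key in box.keys():
            if "D" in box[key].get_flags():
                box.remove(key)
                removed += 1
        # flush() rewrites the file without the removed messages.
        box.flush()
        return removed
    finally:
        # close() also releases the lock if it is still held.
        box.close()
```

As with mutt, the whole file is rewritten, so this needs free disk space roughly equal to the size of the mbox.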
On 04/06/2014 02:09 PM, Rafał Radecki wrote:
Hi All ;)
Is there an option to compact large mbox files from the shell? I did not find anything in google, I have some very large constantly updated mbox files and would like to know if they can be made smaller with any tool.
rm makes them a lot smaller. gzip not as much, but you can get the content back.
sorry for the noise... :-)
On Sun, 6 Apr 2014 14:09:45 +0200 Rafał Radecki radecki.rafal@gmail.com wrote:
Is there an option to compact large mbox files from the shell?
A couple of filesystems support on-the-fly compression; zfs and btrfs come to mind.
On 2014-04-14, Chris ch2009@arcor.de wrote:
On 04/06/2014 02:09 PM, Rafał Radecki wrote:
I have some very large constantly updated mbox files
I don't know of a tool to compact them, but I would consider converting them to Maildir. They won't need less space, but handling them will be easier.
In the context of the OP, when mutt tries to deal with a message (e.g., deleting, moving to a folder), it can be boatloads faster, since handling the message works on a small file which contains just that message. Deleting a message from an mbox mailbox, for example, requires rewriting the entire changed mbox file to disk (minus the deleted message). Deleting a message from a Maildir mailbox is just removing one file from a directory.
--keith
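The mbox-to-Maildir conversion suggested above can be sketched with Python's standard `mailbox` module. This is only an illustration, not a tool anyone in the thread mentioned: the function name `mbox_to_maildir` is made up, and the sketch leaves the source mbox untouched so the result can be checked before anything is deleted.

```python
import mailbox


def mbox_to_maildir(mbox_path, maildir_path):
    """Copy every message from an mbox file into a Maildir; return the count.

    Hypothetical helper. The Maildir is created if it does not exist.
    """
    src = mailbox.mbox(mbox_path, create=False)
    dst = mailbox.Maildir(maildir_path, create=True)
    copied = 0
    try:
        # Advisory lock while reading, in case the mbox is still being delivered to.
        src.lock()
        for message in src:
            # Converting to MaildirMessage carries over flags such as
            # read, replied and flagged from the mbox status headers.
            dst.add(mailbox.MaildirMessage(message))
            copied += 1
    finally:
        # close() also releases the lock if it is still held.
        src.close()
    return copied
```

After the conversion each message is one file under `cur/` or `new/`, which is what makes the single-message delete described above cheap.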
On Apr 13, 2014, at 10:25 PM, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
In the context of the OP, when mutt tries to deal with a message (e.g., deleting, moving to a folder), it can be boatloads faster, since handling the message works on a small file which contains just that message. Deleting a message from an mbox mailbox, for example, requires rewriting the entire changed mbox file to disk (minus the deleted message). Deleting a message from a Maildir mailbox is just removing one file from a directory.
HOWEVER. When a directory grows too large, the OS can take a long time to seek through the directory, which can cause its own set of problems. And this makes cleaning out a maildir directory selectively a real pain. Maildir really could do with a hashing mechanism.
--Russell
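To make the "hashing mechanism" idea concrete, here is a minimal sketch of how message files could be spread over hashed subdirectories. This layout is purely hypothetical: it is not part of the Maildir format, standard Maildir clients would not understand it, and the name `bucket_for` is invented for illustration.

```python
import hashlib
import os


def bucket_for(filename, levels=1):
    """Map a Maildir-style filename to a relative bucket path such as 'a3'.

    Only the unique part before ':' is hashed, so a message stays in the
    same bucket when its flags (the ':2,...' suffix) change.
    """
    unique = filename.split(":", 1)[0]
    digest = hashlib.sha1(unique.encode("utf-8")).hexdigest()
    # Each level uses two hex characters, giving 256 buckets per level.
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(*parts)
```

With one level, a million messages would average roughly 4,000 files per directory instead of a million in one.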
On 4/13/2014 10:41 PM, Russell Miller wrote:
HOWEVER. When a directory grows too large, the OS can take a long time to seek through the directory, which can cause its own set of problems. And this makes cleaning out a maildir directory selectively a real pain. Maildir really could do with a hashing mechanism.
some file systems are better at this than others... like, xfs does quite well with 1000s of small files in a directory.
I wonder what thunderbird uses? I have 12000 messages in my 'centos' folder, 24720 in another folder, yet it seems quite snappy to find and delete individual messages.
On Sun, Apr 13, 2014 at 10:41:14PM -0700, Russell Miller wrote:
On Apr 13, 2014, at 10:25 PM, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
In the context of the OP, when mutt tries to deal with a message (e.g., deleting, moving to a folder), it can be boatloads faster, since handling the message works on a small file which contains just that message. Deleting a message from an mbox mailbox, for example, requires rewriting the entire changed mbox file to disk (minus the deleted message). Deleting a message from a Maildir mailbox is just removing one file from a directory.
Time spent with mutt searching a directory can be drastically cut by using caching. See my old page, http://home.roadrunner.com/~computertaijutsu/mutt.html#IMAP
Even if not using IMAP, using a $HOME/.mutt_cache can greatly speed things up.
On Sun, Apr 13, 2014, Russell Miller wrote:
On Apr 13, 2014, at 10:25 PM, Keith Keller kkeller@wombat.san-francisco.ca.us wrote:
In the context of the OP, when mutt tries to deal with a message (e.g., deleting, moving to a folder), it can be boatloads faster, since handling the message works on a small file which contains just that message. Deleting a message from an mbox mailbox, for example, requires rewriting the entire changed mbox file to disk (minus the deleted message). Deleting a message from a Maildir mailbox is just removing one file from a directory.
HOWEVER. When a directory grows too large, the OS can take a long time to seek through the directory, which can cause its own set of problems. And this makes cleaning out a maildir directory selectively a real pain. Maildir really could do with a hashing mechanism.
We have been using Maildir with courier-imap for decades, and haven't had an issue with this. My security folder typically holds 25,000+ messages from the last 7 days, and accessing it either with IMAP or directly with mutt isn't a problem.
I have written various scripts over the years to convert from various mail storage formats, ranging from SCO's horrible ctrl-A-delimited format through U.W. IMAP, and ones that query other IMAP servers to convert their folder structures to local Maildir.
Maildir is generally very easy to handle with standard *nix command line tools. We have moved mail servers for some regional ISPs with tens of thousands of email customers by rsync'ing from the old server to the new one to get the bulk of the mail across before cutting over to the new machine. Then we shut the old server down, change the DNS to point to the new one, and finally do a new rsync --delete to update the new machine. There's a period where some deleted messages may reappear in the client's email before the rsync is complete, but all new messages appear immediately.
Bill
We have been using Maildir with courier-imap for decades, and haven't had an issue with this. My security folder typically holds 25,000+ messages from the last 7 days, and accessing it either with IMAP or directly with mutt isn't a problem.
I have written various scripts over the years to convert from various mail storage formats, ranging from SCO's horrible ctrl-A-delimited format through U.W. IMAP, and ones that query other IMAP servers to convert their folder structures to local Maildir.
Maildir is generally very easy to handle with standard *nix command line tools.
As some have noted, modern filesystems are better at this than ones such as ext2. However, even in the best of cases, there are still situations where maildirs with a lot of messages are awkward to handle, specifically when you're trying to find messages based on criteria that are not easily discernible from the inode, for example, messages with attachments. The awkwardness comes from the limit on the total size of a command's arguments: with enough files, expanding * fails with "Argument list too long". You have to use a bit more script-fu, such as find.
Even if there aren't huge issues with doing this, it's an easily fixed thing. Allowing directories to have hundreds of thousands of entries as a matter of course, even if it's something that causes no issues in many cases, to me is an architectural issue.
But then, I noticed my beard is starting to turn grey the other day, so maybe I should just get out the COBOL and tell everyone how we did it when I was a kid.
--Russell
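The attachment search mentioned above can be done without any shell glob by streaming the directory entries. The following is a minimal sketch using only the Python standard library; the function name `messages_with_attachments` is invented here, and "has an attachment" is assumed to mean "has a MIME part with Content-Disposition: attachment", which misses attachments sent inline.

```python
import email
import os


def messages_with_attachments(maildir_path):
    """Yield paths of messages in cur/ and new/ that carry an attachment part."""
    for sub in ("cur", "new"):
        folder = os.path.join(maildir_path, sub)
        if not os.path.isdir(folder):
            continue
        # scandir() streams entries one by one, so no argument list is
        # ever built, no matter how many files the directory holds.
        with os.scandir(folder) as entries:
            for entry in entries:
                if not entry.is_file():
                    continue
                with open(entry.path, "rb") as fh:
                    msg = email.message_from_binary_file(fh)
                if any(part.get_content_disposition() == "attachment"
                       for part in msg.walk()):
                    yield entry.path
```

Every message still has to be opened and parsed, so this removes the argument-size problem but not the cost of reading the mail.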
On Mon, Apr 14, 2014 at 9:49 PM, Russell Miller duskglow@gmail.com wrote:
Even if there aren't huge issues with doing this, it's an easily fixed thing. Allowing directories to have hundreds of thousands of entries as a matter of course, even if it's something that causes no issues in many cases, to me is an architectural issue.
Even if modern systems sort-of handle it, it still seems like a bad thing to do when you consider that opening a file for writing has to atomically decide whether that name already exists before creating it - so other concurrent create/delete operations have to be blocked.
On 4/14/2014 9:51 PM, Les Mikesell wrote:
Even if modern systems sort-of handle it, it still seems like a bad thing to do when you consider that opening a file for writing has to atomically decide whether that name already exists before creating it - so other concurrent create/delete operations have to be blocked.
the better file systems (xfs, zfs, ntfs at least) use a b-tree directory structure, so finding a filename out of 10s of 1000s is very little overhead.
On 2014-04-15, Russell Miller duskglow@gmail.com wrote:
As some have noted, modern filesystems are better at this than ones such as ext2. However, even in the best of cases, there are still situations where maildirs with a lot of messages are awkward to handle. Specifically, if you're trying to find specific messages based on criteria that are not easily discernible from the inode, for example, things with attachments.
This will be bad with an mbox mailbox too. Actually it'll be worse, because it'll be harder to tell which message a grep hit belongs to.
--keith
On 04/14/2014 01:41 AM, Russell Miller wrote:
HOWEVER. When a directory grows too large, the OS can take a long time to seek through the directory, which can cause its own set of problems. And this makes cleaning out a maildir directory selectively a real pain. Maildir really could do with a hashing mechanism.
Worse, if the dir gets too big, even after files are deleted it can be very slow. I had one case with >1,000,000 messages in a single maildir (spam on steroids, was getting 80,000 messages per hour overnight); after it was cleaned out to <1,000 messages it still took several minutes to ls the dir, and the machine's responsiveness went through the floor. Copying to a new dir and renaming fixed the slowdown; the directory was >50MB (the directory itself, not its contents).
I'd rather have mbox for plain text e-mail storage, and a database for something really high performance.
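The directory-bloat effect described above (a directory that stays large after its files are deleted) can be checked and worked around from a script. The following is a rough sketch; the function names are made up, it moves the entries rather than copying them as the poster did, and it assumes nothing else is writing to the directory while it runs, so mail delivery would need to be stopped first.

```python
import os
import shutil


def directory_own_size(path):
    """Return the size in bytes of the directory itself, not of its contents."""
    return os.stat(path).st_size


def rebuild_directory(path):
    """Move all entries into a fresh directory, then swap it into place.

    Hypothetical helper: '<path>.rebuild' is a temporary name chosen here
    and must not already exist.
    """
    fresh = path + ".rebuild"
    os.mkdir(fresh)
    shutil.copystat(path, fresh)  # keep permission bits and timestamps
    # List the names first so the directory is not modified while being read.
    for name in os.listdir(path):
        os.rename(os.path.join(path, name), os.path.join(fresh, name))
    os.rmdir(path)
    os.rename(fresh, path)
```

`shutil.copystat` does not copy ownership, so on a real mail spool the owner and group of the new directory would have to be restored separately.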