On Mon, Apr 9, 2012 at 1:09 PM, James B. Byrne byrnejb@harte-lyne.ca wrote:
CentOS-6.2
I am investigating how to split long lines present in Mailman-generated HTML archives. Mailman places the email bodies within <pre></pre> tags, and some users have MUAs that send entire paragraphs as one long line.
Such users are usually tough customers, too. "flowed text" is the way they assert their personalities, I think.
I have looked at fmt and fold but these assume a pipeline from stdout to a fixed filename, which presumably is best done at the time of the original file's creation. I am looking for a way to deal with multiple existing files in a batch fashion so that the reformatted file is written back out to the same file name in the same location.
It is very rare to see a Unix utility that operates "in place" like this. Offhand, I can't think of any.
I cannot seem to hit upon a way to get this to work using find, xargs and fmt (or fold). Nor can I seem to find an example of how this might be done using these utilities.
What I would like to discover is the functional equivalent of this:
find /path/to/archives/*.html -print | xargs -I {} fmt -s {} > {}
This syntax does not work of course because the xargs file name substitution only occurs once in the initial argument list of the following command. But, this example does describe the effect I wish to obtain, to have the original file name receive the reformatted contents.
Assuming that the fmt utility does what you want, then you will need a stanza something like this:
fmt -flagswhatever FILENAME >/tmp/mytemp
mv /tmp/mytemp FILENAME
In other words you need a script, not a single pipe. You want fmt to operate on one file at a time.
find somedir -name "*.html" >/tmp/htmlstuff
for FILENAME in `cat /tmp/htmlstuff`
do
    fmt (flags) $FILENAME >/tmp/foo
    mv /tmp/foo $FILENAME
done
That's not robust but is just for concept. More robust scripts would use "read" to get filenames, and would worry about embedded blanks in filenames, and other niceties. A real script would use mktemp to generate a temp filename.
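For what it's worth, here is a rough sketch of what that more careful version might look like. It reads the filenames with "read", quotes them so embedded blanks survive, and gets its temp file from mktemp. The function name reformat_in_place and the choice of "fmt -s" are only assumptions for illustration; substitute whatever flags you settle on. It copies the result back with cat rather than mv so the original file keeps its permissions and ownership, which is a design choice, not a requirement. Filenames containing newlines are still not handled.

```shell
#!/bin/sh
# Sketch only: reformat every *.html file under a directory and write the
# result back to the same file name. The function name and the "-s" flag
# are illustrative assumptions, not a tested recipe for Mailman archives.
reformat_in_place() {
    dir=$1
    # "read -r" with an empty IFS keeps blanks and backslashes in names intact.
    find "$dir" -type f -name '*.html' -print | while IFS= read -r file
    do
        # mktemp picks a unique temp file name, so parallel runs do not collide.
        tmp=$(mktemp) || continue
        # Only overwrite the original if fmt succeeded.
        if fmt -s "$file" >"$tmp"
        then
            # cat into the existing file keeps its mode and owner;
            # "mv" would leave it with the temp file's restrictive mode.
            cat "$tmp" >"$file"
        fi
        rm -f "$tmp"
    done
}
```

You would call it as "reformat_in_place /path/to/archives", preferably on a copy of the archive first.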
fmt(1) is not robust, either. It will format the whole file with a single-minded determination. This includes mail headers, attachments, blah blah. It might even break the HTML. There are many unexpected consequences.
My advice is to not format these mails. Why do you want to? Perhaps there is a workaround that meets your goals.
Dave