[CentOS] sendmail and rbl blocking - generating statistics

Wed Mar 14 22:16:33 UTC 2007
Will McDonald <wmcdonald at gmail.com>

On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
> On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
> > On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
> >> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
> >
> > You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
> >
> > # grep -c 'check_relay.*spamhaus' /var/log/maillog
> > # grep -c 'check_relay.*spamcop' /var/log/maillog
> > # grep -c 'check_relay.*njabl' /var/log/maillog
> >
> > That would probably be more efficient and faster; you can test with 'time' to verify
> > this. You're spawning one process, 'grep', instead of three separate processes:
> > 'cat', 'grep' and 'grep' again.
>
> Am I using time right to measure it?

Yep.

> # time cat /var/log/maillog | grep check_relay | grep -c njabl
> 8
>
> real    0m0.299s
> user    0m0.289s
> sys     0m0.009s
>
> # time grep -c 'check_relay.*njabl' /var/log/maillog
> 8
>
> real    0m0.404s
> user    0m0.402s
> sys     0m0.000s
>
> Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?

It should be the time taken for the command line to execute.
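
A quick way to convince yourself, as a rough sketch: this assumes bash (where
'time' is a shell keyword that covers the whole pipeline) and a 'sleep' that
accepts fractional seconds, as GNU sleep does. The two sleeps are just
stand-ins for 'cat' and 'grep'.

```shell
# Assumes bash: 'time' is a keyword there and times the entire pipeline.
# TIMEFORMAT=%R makes bash print only the 'real' (wall clock) figure.
# The sleeps are hypothetical stand-ins for the cat and grep stages.
elapsed=$(LC_ALL=C bash -c 'TIMEFORMAT=%R; { time sleep 0.1 | sleep 0.4; } 2>&1')
echo "real: ${elapsed}s"
```

If 'time' only covered the first command, the figure would be near 0.1s;
instead it should land near the longer of the two sleeps.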

> I also tried this:
> time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
> 8
>
> real    0m0.325s
> user    0m0.312s
> sys     0m0.012s
>
> time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
> 8
>
> real    0m0.411s
> user    0m0.408s
> sys     0m0.002s
>
> I ran these several times, mixed back and forth, to try and see if they were flukes;
> these numbers appear to be representative of the average. What do you get on your
> system? Maybe passing the file name to grep gets faster as the file size increases?
>
> wc /var/log/maillog
>   12323  142894 1588860 /var/log/maillog
>
> I wonder if the issue here is actually the 'stuff*morestuff' as that might be a more
> expensive match:

I think you're correct, the '.*' regexp wildcard is the slower part. I've
done similar cat/grep/awk tests myself and in *some* cases using awk's
pattern matching '/foo/ { awkstuff }' has been quicker than grep, so
it's always worth running the numbers a couple of times to see what's
most effective for a given/typical dataset.
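
Something along these lines would let you compare the three approaches on
the same data. It's only a sketch: the log lines are made-up, simplified
stand-ins for real maillog entries, written to a temp file so nothing
touches /var/log/maillog. Prefix each counting command with 'time' to get
the numbers for your own box.

```shell
# Invented sample data standing in for /var/log/maillog.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 14 10:00:01 mx sendmail[101]: ruleset=check_relay, arg1=a.example, reject=blocked by njabl
Mar 14 10:00:02 mx sendmail[102]: ruleset=check_relay, arg1=b.example, reject=blocked by spamhaus
Mar 14 10:00:03 mx sendmail[103]: ruleset=check_mail, arg1=c.example, reject=blocked by njabl
Mar 14 10:00:04 mx sendmail[104]: ruleset=check_relay, arg1=d.example, reject=blocked by njabl
EOF

# Two chained greps, no wildcard (and no cat).
count_chained=$(grep check_relay "$log" | grep -c njabl)

# One grep using the '.*' wildcard.
count_regex=$(grep -c 'check_relay.*njabl' "$log")

# awk pattern matching; 'n + 0' prints 0 rather than an empty string.
count_awk=$(awk '/check_relay.*njabl/ { n++ } END { print n + 0 }' "$log")

echo "chained=$count_chained regex=$count_regex awk=$count_awk"
rm -f "$log"
```

All three should agree on the count; which one wins on time will depend on
the file and the pattern.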

The removal of the redundant cat still stands though. There really is
no conceivable benefit to forking that additional process. I don't
think, anyway. :)

And of course, when you start to loop through running

for i in `list of stuff`
do
  grep blah | grep -c snee
done

for example, depending on the number of iterations through the loop
it's worth thinking about how you're doing stuff. There is an element
of premature optimisation here, mind: if something's working on a box
that's NOT heavily loaded then don't sweat it.
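
If the loop ever did become a problem, one option is a single pass with awk
instead of one grep pipeline per list. A rough sketch, again using invented
sample lines in a temp file rather than the real maillog; the list names are
the three from earlier in the thread.

```shell
# Invented sample data standing in for /var/log/maillog.
log=$(mktemp)
cat > "$log" <<'EOF'
Mar 14 10:00:01 mx sendmail[101]: ruleset=check_relay, arg1=a.example, reject=blocked by njabl
Mar 14 10:00:02 mx sendmail[102]: ruleset=check_relay, arg1=b.example, reject=blocked by spamhaus
Mar 14 10:00:03 mx sendmail[103]: ruleset=check_mail, arg1=c.example, reject=blocked by njabl
Mar 14 10:00:04 mx sendmail[104]: ruleset=check_relay, arg1=d.example, reject=blocked by njabl
Mar 14 10:00:05 mx sendmail[105]: ruleset=check_relay, arg1=e.example, reject=blocked by spamcop
EOF

# Loop version: the file is read once per list.
looped=$(for rbl in spamhaus spamcop njabl; do
  printf '%s=%s\n' "$rbl" "$(grep check_relay "$log" | grep -c "$rbl")"
done)

# Single pass: awk reads the file once and keeps one counter per list.
single=$(awk '
  /check_relay/ {
    if ($0 ~ /spamhaus/) n["spamhaus"]++
    if ($0 ~ /spamcop/)  n["spamcop"]++
    if ($0 ~ /njabl/)    n["njabl"]++
  }
  END { printf "spamhaus=%d spamcop=%d njabl=%d\n", n["spamhaus"], n["spamcop"], n["njabl"] }
' "$log")

echo "$looped"
echo "$single"
rm -f "$log"
```

Both give the same counts; the awk version just avoids re-reading the file
and forking two greps for every list.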

Will.