[CentOS] sendmail and rbl blocking - generating statistics
Ryan Simpkins
centos at ryansimpkins.com
Thu Mar 15 19:56:12 UTC 2007
On Wed, March 14, 2007 16:16, Will McDonald wrote:
> On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
>> On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
>> > On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
>> >> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
>> >
>> > You can avoid the unnecessary 'cat' by just passing the filename to grep
>> > directly:
>> >
>> > # grep -c 'check_relay.*spamhaus' /var/log/maillog
>> > # grep -c 'check_relay.*spamcop' /var/log/maillog
>> > # grep -c 'check_relay.*njabl' /var/log/maillog
>> >
>> > Would probably be more efficient and faster, you can test with 'time' to verify
>> > this. You're spawning one process, 'grep', instead of three separate processes,
>> > 'cat', 'grep' and 'grep' again.
>>
>> Am I using time right to measure it?
I see from other posts I wasn't using it right. So I re-wrote and tested again on
the same system, about the same log size:
##########################
$ cat timetest1
#!/bin/bash
for x in `seq 1 3000`; do
cat /var/log/maillog | grep check_relay | grep -c njabl > /dev/null
done
$ time ./timetest1
real 0m36.685s
user 0m12.505s
sys 0m24.136s
##########################
$ cat timetest2
#!/bin/bash
for x in `seq 1 3000`; do
grep -c 'check_relay.*njabl' /var/log/maillog > /dev/null
done
$ time ./timetest2
real 2m57.914s
user 2m50.574s
sys 0m7.134s
##########################
$ cat timetest3
#!/bin/bash
for x in `seq 1 3000`; do
grep -c njabl /var/log/maillog > /dev/null
done
$ time ./timetest3
real 0m13.331s
user 0m6.895s
sys 0m6.429s
##########################
$ cat timetest4
#!/bin/bash
for x in `seq 1 3000`; do
cat /var/log/maillog | grep -c njabl > /dev/null
done
$ time ./timetest4
real 0m28.442s
user 0m9.520s
sys 0m18.905s
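The four scripts differ only in the command inside the loop, so the loop can be factored out. This is a rough sketch, not what was run for the numbers above; `repeat_n`, `with_cat` and `without_cat` are names made up for illustration.

```shell
#!/bin/bash
# repeat_n N CMD [ARGS...]: run CMD N times, discarding its output.
# (Hypothetical helper; n and i are plain globals to keep it short.)
repeat_n() {
    n=$1
    shift
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" > /dev/null
        i=$((i + 1))
    done
}

# The two variants being compared, each taking the log file as $1.
with_cat()    { cat "$1" | grep -c njabl; }
without_cat() { grep -c njabl "$1"; }

# Example use from a bash prompt:
#   time repeat_n 3000 with_cat /var/log/maillog
#   time repeat_n 3000 without_cat /var/log/maillog
```

Calling a shell function in a loop still forks the same cat and grep processes each time, so the comparison should stay fair.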
I think this proves the original poster right on his main point: getting rid of the
cat speeds things up quite a bit (about 13 seconds versus 28 for the single grep).
However, it could be argued that it only matters if you are running quite a few in a
row, in this case 3000. It also suggests that a 'pattern.*pattern' regex is not a
good idea at all (at least not with this grep): timetest2 took nearly three minutes,
against about 37 seconds for the chained greps in timetest1.
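For anyone who wants to retest the regex case, here is a small sketch of the two approaches as functions. The `LC_ALL=C` setting is my own guess at something worth trying, on the assumption that multibyte locale handling is part of what makes `.*` slow in some grep builds; nothing in the timings above confirms that. The function names are invented for this example.

```shell
#!/bin/bash
# count_chained RBL FILE: two fixed-string greps chained together.
# Counts lines containing both strings, in any order.
count_chained() {
    grep -F check_relay "$2" | grep -F -c "$1"
}

# count_regex RBL FILE: one grep with a single regex.
# Requires check_relay to appear before the RBL name on the line.
# LC_ALL=C forces byte-wise matching (assumption: this may help speed).
count_regex() {
    LC_ALL=C grep -c "check_relay.*$1" "$2"
}

# Example use:
#   count_chained njabl /var/log/maillog
#   count_regex njabl /var/log/maillog
```

The two only give the same count when check_relay always comes before the RBL name on a line, so it is worth comparing their output on a real log before trusting either.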
One poster also argued on ease of coding. I typically code like this (my thoughts
are inside the '*'):
cat file | less; *yes, that is the right data, and I see the pattern I wanna match*
cat file | grep pattern | less; *ahh, mistake*
cat file | grep pattern2 | less; *yes, that is right, but still need to reduce*
cat file | grep pattern2 | grep pattern3 | less; *yes, that is looking about right*
The alternate method?
less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep 'pattern2.*pattern3' file | less; *Yes, that is right*
What I don't like about the alternate method is that the file name changes position
between the first two lines. Also, the pattern comes before the file in the first
grep, making it harder to adjust the pattern (which some of us need to do quite a
lot). It makes more sense to me to just add a | on the end and keep going.
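One way to keep the "file first, then keep adding to the end" habit without cat is to put the input redirection at the front of the line, which the shell allows. A minimal sketch, with `narrow` as a made-up wrapper so it can be tried on any file:

```shell
#!/bin/bash
# Interactive form; the file name stays at the front of every line:
#   < file less
#   < file grep pattern2 | less
#   < file grep pattern2 | grep pattern3 | less

# narrow FILE PATTERN1 PATTERN2: same idea wrapped in a function.
# (Hypothetical helper, just to make the sketch testable.)
narrow() {
    < "$1" grep "$2" | grep -c "$3"
}
```

The first grep reads the file directly on its standard input, so no extra cat process is started, and the pattern still sits at the end of the line where it is easy to edit.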
Further, for me, it is easier to reduce data by stringing greps together rather than
come up with the regex-fu to do it all in one pattern. Maybe if I were better at
regex...
However, I 100% agree that stringing |s together produces inefficient code more
often. I think it is wise to go back and find efficiencies when needed.
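As an example of going back for efficiency: the three per-RBL counts from earlier in the thread could be collected in one pass over the log instead of three. This is only a sketch using awk; `rbl_stats` is an invented name, and it simply assumes the RBL names appear as plain substrings on check_relay lines.

```shell
#!/bin/bash
# rbl_stats FILE: count check_relay lines per RBL in a single pass.
rbl_stats() {
    awk '
        /check_relay/ {
            # A bare /regex/ in awk tests the whole current line.
            if (/spamhaus/) spamhaus++
            if (/spamcop/)  spamcop++
            if (/njabl/)    njabl++
        }
        END {
            printf "spamhaus %d\nspamcop %d\nnjabl %d\n", spamhaus, spamcop, njabl
        }
    ' "$1"
}

# Example use:
#   rbl_stats /var/log/maillog
```

I have not timed this against the grep versions, so whether it is actually faster on a given log is something to measure rather than assume.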
-Ryan