Re: [CentOS] sendmail and rbl blocking - generating statistics - Discuss


Ryan Simpkins

14 Mar 2007

8:35 p.m.

On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):

...

On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'

You can avoid the unnecessary 'cat' by just passing the filename to grep directly:

# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog

Would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.
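A minimal sketch of the per-blocklist counts suggested above, with grep reading the file directly. The sample log lines and the helper name count_rbl are made up for illustration; real sendmail entries carry more fields, and on a live system the file would be /var/log/maillog.

```shell
#!/bin/bash
# Tiny stand-in for /var/log/maillog (made-up, simplified lines).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
Mar 14 12:00:01 mail sendmail[101]: ruleset=check_relay, reject=550 listed at sbl-xbl.spamhaus.org
Mar 14 12:00:02 mail sendmail[102]: ruleset=check_relay, reject=550 listed at bl.spamcop.net
Mar 14 12:00:03 mail sendmail[103]: ruleset=check_relay, reject=550 listed at sbl-xbl.spamhaus.org
Mar 14 12:00:04 mail sendmail[104]: ruleset=check_mail, reject=553 Domain does not exist
Mar 14 12:00:05 mail sendmail[105]: ruleset=check_relay, reject=550 listed at combined.njabl.org
EOF

# count_rbl NAME FILE (hypothetical helper): count check_relay lines
# that mention blocklist NAME. grep opens the file itself, no cat needed.
count_rbl() {
    grep -c "check_relay.*$1" "$2"
}

for rbl in spamhaus spamcop njabl; do
    printf '%s %s\n' "$rbl" "$(count_rbl "$rbl" "$log")"
done
```

Note that grep is case-sensitive by default, so the pattern has to match the log's spelling of check_relay exactly.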

Am I using time right to measure it?

# time cat /var/log/maillog | grep check_relay | grep -c njabl
8

real 0m0.299s
user 0m0.289s
sys 0m0.009s

# time grep -c 'check_relay.*njabl' /var/log/maillog
8

real 0m0.404s
user 0m0.402s
sys 0m0.000s

Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?

I also tried this:

time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8

real 0m0.325s
user 0m0.312s
sys 0m0.012s

time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8

real 0m0.411s
user 0m0.408s
sys 0m0.002s

I ran these several times, mixed back and forth, to check whether they were flukes; these numbers appear to be representative of the average. What do you get on your system? Maybe passing the file name to grep gets faster as the file size increases?

wc /var/log/maillog
12323 142894 1588860 /var/log/maillog

I wonder if the issue here is actually the 'stuff*morestuff' as that might be a more expensive match:

time echo `grep -c 'check_relay' /var/log/maillog | grep njabl`

real 0m0.269s
user 0m0.263s
sys 0m0.006s

-Ryan


John Summerfield

14 Mar

10:15 p.m.

New subject: sendmail and rbl blocking - generating statistics

Ryan Simpkins wrote:

...

On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):

...
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'

You can avoid the unnecessary 'cat' by just passing the filename to grep directly:

# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog

Would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.

Am I using time right to measure it?

No, you're timing the cat only.

...

# time cat /var/log/maillog | grep check_relay | grep -c njabl
8

real 0m0.299s
user 0m0.289s
sys 0m0.009s

Too short for useful measurement with these tools.

-- Cheers John -- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu Please do not reply off-list

Will McDonald

10:35 p.m.


On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:

...

Ryan Simpkins wrote:

...
Am I using time right to measure it?

No, you're timing the cat only.

I don't think that's the case, you know. If I run the following:

[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null

real 0m0.018s
user 0m0.001s
sys 0m0.017s

[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null

real 0m0.047s
user 0m0.021s
sys 0m0.026s

Running both commands repeatedly shows similar time differences, so I think 'time' is timing the execution of the whole command.

Will.

chrism＠imntv.com

10:51 p.m.


You could always try using pflogsumm and season it to taste.

Cheers,

John Summerfield

10:56 p.m.


Will McDonald wrote:

...

On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:

...
Ryan Simpkins wrote:

...
Am I using time right to measure it?

No, you're timing the cat only.

I don't think that's the case, you know. If I run the following:

[summer@bilby ~]$ time sleep 10s;sleep 10s

real 0m10.011s
user 0m0.000s
sys 0m0.003s
[summer@bilby ~]$ time sleep 10s|sleep 10s

real 0m10.002s
user 0m0.001s
sys 0m0.003s
[summer@bilby ~]$ time sleep 10s | time sleep 10s
0.00user 0.00system 0:09.99elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+150minor)pagefaults 0swaps

real 0m10.011s
user 0m0.000s
sys 0m0.006s
[summer@bilby ~]$

...

[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null

real 0m0.018s
user 0m0.001s
sys 0m0.017s

[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null

real 0m0.047s
user 0m0.021s
sys 0m0.026s

Running both commands repeatedly shows similar time differences, so I think 'time' is timing the execution of the whole command.

I think that writing to a pipe is more expensive than writing to /dev/null. Needs buffering etc.

Try

time cat /tmp/messages.1 | grep \ | grep '*.foo' 1> /dev/null

-- Cheers John -- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu Please do not reply off-list

Will McDonald

11:33 p.m.


On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:

...

Will McDonald wrote:

...
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:

...
Ryan Simpkins wrote:

...
Am I using time right to measure it?

No, you're timing the cat only.

I don't think that's the case, you know. If I run the following:

[summer@bilby ~]$ time sleep 10s;sleep 10s

real 0m10.011s
user 0m0.000s
sys 0m0.003s
[summer@bilby ~]$ time sleep 10s|sleep 10s

real 0m10.002s
user 0m0.001s
sys 0m0.003s
[summer@bilby ~]$ time sleep 10s | time sleep 10s
0.00user 0.00system 0:09.99elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+150minor)pagefaults 0swaps

real 0m10.011s
user 0m0.000s
sys 0m0.006s
[summer@bilby ~]$

I sit corrected, thanks John. A subshell appears to show the expected behaviour...

[wmcdonald@stella ~]$ time $(sleep 10s; sleep 10s)

real 0m20.011s
user 0m0.001s
sys 0m0.006s
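One way to check what is actually being timed, assuming bash, where 'time' is a reserved word that applies to a whole pipeline but stops at a ';'. This is only a sketch: the sleeps are shortened to one second and the variable names are illustrative.

```shell
#!/bin/bash
TIMEFORMAT='%R'   # report elapsed wall-clock seconds only

# ';' ends the timed pipeline: only the first sleep is measured (about 1s),
# although the second sleep still runs afterwards.
list_time=$( { time sleep 1; sleep 1; } 2>&1 )

# A pipeline is timed as a whole: 'true' exits at once, yet about 1s is
# reported because the timer waits for the last stage as well.
pipe_time=$( { time true | sleep 1; } 2>&1 )

# Braces turn the list into one compound command, so both sleeps count
# (about 2s).
group_time=$( { time { sleep 1; sleep 1; }; } 2>&1 )

printf 'list=%ss pipeline=%ss group=%ss\n' "$list_time" "$pipe_time" "$group_time"
```

Under that reading, 'time sleep 10s|sleep 10s' reports about 10 seconds because both sleeps run concurrently, not because only the first one is timed. Also, $( ... ) is command substitution; 'time { ...; }' or 'time ( ... )' groups the commands without the shell trying to run their output afterwards.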

Will.

William L. Maltby

17 Mar

7:53 p.m.


On Wed, 2007-03-14 at 22:35 +0000, Will McDonald wrote:

...

On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:

...
Ryan Simpkins wrote:

...
Am I using time right to measure it?

No, you're timing the cat only.

Correct.

...

I don't think that's the case, you know. If I run the following:

[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null

real 0m0.018s
user 0m0.001s
sys 0m0.017s

[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null

real 0m0.047s
user 0m0.021s
sys 0m0.026s

Running both commands repeatedly shows similar time differences, so I think 'time' is timing the execution of the whole command.

I'm unsure, but I think that's wrong?

Try time { cat /tmp/messages.1 | grep '*.foo' 1> /dev/null ; }

This avoids certain overheads and does the job OP really wants?

...

Will.

<snip sig stuff>

-- Bill

Will McDonald

14 Mar

10:16 p.m.


On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...

On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):

...
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'

You can avoid the unnecessary 'cat' by just passing the filename to grep directly:

# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog

Would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.

Am I using time right to measure it?

Yep.

...

# time cat /var/log/maillog | grep check_relay | grep -c njabl
8

real 0m0.299s
user 0m0.289s
sys 0m0.009s

# time grep -c 'check_relay.*njabl' /var/log/maillog
8

real 0m0.404s
user 0m0.402s
sys 0m0.000s

Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?

It should be the time taken for the command line to execute.

...

I also tried this:

time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8

real 0m0.325s
user 0m0.312s
sys 0m0.012s

time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8

real 0m0.411s
user 0m0.408s
sys 0m0.002s

I ran these several times, mixed back and forth, to check whether they were flukes; these numbers appear to be representative of the average. What do you get on your system? Maybe passing the file name to grep gets faster as the file size increases?

wc /var/log/maillog
12323 142894 1588860 /var/log/maillog

I wonder if the issue here is actually the 'stuff*morestuff' as that might be a more expensive match:

I think you're correct, that regexp wildcard is slower. I've done similar cat/grep/awk tests myself and in *some* cases using awk's pattern matching '/foo/ { awkstuff }' has been quicker than grep so it's always worth running the numbers a couple of times to see what's most effective for a given/typical dataset.

The removal of the redundant cat still stands though. There really is no conceivable benefit to forking that additional process. I don't think, anyway. :)

And of course, when you start to loop through running

for i in `list of stuff`
do
grep blah | grep -c snee
done

for example, depending on the number of iterations through the loop it's worth thinking about how you're doing stuff. There is an element of premature optimisation, mind: if something's working on a box that's NOT heavily loaded then don't sweat it.
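A sketch of the awk approach mentioned above: one pass over the log that counts every blocklist at once, instead of one grep per list. The sample lines and the function name count_all are hypothetical; whether this beats grep on a given log is something to measure, not assume.

```shell
#!/bin/bash
# Tiny stand-in for /var/log/maillog (made-up, simplified lines).
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
Mar 14 12:00:01 mail sendmail[101]: ruleset=check_relay, reject=550 listed at sbl-xbl.spamhaus.org
Mar 14 12:00:02 mail sendmail[102]: ruleset=check_relay, reject=550 listed at bl.spamcop.net
Mar 14 12:00:03 mail sendmail[103]: ruleset=check_relay, reject=550 listed at sbl-xbl.spamhaus.org
Mar 14 12:00:04 mail sendmail[104]: ruleset=check_mail, reject=553 Domain does not exist
Mar 14 12:00:05 mail sendmail[105]: ruleset=check_relay, reject=550 listed at combined.njabl.org
EOF

# count_all FILE (hypothetical helper): read FILE once and tally
# check_relay lines per blocklist name.
count_all() {
    awk '
        /check_relay/ {
            if (/spamhaus/) n["spamhaus"]++
            if (/spamcop/)  n["spamcop"]++
            if (/njabl/)    n["njabl"]++
        }
        END {
            printf "spamhaus %d\nspamcop %d\nnjabl %d\n",
                   n["spamhaus"], n["spamcop"], n["njabl"]
        }
    ' "$1"
}

count_all "$log"
```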

Will.

Ryan Simpkins

15 Mar

7:56 p.m.


On Wed, March 14, 2007 16:16, Will McDonald wrote:

...

On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):

...
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'

You can avoid the unnecessary 'cat' by just passing the filename to grep directly:

...
# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog

Would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.

Am I using time right to measure it?

I see from other posts I wasn't using it right. So I re-wrote and tested again on the same system, about the same log size:

##########################
$ cat timetest1
#!/bin/bash

for x in `seq 1 3000`; do
cat /var/log/maillog | grep check_relay | grep -c njabl > /dev/null
done

$ time ./timetest1

real 0m36.685s
user 0m12.505s
sys 0m24.136s

##########################
$ cat timetest2
#!/bin/bash

for x in `seq 1 3000`; do
grep -c 'check_relay.*njabl' /var/log/maillog > /dev/null
done

$ time ./timetest2

real 2m57.914s
user 2m50.574s
sys 0m7.134s

##########################
$ cat timetest3
#!/bin/bash

for x in `seq 1 3000`; do
grep -c njabl /var/log/maillog > /dev/null
done

$ time ./timetest3

real 0m13.331s
user 0m6.895s
sys 0m6.429s

##########################
$ cat timetest4
#!/bin/bash

for x in `seq 1 3000`; do
cat /var/log/maillog | grep -c njabl > /dev/null
done

$ time ./timetest4

real 0m28.442s
user 0m9.520s
sys 0m18.905s

I think this proves the original poster right on his main point. Getting rid of the cat speeds things up quite a bit. However, it could be argued that it only matters if you are doing quite a few in a row, in this case 3000. And it further proves that doing a 'pattern*pattern' is not a good idea at all (at least not with grep).
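A rough harness along the same lines as the timetest scripts, assuming bash. It builds a throwaway log from made-up lines so it does not touch /var/log/maillog; the function names and the run count of 200 are arbitrary. All three variants should agree on the count, which is worth checking before comparing their speed.

```shell
#!/bin/bash
# Throwaway log: 2000 made-up spamcop lines plus a single njabl line.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
for i in $(seq 1 2000); do
    echo "Mar 14 12:00:00 mail sendmail[$i]: ruleset=check_relay, reject=550 listed at bl.spamcop.net"
done > "$log"
echo "Mar 14 12:00:01 mail sendmail[9]: ruleset=check_relay, reject=550 listed at combined.njabl.org" >> "$log"

# The variants under test (names are illustrative).
with_cat()   { cat "$log" | grep check_relay | grep -c njabl; }
one_regex()  { grep -c 'check_relay.*njabl' "$log"; }
fixed_only() { grep -c njabl "$log"; }

# bench NAME RUNS: call function NAME RUNS times inside one timed loop,
# so the per-run cost is not lost in measurement noise.
bench() {
    local name=$1 runs=$2 i
    TIMEFORMAT="$name: %R s for $runs runs"
    time for ((i = 0; i < runs; i++)); do "$name" > /dev/null; done
}

for variant in with_cat one_regex fixed_only; do
    bench "$variant" 200
done
```

The absolute numbers will depend on the grep version, the locale and the log itself, so they are only comparable on the same machine.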

One poster also argued on ease of coding. I typically work like this (my thinking is inside the '*'):

The alternate method?

less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep pattern2+pattern3 file | less; *Yes, that is right*

What I don't like about the alternate method is where the file name lives in the first two lines between the comparison. Also, the pattern is before the file on the first grep, making it harder to adjust the pattern (which some of us need to do quite a lot). It makes more sense to me to just add a | on the end and keep going. Further, for me, it is easier to reduce data by stringing greps together rather than come up with the regex-fu to do it all in one pattern. Maybe if I were better at regex...
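For what it's worth, chained greps and a single 'a.*b' pattern are not quite equivalent: the chain matches the two strings in either order, while the single pattern matches only in the order written. A sketch, with made-up log lines and hypothetical helper names, of an order-independent match in one awk process:

```shell
#!/bin/bash
# Made-up lines; the second one mentions njabl BEFORE check_relay.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
sendmail[101]: ruleset=check_relay, reject=550 listed at combined.njabl.org
sendmail[102]: njabl lookup timed out, ruleset=check_relay, deferred
sendmail[103]: ruleset=check_relay, reject=550 listed at bl.spamcop.net
EOF

# Chained greps: two processes, order of the two strings does not matter.
both_chained() { grep "$1" "$3" | grep -c "$2"; }

# One awk process with the same order-independent AND.
both_awk() {
    awk -v a="$1" -v b="$2" '$0 ~ a && $0 ~ b { n++ } END { print n + 0 }' "$3"
}

# Single ordered pattern: counts only lines where $1 comes before $2.
ordered() { grep -c "$1.*$2" "$3"; }

echo "chained: $(both_chained check_relay njabl "$log")"
echo "awk:     $(both_awk check_relay njabl "$log")"
echo "ordered: $(ordered check_relay njabl "$log")"
```

On real maillog entries the order is probably fixed, so the difference may never show up, but it is one reason the two forms can disagree.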

However, I 100% agree that doing strings of | produces inefficient commands more often. I think it is wise to go back and find efficiencies when needed.

-Ryan

Will McDonald

8:48 p.m.


On 15/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:

...

less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep pattern2+pattern3 file | less; *Yes, that is right*

What I don't like about the alternate method is where the file name lives in the first two lines between the comparison. Also, the pattern is before the file on the first grep, making it harder to adjust the pattern (which some of us need to do quite a lot). It makes more sense to me to just add a | on the end and keep going. Further, for me, it is easier to reduce data by stringing greps together rather than come up with the regex-fu to do it all in one pattern. Maybe if I were better at regex...

Do you use bash command line shortcuts?

I have CTRL-A, CTRL-E, META-F [1], META-B and META-D ingrained in my fingers which eases the pain of things not being *quite* where you want them.

CTRL-A - jump to beginning of line (like HOME if your terminal's set up right)
CTRL-E - jump to end of line (like END if your terminal's set up right)
META-F - forward one word at a time, like 'w' in Vi.
META-B - backward one word at a time, like 'b' in Vi.
META-D - delete one word, like 'dw' in Vi.

There are more, but learning those few by heart really helps me, even on misconfigured terminals[2]. For example, to change the 'file' element in...

$ something /path/to/file | alskhflkasdflasjdfljk | lajkdhflakjsdflkasjd | alsdjkhflasdjkhf

CTRL-A ALT-F ALT-F ALT-D, then start typing the replacement filename. Which is much easier than it looks when actually typed out. :)

Will.

[1] Typically ALT
[2] Other people's obviously :)

John Summerfield

9:02 p.m.


Ryan Simpkins wrote:

...

However, I 100% agree that doing strings of | produces inefficient commands more often. I think it is wise to go back and find efficiencies when needed.

Which efficiency is more important, yours or the computer's?

How about the time spent deciding which way's better;-)

btw The simple answers to the above are wrong. One person waiting once for a computer for five minutes too long isn't a great problem. 100,000 people waiting one minute too long is an enormous problem.

-- Cheers John -- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu Please do not reply off-list

Erick Perez

16 Mar

4:57 a.m.


People, thanks to all for your posts. I understand some of them; some I don't, because I don't do a lot of scripting. However, you have helped me to solve my main issue.

Thanks again, you have made my day :-))

On 3/15/07, John Summerfield debian@herakles.homelinux.org wrote:

...

Ryan Simpkins wrote:

...
However, I 100% agree that doing strings of | produces inefficient commands more often. I think it is wise to go back and find efficiencies when needed.

Which efficiency is more important, yours or the computer's?

How about the time spent deciding which way's better;-)

btw The simple answers to the above are wrong. One person waiting once for a computer for five minutes too long isn't a great problem. 100,000 people waiting one minute too long is an enormous problem.

--

Cheers John

-- spambait 1aaaaaaa@coco.merseine.nu Z1aaaaaaa@coco.merseine.nu

Please do not reply off-list _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos

-- ------------------------------------------------------------ Erick Perez Panama Sistemas Integradores de Telefonia IP y Soluciones Para Centros de Datos Panama, Republica de Panama Cel Panama. +(507) 6694-4780 ------------------------------------------------------------
