On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog
That would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.
Am I using time right to measure it?
# time cat /var/log/maillog | grep check_relay | grep -c njabl
8
real 0m0.299s user 0m0.289s sys 0m0.009s
# time grep -c 'check_relay.*njabl' /var/log/maillog
8
real 0m0.404s user 0m0.402s sys 0m0.000s
Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?
I also tried this:
time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8
real 0m0.325s user 0m0.312s sys 0m0.012s
time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8
real 0m0.411s user 0m0.408s sys 0m0.002s
I ran these several times, alternating between them to rule out flukes; these numbers appear to be representative of the average. What do you get on your system? Maybe passing the file name to grep gets faster as the file size increases?
wc /var/log/maillog
12323 142894 1588860 /var/log/maillog
I wonder if the issue here is actually the 'stuff.*morestuff' as that might be a more expensive match:
time echo `grep -c 'check_relay' /var/log/maillog | grep njabl`
real 0m0.269s user 0m0.263s sys 0m0.006s
-Ryan
Ryan Simpkins wrote:
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog
That would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.
Am I using time right to measure it?
No, you're timing the cat only.
# time cat /var/log/maillog | grep check_relay | grep -c njabl
8
real 0m0.299s user 0m0.289s sys 0m0.009s
Too short for useful measurement with these tools.
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:
Ryan Simpkins wrote:
Am I using time right to measure it?
No, you're timing the cat only.
I don't think that's the case, you know. If I run the following:
[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null
real 0m0.018s user 0m0.001s sys 0m0.017s
[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null
real 0m0.047s user 0m0.021s sys 0m0.026s
Running both commands repeatedly shows similar time differences; I think 'time' is timing the execution of the whole command.
Will.
You could always try using pflogsum and season it to taste.
Cheers,
Will McDonald wrote:
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:
Ryan Simpkins wrote:
Am I using time right to measure it?
No, you're timing the cat only.
I don't think that's the case, you know. If I run the following:
[summer@bilby ~]$ time sleep 10s;sleep 10s
real 0m10.011s user 0m0.000s sys 0m0.003s
[summer@bilby ~]$ time sleep 10s|sleep 10s
real 0m10.002s user 0m0.001s sys 0m0.003s
[summer@bilby ~]$ time sleep 10s | time sleep 10s
0.00user 0.00system 0:09.99elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+150minor)pagefaults 0swaps
real 0m10.011s user 0m0.000s sys 0m0.006s
[summer@bilby ~]$
[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null
real 0m0.018s user 0m0.001s sys 0m0.017s
[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null
real 0m0.047s user 0m0.021s sys 0m0.026s
Running both commands repeatedly shows similar time differences; I think 'time' is timing the execution of the whole command.
I think that writing to a pipe is more expensive than writing to /dev/null. Needs buffering etc.
Try
time cat /tmp/messages.1 | grep \ | grep '*.foo' 1> /dev/null
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:
Will McDonald wrote:
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:
Ryan Simpkins wrote:
Am I using time right to measure it?
No, you're timing the cat only.
I don't think that's the case, you know. If I run the following:
[summer@bilby ~]$ time sleep 10s;sleep 10s
real 0m10.011s user 0m0.000s sys 0m0.003s
[summer@bilby ~]$ time sleep 10s|sleep 10s
real 0m10.002s user 0m0.001s sys 0m0.003s
[summer@bilby ~]$ time sleep 10s | time sleep 10s
0.00user 0.00system 0:09.99elapsed 0%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+150minor)pagefaults 0swaps
real 0m10.011s user 0m0.000s sys 0m0.006s
[summer@bilby ~]$
I sit corrected, thanks John. A subshell appears to show the expected behaviour...
[wmcdonald@stella ~]$ time $(sleep 10s; sleep 10s)
real 0m20.011s user 0m0.001s sys 0m0.006s
Will.
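For timing more than one command, a brace group is the usual bash idiom, because `time` is a reserved word that applies only to the pipeline that follows it. (With `$(...)` the shell also tries to execute whatever the commands print, which only works above because sleep prints nothing.) A minimal sketch, assuming bash and a `sleep` that accepts fractional seconds, as GNU coreutils does; the variable names are only illustrative:

```shell
#!/bin/bash
# LC_ALL=C keeps the decimal separator a dot; TIMEFORMAT=%R makes
# bash's `time` keyword print elapsed seconds only.
LC_ALL=C
TIMEFORMAT=%R

# `time` covers only the first sleep; the second runs after the timer stops.
first_only=$( { time sleep 0.3; sleep 0.3; } 2>&1 )

# The inner brace group makes both sleeps a single timed unit.
both=$( { time { sleep 0.3; sleep 0.3; }; } 2>&1 )

echo "time a; b      -> ${first_only}s"
echo "time { a; b; } -> ${both}s"
```

The first figure should come out near 0.3 and the second near 0.6, matching the 10s versus 20s results earlier in the thread.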
On Wed, 2007-03-14 at 22:35 +0000, Will McDonald wrote:
On 14/03/07, John Summerfield debian@herakles.homelinux.org wrote:
Ryan Simpkins wrote:
Am I using time right to measure it?
No, you're timing the cat only.
Correct.
I don't think that's the case, you know. If I run the following:
[wmcdonald@stella ~]$ ls -lh /tmp/messages.1
-rw-r--r-- 1 root root 4.3M Mar 14 20:03 /tmp/messages.1
[wmcdonald@stella ~]$ time cat /tmp/messages.1 1> /dev/null
real 0m0.018s user 0m0.001s sys 0m0.017s
[wmcdonald@stella ~]$ time cat /tmp/messages.1 | grep '*.foo' 1> /dev/null
real 0m0.047s user 0m0.021s sys 0m0.026s
Running both commands repeatedly shows similar time differences; I think 'time' is timing the execution of the whole command.
I'm unsure, but I think that's wrong?
Try time { cat /tmp/messages.1 | grep '*.foo' 1> /dev/null ; }
This avoids certain overheads and does the job OP really wants?
Will.
<snip sig stuff>
-- Bill
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
You can avoid the unnecessary 'cat' by just passing the filename to grep directly:
# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog
That would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.
Am I using time right to measure it?
Yep.
# time cat /var/log/maillog | grep check_relay | grep -c njabl
8
real 0m0.299s user 0m0.289s sys 0m0.009s
# time grep -c 'check_relay.*njabl' /var/log/maillog
8
real 0m0.404s user 0m0.402s sys 0m0.000s
Is the first 'time' measuring the whole one-liner, or just the time it takes to 'cat'?
It should be the time taken for the command line to execute.
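In bash, `time` is a reserved word and applies to the whole pipeline, whereas an external `/usr/bin/time` sees only the command it is handed. Equal-length sleeps cannot show the difference, because pipeline stages run concurrently, so this sketch uses stages of unequal length. It assumes bash and fractional `sleep` arguments (GNU coreutils); the variable name is illustrative:

```shell
#!/bin/bash
# LC_ALL=C keeps the decimal separator a dot; TIMEFORMAT=%R makes
# bash's `time` keyword print elapsed seconds only.
LC_ALL=C
TIMEFORMAT=%R

# If only the first stage were timed, this would report about 0.2s.
# The keyword waits for every stage, so it reports about 1s.
pipeline=$( { time sleep 0.2 | sleep 1; } 2>&1 )

echo "time sleep 0.2 | sleep 1 -> ${pipeline}s"
```

A result close to one second indicates the last, slowest stage is included in the measurement.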
I also tried this:
time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
8
real 0m0.325s user 0m0.312s sys 0m0.012s
time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
8
real 0m0.411s user 0m0.408s sys 0m0.002s
I ran these several times, alternating between them to rule out flukes; these numbers appear to be representative of the average. What do you get on your system? Maybe passing the file name to grep gets faster as the file size increases?
wc /var/log/maillog
12323 142894 1588860 /var/log/maillog
I wonder if the issue here is actually the 'stuff.*morestuff' as that might be a more expensive match:
I think you're correct, that regexp wildcard is slower. I've done similar cat/grep/awk tests myself and in *some* cases using awk's pattern matching '/foo/ { awkstuff }' has been quicker than grep so it's always worth running the numbers a couple of times to see what's most effective for a given/typical dataset.
The removal of the redundant cat still stands though. There really is no conceivable benefit to forking that additional process. I don't think, anyway. :)
And of course, when you start to loop through running
for i in `list of stuff`
do
    grep blah | grep -c snee
done
for example, depending on the number of iterations through the loop it's worth thinking about how you're doing stuff. There is an element of early overoptimisation mind, if something's working on a box that's NOT heavily loaded then don't sweat it.
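When several counts are needed from the same file, one option is a single awk pass instead of one grep run per pattern. The sketch below is only an illustration: the sample log lines are invented stand-ins for /var/log/maillog, and unlike 'check_relay.*njabl' it does not require check_relay to appear before the list name on the line.

```shell
#!/bin/bash
# Hypothetical sample lines standing in for /var/log/maillog.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
cat > "$log" <<'EOF'
sendmail[101]: ruleset=check_relay, reject=550 blocked via spamhaus
sendmail[102]: ruleset=check_relay, reject=550 blocked via spamcop
sendmail[103]: ruleset=check_relay, reject=550 blocked via njabl
sendmail[104]: ruleset=check_relay, reject=550 blocked via spamhaus
sendmail[105]: ruleset=check_mail, unrelated spamhaus mention
EOF

# One pass over the file: count each list name on check_relay lines only.
counts=$(awk '
  /check_relay/ {
    if (/spamhaus/) spamhaus++
    if (/spamcop/)  spamcop++
    if (/njabl/)    njabl++
  }
  END { printf "spamhaus=%d spamcop=%d njabl=%d\n", spamhaus, spamcop, njabl }
' "$log")

echo "$counts"
```

Whether this beats three separate greps depends on the data and the tools, so it is worth timing on a typical log before switching.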
Will.
On Wed, March 14, 2007 16:16, Will McDonald wrote:
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
On 14/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
You can avoid the unnecessary 'cat' by just passing the filename to grep
directly:
# grep -c 'check_relay.*spamhaus' /var/log/maillog
# grep -c 'check_relay.*spamcop' /var/log/maillog
# grep -c 'check_relay.*njabl' /var/log/maillog
That would probably be more efficient and faster; you can test with 'time' to verify this. You're spawning one process, 'grep', instead of three separate processes: 'cat', 'grep' and 'grep' again.
Am I using time right to measure it?
I see from other posts I wasn't using it right. So I re-wrote and tested again on the same system, about the same log size:
##########################
$ cat timetest1
#!/bin/bash
for x in `seq 1 3000`; do
    cat /var/log/maillog | grep check_relay | grep -c njabl > /dev/null
done
$ time ./timetest1
real 0m36.685s user 0m12.505s sys 0m24.136s
##########################
$ cat timetest2
#!/bin/bash
for x in `seq 1 3000`; do
    grep -c 'check_relay.*njabl' /var/log/maillog > /dev/null
done
$ time ./timetest2
real 2m57.914s user 2m50.574s sys 0m7.134s
##########################
$ cat timetest3
#!/bin/bash
for x in `seq 1 3000`; do
    grep -c njabl /var/log/maillog > /dev/null
done
$ time ./timetest3
real 0m13.331s user 0m6.895s sys 0m6.429s
##########################
$ cat timetest4
#!/bin/bash
for x in `seq 1 3000`; do
    cat /var/log/maillog | grep -c njabl > /dev/null
done
$ time ./timetest4
real 0m28.442s user 0m9.520s sys 0m18.905s
I think this proves the original poster right on his main point. Getting rid of the cat speeds things up quite a bit (compare timetest3 and timetest4). However, it could be argued that it only matters if you are doing quite a few in a row, in this case 3000. And it further shows that doing a 'pattern.*pattern' match is not a good idea at all (at least not with grep).
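The four scripts above can be folded into one small harness so each variant runs against the same file for the same number of iterations. This is a rough sketch, not the scripts actually used: the synthetic log, the function names and the iteration count are invented for illustration. The `c_locale` variant is an extra that was not tested in the thread; it rests on the assumption that some grep versions handle '.*' much more slowly in a multibyte locale, which may or may not explain the timetest2 result.

```shell
#!/bin/bash
# Hypothetical synthetic log; the thread used a real /var/log/maillog.
log=$(mktemp)
trap 'rm -f "$log"' EXIT
for i in $(seq 1 2000); do
  echo "sendmail[$i]: ruleset=check_relay, arg1=host$i, reject=550 njabl"
  echo "sendmail[$i]: stat=Sent, unrelated line $i"
done > "$log"

runs=20          # the thread used 3000 iterations
TIMEFORMAT=%R    # bash `time` keyword prints elapsed seconds only

# The variants under test (names are illustrative).
with_cat() { cat "$log" | grep check_relay | grep -c njabl; }
one_grep() { grep -c 'check_relay.*njabl' "$log"; }
c_locale() { LC_ALL=C grep -c 'check_relay.*njabl' "$log"; }

bench() {
  # Run the named function $runs times and report the total elapsed time.
  local elapsed n
  elapsed=$( { time for ((n = 0; n < runs; n++)); do "$1" > /dev/null; done; } 2>&1 )
  printf '%-10s %ss for %d runs\n' "$1" "$elapsed" "$runs"
}

for variant in with_cat one_grep c_locale; do
  bench "$variant"
done
```

On this sample data all three variants return the same count, but the two-grep pipeline and the single regex are not strictly equivalent: the regex requires check_relay to appear before njabl on the line.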
One poster also argued on ease of coding. I typically code like this (my thoughts are inside the '*'):
cat file | less; *yes, that is the right data, and I see the pattern I wanna match*
cat file | grep pattern | less; *ahh, mistake*
cat file | grep pattern2 | less; *yes, that is right, but still need to reduce*
cat file | grep pattern2 | grep pattern3 | less; *yes, that is looking about right*
The alternate method?
less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep pattern2+pattern3 file | less; *Yes, that is right*
What I don't like about the alternate method is where the file name lives in the first two lines between the comparison. Also, the pattern is before the file on the first grep, making it harder to adjust the pattern (which some of us need to do quite a lot). It makes more sense to me to just add a | on the end and keep going. Further, for me, it is easier to reduce data by stringing greps together rather than come up with the regex-fu to do it all in one pattern. Maybe if I were better at regex...
However, I 100% agree that doing strings of | produces inefficiency more often. I think it is wise to go back and find efficiencies when needed.
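One possible middle ground keeps the file name on the left and the filters on the right without the extra cat process: the shell accepts an input redirection before the command name. A small sketch, assuming bash; the sample file contents and variable names are invented for illustration:

```shell
#!/bin/bash
# Hypothetical sample file standing in for the log being explored.
file=$(mktemp)
trap 'rm -f "$file"' EXIT
printf '%s\n' 'alpha pattern2' 'beta pattern2 pattern3' 'gamma pattern3' > "$file"

# The redirection comes first, so the file name stays at the left and
# further filters are simply appended on the right, with no cat process.
step1=$(< "$file" grep pattern2)
step2=$(< "$file" grep pattern2 | grep -c pattern3)

echo "$step1"
echo "lines matching both: $step2"
```

Interactively that becomes `< file grep pattern2 | less`, then `< file grep pattern2 | grep pattern3 | less`, so the incremental workflow is unchanged.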
-Ryan
On 15/03/07, Ryan Simpkins centos@ryansimpkins.com wrote:
less file; *Right data, I see the patterns*
grep pattern file | less; *mistake*
grep pattern2 file | less; *right, time to reduce*
grep pattern2+pattern3 file | less; *Yes, that is right*
What I don't like about the alternate method is where the file name lives in the first two lines between the comparison. Also, the pattern is before the file on the first grep, making it harder to adjust the pattern (which some of us need to do quite a lot). It makes more sense to me to just add a | on the end and keep going. Further, for me, it is easier to reduce data by stringing greps together rather than come up with the regex-fu to do it all in one pattern. Maybe if I were better at regex...
Do you use bash command line shortcuts?
I have CTRL-A, CTRL-E, META-F [1], META-B and META-D ingrained in my fingers which eases the pain of things not being *quite* where you want them.
CTRL-A - jump to beginning of line (like HOME if your terminal's set up right)
CTRL-E - jump to end of line (like END if your terminal's set up right)
META-F - forward one word at a time, like 'w' in Vi.
META-B - backward one word at a time, like 'b' in Vi.
META-D - delete one word, like 'dw' in Vi.
There are more, but learning those few by heart really helps me, even on misconfigured terminals[2]. For example, to change the 'file' element in...
$ something /path/to/file | alskhflkasdflasjdfljk | lajkdhflakjsdflkasjd | alsdjkhflasdjkhf
CTRL-A ALT-F ALT-F ALT-D, then start typing the replacement filename. Which is much easier than it looks when actually typed out. :)
Will.
[1] Typically ALT
[2] Other people's, obviously :)
Ryan Simpkins wrote:
However, I 100% agree that doing strings of | produces inefficiency more often. I think it is wise to go back and find efficiencies when needed.
Which efficiency is more important, yours or the computer's?
How about the time spent deciding which way's better;-)
btw The simple answers to the above are wrong. One person waiting once for a computer for five minutes too long isn't a great problem. 100,000 people waiting one minute too long is an enormous problem.
People, thanks to all for your posts. I understand some of them, some i don't 'cause i don't do scripting a lot. However you have helped me to solve my main issue.
Thanks again, you have made my day :-))
On 3/15/07, John Summerfield debian@herakles.homelinux.org wrote:
Ryan Simpkins wrote:
However, I 100% agree that doing strings of | produces inefficiency more often. I think it is wise to go back and find efficiencies when needed.
Which efficiency is more important, yours or the computer's?
How about the time spent deciding which way's better;-)
btw The simple answers to the above are wrong. One person waiting once for a computer for five minutes too long isn't a great problem. 100,000 people waiting one minute too long is an enormous problem.
--
Cheers John
Please do not reply off-list _______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos