On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
> On Wed, March 14, 2007 14:08, Will McDonald wrote (trimmed):
> > On 14/03/07, Ryan Simpkins <centos at ryansimpkins.com> wrote:
> >> Try doing a simple 'cat /var/log/maillog | grep -c check_relay'
> >
> > You can avoid the unnecessary 'cat' by just passing the filename to
> > grep directly:
> >
> > # grep -c 'check_relay.*spamhaus' /var/log/maillog
> > # grep -c 'check_relay.*spamcop' /var/log/maillog
> > # grep -c 'check_relay.*njabl' /var/log/maillog
> >
> > Would probably be more efficient and faster, you can test with 'time'
> > to verify this. You're spawning one process, 'grep', instead of three
> > separate processes, 'cat', 'grep' and 'grep' again.
>
> Am I using time right to measure it?

Yep.

> # time cat /var/log/maillog | grep check_relay | grep -c njabl
> 8
>
> real 0m0.299s
> user 0m0.289s
> sys 0m0.009s
>
> # time grep -c 'check_relay.*njabl' /var/log/maillog
> 8
>
> real 0m0.404s
> user 0m0.402s
> sys 0m0.000s
>
> Is the first 'time' measuring the whole one-liner, or just the time it
> takes to 'cat'?

It should be the time taken for the whole command line to execute.

> I also tried this:
> time echo `cat /var/log/maillog | grep check_relay | grep -c njabl`
> 8
>
> real 0m0.325s
> user 0m0.312s
> sys 0m0.012s
>
> time echo `grep -c 'check_relay.*njabl' /var/log/maillog`
> 8
>
> real 0m0.411s
> user 0m0.408s
> sys 0m0.002s
>
> I ran these several times mixed back and forth to try and see if they
> were flukes, these numbers appear to be representative of the average.
> What do you get on your system? Maybe passing the file name to grep gets
> faster as the file size increases?
>
> wc /var/log/maillog
> 12323 142894 1588860 /var/log/maillog
>
> I wonder if the issue here is actually the 'stuff*morestuff' as that
> might be a more expensive match:

I think you're correct, that regexp wildcard is slower.
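
One rough way to separate the two effects (the extra 'cat' versus the '.*' wildcard) is to time three variants side by side. This is only a sketch: the log lines are made up for the demo, the function names (with_cat, two_greps, one_grep, repeat) are just illustrative, and it assumes bash, where 'time' is a shell keyword that can time a shell function.

```shell
#!/bin/bash
# Sketch only: the log lines below are invented for the demo; point "$log"
# at /var/log/maillog to measure a real file instead.
log=$(mktemp)
trap 'rm -f "$log"' EXIT

i=0
while [ "$i" -lt 2000 ]; do
  echo "entry $i ruleset=check_relay reject listed at njabl"
  echo "entry $i ruleset=check_relay reject listed at spamcop"
  echo "entry $i ruleset=check_mail accepted"
  i=$((i + 1))
done > "$log"

# Original form: three processes (cat, grep, grep).
with_cat()  { cat "$1" | grep check_relay | grep -c njabl; }
# No cat, but still two greps, each matching a plain substring.
two_greps() { grep check_relay "$1" | grep -c njabl; }
# One grep, but the pattern contains a '.*' wildcard.
one_grep()  { grep -c 'check_relay.*njabl' "$1"; }

# Run a command several times so one-off noise matters less.
repeat() {
  n=$1; shift
  while [ "$n" -gt 0 ]; do "$@" > /dev/null; n=$((n - 1)); done
}

for f in with_cat two_greps one_grep; do
  echo "$f: count=$($f "$log")"
  time repeat 20 "$f" "$log"
done
```

If two_greps and with_cat come out about the same and one_grep is slower, that points at the wildcard rather than the 'cat'. Note the variants are not strictly equivalent: the piped greps match the two strings in either order, while the single regexp needs check_relay to come first on the line.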

I've done similar cat/grep/awk tests myself and in *some* cases using
awk's pattern matching '/foo/ { awkstuff }' has been quicker than grep, so
it's always worth running the numbers a couple of times to see what's most
effective for a given/typical dataset.

The removal of the redundant cat still stands though. There really is no
conceivable benefit to forking that additional process. I don't think,
anyway. :)

And of course, when you start to loop through running

    for i in `list of stuff`
    do
        grep blah | grep -c snee
    done

for example, depending on the number of iterations through the loop it's
worth thinking about how you're doing stuff.

There is an element of early overoptimisation, mind: if something's
working on a box that's NOT heavily loaded then don't sweat it.

Will.
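
As a rough illustration of the loop point above: instead of running one grep (or two) per blacklist, a single awk pass can keep all the counters at once. This is a sketch under assumptions, not a drop-in script. The sample lines are invented rather than real maillog output, and count_all is just a name picked for the example.

```shell
#!/bin/sh
# Sketch only: invented sample lines, not real maillog output.
log=$(mktemp)
trap 'rm -f "$log"' EXIT

cat > "$log" <<'EOF'
entry 1 ruleset=check_relay reject listed at spamhaus
entry 2 ruleset=check_relay reject listed at spamhaus
entry 3 ruleset=check_relay reject listed at spamcop
entry 4 ruleset=check_relay reject listed at njabl
entry 5 ruleset=check_mail mentions njabl but is not a relay check
entry 6 ruleset=check_relay reject listed at spamhaus
entry 7 ruleset=check_relay reject listed at spamcop
EOF

# One process and one read of the file, however many lists are counted.
count_all() {
  awk '
    /check_relay/ {
      if (/spamhaus/) c["spamhaus"]++
      if (/spamcop/)  c["spamcop"]++
      if (/njabl/)    c["njabl"]++
    }
    END {
      printf "spamhaus %d\nspamcop %d\nnjabl %d\n",
             c["spamhaus"], c["spamcop"], c["njabl"]
    }' "$1"
}

count_all "$log"
```

Whether this actually beats three separate greps on a given log is exactly the sort of thing worth checking with 'time' rather than assuming.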