Woodchuck wrote:
On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
This snippet of code pulls an array of hostnames from some log files. It has to parse around 3GB of log files, so I'm keen on making it as efficient as possible. Can you think of any way to optimize this to run faster?
If the key phrase is *as efficient as possible*, then I would say you want a compiled pattern search. Lex is the tool for this, and
That, to me, would be a Big Deal. <snip>
BTW, you could easily incorporate a sorting function in lex that would eliminate the need for an external sort. This might be done in awk, too, but in lex it would be more natural. You simply would not
<snip> Hello, mark, wake up.
Of course, there's an even easier way, just using awk:
awk '/[-.0-9a-z]+\.com/ { hostarray[$9] = 1 } END { for (i in hostarray) print i }'
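This dumps the hostnames into an associative array (one whose indices are strings), so each hostname is stored only once no matter how often it appears. The order in which for (i in hostarray) visits the keys is unspecified, though, so pipe the output through sort if you need the list in order.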
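A minimal sketch of the two ways to get an ordered, de-duplicated list. It assumes, as the one-liner does, that the hostname is field 9 of a whitespace-separated log line. The function names and the sample lines are hypothetical. The first version lets sort do the ordering. The second sorts inside awk, the "might be done in awk, too" idea, using a plain insertion sort in the END block.

```shell
# Sketch 1: awk collects the unique hostnames; an external sort orders them.
# Assumption carried over from the original one-liner: the hostname is field $9.
extract_hosts() {
    # "$@" holds the log file names; with no arguments awk reads standard input.
    awk '
        /[-.0-9a-z]+\.com/ { seen[$9] = 1 }   # hostname is the key, so duplicates collapse
        END { for (h in seen) print h }       # for-in order is unspecified in awk
    ' "$@" | sort
}

# Sketch 2: same filter, but ordered inside awk with an insertion sort,
# so no external sort is needed. Quadratic in the number of unique hosts.
extract_hosts_awk_sorted() {
    awk '
        /[-.0-9a-z]+\.com/ { seen[$9] = 1 }
        END {
            n = 0
            for (h in seen) {
                # Shift larger keys one slot right, then drop h into the gap.
                i = n
                while (i > 0 && keys[i] > h) { keys[i + 1] = keys[i]; i-- }
                keys[i + 1] = h
                n++
            }
            for (i = 1; i <= n; i++) print keys[i]
        }
    ' "$@"
}

# Usage with made-up log lines (hypothetical data; fields 1-8 are filler).
sample() {
    printf '%s\n' \
        'a b c d e f g h www.example.com' \
        'a b c d e f g h mail.example.com' \
        'a b c d e f g h www.example.com' \
        'a b c d e f g h localhost'
}
sample | extract_hosts
sample | extract_hosts_awk_sorted
```

Both calls print mail.example.com followed by www.example.com; the localhost line never matches the pattern. With millions of unique hostnames, the external sort is the safer choice, since the in-awk insertion sort does O(n^2) comparisons.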
mark