Woodchuck wrote:
> On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
>> This snippet of code pulls an array of hostnames from some log files.
>> It has to parse around 3GB of log files, so I'm keen on making it as
>> efficient as possible. Can you think of any way to optimize this to
>> run faster?
>
> If the key phrase is *as efficient as possible*, then I would say
> you want a compiled pattern search. Lex is the tool for this, and
That, to me, would be a Big Deal.
<snip>
> BTW, you could easily incorporate a sorting function in lex that
> would eliminate the need for an external sort. This might be done in awk,
> too, but in lex it would be more natural. You simply would not
<snip>
Hello, mark, wake up.
Of course, there's an even easier way, just using awk:
awk '{ if (/[-.0-9a-z][-.0-9a-z]*\.com/) { hostarray[$9] = 1 } }
     END { for (i in hostarray) { print i } }'
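This dumps the hostnames into an associative array - that's one whose
indices are strings - so duplicates collapse automatically. Note that awk
does not guarantee the order in which "for (i in ...)" visits the indices,
so pipe the output through sort if you need it in order.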
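For what it's worth, here is a minimal sketch of that dedupe-then-sort idea. The sample log lines are made up, and I'm assuming, as the one-liner does, that the hostname is field 9; adjust both for the real logs.

```shell
# Hypothetical sample log; field 9 holds the hostname (an assumption).
log=$(mktemp)
printf '%s\n' \
  'a b c d e f g h www.example.com x' \
  'a b c d e f g h mail.example.com x' \
  'a b c d e f g h www.example.com x' \
  'a b c d e f g h 10.0.0.1 x' > "$log"

# Collect unique hostnames as array indices, then sort explicitly,
# because the order of "for (h in seen)" is unspecified in awk.
result=$(awk '/[-.0-9a-z][-.0-9a-z]*\.com/ { seen[$9] = 1 }
              END { for (h in seen) print h }' "$log" | sort)

printf '%s\n' "$result"
rm -f "$log"
```

With the made-up sample this prints mail.example.com and www.example.com, one per line. The external sort only sees the unique names, so it should be cheap even when the logs are large.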
mark