Woodchuck wrote:
> On Thu, Jun 28, 2012 at 01:30:33PM -0500, Sean Carolan wrote:
>> This snippet of code pulls an array of hostnames from some log files.
>> It has to parse around 3GB of log files, so I'm keen on making it as
>> efficient as possible. Can you think of any way to optimize this to
>> run faster?
>
> If the key phrase is *as efficient as possible*, then I would say
> you want a compiled pattern search. Lex is the tool for this, and

That, to me, would be a Big Deal.

<snip>
> BTW, you could easily incorporate a sorting function in lex that
> would eliminate the need for an external sort. This might be done in awk,
> too, but in lex it would be more natural. You simply would not
<snip>

Hello, mark, wake up. Of course, there's an even easier way, just using awk:

    awk '{if (/[-.0-9a-z][-.0-9a-z]*\.com/) { hostarray[$9] = 1;}} END { for (i in hostarray ) { print i;}}'

This dumps it into an associative array - that's one whose indices are
strings - so each hostname is stored only once, however many times it
shows up. One caveat: the order in which "for (i in hostarray)" visits
the indices is unspecified, so pipe the output through sort if you need
it ordered.

        mark
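
For anyone who wants to try the associative-array approach on a small input first, here is a minimal sketch. The function name unique_hosts and the sample lines are made up for illustration, and it assumes the hostname sits in field 9, as in the one-liner above. Unlike the one-liner, it tests field 9 itself rather than the whole line, which is my assumption about the intent, and it appends an external sort because awk does not promise any particular order for "for (h in seen)".

```shell
# Hypothetical helper: print the unique .com hostnames found in field 9, sorted.
# Field 9 comes from the one-liner above; adjust it for your own log format.
unique_hosts() {
    # seen[] is an associative array keyed by hostname, so repeats collapse.
    awk '$9 ~ /^[-.0-9a-z]+\.com$/ { seen[$9] = 1 }
         END { for (h in seen) print h }' "$@" | sort
}

# Usage with made-up sample lines (nine whitespace-separated fields each).
printf '%s\n' \
    'a b c d e f g h www.example.com' \
    'a b c d e f g h mail.example.com' \
    'a b c d e f g h www.example.com' \
    'a b c d e f g h not-a-host' | unique_hosts
```

The function also accepts file names as arguments, since "$@" is handed straight to awk. On a 3GB input the array only grows with the number of distinct hostnames, so the final sort has little to do.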