On Friday, June 17, 2011 04:54:09 PM Les Mikesell wrote:
Yes - and then some. You need something to waste those cycles that you save by running awk instead of perl.
Well..... I once wrote an awk script to parse a RealAudio server log to determine simultaneous listeners at a given time. This is harder than it sounds, because at the time RA server (now Helix Server) logged the listener's entry at connection close; you had to work backwards in time based on connected bitrate and the number of bytes logged in the log entry to lay out simultaneous listeners (and thus 'stream hours') for the music licensing organizations' fee calculations.
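For anyone wanting to see the 'work backwards' step spelled out, here is a minimal sketch in Python (not the original awk or perl). The field names (close_time, bytes_sent, bitrate_bps) and the sample numbers are made up for illustration and are not the real RA/Helix log format. It assumes each connection ran at a constant bitrate, so duration is roughly bytes * 8 / bitrate, and it ignores protocol overhead and buffering.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical fields; a real log line would need its own parser.
    close_time: float   # seconds, when the connection closed (and was logged)
    bytes_sent: int     # bytes delivered over the connection
    bitrate_bps: int    # connected bitrate, in bits per second


def session_interval(e):
    """Work backwards from the close time to an estimated (start, end)."""
    duration = e.bytes_sent * 8 / e.bitrate_bps  # assumes constant bitrate
    return e.close_time - duration, e.close_time


def listeners_at(entries, t):
    """Count connections whose estimated interval covers time t."""
    count = 0
    for e in entries:
        start, end = session_interval(e)
        if start <= t < end:
            count += 1
    return count


def peak_and_stream_hours(entries):
    """Sweep over open/close events: peak simultaneous listeners and total stream hours."""
    events = []
    total_seconds = 0.0
    for e in entries:
        start, end = session_interval(e)
        events.append((start, 1))   # connection opens
        events.append((end, -1))    # connection closes
        total_seconds += end - start
    # Tuples sort by time, then by delta, so a close at time t is
    # processed before an open at the same t.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak, total_seconds / 3600


# Made-up sample data: three connections, logged at close.
sample = [
    Entry(close_time=3600, bytes_sent=28_800_000, bitrate_bps=64_000),
    Entry(close_time=5400, bytes_sent=14_400_000, bitrate_bps=32_000),
    Entry(close_time=7200, bytes_sent=7_200_000, bitrate_bps=32_000),
]
peak, hours = peak_and_stream_hours(sample)
print(peak, hours, listeners_at(sample, 2000))
```

Note that a connection only shows up here once its close entry has been read, so the input has to include every log that could contain the close of a connection that was open during the period of interest.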
The awk script took forever (it seemed) to run, and used lots of memory. Even though it seems like you could just take a point in time and use a window on the log file to get the connections active at that time, in practice that missed 'long term' listeners (listeners who connected and left the connection up for sometimes weeks at a time; I think our record was one listener who kept a connection up for two months). So it was really necessary to pull in many logs, and to run the reports multiple times, so that connected users would be counted. Since a connection only appears in the log once it closes, I got in the habit of restarting Helix whenever I needed to do an analysis run so that I could make sure all connections were accounted for.
I used an automated 'awk to perl' translator to get the perl equivalent (meaning it wasn't idiomatic perl, just 'perl in awk mode'), and it was nearly a hundred times faster than the awk script and used one-tenth the memory. I would suspect idiomatic perl would be faster still.
YMMV.