[CentOS-mirror] big_bw: Bandwidth Reporting Script

Thu May 1 19:43:01 UTC 2008
Gilbert E. Detillieux <gedetil at cs.umanitoba.ca>

On 2008-05-01 11:13, greg at raystedman.org wrote:
> Good Morning,
> 
> The discussion about "RayStedman.org Bandwidth" inspired me to write a
> script that reports the largest bandwidth consumers by ip address and
> host name.  The report looks like this:
> 
>   3,867,534,553    66.159.202.142   adsl-66-159-202-142.dslextreme.com.
>   3,847,010,060     190.82.182.19   190-82-182-19.adsl.cust.tie.cl.
>   1,410,308,739   130.160.110.250
>   1,051,088,947     216.57.200.57  
> 
> I'm sure this kind of thing has been done many times in the past by other
> tools. I thought I would post the script I created just in case it might be
> helpful to others on this forum.
> 
> Thanks again for your feedback on this topic.  Greg

Thanks, Greg, for the handy script.  A few optimizations are possible, 
e.g. eliminating the temporary files and reducing the number of 
commands.  There's also a slight bug in your first loop: a 
"$thisipbw $thisip" pair is only written when the ip address changes, 
so the last pair is never output at the end.
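
One way to fix the loop is to write out the pending pair once more after 
the loop ends.  A minimal sketch, assuming the input is already sorted by 
ip address (as bw.raw is); the function name sum_sorted and the sample 
addresses are made up for illustration:

```shell
#!/bin/bash

# Sum bytes per ip from "ip bytes" lines that are already sorted by ip.
# sum_sorted is a hypothetical name, not part of the original script.
sum_sorted() {
  local thisip="" thisipbw=0 ip bw
  while read -r ip bw; do
    # Apache logs "-" when no bytes were sent; count that (or a blank) as 0.
    case "$bw" in
      ''|-) bw=0 ;;
    esac
    if [ "$ip" != "$thisip" ]; then
      # A new ip starts: emit the finished group, but only if there is one.
      if [ -n "$thisip" ]; then
        echo "$thisipbw $thisip"
      fi
      thisip=$ip
      thisipbw=$bw
    else
      thisipbw=$(( thisipbw + bw ))
    fi
  done
  # Flush the final group, which the loop body never gets to print.
  if [ -n "$thisip" ]; then
    echo "$thisipbw $thisip"
  fi
}

# Made-up sample data standing in for bw.raw.
printf '%s\n' '10.0.0.1 100' '10.0.0.1 -' '10.0.0.2 50' '10.0.0.2 25' |
  sum_sorted
```

The check for an empty "$thisip" also avoids the blank first line that 
the original loop writes to bw.sum before it has seen any ip address.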

Here's my slightly obfuscated one-liner, which I think accomplishes more 
or less the same thing as your script, except that it lists the top 20 
consumers rather than the top 35.  (I've broken it into multiple lines, 
with indents, for readability.) ...

cat "$basedir"/access_log{,.processed}|
   cut -d' ' -f1,10|awk '{b[$1]+=$2}END{for(i in b)print b[i],i}'|
   sort -nr|head -20|
while read b i;do
   echo -n "$b"|sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'|
     sed -e :a -e 's/^.\{1,14\}$/ &/;ta';
   echo -n "$i"|sed -e :a -e 's/^.\{1,15\}$/ &/;ta';
   echo -n " ";echo `host "$i"`|sed -e 's/.*)//' -e 's/.*pointer //';
done

I use the "awk" command (and its associative array feature) to total the 
bytes per ip address, which eliminates your first loop, and the sort 
before it, entirely.  The second loop (to format the output) has been 
simplified.  Your use of "sed" to format the numbers and pad the fields 
was very clever (and I copied it pretty much as is).  I really have to 
go back and study all the new regular expression features that have 
been added since the "good old days" when I first picked this up.
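
To show the two idioms in isolation, here is a small sketch run on 
made-up data.  The function names sum_by_key and commify are 
hypothetical, printf "%15s" stands in for the sed padding loop, and the 
"%.0f" in awk is only a precaution, since I believe some awk 
implementations print very large totals in exponent notation:

```shell
#!/bin/bash

# Total the second column per distinct first column.  The associative
# array b is indexed by the key, so the input needs no prior sorting.
# A "-" in the byte column evaluates to 0 in awk arithmetic.
sum_by_key() {
  awk '{ b[$1] += $2 } END { for (i in b) printf "%.0f %s\n", b[i], i }'
}

# Insert thousands separators.  ":a" defines a label and "ta" jumps back
# to it whenever the substitution succeeded, so each pass adds one comma,
# working from right to left, until no run of four digits remains.
commify() {
  sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta'
}

# Made-up "ip bytes" pairs, as produced by cut -d' ' -f1,10.
printf '%s\n' '10.0.0.2 50' '10.0.0.1 1000000' '10.0.0.2 -' '10.0.0.1 234567' |
  sum_by_key | sort -nr |
while read -r b i; do
  # Right-align the formatted total in a 15-character field.
  printf '%15s  %s\n' "$(echo "$b" | commify)" "$i"
done
```

The order of "for (i in b)" is unspecified, which is why the totals still 
go through "sort -nr" afterwards.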

Gilbert

> #!/bin/bash
> 
> # big_bw -- written by Greg Sims 05/01/08
> 
> # this script takes as input apache httpd log files access_log and
> # access_log.processed. a report is generated that contains one line
> # per ip address with the following fields: bandwidth consumed,
> # the ip address and the host name associated with the ip address.
> #
> # it is important to use mod_logio in the creation of the log files
> # to ensure the proper number of bytes are recorded in each log
> # entry.  please see http://www.devside.net/guides/config/bytes-sent
> # for how to accomplish this.
> 
> # directory where access_log and access_log.processed are located
> #
> basedir="/var/www/vhosts/raystedman.net/statistics/logs/"
> 
> # create bw.raw containing the ip address and bandwidth for each record;
> # sort the resulting file by ip address
> #
> cd /tmp
> cat $basedir"access_log" >bw.log
> cat $basedir"access_log.processed" >>bw.log
> 
> cat bw.log | cut -d' ' --field=1,10 | sort >bw.raw
> 
> # read through bw.raw and create bw.sum which contains one line per
> # ip address.  each line in bw.sum contains the amount of bandwidth
> # consumed and the ip address that used the bandwidth
> #
> thisip=""
> rm -f bw.sum
> 
> while read inputline;
> do
>   ip=$(echo "$inputline" | cut -d " " -f 1)
>   bw=$(echo "$inputline" | cut -d " " -f 2)
>   if [ "$bw" = "-" ];
>   then
>     bw=0
>   fi
> 
>   if [ "$thisip" != "$ip" ];
>   then
>     echo $thisipbw $thisip >>bw.sum
>     thisip=$ip
>     thisipbw=$bw
>   else
>     if [ $bw != "-" ];
>     then
>       thisipbw=$(( $thisipbw + $bw ))
>     fi
>   fi
> 
> done < "bw.raw"
> 
> # sort bw.sum so the largest amount of bandwidth used is at the top.
> # create bw.sum.sort which is the largest 35 consumers of bandwidth.
> # write a report to stdout doing some formatting in the process.
> #
> sort -nr bw.sum | head -n 35 >bw.sum.sort
> 
> while read inputline;
> do
>   bw=$(echo "$inputline" | cut -d " " -f 1)
>   bw=$(echo "$bw" | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
>   ip=$(echo "$inputline" | cut -d " " -f 2)
> 
>   echo -n $bw | sed -e :a -e 's/^.\{1,14\}$/ &/;ta'
>   echo -n "  "
>   echo -n $ip | sed -e :a -e 's/^.\{1,15\}$/ &/;ta'
>   echo -n "   "
>   host_name=$(host $ip | sed 's/^.*pointer //' | sed 's/.*DOMAIN)//')
>   host_name=$(echo "$host_name" | sed 's/.*alias for //')
>   echo $host_name
> 
> done <"bw.sum.sort"

-- 
Gilbert E. Detillieux		E-mail:	<gedetil at cs.umanitoba.ca>
Dept. of Computer Science	Web:	http://www.cs.umanitoba.ca/~gedetil/
University of Manitoba		Phone:	(204)474-8161
Winnipeg MB CANADA  R3T 2N2	Fax:	(204)474-7609