[CentOS-mirror] big_bw: Bandwidth Reporting Script

Thu May 1 16:13:17 UTC 2008
greg at raystedman.org <greg at raystedman.org>

Good Morning,

The discussion about "RayStedman.org Bandwidth" inspired me to write a
script that reports the largest bandwidth consumers by IP address and
host name.  The report looks like this:

  3,867,534,553    66.159.202.142   adsl-66-159-202-142.dslextreme.com.
  3,847,010,060     190.82.182.19   190-82-182-19.adsl.cust.tie.cl.
  1,410,308,739   130.160.110.250
  1,051,088,947     216.57.200.57  

I'm sure this kind of thing has been done many times in the past by other
tools. I thought I would post the script I created just in case it might be
helpful to others on this forum.

Thanks again for your feedback on this topic.  Greg

#!/bin/bash

# big_bw -- written by Greg Sims 05/01/08

# this script takes as input apache httpd log files access_log and
# access_log.processed. a report is generated that contains one line
# per ip address with the following fields: bandwidth consumed,
# the ip address and the host name associated with the ip address.
#
# it is important to use mod_logio when creating the log files
# to ensure the proper number of bytes is recorded in each log
# entry.  please see http://www.devside.net/guides/config/bytes-sent
# for how to accomplish this.

# directory where access_log and access_log.processed are located
#
basedir="/var/www/vhosts/raystedman.net/statistics/logs/"

# create bw.raw containing the ip address and bandwidth for each record;
# sort the resulting file by ip address
#
cd /tmp || exit 1
cat "${basedir}access_log" "${basedir}access_log.processed" >bw.log

# field 1 is the ip address, field 10 is the byte count
cut -d' ' -f 1,10 bw.log | sort >bw.raw

# read through bw.raw and create bw.sum which contains one line per
# ip address.  each line in bw.sum contains the amount of bandwidth
# consumed and the ip address that used the bandwidth
#
thisip=""
rm -f bw.sum

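while read -r inputline;
do
  ip=$(echo "$inputline" | cut -d " " -f 1)
  bw=$(echo "$inputline" | cut -d " " -f 2)
  # a "-" in the log means no bytes were sent; count it as zero
  if [ "$bw" = "-" ];
  then
    bw=0
  fi

  if [ "$thisip" != "$ip" ];
  then
    # new ip address: write the finished total, except on the very
    # first record when there is no previous ip address yet
    if [ -n "$thisip" ];
    then
      echo "$thisipbw $thisip" >>bw.sum
    fi
    thisip=$ip
    thisipbw=$bw
  else
    thisipbw=$(( thisipbw + bw ))
  fi

done < "bw.raw"

# write the total for the last ip address, which the loop never flushes
if [ -n "$thisip" ];
then
  echo "$thisipbw $thisip" >>bw.sum
fi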

# sort bw.sum so the largest amount of bandwidth used is at the top.
# create bw.sum.sort which is the largest 35 consumers of bandwidth.
# write a report to stdout doing some formatting in the process.
#
sort -nr bw.sum | head -n 35 >bw.sum.sort

while read inputline;
do
  bw=$(echo "$inputline" | cut -d " " -f 1)
  bw=$(echo "$bw" | sed -e :a -e 's/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/;ta')
  ip=$(echo "$inputline" | cut -d " " -f 2)

  echo -n $bw | sed -e :a -e 's/^.\{1,14\}$/ &/;ta'
  echo -n "  "
  echo -n $ip | sed -e :a -e 's/^.\{1,15\}$/ &/;ta'
  echo -n "   "
  host_name=$(host $ip | sed 's/^.*pointer //' | sed 's/.*DOMAIN)//')
  host_name=$(echo "$host_name" | sed 's/.*alias for //')
  echo $host_name

done <"bw.sum.sort"
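
For anyone adapting the script, the per-address summing step could
probably also be done in a single awk pass instead of the shell loop.
The following is only a sketch under the same assumptions as the script
above (field 1 is the ip address, field 10 is the byte count, and "-"
means no bytes were sent).  The function name sum_bw is made up for this
illustration and is not part of the original script.

```shell
#!/bin/bash

# sum_bw (hypothetical helper): read access-log lines on stdin and print
# one "bytes ip" line per address, largest consumer first.
# Assumes field 1 is the client address and field 10 is the byte count.
sum_bw() {
  awk '{
         # a "-" byte count means nothing was sent; treat it as zero
         bytes = ($10 == "-") ? 0 : $10
         total[$1] += bytes
       }
       END {
         # %.0f avoids integer overflow in awk versions with a 32-bit %d
         for (ip in total) printf "%.0f %s\n", total[ip], ip
       }' | sort -nr
}
```

It would be used in place of the bw.raw and bw.sum steps, for example:
cat access_log access_log.processed | sum_bw | head -n 35 >bw.sum.sort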