[CentOS] Text Processing script - advice?

Tue Dec 21 18:43:33 UTC 2010
Les Mikesell <lesmikesell at gmail.com>

On 12/21/2010 11:30 AM, Roland RoLaNd wrote:
>
> Hello,
>
> I have a log file with the following input:
> X , ID , Date, Time, Y
> 01,01368,2010-12-02,09:07:00,Pass
> 01,01368,2010-12-02,10:54:00,Pass
> 01,01368,2010-12-02,13:07:04,Pass
> 01,01368,2010-12-02,18:54:01,Pass
> 01,01368,2010-12-03,09:02:00,Pass
> 01,01368,2010-12-03,13:53:00,Pass
> 01,01368,2010-12-03,16:07:00,Pass
>
> My goal is to get the number of times ID has a TIME that's after 09:00:00 each DATE.
> That would give me two outputs: first, the number of days ID has been late, and second, the day and time ID was late on each of those days.
>
> I've started as such:
>
> sort -t ','  -k 3,3 -k 4,4  file.log  # this will sort the file according to the DATE field as well as the TIME field.
> I've been stuck for the last 30 minutes trying to find a way to get the first line of each day (logically it will be the earliest, since I've already sorted by date/time). Once I know how to do this, I'll be able to compare the time and proceed.
>
> Can anyone help?
> I looked into sort -u and uniq -f3, though I didn't get far with them.

Most logs are written in append mode, so the entries are normally 
already in ascending date/time order and need no sorting.  This Perl 
should list each late instance and the per-ID count:

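use strict;
use warnings;

my %id_count;   # ID => number of days the ID was late
my %id_date;    # ID => last date already handled for that ID
while (<>) {
    chomp;
    next unless /\S/;                    # skip blank lines
    my ($x, $id, $date, $time) = split /\s*,\s*/;
    next if $x eq 'X';                   # skip header (eq, not ==, for strings)
    # Only the first entry per ID per day counts; this relies on the
    # input being in ascending date/time order.
    next if defined $id_date{$id} && $id_date{$id} eq $date;
    $id_date{$id} = $date;
    next if $time le "09:00:00";         # first entry was on time
    print "$id - $date - $time\n";
    $id_count{$id}++;
}
print "----\n";
while (my ($id, $count) = each %id_count) {
    print "$id late $count days\n";
}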
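
If you would rather stay close to the sort attempt quoted above, the same 
"first entry per ID per day" idea can probably be done with sort and awk. 
This is only a rough sketch: the function name first_late_per_day and the 
CUTOFF variable are made up for illustration, and it assumes the 
comma-separated layout shown in the sample, with no spaces inside the data 
fields.

```shell
# Hypothetical helper: reads the log on stdin, prints "ID - DATE - TIME" for
# every (ID, date) whose earliest entry is after the cutoff, then a per-ID count.
first_late_per_day() {
    cutoff=${CUTOFF:-09:00:00}   # CUTOFF is an assumed, optional override
    # Sort by ID, then date, then time, so each day's earliest entry comes first.
    sort -t ',' -k 2,2 -k 3,3 -k 4,4 |
    awk -F ',' -v cutoff="$cutoff" '
        $1 ~ /^[ \t]*X[ \t]*$/ { next }   # skip the header line
        NF < 4                 { next }   # skip blank or short lines
        seen[$2 FS $3]++       { next }   # not the first entry for this ID and day
        ($4 "") > (cutoff "") {           # forced string compare on HH:MM:SS
            print $2 " - " $3 " - " $4
            late[$2]++
        }
        END {
            print "----"
            for (id in late) print id " late " late[id] " days"
        }'
}
```

It would be called as "first_late_per_day < file.log". The order of the 
per-ID summary lines is whatever awk's "for (id in late)" happens to 
produce, so pipe that part through sort if the order matters.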


-- 
   Les Mikesell
    lesmikesell at gmail.com