[CentOS] Text Processing script - advice?
Les Mikesell
lesmikesell at gmail.com
Tue Dec 21 18:43:33 UTC 2010
On 12/21/2010 11:30 AM, Roland RoLaNd wrote:
>
> Hello,
>
> I have a log file with the following input:
> X , ID , Date, Time, Y
> 01,01368,2010-12-02,09:07:00,Pass
> 01,01368,2010-12-02,10:54:00,Pass
> 01,01368,2010-12-02,13:07:04,Pass
> 01,01368,2010-12-02,18:54:01,Pass
> 01,01368,2010-12-03,09:02:00,Pass
> 01,01368,2010-12-03,13:53:00,Pass
> 01,01368,2010-12-03,16:07:00,Pass
>
> My goal is to find each DATE on which the ID's first TIME is after 09:00:00.
> That would give me two outputs: first, the number of days the ID has been late, and second, the day and time on which the ID was late.
>
> I've started with this:
>
> sort -t ',' -k 3,3 -k 4,4 file.log # this sorts the file by the DATE field and then by the TIME field.
> I've been stuck for the last 30 minutes trying to find a way to get the first line of each day (logically it will be the earliest, since I've already sorted by date/time). Once I know how to do this, I'll be able to compare the time and proceed.
>
> Can anyone help?
> I looked into sort -u and uniq -f3, though I didn't get far with them.
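The "first line of each day" step can also be done in the shell. Note that `uniq -f` counts blank-separated fields, so it cannot skip comma-separated ones, which is likely why `uniq -f3` did not help. A minimal sketch, assuming the X,ID,Date,Time,Y layout shown above, a single header line whose first field starts with "X", and zero-padded HH:MM:SS times; `first_per_day` and `late_days` are hypothetical helper names, not existing tools. The awk idiom `!seen[key]++` is true only the first time a key appears:

```shell
#!/bin/sh
# Sketch only: assumes X,ID,Date,Time,Y lines and a header starting with "X".

# first_per_day FILE: print the earliest line for each ID on each date.
first_per_day() {
    grep -v '^X' "$1" |
        LC_ALL=C sort -t, -k2,2 -k3,3 -k4,4 |   # by ID, then date, then time
        awk -F, '!seen[$2 FS $3]++'             # keep first line per ID+date
}

# late_days FILE: the first entries of each day that are after 09:00:00.
# Zero-padded times compare correctly as strings.
late_days() {
    first_per_day "$1" | awk -F, '$4 > "09:00:00"'
}
```

For example, `late_days file.log` lists the late days with their times, and `late_days file.log | cut -d, -f2 | uniq -c` counts late days per ID (the lines are already grouped by ID because of the sort).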
Most logs are written in append mode, so ascending date/time order comes
naturally. This Perl should list each late day and the count per ID:
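#!/usr/bin/perl
use strict;
use warnings;

my %id_count;   # number of late days per ID
my %id_date;    # last date already seen for each ID
while (<>) {
    chomp;
    my ($x, $id, $date, $time) = split /\s*,\s*/;
    next unless defined $time;         # skip blank or short lines
    next if $x eq 'X';                 # skip header (string compare, not ==)
    # only the first (earliest) entry of each day counts
    next if defined $id_date{$id} && $id_date{$id} eq $date;
    $id_date{$id} = $date;
    next if $time le "09:00:00";       # first entry was on time
    print "$id - $date - $time\n";
    $id_count{$id}++;
}
print "----\n";
for my $id (sort keys %id_count) {
    print "$id late $id_count{$id} days\n";
}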
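The Perl above depends on the input already being in ascending date/time order, as an append-mode log usually is. If that cannot be assumed, one option (a sketch, not part of the original answer) is to keep the earliest time per ID and date in an awk associative array and decide lateness only at the end. `late_report` is a hypothetical name; the output mimics the Perl's format, although awk's `for (key in array)` visits keys in unspecified order:

```shell
#!/bin/sh
# late_report FILE: same report as the Perl, but the input need not be sorted.
# Assumes X,ID,Date,Time,Y lines, a header starting with "X", and
# zero-padded HH:MM:SS times (so string comparison orders them correctly).
late_report() {
    awk -F, '
        $1 ~ /^X/ { next }                        # skip header
        {
            key = $2 FS $3                        # ID,date
            if (!(key in first) || $4 < first[key]) first[key] = $4
        }
        END {
            for (key in first) {
                if (first[key] > "09:00:00") {    # earliest entry was late
                    split(key, p, FS)
                    print p[1] " - " p[2] " - " first[key]
                    count[p[1]]++
                }
            }
            print "----"
            for (id in count) print id " late " count[id] " days"
        }
    ' "$1"
}
```

Usage is simply `late_report file.log`; pipe the first section through `sort` if a stable order matters.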
--
Les Mikesell
lesmikesell at gmail.com