Hello,
I have a log file with the following input:

X , ID , Date, Time, Y
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass
My goal is to get the number of times ID has a TIME that's after 09:00:00 on each DATE. That would give me two outputs: one is the number of days ID has been late, and the second is the day and time this ID was late.
I've started as such:
sort -t ',' -k 3,3 -k 4,4 file.log

This sorts the file by the DATE field and then by the TIME field. I've been stuck for the last 30 minutes trying to find a way to get the first line of each day (logically it'll be the earliest, as I've sorted by date and time). Once I know how to do this, I'll be able to compare the time and proceed.

Can anyone help? I looked into sort -u and uniq -f3, though I didn't get far with either.
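One possible way to get the first line of each day after sorting is sketched below. It is not something posted in the thread: the function name first_per_day is hypothetical, and it assumes the comma-separated X,ID,Date,Time,Y layout shown above with zero-padded dates and times. It sorts by ID, date and time, then lets awk print a line only the first time its ID,date pair is seen.

```shell
# Hypothetical helper: print the earliest entry for each ID and date.
# Assumes fields X,ID,Date,Time,Y and a header line that starts with X.
first_per_day() {
    # Drop the header, then sort by ID, date and time (plain string order)
    grep -v '^X' "$1" |
        sort -t ',' -k 2,2 -k 3,3 -k 4,4 |
        awk -F, '!seen[$2 "," $3]++'    # true only for the first line of each ID,date
}
```

It would be called as first_per_day file.log; the time in the fourth field of each printed line can then be compared against the cutoff.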
If you're not afraid of perl, the Date-Manip module allows comparing time and date, among other things.
On Tue, Dec 21, 2010 at 2:33 PM, lhecking@users.sourceforge.net wrote:
> If you're not afraid of perl, the Date-Manip module allows comparing time and date, among other things.
A dirtier take could be
perl -ne '/,(\d+),(.*),(\d\d):.*/ && ($3>=9) and $s->{$1,$2}++ ; END {use Data::Dumper; print Dumper($s)}' < data

$VAR1 = {
          '01368 2010-12-02' => 4,
          '01368 2010-12-03' => 3
        };
Roland RoLaNd wrote:
[...]
awk 'BEGIN { FS = "," }
{
    if ($4 > "09:00:00") {
        array[$2][1]++;                              # entry count for this ID
        array[$2][array[$2][1] + 1] = $3 "::" $4;    # date::time of this entry
    }
}
END {
    # array[j][k] (arrays of arrays) needs gawk 4.0 or later
    for (j in array) {
        for (k in array[j]) {
            print j, array[j][k];
        }
    }
}' file.log

It's been a while since I needed to do this, but I *think* the nested "for (<var> in array)" will work.
<snip>
mark
First of all, I'd like to apologize to those who helped me by giving advice using perl; I'm ashamed to say that I have no experience with it.

Mark, thanks for your effort in writing the script below. Could you help me understand how it works? The best way to do things is to learn them for future reference.

I'm no expert with awk, so I need your help with the following, if possible:

awk 'BEGIN { FS = "," }    ## BEGIN triggers the commands that follow to be executed in awk, with , as field delimiter
{
    if ($4 > "09:00:00") {                           # condition that matches 09 am
        array[$2][1]++;                              # incrementing count by one, though I am a bit at a loss with "array"
        array[$2][array[$2][1] + 1] = $3 "::" $4;    # could not figure this one out
    }
}
END {
    for (j in array) {
        for (k in array[j]) {
            print j, array[j][k];                    # prints out what exactly?
        }
    }
}' file.log
----------------------------------------
Date: Tue, 21 Dec 2010 12:58:33 -0500
From: m.roth@5-cent.us
To: centos@centos.org
Subject: Re: [CentOS] Text Proccessing script - advice?
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Tue, Dec 21, 2010 at 08:30:43PM +0200, Roland RoLaNd wrote:
(chuckle) That's a bit more verbose than necessary. As a one-liner:
awk -F, '($4>"09:00:00"){c[$2 "," $3]++};END{for (i in c){print i "," c[i]}}' $filename
01368,2010-12-02,4
01368,2010-12-03,3
(You might check if you want >="09:00:00", and include the edge case.)
-F,                     # set separator to comma
                        # (automatic loop over all data lines)
($4>"09:00:00"){        # do if fourth field greater than 09:...
  c[$2 "," $3]++        # increment hash element pointed to by
                        # second and third fields separated by comma
                        # (that is, hash on id,date)
}
END{                    # after finishing the data
  for (i in c){         # for each observed hash key in array c
    print i "," c[i]    # print the hash key, comma, count
  }
}
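The edge case mentioned above can be made explicit by passing the cutoff in with -v instead of hard-coding it. This is only a sketch: count_after is a hypothetical name, and the comparison relies on the times being zero-padded HH:MM:SS strings, so that string order matches time order.

```shell
# Hypothetical helper: count entries per ID,date at or after a cutoff time.
# Usage: count_after CUTOFF FILE   (CUTOFF as zero-padded HH:MM:SS)
count_after() {
    awk -F, -v cutoff="$1" '
        $4 >= cutoff { c[$2 "," $3]++ }         # use > to exclude the cutoff itself
        END { for (i in c) print i "," c[i] }
    ' "$2" | sort                               # for (i in c) order is unspecified
}
```

With >= an entry stamped exactly 09:00:00 is counted; switching to > reproduces the one-liner's behaviour. The header line is skipped only because " Time" happens to sort before any digit string, so a stricter check would be safer on messier input.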
John Lundin wrote:
> (chuckle) That's a bit more verbose than necessary. As a one-liner:
>
> awk -F, '($4>"09:00:00"){c[$2 "," $3]++};END{for (i in c){print i "," c[i]}}' $filename
Well, yes, but he also wanted a count....
mark
Thanks to your help i've reached this step:
original data:
01,01368,2010-12-02,09:07:00,Pass
01,01368,2010-12-02,10:54:00,Pass
01,01368,2010-12-02,13:07:04,Pass
01,01368,2010-12-02,18:54:01,Pass
01,01368,2010-12-03,09:02:00,Pass
01,01368,2010-12-03,13:53:00,Pass
01,01368,2010-12-03,16:07:00,Pass
awk -F , '{if ($4 > "09:10:00") print $2 " was late on", $3 " by coming at ",$4}' test | tee DaysLate ; wc -l DaysLate
01368 was late on 2010-12-02 by coming at 10:54:00
01368 was late on 2010-12-02 by coming at 13:07:04
01368 was late on 2010-12-02 by coming at 18:54:01
01368 was late on 2010-12-03 by coming at 13:53:00
01368 was late on 2010-12-03 by coming at 16:07:00
5 DaysLate
The only thing missing is to find a way to take just the earliest time of each day.

In other words, the above output should be:

0 DaysLate

because on 12-02 he came in at 09:07, which is before 09:10, and on 12-03 he came in at 09:02, which is also before the set time.
----------------------------------------
Date: Tue, 21 Dec 2010 14:35:13 -0500
From: m.roth@5-cent.us
To: centos@centos.org
Subject: Re: [CentOS] Text Proccessing script - advice?
On 12/21/2010 1:40 PM, Roland RoLaNd wrote:
> awk -F , '{if ($4> "09:10:00") print $2 " was late on", $3 " by coming at ",$4}' test | tee DaysLate ; wc -l DaysLate
>
> 01368 was late on 2010-12-02 by coming at 10:54:00
> 01368 was late on 2010-12-02 by coming at 13:07:04
> 01368 was late on 2010-12-02 by coming at 18:54:01
> 01368 was late on 2010-12-03 by coming at 13:53:00
> 01368 was late on 2010-12-03 by coming at 16:07:00
> 5 DaysLate
On my calendar 12-02 and 12-03 are only 2 days...
Exactly, hence:
[quote] the only thing missing is to find a way to just take the earliest time of each day.
in other words the above output should be:
0 DaysLate
[/quote]
----------------------------------------
Date: Tue, 21 Dec 2010 13:54:41 -0600
From: lesmikesell@gmail.com
To: centos@centos.org
Subject: Re: [CentOS] Text Proccessing script - advice?
-- Les Mikesell lesmikesell@gmail.com
On 12/21/2010 1:58 PM, Roland RoLaNd wrote:
> the only thing missing is to find a way to just take the earliest time of each day.
>
> in other words the above output should be:
>
> 0 DaysLate
That means my perl script was wrong... This looks more like what you want, except for your last change to 9:10.
use strict;
use warnings;

my %id_count;       # number of late days per ID
my %id_date;        # date already seen
my %iddate_time;    # 1st time each day

while (<>) {
    chomp;
    my ($x, $id, $date, $time, $junk) = split /,/;
    next unless defined $time;                  # skip blank or short lines
    next if $x =~ /^\s*X\s*$/;                  # skip header
    # store earliest (the first seen, so input must be in time order)
    $iddate_time{$id . $date} = $time unless $iddate_time{$id . $date};
    next if $time le "09:00:00";                        # not late
    next if $iddate_time{$id . $date} le "09:00:00";    # 1st wasn't late
    next if exists $id_date{$id} && $id_date{$id} eq $date;    # already counted today
    print "Late: $id - $date - $time\n";
    $id_count{$id}++;
    $id_date{$id} = $date;
}
print "----\n";
while (my ($id, $count) = each %id_count) {
    print "$id late $count days\n";
}
On Tue, Dec 21, 2010 at 09:40:42PM +0200, Roland RoLaNd wrote:
> the only thing missing is to find a way to just take the earliest time of each day.

You may use mktime(datespec), a gawk extension (see man awk), to convert date and time into comparable integers.
Mihai
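A minimal sketch of the mktime() idea follows, assuming gawk is installed (mktime() is not part of POSIX awk) and the X,ID,Date,Time,Y layout from the original post. The function name seconds_after_nine is hypothetical. For zero-padded timestamps a plain string comparison already orders correctly, so the integer form is mostly useful when arithmetic is wanted, such as how many seconds late an entry is.

```shell
# Hypothetical helper: seconds between each entry and 09:00:00 on the same day.
# Requires gawk: mktime() is a gawk extension.
seconds_after_nine() {
    gawk -F, '
        $3 ~ /^[0-9]+-[0-9]+-[0-9]+$/ {             # skips the header line
            d = $3; gsub(/-/, " ", d)               # 2010-12-02 -> 2010 12 02
            t = $4; gsub(/:/, " ", t)               # 09:07:00   -> 09 07 00
            entry = mktime(d " " t)                 # datespec: YYYY MM DD HH MM SS
            nine  = mktime(d " 09 00 00")
            print $2 "," $3 "," $4 "," (entry - nine)
        }
    ' "$1"
}
```

A positive last field means the entry came after 09:00:00, a negative one means it came before. Both timestamps are converted in the local time zone, so only their difference is meaningful here.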
On Tue, Dec 21, 2010 at 02:35:13PM -0500, m.roth@5-cent.us wrote:
> Well, yes, but he also wanted a count....
Oh, lord, it's worse than that. I was solving the wrong problem. (And still am if he really wanted a count of after-nine entries.)
Once again with awk one-liners:
awk -F, '{k=$2 "," $3};(!e[k]||($4<e[k])){e[k]=$4}\
;END{for (i in e){if (e[i]>"09:00:00"){print i "," e[i]}}}' infile \
|tee latedays\
|awk -F, '{c[$1]++};END{for (i in c){print i "," c[i]}}' >latecounts

01368,2010-12-02,09:07:00
01368,2010-12-03,09:02:00
01368,2
You may now wince.
If earliest time seen for user and date is undefined or if this time is less, then set earliest time to this time. After all processed, print out the user, date and time if it's later than 09:00:00.
Second awk script just counts lines reported above, by user.
(I usually switch to perl or at least a bash script file before it gets this unreadable. And add some sanity testing.)
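For readers who find the one-liner hard to follow, the same two steps could be spread over a small script with a basic sanity check. This is a sketch only: late_days and late_counts are hypothetical names, the cutoff argument is an addition, and the check assumes five comma-separated fields with a zero-padded HH:MM:SS time in the fourth.

```shell
# Hypothetical helper: list ID,date,time for days whose earliest entry is after the cutoff.
# Usage: late_days FILE [CUTOFF]   (CUTOFF defaults to 09:00:00)
late_days() {
    awk -F, -v cutoff="${2:-09:00:00}" '
        # sanity check: need 5 fields and an HH:MM:SS time (this also skips the header)
        NF != 5 || $4 !~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/ { next }
        {
            k = $2 "," $3                                        # key on id,date
            if (!(k in first) || $4 < first[k]) first[k] = $4    # keep the earliest time
        }
        END {
            for (k in first)
                if (first[k] > cutoff) print k "," first[k]
        }
    ' "$1" | sort                                                # stable output order
}

# Hypothetical helper: count the lines reported by late_days, per ID.
late_counts() {
    late_days "$@" |
        awk -F, '{ c[$1]++ } END { for (id in c) print id "," c[id] }' | sort
}
```

Called as late_days infile and late_counts infile, this should match the latedays and latecounts files above for the sample data; late_days infile 09:10:00 prints nothing, which corresponds to the "0 DaysLate" result asked for earlier in the thread.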
John Lundin wrote:
> Once again with awk one-liners:
Why? What do you have against more-than-one-line awk scripts?
asks the guy who's written 100 and 200 line awk scripts....
On 12/21/2010 11:30 AM, Roland RoLaNd wrote:
[...]
Most logs are written in append mode so ascending date/time comes naturally. This perl should list each instance and the count:
use strict;
use warnings;

my %id_count;    # number of late days per ID
my %id_date;     # date already seen

while (<>) {
    my ($x, $id, $date, $time) = split /,/;
    next unless defined $time;           # skip blank or short lines
    next if $x =~ /^\s*X\s*$/;           # skip header
    next if $time le "09:00:00";         # not late
    next if exists $id_date{$id} && $id_date{$id} eq $date;    # already counted today
    $id_date{$id} = $date;
    print "$id - $date - $time\n";
    $id_count{$id}++;
}
print "----\n";
while (my ($id, $count) = each %id_count) {
    print "$id late $count days\n";
}