I have a huge mysqld.log file full of errors. I'd like to sort it by the most common line and work from there. I went through the manpage for sort and googled a bit, but I found nothing relevant.
Here is an example of the output:

[root@ log]# tail mysqld.log
110925 11:05:35 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it
110925 11:05:35 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it
110925 12:05:28 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 12:05:28 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 12:05:28 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 12:05:28 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it
110925 13:09:43 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 13:09:43 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 13:09:43 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
110925 13:09:43 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it
[root@ log]# wc -l mysqld.log
20686 mysqld.log
[root@ log]# cat mysqld.log | grep ERROR | wc -l
20332
[root@ log]#
Is there a way to get the most common (unique) lines of the file?
By the way, I'm not sure if this is RHEL or CentOS, or which version:

[root@ log]# uname -a
Linux example.com 2.6.18-194.32.1.el5xen #1 SMP Wed Jan 5 18:44:24 EST 2011 x86_64 x86_64 x86_64 GNU/Linux
[root@ log]# uname -o
GNU/Linux
[root@ log]#
I assume that it is one of these, as Yum is installed. How would I find out?
Thanks!
On 09/25/11 11:51 AM, Dotan Cohen wrote:
...
110925 13:09:43 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it
[root@ log]# wc -l mysqld.log
20686 mysqld.log
[root@ log]# cat mysqld.log | grep ERROR | wc -l
20332
[root@ log]#
Is there a way to get the most common (unique) lines of the file?
sort -k 3 | uniq -f 2
which will sort starting at field 3 and then print the unique lines, skipping the first 2 fields when comparing; fields are blank-separated by default.
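For example, run against the log file in question:

sort -k 3 mysqld.log | uniq -f 2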
On Sun, Sep 25, 2011 at 22:06, John R Pierce pierce@hogranch.com wrote:
Is there a way to get the most common (unique) lines of the file?
sort -k 3 | uniq -f 2
which will sort starting at field 3 and then print the unique lines, skipping the first 2 fields when comparing; fields are blank-separated by default.
Thanks, John. It looks to me like this will sort alphabetically, not by commonness. For instance:

ERROR b
ERROR a
ERROR b
Since "ERROR b" was reported more often than "ERROR a", I would prefer that the output be: ERROR b ERROR a
I'm sorry for not making that so clear! Is there a good word for "most common" or "used most often" that would be concise in this context?
Thanks!
On 09/25/11 12:18 PM, Dotan Cohen wrote:
On Sun, Sep 25, 2011 at 22:06, John R Pierce pierce@hogranch.com wrote:
Is there a way to get the most common (unique) lines of the file?
sort -k 3 | uniq -f 2
which will sort starting at field 3 and then print the unique lines, skipping the first 2 fields when comparing; fields are blank-separated by default.
Thanks, John. It looks to me like this will sort alphabetically, not by commonness. For instance:

ERROR b
ERROR a
ERROR b
Since "ERROR b" was reported more often than "ERROR a", I would prefer that the output be: ERROR b ERROR a
I'm sorry for not making that so clear! Is there a good word for "most common" or "used most often" that would be concise in this context?
uniq can count occurrences. It will require two sorts: one to get all similar errors adjacent, the other to sort by count. Instead of using field selects, let's just clip the timestamps off up front...
cut -c 17- | sort | uniq -c | sort -rn
(17- means from char 17 on... I may have miscounted)
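Run against just the ten sample lines you posted (saved here in a hypothetical mysqld.log.sample), it would give:

[root@ log]# cut -c 17- mysqld.log.sample | sort | uniq -c | sort -rn
      6 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_intermediate_ad.MYI'; try to repair it
      4 [ERROR] /usr/libexec/mysqld: Incorrect key file for table './ox_data_summary_ad_hourly.MYI'; try to repair it

The counts for the full 20k-line log will of course be much larger.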
On Sun, Sep 25, 2011 at 22:43, John R Pierce pierce@hogranch.com wrote:
uniq can count occurrences. It will require two sorts: one to get all similar errors adjacent, the other to sort by count. Instead of using field selects, let's just clip the timestamps off up front...
cut -c 17- | sort | uniq -c | sort -rn
(17- means from char 17 on... I may have miscounted)
Thank you John! That is perfect! I'm going through the uniq manpage now. Have a great night!
On Sun, 25 Sep 2011 21:51:51 +0300 Dotan Cohen wrote:
Is there a way to get the most common (unique) lines of the file?
If you want what I think you want, a combination of cut and sort will do it.
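Something along these lines should work, assuming the timestamp is always the first two blank-separated fields (uniq -c does the actual counting):

cut -d' ' -f3- mysqld.log | sort | uniq -c | sort -rn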
By the way, I'm not sure if this is RHEL or CentOS, or which version. I assume that it is one of these, as Yum is installed. How would I find out?
cat /etc/redhat-release
On Sun, Sep 25, 2011 at 22:10, Frank Cox theatre@sasktel.net wrote:
Is there a way to get the most common (unique) lines of the file?
If you want what I think you want, a combination of cut and sort will do it.
Neither seems to have the "most common line" ability built in. I might have to resort to Perl, or just attack the logfile errors at random!
cat /etc/redhat-release
Thanks! It is more up to date than I thought!
[root@gastricsleeve html]# cat /etc/redhat-release
CentOS release 5.5 (Final)
On Sun, Sep 25, 2011 at 10:21:11PM +0300, Dotan Cohen wrote:
Thanks! It is more up to date than I thought!
[root@gastricsleeve html]# cat /etc/redhat-release
CentOS release 5.5 (Final)
Actually, you are two full point releases behind; the current release is 5.7. I would strongly suggest that you update.
John
On Sun, Sep 25, 2011 at 23:34, John R. Dennison jrd@gerdesas.com wrote:
Actually you are 2 full point releases behind; current is 5.7. I would strongly suggest you update.
Thanks. I will mention that to the sysadmin.