To be more specific, I need to find how many distinct records there are in, say, column #1.
awk '{print $1}' filename | sort -u | wc -l
This shows how many unique entries are present in column one. Use awk -F to change the delimiter, e.g. awk -F ":" for a colon delimiter.
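As a quick sketch of the delimiter case: the file name sample_colon.txt and its contents below are made up purely for illustration, so substitute your own file and separator.

```shell
# Build a small, made-up colon-delimited sample file (hypothetical data).
printf '%s\n' 'alice:10' 'bob:20' 'alice:30' 'carol:40' > sample_colon.txt

# Print column one using ":" as the field separator, drop duplicates,
# and count the lines that remain.
distinct=$(awk -F ":" '{print $1}' sample_colon.txt | sort -u | wc -l)
echo "$distinct"
```

Here "alice" appears twice but is counted once, so the result is the number of distinct keys rather than the number of lines.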
How can I filter out the distinct records whose number of occurrences is less than a predetermined threshold?
I don't quite understand this part.
awk '{print $1}' filename | sort | uniq -c | sort -rn
This gives you the number of occurrences of each unique value in column one, sorted numerically in descending order.
Now I think you want to put that through a loop and show only those whose count is less than the threshold?
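If that is what you are after, a loop may not even be needed, since a second awk can compare the count that uniq -c puts in front of each value. A rough sketch, assuming a made-up file sample_counts.txt and a threshold of 2 (both invented here, not taken from your data):

```shell
# Made-up sample data: one record per line, key in column one.
printf '%s\n' 'alice 1' 'bob 2' 'alice 3' 'carol 4' 'bob 5' 'alice 6' > sample_counts.txt

threshold=2   # hypothetical cut-off; pick your own value

# uniq -c prefixes each distinct value with its count, so in the second awk
# $1 is the count and $2 is the value. Keep only counts below the threshold.
rare=$(awk '{print $1}' sample_counts.txt | sort | uniq -c | awk -v t="$threshold" '$1 < t {print $2}')
echo "$rare"
```

Change `$1 < t` to `$1 >= t` if you want to drop the rare values and keep the frequent ones instead.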
Thanks,
Sheraz
------Original Message------
From: sheraznaz@yahoo.com
Sender: centos-bounces@centos.org
To: CentOS mailing list
ReplyTo: CentOS mailing list
Subject: Re: [CentOS] Text file manipulation in CentOS?
Sent: May 11, 2010 1:14 AM
Can you post some sample input and the expected result?
Sent from my Verizon Wireless BlackBerry
-----Original Message-----
From: hadi motamedi <motamedi24@gmail.com>
Date: Tue, 11 May 2010 09:09:23
To: CentOS mailing list <centos@centos.org>
Subject: [CentOS] Text file manipulation in CentOS?
_______________________________________________ CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos