[CentOS] Text file manipulation in CentOS?
Dominik Zyla
gavroche at gavroche.pl
Tue May 11 17:29:45 UTC 2010
On Tue, May 11, 2010 at 08:25:43AM +0000, sheraznaz at yahoo.com wrote:
> >>To be more specific, I need to find how many distinct records there are in, say, column 1?
>
> awk '{print $1}' filename | sort -u | wc -l
>
> This will show how many unique entries are present in column one (use awk -F to change the delimiter, e.g. awk -F ":" for a colon delimiter).
>
> >> How can I filter out the distinct records with a number of occurrences less than a predetermined threshold?
>
> I don't quite understand this part.
>
> awk '{print $1}' filename | sort | uniq -c | sort -rn
>
> This will give you the number of occurrences (sorted in reverse numerical order) of each unique value in column one.
>
> Now I think you want to put that through a loop and only show those that are less than the threshold?
If I understand correctly, you can pipe your output to:

awk '{a=$1} {if (a > 3) print a}'

Here `a' is an awk variable and 3 stands in for your threshold. `$1' is
the first column of awk's input, so you will probably need to change it
to match your data.
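
Putting the pieces from this thread together, a minimal sketch might look
like the following. The sample data, the temporary file and the
`threshold' variable are all made up for illustration, and I am assuming
whitespace-separated columns and that you want to keep the values that
occur at least `threshold' times (flip the comparison if you want the
rare ones instead).

```shell
# Hypothetical sample data; your real file and column layout will differ.
tmpfile=$(mktemp)
printf '%s\n' 'alice 10' 'bob 20' 'alice 30' 'carol 40' 'alice 50' 'bob 60' > "$tmpfile"

threshold=2   # hypothetical cut-off, adjust as needed

# Count how often each value of column 1 occurs, then keep only the
# values whose count is at least the threshold.
# After `uniq -c', field 1 is the count and field 2 is the value.
result=$(awk '{print $1}' "$tmpfile" | sort | uniq -c |
    awk -v min="$threshold" '$1 >= min {print $2, $1}')

printf '%s\n' "$result"
rm -f "$tmpfile"
```

With this sample data, alice occurs three times, bob twice and carol
once, so only alice and bob are printed, each followed by its count.
Passing the threshold in with `awk -v' avoids editing the awk program
every time the cut-off changes.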
--
Dominik Zyla