On 2009-10-28 18:09, Truejack wrote:
Need some scripting help to sort a list and print all the duplicate lines.
My data looks something like this:
host6:dev406mum.dd.mum.test.com:22:11:11:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host17:dev258mum.dd.mum.test.com:31:17:19:no
host12:dev258mum.dd.mum.test.com:41:17:19:no
host2:dev258mum.dd.mum.test.com:36:17:19:no
host4:dev258mum.dd.mum.test.com:41:17:19:no
host4:dev258mum.dd.mum.test.com:45:17:19:no
host4:dev258mum.dd.mum.test.com:36:17:19:no
I need to sort this list and print all the lines where column 3 has a duplicate entry.
I need to print the whole line if a duplicate entry exists in column 3.
I tried using a combination of "sort" and "uniq" but was not successful.
A long time ago (when I was still young and beautiful), having also run into the limitations of "uniq", I wrote a small C program to do these kinds of things. It is designed to handle record-oriented data in groups, similar to uniq. Its primary purpose was as a preprocessor to awk/perl, but simple things like this are built in. You can find it here:
ftp://ftp.xplanation.com/utils/by-src.zip
Unpack it, run make, and copy the program "by" somewhere in your PATH.
Then, to solve your problem do:
sort -t: -k 3 InputFile | by -F: -f3 -D
This sorts the input on field 3, with fields separated by colons, and outputs all lines that are duplicates according to field 3 (-D).
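On your sample data, the field-3 values 36 and 41 each occur more than once, so (assuming "by" echoes the matching lines unchanged) you should get something like:

host2:dev258mum.dd.mum.test.com:36:17:19:no
host4:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host12:dev258mum.dd.mum.test.com:41:17:19:no
host4:dev258mum.dd.mum.test.com:41:17:19:no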
The program can do more as well, and a little tutorial is included in the zip.
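If you would rather stick to standard tools, a two-pass awk one-liner does the same job. This is just a sketch, assuming one record per line and that your data sits in a file named InputFile:

awk -F: 'NR==FNR { count[$3]++; next } count[$3] > 1' InputFile InputFile | sort -t: -k3,3

The first pass counts the occurrences of each field-3 value (NR==FNR is true only while reading the first copy of the file); the second pass prints every line whose field-3 value was seen more than once, and the final sort groups the duplicates together.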