[CentOS] Scripting help please....
m.roth at 5-cent.us
Wed Oct 28 20:17:05 UTC 2009
> m.roth at 5-cent.us wrote:
>>> Need a scripting help to sort out a list and list all the duplicate
>>> lines.
>>>
>>> My data looks something like this:
>>>
>>> host6:dev406mum.dd.mum.test.com:22:11:11:no
>>> host7:dev258mum.dd.mum.test.com:36:17:19:no
>>> host7:dev258mum.dd.mum.test.com:36:17:19:no
>>> host17:dev258mum.dd.mum.test.com:31:17:19:no
>>> host12:dev258mum.dd.mum.test.com:41:17:19:no
>>> host2:dev258mum.dd.mum.test.com:36:17:19:no
>>> host4:dev258mum.dd.mum.test.com:41:17:19:no
>>> host4:dev258mum.dd.mum.test.com:45:17:19:no
>>> host4:dev258mum.dd.mum.test.com:36:17:19:no
>>>
>>> I need to sort this list and print all the lines where column 3 has a
>>> duplicate entry.
>>>
>>> I need to print the whole line if a duplicate entry exists in
>>> column 3.
>>>
>>> I tried using a combination of "sort" and "uniq" but was not
>>> successful.
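(Aside: `sort`/`uniq` alone can't do this, because `uniq -d` compares whole lines rather than a single field. A two-pass awk — count each field-3 value on the first read, print lines whose value occurred more than once on the second — is one alternative; a sketch, with `hosts.txt` as an assumed file name:)

```shell
# Sample data from the question, in an assumed file hosts.txt.
cat > hosts.txt <<'EOF'
host6:dev406mum.dd.mum.test.com:22:11:11:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host17:dev258mum.dd.mum.test.com:31:17:19:no
host12:dev258mum.dd.mum.test.com:41:17:19:no
host2:dev258mum.dd.mum.test.com:36:17:19:no
host4:dev258mum.dd.mum.test.com:41:17:19:no
host4:dev258mum.dd.mum.test.com:45:17:19:no
host4:dev258mum.dd.mum.test.com:36:17:19:no
EOF

# Pass 1 (NR==FNR): count occurrences of each field-3 value.
# Pass 2: print every line whose field-3 value occurred more than once.
awk -F: 'NR==FNR { n[$3]++; next } n[$3] > 1' hosts.txt hosts.txt
```

This preserves the original line order and needs no sort, at the cost of reading the file twice.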
>>
>> list.awk:
>> BEGIN {
>>    FS = ":";
>> }
>> {
>>    if ( $3 == last ) {
>>       print $0;
>>    }
>>    last = $3;
>> }
>>
>> sort <file> | awk -f list.awk
>>
>> mark "*how* long an awk script would you like?"
>
> This doesn't print the first of the duplicates. Also, the question
> wasn't clear as to whether every line with a matching 3rd field should
> be printed, or only lines where the other fields matched as well (but
> the sort options could control that).
Oh, sorry:
BEGIN {
   FS = ":";
}
{
   if ( $3 == last ) {
      if ( first == 0 ) {
         print saved;
         first++;
      }
      print $0;
   } else {
      first = 0;
      last = $3;
      saved = $0;
   }
}
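(As a sanity check, here is the corrected script run against the sample data from the question; `hosts.txt` and `list.awk` are assumed file names, and `sort -t: -k3,3n` groups the lines explicitly on the third field rather than relying on whole-line order:)

```shell
# Sample data from the question, in an assumed file hosts.txt.
cat > hosts.txt <<'EOF'
host6:dev406mum.dd.mum.test.com:22:11:11:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host17:dev258mum.dd.mum.test.com:31:17:19:no
host12:dev258mum.dd.mum.test.com:41:17:19:no
host2:dev258mum.dd.mum.test.com:36:17:19:no
host4:dev258mum.dd.mum.test.com:41:17:19:no
host4:dev258mum.dd.mum.test.com:45:17:19:no
host4:dev258mum.dd.mum.test.com:36:17:19:no
EOF

# The corrected script from above, saved as list.awk.
cat > list.awk <<'EOF'
BEGIN {
   FS = ":";
}
{
   if ( $3 == last ) {
      if ( first == 0 ) {
         print saved;
         first++;
      }
      print $0;
   } else {
      first = 0;
      last = $3;
      saved = $0;
   }
}
EOF

# -t: sets ':' as sort's field separator; -k3,3n sorts numerically on
# field 3 only, so lines with duplicate third fields end up adjacent.
sort -t: -k3,3n hosts.txt | awk -f list.awk
```

With the sample data this prints the four lines whose third field is 36 and the two whose third field is 41, including the first occurrence of each.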
mark "did I mention that I've written 100-200 line awk scripts?"