[CentOS] Scripting help please....

m.roth at 5-cent.us m.roth at 5-cent.us
Wed Oct 28 20:17:05 UTC 2009


> m.roth at 5-cent.us wrote:
>>> Need a scripting help to sort out a list and list all the duplicate
>>> lines.
>>>
>>> My data looks something like this
>>>
>>> host6:dev406mum.dd.mum.test.com:22:11:11:no
>>> host7:dev258mum.dd.mum.test.com:36:17:19:no
>>> host7:dev258mum.dd.mum.test.com:36:17:19:no
>>> host17:dev258mum.dd.mum.test.com:31:17:19:no
>>> host12:dev258mum.dd.mum.test.com:41:17:19:no
>>> host2:dev258mum.dd.mum.test.com:36:17:19:no
>>> host4:dev258mum.dd.mum.test.com:41:17:19:no
>>> host4:dev258mum.dd.mum.test.com:45:17:19:no
>>> host4:dev258mum.dd.mum.test.com:36:17:19:no
>>>
>>> I need to sort this list and print all the lines where column 3 has a
>>> duplicate entry.
>>>
>>> I need to print the whole line, if a duplicate entry exists in column
>>> 3.
>>>
>>> I tried using a combination of "sort" and "uniq" but was not
>>> successful.
>>
>> list.awk
>> BEGIN {
>>    FS=":";
>> }
>> {  if ( $3 == last ) {
>>
>>       print $0;
>>    }
>>    last = $3;
>> }
>>
>> sort -t: -k3,3 <file> | awk -f list.awk
>>
>>      mark "*how* long an awk script would you like?"
>
> This doesn't print the first of the duplicates.  Also, the question
> wasn't clear as to whether every line with matching 3rd fields should be
> printed or just ones where the others or previous fields matched (but
> the sort options could control that).

Oh, sorry:
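BEGIN {
   FS=":";
}
{  if ( $3 == last ) {
      # Same column 3 as the previous line, so this is a duplicate.
      if ( first == 0 ) {
         # Print the saved first line of this group, once only.
         print saved;
         first++;
      }
      print $0;
   }
   else {
      # New column 3 value: remember the line in case a duplicate follows.
      first = 0;
      last = $3;
      saved = $0;
   }
}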
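
For reference, a minimal sketch of running this end to end on the sample
data from the original post. It assumes the list is sorted on column 3
first; a plain sort orders by the whole line, so equal column-3 values
would not necessarily end up next to each other. The function name
find_dups and the NR > 1 guard are additions for illustration, not part of
the script above, and LC_ALL=C is only there to make the ordering
predictable.

```shell
# find_dups (hypothetical helper name) reads the list on stdin and prints
# every line whose third colon-separated field also occurs on another line.
find_dups() {
    # Sort on field 3 only, so lines with equal column 3 become adjacent.
    LC_ALL=C sort -t: -k3,3 | awk -F: '
        # Same column 3 as the previous line: part of a duplicate group.
        NR > 1 && $3 == last {
            if (!shown) { print saved; shown = 1 }   # first line of the group, once
            print
            next
        }
        # New column 3 value: remember the line in case a duplicate follows.
        { shown = 0; last = $3; saved = $0 }
    '
}

find_dups <<'EOF'
host6:dev406mum.dd.mum.test.com:22:11:11:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host7:dev258mum.dd.mum.test.com:36:17:19:no
host17:dev258mum.dd.mum.test.com:31:17:19:no
host12:dev258mum.dd.mum.test.com:41:17:19:no
host2:dev258mum.dd.mum.test.com:36:17:19:no
host4:dev258mum.dd.mum.test.com:41:17:19:no
host4:dev258mum.dd.mum.test.com:45:17:19:no
host4:dev258mum.dd.mum.test.com:36:17:19:no
EOF
```

With that input it should print the four lines that have 36 in column 3
and the two that have 41, and skip the 22, 31 and 45 lines.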

        mark "did I mention that I've written 100-200 line awk scripts?"



