[CentOS] localhost/PING is CRITICAL

Sat Sep 5 12:48:37 UTC 2009
A. Kirillov <nevis2us at infoline.su>

> I've recently switched to the latest rt kernel
> available from http://people.centos.org/hughesjr/kernel-rt
> And one of the two problems I'm having with this new kernel
> is that the logs have been flooded with nagios alerts
> which indicate high percentage (50-60) of packet loss
> when pinging localhost. The problem shows up several hours
> after the system reboot.

The problem also shows up with the latest kernel-rt-2.6.24.7-132
and seems to be triggered by scheduled cron jobs. That's probably
the only time when this system might be under any significant load.
Below are the relevant excerpts from /var/log/messages showing when the
system booted and into which kernel, followed by the first 10 ping alerts
after each reboot.

Aug 28 20:03:40 angara kernel: Linux version 2.6.24.7-65.el5rt.centos

Aug 29 02:12:54 PING CRITICAL - Packet loss = 60%, RTA = 0.13 ms
Aug 29 02:13:44 PING WARNING - Packet loss = 44%, RTA = 0.09 ms
Aug 29 02:14:54 PING WARNING - Packet loss = 54%, RTA = 0.08 ms
Aug 29 02:15:45 PING WARNING - Packet loss = 44%, RTA = 0.09 ms
Aug 29 03:10:54 PING CRITICAL - Packet loss = 70%, RTA = 0.11 ms
Aug 29 03:20:46 PING WARNING - Packet loss = 44%, RTA = 0.09 ms
Aug 29 03:30:56 PING CRITICAL - Packet loss = 72%, RTA = 0.09 ms
Aug 29 03:35:45 PING WARNING - Packet loss = 44%, RTA = 0.12 ms
Aug 29 03:40:44 PING OK - Packet loss = 16%, RTA = 0.10 ms
Aug 29 03:45:44 PING WARNING - Packet loss = 50%, RTA = 0.09 ms
...

Aug 29 11:30:56 angara kernel: Linux version 2.6.24.7-65.el5rt.centos

Aug 31 03:43:13 PING WARNING - Packet loss = 28%, RTA = 0.10 ms
Aug 31 03:44:13 PING WARNING - Packet loss = 37%, RTA = 0.10 ms
Aug 31 03:45:23 PING WARNING - Packet loss = 54%, RTA = 0.08 ms
Aug 31 03:46:23 PING CRITICAL - Packet loss = 60%, RTA = 0.12 ms
Aug 31 03:51:13 PING WARNING - Packet loss = 44%, RTA = 0.10 ms
Aug 31 04:31:23 PING CRITICAL - Packet loss = 60%, RTA = 0.09 ms
Aug 31 04:41:14 PING WARNING - Packet loss = 50%, RTA = 0.11 ms
Aug 31 06:16:13 PING OK - Packet loss = 16%, RTA = 0.11 ms
Aug 31 06:21:13 PING WARNING - Packet loss = 44%, RTA = 0.10 ms
Aug 31 06:22:13 PING WARNING - Packet loss = 50%, RTA = 0.10 ms
...

Sep  3 23:44:20 angara kernel: Linux version 2.6.24.7-132.el5.local

Sep  5 04:19:53 PING WARNING - Packet loss = 50%, RTA = 0.14 ms
Sep  5 04:21:01 PING CRITICAL - Packet loss = 60%, RTA = 0.10 ms
Sep  5 04:21:51 PING WARNING - Packet loss = 28%, RTA = 0.12 ms
Sep  5 04:22:56 PING WARNING - Packet loss = 50%, RTA = 0.11 ms
Sep  5 04:28:01 PING CRITICAL - Packet loss = 80%, RTA = 0.11 ms
Sep  5 04:32:53 PING WARNING - Packet loss = 44%, RTA = 0.10 ms
Sep  5 04:37:51 PING OK - Packet loss = 16%, RTA = 0.12 ms
Sep  5 04:42:51 PING WARNING - Packet loss = 50%, RTA = 0.11 ms
Sep  5 04:44:02 PING CRITICAL - Packet loss = 60%, RTA = 0.09 ms
Sep  5 04:45:01 PING CRITICAL - Packet loss = 70%, RTA = 0.10 ms
...
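
To check the apparent link with cron more systematically, the non-OK
alerts can be tallied per hour and compared with the cron schedule.
The Python sketch below is one way to do that. It assumes the alert text
looks like the trimmed excerpts above; real syslog lines may carry extra
fields (host name, program tag) between the timestamp and the PING text,
which the pattern simply skips over. The names ALERT_RE, alerts_per_hour
and print_summary are hypothetical.

```python
import re
from collections import Counter

# Matches lines such as
#   "Aug 29 02:12:54 PING CRITICAL - Packet loss = 60%, RTA = 0.13 ms"
# ".*" skips any extra syslog fields before the PING text (an assumption
# about the untrimmed log format).
ALERT_RE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2}) (?P<hour>\d{2}):\d{2}:\d{2}"
    r".*PING (?P<state>OK|WARNING|CRITICAL) - Packet loss = \d+%"
)


def alerts_per_hour(lines):
    """Count WARNING/CRITICAL PING alerts per (month, day, hour) bucket.

    Keys look like "Aug 29 02:00"; a Counter keeps insertion order, so for
    a chronological log the buckets come out in time order.
    """
    counts = Counter()
    for line in lines:
        m = ALERT_RE.match(line)
        if m and m.group("state") != "OK":
            key = f"{m.group('mon')} {int(m.group('day')):2d} {m.group('hour')}:00"
            counts[key] += 1
    return counts


def print_summary(lines):
    """Print one line per hour that had at least one non-OK alert."""
    for hour, n in alerts_per_hour(lines).items():
        print(f"{hour}  {n} alert(s)")
```

For example, print_summary(open("/var/log/messages")) lists the hours
with alerts in log order; kernel boot lines and PING OK lines are ignored.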

Once the problem appears, the pattern of ping failures stays
basically the same until the system is rebooted.

# ping -n -c10 localhost
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.074 ms
64 bytes from 127.0.0.1: icmp_seq=4 ttl=64 time=0.075 ms
64 bytes from 127.0.0.1: icmp_seq=6 ttl=64 time=0.074 ms
64 bytes from 127.0.0.1: icmp_seq=8 ttl=64 time=0.097 ms
64 bytes from 127.0.0.1: icmp_seq=10 ttl=64 time=0.067 ms

--- localhost.localdomain ping statistics ---
10 packets transmitted, 5 received, 50% packet loss, time 9004ms
rtt min/avg/max/mdev = 0.067/0.077/0.097/0.012 ms
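
In the capture above only the even-numbered requests got a reply. A
small Python sketch that lists the unanswered icmp_seq values makes it
quick to check whether other runs show the same strict alternation. It
assumes reply lines containing "icmp_seq=N" with sequence numbers
starting at 1, as in this output; missing_seqs and every_other_lost are
hypothetical helper names.

```python
import re

# Pulls the sequence number out of each reply line of ping output.
REPLY_RE = re.compile(r"icmp_seq=(\d+)")


def missing_seqs(ping_output, transmitted):
    """Return the icmp_seq numbers 1..transmitted that have no reply line."""
    seen = {int(s) for s in REPLY_RE.findall(ping_output)}
    return [s for s in range(1, transmitted + 1) if s not in seen]


def every_other_lost(missing):
    """True if the lost sequence numbers are spaced exactly two apart."""
    return len(missing) > 1 and all(b - a == 2 for a, b in zip(missing, missing[1:]))
```

For the run above, missing_seqs(output, 10) gives [1, 3, 5, 7, 9], and
every_other_lost on that list is True.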

Interestingly enough, flood pings yield much better results.

# ping -nf -c10000 localhost
PING localhost.localdomain (127.0.0.1) 56(84) bytes of data.
.......
--- localhost.localdomain ping statistics ---
10000 packets transmitted, 9993 received, 0% packet loss, time 646ms
rtt min/avg/max/mdev = 0.019/0.025/0.119/0.010 ms, ipg/ewma 0.064/0.020 ms
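
The "0% packet loss" in the flood run is ping's figure printed as a
whole number: 7 of the 10000 requests went unanswered, which matches the
seven dots left on the flood line (ping -f prints a dot for each request
sent and erases one for each reply). Computing the loss from the raw
counts gives 0.07%, against 50% for the one-per-second run. A minimal
Python sketch; exact_loss_percent is a hypothetical helper name.

```python
import re

# Matches the counts in ping's summary line, e.g.
# "10000 packets transmitted, 9993 received, 0% packet loss, time 646ms"
STATS_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")


def exact_loss_percent(stats_line):
    """Packet loss in percent, computed from the raw counts (no rounding)."""
    m = STATS_RE.search(stats_line)
    if m is None:
        raise ValueError("not a ping statistics line")
    sent, received = int(m.group(1)), int(m.group(2))
    return 100.0 * (sent - received) / sent
```

For example, the summary line of the 10-packet run above yields 50.0.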

Any ideas or suggestions are welcome.

Thanks,
Sasha