I have a small Xen VM running CentOS 4 which acts as a router/firewall, and it has been working fine for over 1.5 years with 32MB of RAM and a kernel I either got from xensource.org or built myself from their sources. (CentOS 4 didn't have a Xen kernel back then.)
I lost the kernel to a corrupted disk and decided to use the centos provided xen kernel.
All these months 32MB + 64MB swap was more than sufficient and I never had any problems (3-digit uptimes). The CentOS Xen kernel is unusable at 32MB and invokes the OOM killer even though there is plenty of swap free (and about 5-10MB of RAM too). After the OOM killer has done its job there's about 15MB of RAM free and swap is hardly used at all (1MB at most).
I've increased RAM to 64MB and the OOM killer no longer kicks in, but I am seeing plenty of messages like this in the logs:
Dec 2 16:05:21 noc kernel: Badness in local_bh_enable at kernel/softirq.c:141
Dec 2 16:05:21 noc kernel: [<c0121178>] local_bh_enable+0x47/0x6f
Dec 2 16:05:21 noc kernel: [<c02177d5>] skb_checksum+0x133/0x25e
Dec 2 16:05:21 noc kernel: [<c0250f06>] udp_poll+0x66/0x113
Dec 2 16:05:21 noc kernel: [<c02135fd>] sock_poll+0x19/0x1d
Dec 2 16:05:21 noc kernel: [<c016d1a6>] do_select+0x190/0x2c7
Dec 2 16:05:21 noc kernel: [<c016ceb5>] __pollwait+0x0/0x9b
Dec 2 16:05:21 noc kernel: [<c0144ae4>] __kmalloc+0x56/0xd3
Dec 2 16:05:21 noc kernel: [<c016d5dc>] sys_select+0x2e7/0x45c
Dec 2 16:05:21 noc kernel: [<c010740f>] syscall_call+0x7/0xb
Any idea what's causing it, or how to make it stop?
I'm running CentOS 5 in dom0 and CentOS 4 in the router domU, and there are another couple of CentOS 5 domUs which are running error free.
Kingsly
Sorry about the multiple mails that came through to the list.
I'd been trying to send them out for over 20 hours and they wouldn't relay through an OpenVPN tunnel because of the "Badness".
I had moved the queue manually and forgot to remove the mail that went through.
Dec 2 16:05:21 noc kernel: Badness in local_bh_enable at kernel/softirq.c:141
Dec 2 16:05:21 noc kernel: [<c0121178>] local_bh_enable+0x47/0x6f
Dec 2 16:05:21 noc kernel: [<c02177d5>] skb_checksum+0x133/0x25e
Dec 2 16:05:21 noc kernel: [<c0250f06>] udp_poll+0x66/0x113
Dec 2 16:05:21 noc kernel: [<c02135fd>] sock_poll+0x19/0x1d
Dec 2 16:05:21 noc kernel: [<c016d1a6>] do_select+0x190/0x2c7
Dec 2 16:05:21 noc kernel: [<c016ceb5>] __pollwait+0x0/0x9b
Dec 2 16:05:21 noc kernel: [<c0144ae4>] __kmalloc+0x56/0xd3
Dec 2 16:05:21 noc kernel: [<c016d5dc>] sys_select+0x2e7/0x45c
Dec 2 16:05:21 noc kernel: [<c010740f>] syscall_call+0x7/0xb
This seems to happen on certain mails which are relayed through an OpenVPN connection on the router VM (possibly every time sendmail tries to clear the queue). The mail never gets out and the connection times out.
All test mails (one-liners) went out without any problems via the VPN.
After sending one mail bypassing the VPN, I went ahead and switched the kernel, and now all my mails are relaying.
Kingsly
On Dec 3, 2008, at 4:09 AM, Kingsly John member+centos@kingsly.net wrote:
> Sorry about the multiple mails that came through to the list.
> I'd been trying to send them out for over 20 hours and they wouldn't relay through an OpenVPN tunnel because of the "Badness".
> I had moved the queue manually and forgot to remove the mail that went through.
>
> Dec 2 16:05:21 noc kernel: Badness in local_bh_enable at kernel/softirq.c:141
> Dec 2 16:05:21 noc kernel: [<c0121178>] local_bh_enable+0x47/0x6f
> Dec 2 16:05:21 noc kernel: [<c02177d5>] skb_checksum+0x133/0x25e
> Dec 2 16:05:21 noc kernel: [<c0250f06>] udp_poll+0x66/0x113
> Dec 2 16:05:21 noc kernel: [<c02135fd>] sock_poll+0x19/0x1d
> Dec 2 16:05:21 noc kernel: [<c016d1a6>] do_select+0x190/0x2c7
> Dec 2 16:05:21 noc kernel: [<c016ceb5>] __pollwait+0x0/0x9b
> Dec 2 16:05:21 noc kernel: [<c0144ae4>] __kmalloc+0x56/0xd3
> Dec 2 16:05:21 noc kernel: [<c016d5dc>] sys_select+0x2e7/0x45c
> Dec 2 16:05:21 noc kernel: [<c010740f>] syscall_call+0x7/0xb
>
> This seems to happen on certain mails which are relayed through an OpenVPN connection on the router VM (possibly every time sendmail tries to clear the queue). The mail never gets out and the connection times out.
> All test mails (one-liners) went out without any problems via the VPN.
> After sending one mail bypassing the VPN, I went ahead and switched the kernel, and now all my mails are relaying.
As you found out, you need more memory for the RH 2.6.18 kernel than for the stock one that Xen.org uses, because RH backports later kernel features and enables advanced features.
The errors you are seeing are network related, so it might be that checksum offloading in dom0 and domU is getting in the way.
Use ethtool to disable checksum offloading in dom0 and the domU.
Since this is a newer Xen "feature" than what was in your older kernel, you could just try disabling checksum offloading in the domU only.
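A minimal sketch of what that could look like, under some assumptions not stated above: the interface is named eth0 (hypothetical, substitute your own), and TX checksum offload is the setting to turn off. The small run wrapper and the APPLY variable are illustrative helpers only; by default the script just prints the ethtool commands so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Assumption: eth0 is the affected interface; override with IFACE=...
IFACE="${IFACE:-eth0}"

# Illustrative helper: print the command unless APPLY=1 is set,
# in which case actually execute it (needs root and ethtool installed).
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

disable_tx_checksum() {
    # Show the current offload settings for the interface.
    run ethtool -k "$1"
    # Turn off TX checksum offload (assumed to be the relevant setting).
    run ethtool -K "$1" tx off
}

disable_tx_checksum "$IFACE"
```

Run it once as-is to see the commands, then with APPLY=1 inside the domU. The setting does not survive a reboot, so if it helps it would need to be reapplied at boot by whatever mechanism you prefer.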
-Ross