Hello, I have a couple of Dell 2950 III, both of them with CentOS 5.3, Xen, drbd 8.2 and cluster suite. Hardware: 32GB RAM, RAID 5 with 6 SAS disks (one hot spare) on a PERC/6 controller.
I configured DRBD to use the main network interfaces (bnx2 driver), with bonding and crossover cables to have a direct link. The normal network traffic uses two different network cards. There are two DRBD resources for a total of a little less than 1TB.
When the two hosts are in sync, if I activate more than a few (six or seven) xen guests, the master server crashes spectacularly and reboots.
I've seen a kernel dump over the serial console, but the machine restarts immediately so I didn't write it down.
Unfortunately I cannot experiment because I have production services on those machines (and they are working fine until I start drbd on the slave).
drbd configuration is attached.
Does anybody have an idea what the problem is? The crash is perfectly reproducible, and drbd seems to be the cause (maybe the Xen kernel contributes to it?).
Thanks in advance, Andrea
On Tue, 2009-07-28 at 20:11 +0200, Andrea Dell'Amico wrote:
> Hello, I have a couple of Dell 2950 III, both of them with CentOS 5.3, Xen, drbd 8.2 and cluster suite. Hardware: 32GB RAM, RAID 5 with 6 SAS disks (one hot spare) on a PERC/6 controller.
> I configured DRBD to use the main network interfaces (bnx2 driver), with bonding and crossover cables to have a direct link. The normal network traffic uses two different network cards. There are two DRBD resources for a total of a little less than 1TB.
> When the two hosts are in sync, if I activate more than a few (six or seven) xen guests, the master server crashes spectacularly and reboots.
> I've seen a kernel dump over the serial console, but the machine restarts immediately so I didn't write it down.
If you have a spare PC, hook it up in place of the serial console, start a terminal emulator (e.g. minicom, or whatever you prefer), and turn on full logging. This should save everything to a file that you can then review.
If it's a Windows-based PC, just remember to get rid of the ^M characters with dos2unix, or an equivalent, after you send the file to a *IX box.
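In case dos2unix is not at hand, the ^M cleanup can be sketched in a few lines of Python. This is only an illustration: the function names and the command-line usage are made up for the example, and it assumes the captured log uses plain CRLF line endings.

```python
import sys


def strip_carriage_returns(data: bytes) -> bytes:
    """Convert DOS (CRLF) line endings to Unix (LF) line endings."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Read src_path as raw bytes and write the converted copy to dst_path."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(strip_carriage_returns(data))


if __name__ == "__main__":
    # Hypothetical usage: python strip_cr.py console-dos.log console-unix.log
    if len(sys.argv) == 3:
        convert_file(sys.argv[1], sys.argv[2])
    else:
        print("usage: strip_cr.py SRC DST", file=sys.stderr)
```

The file is handled as bytes on purpose, so stray control characters in a console capture do not trigger decoding errors.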
I don't know anything about the rest of your problem, sorry.
> Unfortunately I cannot experiment because I have production services on those machines (and they are working fine until I start drbd on the slave).
> drbd configuration is attached.
> Does anybody have an idea what the problem is? The crash is perfectly reproducible, and drbd seems to be the cause (maybe the Xen kernel contributes to it?).
> Thanks in advance, Andrea
<snip sig stuff>
HTH
On Tue, 2009-07-28 at 14:31 -0400, William L. Maltby wrote:
> > When the two hosts are in sync, if I activate more than a few (six or seven) xen guests, the master server crashes spectacularly and reboots.
> > I've seen a kernel dump over the serial console, but the machine restarts immediately so I didn't write it down.
> If you have a spare PC, hook it up in place of the serial console, start a terminal emulator (e.g. minicom, or whatever you prefer), and turn on full logging. This should save everything to a file that you can then review.
Uhm. The console is on the DRAC5 card. I think I would need to activate some network kernel crash dump feature.
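One candidate for such a feature is the kernel's netconsole module, which sends kernel messages as UDP packets to another host; whether it suits this setup is an assumption, not something confirmed in this thread. A minimal sketch of a receiver on a second machine follows. The function names are hypothetical, and port 6666 is used because it is the commonly documented netconsole target port; adjust it to whatever the sender is configured with.

```python
import socket
import sys


def open_listener(host="0.0.0.0", port=6666):
    """Create a UDP socket bound to host:port (6666 is an assumed default)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def capture(sock, out, max_packets=None):
    """Append each received datagram to out; return the number of packets.

    Runs forever when max_packets is None. The output is flushed after
    every packet so the last messages before a crash reach the file.
    """
    count = 0
    while max_packets is None or count < max_packets:
        data, _addr = sock.recvfrom(65535)
        out.write(data)
        out.flush()
        count += 1
    return count


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python netlog.py crash.log
    with open(sys.argv[1], "ab") as log, open_listener() as listener:
        capture(listener, log)
```

UDP is unreliable, so packets can be lost, but the receiver needs nothing from the crashing host beyond a working network path.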
> If it's a Windows-based PC, just remember to get rid of the ^M characters with dos2unix, or an equivalent, after you send the file to a *IX box.
> I don't know anything about the rest of your problem, sorry.
As I wrote, it's a production server. I cannot stop it whenever I want; I need to schedule a weekend session. In the meantime, I was asking whether there's a known problem with a setup like mine.
Thanks, anyway
> HTH
ciao andrea