Short story: Would it be possible to get kernel-debuginfo-2.6.9-67.0.20.EL.x86_64.rpm on di.c.o? I have a need to run crash on a 2.6.9-67.0.20.ELxenU xm dump-core.
Long story: Two Dell 6950 (now called R905, 4 Dual-Core AMD Opteron 8200 series) heartbeat/drbd nodes running the stock CentOS 5.2 Dom0. The domUs are the only resources in heartbeat.
Dom1 is a perfectly running, updated, CentOS 5.2 Apache/MySQL/Samba server (2.6.18-92.1.6.el5xen). Its xen config:

name = "guinan"
bootloader = "/usr/bin/pygrub"
uuid = "8fa0ac9e-fe28-17f5-6a72-07f84b4daa24"
memory = 4097
vcpus = 6
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "phy:/dev/drbd1,xvda,w", "phy:/dev/drbd2,xvdb,w" ]
vif = [ "mac=00:16:3e:68:16:5d,bridge=xenbr0", "bridge=xenbr1" ]
Dom2 is a CentOS 4.6 software development and database server (2.6.9-67.0.20.ELxenU). Every so often, could be several hours or several days, it just hangs, locks up, becomes unresponsive. Nothing to the console. Nothing logged. When it gets to this point, the only recourse is an xm destroy. It's occurred with every combination of Dom0/DomU. The hardware of both servers checks out OK. The only sign of life is that xentop on Dom2's current Dom0 shows CPU usage, and at a high percentage at that:

xentop - 12:02:36 Xen 3.1.2-92.1.6.el5
2 domains: 2 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 16775712k total, 11629904k used, 5145808k free CPUs: 8 @ 2194MHz
      NAME  STATE CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS  NETTX(k) NETRX(k) VBDS VBD_OO   VBD_RD   VBD_WR SSID
  Domain-0 -----r    50891    1.5   901352    5.4  no limit       n/a     8    4  17720114 18765713    0      0        0        0    0
  monolith -----r   965342  599.2 10486632   62.5  10486784      62.5     6    2 332672635 30178285    2      0 15805538 60805435    0
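To put that CPU(%) figure in perspective, here is a rough sketch in Python that divides it by the VCPU count for the two xentop rows above. The helper names are made up for illustration, and the column positions are assumed from the header shown in this particular output; other xentop versions may lay the columns out differently.

```python
def parse_xentop_row(row):
    """Pull name, CPU(%) and VCPUS out of one xentop domain row."""
    # "no limit" in the MAXMEM(k) column contains a space, so join it
    # before splitting on whitespace (assumption: it only appears there).
    fields = row.replace("no limit", "no_limit").split()
    return {
        "name": fields[0],
        "cpu_pct": float(fields[3]),   # CPU(%) column
        "vcpus": int(fields[8]),       # VCPUS column
    }


def per_vcpu_load(row):
    """Average CPU percentage per VCPU for one domain row."""
    d = parse_xentop_row(row)
    return d["cpu_pct"] / d["vcpus"]


# The two domain rows from the xentop output in this post.
rows = [
    "Domain-0 -----r 50891 1.5 901352 5.4 no limit n/a 8 4 "
    "17720114 18765713 0 0 0 0 0",
    "monolith -----r 965342 599.2 10486632 62.5 10486784 62.5 6 2 "
    "332672635 30178285 2 0 15805538 60805435 0",
]

for row in rows:
    d = parse_xentop_row(row)
    print("%-10s %6.1f%% per VCPU" % (d["name"], per_vcpu_load(row)))
```

For monolith that works out to just under 100% on each of its 6 VCPUs, so all of them appear to be spinning while the guest is unresponsive.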
I've experimented with various xen configs. Currently it's this:

name = "monolith"
uuid = "283746fa-c708-cfa0-f5df-cad6abea568e"
memory = 10241
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "phy:/dev/drbd3,xvda,w", "phy:/dev/drbd4,xvdb,w" ]
vif = [ "mac=00:16:3e:08:8d:c3,bridge=xenbr0", "bridge=xenbr1" ]
Previously, I'd tried adjusting memory, but the problem resurfaced. Currently, I've gone from 6 vcpus to the 4 shown above in the hope that it stabilizes things.
I'm at my wits' end. I've committed these machines to production status, so this instability has everyone kind of on edge, and wanting to go back to bare metal...
jerry