Short story:
Would it be possible to get
kernel-debuginfo-2.6.9-67.0.20.EL.x86_64.rpm on di.c.o?
I need to run crash against an xm dump-core of a 2.6.9-67.0.20.ELxenU guest.
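A rough sketch of that workflow, under stated assumptions: the domain name "monolith" is taken from the config in the long story, the core file path is a made-up placeholder, the vmlinux location is only where the kernel-debuginfo package is assumed to install it, and build_commands is a purely illustrative helper. The script prints the two commands and does not run them.

```python
# Sketch only: print the commands for dumping the hung guest and opening
# the dump in crash. Nothing is executed.

DOMAIN = "monolith"                     # guest name from the xen config
KERNEL = "2.6.9-67.0.20.ELxenU"         # guest kernel version
CORE_FILE = "/var/tmp/monolith.core"    # hypothetical output path
# Assumed install location of the debug vmlinux from kernel-debuginfo.
VMLINUX = "/usr/lib/debug/lib/modules/%s/vmlinux" % KERNEL


def build_commands(domain, core_file, vmlinux):
    """Return the two command lines as argument lists."""
    return [
        # Run on the Dom0 while the guest is hung.
        ["xm", "dump-core", domain, core_file],
        # crash takes the debug kernel (namelist) and the dump file.
        ["crash", vmlinux, core_file],
    ]


commands = build_commands(DOMAIN, CORE_FILE, VMLINUX)
for cmd in commands:
    print(" ".join(cmd))
```

The second command is the step that depends on the debuginfo package: without a matching debug vmlinux, crash has no symbols to work with.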
Long story:
Two Dell 6950s (now called R905, 4 dual-core AMD Opteron 8200 series)
serve as heartbeat/drbd nodes running the stock CentOS 5.2 Dom0. The
domUs are the only resources in heartbeat.
Dom1 is a fully updated CentOS 5.2 Apache/MySQL/Samba server
(2.6.18-92.1.6.el5xen) that runs perfectly. Its Xen config:
name = "guinan"
bootloader = "/usr/bin/pygrub"
uuid = "8fa0ac9e-fe28-17f5-6a72-07f84b4daa24"
memory = 4097
vcpus = 6
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "phy:/dev/drbd1,xvda,w", "phy:/dev/drbd2,xvdb,w" ]
vif = [ "mac=00:16:3e:68:16:5d,bridge=xenbr0", "bridge=xenbr1" ]
Dom2 is a CentOS 4.6 software development and database server
(2.6.9-67.0.20.ELxenU). Every so often, anywhere from several hours to
several days apart, it hangs and becomes completely unresponsive.
Nothing is written to the console and nothing is logged. At that point
the only recourse is an xm destroy. It has occurred with every
combination of Dom0/DomU, and the hardware of both servers checks out
OK. The only sign of life is that xentop, on Dom2's current Dom0, shows
the domain still using CPU, and at a high percentage at that:
xentop - 12:02:36 Xen 3.1.2-92.1.6.el5
2 domains: 2 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 16775712k total, 11629904k used, 5145808k free    CPUs: 8 @ 2194MHz
NAME      STATE CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS  NETTX(k) NETRX(k) VBDS VBD_OO   VBD_RD   VBD_WR SSID
Domain-0 -----r    50891    1.5   901352    5.4  no limit       n/a     8    4  17720114 18765713    0      0        0        0    0
monolith -----r   965342  599.2 10486632   62.5  10486784      62.5     6    2 332672635 30178285    2      0 15805538 60805435    0
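To put the monolith row in perspective, here is a minimal Python sketch that splits the row into named columns and does the arithmetic. The column list is copied from the xentop header; the helper name parse_row is illustrative only. It handles only the monolith row, since the "no limit" entry in the Domain-0 row would not survive a plain whitespace split.

```python
# Column names in the order xentop prints them.
COLUMNS = [
    "NAME", "STATE", "CPU(sec)", "CPU(%)", "MEM(k)", "MEM(%)",
    "MAXMEM(k)", "MAXMEM(%)", "VCPUS", "NETS", "NETTX(k)", "NETRX(k)",
    "VBDS", "VBD_OO", "VBD_RD", "VBD_WR", "SSID",
]

# The monolith row from the snapshot, joined back onto one line.
ROW = ("monolith -----r 965342 599.2 10486632 62.5 10486784 62.5 "
       "6 2 332672635 30178285 2 0 15805538 60805435 0")


def parse_row(row):
    """Map each whitespace-separated field to its column name."""
    return dict(zip(COLUMNS, row.split()))


fields = parse_row(ROW)

# Average CPU use per virtual CPU at the time of the snapshot.
cpu_per_vcpu = float(fields["CPU(%)"]) / int(fields["VCPUS"])

# MAXMEM is reported in KiB; convert to MiB to compare with "memory =".
maxmem_mib = int(fields["MAXMEM(k)"]) // 1024

print("CPU per VCPU: %.1f%%" % cpu_per_vcpu)
print("MAXMEM: %d MiB" % maxmem_mib)
```

That works out to roughly 99.9% per VCPU, so all six VCPUs in the snapshot look pegged while the guest is unresponsive. MAXMEM(k) converts to 10241 MiB, matching the memory setting in the current config.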
I've experimented with various Xen configs. Currently it's this:
name = "monolith"
uuid = "283746fa-c708-cfa0-f5df-cad6abea568e"
memory = 10241
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "phy:/dev/drbd3,xvda,w", "phy:/dev/drbd4,xvdb,w" ]
vif = [ "mac=00:16:3e:08:8d:c3,bridge=xenbr0", "bridge=xenbr1" ]
I had previously tried adjusting memory, but the problem resurfaced.
I've now gone from 6 vcpus to the 4 shown above in the hope that this
stabilizes it.
I'm at my wits' end. I've committed these machines to production, so
this instability has everyone on edge and wanting to go back to bare
metal...
jerry
--
"Your life is trite and jaded, boring and confiscated." - Twisted Sister