Short story: Would it be possible to get kernel-debuginfo-2.6.9-67.0.20.EL.x86_64.rpm put up on di.c.o? I need to run crash against an xm dump-core of a 2.6.9-67.0.20.ELxenU guest.
Long story: I have two Dell 6950s (now called R905; 4 dual-core AMD Opteron 8200 series) set up as heartbeat/DRBD nodes, running the stock CentOS 5.2 Dom0. The DomUs are the only resources managed by heartbeat.
Dom1 is a perfectly running, fully updated CentOS 5.2 Apache/MySQL/Samba server (2.6.18-92.1.6.el5xen). Its Xen config:

    name = "guinan"
    bootloader = "/usr/bin/pygrub"
    uuid = "8fa0ac9e-fe28-17f5-6a72-07f84b4daa24"
    memory = 4097
    vcpus = 6
    on_poweroff = "destroy"
    on_reboot = "restart"
    on_crash = "restart"
    vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
    disk = [ "phy:/dev/drbd1,xvda,w", "phy:/dev/drbd2,xvdb,w" ]
    vif = [ "mac=00:16:3e:68:16:5d,bridge=xenbr0", "bridge=xenbr1" ]
Dom2 is a CentOS 4.6 software development and database server (2.6.9-67.0.20.ELxenU). Every so often, anywhere from several hours to several days apart, it just hangs: it locks up and becomes unresponsive. Nothing on the console, nothing logged. Once it gets to this point, the only recourse is an xm destroy. It has happened with every Dom0/DomU combination, and the hardware of both servers checks out OK. The only sign of life is on Dom2's current Dom0, where xentop shows it using CPU, and a high percentage at that:

    xentop - 12:02:36   Xen 3.1.2-92.1.6.el5
    2 domains: 2 running, 0 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
    Mem: 16775712k total, 11629904k used, 5145808k free    CPUs: 8 @ 2194MHz
        NAME  STATE CPU(sec) CPU(%)   MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) VCPUS NETS  NETTX(k) NETRX(k) VBDS VBD_OO   VBD_RD   VBD_WR SSID
    Domain-0 -----r    50891    1.5   901352    5.4  no limit       n/a     8    4  17720114 18765713    0      0        0        0    0
    monolith -----r   965342  599.2 10486632   62.5  10486784      62.5     6    2 332672635 30178285    2      0 15805538 60805435    0
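A quick way to read the xentop numbers: CPU(%) is summed across all of a domain's VCPUs (which is how monolith can show 599.2%), so dividing by the VCPUS column gives the per-VCPU load. This is only a sketch; it hard-codes the monolith row from the output above rather than calling xentop, and it assumes whitespace-separated columns in header order.

```python
# Header and monolith row copied from the xentop output above.
HEADER = ("NAME STATE CPU(sec) CPU(%) MEM(k) MEM(%) MAXMEM(k) MAXMEM(%) "
          "VCPUS NETS NETTX(k) NETRX(k) VBDS VBD_OO VBD_RD VBD_WR SSID")
ROW = ("monolith -----r 965342 599.2 10486632 62.5 10486784 62.5 "
       "6 2 332672635 30178285 2 0 15805538 60805435 0")


def parse_row(header: str, row: str) -> dict:
    """Map each xentop column name to its value.

    Assumes no value contains spaces, which does NOT hold for Domain-0's
    "no limit" MAXMEM(k) field, so only use this on DomU rows.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("column count mismatch")
    return dict(zip(names, values))


def cpu_pct_per_vcpu(fields: dict) -> float:
    """xentop's CPU(%) is the sum over all VCPUs; divide to get the per-VCPU share."""
    return float(fields["CPU(%)"]) / int(fields["VCPUS"])


if __name__ == "__main__":
    fields = parse_row(HEADER, ROW)
    print(f"{fields['NAME']}: {cpu_pct_per_vcpu(fields):.1f}% per VCPU")
```

For the row above this works out to about 99.9% per VCPU, i.e. all six VCPUs were essentially pegged when the snapshot was taken.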
I've experimented with various Xen configs. Currently it's this:

    name = "monolith"
    uuid = "283746fa-c708-cfa0-f5df-cad6abea568e"
    memory = 10241
    vcpus = 4
    bootloader = "/usr/bin/pygrub"
    on_poweroff = "destroy"
    on_reboot = "restart"
    on_crash = "restart"
    vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
    disk = [ "phy:/dev/drbd3,xvda,w", "phy:/dev/drbd4,xvdb,w" ]
    vif = [ "mac=00:16:3e:08:8d:c3,bridge=xenbr0", "bridge=xenbr1" ]
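Since xm domain config files use Python syntax, the monolith config can be evaluated directly to cross-check it against xentop. The memory setting is in MiB, so memory = 10241 is exactly the 10486784k xentop lists as monolith's MAXMEM(k), or 62.5% of the 16775712k the hypervisor reports. A minimal sketch, assuming the config text is trusted (exec runs whatever is in it):

```python
# The monolith DomU config from above, verbatim.
CONFIG = '''
name = "monolith"
uuid = "283746fa-c708-cfa0-f5df-cad6abea568e"
memory = 10241
vcpus = 4
bootloader = "/usr/bin/pygrub"
on_poweroff = "destroy"
on_reboot = "restart"
on_crash = "restart"
vfb = [ "type=vnc,vncunused=1,keymap=en-us" ]
disk = [ "phy:/dev/drbd3,xvda,w", "phy:/dev/drbd4,xvdb,w" ]
vif = [ "mac=00:16:3e:08:8d:c3,bridge=xenbr0", "bridge=xenbr1" ]
'''

HOST_TOTAL_KIB = 16775712  # "Mem: 16775712k total" from the xentop header above


def load_config(text: str) -> dict:
    """Evaluate the config's plain assignments into a fresh namespace.

    Builtins are disabled, but exec still runs arbitrary code: trusted files only.
    """
    namespace: dict = {}
    exec(text, {"__builtins__": {}}, namespace)
    return namespace


cfg = load_config(CONFIG)
memory_kib = cfg["memory"] * 1024  # xm's memory setting is in MiB
share = 100.0 * memory_kib / HOST_TOTAL_KIB
print(f"{cfg['name']}: {memory_kib}k = {share:.1f}% of host memory, vcpus={cfg['vcpus']}")
```

Note that the xentop snapshot still shows 6 VCPUs for monolith, since it was taken before the change to vcpus = 4.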
Previously I tried adjusting the memory, but the problem resurfaced. I've now gone from 6 vcpus to the 4 shown above, in the hope that this stabilizes it.
I'm at my wits' end. I've committed these machines to production status, so this instability has everyone kind of on edge and wanting to go back to bare metal...
jerry
On Mon, Jul 14, 2008 at 3:49 PM, Jerry Amundson <jamundso@gmail.com> wrote:
> Two Dell 6950 (now called R905, 4 Dual-Core AMD Opteron 8200 series)
> heartbeat/drbd nodes running the stock CentOS 5.2 Dom0. The domU's are
> the only resources in heartbeat.
> Dom1 is a perfectly running, updated, CentOS 5.2 Apache/MySQL/Samba
> Dom2 is a CentOS 4.6 software development and database server
So crash tells me that Dom2 gets to this point:

      SYSTEM MAP: System.map-2.6.9-67.0.20.ELxenU
    DEBUG KERNEL: /usr/lib/debug/lib/modules/2.6.9-67.0.20.ELxenU/vmlinux  (2.6.9-67.0.20.ELxenU)
        DUMPFILE: /public/IntSys/tmp/m1.dmp
            CPUS: 6
            DATE: Mon Jul 14 11:53:59 2008
          UPTIME: 6 days, 11:39:33
    LOAD AVERAGE: 548.07, 542.95, 434.99
           TASKS: 2721
        NODENAME: monolith
         RELEASE: 2.6.9-67.0.20.ELxenU
         VERSION: #1 SMP Thu Jun 26 08:36:44 EDT 2008
         MACHINE: x86_64  (2194 Mhz)
          MEMORY: 10 GB
           PANIC: ""
             PID: 0
         COMMAND: "swapper"
            TASK: ffffffff80322b40  (1 of 6)  [THREAD_INFO: ffffffff80426000]
             CPU: 0
           STATE: TASK_RUNNING
    WARNING: panic task not found
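One number in that header stands out: on Linux the load average counts tasks that are runnable or in uninterruptible sleep, so a 1-minute load of 548.07 on 6 CPUs is roughly 91 such tasks per CPU, out of 2721 tasks in total. A small sketch of that arithmetic, assuming the "LABEL: value" layout shown above (only a few of the lines are copied in):

```python
# A few lines from the crash header above, with the label padding removed.
SYS_HEADER = """\
CPUS: 6
UPTIME: 6 days, 11:39:33
LOAD AVERAGE: 548.07, 542.95, 434.99
TASKS: 2721
"""


def parse_sys(text: str) -> dict:
    """Split each 'LABEL: value' line on the first colon only, so values that
    themselves contain colons (such as the uptime) stay intact."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields


def load_per_cpu(fields: dict) -> list:
    """Divide the 1-, 5- and 15-minute load averages by the CPU count."""
    cpus = int(fields["CPUS"])
    return [float(x) / cpus for x in fields["LOAD AVERAGE"].split(",")]


fields = parse_sys(SYS_HEADER)
print(", ".join(f"{x:.1f}" for x in load_per_cpu(fields)))
```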
    crash> bt
    PID: 0      TASK: ffffffff80322b40  CPU: 0   COMMAND: "swapper"
     #0 [ffffffff80427ec0] schedule at ffffffff80294d9a
     #1 [ffffffff80427f98] cpu_idle at ffffffff8010b85d
    crash> kmem -i
                  PAGES      TOTAL  PERCENTAGE
     TOTAL MEM  2621696      10 GB  ----
          FREE     8884    34.7 MB    0% of TOTAL MEM
          USED  2612812      10 GB   99% of TOTAL MEM
        SHARED        0          0    0% of TOTAL MEM
       BUFFERS    59585   232.8 MB    2% of TOTAL MEM
        CACHED  1325825     5.1 GB   50% of TOTAL MEM
          SLAB   358565     1.4 GB   13% of TOTAL MEM

    TOTAL HIGH        0          0    0% of TOTAL MEM
     FREE HIGH        0          0    0% of TOTAL HIGH
     TOTAL LOW  2621696      10 GB  100% of TOTAL MEM
      FREE LOW     8884    34.7 MB    0% of TOTAL LOW

    kmem: swap_info[0].swap_map at ffffff00001ea000 is unaccessible
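The kmem -i figures are page counts, so the TOTAL column is pages times the page size. Assuming 4 KiB pages (the x86_64 default), TOTAL MEM of 2621696 pages is exactly 10241 MiB, matching memory = 10241 in the monolith config, and FREE is only about 34.7 MB of it. The sketch below reproduces a few rows; it also assumes the PERCENTAGE column is truncated rather than rounded, which is what the numbers suggest (CACHED is 50.57% of the total but shown as 50%).

```python
PAGE_SIZE = 4096       # assumption: 4 KiB pages on this x86_64 guest
TOTAL_PAGES = 2621696  # TOTAL MEM from the kmem -i output above

# Page counts copied from the kmem -i output above.
ROWS = {
    "FREE": 8884,
    "USED": 2612812,
    "BUFFERS": 59585,
    "CACHED": 1325825,
    "SLAB": 358565,
}


def human(pages: int) -> str:
    """Format a page count in binary MB/GB with one decimal, like kmem's TOTAL
    column. kmem prints near-whole values such as USED as '10 GB'; this sketch
    does not special-case that and prints '10.0 GB' instead."""
    size = pages * PAGE_SIZE
    if size >= 1 << 30:
        return f"{size / (1 << 30):.1f} GB"
    return f"{size / (1 << 20):.1f} MB"


def percent(pages: int) -> int:
    """Integer percentage of TOTAL MEM, truncated toward zero."""
    return pages * 100 // TOTAL_PAGES


for name, pages in ROWS.items():
    print(f"{name:>8} {pages:>8} {human(pages):>9} {percent(pages):>3}% of TOTAL MEM")
```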
So I can see where the DomU ended up, but how did it get there? Can I find that out from crash, or do I need something "real-time" inside the DomU? Searching hasn't given me anything to go on, hence this post, but I'll keep looking...
jerry