On Feb 25, 2011, at 4:29 AM, Rudi Ahlers Rudi@SoftDux.com wrote:
On Wed, Feb 23, 2011 at 4:33 PM, Ross Walker rswwalker@gmail.com wrote:
On Feb 23, 2011, at 3:42 AM, Rudi Ahlers Rudi@SoftDux.com wrote:
On Wed, Feb 23, 2011 at 9:06 AM, yonatan pingle yonatan.pingle@gmail.com wrote:
You should have a look at your disk I/O status.
Try iostat -dx 5 to see the disk utilization info over time. When a virtual environment on a desktop-grade machine slows down, I suspect disk I/O latency and bottlenecks as the cause.
Thanks, I don't know how to interpret the results (yet), but here's the current output:
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
Knowing the columns helps here:
rrqm/s and wrqm/s: read/write requests merged per second; shows how well the I/O scheduler is merging contiguous I/O operations
r/s and w/s: read/write I/O operations per second
rsec/s and wsec/s: sectors read/written per second; I usually use the -k option so it displays kilobytes per second instead
avgrq-sz: average request size, in sectors; I wish it would separate reads from writes, but oh well
avgqu-sz: average number of I/O operations queued for the device; smaller is better
await: average time in ms from when an I/O operation is queued until it completes, so it includes both the time spent waiting in the queue and the service time; again, smaller is better
svctm: average time in ms the drive took to service an I/O operation, from when it left the queue to when a result was returned
%util: percentage of time the device was busy servicing requests; values close to 100% mean the drive is saturated
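To make the column list above concrete, here is a minimal Python sketch that parses `iostat -dx` style rows and flags devices that look saturated. The sample rows are invented for illustration (no device rows survive in this thread), and the thresholds `await_ms=50.0` and `util_pct=90.0` are arbitrary assumptions, not recommended values.

```python
def parse_iostat_rows(text):
    """Parse `iostat -dx` device rows into dicts keyed by column name."""
    rows = []
    columns = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Device:":
            # Header line: remember the column names for the rows that follow.
            columns = [name.rstrip(":") for name in fields]
            continue
        if columns and len(fields) == len(columns):
            row = {"Device": fields[0]}
            for name, value in zip(columns[1:], fields[1:]):
                row[name] = float(value)
            rows.append(row)
    return rows


def flag_busy_devices(rows, await_ms=50.0, util_pct=90.0):
    """Return devices whose await or %util exceeds the assumed thresholds."""
    return [r["Device"] for r in rows
            if r["await"] > await_ms or r["%util"] > util_pct]


# Invented sample data, only to exercise the parser.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.20 35.10 4.00 120.00 96.00 1240.80 10.78 14.20 114.50 7.90 98.00
sdb 0.00 1.20 0.50 2.00 8.00 25.60 13.44 0.01 3.20 1.10 0.28
"""

if __name__ == "__main__":
    print(flag_busy_devices(parse_iostat_rows(SAMPLE)))
```

In practice you would feed it the captured output of a run such as `iostat -dx 5` and look at how await and %util behave over several intervals rather than at a single sample.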
For lockups, though, I'd look at dmesg and the Xen log; xm log is the command, I think.
The number one reason for lockups though is most likely memory contention between domUs and dom0.
What are you running in dom0? What are your memory reservations like?
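One rough way to answer the memory reservation question is to total the Mem(MiB) column of `xm list`. The sketch below assumes the usual six-column layout (Name, ID, Mem(MiB), VCPUs, State, Time(s)); the domain names and sizes in the sample are hypothetical and do not come from this host.

```python
def parse_xm_list(text):
    """Return {domain_name: memory_in_MiB} from `xm list` style output."""
    mem = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[0] == "Name":
            continue  # skip blank lines and the header
        # Counting from the right: Time(s), State, VCPUs, Mem(MiB), ID.
        name = " ".join(fields[:-5])
        mem[name] = int(fields[-4])
    return mem


# Hypothetical output, for illustration only.
SAMPLE_XM_LIST = """\
Name                 ID Mem(MiB) VCPUs State   Time(s)
Domain-0              0      512     4 r-----   1234.5
web01                 1     1024     1 -b----    321.0
mail01                2     2048     2 -b----    654.3
"""

if __name__ == "__main__":
    reservations = parse_xm_list(SAMPLE_XM_LIST)
    for name, mib in reservations.items():
        print(f"{name}: {mib} MiB")
    print("total reserved:", sum(reservations.values()), "MiB")
```

Comparing that total against the physical RAM in the box shows how much headroom is left for dom0 and the hypervisor itself.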
I see a lot of these errors in /var/log/messages shortly before it crashed:
Feb 22 15:27:14 zaxen01 kernel: HighMem: empty
Feb 22 15:27:14 zaxen01 kernel: 918 pagecache pages
Feb 22 15:27:14 zaxen01 kernel: Swap cache: add 2248198, delete 2248009, find 160685591/160898897, race 0+453
Feb 22 15:27:14 zaxen01 kernel: Free swap = 0kB
Feb 22 15:27:14 zaxen01 kernel: Total swap = 4194296kB
Feb 22 15:27:14 zaxen01 kernel: Free swap: 0kB
Feb 22 15:27:14 zaxen01 kernel: 133120 pages of RAM
Feb 22 15:27:14 zaxen01 kernel: 22818 reserved pages
Feb 22 15:27:16 zaxen01 kernel: 105840 pages shared
Feb 22 15:27:16 zaxen01 kernel: 189 pages swap cached
Feb 22 15:27:17 zaxen01 kernel: Out of memory: Killed process 17464, UID 99, (sendmail).
Feb 23 00:35:38 zaxen01 syslogd 1.4.1: restart.
Feb 23 00:35:38 zaxen01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Feb 23 00:35:38 zaxen01 kernel: Bootdata ok (command line is ro root=/dev/System/root rhgb quiet xencons=tty6)
Feb 23 00:35:38 zaxen01 kernel: Linux version 2.6.18-194.32.1.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-48)) #1 SMP Wed Jan 5 18:44:24 EST 2011
Feb 23 00:35:38 zaxen01 kernel: BIOS-provided physical RAM map:
Feb 23 00:35:38 zaxen01 kernel: Xen: 0000000000000000 - 0000000020800000 (usable)
Feb 23 00:35:38 zaxen01 kernel: DMI 2.4 present.
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Feb 23 00:35:38 zaxen01 kernel: ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Feb 23 00:35:38 zaxen01 kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Feb 23 00:35:38 zaxen01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Feb 23 00:35:38 zaxen01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Feb 23 00:35:38 zaxen01 kernel: Setting APIC routing to xen
Feb 23 00:35:38 zaxen01 kernel: Using ACPI (MADT) for SMP configuration information
Feb 23 00:35:38 zaxen01 kernel: Allocating PCI resources starting at d4000000 (gap: d0000000:2ff00000)
Feb 23 00:35:38 zaxen01 kernel: Built 1 zonelists. Total pages: 133120
Feb 23 00:35:38 zaxen01 kernel: Kernel command line: ro root=/dev/System/root rhgb quiet xencons=tty6
Feb 23 00:35:38 zaxen01 kernel: Initializing CPU#0
PID hash table entries: 4096 (order: 12, 32768 bytes)
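The figures in that log can be turned into something easier to read. The sketch below is only an illustration: it assumes 4 KiB pages (the usual size on x86, but not stated in the log), and the excerpt is copied from the messages above. With that assumption, 133120 pages works out to 520 MiB of RAM visible to dom0, with all of the roughly 4 GiB of swap used up when sendmail was killed.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, not stated in the log itself


def summarize_oom(log_text):
    """Pull RAM, swap and OOM-kill facts out of a kernel log excerpt."""
    pages = re.search(r"(\d+) pages of RAM", log_text)
    total_swap = re.search(r"Total swap = (\d+)kB", log_text)
    free_swap = re.search(r"Free swap\s*[=:]\s*(\d+)kB", log_text)
    killed = re.findall(
        r"Out of memory: Killed process (\d+), UID \d+, \(([^)]+)\)", log_text
    )
    return {
        "ram_mib": int(pages.group(1)) * PAGE_KIB / 1024 if pages else None,
        "total_swap_kib": int(total_swap.group(1)) if total_swap else None,
        "free_swap_kib": int(free_swap.group(1)) if free_swap else None,
        "killed": [(int(pid), name) for pid, name in killed],
    }


# Excerpt taken from the /var/log/messages lines quoted above.
LOG_EXCERPT = """\
Feb 22 15:27:14 zaxen01 kernel: Free swap = 0kB
Feb 22 15:27:14 zaxen01 kernel: Total swap = 4194296kB
Feb 22 15:27:14 zaxen01 kernel: 133120 pages of RAM
Feb 22 15:27:17 zaxen01 kernel: Out of memory: Killed process 17464, UID 99, (sendmail).
"""

if __name__ == "__main__":
    print(summarize_oom(LOG_EXCERPT))
```

The 520 MiB figure also matches the "Xen: 0000000000000000 - 0000000020800000 (usable)" line, since 0x20800000 bytes is 520 MiB.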
It seems dom0's memory came under pressure from the domUs.
Make sure to set an absolute minimum of memory for dom0 in xend-config.sxp (dom0-min-mem) or with the hypervisor boot option (dom0_mem=, I believe). I always set it to the OS minimum of 256MB, but if you are doing more in dom0 you'd want more.
As an aside, I might run all management apps in a VM and manage dom0 from that domU.
-Ross
On Fri, Feb 25, 2011 at 4:21 PM, Ross Walker rswwalker@gmail.com wrote:
It seems dom0's memory came under pressure from the domUs.
Make sure to set an absolute minimum of memory for dom0 in xend-config.sxp (dom0-min-mem) or with the hypervisor boot option (dom0_mem=, I believe). I always set it to the OS minimum of 256MB, but if you are doing more in dom0 you'd want more.
As an aside, I might run all management apps in a VM and manage dom0 from that domU.
-Ross
I already set dom0 to 512MB, but it seems that might not be enough. I was hoping I could optimize it a bit more, though.