[CentOS] how to optimize CentOS XEN dom0?

Fri Feb 25 14:21:51 UTC 2011
Ross Walker <rswwalker at gmail.com>

On Feb 25, 2011, at 4:29 AM, Rudi Ahlers <Rudi at SoftDux.com> wrote:

> On Wed, Feb 23, 2011 at 4:33 PM, Ross Walker <rswwalker at gmail.com> wrote:
>> On Feb 23, 2011, at 3:42 AM, Rudi Ahlers <Rudi at SoftDux.com> wrote:
>> 
>>> On Wed, Feb 23, 2011 at 9:06 AM, yonatan pingle
>>> <yonatan.pingle at gmail.com> wrote:
>>>> you should have a look at your I/O disk status.
>>>> 
>>>> try iostat -dx 5 to see the disk utilization info over time.
>>>> When it comes to slowdowns in a virtual environment on a desktop-grade
>>>> machine, I suspect disk I/O latency and a bottleneck as the cause.
>>> 
>>> Thanx, I don't know how to interpret the results (yet), but here's the
>>> current output:
>>> 
>>> Device:         rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
>>> avgqu-sz   await  svctm  %util
>> 
>> Knowing the columns helps here:
>> 
>> rrqm/s and wrqm/s: read/write requests merged per second; shows how well the scheduler is merging contiguous I/O operations.
>> 
>> r/s and w/s: read/write I/O operations per second.
>> 
>> rsec/s and wsec/s: read/write sectors per second; I usually use the -k option so it displays kilobytes per second instead.
>> 
>> avgrq-sz: average request size in the unit of choice, here sectors; I wish it separated reads from writes, but oh well.
>> 
>> avgqu-sz: average number of I/O operations queued waiting for service; smaller is better.
>> 
>> await: average time in ms an I/O operation spent waiting in the queue plus being serviced; again, smaller is better.
>> 
>> svctm: average time it took to service an I/O operation, from when it left the queue to when a result was returned.
>> 
>> %util: estimated drive utilization, i.e. the percentage of time the device was busy servicing requests.
>> 
>> For lockups though I'd look at dmesg and the Xen log; 'xm log' is the command, I think.
>> 
>> The number one reason for lockups, though, is most likely memory contention between the domUs and dom0.
>> 
>> What are you running in dom0? What are your memory reservations like?
>> 
>> 
> I see a lot of these errors in /var/log/messages shortly before it crashed:
> 
> 
> 
> Feb 22 15:27:14 zaxen01 kernel: HighMem: empty
> Feb 22 15:27:14 zaxen01 kernel: 918 pagecache pages
> Feb 22 15:27:14 zaxen01 kernel: Swap cache: add 2248198, delete
> 2248009, find 160685591/160898897, race 0+453
> Feb 22 15:27:14 zaxen01 kernel: Free swap  = 0kB
> Feb 22 15:27:14 zaxen01 kernel: Total swap = 4194296kB
> Feb 22 15:27:14 zaxen01 kernel: Free swap:            0kB
> Feb 22 15:27:14 zaxen01 kernel: 133120 pages of RAM
> Feb 22 15:27:14 zaxen01 kernel: 22818 reserved pages
> Feb 22 15:27:16 zaxen01 kernel: 105840 pages shared
> Feb 22 15:27:16 zaxen01 kernel: 189 pages swap cached
> Feb 22 15:27:17 zaxen01 kernel: Out of memory: Killed process 17464,
> UID 99, (sendmail).
> Feb 23 00:35:38 zaxen01 syslogd 1.4.1: restart.
> Feb 23 00:35:38 zaxen01 kernel: klogd 1.4.1, log source = /proc/kmsg started.
> Feb 23 00:35:38 zaxen01 kernel: Bootdata ok (command line is ro
> root=/dev/System/root rhgb quiet xencons=tty6)
> Feb 23 00:35:38 zaxen01 kernel: Linux version 2.6.18-194.32.1.el5xen
> (mockbuild at builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat
> 4.1.2-48)) #1 SMP
> Wed Jan 5 18:44:24 EST 2011
> Feb 23 00:35:38 zaxen01 kernel: BIOS-provided physical RAM map:
> Feb 23 00:35:38 zaxen01 kernel:  Xen: 0000000000000000 -
> 0000000020800000 (usable)
> Feb 23 00:35:38 zaxen01 kernel: DMI 2.4 present.
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x01]
> lapic_id[0x00] enabled)
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x03]
> lapic_id[0x02] enabled)
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x02]
> lapic_id[0x01] enabled)
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC (acpi_id[0x04]
> lapic_id[0x03] enabled)
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl
> lint[0x1])
> Feb 23 00:35:38 zaxen01 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl
> lint[0x1])
> Feb 23 00:35:38 zaxen01 kernel: ACPI: IOAPIC (id[0x02]
> address[0xfec00000] gsi_base[0])
> Feb 23 00:35:38 zaxen01 kernel: IOAPIC[0]: apic_id 2, version 32,
> address 0xfec00000, GSI 0-23
> Feb 23 00:35:38 zaxen01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0
> global_irq 2 dfl dfl)
> Feb 23 00:35:38 zaxen01 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9
> global_irq 9 high level)
> Feb 23 00:35:38 zaxen01 kernel: Setting APIC routing to xen
> Feb 23 00:35:38 zaxen01 kernel: Using ACPI (MADT) for SMP
> configuration information
> Feb 23 00:35:38 zaxen01 kernel: Allocating PCI resources starting at
> d4000000 (gap: d0000000:2ff00000)
> Feb 23 00:35:38 zaxen01 kernel: Built 1 zonelists.  Total pages: 133120
> Feb 23 00:35:38 zaxen01 kernel: Kernel command line: ro
> root=/dev/System/root rhgb quiet xencons=tty6
> Feb 23 00:35:38 zaxen01 kernel: Initializing CPU#0
> Feb 23 00:35:38 zaxen01 kernel: PID hash table entries: 4096 (order:
> 12, 32768 bytes)

It seems dom0's memory came under pressure from the other domUs: swap was completely exhausted and the kernel OOM killer took out sendmail.

Make sure to set an absolute minimum of memory for dom0, either with dom0-min-mem in xend-config.sxp or with the dom0_mem= option on the hypervisor boot line. I always set it to the OS minimum of 256MB, but if you are doing more in dom0 you'll want more.
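For reference, on a CentOS 5 Xen host the two places to pin dom0's memory look roughly like this (256MB is just an example value, tune it to your workload):

```
# /etc/xen/xend-config.sxp -- floor below which dom0 will not be ballooned (MB)
(dom0-min-mem 256)

# /boot/grub/grub.conf -- fixed allocation, set on the hypervisor line
kernel /xen.gz dom0_mem=256M
```

With dom0_mem= set, dom0 never participates in ballooning at all, which avoids this class of OOM entirely; you can check how memory is currently split between domains with xm list.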

As an aside, I might run all the management apps in a VM and manage dom0 from that domU.
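Back on those iostat columns: as a rough rule of thumb, %util is about IOPS times svctm over the sampling interval. A quick sketch with made-up numbers, purely for illustration:

```shell
# %util ~= (r/s + w/s) * svctm / 10
# (svctm is in ms; 1000 ms/s divided by 100 percent gives the /10)
iops=200       # r/s + w/s from iostat (hypothetical)
svctm_ms=4     # average service time per request (hypothetical)
awk -v i="$iops" -v s="$svctm_ms" \
    'BEGIN { printf "estimated %%util: %.0f%%\n", i * s / 10 }'
```

So a drive doing 200 IOPS at 4ms apiece is already around 80% busy, which is where latency starts climbing fast.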


-Ross