Hello,
I have strange behaviour on a server that I can't get a handle on. I have a reasonably powerful server running VMware Server 1.0.4-56528. It has a RAID5 array built with mdadm on 5 SATA drives. Masses of RAM and 2 Xeon CPUs. But it stutters.
Example : fire up vi, press and keep a finger on 'i'. After filling 2-3 lines, the display stops for 2-12 seconds, then continues. This happens even on the host OS, at the console.
Host system running CentOS 5.2 x86-64:
CPU : 2x Xeon E5430 @ 2.66GHz
RAM : 24GB
Mobo : DSBV-DX
HD : 5 x SATA ST3750330AS 750GB in RAID5
There are 5 VMs, detailed at http://www.awale.qc.ca/vmware/stj1.txt to make this mail shorter.
Seems to me this system should be more than adequate to handle the load.
This is what vmstat on the host looks like when the server is "unhappy" : http://www.awale.qc.ca/vmware/vmstat.txt It is spending a lot of time in 'wa', but 'bo' and 'bi' are minuscule.
The problem seems like a disk problem. I grow to suspect that SATA isn't ready for the big time. I also grow to dislike RAID5.
Questions :
- Anyone have a clue or other on how to track down my bottleneck?
- SATA NCQ is limited to 15 queue depth. Is this per-SATA-port or per-SATA-chip? Or does this question make no sense?
- I realise there are more recent versions of CentOS out. Are there specific items in the changelogs that would affect my problem?
Thank you for any help,
-Philip
Philip Gwyn liste@artware.qc.ca writes:
Hello,
I have strange behaviour on a server that I can't get a handle on. I have a reasonably powerful server running VMware Server 1.0.4-56528. It has a RAID5 array built with mdadm on 5 SATA drives. Masses of RAM and 2 Xeon CPUs. But it stutters.
...
The problem seems like a disk problem. I grow to suspect that SATA isn't ready for the big time. I also grow to dislike RAID5.
Personally, I will use RAID5, and I will use SATA, but I will not use SATA with RAID5 except in 'tape replacement' roles. The weak bits of RAID5 (the read/write cycle on sub-stripe writes) are often exacerbated by the weak bits of SATA (slow seek time, slow rotational speed), creating a perfect storm of suck.
Not to say that's your primary problem.
Actually, it sounds a whole lot like the problems I get with xen on heavily used servers, if I don't assign a core exclusively to the dom0 (or at least give it a very high priority.) But I have little knowledge of or experience with VMware, so I don't know if you have a similar problem.
I have been using a 3ware 9690se SATA card with RAID5. I have been running CentOS using Xen and have had no problems. I wrote my virtual machines to the raw RAID5 drive. It seems to have worked fine for me.
On Thu, Sep 24, 2009 at 9:18 PM, Philip Gwyn liste@artware.qc.ca wrote:
...
On Thu, Sep 24, 2009 at 8:18 PM, Philip Gwyn liste@artware.qc.ca wrote:
...
VMware Server 1.0.x was never supported on RHEL/CentOS 5.x, especially as early as 1.0.4. Not that it can't be made to work, but it just wasn't made for newer kernel versions. We run up to 10 guests in VMware Server 1.0.9 on a single Xeon quad core with the host running CentOS 4, SATA hardware RAID 1. Admittedly, our guests are pretty low CPU, low throughput, but it works just fine for us. If your guests are not really hammering the disk system, then you may be on a wild goose chase blaming RAID 5.
In my time on the VMware forums, it was always suggested to use single-CPU guests running non-SMP kernels for Server 1.0.x. It might help to convert the one SMP guest you have. If you can afford some downtime, reconfigure the host to use compatible CentOS/VMware versions (4.x/1.0.x or 5.x/2.x respectively). At the very least, get the latest VMware Server 1.0.9.
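For the vCPU part, that is a one-line change in the guest's .vmx file (option name from memory, so double-check it against your own config before editing):

    numvcpus = "1"

The guest kernel still has to be switched to a non-SMP one separately.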
-- Jeff
Hi,
On Thu, Sep 24, 2009 at 21:18, Philip Gwyn liste@artware.qc.ca wrote:
The problem seems like a disk problem. I grow to suspect that SATA isn't ready for the big time. I also grow to dislike RAID5.
Questions :
- Anyone have a clue or other on how to track down my bottleneck?
You can use the command "iostat -kx 1 /dev/sd?", which will give you more information about what is happening. In particular it will show %util, which tells you how often the drive is busy, and you can correlate that with rkB/s and wkB/s to see how much data is being read from or written to that specific drive. You also get averages for the request size (to know if you have many small operations or a few big ones), queue size, service time and wait time. See "man iostat" for more details. It's not installed by default on CentOS 5, but it's available from the base repositories; just run "yum install sysstat" if you don't have it yet.
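As a minimal sketch of how I would run it (package name and device pattern assumed, adjust to your drives):

    # install the sysstat tools if they are missing
    yum install sysstat

    # extended per-device statistics, in kB, refreshed every second
    iostat -kx 1 /dev/sd?

    # in a second terminal, keep an eye on the md array while the stutter happens
    watch -n 1 cat /proc/mdstat

Running these while you hold down 'i' in vi should show whether one member disk sits near 100 %util while the others are idle.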
If you are using RAID-5 you might want to check whether the chunk size you are using is appropriate. You can specify it when you create a new array, using the "-c" option to mdadm; I don't think you can change it after the array is created. The default is 64kB, which sounds sane enough, but you might want to check whether yours was created with that value or not.
The problem is basically that if you have operations larger than the chunk size, they will require work from all the disks, which means all of them have to seek to a specific position to complete your request, and while they are doing that they cannot serve any other requests. If you have high usage and random access, the disks will spend a lot of time seeking. If that is the case, you might want to increase the chunk size so that most operations can be fulfilled by one disk only, leaving the others free to work on other requests at that time.
On the other hand, if specific areas of your filesystem that are hit more often always fall on the same disk, that disk will be used more than the others, so your performance will effectively be limited by that one disk instead of multiplied by the number of disks thanks to the striped access. In that case it might make sense to reduce the chunk size in order to spread the access more evenly across the disks. I read some time ago that ext2/ext3 has a way of allocating blocks that can create such an unfair distribution when you are striping across a certain number of disks; I don't know exactly how that works, but you might want to look into it. I remember that when you create the ext2/ext3 filesystem you can use an option such as "stride=..." to give a hint about the disk layout so that the filesystem can offset those blocks enough to spread the load across the disks. But I could never exactly figure out what "stride=..." number would make sense for me... the documentation is kind of scarce in this area, but check the mke2fs manpage anyway if you have a disk that is more "hot" than the others and you think that might be the problem. You can also experiment with other filesystems such as XFS, which is available in the extras repository.
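Just as an illustration (the device names and the 256kB chunk are assumptions, and the stride is simply chunk size divided by filesystem block size), creating the array and filesystem with matching hints could look like:

    # 5-disk RAID5 with a 256kB chunk (example value only)
    mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=256 /dev/sd[b-f]1

    # ext3 with 4kB blocks: stride = 256kB chunk / 4kB block = 64
    mke2fs -j -b 4096 -E stride=64 /dev/md0

I haven't benchmarked those particular values, so treat them as a starting point rather than a recommendation.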
And of course, make sure "cat /proc/mdstat" shows everything OK, make sure you aren't running a degraded array before you start investigating its performance.
I'm sure there are performance tunings that can be done with, e.g., hdparm, tweaking numbers in /proc and /sys filesystems, or changing the kernel scheduler, but I'm not really experienced with that so I couldn't really advise you on that. I'm sure others will have such experience and will be able to give you pointers on that. You might want to ask on the main list in that case, instead of the -virt one.
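For what it's worth, the knobs I have seen mentioned most often are the per-disk I/O scheduler and the md stripe cache; a rough sketch (device names assumed, values picked arbitrarily, so measure before and after):

    # see which I/O scheduler a member disk uses, and try deadline instead
    cat /sys/block/sda/queue/scheduler
    echo deadline > /sys/block/sda/queue/scheduler

    # RAID5/6 only: enlarge the stripe cache (in pages) to help small writes
    cat /sys/block/md0/md/stripe_cache_size
    echo 4096 > /sys/block/md0/md/stripe_cache_size

Whether any of that helps depends entirely on the workload.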
HTH, Filipe
Philip Gwyn wrote:
Hello,
I have strange behaviour on a server that I can't get a handle on. I have a reasonably powerful server running VMware server 1.0.4-56528. It has a RAID5 build with mdadm on 5 SATA drives. Masses of ram and 2 XEON CPUs. But it stutters.
This will double your memory usage. But it should fix your I/O.
Take a look at http://vmfaq.com/?View=entry&EntryID=25
In particular, putting your temporary directory in a ramdisk will improve your I/O profile immensely.
Edit /etc/vmware/config and add:
tmpDirectory = "/tmp/vmware"
mainMem.useNamedFile = "FALSE"
sched.mem.pshare.enable = "FALSE"
MemTrimRate = "0"
MemAllowAutoScaleDown = "FALSE"
prefvmx.useRecommendedLockedMemSize = "TRUE"
prefvmx.minVmMemPct = "100"
Edit /etc/fstab and add
tmpfs /tmp/vmware tmpfs defaults,size=100% 0 0
and edit /etc/cron.daily/tmpwatch and add '-x /tmp/vmware' to the tmpwatch command line for /tmp.
Make your mount point for /tmp/vmware and mount /tmp/vmware.
Restart VMware.
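Spelled out as commands (assuming the stock VMware Server init script name; adjust if yours differs), that last part is roughly:

    # create the mount point and mount the tmpfs defined in /etc/fstab
    mkdir -p /tmp/vmware
    mount /tmp/vmware

    # restart the VMware Server services so the new config is picked up
    service vmware restart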
That is how I run my systems.
Benjamin Franz wrote:
...
And I just learned something new. According to http://communities.vmware.com/thread/105144;jsessionid=DE9B4FFB861971525BEDB... if you use /dev/shm for your tmpDirectory you don't pay the 'double the memory' penalty. I am testing it now.
Benjamin Franz wrote:
And I just learned something new. According to http://communities.vmware.com/thread/105144;jsessionid=DE9B4FFB861971525BEDB... if you use /dev/shm for your tmpDirectory you don't pay the 'double the memory' penalty. I am testing it now.
To wrap this up, VMware has actually put up a Knowledge Base entry on this documenting exactly how to do it as of last month:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docT...
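For anyone who just wants the short version: the key change is a one-line addition to /etc/vmware/config (sketch only; follow the KB article above for the authoritative steps), followed by a restart of the VMware services:

    tmpDirectory = "/dev/shm"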
On 09/25/2009 12:44 PM, Jerry Franz wrote:
To wrap this up, VMware has actually put up a Knowledge Base entry on this documenting exactly how to do it as of last month:
http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docT...
Slick, this gave me a little performance boost. I wonder why this is not the default. What risk am I taking?
On 25-Sep-2009 Benjamin Franz wrote:
And I just learned something new. According to http://communities.vmware.com/thread/105144;jsessionid=DE9B4FFB861971525BEDB... 984F6A670?start=15&tstart=0 if you use /dev/shm for your tmpDirectory you don't pay the 'double the memory' penalty. I am testing it now.
This is misleading. As I understand it, anything in a tmpfs that has been opened with mmap(MAP_SHARED) will not pay the 'double the memory' penalty. /dev/shm is "simply" a standard place for a tmpfs.
http://communities.vmware.com/thread/167897
-Philip
I have a "fairly" stable Xen (CentOS 5.3 "standard" 3.1.x Xen) install that I want to put into production within the next two weeks or so.
I have some small (so far non-fatal) issues and tweaks that Xen 3.4.x may address. E.g. AMD x64 IOMMU bios read, GPLPV PCI connection, HPET clock, better GPLPV handling, and some others.
My question is: if I follow the directions at stacklet.com (http://stacklet.com/downloads/kernel) to load up Xen 3.4, can/will depmod overwrite dependencies needed for my "standard" Xen kernel, so that they will not be available after a simple edit of grub.conf to restore the "standard" Xen kernel? I'm not familiar with depmod's actions.
Ben M. wrote on 10/06/2009 01:24 PM:
I have a "fairly" stable Xen (CentOS 5.3 "standard" 3.1.x Xen) install that I want to put into production within the next two weeks or so.
I have some small (so far non-fatal) issues and tweaks that Xen 3.4.x may address. E.g. AMD x64 IOMMU bios read, GPLPV PCI connection, HPET clock, better GPLPV handling, and some others.
My question is: if I follow the directions at stacklet.com (http://stacklet.com/downloads/kernel) to load up Xen 3.4, can/will depmod overwrite dependencies needed for my "standard" Xen kernel, so that they will not be available after a simple edit of grub.conf to restore the "standard" Xen kernel? I'm not familiar with depmod's actions.
You shouldn't have to worry about depmod - it will only operate on the current running kernel, or the one you explicitly tell it to, and there should be no direct interaction with GRUB. I'd be much more worried about installing tarballs onto an RPM based system, as Stacklet seems to want to do from my brief look at the site you referenced. Be sure you have a good backup before proceeding with that.
Phil
Check out this repository for Xen 3.4
-Adam
On Thu, Oct 8, 2009 at 8:38 AM, Phil Schaffner Philip.R.Schaffner@nasa.gov wrote:
...
I'm familiar with Gitco, and it is a very good repo; however, using Gitco to go to 3.4.x killed my "standard" CentOS installation, and it (Xen) didn't work on 3.3.x on my hardware.
I tried repo'ing in Gitco a couple of times, with yum priorities toggled on/off and different priority labels. Then I tried it with a fresh install and the yum AllowDowngrade utility, but I must not have had it configured right, because it didn't allow a downgrade.
I was hoping that with a prerolled Xen binary, and a manual grub.conf entry, I could try it. But I just am clueless on depmod's impact on my "decent" installation.
I have two more Xen machines to build out; I could test on them, I guess. I just would have felt a little more comfortable if Xen 3.4.x deals with GPLPV as well as I have read (no PCI connection on Xen 3.1). The most important thing is that the XenPV Shutdown Monitor does work, though disk speed doesn't seem to benefit from it. I must say Win2k8 is a disappointment compared to Win2k3 so far. Win2k3 was decent, fairly lean and very fast, virtual or "real."
Adam wrote:
Check out this repository for Xen 3.4
-Adam
On Thu, Oct 8, 2009 at 8:38 AM, Phil Schaffner <Philip.R.Schaffner@nasa.gov mailto:Philip.R.Schaffner@nasa.gov> wrote:
...
----- "Ben M." centos@rivint.com wrote:
entry, I could try it. But I just am clueless on depmod's impact on my "decent" installation.
What do you think depmod is going to do?
I do not have a comprehensive grasp of startup scripts, or of what files are not rolled into the kernel itself.
In other words, I don't understand yet, when a new kernel is installed, whether there are any support files that come with it, or whether everything that, for instance, the Xen kernel needs is entirely within that kernel file (hardware drivers).
Or whether it is just a matter of having a section for it in grub.conf.
Christopher G. Stach II wrote:
----- "Ben M." centos@rivint.com wrote:
entry, I could try it. But I just am clueless on depmod's impact on my "decent" installation.
What do you think depmod is going to do?
On 10/09/2009 01:57 AM, Ben M. wrote:
I do not have a comprehensive grasp of startup scripts, or of what files are not rolled into the kernel itself.
In other words, I don't understand yet, when a new kernel is installed, whether there are any support files that come with it, or whether everything that, for instance, the Xen kernel needs is entirely within that kernel file (hardware drivers).
The normal CentOS kernel comes with lots of drivers compiled as modules. Using tar tjf on the kernel provided by stacklet will let you know which modules this one includes...
If it is just a matter of having a section for it in grub.conf.
depmod, used as described on the site, will not touch the rest of your system. And note that if you use the new kernel only inside a DomU (thus leaving Dom0 intact) AND use the first method described on that page, you do not have to touch ANY grub.conf at all (by using the kernel directive inside the VM's config file, as described in the first part of the page).
Actually the instructions given over there are pretty sane and - if respected - will not harm the existing systems in any way (either Dom0 or DomU). Unlike Gitco, which, as you have seen, has a much more invasive approach.
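To make the "kernel directive" part concrete, a DomU config that boots the downloaded kernel from Dom0 could look roughly like this (every path and name below is made up for the example):

    # /etc/xen/myguest.cfg - boot a PV guest with a kernel stored in Dom0
    kernel  = "/boot/xen34/vmlinuz-2.6.x-xen"
    ramdisk = "/boot/xen34/initrd-2.6.x-xen.img"
    memory  = 1024
    name    = "myguest"
    disk    = [ "phy:/dev/vg0/myguest,xvda,w" ]
    root    = "/dev/xvda ro"

That way grub.conf and the Dom0 kernel stay untouched, exactly as described above.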
Thanks, the stacklet.com directions looked pretty non-invasive and I will try it after a backup. I just wanted to get an idea of whether depmod would do anything other than make the new test kernel "system ready."
Manuel Wolfshant wrote:
...
----- "Ben M." centos@rivint.com wrote:
I do not have a comprehensive grasp of startup scripts, or of what files are not rolled into the kernel itself.
In other words, I don't understand yet, when a new kernel is installed, whether there are any support files that come with it, or whether everything that, for instance, the Xen kernel needs is entirely within that kernel file (hardware drivers).
Kernels in major distros are usually distributed with most drivers compiled as modules in a package that contains those modules and an initrd, or script that makes an initrd, that contains the drivers necessary to boot your system. This isn't always the case, as drivers may be compiled right into the kernel or they may be completely excluded for whatever reason (mini distros, appliances).
After booting, depmod resolves the kernel module dependencies in /lib/modules/<kernel version> only for the kernel that is currently running. As long as you don't install a kernel package that has the same version string (e.g., 2.6.18-128.4.1.el5xen) as a kernel you care about, you have nothing to worry about. If someone is distributing third party kernel packages that collide with a major distribution's without a really good reason, you should probably avoid using their packages altogether.
If it is just a matter of having a section for it in grub.conf.
Many kernel packages will set up grub.conf for you. If it's just a tarball, you will have to do this manually. You may also need to build a new initrd.
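As a rough sketch of that manual route for a tarball kernel (the version string, file names and root device below are placeholders, not the real Stacklet names):

    # see what the tarball ships, then unpack it
    tar tjf xen-kernel.tar.bz2
    tar xjf xen-kernel.tar.bz2 -C /

    # resolve module dependencies for that specific kernel version only
    depmod -a 2.6.x-xen-example

    # build an initrd for it (CentOS 5 style)
    mkinitrd /boot/initrd-2.6.x-xen-example.img 2.6.x-xen-example

and then add a stanza to grub.conf by hand, keeping the existing entries so you can always boot back into the stock kernel:

    title CentOS (2.6.x-xen-example)
            root (hd0,0)
            kernel /xen.gz-3.4
            module /vmlinuz-2.6.x-xen-example ro root=/dev/VolGroup00/LogVol00
            module /initrd-2.6.x-xen-example.img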
Thank you very much that fills in a few gaps in my knowledge.
Christopher G. Stach II wrote:
----- "Ben M." centos@rivint.com wrote:
I do not have a comprehensive grasp on startup scripts, as well as what files are not rolled into the kernel itself.
In other words, I don't understand yet when a new kernel is installed, whether there are any support files that come with it, or whether everything that, for instance, the Xen kernel needs, are entirely within that kernel file (hardware drivers).
Kernels in major distros are usually distributed with most drivers compiled as modules in a package that contains those modules and an initrd, or script that makes an initrd, that contains the drivers necessary to boot your system. This isn't always the case, as drivers may be compiled right into the kernel or they may be completely excluded for whatever reason (mini distros, appliances).
After booting, depmod resolves the kernel module dependencies in /lib/modules/<kernel version> only for the kernel that is currently running. As long as you don't install a kernel package that has the same version string (e.g., 2.6.18-128.4.1.el5xen) as a kernel you care about, you have nothing to worry about. If someone is distributing third party kernel packages that collide with a major distribution's without a really good reason, you should probably avoid using their packages altogether.
If it is just a matter of having a section for it in grub.conf.
Many kernel packages will set up grub.conf for you. If it's just a tarball, you will have to do this manually. You may also need to build a new initrd.
An update : I've moved all the VM memory files into a tmpfs. Confirmed with lsof that the mm0 and ram0 files are on that tmpfs. Though not the WinXP's mm0...
After 2 days, back to the same problem.
What's more, iostat -tkx 10 shows %util nearly maxing out every 30 seconds. Running top at the same time, the top processes are vmware-vmx, but it varies among the VMs, so I can't accuse a single instance.
Here is iostat output from with no user logged in : http://awale.qc.ca/vmware/iostat.20090927.2
This is iostat after 2 days of all VMs working : http://awale.qc.ca/vmware/iostat.20090930.1
Next up, downgrading to CentOS 4. Which is going to be annoying, for a variety of non-technical reasons.
-Philip
A short follow-up to indicate how I solved my problem :
- Moved all the RAM files to /dev/shm
- Downgraded the host to CentOS 4.8 (was 5.2)
- Moved virtual disks to RAID1 (was RAID5)
- Spread the virtual disks over various raidsets (was all on the same raidset)
The first element alone was not helpful. I was not able to test RAID1 vs RAID5 in isolation from 4.8 vs 5.2, which would have been nice.
I might be downgrading all the other hosts to 4.8, in which case I might be able to test it in isolation.
-Philip