Martin Knoblauch wrote:
We are experiencing responsiveness problems (and higher than expected load) when the system is under combined memory+network+disk-IO stress.
First, I'd check the paging with `vmstat 5`. If you see excessive si (memory swapped in from disk per second), you need more physical memory; no amount of dinking with vm parameters can change that.
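Something like this is all I mean (the tee is optional, just to keep a record of the run):

    # Report memory/paging activity every 5 seconds and keep a copy for review.
    # Watch the si and so columns (memory swapped in/out from disk per second):
    # sustained non-zero si under load means the working set no longer fits in RAM.
    vmstat 5 | tee vmstat.log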
If you're not seeing excessive paging, I'd be inclined to monitor the disk IO with `iostat -x 5`. If avgqu-sz and/or await on any device are high, you need to balance your disk IO across more physical devices and/or more channels. await = 500 means physical disk IO requests are taking an average of 500 ms (0.5 seconds) to satisfy. If many processes are waiting on disk IO, you'll see high load factors even though CPU usage stays fairly low.
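Along these lines (thresholds will vary with your hardware; this is just a sketch):

    # Extended per-device statistics every 5 seconds.
    #   avgqu-sz = average number of requests queued on the device
    #   await    = average time (ms) a request spends queued plus being serviced
    # Keep a copy alongside the vmstat log so the two can be correlated.
    iostat -x 5 | tee iostat.log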
iostat is in the yum package sysstat (not installed by default in most configs); vmstat is in procps (generally installed by default). For both commands, ignore the first block of output: that's the average since boot and generally meaningless. The second and subsequent reports cover the interval specified (5 seconds in my examples above).
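On a RHEL/CentOS-style box that amounts to:

    # iostat (along with sar, mpstat, etc.) comes from the sysstat package:
    yum install sysstat
    # vmstat comes from procps, which is almost always already there:
    rpm -q procps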
On our database servers, which experience very high disk IO loads, we often use 4 separate RAIDs: / and the other normal system volumes are partitions on a RAID 1 (typically 2 x 36GB 15k SCSI or SAS), then the database itself is spread across 3 volumes, /u10 /u11 /u12, each a RAID 1+0 built from 4 x 72GB 15k SCSI/SAS disks or FC SAN volumes. We always use RAID controllers with battery-backed write-back cache for the database volumes, as this hugely accelerates commits. Note, we don't use MySQL, and I have no idea whether it can take advantage of configurations like this, but PostgreSQL and Oracle certainly can. The database administrators will spend hours poring over IO logs and database statistics to better optimize the distribution of tables and indices across the available tablespaces.
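As a rough PostgreSQL illustration of the idea (the mount points, tablespace and table names here are just examples, not our actual layout; a real DBA would base the placement on measured IO statistics):

    # One tablespace per RAID 1+0 volume, so a table and its index can sit on
    # different spindle sets:
    mkdir -p /u11/pg_data /u12/pg_index
    chown postgres:postgres /u11/pg_data /u12/pg_index
    psql -U postgres -c "CREATE TABLESPACE data_ts  LOCATION '/u11/pg_data'"
    psql -U postgres -c "CREATE TABLESPACE index_ts LOCATION '/u12/pg_index'"
    # A hot table on one volume, its index on another:
    psql -U postgres -d mydb -c "CREATE TABLE orders (id bigint, placed_at timestamptz) TABLESPACE data_ts"
    psql -U postgres -d mydb -c "CREATE INDEX orders_id_idx ON orders (id) TABLESPACE index_ts"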
Under these sorts of heavy concurrent random-access workloads, SATA and software RAID just don't cut it, regardless of how good their sequential benchmarks may look.
Please CC me on replies, as I am only getting the digest.
spamtrap@knobisoft.de ??!? no thanks.