[CentOS] Good value for /proc/sys/vm/min_free_kbytes

Thu Dec 7 18:14:25 UTC 2006
John R Pierce <pierce at hogranch.com>

Martin Knoblauch wrote:
>  We are experiencing responsiveness problems (and higher than expected
> load) when the system is under combined memory+network+disk-IO stress.

First, I'd check the paging with `vmstat 5` ... if you see excessive si 
(swap-ins per second), you need more physical memory; no amount of dinking 
with vm parameters can change that.
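As a quick sketch of what that check looks like, here's an awk one-liner that 
averages the si column from a vmstat capture. The sample output and its numbers 
below are made up for illustration, and the column position (7th field) assumes 
a typical CentOS-era vmstat layout:

```shell
# Hypothetical capture of `vmstat 5`; in practice you'd pipe live output.
vmstat_sample='procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0  20480  81920  10240 409600  250  120   300   400  900  1200 10  5 80  5
 2  0  20480  79872  10240 409600  310  150   280   420  950  1300 12  6 77  5'

# Skip the two header lines, average the si (swap-in) column; a sustained
# nonzero average means the box is short on physical memory.
avg_si=$(printf '%s\n' "$vmstat_sample" | awk 'NR>2 { sum += $7; n++ } END { printf "%d", sum/n }')
echo "average si: $avg_si"
```

A sustained average anywhere near those sample numbers would be a clear sign 
to buy RAM rather than tune vm sysctls.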

If you're not seeing excessive paging, I'd be inclined to monitor the 
disk IO with `iostat -x 5`... if the avgqu-sz and/or await on any 
device is high, you need to balance your disk IO across more physical 
devices and/or more channels. await = 500 means disk physical IO 
requests are taking an average of 500 ms (0.5 seconds) to satisfy. If 
many processes are waiting for disk IO, you'll see high load factors 
even though CPU usage is fairly low.
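To make that concrete, here's a sketch that flags overloaded devices from an 
`iostat -x` report. The sample text and threshold are invented, and the await 
column position (10th field) is assumed from a typical sysstat layout of that 
era; field order varies between sysstat versions, so check your header line:

```shell
# Hypothetical `iostat -x 5` report; layout assumed, numbers made up.
iostat_sample='Device:  rrqm/s wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
sda        0.10   2.30  15.0  30.0   800.0  1600.0    53.3      4.2  35.0   3.1  14.0
sdb        0.00   0.50 120.0  80.0  6400.0  4000.0    52.0     48.7 510.0   4.9  98.0'

# Print any device whose average wait exceeds 100 ms; sdb's 510 ms await
# is the kind of number that says "spread this IO across more spindles".
slow=$(printf '%s\n' "$iostat_sample" | awk 'NR>1 && $10 > 100 { print $1 }')
echo "overloaded: $slow"
```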

iostat is in the yum package sysstat (not installed by default in most 
configs); vmstat is in procps (generally installed by default). With 
both of these commands, ignore the first output; that's the system 
average since reboot, and generally meaningless. The 2nd and successive 
outputs are at the intervals specified (5 seconds in my above examples).
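If you're scripting this, one way to drop that first since-boot report is to 
filter on the per-report "Device:" header. The awk filter below is a sketch; 
the sample text stands in for live `iostat -x 5 4` output, whose exact layout 
varies by sysstat version:

```shell
# Hypothetical two-report iostat capture; the first report is the
# since-boot average and should be ignored.
reports='Device:  since-boot stats
sda 1.0

Device:  interval stats
sda 2.0'

# Count "Device:" headers and only pass lines from the second report on.
filtered=$(printf '%s\n' "$reports" | awk '/^Device:/ { n++ } n >= 2')
printf '%s\n' "$filtered"
```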

On our database servers, which experience very high disk IO loads, we 
often use 4 separate RAIDs...  / and the other normal system volumes are 
partitions on a raid1 (typically 2 x 36GB 15k scsi or sas), then the 
database itself will be spread across 3 volumes /u10 /u11 /u12, which 
are each RAID 1+0 built from 4 x 72GB 15k scsi/sas or FC SAN volumes.   
We'll always use RAID controllers with hardware battery-protected raid 
write-back cache for the database volumes, as this hugely accelerates 
'commits'. Note, we don't use mysql, and I have no idea if it's capable 
of taking advantage of configurations like this, but postgresql and 
oracle certainly are. The database administrators will spend hours 
poring over IO logs and database statistics in order to better optimize 
the distribution of tables and indices across the available tablespaces.

Under these sorts of heavy concurrent random access patterns, SATA and 
software RAID just don't cut it, regardless of how good their sequential 
benchmarks may be.

> Please CC me on replies, as I am only getting the digest.

   spamtrap at knobisoft.de  ??!?   no thanks.