Hello all. I have a CentOS 4.4 box (2.6.9-42.0.10.ELsmp) with 7gig of RAM that doesn't seem to be using swap (also 7gig now). I say "seem" because I've noticed RAM utilization run around 95% (with Oracle and friends running), and then firing up a couple of apps to use that last 5% will stop the machine dead in its tracks.
I ran across some reading[0] about /proc/sys/vm/swappiness and how it affects the kernel's decisions when dealing with swap, but it's not all that clear to me whether this is the source of the problem. It seems like the kernel just isn't using swap at all.
I've double checked my swap partition in fstab, ran mkswap again, and double checked its availability with vmstat -s, but still it's just not getting used. Sunday, the box died around 3am, and sar seems to show memory use on the rise right up till it hurled[1] (I just stuck another gig in the box Monday, so sar only shows 6gig on Sunday).
Anyone know what's up with this? PS: swap is an LVM partition.
Thanks Thomas
[0] - http://lwn.net/Articles/83588/
[1] - sar -A

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
...
02:40:01 AM    197656   5909384     96.76    203308   4066344  16383992         0      0.00         0
02:50:01 AM    191728   5915312     96.86    203320   4066400  16383992         0      0.00         0
03:00:01 AM    185584   5921456     96.96    203336   4067132  16383992         0      0.00         0
03:10:01 AM    185712   5921328     96.96    203344   4067872  16383992         0      0.00         0
Average:       191895   5915145     96.86    203221   4063819  16383992         0      0.00         0

12:00:01 AM  pswpin/s pswpout/s
...
02:00:01 AM      0.00      0.00
02:10:01 AM      0.00      0.00
02:20:01 AM      0.00      0.00
02:30:01 AM      0.00      0.00
02:40:01 AM      0.00      0.00
02:50:01 AM      0.00      0.00
03:00:01 AM      0.00      0.00
03:10:01 AM      0.00      0.00
Average:         0.00      0.00
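For anyone wanting to repeat the checks described in this post, here is a minimal Python sketch that reads the swappiness setting and parses a /proc/swaps-style listing. The helper names and the sample device name (/dev/VolGroup00/LogVol01) are assumptions for illustration; the size is the SwapTotal figure posted later in the thread.

```python
from pathlib import Path

# Hypothetical /proc/swaps content; the device name is an assumption, the
# size matches the SwapTotal value reported later in this thread.
SAMPLE_SWAPS = """Filename                                Type            Size    Used    Priority
/dev/VolGroup00/LogVol01                partition       7340024 0       -1
"""


def parse_swaps(text):
    """Parse /proc/swaps-style text into a list of dicts (sizes in kB)."""
    areas = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        name, kind, size, used, priority = line.split()
        areas.append({
            "name": name,
            "type": kind,
            "size_kb": int(size),
            "used_kb": int(used),
            "priority": int(priority),
        })
    return areas


def read_swappiness(path="/proc/sys/vm/swappiness"):
    """Return the vm.swappiness value, or None if the file is not available."""
    p = Path(path)
    return int(p.read_text().strip()) if p.exists() else None


if __name__ == "__main__":
    for area in parse_swaps(SAMPLE_SWAPS):
        print(area["name"], area["size_kb"], "kB total,", area["used_kb"], "kB used")
    print("swappiness:", read_swappiness())
```

A swap area that is listed with Used at 0 is active but simply has not been needed, which is a different situation from swap not being enabled at all.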
On 4/10/07, tblader tblader@flambeau.com wrote:
Hello all. I have a CentOS 4.4 box (2.6.9-42.0.10.ELsmp) with 7gig of RAM that doesn't seem to be using swap (also 7gig now). I say "seem" because I've noticed RAM utilization run around 95% (with Oracle and friends running), and then firing up a couple of apps to use that last 5% will stop the machine dead in its tracks.
How do you have your swap space allocated? Is it all one big 7G chunk, or broken up across disks somewhat? There used to be a limitation on having a single swap partition over 2G in size; I'm not certain whether that still applies. If you really do require 7G of swap space, you'll probably see a benefit from keeping it spread across multiple spindles in 1-2G chunks. This will allow some sane system handling when swap usage is required, and the kernel will be able to essentially stripe across the swap areas, RAID-0 style, for lack of a better description.
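As a rough illustration of the "spread across spindles" idea: swap areas that share the same priority are used round-robin by the kernel, and the priority can be set with the pri= option in fstab. The sketch below only generates the fstab lines; the device names and the helper name are hypothetical, not taken from this thread.

```python
def swap_fstab_lines(devices, priority=1):
    """Build /etc/fstab entries that give every swap device the same priority.

    Equal priorities let the kernel allocate swap pages round-robin across
    the devices, which behaves much like striping.
    """
    return [f"{dev}  swap  swap  pri={priority}  0 0" for dev in devices]


if __name__ == "__main__":
    # Hypothetical partitions on three separate disks.
    for line in swap_fstab_lines(["/dev/sda2", "/dev/sdb2", "/dev/sdc2"]):
        print(line)
```

Without an explicit pri= value each area gets its own descending priority, so the areas are filled one after another rather than in parallel.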
Jim Perrin wrote: <snip>
How do you have your swap space allocated? Is it all 1 big 7G chunk, or broken up across disks somewhat?
It's all in one chunk on an LVM partition. I thought I would adjust the swap size on LVM and when I got it dialed in make a new raw partition to hold it.
tblader wrote:
Hello all. I have a CentOS 4.4 box (2.6.9-42.0.10.ELsmp) with 7gig of RAM that doesn't seem to be using swap (also 7gig now). I say "seem" because I've noticed RAM utilization run around 95% (with Oracle and friends running), and then firing up a couple of apps to use that last 5% will stop the machine dead in its tracks.
What architecture is this beast?
John Summerfield wrote:
What architecture is this beast?
Hope this is of use.
[2]$ uname -virmop
2.6.9-42.0.10.ELsmp #1 SMP Tue Feb 27 09:40:21 EST 2007 x86_64 x86_64 x86_64 GNU/Linux

[2]$ cat /proc/cpuinfo
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 67
model name      : AMD Athlon(tm) 64 FX-62 Dual Core Processor
stepping        : 2
cpu MHz         : 2800.038
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 0
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 5603.52
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 15
model           : 67
model name      : AMD Athlon(tm) 64 FX-62 Dual Core Processor
stepping        : 2
cpu MHz         : 2800.038
cache size      : 1024 KB
physical id     : 0
siblings        : 2
core id         : 1
cpu cores       : 2
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext lm 3dnowext 3dnow pni cx16
bogomips        : 5599.20
TLB size        : 1088 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp [4] [5]

[2]$ cat /proc/meminfo
MemTotal:      7128956 kB
MemFree:       2132556 kB
Buffers:        556832 kB
Cached:        2821408 kB
SwapCached:          0 kB
Active:        3036432 kB
Inactive:      1511364 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:      7128956 kB
LowFree:       2132556 kB
SwapTotal:     7340024 kB
SwapFree:      7340024 kB
Dirty:           47404 kB
Writeback:           0 kB
Mapped:        1493920 kB
Slab:           378568 kB
CommitLimit:  10904500 kB
Committed_AS:  5853520 kB
PageTables:      39176 kB
VmallocTotal: 536870911 kB
VmallocUsed:    271256 kB
VmallocChunk: 536599031 kB
HugePages_Total:     0
HugePages_Free:      0
Hugepagesize:     2048 kB
tblader wrote:
[2]$ cat /proc/meminfo
MemTotal:      7128956 kB
MemFree:       2132556 kB
Buffers:        556832 kB
Cached:        2821408 kB
You do realize that, out of that 7GB of RAM, 2GB is unused entirely, 2.8GB is being used as disk cache, and another 500MB as buffer space... so in fact program memory usage is down around 2-3GB?
If you understand that, my apologies. It's just that I've dealt with lots and lots of folks who focus entirely on that "MemFree" line without realizing that "Cached" is also available as 'free' space.
btw, you said Oracle. I've heard that if you enable huge pages w/ Oracle on large-memory x86_64 systems, you get a BIG performance boost. Sorry, I don't know the specifics, but there's probably an Oracle MetaLink article on it.
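The breakdown above can be reproduced from the posted /proc/meminfo figures. This is a minimal sketch, not an exact accounting: the function names are made up for illustration, and the subtraction is only an approximation (shared memory, such as an Oracle SGA, may be counted under Cached on this kernel).

```python
# The four values quoted from /proc/meminfo earlier in this thread.
MEMINFO = """MemTotal: 7128956 kB
MemFree: 2132556 kB
Buffers: 556832 kB
Cached: 2821408 kB
"""


def parse_meminfo(text):
    """Turn 'Key: value kB' lines into a {key: value_in_kB} dict."""
    info = {}
    for line in text.strip().splitlines():
        key, rest = line.split(":", 1)
        info[key.strip()] = int(rest.split()[0])
    return info


def app_memory_kb(info):
    """Memory used by programs: total minus free, buffers and page cache."""
    return info["MemTotal"] - info["MemFree"] - info["Buffers"] - info["Cached"]


def reclaimable_kb(info):
    """Memory that is free now or can be given back by dropping cache."""
    return info["MemFree"] + info["Buffers"] + info["Cached"]


if __name__ == "__main__":
    info = parse_meminfo(MEMINFO)
    print("used by programs:", app_memory_kb(info), "kB")
    print("free or reclaimable:", reclaimable_kb(info), "kB")
```

With these numbers the program share comes out at 1618160 kB (roughly 1.5 GB), so the "2-3GB" estimate is, if anything, on the generous side.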
John R Pierce wrote:
you do realize, out of that 7GB of ram, 2GB is unused entirely, and 2.8GB is being used as disk cache, another 500MB as buffer space... so in fact program memory usage is down around 2-3GB ?
Thanks for breaking that down. No I'm not all that familiar with swap details so it helps to have some insight. The box flatlined around 3am on Monday morning and memused (sar) was around 96%. All the swap space was untouched. Maybe every process running was active, and unable to be swapped, so the box just up and died when a request came in to allocate more ram. I just don't have enough of an understanding of swap to say that was what happened. I do remember a day when you could heat a small room with the swap activity generated on earlier 2.x kernels, but things seem much different with 2.6.
Thanks for your insight, Thomas
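The same reading can be applied to the sar figures from the first post. The sketch below assumes that kbmemused in this sysstat version includes buffers and page cache (the posted kbmemfree + kbmemused values add up to about 6 GB, which is consistent with that); the function name is hypothetical.

```python
# (time, kbmemfree, kbmemused, kbbuffers, kbcached), copied from the
# sar -A output in the first post of this thread.
SAR_ROWS = [
    ("02:40:01 AM", 197656, 5909384, 203308, 4066344),
    ("02:50:01 AM", 191728, 5915312, 203320, 4066400),
    ("03:00:01 AM", 185584, 5921456, 203336, 4067132),
    ("03:10:01 AM", 185712, 5921328, 203344, 4067872),
]


def without_cache(row):
    """Return (kB used by programs, percent of total) for one sar row."""
    _time, free_kb, used_kb, buffers_kb, cached_kb = row
    total_kb = free_kb + used_kb
    app_kb = used_kb - buffers_kb - cached_kb
    return app_kb, 100.0 * app_kb / total_kb


if __name__ == "__main__":
    for row in SAR_ROWS:
        app_kb, pct = without_cache(row)
        print(f"{row[0]}  {app_kb} kB  {pct:.1f}%")
```

Under that assumption, the roughly 96% memused seen before the box died corresponds to about 27% of RAM actually held by programs, so the lack of swap activity in those samples is not surprising by itself.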
On Tue, Apr 10, 2007 at 12:44:31PM -0500, tblader wrote:
Thanks for breaking that down. No I'm not all that familiar with swap details so it helps to have some insight. The box flatlined around 3am on Monday morning and memused (sar) was around 96%. All the swap space was untouched. Maybe every process running was active, and unable to be swapped, so the box just up and died when a request came in to allocate more ram. I just don't have enough of an understanding of swap to say that was what happened. I do remember a day when you could heat a small room with the swap activity generated on earlier 2.x kernels, but things seem much different with 2.6.
It could be hardware related. Maybe the hardware doesn't like that memory module, or it has problems. Did you memtest+ all your memory?
Luciano Miguel Ferreira Rocha wrote:
It could be hardware related. Maybe the hardware doesn't like that memory module, or it has problems. Did you memtest+ all your memory?
Yep. Haven't ruled that out yet. I ran memtest for a weekend straight on it and it said things were cool. Someone did mention that they weren't sure memtest ran in protected mode, so it may not have been able to access all the RAM (4G limit, I guess?). I fired up memtest again after that and it listed the whole lot of RAM in the stats, so it would seem it must have been able to test it.
On Tue, Apr 10, 2007 at 01:43:01PM -0500, tblader wrote:
Yep. Haven't ruled that out yet. I ran memtest for a weekend straight on it and it said things were cool. Someone did mention that they weren't sure memtest ran in protected mode, so it may not have been able to access all the RAM (4G limit, I guess?). I fired up memtest again after that and it listed the whole lot of RAM in the stats, so it would seem it must have been able to test it.
Could you try one night with the RAM limited? Try booting with mem=6912M on the kernel command line (this excludes the top 256MB).
There are some instances where the kernel doesn't get a correct memory mapping from the BIOS and tries to use memory it wasn't supposed to.
In my instances it only caused the system to be *very* slow, not crashes, but your case may be related.
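The 6912M figure is simply 7 GB minus 256 MB, expressed in megabytes. A tiny sketch of that arithmetic, assuming the kernel's mem= boot parameter is what is being set; the helper name is made up for illustration.

```python
def mem_limit_arg(total_mib, exclude_mib):
    """Build a mem= kernel argument that leaves out the top exclude_mib MiB."""
    if exclude_mib < 0 or exclude_mib >= total_mib:
        raise ValueError("exclude_mib must be between 0 and total_mib")
    return f"mem={total_mib - exclude_mib}M"


if __name__ == "__main__":
    # 7 GiB installed, keep the kernel away from the top 256 MiB.
    print(mem_limit_arg(7 * 1024, 256))
```

The resulting string is appended to the kernel line in the boot loader configuration for the test boot.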
tblader wrote:
[2]$ cat /proc/meminfo
MemTotal:      7128956 kB
MemFree:       2132556 kB
Buffers:        556832 kB
Cached:        2821408 kB
A supplementary question, can anyone explain what "buffers" and "cached" are? I have a performance problem with postgresql (discussion is on the postgresql list), and I see those figures displayed by commands such as free, top and procinfo but the man pages don't define them.
Buffers = disk buffers (buffers of stuff to be read/written)
Cached = page cache, which stores filesystem and process pages... a copy of files in memory
Linux uses all RAM for file caching if applications don't use it. That's how it works. If you aren't using swap and your machine is running fine, everything's cool.