Load at 5, no CPU I/O or swap in use


Ed Donahue

20 Aug 2010

6:48 p.m.

We are currently running CentOS 5 update 4 on a Dell R910 server (16 cores, 32 with hyperthreading) with 64GB of memory. It is our main Oracle 11g DB server for one of our customers and is attached to an MD 3000 storage array. We are seeing a load average of around 5, but no swap is in use, the CPUs are pretty much idle, and there is no I/O wait. We have Oracle dataguard turned on in transactional mode. I've checked everything that I can think of; there are no Oracle processes running that would cause a spike. Does anyone have any ideas as to what to check next?

I have another R910 configured the same way and do not see any issues with the 3 databases running on that server. The load is at .5.


Hakan Koseoglu

20 Aug 2010

6:58 p.m.

On 20 August 2010 19:48, Ed Donahue liberaled@gmail.com wrote:

...

We are currently running CentOS 5 update 4 on a Dell R910 server 16 cores/32 hyperthreaded with 64GB of memory. It is our main Oracle 11g DB server for one of our customers and is attached to an MD 3000 storage array. We are having a load averaging around 5 but see no swap in use, CPUs are pretty much idle and no I/O wait. We have Oracle

That's a high load average. What are top/nmon reporting? Run nmon with -t -f -s 10 -c 180 to collect half an hour's worth of data (180 snapshots at 10-second intervals) and put it through the analyser. It should give you a good idea of what's happening on your server.

Are your servers purely Oracle servers or are they also serving other software? Any slow NFS mounts?

-- Hakan (m1fcj) - http://www.hititgunesi.org

Hakan Koseoglu

7:03 p.m.

On 20 August 2010 19:58, Hakan Koseoglu hakan@koseoglu.org wrote:

...

That's a high load average. What's top/nmon reporting?

Hum, should have said "high load average for an idle(ish) server". Friday evening + beer + CentOS list = not a good idea.

-- Hakan (m1fcj) - http://www.hititgunesi.org

Les Mikesell

7:19 p.m.

On 8/20/2010 2:03 PM, Hakan Koseoglu wrote:

...

On 20 August 2010 19:58, Hakan Koseoglu hakan@koseoglu.org wrote:

...
That's a high load average. What's top/nmon reporting?

Hum, should have said "high load average for an idle(ish) server". Friday evening + beer + CentOS list = not a good idea.

'top' should show the busy-ish processes if they are long-running. You might be able to strace them to see what they are doing. Or something might be spawning off a bunch of short-lived processes. Those are harder to track down, but you can tell if that's the case by a fast turnover in process IDs. And you might catch one with a ps and find its parent.
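One rough way to look for the fast PID turnover described above is to snapshot the numeric entries in /proc twice and compare them. The sketch below is only an illustration, not something posted in this thread: the helper names (list_pids, pid_turnover, parent_of) are made up, and it assumes a Linux-style /proc layout. Processes that start and exit between two snapshots are missed, so treat the counts as a lower bound.

```python
import os


def list_pids(proc_root="/proc"):
    """Return the set of numeric PIDs currently visible under proc_root."""
    return {int(name) for name in os.listdir(proc_root) if name.isdigit()}


def pid_turnover(before, after):
    """Return (started, exited): PIDs that appeared or vanished between snapshots."""
    return after - before, before - after


def parent_of(pid, proc_root="/proc"):
    """Best-effort parent PID from <proc_root>/<pid>/stat; None if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "stat")) as fh:
            stat = fh.read()
        # The command name sits in parentheses and may contain spaces,
        # so split on the last ')' before reading the remaining fields.
        fields = stat.rsplit(")", 1)[1].split()
        return int(fields[1])  # fields[0] is the state, fields[1] is the ppid
    except (OSError, IndexError, ValueError):
        # The process may already have exited, which is expected here.
        return None
```

To use it, take list_pids() once, sleep a second or so, take it again, pass both sets to pid_turnover(), and call parent_of() on each new PID straight away, since a short-lived process may be gone a moment later.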

-- Les Mikesell lesmikesell@gmail.com

James Hogarth

7:01 p.m.

Load isn't a bad thing. Load is the number of processes in the run queue (on Linux it also counts tasks in uninterruptible sleep). You have 16 cores and only 5 processes in the run queue. Are you witnessing poor responsiveness on that server?
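To put the "5 on 16 cores" comparison into numbers, a small sketch along these lines could help. It is an illustration rather than anything from the thread: the function names are made up, and the sample line in the usage note is hypothetical, not output from the server being discussed. It relies on the Linux /proc/loadavg layout (three load averages, runnable/total tasks, most recently created PID).

```python
import os


def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict of named fields."""
    one, five, fifteen, tasks, last_pid = text.split()
    runnable, total = tasks.split("/")
    return {
        "load1": float(one),
        "load5": float(five),
        "load15": float(fifteen),
        "runnable_now": int(runnable),   # tasks runnable at the instant of the read
        "total_tasks": int(total),
        "last_pid": int(last_pid),       # most recently created PID
    }


def load_per_cpu(load, cpus):
    """Load average divided by the number of CPUs; below 1.0 means spare capacity."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return load / cpus


def report(path="/proc/loadavg"):
    """Read the live values on a Linux host and return (fields, load per CPU)."""
    with open(path) as fh:
        info = parse_loadavg(fh.read())
    return info, load_per_cpu(info["load1"], os.cpu_count() or 1)
```

For example, parse_loadavg("5.02 4.98 5.10 1/612 23817") followed by load_per_cpu(5.0, 16) gives 0.3125, that is, about a third of one core's worth of work per core. Reading the file twice a few seconds apart and comparing last_pid is also a cheap hint as to whether many short-lived processes are being spawned.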

What are you trying to really troubleshoot?


Brian Mathis

8:14 p.m.


Do you have sar (sysstat) installed and running? It gathers stats periodically (every 10 minutes with the stock sysstat cron job) and lets you see more than a typical 'top' will show you. You can also graph the output using ksar, which will make it easier to see things.

Ed Donahue

9:07 p.m.


sar is showing a 99.6% idle CPU, and this box only has an Oracle DB running on it.

It has dataguard, which keeps it in sync with the DR server over a VPN.

Here is vmstat output:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0 50057932  28332 13016296    0    0     1     6    0    0  0  0 100  0  0
 0  0      0 50058992  28348 13016292    0    0     0   180 1172 2660  0  0 100  0  0
 0  0      0 50059156  28356 13016300    0    0     0   192 1250 3071  0  0 100  0  0
 0  0      0 50059988  28372 13016300    0    0     0    50 1221 3074  0  0 100  0  0
 0  0      0 50060244  28380 13016292    0    0     0   126 1057 2578  0  0 100  0  0

There are no processes in D state and no zombies.

The NFS mounts are fine too; there are only two of them, they hold users' home directories, and no one is logged on to the system.

It is also hooked up to an MD3000 where the DB and Oracle files are stored; the MD3000 isn't showing any alerts.
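The "no D state, no zombies" check can be repeated in a loop with something like the following sketch, which tallies the state letter of every process under /proc. This is an illustration with made-up function names, assuming a Linux /proc layout. One caveat: it looks at processes only, whereas the load average counts individual tasks, so a thread stuck in D state under /proc/<pid>/task would not show up here.

```python
import os
from collections import Counter


def state_of(stat_text):
    """Extract the one-letter state (R, S, D, Z, ...) from /proc/<pid>/stat text."""
    # The command name is in parentheses and may itself contain ')' or spaces,
    # so the state is the first field after the last ')'.
    return stat_text.rsplit(")", 1)[1].split()[0]


def count_states(proc_root="/proc"):
    """Return a Counter mapping state letter to number of processes."""
    counts = Counter()
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, name, "stat")) as fh:
                counts[state_of(fh.read())] += 1
        except (OSError, IndexError):
            # The process exited between listdir() and open(); skip it.
            continue
    return counts
```

Calling count_states() every second for a minute and watching the R and D totals is one way to see whether runnable or uninterruptible tasks appear in bursts that a single ps or vmstat sample would miss.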

James Hogarth

10:32 p.m.

OK, let's see the output of ps -efc if we can, and see what is listed as being in the run queue.

