We are currently running CentOS 5 update 4 on a Dell R910 server (16 cores/32 with hyperthreading) with 64GB of memory. It is our main Oracle 11g DB server for one of our customers and is attached to an MD3000 storage array. We are seeing a load average of around 5, but there is no swap in use, the CPUs are pretty much idle, and there is no I/O wait. We have Oracle Data Guard turned on in transactional mode. I've checked everything that I can think of, and there are no Oracle processes running that would cause a spike. Does anyone have any ideas as to what to check next?
I have another R910 configured the same way and do not see any issues with the 3 databases running on that server. The load there is at 0.5.
On 20 August 2010 19:48, Ed Donahue liberaled@gmail.com wrote:
We are currently running CentOS 5 update 4 on a Dell R910 server 16 cores/32 hyperthreaded with 64GB of memory. It is our main Oracle 11g DB server for one of our customers and is attached to an MD 3000 storage array. We are having a load averaging around 5 but see no swap in use, CPUs are pretty much idle and no I/O wait. We have Oracle
That's a high load average. What's top/nmon reporting? Run nmon with -t -f -s 10 -c 180 to collect half an hour's worth of data (180 snapshots, 10 seconds apart) and put it through the analyser. It should give you a good idea of what's happening on your server.
Are your servers purely Oracle servers or are they also serving other software? Any slow NFS mounts?
On 20 August 2010 19:58, Hakan Koseoglu hakan@koseoglu.org wrote:
That's a high load average. What's top/nmon reporting?
Hum, should have said "high load average for an idle(ish) server". Friday evening + beer + CentOS list = not a good idea.
On 8/20/2010 2:03 PM, Hakan Koseoglu wrote:
On 20 August 2010 19:58, Hakan Koseoglu hakan@koseoglu.org wrote:
That's a high load average. What's top/nmon reporting?
Hum, should have said "high load average for an idle(ish) server". Friday evening + beer + CentOS list = not a good idea.
'top' should show the busy-ish processes if they are long-running. You might be able to strace them to see what they are doing. Or something might be spawning off a bunch of short-lived processes. Those are harder to track down, but you can tell if that's the case by a fast turnover in process IDs. And you might catch one with a ps and find its parent.
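One rough way to check for the fast PID turnover mentioned above is to sample the most recently allocated PID twice and look at the difference. This is only a sketch: it assumes Linux's /proc/loadavg (whose fifth field is the last PID handed out) and a pid_max of 32768, and the function names are made up for illustration.

```python
import time


def read_last_pid(path="/proc/loadavg"):
    """Return the most recently allocated PID (fifth field of /proc/loadavg)."""
    with open(path) as f:
        return int(f.read().split()[4])


def pids_allocated(first, second, pid_max=32768):
    """PIDs handed out between two samples, allowing for one wraparound.

    Approximate: after wrapping, the kernel restarts above a reserved low
    range rather than at 0, and pid_max (an assumption here) may differ;
    check /proc/sys/kernel/pid_max on the real system.
    """
    return (second - first) % pid_max


def fork_rate(interval=10.0):
    """Approximate number of new PIDs per second over `interval` seconds."""
    before = read_last_pid()
    time.sleep(interval)
    after = read_last_pid()
    return pids_allocated(before, after) / interval
```

Calling fork_rate(10) on the affected box and on the quiet R910 would give two numbers to compare. Threads consume PIDs as well, so the figure is an upper bound on new processes.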
Load isn't a bad thing. Load is the number of processes in the run queue (on Linux it also counts processes in uninterruptible sleep). You have 16 cores and only about 5 processes in the run queue. Are you witnessing poor responsiveness on that server?
What are you trying to really troubleshoot?
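To put the "5 on 16 cores" point in numbers, here is a small sketch that parses the contents of /proc/loadavg and divides the load by the core count. The sample line is invented for illustration (it is not from the server in question), and the function names are hypothetical.

```python
def parse_loadavg(text):
    """Parse the contents of /proc/loadavg into a dict."""
    fields = text.split()
    runnable, total = fields[3].split("/")
    return {
        "load1": float(fields[0]),
        "load5": float(fields[1]),
        "load15": float(fields[2]),
        "runnable": int(runnable),   # tasks currently runnable
        "total": int(total),         # total scheduling entities
        "last_pid": int(fields[4]),
    }


def load_per_core(load, cores):
    """Load average normalised by the number of cores."""
    return load / cores


# Invented sample line, shaped like /proc/loadavg output.
SAMPLE_LOADAVG = "5.02 4.97 5.10 2/612 18234"

info = parse_loadavg(SAMPLE_LOADAVG)
print("1-minute load per core: %.2f" % load_per_core(info["load1"], 16))
```

A per-core figure well below 1 suggests the CPUs are not saturated. It says nothing about why tasks are being counted, which is what the rest of the thread is trying to find out.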
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
On Fri, Aug 20, 2010 at 2:48 PM, Ed Donahue liberaled@gmail.com wrote:
We are having a load averaging around 5 but see no swap in use, CPUs are pretty much idle and no I/O wait.
Do you have sar (sysstat) installed and running? It gathers stats at regular intervals (every 10 minutes with the stock CentOS cron job), and you can see more than what a typical 'top' will show you. You can also graph the output using kSar, which will make it easier to see things.
On Fri, Aug 20, 2010 at 4:14 PM, Brian Mathis brian.mathis@gmail.com wrote:
Do you have sar (sysstat) installed and running?
sar is showing the CPU as 99.6% idle. This box only has an Oracle DB running on it.
It has Data Guard, which keeps it in sync with the DR server over a VPN.
Here is vmstat output:
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0 50057932  28332 13016296    0    0     1     6    0    0  0  0 100  0  0
 0  0      0 50058992  28348 13016292    0    0     0   180 1172 2660  0  0 100  0  0
 0  0      0 50059156  28356 13016300    0    0     0   192 1250 3071  0  0 100  0  0
 0  0      0 50059988  28372 13016300    0    0     0    50 1221 3074  0  0 100  0  0
 0  0      0 50060244  28380 13016292    0    0     0   126 1057 2578  0  0 100  0  0
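For a longer capture than the five samples above, the r (runnable) and b (blocked) columns can be averaged with a short script. A minimal sketch, assuming the usual two header rows of vmstat; the function names are made up, and the sample text reuses two rows from the output above.

```python
def parse_vmstat(text):
    """Turn vmstat text output into a list of {column: int} dicts."""
    rows, columns = [], None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] == "procs":
            continue                 # blank line or the banner row
        if fields[0] == "r":
            columns = fields         # the column-name row
            continue
        if columns and len(fields) == len(columns):
            rows.append(dict(zip(columns, map(int, fields))))
    return rows


def average(rows, column):
    """Mean value of one column across all parsed samples."""
    return sum(row[column] for row in rows) / len(rows)


# Two rows taken from the vmstat output posted in this thread.
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd     free   buff    cache   si   so    bi    bo   in   cs us sy  id wa st
 1  0      0 50057932  28332 13016296    0    0     1     6    0    0  0  0 100  0  0
 0  0      0 50058992  28348 13016292    0    0     0   180 1172 2660  0  0 100  0  0
"""

samples = parse_vmstat(SAMPLE_VMSTAT)
print("avg r:", average(samples, "r"), "avg b:", average(samples, "b"))
```

The first data row from vmstat reports averages since boot, so it may be worth dropping it before averaging a real capture.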
No processes in D status or zombies
The NFS mounts are fine too. There are only two of them, they hold users' home directories, and no one is logged onto the system.
It is also hooked up to an MD3000 where the DB and Oracle files are stored, and the MD3000 isn't showing any alerts.
OK, let's see ps -efc if we can, and see what is listed as being in the run queue.
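To narrow a full process listing down to what could be feeding the load figure, one option is to keep only tasks in state R (runnable) or D (uninterruptible sleep). The sketch below parses the output of `ps -eo state,pid,comm`, which is a different invocation from the one requested above and is used here because it prints the state as the first column. The sample listing is invented, and load_contributors is a hypothetical helper name.

```python
def load_contributors(ps_output):
    """Return (state, pid, command) tuples for tasks in R or D state.

    Expects the text produced by: ps -eo state,pid,comm
    """
    found = []
    for line in ps_output.splitlines()[1:]:      # skip the header row
        parts = line.split(None, 2)
        if len(parts) == 3 and parts[0][0] in ("R", "D"):
            found.append((parts[0], int(parts[1]), parts[2]))
    return found


# Invented sample listing, shaped like `ps -eo state,pid,comm` output.
SAMPLE_PS = """\
S   PID COMMAND
S     1 init
R  4321 oracle
D  4400 kjournald
S  5000 sshd
"""

for state, pid, command in load_contributors(SAMPLE_PS):
    print(state, pid, command)
```

On a live system the text could come from subprocess.check_output(["ps", "-eo", "state,pid,comm"]). The ps process itself will normally show up in state R. Running the check a few times in a row helps, since short-lived tasks are easy to miss in a single snapshot.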
On 20 Aug 2010 22:07, "Ed Donahue" liberaled@gmail.com wrote:
No processes in D status or zombies