I am running CentOS 5 on a dual-dual-core Intel machine, and I am seeing a load average of between 0.35 and 0.50 while the machine is idle, i.e. no processes appear to be running.
Both top and uptime report the same thing. Looking at top, I cannot see any processes that are using CPU time except for top and init, and they are not using enough cycles to push up the load average.
According to top, there are occasional tiny (like 0.5%) bumps in the system usage, and almost no user space usage. Again, not enough to account for the load average I am seeing.
I have tried a couple of kernel updates, and upgraded from CentOS 5.0 to 5.2, none of which make any difference.
Has anyone else seen this? And can anyone recommend a way to figure out what is causing the load average to be this high when the machine is idle?
Thanks, --Bill
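One way to approach the question above, sketched in Python 3 (an assumption, since CentOS 5 ships an older Python 2 by default): on Linux the load average counts tasks that are runnable (state R) or in uninterruptible sleep (state D), not CPU utilisation, so listing tasks in those states can show what is being counted even when top reports no CPU use. The function names are illustrative only, and the scan looks at per-process entries, not individual threads.

```python
import os


def parse_stat(stat_text):
    """Return (pid, command, state) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the last ") ".
    """
    pid_part, rest = stat_text.split(" (", 1)
    comm, after = rest.rsplit(") ", 1)
    return int(pid_part), comm, after.split()[0]


def load_contributors(proc_root="/proc"):
    """List processes currently in state R (runnable) or D (uninterruptible)."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat(fh.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state in ("R", "D"):
            found.append((pid, comm, state))
    return found
```

Calling load_contributors() repeatedly (say once a second for a minute) and noting which names keep showing up gives a rough picture; a single snapshot will usually show only the script itself.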
--- On Sat, 7/19/08, listmail listmail@entertech.com wrote:
From: listmail listmail@entertech.com
Subject: [CentOS] Load Average ~0.40 when idle
To: "CentOS mailing list" centos@centos.org
Date: Saturday, July 19, 2008, 1:48 PM
<snip>
I have not seen this with any C5. However, I have moved /etc/cron.daily/prelink and /etc/cron.daily/makewhatis to the weekly schedule.
check /var/log/secure for dictionary attacks
check your /var/log/httpd/access_log for unusual PHP activity
check http://localhost/usage for the webalizer logs, where maybe something will stand out.
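The /var/log/secure check can be partly automated. A minimal sketch in Python 3, assuming the usual OpenSSH "Failed password for ... from ..." message wording; the function name is made up for illustration, and other services log failures differently.

```python
import re
from collections import Counter

# Matches sshd lines such as:
#   "... sshd[123]: Failed password for invalid user admin from 10.0.0.5 port 4321 ssh2"
FAILED = re.compile(r"Failed password for (?:invalid user )?(\S+) from (\S+)")


def failed_logins_by_host(lines):
    """Count failed password attempts per source host."""
    hosts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            hosts[match.group(2)] += 1
    return hosts
```

Run as root, something like failed_logins_by_host(open("/var/log/secure")).most_common(10) would list the ten busiest sources; hundreds of failures from one address is the typical signature of a dictionary attack.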
On Sat, 19 Jul 2008 13:58:15 -0700 (PDT), Mark Pryor wrote
<snip>
Thanks, Mark. I have done all of that. There was a dictionary attack a few days ago, but there is no activity now. Since this is a new machine that I am just burning in, I am tempted to reinstall from scratch in case the machine somehow got hacked during burn-in. I don't see any stuck processes, or any other clues. I have an identical machine running a slightly older version of the kernel (CentOS 5.0 - 2.6.18-53.1.14.el5) that does not exhibit this problem, so I am a bit suspicious. Has anyone else noticed anything like this?
Thanks, --Bill
On Sat, 19 Jul 2008 14:50:18 -0700, listmail wrote
<snip>
Replying to my own post as a follow-up. I just checked another machine that I am burning in with CentOS 5.2, and it has the same problem: load average ~0.4 when idle. Both of these machines have Supermicro X7DBN motherboards, but one is running a single quad-core CPU (Intel Xeon) and the other is running two dual-core CPUs (Intel Xeon). Anyone else seeing anything like this?
Thanks, --Bill
listmail wrote:
<snip>
Are you running X ... how many processes (on average) are running?
Running X and logged in with applets and such, I have this:
===========================================================
top - 17:18:49 up 4:13, 3 users, load average: 0.15, 0.27, 0.32
Tasks: 153 total, 2 running, 149 sleeping, 0 stopped, 0 zombie
===========================================================
On Sat, 19 Jul 2008 17:21:44 -0500, Johnny Hughes wrote
<snip>
One is running X, the other is not. The one that is running X has the same load average as the one that is not. A small number of processes are running, but as I said they are not using any CPU time, according to top.
Replying to my own post as a follow-up. I just checked another machine that I am burning in with CentOS 5.2, and it has the same problem: load average ~0.4 when idle. Both of these machines have Supermicro X7DBN motherboards, but one is running a single quad-core CPU (Intel Xeon) and the other is running two dual-core CPUs (Intel Xeon). Anyone else seeing anything like this?
Do you have hyper-threading turned on in the bios? What shows in cat /proc/cpuinfo
do you have 2 virtual CPU's per core?
I would bet that performance improves by turning hyper-threading off.
On Sat, 19 Jul 2008 16:04:17 -0700 (PDT), Mark Pryor wrote
Do you have hyper-threading turned on in the bios?
No, the BIOS does not support hyperthreading.
What shows in cat /proc/cpuinfo
This is an example for one of the four CPUs - they are all the same except for the processor number:
===
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 15
model name : Intel(R) Xeon(R) CPU 5130 @ 2.00GHz
stepping : 6
cpu MHz : 2000.191
cache size : 4096 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 10
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc pni monitor ds_cpl vmx tm2 cx16 xtpr lahf_lm
bogomips : 4001.80
===
do you have 2 virtual CPU's per core?
Nope.
The systems are running at a 1 kHz interrupt rate and doing about 20 context switches per second while idle. But as I said, this does not cause the CPU load average to move off of 0.00 on another almost identical machine.
Thx, --Bill
On Sat, 19 Jul 2008 19:28:55 -0400, Dan Halbert wrote
listmail wrote:
it has the same problem: load average 0.4 when idle.
If you disconnect or shut down the NIC(s), does that make any difference?
Good suggestion. Disconnecting the Ethernet cables from the NICs did not make a difference. However, shutting down the interfaces (e.g. ifdown eth0, ifdown eth1) did cut the load average down to nothing (0.00).
So it wasn't actual traffic, but something that the interfaces were doing, or something that was trying to talk to one or both of them.
Does this result suggest anything else?
Thx, --Bill
listmail wrote:
<snip>
That's interesting. A few ideas (I'm just trying divide and conquer here -- I don't have a hypothesis):
1. See if it's one interface or the other. Does just shutting down one make a difference?
2. Use tcpdump on the interface to see what's going on there, even when the cables are disconnected. (I may be wrong about seeing anything when it's disconnected; you may not see any traffic if the driver knows nothing can go out.)
3. Do "chkconfig --list" to find out which services are on, and shut them down one by one to see if one is the offender.
On Sat, 19 Jul 2008 20:26:24 -0400, Dan Halbert wrote
<snip>
- See if it's one interface or the other. Does just shutting down
one make a difference?
Nope. If either one is up, I see the load run up. Ethernet connected or not.
- Use tcpdump on the interface to see what's going on there, even
when the cables are disconnected. (I may be wrong about seeing anything when it's disconnected; you may not see any traffic if the driver knows nothing can go out.)
Can't see any traffic with the interfaces up and the Ethernet connected.
- Do "chkconfig --list" to find out which services are on, and shut
them down one by one to see if one is the offender.
I shut off everything, and the problem remained until I at last shut off the network service.
Thanks for the ideas - I'm beginning to suspect a bug in the kernel or the timer code.
--Bill
On Sat, 2008-07-19 at 16:54 -0700, listmail wrote:
<snip>
You could try running powertop to see if it tells you anything more about interrupts being generated. If you're not using IPv6, you could disable it. If you didn't want to keep installing things, you could try the live CDs from various distributions that have some of the newer kernel features enabled, such as Fedora.
I am running vmware server and found that when I compile a recent custom kernel (in the VM) with the tickless kernel and several other options enabled, it substantially reduces wasted CPU cycles. This is somewhat different from running on real hardware though. Even so, on the various laptop lists, people talk about how the machines run hotter when they're constantly processing unnecessary interrupts.
Here are just some of the things I've turned on, though I can't remember all of them:
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_PARAVIRT_GUEST=y (paravirtualized Guest/VMU Guest support)
CONFIG_PARAVIRT=y
CONFIG_VMI=y
CONFIG_HIBERNATION=y
Nataraj
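To see which of these options a given kernel was built with, the build configuration can be parsed. A rough sketch in Python 3; it assumes the distribution installs the config as a text file (Red Hat style systems usually place it at /boot/config-<kernel release>), and the helper names are invented for this example.

```python
def parse_kernel_config(text):
    """Map CONFIG_ option names to their values ("y", "m", "n", or a literal)."""
    options = {}
    unset_suffix = " is not set"
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(unset_suffix):
            # Disabled options appear as comments: "# CONFIG_FOO is not set"
            options[line[2:-len(unset_suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# The timer-related options named in the message above.
WANTED = ["CONFIG_TICK_ONESHOT", "CONFIG_NO_HZ", "CONFIG_HIGH_RES_TIMERS"]


def report(options, wanted=WANTED):
    """Return the value of each wanted option, or "absent" if not mentioned."""
    return {name: options.get(name, "absent") for name in wanted}
```

An option reported as "absent" simply does not exist in that kernel version, which is different from being switched off.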
--- On Sat, 7/19/08, listmail listmail@entertech.com wrote:
From: listmail listmail@entertech.com
Subject: Re: [CentOS] Load Average ~0.40 when idle
To: "CentOS mailing list" centos@centos.org
Date: Saturday, July 19, 2008, 4:27 PM
<snip>
the ht flag means the cpu supports hyperthreading
lm means that you can run 64 bit.
By the way, is it an i386 kernel?
I've seen only one SuperMicro bios and it was quite complex. Are you sure that there is no way to toggle hyperthreading in the bios?
the siblings flag in cpuinfo says 2, which I thought means 2 virtual cpu's.
I doubt if any of the above is relevant to your problem, but if you reinstall anytime soon you might want to consider these support flags in how you set things up.
On Sat, 19 Jul 2008 17:22:29 -0700 (PDT), Mark Pryor wrote
--- On Sat, 7/19/08, listmail listmail@entertech.com wrote:
<snip>
the ht flag means the cpu supports hyperthreading lm means that you can run 64 bit.
By the way, is it an i386 kernel?
Yes, it's the i386 kernel.
I've seen only one SuperMicro bios and it was quite complex. Are you sure that there is no way to toggle hyperthreading in the bios?
Pretty sure, I have looked for it to no avail.
the siblings flag in cpuinfo says 2, which I thought means 2 virtual cpu's.
There are 2 cores per chip....??
I doubt if any of the above is relevant to your problem, but if you reinstall anytime soon you might want to consider these support flags in how you set things up.
Do you mean install a 64-bit kernel? I do want the kernel to see each core as one CPU. Not sure what else I would do on a re-install...
Thx, --Bill
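On the siblings question: in /proc/cpuinfo, "siblings" is the number of logical CPUs in a physical package and "cpu cores" is the number of cores in that package, so hyperthreading is only in effect when siblings is larger than cpu cores. A small Python 3 sketch of that comparison follows; the function names are illustrative, and it assumes both fields are present, which is not the case on every architecture or kernel.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo text into one dict per logical processor."""
    cpus, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line separates one processor's block from the next.
            if current:
                cpus.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        cpus.append(current)
    return cpus


def hyperthreading_active(cpu):
    """True when a package exposes more logical CPUs than it has cores."""
    return int(cpu["siblings"]) > int(cpu["cpu cores"])
```

For the output quoted in this thread (siblings 2, cpu cores 2) the comparison is false: two cores, one thread each.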
You mentioned these are Supermicro X7DBN boards. They use the Intel (ESB2/Gilgal) 82563EB Dual-Port Gigabit Ethernet Controller. There's an open bug here: https://bugzilla.redhat.com/show_bug.cgi?id=403121, "e1000: issues with Intel ESB2/Gilgal (82563EB)". It doesn't describe your problem, but complains about other issues with that NIC, and references related bugs.
On Sat, 19 Jul 2008 21:32:42 -0400, Dan Halbert wrote
<snip>
Yes, I looked at the buglist for the driver and didn't see anything related. The NICs actually work just fine at moving data. And I have the same NICs on several other Supermicros that do not have this problem.
Just for fun, I ran a backup on one of the machines, and not only did the Ethernet work well, but the "phantom" load went away while a real load was running. That's what leads me to suspect a kernel or timer bug of some sort. There was a post on the Linode site about a year ago about something that smells similar: http://www.linode.com/forums/archive/o_t/t_2729/strange_load_average.html
But those guys are doing virtualization and using newer kernels than what CentOS is distributing.
I wonder if anyone else has seen this problem?
Thanks, --Bill
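A small simulation, offered only to illustrate how a "phantom" load can arise in principle and not as a diagnosis of these machines: the kernel builds the load average by sampling the number of runnable or uninterruptible tasks roughly every 5 seconds and feeding that count into an exponentially damped average. The activity pattern below (one task counted in 2 of every 5 samples) is made up, and the real kernel uses fixed-point arithmetic rather than floats.

```python
import math

SAMPLE_INTERVAL_S = 5.0  # approximate kernel sampling period


def simulate_load(active_counts, window_s=60.0, start=0.0):
    """Feed a sequence of sampled task counts through the damped average.

    window_s is 60 for the 1-minute figure, 300 and 900 for the others.
    """
    decay = math.exp(-SAMPLE_INTERVAL_S / window_s)
    load = start
    for count in active_counts:
        load = load * decay + count * (1.0 - decay)
    return load


# Hypothetical pattern: a task is caught "active" in 2 of every 5 samples.
pattern = [1, 1, 0, 0, 0] * 120  # 600 samples, about 50 minutes
phantom = simulate_load(pattern)
```

With this pattern the 1-minute value settles into a band around 0.4 even though the task may use almost no CPU time, because only the instant of sampling matters.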
On Sat, Jul 19, 2008 at 2:48 PM, listmail listmail@entertech.com wrote:
<snip>
Download the livecd and boot using it. See if the load average still occurs. Check to see if you have any traffic occurring on the network from the system. [I had a box that was kernel trojaned that showed a load average all the time when it was on the wire and none when it was off. The kernel trojan was looking for a particular bit of traffic that would open up its backdoor.]
Stephen John Smoogen wrote:
<snip>
It's been ages since I've had to do this, but in years past, rkhunter was really good at finding rootkits like this. Worst case, you put it on a live CD and run it from there.
I believe this is the source home page, http://www.rootkit.nl/projects/rootkit_hunter.html
On Sat, 19 Jul 2008 21:56:45 -0700, John R Pierce wrote
<snip>
OK, I downloaded the CentOS 5.2 Live CD and booted from it. To eliminate load from the GUI, I forced the system into runlevel 3 and ran top. I see the same problem; the load average sits at about 0.40 continuously. This is with the ethernet drivers running, and it does not matter if the network cables are plugged in or not.
In my mind, that pretty much eliminates the possibility of a rootkit, unless one was delivered with the Live CD. :-) So it looks like this is a bug in either the Intel GLAN driver, or some other kernel timing issue. If anyone can suggest where this bug should be reported and is likely to be addressed, please let me know. I don't know myself who would be the correct party to notify.
Thanks to everyone who responded and helped me track this one down. I'm not sure if I should roll back to CentOS 5.0, or just try to live with this bug until the maintainers address it, but at least I have some idea of what's wrong.
Thanks, --Bill
post it on the centos bug tracker to start..:)
listmail wrote:
<snip>
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
William Warren wrote:
<snip>
Hello,
To try to find out whether you have hidden processes, I suggest you try this: http://www.security-projects.com/?Unhide
I run it from cron every night on my server.
It works really well. rkhunter is a very good tool too.
Try both and let us know.
Another issue: what is the purpose of the machine? Is it a web server, a mail server, a DNS server? Check that /etc/resolv.conf has the right information, and check the routes to the different networks too. If the processor is idle but the machine load is high, it could be because the process queue is very long even though the processors themselves are not overloaded.
Regards,
the issue occurs even on a live cd so the machine's software load isn't suspect. It's the nics.
On Mon, 21 Jul 2008 08:06:54 -0400, William Warren wrote
the issue occurs even on a live cd so the machine's software load isn't suspect. It's the nics.
It sure does look like it. I submitted a bug to the CentOS bug tracker, so hopefully someone better equipped than I to resolve this can duplicate the issue.
On Sun, Jul 20, 2008 at 4:52 PM, listmail listmail@entertech.com wrote:
<snip>
Ok sorry for the wild goose chase earlier...
1. Check with the manufacturer of the motherboard to see if this is a known issue. Sometimes these items show up and are fixed with a BIOS update.
2. Check to see if you can pinpoint where the problem is coming from... set up sar and iostat to see if there are excessive irq's on one line or another. Run the system as a minimal OS when doing this... nothing but init 1 if possible.
3. Try Fedora 9 livecd and see if it still occurs. If it doesn't then the problem was fixed in the main kernel between EL-5 and now. That can help make it easier to track down for a bug in Red Hat's bugzilla.
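For step 2, besides sar, the per-line counters in /proc/interrupts can be compared between two snapshots. A hedged Python 3 sketch; the helper names are made up, and the parsing assumes the usual layout of a CPU header row followed by "label: counts... description" rows.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        for field in fields[:ncpu]:
            if not field.isdigit():
                break  # rows such as ERR carry fewer counters than CPUs
            counts.append(int(field))
        totals[label.strip()] = (sum(counts), " ".join(fields[len(counts):]))
    return totals


def busiest(before, after, seconds):
    """Sort IRQ lines by interrupts per second between two snapshots."""
    rates = []
    for label, (count, name) in after.items():
        previous = before.get(label, (0, name))[0]
        rates.append(((count - previous) / seconds, label, name))
    return sorted(rates, reverse=True)
```

Taking one snapshot, sleeping ten seconds, taking another and passing both to busiest() shows which line is firing; on an idle box the timer should dominate, so a NIC line near the top would be worth a closer look.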
In my mind, that pretty much eliminates the possibility of a rootkit, unless one was delivered with the Live CD. :-) So it looks like this is a bug in either the Intel GLAN driver, or some other kernel timing issue. If anyone can suggest where this bug should be reported and is likely to be addressed, please let me know. I don't know myself who would be the correct party to notify.
Thanks to everyone who responded and helped me track this one down. I'm not sure if I should roll back to CentOS 5.0, or just try to live with this bug until the maintainers address it, but at least I have some idea of what's wrong.
Thanks, --Bill
On Mon, 21 Jul 2008 10:20:53 -0600, Stephen John Smoogen wrote
On Sun, Jul 20, 2008 at 4:52 PM, listmail listmail@entertech.com wrote:
<snip
OK, I downloaded the CentOS 5.2 Live CD and booted from it. To eliminate load from the GUI, I forced the system into runlevel 3 and ran top. I see the same problem; the load average sits at about 0.40 continuously. This is with the ethernet drivers running, and it does not matter if the network cables are plugged in or not.
Ok sorry for the wild goose chase earlier...
1. Check with the manufacturer or motherboard to see if this is a known issue. Sometimes these items show up and are fixed with a BIOS update.
2. Check to see if you can pinpoint where the problem is coming from... set up sar and iostat to see if there are excessive irq's on one line or another. Run the system as a minimal OS when doing this... nothing but init 1 if possible.
3. Try Fedora 9 livecd and see if it still occurs. If it doesn't then the problem was fixed in the main kernel between EL-5 and now. That can help make it easier to track down for a bug in Red Hat's bugzilla.
I cannot find relevant support notes on either the Supermicro or Intel sites, but I'll send an email to Supermicro support to see if they know anything.
I used vmstat to compare interrupt and context switch rates on a system with the issue and a system without the issue (older kernel). Both systems show an irq rate of about 1000/sec and cs rate of about 25/sec.
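For anyone who wants to repeat this comparison, here is a minimal sketch that averages the "in" (interrupts) and "cs" (context switches) columns from vmstat output. The function name avg_in_cs is made up for illustration, and it assumes the standard vmstat header row that starts with "r" and names the columns. Note that the first sample vmstat prints is an average since boot, so a longer run gives a fairer number.

```shell
#!/bin/sh
# Average the "in" and "cs" columns of vmstat output read from stdin.
avg_in_cs() {
    awk '
        # Header row: locate the "in" and "cs" columns by name.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "in") col_in = i
                if ($i == "cs") col_cs = i
            }
            next
        }
        # Data rows start with a number (the run-queue length).
        col_in && $1 ~ /^[0-9]+$/ {
            sum_in += $col_in
            sum_cs += $col_cs
            n++
        }
        END {
            if (n) printf "in=%d cs=%d samples=%d\n", sum_in / n, sum_cs / n, n
        }
    '
}

# Possible use: sample once per second for 30 seconds on each machine.
#   vmstat 1 30 | avg_in_cs
```

Running the same command on the affected and unaffected machines gives two lines that can be compared directly.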
The system that does not exhibit the problem is running 2.6.18-53.1.14.el5, so it seems to be something that has changed since that time frame (early CentOS 5.1, I think).
Thanks, --Bill
On Mon, Jul 21, 2008 at 11:00 AM, listmail listmail@entertech.com wrote:
I cannot find relevant support notes on either the Supermicro or Intel sites, but I'll send an email to Supermicro support to see if they know anything.
I used vmstat to compare interrupt and context switch rates on a system with the issue and a system without the issue (older kernel). Both systems show an irq rate of about 1000/sec and cs rate of about 25/sec.
The system that does not exhibit the problem is running 2.6.18-53.1.14.el5, so it seems to be something that has changed since that time frame (early CentOS 5.1, I think).
Does the non-affected system show the problem when you run the live CD on it? If not, I would try installing that kernel on your affected system and see if the problem goes away for the time being.
On Sat, 19 Jul 2008 13:48:55 -0700, I wrote
I am running CentOS 5 on a dual-dual-core Intel machine, and I am seeing a load average of between 0.35 and 0.50 while the machine is idle, i.e. no processes appear to be running.
Both top and uptime report the same thing. Looking at top, I cannot see any processes that are using CPU time except for top and init, and they are not using enough cycles to push up the load average.
According to top, there are occasional tiny (like 0.5%) bumps in the system usage occasionally, and almost no user space usage. Again, not enough to account for the load average I am seeing.
I have tried a couple of kernel updates, and upgraded from CentOS 5.0 to 5.2, none of which make any difference.
Has anyone else seen this? And can anyone recommend a way to figure out what is causing the load average to be this high when the machine is idle?
A follow-up now that this issue is resolved. Thanks to the help of some kind souls on this list, I was able to determine that the problem was only manifested when the Ethernet drivers were running. This led me to update the drivers, which solved the problem.
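Before updating, it may help to confirm which e1000e version is actually loaded. Below is a small sketch for comparing dotted version strings. The helper name version_lt is made up for illustration, and 0.4.1.7 is simply the version reported as working in this thread, not a known minimum fixed version.

```shell
#!/bin/sh
# Succeed (exit 0) if dotted version $1 is older than $2.
# Each component is compared numerically; a suffix such as "-NAPI"
# is ignored because awk only uses the leading digits of a field.
version_lt() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 < y[i] + 0) exit 0
            if (x[i] + 0 > y[i] + 0) exit 1
        }
        exit 1   # versions are equal
    }'
}

# Possible use (modinfo -F prints a single field of the module info):
#   if version_lt "$(modinfo -F version e1000e)" 0.4.1.7; then
#       echo "e1000e is older than the version reported as working"
#   fi
```

ethtool -i on the interface (eth0 on many systems, but that name is an assumption) also reports the driver name and version in use.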
Details for others who will probably encounter this issue:
1. The problem occurs with the 2.6.18-92.1.6.el5 kernels that come with CentOS 5.2, and the supplied Intel e1000e Ethernet drivers v0.2.0 that ship with 5.2.
2. The fix is to update the e1000e drivers, which are available from the Intel web site. I installed e1000e version 0.4.1.7-NAPI. Instructions for installation come with the driver; the package I found was e1000e-0.4.1.7.tar.gz
3. You have to compile the drivers from source. They require the kernel-devel package to be installed in order to compile, of course. But if you are running the PAE kernel, you need to install kernel-PAE-devel to compile against. This was news to me; the naming convention makes it hard to figure out which package you need until you browse the available kernel packages. Simply running yum install kernel-devel does not get you what you need.
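The package naming in item 3 can be sketched as a small helper that turns the output of uname -r into the matching -devel package name. This is an illustration only: the function name devel_package_for is made up, and it assumes the EL5 convention where the kernel variant (PAE or xen) is appended to the release string, for example 2.6.18-92.1.6.el5PAE.

```shell
#!/bin/sh
# Map a kernel release string (as printed by `uname -r`) to the name of
# the -devel package needed to build modules against that kernel.
# Assumption: EL5-style releases, where the variant is a suffix.
devel_package_for() {
    case "$1" in
        *PAE) echo "kernel-PAE-devel-${1%PAE}" ;;
        *xen) echo "kernel-xen-devel-${1%xen}" ;;
        *)    echo "kernel-devel-$1" ;;
    esac
}

# Show the package that would match the running kernel.
echo "yum install $(devel_package_for "$(uname -r)")"
```

Including the version in the package name keeps the headers in step with the running kernel rather than whatever the newest kernel in the repository happens to be.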
I hope this saves someone else the time I wasted figuring this out. :-)
Cheers, --Bill
listmail wrote:
A follow-up now that this issue is resolved. Thanks to the help of some kind souls on this list, I was able to determine that the problem was only manifested when the Ethernet drivers were running. This led me to update the drivers, which solved the problem.
Details for others who will probably encounter this issue:
1. The problem occurs with the 2.6.18-92.1.6.el5 kernels that come with CentOS 5.2, and the supplied Intel e1000e Ethernet drivers v0.2.0 that ship with 5.2.
2. The fix is to update the e1000e drivers, which are available from the Intel web site. I installed e1000e version 0.4.1.7-NAPI. Instructions for installation come with the driver; the package I found was e1000e-0.4.1.7.tar.gz
3. You have to compile the drivers from source. They require the kernel-devel package to be installed in order to compile, of course. But if you are running the PAE kernel, you need to install kernel-PAE-devel to compile against. News to me, the naming convention makes it hard to figure out which name you need until you browse the available kernel packages. Simply doing yum install kernel-devel does not get you what you need.
I hope this saves someone else the time I wasted figuring this out. :-)
I think I am going to file this as a bug on the RH site to inform them of this issue so that they can choose to upgrade their driver if they want.
On Fri, Aug 1, 2008 at 5:28 PM, listmail listmail@entertech.com wrote:
3. You have to compile the drivers from source. They require the kernel-devel package to be installed in order to compile, of course. But if you are running the PAE kernel, you need to install kernel-PAE-devel to compile against. News to me, the naming convention makes it hard to figure out which name you need until you browse the available kernel packages. Simply doing yum install kernel-devel does not get you what you need.
Just to extend on this subject... a somewhat detailed description of how to obtain the kernel-devel package can be found in this CentOS wiki article:
http://wiki.centos.org/HowTos/I_need_the_Kernel_Source#head-7cb967afe95d0c9b...
Akemi