> compdoc wrote:
>>> According to the man page, it apparently needs a kernel driver
>>> named OpenIPMI, which it claims is installed in standard
>>> distributions. I don't find it on my system.
>>
>> lm_sensors is another, and I think installs ready to use from the repos.
>
> sensors says that the three temp sensors read +36C, +39C, and +87C.
> These appear to be AMD K10 temp sensors, although I might be
> misreading sensors-detect. Low/highs are (+127/+127, +127/+90,
> +127/+127) respectively. (I'm not sure if these are alarm set
> points or something else.)
>
> One fan is listed as 0 rpm. Something to look into.

Hmm, much has been said in this thread already, and I know how difficult it can be to track down an issue like this. However, I suggest not throwing in too many new tools in parallel. Also, be careful how you interpret information gathered by tools like lm_sensors. They can only report as well as the mainboard and its sensors were designed and built, and both can be suboptimal. I've seen all kinds of things, like temp sensors not mounted where they should be. Of course, built-in sensors like those of a CPU should be taken very seriously.

So, may I give some more tips on how I'd try to find what is wrong:

- Take a vacuum cleaner and *carefully* clean the whole box. Dust can really do bad things because it is not a perfect insulator.
- If you feel you have to remove any device like the CPU, make sure you up everything, have a good quality heat sink paste at hand and make sure everything is seated well after mounting it again.
- For the memory part, do you have ECC? If not, that is really a problem: if the box is used as a server, ECC is a must. If yes, most errors will be corrected by ECC and, more importantly, memory errors are usually logged. You should be able to find a list of those errors in the BIOS, where you may see how often errors occur and where. Does something like that exist?
- For the temperatures, 87C is not so uncommon, but yes, it looks a little bit high. Someone else posted 80C as the max for your CPU. That seems correct; at least our 12-core Opterons have "Caution: 75C; Critical: 80C", but they usually run at 45C-55C under normal load. So if 87C is really correct under normal load, that may already be too much, and then consider what happens at peak times.
- When you look at the lm_sensors values, do they correspond with what is shown in the BIOS (if it has this kind of diagnostics)?

Simon
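
P.S. In case it helps with comparing readings, here is a minimal sketch of how the temperatures could be checked against the thresholds mentioned above. It is only an illustration: the 75C/80C limits are the ones our 12-core Opterons report and are an assumption for any other CPU, the SAMPLE text is made up from the figures quoted in this thread (real `sensors` output may be laid out differently), and the names parse_sensors and classify are hypothetical helpers, not part of lm_sensors.

```python
import re

# Limits quoted in this thread for 12-core Opterons.
# Assumption: check the data sheet of your own CPU before relying on them.
CAUTION_C = 75.0
CRITICAL_C = 80.0

# Matches lines such as "temp3:  +87.0°C  (low = ...)" and captures the
# label and the first temperature on the line. Fan lines (RPM) do not match.
READING = re.compile(r"^\s*([^:]+):\s*\+?(-?\d+(?:\.\d+)?)\s*°?C")


def classify(temp_c):
    """Return 'ok', 'caution' or 'critical' for a temperature in Celsius."""
    if temp_c >= CRITICAL_C:
        return "critical"
    if temp_c >= CAUTION_C:
        return "caution"
    return "ok"


def parse_sensors(text):
    """Collect {label: temperature} from sensors-style text."""
    readings = {}
    for line in text.splitlines():
        match = READING.match(line)
        if match:
            readings[match.group(1).strip()] = float(match.group(2))
    return readings


# Illustrative input built from the values posted in this thread,
# not a capture from a real machine.
SAMPLE = """\
temp1:  +36.0°C  (low = +127.0°C, high = +127.0°C)
temp2:  +39.0°C  (low = +127.0°C, high = +90.0°C)
temp3:  +87.0°C  (low = +127.0°C, high = +127.0°C)
fan1:   0 RPM
"""

if __name__ == "__main__":
    for label, temp in parse_sensors(SAMPLE).items():
        print(label, temp, classify(temp))
```

To try it on a live box you could feed it the text printed by `sensors` instead of SAMPLE. The BIOS comparison still has to be done by hand, since the script only sees what lm_sensors reports.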