[CentOS] how can I stress a server?

Fri Nov 21 17:28:36 UTC 2008
William L. Maltby <CentOS4Bill at triad.rr.com>

On Fri, 2008-11-21 at 18:38 +0200, Rudi Ahlers wrote:
> <snip>

> I'm sitting with a very expensive paper weight right now, and I don't
> know what to do. The same websites are running very well on a machine
> with a Gigabyte G31MX-S motherboard + 4GB DDRII 800 RAM + C2D 6750
> CPU. This is what baffles me, how can the same load on a slower
> machine work fine, but on the faster one not?

Having watched all this thread, I note that certain things are not
mentioned. Assuming that you followed all the previous suggestions, I'll
add my own that is based on practical experience some years back, and
one recent experience.

Like you, I always built my own. Since you have no way to check the
power supply (PS), try removing all components you can and see if that
helps. _Usually_ a weak PS will show symptoms on boot, since everything
is spinning up and drawing maximum current, but sometimes not. Some
BIOSes have settings that "spin up" the drives in a stepped sequence,
or do so automatically, which would not stress the PS as much. Keep in
mind that a PS has a different amperage rating for each rail. A PS that
looks "sufficient" in terms of total wattage may be weak on one or more
of the rails. The specs for the mobo and PS might indicate a problem.
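To make the rail point concrete, here is a minimal Python sketch. The
rail names, ratings and loads below are made-up numbers, not taken from
any real PS or from your box; the idea is simply to compare the
expected current on each rail against that rail's own rating, rather
than comparing total watts.

```python
from dataclasses import dataclass


@dataclass
class Rail:
    name: str
    volts: float
    max_amps: float  # rated current for this rail, as printed on the PS label


def rail_headroom(rails, load_amps):
    """Return {rail name: spare amps}; a negative value means that rail is overloaded."""
    return {r.name: r.max_amps - load_amps.get(r.name, 0.0) for r in rails}


def rated_watts(rail):
    """Power one rail can deliver on its own: P = V * I."""
    return rail.volts * rail.max_amps


# Hypothetical PS ratings and component loads, for illustration only.
rails = [Rail("+12V", 12.0, 18.0), Rail("+5V", 5.0, 30.0)]
load = {"+12V": 20.0, "+5V": 10.0}
print(rail_headroom(rails, load))
```

In this made-up case the +5V rail has plenty of room, but the +12V rail
is asked for 2 A more than its rating, even though the total draw
(12 V x 20 A + 5 V x 10 A = 290 W) could look comfortably under a PS's
headline wattage.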

Have you checked the voltage settings in the BIOS for the CPU and
memory? Many, if not most, boards detect these automatically these
days, but...

Check the spec sheets for the CPU and memory sticks.

I recently upgraded the memory on a mobo and it would not boot or run
reliably.
The spec for the memory was not available and I left the settings as
with the previous memory. Not wanting to fry the sticks and possibly
void the warranty, I picked up the whole thing and carried it back to my
local supplier. I explained the symptoms and told him I suspected memory
voltage but didn't want to try/fry the sticks and risk the warranty.

Hmmm... he said. Well, long story short, he eventually kicked up the
voltage (I guess the "auto" in the BIOS was flaky or something) and all
worked. It required +0.2 V. Most memory sticks can be run at slightly
higher voltages (+0.1 V, +0.2 V) without harm. Larger memory modules may
require a slight increase in voltage. I guess the "automatic" settings
can't always be trusted.

It has been running for about 6 months now with no problems.

Another thing about pulling all components you can: if there is some
kind of IRQ conflict, this can (or used to?) cause slowdowns, and
pulling components may expose it. But that should also leave some
traces in /var/log/messages or in the dmesg output.
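One place to look for shared or busy IRQ lines is /proc/interrupts. Here
is a minimal Python sketch that lists IRQs with more than one device on
them. It assumes the 2.6-era layout (per-CPU counts, one controller-type
column, then a comma-separated device list); newer kernels format the
controller column differently. The sample text is made up. Note that
sharing an IRQ is normal on PCI systems and is not a conflict by itself,
but a busy disk controller or NIC sharing a line is a candidate to check.

```python
def shared_irqs(text):
    """Map IRQ number -> device names for /proc/interrupts rows listing more than one device."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header row is "CPU0 CPU1 ..."
    shared = {}
    for line in lines[1:]:
        key, sep, rest = line.partition(":")
        if not sep or not key.strip().isdigit():
            continue  # skip NMI, LOC, ERR and other non-numeric rows
        fields = rest.split()
        tail = fields[ncpu + 1:]  # drop the per-CPU counts and the controller type
        devices = [d.strip() for d in " ".join(tail).split(",") if d.strip()]
        if len(devices) > 1:
            shared[int(key.strip())] = devices
    return shared


# Made-up sample in the 2.6-era format.
SAMPLE = """\
           CPU0       CPU1
  0:         35          0   IO-APIC-edge      timer
 16:      12345        678   IO-APIC-fasteoi   uhci_hcd:usb3, eth0
NMI:          0          0
"""
print(shared_irqs(SAMPLE))
```

On a live box you would pass it open("/proc/interrupts").read() instead
of the sample.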

Let's presume that the "obvious" problem is not the problem. What if it
is not hardware directly?

Examine your /var/log/dmesg carefully for any "suspect" messages. I've
also found that occasionally drivers selected by the system may not be
exactly correct. Check the specs for mobo and add-in cards and see if it
looks like the best drivers for the chipsets are loaded (lsmod and
modinfo help here).
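For the "suspect messages" pass, a keyword filter over the log helps
you not miss anything in a long dmesg. This is a minimal Python sketch
of a case-insensitive grep; the keyword list is my own guess at a
starting point, not an authoritative list of bad kernel messages, and
the sample lines are made up.

```python
import re

# Hypothetical starting list of words worth a second look; tune it for your own logs.
SUSPECT = re.compile(
    r"\b(error|fail(?:ed|ure)?|timeout|timed out|warn(?:ing)?|unknown|disabl(?:ed|ing))\b",
    re.IGNORECASE,
)


def suspect_lines(log_text):
    """Return the lines of a dmesg/messages dump that contain a suspect word."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]


sample = "\n".join([
    "eth0: link up",
    "ata2.00: failed command: READ DMA",
    "Disabling IRQ #16",
    "Memory: 4046376k/4194304k available",
])
for line in suspect_lines(sample):
    print(line)
```

On the real system you would feed it open("/var/log/dmesg").read(); any
hits can then be checked against the driver lsmod reports for that
device.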

Grab any old performance/diagnostics software (maybe some on this list
have current knowledge - I don't) and run it. Compare to published data
for same or similar systems.

Enable sar (part of the sysstat package) on the system, run the reports
and see where the slowdowns are.
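Once sar has collected some data, "sar -u" shows CPU usage per interval,
including %iowait (time spent idle while waiting on I/O). A minimal
Python sketch to find the worst interval follows. It assumes the
English-locale layout where header and data rows have the same number
of fields; the sample output is made up, not from your server.

```python
def worst_sample(sar_text, column="%iowait"):
    """Return (time, value) for the row with the highest `column` value in `sar -u` output."""
    header = None
    ntime = 1
    worst = None
    for line in sar_text.splitlines():
        tokens = line.split()
        if column in tokens:
            header = tokens  # header row; sar may repeat it
            ntime = header.index("CPU") if "CPU" in header else 1  # time fields before "CPU"
            continue
        if header is None or len(tokens) != len(header) or tokens[0].startswith("Average"):
            continue
        try:
            value = float(tokens[header.index(column)])
        except ValueError:
            continue  # skip rows whose field is not a number
        when = " ".join(tokens[:ntime])
        if worst is None or value > worst[1]:
            worst = (when, value)
    return worst


# Made-up sample of `sar -u` output.
SAMPLE = """\
Linux 2.6.18-92.el5 (web1)    11/21/2008

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
12:10:01 AM       all      1.23      0.00      0.45      0.10      0.00     98.22
12:20:01 AM       all      5.00      0.00      2.00     35.50      0.00     57.50
Average:          all      3.11      0.00      1.22     17.80      0.00     77.86
"""
print(worst_sample(SAMPLE))
```

A consistently high %iowait would point toward disks and cabling; high
%user or %system would point elsewhere.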

I haven't used multi-core yet, but I would first check to see if all the
cores are being used effectively. Maybe top will help here: pressing 1
while it runs shows each CPU separately.
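If you want numbers rather than eyeballing top, the per-core counters
in /proc/stat can be sampled twice and compared. This is a minimal
Python sketch under the assumption that the first eight fields are
user, nice, system, idle, iowait, irq, softirq and steal (later fields
on newer kernels are already included in user/nice); the snapshots are
made up.

```python
def parse_cpu_lines(stat_text):
    """Return {"cpu0": [counters...], ...} for the per-core lines of /proc/stat."""
    cores = {}
    for line in stat_text.splitlines():
        fields = line.split()
        # "cpu" alone is the all-cores total; keep only cpu0, cpu1, ...
        if fields and fields[0].startswith("cpu") and fields[0] != "cpu":
            cores[fields[0]] = [int(v) for v in fields[1:]]
    return cores


def per_core_busy(before, after):
    """Percent of time each core was not idle between two /proc/stat snapshots."""
    a, b = parse_cpu_lines(before), parse_cpu_lines(after)
    busy = {}
    for name in a:
        # Only the first eight fields are summed, so guest time is not counted twice.
        deltas = [y - x for x, y in zip(a[name][:8], b[name][:8])]
        total = sum(deltas)
        idle = sum(deltas[3:5])  # idle + iowait
        busy[name] = 0.0 if total == 0 else 100.0 * (total - idle) / total
    return busy


# Made-up snapshots taken some interval apart.
BEFORE = "cpu  100 0 100 800 0 0 0 0\ncpu0 50 0 50 400 0 0 0 0\ncpu1 50 0 50 400 0 0 0 0\n"
AFTER = "cpu  190 0 110 1100 0 0 0 0\ncpu0 140 0 60 500 0 0 0 0\ncpu1 50 0 50 600 0 0 0 0\n"
print(per_core_busy(BEFORE, AFTER))
```

One core pinned near 100% while the others sit idle under load would
suggest the work is not being spread across cores.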

BIOS: some have oddball (not really, but legacy issues abound) settings
that may limit the amount of memory seen/used. Keep an eye out for those.
Memory timings may not be properly detected and set. Check the specs for
the memory and see if the BIOS has them properly set. BTW, _some_ memory
and mobo combos will allow faster settings, but be careful. I haven't
dinked with them for a long time, so I can't make any Q & A suggestions.
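A quick check for the "memory seen" part is to compare MemTotal in
/proc/meminfo with what is physically installed. Here is a minimal
Python sketch; the 95% tolerance is an arbitrary cut-off of my own,
since the BIOS and kernel always reserve some RAM. On a 4 GB box, a
reading near 3 GB could come from a BIOS memory-hole/remap setting or a
32-bit kernel without PAE, among other things.

```python
def mem_total_mib(meminfo_text):
    """MemTotal from /proc/meminfo, converted from kB to MiB (rounded down)."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1]) // 1024
    raise ValueError("no MemTotal line found")


def memory_looks_short(meminfo_text, installed_mib, tolerance=0.95):
    """Return (short?, MiB seen): True if the kernel sees well under the installed RAM."""
    seen = mem_total_mib(meminfo_text)
    return seen < installed_mib * tolerance, seen


# Made-up MemTotal line for a box with 4096 MiB installed.
print(memory_looks_short("MemTotal:        3106304 kB\n", installed_mib=4096))
```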

Have you upgraded to the latest BIOS on the system? Most retail mobos
come with an early BIOS version that has... "issues". Check the
manufacturer's web site and see if there is a later BIOS.

OTHER: Of course, you have manually "re-seated" all connections, yes? A
slightly loose cable, add-in card or memory not fully seated can do
things such as you describe.

Visually inspect cables for "micro-fractures". Better, if you have
access to a meter, check for excessive resistance or open circuits. If
not, try changing out cables. You might want to look in this area only
if sar reports show slow disk activity. hdparm might also give some
information, and some of its settings might help too.

That's all I can think of ATM. I hope something here is of use.

-- 
Bill