on 8-14-2008 12:55 AM Chris Miller spake the following:
nate wrote:
Chris Miller wrote:
I've got a pair of HA servers I'm trying to get into production. Here are some specs:
[..]
[root@haws1 ~]# BUG: unable to handle kernel paging request at virtual address c
This typically means bad RAM
While I won't rule this out, my local hardware vendor does a 48 hour burn-in including a full gamut of tests (including memory) before handing over the servers. These servers are less than two weeks old...
This seems to be a common type of error in some situations. I tried to boot in kexec/kdump mode (the CentOS 5 replacement for diskdumputils), but the e1000 driver doesn't see the NICs after a reboot via the "capture kernel", so I can't replicate the (rsync-induced) problem and perform kernel debugging. I'll explore this more tomorrow.
Chris
When the servers are shipped to you, do you open them and make sure all modules are seated completely and haven't been dislodged during shipping?