Rudi Ahlers wrote:
> On Sun, Nov 16, 2008 at 1:14 AM, John R Pierce <pierce at hogranch.com> wrote:
>> Rudi Ahlers wrote:
>>> Well, on a standard CentOS 5.2, /var/log/messages will be the
>>> place to log problems like this, or where else can I get more info?
>>>
>> tough to write to the disk when the kernel is crashing. ditto the network.
>> that leaves VGAs and serial ports, which can be written to by self
>> contained emergency-crash routines...
>>
>> IIRC, you said this was a Q9something quad core... that's a desktop
>> processor... does this server have ECC memory? (I ask, because few desktop
>> platforms do, while ECC is fairly standard on servers). Without ECC, the
>> system has no way of knowing it read in bad data from the ram, and if the
>> bad data happens to be code and that code happens to be in the kernel,
>> ka-RASH, without any detection or warning, it leaps off into never-land, and
>> you get a kernel fault, almost always resulting in...
>>
>>    kernel panic
>>    system halted
>>
>> with no additional useful information available. with ECC memory, single
>> bit errors get corrected on the fly, and log an ECC error event, while
>> double bit errors result in a system halt with a message indicating such.
>>
>
> No, the motherboard doesn't support ECC RAM. The motherboard is an
> Intel DG35EC - http://www.intel.com/products/desktop/motherboards/DG35EC/DG35EC-overview.htm

I had a machine that would crash about once every week or two in normal
operation. Memtest86+ found an error on the 2nd day of running. The
worst part was that it left the RAID mirrors in a strange state that
caused occasional problems for months, even after replacing the RAM.

--
   Les Mikesell
    lesmikesell at gmail.com