Gerry Reno wrote:
> On 03/08/2013 10:43 AM, m.roth at 5-cent.us wrote:
>> Kwan Lowe wrote:
>>> On Thu, Mar 7, 2013 at 8:51 PM, Johnny Hughes <johnny at centos.org>
>>> wrote:
>> <snip>
>>> As far as logging goes, any idea what sort of failures could cause
>>> such a lockup? I.e., if memory was failing, would the system still be
>>> able to log? As the mouse is frozen and kernel sysrq has no effect,
>>> I'm still leaning towards hardware but literally everything except the
>>> case has been swapped out. (Well.. let me qualify that.. Everything
>>> but the 64GB SSD drive has been swapped but it seemed unlikely that a
>>> drive failure could cause such a lockup. Incorrect assumption?)
>> No ideas... and I've had a number of systems do this, over the last
>> couple years, where someone noted it had stopped responding; I go down,
>> and it doesn't respond *at* *all* when I plug in a monitor & keyboard, and
>> power cycling's the only answer.
>>
>> Thinking about it, I believe it's mostly been on our Penguin servers,
>> and that co. uses Supermicro m/b's, and we've had h/w problems with them,
>> also, and have had several m/b's replaced under warranty.
>
> Nearly every time we've had lockup problems it has come down to bad or
> failing memory.
>
> I've even had memory cause problems where it would pass a quick memtest
> but ultimately would fail if you left it running
> the tests overnight.

Right, but I've always *seen* error messages, dmesg, and, if mcelogd is
actually working (I can't figure out why it seems to on some machines, and
not on others, or why it doesn't keep running), it's in there. The times
we've had lockups, there's been nothing.

      mark