Michael Eager wrote:
> m.roth at 5-cent.us wrote:
>> Michael Eager wrote:
>>> John Hodrien wrote:
>>>> On Wed, 9 Mar 2011, Michael Eager wrote:
>> <snip>
>> Here's one more, off-the-wall thought: do the setterm --powersave off,
>> and find some way to make it work, so that you can see what's on the screen
>> when it dies.
>
> Yes, I did this. Switched to console screen. The correct command
> is "setterm -powersave off -blank off", otherwise the screen gets
> blanked. Turned the monitor off. I hope it shows something
> useful on the next fault.

Best of luck. And thanks, I may try that.
>
>> What may be very important here is I recently had a problem
>> with a honkin' big server crashing... and it turned out that a user was
>> running a parallel processing job that kicked off three? four? dozen
>> threads, and towards the end of the job, every single thread wanted
>> 10G... on a system with 256G RAM (which size still boggles my mind). The
>> OOM-Killer didn't even have a chance to do its thing.... Yes, he's
>> limited what his job requests, and the system hasn't crashed since.
>
> Strange. OOM-Killer should get priority. That's what it's for.
> Although it usually seems to kill the innocent bystanders before
> it gets around to killing the offenders.

Yeah, but apparently too many of them hit too quickly - that's all I can
think.

      mark