Michael Eager wrote:
m.roth@5-cent.us wrote:
Michael Eager wrote:
John Hodrien wrote:
On Wed, 9 Mar 2011, Michael Eager wrote:
<snip> Here's one more, off-the-wall thought: do the setterm --powersave off, and find some way to make it work, so that you can see what's on the screen when it dies.
Yes, I did this. Switched to console screen. The correct command is "setterm -powersave off -blank off", otherwise the screen gets blanked. Turned the monitor off. I hope it shows something useful on the next fault.
Best of luck. And thanks, I may try that.
What may be very important here is that I recently had a problem with a honkin' big server crashing... and it turned out that a user was running a parallel processing job that kicked off three? four? dozen threads, and towards the end of the job, every single thread wanted 10G... on a system with 256G RAM (which size still boggles my mind). The OOM-Killer didn't even have a chance to do its thing... Yes, he's limited what his job requests, and the system hasn't crashed since.
Strange. OOM-Killer should get priority. That's what it's for. Although it usually seems to kill the innocent bystanders before it gets around to killing the offenders.
Yeah, but apparently too many of them hit too quickly - that's all I can think.
mark
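
For anyone wanting to sanity-check a job like the one described above, here is a rough sketch of the arithmetic (three dozen threads at 10G each against 256G of RAM), plus one possible way for a job to cap its own memory before it starts. This is not what the user in the story actually did; the function names are made up for illustration, and the cap uses RLIMIT_AS from Python's standard resource module, which limits the whole process's address space (threads share it) and is only available on Unix-like systems.

```python
import resource

GIB = 1024 ** 3


def peak_demand_gib(threads, per_thread_gib):
    """Worst-case memory demand if every thread asks for its share at once."""
    return threads * per_thread_gib


def overcommits(threads, per_thread_gib, ram_gib):
    """True if the worst-case demand exceeds physical RAM."""
    return peak_demand_gib(threads, per_thread_gib) > ram_gib


def cap_address_space(limit_gib):
    """Cap this process's virtual address space (hypothetical helper).

    Allocations beyond the cap fail inside the job (MemoryError in Python)
    instead of pushing the whole machine into an out-of-memory state.
    """
    limit = limit_gib * GIB
    _, hard = resource.getrlimit(resource.RLIMIT_AS)
    # The soft limit may not exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, hard))


if __name__ == "__main__":
    # Three dozen threads at 10G each on a 256G box.
    print(peak_demand_gib(36, 10), "GiB wanted vs 256 GiB of RAM")
    print("overcommits:", overcommits(36, 10, 256))
```

A job would call cap_address_space() once at startup, before spawning its threads. The cap is on virtual address space rather than resident memory, so it is a blunt instrument and the right value depends on the workload.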