On Tue, Jan 25, 2005 at 08:10:11AM -0600, donavan nelson wrote:
Benjamin J. Weiss wrote:
I always keep both the SMP and non-SMP kernels because sometimes the SMP kernel doesn't work. Let's face it, SMP is still something that isn't rock-solid in Linux.
Your experiences differ greatly from mine. What are you doing that can be attributed directly to SMP in the kernel?
Actually, this weekend one of my CentOS 3.3 systems, with two Xeons, ended up in a strange state while doing heavy calculations. Everything seemed normal, but one could no longer ssh to the system. After a reboot, the atop data files (if you don't know it, atop is an enhanced top: http://www.atcomputing.nl/Tools/atop ) showed that from some point on there was no CPU activity at all. More than that: it reported 0% idle time _and_ 0% user time _and_ 0% sys time. But it didn't hang; there was still some activity, for example, yum cron updates were still running. Also note that after idle/user/sys went to zero, new processes no longer got a process ID and were displayed in atop with process ID '?'.
FWIW: my wild guess was some bug in SMP or SMT stuff.
Regards,