[Centos] SMP rock-solid?

Tue Jan 25 14:36:18 UTC 2005
Henk van Lingen <henkvl at cs.uu.nl>

On Tue, Jan 25, 2005 at 08:10:11AM -0600, donavan nelson wrote:
  > Benjamin J. Weiss wrote:
  > >I always keep both the SMP and non-SMP kernel because sometimes the smp 
  > >kernel doesn't work.  Let's face it, SMP is still something that isn't 
  > >rock-solid in linux.
  > 
  > Your experiences differ greatly from mine.  What are you doing that can 
  > be attributed directly to SMP in the kernel?

  Actually, this weekend one of my CentOS 3.3 systems, with two Xeons,
  ended up in a strange state while doing heavy calculations. Everything
  seemed normal, but one couldn't ssh to the system anymore. After a reboot,
  the atop data files (if you don't know it, an enhanced top:
  http://www.atcomputing.nl/Tools/atop ) showed that at some point there
  wasn't any CPU activity anymore. More than that: it had 0% idle time _and_
  0% user time _and_ 0% sys time. But it didn't hang: there was still some
  activity, for example, yum cron updates were going on.
  Also note that after idle/user/sys went to zero, new processes didn't
  get a process ID anymore and were displayed in atop with process ID '?'.
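
  For anyone who wants to watch for the same symptom on a live system, here
  is a rough Python sketch. It is not how atop computes its figures; it only
  assumes that the first four counters on the aggregate "cpu" line of
  /proc/stat are user, nice, system and idle, and it treats "no counter
  advanced between two samples" as one plausible reading of 0% idle, 0% user
  and 0% sys at the same time. The function names are made up for this
  example.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict."""
    # Assumption: the first four values are user, nice, system, idle.
    names = ["user", "nice", "system", "idle"]
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return dict(zip(names, (int(v) for v in fields[1:5])))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Return per-state percentages between two samples.

    Returns None when no counter advanced at all, which is the
    suspicious case: a healthy system always accrues time somewhere.
    """
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return None  # accounting appears stalled; nothing to divide by
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


if __name__ == "__main__":
    import time

    with open("/proc/stat") as f:
        first = parse_cpu_line(f.read())
    time.sleep(5)
    with open("/proc/stat") as f:
        second = parse_cpu_line(f.read())

    result = cpu_percentages(first, second)
    if result is None:
        print("CPU time counters did not advance between samples")
    else:
        print(result)
```

  Run from cron or a loop, this would only tell you that the accounting
  stopped, not why.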

  FWIW: my wild guess was some bug in SMP or SMT stuff.

  Regards,

-- 
Henk van Lingen, Systems & Network Administrator              (o-      -+
Dept. of Computer Science, Utrecht University.                /\        |
phone: +31-30-2535278                                        v_/_
http://henk.vanlingen.net/             http://www.tuxtown.net/netiquette/