[CentOS] System hangs silently

Thu Jan 19 00:19:48 UTC 2006
Fong Vang <sudoyang at gmail.com>

On 1/18/06, Rodrigo Barbosa <rodrigob at suespammers.org> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On Wed, Jan 18, 2006 at 05:01:15PM -0500, Leonard Isham wrote:
> > > These systems are ordered from the same batch (same PO/build spec).
> > > They're all using the same kernel -- the latest of what CentOS 4.1
> > > provided at that time.
> > >
> >
> > I hate to say this, but I have found that this is not a guarantee of
> > 100% duplication of the internals.  Not even when the systems have the
> > same model numbers.  I won't mention a well known computer company
> > with three letters... or big... or blue...
> >
> > I've been bitten by this.
>
> I have to agree. Version numbers mean nothing.
> Most of the time, tho, the Part Number will tell the truth.
>
> With such a batch of machines, it would be interesting to try isolating
> the specifics of the ones giving problems, starting with the processors
> (check the P/N) and then the northbridge, which are the two most likely
> to be the culprit.
>
> We have been discussing this issue on-and-off on the linux-practices
> mailing list so, if you want to go there with some extra info,
> we might be able to help you on this without having people screaming
> "OFF TOPIC!" here on this list :)

Hopefully we're not wandering off topic here.  I have more information
to share.  Here's what I have learned since then:

* When the system appears to hang, new ssh connections fail, but an
already established connection keeps working fine.
* The load average is high (~25).
* vmstat reports no heavy context switching, swapping, CPU
utilization, paging, etc.
* iostat activity is normal (no long iowait or service time).
* netstat/ifconfig output is normal (no collisions, errors, etc.).
* There are more than a dozen crond processes.  A new one seems to
start every 10 minutes to run sar.  strace of crond shows it doing
setup().  Shutting down crond hung for more than 20 minutes before it
came back.
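
A high load average with idle CPU and quiet disks is at least consistent
with processes piling up in uninterruptible sleep (state D), since Linux
counts those toward the load average alongside runnable ones.  That is
only a guess about these machines, not a diagnosis.  Below is a rough
sketch of how the process states and the number of crond processes
could be tallied from /proc during a hang.  The helper names
(parse_stat, summarize, read_proc) are made up for illustration.

```python
#!/usr/bin/env python
# Rough sketch: tally process states and crond instances from /proc.
# Assumes a Linux /proc filesystem; helper names are illustrative only.
import os
from collections import Counter


def parse_stat(text):
    """Return (command name, state) from the contents of /proc/<pid>/stat."""
    # The command name sits in parentheses and may itself contain spaces
    # or parentheses, so split on the last ')'.
    left, right = text.rsplit(")", 1)
    name = left.split("(", 1)[1]
    state = right.split()[0]
    return name, state


def summarize(stats):
    """Count processes per state letter and per command name."""
    states = Counter()
    names = Counter()
    for name, state in stats:
        states[state] += 1
        names[name] += 1
    return states, names


def read_proc(proc="/proc"):
    """Yield (name, state) for every process currently visible in /proc."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                yield parse_stat(f.read())
        except (IOError, OSError, ValueError, IndexError):
            continue  # process exited while we were reading it


def main():
    if not os.path.exists("/proc/loadavg"):
        print("no /proc/loadavg here; this sketch only works on Linux")
        return
    with open("/proc/loadavg") as f:
        print("loadavg: " + f.read().strip())
    states, names = summarize(read_proc())
    print("R (runnable): %d  D (uninterruptible): %d"
          % (states["R"], states["D"]))
    print("crond processes: %d" % names["crond"])


if __name__ == "__main__":
    main()
```

Run from an ssh session that is already open, a D count that roughly
tracks the load average would point toward something blocking in the
kernel rather than toward CPU or swap pressure.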

Anyway, I'm having two systems shipped back here from a remote data
center for further analysis.

Thank you all for your help and suggestions.

> Best Regards,
>
> - --
> Rodrigo Barbosa <rodrigob at suespammers.org>
> "Quid quid Latine dictum sit, altum viditur"
> "Be excellent to each other ..." - Bill & Ted (Wyld Stallyns)
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iD8DBQFDzrw5pdyWzQ5b5ckRAjrDAJ4gp9PGUGPd0ZsxN1hBDBea6v4IlwCcDCaE
> AuOOC8qS+9X3cHnUs7LBrvA=
> =nBAU
> -----END PGP SIGNATURE-----