[CentOS] Now I can't shutdown [was: Screen blanks afteral p (Centos 5)]

Sat May 19 21:02:31 UTC 2007
William L. Maltby <CentOS4Bill at triad.rr.com>

On Sat, 2007-05-19 at 23:36 +0300, Itay wrote:
> On Sat, 19 May 2007, William L. Maltby wrote:
> 
> > On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
> 
> [snip]
> 
> >><snip>

> >
> >... (BTW, don't hijack threads, even your own. It made
> > it more difficult to find your brief originally-posted hardware ref). :-O
> 
> I thought (and still do) that the two issues were related, and 
> therefore modifying the subject line and including a [was:...] 
> clause are sufficient.  Sorry for the extra work.

NP. From *my* background, a media check that is solved becomes
unrelated to a kernel panic. So a new thread would be in order. However, I
also understand that from *your* POV, it's all "install fails".

> <snip>

> Wasn't able to switch to virtual consoles.  (I begin to suspect 
> that there are some problems with the keyboard as well, though.)
> No clues in /var/log/messages.
> And no X.log at all!

Ach! Kb problems are not needed when you are suffering "panic attacks".
Prozac works! ;-)

> <snip>

> >>    + Rebooting the machine was accompanied with messages regarding
> >>      ntp/clock skew.  Later, I have found out that I have lost the
> >>      network connection, probably while playing with the
> >>      installation, so this probably explains the clock skew.
> >>      Am not sure if this has any relevance.
> >>    + At no point was I prompted to set up a non-root user.
> >
> > IIRC, when I did my C5 install, I got that prompt. If that's normal, it
> > may mean that the problem actually bit you during the install phase and
> > not everything got done correctly.
> 
> Possibly.  But there were no hints for that in install.log and 
> anaconda.*log*
> 
> >> 5 For each crash / kernel panic I got a screen-load of trace and other
> >>    cryptic output.  Each time, so it seems, the output was different.
> >>    *Q* Is there a way to dump those messages into a file?
> >
> > I'm too ignorant to answer that. But if you do get up and running for a
> > few minutes in a text console, clues may be lying around
> > in /var/log/messages. Search backwards for "restart" (twice) or some
> > other word, like "panic", and read around there.
> 
> No hints except for what I have mentioned below.

I was afraid of that. It was worth a try though.
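
For anyone repeating that backward search, here is a rough sketch of it
as a shell function, assuming a POSIX shell with grep and tail. The name
last_matches is made up for illustration, and the word list is just the
two words suggested in the quoted text.

```shell
# Hypothetical helper: print the last few lines of a log that mention
# any of the given words, with their line numbers, most recent last.
last_matches() {
    log=$1      # log file to search, e.g. /var/log/messages
    pattern=$2  # extended regex, e.g. 'panic|restart'
    # -n prefixes each match with its line number, -i ignores case,
    # -E enables the "a|b" alternation; tail keeps the 5 newest matches.
    grep -n -i -E "$pattern" "$log" | tail -n 5
}
```

Something like last_matches /var/log/messages 'panic|restart' then gives
line numbers to jump to and read around.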

> 
> >> 6 Only suspicious thing I have found in /var/log/messages was lines
> >>    like this
> >>
> >> May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> >> May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
> >> May 19 11:27:36 bilbo kernel: ata1: EH complete
> >>
> >> 7 Also, /var/log/secure had these errors - I believe for every reboot.
> >> <snip>

> >
> > If the panics are random, IIRC, could be memory, could be ... But a good
> > run of memtest86 from the install CD should help determine that. Also,
> > it is not uncommon for new hardware to have the occasional loose
> > connector or PCI card. Maybe too small power supply. Maybe CPU fan not
> > spinning. Maybe ambient temperature of the room is too high and internal
> > box temperature excessive.
> 
> Running memtest now for the night (it has been running for 2 hours already).
> If it was a question of excess heat I would expect to have 
> trouble during memtest run as well; no?

Memtest will not exercise the CPU enough to add substantial heat issues.
Doing a couple *heavy* compilations at the same time, like compiling the
kernel, glibc, ... and more will spin HDs and tax the CPU substantially.

This will add a... who was it that said "a buttload"... of heat. With the speed
of your CPU, might need to do a couple of instances of some *big* stuff
at the same time in different directories.

> <snip>

> > Is your AC power from the electric company reliable? Fluctuations of 20%
> > are not uncommon here. Battery backup with power conditioning helps a
> > lot.
> 
> Actually, the power supply is not stable enough.  But there 
> were no fluctuations that I could notice during my attempts 
> this morning.  We'll keep this in mind, though.

Fluctuations generally go unnoticed. A drop of 10-20 volts or a spike of
a similar amount won't disturb a typical PS. If you have a battery backup with
conditioning or "brownout" protection, an alarm may sound, if available
and not disabled.

"Outages" for fractional seconds are noticed, but even a second or two
is often survived with today's capacitor-laden supplies and mobos.

> 
> > Since you mentioned a delay sometimes (IIRC), heat sounds like a
> > possible culprit. If the room is cool, take the covers off and see if it
> > runs longer. If it stays up long enough, do
> 
> Again: memtest'ing for a few hours should produce a similar 
> challenge I should think.
> I could try running knoppix 5 for a while and somehow straining 
> the CPU.

Except for things like I mentioned above, I doubt you could load it
enough to cause a substantial effect. Hmmm ... got it.

Run several "bzip2 --best" on several large files simultaneously (like
CentOS ISO images which have a lot of already compressed stuff - that
adds more load to bzip2's effort).

That should also tromp the HDs reasonably hard as a lot of reading input
and creating temporary files should occur.
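
A minimal sketch of that bzip2 load, assuming a POSIX shell and that
bzip2 is installed. The function name bzip2_load is hypothetical. As an
added precaution that is not part of the suggestion above, the
compressed output goes to /dev/null so the input files are left
untouched; that also means the disks are mostly read, not written.

```shell
# Hypothetical helper: start one "bzip2 --best" per file, all at the
# same time, then wait for every one of them to finish.
bzip2_load() {
    for f in "$@"; do
        # -c sends the compressed stream to stdout instead of replacing
        # the input file; the output is simply thrown away.
        bzip2 --best -c "$f" > /dev/null &
    done
    wait    # block until all background jobs have exited
}
```

Called as bzip2_load followed by a few large files (ISO images, for
instance), it keeps one compressor busy per file. Writing real .bz2
files instead (bzip2 --best -k) would add disk writes, at the cost of
needing the free space for them.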

> <snip>

HTH
--
Bill