[CentOS] Now I can't shutdown [was: Screen blanks afteral p (Centos 5)]

Sat May 19 18:26:51 UTC 2007
William L. Maltby <CentOS4Bill at triad.rr.com>

On Sat, 2007-05-19 at 17:54 +0300, Itay wrote: 
> On Wed, 16 May 2007, William L. Maltby wrote:
> 
> ><snip my prior reply header junk>

> > 
> > On Wed, 2007-05-16 at 21:10 +0300, Itay wrote:
> >
> >> ** I failed to mention earlier that the media check prior to
> >> installation *failed*.  I decided to go on with the installation
> >> because my own experience so far, and others', showed that
> >> failure does not necessarily imply bad media.  But see below.
> >
> > Based on investigation I've done in response to the thread here,
> >
> >    http://lists.centos.org/pipermail/centos/2007-April/079718.html
> >
> > and the advice by several in that thread, you need to have padding on
> > the media. I'm authoring a "SOLVED" message for that thread, but haven't
> > completed my tests.
> >
> [snip]
> >
> > Use one of the padding methods mentioned in that thread and you should
> > be OK and the errors should disappear.
> 
> *panic*
> 
> Bill, thank you for the detailed reply.
> 
> I tried the padding technique - the media errors were gone; 
> kernel panic - stayed.  I hope that you or others may help me 
> with this.

Most likely it will be others. My ignorance is boundless and,
fortunately, my ego is inversely proportional to that! :-)

I'm glad the media errors are gone.

> 
> Here is what I did:
> 
> 1 Burnt i386 DVD using the padding method offered by Johnny Hughes
>    http://lists.centos.org/pipermail/centos/2007-April/079828.html
>    (Check sum OK.)
> 
> 2 Booted to install
>    + Media check OK
>    + Finished installation w/o noticeable problems
>      - This time there were no error messages about i/o problems, bad
>        sectors, etc.
>      - anaconda.log had some warning messages regarding to some missing
>        /etc, /usr, and few libs.  (Similar to 1st installation.)

I'm at console. Hang around and I'll reboot my LFS machine into CentOS 5
and see what I have in the logs.

I'm posting this so you can see if any of the stuff I mention can be
pursued while I compose a response to these log items.

>    + Rebooted from installer
>    + Expected Setup Agent.  Got *kernel panic* instead.
>    + Second and third reboot have landed me in text-mode Setup Agent.
> 
> 3 I tried several things, each one of them ended in *kernel panic*
>    either before logging in as root, or some minutes after.  The panic
>    appeared after idling the machine for some time.

*sniff* Smells hardware-related. But whether it's bad hardware or kernel
handling of it, I'm too ignorant to hazard a guess. I googled and found
your original post (BTW, don't hijack threads, even your own. It made
it more difficult to find your brief originally posted hardware ref). :-O

I was going to ask about x586 or C5 processors, but I did manage to find
your OP and saw AMD 4200+, IIRC. So we don't have to worry about that.

> 
> 4 A couple of strange things
>    + I have found out that the default run level was set to 3.
>      When, as a root I tried 'telinit 5', the machine responded with a
>      blank screen.  I had to reset.

Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is
being started on tty7, if it fails and seems blank, maybe switching to
virtual console 1 will work, if the machine is still alive. If so, maybe
some answers are there (view /var/log/messages, the X log, etc.).
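
If you do get a text console, something like the following might save
some scrolling. It is only a sketch: the function name is made up, and I
am assuming the stock log locations (/var/log/Xorg.0.log and
/var/log/messages). X marks errors with "(EE)" and warnings with "(WW)"
in its log.

```shell
# show_x_errors: print the lines of an X log that contain the "(EE)" (error)
# or "(WW)" (warning) markers.
# The default path is an assumption; pass another file if yours differs.
show_x_errors() {
    log="${1:-/var/log/Xorg.0.log}"
    grep -E '\((EE|WW)\)' "$log"
}

# Example (as root, from the text console):
#   show_x_errors
#   tail -n 50 /var/log/messages
```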

>    + Rebooting the machine was accompanied with messages regarding
>      ntp/clock skew.  Later, I have found out that I have lost the
>      network connection, probably while playing with the
>      installation, so this probably explains the clock skew.
>      Am not sure if this has any relevance.
>    + At no point I was prompted to setup a non-root user.

IIRC, when I did my C5 install, I got that prompt. If that's normal, it
may mean that the problem actually bit you during the install phase and
not everything got done correctly.

Maybe my install log will give you something to check against.

> 
> 5 For each crash / kernel panic I got a screen-load of trace and other
>    cryptic output.  Each time, so it seems, the output was different.
>    *Q* Is there a way to dump those messages into a file?

I'm too ignorant to answer that. But if you do get up and running for a
few minutes in a text console, clues may be lying around
in /var/log/messages. Search backwards for "restart" (twice) or some
other word, like "panic", and read around there.
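
A rough sketch of that search, in case it helps. The function name is
hypothetical and the default log path is the usual one, which I am
assuming applies to your box. It prints the last few matching lines with
their line numbers, so you know where to start reading.

```shell
# last_matches: show the last few lines of a log that match a pattern
# (case-insensitive), prefixed with their line numbers.
# Arguments: pattern, log file (default /var/log/messages), count (default 5).
last_matches() {
    pattern="$1"
    log="${2:-/var/log/messages}"
    count="${3:-5}"
    grep -n -i -e "$pattern" "$log" | tail -n "$count"
}

# Example:
#   last_matches panic
#   last_matches restart /var/log/messages 2
```

Interactively, "less /var/log/messages" followed by "G" (jump to the end)
and "?restart" (search backwards) does much the same thing.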

> 
> 6 Only suspicious thing I have found in /var/log/messages was lines
>    like this
> 
> May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
> May 19 11:27:36 bilbo kernel: ata1: EH complete
> 
> 7 Also, /var/log/secure had these errors - I believe for every reboot.
> 
> ...
> May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
> May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
> May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
> May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
> May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
> May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
> May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning!  Could not get new context for /dev/tty1, not relabeling: Invalid argument
> May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
> May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1

I'm too ignorant to answer authoritatively.
> 
> > My *guess* is that the application related errors you reported may be a
> > result of certain installation steps terminating early due to the false
> > I/O errors reported by the kernel/driver(s).
> >
> > HTH
> > --
> > Bill
> 
> Any recommendation how to proceed?
> (The most pressing question: is it the hardware? Should I take 
> the box back to the seller?)

If the panics are random, IIRC, could be memory, could be ... But a good
run of memtest86 from the install CD should help determine that. Also,
it is not uncommon for new hardware to have the occasional loose
connector or PCI card. Maybe too small power supply. Maybe CPU fan not
spinning. Maybe ambient temperature of the room is too high and internal
box temperature excessive.

If you suspect hardware, check all connectors. Make sure memory, power
supply connectors and PCI cards are firmly seated. Make sure your power
supply is adequate. My EPOX board needed much more than the PS for the
ACER box, into which the EPOX was originally installed, could supply.
I had random panics, usually near startup times, sometimes a few minutes
after. That's natural because the ACER had an integrated SiS chipset,
which needs much less power than the Via-based EPOX.

Make sure the CPU fan is seated and working.

Is your AC power from the electric company reliable? Fluctuations of 20%
are not uncommon here. Battery backup with power conditioning helps a
lot.

Since you mentioned a delay sometimes (IIRC), heat sounds like a
possible culprit. If the room is cool, take the covers off and see if it
runs longer. If it stays up long enough, do

  # cat /proc/acpi/thermal_zone/THRM/temperature
  temperature:             36 C

Make sure it's in the range for the AMD you have. BTW, mine is lower
than it used to be. I added an expensive Zalman FHS a few months back.
May try overclocking someday if I get enough interest.
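
If you want to check the reading repeatedly, here is a small sketch. The
function names are made up, the zone name THRM is from my box and may
differ on yours, and the limit of 60 in the example is purely a
placeholder: look up the real figure for your CPU.

```shell
# parse_temp: read lines like "temperature:             36 C" on stdin
# and print just the number.
parse_temp() {
    awk '/^temperature:/ { print $2 }'
}

# temp_ok: succeed if the reading is at or below the given limit (degrees C).
# No limit is assumed here; the caller must supply one.
temp_ok() {
    reading="$1"
    limit="$2"
    [ "$reading" -le "$limit" ]
}

# Example (zone name and limit are assumptions, adjust both):
#   t=$(parse_temp < /proc/acpi/thermal_zone/THRM/temperature)
#   if temp_ok "$t" 60; then echo "fine: $t C"; else echo "too hot: $t C"; fi
```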

> 
> Many thanks.
> 

Use google with "site:centos.org" added, e.g. like this

    screen blanks after initial setup site:centos.org

in the advanced search fields (I had site:... in the "all of the words"
field and "screen blanks after initial setup" in the "exact phrase"
field). You'll find lots of instances of kernel panics discussed on the
list and some suggestions, in some cases, for "noapic" and similar
boot-time parameters.
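
For what it's worth, a boot-time parameter just goes on the end of the
"kernel" line in the GRUB config. The sketch below only previews the
change: it prints a modified copy to stdout and leaves the file alone.
The function name is made up, and I am assuming the usual CentOS 5
location, /boot/grub/grub.conf.

```shell
# add_boot_param: print a copy of a GRUB config with one parameter appended
# to every "kernel" line. Nothing is written back to the file.
# The parameter must not contain "/" because it is spliced into a sed command.
add_boot_param() {
    param="$1"
    conf="${2:-/boot/grub/grub.conf}"
    sed "/^[[:space:]]*kernel[[:space:]]/s/\$/ $param/" "$conf"
}

# Example: review the result before changing anything by hand.
#   add_boot_param noapic | less
```

For a one-off test it is probably safer to edit the kernel line from the
GRUB menu at boot time, since that change is not saved.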