(Any list admins reading: This silliness with the multihop block list is getting really old.)
---------- Forwarded message ----------
From: Bart Schaefer barton.schaefer@gmail.com
Date: May 13, 2007 12:29 AM
Subject: Re: [CentOS] Now I can't shutdown [was: Screen blanks after initial setup (Centos 5)]
To: CentOS mailing list centos@centos.org
On 5/12/07, Itay centos@nospammail.net wrote:
And I can't shutdown it, not even with the power button.
There's no reset button?
Have you tried holding the power button down for 6-10 seconds?
On Sun, 13 May 2007, Bart Schaefer wrote:
On 5/12/07, Itay centos@nospammail.net wrote:
And I can't shutdown it, not even with the power button.
There's no reset button?
Not seen any.
Have you tried holding the power button down for 6-10 seconds?
I tried the 3-5 seconds :-) When I get home -- this is my home machine -- I'll try for longer.
Thanks,
On 5/13/07, Itay centos@nospammail.net wrote:
On Sun, 13 May 2007, Bart Schaefer wrote:
On 5/12/07, Itay centos@nospammail.net wrote:
And I can't shutdown it, not even with the power button.
There's no reset button?
Not seen any.
Have you tried holding the power button down for 6-10 seconds?
I tried the 3-5 seconds :-) When I get home -- this is my home machine -- I'll try for longer.
Thanks,
Itay Furman centos@nospammail.net
While this may be a long shot, you may have a hardware issue.
Holding the power button for > 5 seconds should take the system down (unless you've done something funky in the BIOS power settings).
Do you have the latest and greatest BIOS for the motherboard installed? [Also load "Optimal" and/or "Fail-Safe" Defaults (or what-ever your BIOS calls them, e.g. "Factory Defaults").]
When you re-boot up, everything may be OK. Or, you may need to enter "Rescue Mode" via CD #1 and (re-)install grub to get a working system. Or, (depending on where it died), re-install all over again. While in Rescue Mode, check-out your install.log and syslog files for any hints as to what happened. {Usually when you die like this, nothing gets written -- but it may tell where it was when it did freeze-up.}
I would also run "memtest" off CD #1 for a couple of hours to make certain that memory is working OK, etc.
Just some random things to think about .... (I've been down the "frustrating" path before and know your "pain"....)
Rich
On Sun, 13 May 2007, Richard Karhuse wrote:
On Sun, 13 May 2007, Bart Schaefer wrote:
Holding the power button for > 5 seconds should take the system down (unless you've done something funky in the BIOS power settings).
Bart and Richard - thank you for your reply.
I just came back home, and first thing I did was to try again to shut down the system. It worked (counting s-l-o-w-l-y until 4). Hurrah!
Next, I will try to follow the various tips Richard has offered and see what I dig up.
Do you have the latest and greatest BIOS for the motherboard installed? [Also load "Optimal" and/or "Fail-Safe" Defaults (or what-ever your BIOS calls them, e.g. "Factory Defaults").]
I didn't play with the BIOS when I received the box. The only thing I did before installing centos 5 was to run knoppix 5. Once. If the BIOS was touched, it must have been by the supplier. I will check that, too.
Thanks again for the tips. Itay
When you re-boot up, everything may be OK. Or, you may need to enter "Rescue Mode" via CD #1 and (re-)install grub to get a working system. Or, (depending on where it died), re-install all over again. While in Rescue Mode, check-out your install.log and syslog files for any hints as to what happened. {Usually when you die like this, nothing gets written -- but it may tell where it was when it did freeze-up.}
I would also run "memtest" off CD #1 for a couple of hours to make certain that memory is working OK, etc.
Just some random things to think about .... (I've been down the "frustrating" path before and know your "pain"....)
Rich
On Sun, 13 May 2007, Richard Karhuse wrote:
So as I said in my previous reply I was able to shutdown via the power button. Now I attempted to power on and enter the bios set up. Well, guess what? I was able to reboot right away into centos 5. It seems to be working, but not perfect (time and date are not correct).
I am trying to follow up on your suggestions but have a few problems.
Do you have the latest and greatest BIOS for the motherboard installed? [Also load "Optimal" and/or "Fail-Safe" Defaults (or what-ever your BIOS calls them, e.g. "Factory Defaults").]
I think I do have the latest, but am not sure. Explanation: asus site provides two latest BIOS revisions: 0301 for DOS, and 0203 for all OS's. I've got 0203 (according to dmidecode) so I think this is fine.
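For anyone who wants to pull the BIOS version out of dmidecode output without reading it by eye, here is a minimal sketch. It assumes the usual "BIOS Information" section layout with an indented "Version:" field; the function names are made up for illustration, and running dmidecode needs root.

```python
import subprocess


def bios_version(dmidecode_output):
    """Return the Version field of the 'BIOS Information' section, or None."""
    in_bios = False
    for line in dmidecode_output.splitlines():
        if line.strip() == "BIOS Information":
            in_bios = True
        elif in_bios and line.strip().startswith("Version:"):
            return line.split(":", 1)[1].strip()
        elif in_bios and line and not line[0].isspace():
            # A non-indented line means the BIOS section has ended.
            in_bios = False
    return None


def read_bios_version():
    """Hypothetical helper: run 'dmidecode -t bios' (as root) and parse it."""
    result = subprocess.run(
        ["dmidecode", "-t", "bios"], capture_output=True, text=True, check=True
    )
    return bios_version(result.stdout)
```

The parsed string still has to be compared by hand against the revisions listed on the vendor's site; the sketch makes no attempt to decide which revision is newer.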
When you re-boot up, everything may be OK. Or, you may need to enter "Rescue Mode" via CD #1 and (re-)install grub to get a working system. Or, (depending on where it died), re-install all over again. While in Rescue Mode, check-out your install.log and syslog files for any hints as to what happened. {Usually when you die like this, nothing gets written -- but it may tell where it was when it did freeze-up.}
I would also run "memtest" off CD #1 for a couple of hours to make certain that memory is working OK, etc.
I rebooted into rescue mode but can't find memtest, only memtest-setup. According to some reading (oh! it's already midnight here) I should do:
* reboot again to rescue (because I have mounted fs read-only)
* chroot /mnt/sysimage
* run memtest-setup (to get a memtest86 entry in grub)
* reboot to memtest86 in grub
Is this correct? Or maybe I should just reboot to centos, yum install memtest and continue from there?
Just some random things to think about .... (I've been down the "frustrating" path before and know your "pain"....)
Rich
Thanks so much.
Centos - indeed a community!
I would also run "memtest" off CD #1 for a couple of hours to make certain that memory is working OK, etc.
I rebooted into rescue mode but can't find memtest, only memtest-setup.
No, you boot up *with* CentOS CD #1 in the CD drive. At the "boot:" prompt (where you normally just hit <ENTER> to install CentOS), type:
memtest86<ENTER>
It will run memtest off the CD.
johnn
On Mon, 14 May 2007, Johnny Tan wrote:
I rebooted into rescue mode but can't find memtest, only memtest-setup.
No, you boot up *with* CentOS CD #1 in the CD drive. At the "boot:" prompt (where you normally just hit <ENTER> to install CentOS), type:
memtest86<ENTER>
It will run memtest off the CD.
johnn
Oh - silly me :-) Ok, now testing (so far so good).
On 5/14/07, Itay centos@nospammail.net wrote:
On Mon, 14 May 2007, Johnny Tan wrote:
I rebooted into rescue mode but can't find memtest, only memtest-setup.
No, you boot up *with* CentOS CD #1 in the CD drive. At the "boot:" prompt (where you normally just hit <ENTER> to install CentOS), type:
memtest86<ENTER>
It will run memtest off the CD.
johnn
Oh - silly me :-) Ok, now testing (so far so good).
-- Itay Furman centos@nospammail.net --
First of all, you can run memtest86+ either way --
1. From the 1st CD, Rescue CD (or any other CD that has it), or
2. From Grub if you have installed it.
Memtest86+ runs as a completely stand-alone program.
On any new box that I use, I try to run it at the first opportunity that I get. (I usually let it run overnight, if not a week-end.) Of course, sometimes given a new "play toy" I don't always get around to running it. *However*, if the box ever starts showing "weirdness" (and it's not obvious what's wrong), this test gets scheduled at the next "slow period" for the unit. You'd be surprised the number of times it finds things -- even with ECC, registered memory, etc. (Of course, it is *not* a panacea for all system ills -- just a tool and one of many ....).
I now typically try to install it on all of my servers so remote techs can run it on units via grub without having to find / have a CD.
The reason I recommend checking the BIOS version is that I have found the "latest & greatest" sometimes helps. The last time I played with ASUS motherboards (many years ago), I found that you had to be on the 4th or 5th release of the BIOS before things really got stable. [OT: I recently installed a dual Xeon system and was having lots of problems. Given that it was recently manufactured, I figured everything would be up to snuff on it ... Nope -- it needed a new BIOS to work correctly with > 4 GB of RAM.]
However, since your system is booting and working, I would conjecture that everything is fine and that you should be able to work and use the system. Do a "yum update" and make certain that all things are installed (and up to date).
Have fun ...
Rich
On Tue, 15 May 2007, Richard Karhuse wrote:
Date: Tue, 15 May 2007 09:26:46 -0400
From: Richard Karhuse rkarhuse@gmail.com
Reply-To: CentOS mailing list centos@centos.org
To: CentOS mailing list centos@centos.org
Subject: Re: [CentOS] Re: Now I can't shutdown [was: Screen blanks after initial setup (Centos 5)]
[snipped some good advice regarding memtest]
However, since your system is booting and working, I would conjecture that everything is fine and that you should be able to work and use the system. Do a "yum update" and make certain that all things are installed (and up to date).
Rich: thank you for the handholding.
Unfortunately the machine doesn't work so well :-< Yesterday morning I got
kernel panic - not syncing : Fatal exception.
This was after being able to boot once (or twice - don't remember) into the installed centos. Now the box is shut down.
So here is the information I have right now. I apologize if some of it proves irrelevant.
** Ran memtest for ~4h (5 passes) without errors.
** BIOS seems to be the latest (not sure if it's the greatest :-)
** I failed to mention earlier that the media check prior to installation *failed*. I decided to go on with the installation because my own experience so far, and others', showed that failure does not necessarily imply bad media. But see below.
** Also, this is an x86_64 machine, but I installed the i386 distro (recommended for a desktop).
** Question: should I simply try a fresh installation using another copy of the DVD?
Here are some things I dug out of /var/log/* files:
anaconda.syslog seems to be more interesting
============================================
There are quite a few
<4>hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
<4>hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 }
Followed by
<3>Buffer I/O error on device hda, logical block ...
(hda is my DVD drive, so I guess it is related to the failed media check above?)
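As a reading aid for those two hex values, here is a minimal sketch that decodes them. It reproduces the flag names shown in the log (DriveReady, SeekComplete, Error); the names for the bits that are not set in this log are assumptions based on the standard ATA status register layout. For ATAPI devices the upper four bits of the error register carry the sense key, which matches LastFailedSense=0x03 above (sense key 3 is "medium error").

```python
# Status register bits, highest first. Names for bits not present in the
# log above are assumed from the standard ATA layout.
ATA_STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ata_status(status):
    """Return the names of the bits set in an ATA status byte."""
    return [name for mask, name in ATA_STATUS_BITS if status & mask]


def atapi_sense_key(error):
    """Return the sense key held in the upper nibble of an ATAPI error byte."""
    return error >> 4


print(decode_ata_status(0x51))
print(atapi_sense_key(0x30))
```

This only decodes what the drive reported; it cannot tell whether the cause is the disc, the drive, or the read-ahead behaviour discussed later in the thread.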
anaconda.log
============

ERROR : failed to insert /tmp/floppy.ko
(Seems OK - we don't have a floppy device on this box)

Numerous messages such as

DEBUG : ignoring driverless device ...
DEBUG : No package named ... available to be installed

A few warnings about missing /etc files, and some others. For example:
WARNING : /etc/rpm/macros doesn't exist

install.log has these errors
============================
Installing vte - 0.14.0-2.el5.i386
error: unpacking of archive failed on file /usr/lib/libvte.so.9.1.5;4645e036: cpio: read

Installing webalizer - 2.01_10-30.1.i386
error: unpacking of archive failed on file /usr/bin/webalizer;4645e036: cpio: read

Installing wget - 1.10.2-7.el5.i386
error: unpacking of archive failed on file /usr/bin/wget;4645e036: cpio: read

Installing openoffice.org-core - 1:2.0.4-5.4.17.i386
error: unpacking of archive failed on file /usr/lib/openoffice.org2.0/help/en/shared.jar;4645e036: cpio: read

Installing kde-i18n-Hebrew - 1:3.5.4-1.noarch
error: unpacking of archive failed on file /usr/share/locale/he/LC_MESSAGES/kbruch.mo;4645e036: cpio: read
(Also related to the failed media check above?)
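To get a quick list of which packages hit the unpacking error, something like the following could be run over install.log. This is only a sketch: it assumes each failure appears as an "Installing NAME - VERSION" entry followed directly by the "error: unpacking of archive failed on file ..." text, as in the excerpts above, and the function name is made up for illustration.

```python
import re

# An "Installing" entry immediately followed by an unpack error.
FAILED_RE = re.compile(
    r"Installing (\S+) - (\S+)\s+"
    r"error: unpacking of archive failed on file ([^;\s]+)"
)


def failed_packages(install_log_text):
    """Return (package, file) pairs for packages whose unpacking failed."""
    return [(m.group(1), m.group(3)) for m in FAILED_RE.finditer(install_log_text)]
```

Calling failed_packages(open("/root/install.log").read()) would be one way to use it; the path is an assumption, so adjust it to wherever the installer left the log.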
Finally, the installation and boot logs reveal the following bug:
=================================================================

ENABLING IO-APIC IRQs
..TIMER: vector=0x31 apic1=0 pin1=0 apic2=-1 pin2=-1
..MP-BIOS bug: 8254 timer not connected to IO-APIC
...trying to set up timer (IRQ0) through the 8259A ... failed.
...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ... works.
According to what I have found on the web, and as confirmed by the last line, this seems to be a harmless bug.
Have fun ...
Rich
Indeed, this is a learning experience.
On Wed, 2007-05-16 at 21:10 +0300, Itay wrote:
On Tue, 15 May 2007, Richard Karhuse wrote:
<snip headers>
<snip>
** I failed to mention earlier that the media check prior to installation *failed*. I decided to go on with the installation because my own experience so far, and others', showed that failure does not necessarily imply bad media. But see below.
Based on investigation I've done in response to the thread here,
http://lists.centos.org/pipermail/centos/2007-April/079718.html
and the advice by several in that thread, you need to have padding on the media. I'm authoring a "SOLVED" message for that thread, but haven't completed my tests.
Preliminary conclusions are that the Linux read-ahead feature has a "bug". If the read is near end-of media and the system is filling buffers, it seems that it is ignorant of the fact that the file system size (written media size in this case) has been exceeded. This has been reported as affecting ISO-9660 file systems, but I can testify that it also affects "raw" operations such as "dd".
I *suspect* that the speed of the equipment (burner, memory, PCI bus, CPU,...) affects this, but I've not yet tested on two slower nodes to confirm this. Real life keeps interfering.
In association with those error messages, you might see some surrounding errors mentioning the "logical blocks" involved with the errors. If those logical blocks are *within* your written media size, you have a definite problem with the media or hardware, IMO.
If they are beyond the written media, it's the kernel/driver error, IMO.
Is it a problem? Depends, I think. For things such as "dd", no. It finishes writing what it could and all is well in la-la land.
For other applications, if the error is reported while they are still processing valid data, the application(s) should/could terminate without completing processing.
Then you have a problem.
Use one of the padding methods mentioned in that thread and you should be OK and the errors should disappear.
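The "within or beyond the written media" check described above can be sketched as follows. This is an illustration under assumptions, not a tested diagnostic: it assumes the "Buffer I/O error on device X, logical block N" message format quoted in this thread, and that the logical block size is 2048 bytes (pass a different block_size if your device reports otherwise). The function name and parameters are hypothetical.

```python
import re

BLOCK_RE = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def classify_io_errors(log_lines, device, written_bytes, block_size=2048):
    """Split failing logical blocks into those inside and beyond the written data.

    written_bytes is the size of the image that was burnt; block_size is an
    assumed logical block size.
    """
    written_blocks = -(-written_bytes // block_size)  # round up
    inside, beyond = [], []
    for line in log_lines:
        match = BLOCK_RE.search(line)
        if not match or match.group(1) != device:
            continue
        block = int(match.group(2))
        if block < written_blocks:
            inside.append(block)
        else:
            beyond.append(block)
    return inside, beyond
```

Any block in the first list points at the media or hardware; blocks only in the second list are consistent with the read-ahead explanation.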
<snip>
anaconda.syslog seem to be more interesting
There are quite a few
<4>hda: media error (bad sector): status=0x51 { DriveReady SeekComplete Error }
<4>hda: media error (bad sector): error=0x30 { LastFailedSense=0x03 }
As mentioned above, ...
Followed by
<3>Buffer I/O error on device hda, logical block ...
(hda is my DVD drive, so I guess it is related to the failed media check above?)
<snip>
(Also related to the failed media check above?)
My *guess* is that the application related errors you reported may be a result of certain installation steps terminating early due to the false I/O errors reported by the kernel/driver(s).
<snip>
HTH -- Bill
On Wed, 16 May 2007, William L. Maltby wrote:
Date: Wed, 16 May 2007 17:56:29 -0400
From: William L. Maltby CentOS4Bill@triad.rr.com
Reply-To: CentOS mailing list centos@centos.org
To: CentOS General List centos@centos.org
Subject: Re: [CentOS] Re: Now I can't shutdown [was: Screen blanks after initial setup (Centos 5)]
On Wed, 2007-05-16 at 21:10 +0300, Itay wrote:
** I failed to mention earlier that the media check prior to installation *failed*. I decided to go on with the installation because my own experience so far, and others', showed that failure does not necessarily imply bad media. But see below.
Based on investigation I've done in response to the thread here,
http://lists.centos.org/pipermail/centos/2007-April/079718.html
and the advice by several in that thread, you need to have padding on the media. I'm authoring a "SOLVED" message for that thread, but haven't completed my tests.
[snip]
Use one of the padding methods mentioned in that thread and you should be OK and the errors should disappear.
*panic*
Bill, thank you for the detailed reply.
I tried the padding technique - the media errors were gone; kernel panic - stayed. I hope that you or others may help me with this.
Here is what I did:
1 Burnt i386 DVD using the padding method offered by Johnny Hughes
  http://lists.centos.org/pipermail/centos/2007-April/079828.html
  (Check sum OK.)

2 Booted to install
  + Media check OK
  + Finished installation w/o noticeable problems
    - This time there were no error messages about i/o problems, bad sectors, etc.
    - anaconda.log had some warning messages regarding some missing /etc, /usr, and a few libs. (Similar to 1st installation.)
  + Rebooted from installer
  + Expected Setup Agent. Got *kernel panic* instead.
  + Second and third reboots landed me in text-mode Setup Agent.

3 I tried several things; each one of them ended in *kernel panic*, either before logging in as root or some minutes after. The panic appeared after idling the machine for some time.

4 A couple of strange things
  + I found out that the default run level was set to 3. When, as root, I tried 'telinit 5', the machine responded with a blank screen. I had to reset.
  + Rebooting the machine was accompanied by messages regarding ntp/clock skew. Later, I found out that I had lost the network connection, probably while playing with the installation, so this probably explains the clock skew. I am not sure if this has any relevance.
  + At no point was I prompted to set up a non-root user.

5 For each crash / kernel panic I got a screen-load of trace and other cryptic output. Each time, so it seems, the output was different.
  *Q* Is there a way to dump those messages into a file?
6 Only suspicious thing I have found in /var/log/messages was lines like this
May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
May 19 11:27:36 bilbo kernel: ata1: EH complete
7 Also, /var/log/secure had these errors - I believe for every reboot.
...
May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning! Could not get new context for /dev/tty1, not relabeling: Invalid argument
May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1
My *guess* is that the application related errors you reported may be a result of certain installation steps terminating early due to the false I/O errors reported by the kernel/driver(s).
HTH
Bill
Any recommendation how to proceed? (The most pressing question: is it the hardware? Should I take the box back to the seller?)
Many thanks.
On Sat, May 19, 2007 at 05:54:37PM +0300, Itay enlightened us:
Any recommendation how to proceed? (The most pressing question: is it the hardware? Should I take the box back to the seller?)
All of the above could certainly be caused by faulty hardware. I would run memtest on the machine for a while and see what it comes up with. I believe you can run it from the installer disk by typing "memtest" at the prompt.
Matt
On Sat, 19 May 2007, Matt Hyclak wrote:
Date: Sat, 19 May 2007 13:21:23 -0400
From: Matt Hyclak hyclak@math.ohiou.edu
Reply-To: CentOS mailing list centos@centos.org
To: CentOS mailing list centos@centos.org
Subject: Re: [CentOS] Now I can't shutdown [was: Screen blanks after initial setup (Centos 5)]
On Sat, May 19, 2007 at 05:54:37PM +0300, Itay enlightened us:
Any recommendation how to proceed? (The most pressing question: is it the hardware? Should I take the box back to the seller?)
All of the above could certainly be caused by faulty hardware. I would run memtest on the machine for a while and see what it comes up with. I believe you can run it from the installer disk by typing "memtest" at the prompt.
Matt
I memtest'ed, as you suggest, before this attempt to install. (This is the 2nd attempt; the 1st one was cumbered with errors related to mediacheck.) The test (default settings, ~4hrs, 5 passes) was w/o errors. I guess there is no harm in memtest'ing more.
Thanks,
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
On Wed, 16 May 2007, William L. Maltby wrote:
<snip my prior reply header junk>
On Wed, 2007-05-16 at 21:10 +0300, Itay wrote:
** I failed to mention earlier that the media check prior to installation *failed*. I decided to go on with the installation because my own experience so far, and others', showed that failure does not necessarily imply bad media. But see below.
Based on investigation I've done in response to the thread here,
http://lists.centos.org/pipermail/centos/2007-April/079718.html
and the advice by several in that thread, you need to have padding on the media. I'm authoring a "SOLVED" message for that thread, but haven't completed my tests.
[snip]
Use one of the padding methods mentioned in that thread and you should be OK and the errors should disappear.
*panic*
Bill, thank you for the detailed reply.
I tried the padding technique - the media errors were gone; kernel panic - stayed. I hope that you or others may help me with this.
Most likely it will be others. My ignorance is boundless and, fortunately, my ego is inversely proportional to that! :-)
I'm glad the media errors are gone.
Here is what I did:
1 Burnt i386 DVD using the padding method offered by Johnny Hughes http://lists.centos.org/pipermail/centos/2007-April/079828.html (Check sum OK.)
2 Booted to install
- Media check OK
- Finished installation w/o noticeable problems
- This time there were no error messages about i/o problems, bad sectors, etc.
- anaconda.log had some warning messages regarding some missing /etc, /usr, and a few libs. (Similar to 1st installation.)
I'm at console. Hang around and I'll reboot my LFS machine into CentOS 5 and see what I have in the logs.
I'm posting this so you can see if any of the stuff I mention can be pursued while I compose a response to these log items.
- Rebooted from installer
- Expected Setup Agent. Got *kernel panic* instead.
- Second and third reboot have landed me in text-mode Setup Agent.
3 I tried several things, each one of them ended in *kernel panic* either before logging in as root, or some minutes after. The panic appeared after idling the machine for some time.
*sniff* Smells hardware-related. But whether it's bad hardware or kernel handling of it, I'm too ignorant to hazard a guess. I googled and found your original post (BTW, don't hijack threads, even your own. It made it more difficult to find your brief originally-posted hardware ref). :-O
I was going to ask about x586 or C5 processors, but I did manage to find your OP and saw AMD 4200+, IIRC. So we don't have to worry about that.
4 A couple of strange things
- I have found out that the default run level was set to 3. When, as a root I tried 'telinit 5', the machine responded with a blank screen. I had to reset.
Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is being started on tty7, if it fails and seems blank, maybe switching to virtual console 1 will work, if the machine is still alive. If so, maybe some answers are there (view /var/log/messages, the X log, etc.).
- Rebooting the machine was accompanied with messages regarding ntp/clock skew. Later, I have found out that I have lost the network connection, probably while playing with the installation, so this probably explains the clock skew. Am not sure if this has any relevance.
- At no point I was prompted to setup a non-root user.
IIRC, when I did my C5 install, I got that prompt. If that's normal, it may mean that the problem actually bit you during the install phase and not everything got done correctly.
Maybe my install log will give you something to check against.
5 For each crash / kernel panic I got a screen-load of trace and other cryptic output. Each time, so it seems, the output was different. *Q* Is there a way to dump those messages into a file?
I'm too ignorant to answer that. But if you do get up and running for a few minutes in a text console, clues may be lying around in /var/log/messages. Search backwards for "restart" (twice) or some other word, like "panic", and read around there.
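The backwards search could be sketched like this for anyone who would rather script it than page through the file. It is only an illustration: the keywords are the ones suggested above, the amount of context is an arbitrary choice, and the function name is made up.

```python
def find_with_context(lines, keywords=("panic", "restart"), context=3):
    """Scan lines from the end; return each keyword hit with its neighbours."""
    hits = []
    for i in range(len(lines) - 1, -1, -1):
        if any(word in lines[i].lower() for word in keywords):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append(lines[start:end])
    return hits
```

Feeding it open("/var/log/messages").read().splitlines() (as root) returns the most recent hits first, each with a few lines on either side.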
6 Only suspicious thing I have found in /var/log/messages was lines like this
May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
May 19 11:27:36 bilbo kernel: ata1: EH complete
7 Also, /var/log/secure had these errors - I believe for every reboot.
...
May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning! Could not get new context for /dev/tty1, not relabeling: Invalid argument
May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1
I'm too ignorant to answer authoritatively.
My *guess* is that the application related errors you reported may be a result of certain installation steps terminating early due to the false I/O errors reported by the kernel/driver(s).
HTH
Bill
Any recommendation how to proceed? (The most pressing question: is it the hardware? Should I take the box back to the seller?)
If the panics are random, IIRC, could be memory, could be ... But a good run of memtest86 from the install CD should help determine that. Also, it is not uncommon for new hardware to have the occasional loose connector or PCI card. Maybe too small a power supply. Maybe CPU fan not spinning. Maybe ambient temperature of the room is too high and internal box temperature excessive.
If you suspect hardware, check all connectors. Make sure memory, power supply connectors and PCI cards are firmly seated. Make sure your power supply is adequate (my EPOX board needed much more than the PS for the ACER box, into which the EPOX was originally installed, could supply). I had random panics, usually near startup times, sometimes a few minutes after. That's natural because the ACER had an integrated SiS chip set, which needs much less power than the Via-based EPOX.
Make sure the CPU fan is seated and working.
Is your AC power from the electric company reliable? Fluctuations of 20% are not uncommon here. Battery backup with power conditioning helps a lot.
Since you mentioned a delay sometimes (IIRC), heat sounds like a possible culprit. If the room is cool, take the covers off and see if it runs longer. If it stays up long enough, do
# cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 36 C
Make sure it's in the range for the AMD you have. BTW, mine is lower than it used to be. I added an expensive Zalman FHS a few months back. May try overclocking someday if I get enough interest.
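If you want to watch that value over time instead of running cat by hand, a minimal sketch follows. It assumes the "temperature: NN C" format shown above; the thermal zone name (THRM here) varies between boards, and the function names are made up for illustration. What counts as too hot depends on the CPU, so no limit is hard-coded.

```python
import re

# Path as shown above; the zone name differs from board to board.
THERMAL_PATH = "/proc/acpi/thermal_zone/THRM/temperature"


def parse_acpi_temperature(text):
    """Return the temperature in degrees C from 'temperature: NN C', or None."""
    match = re.search(r"temperature:\s*(-?\d+)\s*C", text)
    return int(match.group(1)) if match else None


def read_temperature(path=THERMAL_PATH):
    """Read and parse the ACPI thermal zone file."""
    with open(path) as handle:
        return parse_acpi_temperature(handle.read())
```

Calling read_temperature() in a loop with a sleep between reads, and noting the last value seen before a panic, would show whether the crashes follow rising heat.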
Many thanks.
Use google with "site:centos.org" added, e.g. like this
screen blanks after initial setup site:centos.org
in advanced search fields (I had site:... in the "all of the words" field and "screen blanks after initial setup" in the "exact phrase" field). You'll find lots of instances of kernel panics discussed on the list and some suggestions, in some cases, for "noapic" and similar boot-time parameters.
On 5/19/07, William L. Maltby CentOS4Bill@triad.rr.com wrote:
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
4 A couple of strange things
- I have found out that the default run level was set to 3. When, as a root I tried 'telinit 5', the machine responded with a blank screen. I had to reset.
Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is being started on tty7, if it fails and seems blank, maybe switching to virtual console 1 will work, if the machine is still alive.
This symptom actually sounds exactly like what has been happening on my laptop when no networking is available at boot time. It could be unrelated to the panic problems.
- Rebooting the machine was accompanied with messages regarding ntp/clock skew. Later, I have found out that I have lost the network connection, probably while playing with the installation, so this probably explains the clock skew. Am not sure if this has any relevance.
It's probably not relevant to the panics but it could be producing other symptoms that are confusing the diagnosis. Try to get it fixed again if you can.
Bill's advice (e.g. about checking hardware connections) is good.
On Sat, 19 May 2007, Bart Schaefer wrote:
Date: Sat, 19 May 2007 11:55:11 -0700
From: Bart Schaefer barton.schaefer@gmail.com
Reply-To: CentOS mailing list centos@centos.org
To: CentOS mailing list centos@centos.org
Subject: Re: [CentOS] Now I can't shutdown [was: Screen blanks after initial setup (Centos 5)]
On 5/19/07, William L. Maltby CentOS4Bill@triad.rr.com wrote:
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
4 A couple of strange things
- I have found out that the default run level was set to 3. When, as a root I tried 'telinit 5', the machine responded with a blank screen. I had to reset.
Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is being started on tty7, if it fails and seems blank, maybe switching to virtual console 1 will work, if the machine is still alive.
This symptom actually sounds exactly like what has been happening on my laptop when no networking is available at boot time. It could be unrelated to the panic problems.
- Rebooting the machine was accompanied with messages regarding ntp/clock skew. Later, I have found out that I have lost the network connection, probably while playing with the installation, so this probably explains the clock skew. Am not sure if this has any relevance.
It's probably not relevant to the panics but it could be producing other symptoms that are confusing the diagnosis. Try to get it fixed again if you can.
Bill's advice (e.g. about checking hardware connections) is good.
I am memtest'ing the machine again for the night. We'll check the connections first thing tomorrow. And will follow with an attempt to boot to centos 5 keeping close watch on the network.
On Sat, 19 May 2007, William L. Maltby wrote:
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
[snip]
I tried the padding technique - the media errors were gone; kernel panic - stayed. I hope that you or others may help me with this.
Most likely it will be others. My ignorance is boundless and, fortunately, my ego is inversely proportional to that! :-)
I'm glad the media errors are gone.
Which leaves me with the more difficult alternatives. Arrrgh.
[snip]
3 I tried several things, each one of them ended in *kernel panic* either before logging in as root, or some minutes after. The panic appeared after idling the machine for some time.
*sniff* Smells hardware-related. But whether it's bad hardware or kernel handling of it, I'm too ignorant to hazard a guess. I googled and found your original post (BTW, don't hijack threads, even your own. It made it more difficult to find your brief originally-posted hardware ref). :-O
I thought (and still do) that the two issues were related, and therefore modifying the subject line and including a [was:...] clause are sufficient. Sorry for the extra work.
I was going to ask about x586 or C5 processors, but I did manage to find your OP and saw AMD 4200+, IIRC. So we don't have to worry about that.
:-)
4 A couple of strange things
- I found out that the default run level was set to 3. When, as root, I tried 'telinit 5', the machine responded with a blank screen. I had to reset.
Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is being started on tty7, if it fails and seems blank, maybe switching to virtual console 1 will work, if the machine is still alive. If so, maybe some answers are there (view /var/log/messages, the X log, etc.).
Wasn't able to switch to virtual consoles. (I begin to suspect that there are some problems with the keyboard as well, though.) No clues in /var/log/messages. And no X.log at all!
- Rebooting the machine was accompanied with messages regarding ntp/clock skew. Later, I have found out that I have lost the network connection, probably while playing with the installation, so this probably explains the clock skew. Am not sure if this has any relevance.
- At no point was I prompted to set up a non-root user.
IIRC, when I did my C5 install, I got that prompt. If that's normal, it may mean that the problem actually bit you during the install phase and not everything got done correctly.
Possibly. But there were no hints for that in install.log and anaconda.*log*
5 For each crash / kernel panic I got a screen-load of trace and other cryptic output. Each time, so it seems, the output was different. *Q* Is there a way to dump those messages into a file?
I'm too ignorant to answer that. But if you do get up and running for a few minutes in a text console, clues may be lying around in /var/log/messages. Search backwards for "restart" (twice) or some other word, like "panic", and read around there.
No hints except for what I have mentioned below.
6 Only suspicious thing I have found in /var/log/messages was lines like this
May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
May 19 11:27:36 bilbo kernel: ata1: EH complete
7 Also, /var/log/secure had these errors - I believe for every reboot.
...
May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning! Could not get new context for /dev/tty1, not relabeling: Invalid argument
May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1
I'm too ignorant to answer authoritatively.
My *guess* is that the application related errors you reported may be a result of certain installation steps terminating early due to the false I/O errors reported by the kernel/driver(s).
HTH
Bill
Any recommendation how to proceed? (The most pressing question: is it the hardware? Should I take the box back to the seller?)
If the panics are random, IIRC, could be memory, could be ... But a good run of memtest86 from the install CD should help determine that. Also, it is not uncommon for new hardware to have the occasional loose connector or PCI card. Maybe too small a power supply. Maybe CPU fan not spinning. Maybe ambient temperature of the room is too high and internal box temperature excessive.
Running memtest now for the night (runs for 2 hours already). If it was a question of excess heat I would expect to have trouble during memtest run as well; no?
If you suspect hardware, check all connectors. Make sure memory, power supply connectors and PCI cards are firmly seated. Make sure your power supply is adequate (my EPOX board needed much more than the PS for the ACER box, into which the EPOX was originally installed, could supply. Had random panics, usually near startup times, sometimes a few minutes after. That's natural because the ACER had an integrated SiS chip set which needs much less power than the Via-based EPOX).
Make sure the CPU fan is seated and working.
Is your AC power from the electric company reliable? Fluctuations of 20% are not uncommon here. Battery backup with power conditioning helps a lot.
Actually, the power supply is not stable enough. But there were no fluctuations that I could notice during my attempts this morning. We'll keep this in mind, though.
Since you mentioned a delay sometimes (IIRC), heat sounds like a possible culprit. If the room is cool, take the covers off and see if it runs longer. If it stays up long enough, do
Again: memtest'ing for few hours should produce a similar challenge I should think. I could try running knoppix 5 for a while and straining somehow the CPU.
# cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 36 C
Make sure it's in the range for the AMD you have. BTW, mine is lower than it used to be. I added an expensive Zallman FHS a few months back. May try overclocking someday if I get enough interest.
We'll check tomorrow when attempting to reboot into centos.
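The temperature check suggested above can be repeated while the box is under load. What follows is only a rough sketch: the function names read_temp and temp_ok are made up for illustration, the zone name THRM is taken from Bill's machine and may differ on other boards, and the limit passed in is a placeholder to be replaced with the figure from the CPU vendor's datasheet.

```shell
# Path from the thread; the zone name (THRM) varies between machines.
THERMAL_FILE=${THERMAL_FILE:-/proc/acpi/thermal_zone/THRM/temperature}

# Print the numeric part of a line such as "temperature: 36 C".
# Usage: read_temp [FILE]
read_temp() {
    awk '/^temperature:/ { print $2 }' "${1:-$THERMAL_FILE}"
}

# Succeed if the current reading is at or below LIMIT (degrees C).
# Usage: temp_ok LIMIT [FILE]
temp_ok() {
    limit=$1
    current=$(read_temp "$2")
    [ -n "$current" ] && [ "$current" -le "$limit" ]
}

# Example (the limit of 60 is a placeholder, not a vendor figure):
#   while temp_ok 60; do read_temp; sleep 30; done
```

Taking the file as an optional argument lets the same functions be tried against a saved copy of the output before relying on them on the ailing machine.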
Use google with "site:centos.org" added, e.g. like this
screen blanks after initial setup site:centos.org
in advanced search fields (I had site:... in the "all of the words" field and "screen blanks after initial setup" in the "exact phrase" field). You'll find lots of instances of kernel panics discussed on the list and some suggestions, in some cases, for "noapic" and similar boot-time parameters.
Yup. I noticed that some of them were related to nVidia hardware. Well, my box has a few nVidia's, so maybe...
Thanks.
On Sat, 2007-05-19 at 23:36 +0300, Itay wrote:
On Sat, 19 May 2007, William L. Maltby wrote:
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
[snip]
<snip>
... (BTW, don't hijack threads, even your own. It made it more difficult to find your brief originally-posted hardware ref). :-O
I thought (and still do) that the two issues were related, and therefore modifying the subject line and including a [was:...] clause are sufficient. Sorry for the extra work.
NP. From *my* background, a media check that is solved becomes unrelated to a kernel panic. So a new thread would be in order. However, I also understand that from *your* POV, it's all "install fails".
<snip>
Wasn't able to switch to virtual consoles. (I begin to suspect that there are some problems with the keyboard as well, though.) No clues in /var/log/messages. And no X.log at all!
Ach! Kb problems are not needed when you are suffering "panic attacks". Prozac works! ;-)
<snip>
- Rebooting the machine was accompanied with messages regarding ntp/clock skew. Later, I have found out that I have lost the network connection, probably while playing with the installation, so this probably explains the clock skew. Am not sure if this has any relevance.
- At no point was I prompted to set up a non-root user.
IIRC, when I did my C5 install, I got that prompt. If that's normal, it may mean that the problem actually bit you during the install phase and not everything got done correctly.
Possibly. But there were no hints for that in install.log and anaconda.*log*
5 For each crash / kernel panic I got a screen-load of trace and other cryptic output. Each time, so it seems, the output was different. *Q* Is there a way to dump those messages into a file?
I'm too ignorant to answer that. But if you do get up and running for a few minutes in a text console, clues may be lying around in /var/log/messages. Search backwards for "restart" (twice) or some other word, like "panic", and read around there.
No hints except for what I have mentioned below.
I was afraid of that. It was worth a try though.
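For anyone repeating the log search described above, here is a minimal sketch. The function names recent_matches and show_around are hypothetical helpers, not anything shipped with CentOS; only grep, tail and sed are assumed. The first lists the latest lines mentioning a keyword, with line numbers; the second prints the neighbourhood of one of those lines so you can "read around there".

```shell
# List the most recent COUNT lines mentioning KEYWORD, with line numbers.
# Usage: recent_matches KEYWORD [LOGFILE] [COUNT]
recent_matches() {
    keyword=$1
    logfile=${2:-/var/log/messages}
    count=${3:-5}
    # -i ignores case, -n prefixes each hit with its line number
    grep -i -n -- "$keyword" "$logfile" | tail -n "$count"
}

# Print SPAN lines either side of line LINE in LOGFILE.
# Usage: show_around LINE LOGFILE [SPAN]
show_around() {
    line=$1
    logfile=$2
    span=${3:-10}
    start=$((line - span))
    if [ "$start" -lt 1 ]; then start=1; fi
    end=$((line + span))
    sed -n "${start},${end}p" "$logfile"
}

# Example:
#   recent_matches panic
#   show_around 1234 /var/log/messages 20
```

As noted in the thread, a hard panic often leaves nothing in the log at all, so an empty result does not rule anything out.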
6 Only suspicious thing I have found in /var/log/messages was lines like this
May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
May 19 11:27:36 bilbo kernel: ata1: EH complete
7 Also, /var/log/secure had these errors - I believe for every reboot.
<snip>
If the panics are random, IIRC, could be memory, could be ... But a good run of memtest86 from the install CD should help determine that. Also, it is not uncommon for new hardware to have the occasional loose connector or PCI card. Maybe too small a power supply. Maybe CPU fan not spinning. Maybe ambient temperature of the room is too high and internal box temperature excessive.
Running memtest now for the night (runs for 2 hours already). If it was a question of excess heat I would expect to have trouble during memtest run as well; no?
Memtest will not exercise the CPU enough to add substantial heat issues. Doing a couple *heavy* compilations at the same time, like compiling the kernel, glibc, ... and more will spin HDs and tax the CPU substantially.
This will add a... who was it that said "a buttload"... of heat. With the speed of your CPU, you might need to run a couple of instances of some *big* stuff at the same time in different directories.
<snip>
Is your AC power from the electric company reliable? Fluctuations of 20% are not uncommon here. Battery backup with power conditioning helps a lot.
Actually, the power supply is not stable enough. But there were no fluctuations that I could notice during my attempts this morning. We'll keep this in mind, though.
Fluctuations generally go unnoticed. A drop of 10-20 volts or a spike of a similar amount won't disturb a typical PS. If you have a BBS with conditioning or "brownout" protection, an alarm may sound, if available and not disabled.
"Outages" for fractional seconds are noticed, but even a second or two is often survived with today's capacitor-laden supplies and mobos.
Since you mentioned a delay sometimes (IIRC), heat sounds like a possible culprit. If the room is cool, take the covers off and see if it runs longer. If it stays up long enough, do
Again: memtest'ing for few hours should produce a similar challenge I should think. I could try running knoppix 5 for a while and straining somehow the CPU.
Except for things like I mentioned above, I doubt you could load it enough to cause substantial effect. Hmmm ... got it.
Run several "bzip2 --best" on several large files simultaneously (like CentOS ISO images which have a lot of already compressed stuff - that adds more load to bzip2's effort).
That should also tromp the HDs reasonably hard, as a lot of reading input and creating temporary files should occur.
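The simultaneous-bzip2 idea above might look something like this. It is only a sketch: bzip2_load is a made-up helper name, and writing the compressed copies into a separate scratch directory (rather than compressing in place) is my assumption so that the input files survive the test.

```shell
# Compress every FILE at once with "bzip2 --best", one background job each,
# writing the results into SCRATCH_DIR and leaving the inputs untouched.
# Usage: bzip2_load SCRATCH_DIR FILE...
bzip2_load() {
    scratch=$1
    shift
    for f in "$@"; do
        # -c sends the compressed data to stdout instead of replacing $f
        bzip2 --best -c "$f" > "$scratch/$(basename "$f").bz2" &
    done
    wait    # return only after every background compressor has finished
}

# Example (paths are placeholders):
#   mkdir /tmp/burn && bzip2_load /tmp/burn /path/to/*.iso
```

Run it in a loop, with a temperature reading between passes, if one pass does not keep the machine busy for long enough.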
<snip>
HTH -- Bill
On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:
On Wed, 16 May 2007, William L. Maltby wrote:
<snip useless hdrs>
On Wed, 2007-05-16 at 21:10 +0300, Itay wrote:
<snip stuff addressed in prior reply by me>
2 Booted to install
- Media check OK
- Finished installation w/o noticeable problems
- This time there were no error messages about i/o problems, bad sectors, etc.
- anaconda.log had some warning messages regarding some missing /etc, /usr, and a few libs. (Similar to 1st installation.)
I have my logs available on the C5 box now. I have no such errors in my anaconda-ks.cfg log. I can bzip2 all and mail to you privately, if desired. Then you can do something like "comm -3" against mine and yours to see what sorts of differences there are. Mine was a full install, but no extras or rpmforge stuff (yet).
If you want this, send me a post (anti-spam: you'll have to assemble the real address yourself)
--CentOS4Bill__ the-usual-at triad DOT rr dot-again --com__
Remove *leading* -- and *trailing* __ and replace various obvious things.
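For the "comm -3" comparison mentioned above, note that comm expects both inputs to be sorted. A minimal sketch follows; log_diff is a hypothetical helper name and the file names in the example are placeholders. Lines that appear only in the first file come out unindented, lines only in the second are indented by a tab, and lines common to both are suppressed.

```shell
# Show lines that appear in only one of two log files.
# Usage: log_diff MINE YOURS
log_diff() {
    a=$(mktemp) || return 1
    b=$(mktemp) || return 1
    # Sort and compare under the same collation so comm sees ordered input
    LC_ALL=C sort "$1" > "$a"
    LC_ALL=C sort "$2" > "$b"
    LC_ALL=C comm -3 "$a" "$b"
    rm -f "$a" "$b"
}

# Example:
#   log_diff bill-install.log my-install.log
```

Expect some noise: package selections and any timestamps in the logs will differ between two machines, so the interesting lines are the ones that look like errors or warnings.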
<snip more stuff addressed in a prior response>
HTH -- Bill