On Sat, 19 May 2007, William L. Maltby wrote:

> On Sat, 2007-05-19 at 17:54 +0300, Itay wrote:

[snip]

>> I tried the padding technique - the media errors were gone;
>> kernel panic - stayed. I hope that you or others may help me
>> with this.
>
> Most likely it will be others. My ignorance is boundless and,
> fortunately, my ego is inversely proportional to that! :-)
>
> I'm glad the media errors are gone. Which leaves me with the more
> difficult alternatives. Arrrgh.

[snip]

>> 3 I tried several things, each one of them ended in *kernel panic*
>>   either before logging in as root, or some minutes after. The panic
>>   appeared after idling the machine for some time.
>
> *sniff* Smells hardware-related. But whether it's bad hardware or kernel
> handling of it, I'm too ignorant to hazard a guess. I googled and found
> your original post (BTW, don't hijack threads, even your own. It made
> it more difficult to find your brief originally-posted hardware ref).

:-O
I thought (and still do) that the two issues were related, and
therefore that modifying the subject line and including a [was:...]
clause were sufficient. Sorry for the extra work.

> I was going to ask about x586 or C5 processors, but I did manage to find
> your OP and saw AMD 4200+, IIRC. So we don't have to worry about that.

:-)

>> 4 A couple of strange things
>>   + I have found out that the default run level was set to 3.
>>     When, as root, I tried 'telinit 5', the machine responded with a
>>     blank screen. I had to reset.
>
> Have you tried a <CTRL>-<ALT>-<F1> when this happens? Since desktop is
> being started on tty7, if it fails and seems blank, maybe switching to
> virtual console 1 will work, if the machine is still alive. If so, maybe
> some answers are there (view /var/log/messages, the X log, etc.).

Wasn't able to switch to virtual consoles. (I begin to suspect that
there are some problems with the keyboard as well, though.)
No clues in /var/log/messages. And no X.log at all!

>>   + Rebooting the machine was accompanied by messages regarding
>>     ntp/clock skew. Later, I found out that I had lost the
>>     network connection, probably while playing with the
>>     installation, so this probably explains the clock skew.
>>     Am not sure if this has any relevance.
>>   + At no point was I prompted to set up a non-root user.
>
> IIRC, when I did my C5 install, I got that prompt. If that's normal, it
> may mean that the problem actually bit you during the install phase and
> not everything got done correctly.

Possibly. But there were no hints of that in install.log and
anaconda.*log*

>> 5 For each crash / kernel panic I got a screen-load of trace and other
>>   cryptic output. Each time, so it seems, the output was different.
>>   *Q* Is there a way to dump those messages into a file?
>
> I'm too ignorant to answer that. But if you do get up and running for a
> few minutes in a text console, clues may be lying around
> in /var/log/messages. Search backwards for "restart" (twice) or some
> other word, like "panic", and read around there.

No hints except for what I have mentioned below.

>> 6 Only suspicious thing I have found in /var/log/messages was lines
>>   like this
>>
>> May 19 11:27:36 bilbo kernel: ata1.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
>> May 19 11:27:36 bilbo kernel: ata1.00: tag 0 cmd 0xb0 Emask 0x1 stat 0x51 err 0x4 (device error)
>> May 19 11:27:36 bilbo kernel: ata1: EH complete
>>
>> 7 Also, /var/log/secure had these errors - I believe for every reboot.
>>
>> ...
>> May 19 11:25:30 bilbo login: ROOT LOGIN ON tty1
>> May 19 11:26:06 bilbo login: pam_unix(login:session): session closed for user root
>> May 19 11:26:09 bilbo sshd[2677]: Received signal 15; terminating.
>> May 19 11:27:26 bilbo sshd[2687]: Server listening on :: port 22.
>> May 19 11:27:26 bilbo sshd[2687]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
>> May 19 11:29:51 bilbo login: pam_unix(login:session): session opened for user root by LOGIN(uid=0)
>> May 19 11:29:51 bilbo login: pam_selinux(login:session): Warning! Could not get new context for /dev/tty1, not relabeling: Invalid argument
>> May 19 11:29:51 bilbo login: pam_selinux(login:session): usercon=(null), prev_context=system_u:object_r:tty_device_t
>> May 19 11:29:51 bilbo login: ROOT LOGIN ON tty1
>
> I'm too ignorant to answer authoritatively.
>>
>>> My *guess* is that the application related errors you reported may be a
>>> result of certain installation steps terminating early due to the false
>>> I/O errors reported by the kernel/driver(s).
>>>
>>> HTH
>>> --
>>> Bill
>>
>> Any recommendation how to proceed?
>> (The most pressing question: is it the hardware? Should I take
>> the box back to the seller?)
>
> If the panics are random, IIRC, could be memory, could be ... But a good
> run of memtest86 from the install CD should help determine that. Also,
> it is not uncommon for new hardware to have the occasional loose
> connector or PCI card. Maybe too small power supply. Maybe CPU fan not
> spinning. Maybe ambient temperature of the room is too high and internal
> box temperature excessive.

Running memtest now for the night (it has been running for 2 hours
already). If it were a question of excess heat I would expect to have
trouble during the memtest run as well; no?

> If you suspect hardware, check all connectors. Make sure memory, power
> supply connectors and PCI cards are firmly seated. Make sure your power
> supply is adequate (my EPOX board needed much more than the PS for the
> ACER box, into which the EPOX was originally installed, could supply.
> Had random panics, usually near startup times, sometimes a few minutes
> after. That's natural because the ACER had an integrated SiS chip set
> which needs much less power than the Via-based EPOX).
>
> Make sure the CPU fan is seated and working.
>
> Is your AC power from the electric company reliable?
> Fluctuations of 20% are not uncommon here. Battery backup with power
> conditioning helps a lot.

Actually, the power supply is not stable enough. But there were no
fluctuations that I could notice during my attempts this morning.
We'll keep this in mind, though.

> Since you mentioned a delay sometimes (IIRC), heat sounds like a
> possible culprit. If the room is cool, take the covers off and see if it
> runs longer. If it stays up long enough, do

Again: memtest'ing for a few hours should produce a similar challenge,
I should think. I could try running knoppix 5 for a while and
somehow straining the CPU.

> # cat /proc/acpi/thermal_zone/THRM/temperature
> temperature: 36 C
>
> Make sure it's in the range for the AMD you have. BTW, mine is lower
> than it used to be. I added an expensive Zallman FHS a few months back.
> May try overclocking someday if I get enough interest.

We'll check tomorrow when attempting to reboot into centos.

> Use google with "site:centos.org" added, e.g. like this
>
>   screen blanks after initial setup site:centos.org
>
> in advanced search fields (I had site:... in the "all of the words"
> field and "screen blanks after initial setup" in the "exact phrase"
> field). You'll find lots of instances of kernel panics discussed on the
> list and some suggestions, in some cases, for "noapic" and similar boot-
> time parameters.

Yup. I noticed that some of them were related to nVidia hardware.
Well, my box has a few nVidia's, so maybe...

Thanks.

-- 
Itay Furman <centos at nospammail.net>
--
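
For reference, the advice quoted above (search /var/log/messages backwards for words like "restart" or "panic" and read around the hits) could be scripted roughly as follows. This is only a minimal sketch: the function name find_recent, its parameters, and the default keywords are made up for illustration and are not part of any CentOS tool.

```python
def find_recent(lines, keywords=("panic", "restart"), context=2):
    """Scan log lines from newest to oldest and collect keyword hits.

    Returns a list of (line_number, snippet) pairs, newest hit first.
    line_number is 1-based; snippet is the matching line together with
    up to `context` lines before and after it.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    # Walk backwards so the most recent events come out first.
    for i in range(len(lines) - 1, -1, -1):
        text = lines[i].lower()
        if any(k in text for k in lowered):
            start = max(0, i - context)
            end = min(len(lines), i + context + 1)
            hits.append((i + 1, lines[start:end]))
    return hits


def read_log(path="/var/log/messages"):
    """Read a log file into a list of lines (path is an assumption)."""
    with open(path, errors="replace") as fh:
        return fh.read().splitlines()
```

Run as root after a reboot, something like find_recent(read_log()) would list the lines around the most recent matches first; other keywords, such as "device error" from the ata1.00 messages, can be passed in the same way.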