[CentOS] How do we handle panics? Bug it here [ SOLVED? Thx J.J. Garcia]

Tue Sep 5 21:31:34 UTC 2006
William L. Maltby <CentOS4Bill at triad.rr.com>

On Mon, 2006-09-04 at 13:59 -0400, William L. Maltby wrote:
> On Mon, 2006-09-04 at 18:57 +0200, J.J. Garcia wrote:
> > El lun, 04-09-2006 a las 10:29 -0400, William L. Maltby escribió:
> > > On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:
> > > > "Unable to handle kernel paging request". I've saved the OOPS data from
> > > > the logs for 6 panics since the 4.4 update.
> > > 
> > > s/6/10/  # Now
> 
> s/10/11/   # Now, and my first double

Had one more after that.

> <snip>

> > 
> > Maybe it is not the point..., but have you considered the option of
> > running 'badblock' (non-destructive/destructive after dd'ing to backup)?
> > even on swap space? 

That was not the solution, but read on! Stop at this next paragraph if
you have no interest in the details.

Apparently, country living has its downside: unreliable power from the
electrical utility company caused the problems I've been experiencing
for a couple of months... maybe most of them, but not all? With the BBS
in place again, reliability seems to be restored.

Moral of the story: ATX systems may be more sensitive to power
deviations than AT systems? Or... Well you know the other possibilities
related to power.

> 
> OUTSTANDING! I had not even considered. I was wrapped up in my own loop:
> prev mobo bad, took a long time to ID that (was being led to believe
> mem, after being led to believe it started with seamonkey, after.. you
> can see that by now I am tightly focused).

And here is the value of someone taking the time to reply when no one
else did. I was, indeed, trapped in a non-productive train of
thought. J.J. raised a Q that I had not considered. Being a pretty good
associative processor, I immediately extended that to "what else have I
discarded or not considered that might be in play?"

He offered what was needed to get me moving. Just plain old raising a
question, causing me to start thinking outside the box I had fallen
into.

Thanks be to J.J.

My eyes immediately swiveled left to the BBS containing the 2 new
batteries undergoing their initial 24 hour charge before use. Something
in J.J.'s question had made me recall that all my troubles began about
the same time the BBS died. And as symptoms accumulated, I handled them
in isolation, for some unknown reason. Uncharacteristic of me. Several
other things happened in the same time-frame, clouding the situation for
me.

To keep it short:
  - couple months back BBS died; about the same time some CentOS updates
    had occurred; had random lock-ups & X applications dying
    unexpectedly,
  - after some time, I figured it may be hardware; memtest86 shows
    memory errors, but inconsistently, over a few days,
  - later determined mobo at fault, RMA'd,
  - xfer'd load to K6-III, it chugs along NP, no BBS,
  - later summer heat comes, Duke Energy announces brownouts possible,
  - K6-III still runs w/o a hiccup, w/o a BBS,
  - later built new machine; runs at 4.3 for a day or two and does OK
    with a couple freezes only; I'm thinking it's anomalies
    introduced as updates were applied when the mobo was failing and the
    4.4 update may cure it,
  - 4.4 update done and OOPS/panic about a dozen times in 5 days,
  - I dutifully google and find posts indicating this is a known and
    unsolved problem through release 2.6.15 at least,
  - BBS back in service and normal activities produce no OOPS, no panic.

Now, how does it become believable to me that I'm the only one affected
on this list when googling indicates a well-known problem? Lack of
participation and lack of knowledge. I know there are lots of admin types
here. I guess there may be some like me at home just running a desktop
workstation with a little private LAN and old equipment. But never sure.
And do they watch the lists as I do? Do they bother to reply? No way to
know.

For this OP and some others, when I ask "anyone else seeing this?", I
get no reply. That's expected if no one else is seeing it. So I figure
that there are very few with my simple and older setup. Ergo, they may not
be affected. In my experience, which is quite long, I have often exposed
bugs that others do not see. IBM compiler types used to hate me (a
*long* time ago).

That is how it becomes believable that I'm "The One".

> <snip>
> > > Since I'm somewhat new here, and this is a recognized problem in the
> > > community (if my googling is correct), I am still uncertain how I should
> > > deal with this, other than the workaround. Can someone please respond to
> > > my simple question: "Do I bugzilla CentOS, RH or ignore it?"

Lack of response to this bothered me. Makes me wonder if I'm where I
need to be.

> > > 
> > > <snip>
> > > 
> > 
> > What im doing (up to anybody tells me other thing) is to post the bug at
> > Centos Bug Tracker, this is what we are using, this is where i think we
> > have to toss the bugs, well, my way only of interpreting things...
> 
> Thanks. At least someone answered.

> 
> > 
> > Have good luck Bill

I did. You brought it by being willing to take the time.

> 
> Thanks. And for taking the time too.
> <snip sig stuff>

--
Bill