How do we handle panics? Bug it here or RH or ignore it (been around *long* time apparently)? - Discuss


William L. Maltby

2 Sep 2006

6:31 p.m.

"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update. Two before I turned on DRI for my radeon (why not? Couldn't be any less "stable"). Did my google work and became aware it is a long-standing unfixed problem (at least through kernel 2.6.15? IIRC).
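Since the OOPS reports end up in the syslog, tallying them can be scripted instead of done by hand. The Python sketch below is a hypothetical helper, not something anyone in the thread posted. It assumes classic syslog lines (typically /var/log/messages on CentOS) that begin with a timestamp such as "Sep  2 14:31:05", and it only matches the paging-request headline quoted above.

```python
import re

# Headline of the oops discussed in this thread; other oops types use different text.
OOPS_HEADLINE = "Unable to handle kernel paging request"

# Classic syslog timestamp prefix, e.g. "Sep  2 14:31:05" (assumed log format).
SYSLOG_STAMP = re.compile(r"^([A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2})")


def oops_timestamps(log_text: str) -> list:
    """Return the syslog timestamp of every line carrying the paging-request oops."""
    stamps = []
    for line in log_text.splitlines():
        if OOPS_HEADLINE in line:
            match = SYSLOG_STAMP.match(line)
            # Keep the hit even when the line has no recognizable timestamp.
            stamps.append(match.group(1) if match else None)
    return stamps


def count_paging_oopses(log_text: str) -> int:
    """Count how many times the paging-request oops was logged."""
    return len(oops_timestamps(log_text))
```

Reading the log as root and passing its contents to count_paging_oopses gives a running tally of the panics; oops_timestamps shows when each one happened, which helps when correlating them with idle periods or power events.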

I've started reading what I can find on crash to see if I can get the whole thing. Do we have a kernel with debug symbols and associated?

I don't know if I'll pursue this. Disabled swap and that seems to be working. I'm fortunate I can do this. Others may not be so lucky. I'd like to turn swap back on, but I'll probably have to disable all that read-ahead stuff again, find the thread on the swappiness switch, and try to find the list of kernel parameters *again* (I know they're in here somewhere)...

Enough griping for today. Do I bugzilla CentOS, RH or ignore it?

TIA, -- Bill


William L. Maltby

4 Sep

2:29 p.m.

New subject: How do we handle panics? Bug it here or RH or ignore it (been around *long* time apparently)?

On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...

"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update.

s/6/10/ # Now

...

<snip>

...

I've started reading what I can find on crash to see if I can get the whole thing. Do we have a kernel with debug symbols and associated?

Haven't stayed running long enough to pursue this (did take time for real-life stuff; it seems more manageable).

...

I don't know if I'll pursue this. Disabled swap and that seems to be working. I'm fortunate I can do this. Others may not be so lucky. I'd like to turn swap back on, but I'll probably have to disable all that read-ahead stuff again, find the thread on the swappiness switch, and try to find the list of kernel parameters *again* (I know they're in here somewhere)...

Have tried several combinations without success: swapoff, 0 in /proc/sys/vm/swappiness, those two without each other, disabled the readahead and readahead_early stuff, running in run level 3 only; nothing's worked to keep it up. Need Robo-Viagra here.

Am currently running with swappiness at 10, swap enabled, and both readaheads disabled. This is all the stuff that was gleaned from the prior CentOS list discussions.
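For anyone repeating these experiments, swappiness is just a number in /proc/sys/vm/swappiness. A minimal Python sketch of reading and changing it follows. The function names are hypothetical, the path argument exists only so the code can be tried on an ordinary file, and writing the real procfs entry needs root. It assumes the 0 to 100 range that 2.6 kernels accept.

```python
from pathlib import Path

# Standard procfs location of the VM swappiness tunable.
SWAPPINESS = Path("/proc/sys/vm/swappiness")


def read_swappiness(path: Path = SWAPPINESS) -> int:
    """Return the current swappiness value."""
    return int(path.read_text().strip())


def set_swappiness(value: int, path: Path = SWAPPINESS) -> None:
    """Write a new swappiness value; the live procfs file requires root."""
    # 2.6-era kernels accept 0 (least inclined to swap) to 100 (most inclined).
    if not 0 <= value <= 100:
        raise ValueError(f"swappiness must be 0..100, got {value}")
    path.write_text(f"{value}\n")
```

The shell equivalent is sysctl -w vm.swappiness=10, and a line "vm.swappiness = 10" in /etc/sysctl.conf makes the setting survive a reboot.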

Anyone got any other things I might try? I know a fix is not yet available (if my googling was extensive enough and I missed nothing), but I would like a work-around that might keep it up more than 24 hours about 90% of the time.

One new piece of info: a lot of the OOPS, but not all, have started after the machine was idle and I touch the keyboard to bring the things back to life (AFAIK, just a screen-saver, blanked, going). So I turned off the BIOS ACPI stuff. Since I know zilch about the ACPI stuff, I've begun reading /usr/share/doc stuff (kernel params and pm) to see what I might disable in there. Maybe that will help some.

But if anyone has a couple of suggestions regarding that, I might get the docs read faster (fewer boots). I would appreciate it.

...

Enough griping for today. Do I bugzilla CentOS, RH or ignore it?

Since I'm somewhat new here, and this is a recognized problem in the community (if my googling is correct), I am still uncertain how I should deal with this, other than the workaround. Can someone please respond to my simple question: "Do I bugzilla CentOS, RH or ignore it?"

<snip>

-- Bill

J.J. Garcia

4:57 p.m.

New subject: How do we handle panics? Bug it here or RH or ignore it (been around *long* time apparently)?

On Mon, 04-09-2006 at 10:29 -0400, William L. Maltby wrote:

...

On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...
"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update.

s/6/10/ # Now

...
<snip>

...
I've started reading what I can find on crash to see if I can get the whole thing. Do we have a kernel with debug symbols and associated?

Haven't stayed running long enough to pursue this (did take time for real-life stuff; it seems more manageable).

...
I don't know if I'll pursue this. Disabled swap and that seems to be working. I'm fortunate I can do this. Others may not be so lucky. I'd like to turn swap back on, but I'll probably have to disable all that read-ahead stuff again, find the thread on the swappiness switch, and try to find the list of kernel parameters *again* (I know they're in here somewhere)...

Have tried several combinations without success: swapoff, 0 in /proc/sys/vm/swappiness, those two without each other, disabled the readahead and readahead_early stuff, running in run level 3 only; nothing's worked to keep it up. Need Robo-Viagra here.

Am currently running with swappiness at 10, swap enabled, and both readaheads disabled. This is all the stuff that was gleaned from the prior CentOS list discussions.

Anyone got any other things I might try? I know a fix is not yet available (if my googling was extensive enough and I missed nothing), but I would like a work-around that might keep it up more than 24 hours about 90% of the time.

One new piece of info: a lot of the OOPS, but not all, have started after the machine was idle and I touch the keyboard to bring the things back to life (AFAIK, just a screen-saver, blanked, going). So I turned off the BIOS ACPI stuff. Since I know zilch about the ACPI stuff, I've begun reading /usr/share/doc stuff (kernel params and pm) to see what I might disable in there. Maybe that will help some.

But if anyone has a couple of suggestions regarding that, I might get the docs read faster (fewer boots). I would appreciate it.

...
Enough griping for today. Do I bugzilla CentOS, RH or ignore it?

Bill,

Maybe it is beside the point, but have you considered running 'badblocks' (non-destructive, or destructive after dd'ing to a backup), even on the swap space?

As you said, it started after the 4.4 update; new disk space was used to store the new updates, maybe...

Surely I'm wrong... but BTW... sometimes when I have disk issues I also check temperatures, looking for transient failures due to the mobo/processor...
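For reference, the badblocks suggestion maps onto two modes of the badblocks tool: -n for a non-destructive read-write test and -w for the destructive write test (only after the data has been dd'd to a backup). The Python sketch below just builds and runs that command line. The helper names and the device path are hypothetical, it needs root, and the partition must be unmounted (or, for swap, taken out of use with swapoff) first.

```python
import subprocess


def badblocks_command(device: str, destructive: bool = False) -> list:
    """Build a badblocks command line for one block device."""
    # -w: destructive write-mode test (erases data); -n: non-destructive read-write test.
    mode = "-w" if destructive else "-n"
    # -s shows progress, -v turns on verbose output.
    return ["badblocks", "-s", "-v", mode, device]


def run_badblocks(device: str, destructive: bool = False) -> int:
    """Run the test (as root, on an unmounted device) and return badblocks' exit status."""
    completed = subprocess.run(badblocks_command(device, destructive))
    return completed.returncode
```

For a hypothetical swap partition /dev/hdb2, badblocks_command("/dev/hdb2") yields ['badblocks', '-s', '-v', '-n', '/dev/hdb2']. After a destructive -w pass on a swap partition, the swap signature would have to be recreated with mkswap before swapon.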

...

Since I'm somewhat new here, and this is a recognized problem in the community (if my googling is correct), I am still uncertain how I should deal with this, other than the workaround. Can someone please respond to my simple question: "Do I bugzilla CentOS, RH or ignore it?"

<snip>

What I'm doing (until anybody tells me otherwise) is posting bugs to the CentOS Bug Tracker; this is what we are using, and this is where I think we have to toss the bugs. Well, that's only my way of interpreting things...

Have good luck, Bill

...

-- Bill

CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos

William L. Maltby

5:59 p.m.

New subject: How do we handle panics? Bug it here or RH or ignore it (been around *long* time apparently)?

On Mon, 2006-09-04 at 18:57 +0200, J.J. Garcia wrote:

...

On Mon, 04-09-2006 at 10:29 -0400, William L. Maltby wrote:

...
On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...
"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update.

s/6/10/ # Now

s/10/11/ # Now, and my first double

...

...
...
<snip><snip>

...

...
Have tried several combinations without success: swapoff, 0 in /proc/sys/vm/swappiness, those two without each other, disabled the readahead and readahead_early stuff, running in run level 3 only; nothing's worked to keep it up. Need Robo-Viagra here.

Am currently running with swappiness at 10, swap enabled, and both readaheads disabled. This is all the stuff that was gleaned from the prior CentOS list discussions.

This last configuration looked about the same, but for the double panic now.

...

...
Anyone got any other things I might try? I know a fix is not yet

<snip>

...

...
One new piece of info: a lot of the OOPS, but not all, have started after the machine was idle and I touch the keyboard to bring the things back to life (AFAIK, just a screen-saver, blanked, going). So I turned off the BIOS ACPI stuff. Since I know zilch about the ACPI stuff, I've begun reading /usr/share/doc stuff (kernel params and pm) to see what I might disable in there. Maybe that will help some.

But if anyone has a couple of suggestions regarding that, I might get the docs read faster (fewer boots). I would appreciate it.

...
Enough griping for today. Do I bugzilla CentOS, RH or ignore it?

Bill,

Maybe it is beside the point, but have you considered running 'badblocks' (non-destructive, or destructive after dd'ing to a backup), even on the swap space?

OUTSTANDING! I had not even considered that. I was wrapped up in my own loop: prev mobo bad, took a long time to ID that (was being led to believe it was mem, after being led to believe it started with SeaMonkey, after... you can see that by now I am tightly focused).

Anyway, both are relatively new 100GB commodity HDs. Had not thought of that. They are S.M.A.R.T. capable. I'll do what you suggest and look at the smartmon output.
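The quick S.M.A.R.T. check is smartctl -H from smartmontools. A small Python sketch for pulling the verdict out of that output follows. It assumes an ATA drive, whose report contains a line beginning "SMART overall-health self-assessment test result:" (SCSI drives word it differently); the helper names and the device path are hypothetical.

```python
import subprocess

# Health line printed by smartctl -H for ATA drives.
HEALTH_PREFIX = "SMART overall-health self-assessment test result:"


def parse_smart_health(output: str):
    """Return the verdict (e.g. 'PASSED'), or None if no ATA health line is present."""
    for line in output.splitlines():
        if line.startswith(HEALTH_PREFIX):
            return line[len(HEALTH_PREFIX):].strip()
    return None


def smart_health(device: str):
    """Run smartctl -H (needs root) and parse its verdict."""
    # smartctl's exit status is a bit mask of warnings, so it is not treated as an error here.
    result = subprocess.run(["smartctl", "-H", device], capture_output=True, text=True)
    return parse_smart_health(result.stdout)
```

Calling smart_health on each disk (for example a hypothetical /dev/hda and /dev/hdb) gives a one-word answer per drive; anything other than PASSED, or None, is worth a closer look with the full smartctl -a report.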

...

As you said, it started after the 4.4 update; new disk space was used to store the new updates, maybe...

It's even more than that. It is a new install of 4.3 on a unit built with a new mobo to replace the one that I RMA'd. Memtest86 makes it seem that memory is still OK. CPU temps are good, etc. So a lot of these disk surfaces have not been used before. Since I have no history, I'll check it.

But I'm still betting on the findings from my googling. Through 2.6.15 (IIRC) the OOPS has been identified and left unsolved. Even saw one entry where Linus was involved, discussing some of the options.

...

Surely I'm wrong... but BTW... sometimes when I have disk issues I also check temperatures, looking for transient failures due to the mobo/processor...

I took a look at the BIOS-displayed temps at the last reboot (no cool-down time, I rebooted right away). Well below temps of concern. I need to install gkrellm for the graphical monitoring. I need to look and see if there's a text version too.

...

...
Since I'm somewhat new here, and this is a recognized problem in the community (if my googling is correct), I am still uncertain how I should deal with this, other than the workaround. Can someone please respond to my simple question: "Do I bugzilla CentOS, RH or ignore it?"

<snip>

What I'm doing (until anybody tells me otherwise) is posting bugs to the CentOS Bug Tracker; this is what we are using, and this is where I think we have to toss the bugs. Well, that's only my way of interpreting things...

Thanks. At least someone answered.

...

Have good luck, Bill

Thanks. And for taking the time too.

...

<snip sig stuff>

-- Bill

William L. Maltby

6:18 p.m.

New subject: How do we handle panics? Bug it here or RH or ignore it (been around *long* time apparently)?

On Mon, 2006-09-04 at 13:59 -0400, William L. Maltby wrote:

...

On Mon, 2006-09-04 at 18:57 +0200, J.J. Garcia wrote:

...
On Mon, 04-09-2006 at 10:29 -0400, William L. Maltby wrote:

...
On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...
<snip>

...

I took a look at the BIOS-displayed temps at the last reboot (no cool-down time, I rebooted right away). Well below temps of concern. I need to install gkrellm for the graphical monitoring. I need to look and see if there's a text version too.

FYI for the list in general

$ cat /proc/acpi/thermal_zone/THRM/temperature
temperature: 39 C

There are a couple of interesting things under /proc/acpi (discovered in my reading prior to the last OOPS). I remembered right after hitting "send", of course.
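Following up on the /proc/acpi tip, the same reading can be collected for every ACPI thermal zone at once. This Python sketch uses hypothetical helper names and assumes the 2.6-era /proc/acpi/thermal_zone/ZONE/temperature files that print a line like the one above; newer kernels dropped that interface in favor of /sys/class/thermal.

```python
import glob
import os
import re

# Matches the procfs format shown above, e.g. "temperature: 39 C".
_TEMP_LINE = re.compile(r"temperature:\s*(-?\d+)\s*C")


def parse_acpi_temperature(text: str):
    """Return the temperature in degrees Celsius, or None if the text is not recognized."""
    match = _TEMP_LINE.search(text)
    return int(match.group(1)) if match else None


def read_thermal_zones(pattern: str = "/proc/acpi/thermal_zone/*/temperature") -> dict:
    """Map each thermal zone name (e.g. THRM) to its current reading."""
    readings = {}
    for path in sorted(glob.glob(pattern)):
        zone = os.path.basename(os.path.dirname(path))
        with open(path) as handle:
            readings[zone] = parse_acpi_temperature(handle.read())
    return readings
```

On the machine in this thread, read_thermal_zones() would be expected to return a single THRM entry; run in a loop from a cron job or a text console, it is a crude stand-in for a graphical monitor.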

William L. Maltby

5 Sep

9:31 p.m.

New subject: How do we handle panics? Bug it here [ SOLVED? Thx J.J. Garcia]

On Mon, 2006-09-04 at 13:59 -0400, William L. Maltby wrote:

...

On Mon, 2006-09-04 at 18:57 +0200, J.J. Garcia wrote:

...
On Mon, 04-09-2006 at 10:29 -0400, William L. Maltby wrote:

...
On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...
"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update.

s/6/10/ # Now

s/10/11/ # Now, and my first double

Had one more after that.

...

<snip>

...

...
Maybe it is beside the point, but have you considered running 'badblocks' (non-destructive, or destructive after dd'ing to a backup), even on the swap space?

That was not the solution, but read on! Stop at this next paragraph if you have no interest in the details.

Apparently, country living has its downside: unreliable power from the electric utility company caused the problems I've been experiencing for a couple of months... maybe most, but not all? With the BBS in place again, reliability seems to be restored.

Moral of the story: ATX systems may be more sensitive to power deviations than AT systems? Or... Well you know the other possibilities related to power.

...

OUTSTANDING! I had not even considered that. I was wrapped up in my own loop: prev mobo bad, took a long time to ID that (was being led to believe it was mem, after being led to believe it started with SeaMonkey, after... you can see that by now I am tightly focused).

And here is the value of someone taking the time to reply when no one else did. I was, indeed, trapped in a non-productive train of thought. J.J. raised a Q that I had not considered. Being a pretty good associative processor, I immediately extended that to "what else have I discarded or not considered that might be in play?"

He offered what was needed to get me moving. Just plain old raising a question, causing me to start thinking outside the box I had fallen into.

Thanks be to J.J.

My eyes immediately swiveled left to the BBS containing the 2 new batteries undergoing their initial 24 hour charge before use. Something in J.J.'s question had made me recall that all my troubles began about the same time the BBS died. And as symptoms accumulated, I handled them in isolation, for some unknown reason. Uncharacteristic of me. Several other things happened in the same time-frame, clouding the situation for me.

To keep it short:
- a couple of months back the BBS died; about the same time some CentOS updates had occurred; had random lock-ups & X applications dying unexpectedly,
- after some time, I figured it may be hardware; memtest86 showed memory errors, but inconsistently, over a few days,
- later determined the mobo was at fault, RMA'd it,
- xfer'd the load to the K6-III, it chugs along NP, no BBS,
- later the summer heat comes, Duke Energy announces brownouts are possible,
- the K6-III still runs w/o a hiccup, w/o a BBS,
- later built the new machine; it runs at 4.3 for a day or two and does OK with only a couple of freezes; I'm thinking it's anomalies introduced as updates were applied while the mobo was failing, and the 4.4 update may cure it,
- 4.4 update done and OOPS/panic about a dozen times in 5 days,
- I dutifully google and find posts indicating this is a known and unsolved problem through release 2.6.15 at least,
- BBS back in service, and normal activities produce no OOPS, no panic.

Now, how does it become believable to me that I'm the only one affected on this list when googling indicates a well-known problem? Lack of participation and lack of knowledge. I know there's lots of admin types here. I guess there may be some like me at home just running a desktop workstation with a little private LAN and old equipment. But never sure. And do they watch the lists as I do? Do they bother to reply? No way to know.

For this OP and some others, when I ask "anyone else seeing this?", I get no reply. That's expected if no one else is seeing it. So I figure that there's very few with my simple and older setup. Ergo, they may not be affected. In my experience, which is quite long, I have often exposed bugs that others do not see. IBM compiler types used to hate me (a *long* time ago).

That is how it becomes believable that I'm "The One".

...

<snip>

> > Since I'm somewhat new here, and this is a recognized problem in the community (if my googling is correct), I am still uncertain how I should deal with this, other than the workaround. Can someone please respond to my simple question: "Do I bugzilla CentOS, RH or ignore it?"

Lack of response to this bothered me. Makes me wonder if I'm where I need to be.

...

...
...
<snip>

What I'm doing (until anybody tells me otherwise) is posting bugs to the CentOS Bug Tracker; this is what we are using, and this is where I think we have to toss the bugs. Well, that's only my way of interpreting things...

Thanks. At least someone answered.

...

...
Have good luck, Bill

I did. You brought it by being willing to take the time.

...

Thanks. And for taking the time too.

<snip sig stuff>

-- Bill

J.J. Garcia

8 Sep

7:43 a.m.

New subject: How do we handle panics? Bug it here [ SOLVED? Thx J.J. Garcia]

On Tue, 05-09-2006 at 17:31 -0400, William L. Maltby wrote:

...

On Mon, 2006-09-04 at 13:59 -0400, William L. Maltby wrote:

...
On Mon, 2006-09-04 at 18:57 +0200, J.J. Garcia wrote:

...
On Mon, 04-09-2006 at 10:29 -0400, William L. Maltby wrote:

...
On Sat, 2006-09-02 at 14:31 -0400, William L. Maltby wrote:

...
"Unable to handle kernel paging request". I've saved the OOPS data from the logs for 6 panics since the 4.4 update.

s/6/10/ # Now

s/10/11/ # Now, and my first double

Had one more after that.

...
<snip>

...
...
Maybe it is beside the point, but have you considered running 'badblocks' (non-destructive, or destructive after dd'ing to a backup), even on the swap space?

That was not the solution, but read on! Stop at this next paragraph if you have no interest in the details.

Apparently, country living has its downside: unreliable power from the electric utility company caused the problems I've been experiencing for a couple of months... maybe most, but not all? With the BBS in place again, reliability seems to be restored.

City living too! :) I'm living in a place called "Barrio de la Luz", which, if I can translate it correctly into English, means "Light Quarter/District" (not sure when this name was chosen). Anyway, ironically, I need 3 UPSes (3000VA) for almost 10 hosts due to intermittent blackouts from the local power company... :) ... as you see, even in the city, Bill.

...

Moral of the story: ATX systems may be more sensitive to power deviations than AT systems? Or... Well you know the other possibilities related to power.

...
OUTSTANDING! I had not even considered that. I was wrapped up in my own loop: prev mobo bad, took a long time to ID that (was being led to believe it was mem, after being led to believe it started with SeaMonkey, after... you can see that by now I am tightly focused).

And here is the value of someone taking the time to reply when no one else did. I was, indeed, trapped in a non-productive train of thought. J.J. raised a Q that I had not considered. Being a pretty good associative processor, I immediately extended that to "what else have I discarded or not considered that might be in play?"

He offered what was needed to get me moving. Just plain old raising a question, causing me to start thinking outside the box I had fallen into.

Thanks be to J.J.

You're welcome, but it's because you are tenacious and, more importantly, never surrender! :) The credit is yours, not mine.

...

My eyes immediately swiveled left to the BBS containing the 2 new batteries undergoing their initial 24 hour charge before use. Something in J.J.'s question had made me recall that all my troubles began about the same time the BBS died. And as symptoms accumulated, I handled them in isolation, for some unknown reason. Uncharacteristic of me. Several other things happened in the same time-frame, clouding the situation for me.

To keep it short:
- a couple of months back the BBS died; about the same time some CentOS updates had occurred; had random lock-ups & X applications dying unexpectedly,
- after some time, I figured it may be hardware; memtest86 showed memory errors, but inconsistently, over a few days,
- later determined the mobo was at fault, RMA'd it,
- xfer'd the load to the K6-III, it chugs along NP, no BBS,
- later the summer heat comes, Duke Energy announces brownouts are possible,
- the K6-III still runs w/o a hiccup, w/o a BBS,
- later built the new machine; it runs at 4.3 for a day or two and does OK with only a couple of freezes; I'm thinking it's anomalies introduced as updates were applied while the mobo was failing, and the 4.4 update may cure it,
- 4.4 update done and OOPS/panic about a dozen times in 5 days,
- I dutifully google and find posts indicating this is a known and unsolved problem through release 2.6.15 at least,
- BBS back in service, and normal activities produce no OOPS, no panic.

Now, how does it become believable to me that I'm the only one affected on this list when googling indicates a well-known problem? Lack of participation and lack of knowledge. I know there's lots of admin types here. I guess there may be some like me at home just running a desktop workstation with a little private LAN and old equipment. But never sure. And do they watch the lists as I do? Do they bother to reply? No way to know.

For this OP and some others, when I ask "anyone else seeing this?", I get no reply. That's expected if no one else is seeing it. So I figure that there's very few with my simple and older setup. Ergo, they may not be affected. In my experience, which is quite long, I have often exposed bugs that others do not see. IBM compiler types used to hate me (a *long* time ago).

That is how it becomes believable that I'm "The One".

If I understand you well... "You're not the one" asking for support/ideas on the ML; I know the feeling anyway...

What I think is that not all people take the time to write, ask, and follow hints/tests, but there are people who do. And as I can see, there are initiatives by the people working behind the curtains to improve the usage and maintenance of the system; sometimes I think they feel the same as you, don't you think?

Well, take only the positive things and try to do your best. That is what I try to do when I reach the technical level, and I know you do it all the time on the ML; that's why :)

...

...
<snip>

> > Since I'm somewhat new here, and this is a recognized problem in the community (if my googling is correct), I am still uncertain how I should deal with this, other than the workaround. Can someone please respond to my simple question: "Do I bugzilla CentOS, RH or ignore it?"

Lack of response to this bothered me. Makes me wonder if I'm where I need to be.

It depends on you, but please don't take me for a prophet! :) I remember a little proverb: "Better the bad world you know than an uncertain promised land." Anyway... keep your eyes open to reality... ;)

...

...
...
...
<snip>

What I'm doing (until anybody tells me otherwise) is posting bugs to the CentOS Bug Tracker; this is what we are using, and this is where I think we have to toss the bugs. Well, that's only my way of interpreting things...

Thanks. At least someone answered.

...
...
Have good luck, Bill

I did. You brought it by being willing to take the time.

Bill, keep on pushing the list; we all need your time, ideas, and patience, that's for sure!

And keep out smoking!

...

...
Thanks. And for taking the time too.

<snip sig stuff>

-- Bill


discuss@lists.centos.org

6 comments

2 participants


participants (2)

J.J. Garcia
William L. Maltby