I don't think that is the "harmless" error message mentioned in the release notes as that had to do with the "crash kernel".
I saw this same error on a Dell AMD system. It seems the motherboard in that system didn't do ACPI IRQ routing as the kernel expected, and the system experienced a lot of random problems until "acpi=noirq" was passed as a kernel option to disable ACPI IRQ routing, defaulting back to the APIC IRQ routing. If that still gives you problems then you may need to use "irq=poll", which forces the kernel to poll for IRQ changes.
-Ross
----- Original Message -----
From: centos-bounces@centos.org
To: CentOS mailing list centos@centos.org
Sent: Wed Feb 06 08:03:47 2008
Subject: Re: [CentOS] Re: system gets suspended automatically!
On Wed, 2008-02-06 at 21:48 +0900, Chandra wrote:
===========================================================
AN ERROR IS SHOWING UP AT BOOT TIME. It seems to be a BUG:
===========================================================
Memory for crash kernel (0x0 to 0x0) notwithin permissible range
..MP-BIOS bug: 8254 timer not connected to IO-APIC
Red Hat nash version 5.1.19.6 starting
Welcome to CentOS release 5 (Final)
....
.....
and continues normal booting.
Any idea how to deal with it? Please note that it has 4 CPUs.
Thanks a lot,
- Chandra
Check the Release Notes. It is apparently harmless. I see it on all my CentOS 5.1 machines.
B.J.
Ubuntu 7.10, Linux 2.6.22-14-generic unknown 08:02:44 up 21:42, 2 users, load average: 0.15, 0.22, 0.16
2008/2/6 Ross S. W. Walker rwalker@medallion.com:
I don't think that is the "harmless" error message mentioned in the release notes as that had to do with the "crash kernel".
I saw this same error on a Dell AMD system. It seems the motherboard in that system didn't do ACPI IRQ routing as the kernel expected and experienced a lot of random problems until "acpi=noirq" was passed as a kernel option to disable ACPI IRQ routing defaulting back to the APIC IRQ routing. If that still gives you problems then you may need to use "irq=poll" which forces the kernel to poll for IRQ changes.
First of all, I am sorry for my late reply. I was very busy.
Well, "acpi=noirq" didn't work, but after using the "irq=poll" option the message "MP-BIOS bug: 8254 timer not connected to IO-APIC" stopped appearing. However, the message "Memory for crash kernel (0x0 to 0x0) notwithin permissible range" is still appearing. I have started my computation program after booting the OS with the "irq=poll" option. I will report later whether it really worked and the system no longer freezes after running the program for a long time.
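To confirm after a reboot which of the two messages is still present, the kernel log can be filtered for them. This is only a minimal sketch: the function name boot_warnings is made up for illustration, and it assumes the boot messages are still available from dmesg.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): read kernel log
# text on stdin and print only the two boot messages discussed here.
boot_warnings() {
    grep -E 'MP-BIOS bug|Memory for crash kernel'
}

# Usage on a live system (assumes dmesg still holds the boot messages):
#   dmesg | boot_warnings
```

If nothing is printed, neither message was found in the text that was fed in.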
This is the grub.conf entry:
kernel /boot/vmlinuz-2.6.18-53.el5PAE ro root=LABEL=/12 irq=poll early-login quiet
Also, the daemon "acpid" is not running.
============================================================================
2008/2/6 Tru Huynh tru@centos.org:
Looks like some hardware crash to me, otherwise you would have some logs for oops/hangs.
Can you make available somewhere your /var/log/messages (don't send a few MB file to the list) and the /proc/cmdline content ?
You said you used "acpi=off" and had acpid disabled. Is that still the case?
~> chkconfig --list cpuspeed
cpuspeed 0:off 1:on 2:off 3:off 4:off 5:off 6:off
As far as the kernel log messages and the content of "/proc/cmdline" are concerned, I will certainly make these available if the aforementioned "irq=poll" option also fails. And yes, until last time, the "acpi=off, noapic" options were passed to the kernel and acpid was kept stopped. The output of "chkconfig --list cpuspeed" is "cpuspeed 0:off 1:on 2:on 3:on 4:on 5:on 6:off". However, the "service cpuspeed start" and "service cpuspeed stop" commands don't show any message. Also, the GUI to control the services (system-config-services) shows that cpuspeed is stopped. So I guess cpuspeed has no effect. But anyway, I will report the details a little later, after I finish checking the "irq=poll" option.
=================================================================
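Since the two chkconfig outputs above differ only in which runlevels are "on", a small filter makes them easier to compare. This is a rough sketch; the function name runlevels_on is invented for the example, and it only reads the "name N:on/off ..." line format shown in this thread.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): given one line of
# "chkconfig --list <service>" output on stdin, print the runlevels in
# which the service is configured "on", separated by spaces.
runlevels_on() {
    awk '{
        out = ""
        for (i = 2; i <= NF; i++) {
            split($i, kv, ":")              # kv[1] = runlevel, kv[2] = on/off
            if (kv[2] == "on")
                out = out (out == "" ? "" : " ") kv[1]
        }
        print out
    }'
}

# Usage on a live system:
#   chkconfig --list cpuspeed | runlevels_on
```

Note that chkconfig only reports whether a service is configured to start in a runlevel, not whether it is running right now, which may be why the GUI can still show cpuspeed as stopped.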
In the meantime, I also verified that it is NOT a hardware problem. I installed FC5 in one of the other partitions and ran a SERIAL version of the same program (i.e. no OpenMP, gcc without the -fopenmp flag) and it didn't freeze at all. Well, I had to pass the "noapic" option during this installation and it didn't recognize my network card ;). When I ran the PARALLEL version of the program (gcc with the -fopenmp option), it ran for a few hours and stopped with an error message something like "libopenmp: not sufficient memory...allocating 60 bytes". However, the system didn't hang or reboot. So I believe it has something to do with OpenMP, not the hardware.
Anyway, thank you for all your replies. I will keep posting the updates here.
- Chandra
2008/2/6 Ross S. W. Walker rwalker@medallion.com:
I don't think that is the "harmless" error message mentioned in the release notes as that had to do with the "crash kernel".
I saw this same error on a Dell AMD system. It seems the motherboard in that system didn't do ACPI IRQ routing as the kernel expected and experienced a lot of random problems until "acpi=noirq" was passed as a kernel option to disable ACPI IRQ routing defaulting back to the APIC IRQ routing. If that still gives you problems then you may need to use "irq=poll" which forces the kernel to poll for IRQ changes.
-Ross
Thanks a lot for the tip. This seems to have worked. My system has been running continuously for the last 45 hours without any hang. This is the miracle grub.conf entry:
kernel /boot/vmlinuz-2.6.18-53.el5PAE ro root=LABEL=/12 irq=poll acpi=off noapic nolapic early-login quiet
with the acpid daemon off. It is working well with the OpenMP parallelization.
When I tried without the "noapic nolapic" options in grub.conf, the system worked with serial code but hung when OpenMP was used for parallelization.
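To double-check which of these options the running kernel actually received, the content of /proc/cmdline can be compared against the intended list. A minimal sketch, assuming the options are separated by single spaces as in the grub.conf entry above; the function name has_boot_opt is invented for the example.

```shell
#!/bin/sh
# Hypothetical helper (name invented for this example): succeed if the
# option given as $2 appears as a whole word in the kernel command line
# string given as $1.
has_boot_opt() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Usage on a live system:
#   cmdline=$(cat /proc/cmdline)
#   for opt in irq=poll acpi=off noapic nolapic; do
#       if has_boot_opt "$cmdline" "$opt"; then
#           echo "$opt: present"
#       else
#           echo "$opt: missing"
#       fi
#   done
```

The check uses the "irq=poll" spelling from this thread. The kernel's own parameter documentation lists the polling option as "irqpoll", so it may be worth trying that spelling as well.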
Anyway, thanks a lot for all you guys' responses.
Well, I don't have much idea but when the kernel detects multiple cpus, the "irq=poll" entry should be added by default. It may be useful in solving a lot of such problems (well, just a thought) (-__^)
-Chandra
on 2/12/2008 9:13 PM Chandra spake the following:
Well, I don't have much idea but when the kernel detects multiple cpus, the "irq=poll" entry should be added by default. It may be useful in solving a lot of such problems (well, just a thought) (-__^)
-Chandra
As has been stated many times on this list, unless the upstream creator decides to make that change, it won't get done here either. CentOS is meant to be as close to the Red Hat offering as you can get without a support contract.
But I'm sure they would appreciate a few $$$ here and there! ;-P
Chandra wrote:
Well, I don't have much idea but when the kernel detects multiple cpus, the "irq=poll" entry should be added by default. It may be useful in solving a lot of such problems (well, just a thought) (-__^)
I'm not familiar with the actual impact of that option, but I'm guessing it could be a real hit on IO performance in systems doing lots of network and disk operations?
Most of my CentOS systems are multiprocessor and I've not had to set that flag on any of them. They are mostly used for network middleware and database development and often run heavily saturated with IO workloads.