[CentOS] Re: system gets suspended automatically!

Sat Feb 9 08:46:59 UTC 2008
Chandra <shekharc.2004 at gmail.com>

========================================================================
>  > Memory for crash kernel (0x0 to 0x0) notwithin permissible range
>  > ..MP-BIOS bug: 8254 timer not connected to IO-APIC
>  > Red Hat nash version 5.1.19.6 starting
>  >     Welcome to CentOS release 5 (Final)
>  > ....
>  > .....
>  > and continues normal booting.

2008/2/6 Ross S. W. Walker <rwalker at medallion.com>:
> I don't think that is the "harmless" error message mentioned in the release
> notes as that had to do with the "crash kernel".
>
>  I saw this same error on a Dell AMD system. It seems the motherboard in
> that system didn't do ACPI IRQ routing as the kernel expected and
> experienced a lot of random problems until "acpi=noirq" was passed as a
> kernel option to disable ACPI IRQ routing defaulting back to the APIC IRQ
> routing. If that still gives you problems then you may need to use
> "irq=poll" which forces the kernel to poll for IRQ changes.

First of all, I am sorry for the late reply; I was very busy.

Well, "acpi=noirq" didn't work, but after using the "irq=poll" option
the message "MP-BIOS bug: 8254 timer not connected to IO-APIC" stopped
appearing. However, the message "Memory for crash kernel (0x0 to 0x0)
notwithin permissible range" still appears. I have started my
computation program after booting the OS with the "irq=poll" option. I
will report later whether it really worked and whether the system still
freezes after running the program for a long time.

This is the kernel line from grub.conf:

    kernel /boot/vmlinuz-2.6.18-53.el5PAE ro root=LABEL=/12 irq=poll early-login quiet

Also, the "acpid" daemon is not running.
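
To confirm that an option from grub.conf really reached the running
kernel, the contents of /proc/cmdline can be checked word by word. A
minimal sketch, assuming a POSIX shell; the helper name has_kernel_opt
is hypothetical and not part of any package:

```shell
# Hypothetical helper: succeeds if the option given as the first argument
# appears as a whole word in the kernel command line given as the second.
has_kernel_opt() {
    opt=$1
    cmdline=$2
    for word in $cmdline; do
        [ "$word" = "$opt" ] && return 0
    done
    return 1
}

# On the running system (not executed here):
#   has_kernel_opt "irq=poll" "$(cat /proc/cmdline)" && echo "option present"
```

Matching whole words avoids a false hit when one option is a substring
of another.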
============================================================================


============================================================================
2008/2/6 Tru Huynh <tru at centos.org>:
> Looks like some hardware crash to me, otherwise you would have
> some logs for oops/hangs.
>
> Can you make available somewhere your /var/log/messages (don't
> send a few MB file to the list)
> and the /proc/cmdline content ?
>
> You said you used "acpi=off" and acpid disabled is it still the case?
>
> ~> chkconfig --list cpuspeed
> cpuspeed        0:off   1:on    2:off   3:off   4:off   5:off   6:off

As for the kernel log messages and the content of "/proc/cmdline", I
will certainly make these available if the aforementioned "irq=poll"
option also fails. And yes, until last time the "acpi=off" and "noapic"
options were passed to the kernel and acpid was kept stopped. The
output of "chkconfig --list cpuspeed" is:

cpuspeed        0:off   1:on    2:on    3:on    4:on    5:on    6:off

However, the "service cpuspeed start" and "service cpuspeed stop"
commands don't print any message. Also, the GUI for controlling
services (system-config-services) shows that cpuspeed is stopped. So I
guess cpuspeed has no effect. Anyway, I will report the details a
little later, after I finish checking the "irq=poll" option.
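
Note that "chkconfig --list" only reports whether a service is
configured to start in each runlevel, not whether it is running right
now. A minimal sketch for reading one line of that output, assuming a
POSIX shell; the helper name enabled_in_runlevel is hypothetical:

```shell
# Hypothetical helper: given one line of "chkconfig --list" output and a
# runlevel digit, succeed if the service is set to "on" for that runlevel.
enabled_in_runlevel() {
    line=$1
    level=$2
    for field in $line; do
        [ "$field" = "$level:on" ] && return 0
    done
    return 1
}

# On the running system (not executed here):
#   enabled_in_runlevel "$(chkconfig --list cpuspeed)" 5 && echo "on in runlevel 5"
```

Whether the service is actually running has to be checked separately,
for example with the init script's status action if it provides one.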
=================================================================

In the meantime, I also verified that it is NOT a hardware problem. I
installed FC5 on one of the other partitions and ran a SERIAL version
of the same program (i.e. no OpenMP, gcc without the -fopenmp flag) and
it didn't freeze at all. Well, I had to pass the "noapic" option during
this installation, and it didn't recognize my network card ;). When I
ran the PARALLEL version of the program (gcc with the -fopenmp option),
it ran for a few hours and stopped with an error message something like
"libopenmp: not sufficient memory...allocating 60 bytes". However, the
system didn't hang or reboot. So I believe it has something to do with
OpenMP, not the hardware.
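
Since the parallel run ended with an out-of-memory message, one way to
see whether the program's memory use grows over time is to log its
resident set size from /proc while it runs. A minimal sketch, assuming
a POSIX shell and awk; the helper name rss_kb and the PID variable are
placeholders, not anything from the program itself:

```shell
# Hypothetical helper: read the text of a /proc/<pid>/status file on
# standard input and print the VmRSS value (in kB).
rss_kb() {
    awk '$1 == "VmRSS:" { print $2 }'
}

# Periodic logging on the running system (not executed here; PID is a
# placeholder for the process ID of the computation program):
#   while kill -0 "$PID" 2>/dev/null; do
#       echo "$(date +%s) $(rss_kb < /proc/$PID/status)"
#       sleep 60
#   done >> rss.log
```

A steadily increasing value in the log would point to memory exhaustion
in the program rather than a system-level fault.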

Anyway, thank you for all your replies. I will keep posting updates here.

- Chandra