Hi all,
One of our CentOS 5.3 servers randomly reboots at different times of the day, and I can't see why.
I have looked through the logs but don't see anything that shows why it rebooted. How can I debug this?
Here's a snippet from the log around the time of the reboot:
Jun 2 14:59:59 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 2 15:00:06 usaxen02 kernel: kjournald starting. Commit interval 5 seconds
Jun 2 15:00:06 usaxen02 kernel: EXT3 FS on dm-8, internal journal
Jun 2 15:00:06 usaxen02 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Jun 2 15:00:39 usaxen02 kernel: device vifvenu0 entered promiscuous mode
Jun 2 15:00:39 usaxen02 kernel: ADDRCONF(NETDEV_UP): vifvenu0: link is not ready
Jun 2 21:00:39 usaxen02 logger: /etc/xen/scripts/vif-bridge: iptables -A FORWARD -m physdev --physdev-in vifvenu0 -s 72.9.241.226 72.9.241.227 72.9.241.232 72.9.247.207 -j ACCEPT failed. If you are using iptables, this may affect networking for guest domains.
Jun 2 15:00:43 usaxen02 kernel: blkback: ring-ref 8, event-channel 6, protocol 1 (x86_64-abi)
Jun 2 15:00:43 usaxen02 kernel: blkback: ring-ref 9, event-channel 7, protocol 1 (x86_64-abi)
Jun 2 15:00:43 usaxen02 kernel: ADDRCONF(NETDEV_CHANGE): vifvenu0: link becomes ready
Jun 2 15:00:43 usaxen02 kernel: xenbr1: topology change detected, propagating
Jun 2 15:00:43 usaxen02 kernel: xenbr1: port 5(vifvenu0) entering forwarding state
Jun 2 17:30:22 usaxen02 syslogd 1.4.1: restart.
Jun 2 17:30:22 usaxen02 kernel: klogd 1.4.1, log source = /proc/kmsg started.
Jun 2 17:30:22 usaxen02 kernel: Bootdata ok (command line is ro root=/dev/VolGroup00/LogVol01 ide0=noprobe)
Jun 2 17:30:22 usaxen02 kernel: Linux version 2.6.18-128.1.10.el5xen (mockbuild@builder10.centos.org) (gcc version 4.1.2 20080704 (Red Hat 4.1.2-44)) #1 SMP Thu May 7 11:07:18 EDT 2009
Jun 2 17:30:22 usaxen02 kernel: BIOS-provided physical RAM map:
Jun 2 17:30:22 usaxen02 kernel: Xen: 0000000000000000 - 00000001de804000 (usable)
Jun 2 17:30:22 usaxen02 kernel: DMI 2.4 present.
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x01] lapic_id[0x00] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x03] lapic_id[0x02] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x02] lapic_id[0x01] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC (acpi_id[0x04] lapic_id[0x03] enabled)
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x01] dfl dfl lint[0x1])
Jun 2 17:30:22 usaxen02 kernel: ACPI: LAPIC_NMI (acpi_id[0x02] dfl dfl lint[0x1])
Jun 2 17:30:22 usaxen02 kernel: ACPI: IOAPIC (id[0x02] address[0xfec00000] gsi_base[0])
Jun 2 17:30:22 usaxen02 kernel: IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
Jun 2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
Jun 2 17:30:22 usaxen02 kernel: ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
Jun 2 17:30:22 usaxen02 kernel: Setting APIC routing to xen
Jun 2 17:30:22 usaxen02 kernel: Using ACPI (MADT) for SMP configuration information
Jun 2 17:30:22 usaxen02 kernel: Allocating PCI resources starting at d4000000 (gap: d0000000:2ff00000)
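The excerpt above has nothing between 15:00:43 and the 17:30:22 boot messages, which is what a hard reset tends to look like. To check every reboot the same way, here is a minimal shell sketch that prints the last few lines written before each syslogd restart. It assumes the stock sysklogd setup seen above, where each boot logs "syslogd <version>: restart."; the function name `before_reboots` and the `LINES_BEFORE` variable are hypothetical.

```shell
# Hypothetical helper: show what was logged just before each reboot.
# Assumes sysklogd logs "syslogd <version>: restart." at every boot, as in
# the excerpt above. Pass rotated logs (messages.1, ...) as extra arguments.
before_reboots() {
    n=${LINES_BEFORE:-5}                        # context lines per reboot
    [ "$#" -eq 0 ] && set -- /var/log/messages  # default log file
    # -B prints the N lines preceding each match; -h omits file names.
    grep -h -B "$n" 'syslogd [0-9.]*: restart' "$@"
}
```

For example, `before_reboots /var/log/messages /var/log/messages.1` shows the tail end of each boot period. If every group ends on routine messages like the ones above, that points away from a kernel panic and toward a reset the kernel never saw.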
on 6-2-2009 2:30 PM Rudi Ahlers spake the following:
<snip>
Random reboots can happen so fast that nothing reaches the logs. You can try setting up a serial console and having the system send its messages there; it sometimes catches things.
But until then I would do the obvious: make sure the system is clean and not overheating from "dust bunnies" filling up the chassis.
Remove and re-seat all cards and RAM. Make sure all fans are working. Run memtest overnight if possible. Look back to when the reboots started and see if something was added or upgraded.
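Some of these checks can also be started remotely over SSH. The sketch below is minimal and assumes the lm_sensors package is installed and recognizes the board's sensor chip, which is not a given on desktop boards or under a Xen dom0. The function names and the default log path /var/log/temps.log are hypothetical. `last -x` and `rpm -qa --last` are standard tools.

```shell
# Hypothetical helpers for remote checks.

# Append a timestamped sensors reading so there is a temperature trail
# after the next reboot (requires lm_sensors and a supported sensor chip).
log_temps() {
    out=${1:-/var/log/temps.log}
    { date; sensors; } >> "$out"
    sync    # flush now: a hard reset loses data still in the page cache
}

# Reboot and shutdown history from wtmp, newest first. A "reboot" entry
# without a "shutdown" entry just before it in time suggests an unclean reset.
reboot_history() {
    last -x reboot shutdown | head -n "${1:-20}"
}

# Installed packages sorted by install date, newest first, to see what
# was added or upgraded around the time the reboots started.
recent_packages() {
    rpm -qa --last | head -n "${1:-20}"
}
```

Running `log_temps` from cron every few minutes keeps the readings coming. Comparing the first unclean reboot in `reboot_history` with the dates from `recent_packages` covers the "what changed?" question.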
On 6/2/09, Scott Silva ssilva@sgvwater.com wrote:
Hi Scott, the server is in the USA and I'm in ZA. I've been trying to get the IDC to look into the problem, but they're not very helpful and reckon I need to check my software. I know the "server" runs on desktop hardware, so it could be a hardware problem, but they don't seem to think so.
So I'm trying to do everything I can from my side, via SSH, to see if I can figure it out.
On Tue, 02 Jun 2009 23:46:39 +0200 Rudi Ahlers wrote:
If it's a hardware-related issue, as Scott suggested, you can spend all the time you want fiddling around with the software and you'll never solve the problem.
Frank Cox wrote:
Yes, you'll almost certainly end up swapping it out anyway, either all at once or piecemeal (power supply, memory, motherboard, etc.). It's probably not worth the time to try to diagnose it. Working hardware should stay up for years.
on 6-2-2009 2:46 PM Rudi Ahlers spake the following:
Will the data center hang a serial port monitor on it for a while? Many of them will do it for free, or for a few dollars a day, and give you remote access to it. Is it your server, or a lease/rental?
On 6/3/09, Scott Silva ssilva@sgvwater.com wrote:
It's a rented server from a third party who feels that it's not their problem. It seems I need to get a new server from someone else.
on 6-2-2009 11:53 PM Rudi Ahlers spake the following:
That might be best, if only to get a decent provider. If they aren't willing to check it, they are a poor excuse for a service business. And the fact that the system isn't functioning properly should be enough to get you out of a contract, if you have one.
On Wed, Jun 3, 2009 at 1:53 AM, Rudi Ahlers rudiahlers@gmail.com wrote: <snip>
Do you have an SLA with the provider? If you look for a dedicated box from another provider, I suggest you check out the offers on http://www.webhostingtalk.com/, carefully read the reviews of any providers you begin to consider, and ask them questions. I've seen photos on OLM's blog and I know they have a ton of spares. Hardware does break; it's very common. It's easy for them to point fingers and blame your software, but if it turns out to be hardware, move.....
Rudi Ahlers wrote:
Hi,
Try enabling kdump to capture a kernel crash dump, in case this is a software-related issue.
http://download.swsoft.com/virtuozzo/virtuozzo4.0/docs/en/lin/VzLinuxUG/2002... Using Kexec and Kdump For System Troubleshooting
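1. yum install kexec-tools
2. Edit /etc/grub.conf and append "crashkernel=128M@16M" to the end of the kernel line. (On a Xen host such as this one, the "kernel" line is the xen.gz hypervisor line; the Linux kernel itself is loaded by the "module" line.)
3. chkconfig kdump on
4. Reboot.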
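Before rebooting, it is worth confirming the grub.conf edit actually landed on every boot entry. Here is a minimal sketch using awk. It assumes a GRUB legacy grub.conf at the path used above; on a Xen host the line it inspects is the `kernel /xen.gz...` hypervisor line. The function name `check_crashkernel` is hypothetical.

```shell
# Hypothetical check: does every "kernel" line in grub.conf carry a
# crashkernel= reservation? Prints one status line per boot entry and
# returns non-zero if any entry is missing it.
check_crashkernel() {
    conf=${1:-/etc/grub.conf}
    awk '
        /^[ \t]*title/ {                        # remember the entry name
            title = $0; sub(/^[ \t]*title[ \t]+/, "", title)
        }
        /^[ \t]*kernel/ {                       # xen.gz line on Xen hosts
            if ($0 ~ /crashkernel=/) { printf "ok       %s\n", title }
            else                     { printf "MISSING  %s\n", title; bad = 1 }
        }
        END { exit bad }
    ' "$conf"
}
```

A "MISSING" line for a non-Xen fallback kernel is harmless as long as the entry that actually boots is marked "ok".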
Also see these:
http://kbase.redhat.com/faq/docs/DOC-6039 How do I configure kexec/kdump on Red Hat Enterprise Linux 5?
http://kbase.redhat.com/faq/docs/DOC-2119 How can I voluntarily crash my machine to test if netdump/diskdump/kdump I configured works?
http://kbase.redhat.com/faq/docs/DOC-5413 My server crashes once in awhile. How can I debug it?
http://kbase.redhat.com/faq/docs/DOC-1742 My system has started to hang randomly. What information does Red Hat's technical support need to diagnose the problem?
http://kbase.redhat.com/faq/docs/DOC-10828 My Red Hat Enterprise Linux 2.1 system had a kernel panic, an oops message, or is freezing for no apparent reason. How can I find out what is causing this?
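The DOC-2119 article above covers deliberately crashing the machine to test kdump. A common way to do that is the magic SysRq "c" trigger. Below is a minimal sketch with a safety guard, plus a helper to find the resulting dump. It assumes the default dump path /var/crash from /etc/kdump.conf. The function names and the `--yes-crash-now` flag are hypothetical.

```shell
# Hypothetical helpers around the SysRq method of testing kdump.
# WARNING: trigger_test_crash panics the running machine on purpose.
trigger_test_crash() {
    if [ "$1" != "--yes-crash-now" ]; then
        echo "refusing: pass --yes-crash-now to really crash this host" >&2
        return 1
    fi
    echo 1 > /proc/sys/kernel/sysrq    # enable the magic SysRq interface
    echo c > /proc/sysrq-trigger       # "c" = crash; kdump should take over
}

# Once the host is back up, list any dumps kdump saved under the dump path.
list_vmcores() {
    find "${1:-/var/crash}" -type f -name 'vmcore*' 2>/dev/null
}
```

If the test crash leaves a vmcore but the random reboots never do, the resets are probably happening below the kernel, which again points at hardware.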
Next, I recommend you set up and run memtest86+ (package memtest86+.x86_64, "Stand-alone memory tester for x86 and x86-64 computers").
Ask support to reboot the machine, choose memtest from the GRUB boot menu, and let it run overnight. If the data center has an IP KVM, ask for access to it.
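To make that request to remote hands unambiguous, this sketch lists the GRUB (legacy) menu entries in /etc/grub.conf along with the index GRUB assigns them, counting from 0. It assumes a memtest86+ entry has already been added to the menu, for example with the package's memtest-setup script if your version ships it. The function name is hypothetical.

```shell
# Hypothetical helper: number the GRUB legacy menu entries the way GRUB
# does (first "title" is entry 0), so you can say "pick entry N".
grub_entries() {
    awk '/^[ \t]*title[ \t]/ {
        t = $0; sub(/^[ \t]*title[ \t]+/, "", t)   # strip the keyword
        printf "%d: %s\n", n++, t                   # n starts at 0
    }' "${1:-/etc/grub.conf}"
}
```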
Also, what network card does your server have? I've had some trouble with no-name network cards.
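The network card question can be answered over SSH. Here is a minimal sketch that lists the PCI devices that look like NICs and the kernel driver bound to each hardware-backed interface. It assumes the sysfs layout where /sys/class/net/<if>/device/driver is a symlink to the driver. The function name and the `NET_SYSFS` override are hypothetical.

```shell
# Hypothetical helper: which network hardware and drivers are in use?
nic_info() {
    netdir=${NET_SYSFS:-/sys/class/net}
    # PCI devices whose description looks like a NIC (vendor and model).
    lspci | grep -i -E 'ethernet|network'
    # Interfaces without a device/driver link in sysfs (bridges, loopback)
    # are skipped; for the rest, print the bound driver's name.
    for dev in "$netdir"/*; do
        [ -e "$dev/device/driver" ] || continue
        drv=$(readlink "$dev/device/driver")
        printf '%s: driver %s\n' "${dev##*/}" "${drv##*/}"
    done
}
```

The lspci line gives the chip vendor and model to post back to the list. The driver name tells you which kernel module to search for in bug reports.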