Since repeated power cuts last week I've been a bit worried about the state of my server. Today it spontaneously rebooted - but failed to complete. Now it isn't recognising me as a user, by the look of it, so I've got some questions before I do something that might make things worse.
I can boot as root, and it appears that all my files are present. system-config-users sees me as user 500, which is correct, so it must be my KDE login that is trashed. Foolishly, I didn't install a second desktop system, so I can't deal with things there.
While waiting for a reply I'm going to try to get essential files from my home directory onto an external disk (I did a huge backup yesterday of data files).
I assume that the hdd is failing - but I haven't seen any messages from smartmontools. Is there any way I can check that? If it is I don't want to waste time trying to repair it.
This is the information I gathered from /var/log/messages:
It shows a host of setroubleshoot messages, culminating in: SELinux is preventing dovecot (dovecot_t) "append" to /var/log/mail/mail.info (sendmail_log_t). For complete SELinux messages run.....
Then syslogd 1.4.1:restart
with no reason given. Everything then continues as normal until, suddenly: EXT3-fs error (device sda7): ext3_lookup: unlinked inode 638978 in dir #638977.
After that there are lots of 'Last message repeated' messages, and eventually it shut down.
I did allow fsck to run when I restarted the box.
That doesn't seem a lot to go on. Any advice? Thanks
Anne
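For anyone following along, here is a minimal sketch of how the log could be searched for the filesystem errors described above. The two function names are made up for illustration; the search pattern is taken from the message Anne quotes and the log path is the one she names. Because syslogd collapses repeats into 'last message repeated' lines, the count should be read as a lower bound.

```shell
# count_fs_errors LOGFILE
# Prints the number of log lines that report an ext3 error.
count_fs_errors() {
    grep -c 'EXT3-fs error' "$1"
}

# list_fs_error_devices LOGFILE
# Prints each affected device once, for example "sda7".
list_fs_error_devices() {
    sed -n 's/.*EXT3-fs error (device \([^)]*\)).*/\1/p' "$1" | sort -u
}

# Typical use, as root:
#   count_fs_errors /var/log/messages
#   list_fs_error_devices /var/log/messages
```

If only one partition turns up, that at least narrows down where fsck and any file recovery should concentrate.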
On Thu, Jan 29, 2009 at 9:15 AM, Anne Wilson cannewilson@googlemail.com wrote: <snip>
I assume that the hdd is failing - but I haven't seen any messages from smartmontools. Is there any way I can check that? If it is I don't want to waste time trying to repair it.
<snip>
Most hdd manufacturers have bootable CD images you can download which have utilities and thorough diagnostics.
On Thursday 29 January 2009 10:15:38 am Anne Wilson wrote:
I assume that the hdd is failing - but I haven't seen any messages from smartmontools. Is there any way I can check that? If it is I don't want to waste time trying to repair it.
try smartctl to see what the monitors have been finding for you.
man smartctl
Alex ===
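A minimal sketch of the smartctl checks being suggested here, assuming the disk is /dev/sda (inferred from the sda7 in the log excerpt; adjust to suit). The two function names and the SMARTCTL override variable are illustrative only and are not part of smartmontools.

```shell
# SMARTCTL can be overridden, e.g. to point at a non-standard install path.
SMARTCTL=${SMARTCTL:-smartctl}

# smart_report DEVICE
# Shows the health verdict, the attribute table and the drive's logs.
smart_report() {
    "$SMARTCTL" -H "$1"            # overall health self-assessment
    "$SMARTCTL" -A "$1"            # vendor attributes (reallocated sectors etc.)
    "$SMARTCTL" -l selftest "$1"   # results of earlier self-tests
    "$SMARTCTL" -l error "$1"      # the drive's own error log
}

# smart_start_test short|long DEVICE
# Asks the drive to start a self-test; the test runs inside the drive.
smart_start_test() {
    "$SMARTCTL" -t "$1" "$2"
}

# Typical use, as root:
#   smart_start_test short /dev/sda
#   smart_report /dev/sda
```

smartctl prints an estimated duration when a test is started; the outcome appears in the self-test log once the drive has finished.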
2009/1/29 Alex H. Vandenham alex@avantel.ca:
try smartctl to see what the monitors have been finding for you.
man smartctl
Thanks. I'd been trying to remember what command I needed for that :-)
The short test has completed without errors. I'll run the long test during dinner. Assuming that that also runs without errors, I guess that the next thing is memtest?
More suggestions?
Thanks
Anne
on 1-29-2009 8:30 AM Anne Wilson spake the following:
Thanks. I'd been trying to remember what command I needed for that :-)
The short test has completed without errors. I'll run the long test during dinner. Assuming that that also runs without errors, I guess that the next thing is memtest?
More suggestions?
Thanks
Anne
If you had many power failures, the filesystem might just be severely trashed: journals and files out of sync, etc. If a good fsck didn't fix it, you might just be in for a wipe and reinstall, or many hours of finding and fixing corrupted files. I would install to a new drive, and then you can take some time recovering from the old drive as you find things missing. That way you will still have the old system for whatever might come up. I always seem to find something that didn't get backed up properly.
On Thursday 29 January 2009 11:37:00 am Scott Silva wrote:
If you had many power failures, the filesystem might just be severely trashed. Journals and files out of sync, etc... If a good fsck didn't fix it, you might just be in for a wipe-reinstall, or many hours of finding and fixing corrupted files.. I would install to a new drive, and then you can take some time recovering from the old drive as you find things missing. That way you will still have the old system for whatever might come up. I always seem to find something that didn't get backed up properly.
Since you can log in as root, a less drastic first step might be to:
Change your runlevel (as root) to 3 and try a text login (as you) for access to your files.
man init
If the kde files are trashed, perhaps you can create another user on the system and copy over your personal files, or do a diff to see which kde files might have been trashed.
If it really looks bad (disk bad and/or major file corruption) , then I agree that a new install might be the way to go but that's significant pain . . .
Alex ===
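The steps Alex describes might look roughly like this. It is a sketch only: the account names "anne" and "kdetest" are placeholders, the function name is invented, and ~/.kde is assumed to be where the KDE settings live on this system.

```shell
# compare_kde_config SUSPECT_HOME REFERENCE_HOME
# Lists files under .kde that differ or exist on only one side.
compare_kde_config() {
    diff -rq "$1/.kde" "$2/.kde"
}

# Typical sequence, as root:
#   telinit 3              # drop to runlevel 3 for a text login
#   useradd -m kdetest     # fresh account; log in to KDE once so ~/.kde exists
#   compare_kde_config /home/anne /home/kdetest
```

A customised profile will legitimately differ from a fresh one in many places, so the output is a hint about missing or damaged files and not a verdict.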
2009/1/29 Alex H. Vandenham alex@avantel.ca:
Since you can log in as root, a less drastic first step might be to:
Change your runlevel (as root) to 3 and try a text login (as you) for access to your files.
man init
If the kde files are trashed, perhaps you can create another user on the system and copy over your personal files, or do a diff to see which kde files might have been trashed.
If it really looks bad (disk bad and/or major file corruption) , then I agree that a new install might be the way to go but that's significant pain . . .
Yes, I'll try that first - if I can convince myself that the hardware is OK. I really wish I knew what caused it, though.
Anne
2009/1/29 Scott Silva ssilva@sgvwater.com:
If you had many power failures, the filesystem might just be severely trashed. Journals and files out of sync, etc... If a good fsck didn't fix it, you might just be in for a wipe-reinstall, or many hours of finding and fixing corrupted files.. I would install to a new drive, and then you can take some time recovering from the old drive as you find things missing. That way you will still have the old system for whatever might come up. I always seem to find something that didn't get backed up properly.
Two days ago I discovered that the failures had indeed totally trashed the system. I did re-install, formatting only / and /boot, but I've had a couple of these spontaneous shutdowns since then, which is why I suspected hardware failure.
I've got copies of just about everything, I think, on an external drive, and I could try another drive as you suggest, mounting the old one in an external case, which I have. I can cope with this, but I'm deeply unhappy about not knowing what happened, and whether it is likely to happen again.
Anne
on 1-29-2009 9:02 AM Anne Wilson spake the following:
Two days ago I discovered that the failures had indeed totally trashed the system. I did re-install, formatting only / and /boot, but I've had a couple of these spontaneous shutdowns since then, which is why I suspected hardware failure.
I've got copies of just about everything, I think, on an external drive, and I could try another drive as you suggest, mounting the old one in an external case, which I have. I can cope with this, but I'm deeply unhappy about not knowing what happened, and whether it is likely to happen again.
Anne
Are the failures power related, or is the system just shutting down on its own?
If the latter, I would suspect either a power supply or a processor fan. If the former, maybe you need to invest in an inexpensive UPS.
I do have a UPS, and it's fully charged. The system is just spontaneously rebooting or shutting down.
Anne
Anne Wilson wrote:
I do have a UPS, and it's fully charged. The system is just spontaneously rebooting or shutting down.
My first guesses would be the system power supply or the CPU fan.
2009/1/29 Les Mikesell lesmikesell@gmail.com:
My first guesses would be the system power supply or the CPU fan.
I'm beginning to think that way. The smartctl long test has completed without errors. I think that I'll dig out the specs for the mobo/cpu tonight, then buy a new PSU and CPU fan in the morning.
Anne
Anne Wilson wrote:
I'm beginning to think that way. The smartctl long test has completed without errors. I think that I'll dig out the specs for the mobo/cpu tonight, then buy a new PSU and CPU fan in the morning.
Anne
Anne, the machine I'm using right now ate my lunch big-time, perhaps a year ago. It turned out to be the front panel momentary power switch on the junky consumer-grade case. The really weird thing is, this box runs 24x7 and the switch hasn't been used very much. Go figure. Work-around/proof was to disconnect the power switch entirely, plugging reset switch to the vacated pwr sw header.
On 1/29/09, Anne Wilson cannewilson@googlemail.com wrote:
I'm beginning to think that way. The smartctl long test has completed without errors. I think that I'll dig out the specs for the mobo/cpu tonight, then buy a new PSU and CPU fan in the morning.
Assuming that the Diagnostic tests you run on the hard drive and RAM are OK, if the box was made by Dell, Compaq/HP, etc., they probably have Diagnostics you can run on the mobo/cpu that you can Download from their web site. If not, hopefully from the web site of the mobo manufacturer.
You said that the UPS is fully charged. I wonder if you need a UPS with larger capacity and if your UPS is working properly. Depends on how long the frequent outages were that day. My observation is that if the power goes down (especially when we have Thunderstorm activity) it may come back up and then go down again, sometimes in 1 or 2 minutes or less.
The cheap PSUs are vastly overrated with regard to their capacity. The one I bought for this Dell Dimension 2400 a few weeks ago says "550 watts". The motherboard repairman told me he believes the true capacity is about 50% of that.....
If your data is critical, the backups should be stored off site. There are some companies mentioned on webhostingtalk.com who provide backup service to their servers over the Internet.
In my own box, the vast majority of the symptoms, if not all symptoms, disappeared, after I unplugged the connectors and reseated them. Then, the new PSU..... In my wife's box, a strange intermittent problem, where the BIOS couldn't see the hard drive when booting, disappeared, when I replaced the EIDE cable.
When you have the cover off, put your hand on the Shroud over the CPU and see whether or not it is hot or cool. If it is hot, that's not an indication of good cooling. The Capacitors on the motherboard should look alike and not be hot to the touch. GL
On Thursday 29 January 2009 20:23:40 Lanny Marcus wrote:
Anne, the machine I'm using right now ate my lunch big-time, perhaps a year ago. It turned out to be the front panel momentary power switch on the junky consumer-grade case. The really weird thing is, this box runs 24x7 and the switch hasn't been used very much. Go figure. Work-around/proof was to disconnect the power switch entirely, plugging reset switch to the vacated pwr sw header.
As far as I can see, all that's gone is my ~/.kde - and that's a big enough pain :-) So far, everything I've checked has been fine, so I'm thinking that it's either power or over-heating. Since I've been running tests for several hours, I'm inclined to rule out over-heating, so I'll get a new PSU. The case was not a cheap one, but it did have a built-in PSU. Come to think of it, I do believe that I have a 400w PSU in my cupboard, that is a known good brand, and unused.
Anne
On Thursday 29 January 2009 20:23:40 Lanny Marcus wrote:
Assuming that the Diagnostic tests you run on the hard drive and RAM are OK, if the box was made by Dell, Compaq/HP, etc., they probably have Diagnostics you can run on the mobo/cpu that you can Download from their web site. If not, hopefully from the web site of the mobo manufacturer.
It's a home-build. I've been doing this since about 1990. The drives are Hitachi, and I seem to recall that once before I tried to run the Hitachi diagnostics, without success. My request for help/information from them was ignored. However, at the time I got the drives they had a good warranty period, which is something I always check as a guide to how much confidence the manufacturer has in them.
You said that the UPS is fully charged. I wonder if you need a UPS with larger capacity and if your UPS is working properly.
I don't think there's any problem with the UPS (APC).
Depends on how long the frequent outages were that day. My observation is that if the power goes down (especially when we have Thunderstorm activity) it may come back up and then go down again, sometimes in 1 or 2 minutes or less.
The village has had several weeks of being powered by emergency generators stuck in fields. We've had very many power dips and momentary losses, then in the space of last week we had an 11-hour outage, followed a few days later by a 4.5 hour one and two short ones soon after that. I think it was the rapidity of those outages that caused the problem.
The cheap PSUs are vastly overrated with regard to their capacity. The one I bought for this Dell Dimension 2400 a few weeks ago says "550 watts". The motherboard repairman told me he believes the true capacity is about 50% of that.....
I buy only recommended brands, and watch the load. However, that box has a PSU that came with the (not cheap) box, so I don't know the quality. I think it should be replaced. I can't remember its rating - I'll check tomorrow when I pull the box out.
If your data is critical, the backups should be stored off site. There are some companies mentioned on webhostingtalk.com who provide backup service to their servers over the Internet.
Critical only to me - personal stuff. All the same, I take your point. I will move the backups to a safer spot.
In my own box, the vast majority of the symptoms, if not all symptoms, disappeared, after I unplugged the connectors and reseated them. Then, the new PSU..... In my wife's box, a strange intermittent problem, where the BIOS couldn't see the hard drive when booting, disappeared, when I replaced the EIDE cable.
When you have the cover off, put your hand on the Shroud over the CPU and see whether or not it is hot or cool. If it is hot, that's not an indication of good cooling. The Capacitors on the motherboard should look alike and not be hot to the touch. GL
I'll check those at the same time. Thanks for the reply
Anne
At 03:44 PM 1/29/2009, you wrote:
When you have the cover off, put your hand on the Shroud over the CPU and see whether or not it is hot or cool. If it is hot, that's not an indication of good cooling. The Capacitors on the motherboard should look alike and not be hot to the touch. GL
Defective capacitors on the motherboard will look rounded and bowed upward or cracked, or may even show some yellowish, dried liquid. If the fault is intermittent, they may show only the slightest signs of this. The capacitors contain a liquid that literally cooks off if they get too hot.
I'll check those at the same time. Thanks for the reply
Anne
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Jan 29, 2009, at 2:44 PM, Anne Wilson wrote:
You said that the UPS is fully charged. I wonder if you need a UPS with larger capacity and if your UPS is working properly.
I don't think there's any problem with the UPS (APC).
I've seen APC rack mount systems shut down the power outlets when doing their self-test and the batteries were dodgy.
--Chris
On Thu, Jan 29, 2009 at 6:07 PM, Chris Boyd cboyd@gizmopartners.com wrote:
I've seen APC rack mount systems shut down the power outlets when doing their self-test and the batteries were dodgy.
After reading that her village was on generator power, etc., I would be suspicious of the health of that UPS. If the box has problems, it may be because the UPS was unable to cope with the very heavy, prolonged workload. If the UPS does not have Automatic Voltage Regulation, which corrects dips without using the battery, would it not also have been working harder?
On Thursday 29 January 2009 23:46:12 Lanny Marcus wrote:
After reading that her village was on generator power, etc., I would be suspicious of the health of that UPS. If the box has problems, it may be because the UPS was unable to cope with the very heavy prolonged workload. If the UPS does not have Automatic Voltage Regulation, without using the battery, it would also have been working harder?
Technicalities of power supply are not in any way my expertise. Are you saying that it would be wise to change the battery?
Anne
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Anne Wilson Sent: Friday, January 30, 2009 9:15 AM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
Technicalities of power supply are not in any way my expertise. Are you saying that it would be wise to change the battery?
Three or four years is a normal lifespan for a UPS battery, IIRC. Less if the batteries are used often, i.e. if you have plenty of brownouts and/or power outages.
Most UPS-brands have a battery-exchange program.
However, in some cases it's cheaper to just get a whole new UPS... 8-/
HTH.
on 1-30-2009 12:15 AM Anne Wilson spake the following:
Technicalities of power supply are not in any way my expertise. Are you saying that it would be wise to change the battery?
Batteries are usually good for 3 to 4 years, but I have about a dozen APC UPS's that didn't make 2 years. Either the batteries were a defective lot, or the UPS's themselves overcharged them as they are all swollen and some have started seeping. All the bad UPS's are APC LS 700's purchased in mid 2007 per my records. I have considered contacting APC, but I doubt that they will care much.
On Thu, Jan 29, 2009 at 9:32 AM, Les Mikesell lesmikesell@gmail.com wrote:
My first guesses would be the system power supply or the CPU fan.
-- Les Mikesell lesmikesell@gmail.com
Yesterday I had a system shutting down on its own. It would power up and stay on, but I discovered that the PSU fans were not spinning. The failing fans explained a warning message at POST indicating that the system was overheating and needed immediate repairs.
~af
on 1-29-2009 4:29 PM Aldo Foot spake the following:
Yesterday I had a system shutting down on its own. It would power up and stay on, but I discovered that the PSU fans were not spinning. The failing fans explained a warning message at POST indicating that the system was overheating and needed immediate repairs.
~af
I have a recovered desktop at home that was my daughter's. Sometimes it powers up and shuts off, and sometimes it will die before it gets out of the POST screens. I suspect the CPU fan, but since I don't need the system right now (I just gave her a better one so she is happy), it sits until I have time.
There are many ways a system can break, and some of them can be very difficult to figure out without a bunch of spare parts you can swap.
On Thursday 29 January 2009 17:32:00 Les Mikesell wrote:
My first guesses would be the system power supply or the CPU fan.
I've changed the PSU for a better one, but it may not have been the PSU at all. While fitting the new one my hand brushed against a SATA cable, which came unplugged. The connector had cracked. I've replaced that too.
Anne
On Fri, Jan 30, 2009 at 12:31 PM, Anne Wilson cannewilson@googlemail.com wrote:
I've changed the PSU for a better one, but it may not have been the PSU at all. While fitting the new one my hand brushed against a SATA cable, which came unplugged. The connector had cracked. I've replaced that too.
Cool. I solved a variety of mysterious HW problems, recently, in two boxes, by reseating a connector or replacing an EIDE ribbon cable.
I just put three (3) high end Tripp Lite UPS (AVR without using the battery) into the garage to give away. Probably dead batteries and other problems. Getting them fixed properly is a PITA here. We bought two (2) non name brand UPS last Friday and when the store receives more, we will buy another one. Much much cheaper and if they break after warranty, I will put them into the garage.
Even if the battery in your APC UPS is fully charged now, I would assume that it has been through the wringer, with what your village experienced recently, power-wise.
On Friday 30 January 2009 21:27:12 Lanny Marcus wrote:
Even if the battery in your APC UPS is fully charged now, I would assume that it has been through the wringer, with what your village experienced recently, power-wise.
I'm inclined to think so, too. Do you find that nut handles all brands without problems?
Anne
on 1-29-2009 9:28 AM Anne Wilson spake the following:
I do have a UPS, and it's fully charged. The system is just spontaneously rebooting or shutting down.
Anne
Since you have other systems, maybe you could hang something on the serial port and get the kernel messages there to see if you get any other info that might not be making it to the logs.
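One way to do what Scott suggests is a kernel serial console. The sketch below only previews the change: it prints grub.conf with console arguments appended to each kernel line and leaves the file untouched. The function name, the choice of the first serial port (ttyS0), the 115200 baud rate and the /boot/grub/grub.conf path are all assumptions for a typical GRUB legacy setup and may not match this machine.

```shell
# Kernel arguments: keep messages on the screen (tty0) and also send
# them to the first serial port.
SERIAL_ARGS='console=tty0 console=ttyS0,115200n8'

# preview_serial_console GRUB_CONF
# Prints the file with SERIAL_ARGS appended to every "kernel" line.
# The file itself is not modified.
preview_serial_console() {
    sed "/^[[:space:]]*kernel /s/\$/ $SERIAL_ARGS/" "$1"
}

# Typical use on the server, as root:
#   preview_serial_console /boot/grub/grub.conf
# On the second machine, with a null-modem cable attached:
#   screen /dev/ttyS0 115200
```

Anything the kernel prints just before a spontaneous reboot should then appear on the second machine, even if it never reaches /var/log/messages.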
Anne Wilson wrote: . . .
I do have a UPS, and it's fully charged. The system is just spontaneously rebooting or shutting down.
http://en.wikipedia.org/wiki/Capacitor_plague.
I just fixed a test box that kept getting something like "received INT 11 - no one cared" and then locks up. Replaced two caps - I yanked them from some old, defunct power supplies.
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Toby Bluhm Sent: Thursday, January 29, 2009 12:42 PM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
<snip>
-----
How did you know they were bad? Could you explain to her what to look for and how to use a multimeter?
JohnStanley
John wrote: . . .
How did you know they were bad? Could you explain to her what to look for and how to use a Multimeter?
You look at them - no meter required. The tops of the electrolytic capacitors should be flat and clean looking - not bulged, puffed or discolored. It's all described very well in the wiki page.
Replacing the capacitors does require soldering equipment and soldering skills. Or just replace the whole MB or power supply - whichever is the problem.
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Toby Bluhm Sent: Friday, January 30, 2009 8:05 AM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
<snip>
---
Thank You!
JohnStanley
On Saturday 31 January 2009 16:29:32 John wrote:
<snip>
I've lost track of who told me to check capacitors, so apologies to the person concerned. I did look - and everything looks fine.
Anne
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Anne Wilson Sent: Saturday, January 31, 2009 12:01 PM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
<snip>
----
Toby asked you to...
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Toby Bluhm Sent: Thursday, January 29, 2009 6:42 PM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
Anne, is your motherboard an oldish MSI (Microstar)?
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Anne Wilson Sent: Thursday, January 29, 2009 6:28 PM To: CentOS mailing list Subject: Re: [CentOS] Emergency rescue help needed
I do have a UPS, and it's fully charged. The system is just spontaneously rebooting or shutting down.
Try another PSU.
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Scott Silva Sent: Thursday, January 29, 2009 6:18 PM To: centos@centos.org Subject: Re: [CentOS] Emergency rescue help needed
Are the failures power related, or is the system just shutting down on its own?
I've had this happen on more than one occasion, that is, on the occasions when the motherboard didn't bail out completely. Anne, if you have a spare PSU, try your system with that one and see whether it is more stable.
A brand-name PSU is *not* a guarantee that it will last and/or be resilient. I have experience with those as well... 8-/
If the latter, I would suspect either a power supply or a processor fan. If the former, maybe you need to invest in an inexpensive UPS.
I second that. A UPS, as in prevention, is THE starting point for stability.
At home I have a fairly big one, a Powerware 5115 rated at 1400VA. My two Windows DCs, another Windows intranet web portal, a firewall appliance, a Linux web server and a switch, as well as one monitor, are connected to it. The three workstations are not, though; all docs and files are on the DCs. 8-}
Use SpinRite from www.grc.com. It will work on any file system. I have used it on Windows, Ubuntu Linux, and even an Xbox. Works wonders.
Franklin S Werren
Scott Silva wrote:
on 1-29-2009 8:30 AM Anne Wilson spake the following:
2009/1/29 Alex H. Vandenham:
On Thursday 29 January 2009 10:15:38 am Anne Wilson wrote:
I assume that the hdd is failing - but I haven't seen any messages from smartmontools. Is there any way I can check that? If it is I don't want to waste time trying to repair it.
try smartctl to see what the monitors have been finding for you.
man smartctl
Thanks. I'd been trying to remember what command I needed for that :-)
The short test has completed without errors. I'll run the long test during dinner. Assuming that that also runs without errors, I guess that the next thing is memtest?
More suggestions?
Thanks
Anne
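For anyone following along, the smartctl steps discussed above can be sketched roughly as below. The -H, -t long, -l selftest and -A options are standard smartctl options; the device name /dev/sda is assumed from the sda7 partition in the original log excerpt, and the helper names smartctl_commands and run_stage are made up for this sketch.

```python
import shutil
import subprocess


def smartctl_commands(device, stage):
    """Return the smartctl invocations for one stage of a drive check."""
    if stage == "start":
        return [
            ["smartctl", "-H", device],          # overall health self-assessment
            ["smartctl", "-t", "long", device],  # start the extended self-test
        ]
    if stage == "report":
        return [
            ["smartctl", "-l", "selftest", device],  # results of past self-tests
            ["smartctl", "-A", device],              # vendor attribute table
        ]
    raise ValueError("stage must be 'start' or 'report'")


def run_stage(device="/dev/sda", stage="start"):
    """Run one stage; needs root and the smartmontools package."""
    if shutil.which("smartctl") is None:
        raise RuntimeError("smartctl not found; is smartmontools installed?")
    for argv in smartctl_commands(device, stage):
        print("$", " ".join(argv))
        # smartctl's exit status is a bit mask, so a non-zero value is not
        # treated as a failure of this script.
        subprocess.run(argv, check=False)
```

The long test runs inside the drive in the background, so run_stage(stage="start") returns quickly; run_stage(stage="report") only shows the result once the drive has finished, which smartctl estimates when the test is started.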
If you had many power failures, the filesystem might just be severely trashed: journals and files out of sync, etc. If a good fsck didn't fix it, you might just be in for a wipe and reinstall, or many hours of finding and fixing corrupted files. I would install to a new drive, and then you can take some time recovering from the old drive as you find things missing. That way you will still have the old system for whatever might come up. I always seem to find something that didn't get backed up properly.
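One rough way to judge how widespread the damage is before choosing between repair and reinstall is to count the EXT3-fs error lines in the log. The sketch below assumes the message format quoted from /var/log/messages at the start of this thread; the function name count_ext3_errors is made up, and syslog's "last message repeated" lines are not counted, so the totals are a lower bound.

```python
import re
from collections import Counter

# Matches lines such as the one quoted earlier in the thread:
#   EXT3-fs error (device sda7): ext3_lookup: unlinked inode 638978 in dir #638977
EXT3_ERROR = re.compile(r"EXT3-fs error \(device (?P<dev>[^)]+)\): (?P<func>\w+):")


def count_ext3_errors(lines):
    """Count EXT3-fs error lines per (device, reporting kernel function)."""
    counts = Counter()
    for line in lines:
        match = EXT3_ERROR.search(line)
        if match:
            counts[(match.group("dev"), match.group("func"))] += 1
    return counts
```

Typical use would be something like opening /var/log/messages with errors="replace" and passing the file object straight in. Errors confined to one partition point at that filesystem; errors spread over several partitions on the same disk look more like hardware.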
2009/1/29 Franklin S Werren admin@chautauqualake.net:
Use SpinRite from www.grc.com. It will work on any file system. I have used it on Windows, Ubuntu Linux, and even an Xbox. Works wonders.
Data recovery isn't a problem - I've copied everything off. Still, I've bookmarked that, just in case. It's lucky I don't have just one box :-)
Anne
-----Original Message----- From: centos-bounces@centos.org [mailto:centos-bounces@centos.org] On Behalf Of Scott Silva Sent: Thursday, January 29, 2009 5:37 PM To: centos@centos.org Subject: Re: [CentOS] Emergency rescue help needed
If you had many power failures, the filesystem might just be severely trashed: journals and files out of sync, etc. If a good fsck didn't fix it, you might just be in for a wipe and reinstall, or many hours of finding and fixing corrupted files. [...]
A UPS hasn't previously been mentioned AFAICT, or possibly I missed it... You do have one connected, don't you?
The controlled shutdown that a UPS usually offers during power cuts and brownouts is a really good solution, IMHO.
/Sorin