A few days ago I started receiving messages like this on one of my servers:
Message from syslogd@ at Mon Apr 29 08:02:55 2013 ... server1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": (Branch=0 DRAM-Bank=0 RDWR=Read RAS=0 CAS=0, UE Err=0x2 (Aliased Uncorrectable Non-Mirrored Demand Data ECC))
I've never had ECC memory fail on me before, so now I am wondering the following:
* The server is running CentOS 5.7 and is acting as Xen dom0. Is there any possibility this could be a kernel issue and upgrading would help, or would upgrading at this point just cause more trouble?
* Is there now a possibility that my data could become corrupted? Should I shut down the server as soon as possible, or can I keep it running until I replace the memory?
* This server has been running for several years in a datacenter without problems. In your experience, are problems like this more likely caused by a failing motherboard or by the memory modules?
Regards, Peter
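To see how often the error recurs and whether it always names the same row and channels, the syslog lines can be tallied. This is only a rough sketch: the function name edac_ue_summary is made up for illustration, and the pattern assumes the message layout is exactly the one quoted above.

```shell
# Summarise EDAC uncorrectable-error (UE) lines by controller, row and channels.
# Reads syslog text on stdin. Assumes the message layout quoted in the post:
#   "EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 ..."
edac_ue_summary() {
    sed -n 's/.*EDAC \(MC[0-9]*\): UE row \([0-9]*\), channel-a= *\([0-9]*\) channel-b= *\([0-9]*\).*/\1 row=\2 channels=\3,\4/p' |
        sort | uniq -c
}
```

Typical use would be something like `edac_ue_summary < /var/log/messages`; the log path is an assumption about where this system's syslog writes kernel messages.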
On 04/29/13 04:17, Peter Peltonen wrote:
> - The server is running CentOS 5.7 and is acting as Xen dom0. Is there any possibility this could be a kernel issue and upgrading would help, or would upgrading at this point just cause more trouble?
Not in my experience.
> - Is there now a possibility that my data could become corrupted? Should I shut down the server as soon as possible, or can I keep it running until I replace the memory?
Maybe - I'm just not sure. You need to replace the memory asap; order it, and schedule a maintenance window with all your users *now*.
> - This server has been running for several years in a datacenter without problems. In your experience, are problems like this more likely caused by a failing motherboard or by the memory modules?
DIMM went bad. No big thing. Your only problem may be to identify which one, he says, about to go into work to do just that.
mark
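For the "identify which one" step, one starting point is the list of slot labels the BIOS reports through dmidecode. The sketch below is hedged: dimm_slots is a hypothetical helper name, it relies on the usual `dmidecode -t memory` layout where each Memory Device lists Size before Locator, and how an EDAC row maps onto those labels depends on the board.

```shell
# List DIMM slot labels and module sizes from `dmidecode -t memory` output on stdin.
# Assumes the usual layout in which "Size:" precedes "Locator:" for each device.
dimm_slots() {
    awk -F': *' '
        /^[ \t]*Size:/    { size = $2 }             # remember the size of this device
        /^[ \t]*Locator:/ { print $2 ": " size }    # "Bank Locator:" lines do not match
    '
}
```

Run as root, this would be used as `dmidecode -t memory | dimm_slots`.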
Monday, April 29, 2013, 1:59:44 PM, you wrote:
> Maybe - I'm just not sure. You need to replace the memory asap; order it, and schedule a maintenance window with all your users *now*.
I fully agree. However, I once had a situation where these messages appeared after a kernel update. I learned that the kernel developers had added some extra error messages for my chipset with that update. Booting an older kernel made the errors go away. OTOH, if the board is already some years old, this is not very likely (and you will have to replace the memory anyway).
Best regards,
Michael Schumacher
PAMAS Partikelmess- und Analysesysteme GmbH
Dieselstr.10, D-71277 Rutesheim
Tel +49-7152-99630
Fax +49-7152-996333
Geschäftsführer: Gerhard Schreck
Handelsregister B Stuttgart HRB 252024
Hi,
On Mon, Apr 29, 2013 at 2:59 PM, mark m.roth@5-cent.us wrote:
> DIMM went bad. No big thing. Your only problem may be to identify which one, he says, about to go into work to do just that.
Thanks for your response and suggestions.
About identifying the faulty DIMM: is the memtest provided on the CentOS 5 installation disk the best tool for this purpose? And do I need to switch ECC off in the BIOS while I test the memory?
The EDAC error msg reports problems with bank0. Can I trust this? I tried installing edac-utils to get more information, but after installation it only produces a segmentation fault:
# edac-util --report=simple
Segmentation fault
# edac-util -s
Segmentation fault
# rpm -qv edac-utils
edac-utils-0.9-6.el5
Regards, Peter
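While edac-util is segfaulting, the counters it would normally report can be read straight from sysfs. A minimal sketch, assuming the EDAC driver on this kernel exposes the usual per-csrow ue_count and ce_count files; the function name edac_counts is illustrative only, and which physical DIMM a csrow corresponds to still depends on the board.

```shell
# Print per-csrow error counters straight from the EDAC sysfs tree.
# $1 is the sysfs base directory; the default is the usual location on Linux.
edac_counts() {
    base=${1:-/sys/devices/system/edac/mc}
    for row in "$base"/mc*/csrow*; do
        [ -d "$row" ] || continue                       # no EDAC driver loaded: print nothing
        ue=$(cat "$row/ue_count" 2>/dev/null || echo '?')   # uncorrectable errors
        ce=$(cat "$row/ce_count" 2>/dev/null || echo '?')   # corrected errors
        echo "${row#"$base"/}: ue=$ue ce=$ce"
    done
}
```

Calling `edac_counts` with no argument reads the live counters; a row whose ue count keeps climbing is the one to look at first.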
_______________________________________________
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
Hi Peter
One of my old HP DL585s had a similar issue, but it turned out that the DIMM slots were at fault. The server chassis had a few LEDs blinking red for those DIMM slots, indicating that they were faulty. I removed the memory from those slots and moved it to the spare DIMM slots, and everything has been working fine since then.
Regards, Vipul
Vipul Agarwal wrote:
> One of my old HP DL585s had a similar issue, but it turned out that the DIMM slots were at fault. The server chassis had a few LEDs blinking red for those DIMM slots, indicating that they were faulty. I removed the memory from those slots and moved it to the spare DIMM slots, and everything has been working fine since then.
Hi, there Vipul, old buddy, old pal.... I've *got* an HP DL580 G5 that was spitting ECC errors, too. It was fully populated, both the m/b and the four risers. I took my best guess last week and pulled one mirrored pair of memory (OP: check whether your memory is mirrored; if it is, you have to pull at *least* two), replaced them with two from a riser... and then had to take out *two* of the four risers.
Now, on an HP support list, someone left a message over the weekend that I should do a BIOS update... except all I can find is a DOS .exe to do it, *and* there's a comment about needing to install previous BIOS updates.... You don't happen to know if I do need to install the previous update?
mark, looking into a freedos USB key solution....
On Mon, Apr 29, 2013 at 11:11:45AM -0400, m.roth@5-cent.us wrote:
> Now, on an HP support list, someone left a message over the weekend that I should do a BIOS update... except all I can find is a DOS .exe to do it, *and* there's a comment about needing to install previous BIOS updates.... You don't happen to know if I do need to install the previous update?
>
> mark, looking into a freedos USB key solution....
I think the "emergency boot cd" contains bootable freedos images...
I was looking for a URL for it, and found many things under that name, but I'm not sure that any of them is the one I'm thinking of. I'll post again if I find it.
On Mon, Apr 29, 2013 at 11:46:08AM -0400, Fred Smith wrote:
> I think the "emergency boot cd" contains bootable freedos images...
>
> I was looking for a URL for it, and found many things under that name, but I'm not sure that any of them is the one I'm thinking of. I'll post again if I find it.
Hmm. This may be what I was thinking of, though I think the one I have at home has a different menu. Nevertheless, this may help:
http://www.ultimatebootcd.com/
--
Fred Smith -- fredex@fcshome.stoneham.ma.us
The Lord is like a strong tower. Those who do what is right can run to him for safety.
Proverbs 18:10 (niv)
On Mon, Apr 29, 2013 at 11:11 AM, m.roth@5-cent.us wrote:
> Now, on an HP support list, someone left a message over the weekend that I should do a BIOS update... except all I can find is a DOS .exe to do it, *and* there's a comment about needing to install previous BIOS updates.... You don't happen to know if I do need to install the previous update?
>
> mark, looking into a freedos USB key solution....
Or set up a PXE server with a freedos floppy image. Use memdisk to load the floppy image into memory. [0] [1]
I've uncompressed and loopback-mounted a freedos image and then added BIOS updating utilities to my image(s). Unmount and re-compress the image if you like. For the record, it is possible to boot a compressed (gzip only?) image via memdisk.
[0] http://raftaman.net/?p=491
[1] http://blog.mozilla.org/jv/2011/01/07/pxe-booting-dos/
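The memdisk approach described above might look roughly like this. It is a sketch under assumptions: the function name, menu label, image file name and mount point are placeholders that do not come from this thread, and the stanza uses the classic pxelinux form where memdisk is the kernel and the floppy image is passed as its initrd.

```shell
# Emit a pxelinux menu entry that boots a floppy image through memdisk.
# $1 = menu label, $2 = image file name relative to the TFTP root (both placeholders).
pxe_memdisk_entry() {
    label=$1
    image=$2
    cat <<EOF
LABEL $label
  KERNEL memdisk
  APPEND initrd=$image
EOF
}

# Adding a flash utility to the image beforehand (run as root; paths are examples):
#   gunzip freedos.img.gz
#   mount -o loop freedos.img /mnt/floppy
#   cp FLASH.EXE /mnt/floppy/
#   umount /mnt/floppy
#   gzip freedos.img          # optional; memdisk can load a gzip-compressed image
```

For example, `pxe_memdisk_entry freedos-bios freedos.img.gz >> pxelinux.cfg/default` would append the entry to an existing menu file.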
From: "m.roth@5-cent.us" m.roth@5-cent.us
> Now, on an HP support list, someone left a message over the weekend that I should do a BIOS update... except all I can find is a DOS .exe to do it, *and* there's a comment about needing to install previous BIOS updates.... You don't happen to know if I do need to install the previous update?
>
> mark, looking into a freedos USB key solution....
You can boot from the Firmware Maintenance CD. It will auto-detect all the hardware firmware and update it if needed... You just have to find the most recent CD that still supports your model (see the release notes), since they gradually remove old hw to make room for new models...
JD
John Doe wrote:
> You can boot from the Firmware Maintenance CD. It will auto-detect all the hardware firmware and update it if needed... You just have to find the most recent CD that still supports your model (see the release notes), since they gradually remove old hw to make room for new models...
No such luck: we don't have such a maintenance CD; if anyone does, it's the other Institute, and as we're doing admin work, I'd guess they don't have a real admin, so who knows where it is.
But wait, it's worse than that, Jim.... I did, in fact, boot freedos from the USB key this morning... and when I tried to run the BIOS update .exe, it announced that it *can't* be run in DOS mode.
No, there's no way anyone's going to spring for a Windows license, install it, do the update, and redo the system as CentOS.
Joking aside, do you, or does anyone, have an opinion on the advisability of trying to flash the BIOS under wine?
mark
From: "m.roth@5-cent.us" m.roth@5-cent.us
> No such luck: we don't have such a maintenance CD; if anyone does, it's the other Institute, and as we're doing admin work, I'd guess they don't have a real admin, so who knows where it is.
The ISOs are downloadable... Google "firmware maintenance cd" and check the "version history" to get the latest one. Then check the "release notes" to see whether your server model is listed (it is not always). If not, go back a few versions until you find it. CD 8.60, for example, seems to have it and is not too old... Download and burn. Or you could try the new way (I have not tried it yet): http://h18004.www1.hp.com/products/servers/management/spp/index.html
JD
John Doe wrote:
> The ISOs are downloadable... Google "firmware maintenance cd" and check the "version history" to get the latest one. Then check the "release notes" to see whether your server model is listed (it is not always). If not, go back a few versions until you find it. CD 8.60, for example, seems to have it and is not too old... Download and burn. Or you could try the new way (I have not tried it yet): http://h18004.www1.hp.com/products/servers/management/spp/index.html
Found what seemed to be it - http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00308226, but the one for the DL580 G5 was from '09. I clicked the link at the top of the list, found mine, clicked that, and was offered a choice of OSes, including CentOS, but that had some accelerator. At the bottom of the offered OSes, they say "cross-os - BIOS, etc". I followed that... and *all* I got was a WinDoze .exe. Could I boot from one of the burnable DVDs and then run this, having copied it onto the h/d on the server?
mark
From: "m.roth@5-cent.us" m.roth@5-cent.us
> Found what seemed to be it - http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?objectID=c00308226, but the one for the DL580 G5 was from '09. I clicked the link at the top of the list, found mine, clicked that, and was offered a choice of OSes, including CentOS, but that had some accelerator. At the bottom of the offered OSes, they say "cross-os - BIOS, etc". I followed that... and *all* I got was a WinDoze .exe. Could I boot from one of the burnable DVDs and then run this, having copied it onto the h/d on the server?
While I still think it is easier to just download the DVD and boot from it (which works for many models too), you can try: http://h20565.www2.hp.com/portal/site/hpsc/public/psi/swdHome/?lang=en&c... and choose Red Hat instead of CentOS. Then choose the "Obtain software" entries instead of the Download ones...
JD
On 5/6/2013 8:09 AM, m.roth@5-cent.us wrote:
> But wait, it's worse than that, Jim.... I did, in fact, boot freedos from the USB key this morning... and when I tried to run it, it announced that it *can't* be run in DOS mode.
>
> No, there's no way anyone's going to spring for a Windows license, install it, do the update, and redo the system as CentOS.
Run it from a Hiren's USB stick? That can boot into a WinPE-style environment.
John R Pierce wrote:
> Run it from a Hiren's USB stick? That can boot into a WinPE-style environment.
Never heard of it. Just looked at it... winPE - is that the miniXP?
And it bothers me that I've never heard of hirens - I *do* have to be aware of security. I'm still thinking of wine.
mark
On 5/6/2013 12:21 PM, m.roth@5-cent.us wrote:
>> Run it from a Hiren's USB stick? That can boot into a WinPE-style environment.
>
> Never heard of it. Just looked at it... winPE - is that the miniXP?
>
> And it bothers me that I've never heard of hirens - I *do* have to be aware of security. I'm still thinking of wine.
Hiren's has been around for quite a while, and gets updated periodically. It's an all-in-one CD or USB boot image full of mostly open source tools; it can boot into memtest86, a Linux kernel/shell/GUI environment, or a 'BartPE' mini-XP environment. Freedos too, I think.
Hiren's itself only distributes the kit to build it, since it involves some licensed software, but pre-built ones are available from somewhat marginal sources (BitTorrent, etc.).
You could, of course, simply use a plain BartPE boot that you built yourself (Bart provides a toolkit for creating a WinPE-style environment). You need access to a Windows desktop system somewhere to build one of these, along with the Windows XP CD to get the required files.
John R Pierce wrote:
> Hiren's has been around for quite a while, and gets updated periodically. It's an all-in-one CD or USB boot image full of mostly open source tools; it can boot into memtest86, a Linux kernel/shell/GUI environment, or a 'BartPE' mini-XP environment. Freedos too, I think.
> <snip>
Interesting. I need to look at them further.
HOWEVER: I saw something there about unpacking an .exe... and googled that, and found someone talking about doing that... which led me to cabextract, and, sure 'nough, I now have what was in that exe - flat files, CD, floppy! even a WinDoze floppy label printer!
Now all I have to do is figure out which will be easiest to use - I'm hoping I can just copy the flat files or the floppy files, or USB files, and reboot into freedos and go.
mark
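For anyone repeating the cabextract step, a small wrapper along these lines lists the package contents before extracting them. It is only a sketch: unpack_exe and the default output directory are made-up names, and it will only help when the .exe is really a cabinet-based self-extractor, as this one turned out to be.

```shell
# List, then unpack, the contents of a self-extracting update package.
# $1 = the downloaded .exe, $2 = output directory (default is a placeholder).
unpack_exe() {
    exe=$1
    dest=${2:-./unpacked}
    [ -f "$exe" ] || { echo "no such file: $exe" >&2; return 1; }
    command -v cabextract >/dev/null 2>&1 || { echo "cabextract is not installed" >&2; return 1; }
    cabextract -l "$exe" || return 1     # -l lists the archive members without extracting
    cabextract -d "$dest" "$exe"         # -d extracts into the given directory
}
```

The listing step is there so you can see whether the package holds flat files, a floppy image or a CD image before deciding which one to boot.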
On Mon, May 6, 2013 at 3:39 PM, m.roth@5-cent.us wrote:
> HOWEVER: I saw something there about unpacking an .exe... and googled that, and found someone talking about doing that... which led me to cabextract, and, sure 'nough, I now have what was in that exe - flat files, CD, floppy! even a WinDoze floppy label printer!
>
> Now all I have to do is figure out which will be easiest to use - I'm hoping I can just copy the flat files or the floppy files, or USB files, and reboot into freedos and go.
Hope this works out. :)
You could use a Windows 98 boot floppy (or another Windows boot floppy) to boot and run the flashing utility. You can even boot with your USB stick (holding the utilities) plugged in (it should become C:).
I'm sure you could find a Windows floppy image online, or one of us would be happy to send you a copy (the security/trust question would be about the same as for freedos images anyway).
Once you have the image it's trivial to turn a floppy image into a bootable ISO. But it sounds as if you have floppies and a floppy drive.
I think the freedos bits I saw in the HP BIOS updating archive for my system were for the "Crisis Recovery Disk".
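Turning a floppy image into a bootable ISO can be sketched as below, assuming mkisofs (or a compatible genisoimage) is installed. El Torito floppy emulation only accepts images of exactly 1200, 1440 or 2880 KB, so the size is checked first. The function names and the boot.img file name are illustrative, not taken from any HP package.

```shell
# Succeed only if the file has a size that El Torito floppy emulation accepts.
is_floppy_size() {
    size=$(wc -c < "$1" | tr -d ' ')
    case $size in
        1228800|1474560|2949120) return 0 ;;   # 1200, 1440 or 2880 KB
        *) return 1 ;;
    esac
}

# Build a bootable ISO ($2) around a floppy image ($1).
make_boot_iso() {
    img=$1
    iso=$2
    is_floppy_size "$img" || { echo "$img is not a 1.2/1.44/2.88 MB image" >&2; return 1; }
    work=$(mktemp -d) || return 1
    cp "$img" "$work/boot.img"
    # -b names the boot image relative to the ISO root; -c names the boot catalog.
    mkisofs -o "$iso" -b boot.img -c boot.cat "$work"
    rc=$?
    rm -rf "$work"
    return $rc
}
```

Usage would be along the lines of `make_boot_iso win98.img win98-boot.iso`, with both file names being examples.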
On May 6, 2013, at 1:39 PM, m.roth@5-cent.us wrote:
> Now all I have to do is figure out which will be easiest to use - I'm hoping I can just copy the flat files or the floppy files, or USB files, and reboot into freedos and go.
Can you get to this page without a service contract? You do have to log in.
http://h18004.www1.hp.com/products/servers/service_packs/en/index.html
There's a 2.5 GB bootable ISO that has all the updates for the entire Proliant line, basically, as well as the Bladeserver line.
Nate
Replying to myself:
On Mon, Apr 29, 2013 at 3:41 PM, Peter Peltonen peter.peltonen@gmail.com wrote:
> The EDAC error msg reports problems with bank0. Can I trust this? I tried installing edac-utils to get more information, but after installation it only produces a segmentation fault:
> # edac-util --report=simple
> Segmentation fault
Replacing the first memory pair made the error messages go away.
Edac-util still segfaults, though. But as the system seems to be otherwise stable, I probably will not investigate this further.
Regards, Peter