Hi,
I updated a server yesterday from
"kernel 2.6.18-128.7.1.el5xen" to "kernel 2.6.18-164.el5xen"
After rebooting, my message log is flooded every second or so with these error messages:
Oct 6 14:52:20 xenserver1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
and
Oct 6 15:17:23 xenserver1 kernel: EDAC MC0: CE row 0, channel 0, label "": Corrected error (Branch=0 DRAM-Bank=0 RDWR=Read RAS=0 CAS=0, CE Err=0x10000 (Correctable Non-Mirrored Demand Data ECC))
The machine is a new Tyan S5397 mobo with 16GB Kingston RAM (KVR667D2D4F5K2/8G).
Moving the memory modules to different slots doesn't make any difference.
After some digging, I noticed that the new kernel has added support for the i5400 chipset. I found some references indicating that the new kernel has an error-reporting capability the old one lacked.
Question 1: How many recoverable RAM errors are acceptable?
Question 2: The error always appears with the same ID in the error message. Is this a mobo problem?
Question 3: Are there any recommended BIOS settings to run the RAM slower, to see if the problem disappears?
Question 4: Any other suggestions?
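The observation in Question 2, that the error always carries the same ID, could be checked by tallying the EDAC lines in the log. The following is a minimal sketch, not part of the original post: the function name `tally_edac` is hypothetical, and the regular expression is written only against the two message shapes quoted above, so other EDAC drivers or kernel versions may need a different pattern.

```python
import re
from collections import Counter

# Pattern derived from the two log lines quoted in this thread (an assumption:
# other EDAC drivers may format their messages differently).
EDAC_RE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<kind>UE|CE) row (?P<row>\d+),"
    r".*?Err=(?P<err>0x[0-9a-fA-F]+)"
)


def tally_edac(lines):
    """Count EDAC messages by (controller, kind, row, error code)."""
    counts = Counter()
    for line in lines:
        m = EDAC_RE.search(line)
        if m is None:
            continue  # not an EDAC line in the expected shape
        key = (int(m["mc"]), m["kind"], int(m["row"]), m["err"].lower())
        counts[key] += 1
    return counts
```

Feeding it the lines of the message log (for example from `open("/var/log/messages")`, assuming that is where the kernel messages end up on this system) and printing `counts.most_common()` would show whether every message really points at the same controller, row, and error code.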
Being located in Germany makes the "just return it to the dealer" proposal quite unattractive.
best regards
---
Michael Schumacher
PAMAS Partikelmess- und Analysesysteme GmbH
Dieselstr.10, D-71277 Rutesheim
Tel +49-7152-99630
Fax +49-7152-996333
Managing director: Gerhard Schreck
Commercial register B Stuttgart HRB 252024
Run a memtest instead. If it fails, simply replace the memory.
CentOS mailing list CentOS@centos.org http://lists.centos.org/mailman/listinfo/centos
On Tuesday 06 October 2009 09:28, Michael Schumacher wrote:
Question1: how many recoverable RAM errors are acceptable?
No errors are acceptable.
Being located in Germany makes the "just return it to the dealer" proposal quite unattractive.
I don't understand why you can't return the memory itself, especially since you say this is a new machine.
On Tuesday, 06.10.2009 at 15:28 +0200, Michael Schumacher wrote:
The machine is a new Tyan S5397 mobo with 16GB Kingston RAM KVR667D2D4F5K2/8G
Simply open an RMA with Kingston; they will send you replacement memory.
Chris
financial.com AG
Munich head office/Hauptsitz München: Maria-Probst-Str. 19 | 80939 München | Germany
Frankfurt branch office/Niederlassung Frankfurt: Messeturm | Friedrich-Ebert-Anlage 49 | 60327 Frankfurt | Germany
Management board/Vorstand: Dr. Steffen Boehnert | Dr. Alexis Eisenhofer | Dr. Yann Samson | Matthias Wiederwach
Supervisory board/Aufsichtsrat: Dr. Dr. Ernst zur Linden (chairman/Vorsitzender)
Register court/Handelsregister: Munich – HRB 128 972 | Sales tax ID number/St.Nr.: DE205 370 553
Hi everybody,
Thanks for your immediate responses. I will replace the board, but I am wondering what the error message actually means.
Oct 16 14:07:36 xenserver1 kernel: EDAC MC0: UE row 0, channel-a= 0 channel-b= 1 labels "-": NON-FATAL recoverable (Branch=0 DRAM-Bank=0 Buffer ID = 0 RDWR=Read RAS=0 CAS=0 NON-FATAL recoverable Err=0x2000 (FB-DIMM Configuration Write error on first attempt))
As I understand it, the system logs this error when configuration data is written to the RAM configuration.
The error happens precisely once a second.
Why the * would the kernel reprogram the RAM configuration once every second?
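The "precisely once a second" observation can be verified from the log timestamps. Below is a rough sketch under stated assumptions: the helper name `edac_intervals` is hypothetical, the lines are assumed to start with a classic syslog timestamp such as `Oct 16 14:07:36`, and because that format carries no year, one is supplied as a parameter (2009 here, taken from the dates in this thread). Month names are parsed with the default C locale.

```python
from datetime import datetime


def edac_intervals(lines, year=2009):
    """Return the seconds between consecutive EDAC lines, in input order."""
    stamps = []
    for line in lines:
        if "EDAC MC" not in line:
            continue  # skip anything that is not an EDAC message
        # The first three whitespace-separated fields are month, day, time.
        stamp = " ".join(line.split()[:3])
        stamps.append(datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S"))
    # Pair each timestamp with its successor and take the difference.
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]
```

If every value in the returned list is `1.0`, the messages are strictly periodic, which would suggest a periodic check or poll rather than random memory events; that interpretation is a guess, not something this thread confirms.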
best regards
Michael Schumacher
michael.schumacher@pamas.de
Hi,
To finish the story: the error was actually caused by a faulty motherboard. Tyan replaced it without much trouble. Be advised that Memtest86+ did not detect the problem.