[CentOS] data corruption on AMD AM2 systems with 4GB of RAM or more

Mon Jun 4 06:19:48 UTC 2007
Akemi Yagi <amyagi at gmail.com>

On 5/30/07, Akemi Yagi <amyagi at gmail.com> wrote:
> On 5/30/07, Feizhou <feizhou at graffiti.net> wrote:
> > Dan Halbert wrote:
> > > Feizhou wrote:
> > > I searched for the commit id in the Redhat bugzilla, and found it for
> > > RHEL5: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=238709. That
> > > bug says the fix has been backported and is in RHEL5 kernel
> > > 2.6.18-18.el5. But I don't see anything about backporting to RHEL4.
> >
> > I see. So forget RAM >= 4GB on Centos 4 on AMD AM2. Hmmph.
>
> Well, not really.  There seems to be a test kernel (2.6.9-42.EL) with
> this bug fix at:
>
> http://people.redhat.com/coldwell/kernel/bugs/223238/
>
> Therefore, upstream may be working on backporting for RHEL4
> (hopefully).  Note also that the official patched version for CentOS 5
> will not be available for a while either.  The 2.6.18-18.el5 kernel
> might be for RHEL 5.1.
>
> Because the patch is available now, another option is to rebuild the
> kernel by applying it.  Certainly not for everyone but if the fix is
> needed right now, this is the only option.
>
> Akemi

The patch file for this bug did not apply to the CentOS source as is.
I have recreated it for CentOS 5.0 x86_64 (see below) and was able to
rebuild kernels.  If you ever decide to do the same, here's the
modified patch for pci-gart.c:

--- 2.6-git.orig/arch/x86_64/kernel/pci-gart.c
+++ 2.6-git/arch/x86_64/kernel/pci-gart.c
@@ -523,6 +523,10 @@
        gatt = (void *)__get_free_pages(GFP_KERNEL, get_order(gatt_size));
        if (!gatt)
                panic("Cannot allocate GATT table");
+       if (change_page_attr_addr((unsigned long)gatt, gatt_size >> PAGE_SHIFT, PAGE_KERNEL_NOCACHE))
+               panic("Could not set GART PTEs to uncacheable pages");
+       global_flush_tlb();
+
        memset(gatt, 0, gatt_size);
        agp_gatt_table = gatt;

================================
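
For anyone wondering what the second argument in the patch works out to, here is a minimal standalone sketch of the arithmetic. It is not kernel code. It assumes 4 KiB pages (PAGE_SHIFT of 12, as on x86_64) and one 32-bit GATT entry per aperture page. The helper names and the 64 MiB aperture in the usage note are made up for illustration and do not come from the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages, as on x86_64. */
#define EXAMPLE_PAGE_SHIFT 12

/*
 * Hypothetical helper: size in bytes of a GATT that covers
 * aperture_bytes, assuming one 32-bit entry per aperture page.
 */
static size_t example_gatt_size(uint64_t aperture_bytes)
{
    return (size_t)(aperture_bytes >> EXAMPLE_PAGE_SHIFT) * sizeof(uint32_t);
}

/*
 * Hypothetical helper: page count computed the same way as the
 * patch's second argument (gatt_size >> PAGE_SHIFT).
 */
static size_t example_gatt_pages(size_t gatt_size)
{
    return gatt_size >> EXAMPLE_PAGE_SHIFT;
}
```

Under these assumptions, example_gatt_size(64ULL << 20) gives 65536 bytes for a 64 MiB aperture, and example_gatt_pages(65536) gives 16, so the patch would mark 16 pages as uncacheable in that case.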

Save the patch above as pci_new.patch in the rpm SOURCES directory,
then edit the kernel-2.6.spec file as follows:

Add this at around line 935:
Patch40000: pci_new.patch

Add this at around line 1908:
%patch40000 -p1

Good luck,
Akemi