[CentOS] Re: pagecache corruption on Tyan S3870

Thu Mar 1 17:10:58 UTC 2007
Scott Silva <ssilva at sgvwater.com>

Dan Halbert spake the following on 2/28/2007 8:21 PM:
> A couple of months ago I reported some problems with a batch of Tyan
> K8SSA (S3870) based machines. We are continuing to have an odd problem
> with these boxes, and if anyone has seen something similar elsewhere,
> I'd appreciate hearing about it.
> These boxes are running CentOS 4.4 x86_64 with kernel
> 2.6.9-42.0.3.ELsmp. They are dual Opteron 265s (dual core) with 4x2GB
> DIMMs. The DIMMs used to be mixed sizes, but Tyan recommended making
> them all the same, and the vendor made the substitutions. We have also
> clocked the memory down from 400 MHz to 266 MHz, also on the advice of
> Tyan.
> The symptom is that some large (700MB to >1GB) files opened for read and
> then closed show corruption in the pagecache. One or more 4k blocks in a
> file will be completely trashed. It's as if a random page of other data
> is substituted. A reboot or a flush of the pagecache fixes the problem,
> so it's only in the pagecache, not on disk. We run regular MD5
> checksums of the files, which is how we detect the problem; our
> application also crashes from time to time.
> We have some older Tyan motherboards that don't show this problem. At
> this point it seems it is either a hardware problem or a kernel
> motherboard-support problem, but it's pretty baffling.
> Thanks,
> Dan
Have you tried a newer kernel to see if it changes the problem?
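
If it helps to pin down exactly which 4k blocks get trashed, a per-block
checksum may be more telling than one MD5 over the whole file. Below is a
minimal sketch in Python, not anything taken from Dan's setup: the function
names and the 4096-byte block size are my own assumptions. The idea is to
record per-block digests from a read that is known to be good (for example
right after a reboot) and compare them with a later read.

```python
import hashlib

# Assumed block size: 4096 bytes, to match the 4k blocks described above.
BLOCK_SIZE = 4096


def block_digests(path, block_size=BLOCK_SIZE):
    """Return one MD5 hex digest per block_size chunk of the file."""
    digests = []
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block_size)
            if not chunk:
                break
            digests.append(hashlib.md5(chunk).hexdigest())
    return digests


def changed_blocks(baseline, current):
    """Return the indices of blocks whose digests differ between two passes."""
    differing = [i for i, (a, b) in enumerate(zip(baseline, current)) if a != b]
    # If the two passes have different lengths, report the unmatched blocks too.
    shorter, longer = sorted((len(baseline), len(current)))
    differing.extend(range(shorter, longer))
    return differing
```

Multiplying a reported index by the block size gives the byte offset of the
bad block, which could show whether the corruption lands at the same offsets
each time or moves around.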

