On Fri, Aug 1, 2008 at 1:43 PM, William L. Maltby <CentOS4Bill at triad.rr.com> wrote:
> On Fri, 2008-08-01 at 16:13 -0400, Toby Bluhm wrote:
>> Mufit Eribol wrote:
>> ><snip>
>..... ..... that you would correctly try to
> fsck the *device*.
>

First, back up the data...

It is possible to run "fsck" with a media test flag. Any bad blocks it finds are assigned to dummy files. Inadvertently reading one of these files can take a drive off line.

One reason a device will go off line is a media error, or a condition that "smartd" treats as a pending risk to data..... The root cause of the error should be understood. Smartd tends to be cautious, but it does identify pending problems.

One puzzle can be the loss of log file data. It is sometimes possible to see events on a live system that later vanish after a reboot, because the log entries were still buffered in memory and had not yet been written to disk. Sending logs to another 'log system' can be helpful, and is a good idea on production systems for exactly this reason.

--
NiftyCluster
T o m M i t c h e l l
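
P.S. To illustrate the point about sending logs to another 'log system': below is a minimal sketch in Python, not a recommendation of any particular setup. It uses the standard library's logging.handlers.SysLogHandler, which sends each record over the network as it is emitted, so a message does not depend on the local disk surviving the crash. The function name make_remote_logger, the logger name "disk-watch", and the default port 514 are assumptions made for the example. A real site would more likely configure its syslog daemon to forward messages.

```python
import logging
import logging.handlers
import sys


def make_remote_logger(host, port=514, name="disk-watch"):
    """Return a logger that sends each record to a syslog host over UDP.

    The function name, logger name, and default port are illustrative
    assumptions, not taken from any existing configuration.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    # With a (host, port) tuple SysLogHandler uses UDP by default.
    handler = logging.handlers.SysLogHandler(address=(host, port))
    handler.setFormatter(
        logging.Formatter("%(name)s: %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    # Pass the log host on the command line; defaults to the local machine.
    target = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    log = make_remote_logger(target)
    log.warning("example event: device reported a media error")
```

UDP delivery is not guaranteed, so this trades reliability for simplicity. The benefit is that the record leaves the machine at the moment it is logged instead of waiting in a local buffer.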