On Mon, 2008-08-25 at 10:43 +0200, Lorenzo Quatrini wrote:
> William L. Maltby wrote:
>> Yep. Only a few copies of the superblock and the i-node tables are written by the file system make process. That's why it's important for file systems in critical applications to be created with the check forced. Folks should also keep in mind that the default check, read-only, is really not sufficient for critical situations. The full write/read check should be forced on *new* partitions/disks.
First, a correction. I earlier mentioned "-C" as causing the read/write check for mke2fs. It is "-c -c". I must've been thinking of some other FS software.
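For the record, that looks something like this (a sketch only; I'm assuming /dev/sdb1 is the new, still-empty partition):

  mke2fs -c -c /dev/sdb1

The doubled "-c" makes mke2fs hand the device to badblocks for the slower read/write test instead of the default read-only scan. "Man mke2fs" for the details.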
> So again my question is: can I use dd to "test" the disk? What about
> dd if=/dev/sda of=/dev/sda bs=512
It ought to do what you think it would. But ...
> Is this safe on a fully running system? Does it have to be done at runlevel 1 or from a live CD?
Safe on a fully running system? Probably. I suggest a test before you do it on an important system; I've never had the urge to do it the way you suggest. It can be done at run level 1 or from a live CD too. But ...
> I think this is "better" than the manufacturer way, as dd is always present and works with any brand.
s/better/convenient/ # IMO
Now for the "buts". I presume that there are still two basic types of media errors on HDs, "hard" and "soft". Hard errors are those that are not recoverable through the normal hardware CRC check process (or whatever they use these days); soft errors are those that are recoverable.
Hard errors are always reported to the OS; soft errors are not, IIRC. So you could have recovered media failures that never get reported to the OS. If these failures are early indicators of deteriorating media, you will not be notified of them.
For this reason, hardware-specific diagnostic software is "better". Further, the "smart" capabilities are *really* hardware specific and will detect and report things that normal read/write activities, like dd, cannot.
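To see what I mean (assuming the smartmontools package is installed; /dev/sda is just an example device):

  smartctl -a /dev/sda

That dumps the drive's overall health assessment plus the vendor attributes (reallocated sector counts, pending sectors and the like), which are exactly the early-deterioration hints a plain dd pass will never show you.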
As to running on a live system, you might not want to for several reasons. If you are using the system to do anything useful at the time, there will be a big hit on responsiveness. Full kernel preemption is still not the norm in Linux: 2.6 gained an optional preemptible-kernel setting (CONFIG_PREEMPT), but enterprise kernels like the CentOS ones are generally built without it, so a process grinding through the kernel's I/O paths can hold everything else off (somebody please correct me if I've missed an earth-shattering advancement here).
Because dd is fast, it will consume all available I/O capacity, especially run the way you propose. Further, at bs=512 you will be making a *LARGE* number of system calls, degrading responsiveness even more. The system could become so slow to respond that one might think it had "frozen".
If you insist on doing this, I would suggest something like
nice -n 19 dd if=/dev/xxxx of=/dev/xxxx bs=16384 &  # 19 = lowest priority; pick your own
"Man nice" for details. This helps a little bit. I've not tried to see how much responsiveness can be "recovered". A larger "bs=" will reduce system calls, but will increase buffer sizes and usage and increase I/O load. Even if you omit the trailing "&" to run in foreground, the responsiveness may be so slow that a <CTL>-<C> may appear to fail and make you think the system is "frozen"... for a little while.
The larger "bs=" would seem to defeat what you want from "bs=512". Not so. Since failure detection happens in the drive hardware, it will still detect failures and handle them as it normally would; "bs=" is only a blocking factor. Your "512" only saves doing the math to figure out which "sector" an error really hit, and it carries a large cost in system calls. BTW, you don't really know what the sector size is these days; it may not be 512. Back in the old days, sector size was selectable via jumpers. Today I suspect drives don't have sectors in the same way/size as they used to.
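If you want to know what the kernel at least *reports* as the sector size, here's a quick check (assuming util-linux's blockdev and a 2.6 sysfs; /dev/sda again as the example):

  blockdev --getss /dev/sda
  cat /sys/block/sda/queue/hw_sector_size

Both print the logical sector size in bytes. What the platters actually do internally is the drive's own business.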
Closing (really, they are!) arguments:

1. Any OS-level test, as opposed to a hardware-specific one, will be less rigorous. This is "optimal" only if other factors trump reliability. Usually "convenience" and "portability" will not trump reliability on server or critical platforms.
2. The "smart" feature has capabilities of which you may not be aware. One of these is to run in such a way as to minimize performance impact on a live system. If you've run "makewhatis", then "man -k smart" or "apropos smart" will get you started on the reading you may want to do.
3. Hardware-specific diagnostics and repair utilities from the manufacturer (this includes the "smart" capability of the drives) will be more rigorous and reliable than general-purpose utilities.
4. The manufacturer utilities can "repair" media failures as they are detected. If you are taking the time to run diagnostics, why not fix failures at the same time? If you believe that the "dd" way can accomplish the same thing (through the alternate block assignment process), why not grab a drive with known bad sectors and run a test to see if it will be satisfactory to you? The reallocated-count check sketched below is one way to score that test.
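On points 2 and 4, a couple of hedged sketches, again assuming smartmontools and /dev/sda as the example device:

  smartctl -t long /dev/sda      # start the drive's extended self-test
  smartctl -l selftest /dev/sda  # read the results once it finishes

The long self-test runs inside the drive itself, so the host barely notices it. And to score the dd experiment from point 4:

  smartctl -A /dev/sda | grep -i reallocated

If the reallocated sector count climbs after the run, the drive remapped something behind your back.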
> Lorenzo
<snip sig stuff>