[CentOS] Comparing directories recursively

Sat Oct 28 12:10:34 UTC 2017
Rich <centos at foxengines.net>


On Fri, Oct 27, 2017 at 05:27:22PM -0400, H wrote:
> What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
> Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?

rsync obviously handles the 'exists in only one place' case, and it also offers checksum comparisons via -c (the checksum is MD5 as of protocol 30, first supported in 3.0.0, per the man page)...

-c, --checksum
      This changes the way rsync checks if the files have been changed
      and  are in need of a transfer.  Without this option, rsync uses
      a "quick check" that (by default) checks if each file’s size and
      time of last modification match between the sender and receiver.
      This option changes this to compare a 128-bit checksum for  each
      file  that  has a matching size.  Generating the checksums means
      that both sides will expend a lot of disk I/O  reading  all  the
      data  in  the  files  in  the transfer (and this is prior to any
      reading that will be done to transfer changed  files),  so  this
      can slow things down significantly.

      The  sending  side generates its checksums while it is doing the
      file-system scan that builds the list of  the  available  files.
      The  receiver  generates  its  checksums when it is scanning for
      changed files, and will checksum any file that has the same size
      as the corresponding sender’s file:  files with either a changed
      size or a changed checksum are selected for transfer.

      Note that rsync always verifies that each transferred  file  was
      correctly  reconstructed  on  the  receiving  side by checking a
      whole-file checksum that is generated  as  the  file  is  trans‐
      ferred,  but  that automatic after-the-transfer verification has
      nothing to do with this option’s before-the-transfer "Does  this
      file need to be updated?" check.
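
If rsync is not an option, the same size-then-checksum idea can be sketched in a few lines of Python. This is only an illustration of the comparison described above, not rsync's actual implementation; the function names (file_md5, list_files, compare_trees) are made up for this example. It assumes that only regular file contents matter, so it ignores timestamps, permissions and empty directories, and it follows symlinks.

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def list_files(root):
    """Return the set of file paths under root, relative to root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skips broken symlinks and other non-regular entries.
            if os.path.isfile(full):
                found.add(os.path.relpath(full, root))
    return found


def compare_trees(left, right):
    """Return (only_left, only_right, differing) as sorted lists of relative paths."""
    left_files = list_files(left)
    right_files = list_files(right)
    differing = []
    for rel in sorted(left_files & right_files):
        a = os.path.join(left, rel)
        b = os.path.join(right, rel)
        # A size mismatch already proves a difference, so the files are
        # hashed only when their sizes match.
        if os.path.getsize(a) != os.path.getsize(b) or file_md5(a) != file_md5(b):
            differing.append(rel)
    only_left = sorted(left_files - right_files)
    only_right = sorted(right_files - left_files)
    return only_left, only_right, differing
```

Calling compare_trees("/path/to/source", "/path/to/copy") (placeholder paths) returns three lists: files found only in the first tree, files found only in the second, and files present in both whose size or MD5 differs. As with -c, every same-sized file pair gets read in full, so expect a lot of disk I/O on large trees.
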

      For  protocol  30  and  beyond  (first  supported in 3.0.0), the
      checksum used is MD5.  For older protocols, the checksum used is