What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
I found a mention of hashdeep on the 'net, which involves first running it against the first directory to generate a file with checksums, and then running it a second time against the second directory using this checksum file. Hashdeep, however, is not in the CentOS repository and, according to the 'net, is possibly no longer maintained.
I also found md5deep which seems similar.
Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?
On Fri, 27 Oct 2017 17:27:22 -0400 H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
diff --brief -r dir1/ dir2/
might do what you need.
If you also want to see differences for files that may not exist in either directory:
diff --brief -Nr dir1/ dir2/
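A minimal sketch of what this looks like in practice, using two hypothetical scratch directories (the file names here are made up for illustration):

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d)
printf 'same\n' > "$src/kept.txt";   cp "$src/kept.txt" "$dst/"
printf 'old\n'  > "$src/edited.txt"; printf 'new\n' > "$dst/edited.txt"
printf 'solo\n' > "$src/only-here.txt"

# --brief reports *that* files differ, not how; -r recurses.
# It prints "Files ... differ" for mismatched contents and
# "Only in ..." for files present on one side only, and exits
# non-zero when any difference is found.
diff --brief -r "$src" "$dst" || true

rm -rf "$src" "$dst"
```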
On 10/27/2017 05:35 PM, Frank Cox wrote:
On Fri, 27 Oct 2017 17:27:22 -0400 H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
diff --brief -r dir1/ dir2/
might do what you need.
If you also want to see differences for files that may not exist in either directory:
diff --brief -Nr dir1/ dir2/
But is diff not best suited for text files?
On Fri, 27 Oct 2017 17:51:46 -0400 H wrote:
But is diff not best suited for text files?
The standard unix diff will show if the files are the same or not:
$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
Since you don't need to know exactly how the files are different (the mere fact that they are different is what you do want to know), that should do it.
On October 27, 2017 6:23:45 PM EDT, Frank Cox theatre@sasktel.net wrote:
On Fri, 27 Oct 2017 17:51:46 -0400 H wrote:
But is diff not best suited for text files?
The standard unix diff will show if the files are the same or not:
$ diff 1.bin 2.bin
Binary files 1.bin and 2.bin differ
If there is no output from the command, it means that the files have no differences.
Since you don't need to know exactly how the files are different (the mere fact that they are different is what you do want to know), that should do it.
--
MELVILLE THEATRE ~ Real D 3D Digital Cinema ~ www.melvilletheatre.com
_______________________________________________
CentOS mailing list
CentOS@centos.org
https://lists.centos.org/mailman/listinfo/centos
Ok!
On October 27, 2017 5:35:59 PM EDT, Frank Cox theatre@sasktel.net wrote:
On Fri, 27 Oct 2017 17:27:22 -0400 H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
diff --brief -r dir1/ dir2/
might do what you need.
If you also want to see differences for files that may not exist in either directory:
diff --brief -Nr dir1/ dir2/
Great, used as suggested!
Am 27.10.2017 um 23:27 schrieb H agents@meddatainc.com:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
I found a mention of hashdeep on the 'net, which involves first running it against the first directory to generate a file with checksums, and then running it a second time against the second directory using this checksum file. Hashdeep, however, is not in the CentOS repository and, according to the 'net, is possibly no longer maintained.
I also found md5deep which seems similar.
Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
-- LF
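The two-step workflow above can be sketched end to end. Note the `-exec` terminator must be escaped from the shell as `\;`, the manifest should be written outside the tree being hashed (or `find` will hash the manifest itself), and `md5sum -c` must run from the destination directory so the relative `./` paths resolve:

```shell
set -e
src=$(mktemp -d); dst=$(mktemp -d); list=$(mktemp)
printf 'payload\n' > "$src/file.bin"
cp -au "$src/." "$dst/"

# Source side: hash every regular file into a manifest.
( cd "$src" && find . -type f -exec md5sum {} \; > "$list" )

# Destination side: verify each ./ path against the manifest.
( cd "$dst" && md5sum -c "$list" )    # → ./file.bin: OK

rm -rf "$src" "$dst" "$list"
```

Any file that fails verification is reported as `FAILED`, and `md5sum -c` exits non-zero, which makes the check easy to script.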
On Sat, 28 Oct 2017 00:47:32 +0200 Leon Fauster wrote:
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
Wouldn't diff be faster because it doesn't have to read to the end of every file and it isn't really calculating anything? Or am I looking at this in the wrong way.
Am 28.10.2017 um 01:54 schrieb Frank Cox theatre@sasktel.net:
On Sat, 28 Oct 2017 00:47:32 +0200 Leon Fauster wrote:
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
Wouldn't diff be faster because it doesn't have to read to the end of every file and it isn't really calculating anything? Or am I looking at this in the wrong way.
Fast was not stated as a requirement, although the md5 algorithm is by design "fast", and the question asked how to "compare file hashes". Nevertheless, I also use diff to compare. It depends on your needs ...
-- LF
On October 28, 2017 9:09:49 AM EDT, Leon Fauster leonfauster@googlemail.com wrote:
Am 28.10.2017 um 01:54 schrieb Frank Cox theatre@sasktel.net:
On Sat, 28 Oct 2017 00:47:32 +0200 Leon Fauster wrote:
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
Wouldn't diff be faster because it doesn't have to read to the end of every file and it isn't really calculating anything? Or am I looking at this in the wrong way.
Fast was not stated as a requirement, although the md5 algorithm is by design "fast", and the question asked how to "compare file hashes". Nevertheless, I also use diff to compare. It depends on your needs ...
-- LF
I did end up using diff which seemed to work well.
In article 20171027175431.e265479c4f9b4658fe2179bf@sasktel.net, Frank Cox theatre@sasktel.net wrote:
On Sat, 28 Oct 2017 00:47:32 +0200 Leon Fauster wrote:
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
Wouldn't diff be faster because it doesn't have to read to the end of every file and it isn't really calculating anything? Or am I looking at this in the wrong way.
If the files are the same (which is what the OP is hoping), then diff does indeed have to read to the end of both files to be certain of this. Only if they differ can it stop reading the files as soon as a difference between them is found.
Cheers Tony
On October 27, 2017 6:47:32 PM EDT, Leon Fauster leonfauster@googlemail.com wrote:
Am 27.10.2017 um 23:27 schrieb H agents@meddatainc.com:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
I found a mention of hashdeep on the 'net, which involves first running it against the first directory to generate a file with checksums, and then running it a second time against the second directory using this checksum file. Hashdeep, however, is not in the CentOS repository and, according to the 'net, is possibly no longer maintained.
I also found md5deep which seems similar.
Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?
source:
find . -type f -exec md5sum {} \; > checksum.list
destination:
md5sum -c checksum.list
-- LF
Thank you, saving this for the future.
Hi,
On Fri, Oct 27, 2017 at 05:27:22PM -0400, H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
[snip]
Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?
rsync obviously offers the 'exist in only one place' feature but also offers checksum comparisons (in version 3 and higher, I understand)...
-c, --checksum This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly.
The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender’s file: files with either a changed size or a changed checksum are selected for transfer.
Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.
For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.
Rich.
On October 28, 2017 8:10:34 AM EDT, Rich centos@foxengines.net wrote:
Hi,
On Fri, Oct 27, 2017 at 05:27:22PM -0400, H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
[snip]
Are there other tools for this automatic compare where I am really looking for a list of files that exist in only one place or where checksums do not match?
rsync obviously offers the 'exist in only one place' feature but also offers checksum comparisons (in version 3 and higher, I understand)...
-c, --checksum This changes the way rsync checks if the files have been changed and are in need of a transfer. Without this option, rsync uses a "quick check" that (by default) checks if each file’s size and time of last modification match between the sender and receiver. This option changes this to compare a 128-bit checksum for each file that has a matching size. Generating the checksums means that both sides will expend a lot of disk I/O reading all the data in the files in the transfer (and this is prior to any reading that will be done to transfer changed files), so this can slow things down significantly.
The sending side generates its checksums while it is doing the file-system scan that builds the list of the available files. The receiver generates its checksums when it is scanning for changed files, and will checksum any file that has the same size as the corresponding sender’s file: files with either a changed size or a changed checksum are selected for transfer.

Note that rsync always verifies that each transferred file was correctly reconstructed on the receiving side by checking a whole-file checksum that is generated as the file is transferred, but that automatic after-the-transfer verification has nothing to do with this option’s before-the-transfer "Does this file need to be updated?" check.

For protocol 30 and beyond (first supported in 3.0.0), the checksum used is MD5. For older protocols, the checksum used is MD4.
Rich.
Thank you, this time I used diff.
On 10/27/2017 05:27 PM, H wrote:
What is the best tool to compare file hashes in two different drives/directories such as after copying a large number of files from one drive to another? I used cp -au to copy directories, not rsync, since it is between local disks.
I typically use 'rsync -av -c --dry-run ${dir1}/ ${dir2}/' (or some variation) for this. rsync works just as well on local disks as remote. This isn't as strong of a comparison as even an md5, but it's not a bad one and gives you a quick compare.
You can even use git for this: 'git diff --no-index ${dir1}/ ${dir2}/' and that would be a stronger comparison.