[CentOS-devel] Centos Wiki tip proposal: Copy/verify files from CD/DVD quickly

Wed Mar 28 23:18:37 UTC 2007
Wojciech Pilorz <wpilorz at gmail.com>

John,

Thank you for your comments.

On 3/28/07, John Summerfield <debian at herakles.homelinux.org> wrote:
> Wojciech Pilorz wrote:
> > Hi folks,
> >
> > Could you please look at the text below and the attachments, and
> > comment?
> >
> > Best regards,
> >
> > Wojtek
> > ---------------------------------
> > CentOS Wiki tip proposal: Copy/verify files from CD/DVD quickly
> >
> > If you want to copy a lot of files from CD or DVD, or verify md5 or
> > sha1 of a large number of files, keep in mind that CD/DVD drive seeks
> > are expensive; if they are avoided as much as possible, the task can
> > be completed faster (and more quietly).
> >
> > Method 1.
> > Copy the image to hard disk and loopback mount it.
> >
> > a) copying image
> > Assuming the device is /dev/dvd and the disc is either a single-session
> > CD or CD-RW recorded in DAO mode, or a DVD+RW recorded with
> > growisofs:
> > $ dd if=/dev/dvd bs=2k of=hard_disk_directory/imagename.iso
> >
> > If the media is a single-session CD or CD-RW recorded in TAO mode,
> > there are unreadable sectors at the end of the session, so you need to
> > determine the image size first:
> >
> > $ isosize -x /dev/dvd
> >
> > results are like this:
> >
> > sector count: 346739, sector size: 2048
> >
> > note sector count and read image using dd:
> >
> > $ dd if=/dev/dvd bs=2k count=sector_count_value of=destination_image_file_path
> >
> > Then become root (e.g. using su), make the mount point if needed,
> > and mount the image:
> > # mkdir -p /mnt/loop/myimage
> > # mount -o ro,loop destination_image_file_path /mnt/loop/myimage
> >
> > Now you can use files in /mnt/loop/myimage, CD/DVD media is not needed.
> >
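
The two-step isosize/dd procedure above could be scripted roughly as
follows. This is only a sketch: it assumes the output of "isosize -x"
looks exactly like the sample line quoted above, and the function name
parse_isosize is made up for illustration.

```shell
#!/bin/sh
# Sketch only: extract sector count and sector size from a line of
# `isosize -x` output read on stdin, printing them as "count size".
# Assumes the "sector count: N, sector size: M" format shown in the tip.
parse_isosize() {
    sed -n 's/.*sector count: \([0-9][0-9]*\), sector size: \([0-9][0-9]*\).*/\1 \2/p'
}

# Example use (not run here; device and output path are placeholders):
#   set -- $(isosize -x /dev/dvd | parse_isosize)
#   dd if=/dev/dvd bs="$2" count="$1" of=destination_image_file_path
```

Taking the block size from isosize as well, instead of hard-coding 2k,
keeps the count and the block size consistent with each other.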
> > NOTES on loopback method:
> > - needs root for loopback mount
> Can be overcome with appropriate mount options in /etc/fstab or
> /etc/auto.misc (for example)
> [summer at ns ~]$ grep loop /etc/auto.misc | head -2
> S1      -fstype=iso9660,ro,nosuid,nodev,noexec,loop :/var/local/mirrors/linux/SUSE/10.0/i386/ISO/SUSE-10.0-CD-i386-GM-CD1.iso
> S2      -fstype=iso9660,ro,nosuid,nodev,noexec,loop :/var/local/mirrors/linux/SUSE/10.0/i386/ISO/SUSE-10.0-CD-i386-GM-CD2.iso
>

Thank you for pointing that out. Still, this requires admin privileges to
place the entry in fstab, or a friendly admin.

> > - will work only for single session media (or first session of
> > multi-session media)
> > - no seeks when reading CD/DVD media, but the entire media is read;
> > justified only if you need to read most of the files, or a number of
> > files multiple times
> >
> > Method 2
> > Access files in physical media order
> >
> > You may observe that (most of the time) the inode numbers of files in
> > an ISO9660 filesystem mounted on Linux increase as the start sector
> > number of the file increases (within a single session).
> > So, to minimize seek operations, access files in increasing inode
> > order.
> >
> > Please use attached scripts
> > flist_by_inum.pl - sort file list by file inode number
> > mdlist_by_inum.pl - sort md5/sha1 list by file inode number
> >
> > Mount the CD/DVD media, let us assume the mount point is /mnt/dvd
> >
> > If you want to copy some or all regular files to destination_directory, run
> >
> > cd /mnt/dvd
> > find . -type f -print0 | flist_by_inum.pl -0 | cpio -p0md destination_directory
> >
> > You can of course give the find utility some files/directories instead
> > of ., as well as any criteria.
> > If you are sure no file name contains whitespace, etc., you can remove
> > -0 and 0:
> >
> > find . -type f -print | flist_by_inum.pl | cpio -pmd destination_directory
>
> I've not tested this, but "it should work."
>
> find . -type f \
>         | while IFS= read -r f ; do echo "$(stat -c %i "$f") $f" ; done \
>         | sort -n
>

Thank you, nice trick;
I would optimize it a bit (about 20x on my system) and remove numbers:

find . -type f -print0 | xargs -r0 stat -c '%i %n' | sort -n | sed 's/^[0-9]\+ //'

This is only about two times slower than my Perl script, quite good!
And perl is not needed!
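
For reuse, the pipeline above could be wrapped in a small shell function.
This is only a sketch, assuming GNU find, xargs, stat, sort and sed; the
function name list_by_inode is made up for illustration, and file names
containing newlines are not handled because stat prints one name per line.

```shell
#!/bin/sh
# Sketch only: list regular files under a directory in increasing
# inode order, using the find/stat/sort/sed pipeline from this thread.
# $1 is the directory to scan (hypothetical parameter, defaults to ".").
list_by_inode() {
    find "${1:-.}" -type f -print0 \
        | xargs -r0 stat -c '%i %n' \
        | sort -n \
        | sed 's/^[0-9]\+ //'   # drop the leading inode number again
}

# Example use (not run here): copy from a mounted disc in inode order.
#   cd /mnt/dvd && list_by_inode . | cpio -pmd destination_directory
```

Since the list is newline-separated, this matches the variant of the cpio
command without -0.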

> >
> > If you want to verify SHA1 of the files, do the following:
> > cd /mnt/dvd
> > cat sha1sum_file | mdlist_by_inum.pl | sha1sum -wc > ~/verify.rslt 2>~/verify.msg
> >
> > NOTES:
> > - no root privileges needed
> > - should work for multi-session discs (although with some seeks)
> > - only files needed are accessed
> > - method also useful for accessing files on loopback-mounted images
> > located on DVD
>
> Much of this is "black majick" that few users will even think of pursuing.
>
> I think media verification should be built directly into the burning
> tools, cdrecord (or whatever it is now) and growisofs, as it is in
> hdiutil in OS X. It's easy to find how to verify a burn if it's built
> into the burning tool and documented in the man page. Doing it in the
> GUI as at present doesn't work for those who don't use the gui.
>
>
> I can attest that seeking is a major pain. I've been having problems
> with DVD coasters, and no indication from growisofs that there was a
> problem. I tried a file-by-file comparison by md5sum but it really is
> too tedious.
>
> Since most of my DVDs are full of rpms, a suitable incantation of rpm
> isn't too bad.
>
> growisofs could compute an md5sum (or sha1sum) as it writes, then read
> back and check. As it knows how much to read, it's far more reliable
> than most users {c,w}ould do.

This would detect an obviously bad recording.

But media do deteriorate with time, mishandling, etc.

Also, some media could be OK on the writer it was burnt on, but not on
another drive.

When I record my files on CD/DVD, I almost always include files containing
MD5 and SHA1 for all other files on the media.
Verifying is then very easy; it is just a good idea to sort by inode number
if the files are read from CD/DVD media, and the output needs to be filtered.
This allows me to detect problems with image creation, e.g. Joliet
names clashing or being truncated.
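
The same stat trick can reorder a checksum list before it is passed to
md5sum or sha1sum. The sketch below is a stand-in written for this note,
not the attached mdlist_by_inum.pl; the function name sumlist_by_inode is
hypothetical. It assumes GNU stat and the usual "digest, two-character
separator, file name" line format, and it does not handle the escaped
lines that the checksum tools emit for unusual file names.

```shell
#!/bin/sh
# Sketch only: read an md5sum/sha1sum style list on stdin and print the
# same lines sorted by the inode number of the file each line names.
# Must be run from the directory the file names are relative to.
sumlist_by_inode() {
    while IFS= read -r line; do
        name=${line#* }     # drop the digest and the first separator char
        name=${name#?}      # drop the second separator char (" " or "*")
        # Missing files get inode 0 so they are still listed (and reported
        # later by the checksum tool).
        inode=$(stat -c %i -- "$name" 2>/dev/null) || inode=0
        printf '%s\t%s\n' "$inode" "$line"
    done | sort -n | cut -f2-
}

# Example use (not run here):
#   cd /mnt/dvd && sumlist_by_inode < sha1sum_file | sha1sum -c
```

This calls stat once per line, so it will likely be slower than the xargs
variant, but it keeps the checksum lines intact.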


>
>
>
> --
>
> Cheers
> John
>
I am thinking about changing the proposed tip as follows:

- remove method 1, which is standard and rather obvious, to make the tip shorter
- add a description of sorting with stat from coreutils, as suggested by John



Thank you again,

Wojtek