Wojciech Pilorz wrote:
> CentOS Wiki tip proposal: Copy/verify files from CD/DVD quickly
> If you want to copy a lot of files from CD or DVD, or verify md5 or
> sha1 of a large number of files, keep in mind that CD/DVD drive seeks
> are expensive; if they are avoided as much as possible, the task can
> be completed faster (and more quietly).
> Method 1.
> Copy the image to hard disk and loopback mount it.
> a) Copying the image
> Assuming the device is /dev/dvd and the disc is a single-session CD or
> CD-RW recorded in DAO mode, or a DVD+RW recorded with growisofs:
> $ dd if=/dev/dvd bs=2k of=hard_disk_directory/imagename.iso
> If the media is a single-session CD or CD-RW recorded in TAO mode,
> there are unreadable sectors at the end of the session, so you need to
> determine the size first:
> $ isosize -x /dev/dvd
> results are like this:
> sector count: 346739, sector size: 2048
> note sector count and read image using dd:
> $ dd if=/dev/dvd bs=2k count=sector_count_value of=destination_image_file_path
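
For anyone scripting this step, here is a minimal sketch that pulls the two numbers out of the isosize output and hands them to dd. It assumes the output has exactly the form quoted above; parse_isosize and copy_iso are made-up names, not part of any package.

```shell
# parse_isosize: read "isosize -x" output on stdin and print "<count> <size>".
# Prints nothing if the line does not have the expected form.
parse_isosize() {
    sed -n 's/^sector count: \([0-9][0-9]*\), sector size: \([0-9][0-9]*\).*/\1 \2/p'
}

# copy_iso DEVICE IMAGE: copy only the sectors that belong to the filesystem,
# so the unreadable sectors at the end of a TAO session are never touched.
copy_iso() {
    geometry=$(isosize -x "$1" | parse_isosize)
    if [ -z "$geometry" ]; then
        echo "copy_iso: could not parse isosize output for $1" >&2
        return 1
    fi
    count=${geometry% *}    # text before the space: sector count
    size=${geometry#* }     # text after the space: sector size in bytes
    dd if="$1" of="$2" bs="$size" count="$count"
}
```

Usage would be something like: copy_iso /dev/dvd hard_disk_directory/imagename.iso
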
> Then, become root (e.g. using su), and
> make mount point if needed, mount the image:
> # mkdir -p /mnt/loop/myimage
> # mount -o ro,loop destination_image_file_path /mnt/loop/myimage
> Now you can use files in /mnt/loop/myimage, CD/DVD media is not needed.
> NOTES on loopback method:
> - needs root for loopback mount
Can be overcome with appropriate mount options in /etc/fstab or 
/etc/auto.misc (for example)
[summer at ns ~]$ grep loop /etc/auto.misc | head -2
S1      -fstype=iso9660,ro,nosuid,nodev,noexec,loop 
S2      -fstype=iso9660,ro,nosuid,nodev,noexec,loop 

> - will work only for single session media (or first session of
> multi-session media)
> - no seeks when reading CD/DVD media, but the entire disc is read;
> justified only if you need to read most of the files, or a number of
> files multiple times
> Method 2
> Access files in physical media order
> You can observe that (most of the time) inode numbers of files in an
> ISO9660 filesystem mounted on Linux increase as the start sector
> number of the file increases (within a single session).
> So, to minimize seek operations, access files in increasing inode
> order.
> Please use attached scripts
> flist_by_inum.pl - sort file list by file inode number
> mdlist_by_inum.pl - sort md5/sha1 list by file inode number
> Mount the CD/DVD media, let us assume the mount point is /mnt/dvd
> If you want to copy some or all regular files to destination_directory, run
> cd /mnt/dvd
> find . -type f -print0 | flist_by_inum.pl -0 | cpio -p0md destination_directory
> You can of course give find utility some files/directories instead of .,
> as well as any criteria.
> If you are sure no file name contains white-space, etc., you can remove -0
> and 0:
> find . -type f -print | flist_by_inum.pl | cpio -pmd destination_directory

I've not tested this, but "it should work."

find . -type f \
	| while IFS= read -r f ; do echo "$(stat -c %i "$f") $f" ; done \
	| sort -n
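
A variant that should survive white-space in file names, offered as a sketch only. It assumes GNU find (for -printf) and GNU sort and cut with -z (cut -z needs a fairly recent coreutils). list_by_inode is a made-up name; it is not the attached flist_by_inum.pl, just a stand-in for "flist_by_inum.pl -0".

```shell
# list_by_inode [DIR]: print the regular files under DIR (default: .),
# NUL-terminated, ordered by increasing inode number.
list_by_inode() {
    # Each record is "<inode> <path>" terminated by NUL.
    find "${1:-.}" -type f -printf '%i %p\0' \
        | sort -z -n \
        | cut -z -d ' ' -f 2-    # drop the inode, keep the whole path
}
```

Used in place of the find/perl pair, it would look like: cd /mnt/dvd && list_by_inode . | cpio -p0md destination_directory
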

> If you want to verify SHA1 of the files, do the following:
> cd /mnt/dvd
> cat sha1sum_file | mdlist_by_inum.pl | sha1sum -wc > ~/verify.rslt 2>~/verify.msg
> - no root privileges needed
> - should work for multi-session discs (although with some seeks)
> - only files needed are accessed
> - method also useful for accessing files on loopback-mounted images
> located on DVD
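
The checksum list can be reordered the same way without perl. This is a rough sketch, assuming GNU stat and the usual "hash, two separator characters, path" layout of md5sum/sha1sum lists, with paths relative to the current directory and no escaped file names. sumlist_by_inode is a made-up name, not the attached mdlist_by_inum.pl.

```shell
# sumlist_by_inode: read an md5sum/sha1sum list on stdin and print the same
# lines ordered by the inode number of the file each line refers to.
sumlist_by_inode() {
    while IFS= read -r line; do
        rest=${line#* }     # drop the hash and the first separator
        file=${rest#?}      # drop the second separator (' ' or '*')
        inode=$(stat -c %i -- "$file" 2>/dev/null)
        # Missing files sort first as inode 0; the checker reports them.
        printf '%s\t%s\n' "${inode:-0}" "$line"
    done | sort -n | cut -f 2-
}
```

For example: cd /mnt/dvd && sumlist_by_inode < sha1sum_file | sha1sum -wc
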

Much of this is "black majick" that few users will even think of pursuing.

I think media verification should be built directly into the burning 
tools, cdrecord (or whatever it is now) and growisofs, as it is in 
hdiutil in OS X. It's easy to find how to verify a burn if it's built 
into the burning tool and documented in the man page. Doing it in the 
GUI as at present doesn't work for those who don't use the GUI.

I can attest that seeking is a major pain. I've been having problems 
with DVD coasters, and no indication from growisofs that there was a 
problem. I tried a file-by-file comparison by md5sum but it really is 
too tedious.

Since most of my DVDs are full of rpms, a suitable incantation of rpm 
isn't too bad.

growisofs could compute an md5sum (or sha1sum) as it writes, then read 
back and check. As it knows how much to read, it's far more reliable 
than most users {c,w}ould do.
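
Until the tools do it themselves, the read-back check can be approximated by hand when the image file is still on disk. A minimal sketch, assuming GNU stat and sha1sum; verify_readback is a made-up name. It reads back exactly as many bytes as the image holds, so whatever follows the data on the disc is ignored.

```shell
# verify_readback IMAGE DEVICE: succeed if the first size-of-IMAGE bytes
# read from DEVICE have the same SHA-1 as IMAGE.
verify_readback() {
    size=$(stat -c %s -- "$1") || return 1
    want=$(sha1sum < "$1")
    # Read only as much as was written; ignore anything after it.
    got=$(head -c "$size" "$2" | sha1sum)
    [ "$want" = "$got" ]
}
```

For example: verify_readback imagename.iso /dev/dvd && echo "burn OK"
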



