[CentOS] CentOS 4.4 LILO Raid (also e2fsprogs in general)

Sat Sep 2 20:51:48 UTC 2006
Benjamin Karhan <simon at pop.psu.edu>

i noticed that the 4.4 upgrade broke LILO on RAID...
it gave errors on partitions not on the primary disk
   along the lines of "/dev/dis: no such device"...
even explicitly installing the boot sector on each
   device in the array, while it allowed the system
   to boot, gave errors at boot time and did not
   display the CentOS boot screen anymore...

i solved the problem by creating a package for the
   most recent version of LILO and installing that...
   with a few minor changes to "lilo.conf" everything
   was kosher for LILO install and boot...
   i even made a nice new 640x480x256 CentOS boot
   screen for it...
there is one caveat to using the new LILO...
   the new version (or at least any version above 22.x) must be
   booted once (text-mode) before LILO can correctly
   probe the video BIOS...
   so, the boot screen needs to be installed after
   an initial priming boot...

afterwards, i did some thorough testing before deploying
   on any production servers... and noticed only one problem...
   the new version of LILO broke the "grubby" probe...
   primarily because "/boot/boot.b" is no longer needed...
   but also because the comparisons between "boot.b"
   and the actual boot sector aren't accurate anymore either...
   since i'd noticed that "grubby"'s probe for GRUB itself
   has been broken for a while... i decided a quick
   patch-job for the "mkinitrd" package was in order...
my new package correctly identifies whether either
   LILO or GRUB has been installed at all, but lacks the
   careful byte comparisons that were the core of the
   breakage of the original "grubby"...
   it's possibly problematic... but i doubt it... and it
   solves more problems than it creates...
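
for illustration only, a signature-only probe of the kind described above
   might look like the sketch below... this is not the actual "grubby"
   patch... the function names are hypothetical, and it assumes the
   strings "LILO" and "GRUB" show up somewhere in the first 512-byte
   sector, with no byte-for-byte comparison against "boot.b"...

```python
from typing import Optional

SECTOR_SIZE = 512  # size of a classic boot sector


def identify_bootloader(sector: bytes) -> Optional[str]:
    """Guess the installed bootloader from signature strings only.

    Assumption: the loader's name appears as plain ASCII inside the
    first sector. No comparison against /boot/boot.b is attempted.
    """
    head = sector[:SECTOR_SIZE]  # ignore anything past the first sector
    if b"LILO" in head:
        return "lilo"
    if b"GRUB" in head:
        return "grub"
    return None


def probe_device(path: str) -> Optional[str]:
    """Read the first sector of a device (or image file) and identify it."""
    with open(path, "rb") as dev:
        return identify_bootloader(dev.read(SECTOR_SIZE))
```

on a RAID 1 array one would presumably call probe_device() on each member
   disk (reading a raw device normally needs root)...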

anyways... that solved my 4.4 upgrade breakage...

and... since i was writing to the list... i figured i should
   mention a rather serious, but extremely unlikely to happen,
   bug i noticed in "e2fsprogs" (specifically "e2fsck")...
with the version on CentOS... "e2fsck" will cause some big
   problems and potential data loss on directories containing
   more than 1.2 million inodes...
   the first 1.2 million inodes will be ok...
   the next 1.2 million (max) will be saved to /lost+found
   as unidentified files... before /lost+found fills up too...
   and then everything else will just "vanish"...
this problem is due to some incorrect math in the handling
   of the maximum inode size... and has already been fixed
   in e2fsprogs 1.38...
   so, my solution was to take the 1.38 Fedora Core sources
   and build a package for CentOS...
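
to see whether any directory on a system is even close to that size,
   a rough check like the sketch below could be run before letting
   "e2fsck" loose... the 1.2 million figure is just the one quoted
   above, the function name is hypothetical, and counting entries
   this way only approximates whatever limit "e2fsck" actually hits...

```python
import os
from typing import Dict

ENTRY_LIMIT = 1_200_000  # figure quoted in the post, not independently verified


def oversized_directories(root: str, limit: int = ENTRY_LIMIT) -> Dict[str, int]:
    """Map each directory under root holding more than `limit` entries
    to its entry count (files plus immediate subdirectories)."""
    found: Dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        count = len(dirnames) + len(filenames)
        if count > limit:
            found[dirpath] = count
    return found
```

an empty result from oversized_directories("/some/mount/point") would
   suggest the filesystem is nowhere near the problem range...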

if anyone is particularly interested... i have the SRPMS
   for my lilo, mkinitrd, and e2fsprogs packages...
   also, i have the pretty new CentOS boot screen (and a few
   others i made) and some working lilo configurations for
   switching over to them...
   all of which i could probably post somewhere public...
   but if they are really useful, i'd rather ship them along to
   someone more closely involved in maintaining CentOS...
   so as to "contribute" as best i could to everyone's future
   well-being as well...

B. Karhan
simon at pop.psu.edu
PRI/SSRI Unix Administrator