Installation problem, possibly RAID

Edward Diener

10 Sep 2005

2:58 a.m.

I burned a DVD for CentOS 4.1. I booted from it and everything went fine for the graphical screen installation. I chose separate partitions for my /boot (hde6), / (hde9), and swap (hde12) areas. My /boot partition was ext2 and my / partition was ext3. I installed grub in my /boot partition successfully. I have a boot loader, System Commander 8.13, which controls the MBR. The installation then nicely ejected my DVD disk, told me to remove any other installation media, and rebooted my machine. It rebooted into System Commander, I chose the CentOS boot partition, and this rebooted me to CentOS without a problem.

CentOS now finished its installation steps, among which was setting up a user account, and attempted to bring up the login screen. My screen went dark, the icon went to a waiting/turning icon for a long time, and I said to myself uh-oh. Finally with most of the screen still dark a small message box appeared with an OK button which said:

"Can not start the greeter program, you will not be able to log in. This display will be disabled. Try logging in by other means and editing the configuration file."

I pressed OK, my screen went into text mode, and repeated lines of:

ext3-fs error (device hde9) in start transaction, Journal has aborted.

continued to fill the screen.

The only way to proceed was to hit the restart button of my computer.

My thoughts on possible reasons for the failure are these. My hard drive is off of an HPT 374 RAID controller, without RAID actually being used on it, and is a 160 GB hard drive, 147 GB formatted. The /boot (hde6) partition starts at approximately the 56 GB boundary, the / (hde9) partition starts at approximately the 76 GB boundary, and the swap partition starts at approximately the 106 GB boundary.

Is it possible that I needed to turn on LBA32 as an install option, since there was a screen where I could have checked it but did not? Is it possible that I needed to tell the installer, when choosing my partitions for /boot, /, and swap, that this was a RAID controller, even though I am not using RAID with it, since I noticed a RAID button on the Disk Druid graphical screen but used Edit instead to set up my partitions? Is it possible that CentOS either does not support my RAID controller or supports it in some earlier release which does not work properly even when not using RAID, so that I need to install the proper release of it somehow during the installation process?
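As a rough check on the LBA32 question, the partition offsets quoted above can be compared against two well-known addressing limits. This is only a sketch: the 512-byte sector size and the 255-head, 63-sector BIOS translation geometry are assumptions, not values reported in the post, and nothing here establishes that either limit caused the failure.

```python
# Rough addressing limits relevant to the LBA32 question.
# Assumptions (not from the post): 512-byte sectors and the common
# BIOS translation geometry of 255 heads x 63 sectors per track.
SECTOR_BYTES = 512
HEADS = 255
SECTORS_PER_TRACK = 63
GB = 10**9  # decimal gigabytes, as drive vendors count them

# Classic CHS BIOS calls can address at most 1024 cylinders.
chs_limit_bytes = 1024 * HEADS * SECTORS_PER_TRACK * SECTOR_BYTES

# 28-bit LBA can address at most 2**28 sectors.
lba28_limit_bytes = 2**28 * SECTOR_BYTES


def beyond(limit_bytes, offset_gb):
    """Return True if a partition starting at offset_gb lies past the limit."""
    return offset_gb * GB >= limit_bytes


# Approximate partition start offsets quoted in the post, in GB.
starts = {"/boot (hde6)": 56, "/ (hde9)": 76, "swap (hde12)": 106}

print(f"1024-cylinder limit: about {chs_limit_bytes / GB:.2f} GB")
print(f"28-bit LBA limit:    about {lba28_limit_bytes / GB:.2f} GB")
for name, gb in starts.items():
    print(f"{name}: past 1024 cylinders: {beyond(chs_limit_bytes, gb)}, "
          f"past 28-bit LBA: {beyond(lba28_limit_bytes, gb)}")
```

Under these assumptions all three partitions start well past the 1024-cylinder mark, which is the situation an LBA32 option is meant for, while all three start below the 28-bit LBA limit even though the 160 GB drive as a whole extends past it.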

I am groping for answers but am really hoping that someone has some knowledge of this problem so that I can use CentOS. Everything went well until the final disaster, and I was keen on getting CentOS to install on my computer. I had previously tried FC4, but that wouldn't even get past my graphical screen, failing because I have a Matrox P650 video adapter, but CentOS handled it with aplomb. I am a relative Linux newbie, although an experienced software developer and computer user, so if someone could help me get CentOS running it would really be appreciated. Thank you!

Craig White

10 Sep

4:38 a.m.

On Fri, 2005-09-09 at 22:58 -0400, Edward Diener wrote:

...

---- sounds like you handled things right but I'm confused as to what comprises things like /dev/hde1, hde2, hde3, hde4, hde5 etc.

Is it possible for you to boot CD #1 again and type 'linux rescue' (no quotes) at the boot prompt to enter rescue mode?

When it completes booting, it would be interesting to find out...

fdisk -l /dev/hde

(this will list the partitions)

you could try repairing the partitions...

e2fsck -fy /dev/hde6
e2fsck -fy /dev/hde9
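The rescue-mode steps above can be collected into a small dry-run sketch. It only builds and prints the command lines and does not run them. The helper name `rescue_commands` is hypothetical. The sketch also assumes the filesystems are left unmounted in rescue mode, since e2fsck is meant to be run on unmounted filesystems.

```python
def rescue_commands(disk, partitions):
    """Build the command lines suggested above, without running them."""
    cmds = [["fdisk", "-l", disk]]  # list the partition table
    for part in partitions:
        # -f forces a check even if the filesystem looks clean,
        # -y answers "yes" to every repair prompt.
        cmds.append(["e2fsck", "-f", "-y", part])
    return cmds


commands = rescue_commands("/dev/hde", ["/dev/hde6", "/dev/hde9"])
for argv in commands:
    # Dry run: print each command instead of executing it.
    print(" ".join(argv))
```

To actually run them from the rescue shell, each list could be passed to `subprocess.run(argv)`. Note that e2fsck exits non-zero when it has corrected errors, so a non-zero status is not necessarily a failure.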

Is this a dual-boot with Windows? If so, did Windows think any part of hde to be part of a RAID array?

Craig

Edward Diener

10:19 a.m.

Craig White wrote:

...

On Fri, 2005-09-09 at 22:58 -0400, Edward Diener wrote:

...

sounds like you handled things right

But obviously CentOS did not. Why ?

...

but I'm confused as to what comprises things like /dev/hde1, hde2, hde3, hde4, hde5 etc.

I have 3 non-Linux primary partitions and then an extended partition where all my Linux logical partitions exist. My Linux partitions in the extended partition consist of 3 100 MB boot partitions for various Linux distros, of which I attempted to use the second of the 3 for CentOS, followed by 3 20 GB root partitions for those Linux distros, the second of the 3 used for CentOS, followed by a 10 GB common Linux partition to share files between the distros, followed by my common swap partition.
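The layout described above can be mapped onto device names with a short sketch. It assumes the three primaries are hde1 to hde3 and the extended partition is hde4 (the post does not state this), and relies on the MS-DOS partition table convention that logical partitions are numbered from 5. The function name and role labels are illustrative only.

```python
# Logical partitions in the order described in the post.
logical_layout = (
    ["boot (100 MB)"] * 3
    + ["root (20 GB)"] * 3
    + ["shared (10 GB)", "swap"]
)


def number_partitions(disk, primaries, logicals):
    """Assign device names: primaries first, then the extended container,
    then logical partitions starting at number 5."""
    table = {}
    for i in range(1, primaries + 1):
        table[f"{disk}{i}"] = "primary (non-Linux)"
    # Assumption: the extended partition takes the next primary slot.
    table[f"{disk}{primaries + 1}"] = "extended (container)"
    for offset, role in enumerate(logicals):
        table[f"{disk}{5 + offset}"] = role
    return table


layout = number_partitions("hde", 3, logical_layout)
for name, role in layout.items():
    print(f"/dev/{name}: {role}")
```

With this numbering the second boot partition is hde6, the second root partition is hde9, and swap is hde12, which matches the devices chosen during the installation.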

...

is it possible for you to boot CD #1 again and type 'linux rescue' (no quotes) at the boot prompt to enter rescue mode.

When it completes booting, it would be interesting to find out...

fdisk -l /dev/hde

(this will list the partitions)

you could try repairing the partitions...

e2fsck -fy /dev/hde6
e2fsck -fy /dev/hde9

CentOS formatted these when I installed. Why would they need repair ?

...

Is this a dual-boot with Windows? If so, did Windows think any part of hde to be part of a RAID array?

Again, as explained in my OP, I have no RAID array but just the HPT 374 onboard RAID controller handling my hard drives. This is because my normal IDE controller has other non-hard-disk devices attached to it.

Bryan J. Smith

1:38 p.m.

Edward Diener eddielee@tropicsoft.com wrote:

...

But obviously CentOS did not. Why ?

Installers are not perfect, and they never will be. That includes Windows especially.

...

Again, as explained in my OP, I have no RAID array but just the HPT 374 onboard RAID controller handling my hard drives.

Yep. FRAID cards are just "regular ATA" bus arbitrators. So as long as you don't set up the RAID organization, the disks should not have any special striping/blocking.

The HPT36x/37x are no exception, they are standard ATA channels, period.

...

This is because my normal IDE controller has other non-hard-disk devices attached to it.

Yep, ATAPI devices like CD/DVD.

-- Bryan

P.S. BTW, I've just started deploying some Intel i8xx/9xx systems with ICH5+ controllers and I am extremely _disappointed_ with the BIOS disk / Linux device mapping that causes both installer and rescue mode recovery issues. It was clearly not as "well thought out" by Intel compared to most of the nVidia MCP-04 ATA/SATA. Is it a newer Intel i8xx/9xx chipset?

-- Bryan J. Smith | Sent from Yahoo Mail
mailto:b.j.smith@ieee.org | (please excuse any
http://thebs413.blogspot.com/ | missing headers)

Edward Diener

2:28 p.m.

Bryan J. Smith wrote:

...

Edward Diener eddielee@tropicsoft.com wrote:

...
But obviously CentOS did not. Why ?

Installers are not perfect, and they never will be. That includes Windows especially.

I was thinking that somebody, given the error I reported on this post, might know why it occurred. I understand that installers are not perfect but they should give error messages that might tell one what went wrong.

...

...
Again, as explained in my OP, I have no RAID array but just the HPT 374 onboard RAID controller handling my hard drives.

Yep. FRAID cards are just "regular ATA" bus arbitrators. So as long as you don't set up the RAID organization, the disks should not have any special striping/blocking.

The HPT36x/37x are no exception, they are standard ATA channels, period.

...
This is because my normal IDE controller has other non-hard-disk devices attached to it.

Yep, ATAPI devices like CD/DVD.

-- Bryan

P.S. BTW, I've just started deploying some Intel i8xx/9xx systems with ICH5+ controllers and I am extremely _disappointed_ with the BIOS disk / Linux device mapping that causes both installer and rescue mode recovery issues. It was clearly not as "well thought out" by Intel compared to most of the nVidia MCP-04 ATA/SATA. Is it a newer Intel i8xx/9xx chipset?

No, it is an older VIA chipset, the KT333 northbridge and the VT8233A southbridge, using an AMD processor. The mobo is an Abit AT7. I have had this machine run SimpleMEPIS and FC3. I was able to upgrade to the latest 3.3.1 version of SimpleMEPIS. I failed completely to get FC4 to install because VESA support for my video adapter (Matrox P650) went bad between FC3 and FC4, and I have now experienced this problem with CentOS. It is disappointing to see Linux get worse, rather than better, at dealing with hardware as new distros are created.

Bryan J. Smith

11 Sep

5:42 a.m.

On Sat, 2005-09-10 at 10:28 -0400, Edward Diener wrote:

...

I was thinking that somebody, given the error I reported on this post, might know why it occurred. I understand that installers are not perfect but they should give error messages that might tell one what went wrong.

Sometimes they literally can't.

Anything to do with storage is one area where there are _countless_ variations, combinations and other details, and the issues are just too broad and compounding for an installer to cover.

Furthermore, it takes a _deep_ understanding of how 16-bit BIOS Int13h Disk Services work, and how disk geometry, disk numbering/ ordering/ mapping, Linux device/ driver/ initrd and other things differ. It literally took me _years_ to come up to speed on those things, and I _still_ only know how to solve maybe 75% of all issues.

And that's me, a human. An installer, no way.

...

No it is an older Via chipset, the KT333 northbridge and the VT8233A southbridge, using an AMD processor. The mobo is an Abit AT7. I have had this machine run SimpleMEPIS and FC3. I was able to upgrade to the latest 3.3.1 version of SimpleMEPIS. I failed completely to get FC4 to install due to the fact that my video adapter's ( Matrox 650 ) support using VESA went bad between FC3 and FC4,

I can't remember if I said so here, but even Alan Cox said he's not installing Fedora Core 4, and sticking with Fedora Core 3. But this is nothing new, some Red Hat Linux (now Fedora Core) releases are too new in their adoptions. But it was a little easier to know with Red Hat Linux (".0") -- although I've started calling it the "reverse Star Trek."

The even are bad, the odd are good. Funny that it's kinda opposite of most kernel/project revisions.

...

and now have experienced this problem with CentOS. It is disappointing to see Linux get worse, rather than better, as new distros are created in dealing with hardware.

Installers are _not_ Linux. Installers are _not_ distros.

In fact, the main problem isn't Linux, but the increase in "superstore- designed hardware." And that means cheap, poorly tested, Windows version _specific_ drivers, and absolutely _no_ public specifications.

-- Bryan J. Smith b.j.smith@ieee.org http://thebs413.blogspot.com
----------------------------------------------------------------------
The best things in life are NOT free - which is why life is easiest if you save all the bills until you can share them with the perfect woman

Edward Diener

11:06 a.m.

Bryan J. Smith wrote:

...

On Sat, 2005-09-10 at 10:28 -0400, Edward Diener wrote:

...
I was thinking that somebody, given the error I reported on this post, might know why it occurred. I understand that installers are not perfect but they should give error messages that might tell one what went wrong.

Sometimes they literally can't.

Anything to do with storage is one area where there are _countless_ variations, combinations and other details, and the issues are just too broad and compounding for an installer to cover.

What I meant were two things.

First I received an error message box that told me nothing about what my problem was. I am a programmer myself and find such error messages to be, almost always, just lazy programming with little regard for the end user. The days when something wrong happens in code and the error message is essentially "something wrong happened" should have been over decades ago, so I find it pretty disappointing it persists, especially in OS code because OS code is crucial.

Second, even with the vague error message I received I would think that some CentOS developer might tell me what the possibilities are that generate this message. Then it would be easier for me to find a workaround rather than having to spend some time experimenting on installation options to get the OS installed, and not even knowing if there was a way for me to succeed.

...

Furthermore, it takes a _deep_ understanding of how 16-bit BIOS Int13h Disk Services work, and how disk geometry, disk numbering/ ordering/ mapping, Linux device/ driver/ initrd and other things differ. It literally took me _years_ to come up to speed on those things, and I _still_ only know how to solve maybe 75% of all issues.

And that's me, a human. An installer, no way.

Actually an installation program must know these things as far as I am concerned. If the idea is "we have a great OS but our installer is not nearly as good", how does one expect an OS to attract users if the installer can not even install properly, or at least tell the end-user why it failed for the end-user's particular hardware.

...

...
No it is an older Via chipset, the KT333 northbridge and the VT8233A southbridge, using an AMD processor. The mobo is an Abit AT7. I have had this machine run SimpleMEPIS and FC3. I was able to upgrade to the latest 3.3.1 version of SimpleMEPIS. I failed completely to get FC4 to install due to the fact that my video adapter's ( Matrox 650 ) support using VESA went bad between FC3 and FC4,

I can't remember if I said so here, but even Alan Cox said he's not installing Fedora Core 4, and sticking with Fedora Core 3. But this is nothing new, some Red Hat Linux (now Fedora Core) releases are too new in their adoptions. But it was a little easier to know with Red Hat Linux (".0") -- although I've started calling it the "reverse Star Trek."

The even are bad, the odd are good. Funny that it's kinda opposite of most kernel/project revisions.

From what I got investigating messages for FC4 installation black screens and white screens, the failure to support various video cards which worked flawlessly in FC3 was discovered soon after the FC4 release, and the reasons for this failure were well-known ( buggy Gnu C++ 4.0 code ). So why not just fix it and post an updated set of ISOs ? Not doing so just gives a release a bad name from the start.

...

...
and now have experienced this problem with CentOS. It is disappointing to see Linux get worse, rather than better, as new distros are created in dealing with hardware.

Installers are _not_ Linux. Installers are _not_ distros.

Installers are the first thing one sees when using an OS. If the installer fails the user is not going to think much of the OS. If one is concerned on promoting an OS, the installer needs to be first-rate.

...

In fact, the main problem isn't Linux, but the increase in "superstore- designed hardware." And that means cheap, poorly tested, Windows version _specific_ drivers, and absolutely _no_ public specifications.

Do Linux developers study Windows drivers in order to create Linux device driver code? I can understand that the public specs can be bad and I can understand, in an unfortunate way, that the hardware companies have been sold on the idea that only Microsoft should be able to create software for their hardware. In the latter case I sympathize with Linux device driver developers. I would not mind if an OS says that it can not support certain hardware in any way due to the lack of information by the hardware vendor, but I did not find any such information for CentOS (or FC4 when I failed to install it).

Thanks for your help and I am glad I got CentOS working.

Bryan J. Smith

3:47 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

On Sun, 2005-09-11 at 07:06 -0400, Edward Diener wrote:

...

First I received an error message box that told me nothing about what my problem was. I am a programmer myself and find such error messages to be, almost always, just lazy programming with little regard for the end user. The days when something wrong happens in code and the error message is essentially "something wrong happened" should have been over decades ago, so I find it pretty disappointing it persists, especially in OS code because OS code is crucial.

As both a programmer and (wanna-be ;-) kernel developer, debugging at the OS level is far more complex than end-user programs. In an OS installer, with all the variables, it's damn near impossible to figure out the exact sequence of events. So it's most likely you are getting an error code/message that seems rather useless, but it's the only "common denominator" the installer can come up with.

We're not talking an application installer on a known, good, usable OS that is already installed. We are talking the OS itself, which is far, far, far more involved -- because the entire system is _not_ yet in a usable state.

...

Second, even with the vague error message I received I would think that some CentOS developer might tell me what the possibilities are that generate this message. Then it would be easier for me to find a workaround rather than having to spend some time experimenting on installation options to get the OS installed, and not even knowing if there was a way for me to succeed.

Again, see my comment above. Bugzilla reports are the best means to find out more.

...

Actually an installation program must know these things as far as I am concerned. If the idea is "we have a great OS but our installer is not nearly as good", how does one expect an OS to attract users if the installer can not even install properly, or at least tell the end-user why it failed for the end-user's particular hardware.

Again, you're asking for the _impossible_ -- even on Windows. There is the terminology barrier, the plethora of combinations that could have caused the error, etc... Installer issues are almost _never_ understood until they are repeated and documented in a bugzilla report -- and since hardware is _always_ changing, it's a "moving target."

Which is why any "sane" professional recommends that end-users _always_ get their OS "pre-installed." By pre-installed I mean either OEM, by a LUG (with users more familiar with the installer in use), etc...

Because not even experts who wrote the installer itself and have extensive kernel development experience can always figure out the massive set of combinations that _could_ be thrown at an installer. Which is why many installer type issues (like the kernel 2.6 / buggy BIOS geometry / NT 255/63 head/sector issue) were _not_ discovered until _after_ the installer was released. Not everyone can test every single hardware combination out there.
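The geometry problem mentioned above comes down to the same sector having different cylinder/head/sector coordinates depending on which geometry is assumed. The following is a minimal sketch using the standard CHS/LBA conversion formulas. The two geometries and the sector number are illustrative choices, not values from this thread.

```python
def lba_to_chs(lba, heads, sectors_per_track):
    """Convert a linear sector number to (cylinder, head, sector)."""
    cylinder, rem = divmod(lba, heads * sectors_per_track)
    head, sector0 = divmod(rem, sectors_per_track)
    return cylinder, head, sector0 + 1  # CHS sectors are 1-based


def chs_to_lba(cylinder, head, sector, heads, sectors_per_track):
    """Convert (cylinder, head, sector) back to a linear sector number."""
    return (cylinder * heads + head) * sectors_per_track + (sector - 1)


lba = 1_000_000  # arbitrary example sector

# The same sector expressed under two different assumed geometries.
chs_255 = lba_to_chs(lba, heads=255, sectors_per_track=63)
chs_16 = lba_to_chs(lba, heads=16, sectors_per_track=63)
print("255 heads:", chs_255)
print(" 16 heads:", chs_16)

# Reading coordinates with the wrong geometry lands on a different sector.
wrong = chs_to_lba(*chs_255, heads=16, sectors_per_track=63)
print("misread as 16-head geometry:", wrong, "instead of", lba)
```

If a partition table is written under one geometry and read by software that assumes another, any CHS values in it point at the wrong place, which is one way such issues can go unnoticed until after release.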

Again, this is very, very, _very_different_ than installing an application. When installing an application, most everyone has the same set of libraries, binaries, etc..., or they can easily bring the system to that same state as everyone else. Again, remember the purpose of an OS -- to take radically different hardware and capabilities and present them for applications in a single set of common interfaces. So, again, don't compare OS installers to applications or even application installers.

...

From what I got investigating messages for FC4 installation black screens and white screens, the failure to support various video cards which worked flawlessly in FC3 was discovered soon after the FC4 release, and the reasons for this failure were well-known ( buggy Gnu C++ 4.0 code ).

Again, it is _not_uncommon_ for Red Hat to use the "bleeding edge" code every 2-3 releases of their community release (Red Hat Linux, now Fedora Core).

...

So why not just fix it and post an updated set of ISOs ? Not doing so just gives a release a bad name from the start.

Because who says one GNU C++ 4.0.x revision will fix them all? So then the Fedora team keeps spinning out revision of the installer and corresponding CD after revision after revision. Red Hat learned early on in Red Hat Linux that it cannot support respinning installers/CDs. In fact, I know of _no_ major vendor (including Microsoft) that respins new installers every few weeks -- only every 6-24 months.

Again, these releases are no different than the "bad name" releases of Red Hat Linux 5.0, Red Hat Linux 7.0, Fedora Core 2. Right now I've adopted the "reverse Star Trek" attitude on Fedora -- the even are bad, the odd are good. On the evens, Red Hat changes things in Fedora Core majorly (like old RHL ".0" releases). On the odds, Red Hat changes things minorly.

I'm looking forward to Fedora Core 5, just like I did Fedora Core 3.

...

Installers are the first thing one sees when using an OS. If the installer fails the user is not going to think much of the OS. If one is concerned on promoting an OS, the installer needs to be first-rate.

Impossible. See above commentary.

...

Do Linux developers study Windows drivers in order to create Linux device driver code ?

No offense, but are you really a developer? You're talking disassembly. You're talking machine code-level into assembler and "headache" level reverse engineering, and possible legal issues.

And lastly, you're talking about _software_ based hardware. You can't just "send codes X, Y and Z to a printer, scanner, etc..." but you have to build the entire _support_ code that the vendor probably licensed from a 3rd party before customizing. Sometimes Linux comes up with replacement "subsystems" for Windows equivalents, but getting them to work with proprietary hardware is a long, painful process.

But some do it. And it takes 6+ months. Which means half-way through the product lifecycle of a "superstore product" (~12 months), once the Linux driver is finally written (if at all), the "superstore vendor" has already introduced a replacement product for the next revision of Windows, or whatever "technology" is being pushed.

There are so many things here -- I can't even begin.

...

I can understand that the public specs can be bad and I can understand, in an unfortunate way, that the hardware companies have been sold on the idea that only Microsoft should be able to create software for their hardware.

The logic is 180 degrees. Microsoft does _not_ create software for their hardware. In fact, if Microsoft had to write their own drivers, Windows would have about 1/100th of the drivers available in the stock Linux kernel. Microsoft would _die_overnight_ if vendors stopped producing Windows drivers for their hardware.

In fact, the reason why people upgrade Windows/applications is because hardware vendors force them to, and vice-versa. It's the "superstore model" which has come over from the decade-long, MS-driven OEM model. Hence why Microsoft has a stake in Best Buy (which really began it in the late '90s).

Hardware vendors support Microsoft because that addresses 90% of consumers, nearly 100% of consumers who shop at the superstore. That's where their profit model is.

...

In the latter case I sympathize with Linux device driver developers. I would not mind if an OS says that it can not support certain hardware in any way due to the lack of information by the hardware vendor, but I did not find any such information for CentOS ( or FC4 when I failed to install it ). Thanks for your help and I am glad I got CentOS working.

Again, I think your issues have more to do with going with a "bleeding- edge" Fedora Core adoption, just like others who used to try the latest Red Hat Linux prior.

Edward Diener

5:40 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

Bryan J. Smith wrote:

...

On Sun, 2005-09-11 at 07:06 -0400, Edward Diener wrote:

...
First I received an error message box that told me nothing about what my problem was. I am a programmer myself and find such error messages to be, almost always, just lazy programming with little regard for the end user. The days when something wrong happens in code and the error message is essentially "something wrong happened" should have been over decades ago, so I find it pretty disappointing it persists, especially in OS code because OS code is crucial.

As both a programmer and (wanna-be ;-) kernel developer, debugging at the OS level is far more complex than end-user programs. In an OS installer, with all the variables, it's damn near impossible to figure out the exact sequence of events. So it's most likely you are getting an error code/message that seems rather useless, but it's the only "common denominator" the installer can come up with.

My thought does not involve having to know the exact sequence of events at any level when an error occurs, but rather having that error recognized at the earliest possible time and propagated to the code which can put out an intelligent, even very technical if necessary, message to the end-user. That message would then give the end-user at least a fighting chance of either understanding what went wrong, or reporting the message so that a developer of the installer could explain it to the end-user.

I grant that OS code, especially in an installer, is almost certainly far more complicated than any application code. Still, there are techniques for reporting errors from the point at which they are discovered. The modern way is exception handling, but even if one uses the older error code technique, the error should translate into a narrower possibility than the vague and generalized message which I received.
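The idea of recognizing an error where it occurs and propagating it to code that can report it usefully can be sketched with exception chaining. Everything here is hypothetical: the `InstallError` type, the function names and the simulated I/O failure are illustrations only, and do not describe how the CentOS installer or the ext3 driver actually work.

```python
class InstallError(Exception):
    """Hypothetical error type carrying the step and device that failed."""

    def __init__(self, step, device, detail):
        super().__init__(f"{step} failed on {device}: {detail}")
        self.step = step
        self.device = device
        self.detail = detail


def write_journal(device):
    # Stand-in for a low-level operation; it always fails in this sketch.
    raise OSError(5, "Input/output error")


def start_transaction(device):
    try:
        write_journal(device)
    except OSError as exc:
        # Wrap the low-level error at the point where it is detected,
        # keeping the original exception as the cause.
        raise InstallError("journal write", device, exc.strerror) from exc


def run_step(device):
    try:
        start_transaction(device)
    except InstallError as err:
        # Report at the level where a user-facing message can be produced.
        return f"Installation problem: {err} (cause: {err.__cause__!r})"
    return "ok"


message = run_step("/dev/hde9")
print(message)
```

The top-level message names the step, the device and the underlying cause, rather than collapsing everything into a generic failure. The older error-code style could carry the same three pieces of information in a returned structure.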

...

We're not talking an application installer on a known, good, usable OS that is already installed. We are talking the OS itself, which is far, far, far more involved -- because the entire system is _not_ yet in a usable state.

Agreed.

...

...
Second, even with the vague error message I received I would think that some CentOS developer might tell me what the possibilities are that generate this message. Then it would be easier for me to find a workaround rather than having to spend some time experimenting on installation options to get the OS installed, and not even knowing if there was a way for me to succeed.

Again, see my comment above. Bugzilla reports are the best means to find out more.

Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere off of the home page at http://www.centos.org/.

...

...
Actually an installation program must know these things as far as I am concerned. If the idea is "we have a great OS but our installer is not nearly as good", how does one expect an OS to attract users if the installer can not even install properly, or at least tell the end-user why it failed for the end-user's particular hardware.

Again, you're asking for the _impossible_ -- even on Windows. There is the terminology barrier, the plethora of combinations that could have caused the error, etc... Installer issues are almost _never_ understood until they are repeated and documented in a bugzilla report -- and since hardware is _always_ changing, it's a "moving target."

Which is why any "sane" professional recommends that end-users _always_ get their OS "pre-installed." By pre-installed I mean either OEM, by a LUG (with users more familiar with the installer in use), etc...

Because not even experts who wrote the installer itself and have extensive kernel development experience can always figure out the massive set of combinations that _could_ be thrown at an installer. Which is why many installer type issues (like the kernel 2.6 / buggy BIOS geometry / NT 255/63 head/sector issue) were _not_ discovered until _after_ the installer was released. Not everyone can test every single hardware combination out there.

Again, this is very, very, _very_different_ than installing an application. When installing an application, most everyone has the same set of libraries, binaries, etc..., or they can easily bring the system to that same state as everyone else. Again, remember the purpose of an OS -- to take radically different hardware and capabilities and present them for applications in a single set of common interfaces. So, again, don't compare OS installers to applications or even application installers.

I agree with all you say above. My argument is not that any OS must install on my computer, or anyone else's computer, but that if it does not the end-user should be given a decent indication why not. Even though OS code is generally more complicated than application code I see no difference in its ability to give intelligent error messages, or even error message numbers which translate to an intelligent reason.

...

...
From what I got investigating messages for FC4 installation black screens and white screens, the failure to support various video cards which worked flawlessly in FC3 was discovered soon after the FC4 release, and the reasons for this failure were well-known ( buggy Gnu C++ 4.0 code ).

Again, it is _not_uncommon_ for Red Hat to use the "bleeding edge" code every 2-3 releases of their community release (Red Hat Linux, now Fedora Core).

I was spoiled by FC3 which installed and worked nearly flawlessly on my machine.

...

...
So why not just fix it and post an updated set of ISOs ? Not doing so just gives a release a bad name from the start.

Because who says one GNU C++ 4.0.x revision will fix them all? So then the Fedora team keeps spinning out revision of the installer and corresponding CD after revision after revision. Red Hat learned early on in Red Hat Linux that it cannot support respinning installers/CDs. In fact, I know of _no_ major vendor (including Microsoft) that respins new installers every few weeks -- only every 6-24 months.

You make a good point but it is annoying that what worked on a previous release could not even install on the next one. C'est la vie.

...

Again, these releases are no different than the "bad name" releases of Red Hat Linux 5.0, Red Hat Linux 7.0, Fedora Core 2. Right now I've adopted the "reverse Star Trek" attitude on Fedora -- the even are bad, the odd are good. On the evens, Red Hat changes things in Fedora Core majorily (like old RHL ".0" releases). On the odds, Red Hat changes things minorly.

I'm looking forward to Fedora Core 5, just like I did Fedora Core 3.

...
Installers are the first thing one sees when using an OS. If the installer fails the user is not going to think much of the OS. If one is concerned on promoting an OS, the installer needs to be first-rate.

Impossible. See above commentary.

...
Do Linux developers study Windows drivers in order to create Linux device driver code ?

No offense, but are you really a developer? You're talking disassembly. You're talking machine code-level into assembler and "headache" level reverse engineering, and possible legal issues.

My question was a reply to your statement of

"In fact, the main problem isn't Linux, but the increase in "superstore- designed hardware." And that means cheap, poorly tested, Windows version _specific_ drivers, and absolutely _no_ public specifications."

You seemed to be implying that poor Windows version _specific_ drivers was one of the reasons why Linux has trouble creating device drivers for hardware. So I asked the above as if to say that I find it hard to believe that Linux depends on disassembling Windows driver code in order to get at the internal hardware specs for a device.

...

And lastly, you're talking about _software_ based hardware. You can't just "send codes X, Y and Z to a printer, scanner, etc..." but you have to build the entire _support_ code that the vendor probably licensed from a 3rd party before customizing. Sometimes Linux comes up with replacement "subsystems" for Windows equivalents, but getting them to work with proprietary hardware is a long, painful process.

But some do it. And it takes 6+ months. Which means half-way through the product lifecycle of a "superstore product" (~12 months), once the Linux driver is finally written (if at all), the "superstore vendor" has already introduced a replacement product for the next revision of Windows, or whatever "technology" is being pushed.

There are so many things here -- I can't even begin.

I do not doubt the difficulty but I have done some very difficult programming myself so I do not doubt the solution also.

...

...
I can understand that the public specs can be bad and I can understand, in an unfortunate way, that the hardware companies have been sold on the idea that only Microsoft should be able to create software for their hardware.

The logic is 180 degrees. Microsoft does _not_ create software for their hardware. In fact, if Microsoft had to write their own drivers, Windows would have about 1/100th of the drivers available in the stock Linux kernel. Microsoft would _die_overnight_ if vendors stopped producing Windows drivers for their hardware.

In fact, the reason why people upgrade Windows/applications is because hardware vendors force them to, and vice-versa. It's the "superstore model" which has come over from the decade-long, MS-driven OEM model. Hence why Microsoft has a stake in Best Buy (which really began it in the late '90s).

Hardware vendors support Microsoft because that addresses 90% of consumers, nearly 100% of consumers who shop at the superstore. That's where their profit model is.

Thanks for pointing this out to me.

Microsoft does have a very good record in supporting hardware devices, and their install programs are very solid. If Linux is to compete against Windows it must take the same attitude that it needs to be as good, or at least needs to be more informative when an error occurs.

...

...
In the latter case I sympathize with Linux device driver developers. I would not mind if an OS says that it can not support certain hardware in any way due to the lack of information by the hardware vendor, but I did not find any such information for CentOS ( or FC4 when I failed to install it ). Thanks for your help and I am glad I got CentOS working.

Again, I think your issues have more to do with going with a "bleeding-edge" Fedora Core adoption, just like others who used to try the latest Red Hat Linux prior.

I am now on CentOS 4.1 so I will leave bleeding edge Fedora behind. I did not appreciate the Fedora 4 answers I got when I brought up the video card problem on their forums so I will go for greater stability instead. I never wanted to be bleeding edge but Fedora 3 worked so easily I thought that Fedora 4 would be easy to set up and use. I was wrong about that.

Johnny Hughes

6:24 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

On Sun, 2005-09-11 at 13:40 -0400, Edward Diener wrote: <snip>

...

Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere off of the home page at http://www.centos.org/.

yes ... http://bugs.centos.org/

it is under the www.centos.org main menu at:

Support -> Bugs

------------

there is also http://redhat.bugzilla.org/ ... most of the time, unless we changed some code to cause the bug, it is a bug that is from upstream. Looking through the redhat bugzilla is always a good thing for finding out if someone else has your issue.

<snip>

Johnny Hughes

6:26 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

On Sun, 2005-09-11 at 13:24 -0500, Johnny Hughes wrote:

...

On Sun, 2005-09-11 at 13:40 -0400, Edward Diener wrote:

<snip>

...
Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere off of the home page at http://www.centos.org/.

yes ... http://bugs.centos.org/

it is under the www.centos.org main menu at:

Support -> Bugs

there is also http://redhat.bugzilla.org/ ... most of the time, unless

wrong-----------^^^^^^^^^^^^^^^^^^^^^^^^^^^

of course, I meant http://bugzilla.redhat.com/ ... sorry

...

we changed some code to cause the bug, it is a bug that is from upstream. Looking through the redhat bugzilla is always a good thing for finding out if someone else has your issue.

<snip>

Edward Diener

10:41 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

Johnny Hughes wrote:

...

On Sun, 2005-09-11 at 13:40 -0400, Edward Diener wrote:

<snip>

...
Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere off of the home page at http://www.centos.org/.

yes ... http://bugs.centos.org/

it is under the www.centos.org main menu at:

Support -> Bugs

Thanks, I see it now.

Bryan J. Smith

6:32 p.m.

New subject: Installation problem, possibly RAID -- [OT] Why OS installers always "suck"

On Sun, 2005-09-11 at 13:40 -0400, Edward Diener wrote:

...

My thought does not involve having to know the exact sequence of events at any level when an error occurs but rather having that error recognized at the earliest possible time and propagating that error to the code which can put out an intelligent, even very technical if necessary, message to end-user. That message would then give the end-user at least a fighting chance of either understanding what went wrong, or reporting the message so that a developer of the install could explain it to the end-user.

You're thinking like an application/installer developer, not an OS/installer developer. They are _worlds_ of difference.

Remember why the OS exists, to provide a _standard_set_ of interfaces to applications. That means there are "known, standardized interfaces" when you install an application, or when an application is running.

When the OS installer itself is running, you have _no_ standardization in hardware. The OS _provides_ the standardization for the hardware. So until that OS is setup -- the installer can do little to provide why something failed.

...

I grant that OS code, especially in an installer, is almost certainly far more complicated than any application code. Still there are techniques for reporting errors from the point at which they are discovered. The modern way is exception handling, but even if one uses the older error code technique, the error should translate into a narrower possibility than the vague and generalized message which I received.

Again, you're _still_ thinking like an application/installer developer. Monolithic kernel device drivers do _not_ have exception handling, they can_not_ call some "user-space system function" to throw an exception -- they throw far more "low-level" interrupts. There are no "rich exception handling features" for them to use.

Otherwise device drivers would be slow and bloated. We're talking the kernel program and OS itself -- not some selection of programs that can take advantage of the exception facilities of a pre-emptive OS. In other words, there is no "pre-emptive OS" for the OS itself. ;->

So the common way to "find out what's wrong" in a kernel is to run a _separate_ "host system" (be it a 2nd, physical system, or run the "target" as a virtual system on the "host").

...

Is there a Bugzilla data base for CentOS ? If so I sure do not see it anywhere off of the home page at http://www.centos.org/.

CentOS is a 100% 1:1 rebuild of Red Hat Enterprise Linux. File them at Red Hat. You do _not_ need to pay Red Hat one dime.

...

I agree with all you say above. My argument is not that any OS must install on my computer, or anyone else's computer, but that if it does not the end-user should be given a decent indication why not. Even though OS code is generally more complicated than application code I see no difference in its ability to give intelligent error messages, or even error message numbers which translate to an intelligent reason.

And I'm telling you it's impossible.

As someone who has built numerous embedded systems -- the "system-level" developer standpoint is 100% different than the "application-level" (or even library-level) developer standpoint. Again, in a nutshell, there is no "pre-emptive OS" to service the OS itself.

Which is why, in the embedded world, we use failsafe firmwares with a failsafe mode, a "host" platform for remote debugging, etc... In the commodity PC world -- if the OS/installer screws up, it's very, very difficult to get anything useable like application exception handling.

...

I was spoiled by FC3 which installed and worked nearly flawlessly on my machine.

Stick with Fedora Core 3. Alan Cox just came out the other day on how many issues have been introduced on Fedora Core 4 by various changes, and why he's staying with Fedora Core 3.

My History ...

I did _not_ upgrade to Red Hat Linux 5.0, waited for Red Hat Linux 5.2. I did _not_ upgrade to Red Hat Linux 6.0, waited for Red Hat Linux 6.1. I did _not_ upgrade to Red Hat Linux 7.0, waited for Red Hat Linux 7.2. I did _not_ upgrade to Red Hat Linux 8.0, waited for Red Hat Linux 9. I did _not_ upgrade to Fedora Core 2, waited for Fedora Core 3. I am _not_ upgrading to Fedora Core 4, waiting for Fedora Core 5.

...

You make a good point but it is annoying that what worked on a previous release could not even install on the next one. C'est la vie.

Because things changed _radically_. Anytime you do that, you introduce issues. But if Red Hat didn't adopt GLibC 2.0 (RHL5.0), GCC 2.96/3 (RHL7.0),

Frankly, I continue to blame this on Red Hat. I have point-by-point recommended why they need to go back to the revisioning so people _know_ when there is a ".0" release when there is a ".1+" revision.

Last year I covered this _in_depth_ (through Fedora Core 3): http://www.geocities.com/thebs413/RH-Distribution-FAQ-3.html

...

My question was a reply to your statement of "In fact, the main problem isn't Linux, but the increase in "superstore-designed hardware." And that means cheap, poorly tested, Windows version _specific_ drivers, and absolutely _no_ public specifications." You seemed to be implying that poor Windows version _specific_ drivers was one of the reasons why Linux has trouble creating device drivers for hardware. So I asked the above as if to say that I find it hard to believe that Linux depends on disassembling Windows driver code in order to get at the internal hardware specs for a device.

Yes, that's _exactly_ it in many, many cases!

Furthermore, a lot of "superstore hardware" is now 99.9% software. And you can't grow those software subsystems overnight. There are major 3rd parties who sell software RAID, software modem, software audio, software MAC (network), etc... software that make a _killing_ because hardware vendors license it, and change a few things.

Eventually Linux creates replacement, unified subsystems, but it's not always easy to figure out how to get them to interface to the specific and endless variants of hardware out there. E.g., there might be literally 1,000 products with 1,000 _different_ Windows drivers, all from the same codebase (but slightly different), and if the _single_ Linux driver can support 300 of those variants, that's a "good driver."

Not to mention the fact that when a new piece of "superstore hardware" comes out, the Windows drivers have been written for it _prior_ to release. Linux drivers writers typically have to wait until the hardware is on the shelf before they can start. Hence the lag time.

This is _not_ going to get better. It's one of the fallacies in the Linux world. Only sheer volume matters, and that's why most server hardware (non-superstore) has good Linux drivers: Linux is about 30+% of new server shipments from OEMs.

...

I do not doubt the difficulty but I have done some very difficult programming myself so I do not doubt the solution also.

Have you done kernel-level and device-driver development? It's not about "difficult programming." It's about system-level development.

E.g., have you written a boot-loader?

...

Thanks for pointing this out to me. Microsoft does have a very good record in supporting hardware devices,

Not Microsoft, but hardware vendors. Microsoft does _not_ write device drivers. If they did, Windows would have 1/100th the driver support of Linux.

...

and their install programs are very solid.

I completely _disagree_. Especially for those of us who _only_ supported Windows NT 3.5, 3.51 and 4.0 when Windows 95 and 98 were popular.

Hardware is developed for specific Windows releases. If you have hardware that is not supported by your Windows version, you can find all sorts of issues.

And the installer can get into a self-rebooting loop that it _never_ exits. With Linux installers, I have a known quantity that I can control. Windows installers are a _joke_.

...

If Linux is to compete against Windows

Linux does _not_ compete against Windows. Windows is a distribution avenue for Microsoft. Linux is an open solution for companies that want to avoid such.

...

it must take the same attitude

What attitude? Microsoft's attitude? That we (like Microsoft) expect vendors to write device drivers? If so, then there will be _few_ device drivers for Linux.

...

that it needs to be as good, or at least needs to be more informative when an error occurs.

You can't give a hardware detection error in an installer if the installer doesn't know how to detect the hardware. And the only way an installer knows what the heck a piece of hardware does is if there is a driver telling it so! It's a chicken-egg issue.

So no, that's not it at all. What Linux needs is vendors to create driver disks and include them with their products -- even if only for select Linux distributions (just RHEL/RHD and NLD will do). But they won't. That's the problem.

It has *0* to do with the installer.

E.g., the installer doesn't know what to do with an unsupported storage device. There is _no_ reference information on what type of storage device it is. It can't tell you, "this is brand X model Y" other than what the PCI ID or other strings tell it -- hence the problem.
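To make the PCI ID point concrete, here is a rough sketch of what "all the installer has is the ID" looks like. The lookup table is a tiny illustrative stand-in for a real PCI ID database, and the vendor/device pair shown for the HPT374 (1103:0008) is an assumption that should be checked against such a database; `KNOWN_DEVICES` and `describe_pci_device` are hypothetical names.

```python
# Illustrative table only: a real installer would consult a full PCI ID database.
KNOWN_DEVICES = {
    # (vendor_id, device_id): description -- IDs assumed, verify before relying on them
    (0x1103, 0x0008): "HighPoint HPT374 ATA controller",
}


def describe_pci_device(vendor_id, device_id):
    """Return a name if the ID pair is known, else only the raw IDs."""
    name = KNOWN_DEVICES.get((vendor_id, device_id))
    if name is not None:
        return name
    # Without a driver or database entry, the raw IDs are all there is to report.
    return f"unknown device [{vendor_id:04x}:{device_id:04x}]"
```

For an ID pair that is not in the table, the best available message is the bare `vendor:device` string, which is the limitation being described.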

...

I am now on CentOS 4.1 so I will leave bleeding edge Fedora behind. I did not appreciate the Fedora 4 answers I got when I brought up the video card problem on their forums so I will go for greater stability instead.

It's probably because of your poor assumptions like above. No offense, but that's probably it.

...

I never wanted to be bleeding edge but Fedora 3 worked so easily I thought that Fedora 4 would be easy to set up and use. I was wrong about that.

Fedora Core 2 was "bleeding edge." Fedora Core 3 was an evolutionary revision after Fedora Core 2.

Fedora Core 4 is "bleeding edge." Fedora Core 5 will be an evolutionary revision after Fedora Core 4.

The same was true prior, especially with Red Hat Linux. Every time Red Hat changes things, it takes 1-2 revisions to get all the bugs worked out. Red Hat tends to "push the envelope."

In fact, Windows XP (NT5.1) is just an evolutionary revision from Windows 2000 (NT5.0), which was an evolutionary revision from Windows NT 4.0.

And Windows 98 (MSDOS7.1) was an evolutionary revision from Windows 95 (MSDOS7.0/7.1) which was an evolutionary revision from MS-DOS 6.x / Windows 3.x -- something Caldera proved in court, to complete technical emulation (removing MSDOS7.0 from Windows 95) against Microsoft (i.e., so Windows 95 was illegal product bundling of DOS/Windows into one).

Your assumptions are based on other assumptions that just aren't true.

Microsoft doesn't write drivers, and doesn't even develop much of the installer/upgrader. And Microsoft works with superstore vendors so anytime you upgrade just 1 -- applications, OS, PC or peripherals -- you are forced to upgrade _all_ of the other 3 if they are more than 2-3 years "out-of-date."

In Linux, once a "core/base" hardware driver is developed, it is perpetual for many, many products of the same "core/base". Although the superstore vendors regularly introduce new variants that differ slightly, so it's always an issue, and not all are supported (just the most popular/known).

Edward Diener

10 Sep

9:24 p.m.

Bryan J. Smith wrote:

...

Edward Diener eddielee@tropicsoft.com wrote:

...
But obviously CentOS did not. Why ?

Installers are not perfect, and they never will be. That includes Windows especially.

...
Again, as explained in my OP, I have no RAID array but just the HPT 374 onboard RAID controller handling my hard drives.

Yep. FRAID cards are just "regular ATA" bus arbitrators. So as long as you don't setup the RAID organization, the disks should not have any special striping/blocking.

The HPT36x/37x are no exception, they are standard ATA channels, period.

...
This is because my normal IDE controller has other non-harddisk devices attached to it.

Yep, ATAPI devices like CD/DVD.

-- Bryan

P.S. BTW, I've just started deploying some Intel i8xx/9xx systems with ICH5+ controllers and I am extremely _disappointed_ with the BIOS disk / Linux device mapping that causes both installer and rescue mode recovery issues. It was clearly not as "well thought out" by Intel compared to most of the nVidia MCP-04 ATA/SATA. Is it a newer Intel i8xx/9xx chipset?

Just thought I'd mention that I got CentOS installed properly by not using a boot partition and just installing everything into a root partition. Why this bug persists on my machine, and whether it exists for others for some reason, is something I do not know. My guess is that grub booting on a boot partition and executing programs on a root partition needs to know the disk geometry of the root partition and is not doing that properly on my machine for some reason or another.

Bryan J. Smith

11 Sep

5:16 a.m.

On Sat, 2005-09-10 at 17:24 -0400, Edward Diener wrote:

...

Just thought I'd mention that I got CentOS installed properly by not using a boot partition and just installing everything into a root partition. Why this bug persists on my machine, and whether it exists for others for some reason, is something I do not know. My guess is that grub booting on a boot partition and executing programs on a root partition needs to know the disk geometry of the root partition and is not doing that properly on my machine for some reason or another.

I pretty much killed the "/boot as a separate slice" option years ago. Too many little nagging headaches, especially with GRUB (although LILO itself is another headache).

About the only time you need it is for MD (software) RAID. But if I need RAID, I throw in a 3Ware (ATA/SATA) or LSI (SCSI/SATA) card. So that removes that issue for myself.

Edward Diener

10:40 a.m.

Bryan J. Smith wrote:

...

On Sat, 2005-09-10 at 17:24 -0400, Edward Diener wrote:

...
Just thought I'd mention that I got CentOS installed properly by not using a boot partition and just installing everything into a root partition. Why this bug persists on my machine, and whether it exists for others for some reason, is something I do not know. My guess is that grub booting on a boot partition and executing programs on a root partition needs to know the disk geometry of the root partition and is not doing that properly on my machine for some reason or another.

I pretty much killed the "/boot as a separate slice" option years ago. Too many little nagging headaches, especially with GRUB (although LILO itself is another headache).

About the only time you need it is for MD (software) RAID. But if I need RAID, I throw in a 3Ware (ATA/SATA) or LSI (SCSI/SATA) card. So that removes that issue for myself.

I was doing it so that if I ever needed to move my root partition, while keeping the same partition order, I would not need to reinitialize grub on the boot partition. I was told however that grub does not have the smarts to recognize where its root partition is at run-time so that I would have to reinitialize it anyway if I moved my root partition.

In general Linux's reliance on partition order in mounting hard disk partitions, and evidently grub's reliance on specific hardcoded hard disk geometry in order to find its root partition, seem primitive to me. I could be wrong but I thought that hard disk information on PCs was a pretty well determined thing and a better system would automatically determine these things at run-time. Of course MS Windows does not allow a separate boot partition at all and also relies on partition order, so my remarks go to Windows as well as Linux.

In these days of large hard disks, boot loaders which can boot multiple operating systems, and partition managers which can move and image hard disk partitions at will, a better system for OSs to find what they need on hard disks is needed.

Bryan J. Smith

11:04 a.m.

On Sun, 2005-09-11 at 06:40 -0400, Edward Diener wrote:

...

In general Linux's reliance on partition order in mounting hard disk partitions, and evidently grub's reliance on specific hardcoded hard disk geometry in order to find its root partition, seem primitive to me. I could be wrong but I thought that hard disk information on PCs was a pretty well determined thing and a better system would automatically determine these things at run-time. Of course MS Windows does not allow a separate boot partition at all and also relies on partition order, so my remarks go to Windows as well as Linux.

Windows can _only_ boot off of BIOS disk 80h (i.e., first). That's the limitation of not only the MS MBR, but both IO.SYS (DOS) and NTLDR (NT). In the case of the NT loader, this is known as the "System" volume (and, ironically, the partition that holds C:\WINNT is the "Boot" volume).

Linux can boot off of an array of different partitions, disks, etc... In fact, Linux can load _multiple_ disk drivers in its initrd at boot. NT can only load 1 additional vendor driver (ntbootdd.sys). There are other advantages to Linux as well.

The trade-off -- on the PC BIOS architecture -- is that Linux _must_ map the BIOS disk numbering to Linux devices. That's why GRUB, LILO, etc... have an issue when the BIOS disk numbering/order is changed -- because it screws up assumptions.

The MS MBR blindly assumes BIOS disk 80h is correct. Then it blindly assumes what slice the bootstrap/NTLDR is at. If it guesses wrong, game over. Although an NT5 (200x/XP) boot disk does have "fixmbr" (MBR) and "fixboot" (bootstrap/NTLDR) on it.

Try dual-booting Microsoft systems with their loaders -- let alone to multiple disks without going into the BIOS and "hiding" some.

...

In these days of large hard disks, boot loaders which can boot multiple operating systems, and partition managers which can move and image hard disk partitions at will, a better system for OSs to find what they need on hard disks is needed.

It's a problem _only_ with (largely) the PC. Don't get me started. @-p

Les Mikesell

4:41 p.m.

On Sun, 2005-09-11 at 05:40, Edward Diener wrote:

...

I was doing it so that if I ever needed to move my root partition, while keeping the same partition order, I would not need to reinitialize grub on the boot partition. I was told however that grub does not have the smarts to recognize where its root partition is at run-time so that I would have to reinitialize it anyway if I moved my root partition.

I thought you could specify root=LABEL=xxx in the grub kernel line. I usually avoid labels when making custom changes because the stock labels aren't unique and the system cannot deal with duplicates if you ever move a disk to a different machine but it should work if you just shift them around in the same box. It might be better to use LVM identifiers which are probably unique if you don't do full disk image copies to clone machines.
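A rough sketch of the label lookup and of why duplicate labels are a problem. The resolution is done after the kernel and initrd are running, not by GRUB itself; the code below only models that step. `parse_root_arg`, `resolve_root`, and the device-to-label table are hypothetical, and the sample kernel arguments are just an example of the `root=LABEL=` form.

```python
def parse_root_arg(cmdline):
    """Return the value of the root= argument from a kernel command line."""
    for token in cmdline.split():
        if token.startswith("root="):
            return token[len("root="):]
    return None


def resolve_root(cmdline, labels_by_device):
    """Map root= to a device; labels_by_device is device path -> filesystem label."""
    root = parse_root_arg(cmdline)
    if root is None:
        raise ValueError("no root= argument on the kernel line")
    if not root.startswith("LABEL="):
        return root  # a plain device path such as /dev/hde9
    label = root[len("LABEL="):]
    matches = sorted(dev for dev, lab in labels_by_device.items() if lab == label)
    if len(matches) != 1:
        # Zero or several matches: the label alone cannot identify the root.
        raise LookupError(f"label {label!r} matches {len(matches)} devices: {matches}")
    return matches[0]
```

With one filesystem labelled `/` the lookup follows the partition wherever it moves in the same box; add a second disk carrying the same stock label and the lookup becomes ambiguous.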

...

In general Linux's reliance on partition order in mounting hard disk partitions, and evidently grub's reliance on specific hardcoded hard disk geometry in order to find its root partition, seem primitive to me.

In the first stages of booting you are at the mercy of the BIOS in rom. Lilo pre-computes the disk sectors that will need to be loaded to get the kernel and initrd into memory and stores the map in bios terms, so you have to re-run lilo after any change to lilo.conf. Grub loads through stages that eventually know enough about the filesystem to find and read grub.conf at boot time, but you still have the problem of having to map everything you want into bios terms and fitting within the bios limitations until the kernel and initrd are in memory so you have real disk drivers.

...

I could be wrong but I thought that hard disk information on PCs was a pretty well determined thing and a better system would automatically determine these things at run-time.

It's slightly better than in the years when PC disks had a 32 meg maximum size, but not much. Even more recently the 1024 cylinder limit has been in bios for so long that I just always put a /boot partition first on the first drive automatically even if it is not always necessary these days.
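For reference, the arithmetic behind the 1024 cylinder limit, assuming 512-byte sectors, 63 sectors per track, and either 16 or 255 heads (the common translated geometries). A BIOS and boot loader that use LBA extensions are not bound by this, so treat it as a worked example of the legacy limit only; the 56 GB figure is the approximate /boot start given in the original post.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_capacity(cylinders, heads, sectors_per_track):
    """Bytes addressable through a cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


limit_16_heads = chs_capacity(1024, 16, 63)    # 528,482,304 bytes (504 MiB)
limit_255_heads = chs_capacity(1024, 255, 63)  # 8,422,686,720 bytes (about 8.4 GB)


def beyond_limit(start_gb, limit_bytes=limit_255_heads):
    """True if a partition starting at start_gb (decimal GB) lies past the limit."""
    return start_gb * 10**9 > limit_bytes
```

A /boot partition that starts near 56 GB is well past the 8.4 GB mark, which is why putting /boot first on the first drive sidesteps the question entirely.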

-- Les Mikesell lesmikesell@gmail.com

Bryan J. Smith

5:17 p.m.

On Sun, 2005-09-11 at 11:41 -0500, Les Mikesell wrote:

...

I thought you could specify root=LABEL=xxx in the grub kernel line. I usually avoid labels when making custom changes because the stock labels aren't unique and the system cannot deal with duplicates if you ever move a disk to a different machine but it should work if you just shift them around in the same box.

Yes and no.

Yes, it's more capable than NTLDR on a legacy PC BIOS/DOS disk label.

But no, it doesn't solve the BIOS disk order to Linux device name mapping issue. You _still_ have to tell GRUB how to map BIOS fixed disks (80h, 81h, 82h, etc..., what GRUB calls hd0, hd1, hd2, etc...) to Linux devices (hda, hde, sda, etc...).
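The mapping described above can be sketched as follows. This assumes GRUB legacy conventions, where disks are hd0, hd1, ... in BIOS order and partitions are numbered from zero; the device map contents (hd0 being /dev/hde) are an assumption for illustration, and `bios_to_grub` and `linux_to_grub` are hypothetical helper names.

```python
import re


def bios_to_grub(bios_drive):
    """BIOS fixed-disk number (80h, 81h, ...) to GRUB disk name (hd0, hd1, ...)."""
    if bios_drive < 0x80:
        raise ValueError("fixed disks start at BIOS drive 80h")
    return f"hd{bios_drive - 0x80}"


def linux_to_grub(partition, device_map):
    """Translate e.g. /dev/hde6 using device_map: GRUB disk name -> Linux disk."""
    match = re.fullmatch(r"(/dev/[a-z]+)(\d+)", partition)
    if match is None:
        raise ValueError(f"not a partition name: {partition}")
    disk, number = match.group(1), int(match.group(2))
    for grub_name, linux_disk in device_map.items():
        if linux_disk == disk:
            # GRUB legacy counts partitions from 0, Linux from 1.
            return f"({grub_name},{number - 1})"
    raise LookupError(f"{disk} is not in the device map")
```

If the BIOS order changes so that /dev/hde is no longer the first fixed disk, the same Linux partition gets a different GRUB name, which is the assumption that breaks.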

...

It might be better to use LVM identifiers which are probably unique if you don't do full disk image copies to clone machines.

NT's LDM and Linux's LVM drastically help the bootstrap location issue. Unfortunately, there are still assumptions/issues of the PC BIOS Int13h Disk Services to contend with at boot (that aren't issues on other platforms). The disk label will never solve this until we use a non-legacy boot approach.

In all honesty, with Intel and Phoenix sleeping together, and Microsoft's wish (and their "superstore vendor" whores) to tie the PC firmware to specific Windows releases, there's virtually _no_movement_ on this. Which brings me to Apple -- I think they will "raise the bar" on the PC platform with their proprietary approach. From that, AMD and the various Taiwanese/Chinese manufacturers will come up with a clone that is an open standard, and will be universally adopted.

That's my prediction in the next 3-4 years. Just like PXE finally brought UNIX workstation-like booting to the PC world, I think Apple's innovations on the PC platform (even if proprietary) will encourage vendors to clone it with an open implementation.

...

It's slightly better than in the years when PC disks had a 32 meg maximum size, but not much. Even more recently the 1024 cylinder limit has been in bios for so long that I just always put a /boot partition first on the first drive automatically even if it is not always necessary these days.

As far as I'm concerned, _unless_ you are using a LDM Disk Label (aka "dynamic disk"), there is a real 33.8GB (32GiB) limit on the C: filesystem when dual-booting with Windows XP SP2+.
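For reference, the size limits mentioned in this exchange can be reproduced with a little arithmetic. This is a sketch under stated assumptions: 512-byte sectors, a 16-bit sector count for the old 32 MB DOS limit, a 1024 x 255 x 63 geometry for the 1024-cylinder Int13h limit, and a 65536 x 16 x 63 geometry for the 33.8 GB figure. Whether the 33.8 GB limit above is meant to be that last boundary is my assumption, not something the post states.

```python
SECTOR_BYTES = 512  # assumed sector size


def chs_capacity(cylinders, heads, sectors_per_track):
    """Bytes addressable with the given cylinder/head/sector geometry."""
    return cylinders * heads * sectors_per_track * SECTOR_BYTES


def describe(label, size_bytes):
    """Format a size in both decimal GB and binary GiB."""
    return f"{label}: {size_bytes / 10**9:.1f} GB = {size_bytes / 2**30:.1f} GiB"


# Old DOS partitions: 16-bit sector count.
dos_limit = 2**16 * SECTOR_BYTES
# Int13h CHS: 1024 cylinders, 255 usable heads, 63 sectors per track.
int13_limit = chs_capacity(1024, 255, 63)
# Assumed geometry behind the 33.8 GB figure: 65536 cylinders, 16 heads, 63 sectors.
ata_limit = chs_capacity(65536, 16, 63)

if __name__ == "__main__":
    print(f"DOS 16-bit sector count: {dos_limit // 2**20} MiB")
    print(describe("1024-cylinder limit", int13_limit))
    print(describe("65536 x 16 x 63 limit", ata_limit))
```

Under these assumptions the last figure is 33.8 GB in decimal units, which is 31.5 GiB rather than a full 32 GiB.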

Les Mikesell

6:06 p.m.

On Sun, 2005-09-11 at 12:17, Bryan J. Smith wrote:

...

On Sun, 2005-09-11 at 11:41 -0500, Les Mikesell wrote:

...

I thought you could specify root=LABEL=xxx in the grub kernel line. I usually avoid labels when making custom changes because the stock labels aren't unique and the system cannot deal with duplicates if you ever move a disk to a different machine but it should work if you just shift them around in the same box.

Yes and no.

Yes, it's more capable than NTLDR on a legacy PC BIOS/DOS disk label.

But no, it doesn't solve the BIOS disk order to Linux device name mapping issue. You _still_ have to tell GRUB how to map BIOS fixed disks (80h, 81h, 82h, etc..., what GRUB calls hd0, hd1, hd2, etc...) to Linux devices (hda, hde, sda, etc...).

He's talking about finding the root partition here. At that point you have the kernel and initrd loaded and no longer need bios. Assume you have /boot on a partition that bios understands and want to be able to arbitrarily move the drive that holds the / partition around. Won't labels, LVM identifiers and even md devices be found and assembled correctly from anywhere at this stage?

-- Les Mikesell lesmikesell@gmail.com

Bryan J. Smith

6:34 p.m.

On Sun, 2005-09-11 at 13:06 -0500, Les Mikesell wrote:

...

He's talking about finding the root partition here. At that point you have the kernel and initrd loaded and no longer need bios. Assume you have /boot on a partition that bios understands and want to be able to arbitrarily move the drive that holds the / partition around. Won't labels, LVM identifiers and even md devices be found and assembled correctly from anywhere at this stage?

If the disk order does not change in the BIOS, yes. But if the disk order changes in the BIOS, no. That's typically the problem.

In reality, it's _also_ a problem for NT-based Windows too. Many people have only used DOS-based (9x/Me) Windows which can use BIOS Disk Services (aka "Using Compatibility Mode").

Les Mikesell

6:46 p.m.

On Sun, 2005-09-11 at 13:34, Bryan J. Smith wrote:

...

On Sun, 2005-09-11 at 13:06 -0500, Les Mikesell wrote:

...

He's talking about finding the root partition here. At that point you have the kernel and initrd loaded and no longer need bios. Assume you have /boot on a partition that bios understands and want to be able to arbitrarily move the drive that holds the / partition around. Won't labels, LVM identifiers and even md devices be found and assembled correctly from anywhere at this stage?

If the disk order does not change in the BIOS, yes. But if the disk order changes in the BIOS, no. That's typically the problem.

In reality, it's _also_ a problem for NT-based Windows too. Many people have only used DOS-based (9x/Me) Windows which can use BIOS Disk Services (aka "Using Compatibility Mode").

Why should the kernel/initrd-loaded stage care anything at all about bios when finding the root partition? /boot has to be found by bios, of course, but I'm pretty sure I've put / partitions on drives with no bios access at all that nothing knows about until the kernel is loaded and probes for them.

-- Les Mikesell lesmikesell@gmail.com

Bryan J. Smith

6:53 p.m.

On Sun, 2005-09-11 at 13:46 -0500, Les Mikesell wrote:

...

Why should the kernel/initrd-loaded stage care anything at all about bios when finding the root partition? /boot has to be found by bios, of course, but I'm pretty sure I've put / partitions on drives with no bios access at all that nothing knows about until the kernel is loaded and probes for them.

Okay, I get what you mean. But there is still the /boot mapping issue. In other words, whatever you boot (be it / or a separate /boot), you still have issues.

So I agree that disk labels are _great_ for all filesystems to avoid such things, even for GRUB. But there are still boot-time mapping issues before you get to that stage.
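To make the point about labels concrete, here is a toy model (not how the kernel, initrd, or mount actually implement it) of resolving `root=LABEL=...` once devices have been probed. The function name `find_by_label`, the device names, and the labels are all hypothetical. It shows why a label survives a change in device naming and why duplicate labels, as raised earlier in the thread, are a problem.

```python
def find_by_label(probed, label):
    """Return the device carrying `label`.

    `probed` is a list of (device_name, filesystem_label) pairs, standing in
    for whatever a post-boot probe of the attached disks would report.
    """
    matches = [device for device, found in probed if found == label]
    if not matches:
        raise LookupError(f"no filesystem labelled {label!r}")
    if len(matches) > 1:
        # Two disks with the same stock label: the lookup cannot choose.
        raise LookupError(f"label {label!r} is ambiguous: {matches}")
    return matches[0]


if __name__ == "__main__":
    # Same disk seen under two different device names: the label still resolves.
    before = [("/dev/hde6", "/boot"), ("/dev/hde9", "/")]
    after = [("/dev/hdc6", "/boot"), ("/dev/hdc9", "/")]
    print(find_by_label(before, "/"))
    print(find_by_label(after, "/"))

    # A disk moved in from another machine carries the same stock label.
    clash = after + [("/dev/hda2", "/")]
    try:
        find_by_label(clash, "/")
    except LookupError as err:
        print(err)
```

None of this helps with the earlier stage described above: the boot loader still has to be found through the BIOS before any label can be read.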

discuss@lists.centos.org

participants (5)

Bryan J. Smith
Craig White
Edward Diener
Johnny Hughes
Les Mikesell