I have a new machine I'm trying to install Centos 5.0 on and I'm not getting very far.
The system is 2 dual core xeons (5160, 3.0 GHZ) w/ 8GB ram. It has two 320 GB disks on the motherboard controller (Supermicro X7DAE+), and 8 750 GB disks on a 3ware 9650SE-8ml, pcie (x4) controller card. The 8 disks are set up as two raid 5 volumes (4 disks each).
There is a scsi card in the machine w/ nothing attached to it.
The graphics card is an NVIDIA Quadro FX 1500 (PCI Express x16).
The intent is to install the OS onto the two 320 GB drives on the motherboard controller (preferably in a RAID 1 configuration). The other disks are for our data requirements.
1)I used bit torrent (azureus on windows) to download the dvd iso for Centos 5.0, and it completed without any errors. I believe it does the checksumming verification automatically. I also ran sha1sum against the image, and it came out fine.
2)I burned the image to a dvd using roxio. No errors. When I couldn't get down the road, I burned another copy with no errors.
3)During the install, I verified the media with no errors for both of the disks.
4)I downloaded the driver for this OS and raid card from AMCC-3ware site and made a driver floppy.
5)I booted the dvd and ran "linux dd" to do the install.
Should the graphical installer work on an NVIDIA Quadro FX 1500 graphics card? At present it doesn't appear to work for me; I get hash all over the screen once X starts.
6)I booted the dvd again and ran "linux text dd". I verified my dvd media without problems, and it reads the driver floppy and loads the 3w-9xxx driver. It asks questions about lang, kbd and timezone.
7)For partitions, I selected custom, created a /boot, /, swap and /home on the first 320 GB disk (it turns out to be sdc, with sda and sdb being the big raid volumes). After the grub section (I told it to put grub on the /boot partition) the screen is blue, and it just sits without any further response (I left it overnight, so it should have finished).
8)If I press alt-f3, the last thing I see is: 13:18:16 INFO : Moving (1) to step reposetup
9)If I press alt-f4, the last thing I see is: <5>SQUASHFS error : sb_bread failed reading block 0x6acc <5>SQUASHFS error : unable to read page, block 1aaa0d9, size 9154
10)I'm at a loss as to what to try next, or how to find out what is wrong.
thanks in advance for any and all help, -chuck
By the Way, a knoppix 3.8 live cd will boot and run fine on the hardware.
Chuck Campbell campbell@accelinc.com wrote: I have a new machine I'm trying to install Centos 5.0 on and I'm not getting very far.
Chuck,
I'm surprised that the raid array wasn't named /dev/mapper/isw_xxxyyyxxx. That it is named /dev/sdc suggests that anaconda didn't use dmraid. To be sure that the installer missed using dmraid, you could do a quick knoppix (4.0+) live session and try to mount and read the fakeraid array named above.
If you don't find the isw_ device, then you will have to redo the install, adding the dmraid kernel parameter along with "dd text dmraid".
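For anyone following along, here is a minimal sketch of the check described above, to be run from the live session. It assumes dmraid is installed there and that Intel Matrix RAID sets appear under /dev/mapper with the isw_ prefix mentioned earlier; the function name `list_isw_devices` is made up for illustration.

```shell
# List device-mapper nodes whose names start with "isw_". The directory
# argument exists only so the function can be tried against a test
# directory; it defaults to /dev/mapper.
list_isw_devices() {
    dir=${1:-/dev/mapper}
    for node in "$dir"/isw_*; do
        # If the glob matched nothing, the pattern itself comes back; skip it.
        [ -e "$node" ] || continue
        printf '%s\n' "$node"
    done
}

# In the live session, as root (assumes the dmraid package is present):
#   dmraid -r            # show disks carrying fakeraid metadata
#   dmraid -ay           # activate any RAID sets found
#   list_isw_devices     # prints nothing if no isw_ device exists
```

If the function prints nothing, no isw_ device was created, which is the case where the dmraid boot parameter would be the next thing to try.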
As for the blank screen, while in the knoppix session try to see if your xorg.conf is missing a modeline. If missing, add a modeline suitable for your monitor.
  Section "Screen"
    ...
    Modes "1280x1024"
  EndSection
On Fri, Sep 07, 2007 at 03:32:55PM -0700, mark pryor wrote:
Chuck Campbell campbell@accelinc.com wrote: I have a new machine I'm trying to install Centos 5.0 on and I'm not getting very far.
Chuck,
I'm surprised that the raid array wasn't named as /dev/mapper/isw_xxxyyyxxx
The raid arrays are real hw raid on the 3ware card, and show up as very large disks.
I was trying to install to a single drive (non raid) in the earlier messages.
to be named as /dev/sdc suggests that anaconda didn't use dmraid. To be sure that the installer missed using dmraid, you could do a quick knoppix (4.0+) live session and try to mount and read the fakeraid array named above.
There are no fake raid arrays, just the hw raid arrays, which show up as sda and sdb (very large 2 TB disks), and the two individual disks, which show up as sdc and sdd.
If you don't find the isw_ device, then you will have to redo the install, adding the dmraid kernel parameter along with "dd text dmraid".
The install never runs, it just hangs as I described, so I have nothing on any of the disks...
As far as the blank screen, while in the knoppix session try to see if your xorg.conf is missing a modeline. If missing, add a modeline suitable for your monitor.
Not a blank screen, a screen full of hash with an X cursor which changes to the arrow, but I can't see anything in the hashed-up screen.
I'll look for the xorg.conf details in knoppix, but how do I use those to do a centos graphical install?
-chuck
Chuck Campbell campbell@accelinc.com wrote: On Fri, Sep 07, 2007 at 03:32:55PM -0700, mark pryor wrote:
Chuck Campbell wrote: I have a new machine I'm trying to install Centos 5.0 on and I'm not getting very far.
Chuck,
I'm surprised that the raid array wasn't named as /dev/mapper/isw_xxxyyyxxx
raid arrays are real hw raid on the 3ware card, and show up as very large disks.
I was trying to install to a single drive (non raid) in the earlier messages.
This is what you said in the OP <quote> The intent is to install the OS onto the 2-320GB drives on the motherboard controller (preferably in a raid 1 configuration). The other disks are for our data requirements.
</quote>
The MB controller is fakeraid, and using it would require dmraid support in the install. Was your MB set up by the reseller with the 2 320 GB drives in Raid1? What shows in the Intel Matrix Raid bios?
I have installed Fedora on such a SuperMicro board and we went Raid1 using the onboard device. What's easy to mess up is the boot order menu. If you want to boot from the Raid1 array, you have to bring it in as one of the choices. If you have never set up Linux on a SuperMicro, it's a little tricky.
On Sat, Sep 08, 2007 at 01:07:50PM -0700, mark pryor wrote:
This is what you said in the OP
<quote> The intent is to install the OS onto the 2-320GB drives on the motherboard controller (preferably in a raid 1 configuration). The other disks are for our data requirements.
</quote>
Yes, but it turned out that my install media was indeed corrupt, even though it passed media verification.
I got a new iso image for the install dvd from a different location, and burned a new disc. I did an install from this w/o the earlier reported hang, or SQUASHFS errors.
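Since the installer's media check passed on a disc that later turned out to be bad, an independent read-back comparison may be worth having. The sketch below hashes only as many bytes of the disc as the ISO is long and compares the two SHA-1 values. The function names and the file and device paths in the usage comment are placeholders, and nothing in this thread shows that this check would have caught the specific corruption seen here.

```shell
# Print the SHA-1 of the first NBYTES bytes of SOURCE (a device or a file).
# Reading exactly the ISO's length avoids hashing any padding written
# past the end of the image.
sha1_of_prefix() {
    source_path=$1
    nbytes=$2
    head -c "$nbytes" "$source_path" | sha1sum | cut -d' ' -f1
}

# Succeed only if DISC matches ISO over the ISO's full length.
disc_matches_iso() {
    iso=$1
    disc=$2
    # The arithmetic expansion strips any whitespace wc may emit.
    size=$(( $(wc -c < "$iso") ))
    [ "$(sha1_of_prefix "$iso" "$size")" = "$(sha1_of_prefix "$disc" "$size")" ]
}

# Hypothetical usage (paths are placeholders):
#   disc_matches_iso /path/to/centos-dvd.iso /dev/cdrom && echo "disc matches image"
```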
I couldn't figure out how to do the raid1 boot device on the MB controller, or how to do it via s/w raid in the installer, so I installed onto a single disc on the MB controller. I now have 2 individual disks which show up as sda and sdb. I did custom partitioning in the install and set up a /boot, / and swap partitions on sda. I set up additional swap and other partitions on sdb.
During the install, I deselected the sdc and sdd devices during partitioning (these are the raid arrays on the 3ware card). Each is over 2 TB as they stand, so I was afraid of problems with the mke2fs step, and thought I'd create filesystems after initial boot. I believe I need to use 4KB block size (or maybe 8KB) to get 2TB filesystems, and wasn't sure the installer would do this correctly.
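As a rough sanity check on the block-size question, the sketch below tests whether a volume fits within 2^32 blocks at a given block size. That limit is an assumption about ext3's 32-bit block numbers; the practical ceiling for a particular kernel and e2fsprogs release may be lower. The 2,250,000,000,000-byte figure is an estimate of one array's usable size (three data disks of 750 GB in a four-disk RAID 5), and the device name in the comment is a placeholder.

```shell
# Succeed if a volume of SIZE_BYTES needs no more than 2^32 blocks at
# BLOCK_SIZE bytes per block (assumed limit for 32-bit block numbers).
fits_32bit_blocks() {
    size_bytes=$1
    block_size=$2
    blocks=$(( (size_bytes + block_size - 1) / block_size ))
    [ "$blocks" -le 4294967296 ]
}

# Estimated usable size of one 4 x 750 GB RAID 5 array (assumption).
array_bytes=2250000000000

if fits_32bit_blocks "$array_bytes" 4096; then
    echo "4 KB blocks are enough for this array under the assumed limit"
fi

# Creating the filesystem would then look something like this
# (placeholder device name, destroys existing data):
#   mke2fs -j -b 4096 /dev/sdX
```

Two related points to keep in mind: as far as I know ext3 cannot be mounted with a block size larger than the page size (4 KB on x86), so 8 KB is not an option there, and an MS-DOS partition table cannot describe a partition beyond 2 TiB, so a GPT label or a filesystem on the whole device is needed for arrays of this size.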
The installer saw the 3ware devices (2 of them) because I loaded a driver from floppy (linux text dd). I just didn't use them in the install steps.
I ran the install as described above and put grub on the MBR of sda.
The system now boots, so I ran a yum update, which updated 156 packages. The kernel was updated too, so I set it up to boot the new xen kernel.
The large arrays on the 3ware card don't seem to be recognized either before or after the yum update. The /var/log/messages file shows the 3ware card was found, but doesn't seem to find any exported devices...
The smartd man page indicates I need to use /dev/twaN in the smartd.conf file, but these device files don't exist.
I'm stuck and at a loss on how to find these 3ware arrays to put filesystems on.
The MB controller is fakeraid and to use it would require the dmraid support in the install.
is this through a driver disk? I'll have to delve into this another day, I need to get this machine online with the big raid arrays useable ASAP.
Was your MB setup by the reseller with the 2 320 GB drives in Raid1? What shows in the Intel Matrix Raid bios?
No, I added the disks after the machine arrived w/o any OS installed. At boot time I see the six onboard slots, with two 400gb drives recognized.
Following this, the 3ware bios reports the other 8 disks in two arrays.
Lastly the adaptec scsi card bios shows no devices attached (this is for the tape drives later).
I have installed Fedora on such a SuperMicro board and we went Raid1 using the onboard device. What's easy to mess up is the boot order menu. If you want to boot from the Raid1 array, you have to bring it in as one of the choices. If you have never set up Linux on a SuperMicro, it's a little tricky.
I have, but not as raid1. I still haven't done a raid1, because I just installed on a single disk to get down the road.
Unfortunately I can't see my 3ware raid arrays now... I'm getting a bit frustrated.
-chuck
Chuck Campbell wrote:
Unfortunately I can't see my 3ware raid arrays now... I'm getting a bit frustrated.
Are you using a 3ware 96xx card that needs a driver disk?
Chuck Campbell wrote:
The system now boots, so I ran a yum update, which updated 156 packages. The kernel was updated too, so I set it up to boot the new xen kernel.
Depending on the way your Driverdisk is setup - it would have only installed the drivers for the kernel you installed initially. So if you have problems talking to the 3ware drives, try booting from that kernel instead.
If you want to keep the driver in-place even when the kernel updates, you might want to investigate the weak-updates process and how you might get a driver included into that. Pretty much everything you need to make it happen would be on the system already.
On Tue, Sep 11, 2007 at 05:19:07PM +0100, Karanbir Singh wrote:
Chuck Campbell wrote:
The system now boots, so I ran a yum update, which updated 156 packages. The kernel was updated too, so I set it up to boot the new xen kernel.
Depending on the way your Driverdisk is setup - it would have only installed the drivers for the kernel you installed initially. So if you have problems talking to the 3ware drives, try booting from that kernel instead.
If you want to keep the driver in-place even when the kernel updates, you might want to investigate the weak-updates process and how you might get a driver included into that. Pretty much everything you need to make it happen would be on the system already.
yes, it is a 3ware 9650SE-8ML.
I used the driver disk during the install, and the installer saw the raid devices. I deselected them during the partitioning of the install disks, and the installed system doesn't see those devices.
I then updated the kernel and the new kernel doesn't see the devices either.
When I boot the old kernel again, it still doesn't see the devices.
Both kernels see the card though (looking in /var/log/messages after boot).
-chuck
Chuck Campbell wrote:
Both kernels see the card though (looking in /var/log/messages after boot).
Can you post the output from 'dmesg; lsmod; lspci -n' booting the installtime kernel at http://pastebin.ca/ and post the url to that here..
On Tue, Sep 11, 2007 at 10:37:05PM +0100, Karanbir Singh wrote:
Chuck Campbell wrote:
Both kernels see the card though (looking in /var/log/messages after boot).
Can you post the output from 'dmesg; lsmod; lspci -n' booting the installtime kernel at http://pastebin.ca/ and post the url to that here..
I concatenated dmesg, lsmod and lspci outputs from booting the install kernel into a single file and put it here:
Glancing through dmesg I do see the 3ware controller and sda, sdb, sdc and sdd, so in the install kernel, it looks like everything is recognized.
I then booted the updated kernel and reran the dmesg, lsmod and lspci commands, then concatenated the outputs into another file and put it here:
I see differences with respect to the 3ware stuff and disks recognized, but I don't know how to reconcile them with the new kernel. Both kernels seem to load the 3ware module (lsmod output), but the updated kernel doesn't see the raid devices (only /dev/sda and /dev/sdb).
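For reference, the collection step described above can be scripted so that the two kernels produce directly comparable files. This is only a sketch; the function name `collect_diag` and the output paths are invented for illustration.

```shell
# Write the output of dmesg, lsmod and lspci -n to OUTFILE, each under a
# header line. A command that is missing or fails is noted in the file
# instead of aborting the collection.
collect_diag() {
    outfile=$1
    : > "$outfile"
    for cmd in "dmesg" "lsmod" "lspci -n"; do
        printf '===== %s =====\n' "$cmd" >> "$outfile"
        # $cmd is left unquoted on purpose so "lspci -n" splits into words.
        $cmd >> "$outfile" 2>&1 ||
            printf '(%s failed or is not installed)\n' "$cmd" >> "$outfile"
    done
}

# Hypothetical usage, once under each kernel:
#   collect_diag "/root/diag-$(uname -r).txt"
# then compare the two results:
#   diff /root/diag-<old-kver>.txt /root/diag-<new-kver>.txt | less
```

In the diff, lines mentioning 3w-9xxx and the sd devices are the ones to look at first, since the timestamps and ordering in dmesg will differ anyway.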
-chuck
Chuck Campbell wrote:
As you have already pointed out in this email, yes - the installtime kernel does see the drives fine.
On Wed, Sep 12, 2007 at 08:06:03PM +0100, Karanbir Singh wrote:
Chuck Campbell wrote:
As you have already pointed out in this email, yes - the installtime kernel does see the drives fine.
Thanks for your help, I appreciate it!
-chuck
Chuck Campbell wrote:
As you have already pointed out in this email, yes - the installtime kernel does see the drives fine.
Thanks for your help, I appreciate it!
You are most welcome. However, you can repay me!
I am going to follow up on your next post with details on how to make weak-updates work in your case. You can repay me by writing up a wiki page [1] and getting that set up at wiki.centos.org for people who come along with the same issue :)
[1] Maybe Phil Schaffner can help, he seems to be in a similar situation at the moment.
On Tue, Sep 11, 2007 at 05:19:07PM +0100, Karanbir Singh wrote:
If you want to keep the driver in-place even when the kernel updates, you might want to investigate the weak-updates process and how you might get a driver included into that. Pretty much everything you need to make it happen would be on the system already.
Where do I find info about this? I suspect I will need to do this with every kernel update???
-chuck
Chuck Campbell wrote:
If you want to keep the driver in-place even when the kernel updates, you might want to investigate the weak-updates process and how you might get a driver included into that. Pretty much everything you need to make it happen would be on the system already.
Where do I find info about this? I suspect I will need to do this with every kernel update???
how exactly were you planning on managing out-of-tree kernel drivers otherwise ?
btw, since this is a stable distro you are using, the chances are that the same driver will work through the life of the product. Try this command : /sbin/weak-modules and register the driver you have against that. Then reinstall the updated kernel and the driver should move along.
I shall try and do some more specific docs on this, in the centos wiki, over the next few days.
On Wed, Sep 12, 2007 at 08:10:32PM +0100, Karanbir Singh wrote:
Chuck Campbell wrote:
If you want to keep the driver in-place even when the kernel updates, you might want to investigate the weak-updates process and how you might get a driver included into that. Pretty much everything you need to make it happen would be on the system already.
Where do I find info about this? I suspect I will need to do this with every kernel update???
how exactly were you planning on managing out-of-tree kernel drivers otherwise ?
I've no idea... I've never had to deal with this before, so I didn't even understand this could be an issue.
It raises more questions for me than I had thought of previously though. I have more homework ahead of me :-)
btw, since this is a stable distro you are using, the chances are that the same driver will work through the life of the product. Try this command : /sbin/weak-modules and register the driver you have against that. Then reinstall the updated kernel and the driver should move along.
I will try this in a few moments. One last observation though, 3Ware has a newer driver for the updated kernel. If I wish to use it, is it a simple matter of replacing the 3w-9xxx.ko file with the appropriate one? If it is more complicated than that, where do I find info about this issue?
I shall try and do some more specific docs on this, in the centos wiki, over the next few days.
Thanks, I'm anxious to learn, since I'll probably need to deploy more systems with this (and other similar) issue(s).
-chuck
Chuck Campbell wrote:
how exactly were you planning on managing out-of-tree kernel drivers otherwise ?
I've no idea... I've never had to deal with this before, so I didn't even understand this could be an issue.
This is one of the major issues with the Linux process these days: as you move from kernel to kernel there is almost zero assurance of driver ABI/API stability, and that in turn creates a situation like this, where one kernel works while another does not. It's enough of a problem that on a lot of platforms sysadmins will not upgrade a kernel unless they really, really need to. On CentOS and the EL codebase this isn't so much of an issue, because upstream does some work to make sure they don't break driver compatibility. If they do break this compatibility, it's easy to detect.
And most of the heavy lifting is getting done by a fairly simple shell script called weak-modules, based in /sbin/ and comes from module-init-tools.rpm
weak-modules will basically take a given driver .ko and check what other kernels installed at this time will work with it, it will then create the symlinks for each of those kernels to point at this .ko. It will then check each initrd in the /boot dir, and update each initrd for kernels it found compatible with the driver. Rather than overwrite the initrd, it will create a new one with the same-name but followed by a number. It will then edit /etc/grub.conf and add a *new* section for this just created initrd. So when you reboot the machine you have the choice to boot the kernel.rpm shipped initrd or the newly updated one.
Ok, so how does this work ? lets say you have drivers ( from install time ) in /lib/modules/2.6.8-8.el5/updates/
1) sudo to root
2) find /lib/modules/2.6.8-8.el5/updates | /sbin/weak-modules --add-modules
3) watch the blinking lights, depending on how many kernels you have installed it could be a few seconds
4) check /boot/ and make sure you have the new updated initrd's for all kernels you thought it would work with.
5) check /etc/grub.conf for new sections [1]
6) reboot with whatever kernel + initrd you want
7) all further kernels brought down by yum from the centos repos will auto magically get this driver included in the initrd. ( rpm -q --scripts kernel-version will show you what happens in the post install section, and how weak-updates does an --add-kernel )
thats about it. Give it a shot and let me know how you get along. I am writing this from memory so i might have missed something :) One way to find out....
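The steps above can be sketched roughly as follows. One assumption here goes beyond what was written: the helper feeds weak-modules only regular files ending in .ko, rather than everything find prints (which includes the directory itself). Whether weak-modules cares about that difference is not established in this thread. `list_ko_files` is a made-up name and `<kver>` stands for the install-time kernel version.

```shell
# Print every regular file ending in .ko below DIR, one path per line.
list_ko_files() {
    find "$1" -type f -name '*.ko' | sort
}

# As root, with <kver> replaced by the install-time kernel version:
#   list_ko_files /lib/modules/<kver>/updates | /sbin/weak-modules --add-modules
#
# Then check the results described in steps 4 and 5:
#   ls -l /boot/initrd-*
#   grep '^title' /etc/grub.conf
```

Running `list_ko_files` on its own first shows exactly which paths would be registered, which makes it easier to see what went in if the result is not what was expected.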
btw, before everyone cries foul over where the drivers are put at install time: remember, even when the kernel rpm is removed, files under the tree that are not owned by rpm won't get zapped. So once in the system your driver is going to stay there. Keep that in mind.
I will try this in a few moments. One last observation though, 3Ware has a newer driver for the updated kernel. If I wish to use it, is it a simple matter of replacing the 3w-9xxx.ko file with the appropriate one? If it is more complicated than that, where do I find info about this issue?
well, in this case, build that .ko against the oldest kernel-devel you have on the machine ( ideally, I should say only against the install-time kernel, but lots of people don't have that hanging around :/ ) and weak-modules should do its magic.
I shall try and do some more specific docs on this, in the centos wiki, over the next few days.
Thanks, I'm anxious to learn, since I'll probably need to deploy more systems with this (and other similar) issue(s).
Slight change in plan, I did this braindump and you get to write the wiki page :) You might also want to look and see how the /lib/modules/<kver>/extras/ directory contents are handled and include some info on that. Since that would basically address non install time .ko
[1] I know it's called "Red Hat Updated Driver Model"; if someone files a bug at http://bugs.centos.org/ I will make sure that it's changed for the next update.
On Thu, Sep 13, 2007 at 10:11:35PM +0100, Karanbir Singh wrote:
This is one of the major issues with the Linux process these days: as you move from kernel to kernel there is almost zero assurance of driver ABI/API stability, and that in turn creates a situation like this, where one kernel works while another does not. It's enough of a problem that on a lot of platforms sysadmins will not upgrade a kernel unless they really, really need to. On CentOS and the EL codebase this isn't so much of an issue, because upstream does some work to make sure they don't break driver compatibility. If they do break this compatibility, it's easy to detect.
And most of the heavy lifting is getting done by a fairly simple shell script called weak-modules, based in /sbin/ and comes from module-init-tools.rpm
weak-modules will basically take a given driver .ko and check what other kernels installed at this time will work with it, it will then create the symlinks for each of those kernels to point at this .ko. It will then check each initrd in the /boot dir, and update each initrd for kernels it found compatible with the driver. Rather than overwrite the initrd, it will create a new one with the same-name but followed by a number. It will then edit /etc/grub.conf and add a *new* section for this just created initrd. So when you reboot the machine you have the choice to boot the kernel.rpm shipped initrd or the newly updated one.
Ok, so how does this work ? lets say you have drivers ( from install time ) in /lib/modules/2.6.8-8.el5/updates/
sudo to root
find /lib/modules/2.6.8-8.el5/updates | /sbin/weak-modules --add-modules
This didn't work. I did: ls -1 /lib/modules/2.6.8-8.el5/updates | /sbin/weak-modules --add-modules
- watch the blinking lights, depending on how many kernels you have
installed it could be a few seconds
- check /boot/ and make sure you have the new updated initrd's for all
kernels you thought it would work with.
check /etc/grub.conf for new sections [1]
reboot with whatever kernel + initrd you want
All worked fine for the 2.6.18-8.1.8.el5xen kernel.
- all further kernels brought down by yum from the centos repos will auto
magically get this driver included in the initrd. ( rpm -q --scripts kernel-version will show you what happens in the post install section, and how weak-updates does an --add-kernel )
New kernel was released, so I did yum update. The new kernel boots, but does not see the raid devices on the 3ware card. The update also seems to have removed my install kernel (2.6.18-8.el5xen). Did that step on something? If I boot the 2.6.18-8.1.8.el5xen kernel, I still see my raid devices, so it worked for the first update...
I will try this in a few moments. One last observation though, 3Ware has a newer driver for the updated kernel. If I wish to use it, is it a simple matter of replacing the 3w-9xxx.ko file with the appropriate one? If it is more complicated than that, where do I find info about this issue?
well, in this case, build that .ko against the oldest kernel-devel you have on the machine ( ideally, I should say only against the install-time kernel, but lots of people don't have that hanging around :/ ) and weak-modules should do its magic.
There is a pre-built (by 3ware) .ko file for 2.6.18-8.1.8.el5xen
I don't know what to do with it though.
Slight change in plan, I did this braindump and you get to write the wiki page :)
I've never done a wiki page, but I'm okay with writing this all up, once I understand it. I can put what you wrote above (with some minor fixes), but it didn't exactly work out for further kernel updates... Not sure what to say about that.
You might also want to look and see how the /lib/modules/<kver>/extras/ directory contents are handled and include some info on that. Since that would basically address non install time .ko
The extras dir in the orig install kernel tree is empty. So is the weak-updates dir, since the kernel update removed the install kernel???
The 2.6.18-8.1.8.el5xen tree has a weak-updates tree which appears to duplicate the old /lib/modules tree. It looks like this:
/lib/modules/2.6.18-8.1.8.el5xen/weak-updates/lib/modules/2.6.18-8.el5xen/updates/3w-9xxx.ko
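Going by the earlier description that weak-modules creates symlinks pointing at the original .ko, one way to make sense of a tree like this is to list every module under weak-updates together with what it points to. The sketch below does that; the function name `show_weak_updates` is invented, and a dangling link in its output would suggest the original file was removed along with the old kernel.

```shell
# For each .ko under KDIR/weak-updates, print the path and, if it is a
# symlink, the target it points to. Prints nothing if the directory is
# missing or empty.
show_weak_updates() {
    kdir=$1
    find "$kdir/weak-updates" -name '*.ko' 2>/dev/null | sort |
    while read -r path; do
        if [ -L "$path" ]; then
            printf '%s -> %s\n' "$path" "$(readlink "$path")"
        else
            printf '%s (regular file)\n' "$path"
        fi
    done
}

# Example, using the kernel tree named above:
#   show_weak_updates /lib/modules/2.6.18-8.1.8.el5xen
# and to see whether each link target still exists:
#   ls -lL /lib/modules/2.6.18-8.1.8.el5xen/weak-updates/lib/modules/*/updates/
```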
So I'm completely confused at this point...
-chuck