Folks-
I have had a machine dropped in my lap that I am trying to get CentOS 4.1 to run on as a first pass (the hope is for it to eventually run Rocks -- http://www.rocksclusters.org , which uses CentOS 4.x as its underlying OS).
The machine has 2 Opteron 250DP (2.4GHz) with 4 GB of RAM. It is using a Tyan Thunder K8WE (S2895A2NRF) motherboard with an nVidia nForce chipset -- http://www.tyan.com/products/html/thunderk8we.html. It is configured to boot from an SATA drive (80GB) and then use 3Ware 8506-8 card with 8x250GB SATA drives in a big raid 5 as a data store.
So, CentOS appears to recognize the nics and find the boot drive well enough to go through a complete install. Now this is where things get "interesting" .. the install process hung for a very long period of time during the "installing grub" phase (this was a cd install, so no post or anything, done off-line). It finally rebooted, but it just hangs at the "GRUB Loading stage2..."
If I boot the machine using the 4.1 install cd in rescue mode, I can see the filesystem (the OS is installed). /etc/grub.conf doesn't look unreasonable. One oddball thing is that I cannot see the 3ware card at all (doesn't show up in lspci output).
Any thoughts? Anyone set up a similar beastie?
Thanks.
Sean
Sean O'Connell oconnell@soe.ucsd.edu wrote:
The machine has 2 Opteron 250DP (2.4GHz) with 4 GB of RAM. It is using a Tyan Thunder K8WE (S2895A2NRF) motherboard with an nVidia nForce chipset -- http://www.tyan.com/products/html/thunderk8we.html.
Yes, it is a seriously sweet board!
It is configured to boot from an SATA drive (80GB) and then use 3Ware 8506-8 card with 8x250GB SATA drives in a big raid 5 as a data store.
Okay, someone should be shot on that one. You should be able to use the 8506-8 "out-of-the-box" for boot, no "intermediate" disk required.
Option 1: *DEAD*SIMPLE*SOLUTION*
YANK OUT THE SATA DISK ON THE NVIDIA CHIPSET AND USE _ONLY_ THE 3WARE CARD! NOW RE-INSTALL.
Option 2: "Complex solution" (and why this happened)
I'm sure part of the problem is the load order of the nv_sata and 3w-xxxx drivers. If nv_sata, which uses the generic SCSI interfaces, loads first, its disk will be /dev/sda and the 3w-xxxx volumes will be /dev/sdb onward. You need to tell Linux the _exact_order_ of the host adapters that will map to SCSI cards. So ...
A. Decide who gets to boot first -- nVidia chipset SATA or 3Ware Escalade SATA. Then ...
B. BIOS -- The S2895 uses Phoenix ServerBIOS and will let you select the _exact_ card/[S]ATA channel that gets first boot. I.e., it will even "see" the 3Ware BIOS, and list that under the Boot selection. Make sure you decide which one, and then set that in the BIOS.
C. INITRD/GRUB -- You now need to set up /etc/modprobe.conf in Linux to do the same. E.g., if you select the nVidia SATA to boot first, /etc/modprobe.conf should have the following (the stock kernel names the nVidia SATA module sata_nv; it is the driver called nv_sata elsewhere in this thread):

    alias scsi_hostadapter sata_nv
    alias scsi_hostadapter1 3w-xxxx

Now you'll need to remake the "initrd-*.img" for the kernel with "mkinitrd" -- e.g.:

    mkinitrd /boot/initrd-`uname -r`-NVfirst.img `uname -r`

This will generate a new initrd file of /boot/initrd-(kernel-rev)-NVfirst.img. Now create another GRUB entry to use it and boot it. I.e., you can copy the existing entry and just change the "initrd" line to the new filename for the new entry.
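The alias numbering in step C can be sketched as a small shell helper. This is only an illustration: `hostadapter_aliases` is a made-up name, nothing is written to /etc/modprobe.conf, and the module names are whatever the caller passes in (sata_nv is assumed here for the nVidia SATA driver).

```shell
#!/bin/sh
# Print scsi_hostadapter alias lines for the given drivers, in load order.
# The first driver gets "scsi_hostadapter"; later ones get a numeric suffix.
hostadapter_aliases() {
    n=0
    for drv in "$@"; do
        if [ "$n" -eq 0 ]; then
            echo "alias scsi_hostadapter $drv"
        else
            echo "alias scsi_hostadapter$n $drv"
        fi
        n=$((n + 1))
    done
}

# nVidia SATA first, 3Ware second (the order used in step C).
hostadapter_aliases sata_nv 3w-xxxx
```

Swapping the two arguments gives the 3Ware-first layout. Either way the output would still have to be merged into /etc/modprobe.conf by hand and the initrd rebuilt afterwards.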
D. GRUB MAP/INIT -- Verify your GRUB map file (/boot/grub/device.map) says the following ...

    (hd0) /dev/sda
    (hd1) /dev/sdb
    [ hd2 if you have /dev/sdc, etc... ]

And then run:

    grub-install /dev/sda
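As a rough sketch of step D, the map can be generated from the disks listed in BIOS boot order. `device_map` is a hypothetical helper; it only prints text and does not touch /boot/grub/device.map.

```shell
#!/bin/sh
# Print a GRUB device.map for the disks given in BIOS boot order.
device_map() {
    n=0
    for disk in "$@"; do
        echo "(hd$n) $disk"
        n=$((n + 1))
    done
}

# Boot disk first, then the 3Ware volume.
device_map /dev/sda /dev/sdb
```

The point is just that the (hdN) numbers follow the BIOS boot order, so the list passed in must match what was selected in step B.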
NOTE: You may need to _reboot_ into the "Rescue Mode" _after_ modifying /etc/modprobe.conf so it reads the order of SCSI adapters. And even then, I'm not sure it will work, because the driver might assume the nv_sata is always first.
Which is why I recommend the "dead simple solution." Install with _only_ the 3Ware Escalade attached drives, and then the installer _should_ setup /etc/modprobe.conf with "alias scsi_hostadapter 3w-xxxx" and the initrd will be correct.
Also make sure you set the 3Ware to _always_ boot before the on-board ATA/SATA in the Phoenix ServerBIOS.
One oddball thing is that I cannot see the 3ware card at all (doesn't show up in lspci output).
Of course! Because it's very likely the nVidia SATA is loading _first_! That's the problem.
Any thoughts? Anyone set up a similar beastie?
All-the-time. If I have a 3Ware card, there is *0* reason to use the on-board SATA channels. If you do, make sure you put the 3Ware as the _first_ card in the /etc/modprobe.conf with a "scsi_hostadapter" alias.
Otherwise any other SCSI card might be assumed to be first, like the nVidia SATA with its nv_sata driver (which appears as SCSI).
-- Bryan
P.S. Remember, if you don't want to use RAID-5 for system volumes, you _can_ configure your 3Ware card with _multiple_ volumes. E.g.,
(2) RAID-1 /dev/sda for "System"
(6) RAID-5 /dev/sdb for "Data"
In fact, this is what I do normally. Sometimes I'll do (4) RAID-10 and (4) RAID-5, if I want some faster RAID-10 storage for some data, using the RAID-5 volume for lesser and/or more "read-only" access.
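For a feel of what these layouts cost in space, here is a back-of-the-envelope sketch using the 8x250GB drives from the original post. `usable_gb` is a made-up helper, the figures are nominal (no filesystem or controller overhead), and RAID-1 is assumed to be a plain two-disk mirror.

```shell
#!/bin/sh
# usable_gb LEVEL DISKS SIZE_GB -> nominal usable capacity in GB
usable_gb() {
    level=$1 disks=$2 size=$3
    case $level in
        1)  echo "$size" ;;                     # mirror: one disk's worth
        5)  echo $(( (disks - 1) * size )) ;;   # one disk's worth goes to parity
        10) echo $(( disks / 2 * size )) ;;     # striped mirrors: half the disks
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
}

echo "one 8-disk RAID-5:     $(usable_gb 5 8 250) GB"
echo "2x RAID-1 + 6x RAID-5: $(usable_gb 1 2 250) + $(usable_gb 5 6 250) GB"
echo "4x RAID-10 + 4x RAID-5: $(usable_gb 10 4 250) + $(usable_gb 5 4 250) GB"
```

So the single big RAID-5 gives 1750 GB, the system/data split gives 250 + 1250 GB, and the RAID-10/RAID-5 split gives 500 + 750 GB.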
This is a follow up to my battle of wills with this machine :)
1) I have managed to get this beastie to boot off the SATA drive
- I had to change the disk access mode from DOS to OTHER in the BIOS
- I had to disable the floppy settings in the BIOS (no floppy drive in the machine, but it was enabled in the bios)
After these two changes (oh, and reinstalling the machine), the machine worked fine. I had made the change in the bios and done a grub-install /dev/sda from chroot in rescue mode, but it panicked in the nv_sata coming back up. I'm guessing the different geometry settings caused it to barf.
2) The kernel/linux does *NOT* see the 3Ware card, which is in the 64 bit 133Mhz slot. Which according to the diagram at http://www.tyan.com/products/html/thunderk8we.html is on Bridge A.
Here is the listing for lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 VGA compatible controller: nVidia Corporation NV45GL [Quadro FX 3400/4400] (rev a2)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
Should I not see the card? I wonder if this chipset is not fully supported by the 2.6.9-11.ELsmp kernel? Or does anyone have any suggestions for kernel flags?
In dmesg output I see
3ware Storage Controller device driver for Linux v1.26.00.039.
3w-xxxx: No cards found.
I could try moving the 3Ware card to one of the slower PCI-X slots and see if that helps. Perhaps, I will give this a whirl manana. I am at least encouraged that the bloody thing installs and boots on its own :)
Sean
On Mon, 2005-07-25 at 17:17 -0700, Sean O'Connell wrote:
This is a follow up to my battle of wills with this machine :)
- I have managed to get this beastie to boot off the SATA drive
- I had to change the disk access mode from DOS to OTHER in the BIOS
Interesting. I can't remember off the top of my head, but isn't there an "Auto"? When all else fails, "LBA" typically works -- especially if you're dual-booting.
- I had to disable the floppy settings in the BIOS (no floppy drive in the machine, but it was enabled in the bios)
Shouldn't affect it, as floppy disks are assigned as BIOS disk 00h (A:), 01h (B:), and fixed disks are assigned BIOS disk 80h (C:), 81h (D:), etc...
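To make the numbering concrete, here is a tiny sketch that classifies a BIOS drive number. `bios_drive` is a made-up name; the only rule it encodes is the one above, that floppies start at 00h and fixed disks start at 80h.

```shell
#!/bin/sh
# Classify a BIOS drive number given in hex (e.g. 0x00, 0x81).
bios_drive() {
    num=$(( $1 ))
    if [ "$num" -ge 128 ]; then
        echo "fixed disk $(( num - 128 ))"
    else
        echo "floppy $num"
    fi
}

bios_drive 0x00   # first floppy (A:)
bios_drive 0x80   # first fixed disk (C:)
bios_drive 0x81   # second fixed disk (D:)
```

Since the two ranges don't overlap, an enabled-but-absent floppy shouldn't shift the fixed disk numbering, which is why I doubt that setting mattered.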
After these two changes (oh, and reinstalling the machine), the machine worked fine. I had made the change in the bios and done a grub-install /dev/sda from chroot in rescue mode, but it panicked in the nv_sata coming back up. I'm guessing the different geometry settings caused it to barf.
Hmmm, depends. Linux is fairly good on auto-detecting geometry, even when the BIOS and legacy BIOS/DOS Disk Label differ.
The problem is if you wrote the GRUB MBR when you were booted into the Rescue disk and it was using a different geometry. Then yes, that would dork it up. @-ppp
- The kernel/linux does *NOT* see the 3Ware card,
What about the BIOS? The ServerBIOS will list all storage cards it sees. It should let you select what boot device you want.
Also, try _manually_ loading the 3w-xxxx driver with "modprobe 3w-xxxx".
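A quick way to check whether the kernel sees the card at all is to search the lspci listing for it. The sketch below uses a hypothetical `has_3ware` helper and two lines from the listing posted earlier as sample input; on the real box you would feed it with `lspci | has_3ware`.

```shell
#!/bin/sh
# Read an lspci listing on stdin; succeed if a 3ware controller is listed.
has_3ware() {
    grep -qi '3ware'
}

# Sample input: two lines from the listing earlier in the thread.
sample='00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)'

if printf '%s\n' "$sample" | has_3ware; then
    echo "3ware card is visible on the PCI bus"
else
    echo "3ware card is not visible; expect 'No cards found' from 3w-xxxx"
fi
```

If the card is missing from lspci, loading the driver by hand can't help, because the driver only probes devices the kernel has already enumerated.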
which is in the 64 bit 133Mhz slot. Which according to the diagram at http://www.tyan.com/products/html/thunderk8we.html is on Bridge A.
And the manual concurs with you too (see pages 9 & 10): ftp://ftp.tyan.com/manuals/m_s2895_100.pdf
Hmmm, maybe you should try putting it on Bridge B and closing jumper J92. That will force the PCI-X slots to 66MHz, which might be required to support the 64-bit@66MHz PCI 3Ware Escalade card.
I didn't run into this -- especially on the Bridge A which is a dedicated PCI-X slot for 1 card.
Here is the listing for lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 VGA compatible controller: nVidia Corporation NV45GL [Quadro FX 3400/4400] (rev a2)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
Hmmm, it's like the PCI-X busses are not even there. Sometimes BIOSes can be configured to snoop all PCI busses. Also try resetting all configuration data.
This is very troubling.
Should I not see the card? I wonder if this chipset is not fully supported by the 2.6.9-11.ELsmp kernel?
Has _nothing_ to do with the chipset. All chipsets are APIC/I2C compliant, and present a PCI bus as a PCI bus -- be it bridged, HT'd, etc... PCI, PCI-X, PCIe.
So, going back into the ServerBIOS, is there a setting for various card BIOS detections?
Or does anyone have any suggestions for kernel flags?
Hmmm, I don't think "noapic" will help you here.
It's clearly a PCI-X bus detection issue -- be it the POST not configuring the chipset registers, or the Linux kernel just not seeing anything.
I'd clearly point to the POST, if you're not seeing it as an available boot card in the BIOS.
In dmesg output I see 3ware Storage Controller device driver for Linux v1.26.00.039. 3w-xxxx: No cards found.
Hmmm, so you did try manually loading the driver, eh?
I could try moving the 3Ware card to one of the slower PCI-X slots and see if that helps. Perhaps, I will give this a whirl manana. I am at least encouraged that the bloody thing installs and boots on its own :)
Yeah, try Bridge B and slowing it down to 66MHz by closing J92.
BTW, you're not using a riser card, correct? I assume not (I see a PCIe video card), but I had to ask.
On Mon, 2005-07-25 at 20:53 -0500, Bryan J. Smith wrote:
BTW, you're not using a riser card, correct? I assume not (I see a PCIe video card), but I had to ask.
No riser card. This is a 3U box.
Sean
One last possibility.
When the first Opteron mainboards came out with the AMD8131 tunnel, Tyan had an issue when 4GiB of RAM was used. They recommended you enable the memory hole.
That issue should be _long_removed_ in the S2895, but it probably wouldn't hurt to enable any memory hole (above 3.65GiB), or actually remove 2GiB of RAM and see if it sees the card and its firmware in the BIOS (as well as Linux).
If so, get Tyan on the phone. You should _not_ be seeing that.
Bryan J. Smith wrote:
One last possibility.
When the first Opteron mainboards came out with the AMD8131 tunnel, Tyan had an issue when 4GiB of RAM was used. They recommended you enable the memory hole.
That issue should be _long_removed_ in the S2895, but it probably wouldn't hurt to enable any memory hole (above 3.65GiB), or actually remove 2GiB of RAM and see if it sees the card and its firmware in the BIOS (as well as Linux).
If so, get Tyan on the phone. You should _not_ be seeing that.
I 2nd that motion (remove 2 GB of RAM). I am also on the SuSE AMD64 list & there are dozens of threads about people having install problems with more than 2 GB of RAM onboard during install. They have various recommendations to get around it (noapic during install, others, I don't recall them all), but getting down to 2 GB RAM or less seems to cure a multitude of ills during install.
On Mon, 2005-07-25 at 22:23 -0500, William A. Mahaffey III wrote:
Bryan J. Smith wrote:
One last possibility.
When the first Opteron mainboards came out with the AMD8131 tunnel, Tyan had an issue when 4GiB of RAM was used. They recommended you enable the memory hole.
That issue should be _long_removed_ in the S2895, but it probably wouldn't hurt to enable any memory hole (above 3.65GiB), or actually remove 2GiB of RAM and see if it sees the card and its firmware in the BIOS (as well as Linux).
If so, get Tyan on the phone. You should _not_ be seeing that.
I 2nd that motion (remove 2 GB of RAM). I am also on the SuSE AMD64 list & there are dozens of threads about people having install problems with more than 2 GB of RAM onboard during install. They have various recommendations to get around it (noapic during install, others, I don't recall them all), but getting down to 2 GB RAM or less seems to cure a multitude of ills during install.
OK. But here's the problem. What difference will removing 2GB of RAM make once the machine is back up and running with 4GB of RAM? I would feel better if the PCI bus actually saw the card. I will try doing a few things tomorrow:
1) Try moving the 3Ware card from the 133MHz slot to one of the 66MHz slots and see if linux sees the card.
2) I will possibly try removing 2GB of RAM and see if things magically start working :)
3) I'll poke around in the BIOS some more and reread the kernel parameters list again and see if something obvious appears.
On Mon, 2005-07-25 at 20:44 -0700, Sean O'Connell wrote:
OK. But here's the problem. What difference will removing 2GB of RAM make once the machine is back up and running with 4GB of RAM?
The idea here is to _eliminate_ the fact that it could be a memory hole or 4GiB issue. If it _is_ the 4GiB issue, then Tyan has a _lot_ of explaining to do, and a fix to provide. I seriously doubt this is your issue, but I had to suggest it anyways.
I would feel better if the PCI bus actually saw the card. I will try doing a few things tomorrow:
1) Try moving the 3Ware card from the 133MHz slot to one of the 66MHz slots and see if linux sees the card.
2) I will possibly try removing 2GB of RAM and see if things magically start working :)
3) I'll poke around in the BIOS some more and reread the kernel parameters list again and see if something obvious appears.
There's a lot of BIOS settings that could be affecting it at the post.
On Mon, 2005-07-25 at 20:51 -0500, Bryan J. Smith wrote:
- I had to disable the floppy settings in the BIOS (no floppy drive in the machine, but it was enabled in the bios)
Shouldn't affect it, as floppy disks are assigned as BIOS disk 00h (A:), 01h (B:), and fixed disks are assigned BIOS disk 80h (C:), 81h (D:), etc...
I was kind of thinking this might be a red herring, but, unfortunately, the change was done at the same time as the other change. So much for the good scientific principle of changing one variable at a time.
Hmmm, depends. Linux is fairly good on auto-detecting geometry, even when the BIOS and legacy BIOS/DOS Disk Label differ.
The problem is if you wrote the GRUB MBR when you were booted into the Rescue disk and it was using a different geometry. Then yes, that would dork it up. @-ppp
I don't think it was ever written properly from the install, and grub-install didn't work, period, until I made the two aforementioned changes.
- The kernel/linux does *NOT* see the 3Ware card,
What about the BIOS? The ServerBIOS will list all storage cards it sees. It should let you select what boot device you want.
I do see the 3Ware BIOS at boot. Trouble is once the box has booted, no love.
Hmmm, it's like the PCI-X busses are not even there. Sometimes BIOSes can be configured to snoop all PCI busses. Also try resetting all configuration data.
This is very troubling.
Should I not see the card? I wonder if this chipset is not fully supported by the 2.6.9-11.ELsmp kernel?
Has _nothing_ to do with the chipset. All chipsets are APIC/I2C compliant, and present a PCI bus as a PCI bus -- be it bridged, HT'd, etc... PCI, PCI-X, PCIe.
So, going back into the ServerBIOS, is there a setting for various card BIOS detections?
I'll poke around some more manana.
Or does anyone have any suggestions for kernel flags?
Hmmm, I don't think "noapic" will help you here.
It's clearly a PCI-X bus detection issue -- be it the POST not configuring the chipset registers, or the Linux kernel just not seeing anything.
I'd clearly point to the POST, if you're not seeing it as an available boot card in the BIOS.
See above. Card is seen during POST.
In dmesg output I see 3ware Storage Controller device driver for Linux v1.26.00.039. 3w-xxxx: No cards found.
Hmmm, so you did try manually loading the driver, eh?
Yeppers. No love.
I could try moving the 3Ware card to one of the slower PCI-X slots and see if that helps. Perhaps, I will give this a whirl manana. I am at least encouraged that the bloody thing installs and boots on its own :)
Yeah, try Bridge B and slowing it down to 66MHz by closing J92.
On Mon, 2005-07-25 at 21:01 -0700, Sean O'Connell wrote:
I do see the 3Ware BIOS at boot. Trouble is once the box has booted, no love.
Oh, so you _are_ seeing the 3Ware BIOS.
If you go into the "Boot" portion of the Phoenix ServerBIOS, you should also see the 3Ware card as a boot option under disks (and can move around the order) -- correct?
So now it looks like it might be the Linux kernel.
See above. Card is seen during POST. Yeppers. No love.
Hmmm, it's a "long shot," but you could try the nForce package from nVidia. I seriously doubt it will do a thing, because the package is pretty much just peripheral support (ATA, NIC, audio etc...), GPL components that are already in stock kernel 2.4.23+/2.6.5+ (with exception of the older/alternative OSS audio and older NIC drivers).
The APIC, I2C, PCI, etc... issues are not what those packages address. I.e., when most people say "a chipset is not supported by Linux," they are talking about the peripheral components in the chipset, not the core APIC, I2C, PCI, etc...
BTW, I saw a note on the nVidia CK804 (nForce Pro 2200) chipset in Red Hat Bugzilla, but it seemed unrelated. It was also for CentOS 3, not CentOS 4. No searches anywhere are turning up issues with 3Ware cards on the S2895 mainboard.
On Tue, 2005-07-26 at 00:01 -0500, Bryan J. Smith wrote:
On Mon, 2005-07-25 at 21:01 -0700, Sean O'Connell wrote:
I do see the 3Ware BIOS at boot. Trouble is once the box has booted, no love.
Oh, so you _are_ seeing the 3Ware BIOS.
If you go into the "Boot" portion of the Phoenix ServerBIOS, you should also see the 3Ware card as a boot option under disks (and can move around the order) -- correct?
So now it looks like it might be the Linux kernel.
See above. Card is seen during POST. Yeppers. No love.
Hmmm, it's a "long shot," but you could try the nForce package from nVidia. I seriously doubt it will do a thing, because the package is pretty much just peripheral support (ATA, NIC, audio etc...), GPL components that are already in stock kernel 2.4.23+/2.6.5+ (with exception of the older/alternative OSS audio and older NIC drivers).
The APIC, I2C, PCI, etc... issues are not what those packages address. I.e., when most people say "a chipset is not supported by Linux," they are talking about the peripheral components in the chipset, not the core APIC, I2C, PCI, etc...
BTW, I saw a note on the nVidia CK804 (nForce Pro 2200) chipset in Red Hat Bugzilla, but it seemed unrelated. It was also for CentOS 3, not CentOS 4. No searches anywhere are turning up issues with 3Ware cards on the S2895 mainboard.
Hmmm... I need to see what version of the BIOS is on this thing. When in doubt, flash the BIOS.
http://www.tyan.com/support/html/b_s2895.html
One of the items in the listing is..
* Fixed some PCI-X device Option ROM does not scan or initialize correctly
As for seeing the 3ware card under the boot order list, there are a couple of entries that might be the 3Ware card (it's not spelled out explicitly), and I don't recall the exact notation. One might be the PXE nic and the other could be the 3Ware card.
On Mon, 2005-07-25 at 22:15 -0700, Sean O'Connell wrote:
Hmmm... I need to see what version of the BIOS is on this thing. When in doubt, flash the BIOS. http://www.tyan.com/support/html/b_s2895.html
Ahhh, yes, I thought you would have already tried that. But, alas, I should have checked too, especially given ...
One of the items in the listing is..
- Fixed some PCI-X device Option ROM does not scan or initialize correctly
That's a damn big one. I think you found your issue. Hmmm, the 3Ware posts, but it might not be initializing correctly. But still, you'd figure the 3w-xxxx driver would see it. Maybe it's not initializing some registers for PCI-X access correctly?
On Tue, 2005-07-26 at 00:55 -0500, Bryan J. Smith wrote:
On Mon, 2005-07-25 at 22:15 -0700, Sean O'Connell wrote:
Hmmm... I need to see what version of the BIOS is on this thing. When in doubt, flash the BIOS. http://www.tyan.com/support/html/b_s2895.html
Ahhh, yes, I thought you would have already tried that. But, alas, I should have checked too, especially given ...
One thing I forgot to check. Foolish of me to think the vendor would have flashed it with the latest. Again, I'll know more tomorrow.
One of the items in the listing is..
- Fixed some PCI-X device Option ROM does not scan or initialize correctly
That's a damn big one. I think you found your issue. Hmmm, the 3Ware posts, but it might not be initializing correctly. But still, you'd figure the 3w-xxxx driver would see it. Maybe it's not initializing some registers for PCI-X access correctly?
That's my one thought. I think I'll check various combinations of moving the card and removing some memory, etc. It is odd that the card would POST and (probably) show up in the boot option menu but not be initialized properly. I just wish I knew what the BIOS would call the 3Ware card in that menu.
Folks-
I'm sure everyone is sitting on the edge of their seats awaiting the latest installment of this saga :) I have good news to report, the machine is up and running quite happily now and seeing the 3Ware card.
I think there were two big wins:
1) I updated the BIOS from 1.00.2895 to 1.01.2895, which is available from the Tyan site (http://www.tyan.com/support/html/b_s2895.html) This by itself made zero difference in the behavior of the system.
2) I added pci=bios to the kernel line in grub.conf and now I can see all of the PCI buses and the 3Ware card (one note the 3Ware card was not in the 133MHz PCI-X bus on bridge A -- it does work in that slot just fine -- I moved it back to the original location after a quick test). Without this kernel flag, I do not see the entire pci bus, and as you can see with this enabled all sorts of goodies appear.
lspci
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 VGA compatible controller: nVidia Corporation NV45GL [Quadro FX 3400/4400] (rev a2)
08:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
08:0a.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
08:0b.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
08:0b.1 PIC: Advanced Micro Devices [AMD] AMD-8131 PCI-X IOAPIC (rev 01)
0a:09.0 RAID bus controller: 3ware Inc 3ware Inc 3ware 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
Compare this to an earlier email with the machine booted w/o that flag:
00:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
00:01.0 ISA bridge: nVidia Corporation CK804 ISA Bridge (rev a3)
00:01.1 SMBus: nVidia Corporation CK804 SMBus (rev a2)
00:02.0 USB Controller: nVidia Corporation CK804 USB Controller (rev a2)
00:02.1 USB Controller: nVidia Corporation CK804 USB Controller (rev a3)
00:04.0 Multimedia audio controller: nVidia Corporation CK804 AC'97 Audio Controller (rev a2)
00:06.0 IDE interface: nVidia Corporation CK804 IDE (rev a2)
00:07.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:08.0 IDE interface: nVidia Corporation CK804 Serial ATA Controller (rev a3)
00:09.0 PCI bridge: nVidia Corporation CK804 PCI Bridge (rev a2)
00:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
01:05.0 FireWire (IEEE 1394): Texas Instruments TSB43AB22/A IEEE-1394a-2000 Controller (PHY/Link)
02:00.0 VGA compatible controller: nVidia Corporation NV45GL [Quadro FX 3400/4400] (rev a2)
80:00.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:01.0 Memory controller: nVidia Corporation CK804 Memory Controller (rev a3)
80:0a.0 Ethernet controller: nVidia Corporation CK804 Ethernet Controller (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
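For anyone who wants to do the comparison mechanically rather than by eye, here is a minimal sketch. `new_devices` is a hypothetical helper, and the two sample files contain only a handful of lines copied from the listings in this thread, not the full output.

```shell
#!/bin/sh
# Print the lines present in listing file $2 but absent from listing file $1.
new_devices() {
    grep -v -x -F -f "$1" "$2"
}

before=$(mktemp)
after=$(mktemp)

# Trimmed sample: booted without pci=bios.
cat > "$before" <<'EOF'
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
EOF

# Trimmed sample: booted with pci=bios.
cat > "$after" <<'EOF'
00:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
08:0a.0 PCI bridge: Advanced Micro Devices [AMD] AMD-8131 PCI-X Bridge (rev 12)
0a:09.0 RAID bus controller: 3ware Inc 3ware Inc 3ware 7xxx/8xxx-series PATA/SATA-RAID (rev 01)
80:0e.0 PCI bridge: nVidia Corporation CK804 PCIE Bridge (rev a3)
EOF

new_devices "$before" "$after"
rm -f "$before" "$after"
```

With this sample it prints the K8 host bridge, the AMD-8131 bridge and the 3ware controller. On the real machine the two files would be `lspci` output saved from each boot.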
Also, as a further test, I reset the BIOS to its default settings and the machine works just fine. It looks like a combination of the updated BIOS and, of course, the kernel flag results in a functional machine.
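In case it helps anyone else with this board, adding the flag can be sketched like this. `add_kernel_flag` is a made-up helper that filters grub.conf text from stdin to stdout, and the stanza below is a hypothetical example (the title, kernel version and root= value are assumptions, not copied from this machine).

```shell
#!/bin/sh
# Append FLAG to every "kernel" line read from stdin, unless it is already there.
add_kernel_flag() {
    awk -v flag="$1" '
        $1 == "kernel" {
            present = 0
            for (i = 1; i <= NF; i++) if ($i == flag) present = 1
            if (!present) $0 = $0 " " flag
        }
        { print }
    '
}

# Hypothetical grub.conf stanza used as sample input.
add_kernel_flag pci=bios <<'EOF'
title CentOS (2.6.9-11.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-11.ELsmp ro root=LABEL=/
        initrd /initrd-2.6.9-11.ELsmp.img
EOF
```

Only the kernel line changes, and running the filter a second time leaves it alone because the flag is matched as a whole word (so an existing pci=biosirq would not count as pci=bios).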
Thanks Sean
On Tue, 2005-07-26 at 15:30 -0700, Sean O'Connell wrote:
- I added pci=bios to the kernel line in grub.conf and now I can see all of the PCI buses and the 3Ware card (one note the 3Ware card was not in the 133MHz PCI-X bus on bridge A -- it does work in that slot just fine -- I moved it back to the original location after a quick test).
...
00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:18.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:18.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:18.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] HyperTransport Technology Configuration
00:19.1 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Address Map
00:19.2 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] DRAM Controller
00:19.3 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] Miscellaneous Control
There's the AMD8131 dual PCI-X HyperTransport Tunnel. The kernel was not seeing it at all before, which is why your PCI-X channels were useless.
Why the kernel is not seeing it is beyond me. When you first posted, I didn't think a kernel flag would help because you were saying the BIOS didn't see it. You later then said it was, so I should have agreed with your prior assertion that a kernel flag might help.
Now here comes the biggie ... I typically use pci=nobios on a few mainboards when necessary. That's because pci=bios is (or was?) the default. So I didn't even think of it. The Boot Prompt HOWTO seems to corroborate this: http://www.tldp.org/HOWTO/BootPrompt-HOWTO-4.html#ss4.2
Now maybe I'm outta-date, since when did the default change to pci=nobios? Or maybe pci=bios now explicitly forces it to read the _entire_ BIOS configuration information, including extra PCI busses? This really disturbs me that the Linux kernel is not doing a good job of reading the entire PCI configuration from the BIOS -- unless there is a reason (stability?) for not doing so.
Of course, the nForce Pro 2200 + nForce Pro 2050 + AMD 8131 combination is new. Maybe the APIC settings aren't perfected. But then again, I'm still bothered that the kernel is supposed to read the BIOS by default, and your issue was solved by pci=bios which is supposed to be the default.
[ BTW, where did you find this suggestion? ]
Also, as a further test, I reset the BIOS to its default settings and the machine works just fine. It looks like a combination of the updated BIOS and, of course, the kernel flag results in a functional machine.
I would venture to say it was just "pci=bios".
I would really like to know the "root cause" of this. Especially since there are literally a half-dozen PCI busses on that mainboard.
On Tue, 2005-07-26 at 17:49 -0500, Bryan J. Smith wrote:
There's the AMD8131 dual PCI-X HyperTransport Tunnel. The kernel was not seeing it at all before, which is why your PCI-X channels were useless.
Makes sense.
Why the kernel is not seeing it is beyond me. When you first posted, I didn't think a kernel flag would help because you were saying the BIOS didn't see it. You later said that it did, so I should have agreed with your prior assertion that a kernel flag might help.
I may not have mentioned that it was POSTing, but I guess it didn't strike me as significant (I wouldn't have expected to see it if it hadn't POSTed :).
Now here comes the biggie ... I typically use pci=nobios on a few mainboards when necessary. That's because pci=bios is (or was?) the default. So I didn't even think of it. The Boot Prompt HOWTO seems to corroborate this: http://www.tldp.org/HOWTO/BootPrompt-HOWTO-4.html#ss4.2
Now maybe I'm outta-date, since when did the default change to pci=nobios? Or maybe pci=bios now explicitly forces it to read the _entire_ BIOS configuration information, including extra PCI busses? This really disturbs me that the Linux kernel is not doing a good job of reading the entire PCI configuration from the BIOS -- unless there is a reason (stability?) for not doing so.
Of course, the nForce Pro 2200 + nForce Pro 2050 + AMD 8131 combination is new. Maybe the APIC settings aren't perfected. But then again, I'm still bothered that the kernel is supposed to read the BIOS by default, and your issue was solved by pci=bios which is supposed to be the default.
I believe the kernels use ACPI to enumerate PCI buses. Maybe a kernel guru could chime in? :) One of the potential kernel parameters is pci=noacpi:
noacpi [IA-32] Do not use ACPI for IRQ routing or for PCI scanning.
For S&G, I did try booting the machine with acpi=off and it panicked :)
[ BTW, where did you find this suggestion? ]
I was re-reading the various boot flag options in kernel-parameters.txt (yum install kernel-doc), and I found a few that looked promising (see below). I tried pci=biosirq first (you can see why :), but then tried pci=bios
bios [IA-32] force use of PCI BIOS, don't access the hardware directly. Use this if your machine has a non-standard PCI host bridge.
biosirq [IA-32] Use PCI BIOS calls to get the interrupt routing table. These calls are known to be buggy on several machines and they hang the machine when used, but on other computers it's the only way to get the interrupt routing table. Try this option if the kernel is unable to allocate IRQs or discover secondary PCI buses on your motherboard.
I was also pondering the use of lastbus=N (but wasn't really sure how to pick the value of N).
lastbus=N [IA-32] Scan all buses till bus #N. Can be useful if the kernel is unable to find your secondary buses and you want to tell it explicitly which ones they are.
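To make a flag such as pci=bios persist across reboots, it has to end up on the kernel line(s) of /etc/grub.conf. The following is only a minimal Python sketch of that edit, assuming GRUB legacy syntax where each boot entry has a line starting with the word "kernel". The function name add_kernel_flag and the file names in the sample text are placeholders made up for illustration, not taken from the machine in this thread.

```python
def add_kernel_flag(grub_conf_text, flag="pci=bios"):
    """Return grub.conf text with `flag` appended to every kernel line lacking it."""
    out = []
    for line in grub_conf_text.splitlines():
        tokens = line.split()
        # Assumption: GRUB legacy kernel lines look like "kernel <image> <args...>"
        if tokens and tokens[0] == "kernel" and flag not in tokens[1:]:
            line = line.rstrip() + " " + flag
        out.append(line)
    return "\n".join(out) + "\n"


# Hypothetical sample entry; the image names are placeholders.
sample = (
    "title CentOS\n"
    "\tkernel /vmlinuz-EXAMPLE ro root=LABEL=/\n"
    "\tinitrd /initrd-EXAMPLE.img\n"
)
print(add_kernel_flag(sample), end="")
```

The function works on text rather than on the file itself, so the result can be reviewed (and the original backed up) before anything is written to /etc/grub.conf.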
Also, as a further test, I reset the BIOS to its default settings and the machine works just fine. It looks like the combination of the updated BIOS and, of course, the kernel flags results in a functional machine.
I would venture to say it was just "pci=bios".
I'm fairly sure, but the newer BIOS seemed to have a few more options and reorganizes things a bit more nicely.
I would really like to know the "root cause" of this. Especially since there are literally a half-dozen PCI busses on that mainboard.
If someone has an RHEL account/entitlement (I don't), they could file a bug against the upstream kernel.
Sean
On Tue, 2005-07-26 at 16:05 -0700, Sean O'Connell wrote:
I may not have mentioned that it was POSTing, but I guess it didn't strike me as significant (I wouldn't have expected to see it if it hadn't POSTed :).
You mentioned it later, which made me switch my focus away from the BIOS to the kernel again. So your initial instinct was correct, and I should have realized that.
I believe the kernels use ACPI to enumerate PCI buses. Maybe a kernel guru could chime in? :) One of the potential kernel parameters is pci=noacpi:
noacpi [IA-32] Do not use ACPI for IRQ routing or for PCI scanning.
Yeah, now it makes sense. I bet ACPI trumps the PCI setting, and is probably set to "pci=nobios" by default in kernel 2.6. I'm sure that works for Intel and lower-end AMD boards with only a few PCI channels.
And not this monster. ;->
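When comparing notes on which pci= or acpi= settings a given boot actually used, it can help to pull them out of the kernel command line. This is a rough Python sketch, assuming the command line is available as text (on a running Linux system, /proc/cmdline) and that several values for one option may be comma-separated. The function name boot_options is made up for illustration.

```python
def boot_options(cmdline, keys=("pci", "acpi")):
    """Collect the values given for selected key=value options on a kernel command line."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in keys:
            # Assumption: values may be comma-separated, e.g. pci=bios,biosirq
            found.setdefault(key, []).extend(value.split(","))
    return found


# Hypothetical command line for illustration.
print(boot_options("ro root=LABEL=/ pci=bios"))
```

On a live system the input would come from reading /proc/cmdline. An empty result only means no such flag was passed explicitly; it says nothing about what the kernel defaults to.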
BTW, I'm running RHEL3 on the S2895s I assembled, which is kernel 2.4 without ACPI, hence why I didn't run into it. I loaded nvnet for the NIC (didn't trust forcedeth at the time), and didn't worry about any other peripherals since I was using the 3Ware card.
For S&G, I did try booting the machine with acpi=off and it panicked :)
Yep.
[ BTW, where did you find this suggestion? ]
I was re-reading the various boot flag options in kernel-parameters.txt (yum install kernel-doc), and I found a few that looked promising (see below). I tried pci=biosirq first (you can see why :), but then tried pci=bios
I'd:
1) Add a RHEL Bugzilla entry
2) Drop a note to Tyan (so they are aware)
3) Drop a note to 3Ware (so people are aware)
bios [IA-32] force use of PCI BIOS, don't access the hardware directly. Use this if your machine has a non-standard PCI host bridge.
What do they mean by "non-standard"?
Everything's been pretty much "non-standard" ever since AMD left GTL and adopted EV6. But even then, there wasn't much difference between bridged (Intel GTL) and switched (AMD EV6), and the latter emulated GTL from the APIC's standpoint.
Now with HyperTransport, we have a _real_ system interconnect -- no more fudging a peripheral interconnect like PCI as a system interconnect. But even then, HyperTransport is supposed to handle this.
The only thing I can think of is the fact that unlike connecting an AMD 8131 either directly to the CPU or an AMD 8151, which are the same vendor, connecting an AMD 8131 to an nForce Pro 2200 or nForce Pro 2050 may result in the kernel getting confused on vendor IDs at the APIC/ACPI level. So maybe it just ignores the fact that the AMD 8131 is there.
Then again, people previously connected the AMD 8131 to the nForce3 too, and didn't have an issue. So I don't know what's up here -- other than the sheer number of PCI channels!
I mean, the sucker's got:
2x20 PCIe channels (the nForce Pro 2200 and the nForce Pro 2050)
2 PCI-X channels (the AMD8131)
1 legacy PCI channel (the nForce Pro 2200)
I was also pondering the use of lastbus=N (but wasn't really sure how to pick the value of N).
lastbus=N [IA-32] Scan all buses till bus #N. Can be useful if the kernel is unable to find your secondary buses and you want to tell it explicitly which ones they are.
Bus assignment is arbitrary, that's the problem. I guess it's the sheer number of PCI busses. More than anything I've seen, other than on an HP DL585 or Sun Fire V40z.
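Since bus numbers are assigned arbitrarily, there is no formula for N, but one rough starting point is to see which bus numbers the kernel already enumerates. Here is a small Python sketch, assuming plain lspci output whose lines start with a hexadecimal "bus:device.function" address and no PCI domain prefix, as in the output quoted earlier in this thread. The function name visible_buses is made up, and the last sample line is a hypothetical device added only to show a second bus.

```python
import re

# Matches the "bus:device.function" prefix of a plain lspci line, e.g. "00:18.0".
_ADDR = re.compile(r"^([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])\s")


def visible_buses(lspci_text):
    """Return the sorted bus numbers (as ints) that appear in lspci output."""
    buses = set()
    for line in lspci_text.splitlines():
        m = _ADDR.match(line.lstrip())
        if m:
            buses.add(int(m.group(1), 16))  # lspci prints bus numbers in hex
    return sorted(buses)


sample = (
    "00:18.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] "
    "HyperTransport Technology Configuration\n"
    "00:19.0 Host bridge: Advanced Micro Devices [AMD] K8 [Athlon64/Opteron] "
    "HyperTransport Technology Configuration\n"
    "0a:01.0 Example device (hypothetical line, not from this machine)\n"
)
print(visible_buses(sample))
```

The highest number returned is only a hint: buses the kernel has not found will not show up at all, so any value tried for lastbus=N would presumably need to be at least that large, and finding one that works is trial and error.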
I'm fairly sure, but the newer BIOS seemed to have a few more options and reorganizes things a bit more nicely.
I no longer have the S2895 systems I assembled, so I can't try it.
If someone has an RHEL account/entitlement (I don't), they could file a bug against the upstream kernel.
???
You don't have to have a RHEL account/entitlement to file a bug in Bugzilla. Just create a new account and have fun! https://bugzilla.redhat.com/bugzilla/createaccount.cgi