Hi,
I am installing CentOS ServerCD 4.4 64-bit on a new Supermicro board. The RAID BIOS looks fine: RAID 5, all 4 Seagate ES 250 GB drives show up as a single array in the RAID BIOS tool, and the OS install phase completes without a hitch.
One thing I noticed was that the OS was loading the sata_nv driver and picking up the 4 SATA drives instead of the large RAID 5 volume; it should just appear as one big RAID 5 volume (about 750 GB usable from 4 x 250 GB). Usually it should see the RAID and not bother loading the plain SATA driver, from what I remember...
When the OS install completes and I reboot, I simply get a blinking cursor.
On consulting Amax/ Supermicro, they write:
---snip---- Since this Supermicro model was just recently released, it is possible that the drivers on your installation disk are out of date. Supermicro suggested downloading the latest Redhat Driver from their FTP Site at:
ftp://ftp.supermicro.com/driver/SATA/nVidia/MCP55/Linux/Redhat choose either:
For 32-bit
File: nvsata-rhel4.4-0.11-1.21.i686.img (1440 KB) ftp://ftp.supermicro.com/driver/SATA/nVidia/MCP55/Linux/Redhat/nvsata-rhel4.4-0.11-1.21.i686.img
or for 64-bit
File: nvsata-rhel4.4-0.11-1.21.x86_64.img (1440 KB) ftp://ftp.supermicro.com/driver/SATA/nVidia/MCP55/Linux/Redhat/nvsata-rhel4.4-0.11-1.21.x86_64.img
----snip----
My question is: how, during the install, do I specify this driver? Right now the installer simply loads sata_nv as it progresses, and I see no point where it lets one re-specify the driver or otherwise tell the OS to use the updated .img file Supermicro is suggesting.
Any suggestions on the step-by-step sequence for this?
bonus/ p.s. -anyone had any experience with the Nvidia onboard RAID/ Supermicros? Did you like it, (I'm an LSI-Logic fan myself), just looking for opinions...
-karlski
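For reference, the CentOS/RHEL 4 installer (anaconda) can take a driver disk at boot time. A minimal sketch of that sequence, assuming the x86_64 image Supermicro lists above and a floppy drive on the target box:

# on another Linux machine, write the image to a floppy
dd if=nvsata-rhel4.4-0.11-1.21.x86_64.img of=/dev/fd0 bs=1440k

# then boot the CentOS 4.4 CD and, at the boot: prompt, type
linux dd
# anaconda will prompt for a driver disk and load the module from the floppy
# (whether the box then boots after install is another matter, as the rest of the thread explains)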
Hello Karl,
Any suggestions on the step-by-step sequence for this?
bonus/ p.s. -anyone had any experience with the Nvidia onboard RAID/ Supermicros? Did you like it, (I'm an LSI-Logic fan myself), just looking for opinions...
NVRaid is not supported, as there is no NVRaid driver for Linux. NVRaid also belongs to what Linux kernel developers call 'fake-raid': it is really just a RAID BIOS coupled with a software RAID driver.
You are better off using them as individual disks and Linux software raid instead.
so what's the sata_nv I see during the install, that's not a driver?
I'm familiar with fake-raid and usually avoid it, but as a dev box it seemed OK to let it slide and put the cash into more RAM & disks.
-krb
Karl R. Balsmeier wrote:
so what's the sata_nv I see during the install, that's not a driver?
sata_nv is the driver for your SATA interface (the controller itself).
- KB
PS: please refrain from top posting
Karanbir Singh wrote:
the sata_nv is the interface driver for your sata interface.
- KB
PS: please refrain from top posting
By top posting you mean writing at the top? OK. Thanks for the info, very useful...
Karl R. Balsmeier wrote:
so what's the sata_nv I see during the install, that's not a driver?
I'm familiar with fake-raid and usually avoid it, but as a dev box it seemed OK to let it slide and put the cash into more RAM & disks.
sata_nv is the Nvidia sata controller driver. It does not have any raid code in it at all.
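If you want to see for yourself which driver actually claimed the disks, a quick check from a running system (or the installer's shell on Alt-F2) looks like this:

# is the plain SATA driver loaded?
lsmod | grep sata_nv
# how did the kernel see the drives?
dmesg | grep -i sata
cat /proc/scsi/scsi    # the four Seagates show up here as individual disks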
Karl R. Balsmeier wrote:
Hi,
I am installing Centos ServerCD 4.4 64-bit on a new Supermicro board, the RAID BIOS looks fine, RAID 5, all 4 Seagate ES 250 GB drives show up as a single array in the RAID BIOS tool, and the OS install phase completes without a hitch.
As others pointed out, the hardware you have has no RAID controller. It's just a fake-RAID BIOS thing. Linux sees your hardware as what it really is: an ordinary SATA controller with 4 individual drives attached to it. That's why it loads the sata_nv driver (which is the correct driver for your hardware).
The reason booting fails is most likely that your BIOS is attempting to emulate (in software) the RAID-5 volume when loading the boot loader, kernel and initrd image, while in reality those are probably stored only on your first drive.
Simply disable fake-RAID in BIOS. You might not even need to reinstall.
If you choose to reinstall, and want everything on RAID-5:
- disable fake-RAID in BIOS (let it be what it is, SATA controller with 4 individual drives)
- create two small partitions (around 100MB) on first two drives. Configure them as Linux software RAID-1. Use it as /boot
- use remaining space on all four drives as one big partition, create Linux software RAID-5, use that as physical volume, carve logical volumes for rest of your system out of it.
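In command terms, if you set that layout up by hand after the install (rather than in disk druid), it is roughly the following; device names sda..sdd, partition numbers and logical volume sizes are only placeholders:

# small RAID-1 for /boot across the first two drives
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
mke2fs -j /dev/md0

# RAID-5 across the big partitions on all four drives, used as an LVM physical volume
mdadm --create /dev/md1 --level=5 --raid-devices=4 /dev/sda2 /dev/sdb2 /dev/sdc2 /dev/sdd2
pvcreate /dev/md1
vgcreate vg0 /dev/md1
lvcreate -L 8G -n root vg0
lvcreate -L 2G -n swap vg0
lvcreate -L 600G -n data vg0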
There's also a project that uses device-mapper and a user-space utility to support the fake-RAID functionality in the BIOS. Basically, the user-space utility reads out the fake-RAID metadata and configures the device mapper. However, this is still not something you would get working out of the box. You'd need to manually hack the scripts in the initrd image and add the user-space utility to it. Also, I don't think it would work with RAID-5. RAID-0, RAID-1 and RAID-10 configurations might work.
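The user-space utility being referred to is most likely dmraid; purely as a sketch of what it does (not something the stock CentOS 4 initrd will run for you):

# list any fake-RAID metadata found on the drives
dmraid -r
# activate the matching device-mapper mappings; they show up under /dev/mapper
dmraid -ay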
bonus/ p.s. -anyone had any experience with the Nvidia onboard RAID/ Supermicros? Did you like it, (I'm an LSI-Logic fan myself), just looking for opinions...
Well. It is not RAID. Just an ordinary SATA controller. See above.
Hi all! I need this PHP extension. I'm on CentOS 4.4, with the extras, plus, dag, freshrpms and rpmforge repos, but I can't find it. I found libmcrypt, but PHP seems to ignore it... Ah, I use PHP5, MySQL5 and Apache 2.2. I tried to hand-write the right conf in the *.ini PHP conf files, and I tried to manually soft-link the libraries in the php.d/modules/.. folder, but nothing works. So I would like to know if there's a CentOS repo for this extension. FC5 repos have it, but I don't know if including a Fedora repo in CentOS would be high risk...
Can someone let me know? Thank you guys, and girls, of course...! ;-)
Nando
Greetings, Ferdinando.
On 5 December 2006, 12:15:09 you wrote:
Hi all! I need this PHP extension. I'm on CentOS 4.4, with the extras, plus, dag, freshrpms and rpmforge repos, but I can't find it. I found libmcrypt, but PHP seems to ignore it... Ah, I use PHP5, MySQL5 and Apache 2.2. I tried to hand-write the right conf in the *.ini PHP conf files, and I tried to manually soft-link the libraries in the php.d/modules/.. folder, but nothing works. So I would like to know if there's a CentOS repo for this extension. FC5 repos have it, but I don't know if including a Fedora repo in CentOS would be high risk...
This question was discussed on the list about a month ago; you'd do better to search the archives before posting.
The trouble with PHP5 mcrypt is that it is not enabled by default in CentOS4. You have two options:
1. Head on to http://mirror.centos.org/centos/4.4/centosplus/i386/SRPMS/, grab the SRPM for the PHP5 version you need, edit the spec file to include mcrypt and recompile it. You will later have to manually download-recompile-upgrade PHP, as your recompiled version won't be supported by up2date/yum.
2. Compile the module itself using PECL. This one should be much easier and faster.
Greetings, Alexey.
On 5 декабря 2006 г., 12:29:24 you wrote:
- Compile the module itself using PECL. This one should be much easier and faster.
Some more on it. This extension is not available separately from the PHP5 source distribution, so you'll need to fetch the PHP sources from php.net, extract them somewhere, head to <sources_dir>/ext/ and copy the mcrypt folder somewhere. Then go to that folder and run:
# phpize
# ./configure
# make
# make install-modules
It should be compiled and placed in your PHP extension dir. After that, go to the place in your PHP config where shared extensions get loaded and add mcrypt.so there. You should be done.
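On CentOS the usual way to switch a shared extension on is a small file under /etc/php.d/ (path assumed to match the stock PHP packaging); a sketch:

# where did 'make install' put mcrypt.so?
php-config --extension-dir

# enable the extension and restart the web server
echo "extension=mcrypt.so" > /etc/php.d/mcrypt.ini
service httpd restart
php -m | grep -i mcrypt    # should now list mcrypt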
Thank you Alexey, today I'll try! Nando
Temporarily solved by enabling the CentOS-Testing repo: http://dev.centos.org/centos/4/testing/i386/RPMS/
php-mcrypt and php-mhash were added on 05-Dec-2006 01:08
WOW, what luck!!! Someone loves me.... ;-)
Bye! Nando
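For anyone repeating this, a minimal repo stanza pointing at that testing tree might look like the sketch below; the repo id and directory layout are assumptions, so grabbing an official CentOS-Testing.repo file, if your mirror provides one, is the safer starting point:

cat > /etc/yum.repos.d/centos-testing.repo <<'EOF'
[c4-testing]
name=CentOS-4 testing packages
baseurl=http://dev.centos.org/centos/4/testing/i386/
enabled=0
gpgcheck=0
EOF

yum --enablerepo=c4-testing install php-mcrypt php-mhash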
Thank you Alexey! I've only been reading this list since 11/15/2006... :-( I searched the forum; there were many questions about it, but no answers... Yes, recompiling can be a solution, but I don't want to do it, due to updates that could break it...
OK, using PECL would be a better solution. I've never used it, but now is the time...
Thank you a lot! Nando
For PHP4 on CentOS I have php-mcrypt and php-mhash at my site (http://tcs.uj.edu.pl/~buildcentos/). You can probably look at that diff and put together something for PHP5.
You need to:
- install php5 src.rpm (get from centosplus)
- go to source spec directory (/usr/src/redhat/SPECS) and apply something along the lines of the diff linked above
- compile via rpmbuild -ba --with=mcrypt php.spec (maybe ' ' instead of '=', can't remember)
- take generated php-mcrypt rpm (inside /usr/src/redhat/RPMS) and enjoy.
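Spelled out as commands, that sequence looks roughly like this; the src.rpm version and the diff file name are placeholders, and the -p level depends on how the diff was made:

rpm -ivh php-5.x.y-z.centos4.src.rpm        # the PHP5 source RPM from centosplus
cd /usr/src/redhat/SPECS
patch -p0 < /path/to/php-mcrypt-php5.diff   # the diff adapted from the site above
rpmbuild -ba --with mcrypt php.spec
ls /usr/src/redhat/RPMS/*/php-mcrypt-*.rpm  # the package to install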
If you do more of this you should probably set up a non-root build environment.
Basically: create a new user (or use a current user), stick the above attachment (after editing) into ~user/.rpmmacros, and create the ~user/rpm/{BUILD,RPMS,SOURCES,SPECS,SRPMS,tmp} directories.
Use the above process, except this time running as the user and not as root, and replacing /usr/src/redhat with ~user/rpm.
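A minimal ~user/.rpmmacros for that layout could look like this (the attachment mentioned above isn't reproduced here, so treat these values as a guess):

cat > ~/.rpmmacros <<'EOF'
%_topdir    %(echo $HOME)/rpm
%_tmppath   %(echo $HOME)/rpm/tmp
EOF
mkdir -p ~/rpm/{BUILD,RPMS,SOURCES,SPECS,SRPMS,tmp}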
Another, simpler solution is to run, as root, chown -R user:user /usr/src/redhat (but it's not really nice...)
Cheers, Maciej
Aleksandar Milivojevic wrote:
Well. It is not RAID. Just an ordinary SATA controller. See above.
Disappointing, but it's true. Supermicro verified what all of you said. And since I'm not very interested in software RAID, I'll be dropping a hardware RAID card in and let my vendor know not to try and sell any more of these to those intending to use a Linux OS.
I'll try out your software RAID steps to give me something to play with till the hardware card shows up, since they are very clear and concise. Maybe it'll soften the empty 'no hardware raid5' feeling one has at present.
-krb
Karl R. Balsmeier wrote:
Well. It is not RAID. Just an ordinary SATA controller. See above.
disappointing, but it's true. Supermicro verified what all of you said. And since I'm not very interested in software RAID, i'll be dropping a hardware RAID card in and let my vendor know not to try and sell any more of these to those intending to use a Linux OS.
I'll try out your software RAID steps to give me something to play with till the hardware card shows up, since they are very clear and concise. Maybe it'll soften the empty 'no hardware raid5' feeling one has at present.
You might still have a decent motherboard even though the integrated "RAID" is bogus. I've got a few Supermicro boards like that too. When I need Hardware RAID, I just drop a 3Ware card in and call it a day. If you've got the spare cpu cycles, Linux's software RAID is pretty mature and quite fast. I was using it *in production* environments back in the late 90's.
Cheers,
On 06/12/06, Karl R. Balsmeier karl@klxsystems.net wrote:
The reason why booting fails is most likely due to the fact that your BIOS is attempting to emulate (in software) RAID-5 volume when loading boot loader, kernel and initrd image. While in reality those are probably stored on your first drive.
Copying over bootloader/grub on to all drives would then solve this?
Sudev Barar wrote:
On 06/12/06, Karl R. Balsmeier karl@klxsystems.net wrote:
The reason why booting fails is most likely due to the fact that your BIOS is attempting to emulate (in software) RAID-5 volume when loading boot loader, kernel and initrd image. While in reality those are probably stored on your first drive.
Copying over bootloader/grub on to all drives would then solve this?
No. It won't solve it.
Let's say you configured fake-RAID, level 5, with a stripe size of 64kB in the BIOS.
The installation was done onto the first disk, since that is what Linux saw (4 individual drives). The BIOS thinks that what you have is 4 drives in software RAID-5, because that's how it was configured.
Depending on the location of the RAID metadata on the drives, and other things, the BIOS might load the boot loader from the MBR. The MBR is in the first stripe, so it's in the same place on the first drive whether it is a RAID-5 volume or a single drive.
The boot loader then attempts to load stuff from your filesystems. It has to use BIOS calls. Now let's say it needs to read the first few MB from the disk (to load stage2, to load the kernel, or whatever, it's not important what). That stuff was written by Linux on the first drive. What the BIOS will do is read the first 64k from disk 1, the next 64k from disk 2, the next 64k from disk 3, and so on. Why? Because it thinks it is a RAID-5 volume, and it attempts to emulate RAID-5. Of course, this is completely wrong. There's nothing on disks 2, 3 and 4.
Not to mention a couple of other things. The checksums for RAID-5 will be completely wrong, if the BIOS bothers to check them on reads at all. The partition table will be bogus (it was generated by Linux for a single drive, and the BIOS thinks it is a RAID-5 volume). And so on, and so on...
disappointing, but it's true. Supermicro verified what all of you said. And since I'm not very interested in software RAID, i'll be dropping a hardware RAID card in and let my vendor know not to try and sell any more of these to those intending to use a Linux OS.
I'll try out your software RAID steps to give me something to play with till the hardware card shows up, since they are very clear and concise. Maybe it'll soften the empty 'no hardware raid5' feeling one has at present.
Hardware raid is not necessarily faster than software raid. If you are going to do raid5...make sure your card has plenty of cache. However, that may be moot anyway unless you use some filesystem other than ext3...is it a 3ware card? 3ware + RAID5 + ext3 = slow.
Feizhou wrote:
Hardware raid is not necessarily faster than software raid. If you are going to do raid5...make sure your card has plenty of cache. However, that may be moot anyway unless you use some filesystem other than ext3...is it a 3ware card? 3ware + RAID5 + ext3 = slow.
Well, I had them ship me a 3ware 9550 card, and yes, am using ext3, -there's no onboard battery backup module, so the write cache is disabled. So this would be slow then? Am using this as a Java Development machine.
Any suggestions as to the filesystem type that would work best? I'll order a battery backup unit so we can proceed and enable write cache.
-krb
Well, I had them ship me a 3ware 9550 card, and yes, am using ext3, -there's no onboard battery backup module, so the write cache is disabled. So this would be slow then? Am using this as a Java Development machine.
Disabling write cache will slow you down big time if you have a lot of I/O. If you don't, it probably does not matter.
Any suggestions as to the filesystem type that would work best? I'll order a battery backup unit so we can proceed and enable write cache.
Since you are running 64-bit Centos, I guess using the plus kernel and the XFS filesystem should do you good. XFS has the best write performance and it sings on the 3ware.
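Roughly, once the centosplus kernel (which carries the xfs module) is booted, the remaining steps are just the following; the device name is a placeholder and xfsprogs comes from the extras repo (or wherever your mirror carries it):

yum install xfsprogs
modprobe xfs
mkfs.xfs /dev/sdb1            # whatever the 3ware unit shows up as
mount -t xfs /dev/sdb1 /data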
Hardware raid is not necessarily faster than software raid.
Hardware RAID with battery-backed writeback cache is HUGELY faster at the very critical synchronous commit (fsync()) operations required by transactional database servers. So it all depends on what your application requirements are. If you just care about single-threaded sequential reads, you'll probably do best with software RAID10 on SATA. If you need fast random-access committed writes, then you need a RAID card with cache, and a battery, so that you can safely enable writeback caching in the controller.
John R Pierce wrote:
Hardware raid is not necessarily faster than software raid.
hardware raid with battery-backed writeback cache is HUGELY faster at the very critical synchronous commit (fsync()) operations required by transactional database servers. so it all depends on what your application requirements are. if you just care about single threaded sequential reads, you'll probably do best with software SATA RAID10 on SATA. if you need fast random access committed writes, then you need a raid card with cache, and a battery so that you can safely enable writeback caching in the controller.
I contest the 'HUGELY faster' part. I won't contest that a 3ware RAID card can do mirroring better than Linux md, even though md has improved in its handling of mirrors; md is not quite at the level of the 3ware RAID card, but that does not mean the 3ware is faster, except when doing rebuilds.
Fast random-access committed writes, like those of a mail queue, happen to be my area of experience. It took a ten-disk RAID array of SATA drives, which just happened to be configured in RAID5 mode (not my idea), to get performance comparable to 4-disk RAID10 md arrays.
I can accept faster in certain cases but if you say HUGELY faster, I would like to see some numbers.
I can accept faster in certain cases but if you say HUGELY faster, I would like to see some numbers.
Ok, first a specific case that actually came up at my work in the last week.....
We've got a middleware messaging application we developed that we'll call a 'republisher'. It receives a stream of 'event' messages from an upstream server and forwards them to a number of downstream servers that have registered ('subscribed') with it. It keeps a queue for each downstream subscriber, so if one is down, it will hold event messages for it. Not all servers want all 'topics', so we only send each server the specific event types it's interested in. It writes the incoming event stream to a number of subscriber queue files as well as a series of journal files to track all this state. There is a queue for each incoming 'topic', and its entries aren't cleared until the last subscriber on that topic has confirmed delivery.
{before someone screams my-favorite-MQ-server, let me just say, we HAD a commercial messaging system doing this, and our production operations staff is fed up with realtime problems that involve multinational vendor finger pointing, so we have developed our own to replace it}
On a typical dual-Xeon Linux server running CentOS 4.4, with a simple direct-connect disk, this republisher can easily handle 1000 messages/second using simple write(). However, when this process was busy humming away under the production workload of 60-80 messages/sec and the power was pulled or the server crashed (this happened exactly once so far, at a Thailand manufacturing facility, due to operator error), it lost 2000+ manufacturing events that the downstream servers couldn't easily recover; this was data in Linux's disk cache that hadn't yet been committed to disk. So the obvious solution is to call fsync() on the various files after each 'event' has been processed, to ensure it's an atomic operation.
However, if this republisher does an fsync() after each event, it slows to like 50/second on a direct-connect disk. If it's run on a similar server with a RAID controller that has battery-protected writeback cache enabled, it can easily do 700-800/second. We need 100+/second, and prefer 200/second, to have margin for catch-up after data interruptions.
Now, everything I've described above is a rather unusual application... so let me present a far more common scenario...
Relational DB management servers, like Oracle or PostgreSQL: when the RDBMS does a 'commit' at transaction END;, the server HAS to fsync its buffers to disk to maintain data integrity. With a writeback-cache disk controller, the controller can acknowledge the writes as soon as the data is in its cache, then write that data to disk at its leisure. With software RAID, the server has to wait until ALL drives of the RAID slice have seeked and completed the physical writes to the disk. In a write-intensive database, where most of the read data is cached in memory, this is a HUGE performance hit.
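A crude way to feel that difference from a shell, without writing any C: append a batch of small records with and without forcing a flush after each one. sync(1) flushes everything system-wide, so this overstates the per-file cost, but the order-of-magnitude gap shows up clearly:

# no flushing: the writes just land in the page cache
time for i in $(seq 1 1000); do echo "event $i" >> /tmp/queue.test; done

# flush after every record, the way the republisher calls fsync() per event
time for i in $(seq 1 1000); do echo "event $i" >> /tmp/queue.test; sync; done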
John R Pierce wrote:
now, everything I've described above is a rather unusual application... so let me present a far more common scenarios...
not so. I used to run boxes that handled 600 to 1000 smtp connections each. Creating and deleting thousands of small files was the environment I worked in.
Relational DB Management Servers, like Oracle, or PostgreSQL. when the RDBMS does a 'commit' at transaction END;, the server HAS to fsync its buffers to disk to maintain data integrity. With a writeback cache disk controller, the controller can acknowlege the writes as soon as the data is in its cache, then it can write that data to disk at its leisure. With software RAID, the server has to wait until ALL drives of the RAID slice have seeked, and completed the physical writes to the disk. In a write intensive database, where most of the read data is cached to memory, this is a HUGE performance hit.
I must say that the 3ware cards on those boxes that had them were not 955x series and therefore had no cache. Perhaps things would have been different with 955x cards in there, but at that time the 9xxx 3ware cards were not even out yet.
Since you have clearly pointed out that the performance benefit really comes from the cache (if you have enough) on the board, I do not see why using software RAID and a battery-backed RAM card like the umem, or even the Gigabyte i-RAM, for the journal of the filesystem would be any slower, if at all.
Feizhou wrote:
not so. I used to run boxes that handled 600 to 1000 smtp connections each. Creating and deleting thousands of small files was the environment I worked in.
and if the power failed in the middle of this, how many messages were lost?
I must say that the 3ware cards on those boxes that had them were not 955x series and therefore had no cache. Perhaps things would have been different if it were 955x cards in there but at that time, the 9xxx 3ware cards were not even out yet.
We typically use 15000 rpm scsi or fiberchannel storage for our databases, not SATA.
Since you have clearly pointed out the performance benefit really comes from the cache (if you have enough) on the board I do not see why using software raid and a battery-backed RAM card like the umem or even the gigabyte i-ram for the journal of the filesystem will be any less slow if at all.
It's not the filesystem journal I'm talking about; it's an application-specific journal file, which contains the indices and state of the queue files, of which there's a very large number constantly being written. We need to flush the queue files AND the journal files for it to be safe. These run around 10GB total as I understand it (not each flush, but the aggregate queues can be this big).
If the server has a writeback-enabled controller like an HP Smart Array 5i/532, it all works great. If it doesn't, it all grinds to a halt. Quite simple, really. We have absolutely no desire to start architecting around 3rd-party RAM/battery disks; they won't be supported by our production system vendors, and they will make what is currently a fairly simple and robust system a lot more convoluted.
John R Pierce wrote:
Feizhou wrote:
not so. I used to run boxes that handled 600 to 1000 smtp connections each. Creating and deleting thousands of small files was the environment I worked in.
and if the power failed in the middle of this, how many messages were lost?
Heh, which boxes? The 3ware ones (750x series, no battery-backed cache... in fact no cache at all!), the IDE-disks-only ones, the SCSI-only ones, or the Compaq hardware RAID SCSI ones? Answer: only on two occasions have I gotten corrupted queue files, and that was because I used the XFS filesystem, which is a disaster in case of power loss. That was on a 3ware box. Had no problems with the rest.
I must say that the 3ware cards on those boxes that had them were not 955x series and therefore had no cache. Perhaps things would have been different if it were 955x cards in there but at that time, the 9xxx 3ware cards were not even out yet.
We typically use 15000 rpm scsi or fiberchannel storage for our databases, not SATA.
OOH, nice hardware you have for your uber databases. The outfit I worked for did well with mysql + software raid/3ware + ide disks. No need for FC.
Since you have clearly pointed out the performance benefit really comes from the cache (if you have enough) on the board I do not see why using software raid and a battery-backed RAM card like the umem or even the gigabyte i-ram for the journal of the filesystem will be any less slow if at all.
its not the file system journal I'm talking about, its an application specific journal file, which contains the indicies and state of the queue files, of which there's a very large number constantly being written. We need to flush the queue files AND the journal files for it to be safe. These run around 10GB total as I understand it (not each flush, but the aggregate queues can be this big).
Now you are telling me that somehow you have code that makes your database stuff its journal into your RAID controller's cache. Cool, mind sharing it with the rest of us?
Let me just say that I know the driver code for RAID controllers that have cache will, as you say, give the OK once the data that needs to be written hits the cache.
In the case of a RAM card, I am pointing out that the same effect can be achieved by putting the journal of a journaling filesystem like ext3 on the RAM card, especially since ext3 supports data journaling too.
If the aggregate queues are up to 10GB, I really wonder how much faster your hardware RAID makes things, unless of course your cache is much larger than 2GB. Just on the basis of the inadequate size of your cache I would give software RAID + RAM card the benefit of the doubt.
if the server has a writeback enabled controller like a HP Smart Array 5i/532, it all works great. if it doesn't, it all grinds to a halt. quite simple, really. We have absolutely no desire to start architecting around 3rd party ram/battery disks, they won't be supported by our production system vendors, and they will make what is currently a fairly simple and robust system a lot more convoluted..
Yada yada. The Compaqs that had hardware RAID with SCSI disks were the slowest performers of all the boxes I managed, not to mention the lack of any tools under 2.6 for monitoring or whatever. I am not telling you what hardware to use. What I am doing is contesting your claim that hardware RAID with battery-backed cache is hugely faster than software RAID. I will concede that there will be cases where it is indeed hugely faster, but not always.
Now you are telling me that somehow you have code that makes your database stuff its journal on your RAID controller's cache. Cool, mind sharing it with the rest of us?
fsync(handle);
If we -don't- do this after processing each event and the system fails catastrophically, a thousand or so events (a couple of seconds worth of realtime data) are lost in the operating system's buffering. I feel like I'm repeating myself.
If the aggregate queues are up to 10GB, I really wonder wonder how much faster your hardware raid makes things unless of course your cache is much larger than 2GB. Just on the basis of the inadequate size of your cache I would give software raid + RAM card the benefit of the doubt.
The combined queue files average a few to 10GB total under a normal workload. If a downstream subscriber backs up, they can grow quite a bit, up to an arbitrarily set 100GB limit. It's these queue files that we are flushing with fsync(). Each fsync is writing a few K to a few 100K bytes out, one 'event' worth of data which has been appended to one or another of the queues, from where it will eventually be forwarded to some number of downstream subscribers. What we're calling a journal is just the index/state of these queues, stored in a couple of separate very small files, that also get fsync() on writes; it has NOTHING to do with the file system.
To store these queues on a ramcard, we'd need 100GB to handle the backup cases, which, I hope you can agree, is ludicrous.
Throughput under test load (incoming streams free running as fast as they can be processed)
no fsync - 1000 events/second
fsync w/ direct connect disk - 50-80 events/second
fsync w/ hardware writeback cached raid - 800/second
seems like a clear win to me.
John R Pierce wrote:
Now you are telling me that somehow you have code that makes your database stuff its journal on your RAID controller's cache. Cool, mind sharing it with the rest of us?
fsync(handle); If we -dont- do this after processing each event, and the system fails catastrophically, a thousand or so events (a couple seconds worth of realtime data) are lost in the operating systems buffering. I feel like I'm repeating myself.
Oh, I thought you meant that you might have some special code to put, for example, PostgreSQL's database journal in the RAID cache.
If the aggregate queues are up to 10GB, I really wonder wonder how much faster your hardware raid makes things unless of course your cache is much larger than 2GB. Just on the basis of the inadequate size of your cache I would give software raid + RAM card the benefit of the doubt.
the combined queue files average a few to 10GB total under a normal workload. if a downstream subscriber backs up, they can grow quite a bit, up to an arbitrarily set 100GB limit.. its these queue files that we are flushing with fsync(). each fsync is writing a few K to a few 100K bytes out, one 'event' worth of data which has been appended to one or another of the queues, from where it will eventually be forwarded to some number of downstream subscribers. What we're calling a journal is just the index/state of these queues, stored in a couple seperate very small files, that also get fsync() on writes, it has NOTHING to do with the file system.
Yes it does, if you have a journaling filesystem. For example, fsync/fdatasync calls get special treatment on filesystems like ext3. When the filesystem containing the files on which fsync is called is mounted data=journal, those writes hit the filesystem journal first, after which fsync gets to say OK. After that the kernel will write from the journal to the rest of the disk at its leisure.
to store these queues on a ramcard, we'd need 100GB to handle the backup cases, which, I hope you can agree, is ludicrious.
Which is not what I would do either. I would just put the filesystem's journal on a RAM card with data journaling, which achieves the same effect as your hardware writeback-cached RAID controller: data hits the RAM card, fsync says OK, and the kernel writes to disk from the RAM card at its leisure, just like the RAID card.
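Mechanically that is just ext3 with an external journal plus full data journaling; a sketch, with /dev/ramcard0 standing in for the battery-backed RAM device and /dev/md1 for the data array (both placeholders, and the journal device must use the same block size as the filesystem):

# format the RAM device as an external ext3 journal
mke2fs -O journal_dev /dev/ramcard0
# create the data filesystem attached to that journal
mke2fs -j -J device=/dev/ramcard0 /dev/md1
# mount with full data journaling so fsync returns once the data is in the journal
mount -o data=journal /dev/md1 /queues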
Throughput under test load (incoming streams free running as fast as they can be processed)
no fsync - 1000 events/second
fsync w/ direct connect disk - 50-80 events/second
fsync w/ hardware writeback cached raid - 800/second
seems like a clear win to me.
Yeah, with your paltry journal files, they would fit in the raid cache.
I would imagine that 'fsync w/ direct connect disk + filesystem journal on ramcard' would give you the same results as 'fsync w/ hardware writeback cached raid'
The performance therefore comes not from the RAID processing being done on a processor on the card but from its cache. So if you have such a card, you could get away with ext2, since there should not be any filesystem corruption due to power loss or otherwise.
Yes it does if you have a journaling filesystem. For example, fsync/fsyncdata calls get special treatment on filesystems like ext3. When the filesystem containing the files on which fsync is called and it is mounted data=journal, those writes hit the filesystem journal first after which the fsync gets to say OK. After that the kernel will write from the journal to the rest of the disk at its leisure.
I thought filesystem journals like ext3's were just used for the filesystem metadata? Inode allocations and directory updates and so forth, not actual user data?
If I understand what you're suggesting, if I write 200MB of data then fsync, my -data- is written to the journal, then later written to the actual file system?
anyways, I seriously doubt we could convince operations at our manufacturing facilities to add ramdrives to their mostly HP servers. I don't even know if they'd fit in the blade servers most commonly used.
On Tue, December 12, 2006 5:31 pm, John R Pierce wrote:
I thought file system journals like ext3 were just used for the file system metadata? inode allocations and directory updates and so forth, not actual user data?.
Note the "data=journal" option in the previous reply. This option forces of writing all data to the jounral, before being written to the actual filesystem. Though, this is not the default (which is data=ordered).
-- Daniel
anyways, I seriously doubt we could convince operations at our manufacturing facilities to add ramdrives to their mostly HP servers. I don't even know if they'd fit in the blade servers most commonly used.
John, I am not trying to convince you to use RAM drives. I just want to point out the following.
On the point of hardware RAID with a battery-backed write cache: its write performance comes mainly from the cache itself and not from the fact that the RAID processing is done on the card. For this reason, RAID card manufacturers such as 3ware and Areca offer cache sizes up to 2GB.
However, if you are using RAID5 for the array and a disk drops out, that RAID card is going to take severe performance penalties, even when writing, because the processor on the card cannot keep up, and the benefit of the cache is nullified. Same story if the processor sucks.
In such cases, software RAID will perform better due to its use of the system processor.
So if you run RAID10, then hardware RAID with write cache will probably be the thing to do, since it most likely handles mirrors better than the md driver, although I feel more warm and fuzzy about the filesystem keeping its journal on a RAM drive.
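Whichever way you go, checking whether an md array has dropped a disk and is rebuilding is a one-liner:

cat /proc/mdstat              # shows degraded arrays and rebuild progress
mdadm --detail /dev/md1       # per-array state; the device name is a placeholder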
http://thebs413.blogspot.com/2005/09/fake-raid-fraid-sucks-even-more-at.html
Please do not top post. This has been fixed for you.
William Warren wrote:
http://thebs413.blogspot.com/2005/09/fake-raid-fraid-sucks-even-more-at.html
Are you trying to say that the drivers provided by Promise, Highpoint and other cruft are on the same level as Linux's software raid code and therefore Linux's software raid driver sucks?
Feizhou wrote:
Are you trying to say that the drivers provided by Promise, Highpoint and other cruft are on the same level as Linux's software raid code and therefore Linux's software raid driver sucks?
No, I believe Bryan was saying it. :)
I've had pretty good luck with the Linux software RAID stuff in the past, but these days a decent RAID controller is so cheap that it's just easier to integrate and maintain that way (at least it is for me). The 3ware stuff just plain works.
Eeek, I've just invoked his name. Prepare for a 30 page post on cpu memory controller interconnects and how that is responsible for the fall in bumblebee sperm counts across the globe. :)
Cheers,
No, I believe Bryan was saying it. :)
I've had pretty good luck with the Linux software RAID stuff in the past, but these days a decent RAID controller is so cheap that it's just easier to integrate and maintain that way (at least it is for me). The 3ware stuff just plain works.
that's true.
Eeek, I've just invoked his name. Prepare for a 30 page post on cpu memory controller interconnects and how that is responsible for the fall in bumblebee sperm counts across the globe. :)
rotfl
On Tuesday 12 December 2006 20:30, chrism@imntv.com wrote:
Feizhou wrote:
Are you trying to say that the drivers provided by Promise, Highpoint and other cruft are on the same level as Linux's software raid code and therefore Linux's software raid driver sucks?
No, I believe Bryan was saying it. :)
Actually, if you read the entire article, he makes a point of saying he is talking about fake RAID and not software RAID, which he does have some good comments about. The article goes into how fake RAID forces the drives to operate in a sub-standard way (by virtue of how the BIOS is used to access the drives), which is not the same as software RAID, which is standard drive access with an OS layer thrown on top.
He isn't very flattering towards software RAID-5 though.