[CentOS] 4.4/64-bit Supermicro/ Nvidia RAID [thanks]

Tue Dec 12 10:13:14 UTC 2006
Feizhou <feizhou at graffiti.net>

John R Pierce wrote:
> 
>>
>> Now you are telling me that somehow you have code that makes your 
>> database stuff its journal on your RAID controller's cache. Cool, mind 
>> sharing it with the rest of us?
>>
> 
>    fsync(handle);  
> If we -don't- do this after processing each event, and the system fails
> catastrophically, a thousand or so events (a couple of seconds' worth of
> realtime data) are lost in the operating system's buffering. I feel
> like I'm repeating myself.
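A minimal C sketch of the per-event pattern described here: append the event to a queue file opened with O_APPEND, then call fsync() before treating the event as safe. The function names (write_all, append_event_durably) are hypothetical illustrations, not code from the poster's system.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Write all len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: append one event to a queue file (opened with
 * O_APPEND) and force it to stable storage. Returns 0 only after fsync()
 * succeeds; the caller should not acknowledge the event before then. */
int append_event_durably(int fd, const void *event, size_t len)
{
    if (write_all(fd, event, len) != 0)
        return -1;
    return fsync(fd);
}
```

Usage, assuming fd came from open(path, O_WRONLY | O_CREAT | O_APPEND, 0600): call append_event_durably(fd, buf, len) once per event and only forward or acknowledge the event once it returns 0. The fsync() after every event is exactly the cost the throughput figures further down are measuring.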

Oh, I thought you meant that you had some special code to put, for
example, PostgreSQL's database journal on the RAID cache.

> 
> 
>> If the aggregate queues are up to 10GB, I really wonder how much
>> faster your hardware RAID makes things, unless of course your
>> cache is much larger than 2GB. On the basis of the inadequate
>> size of your cache alone, I would give software RAID + RAM card the
>> benefit of the doubt.
> 
> The combined queue files average a few GB to 10GB total under a normal
> workload. If a downstream subscriber backs up, they can grow quite a
> bit, up to an arbitrarily set 100GB limit. It's these queue files that
> we are flushing with fsync(). Each fsync writes out a few KB to a few
> hundred KB, one 'event' worth of data which has been appended to one
> or another of the queues, from where it will eventually be forwarded to
> some number of downstream subscribers. What we're calling a journal is
> just the index/state of these queues, stored in a couple of separate,
> very small files that also get fsync()ed on writes; it has NOTHING to
> do with the filesystem.
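A sketch of how a small index/state file like this might be updated, assuming a hypothetical fixed-size record holding a queue's head and tail offsets, overwritten in place with pwrite() and flushed with fdatasync(). The struct layout and function names are illustrative only. POSIX does not promise that an in-place overwrite is atomic across a power failure; the record is kept tiny so it sits within a single sector.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical layout of a per-queue index/state record. */
struct queue_state {
    uint64_t head;   /* offset of the oldest event not yet forwarded */
    uint64_t tail;   /* offset just past the newest appended event */
};

/* Overwrite the record at offset 0 and flush it. fdatasync() flushes the
 * data plus any metadata needed to read it back (e.g. a changed file
 * size), so it is sufficient here; it may skip updating the mtime. */
int save_state(int fd, const struct queue_state *st)
{
    ssize_t n = pwrite(fd, st, sizeof *st, 0);
    if (n != (ssize_t)sizeof *st)
        return -1;   /* short write treated as an error in this sketch */
    return fdatasync(fd);
}

/* Read the record back from offset 0. */
int load_state(int fd, struct queue_state *st)
{
    ssize_t n = pread(fd, st, sizeof *st, 0);
    return n == (ssize_t)sizeof *st ? 0 : -1;
}
```

Because these files are so small, the fdatasync() on them is cheap in bytes but still costs a full round trip to the disk (or to a write-back cache), which is why they fit comfortably in a RAID controller's cache.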

Yes, it does if you have a journaling filesystem. For example,
fsync()/fdatasync() calls get special treatment on filesystems like ext3.
When the file on which fsync is called lives on a filesystem mounted
with data=journal, those writes hit the filesystem journal first, after
which fsync gets to say OK. After that the kernel will write from the
journal to the rest of the disk at its leisure.
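To confirm that a given mount really uses data=journal, a Linux-specific C sketch can scan /proc/mounts. The helper names are hypothetical, and the parsing is deliberately simple: mount points containing spaces (escaped as \040 in /proc/mounts) are not handled.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if the comma-separated option list contains exactly `want`. */
int has_mount_option(const char *opts, const char *want)
{
    char buf[1024];
    char *save = NULL;
    if (strlen(opts) >= sizeof buf)
        return 0;   /* sketch: ignore absurdly long option strings */
    strcpy(buf, opts);
    for (char *tok = strtok_r(buf, ",", &save); tok != NULL;
         tok = strtok_r(NULL, ",", &save)) {
        if (strcmp(tok, want) == 0)
            return 1;
    }
    return 0;
}

/* Return 1 if `mountpoint` is an ext3 mount with data=journal, 0 if not,
 * -1 if /proc/mounts cannot be read. */
int is_ext3_data_journal(const char *mountpoint)
{
    char line[4096], dir[1024], type[64], opts[1024];
    int found = 0;
    FILE *f = fopen("/proc/mounts", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Fields: device, mount point, fs type, options, ... */
        if (sscanf(line, "%*s %1023s %63s %1023s", dir, type, opts) != 3)
            continue;
        /* A later mount on the same directory hides earlier ones,
         * so the last matching entry wins. */
        if (strcmp(dir, mountpoint) == 0)
            found = strcmp(type, "ext3") == 0 &&
                    has_mount_option(opts, "data=journal");
    }
    fclose(f);
    return found;
}
```

For example, is_ext3_data_journal("/var/spool/queues") (a made-up path) returning 1 would mean fsync() on files there completes once the data is in the ext3 journal, as described above.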

> 
> To store these queues on a RAM card, we'd need 100GB to handle the
> backed-up cases, which, I hope you can agree, is ludicrous.

That is not what I would do either. I would just put the filesystem's
journal on a RAM card and use data journaling, which achieves the same
effect as your hardware RAID controller's write-back cache. Data hits
the RAM card, fsync says OK, and the kernel writes to disk from the RAM
card at its leisure, just like the RAID card.

> 
> Throughput under test load (incoming streams free running as fast as 
> they can be processed)
> 
>    no fsync - 1000 events/second
>    fsync w/ direct connect disk - 50-80 events/second
>    fsync w/ hardware writeback cached raid - 800/second
> 
> seems like a clear win to me.
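A rough C sketch for reproducing the with/without fsync comparison above on your own hardware: it appends fixed-size dummy events to a file and reports events per second. The function name and parameters are hypothetical, and since it does no real event processing its absolute numbers will not match the figures quoted; the ratio between the fsync and no-fsync runs is the interesting part.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Append `nevents` dummy events of `event_size` bytes to `path` (truncated
 * first), optionally calling fsync() after each one. Returns events per
 * second, or -1.0 on error or if no elapsed time could be measured. */
double measure_events_per_second(const char *path, int nevents,
                                 size_t event_size, int use_fsync)
{
    struct timespec t0, t1;
    char *event = malloc(event_size);
    int fd;
    if (event == NULL)
        return -1.0;
    memset(event, 'x', event_size);
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0600);
    if (fd < 0) {
        free(event);
        return -1.0;
    }
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < nevents; i++) {
        /* A short write is treated as an error in this sketch. */
        if (write(fd, event, event_size) != (ssize_t)event_size ||
            (use_fsync && fsync(fd) != 0)) {
            close(fd);
            free(event);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);
    free(event);
    double secs = (double)(t1.tv_sec - t0.tv_sec) +
                  (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    return secs > 0.0 ? nevents / secs : -1.0;
}
```

Run it against a file on the disk under test, once with use_fsync = 0 and once with use_fsync = 1, then again on a volume behind a write-back cache (or on ext3 data=journal with the journal on a RAM card) to test the claim that the two setups should perform alike.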

Yeah, your paltry journal files would fit in the RAID cache.

I would imagine that 'fsync w/ direct connect disk + filesystem journal
on RAM card' would give you the same results as 'fsync w/ hardware
writeback cached raid'.

The performance therefore comes not from the RAID processing being done
on a processor on the card but from its cache. So if you have such a
card, you could get away with ext2, since there should not be any
filesystem corruption due to power loss or otherwise.