Thanks again for those answers, which I'm sure will also interest other people.

If I understand correctly, I can use software RAID with no disadvantage over hardware RAID as regards reliability in case of power failure (as long as we use a journaled filesystem). Is that correct?
I'm not worried about the CPU overhead of software RAID; that isn't a problem for me. What else do you think I would lose by using software RAID instead of hardware RAID?
Also, sorry to insist about power failures, but here is the reason: my provider allows me to reboot my server remotely, which is very helpful when the system hangs suddenly, as I have no physical access. Currently I must do a hard reboot 1-2 times per year, and the system starts and runs correctly again. These remote reboots are effectively power failures, and they are more and more frequently offered to customers. They save us from having to ask a technician to reboot the machine.
Your professional experience is very instructive. All you said about the flush problem is striking. Do you mean, then, that a battery-backed RAID card will not help much with reliability in case of power failure? Have many people been tricked, then?
Last: is there a way of knowing what data has been lost when a power failure happens? Can we see that in terms of sectors, clusters, or in terms of which files are corrupted? Can we see a list of those?

Thank you a lot, Terrence!
Daniel
----- Original Message -----
Sent: Friday, November 05, 2004 2:01 AM
Subject: Re: [Centos] Promise raid cards - software raid
dan1 wrote:
> Hello, Terrence.
>
> Thank you for your complete answers. That's very interesting.
>
> > I am not sure what you mean about the file system crashing?
>
> I meant that it becomes unrecoverable, or that some data is missing.
>
These are two separate problems with two separate solutions.

You can have loss of data without the file system having any problems. That is, data is missing but the file system works perfectly. In fact, journaled file systems pick preserving the file system over missing data every time.
E.g.: you have a database and are writing a query to the database that inserts 1000 records. If you do not insert all 1000 records then you cannot use any of them; that is, record 1000 depends on all the previous 999. On record 678 the power fails. When the system comes back, what happened to the first 678 inserts? You assume they completed, but since you need all 1000 for the data to be valid, you basically have to delete the ones you did insert and start again, if you can.

Deleting means that you first have to know which records to delete. Depending on the complexity of the inserts (they could touch dozens of tables and interrelationships), you could have a lot of work ahead of you to manually determine which records are part of that incomplete set and which are not.
This is where a journal comes in. A journal records when data is written. Basically, you record what you wrote after you finish writing a record, so that in the worst case you can replay that journal of changes to back out of what happened. This is what a journaled file system does: it allows you to back out of the incomplete-data problem quickly. This is opposed to the old method of file system consistency checking, which was like a manual search of the entire database. If you only have to go over the changes, rather than searching the whole database, it is faster. The long manual check is also complicated and prone to error. For speed and accuracy, a journal is better.
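The journal-replay idea can be sketched in a few lines of Python. This is an illustrative toy, not how any real file system stores its journal; the `TinyJournal` class and its JSON entry format are invented for the demo:

```python
import json
import os
import tempfile

class TinyJournal:
    """Toy write-ahead journal: append entries, replay on recovery."""

    def __init__(self, path):
        self.path = path

    def log(self, entry):
        # Append one journal entry and flush it toward stable storage.
        with open(self.path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def replay(self):
        # Recovery: keep only transactions that reached their "commit"
        # record. A trailing "begin" with no "commit" is backed out,
        # i.e. its writes are simply discarded.
        committed, pending = [], None
        with open(self.path) as f:
            for line in f:
                entry = json.loads(line)
                if entry["op"] == "begin":
                    pending = []
                elif entry["op"] == "write" and pending is not None:
                    pending.append(entry["data"])
                elif entry["op"] == "commit" and pending is not None:
                    committed.extend(pending)
                    pending = None
        return committed

j = TinyJournal(os.path.join(tempfile.gettempdir(), "tiny-journal.log"))
j.log({"op": "begin"})
j.log({"op": "write", "data": "record 1"})
j.log({"op": "commit"})
j.log({"op": "begin"})
j.log({"op": "write", "data": "record 2"})  # "power fails" before commit
print(j.replay())  # ['record 1'] -- the uncommitted write is dropped
os.remove(j.path)
```

Note that recovery only has to scan the journal, not the whole data set, which is exactly the speed advantage over a full consistency check.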
However, journaled file systems do not save the data. In fact, journaled file systems will throw away data if it is incomplete. Say in the above example that for some reason you could use the first 678 records, or that there was no other way to recover the 1000 records again, so 678 was better than nothing. Well, a journal does not care. It simply looks to see whether the whole transaction of 1000 records completed. If it didn't, it deletes everything up to the failure. Even if 999 of the 1000 records were written, it will still delete the 999. The assumption is that it is better to have a consistent file system and protect the good data than to have partially written data that, while valuable, is inconsistent.
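This all-or-nothing behaviour is the same contract a database transaction gives you, and it is easy to demonstrate with Python's built-in sqlite3 module (the simulated failure at record 678 is of course contrived for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (n INTEGER)")

try:
    with conn:  # one transaction: commits on success, rolls back on error
        for n in range(1, 1001):
            if n == 678:
                raise RuntimeError("simulated power failure at record 678")
            conn.execute("INSERT INTO records VALUES (?)", (n,))
except RuntimeError:
    pass

# The 677 rows inserted before the failure are rolled back, not kept:
count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
print(count)  # 0
```

Even though 677 perfectly good rows were written, consistency wins and all of them are discarded, exactly as described above.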
If you want to ensure that even partial data is preserved, you have to do other things to protect it. A battery-protected RAID card is one very, very narrow approach that solves one specific type of failure state: where data has been written to the RAID card but not to disk yet.
Let's go back to our above example. You have a power failure on record 678. The RAID card has memory to store 5 records. At the time of the power failure it has only sent the first 673 records to the disks for writing; the other 5 (674-678) are in the controller cache. If you have a battery on that memory, you will save those 5 records. However, does it matter? After all, 678 or 673, they are still not 1000. Also, the disks themselves may store 2 records in their own cache. So the disks have only written records up to 671, with records 672 and 673 still waiting in volatile RAM with no battery backup attached directly to the disk (write-back cache, as on all PATA drives).

The power fails and the system comes back online. The RAID card writes records 674-678 to the disks and they write them. Unfortunately, records 672 and 673 are lost because they were in the volatile disk cache on the disk itself.

So you have records 1-671 and 674-678. You in fact have a hole in what you have on disk, and who cares anyway, because the journaled file system is going to delete all those records when it goes to work, ensuring integrity over data preservation.
Basically, RAID batteries buy you something, but not much, and they buy you even less when they are attached to ATA drives whose write-back cache essentially makes the RAID cache moot.
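This is also why a plain fsync() from software is not a complete answer: it can flush the OS page cache to the drive, but a write-back drive cache can still acknowledge data it has not yet put on the platters. A minimal Python sketch of what software *can* control (the file path is arbitrary):

```python
import os
import tempfile

# fsync asks the kernel to push a file's buffered data out of the OS page
# cache to the storage device. With a PATA drive's write-back cache
# enabled, the drive may acknowledge the write before the data is on the
# platters -- exactly the window discussed above, which no software call
# on its own can close.
path = os.path.join(tempfile.gettempdir(), "flush-demo.dat")  # demo path

with open(path, "wb") as f:
    f.write(b"record 678\n")
    f.flush()              # flush Python's userspace buffer to the kernel
    os.fsync(f.fileno())   # flush the kernel's page cache to the device

# The kernel now believes the data is on stable storage; whether it truly
# is depends on the drive honoring cache-flush commands.
with open(path, "rb") as f:
    print(f.read())  # b'record 678\n'
os.remove(path)
```

Both flush steps are needed: `flush()` only empties the userspace buffer, and `os.fsync()` only pushes the kernel's cache to the device; neither can guarantee what the drive's own cache does with it.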
>
> I do not recommend ext3 for anything over about 120GB.
>
> OK, that's interesting.
>
I work with compute clusters and with file systems that are terabytes in size. Basically nothing else has come close to XFS in practice. The guys who admin the really big stuff we collaborate with will not touch anything but XFS, and they have petabytes of storage. If you can, go with XFS. Even RHEL4 should finally have XFS standard, since Fedora Core 2 and later have it as an option, even for the root disk.
> > My biggest question is why at this point are you even bothering with
> > PATA drives? Compared to SATA drives they are unreliable and poor
> > performing for about the same cost.
>
> This is what I get from my ISP (I think). However, it doesn't change
> my conception and thoughts about RAID a lot. The flush problem remains
> the same.
> Also, I am more familiar with PATA.
The flush problem, as I hope I have demonstrated, is not at all addressed by battery backups of RAID card RAM. Battery-based backup of RAID memory is a good gimmick, but in practice it is useless. It covers such a narrow part of the problem space as to be irrelevant.
If you are concerned that a power failure will lose data, get a UPS for the entire system. It is the only thing that will help you, because it is the only thing that will allow your entire system, from software to hardware, to reach a consistent state before shutting down. Otherwise you may save a few bytes of data that were in cache on the RAID card, but that does not matter, since you will still end up with an incomplete file system transaction that the journaled file system is going to delete anyway.
The Linux journaled file systems are very good at preserving integrity, even in the face of underlying hardware failure in some cases. Choosing a good file system is all you need to do to ensure that aspect. As far as data loss goes, aside from backups after the fact, the only solution that will work in practice is an uninterruptible power supply that gives you enough time to shut down the entire system in a consistent way.
Terrence
>
> Thank you for your interesting advice. I appreciate that!
>
> Best regards,
>
>
Daniel
>
>
>