[CentOS] block level changes at the file system level?
Devin Reade
gdr at gno.org
Fri Jul 4 20:41:05 UTC 2014
--On Thursday, July 03, 2014 04:47:30 PM -0400 Stephen Harris
<lists at spuddy.org> wrote:
> On Thu, Jul 03, 2014 at 12:48:34PM -0700, Lists wrote:
>> Whatever we do, we need the ability to create a point-in-time history.
>> We commonly use our archival dumps for audit, testing, and debugging
>> purposes. I don't think PG + WAL provides this type of capability. So at
>> the moment we're down to:
>
> You can recover WAL files up until the point in time specified in the
> restore file
>
> See, for example
>
> http://opensourcedbms.com/dbms/how-to-do-point-in-time-recovery-with-postgresql-9-2-pitr-3/
I have to back up Stephen on this one:
1. The most efficient way to get minimal diffs is generally to get
the program that understands the semantics of the data to make
the diffs. In the DB world this is typically some type of
baseline + log shipping. It comes in various flavours and names,
but the concept is the same across the various enterprise-grade
databases.
Just as algorithmic changes to an application will almost always buy
more performance than tuning OS-level parameters, doing "dedup" at
the application level (where the capability exists) is almost always
going to be more efficient than trying to do it at the OS level.
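As a concrete sketch of what log shipping looks like in PostgreSQL
(assuming a 9.x server; the archive path is a placeholder):

```
# postgresql.conf -- minimal WAL archiving setup (paths are examples)
wal_level = archive          # write enough WAL detail for archive recovery
archive_mode = on            # enable the archiver process
# copy each completed WAL segment to the archive, refusing to overwrite
archive_command = 'test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f'
```

Each completed 16MB WAL segment then lands in the archive, which is
exactly the minimal "diff" stream the numbered points below rely on.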
2. Recreating a point-in-time image for audits, testing, etc., then
becomes the process of exercising your recovery/DR procedures (which
is a very good side effect). Want to do an audit? Recover the
db by starting with the baseline and rolling the log forward to
the desired point.
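In PostgreSQL terms that recovery looks roughly like this (a sketch;
the timestamp and archive path are placeholders):

```
# recovery.conf -- restore the baseline, then roll WAL forward to a target
# (PostgreSQL 9.x; place this file in the restored data directory)
restore_command = 'cp /mnt/archive/%f %p'          # fetch archived WAL segments
recovery_target_time = '2014-06-30 12:00:00 UTC'   # stop replay at this point
```

On startup the server replays archived WAL until it reaches the
target time, leaving you with the database exactly as it was then.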
3. Although rolling the log forward can take time, you can find a
   suitable tradeoff between recovery time and disk space by periodically
   taking a new baseline (weekly? monthly? depends on your write load).
   Anything older than that baseline is then only of interest for
   audit data/retention purposes, and no longer factors into the
   recovery/DR/test scenarios.
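Taking a periodic baseline can be as simple as a cron entry
(hypothetical schedule and paths, using pg_basebackup from
PostgreSQL 9.1+; note that % must be escaped in crontab):

```
# crontab -- fresh base backup every Sunday at 02:00 (paths are examples)
# -Ft: tar format, -z: gzip, -x: include the WAL needed to make it consistent
0 2 * * 0  pg_basebackup -D /mnt/backups/base-$(date +\%Y\%m\%d) -Ft -z -x
```

Older baselines and their WAL can then be rotated out to offsite
media on whatever retention schedule the audit requirements dictate.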
4. Using baseline + log shipping generally results in smaller storage
requirements for offline / offsite backups. (Don't forget that
you're not exercising your DR procedure unless you sometimes recover
from your offsite backups, so maybe it would be good to have a policy
that all audits are performed based on recovery from offsite media,
only.)
5. With the above mechanisms in place, there's basically zero need for
   block- or file-based deduplication, so you can save yourself from
   that level of complexity / resource usage. You may decide that
   filesystem-level snapshots of the filesystem holding the log files
   still play a part in your backup strategy, but that's separate from
   the dedup issue.
Echoing one of John's comments, I would be very surprised if dedup
on database-type data realized any significant benefit for common
configurations/loads.
Devin