incubator-cassandra-user mailing list archives

From Josep Blanquer
Subject Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?
Date Thu, 23 Jun 2011 14:17:36 GMT
On Thu, Jun 23, 2011 at 5:04 AM, Peter Schuller wrote:

> > 1. Is it feasible to run directly against a Cassandra data directory
> > restored from an EBS snapshot? (as opposed to nodetool snapshots restored
> > from an EBS snapshot).
>
> Assuming nothing in the stack is buggy (EBS honoring write barriers,
> the Linux guest kernel, etc.), then yes. EBS snapshots of a single
> volume are promised to be atomic. As such, restoring from an EBS
> snapshot should be semantically identical to recovering after a power
> outage or sudden reboot of the node.
> I make no claims as to how well EBS snapshot atomicity is actually
> tested in practice.
EBS volume atomicity is good. We've had tons of experience since EBS came
out almost 4 years ago, using it to back all kinds of things, including
large DBs. One important thing to keep in mind, though, is that EBS
snapshots are taken at the block level, not at the filesystem level. So
depending on the filesystem you have on top of the drives, you might need
to tell the filesystem to "make sure this is consistent or recoverable
now". For example, if you use XFS (a journaling filesystem), you might need
to freeze it (xfs_freeze -f), snapshot the disk(s), and then unfreeze it
(xfs_freeze -u). That makes sure the restored filesystem data (and not only
the low-level disk blocks) is recoverable when you restore the snapshots.
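
The freeze, snapshot, unfreeze sequence above could be scripted roughly as
follows. This is only a sketch: `snapshot_while_frozen` and the
`take_snapshots` callback are hypothetical names, and the callback stands in
for whatever tool or API call you use to start the EBS snapshots. The only
real commands used are `xfs_freeze -f` (freeze) and `xfs_freeze -u` (thaw).

```python
import subprocess
from typing import Callable, Sequence


def _run_checked(cmd: Sequence[str]) -> None:
    # Run a command and raise CalledProcessError if it exits non-zero.
    subprocess.run(list(cmd), check=True)


def snapshot_while_frozen(
    mount_point: str,
    take_snapshots: Callable[[], None],
    run: Callable[[Sequence[str]], None] = _run_checked,
) -> None:
    """Freeze an XFS filesystem, start the snapshots, and always thaw it."""
    # xfs_freeze -f flushes pending writes and blocks new ones.
    run(["xfs_freeze", "-f", mount_point])
    try:
        # Hypothetical callback: start one EBS snapshot per volume here.
        take_snapshots()
    finally:
        # xfs_freeze -u thaws the filesystem, even if a snapshot call failed.
        run(["xfs_freeze", "-u", mount_point])
```

The `finally` block is the point of the sketch: if a snapshot call throws,
the filesystem should still be thawed rather than left frozen.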

Snapshotting volume stripes works in exactly the same way; you just have to
keep track of what position each snapshot has in the stripe, so you can
recreate the stripe correctly.
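
One way to keep track of stripe positions is a small manifest that maps each
position to its snapshot. A minimal sketch, assuming positions are numbered
from 0; `restore_order` and the snapshot IDs below are made-up placeholders,
not anything EBS provides:

```python
from typing import Dict, List


def restore_order(snapshots_by_position: Dict[int, str]) -> List[str]:
    """Return snapshot IDs ordered by stripe position, rejecting gaps."""
    positions = sorted(snapshots_by_position)
    # A usable stripe needs every position from 0 to n-1 exactly once.
    if positions != list(range(len(positions))):
        raise ValueError(f"stripe positions {positions} are not contiguous from 0")
    return [snapshots_by_position[p] for p in positions]


# Placeholder IDs, recorded in whatever order the snapshots were taken.
manifest = {2: "snap-cccc", 0: "snap-aaaa", 1: "snap-bbbb"}
print(restore_order(manifest))  # ['snap-aaaa', 'snap-bbbb', 'snap-cccc']
```

Volumes created from the snapshots would then be attached and reassembled in
the returned order.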

The "freezing" of the filesystem might cause a quick mini-hiccup, which is
usually not noticeable unless you have very stringent requirements on the
box (or if you have a very large stripe, and/or some sort of network issue
where the calls to the Amazon endpoint are very slow, and therefore you're
locking the FS a tad longer than you'd want to).
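
If you want to know how long the FS actually stays locked, you could time the
frozen window. A rough sketch; `freeze_timer` is a hypothetical helper, and
the `sleep` is only a stand-in for the freeze, the snapshot calls, and the
unfreeze:

```python
import time
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def freeze_timer(result: Dict[str, float]) -> Iterator[None]:
    """Record how long the wrapped block took, in seconds."""
    start = time.monotonic()
    try:
        yield
    finally:
        # Stored even if the block raises, so failures are measured too.
        result["seconds"] = time.monotonic() - start


timing: Dict[str, float] = {}
with freeze_timer(timing):
    time.sleep(0.05)  # stand-in for freeze + snapshot calls + unfreeze
print(f"frozen for about {timing['seconds']:.3f}s")
```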


Josep M.
