cassandra-user mailing list archives

From Thoku Hansen
Subject Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?
Date Wed, 22 Jun 2011 22:48:44 GMT
I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with
Amazon EBS snapshots as part of a Cassandra backup/restore strategy.

Background: I have a cluster running in EC2. Its nodes are configured like so:

* Instance type: m1.xlarge
* Cassandra commit log writing to RAID-0 ephemeral storage
* Cassandra data writing to an EBS volume.

Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t.
ephemeral vs. EBS storage. The above configuration seems to work well for my application. I
describe it only to provide context for my EBS snapshotting question. With respect, I hope not
to debate Cassandra performance for ephemeral vs. EBS in this thread!

I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of
backing up Cassandra plus other data.
I presume this will also need to be coordinated with regular Cassandra (nodetool) snapshots.

My questions:
1. Is it feasible to run Cassandra directly against a data directory restored from an EBS
snapshot (as opposed to nodetool snapshots restored from an EBS snapshot)?
2. Given the wiki's advice on consistent Cassandra backups: if I schedule nodetool snapshots
across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far
apart can they be before it's a problem (seconds, minutes, hours)?

My motivation for these two questions: I'm trying to figure out how much effort needs to be
put into:
* Time-coordinated scheduling of nodetool snapshots across the cluster
* Automation of the process of determining the most appropriate set of nodetool snapshots
to use when restoring a cluster.
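
To make the second item concrete, here is a minimal sketch of what I mean by "most appropriate
set": pick one nodetool snapshot per node so that the oldest and newest snapshots in the set
are as close together in time as possible. The function name, the input shape (node name ->
list of epoch-second timestamps) and the tie-break rule (prefer the most recent set) are my own
assumptions for illustration, not anything provided by Cassandra or nodetool.

```python
from bisect import bisect_right


def pick_restore_set(snapshots_by_node):
    """Return (spread_seconds, {node: timestamp}) with the smallest spread.

    snapshots_by_node is a hypothetical mapping from node name to the
    epoch-second timestamps of that node's nodetool snapshots.
    Ties are broken in favour of the most recent set.
    """
    sorted_by_node = {n: sorted(ts) for n, ts in snapshots_by_node.items()}
    if not sorted_by_node or any(not ts for ts in sorted_by_node.values()):
        raise ValueError("every node needs at least one snapshot")

    best = None
    # Try every timestamp as the newest member of the set; for each node,
    # the latest snapshot at or before it gives the tightest possible fit.
    candidates = sorted({t for ts in sorted_by_node.values() for t in ts})
    for newest in candidates:
        chosen = {}
        for node, ts in sorted_by_node.items():
            i = bisect_right(ts, newest)
            if i == 0:
                break  # this node has no snapshot at or before `newest`
            chosen[node] = ts[i - 1]
        else:
            spread = newest - min(chosen.values())
            key = (spread, -newest)  # smaller spread first, then newer set
            if best is None or key < best[0]:
                best = (key, spread, chosen)
    return best[1], best[2]


if __name__ == "__main__":
    # Made-up timestamps; node3 missed its latest scheduled snapshot.
    example = {
        "node1": [1000, 4600, 8200],
        "node2": [1030, 4700, 8215],
        "node3": [990, 4590],
    }
    print(pick_restore_set(example))
```

Whether a given spread is acceptable is exactly what question 2 asks; the sketch only measures
the spread, it does not decide what is safe.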
