incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Backup/Restore: Coordinating Cassandra Nodetool Snapshots with Amazon EBS Snapshots?
Date Wed, 22 Jun 2011 23:34:48 GMT
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).

I don't have experience with EBS snapshots, but I've never been a fan of OS-level snapshots
that are not coordinated with the DB layer.

> 2. Noting the wiki's consistent Cassandra backups advice: if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)

Treat the set of snapshots across the cluster as being from the time of the first one taken.
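
As a rough illustration of that rule, here is a minimal Python sketch. The node names and timestamps are hypothetical, and how the per-node snapshot times are collected is left out; it only shows taking the earliest time as the effective time of the set and reporting how far apart the siblings are.

```python
from datetime import datetime, timedelta

def effective_snapshot_time(snapshot_times):
    """Return (effective_time, spread) for a dict of node name -> snapshot time.

    The effective time is the earliest snapshot in the set; the spread is
    the gap between the earliest and the latest sibling snapshot.
    """
    if not snapshot_times:
        raise ValueError("need at least one snapshot time")
    times = list(snapshot_times.values())
    earliest = min(times)
    return earliest, max(times) - earliest

# Hypothetical node names and timestamps, for illustration only.
example = {
    "node-a": datetime(2011, 6, 22, 23, 0, 0),
    "node-b": datetime(2011, 6, 22, 23, 0, 40),
    "node-c": datetime(2011, 6, 22, 23, 2, 5),
}
effective, spread = effective_snapshot_time(example)
print(effective.isoformat(), spread)
```
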

Previous discussion on AWS backup 
http://www.mail-archive.com/user@cassandra.apache.org/msg12831.html

Hope that helps. 

-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 23 Jun 2011, at 10:48, Thoku Hansen wrote:

> I have a couple of questions regarding the coordination of Cassandra nodetool snapshots with Amazon EBS snapshots as part of a Cassandra backup/restore strategy.
> 
> Background: I have a cluster running in EC2. Its nodes are configured like so:
> 
> * Instance type: m1.xlarge
> * Cassandra commit log writing to RAID-0 ephemeral storage
> * Cassandra data writing to an EBS volume.
> 
> Note: there is a lot of conflicting information/advice about using Cassandra in EC2 w.r.t. ephemeral vs. EBS storage. The above configuration seems to work well for my application. I only describe it to provide context for my EBS snapshotting question. With respect, I hope not to debate Cassandra performance for ephemeral vs. EBS in this thread!
> 
> I am setting up a process that performs regular EBS (->S3) snapshots for the purpose of backing up Cassandra plus other data.
> I presume this will need to be coordinated with regular Cassandra (nodetool) snapshots also.
> 
> My questions:
> 1. Is it feasible to run directly against a Cassandra data directory restored from an EBS snapshot? (as opposed to nodetool snapshots restored from an EBS snapshot).
> 2. Noting the wiki's consistent Cassandra backups advice: if I schedule nodetool snapshots across the cluster, should the relative age of the 'sibling' snapshots be a concern? How far apart can they be before it's a problem? (seconds? minutes? hours?)
> 
> My motivation for these two questions: I'm trying to figure out how much effort needs to be put into:
> * Time-coordinated scheduling of nodetool snapshots across the cluster
> * Automation of the process of determining the most appropriate set of nodetool snapshots to use when restoring a cluster.
> 
> Thanks!
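
On the second bullet in the quoted message (picking a set of snapshots for a restore), one possible approach is sketched below in Python. This is a simple heuristic under stated assumptions, not something from the thread: each node has a list of snapshot times, every snapshot time is tried as an anchor, the nearest snapshot on each node is picked, and the set with the smallest spread wins (the most recent set on ties). It is not guaranteed to find the optimal set, and the node names and times are hypothetical.

```python
from datetime import datetime, timedelta

def pick_restore_set(snapshots_by_node):
    """Pick one snapshot per node so that the chosen times lie close together.

    snapshots_by_node maps node name -> list of snapshot times.
    Returns (chosen, spread), where chosen maps node name -> snapshot time.
    """
    if not snapshots_by_node or any(not t for t in snapshots_by_node.values()):
        raise ValueError("every node needs at least one snapshot")
    anchors = {t for times in snapshots_by_node.values() for t in times}
    best = None
    # Newest anchors first, so the most recent set wins when spreads tie.
    for anchor in sorted(anchors, reverse=True):
        chosen = {
            node: min(times, key=lambda t: abs(t - anchor))
            for node, times in snapshots_by_node.items()
        }
        spread = max(chosen.values()) - min(chosen.values())
        if best is None or spread < best[1]:
            best = (chosen, spread)
    return best

# Hypothetical nodes and snapshot times, for illustration only.
day = datetime(2011, 6, 22)
candidates = {
    "node-a": [day.replace(hour=22, minute=0), day.replace(hour=23, minute=0)],
    "node-b": [day.replace(hour=22, minute=1), day.replace(hour=23, minute=10)],
}
chosen, spread = pick_restore_set(candidates)
print(chosen, spread)
```

Per the reply above, the chosen set would then be treated as being from the time of its earliest member.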

