On second thought, I am writing a script that takes a backup of every node in a single DC, renames the data files with a unique ID so that file names from different nodes cannot collide, and then merges them into one place. The backup would be stored on a storage medium in the same DC (a NAS, for example), so that copying the data is not a heavy job.
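
A rough sketch of the rename-and-merge step, assuming the per-node snapshot files have already been copied to the NAS under one directory per node. The function name, the directory layout, and the use of the node directory name as the unique prefix are all assumptions for illustration; the sketch also does not address whether Cassandra would accept the renamed files on restore without mapping the names back.

```shell
#!/usr/bin/env bash
# Sketch only: merge per-node snapshot directories into one directory,
# prefixing every file with its node's directory name so names cannot collide.
# Assumed (hypothetical) layout: <src_root>/<node-id>/<data files>
# The destination must live outside <src_root>.
merge_snapshots() {
    local src_root="$1" dest="$2" node_dir node f
    mkdir -p "$dest" || return 1
    for node_dir in "$src_root"/*/; do
        [ -d "$node_dir" ] || continue          # skip if the glob matched nothing
        node=$(basename "$node_dir")
        for f in "$node_dir"*; do
            [ -f "$f" ] || continue             # regular files only
            cp -p "$f" "$dest/${node}-$(basename "$f")" || return 1
        done
    done
}
```

It would be called as `merge_snapshots /mnt/nas/incoming /mnt/nas/merged`, where both paths are placeholders.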

Hopefully the data from all the nodes in that one DC gives me the complete data set, provided that DC has RF >= 1.

The next improvement would be to do the same with incremental snapshots, so that once you have a baseline, all that remains is collecting the incremental chunks and merging them into the original global snapshot.
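
The incremental merge could look roughly like this. It assumes each increment arrives as a directory of new files whose names are already unique across nodes, and it relies on SSTables being immutable once written, so a file that is already in the baseline never needs to be copied again. The function name and directories are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: fold new incremental files into an existing baseline directory.
# Files already present in the baseline are left untouched.
# Prints the name of each newly copied file.
merge_increment() {
    local incr_dir="$1" baseline="$2" f name
    mkdir -p "$baseline" || return 1
    for f in "$incr_dir"/*; do
        [ -f "$f" ] || continue                 # regular files only
        name=$(basename "$f")
        if [ ! -e "$baseline/$name" ]; then
            cp -p "$f" "$baseline/$name" || return 1
            echo "$name"
        fi
    done
}
```

Running it repeatedly against the same increment is harmless, since the second run finds every file already in place and copies nothing.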

I would have to do the same in each individual DC.

Do you guys agree?


From: Tamar Fraenkel [tamar@tok-media.com]
Sent: Tuesday, May 01, 2012 10:50 AM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

Thanks for posting the script.
I see that the snapshot is always a full one, and if I understand correctly, it replaces the old snapshot on S3. Am I right?

Tamar Fraenkel 
Senior Software Engineer, TOK Media 

On Thu, Apr 26, 2012 at 9:39 AM, Deno Vichas <deno@syncopated.net> wrote:
On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't find much documentation on it.

I am using a multi-DC setup with Cassandra 0.8.6.

here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job. you'll need pssh and s3cmd -

cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm -f `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "copying tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar  s3://stocktouch-backups

echo "DONE!"
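
Because the archive is uploaded under the same name every night, each `s3cmd put` overwrites the previous object in the bucket (unless bucket versioning is turned on). One possible variation, if older backups should be kept, is to date-stamp the archive name. The helper name below is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: build a date-stamped archive name so each upload gets its own S3 key.
# The optional second argument overrides today's date (handy for testing).
dated_archive_name() {
    local base="$1" stamp="${2:-$(date +%Y-%m-%d)}"
    echo "${base}-${stamp}.tar"
}

# Possible use in the script above (not executed here):
#   archive=$(dated_archive_name cassandra-snapshots-all-nodes)
#   tar -cvf "$archive" 10*
#   ../s3cmd-1.1.0-beta3/s3cmd put "$archive" s3://stocktouch-backups
```

Old archives would then pile up in the bucket, so some cleanup policy is needed on top of this.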