cassandra-user mailing list archives

From Shubham Srivastava <>
Subject RE: Taking a Cluster Wide Snapshot
Date Thu, 26 Apr 2012 13:54:39 GMT
I was trying to get hold of all the data, a kind of global snapshot.

Here is what I did:

I copied the snapshots from each individual node (the snapshot data was around 12 GB on
each node) into one common folder.

Strangely, I found duplicate file names across the snapshots, and more strangely, the
duplicates differed in size. Because files with the same name overwrote each other, the
total data size came to about 13 GB, whereas I expected 12 * 6 = 72 GB.
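
A quick way to see how many names collide before merging anything is to keep each node's snapshot in its own directory and list the file names that occur more than once. The following is only a sketch: the function name list_duplicate_names and the directory layout (one subdirectory per node under a common root) are assumptions for illustration, not something from the original setup.

```shell
#!/bin/sh
# Hypothetical helper: print every file name that appears under more than
# one path below ROOT (for example ROOT/node1/..., ROOT/node2/...).
list_duplicate_names() {
    root=$1
    # Reduce each path to its last component, then keep only repeated names.
    find "$root" -type f | awk -F/ '{ print $NF }' | sort | uniq -d
}

# Example (assumed layout): list_duplicate_names /backups/snapshots
```

Any name printed here would be overwritten if the per-node directories were flattened into a single folder.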

Does that mean that if I need to create a new ring with the same data as the existing one,
I can't simply do that? Or should I start with the 13 GB copy and check whether all the data
is present, which sounds pretty illogical?

Please suggest.

From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: ''
Subject: Re: Taking a Cluster Wide Snapshot

Your second part is what I was also referring to: I put all the files from the nodes onto a
single node to create a similar backup, which requires unique file names across the cluster.

From: Deno Vichas []
Sent: Thursday, April 26, 2012 12:29 PM
To: <>
Subject: Re: Taking a Cluster Wide Snapshot

there's no prerequisite for unique names.  each node's snapshot gets tar'ed up and then copied
over to a directory the name of the hostname of the node.  then those dirs are tar'ed and
copied to S3.

what i haven't tried yet is to untar everything for all nodes into a single node cluster.
i'm assuming i can get tar to replace or skip existing files so i end up with a set of unique
files.  can somebody confirm this?
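
For the restore side, one option that avoids the replace-or-skip question is to unpack each node's tarball into a directory named after the host, mirroring the per-host layout described above. This is a minimal sketch, not a tested restore procedure: the function name extract_per_host and both directory arguments are hypothetical, and the tarball naming is assumed to follow the `hostname`-cassandra-snapshot.tar.gz pattern from the backup script.

```shell
#!/bin/sh
# Hypothetical helper: unpack every <host>-cassandra-snapshot.tar.gz found in
# SRC_DIR into DEST_DIR/<host>/ so that identically named files from
# different nodes never overwrite each other.
extract_per_host() {
    src_dir=$1
    dest_dir=$2
    for tarball in "$src_dir"/*-cassandra-snapshot.tar.gz; do
        # If nothing matched, the glob stays literal; skip it.
        [ -e "$tarball" ] || continue
        # Strip the fixed suffix to recover the host name.
        host=$(basename "$tarball" -cassandra-snapshot.tar.gz)
        mkdir -p "$dest_dir/$host"
        tar -xzf "$tarball" -C "$dest_dir/$host"
    done
}

# Example (assumed paths): extract_per_host ./tarballs ./restored
```

This only keeps the files apart; how to load the per-host directories into a new ring is a separate question that the sketch does not answer.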

On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
Thanks a lot, Deno. A bit surprised; I would have expected an equivalent command to be there
in nodetool. Not sure if it is in the latest release.

BTW, this makes it a prerequisite that all of Cassandra's data files, be they index, filter,
etc., have unique names across the cluster. Is this a reasonable assumption to make?

From: Deno Vichas [<>]
Sent: Thursday, April 26, 2012 12:09 PM
Subject: Re: Taking a Cluster Wide Snapshot

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
What's the best way (or the only way) to take a cluster-wide backup of Cassandra? I can't
find much documentation on it.

I am using a multi-DC setup with Cassandra 0.8.6.

here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job.  you'll
need pssh and s3cmd -

cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz

echo "copying tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapshot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar  s3://stocktouch-backups

echo "DONE!"
