From: Shubham Srivastava
To: user@cassandra.apache.org
Subject: RE: Taking a Cluster Wide Snapshot
Date: Thu, 26 Apr 2012 13:54:39 +0000
I was trying to get hold of all the data kind of a global snapshot.

I did the below : 

I copied all the snapshots from each individual node, where the snapshot data size was around 12Gb per node, to a common folder (one folder alone).

Strangely, I found duplicate file names across the snapshots, and more strangely the size of each duplicate file was different, which brought the total data size to close to 13Gb (the rest must have been overwritten), whereas the expectation was 12*6 = 72Gb.

Does that mean that if I need to create a new ring with the same data as the existing one I can't just do that, or should I start with the 13Gb copy and check whether all the data is present, which sounds pretty illogical?

Please suggest??
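(A minimal sketch of one way around exactly this collision, assuming passwordless ssh and a hosts file listing the nodes; the keyspace name and data directory below are placeholders, not taken from the cluster above. Each node's snapshot goes into its own per-host directory, so nothing gets overwritten:)

#!/bin/bash
# Collect each node's snapshot into backup/<host>/ instead of one shared folder.
KEYSPACE=mykeyspace                                   # placeholder keyspace name
SNAP_DIR=/var/lib/cassandra/data/$KEYSPACE/snapshots  # adjust to your data directory
while read -r host; do
    mkdir -p "backup/$host"
    rsync -a "$host:$SNAP_DIR/" "backup/$host/"       # per-host copy, no collisions
done < cassandra-nodes.txt
# du -sh backup/* should now add up to roughly 12Gb x 6 nodes.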

From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot

Your second part was what I was also referring to, where I put all the files from the nodes onto a single node to create a similar backup, which needs unique file names across the cluster.

 
From: Deno Vichas [mailto:deno@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot
 
there's no prerequisite for unique names.  each node's snapshot gets tar'ed up and then copied over to a directory named after the hostname of the node.  then those dirs are tar'ed and copied to S3.

what i haven't tried yet is to untar everything for all nodes into a single node cluster.  i'm assuming i can get tar to replace or skip existing files so i end up with a set of unique files.  can somebody confirm this?
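(For what it's worth: GNU tar's default on extract is to overwrite an existing file, and -k / --keep-old-files makes it keep the existing copy and report an error for each clash instead.  A quick sketch, with hypothetical hostnames, assuming archives named like the <hostname>-cassandra-snapshot.tar.gz ones in the script further down:)

# Default behaviour: the second extract silently overwrites any file the
# first one already created.
tar -zxvf node1-cassandra-snapshot.tar.gz
tar -zxvf node2-cassandra-snapshot.tar.gz

# With -k the existing copy wins and tar prints an error per collision,
# which at least makes the duplicates visible:
tar -zxvkf node2-cassandra-snapshot.tar.gz

# Caveat (per the first message in this thread): identically named SSTables
# from different nodes hold different data, so both overwriting and skipping
# lose data; extracting each node's archive into its own directory avoids that.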




On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
Thanks a lot Deno.  A bit surprised, though; an equivalent command should be there in nodetool. Not sure if it is in the latest release.

BTW this makes it a prerequisite that all the data files of Cassandra, be it indexes or filters etc., have unique names across the cluster. Is this a reasonable assumption to have?

Regards,
Shubham
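(It isn't, unfortunately: an SSTable file name is built from the column family name, an on-disk format version and a per-node generation counter, so two nodes will routinely produce identical names for completely different data.  Illustrative names only; the column family and generation numbers below are made up:)

# node A, snapshot of column family "Users":
#   Users-g-5-Data.db   Users-g-5-Index.db   Users-g-5-Filter.db
# node B, same column family and generation number, different contents:
#   Users-g-5-Data.db   Users-g-5-Index.db   Users-g-5-Filter.db
# Merging both snapshots into one directory therefore silently drops one copy.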

From: Deno Vichas [deno@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
What's the best way (or the only way) to take a cluster-wide backup of Cassandra? Can't find much documentation on it.

I am using a MultiDC setup with cassandra 0.8.6.


Regards,
Shubham
here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job.  you'll need pssh and s3cmd -


#!/bin/bash
cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "coping tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapsh= ot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo "DONE!"

