From: Shubham Srivastava
To: user@cassandra.apache.org
Subject: RE: Taking a Cluster Wide Snapshot
Date: Thu, 26 Apr 2012 13:54:39 +0000
I was trying to get hold of all the data kind of a global snapshot.

I did the below : 

I copied all the snapshots from each individual node, where the snapshot data size was around 12Gb per node, to a common folder (one folder alone).

Strangely, I found duplicate file names across the snapshots, and more strangely the size of each duplicate file was different, which brought the total data size to close to 13Gb (the rest must have been overwritten), whereas the expectation was 12*6 = 72Gb.

Does that mean that if I need to create a new ring with the same data as the existing one I can't just do that, or should I start with the 13Gb copy and check whether all the data is present, which sounds pretty illogical?

Please suggest??
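(A minimal sketch of one way around exactly this collision, assuming passwordless ssh and a hosts file listing the nodes; the keyspace name and data directory below are placeholders, not taken from the cluster above. Each node's snapshot goes into its own per-host directory, so nothing gets overwritten:)

#!/bin/bash
# Collect each node's snapshot into backup/<host>/ instead of one shared folder.
KEYSPACE=mykeyspace                                   # placeholder keyspace name
SNAP_DIR=/var/lib/cassandra/data/$KEYSPACE/snapshots  # adjust to your data directory
while read -r host; do
    mkdir -p "backup/$host"
    rsync -a "$host:$SNAP_DIR/" "backup/$host/"       # per-host copy, no collisions
done < cassandra-nodes.txt
# du -sh backup/* should now add up to roughly 12Gb x 6 nodes.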

From: Shubham Srivastava
Sent: Thursday, April 26, 2012 12:43 PM
To: 'user@cassandra.apache.org'
Subject: Re: Taking a Cluster Wide Snapshot

Your second part was what I was also referring to, where I put all the files from the nodes onto a single node to create a similar backup, which needs unique file names across the cluster.

 
From: Deno Vichas [mailto:deno@syncopated.net]
Sent: Thursday, April 26, 2012 12:29 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot
 
there's no prerequisite for unique names.  each node's snapshot gets tar'ed up and then copied over to a directory named after the hostname of the node.  then those dirs are tar'ed and copied to S3.

what i haven't tried yet is to untar everything for all nodes into a single node cluster.  i'm assuming i can get tar to replace or skip existing files so i end up with a set of unique files.  can somebody confirm this?
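(For what it's worth: GNU tar's default on extract is to overwrite an existing file, and -k / --keep-old-files makes it keep the existing copy and report an error for each clash instead.  A quick sketch, with hypothetical hostnames, assuming archives named like the <hostname>-cassandra-snapshot.tar.gz ones in the script further down:)

# Default behaviour: the second extract silently overwrites any file the
# first one already created.
tar -zxvf node1-cassandra-snapshot.tar.gz
tar -zxvf node2-cassandra-snapshot.tar.gz

# With -k the existing copy wins and tar prints an error per collision,
# which at least makes the duplicates visible:
tar -zxvkf node2-cassandra-snapshot.tar.gz

# Caveat (per the first message in this thread): identically named SSTables
# from different nodes hold different data, so both overwriting and skipping
# lose data; extracting each node's archive into its own directory avoids that.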




On 4/25/2012 11:45 PM, Shubham Srivastava wrote:
Thanks a lot Deno.  A bit surprised, though; an equivalent command should be there in nodetool. Not sure if it is in the latest release.

BTW this makes it a prerequisite that all the data files of Cassandra, be it indexes or filters etc., have unique names across the cluster. Is this a reasonable assumption to have?

Regards,
Shubham
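(It isn't, unfortunately: an SSTable file name is built from the column family name, an on-disk format version and a per-node generation counter, so two nodes will routinely produce identical names for completely different data.  Illustrative names only; the column family and generation numbers below are made up:)

# node A, snapshot of column family "Users":
#   Users-g-5-Data.db   Users-g-5-Index.db   Users-g-5-Filter.db
# node B, same column family and generation number, different contents:
#   Users-g-5-Data.db   Users-g-5-Index.db   Users-g-5-Filter.db
# Merging both snapshots into one directory therefore silently drops one copy.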

From: Deno Vichas [deno@syncopated.net]
Sent: Thursday, April 26, 2012 12:09 PM
To: user@cassandra.apache.org
Subject: Re: Taking a Cluster Wide Snapshot

On 4/25/2012 11:34 PM, Shubham Srivastava wrote:
What's the best way (or the only way) to take a cluster-wide backup of Cassandra? Can't find much documentation on it.

I am using a MultiDC setup with cassandra 0.8.6.


Regards,
Shubham
here's how i'm doing it in AWS land using the DataStax AMI via a nightly cron job.  you'll need pssh and s3cmd -


#!/bin/bash
cd /home/ec2-user/ops

echo "making snapshots"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 clearsnapshot stocktouch'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P 'nodetool -h localhost -p 7199 snapshot stocktouch'

echo "making tar balls"
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'rm `hostname`-cassandra-snapshot.tar.gz'
pssh -h prod-cassandra-nodes.txt -l ubuntu -P -t 0 'tar -zcvf `hostname`-cassandra-snapshot.tar.gz /raid0/cassandra/data/stocktouch/snapshots'

echo "coping tar balls"
pslurp -h prod-cassandra-nodes.txt -l ubuntu /home/ubuntu/*cassandra-snapsh= ot.tar.gz .

echo "tar'ing tar balls"
tar -cvf cassandra-snapshots-all-nodes.tar 10*

echo "pushing to S3"
../s3cmd-1.1.0-beta3/s3cmd put cassandra-snapshots-all-nodes.tar s3://stocktouch-backups

echo "DONE!"

