From: David Laube <dave@stormpath.com>
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_572F3376-1CC2-45DC-8ED5-122530A3FA81"
Message-Id: <E21C80FC-86B7-49DB-BBEB-A597EF1DBA86@stormpath.com>
Mime-Version: 1.0 (Mac OS X Mail 6.5 \(1508\))
Subject: Re: Read inconsistency after backup and restore to different cluster
Date: Thu, 14 Nov 2013 13:44:14 -0800
References: <F213349C-C08D-4B65-A2EB-81BB91258141@stormpath.com>
 <CAEDUwd2W=OiaZzzeubshEUid6LbC2RLXU4+RS0722E9xH+Zo6w@mail.gmail.com>
To: user@cassandra.apache.org
In-Reply-To: 
 <CAEDUwd2W=OiaZzzeubshEUid6LbC2RLXU4+RS0722E9xH+Zo6w@mail.gmail.com>


--Apple-Mail=_572F3376-1CC2-45DC-8ED5-122530A3FA81
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=iso-8859-1

Thank you for the detailed reply, Rob! I have replied to your comments
inline below.

On Nov 14, 2013, at 1:15 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Thu, Nov 14, 2013 at 12:37 PM, David Laube <dave@stormpath.com> wrote:
> > It is almost as if the data only exists on some of the nodes, or perhaps
> > the token ranges are dramatically different. Again, we are using vnodes,
> > so I am not exactly sure how this plays into the equation.
>
> The token ranges are dramatically different, because setting num_tokens
> without setting initial_token makes each node pick its vnode tokens at
> random.
>
> You can verify this by listing the tokens per physical node in nodetool
> gossipinfo or (iirc) nodetool status.
>
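
For the verification step above, here is a minimal sketch of collecting
tokens per node. It assumes the tokens come from `nodetool ring`, which
prints one row per token when vnodes are enabled, with the node address in
the first column and the token in the last. The sample text, the addresses
and the name tokens_by_node are made up for illustration and are not taken
from either cluster.

```python
import re
from collections import defaultdict

# Hypothetical sample in the general shape of `nodetool ring` output with
# vnodes enabled: one row per token, address first, token last.
SAMPLE_RING = """\
Datacenter: datacenter1
==========
Address    Rack   Status State   Load      Owns    Token
                                                   9100000000000000000
10.0.0.1   rack1  Up     Normal  1.2 GB    20.1%   -9000000000000000000
10.0.0.2   rack1  Up     Normal  1.1 GB    19.8%   -4500000000000000000
10.0.0.1   rack1  Up     Normal  1.2 GB    20.1%   100
10.0.0.2   rack1  Up     Normal  1.1 GB    19.8%   9100000000000000000
"""

# A data row starts with an IPv4 address and ends with a signed integer token.
ROW = re.compile(r"^(\d{1,3}(?:\.\d{1,3}){3})\s+.*\s(-?\d+)\s*$")


def tokens_by_node(ring_output):
    """Return {address: sorted list of tokens} parsed from ring-style text."""
    tokens = defaultdict(list)
    for line in ring_output.splitlines():
        match = ROW.match(line)
        if match:  # headers and the leading token-only line are skipped
            tokens[match.group(1)].append(int(match.group(2)))
    return {addr: sorted(toks) for addr, toks in tokens.items()}


if __name__ == "__main__":
    for addr, toks in sorted(tokens_by_node(SAMPLE_RING).items()):
        print(addr, len(toks), "tokens")
```

Running the same parser against both clusters and comparing the token sets
node by node would show whether the rings actually line up.
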
> > 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five
> > nodes in the new cluster-B ring.
>
> I don't understand this at all. Do you mean that you are using one source
> node's data to load each of the target nodes? Or are you just saying
> there's a 1:1 relationship between source snapshots and target nodes to
> load into? Unless you have RF=N, using one source for 5 target nodes won't
> work.

We have configured RF=3 for the keyspace in question. From a client
perspective, we read with CL=ONE and write with CL=QUORUM. Since we have
5 nodes total in cluster-A, we snapshot keyspace_name on each of the five
nodes, which produces a snapshot directory on each node that we archive
and ship off to S3. We then take the snapshot archive generated FROM
cluster-A_node1 and copy/extract/restore it TO cluster-B_node1, then take
the archive FROM cluster-A_node2 and copy/extract/restore it TO
cluster-B_node2, and so on.
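
To illustrate why a 1:1 copy like this can still lose rows at CL=ONE when
the two rings have different tokens, here is a rough sketch. It assumes
SimpleStrategy-style placement (the first replica is the node owning the
first token at or after the key's token, and the remaining replicas are the
next distinct nodes clockwise) and uses one small made-up token per node
instead of real vnode tokens. CLUSTER_A, CLUSTER_B and replicas are
illustrative names only.

```python
from bisect import bisect_left


def replicas(ring, key_token, rf):
    """Return the rf distinct nodes responsible for key_token.

    ring maps node name -> list of tokens. Placement walks clockwise from
    the first token >= key_token, wrapping around the end of the ring.
    """
    points = sorted((tok, node) for node, toks in ring.items() for tok in toks)
    start = bisect_left(points, (key_token,))
    owners = []
    for i in range(len(points)):
        node = points[(start + i) % len(points)][1]
        if node not in owners:
            owners.append(node)
        if len(owners) == rf:
            break
    return owners


# Made-up tokens: same node names, different token assignments per cluster.
CLUSTER_A = {"node1": [0], "node2": [20], "node3": [40], "node4": [60], "node5": [80]}
CLUSTER_B = {"node1": [50], "node2": [10], "node3": [90], "node4": [30], "node5": [70]}

KEY_TOKEN = 15
RF = 3

# Nodes whose snapshots contain the row (written in cluster-A).
holders = replicas(CLUSTER_A, KEY_TOKEN, RF)
# Nodes that cluster-B will ask for the row after the 1:1 restore.
readers = replicas(CLUSTER_B, KEY_TOKEN, RF)
# Replicas in cluster-B that actually received the row from a snapshot.
have_data = [n for n in readers if n in holders]

if __name__ == "__main__":
    print("row restored onto:", holders)
    print("cluster-B replicas:", readers)
    print("replicas that have it:", have_data)
```

In this toy ring only one of the three cluster-B replicas holds the row, so
a CL=ONE read succeeds or comes back empty depending on which replica
answers.
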

>
> To do what I think you're attempting to do, you have basically two
> options.
>
> 1) don't use vnodes and do a 1:1 copy of snapshots
> 2) use vnodes and
>    a) get a list of tokens per node from the source cluster
>    b) put a comma delimited list of these in initial_token in
>       cassandra.yaml on target nodes
>    c) probably have to un-set num_tokens (this part is unclear to me,
>       you will have to test..)
>    d) set auto_bootstrap: false in cassandra.yaml
>    e) start target nodes; they will join without bootstrapping, into the
>       same ranges as the source cluster
>    f) load schema / copy data into datadir (being careful of
>       https://issues.apache.org/jira/browse/CASSANDRA-6245)
>    g) restart node or use nodetool refresh (I'd probably restart the
>       node to avoid the bulk rename that refresh does) to pick up
>       sstables
>    h) remove auto_bootstrap: false from cassandra.yaml
>
> I *believe* this *should* work, but have never tried it as I do not
> currently run with vnodes. It should work because it basically makes
> implicit vnode tokens explicit in the conf file. If it *does* work, I'd
> greatly appreciate you sharing details of your experience with the list.


I'll start by parsing out the token ranges that our vnode config ends up
assigning in cluster-A, then do some creative config work on the target
cluster-B we are trying to restore to, as you have suggested. I will
absolutely report back my findings here, taking into account any
additional comments or recommendations that you or another member of the
list may have based on the clarification I've made above.
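
As a starting point for that config work, here is a small sketch that turns
one source node's tokens into the two cassandra.yaml lines mentioned in
steps (b) and (d). The helper name restore_settings and the token values
are made up. Whether num_tokens must also be un-set is left open, as in
step (c), so the sketch does not touch it.

```python
def restore_settings(source_tokens):
    """Build cassandra.yaml lines for a target node from source tokens.

    source_tokens: iterable of integer tokens owned by the matching source
    node. Returns the lines as strings; nothing is written to disk.
    """
    tokens = sorted(source_tokens)
    if not tokens:
        raise ValueError("source node has no tokens")
    return [
        # Comma-delimited list makes the implicit vnode tokens explicit.
        "initial_token: " + ",".join(str(t) for t in tokens),
        # Join without streaming; remove again once the restore is done.
        "auto_bootstrap: false",
    ]


if __name__ == "__main__":
    # Hypothetical tokens for one source node (a real vnode node has many).
    for line in restore_settings([100, -5, 42]):
        print(line)
```

Each target node would get the lines built from its own source node's
tokens, so the snapshot restored onto it matches the ranges it owns.
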


>
> General reference on tasks of this nature (does not consider vnodes, but
> treat vnodes as "just a lot of physical nodes" and it is mostly relevant):
> http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> =Rob


--Apple-Mail=_572F3376-1CC2-45DC-8ED5-122530A3FA81--