From: David Laube
To: user@cassandra.apache.org
Subject: Re: Read inconsistency after backup and restore to different cluster
Date: Thu, 14 Nov 2013 13:44:14 -0800

Thank you for the detailed reply Rob! I have replied to your comments in-line below.

On Nov 14, 2013, at 1:15 PM, Robert Coli <rcoli@eventbrite.com> wrote:

> On Thu, Nov 14, 2013 at 12:37 PM, David Laube <dave@stormpath.com> wrote:
>> It is almost as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically different. Again, we are using vnodes, so I am not exactly sure how this plays into the equation.
>
> The token ranges are dramatically different, because with vnodes each node picks its tokens at random when num_tokens is set and initial_token is not.
>
> You can verify this by listing the tokens per physical node in nodetool gossipinfo or (iirc) nodetool status.
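
One way to make that check concrete is to compare the token sets of the two clusters programmatically. The following is only a sketch: it assumes the text output of `nodetool ring` (not the commands named above) has been captured from each cluster, and that each data row starts with the node address and ends with the token. The column layout varies between Cassandra versions, so the pattern should be checked against real output. The function names are hypothetical.

```python
import re
from collections import defaultdict

# Assumed row shape of `nodetool ring`: address first, token last.
# Column layout differs between Cassandra versions; verify against real output.
_ROW = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+.*\s(-?\d+)\s*$")


def tokens_by_node(ring_output):
    """Group the tokens listed by `nodetool ring` under the node that owns them."""
    tokens = defaultdict(list)
    for line in ring_output.splitlines():
        match = _ROW.match(line)
        if match:
            address, token = match.groups()
            tokens[address].append(int(token))
    return {address: sorted(owned) for address, owned in tokens.items()}


def same_token_layout(ring_a, ring_b):
    """True if both clusters use the same per-node token sets, ignoring addresses."""
    layout_a = sorted(tuple(t) for t in tokens_by_node(ring_a).values())
    layout_b = sorted(tuple(t) for t in tokens_by_node(ring_b).values())
    return layout_a == layout_b
```

If `same_token_layout` returns False for the source and target clusters, a node-for-node copy of snapshots will place sstables on nodes that do not own the corresponding ranges.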
 
>> 5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five nodes in the new cluster-B ring.
>
> I don't understand this at all. Do you mean that you are using one source node's data to load each of the target nodes? Or are you just saying there's a 1:1 relationship between source snapshots and target nodes to load into? Unless you have RF=N, using one source for 5 target nodes won't work.

We have configured RF=3 for the keyspace in question. Also, from a client perspective, we read with CL=ONE and write with CL=QUORUM. Since we have 5 nodes total in cluster-A, we snapshot keyspace_name on each of the five nodes, which results in a snapshot directory on each node that we archive and ship off to S3. We then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore it TO cluster-B_node1, then take the snapshot archive FROM cluster-A_node2 and copy/extract/restore it TO cluster-B_node2, and so on.
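
To illustrate why a node-for-node copy can still give inconsistent reads at CL=ONE when the two rings have different tokens, here is a small sketch of replica placement. It assumes SimpleStrategy-style placement (the next RF distinct nodes clockwise from the key's token); NetworkTopologyStrategy behaves differently. The rings, tokens, and node names are made up for illustration.

```python
from bisect import bisect_left


def replicas_for(token, ring, rf):
    """Return the rf distinct nodes holding `token` under SimpleStrategy-style placement.

    `ring` is a list of (token, node) pairs; with vnodes a node appears many times.
    """
    ordered = sorted(ring)
    ring_tokens = [t for t, _ in ordered]
    # The primary replica owns the first ring token >= the key's token, wrapping around.
    start = bisect_left(ring_tokens, token) % len(ordered)
    replicas = []
    for step in range(len(ordered)):
        node = ordered[(start + step) % len(ordered)][1]
        if node not in replicas:
            replicas.append(node)
        if len(replicas) == rf:
            break
    return replicas


# Hypothetical rings: same node names, different token assignments.
ring_a = [(0, "n1"), (100, "n2"), (200, "n3"), (300, "n4"), (400, "n5")]
ring_b = [(0, "n3"), (100, "n5"), (200, "n1"), (300, "n2"), (400, "n4")]

print(replicas_for(150, ring_a, 3))  # ['n3', 'n4', 'n5']
print(replicas_for(150, ring_b, 3))  # ['n1', 'n2', 'n4']
```

In this toy layout, a row with token 150 is copied onto n3, n4, and n5 in the target cluster, but the target ring expects it on n1, n2, and n4. Only n4 overlaps, so a CL=ONE read finds the row only when it happens to be served by n4.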


> To do what I think you're attempting to do, you have basically two options.
>
> 1) don't use vnodes and do a 1:1 copy of snapshots
> 2) use vnodes and
>    a) get a list of tokens per node from the source cluster
>    b) put a comma-delimited list of these in initial_token in cassandra.yaml on target nodes
>    c) probably have to un-set num_tokens (this part is unclear to me, you will have to test..)
>    d) set auto_bootstrap: false in cassandra.yaml
>    e) start target nodes; they will not bootstrap, and will come up in the same ranges as the source cluster
>    f) load schema / copy data into datadir (being careful of https://issues.apache.org/jira/browse/CASSANDRA-6245)
>    g) restart node or use nodetool refresh (I'd probably restart the node to avoid the bulk rename that refresh does) to pick up sstables
>    h) remove auto_bootstrap: false from cassandra.yaml
>
> I *believe* this *should* work, but have never tried it as I do not currently run with vnodes. It should work because it basically makes implicit vnode tokens explicit in the conf file. If it *does* work, I'd greatly appreciate you sharing details of your experience with the list.
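
Steps b) and d) above could be scripted roughly as follows. This is a sketch, not a tested procedure: `restore_settings` is a hypothetical helper that only renders the two cassandra.yaml lines for one target node from its source node's tokens. Whether num_tokens must also be removed (step c) is left open, as in the list above.

```python
def restore_settings(tokens):
    """Render cassandra.yaml lines for one target node from its source node's tokens."""
    if not tokens:
        raise ValueError("expected at least one token")
    # initial_token takes a comma-delimited list when a node holds many tokens.
    token_list = ",".join(str(t) for t in sorted(tokens))
    return "\n".join([
        "initial_token: " + token_list,
        # Temporary: to be removed again after the restore (step h).
        "auto_bootstrap: false",
    ])
```

The output would be merged by hand (or by a config-management tool) into each target node's cassandra.yaml before its first start.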

I'll start by parsing out the token ranges that our vnode config ends up assigning in cluster-A, then do some creative config work on the target cluster-B we are trying to restore to, as you have suggested. Depending on any additional comments or recommendations that you or another member of the list may have based on the clarification above, I will absolutely report my findings back here.

> General reference on tasks of this nature (does not consider vnodes, but treat vnodes as "just a lot of physical nodes" and it is mostly relevant): http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra
>
> =Rob
