we then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then we …
sounds correct.

Depending on what additional comments/recommendations you or another member of the list may have (if any) based on the clarification I've made above, …

Also, if you back up the system keyspace it will bring along the tokens. This can be a pain if you want to change the cluster name.
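
If you only want the application data, something along these lines (untested, assuming the default data directory layout, and "migration" is just a made-up snapshot tag) keeps the system keyspace out of the archive:

    # snapshot only the application keyspace, not system
    nodetool snapshot -t migration keyspace_name

    # archive just that keyspace's snapshot directories
    cd /var/lib/cassandra/data
    tar -czf keyspace_name-migration.tar.gz keyspace_name/*/snapshots/migration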

cheers

-----------------
Aaron Morton
New Zealand
@aaronmorton

Co-Founder & Principal Consultant
Apache Cassandra Consulting

On 15/11/2013, at 10:44 am, David Laube <dave@stormpath.com> wrote:

Thank you for the detailed reply, Rob! I have replied to your comments in-line below:

On Nov 14, 2013, at 1:15 PM, Robert Coli <rcoli@eventbrite.com> wrote:

On Thu, Nov 14, 2013 at 12:37 PM, David Laube <dave@stormpath.com> wrote:
It is almost as if the data only exists on some of the nodes, or perhaps the token ranges are dramatically different -- again, we are using vnodes, so I am not exactly sure how this plays into the equation.

The token ranges are dramatically different, due to random vnode token selection: you did not set initial_token, and you did set num_tokens, so each node picked its own random tokens.

You can verify this by listing the tokens per physical node in nodetool gossipinfo or (iirc) nodetool status.
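
For example, something like this (untested; it assumes the token is the last column of nodetool ring output, and 10.0.0.1 is a placeholder for one cluster-A node) gives you that node's tokens as a comma-delimited list:

    # one line per vnode token owned by 10.0.0.1, joined with commas
    nodetool ring | awk '$1 == "10.0.0.1" { print $NF }' | paste -s -d, -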
 
5. Copy 1 of the 5 snapshot archives from cluster-A to each of the five nodes in the new cluster-B ring.

I don't understand this at all. Do you mean that you are using one source node's data to load each of the target nodes? Or are you just saying there's a 1:1 relationship between source snapshots and target nodes to load into? Unless you have RF=N, using one source for 5 target nodes won't work.

We have configured RF=3 for the keyspace in question. Also, from a client perspective, we read with CL=1 and write with CL=QUORUM. Since we have 5 nodes total in cluster-A, we snapshot keyspace_name on each of the five nodes, which results in a snapshot directory on each node that we archive and ship off to S3. We then take the snapshot archive generated FROM cluster-A_node1 and copy/extract/restore TO cluster-B_node1, then we take the snapshot archive FROM cluster-A_node2 and copy/extract/restore TO cluster-B_node2, and so on and so forth.
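
To be concrete, the restore side of that looks roughly like this on each cluster-B node (default 1.2-style data directory layout assumed, and "migration" stands in for whatever snapshot tag the archive was taken with):

    # on cluster-B_node1, after downloading the archive taken on cluster-A_node1
    mkdir -p /tmp/restore
    tar -xzf cluster-A_node1-keyspace_name.tar.gz -C /tmp/restore

    # drop each table's snapshotted sstables into the matching live table directory
    for cf in /tmp/restore/keyspace_name/*; do
        cp "$cf"/snapshots/migration/* "/var/lib/cassandra/data/keyspace_name/$(basename "$cf")/"
        nodetool refresh keyspace_name "$(basename "$cf")"   # or restart the node instead
    done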


To do what I think you're attempting to do, you have basically two options.

1) don't use vnodes and do a 1:1 copy of snapshots
2) use vnodes and
   a) get a list of tokens per node from the source cluster
   b) put a comma-delimited list of these in initial_token in cassandra.yaml on the target nodes (sketched below)
   c) probably have to unset num_tokens (this part is unclear to me, you will have to test...)
   d) set auto_bootstrap:false in cassandra.yaml
   e) start the target nodes; without bootstrapping, they will take ownership of the same ranges as the source cluster
   f) load schema / copy data into datadir (being careful of https://issues.apache.org/jira/browse/CASSANDRA-6245)
   g) restart the node or use nodetool refresh (I'd probably restart the node to avoid the bulk rename that refresh does) to pick up the sstables
   h) remove auto_bootstrap:false from cassandra.yaml
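
In cassandra.yaml terms I am picturing something like this on, say, cluster-B_node1 (the token values are placeholders for the list pulled from cluster-A_node1, and whether num_tokens really has to be unset is the part I have not tested):

    # cassandra.yaml on cluster-B_node1
    initial_token: <token1>,<token2>,...,<token256>   # comma-delimited, copied from cluster-A_node1
    # num_tokens: 256                                 # possibly needs to be unset, see (c)
    auto_bootstrap: false                             # remove again in step (h)

For (g), nodetool refresh takes the keyspace and column family name (nodetool refresh keyspace_name <cf>), but as noted I'd lean towards restarting the node instead.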
   
I *believe* this *should* work, but have never tried it as I do not currently run with vnodes. It should work because it basically makes implicit vnode tokens explicit in the conf file. If it *does* work, I'd greatly appreciate you sharing details of your experience with the list. 

I'll start with parsing out the token ranges that our vnode config ends up assigning in cluster-A, and doing some creative config work on the target cluster-B we are trying to restore to, as you have suggested. Depending on what additional comments/recommendations you or another member of the list may have (if any) based on the clarification I've made above, I will absolutely report back my findings here.



General reference on tasks of this nature (it does not consider vnodes, but if you treat vnodes as "just a lot of physical nodes" it is mostly relevant): http://www.palominodb.com/blog/2012/09/25/bulk-loading-options-cassandra

=Rob