From: Faraaz Sareshwala
To: user@cassandra.apache.org
Date: Wed, 19 Jun 2013 10:50:46 -0700
Subject: Joining distinct clusters with the same schema together

My company is planning on deploying cassandra to three separate datacenters.
Each datacenter will have a cassandra cluster with a separate set of seeds specific to that datacenter. However, the cluster name will be the same.

Question 1: is this enough to guarantee that the three datacenters will have distinct cassandra clusters as well? Or will a node in datacenter A still somehow be able to join datacenter B's ring?

Cassandra has cross-datacenter replication and we plan to use that in the future. For now, we are planning on using our own relay mechanism to transfer data changes from one datacenter to another. Each cassandra cluster in each datacenter will have the same keyspaces and column families with the same schema. Datacenter A will send mutations over this relay to datacenter B, which will replay each mutation in its own cassandra cluster. Therefore, datacenter A's cassandra cluster will look identical to datacenter B's cassandra cluster, but not through the cross-datacenter replication that cassandra offers.

Question 2: is this a sane strategy? We're trying to make the smallest possible change when deploying cassandra. Our plan is to slowly move our infrastructure over to relying more on cassandra once we can assess how it behaves with our workload.

Question 3: eventually, we want to turn all these cassandra clusters into one large multi-datacenter cluster. What's the best practice for doing this? Should I just add nodes from all datacenters to the list of seeds and let cassandra resolve the differences? Is there another way I don't know about?

Thank you,
Faraaz
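
For reference, the relay described under question 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual relay: `Mutation`, `FakeCluster`, and `relay` are hypothetical names, an in-memory dict stands in for each datacenter's cluster, and no real cassandra client API is used. The stand-in keeps the write with the newest timestamp per column, a simplified version of timestamp-based conflict resolution, so replaying a mutation twice leaves the target unchanged.

```python
from dataclasses import dataclass
from queue import Queue
from typing import Dict, Tuple

CellKey = Tuple[str, str, str, str]


@dataclass(frozen=True)
class Mutation:
    """One column write, as it would travel over the relay (hypothetical shape)."""
    keyspace: str
    column_family: str
    row_key: str
    column: str
    value: str
    timestamp: int  # supplied by the writer in the source datacenter


class FakeCluster:
    """In-memory stand-in for one datacenter's cluster; not a cassandra client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.cells: Dict[CellKey, Tuple[int, str]] = {}

    def apply(self, m: Mutation) -> None:
        key = (m.keyspace, m.column_family, m.row_key, m.column)
        current = self.cells.get(key)
        # Simplified last-write-wins: keep the write with the newest timestamp,
        # so late or repeated replays cannot overwrite newer data.
        if current is None or m.timestamp >= current[0]:
            self.cells[key] = (m.timestamp, m.value)


def relay(source_log: Queue, target: FakeCluster) -> int:
    """Drain the source datacenter's mutation log and replay it on the target."""
    replayed = 0
    while not source_log.empty():
        target.apply(source_log.get())
        replayed += 1
    return replayed
```

In this sketch, datacenter A would apply each mutation locally and also append it to the log; calling `relay(log, dc_b)` then brings the stand-in for datacenter B to the same state, provided the original write timestamps travel with the mutations.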