From: Eric Tamme
Date: Tue, 11 Oct 2011 09:26:04 -0400
To: user@cassandra.apache.org
Subject: Re: Multi DC setup

>> We already have two separate rings. The idea of bidirectional sync is, if
>> one ring is down, we can still send the traffic to the other ring. When the
>> original cluster comes back, it will pick up the data from the available
>> cluster. I'm not sure if it makes sense to have separate rings or combine
>> these two rings into one.

I am not sure you fully understand how Cassandra is supposed to work - you do
not need two rings to have two complete sets of data that you can "hot
cutover" between.

> Cassandra doesn't have support for synchronizing data between two
> different rings. The multi-DC support in Cassandra amounts to having a
> single ring containing all nodes from all data centers. Cassandra is
> told (by configuring the snitch, such as through a property file)
> which nodes are in which data center. Using the
> NetworkTopologyStrategy, you then make sure to distribute replicas
> across DCs as you see fit.

Using NTS you can configure a single ring into multiple "logical rings". This
is effectively what the property file snitch does in conjunction with NTS.
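As a rough sketch, the property file snitch reads
conf/cassandra-topology.properties, which maps each node's IP to a data
center and rack (all addresses and names below are hypothetical):

    # cassandra-topology.properties, read by PropertyFileSnitch
    # format: <node IP>=<data center>:<rack>
    10.0.1.1=DC1:RAC1
    10.0.1.2=DC1:RAC1
    10.0.2.1=DC2:RAC1
    10.0.2.2=DC2:RAC1
    # fallback for any node not listed above
    default=DC1:RAC1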
I gave a presentation on the NTS internals and on replicating data across
geographically distributed data centers. You can find the slides here:

http://files.meetup.com/1794037/NTS_presentation.pdf

Also, Edward Capriolo's book "Cassandra High Performance Cookbook" has some
recipes for using NTS.

I currently have 4 nodes in two data centers, and I use NTS with the property
file snitch to write one copy of the data to each DC (one node per DC) so
that in the event of a total DC failure, we can still get to the data. The
first write is "local" and the replica is written asynchronously if you set
the write consistency level to ONE - so you get fast writes with
distribution.
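A minimal sketch of a keyspace definition for that layout, in cassandra-cli
syntax (the exact syntax varies a bit between versions, and the keyspace and
DC names here are hypothetical - the DC names must match the ones used by the
snitch's property file):

    create keyspace MyKeyspace
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = [{DC1:1, DC2:1}];

With strategy_options = [{DC1:1, DC2:1}], each row gets one replica in DC1
and one in DC2 - the "one copy per DC" layout described above.

-Eric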