cassandra-user mailing list archives

From: Wayne Lewis <>
Subject: multiple datacenter with low replication factor - idea for greater flexibility
Date: Wed, 10 Nov 2010 19:43:15 GMT

We've had Cassandra running in a single production data center for several months now, and
we have started detailed planning to add data center fault tolerance.

Our requirements do not appear to be met out of the box by Cassandra. I'd like to share the
solution we're planning and hear from others considering similar problems.

We require the following:

1. Two data centers
One is the primary; the other is a hot standby to be used when the primary fails. Of course
Cassandra has no such bias, but as will be seen below, the distinction becomes important when
considering application latency.

2. No more than 3 copies of data total
We are storing blob-like objects, and cost per unit of usable storage is closely scrutinized
against other solutions, so we want to keep the replication factor low.
Two copies will be held in the primary DC and one in the secondary DC, with a corresponding
ratio of machines in each DC.

3. Immediate consistency

4. No waiting on remote data center
The application front-end runs in the primary data center and expects that operations using
a local coordinator node will not suffer a response time determined by the WAN. Hence we cannot
require a response from the node in the secondary data center to achieve quorum.

5. Ability to operate with a single working node per key, if necessary
In desperate situations involving data center failure, or combinations of node and data center
failures, we wish to be able to operate temporarily with as few as a single working node per token.

Existing Cassandra mechanisms offer combinations of the above, but it is not at all clear how
to achieve all of them without custom work.
A normal quorum with N=3 tolerates only one down node regardless of topology, and if the down
node is one of the two in the primary DC, every quorum operation must then wait synchronously
on the WAN (the snippet below spells out the arithmetic).
NetworkTopologyStrategy is nice, but requiring a local quorum among only 2 primary-DC replicas
means no tolerance to even a single node failure there.
If we're overlooking something I'd love to know.
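
To spell out the arithmetic, here is a throwaway snippet in plain Java, written just for this
mail (the class name is mine; N is the replication factor):

    // Quorum arithmetic for N=3 split 2 (primary DC) + 1 (secondary DC).
    public class QuorumMath {
        public static void main(String[] args) {
            int n = 3;                 // replication factor
            int quorum = n / 2 + 1;    // = 2: two of three replicas must respond
            System.out.println("quorum = " + quorum);
            // Both primary replicas up -> quorum is satisfied locally.
            // One primary replica down -> the secondary-DC replica must
            // respond, so every quorum operation waits on a WAN round trip.
        }
    }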

Hence the following proposal for a new replication strategy we're calling SubQuorum.

In short, SubQuorum allows administratively marking some nodes as exempt from participating
in quorum. Because all nodes agree on the exemption status, consistency is still guaranteed:
quorum is simply achieved amongst the remaining nodes, and any two such quorums still overlap.
We gain tremendous flexibility to deal with node and DC failures. Exempt nodes, if up, still
receive mutation messages as usual.

For example: if a primary DC node fails, we can mark the failed node and its remote counterpart
exempt from quorum; quorum is then achieved by the surviving primary replica alone, allowing
continued operation without a synchronous call over the WAN.

Or another example: if the primary DC fails, we mark all primary DC nodes exempt and move
the entire application to the secondary DC, where it runs as usual but with just the one copy.

The implementation is trivial and consists of two pieces:

1. Exempt node management. The list of exempt nodes is broadcast out of band; in our case
we're leveraging puppet and an admin server. (A sketch of what this could look like follows
below.)

2. We've written an implementation of AbstractReplicationStrategy that returns a custom QuorumResponseHandler
and IWriteResponseHandler. These simply wait for quorum amongst the non-exempt nodes.
This requires a small change to the AbstractReplicationStrategy interface to pass the endpoints
to getQuorumResponseHandler and getWriteResponseHandler, but otherwise the changes are contained
in the plugin.
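
To make piece 1 concrete, here is a minimal sketch of the exempt-list management, assuming a
one-host-per-line file that puppet rewrites out of band (the path, file format and refresh
interval are placeholders of mine, not anything in Cassandra):

    import java.io.*;
    import java.net.InetAddress;
    import java.util.*;
    import java.util.concurrent.*;

    public class ExemptNodes {
        private static volatile Set<InetAddress> exempt = Collections.emptySet();

        public static boolean isExempt(InetAddress ep) {
            return exempt.contains(ep);
        }

        // Re-read the file every 10s; puppet rewrites it out of band.
        public static void watch(final File file) {
            Executors.newSingleThreadScheduledExecutor().scheduleAtFixedRate(new Runnable() {
                public void run() {
                    Set<InetAddress> next = new HashSet<InetAddress>();
                    try {
                        BufferedReader in = new BufferedReader(new FileReader(file));
                        try {
                            String line;
                            while ((line = in.readLine()) != null) {
                                line = line.trim();
                                if (line.length() > 0)
                                    next.add(InetAddress.getByName(line)); // one host per line
                            }
                        } finally {
                            in.close();
                        }
                        exempt = next; // volatile write: readers see one consistent set
                    } catch (IOException e) {
                        // bad line or unreadable file: keep the previous list
                    }
                }
            }, 0, 10, TimeUnit.SECONDS);
        }
    }

And a minimal sketch of the piece 2 wait logic. The class and method names below are mine for
illustration, not the actual Cassandra interfaces the patch plugs into:

    import java.net.InetAddress;
    import java.util.Collection;
    import java.util.concurrent.CountDownLatch;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.TimeoutException;

    public class SubQuorumWriteHandler {
        private final CountDownLatch latch;

        // Block for a majority of the non-exempt endpoints only.
        public SubQuorumWriteHandler(Collection<InetAddress> endpoints) {
            int nonExempt = 0;
            for (InetAddress ep : endpoints)
                if (!ExemptNodes.isExempt(ep))
                    nonExempt++;
            // At least one replica per range must stay non-exempt,
            // or this latch can never be satisfied.
            latch = new CountDownLatch(nonExempt / 2 + 1);
        }

        // One ack per replica; exempt replicas still receive the write,
        // but their acks do not gate the latch.
        public void response(InetAddress from) {
            if (!ExemptNodes.isExempt(from))
                latch.countDown();
        }

        public void get(long timeoutMillis) throws InterruptedException, TimeoutException {
            if (!latch.await(timeoutMillis, TimeUnit.MILLISECONDS))
                throw new TimeoutException("quorum of non-exempt replicas not reached");
        }
    }

The point is simply that the block-for count is computed over the non-exempt endpoints, so any
two operations still wait on overlapping majorities and consistency is preserved.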

There is more analysis I can share if anyone is interested, but at this point I'd like to
get feedback.

Wayne Lewis
