Hi Alex,

Can you share what replication factor you're running?
And, are you using ephemeral disks or EBS volumes?


- Dan

On Jul 3, 2012, at 5:52 PM, Alex Major wrote:

Hi Mike,

We've run a small (4 node) cluster in the EU region since September last year. We run across all 3 availability zones in the EU region, with 2 nodes in one AZ and one node in each of the other two AZs. The latency difference between running within an AZ and between AZs has been minimal in our experience.

It's only when we've gone cross-region that there's been a latency problem. We temporarily ran a 9 node cluster across 3 regions; however, even then, using local quorum, the latency was better than the standard datacenter-to-datacenter latency we're used to.

EC2Snitch is definitely the way to go in favour of NTS in my opinion. NTS was a pain to get set up with the internal (private) IP addresses, so much so that we never got it safely replicating the data as we wanted.


On Tue, Jul 3, 2012 at 2:16 PM, Michael Theroux <mtheroux2@yahoo.com> wrote:

We are currently running a web application utilizing Cassandra on EC2.  Given the recent outages experienced with Amazon, we want to consider expanding Cassandra across availability zones sooner rather than later.

We are trying to determine the optimal way to deploy Cassandra in this environment.  We are researching the NetworkTopologyStrategy and the EC2Snitch.  We are also interested in providing a high level of read or write consistency.

My understanding is that the EC2Snitch recognizes availability zones as racks, and regions as data-centers.  This seems to be a common configuration.  However, if we were to use queries with a READ or WRITE consistency of QUORUM, would there be a high possibility that the communication necessary to establish a quorum would cross availability zones?
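
For reference, a QUORUM in Cassandra is floor(RF / 2) + 1 replicas. The arithmetic behind the question can be sketched in Python as below. This is only an illustration: the function names are hypothetical and not part of any Cassandra API, and the number of replicas in the coordinator's own availability zone is simply passed in as an assumption.

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a QUORUM read or write."""
    return replication_factor // 2 + 1


def remote_acks_needed(replication_factor: int, replicas_in_local_az: int) -> int:
    """Minimum responses that must come from outside the local AZ.

    replicas_in_local_az is an assumed input: how many of the replicas
    for a key happen to live in the coordinator's availability zone.
    """
    return max(0, quorum(replication_factor) - replicas_in_local_az)


# With RF=3 and one replica per AZ, at least one response must cross AZs.
cross_az = remote_acks_needed(replication_factor=3, replicas_in_local_az=1)
```

Under these assumptions, a QUORUM request stays inside one availability zone only if that zone already holds a majority of the replicas for the key.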

My understanding is that the NetworkTopologyStrategy prefers to store replicas on other racks within the datacenter, which would equate to other availability zones in EC2.  This implies to me that, in order to have the quorum of nodes necessary to achieve consistency, Cassandra will communicate with nodes across availability zones.
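
The rack preference described above can be illustrated with a simplified Python sketch. It is not Cassandra's actual implementation: the ring, node names, and availability zone names are hypothetical, and the placement rule is reduced to "walk the ring and prefer racks not used yet, falling back to skipped nodes only when racks run out".

```python
from typing import List, Tuple

Node = Tuple[str, str]  # (node name, rack i.e. availability zone)


def place_replicas(ring: List[Node], start: int, rf: int) -> List[Node]:
    """Pick rf replicas walking the ring clockwise from index start."""
    replicas: List[Node] = []
    skipped: List[Node] = []
    seen_racks = set()
    for i in range(len(ring)):
        if len(replicas) == rf:
            break
        node = ring[(start + i) % len(ring)]
        if node[1] in seen_racks:
            # Rack already holds a replica: keep the node only as a fallback.
            skipped.append(node)
        else:
            replicas.append(node)
            seen_racks.add(node[1])
    # Fewer racks than replicas: fill the remainder from the skipped nodes.
    for node in skipped:
        if len(replicas) == rf:
            break
        replicas.append(node)
    return replicas


# Hypothetical 4-node ring spread over three availability zones.
ring = [
    ("n1", "eu-west-1a"),
    ("n2", "eu-west-1a"),
    ("n3", "eu-west-1b"),
    ("n4", "eu-west-1c"),
]
replicas = place_replicas(ring, start=0, rf=3)
```

In this sketch, with a replication factor of 3 and three zones, each replica ends up in a different zone, so any two replicas that form a quorum are necessarily in different availability zones.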

First, is my understanding correct?  Second, given the high latency that can sometimes exist between availability zones, is this a problem, and should we instead treat availability zones as data centers?

Ideally, we would be able to set up a situation where we could store replicas across availability zones in case of failure, but establish a high level of read or write consistency within a single availability zone.

I appreciate your responses,