Makes sense! Thanks.
Just a quick follow-up:
I now understand that the write is not made to the coordinator (unless it is one of the replicas for that key). But does the column data being written still 'flow' through the coordinator node? For a 2G column write, will I see 2G of network traffic on the coordinator node, or just a few bytes as it reads the key and talks to the nodes, the client, etc.?

This will be a factor for us, so I need to be sure exactly how it works.

On Tue, Feb 15, 2011 at 5:02 PM, Matthew Dennis <> wrote:
It doesn't write anything to the coordinator node, it just forwards it to nodes in the replica set for that row key.

1. The write goes to some node (the coordinator, i.e. whatever node you connected to).
2. The coordinator looks at the key and determines which nodes are responsible for it.
3. In parallel, it forwards the request to those nodes (if the coordinator is itself in the replica set for that key, it writes locally in parallel with the writes that were forwarded).
4. The coordinator waits until it has enough responses from the nodes in the replica set for the key (possibly including itself) to meet your consistency level.
5. The coordinator determines the correct value to send to the client based on the responses it receives, and then sends it.
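
The steps above can be sketched as a toy simulation. This is only an illustration, not Cassandra's code: the Node class, the replicas_for placement function, and the required_acks parameter (standing in for the consistency level) are all hypothetical names made up for this example, and the hash-based placement is a stand-in for whatever partitioner the cluster actually uses.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor, as_completed


class Node:
    """Hypothetical in-memory node; not a real Cassandra class."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # data this node holds as a replica

    def apply_write(self, key, value):
        # Persist the value locally and acknowledge.
        self.store[key] = value
        return True

    def coordinate_write(self, ring, key, value, replication_factor, required_acks):
        # This node acts as coordinator: it looks at the key and finds the replicas.
        replicas = replicas_for(key, ring, replication_factor)
        acks = 0
        # Forward to all replicas in parallel. If the coordinator is itself a
        # replica, its own apply_write runs alongside the forwarded ones.
        with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
            futures = [pool.submit(n.apply_write, key, value) for n in replicas]
            for future in as_completed(futures):
                if future.result():
                    acks += 1
                if acks >= required_acks:
                    break  # enough responses for the requested level
        # Note: leaving the "with" block still waits for any remaining writes
        # in this sketch; the coordinator stores nothing unless it is a replica.
        return acks >= required_acks, replicas


def replicas_for(key, ring, replication_factor):
    # Assumed toy placement: hash the key to a start position on the ring
    # and take the next replication_factor nodes.
    start = int(hashlib.md5(key.encode()).hexdigest(), 16) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(replication_factor)]
```

In this sketch, calling coordinate_write on a node outside the replica set leaves that node's store empty, which matches the point above that nothing is written to the coordinator itself.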

On Tue, Feb 15, 2011 at 3:55 PM, A J <> wrote:
1. That is somewhat disappointing. I wish the redundant write on the coordinator node could have been avoided somehow.
Does the write on the coordinator node (in case it is not one of the N replica nodes for that key) get deleted before the response to the write is returned to the client?

On Tue, Feb 15, 2011 at 4:40 PM, Matthew Dennis <> wrote:
1. Yes, the coordinator node propagates requests to the correct nodes.

2. Most (all?) higher-level clients (pycassa, hector, etc.) load balance for you. In general, your client and/or the caller of the client needs to catch exceptions and retry. If you're using RRDNS and some of the nodes are temporarily down, you wouldn't bother to update DNS; your client would just route to some other node that is up after noticing the first node is down.

In general you don't want a load balancer in front of the nodes as the load balancer itself becomes a SPOF as well as a performance bottleneck (not to mention the extra cost and complexity).  By far the most common setup is to have the clients load balance for you, coupled with retry logic in your application.
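
A minimal sketch of client-side load balancing with retry, as described above. It does not use the pycassa or hector APIs: RoundRobinClient, NodeDown, and the send callable are hypothetical names for this example, and a real client would also need timeouts and backoff.

```python
import itertools


class NodeDown(Exception):
    """Hypothetical error raised when a host cannot be reached."""


class RoundRobinClient:
    def __init__(self, hosts, send):
        # hosts: list of node addresses; send: callable(host, request) that
        # performs the actual request (assumed, supplied by the caller).
        self.hosts = list(hosts)
        self._cycle = itertools.cycle(self.hosts)
        self._send = send

    def execute(self, request):
        last_error = None
        # Try each host at most once per request, rotating so that every
        # node gets an equal chance of acting as coordinator.
        for _ in range(len(self.hosts)):
            host = next(self._cycle)
            try:
                return self._send(host, request)
            except NodeDown as exc:
                last_error = exc  # node is down: move on to the next one
        raise NodeDown("all hosts failed") from last_error
```

The application only sees an exception when every node has failed, so a temporarily down node needs no DNS or configuration change.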

On Tue, Feb 15, 2011 at 2:45 PM, A J <> wrote:
From my reading, it seems the node that the client connects to becomes the coordinator node. Questions:

1. Is it true that the write first happens on the coordinator node, and that the coordinator node then propagates it to the right primary node and the replicas? In other words, if I have a 2G write, would the 2G be transferred to the coordinator node first, or is the coordinator just a witness that waits for the transfer to happen directly between the client and the right nodes?

2. How do you load-balance between the different nodes so that all have an equal chance of becoming the coordinator node? Does the client need some sort of round-robin DNS balancer? If so, what if some of the nodes drop off? How do you inform the DNS balancer?
Or do I need a proper load balancer in front that looks at the traffic on each node and selects a coordinator node accordingly? Which is more prevalent?