On Mon, Mar 4, 2013 at 9:53 PM, Kanwar Sangha <kanwar@mavenir.com> wrote:

Hi – If I configure an RF across 2 data centres as below, assuming 3 nodes per data centre.


DC1: 2, DC2: 2


I do a write with consistency level LOCAL_QUORUM, which ensures that there is no inter-DC latency. Now say 2 nodes in DC1 crash and I am doing a read with CL = ONE. Will it return a failure to the client, since the data is now only present in DC2?

No, it won't (return a failure). CL.ONE means "any one node", irrespective of datacenters.
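To make that concrete, here is a minimal sketch (a simplified model for illustration, not Cassandra's actual code) of which consistency levels can still be satisfied in the scenario above, where both DC1 replicas are down and the data only survives in DC2:

```python
def can_satisfy(consistency, live_replicas_by_dc, rf_by_dc, local_dc="DC1"):
    """Return True if the given read consistency level can be met."""
    total_live = sum(live_replicas_by_dc.values())
    total_rf = sum(rf_by_dc.values())
    if consistency == "ONE":
        return total_live >= 1                      # any one node, in any DC
    if consistency == "QUORUM":
        return total_live >= total_rf // 2 + 1      # majority of all replicas
    if consistency == "LOCAL_QUORUM":
        # majority of the replicas in the coordinator's local DC
        return live_replicas_by_dc[local_dc] >= rf_by_dc[local_dc] // 2 + 1
    raise ValueError(consistency)

rf = {"DC1": 2, "DC2": 2}
alive = {"DC1": 0, "DC2": 2}  # both DC1 replicas have crashed

print(can_satisfy("ONE", alive, rf))           # True: the DC2 replicas suffice
print(can_satisfy("LOCAL_QUORUM", alive, rf))  # False: no live DC1 replica
```

So a CL.ONE read keeps working off the DC2 replicas, while a LOCAL_QUORUM read coordinated in DC1 would not.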

That being said, a bit of nuance should be added. Just after the 2 nodes in DC1 crash, there may be a small window of time during which you can get a TimeoutException, especially if your client is connected to a coordinator in DC1. The reason is that internal failure detection is not immediate, so the coordinator in DC1 may not know yet that the 2 nodes in DC1 are dead, and may send the read query to one of them (and since they are dead, it will time out). However, as soon as the 2 nodes are properly detected as dead by the cluster, the query will be fine. In practice this means that if you do get a timeout, you should retry your query; by the time you retry, the
nodes will be detected as dead and the query will succeed. In fact, your client library may do that retry automatically for you, so that in practice you never experience a timeout.
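The retry behaviour described above can be sketched as follows (a toy simulation, not driver code — `flaky_read` stands in for a coordinator that routes to a not-yet-detected dead node on the first attempt):

```python
import time

class TimeoutException(Exception):
    pass

def read_with_retry(query_fn, retries=3, delay=0.1):
    """Retry on timeout: by the time we retry, the failure detector has
    usually marked the dead nodes down and the coordinator avoids them."""
    for attempt in range(retries):
        try:
            return query_fn()
        except TimeoutException:
            if attempt == retries - 1:
                raise           # give up after the last attempt
            time.sleep(delay)

calls = {"n": 0}

def flaky_read():
    calls["n"] += 1
    if calls["n"] == 1:         # dead node not yet detected: first try times out
        raise TimeoutException()
    return "row-from-DC2"       # failure detector has caught up

print(read_with_retry(flaky_read))  # → row-from-DC2
```

Most client libraries expose this as a configurable retry policy, so you rarely need to write the loop yourself.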

As for how this happens internally, it depends a bit on the read_repair_chance value for that CF. If that parameter is set to 1, the query is sent to all replicas in the first place. However, even in that case, only one replica is asked for the data; the other replicas just send a digest of the data, which is used to check the consistency of all replicas. If read_repair_chance is 0, then at CL.ONE only one replica is contacted. Values of read_repair_chance between 0 and 1 control the probability with which one or the other scenario is taken.
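That replica-selection logic can be sketched like this (a simplified model of the CL.ONE read path, with made-up replica addresses for illustration):

```python
import random

def pick_read_targets(replicas, read_repair_chance):
    """With probability read_repair_chance, involve every replica: one is
    asked for the full data, the rest only for digests used to check
    consistency. Otherwise, at CL.ONE, contact a single replica."""
    if random.random() < read_repair_chance:
        data, *digest = replicas
        return {"data": [data], "digest": digest}
    return {"data": [replicas[0]], "digest": []}

replicas = ["10.0.0.1", "10.0.0.2", "10.0.1.1", "10.0.1.2"]

print(pick_read_targets(replicas, 1.0))  # all replicas involved (1 data + 3 digest)
print(pick_read_targets(replicas, 0.0))  # single replica, no digests
```

With a value like 0.1, roughly one read in ten would take the "all replicas" path and trigger a consistency check.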