incubator-cassandra-user mailing list archives

From Sylvain Lebresne <>
Subject Re: Replication Question
Date Tue, 05 Mar 2013 09:33:48 GMT
On Mon, Mar 4, 2013 at 9:53 PM, Kanwar Sangha <> wrote:

>  Hi – If I configure a RF across 2 Data centres as below and assuming 3
> nodes per Data centre.
>
> DC1: 2, DC2: 2
>
> I do a write with consistency level – local_quorum which ensures that
> there is no inter DC latency. Now say 2 nodes in DC1 crash and I am doing a
> read with CL = One. Will it return failure to client since the data is now
> only present in DC2?

No, it won't (return a failure). CL.ONE means "any one node", irrespective
of datacenters.
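To make that concrete, here is a minimal Python sketch of the point; the function name and the per-DC replica counts are purely illustrative (this is not the Cassandra API), but it captures why CL.ONE does not care which datacenter the surviving replicas live in:

```python
def cl_one_satisfiable(alive_replicas_per_dc):
    """CL.ONE needs any single live replica, in any datacenter."""
    return sum(alive_replicas_per_dc.values()) >= 1

# RF is DC1: 2, DC2: 2; both DC1 replicas have crashed, DC2 is intact:
print(cl_one_satisfiable({"DC1": 0, "DC2": 2}))  # True: the read still succeeds
```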

That being said, a bit of nuance should be added. Just after the 2 nodes in
DC1 crash, there may be a small window of time during which you can get a
TimeoutException, especially if your client is connecting to a coordinator
in DC1. The reason is that failure detection is not immediate internally,
so the coordinator in DC1 may not yet know that the 2 nodes in DC1 are
dead, and may send the read query to one of them (and since they are indeed
dead, it will time out). However, as soon as the 2 nodes are properly
detected as dead by the cluster, the query will be fine. In practice this
means that if you do get a timeout, you should retry your query; by the
time you retry, the nodes will have been detected as dead and the query
will succeed. In fact, your client library may do that retry automatically
for you, so that in practice you never experience a timeout.
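That retry-on-timeout behaviour can be sketched like this; note this is a hypothetical retry loop, not actual driver code, and `ReadTimeout`, `do_read`, and `fake_read` are made-up names for illustration:

```python
import time

class ReadTimeout(Exception):
    """Stand-in for a coordinator-side read timeout."""

def read_with_retry(do_read, retries=2, backoff_s=0.1):
    """Retry a read that may time out while the failure detector
    catches up and marks the dead nodes as down."""
    for attempt in range(retries + 1):
        try:
            return do_read()
        except ReadTimeout:
            if attempt == retries:
                raise
            time.sleep(backoff_s)  # give failure detection time to converge

# Simulate a coordinator that times out once (a dead node is still marked
# as up), then succeeds once the node has been detected as dead:
attempts = []
def fake_read():
    attempts.append(1)
    if len(attempts) == 1:
        raise ReadTimeout()
    return "row data"

print(read_with_retry(fake_read))  # row data
```

Many drivers ship exactly this kind of policy built in, which is why a client application often never sees the transient timeout at all.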

As for how this happens internally, it depends a bit on the
read_repair_chance value for that CF. If that parameter is set to 1, the
query is sent to all replicas in the first place. However, even in that
case, only one replica is asked for the data; the other replicas just send
a digest of the data, which is used to check the consistency of all
replicas. If read_repair_chance is 0, then at CL.ONE only one replica is
contacted. A value of read_repair_chance between 0 and 1 controls the
probability with which one of those two scenarios is taken.
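That probabilistic choice can be sketched as follows, under a simplified model (the function and variable names are illustrative, not Cassandra internals):

```python
import random

def choose_read_plan(replicas, read_repair_chance, rng=random):
    """With probability read_repair_chance, send the query to all replicas
    (one returns the data, the rest return digests used to cross-check);
    otherwise, at CL.ONE, contact a single replica."""
    if rng.random() < read_repair_chance:
        return replicas[0], replicas[1:]  # data replica, digest replicas
    return replicas[0], []                # single-replica read

replicas = ["node1", "node2", "node3"]
data, digests = choose_read_plan(replicas, read_repair_chance=1.0)
print(digests)  # ['node2', 'node3']: the other replicas answer with digests
```

With read_repair_chance=1.0 the digest path is always taken (random() is always below 1.0), and with 0.0 it never is, matching the two extremes described above.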

