incubator-cassandra-user mailing list archives

From aaron morton <>
Subject Re: Manual Conflict Resolution in Cassandra
Date Sun, 24 Apr 2011 02:31:48 GMT
Have not read the whole thing, just the timeline. A couple of issues...

At t8 the request would not start, as the CL (consistency level) number of nodes is not available, so the write would not be written to node X. The client would get an UnavailableException. In response it should connect to a new coordinator and try again.

At t12, if RR (read repair) is enabled for the request, the read is sent to all UP endpoints for the key. Once CL responses have returned (including the data / non-digest response), the responses are reconciled and a synchronous (with respect to the read request) RR round is initiated.

Once all the requests have responded they are compared again, and an async RR process is kicked off. So in the worst case two rounds of RR are possible: one to make sure the correct data is returned for the request, and another to make sure that all UP replicas agree, since it may not be the case that all UP replicas were involved in completing the request.

So as written, at t8 the write would have failed and not been stored on any node, and the write at t7 would not be lost.

I think the crux of this example is the failure mode at t8. I'm assuming Alice is connected to node X:

1) If X is disconnected before the write starts, it will not start any write that requires Quorum CL. The write fails with an UnavailableException.
2) If X disconnects from the network *after* sending the write messages, and all messages are successfully actioned (including a local write), the request will fail with a TimedOutException as < CL nodes will respond.
3) If X disconnects from the cluster after sending the messages, and the messages it sends are lost but the local write succeeds, the request will again fail with a TimedOutException as < CL nodes will respond.

In all these cases the request is considered to have failed, and the client should connect to another node and try again. In the case of a timeout, the operation was not completed to the CL you asked for; in the case of unavailable, the operation was never started.
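The retry behaviour the client should implement can be sketched like this (a simulation with hypothetical names, not a real driver; the exceptions stand in for the Thrift ones of the same name):

```python
class UnavailableException(Exception):
    """The coordinator never started the operation."""

class TimedOutException(Exception):
    """Fewer than CL replicas acknowledged before the timeout."""

def write_with_retry(coordinators, do_write):
    # On Unavailable the write never started, so retrying elsewhere is
    # always safe. On TimedOut the write may have landed on fewer than
    # CL replicas; retrying through another coordinator is still the
    # right move, since a rewrite with the same timestamp is idempotent.
    last_error = None
    for node in coordinators:
        try:
            return do_write(node)
        except (UnavailableException, TimedOutException) as e:
            last_error = e  # fall through to the next coordinator
    raise last_error
```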

It can look like the RR conflict resolution is a little naive here, but it's less simple when you consider another scenario. The write at t8 failed at Quorum, and in your deployment the client cannot connect to another node in the cluster, so your code drops the CL down to ONE and gets the write done. You are happy that any nodes in Alice's partition see her write, and that those in Ben's partition see his. When things get back to normal you want the most recent write to be what clients consistently see, not the most popular value. The Consistency section here says the same: it's the most recent value.
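That distinction, most recent versus most popular, is worth making concrete. A small comparison, with made-up replica states (not what any real cluster stored in the example):

```python
from collections import Counter

def most_popular(responses):
    # A naive majority vote across replicas -- NOT what Cassandra does.
    return Counter(r["value"] for r in responses).most_common(1)[0][0]

def most_recent(responses):
    # Cassandra's actual rule: last write wins, by write timestamp.
    return max(responses, key=lambda r: r["ts"])["value"]

# Two replicas in one partition hold the earlier write; one replica in
# the other partition holds a later write accepted at CL.ONE.
replicas = [
    {"value": "v1", "ts": 7},
    {"value": "v1", "ts": 7},
    {"value": "v2", "ts": 8},
]
```

Majority vote would resurrect the older value "v1" after the partition heals; timestamp resolution converges on the later "v2", which is what you want here.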

I tend to think of Consistency as all clients getting the same response to the same query.
Not sure if I've made things clearer, feel free to poke holes in my logic :)

Hope that helps.

On 23 Apr 2011, at 09:02, Edward Capriolo wrote:

> On Fri, Apr 22, 2011 at 4:31 PM, Milind Parikh <> wrote:
>> Is there a chance of getting manual conflict resolution in Cassandra?
>> Please see attachment for why this is important in some cases.
>> Regards
>> Milind
> I think about this often. LDAP servers like SunOne have pluggable
> conflict resolution. I could see the read-repair algorithm being
> pluggable.
