incubator-cassandra-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anthony John <chirayit...@gmail.com>
Subject Re: How does Cassandra handle failure during synchronous writes
Date Wed, 23 Feb 2011 22:48:30 GMT
Ritesh,

At CL ANY - if all endpoints are down - a HH is written. And it is a
successful write - not a failed write.

Now that does not guarantee a READ of the value just written - but that is a
risk that you take when you use the ANY CL!

HTH,

-JA

On Wed, Feb 23, 2011 at 4:40 PM, Ritesh Tijoriwala <
tijoriwala.ritesh@gmail.com> wrote:

> hi Anthony,
> While you stated the facts right, I don't see how it relates to the
> question I ask. Can you elaborate specifically what happens in the case I
> mentioned above to Dave?
>
> thanks,
> Ritesh
>
>
> On Wed, Feb 23, 2011 at 1:57 PM, Anthony John <chirayithaj@gmail.com>wrote:
>
>> Seems to me that the explanations are getting incredibly complicated -
>> while I submit the real issue is not!
>>
>> Salient points here:-
>> 1. To be guaranteed data consistency - the writes and reads have to be at
>> Quorum CL or more
>> 2. Any W/R at lesser CL means that the application has to handle the
>> inconsistency, or has to be tolerant of it
>> 3. Writing at "ANY" CL - a special case - means that writes will always go
>> through (as long as any node is up), even if the destination nodes are not
>> up. This is done via hinted handoff. But this can result in inconsistent
>> reads, and yes that is a problem but refer to pt-2 above
>> 4. At QUORUM CL R/W - after Quorum is met, hinted handoffs are used to
>> handle that case where a particular node is down and the write needs to be
>> replicated to it. But this will not cause inconsistent R as the hinted
>> handoff (in this case) only applies after Quorum is met - so a Quorum R is
>> not dependent on the down node being up, and having got the hint.
>>
>> Hope I state this appropriately!
>>
>> HTH,
>>
>> -JA
>>
>>
>> On Wed, Feb 23, 2011 at 3:39 PM, Ritesh Tijoriwala <
>> tijoriwala.ritesh@gmail.com> wrote:
>>
>>> > Read repair will probably occur at that point (depending on your
>>> config), which would cause the newest value to propagate to more replicas.
>>>
>>> Is the newest value the "quorum" value which means it is the old value
>>> that will be written back to the nodes having "newer non-quorum" value or
>>> the newest value is the real new value? :) If later, than this seems kind of
>>> odd to me and how it will be useful to any application. A bug?
>>>
>>> Thanks,
>>> Ritesh
>>>
>>>
>>> On Wed, Feb 23, 2011 at 12:43 PM, Dave Revell <dave@meebo-inc.com>wrote:
>>>
>>>> Ritesh,
>>>>
>>>> You have seen the problem. Clients may read the newly written value even
>>>> though the client performing the write saw it as a failure. When the client
>>>> reads, it will use the correct number of replicas for the chosen CL, then
>>>> return the newest value seen at any replica. This "newest value" could be
>>>> the result of a failed write.
>>>>
>>>> Read repair will probably occur at that point (depending on your
>>>> config), which would cause the newest value to propagate to more replicas.
>>>>
>>>> R+W>N guarantees serial order of operations: any read at CL=R that
>>>> occurs after a write at CL=W will observe the write. I don't think this
>>>> property is relevant to your current question, though.
>>>>
>>>> Cassandra has no mechanism to "roll back" the partial write, other than
>>>> to simply write again. This may also fail.
>>>>
>>>> Best,
>>>> Dave
>>>>
>>>>
>>>> On Wed, Feb 23, 2011 at 10:12 AM, <tijoriwala.ritesh@gmail.com> wrote:
>>>>
>>>>> Hi Dave,
>>>>> Thanks for your input. In the steps you mention, what happens when
>>>>> client tries to read the value at step 6? Is it possible that the client
may
>>>>> see the new value? My understanding was if R + W > N, then client
will not
>>>>> see the new value as Quorum nodes will not agree on the new value. If
that
>>>>> is the case, then its alright to return failure to the client. However,
if
>>>>> not, then it is difficult to program as after every failure, you as an
>>>>> client are not sure if failure is a pseudo failure with some side effects
or
>>>>> real failure.
>>>>>
>>>>> Thanks,
>>>>> Ritesh
>>>>>
>>>>> <quote author='Dave Revell'>
>>>>>
>>>>> Ritesh,
>>>>>
>>>>> There is no commit protocol. Writes may be persisted on some replicas
>>>>> even
>>>>> though the quorum fails. Here's a sequence of events that shows the
>>>>> "problem:"
>>>>>
>>>>> 1. Some replica R fails, but recently, so its failure has not yet been
>>>>> detected
>>>>> 2. A client writes with consistency > 1
>>>>> 3. The write goes to all replicas, all replicas except R persist the
>>>>> write
>>>>> to disk
>>>>> 4. Replica R never responds
>>>>> 5. Failure is returned to the client, but the new value is still in the
>>>>> cluster, on all replicas except R.
>>>>>
>>>>> Something very similar could happen for CL QUORUM.
>>>>>
>>>>> This is a conscious design decision because a commit protocol would
>>>>> constitute tight coupling between nodes, which goes against the
>>>>> Cassandra
>>>>> philosophy. But unfortunately you do have to write your app with this
>>>>> case
>>>>> in mind.
>>>>>
>>>>> Best,
>>>>> Dave
>>>>>
>>>>> On Tue, Feb 22, 2011 at 8:22 PM, tijoriwala.ritesh <
>>>>> tijoriwala.ritesh@gmail.com> wrote:
>>>>>
>>>>> >
>>>>> > Hi,
>>>>> > I wanted to get details on how does cassandra do synchronous writes
>>>>> to W
>>>>> > replicas (out of N)? Does it do a 2PC? If not, how does it deal
with
>>>>> > failures of of nodes before it gets to write to W replicas? If the
>>>>> > orchestrating node cannot write to W nodes successfully, I guess
it
>>>>> will
>>>>> > fail the write operation but what happens to the completed writes
on
>>>>> X (W
>>>>> > >
>>>>> > X) nodes?
>>>>> >
>>>>> > Thanks,
>>>>> > Ritesh
>>>>> > --
>>>>> > View this message in context:
>>>>> >
>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055152.html
>>>>> > Sent from the cassandra-user@incubator.apache.org mailing list
>>>>> archive at
>>>>> > Nabble.com.
>>>>> >
>>>>>
>>>>> </quote>
>>>>> Quoted from:
>>>>>
>>>>> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/How-does-Cassandra-handle-failure-during-synchronous-writes-tp6055152p6055408.html
>>>>>
>>>>
>>>>
>>>
>>
>

Mime
View raw message