cassandra-user mailing list archives

From Yang <>
Subject Re: clarification of the consistency guarantees of Counters
Date Tue, 31 May 2011 08:35:34 GMT
Never mind, I see that if the leader/owner dies, the other replicas can simply
use whichever replica holds the highest count for the leader's bucket,
even though that is not the authoritative number.
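
To make that concrete, here is a minimal sketch of "take the highest count per leader bucket". It is a simplified model, not Cassandra's actual code: it assumes increment-only counters (so a larger count is always the newer one), and the names `merge`, `value`, `view_b`, and `view_c` are made up for illustration.

```python
def merge(a, b):
    """Reconcile two replica views by keeping, for each leader's bucket,
    the highest count either replica has seen (assumes increments only)."""
    return {node: max(a.get(node, 0), b.get(node, 0))
            for node in a.keys() | b.keys()}


def value(view):
    """The counter value is the sum of all per-leader buckets."""
    return sum(view.values())


# Leader "A" has died. Its bucket was replicated unevenly to B and C,
# so neither copy is authoritative, but the larger one is the best known.
view_b = {"A": 5, "B": 2}
view_c = {"A": 7, "B": 2, "C": 1}

merged = merge(view_b, view_c)
print(merged, value(merged))
```

If A had accepted increments that reached no other replica before it died, those are still lost, which is the case for writing at CL > ONE.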

On Tue, May 31, 2011 at 1:21 AM, Yang <> wrote:

> Thanks Sylvain, I agree with what you said in the first few paragraphs;
> Jeremy corrected me just now.
> Regarding the last point, you are right to use the term "by operation",
> but you should also note that it is a leader
> "data ownership", meaning that the leader has the authoritative
> say when it comes to reconciliation on the
> bucket of counts owned by that leader. Yes, you've convinced me that we
> DO need to use CL > ONE, but for the sake of
> argument, if CL = ONE is used, the leader's data loss leaves the other
> replicas unable to reconcile; that's what I mean.
> But anyway, it's not relevant now, since CL can be > ONE.
> I'd really appreciate it if you could review my newer post on
> FIFO; I think that could be an interesting approach.
> yang
> On Tue, May 31, 2011 at 12:59 AM, Sylvain Lebresne <> wrote:
>> > Apart from the questions, some thoughts on Counters:
>> > The idea of distributed counters can be seen, in distributed-algorithms
>> > terms, as a state machine (see Fred Schneider '93), where ideally we send
>> > the messages (delta increments) to each node, and the final state (sum of
>> > deltas, or the counter value) is deduced independently at each node. In the
>> > current implementation, it's really not a distributed state machine, since
>> > state is deduced only at the leader, and what is replicated is just the
>> > final state. In fact, the data from different leaders are orthogonal, and
>> > within the data flow from one leader, it's really just a master-slave
>> > system. Then we realize that this system is prone to single-master failure.
>>
>> Don't be fooled by the term 'leader': there is one leader *by
>> operation*, not one global leader. Again, the leader of an operation
>> is really just 'the first of the replicas we're replicating to'.
>> It's no more a master-slave design than regular writes are, which also
>> use a distinguished coordinator node for each operation. And it's
>> not prone to single-node failure, because if you do counter increments
>> at CL.QUORUM against, say, a cluster with RF=3, then you will still be
>> able to write and read even if one node is down, and which node exactly
>> doesn't matter at all.
>> --
>> Sylvain
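
For reference, the quorum arithmetic behind the RF=3 example above can be sketched as follows. The helper names `quorum` and `tolerated_failures` are hypothetical; the only fact assumed is that a quorum is a strict majority of the replicas, i.e. floor(RF / 2) + 1.

```python
def quorum(rf: int) -> int:
    """Number of replicas that must respond for a QUORUM operation."""
    return rf // 2 + 1


def tolerated_failures(rf: int) -> int:
    """Replicas that can be down while QUORUM operations still succeed."""
    return rf - quorum(rf)


for rf in (1, 3, 5):
    print(f"RF={rf}: quorum={quorum(rf)}, can lose {tolerated_failures(rf)}")
```

With RF=3 the quorum is 2, so any one of the three replicas can be down, whichever one acted as leader for earlier increments.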
