incubator-cassandra-user mailing list archives

From Yang <>
Subject clarification of the consistency guarantees of Counters
Date Tue, 31 May 2011 00:57:46 GMT
I went through

and realized that the consistency guarantees of Counters are a bit different
from those of regular columns. Could you please confirm
that the following are true?

1) comment

still holds: "there is no way to create a write CL greater than ONE, and
thus, no defense against *permanent* failures of single machines"
2) Due to the above, the best I can do to increase reliability is to
enable REPLICATE_ON_WRITE, but this would still leave recent updates
on the leader exposed to loss during the short interval before they are replicated.
3) Without REPLICATE_ON_WRITE (or, equivalently, read repair) I would have
to read at CL=ALL. But in that case, if the leader fails, all future
reads fail. So for counters I have to enable
REPLICATE_ON_WRITE or set read_repair_chance to a reasonably high value, and
read at a CL other than ALL.

Apart from the questions, some thoughts on Counters:
The idea of distributed counters can be seen, in distributed-algorithms
terms, as a state machine (see Fred Schneider '93), where ideally we send
the messages (delta increments) to each node, and the final state (the sum of
deltas, i.e. the counter value) is deduced independently at each node. The
current implementation is really not a distributed state machine, since
state is deduced only at the leader, and what is replicated is just the
final state. In fact, the data from different leaders are orthogonal, and
within the data flow from one leader, *it's really just a master-slave
system. So this system is prone to single-master failure.*
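
To make the contrast concrete, here is a rough toy sketch of the two models.
It is not Cassandra code: the class names DeltaReplica and
LeaderReplicatedCounter and the replicate flag are made up for illustration,
and the "crash" is simulated simply by skipping replication of the last
increment.

```python
class DeltaReplica:
    """State-machine style: the replica receives every delta and sums it itself."""

    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta


class LeaderReplicatedCounter:
    """Leader style: only the leader sums deltas; followers receive its total."""

    def __init__(self, n_followers):
        self.leader_total = 0
        self.follower_totals = [0] * n_followers

    def increment(self, delta, replicate=True):
        self.leader_total += delta
        if replicate:
            # Followers never see the delta, only the leader's final state.
            self.follower_totals = [self.leader_total] * len(self.follower_totals)


# State-machine model: every replica deduces the same state independently.
replicas = [DeltaReplica() for _ in range(3)]
for delta in (5, 3, 2):
    for replica in replicas:
        replica.apply(delta)
state_machine_values = [replica.value for replica in replicas]

# Leader model: the last increment is acknowledged by the leader but the
# leader "fails" before replicating it (simulated with replicate=False).
counter = LeaderReplicatedCounter(n_followers=2)
counter.increment(5)
counter.increment(3)
counter.increment(2, replicate=False)

print(state_machine_values)      # every replica holds the full sum
print(counter.leader_total)      # the leader knows about all three deltas
print(counter.follower_totals)   # followers are missing the last delta
```

If the leader is lost at that point, the surviving followers have no way to
recover the last delta, which is the single-master weakness described above.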

If we want to build a truly distributed state machine, I am afraid there
are no easier/faster solutions than the existing ones (Paxos, etc.). But I guess
that a possible solution could lie in the fact that our goal allows two
relaxations compared with a traditional state machine: we only need eventual
consistency, and our operations are commutative (re-ordering 2 adds yields the
same state when we apply the state changes). Taking advantage of these
facts could probably lead us to truly distributed counters.
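
As a small illustration of the commutativity point, the sketch below applies
the same set of deltas to a node in every possible order, plus one duplicate
delivery, and collects the resulting values. The CommutativeCounter class and
the update ids ("u1", "u2", ...) are assumptions made up for this example; the
ids are there only so that a redelivered message is not counted twice.

```python
import itertools


class CommutativeCounter:
    """Toy node that applies uniquely identified deltas in any order."""

    def __init__(self):
        self.seen = set()   # ids of updates already applied
        self.value = 0

    def apply(self, update_id, delta):
        if update_id in self.seen:
            return          # duplicate delivery, ignore it
        self.seen.add(update_id)
        self.value += delta


updates = [("u1", 5), ("u2", -2), ("u3", 7)]

final_values = set()
for order in itertools.permutations(updates):
    node = CommutativeCounter()
    for update_id, delta in order:
        node.apply(update_id, delta)
    node.apply(*order[0])   # redeliver the first message of this ordering
    final_values.add(node.value)

print(final_values)  # a single value means every ordering converged
```

Because addition is commutative, no ordering protocol is needed for nodes to
agree eventually; what remains is making sure every delta reaches every node
at least once, and that cost is what a real design would have to weigh.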

The route of keeping all individual updates at each node, and later
reconciling the history, has been mentioned in the JIRA. Because message
losses are less common than successes, maybe this is not as bad a route as we

