Date: Mon, 30 May 2011 17:57:46 -0700
From: Yang <teddyyyy123@gmail.com>
To: user@cassandra.apache.org
Subject: clarification of the consistency guarantees of Counters

I went through https://issues.apache.org/jira/browse/CASSANDRA-1072 and realized that the consistency guarantees of Counters are a bit different from those of regular columns, so could you please confirm that the following are true?
1) Comment https://issues.apache.org/jira/browse/CASSANDRA-1072?focusedCommentId=12900659&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12900659 still holds: "there is no way to create a write CL greater than ONE, and thus, no defense against *permanent* failures of single machines"
2) Due to the above, the best I can do to increase reliability is to enable REPLICATE_ON_WRITE, but this would still leave the most recent updates on the leader exposed to loss during a short window.
3) Without REPLICATE_ON_WRITE (or, equivalently, read repair) I would have to read at CL=ALL; in that case, if the leader fails, all future reads fail. So for counters I have to either enable REPLICATE_ON_WRITE or set read_repair_chance to a reasonably high value, and read at a CL other than ALL.
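(For concreteness, a minimal sketch of the two knobs I mean, assuming the 0.8 Thrift-generated CfDef; the keyspace and column family names are made up:)

    import org.apache.cassandra.thrift.CfDef;

    public class CounterCfSketch {
        public static void main(String[] args) {
            // "MyKeyspace" / "counters" are hypothetical names.
            CfDef cf = new CfDef("MyKeyspace", "counters");
            cf.setDefault_validation_class("CounterColumnType");
            cf.setReplicate_on_write(true);  // push the leader's new total to replicas on write
            cf.setRead_repair_chance(1.0);   // or lean on read repair firing on every read
            // then, over an open Thrift connection:
            // client.system_add_column_family(cf);
        }
    }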

Apart from the questions, some thoughts on Counters:
The idea of distributed counters can be seen, in distributed-algorithms terms, as a state machine (see Fred Schneider '93), where ideally we send the messages (delta increments) to each node, and the final state (the sum of deltas, i.e. the counter value) is deduced independently at each node. In the current implementation it's really not a distributed state machine, since state is deduced only at the leader, and what is replicated is just the final state. In fact, the data from different leaders are orthogonal, and within the data flow from one leader *it's really just a master-slave system; then we realize that this system is prone to single-master failure.*
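(To make the per-leader structure concrete, a toy model of my own, not the actual counter-context code: each leader folds deltas into its own shard, and replication only ships the summed state:)

    import java.util.HashMap;
    import java.util.Map;

    public class LeaderShardCounter {
        // One (clock, total) pair per leader; individual deltas never leave the leader.
        static class Shard { long clock; long total; }

        private final Map<String, Shard> shards = new HashMap<String, Shard>();

        // Only the leader itself calls this: it folds the delta into its own shard.
        void leaderIncrement(String leaderId, long delta) {
            Shard s = shards.get(leaderId);
            if (s == null) { s = new Shard(); shards.put(leaderId, s); }
            s.clock++;
            s.total += delta;
        }

        // Replication ships the leader's latest summed state; a replica keeps
        // whichever copy has the higher clock (master-slave per leader).
        void applyReplicated(String leaderId, long clock, long total) {
            Shard s = shards.get(leaderId);
            if (s == null) { s = new Shard(); shards.put(leaderId, s); }
            if (clock > s.clock) { s.clock = clock; s.total = total; }
        }

        // The counter value is the sum over all leaders' shards.
        long value() {
            long sum = 0;
            for (Shard s : shards.values()) sum += s.total;
            return sum;
        }
    }

(If the leader's not-yet-replicated shard is lost, its deltas are gone: exactly the single-master weakness above.)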

If we want to build a truly distributed state machine, I am afraid there are no easier/faster solutions than the existing ones (Paxos, etc.). But I guess a possible solution could lie in the fact that our goal allows for a relaxation of the traditional state machine: eventual consistency, and also that our operations are commutative (re-ordering 2 adds yields the same state when we apply the state changes). Taking advantage of these facts could probably lead us to a truly distributed counters solution.
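(The commutativity claim in one toy example: two replicas that receive the same deltas in different orders converge to the same value:)

    import java.util.Arrays;
    import java.util.List;

    public class CommutativeDeltas {
        static long apply(List<Integer> deltas) {
            long state = 0;
            for (int d : deltas) state += d;  // addition commutes and associates
            return state;
        }

        public static void main(String[] args) {
            List<Integer> order1 = Arrays.asList(5, -2, 7);
            List<Integer> order2 = Arrays.asList(7, -2, 5);  // same deltas, different delivery order
            System.out.println(apply(order1) == apply(order2));  // true: both are 10
        }
    }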

The route of keeping all the individual updates at each node, and later doing reconciliation on the history, has been mentioned in the JIRA. Because message losses are less common than successes, maybe this is not as bad a route as we thought?
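(A sketch of that route, again toy code of my own, assuming every update carries a globally unique id such as "nodeA:42": reconciliation becomes set union of the histories, which is commutative and idempotent, so both re-ordering and re-delivery are harmless:)

    import java.util.HashMap;
    import java.util.Map;

    public class HistoryCounter {
        // Full history: one entry per update, keyed by a globally unique id.
        // Re-applying the same update is a no-op.
        private final Map<String, Long> history = new HashMap<String, Long>();

        void record(String updateId, long delta) {
            history.put(updateId, delta);
        }

        // Reconciliation is set union of two histories; union is commutative,
        // associative and idempotent, so replicas converge regardless of order.
        void mergeFrom(HistoryCounter other) {
            history.putAll(other.history);
        }

        long value() {
            long sum = 0;
            for (long d : history.values()) sum += d;
            return sum;
        }
    }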

Thanks
Yang
