Subject: Re: clarification of the consistency guarantees of Counters
From: Yang
To: user@cassandra.apache.org
Date: Tue, 31 May 2011 01:21:47 -0700 (PDT)

thanks Sylvain,

I agree with what you said in the first few paragraphs ---- Jeremy corrected me on that just now.

regarding the last point, you are right to use the term "by operation", but you should also note that it is a form of leader "data ownership", in the sense that the leader has the authoritative say when it comes to reconciling the bucket of counts it owns ----- yes, you've convinced me that we DO need to use CL > ONE, but for the sake of argument, if CL = ONE were used, loss of the leader's data would leave the other replicas unable to reconcile that bucket. that's what I mean.
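to make concrete what I mean by leader "data ownership", here is a rough Python sketch of how I picture per-leader counter shards being reconciled. the names (merge_shards, counter_value, the clock tuples) are made up for illustration and are not the actual Cassandra code:

# toy model of per-leader counter shards -- illustration only, not Cassandra code
def merge_shards(local, remote):
    # for each leader, keep the shard with the higher clock:
    # the leader's latest total is authoritative, other replicas just copy it
    merged = dict(local)
    for leader, (clock, count) in remote.items():
        if leader not in merged or clock > merged[leader][0]:
            merged[leader] = (clock, count)
    return merged

def counter_value(shards):
    # the counter value is the sum of every leader's authoritative count
    return sum(count for _, count in shards.values())

# with CL = ONE, an increment can be acked while only its leader holds the
# new shard; lose that leader's data before it replicates, and the other
# replicas have nothing left to reconcile against
replica_a = {"A": (3, 10), "B": (1, 4)}   # A's shard at clock 3, only on A
replica_b = {"A": (2, 7),  "B": (1, 4)}   # B never received A's latest shard
print(counter_value(merge_shards(replica_a, replica_b)))  # -> 14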
but anyway it's not relevant now, since CL can be > ONE.

I'd really appreciate it if you could review my newer post on FIFO; I think that could be an interesting approach.

yang


On Tue, May 31, 2011 at 12:59 AM, Sylvain Lebresne wrote:
>
> > apart from the questions, some thoughts on Counters:
> > the idea of distributed counters can be seen, in distributed-algorithms
> > terms, as a state machine (see Fred Schneider '93), where ideally we send
> > the messages (delta increments) to each node, and the final state (the sum
> > of deltas, i.e. the counter value) is deduced independently at each node.
> > in the current implementation it's not really a distributed state machine,
> > since state is deduced only at the leader, and what is replicated is just
> > the final state. in fact, the data from different leaders are orthogonal,
> > and within the data flow from one leader it's really just a master-slave
> > system. then we realize that this system is prone to single-master failure.
>
> Don't get fooled by the term 'leader': there is one leader *per
> operation*, not one global leader. Again, the leader of an operation
> is really just the first of the replicas we're replicating to.
>
> It's no more a master-slave design than regular writes are, because
> they use a distinguished coordinator node for each operation. And it's
> not prone to single-node failure, because if you do counter increments
> at CL.QUORUM against, say, a cluster with RF=3, then you will still be
> able to write and read even if one node is down, and which node exactly
> doesn't matter at all.
>
> --
> Sylvain
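PS: a quick sanity check of your last point, for anyone following along: with RF=3 and QUORUM on both reads and writes, any write quorum and any read quorum share at least one replica, so a single down node can't hide an acknowledged increment. a small Python sketch, assuming the usual quorum = floor(RF/2) + 1:

# check that with RF=3 every write quorum and read quorum overlap,
# so losing any single node never hides an acknowledged counter increment
from itertools import combinations

replicas = {"n1", "n2", "n3"}          # RF = 3
quorum = len(replicas) // 2 + 1        # QUORUM = 2

for writes in combinations(replicas, quorum):
    for reads in combinations(replicas, quorum):
        assert set(writes) & set(reads), "quorums must intersect"
print("every QUORUM write intersects every QUORUM read")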
