Return-Path: X-Original-To: apmail-cassandra-user-archive@www.apache.org Delivered-To: apmail-cassandra-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 87C1F183B4 for ; Tue, 25 Aug 2015 13:54:46 +0000 (UTC) Received: (qmail 79406 invoked by uid 500); 25 Aug 2015 13:54:40 -0000 Delivered-To: apmail-cassandra-user-archive@cassandra.apache.org Received: (qmail 79365 invoked by uid 500); 25 Aug 2015 13:54:40 -0000 Mailing-List: contact user-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@cassandra.apache.org Delivered-To: mailing list user@cassandra.apache.org Received: (qmail 79355 invoked by uid 99); 25 Aug 2015 13:54:40 -0000 Received: from Unknown (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 25 Aug 2015 13:54:40 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id D9A7EEDBEE for ; Tue, 25 Aug 2015 13:54:39 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.9 X-Spam-Level: ** X-Spam-Status: No, score=2.9 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, HTML_MESSAGE=3, RCVD_IN_MSPIKE_H2=-0.001, URIBL_BLOCKED=0.001] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=datastax.com Received: from mx1-us-west.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id IVXrG29NvMr3 for ; Tue, 25 Aug 2015 13:54:31 +0000 (UTC) Received: from mail-io0-f175.google.com (mail-io0-f175.google.com [209.85.223.175]) by mx1-us-west.apache.org (ASF Mail Server at mx1-us-west.apache.org) with ESMTPS id 2542220865 for ; Tue, 25 Aug 2015 13:54:31 +0000 (UTC) Received: by iodv127 with SMTP id v127so186737813iod.3 for ; Tue, 25 Aug 2015 06:54:30 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=datastax.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :content-type; bh=u3SqTAvDaMLps9QU/9wzdo72XspXyx2QSApfeUzh2AU=; b=XOu5sV+YKmObCXXmR3RMtHNi3RnOneUdIh+2cd6fo5KJrpLjT0BrmW1ehgxkdBJapk HevVk0GnBRduyqDMYsoFHT4X9YAgQSi5cemaKDrQQ6wPMmIEs1VY2PeFm2C+RKxnCFeG h4eul1vp3XNNqkVOo0jpQgWbKgxy/xVz+2CI4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:content-type; bh=u3SqTAvDaMLps9QU/9wzdo72XspXyx2QSApfeUzh2AU=; b=HZZjKxhpTIPLBPBk3QkPGSuZDWvdyQEegFfnf3mrKfrPIaaUqtyAHeTvCeRhhus/je nyFpsc+uIUSC2Ytg+B9NQip0pR3uUkiXPSoxkmhQZkNc4Cr8jx5BNSYKpHp1/2n7DbJm 7iECb/wKw7alcZ0RmovxzM73GdB72/0TydLI/VAkY+AJLfFSMhVnD6FF/e535akMXI9i 7gxG6Ht/Yzol37lea74iM8512epGn9ouZH98jCH6WK4HNYJiwLaSvs+4lL6D6asLhzZE VeRfJoxBal5mM4+qqa/5grB5IuWvxrcyOyQzX7VOm7+0Em1rzQQG5FmlLu0HY3I+HvJh 3mIw== X-Gm-Message-State: ALoCoQkAI+e8W+F0v1SQ0Q6YGk+r84gp+8017k4c9A9hxuFcktEE2ek97qiMEX91Aw6/lJGwJzhp X-Received: by 10.107.10.9 with SMTP id u9mr21494520ioi.118.1440510870331; Tue, 25 Aug 2015 06:54:30 -0700 (PDT) MIME-Version: 1.0 Received: by 10.50.173.40 with HTTP; Tue, 25 Aug 2015 06:54:10 -0700 (PDT) In-Reply-To: References: From: Sylvain Lebresne Date: Tue, 25 Aug 2015 15:54:10 +0200 Message-ID: Subject: Re: lightweight transactions with potential problem? To: "user@cassandra.apache.org" Content-Type: multipart/alternative; boundary=001a113ee47eb8be3e051e231261 --001a113ee47eb8be3e051e231261 Content-Type: text/plain; charset=UTF-8 > > > So you meant that the older ballot will not only reject in round-trip1 > (prepare/promise), it also can be reject in propose/accept round-trips2, Is > that correct? > Yes. > > You Said : Or more precisely, you got step 8 wrong: when a replica > PROMISE, the promise is not that they won't "promise" a ballot older than > 2,it's that they won't "accept" a ballot older than 2 > > Why step 8 wrong? I think replicas can accept any highest ballot, so > ballot 2 is the highest in step 8? what do you think? > Do you also mean replica can promise older ballot. > I shouldn't have said "wrong". What I meant is that your description of what a PROMISE meant was incomplete. It's true that in practice replicas won't promise older ballots, but it's not the important property in this case, the important property is that they also promise to not "accept" any older ballot. > > I wish you could make it more clear. > > Thank you a lot Sylvain > > Ibrahim > > > On Tue, Aug 25, 2015 at 1:40 PM, Sylvain Lebresne > wrote: > >> That scenario cannot happen. More specifically, your step 12 cannot >> happen if >> step 8 has happen. Or more precisely, you got step 8 wrong: when a replica >> PROMISE, the promise is not that they won't "promise" a ballot older than >> 2, >> it's that they won't "accept" a ballot older than 2. Therefore, after >> step 8, >> the accept from N1 will be reject in step 12 and the insert from N1 will >> be >> rejected (that is, N1 will restart the whole algorithm with a new ballot). >> >> >> On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sanosi < >> ibrahimsabattt@gmail.com> wrote: >> >>> Hi folks, >>> >>> >>> Cassandra provides *linearizable consistency (CAS, Compare-and-Set) by >>> using Paxos 4 round-trips as following* >>> >>> *1. **Prepare/promise* >>> >>> *2. **Read/result* >>> >>> *3. **Propose/accept* >>> >>> *4. **Commit/acknowledgment * >>> >>> Assume we have an application for resistering new account, I want to >>> make sure I only allow exactly one user to claim a given account. For >>> example, we do not allow two users having the same username. >>> >>> Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We >>> have two concurrent clients C1 and C2. We have replication factor 3 and the >>> partitioner has determined the primary and the replicas nodes of the INSERT >>> example are N3, N4, and N5. >>> >>> >>> The scenario happens in following order: >>> >>> 1. C1 connects to coordinator N1 and sends INSERT V1 (assume V1 >>> is username, not resister before) >>> >>> 2. N1 sends PREPARE message with ballot 1 (highest ballot have >>> seen) to N3, N4 and N5. Note that this prepare for C1 and V1. >>> >>> 3. N3, N4 and N5 send a PROMISE message to N1, to not promise any >>> with older than ballot 1. >>> >>> 4. N1 sends READ message to N3, N4 and N5 to read V1. >>> >>> 5. N3, N4 and N5 send RESULT message to N1, informing that V1 not >>> exist which results in N1 will go forward to next round. >>> >>> 6. Now C2 connects to coordinator N2 and sends INSERT V1. >>> >>> 7. N2 sends PREPARE message with ballot 2 (highest ballot after >>> re-prepare because first time, N2 does not know about ballot 1, then >>> eventual it solves and have ballot 2) to N3, N4 and N5. Note that this >>> prepare for C2 and V1. >>> >>> 8. N3, N4 and N5 send a PROMISE message to N2, to not promise any >>> with older than ballot 2. >>> >>> 9. N2 sends READ message to N3, N4 and N5 to read V1. >>> >>> 10. N3, N4 and N5 send RESULT message to N2, informing that V1 not >>> exist which results in N2 will go forward to next round. >>> >>> 11. Now N1 send PROPOSE message to N3, N4 and N5 (ballot 1, V1). >>> >>> 12. N3, N4 and N5 send ACCEPT message to N1. >>> >>> 13. N2 send PROPOSE message to N3, N4 and N5 (ballot 2, V1). >>> >>> 14. N3, N4 and N5 send ACCEPT message to N2. >>> >>> 15. N1 send COMMIT message to N3, N4 and N5 (ballot 1). >>> >>> 16. N3, N4 and N5 send ACK message to N1. >>> >>> 17. N2 send COMMIT message to N3, N4 and N5 (ballot 2). >>> >>> 18. N3, N4 and N5 send ACK message to N2. >>> >>> >>> As result, both V1 from client C1 and V1 from client C2 have written to >>> replicas N3, N4, and N5. Which I think it does not achieve the goal of *linearizable >>> consistency and CAS. * >>> >>> >>> >>> *Is that true and such scenario could be occurred?* >>> >>> >>> >>> I look forward to hearing from you. >>> >>> >>> Regards, >>> >> >> > --001a113ee47eb8be3e051e231261 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable

So you meant that the older ballot will not only reject in round-trip1 = (prepare/promise), it also can be reject in propose/accept round-trips2, Is= that correct?

Yes.
=C2=A0

You Said : Or more precisely, you got step 8 = wrong: when a replica PROMISE, the promise is not that they won't "= ;promise" a ballot older than 2,it's that they won't "acc= ept" a ballot older than 2

Why step 8 wrong? I think repl= icas can accept any highest ballot, so ballot 2 is the highest in step 8? w= hat do you think?
=C2=A0Do you also mean replica can promise older ballo= t.

I shouldn= 9;t have said "wrong". What I meant is that your description of w= hat a PROMISE meant was incomplete. It's true that in practice replicas= won't promise older ballots, but it's not the important property i= n this case, the important property is that they also promise to not "= accept" any older ballot.
=C2=A0

I wish you could make it= more clear.

Thank you a lot Sylvain

Ibrahim


On Tue, Aug = 25, 2015 at 1:40 PM, Sylvain Lebresne <sylvain@datastax.com> wrote:
That sce= nario cannot happen. More specifically, your step 12 cannot happen if
=
step 8 has happen. Or more precisely, you got step 8 wrong: when a rep= lica
PROMISE, the promise is not that they won't "promis= e" a ballot older than 2,
it's that they won't "= ;accept" a ballot older than 2. Therefore, after step 8,
the= accept from N1 will be reject in step 12 and the insert from N1 will be
rejected (that is, N1 will restart the whole algorithm with a new b= allot).


=
On Tue, Aug 25, 2015 at 1:54 PM, ibrahim El-sano= si <ibrahimsabattt@gmail.com> wrote:

Hi folks,


Cassandra provides linearizable consistency (CAS, Compare-and-Set) by using Paxos 4 round-trips as following

1.=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 Prepare/promise=

2.=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 Read/result

3.=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 Propose/accept<= /span>

4.=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 Commit/acknowledgment

Assume we have an application for resistering new ac= count, I want to make sure I only allow exactly one user to claim a given account. F= or example, we do not allow two users having the same username.

Assuming we have a cluster consist of 5 nodes N1, N2, N3, N4, and N5. We have two co= ncurrent clients C1 and C2. We have replication factor 3 and the partitioner has determined the primary and the replicas nodes of the INSERT example are N3, N4, and N5= .=C2=A0


The scenario happens = in following order:

1.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 C1 connects to coordinator N1 and sends INSERT=C2=A0 V1 (assume V1 is username, not resister before)

2.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 N1 sends PREPARE message with ballot 1 (highest ballot have seen) to N3, N4 and N5. Note that this prepare for C1 and V1.

3.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 N3, N4 and N5 send a PROMI= SE message to N1, to not promise any with older than ballot 1.

4.=C2= =A0=C2=A0=C2=A0 N1 =C2=A0sends READ messa= ge to N3, N4 and N5 to read V1.

5.=C2= =A0=C2=A0=C2=A0 N3, N4 and N5 send RESULT message to N1, informing that V1 not exist which results in N1 will go forw= ard to next round.

6.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 Now C2 connects to coordin= ator N2 and sends INSERT=C2=A0 V1.

7.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 N2 sends PREPARE message with ballot 2 (highest ballot after re-prepare because first time, N2 does = not know about ballot 1, then eventual it solves and have ballot 2) to N3, N4 a= nd N5. Note that this prepare for C2 and V1.

8.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 N3, N4 and N5 send a PROMI= SE message to N2, to not promise any with older than ballot 2.

9.=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0 N2 =C2=A0send= s READ message to N3, N4 and N5 to read V1.

10.=C2= =A0=C2=A0 N3, N4 and N5 send RESULT message to N2, informing that V1 not exist which results in N2 will go forw= ard to next round.

11.=C2= =A0=C2=A0 Now N1 send PROPOSE message to =C2=A0N3, N4 and N5 (ballot 1, V1).

12.=C2= =A0 N3, N4 and N5 send ACCEPT message to N1.

13.=C2= =A0 N2 send PROPOSE message to =C2=A0N3, = N4 and N5 (ballot 2, V1).

14.=C2= =A0 N3, N4 and N5 send ACCEPT mes= sage to N2.

15.=C2= =A0 N1 send COMMIT message to =C2=A0N3, N4 and N5 (ballot 1).

16.=C2= =A0=C2=A0 N3, N4 and N5 send = ACK message to N1.

17.=C2= =A0=C2=A0 N2 send COMMIT message to =C2=A0N3, N4 and N5 (ballot 2).

18.=C2= =A0 N3, N4 and N5 send ACK messag= e to N2.


As result, both V1 fr= om client C1 and V1 from client C2 have written to replicas N3, N4, and N5. Which I think it does not achieve the g= oal of linearizable consistency and CAS.

=C2=A0

Is that true and such scenario could be occurred= ?

=C2=A0

I look forward to hearing from you.


Regards,




--001a113ee47eb8be3e051e231261--