Subject: Re: What does ReadRepair exactly do?
From: Manu Zhang
To: user@cassandra.apache.org
Date: Fri, 26 Oct 2012 00:37:41 +0800

A read at QUORUM doesn't mean we read the newest value from a quorum of replicas; it ensures we read at least one copy of the newest value, as long as a QUORUM write succeeded beforehand and W + R > N.
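The W + R > N condition can be checked by brute force. The sketch below is only an illustration with a hypothetical helper name (`quorums_always_overlap`), not anything from Cassandra itself: it enumerates every possible write set of size W and read set of size R over N replicas and reports whether they always share at least one replica.

```python
from itertools import combinations

def quorums_always_overlap(n, w, r):
    """Return True if every set of w replicas shares a member with every set of r replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # a non-empty intersection is truthy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )

# N = 3 with QUORUM (2) for reads and writes: W + R = 4 > 3.
print(quorums_always_overlap(3, 2, 2))  # True
# N = 3 with ONE for both: W + R = 2 <= 3, so a read can miss the written replica.
print(quorums_always_overlap(3, 1, 1))  # False
```

The overlap only says that at least one responding replica holds the latest successful write; it says nothing about writes that are still in flight.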

On Fri, Oct 26, 2012 at 12:00 AM, Hiller, Dean <Dean.Hiller@nrel.gov> wrote:
Kind of an interesting question

I think you are saying: a client read resolved only the two nodes, as
said in Aaron's email, and returned to the client; read repair was kicked off
because of the inconsistent values while the write had not completed yet;
and I guess you would then have two nodes go down and lose the value right after the
read, and before the write finished, such that the client read a value that
was never stored in the database. The odds of two nodes going out are
pretty slim though.

Or, what if the node with part of the write went down? As long as the
client stays up, it would complete its write on the other two nodes.
It seems to me that as long as two nodes don't fail, you are reading at quorum and
fit the consistency model, since you get a value that will be on two
nodes in the immediate future.

Thanks,
Dean

On 10/25/12 9:45 AM, "shankarpnsn" <shankarpnsn@gmail.com> wrote:

>aaron morton wrote
>>> 2. You do a write operation (W1) with quorum of val=2
>>> node1 = val1 node2 = val2 node3 = val1 (write val2 is not complete
>>> yet)
>> If the write has not completed then it is not a successful write at the
>> specified CL, as it could still fail.
>>
>> Therefore the R + W > N Strong Consistency guarantee does not apply at
>> this exact point in time. A read to the cluster at this exact point in time
>> using QUORUM may return val2 or val1. Again, the operation W1 has not
>> completed; if read R' starts and completes while W1 is processing, it may
>> or may not return the result of W1.
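To make the indeterminism concrete, here is a minimal sketch of the three-node state from the example while W1 is still in flight. The timestamps and the `quorum_read` helper are assumptions made for illustration; the only rule modelled is that the coordinator returns the value with the highest timestamp among the replicas that responded.

```python
from itertools import combinations

# Hypothetical replica state while W1 is in flight: (value, write timestamp).
replicas = {
    "node1": ("val1", 1),
    "node2": ("val2", 2),  # only node2 has applied W1 so far
    "node3": ("val1", 1),
}

def quorum_read(responding):
    """Return the value with the highest timestamp among the responding replicas."""
    return max((replicas[name] for name in responding), key=lambda vt: vt[1])[0]

# Try every pair of replicas that could answer a QUORUM read (2 of 3).
results = {pair: quorum_read(pair) for pair in combinations(sorted(replicas), 2)}
for pair, value in results.items():
    print(pair, "->", value)
```

Any pair that includes node2 returns val2, while the pair (node1, node3) returns val1, which is why the result depends on which nodes respond.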
>
>I agree completely that it is fair to have this indeterminism in case of
>partial/failed/in-flight writes, based on what nodes respond to a
>subsequent read.
>
>
>aaron morton wrote
>> It's important to point out the difference between Read Repair, in the
>> context of the read_repair_chance setting, and Consistent Reads in the
>> context of the CL setting. All of this is outside of the processing of
>> your read request. It is separate from the stuff below.
>>
>> Inside the user read request, when ReadCallback.get() is called and CL
>> nodes have responded, the responses are compared. If a DigestMismatch
>> happens then a Row Repair read is started, and the result of this read is
>> returned to the user. This Row Repair read MAY detect differences; if it
>> does, it resolves the super set, sends the delta to the replicas and
>> returns the super set value to the client.
>>
>>> In this case, for read R1, the value val2 does not have a quorum. Would
>>> read R1 return val2 or val4?
>>
>> If val4 is in the memtable on the node before the second read, the result
>> will be val4.
>> Writes that happen between the initial read and the second read after a
>> Digest Mismatch are included in the read result.
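The digest-mismatch path described above (resolve the super set, send each replica its delta, return the super set) can be sketched roughly as follows. This is a simplified model, not Cassandra's actual code: the data shape (column name mapped to a `(value, timestamp)` pair) and the function names `resolve` and `repair_deltas` are assumptions, and timestamp tie-breaking is ignored.

```python
def resolve(responses):
    """Merge replica responses into the super set: newest (value, timestamp) per column."""
    merged = {}
    for row in responses.values():
        for column, (value, ts) in row.items():
            if column not in merged or ts > merged[column][1]:
                merged[column] = (value, ts)
    return merged

def repair_deltas(responses, merged):
    """For each replica, collect the columns it is missing or holds an older version of."""
    deltas = {}
    for replica, row in responses.items():
        delta = {c: v for c, v in merged.items() if row.get(c) != v}
        if delta:
            deltas[replica] = delta
    return deltas

# Full-data responses from the two replicas that answered the second read.
responses = {
    "node1": {"val": ("val1", 1)},
    "node2": {"val": ("val2", 2)},
}
merged = resolve(responses)                 # the value the coordinator would return
deltas = repair_deltas(responses, merged)   # what would be sent back to stale replicas
print(merged)
print(deltas)
```

In this model only node1 receives a delta, and the coordinator already holds the merged value before any replica acknowledges the repair; whether it waits for that acknowledgement is exactly the question raised in the reply below.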
>
>Thanks for clarifying this, Aaron. This is very much in line with what I
>figured out from the code and brings me back to my initial question on the
>point of when and what the user/client gets to see as the read result. Let
>us, for now, consider only the repairs initiated as a part of /consistent
>reads/. If the Row Repair (after resolving and sending the deltas to
>replicas, but not waiting for a quorum success after the repair) returns
>the super set value immediately to the user, wouldn't it be a breach of the
>consistent reads paradigm? My intuition behind saying this is that we
>would respond to the client without the replicas having confirmed that
>they meet the consistency requirement.
>
>I agree that returning val4 is the right thing to do if a quorum (two) of the
>nodes among (node1, node2, node3) have val4 at the second read after the digest
>mismatch. But wouldn't it be incorrect to respond to the user with any value
>when the second read (after the mismatch) doesn't find a quorum? So after
>sending the deltas to the replicas as a part of the repair (still a part
>of /consistent reads/), shouldn't the value be read again to check for the
>presence of a quorum after the repair?
>
>In the example we had, assume the mismatch is detected during a read R1
>from coordinator node C that reaches node1 and node2.
>State seen by C after first read R1: <node1 = val1, node2 = val2, node3 = val1>
>
>A second read is initiated as a part of repair for consistent read of R1.
>This second read observes the values (val1, val2) from (node1, node2) and
>sends the corresponding row repair delta to node1. I'm guessing C cannot
>respond back to the user with val2 until C knows that node1 has actually
>written the value val2, thereby meeting the quorum. Is this interpretation
>correct?
>
>--
>View this message in context:
>http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583395.html
>Sent from the cassandra-user@incubator.apache.org mailing list archive at
>Nabble.com.

