Date: Thu, 25 Oct 2012 08:45:38 -0700 (PDT)
From: shankarpnsn
To: cassandra-user@incubator.apache.org
Message-ID: <1351179938911-7583395.post@n2.nabble.com>
In-Reply-To: <24A11BDE-052B-4D4C-82DD-980A139DAC24@thelastpickle.com>
References: <1351040699422-7583355.post@n2.nabble.com> <1351087333586-7583366.post@n2.nabble.com>
 <1351092512818-7583372.post@n2.nabble.com> <24A11BDE-052B-4D4C-82DD-980A139DAC24@thelastpickle.com>
Subject: Re: What does ReadRepair exactly do?

aaron morton wrote
>> 2. You do a write operation (W1) with quorum of val=2
>> node1 = val1 node2 = val2 node3 = val1 (write val2 is not complete yet)
> If the write has not completed then it is not a successful write at the
> specified CL as it could fail now.
>
> Therefore the R + W > N Strong Consistency guarantee does not apply at this
> exact point in time. A read to the cluster at this exact point in time
> using QUORUM may return val2 or val1. Again the operation W1 has not
> completed, if read R' starts and completes while W1 is processing it may
> or may not return the result of W1.

I agree completely that it is fair to have this indeterminism in case of
partial/failed/in-flight writes, based on what nodes respond to a subsequent
read.

aaron morton wrote
> It's important to point out the difference between Read Repair, in the
> context of the read_repair_chance setting, and Consistent Reads in the
> context of the CL setting. All of this is outside of the processing of
> your read request. It is separate from the stuff below.
>
> Inside the user read request when ReadCallback.get() is called and CL
> nodes have responded the responses are compared. If a DigestMismatch
> happens then a Row Repair read is started, the result of this read is
> returned to the user. This Row Repair read MAY detect differences, if it
> does it resolves the super set, sends the delta to the replicas and
> returns the super set value to be returned to the client.
>
>> In this case, for read R1, the value val2 does not have a quorum. Would
>> read R1 return val2 or val4 ?
>
> If val4 is in the memtable on node before the second read the result will
> be val4.
> Writes that happen between the initial read and the second read after a
> Digest Mismatch are included in the read result.

Thanks for clarifying this, Aaron. This is very much in line with what I
figured out from the code, and it brings me back to my initial question of
when and what the user/client gets to see as the read result. Let us, for
now, consider only the repairs initiated as a part of /consistent reads/.

If the Row Repair (after resolving and sending the deltas to replicas, but
not waiting for a quorum success after the repair) returns the super set
value immediately to the user, wouldn't it be a breach of the consistent
reads paradigm? My intuition is that we would be responding to the client
without the replicas having confirmed that they meet the consistency
requirement.

I agree that returning val4 is the right thing to do if a quorum (two) of
the nodes (node1, node2, node3) have val4 at the second read after the
digest mismatch. But wouldn't it be incorrect to respond to the user with
any value when the second read (after the mismatch) doesn't find a quorum?
So after sending the deltas to the replicas as a part of the repair (still
a part of /consistent reads/), shouldn't the value be read again to check
for the presence of a quorum after the repair?

In the example we had, assume the mismatch is detected during a read R1
from coordinator node C that reaches node1 and node2.

State seen by C after first read R1:

A second read is initiated as a part of repair for consistent read of R1.
This second read observes the values (val1, val2) from (node1, node2) and
sends the corresponding row repair delta to node1. I'm guessing C cannot
respond back to the user with val2 until C knows that node1 has actually
written the value val2, thereby meeting the quorum.

Is this interpretation correct?
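
To make the scenario above concrete, here is a minimal Python sketch of the
read path as described in the quoted text: compare digests from the
contacted replicas, and on a mismatch do a second full read, resolve to the
newest version, push the delta to stale replicas and return the resolved
value. This is a toy model under stated assumptions, not Cassandra's
implementation: the names Versioned, digest and consistent_read are
hypothetical, each row is reduced to a single value with a made-up
timestamp, and the repair is applied in-process, so the sketch does not
model when (or whether) the coordinator waits for repair acknowledgements.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # hypothetical write timestamp; the higher one wins


def digest(v: Versioned) -> str:
    """Stand-in for the digest a replica returns instead of the full row."""
    return hashlib.md5(f"{v.value}:{v.timestamp}".encode()).hexdigest()


def consistent_read(replicas: dict, contacted: list) -> str:
    """Toy coordinator read over the contacted (CL) replicas."""
    # First read: compare the digests from the contacted replicas.
    digests = {digest(replicas[n]) for n in contacted}
    if len(digests) == 1:
        return replicas[contacted[0]].value

    # Digest mismatch: second (full) read, resolved to the newest version.
    resolved = max((replicas[n] for n in contacted), key=lambda v: v.timestamp)

    # Send the delta to every contacted replica that is behind.
    # Assumption: applied immediately in-process, with no acknowledgement step.
    for n in contacted:
        if replicas[n] != resolved:
            replicas[n] = resolved
    return resolved.value


# State from the example: W1 (val2) has only reached node2 so far.
replicas = {
    "node1": Versioned("val1", 1),
    "node2": Versioned("val2", 2),
    "node3": Versioned("val1", 1),
}
result = consistent_read(replicas, ["node1", "node2"])
print(result, replicas["node1"].value, replicas["node3"].value)
# prints: val2 val2 val1
```

In this toy run the read that contacts node1 and node2 returns val2 and
leaves node1 repaired, while node3, which was not contacted, still holds
val1.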
--
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/What-does-ReadRepair-exactly-do-tp7583261p7583395.html
Sent from the cassandra-user@incubator.apache.org mailing list archive at Nabble.com.