From: AJ
Date: Thu, 16 Jun 2011 14:05:41 -0600
To: user@cassandra.apache.org
Subject: Re: Propose new ConsistencyLevel.ALL_AVAIL for reads

On 6/16/2011 10:58 AM, Dan Hendry wrote:
> I think this would add a lot of complexity behind the scenes and be conceptually confusing, particularly for new users.

I'm not so sure about this. Cass is already somewhat sophisticated, and I don't see how this could trip up anyone who can already grasp the basics. The only thing I am adding to the CL concept is the distinction between *available* replica nodes and *total* replica nodes. But don't forget: a competitor to Cass is probably in the works this very minute, so constant improvement is a good thing.

> The Cassandra consistency model is pretty elegant and this type of approach breaks that elegance in many ways. It would also only really be useful when the value has a high probability of being updated between a node going down and the value being read.

I'm not sure what you mean. A node can be down for days, during which time the value can be updated. The intention is to use the nodes that are available even if their number falls below the RF. If there is only one node available to accept a replica, that should be enough, given the conditions I stated and have updated below.
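To make the available-vs-total distinction concrete, here is a minimal sketch (hypothetical names, not actual Cassandra code) of how the two levels would differ in deciding whether a read can even proceed:

import java.util.Arrays;
import java.util.List;

public class AvailabilityCheck {

    // CL.ALL today: a read needs a response from every replica, so it
    // fails as soon as any of the RF replicas is unreachable.
    static boolean canServeAll(int replicationFactor, List<String> liveReplicas) {
        return liveReplicas.size() >= replicationFactor;
    }

    // Proposed ALL_AVAIL: read from every replica that is currently up,
    // so the read can proceed with as few as one live replica (subject
    // to the timestamp condition discussed later in this message).
    static boolean canServeAllAvail(List<String> liveReplicas) {
        return !liveReplicas.isEmpty();
    }

    public static void main(String[] args) {
        // RF = 3, but one replica is down.
        List<String> live = Arrays.asList("10.0.0.1", "10.0.0.2");
        System.out.println("ALL can proceed:       " + canServeAll(3, live));   // false
        System.out.println("ALL_AVAIL can proceed: " + canServeAllAvail(live)); // true
    }
}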
> Perhaps the simpler approach, which is fairly trivial and does not require any Cassandra change, is to simply downgrade your read from ALL to QUORUM when you get an unavailable exception for this particular read.

It's not so trivial, especially since you would have to build that into your client at many levels. I think it would be more appropriate (if this idea survives) to put it into Cass.

> I think the general answer for 'maximum consistency' is QUORUM reads/writes. Based on the fact you are using CL=ALL for reads I assume you are using CL=ONE for writes: this itself strikes me as a bad idea if you require 'maximum consistency for one critical operation'.

Very true. Specifying QUORUM for BOTH reads and writes provides 100% consistency because the read and write quorums overlap. But only if enough replicas are available to form a quorum.

Upon further reflection, this idea can be used for any consistency level. The general thrust of my argument is: if a particular value can be overwritten by one process regardless of its prior value, then that implies that the value on the down node is no longer up to date and can be disregarded. Just work with the nodes that are available.

Actually, now that I think about it: ALL_AVAIL guarantees 100% consistency iff the latest timestamp of the value > the latest unavailability time of every unavailable replica node for that value's row key. "Unavailable" means the node's Cass process is not reachable from ANY node in the cluster in the same data center. If the node in question is reachable from at least one node, then the read should fail, as there is a possibility that the value was updated some other way.

After looking at the code, it doesn't look like it will be difficult. Instead of skipping the request for values when fewer than CL nodes are available, Cassandra would request the values from the available nodes as usual, look at the timestamps (which it does anyway), and compare the freshest one to the latest unavailability time of the relevant replica nodes. The code that keeps track of which nodes are down would simply record the time each one went down (see the sketch at the end of this message). But I've only been looking at the code for a few days, so I'm not claiming to know everything by any stretch.

> Dan

Thanks for your reply. I still welcome critiques.
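P.S. Here is a rough sketch of the check I'm describing, in plain Java with hypothetical names (nothing below is from the actual Cassandra source): the freshest value found on the live replicas can be served only if its timestamp is newer than the moment every unreachable replica was last marked down, so that no dead replica can be hiding a newer write. Write timestamps and down-times are assumed to be on the same clock.

import java.util.HashMap;
import java.util.Map;

public class AllAvailCheck {

    // downSince: each currently-unreachable replica mapped to the time the
    // failure detector marked it down (the "record the time it went down"
    // bookkeeping mentioned above).
    static boolean safeToServe(long freshestValueTimestamp,
                               Map<String, Long> downSince) {
        for (long downTime : downSince.values()) {
            // A replica that went down at or after the freshest visible
            // write might hold a newer version, so the read must fail.
            if (freshestValueTimestamp <= downTime) {
                return false;
            }
        }
        return true; // every down replica predates the write: safe to serve
    }

    public static void main(String[] args) {
        Map<String, Long> downSince = new HashMap<>();
        downSince.put("10.0.0.3", 1000L); // replica marked down at t=1000

        System.out.println(safeToServe(2000L, downSince)); // true: write is newer than the outage
        System.out.println(safeToServe(500L, downSince));  // false: the down node may hold a newer copy
    }
}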