From: Ryan King
Date: Thu, 16 Jun 2011 09:05:10 -0700
Subject: Re: Propose new ConsistencyLevel.ALL_AVAIL for reads
To: user@cassandra.apache.org

On Thu, Jun 16, 2011 at 8:18 AM, AJ wrote:
> Good morning all.
>
> Hypothetical Setup:
> 1 data center
> RF = 3
> Total nodes > 3
>
> Problem:
> Suppose I need maximum consistency for one critical operation; thus I
> specify CL = ALL for reads. However, this will fail if only 1 replica
> endpoint is down. I don't see why this failure is necessary all of the time,
> since the data could have been updated after the node became unavailable,
> and its data is old anyway. If only one node goes down and it has the key I
> need, then the app is not 100% available, and it could take some time to
> make the node available again.
>
> Proposal:
> If all of the *available* replica nodes answer the read operation and the
> latest value timestamp is clearly AFTER the time the down node became
> unavailable, then this situation can meet the requirements for *near* 100%
> consistency, since the value on the down node would be outdated anyway.
> Clearly, the value was updated some time *after* the node went down or
> became unavailable. This way, you can have maximum availability when reading
> with CL.ALL... or with some CL close in meaning to ALL.
>
> I say "near" 100% consistency to leave room for a situation where the
> down node was unavailable only to the coordinating node, for some reason
> such as a network issue, and thus still received an update by some other
> route after it "appeared" unavailable to the current coordinating node. In
> a situation like this, there is a chance the read will still not return the
> latest value. So this will not be truly 100% consistent, which CL.ALL
> guarantees. However, I think this logic could justify a new consistency
> level slightly lower than ALL, such as ALL_AVAIL.
>
> What do you think? Is my logic correct? Is there a conflict with the
> architecture or base principles? This fits with the tunable consistency
> principle for sure.

I don't think this buys you anything that you can't get with quorum reads
and writes.

-ryan
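A minimal sketch of the arithmetic behind the quorum suggestion, assuming the hypothetical setup above (RF = 3, one data center) and Cassandra's quorum size of floor(RF / 2) + 1. The helper names `quorum`, `quorums_always_overlap`, and `can_satisfy` are hypothetical, and the model is deliberately simplified: it ignores hinted handoff, read repair, and clock skew, and only counts which replicas answer.

```python
from itertools import combinations


def quorum(rf: int) -> int:
    """Majority of replicas: floor(rf / 2) + 1 (2 when rf = 3)."""
    return rf // 2 + 1


def quorums_always_overlap(rf: int, r: int, w: int) -> bool:
    """Brute-force check (simplified model): does every set of r replicas
    that answers a read share at least one replica with every set of w
    replicas that acknowledged a write?"""
    replicas = range(rf)
    return all(
        set(read) & set(write)  # a non-empty intersection is truthy
        for read in combinations(replicas, r)
        for write in combinations(replicas, w)
    )


def can_satisfy(required: int, live_replicas: int) -> bool:
    """A request at a given level succeeds only if enough replicas are up."""
    return live_replicas >= required


RF = 3
Q = quorum(RF)
print("QUORUM for RF=3:", Q)
print("R + W > RF:", Q + Q > RF)
print("read/write quorums always overlap:", quorums_always_overlap(RF, Q, Q))
print("QUORUM read with one replica down:", can_satisfy(Q, RF - 1))
print("ALL read with one replica down:", can_satisfy(RF, RF - 1))
```

In this simplified model, any two 2-of-3 replica sets share at least one replica, so a QUORUM read always reaches a replica that acknowledged the last successful QUORUM write, and the coordinator's timestamp reconciliation returns the newest value. QUORUM also still succeeds with one of the three replicas down, which is the availability the ALL_AVAIL proposal is after.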