From: aaron morton
Subject: Re: QUORUM writes, QUORUM reads -- and eventual consistency
Date: Mon, 27 Aug 2012 20:56:34 +1200
To: user@cassandra.apache.org

> Doesn't this mean that the read does not "reflect the most recent write"?
Yes.
A write that fails is not a write.

> If it were to have read the newer data from the 1 node and then afterwards read the old data from the other 2 then there is a consistency problem, but in the example you give the second reader seems to still have a consistent view.
In the scenario of a TimedOutException for a write, that is entirely possible. The write is not considered successful at the CL requested, so R + W > N does not hold for that datum.

When in doubt, ask Werner…

When R + W > N we have strong consistency…
"Strong consistency. After the update completes, any subsequent access (by A, B, or C) will return the updated value."

When R + W <= N we have weak / eventual consistency…
"Eventual consistency. This is a specific form of weak consistency; the storage system guarantees that if no new updates are made to the object, eventually *all* accesses will return the last updated value."

http://queue.acm.org/detail.cfm?id=1466448 (emphasis added)

In C* this may mean hinted handoff (HH), read repair (RR), repair, or the standard CL checks kicking in to make the second read return the "correct" consistent value.
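The R + W > N rule above comes down to simple arithmetic. The sketch below is only an illustration: the function names are made up for this example and are not part of any Cassandra client. The last case models the timed-out write discussed in this thread, on the assumption that the mutation reached just one replica.

```python
def quorum(n: int) -> int:
    """Smallest number of replicas that forms a majority of n."""
    return n // 2 + 1


def is_strongly_consistent(r: int, w: int, n: int) -> bool:
    """True when every read set must overlap every acknowledged write set."""
    return r + w > n


n = 3
q = quorum(n)  # 2 replicas when N = 3

print(is_strongly_consistent(q, q, n))  # QUORUM read, QUORUM write -> True
print(is_strongly_consistent(1, 1, n))  # ONE read, ONE write -> False
print(is_strongly_consistent(q, 1, n))  # write landed on 1 replica only -> False
```

With N = 3, a QUORUM read touches 2 replicas, so it can miss a write that reached only 1 of them. That is the window in which the second reader can still see the old value.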

> Isn't it cheaper to retry the mutation on _any exception_ than to have a transaction in place for the majority of non-failing writes?
Yes (with the exception of counters, which are not idempotent).

If you get an UnavailableException, it is from the point of view of the coordinator. It may be the case that the coordinator is isolated and all the other nodes are UP and happy.
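A retry-on-exception policy along these lines could be sketched as follows. Everything here is hypothetical: the two exception classes are local stand-ins rather than the real client classes, and `write_with_retry`, `is_counter`, `attempts` and `delay` are names invented for the example. It also assumes the retried mutation is resent with the same client timestamp, so that replaying it is harmless.

```python
import time


class UnavailableException(Exception):
    """Stand-in: the coordinator saw too few live replicas."""


class TimedOutException(Exception):
    """Stand-in: too few replicas acknowledged the write in time."""


def write_with_retry(write, is_counter=False, attempts=3, delay=0.0):
    """Call write() until it succeeds or the attempts run out.

    Counter updates are attempted once only, because replaying an
    increment that was partly applied could count it twice.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    tries = 1 if is_counter else attempts
    for attempt in range(1, tries + 1):
        try:
            return write()
        except (UnavailableException, TimedOutException):
            if attempt == tries:
                raise  # out of attempts: surface the last error
            time.sleep(delay)


# Usage: a fake write that times out twice before succeeding.
calls = {"n": 0}


def flaky_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimedOutException()
    return "ok"


print(write_with_retry(flaky_write))  # prints "ok" on the third attempt
```

Retrying against a different coordinator may be worthwhile after an UnavailableException, since the error only reflects what that one coordinator could see.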

Hope that helps.

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/08/2012, at 5:03 AM, Guillermo Winkler <gwinkler@inconcertcc.com> wrote:

> Isn't it cheaper to retry the mutation on _any exception_ than to have a transaction in place for the majority of non-failing writes?
>
> The special case to be considered is obviously counters, which are not idempotent
>
> https://issues.apache.org/jira/browse/CASSANDRA-2495
>
> On Sat, Aug 25, 2012 at 4:38 AM, Russell Haering <russellhaering@gmail.com> wrote:
> The "issue" is that it is possible for a quorum write to return an
> error, but for the result of the write to still be reflected in the
> view seen by the client. There is really no performant way around this
> (although reading at ALL can make it much less frequent). Guaranteeing
> complete success or failure would (barring a creative solution I'm
> unaware of) require a transactional commit of some sort across the
> replica nodes for the key being written to. The performance tradeoff
> might be desirable under some circumstances, but if this is a
> requirement you should probably look at other databases.

> Some good rules to play by (someone correct me if these aren't 100% true):
>
> 1. For writes to a single key, an UnavailableException means the write
> failed totally (clients will never see the data you wrote)
> 2. For writes to a single key, a TimedOutException means you cannot
> know whether the write succeeded or failed
> 3. For writes to multiple keys, either an UnavailableException or a
> TimedOutException means you cannot know whether the write succeeded or
> failed.
>
> -Russell
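The three rules quoted above can be written down as a small lookup. This is only a sketch of the rules as stated, which their author flags as unverified. `Outcome` and `write_outcome` are names invented for this example, and errors are passed as plain strings so the snippet does not depend on any client library.

```python
from enum import Enum


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    UNKNOWN = "unknown"


def write_outcome(error, key_count):
    """Map an error name and key count to what the client can conclude."""
    if error is None:
        return Outcome.SUCCEEDED
    if error == "UnavailableException" and key_count == 1:
        return Outcome.FAILED  # rule 1: the write failed totally
    if error in ("UnavailableException", "TimedOutException"):
        return Outcome.UNKNOWN  # rules 2 and 3: cannot know
    raise ValueError(f"unhandled error type: {error}")


print(write_outcome("UnavailableException", 1).value)  # failed
print(write_outcome("TimedOutException", 1).value)     # unknown
print(write_outcome("UnavailableException", 4).value)  # unknown
```

Only the UNKNOWN outcome needs special care: for idempotent mutations it can be handled by retrying, while for counters the client has to accept the uncertainty.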


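> On Sat, Aug 25, 2012 at 12:17 AM, Guillermo Winkler wrote:
> > Hi Philip,
> >
> > From http://wiki.apache.org/cassandra/ArchitectureOverview
> >
> > Quorum write: blocks until quorum is reached
> >
> > By my understanding if you _did_ a quorum write it means it successfully
> > completed.
> >
> > Guille
> >
> >> I *think* we're saying the same thing here. The addition of the word
> >> "successful" (or something more suitable) would make the documentation more
> >> precise, not less.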