From: Dan Hendry
To: user@cassandra.apache.org
Date: Sat, 4 Dec 2010 01:01:19 -0500
Subject: Re: Confused about consistency

Doesn't consistency level ALL=QUORUM at RF=2?

I have not had a chance to test your fix, but I don't THINK this is the issue. If it is the issue, how do consistency levels ALL and QUORUM differ at this replication factor?
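
For reference, here is a quick sketch of the arithmetic behind that question. It assumes QUORUM is computed as floor(RF / 2) + 1 and that ALL means every replica; the function names are made up for illustration and are not part of any Cassandra API.

```python
def quorum(rf: int) -> int:
    """Replicas that must respond at QUORUM, assuming floor(RF / 2) + 1."""
    return rf // 2 + 1


def all_replicas(rf: int) -> int:
    """Replicas that must respond at ALL: every replica of the row."""
    return rf


# Compare the two levels for a few replication factors.
for rf in (1, 2, 3, 5):
    q, a = quorum(rf), all_replicas(rf)
    print(f"RF={rf}: QUORUM={q} ALL={a} same={q == a}")
```

Under that assumption the two levels require the same number of replicas at RF=2 and only diverge from RF=3 upward.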

On Sat, Dec 4, 2010 at 12:03 AM, Jonathan Ellis <jbellis@gmail.com> wrote:
> I think you are running into
> https://issues.apache.org/jira/browse/CASSANDRA-1316, where when an
> inconsistency on QUORUM/ALL is discovered it always performed the
> repair at QUORUM instead of the original CL.  Thus, reading at ALL you
> would see the correct answer on the 2nd read but you weren't
> guaranteed to see it on the first.
>
> This was fixed in 0.6.4 but apparently I botched the merge to the 0.7
> branch.  I corrected that just now, so when you update, you should be
> good to go.
>
> On Fri, Dec 3, 2010 at 9:19 PM, Dan Hendry <dan.hendry.junk@gmail.com> wrote:
> > I am seeing fairly strange behavior in my Cassandra cluster.
> >
> > Setup:
> >  - 3 nodes (let's call them nodes 1, 2 and 3)
> >  - RF=2
> >  - A set of servers (producers) which write data to the cluster at
> >    consistency level ONE
> >  - A set of servers (consumers/processors) which read data from the
> >    cluster at consistency level ALL
> >  - Cassandra 0.7 (recent out of the svn branch, post beta 3)
> >  - Clients use the pelops library
> >
> > Situation:
> >  - Everything is humming along nicely
> >  - A Cassandra node (say 3) goes down (even with 24 GB of RAM, OOM
> >    errors are the bane of my existence)
> >  - Producers continue to happily write to the cluster but consumers
> >    start complaining by throwing TimedOutExceptions and
> >    UnavailableExceptions.
> >  - I stagger out of bed in the middle of the night and restart
> >    Cassandra on node 3.
> >  - The consumers stop complaining and get back to business but
> >    generate garbage data for the period node 3 was down. It's almost
> >    like half the data is missing half the time. (Again, I am reading
> >    at consistency level ALL).
> >  - I force the consumers to reprocess data for the period node 3 was
> >    down. They generate accurate output which is different from the
> >    first time round.
> >
> > To be explicit, what seems to be happening is that the first read at
> > consistency level ALL gives "A,C,E" (for example) and the second read
> > at consistency level ALL gives "A,B,C,D,E". Is this a Cassandra bug?
> > Is my knowledge of consistency levels flawed? My understanding is
> > that you could achieve strongly consistent behavior by writing at ONE
> > and reading at ALL.
> >
> > After this experience, my theory (uneducated, untested, and
> > under-researched) is that "strong consistency" applies only to column
> > values, not the set of columns (or super-columns in this case) which
> > make up a row. Any thoughts?
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
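
The expectation in the original message (write at ONE, read at ALL) rests on the usual overlap rule: a read is guaranteed to touch a replica holding the latest acknowledged write when W + R > RF. Below is a toy model of that rule and of a read that merges columns by newest timestamp. It involves no Cassandra code: the replica contents, timestamps and function names are all invented for illustration, and it does not attempt to reproduce the CASSANDRA-1316 code path.

```python
from typing import Dict, List, Tuple

Column = Tuple[int, str]     # (timestamp, value)
Replica = Dict[str, Column]  # column name -> newest column known to that replica


def overlaps(w: int, r: int, rf: int) -> bool:
    """True when every read set must intersect every write set."""
    return w + r > rf


def merged_read(replicas: List[Replica]) -> Dict[str, str]:
    """Merge the contacted replicas, keeping the newest timestamp per column."""
    newest: Dict[str, Column] = {}
    for replica in replicas:
        for name, column in replica.items():
            if name not in newest or column[0] > newest[name][0]:
                newest[name] = column
    return {name: column[1] for name, column in sorted(newest.items())}


# Hypothetical row: one replica stayed up and holds columns A..E,
# the other was down for a while and missed the writes of B and D.
up = {name: (ts, name.lower()) for ts, name in enumerate("ABCDE", start=1)}
stale = {name: up[name] for name in "ACE"}

print(overlaps(w=1, r=2, rf=2))              # write at ONE, read at ALL, RF=2
print(",".join(merged_read([stale])))        # only the stale replica's data
print(",".join(merged_read([up, stale])))    # both replicas merged
```

In this model a read that really merges both replicas returns the full set of columns, while the shorter "A,C,E" result only appears when the stale replica's data is used on its own.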
