From: Jim Ancona
To: user@cassandra.apache.org
Date: Tue, 5 Jun 2012 16:30:16 -0400
Subject: Re: Secondary Indexes, Quorum and Cluster Availability

On Mon, Jun 4, 2012 at 2:34 PM, aaron morton wrote:

> IIRC index slices work a little differently with consistency: they need
> to have CL-level nodes available for all token ranges. If you drop it to
> CL ONE, the read is local-only for a particular token range.

Yes, this is what we observed. When I reasoned my way through what I knew
about how secondary indexes work, I came to the same conclusion about all
token ranges having to be available. (With RF = 3 and replicas placed on
consecutive ring nodes, losing two adjacent nodes leaves some token range
with only one live replica, so a QUORUM index read, which must cover every
range, fails outright.)

My surprise at the behavior was because I *hadn't* reasoned my way through
it until we had the issue. Somehow I doubt I'm the only user of secondary
indexes who was unaware of this ramification of CL choice. It might be a
good idea for the documentation to reflect the tradeoffs more clearly.
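For the archives: most client APIs let you set the consistency level per
query, so only the index read needs to be relaxed. A minimal sketch of the
per-query override (shown with the DataStax Python driver purely for
illustration; the contact point, keyspace, table, and indexed-column names
are invented stand-ins for ours):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])          # hypothetical contact point
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    # 'state' stands in for our real secondary-indexed column. Only this
    # query runs at ONE; writes and key-based reads stay at QUORUM.
    query = SimpleStatement(
        "SELECT * FROM users WHERE state = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    rows = session.execute(query, ["NY"])

At ONE, each token range needs only one live replica, so the query degrades
with the ring instead of failing as soon as any range loses quorum; the
price is that a read may miss a write that hasn't yet reached the replica
that answers.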
Thanks for your help!

Jim

> The problem when doing index reads is that the nodes containing the
> results can no longer be selected by the partitioner.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/06/2012, at 5:15 AM, Jim Ancona wrote:
>
> Hi,
>
> We have an application with two code paths, one of which uses a
> secondary index query and one which doesn't. While testing node-down
> scenarios in our cluster we got a result which surprised (and concerned)
> me, and I wanted to find out whether the behavior we observed is
> expected.
>
> Background:
>
> - 6 nodes in the cluster (in ring order: A, B, C, E, F and G)
> - RF = 3
> - All operations at QUORUM
> - Operation 1: read by row key, followed by a write
> - Operation 2: read by secondary index, followed by a write
>
> While running a mixed workload of operations 1 and 2, we got the
> following results:
>
>   Scenario             Result
>   All nodes up         All operations succeed
>   One node down        All operations succeed
>   Nodes A and E down   All operations succeed
>   Nodes A and B down   Operation 1: ~33% fail; Operation 2: all fail
>   Nodes A and C down   Operation 1: ~17% fail; Operation 2: all fail
>
> We had expected (perhaps incorrectly) that the secondary index reads
> would fail in proportion to the portion of the ring that was unable to
> reach quorum, just as the row-key reads did. For both operation types
> the underlying failure was an UnavailableException.
>
> The same pattern repeated for the other scenarios we tried. The row-key
> operations failed at the expected ratios, given the portion of the ring
> that was unable to meet quorum because of down nodes, while all the
> secondary index reads failed as soon as 2 out of any 3 adjacent nodes
> were down.
>
> Is this expected behavior? Is it documented anywhere? I didn't find it
> with a quick search.
>
> The operation doing the secondary index query is an important one for
> our app, and we'd really prefer that it degrade gracefully in the face
> of cluster failures. My plan at this point is to do that query at
> ConsistencyLevel.ONE (and accept the increased risk of inconsistency).
> Will that work?
>
> Thanks in advance,
>
> Jim
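As a sanity check on the failure ratios in the quoted table above, here is
a small stand-alone sketch that enumerates the RF = 3 replica sets around
the six-node ring (assuming simple consecutive placement, which is what our
observed numbers imply) and counts how many token ranges lose quorum:

    RING = ["A", "B", "C", "E", "F", "G"]  # node order from the scenario
    RF = 3

    def replica_sets():
        """Each token range lives on its owning node plus the next
        RF - 1 nodes around the ring (consecutive placement)."""
        n = len(RING)
        return [{RING[(i + j) % n] for j in range(RF)} for i in range(n)]

    def fraction_without_quorum(down):
        """Fraction of ranges with fewer than 2 of 3 replicas alive."""
        quorum = RF // 2 + 1
        failed = sum(1 for rs in replica_sets() if len(rs - down) < quorum)
        return failed / len(RING)

    print(fraction_without_quorum({"A", "E"}))  # 0.0   -> all succeed
    print(fraction_without_quorum({"A", "B"}))  # ~0.33 -> the ~33% row
    print(fraction_without_quorum({"A", "C"}))  # ~0.17 -> the ~17% row

The key-based reads fail at exactly that fraction, while a QUORUM index
read must cover every range and therefore fails outright whenever the
fraction is non-zero.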