From: Jim Ancona
To: user@cassandra.apache.org
Date: Tue, 5 Jun 2012 16:30:16 -0400
Subject: Re: Secondary Indexes, Quorum and Cluster Availability

On Mon, Jun 4, 2012 at 2:34 PM, aaron morton wrote:

> IIRC index slices work a little differently with consistency: they need
> to have CL-level nodes available for all token ranges. If you drop it to
> CL ONE, the read is local-only for a particular token range.

Yes, this is what we observed. When I reasoned my way through what I knew
about how secondary indexes work, I came to the same conclusion about all
token ranges having to be available. (With RF = 3 and replicas placed on
consecutive ring nodes, losing two adjacent nodes leaves some token range
with only one live replica, so a QUORUM index read, which must cover every
range, fails outright.)

My surprise at the behavior was because I *hadn't* reasoned my way through
it until we had the issue. Somehow I doubt I'm the only user of secondary
indexes who was unaware of this ramification of CL choice. It might be a
good idea for the documentation to reflect the tradeoffs more clearly.
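For the archives: most client APIs let you set the consistency level per
query, so only the index read needs to be relaxed. A minimal sketch of the
per-query override (shown with the DataStax Python driver purely for
illustration; the contact point, keyspace, table, and indexed-column names
are invented stand-ins for ours):

    from cassandra import ConsistencyLevel
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(["127.0.0.1"])          # hypothetical contact point
    session = cluster.connect("my_keyspace")  # hypothetical keyspace

    # 'state' stands in for our real secondary-indexed column. Only this
    # query runs at ONE; writes and key-based reads stay at QUORUM.
    query = SimpleStatement(
        "SELECT * FROM users WHERE state = %s",
        consistency_level=ConsistencyLevel.ONE,
    )
    rows = session.execute(query, ["NY"])

At ONE, each token range needs only one live replica, so the query degrades
with the ring instead of failing as soon as any range loses quorum; the
price is that a read may miss a write that hasn't yet reached the replica
that answers.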
Thanks for your help!

Jim

> The problem when doing index reads is that the nodes containing the
> results can no longer be selected by the partitioner.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 2/06/2012, at 5:15 AM, Jim Ancona wrote:
>
> Hi,
>
> We have an application with two code paths, one of which uses a
> secondary index query and one which doesn't. While testing node-down
> scenarios in our cluster we got a result which surprised (and concerned)
> me, and I wanted to find out whether the behavior we observed is
> expected.
>
> Background:
>
> - 6 nodes in the cluster (in ring order: A, B, C, E, F and G)
> - RF = 3
> - All operations at QUORUM
> - Operation 1: read by row key, followed by a write
> - Operation 2: read by secondary index, followed by a write
>
> While running a mixed workload of operations 1 and 2, we got the
> following results:
>
>   Scenario             Result
>   All nodes up         All operations succeed
>   One node down        All operations succeed
>   Nodes A and E down   All operations succeed
>   Nodes A and B down   Operation 1: ~33% fail; Operation 2: all fail
>   Nodes A and C down   Operation 1: ~17% fail; Operation 2: all fail
>
> We had expected (perhaps incorrectly) that the secondary index reads
> would fail in proportion to the portion of the ring that was unable to
> reach quorum, just as the row-key reads did. For both operation types
> the underlying failure was an UnavailableException.
>
> The same pattern repeated for the other scenarios we tried. The row-key
> operations failed at the expected ratios, given the portion of the ring
> that was unable to meet quorum because of down nodes, while all the
> secondary index reads failed as soon as 2 out of any 3 adjacent nodes
> were down.
>
> Is this expected behavior? Is it documented anywhere? I didn't find it
> with a quick search.
>
> The operation doing the secondary index query is an important one for
> our app, and we'd really prefer that it degrade gracefully in the face
> of cluster failures. My plan at this point is to do that query at
> ConsistencyLevel.ONE (and accept the increased risk of inconsistency).
> Will that work?
>
> Thanks in advance,
>
> Jim
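As a sanity check on the failure ratios in the quoted table above, here is
a small stand-alone sketch that enumerates the RF = 3 replica sets around
the six-node ring (assuming simple consecutive placement, which is what our
observed numbers imply) and counts how many token ranges lose quorum:

    RING = ["A", "B", "C", "E", "F", "G"]  # node order from the scenario
    RF = 3

    def replica_sets():
        """Each token range lives on its owning node plus the next
        RF - 1 nodes around the ring (consecutive placement)."""
        n = len(RING)
        return [{RING[(i + j) % n] for j in range(RF)} for i in range(n)]

    def fraction_without_quorum(down):
        """Fraction of ranges with fewer than 2 of 3 replicas alive."""
        quorum = RF // 2 + 1
        failed = sum(1 for rs in replica_sets() if len(rs - down) < quorum)
        return failed / len(RING)

    print(fraction_without_quorum({"A", "E"}))  # 0.0   -> all succeed
    print(fraction_without_quorum({"A", "B"}))  # ~0.33 -> the ~33% row
    print(fraction_without_quorum({"A", "C"}))  # ~0.17 -> the ~17% row

The key-based reads fail at exactly that fraction, while a QUORUM index
read must cover every range and therefore fails outright whenever the
fraction is non-zero.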