incubator-cassandra-user mailing list archives

From aaron morton <aa...@thelastpickle.com>
Subject Re: Secondary Indexes, Quorum and Cluster Availability
Date Mon, 04 Jun 2012 18:34:17 GMT
IIRC index slices work a little differently with consistency: they need enough nodes
available to satisfy the CL for all token ranges. If you drop it to CL ONE, the read is
local only for a particular token range.

The problem when doing index reads is that the nodes that contain the results can no longer
be selected by the partitioner.
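
As a rough illustration of the rule described above, here is a small Python sketch that
models the cluster from the quoted message. It assumes several things the thread does not
state: simple ring placement (each token range is replicated on its owner and the next
RF - 1 nodes), equal-sized token ranges, and evenly distributed row keys. The function
names are made up for this illustration and are not Cassandra APIs.

```python
# Toy availability model; NOT Cassandra code. Placement and naming are assumptions.
NODES = ["A", "B", "C", "E", "F", "G"]  # ring order from the quoted message
RF = 3


def replica_sets(nodes=NODES, rf=RF):
    """One replica set per token range: the owner plus the next rf - 1 nodes."""
    n = len(nodes)
    return [[nodes[(i + k) % n] for k in range(rf)] for i in range(n)]


def required_replicas(cl, rf=RF):
    """Number of live replicas a consistency level needs."""
    return {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}[cl]


def unavailable_fraction(down, cl, nodes=NODES, rf=RF):
    """Fraction of token ranges that cannot satisfy `cl` with `down` nodes offline.

    Under the even-distribution assumption this is the expected failure
    rate for reads by row key.
    """
    need = required_replicas(cl, rf)
    sets = replica_sets(nodes, rf)
    failed = sum(1 for rs in sets if sum(node not in down for node in rs) < need)
    return failed / len(sets)


def index_read_available(down, cl, nodes=NODES, rf=RF):
    """Modelled on the reply: an index read needs `cl` met for every token range."""
    return unavailable_fraction(down, cl, nodes, rf) == 0.0


for down in [{"A", "E"}, {"A", "B"}, {"A", "C"}]:
    print(
        sorted(down),
        "row-key failure rate at QUORUM: {:.0%}".format(unavailable_fraction(down, "QUORUM")),
        "index read at QUORUM ok:", index_read_available(down, "QUORUM"),
        "index read at ONE ok:", index_read_available(down, "ONE"),
    )
```

Under these assumptions the model matches the results reported in the quoted message at
QUORUM: about 33% of row key reads fail with A and B down, about 17% with A and C down,
and the index read is unavailable in both cases. It also suggests an index read at ONE
stays available with any two nodes down, since every range still has a live replica.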

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 2/06/2012, at 5:15 AM, Jim Ancona wrote:

> Hi,
> 
> We have an application with two code paths, one of which uses a secondary index query
> and one which doesn't. While testing node-down scenarios in our cluster we got a result
> which surprised (and concerned) me, and I wanted to find out if the behavior we observed is
> expected.
> 
> Background:
> 6 nodes in the cluster (in order: A, B, C, E, F and G)
> RF = 3
> All operations at QUORUM
> Operation 1: Read by row key followed by write
> Operation 2: Read by secondary index, followed by write
> While running a mixed workload of operations 1 and 2, we got the following results:
> 
> Scenario              Operation 1 (row key)    Operation 2 (secondary index)
> All nodes up          All succeed              All succeed
> One node down         All succeed              All succeed
> Nodes A and E down    All succeed              All succeed
> Nodes A and B down    ~33% fail                All fail
> Nodes A and C down    ~17% fail                All fail
> 
> We had expected (perhaps incorrectly) that the secondary index reads would fail in proportion
> to the portion of the ring that was unable to reach quorum, just as the row key reads did.
> For both operation types the underlying failure was an UnavailableException.
> 
> The same pattern repeated for the other scenarios we tried. The row key operations failed
> at the expected ratios, given the portion of the ring that was unable to meet quorum because
> of nodes down, while all the secondary index reads failed as soon as 2 out of any 3 adjacent
> nodes were down.
> 
> Is this expected behavior? Is it documented anywhere? I didn't find it with a quick
> search.
> 
> The operation doing the secondary index query is an important one for our app, and we'd really
> prefer that it degrade gracefully in the face of cluster failures. My plan at this point is
> to do that query at ConsistencyLevel.ONE (and accept the increased risk of inconsistency).
> Will that work?
> 
> Thanks in advance,
> 
> Jim

