cassandra-user mailing list archives

From Jonathan Ellis <>
Subject Re: CL.ONE reads and SimpleSnitch unnecessary timeouts
Date Wed, 13 Apr 2011 17:58:50 GMT
First, our contract with the client says "we'll give you the answer or
a timeout after rpc_timeout." Once we start trying to cheat on that,
the client no longer has any guarantee of when it should expect a
response. So that feels iffy to me.

Second, retrying against a different node on the server side isn't
expected to give substantially better results than the client issuing
a retry itself, if that's what it wants. By the time we time out once,
the failure detector (FD) and/or dynamic snitch should route the retry
to another node, without adding complexity to StorageProxy. (If that's
not what you see in practice, then we probably have a dynamic snitch
bug.)
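
A minimal sketch of the client-side retry described above, in Python.
Everything here is hypothetical and for illustration only: ReadTimeout
stands in for whatever timeout error a client library raises, and
make_flaky_read simulates a read that times out while a replica is
unresponsive. It is not tied to any real Cassandra client API.

```python
class ReadTimeout(Exception):
    """Hypothetical stand-in for the timeout a client library would raise."""


def read_with_retry(read_fn, key, max_attempts=3):
    """Call read_fn(key) up to max_attempts times, retrying only on ReadTimeout.

    Returns (value, attempts_used). Re-raises the last ReadTimeout if every
    attempt times out, so the caller still sees a bounded number of timeouts.
    """
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return read_fn(key), attempt
        except ReadTimeout as exc:
            last_error = exc
    raise last_error


def make_flaky_read(timeouts_before_success):
    """Simulated read: times out N times, then answers (illustration only)."""
    state = {"remaining": timeouts_before_success}

    def read(key):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ReadTimeout(f"no answer for key {key} within rpc_timeout")
        return f"value-for-{key}"

    return read


if __name__ == "__main__":
    value, attempts = read_with_retry(make_flaky_read(1), 35)
    print(value, "after", attempts, "attempts")
```

Each attempt stays within the rpc_timeout contract, and the client
decides how many timeouts it is willing to pay for.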

On Wed, Apr 13, 2011 at 12:32 PM, Erik Onnen <> wrote:
> Sorry for the complex setup, took a while to identify the behavior and
> I'm still not sure I'm reading the code correctly.
> Scenario:
> Six node ring w/ SimpleSnitch and RF3. For the sake of discussion
> assume the token space looks like:
> node-0 1-10
> node-1 11-20
> node-2 21-30
> node-3 31-40
> node-4 41-50
> node-5 51-60
> In this scenario we want key 35 where nodes 3,4 and 5 are natural
> endpoints. Client is connected to node-0, node-1 or node-2. node-3
> goes into a full GC lasting 12 seconds.
> What I think we're seeing is that as long as we read with CL.ONE *and*
> are connected to 0, 1 or 2, we'll never get a response for the
> requested key until the failure detector kicks in and convicts 3,
> at which point reads spill over to the other endpoints.
> We've tested this by switching to CL.QUORUM and haven't seen read
> timeouts during big GCs since.
> Assuming the above, is this behavior really correct? We have copies of
> the data on two other nodes, but because this snitch config always
> picks node-3, we always time out until conviction, which can sometimes
> take up to 8 seconds. Shouldn't the read attempt to pick a different
> endpoint after the first timeout rather than repeatedly
> trying a node that isn't responding?
> Thanks,
> -erik
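
For reference, the quoted scenario can be modeled in a few lines of
Python. This is a simplified sketch under stated assumptions, not
Cassandra's code: replicas are the owning node plus the next two
clockwise, and a CL.ONE read always goes to the first natural endpoint
that has not been convicted. The names RING, natural_endpoints and
cl_one_target are made up for this illustration.

```python
import bisect

# (upper token bound, node) pairs matching the ring in the quoted message.
RING = [
    (10, "node-0"),
    (20, "node-1"),
    (30, "node-2"),
    (40, "node-3"),
    (50, "node-4"),
    (60, "node-5"),
]


def natural_endpoints(token, rf=3):
    """Return the owning node plus the next rf-1 nodes clockwise (assumed placement)."""
    uppers = [upper for upper, _ in RING]
    # First range whose upper bound is >= token; wrap around past the last one.
    start = bisect.bisect_left(uppers, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def cl_one_target(endpoints, convicted):
    """Pick the first endpoint not convicted by the failure detector (assumed rule)."""
    for node in endpoints:
        if node not in convicted:
            return node
    return None


if __name__ == "__main__":
    replicas = natural_endpoints(35)
    print(replicas)
    # While node-3 is paused in GC but not yet convicted, it is still chosen.
    print(cl_one_target(replicas, convicted=set()))
    # Once the failure detector convicts node-3, reads move to node-4.
    print(cl_one_target(replicas, convicted={"node-3"}))
```

Under these assumptions every CL.ONE read for key 35 lands on node-3
until it is convicted, which matches the behavior described in the
quoted message.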

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
