cassandra-user mailing list archives

From Jonathan Ellis
Subject Re: CL.ONE reads and SimpleSnitch unnecessary timeouts
Date Wed, 13 Apr 2011 19:00:44 GMT
Yes, we've had dynamic snitch on by default in all the 0.7 releases so
it's pretty well tested by this point.

On Wed, Apr 13, 2011 at 1:17 PM, Erik Onnen wrote:
> So we're not currently using a dynamic snitch, only the SimpleSnitch
> is at play (lots of history as to why, I won't go into it). If this
> would solve our problems I'm fine changing it.
> Understood re: client contract. I guess in this case my issue is that
> the server we're connected to never tries more than the one failing
> server until failure detector has kicked in - it keeps flogging the
> bad server so subsequent requests never produce a different result
> until conviction.
> Regarding clients retrying, in this configuration the situation
> doesn't improve and it still times out because our client libraries
> don't try another host. They still have a valid connection to a
> working host, it's just that given our configuration that one node
> keeps proxying to a bad server and never routes around it. It sounds
> like switching to the dynamic snitch would adjust for the first
> timeout on subsequent attempts so maybe that's the most advisable
> thing in this case.
> On Wed, Apr 13, 2011 at 10:58 AM, Jonathan Ellis wrote:
>> First, our contract with the client says "we'll give you the answer or
>> a timeout after rpc_timeout." Once we start trying to cheat on that
>> the client has no guarantee anymore when it should expect a response
>> by. So that feels iffy to me.
>> Second, retrying to a different node isn't expected to give
>> substantially better results than the client issuing a retry itself if
>> that's what it wants, since by the time we time out once, FD and/or
>> dynamic snitch should route the request to another node for the retry
>> without adding additional complexity to StorageProxy.  (If that's not
>> what you see in practice, then we probably have a dynamic snitch bug.)
>> On Wed, Apr 13, 2011 at 12:32 PM, Erik Onnen wrote:
>>> Sorry for the complex setup, took a while to identify the behavior and
>>> I'm still not sure I'm reading the code correctly.
>>> Scenario:
>>> Six node ring w/ SimpleSnitch and RF3. For the sake of discussion
>>> assume the token space looks like:
>>> node-0 1-10
>>> node-1 11-20
>>> node-2 21-30
>>> node-3 31-40
>>> node-4 41-50
>>> node-5 51-60
>>> In this scenario we want key 35 where nodes 3,4 and 5 are natural
>>> endpoints. Client is connected to node-0, node-1 or node-2. node-3
>>> goes into a full GC lasting 12 seconds.
>>> What I think we're seeing is that as long as we read with CL.ONE *and*
>>> are connected to 0,1 or 2, we'll never get a response for the
>>> requested key until the failure detector kicks in and convicts 3
>>> resulting in reads spilling over to the other endpoints.
>>> We've tested this by switching to CL.QUORUM and since haven't seen
>>> read timeouts during big GCs.
>>> Assuming the above, is this behavior really correct? We have copies of
>>> the data on two other nodes but because this snitch config always
>>> picks node-3, we always time out until conviction, which can take up to
>>> 8 seconds sometimes. Shouldn't the read attempt to pick a different
>>> endpoint in the case of the first timeout rather than repeatedly
>>> trying a node that isn't responding?
>>> Thanks,
>>> -erik
>> --
>> Jonathan Ellis
>> Project Chair, Apache Cassandra
>> co-founder of DataStax, the source for professional Cassandra support

Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
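
The scenario discussed in this thread can be sketched as a small simulation. This is a toy model under stated assumptions, not Cassandra's code: the ring layout and key 35 come from the original message, while RPC_TIMEOUT, natural_endpoints, static_order, LatencyAwareOrder, read_one and simulate are hypothetical names invented for illustration. The latency-aware ordering is only a rough stand-in for the idea behind a dynamic snitch (prefer replicas that responded quickly), not its actual scoring algorithm.

```python
from bisect import bisect_left

# Upper token bound owned by each node, mirroring the example ring in the thread.
RING = [(10, "node-0"), (20, "node-1"), (30, "node-2"),
        (40, "node-3"), (50, "node-4"), (60, "node-5")]

RPC_TIMEOUT = 10.0  # seconds; hypothetical value chosen for the sketch


def natural_endpoints(token, rf=3):
    """Walk the ring clockwise from the owner of `token`, collecting `rf` nodes."""
    bounds = [upper for upper, _ in RING]
    start = bisect_left(bounds, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


def static_order(endpoints):
    """Static ordering: the same replica is always first, healthy or not."""
    return list(endpoints)


class LatencyAwareOrder:
    """Toy latency-scored ordering (an assumption, not the real dynamic snitch)."""

    def __init__(self):
        self.scores = {}

    def record(self, node, latency):
        # Simple moving average of observed latency per node.
        prev = self.scores.get(node, latency)
        self.scores[node] = 0.5 * prev + 0.5 * latency

    def __call__(self, endpoints):
        # Nodes with no observations yet score 0.0 and keep their ring order.
        return sorted(endpoints, key=lambda n: self.scores.get(n, 0.0))


def read_one(order, endpoints, unresponsive, on_result=None):
    """CL.ONE read: contact only the first replica in the chosen ordering."""
    target = order(endpoints)[0]
    ok = target not in unresponsive
    latency = 0.005 if ok else RPC_TIMEOUT
    if on_result is not None:
        on_result(target, latency)
    return target, ok


def simulate(order, attempts=3, on_result=None):
    """Repeat the read for key 35 while node-3 is stuck (e.g. in a long GC)."""
    endpoints = natural_endpoints(35)
    return [read_one(order, endpoints, {"node-3"}, on_result)
            for _ in range(attempts)]


if __name__ == "__main__":
    print("replicas for key 35:", natural_endpoints(35))
    print("static ordering:    ", simulate(static_order))
    aware = LatencyAwareOrder()
    print("latency-aware:      ", simulate(aware, on_result=aware.record))
```

With the static ordering every attempt goes to node-3 and times out, which matches the behavior described before the failure detector convicts the node. With the latency-aware ordering the first attempt still times out, but the recorded timeout pushes node-3 to the back of the list, so a client retry is served by another replica.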
