cassandra-commits mailing list archives

From "Dikang Gu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-14252) Use zero as default score in DynamicEndpointSnitch
Date Wed, 28 Feb 2018 20:08:00 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-14252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16380957#comment-16380957 ]

Dikang Gu commented on CASSANDRA-14252:
---------------------------------------

[~szhou], hmm, it might be easier to explain offline. Anyway, what I'm trying to do is to
set a default score of 0 for every node in the cluster. Then, as Cassandra dispatches
requests, it will measure the actual latency of the different nodes and compute a correct score for
each one. Whether a node is local or remote does not actually matter much. It is just that, in the function
`sortByProximityWithBadness`, we consider the local node first. If the local node is slow but
alive, we can fall back to a remote node. Without a default score, we will NEVER get a chance
to talk to a remote node as long as the local node is alive.
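
To make the effect of the default score concrete, here is a minimal, self-contained sketch. It is not the Cassandra source: the class `BadnessSortSketch`, the method `sort`, and the `defaultScore` parameter are hypothetical names for illustration, and the threshold comparison is an assumption about how the badness check works (lower score means faster).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Simplified illustration only; hypothetical names, not the actual snitch code.
class BadnessSortSketch {
    /**
     * @param proximityOrder   replicas already ordered by proximity (local node first)
     * @param scores           latency scores recorded so far; lower is better, entries may be missing
     * @param badnessThreshold for example 0.1, as in the issue description
     * @param defaultScore     score assumed for a replica with no entry, or null to
     *                         mimic the behavior without a default score
     */
    static List<String> sort(List<String> proximityOrder, Map<String, Double> scores,
                             double badnessThreshold, Double defaultScore) {
        List<Double> inProximityOrder = new ArrayList<>();
        for (String replica : proximityOrder) {
            Double score = scores.get(replica);
            if (score == null) {
                if (defaultScore == null) {
                    // No default: a missing score ends the comparison early,
                    // so the proximity order (local first) always wins.
                    return proximityOrder;
                }
                score = defaultScore;
            }
            inProximityOrder.add(score);
        }

        List<Double> ascending = new ArrayList<>(inProximityOrder);
        Collections.sort(ascending);

        for (int i = 0; i < ascending.size(); i++) {
            // Assumed check: the replica at position i is worse than the
            // i-th best score by more than the threshold.
            if (inProximityOrder.get(i) > ascending.get(i) * (1.0 + badnessThreshold)) {
                // Fall back to ordering purely by score.
                List<String> byScore = new ArrayList<>(proximityOrder);
                byScore.sort((a, b) -> Double.compare(
                        scores.getOrDefault(a, defaultScore),
                        scores.getOrDefault(b, defaultScore)));
                return byScore;
            }
        }
        return proximityOrder;
    }
}
```

In this sketch, with only `ip1` scored at 1.0 and `ip2` unscored, passing `null` as the default keeps the order `[ip1, ip2]`, while passing `0.0` lets the comparison run and moves `ip2` to the front.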

> Use zero as default score in DynamicEndpointSnitch
> --------------------------------------------------
>
>                 Key: CASSANDRA-14252
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14252
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>            Reporter: Dikang Gu
>            Assignee: Dikang Gu
>            Priority: Major
>             Fix For: 4.0, 3.0.17, 3.11.3
>
>
> The problem I want to solve is that, in our deployment, one slow but alive data
node can slow down the whole cluster and even cause our requests to time out.
> We are using DynamicEndpointSnitch with badness_threshold 0.1. I expect the DynamicEndpointSnitch
to switch to sortByProximityWithScore if the local data node's latency is too high.
> I added some debug logging and found that, in a lot of cases, the score for the remote
data node was not populated, so the fallback to sortByProximityWithScore never happened.
That's why a single slow data node can cause huge problems for the whole cluster.
> In this jira, I'd like to use zero as the default score, so that we get a chance to
try a remote data node if the local one is slow.
> I tested it in our test cluster, and it significantly improved client latency in the single slow data
node case.
> I flagged this as a Bug because it has caused problems for our use cases multiple times.
>  ==== logs ===
> _2018-02-21_23:08:57.54145 WARN 23:08:57 [RPC-Thread:978]: sortByProximityWithBadness:
after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.54319 WARN 23:08:57 [RPC-Thread:967]: sortByProximityWithBadness:
after sorting by proximity, addresses order change to [ip1, ip2], with scores [0.0]_
>  _2018-02-21_23:08:57.55111 WARN 23:08:57 [RPC-Thread:453]: sortByProximityWithBadness:
after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]_
>  _2018-02-21_23:08:57.55687 WARN 23:08:57 [RPC-Thread:753]: sortByProximityWithBadness:
after sorting by proximity, addresses order change to [ip1, ip2], with scores [1.0]_
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscribe@cassandra.apache.org
For additional commands, e-mail: commits-help@cassandra.apache.org

