Date: Fri, 8 May 2015 10:15:03 +0000 (UTC)
From: "Benedict (JIRA)"
To: commits@cassandra.apache.org
Subject: [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534260#comment-14534260 ]

Benedict commented on CASSANDRA-9318:
-------------------------------------

bq. In my mind in-flight means that until the response is sent back to the client the request counts against the in-flight limit.

bq. We already keep the original request around until we either get acks from all replicas, or they time out (and we write a hint).

Perhaps we need to clarify exactly what this ticket is proposing. I understood it to mean what Ariel commented here, whereas it sounds like Jonathan is suggesting we simply prune our ExpiringMap based on bytes tracked as well as time?

The ExpiringMap requests are already "in-flight" and cannot be cancelled, so their effect on other nodes cannot be rescinded, and imposing a limit does not stop us from issuing more requests to the nodes in the cluster that are failing to keep up and respond to us. It _might_ be sensible, if we introduce more aggressive shedding at processing nodes, to also shed the response handlers more aggressively locally, but I'm not convinced it would have a significant impact on cluster health by itself; cluster instability spreads out from problematic nodes, and this scheme still permits us to inundate those nodes with requests they cannot keep up with.

Alternatively, the approach of forbidding new requests while you have items in the ExpiringMap causes the collapse of one node to spread throughout the cluster: rapidly (especially on small clusters) all requests in the system become destined for the collapsing node, and every coordinator stops accepting requests. The system seizes up, and that node is still failing, since it has requests from the entire cluster queued up with it. In general the coordinator has no way of distinguishing between a failed node, a network partition, or a node that is merely struggling, so we don't know whether we should wait. Some mix of the two might be possible: wait while a node is just slow, then drop our response handlers for it once it is marked down.
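To make the byte-plus-time pruning idea concrete, here is a rough, untested sketch of a callback map that drops its oldest handlers when a byte bound or a timeout is exceeded. The class and method names are purely illustrative, not our actual ExpiringMap internals, and concurrency corner cases are glossed over:

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: a response-handler map pruned by total tracked bytes as well as age.
final class PendingCallbacks<K>
{
    static final class Entry<K>
    {
        final K key;
        final long createdAtNanos;
        final long requestBytes;
        Entry(K key, long createdAtNanos, long requestBytes)
        {
            this.key = key;
            this.createdAtNanos = createdAtNanos;
            this.requestBytes = requestBytes;
        }
    }

    private final Map<K, Entry<K>> entries = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<Entry<K>> insertionOrder = new ConcurrentLinkedQueue<>();
    private final AtomicLong trackedBytes = new AtomicLong();
    private final long maxBytes;
    private final long timeoutNanos;

    PendingCallbacks(long maxBytes, long timeoutNanos)
    {
        this.maxBytes = maxBytes;
        this.timeoutNanos = timeoutNanos;
    }

    // Register a handler for an in-flight request and prune if we are over budget.
    void register(K key, long requestBytes)
    {
        Entry<K> e = new Entry<>(key, System.nanoTime(), requestBytes);
        entries.put(key, e);
        insertionOrder.add(e);
        trackedBytes.addAndGet(requestBytes);
        prune();
    }

    // A response arrived: stop tracking the bytes for this request.
    void ack(K key)
    {
        Entry<K> e = entries.remove(key);
        if (e != null)
            trackedBytes.addAndGet(-e.requestBytes);
    }

    // Drop the oldest handlers while we exceed the byte bound, or they have timed out.
    private void prune()
    {
        long now = System.nanoTime();
        Entry<K> head;
        while ((head = insertionOrder.peek()) != null
               && (trackedBytes.get() > maxBytes || now - head.createdAtNanos > timeoutNanos))
        {
            insertionOrder.poll();
            if (entries.remove(head.key, head))       // false if already acked
                trackedBytes.addAndGet(-head.requestBytes);
        }
    }
}
{code}

As noted above, though, dropping these handlers only frees state on the coordinator; the requests themselves are already with the replicas.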
Dropping handlers for down nodes may not be a bad thing to do anyway, but I would not want to depend on that behaviour to maintain our precious "A".

It still seems the simplest and most robust solution is to make our work queues leaky, since this insulates the processing nodes from cluster-wide inundation, which the coordinator approach cannot do (even with the loss of "A" and the cessation of processing, there is a whole cluster versus a potentially single node; with hash partitioning it doesn't take long for all processing to involve a single failing node). We can do this just on the number of requests and still be much better off than we are currently. We could also pair this with coordinator-level dropping of handlers for "down" nodes, and dropping above some threshold. The latter, though, could result in widespread uncoordinated dropping of requests, which may leave us open to a multiplying effect of cluster overload, with each node dropping different requests, possibly leading to only a tiny fraction of requests being serviced to their required CL across the cluster. I'm not sure how we can best model this risk, or avoid it without notifying coordinators of the drop of a message, and I don't see that being delivered for 2.1.


> Bound the number of in-flight requests at the coordinator
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.1.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
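For reference, the high/low watermark scheme sketched in the ticket description above could look roughly like the following at the native-protocol layer. This is a speculative sketch, not a proposed patch: the class and method names are invented, and it assumes the server can hand the limiter its Netty client channels so it can toggle autoRead on them.

{code:java}
import io.netty.channel.Channel;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: track outstanding request bytes at the coordinator and pause
// reading from client connections while above the high watermark.
final class InflightLimiter
{
    private final long highWatermarkBytes;
    private final long lowWatermarkBytes;
    private final AtomicLong outstandingBytes = new AtomicLong();
    private final Set<Channel> clientChannels = ConcurrentHashMap.newKeySet();
    private volatile boolean paused = false;   // races here are tolerable for a sketch

    InflightLimiter(long highWatermarkBytes, long lowWatermarkBytes)
    {
        this.highWatermarkBytes = highWatermarkBytes;
        this.lowWatermarkBytes = lowWatermarkBytes;
    }

    void onChannelActive(Channel ch)   { clientChannels.add(ch); }
    void onChannelInactive(Channel ch) { clientChannels.remove(ch); }

    // Called when a request is admitted for coordination.
    void onRequestStarted(long requestBytes)
    {
        if (outstandingBytes.addAndGet(requestBytes) > highWatermarkBytes && !paused)
        {
            paused = true;
            setAutoRead(false);    // stop reading new requests from clients
        }
    }

    // Called once the response is sent back to the client, or the request times out.
    void onRequestCompleted(long requestBytes)
    {
        if (outstandingBytes.addAndGet(-requestBytes) < lowWatermarkBytes && paused)
        {
            paused = false;
            setAutoRead(true);     // resume reading once back under the low watermark
        }
    }

    private void setAutoRead(boolean autoRead)
    {
        for (Channel ch : clientChannels)
            ch.config().setAutoRead(autoRead);
    }
}
{code}

With autoRead disabled, TCP backpressure is what pushes the excess load back onto clients, which is exactly the part the description flags as needing verification for other side effects.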