cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Tue, 12 May 2015 15:36:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14540051#comment-14540051 ]

Benedict commented on CASSANDRA-9318:
-------------------------------------

bq. The problem is that we need to give the clients better feedback so they know to modify
their behavior.

I should make it clear I'm not at all opposed to the idea of back pressure. I have argued
in favour of it many times. However, this design as proposed (or as I'm inferring it, since I
don't think there is a formal proposal; one would still be helpful, to make sure we are
discussing the same thing) does not seem safe to me.

Fundamentally I don't see how you can safely distinguish between a "slow" node that is under
load that will catch up shortly, and a dead node, at least without an active "congestion control"
algorithm as Ariel described it. Refusing to accept queries because of dead nodes is a catastrophic
loss of "A" (availability). If you have an elegant solution to this that can be implemented in this
coordinator-level rate limiting, the only real showstopping concern I have is alleviated, but I don't
currently see one. It seems we absolutely have to have a positive signal from the processing
node to slow down, and if we lose that signal we should continue accepting work (but potentially
hint), and that is essentially the congestion control, and probably really for 2.1. Depending
on gossip is not sufficient (i.e. only implementing this algorithm while nodes are UP) since
there will be an indeterminate period of crossover during which we lose our "A".
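
As a rough illustration of the rule argued above (throttle only on a positive signal from the processing node, and go back to accepting work once that signal is lost), a minimal Java sketch might look like the following. Every name here (`ReplicaBackpressure`, `onSlowDownSignal`, `signalTtlNanos`) is hypothetical, and treating "losing the signal" as a simple expiry timer is an assumption made for illustration; none of it comes from a proposal on this ticket or from Cassandra's code.

```java
/**
 * Hypothetical sketch: per-replica back pressure that requires a positive,
 * recent "slow down" signal. Silence is never treated as a reason to throttle.
 */
class ReplicaBackpressure {
    enum Decision { ACCEPT, THROTTLE }

    // Assumption: a signal older than this is considered lost.
    private final long signalTtlNanos;
    private long lastSlowDownNanos;
    private boolean everSignalled = false;

    ReplicaBackpressure(long signalTtlNanos) {
        this.signalTtlNanos = signalTtlNanos;
    }

    /** Record an explicit "slow down" message received from the replica. */
    synchronized void onSlowDownSignal(long nowNanos) {
        lastSlowDownNanos = nowNanos;
        everSignalled = true;
    }

    /**
     * Throttle only while a fresh signal exists. With no signal, or a stale
     * one (replica may be dead), keep accepting work; the caller could hint.
     */
    synchronized Decision decide(long nowNanos) {
        boolean fresh = everSignalled
                && nowNanos - lastSlowDownNanos <= signalTtlNanos;
        return fresh ? Decision.THROTTLE : Decision.ACCEPT;
    }
}
```

The point of the sketch is the default: a dead node sends nothing, so the coordinator falls through to `ACCEPT` rather than refusing queries on its behalf.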

bq.  we can keep the coordinator from falling over which is what turns a single-node hiccup
into a cluster-wide problem.

We seem to be conflating two goals here: stopping the cluster falling over, and stopping clients
from spamming it. I'm pretty sure we can do the former in 2.1 safely with improved shedding.
The latter seems much more difficult than it is being given credit for, and since the solution
being proposed clearly affects the semantics of our headline feature, I'm unconvinced it is
mid-release material.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>             Fix For: 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
> issues.
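
The watermark scheme in the issue description could be sketched roughly as below. This is only an illustration under stated assumptions: the class and method names (`InFlightLimiter`, `onRequestStart`, `onRequestComplete`) are hypothetical, only outstanding bytes are tracked (a request count could be handled the same way), and actually disabling read on a client connection is left to the caller.

```java
/**
 * Hypothetical sketch: tracks outstanding request bytes and reports when
 * reads on client connections should be paused or resumed.
 */
class InFlightLimiter {
    private final long lowWatermark;
    private final long highWatermark;
    private long inFlightBytes = 0;
    private boolean readsPaused = false;

    InFlightLimiter(long lowWatermark, long highWatermark) {
        if (lowWatermark < 0 || lowWatermark >= highWatermark)
            throw new IllegalArgumentException("need 0 <= low < high");
        this.lowWatermark = lowWatermark;
        this.highWatermark = highWatermark;
    }

    /** Account for an accepted request; returns true if reads should be paused. */
    synchronized boolean onRequestStart(long bytes) {
        inFlightBytes += bytes;
        // Pause once the outstanding total reaches the high watermark.
        if (!readsPaused && inFlightBytes >= highWatermark)
            readsPaused = true;
        return readsPaused;
    }

    /** Account for a finished request; returns true if reads are still paused. */
    synchronized boolean onRequestComplete(long bytes) {
        inFlightBytes -= bytes;
        // Resume only after dropping below the low watermark (hysteresis).
        if (readsPaused && inFlightBytes < lowWatermark)
            readsPaused = false;
        return readsPaused;
    }

    synchronized long inFlightBytes() {
        return inFlightBytes;
    }
}
```

The gap between the two watermarks keeps the coordinator from toggling reads on and off for every request near the limit. It does not address the concern raised in the comment above, since bytes outstanding to a dead node would hold the total above the watermark just as a slow node would.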



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
