cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Wed, 13 Jul 2016 14:43:20 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375132#comment-15375132 ]

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

[~jbellis],

bq. it causes other problems in the other two (non-global-overload) scenarios.

I think you are overstating the problem here: the first two scenarios are either very
limited in time (the first) or very limited in magnitude (the second), and the back-pressure
algorithm can be made as sensitive and as reactive as you wish by tuning the incoming/outgoing
imbalance you want to tolerate and the growth factor.
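
To make the two tuning knobs concrete, here is a minimal sketch of a per-replica rate adjustment. It is not the patch's code: the class, its fields and the adjustment rule are hypothetical, and it assumes "outgoing" means requests the coordinator sent to a replica during a time window and "incoming" means responses received back in that same window.

```java
// Hypothetical sketch: names and the adjustment rule are illustrative, not the patch's code.
final class ReplicaRateSketch {
    private final double toleratedRatio; // lowest incoming/outgoing ratio tolerated, e.g. 0.9
    private final double growthFactor;   // fractional step per window, e.g. 0.2 for 20%
    private final double minRate;        // floor so the limit never collapses to zero
    private double rateLimit;            // current limit, in requests per second

    ReplicaRateSketch(double toleratedRatio, double growthFactor,
                      double initialRate, double minRate) {
        this.toleratedRatio = toleratedRatio;
        this.growthFactor = growthFactor;
        this.rateLimit = initialRate;
        this.minRate = minRate;
    }

    /** Recomputes the limit from one window's counters and returns it. */
    double adjust(long outgoing, long incoming) {
        if (outgoing <= 0) {
            return rateLimit; // nothing was sent, so there is nothing to learn
        }
        double ratio = (double) incoming / outgoing;
        if (ratio < toleratedRatio) {
            // The replica is falling behind: shrink the limit, but not below the floor.
            rateLimit = Math.max(minRate, rateLimit * (1.0 - growthFactor));
        } else {
            // The replica is keeping up: let the limit grow back.
            rateLimit = rateLimit * (1.0 + growthFactor);
        }
        return rateLimit;
    }
}
```

In this sketch, a `toleratedRatio` closer to 1.0 makes the limiter more sensitive to imbalance, and a larger `growthFactor` makes it react faster in both directions.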

bq. I honestly don't see what is "better" about a "slow every write down to the speed of the
slowest, possibly sick, replica" approach. Defining a simple high water mark on requests in
flight should be much simpler without the negative side effects.

That kind of threshold would be too arbitrary and coarse-grained, but that's not even the
problem; the point is rather what you're going to do when the threshold is met. That is, once
the high water mark is met, we really have these options:
1) Throttle at the rate of the slow replicas, which is what we do in this patch.
2) Take the slow replica(s) out, which is even worse in terms of availability.
3) Rate limit the message dequeueing in the outbound connection, but this only moves the back-pressure
problem from one place to another.
4) Rate limit at a global rate equal to the water mark, but this only helps the coordinator,
as such a rate might still be too high for the slow replicas.

In the end, I can't see any better option than the one we implement in this patch for those
use cases willing to trade performance for overall stability, and I would at least have it
go through proper QA testing, to see how it behaves on larger clusters, fix any sharp edges,
and see how it stands overall.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
> issues.
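
The high/low watermark idea in the description above can be sketched as follows. This is an illustration under stated assumptions, not code from the ticket: the class and method names are hypothetical, only bytes are tracked (a request count would be a second counter handled the same way), and the `readsEnabled` flag stands in for whatever mechanism actually pauses reads on client connections.

```java
// Hypothetical sketch of high/low watermark tracking; not code from CASSANDRA-9318.
final class InFlightWatermarkSketch {
    private final long highWatermark; // reads are disabled once in-flight bytes reach this
    private final long lowWatermark;  // reads resume once in-flight bytes drop below this
    private long inFlightBytes = 0;
    private boolean readsEnabled = true;

    InFlightWatermarkSketch(long highWatermark, long lowWatermark) {
        if (lowWatermark > highWatermark) {
            throw new IllegalArgumentException("low watermark must not exceed high watermark");
        }
        this.highWatermark = highWatermark;
        this.lowWatermark = lowWatermark;
    }

    /** Called when a client request of the given size is accepted. */
    synchronized void onRequest(long bytes) {
        inFlightBytes += bytes;
        if (inFlightBytes >= highWatermark) {
            readsEnabled = false; // stop reading from client connections
        }
    }

    /** Called when a request of the given size completes. */
    synchronized void onResponse(long bytes) {
        inFlightBytes -= bytes;
        if (inFlightBytes < lowWatermark) {
            readsEnabled = true; // safe to read from client connections again
        }
    }

    synchronized boolean readsEnabled() {
        return readsEnabled;
    }
}
```

Keeping the two watermarks apart gives hysteresis: between them the flag keeps its previous value, so reads do not flap on and off around a single threshold.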



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
