cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Thu, 15 Sep 2016 16:50:21 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15493898#comment-15493898 ]

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

[~Stefania],

I fixed the tests related to the new {{DatabaseDescriptor}} initialization methods.

I've also addressed [~slebresne]'s concerns and modified the back-pressure algorithm to always
observe the write timeout: if pausing for the rate limit would cause the timeout to be exceeded,
it no longer observes the rate limit and instead pauses for at most the timeout _minus_ the
current response time of the replica with the lower rate. This avoids client timeouts and also
gives replicas enough time to actually acknowledge the mutations (at the expense of having more
in-flight mutations than the rate limit allows, but I believe this is the right tradeoff).
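
As a rough illustration of the pause rule described above (a minimal sketch, not the code from the patch): the class name, method name, and parameters below are hypothetical, and the sketch assumes that "the rate limit causes the timeout to be exceeded" means the rate-limit pause is longer than the timeout minus the slowest replica's current response time.

```java
/** Hypothetical helper for illustration only; not part of the Cassandra code base. */
final class BackPressurePauseSketch
{
    private BackPressurePauseSketch() {}

    /**
     * Computes how long to pause before sending a mutation.
     *
     * @param rateLimitPauseMillis     pause the rate limit alone would impose (assumed input)
     * @param writeTimeoutMillis       configured write timeout
     * @param slowestReplicaRespMillis current response time of the replica with the lower rate
     * @return the pause in milliseconds, never negative
     */
    static long pauseMillis(long rateLimitPauseMillis, long writeTimeoutMillis, long slowestReplicaRespMillis)
    {
        // Time left before the write timeout, once the slowest replica's response time is accounted for.
        long budget = Math.max(0L, writeTimeoutMillis - slowestReplicaRespMillis);
        // Observe the rate limit when it fits in the budget; otherwise pause only up to the budget.
        return Math.min(Math.max(0L, rateLimitPauseMillis), budget);
    }
}
```

With a 2000 ms write timeout and a 500 ms response time from the slowest replica, a 100 ms rate-limit pause is observed in full, while a 5000 ms rate-limit pause is cut to 1500 ms.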

I've run several rounds of tests and dtests: unit tests are always green, but some dtests fail
intermittently; those failures do not seem related to this issue, but someone more
familiar with the failing dtests might want to have a look.

Finally, I've re-run some manual stress tests on an overloaded 4-node RF=3 cluster; here
are the results of inserting 1M rows at CL.ONE:
\\
\\
* SLOW back-pressure.
||Node||Dropped Mutations||Dropped Hints||
|1|18143|0|
|2|10|0|
|3|0|0|
|4|0|0|
Timeouts: 39
Total runtime: 20 mins

* No back-pressure
||Node||Dropped Mutations||Dropped Hints||
|1|471751|248403|
|2|70996|13571|
|3|640|0|
|4|75318|24801|
Timeouts: 6
Total runtime: 5 mins

At CL.QUORUM:
\\
\\
* SLOW back-pressure.
||Node||Dropped Mutations||Dropped Hints||
|1|27781|8584|
|2|4650|0|
|3|0|0|
|4|0|0|
Timeouts: 37
Total runtime: 17 mins

* No back-pressure
||Node||Dropped Mutations||Dropped Hints||
|1|353972|133429|
|2|258776|81981|
|3|636|0|
|4|13870|1710|
Timeouts: 74
Total runtime: 6 mins

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.
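
The high/low watermark idea in the issue description can be sketched roughly as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical, it tracks bytes only (not request counts), and it returns a flag instead of actually toggling reads on a client connection.

```java
/** Hypothetical sketch of watermark-based gating; not taken from Cassandra. */
final class InFlightLimiterSketch
{
    private final long lowWatermarkBytes;
    private final long highWatermarkBytes;
    private long inFlightBytes = 0;
    private boolean readsEnabled = true;

    InFlightLimiterSketch(long lowWatermarkBytes, long highWatermarkBytes)
    {
        if (lowWatermarkBytes <= 0 || lowWatermarkBytes > highWatermarkBytes)
            throw new IllegalArgumentException("expected 0 < low <= high");
        this.lowWatermarkBytes = lowWatermarkBytes;
        this.highWatermarkBytes = highWatermarkBytes;
    }

    /** Records a new in-flight request; returns whether client reads should stay enabled. */
    synchronized boolean onRequestStart(long bytes)
    {
        inFlightBytes += bytes;
        // Reaching the high watermark disables reads.
        if (inFlightBytes >= highWatermarkBytes)
            readsEnabled = false;
        return readsEnabled;
    }

    /** Records a completed request; returns whether client reads are (re-)enabled. */
    synchronized boolean onRequestComplete(long bytes)
    {
        inFlightBytes = Math.max(0L, inFlightBytes - bytes);
        // Reads resume only once the total drops back below the low watermark.
        if (!readsEnabled && inFlightBytes < lowWatermarkBytes)
            readsEnabled = true;
        return readsEnabled;
    }
}
```

The gap between the two watermarks gives hysteresis, so reads are not toggled on and off for every request when the load hovers near the limit.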



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
