cassandra-commits mailing list archives

From "Sergio Bossa (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Fri, 16 Oct 2015 10:03:06 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14960450#comment-14960450
] 

Sergio Bossa commented on CASSANDRA-9318:
-----------------------------------------

I'd like to resurrect this one and, if time permits, take it by following Jonathan's proposal
above, except I'd also like to propose an additional form of back-pressure at the coordinator->replica
level. Such back-pressure would be applied by the coordinator when sending messages to *each*
replica if some kind of flow control condition is met (e.g. number of in-flight requests or
drop rate; we can discuss this further later, or even experiment); that is, each replica would
have its own flow control, allowing us to better fine-tune the applied back-pressure. The
memory-based back-pressure would at this point work as a kind of circuit breaker: if replicas
can't keep up, and the applied flow control causes too many requests to accumulate on the
coordinator, the memory-based limit will kick in and start pushing back to the client by either
pausing or throwing OverloadedException.
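To make the idea concrete, here is a minimal sketch of per-replica flow control combined with
a memory-based circuit breaker. All names, limits, and the exception type are illustrative
assumptions, not actual Cassandra internals:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: each replica gets its own in-flight window,
// and a global memory limit acts as the circuit breaker.
class ReplicaBackPressure {
    private final Semaphore inFlight; // per-replica in-flight window

    ReplicaBackPressure(int maxInFlight) {
        this.inFlight = new Semaphore(maxInFlight);
    }

    // Returns true if we may send to this replica right now.
    boolean tryAcquire() {
        return inFlight.tryAcquire();
    }

    // Called when the replica acks (or the request times out).
    void release() {
        inFlight.release();
    }
}

class Coordinator {
    private final AtomicLong queuedBytes = new AtomicLong();
    private final long memoryLimitBytes;

    Coordinator(long memoryLimitBytes) {
        this.memoryLimitBytes = memoryLimitBytes;
    }

    // If a replica's flow control is closed, requests queue up on the
    // coordinator; past the memory limit we push back to the client.
    void send(ReplicaBackPressure replica, long requestBytes) {
        if (replica.tryAcquire()) {
            // dispatch to replica ...
            return;
        }
        long queued = queuedBytes.addAndGet(requestBytes);
        if (queued > memoryLimitBytes)
            throw new RuntimeException("OverloadedException (sketch)");
        // otherwise the request stays queued until the replica frees a slot
    }
}
```

The point of the split is that a single slow replica only closes its own window, while the
memory limit only trips when the aggregate backlog becomes dangerous.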

There are obviously details we need to discuss and/or experiment with, e.g.:
1) The flow control algorithm (we could steal from the TCP literature, using something like
CoDel or Adaptive RED).
2) Whether to impose any limit on coordinator-level throttling, e.g. shedding requests that
have been throttled for too long (I would say no, because the memory limit should protect
against OOMs and allow the in-flight requests to be processed).
3) What to do when the memory limit is reached (we could make this policy-based).
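For point 1, a CoDel-style controller could be adapted from queue dropping to request
throttling: act only when the observed queueing delay stays above a target for a whole
interval, then throttle progressively harder. A hedged sketch, with illustrative target and
interval values (real tuning would differ):

```java
// CoDel-adapted throttle (sketch): track how long queueing delay has
// stayed above target, and throttle with a shrinking interval once it
// has persisted, per the CoDel control law.
class CoDelThrottle {
    static final long TARGET_MS = 5;      // acceptable queueing delay
    static final long INTERVAL_MS = 100;  // observation window

    private long firstAboveTargetAt = -1; // deadline armed at first crossing
    private boolean throttling = false;
    private int throttleCount = 0;
    private long nextThrottleAt = 0;

    // Feed the observed queueing delay of each request; returns true
    // if this request should be throttled (delayed or shed).
    boolean onRequest(long nowMs, long sojournMs) {
        if (sojournMs < TARGET_MS) {
            // Delay dropped below target: leave the throttling state.
            firstAboveTargetAt = -1;
            throttling = false;
            return false;
        }
        if (firstAboveTargetAt < 0) {
            // Delay just crossed target: wait one interval before acting.
            firstAboveTargetAt = nowMs + INTERVAL_MS;
            return false;
        }
        if (!throttling && nowMs >= firstAboveTargetAt) {
            // Delay stayed above target for a whole interval: start throttling.
            throttling = true;
            throttleCount = 1;
            nextThrottleAt = nowMs;
        }
        if (throttling && nowMs >= nextThrottleAt) {
            // Throttle progressively harder: interval / sqrt(count),
            // mirroring CoDel's drop-spacing control law.
            nextThrottleAt = nowMs + (long) (INTERVAL_MS / Math.sqrt(++throttleCount));
            return true;
        }
        return false;
    }
}
```

The appeal over a fixed in-flight cap is that it reacts to sustained delay rather than
instantaneous load spikes.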

I hope this makes sense, and I hope you see the reason behind it: dropped mutations are a
problem for many C* users, and even more so for C* applications that cannot rely on QUORUM
reads (e.g. inverted index queries, graph queries). The proposal above is not meant to be the
definitive solution, but it should greatly help reduce the number of dropped mutations on
replicas, which memory-based back-pressure alone doesn't (since by the time it kicks in,
without flow control the replicas will already be flooded with requests).

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Ariel Weisberg
>            Assignee: Jacek Lewandowski
>             Fix For: 2.1.x, 2.2.x
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
and if it reaches a high watermark disable read on client connections until it goes back below
some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
issues.
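The high/low watermark scheme from the issue description could look roughly like this.
Thresholds and method names are hypothetical, not Cassandra code; the commented-out channel
call just indicates where a real implementation would toggle socket reads:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the watermark idea: stop reading from client sockets once
// in-flight bytes cross the high watermark, resume below the low one.
// The gap between watermarks provides hysteresis, avoiding rapid
// enable/disable flapping around a single threshold.
class InFlightLimiter {
    private final long highWatermark, lowWatermark;
    private final AtomicLong inFlightBytes = new AtomicLong();
    private volatile boolean readsEnabled = true;

    InFlightLimiter(long high, long low) {
        this.highWatermark = high;
        this.lowWatermark = low;
    }

    // Called when a request is read off a client connection.
    void onRequestStart(long bytes) {
        if (inFlightBytes.addAndGet(bytes) >= highWatermark)
            readsEnabled = false; // e.g. stop reading from client channels
    }

    // Called when the request completes (success, error, or timeout).
    void onRequestEnd(long bytes) {
        if (inFlightBytes.addAndGet(-bytes) <= lowWatermark)
            readsEnabled = true;  // resume reading from clients
    }

    boolean readsEnabled() { return readsEnabled; }
}
```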



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
