cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sergio Bossa (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Thu, 21 Jul 2016 19:39:20 GMT


Sergio Bossa commented on CASSANDRA-9318:


bq. We just need to decide if we are happy having a back pressure state for each OutboundTcpConnectionPool,
even though back pressure may be disabled or some strategies may use different metrics, e.g.
total in-flight requests or memory, or whether the states should be hidden inside the strategy.

As I said, if you move the states inside the strategy, you make the strategy "fattier", and
I don't like that. I think keeping the states associated to the connections (as a surrogate
for the actual nodes) makes sense, because that's what back-pressure is about; any strategy
that doesn't take into account per-node state, is IMHO not a proper back-pressure strategy,
and can be implemented in the same way as hints overflowing is implemented (that is, outside
the back-pressure APIs).

Anyway, I've made the {{BackPressureState}} interface more generic so it can react on messages
when they're about to be sent, which should allow to implement different strategies as you
mentioned. This should also make the current {{RateBased}} implementation clearer (see below),
which is nice :)

bq. Now the back pressure is applied before hinting, or inserting local mutations, or sending
to remote data centers, but after sending to local replicas. I think we may need to rework
SP.sendToHintedEndpoints a little bit, I think we want to fire the insert local, then block,
then send all messages.

Apologies, the {{SP}} method went through several rewrites in our back and forth and I didn't
pay enough attention during the last one, totally stupid mistake. Should be fixed now, but
I've kept the back-pressure application on top, as in case the strategy implementation wants
to terminate the request straight away (i.e. by throwing an exception) it doesn't make sense
to partially send messages (that is, either all or nothing is IMHO better).

bq. There is one more potential issue in the case of non local replicas. We send the mutation
only to one remote replica, which then forwards it to other replicas in that DC. So remote
replicas may have the outgoing rate set to zero, and therefore the limiter rate set to positive
infinity, which means we won't throttle at all with FAST flow selected.

The {{RateBased}} implementation doesn't "count" outgoing requests when they're sent, which
would cause the bug you mentioned, but when the response is received or the callback is expired,
in order to guarantee a consistent counting between outgoing and incoming messages, i.e. if
outgoing messages are more than incoming ones we know that's because of timeouts, not because
some requests are still in-flight; this makes the algorithm way more stable, and also overcomes
the problem you mentioned. The latest changes to the state interface should hopefully make
this implementation detail clearer.

bq. In fact, we have no way of reliably knowing the status of remote replicas, or replicas
to which we haven't sent any mutations recently, and we should perhaps exclude them.

This can only happen when a new replica is injected into the system, or if it comes back alive
after being dead, otherwise we always send mutations to all replicas, or am I missing something?
There's no lock-free way to remove such replicas, because "0 outgoing" is a very transient
state, and that's actually the point: the algorithm is designed to work at steady state, and
after a single "window" (rpc timeout seconds) the replica state will have received enough

bq. I think we can start testing once we have sorted the second discussion point above, as
for the API issues, we'll eventually need to reach consensus but we don't need to block the
tests for it.

Please have a look at my changes, I will do one more review pass myself as I think some aspects
of the rate-based algorithm can be improved, then if we're all happy I will go ahead with
rebasing to trunk and then moving into testing.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>                 Key: CASSANDRA-9318
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm,
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
and if it reaches a high watermark disable read on client connections until it goes back below
some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other

This message was sent by Atlassian JIRA

View raw message