cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Thu, 30 Jun 2016 09:17:10 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15356800#comment-15356800 ]

Stefania commented on CASSANDRA-9318:
-------------------------------------

Well, I read all of the discussion above and it perhaps took longer than actually reviewing
the code! :) If I understood correctly, what we are trying to achieve with this new approach
is reducing the number of dropped mutations rather than preventing OOM, but in a way that
doesn't cause OOM by invoking a rate-limiter acquire on a shared Netty worker pool thread,
which would slow down our ability to handle new client requests, as well as throw OEs once
in a while when a low water-mark is reached. Is this a fair description? Assuming we can prove
this in a larger scenario, I am +1 on the approach, but I would think it should go to trunk
rather than 3.0? In the CCM test performed with two nodes, what was the difference in write
timeouts?
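
For my own understanding, here is a toy model of the rate-based idea as I read it; the class, field and method names below are mine and purely illustrative, not the patch's API:

{code:java}
import com.google.common.util.concurrent.RateLimiter;

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: one limiter per replica, acquired on the write path before sending a
// mutation, with its rate shrunk when the replica's incoming (response) rate falls behind
// the outgoing (request) rate. The 1000 permits/s baseline is an arbitrary placeholder.
public class RateBasedBackPressureSketch
{
    private static final double BASE_RATE = 1000.0;

    private final Map<String, RateLimiter> limiters = new ConcurrentHashMap<>();

    public void onBeforeSend(String replica)
    {
        // May block the calling thread until the replica's budget allows another mutation.
        limiters.computeIfAbsent(replica, r -> RateLimiter.create(BASE_RATE)).acquire();
    }

    public void onRatesUpdated(String replica, double incomingRate, double outgoingRate)
    {
        // Replica falling behind => incoming/outgoing < 1 => proportionally lower send rate.
        double ratio = outgoingRate <= 0 ? 1.0 : Math.min(1.0, incomingRate / outgoingRate);
        limiters.computeIfAbsent(replica, r -> RateLimiter.create(BASE_RATE))
                .setRate(Math.max(1.0, BASE_RATE * ratio));
    }
}
{code}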

In terms of testing, here is what I would suggest (cc [~cassandra-te] for any help they may
be able to provide):

* verify we can reduce the number of dropped mutations in a larger (5-10 nodes) cluster with
multiple clients writing simultaneously
* some cstar perf tests to ensure ops per second are not degraded, for both reads and writes
* the dtests should be run with and without backpressure enabled
* we should do a bulk load test, for example with cqlsh COPY FROM

As for the code review, I've addressed some nits, mostly comments and trailing spaces, directly
[here|https://github.com/stef1927/cassandra/commit/334509c10a5f4f759550fb64902aabc8c4a4f40c].
The code is really of excellent quality and very well unit tested, so I didn't find much;
my remarks are mostly suggestions. I will however have another pass next week, now that
I am more familiar with the feature.

h5. conf/cassandra.yaml

I've edited the comments slightly, mostly to spell out for users why they may need a back
pressure strategy, and to put the emphasis on the default implementation first, then say how
to create new strategies. As you mentioned, new strategies may require new state parameters
anyway, so it's not clear how easy it will be to add new strategies. You may be able to improve
on my comments further.

Should we perhaps group all settings under something like back_pressure_options?

back_pressure_timeout_override: should we call it what it is, back_pressure_window_size,
and recommend in the comments that users increase the write timeout, or is overriding the
write timeout necessary for correctness?
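
To make the grouping idea concrete, this is roughly the shape I have in mind on the Config side; the field names below are placeholders for illustration, not a naming proposal:

{code:java}
import org.apache.cassandra.config.ParameterizedClass;

// Hypothetical grouping of the back-pressure settings under a single options object,
// mirroring a back_pressure_options section in cassandra.yaml. Field names are
// placeholders, not a naming proposal.
public class BackPressureOptions
{
    public boolean enabled = false;

    // Strategy class plus its parameters, e.g. RateBasedBackPressure with its own settings.
    public ParameterizedClass strategy;

    // Either a write timeout override or, per the naming question above, a window size.
    public long window_size_in_ms = 0;
}
{code}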

h5. src/java/org/apache/cassandra/config/Config.java

I moved the creation of the default parameterized class to RateBasedBackPressure, to avoid
polluting Config with strategy-specific constants.
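
Concretely, the idea looks roughly like this; parameter names and values below are placeholders, not necessarily the ones in the branch:

{code:java}
import com.google.common.collect.ImmutableMap;

import org.apache.cassandra.config.ParameterizedClass;

// Sketch of the idea: the strategy provides its own default ParameterizedClass, so
// Config.java no longer needs to carry strategy-specific parameter constants.
// Parameter names and values are placeholders for illustration.
public final class RateBasedBackPressureDefaults
{
    private RateBasedBackPressureDefaults() {}

    public static ParameterizedClass defaultStrategy()
    {
        return new ParameterizedClass("org.apache.cassandra.net.RateBasedBackPressure",
                                      ImmutableMap.of("high_ratio", "0.90", "factor", "5"));
    }
}
{code}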

h5. src/java/org/apache/cassandra/net/BackPressureState.java

Should we use a dedicated package? It would require a couple of getters and setters in the
state to keep the properties non-public, which I would prefer anyway.
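
By way of example, this is the kind of encapsulation I mean; the counters and names are illustrative only, not the actual state fields:

{code:java}
import java.util.concurrent.atomic.AtomicLong;

// Illustrative only: private fields with accessors, so the state can live in a dedicated
// package without its internals becoming public. The real state tracks rates rather than
// plain counters; this just shows the visibility pattern.
public class BackPressureStateSketch
{
    private final AtomicLong outgoingMessages = new AtomicLong();
    private final AtomicLong incomingResponses = new AtomicLong();

    void onMessageSent()
    {
        outgoingMessages.incrementAndGet();
    }

    void onResponseReceived()
    {
        incomingResponses.incrementAndGet();
    }

    public long getOutgoingMessages()
    {
        return outgoingMessages.get();
    }

    public long getIncomingResponses()
    {
        return incomingResponses.get();
    }
}
{code}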

h5. src/java/org/apache/cassandra/net/BackPressureStrategy.java

As above, should we use a dedicated package?

h5. src/java/org/apache/cassandra/net/RateBasedBackPressure.java

OK - I've changed the code slightly to calculate backPressureWindowHalfSize in seconds only
once per call.
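
The gist of the tweak, with made-up names to illustrate it:

{code:java}
import java.util.List;
import java.util.concurrent.TimeUnit;

// Illustration only: the half window is converted to seconds once per call and reused for
// every replica, instead of being re-derived inside the loop. Names are mine, not the patch's.
final class WindowHalfSizeExample
{
    static long[] limitsPerReplica(List<Double> replicaRatios, long windowSizeInMs)
    {
        long windowHalfSizeInSeconds = Math.max(1, TimeUnit.MILLISECONDS.toSeconds(windowSizeInMs / 2));
        long[] limits = new long[replicaRatios.size()];
        for (int i = 0; i < limits.length; i++)
            limits[i] = Math.round(replicaRatios.get(i) * windowHalfSizeInSeconds); // reuse the converted value
        return limits;
    }
}
{code}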

h5. src/java/org/apache/cassandra/utils/SystemTimeSource.java

Depending on the precision required, trunk has an approximate time service with a precision
of about 10 ms, which is faster than calling System.currentTimeMillis().
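
For reference, the general shape of such a service, as a generic sketch rather than the actual trunk class (class and thread names are mine):

{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Generic sketch of an approximate time source: a cached timestamp refreshed every ~10 ms
// by a daemon thread, so callers pay a volatile read instead of a System.currentTimeMillis()
// call. This shows the idea, not the actual trunk implementation.
public final class ApproximateTimeSketch
{
    private static volatile long nowMillis = System.currentTimeMillis();

    static
    {
        ScheduledExecutorService updater = Executors.newSingleThreadScheduledExecutor(runnable -> {
            Thread thread = new Thread(runnable, "approximate-time-updater");
            thread.setDaemon(true);
            return thread;
        });
        updater.scheduleAtFixedRate(() -> nowMillis = System.currentTimeMillis(), 10, 10, TimeUnit.MILLISECONDS);
    }

    public static long currentTimeMillis()
    {
        return nowMillis; // ~10 ms precision
    }

    private ApproximateTimeSketch() {}
}
{code}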

h5. src/java/org/apache/cassandra/utils/TestRateLimiter.java

Is this still used?

h5. src/java/org/apache/cassandra/utils/TestTimeSource.java

I think we need to move it under the test sources.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests and if it reaches a high watermark disable read on client connections until it goes back below some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other issues.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
