cassandra-commits mailing list archives

From "Stefania (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9318) Bound the number of in-flight requests at the coordinator
Date Mon, 04 Jul 2016 03:21:11 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15360784#comment-15360784 ]

Stefania commented on CASSANDRA-9318:
-------------------------------------

bq. For that specific test I've got no client timeouts at all, as I wrote at ONE.

Sorry, I should have been clearer: what were the {{write_request_timeout_in_ms}} and
{{back_pressure_timeout_override}} yaml settings?

bq. Agreed with all your points. I'll see what I can do, but any help/pointers will be very
appreciated.

We can do the following:

bq. verify we can reduce the number of dropped mutations in a larger (5-10 nodes) cluster
with multiple clients writing simultaneously

I will ask the TEs for help; more details to follow.

bq. some cstar perf tests to ensure ops per second are not degraded, both read and writes
    
We can launch a comparison test [here|http://cstar.datastax.com]; 30M rows should be enough.
I can launch it for you if you don't have an account.

bq. the dtests should be run with and without backpressure enabled
    
This can be done by temporarily changing cassandra.yaml on your branch and then launching
the dtests.

bq. we should do a bulk load test, for example for cqlsh COPY FROM

I can take care of this. I don't expect problems, because COPY FROM should contact the replicas
directly; it's just a box I want to tick. Importing 5 to 10M rows with 3 nodes should be sufficient.
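
For reference, here is a minimal sketch of how the input file for such an import could be generated. It assumes a hypothetical table {{ks.bulk_test (id int PRIMARY KEY, payload text)}}; the function name, file name and schema are illustrative only and are not part of the patch.

```python
import csv
import io


def write_rows(out, num_rows, payload_size=32):
    """Write num_rows CSV rows of (id, payload) to the file-like object out.

    The two-column layout matches the hypothetical ks.bulk_test table
    assumed above; adjust it to the real schema under test.
    """
    writer = csv.writer(out)
    for i in range(num_rows):
        # Deterministic, fixed-width payload so repeated runs import identical data.
        payload = format(i, "x").rjust(payload_size, "0")
        writer.writerow([i, payload])
    return num_rows


# Example (not executed here): write 5M rows to a file for COPY FROM.
# with open("bulk_test.csv", "w", newline="") as f:
#     write_rows(f, 5_000_000)
```

The resulting file could then be imported from cqlsh with something like {{COPY ks.bulk_test (id, payload) FROM 'bulk_test.csv'}}, again using the hypothetical table above.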

bq. Please send me a PR and I'll incorporate those in my branch

I couldn't create a PR; for some reason sbtourist/cassandra wasn't in the base fork list.
I've attached a patch to this ticket instead, [^9318-3.0-nits-trailing-spaces.patch].

bq. I find the current layout effective and simple enough, but I'll not object if you want
to push those under a common "container" option.

The encryption options are what I was aiming at, but it's true that for everything else we
have a flat layout, so let's leave it as it is.

bq. I don't like much that name either, as it doesn't convey very well the (double) meaning;
making the back-pressure window the same as the write timeout is not strictly necessary, but
it makes the algorithm behave better in terms of reducing dropped mutations as it gives replica
more time to process its backlog after the rate is reduced. Let me think about that a bit
more, but I'd like to avoid requiring the user to increase the write timeout manually, as
again, it reduces the effectiveness of the algorithm.

I'll let you think about it. Maybe a boolean property that is true by default and that clearly
indicates that the timeout is overridden, although this complicates things somewhat.

bq. Sure I can switch to that on trunk, if you think it's worth performance-wise (I can write
a JMH test if there isn't one already).

The precision is only 10 milliseconds. If that is acceptable, it would be interesting to see
what the difference in performance is.

bq. It is not used in any unit tests code, but it is used in my manual byteman tests, and
unfortunately I need it on the C* classpath; is that a problem to keep it?

Sorry, I missed the byteman imports and helper. Let's just move it to the test source folder
and add a comment.

--

The rest of the CR points are fine. 

One thing we did not confirm is whether you are happy committing this only to trunk or whether
you need it in 3.0. Strictly speaking, 3.0 accepts only bug fixes, not new features. However,
this is an optional feature that solves a problem (dropped mutations) and is disabled
by default, so we have a case for an exception.

> Bound the number of in-flight requests at the coordinator
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-9318
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9318
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Local Write-Read Paths, Streaming and Messaging
>            Reporter: Ariel Weisberg
>            Assignee: Sergio Bossa
>         Attachments: 9318-3.0-nits-trailing-spaces.patch, backpressure.png, limit.btm, no_backpressure.png
>
>
> It's possible to somewhat bound the amount of load accepted into the cluster by bounding
> the number of in-flight requests and request bytes.
> An implementation might do something like track the number of outstanding bytes and requests
> and if it reaches a high watermark disable read on client connections until it goes back below
> some low watermark.
> Need to make sure that disabling read on the client connection won't introduce other
> issues.
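
The high/low watermark idea in the description above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name, method names and thresholds are hypothetical, and it makes no claim about how the patch attached to this ticket works.

```python
class InFlightLimiter:
    """Hypothetical tracker of outstanding requests and request bytes.

    Reads are paused once either count reaches its high watermark and are
    resumed only after both counts fall back below their low watermarks.
    """

    def __init__(self, high_bytes, low_bytes, high_requests, low_requests):
        if low_bytes > high_bytes or low_requests > high_requests:
            raise ValueError("low watermark must not exceed high watermark")
        self.high_bytes = high_bytes
        self.low_bytes = low_bytes
        self.high_requests = high_requests
        self.low_requests = low_requests
        self.bytes = 0
        self.requests = 0
        self.reads_paused = False

    def on_request_start(self, size):
        """Account for a new in-flight request; return True if reads are paused."""
        self.requests += 1
        self.bytes += size
        if self.bytes >= self.high_bytes or self.requests >= self.high_requests:
            self.reads_paused = True
        return self.reads_paused

    def on_request_done(self, size):
        """Account for a completed request; return True if reads are still paused."""
        self.requests -= 1
        self.bytes -= size
        below_low = self.bytes < self.low_bytes and self.requests < self.low_requests
        if self.reads_paused and below_low:
            self.reads_paused = False
        return self.reads_paused
```

The gap between the two watermarks is what keeps the connection from flapping between paused and resumed on every request; a real implementation would also need to be thread-safe, which this sketch is not.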



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
