cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-9558) Cassandra-stress regression in 2.2
Date Mon, 08 Jun 2015 18:29:01 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-9558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577606#comment-14577606 ]

Benedict commented on CASSANDRA-9558:
-------------------------------------

bq. Evidently this is exactly what writeAndFlush does which is what the driver is using when
coalescing is disabled, but i'll keep exploring alternatives

But it's not a choice between the two. There should absolutely be coalescing, and it should
never be disabled. The question is whether we should artificially delay our messages in order
to coalesce more of them. On a client I cannot see that making sense. On the server, we expect
there to be other useful work to do, producing more responses that can be coalesced together.
On a client, however, we should not make that assumption: if the client is synchronously
waiting for a result, we're pointlessly delaying it (and we cannot know if this is the case),
whereas if it is asynchronously producing work, that work will accumulate or not, completely
independently of our delay. After the first, potentially more costly, message the costs will
reach a steady state that the delay is unlikely to improve.

The main idea of the delay on the server is that it permits the server to exhaust its current
burst of messages (if possible), so that all messages that would naturally be grouped, given
the chance, can be.
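
To make the distinction concrete, here is a minimal sketch of the two strategies. It is illustrative only and is not the driver's or the server's actual code: the function names, the `pending` queue, and the `max_batch` and `window_seconds` parameters are all hypothetical. The first function coalesces whatever is already queued and flushes immediately; the second additionally waits for a fixed window in the hope that more messages arrive.

```python
import queue
import time


def coalesce_available(pending, max_batch):
    """Block for the first message, then take whatever else is already queued.

    No artificial delay: the batch is returned as soon as the queue is empty.
    """
    batch = [pending.get()]  # wait only for the first message
    while len(batch) < max_batch:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break  # nothing else is ready, so flush now
    return batch


def coalesce_with_window(pending, max_batch, window_seconds):
    """Block for the first message, then keep waiting up to window_seconds
    for further messages before returning the batch."""
    batch = [pending.get()]
    deadline = time.monotonic() + window_seconds
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break  # window expired
        try:
            batch.append(pending.get(timeout=remaining))
        except queue.Empty:
            break  # window expired while waiting
    return batch
```

With a single synchronous caller, the windowed variant returns the same one-message batch as the immediate variant, only `window_seconds` later, which is the pointless delay described above.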

That all said, some basic back-of-envelope maths suggests the delay cannot sufficiently account
for the problem in this case. That doesn't mean we shouldn't change it, but it is unlikely
to explain this ticket.

We should really try to profile the client and server, to establish which is the bottleneck,
and where. It should not be the case that we need multiple threads to deal with this workload:
we're effectively batching up to 300 of these messages together, over a single point-to-point
high-bandwidth TCP connection, so costs are maximally amortized. The fact that this cannot cope
with more than 7MB/s is crazy. It is possible we're hitting another weird issue with interrupt
queues in AWS.

> Cassandra-stress regression in 2.2
> ----------------------------------
>
>                 Key: CASSANDRA-9558
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9558
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Alan Boudreault
>            Priority: Blocker
>         Attachments: 2.1.log, 2.2.log, CASSANDRA-9558-2.patch, CASSANDRA-9558-ProtocolV2.patch,
atolber-CASSANDRA-9558-stress.tgz, atolber-trunk-driver-coalescing-disabled.txt, stress-2.1-java-driver-2.0.9.2.log,
stress-2.1-java-driver-2.2+PATCH.log, stress-2.1-java-driver-2.2.log, stress-2.2-java-driver-2.2+PATCH.log,
stress-2.2-java-driver-2.2.log
>
>
> We are seeing some regression in performance when using cassandra-stress 2.2. You can
see the difference at this url:
> http://riptano.github.io/cassandra_performance/graph_v5/graph.html?stats=stress_regression.json&metric=op_rate&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=108.57&ymin=0&ymax=168147.1
> The cassandra version of the cluster doesn't seem to have any impact. 
> //cc [~tjake] [~benedict]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
