cassandra-commits mailing list archives

From "Benedict (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput
Date Sat, 17 May 2014 17:33:19 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000830#comment-14000830 ]

Benedict edited comment on CASSANDRA-4718 at 5/17/14 5:32 PM:
--------------------------------------------------------------

I meant 250Kop/s. We're now pushing 6Kop/s. The numbers from 16th May are the latest posted,
to my knowledge, and the ones we're discussing?

You can make stress do a fixed number of ops per run, but not currently a fixed set of thread
counts. Its auto mode (which these results are from) ramps up thread counts until it detects a
plateau; in these tests sep seems to have reached a higher throughput rate earlier, so when it
normalised down again stress considered it to have plateaued earlier. As to #2, run1, when
truncated at a lower thread count (tc), is as fast as stock is at its peak. However, you're right
that it is possible it would have tanked further. That would be indicative of a bug rather than
a fundamental flaw in its design, but the dip is almost certainly down to the natural tendency
to fall slightly below peak throughput after the real plateau.
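
To illustrate why an earlier peak leads to an earlier stop, here is a minimal sketch of a
ramp-until-plateau loop. It is not the actual stress code: the class name, the doubling step
and the minGain threshold are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntToDoubleFunction;

/** Hypothetical ramp-until-plateau loop; NOT the actual cassandra-stress implementation. */
final class PlateauRamp {
    /**
     * Doubles the thread count from minThreads up to maxThreads, measuring throughput at
     * each step, and stops as soon as a step fails to beat the best rate seen so far by
     * at least minGain (e.g. 0.05 for 5%). Returns the thread counts actually run.
     */
    static List<Integer> ramp(int minThreads, int maxThreads, double minGain,
                              IntToDoubleFunction measureOpsPerSec) {
        if (minThreads < 1 || minThreads > maxThreads) {
            throw new IllegalArgumentException("need 1 <= minThreads <= maxThreads");
        }
        List<Integer> tested = new ArrayList<>();
        double best = 0.0;
        int threads = minThreads;
        while (true) {
            double rate = measureOpsPerSec.applyAsDouble(threads);
            tested.add(threads);
            if (best > 0.0 && rate < best * (1.0 + minGain)) {
                break; // no meaningful improvement over the best rate: treat as a plateau
            }
            best = Math.max(best, rate);
            if (threads > maxThreads / 2) {
                break; // the next doubling would leave the requested range
            }
            threads *= 2;
        }
        return tested;
    }
}
```

With a loop like this, a branch that gets close to its peak at a low thread count and then
dips slightly is cut off earlier than one that climbs more slowly, so the two branches end
up being measured over different thread counts.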

I can patch stress briefly to force it to run all thread counts in the requested range, instead
of stopping when it hits a plateau, but the auto mode isn't really designed to be a canonical
test. If we want accurate like-for-like comparisons we should graph each thread count separately
for its whole run, and ensure each run is long enough to show the general behavioural pattern
(i.e. at least a few minutes for IO-bound work). I'd also interleave the two branches to try
to avoid any weird page caching / other utilisation interference.
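
As a rough sketch of that methodology, the run order could be planned like this. The class,
the branch labels and the durations are hypothetical placeholders, and nothing here invokes
stress itself; it only produces the order in which runs would be executed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical run planner; branch names, thread counts and durations are placeholders. */
final class InterleavedPlan {
    /** One benchmark run: a branch at a fixed thread count for a fixed duration. */
    static final class Run {
        final String branch;
        final int threads;
        final int minutes;

        Run(String branch, int threads, int minutes) {
            this.branch = branch;
            this.threads = threads;
            this.minutes = minutes;
        }

        @Override
        public String toString() {
            return branch + "@" + threads + "t/" + minutes + "m";
        }
    }

    /**
     * For every thread count, schedules one run per branch back to back, so the branches
     * alternate rather than one branch running all of its thread counts first.
     */
    static List<Run> plan(List<String> branches, int[] threadCounts, int minutes) {
        List<Run> runs = new ArrayList<>();
        for (int threads : threadCounts) {
            for (String branch : branches) {
                runs.add(new Run(branch, threads, minutes));
            }
        }
        return runs;
    }
}
```

Each planned run would then be graphed on its own for its full duration, rather than being
folded into a single ramp.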

That said, I don't think we're going to learn a great deal by doing any of that. But I'm always
pleased to see more data points (I will be continuing to run more tests, and burn-in tests
on the executor service).


> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, austin_diskbound_read.svg, aws.svg,
>                      aws_read.svg, backpressure-stress.out.txt, baq vs trunk.png,
>                      belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg,
>                      jason_run1.svg, jason_run2.svg, jason_run3.svg, jason_write.svg,
>                      op costs of various queues.ods, stress op rate with various queues.ods,
>                      stress_2014May15.txt, stress_2014May16.txt, v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can result in contention
> between producers and consumers (although we do our best to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more work in "bulk"
> instead of just one task per dequeue.  (Producer threads tend to be single-task oriented by
> nature, so I don't see an equivalent opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for this.
> However, no ExecutorService in the jdk supports using drainTo, nor could I google one.
> What I would like to do here is create just such a beast and wire it into (at least)
> the write and read stages.  (Other possible candidates for such an optimization, such as the
> CommitLog and OutboundTCPConnection, are not ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of ICommitLogExecutorService
> may also be useful. (Despite the name these are not actual ExecutorServices, although they
> share the most important properties of one.)
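
The bulk-dequeue idea in the description can be sketched roughly as follows. This is only an
illustration of a consumer built on BlockingQueue.take() and drainTo(collection, int); the
DrainingWorker class and its maxBatch parameter are hypothetical and are not taken from any
of the attached patches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;

/** Hypothetical batch-draining consumer; a sketch of the idea, not the CASSANDRA-4718 patch. */
final class DrainingWorker implements Runnable {
    private final BlockingQueue<Runnable> queue;
    private final int maxBatch;
    private final List<Runnable> batch;

    DrainingWorker(BlockingQueue<Runnable> queue, int maxBatch) {
        if (maxBatch < 1) {
            throw new IllegalArgumentException("maxBatch must be >= 1");
        }
        this.queue = queue;
        this.maxBatch = maxBatch;
        this.batch = new ArrayList<>(maxBatch);
    }

    /** Blocks for one task, drains up to maxBatch - 1 more, runs them all; returns the count. */
    int runOneBatch() throws InterruptedException {
        batch.add(queue.take());            // wait until at least one task is available
        queue.drainTo(batch, maxBatch - 1); // take whatever else is already queued, without blocking
        int ran = batch.size();
        try {
            for (Runnable task : batch) {
                task.run();
            }
        } finally {
            batch.clear();                  // never re-run tasks if one of them throws
        }
        return ran;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                runOneBatch();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the flag and let the worker exit
        }
    }
}
```

A real ExecutorService would also need shutdown handling, rejection policy and exception
reporting, which this sketch deliberately leaves out.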



--
This message was sent by Atlassian JIRA
(v6.2#6252)