cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput
Date Sat, 17 May 2014 09:28:17 GMT


Benedict commented on CASSANDRA-4718:

[~xedin] why are you only counting the primary replica data? Won't requests hit both replicas
by default? If you look at the results, there is a reasonable amount of variability in both
runs, so it's not clear that one is slower or faster: there are a number of points where
4718-sep is faster than 2.1, and vice versa, and given the test is disk bound I am inclined to
suggest it is not the patch making it perform worse. In fact, a majority of data points show
higher throughput for 4718-sep, not for 2.1. In your first test, 4718-sep is faster at every
thread count below 271; 271 seems to be a blip due to a small number of very slow reads affecting
the very last measurement (there's a "race" in stress' auto mode where some measurements are
still accepted after it's decided enough have been taken, as can be seen by the final stderr
being above the acceptability point). 2.1 showed a similar effect at this thread count, but
smaller, so this seems likely to be random chance. In the last test, 4718-sep is faster for all
thread counts despite some weird max latencies. It's only in the middle test that it appears to
be marginally slower, and given this test performs effectively the same amount of work as the
first test, I'm not sure this demonstrates a great deal other than the variability.

It's also worth asking what your max read concurrency is. I'm surprised to see thread counts
above 180 causing dramatic spikes in latency (on both branches), as I'd expect them to be
saturating the read stage well before then.

> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>                 Key: CASSANDRA-4718
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>         Attachments: 4718-v1.patch,, austin_diskbound_read.svg, aws.svg,
> aws_read.svg, backpressure-stress.out.txt, baq vs trunk.png, belliotsmith_branches-stress.out.txt,
> jason_read.svg, jason_read_latency.svg, jason_write.svg, op costs of various queues.ods, stress
> op rate with various queues.ods, stress_2014May15.txt, stress_2014May16.txt, v1-stress.out
>
> Currently all our execution stages dequeue tasks one at a time.  This can result in contention
> between producers and consumers (although we do our best to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more work in "bulk"
> instead of just one task per dequeue.  (Producer threads tend to be single-task oriented by
> nature, so I don't see an equivalent opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for this.
> However, no ExecutorService in the jdk supports using drainTo, nor could I google one.
> What I would like to do here is create just such a beast and wire it into (at least)
> the write and read stages.  (Other possible candidates for such an optimization, such as the
> CommitLog and OutboundTCPConnection, are not ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of ICommitLogExecutorService
> may also be useful. (Despite the name these are not actual ExecutorServices, although they
> share the most important properties of one.)
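
For illustration only, the bulk-dequeue idea in the issue description could be sketched roughly as
below. This is not the code from any attached patch: the class name BatchDrainSketch, the method
runOneBatch and the batch size of 32 are hypothetical names and values chosen for this example.
The only JDK calls relied on are BlockingQueue.take() and BlockingQueue.drainTo(Collection, int).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of a consumer that dequeues tasks in bulk rather than one at a time.
class BatchDrainSketch {
    // Assumed batch size; a real stage would need to tune this.
    static final int MAX_BATCH = 32;

    /**
     * Blocks until at least one task is available, then drains up to
     * maxBatch - 1 further tasks without blocking and runs the whole batch.
     * Returns the number of tasks run, or 0 if interrupted while waiting.
     */
    static int runOneBatch(BlockingQueue<Runnable> queue, int maxBatch) {
        List<Runnable> batch = new ArrayList<>(maxBatch);
        try {
            batch.add(queue.take());            // block for the first task only
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return 0;
        }
        queue.drainTo(batch, maxBatch - 1);     // grab whatever else is already queued
        for (Runnable task : batch) {
            task.run();
        }
        return batch.size();
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger executed = new AtomicInteger();
        for (int i = 0; i < 100; i++) {
            queue.add(executed::incrementAndGet);
        }
        int batches = 0;
        while (!queue.isEmpty()) {
            runOneBatch(queue, MAX_BATCH);
            batches++;
        }
        // 100 queued tasks with a batch size of 32 are run as 32 + 32 + 32 + 4
        System.out.println(executed.get() + " tasks in " + batches + " batches");
    }
}
```

In this sketch a consumer thread touches the queue twice per batch (one take, one drainTo) instead
of once per task, which is the reduction in producer/consumer contention the description is after.
Whether that translates into higher throughput in practice is what the stress runs discussed above
are trying to measure.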

This message was sent by Atlassian JIRA
