cassandra-commits mailing list archives

From "Benedict (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput
Date Sat, 10 May 2014 22:12:54 GMT


Benedict commented on CASSANDRA-4718:

I have a few branches to test, and I want to test them on a variety of hardware. [~enigmacurry],
can you run them on our internal multi-CPU boxes and on a 4-node AWS c3.8xlarge cluster, to the
following spec:

For each branch run: 20M inserts over 1M unique keys with 30, 90, 270 and 810 threads, then
wipe each cluster and perform a single 1M key insert, and then run 20M reads over 1M unique
keys with the same thread counts. All told that should take around 3hrs for -mode cql3 native
prepared; I'd then like to repeat the tests for -mode thrift smart.
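
For reference, the run order described above can be written out as a small plan generator. This is only a sketch of the sequence: the thread counts, operation counts and mode names come from the message, while the class name, the method name and the "branch-under-test" placeholder are hypothetical, and no actual stress-tool command lines are implied.

```java
import java.util.ArrayList;
import java.util.List;

class StressPlan {
    // Thread counts taken from the message above.
    static final int[] THREAD_COUNTS = {30, 90, 270, 810};

    /** Returns the ordered steps for one branch in one client mode. */
    static List<String> stepsFor(String branch, String mode) {
        List<String> steps = new ArrayList<>();
        // Write phase: 20M inserts over 1M unique keys at each thread count.
        for (int threads : THREAD_COUNTS)
            steps.add(String.format("%s [%s]: 20M inserts over 1M keys, %d threads", branch, mode, threads));
        // Reset, then load a known data set for the read phase.
        steps.add(String.format("%s [%s]: wipe cluster", branch, mode));
        steps.add(String.format("%s [%s]: single 1M key insert", branch, mode));
        // Read phase: 20M reads over 1M unique keys at the same thread counts.
        for (int threads : THREAD_COUNTS)
            steps.add(String.format("%s [%s]: 20M reads over 1M keys, %d threads", branch, mode, threads));
        return steps;
    }

    public static void main(String[] args) {
        // Modes as named in the message; the branch name is a placeholder.
        String[] modes = {"cql3 native prepared", "thrift smart"};
        for (String mode : modes)
            for (String step : stepsFor("branch-under-test", mode))
                System.out.println(step);
    }
}
```

Each branch therefore goes through ten steps per mode: four insert runs, a wipe, a reload, and four read runs.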

The branches are: 

Make sure you use my cassandra-2.1 branch so we're testing like for like (they're all rebased to the
same version).

I'll elaborate on the contents of these branches later, but suffice it to say the 4718-lse
branch contains a new executor which attempts to reduce signalling costs to near zero by scheduling
the correct number of threads to deal with the level of throughput the executor has been dealing
with over the previous (short) adjustment window. -batchnetty includes some simple batching
of netty messages. 4718-lowsignal is an enhanced version of the patch I uploaded previously
to this ticket, and 4718-fjp is largely unchanged.
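
To make the idea of sizing the worker pool from recent throughput concrete, here is a minimal sketch. It is not the 4718-lse code, whose details are not given in this message. It assumes a simple stand-in rule based on Little's law: the mean number of busy workers over a window is the total service time demanded in that window divided by the window length. The class name, method name and parameters are all hypothetical.

```java
class WindowedSizing {
    /**
     * Estimates how many workers to keep running, given what was observed
     * over the previous adjustment window. Illustrative only.
     */
    static int targetWorkers(long tasksInWindow, long windowNanos,
                             long meanServiceNanos, int maxWorkers) {
        if (windowNanos <= 0 || maxWorkers < 1)
            throw new IllegalArgumentException("window and maxWorkers must be positive");
        // With no observed load, keep a single worker available.
        if (tasksInWindow <= 0 || meanServiceNanos <= 0)
            return 1;
        // Total busy time demanded in the window, divided by the window length,
        // gives the mean concurrency needed to keep up.
        double busyNanos = (double) tasksInWindow * meanServiceNanos;
        int needed = (int) Math.ceil(busyNanos / windowNanos);
        // Clamp to the range [1, maxWorkers].
        return Math.max(1, Math.min(maxWorkers, needed));
    }

    public static void main(String[] args) {
        // 50,000 tasks of 800ns each in a 10ms window need 4 busy workers.
        System.out.println(targetWorkers(50_000, 10_000_000L, 800, 32));
    }
}
```

If roughly that many workers stay awake, a producer would rarely need to wake a parked thread, which is presumably where the saving in signalling cost comes from.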

On my own box, and on our Austin test cluster, I see -lse faster than both -fjp and -lowsignal.
However, on the Austin cluster (a not especially modern 4-CPU setup without hyperthreading)
they are slower than stock 2.1: -lse is only slightly slower, whereas -fjp is around 30%
slower. I'll post polished numbers a little later.

> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>                 Key: CASSANDRA-4718
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>         Attachments: 4718-v1.patch,, backpressure-stress.out.txt,
> baq vs trunk.png, op costs of various queues.ods, stress op rate with various queues.ods,
>
> Currently all our execution stages dequeue tasks one at a time.  This can result in contention
> between producers and consumers (although we do our best to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more work in "bulk"
> instead of just one task per dequeue.  (Producer threads tend to be single-task oriented by
> nature, so I don't see an equivalent opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for this.
> However, no ExecutorService in the jdk supports using drainTo, nor could I google one.
> What I would like to do here is create just such a beast and wire it into (at least)
> the write and read stages.  (Other possible candidates for such an optimization, such as the
> CommitLog and OutboundTCPConnection, are not ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of ICommitLogExecutorService
> may also be useful. (Despite the name these are not actual ExecutorServices, although they
> share the most important properties of one.)
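
The bulk-dequeue idea in the issue description can be illustrated with a consumer loop built on BlockingQueue.drainTo(Collection, int). This is a minimal sketch, not code from any of the attached patches: the class name, the runOneBatch helper and the batch size of 64 are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

class BatchDrainDemo {
    static final int MAX_BATCH = 64; // illustrative batch size

    /**
     * Blocks until one task is available, then drains up to maxBatch - 1 more
     * without blocking, runs the whole batch, and returns its size.
     */
    static int runOneBatch(BlockingQueue<Runnable> queue, int maxBatch) throws InterruptedException {
        List<Runnable> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // wait for the first task
        queue.drainTo(batch, maxBatch - 1); // grab whatever else is already queued
        for (Runnable task : batch)
            task.run();
        return batch.size();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger executed = new AtomicInteger();
        for (int i = 0; i < 100; i++)
            queue.add(executed::incrementAndGet);

        int batches = 0;
        while (!queue.isEmpty()) {
            runOneBatch(queue, MAX_BATCH);
            batches++;
        }
        System.out.println(executed.get() + " tasks in " + batches + " batches");
    }
}
```

The consumer touches the queue once per batch rather than once per task, which is the contention saving the description is after. A real executor would also need shutdown handling and exception isolation per task, which this sketch leaves out.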
