cassandra-commits mailing list archives

From "Jason Brown (JIRA)" <>
Subject [jira] [Updated] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput
Date Fri, 16 May 2014 17:39:18 GMT


Jason Brown updated CASSANDRA-4718:

    Attachment: stress_2014May16.txt

Next attachment (stress_2014May16). After creating a new data set (about 300GB per node) with populate=x (rather than uniform=0..x), I dropped the page cache. I then tried to pre-populate the page cache with a warm-up read round, like this:

{code}
./tools/bin/cassandra-stress read n=180664790 -key dist=extr\(1..600000000,2\) -rate threads=75 -mode native prepared cql3 -port native=9043 thrift=9161 -node ....
{code}

I took the value "180664790" from stress's print command. That command can suggest a reasonable dist count for hitting a given percentage of coverage under the different distribution models (note: I may be completely wrong on this point, so feel free to correct my understanding).

Then I ran the same three stress tests as in yesterday's run:
- one loading all 21 columns with the row;
- one loading the default column count (5);
- one iterating over only the default key count (1 million).

I then cleared the page cache and ran the same tests against the other branch, reusing the existing data but warming up the cache first.

The results suggest that, as yesterday, the sep branch had lower latencies than cassandra 2.1 but fewer ops/sec.

tbh, I'm not sure I proved a whole lot beyond yesterday's tests. dstat showed I was loading 200-300MB of data per second, so I was certainly giving the page cache a good workout in any

> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>                 Key: CASSANDRA-4718
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>         Attachments: 4718-v1.patch,, aws.svg, aws_read.svg, backpressure-stress.out.txt, baq vs trunk.png, belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg, jason_write.svg, op costs of various queues.ods, stress op rate with various queues.ods, stress_2014May15.txt, stress_2014May16.txt, v1-stress.out
> Currently all our execution stages dequeue tasks one at a time. This can result in contention between producers and consumers (although we do our best to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more work in "bulk" instead of just one task per dequeue. (Producer threads tend to be single-task oriented by nature, so I don't see an equivalent opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for this. However, no ExecutorService in the JDK supports using drainTo, nor could I google one.
> What I would like to do here is create just such a beast and wire it into (at least) the write and read stages. (Other possible candidates for such an optimization, such as the CommitLog and OutboundTCPConnection, are not ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful. The implementations of ICommitLogExecutorService may also be useful. (Despite the name these are not actual ExecutorServices, although they share the most important properties of one.)

This message was sent by Atlassian JIRA
