cassandra-commits mailing list archives

From "Jonathan Ellis (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-4718) More-efficient ExecutorService for improved throughput
Date Thu, 11 Apr 2013 21:59:16 GMT


Jonathan Ellis commented on CASSANDRA-4718:

Let me give a little more color as to what our existing stages are.  Most of these are ThreadPoolExecutors
connected by LinkedBlockingQueue.

A client sends each request to a node in the cluster called the Coordinator.  The coordinator
stages are:
# Request: either Thrift or Netty reads the request from the client
# StorageProxy: the coordinator validates the request and decides which replicas need to be involved
# MessagingService (out): the coordinator sends the requests to the appropriate replicas
# MessagingService (in): the coordinator reads the reply
# Response: the coordinator processes callbacks for the reply
# StorageProxy: this thread will have been waiting on a Future or a Condition for the callbacks,
and can now reply to the client

When a replica receives a message, it also goes through a few stages:
# MessagingService (in): the replica reads the coordinator's request
# Read or Write: fetch or append the data specified by the request
# MessagingService (out): the replica sends the result to the coordinator
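The stage handoff described in both lists can be sketched in miniature: each stage is a ThreadPoolExecutor fed by an unbounded LinkedBlockingQueue, and a request hops from one stage's queue to the next. The class and stage names below are illustrative, not Cassandra's actual code:

```java
import java.util.concurrent.*;

public class StagedPipeline {
    // Each stage: a ThreadPoolExecutor fed by an unbounded LinkedBlockingQueue.
    static ExecutorService newStage(int threads) {
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                                      new LinkedBlockingQueue<Runnable>());
    }

    // Push one request through two stages and hand the reply back to the caller.
    static String process(String key) throws Exception {
        ExecutorService readStage = newStage(2);      // stand-in for the Read stage
        ExecutorService responseStage = newStage(1);  // stand-in for the Response stage
        CompletableFuture<String> reply = new CompletableFuture<>();
        readStage.execute(() -> {
            String data = "value-for-" + key;         // pretend this came from disk
            // Hand off to the next stage's queue rather than replying directly.
            responseStage.execute(() -> reply.complete(data));
        });
        try {
            return reply.get(5, TimeUnit.SECONDS);
        } finally {
            readStage.shutdown();
            responseStage.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("client got: " + process("row-42"));
    }
}
```

Every hop between stages is an enqueue/dequeue on a LinkedBlockingQueue, which is where the per-task handoff overhead discussed below comes from.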

So the obstacles I see to incorporating Disruptor are

- MessagingService.  This is an exception to the rule in that it is not actually a ThreadPoolExecutor;
we have a custom thread pool per replica that does some gymnastics to keep its queue from
growing indefinitely when a replica gets behind (CASSANDRA-3005).  MS uses blocking sockets;
long ago, we observed this to give better performance than NIO.  I'd be willing to evaluate
redoing this on e.g. Netty, but:
- More generally, requests are not constant-size, which makes Disruptor Entry re-use difficult
- The read stage is basically forced to be a separate thread pool because of blocking I/O
from disk
- StorageProxy is not yet asynchronous
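To illustrate the variable-size problem: Disruptor-style designs pre-allocate every slot once and reuse it forever, which only pays off if a slot can actually hold any request. This sketch (my own toy ring, not Disruptor's real API) shows the awkward choice a fixed-size slot forces when a request overflows it:

```java
import java.nio.ByteBuffer;

public class PreallocatedRing {
    // A fixed slot is either sized for the worst case (wasting memory) or must
    // fall back to a fresh allocation on overflow, defeating the reuse.
    static final int SLOT_SIZE = 64;
    final ByteBuffer[] slots;

    PreallocatedRing(int n) {
        slots = new ByteBuffer[n];
        for (int i = 0; i < n; i++) slots[i] = ByteBuffer.allocate(SLOT_SIZE);
    }

    // Returns true if the request fit into the reused slot, false if we had
    // to re-allocate (losing the pre-allocation benefit).
    boolean publish(int seq, byte[] request) {
        ByteBuffer slot = slots[seq % slots.length];
        slot.clear();
        if (request.length <= slot.capacity()) {
            slot.put(request);
            return true;
        }
        slots[seq % slots.length] = ByteBuffer.wrap(request); // overflow: allocate
        return false;
    }

    public static void main(String[] args) {
        PreallocatedRing ring = new PreallocatedRing(8);
        System.out.println(ring.publish(0, new byte[32]));   // fits in the slot
        System.out.println(ring.publish(1, new byte[1024])); // overflows the slot
    }
}
```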

Addressing the last of these is straightforward, but the others give me pause.

What I'd like to do is pick one part of the system and see whether converting it to Disruptor gives
a big enough win to be worth pursuing with a full-scale conversion, but given how Disruptor
wants to manage everything, I'm not sure how to do that either!
> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>                 Key: CASSANDRA-4718
>                 URL:
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Priority: Minor
>         Attachments: baq vs trunk.png,
> Currently all our execution stages dequeue tasks one at a time.  This can result in contention
> between producers and consumers (although we do our best to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more work in "bulk"
> instead of just one task per dequeue.  (Producer threads tend to be single-task oriented by
> nature, so I don't see an equivalent opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for this.
> However, no ExecutorService in the JDK supports using drainTo, nor could I google one.
> What I would like to do here is create just such a beast and wire it into (at least)
> the write and read stages.  (Other possible candidates for such an optimization, such as the
> CommitLog and OutboundTCPConnection, are not ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of ICommitLogExecutorService
> may also be useful.  (Despite the name these are not actual ExecutorServices, although they
> share the most important properties of one.)
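The drainTo batching idea quoted above can be sketched as a consumer loop: block once for the first task, then sweep up everything else queued in a single call, so the queue's lock is touched once per batch instead of once per task. The class and method names here are illustrative, not the proposed implementation:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class BatchingConsumer {
    // One blocking take() to wait for work, then one drainTo() for the rest.
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int maxBatch)
            throws InterruptedException {
        List<T> batch = new ArrayList<>(maxBatch);
        batch.add(queue.take());            // block until at least one task arrives
        queue.drainTo(batch, maxBatch - 1); // grab whatever else is already queued
        return batch;
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) queue.put(i);
        List<Integer> batch = nextBatch(queue, 32);
        System.out.println("drained " + batch.size() + " tasks in one pass");
    }
}
```

A consumer thread in such an ExecutorService would call nextBatch in its run loop and execute the whole batch before returning to the queue.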

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see:
