cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8457) nio MessagingService
Date Tue, 06 Jan 2015 17:04:36 GMT


Ariel Weisberg commented on CASSANDRA-8457:

I think I stumbled onto what is going on, based on Benedict's suggestion to disable TCP no
delay. It looks like there is a small-message performance issue.
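
For reference, "TCP no delay" is the standard TCP_NODELAY socket option, which turns off Nagle's algorithm. The snippet below is only a minimal sketch of toggling it on a plain socket; the helper name set_tcp_nodelay is made up for illustration and this is not how Cassandra configures its own connections.

```python
import socket


def set_tcp_nodelay(sock: socket.socket, enabled: bool) -> bool:
    """Enable or disable TCP_NODELAY and return the state the OS reports back."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    # No delay on: Nagle is disabled, so small writes are sent immediately.
    nodelay_on = set_tcp_nodelay(sock, True)
    # No delay off: Nagle is enabled, so the kernel may merge small writes.
    nodelay_off = set_tcp_nodelay(sock, False)
finally:
    sock.close()
```

With no delay off, the kernel does some of the batching that would otherwise have to happen in the application.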

This is something I have seen before in EC2, where you can only send a surprisingly small number
of messages in/out of a VM. I don't have the numbers from when I micro-benchmarked it, but
it is something like 450k messages with TCP no delay and a million or so without. Adding more
sockets helps, but it doesn't even double the number of messages in/out. Throwing more cores
at the problem doesn't help; you just end up with underutilized cores, which matches the mysterious
levels of starvation I was seeing in C* even though I was exposing sufficient concurrency.

14 server nodes. 6 client nodes. 500 threads per client. Server started with
        "row_cache_size_in_mb" : "2000",
        "rpc_max_threads" : "1024",
        "rpc_min_threads" : "16",
        "native_transport_max_threads" : "1024"
8 GB old gen, 2 GB new gen.

Client running CL=ALL and the same schema I have been using throughout this ticket.

With no delay off
First set of runs
After replacing 10 instances

No delay on 

Modified trunk to fix a bug in message batching and add a configurable window for coalescing
multiple messages into a socket see
||Coalesce window (microseconds)||Throughput||
|250| 502614|
|200| 496206|
|150| 487195|
|100| 423415|
|50| 326648|
|25| 308175|
|12| 292894|
|6| 268456|
|0| 153688|
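
To put the table in relative terms, here is a small sketch that divides each throughput figure by the 0 microsecond (no coalescing) row. The numbers are copied from the table above; the variable names are just for illustration.

```python
# (coalesce window in microseconds, throughput) pairs copied from the table above
results = [
    (250, 502614), (200, 496206), (150, 487195),
    (100, 423415), (50, 326648), (25, 308175),
    (12, 292894), (6, 268456), (0, 153688),
]

# Throughput with a 0 microsecond window, i.e. no coalescing
baseline = dict(results)[0]

# Speedup of each window relative to the no-coalescing baseline
speedup = {window_us: throughput / baseline for window_us, throughput in results}

for window_us, throughput in results:
    print(f"{window_us:>4} us  {throughput:>7}  {speedup[window_us]:.2f}x")
```

The 250 microsecond window comes out at roughly 3.27x the baseline and even the 6 microsecond window at roughly 1.75x, while going from 150 to 250 microseconds adds only about 3%.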

I did not expect to get mileage out of coalescing at the application level, but it works extremely
well. CPU utilization is still low at 1800%. There seems to be less correlation between CPU
utilization and throughput: as I vary the coalescing window, throughput changes dramatically.
I do see that core 0 is looking pretty saturated and is only 10% idle. That might be the next
or actual bottleneck.
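
As a rough illustration of what coalescing at the application level means here, the sketch below blocks for one outbound message and then gathers whatever else arrives within the window before handing the batch to a single socket write. It is a simplified Python model, not the patch on this ticket; the function name coalesce and the max_batch cap are assumptions.

```python
import queue
import time
from typing import List


def coalesce(outbound: queue.Queue, window_us: int, max_batch: int = 128) -> List[bytes]:
    """Block for one message, then gather more for up to window_us microseconds."""
    batch = [outbound.get()]  # wait for the first message
    deadline = time.monotonic() + window_us / 1_000_000
    while len(batch) < max_batch:
        remaining = deadline - time.monotonic()
        try:
            if remaining > 0:
                # Window still open: wait for the next message until the deadline.
                batch.append(outbound.get(timeout=remaining))
            else:
                # Window closed: take only what is already queued.
                batch.append(outbound.get_nowait())
        except queue.Empty:
            break
    return batch


q = queue.Queue()
for payload in (b"a", b"b", b"c"):
    q.put(payload)
batch = coalesce(q, window_us=0)
wire_bytes = b"".join(batch)  # one write for the whole batch instead of three
```

A window of 0 still batches messages that are already queued; a larger window trades per-message delay for fewer, larger writes.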

What role this optimization plays at different cluster sizes is an important question. There
has to be a tipping point where coalescing stops working because not enough packets go to
each endpoint at the same time. With vnodes it wouldn't be unusual to be communicating with
a large number of other hosts, right?
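
A back-of-the-envelope way to look at that tipping point: if a node's outbound messages are spread evenly over its peers, the expected number of messages per peer per window is rate / peers * window. The sketch below uses a hypothetical rate of 500,000 messages per second and a 200 microsecond window; neither figure is measured on this ticket, and the uniform spread is an assumption.

```python
def messages_per_window(total_msgs_per_sec: float, peers: int, window_us: float) -> float:
    """Expected messages queued for one peer during one window, assuming a uniform spread."""
    return total_msgs_per_sec / peers * window_us / 1_000_000


# Hypothetical outbound rate and window; 13 peers corresponds to a 14-node cluster.
for peers in (13, 50, 200, 1000):
    per_window = messages_per_window(500_000, peers, 200)
    print(f"{peers:>5} peers: {per_window:.2f} messages per window")
```

Once the expected count falls well below one message per window, the window mostly adds delay without producing larger batches.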

It also takes a significant amount of additional latency to get the mileage at high levels
of throughput, but at lower concurrency there is no benefit and it will probably show up as
decreased throughput. That makes it tough to crank it up as a default. Either it is adaptive
or most people don't get the benefit.

At high levels of throughput it is a clear latency win. Latency is much lower for individual
requests on average. Making this a config option is viable as a starting point, possibly with a
separate option for local/remote DC coalescing. Ideally we could make it adapt to the workload.
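
One possible shape for the adaptive idea, purely as a hypothetical sketch: track a moving average of the gap between outbound messages and only wait when another message is likely to arrive inside the window. The class name AdaptiveWindow, the alpha smoothing factor, and the decision rule are all assumptions for illustration and not a design proposed on this ticket.

```python
from typing import Optional


class AdaptiveWindow:
    """Pick a coalescing window from a moving average of message inter-arrival gaps."""

    def __init__(self, max_window_us: float, alpha: float = 0.2) -> None:
        self.max_window_us = max_window_us
        self.alpha = alpha  # weight given to the newest gap
        self.avg_gap_us: Optional[float] = None
        self.last_arrival_us: Optional[float] = None

    def on_message(self, now_us: float) -> None:
        """Record a message arrival at time now_us (microseconds)."""
        if self.last_arrival_us is not None:
            gap = now_us - self.last_arrival_us
            if self.avg_gap_us is None:
                self.avg_gap_us = gap
            else:
                self.avg_gap_us = self.alpha * gap + (1 - self.alpha) * self.avg_gap_us
        self.last_arrival_us = now_us

    def window_us(self) -> float:
        """Wait only if the next message is expected before the window closes."""
        if self.avg_gap_us is None or self.avg_gap_us >= self.max_window_us:
            return 0.0
        return self.max_window_us


busy = AdaptiveWindow(max_window_us=200)
for t in (0, 10, 20, 30):  # a message every 10 microseconds
    busy.on_message(t)

idle = AdaptiveWindow(max_window_us=200)
for t in (0, 5000, 10000):  # a message every 5 milliseconds
    idle.on_message(t)
```

In this model the busy connection keeps the full 200 microsecond window, while the idle one falls back to 0 and pays no added delay.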

I am going to chase down what impact coalescing has at lower levels of concurrency so we can
quantify the cost of turning it on. I'm also going to try to get to the bottom of all interrupts
going to core 0. Maybe it is the real problem and coalescing is just a band-aid to get more

> nio MessagingService
> --------------------
>                 Key: CASSANDRA-8457
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
> Thread-per-peer (actually two each, incoming and outbound) is a big contributor to context
> switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty.

This message was sent by Atlassian JIRA
