cassandra-commits mailing list archives

From "Ariel Weisberg (JIRA)" <>
Subject [jira] [Commented] (CASSANDRA-8457) nio MessagingService
Date Wed, 17 Dec 2014 14:52:14 GMT


Ariel Weisberg commented on CASSANDRA-8457:

#1 The wakeup is protected by a CAS, so in the common case there shouldn't be multiple threads
contending to dispatch. The synchronized block is there for the case where the thread that
is finishing dispatching signals that it is going to sleep, meaning a dispatch task will be
necessary for the next submission. At that point it has to check the queue one more time to
avoid lost wakeups, and it is possible a new dispatch task will be created while that is happening.
The synchronized forces the new task to wait while the last check and drain complete. I have
no idea how often this race occurs and blocks a thread; I could add a counter and check.

The only way to avoid it is to lock while checking the queue-empty condition and updating
the needs-wakeup field, or to have a 1:1 mapping between sockets and dispatch threads (i.e.,
not SEP). The former would force producers to lock on task submission as well. I don't see
how the dispatch task can atomically check that there is no work to do and set the needs-wakeup
flag at the same time. And at that point, is there a reason to use a lock-free queue?
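As a rough illustration of the protocol described above (a minimal single-class sketch, not Cassandra's actual code; all names here are made up), the CAS keeps producers from piling up dispatch tasks, and the synchronized block covers the sleep-announcement plus final recheck that prevents the lost wakeup:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

public class DispatchSketch {
    private final Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
    // the "needs wakeup" flag: true while a dispatch task exists (or is being created)
    private final AtomicBoolean scheduled = new AtomicBoolean(false);

    public void submit(Runnable task) {
        tasks.add(task);
        // CAS: at most one producer creates a dispatch task in the common case
        if (scheduled.compareAndSet(false, true))
            dispatch();
    }

    private void dispatch() {
        // synchronized forces a newly created dispatch task to wait while the
        // finishing one performs its last check and drain
        synchronized (this) {
            Runnable r;
            while ((r = tasks.poll()) != null)
                r.run();
            // announce that we are going to sleep...
            scheduled.set(false);
            // ...then check the queue one more time to avoid a lost wakeup:
            // a producer may have enqueued after the drain but before the
            // flag was cleared, and so skipped creating a dispatch task
            if (!tasks.isEmpty() && scheduled.compareAndSet(false, true)) {
                while ((r = tasks.poll()) != null)
                    r.run();
                scheduled.set(false);
                // (a real implementation would loop or resubmit here,
                // since the same race can recur)
            }
        }
    }

    public static void main(String[] args) {
        DispatchSketch d = new DispatchSketch();
        AtomicInteger ran = new AtomicInteger();
        for (int i = 0; i < 100; i++)
            d.submit(ran::incrementAndGet);
        System.out.println(ran.get()); // 100
    }
}
```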

#2 I didn't replace the queue because I needed to maintain a size for the dropped-message
functionality, and I didn't want to reason about maintaining the size non-atomically with queue
operations like offer/poll/drainTo. I could give it a whirl. I am also not sure how well
Iterator.remove() performs on CLQ, but I can check.
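For concreteness, the bookkeeping in question looks roughly like the following sketch (a hypothetical wrapper, not the real class; `ConcurrentLinkedQueue.size()` itself is O(n), which is why a side counter is attractive despite the non-atomicity):

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: a counter maintained next to a lock-free queue.
public class CountedQueue<T> {
    private final ConcurrentLinkedQueue<T> queue = new ConcurrentLinkedQueue<>();
    private final AtomicInteger size = new AtomicInteger();

    public boolean offer(T t) {
        boolean added = queue.offer(t); // always true: CLQ is unbounded
        if (added)
            size.incrementAndGet();     // updated separately from the offer itself
        return added;
    }

    public T poll() {
        T t = queue.poll();
        if (t != null)
            size.decrementAndGet();
        return t;
    }

    // O(1), unlike ConcurrentLinkedQueue.size(), which traverses the list.
    // Can transiently disagree with the queue's actual contents because the
    // counter update is not atomic with the queue operation -- exactly the
    // reasoning burden mentioned above.
    public int size() { return size.get(); }

    public static void main(String[] args) {
        CountedQueue<Integer> q = new CountedQueue<>();
        q.offer(1); q.offer(2); q.offer(3);
        q.poll();
        System.out.println(q.size()); // 2
    }
}
```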

#3 Indeed, this is a typo.

Jake, it definitely doesn't address several sources of signaling, but it should reduce the total
number of threads signaled per request.

I will profile the two versions today and then add more nodes. For benchmark purposes I could
disable the message-dropping functionality and use MPSCLinkedQueue from Netty.

> nio MessagingService
> --------------------
>                 Key: CASSANDRA-8457
>                 URL:
>             Project: Cassandra
>          Issue Type: New Feature
>          Components: Core
>            Reporter: Jonathan Ellis
>            Assignee: Ariel Weisberg
>              Labels: performance
>             Fix For: 3.0
> Thread-per-peer (actually two each incoming and outbound) is a big contributor to context
switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.

This message was sent by Atlassian JIRA
