cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jason Brown (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8457) nio MessagingService
Date Mon, 27 Mar 2017 12:33:42 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943160#comment-15943160
] 

Jason Brown commented on CASSANDRA-8457:
----------------------------------------

- I really like what you've done with {{HandshakeProtocol}}. It's a nice clarification/centralization
of the handshake data.
- I'm cool with ditching the {{DisabledCoalescingStrategy}}, but didn't like passing {{null}}
around. Thus, I switched to passing {{Optional<CoalesingStrategy>}} as the intent is
a bit more explicit.
- With the refactorings, it made it easier to remove {{OutboundConnection}} as it's reduced
functionality can now live simply in {{OutboundMessagingConnection}}, without complicating
it.
- addressed the comments about timeouts on the outbound handshake side, and reduced them to
a single timeout (the one that was already in {{OutboundMessagingConnection}})
- upgraded to netty 4.1.9
- updated a lot of comments and documentation.
- added in a lot more tests. code coverage is now > 90% for the {{org.apache.cassandra.net.async}}
package.

One remaining open issue is the current patch does not expire messages on the producer thread,
like what we used to do in {{OutboundTcpConnection#enqueue}}. I'm awaiting the outcome of
CASSANDRA-13324, and will apply the result here.	

bq. In {{connectionTimeout()}}, what happens when a connection attempt timeout?

This is what [{{OutboundTcpConnnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L248]
does, and I've replicated it here. Note that there could be ssveral connection attempts before
the connection timeout triggers.

bq. in the DISCONNECT/NEGOTIATION_FAILURE case, we don't seem to actively trying to reconect

The {{DISCONNECT}} case handles the cases when the protocol versions do not match. In {{OutboundTcpConnnection}},
if the [peer's version is lesser than|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L465]
what we expected it to be, we clear the backlog and do not attempt to reconnect until more
messages are {{#enqueue()}}'d. On the contrary, if the [peer's version is greater than|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L488]
what we expected it to be, we'll finish the handshake, send what's currently in the backlog,
and then disconnect. 

TBH, I think both of these behaviors are incorrect. If the peer's version is lesser than what
we expect, we know how to send messages (both handshake and message serialization), so we
could send what we have in the backlog without throwing them away. If the peer's version is
greater than what we expect, if we try to send messages it might fail on the receiver side
due to a change in the handshake sequence (unlikely) or messaging framing/format (likely).
WDYT? Either way, I'll need to update {{OutboundMessageConnection}} as it's not quite correct
rn.

{{NEGOTIATION_FAILURE}} does what [{{OutboundTcpConnnection}}|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L453]:
if we can't get a version from the peer, either we couldn't connect or the handshake failed.
I think the current behavior (throw away the backlog) is desirable.

bq. In {{InboundHandshakeHandler.handleMessagingStartResponse()}}, maybe we should check the
return of handshakeTimeout.cancel()

As the {{handshakeTimeout}} executes in the netty event loop, it won't complete with the {{#decode}}
method. As {{#failHandshake()}} closes the channel, we won't even get the {{ThirdHandshakeMessage}}
into {{handleMessagingStartResponse}} as {{#decode()}} would never be triggered.

bq. Regarding the sending and receiving buffer size handling, ....

I agree with all your points and have implemented them.

> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big contributor to context
switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Mime
View raw message