Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id F28BD200B8E for ; Mon, 26 Sep 2016 16:00:33 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id F11A5160AC8; Mon, 26 Sep 2016 14:00:33 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 132B4160AB8 for ; Mon, 26 Sep 2016 16:00:32 +0200 (CEST) Received: (qmail 85180 invoked by uid 500); 26 Sep 2016 14:00:32 -0000 Mailing-List: contact commits-help@cassandra.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@cassandra.apache.org Delivered-To: mailing list commits@cassandra.apache.org Received: (qmail 85167 invoked by uid 99); 26 Sep 2016 14:00:32 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Sep 2016 14:00:32 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id 10DD72C0032 for ; Mon, 26 Sep 2016 14:00:32 +0000 (UTC) Date: Mon, 26 Sep 2016 14:00:32 +0000 (UTC) From: "Sylvain Lebresne (JIRA)" To: commits@cassandra.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Commented] (CASSANDRA-8457) nio MessagingService MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Mon, 26 Sep 2016 14:00:34 -0000 [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523138#comment-15523138 ] Sylvain Lebresne commented on CASSANDRA-8457: --------------------------------------------- Here comes more review bits, though it's still incomplete as I have to context switch a bit. I'll start by a number of general remarks, followed by some more specific comments on some classes, and a few random nits to finish. The first thing that I think we should discuss is how we want to handle the transition to this (that is, what happens to the old implementation). Namely, I see 2 possible options: # we simply switch to Netty, period, and the old implementation goes away; # we keep both implementations in, with a configuration flag to pick which one is used as implemented in the patch, and this probably until 5.0. I think the main advantage of keeping both implementation for a while is that it's somewhat safer and easier for user to evaluate outside of the 4.0 upgrade itself. But the downsides are obviously that it's a lot more code to maintain, it's more messy right off the bat, and given only the new implementation will stay eventually, it's only pushing the switch to later. Personally, since we'll ship this in a new major (4.0) and since the cost of maintaining 2 implementations is pretty unpleasant and we don't have infinite devs resources, I'd have a slightly preference for option 1, but I'm not going to argue strongly either way. Mostly, I want us to decide, because I think this influence the final product here. The second thing I want to mention is streaming. I could be missing something here but it appears streaming is just not supported at all by the new code, meaning that the patch as is is unusable outside of toy testing (I mean by that that if you start a node with the Netty MessagingService, it seems you can't received any streaming, so you're not a fully functioning node). So I'm kind of wondering what is the plan here. I'm happy to thinker on another ticket as Streaming is a best on its own, but I'm not sure I'm ok committing anything until it's a fully working product. So anyway, with those talking point out of the way, a few more "technical" general remarks on the patch: * Netty's openssl: what's the fundamental difference between the new {{keyfile}} and the old {{keystore}}? It sounds confusing to have to configure new things when there is already pre-existing parameters for SSL. * Some "old" code is made public (many constants in particular) and reused by the "new" code, while other bits are copy-pasted, but it's not very consistent and makes things sometimes a bit hard to follow ({{QueuedMessage}} (used by new code too) being an internal class of {{OutboundTcpConnection}} (old code) is a particularly bad offender imo). That's typically where I want us to make a decision on the point I raise above, as whatever decision we take I feel there is some cleanups to be done. * It would be really nice to have a package javadoc for the new "async" package that explains the main classes involved and how they interact. * There is a few TODO that needs to be handled. * I feel some of the naming could be made more consistent. In particular, the patch uses Client/Server for some classes (e.g. {{ClientHandshakeHandler}}/{{ClientConnector}}) but In/Out for others (e.g. {{MessageInHandler}}). And things like {{InternodeMessagingPool}} is imo less good than the old {{OutboundTcpConnectionPool}} in that it doesn't indicates it's only for outbound connections. I'd prefer avoid Client/Server for internode stuffs and reserve it for native protocol classes, sticking to In/Out (or Inbound/Outbound when that reads better) for internode classes, like we did before. I understand this may have been to avoid name clash with existing classes, but maybe that's where we should also decide how long the old implementation to stay in. Then a few more specific comments: * In {{MessagingService}}: ** We should probably move everything related to {{ServerSocket}} inside the new {{BlockingIOSender}} (though again, if we decide to just ditch the old implementation, it simply goes away). ** Related to the previous point, the {{getSocketThreads()}} is used by {{StreamingTransferTest}} to apparently wait for the streaming connections to close. Returning an empty list seems to kind of break that. I think we could replace the method by some {{awaitAllStreamingConnectionsClosed()}} (that would still just be for testing). Of course, that's kind of streaming related. ** I would probably rename {{switchIpAddress()}} to {{reconnectWithNewIp()}} as it makes it more explicit what does happen (or at least have some javadoc). Similarly, {{getCurrentEndpoint()}} is imo no very clearly named (not this patch fault though), maybe {{getPreferredAddress()}}? ** In {{createConnection()}}, it appears the old implementation was starting the connections and waiting on them to be properly started before returning, but the new implementation doesn't. Not sure how important that is in practice, but it feels unintuitive that a {{createConnection()}} method wouldn't actually do anything. ** Seems {{NettySender.listen()}} always starts a non-secure connection, even if we have {{ServerEncryptionOptions.InternodeEncryption.all}}, which sounds wrong. * In {{ClientConnect}}, in {{connectComplete}}, checking that the future is done is somewhat non-sensical as it shouldn't happen by definition of the method. Maybe an {{assert}} instead? Also, the method should be {{@VisibleForTesting}}. * In {{ClientHandshakeHandler}}: ** There seem to be a typo in javadoc of {{remoteAddr}} (at least I'm having a hard time parsing it). ** Could abstract the message size (in the call to {{ioBuffer}}) into a constant, if only for symmetry with {{SECOND_MESSAGE_LENGTH}}. ** {{handshakeTimeout}} feels misnamed (with the current name, the call in {{exceptionCaught}} is confusing: "why are we timeouting on an exception?!"). It's really failing the handshake, so I'd call it {{failHandshake}} or {{abortHanshake}}. Somewhat related, the field {{handshakeResponse}} is really a future on the timeout, so I'd call it {{timeoutFuture}}. ** In {{decode}}, there is 2 different cases of version mismatch: one where we use to do a hard disconnect, and one where we did a "soft" one. The new implementation still pretend to do something different in the message it logs, but doesn't appear to make a difference in practice, which is confusing. ** It's a tad unexpected to have {{ClientHandshakeResult}} as a subclass of {{InternodeMessagingConnection}} rather than of this class ({{ClientHandshakeHandler}}), and the fact some final parts of the handshake are actually happening in {{InternodeMessagingConnection}} make things a tad harder to follow. ** If the handshake timeout, it doesn't seem we log anything special. Might be worth at least having some debug message? * In {{CoalescingMessageOutHandler}}: ** It would feel simpler (and possibly more efficient) to not add the handler to the pipeline at all if the strategy isn't doing coalescing (and assert it does in the handler). ** In {{write}}, we can avid adding the listener if {{coalesceCallback == null}} rather than checking inside the listener. And that's mostly it for now, but more comments on other classes should come at a later time. Just a few small random nits for the road: * In {{IncomingTcpConnection}}, a new {{from}} parameter is added to {{receiveMessage}} but it seems unused. * You editor seems to expand {{.*}} imports, which adds unecessary noise to the patch/we generally don't do. * {{MessageOutHandler}} creates it's {{Logger}} using {{CoalescingMessageOutHandler}} rather than himself (copy-paste typo I assume). > nio MessagingService > -------------------- > > Key: CASSANDRA-8457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8457 > Project: Cassandra > Issue Type: New Feature > Reporter: Jonathan Ellis > Assignee: Jason Brown > Priority: Minor > Labels: netty, performance > Fix For: 4.x > > > Thread-per-peer (actually two each incoming and outbound) is a big contributor to context switching, especially for larger clusters. Let's look at switching to nio, possibly via Netty. -- This message was sent by Atlassian JIRA (v6.3.4#6332)