cassandra-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Sylvain Lebresne (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CASSANDRA-8457) nio MessagingService
Date Mon, 26 Sep 2016 14:00:32 GMT

    [ https://issues.apache.org/jira/browse/CASSANDRA-8457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15523138#comment-15523138
] 

Sylvain Lebresne commented on CASSANDRA-8457:
---------------------------------------------

Here comes more review bits, though it's still incomplete as I have to context switch a bit.
I'll start by a number of general remarks, followed by some more specific comments on some
classes, and a few random nits to finish.

The first thing that I think we should discuss is how we want to handle the transition to
this (that is, what happens to the old implementation). Namely, I see 2 possible options:
# we simply switch to Netty, period, and the old implementation goes away;
# we keep both implementations in, with a configuration flag to pick which one is used as
implemented in the patch, and this probably until 5.0.
I think the main advantage of keeping both implementation for a while is that it's somewhat
safer and easier for user to evaluate outside of the 4.0 upgrade itself. But the downsides
are obviously that it's a lot more code to maintain, it's more messy right off the bat, and
given only the new implementation will stay eventually, it's only pushing the switch to later.
Personally, since we'll ship this in a new major (4.0) and since the cost of maintaining 2
implementations is pretty unpleasant and we don't have infinite devs resources, I'd have a
slightly preference for option 1, but I'm not going to argue strongly either way. Mostly,
I want us to decide, because I think this influence the final product here.

The second thing I want to mention is streaming. I could be missing something here but it
appears streaming is just not supported at all by the new code, meaning that the patch as
is is unusable outside of toy testing (I mean by that that if you start a node with the Netty
MessagingService, it seems you can't received any streaming, so you're not a fully functioning
node). So I'm kind of wondering what is the plan here. I'm happy to thinker on another ticket
as Streaming is a best on its own, but I'm not sure I'm ok committing anything until it's
a fully working product.

So anyway, with those talking point out of the way, a few more "technical" general remarks
on the patch:
* Netty's openssl: what's the fundamental difference between the new {{keyfile}} and the old
{{keystore}}? It sounds confusing to have to configure new things when there is already pre-existing
parameters for SSL.
* Some "old" code is made public (many constants in particular) and reused by the "new" code,
while other bits are copy-pasted, but it's not very consistent and makes things sometimes
a bit hard to follow ({{QueuedMessage}} (used by new code too) being an internal class of
{{OutboundTcpConnection}} (old code) is a particularly bad offender imo). That's typically
where I want us to make a decision on the point I raise above, as whatever decision we take
I feel there is some cleanups to be done.
* It would be really nice to have a package javadoc for the new "async" package that explains
the main classes involved and how they interact.
* There is a few TODO that needs to be handled.
* I feel some of the naming could be made more consistent. In particular, the patch uses Client/Server
for some classes (e.g. {{ClientHandshakeHandler}}/{{ClientConnector}}) but In/Out for others
(e.g. {{MessageInHandler}}). And things like {{InternodeMessagingPool}} is imo less good than
the old {{OutboundTcpConnectionPool}} in that it doesn't indicates it's only for outbound
connections. I'd prefer avoid Client/Server for internode stuffs and reserve it for native
protocol classes, sticking to In/Out (or Inbound/Outbound when that reads better) for internode
classes, like we did before. I understand this may have been to avoid name clash with existing
classes, but maybe that's where we should also decide how long the old implementation to stay
in.

Then a few more specific comments:
* In {{MessagingService}}:
** We should probably move everything related to {{ServerSocket}} inside the new {{BlockingIOSender}}
(though again, if we decide to just ditch the old implementation, it simply goes away).
** Related to the previous point, the {{getSocketThreads()}} is used by {{StreamingTransferTest}}
to apparently wait for the streaming connections to close. Returning an empty list seems to
kind of break that. I think we could replace the method by some {{awaitAllStreamingConnectionsClosed()}}
(that would still just be for testing). Of course, that's kind of streaming related.
** I would probably rename {{switchIpAddress()}} to {{reconnectWithNewIp()}} as it makes it
more explicit what does happen (or at least have some javadoc). Similarly, {{getCurrentEndpoint()}}
is imo no very clearly named (not this patch fault though), maybe {{getPreferredAddress()}}?
** In {{createConnection()}}, it appears the old implementation was starting the connections
and waiting on them to be properly started before returning, but the new implementation doesn't.
Not sure how important that is in practice, but it feels unintuitive that a {{createConnection()}}
method wouldn't actually do anything.
** Seems {{NettySender.listen()}} always starts a non-secure connection, even if we have {{ServerEncryptionOptions.InternodeEncryption.all}},
which sounds wrong.
* In {{ClientConnect}}, in {{connectComplete}}, checking that the future is done is somewhat
non-sensical as it shouldn't happen by definition of the method. Maybe an {{assert}} instead?
Also, the method should be {{@VisibleForTesting}}.
* In {{ClientHandshakeHandler}}:
** There seem to be a typo in javadoc of {{remoteAddr}} (at least I'm having a hard time parsing
it).
** Could abstract the message size (in the call to {{ioBuffer}}) into a constant, if only
for symmetry with {{SECOND_MESSAGE_LENGTH}}.
** {{handshakeTimeout}} feels misnamed (with the current name, the call in {{exceptionCaught}}
is confusing: "why are we timeouting on an exception?!"). It's really failing the handshake,
so I'd call it {{failHandshake}} or {{abortHanshake}}. Somewhat related, the field {{handshakeResponse}}
is really a future on the timeout, so I'd call it {{timeoutFuture}}.
** In {{decode}}, there is 2 different cases of version mismatch: one where we use to do a
hard disconnect, and one where we did a "soft" one. The new implementation still pretend to
do something different in the message it logs, but doesn't appear to make a difference in
practice, which is confusing.
** It's a tad unexpected to have {{ClientHandshakeResult}} as a subclass of {{InternodeMessagingConnection}}
rather than of this class ({{ClientHandshakeHandler}}), and the fact some final parts of the
handshake are actually happening in {{InternodeMessagingConnection}} make things a tad harder
to follow.
** If the handshake timeout, it doesn't seem we log anything special. Might be worth at least
having some debug message?
* In {{CoalescingMessageOutHandler}}:
** It would feel simpler (and possibly more efficient) to not add the handler to the pipeline
at all if the strategy isn't doing coalescing (and assert it does in the handler).
** In {{write}}, we can avid adding the listener if {{coalesceCallback == null}} rather than
checking inside the listener.

And that's mostly it for now, but more comments on other classes should come at a later time.
Just a few small random
nits for the road:
* In {{IncomingTcpConnection}}, a new {{from}} parameter is added to {{receiveMessage}} but
it seems unused.
* You editor seems to expand {{.*}} imports, which adds unecessary noise to the patch/we generally
don't do.
* {{MessageOutHandler}} creates it's {{Logger}} using {{CoalescingMessageOutHandler}} rather
than himself (copy-paste typo I assume).


> nio MessagingService
> --------------------
>
>                 Key: CASSANDRA-8457
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8457
>             Project: Cassandra
>          Issue Type: New Feature
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: netty, performance
>             Fix For: 4.x
>
>
> Thread-per-peer (actually two each incoming and outbound) is a big contributor to context
switching, especially for larger clusters.  Let's look at switching to nio, possibly via Netty.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message