zookeeper-dev mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2743) Netty connection leaks JMX connection bean upon connection close in certain race conditions.
Date Thu, 06 Apr 2017 05:35:42 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15958313#comment-15958313 ]

ASF GitHub Bot commented on ZOOKEEPER-2743:

Github user hanm commented on a diff in the pull request:

    --- Diff: src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java ---
    @@ -87,6 +87,12 @@ public void close() {
                 LOG.debug("close called for sessionid:0x"
                         + Long.toHexString(sessionId));
    +        // ZOOKEEPER-2743:
    +        // Always unregister connection upon close to prevent
    +        // connection bean leak under certain race conditions.
    +        factory.unregisterConnection(this);
    --- End diff --
    That is fine. I might be able to provide a formal verification of the theorem, but here is a quick proof of that case:
    * Assume close is called before the connection bean is registered [1].
    * The unregister in that close call is a no-op because the bean is not registered yet. But the channel will be closed as part of the close call.
    * Now, before session finalization returns, some sort of exception is going to be thrown because the channel is closed. Probably here [2].
    * As part of handling that exception, close is called again. This time it will unregister the bean (before this fix it would not, so it would miss this edge case).
    Basically we are safe, because close will be called multiple times and at least one close call is guaranteed to happen after the connection bean is registered.
    [1] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L699
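
The argument above can be replayed in a small single-threaded model. This is only a sketch under stated assumptions, not the actual ZooKeeper code: `Factory`, `Cnxn`, the `alwaysUnregister` flag and `leakedBeans` are hypothetical names made up for illustration, and the race is forced into the unlucky ordering (close, then register, then close again) instead of relying on real thread timing.

```java
import java.util.HashSet;
import java.util.Set;

public class CloseRaceSketch {

    /** Hypothetical stand-in for the connection factory: tracks live connections and registered beans. */
    static class Factory {
        final Set<Cnxn> cnxns = new HashSet<>();
        final Set<Cnxn> beans = new HashSet<>();

        void registerConnection(Cnxn c) { beans.add(c); }

        // Removing a bean that was never registered is a no-op.
        void unregisterConnection(Cnxn c) { beans.remove(c); }
    }

    /** Hypothetical stand-in for a server-side connection. */
    static class Cnxn {
        final Factory factory;
        final boolean alwaysUnregister; // true models the behavior proposed in the diff
        boolean channelOpen = true;

        Cnxn(Factory factory, boolean alwaysUnregister) {
            this.factory = factory;
            this.alwaysUnregister = alwaysUnregister;
            factory.cnxns.add(this);
        }

        void close() {
            if (alwaysUnregister) {
                // Assumed shape of the fix: unregister on every close call,
                // before the "already closed" check below.
                factory.unregisterConnection(this);
            }
            if (!factory.cnxns.remove(this)) {
                return; // already removed from the factory: short-circuit
            }
            channelOpen = false;
            factory.unregisterConnection(this);
        }
    }

    /** Replays the unlucky ordering and returns how many beans are left registered. */
    static int leakedBeans(boolean alwaysUnregister) {
        Factory factory = new Factory();
        Cnxn cnxn = new Cnxn(factory, alwaysUnregister);
        cnxn.close();                     // close arrives before the bean is registered
        factory.registerConnection(cnxn); // session finalization registers the bean
        cnxn.close();                     // failure on the closed channel triggers close again
        return factory.beans.size();
    }

    public static void main(String[] args) {
        System.out.println("leaked without fix: " + leakedBeans(false)); // prints 1
        System.out.println("leaked with fix:    " + leakedBeans(true));  // prints 0
    }
}
```

In this model the second close call is what cleans up the bean, which is the "at least one close call happens after registration" step of the proof. It does not show that a second close always happens in the real server.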

> Netty connection leaks JMX connection bean upon connection close in certain race conditions.
> --------------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-2743
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2743
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.10, 3.5.2
>            Reporter: Michael Han
>            Assignee: Michael Han
>              Labels: netty
> This is a tricky issue found while debugging the failure of a "flaky" watcher test (ZOOKEEPER-2686). When closing a Netty connection, depending on timing, the connection bean registered when the connection was provisioned might not get unregistered, leading to leaked Java beans.
> The race happens while the client is in the process of finalizing the session. As part of session finalization, a connection bean will be registered [1]. But right before the bean is registered, the connection might get closed, for example when the server that the client is connecting to is shut down. As part of connection close, the bean will be unregistered, as expected [2]. The problem is that when we execute [2], the connection bean might not have finished registering at [1], so the unregister of the bean is a no-op. What's worse, as part of connection close we remove this connection from the connection factory [3], so future connection close calls will be short-circuited and return directly; in other words, the bean unregister code in the connection close call will only get executed once. Depending on luck, the bean might not get unregistered, as previously illustrated.
> [1] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L700
> [2] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L114
> [3] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L96

This message was sent by Atlassian JIRA