zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2743) Netty connection leaks JMX connection bean upon connection close in certain race conditions.
Date Sat, 01 Apr 2017 20:59:41 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15952406#comment-15952406

ASF GitHub Bot commented on ZOOKEEPER-2743:

GitHub user hanm opened a pull request:


    ZOOKEEPER-2743: Netty connection leaks JMX connection bean.

    See https://issues.apache.org/jira/browse/ZOOKEEPER-2743 for details on the symptom and
    There are many ways of fixing this. For example we can enforce certain ordering of the
close operations to eliminate the race condition however that would incur non trivial performance
penalty as a result of synchronization. I think the solution in this patch is simple and effective
as it observes that the bean unregister call is idempotent and the call itself is trivial
so we just always unregister connection upon close the connection, to ensure the bean is unregistered
regardless of the races.
    I've also stress tested this solution on internal Jenkins which indicates no failures
of https://issues.apache.org/jira/browse/ZOOKEEPER-2707 for the past two days.
    A side note is NIO connection in theory would suffer the same problem however I am unable
to reproduce the same racing with existing unit test. So I just leave NIO as it is, for now.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hanm/zookeeper ZOOKEEPER-2743

Alternatively you can review and apply these changes as the patch at:


To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #211
commit 7dd916c4c7c8d4e0e83cb5780175058a5626de19
Author: Michael Han <hanm@apache.org>
Date:   2017-04-01T20:37:48Z

    ZOOKEEPER-2743: Netty connection leaks JMX connection bean upon connection close in certain
race conditions.
    Always unregister connection upon close to prevent connection bean leak under certain
race conditions.


> Netty connection leaks JMX connection bean upon connection close in certain race conditions.
> --------------------------------------------------------------------------------------------
>                 Key: ZOOKEEPER-2743
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2743
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.4.10, 3.5.2
>            Reporter: Michael Han
>            Assignee: Michael Han
>              Labels: netty
> This is a tricky issue found while debugging failure of "flaky" watcher test (ZOOKEEPER-2686).
When closing a Netty connection, depend on timing the connection bean registered when the
connection was provisioned might not get unregistered, leading to leaked Java beans. 
> The race happens at the time when the client is in the process of finalizing the session.
As part of session finalization, a connection bean will be registered [1]. But right before
the registering bean, the connection might gets closed, in cases for example the server that
the client is connecting to is shutdown. As part of connection close, the bean will be un-registered,
as expected [2], however the problem is when we execute at [2], the connection bean might
not finish registering at [1], so the unregister of bean is a NOP. What's worse, as part of
connection close, we remove this connection from connection factory [3], so future connection
close call will get short circuited and directly return; in other words the bean unregister
code in connection close call will only get executed once. Depends on luck, the bean might
not get unregistered, as previously illustrated.
> [1] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/ZooKeeperServer.java#L700
> [2] https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L114
> [3] 
> https://github.com/apache/zookeeper/blob/master/src/java/main/org/apache/zookeeper/server/NettyServerCnxn.java#L96

This message was sent by Atlassian JIRA

View raw message