zookeeper-dev mailing list archives

From DanBenediktson <...@git.apache.org>
Subject [GitHub] zookeeper pull request #330: ZOOKEEPER-2471: ZK Java client should not count...
Date Wed, 09 Aug 2017 17:05:07 GMT
GitHub user DanBenediktson opened a pull request:

    https://github.com/apache/zookeeper/pull/330

    ZOOKEEPER-2471: ZK Java client should not count sleep time as connect time

    ClientCnxnSocket uses a member variable "now" to track the current time, but does not
    update it after every potentially blocking operation. In particular, it does not update it
    after the random sleep introduced when an initial connect attempt fails. As a result, the
    random sleep time is counted toward connect time. Today this means the connection timeout
    is applied incorrectly. If ZOOKEEPER-2869 is taken, there is also a very real possibility
    (we have seen it in production) of wedging the ZooKeeper client so that it can never
    reconnect successfully, because its sleep time may grow beyond its connection timeout. This
    is especially likely when there is a big gap between the negotiated session timeout and the
    client-requested session timeout.
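
    To make the failure mode concrete, here is a minimal Java sketch. It assumes a simplified
    model in which the connection start is stamped from the cached "now" value. The names below
    (StaleNowDemo, CachedClockSocket, idleChargedAfterBackoff) are hypothetical illustrations,
    not the actual ClientCnxnSocket code, and System.nanoTime() stands in for whatever clock the
    client really uses.

```java
// Hypothetical, simplified model of the cached-"now" pattern; not the real ClientCnxnSocket code.
public class StaleNowDemo {
    /** Keeps a cached timestamp that changes only when updateNow() is called explicitly. */
    static final class CachedClockSocket {
        private long now;
        private long lastHeard;

        void updateNow() { now = System.nanoTime() / 1_000_000; }
        void updateLastHeard() { lastHeard = now; }      // stamps with the cached, possibly stale, time
        long getIdleRecv() { return now - lastHeard; }  // idle time as seen by a timeout check
    }

    /** Sleeps without a checked exception; restores the interrupt flag if interrupted. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Replays one reconnect cycle and returns the idle time charged against the connect timeout. */
    static long idleChargedAfterBackoff(long backoffMs) {
        CachedClockSocket socket = new CachedClockSocket();
        socket.updateNow();          // top of the loop: cached time refreshed
        sleepQuietly(backoffMs);     // random backoff before reconnecting; nothing refreshes "now"
        socket.updateLastHeard();    // connection start stamped with the pre-sleep time
        socket.updateNow();          // next loop iteration refreshes the cached time
        return socket.getIdleRecv(); // includes the whole backoff sleep
    }

    public static void main(String[] args) {
        System.out.println("idle charged after a 200 ms backoff: "
                + idleChargedAfterBackoff(200) + " ms");
    }
}
```

    In this model, main reports an idle time of roughly the full backoff even though no time
    was spent connecting, so a long enough backoff alone could exhaust the connect budget.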
    
    Rather than fix the bug by adding yet another "updateNow()" call, and thereby keep the
    brittle "updateNow()" design that led to the bug in the first place, I have deleted
    updateNow() and replaced uses of that member variable with reads of the current system
    timestamp wherever the implementation needs to know the current time.
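
    Under the same simplified, hypothetical model, the replacement pattern looks roughly like
    the sketch below. LiveNowDemo and its members are illustrative names, not ZooKeeper code.
    With no cached field there is no updateNow() call to forget, so a stamp taken after the
    backoff sleep reflects the post-sleep time.

```java
// Hypothetical sketch: no cached "now" field; every caller reads the clock directly.
public class LiveNowDemo {
    static final class LiveClockSocket {
        private long lastHeard;

        /** Reads the monotonic clock on every call; there is no updateNow() to forget. */
        private static long now() { return System.nanoTime() / 1_000_000; }

        void updateLastHeard() { lastHeard = now(); }
        long getIdleRecv() { return now() - lastHeard; }
    }

    /** Sleeps without a checked exception; restores the interrupt flag if interrupted. */
    static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Replays one reconnect cycle and returns the idle time charged against the connect timeout. */
    static long idleChargedAfterBackoff(long backoffMs) {
        LiveClockSocket socket = new LiveClockSocket();
        sleepQuietly(backoffMs);     // random backoff before reconnecting
        socket.updateLastHeard();    // connection start stamped with the post-sleep time
        return socket.getIdleRecv(); // close to zero: the sleep is not charged
    }

    public static void main(String[] args) {
        System.out.println("idle charged after a 200 ms backoff: "
                + idleChargedAfterBackoff(200) + " ms");
    }
}
```

    The cost is an extra clock read per call. A monotonic source such as System.nanoTime() is
    assumed here so that wall-clock adjustments cannot skew elapsed-time arithmetic.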
    
    Regarding unit testing: IMO, this is too difficult to test without introducing a lot of
    invasive changes to ClientCnxn.java, since the only effective change is that, on connection
    retry, the random sleep time is no longer counted toward a time budget. I could throw a lot
    of mocks at this, as ClientReconnectTest does, but I would still depend on the behavior of
    the randomly generated sleep time, which is inherently unreliable. If a fix is taken for
    ZOOKEEPER-2869, this should become much easier to test, since I will then be able to inject
    a different backoff sleep behavior. I'm planning to submit a pull request for that ticket as
    well, so as a compromise, could I submit a test for this bug fix at that time?
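
    If a backoff hook like the one proposed in ZOOKEEPER-2869 were available, a test could
    replace both the sleep and the clock with fakes, removing the dependence on random sleep
    times. The sketch below assumes hypothetical Clock and Sleeper seams and a hypothetical
    ReconnectAttempt class; none of these names exist in ZooKeeper.

```java
// Hypothetical sketch: Clock, Sleeper and ReconnectAttempt are illustrative, not ZooKeeper APIs.
public class InjectableBackoffDemo {
    /** Source of the current time in milliseconds (assumed test seam). */
    interface Clock { long nowMs(); }

    /** Performs the reconnect backoff sleep (assumed seam, e.g. what ZOOKEEPER-2869 might add). */
    interface Sleeper { void sleep(long ms); }

    /** Fake time source: the clock moves only when the fake sleeper or the test advances it. */
    static final class FakeTime implements Clock, Sleeper {
        private long now;

        @Override public long nowMs() { return now; }
        @Override public void sleep(long ms) { now += ms; }
        void advance(long ms) { now += ms; }
    }

    /** One reconnect attempt: back off first, then start the connect-timeout clock. */
    static final class ReconnectAttempt {
        private final Clock clock;
        private final Sleeper sleeper;
        private long connectStartMs;

        ReconnectAttempt(Clock clock, Sleeper sleeper) {
            this.clock = clock;
            this.sleeper = sleeper;
        }

        void begin(long backoffMs) {
            sleeper.sleep(backoffMs);
            connectStartMs = clock.nowMs(); // read after the sleep, so the sleep is not charged
        }

        long remainingMs(long connectTimeoutMs) {
            return connectTimeoutMs - (clock.nowMs() - connectStartMs);
        }
    }

    public static void main(String[] args) {
        FakeTime time = new FakeTime();
        ReconnectAttempt attempt = new ReconnectAttempt(time, time);
        attempt.begin(5_000);                           // backoff far longer than the timeout
        System.out.println(attempt.remainingMs(1_000)); // 1000: none of the sleep was charged
        time.advance(300);                              // simulate 300 ms spent connecting
        System.out.println(attempt.remainingMs(1_000)); // 700
    }
}
```

    Because the fake sleeper only advances the fake clock, the results are deterministic: a
    5000 ms backoff leaves the full 1000 ms connect budget intact, whereas reading the clock
    before the sleep would have left it at -4000 ms.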

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/DanBenediktson/zookeeper ZOOKEEPER-2471

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zookeeper/pull/330.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #330
    
----
commit 60f38726e7f07b4bb970cc8fb089363ff48eb3df
Author: Dan Benediktson <dbenediktson@twitter.com>
Date:   2017-08-09T16:41:42Z

    ZOOKEEPER-2471: Zookeeper Java client should not count time spent sleeping as time spent connecting
    
    Rather than keep the brittle "updateNow()" implementation which led to the bug and fix the
    bug by adding another "updateNow()" call, I have deleted updateNow() and replaced usage of
    that member variable with actually getting the current system timestamp.
    
    This is, IMO, too difficult to test without introducing a lot of invasive changes to
    ClientCnxn.java, seeing as the only effective change is that, on connection retry, a random
    sleep time is no longer counted towards a time budget. If a fix is taken for ZOOKEEPER-2869,
    this should become much easier to test. Since I'm planning to submit a pull request for that
    ticket as well, maybe as a compromise I can submit a test for this patch at that time?

----


