zookeeper-notifications mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From GitBox <...@apache.org>
Subject [GitHub] [zookeeper] symat commented on issue #1048: ZOOKEEPER-3188: Improve resilience to network
Date Tue, 08 Oct 2019 13:48:57 GMT
symat commented on issue #1048: ZOOKEEPER-3188: Improve resilience to network
URL: https://github.com/apache/zookeeper/pull/1048#issuecomment-539522496
   **Thanks for all of you for the comments and review so far!** 
   I extended a few exiting test classes (e.g. `LearnerTest`, `CnxManagerTest`) with multiple-address
related cases, and added a new Test class (`QuorumPeerMainMultiAddressTest`) that contains
tests for:
   - starting servers with multiple addresses
   - starting servers with some unreachable addresses
   - testing dynamic reconfiguration (both adding or removing addresses; both incremental
and non-incremental reconfigs)
   - test all the above also with IPv6
   An important change: due to the way how we represent the server configs during dynamic
reconfig, I had to change the separator character from comma to pipe. So the valid config
format now looks like:
   **regarding the impact on dynamic reconfig:**
   - we have now a few basic tests for dynamic reconfig, based on these I consider no risks
here at the moment @hanm: what do you think? are the tests in QuorumPeerMainMultiAddressTest
enough for covering basic dynamic reconfig use-cases? 
   - Some remarks might be good to document regarding dynamic reconfig and multi-address:

     - removing addresses / ports which are used currently can cause a new leader election
     - Also it is a good practice to make sure all the ZK servers are connected when we issue
a dynamic reconfigure. (an offline ZK server might be unable to re-join to the quorum if the
other servers reconfigure their ports / addresses)
     - I actually saw during the tests, that simply adding new addresses for servers with
incremental re-config does not cause a new leader election (although I haven't created an
automated test explicitly forcing this). I am not sure if it is worth to add a new test for
this, as dynamic reconfigs are relatively rare events during production use I guess.
   **I try to summarize the open discussions / tasks** to see how much work is required to
close this issue:
   - we should add the new config format to the documentation
   - we should extend the dynamic reconfig documentation with the comments above
   - we should test (for now I propose only to try out manually) and document:
     - the impact on quorum TLS / kerberos feature
     - the impact on rolling upgrades
   - and of course we need to resolve any code related conversation what would be still open
(e.g. on parallel streams with @eolivelli )
   **I think if the tasks above are concluded, then it is safe to merge the PR.**  What do
you think?
   The automated integration tests would be nice to have definitely. These test cases I have
in mind:
   - Automated integration test for rolling upgrades (e.g. from 3.5.6 to master) without changing
the config
   - Automated integration test for rolling upgrades (e.g. from 3.5.6 to master) with changing
the config to use multiple addresses during the upgrade
   - Automated integration test for simulating the case of network failure during the use
of multiple addresses
     - hint: use docker with multiple network interfaces (see: https://github.com/symat/zookeeper-docker-test
     - I am doing this test manually after each of my commits
   **I would cover these integration tests in follow-up jira ticket(s?), as these seems to
be not trivial to automatize.** What do you think?

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:

With regards,
Apache Git Services

View raw message