zookeeper-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: RC failure root cause: ICMP throttling settings on mac
Date Thu, 23 Jan 2020 16:10:49 GMT
I think that this is far outside the normal operation bounds and has an
easy work-around.

First, it is very uncommon to run more than 5 ZK nodes. Running 23 on a
single host is bizarre (viewed from an operational lens).

Second, there is a simple configuration change that makes the strange
configuration work anyway.

A third point, unrelated to operational considerations, is that there is
risk in making last-minute changes to code. That risk is borne by normal
configurations as well as these unusual ones.

In sum,

- this might look like a P1 (system down) issue, but there is a workaround
so it is certainly no more than P2

- even P2 is unwarranted because this is a non-production configuration

- a P3 issue isn't a stop-ship issue.



On Fri, Jan 17, 2020 at 5:17 AM Szalay-Bekő Máté <szalay.beko.mate@gmail.com>
wrote:

> TLDR:
> While testing the RC for 3.6.0, we found that a ZooKeeper cluster with a
> large number of ensemble members (e.g. 23) cannot start properly. This
> issue seems to happen only on mac, and a workaround is to disable the ICMP
> throttling. The question is whether this workaround is enough for the RC,
> or whether we should change the code in ZooKeeper to limit the number of
> ICMP requests.
>
>
> The problem:
>
> On linux, I haven't been able to reproduce the problem. I tried with 5, 9,
> 15 and 23 ensemble members and the quorum always seems to start properly in
> a few seconds. (I used OpenJDK 1.8.232 on Ubuntu 18.04)
>
> On mac, the problem is consistently happening for large ensembles. The
> server is very slow to start and we see a lot of warnings in the log like
> these:
>
> 2020-01-15 20:02:13,431 [myid:13] - WARN [ListenerHandler-phunt-MBP13.local/192.168.1.91:4193:QuorumCnxManager@691] - None of the addresses (/192.168.1.91:4190) are reachable for sid 10
> java.net.NoRouteToHostException: No valid address among [/192.168.1.91:4190]
>
> 2020-01-17 11:02:26,177 [myid:4] - WARN [Thread-2531:QuorumCnxManager$SendWorker@1269] - destination address /127.0.0.1 not reachable anymore, shutting down the SendWorker for sid 6
>
> The exception happens when the new MultiAddress feature tries to filter
> unreachable hosts out of the address list while deciding which election
> address to connect to. This involves calling the InetAddress.isReachable
> method with a default timeout of 500ms, which goes down to a native call
> in Java and basically tries to ping the host (an ICMP echo request).
> Naturally, localhost should always be reachable, but on mac this call
> times out if we have many ensemble members. I tested with 9 members and
> the cluster started properly. With 11, 13 and 15 members it took more and
> more time to get the cluster to start, and the "NoRouteToHostException"
> started to appear in the logs. The 15-member cluster started after around
> 1 minute, but obviously this is way too long.
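
To make the reachability check described above concrete, here is a minimal
Java sketch that times a single InetAddress.isReachable call against
loopback with the 500ms timeout mentioned in the message. The class and
method names (ReachabilityProbe, probeMillis) are hypothetical and are not
taken from the ZooKeeper code base. Whether the JDK actually sends an ICMP
echo or falls back to a TCP probe depends on the platform and on process
privileges, so treat this as an illustration only.

```java
import java.io.IOException;
import java.net.InetAddress;

public class ReachabilityProbe {

    /**
     * Probes the address once and returns the elapsed time in milliseconds,
     * or -1 if the address was not reported reachable within the timeout
     * (or the probe failed with an I/O error).
     */
    static long probeMillis(InetAddress address, int timeoutMillis) {
        long start = System.nanoTime();
        try {
            boolean reachable = address.isReachable(timeoutMillis);
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
            return reachable ? elapsedMillis : -1;
        } catch (IOException e) {
            // A network error is treated like "not reachable" in this sketch.
            return -1;
        }
    }

    public static void main(String[] args) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        long result = probeMillis(loopback, 500); // 500ms, as in the message
        if (result < 0) {
            System.out.println(loopback + " not reachable within 500ms");
        } else {
            System.out.println(loopback + " reachable after " + result + "ms");
        }
    }
}
```

Running many such probes in quick succession on one host is the situation
in which, according to the report, the mac ICMP rate limit starts to matter.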
>
> On mac, the ICMP rate limit is set to 250 by default. You can turn it off
> using the following command:
> sudo sysctl -w net.inet.icmp.icmplim=0
> (see https://krypted.com/mac-os-x/disable-icmp-rate-limiting-os-x/)
>
> Using the above command before starting the 23 ensemble members cluster
> locally seems to solve the problem for me. (can someone verify?) The
> question is if this workaround is enough or not.
>
> As far as I can tell, the current code will generate 2*A*(M-1) ICMP calls
> in each ZooKeeper server during startup, where 'M' is the number of
> ensemble members and 'A' is the number of election addresses provided for
> each member. This is not that high if each ZooKeeper server is started on
> a different machine, but if we start a lot of ZooKeeper servers on a
> single machine, then it can quickly go beyond the predefined limit of 250
> for mac.
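
The arithmetic behind that estimate can be worked through with a short Java
sketch. It takes the 2*A*(M-1) per-server figure from the message as given
and, as an additional assumption made here, multiplies it by M to
approximate the total when all M servers run on one host. The class and
method names are hypothetical. Note that the result is a total over
startup, so comparing it with the limit of 250 gives only a rough
indication.

```java
public class IcmpProbeEstimate {

    /** ICMP calls per server during startup, per the message: 2 * A * (M - 1). */
    static long probesPerServer(int members, int addressesPerMember) {
        return 2L * addressesPerMember * (members - 1);
    }

    /** Assumption: all M servers share one host, so their probes add up. */
    static long probesOnSingleHost(int members, int addressesPerMember) {
        return members * probesPerServer(members, addressesPerMember);
    }

    public static void main(String[] args) {
        int addressesPerMember = 1; // one election address per member
        for (int members : new int[] {5, 9, 15, 23}) {
            System.out.printf("M=%d: %d per server, %d on a single host%n",
                    members,
                    probesPerServer(members, addressesPerMember),
                    probesOnSingleHost(members, addressesPerMember));
        }
        // M=5:  8 per server, 40 on a single host
        // M=9:  16 per server, 144 on a single host
        // M=15: 28 per server, 420 on a single host
        // M=23: 44 per server, 1012 on a single host
    }
}
```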
>
> OPTION 1: we keep the code as it is. We might change the documentation for
> zkconf to mention this mac-specific issue and how to disable the ICMP rate
> limit.
>
> OPTION 2: we change the code not to filter the list of election addresses
> if the list has only a single element. This seems to be a logical way to
> decrease the ICMP requests. However, if we ran a large number of ZooKeeper
> servers on a single machine using multiple election addresses for each
> server, we would get the same problem (most probably even quicker).
> OPTION 3: make the address filtering configurable and change zkconf to
> disable it by default. (but disabling will make the quorum potentially
> unable to recover during network failures, so it is not recommended during
> production)
>
> OPTION 4: refactor the MultiAddress feature and remove the ICMP calls from
> the ZooKeeper code. However, they clearly help with quick recovery during
> network failures... at the moment I can't think of any good solution that
> avoids them.
>
> Kind regards,
> Mate
>
