zookeeper-user mailing list archives

From Swaminathan Gnanaskandan <gsw...@gmail.com>
Subject Zookeeper Observers across WAN : Network connectivity issues
Date Mon, 01 Feb 2016 23:16:19 GMT
Hi,
I am running ZooKeeper across two data centers. My primary data center
(DC1) has a 3-node ensemble and the other data center (DC2) has 2 observer
nodes. Clients in DC1 connect to the main quorum, while clients in DC2
connect to the observers. The clients are written in Java and use the
Curator library to connect to ZooKeeper (sessionTimeout = 6 seconds,
connectionTimeout = 1 second). About 100 watchers in the primary data
center monitor a single ZK-node that has about 15000 ephemeral children.
Each ephemeral node is created over its own ZooKeeper connection (think of
it as a process registering itself). About 3000 of these nodes are created
in DC2 and the remaining 12000 in DC1. When there is packet loss between
the two data centers, I notice a couple of issues:

- Some of the watchers in DC1 lose connectivity to ZooKeeper with the
exception org.apache.zookeeper.KeeperException$ConnectionLossException, and
Curator tries to re-establish the connection.
Why would this affect DC1? I can understand the observers losing their
connection to the leader, which would cause clients in DC2 to experience
connectivity issues. ZooKeeper in DC1 does have to notify its watchers that
the 3000 DC2 nodes no longer exist. Is this what is causing ZooKeeper in
DC1 to misbehave?
- The frequent client disconnects at times overload ZooKeeper: clients try to
open too many connections, which makes ZooKeeper unresponsive, so it fails
the Exhibitor health checks and gets restarted.
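On the second point: when thousands of sessions (here, one per registered process) drop at the same moment and all retry on a similar schedule, their reconnect attempts tend to arrive in waves. One commonly used way to spread them out is capped exponential backoff with random ("full") jitter. The sketch below is a generic illustration of that technique under stated assumptions, not Curator's own retry code; the class name `JitteredBackoff` and the 1 s base / 30 s cap values are hypothetical.

```java
import java.util.Random;

/**
 * Hypothetical reconnect-delay policy: capped exponential backoff with
 * "full jitter". A generic sketch of the technique, not Curator's retry code.
 */
class JitteredBackoff {
    private final long baseMs;
    private final long capMs;
    private final Random random;

    JitteredBackoff(long baseMs, long capMs, Random random) {
        this.baseMs = baseMs;
        this.capMs = capMs;
        this.random = random;
    }

    // Upper edge of the delay window for a given attempt: min(cap, base * 2^attempt).
    long windowMs(int attempt) {
        // Clamp the exponent to [0, 30] so base * 2^shift stays within long range
        // for any realistic base value.
        int shift = Math.max(0, Math.min(attempt, 30));
        return Math.min(capMs, baseMs * (1L << shift));
    }

    // Full jitter: pick a delay uniformly from [0, window] so that clients that
    // disconnected together do not all retry together.
    long delayMs(int attempt) {
        long window = windowMs(attempt);
        // Math.min guards against floating-point rounding pushing the value past the window.
        return Math.min(window, (long) (random.nextDouble() * (window + 1)));
    }

    public static void main(String[] args) {
        JitteredBackoff backoff = new JitteredBackoff(1_000, 30_000, new Random(42));
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + ": window " + backoff.windowMs(attempt)
                    + " ms, sleep " + backoff.delayMs(attempt) + " ms");
        }
    }
}
```

Because each process draws its delay independently from the whole window, reconnects are spread across that window instead of landing together at each doubling step.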

Are there any known issues with having a large number of ZK watchers? If
so, apart from reducing the number of watchers or child nodes under the
ZK-node, is there any other way to avoid this issue?
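For a rough sense of scale: ZooKeeper child watches set through getChildren are one-shot, so after a NodeChildrenChanged notification each watcher typically calls getChildren again (re-registering the watch) and receives the full list of children. The sketch below turns the numbers from this message into a back-of-envelope estimate of that re-listing traffic when the 3000 DC2 ephemerals are removed. The 40-byte average child-name length is a hypothetical assumption, and protocol overhead is ignored. The estimate brackets the load; it does not show that this is the cause of the DC1 disconnects.

```java
/**
 * Back-of-envelope estimate of getChildren re-listing traffic when ephemeral
 * children of one watched parent znode are deleted.
 * Assumption (hypothetical): average child name length of 40 bytes;
 * per-name length prefixes and packet headers are ignored.
 */
class WatchFanOutEstimate {

    // Approximate payload of one getChildren response: the sum of child-name bytes.
    static long bytesPerListing(long children, long avgNameBytes) {
        return children * avgNameBytes;
    }

    // Best case: all deletions are coalesced into a single notification per
    // watcher, so every watcher re-lists the parent exactly once.
    static long bestCaseBytes(long watchers, long children, long avgNameBytes) {
        return watchers * bytesPerListing(children, avgNameBytes);
    }

    // Worst case: every deletion arrives after the watch has been re-registered,
    // so each deletion costs one re-listing per watcher. Using the initial child
    // count for every listing makes this an upper bound (the list shrinks).
    static long worstCaseBytes(long watchers, long deletions, long children, long avgNameBytes) {
        return watchers * deletions * bytesPerListing(children, avgNameBytes);
    }

    public static void main(String[] args) {
        long watchers = 100;      // from the message
        long children = 15_000;   // from the message
        long deletions = 3_000;   // DC2 ephemerals removed when their sessions expire
        long avgNameBytes = 40;   // hypothetical

        System.out.println("one listing: " + bytesPerListing(children, avgNameBytes) + " bytes");
        System.out.println("best case: " + bestCaseBytes(watchers, children, avgNameBytes) + " bytes");
        System.out.println("worst case: "
                + worstCaseBytes(watchers, deletions, children, avgNameBytes) + " bytes");
    }
}
```

With these inputs the bounds are about 60 MB and about 180 GB of response payload. Real traffic falls somewhere in between, depending on how quickly watchers re-register relative to the rate of deletions.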

Thanks,
Swami
