hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "ZooKeeper/FAQ" by ChangSong
Date Fri, 05 Nov 2010 05:13:50 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ZooKeeper/FAQ" page has been changed by ChangSong.
The comment on this change is: Added section 8. Can I run an ensemble cluster behind a load balancer?


  See [[http://bit.ly/4ekN8G|this page]] for a survey Patrick Hunt (http://twitter.com/phunt) did looking at operational latency with both a standalone server and an ensemble of size 3. You'll notice that a single-core machine running a standalone ZK ensemble (1 server) is still able to process 15k requests per second. This is orders of magnitude greater than what most applications require (if they are using ZooKeeper correctly, i.e. as a coordination service and not as a replacement for a database, filestore, cache, etc.).
+ <<BR>>
+ <<Anchor(8)>>
+ '''8. [[#8|Can I run an ensemble cluster behind a load balancer?]]'''
+ There are two types of server failures in a distributed system, from a socket I/O perspective:
+  1. The server is unreachable due to hardware failure, OS panic/hang, a hung ZooKeeper daemon, a temporary or permanent network outage, a network switch anomaly, etc.: the client cannot detect the failure immediately, since there is no entity to respond. As a result, ZooKeeper clients must rely on timeouts to identify such failures.
+  1. Dead ZooKeeper process (daemon): since the OS responds on a closed TCP port, the client gets "connection refused" on socket connect or "connection reset by peer" on socket I/O. The client notices immediately that the other end has failed.
+ Here is how ZK clients respond to server failures in each case:
+  1. In the former case, the ZK client relies on a heartbeat algorithm. The client detects the server failure within 2/3 of the recv timeout (set in zookeeper_init), and then retries the same IP at every recv-timeout period if only one ensemble IP is given. If more than one ensemble IP is given, the client tries the next IP immediately.
+  1. In the latter case, the ZK client detects the failure immediately and retries the connection every second if only one ensemble IP is given. If multiple ensemble IPs are given (most installations fall into this category), the client retries the next IP immediately.
+ Notice that in both cases, when more than one ensemble IP is specified, the ZK client retries the next IP immediately, with no delay.
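The failover behavior above can be sketched as follows. This is an illustrative model only, not the actual ZooKeeper client implementation; the function name and the 1-second single-host retry delay are assumptions based on the description above.

```python
# Sketch of the retry policy described above (illustrative only,
# not the real ZooKeeper client code).
def pick_next(ensemble_ips, failed_index):
    """Return (index_to_try_next, delay_seconds) after a connection failure.

    With more than one ensemble IP configured, the client moves to the
    next IP immediately (no delay). With a single IP, it retries the
    same host after a delay (~1s for "connection refused"; for silent
    failures, detection itself already costs up to the recv timeout).
    """
    if len(ensemble_ips) > 1:
        return (failed_index + 1) % len(ensemble_ips), 0.0
    return failed_index, 1.0

# Example: three ensemble IPs, server 0 just failed.
ips = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(pick_next(ips, 0))  # moves to 10.0.0.2 with no delay
```

The key point the sketch captures: the back-off only exists in the single-address configuration; with a list of addresses the client simply rotates through it.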
+ In some installations it is preferable to run an ensemble cluster behind a load balancer, such as a hardware L4 switch, a TCP reverse proxy, or DNS round-robin, because such a setup lets users address the whole ensemble with a single hostname or IP (or VIP), and some load balancers detect server failures as well.
+ But there are subtle differences in how these load balancers react to server failures.
+  * Hardware L4 load balancer: this setup involves one IP and one hostname. An L4 switch usually does its own heartbeating and removes non-responding hosts from its IP list, but it relies on the same timeout scheme for fault detection, so it may still redirect you to an unresponsive server. Only if the hardware load balancer detects server failures quickly enough will this setup always redirect you to a live ensemble server.
+  * DNS round-robin: this setup involves one hostname and a list of IPs. ZK clients correctly make use of the list of IPs returned by the DNS query, so this setup works the same way as passing multiple hostname (IP) arguments to zookeeper_init. The drawback is that when the ensemble configuration changes (e.g. a server is added or removed), it may take a while for the updated DNS entry to propagate to all DNS servers, and client-side DNS caching (nscd, for example) adds its own TTL delay.
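To see why DNS round-robin behaves like a multi-address argument to zookeeper_init, one can expand the round-robin name into the equivalent list of host:port pairs using the standard resolver. A minimal sketch; the helper name and the example hostname are assumptions, not part of any ZooKeeper API:

```python
import socket

def ensemble_connect_string(hostname, port=2181):
    """Resolve a (possibly round-robin) DNS name and build the equivalent
    comma-separated host:port connect string a ZK client would accept."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    ips = sorted({info[4][0] for info in infos})  # de-duplicate resolved IPs
    return ",".join("%s:%d" % (ip, port) for ip in ips)

# With a round-robin entry such as "zk.example.com" resolving to three
# addresses, this would yield "10.0.0.1:2181,10.0.0.2:2181,10.0.0.3:2181".
print(ensemble_connect_string("localhost"))
```

The client sees the same set of endpoints either way; the difference is only in when the list is refreshed, which is where the DNS propagation and TTL caveats above come in.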
+ In conclusion, DNS round-robin works as well as passing a list of ensemble IPs, except during cluster reconfiguration.
