ZooKeeper is primarily used to keep track of the changing cluster topology. S4 is designed to run on commodity hardware, where nodes can go down at any time, so the topology has to be tracked continuously. The topology is written to ZooKeeper, and each running node sets watches on the relevant ZooKeeper node(s) (znodes) so that it is notified of any changes.
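For anyone who wants to see the watch pattern concretely, here is a minimal, self-contained Python sketch. It does not use ZooKeeper or S4 code. `FakeZooKeeper`, `S4Node` and the `/s4/nodes` path are hypothetical stand-ins that only mimic two real ZooKeeper behaviours: ephemeral znodes disappear when their owner's session ends, and watches are one-shot, so a client must re-register after each notification. In the real Java client this corresponds roughly to creating a node with `CreateMode.EPHEMERAL` and calling `getChildren(path, watcher)`. S4's actual implementation may be organised differently.

```python
from collections import defaultdict


class FakeZooKeeper:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, illustration only)."""

    def __init__(self):
        self.nodes = {}                          # path -> (data, owning session or None)
        self.child_watches = defaultdict(list)   # parent path -> one-shot callbacks

    def _parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def _fire_child_watches(self, parent):
        # As in ZooKeeper, each watch fires once and is then removed.
        callbacks, self.child_watches[parent] = self.child_watches[parent], []
        for cb in callbacks:
            cb(parent)

    def create(self, path, data=b"", session=None):
        # A node created with a session is "ephemeral": it vanishes with the session.
        self.nodes[path] = (data, session)
        self._fire_child_watches(self._parent(path))

    def get_children(self, path, watch=None):
        # Optionally arm a one-shot watch on this node's children, then list them.
        if watch is not None:
            self.child_watches[path].append(watch)
        prefix = path.rstrip("/") + "/"
        return sorted(p[len(prefix):] for p in self.nodes
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def close_session(self, session):
        # Simulates a crashed machine: its ephemeral nodes are deleted.
        dead = [p for p, (_, owner) in self.nodes.items() if owner == session]
        for path in dead:
            del self.nodes[path]
            self._fire_child_watches(self._parent(path))


class S4Node:
    """A processing node keeping a local view of the live cluster (hypothetical)."""

    def __init__(self, zk, name, members_path="/s4/nodes"):
        self.zk, self.name, self.members_path = zk, name, members_path
        self.live_nodes = []
        # Register this node with an ephemeral znode tied to its own session.
        zk.create(f"{members_path}/{name}", session=name)
        self._refresh(members_path)

    def _refresh(self, _path):
        # Re-read membership and re-arm the one-shot watch in the same call.
        self.live_nodes = self.zk.get_children(self.members_path, watch=self._refresh)


zk = FakeZooKeeper()
zk.create("/s4")
zk.create("/s4/nodes")
a, b, c = S4Node(zk, "node-a"), S4Node(zk, "node-b"), S4Node(zk, "node-c")
print(a.live_nodes)                 # ['node-a', 'node-b', 'node-c']
zk.close_session("node-b")          # node-b goes down
print(a.live_nodes, c.live_nodes)   # ['node-a', 'node-c'] ['node-a', 'node-c']
```

The key point is that no node polls: the surviving nodes learn about the failure only because the ephemeral znode was removed and their watches fired.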

Hope that helps.


On Tue, Oct 2, 2012 at 9:17 PM, Frank Zheng <bearzheng2011@gmail.com> wrote:
Hi All,

I am exploring the cluster management mechanism and fault tolerance of S4.
I saw that S4 uses ZooKeeper in its communication layer, but this is not described very clearly in the paper "S4: Distributed Stream Computing Platform".
I tried to find the reference "[15] Communication layer using ZooKeeper, Yahoo! Inc. Tech. Rep., 2009", but it is not available.
Could anyone explain the role of ZooKeeper in S4 and describe the cluster management mechanism in detail?