From "Washko, Daniel" <dwas...@gannett.com>
Subject Re: New to zookeeper
Date Wed, 12 Jul 2017 14:53:43 GMT
I speak strictly from my own experience with Zookeeper, not in any official capacity for the
project or for Exhibitor.

Exhibitor works great: it lets you easily automate clustering Zookeeper nodes into an
ensemble and discover the individual nodes in the ensemble via an HTTP call. We ran into
a problem, though, after we rolled Exhibitor out across our infrastructure. Every so often
our Zookeeper ensembles lost the data they stored. I cannot say for certain that Exhibitor
caused this, but we have Solr clouds where Exhibitor was not used, and they never had this
problem. My suspicion is that a Zookeeper node had a problem, and Exhibitor removed that node
from the ensemble and did a rolling restart. When that node recovered, its data was for some
reason corrupted or lost. Exhibitor pulled the node back into the ensemble and did another
rolling restart. That node became leader, and the other nodes synced from it as they rejoined,
dumping their own stored data to match the leader. This is speculation on my part: I have had
a very hard time reproducing it and have not heard of anyone else having this problem. Again,
I am not definitively saying Exhibitor is the cause, but since we removed Exhibitor the
problem has not occurred.
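
For reference, the HTTP discovery call mentioned at the top of the paragraph above could look
roughly like the Python sketch below. The endpoint path (/exhibitor/v1/cluster/list), the
default port 8080, and the JSON shape with "servers" and "port" keys are assumptions based on
Exhibitor's REST documentation and should be checked against the version you run. The function
names are hypothetical.

```python
import json
from urllib.request import urlopen


def parse_cluster_list(body, default_port=2181):
    """Turn a cluster-list JSON body into 'host:port' strings.

    Assumes a body shaped like {"servers": [...], "port": 2181}.
    """
    data = json.loads(body)
    port = data.get("port", default_port)
    return ["%s:%s" % (host, port) for host in data.get("servers", [])]


def discover_ensemble(exhibitor_host, exhibitor_port=8080, timeout=5):
    """Ask one Exhibitor instance for the ensemble members it knows about."""
    # Assumed endpoint and port; verify against your Exhibitor installation.
    url = "http://%s:%d/exhibitor/v1/cluster/list" % (exhibitor_host, exhibitor_port)
    with urlopen(url, timeout=timeout) as resp:
        return parse_cluster_list(resp.read().decode("utf-8"))
```

Joining the returned list with commas gives a connection string that a client can use.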

The Zookeeper 3.5.x branch adds discovery functionality and automated clustering. It’s
great, but from what I understand it is still in alpha.
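
To make the 3.5.x discovery concrete: per the 3.5 dynamic reconfiguration documentation (not
something covered in this message), the current membership is readable from the
/zookeeper/config znode as lines of the form server.<id>=<host>:<port>:<port>[:role];<client port>
plus a version= line. The sketch below only parses that text; how you fetch it (zkCli or a
client library) is left open, and the function name is hypothetical. It assumes hostnames or
IPv4 addresses, not IPv6 literals.

```python
def parse_dynamic_config(text):
    """Parse 3.5-style dynamic config text into {server_id: info}."""
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("server."):
            continue  # skips the version= line and blank lines
        key, value = line.split("=", 1)
        server_id = int(key[len("server."):])
        # Quorum settings come before ';', the client port spec after it.
        quorum_part, _, client_part = value.partition(";")
        fields = quorum_part.split(":")
        role = fields[3] if len(fields) > 3 else "participant"
        # The client spec may be "port" or "address:port"; keep the port.
        client_port = int(client_part.rsplit(":", 1)[-1]) if client_part else None
        servers[server_id] = {
            "host": fields[0],
            "role": role,
            "client_port": client_port,
        }
    return servers
```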

Prior to the 3.5.x branch I know of no way to discover which nodes are actually in the ensemble.
The four-letter commands will tell you whether a node is in an ensemble and whether it is a
leader or follower, but they will not tell you which ensemble it is in or list any information
about the other nodes. If someone has a way to do this, please post, because I have looked all over.
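
As an illustration of what the four-letter commands do provide, here is a minimal Python
sketch that sends srvr to a node's client port and reads the Mode: line (leader, follower,
or standalone). The function names are hypothetical and port 2181 is assumed as the client
port. Note that it reports only on the node you ask, which is exactly the limitation
described above.

```python
import socket


def four_letter_word(host, command, port=2181, timeout=5.0):
    """Send a four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closes the connection when done
                break
            chunks.append(chunk)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_mode(srvr_output):
    """Extract the value of the 'Mode:' line, or None if it is absent."""
    for line in srvr_output.splitlines():
        if line.startswith("Mode:"):
            return line.split(":", 1)[1].strip()
    return None
```

Usage would be along the lines of parse_mode(four_letter_word("zk1.example.com", "srvr")),
where the hostname is a placeholder.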

We make use of Scalr, which adds another layer of automation. I run orchestration
scripts in Scalr that discover the other running Zookeeper nodes in (what Scalr calls) the
same Farm Role. The script configures each node with the information for the other nodes
and restarts Zookeeper to bring them into an ensemble. It then collects this information
and stores the IP addresses in a Global Variable in Scalr, which is then available to Solr.
Changes to the ensemble are reflected in this variable, which is passed to the Solr cloud,
where a restart of the service updates the Zookeeper information in Solr. We are working
towards moving this functionality to Consul, where the Zookeeper ensemble information will
be registered, allowing Solr to pull it from Consul instead of relying on Global Variables.
What I am getting at is that outside the 3.5.x branch, automating this takes a bit of work.
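
The configuration step in such a script can be sketched as below. This is not the actual
Scalr script, only a hedged Python illustration: given the discovered IP addresses, it
produces the server.N lines for zoo.cfg, the myid value for each node, and a comma-separated
connection string of the kind Solr consumes. The function name is hypothetical, and the
default ports (2888, 3888, 2181) are assumptions.

```python
def ensemble_config(ips, peer_port=2888, election_port=3888, client_port=2181):
    """Build zoo.cfg server lines, per-node myid values, and a connect string."""
    # Sort so every node derives the same id assignment from the same input.
    ordered = sorted(ips)
    server_lines = [
        "server.%d=%s:%d:%d" % (i, ip, peer_port, election_port)
        for i, ip in enumerate(ordered, start=1)
    ]
    # Each node writes its own id into the myid file in its data directory.
    myids = {ip: i for i, ip in enumerate(ordered, start=1)}
    # Comma-separated host:port list for clients such as Solr.
    connect_string = ",".join("%s:%d" % (ip, client_port) for ip in ordered)
    return server_lines, myids, connect_string
```

One caveat of this simple scheme is that ids are reassigned whenever membership changes, so
a production script may prefer ids that stay stable per node.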

Daniel S Washko
Solutions Architect

dwashko@gannett.com  <http://www.gannett.com/>
On 7/11/17, 6:58 PM, "Luigi Tagliamonte" <luigi.tagliamonte86@gmail.com> wrote:

    Hello, Zookeeper Users!
    I'm currently configuring/exploring zookeeper.
    I'm reading a lot about ensembles and scaling and I have some questions that
    I'd like to submit to an expert audience.
    I need zookeeper as a Kafka dependency, so my deployment goal is ensemble
    reliability, especially because the latest Kafka version uses zookeeper only to
    store the leader partition.
    Here are my questions:
    - To manage the ensemble I decided to use exhibitor - what do you think
    about it? Should I look at something else?
    - Is there a way to discover all the servers of an ensemble apart from
    using the four-letter commands? I wonder if it is possible to do something
    like in Cassandra, where you contact one node and can get the whole cluster
    info from it. Should I configure just a DNS name per zookeeper server? This
    doesn't scale well in a dynamic env like servers in autoscaling.
    - Is there any white paper that shows a real scalable and reliable
    Zookeeper installation? Any resources are welcome!
    Thank you all in advance!

