zookeeper-user mailing list archives

From Michael Han <h...@cloudera.com>
Subject Re: Adding and removing replicas?
Date Thu, 20 Oct 2016 16:26:14 GMT
+1 on what Rakesh mentioned - dynamic reconfig is a great feature for this
use case.
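To make the 3.5 route concrete, here is a minimal Python sketch. It is not part of ZooKeeper, and the helper names server_spec and reconfig_command are hypothetical. It assumes the dynamic-config server syntax server.<id>=<address>:<quorum port>:<election port>[:role];<client port> from the 3.5 reconfig guide Rakesh linked, and builds an incremental reconfig line for zkCli.sh. Hosts and ports are placeholders.

```python
from typing import Iterable


def server_spec(sid: int, host: str, quorum_port: int, election_port: int,
                client_port: int, role: str = "participant") -> str:
    """Format one server entry in the (assumed) 3.5 dynamic-config syntax:
    server.<id>=<address>:<quorum port>:<election port>:<role>;<client port>
    """
    if role not in ("participant", "observer"):
        raise ValueError(f"unknown role: {role}")
    return f"server.{sid}={host}:{quorum_port}:{election_port}:{role};{client_port}"


def reconfig_command(add: Iterable[str] = (), remove: Iterable[int] = ()) -> str:
    """Build an incremental `reconfig` line to type at the zkCli.sh prompt (3.5+)."""
    joining = list(add)
    leaving = [str(sid) for sid in remove]
    parts = ["reconfig"]
    if leaving:
        # -remove takes a comma-separated list of server ids
        parts += ["-remove", ",".join(leaving)]
    if joining:
        # -add takes a comma-separated list of server specs
        parts += ["-add", ",".join(joining)]
    if len(parts) == 1:
        raise ValueError("nothing to add or remove")
    return " ".join(parts)
```

For example, reconfig_command(add=[server_spec(4, "zk4.example.com", 2888, 3888, 2181)]) returns "reconfig -add server.4=zk4.example.com:2888:3888:participant;2181", and reconfig_command(remove=[3]) returns "reconfig -remove 3". Check the exact syntax against the docs for the release you run, and quote the string if you pass it through a shell, since ";" is a shell separator.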

For 3.4.x, where the dynamic reconfig feature is not available, you can do a
'rolling restart' of the cluster to add or remove a ZK server node. A rolling
restart is pretty much what you described in steps 1 / 2 / 3. To minimize
downtime, there are some standard practices to apply when doing the restart
in step 3:

* Restart one server at a time. If you restart multiple servers at the same
time, you risk bringing down the cluster when there are not enough servers
up to form a quorum. It also causes many clients connected to the
restarting servers to be disconnected and then reconnect, which puts extra
load on your cluster.

* Restart the follower nodes first, then finally restart the leader if
possible, to minimize the number of leader elections. During leader election
the ZK ensemble does not serve clients, and after leader election it might
take a while to form the quorum (servers sync up with the leader), depending
on the size of the snapshot etc.; during this period the ZK ensemble is also
not available. So to minimize downtime, we need to minimize the chance of
leader election. In the worst case, if every round we restart the leader,
the ensemble would not be available until the rolling restart is finished.

* Restart servers from the lowest sid to the highest sid. There is a design in
the ZK server to minimize duplicate connections between servers: if server A,
while trying to connect to another server B, finds out that A's sid is smaller
than B's, then A drops the connection. So if you restart the server with the
smallest sid last, that server might not be able to join the ensemble.
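The three rules above can be combined into a simple restart plan. Below is a minimal Python sketch, not a ZooKeeper tool: the Server record, the helper names and the hostnames are hypothetical, and it assumes you already know each server's sid (the N in server.N, matching its myid file) and which server is currently the leader (for example from the "Mode:" line printed by the srvr four-letter word). It orders followers by ascending sid and puts the leader last. When the leader happens to have the lowest sid the two rules conflict; as an assumption, this sketch lets leader-last win. It also shows the majority arithmetic behind "one server at a time".

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Server:
    sid: int         # the N in server.N (matches the server's myid file)
    host: str        # placeholder hostname
    is_leader: bool  # e.g. from the "Mode:" line of `echo srvr | nc <host> 2181`


def quorum_size(ensemble_size: int) -> int:
    """Smallest strict majority; the ensemble only serves requests while this many servers are up."""
    return ensemble_size // 2 + 1


def max_servers_down(ensemble_size: int) -> int:
    """How many servers can be down at the same time without losing quorum."""
    return ensemble_size - quorum_size(ensemble_size)


def rolling_restart_order(servers: List[Server]) -> List[Server]:
    """Followers first in ascending sid order, then the leader.

    Assumption: if the leader has the lowest sid, leader-last takes
    priority over lowest-sid-first.
    """
    leaders = [s for s in servers if s.is_leader]
    if len(leaders) > 1:
        raise ValueError("an ensemble has at most one leader")
    followers = sorted((s for s in servers if not s.is_leader), key=lambda s: s.sid)
    return followers + leaders


if __name__ == "__main__":
    ensemble = [
        Server(3, "zk3.example.com", is_leader=True),
        Server(1, "zk1.example.com", is_leader=False),
        Server(2, "zk2.example.com", is_leader=False),
    ]
    n = len(ensemble)
    # With 3 servers the quorum is 2, so only 1 server may be down at a time.
    print(f"quorum={quorum_size(n)}, max down at once={max_servers_down(n)}")
    for server in rolling_restart_order(ensemble):
        # Restart one server, then wait until it has rejoined (e.g. srvr
        # reports "Mode: follower") before moving on to the next one.
        print(f"restart sid={server.sid} ({server.host})")
```

Running it prints "quorum=2, max down at once=1", then restarts sid 1 and sid 2 and finally the leader, sid 3. With 5 servers the quorum is 3, so two could be down at once, but restarting one at a time still leaves a margin for an unexpected failure during the restart.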

On Thu, Oct 20, 2016 at 8:03 AM, Rakesh Radhakrishnan <rakeshr@apache.org>
wrote:

> Hi Steve,
>
> I'd suggest you look at the latest ZooKeeper 3.5.2 version and use the
> dynamic reconfig feature. This will help you resize (add/remove ZK servers)
> your cluster without restarting the entire cluster.
>
> Please refer to the following links to understand more about the dynamic
> reconfig feature:
> https://zookeeper.apache.org/doc/r3.5.2-alpha/zookeeperReconfig.html
> http://www.slideshare.net/Hadoop_Summit/dynamic-reconfiguration-of-zookeeper
>
> Regards,
> Rakesh
>
> On Thu, Oct 20, 2016 at 3:19 AM, Steve Newman <steve@scalyr.com> wrote:
>
> > Apologies for a basic question, but I've been researching and haven't been
> > able to find the answer online.
> >
> > What is the best way to add or remove replicas from a running ZooKeeper
> > cluster, with minimal downtime? To add a replica, the naive answer would
> > seem to be:
> >
> > 1. Prepare the new replica(s), i.e. install ZooKeeper and set up the
> > configuration files.
> > 2. Edit the configuration for all replicas (new and existing) to list the
> > new replicas.
> > 3. Restart all replicas. (Simultaneously? Or gradually, one at a time?)
> >
> > Is this the best way to do it? Step 3 seems scary in a production cluster.
> > Also, will the new replicas smoothly pick up the existing data, or is it
> > better to seed them with a snapshot somehow?
> >
> > Similarly, the naive answer for removing a replica would seem to be:
> >
> > 1. Halt the ZooKeeper process.
> > 2. Edit the configuration for all other replicas to remove the replica
> > that's going away.
> > 3. Restart all remaining replicas (one at a time?).
> >
> > Again, is this the best approach?
> >
> > Thanks,
> > Steve
> >
>



-- 
Cheers
Michael.
