zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: DR policies/HA setup in production - best practices
Date Mon, 03 Jan 2011 20:05:41 GMT
Hi Sergei,
Responses inline:

On 12/22/10 8:20 AM, "Sergei Babovich" <sbabovich@demandware.com> wrote:

> Hi all,
> We are currently looking at the best ways of deploying ZK ensemble to
> our production pod. And I have two things I'd like to clarify (sorry if
> it has been already answered but I did not find exact confirmation in
> admin guide).
> 1. To provide redundancy our POD has two network switches connected to
> each blade through different interfaces. So in case of failure of the
> switch blades will still be connected to the network. Practically it
> means that each blade will have two ips and at least one of them should
> be available. So my question is how to reflect this fact in zk
> configuration? Is there any way to provide multiple addresses for a
> single server? Is it just multiple records in config file? Any catch
> here? What is the best practice?

We currently don't have any way of specifying two IP addresses for a single
server. What we should do is use the hostname as the server address and
re-resolve it to an IP address whenever a connection breaks, in either case
(server to server or client to server).
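To illustrate the idea (this is not what ZooKeeper does today, just a
minimal sketch of the approach), a reconnect routine could look the
hostname up again on every attempt and try each address it currently maps
to. The function names, the timeout value, and the injectable `resolver`
parameter are all hypothetical, chosen only for this example:

```python
import socket


def resolve_all(host, port):
    """Look the hostname up now and return each distinct (ip, port) it maps to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    addresses = []
    for _family, _type, _proto, _canon, sockaddr in infos:
        addr = (sockaddr[0], sockaddr[1])
        if addr not in addresses:
            addresses.append(addr)
    return addresses


def connect_with_reresolve(host, port, timeout=2.0, resolver=resolve_all):
    """Resolve the hostname at call time and try every address in turn.

    Intended to be called each time a connection breaks, so that a server
    reachable through a second interface is picked up without a config change.
    """
    last_error = None
    for ip, resolved_port in resolver(host, port):
        try:
            return socket.create_connection((ip, resolved_port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # remember the failure and try the next address
    raise ConnectionError(f"no reachable address for {host}:{port}") from last_error
```

With both blade interfaces published under one hostname, losing a switch
would then only cost one failed attempt before the second address is tried.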

The server should be able to bind to all the ip addresses using

Feel free to open a jira for 3.4 release. This would be nice to have.

> 2. The second question regarding the best way of organizing DR policies.
> Basically we want to periodically backup zk state so we will be able to
> restore it remotely. In case of a single node ensemble just backing up
> last data snapshot + log should be completely enough. But it is not
> completely clear to me what would be the best practice in case of a
> cluster? Should I maintain the backup of all nodes and try to restore it
> as a cluster? But in that case, how will the cluster resolve a possible
> time difference between taking snapshots? It feels enough to back up only
> one node and then bring the whole cluster up from it, but how do I know
> that the node I am planning to back up is the best one? Is it correct to
> say that it is safe to backup any currently healthy node? What are the
> common practices here?
> Sorry if the answers  are well known, but I am just starting...

The usual policy for a distributed setup is to back up all the nodes in
your cluster. To restore the cluster, bring it up with the same setup as
the production one.
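As a rough sketch of the per-node step, the script below copies the latest
snapshot plus the transaction logs that go with it out of one server's
data directory. It relies on the on-disk naming `snapshot.<hex zxid>` and
`log.<hex zxid>`; the selection rule (the newest snapshot, the last log
starting at or before it, and every later log) is my assumption about what
is sufficient, and copying the whole directory is the conservative
alternative. The function names and paths are hypothetical:

```python
import shutil
from pathlib import Path


def zxid_of(name):
    """Parse the hexadecimal zxid suffix of 'snapshot.<hex>' or 'log.<hex>'."""
    return int(name.split(".", 1)[1], 16)


def select_backup_files(names):
    """Pick the newest snapshot and the logs assumed necessary to replay past it."""
    snaps = sorted((n for n in names if n.startswith("snapshot.")), key=zxid_of)
    logs = sorted((n for n in names if n.startswith("log.")), key=zxid_of)
    if not snaps:
        return logs  # no snapshot yet, so keep every log
    latest = snaps[-1]
    snap_zxid = zxid_of(latest)
    # The last log that starts at or before the snapshot may still hold
    # transactions written after the snapshot began.
    older = [n for n in logs if zxid_of(n) <= snap_zxid]
    newer = [n for n in logs if zxid_of(n) > snap_zxid]
    return [latest] + older[-1:] + newer


def backup_node(version_dir, dest_dir):
    """Copy the selected files from one node's data directory into dest_dir."""
    src, dest = Path(version_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    chosen = select_backup_files([p.name for p in src.iterdir() if p.is_file()])
    for name in chosen:
        shutil.copy2(src / name, dest / name)
    return chosen
```

Running `backup_node` once per server, each into its own destination
directory, gives one backup per node, matching the policy above.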

Hope that helps.

> Regards,
> Sergei
