zookeeper-user mailing list archives

From Robert Kamphuis <Robert.Kamph...@supercell.com>
Subject Deployment scenario's for zookeeper in AWS
Date Thu, 29 Jan 2015 08:34:38 GMT

I would like to discuss deployment scenarios of apps and zookeeper in AWS. I have been trying
to find info about this, but haven’t found too much yet.

We have been working on better redundancy of our apps - hundreds of VMs - running 24x7, and
zookeeper is one component we introduced last year. While it is working fine, there are some
little tricks and missing things in the current setup.

I would really like to hear more about how others are configuring their apps and zookeeper in AWS.

Trying to summarise our current setup:
- One Auto Scaling group (ASG) per application for its 5 zookeeper servers - the ASG will replace
an instance with a fresh one if it goes bad.
- the zookeeper servers each have an assigned elastic-ip, and their zoo.cfg lists these
elastic-ips in the server.N lines: not the names, but the IP addresses directly. We can swap a
zookeeper-VM by terminating it and letting the ASG create a new one; once the new instance is
assigned the freed elastic-ip, it joins the zookeeper-cluster.
- the zookeepers' security group (SG) explicitly allows those 5 elastic-ips on ports 2888 and
3888, plus the SGs of our app-servers
- the image we use for the zookeeper ASG contains a little extra service which takes care
of automatically assigning the configured elastic-ips to its ASG members. So when a new server
boots up, the remaining ASG members will assign the missing elastic-ip to the new instance, and
it will start up zookeeper and join the cluster. The same image is used for all app deployments,
with the userdata of the ASG specifying the elastic-ips and some other details. One image
gives us a redundant self-healing zookeeper-cluster per application.
- the application servers are spread across different SGs depending on needs and their roles,
and the connectstring is configured with logical names like zookeeperX.<app>.<domainname>.org
for X=1,2,3,4,5. We manually added mappings from these names to the EC2 public hostnames of the
elastic-ips, like ec2-A-B-C-D.compute-1.amazonaws.com with A.B.C.D being the corresponding
elastic-ip. This has the great benefit that all our application VMs, when looking up these
logical zookeeperX.app.domain.org names, will resolve them to the current private-ip of that
zookeeper-server instance, and when connecting the SG will allow it through. (If we used A.B.C.D
directly, we would need to provision each application-vm explicitly in the SG of the
zookeeper-cluster - hundreds of servers which change somewhat from week to week.)
- we use curator for leader-election to pick which server takes which role, and we run some
5-10% more servers than the roles we need. Each server holds on to its role until it loses its
session, at which point a spare server jumps in to take over. So if an app server goes bad (e.g.
EBS, networking, or it just disappears), one of the others jumps in to take over.
- we changed curator’s leaderlatch somewhat to hang on to the leadership during suspend
events, waiting for the reconnect or lost events. A leadership role is an expensive thing
due to the large amount of state and data-caching in each server - which is needed for performance.
This means that when one of the zookeeper-servers goes bad, it is not the case that about one
fifth of our servers lose their role - they have some 30 seconds to reconnect to the remaining
servers and continue their session there.
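
To make the leaderlatch change above concrete, here is a minimal sketch of the state handling it describes: leadership is kept through a suspend and dropped only when the session is lost. It uses only the JDK and does not touch curator itself; ConnState is a hypothetical stand-in for curator's ConnectionState, and StickyLeadership and its acquired() hook are illustrative names, not the actual patch.

```java
// Hypothetical stand-in for curator's ConnectionState; only the values used here.
enum ConnState { CONNECTED, SUSPENDED, RECONNECTED, LOST }

// Sketch of a latch that keeps its role while the connection is suspended
// and gives it up only once the session is reported lost.
class StickyLeadership {
    private boolean leader;
    private boolean suspended;

    // Hypothetical hook: called when this server wins the election.
    void acquired() {
        leader = true;
    }

    void stateChanged(ConnState state) {
        switch (state) {
            case SUSPENDED:
                // Connection trouble: keep the role and wait for the next event.
                suspended = true;
                break;
            case CONNECTED:
            case RECONNECTED:
                // Session continued on another zookeeper server: nothing to give up.
                suspended = false;
                break;
            case LOST:
                // Session expired: the role is gone and a spare may take over.
                suspended = false;
                leader = false;
                break;
        }
    }

    boolean isLeader() {
        return leader;
    }

    boolean isSuspended() {
        return suspended;
    }
}
```

The trade-off in this design is that a suspended server still believes it holds its role until the session actually expires, so the application has to tolerate that window.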

The current issues we have are the following:

- A while back there was a networking issue in AWS which caused traffic between the zookeeper-servers
to be partially blocked for some minutes. The zookeeper cluster lost its leader, and re-election
failed. The app came to a grinding halt. Not good. We have been working on adding keep-alive
packets to the election ports between the servers, which we identified as a working solution
for that issue. We simulate the problems via iptables. We hope to get that patch submitted in
the near future for consideration. This has been reported a while back with discussions on
the best way going forward, e.g. https://issues.apache.org/jira/browse/ZOOKEEPER-1748 (we would
prefer application-level keepalive packets instead of lower-level TCP keepalive socket options.)
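
As an illustration of what application-level keepalive means here, the sketch below tracks when each peer was last heard from and flags a peer as stale once nothing (data or keepalive) has arrived within a timeout. It is not the patch mentioned above and not zookeeper code; PeerLiveness, its methods, and the timeout handling are all hypothetical names chosen for this example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical liveness tracker for election peers, keyed by server id.
class PeerLiveness {
    private final long timeoutMs;
    private final Map<Long, Long> lastHeardMs = new ConcurrentHashMap<>();

    PeerLiveness(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Record that any message, including a keepalive, arrived from the peer.
    void heard(long serverId, long nowMs) {
        lastHeardMs.put(serverId, nowMs);
    }

    // A peer is stale if it was never heard from or has been silent longer
    // than the timeout; the caller would then close and reopen that connection.
    boolean isStale(long serverId, long nowMs) {
        Long last = lastHeardMs.get(serverId);
        return last == null || nowMs - last > timeoutMs;
    }
}
```

Passing the clock value in as a parameter keeps the sketch easy to exercise without real sockets or sleeps.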

- While the replacement of a zookeeper VM instance works great, there is one remaining issue:
how do the application VMs learn about the changed name-to-ip relation? zookeeperX.app.domain.org
no longer maps to the same private IP - the replacement VM has a different IP.
We tried to work around this by changing the connectstring to a shuffled replacement, but
that expired the sessions and thus caused the leaderlatches to close, and in some cases some
servers could not get their old role back as some spares got there first.
We now have a prototype working where we use a special HostProvider implementation which resolves
from name to IP when next() is called, instead of on construction as the default StaticHostProvider
does. This means that after the mapping changes, the zookeeper client gets the new private
IP address to connect to. In addition, this does not end the zookeeper session, so the leader-latches
remain. (We use a session timeout of about 1-2 minutes.) This solution requires a small fix and
addition to the ZooKeeper class to enable passing a custom HostProvider. See: https://issues.apache.org/jira/browse/ZOOKEEPER-2107
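
The resolve-on-next() idea can be sketched as follows. This is a JDK-only illustration, not the prototype itself: it does not implement zookeeper's HostProvider interface, and ResolvingHostList and its Resolver hook are hypothetical names. The server list is assumed to be built with InetSocketAddress.createUnresolved so that no lookup happens at construction.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Keeps hostnames unresolved and looks them up each time next() is called.
class ResolvingHostList {
    // Hypothetical lookup hook; in production this could be InetAddress::getByName.
    interface Resolver {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final List<InetSocketAddress> unresolved;
    private final Resolver resolver;
    private int index = -1;

    ResolvingHostList(List<InetSocketAddress> servers, Resolver resolver) {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("at least one server is required");
        }
        this.unresolved = new ArrayList<>(servers);
        Collections.shuffle(this.unresolved); // spread clients over the servers
        this.resolver = resolver;
    }

    int size() {
        return unresolved.size();
    }

    // Round-robin over the hostnames, resolving the chosen one at call time
    // so that a changed name-to-ip mapping is picked up on the next attempt.
    synchronized InetSocketAddress next() {
        index = (index + 1) % unresolved.size();
        InetSocketAddress target = unresolved.get(index);
        try {
            InetAddress current = resolver.resolve(target.getHostString());
            return new InetSocketAddress(current, target.getPort());
        } catch (UnknownHostException e) {
            // Hand back the unresolved address; the connect attempt will fail
            // and the caller moves on to the next server.
            return target;
        }
    }
}
```

One caveat with this approach: the JVM caches successful lookups done through InetAddress (see the networkaddress.cache.ttl security property), so a resolver based on InetAddress.getByName only sees a changed mapping once that cache entry expires.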

Hope this helps others running on AWS, and please share your experiences.

