Hi All,
I wanted to get some feedback about running ZooKeeper on VMs within
public clouds.
If you have experience with this, could you please share it?
What issues have you run into? Were you able to overcome them, and if
so, how?
At the end of the day, were you able to get this to work reliably?
Some of the issues we know we need to worry about:
1. Making sure replicas are in different 'availability zones'.
Without this, your VMs might even be running on the same physical machine.
2. Lack of fixed IP
I believe that in clouds every new VM is typically allocated a new IP, so
if you're e.g. upgrading a cluster, you can't keep the existing IPs for
the new VMs. Our solution is to use our cloud provider's support for a set
of fixed IPs which can be dynamically bound to whichever VMs we want (aka
"portable IP" on SoftLayer; I believe other providers have similar support).
Dynamic reconfig probably opens up new options, but it will be a while
before it is supported in a stable version. We prefer to use a stable
ZooKeeper, unless there is feedback that the pros of using the more recent
ZK versions outweigh the cons.
3. Isolation from other VMs on the same physical machine. It seems
especially important to get decent performance for the log disk.
This can be partially dealt with by allocating the log to a non-local disk
with guaranteed IOPS, as is supported by some providers.
4. Write caching of disk I/O.
Making sure there are no layers which cache disk writes, so that writes
which have been acknowledged have really reached the disk.
Perhaps it's not that big of an issue given the provider might have backup
power? What are your thoughts here?
5. Clock-related issues on VMs. It seems people have seen VM clocks
skipping ahead or even going backwards, which caused
e.g. ZooKeeper session disconnections.
We're not entirely clear on what exactly we need to do to avoid this. Any
help or pointers are appreciated.
This might be less of an issue in the more recent ZK versions but, again,
these are not yet stable.
cf. https://issues.apache.org/jira/browse/ZOOKEEPER-1616
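
To make issue 5 a bit more concrete, here is a rough sketch of how one
might watch for wall-clock jumps on a VM, assuming Python 3 is available
there. It compares the wall clock against the monotonic clock over short
intervals. The function names, sampling interval and threshold are all
made up for illustration, and it will not notice a VM pause during which
both clocks stall.

```python
import time


def clock_drift(wall_start, mono_start, wall_now, mono_now):
    """Seconds the wall clock moved relative to the monotonic clock.

    Positive: the wall clock skipped ahead. Negative: it went backwards.
    """
    return (wall_now - wall_start) - (mono_now - mono_start)


def watch_clock(samples=5, interval=0.2, threshold=0.5):
    """Sample both clocks and return the drifts larger than `threshold`.

    `samples`, `interval` and `threshold` are illustrative defaults,
    not values taken from ZooKeeper.
    """
    jumps = []
    wall_prev, mono_prev = time.time(), time.monotonic()
    for _ in range(samples):
        time.sleep(interval)
        wall_now, mono_now = time.time(), time.monotonic()
        drift = clock_drift(wall_prev, mono_prev, wall_now, mono_now)
        if abs(drift) > threshold:
            jumps.append(drift)
        wall_prev, mono_prev = wall_now, mono_now
    return jumps


if __name__ == "__main__":
    for drift in watch_clock():
        print(f"wall clock moved {drift:+.3f}s relative to monotonic clock")
```

Running something like this for a long time on a candidate VM type might
at least tell us how often the wall clock is stepped there.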
Any additional issues to look out for?
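
Going back to issue 4, one crude way to get a hint about write caching
might be to time fsync calls on the disk intended for the log. This is
only a sketch, again assuming Python 3. The helper names and the 0.2 ms
floor are arbitrary assumptions, and a fast device can legitimately
complete an fsync very quickly, so a low number is a reason to look
closer and not proof of a cache.

```python
import os
import tempfile
import time


def fsync_latencies(directory, writes=20, size=4096):
    """Write `writes` blocks to a temp file in `directory`, calling fsync
    after each one, and return the fsync latencies in seconds."""
    block = b"\0" * size
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(writes):
            os.write(fd, block)
            start = time.perf_counter()
            os.fsync(fd)  # ask the OS to flush this file to the device
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def looks_cached(latencies, floor_seconds=0.0002):
    """Heuristic only: True if the median fsync latency is below
    `floor_seconds` (an arbitrary, illustrative threshold)."""
    ordered = sorted(latencies)
    median = ordered[len(ordered) // 2]
    return median < floor_seconds


if __name__ == "__main__":
    # Point this at the directory on the intended log disk.
    lat = fsync_latencies(tempfile.gettempdir())
    print(f"median fsync: {sorted(lat)[len(lat) // 2] * 1000:.3f} ms")
    print("suspiciously fast" if looks_cached(lat) else "plausible")
```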
Thanks,
Guy