To: user@zookeeper.apache.org
Subject: Zookeeper on VM's in public cloud
From: Guy Laden
Date: Wed, 29 Apr 2015 15:29:21 +0300

Hi All,

I wanted to get some feedback about running ZooKeeper on VMs within public clouds. If you have experience with this, could you please share it? What issues have you run into? Were you able to overcome them, and how? At the end of the day, were you able to get this to work reliably?

Some of the issues we know we need to worry about:

1. Making sure replicas are in different 'availability zones'.
   Without this, your VMs might even be running on the same physical machine.

2. Lack of fixed IP.
   I believe that in clouds every VM is typically allocated a new IP, so if you're e.g. upgrading a cluster, you can't keep the existing IPs for the new VMs. Our solution is to use our cloud provider's support for getting a set of fixed IPs which can be dynamically bound to whichever VMs we want (aka "portable IP" on SoftLayer; I believe there is similar support on other providers). Dynamic reconfig probably opens up new options, but it will be a while before it is supported in a stable version. We prefer to use a stable ZooKeeper, unless there is feedback that the pros of using the more recent ZK versions outweigh the cons.

3. Isolation from other VMs on the same physical machine.
   It seems especially important to get decent performance for the log disk. This can be partially dealt with by placing the log on a non-local disk with guaranteed IOPS, as is supported by some providers.

4. Write caching of disk I/O.
   Making sure there are no layers which cache disk writes, so that a write is never acknowledged before it has really reached the disk. Perhaps it's not that big of an issue, given that the provider might have backup power? What are your thoughts here?

5. Clock-related issues on VMs.
   It seems people have seen VM clocks skipping ahead or even going backwards, which caused e.g. ZooKeeper session disconnection. We're not entirely clear on what exactly we need to do to avoid this. Any help/pointers are appreciated. This might be less of an issue in the more recent ZK versions but, again, these are not yet stable. cf. https://issues.apache.org/jira/browse/ZOOKEEPER-1616

Any additional issues to look out for?

Thanks,
Guy
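
On issue 5 (VM clocks skipping ahead or going backwards), one way to check whether a given VM is affected is to compare the wall clock against a monotonic clock over time. Below is a minimal sketch in Python. It is not part of ZooKeeper; the function names `clock_drift` and `watch_clock`, the sampling interval, and the threshold are all hypothetical choices made for illustration.

```python
import time


def clock_drift(wall_start, mono_start, wall_now, mono_now):
    """Return how far the wall clock moved relative to the monotonic clock.

    The result is in seconds. A positive value means the wall clock
    skipped ahead; a negative value means it went backwards.
    """
    return (wall_now - wall_start) - (mono_now - mono_start)


def watch_clock(interval=1.0, threshold=0.5, rounds=5):
    """Sample both clocks every `interval` seconds.

    Returns the list of per-sample drifts whose magnitude exceeds
    `threshold` seconds (hypothetical default, tune for your setup).
    """
    jumps = []
    wall_prev, mono_prev = time.time(), time.monotonic()
    for _ in range(rounds):
        time.sleep(interval)
        wall_now, mono_now = time.time(), time.monotonic()
        drift = clock_drift(wall_prev, mono_prev, wall_now, mono_now)
        if abs(drift) > threshold:
            jumps.append(drift)
        wall_prev, mono_prev = wall_now, mono_now
    return jumps


if __name__ == "__main__":
    for drift in watch_clock():
        print(f"wall clock jumped by {drift:+.3f}s relative to the monotonic clock")
```

Running this for a longer period (a larger `rounds` value) on a candidate VM gives a rough idea of how often the wall clock is stepped. It only detects jumps of the wall clock relative to the monotonic clock; it says nothing about pauses that stop both clocks at once.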