Subject: Re: Distributed ZooKeeper cluster design
From: Ted Dunning
Date: Tue, 13 Dec 2011 19:36:16 -0500
To: user@zookeeper.apache.org

I am happy to agree with everyone that the mirroring isn't a great idea for most things, even if that makes me look like I disagree with myself.

I do think that mirroring could be made to happen in a reliable way, but it isn't going to be a viable substitute for direct access to the cluster. By reliable, I mean that you could get an accurate picture of what was in the master cluster at some time in the past. Occasionally the mirror would fall further behind than at other times, and it might then need to catch up faster than real-time. In my vision, the mirror would be read-only, since anything else leads to madness given the strict consistency model that ZK maintains.
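To make the shape of that concrete, here is a rough, untested sketch of what a client-side mirror process could look like for a single znode: watch the data in the master ensemble and replay each change into the read-only copy. ZOOKEEPER-892 does this properly inside the server; the hostnames and paths here are made up, and a real mirror would also have to walk the whole tree, handle child and session events, and seed the copy on startup.

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ZnodeMirror implements Watcher {
        private final ZooKeeper source;   // the master ensemble (region C)
        private final ZooKeeper target;   // the local read-only copy
        private final String path;

        public ZnodeMirror(String sourceHosts, String targetHosts, String path)
                throws Exception {
            this.source = new ZooKeeper(sourceHosts, 30000, this);
            this.target = new ZooKeeper(targetHosts, 30000, new Watcher() {
                public void process(WatchedEvent event) { /* ignore */ }
            });
            this.path = path;
        }

        // Copy the current data to the mirror and leave a watch behind,
        // so the next change triggers another copy.
        void sync() throws Exception {
            byte[] data = source.getData(path, this, new Stat());
            if (target.exists(path, false) == null) {
                target.create(path, data, ZooDefs.Ids.OPEN_ACL_UNSAFE,
                              CreateMode.PERSISTENT);
            } else {
                target.setData(path, data, -1);   // -1 = any version
            }
        }

        public void process(WatchedEvent event) {
            if (event.getType() == Event.EventType.NodeDataChanged) {
                try {
                    sync();                        // re-copy and re-watch
                } catch (Exception e) {
                    e.printStackTrace();           // a real mirror would retry
                }
            }
        }
    }

Calling sync() once after both sessions connect seeds the copy and sets the first watch; because ZK watches are one-shot, each NodeDataChanged event re-copies the data and re-registers the watch. The copy is, by construction, always somewhat behind the master.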
On Tue, Dec 13, 2011 at 2:57 PM, Benjamin Reed wrote:

i agree with camille that mirroring breaks a lot of the basic guarantees that you use from zookeeper. with that caveat in mind, there is a patch that enables mirroring: ZOOKEEPER-892.

ben

On Tue, Dec 13, 2011 at 8:24 AM, Camille Fournier wrote:

I have to strongly disagree with Ted on the mirroring idea... I think it is likely to be really error-prone and it kind of defeats the purpose of ZK in my mind. It depends on what you're mirroring, but if you're trying to keep all the data coherent you can't sensibly do that in two clusters, so unless the mirror is for a really small subset of the data I would stay far, far away from that.

Observers are available in 3.3.3, yes.

Unfortunately, we don't have configurable connection logic in the ZK client (at least the Java one) right now. We have the ability to add it pretty easily, but it hasn't been put in yet.

You're seeing slow performance for a setup that has all ZK servers in region C, with clients only in regions A and B, and you can't blame the network? That's literally the only thing you could blame, unless clients in region C were also seeing slow performance, or the A and B clients have some other problem in the way they are implemented that makes them different from the clients running in region C.

C

On Tue, Dec 13, 2011 at 11:09 AM, Dima Gutzeit wrote:

Ted and Camille,

Thanks for a very detailed response.

At the moment I have option one implemented in production, and what I see is that the ZK clients in A and B show "slow" performance (even for reads), and I can't really blame the network since it does not look like a real bottleneck.

I wonder if doing option two would improve the ZK client performance/speed ...

As for my use case, it's around 50/50 reads and writes.

As for fallback, of course in A and B I would want to define C as a backup; I am not sure how that can be done, since as I understand it, if I supply several addresses in the connection string the client will use one at random.

About Ted's suggestion to consider having several clusters and a special process to mirror between them: is that something available as part of ZooKeeper?

I also read about observers (are they available in 3.3.3?) and they seem to be a good option in my case, which brings me to the question of how to configure explicit fallback instead of random client selection. I want to tell the ZK client in B to use the local B instance (an observer) and, if it fails, to contact ANY server in C (from a list of several).

Thanks in advance.

Regards,
Dima Gutzeit.
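To illustrate the connection-handling point being discussed: the Java client just takes a comma-separated list of servers in the connect string and chooses among them itself (effectively at random), reconnecting to another entry when the connection drops. List order is not a preference, which is why there is no supported "local observer first, then C" fallback today. This is only a sketch; the hostnames are invented.

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ConnectExample {
        public static void main(String[] args) throws Exception {
            // The connect string is just "host:port,host:port,...".
            // The client picks a server from this list itself and moves to
            // another entry when the connection is lost; the order does not
            // express a preference, so the B observer is not tried "first".
            ZooKeeper zk = new ZooKeeper(
                "zk-b1.example.com:2181,"                       // observer in region B
                + "zk-c1.example.com:2181,zk-c2.example.com:2181,zk-c3.example.com:2181",
                30000,                                          // session timeout in ms
                new Watcher() {
                    public void process(WatchedEvent event) {
                        // session state changes and watch events arrive here
                    }
                });
            System.out.println("state = " + zk.getState());
            zk.close();
        }
    }

One workaround, managed outside the client rather than by it, is to give region-B processes a connect string containing only the local observer and create a new handle against a C-only connect string when that session dies.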
On Tue, Dec 13, 2011 at 5:44 PM, Camille Fournier wrote:

Ted is of course right, but to speculate:

The idea you had with 3 in C, one in A and one in B isn't bad, given some caveats.

With 3 in C, as long as they are all available, quorum should live in C and you shouldn't have much slowdown from the remote servers in A and B. However, if you point your A servers only to the A ZooKeeper, you have a failover risk: your A servers will have no ZK if the server in region A goes down (same with B, of course). If you have a lot of servers in the outer regions, this could be a risk. You are also giving up any kind of load balancing for the A and B region ZKs, which may not be important but is good to know.

Another thing to be aware of is that the A and B region ZKs will have slower write response times due to the WAN cost, and they will tend to lag behind the majority cluster a bit. This shouldn't cause correctness issues, but it could impact client performance in those regions.

Honestly, if you're doing a read-mostly workload in the A and B regions, I doubt this is a bad design. It's pretty easy to test ZK setups using Pat's zk-smoketest utility, so you might try setting up the sample cluster and running some of the smoketests on it (https://github.com/phunt/zk-smoketest/blob/master/zk-smoketest.py). You could maybe also add observers in the outer regions to improve client load balancing.

C

On Tue, Dec 13, 2011 at 9:05 AM, Ted Dunning wrote:

Which option is preferred really depends on your needs.

Those needs are likely to vary in read/write ratio, resilience to network problems, and so on. You should also consider the possibility of observers in the remote locations. You might also consider separate ZK clusters in each location, with a special process to send mirrors of changes to the other locations.

A complete and detailed answer really isn't possible without knowing the details of your application. I generally don't like distributing a ZK cluster across distant hosts because it makes everything slower and more delicate, but I have heard of examples where that is exactly the right answer.

On Tue, Dec 13, 2011 at 4:29 AM, Dima Gutzeit wrote:

Dear list members,

I have a question related to the "suggested" way of working with a ZooKeeper cluster from different geographical locations.

Let's assume a service spans several regions, A, B and C, where C is defined as an element the service cannot live without, and A and B are not critical.

Option one:

Having one cluster of several ZooKeeper nodes in one location (C) and accessing it from the other locations (A and B) as well as from C.

Option two:

Having the ZooKeeper cluster span all regions, i.e. three nodes in C, one in A and one in B. This way the clients residing in A and B will access their local ZooKeeper node.

Which option is preferred, and which will work faster from the client's perspective?

Thanks in advance.

Regards,
Dima Gutzeit
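For concreteness, one variant of the layout discussed above (three voting servers in region C, with the A and B nodes run as observers rather than voting members, as Ted and Camille suggest) would look roughly like this in zoo.cfg. This is only a sketch: the hostnames are invented and the usual base settings (dataDir, tickTime, initLimit, syncLimit, clientPort) are omitted.

    # Shared server list, present in every node's zoo.cfg:
    server.1=zk-c1.example.com:2888:3888
    server.2=zk-c2.example.com:2888:3888
    server.3=zk-c3.example.com:2888:3888
    server.4=zk-a1.example.com:2888:3888:observer
    server.5=zk-b1.example.com:2888:3888:observer

    # In addition, on the observer nodes (4 and 5) only:
    peerType=observer

With this shape, writes still have to reach a quorum of the three servers in C, but clients in A and B can read from, and set watches on, their local observer, and the voting quorum stays entirely inside region C.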