Subject: Re: Managing multi-site clusters with Zookeeper
From: Mahadev Konar
To: zookeeper-user@hadoop.apache.org
Date: Mon, 08 Mar 2010 11:24:04 -0800

Hi Martin,

The results would be really nice information to have on the ZooKeeper wiki,
and would be very helpful for others considering the same kind of
deployment. So, do send out your results on the list.

Thanks
mahadev


On 3/8/10 11:18 AM, "Martin Waite" wrote:

> Hi Patrick,
>
> Thanks for your input.
>
> I am planning on having 3 zk servers per data centre, with perhaps only 2
> in the tie-breaker site.
>
> The traffic between zk and the applications will be lots of local reads -
> "who is the primary database?". Changes to the config will be rare (server
> rebuilds, etc. - i.e. planned changes) or caused by server / network /
> site failure.
>
> The interesting thing in my mind is how zookeeper will cope with
> inter-site link failure - how quickly the remote sites will notice, and
> how quickly normality can be resumed when the link reappears.
>
> I need to get this running in the lab and start pulling out wires.
>
> regards,
> Martin
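A minimal sketch of the read pattern Martin describes - a client answering
"who is the primary database?" from a znode, biased to its local colo by
listing only the local ensemble members in the connect string. The host
names and the znode path are hypothetical:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class PrimaryDatabaseLookup {
        public static void main(String[] args) throws Exception {
            // List only the local colo's servers so reads stay in-site;
            // the client fails over across this list if one server dies.
            String localEnsemble =
                "zk1.dc1.example:2181,zk2.dc1.example:2181,zk3.dc1.example:2181";

            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper(localEnsemble, 15000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // Read the primary's identity; the watch fires on change
            // (e.g. after a failover), prompting a re-read.
            byte[] data = zk.getData("/databases/primary",
                event -> System.out.println("primary changed: " + event), null);
            System.out.println("primary database: " + new String(data));

            zk.close();
        }
    }

The session timeout (15s here) is also the knob that governs how long a
partitioned client's session, and any ephemeral znodes it owns, survive
before expiring.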
> On 8 March 2010 17:39, Patrick Hunt wrote:
>
>> IMO latency is the primary issue you will face, but also keep in mind
>> reliability w/in a colo.
>>
>> Say you have 3 colos (obv can't be 2). If you only have 3 servers, one in
>> each colo, you will be reliable, but clients w/in each colo will have to
>> connect to a remote colo if the local one fails. You will want to
>> prioritize the local colo given that reads can be serviced entirely
>> locally that way. If you have 7 servers (2-2-3) that would be better - if
>> a local server fails you have a redundant one, and if both fail then you
>> go remote.
>>
>> You want to keep your writes as few as possible and as small as possible.
>> Why? Say you have 100ms latency btw colos; let's go through a scenario
>> for a client in a colo where the local servers are not the leader (zk
>> cluster leader).
>>
>> read:
>> 1) client reads a znode from a local server
>> 2) local server responds (usually < 1ms for "in colo" comms)
>>
>> write:
>> 1) client writes a znode to local server A
>> 2) A proposes the change to the ZK Leader (L) in a remote colo
>> 3) L gets the proposal in 100ms
>> 4) L proposes the change to all followers
>> 5) all followers (not exactly, but hopefully) get the proposal in 100ms
>> 6) followers ack the change
>> 7) L gets the acks in 100ms
>> 8) L commits the change (message to all followers)
>> 9) A gets the commit in 100ms
>> 10) A responds to the client (< 1ms)
>>
>> write latency: 100 + 100 + 100 + 100 = 400ms
>>
>> Obviously keeping these writes small is also critical.
>>
>> Patrick
>>
>> Martin Waite wrote:
>>
>>> Hi Ted,
>>>
>>> If the links do not work for us for zk, then they are unlikely to work
>>> with any other solution - such as trying to stretch Pacemaker or Red Hat
>>> Cluster with their multicast protocols across the links.
>>>
>>> If the links are not good enough, we might have to spend some more money
>>> to fix this.
>>>
>>> regards,
>>> Martin
>>>
>>> On 8 March 2010 02:14, Ted Dunning wrote:
>>>
>>>> If you can stand the latency for updates then zk should work well for
>>>> you. It is unlikely that you will be able to do better than zk does and
>>>> still maintain correctness.
>>>>
>>>> Do note that you can probably bias clients to use a local server. That
>>>> should make things more efficient.
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 7, 2010, at 3:00 PM, Mahadev Konar wrote:
>>>>
>>>>> The inter-site links are a nuisance. We have two data-centres with
>>>>> 100Mb links which I hope would be good enough for most uses, but we
>>>>> need a 3rd site - and currently that only has 2Mb links to the other
>>>>> sites. This might be a problem.
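A rough way to see Patrick's arithmetic in the lab before pulling wires:
time one small write from a client in a non-leader colo. The host name and
znode path are hypothetical, and the probe assumes the znode already exists:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class WriteLatencyProbe {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            // Connect to a local follower; the write still crosses to the
            // remote leader, so elapsed time reflects the inter-site RTTs.
            ZooKeeper zk = new ZooKeeper("zk1.dc1.example:2181", 15000, event -> {
                if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            long start = System.nanoTime();
            // One small write: proposal to the leader, proposal out to
            // followers, acks back, then commit - four ~100ms hops in
            // Patrick's scenario, so roughly 400ms end to end.
            zk.setData("/latency-probe", "x".getBytes(), -1);  // -1 = any version
            System.out.println("write took "
                + (System.nanoTime() - start) / 1_000_000 + " ms");

            zk.close();
        }
    }

Timing a getData against the same local server should come back in about a
millisecond, which makes the read/write asymmetry Patrick describes easy to
demonstrate.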