Date: Fri, 6 May 2011 06:33:35 -0500
Subject: Re: observers in occasionally disconnected data centers
From: Ketan Gangatirkar
To: user@zookeeper.apache.org

Hi, Patrick.
Were you able to get any assistance from the hudson admins? Thanks.

On Wed, May 4, 2011 at 12:53 PM, Patrick Hunt wrote:
> This is odd, it's failing in the c tests but for a weird reason:
>
> in:
> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/247/artifact/trunk/build/tmp/zk.log
>
> it says:
> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh:
> line 115: java: command not found
>
> I'll ping the hudson admins and see if this is a known issue (also
> hudson is very slow today for some reason).
>
> Once that's addressed we should be good to go.
>
> Patrick
>
> On Wed, May 4, 2011 at 9:57 AM, Ketan Gangatirkar wrote:
>> Got the patch formatted right and applying successfully, now I'll see
>> if I can figure out the unit test failure.
>>
>> On Wed, May 4, 2011 at 11:26 AM, Patrick Hunt wrote:
>>> Hi Ketan, the patch is failing to apply
>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/246//console
>>>
>>> Looks like you used git, I usually do something like:
>>> git diff rev1..rev2 --no-prefix > ZOOKEEPER-784.patch
>>> can you give it another try?
>>>
>>> Patrick
>>>
>>> On Tue, May 3, 2011 at 6:42 PM, Ketan Gangatirkar wrote:
>>>> I have updated Sergey's patch to:
>>>>
>>>> * apply to current trunk
>>>> * incorporate one trivial output change he made to StatCommand in
>>>> NettyServerCnxn.java
>>>> * change log4j references to slf4j
>>>>
>>>> I have successfully run ant releaseaudit on the result. The updated
>>>> patch is now attached to the issue:
>>>>
>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>
>>>> I do *not* make any claim to have understood the contents of this
>>>> patch; all I did was synch everything and fix the obvious log4j/slf4j
>>>> change. Now what?
>>>>
>>>>
>>>> On Tue, May 3, 2011 at 5:46 PM, Patrick Hunt wrote:
>>>>> The core tests failed on last hudson, I just kicked off a patch build,
>>>>> seems recent changes (logging?)
>>>>> have caused the patch to stop
>>>>> applying:
>>>>> https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/238/console
>>>>>
>>>>> Ketan would you like to try updating the patch and resubmit?
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, May 3, 2011 at 3:31 PM, Ketan Gangatirkar wrote:
>>>>>> Thanks, Mahadev. I had seen ZOOKEEPER-892 but not ZOOKEEPER-784. The
>>>>>> latter may be what we need.
>>>>>>
>>>>>> I read the comments attached to that issue. The most recent comment
>>>>>> was a Hudson CI message indicating that the tests against the patch
>>>>>> failed. I was not able to find out more as it appears that the
>>>>>> configuration of the Apache Hudson has changed. It appears that the
>>>>>> patch was approved but not merged into trunk, and it's now in limbo.
>>>>>> What is necessary to get that feature into the next release? I may be
>>>>>> able to assist, depending on what's involved. Thank you.
>>>>>>
>>>>>>
>>>>>> On Tue, May 3, 2011 at 4:17 PM, Mahadev Konar wrote:
>>>>>>> Hi Ketan,
>>>>>>> You are correct that observers need connection to quorum as well.
>>>>>>> There have been quite a few discussions on multi colo replication and
>>>>>>> read only mode of ZooKeeper.
>>>>>>>
>>>>>>> Here are the jiras for those:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>> and
>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-892
>>>>>>>
>>>>>>> These have been mostly targeted at exactly a use case like yours.
>>>>>>> Please take a look at them and feel free to contribute/comment on the
>>>>>>> jiras.
>>>>>>>
>>>>>>> --
>>>>>>> thanks
>>>>>>> mahadev
>>>>>>> @mahadevkonar
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 3, 2011 at 2:07 PM, Ketan Gangatirkar wrote:
>>>>>>>> Hi. We're considering ZooKeeper for coordinating operations across
>>>>>>>> multiple data centers. These data centers will occasionally be
>>>>>>>> disconnected.
>>>>>>>> We were planning on using observers in remote data
>>>>>>>> centers. Our applications can survive being unable to *write* to
>>>>>>>> ZooKeeper, but they do need to be able to read from it, even if the
>>>>>>>> data were stale.
>>>>>>>>
>>>>>>>> On further examination, it looks like observers must always be
>>>>>>>> connected to the quorum to function at all. Is this correct? Does
>>>>>>>> anyone have suggestions for how to work around this problem? The
>>>>>>>> first thing that comes to mind is duplicating the required data in
>>>>>>>> some other local data store and falling back on that when the DC
>>>>>>>> becomes disconnected. I imagine the disadvantages of that are obvious
>>>>>>>> to everyone. I hope someone can share some great idea that allows me
>>>>>>>> to avoid that miserable fate. Thanks.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ketan Gangatirkar
>>>>>>>> ketan@indeed.com
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ketan Gangatirkar
>>>>>> ketan@indeed.com
>>>>>> Perishable Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ketan Gangatirkar
>>>> ketan@indeed.com
>>>> Perishable Developer
>>>>
>>>
>>
>>
>>
>> --
>> Ketan Gangatirkar
>> ketan@indeed.com
>> Perishable Developer
>>
>

--
Ketan Gangatirkar
ketan@indeed.com
Perishable Developer
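
The workaround floated in the original question at the bottom of this thread (mirror the required data into a local store and fall back on it while the data center is cut off) might look roughly like the sketch below. Everything in it is an assumption made for illustration: `StaleReadCache` and `fetch` are hypothetical names rather than ZooKeeper APIs, `fetch` stands in for whatever call reads a znode and is assumed to raise `ConnectionError` when the quorum is unreachable, and values are assumed to be JSON-serializable.

```python
import json
import os
import tempfile


class StaleReadCache:
    """Read through to a primary source; serve the last mirrored value if it is down.

    `fetch` is a hypothetical callable standing in for a ZooKeeper read. It is
    assumed to raise ConnectionError when the remote quorum cannot be reached.
    """

    def __init__(self, fetch, path):
        self._fetch = fetch
        self._path = path  # local file holding the last known value per key

    def _load(self):
        # A missing mirror file simply means nothing has been cached yet.
        try:
            with open(self._path, "r", encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def _store(self, data):
        # Write to a temp file and rename it into place so that a crash
        # mid-write cannot leave a half-written mirror behind.
        directory = os.path.dirname(self._path) or "."
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, self._path)

    def read(self, key):
        """Return (value, is_stale)."""
        try:
            value = self._fetch(key)
        except ConnectionError:
            cached = self._load()
            if key not in cached:
                raise  # never mirrored this key, so there is nothing to fall back on
            return cached[key], True
        # Primary read succeeded: refresh the local mirror.
        cached = self._load()
        cached[key] = value
        self._store(cached)
        return value, False
```

A caller would check the second element of the returned pair to decide whether stale data is acceptable for that read. The sketch only covers point reads; it does nothing about watches, ordering, or keys that were never read before the disconnect, which is part of why this approach is unattractive.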