Date: Tue, 17 May 2011 20:07:43 -0500
Subject: Re: observers in occasionally disconnected data centers
From: Ketan Gangatirkar
To: user@zookeeper.apache.org

Hi. Has there been any progress on this? Thanks.

On Fri, May 6, 2011 at 11:32 AM, Patrick Hunt wrote:
> Mahadev is working with Giri to address. The jenkins folks are saying
> this is a machine administered by Yahoo and the issue needs to be
> addressed with them (their admins, but Mahadev/Giri are looking into it
> from our (zk) side).
>
> Patrick
>
> On Fri, May 6, 2011 at 4:33 AM, Ketan Gangatirkar wrote:
>> Hi, Patrick. Were you able to get any assistance from the hudson
>> admins? Thanks.
>>
>> On Wed, May 4, 2011 at 12:53 PM, Patrick Hunt wrote:
>>> This is odd, it's failing in the c tests but for a weird reason:
>>>
>>> in:
>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/247/artifact/trunk/build/tmp/zk.log
>>>
>>> it says:
>>> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh:
>>> line 115: java: command not found
>>>
>>> I'll ping the hudson admins and see if this is a known issue (also
>>> hudson is very slow today for some reason).
>>>
>>> Once that's addressed we should be good to go.
>>>
>>> Patrick
>>>
>>> On Wed, May 4, 2011 at 9:57 AM, Ketan Gangatirkar wrote:
>>>> Got the patch formatted right and applying successfully, now I'll see
>>>> if I can figure out the unit test failure.
>>>>
>>>> On Wed, May 4, 2011 at 11:26 AM, Patrick Hunt wrote:
>>>>> Hi Ketan, the patch is failing to apply
>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/246//console
>>>>>
>>>>> Looks like you used git, I usually do something like:
>>>>> git diff rev1..rev2 --no-prefix > ZOOKEEPER-784.patch
>>>>> can you give it another try?
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Tue, May 3, 2011 at 6:42 PM, Ketan Gangatirkar wrote:
>>>>>> I have updated Sergey's patch to:
>>>>>>
>>>>>> * apply to current trunk
>>>>>> * incorporate one trivial output change he made to StatCommand in
>>>>>> NettyServerCnxn.java
>>>>>> * change log4j references to slf4j
>>>>>>
>>>>>> I have successfully run ant releaseaudit on the result.
>>>>>> The updated patch is now attached to the issue:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>
>>>>>> I do *not* make any claim to have understood the contents of this
>>>>>> patch; all I did was synch everything and fix the obvious log4j/slf4j
>>>>>> change. Now what?
>>>>>>
>>>>>>
>>>>>> On Tue, May 3, 2011 at 5:46 PM, Patrick Hunt wrote:
>>>>>>> The core tests failed on last hudson, I just kicked off a patch build,
>>>>>>> seems recent changes (logging?) have caused the patch to stop
>>>>>>> applying:
>>>>>>> https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/238/console
>>>>>>>
>>>>>>> Ketan would you like to try updating the patch and resubmit?
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Tue, May 3, 2011 at 3:31 PM, Ketan Gangatirkar wrote:
>>>>>>>> Thanks, Mahadev. I had seen ZOOKEEPER-892 but not ZOOKEEPER-784. The
>>>>>>>> latter may be what we need.
>>>>>>>>
>>>>>>>> I read the comments attached to that issue. The most recent comment
>>>>>>>> was a Hudson CI message indicating that the tests against the patch
>>>>>>>> failed. I was not able to find out more as it appears that the
>>>>>>>> configuration of the Apache Hudson has changed. It appears that the
>>>>>>>> patch was approved but not merged into trunk, and it's now in limbo.
>>>>>>>> What is necessary to get that feature into the next release? I may be
>>>>>>>> able to assist, depending on what's involved. Thank you.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 3, 2011 at 4:17 PM, Mahadev Konar wrote:
>>>>>>>>> Hi Ketan,
>>>>>>>>> You are correct that observers need connection to quorum as well.
>>>>>>>>> There have been quite a few discussions on multi colo replication and
>>>>>>>>> read only mode of ZooKeeper.
>>>>>>>>>
>>>>>>>>> Here are the jiras for those:
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>> and
>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-892
>>>>>>>>>
>>>>>>>>> These have been mostly targeted at exactly a use case like yours.
>>>>>>>>> Please take a look at them and feel free to contribute/comment on the
>>>>>>>>> jiras.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> thanks
>>>>>>>>> mahadev
>>>>>>>>> @mahadevkonar
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 3, 2011 at 2:07 PM, Ketan Gangatirkar wrote:
>>>>>>>>>> Hi. We're considering ZooKeeper for coordinating operations across
>>>>>>>>>> multiple data centers. These data centers will occasionally be
>>>>>>>>>> disconnected. We were planning on using observers in remote data
>>>>>>>>>> centers. Our applications can survive being unable to *write* to
>>>>>>>>>> ZooKeeper, but they do need to be able to read from it, even if the
>>>>>>>>>> data were stale.
>>>>>>>>>>
>>>>>>>>>> On further examination, it looks like observers must always be
>>>>>>>>>> connected to the quorum to function at all. Is this correct? Does
>>>>>>>>>> anyone have suggestions for how to work around this problem? The
>>>>>>>>>> first thing that comes to mind is duplicating the required data in
>>>>>>>>>> some other local data store and falling back on that when the DC
>>>>>>>>>> becomes disconnected. I imagine the disadvantages of that are obvious
>>>>>>>>>> to everyone. I hope someone can share some great idea that allows me
>>>>>>>>>> to avoid that miserable fate. Thanks.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>> ketan@indeed.com
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ketan Gangatirkar
>>>>>>>> ketan@indeed.com
>>>>>>>> Perishable Developer
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ketan Gangatirkar
>>>>>> ketan@indeed.com
>>>>>> Perishable Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ketan Gangatirkar
>>>> ketan@indeed.com
>>>> Perishable Developer
>>>>
>>>
>>
>>
>>
>> --
>> Ketan Gangatirkar
>> ketan@indeed.com
>> Perishable Developer
>>
>

--
Ketan Gangatirkar
ketan@indeed.com
Perishable Developer