zookeeper-user mailing list archives

From Ketan Gangatirkar <ke...@indeed.com>
Subject Re: observers in occasionally disconnected data centers
Date Thu, 19 May 2011 21:17:25 GMT
Good news: the patch build ran successfully and gave every check a +1.
 What's next to get this into trunk?

On Thu, May 19, 2011 at 2:05 PM, Patrick Hunt <phunt@apache.org> wrote:
> Hi Ketan, sorry about this. A number of build folks have looked but
> can't seem to figure out what's wrong on some of these build hosts.
> Running "java" just fails for no reason.
>
> Nigel and I spent part of the day yesterday looking into this with no
> luck. For the time being I've pinned the job down to hadoop9 (where
> java seems to be picked up fine). You should see a report come through
> shortly.
>
> Feel free to reach out to me personally wrt finalizing this issue.
>
> Patrick
>
> On Tue, May 17, 2011 at 6:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>> Hi.  Has there been any progress on this?  Thanks.
>>
>> On Fri, May 6, 2011 at 11:32 AM, Patrick Hunt <phunt@apache.org> wrote:
>>> Mahadev is working with Giri to address it. The jenkins folks are saying
>>> this is a machine administered by Yahoo and the issue needs to be
>>> addressed with them (their admins, but Mahadev/Giri are looking into it
>>> from our (zk) side).
>>>
>>> Patrick
>>>
>>> On Fri, May 6, 2011 at 4:33 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>> Hi, Patrick.  Were you able to get any assistance from the hudson
>>>> admins?  Thanks.
>>>>
>>>> On Wed, May 4, 2011 at 12:53 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>> This is odd, it's failing in the c tests but for a weird reason:
>>>>>
>>>>> in:
>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/247/artifact/trunk/build/tmp/zk.log
>>>>>
>>>>> it says:
>>>>> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh:
>>>>> line 115: java: command not found
>>>>>
>>>>> I'll ping the hudson admins and see if this is a known issue (also
>>>>> hudson is very slow today for some reason).
>>>>>
>>>>> Once that's addressed we should be good to go.
>>>>>
>>>>> Patrick
>>>>>
>>>>> On Wed, May 4, 2011 at 9:57 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>> Got the patch formatted right and applying successfully, now I'll see
>>>>>> if I can figure out the unit test failure.
>>>>>>
>>>>>> On Wed, May 4, 2011 at 11:26 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>> Hi Ketan, the patch is failing to apply
>>>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/246//console
>>>>>>>
>>>>>>> Looks like you used git, I usually do something like:
>>>>>>> git diff rev1..rev2 --no-prefix > ZOOKEEPER-784.patch
>>>>>>> can you give it another try?
>>>>>>>
>>>>>>> Patrick
>>>>>>>
>>>>>>> On Tue, May 3, 2011 at 6:42 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>> I have updated Sergey's patch to:
>>>>>>>>
>>>>>>>> * apply to current trunk
>>>>>>>> * incorporate one trivial output change he made to StatCommand in
>>>>>>>> NettyServerCnxn.java
>>>>>>>> * change log4j references to slf4j
>>>>>>>>
>>>>>>>> I have successfully run ant releaseaudit on the result.  The updated
>>>>>>>> patch is now attached to the issue:
>>>>>>>>
>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>
>>>>>>>> I do *not* make any claim to have understood the contents of this
>>>>>>>> patch; all I did was synch everything and fix the obvious log4j/slf4j
>>>>>>>> change.  Now what?
>>>>>>>>
>>>>>>>>
>>>>>>>> On Tue, May 3, 2011 at 5:46 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>>>> The core tests failed on last hudson, I just kicked off a patch build,
>>>>>>>>> seems recent changes (logging?) have caused the patch to stop
>>>>>>>>> applying:
>>>>>>>>> https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/238/console
>>>>>>>>>
>>>>>>>>> Ketan would you like to try updating the patch and resubmit?
>>>>>>>>>
>>>>>>>>> Patrick
>>>>>>>>>
>>>>>>>>> On Tue, May 3, 2011 at 3:31 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>>> Thanks, Mahadev.  I had seen ZOOKEEPER-892 but not ZOOKEEPER-784.  The
>>>>>>>>>> latter may be what we need.
>>>>>>>>>>
>>>>>>>>>> I read the comments attached to that issue.  The most recent comment
>>>>>>>>>> was a Hudson CI message indicating that the tests against the patch
>>>>>>>>>> failed.  I was not able to find out more as it appears that the
>>>>>>>>>> configuration of the Apache Hudson has changed.  It appears that the
>>>>>>>>>> patch was approved but not merged into trunk, and it's now in limbo.
>>>>>>>>>> What is necessary to get that feature into the next release?  I may be
>>>>>>>>>> able to assist, depending on what's involved.  Thank you.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 3, 2011 at 4:17 PM, Mahadev Konar <mahadev@apache.org> wrote:
>>>>>>>>>>> Hi Ketan,
>>>>>>>>>>>  You are correct that observers need connection to quorum as well.
>>>>>>>>>>> There have been quite a few discussions on multi colo replication and
>>>>>>>>>>> read only mode of ZooKeeper.
>>>>>>>>>>>
>>>>>>>>>>> Here are the jiras for those:
>>>>>>>>>>>
>>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>>>> and
>>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-892
>>>>>>>>>>>
>>>>>>>>>>> These have been mostly targeted at exactly a use case like yours.
>>>>>>>>>>> Please take a look at them and feel free to contribute/comment on the
>>>>>>>>>>> jiras.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> thanks
>>>>>>>>>>> mahadev
>>>>>>>>>>> @mahadevkonar
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 3, 2011 at 2:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>>>>> Hi.  We're considering ZooKeeper for coordinating operations across
>>>>>>>>>>>> multiple data centers.  These data centers will occasionally be
>>>>>>>>>>>> disconnected.  We were planning on using observers in remote data
>>>>>>>>>>>> centers.  Our applications can survive being unable to *write* to
>>>>>>>>>>>> ZooKeeper, but they do need to be able to read from it, even if the
>>>>>>>>>>>> data were stale.
>>>>>>>>>>>>
>>>>>>>>>>>> On further examination, it looks like observers must always be
>>>>>>>>>>>> connected to the quorum to function at all.  Is this correct?  Does
>>>>>>>>>>>> anyone have suggestions for how to work around this problem?  The
>>>>>>>>>>>> first thing that comes to mind is duplicating the required data in
>>>>>>>>>>>> some other local data store and falling back on that when the DC
>>>>>>>>>>>> becomes disconnected.  I imagine the disadvantages of that are obvious
>>>>>>>>>>>> to everyone.  I hope someone can share some great idea that allows me
>>>>>>>>>>>> to avoid that miserable fate.  Thanks.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>>>> ketan@indeed.com
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>> ketan@indeed.com
>>>>>>>>>> Perishable Developer
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Ketan Gangatirkar
>>>>>>>> ketan@indeed.com
>>>>>>>> Perishable Developer
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Ketan Gangatirkar
>>>>>> ketan@indeed.com
>>>>>> Perishable Developer
>>>>>>
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Ketan Gangatirkar
>>>> ketan@indeed.com
>>>> Perishable Developer
>>>>
>>>
>>
>>
>>
>> --
>> Ketan Gangatirkar
>> ketan@indeed.com
>> Perishable Developer
>>
>



-- 
Ketan Gangatirkar
ketan@indeed.com
Perishable Developer
