zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: observers in occasionally disconnected data centers
Date Thu, 19 May 2011 19:05:54 GMT
Hi Ketan, sorry about this. A number of build folks have looked but
can't seem to figure out what's wrong on some of these build hosts.
Running "java" just fails for no reason.

Nigel and I spent part of the day yesterday looking into this with no
luck. For the time being I've pinned the job down to hadoop9 (where
java seems to be picked up fine). You should see a report come through
shortly.

Feel free to reach out to me personally wrt finalizing this issue.

Patrick

On Tue, May 17, 2011 at 6:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
> Hi.  Has there been any progress on this?  Thanks.
>
> On Fri, May 6, 2011 at 11:32 AM, Patrick Hunt <phunt@apache.org> wrote:
>> Mahadev is working with Giri to address this. The Jenkins folks are saying
>> this is a machine administered by Yahoo and the issue needs to be
>> addressed with them (their admins), but Mahadev/Giri are looking into it
>> from our (ZK) side.
>>
>> Patrick
>>
>> On Fri, May 6, 2011 at 4:33 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>> Hi, Patrick.  Were you able to get any assistance from the hudson
>>> admins?  Thanks.
>>>
>>> On Wed, May 4, 2011 at 12:53 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>> This is odd: it's failing in the C tests, but for a weird reason:
>>>>
>>>> in:
>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/247/artifact/trunk/build/tmp/zk.log
>>>>
>>>> it says:
>>>> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh: line 115: java: command not found
>>>>
>>>> I'll ping the hudson admins and see if this is a known issue (also
>>>> hudson is very slow today for some reason).
>>>>
>>>> Once that's addressed we should be good to go.
>>>>
>>>> Patrick
>>>>
>>>> On Wed, May 4, 2011 at 9:57 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>> Got the patch formatted right and applying successfully, now I'll see
>>>>> if I can figure out the unit test failure.
>>>>>
>>>>> On Wed, May 4, 2011 at 11:26 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>> Hi Ketan, the patch is failing to apply
>>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/246//console
>>>>>>
>>>>>> Looks like you used git; I usually do something like:
>>>>>> git diff rev1..rev2 --no-prefix > ZOOKEEPER-784.patch
>>>>>> Can you give it another try?
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>> On Tue, May 3, 2011 at 6:42 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>> I have updated Sergey's patch to:
>>>>>>>
>>>>>>> * apply to current trunk
>>>>>>> * incorporate one trivial output change he made to StatCommand in
>>>>>>> NettyServerCnxn.java
>>>>>>> * change log4j references to slf4j
>>>>>>>
>>>>>>> I have successfully run ant releaseaudit on the result.  The updated
>>>>>>> patch is now attached to the issue:
>>>>>>>
>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>
>>>>>>> I do *not* make any claim to have understood the contents of this
>>>>>>> patch; all I did was sync everything and fix the obvious log4j/slf4j
>>>>>>> change.  Now what?
>>>>>>>
>>>>>>>
>>>>>>> On Tue, May 3, 2011 at 5:46 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>>> The core tests failed on the last Hudson run. I just kicked off a
>>>>>>>> patch build; it seems recent changes (logging?) have caused the
>>>>>>>> patch to stop applying:
>>>>>>>> https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/238/console
>>>>>>>>
>>>>>>>> Ketan would you like to try updating the patch and resubmit?
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Tue, May 3, 2011 at 3:31 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>> Thanks, Mahadev.  I had seen ZOOKEEPER-892 but not ZOOKEEPER-784.  The
>>>>>>>>> latter may be what we need.
>>>>>>>>>
>>>>>>>>> I read the comments attached to that issue.  The most recent comment
>>>>>>>>> was a Hudson CI message indicating that the tests against the patch
>>>>>>>>> failed.  I was not able to find out more as it appears that the
>>>>>>>>> configuration of the Apache Hudson has changed.  It appears that the
>>>>>>>>> patch was approved but not merged into trunk, and it's now in limbo.
>>>>>>>>> What is necessary to get that feature into the next release?  I may be
>>>>>>>>> able to assist, depending on what's involved.  Thank you.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 3, 2011 at 4:17 PM, Mahadev Konar <mahadev@apache.org> wrote:
>>>>>>>>>> Hi Ketan,
>>>>>>>>>>  You are correct that observers need a connection to the quorum as well.
>>>>>>>>>> There have been quite a few discussions on multi-colo replication and
>>>>>>>>>> a read-only mode for ZooKeeper.
>>>>>>>>>>
>>>>>>>>>> Here are the JIRAs for those:
>>>>>>>>>>
>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>>> and
>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-892
>>>>>>>>>>
>>>>>>>>>> These have mostly been targeted at exactly a use case like yours.
>>>>>>>>>> Please take a look at them and feel free to contribute to or comment
>>>>>>>>>> on the JIRAs.
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> thanks
>>>>>>>>>> mahadev
>>>>>>>>>> @mahadevkonar
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Tue, May 3, 2011 at 2:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>>>> Hi.  We're considering ZooKeeper for coordinating operations across
>>>>>>>>>>> multiple data centers.  These data centers will occasionally be
>>>>>>>>>>> disconnected.  We were planning on using observers in remote data
>>>>>>>>>>> centers.  Our applications can survive being unable to *write* to
>>>>>>>>>>> ZooKeeper, but they do need to be able to read from it, even if the
>>>>>>>>>>> data were stale.
>>>>>>>>>>>
>>>>>>>>>>> On further examination, it looks like observers must always be
>>>>>>>>>>> connected to the quorum to function at all.  Is this correct?  Does
>>>>>>>>>>> anyone have suggestions for how to work around this problem?  The
>>>>>>>>>>> first thing that comes to mind is duplicating the required data in
>>>>>>>>>>> some other local data store and falling back on that when the DC
>>>>>>>>>>> becomes disconnected.  I imagine the disadvantages of that are obvious
>>>>>>>>>>> to everyone.  I hope someone can share some great idea that allows me
>>>>>>>>>>> to avoid that miserable fate.  Thanks.
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>>> ketan@indeed.com
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ketan Gangatirkar
>>>>>>>>> ketan@indeed.com
>>>>>>>>> Perishable Developer
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ketan Gangatirkar
>>>>>>> ketan@indeed.com
>>>>>>> Perishable Developer
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ketan Gangatirkar
>>>>> ketan@indeed.com
>>>>> Perishable Developer
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Ketan Gangatirkar
>>> ketan@indeed.com
>>> Perishable Developer
>>>
>>
>
>
>
> --
> Ketan Gangatirkar
> ketan@indeed.com
> Perishable Developer
>
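
A minimal sketch of the patch workflow Patrick suggests in the thread (git diff rev1..rev2 --no-prefix). It assumes the pre-commit job applies patches at -p0, which is why the a/ and b/ path prefixes have to be dropped. The throwaway repository, the README file, and the HEAD~1..HEAD range are illustrative stand-ins for a real checkout and rev1..rev2; git apply --check -p0 is used here only as a local dry run of a -p0 apply, not as what the build server runs.

```shell
#!/bin/sh
# Sketch: produce a prefix-free patch and confirm it applies at -p0.
set -e

# Hypothetical scratch repository standing in for a ZooKeeper checkout.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

# Two commits play the roles of rev1 and rev2.
echo "hello" > README
git add README
git commit -q -m "base"
echo "hello world" > README
git commit -q -am "change"

# --no-prefix writes "--- README" instead of "--- a/README",
# so the patch is meant to be applied with -p0 rather than -p1.
git diff HEAD~1..HEAD --no-prefix > ZOOKEEPER-784.patch

# Go back to the base revision and check (without applying) that the
# patch would apply cleanly at -p0; set -e aborts the script if not.
git checkout -q HEAD~1
git apply --check -p0 ZOOKEEPER-784.patch
echo "patch applies at -p0"
```

Dropping --check would actually apply the change. A patch made with the default a/ and b/ prefixes would instead need -p1, which is the mismatch that makes a git-generated patch fail to apply on a -p0 build.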
