zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: observers in occasionally disconnected data centers
Date Thu, 19 May 2011 23:59:16 GMT
Henry just committed this to trunk. Thanks everyone -- esp Sergey!

Patrick

On Thu, May 19, 2011 at 2:17 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
> Good news: the patch build ran successfully and gave every check a +1.
>  What's next to get this into trunk?
>
> On Thu, May 19, 2011 at 2:05 PM, Patrick Hunt <phunt@apache.org> wrote:
>> Hi Ketan, sorry about this. A number of build folks have looked but
>> can't seem to figure out what's wrong on some of these build hosts.
>> Running "java" just fails for no reason.
>>
>> Nigel and I spent part of the day yesterday looking into this with no
>> luck. For the time being I've pinned the job down to hadoop9 (where
>> java seems to be picked up fine). You should see a report come through
>> shortly.
>>
>> Feel free to reach out to me personally wrt finalizing this issue.
>>
>> Patrick
>>
>> On Tue, May 17, 2011 at 6:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>> Hi.  Has there been any progress on this?  Thanks.
>>>
>>> On Fri, May 6, 2011 at 11:32 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>> Mahadev is working with Giri to address it. The Jenkins folks are saying
>>>> this is a machine administered by Yahoo and the issue needs to be
>>>> addressed with them (their admins, but Mahadev/Giri are looking into it
>>>> from our (zk) side).
>>>>
>>>> Patrick
>>>>
>>>> On Fri, May 6, 2011 at 4:33 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>> Hi, Patrick.  Were you able to get any assistance from the hudson
>>>>> admins?  Thanks.
>>>>>
>>>>> On Wed, May 4, 2011 at 12:53 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>> This is odd: it's failing in the C tests, but for a weird reason:
>>>>>>
>>>>>> in:
>>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/247/artifact/trunk/build/tmp/zk.log
>>>>>>
>>>>>> it says:
>>>>>> /grid/0/hudson/hudson-slave/workspace/PreCommit-ZOOKEEPER-Build/trunk/src/c/tests/zkServer.sh:
>>>>>> line 115: java: command not found
>>>>>>
>>>>>> I'll ping the hudson admins and see if this is a known issue (also
>>>>>> hudson is very slow today for some reason).
>>>>>>
>>>>>> Once that's addressed we should be good to go.
>>>>>>
>>>>>> Patrick
>>>>>>
>>>>>> On Wed, May 4, 2011 at 9:57 AM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>> Got the patch formatted right and applying successfully; now I'll see
>>>>>>> if I can figure out the unit test failure.
>>>>>>>
>>>>>>> On Wed, May 4, 2011 at 11:26 AM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>>> Hi Ketan, the patch is failing to apply
>>>>>>>> https://builds.apache.org/hudson/job/PreCommit-ZOOKEEPER-Build/246//console
>>>>>>>>
>>>>>>>> Looks like you used git, I usually do something like:
>>>>>>>> git diff rev1..rev2 --no-prefix > ZOOKEEPER-784.patch
>>>>>>>> can you give it another try?
>>>>>>>>
>>>>>>>> Patrick
>>>>>>>>
>>>>>>>> On Tue, May 3, 2011 at 6:42 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>> I have updated Sergey's patch to:
>>>>>>>>>
>>>>>>>>> * apply to current trunk
>>>>>>>>> * incorporate one trivial output change he made to StatCommand in
>>>>>>>>> NettyServerCnxn.java
>>>>>>>>> * change log4j references to slf4j
>>>>>>>>>
>>>>>>>>> I have successfully run ant releaseaudit on the result.  The updated
>>>>>>>>> patch is now attached to the issue:
>>>>>>>>>
>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>>
>>>>>>>>> I do *not* make any claim to have understood the contents of this
>>>>>>>>> patch; all I did was synch everything and fix the obvious log4j/slf4j
>>>>>>>>> change.  Now what?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, May 3, 2011 at 5:46 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>>>>>>>> The core tests failed on the last Hudson run. I just kicked off a patch
>>>>>>>>>> build, and it seems recent changes (logging?) have caused the patch to
>>>>>>>>>> stop applying:
>>>>>>>>>> https://hudson.apache.org/hudson/view/S-Z/view/ZooKeeper/job/PreCommit-ZOOKEEPER-Build/238/console
>>>>>>>>>>
>>>>>>>>>> Ketan, would you like to try updating the patch and resubmitting?
>>>>>>>>>>
>>>>>>>>>> Patrick
>>>>>>>>>>
>>>>>>>>>> On Tue, May 3, 2011 at 3:31 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>>>> Thanks, Mahadev.  I had seen ZOOKEEPER-892 but not ZOOKEEPER-784.  The
>>>>>>>>>>> latter may be what we need.
>>>>>>>>>>>
>>>>>>>>>>> I read the comments attached to that issue.  The most recent comment
>>>>>>>>>>> was a Hudson CI message indicating that the tests against the patch
>>>>>>>>>>> failed.  I was not able to find out more, as the configuration of the
>>>>>>>>>>> Apache Hudson server appears to have changed.  It looks like the patch
>>>>>>>>>>> was approved but never merged into trunk, and it's now in limbo.
>>>>>>>>>>> What is necessary to get that feature into the next release?  I may be
>>>>>>>>>>> able to assist, depending on what's involved.  Thank you.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, May 3, 2011 at 4:17 PM, Mahadev Konar <mahadev@apache.org> wrote:
>>>>>>>>>>>> Hi Ketan,
>>>>>>>>>>>>  You are correct that observers need a connection to the quorum as well.
>>>>>>>>>>>> There have been quite a few discussions on multi-colo replication and
>>>>>>>>>>>> the read-only mode of ZooKeeper.
>>>>>>>>>>>>
>>>>>>>>>>>> Here are the jiras for those:
>>>>>>>>>>>>
>>>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-784
>>>>>>>>>>>> and
>>>>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-892
>>>>>>>>>>>>
>>>>>>>>>>>> These have mostly been targeted at exactly a use case like yours.
>>>>>>>>>>>> Please take a look at them and feel free to contribute/comment on the
>>>>>>>>>>>> jiras.
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> thanks
>>>>>>>>>>>> mahadev
>>>>>>>>>>>> @mahadevkonar
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, May 3, 2011 at 2:07 PM, Ketan Gangatirkar <ketan@indeed.com> wrote:
>>>>>>>>>>>>> Hi.  We're considering ZooKeeper for coordinating operations across
>>>>>>>>>>>>> multiple data centers.  These data centers will occasionally be
>>>>>>>>>>>>> disconnected.  We were planning on using observers in remote data
>>>>>>>>>>>>> centers.  Our applications can survive being unable to *write* to
>>>>>>>>>>>>> ZooKeeper, but they do need to be able to read from it, even if the
>>>>>>>>>>>>> data were stale.
>>>>>>>>>>>>>
>>>>>>>>>>>>> On further examination, it looks like observers must always be
>>>>>>>>>>>>> connected to the quorum to function at all.  Is this correct?  Does
>>>>>>>>>>>>> anyone have suggestions for how to work around this problem?  The
>>>>>>>>>>>>> first thing that comes to mind is duplicating the required data in
>>>>>>>>>>>>> some other local data store and falling back on that when the DC
>>>>>>>>>>>>> becomes disconnected.  I imagine the disadvantages of that are obvious
>>>>>>>>>>>>> to everyone.  I hope someone can share some great idea that allows me
>>>>>>>>>>>>> to avoid that miserable fate.  Thanks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>>>>> ketan@indeed.com
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> --
>>>>>>>>>>> Ketan Gangatirkar
>>>>>>>>>>> ketan@indeed.com
>>>>>>>>>>> Perishable Developer
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Ketan Gangatirkar
>>>>>>>>> ketan@indeed.com
>>>>>>>>> Perishable Developer
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ketan Gangatirkar
>>>>>>> ketan@indeed.com
>>>>>>> Perishable Developer
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Ketan Gangatirkar
>>>>> ketan@indeed.com
>>>>> Perishable Developer
>>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Ketan Gangatirkar
>>> ketan@indeed.com
>>> Perishable Developer
>>>
>>
>
>
>
> --
> Ketan Gangatirkar
> ketan@indeed.com
> Perishable Developer
>
