hadoop-zookeeper-user mailing list archives

From Mahadev Konar <maha...@yahoo-inc.com>
Subject Re: Unending Leader Elections in WAN deploy
Date Tue, 04 Aug 2009 18:20:13 GMT
Hi Todd, 
 I just committed 480 and 491. You can check out the 3.2 branch now.

Thanks
mahadev


On 8/3/09 4:29 PM, "Todd Greenwood" <toddg@audiencescience.com> wrote:

> That'd be perfect. Thanks!
> 
>> -----Original Message-----
>> From: Mahadev Konar [mailto:mahadev@yahoo-inc.com]
>> Sent: Monday, August 03, 2009 4:24 PM
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Re: Unending Leader Elections in WAN deploy
>> 
>> Hi Todd,
>>   Most of the patches that you mention should be in the branch 3.2 by tomm
>> or so. 481, 479 are already in. 480 and 491 should be in by tomm. Would
>> that suffice for you?
>> 
>> Thanks
>> mahadev
>> 
>> 
>> On 8/3/09 4:21 PM, "Todd Greenwood" <toddg@audiencescience.com> wrote:
>> 
>>> Another problem...I've reverted to the latest versions of the patches
>>> that are not specific to branch-3.2, and I'm getting two compilation
>>> errors:
>>> 
>>> build-generated:
>>>     [javac] Compiling 44 source files to
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>> 
>>> compile-main:
>>>     [javac] Compiling 2 source files to
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>     [javac]
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:30: name clash: getQuorumPeers() and getQuorumPeers() have the same erasure
>>>     [javac]         public String[] getQuorumPeers();
>>>     [javac]                         ^
>>>     [javac]
>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/QuorumStats.java:31: name clash: getServerState() and getServerState() have the same erasure
>>>     [javac]         public String getServerState();
>>>     [javac]                       ^
>>>     [javac] 2 errors
>>> 
>>> My build process is pretty simple:
>>> 
>>> 1. copy the branch-3.2 source to a temp directory
>>> (src/patched/branch-3.2)
>>> 2. apply the ZOOKEEPER patches in my patches directory
>>> 3. build zookeeper in the temp directory
>>> 
>>> -Todd
>>>> -----Original Message-----
>>>> From: Todd Greenwood [mailto:toddg@audiencescience.com]
>>>> Sent: Monday, August 03, 2009 4:09 PM
>>>> To: zookeeper-user@hadoop.apache.org
>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>> 
>>>> Flavio,
>>>> I notice that you've updated the patches referenced for the WAN
>>>> deployment. There appears to be an order dependency w/ respect to these
>>>> four patches...
>>>> 
>>>> ZOOKEEPER-473.patch  ZOOKEEPER-479-branch3.2.patch
>>>> ZOOKEEPER-481-branch3.2.patch  ZOOKEEPER-491.patch
>>>> 
>>>> 473 -> 479 (479 fails)
>>>> 
>>>> toddg@TODDG01LT:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ patch -p0 < ../patches/ZOOKEEPER-479-branch3.2.patch
>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumHierarchical.java
>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumMaj.java
>>>> patching file src/java/main/org/apache/zookeeper/server/quorum/flexible/QuorumVerifier.java
>>>> patching file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java
>>>> Hunk #1 FAILED at 93.
>>>> Hunk #2 FAILED at 145.
>>>> 2 out of 2 hunks FAILED -- saving rejects to file src/java/test/org/apache/zookeeper/test/HierarchicalQuorumTest.java.rej
>>>> toddg@TODDG01LT:~/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2$ h ../patches/
>>>> 
>>>> Could you advise as to which patches I need to apply, and in what order?
>>>> 
>>>> -Todd
>>>> 
>>>>> -----Original Message-----
>>>>> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
>>>>> Sent: Friday, July 31, 2009 9:51 PM
>>>>> To: zookeeper-user@hadoop.apache.org
>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>> 
>>>>> Perfect! Thanks for the update, Todd.
>>>>> 
>>>>> -Flavio
>>>>> 
>>>>> On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:
>>>>> 
>>>>>> Thanks. You were right, I had a stale version of 479. Compilation
>>>>>> succeeds and all tests pass on branch-3.2 with the latest patches 473,
>>>>>> 479, 481, and 491.
>>>>>> 
>>>>>> -Todd
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
>>>>>>> Sent: Friday, July 31, 2009 7:48 PM
>>>>>>> To: zookeeper-user@hadoop.apache.org
>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>> 
>>>>>>> It should be in 479. Perhaps you have a stale version of the patch.
>>>>>>> 
>>>>>>> -Flavio
>>>>>>> 
>>>>>>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
>>>>>>> 
>>>>>>>> Flavio,
>>>>>>>> 
>>>>>>>> I'm getting a compilation error for patch 491:
>>>>>>>> 
>>>>>>>> compile-main:
>>>>>>>>   [javac] Compiling 1 source file to
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>>>>>>   [javac]
>>>>>>>> /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
>>>>>>>>   [javac] symbol  : method getWeight(long)
>>>>>>>>   [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
>>>>>>>>   [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>>   [javac]                                               ^
>>>>>>>>   [javac] 1 error
>>>>>>>> 
>>>>>>>> I see a reference to getWeight in both FastLeaderElection.java in
>>>>>>>> patch 491:
>>>>>>>> 
>>>>>>>> patches/ZOOKEEPER-491.patch:+
>>>>>>>> if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:
>>>>>>>> if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>>>>>> 
>>>>>>>> However, I don't see a reference to this method in patches 473, 479,
>>>>>>>> or 481. I also don't see a reference to this method in the trunk...
>>>>>>>> 
>>>>>>>> -Todd
>>>>>>>> 
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Todd Greenwood [mailto:toddg@audiencescience.com]
>>>>>>>>> Sent: Friday, July 31, 2009 7:30 PM
>>>>>>>>> To: zookeeper-user@hadoop.apache.org
>>>>>>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>>>>>> 
>>>>>>>>> Ok, I'll apply that patch and report back.
>>>>>>>>> -Todd
>>>>>>>>> 
>>>>>>>>>> -----Original Message-----
>>>>>>>>>> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
>>>>>>>>>> Sent: Friday, July 31, 2009 7:18 PM
>>>>>>>>>> To: zookeeper-user@hadoop.apache.org
>>>>>>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>>>>>> 
>>>>>>>>>> You're missing 491 from your set of patches.
>>>>>>>>>> 
>>>>>>>>>> -Flavio
>>>>>>>>>> 
>>>>>>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
>>>>>>>>>> 
>>>>>>>>>>> This repro's in both branch-3.2, and branch-3.2+patches(473, 479,
>>>>>>>>>>> 481).
>>>>>>>>>>> 
>>>>>>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be the
>>>>>>>>>>> leader. However, pd4-zook02 seems to realize it's not supposed to be
>>>>>>>>>>> and then disconnects everyone. Then they re-elect it again, and it
>>>>>>>>>>> loops over and over.
>>>>>>>>>>> 
>>>>>>>>>>> -------------
>>>>>>>>>>> Server config
>>>>>>>>>>> -------------
>>>>>>>>>>> 
>>>>>>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
>>>>>>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
>>>>>>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
>>>>>>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
>>>>>>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
>>>>>>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
>>>>>>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
>>>>>>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
>>>>>>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
>>>>>>>>>>> 
>>>>>>>>>>> group.1:1:2:3:4:5
>>>>>>>>>>> weight.1=1
>>>>>>>>>>> weight.2=1
>>>>>>>>>>> weight.3=1
>>>>>>>>>>> weight.4=1
>>>>>>>>>>> weight.5=1
>>>>>>>>>>> 
>>>>>>>>>>> group.2:6:7:8:9
>>>>>>>>>>> weight.6=0
>>>>>>>>>>> weight.7=0
>>>>>>>>>>> weight.8=0
>>>>>>>>>>> weight.9=0
>>>>>>>>>>> 
>>>>>>>>>>> Note that we have 2 groups, composed of machines in 3 different
>>>>>>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in dc1
>>>>>>>>>>> have voting rights, and the ability to become a leader. The machines
>>>>>>>>>>> in the pods all have a weight of zero, and are not expected to become
>>>>>>>>>>> leaders, or to vote on transactions.
>>>>>>>>>>> 
>>>>>>>>>>> Let me know what I can do to help resolve this issue.
>>>>>>>>>>> 
>>>>>>>>>>> -Todd
>>>>>>>> 
>>>>>> 
>>> 
> 
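
For readers following the thread, here is a rough Python sketch of what the group/weight settings in the quoted config are meant to express: only servers with a non-zero weight should count toward a quorum. This is not ZooKeeper's QuorumHierarchical code. The function names (parse_groups_and_weights, voting_servers, is_quorum) are hypothetical. The sketch assumes that a server without an explicit weight defaults to 1, that groups whose total weight is zero are skipped, and that a quorum means a strict majority of weight in a strict majority of the remaining groups.

```python
import re

# Group and weight lines copied from the config quoted in the thread.
CONFIG = """\
group.1:1:2:3:4:5
weight.1=1
weight.2=1
weight.3=1
weight.4=1
weight.5=1

group.2:6:7:8:9
weight.6=0
weight.7=0
weight.8=0
weight.9=0
"""


def parse_groups_and_weights(text):
    """Return (groups, weights): group id -> server ids, server id -> weight."""
    groups, weights = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Java properties files accept '=' or ':' as the key/value separator,
        # so split on whichever comes first.
        key, value = re.split(r"[=:]", line, maxsplit=1)
        kind, _, ident = key.strip().partition(".")
        if kind == "group":
            groups[int(ident)] = [int(s) for s in value.split(":")]
        elif kind == "weight":
            weights[int(ident)] = int(value)
    return groups, weights


def voting_servers(groups, weights):
    """Server ids whose weight is non-zero (assumed default weight: 1)."""
    return sorted(
        sid
        for members in groups.values()
        for sid in members
        if weights.get(sid, 1) != 0
    )


def is_quorum(acks, groups, weights):
    """Simplified check: majority of weight in a majority of counted groups.

    Assumption: groups with zero total weight are not counted at all.
    """
    counted = satisfied = 0
    for members in groups.values():
        total = sum(weights.get(s, 1) for s in members)
        if total == 0:
            continue
        counted += 1
        got = sum(weights.get(s, 1) for s in members if s in acks)
        if got > total / 2:
            satisfied += 1
    return counted > 0 and satisfied > counted / 2


if __name__ == "__main__":
    groups, weights = parse_groups_and_weights(CONFIG)
    print("voting servers:", voting_servers(groups, weights))
    print("acks from 1,2,3 form a quorum:", is_quorum({1, 2, 3}, groups, weights))
    print("acks from 6,7,8,9 form a quorum:", is_quorum({6, 7, 8, 9}, groups, weights))
```

Under these assumptions, servers 6 through 9 (pd1 and pd4) never contribute to a quorum, which matches the intent described in the original report.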

