hadoop-zookeeper-user mailing list archives

From Flavio Junqueira <...@yahoo-inc.com>
Subject Re: Unending Leader Elections in WAN deploy
Date Sat, 01 Aug 2009 04:51:13 GMT
Perfect! Thanks for the update, Todd.

-Flavio

On Jul 31, 2009, at 8:17 PM, Todd Greenwood wrote:

> Thanks. You were right, I had a stale version of 479. Compilation
> succeeds and all tests pass on branch-3.2 with the latest patches 473,
> 479, 481, and 491.
>
> -Todd
>
>> -----Original Message-----
>> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
>> Sent: Friday, July 31, 2009 7:48 PM
>> To: zookeeper-user@hadoop.apache.org
>> Subject: Re: Unending Leader Elections in WAN deploy
>>
>> It should be in 479. Perhaps you have a stale version of the patch.
>>
>> -Flavio
>>
>> On Jul 31, 2009, at 7:46 PM, Todd Greenwood wrote:
>>
>>> Flavio,
>>>
>>> I'm getting a compilation error for patch 491:
>>>
>>> compile-main:
>>>   [javac] Compiling 1 source file to /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/build/classes
>>>   [javac] /home/toddg/asi/workspaces/main/Main/RSI/etc/holmes/main/zookeeper/src/patched/branch-3.2/src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java:601: cannot find symbol
>>>   [javac] symbol  : method getWeight(long)
>>>   [javac] location: interface org.apache.zookeeper.server.quorum.flexible.QuorumVerifier
>>>   [javac] if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>   [javac]                                                    ^
>>>   [javac] 1 error
>>>
>>> I see a reference to getWeight both in patch 491 and in the patched
>>> FastLeaderElection.java:
>>>
>>> patches/ZOOKEEPER-491.patch:+ if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>> src/java/main/org/apache/zookeeper/server/quorum/FastLeaderElection.java: if(self.getQuorumVerifier().getWeight(n.sid) != 0)
>>>
>>> However, I don't see a reference to this method in patches 473, 479, or
>>> 481. I also don't see a reference to this method in the trunk...
>>>
>>> -Todd
>>>
>>>> -----Original Message-----
>>>> From: Todd Greenwood [mailto:toddg@audiencescience.com]
>>>> Sent: Friday, July 31, 2009 7:30 PM
>>>> To: zookeeper-user@hadoop.apache.org
>>>> Subject: RE: Unending Leader Elections in WAN deploy
>>>>
>>>> Ok, I'll apply that patch and report back.
>>>> -Todd
>>>>
>>>>> -----Original Message-----
>>>>> From: Flavio Junqueira [mailto:fpj@yahoo-inc.com]
>>>>> Sent: Friday, July 31, 2009 7:18 PM
>>>>> To: zookeeper-user@hadoop.apache.org
>>>>> Subject: Re: Unending Leader Elections in WAN deploy
>>>>>
>>>>> You're missing 491 from your set of patches.
>>>>>
>>>>> -Flavio
>>>>>
>>>>> On Jul 31, 2009, at 7:15 PM, Todd Greenwood wrote:
>>>>>
>>>>>> This reproduces in both branch-3.2 and branch-3.2+patches(473, 479,
>>>>>> 481).
>>>>>>
>>>>>> Basically, it seems like the nodes are electing pd4-zook02 to be the
>>>>>> leader. However, pd4-zook02 seems to realize it's not supposed to be
>>>>>> the leader and then disconnects everyone. Then they re-elect it, and
>>>>>> this loops over and over.
>>>>>>
>>>>>> -------------
>>>>>> Server config
>>>>>> -------------
>>>>>>
>>>>>> server.1=dc1-zook01.dc01.revsci.net:2888:3888
>>>>>> server.2=dc1-zook02.dc01.revsci.net:2888:3888
>>>>>> server.3=dc1-zook03.dc01.revsci.net:2888:3888
>>>>>> server.4=dc1-zook04.dc01.revsci.net:2888:3888
>>>>>> server.5=dc1-zook05.dc01.revsci.net:2888:3888
>>>>>> server.6=pd1-zook01.pd01.revsci.net:2888:3888
>>>>>> server.7=pd1-zook02.pd01.revsci.net:2888:3888
>>>>>> server.8=pd4-zook01.iad1.audsci.net:2888:3888
>>>>>> server.9=pd4-zook02.iad1.audsci.net:2888:3888
>>>>>>
>>>>>> group.1:1:2:3:4:5
>>>>>> weight.1=1
>>>>>> weight.2=1
>>>>>> weight.3=1
>>>>>> weight.4=1
>>>>>> weight.5=1
>>>>>>
>>>>>> group.2:6:7:8:9
>>>>>> weight.6=0
>>>>>> weight.7=0
>>>>>> weight.8=0
>>>>>> weight.9=0
>>>>>>
>>>>>> Note that we have 2 groups, composed of machines in 3 different
>>>>>> locations (dc1, pd1, and pd4). The idea is that only machines in dc1
>>>>>> have voting rights and the ability to become a leader. The machines in
>>>>>> the pods all have a weight of zero, and are not expected to become
>>>>>> leaders or to vote on transactions.
>>>>>>
>>>>>> Let me know what I can do to help resolve this issue.
>>>>>>
>>>>>> -Todd
>>>
>
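
The group and weight settings quoted in the thread can be illustrated with a small sketch. It is not ZooKeeper's QuorumVerifier implementation: the class HierarchicalQuorumSketch and its methods addServer and fromThreadConfig are hypothetical names, and the quorum rule (a majority of the groups with non-zero weight, each holding more than half of its own weight) is an assumption about how hierarchical quorums behave. Only the getWeight(sid) != 0 check comes from the javac output above. The group lines in the thread are reproduced as posted; ZooKeeper's administrator guide writes the option as group.x=nnnnn[:nnnnn].

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; not the ZooKeeper QuorumVerifier implementation.
class HierarchicalQuorumSketch {
    private final Map<Long, Long> serverGroup = new HashMap<>();  // sid -> group id
    private final Map<Long, Long> serverWeight = new HashMap<>(); // sid -> weight

    void addServer(long sid, long group, long weight) {
        serverGroup.put(sid, group);
        serverWeight.put(sid, weight);
    }

    // Unknown servers are treated as weight 0 (an assumption of this sketch).
    long getWeight(long sid) {
        return serverWeight.getOrDefault(sid, 0L);
    }

    // Assumed rule: more than half of the non-zero-weight groups must each
    // have more than half of their total weight present in acks.
    boolean containsQuorum(Set<Long> acks) {
        Map<Long, Long> total = new HashMap<>();
        Map<Long, Long> acked = new HashMap<>();
        for (Map.Entry<Long, Long> e : serverGroup.entrySet()) {
            long sid = e.getKey();
            long group = e.getValue();
            long weight = getWeight(sid);
            total.merge(group, weight, Long::sum);
            if (acks.contains(sid)) {
                acked.merge(group, weight, Long::sum);
            }
        }
        int votingGroups = 0;
        int majorityGroups = 0;
        for (Map.Entry<Long, Long> e : total.entrySet()) {
            long groupTotal = e.getValue();
            if (groupTotal == 0) {
                continue; // a group whose members all have weight 0 never counts
            }
            votingGroups++;
            if (acked.getOrDefault(e.getKey(), 0L) > groupTotal / 2) {
                majorityGroups++;
            }
        }
        return majorityGroups > votingGroups / 2;
    }

    // Mirrors the config in the thread: servers 1-5 in group 1 with weight 1,
    // servers 6-9 in group 2 with weight 0.
    static HierarchicalQuorumSketch fromThreadConfig() {
        HierarchicalQuorumSketch q = new HierarchicalQuorumSketch();
        for (long sid = 1; sid <= 5; sid++) {
            q.addServer(sid, 1, 1);
        }
        for (long sid = 6; sid <= 9; sid++) {
            q.addServer(sid, 2, 0);
        }
        return q;
    }

    public static void main(String[] args) {
        HierarchicalQuorumSketch q = fromThreadConfig();
        Set<Long> dcMajority = new HashSet<>(Arrays.asList(1L, 2L, 3L));
        Set<Long> podsOnly = new HashSet<>(Arrays.asList(6L, 7L, 8L, 9L));
        System.out.println(q.containsQuorum(dcMajority)); // true
        System.out.println(q.containsQuorum(podsOnly));   // false
        // The election-side check from the compile error: votes from
        // zero-weight servers such as server.9 (pd4-zook02) would be skipped.
        System.out.println(q.getWeight(9L) != 0);         // false
    }
}
```

Under these assumptions, three of the five dc1 servers form a quorum by themselves, while the four pod servers never do, which matches the stated intent that only dc1 machines vote or lead.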

