hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Killing a zookeeper server
Date Mon, 25 Jan 2010 22:16:11 GMT
No worries. Kudos to Mahadev for sniffing out the UDP sockets in the netstat
output; I glossed right over them. ;-)

Lots of good fixes in 3.2.2 vs pre-3.2. Still doesn't explain what Nick 
was seeing originally though...

Patrick

Jean-Daniel Cryans wrote:
> Oh my god! You are right, we're running an old dev version of 3.2.0:
> 
> zookeeper-r785019-hbase-1329.jar
> 
> This was what we shipped HBase trunk with last summer... This quorum
> has an uptime of more than 6 months! Well, I guess that explains it. I
> thought we had restarted it since then during our HBase upgrades, but it
> seems not, so I'm very sorry about this false alert.
> 
> So... all I can say is thank you guys for such reliable software!
> We'll be upgrading to 3.2.2 really soon.
> 
> J-D
> 
> On Mon, Jan 25, 2010 at 1:44 PM, Patrick Hunt <phunt@apache.org> wrote:
>> JD, there's something _very_ unusual in your setup. Are you running
>> "official" released ZooKeeper code or something else?
>>
>> Either there is a misconfiguration on the other servers (the configs for the
>> other servers are exactly the same as 222's, right?), or perhaps some patches
>> to the ZK codebase went awry?
>>
>> See the attached file "zk_ports.txt". This is a summary of the netstat -a
>> output that you sent. Notice in particular that UDP sockets are open for port
>> 2888! This should not happen in the default ZK configuration.
>>
>> By default we only use TCP connections between servers (quorum & election).
>> There is an "electionAlg" option that allows users to turn off the TCP-based
>> fast leader election and use a UDP-based one instead, but I don't see it in
>> the config you provided for 222 (as I said, I'm assuming you are not setting
>> this option on the other servers either, correct?).
>>
>>
>> Mahadev and I do remember that there was a bug in the 3.2 branch, before
>> 3.2 was ever released, that caused us to use non-FLE (i.e. UDP-based)
>> election by default. However, we fixed that before 3.2.0 shipped (it was a
>> bug in our config-processing code), and it was never exposed in an official
>> release. Perhaps you picked up code from before that fix?
>>
>> Patrick
>>
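To make the electionAlg point concrete, here is a minimal Python sketch that reads zoo.cfg-style text and reports which election transport it selects. It assumes the value mapping described in the ZooKeeper 3.x Administrator's Guide (0 = original UDP-based election, 1 and 2 = UDP-based fast leader election without/with authentication, 3 = TCP-based fast leader election, the default). The function names are hypothetical, and the parser is a simplified key=value reader, not ZooKeeper's own config-processing code.

```python
from __future__ import annotations

# electionAlg values as described in the ZooKeeper 3.x admin guide.
# Assumption: the running build honors this mapping and default; the
# pre-3.2.0 bug discussed in this thread is exactly a case where it did not.
ELECTION_ALGS = {
    0: "UDP",  # original LeaderElection
    1: "UDP",  # fast leader election, non-authenticated
    2: "UDP",  # fast leader election, authenticated
    3: "TCP",  # FastLeaderElection (default)
}
DEFAULT_ELECTION_ALG = 3


def parse_cfg(text: str) -> dict[str, str]:
    """Simplified key=value parser for zoo.cfg-style text (not full Java Properties)."""
    props: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def election_transport(cfg_text: str) -> tuple[int, str]:
    """Return (electionAlg value, transport) for zoo.cfg-style text."""
    props = parse_cfg(cfg_text)
    alg = int(props.get("electionAlg", DEFAULT_ELECTION_ALG))
    if alg not in ELECTION_ALGS:
        raise ValueError(f"unknown electionAlg value: {alg}")
    return alg, ELECTION_ALGS[alg]
```

Running this over each server's zoo.cfg would show whether any of them explicitly opts into a UDP-based algorithm. It cannot catch the pre-release bug described above, where a clean config still produced UDP-based election; for that, the netstat output is the better evidence.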
>> Jean-Daniel Cryans wrote:
>>>> According to the log for 222, it can't open a connection to the election
>>>> port (3888) on any of the other servers. This seems very unusual. Can you
>>>> verify that there's connectivity on that port between 222 and all the
>>>> other servers?
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 3888
>>> Trying 10.10.20.224...
>>> telnet: Unable to connect to remote host: Connection refused
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 2888
>>> Trying 10.10.20.224...
>>> Connected to sv4borg224.
>>> Escape character is '^]'.
>>>
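The telnet check above can also be scripted when several servers and ports need testing. A minimal Python sketch using only the standard library: `port_open` is a hypothetical helper that, like telnet, reports whether a TCP connection to host:port succeeds. A refused connection (as seen on 3888 above), a timeout, and a name-resolution failure all come back as False.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established (telnet-style probe)."""
    try:
        # create_connection resolves the host and tries each address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket.timeout and socket.gaierror are all OSError subclasses
        return False
```

For example, `port_open("sv4borg224", 3888)` run on sv4borg222 is the scripted equivalent of the first telnet above; looping over every server on both 2888 and 3888 covers the full connectivity check.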
>>>> Also, can you re-run the netstat with the -a option? We can see the
>>>> listening sockets that way (netstat omits them by default). It would be
>>>> great if you could send the netstat for all 5 servers.
>>> I updated the tar.gz with the netstat -anp output from all 5 servers.
>>>
>>> Thx!
>>>
>>> J-D
>>>
>>>> Thanks,
>>>>
>>>> Patrick
>>>>
>>>> Jean-Daniel Cryans wrote:
>>>>> Everything is here
>>>>> http://people.apache.org/~jdcryans/zk_election_bug.tar.gz
>>>>>
>>>>> The server we are trying to start is sv4borg222 (myid is 2) and we
>>>>> started it around 10:03:21
>>>>>
>>>>> Thx!
>>>>>
>>>>> J-D
>>>>>
>> tcp6       0      0 10.10.20.221:34865      10.10.20.224:2888       ESTABLISHED 14682/java
>> udp6       0      0 :::2888                 :::*                                14682/java
>>
>>
>> tcp6       0      0 :::3888                 :::*                    LISTEN      4092/java
>> unix  2      [ ]         STREAM     CONNECTED     721588877 7642/java
>>
>>
>> tcp6       0      0 10.10.20.223:42518      10.10.20.224:2888       ESTABLISHED 2704/java
>> udp6       0      0 :::2888                 :::*                                2704/java
>>
>>
>> tcp6       0      0 :::2888                 :::*                    LISTEN      31052/java
>> tcp6       0      0 10.10.20.224:2888       10.10.20.223:42518      ESTABLISHED 31052/java
>> tcp6       0      0 10.10.20.224:2888       10.10.20.225:51459      ESTABLISHED 31052/java
>> tcp6       0      0 10.10.20.224:2888       10.10.20.221:34865      ESTABLISHED 31052/java
>> udp6       0      0 :::2888                 :::*                                31052/java
>>
>>
>> tcp6       0      0 10.10.20.225:51459      10.10.20.224:2888       ESTABLISHED 19545/java
>> udp6       0      0 :::2888                 :::*                                19545/java
>>
>>
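The pattern Mahadev spotted (a udp6 socket bound to :::2888 on every server) can be picked out of `netstat -anp` output mechanically. A minimal Python sketch, assuming the Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name); the function name is hypothetical. With the default TCP-based election, it should find nothing on the quorum or election ports.

```python
from __future__ import annotations

QUORUM_PORTS = (2888, 3888)  # quorum and election ports used in this thread


def udp_sockets_on_ports(netstat_text: str, ports=QUORUM_PORTS) -> list[tuple[int, str]]:
    """Return (port, "PID/Program") for each UDP socket bound to one of `ports`.

    Assumes Linux `netstat -anp` rows, with the local address in the 4th
    column and the PID/Program name in the last column.
    """
    hits: list[tuple[int, str]] = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only look at udp/udp6 rows that have at least a local-address column
        if len(fields) < 4 or not fields[0].startswith("udp"):
            continue
        # Local address looks like ":::2888" or "0.0.0.0:2888"; the port follows the last ':'
        port_str = fields[3].rsplit(":", 1)[-1]
        if port_str.isdigit() and int(port_str) in ports:
            hits.append((int(port_str), fields[-1]))
    return hits
```

Fed the rows quoted above, it flags the udp6 :::2888 socket of each java process while ignoring the TCP quorum connections and the 3888 listener.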
