No worries. Kudos to Mahadev for sniffing out the UDP in the netstat; I
glossed right over it. ;-)
Lots of good fixes in 3.2.2 vs pre-3.2. Still doesn't explain what Nick
was seeing originally though...
Patrick
Jean-Daniel Cryans wrote:
> Oh my god! You are right, we run an old dev version of 3.2.0:
>
> zookeeper-r785019-hbase-1329.jar
>
> This was what we shipped HBase trunk with last summer... This quorum
> has an uptime of more than 6 months! Well I guess that explains it, I
> thought we restarted it since then during our HBase upgrades but it
> seems not so I'm very sorry about this false alert.
>
> So... all I can say is thank you guys for such reliable software!
> We'll be upgrading to 3.2.2 really soon.
>
> J-D
>
> On Mon, Jan 25, 2010 at 1:44 PM, Patrick Hunt <phunt@apache.org> wrote:
>> JD, there's something _very_ unusual in your setup. Are you running
>> "official" released ZooKeeper code or something else?
>>
>> Either there is a misconfiguration on the other servers (the configs for the
>> other servers are exactly the same as 222, right?), or perhaps some patches to
>> the ZK codebase that went awry?
>>
>> See the attached file "zk_ports.txt". This is a summary of the netstat -a
>> that you sent. Notice in particular that UDP sockets are open for port 2888!
>> This should not happen in the default ZK configuration case.
>>
>> By default we only use tcp connections between servers (quorum & election).
>> There is an "electionAlg" option that allows users to turn off the TCP-based
>> fast leader election and go with a UDP-based one, but I don't see that in the
>> config you provided for 222. (as I said, assuming you are not setting this
>> option on the other servers either, correct?).
>>
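To make the electionAlg check concrete, here is a minimal Python sketch that reads zoo.cfg-style text and reports whether it selects a UDP-based election. It assumes the numbering described in the ZooKeeper 3.x administrator's guide (0 to 2 are the UDP-based variants, 3 is the TCP-based fast leader election and the default when the option is absent). The function names and the sample config are hypothetical, not taken from the config sent for 222.

```python
# electionAlg values as described in the ZooKeeper 3.x admin guide
# (assumption: the version being checked uses the same numbering).
UDP_BASED = {0, 1, 2}
DEFAULT_ALG = 3  # TCP-based fast leader election


def election_alg(cfg_text):
    """Return the electionAlg value set in zoo.cfg text, or the default."""
    alg = DEFAULT_ALG
    for line in cfg_text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip() == "electionAlg":
            alg = int(value.strip())
    return alg


def uses_udp_election(cfg_text):
    """True if the config explicitly selects a UDP-based election."""
    return election_alg(cfg_text) in UDP_BASED


# Hypothetical config for illustration only.
sample = """\
tickTime=2000
clientPort=2181
server.1=hostA:2888:3888
server.2=hostB:2888:3888
"""
print(uses_udp_election(sample))  # False: option absent, default applies
```

Running this against the config of every server in the quorum would show whether any of them sets the option explicitly. It cannot detect a build whose built-in default differs from the documented one.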
>>
>> Mahadev and I do remember that there was a bug in the 3.2 branch prior to
>> 3.2 ever being released that caused us to use non-FLE (so UDP based)
>> election by default, however we fixed that before 3.2.0 ever shipped (it was
>> a bug in our config processing code) and it was never exposed in an official
>> release. Perhaps you have picked up some code prior to that?
>>
>> Patrick
>>
>> Jean-Daniel Cryans wrote:
>>>> According to the log for 222 it can't open a connection to the election
>>>> port (3888) for any of the other servers. This seems very unusual. Can you
>>>> verify that there's connectivity on that port between 222 and all the
>>>> other servers?
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 3888
>>> Trying 10.10.20.224...
>>> telnet: Unable to connect to remote host: Connection refused
>>> jdcryans@sv4borg222:~$ telnet sv4borg224 2888
>>> Trying 10.10.20.224...
>>> Connected to sv4borg224.
>>> Escape character is '^]'.
>>>
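The same telnet-style check can be scripted so it covers every server and both ports in one pass. This is a minimal sketch using only the Python standard library; the helper names `can_connect` and `probe` and the 3 second timeout are assumptions chosen for illustration.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


def probe(hosts, ports, timeout=3.0):
    """Map each (host, port) pair to whether a TCP connect succeeded."""
    return {(h, p): can_connect(h, p, timeout) for h in hosts for p in ports}
```

For example, `probe(["sv4borg224"], [2888, 3888])` run from sv4borg222 would be expected to mirror the telnet results above: a successful connect on 2888 and a failure on 3888. Note that this only tests TCP, so it says nothing about UDP sockets.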
>>>> Also, can you re-run the netstat with -a option? We can see the listen
>>>> sockets that way (omitted by netstat by default). It would be great if you
>>>> could send the netstat for all 5 servers.
>>> I updated the tar.gz with the 5 netstat -anp
>>>
>>> Thx!
>>>
>>> J-D
>>>
>>>> Thanks,
>>>>
>>>> Patrick
>>>>
>>>> Jean-Daniel Cryans wrote:
>>>>> Everything is here
>>>>> http://people.apache.org/~jdcryans/zk_election_bug.tar.gz
>>>>>
>>>>> The server we are trying to start is sv4borg222 (myid is 2) and we
>>>>> started it around 10:03:21
>>>>>
>>>>> Thx!
>>>>>
>>>>> J-D
>>>>>
>> tcp6 0 0 10.10.20.221:34865 10.10.20.224:2888 ESTABLISHED 14682/java
>> udp6 0 0 :::2888 :::* 14682/java
>>
>>
>> tcp6 0 0 :::3888 :::* LISTEN 4092/java
>> unix 2 [ ] STREAM CONNECTED 721588877 7642/java
>>
>>
>> tcp6 0 0 10.10.20.223:42518 10.10.20.224:2888 ESTABLISHED 2704/java
>> udp6 0 0 :::2888 :::* 2704/java
>>
>>
>> tcp6 0 0 :::2888 :::* LISTEN 31052/java
>> tcp6 0 0 10.10.20.224:2888 10.10.20.223:42518 ESTABLISHED 31052/java
>> tcp6 0 0 10.10.20.224:2888 10.10.20.225:51459 ESTABLISHED 31052/java
>> tcp6 0 0 10.10.20.224:2888 10.10.20.221:34865 ESTABLISHED 31052/java
>> udp6 0 0 :::2888 :::* 31052/java
>>
>>
>> tcp6 0 0 10.10.20.225:51459 10.10.20.224:2888 ESTABLISHED 19545/java
>> udp6 0 0 :::2888 :::* 19545/java
>>
>>
>>
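Spotting the UDP sockets in a summary like the one above can be automated. The following is a minimal Python sketch that scans `netstat -anp` style text for UDP sockets bound to a given local port. It assumes each socket sits on a single line with the local address in the fourth column and the PID/program in the last; the function name is hypothetical, and the sample rows are copied from the summary with each wrapped row joined onto one line.

```python
def udp_listeners(netstat_text, port):
    """Return the PID/program field of each UDP socket bound to `port`."""
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Only udp/udp6 rows; the local address is the fourth column.
        if len(fields) < 5 or not fields[0].startswith("udp"):
            continue
        local = fields[3]
        # The port is whatever follows the last colon (works for ":::2888").
        if local.rsplit(":", 1)[-1] == str(port):
            hits.append(fields[-1])
    return hits


sample = """\
tcp6 0 0 10.10.20.221:34865 10.10.20.224:2888 ESTABLISHED 14682/java
udp6 0 0 :::2888 :::* 14682/java
tcp6 0 0 :::3888 :::* LISTEN 4092/java
"""
print(udp_listeners(sample, 2888))  # ['14682/java']
```

Any non-empty result for port 2888 on a server that is supposed to run the default TCP-based election is worth a closer look at the config and the jar actually in use.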