hadoop-zookeeper-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Killing a zookeeper server
Date Mon, 25 Jan 2010 19:56:07 GMT
Everything is here: http://people.apache.org/~jdcryans/zk_election_bug.tar.gz

The server we are trying to start is sv4borg222 (myid is 2) and we
started it around 10:03:21

Thx!

J-D

On Mon, Jan 25, 2010 at 10:49 AM, Patrick Hunt <phunt@apache.org> wrote:
> 1) Capture the logs from all 5 servers
> 2) give the config for the "down" server, and also indicate what its server
> id is.
> 3) if possible it would be interesting to see the netstat information from 2
> of the servers - the one that's down and one or more of the others.
>
> Patrick
>
> Jean-Daniel Cryans wrote:
>>
>> I believe we've just hit the same problem with zk-3.2.1
>>
>> For some reason a machine crashed and it was part of our quorum of 5
>> servers. When we try to restart it, it does this (I replaced the
>> hostname and IP):
>>
>> 2010-01-25 10:25:06,469 WARN
>> org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open
>> channel to 1 at election address somehost1/someip1:3888
>> java.net.ConnectException: Connection refused
>>        at sun.nio.ch.Net.connect(Native Method)
>>        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>>        at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
>>        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
>>        at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356)
>>        at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603)
>>        at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)
>>
>> It has been like that for almost 20 minutes now, trying every other
>> server in the quorum on different channels. ruok says imok but all
>> other commands say that ZK server isn't running. I don't believe that
>> 3.2.2 will help unless ZK-547 does more than it seems to.
>>
>> Anything else I should look at?
>>
>> Thx!
>>
>> J-D
>>
>> On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey <nickb@mailtrust.com> wrote:
>>>
>>> So the solution for us was to just nuke zookeeper and restart everywhere.
>>> We will also be upgrading soon.
>>>
>>> To answer your question, yes I believe all the servers were running
>>> normally except for the fact that they were experiencing high CPU
>>> usage. As we began to see some CPU alerts I started restarting some of
>>> the servers.
>>>
>>> It was then that we noticed that they were not actually running
>>> according to 'stat'.
>>>
>>> I still have the log from one server with a debug level and the rest
>>> with a warn level. If you would like to see any of these and analyze
>>> them just let me know.
>>>
>>> Thanks for the help,
>>> Nick Bailey
>>>
>
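
The checks mentioned in this thread (ruok, stat, and whether a peer's
election port accepts connections) can be sketched in Python roughly as
follows. This is a minimal sketch, not something posted by anyone in the
thread: the function names are hypothetical, and it assumes the four-letter
commands are served on the client port (2181 by default), while 3888 is the
election port shown in the log above.

```python
import socket


def four_letter_word(host, port, cmd, timeout=5.0):
    """Send a four-letter command such as 'ruok' or 'stat' and return the reply.

    Assumption: the server answers on its client port and then closes the
    connection, so we simply read until end of stream.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name resolution failures.
        return False


# Hypothetical usage; the host names are the placeholders used in the thread:
#   four_letter_word("somehost1", 2181, "ruok")    # "imok" if the process is up
#   four_letter_word("somehost1", 2181, "stat")    # reports whether it is serving
#   port_accepts_connections("somehost1", 3888)    # election port reachable?
```

A server that answers ruok but refuses connections on the election port
would match the symptom described above, although that alone does not
explain why the election is stuck.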
