hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Killing a zookeeper server
Date Mon, 25 Jan 2010 18:49:10 GMT
1) Capture the logs from all 5 servers.
2) Give the config for the "down" server, and also indicate what its
server id is.
3) If possible, it would be interesting to see the netstat information
from 2 of the servers - the one that's down and one or more of the others.
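
In case it helps with gathering the above, here is a rough sketch (not
an official tool) that probes each quorum member: it checks whether the
election port accepts TCP connections and sends the "ruok" and "stat"
four-letter commands to the client port. The client port 2181 is an
assumption (the usual default, check clientPort in your config), 3888 is
taken from the log in this thread, and the function names are made up
for illustration.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


def four_letter_word(host, port, cmd, timeout=2.0):
    """Send a four-letter command (e.g. 'ruok', 'stat') and return the reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        # The server closes the connection after replying, so read until EOF.
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def probe(servers, client_port=2181, election_port=3888):
    """Collect election-port reachability and ruok/stat replies per host.

    client_port is an assumed default; election_port matches the log above.
    """
    report = {}
    for host in servers:
        entry = {"election_port_open": port_open(host, election_port)}
        for cmd in ("ruok", "stat"):
            try:
                entry[cmd] = four_letter_word(host, client_port, cmd).strip()
            except OSError as exc:
                entry[cmd] = "error: %s" % exc
        report[host] = entry
    return report
```

Calling probe() with the list of quorum hostnames returns one dict per
host. A server that answers "imok" to ruok while its election port is
refused, or whose stat reply says it is not serving, would match the
symptoms described below.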

Patrick

Jean-Daniel Cryans wrote:
> I believe we've just hit the same problem with zk-3.2.1
> 
> For some reason a machine crashed and it was part of our quorum of 5
> servers. When we try to restart it, it does this (I replaced the
> hostname and IP):
> 
> 2010-01-25 10:25:06,469 WARN
> org.apache.zookeeper.server.quorum.QuorumCnxManager: Cannot open
> channel to 1 at election address somehost1/someip1:3888
> java.net.ConnectException: Connection refused
>         at sun.nio.ch.Net.connect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:507)
>         at java.nio.channels.SocketChannel.open(SocketChannel.java:146)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectOne(QuorumCnxManager.java:323)
>         at org.apache.zookeeper.server.quorum.QuorumCnxManager.connectAll(QuorumCnxManager.java:356)
>         at org.apache.zookeeper.server.quorum.FastLeaderElection.lookForLeader(FastLeaderElection.java:603)
>         at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:488)
> 
> It has been like that for almost 20 minutes now, trying every other
> server in the quorum on different channels. ruok says imok but all
> other commands say that ZK server isn't running. I don't believe that
> 3.2.2 will help unless ZK-547 does more than it seems to.
> 
> Anything else I should look at?
> 
> Thx!
> 
> J-D
> 
> On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey <nickb@mailtrust.com> wrote:
>> So the solution for us was to just nuke zookeeper and restart everywhere.
>> We will also be upgrading soon.
>>
>> To answer your question, yes I believe all the servers were running normally
>> except for the fact that they were experiencing high CPU usage.  As we began
>> to see some CPU alerts I started restarting some of the servers.
>>
>> It was then that we noticed that they were not actually running according to
>> 'stat'.
>>
>> I still have the log from one server with a debug level and the rest with a
>> warn level. If you would like to see any of these and analyze them just let
>> me know.
>>
>> Thanks for the help,
>> Nick Bailey
>>
