Mailing-List: contact zookeeper-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: zookeeper-user@hadoop.apache.org
Received-SPF: neutral (athena.apache.org: local policy)
Message-ID: <4B4F5F25.7060205@apache.org>
Date: Thu, 14 Jan 2010 10:15:01 -0800
From: Patrick Hunt <phunt@apache.org>
User-Agent: Thunderbird 2.0.0.23 (X11/20090817)
MIME-Version: 1.0
To: zookeeper-user@hadoop.apache.org
Subject: Re: Killing a zookeeper server
References: <C7737D94.2CA42%mahadev@yahoo-inc.com>
 <1263421188.526815076@192.168.2.227>
In-Reply-To: <1263421188.526815076@192.168.2.227>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

ruok basically is polling to see if the ZK process is ok, which it is, 
it's just that zk is not part of a quorum (which is potentially a 
problem, but really fine in the sense that it's an expected state).
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands

stat is more focused on the typical *stat like information (in our case 
znode tree summary, some quorum, and connection summary).

I think what we are missing is more of an "elcn" or "election status", 
perhaps "qurm" or "quorum status" is really what we need to add. I'd be 
reticent to change the existing stuff since wrt b/w compat (ppl 
integrating into existing tools like ganglia, and other netmons.)

Also the docs are fairly clear on ruok:
http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
could be more clear though (ff to enter a jira to improve) regarding the 
diff between stat & ruok. Also if we add another elcn/qurm type command 
that would go here as well.

Patrick

Nick Bailey wrote:
> Additionally, in this case the stat command simply stated that the
> zookeeper server was not running, which is obviously in conflict with
> an 'imok' response.  In order to determine that the server was
> attempting to rejoin/elect a leader I needed to enable debug logging,
> restart the process, and check the log.
> 
> So from a user perspective some more info from the stat command in
> this case would be very helpful.
> 
> -Nick
> 
> 
> On Wednesday, January 13, 2010 4:39pm, "Mahadev Konar"
> <mahadev@yahoo-inc.com> said:
> 
>> HI Adam, The 'ruok' command at the server does not have much checks
>> in place (infact it does not have any checks in place). It just
>> gives you an idea that the server is really running. The 'stat'
>> command is meant to analyze the server's state.
>> 
>> 
>> So you should infact file an improvement wherein there are atleast
>> some checks in place for 'ruok' command to return 'imok', so that
>> the users know there servers are running healthy.
>> 
>> 
>> Thanks mahadev
>> 
>> 
>> On 1/13/10 1:29 PM, "Adam Rosien" <adam@rosien.net> wrote:
>> 
>>> Yes, leader election either failed or was in progress yet 'ruok' 
>>> returned 'imok'.
>>> 
>>> .. Adam
>>> 
>>> On Wed, Jan 13, 2010 at 1:08 PM, Mahadev Konar
>>> <mahadev@yahoo-inc.com> wrote:
>>>> Hi Adam, That seems fair to file as an improvement. Running
>>>> 'stat' did return the right stats right? Saying the servers
>>>> werent able to elect a leader?
>>>> 
>>>> mahadev
>>>> 
>>>> 
>>>> On 1/13/10 11:52 AM, "Adam Rosien" <adam@rosien.net> wrote:
>>>> 
>>>>> On a related note, it was initially confusing to me that the
>>>>> server returned 'imok' when it wasn't part of the quorum. I
>>>>> realize the internal checks are probably in separate areas of
>>>>> the code, but if others feel similarly I could file an
>>>>> improvement in JIRA.
>>>>> 
>>>>> .. Adam
>>>>> 
>>>>> On Wed, Jan 13, 2010 at 11:19 AM, Nick Bailey
>>>>> <nickb@mailtrust.com> wrote:
>>>>>> So the solution for us was to just nuke zookeeper and
>>>>>> restart everywhere. We will also be upgrading soon as well.
>>>>>> 
>>>>>> 
>>>>>> To answer your question, yes I believe all the servers were
>>>>>> running normally except for the fact that they were
>>>>>> experiencing high CPU usage.  As we began to see some CPU
>>>>>> alerts I started restarting some of the servers.
>>>>>> 
>>>>>> It was then that we noticed that they were not actually
>>>>>> running according to 'stat'.
>>>>>> 
>>>>>> I still have the log from one server with a debug level and
>>>>>> the rest with a warn level. If you would like to see any of
>>>>>> these and analyze them just let me know.
>>>>>> 
>>>>>> Thanks for the help, Nick Bailey
>>>>>> 
>>>>>> On Jan 12, 2010, at 8:20 PM, Patrick Hunt
>>>>>> <phunt@apache.org> wrote:
>>>>>> 
>>>>>>> Nick Bailey wrote:
>>>>>>>> In my last email I failded to include a log line that
>>>>>>>> may be revelent as well 2010-01-12 18:33:10,658
>>>>>>>> [QuorumPeer:/0.0.0.0:2181] (QuorumCnxManager) DEBUG -
>>>>>>>> Queue size: 0 2010-01-12 18:33:10,659
>>>>>>>> [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO  -
>>>>>>>> Notification time out: 6400
>>>>>>> Yes, that is significant/interesting. I believe this
>>>>>>> means that there is some problem with the election
>>>>>>> process (ie the server re-joining the ensemble). We have
>>>>>>> a backoff on these attempts, which matches your 
>>>>>>> description below. We have fixed some election issues in
>>>>>>> recent versions (we introduced fault injection testing
>>>>>>> prior to the 3.2.1 release which found a few issues with
>>>>>>> election). I don't have them off hand - but I've asked 
>>>>>>> Flavio to comment directly (he's in diff tz).
>>>>>>> 
>>>>>>> Can you provide a bit more background: prior to this
>>>>>>> issue, this particular server was running fine? You
>>>>>>> restarted it and then started seeing the issue? (rather
>>>>>>> than this being a new server I mean). What I'm getting at
>>>>>>>  is that there shouldn't/couldn't be any
>>>>>>> networking/firewall type issue going on right?
>>>>>>> 
>>>>>>> Can you provide a full/more log? What I'd suggest is shut
>>>>>>> down this one server, clear the log4j log file, then
>>>>>>> restart it. Let the problem reproduce, then gzip the
>>>>>>> log4j log file and attach to your response. Ok?
>>>>>>> 
>>>>>>> Patrick
>>>>>>> 
>>>>>>>> We see this line occur frequently and the timeout will
>>>>>>>> graduatlly increase to 60000. It appears that all of
>>>>>>>> our servers that seem to be acting normally are
>>>>>>>> experiencing the cpu issue I mentioned earlier 
>>>>>>>> 'https://issues.apache.org/jira/browse/ZOOKEEPER-427'.
>>>>>>>> Perhaps that is causing the timeout in responding? Also
>>>>>>>> to answer your other questions Patrick, we aren't
>>>>>>>> storing a large amount of data really and network
>>>>>>>> latency appears fine. Thanks for the help, Nick 
>>>>>>>> -----Original Message----- From: "Nick Bailey"
>>>>>>>> <nicholas.bailey@rackspace.com> Sent: Tuesday, January
>>>>>>>> 12, 2010 6:03pm To: zookeeper-user@hadoop.apache.org 
>>>>>>>> Subject: Re: Killing a zookeeper server 12 was just to
>>>>>>>> keep uniformity on our servers. Our clients are
>>>>>>>> connecting from the same 12 servers.  Easily modifiable
>>>>>>>> and perhaps we should look into changing that. The logs
>>>>>>>> just seem to indicate that the servers that claim to
>>>>>>>> have no server running are continually attempting to
>>>>>>>> elect a leader.  A sample is provided below.  The
>>>>>>>> initial exception is something we see regularly in our 
>>>>>>>> logs and the debug and info lines following are simply
>>>>>>>> repeating throughout the log. 2010-01-12 17:55:02,269
>>>>>>>> [NIOServerCxn.Factory:2181] (NIOServerCnxn) WARN -
>>>>>>>> Exception causing close of session 0x0 due to
>>>>>>>> java.io.IOException: Read error 2010-01-12 17:55:02,269
>>>>>>>> [NIOServerCxn.Factory:2181] (NIOServerCnxn) DEBUG -
>>>>>>>> IOException stack trace java.io.IOException: Read error
>>>>>>>>  at 
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:295)
>>>>>>>>  at 
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:
>>>>>>>>  16 2) 2010-01-12 17:55:02,269
>>>>>>>> [NIOServerCxn.Factory:2181] (NIOServerCnxn) INFO -
>>>>>>>> closing session:0x0 NIOServerCnxn: 
>>>>>>>> java.nio.channels.SocketChannel[connected
>>>>>>>> local=/172.20.36.9:2181 remote=/172.20.36.9:50367] 
>>>>>>>> 2010-01-12 17:55:02,270 [NIOServerCxn.Factory:2181]
>>>>>>>> (NIOServerCnxn) DEBUG - ignoring exception during input
>>>>>>>> shutdown java.net.SocketException: Transport endpoint
>>>>>>>> is not connected at
>>>>>>>> sun.nio.ch.SocketChannelImpl.shutdown(Native Method) at
>>>>>>>>  
>>>>>>>> sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640)
>>>>>>>>  at 
>>>>>>>> sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
>>>>>>>>  at 
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn.close(NIOServerCnxn.java:767)
>>>>>>>>  at 
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn.doIO(NIOServerCnxn.java:421)
>>>>>>>>  at 
>>>>>>>> org.apache.zookeeper.server.NIOServerCnxn$Factory.run(NIOServerCnxn.java:
>>>>>>>>  16 2) 2010-01-12 17:55:52,181
>>>>>>>> [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO  -
>>>>>>>> Adding vote 2010-01-12 17:55:52,181
>>>>>>>> [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO  -
>>>>>>>> Notification: 3, 30064826826, 1, 9, LOOKING, LOOKING, 9
>>>>>>>>  2010-01-12 17:55:52,181 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) DEBUG - id: 3, proposed id: 3,
>>>>>>>> zxid: 30064826826, proposed zxid: 30064826826 
>>>>>>>> 2010-01-12 17:55:52,181 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Adding vote 2010-01-12
>>>>>>>> 17:55:52,182 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Notification: 3,
>>>>>>>> 30064826826, 1, 9, LOOKING, LOOKING, 11 2010-01-12
>>>>>>>> 17:55:52,182 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) DEBUG - id: 3, proposed id: 3,
>>>>>>>> zxid: 30064826826, proposed zxid: 30064826826 
>>>>>>>> 2010-01-12 17:55:52,182 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Adding vote 2010-01-12
>>>>>>>> 17:55:52,183 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Notification: 3,
>>>>>>>> 30064826826, 1, 9, LOOKING, LOOKING, 12 2010-01-12
>>>>>>>> 17:55:52,183 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) DEBUG - id: 3, proposed id: 3,
>>>>>>>> zxid: 30064826826, proposed zxid: 30064826826 
>>>>>>>> 2010-01-12 17:55:52,183 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Adding vote 2010-01-12
>>>>>>>> 17:56:52,190 [WorkerReceiver Thread] 
>>>>>>>> (FastLeaderElection$Messenger$WorkerReceiver) DEBUG -
>>>>>>>> Receive new message. 2010-01-12 17:56:52,190
>>>>>>>> [WorkerReceiver Thread] 
>>>>>>>> (FastLeaderElection$Messenger$WorkerReceiver) DEBUG -
>>>>>>>> Receive new message. 2010-01-12 17:56:52,191
>>>>>>>> [QuorumPeer:/0.0.0.0:2181] (FastLeaderElection) INFO  -
>>>>>>>> Notification: 3, 30064826826, 1, 9, LOOKING, LOOKING, 8
>>>>>>>>  2010-01-12 17:56:52,191 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) DEBUG - id: 3, proposed id: 3,
>>>>>>>> zxid: 30064826826, proposed zxid: 30064826826 
>>>>>>>> 2010-01-12 17:56:52,191 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Adding vote 2010-01-12
>>>>>>>> 17:56:52,192 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Notification: 3,
>>>>>>>> 30064826826, 1, 9, LOOKING, LOOKING, 12 2010-01-12
>>>>>>>> 17:56:52,192 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) DEBUG - id: 3, proposed id: 3,
>>>>>>>> zxid: 30064826826, proposed zxid: 30064826826 
>>>>>>>> 2010-01-12 17:56:52,192 [QuorumPeer:/0.0.0.0:2181]
>>>>>>>> (FastLeaderElection) INFO  - Adding vote 2010-01-12
>>>>>>>> 17:57:52,200 [WorkerReceiver Thread] 
>>>>>>>> (FastLeaderElection$Messenger$WorkerReceiver) DEBUG -
>>>>>>>> Receive new message. 2010-01-12 17:57:52,201
>>>>>>>> [WorkerReceiver Thread] 
>>>>>>>> (FastLeaderElection$Messenger$WorkerReceiver) DEBUG -
>>>>>>>> Receive new message. -----Original Message----- From:
>>>>>>>> "Patrick Hunt" <phunt@apache.org> Sent: Tuesday,
>>>>>>>> January 12, 2010 5:40pm To:
>>>>>>>> zookeeper-user@hadoop.apache.org,
>>>>>>>> nicholas.bailey@rackspace.com Subject: Re: Killing a
>>>>>>>> zookeeper server 12 servers? That's alot, if you dont'
>>>>>>>> mind my asking why so many? Typically we recommend 5 -
>>>>>>>> that way you can have one down for maintenance and
>>>>>>>> still have a failure that doesn't bring down the
>>>>>>>> cluster. The "electing a leader" is probably the
>>>>>>>> restarted machine attempting to re-join the ensemble
>>>>>>>> (it should join as a follower if you have a leader 
>>>>>>>> already elected, given that it's xid is behind the
>>>>>>>> existing leader.) Hard to tell though without the logs.
>>>>>>>>  You might also be seeing the initLimit exceeded, is
>>>>>>>> the data you are storing in ZK large? Or perhaps
>>>>>>>> network connectivity is slow?
>>>>>>>> 
>>>>>>>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_cl
>>>>>>>>  us terOptions again the logs would give some insight
>>>>>>>> on this. Patrick Nick Bailey wrote:
>>>>>>>>> We are running zookeeper 3.1.0
>>>>>>>>> 
>>>>>>>>> Recently we noticed the cpu usage on our machines
>>>>>>>>> becoming increasingly high and we believe the cause
>>>>>>>>> is
>>>>>>>>> 
>>>>>>>>> https://issues.apache.org/jira/browse/ZOOKEEPER-427
>>>>>>>>> 
>>>>>>>>> However our solution when we noticed the problem was
>>>>>>>>> to kill the zookeeper process and restart it.
>>>>>>>>> 
>>>>>>>>> After doing that though it looks like the newly
>>>>>>>>> restarted zookeeper server is continually attempting
>>>>>>>>> to elect a leader even though one already exists.
>>>>>>>>> 
>>>>>>>>> The process responses with 'imok' when asked, but the
>>>>>>>>> stat command returns 'ZooKeeperServer not running'.
>>>>>>>>> 
>>>>>>>>> I belive that killing the current leader should
>>>>>>>>> trigger all servers to do an election and solve the
>>>>>>>>> problem, but I'm not sure. Should that be the course
>>>>>>>>> of action in this situation?
>>>>>>>>> 
>>>>>>>>> Also we have 12 servers, but 5 are currently not
>>>>>>>>> running according to stat.  So I guess this isn't a
>>>>>>>>> problem unless we lose another one. We have plans to
>>>>>>>>> upgrade zookeeper to solve the cpu issue but haven't 
>>>>>>>>> been able to do that yet.
>>>>>>>>> 
>>>>>>>>> Any help appreciated, Nick Bailey
>>>>>>>>> 
>>>> 
>> 
> 
>