hadoop-zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Pathological ZK cluster: 1 server verbosely WARN'ing, other 2 servers pegging CPU
Date Fri, 30 Apr 2010 17:38:10 GMT

On 04/30/2010 10:16 AM, Aaron Crow wrote:
> Hi Patrick, thanks for your time and detailed questions.
>

No worries. When we hear about an issue we're very interested to follow 
up and resolve it, regardless of the source. We take the project goals 
of high reliability/availability _very_ seriously; we know that our 
users are deploying to mission-critical environments.

> We're running on Java build 1.6.0_14-b08, on Ubuntu 4.2.4-1ubuntu3. Below is
> output from a recent stat, and a question about node count. For your other
> questions, I should save you time with a batch reply: I wasn't tracking
> nearly enough things (like logs), so it might not be fruitful to try to
> investigate this failure in detail.
>

NP. If you had those available I would be interested to take a look, but 
I think the node count (below) is probably the issue.

> I've since started rolling out better settings, including the memory and GC
> settings recommended in the wiki, and logging to ROLLINGFILE. Also logging
> and archiving verbose gc. We've been fine tuning our client app with similar
> settings, but my bad that we didn't roll out ZK itself with these settings
> as well. (Oh and I also set my browser's homepage to
> http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing ;-)

Great. That page is really good for challenging assumptions.

> So if we have any more serious issues, I think I should be able to provide
> better details at that time.
>

Great!

> I do have a question about node count, though. Below is output from stat on
> one of our servers. Node count says *1182653*. Really!? Well, from our
> application standpoint, we only create and use hundreds of nodes at the
> most. Certainly not any amount in the neighborhood of 1 million. Does this
> demonstrate a serious problem?
>

It certainly looks like this might be the issue. One million znodes is 
a lot, but I've successfully tested with more (upwards of 5 million 
znodes with 25 million concurrent watches is the most I remember 
testing). It's really just a matter of tuning the GC and starting the 
JVM with sufficient heap. It could be that the counter is wrong, but I 
doubt it (I've never seen that before). Is the count approximately the 
same on all servers in the ensemble?
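
If it helps, here's a minimal sketch of automating that comparison: it 
sends the same four-letter "stat" command you used below to each server 
and prints just the "Node count" line. The host names are placeholders 
for your three servers; only the port (2181) is taken from your output.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

// Sketch only: send the four-letter "stat" command to each server in the
// ensemble and print the "Node count" line so the counts can be compared.
// Replace the host names with your actual servers.
public class NodeCountCheck {
    public static void main(String[] args) throws Exception {
        String[] servers = {"zk1", "zk2", "zk3"};
        for (String host : servers) {
            Socket sock = new Socket(host, 2181);
            try {
                OutputStream out = sock.getOutputStream();
                out.write("stat".getBytes());
                out.flush();
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(sock.getInputStream()));
                String line;
                // The server closes the connection after replying, so this
                // loop ends at end-of-stream.
                while ((line = in.readLine()) != null) {
                    if (line.startsWith("Node count")) {
                        System.out.println(host + ": " + line);
                    }
                }
            } finally {
                sock.close();
            }
        }
    }
}

(Running "echo stat | nc <host> 2181" against each server, as you did 
below, gets you the same information.)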

Given that you are only expecting hundreds of znodes, I suspect that 
you have a "leak" of znodes somewhere.

Suggestion - fire up the Java client shell (bin/zkCli.sh, or just run 
ZooKeeperMain directly), connect to one of your servers, and use the "ls" 
command to look around. You might be surprised at what you find (or you 
might find that the count is wrong, etc.). It's a good sanity check; a 
rough sketch of doing the same walk programmatically follows below.
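
Here's that sketch, using the Java client to count znodes under each 
top-level path so you can see where the ~1.2 million nodes actually 
live. The connect string and session timeout are placeholders, and for 
a tree this size it issues a lot of getChildren calls, so treat it as a 
one-off diagnostic rather than something to run continuously.

import java.util.List;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

// Sketch only: connect to one server and recursively count znodes under
// each top-level path. Useful for spotting where a znode "leak" lives.
public class ZnodeCounter {
    public static void main(String[] args) throws Exception {
        Watcher noop = new Watcher() {
            public void process(WatchedEvent event) { /* ignore events */ }
        };
        // A real tool would wait for the SyncConnected event before
        // issuing calls; this sketch just relies on the connection
        // completing quickly.
        ZooKeeper zk = new ZooKeeper("127.0.0.1:2181", 30000, noop);
        try {
            for (String child : zk.getChildren("/", false)) {
                String path = "/" + child;
                System.out.println(path + ": " + count(zk, path) + " znodes");
            }
        } finally {
            zk.close();
        }
    }

    // Counts the znode at 'path' plus everything below it.
    static int count(ZooKeeper zk, String path) throws Exception {
        int total = 1;
        List<String> children = zk.getChildren(path, false);
        for (String child : children) {
            total += count(zk, path + "/" + child);
        }
        return total;
    }
}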

Patrick

>
> Thanks,
> Aaron
>
> rails_deploy@task01:~/zookeeper-3.3.0/bin$ echo stat | nc 127.0.0.1 2181
>
> Zookeeper version: 3.3.0-925362, built on 03/19/2010 18:38 GMT
>
> Clients:
>
>   /10.0.10.116:34648[1](queued=0,recved=86200,sent=86200)
>
>   /10.0.10.116:45609[1](queued=0,recved=27695,sent=27695)
>
>   /10.0.10.120:48432[1](queued=0,recved=26030,sent=26030)
>
>   /10.0.10.117:53336[1](queued=0,recved=43100,sent=43100)
>
>   /10.0.10.120:35087[1](queued=0,recved=4268,sent=4268)
>
>   /10.0.10.116:49526[1](queued=0,recved=4273,sent=4273)
>
>   /10.0.10.118:45614[1](queued=0,recved=43624,sent=43624)
>
>   /10.0.10.116:45600[1](queued=0,recved=27704,sent=27704)
>
>   /10.0.10.120:48440[1](queued=0,recved=7161,sent=7161)
>
>   /10.0.10.120:48437[1](queued=0,recved=7180,sent=7180)
>
>   /10.0.10.118:59310[1](queued=0,recved=63205,sent=63205)
>
>   /10.0.10.116:51072[1](queued=0,recved=14516,sent=14516)
>
>   /10.0.10.116:51071[1](queued=0,recved=14516,sent=14516)
>
>   /10.0.10.119:34097[1](queued=0,recved=42984,sent=42984)
>
>   /10.0.10.119:41868[1](queued=0,recved=18558,sent=18558)
>
>   /10.0.10.120:48441[1](queued=0,recved=21712,sent=21712)
>
>   /10.0.10.116:49528[1](queued=0,recved=12967,sent=12967)
>
>   /127.0.0.1:37234[1](queued=0,recved=1,sent=0)
>
>
> Latency min/avg/max: 0/0/105
>
> Received: 497790
>
> Sent: 497788
>
> Outstanding: 0
>
> Zxid: 0xa0003aa4b
>
> Mode: follower
>
> *Node count: 1182653*
>
>
>
>
> On Wed, Apr 28, 2010 at 10:56 PM, Patrick Hunt <phunt@apache.org> wrote:
>
>> Btw, are you monitoring the ZK server jvms? Please take a look at
>>
>>
>> http://hadoop.apache.org/zookeeper/docs/current/zookeeperAdmin.html#sc_zkCommands
>>
>> It would be interesting if you could run commands such as "stat" against
>> your currently running cluster. In particular I'd be interested to know what
>> you see for latency and "node count" in the stat response. Run this command
>> against each of your servers in the ensemble. For example, a high "max latency"
>> might indicate that something (usually swap/gc) is causing the server to
>> respond slowly in some cases.
>>
>> Patrick
>>
>>
>> On 04/28/2010 10:47 PM, Patrick Hunt wrote:
>>
>>> Hi Aaron, some questions/comments below:
>>>
>>> On 04/28/2010 06:29 PM, Aaron Crow wrote:
>>>
>>>> We were running version 3.2.2 for about a month and it was working
>>>> well for
>>>> us. Then late this past Saturday night, our cluster went pathological.
>>>> One
>>>> of the 3 ZK servers spewed many WARNs (see below), and the other 2
>>>> servers
>>>> were almost constantly pegging the CPU. All three servers are on separate
>>>> machines. From what we could tell, the machines were fine... networking
>>>> fine, disk fine, etc. The ZK clients were completely unable to complete
>>>> their connections to ZK.
>>>>
>>>>
>>> These machines are locally connected (not over a WAN), then? What OS and
>>> Java version are you using?
>>>
>>> Do you see any FATAL or ERROR level messages in the logs?
>>>
>>> It would help to look at your ZK config files for these servers. Could
>>> you provide them? (You might want to create a JIRA first, then just attach
>>> the configs and other details/collateral to that; it's easier than dealing
>>> with email.)
>>>
>>> If you have logs for the time period and can share them, that would be
>>> most useful. (Again, gzip and attach them to the JIRA.)
>>>
>>>   We tried all sorts of restarts, running zkCleanup, etc. We even
>>>> completely
>>>> shut down our clients... and the pathology continued. Our workaround
>>>> was to
>>>> do an urgent upgrade to version 3.3.0. The new ZK cluster with 3.3.0 has
>>>> been running well for us... so far...
>>>>
>>>>
>>> Offhand, and with the data we have so far, nothing sticks out that 3.3
>>> would have resolved (JIRA has conveniently been down for the last hour or
>>> so, so I can't review right now), although there were some changes to
>>> reduce memory consumption (see below).
>>>
>>>   I realize that, sadly, this message doesn't contain nearly enough
>>>> details to
>>>> trace exactly what happened. I guess I'm wondering if anyone has seen
>>>> this
>>>> general scenario, and/or knows how to prevent it? Is there anything we
>>>> might be
>>>> doing client-side to trigger this? Our application-level request
>>>> frequency
>>>> is maybe a few requests to ZooKeeper per second, times 5 client
>>>> applications. If we detect a SESSION EXPIRED, we do a simple "create new
>>>> client and use that instead". And we were seeing this happen
>>>> occasionally.
>>>>
>>>>
>>> What are the clients doing? Do you have a large number/size of znodes?
>>>
>>> Do you see any OutOfMemoryError in the logs?
>>>
>>> Could the ZK server Java process be swapping? Are you monitoring GC?
>>> Perhaps large GC pauses are happening.
>>>
>>> I have a suspicion that one of a few things might be happening. I see
>>> the following in your original email:
>>>
>>>   >  :FollowerHandler@302] - Sending snapshot last zxid of peer is
>>> 0xd0007d66d
>>>   >  zxid of leader is 0xf00000000
>>>   >  2010-04-24 23:06:03,254 - ERROR [FollowerHandler-/10.0.10.116:34405
>>>   >  :FollowerHandler@415] - Unexpected exception causing shutdown while
>>> sock
>>>   >  still open
>>>   >  java.net.SocketException: Broken pipe
>>>   >  at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>   >  at
>>>
>>> ^^^^ This log section says that a ZK server (follower) connected to
>>> the elected leader and started to download the latest ZK snapshot (db)
>>> (the leader is sending it here) while the follower is recovering. However,
>>> before the follower fully downloads the snapshot, the connection (pipe)
>>> is broken and the operation fails. At this point the leader itself will
>>> probably give up the lead, a new election will happen, and it might be
>>> the case that this same problem occurs - the follower tries to download
>>> the snapshot and again fails (and this loops forever).
>>>
>>> This issue can happen if the servers are slowed, whether by slow
>>> interconnects (WAN), network connectivity/performance problems, swapping
>>> of the JVM, or, most commonly, GC pauses in the JVM.
>>>
>>> See this recent case:
>>> http://www.mail-archive.com/zookeeper-dev@hadoop.apache.org/msg08083.html
>>>
>>> What is the size of your snapshot file? (You can see this in your
>>> ZooKeeper data directory, unless you wiped it during your attempt to
>>> clean up/recover/upgrade.)
>>>
>>> Seeing your config file, and getting a sense of the snapshot file size
>>> will give us more insight.
>>>
>>> Patrick
>>>
>>>   Many, many thanks in advance for any insights or advice.
>>>>
>>>>
>>>> Example log output from the ZK server that spewed many WARNs:
>>>>
>>>> ===
>>>>
>>>> 2010-04-24 22:31:24,551 - WARN
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@318] - Exception when
>>>> following
>>>> the leader
>>>> java.net.SocketTimeoutException: Read timed out
>>>> at java.net.SocketInputStream.socketRead0(Native Method)
>>>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>> at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
>>>> at java.io.DataInputStream.readInt(DataInputStream.java:370)
>>>> at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
>>>> at
>>>>
>>>> org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:66)
>>>>
>>>> at
>>>>
>>>> org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:108)
>>>>
>>>> at
>>>> org.apache.zookeeper.server.quorum.Follower.readPacket(Follower.java:114)
>>>> at
>>>>
>>>> org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:193)
>>>>
>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:525)
>>>> 2010-04-24 22:31:24,552 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@436] - shutdown called
>>>> java.lang.Exception: shutdown Follower
>>>> at
>>>> org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:436)
>>>> at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:529)
>>>> 2010-04-24 22:31:24,552 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@514] - LOOKING
>>>>
>>>> ...
>>>>
>>>> 2010-04-24 22:31:27,257 - WARN
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@518]
>>>> - Exception causing close of session 0x0 due to java.io.IOException:
>>>> ZooKeeperServer not running
>>>> 2010-04-24 22:31:27,257 - INFO
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@857]
>>>> - closing session:0x0 NIOServerCnxn:
>>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:2181
>>>> remote=/127.0.0.1:44589]
>>>> 2010-04-24 22:31:27,705 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@706] -
>>>> Notification:
>>>> 3, 60129542145, 2, 1, LOOKING, LEADING, 3
>>>> 2010-04-24 22:31:27,705 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:QuorumPeer@523] - FOLLOWING
>>>> 2010-04-24 22:31:27,705 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:ZooKeeperServer@160] - Created server
>>>> 2010-04-24 22:31:27,706 - INFO
>>>> [QuorumPeer:/0:0:0:0:0:0:0:0:2181:Follower@147] - Following sum08/
>>>> 10.0.10.118:2888
>>>> 2010-04-24 22:31:29,040 - WARN
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@518]
>>>> - Exception causing close of session 0x0 due to java.io.IOException:
>>>> ZooKeeperServer not running
>>>> 2010-04-24 22:31:29,040 - INFO
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@857]
>>>> - closing session:0x0 NIOServerCnxn:
>>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:2181
>>>> remote=/127.0.0.1:44591]
>>>> 2010-04-24 22:31:31,069 - WARN
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@518]
>>>> - Exception causing close of session 0x0 due to java.io.IOException:
>>>> ZooKeeperServer not running
>>>> 2010-04-24 22:31:31,069 - INFO
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@857]
>>>> - closing session:0x0 NIOServerCnxn:
>>>> java.nio.channels.SocketChannel[connected local=/127.0.0.1:2181
>>>> remote=/127.0.0.1:44592]
>>>> 2010-04-24 22:31:32,589 - WARN
>>>> [NIOServerCxn.Factory:2181:NIOServerCnxn@518]
>>>> - Exception causing close of session 0x0 due to java.io.IOException:
>>>> ZooKeeperServer not running
>>>>
>>>> ===
>>>>
>>>> Example log output from a server that was pegging CPU:
>>>>
>>>> 2010-04-24 23:05:27,911 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>> 2010-04-24 23:05:27,912 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>> 2010-04-24 23:05:27,910 - WARN
>>>> [Thread-15:QuorumCnxManager$SendWorker@539]
>>>> - Interrupted while waiting for message on queue
>>>> java.lang.InterruptedException
>>>> at
>>>>
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.reportInterruptAfterWait(AbstractQueuedSynchronizer.java:1899)
>>>>
>>>> at
>>>>
>>>> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1934)
>>>>
>>>> at
>>>> java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:317)
>>>> at
>>>>
>>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:533)
>>>>
>>>> 2010-04-24 23:05:27,912 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>> 2010-04-24 23:05:27,912 - WARN
>>>> [Thread-16:QuorumCnxManager$RecvWorker@623]
>>>> - Connection broken:
>>>> java.nio.channels.AsynchronousCloseException
>>>> at
>>>>
>>>> java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
>>>>
>>>> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:263)
>>>> at
>>>>
>>>> org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)
>>>>
>>>> 2010-04-24 23:05:27,912 - WARN
>>>> [Thread-15:QuorumCnxManager$SendWorker@554]
>>>> - Send worker leaving thread
>>>>
>>>> ...
>>>>
>>>> 2010-04-24 23:05:51,952 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>> 2010-04-24 23:05:57,148 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>> 2010-04-24 23:06:03,243 - INFO [FollowerHandler-/10.0.10.116:34405
>>>> :FollowerHandler@227] - Follower sid: 1 : info :
>>>> org.apache.zookeeper.server.quorum.QuorumPeer$QuorumServer@364641
>>>> 2010-04-24 23:06:03,244 - WARN [FollowerHandler-/10.0.10.116:34405
>>>> :FollowerHandler@302] - Sending snapshot last zxid of peer is
>>>> 0xd0007d66d
>>>> zxid of leader is 0xf00000000
>>>> 2010-04-24 23:06:03,254 - ERROR [FollowerHandler-/10.0.10.116:34405
>>>> :FollowerHandler@415] - Unexpected exception causing shutdown while sock
>>>> still open
>>>> java.net.SocketException: Broken pipe
>>>> at java.net.SocketOutputStream.socketWrite0(Native Method)
>>>> at
>>>> java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:92)
>>>> at java.net.SocketOutputStream.write(SocketOutputStream.java:136)
>>>> at
>>>> java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
>>>> at java.io.BufferedOutputStream.write(BufferedOutputStream.java:109)
>>>> at java.io.DataOutputStream.writeLong(DataOutputStream.java:207)
>>>> at
>>>>
>>>> org.apache.jute.BinaryOutputArchive.writeLong(BinaryOutputArchive.java:59)
>>>>
>>>> at org.apache.zookeeper.server.DataNode.serialize(DataNode.java:129)
>>>> at
>>>>
>>>> org.apache.jute.BinaryOutputArchive.writeRecord(BinaryOutputArchive.java:123)
>>>>
>>>> at
>>>> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:917)
>>>> at
>>>> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:929)
>>>> at
>>>> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:929)
>>>> at
>>>> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:929)
>>>> at
>>>> org.apache.zookeeper.server.DataTree.serializeNode(DataTree.java:929)
>>>> at org.apache.zookeeper.server.DataTree.serialize(DataTree.java:979)
>>>> at
>>>>
>>>> org.apache.zookeeper.server.util.SerializeUtils.serializeSnapshot(SerializeUtils.java:104)
>>>>
>>>> at
>>>>
>>>> org.apache.zookeeper.server.ZooKeeperServer.serializeSnapshot(ZooKeeperServer.java:272)
>>>>
>>>> at
>>>>
>>>> org.apache.zookeeper.server.quorum.FollowerHandler.run(FollowerHandler.java:307)
>>>>
>>>> 2010-04-24 23:06:03,255 - WARN [FollowerHandler-/10.0.10.116:34405
>>>> :FollowerHandler@421] - ******* GOODBYE /10.0.10.116:34405 ********
>>>> 2010-04-24 23:06:03,550 - INFO [WorkerReceiver
>>>> Thread:FastLeaderElection$Messenger$WorkerReceiver@254] - Sending new
>>>> notification.
>>>>
>>>>
>
