zookeeper-user mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: Too many "Broken pipe" in ZooKeeper log
Date Wed, 17 Sep 2014 13:23:18 GMT
I haven't seen any issues with large HBase clusters; however, the HBase
community might be a better place to get more insight into that.

Patrick

On Wed, Sep 17, 2014 at 1:55 AM, tobe <tobeg3oogle@gmail.com> wrote:
> @patrick Thanks for your concern, and sorry for the trouble. The cause of
> my problem has nothing to do with ZooKeeper. It's TaoKeeper, which we use to
> monitor ZooKeeper and which sends these commands periodically. A large
> number of watches (more than 700K) may cause Broken pipe errors when using
> wchs to get the detailed status of znodes.
>
> The large number of znodes comes from HBase. We store all the files for
> replication in ZooKeeper, and their number grows large when replication
> falls behind. I'm wondering: if an HBase cluster scales to thousands of
> servers, can ZooKeeper handle all of HBase's metadata?
>
>
> On Wed, Sep 17, 2014 at 2:47 PM, tobe <tobeg3oogle@gmail.com> wrote:
>
>> @patrick The source ports of these commands are sequential: 48668, 48669,
>> 48670. So is this some periodic check by ZooKeeper itself? We mainly use
>> ZooKeeper for HDFS and HBase. No other process on this server sends
>> 4lw commands to ZooKeeper.
>>
>>
>> On Wed, Sep 17, 2014 at 2:19 PM, tobe <tobeg3oogle@gmail.com> wrote:
>>
>>>
>>> I have found that the commands are sent from the same host, which also
>>> runs `stat`, `wchs`, `wchc` and `cons`. Here is a log from another
>>> ZooKeeper cluster. It works well because it doesn't have many znodes.
>>> But why are the commands called so frequently? We're using
>>> ZooKeeper 3.4.4; is there any patch for this?
>>>
>>> Log from 10.0.4.161 ZooKeeper:
>>> 2014-09-17,08:26:38,336 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Processing stat command from /10.0.4.161:48668
>>> 2014-09-17,08:26:38,336 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Stat command output
>>> 2014-09-17,08:26:38,337 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Closed socket connection for client /10.0.4.161:48668 (no
>>> session established for client)
>>> 2014-09-17,08:26:38,398 INFO
>>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket
>>> connection from /10.0.4.172:62393
>>> 2014-09-17,08:26:38,398 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Client attempting to establish new session at /10.0.4.172:62393
>>> 2014-09-17,08:26:38,399 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Established session 0x4833538b30989d with negotiated timeout 30000
>>> for client /10.0.4.172:62393
>>> 2014-09-17,08:26:38,400 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Successfully authenticated client:
>>> authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;
>>> authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
>>> 2014-09-17,08:26:38,400 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Setting authorizedID: hbase_srv
>>> 2014-09-17,08:26:38,400 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] adding SASL authorization for authorizationID: hbase_srv
>>> 2014-09-17,08:26:38,403 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Closed socket connection for client /10.0.4.172:62393 which had
>>> sessionid 0x4833538b30989d
>>> 2014-09-17,08:26:38,526 INFO
>>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket
>>> connection from /10.0.4.161:48669
>>> 2014-09-17,08:26:38,526 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Processing wchs command from /10.0.4.161:48669
>>> 2014-09-17,08:26:38,527 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Closed socket connection for client /10.0.4.161:48669 (no
>>> session established for client)
>>> 2014-09-17,08:26:38,587 INFO
>>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket
>>> connection from /10.0.4.171:50026
>>> 2014-09-17,08:26:38,588 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Client attempting to establish new session at /10.0.4.171:50026
>>> 2014-09-17,08:26:38,588 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Established session 0x4833538b30989e with negotiated timeout 30000
>>> for client /10.0.4.171:50026
>>> 2014-09-17,08:26:38,589 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Successfully authenticated client:
>>> authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;
>>> authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
>>> 2014-09-17,08:26:38,589 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Setting authorizedID: hbase_srv
>>> 2014-09-17,08:26:38,589 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] adding SASL authorization for authorizationID: hbase_srv
>>> 2014-09-17,08:26:38,592 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Closed socket connection for client /10.0.4.171:50026 which had
>>> sessionid 0x4833538b30989e
>>> 2014-09-17,08:26:38,613 INFO
>>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket
>>> connection from /10.0.4.172:62397
>>> 2014-09-17,08:26:38,614 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Client attempting to establish new session at /10.0.4.172:62397
>>> 2014-09-17,08:26:38,614 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] Established session 0x4833538b30989f with negotiated timeout 30000
>>> for client /10.0.4.172:62397
>>> 2014-09-17,08:26:38,615 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Successfully authenticated client:
>>> authenticationID=hbase_srv/hadoop@XIAOMI.HADOOP;
>>> authorizationID=hbase_srv/hadoop@XIAOMI.HADOOP.
>>> 2014-09-17,08:26:38,615 INFO
>>> org.apache.zookeeper.server.auth.SaslServerCallbackHandler: [myid:0]
>>> Setting authorizedID: hbase_srv
>>> 2014-09-17,08:26:38,615 INFO org.apache.zookeeper.server.ZooKeeperServer:
>>> [myid:0] adding SASL authorization for authorizationID: hbase_srv
>>> 2014-09-17,08:26:38,618 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Closed socket connection for client /10.0.4.172:62397 which had
>>> sessionid 0x4833538b30989f
>>> 2014-09-17,08:26:38,707 INFO
>>> org.apache.zookeeper.server.NIOServerCnxnFactory: [myid:0] Accepted socket
>>> connection from /10.0.4.161:48670
>>> 2014-09-17,08:26:38,708 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>> [myid:0] Processing wchc command from /10.0.4.161:48670
>>>
>>>
>>> On Wed, Sep 17, 2014 at 1:00 PM, Patrick Hunt <phunt@apache.org> wrote:
>>>
>>>> Looks like a client is calling the "dump" 4lw and not waiting for the
>>>> results before closing the socket. Try tracking down where that
>>>> command is called from. The logs should have something like this in
>>>> it:
>>>>
>>>> 2014-09-16 21:59:11,407 [myid:1] - INFO
>>>> [NIOWorkerThread-2:NIOServerCnxn@835] - Processing dump command from
>>>> /127.0.0.1:51740
>>>>
>>>> (note the ip address)
>>>>
>>>> Patrick
>>>>
>>>> On Tue, Sep 16, 2014 at 7:57 PM, tobe <tobeg3oogle@gmail.com> wrote:
>>>> > Can anyone help explain why I get so many "Broken pipe" IOExceptions in
>>>> > the ZooKeeper log?
>>>> >
>>>> > ZooKeeper throws this exception almost every minute. I don't think we
>>>> > use the four-letter command that calls dumpWatches that frequently. So
>>>> > what does this mean?
>>>> >
>>>> > 2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn:
>>>> > [myid:0] Error sending data synchronously
>>>> > java.io.IOException: Broken pipe
>>>> >         at sun.nio.ch.FileDispatcher.write0(Native Method)
>>>> >         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>>>> >         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
>>>> >         at sun.nio.ch.IOUtil.write(IOUtil.java:40)
>>>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
>>>> >         at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
>>>> >         at java.io.BufferedWriter.write(BufferedWriter.java:212)
>>>> >         at java.io.PrintWriter.write(PrintWriter.java:412)
>>>> >         at java.io.PrintWriter.write(PrintWriter.java:429)
>>>> >         at java.io.PrintWriter.print(PrintWriter.java:559)
>>>> >         at java.io.PrintWriter.println(PrintWriter.java:695)
>>>> >         at org.apache.zookeeper.server.WatchManager.dumpWatches(WatchManager.java:166)
>>>> >         at org.apache.zookeeper.server.DataTree.dumpWatches(DataTree.java:1240)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$WatchCommand.commandRun(NIOServerCnxn.java:722)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:496)
>>>> > 2014-09-17,10:52:09,179 INFO org.apache.zookeeper.server.NIOServerCnxn:
>>>> > [myid:0] Closed socket connection for client /10.20.201.234:53756 which had
>>>> > sessionid 0x34840357f664081
>>>> > 2014-09-17,10:52:09,179 ERROR org.apache.zookeeper.server.NIOServerCnxn:
>>>> > [myid:0] Error sending data synchronously
>>>> > java.io.IOException: Broken pipe
>>>> >         at sun.nio.ch.FileDispatcher.write0(Native Method)
>>>> >         at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:29)
>>>> >         at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:69)
>>>> >         at sun.nio.ch.IOUtil.write(IOUtil.java:40)
>>>> >         at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:336)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn.sendBufferSync(NIOServerCnxn.java:138)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.checkFlush(NIOServerCnxn.java:453)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$SendBufferWriter.write(NIOServerCnxn.java:474)
>>>> >         at java.io.BufferedWriter.flushBuffer(BufferedWriter.java:111)
>>>> >         at java.io.BufferedWriter.flush(BufferedWriter.java:235)
>>>> >         at java.io.PrintWriter.flush(PrintWriter.java:276)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn.cleanupWriterSocket(NIOServerCnxn.java:424)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn.access$000(NIOServerCnxn.java:60)
>>>> >         at org.apache.zookeeper.server.NIOServerCnxn$CommandThread.run(NIOServerCnxn.java:500)
>>>>
>>>
>>>
>>
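
Patrick's suggestion in this thread, finding the sender from the client address in the `Processing ... command from` lines, can be automated by counting 4lw commands per client IP. The following is a minimal Java sketch, assuming the ZooKeeper 3.4.x `NIOServerCnxn` log format shown in this thread and assuming each log entry sits on a single line in the real log file (the wrapping in the quoted logs comes from the mail client). The class `FourLetterWordCounter` and its methods are hypothetical, not part of ZooKeeper.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: counts four-letter-word (4lw) commands per client IP by
 * scanning ZooKeeper 3.4.x server log lines such as
 * "[myid:0] Processing wchc command from /10.0.4.161:48670".
 * Assumes one log entry per line.
 */
public class FourLetterWordCounter {

    // Captures the command name and the client IP. The source port is ignored
    // because every 4lw request uses a new connection (and usually a new port).
    private static final Pattern CMD =
            Pattern.compile("Processing (\\w{4}) command from /([0-9.]+):\\d+");

    /** Returns a map from "ip command" to occurrence count, sorted by key. */
    public static Map<String, Integer> count(Iterable<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = CMD.matcher(line);
            if (m.find()) {
                counts.merge(m.group(2) + " " + m.group(1), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a log file argument, scan it; otherwise scan three lines from this thread.
        // ISO-8859-1 decodes any byte sequence, so stray bytes cannot abort the scan.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : Arrays.asList(
                        "[myid:0] Processing stat command from /10.0.4.161:48668",
                        "[myid:0] Processing wchs command from /10.0.4.161:48669",
                        "[myid:0] Processing wchc command from /10.0.4.161:48670");
        count(lines).forEach((key, n) -> System.out.println(key + " " + n));
    }
}
```

Run without arguments, it prints `10.0.4.161 stat 1`, `10.0.4.161 wchc 1` and `10.0.4.161 wchs 1`, one per line. Run against a full server log, a single IP issuing `stat`, `wchs`, `wchc` and `cons` at a steady rate points to an external monitoring agent (TaoKeeper, in this thread) rather than to ZooKeeper itself.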

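The root cause in this thread is a client that sends a 4lw command and closes the socket before the server has finished writing the reply; the server then hits `Broken pipe` while still streaming the watch dump (the `WatchManager.dumpWatches` frames in the trace). As the logs show, a 3.4.x server closes the connection itself once the command completes, so a well-behaved client can simply read until end-of-stream. The following is a minimal Java sketch of such a client under those assumptions; it is not TaoKeeper's or ZooKeeper's code, and the class name `FourLetterWordClient`, the 30-second timeout and the `localhost:2181` defaults are illustrative choices.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Hypothetical 4lw client that drains the whole reply before closing. */
public class FourLetterWordClient {

    /**
     * Sends a four-letter-word command (for example "stat" or "wchc") and
     * returns the complete reply. The socket is closed only after the server
     * has closed its side (or a read times out).
     */
    public static String send(String host, int port, String cmd, int timeoutMs)
            throws IOException {
        if (cmd.length() != 4) {
            throw new IllegalArgumentException("4lw commands have exactly 4 characters: " + cmd);
        }
        try (Socket sock = new Socket()) {
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            // SO_TIMEOUT bounds each blocking read, not the whole transfer, so a
            // large dump that keeps streaming will not trip it.
            sock.setSoTimeout(timeoutMs);

            OutputStream out = sock.getOutputStream();
            out.write(cmd.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Drain until read() returns -1, i.e. the server has closed the connection.
            InputStream in = sock.getInputStream();
            ByteArrayOutputStream reply = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                reply.write(chunk, 0, n);
            }
            return new String(reply.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        // Defaults are assumptions; pass host, port and command to override.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 2181;
        String cmd = args.length > 2 ? args[2] : "stat";
        System.out.print(send(host, port, cmd, 30_000));
    }
}
```

The read timeout still matters: with more than 700K watches a `wchc` dump is large, and a monitor that gives up early closes the socket and reproduces the same server-side error. A generous timeout, or polling the much smaller `wchs` summary instead of `wchc`, keeps the client from hanging up mid-reply.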