hadoop-yarn-issues mailing list archives

From "lachisis (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-3795) ZKRMStateStore crashes due to IOException: Broken pipe
Date Thu, 11 Jun 2015 06:35:00 GMT

    [ https://issues.apache.org/jira/browse/YARN-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14581531#comment-14581531 ]

lachisis commented on YARN-3795:
--------------------------------

I found ZOOKEEPER-706. It means that if the ZooKeeper server receives a request whose body
is larger than 1M, the server rejects the request and the client gets a "Broken pipe" exception.
This limit exists to cap the data size of a znode.

Scanning the ZooKeeper snapshot, I did not find any znode created by ZKRMStateStore that
holds a large amount of data.
Then, analyzing the code, I found that a large number of Watchers are set when the functions
"loadRMAppState" and "loadApplicationAttemptState" are called.
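
To illustrate how a large number of watches could push a single request past the 1M limit
even when no znode is large, here is a rough sketch in Java. It is an estimate, not
ZooKeeper's actual code: it assumes the client re-registers all watched paths in one request
after reconnecting, and that each path is serialized as a 4-byte length followed by its UTF-8
bytes, after an 8-byte zxid and three 4-byte list counts. The class name, method name, and
znode paths are hypothetical and chosen only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class WatchPayloadEstimate {
    // Default of ZooKeeper's jute.maxbuffer setting (just under 1 MB).
    static final int DEFAULT_JUTE_MAX_BUFFER = 0xfffff;

    // Approximate body size of one request that re-registers all watches.
    // Assumed layout: 8-byte zxid, three 4-byte list counts, then each
    // path as a 4-byte length prefix plus its UTF-8 bytes.
    public static long estimateSetWatchesBytes(List<String> watchedPaths) {
        long size = 8 + 3 * 4;
        for (String path : watchedPaths) {
            size += 4 + path.getBytes(StandardCharsets.UTF_8).length;
        }
        return size;
    }

    public static void main(String[] args) {
        // Hypothetical znode paths, one watch per application.
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            paths.add(String.format(
                "/rmstore/ZKRMStateRoot/RMAppRoot/application_1433000000000_%05d", i));
        }
        long bytes = estimateSetWatchesBytes(paths);
        System.out.println("estimated request body: " + bytes + " bytes");
        System.out.println("exceeds default limit: " + (bytes > DEFAULT_JUTE_MAX_BUFFER));
    }
}
```

If an estimate like this comes out above the limit for the number of applications kept in the
state store, it would be consistent with the repeated disconnects right after
"Session restored" in the log below.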



> ZKRMStateStore crashes due to IOException: Broken pipe
> ------------------------------------------------------
>
>                 Key: YARN-3795
>                 URL: https://issues.apache.org/jira/browse/YARN-3795
>             Project: Hadoop YARN
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.5.0
>            Reporter: lachisis
>            Priority: Critical
>
> 2015-06-05 06:06:54,848 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to dap88/134.41.33.88:2181, initiating session
> 2015-06-05 06:06:54,876 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server dap88/134.41.33.88:2181, sessionid = 0x34db2f72ac50c86, negotiated timeout = 10000
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
ZKRMStateStore Session connected
> 2015-06-05 06:06:54,881 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
ZKRMStateStore Session restored
> 2015-06-05 06:06:54,881 WARN org.apache.zookeeper.ClientCnxn: Session 0x34db2f72ac50c86
for server dap88/134.41.33.88:2181, unexpected error, closing socket connection and attempting
reconnect
> java.io.IOException: Broken pipe
> 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
> 	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)
> 2015-06-05 06:06:54,986 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
Watcher event type: None with state:Disconnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:54,986 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
ZKRMStateStore Session disconnected
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection
to server dap87/134.41.33.87:2181. Will not attempt to authenticate using SASL (unknown error)
> 2015-06-05 06:06:55,278 INFO org.apache.zookeeper.ClientCnxn: Socket connection established
to dap87/134.41.33.87:2181, initiating session
> 2015-06-05 06:06:55,330 INFO org.apache.zookeeper.ClientCnxn: Session establishment complete
on server dap87/134.41.33.87:2181, sessionid = 0x34db2f72ac50c86, negotiated timeout = 10000
> 2015-06-05 06:06:55,343 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
Watcher event type: None with state:SyncConnected for path:null for Service org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore
in state org.apache.hadoop.yarn.server.resourcemanager.recovery.RMStateStore: STARTED
> 2015-06-05 06:06:55,343 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
ZKRMStateStore Session connected
> 2015-06-05 06:06:55,344 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore:
ZKRMStateStore Session restored
> 2015-06-05 06:06:55,345 WARN org.apache.zookeeper.ClientCnxn: Session 0x34db2f72ac50c86
for server dap87/134.41.33.87:2181, unexpected error, closing socket connection and attempting
reconnect
> java.io.IOException: Broken pipe
> 	at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
> 	at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
> 	at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:94)
> 	at sun.nio.ch.IOUtil.write(IOUtil.java:65)
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:450)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:117)
> 	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
> 	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1075)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
