hadoop-hdfs-issues mailing list archives

From "Weiwei Yang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-11845) Ozone: Output error when DN handshakes with SCM
Date Fri, 09 Jun 2017 07:41:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-11845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16044094#comment-16044094 ]

Weiwei Yang commented on HDFS-11845:
------------------------------------

This issue occurs because the RPC timeout is too small (100 ms); the first RPC call cannot
complete within 100 ms on my cluster. Printing the stack trace shows the following error on the
client side, in {{StorageContainerDatanodeProtocolClientSideTranslatorPB}}:

{noformat}
com.google.protobuf.ServiceException: java.net.SocketTimeoutException: Call From ozone1.fyre.ibm.com/172.16.165.133 to ozone1.fyre.ibm.com:9861 failed on socket timeout exception: java.net.SocketTimeoutException: 100 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.165.133:40202 remote=ozone1.fyre.ibm.com/172.16.165.133:9861]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:241)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:115)
	at com.sun.proxy.$Proxy76.getVersion(Unknown Source)
	at org.apache.hadoop.ozone.protocolPB.StorageContainerDatanodeProtocolClientSideTranslatorPB.getVersion(StorageContainerDatanodeProtocolClientSideTranslatorPB.java:108)
	at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:52)
	at org.apache.hadoop.ozone.container.common.states.endpoint.VersionEndpointTask.call(VersionEndpointTask.java:30)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
{noformat}

The client gave up and closed the connection before the SCM could write its response, which
caused the warning and the output error on the server side. Increasing the RPC timeout from
100 ms to 1000 ms fixed the issue. I think we should increase the default value of
{{OZONE_SCM_HEARTBEAT_RPC_TIMEOUT}}; {{100ms}} is just too aggressive. I have uploaded a patch
to fix this.
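
To illustrate the failure mode, here is a minimal sketch using plain {{java.net}} sockets rather than Hadoop's RPC layer. The class {{RpcTimeoutSketch}}, its methods, and the 300 ms server delay are all hypothetical and chosen only for illustration; the sketch assumes a "server" that takes longer than 100 ms to answer the first call, which is what I observed on my cluster.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class RpcTimeoutSketch {

    // Hypothetical stand-in for a slow first RPC: accepts one connection,
    // waits delayMillis, then writes a single byte as the "response".
    static Thread startSlowServer(ServerSocket server, long delayMillis) {
        Thread t = new Thread(() -> {
            try (Socket s = server.accept()) {
                Thread.sleep(delayMillis);
                OutputStream out = s.getOutputStream();
                out.write(1);
                out.flush();
            } catch (IOException | InterruptedException e) {
                // The client may already have timed out and closed its socket,
                // in which case the write can fail on the server side.
            }
        });
        t.setDaemon(true);
        t.start();
        return t;
    }

    // Returns true if the response arrived within timeoutMillis,
    // false if the read timed out first.
    static boolean callWithTimeout(long serverDelayMillis, int timeoutMillis)
            throws IOException {
        try (ServerSocket server =
                 new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            startSlowServer(server, serverDelayMillis);
            try (Socket client =
                     new Socket(server.getInetAddress(), server.getLocalPort())) {
                // Read timeout, analogous in spirit to the RPC timeout.
                client.setSoTimeout(timeoutMillis);
                InputStream in = client.getInputStream();
                return in.read() == 1;
            } catch (SocketTimeoutException e) {
                return false;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed server delay of 300 ms, for illustration only.
        System.out.println("100 ms timeout succeeded:  " + callWithTimeout(300, 100));
        System.out.println("1000 ms timeout succeeded: " + callWithTimeout(300, 1000));
    }
}
```

With the assumed 300 ms delay, the 100 ms read times out and the client closes its socket while the server is still working, so the server's later write goes to a connection the client has already abandoned. The 1000 ms read leaves enough headroom for the response to arrive.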

> Ozone: Output error when DN handshakes with SCM
> -----------------------------------------------
>
>                 Key: HDFS-11845
>                 URL: https://issues.apache.org/jira/browse/HDFS-11845
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: ozone
>            Reporter: Weiwei Yang
>            Assignee: Weiwei Yang
>            Priority: Minor
>
> When starting SCM and DN, there is always an error in the SCM log:
> {noformat}
> 17/05/17 15:19:59 WARN ipc.Server: IPC Server handler 9 on 9861, call Call#4 Retry#0 org.apache.hadoop.ozone.protocol.StorageContainerDatanodeProtocol.getVersion from 172.16.165.133:44824: output error
> 17/05/17 15:19:59 INFO ipc.Server: IPC Server handler 9 on 9861 caught an exception
> java.nio.channels.ClosedChannelException
> 	at sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:270)
> 	at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:461)
> 	at org.apache.hadoop.ipc.Server.channelWrite(Server.java:3216)
> 	at org.apache.hadoop.ipc.Server.access$1600(Server.java:135)
> 	at org.apache.hadoop.ipc.Server$Responder.processResponse(Server.java:1463)
> 	at org.apache.hadoop.ipc.Server$Responder.doRespond(Server.java:1533)
> 	at org.apache.hadoop.ipc.Server$Connection.sendResponse(Server.java:2581)
> 	at org.apache.hadoop.ipc.Server$Connection.access$300(Server.java:1605)
> 	at org.apache.hadoop.ipc.Server$RpcCall.doResponse(Server.java:931)
> 	at org.apache.hadoop.ipc.Server$Call.sendResponse(Server.java:765)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:876)
> 	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:813)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1965)
> 	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2659)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-help@hadoop.apache.org

