hadoop-hdfs-issues mailing list archives

From "Yongjun Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3342) SocketTimeoutException in BlockSender.sendChunks could have a better error message
Date Sun, 05 Oct 2014 18:00:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14159609#comment-14159609
] 

Yongjun Zhang commented on HDFS-3342:
-------------------------------------

Hi [~tlipcon],

The reason I'm looking at this issue is that it still happens in recent use and confuses
users. Would you please help review the patch? Thanks a lot.

I was able to reproduce the issue and see the log:
{code}
14/10/04 21:12:04 INFO datanode.DataNode: Failed to send data: java.net.SocketTimeoutException:
480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
14/10/04 21:12:04 WARN datanode.DataNode: DatanodeRegistration(172.17.186.17, datanodeUuid=95f0a627-b010-453b-a432-c147d012c814,
infoPort=42075, ipcPort=42022, storageInfo=lv=-56;cid=CID-13a9b341-3a15-405e-8d07-a719ec9be2ac;nsid=1866275128;c=0):Got
exception while serving BP-326257059-172.17.186.17-1412481724026:blk_1073741825_1001 to /172.17.186.17:60227
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready
for write. ch : java.nio.channels.SocketChannel[connected local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:724)
14/10/04 21:12:04 ERROR datanode.DataNode: haus03.sjc.cloudera.com:42010:DataXceiver error
processing READ_BLOCK operation  src: /172.17.186.17:60227 dst: /172.17.186.17:42010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready
for write. ch : java.nio.channels.SocketChannel[connected local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:547)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:716)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:486)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:111)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:69)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:225)
        at java.lang.Thread.run(Thread.java:724)
{code}

I found that the top portion 
{code}
14/10/04 21:12:04 INFO datanode.DataNode: Failed to send data: java.net.SocketTimeoutException:
480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:60227]
{code} 
was introduced by HDFS-3555 for the same issue. But the fix there still throws an exception,
which is not handled, so we are still seeing the reported error.

I'm submitting a patch that changes the output to:
{code}
14/10/05 10:56:57 INFO datanode.DataNode: Failed to send data: java.net.SocketTimeoutException:
480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:41933]
14/10/05 10:56:57 WARN datanode.DataNode: DatanodeRegistration(172.17.186.17, datanodeUuid=2a87010c-c9fe-4b4e-a249-0d8bf11a8f41,
infoPort=42075, ipcPort=42022, storageInfo=lv=-56;cid=CID-ba2f8c8b-7e49-4514-b74c-201c1e9508ad;nsid=1860548702;c=0):Got
exception while serving BP-269685814-172.17.186.17-1412528857362:blk_1073741825_1001 to /172.17.186.17:41933
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready
for write. ch : java.nio.channels.SocketChannel[connected local=/172.17.186.17:42010 remote=/172.17.186.17:41933]
        at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
        at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:550)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:730)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:677)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:490)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:116)
        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:71)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:226)
        at java.lang.Thread.run(Thread.java:724)
14/10/05 10:56:57 INFO datanode.DataNode: Likely the client has stopped reading, disconnecting
it (haus03.sjc.cloudera.com:42010:DataXceiver error processing READ_BLOCK operation  src:
/172.17.186.17:41933 dst: /172.17.186.17:42010; java.net.SocketTimeoutException: 480000 millis
timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected
local=/172.17.186.17:42010 remote=/172.17.186.17:41933])
{code}
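The gist of the change can be sketched as follows. This is only an illustrative standalone sketch, not the actual HDFS-3342 diff; the class and method names here are made up for demonstration (the real handling lives in org.apache.hadoop.hdfs.server.datanode.DataXceiver). The idea is that a SocketTimeoutException while serving a read usually means the client stopped calling read(), so it is downgraded to an INFO-level "client has stopped reading" message instead of escaping as an unhandled ERROR:
{code}
import java.net.SocketTimeoutException;

// Illustrative sketch of the classification logic described above; the names
// ReadBlockErrorHandling and messageFor are hypothetical, not from the patch.
public class ReadBlockErrorHandling {

    /** Build the log message for an exception thrown while serving a read. */
    static String messageFor(Exception e, String src, String dst) {
        if (e instanceof SocketTimeoutException) {
            // Client stopped reading: not a DataNode fault, so log at INFO
            // and disconnect instead of surfacing an ERROR stack trace.
            return "Likely the client has stopped reading, disconnecting it ("
                + "src: " + src + " dst: " + dst + "; " + e + ")";
        }
        // Any other failure is still treated as a genuine xceiver error.
        return "DataXceiver error processing READ_BLOCK operation  src: "
            + src + " dst: " + dst;
    }

    public static void main(String[] args) {
        Exception timeout = new SocketTimeoutException(
            "480000 millis timeout while waiting for channel to be ready for write");
        System.out.println(messageFor(timeout,
            "/172.17.186.17:41933", "/172.17.186.17:42010"));
    }
}
{code}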


> SocketTimeoutException in BlockSender.sendChunks could have a better error message
> ----------------------------------------------------------------------------------
>
>                 Key: HDFS-3342
>                 URL: https://issues.apache.org/jira/browse/HDFS-3342
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.0.0-alpha
>            Reporter: Todd Lipcon
>            Assignee: Yongjun Zhang
>            Priority: Minor
>
> Currently, if a client connects to a DN and begins to read a block, but then stops calling
read() for a long period of time, the DN will log a SocketTimeoutException "480000 millis
timeout while waiting for channel to be ready for write." This is because there is no "keepalive"
functionality of any kind. At a minimum, we should improve this error message to be an INFO
level log which just says that the client likely stopped reading, so disconnecting it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
