hadoop-hdfs-issues mailing list archives

From "Uma Maheswara Rao G (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-693) java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write exceptions were cast when trying to read file via StreamFile.
Date Thu, 03 Mar 2011 12:51:36 GMT

    [ https://issues.apache.org/jira/browse/HDFS-693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13001979#comment-13001979 ]

Uma Maheswara Rao G commented on HDFS-693:
------------------------------------------

In our observation, this issue appeared during long runs with a huge number of blocks on the
Data Nodes. Every hour, each Data Node sends its block report to the Name Node. When a Data
Node holds a huge number of blocks (our setup: 3 Data Nodes with 2 GB RAM, a Scribe server
sending logs at 5000 records/s, 4 Scribe clients, and a 64 MB block size), scanning all the
blocks takes a considerable amount of time and generates a lot of I/O. If a write request
arrives during this time, it can take a long time to get a free I/O channel on the Data Node.
As a result, during the block scan a Data Node may be unable to acknowledge client requests
in time, causing timeouts on the client sockets.
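A minimal Java sketch of the client-side symptom, assuming a peer that has accepted the connection but, like a Data Node stuck in a long block scan, never reads. The writer fills the socket buffers with non-blocking writes, then waits on a java.nio Selector for OP_WRITE with a bounded timeout and throws a SocketTimeoutException whose wording mimics the subject line. The class and method names are hypothetical, the 200 ms timeout stands in for the 480000 ms (8 minutes) in the report, and this is an illustration of the mechanism, not Hadoop's own code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

public class WriteTimeoutDemo {

    /** Fills the send path, then waits at most timeoutMs for the channel to become writable. */
    static void writeWithTimeout(SocketChannel ch, long timeoutMs) throws IOException {
        ch.configureBlocking(false);
        ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
        // A non-blocking write returns 0 once the local and remote socket buffers are full.
        while (true) {
            chunk.clear();
            if (ch.write(chunk) == 0) {
                break;
            }
        }
        try (Selector selector = Selector.open()) {
            ch.register(selector, SelectionKey.OP_WRITE);
            // select(timeout) returns the number of ready keys; 0 means the wait expired.
            if (selector.select(timeoutMs) == 0) {
                throw new SocketTimeoutException(
                    timeoutMs + " millis timeout while waiting for channel to be ready for write");
            }
        }
    }

    /** Connects to a hypothetical "busy" peer that never reads and returns the outcome. */
    public static String run(long timeoutMs) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket busyPeer = new ServerSocket(0, 1, lo);
             SocketChannel client = SocketChannel.open(
                 new InetSocketAddress(lo, busyPeer.getLocalPort()));
             Socket ignored = busyPeer.accept()) {  // accepted, but never read from
            writeWithTimeout(client, timeoutMs);
            return "channel became writable";
        } catch (SocketTimeoutException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run(200));
    }
}
```

Running main should print the timeout message after roughly 200 ms. The short timeout only keeps the demo fast; a writer waiting 480000 ms on an unresponsive peer fails the same way, just later.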
If DN1 sends data to DN2 for replication while DN2 is doing its block scan, DN2 may be too
busy to send the ack back to DN1 in time, so timeouts can happen here as well.
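A minimal sketch of the DN1/DN2 case, assuming a simplified pipeline where DN1 waits for a single ack byte from DN2 with a bounded read timeout (Socket.setSoTimeout). The one-byte "ack" and all names here are hypothetical simplifications; the real HDFS pipeline ack protocol carries more than one byte. When the simulated DN2 is "busy" and never replies, DN1's read times out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class AckTimeoutDemo {

    /** Returns true if an ack byte arrived within timeoutMs, false on timeout. */
    static boolean awaitAck(Socket upstream, int timeoutMs) throws IOException {
        upstream.setSoTimeout(timeoutMs);  // bound every blocking read on this socket
        InputStream in = upstream.getInputStream();
        try {
            return in.read() != -1;  // blocks until a byte arrives, EOF, or timeout
        } catch (SocketTimeoutException e) {
            return false;
        }
    }

    /** dn2Acks controls whether the simulated downstream node replies at all. */
    public static boolean simulate(boolean dn2Acks, int timeoutMs) {
        InetAddress lo = InetAddress.getLoopbackAddress();
        try (ServerSocket dn2 = new ServerSocket(0, 1, lo);
             Socket dn1 = new Socket(lo, dn2.getLocalPort());
             Socket dn2Side = dn2.accept()) {
            if (dn2Acks) {
                dn2Side.getOutputStream().write(0);  // hypothetical 1-byte ack
                dn2Side.getOutputStream().flush();
            }
            // A "busy" dn2 never writes, so dn1's read hits the timeout.
            return awaitAck(dn1, timeoutMs);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ack received: " + simulate(true, 200));   // prints "ack received: true"
        System.out.println("ack received: " + simulate(false, 200));  // prints "ack received: false"
    }
}
```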


> java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write exceptions were cast when trying to read file via StreamFile.
> --------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-693
>                 URL: https://issues.apache.org/jira/browse/HDFS-693
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 0.20.1
>            Reporter: Yajun Dong
>         Attachments: HDFS-693.log
>
>
> To exclude the case of a network problem, I checked the count of dataXceiver threads, which
> was about 30. Also, the output of netstat -a | grep 50075 showed many connections in
> TIME_WAIT status when this happened.
> Partial log in attachment.
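The netstat check in the description can also be scripted. A minimal sketch, assuming the common Linux `netstat -an` column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); unlike a plain `grep 50075`, it only counts lines whose address column ends in the port and whose state is TIME_WAIT. The class name and the sample lines are hypothetical and are not taken from the attached HDFS-693.log.

```java
import java.util.List;

public class TimeWaitCounter {

    /** Counts netstat lines in TIME_WAIT whose local or foreign address ends in ":port". */
    public static long countTimeWait(List<String> netstatLines, int port) {
        String suffix = ":" + port;
        return netstatLines.stream()
            .map(String::trim)
            .map(line -> line.split("\\s+"))
            .filter(cols -> cols.length >= 6)                 // skip short or header-like lines
            .filter(cols -> cols[5].equals("TIME_WAIT"))      // state column
            .filter(cols -> cols[3].endsWith(suffix) || cols[4].endsWith(suffix))
            .count();
    }

    public static void main(String[] args) {
        // Illustrative sample only.
        List<String> sample = List.of(
            "Proto Recv-Q Send-Q Local Address Foreign Address State",
            "tcp        0      0 10.0.0.5:50075   10.0.0.9:41234   TIME_WAIT",
            "tcp        0      0 10.0.0.5:50075   10.0.0.9:41236   ESTABLISHED",
            "tcp        0      0 10.0.0.5:50075   10.0.0.8:52001   TIME_WAIT",
            "tcp        0      0 10.0.0.5:22      10.0.0.2:51000   TIME_WAIT");
        System.out.println(countTimeWait(sample, 50075));  // prints 2
    }
}
```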

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
