hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5286) DFS client blocked for a long time reading blocks of a file on the JobTracker
Date Fri, 20 Feb 2009 20:07:02 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675458#action_12675458 ]

Hairong Kuang commented on HADOOP-5286:
---------------------------------------

I was wrong about the JobTracker reading the split file multiple times. The retries are done by the
DFSClient to recover from read errors. The DataNode does have a write timeout, introduced by
HADOOP-2346; the default timeout is 8 minutes. In case of a read failure, the DFSClient retries on
3 datanodes, which are different ones if different ones are available. With HADOOP-3831, each
datanode is read 2 times, so the DFSClient retries 6 times before it declares a read failure. In
this case, it seems that the 5th or 6th retry succeeded, but the read took nearly 1 hour.
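
The arithmetic behind those numbers can be sketched roughly as follows. This is only a
back-of-the-envelope model, not DFSClient code: the class and method names are hypothetical, and it
assumes that every failed attempt stalls for one full 8-minute timeout before the client moves on,
which the logs have not yet confirmed.

```java
import java.time.Duration;

// Back-of-the-envelope model of the retry budget described in this comment.
// All names are hypothetical; this is NOT the actual DFSClient implementation.
class ReadRetryBudget {

    // Total read attempts = datanodes tried x reads per datanode.
    static int totalAttempts(int datanodesTried, int readsPerDatanode) {
        return datanodesTried * readsPerDatanode;
    }

    // Upper bound on time spent blocked, ASSUMING each failed attempt
    // stalls for one full timeout before the next attempt starts.
    static Duration worstCaseBlocked(int failedAttempts, Duration timeoutPerAttempt) {
        return timeoutPerAttempt.multipliedBy(failedAttempts);
    }

    public static void main(String[] args) {
        // 3 datanodes, each read 2 times (numbers taken from the comment).
        int attempts = totalAttempts(3, 2);
        // Default write timeout of 8 minutes (number taken from the comment).
        Duration timeout = Duration.ofMinutes(8);

        System.out.println("attempts before a read failure: " + attempts);
        System.out.println("minutes blocked if 5 attempts stall: "
                + worstCaseBlocked(5, timeout).toMinutes());
        System.out.println("minutes blocked if all attempts stall: "
                + worstCaseBlocked(attempts, timeout).toMinutes());
    }
}
```

Under these assumptions, five stalled attempts add up to 40 minutes and six to 48 minutes, which is
in the same range as the "nearly 1 hour" read. Whether that is what actually happened can only be
confirmed from the datanode logs.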

Hemanth, could you please provide the related datanode logs so we can see what really happened there?


> DFS client blocked for a long time reading blocks of a file on the JobTracker
> -----------------------------------------------------------------------------
>
>                 Key: HADOOP-5286
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5286
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.20.0
>            Reporter: Hemanth Yamijala
>         Attachments: jt-log-for-blocked-reads.txt
>
>
> On a large cluster, we've observed that a DFS client was blocked reading a block of
> a file for almost one and a half hours. The file was being read by the JobTracker of the cluster,
> and was a split file of a job. In the NameNode logs, we observed the following message for the block:
> Inconsistent size for block blk_2044238107768440002_840946 reported from <ip>:<port>
> current size is 195072 reported size is 1318567
> Details follow.
>  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

