hadoop-hdfs-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-1950) Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
Date Mon, 23 May 2011 15:01:49 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13037981#comment-13037981 ]

Hadoop QA commented on HDFS-1950:

-1 overall.  Here are the results of testing the latest attachment 
  against trunk revision 1126312.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/hudson/job/PreCommit-HDFS-Build/612//console

This message is automatically generated.

> Blocks that are under construction are not getting read if the blocks are more than 10. Only complete blocks are read properly.
> --------------------------------------------------------------------------------------------------------------------------------
>                 Key: HDFS-1950
>                 URL: https://issues.apache.org/jira/browse/HDFS-1950
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: hdfs client, name-node
>    Affects Versions: 0.20-append
>            Reporter: ramkrishna.s.vasudevan
>             Fix For: 0.20-append
>         Attachments: HDFS-1950-2.patch
> Before going to the root cause, let's look at the read behavior for a file with more than 10 blocks in the append case.
> Logic:
> ====
> DFSInputStream has a prefetch size, dfs.read.prefetch.size, which by default covers 10 blocks.
> This prefetch size determines how many blocks the client fetches from the namenode at a time when reading a file.
> For example, assume that a file X with 22 blocks resides in HDFS.
> The reader first fetches the first 10 blocks from the namenode and starts reading.
> After that step, the reader fetches the next 10 blocks from the NN and continues reading.
> Then the reader fetches the remaining 2 blocks from the NN and completes the read.
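
The batching described above can be sketched in a few lines of Java. This is only an illustration of the arithmetic, not DFSClient code: the class name PrefetchSketch and the method fetchSizes are hypothetical, and the prefetch window is assumed to be expressed as a number of blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not code from DFSClient.
class PrefetchSketch {

    // Returns how many blocks each round trip to the namenode yields when a
    // file of totalBlocks is read with a window of prefetchBlocks blocks.
    static List<Integer> fetchSizes(int totalBlocks, int prefetchBlocks) {
        if (prefetchBlocks <= 0) {
            throw new IllegalArgumentException("prefetchBlocks must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int start = 0; start < totalBlocks; start += prefetchBlocks) {
            // The final fetch may be shorter than the window.
            sizes.add(Math.min(prefetchBlocks, totalBlocks - start));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // The 22-block file X from the description: fetches of 10, 10 and 2.
        System.out.println(fetchSizes(22, 10));
    }
}
```

For the file X in the example, fetchSizes(22, 10) gives three fetches of 10, 10 and 2 blocks.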
> Cause:
> =======
> Now let's look at the cause of this issue.
> The scenario that fails is: "The writer wrote 10+ blocks plus a partial block and called sync. A reader trying to read the file will not get the last partial block."
> The client first gets the first 10 block locations from the NN. It then checks whether the file is under construction and, if so, gets the size of the last partial block from the datanode and reads the full file.
> However, when the number of blocks is more than 10, the last block will not be in the first fetch. It will be in the second or a later fetch (the last block is in the ceil(num of blocks / 10)th fetch).
> The problem is that DFSClient has no logic to get the size of the last partial block (as is done for the first fetch) in any fetch other than the first, so the reader cannot read all of the synced data.
> Also, the InputStream.available API uses the first fetched block size to iterate. Ideally this size has to be increased.
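
The failure scenario can be modelled with a small sketch. Everything here is a simplified assumption for illustration: the class LastBlockSketch and its methods are hypothetical, the file is taken to be some number of complete blocks followed by one partial block, and a reader that never asks the datanode for the partial block's length is assumed to see none of its bytes. It is not the DFSClient implementation or the attached patch.

```java
// Hypothetical model of the reported behavior; not DFSClient code.
class LastBlockSketch {

    // 1-based index of the fetch that returns the last block:
    // ceil(totalBlocks / prefetchBlocks), computed with integer arithmetic.
    static int fetchContainingLastBlock(int totalBlocks, int prefetchBlocks) {
        return (totalBlocks + prefetchBlocks - 1) / prefetchBlocks;
    }

    // Bytes a reader can see for a file made of completeBlocks full blocks
    // plus one partial block holding syncedPartialBytes.
    // refreshOnEveryFetch = false models the reported bug (partial block
    // length looked up only during the first fetch); true models a reader
    // that looks it up in whichever fetch returns the last block.
    static long readableBytes(int completeBlocks, long blockSize,
                              long syncedPartialBytes, int prefetchBlocks,
                              boolean refreshOnEveryFetch) {
        int totalBlocks = completeBlocks + 1; // complete blocks + partial block
        int lastFetch = fetchContainingLastBlock(totalBlocks, prefetchBlocks);
        boolean partialLengthKnown = refreshOnEveryFetch || lastFetch == 1;
        long completeBytes = completeBlocks * blockSize;
        return completeBytes + (partialLengthKnown ? syncedPartialBytes : 0L);
    }

    public static void main(String[] args) {
        // 10 complete blocks + 1 partial block: the last block is in fetch 2.
        System.out.println(fetchContainingLastBlock(11, 10));
        System.out.println(readableBytes(10, 64L, 5L, 10, false)); // buggy reader
        System.out.println(readableBytes(10, 64L, 5L, 10, true));  // refreshed on every fetch
    }
}
```

With 5 complete blocks the partial block arrives in the first fetch and both variants agree; with 10 or more complete blocks only the variant that refreshes on every fetch sees the synced partial data.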

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
