hadoop-mapreduce-issues mailing list archives

From "Ramkumar Vadali (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2186) DistributedRaidFileSystem should implement getFileBlockLocations()
Date Mon, 23 May 2011 17:40:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13038086#comment-13038086 ]

Ramkumar Vadali commented on MAPREDUCE-2186:

The main motivation to open this jira was to allow CombineFileInputFormat to work when there
are missing blocks. CombineFileInputFormat figures out the host/rack information for input
blocks and uses that information to create input splits. It does not handle the case where
a block does not have any host/rack information.

The proposed fix, returning the locations of parity blocks when source blocks are missing,
is not a good one because it fixes the problem in the wrong place. It also reports false
locality, since the hosts returned do not actually hold the requested data.
Instead of changing RAID FS to handle this case, it's better to fix CFIF
(CombineFileInputFormat) to handle missing blocks (MAPREDUCE-2185).
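
To illustrate the kind of handling meant here, below is a minimal sketch, not the actual
CombineFileInputFormat code. It assumes a simplified model in which each block is grouped
under its first reported host only. The BlockInfo class, the NO_LOCATION bucket name, and
groupByHost() are hypothetical names made up for this example. The point is only that a
block with an empty host list is placed in a fallback group instead of being dropped or
causing a failure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitGroupingSketch {

    /** Hypothetical stand-in for a block's location info; not a Hadoop class. */
    static final class BlockInfo {
        final String path;
        final long offset;
        final String[] hosts; // empty when no datanode reports the block

        BlockInfo(String path, long offset, String... hosts) {
            this.path = path;
            this.offset = offset;
            this.hosts = hosts;
        }
    }

    /** Hypothetical bucket key for blocks that have no host/rack information. */
    static final String NO_LOCATION = "<no-location>";

    /**
     * Groups blocks by their first host (a simplification for this sketch).
     * Blocks without any host go into the NO_LOCATION bucket so that they
     * can still be assigned to a split, just without a locality hint.
     */
    static Map<String, List<BlockInfo>> groupByHost(List<BlockInfo> blocks) {
        Map<String, List<BlockInfo>> byHost = new LinkedHashMap<>();
        for (BlockInfo b : blocks) {
            boolean missing = (b.hosts == null || b.hosts.length == 0);
            String key = missing ? NO_LOCATION : b.hosts[0];
            byHost.computeIfAbsent(key, k -> new ArrayList<>()).add(b);
        }
        return byHost;
    }

    public static void main(String[] args) {
        List<BlockInfo> blocks = Arrays.asList(
            new BlockInfo("/raid/file", 0L, "host1", "host2"),
            new BlockInfo("/raid/file", 64L),           // missing block: no hosts
            new BlockInfo("/raid/file", 128L, "host1"));

        // Print each group with the number of blocks assigned to it.
        for (Map.Entry<String, List<BlockInfo>> e : groupByHost(blocks).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " block(s)");
        }
    }
}
```

With this approach the file system keeps reporting the true locations (none, for a missing
block), and the split-building code decides what to do with blocks that have no locality.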

> DistributedRaidFileSystem should implement getFileBlockLocations()
> ------------------------------------------------------------------
>                 Key: MAPREDUCE-2186
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2186
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/raid
>            Reporter: Ramkumar Vadali
>            Assignee: Ramkumar Vadali
> If a RAIDed file has missing blocks, DistributedRaidFileSystem.getFileBlockLocations()
> would return no block locations. This could lead a client to believe that the file is not
> readable. But if parity data is available, the file actually is readable.
> It would be better to implement getFileBlockLocations() and return the location of the
> parity blocks that would be needed to reconstruct the missing block.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
