hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3710) last split generated by FileInputFormat.getSplits may not have the best locality
Date Mon, 23 Jan 2012 19:15:40 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siddharth Seth updated MAPREDUCE-3710:

    Attachment: MR-3710_v1.txt

Patch for the last split.
mapred.FileInputFormat seems to have additional optimizations for rack locality which don't
exist in mapreduce.FileInputFormat. The patch does not include a fix for this difference.
> last split generated by FileInputFormat.getSplits may not have the best locality
> --------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3710
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3710
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv1, mrv2
>    Affects Versions: 0.23.0, 1.0.0
>            Reporter: Siddharth Seth
>            Assignee: Siddharth Seth
>         Attachments: MR-3710_v1.txt
> The last split generated by FileInputFormat.getSplits takes its hosts from the block at
> index {{blkLocations.length-1}}, i.e. the last block of the file.
> The last split may be larger than the rest (SPLIT_SLOP=1.1 by default), in which case
> locality is picked up from a smaller block.
> e.g. a 1027MB file with a 128MB split size: the last split ends up being 131MB, and the hosts
> for locality end up being the nodes containing the 3MB block instead of the 128MB block.
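
The arithmetic in the example above can be checked with a small standalone sketch. It is not the Hadoop source and not the attached patch: the class name LastSplitLocality and the helpers computeSplits and blockIndexForOffset are hypothetical, it assumes the split size equals the block size, and it assumes one plausible fix is to look up the block containing the start offset of the last split.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of the split loop described in the issue.
// It does not use any Hadoop classes.
class LastSplitLocality {
    // Slop factor quoted in the issue description.
    static final double SPLIT_SLOP = 1.1;

    // Returns {offset, length} pairs. Full-size splits are cut while the
    // remainder is more than SPLIT_SLOP times the split size; whatever is
    // left becomes the (possibly oversized) last split.
    static List<long[]> computeSplits(long fileLen, long splitSize) {
        List<long[]> splits = new ArrayList<>();
        long remaining = fileLen;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(new long[] {fileLen - remaining, splitSize});
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(new long[] {fileLen - remaining, remaining});
        }
        return splits;
    }

    // Index of the block containing the given byte offset, assuming
    // fixed-size blocks (an assumption of this sketch).
    static int blockIndexForOffset(long offset, long blockSize) {
        return (int) (offset / blockSize);
    }

    public static void main(String[] args) {
        final long MB = 1024L * 1024L;
        long fileLen = 1027 * MB;
        long blockSize = 128 * MB; // split size == block size here

        List<long[]> splits = computeSplits(fileLen, blockSize);
        long[] last = splits.get(splits.size() - 1);
        int numBlocks = (int) ((fileLen + blockSize - 1) / blockSize);

        // Behaviour described in the issue: always use the last block.
        int lastBlockIndex = numBlocks - 1;
        // Assumed alternative: use the block where the last split starts.
        int startBlockIndex = blockIndexForOffset(last[0], blockSize);

        System.out.println("last split offset (MB): " + last[0] / MB);
        System.out.println("last split length (MB): " + last[1] / MB);
        System.out.println("block index used today: " + lastBlockIndex);
        System.out.println("block index at split start: " + startBlockIndex);
    }
}
```

For the 1027MB file this model produces eight splits, the last one starting at 896MB with a length of 131MB. Block index 8 is the 3MB tail block, while block index 7 is the full 128MB block that holds most of the split's data.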
