hadoop-mapreduce-issues mailing list archives

From "Ted Malaska (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6367) UniformSizeInputFormat skews left over bytes to last split
Date Mon, 18 May 2015 00:40:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14547421#comment-14547421 ]

Ted Malaska commented on MAPREDUCE-6367:

After further review, this bug turned out to be in a different implementation.

Please close and kill this JIRA. Thanks.

> UniformSizeInputFormat skews left over bytes to last split
> ----------------------------------------------------------
>                 Key: MAPREDUCE-6367
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6367
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.6.0, 2.5.2
>            Reporter: Ted Malaska
>            Assignee: Ted Malaska
>            Priority: Minor
> In UniformSizeInputFormat, the goal is to give every split an equal number of bytes. But
> the logic today results in every split having a little less than the perfect amount, and
> the leftover from every split is put into the last split,
> resulting in a large skew for the last split.
> Below is the area of the code that is affected:
> https://github.com/apache/hadoop/blob/9ae7f9eb7baeb244e1b95aabc93ad8124870b9a9/hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/UniformSizeInputFormat.java#L98
> The fix would be to change the following line:
> currentSplitSize += srcFileStatus.getLen();
> to
> currentSplitSize += srcFileStatus.getLen() + (currentSplitSize - nBytesPerSplit);
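
The skew described above can be illustrated with a small stand-alone model. This is only a sketch and not the Hadoop code: the class SplitSkewSketch, the method greedySplitSizes, and the rule that caps the number of splits at numSplits are all assumptions made for illustration. The names currentSplitSize and nBytesPerSplit are borrowed from the description. Note that the comment at the top of this message says the bug was found in a different implementation, so the model shows the reported pattern, not the behavior of UniformSizeInputFormat itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of greedy size-based splitting; not the DistCp implementation.
class SplitSkewSketch {

    // Groups file lengths into at most numSplits groups and returns each group's byte total.
    static List<Long> greedySplitSizes(long[] fileLengths, int numSplits) {
        long total = 0;
        for (long len : fileLengths) {
            total += len;
        }
        // Ideal number of bytes per split.
        long nBytesPerSplit = (long) Math.ceil(total * 1.0 / numSplits);

        List<Long> sizes = new ArrayList<>();
        long currentSplitSize = 0;
        for (long len : fileLengths) {
            boolean wouldOverflow = currentSplitSize + len > nBytesPerSplit;
            // Assumption: once numSplits - 1 splits are closed, everything else goes to the last one.
            boolean canOpenAnother = sizes.size() < numSplits - 1;
            if (wouldOverflow && currentSplitSize > 0 && canOpenAnother) {
                // Close the split before it exceeds the target, so it ends up under the target.
                sizes.add(currentSplitSize);
                currentSplitSize = 0;
            }
            currentSplitSize += len;
        }
        if (currentSplitSize > 0) {
            sizes.add(currentSplitSize);
        }
        return sizes;
    }

    public static void main(String[] args) {
        // Ten files of 6 bytes each, 4 splits: the ideal split size is 15 bytes.
        long[] fileLengths = {6, 6, 6, 6, 6, 6, 6, 6, 6, 6};
        System.out.println(greedySplitSizes(fileLengths, 4)); // prints [12, 12, 12, 24]
    }
}
```

In this model each of the first three splits falls 3 bytes short of the 15-byte target, and the last split absorbs the 9 leftover bytes, which is the kind of skew the description reports.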

This message was sent by Atlassian JIRA
