hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split
Date Tue, 28 Jul 2009 21:10:14 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736295#action_12736295 ]

Doug Cutting commented on MAPREDUCE-801:

In PIG-878 Arun made a sensible suggestion: the number of locations of a split should
not be greater than the replication level of the file.  FileInputFormat could check this.
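
To make the suggested check concrete, here is a minimal sketch in plain Java. The class and method names (SplitLocationCheck, distinctLocations, check) are hypothetical and not part of Hadoop. In a real job the inputs would presumably come from InputSplit.getLocations() and FileStatus.getReplication(); how FileInputFormat would wire this in is an assumption left open here.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration only; not a Hadoop class. */
final class SplitLocationCheck {
    private SplitLocationCheck() {}

    /** Counts the distinct, non-empty host names a split reports. */
    static int distinctLocations(String[] locations) {
        Set<String> hosts = new HashSet<>();
        if (locations != null) {
            for (String host : locations) {
                if (host != null && !host.isEmpty()) {
                    hosts.add(host);
                }
            }
        }
        return hosts.size();
    }

    /**
     * Returns a warning message when the split reports more distinct
     * locations than the file's replication level, or null otherwise.
     */
    static String check(String splitName, String[] locations, int replication) {
        int n = distinctLocations(locations);
        if (n <= replication) {
            return null;
        }
        return "Split " + splitName + " reports " + n
                + " locations, but the file's replication level is " + replication;
    }
}
```

A caller could log the returned message as a warning rather than fail the job, which matches the issue title's request for a warning.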

Another approach might be to add counters for rack-local and local task placements and for I/O.
 If tasks are placed locally but their I/O is not done locally, that's a bad sign.
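
As a rough sketch of the counter idea, the plain-Java class below tracks, per placement type, how many tasks ran and how many bytes they read locally versus remotely. Everything here (LocalityCounters, Placement, looksMisreported, and the threshold parameter) is hypothetical and does not use the Hadoop counters API; it only illustrates the comparison described above.

```java
import java.util.EnumMap;
import java.util.Map;

/** Hypothetical illustration; does not use Hadoop's counters API. */
final class LocalityCounters {
    enum Placement { DATA_LOCAL, RACK_LOCAL, OFF_RACK }

    // Per placement: [0] = task count, [1] = local bytes, [2] = remote bytes.
    private final Map<Placement, long[]> totals = new EnumMap<>(Placement.class);

    /** Records one finished task and the bytes it read locally and remotely. */
    void record(Placement placement, long localBytes, long remoteBytes) {
        long[] t = totals.computeIfAbsent(placement, k -> new long[3]);
        t[0]++;
        t[1] += localBytes;
        t[2] += remoteBytes;
    }

    /** Number of tasks recorded for the given placement. */
    long tasks(Placement placement) {
        long[] t = totals.get(placement);
        return t == null ? 0 : t[0];
    }

    /** Fraction of bytes read locally; 1.0 when no bytes were recorded. */
    double localReadFraction(Placement placement) {
        long[] t = totals.get(placement);
        if (t == null || t[1] + t[2] == 0) {
            return 1.0;
        }
        return (double) t[1] / (t[1] + t[2]);
    }

    /**
     * The "bad sign": tasks were counted as data-local, yet the share of
     * their input read locally fell below the given threshold.
     */
    boolean looksMisreported(double minLocalFraction) {
        return tasks(Placement.DATA_LOCAL) > 0
                && localReadFraction(Placement.DATA_LOCAL) < minLocalFraction;
    }
}
```

With such counters, a job whose data-local tasks read only a small share of their input locally would stand out, which is the symptom a split with inflated locations should produce.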

> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>                 Key: MAPREDUCE-801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
> A customized input format may be buggy and report misleading locations through its input split;
> PIG-878 is an example. When an input split returns too many locations, it not
> only artificially inflates the percentage of data-local or rack-local maps, but also forces
> the scheduler to use more memory and work harder to conduct task assignment.

