hadoop-mapreduce-issues mailing list archives

From "Doug Cutting (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split
Date Mon, 27 Jul 2009 18:07:16 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735690#action_12735690 ]

Doug Cutting commented on MAPREDUCE-801:
----------------------------------------

> discard location information completely when the number of locations reported by an input
> split is greater than a threshold (e.g. 20).

This seems rather arbitrary to me, since one might reasonably increase the replication of
an input file to 20 or more, e.g. to ensure local availability on every rack or node.


> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
>
> A customized input format may be buggy and report misleading locations through its input
> splits; PIG-878 is one example. When an input split returns too many locations, it not only
> artificially inflates the percentage of data-local or rack-local maps, but also forces the
> scheduler to use more memory and work harder on task assignment.
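
For illustration only, the check discussed in this issue could be sketched roughly as follows.
This is not code from Hadoop or from any patch on MAPREDUCE-801: the class name, method names,
and the `discard` flag are hypothetical, and the threshold is left as a parameter because, as
noted in the comment above, a fixed value such as 20 may be too low for files with
deliberately high replication.

```java
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop: flags splits that report
// an unusually large number of locations.
class SplitLocationCheck {
    private static final Logger LOG =
        Logger.getLogger(SplitLocationCheck.class.getName());

    // True when a split reports more locations than the given threshold.
    static boolean hasTooManyLocations(String[] locations, int maxLocations) {
        return locations != null && locations.length > maxLocations;
    }

    // Logs a warning for an oversized location list. When discard is true,
    // returns an empty array so the caller treats the split as having no
    // locality information; otherwise returns the locations unchanged.
    static String[] checkLocations(String splitName, String[] locations,
                                   int maxLocations, boolean discard) {
        if (!hasTooManyLocations(locations, maxLocations)) {
            return locations;
        }
        LOG.warning(String.format(
            "Split %s reports %d locations (threshold %d)",
            splitName, locations.length, maxLocations));
        return discard ? new String[0] : locations;
    }
}
```

Calling `checkLocations(name, hosts, 20, false)` would only warn, which matches the issue
title; passing `true` corresponds to the stricter proposal of discarding location
information above the threshold.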

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

