hadoop-mapreduce-issues mailing list archives

From "eric baldeschwieler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-801) MAPREDUCE framework should issue warning with too many locations for a split
Date Wed, 29 Jul 2009 06:18:15 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736464#action_12736464 ]

eric baldeschwieler commented on MAPREDUCE-801:

Hi Doug,

I think we are making the perfect the enemy of the good here.  A real bug existed that cost
us performance.  Having 20 options on placement is not going to improve scheduling noticeably.
Having hundreds can bring down the centralized resources of the system, and even 20 would cause
a lot of completely unneeded work in the JT for little gain.

I'd like to see us discard anything beyond the first 5 options in the JT, just to keep bugs
from DoSing the central server.  I am not aware of any use case where this would hinder performance.
Having a warning and truncating this list would have saved us a lot of resources and time.
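
To make the truncate-and-warn idea concrete, here is a minimal sketch.  It is not the
actual JT code: the class name SplitLocationLimiter, the limit() method, and the
MAX_LOCATIONS constant are all hypothetical, and the cap of 5 is simply the number
suggested above.  It uses java.util.logging only to stay self-contained.

```java
import java.util.Arrays;
import java.util.logging.Logger;

// Hypothetical helper, not part of Hadoop: caps the hosts reported for one split.
final class SplitLocationLimiter {
    private static final Logger LOG =
            Logger.getLogger(SplitLocationLimiter.class.getName());

    // Assumed cap, taken from the "first 5 options" suggestion in this comment.
    static final int MAX_LOCATIONS = 5;

    private SplitLocationLimiter() {}

    /**
     * Returns the locations unchanged when there are at most MAX_LOCATIONS of them;
     * otherwise logs a warning and returns a copy of the first MAX_LOCATIONS entries.
     */
    static String[] limit(String splitId, String[] locations) {
        if (locations == null || locations.length <= MAX_LOCATIONS) {
            return locations;
        }
        LOG.warning("Split " + splitId + " reported " + locations.length
                + " locations; keeping only the first " + MAX_LOCATIONS);
        return Arrays.copyOf(locations, MAX_LOCATIONS);
    }

    public static void main(String[] args) {
        // Simulate a buggy input format that lists 200 hosts for a single split.
        String[] hosts = new String[200];
        for (int i = 0; i < hosts.length; i++) {
            hosts[i] = "host" + i;
        }
        String[] kept = limit("split-0", hosts);
        System.out.println(kept.length + " locations kept");
    }
}
```

Keeping the first entries rather than sampling is the cheapest choice and matches the
"discard anything beyond the first 5" wording; whether that is the right policy is
exactly what the warning would help us find out.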

The system is full of numbers.  Sometimes it is simpler to harden the system than to identify general
principles.  There are many places in the system where I think this would be the wrong approach,
but huge split lists are much more likely to be the result of bugs or ignorance than

If we inject a warning and anyone hits the case, we can then do more work to enhance this.


> MAPREDUCE framework should issue warning with too many locations for a split
> ----------------------------------------------------------------------------
>                 Key: MAPREDUCE-801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-801
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>            Reporter: Hong Tang
> A customized input format may be buggy and report misleading locations through its input splits;
an example of this is PIG-878. When an input split returns too many locations, it not
only artificially inflates the percentage of data-local or rack-local maps, but also forces the
scheduler to use more memory and work harder to conduct task assignment.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
