hadoop-mapreduce-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests
Date Fri, 06 May 2016 17:37:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274416#comment-15274416 ]

Varun Saxena commented on MAPREDUCE-6689:

If we do introduce a config to decide whether maps have been starved or not (and hence whether
to hold off ramping up reducers), it will have to be tuned per job, not only by the type of job
but also by the size of the data it processes in each run, among several other factors.
I do see that it will be almost impossible to accurately decide a correct value for such a config.

We have had the MAPREDUCE-6514 fix in our private branch for several months, but do not yet
have MAPREDUCE-6302 in.
Let us see how the recent fixes, along with MAPREDUCE-6302, behave on a real cluster. I think
they should cover most of the scenarios.

> MapReduce job can infinitely increase number of reducer resource requests
> -------------------------------------------------------------------------
>                 Key: MAPREDUCE-6689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: MAPREDUCE-6689.1.patch
> We have seen this issue on one of our clusters: when running a terasort map-reduce job,
> some mappers failed after reducers had started, and the MR AM then tried to preempt reducers
> to schedule these failed mappers.
> After that, the MR AM enters an infinite loop: on every RMContainerAllocator#heartbeat run,
> - In {{preemptReducesIfNeeded}}, it cancels all scheduled reducer requests (total scheduled
> reducers = 1024).
> - Then, in {{scheduleReduces}}, it ramps up all reducers again (total = 1024).
> As a result, the total #requested-containers increases by 1024 on every MRAM-RM
> heartbeat (1 sec per heartbeat). The AM hung for 18+ hours, so we got 18 * 3600 * 1024
> ~ 66M+ requested containers on the RM side.
> This bug also triggered YARN-4844, which made the RM stop scheduling anything.
> Thanks to [~sidharta-s] for helping with the analysis.
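The cancel-then-ramp-up feedback loop in the description can be sketched in miniature. This is a hypothetical simulation, not the actual RMContainerAllocator code: the class, field, and method names here are illustrative only, and it assumes (per the report) that cancelled reducer requests are not subtracted from the cumulative request count seen on the RM side.

```java
// Minimal sketch of the feedback loop described in this issue.
// Hypothetical names; NOT the real org.apache.hadoop.mapreduce code.
public class ReducerRampUpLoop {
    static final int TOTAL_REDUCERS = 1024;

    // Cumulative container requests this AM has issued to the RM.
    long requestedAtRM = 0;
    // Reducer requests currently scheduled (outstanding) in the AM.
    int scheduledReduces = 0;

    // Maps are starved, so cancel every scheduled reducer request.
    // In this sketch the cancellation does not retract requests
    // already counted on the RM side.
    void preemptReducesIfNeeded() {
        scheduledReduces = 0;
    }

    // Headroom looks fine again, so ramp all reducers back up,
    // issuing fresh requests for every reducer to the RM.
    void scheduleReduces() {
        int rampUp = TOTAL_REDUCERS - scheduledReduces;
        scheduledReduces += rampUp;
        requestedAtRM += rampUp; // grows by 1024 each heartbeat
    }

    void heartbeat() {
        preemptReducesIfNeeded();
        scheduleReduces();
    }

    public static void main(String[] args) {
        ReducerRampUpLoop am = new ReducerRampUpLoop();
        int heartbeats = 18 * 3600; // one heartbeat/sec for 18 hours
        for (int i = 0; i < heartbeats; i++) {
            am.heartbeat();
        }
        // 18 * 3600 * 1024 = 66,355,200, the ~66M from the report.
        System.out.println(am.requestedAtRM);
    }
}
```

Because preemptReducesIfNeeded runs before scheduleReduces in the same heartbeat, the AM's own view (scheduledReduces) stays flat at 1024 while the RM-side count grows without bound, which matches the ~66M figure in the report.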

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org
