hadoop-mapreduce-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increasing number of reducer resource requests
Date Thu, 05 May 2016 02:41:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15271812#comment-15271812 ]

Haibo Chen commented on MAPREDUCE-6689:
---------------------------------------

We have seen a similar problem before, but we were not able to investigate it in detail
because the logs were overwritten quickly. Do you still have the MR AM logs?

MAPREDUCE-6514 has already been filed to fix clearAllPendingReduceRequests() not updating the outstanding resource requests.

> MapReduce job can infinitely increasing number of reducer resource requests
> ---------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>
> We have seen this issue on one of our clusters: when running a terasort MapReduce job,
> some mappers failed after the reducers had started, and the MR AM then tried to preempt
> reducers to schedule these failed mappers.
> After that, the MR AM enters an infinite loop. On every RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests (total scheduled
>   reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers again (total = 1024).
> As a result, the total number of requested containers grows by 1024 on every MR AM-to-RM
> heartbeat (one heartbeat per second). The AM hung for 18+ hours, so the RM side accumulated
> 18 * 3600 * 1024 ~ 66M+ requested containers.
> This bug also triggered YARN-4844, which makes the RM stop scheduling anything.
> Thanks to [~sidharta-s] for helping with the analysis.
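To make the growth described in the report concrete, here is a minimal Python sketch of the loop. It is a toy model, not Hadoop code: `ToyAllocator`, `cancel_all_scheduled_reduces`, `ramp_up_reduces` and `simulate` are hypothetical names standing in for preemptReducesIfNeeded, scheduleReduces and clearAllPendingReduceRequests. It assumes the leak comes from cancellation not lowering the outstanding ask sent to the RM, which is the gap MAPREDUCE-6514 is described as fixing.

```python
# Toy model of the reducer-request loop reported in MAPREDUCE-6689.
# Hypothetical names; this is not the Hadoop RMContainerAllocator code.
from dataclasses import dataclass


@dataclass
class ToyAllocator:
    num_reducers: int
    # Assumption: when False, cancelling scheduled reducers does not lower
    # the ask sent to the RM (the gap MAPREDUCE-6514 is described as fixing).
    update_ask_on_cancel: bool = False
    scheduled: int = 0        # reducer requests the AM tracks as pending
    outstanding_ask: int = 0  # reducer containers the RM has been asked for

    def cancel_all_scheduled_reduces(self) -> None:
        # Stand-in for preemptReducesIfNeeded clearing all scheduled reducers.
        if self.update_ask_on_cancel:
            self.outstanding_ask -= self.scheduled
        self.scheduled = 0

    def ramp_up_reduces(self) -> None:
        # Stand-in for scheduleReduces re-requesting every reducer.
        to_add = self.num_reducers - self.scheduled
        self.scheduled += to_add
        self.outstanding_ask += to_add

    def heartbeat(self) -> None:
        # One AM-RM heartbeat: cancel everything, then ramp up again.
        self.cancel_all_scheduled_reduces()
        self.ramp_up_reduces()


def simulate(num_reducers: int, heartbeats: int, update_ask_on_cancel: bool) -> int:
    """Return the outstanding ask after the given number of heartbeats."""
    alloc = ToyAllocator(num_reducers, update_ask_on_cancel)
    for _ in range(heartbeats):
        alloc.heartbeat()
    return alloc.outstanding_ask


if __name__ == "__main__":
    print(simulate(1024, 10, update_ask_on_cancel=False))         # 10240
    print(simulate(1024, 10, update_ask_on_cancel=True))          # 1024
    print(simulate(1024, 18 * 3600, update_ask_on_cancel=False))  # 66355200
```

In this model, when cancellation also lowers the ask, the outstanding ask stays at 1024 no matter how many heartbeats run. Without that update, each heartbeat adds another 1024, and 18 * 3600 heartbeats reproduce the ~66M figure quoted in the report.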



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


