hadoop-mapreduce-issues mailing list archives

From "Jason Lowe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely increase number of reducer resource requests
Date Fri, 06 May 2016 14:34:13 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274118#comment-15274118 ]

Jason Lowe commented on MAPREDUCE-6689:
---------------------------------------

Sorry for arriving late to the discussion.

bq. Should we really be ramping up if we have hanging map requests irrespective of configuration
value of reduce rampup limit?

Ramping up reducers when maps are hanging does sound a bit dubious, but it may make sense
in some scenarios.  Consider a case where a job issues tons of maps, far more than the queue
can handle.  Some of those maps are going to appear to be hanging for a very long time because
they have to run in multiple waves.  The whole point of ramping up reducers before the maps
are complete is to try to reduce job latency (at the expense of overall cluster throughput)
by pipelining the shuffle of the completed tasks with the remaining map tasks.  If the job
has tons of data to shuffle for each map then it may make sense to sacrifice some of the map
resources to get the reducers running early so they can start chewing on the horde of completed
map output.  It all depends upon the map durations, the shuffle burden, etc.  It is definitely
safer from a correctness point of view to avoid ramping up reducers if there are any hanging
maps at all, but I believe there could be some jobs whose latency could increase as a result
of that change.

I'm guessing the root cause of the issue is an incorrect headroom report, e.g., there's technically
enough free space in the headroom but it's fragmented across nodes in such a way that no single
map can fit on any node.  The unconditional preemption logic from MAPREDUCE-6302 was supposed
to address this, but it looks like the container allocator can quickly "forget" this decision
and re-schedule the reducers that were shot.
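
To make the fragmentation scenario above concrete, here is a minimal Java sketch. It is not
YARN or MR AM code: the class name, method names, and memory figures are all made up for
illustration. It only shows how a headroom figure summed across nodes can exceed the size of
a map container while no single node can actually host that container.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; none of these names exist in Hadoop.
public class HeadroomFragmentationSketch {

    /** Total free memory across all nodes, i.e. what an aggregate headroom number would show. */
    static long aggregateHeadroomMb(List<Long> freeMbPerNode) {
        long total = 0;
        for (long free : freeMbPerNode) {
            total += free;
        }
        return total;
    }

    /** True only if at least one node has enough free memory for a single container. */
    static boolean fitsOnAnyNode(List<Long> freeMbPerNode, long containerMb) {
        for (long free : freeMbPerNode) {
            if (free >= containerMb) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed numbers: four nodes with 1 GB free each, and a 2 GB map container.
        List<Long> freeMbPerNode = Arrays.asList(1024L, 1024L, 1024L, 1024L);
        long mapContainerMb = 2048L;

        System.out.println("aggregate headroom (MB): " + aggregateHeadroomMb(freeMbPerNode));
        System.out.println("map fits on some node:   " + fitsOnAnyNode(freeMbPerNode, mapContainerMb));
    }
}
```

With these assumed numbers the aggregate headroom is 4096 MB, twice the map size, yet the
map cannot be placed anywhere.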

> MapReduce job can infinitely increase number of reducer resource requests
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>         Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue on one of our clusters: when running a terasort map-reduce job,
> some mappers failed after the reducers started, and the MR AM then tried to preempt reducers
> to schedule these failed mappers.
> After that, the MR AM enters an infinite loop. On every RMContainerAllocator#heartbeat run,
> it:
> - In {{preemptReducesIfNeeded}}, cancels all scheduled reducer requests (total scheduled
> reducers = 1024).
> - Then, in {{scheduleReduces}}, ramps up all reducers (total = 1024).
> As a result, the total #requested-containers increased by 1024 on every MRAM-RM
> heartbeat (1 sec per heartbeat). The AM was hanging for 18+ hours, so we get 18 * 3600 * 1024
> ~ 66M+ requested containers on the RM side.
> This bug also triggered YARN-4844, which makes the RM stop scheduling anything.
> Thanks to [~sidharta-s] for helping with the analysis.
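
The container count quoted in the description can be checked with a few lines of Java. This
is only a sketch of the reported net effect (1024 additional requests per heartbeat, one
heartbeat per second, 18 hours); the class and method names are hypothetical, and it does
not reproduce the actual RMContainerAllocator logic.

```java
// Hypothetical illustration only; models the reported net growth, not the real allocator.
public class ReducerRequestGrowthSketch {

    /**
     * Outstanding container requests after the given number of heartbeats, assuming each
     * heartbeat adds a fixed number of requests that are never effectively withdrawn.
     */
    static long outstandingRequests(long heartbeats, long addedPerHeartbeat) {
        long outstanding = 0;
        for (long i = 0; i < heartbeats; i++) {
            outstanding += addedPerHeartbeat;
        }
        return outstanding;
    }

    public static void main(String[] args) {
        long heartbeats = 18L * 3600L; // 18 hours at one heartbeat per second
        long reducers = 1024L;         // reducer requests re-issued on every heartbeat
        System.out.println(outstandingRequests(heartbeats, reducers)); // prints 66355200
    }
}
```

That is 66,355,200 requests, which matches the "66M+" figure in the report.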



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

