Mailing-List: contact mapreduce-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Sat, 7 May 2016 00:21:12 +0000 (UTC)
From: "Hudson (JIRA)" <jira@apache.org>
To: mapreduce-issues@hadoop.apache.org
Message-ID: <JIRA.12965142.1462408705000.133918.1462580472987@Atlassian.JIRA>
In-Reply-To: <JIRA.12965142.1462408705000@Atlassian.JIRA>
References: <JIRA.12965142.1462408705000@Atlassian.JIRA> <JIRA.12965142.1462408705216@arcas>
Subject: [jira] [Commented] (MAPREDUCE-6689) MapReduce job can infinitely
 increase number of reducer resource requests
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Sat, 07 May 2016 00:21:15 -0000


    [ https://issues.apache.org/jira/browse/MAPREDUCE-6689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15274965#comment-15274965 ] 

Hudson commented on MAPREDUCE-6689:
-----------------------------------

FAILURE: Integrated in Hadoop-trunk-Commit #9731 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9731/])
MAPREDUCE-6689. MapReduce job can infinitely increase number of reducer (jlowe: rev c9bb96fa81fc925e33ccc0b02c98cc2d929df120)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/test/java/org/apache/hadoop/mapreduce/v2/app/rm/TestRMContainerAllocator.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapreduce/v2/app/rm/RMContainerAllocator.java


> MapReduce job can infinitely increase number of reducer resource requests
> -------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-6689
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6689
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Wangda Tan
>            Assignee: Wangda Tan
>            Priority: Blocker
>             Fix For: 2.8.0, 2.7.3, 2.6.5
>
>         Attachments: MAPREDUCE-6689.1.patch
>
>
> We have seen this issue from one of our clusters: when running terasort map-reduce job, some mappers failed after reducer started, and then MR AM tries to preempt reducers to schedule these failed mappers.
> After that, MR AM enters an infinite loop, for every RMContainerAllocator#heartbeat run, it:
> - In {{preemptReducesIfNeeded}}, it cancels all scheduled reducer requests. (total scheduled reducers = 1024)
> - Then, in {{scheduleReduces}}, it ramps up all reducers (total = 1024).
> As a result, we can see total #requested-containers increased 1024 for every MRAM-RM heartbeat (1 sec per heartbeat). The AM is hanging for 18+ hours, so we get 18 * 3600 * 1024 ~ 66M+ requested containers in RM side.
> And this bug also triggered YARN-4844, which makes RM stop scheduling anything.
> Thanks to [~sidharta-s] for helping with analysis. 


--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org