Mailing-List: contact yarn-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Date: Tue, 25 Oct 2016 09:01:58 +0000 (UTC)
From: "Varun Saxena (JIRA)" <jira@apache.org>
To: yarn-issues@hadoop.apache.org
Message-ID: <JIRA.13014710.1477314475000.77090.1477386118467@Atlassian.JIRA>
In-Reply-To: <JIRA.13014710.1477314475000@Atlassian.JIRA>
References: <JIRA.13014710.1477314475000@Atlassian.JIRA> <JIRA.13014710.1477314475907@arcas>
Subject: [jira] [Commented] (YARN-5773) RM recovery too slow due to
 LeafQueue#activateApplication()
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
archived-at: Tue, 25 Oct 2016 09:02:00 -0000


    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604718#comment-15604718 ] 

Varun Saxena commented on YARN-5773:
------------------------------------

Is there any need to activate applications during recovery? Cluster resources will be 0 during recovery anyway, because the resource tracker service has not started yet.
Alternatively, we can check the cluster resources or the user limit right at the beginning of application activation and return early if the applicable resources are 0. That would have the same effect on recovery.
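
A minimal sketch of the early-return idea, under loud assumptions: ActivationSketch, recover(), setAmLimitMb() and the uniform 1024 MB AM size are hypothetical names and values made up for illustration, not the real LeafQueue code or any Hadoop API. It only shows how a guard at the top of the activation method avoids scanning the pending list while the applicable limit is still 0.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for a leaf queue; none of these names are Hadoop APIs.
class ActivationSketch {
    private static final long AM_SIZE_MB = 1024; // assumed uniform AM size

    private final List<String> pending = new ArrayList<>();
    private final List<String> active = new ArrayList<>();
    private final boolean earlyExit; // toggles the proposed guard
    private long amLimitMb = 0;      // stays 0 until nodes register
    private long amUsedMb = 0;
    long iterations = 0;             // pending apps examined so far

    ActivationSketch(boolean earlyExit) {
        this.earlyExit = earlyExit;
    }

    // Called once per recovered application, as in the issue description.
    void recover(String appId) {
        pending.add(appId);
        activateApplications();
    }

    // Called when nodes register and the AM limit becomes non-zero.
    void setAmLimitMb(long limitMb) {
        amLimitMb = limitMb;
        activateApplications();
    }

    void activateApplications() {
        if (earlyExit && amLimitMb <= 0) {
            return; // nothing can be activated, so skip the scan entirely
        }
        // Pending apps are visited in submission order, so the guard
        // does not change which apps get activated first.
        Iterator<String> it = pending.iterator();
        while (it.hasNext()) {
            String app = it.next();
            iterations++;
            if (amUsedMb + AM_SIZE_MB > amLimitMb) {
                continue; // over the AM limit, leave the app pending
            }
            amUsedMb += AM_SIZE_MB;
            active.add(app);
            it.remove();
        }
    }

    int activeCount() { return active.size(); }
    int pendingCount() { return pending.size(); }

    public static void main(String[] args) {
        int n = 1000;
        ActivationSketch plain = new ActivationSketch(false);
        ActivationSketch guarded = new ActivationSketch(true);
        for (int i = 0; i < n; i++) {
            plain.recover("app_" + i);
            guarded.recover("app_" + i);
        }
        System.out.println("without guard: " + plain.iterations);
        System.out.println("with guard:    " + guarded.iterations);
    }
}
```

In this toy model the guarded queue does no scanning during recovery and runs a single pass once setAmLimitMb() is called with a non-zero value. Whether the real check belongs on cluster resources, the AM limit, or the user limit is a question for the actual patch.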

Overall, i.e. in the normal flow, Wangda's suggestion for optimizing activateApplications sounds good. But the ordering policy will have to be maintained as well, right?

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, so the AM limit check is done even before the NodeManagers have registered.
> The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is about {{50000000}} iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources have not yet been added during recovery, we should skip {{activateApplications()}}.
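
The iteration count quoted above can be checked with a short sketch. It assumes, as the description does, that each recovered application triggers one full scan of all applications recovered so far. RecoveryScanCount and its methods are hypothetical names used only for this arithmetic.

```java
// Illustrative only: counts pending-list visits when every recovered
// application triggers a full scan of the applications recovered so far.
class RecoveryScanCount {
    // Sums the scan lengths 1, 2, ..., n one recovery at a time.
    static long scansByLoop(int n) {
        long total = 0;
        for (int recovered = 1; recovered <= n; recovered++) {
            total += recovered; // one full scan of the current pending list
        }
        return total;
    }

    // Closed form of the same sum: n(n+1)/2.
    static long scansClosedForm(int n) {
        return (long) n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        int n = 10_000;
        System.out.println(scansByLoop(n));     // 50005000
        System.out.println(scansClosedForm(n)); // 50005000
    }
}
```

For 10K applications both methods give 50,005,000, which matches the roughly 50 million iterations reported.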


