Date: Tue, 25 Oct 2016 09:01:58 +0000 (UTC)
From: "Varun Saxena (JIRA)"
To: yarn-issues@hadoop.apache.org
Subject: [jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()

    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15604718#comment-15604718 ]

Varun Saxena commented on YARN-5773:
------------------------------------

Is there any need to activate applications on recovery?
Cluster resources will be 0 on recovery anyway, as the resource tracker service has not yet started. We can, however, check the cluster resources or the user limit right at the start of activating applications and return early if the applicable resources are 0. That will have the same effect on recovery.

For the normal flow overall, Wangda's suggestion for optimizing activateApplications sounds good. But the ordering policy will have to be maintained as well, right?

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, so the AM limit check is done even before the NodeManagers have registered.
> The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is about {{50000000}} iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources have not yet been added during recovery, we should skip {{activateApplications()}}.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
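
As a rough check of the {{N(N+1)/2}} estimate in the issue description, the following toy model counts the per-application checks made during recovery. It is only a sketch, not YARN code: the function `recovery_checks` and its `early_exit` flag are hypothetical names, and the model assumes that each recovered application triggers one pass over all pending applications and that nothing can be activated because cluster resources are 0.

```python
def recovery_checks(num_apps, early_exit=False):
    """Count per-application limit checks in a toy model of recovery.

    Hypothetical model, not YARN code: every recovered application joins
    the pending set and triggers one activation pass over that set.
    Cluster resources are assumed to be 0 for the whole recovery, so no
    application is ever activated and the pending set only grows.
    """
    pending = 0  # applications waiting to be activated
    checks = 0   # per-application limit checks performed so far
    for _ in range(num_apps):
        pending += 1
        if early_exit:
            # Resources are 0: return before visiting any pending app.
            continue
        # The pass visits every pending application once.
        checks += pending
    return checks


print(recovery_checks(10_000))                   # 50005000
print(recovery_checks(10_000, early_exit=True))  # 0
```

Under these assumptions the count without the early exit is exactly N(N+1)/2, which matches the figure of about 50000000 quoted for 10K applications, while the early exit suggested in the comment reduces the per-application checks during recovery to zero.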