hadoop-yarn-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
Date Thu, 27 Oct 2016 02:13:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15610374#comment-15610374

Bibin A Chundatt commented on YARN-5773:

IIUC SchedulerApplicationAttempt#isRecovering is set only in following case .App is in ACCEPTED
state i am not sure we will get isRecovery=true
        // We will replay the final attempt only if last attempt is in final
        // state but application is not in final state.
        if (rmApp.getCurrentAppAttempt() == appAttempt
            && !RMAppImpl.isAppInFinalState(rmApp)) {
          // Add the previous finished attempt to scheduler synchronously so
          // that scheduler knows the previous attempt.
          appAttempt.scheduler.handle(new AppAttemptAddedSchedulerEvent(
            appAttempt.getAppAttemptId(), false, true));
          (new BaseFinalTransition(appAttempt.recoveredFinalState)).transition(
              appAttempt, event);
should we update AM diagnostics if we return right from beginning of activateApplications
Cluster resource UI is self explanatory, so required be add ??
Also, in the patch LOG.debug statements should be guarded with LOG.isDebugEnabled check
Will update the same in next patch

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch, YARN-5773.003.patch
> # Submit application 10K application to default queue.
> # All applications are in accepted state
> # Now restart resourcemanager
> For each application recovery {{LeafQueue#activateApplications()}} is invoked.Resulting
in AM limit check to be done even before Node managers are getting registered.
> Total iteration for N application is about {{N(N+1)/2}} for {{10K}} application   {{50000000}}
iterations causing time take for Rm to be active more than 10 min.
> Since NM resources are not yet added to during recovery we should skip {{activateApplicaiton()}}

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

View raw message