hadoop-yarn-issues mailing list archives

From "Sunil G (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
Date Wed, 26 Oct 2016 06:59:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607655#comment-15607655 ]

Sunil G commented on YARN-5773:

In the scheduler, there is currently no event or API that tells whether recovery is done.
Basically, apps can be submitted even when none of the nodes are registered. I understand
that the easy fix here is to skip invoking {{activateApplications}} during recovery, since it already
has no meaning in the recovery flow. So as I mentioned earlier and as in patch, we can have
Now the question is whether to invoke {{activateApplications}} after scheduler recovery is done.
The advantage of such a call is that the scheduler no longer needs to worry about the chain of
sequential steps in RM recovery (whether the active services are started last in those steps or not).
Since recovery is event based, a direct API call would not be correct. Rather, one more event
would have to be published. I will check the feasibility of this and update here.
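
The idea discussed above can be illustrated with a small toy model. This is only a sketch and is not YARN code or the attached patch: {{ToyLeafQueue}}, {{recoverApplication}}, {{onRecoveryDone}} and the {{limitChecks}} counter are hypothetical names invented for the example, and the model assumes that no application can be activated while recovery is running because no NM resources are registered yet.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model of skipping activation during recovery.
 * Not the real LeafQueue: every name here except activateApplications
 * is hypothetical and exists only for illustration.
 */
public class RecoveryActivationSketch {

    static class ToyLeafQueue {
        private final List<String> pendingApps = new ArrayList<>();
        private final boolean skipWhileRecovering;
        private boolean recovering = true;
        long limitChecks = 0; // counts simulated AM limit checks

        ToyLeafQueue(boolean skipWhileRecovering) {
            this.skipWhileRecovering = skipWhileRecovering;
        }

        /** Re-adds one application from the state store. */
        void recoverApplication(String appId) {
            pendingApps.add(appId);
            if (!(recovering && skipWhileRecovering)) {
                activateApplications();
            }
        }

        /** Walks every pending app; nothing activates because no nodes exist yet. */
        void activateApplications() {
            for (int i = 0; i < pendingApps.size(); i++) {
                limitChecks++;
            }
        }

        /** Stands in for the extra "recovery done" event (assumed, not an existing API). */
        void onRecoveryDone() {
            recovering = false;
            if (skipWhileRecovering) {
                activateApplications(); // single pass after recovery
            }
        }
    }

    /** Recovers the given number of apps and returns the simulated limit checks. */
    static long recover(int apps, boolean skipWhileRecovering) {
        ToyLeafQueue queue = new ToyLeafQueue(skipWhileRecovering);
        for (int i = 0; i < apps; i++) {
            queue.recoverApplication("app_" + i);
        }
        queue.onRecoveryDone();
        return queue.limitChecks;
    }

    public static void main(String[] args) {
        System.out.println("without skip: " + recover(10_000, false));
        System.out.println("with skip:    " + recover(10_000, true));
    }
}
```

In this model the per-application activation costs a number of checks that grows quadratically with the number of recovered apps, while the skip-then-activate-once variant needs one pass over the pending list.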

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch, YARN-5773.003.patch
> # Submit 10K applications to the default queue.
> # All applications are in accepted state.
> # Now restart the ResourceManager.
> For each application recovered, {{LeafQueue#activateApplications()}} is invoked, so the
> AM limit check is done even before the NodeManagers have registered.
> The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is about {{50000000}}
> iterations, causing the RM to take more than 10 minutes to become active.
> Since NM resources are not yet added during recovery, we should skip {{activateApplications()}}.
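
The iteration estimate in the quoted description can be checked with a short calculation. The sketch below assumes that the k-th recovered application triggers a pass over all k applications pending at that point, which is what the {{N(N+1)/2}} figure implies; the class and method names are made up for the example.

```java
/**
 * Checks the N(N+1)/2 estimate from the issue description.
 * Assumption: recovering the k-th app scans all k pending apps.
 */
public class ActivationIterationCount {

    /** Closed form of 1 + 2 + ... + n. */
    static long closedForm(long n) {
        return n * (n + 1) / 2;
    }

    /** Same total, accumulated one recovery at a time. */
    static long byLoop(long n) {
        long total = 0;
        for (long pending = 1; pending <= n; pending++) {
            total += pending;
        }
        return total;
    }

    public static void main(String[] args) {
        long n = 10_000;
        System.out.println(closedForm(n)); // prints 50005000
        System.out.println(byLoop(n));     // prints 50005000
    }
}
```

For 10K applications this gives 50,005,000, which matches the roughly 50 million iterations quoted in the description.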

This message was sent by Atlassian JIRA

To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org
