hadoop-yarn-issues mailing list archives

From "Bibin A Chundatt (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
Date Mon, 24 Oct 2016 18:51:58 GMT

    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15602871#comment-15602871
] 

Bibin A Chundatt commented on YARN-5773:
----------------------------------------

Thank you [~leftnoteasy] for the review comment.

{quote}
I'm not sure if this is safe: activeApplication is majorly to avoid too many applications
are running inside one queue. if we skip the AM limit check for recovering apps, it looks
like some problem may occur.
{quote}
Yes, we should not skip application activation.

The main intention of this JIRA was the slow RM restart when there are too many pending apps.
If a leaf queue has many pending apps and the RM is restarted, LeafQueue#activateApplications()
is invoked for each app attempt submit event, and each invocation checks the AM limit for every
pending app. Restart time therefore grows quickly as the number of apps increases.

I will handle the following two cases:
# If the cluster resource is zero, don't check the AM limit.
# Skip all apps if the queue's AM limit is already reached.
I will upload a patch soon.
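
To make the two cases concrete, here is a minimal sketch under stated assumptions. It is not the patch and not the real LeafQueue code: ToyLeafQueue, its fields, the integer AM limit percentage and the memory-only resource model are all hypothetical and only illustrate where the two early returns would sit.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ActivationGuardSketch {

    /** Toy stand-in for a leaf queue. None of these names are the real LeafQueue API. */
    static final class ToyLeafQueue {
        private final List<Long> pendingAmMb = new ArrayList<>(); // AM size of each pending app
        private final long clusterMb;      // 0 until node managers have registered
        private final int amLimitPercent;  // assumed queue AM limit, as a percentage
        private long usedAmMb;             // memory held by already activated AMs
        long limitChecks;                  // number of per-app AM limit checks performed

        ToyLeafQueue(long clusterMb, int amLimitPercent) {
            this.clusterMb = clusterMb;
            this.amLimitPercent = amLimitPercent;
        }

        /** Models an app attempt submit event: queue the app, then try to activate. */
        void submit(long amMb) {
            pendingAmMb.add(amMb);
            activateApplications();
        }

        void activateApplications() {
            // Case 1: the cluster resource is zero, so no AM can fit. Skip the scan.
            if (clusterMb == 0) {
                return;
            }
            long amLimitMb = clusterMb * amLimitPercent / 100;
            // Case 2: the queue's AM limit is already reached. Skip all pending apps.
            if (usedAmMb >= amLimitMb) {
                return;
            }
            for (Iterator<Long> it = pendingAmMb.iterator(); it.hasNext();) {
                long amMb = it.next();
                limitChecks++;
                if (usedAmMb + amMb > amLimitMb) {
                    continue; // this app does not fit, look at the next one
                }
                usedAmMb += amMb;
                it.remove(); // app is now active
            }
        }

        int pendingCount() {
            return pendingAmMb.size();
        }
    }

    public static void main(String[] args) {
        // Recovery before any node manager has registered: cluster resource is zero.
        ToyLeafQueue recovering = new ToyLeafQueue(0, 10);
        for (int i = 0; i < 1000; i++) {
            recovering.submit(512);
        }
        System.out.println("zero cluster: checks=" + recovering.limitChecks
                + " pending=" + recovering.pendingCount());

        // 10 GB cluster with a 10% AM limit: only two 512 MB AMs fit.
        ToyLeafQueue full = new ToyLeafQueue(10240, 10);
        for (int i = 0; i < 1000; i++) {
            full.submit(512);
        }
        System.out.println("limit reached: checks=" + full.limitChecks
                + " pending=" + full.pendingCount());
    }
}
```

In this toy model both guards return before the per-app loop, so a submit event costs constant time whenever nothing can be activated anyway. Whether the real queue can use the same conditions unchanged is something the patch has to confirm.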

> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in the ACCEPTED state.
> # Now restart the ResourceManager.
> For each application recovery, {{LeafQueue#activateApplications()}} is invoked, so the AM
> limit check runs even before the NodeManagers have registered.
> The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}}
> applications that is roughly {{50000000}} iterations, so the RM takes more than 10 minutes
> to become active.
> Since NM resources are not yet added during recovery, we should skip {{activateApplications()}}.
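
The iteration count in the description can be checked with a short sketch. It assumes, as the description states, that each recovered app triggers one full scan over all apps pending so far; the class and method names are hypothetical and only serve the arithmetic.

```java
final class RecoveryIterationCount {

    /** Counts per-app checks when every recovered app triggers a scan of all pending apps. */
    static long simulate(int apps) {
        long checks = 0;
        int pending = 0;
        for (int i = 0; i < apps; i++) {
            pending++;         // the recovered app joins the pending list
            checks += pending; // one scan visits every pending app
        }
        return checks;
    }

    /** Closed form of 1 + 2 + ... + n. */
    static long closedForm(long n) {
        return n * (n + 1) / 2;
    }

    public static void main(String[] args) {
        System.out.println(simulate(10_000));
        System.out.println(closedForm(10_000));
    }
}
```

For 10K apps both methods give 50,005,000, which matches the roughly 50 million iterations quoted in the description.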




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: yarn-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: yarn-issues-help@hadoop.apache.org

