hadoop-yarn-issues mailing list archives

From "Varun Saxena (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (YARN-5773) RM recovery too slow due to LeafQueue#activateApplication()
Date Thu, 27 Oct 2016 16:26:59 GMT

    [ https://issues.apache.org/jira/browse/YARN-5773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15612408#comment-15612408 ]

Varun Saxena edited comment on YARN-5773 at 10/27/16 4:26 PM:
--------------------------------------------------------------

bq. I did verify the same when cluster resources are empty and an application is submitted. The first application attempt is activated.
Yes, that happens right now because we check AM limits inside the loop. But in my opinion that is not required once we check overall cluster resources. It's not a bug per se in your current patch; I just think that condition is unnecessary. It is meant to be there for a different reason.
As the log message suggests, it is there to cover cases where the maximum AM resource percent is set too low for a queue. We do not want to block apps in that case.
When overall cluster resources are 0, the reason not even one application gets activated is that cluster resources are 0, not that the AM resource percent is insufficient.

Thoughts?
{code}
            LOG.warn("maximum-am-resource-percent is insufficient to start a"
                + " single application in queue, it is likely set too low."
                + " skipping enforcement to allow at least one application"
                + " to start");
{code}
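A minimal sketch of the activation rule argued for above, assuming a simplified model in which resources are a single memory value in MB. The class name, the PendingApp holder and the activate(...) signature are hypothetical illustrations, not the LeafQueue API. The point it shows: activation returns early when cluster resources are 0, so the "allow at least one application" path from the quoted LOG.warn only applies when the AM percent is genuinely too low.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch, NOT the actual LeafQueue#activateApplications() code.
// Resources are reduced to a single memory value in MB for illustration.
final class AmLimitActivationSketch {

    // Hypothetical stand-in for a pending application attempt.
    static final class PendingApp {
        final String id;
        final long amMemoryMb;

        PendingApp(String id, long amMemoryMb) {
            this.id = id;
            this.amMemoryMb = amMemoryMb;
        }
    }

    // Activates pending apps in FIFO order while the queue's AM limit allows it.
    // clusterMemoryMb is 0 while no NodeManager has registered (early recovery);
    // maxAmResourcePct is a fraction, e.g. 0.1 for a 10% AM limit.
    static List<String> activate(long clusterMemoryMb, double maxAmResourcePct,
                                 Deque<PendingApp> pending, long activeAmMemoryMb,
                                 int activeApps) {
        List<String> activated = new ArrayList<>();
        // Proposed check: with no cluster resources there is nothing to activate
        // against, so return before scanning the pending list at all.
        if (clusterMemoryMb <= 0) {
            return activated;
        }
        long amLimitMb = (long) (clusterMemoryMb * maxAmResourcePct);
        while (!pending.isEmpty()) {
            PendingApp next = pending.peekFirst();
            boolean overLimit = activeAmMemoryMb + next.amMemoryMb > amLimitMb;
            if (overLimit && activeApps > 0) {
                break; // AM limit reached and the queue already runs an app
            }
            if (overLimit) {
                // Same intent as the quoted LOG.warn: the percent is too low to
                // start even one app, so enforcement is skipped for this one.
                System.out.println("WARN: maximum-am-resource-percent too low,"
                        + " allowing " + next.id);
            }
            pending.pollFirst();
            activated.add(next.id);
            activeAmMemoryMb += next.amMemoryMb;
            activeApps++;
        }
        return activated;
    }

    public static void main(String[] args) {
        Deque<PendingApp> pending = new ArrayDeque<>(List.of(
                new PendingApp("app_1", 2048), new PendingApp("app_2", 2048)));
        // Cluster resources are 0: returns [] and leaves both apps pending.
        System.out.println(activate(0, 0.1, pending, 0, 0));
        // 10240 MB cluster at 10% gives a 1024 MB AM limit, too low for a 2048 MB AM,
        // so only app_1 is let through (with a warning): returns [app_1].
        System.out.println(activate(10240, 0.1, pending, 0, 0));
    }
}
```

With the zero-resource check placed before the loop, the skip-enforcement branch can no longer fire just because no NM has registered yet, which is the distinction drawn in the comment.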



was (Author: varun_saxena):
bq. I did verify the same when cluster resources are empty and an application is submitted. The first application attempt is activated.
Yes, that happens right now, but in my opinion it is not required. It's not a bug per se.
It is meant to be there for a different reason.
As the log message suggests, it is there to cover cases where the maximum AM resource percent is set too low for a queue. We do not want to block apps in that case.
When overall cluster resources are 0, the reason not even one application gets activated is that cluster resources are 0, not that the AM resource percent is insufficient.

Thoughts?
{code}
            LOG.warn("maximum-am-resource-percent is insufficient to start a"
                + " single application in queue, it is likely set too low."
                + " skipping enforcement to allow at least one application"
                + " to start");
{code}


> RM recovery too slow due to LeafQueue#activateApplication()
> -----------------------------------------------------------
>
>                 Key: YARN-5773
>                 URL: https://issues.apache.org/jira/browse/YARN-5773
>             Project: Hadoop YARN
>          Issue Type: Bug
>            Reporter: Bibin A Chundatt
>            Assignee: Bibin A Chundatt
>            Priority: Critical
>         Attachments: YARN-5773.0001.patch, YARN-5773.0002.patch, YARN-5773.0004.patch, YARN-5773.0005.patch, YARN-5773.003.patch
>
>
> # Submit 10K applications to the default queue.
> # All applications are in ACCEPTED state.
> # Now restart the ResourceManager.
> For each recovered application, {{LeafQueue#activateApplications()}} is invoked, resulting in the AM limit check being done even before NodeManagers have registered.
> The total number of iterations for N applications is about {{N(N+1)/2}}; for {{10K}} applications that is about {{50000000}} iterations, causing the RM to take more than 10 min to become active.
> Since NM resources have not yet been added during recovery, we should skip {{activateApplications()}}.
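A rough cost model of the quadratic behaviour described in the issue, assuming (hypothetically) that each recovered application triggers one activation pass that scans every application recovered so far. RecoveryCostSketch and countScans are illustrative names, not Hadoop code; the model only counts loop iterations.

```java
// Hypothetical cost model, NOT Hadoop code: during recovery each recovered app
// triggers one activation pass that scans every app pending so far.
final class RecoveryCostSketch {

    // Counts pending-app scans while recovering n apps.
    // skipWhenNoResources models the proposed early return when cluster
    // resources are still 0 because no NodeManager has registered yet.
    static long countScans(int n, long clusterMemoryMb, boolean skipWhenNoResources) {
        long scans = 0;
        for (int recovered = 1; recovered <= n; recovered++) {
            if (skipWhenNoResources && clusterMemoryMb <= 0) {
                continue; // activation pass skipped entirely
            }
            // The pass walks all `recovered` pending apps; none can be activated.
            scans += recovered;
        }
        return scans;
    }

    public static void main(String[] args) {
        int n = 10_000;
        long withoutSkip = countScans(n, 0, false);
        // 1 + 2 + ... + n == n(n+1)/2, i.e. 50,005,000 for n = 10,000.
        System.out.println(withoutSkip);
        System.out.println(withoutSkip == (long) n * (n + 1) / 2);
        // With the skip, recovery itself does no scanning at all.
        System.out.println(countScans(n, 0, true));
    }
}
```

For N = 10,000 the exact sum is 50,005,000, which matches the issue's rough figure of 50000000 iterations.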




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


